pipeline.utils.graphutils package
A package that defines common utilities to handle graphs in PyTorch Geometric.
pipeline.utils.graphutils.batch2df module
A Python module that builds dataframes directly from a PyTorch Data batch.
- pipeline.utils.graphutils.batch2df.get_df_edges(batch, df_hits_particles=None, combine_particle_id=False)
Get the dataframe of edges.
- Parameters:
batch (Data) – PyTorch Data object that contains the tensors edge_index and y.
df_hits_particles (Optional[TypeVar(DataFrame, DataFrame, DataFrame)]) – Optional dataframe of hits-particles to merge to the left and right hits of the dataframe of edges
- Return type:
TypeVar(DataFrame, DataFrame, DataFrame)
- Returns:
Dataframe of edges, with columns hit_idx_left, hit_idx_right, y, edge_idx and the columns provided in df_hits_particles suffixed by _left and _right.
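The documented output layout can be illustrated with a minimal sketch. It assumes that edge_index has shape (2, n_edges) and that y holds one value per edge. The helper name sketch_df_edges is hypothetical and is not part of the package.

```python
import pandas as pd
import torch


def sketch_df_edges(edge_index: torch.Tensor, y: torch.Tensor) -> pd.DataFrame:
    """Hypothetical stand-in that mirrors the documented output columns."""
    return pd.DataFrame(
        {
            # First row of edge_index: left hit of every edge (assumption)
            "hit_idx_left": edge_index[0].cpu().numpy(),
            # Second row of edge_index: right hit of every edge (assumption)
            "hit_idx_right": edge_index[1].cpu().numpy(),
            "y": y.cpu().numpy(),
            # Position of the edge in edge_index
            "edge_idx": list(range(edge_index.shape[1])),
        }
    )


edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
y = torch.tensor([True, False, True])
df_edges = sketch_df_edges(edge_index, y)
```

Keeping edge_idx as an explicit column makes it possible to map rows back to positions in edge_index after the dataframe has been merged or filtered.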
- pipeline.utils.graphutils.batch2df.get_df_edges_from_edge_index(edge_index, tensors=None)
- Return type:
TypeVar(DataFrame, DataFrame, DataFrame)
- pipeline.utils.graphutils.batch2df.get_df_hits(batch, hit_columns)
- Return type:
TypeVar(DataFrame, DataFrame, DataFrame)
- pipeline.utils.graphutils.batch2df.get_df_hits_particles(batch, particle_columns=None)
Get the dataframe of hits-particles.
- Parameters:
batch (Data) – PyTorch Data object that contains the tensor particle_id_hit_idx
particle_columns (Optional[List[str]]) – A list of particle columns to merge to the output dataframe. The particle column names are expected to be prefixed by particle_ in batch.
- Return type:
TypeVar(DataFrame, DataFrame, DataFrame)
- Returns:
Dataframe of hits-particles with columns particle_id, hit_idx, hit_particle_idx and the columns particle_columns.
- pipeline.utils.graphutils.batch2df.get_df_hits_particles_from_particle_id_hit_idx(particle_id_hit_idx)
- Return type:
TypeVar(DataFrame, DataFrame, DataFrame)
- pipeline.utils.graphutils.batch2df.get_df_triplets_from_triplet_index(triplet_index)
- Return type:
TypeVar(DataFrame, DataFrame, DataFrame)
- pipeline.utils.graphutils.batch2df.merge_df_hits_particles_to_edges(df_edges, df_hits_particles, combine_particle_id=False)
Merge the dataframe of hits-particles to the left and right hits of the dataframe of edges.
- Parameters:
df_edges (TypeVar(DataFrame, DataFrame, DataFrame)) – Dataframe of edges, at least with columns hit_idx_left, hit_idx_right
df_hits_particles (TypeVar(DataFrame, DataFrame, DataFrame)) – Dataframe of hits-particles, at least with columns hit_idx and particle_id
combine_particle_id (bool) – whether to combine particle_id_left and particle_id_right into particle_id
- Return type:
TypeVar(DataFrame, DataFrame, DataFrame)
- Returns:
Dataframe of edges-particles.
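A minimal pandas sketch of such a two-sided merge is shown here. It assumes a plain left join on hit_idx for each side and leaves out combine_particle_id, since how the two IDs are combined is not described above. The helper name sketch_merge_hits_particles_to_edges is hypothetical.

```python
import pandas as pd


def sketch_merge_hits_particles_to_edges(
    df_edges: pd.DataFrame, df_hits_particles: pd.DataFrame
) -> pd.DataFrame:
    """Hypothetical two-sided merge; not the package implementation."""
    df = df_edges
    for side in ("left", "right"):
        # Suffix every hits-particles column, e.g. hit_idx -> hit_idx_left
        renamed = df_hits_particles.rename(columns=lambda col: f"{col}_{side}")
        # Attach the particle information of the hit on this side of the edge
        df = df.merge(renamed, on=f"hit_idx_{side}", how="left")
    return df


df_edges = pd.DataFrame({"hit_idx_left": [0, 1], "hit_idx_right": [1, 2]})
df_hits_particles = pd.DataFrame({"hit_idx": [0, 1, 2], "particle_id": [7, 7, 9]})
df_edges_particles = sketch_merge_hits_particles_to_edges(df_edges, df_hits_particles)
```

With this approach, a hit that is associated with several particles yields several rows per edge, so the output can be longer than df_edges.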
pipeline.utils.graphutils.edgebuilding module
A module that builds edges in various ways.
- pipeline.utils.graphutils.edgebuilding.get_random_pairs_plane_by_plane(n_random, planes, query_indices, n_planes, plane_range=None)
Build random edges from query hits.
- Parameters:
n_random (int) – Number of random pairs per query point
planes (Tensor) – 1D tensor of planes of all the points
query_indices (Tensor) – indices of the query points
plane_range (Optional[int]) – for each plane, random edges are drawn from the query point to points belonging to one of the next plane_range planes. None means that the plane range is unbounded.
- Return type:
Tensor
- Returns:
Random edge indices drawn from query hits.
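The sampling rule described by plane_range can be sketched as follows. This is an illustration only: it loops over query points rather than planes, draws targets with replacement, and skips query points without any candidate, all of which are assumptions. The helper name sketch_random_pairs and its seed argument are hypothetical.

```python
import torch


def sketch_random_pairs(n_random, planes, query_indices, plane_range=None, seed=0):
    """Hypothetical sampler: random edges from query points to later planes."""
    generator = torch.Generator().manual_seed(seed)
    all_indices = torch.arange(planes.shape[0])
    edges = []
    for query_idx in query_indices.tolist():
        plane_diff = planes - planes[query_idx]
        # Candidates lie strictly after the plane of the query point...
        candidates_mask = plane_diff > 0
        if plane_range is not None:
            # ...and at most plane_range planes away
            candidates_mask &= plane_diff <= plane_range
        candidates = all_indices[candidates_mask]
        if candidates.numel() == 0:
            continue
        # Draw n_random targets with replacement (assumption)
        draws = torch.randint(candidates.numel(), (n_random,), generator=generator)
        targets = candidates[draws]
        sources = torch.full_like(targets, query_idx)
        edges.append(torch.stack((sources, targets)))
    if not edges:
        return torch.empty((2, 0), dtype=torch.long)
    return torch.cat(edges, dim=1)


planes = torch.tensor([0, 0, 1, 1, 2, 3])
query_indices = torch.tensor([0, 2])
random_edges = sketch_random_pairs(2, planes, query_indices, plane_range=1)
```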
pipeline.utils.graphutils.edgeutils module
A module that defines utilities to handle edges exclusively.
- pipeline.utils.graphutils.edgeutils.compute_edge_labels_from_pid_only(edge_indices, particle_ids)
Compute the array of labels that indicate whether an edge is True or False. Can be used for training.
- Parameters:
edge_indices (Tensor) – 2D tensor whose columns are two hit indices, corresponding to an edge
particle_ids (Tensor) – list of particle IDs of the hits
- Return type:
Tensor
- Returns:
1D tensor whose size is equal to the number of columns in edge_indices, and that indicates whether the corresponding edge in edge_indices is a True edge or not.
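A minimal sketch of labelling from particle IDs alone is given here, assuming that an edge counts as True when both of its hits carry the same particle ID. Any special treatment of noise hits is not covered. The helper name sketch_edge_labels is hypothetical.

```python
import torch


def sketch_edge_labels(edge_indices: torch.Tensor, particle_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical labelling: True when both hits share a particle ID."""
    left_ids = particle_ids[edge_indices[0]]
    right_ids = particle_ids[edge_indices[1]]
    return left_ids == right_ids


particle_ids = torch.tensor([7, 7, 9, 9])
edge_indices = torch.tensor([[0, 1, 2], [1, 2, 3]])
labels = sketch_edge_labels(edge_indices, particle_ids)
```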
- pipeline.utils.graphutils.edgeutils.remove_duplicate_edges(edge_indices, edge_tensors)
Remove duplicate edges in edge_indices and propagate the removal to the other “edge” tensors in edge_tensors.
- Parameters:
edge_indices (Tensor) – the edge indices
edge_tensors (List[Tensor]) – a list of edge tensors
- Return type:
Tuple[Tensor, List[Tensor]]
- Returns:
Updated edge_indices and edge_tensors.
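One possible way to deduplicate edges while keeping the other tensors aligned is sketched here. It assumes that each tensor in edge_tensors is indexed by edge along its first dimension and that the first occurrence of a duplicated edge is the one to keep. The output edges come out in sorted order, which may differ from the package behaviour. The helper name sketch_remove_duplicate_edges is hypothetical.

```python
import torch


def sketch_remove_duplicate_edges(edge_indices, edge_tensors):
    """Hypothetical deduplication that keeps the first occurrence of each edge."""
    n_edges = edge_indices.shape[1]
    device = edge_indices.device
    # Unique columns (sorted), plus the unique-column index of every original edge
    unique_edges, inverse = torch.unique(edge_indices, dim=1, return_inverse=True)
    # For each unique edge, the smallest original position that maps to it
    first_occurrence = torch.full(
        (unique_edges.shape[1],), n_edges, dtype=torch.long, device=device
    )
    first_occurrence = first_occurrence.scatter_reduce(
        0, inverse, torch.arange(n_edges, device=device), reduce="amin"
    )
    return unique_edges, [tensor[first_occurrence] for tensor in edge_tensors]


edge_indices = torch.tensor([[0, 1, 0], [1, 2, 1]])
scores = torch.tensor([0.9, 0.5, 0.9])
unique_edges, (unique_scores,) = sketch_remove_duplicate_edges(edge_indices, [scores])
```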
- pipeline.utils.graphutils.edgeutils.sort_edge_nodes(edges, ordering_tensor)
Sort the nodes of each edge in ascending order of the values of a given tensor.
- Parameters:
edges (Tensor) – Two-dimensional array of edges, with shape \((2, n_{\text{edges}})\)
ordering_tensor (Tensor) – Tensor of values for the nodes. The first node of an edge is required to have a lower value than the second node.
- Return type:
None
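Since the return type is None, the sort presumably happens in place. A minimal sketch under that assumption follows, swapping the two nodes of every edge whose first node has the larger ordering value. The helper name sketch_sort_edge_nodes is hypothetical.

```python
import torch


def sketch_sort_edge_nodes(edges: torch.Tensor, ordering_tensor: torch.Tensor) -> None:
    """Hypothetical in-place reordering of the two nodes of each edge."""
    # Edges whose first node has a larger value than the second one
    swap = ordering_tensor[edges[0]] > ordering_tensor[edges[1]]
    # Flip the two rows for those edges only, writing back into `edges`
    edges[:, swap] = edges[:, swap].flip(0)


edges = torch.tensor([[0, 2, 3], [1, 1, 2]])
planes = torch.tensor([0, 1, 2, 3])  # ordering value of each node
sketch_sort_edge_nodes(edges, planes)
```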
pipeline.utils.graphutils.knn module
A module that contains various ways of applying a kNN.
- pipeline.utils.graphutils.knn.build_edges_exatrkx(query, database, query_indices=None, r_max=1.0, k_max=10, device=None)
NOTE: These KNN/FRNN algorithms return the distances**2. Therefore we need to be careful when comparing them to the target distances (r_val, r_test), and to the margin parameter (which is L1 distance)
- Return type:
torch.Tensor
- pipeline.utils.graphutils.knn.build_edges_faiss(start_coords, end_coords, k_max, squared_distance_max, res=None, enforce_cpu=False)
Apply a kNN using faiss. The CPU execution is much, much faster than the GPU one in our case.
- Parameters:
start_coords (torch.Tensor) – Coordinates of the starting points to connect to end points
end_coords (torch.Tensor) – Coordinates of the points that can be connected to starting points
k_max (int) – Maximum number of neighbours to connect to each starting point.
squared_distance_max (float) – Maximum squared distance for two points to be considered as neighbours
res (faiss.StandardGpuResources | None) – Faiss GPU resource
- Return type:
torch.Tensor
- Returns:
Edge indices built by the kNN.
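The roles of k_max and squared_distance_max can be illustrated with a brute-force sketch in plain PyTorch. It does not use faiss and is only practical for small inputs. Treating the threshold as inclusive is an assumption, and the helper name sketch_knn_edges is hypothetical.

```python
import torch


def sketch_knn_edges(start_coords, end_coords, k_max, squared_distance_max):
    """Hypothetical brute-force kNN with a cut on the squared distance."""
    # Pairwise squared distances, shape (n_start, n_end)
    diff = start_coords[:, None, :] - end_coords[None, :, :]
    squared_distances = (diff**2).sum(dim=-1)
    # At most k_max nearest end points per starting point
    k = min(k_max, end_coords.shape[0])
    nearest_sq, nearest_idx = torch.topk(squared_distances, k, dim=1, largest=False)
    start_idx = torch.arange(start_coords.shape[0]).unsqueeze(1).expand_as(nearest_idx)
    # The cut is applied to squared distances, so no square root is needed
    keep = nearest_sq <= squared_distance_max
    return torch.stack((start_idx[keep], nearest_idx[keep]))


start_coords = torch.tensor([[0.0, 0.0]])
end_coords = torch.tensor([[0.0, 1.0], [0.0, 3.0], [2.0, 0.0]])
knn_edges = sketch_knn_edges(start_coords, end_coords, k_max=2, squared_distance_max=5.0)
```

Because the comparison is made on squared distances, a radius r corresponds to squared_distance_max = r ** 2.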
- pipeline.utils.graphutils.knn.build_edges_plane_by_plane(coords, planes, k_max, squared_distance_max, n_planes, plane_range=None, start_coords=None, start_planes=None, start_indices=None, enforce_cpu=True)
Build edges by applying a kNN for each plane, to build edges between this plane and the next plane_range planes. The loop over the planes is sequential but this is not a requirement.
- Parameters:
coords (torch.Tensor) – 2D tensor of (embedded) coordinates of the points to apply the kNN on. The points must be sorted by plane number.
planes (torch.Tensor) – 1D tensor of plane number for each point
plane_range (int | None) – Maximum number of planes 2 connected points can be separated by. A None value means that the plane_range is infinite
k_max (int) – Maximum number of neighbours to connect to each starting point.
squared_distance_max (float) – Maximum squared distance for two points to be considered as neighbours
start_coords (torch.Tensor | None) – the coordinates of the starting points, in the case where they are different from coords
start_planes (torch.Tensor | None) – the planes of the starting points, in the case where start_coords is provided.
start_indices (torch.Tensor | None) – the corresponding indices in coords of start_coords and start_planes. If provided, it is not necessary to provide either start_coords or start_planes.
- Returns:
2D tensor of edge indices built by the kNNs.
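The plane-by-plane loop can be sketched with a brute-force neighbour search. This illustration covers only the case where the starting points are the points of coords themselves (no start_coords, start_planes or start_indices) and assumes planes are numbered from 0 to n_planes - 1. The helper name sketch_edges_plane_by_plane is hypothetical.

```python
import torch


def sketch_edges_plane_by_plane(
    coords, planes, k_max, squared_distance_max, n_planes, plane_range=None
):
    """Hypothetical plane-by-plane kNN using brute-force distances."""
    indices = torch.arange(coords.shape[0])
    edges = []
    for plane in range(n_planes):
        start_mask = planes == plane
        # End points belong to one of the next plane_range planes
        end_mask = planes > plane
        if plane_range is not None:
            end_mask &= planes <= plane + plane_range
        if not start_mask.any() or not end_mask.any():
            continue
        start_idx, end_idx = indices[start_mask], indices[end_mask]
        diff = coords[start_idx][:, None, :] - coords[end_idx][None, :, :]
        squared_distances = (diff**2).sum(dim=-1)
        k = min(k_max, end_idx.numel())
        nearest_sq, nearest = torch.topk(squared_distances, k, dim=1, largest=False)
        keep = nearest_sq <= squared_distance_max
        # Map local positions back to indices in coords
        sources = start_idx.unsqueeze(1).expand_as(nearest)[keep]
        targets = end_idx[nearest][keep]
        edges.append(torch.stack((sources, targets)))
    if not edges:
        return torch.empty((2, 0), dtype=torch.long)
    return torch.cat(edges, dim=1)


coords = torch.tensor([[0.0], [0.1], [1.0], [0.05]])
planes = torch.tensor([0, 1, 1, 2])
plane_edges = sketch_edges_plane_by_plane(
    coords, planes, k_max=1, squared_distance_max=0.05, n_planes=3, plane_range=1
)
```

Each iteration only depends on the points of the current and following planes, which is why the loop does not have to run sequentially.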
pipeline.utils.graphutils.torchutils module
A Python module that defines utilities for PyTorch computations.
- pipeline.utils.graphutils.torchutils.get_groupby_indices(sorted_tensor, expected_unique_values=None, end_padding=0)
Get the array of grouping indices.
- Parameters:
sorted_tensor (torch.Tensor) – A tensor of sorted values.
expected_unique_values (torch.Tensor | None) – The expected unique values in the sorted tensor. This makes it possible to account for values that are missing from sorted_tensor
end_padding (int) – number of times the last value is appended to the returned 1D tensor.
- Return type:
torch.Tensor
- Returns:
1D tensor of grouping indices, i.e., the indices of the starts of a new group in sorted_tensor, starting from 0, and including the last value len(sorted_tensor). This tensor allows looping over slices of unique values of sorted_tensor.
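The grouping indices described above can be sketched with torch.searchsorted. This is an illustration under the assumption that sorted_tensor is 1D and sorted in ascending order. The helper name sketch_groupby_indices is hypothetical.

```python
import torch


def sketch_groupby_indices(sorted_tensor, expected_unique_values=None, end_padding=0):
    """Hypothetical computation of the start index of each group."""
    if expected_unique_values is None:
        expected_unique_values = torch.unique_consecutive(sorted_tensor)
    # Position of the first element of each expected value; a missing value
    # gets the start of the next group, i.e. an empty slice
    starts = torch.searchsorted(sorted_tensor, expected_unique_values)
    # Final boundary len(sorted_tensor), repeated end_padding more times
    end = torch.tensor([sorted_tensor.shape[0]] * (1 + end_padding), dtype=starts.dtype)
    return torch.cat((starts, end))


sorted_tensor = torch.tensor([0, 0, 1, 3, 3, 3])
group_indices = sketch_groupby_indices(sorted_tensor)
with_missing = sketch_groupby_indices(sorted_tensor, torch.arange(4))

# Loop over slices of identical values
slices = [
    sorted_tensor[start:stop].tolist()
    for start, stop in zip(group_indices[:-1].tolist(), group_indices[1:].tolist())
]
```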
- pipeline.utils.graphutils.torchutils.scatter_reduce(src, index, reduce, dim_size)
A scatter reduce for dim=0 that works with ONNX export. It uses the experimental function torch.scatter_reduce(). The arguments match the ones of the pytorch-scatter library, with a supplementary argument reduce (sum, prod, mean, amax or amin) that selects the reduction.
- Return type:
Tensor
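A minimal sketch of a dim=0 scatter reduce built on torch.Tensor.scatter_reduce is shown here. It handles 1D src only, and the choice of filling untouched output positions with zeros is an assumption. The helper name sketch_scatter_reduce is hypothetical.

```python
import torch


def sketch_scatter_reduce(src, index, reduce, dim_size):
    """Hypothetical dim=0 scatter reduce for 1D tensors."""
    # Output buffer; positions that receive no value keep this initial zero
    out = torch.zeros(dim_size, dtype=src.dtype)
    # include_self=False keeps the initial zeros out of the reduction
    return out.scatter_reduce(0, index, src, reduce=reduce, include_self=False)


src = torch.tensor([1.0, 2.0, 3.0, 4.0])
index = torch.tensor([0, 0, 1, 3])
summed = sketch_scatter_reduce(src, index, "sum", dim_size=4)
maxima = sketch_scatter_reduce(src, index, "amax", dim_size=4)
```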
pipeline.utils.graphutils.tripletbuilding module
A module to build triplets from edges.
- pipeline.utils.graphutils.tripletbuilding.compute_triplet_truths(df_triplets, df_edges_particles)
Add the target column y to the dataframe of triplets. In-place.
- Parameters:
df_triplets (TypeVar(DataFrame, DataFrame, DataFrame)) – dataframe of triplets, without truth information, with columns triplet_idx, edge_idx_1 and edge_idx_2
df_edges_particles (TypeVar(DataFrame, DataFrame, DataFrame)) – dataframe of edges, with truth information, with columns edge_idx, y and particle_id
- Return type:
None
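One plausible truth definition is sketched here: a triplet is marked True when both of its edges are True and carry the same particle ID. This rule is an assumption, as is the simplification that df_edges_particles has exactly one row per edge_idx and a boolean y column. The helper name sketch_triplet_truths is hypothetical.

```python
import pandas as pd


def sketch_triplet_truths(df_triplets: pd.DataFrame, df_edges_particles: pd.DataFrame) -> None:
    """Hypothetical in-place computation of the triplet target column y."""
    edges = df_edges_particles.set_index("edge_idx")
    # Truth information of the first and second edge of every triplet
    first = edges.loc[df_triplets["edge_idx_1"]]
    second = edges.loc[df_triplets["edge_idx_2"]]
    same_particle = first["particle_id"].to_numpy() == second["particle_id"].to_numpy()
    # Adds the column to the dataframe that was passed in
    df_triplets["y"] = first["y"].to_numpy() & second["y"].to_numpy() & same_particle


df_edges_particles = pd.DataFrame(
    {"edge_idx": [0, 1, 2], "y": [True, True, False], "particle_id": [7, 7, 0]}
)
df_triplets = pd.DataFrame(
    {"triplet_idx": [0, 1], "edge_idx_1": [0, 1], "edge_idx_2": [1, 2]}
)
sketch_triplet_truths(df_triplets, df_edges_particles)
```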
- pipeline.utils.graphutils.tripletbuilding.from_df_edges_to_df_triplets(df_edges)
- Return type:
Dict[str, TypeVar(DataFrame, DataFrame, DataFrame)]
- pipeline.utils.graphutils.tripletbuilding.from_edge_index_to_triplet_indices(edge_index)
Build the triplet indices from the array of edge indices.
- Parameters:
edge_index (Tensor) – tensor of shape (2, n_edges) with the edge indices
- Return type:
Dict[str, Tensor]
- Returns:
Dictionary that associates a triplet name with the tensor of triplet indices (2, n_triplets)
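The kinds of triplets returned and their names are not listed above. As an illustration of a single kind, this sketch pairs every edge a→b with every edge b→c that starts where the first one ends, using a dense comparison that is only suitable for small graphs. The helper name sketch_chain_triplets and the notion of a "chain" triplet are hypothetical.

```python
import torch


def sketch_chain_triplets(edge_index: torch.Tensor) -> torch.Tensor:
    """Hypothetical triplet builder: pairs of edges that share a middle hit."""
    # shares_middle_hit[i, j] is True when edge i ends where edge j starts
    shares_middle_hit = edge_index[1].unsqueeze(1) == edge_index[0].unsqueeze(0)
    # Rows: index of the first edge, index of the second edge
    return torch.nonzero(shares_middle_hit).T


# Edges: 0 -> 1, 1 -> 2 and 1 -> 3
edge_index = torch.tensor([[0, 1, 1], [1, 2, 3]])
triplet_indices = sketch_chain_triplets(edge_index)
```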
pipeline.utils.graphutils.truths module
A module that implements various ways of getting the intersection of the predicted and truth graphs in order to get the target y.
The Exa.TrkX function remains the fastest.
- pipeline.utils.graphutils.truths.get_truth_cudf(edge_indices, true_edge_indices)
Get the truth array y. This function is approximately 7 times slower than going to the CSR representation.
- Return type:
Tensor
- pipeline.utils.graphutils.truths.get_truths_exatrkx(edge_indices, true_edge_indices, device=None)
Get the targets of each edge in edge_indices given the true edges in true_edge_indices.
- Parameters:
edge_indices (torch.Tensor) – predicted edge indices
true_edge_indices (torch.Tensor) – true edge indices
- Returns:
Edge indices (might be in a different order) and corresponding targets.
Notes
The function turns the tensors into numpy arrays, on CPU.
- pipeline.utils.graphutils.truths.get_truths_pytorch(edge_indices, true_edge_indices)
Get the targets of each edge in edge_indices given the true edges in true_edge_indices.
- Parameters:
edge_indices (Tensor) – predicted edge indices
true_edge_indices (Tensor) – true edge indices
- Return type:
Tensor
- Returns:
Edge targets.
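One way to intersect the predicted and true graphs in pure PyTorch is sketched here. Each edge is encoded as a single integer key, and torch.isin tests membership in the set of true edges. The sketch assumes non-empty inputs and that both tensors orient their edges the same way. The helper name sketch_truths is hypothetical.

```python
import torch


def sketch_truths(edge_indices: torch.Tensor, true_edge_indices: torch.Tensor) -> torch.Tensor:
    """Hypothetical edge targets computed with integer keys and torch.isin."""
    # Number of hits needed to make the key (left * n_hits + right) unique
    n_hits = int(max(edge_indices.max(), true_edge_indices.max())) + 1
    edge_keys = edge_indices[0] * n_hits + edge_indices[1]
    true_keys = true_edge_indices[0] * n_hits + true_edge_indices[1]
    # True where the predicted edge also appears among the true edges
    return torch.isin(edge_keys, true_keys)


edge_indices = torch.tensor([[0, 1, 2], [1, 3, 3]])
true_edge_indices = torch.tensor([[0, 2], [1, 3]])
targets = sketch_truths(edge_indices, true_edge_indices)
```

Unlike the Exa.TrkX variant described above, this approach keeps the order of edge_indices unchanged, so only the targets need to be returned.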