pipeline.utils.graphutils package#
A package that defines common utilities to handle graphs in PyTorch Geometric.
pipeline.utils.graphutils.batch2df module#
A Python module that builds dataframes directly from a batch PyTorch Data object.
- pipeline.utils.graphutils.batch2df.get_df_edges(batch, df_hits_particles=None, combine_particle_id=False)[source]#
Get the dataframe of edges.
- Parameters:
batch (Data) – PyTorch Data object that contains the tensors edge_index and y.
df_hits_particles (Optional[TypeVar(DataFrame,DataFrame,DataFrame)]) – Optional dataframe of hits-particles to merge to the left and right hits of the dataframe of edges.
- Return type:
TypeVar(DataFrame,DataFrame,DataFrame)
- Returns:
Dataframe of edges, with columns hit_idx_left, hit_idx_right, y, edge_idx and the columns provided in df_hits_particles suffixed by _left and _right.
- pipeline.utils.graphutils.batch2df.get_df_edges_from_edge_index(edge_index, tensors=None)[source]#
- Return type:
TypeVar(DataFrame,DataFrame,DataFrame)
- pipeline.utils.graphutils.batch2df.get_df_hits(batch, hit_columns)[source]#
- Return type:
TypeVar(DataFrame,DataFrame,DataFrame)
- pipeline.utils.graphutils.batch2df.get_df_hits_particles(batch, particle_columns=None)[source]#
Get the dataframe of hits-particles.
- Parameters:
batch (Data) – PyTorch Data object that contains the tensor particle_id_hit_idx.
particle_columns (Optional[List[str]]) – A list of particle columns to merge to the output dataframe. The particle column names are expected to be prefixed by particle_ in batch.
- Return type:
TypeVar(DataFrame,DataFrame,DataFrame)
- Returns:
Dataframe of hits-particles with columns particle_id, hit_idx, hit_particle_idx and the columns particle_columns.
- pipeline.utils.graphutils.batch2df.get_df_hits_particles_from_particle_id_hit_idx(particle_id_hit_idx)[source]#
- Return type:
TypeVar(DataFrame,DataFrame,DataFrame)
- pipeline.utils.graphutils.batch2df.get_df_triplets_from_triplet_index(triplet_index)[source]#
- Return type:
TypeVar(DataFrame,DataFrame,DataFrame)
- pipeline.utils.graphutils.batch2df.merge_df_hits_particles_to_edges(df_edges, df_hits_particles, combine_particle_id=False)[source]#
Merge the dataframe of hits-particles to the left and right hits of the dataframe of edges.
- Parameters:
df_edges (TypeVar(DataFrame,DataFrame,DataFrame)) – Dataframe of edges, at least with columns hit_idx_left, hit_idx_right.
df_hits_particles (TypeVar(DataFrame,DataFrame,DataFrame)) – Dataframe of hits-particles, at least with columns hit_idx and particle_id.
combine_particle_id (bool) – Whether to combine particle_id_left and particle_id_right into particle_id.
- Return type:
TypeVar(DataFrame,DataFrame,DataFrame)
- Returns:
Dataframe of edges-particles.
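The merge described above can be sketched with two pandas merges, one per edge end. This is a minimal sketch, not the package's implementation: the helper name `merge_hits_particles_sketch`, the left join and the toy data are assumptions, and only the column names come from the documentation above.

```python
import pandas as pd


def merge_hits_particles_sketch(df_edges, df_hits_particles):
    """Hypothetical helper: attach hits-particles columns to both edge ends."""
    df = df_edges
    for side in ("left", "right"):
        # Suffix every hits-particles column with the side it is merged on,
        # so that hit_idx becomes hit_idx_left or hit_idx_right.
        renamed = df_hits_particles.add_suffix(f"_{side}")
        # Assumption: a left join, so edges without a matched hit are kept.
        df = df.merge(renamed, on=f"hit_idx_{side}", how="left")
    return df


df_edges = pd.DataFrame({"hit_idx_left": [0, 1], "hit_idx_right": [1, 2]})
df_hits_particles = pd.DataFrame({"hit_idx": [0, 1, 2], "particle_id": [7, 7, 9]})
df_edges_particles = merge_hits_particles_sketch(df_edges, df_hits_particles)
print(df_edges_particles)
```

The sketch does not cover combine_particle_id, whose combination rule is not specified here.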
pipeline.utils.graphutils.edgebuilding module#
A module for building edges in various ways.
- pipeline.utils.graphutils.edgebuilding.get_random_pairs_plane_by_plane(n_random, planes, query_indices, n_planes, plane_range=None)[source]#
Build random edges from query hits.
- Parameters:
n_random (int) – Number of random pairs per query point.
planes (Tensor) – 1D tensor of planes of all the points.
query_indices (Tensor) – Indices of the query points.
plane_range (Optional[int]) – For each plane, random edges are drawn from the query point to points belonging to one of the next plane_range planes. None means that the plane range considered is infinite.
- Return type:
Tensor
- Returns:
Random edge indices drawn from query hits.
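The selection rule described above can be sketched in plain Python on lists rather than tensors. The function name `random_pairs_sketch` is hypothetical, and the sketch assumes that the "next" planes are those with a strictly larger plane number, that targets are drawn with replacement, and that query points without any candidate are skipped. None of these details are confirmed by the documentation.

```python
import random


def random_pairs_sketch(n_random, planes, query_indices, plane_range=None, seed=0):
    """Hypothetical plain-Python sketch; returns a list of (query, target) pairs."""
    rng = random.Random(seed)
    pairs = []
    for query in query_indices:
        first = planes[query] + 1
        # None is read as "no upper bound" on the target plane.
        last = float("inf") if plane_range is None else planes[query] + plane_range
        candidates = [i for i, plane in enumerate(planes) if first <= plane <= last]
        if not candidates:
            continue  # e.g. a query hit on the last plane
        # Assumption: targets are drawn with replacement.
        pairs.extend((query, rng.choice(candidates)) for _ in range(n_random))
    return pairs


planes = [0, 0, 1, 1, 2, 3]
pairs = random_pairs_sketch(n_random=2, planes=planes, query_indices=[0, 5], plane_range=1)
print(pairs)
```

With plane_range=1, the query hit 0 (plane 0) can only be paired with hits 2 and 3 (plane 1), and the query hit 5 on the last plane yields no pair.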
pipeline.utils.graphutils.edgeutils module#
A module that defines utilities to handle edges exclusively.
- pipeline.utils.graphutils.edgeutils.compute_edge_labels_from_pid_only(edge_indices, particle_ids)[source]#
Compute the array of labels that indicate whether an edge is True or False. Can be used for training.
- Parameters:
edge_indices (Tensor) – 2D tensor whose columns are two hit indices, corresponding to an edge.
particle_ids (Tensor) – List of particle IDs of the hits.
- Return type:
Tensor
- Returns:
1D tensor whose size is equal to the number of columns in edge_indices, and that indicates whether the corresponding edge in edge_indices is a True edge or not.
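A minimal sketch of such a labelling, assuming an edge is a True edge exactly when its two hits carry the same particle ID. The helper name `edge_labels_sketch` is hypothetical, and any special handling of noise hits is left out because it is not described here.

```python
import torch


def edge_labels_sketch(edge_indices, particle_ids):
    """Hypothetical sketch: an edge is labelled True if both hits share a particle ID."""
    left, right = edge_indices  # two 1D tensors of shape (n_edges,)
    return particle_ids[left] == particle_ids[right]


edge_indices = torch.tensor([[0, 1, 2], [1, 2, 3]])
particle_ids = torch.tensor([5, 5, 8, 8])
labels = edge_labels_sketch(edge_indices, particle_ids)
print(labels)  # tensor([ True, False,  True])
```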
- pipeline.utils.graphutils.edgeutils.remove_duplicate_edges(edge_indices, edge_tensors)[source]#
Remove duplicate edges in edge_indices and propagate the removal to the other “edge” tensors in edge_tensors.
- Parameters:
edge_indices (Tensor) – The edge indices.
edge_tensors (List[Tensor]) – A list of edge tensors.
- Return type:
Tuple[Tensor,List[Tensor]]
- Returns:
Updated edge_indices and edge_tensors.
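One possible way to implement this kind of de-duplication is sketched below. It assumes that the first occurrence of each edge is kept, that the edge tensors are indexed by edge along their first dimension, and that (a, b) and (b, a) are distinct edges. The helper name `remove_duplicate_edges_sketch` is hypothetical, and the actual function may differ on all of these points.

```python
import torch


def remove_duplicate_edges_sketch(edge_indices, edge_tensors):
    """Hypothetical sketch: keep the first occurrence of every (left, right) pair."""
    # inverse[i] is the index of the unique column that column i maps to.
    _, inverse = torch.unique(edge_indices, dim=1, return_inverse=True)
    # A stable sort groups identical edges while preserving their original order.
    sorted_inverse, order = torch.sort(inverse, stable=True)
    is_first = torch.ones_like(sorted_inverse, dtype=torch.bool)
    is_first[1:] = sorted_inverse[1:] != sorted_inverse[:-1]
    # Restore the original edge ordering among the kept edges.
    keep = order[is_first].sort().values
    return edge_indices[:, keep], [tensor[keep] for tensor in edge_tensors]


edge_indices = torch.tensor([[0, 1, 0, 2], [1, 2, 1, 3]])
edge_scores = torch.tensor([10, 11, 12, 13])
new_edges, (new_scores,) = remove_duplicate_edges_sketch(edge_indices, [edge_scores])
print(new_edges, new_scores)
```

Here the third edge duplicates the first one, so it is dropped together with its entry in edge_scores.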
- pipeline.utils.graphutils.edgeutils.sort_edge_nodes(edges, ordering_tensor)[source]#
Sort the nodes of the edges in ascending value of a certain tensor.
- Parameters:
edges (Tensor) – Two-dimensional array of edges, with shape \(\left(2, n_{\text{edges}}\right)\).
ordering_tensor (Tensor) – Tensor of values for the nodes. The first node of an edge is required to have a lower value than the second node.
- Return type:
None
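The None return type suggests that the edges are modified in place. A minimal sketch of such an in-place swap, with the hypothetical name `sort_edge_nodes_sketch`, could look as follows. It is an illustration under that assumption, not the package's code.

```python
import torch


def sort_edge_nodes_sketch(edges, ordering_tensor):
    """Hypothetical sketch: swap, in place, the nodes of wrongly ordered edges."""
    # An edge is wrongly ordered if its first node has the larger value.
    swap = ordering_tensor[edges[0]] > ordering_tensor[edges[1]]
    # Flipping along dim 0 exchanges the first and second node of these edges.
    edges[:, swap] = edges[:, swap].flip(0)


edges = torch.tensor([[0, 2, 3], [1, 1, 0]])
z = torch.tensor([10.0, 20.0, 30.0, 5.0])  # e.g. one coordinate value per node
sort_edge_nodes_sketch(edges, z)
print(edges)  # tensor([[0, 1, 3], [1, 2, 0]])
```

Only the second edge (2, 1) is swapped, since node 2 has a larger value than node 1.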
pipeline.utils.graphutils.knn module#
A module that contains various ways of applying a kNN.
- pipeline.utils.graphutils.knn.build_edges_exatrkx(query, database, query_indices=None, r_max=1.0, k_max=10, device=None)[source]#
NOTE: These kNN/FRNN algorithms return squared distances (distances**2). Care is therefore needed when comparing them to the target distances (r_val, r_test), and to the margin parameter (which is an L1 distance).
- Return type:
torch.Tensor
- pipeline.utils.graphutils.knn.build_edges_faiss(start_coords, end_coords, k_max, squared_distance_max, res=None, enforce_cpu=False)[source]#
Apply a kNN using faiss. The CPU execution is much, much faster than the GPU one in our case.
- Parameters:
start_coords (torch.Tensor) – Coordinates of the starting points to connect to end points
end_coords (torch.Tensor) – Coordinates of the points that can be connected to starting points
k_max (int) – Maximum number of neighbours to connect to each starting point.
squared_distance_max (float) – Maximum squared distance for two points to be considered neighbours.
res (faiss.StandardGpuResources | None) – Faiss GPU resource
- Return type:
torch.Tensor
- Returns:
Edge indices built by the kNN.
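To illustrate the roles of k_max and squared_distance_max, the following sketch replaces faiss with a brute-force search in PyTorch. It is a stand-in for small inputs only: the helper name `knn_edges_sketch`, the inclusive distance cut and the (start, end) row order of the output are assumptions.

```python
import torch


def knn_edges_sketch(start_coords, end_coords, k_max, squared_distance_max):
    """Hypothetical brute-force stand-in for a faiss kNN search."""
    # Pairwise squared Euclidean distances, shape (n_start, n_end).
    diff = start_coords[:, None, :] - end_coords[None, :, :]
    squared_distances = diff.pow(2).sum(dim=-1)
    # Keep at most k_max nearest end points per starting point.
    k = min(k_max, end_coords.shape[0])
    nearest_sq, nearest_idx = squared_distances.topk(k, dim=1, largest=False)
    # Assumption: the cut on the squared distance is inclusive.
    keep = nearest_sq <= squared_distance_max
    start_idx = torch.arange(start_coords.shape[0]).unsqueeze(1).expand_as(nearest_idx)
    return torch.stack([start_idx[keep], nearest_idx[keep]])


start_coords = torch.tensor([[0.0, 0.0], [10.0, 10.0]])
end_coords = torch.tensor([[0.0, 1.0], [0.0, 3.0], [10.0, 10.5]])
edges = knn_edges_sketch(start_coords, end_coords, k_max=2, squared_distance_max=4.0)
print(edges)  # tensor([[0, 1], [0, 2]])
```

The cut is applied to squared distances, so a threshold of 4.0 corresponds to a Euclidean distance of 2.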
- pipeline.utils.graphutils.knn.build_edges_plane_by_plane(coords, planes, k_max, squared_distance_max, n_planes, plane_range=None, start_coords=None, start_planes=None, start_indices=None, enforce_cpu=True)[source]#
Build edges by applying a kNN for each plane, to build edges between this plane and the next plane_range planes. The loop over the planes is sequential but this is not a requirement.
- Parameters:
coords (torch.Tensor) – 2D tensor of (embedded) coordinates of the points to apply the kNN on. The points must be sorted by plane number.
planes (torch.Tensor) – 1D tensor of plane number for each point
plane_range (int | None) – Maximum number of planes two connected points can be separated by. A None value means that the plane_range is infinite.
k_max (int) – Maximum number of neighbours to connect to each starting point.
squared_distance_max (float) – Maximum squared distance for two points to be considered neighbours.
start_coords (torch.Tensor | None) – The coordinates of the starting points, in the case where they differ from coords.
start_planes (torch.Tensor | None) – The planes of the starting points, in the case where start_coords is provided.
start_indices (torch.Tensor | None) – The corresponding indices in coords of start_coords and start_planes. If provided, it is not necessary to provide either start_coords or start_planes.
- Returns:
2D tensor of edge indices built by the kNNs.
pipeline.utils.graphutils.torchutils module#
A Python module that defines utilities for PyTorch computations.
- pipeline.utils.graphutils.torchutils.get_groupby_indices(sorted_tensor, expected_unique_values=None, end_padding=0)[source]#
Get the array of grouping indices.
- Parameters:
sorted_tensor (torch.Tensor) – A tensor of sorted values.
expected_unique_values (torch.Tensor | None) – The expected unique values in the sorted tensor. This makes it possible to account for values that are missing from sorted_tensor.
end_padding (int) – Number of times the last value is appended to the returned 1D tensor.
- Return type:
torch.Tensor
- Returns:
1D tensor of grouping indices, i.e., the indices of the starts of a new group in sorted_tensor, starting from 0, and including the last value len(sorted_tensor). This tensor makes it possible to loop over slices of unique values of sorted_tensor.
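The returned indices can be illustrated with a plain-Python sketch that works on a sorted list instead of a tensor. The name `groupby_indices_sketch` is hypothetical, and the sketch assumes that expected_unique_values is itself sorted.

```python
from bisect import bisect_left


def groupby_indices_sketch(sorted_values, expected_unique_values=None, end_padding=0):
    """Hypothetical plain-Python sketch operating on a sorted list."""
    if expected_unique_values is None:
        expected_unique_values = sorted(set(sorted_values))
    # Start of each group: first position whose value is >= the expected value.
    starts = [bisect_left(sorted_values, value) for value in expected_unique_values]
    indices = starts + [len(sorted_values)]
    # Repeat the last value end_padding times.
    return indices + [indices[-1]] * end_padding


planes = [0, 0, 1, 3, 3, 3]
indices = groupby_indices_sketch(planes, expected_unique_values=[0, 1, 2, 3])
print(indices)  # [0, 2, 3, 3, 6]

# Loop over the slices of each expected value; plane 2 gives an empty slice.
slices = [planes[start:stop] for start, stop in zip(indices[:-1], indices[1:])]
print(slices)  # [[0, 0], [1], [], [3, 3, 3]]
```

A value that is expected but absent produces two equal consecutive indices, hence an empty slice rather than a shifted one.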
- pipeline.utils.graphutils.torchutils.scatter_reduce(src, index, reduce, dim_size)[source]#
A scatter reduce for dim=0 that works with ONNX export. It uses the experimental function torch.scatter_reduce(). The arguments match the ones of the pytorch-scatter library, with a supplementary argument reduce (sum, prod, mean, amax or amin) that selects the reduction.
- Return type:
Tensor
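A minimal sketch of a dim=0 scatter reduce built on `torch.Tensor.scatter_reduce` is given below for a 1D src. The wrapper name `scatter_reduce_sketch`, the zero-initialised output and the use of include_self=False are assumptions, and the actual function may treat empty groups or multi-dimensional inputs differently.

```python
import torch


def scatter_reduce_sketch(src, index, reduce, dim_size):
    """Hypothetical sketch for a 1D src; output entries with no source stay 0."""
    out = torch.zeros(dim_size, dtype=src.dtype)
    # include_self=False keeps the initial zeros out of the reduction.
    return out.scatter_reduce(0, index, src, reduce=reduce, include_self=False)


src = torch.tensor([1.0, 2.0, 3.0, 4.0])
index = torch.tensor([0, 0, 2, 2])
sums = scatter_reduce_sketch(src, index, "sum", dim_size=3)
maxima = scatter_reduce_sketch(src, index, "amax", dim_size=3)
print(sums)    # tensor([3., 0., 7.])
print(maxima)  # tensor([2., 0., 4.])
```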
pipeline.utils.graphutils.tripletbuilding module#
A module to build triplets from edges.
- pipeline.utils.graphutils.tripletbuilding.compute_triplet_truths(df_triplets, df_edges_particles)[source]#
Add the target column y to the dataframe of triplets. In-place.
- Parameters:
df_triplets (TypeVar(DataFrame,DataFrame,DataFrame)) – Dataframe of triplets, without truth information, with columns triplet_idx, edge_idx_1 and edge_idx_2.
df_edges_particles (TypeVar(DataFrame,DataFrame,DataFrame)) – Dataframe of edges, with truth information, with columns edge_idx, y and particle_id.
- Return type:
None
- pipeline.utils.graphutils.tripletbuilding.from_df_edges_to_df_triplets(df_edges)[source]#
- Return type:
Dict[str,TypeVar(DataFrame,DataFrame,DataFrame)]
- pipeline.utils.graphutils.tripletbuilding.from_edge_index_to_triplet_indices(edge_index)[source]#
Build the triplet indices from the array of edge indices.
- Parameters:
edge_index (Tensor) – Tensor of shape (2, n_edges) with the edge indices.
- Return type:
Dict[str,Tensor]
- Returns:
Dictionary that associates a triplet name with the tensor of triplet indices, of shape (2, n_triplets).
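The documented function returns several named kinds of triplets, which are not listed here. As an illustration of a single kind only, the plain-Python sketch below pairs every edge a→b with every edge b→c that starts where the first one ends. The name `chained_triplets_sketch` and this choice of triplet kind are assumptions.

```python
from collections import defaultdict


def chained_triplets_sketch(edge_index):
    """Hypothetical sketch: pair every edge a->b with every edge b->c."""
    sources, targets = edge_index
    # Map each hit to the edges that start from it.
    edges_from = defaultdict(list)
    for edge_idx, source in enumerate(sources):
        edges_from[source].append(edge_idx)
    first_edges, second_edges = [], []
    for edge_idx, target in enumerate(targets):
        for next_edge_idx in edges_from.get(target, []):
            first_edges.append(edge_idx)
            second_edges.append(next_edge_idx)
    return [first_edges, second_edges]  # shape (2, n_triplets)


# Edges: 0->1, 1->2, 1->3, 2->4
edge_index = [[0, 1, 1, 2], [1, 2, 3, 4]]
triplets = chained_triplets_sketch(edge_index)
print(triplets)  # [[0, 0, 1], [1, 2, 3]]
```

Each column of the result holds two edge indices, matching the (2, n_triplets) shape described above.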
pipeline.utils.graphutils.truths module#
A module that implements various ways of getting the intersection of the
predicted and truth graphs in order to get the target y.
The Exa.TrkX function remains the fastest.
- pipeline.utils.graphutils.truths.get_truth_cudf(edge_indices, true_edge_indices)[source]#
Get the truth array y. This function is approximately 7 times slower than going to the CSR representation.
- Return type:
Tensor
- pipeline.utils.graphutils.truths.get_truths_exatrkx(edge_indices, true_edge_indices, device=None)[source]#
Get the targets of each edge in edge_indices given the true edges in true_edge_indices.
- Parameters:
edge_indices (torch.Tensor) – predicted edge indices
true_edge_indices (torch.Tensor) – true edge indices
- Returns:
Edge indices (might be in a different order) and corresponding targets.
Notes
The function turns the tensors into numpy arrays, on CPU.
- pipeline.utils.graphutils.truths.get_truths_pytorch(edge_indices, true_edge_indices)[source]#
Get the targets of each edge in edge_indices given the true edges in true_edge_indices.
- Parameters:
edge_indices (Tensor) – Predicted edge indices.
true_edge_indices (Tensor) – True edge indices.
- Return type:
Tensor
- Returns:
Edge targets.
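The intersection of the predicted and truth graphs can be sketched in PyTorch by encoding each edge as a single integer and testing membership with `torch.isin`. This is one possible approach, not necessarily the one used by the functions above. The name `edge_targets_sketch` is hypothetical, and edges are assumed to be directed and both tensors non-empty.

```python
import torch


def edge_targets_sketch(edge_indices, true_edge_indices):
    """Hypothetical sketch: flag predicted edges that also appear in the truth graph."""
    # Encode each (left, right) pair as a single integer key.
    n_nodes = int(max(edge_indices.max(), true_edge_indices.max())) + 1
    keys = edge_indices[0] * n_nodes + edge_indices[1]
    true_keys = true_edge_indices[0] * n_nodes + true_edge_indices[1]
    return torch.isin(keys, true_keys)


edge_indices = torch.tensor([[0, 1, 2], [1, 2, 0]])
true_edge_indices = torch.tensor([[0, 2], [1, 3]])
targets = edge_targets_sketch(edge_indices, true_edge_indices)
print(targets)  # tensor([ True, False, False])
```

Only the predicted edge (0, 1) is also a true edge, so it is the only one flagged.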