pipeline.utils.graphutils package#

A package that defines common utilities to handle graphs in PyTorch Geometric.

pipeline.utils.graphutils.batch2df module#

A Python module that builds dataframes directly from a batch PyTorch data object.

pipeline.utils.graphutils.batch2df.get_df_edges(batch, df_hits_particles=None, combine_particle_id=False)[source]#

Get the dataframe of edges.

Parameters:
  • batch (Data) – PyTorch Data object that contains the tensors edge_index and y.

  • df_hits_particles (Optional[TypeVar(DataFrame, DataFrame, DataFrame)]) – Optional dataframe of hits-particles to merge to the left and right hits of the dataframe of edges.

Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

Returns:

Dataframe of edges, with columns hit_idx_left, hit_idx_right, y, edge_idx and the columns provided in df_hits_particles suffixed by _left and _right.
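The column layout described above can be illustrated with a minimal sketch. It is not the module's implementation: the helper name `edges_to_dataframe_sketch` is hypothetical, and it assumes that `edge_idx` is simply the position of the edge in `edge_index`.

```python
import pandas as pd
import torch


def edges_to_dataframe_sketch(edge_index, y):
    """Hypothetical helper: one dataframe row per column of edge_index."""
    n_edges = edge_index.shape[1]
    return pd.DataFrame(
        {
            # Assumption: edge_idx is the position of the edge in edge_index
            "edge_idx": list(range(n_edges)),
            "hit_idx_left": edge_index[0].tolist(),
            "hit_idx_right": edge_index[1].tolist(),
            "y": y.tolist(),
        }
    )


edge_index = torch.tensor([[0, 1], [1, 2]])
y = torch.tensor([True, False])
df_edges = edges_to_dataframe_sketch(edge_index, y)
```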

pipeline.utils.graphutils.batch2df.get_df_edges_from_edge_index(edge_index, tensors=None)[source]#
Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

pipeline.utils.graphutils.batch2df.get_df_hits(batch, hit_columns)[source]#
Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

pipeline.utils.graphutils.batch2df.get_df_hits_particles(batch, particle_columns=None)[source]#

Get the dataframe of hits-particles.

Parameters:
  • batch (Data) – PyTorch Data object that contains the tensor particle_id_hit_idx

  • particle_columns (Optional[List[str]]) – A list of particle columns to merge to the outputted dataframe. The particle column names are expected to be prefixed by particle_ in batch.

Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

Returns:

Dataframe of hits-particles with columns particle_id, hit_idx, hit_particle_idx and the columns particle_columns.

pipeline.utils.graphutils.batch2df.get_df_hits_particles_from_particle_id_hit_idx(particle_id_hit_idx)[source]#
Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

pipeline.utils.graphutils.batch2df.get_df_triplets_from_triplet_index(triplet_index)[source]#
Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

pipeline.utils.graphutils.batch2df.merge_df_hits_particles_to_edges(df_edges, df_hits_particles, combine_particle_id=False)[source]#

Merge the dataframe of hits-particles to the left and right hits of the dataframe of edges.

Parameters:
  • df_edges (TypeVar(DataFrame, DataFrame, DataFrame)) – Dataframe of edges, at least with columns hit_idx_left, hit_idx_right

  • df_hits_particles (TypeVar(DataFrame, DataFrame, DataFrame)) – Dataframe of hits particles, at least with column hit_idx and particle_id

  • combine_particle_id (bool) – whether to combine particle_id_left and particle_id_right into particle_id

Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

Returns:

Dataframe of edges-particles.
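A possible way to perform this left/right merge with pandas is sketched below. The helper name is hypothetical, the `combine_particle_id` option is not covered, and the actual function may handle column naming and merge order differently.

```python
import pandas as pd


def merge_hits_particles_to_edges_sketch(df_edges, df_hits_particles):
    """Hypothetical helper: attach hit-particle columns to both ends of each edge."""
    df = df_edges
    for side in ("left", "right"):
        # Suffix every column, so that hit_idx becomes hit_idx_left (or _right)
        df_side = df_hits_particles.rename(columns=lambda name: f"{name}_{side}")
        df = df.merge(df_side, on=f"hit_idx_{side}", how="left")
    return df


df_edges = pd.DataFrame({"hit_idx_left": [0, 1], "hit_idx_right": [1, 2]})
df_hits_particles = pd.DataFrame({"hit_idx": [0, 1, 2], "particle_id": [7, 7, 9]})
df_edges_particles = merge_hits_particles_to_edges_sketch(df_edges, df_hits_particles)
```

If a hit is associated with several particles, such a merge produces one row per hit-particle combination, so the output can have more rows than `df_edges`.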

pipeline.utils.graphutils.edgebuilding module#

A module that builds edges in various ways.

pipeline.utils.graphutils.edgebuilding.get_random_pairs_plane_by_plane(n_random, planes, query_indices, n_planes, plane_range=None)[source]#

Build random edges from query hits.

Parameters:
  • n_random (int) – Number of random pairs per query point

  • planes (Tensor) – 1D tensor of planes of all the points

  • query_indices (Tensor) – indices of the query points

  • plane_range (Optional[int]) – for each plane, random edges are drawn from the query point to a point belonging to one of the next plane_range planes. None means that the plane range is unlimited.

Return type:

Tensor

Returns:

Random edge indices drawn from query hits.
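The idea can be sketched with a simple loop over the query points. This is an illustration under assumptions, not the module's algorithm: the helper name is hypothetical, targets are drawn with replacement, `n_planes` is not needed, and query points with no candidate in range are skipped.

```python
import torch


def random_pairs_sketch(n_random, planes, query_indices, plane_range=None):
    """Hypothetical helper: draw random edges from query hits to later planes."""
    edges = []
    for query_idx in query_indices.tolist():
        # Candidate targets lie on a strictly later plane...
        mask = planes > planes[query_idx]
        if plane_range is not None:
            # ...and at most plane_range planes further
            mask &= planes <= planes[query_idx] + plane_range
        candidates = mask.nonzero().squeeze(1)
        if candidates.numel() == 0:
            continue
        # Assumption: sampling with replacement
        picks = candidates[torch.randint(candidates.numel(), (n_random,))]
        edges.append(torch.stack([torch.full_like(picks, query_idx), picks]))
    if not edges:
        return torch.empty((2, 0), dtype=torch.long)
    return torch.cat(edges, dim=1)


planes = torch.tensor([0, 0, 1, 1, 2, 3])
random_edges = random_pairs_sketch(3, planes, torch.tensor([0, 2]), plane_range=1)
```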

pipeline.utils.graphutils.edgeutils module#

A module that defines utilities to handle edges exclusively.

pipeline.utils.graphutils.edgeutils.compute_edge_labels_from_pid_only(edge_indices, particle_ids)[source]#

Compute the array of labels that indicate whether an edge is True or False. Can be used for training.

Parameters:
  • edge_indices (Tensor) – 2D tensor whose columns are two hit indices, corresponding to an edge

  • particle_ids (Tensor) – list of particle IDs of the hits

Return type:

Tensor

Returns:

1D tensor whose size is equal to the number of columns in edge_indices, and that indicates whether the corresponding edge in edge_indices is a True edge or not.
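A minimal sketch of such a labelling, assuming that an edge is true exactly when both of its hits carry the same particle ID. The helper name is hypothetical, and any special treatment of noise hits (for instance a reserved particle ID) is not covered.

```python
import torch


def edge_labels_from_pid_sketch(edge_indices, particle_ids):
    """Hypothetical helper: an edge is true if both hits share a particle ID."""
    return particle_ids[edge_indices[0]] == particle_ids[edge_indices[1]]


edge_indices = torch.tensor([[0, 1, 2], [1, 2, 3]])
particle_ids = torch.tensor([5, 5, 8, 8])
labels = edge_labels_from_pid_sketch(edge_indices, particle_ids)
```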

pipeline.utils.graphutils.edgeutils.remove_duplicate_edges(edge_indices, edge_tensors)[source]#

Remove duplicate edges in edge_indices and propagate the removal to the other “edge” tensors in edge_tensors.

Parameters:
  • edge_indices (Tensor) – the edge indices

  • edge_tensors (List[Tensor]) – a list of edge tensors

Return type:

Tuple[Tensor, List[Tensor]]

Returns:

Updated edge_indices and edge_tensors.
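One way to remove duplicates while keeping the other edge tensors aligned is sketched below. It is an assumption about the behaviour rather than the module's code: the helper name is hypothetical, the first occurrence of each edge is kept, and the original edge order is preserved.

```python
import torch


def remove_duplicate_edges_sketch(edge_indices, edge_tensors):
    """Hypothetical helper: keep the first occurrence of every edge."""
    # inverse[i] is the ID of the unique edge that column i corresponds to
    _, inverse = torch.unique(edge_indices, dim=1, return_inverse=True)
    # A stable sort groups identical edges and keeps their original order
    sorted_inverse, order = torch.sort(inverse, stable=True)
    is_first = torch.ones_like(sorted_inverse, dtype=torch.bool)
    is_first[1:] = sorted_inverse[1:] != sorted_inverse[:-1]
    # Positions of the first occurrences, back in the original edge order
    keep = order[is_first].sort().values
    return edge_indices[:, keep], [tensor[keep] for tensor in edge_tensors]


edge_indices = torch.tensor([[0, 1, 0, 2], [1, 2, 1, 3]])
scores = torch.tensor([10, 20, 30, 40])
unique_edges, (unique_scores,) = remove_duplicate_edges_sketch(edge_indices, [scores])
```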

pipeline.utils.graphutils.edgeutils.sort_edge_nodes(edges, ordering_tensor)[source]#

Sort the two nodes of each edge in ascending order of the values of a given tensor.

Parameters:
  • edges (Tensor) – Two-dimensional array of edges, with shape \((2, n_{\text{edges}})\)

  • ordering_tensor (Tensor) – Tensor of values for the nodes. The first node of an edge is required to have a lower value than the second node.

Return type:

None
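Since the function returns None, the sorting presumably happens in place. A minimal sketch under that assumption (the helper name is hypothetical) swaps the two nodes of every edge whose first node has the larger value:

```python
import torch


def sort_edge_nodes_sketch(edges, ordering_tensor):
    """Hypothetical helper: order the two nodes of each edge, in place."""
    # Edges whose first node has a larger value than the second one
    swap = ordering_tensor[edges[0]] > ordering_tensor[edges[1]]
    # Exchange the two rows for these edges only
    edges[:, swap] = edges[:, swap].flip(0)


edges = torch.tensor([[1, 2], [0, 1]])
ordering = torch.tensor([0.0, 5.0, 3.0])
sort_edge_nodes_sketch(edges, ordering)
```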

pipeline.utils.graphutils.knn module#

A module that contains various ways of applying a kNN.

pipeline.utils.graphutils.knn.build_edges_exatrkx(query, database, query_indices=None, r_max=1.0, k_max=10, device=None)[source]#

NOTE: These kNN/FRNN algorithms return squared distances (distances**2). Care is therefore needed when comparing them to the target distances (r_val, r_test) and to the margin parameter (which is an L1 distance).

Return type:

torch.Tensor
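The pitfall mentioned in the note can be shown with a small standalone example that does not call the function itself: a squared distance has to be compared with the squared radius, not with the radius. The values below are made up for illustration.

```python
import torch

r_max = 0.5
query = torch.tensor([[0.0, 0.0]])
database = torch.tensor([[0.3, 0.0], [0.6, 0.0]])

# Squared Euclidean distances between every query and database point
squared_distances = ((query[:, None, :] - database[None, :, :]) ** 2).sum(-1)

# Correct: compare squared distances with the squared radius
within_radius = squared_distances < r_max**2
# Incorrect: this also accepts the second point, which is 0.6 away
wrong_within_radius = squared_distances < r_max
```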

pipeline.utils.graphutils.knn.build_edges_faiss(start_coords, end_coords, k_max, squared_distance_max, res=None, enforce_cpu=False)[source]#

Apply a kNN using faiss. In our case, CPU execution is much faster than GPU execution.

Parameters:
  • start_coords (torch.Tensor) – Coordinates of the starting points to connect to end points

  • end_coords (torch.Tensor) – Coordinates of the points that can be connected to starting points

  • k_max (int) – Maximum number of neighbours to connect to each starting point.

  • squared_distance_max (float) – Maximum squared distance for two points to be considered neighbours

  • res (faiss.StandardGpuResources | None) – Faiss GPU resource

Return type:

torch.Tensor

Returns:

Edge indices built by the kNN.
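To make the roles of `k_max` and `squared_distance_max` concrete, here is a brute-force sketch in plain PyTorch. It deliberately replaces faiss with `torch.cdist` and `torch.topk`, so it only illustrates the expected output and is not suitable for large inputs. The helper name and the row convention (start indices first, end indices second) are assumptions.

```python
import torch


def build_edges_bruteforce_sketch(start_coords, end_coords, k_max, squared_distance_max):
    """Hypothetical helper: brute-force kNN with a cut on the squared distance."""
    squared_distances = torch.cdist(start_coords, end_coords) ** 2
    k = min(k_max, end_coords.shape[0])
    # k nearest end points of every start point, closest first
    nearest_d2, nearest_idx = torch.topk(squared_distances, k, dim=1, largest=False)
    keep = nearest_d2 < squared_distance_max
    start_idx = torch.arange(start_coords.shape[0], device=start_coords.device)
    start_idx = start_idx.unsqueeze(1).expand_as(nearest_idx)
    return torch.stack([start_idx[keep], nearest_idx[keep]])


start_coords = torch.tensor([[0.0, 0.0]])
end_coords = torch.tensor([[0.1, 0.0], [5.0, 0.0], [0.0, 0.2]])
knn_edges = build_edges_bruteforce_sketch(start_coords, end_coords, 2, 1.0)
```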

pipeline.utils.graphutils.knn.build_edges_plane_by_plane(coords, planes, k_max, squared_distance_max, n_planes, plane_range=None, start_coords=None, start_planes=None, start_indices=None, enforce_cpu=True)[source]#

Build edges by applying a kNN for each plane, to build edges between this plane and the next plane_range planes. The loop over the planes is sequential, but this is not a requirement.

Parameters:
  • coords (torch.Tensor) – 2D tensor of (embedded) coordinates of the points to apply the kNN on. The points must be sorted by plane number.

  • planes (torch.Tensor) – 1D tensor of plane number for each point

  • plane_range (int | None) – Maximum number of planes by which two connected points can be separated. A None value means that the plane_range is infinite

  • k_max (int) – Maximum number of neighbours to connect to each starting point.

  • squared_distance_max (float) – Maximum squared distance for two points to be considered neighbours

  • start_coords (torch.Tensor | None) – the coordinates of the starting points, in the case where they are different from coords

  • start_planes (torch.Tensor | None) – the planes of the starting points, in the case where start_coords is provided.

  • start_indices (torch.Tensor | None) – the corresponding indices in coords of start_coords and start_planes. If provided, it is not necessary to provide either start_coords or start_planes.

Returns:

2D tensor of edge indices built by the kNNs.

pipeline.utils.graphutils.torchutils module#

A Python module that defines utilities for PyTorch computations.

pipeline.utils.graphutils.torchutils.get_groupby_indices(sorted_tensor, expected_unique_values=None, end_padding=0)[source]#

Get the array of grouping indices.

Parameters:
  • sorted_tensor (torch.Tensor) – A tensor of sorted values.

  • expected_unique_values (torch.Tensor | None) – The expected unique values in the sorted tensor. This makes it possible to account for values that are missing from sorted_tensor

  • end_padding (int) – number of times the last value is appended to the returned 1D tensor.

Return type:

torch.Tensor

Returns:

1D tensor of grouping indices, i.e., the indices of the starts of a new group in sorted_tensor, starting from 0, and including the last value len(sorted_tensor). This tensor makes it possible to loop over slices of unique values of sorted_tensor.
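The returned tensor can be pictured with the following sketch, which reproduces the documented behaviour under assumptions. The helper name is hypothetical, and the use of `torch.unique_consecutive` and `torch.searchsorted` is a choice made for this illustration, not necessarily what the module does.

```python
import torch


def groupby_indices_sketch(sorted_tensor, expected_unique_values=None, end_padding=0):
    """Hypothetical helper: start index of every group, followed by the length."""
    if expected_unique_values is None:
        _, counts = torch.unique_consecutive(sorted_tensor, return_counts=True)
        starts = torch.cumsum(counts, dim=0) - counts
    else:
        # A missing value gets the start of the next group, hence an empty slice
        starts = torch.searchsorted(sorted_tensor, expected_unique_values)
    end = torch.tensor([sorted_tensor.shape[0]], dtype=starts.dtype, device=starts.device)
    return torch.cat([starts, end.repeat(1 + end_padding)])


sorted_tensor = torch.tensor([0, 0, 1, 3, 3, 3])
indices = groupby_indices_sketch(sorted_tensor, torch.tensor([0, 1, 2, 3]))
# Loop over the slices: the slice for the missing value 2 is empty
groups = [
    sorted_tensor[start:stop].tolist()
    for start, stop in zip(indices[:-1].tolist(), indices[1:].tolist())
]
```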

pipeline.utils.graphutils.torchutils.scatter_reduce(src, index, reduce, dim_size)[source]#

A scatter reduce for dim=0 that works with ONNX export. It uses the experimental function torch.scatter_reduce(). The arguments match those of the pytorch-scatter library, with an additional argument reduce (sum, prod, mean, amax or amin) that selects the reduction.

Return type:

Tensor
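A sketch of how such a wrapper might be built on `Tensor.scatter_reduce` is given below. The helper name, the zero-initialised output and the use of `include_self=False` are assumptions made for the illustration, so groups that receive no element are left at 0 here.

```python
import torch


def scatter_reduce_sketch(src, index, reduce, dim_size):
    """Hypothetical helper: reduce the rows of src that share the same index."""
    # Broadcast the 1D index to the shape of src, as scatter_reduce requires
    index = index.view(-1, *([1] * (src.dim() - 1))).expand_as(src)
    out = torch.zeros((dim_size, *src.shape[1:]), dtype=src.dtype, device=src.device)
    # include_self=False: the initial zeros do not take part in the reduction
    return out.scatter_reduce(0, index, src, reduce=reduce, include_self=False)


src = torch.tensor([1.0, 2.0, 3.0, 4.0])
index = torch.tensor([0, 0, 2, 2])
summed = scatter_reduce_sketch(src, index, "sum", dim_size=3)
maxima = scatter_reduce_sketch(src, index, "amax", dim_size=3)
```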

pipeline.utils.graphutils.tripletbuilding module#

A module to build triplets from edges.

pipeline.utils.graphutils.tripletbuilding.compute_triplet_truths(df_triplets, df_edges_particles)[source]#

Add the target column y to the dataframe of triplets. In-place.

Parameters:
  • df_triplets (TypeVar(DataFrame, DataFrame, DataFrame)) – dataframe of triplets, without truth information, with columns triplet_idx, edge_idx_1 and edge_idx_2

  • df_edges_particles (TypeVar(DataFrame, DataFrame, DataFrame)) – dataframe of edges, with truth information, with columns edge_idx, y and particle_id

Return type:

None

pipeline.utils.graphutils.tripletbuilding.from_df_edges_to_df_triplets(df_edges)[source]#
Return type:

Dict[str, TypeVar(DataFrame, DataFrame, DataFrame)]

pipeline.utils.graphutils.tripletbuilding.from_edge_index_to_triplet_indices(edge_index)[source]#

Build the triplet indices from the array of edge indices.

Parameters:

edge_index (Tensor) – tensor of shape (2, n_edges) with the edge indices

Return type:

Dict[str, Tensor]

Returns:

Dictionary that associates a triplet name with the tensor of triplet indices (2, n_triplets)
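As an illustration of what a tensor of triplet indices can look like, the sketch below builds a single kind of triplet: pairs of edges where the second edge starts at the node where the first one ends. The helper name and the key `"chain"` are hypothetical, the triplet types actually returned by the function are not listed here, and the dense comparison uses memory quadratic in the number of edges.

```python
import torch


def chained_triplets_sketch(edge_index):
    """Hypothetical helper: pairs of edges (i, j) where edge i ends where edge j starts."""
    # shares_node[i, j] is True when the end node of edge i is the start node of edge j
    shares_node = edge_index[1].unsqueeze(1) == edge_index[0].unsqueeze(0)
    # Each column holds the two edge indices of one triplet: shape (2, n_triplets)
    return {"chain": shares_node.nonzero().T}


# Edges: 0 = (0 -> 1), 1 = (1 -> 2), 2 = (1 -> 3)
edge_index = torch.tensor([[0, 1, 1], [1, 2, 3]])
triplet_indices = chained_triplets_sketch(edge_index)
```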

pipeline.utils.graphutils.tripletbuilding.from_triplet_index_to_triplet_truth(triplet_index, df_edges_particles)[source]#
Return type:

Tensor

pipeline.utils.graphutils.tripletbuilding.get_triplet_truths_from_tensors(triplet_indices, edge_index, edge_truth, particle_id_hit_idx)[source]#
Return type:

Dict[str, Tensor]

pipeline.utils.graphutils.truths module#

A module that implements various ways of getting the intersection of the predicted and truth graphs in order to get the target y.

The Exa.TrkX function remains the fastest.

pipeline.utils.graphutils.truths.get_truth_cudf(edge_indices, true_edge_indices)[source]#

Get the truth array y. This function is approximately 7 times slower than going to the CSR representation.

Return type:

Tensor

pipeline.utils.graphutils.truths.get_truths_exatrkx(edge_indices, true_edge_indices, device=None)[source]#

Get the targets of each edge in edge_indices given the true edges in true_edge_indices.

Parameters:
  • edge_indices (torch.Tensor) – predicted edge indices

  • true_edge_indices (torch.Tensor) – true edge indices

Returns:

Edge indices (might be in a different order) and corresponding targets.

Notes

The function turns the tensors into numpy arrays, on CPU.

pipeline.utils.graphutils.truths.get_truths_pytorch(edge_indices, true_edge_indices)[source]#

Get the targets of each edge in edge_indices given the true edges in true_edge_indices.

Parameters:
  • edge_indices (Tensor) – predicted edge indices

  • true_edge_indices (Tensor) – true edge indices

Return type:

Tensor

Returns:

Edge targets.
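One way to compute such targets in pure PyTorch is to encode every edge as a single integer key and test membership with `torch.isin`. This is a sketch under assumptions, not necessarily what the function does: the helper name is hypothetical, both tensors are assumed non-empty, and edges are treated as directed, so (a, b) and (b, a) are different edges.

```python
import torch


def get_truths_sketch(edge_indices, true_edge_indices):
    """Hypothetical helper: True for the predicted edges that are also true edges."""
    # Number of nodes needed to build a unique key for every (source, target) pair
    n_nodes = int(max(edge_indices.max(), true_edge_indices.max())) + 1
    keys = edge_indices[0] * n_nodes + edge_indices[1]
    true_keys = true_edge_indices[0] * n_nodes + true_edge_indices[1]
    return torch.isin(keys, true_keys)


edge_indices = torch.tensor([[0, 1, 2], [1, 2, 0]])
true_edge_indices = torch.tensor([[0, 2], [1, 3]])
y = get_truths_sketch(edge_indices, true_edge_indices)
```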