pipeline.GNN package#

This package defines the Graph Neural Network that classifies the edges and triplets in the graph.

Subpackages#

pipeline.GNN.triplet_gnn_base module#

A module that defines TripletGNNBase, the base class of all triplet-based GNNs in this repository.

class pipeline.GNN.triplet_gnn_base.TripletGNNBase(*args: Any, **kwargs: Any)[source]#

Bases: ModelBase

The base class for triplet-based models, which first classify edges, then triplets.

common_training_validation_step(batch, edge_score_cut=None, with_triplets=None, compute_loss=False)[source]#

Common forward step and loss computation for the training and validation steps.

Parameters:
  • batch (Data) – event graph

  • with_triplets (Optional[bool]) – whether to include the forward step on triplets

  • edge_score_cut (Optional[float]) – minimal edge score the edges are required to have

Return type:

Dict[str, Any]

Returns:

Output of the forward step and loss computation.
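
A hedged usage sketch (the model and batch objects, and the "loss" output key, are assumptions not confirmed by these docs):

    # Hypothetical training-side call on one event graph.
    output = model.common_training_validation_step(
        batch,                # torch_geometric.data.Data event graph
        edge_score_cut=0.5,   # placeholder cut value
        with_triplets=True,
        compute_loss=True,
    )
    loss = output["loss"]     # assumed key name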

compute_normalised_loss(output, truth)[source]#

Compute the typical weighted focal loss for the given output and truth.

Parameters:
  • output (Tensor) – logits

  • truth (Tensor) – targets

Return type:

Tensor

Returns:

Normalised sigmoid focal loss.
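
For reference, a self-contained sketch of a normalised sigmoid focal loss; the gamma and alpha defaults are the usual focal-loss values, and the normalisation by the number of positive targets is an assumption, not taken from this repository:

    import torch
    import torch.nn.functional as F

    def normalised_focal_loss(output, truth, gamma=2.0, alpha=0.25):
        # output: logits; truth: float tensor of 0/1 targets.
        p = torch.sigmoid(output)
        ce = F.binary_cross_entropy_with_logits(output, truth, reduction="none")
        p_t = p * truth + (1 - p) * (1 - truth)
        alpha_t = alpha * truth + (1 - alpha) * (1 - truth)
        loss = alpha_t * ce * (1 - p_t) ** gamma
        # Assumed normalisation: divide by the number of positives.
        return loss.sum() / truth.sum().clamp(min=1)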

property edge_checkpointing: bool#
filter_edges(edge_index, edge_score, edge_score_cut=None)[source]#
Return type:

Tuple[Tensor, Tensor]
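
A minimal sketch of the filtering implied by the signature (keep only the edges whose score passes the cut):

    import torch

    def filter_edges(edge_index, edge_score, edge_score_cut=None):
        # With no cut, return the inputs unchanged.
        if edge_score_cut is None:
            return edge_index, edge_score
        mask = edge_score > edge_score_cut
        return edge_index[:, mask], edge_score[mask]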

forward(x, edge_index, edge_score_cut=None, with_triplets=True)[source]#

Forward step of the triplet-based Neural Network.

1. The forward_edges() method is called and outputs the edge logits edge_output, possibly together with other tensors that can be reused for the triplets.

2. The edges are filtered using the self.filter_edges_for_triplets() method.

3. The triplets are built using utils.graphutils.tripletbuilding.from_edge_index_to_triplet_indices().

4. The forward_triplets() method is called and outputs the triplet logits triplet_outputs.

Parameters:
  • x (Tensor) – node features

  • edge_index (Tensor) – tensor with shape (2, n_edges) of the edge indices

  • edge_score_cut (Optional[float]) – minimal edge score the edges are required to have

  • with_triplets (bool) – whether to include the triplet inference

Return type:

Dict[str, Any]
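
A hedged usage sketch, given x and edge_index from an event graph (edge_output is the documented key; the cut value is a placeholder):

    import torch

    outputs = model(x, edge_index, edge_score_cut=0.5, with_triplets=True)
    edge_scores = torch.sigmoid(outputs["edge_output"])  # logits -> scores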

forward_edges(x, start, end)[source]#

Forward step for edge classification.

Parameters:
  • x (Tensor) – Hit features

  • start (Tensor) – tensor of start indices

  • end (Tensor) – tensor of end indices

Return type:

Dict[str, Tensor]

Returns:

A dictionary of tensors. Should at least contain edge_output, the logits of each edge.
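
A minimal sketch of what an implementation could look like (node_encoder and edge_classifier are hypothetical submodules, not names from this repository):

    def forward_edges(self, x, start, end):
        # Embed the hits, then score each edge from the concatenated
        # embeddings of its start and end hits.
        hidden = self.node_encoder(x)                        # hypothetical
        edge_features = torch.cat([hidden[start], hidden[end]], dim=1)
        edge_output = self.edge_classifier(edge_features).squeeze(-1)
        return {"edge_output": edge_output, "hidden": hidden}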

forward_triplets(dict_triplet_indices, *args, **kwargs)[source]#

Forward step for triplet building and classification.

Parameters:
  • dict_triplet_indices (Dict[str, Tensor]) – associates articulation, elbow_left and elbow_right with the corresponding triplet indices.

  • args – Other arguments to pass to the triplet output step.

  • kwargs – Other arguments to pass to the triplet output step.

Return type:

Dict[str, Tensor]

Returns:

A dictionary that associates articulation, elbow_left and elbow_right with the logits of the corresponding triplets.
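
For illustration, the input dictionary could look like the following; only the three key names are documented, while the tensor layout and index values are made up:

    import torch

    dict_triplet_indices = {
        "articulation": torch.tensor([[0, 1], [1, 2]]),
        "elbow_left": torch.tensor([[0, 3]]),
        "elbow_right": torch.tensor([[2, 4]]),
    }
    triplet_logits = model.forward_triplets(dict_triplet_indices)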

get_lazy_dataset(*args, **kwargs)[source]#

Get the lazy dataset object.

Parameters:
  • input_dir – input directory

  • n_events – number of events to load

  • shuffle – whether to shuffle the input paths (applied before selecting the first n_events)

  • seed – seed for the shuffling

  • **kwargs – Other keyword arguments passed to the utils.loaderutils.dataiterator.LazyDatasetBase constructor.

Return type:

TripletGNNLazyDataset

Returns:

utils.loaderutils.dataiterator.LazyDatasetBase object
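
A hedged usage sketch (all argument values are placeholders):

    dataset = model.get_lazy_dataset(
        input_dir="path/to/processed/events",  # placeholder path
        n_events=100,
        shuffle=True,
        seed=42,
    )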

get_lazy_dataset_partition(partition, *args, **kwargs)[source]#

Get the lazy dataset of a partition.

Parameters:
  • partition (str) – train, val or name of the test dataset

  • n_events – number of events to load

  • shuffle – whether to shuffle the input paths (applied before selecting the first n_events)

  • seed – seed for the shuffling

  • **kwargs – Other keyword arguments passed to ModelBase.get_lazy_dataset()

Return type:

LazyDatasetBase

Returns:

Lazy dataset of the partition

inference(batch, with_triplets=True, with_triplet_truths=False, edge_score_cut=None)[source]#

Run inference (without loss computation).

Parameters:
  • batch (Data) – event graph

  • with_triplets (bool) – whether to include the forward step on triplets

  • edge_score_cut (Optional[float]) – minimal edge score the edges are required to have

Return type:

Dict[str, Any]

Returns:

Output of the forward step.
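
A hedged usage sketch for evaluation-time inference:

    import torch

    with torch.no_grad():
        output = model.inference(batch, with_triplets=True, edge_score_cut=0.5)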

property input_kwargs: Dict[str, Any]#

Associates an input name with a dictionary of the keyword arguments used to build a dummy tensor representing that input. In practice, this dictionary gives the size and dtype of the tensor.
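
A sketch of how such a dictionary could be consumed, assuming each entry holds keyword arguments understood by torch.zeros (size, dtype, ...):

    import torch

    def build_dummy_inputs(input_kwargs):
        # One dummy tensor per input, e.g. to trace the model for export.
        return {name: torch.zeros(**kwargs) for name, kwargs in input_kwargs.items()}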

property input_to_dynamic_axes#

A dictionary that associates an input name with the dynamic axis specification.
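
This matches the dynamic_axes format of torch.onnx.export; an illustrative value (the axis indices and names are assumptions):

    dynamic_axes = {
        "x": {0: "n_nodes"},           # number of hits varies per event
        "edge_index": {1: "n_edges"},  # number of edges varies per event
    }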

log_metrics_gen(loss, scores, predictions, truths, suffix='')[source]#

Add an entry to the log.

Parameters:
  • loss (Tensor) – overall loss

  • scores (Tensor) – edge or triplet scores. Used to compute the AUC

  • predictions (Tensor) – edge or triplet predicted targets

  • truths (Tensor) – edge or triplet targets

  • suffix (str) – optional suffix, e.g., _edge or _triplet

Return type:

None

property loss: str#
property n_hiddens: int#

Number of hidden units

shared_evaluation(batch, log=False, with_triplets=None)[source]#

Evaluation step. Can be used for validation and test.

Parameters:
  • batch (Data) – event graph

  • log (bool) – whether to add an entry to the log

  • with_triplets (Optional[bool]) – whether to include the triplet inference

property subnetwork_to_outputs: Dict[str, List[str]]#

A dictionary that associates a subnetwork name with the list of its output names.
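
An illustrative value, built only from output names documented elsewhere on this page (the subnetwork names themselves are assumptions):

    {
        "edge_network": ["edge_output"],
        "triplet_network": ["articulation", "elbow_left", "elbow_right"],
    }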

test_step(batch, batch_idx)[source]#
to_onnx(outpath, mode=None, options=None)[source]#

Export the model to ONNX.

Parameters:
  • outpath (str) – path to the ONNX output file

  • mode (Optional[str]) – subnetwork to save

Return type:

None
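
A hedged usage sketch (the output file name and the mode name "edge" are assumptions):

    model.to_onnx("triplet_gnn_edges.onnx", mode="edge")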

training_step(batch, batch_idx)[source]#

Training step.

property triplet_checkpointing: bool#
triplet_output_step(dict_triplet_indices, *args, **kwargs)[source]#
Return type:

Dict[str, Tensor]

triplet_output_step_articulation(triplet_indices, *args, **kwargs)[source]#
triplet_output_step_elbow_left(triplet_indices, *args, **kwargs)[source]#
triplet_output_step_elbow_right(triplet_indices, *args, **kwargs)[source]#
validation_step(batch, batch_idx)[source]#
property with_triplets: bool#
class pipeline.GNN.triplet_gnn_base.TripletGNNLazyDataset(input_dir, n_events=None, shuffle=False, seed=None, **kwargs)[source]#

Bases: LazyDatasetBase

fetch_dataset(input_path, **kwargs)[source]#

Load and process one PyTorch dataset.

Parameters:
  • input_path (str) – path to the PyTorch dataset

  • map_location – location onto which to load the dataset

  • **kwargs – Other keyword arguments passed to torch.load()

Return type:

Data

Returns:

Loaded PyTorch Data object
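
A minimal sketch of the loading step implied by the parameters above:

    import torch

    def fetch_dataset(input_path, map_location="cpu", **kwargs):
        # Read one saved torch_geometric Data object from disk.
        return torch.load(input_path, map_location=map_location, **kwargs)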

pipeline.GNN.triplet_gnn_base.get_df_edges_from_batch_only(batch)[source]#

Get the dataframe of edges with particle ID information.

Return type:

DataFrame

pipeline.GNN.perfect_gnn module#

Replace the GNN with a perfect inference in order to determine the best result that can be obtained with the current pipeline.

class pipeline.GNN.perfect_gnn.PerfectInferenceBuilder[source]#

Bases: BuilderBase

Generate a perfect inference, in which the edge score is set equal to the truth.

construct_downstream(batch, pid=False)[source]#

Run the inference on a PyTorch Data object, in place.
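
A minimal sketch of what a perfect inference amounts to (the attribute names y and scores are assumptions):

    def construct_downstream(batch, pid=False):
        # Copy the edge truth into the score field, in place, so that
        # true edges score 1 and fake edges score 0.
        batch.scores = batch.y.float()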

class pipeline.GNN.perfect_gnn.PerfectTripletInferenceBuilder[source]#

Bases: BuilderBase

construct_downstream(batch)[source]#

Run the inference on a PyTorch Data object, in place.

load_batch(input_path)[source]#

Load a PyTorch Data object from its path, applying any necessary pre-processing.

Return type:

Data

pipeline.GNN.gnn_validation module#

class pipeline.GNN.gnn_validation.GNNScoreCutExplorer(model, builder='default')[source]#

Bases: ParamExplorer

A class that varies the score cut applied after the GNN and compares the resulting track-finding performance metrics.

property default_step: str#

Name of the step to fall back to if none is provided.

get_tracks(value, batches)[source]#

Get the dataframe of tracks from the inferred batches.

Parameters:
  • value (float) – current value of the parameter that is explored

  • batches (List[Data]) – list of inferred batches

Returns:

Dataframe of tracks, with columns track_id and hit_id

run_inference(batches)[source]#

Run the inference on a list of batches.

Parameters:

batches (Union[List[Data], LazyDatasetBase]) – List of batches

Returns:

List of inferred batches
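
A hedged usage sketch combining the two documented methods (the cut values and the dataset object are placeholders):

    explorer = GNNScoreCutExplorer(model, builder="default")
    batches = explorer.run_inference(dataset)  # list of Data or LazyDatasetBase
    for cut in (0.3, 0.5, 0.7):
        df_tracks = explorer.get_tracks(cut, batches)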

class pipeline.GNN.gnn_validation.TripletGNNScoreCutExplorer(model)[source]#

Bases: ParamExplorer

A class that varies the score cut applied after the GNN and compares the resulting track-finding performance metrics.

add_lhcb_text(ax, metric_name)[source]#
get_tracks(value, batches)[source]#

Get the dataframe of tracks from the inferred batches.

Parameters:
  • value (float) – current value of the parameter that is explored

  • batches (List[Data]) – list of inferred batches

Returns:

Dataframe of tracks, with columns track_id and hit_id

run_inference(batches)[source]#

Run the inference on a list of batches.

Parameters:

batches (Union[List[Data], LazyDatasetBase]) – List of batches

Returns:

List of inferred batches

pipeline.GNN.gnn_plots module#

pipeline.GNN.gnn_plots.plot_best_performances_score_cut(model, partition, edge_score_cuts, builder='default', n_events=None, seed=None, identifier=None, path_or_config=None, step='gnn', **kwargs)[source]#
Return type:

Tuple[Figure | npt.NDArray, List[Axes], Dict[float, Dict[Tuple[str | None, str], float]]]

pipeline.GNN.gnn_plots.plot_best_performances_score_cut_triplets(model, partition, edge_score_cut, triplet_score_cuts, n_events=None, seed=None, identifier=None, path_or_config=None, step='gnn', **kwargs)[source]#
Return type:

Tuple[Figure | npt.NDArray, List[Axes], Dict[float, Dict[Tuple[str | None, str], float]]]
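
A hedged usage sketch based on the signature above (partition and cut values are placeholders):

    fig, axes, metrics = plot_best_performances_score_cut(
        model,
        partition="val",
        edge_score_cuts=[0.3, 0.5, 0.7],
        builder="default",
    )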