Generate Figures with Snakemake

Snakemake is used to generate the figures reproducibly once the models have been trained.

You can configure the Snakemake pipeline by modifying the snakefiles/snakeconfig.yaml configuration file.

For example, consider the following configuration:

# Test dataset name(s) used to generate the plots that do not depend on an experiment
standatalone_test_dataset_names: ["minbias-sim10b-xdigi_v2.4_1498"]
# Experiment names to use to generate figures unrelated to the training but
# depending on the choice of an experiment
main_experiment_names: ["focal-loss-nopid-triplets-embedding-3-withspillover-new"]
experiments:
  focal-loss-nopid-triplets-embedding-3-withspillover-new:
    # Sample used to choose the edge and triplet score cuts, and the maximal squared distance
    choice: "minbias-sim10b-xdigi_v2.4_1498"
    steps: null # means all the steps
  focal-loss-nopid-triplets-embedding-3-withspillover-d2max0.02:
    choice: "minbias-sim10b-xdigi_v2.4_1498"
    steps: ["gnn", "track_building"]

The standalone_test_dataset_names entry is used for generating plots unrelated to training, such as distribution plots for the number of hits or particles.
The main_experiment_names entry is used to generate explanatory figures that illustrate the principles of the pipeline. These figures are typically derived from the experiment's choice test dataset, which is described under the experiment attributes.
You can then list multiple experiments under the experiments section. Each experiment can have the following attributes:
- The choice attribute names the test dataset used to select model parameters, such as the edge and triplet score cuts and the maximal squared distance in the embedding space (\(d^2_{\text{max}}\)).
- The steps attribute lists the steps to be plotted: embedding, gnn and/or track_building. A value of null selects all the steps. For example, the second experiment, focal-loss-nopid-triplets-embedding-3-withspillover-d2max0.02, corresponds to a different GNN training that uses a different value of \(d^2_{\text{max}}\) from the default training. The embedding plots are therefore not regenerated for this experiment, and only the gnn and track_building steps are listed.
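The way the steps attribute selects what gets plotted can be sketched in Python. This is only an illustration and not the pipeline's actual code: the configuration is written as a plain dictionary mirroring the YAML example, ALL_STEPS and resolve_steps are hypothetical names, and the list of steps is assumed to be exactly the three steps named in this section. Treating a missing steps key like null is also an assumption.

```python
# Hypothetical illustration; not taken from the etx4velo code base.
# Steps named in this section, assumed here to be the complete list.
ALL_STEPS = ["embedding", "gnn", "track_building"]

# Plain-dict mirror of the `experiments` section of the example configuration.
experiments = {
    "focal-loss-nopid-triplets-embedding-3-withspillover-new": {
        "choice": "minbias-sim10b-xdigi_v2.4_1498",
        "steps": None,  # YAML `null`: plot every step
    },
    "focal-loss-nopid-triplets-embedding-3-withspillover-d2max0.02": {
        "choice": "minbias-sim10b-xdigi_v2.4_1498",
        "steps": ["gnn", "track_building"],
    },
}


def resolve_steps(experiment_config):
    """Return the steps to plot; a null (or, by assumption, missing) value means all steps."""
    steps = experiment_config.get("steps")
    if steps is None:
        return list(ALL_STEPS)
    unknown = [step for step in steps if step not in ALL_STEPS]
    if unknown:
        raise ValueError(f"Unknown step(s): {unknown}")
    return list(steps)


for name, experiment_config in experiments.items():
    print(name, "->", resolve_steps(experiment_config))
```

With this sketch, the first experiment resolves to all three steps, while the second resolves to gnn and track_building only, so its embedding plots are skipped.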

Once the Snakemake pipeline is properly configured, running it is straightforward:

# If not already, set up the environment
source setup/setup.sh
conda activate etx4velo_env
cd etx4velo

snakemake -c2 -p all

In this command, the -c2 flag lets Snakemake use at most 2 cores, which limits how many scripts run simultaneously, and the -p flag prints the shell commands as they are executed.

The all rule serves as a master rule that executes all available sub-rules. You can also run specific sub-rules selectively to perform focused tasks. Apart from all, the rules that can be run are:

plotfactory_all: Generates various statistics plots.
embedding_all: Creates plots illustrating the performance of the embedding training.
gnn_all: Creates plots illustrating the performance of the GNN training.
matching_all: Produces plots that compute the track-finding performance of the GNN-based pipeline and compare it to the baseline Allen performance.
test_all: Evaluates the performance of the baseline Allen algorithm across all test datasets.
plotfactory_all: Generates plots related to tracks.
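Selecting sub-rules can also be scripted. The following is a minimal sketch, assuming you want to assemble the Snakemake command line from Python; build_snakemake_command and KNOWN_RULES are hypothetical helpers, not part of the repository. It uses only standard Snakemake options: -c for the number of cores, -p to print shell commands, and -n for a dry run that lists the jobs without executing them.

```python
import shlex

# Rule names listed in this section.
KNOWN_RULES = [
    "all",
    "plotfactory_all",
    "embedding_all",
    "gnn_all",
    "matching_all",
    "test_all",
]


def build_snakemake_command(rules, cores=2, dry_run=False):
    """Build the argument list for a Snakemake call targeting the given rules."""
    unknown = [rule for rule in rules if rule not in KNOWN_RULES]
    if unknown:
        raise ValueError(f"Unknown rule(s): {unknown}")
    command = ["snakemake", f"-c{cores}", "-p"]
    if dry_run:
        command.append("-n")  # dry run: show what would run, execute nothing
    command.extend(rules)
    return command


command = build_snakemake_command(["embedding_all", "gnn_all"], dry_run=True)
print(shlex.join(command))  # prints: snakemake -c2 -p -n embedding_all gnn_all
```

The resulting list can be passed to subprocess.run(command, check=True) from the etx4velo directory with the environment activated. Doing a dry run first is a cheap way to check which figures a rule would regenerate.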