Generate Figures with Snakemake
Snakemake is used to generate the figures reproducibly once the models have finished training.
You can configure the Snakemake pipeline by editing the snakefiles/snakeconfig.yaml configuration file.
For instance, consider the following example configuration:
# Test dataset name(s) used to generate the plots that do not depend on an experiment
standalone_test_dataset_names: ["minbias-sim10b-xdigi_v2.4_1498"]
# Experiment names used to generate figures that are unrelated to the training
# but depend on the choice of an experiment
main_experiment_names: ["focal-loss-nopid-triplets-embedding-3-withspillover-new"]
experiments:
  focal-loss-nopid-triplets-embedding-3-withspillover-new:
    # Sample used to choose the edge and triplet score cuts, and the maximal squared distance
    choice: "minbias-sim10b-xdigi_v2.4_1498"
    steps: null # means all the steps
  focal-loss-nopid-triplets-embedding-3-withspillover-d2max0.02:
    choice: "minbias-sim10b-xdigi_v2.4_1498"
    steps: ["gnn", "track_building"]
- The standalone_test_dataset_names entry is used for generating plots unrelated to training, such as distribution plots for the number of hits or particles.
- The main_experiment_names entry is employed to generate explanatory figures illustrating the pipeline’s principles. These figures are typically derived from the “choice” test dataset, which is specified in the next point.
- You can then list multiple experiments under the experiments section. Each experiment can have the following attributes:
  - The choice test dataset name is the dataset used to select model parameters, such as the maximal squared distance in the embedding space (\(d^2_{\text{max}}\)).
  - The steps attribute specifies the list of steps to be plotted, such as embedding, gnn and/or track_building. For example, the second experiment focal-loss-nopid-triplets-embedding-3-withspillover-d2max0.02 corresponds to a different GNN training using a distinct value of \(d^2_{\text{max}}\) compared to the default training. Therefore, it is unnecessary to regenerate the plots for the embedding step in this case.
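The way the steps attribute is interpreted can be illustrated with a short Python sketch. It is not the pipeline's actual code: the function resolve_steps and the constant ALL_STEPS are hypothetical names, and the sketch assumes that embedding, gnn and track_building are the only available steps. The experiments dictionary is an inline copy of the example configuration, so no YAML parser is needed.

```python
# Assumption: these three step names form the complete set. The text only
# lists them as examples, so extend the tuple if the pipeline defines more.
ALL_STEPS = ("embedding", "gnn", "track_building")

# Inline copy of the `experiments` section of the example configuration.
experiments = {
    "focal-loss-nopid-triplets-embedding-3-withspillover-new": {
        "choice": "minbias-sim10b-xdigi_v2.4_1498",
        "steps": None,  # YAML null: plot every step
    },
    "focal-loss-nopid-triplets-embedding-3-withspillover-d2max0.02": {
        "choice": "minbias-sim10b-xdigi_v2.4_1498",
        "steps": ["gnn", "track_building"],
    },
}


def resolve_steps(experiment_config):
    """Return the steps to plot, expanding a null `steps` entry to all steps."""
    steps = experiment_config.get("steps")
    if steps is None:
        return list(ALL_STEPS)
    unknown = [step for step in steps if step not in ALL_STEPS]
    if unknown:
        raise ValueError(f"Unknown step(s): {unknown}")
    return list(steps)


steps_per_experiment = {
    name: resolve_steps(cfg) for name, cfg in experiments.items()
}
for name, steps in steps_per_experiment.items():
    print(f"{name}: {steps}")
```

With the example configuration, the first experiment resolves to all three steps, while the second one skips embedding and keeps only gnn and track_building.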
Once the Snakemake pipeline is properly configured, running it is straightforward:
# Set up the environment, if not already done
source setup/setup.sh
conda activate etx4velo_env
cd etx4velo
snakemake -c2 -p all
In this command, the -c2 flag (short for --cores 2) lets Snakemake use at most 2 cores, so that at most 2 jobs run simultaneously, and the -p flag prints the shell commands that are executed.
The all rule serves as a master rule that executes all available sub-rules.
However, you can also run specific sub-rules to perform focused tasks.
Apart from all, the rules that can be run are:
- plotfactory_all: Generates various statistics plots.
- embedding_all: Creates plots illustrating the performance of the embedding training.
- gnn_all: Creates plots illustrating the performance of the GNN training.
- matching_all: Produces plots that compute track-finding performances of the GNN-based pipeline and compare them to the baseline Allen performance.
- test_all: Evaluates the performance of the baseline Allen algorithm across all test datasets.
- plotfactory_all: Generates plots related to tracks.
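To run only a subset of these rules, their names can be passed to snakemake in place of all. The sketch below assembles such a command line in Python without executing it. The helper build_snakemake_command and the tuple KNOWN_RULES are hypothetical and not part of the repository. The -n flag is Snakemake's dry-run option, which lists the jobs that would run without executing them.

```python
import shlex

# Rule names taken from the list above; "all" is the master rule.
KNOWN_RULES = (
    "all",
    "plotfactory_all",
    "embedding_all",
    "gnn_all",
    "matching_all",
    "test_all",
)


def build_snakemake_command(rules, cores=2, dry_run=False):
    """Build the snakemake argument list for the requested target rules."""
    unknown = [rule for rule in rules if rule not in KNOWN_RULES]
    if unknown:
        raise ValueError(f"Unknown rule(s): {unknown}")
    command = ["snakemake", f"-c{cores}", "-p"]
    if dry_run:
        command.append("-n")  # dry run: show the planned jobs only
    command.extend(rules)
    return command


command = build_snakemake_command(["embedding_all", "gnn_all"], dry_run=True)
print(shlex.join(command))  # snakemake -c2 -p -n embedding_all gnn_all
```

The printed command can be pasted into the shell from the etx4velo directory. Dropping dry_run=True removes the -n flag, so the selected rules are actually executed.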