Training a Pipeline#
This page outlines the procedure for training the pipeline. You will work with a training dataset of 5,000 events, a validation dataset of 500 events, and two test datasets of 1,000 events each.
The pipeline configuration file of interest is located in etx4velo/pipeline_configs/example.yaml.
1. Downloading the Training Data#
In this initial step, we will download the necessary training data.
In general, you can access and download the data from the following page.
The data is stored in my EOS space at the location /eos/lhcb/user/a/anthonyc/tracking/data/csv.
To download a specific subset of the training data, which consists of \(p\)-\(p\) collisions in the Upgrade, including spillover, execute the following command:
xrdcp -r root://eoslhcb.cern.ch//eos/lhcb/user/a/anthonyc/tracking/data/csv/v2.4/minbias-sim10b-xdigi_subset . --parallel 4
You may need to set up a Kerberos ticket (kinit yourusername@CERN.CH), as explained on the page linked above.
Alternatively, you can copy the files from an LXPLUS machine using this command:
rsync -arv yourusername@lxplus.cern.ch:/eos/lhcb/user/a/anthonyc/tracking/data/csv/v2.4/minbias-sim10b-xdigi_subset .
Make sure to replace yourusername with your actual username.
Once the data is downloaded, ensure that the input_dir parameter in the preprocessing section of the pipeline configuration file, located at etx4velo/pipeline_configs/example.yaml, corresponds to the location of the minbias-sim10b-xdigi_subset directory on your machine.
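One way to check this setting is to compare the configured path with the download location programmatically. The sketch below assumes the configuration has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`) and that `input_dir` sits directly under the `preprocessing` key, as described above. The helper name `input_dir_matches` and the example path are hypothetical and not part of etx4velo.

```python
from pathlib import Path


def input_dir_matches(config: dict, expected_dir: str) -> bool:
    """Return True if config["preprocessing"]["input_dir"] points at expected_dir.

    Assumption: `input_dir` sits directly under the `preprocessing` key.
    """
    preprocessing = config.get("preprocessing") or {}
    configured = preprocessing.get("input_dir")
    if configured is None:
        return False
    # Resolve both paths so relative and absolute spellings compare equal.
    configured_path = Path(configured).expanduser().resolve()
    expected_path = Path(expected_dir).expanduser().resolve()
    return configured_path == expected_path


# Example with an in-memory configuration (hypothetical path).
example_config = {"preprocessing": {"input_dir": "/data/minbias-sim10b-xdigi_subset"}}
print(input_dir_matches(example_config, "/data/minbias-sim10b-xdigi_subset"))
```

Comparing resolved paths rather than raw strings avoids false mismatches caused by a trailing slash or a relative path.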
The minbias-sim10b-xdigi_subset folder contains the subfolders labeled 0, 2, 4, 5, 6, 8 and 10.
Each of these folders contains between 1,000 and 2,000 events. Within each subfolder, you'll find 4 files, hits_velo.parquet.lz4, hits_ut.parquet.lz4, hits_scifi.parquet.lz4 and mc_particles.parquet.lz4, described in detail in the (X)DIGI2CSV documentation.
Please note that at this stage, only the hits_velo.parquet.lz4 and mc_particles.parquet.lz4 files are guaranteed to be in a usable state.
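To confirm that the download is complete, the folder layout described above can be checked with a short script. This is a minimal sketch using only the standard library; the function name `find_missing_files` is hypothetical, and the script only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names listed in the text above.
EXPECTED_FILES = (
    "hits_velo.parquet.lz4",
    "hits_ut.parquet.lz4",
    "hits_scifi.parquet.lz4",
    "mc_particles.parquet.lz4",
)


def find_missing_files(dataset_dir: str) -> dict:
    """Map each subfolder name to the list of expected files it lacks."""
    missing = {}
    for subfolder in sorted(Path(dataset_dir).iterdir()):
        if not subfolder.is_dir():
            continue
        absent = [name for name in EXPECTED_FILES if not (subfolder / name).is_file()]
        if absent:
            missing[subfolder.name] = absent
    return missing
```

Calling `find_missing_files("minbias-sim10b-xdigi_subset")` returns an empty dictionary when every subfolder contains all four files.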
The downloaded data will be divided into a training set consisting of 5,000 events and a test set containing 1,000 events.
2. Setting Up Test Samples#
For configuring the test samples, you can follow the guide available on this page. The essential steps are outlined below:
Begin by downloading the archive containing the test samples and then untarring it using the following commands:
xrdcp root://eoslhcb.cern.ch//eos/lhcb/user/a/anthonyc/tracking/data/data_validation/v2.4/reference_samples.tar.lz4 .
lz4 -d reference_samples.tar.lz4 -c | tar xvf -
Next, open the setup/common_config.yaml file and modify the reference_directory field to specify the actual location of the reference_directory directory.
Finally, collect the test samples by following these steps:
# If you have not already, activate the environment
conda activate etx4velo_env
source setup/setup.sh
cd etx4velo
# Run the test sample collection script
./scripts/collect_test_samples.py
Once you’ve completed these steps, the configuration for the test samples will be available in the etx4velo/evaluation/test_samples.yaml file, ready for use in the next steps.
3. Verify File Output Directories#
To ensure the correct placement of output files generated during training and testing,
open the configuration file setup/common_config.yaml
and check that the specified
output paths align with your intended setup.
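Beyond reading the file, you may want to confirm that the output locations can actually be written to before starting a long training run. The sketch below assumes you have copied the relevant paths from setup/common_config.yaml into a dictionary by hand; the key names in that file are not shown on this page, so the function `unwritable_paths` and any keys you pass to it are hypothetical.

```python
import os
from pathlib import Path


def unwritable_paths(output_paths: dict) -> list:
    """Return the names whose path (or nearest existing ancestor) is not writable."""
    problems = []
    for name, raw_path in output_paths.items():
        path = Path(raw_path).expanduser()
        # Walk up to the nearest existing ancestor; that is where
        # any missing directories would have to be created.
        while not path.exists() and path != path.parent:
            path = path.parent
        if not (path.is_dir() and os.access(path, os.W_OK)):
            problems.append(name)
    return problems
```

An empty list means every path either exists and is writable, or could be created under a writable parent directory.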
4. Launch the Jupyter Notebook#
To interactively run the pipeline, follow these steps:
# If you haven't already, activate the environment
conda activate etx4velo_env
source setup/setup.sh
cd etx4velo
# Launch Jupyter Lab on a given port
jupyter-lab --port 8889
After launching Jupyter Lab, access the Jupyter notebook located at notebooks/full_pipeline.ipynb.
Within this notebook, you’ll find detailed instructions to guide you through
the subsequent steps of the pipeline.
Note
If LaTeX is not currently installed on your machine, you may encounter an issue when generating figures. To address this, you have two options:
You can follow the installation instructions provided at https://tug.org/texlive/quickinstall.html to install TeX Live. You do not require admin rights for this installation.
Alternatively, you can disable the usage of LaTeX by modifying the pipeline/utils/plotutils/plotconfig.py file. Locate the "text.usetex" parameter and set it to False.
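If you are unsure whether LaTeX is installed, a small check can decide the value of "text.usetex" for you. The following is a minimal sketch, not the way plotconfig.py is actually written: it assumes that finding a `latex` executable on the PATH is a sufficient test, and the names `latex_available` and `rc_overrides` are hypothetical.

```python
import shutil


def latex_available() -> bool:
    """Return True if a `latex` executable is found on the PATH."""
    return shutil.which("latex") is not None


# Matplotlib setting: enable LaTeX text rendering only when LaTeX is installed.
rc_overrides = {"text.usetex": latex_available()}
print(rc_overrides)
```

With Matplotlib, a dictionary like this can be applied through `matplotlib.rcParams.update(rc_overrides)`.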