Training a Pipeline

This page describes how to train the pipeline. You will work with a training dataset of 5,000 events, a validation dataset of 500 events, and two test datasets of 1,000 events each.

The pipeline configuration file of interest is located in etx4velo/pipeline_configs/example.yaml.

1. Downloading the Training Data

In this initial step, we download the necessary training data. In general, you can access and download the data from the following page. The data is stored in my EOS space at /eos/lhcb/user/a/anthonyc/tracking/data/csv.

To download a specific subset of the training data, which consists of \(p\)-\(p\) collisions in the Upgrade, including spillover, execute the following command:

```
xrdcp -r root://eoslhcb.cern.ch//eos/lhcb/user/a/anthonyc/tracking/data/csv/v2.4/minbias-sim10b-xdigi_subset . --parallel 4
```

You may need to set up a Kerberos ticket (kinit yourusername@CERN.CH), as explained on the page mentioned earlier. Alternatively, you can copy the files from an LXPLUS machine using this command:

```
rsync -arv yourusername@lxplus.cern.ch:/eos/lhcb/user/a/anthonyc/tracking/data/csv/v2.4/minbias-sim10b-xdigi_subset .
```

Make sure to replace yourusername with your actual username.

Once the data is downloaded, ensure that the input_dir parameter in the preprocessing section of the pipeline configuration file, located at etx4velo/pipeline_configs/example.yaml, corresponds to the location of the minbias-sim10b-xdigi_subset directory on your machine.
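The check described above can be sketched in a few lines of Python. This is a minimal sketch, not part of the pipeline: it assumes the configuration file parses to a mapping with a top-level preprocessing section holding an input_dir key, as described above, and the helper names input_dir_exists and load_config are hypothetical. Reading the file requires the third-party PyYAML package, which is imported lazily so that the check itself works on an already-parsed dictionary.

```python
from pathlib import Path


def input_dir_exists(config: dict) -> bool:
    """Return True if preprocessing.input_dir points to an existing directory."""
    # Assumed layout: {"preprocessing": {"input_dir": "..."}}
    input_dir = Path(config["preprocessing"]["input_dir"]).expanduser()
    return input_dir.is_dir()


def load_config(path: str) -> dict:
    """Parse a YAML configuration file (needs the third-party PyYAML package)."""
    import yaml  # lazy import: only needed when reading from disk

    with open(path) as handle:
        return yaml.safe_load(handle)


if __name__ == "__main__":
    config_path = Path("etx4velo/pipeline_configs/example.yaml")
    if config_path.exists():
        ok = input_dir_exists(load_config(str(config_path)))
        print("input_dir found" if ok else "input_dir does not exist, please edit the config")
```

Run it from the repository root so that the relative path to example.yaml resolves.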

The minbias-sim10b-xdigi_subset folder contains the subfolders labeled 0, 2, 4, 5, 6, 8 and 10. Each of these subfolders contains between 1,000 and 2,000 events. Within each subfolder, you’ll find four files: hits_velo.parquet.lz4, hits_ut.parquet.lz4, hits_scifi.parquet.lz4 and mc_particles.parquet.lz4, described in detail in the (X)DIGI2CSV documentation.

Please note that at this stage, only the hits_velo.parquet.lz4 and mc_particles.parquet.lz4 files are guaranteed to be in a usable state.
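To confirm that the download is complete, the folder layout described above can be checked with a short script. This is a minimal sketch using only the standard library; the function name find_missing_files is hypothetical, and the script only tests that the files exist, not that their contents are readable.

```python
from pathlib import Path

# Subfolders and file names as listed in the text above.
EXPECTED_SUBFOLDERS = ["0", "2", "4", "5", "6", "8", "10"]
EXPECTED_FILES = [
    "hits_velo.parquet.lz4",
    "hits_ut.parquet.lz4",
    "hits_scifi.parquet.lz4",
    "mc_particles.parquet.lz4",
]


def find_missing_files(dataset_dir: str) -> list[str]:
    """List expected files that are absent, as paths relative to dataset_dir."""
    root = Path(dataset_dir)
    missing = []
    for subfolder in EXPECTED_SUBFOLDERS:
        for filename in EXPECTED_FILES:
            if not (root / subfolder / filename).is_file():
                missing.append(f"{subfolder}/{filename}")
    return missing


if __name__ == "__main__":
    # Placeholder location: adjust to where you downloaded the data.
    for entry in find_missing_files("minbias-sim10b-xdigi_subset"):
        print("missing:", entry)
```

An empty result means all four files are present in each of the seven subfolders.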

The downloaded data will be divided into a training set consisting of 5,000 events and a test set containing 1,000 events.

2. Setting Up Test Samples

For configuring the test samples, you can follow the guide available on this page. The essential steps are outlined below:

Begin by downloading the archive containing the test samples and then untarring it using the following commands:

```
xrdcp root://eoslhcb.cern.ch//eos/lhcb/user/a/anthonyc/tracking/data/data_validation/v2.4/reference_samples.tar.lz4 .
lz4 -d reference_samples.tar.lz4 -c | tar xvf -
```

Next, open the setup/common_config.yaml file and set the reference_directory field to the actual location of the directory extracted from the archive.

Finally, collect the test samples by following these steps:
```
# If you have not already, activate the environment
conda activate etx4velo_env
source setup/setup.sh
cd etx4velo
# Run the test sample collection script
./scripts/collect_test_samples.py
```
Once you’ve completed these steps, the configuration for the test samples will be available in the etx4velo/evaluation/test_samples.yaml file, ready for use in the next steps.

3. Verify File Output Directories

To ensure the correct placement of output files generated during training and testing, open the configuration file setup/common_config.yaml and check that the specified output paths align with your intended setup.
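One way to review these paths at a glance is to walk the parsed configuration and report which path-like entries exist on disk. The sketch below is only illustrative: the function names are hypothetical, treating any string containing "/" as a path is a rough heuristic, and apart from reference_directory the keys in the example dictionary are made-up placeholders, not the real contents of setup/common_config.yaml. In practice you would pass in the dictionary obtained by parsing that file with a YAML library.

```python
from pathlib import Path


def collect_paths(node, prefix=""):
    """Yield (dotted_key, value) for every string value containing a path separator."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from collect_paths(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from collect_paths(value, f"{prefix}[{index}]")
    elif isinstance(node, str) and "/" in node:
        yield prefix, node


def summarize_paths(config: dict) -> dict:
    """Map each path-like entry to True if it exists on disk, False otherwise."""
    return {key: Path(value).expanduser().exists() for key, value in collect_paths(config)}


if __name__ == "__main__":
    # Hypothetical configuration: only reference_directory is a documented field.
    example = {
        "reference_directory": "/path/to/reference_samples",
        "outputs": {"hypothetical_output_dir": "/path/to/output"},
    }
    for key, exists in summarize_paths(example).items():
        print(f"{key}: {'exists' if exists else 'missing'}")
```

An output directory reported as missing is not necessarily a problem, since it may be created when it is first written to. The point is to spot paths that still refer to someone else's setup.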

4. Launch the Jupyter Notebook

To interactively run the pipeline, follow these steps:

```
# If you haven't already, activate the environment
conda activate etx4velo_env
source setup/setup.sh
cd etx4velo

# Launch Jupyter Lab on a given port
jupyter-lab --port 8889
```

After launching Jupyter Lab, access the Jupyter notebook located at notebooks/full_pipeline.ipynb. Within this notebook, you’ll find detailed instructions to guide you through the subsequent steps of the pipeline.

Note

If LaTeX is not installed on your machine, you may encounter an issue when generating figures. You have two options. You can follow the installation instructions at https://tug.org/texlive/quickinstall.html to install TeX Live; this installation does not require admin rights. Alternatively, you can disable the use of LaTeX by editing the pipeline/utils/plotutils/plotconfig.py file: locate the "text.usetex" parameter and set it to False.
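If you would rather decide this automatically, the choice can be made at runtime. The sketch below is an assumption about how one might do it, not how plotconfig.py is actually written: the helper usetex_setting is hypothetical, and it only checks whether a latex executable is on the PATH, which is a necessary but possibly not sufficient condition for Matplotlib's LaTeX rendering to work.

```python
import shutil


def usetex_setting() -> dict:
    """Return a Matplotlib rcParams override that enables LaTeX only if `latex` is on PATH."""
    return {"text.usetex": shutil.which("latex") is not None}


if __name__ == "__main__":
    print(usetex_setting())
```

The returned dictionary can be applied with matplotlib.rcParams.update(usetex_setting()) before any figure is created.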
