pipeline.utils.commonutils package
A small package that defines some helper functions used for different steps of the machine-learning pipeline.
pipeline.utils.commonutils.cdetector module
A module to handle different detectors.
pipeline.utils.commonutils.cfeatures module
A module that defines common utilities for data handling.
- pipeline.utils.commonutils.cfeatures.get_input_features(all_features, feature_indices)
Extract the features that the model is trained on from the array of all the features of the batch PyTorch Geometric data object.
- Parameters:
all_features – array of all the features
feature_indices (Union[List[int], int, None]) – if it is an integer, corresponds to the number of features to include in the array of features. If it is a list of integers, it corresponds to the indices of the features to include in all_features.
- Return type:
Tensor
- Returns:
Array of features
- pipeline.utils.commonutils.cfeatures.get_number_input_features(feature_indices)
Get the number of input features.
- Parameters:
feature_indices (Union[int, List[int]]) – if it is an integer, corresponds to the number of features to include in the array of features. If it is a list of integers, it corresponds to the indices of the features to include in batch.x.
- Return type:
int
- Returns:
Number of input features
- pipeline.utils.commonutils.cfeatures.get_unnormalised_features(batch, path_or_config, feature_names)
Get the unnormalised features from the PyTorch Geometric data object, according to the configuration.
- Parameters:
batch (Data) – PyTorch Geometric data object that contains the x attribute, which corresponds to the array of the features
path_or_config (str | dict) – configuration dictionary, or path to the YAML file that contains the configuration
feature_names (List[str]) – list of the names of the features whose unnormalised values are extracted
- Return type:
List[Tensor]
- Returns:
List of PyTorch tensors, corresponding to the arrays of values of the features whose names are given by feature_names
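The two selection rules of feature_indices (an integer gives a number of features, a list gives feature indices) can be illustrated with a minimal pure-Python sketch. It assumes that an integer n keeps the first n feature columns and that None keeps every feature; neither assumption is stated in the documentation, and the helper names select_input_features and count_input_features are hypothetical. Nested lists stand in for the rows of a Tensor; on a 2-D Tensor the same rules correspond to x[:, :n] and x[:, indices].

```python
from typing import List, Optional, Sequence, Union

# Same shape of argument as feature_indices in the documentation.
FeatureIndices = Optional[Union[List[int], int]]


def select_input_features(
    all_features: Sequence[Sequence[float]], feature_indices: FeatureIndices
) -> List[List[float]]:
    """Hypothetical sketch of the selection rule of get_input_features."""
    if feature_indices is None:
        # Assumption: None keeps every feature.
        return [list(row) for row in all_features]
    if isinstance(feature_indices, int):
        # Assumption: an integer n keeps the first n feature columns.
        return [list(row[:feature_indices]) for row in all_features]
    # A list of integers keeps the listed columns, in the listed order.
    return [[row[i] for i in feature_indices] for row in all_features]


def count_input_features(feature_indices: Union[int, List[int]]) -> int:
    """Hypothetical sketch of get_number_input_features."""
    if isinstance(feature_indices, int):
        return feature_indices
    return len(feature_indices)
```

Under this reading, count_input_features(feature_indices) equals the number of columns returned by select_input_features, as long as an integer does not exceed the number of available features.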
pipeline.utils.commonutils.config module
A module that helps to handle the YAML configuration.
- class pipeline.utils.commonutils.config.CommonDirs
Bases: object
A class that handles the common configuration in setup/common_config.yaml.
- property common_config: Dict[str, Any]
Common configuration dictionary, stored in setup/common_config.yaml.
- property detectors: List[str]
List of available detectors.
- get_filenames_from_detector(detector)
Get the .parquet filenames for a given detector.
- Return type:
Dict[str, str]
- property repository: str
Path to the repository.
- property test_config_path
Path to the test configuration file.
- class pipeline.utils.commonutils.config.PipelineConfig(path_or_config, common_config=None, dir_path=None)
Bases: MutableMapping
- add_config(path_or_config, steps=None)
Add a configuration to the current configuration.
- Parameters:
path_or_config – configuration to add, or path to it
steps (Optional[Sequence[str]]) – list of the steps to add from this configuration. If not specified, all the steps are added.
- Raises:
ValueError – if one of the steps to add already exists in the current configuration.
- Return type:
None
- property common_config: Dict[str, Any]
Common configuration dictionary.
- property data_experiment_dir: str
Path to the directory that contains all the data of the given experiment.
- property detector: str
Detector the pipeline is applied to.
- dict()
Turn the experiment configuration dictionary into a regular dictionary of dictionaries.
- Return type:
Dict[str, Dict[str, Any]]
- property dir_path: str | None
Path to the directory with respect to which the input paths are expressed.
- property experiment_name: str
Name of the experiment.
- property performance_dir: str
Directory where the plots and reports of metric performances are saved.
- property required_test_dataset_names: List[str]
- property steps: List[str]
- pipeline.utils.commonutils.config.get_detector_from_experiment_name(experiment_name)
Get the detector of an experiment.
- Parameters:
experiment_name (str) – name of an experiment
- Return type:
str
- Returns:
Detector used in this experiment
- pipeline.utils.commonutils.config.get_detector_from_pipeline_config(path_or_config)
- Return type:
str
- pipeline.utils.commonutils.config.get_performance_directory_experiment(path_or_config)
Helper function to get the directory where plots and reports of metric performances are saved.
- Parameters:
path_or_config (str | dict) – configuration dictionary, or path to the YAML file that contains the configuration
- Return type:
str
- Returns:
Path to the directory where performance metric plots and reports are saved
- pipeline.utils.commonutils.config.get_pipeline_config_path(experiment_name)
Get the path to the pipeline config YAML file.
- Parameters:
experiment_name (str) – name of the experiment
- Return type:
str
- Returns:
Path where the YAML file that contains the configuration of experiment_name is stored.
- pipeline.utils.commonutils.config.load_config(path_or_config)
Load the configuration if it is not already loaded.
Also replace input_subdirectory by input_dir and output_subdirectory by output_dir in the loaded configuration. For this reason, always load the configuration with this function.
- Return type:
Dict[str, Dict[str, Any]]
- pipeline.utils.commonutils.config.load_dict(path_or_config)
Load the dictionary stored in a YAML file, or pass the input through unchanged if it is already a dictionary.
- Parameters:
path_or_config (Union[str, Dict[Any, Any]]) – dictionary, or path to a YAML file that contains a dictionary
- Return type:
Dict[Any, Any]
- Returns:
Dictionary contained in the YAML file, or the input dictionary
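PipelineConfig derives from MutableMapping, and its add_config method refuses to add a step that is already present. The sketch below illustrates both points under the assumption, suggested by the return type of dict(), that the configuration maps step names to per-step dictionaries. The class SketchPipelineConfig and the step names used with it are hypothetical; unlike the real add_config, this sketch only accepts a dictionary (no YAML path) and ignores the common configuration and the detector.

```python
from collections.abc import MutableMapping
from copy import deepcopy
from typing import Any, Dict, Iterator, Optional, Sequence


class SketchPipelineConfig(MutableMapping):
    """Hypothetical stand-in for PipelineConfig: step name -> step dictionary."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        self._steps: Dict[str, Dict[str, Any]] = dict(config)

    # The five methods below are the abstract methods of MutableMapping.
    def __getitem__(self, step: str) -> Dict[str, Any]:
        return self._steps[step]

    def __setitem__(self, step: str, value: Dict[str, Any]) -> None:
        self._steps[step] = value

    def __delitem__(self, step: str) -> None:
        del self._steps[step]

    def __iter__(self) -> Iterator[str]:
        return iter(self._steps)

    def __len__(self) -> int:
        return len(self._steps)

    def add_config(
        self, config: Dict[str, Dict[str, Any]], steps: Optional[Sequence[str]] = None
    ) -> None:
        """Add the selected steps of config, refusing to overwrite existing steps."""
        selected = list(config) if steps is None else list(steps)
        existing = [step for step in selected if step in self]
        if existing:
            raise ValueError(f"Steps already in the configuration: {existing}")
        for step in selected:
            self[step] = config[step]

    def dict(self) -> Dict[str, Dict[str, Any]]:
        """Return a plain dict of dicts, deep-copied so callers cannot alter self."""
        return deepcopy(self._steps)
```

Deriving from MutableMapping means only the five abstract methods are written by hand; methods such as get, items, update and pop come from the mixin.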
pipeline.utils.commonutils.crun module
- pipeline.utils.commonutils.crun.run_for_different_partitions(func, input_dir, output_dir, partitions=['train', 'val', 'test'], test_dataset_names=None, reproduce=True, list_kwargs=None, **kwargs)
Run a function for different dataset “partitions”.
- Parameters:
func (InOutFunction) – function to run, with inputs input_dir, output_dir, reproduce and possibly additional keyword arguments.
input_dir (str) – input directory
output_dir (str) – output directory
partitions (List[str]) – partitions to run func on:
  - train: training dataset
  - val: validation dataset
  - test: all the test datasets
  - a specific test dataset name
test_dataset_names (Optional[List[str]]) – list of possible test dataset names
reproduce (bool) – whether to reproduce the output. This removes the output directory.
**kwargs – keyword arguments passed to func
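The partition rules of run_for_different_partitions can be sketched as follows. This is a hypothetical reading of the documentation: test expands to every name in test_dataset_names, a test dataset can also be requested by name, and each dataset is processed in a subdirectory named after it. The subdirectory layout, the ValueError for unknown names, the deduplication and the helper names expand_partitions and run_partitions_sketch are assumptions. The sketch does not delete anything; it only forwards reproduce to func.

```python
import os
from typing import Any, Callable, Iterable, List, Optional, Sequence


def expand_partitions(
    partitions: Iterable[str], test_dataset_names: Optional[Sequence[str]]
) -> List[str]:
    """Turn the requested partitions into the list of datasets to process."""
    known_tests = list(test_dataset_names or [])
    expanded: List[str] = []
    for partition in partitions:
        if partition in ("train", "val"):
            names = [partition]
        elif partition == "test":
            # "test" stands for every known test dataset.
            names = known_tests
        elif partition in known_tests:
            # A single test dataset requested by name.
            names = [partition]
        else:
            # Assumption: unknown names are rejected.
            raise ValueError(f"Unknown partition: {partition!r}")
        # Skip datasets requested twice (e.g. "test" plus one of its names).
        for name in names:
            if name not in expanded:
                expanded.append(name)
    return expanded


def run_partitions_sketch(
    func: Callable[..., Any],
    input_dir: str,
    output_dir: str,
    partitions: Sequence[str] = ("train", "val", "test"),
    test_dataset_names: Optional[Sequence[str]] = None,
    reproduce: bool = True,
    **kwargs: Any,
) -> None:
    """Call func once per dataset, with per-dataset input and output directories."""
    for name in expand_partitions(partitions, test_dataset_names):
        func(
            input_dir=os.path.join(input_dir, name),  # assumed layout
            output_dir=os.path.join(output_dir, name),  # assumed layout
            reproduce=reproduce,  # forwarded only; nothing is removed here
            **kwargs,
        )
```

The sketch uses a tuple as the default for partitions, which avoids sharing one mutable default list between calls.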
pipeline.utils.commonutils.ctests module
A module that defines utilities to handle test datasets.
- pipeline.utils.commonutils.ctests.collect_test_samples(reference_directory=None, output_path=None, n_events=1000, supplementary_test_config_path=None)
- pipeline.utils.commonutils.ctests.get_available_test_dataset_names(path_or_config_test=None)
Get the list of available test dataset names from the test dataset configuration file.
- Parameters:
path_or_config_test (Union[str, Dict[str, Any], None]) – YAML test dataset configuration dictionary, or path to it
- Return type:
List[str]
- Returns:
List of test dataset names that can be produced and/or used.
- pipeline.utils.commonutils.ctests.get_preprocessed_test_dataset_dir(test_dataset_name, detector)
Get the path to the directory that contains the preprocessed files of a given test dataset.
- Parameters:
test_dataset_name (str) – name of the test dataset to pre-process
- Return type:
str
- pipeline.utils.commonutils.ctests.get_required_test_dataset_names(path_or_config)
Get the list of the dataset names required by the configuration.
- Return type:
List[str]
- pipeline.utils.commonutils.ctests.get_test_batch_dir(experiment_name, stage, test_dataset_name)
Get the directory where the batches of a given experiment, pipeline stage and test dataset are saved.
- Parameters:
experiment_name (str) – name of the experiment
stage (str) – name of the pipeline stage
test_dataset_name (str) – name of the test dataset
- Returns:
Path to the directory where the torch batch files are saved.
- pipeline.utils.commonutils.ctests.get_test_batch_paths(experiment_name, stage, test_dataset_name)
Get the list of paths of test batches of a given stage and experiment.
- Parameters:
experiment_name (str) – name of the experiment
stage (str) – name of the pipeline stage
test_dataset_name (str) – name of the test dataset
- Return type:
List[str]
- Returns:
List of paths of the test batches.
Notes
If stage contains embedding and the test batch directory does not exist, the function tries to replace embedding by metric_learning, for backward compatibility.
- pipeline.utils.commonutils.ctests.get_test_config_for_preprocessing(test_dataset_name, path_or_config_test, detector)
Get the configuration used for the pre-processing of a given test dataset.
- Parameters:
test_dataset_name (str) – name of the test dataset to pre-process
path_or_config_test (str | dict) – YAML test dataset configuration dictionary, or path to it
- Return type:
dict
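The backward-compatibility rule in the notes of get_test_batch_paths can be sketched with the standard library. The layout <root>/<stage>/<batch files> and the helper name list_test_batch_paths are hypothetical; the real function derives the batch directory from experiment_name, stage and test_dataset_name through get_test_batch_dir.

```python
import os
from typing import List


def list_test_batch_paths(test_batch_root: str, stage: str) -> List[str]:
    """Hypothetical sketch: list batch files, falling back to the legacy stage name."""
    # Assumed layout: one subdirectory per stage under test_batch_root.
    batch_dir = os.path.join(test_batch_root, stage)
    if "embedding" in stage and not os.path.isdir(batch_dir):
        # Older runs stored this stage under "metric_learning".
        batch_dir = os.path.join(
            test_batch_root, stage.replace("embedding", "metric_learning")
        )
    # os.listdir raises FileNotFoundError if neither directory exists.
    # Sorting makes the order independent of the file system.
    return sorted(
        os.path.join(batch_dir, name)
        for name in os.listdir(batch_dir)
        if os.path.isfile(os.path.join(batch_dir, name))
    )
```

The fallback is only attempted when the stage name contains embedding and the expected directory is missing, so a current embedding directory always takes precedence over a legacy one.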