pipeline.utils.tools package
A package that contains general-purpose functions that could be used outside this repository.
pipeline.utils.tools.tarray module
A module that handles conversion between tensors, arrays and dataframes, on CPU (numpy and pandas) or GPU (cupy and cudf).
- pipeline.utils.tools.tarray.array_to_tensor(array, **kwargs)
  Turn a numpy or cupy array into a torch Tensor.
  - Return type: Tensor
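A minimal sketch of such a conversion, assuming it relies on `torch.as_tensor` (the source does not show the implementation, and `array_to_tensor_sketch` is a hypothetical name). It also assumes that `**kwargs` are forwarded to `torch.as_tensor`. That function avoids a copy when it can. For a cupy input, PyTorch reads the array's `__cuda_array_interface__`, and passing `device="cuda"` keeps the data on the GPU.

```python
import numpy as np
import torch


def array_to_tensor_sketch(array, **kwargs) -> torch.Tensor:
    """Hypothetical re-implementation of ``array_to_tensor``.

    Extra keyword arguments (e.g. ``dtype`` or ``device``) are forwarded to
    ``torch.as_tensor``, which shares memory with a CPU numpy array when the
    dtype and device are unchanged.
    """
    return torch.as_tensor(array, **kwargs)


arr = np.array([1.0, 2.0, 3.0])
t = array_to_tensor_sketch(arr)
t32 = array_to_tensor_sketch(arr, dtype=torch.float32)  # forces a cast (and a copy)
```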
 
- pipeline.utils.tools.tarray.count_occurences(tensor)
  Count, for each element of a tensor, the number of times it appears in that tensor.
  - Parameters: tensor (TypeVar(TensorOrArray, Tensor, ndarray, ndarray)) – Torch tensor or array
  - Return type: TypeVar(TensorOrArray, Tensor, ndarray, ndarray)
  - Returns: For each element in tensor, the number of times it appears in tensor.
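The page does not say how the counts are computed. The sketch below is a minimal numpy version, and `count_occurences_numpy` is a hypothetical name. `np.unique` with `return_inverse=True` and `return_counts=True` returns two things: the count of each distinct value, and for every element, the index of its distinct value. Indexing the counts with that inverse maps each count back to the original positions. cupy's `unique` accepts the same arguments, and `torch.unique` offers `return_inverse` and `return_counts` too.

```python
import numpy as np


def count_occurences_numpy(array: np.ndarray) -> np.ndarray:
    """Hypothetical numpy version: for each element, how often it appears in ``array``."""
    # ``inverse[i]`` is the position of the i-th element among the sorted unique values,
    # and ``counts[j]`` is how many times the j-th unique value appears.
    _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
    # Broadcast each unique value's count back to every position of the input.
    return counts[inverse].reshape(array.shape)


result = count_occurences_numpy(np.array([3, 1, 3, 2, 3, 1]))
```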
 
- pipeline.utils.tools.tarray.get_numpy_or_cupy(use_cuda)
  Get either numpy (CPU) or cupy (GPU), depending on whether CUDA is used.
- pipeline.utils.tools.tarray.get_pandas_or_cudf(use_cuda)
  Get either pandas (CPU) or cudf (GPU), depending on whether CUDA is used.
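Both getters let the calling code be written once against a shared API (`xp.arange`, `df_lib.DataFrame`, ...) and run on either device. A minimal sketch follows, assuming lazy imports through `importlib`. Whether the original imports lazily is not stated, and both function names are hypothetical. With lazy imports, a CPU-only machine never needs cupy or cudf installed.

```python
import importlib
from types import ModuleType


def get_numpy_or_cupy_sketch(use_cuda: bool) -> ModuleType:
    """Hypothetical version of ``get_numpy_or_cupy``: return the array module."""
    # cupy is only imported when CUDA is requested.
    return importlib.import_module("cupy" if use_cuda else "numpy")


def get_pandas_or_cudf_sketch(use_cuda: bool) -> ModuleType:
    """Hypothetical version of ``get_pandas_or_cudf``: return the dataframe module."""
    # cudf is only imported when CUDA is requested.
    return importlib.import_module("cudf" if use_cuda else "pandas")


# Device-agnostic code: flip ``use_cuda`` to run the same lines on GPU.
use_cuda = False
xp = get_numpy_or_cupy_sketch(use_cuda)
df_lib = get_pandas_or_cudf_sketch(use_cuda)
df = df_lib.DataFrame({"x": xp.arange(3)})
```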
- pipeline.utils.tools.tarray.get_use_cuda_from_dataframe(dataframe)
  Determine whether the given dataframe lives on GPU (cudf) rather than on CPU (pandas).
  - Return type: bool
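One possible way to make this check without importing cudf is sketched below. It is hypothetical, not the original implementation. It assumes that cudf classes are defined in modules under the `cudf` package (for example `cudf.core.dataframe`), so the top-level module of the object's type identifies the backend.

```python
import pandas as pd


def get_use_cuda_from_dataframe_sketch(dataframe) -> bool:
    """Hypothetical check: True for cudf objects, False for pandas ones."""
    # e.g. "pandas.core.frame" -> "pandas", "cudf.core.dataframe" -> "cudf" (assumed)
    return type(dataframe).__module__.split(".")[0] == "cudf"


on_gpu = get_use_cuda_from_dataframe_sketch(pd.DataFrame({"a": [1, 2]}))
```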
 
- pipeline.utils.tools.tarray.series_to_array(series)
  Turn a pandas/cudf dataframe or series into a numpy/cupy array.
  - Return type: TypeVar(Array, ndarray, ndarray)
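A minimal sketch of this conversion follows. The function name and the module-name backend check are hypothetical. pandas objects provide `to_numpy()`, and the `.values` attribute of a cudf Series or DataFrame is a cupy array. As a result, the data stays on the device it already occupies.

```python
import numpy as np
import pandas as pd


def series_to_array_sketch(series):
    """Hypothetical pandas/cudf -> numpy/cupy conversion."""
    if type(series).__module__.split(".")[0] == "cudf":
        # cudf: ``.values`` returns a cupy array, so the data stays on the GPU.
        return series.values
    # pandas Series or DataFrame: returns a numpy array.
    return series.to_numpy()


arr_1d = series_to_array_sketch(pd.Series([1, 2, 3]))
arr_2d = series_to_array_sketch(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
```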
 
- pipeline.utils.tools.tarray.series_to_tensor(series)
  Turn a pandas/cudf series into a torch tensor.
  - Return type: Tensor
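The sketch below chains the two steps described on this page: series to array, then array to tensor. It is hypothetical (the name and the backend check are assumptions) and only handles numeric series.

```python
import pandas as pd
import torch


def series_to_tensor_sketch(series) -> torch.Tensor:
    """Hypothetical pandas/cudf series -> torch tensor conversion (numeric dtypes only)."""
    if type(series).__module__.split(".")[0] == "cudf":
        # ``.values`` of a cudf Series is a cupy array; ``device="cuda"`` keeps it on the GPU.
        return torch.as_tensor(series.values, device="cuda")
    # pandas: go through numpy; ``torch.as_tensor`` avoids a copy when possible.
    return torch.as_tensor(series.to_numpy())


t = series_to_tensor_sketch(pd.Series([1.5, 2.5]))
```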
 
- pipeline.utils.tools.tarray.tensor_to_array(tensor)
  Turn a torch tensor into an array, on CPU (numpy) or GPU (cupy).
  - Return type: TypeVar(Array, ndarray, ndarray)
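A minimal sketch of the reverse direction follows. The name is hypothetical, and choosing the backend from `tensor.is_cuda` is an assumption. CUDA tensors expose `__cuda_array_interface__`, which `cupy.asarray` can consume. Because that interface is refused for tensors requiring gradients, and `.numpy()` is refused for them as well, the tensor is detached first.

```python
import numpy as np
import torch


def tensor_to_array_sketch(tensor: torch.Tensor):
    """Hypothetical tensor -> numpy (CPU) or cupy (GPU) conversion."""
    # ``detach`` drops autograd history; both conversions below reject
    # tensors that require gradients.
    tensor = tensor.detach()
    if tensor.is_cuda:
        import cupy  # only needed for GPU tensors

        # cupy reads the tensor's ``__cuda_array_interface__`` without copying.
        return cupy.asarray(tensor)
    # CPU: ``.numpy()`` shares memory with the tensor.
    return tensor.numpy()


a = tensor_to_array_sketch(torch.tensor([1, 2, 3]))
g = tensor_to_array_sketch(torch.tensor([1.0], requires_grad=True))
```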
 
pipeline.utils.tools.tfiles module
A module that defines common utilities for handling files and directories.
pipeline.utils.tools.tgroupby module
A module that defines functions to perform group-by operations efficiently using numba.
- pipeline.utils.tools.tgroupby.get_group_indices(sorted_array)
  Get the array of starting indices of each group of values in a sorted array.
  - Parameters: sorted_array (ndarray) – a sorted one-dimensional array
  - Returns: An array that contains the starting index of each group of values in sorted_array. The last element corresponds to the end index of the last group.
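The documented function uses numba. The sketch below swaps in a vectorised numpy equivalent, so it is not the original implementation, and its name is hypothetical. A new group starts wherever a value differs from its predecessor. The sketch assumes the final "end index" equals the array length (exclusive end). Returning `[0]` for an empty input is a choice made here.

```python
import numpy as np


def get_group_indices_numpy(sorted_array: np.ndarray) -> np.ndarray:
    """Vectorised numpy stand-in for the numba-based ``get_group_indices``."""
    n = sorted_array.shape[0]
    if n == 0:
        # Choice made for this sketch: no groups, only the end index 0.
        return np.zeros(1, dtype=np.int64)
    # Positions where the value changes mark the start of a new group.
    starts = np.flatnonzero(sorted_array[1:] != sorted_array[:-1]) + 1
    # The first group starts at 0; the trailing n is the end of the last group.
    return np.concatenate(([0], starts, [n])).astype(np.int64)


values = np.array([1, 1, 2, 5, 5, 5])
idx = get_group_indices_numpy(values)
# Consecutive pairs of indices delimit each group.
groups = [values[start:end] for start, end in zip(idx[:-1], idx[1:])]
```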
 
- pipeline.utils.tools.tgroupby.get_group_indices_from_group_lengths(array_group_lengths)
  Get the array of starting indices of each group, given the length of every group.
  - Parameters: array_group_lengths – array containing the length of every group after the group-by
  - Returns: An array that contains the starting index of each group. The last element corresponds to the end index of the last group.
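Given group lengths, each start index is the running total of the lengths before it, so a cumulative sum with a leading 0 is enough. The sketch below is a numpy stand-in for the numba function, with a hypothetical name. For the same grouping, it gives the same layout as `get_group_indices`: lengths `[2, 1, 3]` correspond to the sorted array `[1, 1, 2, 5, 5, 5]`.

```python
import numpy as np


def get_group_indices_from_group_lengths_numpy(array_group_lengths) -> np.ndarray:
    """Numpy stand-in: start index of each group, plus the end of the last one."""
    lengths = np.asarray(array_group_lengths, dtype=np.int64)
    # Start of group i = sum of the lengths of groups 0..i-1; the final
    # cumulative sum is the end index of the last group.
    return np.concatenate(([0], np.cumsum(lengths))).astype(np.int64)


idx = get_group_indices_from_group_lengths_numpy([2, 1, 3])
```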
 
pipeline.utils.tools.tstr module
A module that defines utilities to handle strings.
- class pipeline.utils.tools.tstr.MyFormatter(default='{{{0}}}')
  Bases: Formatter
  Class used to format only some of the named arguments of a string: fields without a value are left in place. Taken from https://stackoverflow.com/questions/17215400/format-string-unused-named-arguments
  Examples
  >>> fmt = MyFormatter()
  >>> fmt.format("{a}{b}", a="blabla")
  'blabla{b}'
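This page does not show the class body. A sketch in the spirit of the linked Stack Overflow answer overrides `string.Formatter.get_value` so that a missing named field is re-emitted through `default`. The name `PartialFormatter` is hypothetical. With the default `'{{{0}}}'`, calling `default.format("b")` yields the text `{b}`.

```python
import string


class PartialFormatter(string.Formatter):
    """Sketch of a formatter that leaves unknown named fields in place."""

    def __init__(self, default="{{{0}}}"):
        super().__init__()
        # ``default.format(key)`` turns a missing key ``b`` back into ``{b}``.
        self.default = default

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            # Named field: fall back to the placeholder text when no value is given.
            return kwargs.get(key, self.default.format(key))
        # Positional field: standard behaviour (IndexError if missing).
        return super().get_value(key, args, kwargs)


fmt = PartialFormatter()
partial = fmt.format("{a}{b}", a="blabla")
```

Note a limitation of this approach: a format spec on a missing field (e.g. `{b:>5}`) is applied to the string `{b}` rather than preserved verbatim.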
