pipeline.utils.tools package
A package of general-purpose functions that could be used outside this repository.
pipeline.utils.tools.tarray module
A module that handles conversion between tensors, arrays and dataframes, on CPU (numpy and pandas) or GPU (cupy and cudf).
- pipeline.utils.tools.tarray.array_to_tensor(array, **kwargs)
Turn a numpy or cupy array into a torch Tensor.
- Return type:
Tensor
- pipeline.utils.tools.tarray.count_occurences(tensor)
Count the number of times an element of a tensor appears in this tensor.
- Parameters:
tensor (TypeVar(TensorOrArray, Tensor, ndarray, ndarray)) – Torch tensor
- Return type:
TypeVar(TensorOrArray, Tensor, ndarray, ndarray)
- Returns:
For each element in tensor, the number of times it appears in this tensor.
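The per-element counting described above can be illustrated with a minimal numpy sketch. This is not the package's implementation: the name `count_occurrences_sketch` is hypothetical, and the sketch assumes a plain numpy array as input rather than a torch tensor or cupy array.

```python
import numpy as np


def count_occurrences_sketch(values: np.ndarray) -> np.ndarray:
    """For each element of `values`, return how many times it appears in `values`."""
    flat = np.ravel(values)
    # `inverse` maps every element to the index of its unique value;
    # `counts` holds the number of occurrences of each unique value.
    _, inverse, counts = np.unique(flat, return_inverse=True, return_counts=True)
    # Look up the count of each element and restore the input shape.
    return counts[inverse].reshape(np.shape(values))


example = np.array([3, 1, 3, 2, 1, 3])
print(count_occurrences_sketch(example))  # [3 2 3 1 2 3]
```

The output has the same shape as the input, so it can be used directly as a mask, for instance to keep only the elements that appear at least twice.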
- pipeline.utils.tools.tarray.get_numpy_or_cupy(use_cuda)
Get either numpy (CPU) or cupy (GPU) according to whether cuda is used or not.
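One way such a backend switch can be written is sketched below. The function name `get_array_module_sketch` is hypothetical and the lazy import is an assumption about how one might avoid requiring cupy on CPU-only machines. It is not taken from the package source.

```python
import importlib
from types import ModuleType


def get_array_module_sketch(use_cuda: bool) -> ModuleType:
    """Return the cupy module if `use_cuda` is true, otherwise the numpy module."""
    # Importing lazily means cupy only needs to be installed when it is requested.
    return importlib.import_module("cupy" if use_cuda else "numpy")


# Code written against `xp` runs unchanged on either backend,
# because cupy mirrors a large part of the numpy API.
xp = get_array_module_sketch(use_cuda=False)
total = xp.arange(4).sum()
```

The same pattern applies to choosing between pandas and cudf for dataframes.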
- pipeline.utils.tools.tarray.get_pandas_or_cudf(use_cuda)
Get either pandas (CPU) or cudf (GPU) according to whether cuda is used or not.
- pipeline.utils.tools.tarray.get_use_cuda_from_dataframe(dataframe)
Get whether a dataframe on GPU is being used.
- Return type:
bool
- pipeline.utils.tools.tarray.series_to_array(series)
Turn a Pandas/cudf dataframe or series into a numpy/cupy array.
- Return type:
TypeVar(Array, ndarray, ndarray)
- pipeline.utils.tools.tarray.series_to_tensor(series)
Turn a Pandas/cudf series into a Torch tensor.
- Return type:
Tensor
- pipeline.utils.tools.tarray.tensor_to_array(tensor)
Turn a tensor into an array on CPU or GPU.
- Return type:
TypeVar(Array, ndarray, ndarray)
pipeline.utils.tools.tfiles module
A module that defines common utilities for handling files and directories.
pipeline.utils.tools.tgroupby module
A module that defines functions to efficiently group by using numba.
- pipeline.utils.tools.tgroupby.get_group_indices(sorted_array)
Get the array of starting indices of each group of values in a sorted array.
- Parameters:
sorted_array (ndarray) – a sorted one-dimensional array
- Returns:
An array that contains the starting index of each group of values in the sorted_array. The last element corresponds to the end index of the last group.
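To make the returned layout concrete, here is a minimal sketch in plain numpy. The module itself relies on numba, so this is only an illustration of the described output. The name `group_indices_sketch` and the handling of an empty input are assumptions.

```python
import numpy as np


def group_indices_sketch(sorted_array: np.ndarray) -> np.ndarray:
    """Return the start index of each run of equal values, followed by the end index."""
    n = sorted_array.shape[0]
    if n == 0:
        # Assumption: an empty input has no group, only the final end index 0.
        return np.zeros(1, dtype=np.int64)
    # A new group starts wherever a value differs from its predecessor.
    starts = np.flatnonzero(sorted_array[1:] != sorted_array[:-1]) + 1
    return np.concatenate(([0], starts, [n])).astype(np.int64)


sorted_values = np.array([1, 1, 2, 3, 3, 3])
indices = group_indices_sketch(sorted_values)
print(indices)  # [0 2 3 6]

# Group i spans sorted_values[indices[i]:indices[i + 1]].
groups = [sorted_values[a:b] for a, b in zip(indices[:-1], indices[1:])]
```

Because the last element is the end index of the last group, consecutive pairs of the result delimit every group without a special case for the final one.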
- pipeline.utils.tools.tgroupby.get_group_indices_from_group_lengths(array_group_lengths)
Get the array of starting indices of each group of values from the array of group lengths.
- Parameters:
array_group_lengths – Array of the length of every group after grouping by
- Returns:
An array that contains the starting index of each group of values in the sorted array. The last element corresponds to the end index of the last group.
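Going from group lengths to group boundaries amounts to a cumulative sum with a leading zero. The sketch below shows this in plain numpy. It is an illustration rather than the numba implementation, and the name `group_indices_from_lengths_sketch` is hypothetical.

```python
import numpy as np


def group_indices_from_lengths_sketch(group_lengths) -> np.ndarray:
    """Return the start index of each group, followed by the end index of the last one."""
    indices = np.zeros(len(group_lengths) + 1, dtype=np.int64)
    # The start of group i is the total length of all groups before it.
    indices[1:] = np.cumsum(group_lengths)
    return indices


lengths = np.array([2, 1, 3])
print(group_indices_from_lengths_sketch(lengths))  # [0 2 3 6]
```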
pipeline.utils.tools.tstr module
A module that defines utilities to handle strings.
- class pipeline.utils.tools.tstr.MyFormatter(default='{{{0}}}')
Bases:
Formatter
Class used for formatting a string when only some of its arguments are supplied; placeholders without a value are left in place. Taken from https://stackoverflow.com/questions/17215400/format-string-unused-named-arguments
Examples
>>> fmt = MyFormatter()
>>> fmt.format("{a}{b}", a="blabla")
'blabla{b}'
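For readers curious how such partial formatting can work, the following is a minimal sketch built on the standard library's `string.Formatter`. It follows the general idea of overriding `get_value`, but it is not the package's source. The class name `PartialFormatterSketch` is hypothetical.

```python
from string import Formatter


class PartialFormatterSketch(Formatter):
    """Formatter that leaves named placeholders untouched when no value is given."""

    def __init__(self, default: str = "{{{0}}}"):
        super().__init__()
        # Template used to rebuild a missing placeholder, e.g. "b" -> "{b}".
        self.default = default

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            # Named field: fall back to the rebuilt placeholder if it is missing.
            return kwargs.get(key, self.default.format(key))
        # Positional field: keep the standard behaviour.
        return super().get_value(key, args, kwargs)


fmt = PartialFormatterSketch()
result = fmt.format("{a}{b}", a="blabla")
print(result)  # blabla{b}
```

Since the missing placeholder is written back verbatim, the result can be formatted a second time once the remaining values are known. Note that in this sketch a format spec or conversion attached to a missing field is not preserved.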