pipeline.utils.tools package#

A package of general-purpose functions that could be used outside this repository.

pipeline.utils.tools.tarray module#

A module that handles conversion between tensors, arrays and dataframes, on CPU (numpy and pandas) or GPU (cupy and cudf).

pipeline.utils.tools.tarray.array_to_tensor(array, **kwargs)[source]#

Turn a numpy or cupy array into a torch Tensor.

Return type:

Tensor

pipeline.utils.tools.tarray.count_occurences(tensor)[source]#

Count the number of times an element of a tensor appears in this tensor.

Parameters:

tensor (TypeVar(TensorOrArray, Tensor, ndarray, ndarray)) – Torch tensor

Return type:

TypeVar(TensorOrArray, Tensor, ndarray, ndarray)

Returns:

For each element in tensor, number of times it appears in this tensor.
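As a rough illustration of the per-element counting described above, here is a NumPy-only sketch. The name `count_occurrences_sketch` is hypothetical, and this is not necessarily how the package implements the function (which also accepts torch tensors).

```python
import numpy as np


def count_occurrences_sketch(array):
    """Hypothetical helper: for each element, count how often it appears."""
    array = np.asarray(array)
    flat = array.ravel()
    # `inverse` maps every element to the position of its value among the
    # sorted unique values; `counts` holds the count of each unique value.
    _, inverse, counts = np.unique(flat, return_inverse=True, return_counts=True)
    # Look up the count of each element's value, then restore the input shape.
    return counts[inverse].reshape(array.shape)


result = count_occurrences_sketch([3, 1, 3, 2, 1, 3])
print(result.tolist())  # [3, 2, 3, 1, 2, 3]
```

The output has the same shape as the input, so it can be used directly as a mask, for example `result == 1` to select elements that appear only once.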

pipeline.utils.tools.tarray.get_numpy_or_cupy(use_cuda)[source]#

Get either numpy (CPU) or cupy (GPU) according to whether cuda is used or not.

pipeline.utils.tools.tarray.get_pandas_or_cudf(use_cuda)[source]#

Get either pandas (CPU) or cudf (GPU) according to whether cuda is used or not.
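One plausible way to write such a backend selector is to import the library lazily by name, so that the GPU library is only required when it is actually requested. The helper `get_backend_sketch` below is hypothetical and only illustrates the idea; the package's own implementation may differ.

```python
import importlib


def get_backend_sketch(use_cuda, cpu_module, gpu_module):
    """Hypothetical helper: import the CPU or GPU library by name."""
    # Only the selected module is imported, so a CPU-only machine never
    # needs the GPU library to be installed.
    name = gpu_module if use_cuda else cpu_module
    return importlib.import_module(name)


# On CPU this returns the numpy module, which can then be used as usual.
xp = get_backend_sketch(False, "numpy", "cupy")
total = int(xp.arange(4).sum())
```

Because cupy mirrors much of the numpy interface (and cudf much of pandas), code written against the returned module can often run unchanged on either device.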

pipeline.utils.tools.tarray.get_use_cuda_from_dataframe(dataframe)[source]#

Return whether the given dataframe is a GPU (cudf) dataframe.

Return type:

bool

pipeline.utils.tools.tarray.series_to_array(series)[source]#

Turn a Pandas/cudf dataframe or series into a numpy/cupy array.

Return type:

TypeVar(Array, ndarray, ndarray)

pipeline.utils.tools.tarray.series_to_tensor(series)[source]#

Turn a Pandas/cudf series into a Torch tensor.

Return type:

Tensor

pipeline.utils.tools.tarray.tensor_to_array(tensor)[source]#

Turn a tensor into an array on CPU or GPU.

Return type:

TypeVar(Array, ndarray, ndarray)

pipeline.utils.tools.tarray.tensor_to_cupy_array(tensor)[source]#

Turn a tensor on GPU into a cupy array.

Return type:

cp.ndarray

Notes

Handles the corner case of a boolean tensor.

pipeline.utils.tools.tarray.to_dataframe(tensors, use_cuda, index=None)[source]#

Convert a dictionary of tensors / arrays into a dataframe on CPU or GPU.

Return type:

TypeVar(DataFrame, DataFrame, DataFrame)

pipeline.utils.tools.tfiles module#

A module that defines common utilities for handling files and directories.

pipeline.utils.tools.tfiles.delete_directory(dir)[source]#

Delete a directory. Does not raise an error if the directory does not exist.

pipeline.utils.tools.tfiles.is_directory_not_empty(dir)[source]#

Check whether a directory exists and is not empty.

Return type:

bool
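The behaviour of these two helpers can be approximated with the standard library alone. The sketch below uses hypothetical names (`delete_directory_sketch`, `is_directory_not_empty_sketch`) and is an assumption about how such utilities might be written, not the package's actual code.

```python
import shutil
import tempfile
from pathlib import Path


def delete_directory_sketch(path):
    """Hypothetical helper: remove a directory tree if it exists."""
    # ignore_errors=True turns a missing directory into a no-op.
    # Note that it also silences other failures, such as permission errors.
    shutil.rmtree(path, ignore_errors=True)


def is_directory_not_empty_sketch(path):
    """Hypothetical helper: True if `path` is a directory with an entry."""
    path = Path(path)
    # any() stops at the first entry, so large directories are not fully listed.
    return path.is_dir() and any(path.iterdir())


tmp = Path(tempfile.mkdtemp())
empty_before = is_directory_not_empty_sketch(tmp)
(tmp / "data.txt").write_text("hello")
filled = is_directory_not_empty_sketch(tmp)
delete_directory_sketch(tmp)
delete_directory_sketch(tmp)  # second call: directory is gone, no error
exists_after = is_directory_not_empty_sketch(tmp)
```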

pipeline.utils.tools.tgroupby module#

A module that defines functions to perform group-by operations efficiently using numba.

pipeline.utils.tools.tgroupby.get_group_indices(sorted_array)#

Get the array of starting indices of each group of values in a sorted array.

Parameters:

sorted_array (ndarray) – a sorted one-dimensional array

Returns:

An array that contains the starting index of each group of values in the sorted_array. The last element corresponds to the end index of the last group.
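To make the returned layout concrete, here is a vectorised NumPy sketch of the same idea. The name `get_group_indices_sketch` is hypothetical, and the sketch uses plain NumPy rather than numba, so it illustrates the output format rather than the package's implementation.

```python
import numpy as np


def get_group_indices_sketch(sorted_array):
    """Hypothetical helper: start index of each group, plus the final end."""
    sorted_array = np.asarray(sorted_array)
    # A new group starts wherever a value differs from its predecessor.
    starts = np.flatnonzero(sorted_array[1:] != sorted_array[:-1]) + 1
    # Prepend the first start (0) and append the end of the last group.
    return np.concatenate(([0], starts, [sorted_array.size]))


sorted_array = np.array([1, 1, 2, 2, 2, 5])
indices = get_group_indices_sketch(sorted_array)
print(indices.tolist())  # [0, 2, 5, 6]

# Group i spans sorted_array[indices[i]:indices[i + 1]].
groups = [sorted_array[a:b].tolist() for a, b in zip(indices[:-1], indices[1:])]
print(groups)  # [[1, 1], [2, 2, 2], [5]]
```

Keeping the end index of the last group as the final element means every group, including the last one, can be sliced with the same `indices[i]:indices[i + 1]` expression.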

pipeline.utils.tools.tgroupby.get_group_indices_from_group_lengths(array_group_lengths)#

Get the array of starting indices of each group from the lengths of the groups.

Parameters:

array_group_lengths – Array of the length of each group after grouping

Returns:

An array that contains the starting index of each group. The last element corresponds to the end index of the last group.
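Going from group lengths to group boundaries amounts to a cumulative sum with a leading zero. The sketch below shows this with plain NumPy; the name `get_group_indices_from_lengths_sketch` is hypothetical and the package's numba version may be written differently.

```python
import numpy as np


def get_group_indices_from_lengths_sketch(group_lengths):
    """Hypothetical helper: turn group lengths into group boundaries."""
    group_lengths = np.asarray(group_lengths)
    indices = np.zeros(group_lengths.size + 1, dtype=np.int64)
    # The running total of the lengths gives the end of each group, which is
    # also the start of the next one; the first group starts at 0.
    indices[1:] = np.cumsum(group_lengths)
    return indices


boundaries = get_group_indices_from_lengths_sketch([2, 3, 1])
print(boundaries.tolist())  # [0, 2, 5, 6]
```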

pipeline.utils.tools.tstr module#

A module that defines utilities to handle strings.

class pipeline.utils.tools.tstr.MyFormatter(default='{{{0}}}')[source]#

Bases: Formatter

Class used to format only some of the arguments of a string, leaving the placeholders of missing arguments in place. Taken from https://stackoverflow.com/questions/17215400/format-string-unused-named-arguments

Examples

>>> fmt=MyFormatter()
>>> fmt.format("{a}{b}", a="blabla")
'blabla{b}'

get_value(key, args, kwargs)[source]#

pipeline.utils.tools.tstr.partial_format(string, **kwargs)[source]#

Return type:

str
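The partial-formatting technique can be sketched with the standard `string.Formatter` class by overriding `get_value` so that missing named fields are re-emitted as placeholders. The names `PartialFormatterSketch` and `partial_format_sketch` are hypothetical, and this is an assumption about how `MyFormatter` and `partial_format` might work, based on their signatures and the doctest above.

```python
from string import Formatter


class PartialFormatterSketch(Formatter):
    """Hypothetical formatter that leaves unknown named fields untouched."""

    def __init__(self, default="{{{0}}}"):
        # "{{{0}}}".format("b") yields "{b}", i.e. the original placeholder.
        self.default = default

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            # Named field: use the given value, or re-emit the placeholder.
            return kwargs.get(key, self.default.format(key))
        # Positional field: keep the standard behaviour.
        return super().get_value(key, args, kwargs)


def partial_format_sketch(string, **kwargs):
    """Hypothetical wrapper: format only the fields passed in kwargs."""
    return PartialFormatterSketch().format(string, **kwargs)


template = partial_format_sketch("{a}{b}", a="blabla")
print(template)  # blabla{b}

# The remaining field can be filled in later with the built-in str.format.
print(template.format(b="!"))  # blabla!
```

One limitation of this simple approach is that a format specification or attribute access on a missing field (such as `{b:>5}` or `{b.x}`) is not preserved, since only the bare field name is re-emitted.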