corelay.io.hashing

A module that provides non-cryptographic hashing for Python objects. The hashes are computed from the inputs of operations performed by Processor instances. They identify those inputs independently of their memory address, so the same data can be recognized across subsequent runs of the same Pipeline.

Note

Please refer to the Funcache Project to see the original implementation of this module.

Module Attributes

Tensor

Either the PyTorch Tensor class or the placeholder TensorPlaceholder class if PyTorch is not installed.

Functions

ext_hash

Hashes the specified data.

Classes

HashPickler

A pickler for computing hashes.

Hasher

Hasher object with a write function for file-like updates.

SupportsConversionToNumPyArray

A protocol that defines an interface for objects that can be converted to an ndarray.

Tensor

Either the PyTorch Tensor class or the placeholder TensorPlaceholder class if PyTorch is not installed.

TensorPlaceholder

A placeholder class to stand in for PyTorch's Tensor class in case PyTorch is not installed.

class corelay.io.hashing.SupportsConversionToNumPyArray[source]

Bases: Protocol

A protocol that defines an interface for objects that can be converted to an ndarray.

numpy() → ndarray[Any, Any][source]

Converts the object to an ndarray.

Returns:

Returns an ndarray representation of the object.

Return type:

numpy.ndarray[Any, Any]

__init__(*args, **kwargs)[source]

class corelay.io.hashing.TensorPlaceholder[source]

Bases: object

A placeholder class to stand in for PyTorch’s Tensor class in case PyTorch is not installed.

numpy() → ndarray[Any, Any][source]

Converts the Tensor to an ndarray.

Raises:

NotImplementedError – This method should not be called, as this is a placeholder class.

Returns:

Returns an ndarray representation of the Tensor.

Return type:

numpy.ndarray[Any, Any]

corelay.io.hashing.Tensor[source]

Either the PyTorch Tensor class or the placeholder TensorPlaceholder class if PyTorch is not installed.

Note

This is used to check if an object that is to be pickled is a Tensor or not, because PyTorch Tensor objects are converted to ndarray before pickling.
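The fallback described above is a common optional-dependency pattern. The following is a minimal sketch, assuming the choice is made at import time with a try/except around the PyTorch import. The source does not show how the module actually does this, and the helper `is_tensor` is hypothetical.

```python
from typing import Any

try:
    # Use the real class when PyTorch is available.
    from torch import Tensor
except ImportError:
    class TensorPlaceholder:
        """Stand-in so isinstance checks against Tensor work without PyTorch."""

        def numpy(self) -> Any:
            raise NotImplementedError("Placeholder only; PyTorch is not installed.")

    Tensor = TensorPlaceholder


def is_tensor(obj: Any) -> bool:
    """Hypothetical helper: True only for Tensor instances."""
    return isinstance(obj, Tensor)
```

With this arrangement, code that checks `isinstance(obj, Tensor)` runs unchanged whether or not PyTorch is installed; without PyTorch the check is simply never true for ordinary objects.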

class corelay.io.hashing.Hasher[source]

Bases: MetroHash128

Hasher object with a write function for file-like updates.

write(data: bytes) → int[source]

Updates the hash by appending the specified data to the end of the input.

Note

This method was made to give the MetroHash128 object a file-like interface.

Parameters:

data (bytes) – The data to add to the hash. This can be any bytes-like object.

Returns:

Returns the number of bytes added to the hash.

Return type:

int
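The file-like contract of write can be illustrated without MetroHash. This is a minimal sketch, assuming composition instead of subclassing and using `hashlib.blake2b` purely as a stand-in for MetroHash128 (BLAKE2b is a different, cryptographic algorithm). The class name `FileLikeHasher` is hypothetical.

```python
import hashlib


class FileLikeHasher:
    """Hypothetical wrapper; BLAKE2b stands in for MetroHash128."""

    def __init__(self) -> None:
        self._hash = hashlib.blake2b(digest_size=16)  # 128-bit digest

    def write(self, data: bytes) -> int:
        # Append the data to the hash input and report how many bytes were
        # consumed, mirroring the contract of a binary file's write method.
        self._hash.update(data)
        return len(data)

    def hexdigest(self) -> str:
        return self._hash.hexdigest()


# Feeding the input in chunks gives the same digest as feeding it at once.
chunked = FileLikeHasher()
written = chunked.write(b"hello ") + chunked.write(b"world")

whole = FileLikeHasher()
whole.write(b"hello world")
```

Because the object exposes write, it can be handed to any API that expects a writable binary stream, which is what lets a pickler stream its output straight into the hash.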

class corelay.io.hashing.HashPickler[source]

Bases: Pickler

A pickler for computing hashes.

static numpy_id(array: ndarray[Any, Any]) → tuple[str, tuple[int, ...], bytes, bytes][source]

Computes a unique ID for an ndarray, which consists of the data type name, the array’s shape, and the values of the array decomposed into their respective mantissas and exponents, each as a bytes sequence.

Parameters:

array (numpy.ndarray[Any, Any]) – The ndarray to compute the ID for.

Returns:

Returns a tuple containing the data type name, the array’s shape, and the values of the array decomposed into their respective mantissas and exponents, each as a bytes sequence.

Return type:

tuple[str, tuple[int, …], bytes, bytes]
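The shape of the returned tuple can be illustrated with the standard library alone. This is a rough sketch, assuming a flat list of Python floats in place of an ndarray and `math.frexp` for the mantissa/exponent split. The function name, the fixed "float64" label, and the byte layout are assumptions and may differ from what numpy_id actually produces.

```python
import math
import struct


def float_sequence_id(values: list[float]) -> tuple[str, tuple[int, ...], bytes, bytes]:
    """Hypothetical stand-in for numpy_id that works on a flat list of floats."""
    # math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= abs(m) < 1,
    # or (0.0, 0) when x is zero.
    pairs = [math.frexp(value) for value in values]
    # Pack mantissas as little-endian doubles and exponents as 32-bit ints.
    mantissas = struct.pack(f"<{len(pairs)}d", *(m for m, _ in pairs))
    exponents = struct.pack(f"<{len(pairs)}i", *(e for _, e in pairs))
    return ("float64", (len(values),), mantissas, exponents)


dtype_name, shape, mantissas, exponents = float_sequence_id([8.0, 0.75])
```

Here 8.0 decomposes into mantissa 0.5 and exponent 4, and 0.75 into mantissa 0.75 and exponent 0.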

persistent_id(obj: Any) → tuple[str, tuple[int, ...], bytes, bytes] | None[source]

Computes a persistent ID for an object that is to be pickled. When the ID is not None, the pickle module serializes the ID in place of the object, which lets two objects be treated as “the same” regardless of their memory address. This is useful for caching and serialization purposes.

Parameters:

obj (Any) – The object to compute the persistent ID for.

Returns:

Returns a persistent ID for the object. If the object is an ndarray, it returns a tuple containing the data type name, the array’s shape, and the values of the array decomposed into their respective mantissas and exponents, each as a bytes sequence. If the object is a Tensor, it converts the tensor to an ndarray and computes a unique ID for the array. If the object is neither, it returns None.

Return type:

tuple[str, tuple[int, …], bytes, bytes] | None
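The persistent ID hook is part of the standard pickle module and can be demonstrated without NumPy. This is a minimal sketch, assuming `array.array` as a stand-in for ndarray and a simplified ID made of type code, length, and raw bytes. The names `ContentIdPickler` and `dumps_with_ids` are hypothetical.

```python
import io
import pickle
from array import array
from typing import Any


class ContentIdPickler(pickle.Pickler):
    """Hypothetical pickler; array.array stands in for ndarray."""

    def persistent_id(self, obj: Any) -> Any:
        if isinstance(obj, array):
            # Identify the array by type code, length, and raw contents.
            return (obj.typecode, len(obj), obj.tobytes())
        # None tells pickle to serialize the object in the usual way.
        return None


def dumps_with_ids(obj: Any) -> bytes:
    buffer = io.BytesIO()
    ContentIdPickler(buffer).dump(obj)
    return buffer.getvalue()


# Two distinct array objects with equal contents produce identical output.
first = dumps_with_ids({"weights": array("d", [1.0, 2.0])})
second = dumps_with_ids({"weights": array("d", [1.0, 2.0])})
```

Reading such a stream back would require an Unpickler with a matching persistent_load method; when the output is only fed to a hash, that step is never needed.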

corelay.io.hashing.ext_hash(data: Any) → str[source]

Hashes the specified data. It uses an extended, non-cryptographic hashing algorithm, which first pickles the specified object and then hashes the resulting bytes sequence using MetroHash.

Parameters:

data (Any) – The data to hash. This can be any Python object, including ndarray and Tensor.

Returns:

Returns the hash of the data as a hexadecimal str.

Return type:

str
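The pickle-then-hash approach described above can be sketched with the standard library. This is an illustration under assumptions, not the module's implementation: `hashlib.blake2b` replaces MetroHash, the plain `pickle.dumps` replaces HashPickler, and the name `sketch_ext_hash` is hypothetical.

```python
import hashlib
import pickle
from typing import Any


def sketch_ext_hash(data: Any) -> str:
    """Hypothetical stand-in for ext_hash; BLAKE2b replaces MetroHash."""
    # Fix the protocol so the pickled bytes do not depend on the interpreter default.
    payload = pickle.dumps(data, protocol=4)
    return hashlib.blake2b(payload, digest_size=16).hexdigest()


# Equal objects built separately live at different addresses but hash the same.
first = sketch_ext_hash({"a": [1, 2, 3], "b": "text"})
second = sketch_ext_hash(dict(a=list(range(1, 4)), b="".join(["te", "xt"])))
```

Plain pickling is sensitive to dict insertion order and to whether sub-objects are shared, so this sketch treats inputs that differ in those respects as different data.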