corelay.io.storage

A module that contains classes to read and write intermediate results of operations performed by instances of Processor to and from different file formats like HDF5 and Python pickles.

Module Attributes

RecursiveNumPyArrayTuple

A recursive tuple of ndarray, i.e., a tuple that contains ndarray or other tuples of ndarray, which themselves can contain other tuples of ndarray, and so on.

RecursiveHashTuple

A recursive tuple of strings, i.e., a tuple that contains strings or other tuples of strings, which themselves can contain other tuples of strings, and so on.

Classes

DataStorageBase

The abstract base class for key-value stores.

HDF5Storage

A storage that uses HDF5 files to store data.

HashedHDF5

A storage container, which can be used to store Processor data in HDF5 files.

NoStorage

A placeholder data storage class, which does not actually use persistent storage and raises exceptions when trying to read from it or write to it.

PickleStorage

Experimental pickle storage that uses the pickle module to store data.

Storable

An abstract class that defines the interface for storable objects, i.e., objects that have a read() and write() method.

StringInfo

A type for the type information that the h5py.check_string_dtype() function returns.

Exceptions

NoDataSource

An exception that is raised when no data source is available.

NoDataTarget

An exception that is raised when no data target is available.

class corelay.io.storage.Storable[source]

Bases: Protocol

An abstract class that defines the interface for storable objects, i.e., objects that have a read() and write() method.

read(data_in: Any, meta: Any) Any[source]

Retrieves the output data that was produced by the specified input data if it is available. The metadata can contain additional identifying information about the data.

Parameters:
  • data_in (Any) – The input data to retrieve the output data for.

  • meta (Any) – The metadata to retrieve the output data for, which can contain additional identifying information about the data.

Returns:

Returns the output data that was produced by the specified input data if it is available.

Return type:

Any

write(data_out: Any, data_in: Any, meta: Any) None[source]

Writes the specified output data to the storage container. The metadata can be used to store additional identifying information about the data.

Parameters:
  • data_out (Any) – The output data to write.

  • data_in (Any) – The input data that produced the output data.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data.

Return type:

None
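As a minimal sketch of the interface described above, the following defines a stand-in protocol with the same two method signatures and an in-memory class that satisfies it structurally. `StorableLike` and `DictStore` are hypothetical names introduced for illustration and are not part of corelay; identifying entries by `repr()` is likewise an assumption of the sketch.

```python
from typing import Any, Protocol


class StorableLike(Protocol):
    """Hypothetical stand-in mirroring the read()/write() signatures above."""

    def read(self, data_in: Any, meta: Any) -> Any: ...

    def write(self, data_out: Any, data_in: Any, meta: Any) -> None: ...


class DictStore:
    """Hypothetical in-memory store; it matches StorableLike without inheriting from it."""

    def __init__(self) -> None:
        self._items: dict[str, Any] = {}

    def _key(self, data_in: Any, meta: Any) -> str:
        # Assumption: repr() is sufficient to identify the inputs in this sketch.
        return repr((data_in, meta))

    def read(self, data_in: Any, meta: Any) -> Any:
        return self._items[self._key(data_in, meta)]

    def write(self, data_out: Any, data_in: Any, meta: Any) -> None:
        self._items[self._key(data_in, meta)] = data_out


def roundtrip(store: StorableLike) -> Any:
    # Any object with matching read() and write() methods is accepted here.
    store.write([1, 2, 3], data_in="input", meta={"step": 0})
    return store.read("input", {"step": 0})


result = roundtrip(DictStore())
```

Because the interface is a protocol, a class only needs matching method signatures to be usable wherever a storable object is expected.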

__init__(*args, **kwargs)[source]

exception corelay.io.storage.NoDataSource[source]

Bases: Exception

An exception that is raised when no data source is available.

__init__(message: str = 'No Data Source available.') None[source]

Initializes a new NoDataSource instance.

Parameters:

message (str) – The error message to be displayed. Defaults to “No Data Source available.”

Return type:

None

exception corelay.io.storage.NoDataTarget[source]

Bases: Exception

An exception that is raised when no data target is available.

__init__() None[source]

Initializes a new NoDataTarget instance.

Return type:

None

corelay.io.storage.RecursiveNumPyArrayTuple

A recursive tuple of ndarray, i.e., a tuple that contains ndarray or other tuples of ndarray, which themselves can contain other tuples of ndarray, and so on. This is used to represent a nested structure of ndarray.

alias of tuple[numpy.ndarray[typing.Any, typing.Any] | RecursiveNumPyArrayTuple, …]

corelay.io.storage.RecursiveHashTuple

A recursive tuple of strings, i.e., a tuple that contains strings or other tuples of strings, which themselves can contain other tuples of strings, and so on. This is used to represent a nested structure of hashes for the data that is stored in a RecursiveNumPyArrayTuple.

alias of tuple[str | RecursiveHashTuple, …]
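The relationship between the two aliases can be sketched as a recursive function that maps a nested tuple of leaves to an identically nested tuple of hash strings. This is only an illustration: `bytes` leaves stand in for ndarray so that the sketch has no third-party dependency, and the names `NestedBytes`, `NestedHashes`, and `hash_nested` as well as the choice of SHA-256 are assumptions, not corelay's implementation.

```python
import hashlib
from typing import Union

# Assumption: bytes leaves stand in for ndarray leaves in this sketch.
NestedBytes = Union[bytes, tuple["NestedBytes", ...]]
NestedHashes = Union[str, tuple["NestedHashes", ...]]


def hash_nested(value: NestedBytes) -> NestedHashes:
    """Returns hex digests arranged in the same nesting as the input."""
    if isinstance(value, tuple):
        # Recurse into tuples so that the output mirrors the input structure.
        return tuple(hash_nested(item) for item in value)
    return hashlib.sha256(value).hexdigest()


data = (b"a", (b"b", b"c"))
hashes = hash_nested(data)
```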

class corelay.io.storage.HashedHDF5[source]

Bases: object

A storage container, which can be used to store Processor data in HDF5 files. A hash of the input data that produced the stored data is stored alongside the data, so that the data can later be retrieved based on the input data.

__init__(h5group: Group) None[source]

Initializes a new HashedHDF5 instance.

Parameters:

h5group (h5py.Group) – The HDF5 group to store the data in.

Return type:

None

base: Group

The HDF5 group to store the data in.

read(data_in: Any, meta: Any) Any[source]

Retrieves the output data that was produced by the specified input data if it is available. The hash is computed from the input data and the metadata. The metadata can contain additional identifying information about the data.

Parameters:
  • data_in (Any) – The input data to retrieve the output data for.

  • meta (Any) – The metadata to retrieve the output data for, which can contain additional identifying information about the data.

Raises:

NoDataSource – The data source is not available.

Returns:

Returns the output data that was produced by the specified input data if it is available.

Return type:

Any

write(data_out: Any, data_in: Any, meta: Any) None[source]

Writes the specified output data to a hashed HDF5 group. The hash is computed from the input data and the metadata. The metadata can be used to store additional identifying information about the data.

Parameters:
  • data_out (Any) – The output data to write.

  • data_in (Any) – The input data that produced the output data. Is used to compute the hash.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Is used to compute the hash.

Raises:

TypeError – The data type of the input data is not supported.

Return type:

None
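The idea of addressing stored output by a hash of the input data and the metadata can be sketched with an in-memory dict in place of an HDF5 group. Everything here is hypothetical: `HashKeyedStore` and `MissingData` are illustrative names, and deriving the hash from `repr()` with SHA-256 is an assumption rather than the scheme HashedHDF5 actually uses.

```python
import hashlib
from typing import Any


class MissingData(LookupError):
    """Hypothetical stand-in for an exception such as NoDataSource."""


class HashKeyedStore:
    """Hypothetical in-memory analogue of a hash-addressed group."""

    def __init__(self) -> None:
        self.base: dict[str, Any] = {}

    @staticmethod
    def digest(data_in: Any, meta: Any) -> str:
        # Assumption: repr() yields a stable text form for the inputs used here.
        text = repr((data_in, meta)).encode("utf-8")
        return hashlib.sha256(text).hexdigest()

    def write(self, data_out: Any, data_in: Any, meta: Any) -> None:
        self.base[self.digest(data_in, meta)] = data_out

    def read(self, data_in: Any, meta: Any) -> Any:
        key = self.digest(data_in, meta)
        if key not in self.base:
            raise MissingData(f"no entry for hash {key[:8]}")
        return self.base[key]


store = HashKeyedStore()
store.write("embedding", data_in=(1.0, 2.0), meta="spectral")
found = store.read((1.0, 2.0), "spectral")

try:
    store.read((1.0, 2.0), "other")  # same input, different metadata
    missed = False
except MissingData:
    missed = True
```

Changing either the input data or the metadata yields a different hash, so the lookup fails instead of returning output that belongs to another configuration.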

class corelay.io.storage.DataStorageBase[source]

Bases: ABC, Plugboard

The abstract base class for key-value stores.

__init__(**kwargs: Any) None[source]

Initializes a new DataStorageBase instance.

Parameters:

**kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., Plugboard.

Return type:

None

io: Any

The storage object to read and write data to. Defaults to None.

abstractmethod read(data_in: Any = None, meta: Any = None) Any[source]

Reads the output data that was produced by the specified input data, if it is available. The metadata can contain additional identifying information about the data.

Parameters:
  • data_in (Any) – Input data that produced the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – The data source is not available.

Returns:

Returns the data that was produced by the specified input data if it is available.

Return type:

Any

abstractmethod write(data_out: Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the storage. The hash is computed from the input data and the metadata. The metadata can be used to store additional identifying information about the data.

Parameters:
  • data_out (Any) – The output data to write.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Return type:

None

abstractmethod exists() bool[source]

Checks if the data exists.

Returns:

Returns True if the data exists and False otherwise.

Return type:

bool

abstractmethod keys() KeysView[str][source]

Retrieves the keys of the data stored in the storage container.

Returns:

Returns a view of the keys of the IO file object.

Return type:

collections.abc.KeysView[str]

__enter__() DataStorageBase[source]

Opens the IO object and returns the instance. This is used to implement the context manager protocol, which allows the use of the with statement to automatically close the IO object when it is no longer needed. This is useful for ensuring that the IO object is properly closed and resources are released when the context manager exits.

Returns:

Returns this instance of the DataStorageBase class.

Return type:

DataStorageBase

__exit__(exception_type: type[Exception] | None, exception: Exception, traceback: TracebackType | None) None[source]

Closes the IO object. This is used to implement the context manager protocol, which allows the use of the with statement to automatically close the IO object when it is no longer needed. This is useful for ensuring that the IO object is properly closed and resources are released when the context manager exits.

Parameters:
  • exception_type (type[Exception] | None) – When the context manager exits due to an exception, this is the type of the exception that was raised, otherwise it is None.

  • exception (Exception) – When the context manager exits due to an exception, this is the exception that was raised, otherwise it is None.

  • traceback (types.TracebackType | None) – When the context manager exits due to an exception, this is the traceback of the exception that was raised, otherwise it is None.

Return type:

None
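The context manager behavior described for `__enter__` and `__exit__` can be illustrated with a small stand-in class. `ClosingStore` and its `is_open` flag are hypothetical and only mimic opening and closing an IO object; the sketch shows that the store is closed even when the body of the with statement raises.

```python
from types import TracebackType
from typing import Optional


class ClosingStore:
    """Hypothetical store that tracks whether its IO object is open."""

    def __init__(self) -> None:
        self.is_open = False

    def __enter__(self) -> "ClosingStore":
        self.is_open = True  # stands in for opening the IO object
        return self

    def __exit__(
        self,
        exception_type: Optional[type[BaseException]],
        exception: Optional[BaseException],
        traceback: Optional[TracebackType],
    ) -> None:
        # Returning None lets any exception from the with block propagate.
        self.close()

    def close(self) -> None:
        self.is_open = False


store = ClosingStore()
try:
    with store as opened:
        was_open = opened.is_open
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

closed_after_error = not store.is_open
```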

__getitem__(key: str) Any[source]

Gets the data for a given key.

Parameters:

key (str) – The key to get the data for.

Raises:

TypeError – The key is not a str.

Returns:

Returns the data for the given key.

Return type:

Any

__setitem__(key: str, value: Any) None[source]

Sets the data for a given key.

Parameters:
  • key (str) – The key to set the data for.

  • value (Any) – The data to set for the given key.

Raises:

TypeError – The key is not a str.

Return type:

None

__bool__() bool[source]

Converts the data storage object to a bool value. This is used to determine if the data storage object is actually backed by a store.

Returns:

Returns True if the data storage object is backed by a store and False otherwise.

Return type:

bool

close() None[source]

Closes the opened IO file object.

Return type:

None

at(**kwargs: Any) DataStorageBase[source]

Returns a copy of the instance in which the keyword arguments are set as attributes.

Parameters:

**kwargs (Any) – The keyword arguments, which are added as attributes of the class.

Raises:

TypeError – One or more of the names in the keyword arguments are not valid attribute names.

Returns:

Returns a copy of the instance in which the keyword arguments are set as attributes. This makes it possible to create a new instance with new or updated attributes without modifying the original instance.

Return type:

DataStorageBase
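A copy-and-update method of this kind can be sketched with `copy.copy` and `setattr`. The `Settings` class below is a hypothetical stand-in; whether corelay copies shallowly or validates attribute names in this way is not stated here, so both are assumptions of the sketch.

```python
import copy
from typing import Any


class Settings:
    """Hypothetical class with an at()-style copy-and-update method."""

    def __init__(self, data_key: str = "data") -> None:
        self.data_key = data_key

    def at(self, **kwargs: Any) -> "Settings":
        clone = copy.copy(self)  # shallow copy; the original is left untouched
        for name, value in kwargs.items():
            setattr(clone, name, value)
        return clone


original = Settings()
derived = original.at(data_key="embedding")
```

The original instance keeps its `data_key`, while the derived copy carries the updated value.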

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

class corelay.io.storage.NoStorage[source]

Bases: DataStorageBase

A placeholder data storage class, which does not actually use persistent storage and raises exceptions when trying to read from it or write to it.
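Because a placeholder storage converts to False, calling code can test the storage object before reading from it. The following sketch uses hypothetical stand-ins (`NoSource`, `NullStore`, `load_or_compute`) rather than the corelay classes, and only illustrates the pattern.

```python
class NoSource(Exception):
    """Hypothetical stand-in for NoDataSource."""


class NullStore:
    """Hypothetical placeholder store that is always falsy."""

    def __bool__(self) -> bool:
        return False

    def read(self, data_in=None, meta=None):
        raise NoSource("No Data Source available.")


def load_or_compute(store, compute):
    # Only consult the store when it is actually backed by something.
    if store:
        return store.read()
    return compute()


value = load_or_compute(NullStore(), lambda: 42)
```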

__bool__() bool[source]

Converts the data storage object to a bool value. This is used to determine if the data storage object is actually backed by a store.

Returns:

Returns False since this is a placeholder data storage class and does not actually use persistent storage.

Return type:

bool

read(data_in: Any = None, meta: Any = None) Any[source]

Reads the output data that was produced by the specified input data, if it is available. The metadata can contain additional identifying information about the data.

Parameters:
  • data_in (Any) – Input data that produced the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Returns:

Returns the data that was produced by the specified input data if it is available.

Return type:

Any

write(data_out: Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the storage. The metadata can be used to store additional identifying information about the data.

Parameters:
  • data_out (Any) – The output data to write.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Raises:

NoDataTarget – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Return type:

None

exists() bool[source]

Returns True if data exists.

Raises:

NoDataSource – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Returns:

Returns False since this is a placeholder data storage class and does not actually use persistent storage.

Return type:

bool

keys() KeysView[str][source]

Retrieves the keys of the data stored in the storage container.

Raises:

NoDataSource – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Returns:

Never returns, since this is a placeholder data storage class that does not actually use persistent storage and raises an exception.

Return type:

collections.abc.KeysView[str]

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

io: Any

The storage object to read and write data to. Defaults to None.

class corelay.io.storage.PickleStorage[source]

Bases: DataStorageBase

Experimental pickle storage that uses the pickle module to store data.

data_key: Annotated[str, Param]

The key of the data that is read from the pickle file or written to the pickle file.

__init__(path: str | Path, mode: str = 'r', data_key: str | None = None, **kwargs: Any) None[source]

Initializes a new PickleStorage instance.

Parameters:
  • path (str | pathlib.Path) – The path to the pickle file where the data is to be read from or written to.

  • mode (str) – The mode in which the file is opened. This can be either “w” for write mode, “r” for read mode or “a” for append mode. In write mode, the file is created if it does not exist and the existing file is overwritten. In read mode, the file must already exist and the data is read from the file. In append mode, the file is created if it does not exist and the data is appended to the end of the file. Defaults to “r”.

  • data_key (str | None) – The key of the data that is read from the pickle file or written to the pickle file. Defaults to None.

  • **kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., DataStorageBase.

Raises:

ValueError – The mode is not “w”, “r”, or “a”.

Return type:

None

io: IO[Any]

The file object to read data from and write data to. This is a binary file object that is used to store the pickled data.

data: dict[str, Any]

A dict that stores the data that is read from or written to the file. The keys of the dict are the keys of the data that is stored in the file, and the values are the data that is stored in the file. The dict is used to cache the data that is read from the file, so that it does not need to be read from the file again if it is already cached.

read(data_in: Any = None, meta: Any = None) Any[source]

Retrieves the data for a given data key.

Parameters:
  • data_in (Any) – Input data that produced the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – The data source for the given data key does not exist.

Returns:

Returns the data for the given data key.

Return type:

Any

write(data_out: Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the pickle file using the given data key as: {'data': data_out, 'key': self.data_key}.

Parameters:
  • data_out (Any) – The data to write to the pickle file.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Return type:

None
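The record format and append mode described above can be sketched with the standard pickle module alone. The helper names `append_record` and `load_records` are hypothetical, and reading records back by calling `pickle.load` until `EOFError` is one possible approach, not necessarily the one PickleStorage takes.

```python
import os
import pickle
import tempfile
from typing import Any


def append_record(path: str, key: str, data: Any) -> None:
    # Each call appends one independent pickle to the end of the file.
    with open(path, "ab") as file:
        pickle.dump({"data": data, "key": key}, file)


def load_records(path: str) -> dict[str, Any]:
    records: dict[str, Any] = {}
    with open(path, "rb") as file:
        while True:
            try:
                record = pickle.load(file)
            except EOFError:
                break  # no more pickles in the file
            records[record["key"]] = record["data"]
    return records


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "results.pkl")
    append_record(path, "first", [1, 2])
    append_record(path, "second", {"x": 3})
    loaded = load_records(path)
```

Collecting the records into a dict keyed by the data key mirrors the role of a cache of everything read from the file.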

exists() bool[source]

Determines if the data key exists in the data.

Returns:

Returns True if the data key exists and False otherwise.

Return type:

bool

keys() KeysView[str][source]

Retrieves the keys of the data stored in the pickle file.

Returns:

Returns a view of keys of the data that is stored in the file.

Return type:

collections.abc.KeysView[str]

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

class corelay.io.storage.StringInfo[source]

Bases: NamedTuple

A type for the type information that the h5py.check_string_dtype() function returns. This class is, unfortunately, not exported by the h5py module, so we have to define it ourselves to gain type safety.

static __new__(_cls, encoding: str, length: int)

Create new instance of StringInfo(encoding, length)

__repr__()[source]

Return a nicely formatted representation string

encoding: str

The encoding of the str, e.g., “utf-8” or “ascii”.

length: int

The length of the str.

class corelay.io.storage.HDF5Storage[source]

Bases: DataStorageBase

A storage that uses HDF5 files to store data.

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

data_key: Annotated[str, Param]

The key of the data that is read from the HDF5 file or written to the HDF5 file.

__init__(path: str | Path, mode: str = 'r', data_key: str | None = None, **kwargs: Any) None[source]

Initializes a new HDF5Storage instance.

Parameters:
  • path (str | pathlib.Path) – The path to the HDF5 file where the data is to be read from or written to.

  • mode (str) – The mode to open the HDF5 file in. This can be either “w” for write mode, “r” for read mode or “a” for append mode. In write mode, the file is created if it does not exist and existing files will be overwritten. In read mode, the file must already exist and the data is read from the file. In append mode, the file is created if it does not exist and the data is appended to the end of the file if the file already exists. Defaults to “r”.

  • data_key (str | None) – The key of the data that is read from the HDF5 file or written to the HDF5 file. Defaults to None.

  • **kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., DataStorageBase.

Raises:

ValueError – The mode is not “w”, “r”, or “a”.

Return type:

None

io: File

The HDF5 file object to read data from and write data to.

read(data_in: Any = None, meta: Any = None) Any[source]

Retrieves the data for a given data key.

Parameters:
  • data_in (Any) – Input data that produced the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – The data source for the given data key does not exist.

Returns:

Returns the data for the given data key.

Return type:

Any

write(data_out: dict[str, Any] | tuple[Any, ...] | Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the HDF5 file. If the output data is a dict, then the output data is stored in an HDF5 group with the name given by the data key. The key-value pairs of the dict will be stored in this HDF5 group with the keys of the dict used as the names of the datasets and the values of the dict used as the data for the datasets. If the output data is a tuple, then the output data is stored in an HDF5 group with the name given by the data key. The values of the tuple will be stored as datasets in this HDF5 group, with the indices of the tuple used as the names of the datasets and the values of the tuple used as the data for the datasets. If the output data is neither a dict nor a tuple, then the output data is stored in an HDF5 dataset with the name given by the data key and the output data used as the data for the dataset.

Parameters:
  • data_out (dict[str, Any] | tuple[Any, ...] | Any) – The data to write to the HDF5 file. This can be a dict, a tuple, or any value that can be written to an HDF5 file (i.e., basic data types like int, float, bool, or str, or an ndarray). A dict or a tuple is stored as an HDF5 group with the name given by the data key, with one dataset per entry that is named after the dict key or the tuple index, respectively. Any other value is stored in a single HDF5 dataset with the name given by the data key.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Return type:

None
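The three cases described for `data_out` can be summarized by a small pure-Python function that computes which paths would hold which values, without touching an actual HDF5 file. `plan_layout` is a hypothetical helper for illustration only; the "group/name" path strings and the zero-based index names for tuples are assumptions of the sketch.

```python
from typing import Any


def plan_layout(data_key: str, data_out: Any) -> dict[str, Any]:
    """Maps hypothetical HDF5 paths to the values that would be stored there."""
    if isinstance(data_out, dict):
        # One group named data_key; one dataset per dict entry.
        return {f"{data_key}/{name}": value for name, value in data_out.items()}
    if isinstance(data_out, tuple):
        # One group named data_key; datasets are named by tuple index.
        return {f"{data_key}/{index}": value for index, value in enumerate(data_out)}
    # Anything else becomes a single dataset named data_key.
    return {data_key: data_out}


from_dict = plan_layout("result", {"labels": [0, 1], "score": 0.5})
from_tuple = plan_layout("result", ("a", "b"))
from_scalar = plan_layout("result", 7)
```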

exists() bool[source]

Checks if the data key exists in the HDF5 file.

Returns:

Returns True if the data key exists and False otherwise.

Return type:

bool

keys() KeysView[str][source]

Retrieves the keys of the data stored in the HDF5 file.

Returns:

Returns a view of keys of the data in the HDF5 file.

Return type:

collections.abc.KeysView[str]