corelay.io

A sub-package containing IO-related modules for storing intermediate results of operations performed by instances of Processor. This can be used in a Pipeline to prevent the re-computation of intermediate results needed multiple times, or as a cache for subsequent runs of the same pipeline.

exception corelay.io.NoDataSource[source]

Bases: Exception

An exception, which is raised when no data source is available.

__init__(message: str = 'No Data Source available.') None[source]

Initializes a new NoDataSource instance.

Parameters:

message (str) – The error message to be displayed. Defaults to “No Data Source available.”

Return type:

None

exception corelay.io.NoDataTarget[source]

Bases: Exception

An exception, which is raised when no data target is available.

__init__() None[source]

Initializes a new NoDataTarget instance.

Return type:

None

class corelay.io.DataStorageBase[source]

Bases: ABC, Plugboard

The abstract base class for key-value stores.

__bool__() bool[source]

Converts the data storage object to a bool value. This is used to determine if the data storage object is actually backed by a store.

Returns:

Returns True if the data storage object is backed by a store and False otherwise.

Return type:

bool

__enter__() DataStorageBase[source]

Opens the IO object and returns the instance. This implements the context manager protocol, so that the with statement can be used to automatically close the IO object and release its resources when the block exits.

Returns:

Returns this instance of the DataStorageBase class.

Return type:

DataStorageBase

__exit__(exception_type: type[Exception] | None, exception: Exception, traceback: TracebackType | None) None[source]

Closes the IO object. This implements the context manager protocol, so that the with statement can be used to automatically close the IO object and release its resources when the block exits.

Parameters:
  • exception_type (type[Exception] | None) – When the context manager exits due to an exception, this is the type of the exception that was raised, otherwise it is None.

  • exception (Exception) – When the context manager exits due to an exception, this is the exception that was raised, otherwise it is None.

  • traceback (types.TracebackType | None) – When the context manager exits due to an exception, this is the traceback of the exception that was raised, otherwise it is None.

Return type:

None
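
The context manager protocol described for __enter__ and __exit__ can be sketched with a stand-in class. DemoStorage below is hypothetical and not part of corelay; it only assumes, as described above, that __enter__ returns the instance and that __exit__ closes the underlying IO object.

```python
from __future__ import annotations

import io
from types import TracebackType
from typing import Any


class DemoStorage:
    """Hypothetical stand-in that mimics the described protocol; not the corelay class."""

    def __init__(self) -> None:
        self.io: Any = io.BytesIO()  # stands in for the underlying IO object

    def __enter__(self) -> DemoStorage:
        return self

    def __exit__(
        self,
        exception_type: type[BaseException] | None,
        exception: BaseException | None,
        traceback: TracebackType | None,
    ) -> None:
        self.close()  # runs on normal exit and when an exception propagates

    def close(self) -> None:
        self.io.close()


with DemoStorage() as storage:
    storage.io.write(b"intermediate result")
    was_open_inside = not storage.io.closed

was_closed_after = storage.io.closed
```

Because __exit__ returns None, an exception raised inside the with block is not suppressed; the IO object is closed and the exception continues to propagate.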

__getitem__(key: str) Any[source]

Gets the data for a given key.

Parameters:

key (str) – The key to get the data for.

Raises:

TypeError – The key is not a str.

Returns:

Returns the data for the given key.

Return type:

Any

__init__(**kwargs: Any) None[source]

Initializes a new DataStorageBase instance.

Parameters:

**kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., Plugboard.

Return type:

None

__setitem__(key: str, value: Any) None[source]

Sets the data for a given key.

Parameters:
  • key (str) – The key to set the data for.

  • value (Any) – The data to set for the given key.

Raises:

TypeError – The key is not a str.

Return type:

None

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

at(**kwargs: Any) DataStorageBase[source]

Returns a copy of the instance in which the given keyword arguments have been set as attributes.

Parameters:

**kwargs (Any) – The keyword arguments, which are added as attributes of the class.

Raises:

TypeError – One or more of the names in the keyword arguments are not valid attribute names.

Returns:

Returns a copy of the instance in which the keyword arguments have been set as attributes. This makes it possible to create a new instance with new or updated attributes without modifying the original instance.

Return type:

DataStorageBase
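
The copy-and-update behavior of at() can be illustrated with a minimal sketch. DemoStorage is a hypothetical stand-in, and the use of a shallow copy is an assumption made for the illustration; the documentation above only states that a copy with the given attributes is returned.

```python
import copy
from typing import Any


class DemoStorage:
    """Hypothetical stand-in; only illustrates the copy-and-update idea of at()."""

    def __init__(self, data_key: str = "data") -> None:
        self.data_key = data_key

    def at(self, **kwargs: Any) -> "DemoStorage":
        clone = copy.copy(self)  # assumed: shallow copy, the original stays untouched
        for name, value in kwargs.items():
            setattr(clone, name, value)
        return clone


base = DemoStorage(data_key="embedding")
derived = base.at(data_key="clustering")
```

Here base keeps the data key "embedding", while derived is a separate object that points at "clustering".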

close() None[source]

Closes the opened IO file object.

Return type:

None

abstractmethod exists() bool[source]

Checks if the data exists.

Returns:

Returns True if the data exists and False otherwise.

Return type:

bool

abstractmethod keys() KeysView[str][source]

Retrieves the keys of the data stored in the storage container.

Returns:

Returns a view of the keys of the data stored in the storage container.

Return type:

collections.abc.KeysView[str]

abstractmethod read(data_in: Any = None, meta: Any = None) Any[source]

Reads the output data that was produced by the specified input data, if it is available. The metadata can contain additional identifying information about the data.

Parameters:
  • data_in (Any) – Input data that produces the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – The data source is not available.

Returns:

Returns the data that was produced by the specified input data if it is available.

Return type:

Any

abstractmethod write(data_out: Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the storage. The hash is computed from the input data and the metadata. The metadata can be used to store additional identifying information about the data.

Parameters:
  • data_out (Any) – The output data to write.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Return type:

None
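
The idea of addressing stored output by a hash of the input data and the metadata can be sketched as follows. This is an illustration under stated assumptions, not the corelay implementation: cache_key and DictCache are hypothetical names, and SHA-1 over a pickled tuple stands in for whatever the hashing module actually computes.

```python
import hashlib
import pickle
from typing import Any


def cache_key(data_in: Any, meta: Any) -> str:
    """Derives a key from input data and metadata (assumed scheme, not corelay's)."""
    payload = pickle.dumps((data_in, meta))
    return hashlib.sha1(payload).hexdigest()


class DictCache:
    """Hypothetical in-memory store following the read/write signatures above."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def write(self, data_out: Any, data_in: Any = None, meta: Any = None) -> None:
        self._store[cache_key(data_in, meta)] = data_out

    def read(self, data_in: Any = None, meta: Any = None) -> Any:
        key = cache_key(data_in, meta)
        if key not in self._store:
            # The documented read() raises NoDataSource; KeyError is used here
            # only to keep the sketch free of corelay imports.
            raise KeyError(key)
        return self._store[key]


cache = DictCache()
cache.write([2, 4, 6], data_in=[1, 2, 3], meta={"step": "double"})
result = cache.read(data_in=[1, 2, 3], meta={"step": "double"})
```

A read with the same input data and metadata finds the stored output, while different metadata yields a different key and therefore a miss.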

io: Any

The storage object to read and write data to. Defaults to None.

class corelay.io.NoStorage[source]

Bases: DataStorageBase

A placeholder data storage class, which does not actually use persistent storage and raises exceptions when trying to read from it or write to it.

__bool__() bool[source]

Converts the data storage object to a bool value. This is used to determine if the data storage object is actually backed by a store.

Returns:

Returns False since this is a placeholder data storage class and does not actually use persistent storage.

Return type:

bool

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

exists() bool[source]

Checks if the data exists.

Raises:

NoDataSource – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Returns:

Returns False since this is a placeholder data storage class and does not actually use persistent storage.

Return type:

bool

keys() KeysView[str][source]

Retrieves the keys of the data stored in the storage container.

Raises:

NoDataSource – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Returns:

Never returns, since this is a placeholder data storage class that does not actually use persistent storage and raises an exception.

Return type:

collections.abc.KeysView[str]

read(data_in: Any = None, meta: Any = None) Any[source]

Reads the output data that was produced by the specified input data, if it is available. The metadata can contain additional identifying information about the data.

Parameters:
  • data_in (Any) – Input data that produces the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Returns:

Returns the data that was produced by the specified input data if it is available.

Return type:

Any

write(data_out: Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the storage. The metadata can be used to store additional identifying information about the data.

Parameters:
  • data_out (Any) – The output data to write.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Raises:

NoDataTarget – This is a placeholder data storage class and does not actually use persistent storage and therefore always raises this exception.

Return type:

None
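
A placeholder storage that evaluates to False lets calling code skip the cache without special cases. The sketch below uses stand-in classes that mirror the documented behavior (falsy, read raises NoDataSource, write raises NoDataTarget); PlaceholderStorage and compute_or_load are hypothetical names, and the doubling is a dummy computation.

```python
from typing import Any


class NoDataSource(Exception):
    """Stand-in for corelay.io.NoDataSource."""


class NoDataTarget(Exception):
    """Stand-in for corelay.io.NoDataTarget."""


class PlaceholderStorage:
    """Hypothetical stand-in mirroring the documented NoStorage behavior."""

    def __bool__(self) -> bool:
        return False  # not backed by a store

    def read(self, data_in: Any = None, meta: Any = None) -> Any:
        raise NoDataSource()

    def write(self, data_out: Any, data_in: Any = None, meta: Any = None) -> None:
        raise NoDataTarget()


def compute_or_load(storage: Any, data_in: int) -> int:
    """Reads a cached result if the storage is backed by a store, else computes it."""
    if storage:
        try:
            return storage.read(data_in=data_in)
        except NoDataSource:
            pass  # nothing cached yet, fall through to the computation
    return data_in * 2  # dummy computation for the illustration


result = compute_or_load(PlaceholderStorage(), 21)
```

Since the placeholder is falsy, compute_or_load never calls read() on it and goes straight to the computation.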

class corelay.io.PickleStorage[source]

Bases: DataStorageBase

Experimental pickle storage that uses the pickle module to store data.

__init__(path: str | Path, mode: str = 'r', data_key: str | None = None, **kwargs: Any) None[source]

Initializes a new PickleStorage instance.

Parameters:
  • path (str | pathlib.Path) – The path to the pickle file where the data is to be read from or written to.

  • mode (str) – The mode in which the file is opened. This can be either “w” for write mode, “r” for read mode or “a” for append mode. In write mode, the file is created if it does not exist and the existing file is overwritten. In read mode, the file must already exist and the data is read from the file. In append mode, the file is created if it does not exist and the data is appended to the end of the file. Defaults to “r”.

  • data_key (str | None) – The key of the data that is read from the pickle file or written to the pickle file. Defaults to None.

  • **kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., DataStorageBase.

Raises:

ValueError – The mode is not “w”, “r”, or “a”.

Return type:

None

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

data_key: Annotated[str, Param]

The key of the data that is read from the pickle file or written to the pickle file.

exists() bool[source]

Determines if the data key exists in the data.

Returns:

Returns True if the data key exists and False otherwise.

Return type:

bool

keys() KeysView[str][source]

Retrieves the keys of the data stored in the pickle file.

Returns:

Returns a view of keys of the data that is stored in the file.

Return type:

collections.abc.KeysView[str]

read(data_in: Any = None, meta: Any = None) Any[source]

Retrieves the data for a given data key.

Parameters:
  • data_in (Any) – Input data that produced the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – The data source for the given data key does not exist.

Returns:

Returns the data for the given data key.

Return type:

Any

write(data_out: Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the pickle file, together with the given data key, as a record of the form {'data': data_out, 'key': self.data_key}.

Parameters:
  • data_out (Any) – The data to write to the pickle file.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Return type:

None
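
The record layout described for write() can be tried out with the standard pickle module. The sketch below is not the PickleStorage implementation: append_record and load_records are hypothetical helpers, an in-memory buffer replaces the file, and reading records back in a loop until the end of the stream is an assumption about how appended records are consumed.

```python
import io
import pickle
from typing import Any


def append_record(stream: io.BytesIO, data_key: str, data_out: Any) -> None:
    """Appends one record in the documented {'data': ..., 'key': ...} layout."""
    pickle.dump({"data": data_out, "key": data_key}, stream)


def load_records(stream: io.BytesIO) -> dict[str, Any]:
    """Reads consecutive pickled records until end of stream, keyed by 'key'."""
    stream.seek(0)
    data: dict[str, Any] = {}
    while True:
        try:
            record = pickle.load(stream)
        except EOFError:
            break  # no more records in the stream
        data[record["key"]] = record["data"]
    return data


buffer = io.BytesIO()  # in-memory stand-in for a file opened in binary mode
append_record(buffer, "embedding", [0.1, 0.2])
append_record(buffer, "labels", [0, 1])
cached = load_records(buffer)
```

The resulting dict plays the role of the data attribute: its keys are the data keys found in the stream and its values are the stored outputs.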

io: IO[Any]

The file object to read data from and write data to. This is a binary file object that is used to store the pickled data.

data: dict[str, Any]

A dict that stores the data that is read from or written to the file. The keys of the dict are the keys of the data that is stored in the file, and the values are the data that is stored in the file. The dict is used to cache the data that is read from the file, so that it does not need to be read from the file again if it is already cached.

class corelay.io.HDF5Storage[source]

Bases: DataStorageBase

A storage that uses HDF5 files to store data.

__init__(path: str | Path, mode: str = 'r', data_key: str | None = None, **kwargs: Any) None[source]

Initializes a new HDF5Storage instance.

Parameters:
  • path (str | pathlib.Path) – The path to the HDF5 file where the data is to be read from or written to.

  • mode (str) – The mode to open the HDF5 file in. This can be either “w” for write mode, “r” for read mode or “a” for append mode. In write mode, the file is created if it does not exist and existing files will be overwritten. In read mode, the file must already exist and the data is read from the file. In append mode, the file is created if it does not exist and the data is appended to the end of the file if the file already exists. Defaults to “r”.

  • data_key (str | None) – The key of the data that is read from the HDF5 file or written to the HDF5 file. Defaults to None.

  • **kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., DataStorageBase.

Raises:

ValueError – The mode is not “w”, “r”, or “a”.

Return type:

None

__tracked__: OrderedDict[str, Any]

A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed with double underscores.

data_key: Annotated[str, Param]

The key of the data that is read from the HDF5 file or written to the HDF5 file.

exists() bool[source]

Checks if the data key exists in the HDF5 file.

Returns:

Returns True if the data key exists and False otherwise.

Return type:

bool

keys() KeysView[str][source]

Retrieves the keys of the data stored in the HDF5 file.

Returns:

Returns a view of keys of the data in the HDF5 file.

Return type:

collections.abc.KeysView[str]

read(data_in: Any = None, meta: Any = None) Any[source]

Retrieves the data for a given data key.

Parameters:
  • data_in (Any) – Input data that produced the data that is to be read. Defaults to None.

  • meta (Any) – Metadata that contains additional identifying information about the data that is to be read. Defaults to None.

Raises:

NoDataSource – The data source for the given data key does not exist.

Returns:

Returns the data for the given data key.

Return type:

Any

write(data_out: dict[str, Any] | tuple[Any, ...] | Any, data_in: Any = None, meta: Any = None) None[source]

Writes the specified output data to the HDF5 file. The layout depends on the type of the output data. If it is a dict, an HDF5 group named after the data key is created, and each key-value pair is stored in this group as a dataset, with the key as the dataset name and the value as the dataset content. If it is a tuple, an HDF5 group named after the data key is created, and each element is stored in this group as a dataset, with the index of the element as the dataset name. Otherwise, the output data is stored directly in an HDF5 dataset named after the data key.

Parameters:
  • data_out (dict[str, Any] | tuple[Any, ...] | Any) – The data to write to the HDF5 file. This can be a dict, a tuple, or any value that can be written to an HDF5 file (i.e., basic data types like int, float, bool, or str, or an ndarray). A dict or a tuple is stored as an HDF5 group named after the data key, with one dataset per dict key or tuple index; any other value is stored in a single HDF5 dataset named after the data key.

  • data_in (Any) – The input data that produced the output data. Defaults to None.

  • meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.

Return type:

None
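
The three layouts described for write() can be made concrete without touching a file. The function below is a hypothetical helper, not part of corelay and not using h5py: it only computes which dataset path each value would end up under, assuming the usual HDF5 convention of "/" as the separator between group and dataset names.

```python
from typing import Any


def hdf5_layout(data_key: str, data_out: Any) -> dict[str, Any]:
    """Maps output data to the dataset paths described above (no file is written)."""
    if isinstance(data_out, dict):
        # One dataset per dict key, inside a group named after the data key.
        return {f"{data_key}/{name}": value for name, value in data_out.items()}
    if isinstance(data_out, tuple):
        # One dataset per element, named by the index of the element.
        return {f"{data_key}/{index}": value for index, value in enumerate(data_out)}
    # Anything else becomes a single dataset named after the data key.
    return {data_key: data_out}


dict_layout = hdf5_layout("result", {"eigval": [1.0], "eigvec": [[1.0]]})
tuple_layout = hdf5_layout("result", ([1.0], [[1.0]]))
scalar_layout = hdf5_layout("result", 3.5)
```

With the data key "result", the dict yields the paths result/eigval and result/eigvec, the tuple yields result/0 and result/1, and the scalar is stored under result itself.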

io: File

The HDF5 file object to read data from and write data to.

Modules

hashing

A module that contains non-cryptographic hashing functionality for Python objects.

storage

A module that contains classes to read and write intermediate results of operations performed by instances of Processor to and from different file formats like HDF5 and Python pickles.