corelay.io.storage
A module that contains classes to read and write intermediate results of operations performed by instances of
Processor to and from different file formats like HDF5 and Python pickles.
Module Attributes
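RecursiveNumPyArrayTuple | A recursive tuple of ndarray, i.e., a tuple that contains ndarray or other tuples of ndarray, which themselves can contain other tuples of ndarray, and so on. |
RecursiveHashTuple | A recursive tuple of strings, i.e., a tuple that contains strings or other tuples of strings, which themselves can contain other tuples of strings, and so on. |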
Classes
DataStorageBase | The abstract base class for key-value stores. |
HDF5Storage | A storage that uses HDF5 files to store data. |
HashedHDF5 | A storage container, which can be used to store Processor data in HDF5 files. |
NoStorage | A placeholder data storage class, which does not actually use persistent storage and raises exceptions when trying to read from it or write to it. |
PickleStorage | Experimental pickle storage that uses the pickle module to store data. |
Storable | An abstract class that defines the interface for storable objects, i.e., objects that have a read() and write() method. |
StringInfo | A type for the type information that the h5py.check_string_dtype() function returns. |
Exceptions
NoDataSource | An exception, which is raised when no data source is available. |
NoDataTarget | An exception, which is raised when no data target is available. |
- class corelay.io.storage.Storable
Bases: Protocol
An abstract class that defines the interface for storable objects, i.e., objects that have a read() and write() method.
- read(data_in: Any, meta: Any) → Any
Retrieves the output data that was produced by the specified input data if it is available. The metadata can contain additional identifying information about the data.
- Parameters:
data_in (Any) – The input data that produced the output data.
meta (Any) – The metadata, which can contain additional identifying information about the data.
- Returns:
Returns the output data that was produced by the specified input data if it is available.
- Return type:
Any
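Because Storable is a Protocol, an object can satisfy it structurally, simply by providing matching methods, without inheriting from it. The following is a minimal sketch of that idea using only the standard library. StorableLike and InMemoryStore are hypothetical stand-ins written for illustration and are not part of corelay; the write() signature is assumed to match the one used by the other storage classes on this page.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class StorableLike(Protocol):
    """Hypothetical stand-in for a Storable-style protocol (not corelay's class)."""

    def read(self, data_in: Any, meta: Any) -> Any: ...

    def write(self, data_out: Any, data_in: Any, meta: Any) -> None: ...


class InMemoryStore:
    """Satisfies the protocol structurally; note that it does not inherit from it."""

    def __init__(self) -> None:
        self._cache: dict = {}

    def read(self, data_in: Any, meta: Any) -> Any:
        # repr() serves as a simple illustrative key; a KeyError is raised
        # if nothing was stored for this combination of input and metadata.
        return self._cache[(repr(data_in), repr(meta))]

    def write(self, data_out: Any, data_in: Any, meta: Any) -> None:
        self._cache[(repr(data_in), repr(meta))] = data_out


store = InMemoryStore()
store.write(data_out=[2, 4, 6], data_in=[1, 2, 3], meta={"op": "double"})
result = store.read(data_in=[1, 2, 3], meta={"op": "double"})

# With runtime_checkable, isinstance() only checks that the methods exist,
# not that their signatures match.
is_storable = isinstance(store, StorableLike)
```

Code that accepts a StorableLike argument can then be handed any object that provides both methods, which is the main reason to model the interface as a protocol rather than as a base class.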
- exception corelay.io.storage.NoDataSource
Bases: Exception
An exception, which is raised when no data source is available.
- exception corelay.io.storage.NoDataTarget
Bases: Exception
An exception, which is raised when no data target is available.
- __init__() → None
Initializes a new NoDataTarget instance.
- Return type:
None
- corelay.io.storage.RecursiveNumPyArrayTuple
A recursive tuple of ndarray, i.e., a tuple that contains ndarray or other tuples of ndarray, which themselves can contain other tuples of ndarray, and so on. This is used to represent a nested structure of ndarray.
alias of tuple[numpy.ndarray[typing.Any, typing.Any] | RecursiveNumPyArrayTuple, …]
- corelay.io.storage.RecursiveHashTuple
A recursive tuple of strings, i.e., a tuple that contains strings or other tuples of strings, which themselves can contain other tuples of strings, and so on. This is used to represent a nested structure of hashes for the data that is stored in a RecursiveNumPyArrayTuple.
alias of tuple[str | RecursiveHashTuple, …]
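To make the shape of such a recursive tuple concrete, the sketch below defines an alias with the same structure as RecursiveHashTuple and walks it recursively. HashTree and flatten are hypothetical names chosen for this illustration; they are not part of corelay, and the strings are placeholders rather than real hashes.

```python
from typing import Iterator, Tuple, Union

# Hypothetical alias mirroring the documented shape: a tuple whose items are
# either strings or further tuples of the same kind.
HashTree = Tuple[Union[str, "HashTree"], ...]


def flatten(tree: HashTree) -> Iterator[str]:
    """Yield every string in the nested tuple, depth-first and left to right."""
    for item in tree:
        if isinstance(item, tuple):
            # A nested tuple has the same structure, so recurse into it.
            yield from flatten(item)
        else:
            yield item


nested: HashTree = ("a1", ("b2", ("c3", "d4")), "e5")
flat = list(flatten(nested))
```

The same recursive pattern applies to a nested tuple of arrays: every element is either a leaf or another tuple that is handled by the same function.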
- class corelay.io.storage.HashedHDF5
Bases: object
A storage container, which can be used to store Processor data in HDF5 files. A hash of the input data that produced the stored data is stored alongside the data, so that the data can later be retrieved based on the input data.
- __init__(h5group: Group) → None
Initializes a new HashedHDF5 instance.
- Parameters:
h5group (h5py.Group) – The HDF5 group to store the data in.
- Return type:
None
- read(data_in: Any, meta: Any) → Any
Retrieves the output data that was produced by the specified input data if it is available. The hash is computed from the input data and the metadata. The metadata can contain additional identifying information about the data.
- Parameters:
data_in (Any) – The input data that produced the output data.
meta (Any) – The metadata, which can contain additional identifying information about the data.
- Raises:
NoDataSource – The data source is not available.
- Returns:
Returns the output data that was produced by the specified input data if it is available.
- Return type:
Any
- write(data_out: Any, data_in: Any, meta: Any) → None
Writes the specified output data to a hashed HDF5 group. The hash is computed from the input data and the metadata. The metadata can be used to store additional identifying information about the data.
- Parameters:
data_out (Any) – The output data to write.
data_in (Any) – The input data that produced the output data.
meta (Any) – The metadata, which can be used to store additional identifying information about the data.
- Raises:
TypeError – The data type of the input data is not supported.
- Return type:
None
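The lookup scheme described for HashedHDF5 (store the output under a hash of the input data and metadata, then recompute the same hash to find it again) can be sketched without HDF5. In the sketch below a plain dict stands in for the h5py.Group, and SHA-256 over raw input bytes is an assumption made purely for illustration; this page does not say which hash function or key layout HashedHDF5 uses. HashedDict is a hypothetical class, and LookupError stands in for NoDataSource.

```python
import hashlib
from typing import Any


class HashedDict:
    """Hypothetical hash-keyed store; a dict plays the role of the HDF5 group."""

    def __init__(self) -> None:
        self._group: dict = {}

    @staticmethod
    def _key(data_in: bytes, meta: str) -> str:
        # Assumption: the key is a SHA-256 digest over the input bytes
        # followed by the UTF-8 encoded metadata.
        hasher = hashlib.sha256()
        hasher.update(data_in)
        hasher.update(meta.encode("utf-8"))
        return hasher.hexdigest()

    def write(self, data_out: Any, data_in: bytes, meta: str) -> None:
        if not isinstance(data_in, bytes):
            raise TypeError("this sketch only supports bytes as input data")
        self._group[self._key(data_in, meta)] = data_out

    def read(self, data_in: bytes, meta: str) -> Any:
        key = self._key(data_in, meta)
        if key not in self._group:
            raise LookupError("no output stored for this input and metadata")
        return self._group[key]


store = HashedDict()
store.write(data_out=[1.0, 4.0, 9.0], data_in=b"\x01\x02\x03", meta="square")
cached = store.read(data_in=b"\x01\x02\x03", meta="square")
```

Because the metadata is part of the hash, the same input stored under different metadata yields separate entries, which is what allows results of different operations on the same input to coexist.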
- class corelay.io.storage.DataStorageBase
The abstract base class for key-value stores.
- __init__(**kwargs: Any) → None
Initializes a new DataStorageBase instance.
- abstractmethod read(data_in: Any = None, meta: Any = None) → Any
Reads the output data that was produced by the specified input data, if it is available. The metadata can contain additional identifying information about the data.
- Parameters:
data_in (Any) – The input data that produced the output data. Defaults to None.
meta (Any) – The metadata, which can contain additional identifying information about the data. Defaults to None.
- Raises:
NoDataSource – The data source is not available.
- Returns:
Returns the data that was produced by the specified input data if it is available.
- Return type:
Any
- abstractmethod write(data_out: Any, data_in: Any = None, meta: Any = None) → None
Writes the specified output data to the storage. The hash is computed from the input data and the metadata. The metadata can be used to store additional identifying information about the data.
- abstractmethod keys() → KeysView[str]
Retrieves the keys of the data stored in the storage container.
- Returns:
Returns a view of the keys of the IO file object.
- Return type:
KeysView[str]
- __enter__() → DataStorageBase
Opens the IO object and returns the instance. This implements the context manager protocol, which allows the with statement to close the IO object automatically when it is no longer needed, so that resources are released when the context manager exits.
- Returns:
Returns this instance of the DataStorageBase class.
- Return type:
DataStorageBase
- __exit__(exception_type: type[Exception] | None, exception: Exception, traceback: TracebackType | None) → None
Closes the IO object. This implements the context manager protocol, which allows the with statement to close the IO object automatically when it is no longer needed, so that resources are released when the context manager exits.
- Parameters:
exception_type (type[Exception] | None) – When the context manager exits due to an exception, this is the type of the exception that was raised, otherwise it is None.
exception (Exception) – When the context manager exits due to an exception, this is the exception that was raised, otherwise it is None.
traceback (types.TracebackType | None) – When the context manager exits due to an exception, this is the traceback of the exception that was raised, otherwise it is None.
- Return type:
None
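The context manager protocol implemented by __enter__() and __exit__() can be illustrated with a small stand-alone class. TrackedResource is a hypothetical example and not a corelay class; it only records whether it is open, so that the effect of the with statement is visible.

```python
class TrackedResource:
    """Hypothetical resource that records whether it is currently open."""

    def __init__(self) -> None:
        self.is_open = False

    def __enter__(self) -> "TrackedResource":
        # Called when the with block is entered; the return value is bound
        # to the name after "as".
        self.is_open = True
        return self

    def __exit__(self, exception_type, exception, traceback) -> None:
        # Called when the with block is left, both on normal exit and when
        # an exception propagates. Returning None does not suppress it.
        self.is_open = False


resource = TrackedResource()
with resource as opened:
    open_inside = opened.is_open
open_after = resource.is_open

try:
    with resource:
        raise RuntimeError("simulated failure inside the with block")
except RuntimeError:
    closed_after_error = not resource.is_open
```

The second with block shows why the protocol is useful for storage objects: the cleanup in __exit__() still runs although the body raised an exception.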
- __bool__() → bool
Converts the data storage object to a bool value. This is used to determine if the data storage object is actually backed by a store.
- at(**kwargs: Any) → DataStorageBase
Returns a copy of the instance in which the keyword arguments have been set as attributes.
- Parameters:
**kwargs (Any) – The keyword arguments, which are set as attributes of the copy.
- Raises:
TypeError – One or more of the names in the keyword arguments are not valid attribute names.
- Returns:
Returns a copy of the instance in which the keyword arguments have been set as attributes. This makes it possible to create a new instance with new or updated attributes without modifying the original instance.
- Return type:
DataStorageBase
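The copy-with-overrides behavior described for at() can be sketched with copy.copy() and setattr(). Settings is a hypothetical class used only for illustration; how DataStorageBase copies itself is not shown on this page, and treating "valid attribute name" as "valid Python identifier" is an assumption of this sketch.

```python
import copy
from typing import Any


class Settings:
    """Hypothetical object with an at() method in the style described above."""

    def __init__(self, path: str, data_key: str = "data") -> None:
        self.path = path
        self.data_key = data_key

    def at(self, **kwargs: Any) -> "Settings":
        # Assumption: a name is valid if it is a Python identifier.
        for name in kwargs:
            if not name.isidentifier():
                raise TypeError(f"{name!r} is not a valid attribute name")
        # Work on a shallow copy so that the original stays unchanged.
        clone = copy.copy(self)
        for name, value in kwargs.items():
            setattr(clone, name, value)
        return clone


base = Settings(path="results.h5")
other = base.at(data_key="embedding")
```

Here base keeps data_key == "data" while other carries the override, so one configured object can serve as a template for several keys.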
- __tracked__: OrderedDict[str, Any]
A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed in double underscores.
- class corelay.io.storage.NoStorage
Bases: DataStorageBase
A placeholder data storage class, which does not actually use persistent storage and raises exceptions when trying to read from it or write to it.
- __bool__() → bool
Converts the data storage object to a bool value. This is used to determine if the data storage object is actually backed by a store.
- read(data_in: Any = None, meta: Any = None) → Any
Reads the output data that was produced by the specified input data, if it is available. The metadata can contain additional identifying information about the data.
- Parameters:
data_in (Any) – The input data that produced the output data. Defaults to None.
meta (Any) – The metadata, which can contain additional identifying information about the data. Defaults to None.
- Raises:
NoDataSource – This is a placeholder data storage class, which does not actually use persistent storage and therefore always raises this exception.
- Returns:
Returns the data that was produced by the specified input data if it is available.
- Return type:
Any
- write(data_out: Any, data_in: Any = None, meta: Any = None) → None
Writes the specified output data to the storage. The metadata can be used to store additional identifying information about the data.
- Parameters:
data_out (Any) – The output data to write.
data_in (Any) – The input data that produced the output data. Defaults to None.
meta (Any) – The metadata, which can be used to store additional identifying information about the data. Defaults to None.
- Raises:
NoDataTarget – This is a placeholder data storage class, which does not actually use persistent storage and therefore always raises this exception.
- Return type:
None
- exists() → bool
Returns True if data exists.
- Raises:
NoDataSource – This is a placeholder data storage class, which does not actually use persistent storage and therefore always raises this exception.
- Returns:
Returns False since this is a placeholder data storage class and does not actually use persistent storage.
- Return type:
bool
- keys() → KeysView[str]
Retrieves the keys of the data stored in the storage container.
- Raises:
NoDataSource – This is a placeholder data storage class, which does not actually use persistent storage and therefore always raises this exception.
- Returns:
Never returns, since this is a placeholder data storage class that does not actually use persistent storage and always raises an exception.
- Return type:
KeysView[str]
- __tracked__: OrderedDict[str, Any]
A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed in double underscores.
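A placeholder storage of this kind lets calling code test for a backing store with a plain truth check instead of comparing against None. The sketch below illustrates the pattern with hypothetical stand-ins: NullStorage, NoDataSourceError, NoDataTargetError, and load_or_compute are written for this example and are not corelay's classes. That the placeholder evaluates to False is an assumption derived from the description of __bool__().

```python
from typing import Any


class NoDataSourceError(Exception):
    """Hypothetical stand-in for an exception signalling a missing data source."""


class NoDataTargetError(Exception):
    """Hypothetical stand-in for an exception signalling a missing data target."""


class NullStorage:
    """Hypothetical placeholder storage that is never backed by a store."""

    def __bool__(self) -> bool:
        # Assumption: a placeholder reports that there is no backing store.
        return False

    def read(self, data_in: Any = None, meta: Any = None) -> Any:
        raise NoDataSourceError("placeholder storage has nothing to read from")

    def write(self, data_out: Any, data_in: Any = None, meta: Any = None) -> None:
        raise NoDataTargetError("placeholder storage has nothing to write to")


def load_or_compute(storage: Any, value: int) -> int:
    """Use a stored result when a backing store exists, otherwise compute it."""
    if storage:
        return storage.read(data_in=value)
    return value * 2


computed = load_or_compute(NullStorage(), 21)
```

With this pattern the caller never needs a special case for "no storage configured": the placeholder is always a valid object, and misuse surfaces as a specific exception.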
- class corelay.io.storage.PickleStorage
Bases: DataStorageBase
Experimental pickle storage that uses the pickle module to store data.
- data_key: Annotated[str, Param]
The key of the data that is read from the pickle file or written to the pickle file.
- __init__(path: str | Path, mode: str = 'r', data_key: str | None = None, **kwargs: Any) → None
Initializes a new PickleStorage instance.
- Parameters:
path (str | pathlib.Path) – The path to the pickle file where the data is to be read from or written to.
mode (str) – The mode in which the file is opened. This can be either “w” for write mode, “r” for read mode or “a” for append mode. In write mode, the file is created if it does not exist and an existing file is overwritten. In read mode, the file must already exist and the data is read from the file. In append mode, the file is created if it does not exist and the data is appended to the end of the file. Defaults to “r”.
data_key (str | None) – The key of the data that is read from the pickle file or written to the pickle file. Defaults to None.
**kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., DataStorageBase.
- Raises:
ValueError – The mode is not “w”, “r”, or “a”.
- Return type:
None
- io: IO[Any]
The file object to read data from and write data to. This is a binary file object that is used to store the pickled data.
- data: dict[str, Any]
A dict that caches the data that is read from or written to the file. Its keys are the keys of the data stored in the file, and its values are the corresponding data. Data that is already cached does not need to be read from the file again.
- read(data_in: Any = None, meta: Any = None) → Any
Retrieves the data for a given data key.
- Parameters:
data_in (Any) – The input data that produced the output data. Defaults to None.
meta (Any) – The metadata, which can contain additional identifying information about the data. Defaults to None.
- Raises:
NoDataSource – The data source for the given data key does not exist.
- Returns:
Returns the data for the given data key.
- Return type:
Any
- write(data_out: Any, data_in: Any = None, meta: Any = None) → None
Writes the specified output data to the pickle file under the given data key, as a record of the form {'data': data_out, 'key': self.data_key}.
- keys() → KeysView[str]
Retrieves the keys of the data stored in the pickle file.
- Returns:
Returns a view of the keys of the data that is stored in the file.
- Return type:
KeysView[str]
- __tracked__: OrderedDict[str, Any]
A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed in double underscores.
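The record format {'data': data_out, 'key': self.data_key} lends itself to appending several pickled records to one binary file and rebuilding a key-to-data cache when reading. The following is a minimal sketch of that technique using only the standard library. The helper names append_record and load_records are hypothetical, an in-memory io.BytesIO replaces the file, and the way PickleStorage itself reads its file is an assumption, not something this page confirms.

```python
import io
import pickle
from typing import Any, BinaryIO


def append_record(stream: BinaryIO, data_key: str, data_out: Any) -> None:
    """Append one pickled record in the documented {'data': ..., 'key': ...} form."""
    pickle.dump({"data": data_out, "key": data_key}, stream)


def load_records(stream: BinaryIO) -> dict:
    """Read consecutive pickled records until the end of the stream is reached."""
    stream.seek(0)
    cache: dict = {}
    while True:
        try:
            record = pickle.load(stream)
        except EOFError:
            # pickle.load() raises EOFError once no further record follows.
            break
        cache[record["key"]] = record["data"]
    return cache


buffer = io.BytesIO()
append_record(buffer, "embedding", [0.1, 0.2])
append_record(buffer, "labels", [1, 0])
append_record(buffer, "labels", [1, 1])  # a later record with the same key wins
cache = load_records(buffer)
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.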
- class corelay.io.storage.StringInfo
Bases: NamedTuple
A type for the type information that the h5py.check_string_dtype() function returns. This class is, unfortunately, not exported by the h5py module, so we have to define it ourselves to gain type safety.
- class corelay.io.storage.HDF5Storage
Bases: DataStorageBase
A storage that uses HDF5 files to store data.
- __tracked__: OrderedDict[str, Any]
A collections.OrderedDict with all public class attributes, i.e., all class attributes not enclosed in double underscores.
- data_key: Annotated[str, Param]
The key of the data that is read from the HDF5 file or written to the HDF5 file.
- __init__(path: str | Path, mode: str = 'r', data_key: str | None = None, **kwargs: Any) → None
Initializes a new HDF5Storage instance.
- Parameters:
path (str | pathlib.Path) – The path to the HDF5 file where the data is to be read from or written to.
mode (str) – The mode in which the HDF5 file is opened. This can be either “w” for write mode, “r” for read mode or “a” for append mode. In write mode, the file is created if it does not exist and an existing file is overwritten. In read mode, the file must already exist and the data is read from the file. In append mode, the file is created if it does not exist and the data is appended to the end of the file if the file already exists. Defaults to “r”.
data_key (str | None) – The key of the data that is read from the HDF5 file or written to the HDF5 file. Defaults to None.
**kwargs (Any) – Keyword arguments that are passed to the constructor of the class one step up in the class hierarchy, i.e., DataStorageBase.
- Raises:
ValueError – The mode is not “w”, “r”, or “a”.
- Return type:
None
- read(data_in: Any = None, meta: Any = None) → Any
Retrieves the data for a given data key.
- Parameters:
data_in (Any) – The input data that produced the output data. Defaults to None.
meta (Any) – The metadata, which can contain additional identifying information about the data. Defaults to None.
- Raises:
NoDataSource – The data source for the given data key does not exist.
- Returns:
Returns the data for the given data key.
- Return type:
Any
- write(data_out: dict[str, Any] | tuple[Any, ...] | Any, data_in: Any = None, meta: Any = None) → None
Writes the specified output data to the HDF5 file. How the data is stored depends on its type. If the output data is a dict, it is stored in an HDF5 group with the name given by the data key; each key-value pair becomes a dataset in this group, with the key as the name of the dataset and the value as its data. If the output data is a tuple, it is likewise stored in an HDF5 group with the name given by the data key; each element becomes a dataset in this group, with the index of the element as the name of the dataset and the element as its data. If the output data is neither a dict nor a tuple, it is stored in a single HDF5 dataset with the name given by the data key.
- Parameters:
data_out (dict[str, Any] | tuple[Any, ...] | Any) – The data to write to the HDF5 file. This can be a dict, a tuple, or any value that can be written to an HDF5 file (i.e., basic data types like int, float, bool, or str, or an ndarray). A dict or a tuple is stored as an HDF5 group and any other value as a single HDF5 dataset, in each case with the name given by the data key.
data_in (Any) – The input data that produced the output data. Defaults to None.
meta (Any) – The metadata that can be used to store additional identifying information about the data. Defaults to None.
- Return type:
None
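The three cases that write() distinguishes can be summarized as a mapping from dataset paths to values. The sketch below does not use h5py. It only computes the layout that the description implies, with plain strings standing in for HDF5 group and dataset names. plan_layout is a hypothetical helper, the use of the decimal index ("0", "1", …) as the dataset name for tuple elements is an assumption, and nested dicts or tuples are not handled.

```python
from typing import Any


def plan_layout(data_key: str, data_out: Any) -> dict:
    """Map each HDF5-style dataset path to the value that would be stored there."""
    if isinstance(data_out, dict):
        # Group named after the data key, one dataset per key-value pair.
        return {f"{data_key}/{name}": value for name, value in data_out.items()}
    if isinstance(data_out, tuple):
        # Group named after the data key, one dataset per element, named by index.
        return {f"{data_key}/{index}": value for index, value in enumerate(data_out)}
    # Anything else becomes a single dataset named after the data key.
    return {data_key: data_out}


dict_layout = plan_layout("result", {"a": 1, "b": 2})
tuple_layout = plan_layout("result", (10, 20))
scalar_layout = plan_layout("result", 3.5)
```

With a real file, each path in the returned mapping would correspond to one dataset that is created below the opened HDF5 file or group.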