corelay.processor.distance

A module that contains processors for pair-wise distance metrics.

Classes

Distance

The abstract base class for distance processors.

SciPyPDist

A distance processor that computes the pairwise distances between observations in n-dimensional space using scipy.spatial.distance.pdist().

class corelay.processor.distance.Distance

Bases: Processor

The abstract base class for distance processors.

Parameters:
  • is_output (bool) – A value indicating whether this Distance processor is the output of a Pipeline. Defaults to False.

  • is_checkpoint (bool | None) – A value indicating whether check-pointed pipeline computations should start at this point, if there exists a previously computed checkpoint value. Defaults to False.

  • io (Storable | None) – An IO object that is used to cache intermediate results of the Pipeline, which can then be re-used in this run or in subsequent runs of the Pipeline. Defaults to an instance of NoStorage.

__tracked__: collections.OrderedDict[str, Any]

A collections.OrderedDict of all public class attributes, i.e., all class attributes whose names are not enclosed in double underscores.

class corelay.processor.distance.SciPyPDist

Bases: Distance

A distance processor that computes the pairwise distances between observations in n-dimensional space using scipy.spatial.distance.pdist().

Parameters:
  • is_output (bool) – A value indicating whether this SciPyPDist distance processor is the output of a Pipeline. Defaults to False.

  • is_checkpoint (bool | None) – A value indicating whether check-pointed pipeline computations should start at this point, if there exists a previously computed checkpoint value. Defaults to False.

  • io (Storable | None) – An IO object that is used to cache intermediate results of the Pipeline, which can then be re-used in this run or in subsequent runs of the Pipeline. Defaults to an instance of NoStorage.

  • metric (str) – The distance metric to use. Defaults to “euclidean”.

  • m_kwargs (dict) – Additional keyword arguments to pass to the distance function.

metric: Annotated[str, Param]

The distance metric to use. Can be one of

  • “braycurtis”

  • “canberra”

  • “chebychev”, “chebyshev”, “cheby”, “cheb”, “ch”

  • “cityblock”, “cblock”, “cb”, “c”

  • “correlation”, “co”

  • “cosine”, “cos”

  • “dice”

  • “euclidean”, “euclid”, “eu”, “e”

  • “hamming”, “hamm”, “ha”, “h”

  • “minkowski”, “mi”, “m”

  • “pnorm”

  • “jaccard”, “jacc”, “ja”, “j”

  • “jensenshannon”, “js”

  • “mahalanobis”, “mahal”, “mah”

  • “rogerstanimoto”

  • “russellrao”

  • “seuclidean”, “se”, “s”

  • “sokalsneath”

  • “sqeuclidean”, “sqe”, “sqeuclid”

  • “yule”

Defaults to “euclidean”.

Return type:

Plug
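As a rough illustration of how the metric name changes the result, the sketch below calls scipy.spatial.distance.pdist() directly rather than going through SciPyPDist. The array `points` is made-up example data, and the assumption is that the processor forwards the `metric` string to pdist() unchanged.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Made-up example data: three observations in 2-dimensional space.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# pdist returns the distances in condensed form, ordered as the
# observation pairs (0, 1), (0, 2), (1, 2).
euclidean = pdist(points, metric="euclidean")
cityblock = pdist(points, metric="cityblock")

# A short alias selects the same metric as the full name.
cityblock_alias = pdist(points, metric="cb")

print(euclidean)
print(cityblock)
print(np.array_equal(cityblock, cityblock_alias))
```

The aliases listed above only select an existing metric under a shorter name; they do not change how the distances are computed.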

__tracked__: collections.OrderedDict[str, Any]

A collections.OrderedDict of all public class attributes, i.e., all class attributes whose names are not enclosed in double underscores.

m_kwargs: Annotated[dict[str, Any], Param]

Additional keyword arguments to pass to the distance function.

Return type:

Plug
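Some metrics take extra arguments, which is presumably what m_kwargs is for. The sketch below uses scipy.spatial.distance.pdist() directly and assumes that the processor unpacks m_kwargs into that call; the dictionary contents and the array `points` are hypothetical example values.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Made-up example data: three observations in 2-dimensional space.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Hypothetical m_kwargs value: the order p of the Minkowski norm.
m_kwargs = {"p": 1}

# The extra keyword arguments are unpacked into the pdist call.
minkowski_p1 = pdist(points, metric="minkowski", **m_kwargs)

# With p=1 the Minkowski distance coincides with the city block distance.
cityblock = pdist(points, metric="cityblock")

print(np.allclose(minkowski_p1, cityblock))
```

Which keyword arguments are accepted depends on the chosen metric, so the pdist() documentation is the place to check before setting m_kwargs.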

function(data: Any) → Any

Applies the pairwise distance function to the input data.

Parameters:

data (Any) – The input data to process, which should be a NumPy array of shape (number_of_samples, number_of_features).

Raises:

ValueError – The distance metric is not valid.

Returns:

The pairwise distance matrix of shape (number_of_samples, number_of_samples).

Return type:

Any
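The shapes described above can be reproduced with SciPy alone. pdist() itself returns a condensed vector, so this sketch assumes the processor expands it into a square matrix, for example with scipy.spatial.distance.squareform(); that step is inferred from the documented output shape and is not confirmed here. The array `data` is made-up example input.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Made-up input of shape (number_of_samples, number_of_features) = (4, 2).
data = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])

# Condensed form: one entry per unordered pair, here 4 * 3 / 2 = 6 entries.
condensed = pdist(data, metric="euclidean")

# Square form: a symmetric (4, 4) matrix with zeros on the diagonal.
square = squareform(condensed)

# An unknown metric name makes pdist raise a ValueError.
raised = False
try:
    pdist(data, metric="not-a-metric")
except ValueError:
    raised = True

print(condensed.shape, square.shape, raised)
```

The square form stores every distance twice, which costs memory but lets downstream steps index the result directly by sample pair.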