corelay.processor.embedding

A module that contains processors for embedding algorithms.

Module Attributes

`UMAP`	Uniform Manifold Approximation and Projection

Classes

`EigenDecomposition`	A spectral embedding `Processor` that performs eigenvalue decomposition.
`Embedding`	The abstract base class for embedding processors.
`LLEEmbedding`	An embedding `Processor` that uses the locally linear embedding (LLE) algorithm to reduce the dimensionality of the input data.
`PCAEmbedding`	An embedding `Processor` that uses the principal component analysis (PCA) algorithm to reduce the dimensionality of the input data.
`TSNEEmbedding`	An embedding `Processor` that uses the t-SNE algorithm to reduce the dimensionality of the input data.
`UMAPEmbedding`	An embedding `Processor` that uses the Uniform Manifold Approximation and Projection (UMAP) algorithm to reduce the dimensionality of the input data.

class corelay.processor.embedding.UMAP[source]

Bases: BaseEstimator, ClassNamePrefixFeaturesOutMixin

Performs the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction algorithm, which will find a low dimensional embedding of the data that approximates an underlying manifold.

Note

Since the UMAP library is an optional dependency of CoRelAy, it is imported using the corelay.utils.import_or_stub() function, which tries to import the module/type/function specified. If the import fails, it returns a stub instead, which will raise an exception when used. The exception message will tell users how to install the missing dependencies for the functionality to work.
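The import-or-stub pattern described in this note can be illustrated with a minimal, standard-library-only sketch. The helper name `import_or_stub_sketch`, its parameters, and the wording of the error message are assumptions made for illustration; they are not the actual signature or behavior of corelay.utils.import_or_stub().

```python
import importlib
from typing import Any


def import_or_stub_sketch(module_name: str, attribute: str, extra: str) -> Any:
    """Hypothetical stand-in for the import-or-stub pattern (not CoRelAy's code)."""
    try:
        # Try the real import first and hand back the requested attribute.
        module = importlib.import_module(module_name)
        return getattr(module, attribute)
    except (ImportError, AttributeError):

        class Stub:
            """Placeholder that fails as soon as someone tries to instantiate it."""

            def __init__(self, *args: Any, **kwargs: Any) -> None:
                raise RuntimeError(
                    f"'{attribute}' requires the optional module '{module_name}'. "
                    f"Install it, for example with: pip install {extra}"
                )

        return Stub


# A module that does not exist yields a stub instead of an ImportError at import time.
Missing = import_or_stub_sketch("no_such_module_xyz", "Thing", "no-such-package")

try:
    Missing()
except RuntimeError as error:
    message = str(error)

# A module that is available is returned unchanged.
OrderedDict = import_or_stub_sketch("collections", "OrderedDict", "n/a")
```

The point of the design is that importing the package never fails because of a missing optional dependency; the failure is deferred until the optional feature is actually used.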

Returns: A UMAP estimator, which can be used to fit the data.
Return type: sklearn.base.TransformerMixin

__init__(n_neighbors=15, n_components=2, metric='euclidean', metric_kwds=None, output_metric='euclidean', output_metric_kwds=None, n_epochs=None, learning_rate=1.0, init='spectral', min_dist=0.1, spread=1.0, low_memory=True, n_jobs=-1, set_op_mix_ratio=1.0, local_connectivity=1.0, repulsion_strength=1.0, negative_sample_rate=5, transform_queue_size=4.0, a=None, b=None, random_state=None, angular_rp_forest=False, target_n_neighbors=-1, target_metric='categorical', target_metric_kwds=None, target_weight=0.5, transform_seed=42, transform_mode='embedding', force_approximation_algorithm=False, verbose=False, tqdm_kwds=None, unique=False, densmap=False, dens_lambda=2.0, dens_frac=0.3, dens_var_shift=0.1, output_dens=False, disconnection_distance=None, precomputed_knn=(None, None, None))[source]

__repr__()[source]: Return repr(self).

fit(X, y=None, ensure_all_finite=True, **kwargs)[source]

Fit X into an embedded space.

Optionally use y for supervised dimension reduction.

Parameters:

X (array, shape (n_samples, n_features) or (n_samples, n_samples)) – If the metric is ‘precomputed’ X must be a square distance matrix. Otherwise it contains a sample per row. If the method is ‘exact’, X may be a sparse matrix of type ‘csr’, ‘csc’ or ‘coo’.
y (array, shape (n_samples)) – A target array for supervised dimension reduction. How this is handled is determined by parameters UMAP was instantiated with. The relevant attributes are target_metric and target_metric_kwds.
ensure_all_finite (bool or ‘allow-nan’) – Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:
- True: Force all values of array to be finite.
- False: Accept np.inf, np.nan, pd.NA in array.
- ‘allow-nan’: Accept only np.nan and pd.NA values in array. Values cannot be infinite.
**kwargs (optional) – Any additional keyword arguments are passed to _fit_embed_data.
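The three ensure_all_finite modes listed above can be sketched in plain Python. This is only an illustration of the documented semantics on a flat sequence of floats: the helper names `check_finite_sketch` and `accepts` are hypothetical, pd.NA is not covered, and the real validation performed by the estimator may differ in detail.

```python
import math
from typing import Iterable, Union


def check_finite_sketch(values: Iterable[float], ensure_all_finite: Union[bool, str] = True) -> None:
    """Hypothetical helper mirroring the three documented modes."""
    if ensure_all_finite is False:
        return  # Everything is accepted, including inf and NaN.
    if ensure_all_finite not in (True, "allow-nan"):
        raise ValueError(f"Unsupported ensure_all_finite value: {ensure_all_finite!r}")
    for value in values:
        if math.isnan(value):
            if ensure_all_finite == "allow-nan":
                continue  # NaN is tolerated in this mode only.
            raise ValueError("Input contains NaN.")
        if math.isinf(value):
            raise ValueError("Input contains infinity.")


def accepts(values: Iterable[float], mode: Union[bool, str]) -> bool:
    """Return True if the values pass the check in the given mode."""
    try:
        check_finite_sketch(values, mode)
        return True
    except ValueError:
        return False


data_with_nan = [0.5, float("nan")]
data_with_inf = [0.5, float("inf")]
```

In this sketch, `data_with_nan` passes only with ‘allow-nan’ or False, while `data_with_inf` passes only with False.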

fit_transform(X, y=None, ensure_all_finite=True, **kwargs)[source]

Fit X into an embedded space and return that transformed output.

Parameters:

X (array, shape (n_samples, n_features) or (n_samples, n_samples)) – If the metric is ‘precomputed’ X must be a square distance matrix. Otherwise it contains a sample per row.
y (array, shape (n_samples)) – A target array for supervised dimension reduction. How this is handled is determined by parameters UMAP was instantiated with. The relevant attributes are target_metric and target_metric_kwds.
ensure_all_finite (bool or ‘allow-nan’) – Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:
- True: Force all values of array to be finite.
- False: Accept np.inf, np.nan, pd.NA in array.
- ‘allow-nan’: Accept only np.nan and pd.NA values in array. Values cannot be infinite.
**kwargs (optional) – Any additional keyword arguments are passed to _fit_embed_data.

Returns:

X_new (array, shape (n_samples, n_components)) – Embedding of the training data in low-dimensional space.
If the output_dens flag is set, a tuple (X_new, r_orig, r_emb) is returned instead, which additionally includes:
r_orig (array, shape (n_samples)) – Local radii of data points in the original data space (log-transformed).
r_emb (array, shape (n_samples)) – Local radii of data points in the embedding (log-transformed).
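Because fit_transform returns either a single array or a three-element tuple depending on output_dens, calling code may want to normalize both shapes. The following is a minimal sketch of one way to do that; the helper name `split_fit_transform_result` is hypothetical, and nested lists stand in for the arrays so that the example runs without any third-party package.

```python
from typing import Any, Optional, Tuple


def split_fit_transform_result(result: Any, output_dens: bool) -> Tuple[Any, Optional[Any], Optional[Any]]:
    """Hypothetical helper: normalize both return shapes to (X_new, r_orig, r_emb)."""
    if output_dens:
        # With output_dens set, the documented result is (X_new, r_orig, r_emb).
        x_new, r_orig, r_emb = result
        return x_new, r_orig, r_emb
    # Otherwise the result is the embedding alone; the radii are not available.
    return result, None, None


# Stand-ins for the two documented return shapes (three samples, two components).
plain_result = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
dens_result = (plain_result, [-1.0, -0.5, -0.2], [-0.9, -0.4, -0.1])

embedding_a, r_orig_a, r_emb_a = split_fit_transform_result(plain_result, output_dens=False)
embedding_b, r_orig_b, r_emb_b = split_fit_transform_result(dens_result, output_dens=True)
```

The sketch keys on the output_dens flag rather than on the type or length of the result, since an embedding with exactly three rows could otherwise be unpacked by mistake.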

inverse_transform(X)[source]

Transform X in the existing embedded space back into the input data space and return that transformed output.

Parameters: X (array, shape (n_samples, n_components)) – New points to be inverse transformed.
Returns: X_new – Generated data points in data space.
Return type: array, shape (n_samples, n_features)

set_fit_request(*, ensure_all_finite: bool | None | str = '$UNCHANGED$') → UMAP[source]

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

ensure_all_finite (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for ensure_all_finite parameter in fit.
self (UMAP)

Returns:

self – The updated object.

Return type:

object

set_transform_request(*, ensure_all_finite: bool | None | str = '$UNCHANGED$') → UMAP[source]

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

ensure_all_finite (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for ensure_all_finite parameter in transform.
self (UMAP)

Returns:

self – The updated object.

Return type:

object

transform(X, ensure_all_finite=True)[source]

Transform X into the existing embedded space and return that transformed output.

Parameters:

X (array, shape (n_samples, n_features)) – New data to be transformed.
ensure_all_finite (bool or ‘allow-nan’) – Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:
- True: Force all values of array to be finite.
- False: Accept np.inf, np.nan, pd.NA in array.
- ‘allow-nan’: Accept only np.nan and pd.NA values in array. Values cannot be infinite.

Returns:

X_new – Embedding of the new data in low-dimensional space.

Return type:

array, shape (n_samples, n_components)

class corelay.processor.embedding.Embedding[source]

Bases: Processor

The abstract base class for embedding processors.

Parameters:

is_output (bool) – A value indicating whether this Embedding processor is the output of a Pipeline. Defaults to False.
is_checkpoint (bool | None) – A value indicating whether check-pointed pipeline computations should start at this point, if there exists a previously computed checkpoint value. Defaults to False.
io (Storable | None) – An IO object that is used to cache intermediate results of the Pipeline, which can then be re-used in this run or in subsequent runs of the Pipeline. Defaults to an instance of NoStorage.
kwargs (dict[str, Any]) – Additional keyword arguments for the embedding algorithm. Defaults to an empty dict.

kwargs: Annotated[dict[str, Any], Param]

Additional keyword arguments to pass to the embedding function.

Parameters:

obj (Any)
default (Any)

Return type: