pertpy.tools.KMeansSpace
Methods table
| add | Add perturbations linearly. |
| compute | Computes K-Means clustering of the expression values. |
| compute_control_diff | Subtract mean of the control from the perturbation. |
| evaluate_clustering | Evaluation of previously computed clustering against ground truth labels. |
| label_transfer | Impute missing values in the specified column using KNN imputation in the space defined by use_rep. |
| subtract | Subtract perturbations linearly. |
Methods
add
- KMeansSpace.add(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')
Add perturbations linearly. Assumes input of size n_perts x dimensionality.
- Parameters:
adata (AnnData) – Anndata object of size n_perts x dim.
reference_key (str) – Perturbation source from which the perturbation summation starts. Defaults to ‘control’.
ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.
target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.
- Return type:
- Returns:
Anndata object of size (n_perts+1) x dim, where the last row is the addition of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata) where adata is the AnnData object provided as input but updated using compute_control_diff.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.add(ps_adata, perturbations=["ATF2", "CD86"], reference_key="NT")
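To make "add perturbations linearly" concrete, the following is a minimal NumPy sketch of one plausible reading: start from the reference row and add each selected perturbation's offset from that reference, then append the result as a new last row. The helper `add_linearly`, the toy matrix, and the offset-from-reference interpretation are assumptions for illustration and are not taken from pertpy's implementation.

```python
import numpy as np

# Hypothetical pseudobulk matrix: one row per perturbation (n_perts x dim).
names = ["NT", "ATF2", "CD86"]
profiles = np.array([
    [1.0, 1.0, 1.0],  # NT (reference)
    [2.0, 1.0, 0.0],  # ATF2
    [1.0, 3.0, 1.0],  # CD86
])


def add_linearly(profiles, names, perturbations, reference_key):
    """Append a row holding the reference plus each perturbation's offset (assumed semantics)."""
    ref = profiles[names.index(reference_key)]
    combined = ref.copy()
    for pert in perturbations:
        # Offset of this perturbation relative to the reference profile.
        combined += profiles[names.index(pert)] - ref
    # Output has shape (n_perts + 1) x dim, with the combination as the last row.
    return np.vstack([profiles, combined])


result = add_linearly(profiles, names, ["ATF2", "CD86"], reference_key="NT")
```

Under these assumptions the output has one more row than the input, which matches the documented return shape of (n_perts+1) x dim.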
compute
- KMeansSpace.compute(adata, layer_key=None, embedding_key=None, cluster_key='k-means', copy=False, return_object=False, **kwargs)
Computes K-Means clustering of the expression values.
- Parameters:
adata (AnnData) – Anndata object of size cells x genes.
layer_key (str) – If specified and present in the adata, clustering uses this layer. Otherwise, clustering is done with .X.
embedding_key (str) – If specified and present in the adata, clustering uses this embedding. Otherwise, clustering is done with .X.
cluster_key (str) – Name of the .obs column to store the cluster labels. Defaults to ‘k-means’.
copy (bool) – If True, returns a new Anndata of same size with the new column; otherwise it updates the initial adata.
return_object (bool) – If True, returns the clustering object.
**kwargs – Are passed to sklearn’s KMeans.
- Return type:
- Returns:
If return_object is True, both the adata and the clustering object are returned; otherwise, only the adata is returned. The adata is updated with a new .obs column, named by cluster_key, that stores the cluster labels.
Examples
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> kmeans = pt.tl.KMeansSpace()
>>> kmeans_adata = kmeans.compute(mdata["rna"], n_clusters=26)
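For readers unfamiliar with what the clustering step does, here is a minimal NumPy sketch of the basic K-Means iteration (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points. This is only an illustration of the idea. The function `kmeans` and its parameters are hypothetical, and sklearn's KMeans, which the method delegates to, uses a more careful initialization and more options.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct data points.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy blobs standing in for cells in expression space.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),
])
labels, centroids = kmeans(X, n_clusters=2)
```

The returned `labels` array plays the role of the .obs column that compute stores under cluster_key.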
compute_control_diff
- KMeansSpace.compute_control_diff(adata, target_col='perturbation', group_col=None, reference_key='control', layer_key=None, new_layer_key='control_diff', embedding_key=None, new_embedding_key='control_diff', all_data=False, copy=False)
Subtract mean of the control from the perturbation.
- Parameters:
adata (AnnData) – Anndata object of size cells x genes.
target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.
group_col (str) – .obs column name that stores the label of the group of each cell. If None, groups are ignored. Defaults to None.
reference_key (str) – The key of the control values. Defaults to ‘control’.
layer_key (str) – Key of the AnnData layer to use for computation. Defaults to the X matrix.
new_layer_key (str) – The results are stored in the given layer. Defaults to ‘control_diff’.
embedding_key (str) – obsm key of the AnnData embedding to use for computation. Defaults to the X matrix.
new_embedding_key (str) – Results are stored in a new embedding in obsm with this key. Defaults to ‘control_diff’.
all_data (bool) – If True, do the computation in all data representations (X, all layers and all embeddings).
copy (bool) – If True, returns a new Anndata of same size with the new column; otherwise it updates the initial AnnData object.
- Return type:
- Returns:
Updated AnnData object.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> diff_adata = ps.compute_control_diff(mdata["rna"], target_col="gene_target", reference_key="NT")
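The core arithmetic of "subtract mean of the control" can be sketched in a few lines of NumPy. The helper `control_diff` and the toy data are hypothetical, and the sketch assumes the ungrouped case (group_col=None) in which the control mean is subtracted from every cell. It is not a copy of pertpy's code.

```python
import numpy as np

# Toy data: 5 cells x 3 genes, with one perturbation label per cell.
X = np.array([
    [1.0, 2.0, 3.0],  # NT (control)
    [3.0, 2.0, 1.0],  # NT (control)
    [4.0, 4.0, 4.0],  # ATF2
    [0.0, 1.0, 2.0],  # CD86
    [2.0, 2.0, 2.0],  # CD86
])
labels = np.array(["NT", "NT", "ATF2", "CD86", "CD86"])


def control_diff(X, labels, reference_key):
    """Subtract the mean control profile from every cell (assumed ungrouped case)."""
    control_mean = X[labels == reference_key].mean(axis=0)
    return X - control_mean


diff = control_diff(X, labels, reference_key="NT")
```

After this step the control cells average to zero, so each remaining row can be read as a displacement from the control state.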
evaluate_clustering
- KMeansSpace.evaluate_clustering(adata, true_label_col, cluster_col, metrics=None, **kwargs)
Evaluation of previously computed clustering against ground truth labels.
- Parameters:
adata (AnnData) – AnnData object that contains the clustered data and the cluster labels.
true_label_col (str) – .obs column that holds the ground truth labels.
cluster_col (str) – .obs column that holds the computed cluster labels.
metrics (list[str]) – Metrics to compute. Defaults to [‘nmi’, ‘ari’, ‘asw’].
**kwargs – Additional arguments to pass to the metrics. For nmi, average_method can be passed. For asw, metric, distances, sample_size, and random_state can be passed.
Examples
Example usage with KMeansSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> kmeans = pt.tl.KMeansSpace()
>>> kmeans_adata = kmeans.compute(mdata["rna"], n_clusters=26)
>>> results = kmeans.evaluate_clustering(
...     kmeans_adata, true_label_col="gene_target", cluster_col="k-means", metrics=["nmi"]
... )
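As an illustration of what one of the listed metrics measures, the sketch below computes the adjusted Rand index ('ari') from its standard pair-counting definition, using only the standard library. The function `adjusted_rand_index` is a hypothetical helper written for this example. How pertpy itself computes the metric is not stated here, so treat this as a reference for the quantity and not for the implementation. It assumes at least two observations.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand index from the contingency counts of two labelings."""
    n = len(true_labels)
    pair_counts = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)

    # Number of sample pairs that fall together in each cell / row / column.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_cluster = sum(comb(c, 2) for c in cluster_counts.values())

    expected = sum_true * sum_cluster / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_cluster)
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


truth = ["ATF2", "ATF2", "CD86", "CD86", "NT", "NT"]
perfect = [2, 2, 0, 0, 1, 1]  # same partition, different label names
mixed = [0, 1, 0, 1, 0, 1]    # splits every true group in half

ari_perfect = adjusted_rand_index(truth, perfect)
ari_mixed = adjusted_rand_index(truth, mixed)
```

The score depends only on which cells are grouped together, so the arbitrary integer names that K-Means assigns to clusters do not affect it.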
label_transfer
- KMeansSpace.label_transfer(adata, column='perturbation', target_val='unknown', n_neighbors=5, use_rep='X_umap')
Impute missing values in the specified column using KNN imputation in the space defined by use_rep.
- Parameters:
adata (AnnData) – The AnnData object containing single-cell data.
column (str) – The column name in AnnData object to perform imputation on. Defaults to “perturbation”.
target_val (str) – The target value to impute. Defaults to “unknown”.
n_neighbors (int) – Number of neighbors to use for imputation. Defaults to 5.
use_rep (str) – The key in adata.obsm where the embedding (UMAP, PCA, etc.) is stored. Defaults to ‘X_umap’.
- Return type:
Examples
>>> import pertpy as pt
>>> import scanpy as sc
>>> import numpy as np
>>> adata = sc.datasets.pbmc68k_reduced()
>>> rng = np.random.default_rng()
>>> adata.obs["perturbation"] = rng.choice(
...     ["A", "B", "C", "unknown"], size=adata.n_obs, p=[0.33, 0.33, 0.33, 0.01]
... )
>>> sc.pp.neighbors(adata)
>>> sc.tl.umap(adata)
>>> ps = pt.tl.PseudobulkSpace()
>>> ps.label_transfer(adata, n_neighbors=5, use_rep="X_umap")
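The idea behind KNN imputation in an embedding can be sketched with brute-force distances in NumPy. The helper `knn_impute` is hypothetical, and two of its choices are assumptions that may differ from pertpy's behavior: neighbors are searched only among cells that already have a label, and the imputed value is the majority label among them.

```python
from collections import Counter

import numpy as np


def knn_impute(embedding, labels, target_val="unknown", n_neighbors=5):
    """Replace target_val entries with the majority label of their nearest labelled cells."""
    labels = np.asarray(labels)
    known = np.flatnonzero(labels != target_val)
    imputed = labels.copy()
    for i in np.flatnonzero(labels == target_val):
        # Euclidean distance from cell i to every labelled cell.
        dists = np.linalg.norm(embedding[known] - embedding[i], axis=1)
        nearest = known[np.argsort(dists)[:n_neighbors]]
        imputed[i] = Counter(labels[nearest].tolist()).most_common(1)[0][0]
    return imputed


# Toy 2-D embedding standing in for adata.obsm["X_umap"].
embedding = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],      # cluster of "A" cells
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],      # cluster of "B" cells
    [0.05, 0.05], [5.05, 5.05],              # unlabelled cells
])
labels = ["A", "A", "A", "B", "B", "B", "unknown", "unknown"]
imputed = knn_impute(embedding, labels, n_neighbors=3)
```

Because the result depends entirely on distances in the chosen representation, the quality of the embedding passed as use_rep determines how trustworthy the transferred labels are.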
subtract
- KMeansSpace.subtract(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')
Subtract perturbations linearly. Assumes input of size n_perts x dimensionality.
- Parameters:
adata (AnnData) – Anndata object of size n_perts x dim.
reference_key (str) – Perturbation source from which the perturbation subtraction starts. Defaults to ‘control’.
ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.
target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.
- Return type:
- Returns:
Anndata object of size (n_perts+1) x dim, where the last row is the subtraction of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata) where adata is the AnnData object provided as input but updated using compute_control_diff.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.subtract(ps_adata, reference_key="ATF2", perturbations=["BRD4", "CUL3"])