pertpy.tools.CentroidSpace

class CentroidSpace

Computes the centroids per perturbation of a pre-computed embedding.

Methods table

add(adata, *, perturbations[, ...])

Add perturbations linearly.

compute(adata[, target_col, layer_key, ...])

Computes the centroids of a pre-computed embedding such as UMAP.

compute_control_diff(adata, *[, target_col, ...])

Subtract mean of the control from the perturbation.

label_transfer(adata, *[, target_column, ...])

Impute missing values in the specified column using KNN imputation over the neighbor graph stored under neighbors_key.

subtract(adata, *, perturbations[, ...])

Subtract perturbations linearly.

Methods

CentroidSpace.add(adata, *, perturbations, reference_key='control', ensure_consistency=True, target_col='perturbation')

Add perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to add.

  • reference_key (str, default: 'control') – Perturbation source from which the perturbation summation starts.

  • ensure_consistency (bool, default: True) – If True, subtract the control via compute_control_diff before combining so that “perturbation - perturbation == control” holds in the resulting space. Set to False only if the control has already been subtracted from the input.

  • target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the sum of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the AnnData object provided as input but updated using compute_control_diff.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.add(ps_adata, perturbations=["ATF2", "CD86"], reference_key="NT")
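
The linear combination itself can be pictured with plain NumPy. This is a minimal sketch under stated assumptions, not pertpy's implementation: it assumes each perturbation is already a single row in a control-differenced space, and the helper name `add_perturbations` and the toy values are hypothetical.

```python
import numpy as np


def add_perturbations(space, labels, perturbations, reference_key="control"):
    """Hypothetical helper: start at the reference row and add each perturbation row."""
    index = {label: i for i, label in enumerate(labels)}
    result = space[index[reference_key]].copy()
    for name in perturbations:
        result = result + space[index[name]]
    return result


# Toy control-differenced space: the control row is the zero vector.
labels = ["control", "A", "B"]
space = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]])

combined = add_perturbations(space, labels, ["A", "B"])

# Append the new row last, giving an (n_perts + 1) x dim array.
extended = np.vstack([space, combined])
```

Appending the combined row at the end mirrors the (n_perts+1) x dim shape described under Returns.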

CentroidSpace.compute(adata, target_col='perturbation', layer_key=None, embedding_key='X_umap', keep_obs=True)

Computes the centroids of a pre-computed embedding such as UMAP.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes.

  • target_col (str, default: 'perturbation') – .obs column that stores the label of the perturbation applied to each cell.

  • layer_key (str, default: None) – If specified, the computation uses the given layer; otherwise it uses .X.

  • embedding_key (str, default: 'X_umap') – obsm key of the AnnData embedding to use for computation. If None, the .X matrix is used instead.

  • keep_obs (bool, default: True) – Whether .obs columns in the input AnnData should be kept in the output pseudobulk AnnData. Only .obs columns with the same value for each cell of one perturbation are kept.

Return type:

AnnData

Returns:

AnnData object with one observation per perturbation, storing the embedding data of the centroid of the respective perturbation.

Examples

Compute the centroids of a UMAP embedding of the papalexi_2021 dataset:

>>> import pertpy as pt
>>> import scanpy as sc
>>> mdata = pt.dt.papalexi_2021()
>>> sc.pp.pca(mdata["rna"])
>>> sc.pp.neighbors(mdata["rna"])
>>> sc.tl.umap(mdata["rna"])
>>> cs = pt.tl.CentroidSpace()
>>> cs_adata = cs.compute(mdata["rna"], target_col="gene_target")
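
Conceptually, the result holds one point per perturbation label. The following sketch assumes a centroid is the per-group arithmetic mean of the embedding coordinates, which this page does not spell out; the helper name `embedding_centroids` and the toy data are hypothetical.

```python
import numpy as np


def embedding_centroids(embedding, labels):
    """Hypothetical helper: mean embedding coordinates per perturbation label."""
    labels = np.asarray(labels)
    names = np.unique(labels)  # sorted unique labels
    centroids = np.vstack([embedding[labels == name].mean(axis=0) for name in names])
    return names, centroids


# Toy 2-D embedding for four cells and two perturbations.
embedding = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0], [6.0, 2.0]])
labels = ["NT", "NT", "ATF2", "ATF2"]

names, centroids = embedding_centroids(embedding, labels)
```

Each row of `centroids` corresponds to the label at the same position in `names`, analogous to one observation per perturbation in the returned AnnData.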

CentroidSpace.compute_control_diff(adata, *, target_col='perturbation', group_col=None, reference_key='control', layer_key=None, new_layer_key='control_diff', embedding_key=None, new_embedding_key='control_diff', all_data=False, copy=True)

Subtract mean of the control from the perturbation.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes.

  • target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.

  • group_col (str, default: None) – .obs column name that stores the label of the group of each cell. If None, ignore groups.

  • reference_key (str, default: 'control') – The key of the control values.

  • layer_key (str, default: None) – Key of the AnnData layer to use for computation.

  • new_layer_key (str, default: 'control_diff') – Results are stored in a new layer with this key.

  • embedding_key (str, default: None) – obsm key of the AnnData embedding to use for computation.

  • new_embedding_key (str, default: 'control_diff') – Results are stored in a new embedding in obsm with this key.

  • all_data (bool, default: False) – If True, do the computation in all data representations (X, all layers and all embeddings).

  • copy (bool, default: True) – If True returns a new AnnData; otherwise updates the input AnnData in place.

Return type:

AnnData

Returns:

Updated AnnData object.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> diff_adata = ps.compute_control_diff(mdata["rna"], target_col="gene_target", reference_key="NT")
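
The subtraction can be sketched on a plain array. This is an illustration under assumptions rather than the library's code: it ignores group_col, works on a single dense matrix, and the helper name `control_diff` and the toy values are hypothetical.

```python
import numpy as np


def control_diff(X, labels, reference_key="control"):
    """Hypothetical helper: subtract the mean of the control cells from every cell."""
    labels = np.asarray(labels)
    control_mean = X[labels == reference_key].mean(axis=0)
    return X - control_mean


# Three cells, two features; the first two cells are controls labelled "NT".
X = np.array([[1.0, 1.0], [3.0, 1.0], [5.0, 4.0]])
labels = ["NT", "NT", "ATF2"]

diff = control_diff(X, labels, reference_key="NT")
```

After the subtraction the control cells average to the zero vector, so every remaining value reads as an offset from control.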

CentroidSpace.label_transfer(adata, *, target_column='perturbation', column_uncertainty_score_key='perturbation_transfer_uncertainty', target_val='unknown', neighbors_key='neighbors')

Impute missing values in the specified column using KNN imputation over the neighbor graph stored under neighbors_key.

Uncertainty is calculated as the entropy of the label distribution in the neighborhood of the target cell. In other words, a cell whose neighbors all share the same label has an uncertainty of 0, whereas a cell whose neighbors carry many different labels has high uncertainty.

Parameters:
  • adata (AnnData) – The AnnData object containing single-cell data.

  • target_column (str, default: 'perturbation') – The column name in adata.obs to perform imputation on.

  • column_uncertainty_score_key (str, default: 'perturbation_transfer_uncertainty') – The column name in adata.obs to store the uncertainty score of the label transfer.

  • target_val (str, default: 'unknown') – The target value to impute.

  • neighbors_key (str, default: 'neighbors') – The key in adata.uns where the neighbors are stored.

Return type:

None

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> import numpy as np
>>> adata = sc.datasets.pbmc68k_reduced()
>>> # randomly drop 10% of the annotations
>>> adata.obs["perturbation"] = adata.obs["louvain"].astype(str).copy()
>>> random_cells = np.random.choice(adata.obs.index, int(adata.obs.shape[0] * 0.1), replace=False)
>>> adata.obs.loc[random_cells, "perturbation"] = "unknown"
>>> sc.pp.neighbors(adata)
>>> sc.tl.umap(adata)
>>> ps = pt.tl.PseudobulkSpace()
>>> ps.label_transfer(adata)
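
The entropy-based uncertainty described above can be sketched with the standard library. The code assumes an unweighted majority vote over the neighbor labels and the natural logarithm; neither is stated on this page, and the helper names `label_entropy` and `impute_label` are hypothetical.

```python
from collections import Counter
from math import log


def label_entropy(neighbor_labels):
    """Shannon entropy (natural log) of the label distribution among neighbors."""
    counts = Counter(neighbor_labels)
    total = sum(counts.values())
    return sum(-(c / total) * log(c / total) for c in counts.values())


def impute_label(neighbor_labels):
    """Hypothetical helper: majority label plus its entropy-based uncertainty."""
    label, _ = Counter(neighbor_labels).most_common(1)[0]
    return label, label_entropy(neighbor_labels)


# All neighbors agree: the uncertainty is zero.
uniform = impute_label(["A", "A", "A", "A"])

# Neighbors disagree: the uncertainty is strictly positive.
mixed = impute_label(["A", "B", "C", "A"])
```

A threshold on the stored uncertainty score could then be used to keep only confidently transferred labels.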

CentroidSpace.subtract(adata, *, perturbations, reference_key='control', ensure_consistency=True, target_col='perturbation')

Subtract perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to subtract.

  • reference_key (str, default: 'control') – Perturbation source from which the perturbation subtraction starts.

  • ensure_consistency (bool, default: True) – If True, subtract the control via compute_control_diff before combining so that “perturbation - perturbation == control” holds in the resulting space. Set to False only if the control has already been subtracted from the input.

  • target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the result of subtracting the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the AnnData object provided as input but updated using compute_control_diff.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.subtract(ps_adata, reference_key="ATF2", perturbations=["BRD4", "CUL3"])
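
The consistency property “perturbation - perturbation == control” can be checked on a toy array. This sketch is illustrative only: the labels and values are made up, and subtracting the control row stands in for what compute_control_diff is described as doing.

```python
import numpy as np

labels = ["control", "A", "B"]
raw = np.array([[2.0, 1.0], [5.0, 3.0], [4.0, 0.0]])
index = {label: i for i, label in enumerate(labels)}

# Without differencing, A - A is the zero vector, which is not the control row.
raw_self_diff = raw[index["A"]] - raw[index["A"]]

# After subtracting the control row, control becomes the zero vector,
# so A - A coincides with control.
diffed = raw - raw[index["control"]]
diffed_self_diff = diffed[index["A"]] - diffed[index["A"]]

# Linear subtraction starting from reference "A" and removing "B".
a_minus_b = diffed[index["A"]] - diffed[index["B"]]
```

This is why skipping the differencing step is only safe when the input already has the control at the origin.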