pertpy.tools.CentroidSpace

class pertpy.tools.CentroidSpace

Computes the centroids per perturbation of a pre-computed embedding.

Methods table

add(adata, perturbations[, reference_key, ...])

Add perturbations linearly.

compute(adata[, target_col, layer_key, ...])

Computes the centroids of a pre-computed embedding such as UMAP.

compute_control_diff(adata[, target_col, ...])

Subtract mean of the control from the perturbation.

label_transfer(adata[, column, target_val, ...])

Impute missing values in the specified column using KNN imputation in the space defined by use_rep.

subtract(adata, perturbations[, ...])

Subtract perturbations linearly.

Methods

add

CentroidSpace.add(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')

Add perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to add.

  • reference_key (str) – Perturbation source from which the perturbation summation starts. Defaults to ‘control’.

  • ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the sum of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the input AnnData object updated using compute_control_diff.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.add(ps_adata, perturbations=["ATF2", "CD86"], reference_key="NT")
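
The arithmetic behind a linear addition can be sketched with plain NumPy. This is only an illustration, assuming the new row is the reference row plus the rows of the listed perturbations; the helper add_rows and the toy values are hypothetical and not part of pertpy.

```python
import numpy as np

# Toy perturbation space: one row per perturbation, two embedding dimensions.
names = ["NT", "ATF2", "CD86"]
X = np.array([
    [0.0, 0.0],   # NT (reference)
    [1.0, 2.0],   # ATF2
    [3.0, -1.0],  # CD86
])


def add_rows(X, names, perturbations, reference_key):
    """Hypothetical helper: reference row plus the rows of `perturbations`."""
    index = {name: i for i, name in enumerate(names)}
    new_row = X[index[reference_key]].copy()
    for pert in perturbations:
        if pert not in index:
            raise ValueError(f"Perturbation {pert!r} not found.")
        new_row += X[index[pert]]
    # Append the combined row, giving a matrix of size (n_perts + 1) x dim.
    return np.vstack([X, new_row]), names + ["+".join(perturbations)]


X_new, names_new = add_rows(X, names, ["ATF2", "CD86"], reference_key="NT")
```

The appended row is labelled here by joining the perturbation names with "+", which is a naming choice of this sketch.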

compute

CentroidSpace.compute(adata, target_col='perturbation', layer_key=None, embedding_key='X_umap', keep_obs=True)

Computes the centroids of a pre-computed embedding such as UMAP.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes.

  • target_col (str) – .obs column that stores the label of the perturbation applied to each cell.

  • layer_key (str) – If specified, the computation uses this layer. Otherwise, the computation uses .X.

  • embedding_key (str) – obsm key of the AnnData embedding to use for computation. Defaults to ‘X_umap’.

  • keep_obs (bool) – Whether .obs columns in the input AnnData should be kept in the output pseudobulk AnnData. Only .obs columns with the same value for each cell of one perturbation are kept. Defaults to True.

Return type:

AnnData

Returns:

AnnData object with one observation per perturbation, storing the embedding data of the centroid of the respective perturbation.

Examples

Compute the centroids of a UMAP embedding of the papalexi_2021 dataset:

>>> import pertpy as pt
>>> import scanpy as sc
>>> mdata = pt.dt.papalexi_2021()
>>> sc.pp.pca(mdata["rna"])
>>> sc.pp.neighbors(mdata["rna"])
>>> sc.tl.umap(mdata["rna"])
>>> cs = pt.tl.CentroidSpace()
>>> cs_adata = cs.compute(mdata["rna"], target_col="gene_target")
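
What a per-perturbation centroid amounts to can be sketched without pertpy. The snippet below takes "centroid" to mean the arithmetic mean of the embedding coordinates of all cells sharing a label; pertpy's own implementation may differ in detail. The helper centroids_per_label and the toy data are hypothetical.

```python
import numpy as np

# Toy 2-D embedding for six cells and their perturbation labels.
embedding = np.array([
    [0.0, 0.0], [2.0, 2.0],   # control
    [4.0, 0.0], [6.0, 2.0],   # ATF2
    [1.0, 5.0], [1.0, 7.0],   # CD86
])
labels = np.array(["control", "control", "ATF2", "ATF2", "CD86", "CD86"])


def centroids_per_label(embedding, labels):
    """Hypothetical helper: mean embedding coordinate for each unique label."""
    unique = np.unique(labels)  # sorted unique labels
    centers = np.vstack([embedding[labels == u].mean(axis=0) for u in unique])
    return unique, centers


perts, centers = centroids_per_label(embedding, labels)
```

The result has one row per perturbation, mirroring the shape of the AnnData object described above.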

compute_control_diff

CentroidSpace.compute_control_diff(adata, target_col='perturbation', group_col=None, reference_key='control', layer_key=None, new_layer_key='control_diff', embedding_key=None, new_embedding_key='control_diff', all_data=False, copy=False)

Subtract mean of the control from the perturbation.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

  • group_col (str) – .obs column name that stores the group label of each cell. If None, groups are ignored. Defaults to None.

  • reference_key (str) – The key of the control values. Defaults to ‘control’.

  • layer_key (str) – Key of the AnnData layer to use for computation. If None, the X matrix is used.

  • new_layer_key (str) – Key of the layer in which the results are stored. Defaults to ‘control_diff’.

  • embedding_key (str) – obsm key of the AnnData embedding to use for computation. If None, the X matrix is used.

  • new_embedding_key (str) – Results are stored in a new embedding in obsm with this key. Defaults to ‘control_diff’.

  • all_data (bool) – If True, performs the computation on all data representations (X, all layers, and all embeddings).

  • copy (bool) – If True, returns a new AnnData object of the same size with the results added; otherwise, updates the input AnnData object.

Return type:

AnnData

Returns:

Updated AnnData object.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> diff_adata = ps.compute_control_diff(mdata["rna"], target_col="gene_target", reference_key="NT")
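
The subtraction itself can be illustrated on a small matrix. This is a minimal sketch, assuming a single group (the group_col=None case) and a dense matrix; the helper control_diff and the toy values are hypothetical and do not reproduce pertpy's handling of layers or embeddings.

```python
import numpy as np

# Four cells x two features, with the perturbation label of each cell.
X = np.array([
    [1.0, 1.0], [3.0, 3.0],   # NT cells (control)
    [5.0, 2.0], [7.0, 6.0],   # ATF2 cells
])
labels = np.array(["NT", "NT", "ATF2", "ATF2"])


def control_diff(X, labels, reference_key):
    """Hypothetical helper: subtract the mean control profile from every cell."""
    control_mean = X[labels == reference_key].mean(axis=0)
    return X - control_mean  # broadcasting subtracts the mean from each row


diff = control_diff(X, labels, reference_key="NT")
```

After the subtraction the control cells are centered on zero, so each remaining row describes a shift relative to the control.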

label_transfer

CentroidSpace.label_transfer(adata, column='perturbation', target_val='unknown', n_neighbors=5, use_rep='X_umap')

Impute missing values in the specified column using KNN imputation in the space defined by use_rep.

Parameters:
  • adata (AnnData) – The AnnData object containing single-cell data.

  • column (str) – The column name in AnnData object to perform imputation on. Defaults to “perturbation”.

  • target_val (str) – The target value to impute. Defaults to “unknown”.

  • n_neighbors (int) – Number of neighbors to use for imputation. Defaults to 5.

  • use_rep (str) – The key in adata.obsm where the embedding (UMAP, PCA, etc.) is stored. Defaults to ‘X_umap’.

Return type:

None

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> import numpy as np
>>> adata = sc.datasets.pbmc68k_reduced()
>>> rng = np.random.default_rng()
>>> adata.obs["perturbation"] = rng.choice(
...     ["A", "B", "C", "unknown"], size=adata.n_obs, p=[0.33, 0.33, 0.33, 0.01]
... )
>>> sc.pp.neighbors(adata)
>>> sc.tl.umap(adata)
>>> ps = pt.tl.PseudobulkSpace()
>>> ps.label_transfer(adata, n_neighbors=5, use_rep="X_umap")
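
The idea of KNN-based label imputation can be sketched with a brute-force neighbor search. This is an illustration under stated assumptions: Euclidean distance, a majority vote among the nearest labelled cells only, and a returned copy instead of an in-place update. The helper knn_label_transfer and the toy data are hypothetical and are not pertpy's implementation.

```python
from collections import Counter

import numpy as np

embedding = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # cluster near the origin, labelled "A"
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # distant cluster, labelled "B"
    [0.05, 0.05],                         # unlabelled cell close to cluster "A"
])
labels = np.array(["A", "A", "A", "B", "B", "B", "unknown"])


def knn_label_transfer(embedding, labels, target_val="unknown", n_neighbors=3):
    """Hypothetical helper: majority vote among the nearest labelled cells."""
    labels = labels.copy()  # leave the caller's array untouched
    known = np.flatnonzero(labels != target_val)
    for i in np.flatnonzero(labels == target_val):
        # Euclidean distance from cell i to every labelled cell.
        dists = np.linalg.norm(embedding[known] - embedding[i], axis=1)
        nearest = known[np.argsort(dists)[:n_neighbors]]
        labels[i] = Counter(labels[nearest]).most_common(1)[0][0]
    return labels


imputed = knn_label_transfer(embedding, labels)
```

Because the set of labelled cells is fixed before the loop, newly imputed cells do not vote for one another in this sketch.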

subtract

CentroidSpace.subtract(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')

Subtract perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to subtract.

  • reference_key (str) – Perturbation source from which the perturbation subtraction starts. Defaults to ‘control’.

  • ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the result of subtracting the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the input AnnData object updated using compute_control_diff.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.subtract(ps_adata, reference_key="ATF2", perturbations=["BRD4", "CUL3"])
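
The arithmetic of a linear subtraction can be checked by hand on toy values. The sketch below assumes the new row is the reference row minus the rows of the listed perturbations; the numbers are made up for illustration and the code does not call pertpy.

```python
import numpy as np

# Toy perturbation space: one row per perturbation, two dimensions.
names = ["ATF2", "BRD4", "CUL3"]
X = np.array([
    [4.0, 1.0],   # ATF2 (reference)
    [1.0, 2.0],   # BRD4
    [0.5, -1.0],  # CUL3
])
index = {name: i for i, name in enumerate(names)}

# Start from the reference row and subtract each listed perturbation.
new_row = X[index["ATF2"]] - X[index["BRD4"]] - X[index["CUL3"]]

# Adding the same rows back recovers the reference row.
restored = new_row + X[index["BRD4"]] + X[index["CUL3"]]
```

The round trip holds only because the operations are purely linear, which is the assumption the ensure_consistency option is meant to support.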