pertpy.tools.PseudobulkSpace

class pertpy.tools.PseudobulkSpace

Determines pseudobulks using decoupler.

Methods table

add(adata, perturbations[, reference_key, ...])

Add perturbations linearly.

compute(adata[, target_col, groups_col, ...])

Determines pseudobulks of an AnnData object.

compute_control_diff(adata[, target_col, ...])

Subtract mean of the control from the perturbation.

label_transfer(adata[, column, target_val, ...])

Impute missing values in the specified column using KNN imputation in the space defined by use_rep.

subtract(adata, perturbations[, ...])

Subtract perturbations linearly.

Methods

add

PseudobulkSpace.add(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')

Add perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to add.

  • reference_key (str) – Perturbation source from which the perturbation summation starts. Defaults to ‘control’.

  • ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the sum of the specified perturbations. If ensure_consistency is True, returns a tuple (new_perturbation, adata), where adata is the input AnnData object updated using compute_control_diff.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.add(ps_adata, perturbations=["ATF2", "CD86"], reference_key="NT")
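The idea of adding perturbations linearly can be illustrated with plain NumPy. The following is a minimal sketch and not pertpy's implementation: the matrix `profiles`, the helper `add_linearly`, and the choice to center on the reference before summing (mirroring what ensure_consistency is described as doing) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical pseudobulk matrix: one row per perturbation, one column per feature.
names = ["NT", "ATF2", "CD86"]
profiles = np.array([
    [1.0, 2.0, 3.0],  # NT (control)
    [1.5, 2.0, 2.0],  # ATF2
    [1.0, 4.0, 3.5],  # CD86
])


def add_linearly(profiles, names, perturbations, reference_key):
    """Start from the reference row and add each perturbation (illustrative only)."""
    # Center every row on the reference so that the reference becomes the origin.
    centered = profiles - profiles[names.index(reference_key)]
    result = centered[names.index(reference_key)].copy()
    for p in perturbations:
        result += centered[names.index(p)]
    return centered, result


centered, combined = add_linearly(profiles, names, ["ATF2", "CD86"], reference_key="NT")

# Append the combined perturbation as a new last row: (n_perts + 1) x dim.
extended = np.vstack([centered, combined])
```

In this sketch the combined row lives in reference-centered coordinates; adding the reference profile back would return it to the original expression scale.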

compute

PseudobulkSpace.compute(adata, target_col='perturbation', groups_col=None, layer_key=None, embedding_key=None, **kwargs)

Determines pseudobulks of an AnnData object using decoupler's implementation.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes.

  • target_col (str) – .obs column that stores the label of the perturbation applied to each cell.

  • groups_col (str) – Optional .obs column that stores a grouping label to consider for pseudobulk computation. The summarized expression per perturbation (target_col) and group (groups_col) is computed. Defaults to None.

  • layer_key (str) – If specified, the pseudobulk computation uses this layer. Otherwise, the computation uses .X.

  • embedding_key (str) – Key in .obsm of the embedding to use for computation. If not specified, the ‘X’ matrix is used.

  • **kwargs – Passed to decoupler’s get_pseudobulk.

Return type:

AnnData

Returns:

AnnData object with one observation per perturbation.

Examples

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target")
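Conceptually, a pseudobulk collapses all cells that share a label into a single summarized profile. The sketch below shows that idea in plain NumPy; it is not decoupler's or pertpy's code. The toy arrays and the helper `pseudobulk` are hypothetical, and summing (rather than, say, averaging) is an assumption about the aggregation mode.

```python
import numpy as np

# Hypothetical toy data: 5 cells x 2 genes, with one perturbation label per cell.
counts = np.array([
    [1, 0],
    [2, 1],
    [0, 3],
    [4, 1],
    [1, 1],
])
labels = np.array(["NT", "ATF2", "NT", "ATF2", "CD86"])


def pseudobulk(counts, labels):
    """Sum the rows of `counts` that share a label (illustrative only)."""
    groups = np.unique(labels)  # sorted unique labels
    summed = np.vstack([counts[labels == g].sum(axis=0) for g in groups])
    return groups, summed


groups, summed = pseudobulk(counts, labels)
# `summed` has one row per unique label, in the order given by `groups`.
```

Grouping by a second label as well (the role groups_col plays above) would amount to aggregating over unique pairs of labels instead of single labels.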

compute_control_diff

PseudobulkSpace.compute_control_diff(adata, target_col='perturbation', group_col=None, reference_key='control', layer_key=None, new_layer_key='control_diff', embedding_key=None, new_embedding_key='control_diff', all_data=False, copy=False)

Subtract mean of the control from the perturbation.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

  • group_col (str) – .obs column name that stores the label of the group of each cell. If None, groups are ignored. Defaults to None.

  • reference_key (str) – The key of the control values. Defaults to ‘control’.

  • layer_key (str) – Key of the AnnData layer to use for computation. If not specified, the X matrix is used.

  • new_layer_key (str) – The results are stored in the layer with this key. Defaults to ‘control_diff’.

  • embedding_key (str) – Key in .obsm of the embedding to use for computation. If not specified, the ‘X’ matrix is used.

  • new_embedding_key (str) – Results are stored in a new embedding in obsm with this key. Defaults to ‘control_diff’.

  • all_data (bool) – If True, the computation is done in all data representations (X, all layers, and all embeddings).

  • copy (bool) – If True, returns a new AnnData object of the same size with the results added; otherwise, updates the input AnnData object.

Return type:

AnnData

Returns:

Updated AnnData object.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> diff_adata = ps.compute_control_diff(mdata["rna"], target_col="gene_target", reference_key="NT")
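The operation described here, subtracting the mean of the control from the data, can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than pertpy's code: the arrays and the helper `control_diff` are hypothetical, and the sketch subtracts the control mean from every row, control cells included.

```python
import numpy as np

# Hypothetical toy data: 4 cells x 2 features, with a perturbation label per cell.
X = np.array([
    [1.0, 2.0],  # NT
    [3.0, 4.0],  # NT
    [5.0, 1.0],  # ATF2
    [2.0, 2.0],  # CD86
])
labels = np.array(["NT", "NT", "ATF2", "CD86"])


def control_diff(X, labels, reference_key):
    """Subtract the mean control profile from every row (illustrative only)."""
    control_mean = X[labels == reference_key].mean(axis=0)
    return X - control_mean


diff = control_diff(X, labels, reference_key="NT")
```

After this shift the control cells average to zero in every feature, so each remaining row can be read as a deviation from the control.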

label_transfer

PseudobulkSpace.label_transfer(adata, column='perturbation', target_val='unknown', n_neighbors=5, use_rep='X_umap')

Impute missing values in the specified column using KNN imputation in the space defined by use_rep.

Parameters:
  • adata (AnnData) – The AnnData object containing single-cell data.

  • column (str) – The name of the column in the AnnData object to perform imputation on. Defaults to “perturbation”.

  • target_val (str) – The target value to impute. Defaults to “unknown”.

  • n_neighbors (int) – Number of neighbors to use for imputation. Defaults to 5.

  • use_rep (str) – The key in adata.obsm where the embedding (UMAP, PCA, etc.) is stored. Defaults to ‘X_umap’.

Return type:

None

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> import numpy as np
>>> adata = sc.datasets.pbmc68k_reduced()
>>> rng = np.random.default_rng()
>>> adata.obs["perturbation"] = rng.choice(
...     ["A", "B", "C", "unknown"], size=adata.n_obs, p=[0.33, 0.33, 0.33, 0.01]
... )
>>> sc.pp.neighbors(adata)
>>> sc.tl.umap(adata)
>>> ps = pt.tl.PseudobulkSpace()
>>> ps.label_transfer(adata, n_neighbors=5, use_rep="X_umap")
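KNN imputation of labels can be sketched as a majority vote among the nearest labeled points in the embedding. The example below uses a brute-force NumPy search on a hypothetical toy embedding; the helper `knn_label_transfer` and the majority-vote rule are assumptions for illustration and do not claim to match pertpy's neighbor search or tie-breaking. Unlike the method documented above, which returns None, the sketch returns a new label array.

```python
from collections import Counter

import numpy as np

# Hypothetical 2-D embedding: three "A" points, two "B" points, one unlabeled point.
embedding = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [5.0, 5.0],
    [5.1, 5.0],
    [0.05, 0.05],  # unlabeled, lies inside the "A" cluster
])
labels = np.array(["A", "A", "A", "B", "B", "unknown"])


def knn_label_transfer(embedding, labels, target_val="unknown", n_neighbors=3):
    """Replace `target_val` labels by a majority vote of labeled neighbors."""
    labels = labels.copy()
    known = np.flatnonzero(labels != target_val)
    for i in np.flatnonzero(labels == target_val):
        # Euclidean distance from the unlabeled point to every labeled point.
        dists = np.linalg.norm(embedding[known] - embedding[i], axis=1)
        nearest = known[np.argsort(dists)[:n_neighbors]]
        labels[i] = Counter(labels[nearest].tolist()).most_common(1)[0][0]
    return labels


imputed = knn_label_transfer(embedding, labels)
```

Only points labeled with a known value vote here, so an unlabeled point can never be imputed from another unlabeled point.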

subtract

PseudobulkSpace.subtract(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')

Subtract perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to subtract.

  • reference_key (str) – Perturbation source from which the perturbation subtraction starts. Defaults to ‘control’.

  • ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the result of subtracting the specified perturbations. If ensure_consistency is True, returns a tuple (new_perturbation, adata), where adata is the input AnnData object updated using compute_control_diff.

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.subtract(ps_adata, reference_key="ATF2", perturbations=["BRD4", "CUL3"])
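Linear subtraction can be pictured as starting from the reference row and removing each listed perturbation in turn. The NumPy sketch below assumes the rows are already expressed relative to a control (so the control row is all zeros); the matrix `effects` and the helper `subtract_linearly` are hypothetical and are not pertpy's implementation.

```python
import numpy as np

# Hypothetical control-centered profiles: one row per perturbation.
names = ["control", "ATF2", "BRD4", "CUL3"]
effects = np.array([
    [0.0, 0.0],  # control (origin of the space)
    [1.0, 2.0],  # ATF2
    [0.5, 0.5],  # BRD4
    [0.0, 1.0],  # CUL3
])


def subtract_linearly(effects, names, perturbations, reference_key):
    """Start from the reference row and subtract each perturbation (illustrative only)."""
    result = effects[names.index(reference_key)].copy()
    for p in perturbations:
        result -= effects[names.index(p)]
    return result


remaining = subtract_linearly(effects, names, ["BRD4", "CUL3"], reference_key="ATF2")

# Append the result as a new last row: (n_perts + 1) x dim.
extended = np.vstack([effects, remaining])
```

The sketch works on a copy of the reference row, so the input matrix is left unchanged.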