pertpy.tools.LRClassifierSpace

class pertpy.tools.LRClassifierSpace

Fits a logistic regression model to the data and takes the feature space as embedding.

We fit one logistic regression model per perturbation. After training, the coefficients of the logistic regression model are used as the feature space. This results in one embedding per perturbation.
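The idea of using classifier coefficients as an embedding can be sketched with NumPy alone. This is a minimal illustration, not pertpy's implementation: it assumes a one-vs-rest setup (each perturbation against all other cells), and the helpers `fit_binary_logreg` and `coefficient_embeddings` as well as the toy data are hypothetical.

```python
import numpy as np


def fit_binary_logreg(X, y, lr=0.1, n_iter=500):
    """Binary logistic regression via plain gradient descent (illustration only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)  # gradient step on the weights
        b -= lr * np.mean(p - y)  # gradient step on the intercept
    return w


def coefficient_embeddings(X, labels):
    """Fit one model per perturbation (assumed one-vs-rest) and stack the weights."""
    labels = np.asarray(labels)
    perts = sorted({str(label) for label in labels})
    rows = [fit_binary_logreg(X, (labels == pert).astype(float)) for pert in perts]
    return perts, np.vstack(rows)


# Toy data: 60 cells x 4 features; "A" shifts feature 0, "B" shifts feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
labels = np.repeat(["A", "B", "control"], 20)
X[labels == "A", 0] += 2.0
X[labels == "B", 1] += 2.0

perts, emb = coefficient_embeddings(X, labels)
print(perts, emb.shape)  # one row of coefficients per perturbation
```

Each row of `emb` has one entry per input feature, so the embedding dimensionality equals the number of features given to the classifier.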

Methods table

add(adata, perturbations[, reference_key, ...])

Add perturbations linearly.

compute(adata[, target_col, layer_key, ...])

Fits a logistic regression model to the data and takes the coefficients of the logistic regression model as perturbation embedding.

compute_control_diff(adata[, target_col, ...])

Subtract mean of the control from the perturbation.

label_transfer(adata[, column, target_val, ...])

Impute missing values in the specified column using KNN imputation in the space defined by use_rep.

subtract(adata, perturbations[, ...])

Subtract perturbations linearly.

Methods

add

LRClassifierSpace.add(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')

Add perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to add.

  • reference_key (str) – Perturbation source from which the perturbation summation starts. Defaults to ‘control’.

  • ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the addition of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata) where adata is the AnnData object provided as input but updated using compute_control_diff.
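The shape of the result, (n_perts+1) x dim, can be illustrated with a small NumPy sketch. It assumes one plausible reading of "adding linearly": each perturbation's effect is its row minus the reference row, and the effects are summed onto the reference. The function `add_perturbations` and the toy matrix are hypothetical and do not reproduce pertpy's internals.

```python
import numpy as np


def add_perturbations(emb, names, perturbations, reference_key="control"):
    """Append one row: the reference plus the summed effects of `perturbations`."""
    index = {name: i for i, name in enumerate(names)}
    ref = emb[index[reference_key]]
    new_row = ref.copy()
    for pert in perturbations:
        # Assumption: the effect of a perturbation is its offset from the reference.
        new_row += emb[index[pert]] - ref
    return np.vstack([emb, new_row])


names = ["control", "A", "B"]
emb = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 3.0]])

result = add_perturbations(emb, names, ["A", "B"])
print(result.shape)  # (4, 2): the three input rows plus the combined row
print(result[-1])    # [2. 3.]
```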

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.add(ps_adata, perturbations=["ATF2", "CD86"], reference_key="NT")

compute

LRClassifierSpace.compute(adata, target_col='perturbations', layer_key=None, embedding_key=None, test_split_size=0.2, max_iter=1000)

Fits a logistic regression model to the data and takes the coefficients of the logistic regression model as perturbation embedding.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes

  • target_col (str) – .obs column that stores the perturbations. Defaults to “perturbations”.

  • layer_key (str) – Layer in adata to use. Defaults to None.

  • embedding_key (str) – Key of the embedding in obsm to be used as data for the logistic regression classifier. Can only be specified if layer_key is None. Defaults to None.

  • test_split_size (float) – Fraction of data to put in the test set. Defaults to 0.2.

  • max_iter (int) – Maximum number of iterations taken for the solvers to converge. Defaults to 1000.

Returns:

AnnData object with the logistic regression coefficients as the embedding in X and the perturbations as .obs['perturbations'].

Examples

>>> import pertpy as pt
>>> adata = pt.dt.norman_2019()
>>> rcs = pt.tl.LRClassifierSpace()
>>> pert_embeddings = rcs.compute(adata, embedding_key="X_pca", target_col="perturbation_name")

compute_control_diff

LRClassifierSpace.compute_control_diff(adata, target_col='perturbation', group_col=None, reference_key='control', layer_key=None, new_layer_key='control_diff', embedding_key=None, new_embedding_key='control_diff', all_data=False, copy=False)

Subtract mean of the control from the perturbation.

Parameters:
  • adata (AnnData) – AnnData object of size cells x genes.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

  • group_col (str) – .obs column name that stores the label of the group of each cell. If None, groups are ignored. Defaults to None.

  • reference_key (str) – The key of the control values. Defaults to ‘control’.

  • layer_key (str) – Key of the AnnData layer to use for computation. If None, the X matrix is used. Defaults to None.

  • new_layer_key (str) – Key of the layer in which the results are stored. Defaults to ‘control_diff’.

  • embedding_key (str) – obsm key of the AnnData embedding to use for computation. If None, the X matrix is used. Defaults to None.

  • new_embedding_key (str) – Results are stored in a new embedding in obsm with this key. Defaults to ‘control_diff’.

  • all_data (bool) – If True, do the computation in all data representations (X, all layers and all embeddings). Defaults to False.

  • copy (bool) – If True, returns a new AnnData of the same size with the new column; otherwise, updates the input AnnData object. Defaults to False.

Return type:

AnnData

Returns:

Updated AnnData object.
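The subtraction itself is simple to illustrate on a plain matrix. The sketch below assumes that the control mean is taken over all cells labelled with reference_key and subtracted from every row, with no grouping (the group_col=None case); `control_diff` is a hypothetical helper, not the library function.

```python
import numpy as np


def control_diff(X, labels, reference_key="control"):
    """Subtract the mean of the control rows from every row of X."""
    labels = np.asarray(labels)
    control_mean = X[labels == reference_key].mean(axis=0)
    return X - control_mean


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 8.0], [7.0, 6.0]])
labels = ["control", "control", "A", "B"]

diff = control_diff(X, labels)
print(diff[2])  # [3. 5.]: cell "A" relative to the control mean [2, 3]
```

After the subtraction the control cells average to zero, so every remaining row can be read as an offset from the control state.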

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> diff_adata = ps.compute_control_diff(mdata["rna"], target_col="gene_target", reference_key="NT")

label_transfer

LRClassifierSpace.label_transfer(adata, column='perturbation', target_val='unknown', n_neighbors=5, use_rep='X_umap')

Impute missing values in the specified column using KNN imputation in the space defined by use_rep.

Parameters:
  • adata (AnnData) – The AnnData object containing single-cell data.

  • column (str) – The column name in AnnData object to perform imputation on. Defaults to “perturbation”.

  • target_val (str) – The target value to impute. Defaults to “unknown”.

  • n_neighbors (int) – Number of neighbors to use for imputation. Defaults to 5.

  • use_rep (str) – The key in adata.obsm where the embedding (UMAP, PCA, etc.) is stored. Defaults to ‘X_umap’.

Return type:

None
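A brute-force sketch of KNN-based label imputation is shown below. It assumes a majority vote among the n_neighbors nearest labelled cells by Euclidean distance; the actual neighbor search and tie-breaking in pertpy may differ, and `knn_label_transfer` is a hypothetical helper that returns a new array rather than modifying an AnnData object.

```python
from collections import Counter

import numpy as np


def knn_label_transfer(embedding, labels, target_val="unknown", n_neighbors=5):
    """Replace `target_val` labels by a majority vote of the nearest labelled points."""
    labels = np.asarray(labels, dtype=object)
    known = np.array([label != target_val for label in labels])
    known_emb, known_labels = embedding[known], labels[known]
    result = labels.copy()
    for i in np.flatnonzero(~known):
        # Euclidean distance from the unlabelled point to every labelled point.
        dists = np.linalg.norm(known_emb - embedding[i], axis=1)
        nearest = np.argsort(dists)[:n_neighbors]
        result[i] = Counter(known_labels[nearest]).most_common(1)[0][0]
    return result


embedding = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0.5, 0.5], [10.5, 10.5]],
    dtype=float,
)
labels = ["A", "A", "A", "B", "B", "B", "unknown", "unknown"]

imputed = knn_label_transfer(embedding, labels, n_neighbors=3)
print(list(imputed))  # ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B']
```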

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> import numpy as np
>>> adata = sc.datasets.pbmc68k_reduced()
>>> rng = np.random.default_rng()
>>> adata.obs["perturbation"] = rng.choice(
...     ["A", "B", "C", "unknown"], size=adata.n_obs, p=[0.33, 0.33, 0.33, 0.01]
... )
>>> sc.pp.neighbors(adata)
>>> sc.tl.umap(adata)
>>> ps = pt.tl.PseudobulkSpace()
>>> ps.label_transfer(adata, n_neighbors=5, use_rep="X_umap")

subtract

LRClassifierSpace.subtract(adata, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')

Subtract perturbations linearly. Assumes input of size n_perts x dimensionality.

Parameters:
  • adata (AnnData) – AnnData object of size n_perts x dim.

  • perturbations (Iterable[str]) – Perturbations to subtract.

  • reference_key (str) – Perturbation source from which the perturbation subtraction starts. Defaults to ‘control’.

  • ensure_consistency (bool) – If True, runs differential expression on all data matrices to ensure consistency of linear space.

  • target_col (str) – .obs column name that stores the label of the perturbation applied to each cell. Defaults to ‘perturbation’.

Return type:

tuple[AnnData, AnnData] | AnnData

Returns:

AnnData object of size (n_perts+1) x dim, where the last row is the subtraction of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata) where adata is the AnnData object provided as input but updated using compute_control_diff.
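In a linear space, subtraction can undo a combined perturbation. The sketch below assumes the rows are already expressed as differences from the control (so the control row is zero) and that subtraction starts from the reference_key row; `subtract_perturbations` and the toy values are hypothetical.

```python
import numpy as np


def subtract_perturbations(diff_emb, names, perturbations, reference_key):
    """Append one row: the reference row minus the rows of `perturbations`."""
    index = {name: i for i, name in enumerate(names)}
    new_row = diff_emb[index[reference_key]].copy()
    for pert in perturbations:
        new_row -= diff_emb[index[pert]]
    return np.vstack([diff_emb, new_row])


# Rows are control-difference vectors; "A+B" is the sum of "A" and "B".
names = ["control", "A", "B", "A+B"]
diff_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]])

result = subtract_perturbations(diff_emb, names, ["B"], reference_key="A+B")
print(result[-1])  # [1. 0.]: removing "B" from "A+B" recovers the "A" row
```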

Examples

Example usage with PseudobulkSpace:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.subtract(ps_adata, reference_key="ATF2", perturbations=["BRD4", "CUL3"])