pertpy.tools.PseudobulkSpace
Methods table
| Method | Description |
| --- | --- |
| add | Add perturbations linearly. |
| compute | Determines pseudobulks of an AnnData object. |
| compute_control_diff | Subtract mean of the control from the perturbation. |
| label_transfer | Impute missing values in the specified column using KNN imputation in the space defined by use_rep. |
| subtract | Subtract perturbations linearly. |
Methods
- PseudobulkSpace.add(adata, *, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')
Add perturbations linearly. Assumes input of size n_perts x dimensionality.
- Parameters:
  - adata (AnnData) – AnnData object of size n_perts x dim.
  - perturbations – Labels of the perturbations to add, for example ["ATF2", "CD86"].
  - reference_key (str, default: 'control') – Perturbation source from which the perturbation summation starts.
  - ensure_consistency (bool, default: False) – Whether to run differential expression on all data matrices to ensure consistency of the linear space.
  - target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.
- Return type:
  AnnData, or a tuple of two AnnData objects if ensure_consistency is True
- Returns:
  AnnData object of size (n_perts+1) x dim, where the last row is the addition of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the AnnData object provided as input, updated using compute_control_diff.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.add(ps_adata, perturbations=["ATF2", "CD86"], reference_key="NT")
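To make the arithmetic behind "adding perturbations linearly" concrete, the following is a minimal NumPy sketch on toy data. It is not pertpy's implementation: it assumes every row is already expressed as a difference from control (so the reference row is all zeros), and the helper name add_perturbations and the values are hypothetical.

```python
import numpy as np

# Toy perturbation space: one row per perturbation, three features.
# Assumption: rows are differences from control, so "NT" is all zeros.
labels = ["NT", "ATF2", "CD86"]
space = np.array([
    [0.0, 0.0, 0.0],   # NT (reference)
    [1.0, -2.0, 0.5],  # ATF2
    [0.5, 1.0, -1.0],  # CD86
])


def add_perturbations(space, labels, perturbations, reference_key):
    """Hypothetical helper: start at the reference row and add each perturbation row."""
    index = {label: i for i, label in enumerate(labels)}
    combined = space[index[reference_key]].copy()
    for name in perturbations:
        combined += space[index[name]]
    # Append the combined profile as a new last row: (n_perts + 1) x dim.
    return np.vstack([space, combined])


result = add_perturbations(space, labels, ["ATF2", "CD86"], reference_key="NT")
print(result.shape)  # (4, 3)
print(result[-1])    # combined ATF2 + CD86 profile: 1.5, -1.0, -0.5
```

The output has one more row than the input, which mirrors the (n_perts+1) x dim shape described under Returns.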
- PseudobulkSpace.compute(adata, target_col='perturbation', groups_col=None, layer_key=None, embedding_key=None, mode='sum')
Determines pseudobulks of an AnnData object.
- Parameters:
  - adata (AnnData) – AnnData object of size cells x genes.
  - target_col (str, default: 'perturbation') – .obs column that stores the label of the perturbation applied to each cell.
  - groups_col (str, default: None) – Optional .obs column that stores a grouping label to consider for pseudobulk computation. The summarized expression per perturbation (target_col) and group (groups_col) is computed.
  - layer_key (str, default: None) – If specified, pseudobulk computation uses this layer. Otherwise, computation is done with .X.
  - embedding_key (str, default: None) – obsm key of the AnnData embedding to use for computation. Defaults to the 'X' matrix otherwise.
  - mode (Literal['count_nonzero', 'mean', 'sum', 'var', 'median'], default: 'sum') – Pseudobulk aggregation function.
- Return type:
  AnnData
- Returns:
AnnData object with one observation per perturbation.
Examples
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target")
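For readers unfamiliar with pseudobulking, here is a small NumPy sketch of the aggregation idea: all cells that share a perturbation label are collapsed into one row. It illustrates what the 'sum' and 'mean' modes mean conceptually and is not the library's code; the matrix and labels are made up.

```python
import numpy as np

# Toy cells x genes count matrix and one perturbation label per cell.
counts = np.array([
    [1, 0, 3],
    [2, 1, 0],
    [0, 4, 1],
    [5, 0, 2],
])
perturbation = np.array(["NT", "ATF2", "NT", "ATF2"])

# Map each cell to the row index of its perturbation group (groups are sorted).
groups, group_index = np.unique(perturbation, return_inverse=True)

# "sum" analogue: accumulate every cell's counts into its group's row.
pseudobulk_sum = np.zeros((len(groups), counts.shape[1]))
np.add.at(pseudobulk_sum, group_index, counts)

# "mean" analogue: divide each group's sum by its number of cells.
cells_per_group = np.bincount(group_index)
pseudobulk_mean = pseudobulk_sum / cells_per_group[:, None]

print(groups)           # group order: ATF2, then NT
print(pseudobulk_sum)   # ATF2 row sums to 7, 1, 2; NT row sums to 1, 4, 4
print(pseudobulk_mean)
```

The result has one row per perturbation, matching the "one observation per perturbation" description under Returns.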
- PseudobulkSpace.compute_control_diff(adata, *, target_col='perturbation', group_col=None, reference_key='control', layer_key=None, new_layer_key='control_diff', embedding_key=None, new_embedding_key='control_diff', all_data=False, copy=False)
Subtract mean of the control from the perturbation.
- Parameters:
  - adata (AnnData) – AnnData object of size cells x genes.
  - target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.
  - group_col (str, default: None) – .obs column name that stores the label of the group of each cell. If None, groups are ignored.
  - reference_key (str, default: 'control') – The key of the control values.
  - layer_key (str, default: None) – Key of the AnnData layer to use for computation.
  - new_layer_key (str, default: 'control_diff') – The results are stored in the layer with this key.
  - embedding_key (str, default: None) – obsm key of the AnnData embedding to use for computation.
  - new_embedding_key (str, default: 'control_diff') – Results are stored in a new embedding in obsm with this key.
  - all_data (bool, default: False) – If True, do the computation in all data representations (X, all layers and all embeddings).
  - copy (bool, default: False) – If True, returns a new AnnData of the same size with the new column; otherwise it updates the initial AnnData object.
- Return type:
  AnnData
- Returns:
Updated AnnData object.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> diff_adata = ps.compute_control_diff(mdata["rna"], target_col="gene_target", reference_key="NT")
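The operation itself is a broadcasted subtraction. The sketch below shows it on a toy dense matrix with NumPy, assuming no groups (the group_col=None case). It only illustrates the idea and makes no claim about how pertpy handles sparse data, layers, or embeddings.

```python
import numpy as np

# Toy cells x genes expression matrix with one perturbation label per cell.
expression = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 8.0],
    [7.0, 6.0],
])
perturbation = np.array(["NT", "NT", "ATF2", "CD86"])

# Mean expression profile of the control ("NT") cells.
control_mean = expression[perturbation == "NT"].mean(axis=0)

# Subtract that profile from every cell; broadcasting applies it row by row.
control_diff = expression - control_mean

print(control_mean)  # mean of the two NT cells: 2.0, 3.0
print(control_diff)
```

After the subtraction, the control cells average to zero in every feature, so the remaining values can be read as shifts relative to control.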
- PseudobulkSpace.label_transfer(adata, *, target_column='perturbation', column_uncertainty_score_key='perturbation_transfer_uncertainty', target_val='unknown', neighbors_key='neighbors', **kwargs)
Impute missing values in the specified column using KNN imputation in the space defined by use_rep.
Uncertainty is calculated as the entropy of the label distribution in the neighborhood of the target cell. In other words, a cell whose neighbors all share the same label has an uncertainty of 0, whereas a cell whose neighbors carry many different labels has a high uncertainty.
- Parameters:
  - adata (AnnData) – The AnnData object containing single-cell data.
  - target_column (str, default: 'perturbation') – The column name in adata.obs to perform imputation on.
  - column_uncertainty_score_key (str, default: 'perturbation_transfer_uncertainty') – The column name in adata.obs to store the uncertainty score of the label transfer.
  - target_val (str, default: 'unknown') – The target value to impute.
  - neighbors_key (str, default: 'neighbors') – The key in adata.uns where the neighbors are stored.
- Return type:
Examples
>>> import pertpy as pt
>>> import scanpy as sc
>>> import numpy as np
>>> adata = sc.datasets.pbmc68k_reduced()
>>> # randomly drop 10% of the data annotations
>>> adata.obs["perturbation"] = adata.obs["louvain"].astype(str).copy()
>>> random_cells = np.random.choice(adata.obs.index, int(adata.obs.shape[0] * 0.1), replace=False)
>>> adata.obs.loc[random_cells, "perturbation"] = "unknown"
>>> sc.pp.neighbors(adata)
>>> sc.tl.umap(adata)
>>> ps = pt.tl.PseudobulkSpace()
>>> ps.label_transfer(adata)
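The entropy-based uncertainty described above can be illustrated with a few lines of NumPy. This is a simplified sketch, not the library's code: it counts neighbor labels without any weighting and uses the natural logarithm, both of which are assumptions, and the function name neighborhood_entropy is hypothetical.

```python
import numpy as np


def neighborhood_entropy(neighbor_labels):
    """Shannon entropy (in nats) of the label distribution among a cell's neighbors."""
    _, counts = np.unique(neighbor_labels, return_counts=True)
    probabilities = counts / counts.sum()
    return float(np.sum(probabilities * np.log(1.0 / probabilities)))


# All neighbors agree: the label distribution has a single outcome.
uniform_neighbors = ["ATF2", "ATF2", "ATF2", "ATF2"]
# Every neighbor carries a different label: maximal disagreement for four neighbors.
mixed_neighbors = ["ATF2", "CD86", "BRD4", "CUL3"]

print(neighborhood_entropy(uniform_neighbors))  # 0.0
print(neighborhood_entropy(mixed_neighbors))    # log(4), roughly 1.386
```

A score of 0 therefore marks a confident transfer, while larger values flag cells whose neighborhood disagrees about the label.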
- PseudobulkSpace.subtract(adata, *, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')
Subtract perturbations linearly. Assumes input of size n_perts x dimensionality.
- Parameters:
  - adata (AnnData) – AnnData object of size n_perts x dim.
  - perturbations – Labels of the perturbations to subtract, for example ["BRD4", "CUL3"].
  - reference_key (str, default: 'control') – Perturbation source from which the perturbation subtraction starts.
  - ensure_consistency (bool, default: False) – Whether to run differential expression on all data matrices to ensure consistency of the linear space.
  - target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.
- Return type:
  AnnData, or a tuple of two AnnData objects if ensure_consistency is True
- Returns:
  AnnData object of size (n_perts+1) x dim, where the last row is the subtraction of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the AnnData object provided as input, updated using compute_control_diff.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.subtract(ps_adata, reference_key="ATF2", perturbations=["BRD4", "CUL3"])
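As a rough illustration of the arithmetic, the NumPy sketch below starts from a reference row and subtracts the listed perturbation rows, then appends the result as a new last row. The numbers are invented and the code is an assumption about what "subtract linearly" means, not pertpy's implementation.

```python
import numpy as np

# Toy perturbation space: one row per perturbation, two features.
labels = ["ATF2", "BRD4", "CUL3"]
space = np.array([
    [4.0, 1.0],  # ATF2 (reference)
    [1.0, 0.5],  # BRD4
    [2.0, 1.5],  # CUL3
])
index = {label: i for i, label in enumerate(labels)}

# Start at the reference row and subtract each listed perturbation row.
difference = space[index["ATF2"]].copy()
for name in ["BRD4", "CUL3"]:
    difference -= space[index[name]]

# Append the result as the last row: (n_perts + 1) x dim.
result = np.vstack([space, difference])

print(result.shape)  # (4, 2)
print(result[-1])    # ATF2 - BRD4 - CUL3: 1.0, -1.0
```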