pertpy.tools.LRClassifierSpace
- class LRClassifierSpace
Fits a logistic regression model to the data and takes the feature space as embedding.
We fit one logistic regression model per perturbation. After training, the coefficients of the logistic regression model are used as the feature space. This results in one embedding per perturbation.
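The idea of one classifier per perturbation, with its coefficients reused as the embedding, can be illustrated with a minimal NumPy sketch. This is not pertpy's implementation: the synthetic data, the hypothetical helper `fit_logistic_regression`, and the use of plain gradient descent in place of a library solver are all assumptions made for illustration.

```python
import numpy as np


def fit_logistic_regression(X, y, lr=0.1, n_iter=500):
    """Hypothetical helper: binary logistic regression fitted by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient step on the weights
        b -= lr * np.mean(p - y)                # gradient step on the bias
    return w, b


rng = np.random.default_rng(0)

# Synthetic stand-in for a cells x features matrix with three labels (assumed data).
labels = np.repeat(["control", "pertA", "pertB"], 60)
X = rng.normal(size=(labels.size, 5))
X[labels == "pertA", 0] += 2.0  # pertA shifts feature 0
X[labels == "pertB", 1] += 2.0  # pertB shifts feature 1

embeddings = {}
for pert in np.unique(labels):
    # One binary problem per perturbation: this perturbation vs. all other cells.
    y = (labels == pert).astype(float)
    w, _ = fit_logistic_regression(X, y)
    embeddings[pert] = w  # the coefficient vector serves as the embedding

# One row per perturbation, one column per input feature.
embedding_matrix = np.vstack([embeddings[p] for p in sorted(embeddings)])
print(embedding_matrix.shape)
```

Each row of `embedding_matrix` has the dimensionality of the input features, so perturbations with similar discriminative features end up with similar coefficient vectors.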
Methods table
| Method | Description |
| --- | --- |
| add | Add perturbations linearly. |
| compute | Fits a logistic regression model to the data and takes the coefficients of the logistic regression model as perturbation embedding. |
| compute_control_diff | Subtract mean of the control from the perturbation. |
| label_transfer | Impute missing values in the specified column using KNN imputation in the space defined by use_rep. |
| subtract | Subtract perturbations linearly. |
Methods
- LRClassifierSpace.add(adata, *, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')
Add perturbations linearly. Assumes input of size n_perts x dimensionality.
- Parameters:
  - adata (AnnData) – AnnData object of size n_perts x dim.
  - reference_key (str, default: 'control') – Perturbation source from which the perturbation summation starts.
  - ensure_consistency (bool, default: False) – Whether to run differential expression on all data matrices to ensure consistency of linear space.
  - target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.
- Return type:
- Returns:
AnnData object of size (n_perts+1) x dim, where the last row is the addition of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the input AnnData object updated using compute_control_diff.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.add(ps_adata, perturbations=["ATF2", "CD86"], reference_key="NT")
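As a rough illustration of what "adding perturbations linearly" from a reference can mean on an n_perts x dim matrix, the sketch below starts from the reference row and sums the selected perturbation rows, then appends the result as a new last row. It assumes the rows already express effects relative to the reference. The helper `add_perturbations`, the toy matrix, and the `"+"`-joined name are hypothetical and do not reproduce pertpy's internals.

```python
import numpy as np

# Hypothetical n_perts x dim embedding matrix; row order follows `names`.
names = ["NT", "ATF2", "CD86"]
X = np.array([
    [0.0, 0.0, 0.0],  # NT (reference)
    [1.0, 0.0, 2.0],  # ATF2
    [0.0, 3.0, 1.0],  # CD86
])


def add_perturbations(X, names, perturbations, reference_key):
    """Start at the reference row, add each requested row, append the sum."""
    row = {name: i for i, name in enumerate(names)}
    combined = X[row[reference_key]].copy()
    for pert in perturbations:
        combined += X[row[pert]]
    new_names = names + ["+".join(perturbations)]  # illustrative naming only
    return np.vstack([X, combined]), new_names


X_new, names_new = add_perturbations(X, names, ["ATF2", "CD86"], reference_key="NT")
print(X_new.shape)  # (4, 3): the input rows plus one combined row
print(X_new[-1])    # [1. 3. 3.]
```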
- LRClassifierSpace.compute(adata, target_col='perturbations', layer_key=None, embedding_key=None, test_split_size=0.2, max_iter=1000)
Fits a logistic regression model to the data and takes the coefficients of the logistic regression model as perturbation embedding.
- Parameters:
  - adata (AnnData) – AnnData object of size cells x genes.
  - target_col (str, default: 'perturbations') – .obs column that stores the perturbations.
  - layer_key (str, default: None) – Layer in adata to use.
  - embedding_key (str, default: None) – Key of the embedding in obsm to be used as data for the logistic regression classifier. Can only be specified if layer_key is None.
  - test_split_size (float, default: 0.2) – Fraction of data to put in the test set.
  - max_iter (int, default: 1000) – Maximum number of iterations taken for the solvers to converge.
- Returns:
AnnData object with the logistic regression coefficients as the embedding in X and the perturbations as .obs['perturbations'].
Examples
>>> import pertpy as pt
>>> adata = pt.dt.norman_2019()
>>> rcs = pt.tl.LRClassifierSpace()
>>> pert_embeddings = rcs.compute(adata, embedding_key="X_pca", target_col="perturbation_name")
- LRClassifierSpace.compute_control_diff(adata, *, target_col='perturbation', group_col=None, reference_key='control', layer_key=None, new_layer_key='control_diff', embedding_key=None, new_embedding_key='control_diff', all_data=False, copy=False)
Subtract mean of the control from the perturbation.
- Parameters:
  - adata (AnnData) – AnnData object of size cells x genes.
  - target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.
  - group_col (str, default: None) – .obs column name that stores the label of the group of each cell. If None, ignore groups.
  - reference_key (str, default: 'control') – The key of the control values.
  - layer_key (str, default: None) – Key of the AnnData layer to use for computation.
  - new_layer_key (str, default: 'control_diff') – The results are stored in the given layer.
  - embedding_key (str, default: None) – obsm key of the AnnData embedding to use for computation.
  - new_embedding_key (str, default: 'control_diff') – Results are stored in a new embedding in obsm with this key.
  - all_data (bool, default: False) – If True, do the computation in all data representations (X, all layers and all embeddings).
  - copy (bool, default: False) – If True, returns a new AnnData object of the same size with the new column; otherwise, updates the initial AnnData object.
- Return type:
- Returns:
Updated AnnData object.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> diff_adata = ps.compute_control_diff(mdata["rna"], target_col="gene_target", reference_key="NT")
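The underlying arithmetic of subtracting the control mean can be sketched on a toy matrix as follows. The data and variable names are made up for illustration, and the sketch ignores groups, layers and embeddings. Whether pertpy applies the subtraction to control cells as well is an assumption here.

```python
import numpy as np

# Toy cells x genes matrix with one perturbation label per cell (assumed data).
X = np.array([
    [1.0, 2.0],  # NT
    [3.0, 4.0],  # NT
    [5.0, 8.0],  # ATF2
    [7.0, 6.0],  # CD86
])
labels = np.array(["NT", "NT", "ATF2", "CD86"])

# Mean expression profile of the control (reference) cells.
control_mean = X[labels == "NT"].mean(axis=0)

# Every cell is expressed as its difference from the control mean.
control_diff = X - control_mean

print(control_mean)  # [2. 3.]
print(control_diff)
```

After the subtraction, the control cells average to zero, so each remaining row reads as a shift away from the control state.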
- LRClassifierSpace.label_transfer(adata, *, target_column='perturbation', column_uncertainty_score_key='perturbation_transfer_uncertainty', target_val='unknown', neighbors_key='neighbors', **kwargs)
Impute missing values in the specified column using KNN imputation in the space defined by use_rep.
Uncertainty is calculated as the entropy of the label distribution in the neighborhood of the target cell. In other words, a cell whose neighbors all share the same label has an uncertainty of 0, whereas a cell whose neighbors carry many different labels has a high uncertainty.
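The entropy-based uncertainty described above can be sketched for a single cell as follows. The helper `neighborhood_entropy` is hypothetical, and the choice of the natural logarithm and of unweighted neighbors are assumptions; pertpy's own computation may differ in both respects.

```python
import numpy as np


def neighborhood_entropy(neighbor_labels):
    """Shannon entropy (in nats) of the label distribution among a cell's neighbors."""
    _, counts = np.unique(neighbor_labels, return_counts=True)
    p = counts / counts.sum()  # empirical label frequencies
    return float(-(p * np.log(p)).sum())


# All neighbors agree: the label distribution is a point mass, so entropy is zero.
uniform_neighbors = ["KLF1", "KLF1", "KLF1", "KLF1"]

# Neighbors disagree: the more evenly spread the labels, the higher the entropy.
mixed_neighbors = ["KLF1", "CEBPA", "ETS2", "KLF1"]

print(neighborhood_entropy(uniform_neighbors) == 0.0)
print(neighborhood_entropy(mixed_neighbors) > 0.0)
```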
- Parameters:
  - adata (AnnData) – The AnnData object containing single-cell data.
  - target_column (str, default: 'perturbation') – The column name in adata.obs to perform imputation on.
  - column_uncertainty_score_key (str, default: 'perturbation_transfer_uncertainty') – The column name in adata.obs to store the uncertainty score of the label transfer.
  - target_val (str, default: 'unknown') – The target value to impute.
  - neighbors_key (str, default: 'neighbors') – The key in adata.uns where the neighbors are stored.
- Return type:
Examples
>>> import pertpy as pt
>>> import scanpy as sc
>>> import numpy as np
>>> adata = sc.datasets.pbmc68k_reduced()
>>> # randomly dropout 10% of the data annotations
>>> adata.obs["perturbation"] = adata.obs["louvain"].astype(str).copy()
>>> random_cells = np.random.choice(adata.obs.index, int(adata.obs.shape[0] * 0.1), replace=False)
>>> adata.obs.loc[random_cells, "perturbation"] = "unknown"
>>> sc.pp.neighbors(adata)
>>> sc.tl.umap(adata)
>>> ps = pt.tl.PseudobulkSpace()
>>> ps.label_transfer(adata)
- LRClassifierSpace.subtract(adata, *, perturbations, reference_key='control', ensure_consistency=False, target_col='perturbation')
Subtract perturbations linearly. Assumes input of size n_perts x dimensionality.
- Parameters:
  - adata (AnnData) – AnnData object of size n_perts x dim.
  - reference_key (str, default: 'control') – Perturbation source from which the perturbation subtraction starts.
  - ensure_consistency (bool, default: False) – Whether to run differential expression on all data matrices to ensure consistency of linear space.
  - target_col (str, default: 'perturbation') – .obs column name that stores the label of the perturbation applied to each cell.
- Return type:
- Returns:
AnnData object of size (n_perts+1) x dim, where the last row is the subtraction of the specified perturbations. If ensure_consistency is True, returns a tuple of (new_perturbation, adata), where adata is the input AnnData object updated using compute_control_diff.
Examples
Example usage with PseudobulkSpace:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ps = pt.tl.PseudobulkSpace()
>>> ps_adata = ps.compute(mdata["rna"], target_col="gene_target", groups_col="gene_target")
>>> new_perturbation = ps.subtract(ps_adata, reference_key="ATF2", perturbations=["BRD4", "CUL3"])