pertpy.tools.Mixscale
- class Mixscale
Continuous perturbation scoring for pooled CRISPR screens.
Where Mixscape assigns each cell a binary perturbed/non-perturbed label, Mixscale assigns a continuous perturbation score that reflects how strongly each cell responded. This is useful for CRISPRi/CRISPRa screens where cells show a gradient of responses rather than a clean knockout, and as input to downstream weighted differential expression and pathway analyses.
The method is described in Jiang, Dalgarno et al., “Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens”, Nature Cell Biology (2025). It reproduces the reference implementation from the satijalab/Mixscale R package.
Methods table
| mixscale() | Calculate a continuous perturbation score per cell with the Mixscale method. |
| perturbation_signature() | Calculate perturbation signature. |
Methods
- Mixscale.mixscale(adata, pert_key, control, *, new_class_name='mixscale_score', layer=None, min_de_genes=5, max_de_genes=100, logfc_threshold=0.25, de_layer=None, test_method='wilcoxon', scale=True, split_by=None, pval_cutoff=0.05, fine_mode=False, fine_mode_labels='guide_id', de_genes_by_target=None, harmonize=False, harmonize_min_proportion=0.1, random_state=0, copy=False)
Calculate a continuous perturbation score per cell with the Mixscale method.
For every target gene, the large-effect differentially expressed (DE) genes between its cells and the control cells are determined. The perturbation direction vector (mean perturbed minus mean control over those genes) is computed, and each cell’s perturbation signature is projected onto that vector. The per-cell projection is then standardized against the control distribution. DE genes are detected on all cells pooled, while the direction vector and standardization are computed within each split_by group. The automatic DE detection relies on scanpy.tl.rank_genes_groups() and may select a slightly different gene set than the reference implementation; pass de_genes_by_target to score against a fixed gene set instead.
Run perturbation_signature() first to populate .layers["X_pert"].
- Parameters:
adata (AnnData) – The annotated data object.
pert_key (str) – The column of .obs with target gene labels.
control (str) – Control category from the pert_key column.
new_class_name (str, default: 'mixscale_score') – Name of the score column to be stored in .obs.
layer (str | None, default: None) – Key from adata.layers whose value is used for scoring. If None, .layers["X_pert"] is used.
min_de_genes (int, default: 5) – Required number of DE genes for scoring a perturbation. Perturbations with fewer DE genes are not scored and their cells receive the fallback score of 1.
max_de_genes (int, default: 100) – Maximum number of top DE genes (by adjusted p-value) used for scoring.
logfc_threshold (float, default: 0.25) – Minimum absolute log fold-change for a gene to be considered a large-effect DE gene.
de_layer (str | None, default: None) – Layer used for the DE test. If None, adata.X is used.
test_method (str, default: 'wilcoxon') – Method passed to scanpy.tl.rank_genes_groups() for DE testing.
scale (bool, default: True) – Whether to z-score each gene’s perturbation signature (mean-centered and scaled to unit variance, then clipped at 10) before scoring.
split_by (str | None, default: None) – .obs column with a condition/cell-type annotation. The direction vector and standardization are computed separately within each group, while DE genes are still detected on all cells.
pval_cutoff (float, default: 0.05) – Adjusted p-value cut-off for selecting significant DE genes.
fine_mode (bool, default: False) – If True, DE genes are computed per gRNA (fine_mode_labels) and pooled per target gene, rather than once per target gene.
fine_mode_labels (str, default: 'guide_id') – .obs column with gRNA identifiers, used when fine_mode is True.
de_genes_by_target (Mapping[str, Sequence[str]] | None, default: None) – Optional mapping from target gene to a user-defined list of DE genes. When given, the DE test is skipped entirely and targets absent from the mapping are not scored.
harmonize (bool, default: False) – If True and split_by resolves to more than one group, control cells are subsampled so that their per-group composition matches the perturbed cells before the DE test.
harmonize_min_proportion (float, default: 0.1) – Minimum fraction of control cells that must be retained during harmonization. Groups are dropped until the constraint is met.
random_state (int, default: 0) – Seed for the control subsampling performed during harmonization.
copy (bool, default: False) – Determines whether a copy of adata is returned.
- Returns:
If copy=True, returns the copy of adata with the scores in .obs. Otherwise, writes the scores directly to .obs of the provided adata.
The following fields are added:
adata.obs[new_class_name]: Continuous perturbation score per cell. Control cells receive 0, cells of perturbations that could not be scored receive 1, and all other cells receive the projection standardized against the control distribution. Higher values indicate a stronger response.
adata.uns["mixscale"]: Per target gene and split, a DataFrame with the raw projection (pvec), the cell labels, and the leave-one-out projections (one column per DE gene).
adata.uns["mixscale_de_genes"]: The DE genes used for each target gene.
Examples
Compute continuous perturbation scores:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ms = pt.tl.Mixscale()
>>> ms.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate")
>>> ms.mixscale(mdata["rna"], "gene_target", "NT", layer="X_pert")
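The scoring steps described for mixscale() (direction vector, projection, standardization against controls) can be illustrated with a small NumPy sketch. This is not the pertpy implementation: the function name mixscale_score_sketch, the normalization of the projection by the squared norm of the direction vector, and the use of the sample standard deviation (ddof=1) are assumptions made for illustration. DE gene selection, split_by handling, and the fixed scores of 0 and 1 for control and unscored cells are omitted.

```python
import numpy as np


def mixscale_score_sketch(signature, is_perturbed, is_control):
    """Toy projection score (hypothetical helper, not part of pertpy).

    signature: array of shape (n_cells, n_de_genes) holding the perturbation
    signature restricted to the DE genes of one target gene.
    """
    # Direction vector: mean perturbed minus mean control over the DE genes.
    direction = signature[is_perturbed].mean(axis=0) - signature[is_control].mean(axis=0)
    # Project each cell's signature onto the direction vector
    # (normalizing by the squared norm is an assumption of this sketch).
    projection = signature @ direction / (direction @ direction)
    # Standardize the projections against the control distribution.
    ctrl = projection[is_control]
    return (projection - ctrl.mean()) / ctrl.std(ddof=1)


# Synthetic data: 6 control cells and 4 perturbed cells over 3 DE genes.
rng = np.random.default_rng(0)
signature = rng.normal(size=(10, 3))
is_control = np.arange(10) < 6
is_perturbed = ~is_control
signature[is_perturbed] += 3.0  # shift the perturbed cells away from controls

scores = mixscale_score_sketch(signature, is_perturbed, is_control)
```

By construction, the control cells in this sketch have mean 0 and unit standard deviation after standardization, so a perturbed cell's score can be read as its distance from the control distribution along the perturbation direction.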
- Mixscale.perturbation_signature(adata, pert_key, control, *, ref_selection_mode='nn', split_by=None, n_neighbors=20, use_rep=None, n_dims=15, n_pcs=None, batch_size=None, copy=False, **kwargs)
Calculate perturbation signature.
The perturbation signature is calculated by subtracting the mRNA expression profile of each cell from the averaged mRNA expression profile of the control cells (selected according to ref_selection_mode). The implementation resembles https://satijalab.org/seurat/reference/runmixscape. Note that the original implementation calculates the perturbation signature on unscaled data by default, and we therefore recommend doing the same.
- Parameters:
adata (AnnData) – The annotated data object.
pert_key (str) – The column of .obs with perturbation categories; it should also contain the control category.
control (str) – Name of the control category from the pert_key column.
ref_selection_mode (Literal['nn', 'split_by'], default: 'nn') – Method to select reference cells for the perturbation signature calculation. If nn, the n_neighbors cells from the control pool with the most similar mRNA expression profiles are selected. If split_by, the control cells from the same split in split_by (e.g. indicating biological replicates) are used to calculate the perturbation signature.
split_by (str | None, default: None) – The .obs column to split by if multiple biological replicates exist, so that the perturbation signature is calculated for every replicate separately.
n_neighbors (int, default: 20) – Number of neighbors from the control to use for the perturbation signature.
use_rep (str | None, default: None) – Use the indicated representation. 'X' or any key for .obsm is valid. If None, the representation is chosen automatically: for .n_vars < 50, .X is used, otherwise 'X_pca' is used. If 'X_pca' is not present, it is computed with default parameters.
n_dims (int | None, default: 15) – Number of dimensions to use from the representation to calculate the perturbation signature. If None, use all dimensions.
n_pcs (int | None, default: None) – If PCA representation is used, the number of principal components to compute. If n_pcs==0, use .X if use_rep is None.
batch_size (int | None, default: None) – Size of batch to calculate the perturbation signature. If None, the perturbation signature is calculated in the full mode, requiring more memory. The batched mode is very inefficient for sparse data.
copy (bool, default: False) – Determines whether a copy of the adata is returned.
**kwargs – Additional arguments for the NNDescent class from pynndescent.
- Returns:
If copy=True, returns the copy of adata with the perturbation signature in .layers["X_pert"]. Otherwise, writes the perturbation signature directly to .layers["X_pert"] of the provided adata.
Examples
Calculate perturbation signature for each cell in the dataset:
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ms_pt = pt.tl.Mixscape()
>>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate")
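To make the 'nn' reference selection concrete, the following NumPy sketch computes a perturbation signature as described above: the averaged profile of each cell's nearest control cells minus the cell's own profile. It is an illustration only. The function name perturbation_signature_sketch is hypothetical, and the sketch swaps in a brute-force Euclidean search on the expression matrix itself where the library uses pynndescent on a chosen representation (use_rep, n_dims); splitting and batching are not covered.

```python
import numpy as np


def perturbation_signature_sketch(X, is_control, n_neighbors=20):
    """Toy perturbation signature (hypothetical helper, not part of pertpy).

    X: dense array of shape (n_cells, n_genes).
    is_control: boolean mask marking the control cells.
    """
    control = X[is_control]
    k = min(n_neighbors, control.shape[0])
    # Squared Euclidean distance from every cell to every control cell
    # (brute force; an assumption made to keep the sketch dependency-free).
    d2 = ((X[:, None, :] - control[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest control cells for each cell.
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Averaged expression profile of those control neighbors, per cell.
    reference = control[nearest].mean(axis=1)
    # Averaged control profile minus the cell's own profile.
    return reference - X


# Two control cells and one perturbed cell over two genes.
X = np.array([[0.0, 0.0], [2.0, 2.0], [5.0, 5.0]])
is_control = np.array([True, True, False])
sig = perturbation_signature_sketch(X, is_control, n_neighbors=2)
```

With both control cells used as neighbors, the reference profile is [1, 1] for every cell, so the perturbed cell's signature is [-4, -4]. Note that in this sketch a control cell counts as its own nearest neighbor.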