pertpy.tools.Mixscale

class Mixscale

Continuous perturbation scoring for pooled CRISPR screens.

Where Mixscape assigns each cell a binary perturbed/non-perturbed label, Mixscale assigns a continuous perturbation score that reflects how strongly each cell responded. This is useful for CRISPRi/CRISPRa screens where cells show a gradient of responses rather than a clean knockout, and as input to downstream weighted differential expression and pathway analyses.

The method is described in Jiang, Dalgarno et al., “Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens”, Nature Cell Biology (2025) {cite}`Jiang2025`. It reproduces the reference implementation from the satijalab/Mixscale R package.

Methods table

`mixscale`(adata, pert_key, control, *[, ...])	Calculate a continuous perturbation score per cell with the Mixscale method.
`perturbation_signature`(adata, pert_key, ...)	Calculate perturbation signature.

Methods

Mixscale.mixscale(adata, pert_key, control, *, new_class_name='mixscale_score', layer=None, min_de_genes=5, max_de_genes=100, logfc_threshold=0.25, de_layer=None, test_method='wilcoxon', scale=True, split_by=None, pval_cutoff=0.05, fine_mode=False, fine_mode_labels='guide_id', de_genes_by_target=None, harmonize=False, harmonize_min_proportion=0.1, random_state=0, copy=False)

Calculate a continuous perturbation score per cell with the Mixscale method.

For every target gene, the large-effect differentially expressed (DE) genes between its cells and the control cells are determined first. The perturbation direction vector (mean perturbed minus mean control over those genes) is then computed, and each cell’s perturbation signature is projected onto that vector. Finally, the per-cell projection is standardized against the control distribution. DE genes are detected on all cells pooled, while the direction vector and standardization are computed within each split_by group. The automatic DE detection relies on scanpy.tl.rank_genes_groups() and may select a slightly different gene set than the reference implementation; pass de_genes_by_target to score against a fixed gene set instead.
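The direction-vector, projection, and standardization steps described above can be sketched with NumPy. This is a minimal illustration and not pertpy's actual code: the helper name `sketch_mixscale_score` is hypothetical, the projection is assumed to be normalized by the squared norm of the direction vector, and the standardization is assumed to be a z-score against the control projections. DE gene selection, split_by handling, and the fixed scores for control and unscored cells are left out.

```python
import numpy as np


def sketch_mixscale_score(signature, is_perturbed, is_control):
    """Hypothetical helper: score cells along a perturbation direction.

    signature: (n_cells, n_de_genes) perturbation signature restricted to
    the DE genes of one target gene.
    """
    # Direction vector: mean perturbed minus mean control over the DE genes.
    direction = signature[is_perturbed].mean(axis=0) - signature[is_control].mean(axis=0)
    # Projection of every cell onto the direction vector
    # (assumption: normalized by the squared norm of the vector).
    projection = signature @ direction / (direction @ direction)
    # Standardize against the control distribution (assumption: z-score).
    control_projection = projection[is_control]
    return (projection - control_projection.mean()) / control_projection.std(ddof=1)


# Toy data: 50 control cells and 30 perturbed cells over 8 DE genes.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(50, 8))
perturbed = rng.normal(0.0, 1.0, size=(30, 8))
perturbed[:, :4] -= 2.0  # the perturbation shifts the first four genes

signature = np.vstack([control, perturbed])
is_control = np.arange(signature.shape[0]) < 50
is_perturbed = ~is_control

scores = sketch_mixscale_score(signature, is_perturbed, is_control)
```

In this sketch the control cells keep their standardized projection, whereas the documented method reports 0 for control cells. Perturbed cells that responded strongly end up with larger scores than cells that resemble the controls.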

Run perturbation_signature() first to populate .layers["X_pert"].

Parameters:

adata (AnnData) – The annotated data object.
pert_key (str) – The column of .obs with target gene labels.
control (str) – Control category from the pert_key column.
new_class_name (str, default: 'mixscale_score') – Name of the score column to be stored in .obs.
layer (str | None, default: None) – Key from adata.layers whose value is used for scoring. If None, .layers["X_pert"] is used.
min_de_genes (int, default: 5) – Minimum number of DE genes required to score a perturbation. Perturbations with fewer DE genes are not scored and their cells receive the fallback score of 1.
max_de_genes (int, default: 100) – Maximum number of top DE genes (by adjusted p-value) used for scoring.
logfc_threshold (float, default: 0.25) – Minimum absolute log fold-change for a gene to be considered a large-effect DE gene.
de_layer (str | None, default: None) – Layer used for the DE test. If None, adata.X is used.
test_method (str, default: 'wilcoxon') – Method passed to scanpy.tl.rank_genes_groups() for DE testing.
scale (bool, default: True) – Whether to z-score each gene’s perturbation signature (mean-centered and scaled to unit variance, then clipped at 10) before scoring.
split_by (str | None, default: None) – .obs column with a condition/cell-type annotation. The direction vector and standardization are computed separately within each group, while DE genes are still detected on all cells.
pval_cutoff (float, default: 0.05) – Adjusted p-value cut-off for selecting significant DE genes.
fine_mode (bool, default: False) – If True, DE genes are computed per gRNA (fine_mode_labels) and pooled per target gene, rather than once per target gene.
fine_mode_labels (str, default: 'guide_id') – .obs column with gRNA identifiers, used when fine_mode is True.
de_genes_by_target (Mapping[str, Sequence[str]] | None, default: None) – Optional mapping from target gene to a user-defined list of DE genes. When given, the DE test is skipped entirely and targets absent from the mapping are not scored.
harmonize (bool, default: False) – If True and split_by resolves to more than one group, control cells are subsampled so that their per-group composition matches the perturbed cells before the DE test.
harmonize_min_proportion (float, default: 0.1) – Minimum fraction of control cells that must be retained during harmonization. Groups are dropped until the constraint is met.
random_state (int, default: 0) – Seed for the control subsampling performed during harmonization.
copy (bool, default: False) – Determines whether a copy of adata is returned.
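How pval_cutoff, logfc_threshold, max_de_genes, and min_de_genes interact can be sketched in plain Python. The helper `select_de_genes` and the gene names are hypothetical, and the comparison operators (strict `<` for the p-value, `>=` for the fold-change) are assumptions; the actual selection happens inside mixscale() on the output of scanpy.tl.rank_genes_groups().

```python
def select_de_genes(de_results, *, pval_cutoff=0.05, logfc_threshold=0.25,
                    min_de_genes=5, max_de_genes=100):
    """Hypothetical helper mirroring the documented DE gene filters.

    de_results: iterable of (gene, adjusted_pvalue, logfc) tuples.
    Returns the selected genes, or None if the perturbation cannot be scored.
    """
    # Keep significant genes with a large enough absolute log fold-change.
    significant = [
        (gene, padj)
        for gene, padj, logfc in de_results
        if padj < pval_cutoff and abs(logfc) >= logfc_threshold
    ]
    # Rank by adjusted p-value and keep at most max_de_genes.
    significant.sort(key=lambda item: item[1])
    genes = [gene for gene, _ in significant[:max_de_genes]]
    # Too few DE genes: the perturbation is not scored.
    return genes if len(genes) >= min_de_genes else None


de_results = [
    ("GENE_A", 1e-8, 1.2),
    ("GENE_B", 1e-3, -0.9),
    ("GENE_C", 0.20, 2.0),  # fails the adjusted p-value cut-off
    ("GENE_D", 1e-5, 0.1),  # fails the log fold-change threshold
    ("GENE_E", 1e-6, -0.5),
]

selected = select_de_genes(de_results, min_de_genes=2, max_de_genes=2)
too_few = select_de_genes(de_results, min_de_genes=5)
```

Here `selected` holds the two most significant genes that pass both filters, and `too_few` is None because only three genes pass while five are required. In mixscale(), cells of such a perturbation receive the fallback score of 1.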

Returns:

If copy=True, returns the copy of adata with the scores in .obs. Otherwise, writes the scores directly to .obs of the provided adata.

The following fields are added:

adata.obs[new_class_name]: Continuous perturbation score per cell. Control cells receive 0, cells of perturbations that could not be scored receive 1, and all other cells receive the projection standardized against the control distribution. Higher values indicate a stronger response.
adata.uns["mixscale"]: Per target gene and split, a DataFrame with the raw projection (pvec), the cell labels, and the leave-one-out projections (one column per DE gene).
adata.uns["mixscale_de_genes"]: The DE genes used for each target gene.

Examples

Compute continuous perturbation scores:

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> ms = pt.tl.Mixscale()
>>> ms.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate")
>>> ms.mixscale(mdata["rna"], "gene_target", "NT", layer="X_pert")

Mixscale.perturbation_signature(adata, pert_key, control, *, ref_selection_mode='nn', split_by=None, n_neighbors=20, use_rep=None, n_dims=15, n_pcs=None, batch_size=None, copy=False, **kwargs)

Calculate perturbation signature.

The perturbation signature is calculated by subtracting the mRNA expression profile of each cell from the averaged mRNA expression profile of the control cells (selected according to ref_selection_mode). The implementation resembles https://satijalab.org/seurat/reference/runmixscape. Note that in the original implementation, the perturbation signature is calculated on unscaled data by default, and we therefore recommend doing the same.
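The subtraction described above can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: the helper `sketch_perturbation_signature` is hypothetical and averages over all control cells, whereas the default ref_selection_mode='nn' averages over each cell's nearest control neighbors instead.

```python
import numpy as np


def sketch_perturbation_signature(expr, is_control):
    """Hypothetical helper: control average minus each cell's expression.

    expr: (n_cells, n_genes) unscaled expression matrix.
    """
    # Assumption: one global control average instead of per-cell neighbors.
    control_mean = expr[is_control].mean(axis=0)
    # Subtract each cell's profile from the averaged control profile.
    return control_mean - expr


expr = np.array([
    [1.0, 2.0],  # control cell
    [3.0, 4.0],  # control cell
    [5.0, 1.0],  # perturbed cell
])
is_control = np.array([True, True, False])

signature = sketch_perturbation_signature(expr, is_control)
```

With a global control average, the signatures of the control cells cancel out on average, so what remains for a perturbed cell is its deviation from the control profile.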