pertpy.tools.Mixscape#
- class Mixscape[source]#
identify perturbation effects in CRISPR screens by separating cells into perturbation groups.
Methods table#
|
Linear Discriminant Analysis on pooled CRISPR screen data. |
|
Identify perturbed and non-perturbed gRNA expressing cells that accounts for multiple treatments/conditions/chemical perturbations. |
|
Calculate perturbation signature. |
|
Barplot to visualize perturbation scores calculated by the mixscape function. |
|
Heatmap plot using mixscape results. |
|
Visualizing perturbation responses with Linear Discriminant Analysis. |
|
Density plots to visualize perturbation scores calculated by the pt.tl.mixscape function. |
|
Violin plot using mixscape results. |
Methods#
- Mixscape.lda(adata, pert_key, control, *, mixscape_class_global='mixscape_class_global', layer=None, n_comps=10, min_de_genes=5, logfc_threshold=0.25, test_method='wilcoxon', split_by=None, pval_cutoff=0.05, perturbation_type='KO', copy=False)[source]#
Linear Discriminant Analysis on pooled CRISPR screen data. Requires pt.tl.mixscape() to be run first.
- Parameters:
adata (
AnnData) – The annotated data object.pert_key (
str) – The column of .obs with target gene labels.control (
str) – Control category from the pert_key column.mixscape_class_global (
str|None, default:'mixscape_class_global') – The column of .obs with mixscape global classification result (perturbed, NP or NT).layer (
str|None, default:None) – Layer to use for identifying differentially expressed genes. If None, adata.X is used.n_comps (
int|None, default:10) – Number of principal components to use.min_de_genes (
int|None, default:5) – Required number of genes that are differentially expressed for method to separate perturbed and non-perturbed cells.logfc_threshold (
float|None, default:0.25) – Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.test_method (
str|None, default:'wilcoxon') – Method to use for differential expression testing.split_by (
str|None, default:None) – Provide .obs column with experimental condition/cell type annotation, if perturbations are condition/cell type-specific.pval_cutoff (
float|None, default:0.05) – P-value cut-off for selection of significantly DE genes.perturbation_type (
str|None, default:'KO') – Specify type of CRISPR perturbation expected for labeling mixscape classifications.copy (
bool|None, default:False) – Determines whether a copy of the adata is returned.
- Returns:
If copy=True, returns the copy of adata with the LDA result in .uns. Otherwise, writes the results directly to .uns of the provided adata.
mixscape_lda: numpy.ndarray (adata.uns[‘mixscape_lda’]). LDA result.
Examples
Use LDA dimensionality reduction to visualize the perturbation effects:
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate") >>> ms_pt.mixscape(mdata["rna"], "gene_target", "NT", layer="X_pert") >>> ms_pt.lda(mdata["rna"], "gene_target", "NT")
- Mixscape.mixscape(adata, pert_key, control, *, new_class_name='mixscape_class', layer=None, min_de_genes=5, logfc_threshold=0.25, de_layer=None, test_method='wilcoxon', iter_num=10, scale=True, split_by=None, pval_cutoff=0.05, perturbation_type='KO', random_state=0, copy=False, **gmmkwargs)[source]#
Identify perturbed and non-perturbed gRNA expressing cells that accounts for multiple treatments/conditions/chemical perturbations.
The implementation resembles https://satijalab.org/seurat/reference/runmixscape.
- Parameters:
adata (
AnnData) – The annotated data object.pert_key (
str) – The column of .obs with target gene labels.control (
str) – Control category from the labels column.new_class_name (
str|None, default:'mixscape_class') – Name of mixscape classification to be stored in .obs.layer (
str|None, default:None) – Key from adata.layers whose value will be used to perform tests on. Default is using .layers[“X_pert”].min_de_genes (
int|None, default:5) – Required number of genes that are differentially expressed for method to separate perturbed and non-perturbed cells.logfc_threshold (
float|None, default:0.25) – Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.de_layer (
str|None, default:None) – Layer to use for identifying differentially expressed genes. If None, adata.X is used.test_method (
str|None, default:'wilcoxon') – Method to use for differential expression testing.iter_num (
int|None, default:10) – Number of normalmixEM iterations to run if convergence does not occur.scale (
bool|None, default:True) – Scale the data specified in layer before running the GaussianMixture model on it.split_by (
str|None, default:None) – Provide .obs column with experimental condition/cell type annotation, if perturbations are condition/cell type-specific.pval_cutoff (
float|None, default:0.05) – P-value cut-off for selection of significantly DE genes.perturbation_type (
str|None, default:'KO') – specify type of CRISPR perturbation expected for labeling mixscape classifications.random_state (
int|None, default:0) – Random seed for the GaussianMixture model.copy (
bool|None, default:False) – Determines whether a copy of the adata is returned.**gmmkwargs – Passed to custom implementation of scikit-learn Gaussian Mixture Model.
- Returns:
If copy=True, returns the copy of adata with the classification result in .obs. Otherwise, writes the results directly to .obs of the provided adata.
mixscape_class: pandas.Series (adata.obs[‘mixscape_class’]). Classification result with cells being either classified as perturbed (KO, by default) or non-perturbed (NP) based on their target gene class.
mixscape_class_global: pandas.Series (adata.obs[‘mixscape_class_global’]). Global classification result (perturbed, NP or NT).
mixscape_class_p_ko: pandas.Series (adata.obs[‘mixscape_class_p_ko’]). Posterior probabilities used to determine if a cell is KO (default). Name of this item will change to match perturbation_type parameter setting. (>0.5) or NP.
Examples
Calcutate perturbation signature for each cell in the dataset:
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate") >>> ms_pt.mixscape(mdata["rna"], "gene_target", "NT", layer="X_pert")
- Mixscape.perturbation_signature(adata, pert_key, control, *, ref_selection_mode='nn', split_by=None, n_neighbors=20, use_rep=None, n_dims=15, n_pcs=None, batch_size=None, copy=False, **kwargs)[source]#
Calculate perturbation signature.
The perturbation signature is calculated by subtracting the mRNA expression profile of each cell from the averaged mRNA expression profile of the control cells (selected according to ref_selection_mode). The implementation resembles https://satijalab.org/seurat/reference/runmixscape. Note that in the original implementation, the perturbation signature is calculated on unscaled data by default, and we therefore recommend to do the same.
- Parameters:
adata (
AnnData) – The annotated data object.pert_key (
str) – The column of .obs with perturbation categories, should also contain control.control (
str) – Name of the control category from the pert_key column.ref_selection_mode (
Literal['nn','split_by'], default:'nn') – Method to select reference cells for the perturbation signature calculation. If nn, the n_neighbors cells from the control pool with the most similar mRNA expression profiles are selected. If split_by, the control cells from the same split in split_by (e.g. indicating biological replicates) are used to calculate the perturbation signature.split_by (
str|None, default:None) – Provide the column .obs if multiple biological replicates exist to calculate the perturbation signature for every replicate separately.n_neighbors (
int, default:20) – Number of neighbors from the control to use for the perturbation signature.use_rep (
str|None, default:None) – Use the indicated representation. ‘X’ or any key for .obsm is valid. If None, the representation is chosen automatically: For .n_vars < 50, .X is used, otherwise ‘X_pca’ is used. If ‘X_pca’ is not present, it’s computed with default parameters.n_dims (
int|None, default:15) – Number of dimensions to use from the representation to calculate the perturbation signature. If None, use all dimensions.n_pcs (
int|None, default:None) – If PCA representation is used, the number of principal components to compute. If n_pcs==0 use .X if use_rep is None.batch_size (
int|None, default:None) – Size of batch to calculate the perturbation signature. If ‘None’, the perturbation signature is calcuated in the full mode, requiring more memory. The batched mode is very inefficient for sparse data.copy (
bool, default:False) – Determines whether a copy of the adata is returned.**kwargs – Additional arguments for the NNDescent class from pynndescent.
- Returns:
If copy=True, returns the copy of adata with the perturbation signature in .layers[“X_pert”]. Otherwise, writes the perturbation signature directly to .layers[“X_pert”] of the provided adata.
Examples
Calcutate perturbation signature for each cell in the dataset:
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate")
- Mixscape.plot_barplot(adata, guide_rna_column, *, mixscape_class_global='mixscape_class_global', axis_text_x_size=8, axis_text_y_size=6, axis_title_size=8, legend_title_size=8, legend_text_size=8, legend_bbox_to_anchor=None, figsize=(25, 25), return_fig=False)[source]#
Barplot to visualize perturbation scores calculated by the mixscape function.
- Parameters:
adata (
AnnData) – The annotated data object.guide_rna_column (
str) – The column of .obs with guide RNA labels. The target gene labels. The format must be <gene_target>g<#>. Examples are ‘STAT2g1’ and ‘ATF2g1’.mixscape_class_global (
str, default:'mixscape_class_global') – The column of .obs with mixscape global classification result (perturbed, NP or NT).axis_text_x_size (
int, default:8) – Size of the x-axis text.axis_text_y_size (
int, default:6) – Size of the y-axis text.axis_title_size (
int, default:8) – Size of the axis title.legend_title_size (
int, default:8) – Size of the legend title.legend_text_size (
int, default:8) – Size of the legend text.legend_bbox_to_anchor (
tuple[float,float], default:None) – The bbox that the legend will be anchored.figsize (
tuple[float,float], default:(25, 25)) – The size of the figure.return_fig (
bool, default:False) – if True, returns figure of the plot, that can be used for saving.
- Return type:
- Returns:
If return_fig is True, returns the figure, otherwise None.
Examples
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate") >>> ms_pt.mixscape(mdata["rna"], "gene_target", "NT", layer="X_pert") >>> ms_pt.plot_barplot(mdata["rna"], guide_rna_column="NT")
- Preview:
- Mixscape.plot_heatmap(adata, labels, target_gene, control, *, layer=None, method='wilcoxon', subsample_number=900, vmin=-2, vmax=2, return_fig=False, **kwds)[source]#
Heatmap plot using mixscape results. Requires pt.tl.mixscape() to be run first.
- Parameters:
adata (
AnnData) – The annotated data object.labels (
str) – The column of .obs with target gene labels.target_gene (
str) – Target gene name to visualize heatmap for.control (
str) – Control category from the pert_key column.layer (
str|None, default:None) – Key from adata.layers whose value will be used to perform tests on.method (
str|None, default:'wilcoxon') – The default method is ‘wilcoxon’, see method parameter in scanpy.tl.rank_genes_groups for more options.subsample_number (
int|None, default:900) – Subsample to this number of observations.vmin (
float|None, default:-2) – The value representing the lower limit of the color scale. Values smaller than vmin are plotted with the same color as vmin.vmax (
float|None, default:2) – The value representing the upper limit of the color scale. Values larger than vmax are plotted with the same color as vmax.return_fig (
bool, default:False) – if True, returns figure of the plot, that can be used for saving.**kwds – Additional arguments to scanpy.pl.rank_genes_groups_heatmap.
- Return type:
- Returns:
If return_fig is True, returns the figure, otherwise None.
Examples
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate") >>> ms_pt.mixscape(mdata["rna"], "gene_target", "NT", layer="X_pert") >>> ms_pt.plot_heatmap( ... adata=mdata["rna"], labels="gene_target", target_gene="IFNGR2", layer="X_pert", control="NT" ... )
- Preview:
- Mixscape.plot_lda(adata, control, *, mixscape_class='mixscape_class', mixscape_class_global='mixscape_class_global', perturbation_type='KO', lda_key='mixscape_lda', n_components=None, color_map=None, palette=None, ax=None, return_fig=False, **kwds)[source]#
Visualizing perturbation responses with Linear Discriminant Analysis. Requires pt.tl.mixscape() to be run first.
- Parameters:
adata (
AnnData) – The annotated data objectplot_heatmap.control (
str) – Control category from the pert_key column.mixscape_class (
str, default:'mixscape_class') – The column of .obs with the mixscape classification result.mixscape_class_global (
str, default:'mixscape_class_global') – The column of .obs with mixscape global classification result (perturbed, NP or NT).perturbation_type (
str|None, default:'KO') – Specify type of CRISPR perturbation expected for labeling mixscape classifications.n_components (
int|None, default:None) – The number of dimensions of the embedding.lda_key (
str|None, default:'mixscape_lda') – If not specified, lda looks .uns[“mixscape_lda”] for the LDA results.color_map (
Colormap|str|None, default:None) – Matplotlib color map.palette (
str|Sequence[str] |None, default:None) – Matplotlib palette.return_fig (
bool, default:False) – if True, returns figure of the plot, that can be used for saving.**kwds – Additional arguments to scanpy.pl.umap.
- Return type:
Examples
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate") >>> ms_pt.mixscape(mdata["rna"], "gene_target", "NT", layer="X_pert") >>> ms_pt.lda(mdata["rna"], "gene_target", "NT", split_by="replicate") >>> ms_pt.plot_lda(adata=mdata["rna"], control="NT")
- Preview:
- Mixscape.plot_perturbscore(adata, labels, target_gene, *, mixscape_class='mixscape_class', color='orange', palette=None, split_by=None, before_mixscape=False, perturbation_type='KO', return_fig=False)[source]#
Density plots to visualize perturbation scores calculated by the pt.tl.mixscape function.
Requires pt.tl.mixscape to be run first.
https://satijalab.org/seurat/reference/plotperturbscore
- Parameters:
adata (
AnnData) – The annotated data object.labels (
str) – The column of .obs with target gene labels.target_gene (
str) – Target gene name to visualize perturbation scores for.mixscape_class (
str, default:'mixscape_class') – The column of .obs with mixscape classifications.color (
str, default:'orange') – Specify color of target gene class or knockout cell class. For control non-targeting and non-perturbed cells, colors are set to different shades of grey.palette (
dict[str,str], default:None) – Optional full color palette to overwrite all colors.split_by (
str, default:None) – Provide the column .obs if multiple biological replicates exist to calculate the perturbation signature for every replicate separately.before_mixscape (
bool, default:False) – Option to split densities based on mixscape classification (default) or original target gene classification. Default is set to NULL and plots cells by original class ID.perturbation_type (
str, default:'KO') – Specify type of CRISPR perturbation expected for labeling mixscape classifications.return_fig (
bool, default:False) – if True, returns figure of the plot, that can be used for saving.
- Return type:
- Returns:
If return_fig is True, returns the figure, otherwise None.
Examples
Visualizing the perturbation scores for the cells in a dataset:
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate") >>> ms_pt.mixscape(mdata["rna"], "gene_target", "NT", layer="X_pert") >>> ms_pt.plot_perturbscore(adata=mdata["rna"], labels="gene_target", target_gene="IFNGR2", color="orange")
- Preview:
- Mixscape.plot_violin(adata, target_gene_idents, *, keys='mixscape_class_p_ko', groupby='mixscape_class', log=False, use_raw=None, stripplot=True, hue=None, jitter=True, size=1, layer=None, scale='width', order=None, multi_panel=None, xlabel='', ylabel=None, rotation=None, ax=None, return_fig=False, **kwargs)[source]#
Violin plot using mixscape results.
Requires pt.tl.mixscape to be run first.
- Parameters:
adata (
AnnData) – The annotated data object.target_gene_idents (
str|list[str]) – Target gene name to plot.keys (
str|Sequence[str], default:'mixscape_class_p_ko') – Keys for accessing variables of .var_names or fields of .obs. Default is ‘mixscape_class_p_ko’.groupby (
str|None, default:'mixscape_class') – The key of the observation grouping to consider. Default is ‘mixscape_class’.log (
bool, default:False) – Plot on logarithmic axis.use_raw (
bool|None, default:None) – Whether to use raw attribute of adata.stripplot (
bool, default:True) – Add a stripplot on top of the violin plot.order (
Sequence[str] |None, default:None) – Order in which to show the categories.xlabel (
str, default:'') – Label of the x-axis. Defaults to groupby if rotation is None, otherwise, no label is shown.ylabel (
str|Sequence[str] |None, default:None) – Label of the y-axis. If None and groupby is None, defaults to ‘value’. If None and groubpy is not None, defaults to keys.ax (
Axes|None, default:None) – A matplotlib axes object. Only works if plotting a single component.return_fig (
bool, default:False) – if True, returns figure of the plot, that can be used for saving.**kwargs – Additional arguments to seaborn.violinplot.
- Return type:
- Returns:
If return_fig is True, returns the figure (as Axes list if it’s a multi-panel plot), otherwise None.
Examples
>>> import pertpy as pt >>> mdata = pt.dt.papalexi_2021() >>> ms_pt = pt.tl.Mixscape() >>> ms_pt.perturbation_signature(mdata["rna"], "perturbation", "NT", split_by="replicate") >>> ms_pt.mixscape(mdata["rna"], "gene_target", "NT", layer="X_pert") >>> ms_pt.plot_violin( ... adata=mdata["rna"], target_gene_idents=["NT", "IFNGR2 NP", "IFNGR2 KO"], groupby="mixscape_class" ... )
- Preview: