pertpy.tools.Dialogue¶

class pertpy.tools.Dialogue(sample_id, celltype_key, n_counts_key, n_mpcs)[source]¶: Python implementation of DIALOGUE

Methods table¶

`calculate_multifactor_PMD`(adata[, ...])	Runs multifactor PMD.
`get_extrema_MCP_genes`(ct_subs[, fraction])	Identifies cells with extreme MCP scores.
`get_mlm_mcp_genes`(celltype, results[, MCP, ...])	Extracts MCP genes from the MCP multilevel modeling object for the cell type of interest.
`multilevel_modeling`(ct_subs, mcp_scores, ...)	Runs the multilevel modeling step to match genes to MCPs and generate p-values for MCPs.
`test_association`(adata, condition_label[, ...])	Tests association between MCPs and a binary response variable (e.g. response to treatment).
`plot_pairplot`(adata, celltype_key, color, ...)	Generate a pairplot visualization for multi-cell perturbation (MCP) data.
`plot_split_violins`(adata, split_key, ...[, ...])	Plots split violin plots for a given MCP and split variable.

Methods¶

calculate_multifactor_PMD¶

Dialogue.calculate_multifactor_PMD(adata, penalties=None, ct_order=None, agg_pca=True, solver='bs', normalize=True)[source]¶

Runs multifactor PMD.

Currently mimics DIALOGUE1.

Parameters:

adata (AnnData) – AnnData object to calculate PMD for.
sample_id – Key to use for pseudobulk determination.
penalties (list[int]) – PMD penalties.
ct_order (list[str]) – The order of cell types.
agg_pca (bool) – Whether to calculate cell-averaged PCA components.
solver (Literal['lp', 'bs']) – Which solver to use for PMD. Must be one of “lp” (linear programming) or “bs” (binary search). For differences between these to please refer to https://github.com/theislab/sparsecca/blob/main/examples/linear_programming_multicca.ipynb Defaults to ‘bs’.
normalize (bool) – Whether to mimic DIALOGUE as close as possible

Return type:

tuple[AnnData, dict[str, ndarray], dict[Any, Any], dict[Any, Any]]

Returns:

MCP scores # TODO this requires more detail

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> adata = pt.dt.dialogue_example()
>>> sc.pp.pca(adata)
>>> dl = pt.tl.Dialogue(
...     sample_id="clinical.status", celltype_key="cell.subtypes", n_counts_key="nCount_RNA", n_mpcs=3
... )
>>> adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(adata, normalize=True)

get_extrema_MCP_genes¶

Dialogue.get_extrema_MCP_genes(ct_subs, fraction=0.1)[source]¶

Identifies cells with extreme MCP scores.

Takes as input a dictionary of subpopulations AnnData objects (DIALOGUE output), For each MCP it identifies cells with extreme MCP scores, then calls rank_genes_groups to identify genes which are differentially expressed between high-scoring and low-scoring cells.

Parameters:

ct_subs (dict) – Dialogue output ct_subs dictionary
fraction (float) – Fraction of extreme cells to consider for gene ranking. Should be between 0 and 1. Defaults to 0.1.

Returns:

Nested dictionary where keys of the first level are MCPs (of the form “mcp_0” etc) and the second level keys are cell types. The values are dataframes containing the results of the rank_genes_groups analysis.

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> adata = pt.dt.dialogue_example()
>>> sc.pp.pca(adata)
>>> dl = pt.tl.Dialogue(sample_id = "clinical.status", celltype_key = "cell.subtypes",                 n_counts_key = "nCount_RNA", n_mpcs = 3)
>>> adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(adata, normalize=True)
>>> extrema_mcp_genes = dl.get_extrema_MCP_genes(ct_subs)

get_mlm_mcp_genes¶

Dialogue.get_mlm_mcp_genes(celltype, results, MCP='mcp_0', threshold=0.7, focal_celltypes=None)[source]¶

Extracts MCP genes from the MCP multilevel modeling object for the cell type of interest.

Parameters:

celltype (str) – Cell type of interest.
results (dict) – dl.MultilevelModeling result object.
MCP (str) – MCP key of the result object.
threshold (float) – Number between [0,1]. The fraction of cell types compared against which must have the associated MCP gene. Defaults to 0.70.
focal_celltypes (list[str] | None) – None (compare against all cell types) or a list of other cell types which you want to compare against. Defaults to None.

Returns:

Dict with keys ‘up_genes’ and ‘down_genes’ and values of lists of genes

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> adata = pt.dt.dialogue_example()
>>> sc.pp.pca(adata)
>>> dl = pt.tl.Dialogue(sample_id = "clinical.status", celltype_key = "cell.subtypes",                 n_counts_key = "nCount_RNA", n_mpcs = 3)
>>> adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(adata, normalize=True)
>>> all_results, new_mcps = dl.multilevel_modeling(ct_subs=ct_subs, mcp_scores=mcps, ws_dict=ws,                 confounder="gender")
>>> mcp_genes = dl.get_mlm_mcp_genes(celltype='Macrophages', results=all_results)

multilevel_modeling¶

Dialogue.multilevel_modeling(ct_subs, mcp_scores, ws_dict, confounder, formula=None)[source]¶

Runs the multilevel modeling step to match genes to MCPs and generate p-values for MCPs.

Parameters:

ct_subs (dict) – The DIALOGUE cell type objects.
mcp_scores (dict) – The determined MCP scores from the PMD step.
confounder (str | None) – Any modeling confounders.
formula (str) – The hierarchical modeling formula. Defaults to y ~ x + n_counts.

Returns:

for each mcp: HLM_result_1, HLM_result_2, sig_genes_1, sig_genes_2
merged HLM_result_1, HLM_result_2, sig_genes_1, sig_genes_2 of all mcps

Return type:

A Pandas DataFrame containing

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> adata = pt.dt.dialogue_example()
>>> sc.pp.pca(adata)
>>> dl = pt.tl.Dialogue(sample_id = "clinical.status", celltype_key = "cell.subtypes",                 n_counts_key = "nCount_RNA", n_mpcs = 3)
>>> adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(adata, normalize=True)
>>> all_results, new_mcps = dl.multilevel_modeling(ct_subs=ct_subs, mcp_scores=mcps, ws_dict=ws,                 confounder="gender")

plot_pairplot¶

Dialogue.plot_pairplot(adata, celltype_key, color, sample_id, mcp='mcp_0', return_fig=None, show=None, save=None)[source]¶

Generate a pairplot visualization for multi-cell perturbation (MCP) data.

Computes the mean of a specified MCP feature (mcp) for each combination of sample and cell type, then creates a pairplot to visualize the relationships between these mean MCP values.

Parameters:

adata (AnnData) – Annotated data object.
celltype_key (str) – Key in adata.obs containing cell type annotations.
color (str) – Key in adata.obs for color annotations. This parameter is used as the hue
sample_id (str) – Key in adata.obs for the sample annotations.
mcp (str) – Key in adata.obs for MCP feature values. Defaults to “mcp_0”.

Return type:

PairGrid | Figure | None

Returns:

Seaborn Pairgrid object.

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> adata = pt.dt.dialogue_example()
>>> sc.pp.pca(adata)
>>> dl = pt.tl.Dialogue(sample_id = "clinical.status", celltype_key = "cell.subtypes",                 n_counts_key = "nCount_RNA", n_mpcs = 3)
>>> adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(adata, normalize=True)
>>> dl.plot_pairplot(adata, celltype_key="cell.subtypes", color="gender", sample_id="clinical.status")

Preview:

plot_split_violins¶

Dialogue.plot_split_violins(adata, split_key, celltype_key, split_which=None, mcp='mcp_0', return_fig=None, ax=None, save=None, show=None)[source]¶

Plots split violin plots for a given MCP and split variable.

Any cells with a value for split_key not in split_which are removed from the plot.

Parameters:

adata (AnnData) – Annotated data object.
split_key (str) – Variable in adata.obs used to split the data.
celltype_key (str) – Key for cell type annotations.
split_which (tuple[str, str]) – Which values of split_key to plot. Required if more than 2 values in split_key.
mcp (str) – Key for MCP data. Defaults to “mcp_0”.

Return type:

Axes | Figure | None

Returns:

A Axes object

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> adata = pt.dt.dialogue_example()
>>> sc.pp.pca(adata)
>>> dl = pt.tl.Dialogue(sample_id = "clinical.status", celltype_key = "cell.subtypes",                 n_counts_key = "nCount_RNA", n_mpcs = 3)
>>> adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(adata, normalize=True)
>>> dl.plot_split_violins(adata, split_key='gender', celltype_key='cell.subtypes')

Preview:

test_association¶

Dialogue.test_association(adata, condition_label, conditions_compare=None)[source]¶

Tests association between MCPs and a binary response variable (e.g. response to treatment).

Note: benjamini-hochberg corrects for the number of cell types, NOT the number of MCPs

Parameters:

adata (AnnData) – AnnData object with MCPs in obs
condition_label (str) – Column name in adata.obs with condition labels. Must be categorical.
conditions_compare (tuple[str, str]) – Tuple of length 2 with the two conditions to compare, must be in adata.obs[condition_label]

Returns:

Dict of data frames with pvals, tstats, and pvals_adj for each MCP

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> adata = pt.dt.dialogue_example()
>>> sc.pp.pca(adata)
>>> dl = pt.tl.Dialogue(sample_id = "clinical.status", celltype_key = "cell.subtypes",                 n_counts_key = "nCount_RNA", n_mpcs = 3)
>>> adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(adata, normalize=True)
>>> stats = dl.test_association(adata, condition_label="pathology")