pertpy.tools.Sccoda

class pertpy.tools.Sccoda(*args, **kwargs)[source]

Statistical model for single-cell differential composition analysis with specification of a reference cell type. This is the standard scCODA model and recommended for all uses.

The hierarchical formulation of the model for one sample is:

\[\begin{split}y|x &\sim DirMult(\phi, \bar{y}) \\ \log(\phi) &= \alpha + x \beta \\ \alpha_k &\sim N(0, 5) \quad &\forall k \in [K] \\ \beta_{m, \hat{k}} &= 0 &\forall m \in [M]\\ \beta_{m, k} &= \tau_{m, k} \tilde{\beta}_{m, k} \quad &\forall m \in [M], k \in \{[K] \smallsetminus \hat{k}\} \\ \tau_{m, k} &= \frac{\exp(t_{m, k})}{1+ \exp(t_{m, k})} \quad &\forall m \in [M], k \in \{[K] \smallsetminus \hat{k}\} \\ \frac{t_{m, k}}{50} &\sim N(0, 1) \quad &\forall m \in [M], k \in \{[K] \smallsetminus \hat{k}\} \\ \tilde{\beta}_{m, k} &= \sigma_m^2 \cdot \gamma_{m, k} \quad &\forall m \in [M], k \in \{[K] \smallsetminus \hat{k}\} \\ \sigma_m^2 &\sim HC(0, 1) \quad &\forall m \in [M] \\ \gamma_{m, k} &\sim N(0,1) \quad &\forall m \in [M], k \in \{[K] \smallsetminus \hat{k}\} \\\end{split}\]

with y being the cell counts and x the covariates.

For further information, see scCODA is a Bayesian model for compositional single-cell data analysis (Büttner, Ostner et al., NatComms, 2021)

Methods table

credible_effects(data[, modality_key, est_fdr])

Decides which effects of the scCODA model are credible based on an adjustable inclusion probability threshold.

get_effect_df(data[, modality_key])

Get effect dataframe as printed in the extended summary

get_intercept_df(data[, modality_key])

Get intercept dataframe as printed in the extended summary

get_node_df(data[, modality_key])

Get node effect dataframe as printed in the extended summary of a tascCODA model

load(adata, type[, generate_sample_level, ...])

Prepare a MuData object for subsequent processing.

make_arviz(data[, modality_key, rng_key, ...])

Creates arviz object from model results for MCMC diagnosis

model(counts, covariates, n_total, ...)

Implements scCODA model in numpyro

prepare(data, formula[, ...])

Handles data preprocessing, covariate matrix creation, reference selection, and zero count replacement for scCODA.

run_hmc(data[, modality_key, num_samples, ...])

Run standard Hamiltonian Monte Carlo sampling (Neal, 2011) to infer optimal model parameters.

run_nuts(data[, modality_key, num_samples, ...])

Run No-U-turn sampling (Hoffman and Gelman, 2014), an efficient version of Hamiltonian Monte Carlo sampling to infer optimal model parameters.

set_fdr(data, est_fdr[, modality_key])

Direct posterior probability approach to calculate credible effects while keeping the expected FDR at a certain level

set_init_mcmc_states(rng_key, ref_index, ...)

Sets initial MCMC state values for scCODA model

summary(data[, extended, modality_key])

Printing method for the summary.

summary_prepare(sample_adata[, est_fdr])

Generates summary dataframes for intercepts, effects and node-level effect (if using tree aggregation).

plot_boxplots(data, feature_name[, ...])

Grouped boxplot visualization.

plot_draw_effects(data, covariate[, ...])

Plot a tree with colored circles on the nodes indicating significant effects with bar plots which indicate leave-level significant effects.

plot_draw_tree(data[, modality_key, tree, ...])

Plot a tree using input ete3 tree object.

plot_effects_barplot(data[, modality_key, ...])

Barplot visualization for effects.

plot_effects_umap(mdata, effect_name, ...[, ...])

Plot a UMAP visualization colored by effect strength.

plot_rel_abundance_dispersion_plot(data[, ...])

Plots total variance of relative abundance versus minimum relative abundance of all cell types for determination of a reference cell type.

plot_stacked_barplot(data, feature_name[, ...])

Plots a stacked barplot for all levels of a covariate or all samples (if feature_name=="samples").

Methods

credible_effects

Sccoda.credible_effects(data, modality_key='coda', est_fdr=None)[source]
Decides which effects of the scCODA model are credible based on an adjustable inclusion probability threshold.

Note: Parameter est_fdr has no effect for spike-and-slab LASSO selection method

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • est_fdr (float) – Estimated false discovery rate. Must be between 0 and 1. Defaults to None.

Returns:

Credible effect decision series which includes boolean values indicate whether effects are credible under inc_prob_threshold.

Return type:

pd.Series

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch",
>>>                     covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> credible_effects = sccoda.credible_effects(mdata)

get_effect_df

Sccoda.get_effect_df(data, modality_key='coda')

Get effect dataframe as printed in the extended summary

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

Returns:

Effect data frame.

Return type:

pd.DataFrame

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> effects = sccoda.get_effect_df(mdata)

get_intercept_df

Sccoda.get_intercept_df(data, modality_key='coda')

Get intercept dataframe as printed in the extended summary

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

Returns:

Intercept data frame.

Return type:

pd.DataFrame

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> intercepts = sccoda.get_intercept_df(mdata)

get_node_df

Sccoda.get_node_df(data, modality_key='coda')

Get node effect dataframe as printed in the extended summary of a tascCODA model

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

Returns:

Node effect data frame.

Return type:

pd.DataFrame

Examples

>>> import pertpy as pt
>>> adata = pt.dt.tasccoda_example()
>>> tasccoda = pt.tl.Tasccoda()
>>> mdata = tasccoda.load(
>>>     adata, type="sample_level",
>>>     levels_agg=["Major_l1", "Major_l2", "Major_l3", "Major_l4", "Cluster"],
>>>     key_added="lineage", add_level_name=True
>>> )
>>> mdata = tasccoda.prepare(
>>>     mdata, formula="Health", reference_cell_type="automatic", tree_key="lineage", pen_args={"phi": 0}
>>> )
>>> tasccoda.run_nuts(mdata, num_samples=1000, num_warmup=100, rng_key=42)
>>> node_effects = tasccoda.get_node_df(mdata)

load

Sccoda.load(adata, type, generate_sample_level=True, cell_type_identifier=None, sample_identifier=None, covariate_uns=None, covariate_obs=None, covariate_df=None, modality_key_1='rna', modality_key_2='coda')[source]

Prepare a MuData object for subsequent processing. If type is “cell_level”, then create a compositional analysis dataset from the input adata.

When using type="cell_level", adata needs to have a column in adata.obs that contains the cell type assignment. Further, it must contain one column or a set of columns (e.g. subject id, treatment, disease status) that uniquely identify each (statistical) sample. Further covariates (e.g. subject age) can either be specified via addidional column names in adata.obs, a key in adata.uns, or as a separate DataFrame.

Parameters:
  • adata (AnnData) – AnnData object.

  • type (Literal['cell_level', 'sample_level']) – Specify the input adata type, which could be either a cell-level AnnData or an aggregated sample-level AnnData.

  • generate_sample_level (bool) – Whether to generate an AnnData object on the sample level or create an empty AnnData object.

  • cell_type_identifier (str) – If type is “cell_level”, specify column name in adata.obs that specifies the cell types. Defaults to None.

  • sample_identifier (str) – If type is “cell_level”, specify column name in adata.obs that specifies the sample. Defaults to None.

  • covariate_uns (str | None) – If type is “cell_level”, specify key for adata.uns, where covariate values are stored. Defaults to None.

  • covariate_obs (list[str] | None) – If type is “cell_level”, specify list of keys for adata.obs, where covariate values are stored. Defaults to None.

  • covariate_df (DataFrame | None) – If type is “cell_level”, specify dataFrame with covariates. Defaults to None.

  • modality_key_1 (str) – Key to the cell-level AnnData in the MuData object. Defaults to “rna”.

  • modality_key_2 (str) – Key to the aggregated sample-level AnnData object in the MuData object. Defaults to “coda”.

Returns:

MuData object with cell-level AnnData (mudata[modality_key_1]) and aggregated sample-level AnnData (mudata[modality_key_2]).

Return type:

MuData

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch", covariate_obs=["condition"])

make_arviz

Sccoda.make_arviz(data, modality_key='coda', rng_key=None, num_prior_samples=500, use_posterior_predictive=True)[source]

Creates arviz object from model results for MCMC diagnosis

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • rng_key – The rng state used for the prior simulation. If None, a random state will be selected. Defaults to None.

  • num_prior_samples (int) – Number of prior samples calculated. Defaults to 500.

  • use_posterior_predictive (bool) – If True, the posterior predictive will be calculated. Defaults to True.

Returns:

arviz_data with all MCMC information

Return type:

az.InferenceData

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch",
>>>                     covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> arviz_data = sccoda.make_arviz(mdata, num_prior_samples=100)

model

Sccoda.model(counts, covariates, n_total, ref_index, sample_adata)[source]

Implements scCODA model in numpyro

Parameters:
  • counts (ndarray) – Count data array

  • covariates (ndarray) – Covariate matrix

  • n_total (ndarray) – Number of counts per sample

  • ref_index – Index of reference feature

  • sample_adata (AnnData) – Anndata object with cell counts as sample_adata.X and covariates saved in sample_adata.obs.

Returns:

predictions (see numpyro documentation for details on models)

plot_boxplots

Sccoda.plot_boxplots(data, feature_name, modality_key='coda', y_scale='relative', plot_facets=False, add_dots=False, cell_types=None, args_boxplot=None, args_swarmplot=None, palette='Blues', show_legend=True, level_order=None, figsize=None, dpi=100, return_fig=None, ax=None, show=None, save=None)

Grouped boxplot visualization.

The cell counts for each cell type are shown as a group of boxplots with intra–group separation by a covariate from data.obs.

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object

  • feature_name (str) – The name of the feature in data.obs to plot

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • y_scale (Literal['relative', 'log', 'log10', 'count']) – Transformation to of cell counts. Options: “relative” - Relative abundance, “log” - log(count), “log10” - log10(count), “count” - absolute abundance (cell counts). Defaults to “relative”.

  • plot_facets (bool) – If False, plot cell types on the x-axis. If True, plot as facets. Defaults to False.

  • add_dots (bool) – If True, overlay a scatterplot with one dot for each data point. Defaults to False.

  • cell_types (list | None) – Subset of cell types that should be plotted. Defaults to None.

  • args_boxplot (dict | None) – Arguments passed to sns.boxplot. Defaults to {}.

  • args_swarmplot (dict | None) – Arguments passed to sns.swarmplot. Defaults to {}.

  • figsize (tuple[float, float] | None) – Figure size. Defaults to None.

  • dpi (int | None) – Dpi setting. Defaults to 100.

  • palette (str | None) – The seaborn color map for the barplot. Defaults to “Blues”.

  • show_legend (bool | None) – If True, adds a legend. Defaults to True.

  • level_order (list[str]) – Custom ordering of bars on the x-axis. Defaults to None.

Return type:

Axes | Figure | FacetGrid | None

Returns:

Depending on plot_facets, returns a Axes (plot_facets = False) or FacetGrid (plot_facets = True) object

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> sccoda.plot_boxplots(mdata, feature_name="condition", add_dots=True)
Preview:
../../_images/sccoda_boxplots.png

plot_draw_effects

Sccoda.plot_draw_effects(data, covariate, modality_key='coda', tree='tree', show_legend=None, show_leaf_effects=False, tight_text=False, show_scale=False, units='px', figsize=(None, None), dpi=100, show=True, save=None)

Plot a tree with colored circles on the nodes indicating significant effects with bar plots which indicate leave-level significant effects.

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • covariate (str) – The covariate, whose effects should be plotted.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • tree (str) – A ete3 tree object or a str to indicate the tree stored in .uns. Defaults to “tree”.

  • show_legend (bool | None) – If show legend of nodes significant effects or not. Defaults to False if show_leaf_effects is True.

  • show_leaf_effects (bool | None) – If True, plot bar plots which indicate leave-level significant effects. Defaults to False.

  • tight_text (bool | None) – When False, boundaries of the text are approximated according to general font metrics, producing slightly worse aligned text faces but improving the performance of tree visualization in scenes with a lot of text faces. Defaults to False.

  • show_scale (bool | None) – Include the scale legend in the tree image or not. Defaults to False.

  • show (bool | None) – If True, plot the tree inline. If false, return tree and tree_style objects. Defaults to True.

  • file_name – Path to the output image file. valid extensions are .SVG, .PDF, .PNG. Output image can be saved whether show is True or not. Defaults to None.

  • units (Optional[Literal['px', 'mm', 'in']]) – Unit of image sizes. “px”: pixels, “mm”: millimeters, “in”: inches. Defaults to “px”.

  • figsize (tuple[float, float] | None) – Figure size. Defaults to None.

  • dpi (int | None) – Dots per inches. Defaults to 100.

Return type:

TreeNode | None

Returns:

Depending on show, returns ete3.TreeNode and ete3.TreeStyle (show = False) or plot the tree inline (show = False)

Examples

>>> import pertpy as pt
>>> adata = pt.dt.tasccoda_example()
>>> tasccoda = pt.tl.Tasccoda()
>>> mdata = tasccoda.load(
>>>     adata, type="sample_level",
>>>     levels_agg=["Major_l1", "Major_l2", "Major_l3", "Major_l4", "Cluster"],
>>>     key_added="lineage", add_level_name=True
>>> )
>>> mdata = tasccoda.prepare(
>>>     mdata, formula="Health", reference_cell_type="automatic", tree_key="lineage", pen_args={"phi": 0}
>>> )
>>> tasccoda.run_nuts(mdata, num_samples=1000, num_warmup=100, rng_key=42)
>>> tasccoda.plot_draw_effects(mdata, covariate="Health[T.Inflamed]", tree="lineage")
Preview:
../../_images/tasccoda_draw_effects.png

plot_draw_tree

Sccoda.plot_draw_tree(data, modality_key='coda', tree='tree', tight_text=False, show_scale=False, units='px', figsize=(None, None), dpi=100, show=True, save=None)

Plot a tree using input ete3 tree object.

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • tree (str) – A ete3 tree object or a str to indicate the tree stored in .uns. Defaults to “tree”.

  • tight_text (bool | None) – When False, boundaries of the text are approximated according to general font metrics, producing slightly worse aligned text faces but improving the performance of tree visualization in scenes with a lot of text faces. Default to False.

  • show_scale (bool | None) – Include the scale legend in the tree image or not. Defaults to False.

  • show (bool | None) – If True, plot the tree inline. If false, return tree and tree_style objects. Defaults to True.

  • file_name – Path to the output image file. Valid extensions are .SVG, .PDF, .PNG. Output image can be saved whether show is True or not. Defaults to None.

  • units (Optional[Literal['px', 'mm', 'in']]) – Unit of image sizes. “px”: pixels, “mm”: millimeters, “in”: inches. Defaults to “px”.

  • figsize (tuple[float, float] | None) – Figure size. Defaults to None.

  • dpi (int | None) – Dots per inches. Defaults to 100.

Return type:

TreeNode | None

Returns:

Depending on show, returns ete3.TreeNode and ete3.TreeStyle (show = False) or plot the tree inline (show = False)

Examples

>>> import pertpy as pt
>>> adata = pt.dt.tasccoda_example()
>>> tasccoda = pt.tl.Tasccoda()
>>> mdata = tasccoda.load(
>>>     adata, type="sample_level",
>>>     levels_agg=["Major_l1", "Major_l2", "Major_l3", "Major_l4", "Cluster"],
>>>     key_added="lineage", add_level_name=True
>>> )
>>> mdata = tasccoda.prepare(
>>>     mdata, formula="Health", reference_cell_type="automatic", tree_key="lineage", pen_args={"phi": 0}
>>> )
>>> tasccoda.run_nuts(mdata, num_samples=1000, num_warmup=100, rng_key=42)
>>> tasccoda.plot_draw_tree(mdata, tree="lineage")
Preview:
../../_images/tasccoda_draw_tree.png

plot_effects_barplot

Sccoda.plot_effects_barplot(data, modality_key='coda', covariates=None, parameter='log2-fold change', plot_facets=True, plot_zero_covariate=True, plot_zero_cell_type=False, palette=<matplotlib.colors.ListedColormap object>, level_order=None, args_barplot=None, figsize=None, dpi=100, return_fig=None, ax=None, show=None, save=None)

Barplot visualization for effects.

The effect results for each covariate are shown as a group of barplots, with intra–group separation by cell types. The covariates groups can either be ordered along the x-axis of a single plot (plot_facets=False) or as plot facets (plot_facets=True).

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • covariates (str | list | None) – The name of the covariates in data.obs to plot. Defaults to None.

  • parameter (Literal['log2-fold change', 'Final Parameter', 'Expected Sample']) – The parameter in effect summary to plot. Defaults to “log2-fold change”.

  • plot_facets (bool) – If False, plot cell types on the x-axis. If True, plot as facets. Defaults to True.

  • plot_zero_covariate (bool) – If True, plot covariate that have all zero effects. If False, do not plot. Defaults to True.

  • plot_zero_cell_type (bool) – If True, plot cell type that have zero effect. If False, do not plot. Defaults to False.

  • figsize (tuple[float, float] | None) – Figure size. Defaults to None.

  • dpi (int | None) – Figure size. Defaults to 100.

  • palette (str | ListedColormap | None) – The seaborn color map for the barplot. Defaults to cm.tab20.

  • level_order (list[str]) – Custom ordering of bars on the x-axis. Defaults to None.

  • args_barplot (dict | None) – Arguments passed to sns.barplot. Defaults to None.

Return type:

Axes | Figure | FacetGrid | None

Returns:

Depending on plot_facets, returns a Axes (plot_facets = False) or FacetGrid (plot_facets = True) object

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> sccoda.plot_effects_barplot(mdata)
Preview:
../../_images/sccoda_effects_barplot.png

plot_effects_umap

Sccoda.plot_effects_umap(mdata, effect_name, cluster_key, modality_key_1='rna', modality_key_2='coda', color_map=None, palette=None, return_fig=None, ax=None, show=None, save=None, **kwargs)

Plot a UMAP visualization colored by effect strength.

Effect results in .varm of aggregated sample-level AnnData (default is data[‘coda’]) are assigned to cell-level AnnData (default is data[‘rna’]) depending on the cluster they were assigned to.

Parameters:
  • mudata – MuData object.

  • effect_name (str | list | None) – The name of the effect results in .varm of aggregated sample-level AnnData to plot

  • cluster_key (str) – The cluster information in .obs of cell-level AnnData (default is data[‘rna’]). To assign cell types’ effects to original cells.

  • modality_key_1 (str) – Key to the cell-level AnnData in the MuData object. Defaults to “rna”.

  • modality_key_2 (str) – Key to the aggregated sample-level AnnData object in the MuData object. Defaults to “coda”.

  • show (bool) – Whether to display the figure or return axis. Defaults to None.

  • ax (Axes) – A matplotlib axes object. Only works if plotting a single component. Defaults to None.

  • **kwargs – All other keyword arguments are passed to scanpy.plot.umap()

Return type:

Axes | Figure | None

Returns:

If show==False a Axes or a list of it.

Examples

>>> import pertpy as pt
>>> import scanpy as sc
>>> import schist
>>> adata = pt.dt.haber_2017_regions()
>>> sc.pp.neighbors(adata)
>>> schist.inference.nested_model(adata, n_init=100, random_seed=5678)
>>> tasccoda_model = pt.tl.Tasccoda()
>>> tasccoda_data = tasccoda_model.load(adata, type="cell_level",
>>>                 cell_type_identifier="nsbm_level_1",
>>>                 sample_identifier="batch", covariate_obs=["condition"],
>>>                 levels_orig=["nsbm_level_4", "nsbm_level_3", "nsbm_level_2", "nsbm_level_1"],
>>>                 add_level_name=True)
>>> tasccoda_model.prepare(
>>>     tasccoda_data,
>>>     modality_key="coda",
>>>     reference_cell_type="18",
>>>     formula="condition",
>>>     pen_args={"phi": 0, "lambda_1": 3.5},
>>>     tree_key="tree"
>>> )
>>> tasccoda_model.run_nuts(
...     tasccoda_data, modality_key="coda", rng_key=1234, num_samples=10000, num_warmup=1000
... )
>>> tasccoda_model.run_nuts(
...     tasccoda_data, modality_key="coda", rng_key=1234, num_samples=10000, num_warmup=1000
... )
>>> sc.tl.umap(tasccoda_data["rna"])
>>> tasccoda_model.plot_effects_umap(tasccoda_data,
>>>                         effect_name=["effect_df_condition[T.Salmonella]",
>>>                                      "effect_df_condition[T.Hpoly.Day3]",
>>>                                      "effect_df_condition[T.Hpoly.Day10]"],
>>>                                       cluster_key="nsbm_level_1",
>>>                         )
Preview:
../../_images/tasccoda_effects_umap.png

plot_rel_abundance_dispersion_plot

Sccoda.plot_rel_abundance_dispersion_plot(data, modality_key='coda', abundant_threshold=0.9, default_color='Grey', abundant_color='Red', label_cell_types=True, figsize=None, dpi=100, return_fig=None, ax=None, show=None, save=None)

Plots total variance of relative abundance versus minimum relative abundance of all cell types for determination of a reference cell type.

If the count of the cell type is larger than 0 in more than abundant_threshold percent of all samples, the cell type will be marked in a different color.

Parameters:
  • data (AnnData | MuData) – AnnData or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”. Defaults to “coda”.

  • abundant_threshold (float | None) – Presence threshold for abundant cell types. Defaults to 0.9.

  • default_color (str | None) – Bar color for all non-minimal cell types. Defaults to “Grey”.

  • abundant_color (str | None) – Bar color for cell types with abundant percentage larger than abundant_threshold. Defaults to “Red”.

  • label_cell_types (bool) – Label dots with cell type names. Defaults to True.

  • figsize (tuple[float, float] | None) – Figure size. Defaults to None.

  • dpi (int | None) – Dpi setting. Defaults to 100.

  • ax (Axes | None) – A matplotlib axes object. Only works if plotting a single component. Defaults to None.

Return type:

Axes | Figure | None

Returns:

A Axes object

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> sccoda.plot_rel_abundance_dispersion_plot(mdata)
Preview:
../../_images/sccoda_rel_abundance_dispersion_plot.png

plot_stacked_barplot

Sccoda.plot_stacked_barplot(data, feature_name, modality_key='coda', palette=<matplotlib.colors.ListedColormap object>, show_legend=True, level_order=None, figsize=None, dpi=100, return_fig=None, ax=None, show=None, save=None, **kwargs)

Plots a stacked barplot for all levels of a covariate or all samples (if feature_name==”samples”).

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • feature_name (str) – The name of the covariate to plot. If feature_name==”samples”, one bar for every sample will be plotted

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • figsize (tuple[float, float] | None) – Figure size. Defaults to None.

  • dpi (int | None) – Dpi setting. Defaults to 100.

  • palette (ListedColormap | None) – The matplotlib color map for the barplot. Defaults to cm.tab20.

  • show_legend (bool | None) – If True, adds a legend. Defaults to True.

  • level_order (list[str]) – Custom ordering of bars on the x-axis. Defaults to None.

Return type:

Axes | Figure | None

Returns:

A Axes object

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> sccoda.plot_stacked_barplot(mdata, feature_name="samples")
Preview:
../../_images/sccoda_stacked_barplot.png

prepare

Sccoda.prepare(data, formula, reference_cell_type='automatic', automatic_reference_absence_threshold=0.05, modality_key='coda')[source]

Handles data preprocessing, covariate matrix creation, reference selection, and zero count replacement for scCODA.

Parameters:
  • data (AnnData | MuData) – Anndata object with cell counts as sample_adata.X and covariates saved in sample_adata.obs.

  • formula (str) – R-style formula for building the covariate matrix. Categorical covariates are handled automatically, with the covariate value of the first sample being used as the reference category. To set a different level as the base category for a categorical covariate, use “C(<CovariateName>, Treatment(‘<ReferenceLevelName>’))”

  • reference_cell_type (str) – Column name that sets the reference cell type. Reference the name of a column. If “automatic”, the cell type with the lowest dispersion in relative abundance that is present in at least 90% of samlpes will be chosen. Defaults to “automatic”.

  • automatic_reference_absence_threshold (float) – If using reference_cell_type = “automatic”, determine the maximum fraction of zero entries for a cell type to be considered as a possible reference cell type. Defaults to 0.05.

  • modality_key (str) – If data is a MuData object, specify key to the aggregated sample-level AnnData object in the MuData object. Defaults to “coda”.

Return type:

AnnData | MuData

Returns:

Return an AnnData (if input data is an AnnData object) or return a MuData (if input data is a MuData object)

Specifically, parameters have been set:

  • adata.uns[“param_names”] or data[modality_key].uns[“param_names”]: List with the names of all tracked latent model parameters (through npy.sample or npy.deterministic)

  • adata.uns[“scCODA_params”][“model_type”] or data[modality_key].uns[“scCODA_params”][“model_type”]: String indicating the model type (“classic”)

  • adata.uns[“scCODA_params”][“select_type”] or data[modality_key].uns[“scCODA_params”][“select_type”]: String indicating the type of spike_and_slab selection (“spikeslab”)

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch",
>>>                     covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")

run_hmc

Sccoda.run_hmc(data, modality_key='coda', num_samples=20000, num_warmup=5000, rng_key=None, copy=False, *args, **kwargs)

Run standard Hamiltonian Monte Carlo sampling (Neal, 2011) to infer optimal model parameters.

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • num_samples (int) – Number of sampled values after burn-in. Defaults to 20000.

  • num_warmup (int) – Number of burn-in (warmup) samples. Defaults to 5000.

  • rng_key – The rng state used. If None, a random state will be selected. Defaults to None.

  • copy (bool) – Return a copy instead of writing to adata. Defaults to False.

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_hmc(mdata, num_warmup=100, num_samples=1000)

run_nuts

Sccoda.run_nuts(data, modality_key='coda', num_samples=10000, num_warmup=1000, rng_key=0, copy=False, *args, **kwargs)[source]

Run No-U-turn sampling (Hoffman and Gelman, 2014), an efficient version of Hamiltonian Monte Carlo sampling to infer optimal model parameters.

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • num_samples (int) – Number of sampled values after burn-in. Defaults to 10000.

  • num_warmup (int) – Number of burn-in (warmup) samples. Defaults to 1000.

  • rng_key (int) – The rng state used. Defaults to 0.

  • copy (bool) – Return a copy instead of writing to adata. Defaults to False.

Returns:

Calls self.__run_mcmc

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch",
>>>                     covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)

set_fdr

Sccoda.set_fdr(data, est_fdr, modality_key='coda', *args, **kwargs)[source]
Direct posterior probability approach to calculate credible effects while keeping the expected FDR at a certain level

Note: Does not work for spike-and-slab LASSO selection method

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • est_fdr (float) – Desired FDR value.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • args – passed to self.summary_prepare

  • kwargs – passed to self.summary_prepare

Returns:

Adjusts intercept_df and effect_df

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch",
>>>                     covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> sccoda.set_fdr(mdata, est_fdr=0.4)

set_init_mcmc_states

Sccoda.set_init_mcmc_states(rng_key, ref_index, sample_adata)[source]

Sets initial MCMC state values for scCODA model

Parameters:
  • rng_key (None) – RNG value to be set

  • ref_index (ndarray) – Index of reference feature

  • sample_adata (AnnData) – Anndata object with cell counts as sample_adata.X and covariates saved in sample_adata.obs.

Return type:

AnnData

Returns:

Return AnnData object.

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch",
>>>                     covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> adata = sccoda.set_init_mcmc_states(rng_key=42, ref_index=0, sample_adata=mdata["coda"])

summary

Sccoda.summary(data, extended=False, modality_key='coda', *args, **kwargs)[source]

Printing method for the summary.

Parameters:
  • data (AnnData | MuData) – AnnData object or MuData object.

  • extended (bool) – If True, return the extended summary with additional statistics. Defaults to False.

  • modality_key (str) – If data is a MuData object, specify which modality to use. Defaults to “coda”.

  • args – Passed to az.summary

  • kwargs – Passed to az.summary

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> sccoda.summary(mdata)

Examples

>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells,
>>>                     type="cell_level",
>>>                     generate_sample_level=True,
>>>                     cell_type_identifier="cell_label",
>>>                     sample_identifier="batch",
>>>                     covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> sccoda.summary(mdata)

summary_prepare

Sccoda.summary_prepare(sample_adata, est_fdr=0.05, *args, **kwargs)
Generates summary dataframes for intercepts, effects and node-level effect (if using tree aggregation).

This function builds on and supports all functionalities from az.summary.

Parameters:
  • sample_adata (AnnData) – Anndata object with cell counts as sample_adata.X and covariates saved in sample_adata.obs.

  • est_fdr (float) – Desired FDR value. Defaults to 0.05.

  • args – Passed to az.summary

  • kwargs – Passed to az.summary

Returns:

Intercept, effect and node-level DataFrames

intercept_df

Summary of intercept parameters. Contains one row per cell type.

  • Final Parameter: Final intercept model parameter

  • HDI X%: Upper and lower boundaries of confidence interval (width specified via hdi_prob=)

  • SD: Standard deviation of MCMC samples

  • Expected sample: Expected cell counts for a sample with no present covariates. See the tutorial for more explanation

effect_df

Summary of effect (slope) parameters. Contains one row per covariate/cell type combination.

  • Final Parameter: Final effect model parameter. If this parameter is 0, the effect is not significant, else it is.

  • HDI X%: Upper and lower boundaries of confidence interval (width specified via hdi_prob=)

  • SD: Standard deviation of MCMC samples

  • Expected sample: Expected cell counts for a sample with only the current covariate set to 1. See the tutorial for more explanation

  • log2-fold change: Log2-fold change between expected cell counts with no covariates and with only the current covariate

  • Inclusion probability: Share of MCMC samples, for which this effect was not set to 0 by the spike-and-slab prior.

node_df

Summary of effect (slope) parameters on the tree nodes (features or groups of features). Contains one row per covariate/cell type combination.

  • Final Parameter: Final effect model parameter. If this parameter is 0, the effect is not significant, else it is.

  • Median: Median of parameter over MCMC chain

  • HDI X%: Upper and lower boundaries of confidence interval (width specified via hdi_prob=)

  • SD: Standard deviation of MCMC samples

  • Delta: Decision boundary value - threshold of practical significance

  • Is credible: Boolean indicator whether effect is credible

Return type:

Tuple[pd.DataFrame, pd.DataFrame] or Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

Examples:
>>> import pertpy as pt
>>> haber_cells = pt.dt.haber_2017_regions()
>>> sccoda = pt.tl.Sccoda()
>>> mdata = sccoda.load(haber_cells, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_label",                 sample_identifier="batch", covariate_obs=["condition"])
>>> mdata = sccoda.prepare(mdata, formula="condition", reference_cell_type="Endocrine")
>>> sccoda.run_nuts(mdata, num_warmup=100, num_samples=1000, rng_key=42)
>>> intercept_df, effect_df = sccoda.summary_prepare(mdata["coda"])