pertpy.tools.EdgeR#

class EdgeR(adata, design, *, mask=None, layer=None, **kwargs)[source]#

Differential expression test using EdgeR.

Attributes table#

data

Get the data matrix from the AnnData object this instance was initialized with (X or layer).

variables

Get the names of the variables used in the model definition.

Methods table#

compare_groups(adata, column, baseline, ...)

Compare between groups in a specified column.

cond(**kwargs)

Get a contrast vector representing a specific condition.

contrast(*args, **kwargs)

Build a simple contrast for pairwise comparisons.

fit(**kwargs)

Fit model using edgeR.

plot_fold_change(results_df, *[, var_names, ...])

Plot a metric from the results as a bar chart, optionally with additional information about paired samples in a scatter plot.

plot_multicomparison_fc(results_df, *[, ...])

Plot a matrix of log2 fold changes from the results.

plot_paired(adata, results_df, groupby, ...)

Creates a pairwise expression plot from a pandas DataFrame or AnnData.

plot_volcano(data, *[, log2fc_col, ...])

Creates a volcano plot from a pandas DataFrame or AnnData.

test_contrasts(contrasts, **kwargs)

Perform a comparison as specified in a contrast vector.

test_reduced(modelB)

Test against a reduced model.

Attributes#

EdgeR.data#

Get the data matrix from the AnnData object this instance was initialized with (X or layer).

EdgeR.variables#

Get the names of the variables used in the model definition.

Methods#

classmethod EdgeR.compare_groups(adata, column, baseline, groups_to_compare, *, paired_by=None, mask=None, layer=None, fit_kwargs=mappingproxy({}), test_kwargs=mappingproxy({}))#

Compare between groups in a specified column.

Parameters:
  • adata (AnnData) – AnnData object.

  • column (str) – Column in obs that contains the grouping information.

  • baseline (str) – Baseline value (one category from column).

  • groups_to_compare (str | Iterable[str]) – One or multiple categories from column to compare against baseline.

  • paired_by (str | None, default: None) – Column from obs that contains information about paired samples (e.g. subject_id).

  • mask (Series | None, default: None) – Subset anndata by a boolean mask stored in this column in .obs before making any tests.

  • layer (str | None, default: None) – Use this layer instead of .X.

  • fit_kwargs (default: mappingproxy({})) – Additional fit options.

  • test_kwargs (default: mappingproxy({})) – Additional test options.

Returns:

Pandas dataframe with results ordered by significance. If multiple comparisons were performed this is indicated in an additional column.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> res_df = edgr.compare_groups(pdata, column="Efficacy", baseline="SD", groups_to_compare=["PR", "PD"])

EdgeR.cond(**kwargs)#

Get a contrast vector representing a specific condition.

Parameters:

**kwargs – column/value pairs.

Returns:

A contrast vector that aligns to the columns of the design matrix.
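To make the idea of a vector that "aligns to the columns of the design matrix" concrete, here is a minimal pure-Python sketch. It assumes treatment (dummy) coding for a design such as ~Efficacy+Treatment. The column names, the reference levels, and the helper cond_vector are illustrative assumptions, not pertpy's internals or its actual output.

```python
# Illustrative only: treatment (dummy) coding for a design like "~Efficacy+Treatment".
# Column names and reference levels below are assumptions, not pertpy output.
DESIGN_COLUMNS = ["Intercept", "Efficacy[T.PR]", "Efficacy[T.SD]", "Treatment[T.Chemo]"]
REFERENCE_LEVELS = {"Efficacy": "PD", "Treatment": "Anti-PD-L1+Chemo"}


def cond_vector(**levels):
    """Hypothetical helper: design-matrix row for one condition.

    Factors that are not mentioned fall back to their reference level.
    """
    row = []
    for col in DESIGN_COLUMNS:
        if col == "Intercept":
            row.append(1.0)
            continue
        # "Efficacy[T.PR]" -> factor "Efficacy", level "PR"
        factor, level = col.rstrip("]").split("[T.")
        chosen = levels.get(factor, REFERENCE_LEVELS[factor])
        row.append(1.0 if chosen == level else 0.0)
    return row


pr = cond_vector(Efficacy="PR")
sd = cond_vector(Efficacy="SD")

# The difference of two condition vectors is a contrast between those conditions.
pr_vs_sd = [a - b for a, b in zip(pr, sd)]
print(pr)        # [1.0, 1.0, 0.0, 0.0]
print(pr_vs_sd)  # [0.0, 1.0, -1.0, 0.0]
```

Subtracting two such vectors cancels the shared intercept and leaves only the columns that distinguish the two conditions.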

EdgeR.contrast(*args, **kwargs)#

Build a simple contrast for pairwise comparisons.

Parameters:
  • column – column in adata.obs to test on.

  • baseline – baseline category (denominator).

  • group_to_compare – category to compare against baseline (numerator).

Returns:

Numeric contrast vector.
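The roles of baseline and group_to_compare determine the sign of the resulting log2 fold change: the baseline sits in the denominator of the ratio. The small worked example below uses made-up mean expression values; the helper log2_fold_change is hypothetical and only illustrates the arithmetic.

```python
import math


def log2_fold_change(group_mean: float, baseline_mean: float) -> float:
    """log2 of (group_to_compare / baseline); positive means higher in group_to_compare."""
    return math.log2(group_mean / baseline_mean)


# Made-up mean expression values for a single gene.
up = log2_fold_change(group_mean=200.0, baseline_mean=50.0)    # 4x higher in the compared group
down = log2_fold_change(group_mean=50.0, baseline_mean=200.0)  # roles swapped, so the sign flips
print(up, down)  # 2.0 -2.0
```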

EdgeR.fit(**kwargs)[source]#

Fit model using edgeR.

Note: this creates its own AnnData object for downstream use.

Parameters:

**kwargs – Keyword arguments specific to glmQLFit()

EdgeR.plot_fold_change(results_df, *, var_names=None, n_top_vars=15, log2fc_col='log_fc', symbol_col='variable', y_label='Log2 fold change', figsize=(10, 5), return_fig=False, **barplot_kwargs)#

Plot a metric from the results as a bar chart, optionally with additional information about paired samples in a scatter plot.

Parameters:
  • results_df (DataFrame) – DataFrame with results from DE analysis.

  • var_names (Sequence[str], default: None) – Variables to plot. If None, the top n_top_vars variables based on the log2 fold change are plotted.

  • n_top_vars (int, default: 15) – Number of variables to plot from each end of the ranking: both the top n_top_vars and the bottom n_top_vars variables are plotted.

  • log2fc_col (str, default: 'log_fc') – Column name of log2 Fold-Change values.

  • symbol_col (str, default: 'variable') – Column name of gene IDs.

  • y_label (str, default: 'Log2 fold change') – Label for the y-axis.

  • figsize (tuple[int, int], default: (10, 5)) – Size of the figure.

  • return_fig (bool, default: False) – If True, returns the figure of the plot, which can be used for saving.

  • **barplot_kwargs – Additional arguments for seaborn.barplot.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.
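When var_names is None, variables are chosen by their log2 fold change. The sketch below shows one plausible way to pick the top and bottom n variables from long-format records. The helper select_top_bottom and the values are hypothetical; pertpy's exact ordering and tie-breaking are not reproduced here.

```python
def select_top_bottom(records, n, log2fc_col="log_fc"):
    """Hypothetical helper: keep the n largest and n smallest log2 fold changes."""
    ranked = sorted(records, key=lambda r: r[log2fc_col], reverse=True)
    if len(ranked) <= 2 * n:
        # Too few rows to split into two disjoint ends; keep everything.
        return ranked
    return ranked[:n] + ranked[-n:]


# Made-up results, using the default column names from the signature above.
records = [
    {"variable": "gene1", "log_fc": 2.5},
    {"variable": "gene2", "log_fc": -1.8},
    {"variable": "gene3", "log_fc": 0.2},
    {"variable": "gene4", "log_fc": 1.1},
    {"variable": "gene5", "log_fc": -0.4},
    {"variable": "gene6", "log_fc": -3.0},
]

selected = [r["variable"] for r in select_top_bottom(records, n=2)]
print(selected)  # ['gene1', 'gene4', 'gene2', 'gene6']
```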

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_fold_change(res_df)

Preview: ../../_images/de_fold_change.png

EdgeR.plot_multicomparison_fc(results_df, *, n_top_vars=15, contrast_col='contrast', log2fc_col='log_fc', pvalue_col='adj_p_value', symbol_col='variable', marker_size=100, figsize=(10, 2), x_label='Contrast', y_label='Gene', return_fig=False, **heatmap_kwargs)#

Plot a matrix of log2 fold changes from the results.

Parameters:
  • results_df (DataFrame) – DataFrame with results from DE analysis.

  • n_top_vars (default: 15) – Number of top variables to plot per group.

  • contrast_col (str, default: 'contrast') – Column in results_df containing information about the contrast.

  • log2fc_col (str, default: 'log_fc') – Column in results_df containing the log2 fold change.

  • pvalue_col (str, default: 'adj_p_value') – Column in results_df containing the p-value. Can be used to switch between adjusted and unadjusted p-values.

  • symbol_col (str, default: 'variable') – Column in results_df containing the gene symbol.

  • marker_size (int, default: 100) – Size of the biggest marker for significant variables.

  • figsize (tuple[int, int], default: (10, 2)) – Size of the figure.

  • x_label (str, default: 'Contrast') – Label for the x-axis.

  • y_label (str, default: 'Gene') – Label for the y-axis.

  • return_fig (bool, default: False) – If True, returns the figure of the plot, which can be used for saving.

  • **heatmap_kwargs – Additional arguments for seaborn.heatmap.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.
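The plotted matrix has one row per variable and one column per contrast. A minimal pure-Python sketch of that reshaping from long-format results follows, using the default column names from the signature above and made-up values. pivot_log_fc is a hypothetical helper, not part of pertpy.

```python
def pivot_log_fc(records, contrast_col="contrast", symbol_col="variable", log2fc_col="log_fc"):
    """Hypothetical helper: reshape long-format rows into {variable: {contrast: log_fc}}."""
    matrix = {}
    for r in records:
        matrix.setdefault(r[symbol_col], {})[r[contrast_col]] = r[log2fc_col]
    return matrix


# Made-up long-format results for two contrasts.
records = [
    {"variable": "geneA", "contrast": "PR", "log_fc": 1.2},
    {"variable": "geneA", "contrast": "PD", "log_fc": -0.3},
    {"variable": "geneB", "contrast": "PR", "log_fc": 0.1},
    {"variable": "geneB", "contrast": "PD", "log_fc": 2.4},
]

matrix = pivot_log_fc(records)
for gene, row in matrix.items():
    print(gene, row)
```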

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> res_df = edgr.compare_groups(pdata, column="Efficacy", baseline="SD", groups_to_compare=["PR", "PD"])
>>> edgr.plot_multicomparison_fc(res_df)

Preview: ../../_images/de_multicomparison_fc.png

EdgeR.plot_paired(adata, results_df, groupby, pairedby, *, var_names=None, n_top_vars=15, layer=None, pvalue_col='adj_p_value', symbol_col='variable', n_cols=4, panel_size=(5, 5), show_legend=True, size=10, y_label='expression', pvalue_template=<function MethodBase.<lambda>>, boxplot_properties=None, palette=None, return_fig=False)#

Creates a pairwise expression plot from a pandas DataFrame or AnnData.

Visualizes a panel of paired scatterplots per variable.

Parameters:
  • adata (AnnData) – AnnData object, can be pseudobulked.

  • results_df (DataFrame) – DataFrame with results from a differential expression test.

  • groupby (str) – .obs column containing the grouping. Must contain exactly two different values.

  • pairedby (str) – .obs column containing the pairing (e.g. “patient_id”). If None, an independent t-test is performed.

  • var_names (Sequence[str], default: None) – Variables to plot.

  • n_top_vars (int, default: 15) – Number of top variables to plot.

  • layer (str, default: None) – Layer to use for plotting.

  • pvalue_col (str, default: 'adj_p_value') – Column name of the p values.

  • symbol_col (str, default: 'variable') – Column name of gene IDs.

  • n_cols (int, default: 4) – Number of columns in the plot.

  • panel_size (tuple[int, int], default: (5, 5)) – Size of each panel.

  • show_legend (bool, default: True) – Whether to show the legend.

  • size (int, default: 10) – Size of the points.

  • y_label (str, default: 'expression') – Label for the y-axis.

  • pvalue_template (default: <function MethodBase.<lambda>>) – Template for the p-value string displayed in the title of each panel.

  • boxplot_properties (default: None) – Additional properties for the boxplot, passed to seaborn.boxplot.

  • palette (default: None) – Color palette for the line- and stripplot.

  • return_fig (bool, default: False) – If True, returns the figure of the plot, which can be used for saving.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_paired(pdata, results_df=res_df, n_top_vars=8, groupby="Treatment", pairedby="Efficacy")

Preview: ../../_images/de_paired_expression.png

EdgeR.plot_volcano(data, *, log2fc_col='log_fc', pvalue_col='adj_p_value', symbol_col='variable', pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, s_curve=False, colors=None, varm_key=None, color_dict=None, shape_dict=None, size_col=None, fontsize=10, top_right_frame=False, figsize=(5, 5), legend_pos=(1.6, 1), point_sizes=(15, 150), shapes=None, shape_order=None, x_label=None, y_label=None, return_fig=False, **kwargs)#

Creates a volcano plot from a pandas DataFrame or AnnData.

Parameters:
  • data (DataFrame | AnnData) – DataFrame or Anndata to plot.

  • log2fc_col (str, default: 'log_fc') – Column name of log2 Fold-Change values.

  • pvalue_col (str, default: 'adj_p_value') – Column name of the p values.

  • symbol_col (str, default: 'variable') – Column name of gene IDs.

  • varm_key (str | None, default: None) – Key in Anndata.varm slot to use for plotting if an Anndata object was passed.

  • size_col (str | None, default: None) – Column name to size points by.

  • point_sizes (tuple[int, int], default: (15, 150)) – Lower and upper bounds of point sizes.

  • pval_thresh (float, default: 0.05) – Threshold p value for significance.

  • log2fc_thresh (float, default: 0.75) – Threshold for log2 fold change significance.

  • to_label (int | list[str], default: 5) – Number of top genes or list of genes to label.

  • s_curve (bool | None, default: False) – Whether to use a reciprocal threshold for up and down gene determination.

  • color_dict (dict[str, list[str]] | None, default: None) – Dictionary for coloring dots by categories.

  • shape_dict (dict[str, list[str]] | None, default: None) – Dictionary for shaping dots by categories.

  • fontsize (int, default: 10) – Size of gene labels.

  • colors (list[str], default: None) – Colors for [non-DE, up, down] genes. Defaults to [‘gray’, ‘#D62728’, ‘#1F77B4’].

  • top_right_frame (bool, default: False) – Whether to show the top and right frame of the plot.

  • figsize (tuple[int, int], default: (5, 5)) – Size of the figure.

  • legend_pos (tuple[float, float], default: (1.6, 1)) – Position of the legend as determined by matplotlib.

  • shapes (list[str] | None, default: None) – List of matplotlib marker ids.

  • shape_order (list[str] | None, default: None) – Order of categories for shapes.

  • x_label (str | None, default: None) – Label for the x-axis.

  • y_label (str | None, default: None) – Label for the y-axis.

  • return_fig (bool, default: False) – If True, returns the figure of the plot, which can be used for saving.

  • **kwargs (int) – Additional arguments for seaborn.scatterplot.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.
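One common way to turn pval_thresh and log2fc_thresh into up, down, and non-DE labels is sketched below. This reflects the general technique only and is an assumption, not a copy of the rule plot_volcano applies internally (which may differ, for example when s_curve=True). The helper classify and the records are made up.

```python
def classify(log_fc, adj_p_value, pval_thresh=0.05, log2fc_thresh=0.75):
    """Hypothetical helper: label a gene using rectangular significance thresholds."""
    significant = adj_p_value < pval_thresh
    if significant and log_fc > log2fc_thresh:
        return "up"
    if significant and log_fc < -log2fc_thresh:
        return "down"
    return "non-DE"


# Made-up results, using the default column names from the signature above.
records = [
    {"variable": "geneA", "log_fc": 2.1, "adj_p_value": 0.001},
    {"variable": "geneB", "log_fc": -1.3, "adj_p_value": 0.01},
    {"variable": "geneC", "log_fc": 0.4, "adj_p_value": 0.001},  # significant, but small effect
    {"variable": "geneD", "log_fc": 3.0, "adj_p_value": 0.2},    # large effect, not significant
]

labels = {r["variable"]: classify(r["log_fc"], r["adj_p_value"]) for r in records}
print(labels)
```

With log2fc_thresh=0, as in the example that follows, only the p-value threshold and the sign of the fold change would matter under this rule.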

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_volcano(res_df, log2fc_thresh=0)

Preview: ../../_images/de_volcano.png

EdgeR.test_contrasts(contrasts, **kwargs)#

Perform a comparison as specified in a contrast vector.

Parameters:
  • contrasts (ndarray | Mapping[str | None, ndarray]) – Either a numeric contrast vector, or a dictionary of numeric contrast vectors.

  • **kwargs – passed to the respective implementation.

Returns:

A dataframe with the results.
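As a rough illustration of what a contrast vector expresses, the sketch below forms the linear combination of model coefficients selected by each contrast, with several named contrasts held in a mapping. The coefficient values, the column order, and the helper contrast_estimate are invented for illustration; the actual statistical test is carried out by edgeR and is not reproduced here.

```python
# Hypothetical fitted coefficients (log2 scale) for one gene, in design-matrix column order:
# Intercept, Efficacy[T.PR], Efficacy[T.SD], Treatment[T.Chemo]
coefficients = [5.0, 1.5, 0.5, -1.0]

# Several named contrasts can be supplied as a mapping from name to vector.
contrasts = {
    "PR_vs_SD": [0.0, 1.0, -1.0, 0.0],
    "Chemo_vs_reference": [0.0, 0.0, 0.0, 1.0],
}


def contrast_estimate(contrast, coefs):
    """Hypothetical helper: dot product of a contrast vector with the coefficients."""
    return sum(c * b for c, b in zip(contrast, coefs))


estimates = {name: contrast_estimate(vec, coefficients) for name, vec in contrasts.items()}
print(estimates)  # {'PR_vs_SD': 1.0, 'Chemo_vs_reference': -1.0}
```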

EdgeR.test_reduced(modelB)#

Test against a reduced model.

Parameters:

modelB – the reduced model against which to test.

Example

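>>> # Model is a placeholder for a concrete model class; constructor arguments are omitted.
>>> import pertpy as pt
>>> modelA = Model().fit()
>>> modelB = Model().fit()  # the reduced model
>>> modelA.test_reduced(modelB)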