pertpy.preprocessing.GuideAssignment

class GuideAssignment

Assign cells to guide RNAs.

Methods table

assign_by_threshold()

Simple threshold-based gRNA assignment function.

assign_mixture_model()

Assigns gRNAs to cells using a Poisson-Gaussian mixture model.

assign_to_max_guide()

Simple threshold-based max gRNA assignment function.

assign_to_max_guide_anndata(adata, /, *, ...)

assign_to_max_guide_numpy(X, /, *, var, ...)

assign_to_max_guide_sparse(X, /, *, var, ...)

plot_heatmap(adata, *[, layer, order_by, ...])

Heatmap plotting of guide RNA expression matrix.

Methods

GuideAssignment.assign_by_threshold(data, /, *, assignment_threshold, layer=None, output_layer='assigned_guides')
GuideAssignment.assign_by_threshold(adata, /, *, assignment_threshold, layer=None, output_layer='assigned_guides')
GuideAssignment.assign_by_threshold(X, /, *, assignment_threshold)

Simple threshold-based gRNA assignment function.

Each cell is assigned to every gRNA that has at least assignment_threshold counts in that cell. This function expects unnormalized data as input.

Parameters:
  • data (AnnData | ndarray | csr_matrix) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to gRNAs.

  • assignment_threshold (float) – The count threshold that is required for an assignment to be viable.

  • layer (str | None, default: None) – Key to the layer containing raw count values of the gRNAs. adata.X is used if layer is None. Expects count data.

  • output_layer (str, default: 'assigned_guides') – The assigned guides are saved in adata.layers[output_layer].
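
The thresholding rule described above can be sketched on a plain NumPy array. This is a minimal illustration, not pertpy's implementation: the helper name threshold_assignment and the toy count matrix are hypothetical, and the 0/1 output encoding is an assumption.

```python
import numpy as np


def threshold_assignment(counts, assignment_threshold):
    """Mark every (cell, gRNA) pair whose raw count reaches the threshold.

    Hypothetical helper for illustration only; the 0/1 encoding is an assumption.
    """
    counts = np.asarray(counts)
    return (counts >= assignment_threshold).astype(np.int8)


# Toy matrix: 3 cells (rows) x 3 gRNAs (columns) of raw counts.
counts = np.array(
    [
        [0, 7, 1],
        [5, 6, 0],
        [2, 0, 4],
    ]
)
assigned = threshold_assignment(counts, assignment_threshold=5)
print(assigned)
```

Note that the second cell passes the threshold for two gRNAs and the third cell for none, so this rule can yield zero, one, or several guides per cell.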

Examples

Each cell is assigned to every gRNA that occurs at least 5 times in that cell.

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_by_threshold(gdo, assignment_threshold=5)

GuideAssignment.assign_mixture_model(data, /, *, model='poisson_gauss_mixture', layer=None, assigned_guides_key='assigned_guide', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', only_return_results=False, show_progress=False, n_iter=500, learning_rate=0.01, n_init_seeds=10, seed=2024)
GuideAssignment.assign_mixture_model(adata, /, *, model='poisson_gauss_mixture', layer=None, assigned_guides_key='assigned_guide', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', only_return_results=False, show_progress=False, n_iter=500, learning_rate=0.01, n_init_seeds=10, seed=2024)
GuideAssignment.assign_mixture_model(X, /, *, var, model='poisson_gauss_mixture', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', show_progress=False, n_iter=500, learning_rate=0.01, n_init_seeds=10, seed=2024)

Assigns gRNAs to cells using a Poisson-Gaussian mixture model.

The model, priors, and per-guide thresholding rule reproduce crispat.ga_poisson_gauss (Velten group, velten-group/crispat). MAP estimation runs for all guides in parallel on JAX, replacing crispat’s per-guide Pyro SVI loop.

For each guide, the model is fit only to cells with non-zero counts. Log2 counts are modelled as a mixture of a continuous Poisson background and a Gaussian on-target component with priors weights ~ Dirichlet([0.9, 0.1]), mu ~ Normal(3, 2), scale ~ LogNormal(2, 1), lam ~ LogNormal(0, 1). A cell is assigned to a guide if its UMI count is at least the smallest integer t for which P(Normal | log2(t)) > 0.5.
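
The per-guide thresholding rule can be sketched as follows, given already-fitted parameters. This is an illustration under stated assumptions rather than the pertpy or crispat code: the continuous Poisson density is assumed to be the Poisson pmf formula evaluated at real-valued x (unnormalized), the function names are hypothetical, and the parameter values are made up for the example.

```python
import math


def log_continuous_poisson(x, lam):
    # Assumed form: Poisson log-pmf formula evaluated at real-valued x (unnormalized).
    return x * math.log(lam) - lam - math.lgamma(x + 1.0)


def log_normal_pdf(x, mu, scale):
    z = (x - mu) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def gaussian_log_odds(x, w_gauss, mu, scale, lam):
    """Log-odds of the Gaussian component versus the Poisson background at x."""
    log_fg = math.log(w_gauss) + log_normal_pdf(x, mu, scale)
    log_bg = math.log(1.0 - w_gauss) + log_continuous_poisson(x, lam)
    return log_fg - log_bg


def count_threshold(w_gauss, mu, scale, lam, max_count=10_000):
    """Smallest integer count t with P(Normal | log2(t)) > 0.5, or None if not found."""
    for t in range(1, max_count + 1):
        # P(Normal | x) > 0.5 is equivalent to a positive log-odds.
        if gaussian_log_odds(math.log2(t), w_gauss, mu, scale, lam) > 0.0:
            return t
    return None


# Made-up parameter values standing in for one guide's fitted MAP estimates.
threshold = count_threshold(w_gauss=0.1, mu=5.0, scale=1.0, lam=1.0)
print(threshold)
```

Comparing log-odds against zero avoids exponentiating large values while expressing the same P > 0.5 criterion.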

Parameters:
  • data (AnnData | ndarray | csr_matrix) – AnnData with gRNA counts, or a dense or sparse cell-by-guide count matrix.

  • model (Literal['poisson_gauss_mixture'], default: 'poisson_gauss_mixture') – The mixture model to use; currently only "poisson_gauss_mixture" is supported.

  • layer (str | None, default: None) – Layer name to use when data is an AnnData (defaults to X).

  • assigned_guides_key (str, default: 'assigned_guide') – Per-cell assignment is saved on adata.obs[assigned_guides_key].

  • no_grna_assigned_key (str, default: 'negative') – Key to use when a cell is negative for all gRNAs.

  • max_assignments_per_cell (int, default: 5) – Maximum number of gRNAs that can be assigned to a cell.

  • multiple_grna_assigned_key (str, default: 'multiple') – Key to use when more than max_assignments_per_cell gRNAs are assigned.

  • multiple_grna_assignment_string (str, default: '+') – Separator used to join multiple gRNAs assigned to one cell.

  • only_return_results (bool, default: False) – If True, do not modify adata and return the assignment array.

  • show_progress (bool, default: False) – Whether to print a progress line.

  • n_iter (int, default: 500) – Number of optimization steps, corresponding to the steps of crispat's SVI loop (crispat default: 500).

  • learning_rate (float, default: 0.01) – Adam learning rate (crispat default: 0.01).

  • n_init_seeds (int, default: 10) – Number of initializations sampled from the prior per guide; the best one is kept (crispat default: 10).

  • seed (int, default: 2024) – Random seed used for initialization.
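
How the labelling parameters above interact can be sketched in plain Python. This is a sketch based only on the parameter descriptions, not pertpy's code: the helper label_cells, the toy assignment matrix, and joining guide names in column order are assumptions.

```python
def label_cells(
    assigned,
    guide_names,
    no_grna_assigned_key="negative",
    max_assignments_per_cell=5,
    multiple_grna_assigned_key="multiple",
    multiple_grna_assignment_string="+",
):
    """Turn a per-cell 0/1 assignment matrix into one label per cell (hypothetical helper)."""
    labels = []
    for row in assigned:
        hits = [name for name, flag in zip(guide_names, row) if flag]
        if not hits:
            # No gRNA passed its threshold in this cell.
            labels.append(no_grna_assigned_key)
        elif len(hits) > max_assignments_per_cell:
            # More gRNAs than allowed: collapse to a single key.
            labels.append(multiple_grna_assigned_key)
        else:
            # One or a few gRNAs: join their names with the separator.
            labels.append(multiple_grna_assignment_string.join(hits))
    return labels


guide_names = ["g1", "g2", "g3"]
assigned = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
]
labels = label_cells(assigned, guide_names, max_assignments_per_cell=2)
print(labels)
```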

Return type:

ndarray | None

Examples

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_mixture_model(gdo)

GuideAssignment.assign_to_max_guide(data, /, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')
GuideAssignment.assign_to_max_guide(adata, /, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')
GuideAssignment.assign_to_max_guide(X, /, *, var, assignment_threshold, no_grna_assigned_key='Negative')

Simple threshold-based max gRNA assignment function.

Each cell is assigned to its most highly expressed gRNA, provided that gRNA has at least assignment_threshold counts. This function expects unnormalized data as input.

Parameters:
  • data (AnnData | ndarray | csr_matrix) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to gRNAs.

  • assignment_threshold (float) – The count threshold that is required for an assignment to be viable.

  • layer (str | None, default: None) – Key to the layer containing raw count values of the gRNAs. adata.X is used if layer is None. Expects count data.

  • obs_key (str, default: 'assigned_guide') – The assigned guide is saved in adata.obs[obs_key].

  • no_grna_assigned_key (str, default: 'Negative') – The key to return if no gRNA is expressed enough.
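
The max-guide rule can be sketched on a dense NumPy array. This is an illustrative sketch, not pertpy's implementation: the helper max_guide_labels and the toy data are hypothetical, and ties are resolved in favour of the first column because that is how np.argmax behaves.

```python
import numpy as np


def max_guide_labels(counts, guide_names, assignment_threshold, no_grna_assigned_key="Negative"):
    """Label each cell with its top gRNA, or with the fallback key (hypothetical helper)."""
    counts = np.asarray(counts)
    guide_names = np.asarray(guide_names, dtype=object)
    top = counts.argmax(axis=1)  # column index of the most abundant gRNA per cell
    top_counts = counts[np.arange(counts.shape[0]), top]
    # Keep the top gRNA only where its count reaches the threshold.
    return np.where(top_counts >= assignment_threshold, guide_names[top], no_grna_assigned_key)


counts = np.array(
    [
        [0, 7, 1],
        [5, 6, 0],
        [2, 0, 4],
    ]
)
max_labels = max_guide_labels(counts, ["gA", "gB", "gC"], assignment_threshold=5)
print(max_labels)
```

Unlike plain thresholding, each cell receives exactly one label here, even when several gRNAs exceed the threshold.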

Return type:

ndarray | None

Examples

Each cell is assigned to the most expressed gRNA if it has at least 5 counts.

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_to_max_guide(gdo, assignment_threshold=5)

GuideAssignment.assign_to_max_guide_anndata(adata, /, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')
Return type:

None

GuideAssignment.assign_to_max_guide_numpy(X, /, *, var, assignment_threshold, no_grna_assigned_key='Negative')
Return type:

ndarray

GuideAssignment.assign_to_max_guide_sparse(X, /, *, var, assignment_threshold, no_grna_assigned_key='Negative')
Return type:

ndarray

GuideAssignment.plot_heatmap(adata, *, layer=None, order_by=None, key_to_save_order=None, return_fig=False, **kwargs)

Heatmap plotting of guide RNA expression matrix.

Assuming guides have sparse expression, this function reorders cells and plots guide RNA expression so that the heatmap shows a clear sparse structure. The cell ordering can be stored and reused in later plots, so that plots made before and after analysis of the guide RNA expression are consistent. Note: This function expects log-normalized or binary data.

Parameters:
  • adata (AnnData) – Annotated data matrix containing gRNA values.

  • layer (str | None, default: None) – Key to the layer containing log normalized count values of the gRNAs. adata.X is used if layer is None.

  • order_by (ndarray | str | None, default: None) – The order of cells along the y-axis. If None, cells are reordered to give a clear sparse representation. If a string is provided, adata.obs[order_by] is used as the order. If a numpy array is provided, the array is used for ordering.

  • key_to_save_order (str | None, default: None) – The obs key under which the cell order of the current plot is saved. Nothing is saved if None.

  • return_fig (bool, default: False) – If True, returns the figure of the plot, which can then be saved.

  • kwargs – Passed to sc.pl.heatmap.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None. Order of cells in the y-axis will be saved on adata.obs[key_to_save_order] if provided.
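
One way to obtain a cell ordering that groups cells by guide is sketched below. This is only an illustration of the idea of reordering for a sparse-looking heatmap; it is not the ordering algorithm pertpy uses, and the helper sparse_friendly_order and the toy matrix are hypothetical.

```python
import numpy as np


def sparse_friendly_order(values):
    """Order cells by dominant guide, then by decreasing signal (hypothetical helper)."""
    values = np.asarray(values)
    dominant = values.argmax(axis=1)  # strongest guide per cell
    strength = values.max(axis=1)  # its signal in that cell
    # np.lexsort treats the LAST key as the primary sort key.
    return np.lexsort((-strength, dominant))


# Toy matrix: 4 cells (rows) x 2 guides (columns).
values = np.array(
    [
        [0, 3],
        [2, 0],
        [0, 1],
        [5, 0],
    ]
)
order = sparse_friendly_order(values)
print(values[order])
```

Sorting rows this way places cells dominated by the same guide next to each other, which produces a block-like pattern along the diagonal of the heatmap.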

Examples

Each cell is assigned to every gRNA that occurs at least 5 times in that cell; the result is then visualized as a heatmap.

>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_by_threshold(gdo, assignment_threshold=5)
>>> ga.plot_heatmap(gdo)