pertpy.preprocessing.GuideAssignment#
Methods table#
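assign_by_threshold | Simple threshold-based gRNA assignment function. |
assign_mixture_model | Assigns gRNAs to cells using a Poisson-Gaussian mixture model. |
assign_to_max_guide | Simple threshold-based max gRNA assignment function. |
assign_to_max_guide_anndata | |
assign_to_max_guide_numpy | |
assign_to_max_guide_sparse | |
plot_heatmap | Heatmap plotting of guide RNA expression matrix. |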
Methods#
- GuideAssignment.assign_by_threshold(data, /, *, assignment_threshold, layer=None, output_layer='assigned_guides')
- GuideAssignment.assign_by_threshold(adata, /, *, assignment_threshold, layer=None, output_layer='assigned_guides')
- GuideAssignment.assign_by_threshold(X, /, *, assignment_threshold)
Simple threshold-based gRNA assignment function.
Each cell is assigned to every gRNA that has at least assignment_threshold counts in that cell. This function expects unnormalized data as input.
- Parameters:
  - data (AnnData | ndarray | csr_matrix) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to gRNAs.
  - assignment_threshold (float) – The count threshold that is required for an assignment to be viable.
  - layer (str | None, default: None) – Key to the layer containing raw count values of the gRNAs. adata.X is used if layer is None. Expects count data.
  - output_layer (str, default: 'assigned_guides') – Assigned guides will be saved on adata.layers[output_layer].
Examples
Each cell is assigned to every gRNA that occurs at least 5 times in that cell.
>>> import pertpy as pt
>>> mdata = pt.data.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_by_threshold(gdo, assignment_threshold=5)
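The thresholding rule described for assign_by_threshold can be illustrated on a plain NumPy matrix. This is a minimal sketch of the rule as documented, not pertpy's implementation: the helper name threshold_assignment is hypothetical, and the 0/1 indicator output is an assumption about what the assigned_guides layer contains.

```python
import numpy as np


def threshold_assignment(counts, assignment_threshold):
    """Return a 0/1 cell-by-guide matrix (hypothetical helper).

    An entry is 1 where the raw count reaches the threshold, otherwise 0.
    """
    counts = np.asarray(counts)
    return (counts >= assignment_threshold).astype(np.int8)


# Two cells, three guides, raw (unnormalized) counts.
raw_counts = np.array([[0, 5, 12],
                       [4, 0, 6]])
assigned = threshold_assignment(raw_counts, assignment_threshold=5)
```

Under this rule a cell can carry more than one guide, as the first row of raw_counts does.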
- GuideAssignment.assign_mixture_model(data, /, *, model='poisson_gauss_mixture', layer=None, assigned_guides_key='assigned_guide', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', only_return_results=False, show_progress=False, n_iter=500, learning_rate=0.01, n_init_seeds=10, seed=2024)
- GuideAssignment.assign_mixture_model(adata, /, *, model='poisson_gauss_mixture', layer=None, assigned_guides_key='assigned_guide', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', only_return_results=False, show_progress=False, n_iter=500, learning_rate=0.01, n_init_seeds=10, seed=2024)
- GuideAssignment.assign_mixture_model(X, /, *, var, model='poisson_gauss_mixture', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', show_progress=False, n_iter=500, learning_rate=0.01, n_init_seeds=10, seed=2024)
Assigns gRNAs to cells using a Poisson-Gaussian mixture model.
The model, priors, and per-guide thresholding rule reproduce crispat.ga_poisson_gauss (Velten group, velten-group/crispat). MAP estimation runs for all guides in parallel on JAX, replacing crispat’s per-guide Pyro SVI loop.
For each guide, the model is fit only to cells with non-zero counts. Log2 counts are modelled as a mixture of a continuous Poisson background and a Gaussian on-target component with priors weights ~ Dirichlet([0.9, 0.1]), mu ~ Normal(3, 2), scale ~ LogNormal(2, 1), lam ~ LogNormal(0, 1). A cell is assigned to a guide if its UMI count is at least the smallest integer t for which P(Normal | log2(t)) > 0.5.
- Parameters:
  - data (AnnData | ndarray | csr_matrix) – AnnData with gRNA counts, or a dense or sparse cell-by-guide count matrix.
  - model (Literal['poisson_gauss_mixture'], default: 'poisson_gauss_mixture') – The mixture model to use; currently only "poisson_gauss_mixture" is supported.
  - layer (str | None, default: None) – Layer name to use when data is an AnnData (defaults to X).
  - assigned_guides_key (str, default: 'assigned_guide') – Per-cell assignment is saved on adata.obs[assigned_guides_key].
  - no_grna_assigned_key (str, default: 'negative') – Key to use when a cell is negative for all gRNAs.
  - max_assignments_per_cell (int, default: 5) – Maximum number of gRNAs that can be assigned to a cell.
  - multiple_grna_assigned_key (str, default: 'multiple') – Key to use when more than max_assignments_per_cell gRNAs are assigned.
  - multiple_grna_assignment_string (str, default: '+') – Separator used to join multiple gRNAs assigned to one cell.
  - only_return_results (bool, default: False) – If True, do not modify adata and return the assignment array.
  - show_progress (bool, default: False) – Whether to print a progress line.
  - n_iter (int, default: 500) – Optimization steps for the SVI loop (crispat default: 500).
  - learning_rate (float, default: 0.01) – Adam learning rate (crispat default: 0.01).
  - n_init_seeds (int, default: 10) – Number of prior-sampled inits per guide; best is kept (crispat default: 10).
  - seed (int, default: 2024) – Random seed used for initialization.
- Return type:
Examples
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_mixture_model(gdo)
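The per-guide thresholding rule stated for assign_mixture_model (the smallest integer t for which P(Normal | log2(t)) > 0.5) can be sketched in plain Python once the mixture parameters for one guide are known. This is an illustration under stated assumptions, not the pertpy or crispat code. The function names are hypothetical, and so is the parameter layout of a background weight, an on-target weight, lam, mu and sigma. The continuous Poisson term is assumed to take the form lam**x * exp(-lam) / Gamma(x + 1), and sigma is assumed to be the Gaussian standard deviation.

```python
import math


def posterior_on_target(x, w_bg, w_on, lam, mu, sigma):
    """P(Normal | x) for a two-component mixture at x = log2(count)."""
    # Assumed continuous Poisson background: lam**x * exp(-lam) / Gamma(x + 1).
    log_bg = math.log(w_bg) + x * math.log(lam) - lam - math.lgamma(x + 1.0)
    # Gaussian on-target component with mean mu and standard deviation sigma.
    log_on = (math.log(w_on)
              - 0.5 * ((x - mu) / sigma) ** 2
              - math.log(sigma * math.sqrt(2.0 * math.pi)))
    # Normalize in log space to avoid underflow.
    m = max(log_bg, log_on)
    on = math.exp(log_on - m)
    return on / (math.exp(log_bg - m) + on)


def smallest_threshold(w_bg, w_on, lam, mu, sigma, t_max=100_000):
    """Smallest integer count t with P(Normal | log2(t)) > 0.5, or None."""
    for t in range(1, t_max + 1):
        if posterior_on_target(math.log2(t), w_bg, w_on, lam, mu, sigma) > 0.5:
            return t
    return None


# Illustrative parameter values only; these are not fitted to any dataset.
params = dict(w_bg=0.9, w_on=0.1, lam=1.0, mu=5.0, sigma=1.0)
threshold = smallest_threshold(**params)
```

A cell would then count as positive for this guide when its UMI count is at least threshold.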
- GuideAssignment.assign_to_max_guide(data, /, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')
- GuideAssignment.assign_to_max_guide(adata, /, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')
- GuideAssignment.assign_to_max_guide(X, /, *, var, assignment_threshold, no_grna_assigned_key='Negative')
Simple threshold-based max gRNA assignment function.
Each cell is assigned to its most highly expressed gRNA, provided that gRNA has at least assignment_threshold counts; otherwise the cell is labeled with no_grna_assigned_key. This function expects unnormalized data as input.
- Parameters:
  - data (AnnData | ndarray | csr_matrix) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to gRNAs.
  - assignment_threshold (float) – The count threshold that is required for an assignment to be viable.
  - layer (str | None, default: None) – Key to the layer containing raw count values of the gRNAs. adata.X is used if layer is None. Expects count data.
  - obs_key (str, default: 'assigned_guide') – Assigned guide will be saved on adata.obs[obs_key].
  - no_grna_assigned_key (str, default: 'Negative') – The key to return if no gRNA is expressed enough.
- Return type:
Examples
Each cell is assigned to the most expressed gRNA if it has at least 5 counts.
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_to_max_guide(gdo, assignment_threshold=5)
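The max-guide rule described for assign_to_max_guide can be sketched with NumPy on a dense matrix. This is a minimal illustration of the documented behavior, not pertpy's code: the helper max_guide_labels and its arguments are hypothetical, with guide_names standing in for the guide identifiers that the array overloads receive through var.

```python
import numpy as np


def max_guide_labels(counts, guide_names, assignment_threshold,
                     no_grna_assigned_key="Negative"):
    """Label each cell with its top guide if that guide passes the threshold."""
    counts = np.asarray(counts)
    best = counts.argmax(axis=1)                          # top guide per cell
    best_counts = counts[np.arange(counts.shape[0]), best]
    labels = np.asarray(guide_names, dtype=object)[best]
    # Cells whose top guide is below the threshold get the negative label.
    return np.where(best_counts >= assignment_threshold,
                    labels, no_grna_assigned_key)


raw_counts = np.array([[0, 7, 2],
                       [1, 1, 0],
                       [9, 3, 10]])
labels = max_guide_labels(raw_counts, ["g1", "g2", "g3"], assignment_threshold=5)
```

Unlike assign_by_threshold, this rule yields at most one guide per cell, so the third cell keeps only g3 even though g1 also passes the threshold.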
- GuideAssignment.assign_to_max_guide_anndata(adata, /, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')
- Return type:
- GuideAssignment.assign_to_max_guide_numpy(X, /, *, var, assignment_threshold, no_grna_assigned_key='Negative')
- Return type:
- GuideAssignment.assign_to_max_guide_sparse(X, /, *, var, assignment_threshold, no_grna_assigned_key='Negative')
- Return type:
- GuideAssignment.plot_heatmap(adata, *, layer=None, order_by=None, key_to_save_order=None, return_fig=False, **kwargs)
Heatmap plotting of guide RNA expression matrix.
Assuming guides have sparse expression, this function reorders cells and plots guide RNA expression so that the sparsity pattern is clearly visible. The cell ordering can be stored and reused in later plots, so that plots made before and after analysis of the guide RNA expression stay consistent. Note: This function expects log-normalized or binary data.
- Parameters:
  - adata (AnnData) – Annotated data matrix containing gRNA values.
  - layer (str | None, default: None) – Key to the layer containing log-normalized count values of the gRNAs. adata.X is used if layer is None.
  - order_by (ndarray | str | None, default: None) – The order of cells on the y-axis. If None, cells are reordered to give a clear sparse representation. If a string is provided, adata.obs[order_by] is used as the order. If a numpy array is provided, the array is used for ordering.
  - key_to_save_order (str, default: None) – The obs key under which to save the cell order of the current plot. Only saves if not None.
  - return_fig (bool, default: False) – If True, returns the figure of the plot, which can be used for saving.
  - kwargs – Are passed to sc.pl.heatmap.
- Return type:
- Returns:
If return_fig is True, returns the figure, otherwise None. Order of cells in the y-axis will be saved on adata.obs[key_to_save_order] if provided.
Examples
Each cell is assigned to every gRNA that occurs at least 5 times in that cell; the result is then visualized using a heatmap.
>>> import pertpy as pt
>>> mdata = pt.dt.papalexi_2021()
>>> gdo = mdata.mod["gdo"]
>>> ga = pt.pp.GuideAssignment()
>>> ga.assign_by_threshold(gdo, assignment_threshold=5)
>>> ga.plot_heatmap(gdo)
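To make the idea of reordering cells for a sparse-looking heatmap concrete, the sketch below shows one possible ordering heuristic: group cells by their dominant guide, then sort within each group by descending signal. This is a hypothetical illustration only. The helper order_cells_by_dominant_guide is not part of pertpy, and the ordering that plot_heatmap actually computes may differ.

```python
import numpy as np


def order_cells_by_dominant_guide(X):
    """Return row indices that group cells by top guide (hypothetical helper)."""
    X = np.asarray(X)
    dominant = X.argmax(axis=1)   # guide with the highest value per cell
    strength = X.max(axis=1)      # value of that guide
    # np.lexsort uses the LAST key as the primary sort key.
    return np.lexsort((-strength, dominant))


# Four cells, two guides (log-normalized or binary values).
X = np.array([[0, 3],
              [5, 0],
              [0, 1],
              [2, 0]])
order = order_cells_by_dominant_guide(X)
X_sorted = X[order]
```

An index array of this kind is the sort of value order_by accepts when a numpy array is passed, which lets the same ordering be reused across plots.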