pertpy.tools.Distance

class pertpy.tools.Distance(metric='edistance', layer_key=None, obsm_key=None, cell_wise_metric='euclidean')

Distance class, used to compute distances between groups of cells.

The distance metric can be specified by the user. This class also provides a method to compute the pairwise distances between all groups of cells. Currently available metrics:

  • “edistance”: Energy distance (Default metric).

    In essence, it is twice the mean pairwise distance between cells of the two groups, minus the mean pairwise distance between cells within the first group, minus the mean pairwise distance between cells within the second group. More information can be found in Peidli et al. (2023).

  • “euclidean”: Euclidean distance.

    Euclidean distance between the means of cells from two groups.

  • “root_mean_squared_error”: Pseudobulk root mean squared error.

    Root mean squared distance between the means of cells from two groups.

  • “mse”: Pseudobulk mean squared error.

    Mean squared distance between the means of cells from two groups.

  • “mean_absolute_error”: Pseudobulk mean absolute distance.

    Mean absolute distance between the means of cells from two groups.

  • “pearson_distance”: Pearson distance.

    Pearson distance between the means of cells from two groups.

  • “spearman_distance”: Spearman distance.

    Spearman distance between the means of cells from two groups.

  • “kendalltau_distance”: Kendall tau distance.

    Kendall tau distance between the means of cells from two groups.

  • “cosine_distance”: Cosine distance.

    Cosine distance between the means of cells from two groups.

  • “r2_distance”: Coefficient of determination distance.

    Coefficient of determination distance between the means of cells from two groups.

  • “mean_pairwise”: Mean pairwise distance.

    Mean of the pairwise Euclidean distances between cells of two groups.

  • “mmd”: Maximum mean discrepancy.

    Maximum mean discrepancy between the cells of two groups. Linear, RBF, and quadratic polynomial MMD are used. For theory on MMD in single-cell applications, see Lotfollahi et al. (2019).

  • “wasserstein”: Wasserstein distance (Earth Mover’s Distance).

    Wasserstein distance between the cells of two groups. Uses an OTT-JAX implementation of the Sinkhorn algorithm to compute the distance. For more information on the optimal transport solver, see Cuturi et al. (2013).

  • “sym_kldiv”: Symmetrized Kullback–Leibler divergence distance.

    Kullback–Leibler divergence between Gaussian distributions fitted to the cells of two groups. A Gaussian distribution is fitted to one group of cells and the KL divergence is calculated on the other group, and vice versa.

  • “t_test”: t-test statistic.

    T-test statistic measure between cells of two groups.

  • “ks_test”: Kolmogorov-Smirnov test statistic.

    Kolmogorov-Smirnov test statistic measure between cells of two groups.

  • “nb_ll”: Log-likelihood over negative binomial.

    Average of the log-likelihoods of the samples of the second group after fitting a negative binomial distribution to the samples of the first group.

  • “classifier_proba”: Probability of a binary classifier.

    Average of the classification probability of the perturbation for a binary classifier.

  • “classifier_cp”: Classifier class projection.

    Average of the class
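The verbal definition of the energy distance given for “edistance” can be made concrete with a small NumPy sketch. It is only an illustration of the formula, not pertpy's implementation: the helper names `mean_pairwise_distance` and `energy_distance` are hypothetical, the cell-wise metric is assumed to be plain Euclidean distance, and the within-group means include the zero self-distances.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (one cell from a, one from b)."""
    diffs = a[:, None, :] - b[None, :, :]          # shape (n_a, n_b, n_features)
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """Twice the between-group mean distance minus both within-group means."""
    delta = mean_pairwise_distance(x, y)           # between groups
    sigma_x = mean_pairwise_distance(x, x)         # within first group
    sigma_y = mean_pairwise_distance(y, y)         # within second group
    return 2 * delta - sigma_x - sigma_y


rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=(50, 5))              # toy "control" cells
y = rng.normal(loc=3.0, size=(60, 5))              # toy "perturbed" cells

d_xy = energy_distance(x, y)                       # clearly separated groups
d_xx = energy_distance(x, x)                       # a group compared with itself
```

Under these assumptions a group compared with itself yields zero, and the value grows as the two groups move apart.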

metric

Name of distance metric.

layer_key

Name of the counts layer to use in adata.layers.

obsm_key

Name of embedding in adata.obsm to use.

cell_wise_metric

Metric from scipy.spatial.distance to use for pairwise distances between single cells.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.distance_example()
>>> Distance = pt.tools.Distance(metric="edistance")
>>> X = adata.obsm["X_pca"][adata.obs["perturbation"] == "p-sgCREB1-2"]
>>> Y = adata.obsm["X_pca"][adata.obs["perturbation"] == "control"]
>>> D = Distance(X, Y)
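Several of the metrics listed above compare only the group means (pseudobulks). The following NumPy sketch shows what such a comparison amounts to for two arrays shaped like `X` and `Y` in the example. The random arrays are stand-ins for real embeddings, the formulas are the textbook definitions, and pertpy's own implementation may differ in details.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))             # stand-in for perturbed cells in PCA space
Y = rng.normal(loc=1.0, size=(30, 10))    # stand-in for control cells

# Pseudobulk: collapse each group to a single mean vector.
mean_x = X.mean(axis=0)
mean_y = Y.mean(axis=0)
diff = mean_x - mean_y

euclidean = np.linalg.norm(diff)                  # "euclidean"
mse = np.mean(diff ** 2)                          # "mse"
mean_absolute_error = np.mean(np.abs(diff))       # "mean_absolute_error"

# "cosine_distance", assuming the usual definition 1 - cosine similarity.
cosine_distance = 1 - mean_x @ mean_y / (
    np.linalg.norm(mean_x) * np.linalg.norm(mean_y)
)
```

Because these metrics see only the means, they ignore differences in the spread of the two groups; metrics such as “edistance” or “mmd” operate on the individual cells instead.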

Methods table

onesided_distances(adata, groupby[, ...])

Get distances between one selected cell group and the remaining other cell groups.

pairwise(adata, groupby[, groups, ...])

Get pairwise distances between groups of cells.

precompute_distances(adata[, n_jobs])

Precompute pairwise distances between all cells, writes to adata.obsp.

Methods

onesided_distances

Distance.onesided_distances(adata, groupby, selected_group=None, groups=None, show_progressbar=True, n_jobs=-1, **kwargs)

Get distances between one selected cell group and the remaining other cell groups.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • groupby (str) – Column name in adata.obs.

  • selected_group (str | None) – Group from which distances to all other groups are computed.

  • groups (list[str] | None) – List of groups to compute distances to selected_group for. If None, uses all groups. Defaults to None.

  • show_progressbar (bool) – Whether to show progress bar. Defaults to True.

  • n_jobs (int) – Number of cores to use. Defaults to -1 (all).

  • kwargs – Additional keyword arguments passed to the metric function.

Returns:

Dataframe with distances of groups to selected_group.

Return type:

pd.DataFrame

Examples

>>> import pertpy as pt
>>> adata = pt.dt.distance_example()
>>> Distance = pt.tools.Distance(metric="edistance")
>>> onesided_df = Distance.onesided_distances(adata, groupby="perturbation", selected_group="control")
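To illustrate what a one-sided computation does conceptually, here is a minimal NumPy sketch that measures every group against one reference group. It does not use pertpy: `onesided` and `pseudobulk_euclidean` are hypothetical helpers, the result is a plain dict rather than a DataFrame, and the reference group is kept in the output for simplicity.

```python
import numpy as np


def pseudobulk_euclidean(a, b):
    """Euclidean distance between the mean vectors of two groups."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def onesided(embedding, labels, selected_group, metric_fn):
    """Distance from every group to `selected_group`, keyed by group label."""
    reference = embedding[labels == selected_group]
    return {
        str(group): metric_fn(embedding[labels == group], reference)
        for group in np.unique(labels)
    }


rng = np.random.default_rng(0)
labels = np.repeat(["control", "pert_a", "pert_b"], 20)
shifts = np.repeat([0.0, 1.0, 4.0], 20)[:, None]   # pert_b is shifted furthest
embedding = rng.normal(size=(60, 8)) + shifts

distances = onesided(embedding, labels, "control", pseudobulk_euclidean)
```

Only one distance per group is computed here, which is cheaper than a full pairwise matrix when a single reference such as the control group is of interest.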

pairwise

Distance.pairwise(adata, groupby, groups=None, show_progressbar=True, n_jobs=-1, **kwargs)

Get pairwise distances between groups of cells.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • groupby (str) – Column name in adata.obs.

  • groups (list[str] | None) – List of groups to compute pairwise distances for. If None, uses all groups. Defaults to None.

  • show_progressbar (bool) – Whether to show progress bar. Defaults to True.

  • n_jobs (int) – Number of cores to use. Defaults to -1 (all).

  • kwargs – Additional keyword arguments passed to the metric function.

Returns:

Dataframe with pairwise distances.

Return type:

pd.DataFrame

Examples

>>> import pertpy as pt
>>> adata = pt.dt.distance_example()
>>> Distance = pt.tools.Distance(metric="edistance")
>>> pairwise_df = Distance.pairwise(adata, groupby="perturbation")
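The structure of a pairwise result, and the effect of restricting it with `groups`, can be sketched without pertpy as follows. The helpers `pairwise_matrix` and `mean_abs_diff` are hypothetical, the sketch returns a NumPy array plus a label list instead of a DataFrame, and it assumes a symmetric metric that is zero for identical groups.

```python
from itertools import combinations

import numpy as np


def mean_abs_diff(a, b):
    """Mean absolute difference between the mean vectors of two groups."""
    return float(np.mean(np.abs(a.mean(axis=0) - b.mean(axis=0))))


def pairwise_matrix(embedding, labels, metric_fn, groups=None):
    """Symmetric group-by-group distance matrix; all groups if `groups` is None."""
    groups = sorted(set(labels.tolist())) if groups is None else list(groups)
    matrix = np.zeros((len(groups), len(groups)))  # diagonal stays at zero
    for (i, g), (j, h) in combinations(enumerate(groups), 2):
        d = metric_fn(embedding[labels == g], embedding[labels == h])
        matrix[i, j] = matrix[j, i] = d            # relies on metric symmetry
    return groups, matrix


rng = np.random.default_rng(0)
labels = np.repeat(["control", "pert_a", "pert_b"], 15)
embedding = rng.normal(size=(45, 6)) + np.repeat([0.0, 1.0, 2.0], 15)[:, None]

groups, matrix = pairwise_matrix(embedding, labels, mean_abs_diff)
subset_groups, subset_matrix = pairwise_matrix(
    embedding, labels, mean_abs_diff, groups=["control", "pert_b"]
)
```

Evaluating only the upper triangle halves the number of metric calls; a metric that is not symmetric would need both directions computed.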

precompute_distances

Distance.precompute_distances(adata, n_jobs=-1)

Precompute pairwise distances between all cells, writes to adata.obsp.

The precomputed distances are stored in adata.obsp under the key ‘{self.obsm_key}_{cell_wise_metric}_predistances’, as they depend on both the cell-wise metric and the embedding used.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • n_jobs (int) – Number of cores to use. Defaults to -1 (all).

Return type:

None

Examples

>>> import pertpy as pt
>>> adata = pt.dt.distance_example()
>>> distance = pt.tools.Distance(metric="edistance")
>>> distance.precompute_distances(adata)
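The idea behind precomputation can be sketched as follows: the cell-by-cell distance matrix is computed once, stored under a key that follows the naming pattern described above, and then sliced for each pair of groups. This is an assumption-laden illustration rather than pertpy's code: a plain dict stands in for adata.obsp, `"X_pca"` is an assumed embedding key, and `mean_pairwise_from_cache` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding = rng.normal(size=(30, 4))               # stand-in for an adata.obsm entry
labels = np.repeat(["control", "pert_a", "pert_b"], 10)

# Compute the full cell-by-cell Euclidean distance matrix once.
diffs = embedding[:, None, :] - embedding[None, :, :]
cell_distances = np.sqrt((diffs ** 2).sum(axis=-1))

# A plain dict standing in for adata.obsp; the key mirrors the documented pattern.
obsm_key, cell_wise_metric = "X_pca", "euclidean"
key = f"{obsm_key}_{cell_wise_metric}_predistances"
obsp = {key: cell_distances}


def mean_pairwise_from_cache(cache, labels, group_a, group_b):
    """Mean cell-to-cell distance between two groups, read from the cached matrix."""
    rows = np.flatnonzero(labels == group_a)
    cols = np.flatnonzero(labels == group_b)
    return float(cache[np.ix_(rows, cols)].mean())


d_ab = mean_pairwise_from_cache(obsp[key], labels, "control", "pert_a")
```

The cached matrix grows quadratically with the number of cells, so this trades memory for speed when many group pairs are evaluated with the same embedding and cell-wise metric.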