Note

This page was generated from distance_tests.ipynb.

Distance Tests

Pertpy implements several distance metrics for comparing groups of cells. To determine whether two groups were drawn from the same distribution, as judged by a chosen distance metric, pertpy provides distance tests. In practice, this is a valuable tool for assessing whether a treatment had a significant effect on the transcript profiles of the cells.

Pertpy offers a flexible implementation of Monte-Carlo permutation tests (we call them Distance Tests) that can use any of the implemented distance functions as a test statistic.
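
To make the idea concrete, here is a minimal, standard-library sketch of a Monte-Carlo permutation test that accepts any distance function as its test statistic. It is an illustration only and not pertpy's implementation: the names permutation_test and mean_abs_diff are hypothetical, the toy distance is the absolute difference of group means, and flooring the p-value at 1 / n_perms is a convention assumed here.

```python
import random


def mean_abs_diff(x, y):
    """Toy distance (hypothetical): absolute difference of the group means."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def permutation_test(x, y, distance, n_perms=1000, seed=0):
    """Return the observed distance and a Monte-Carlo permutation p-value."""
    rng = random.Random(seed)
    observed = distance(x, y)
    pooled = list(x) + list(y)
    n_extreme = 0
    for _ in range(n_perms):
        # Shuffle the group labels by shuffling the pooled observations.
        rng.shuffle(pooled)
        perm_x, perm_y = pooled[: len(x)], pooled[len(x):]
        # Count permutations at least as extreme as the observed distance.
        if distance(perm_x, perm_y) >= observed:
            n_extreme += 1
    # Assumed convention: never report a p-value below 1 / n_perms.
    pvalue = max(n_extreme, 1) / n_perms
    return observed, pvalue


control = [i * 0.01 for i in range(30)]
treated = [5.0 + i * 0.01 for i in range(30)]
observed, pvalue = permutation_test(control, treated, mean_abs_diff)
print(f"distance={observed:.3f}, p-value={pvalue}")
```

Swapping mean_abs_diff for another distance changes the test statistic without touching the permutation logic, which is the flexibility described above.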

Setup

[7]:
import matplotlib.pyplot as plt
import pertpy as pt
import scanpy as sc

Dataset

Here we use an example dataset, which is a subsetted and already preprocessed version of data from the original Perturb-seq paper (Dixit et al., 2016). The full dataset can be accessed using pt.dt.dixit_2016().

Note that most distances are computed in PCA space to avoid the curse of dimensionality and to speed up computation. When using your own dataset, run scanpy.pp.pca before calling any of the distance methods. If you prefer to compute distances in a different space, pass an alternative obsm_key argument when calling the distance function.

[4]:
adata = pt.dt.distance_example()
obs_key = "perturbation"  # defines groups to test
contrast = "control"  # defines contrast group for test
[5]:
sc.pp.neighbors(adata, use_rep="X_pca", n_neighbors=30, n_pcs=30)
sc.tl.umap(adata)
[6]:
sc.pl.scatter(adata, basis="umap", color=obs_key, show=True)
../../_images/tutorials_notebooks_distance_tests_8_0.png

E-test (uses E-distance)

We can run DistanceTest using any of the implemented distance functions. Here we use the E-test, which uses the E-distance as a test statistic.

[14]:
etest = pt.tl.DistanceTest("edistance", n_perms=1000, obsm_key="X_pca", alpha=0.0015)
tab = etest(adata, groupby=obs_key, contrast=contrast)
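
For intuition about the test statistic, the following standard-library sketch computes an energy distance between two small point sets as 2 * delta - sigma_x - sigma_y, where delta is the mean pairwise distance between the groups and sigma_x, sigma_y are the mean pairwise distances within each group. This follows the general definition of the energy distance with plain Euclidean distances, which is an assumption here: pertpy's E-distance may use a different pairwise metric (for example squared Euclidean) or a different estimator. The function names are hypothetical.

```python
import math


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (p, q) with p in a and q in b."""
    total = 0.0
    for p in a:
        for q in b:
            total += math.dist(p, q)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """Energy distance sketch: 2 * between-group mean minus both within-group means."""
    delta = mean_pairwise_distance(x, y)
    sigma_x = mean_pairwise_distance(x, x)
    sigma_y = mean_pairwise_distance(y, y)
    return 2.0 * delta - sigma_x - sigma_y


group_a = [(0.0, 0.0), (0.0, 1.0)]
group_b = [(3.0, 4.0), (3.0, 5.0)]
print(energy_distance(group_a, group_a))  # identical groups give zero
print(energy_distance(group_a, group_b))  # separated groups give a positive value
```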

We recommend using the adjusted p-values, which are corrected for multiple testing. You might have to specify a higher (less stringent) alpha value for the adjusted p-values to be significant, depending on the number of permutations, the number of cells in each group, and the number of groups tested (each group increases the number of tests made and therefore affects the adjusted p-value). Also note that the lowest p-value you can get is 1 / n_perms (0.001 for n_perms=1000), so the number of permutations limits how small a p-value the test can report.
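
The effect of the number of tests on the adjusted p-value can be sketched in a few lines of plain Python. The example below implements a Holm-Šidák step-down correction; that this is the correction applied by DistanceTest is an assumption on our part, and the count of 32 tests is purely illustrative. The function name holm_sidak is hypothetical.

```python
def holm_sidak(pvalues):
    """Holm-Sidak step-down adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is corrected for m tests, the next for m - 1, and so on.
        adj = 1.0 - (1.0 - pvalues[idx]) ** (m - rank)
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Illustrative assumption: 32 groups, all at the minimum p-value for n_perms=1000.
raw = [0.001] * 32
print(round(holm_sidak(raw)[0], 6))
```

With 32 tests, a raw p-value of 0.001 is adjusted to roughly 0.0315, which is why raw p-values can pass a strict alpha while the adjusted ones do not.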

[15]:
tab.head()
[15]:
                     distance  pvalue  significant  pvalue_adj  significant_adj
p-INTERGENIC393453   0.460251   0.001         True    0.031509            False
p-sgELF1-2           0.523675   0.001         True    0.031509            False
p-INTERGENIC216151   0.429756   0.001         True    0.031509            False
p-INTERGENIC1144056  0.502560   0.001         True    0.031509            False
p-sgELF1-5           0.510796   0.001         True    0.031509            False

Let’s plot the test results:

[16]:
import seaborn as sns

with sns.axes_style("darkgrid"):
    sns.scatterplot(
        data=tab[tab.index != contrast],
        x="pvalue",
        y="distance",
        hue="significant",
        palette={True: "green", False: "red"},
    )
plt.title("E-test results")
plt.xlabel("E-test p-value")
plt.ylabel("E-distance to contrast group (control)")
plt.show()
../../_images/tutorials_notebooks_distance_tests_15_0.png

You can see that most perturbations are quite different from the control cells, but a few are not. In general, more cells will increase the power of the test. On average, you can expect groups with a higher distance to the control cells to have a lower p-value, though this depends on the shape of the distribution of distances: almost all distances can be biased by a few outliers, which is why we recommend using a Distance Test to quantify the effect of a treatment on the cells.

Conclusion

Pertpy provides Monte-Carlo permutation tests that work with various distance metrics to evaluate whether a treatment significantly alters the transcript profiles of cells.

References

  1. Stefan Peidli, Tessa D. Green, Ciyue Shen, Torsten Gross, Joseph Min, Samuele Garda, Bo Yuan, Linus J. Schumacher, Jake P. Taylor-King, Debora S. Marks, Augustin Luna, Nils Blüthgen, Chris Sander bioRxiv 2022.08.20.504663; doi: https://doi.org/10.1101/2022.08.20.504663