Datasets

pertpy provides access to curated single-cell datasets that span several types of perturbations. Many of the datasets originate from scperturb [PGS+24].
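Each entry in this listing is written as a call with empty parentheses, which suggests that the loaders need no required arguments. The sketch below shows one way to look up such a loader by name and call it. The helper `load_dataset` and the stand-in namespace `fake_data` are hypothetical names introduced only so the example runs offline; in real use one would presumably pass pertpy's `data` module in place of `fake_data`, which is an assumption and not something this page confirms.

```python
from types import SimpleNamespace


def load_dataset(data_module, name):
    """Look up a loader such as ``kang_2018`` on ``data_module`` and call it.

    ``data_module`` is any object exposing zero-argument loader functions
    as attributes (hypothetical helper, not part of pertpy).
    """
    loader = getattr(data_module, name, None)
    if not callable(loader):
        raise ValueError(f"unknown dataset loader: {name!r}")
    return loader()


# Stand-in for the real ``data`` module so the sketch runs without
# downloading anything; the returned string is a placeholder, not real data.
fake_data = SimpleNamespace(kang_2018=lambda: "kang_2018 placeholder")

result = load_dataset(fake_data, "kang_2018")
print(result)
```

Looking loaders up by name keeps scripts that iterate over many datasets short, and the explicit `ValueError` makes a misspelled dataset name fail early instead of surfacing as an `AttributeError` deeper in a pipeline.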

data.adamson_2016_pilot()

6000 chronic myeloid leukemia (K562) cells carrying 8 distinct GBCs.

data.adamson_2016_upr_epistasis()

15000 K562 cells with UPR sensor genes knocked out and treated with thapsigargin.

data.adamson_2016_upr_perturb_seq()

Transcriptomics measurements of 65000 cells that were subject to 91 sgRNAs targeting 82 genes.

data.aissa_2021()

Transcriptomics of 848 PC9 cells subject to consecutive erlotinib treatment and 756 control cells.

data.bhattacherjee()

Processed single-cell data of the prefrontal cortex (PFC) of adult mice under cocaine self-administration.

data.burczynski_crohn()

Bulk data with conditions ulcerative colitis (UC) and Crohn's disease (CD).

data.chang_2021()

Transcriptomics of 5 different cell lines that were induced with a unique TraCe-seq barcode.

data.combosciplex()

scRNA-seq subset of the combinatorial experiment of sciplex3.

data.cinemaot_example()

Subsampled CINEMA-OT example dataset.

data.datlinger_2017()

Transcriptomics measurements of 5905 Jurkat cells induced with anti-CD3 and anti-CD28 antibodies.

data.datlinger_2021()

Transcriptomics measurements of 151788 nuclei of four cell lines.

data.dialogue_example()

Example dataset used in DIALOGUE vignettes.

data.distance_example()

Example dataset used to showcase distances and distance_tests.

data.dixit_2016()

Perturb-seq: scRNA-seq with pooled CRISPR-KO perturbations.

data.dixit_2016_raw()

Perturb-seq: scRNA-seq with pooled CRISPR-KO perturbations.

data.dong_2023()

Complete CINEMA-OT dataset.

data.frangieh_2021()

Processed perturb-CITE-seq data with multi-modal RNA and protein single-cell profiling.

data.frangieh_2021_protein()

CITE-seq data of 218000 cells under 750 perturbations (only the surface protein data).

data.frangieh_2021_raw()

Raw Perturb-CITE-seq data with multi-modal RNA and protein single-cell profiling readout.

data.frangieh_2021_rna()

CITE-seq data of 218000 cells under 750 perturbations (only the transcriptomics data).

data.gasperini_2019_atscale()

Transcriptomics of 254974 chronic myeloid leukemia (K562) cells with CRISPRi perturbations.

data.gasperini_2019_highmoi()

K562 perturbed cells with 1119 candidate enhancers (only the high MOI part).

data.gasperini_2019_lowmoi()

K562 perturbed cells with 1119 candidate enhancers (only the low MOI part).

data.gehring_2019()

96-plex perturbation experiment on live mouse neural stem cells.

data.haber_2017_regions()

Raw single-cell, pooled CRISPR screening.

data.hagai_2018()

Cross-species analysis of primary dermal fibroblasts and bone marrow-derived phagocytes, stimulated with dsRNA and IFNB.

data.kang_2018()

Processed multiplexing droplet-based single cell RNA-sequencing using genetic barcodes.

data.mcfarland_2020()

Response of various cell lines to a range of different drugs and CRISPRi perturbations.

data.norman_2019()

Processed single-cell, pooled CRISPR screening.

data.norman_2019_raw()

Raw single-cell, pooled CRISPR screening.

data.papalexi_2021()

ECCITE-seq dataset of 11 gRNAs generated from a stimulated THP-1 cell line.

data.replogle_2022_k562_essential()

K562 cells transduced with CRISPRi (day 7 after transduction).

data.replogle_2022_k562_gwps()

K562 cells transduced with CRISPRi (day 8 after transduction).

data.replogle_2022_rpe1()

RPE1 cells transduced with CRISPRi (day 7 after transduction).

data.sc_sim_augur()

Simulated test dataset used in Augur example vignettes.

data.schiebinger_2019_16day()

Transcriptomes of 65781 iPSC cells collected over 10 time points in 2i or serum conditions (16-day time course).

data.schiebinger_2019_18day()

Transcriptomes of 259155 iPSC cells collected over 39 time points in 2i or serum conditions (18-day time course).

data.schraivogel_2020_tap_screen_chr8()

TAP-seq applied to K562 cells (only chromosome 8).

data.schraivogel_2020_tap_screen_chr11()

TAP-seq applied to K562 cells (only chromosome 11).

data.sciplex_gxe1()

sci-Plex-GxE profiling of A172 dCas9-KRAB (HPRT1 or MMR knockout) with 6-TG/TMZ and A172 dCas9-SunTag (HPRT1 knockout) with 6-TG.

data.sciplex3_raw()

Raw sciplex3 perturbation dataset curated for perturbation modeling.

data.shifrut_2018()

CD8 T-cells from two donors for two conditions (SLICE and CROP-seq).

data.smillie_2019()

scRNA-seq data of the small intestine of mice with ulcerative colitis.

data.srivatsan_2020_sciplex2()

A549 cells exposed to four compounds.

data.srivatsan_2020_sciplex3()

Transcriptomes of 650000 A549, K562, and MCF7 cells exposed to 188 compounds.

data.srivatsan_2020_sciplex4()

A549 and MCF7 cells treated with pracinostat.

data.stephenson_2021_subsampled()

Processed 10X 5' scRNA-seq data from PBMC of COVID-19 patients and healthy donors.

data.tasccoda_example()

Example for the coda part of a mudata object.

data.tian_2019_day7neuron()

Transcriptomes of 20000 day 7 neurons targeted by 58 gRNAs.

data.tian_2019_ipsc()

Transcriptomics of 20000 iPSCs targeted by 58 sgRNAs.

data.tian_2021_crispra()

CROP-seq of 50000 neurons treated with 374 gRNAs (CRISPRa only).

data.tian_2021_crispri()

CROP-seq of 98000 neurons treated with 374 gRNAs (CRISPRi only).

data.weinreb_2020()

Mouse embryonic stem cells under different cytokines across time.

data.xie_2017()

Single-cell transcriptomics of 51448 cells generated with Mosaic-seq.

data.zhao_2021()

Multiplexed drug perturbation from freshly resected tumors.

data.zhang_2021()

Single-cell RNA-seq of TNBC patients' immune cells exposed to paclitaxel alone or combined with the anti-PD-L1 atezolizumab.