pertpy.metadata.CellLine

class pertpy.metadata.CellLine

Utilities to fetch cell line metadata.

Methods table

annotate(adata[, query_id, reference_id, ...])

Annotate cell lines.

annotate_bulk_rna(adata[, query_id, ...])

Fetch bulk RNA expression from Broad or Sanger.

annotate_from_gdsc(adata[, query_id, ...])

Fetch drug response data from GDSC.

annotate_protein_expression(adata[, ...])

Fetch protein expression.

correlate(adata[, identifier, metadata_key])

Correlate cell lines with annotated metadata.

lookup()

Generate a LookUp object for CellLine.

plot_correlation(adata, corr, pval[, ...])

Visualise the correlation of cell lines with annotated metadata.

Methods

annotate

CellLine.annotate(adata, query_id='DepMap_ID', reference_id='ModelID', fetch=None, cell_line_source='DepMap', verbosity=5, copy=False)

Annotate cell lines.

For each cell, we fetch the cell line annotation from either the Dependency Map (DepMap) or the Genomics of Drug Sensitivity in Cancer project (Cancerrxgene).

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str) – The column of .obs with cell line information. Defaults to “DepMap_ID”.

  • reference_id (str) – The type of cell line identifier in the metadata, e.g. ModelID, CellLineName or StrippedCellLineName. If fetching cell line metadata from Cancerrxgene, it is recommended to choose “stripped_cell_line_name”. Defaults to “ModelID”.

  • fetch (list[str] | None) – The metadata to fetch. Defaults to None (=all).

  • cell_line_source (Literal['DepMap', 'Cancerrxgene']) – The source of cell line metadata, DepMap or Cancerrxgene. Defaults to “DepMap”.

  • verbosity (int | str) – The number of unmatched identifiers to print, either a non-negative integer or “all”. Defaults to 5.

  • copy (bool) – Determines whether a copy of the adata is returned. Defaults to False.

Return type:

AnnData

Returns:

Returns an AnnData object with cell line annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.dialogue_example()
>>> adata.obs["cell_line_name"] = "MCF7"
>>> pt_metadata = pt.md.CellLine()
>>> adata_annotated = pt_metadata.annotate(adata=adata,
...                                        reference_id='cell_line_name',
...                                        query_id='cell_line_name',
...                                        fetch=["cell_line_name", "age", "primary_disease"],
...                                        copy=True)
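
Conceptually, annotation is a lookup: the value in the query_id column of each cell is matched against the reference_id column of the metadata, and identifiers without a match are reported up to the verbosity limit. The following is a minimal pure-Python sketch of that idea, not pertpy's implementation. The reference table, the annotate_rows helper and all values are hypothetical and only serve the illustration.

```python
# Hypothetical reference table keyed by cell line name (illustrative values only).
reference = {
    "MCF7": {"tissue": "breast"},
    "A549": {"tissue": "lung"},
}


def annotate_rows(rows, query_id, reference, verbosity=5):
    """Attach reference metadata to each row and collect unmatched identifiers.

    Returns the annotated rows and the unmatched identifiers that would be
    reported, truncated to `verbosity` entries (or all of them for "all").
    """
    annotated = []
    unmatched = []
    for row in rows:
        key = row[query_id]
        meta = reference.get(key)
        if meta is None and key not in unmatched:
            unmatched.append(key)
        # Rows without a match are kept, just without extra fields.
        annotated.append({**row, **(meta or {})})
    limit = len(unmatched) if verbosity == "all" else int(verbosity)
    return annotated, unmatched[:limit]


rows = [
    {"cell": "c1", "cell_line_name": "MCF7"},
    {"cell": "c2", "cell_line_name": "UNKNOWN"},
]
annotated, reported = annotate_rows(rows, "cell_line_name", reference, verbosity=5)
```

Here the first row gains a tissue field, while the second is kept unchanged and its identifier ends up in the reported list.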

annotate_bulk_rna

CellLine.annotate_bulk_rna(adata, query_id='cell_line_name', cell_line_source='sanger', verbosity=5, gene_identifier='gene_ID', copy=False)

Fetch bulk RNA expression from Broad or Sanger.

For each cell, we fetch the bulk RNA expression of its cell line from either the Broad or the Sanger data.

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str) – The column of .obs with cell line information. Defaults to “cell_line_name” if cell_line_source is sanger, otherwise “DepMap_ID”.

  • cell_line_source (Literal['broad', 'sanger']) – The source of the bulk RNA expression data, broad or sanger. Defaults to “sanger”.

  • verbosity (int | str) – The number of unmatched identifiers to print, either a non-negative integer or “all”. Defaults to 5.

  • copy (bool) – Determines whether a copy of the adata is returned. Defaults to False.

Return type:

AnnData

Returns:

Returns an AnnData object with bulk RNA expression annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.dialogue_example()
>>> adata.obs["cell_line_name"] = "MCF7"
>>> pt_metadata = pt.md.CellLine()
>>> adata_annotated = pt_metadata.annotate(
...     adata=adata, reference_id="cell_line_name", query_id="cell_line_name", copy=True
... )
>>> pt_metadata.annotate_bulk_rna(adata_annotated)

annotate_from_gdsc

CellLine.annotate_from_gdsc(adata, query_id='cell_line_name', reference_id='cell_line_name', query_perturbation='perturbation', reference_perturbation='drug_name', gdsc_dataset=1, verbosity=5, copy=False)

Fetch drug response data from GDSC.

For each cell, we fetch the drug response, as the natural log of the fitted IC50, for its corresponding cell line and perturbation from the GDSC fitted data results file.

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str) – The column of .obs with cell line information. Defaults to “cell_line_name”.

  • reference_id (Literal['cell_line_name', 'sanger_model_id', 'cosmic_id']) – The type of cell line identifier in the metadata, cell_line_name, sanger_model_id or cosmic_id. Defaults to “cell_line_name”.

  • query_perturbation (str) – The column of .obs with perturbation information. Defaults to “perturbation”.

  • reference_perturbation (Literal['drug_name', 'drug_id']) – The type of perturbation in the metadata, drug_name or drug_id. Defaults to “drug_name”.

  • gdsc_dataset (Literal[1, 2]) – The GDSC dataset, 1 or 2. The GDSC1 dataset updates previous releases with additional drug screening data from the Sanger Institute and Massachusetts General Hospital. It covers 970 cell lines and 403 compounds with 333,292 IC50s. GDSC2 is new and has 243,466 IC50 results from the latest screening at the Sanger Institute. Defaults to 1.

  • verbosity (int | str) – The number of unmatched identifiers to print, either a non-negative integer or “all”. Defaults to 5.

  • copy (bool) – Determines whether a copy of the adata is returned. Defaults to False.

Return type:

AnnData

Returns:

Returns an AnnData object with drug response annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.mcfarland_2020()
>>> pt_metadata = pt.md.CellLine()
>>> pt_metadata.annotate_from_gdsc(adata, query_id="cell_line")
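
Because the annotated value is the natural log of the fitted IC50, it has to be exponentiated before it can be read as a concentration. A minimal sketch with made-up numbers; the ln_ic50_to_ic50 helper and the drug names are hypothetical, and the concentration unit is not stated here, so check the GDSC file description before interpreting the result.

```python
import math


def ln_ic50_to_ic50(ln_ic50):
    """Convert a natural-log IC50 back to the concentration scale."""
    return math.exp(ln_ic50)


# Made-up ln(IC50) values for two placeholder drugs.
ln_values = {"drug_a": 0.0, "drug_b": math.log(2.5)}
ic50 = {drug: ln_ic50_to_ic50(value) for drug, value in ln_values.items()}
```

A lower ln(IC50) therefore corresponds to a lower IC50, i.e. a stronger response of the cell line to the drug.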

annotate_protein_expression

CellLine.annotate_protein_expression(adata, query_id='cell_line_name', reference_id='model_name', protein_information='protein_intensity', protein_id='uniprot_id', verbosity=5, copy=False)

Fetch protein expression.

For each cell, we fetch protein intensity values acquired using data-independent acquisition mass spectrometry (DIA-MS).

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str) – The column of .obs with cell line information. Defaults to “cell_line_name”.

  • reference_id (Literal['model_name', 'model_id']) – The type of cell line identifier in the metadata, model_name or model_id. Defaults to “model_name”.

  • protein_information (Literal['protein_intensity', 'zscore']) – The type of protein expression data to fetch, protein_intensity or zscore. Defaults to “protein_intensity”.

  • protein_id (Literal['uniprot_id', 'symbol']) – The protein identifier saved in the fetched metadata, uniprot_id or symbol. Defaults to “uniprot_id”.

  • verbosity (int | str) – The number of unmatched identifiers to print, either a non-negative integer or “all”. Defaults to 5.

  • copy (bool) – Determines whether a copy of the adata is returned. Defaults to False.

Return type:

AnnData

Returns:

Returns an AnnData object with protein expression annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.dialogue_example()
>>> adata.obs["cell_line_name"] = "MCF7"
>>> pt_metadata = pt.md.CellLine()
>>> adata_annotated = pt_metadata.annotate(
...     adata=adata, reference_id="cell_line_name", query_id="cell_line_name", copy=True
... )
>>> pt_metadata.annotate_protein_expression(adata_annotated)

correlate

CellLine.correlate(adata, identifier='DepMap_ID', metadata_key='bulk_rna_broad')

Correlate cell lines with annotated metadata.

Parameters:
  • adata (AnnData) – Input data object.

  • identifier (str) – Column in .obs containing cell line identifiers. Defaults to “DepMap_ID”.

  • metadata_key (str) – Key of the AnnData obsm for comparison with the X matrix. Defaults to “bulk_rna_broad”.

Return type:

tuple[DataFrame, DataFrame, DataFrame | None, DataFrame | None]

Returns:

Returns Pearson correlation coefficients and their corresponding p-values for matched and unmatched cell lines separately.
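
As an illustration of the statistic involved, the sketch below computes a Pearson correlation coefficient between one expression profile and two reference profiles in pure Python. It is not pertpy's implementation: the pearson helper and all numbers are hypothetical, and reading “matched” as a cell compared against the reference profile of its own cell line is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up profiles over the same four genes.
cell_profile = [1.0, 2.0, 3.0, 4.0]
own_line_reference = [2.0, 4.0, 6.0, 8.0]    # assumed "matched" comparison
other_line_reference = [4.0, 3.0, 2.0, 1.0]  # assumed "unmatched" comparison

r_matched = pearson(cell_profile, own_line_reference)
r_unmatched = pearson(cell_profile, other_line_reference)
```

In this toy case the matched comparison is perfectly positively correlated and the unmatched one perfectly negatively correlated; real profiles fall somewhere in between.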

lookup

CellLine.lookup()

Generate a LookUp object for CellLine.

The LookUp object provides an overview of the metadata to annotate. Each annotate_{metadata} function has a corresponding lookup function in the LookUp object, where users can search the reference_id in the metadata and compare with the query_id in their own data.

Return type:

LookUp

Returns:

A LookUp object specific for cell line annotation.

Examples

>>> import pertpy as pt
>>> pt_metadata = pt.md.CellLine()
>>> lookup = pt_metadata.lookup()

plot_correlation

CellLine.plot_correlation(adata, corr, pval, identifier='DepMap_ID', metadata_key='bulk_rna_broad', category='cell line', subset_identifier=None)

Visualise the correlation of cell lines with annotated metadata.

Parameters:
  • adata (AnnData) – Input data object.

  • corr (DataFrame) – Pearson correlation scores.

  • pval (DataFrame) – P-values for the Pearson correlation.

  • identifier (str) – Column in .obs containing the identifiers. Defaults to “DepMap_ID”.

  • metadata_key (str) – Key of the AnnData obsm for comparison with the X matrix. Defaults to “bulk_rna_broad”.

  • category (str) – The category for correlation comparison. Defaults to “cell line”.

  • subset_identifier (str | int | Iterable[str] | Iterable[int] | None) – Selected identifiers for scatter plot visualization between the X matrix and metadata_key. If not None, only the chosen cell line will be plotted, either specified as a value in identifier (string) or as an index number. If None, all cell lines will be plotted. Defaults to None.

Return type:

None

Returns:

None. The method only draws the plot and does not return a value.
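
The subset_identifier parameter accepts identifier values, index numbers, or iterables of either. A minimal sketch of how such a selection could be resolved into a list of identifiers is given below. The resolve_subset helper and the placeholder identifiers are hypothetical and do not reflect pertpy's internal code.

```python
def resolve_subset(identifiers, subset=None):
    """Resolve a subset specification into a list of identifiers.

    `subset` may be None (select everything), a single string or integer,
    or an iterable of strings and/or integers. Strings are taken as
    identifier values, integers as positions in `identifiers`.
    """
    if subset is None:
        return list(identifiers)
    if isinstance(subset, (str, int)):
        subset = [subset]
    resolved = []
    for item in subset:
        if isinstance(item, int):
            resolved.append(identifiers[item])
        elif item in identifiers:
            resolved.append(item)
        else:
            raise ValueError(f"Unknown identifier: {item!r}")
    return resolved


# Placeholder identifiers, not real cell line IDs.
identifiers = ["line_a", "line_b", "line_c"]
everything = resolve_subset(identifiers)
by_value = resolve_subset(identifiers, "line_b")
by_index = resolve_subset(identifiers, [0, 2])
```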