pertpy.metadata.CellLine

class CellLine

Utilities to fetch cell line metadata.

Methods table

annotate(adata[, query_id, reference_id, ...])

Annotate cell lines.

annotate_bulk_rna(adata[, query_id, ...])

Fetch bulk RNA expression from the Broad or Sanger.

annotate_from_gdsc(adata[, query_id, ...])

Fetch drug response data from GDSC.

annotate_from_prism(adata[, query_id, ...])

Fetch drug response data from PRISM.

annotate_protein_expression(adata[, ...])

Fetch protein expression.

correlate(adata[, identifier, metadata_key])

Correlate cell lines with annotated metadata.

lookup()

Generate a LookUp object for CellLineMetaData.

plot_correlation(adata, corr, pval, *[, ...])

Visualise the correlation of cell lines with annotated metadata.

Methods

CellLine.annotate(adata, query_id='DepMap_ID', reference_id='ModelID', fetch=None, cell_line_source='DepMap', verbosity=5, copy=False)

Annotate cell lines.

For each cell, we fetch cell line annotation from either the Dependency Map (DepMap) or the Genomics of Drug Sensitivity in Cancer project (Cancerrxgene).

Parameters:
  • adata (AnnData) – The AnnData object to annotate.

  • query_id (str, default: 'DepMap_ID') – The column of .obs with cell line information.

  • reference_id (str, default: 'ModelID') – The type of cell line identifier in the metadata, e.g. ModelID, CellLineName or StrippedCellLineName. If fetching cell line metadata from Cancerrxgene, it is recommended to choose “stripped_cell_line_name”.

  • fetch (list[str] | None, default: None) – The metadata to fetch.

  • cell_line_source (Literal['DepMap', 'Cancerrxgene'], default: 'DepMap') – The source of cell line metadata, DepMap or Cancerrxgene.

  • verbosity (int | str, default: 5) – The number of unmatched identifiers to print, either a non-negative integer or “all”.

  • copy (bool, default: False) – Determines whether a copy of adata is returned.

Return type:

AnnData

Returns:

Returns an AnnData object with cell line annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.dialogue_example()
>>> adata.obs["cell_line_name"] = "MCF7"
>>> pt_metadata = pt.md.CellLine()
>>> adata_annotated = pt_metadata.annotate(
...     adata=adata,
...     reference_id="cell_line_name",
...     query_id="cell_line_name",
...     fetch=["cell_line_name", "Age", "OncotreePrimaryDisease"],
...     copy=True,
... )
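
The matching of query_id against reference_id can be pictured as a left join of the per-cell table onto the cell line metadata. The sketch below uses plain Python dictionaries and invented records. The helper `annotate_rows` and its arguments are hypothetical and this is not pertpy's implementation; it only illustrates how identifiers are matched, which columns `fetch` selects, and how `verbosity` caps the number of unmatched identifiers that get reported.

```python
def annotate_rows(obs_rows, metadata_rows, query_id, reference_id, fetch, verbosity=5):
    """Left-join metadata onto per-cell rows (illustrative only)."""
    # Index the reference metadata by the chosen identifier type.
    by_id = {row[reference_id]: row for row in metadata_rows}

    annotated, unmatched = [], []
    for row in obs_rows:
        key = row[query_id]
        match = by_id.get(key)
        if match is None and key not in unmatched:
            unmatched.append(key)
        # Cells without a match keep their row and get None for fetched columns.
        extra = {col: (match[col] if match is not None else None) for col in fetch}
        annotated.append({**row, **extra})

    # verbosity caps how many unmatched identifiers are reported ("all" = no cap).
    reported = unmatched if verbosity == "all" else unmatched[:verbosity]
    return annotated, reported


# Toy records, invented for this example.
obs = [
    {"cell": "c1", "DepMap_ID": "ID-A"},
    {"cell": "c2", "DepMap_ID": "ID-B"},
    {"cell": "c3", "DepMap_ID": "ID-X"},
]
metadata = [
    {"ModelID": "ID-A", "Age": 40},
    {"ModelID": "ID-B", "Age": 55},
]

annotated, reported = annotate_rows(obs, metadata, "DepMap_ID", "ModelID", fetch=["Age"])
```

Here the third cell has no counterpart in the metadata, so it keeps an empty annotation and its identifier is the only one reported as unmatched.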

CellLine.annotate_bulk_rna(adata, query_id=None, cell_line_source='sanger', verbosity=5, gene_identifier='gene_ID', copy=False)

Fetch bulk RNA expression from the Broad or Sanger.

For each cell, we fetch the bulk RNA expression of its cell line from either the Broad or the Sanger cell line data.

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str, default: None) – The column of .obs with cell line information. Defaults to “cell_line_name” if cell_line_source is sanger, otherwise “DepMap_ID”.

  • cell_line_source (Literal['broad', 'sanger'], default: 'sanger') – The source of the bulk RNA expression data, either broad or sanger.

  • verbosity (int | str, default: 5) – The number of unmatched identifiers to print, either a non-negative integer or “all”.

  • gene_identifier (Literal['gene_name', 'gene_ID', 'both'], default: 'gene_ID') – The type of gene identifier saved in the fetched metadata, ‘gene_name’, ‘gene_ID’ or ‘both’.

  • copy (bool, default: False) – Determines whether a copy of the adata is returned.

Return type:

AnnData

Returns:

Returns an AnnData object with bulk RNA expression annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.dialogue_example()
>>> adata.obs["cell_line_name"] = "MCF7"
>>> pt_metadata = pt.md.CellLine()
>>> adata_annotated = pt_metadata.annotate(
...     adata=adata, reference_id="cell_line_name", query_id="cell_line_name", copy=True
... )
>>> pt_metadata.annotate_bulk_rna(adata_annotated)
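
The documented default for query_id depends on cell_line_source. A minimal sketch of that rule follows; `resolve_query_id` is a hypothetical helper written for illustration and is not part of pertpy.

```python
def resolve_query_id(query_id=None, cell_line_source="sanger"):
    """Return the .obs column to use, following the documented defaults."""
    if cell_line_source not in ("broad", "sanger"):
        raise ValueError(f"Unknown cell_line_source: {cell_line_source!r}")
    if query_id is not None:
        return query_id  # an explicit choice always wins
    # Documented fallback: cell line names for Sanger, DepMap IDs otherwise.
    return "cell_line_name" if cell_line_source == "sanger" else "DepMap_ID"
```

In practice this means that an AnnData object annotated by cell line name works with the Sanger default as is, while the Broad data expects a DepMap_ID column unless query_id is passed explicitly.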

CellLine.annotate_from_gdsc(adata, query_id='cell_line_name', reference_id='cell_line_name', query_perturbation='perturbation', reference_perturbation='drug_name', gdsc_dataset='gdsc_1', verbosity=5, copy=False)

Fetch drug response data from GDSC.

For each cell, we fetch drug response data as the natural log of the fitted IC50 and the AUC for its corresponding cell line and perturbation from the GDSC fitted data results file.

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str, default: 'cell_line_name') – The column of .obs with cell line information.

  • reference_id (Literal['cell_line_name', 'sanger_model_id', 'cosmic_id'], default: 'cell_line_name') – The type of cell line identifier in the metadata, cell_line_name, sanger_model_id or cosmic_id.

  • query_perturbation (str, default: 'perturbation') – The column of .obs with perturbation information.

  • reference_perturbation (Literal['drug_name', 'drug_id'], default: 'drug_name') – The type of perturbation in the metadata, drug_name or drug_id.

  • gdsc_dataset (Literal['gdsc_1', 'gdsc_2'], default: 'gdsc_1') – The GDSC dataset, 1 or 2, specified as ‘gdsc_1’ or ‘gdsc_2’. The GDSC1 dataset updates previous releases with additional drug screening data from the Sanger Institute and Massachusetts General Hospital. It covers 970 cell lines and 403 compounds with 333,292 IC50s. GDSC2 is new and has 243,466 IC50 results from the latest screening at the Sanger Institute.

  • verbosity (int | str, default: 5) – The number of unmatched identifiers to print, either a non-negative integer or ‘all’.

  • copy (bool, default: False) – Determines whether a copy of the adata is returned.

Return type:

AnnData

Returns:

Returns an AnnData object with drug response annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.mcfarland_2020()
>>> pt_metadata = pt.md.CellLine()
>>> pt_metadata.annotate_from_gdsc(adata, query_id="cell_line")
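
Drug response is looked up per (cell line, perturbation) pair and the IC50 is reported on a natural-log scale. The sketch below illustrates only those two ideas with invented values. `fitted` and `drug_response` are hypothetical names, and the real results file may already store log-transformed values rather than raw IC50s.

```python
import math

# Toy fitted results keyed by (cell line, drug); the values are invented.
fitted = {
    ("MCF7", "drugA"): {"ic50": 2.0, "auc": 0.8},
    ("A549", "drugA"): {"ic50": 0.5, "auc": 0.6},
}


def drug_response(cell_line, perturbation):
    """Return (ln IC50, AUC) for a cell line/drug pair, or (None, None)."""
    record = fitted.get((cell_line, perturbation))
    if record is None:
        return None, None  # the pair was not screened in this toy table
    return math.log(record["ic50"]), record["auc"]


ln_ic50, auc = drug_response("MCF7", "drugA")
```

On the log scale, an IC50 below 1 (in the unit of the source data) gives a negative value, so negative entries are expected and do not indicate an error.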

CellLine.annotate_from_prism(adata, query_id='DepMap_ID', query_perturbation='perturbation', verbosity=5, copy=False)

Fetch drug response data from PRISM.

For each cell, we fetch drug response data as IC50, EC50 and AUC for its corresponding cell line and perturbation from the PRISM fitted data results file. Note that all rows where either depmap_id or name is missing will be dropped.

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str, default: 'DepMap_ID') – The column of .obs with cell line information.

  • query_perturbation (str, default: 'perturbation') – The column of .obs with perturbation information.

  • verbosity (int | str, default: 5) – The number of unmatched identifiers to print, either a non-negative integer or ‘all’.

  • copy (bool, default: False) – Determines whether a copy of the adata is returned.

Return type:

AnnData

Returns:

Returns an AnnData object with drug response annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.mcfarland_2020()
>>> pt_metadata = pt.md.CellLine()
>>> pt_metadata.annotate_from_prism(adata, query_id="DepMap_ID")
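
The note about dropped rows amounts to a filter on the two key columns. A minimal sketch with invented rows follows; `drop_incomplete` is a hypothetical helper, not pertpy code.

```python
# Toy rows; the values are invented.
rows = [
    {"depmap_id": "ID-A", "name": "drugA", "auc": 0.7},
    {"depmap_id": None, "name": "drugA", "auc": 0.9},
    {"depmap_id": "ID-B", "name": None, "auc": 0.4},
]


def drop_incomplete(rows, required=("depmap_id", "name")):
    """Keep only rows where every required key has a non-missing value."""
    return [row for row in rows if all(row.get(key) is not None for key in required)]


usable = drop_incomplete(rows)
```

Only the first row survives, because each of the other two lacks one of the keys needed to match a cell line and a perturbation.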

CellLine.annotate_protein_expression(adata, query_id='cell_line_name', reference_id='model_name', protein_information='protein_intensity', protein_id='uniprot_id', verbosity=5, copy=False)

Fetch protein expression.

For each cell, we fetch protein intensity values acquired using data-independent acquisition mass spectrometry (DIA-MS).

Parameters:
  • adata (AnnData) – The data object to annotate.

  • query_id (str, default: 'cell_line_name') – The column of .obs with cell line information.

  • reference_id (Literal['model_name', 'model_id'], default: 'model_name') – The type of cell line identifier in the metadata, model_name or model_id.

  • protein_information (Literal['protein_intensity', 'zscore'], default: 'protein_intensity') – The type of protein expression data to fetch, protein_intensity or zscore.

  • protein_id (Literal['uniprot_id', 'symbol'], default: 'uniprot_id') – The protein identifier saved in the fetched metadata, uniprot_id or symbol.

  • verbosity (int | str, default: 5) – The number of unmatched identifiers to print, either a non-negative integer or “all”.

  • copy (bool, default: False) – Determines whether a copy of the adata is returned.

Return type:

AnnData

Returns:

Returns an AnnData object with protein expression annotation.

Examples

>>> import pertpy as pt
>>> adata = pt.dt.dialogue_example()
>>> adata.obs["cell_line_name"] = "MCF7"
>>> pt_metadata = pt.md.CellLine()
>>> adata_annotated = pt_metadata.annotate(
...     adata=adata, reference_id="cell_line_name", query_id="cell_line_name", copy=True
... )
>>> pt_metadata.annotate_protein_expression(adata_annotated)

CellLine.correlate(adata, identifier='DepMap_ID', metadata_key='bulk_rna_broad')

Correlate cell lines with annotated metadata.

Parameters:
  • adata (AnnData) – Input data object.

  • identifier (str, default: 'DepMap_ID') – Column in .obs containing cell line identifiers.

  • metadata_key (str, default: 'bulk_rna_broad') – Key of the AnnData obsm for comparison with the X matrix.

Return type:

tuple[DataFrame, DataFrame, DataFrame | None, DataFrame | None]

Returns:

Returns Pearson correlation coefficients and their corresponding p-values for matched and unmatched cell lines separately.
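
To make the reported quantity concrete, the sketch below computes a Pearson coefficient between one observed expression profile and two reference profiles: its own cell line (a matched pair) and a different one (an unmatched pair). All names and values are invented for illustration. How pertpy aggregates cells per cell line and derives p-values is not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy profiles over four genes; the values are invented.
observed = {"ID-A": [1.0, 2.0, 3.0, 4.0]}
reference = {"ID-A": [2.0, 4.0, 6.0, 8.0], "ID-B": [4.0, 3.0, 2.0, 1.0]}

# Correlate the observed ID-A profile with every reference profile.
corr = {ref_id: pearson(observed["ID-A"], profile) for ref_id, profile in reference.items()}
```

In this toy case the matched pair is perfectly correlated and the unmatched pair is perfectly anti-correlated; real data would show a less clear-cut separation.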

CellLine.lookup()

Generate a LookUp object for CellLineMetaData.

The LookUp object provides an overview of the metadata to annotate. Each annotate_{metadata} function has a corresponding lookup function in the LookUp object, where users can search the reference_id in the metadata and compare with the query_id in their own data.

Return type:

LookUp

Returns:

A LookUp object specific for cell line annotation.

Examples

>>> import pertpy as pt
>>> pt_metadata = pt.md.CellLine()
>>> lookup = pt_metadata.lookup()

CellLine.plot_correlation(adata, corr, pval, *, identifier='DepMap_ID', metadata_key='bulk_rna_broad', category='cell line', subset_identifier=None, return_fig=False)

Visualise the correlation of cell lines with annotated metadata.

Parameters:
  • adata (AnnData) – Input data object.

  • corr (DataFrame) – Pearson correlation scores.

  • pval (DataFrame) – P-values for Pearson correlation.

  • identifier (str, default: 'DepMap_ID') – Column in .obs containing the identifiers.

  • metadata_key (str, default: 'bulk_rna_broad') – Key of the AnnData obsm for comparison with the X matrix.

  • category (str, default: 'cell line') – The category for correlation comparison.

  • subset_identifier (str | int | Iterable[str] | Iterable[int] | None, default: None) – Selected identifiers for scatter plot visualization between the X matrix and metadata_key. If not None, only the chosen cell line will be plotted, either specified as a value in identifier (string) or as an index number. If None, all cell lines will be plotted.

  • return_fig (bool, default: False) – If True, returns the figure of the plot, which can be used for saving.

Return type:

Figure | None

Returns:

The figure of the plot if return_fig is True, otherwise None.
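
The subset_identifier parameter accepts None, a single identifier, a single index, or an iterable of either. One way to picture how such a value resolves to a list of cell lines is sketched below; `select_identifiers` is a hypothetical helper and an assumption about the semantics described in the parameter list, not pertpy's code.

```python
def select_identifiers(all_ids, subset_identifier=None):
    """Resolve subset_identifier (None, name, index, or an iterable of either)."""
    if subset_identifier is None:
        return list(all_ids)  # no subset given: plot every cell line
    if isinstance(subset_identifier, (str, int)):
        subset_identifier = [subset_identifier]
    # Integers are treated as positions, strings as identifier values.
    return [all_ids[item] if isinstance(item, int) else item for item in subset_identifier]


ids = ["ID-A", "ID-B", "ID-C"]
chosen = select_identifiers(ids, [0, "ID-C"])
```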