pertpy.metadata.LookUp

class LookUp(type='cell_line', transfer_metadata=None)

Generate a LookUp object for different types of metadata.

Methods table

available_bulk_rna([cell_line_source, ...])

A brief summary of bulk RNA expression data.

available_cell_lines([cell_line_source, ...])

A brief summary of cell line metadata.

available_compounds([query_id_list, ...])

A brief summary of compound annotation.

available_drug_annotation([...])

A brief summary of drug annotation.

available_drug_response([gdsc_dataset, ...])

A brief summary of drug response data.

available_genes_annotation([reference_id, ...])

A brief summary of gene annotation metadata.

available_moa([query_id_list, target_list])

A brief summary of MoA annotation.

available_protein_expression([reference_id, ...])

A brief summary of protein expression data.

Methods

LookUp.available_bulk_rna(cell_line_source='sanger', query_id_list=None)

A brief summary of bulk RNA expression data.

Parameters:
  • cell_line_source (Literal['broad', 'sanger'], default: 'sanger') – The source of the RNA-seq data, broad or sanger.

  • query_id_list (Sequence[str] | None, default: None) – Unique cell line identifiers to test the number of matched ids present in the metadata. If set to None, the query of metadata identifiers will be disabled.

Return type:

None
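
The query_id_list parameter recurs across the methods on this page, so a small illustration of what "the number of matched ids" means may help. The following is a minimal sketch in plain Python, not pertpy's implementation: the function name count_matched_ids and the identifiers are made up for illustration, and the real summary may report additional details.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


def count_matched_ids(
    query_id_list: Sequence[str], reference_ids: Iterable[str]
) -> tuple[int, int]:
    """Return (matched, unmatched) counts for the unique query identifiers."""
    reference = set(reference_ids)
    unique_queries = set(query_id_list)  # duplicates in the query count once
    matched = len(unique_queries & reference)
    return matched, len(unique_queries) - matched


# Made-up identifiers, for illustration only.
reference_ids = ["ID-A", "ID-B", "ID-C"]
query_id_list = ["ID-A", "ID-C", "ID-X"]

matched, unmatched = count_matched_ids(query_id_list, reference_ids)
print(f"{matched} matched, {unmatched} unmatched")  # prints "2 matched, 1 unmatched"
```

Passing query_id_list=None to the documented methods skips this kind of comparison entirely.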

LookUp.available_cell_lines(cell_line_source='DepMap', reference_id='ModelID', query_id_list=None)

A brief summary of cell line metadata.

Parameters:
  • cell_line_source (Literal['DepMap', 'Cancerrxgene'], default: 'DepMap') – The source of the cell line annotation, DepMap or Cancerrxgene.

  • reference_id (str, default: 'ModelID') – The type of cell line identifier in the metadata, e.g. ModelID, CellLineName or StrippedCellLineName. When fetching cell line metadata from Cancerrxgene, it is recommended to choose "stripped_cell_line_name".

  • query_id_list (Sequence[str] | None, default: None) – Unique cell line identifiers to test the number of matched ids present in the metadata. If set to None, the query of metadata identifiers will be disabled.

Return type:

None
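
Because the recommended reference_id depends on the source, it can be convenient to pick it automatically. The sketch below is one possible wrapper, not part of pertpy: summarize_cell_lines and DEFAULT_REFERENCE_ID are hypothetical names, and lookup is assumed to be an already constructed LookUp instance (how it is obtained is outside the scope of this sketch). Only the keyword arguments documented above are used.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any

# Identifier type to use per source, following the recommendation in the
# reference_id description. The mapping itself is an assumption of this sketch.
DEFAULT_REFERENCE_ID = {
    "DepMap": "ModelID",
    "Cancerrxgene": "stripped_cell_line_name",
}


def summarize_cell_lines(
    lookup: Any,
    cell_line_source: str = "DepMap",
    query_id_list: Sequence[str] | None = None,
) -> None:
    """Call available_cell_lines with a reference_id suited to the source."""
    if cell_line_source not in DEFAULT_REFERENCE_ID:
        raise ValueError(f"Unknown cell_line_source: {cell_line_source!r}")
    lookup.available_cell_lines(
        cell_line_source=cell_line_source,
        reference_id=DEFAULT_REFERENCE_ID[cell_line_source],
        query_id_list=query_id_list,
    )
```

A call such as summarize_cell_lines(lookup, "Cancerrxgene", my_ids) would then query by stripped cell line name, where my_ids is the caller's own list of identifiers.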

LookUp.available_compounds(query_id_list=None, query_id_type='name')

A brief summary of compound annotation.

Parameters:
  • query_id_list (Sequence[str] | None, default: None) – Unique compounds to test the number of matched ones present in the metadata. If set to None, the query of compound identifiers will be disabled.

  • query_id_type (Literal['name', 'cid'], default: 'name') – The type of compound identifiers, name or cid.

Return type:

None

LookUp.available_drug_annotation(drug_annotation_source='chembl', query_id_list=None, query_id_type='target')

A brief summary of drug annotation.

Parameters:
  • drug_annotation_source (Literal['chembl', 'dgidb', 'pharmgkb'], default: 'chembl') – The source of the drug annotation data, chembl, dgidb or pharmgkb.

  • query_id_list (Sequence[str] | None, default: None) – Unique target or compound names to test the number of matched ones present in the metadata. If set to None, the query of compound identifiers will be disabled.

  • query_id_type (Literal['target', 'compound', 'disease'], default: 'target') – The type of identifier: target, compound or disease (pharmgkb only).

Return type:

None
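
Since 'disease' is only meaningful for the pharmgkb source, a caller may want to check the argument combination before invoking the method. The following is a hedged sketch of such a check; check_drug_annotation_args and ALLOWED_QUERY_ID_TYPES are hypothetical names, and pertpy itself may validate (or not validate) these arguments differently.

```python
from __future__ import annotations

# Allowed identifier types per source, derived from the parameter description
# above: 'disease' is listed as pharmgkb only.
ALLOWED_QUERY_ID_TYPES = {
    "chembl": {"target", "compound"},
    "dgidb": {"target", "compound"},
    "pharmgkb": {"target", "compound", "disease"},
}


def check_drug_annotation_args(
    drug_annotation_source: str = "chembl", query_id_type: str = "target"
) -> None:
    """Raise ValueError if the source and identifier type do not fit together."""
    allowed = ALLOWED_QUERY_ID_TYPES.get(drug_annotation_source)
    if allowed is None:
        raise ValueError(f"Unknown drug_annotation_source: {drug_annotation_source!r}")
    if query_id_type not in allowed:
        raise ValueError(
            f"query_id_type {query_id_type!r} is not available for "
            f"{drug_annotation_source!r}; choose one of {sorted(allowed)}"
        )


check_drug_annotation_args("pharmgkb", "disease")  # passes silently
```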

LookUp.available_drug_response(gdsc_dataset=1, reference_id='cell_line_name', query_id_list=None, reference_perturbation='drug_name', query_perturbation_list=None)

A brief summary of drug response data.

Parameters:
  • gdsc_dataset (Literal[1, 2], default: 1) – The GDSC dataset, 1 or 2. The GDSC1 dataset updates previous releases with additional drug screening data from the Wellcome Sanger Institute and Massachusetts General Hospital. It covers 970 cell lines and 403 compounds with 333,292 IC50s. GDSC2 is new and has 243,466 IC50 results from the latest screening at the Wellcome Sanger Institute using improved experimental procedures.

  • reference_id (Literal['cell_line_name', 'sanger_model_id', 'cosmic_id'], default: 'cell_line_name') – The type of cell line identifier in the metadata, cell_line_name, sanger_model_id or cosmic_id.

  • query_id_list (Sequence[str] | None, default: None) – Unique cell line identifiers to test the number of matched ids present in the metadata. If set to None, the query of metadata identifiers will be disabled.

  • reference_perturbation (Literal['drug_name', 'drug_id'], default: 'drug_name') – The perturbation information in the metadata, drug_name or drug_id.

  • query_perturbation_list (Sequence[str] | None, default: None) – Unique perturbation types to test the number of matched ones present in the metadata. If set to None, the query of perturbation types will be disabled.

Return type:

None
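
To compare how well a set of cell lines and drugs is covered by GDSC1 versus GDSC2, the same query can be issued against both datasets. The sketch below assumes lookup is an existing LookUp instance; summarize_drug_response is a hypothetical helper name, and only the keyword arguments documented above are passed.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def summarize_drug_response(
    lookup: Any,
    cell_line_names: Sequence[str] | None = None,
    drug_names: Sequence[str] | None = None,
) -> None:
    """Request the drug response summary for GDSC1 and then GDSC2."""
    for gdsc_dataset in (1, 2):
        lookup.available_drug_response(
            gdsc_dataset=gdsc_dataset,
            reference_id="cell_line_name",
            query_id_list=cell_line_names,  # matched against cell line names
            reference_perturbation="drug_name",
            query_perturbation_list=drug_names,  # matched against drug names
        )
```

The cell line query and the perturbation query are independent parameters, so either list can be left as None to skip that part of the summary.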

LookUp.available_genes_annotation(reference_id='ensembl_gene_id', query_id_list=None)

A brief summary of gene annotation metadata.

Parameters:
  • reference_id (Literal['gene_id', 'ensembl_gene_id', 'hgnc_id', 'hgnc_symbol'], default: 'ensembl_gene_id') – The type of gene identifier in the metadata: gene_id, ensembl_gene_id, hgnc_id or hgnc_symbol.

  • query_id_list (Sequence[str] | None, default: None) – Unique gene identifiers to test the number of matched ids present in the metadata.

Return type:

None

LookUp.available_moa(query_id_list=None, target_list=None)

A brief summary of MoA annotation.

Parameters:
  • query_id_list (Sequence[str] | None, default: None) – Unique perturbagens to test the number of matched ones present in the metadata. If set to None, the query of metadata perturbagens will be disabled.

  • target_list (Sequence[str] | None, default: None) – Unique molecular targets to test the number of matched ones present in the metadata. If set to None, the comparison of molecular targets in the query of metadata perturbagens will be disabled.

Return type:

None

LookUp.available_protein_expression(reference_id='model_name', query_id_list=None)

A brief summary of protein expression data.

Parameters:
  • reference_id (Literal['model_name', 'model_id'], default: 'model_name') – The type of cell line identifier in the metadata, model_name or model_id.

  • query_id_list (Sequence[str] | None, default: None) – Unique cell line identifiers to test the number of matched ids present in the metadata. If set to None, the query of metadata identifiers will be disabled.

Return type:

None