CMPD: cancer mutant proteome database.
Bottom Line: Whole-exome sequencing, which centres on the protein-coding regions of disease/cancer-associated genes, is to date the most cost-effective method for deciphering the association between genetic alterations and diseases. Further functional and clinical interrogation of these sequence variants must rely on extensive cross-platform integration of sequencing information and on a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. Although such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available that collects the mutant protein sequences corresponding to recent large-scale genomic data.
Affiliation: Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan; Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan.
Mentions: Figure 1 shows an overview of CMPD. To generate this database, over 2 million genetic alterations were retrieved from large-scale cancer genomics studies (1–3) and subsequently annotated using information from a variety of external databases. To facilitate functional interpretation of these alterations, CMPD collects only nucleotide sequence variants that alter protein sequences. The current version of CMPD contains 3 379 122 mutated protein sequences (including isoforms) corresponding to 1 661 156 non-synonymous coding variants. The data sources and the distribution of mutation types are summarized in Table 1. Supplementary Figure S1 is a Venn diagram illustrating the overlap of protein-altering mutations between CMPD and widely used resources such as UniProt (11), COSMIC (8) and IARC TP53 (9). Because the mutation events in CMPD were gathered from cancer cell lines and TCGA cancer samples, CMPD covers a large proportion of COSMIC mutations and all mutation events in the IARC TP53 database. UniProt is dedicated to collecting wild-type protein sequences with curated protein information for all species, and the human mutant proteins listed in the UniProt variant database (http://www.uniprot.org/docs/humsavar) are those related to diseases or polymorphisms. It is thus not surprising that only a few missense point mutations overlap with CMPD. Importantly, a large proportion of the mutation events in CMPD, including missense, nonsense and frame-shift mutations, are not catalogued in COSMIC, indicating that CMPD is well suited to identifying novel cancer biomarkers.
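The filtering and sequence-generation steps described above can be illustrated with a minimal sketch. This is not CMPD's actual pipeline; it assumes protein changes are written in a hypothetical single-letter notation such as `R175H` (missense) or `R196*` (nonsense), it skips frame-shifts because those require the nucleotide sequence, and the function names `apply_protein_change` and `overlap_counts` are hypothetical. The second helper shows how the counts behind a two-set Venn diagram (for example, CMPD versus COSMIC) could be computed from sets of mutation keys.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical notation: reference residue, 1-based position, alternate residue.
# "*" as the alternate residue denotes a stop (nonsense) mutation.
CHANGE_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])([1-9]\d*)([ACDEFGHIKLMNPQRSTVWY*])$")


def apply_protein_change(wild_type: str, change: str) -> Optional[str]:
    """Return the mutant protein sequence, or None if the change is synonymous.

    Synonymous changes return None because they leave the protein unchanged and
    would therefore be excluded from a protein-altering variant collection.
    """
    m = CHANGE_RE.match(change)
    if m is None:
        raise ValueError(f"unsupported change notation: {change!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    idx = pos - 1  # convert the 1-based protein position to a 0-based index
    if idx >= len(wild_type) or wild_type[idx] != ref:
        raise ValueError(f"reference residue mismatch at position {pos}")
    if alt == ref:
        return None  # synonymous: no change at the protein level
    if alt == "*":
        return wild_type[:idx]  # nonsense: protein is truncated before the stop
    return wild_type[:idx] + alt + wild_type[idx + 1:]  # missense substitution


def overlap_counts(a: set[str], b: set[str]) -> dict[str, int]:
    """Region sizes for a two-set Venn diagram of mutation keys."""
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}
```

For example, with the toy sequence `"MEEPQSDPSV"`, `apply_protein_change("MEEPQSDPSV", "Q5R")` returns `"MEEPRSDPSV"`, `"S6*"` returns the truncated `"MEEPQ"`, and `"P4P"` returns `None`. Mutation keys such as `"TP53:R175H"` are likewise an illustrative assumption; any consistent gene-plus-change identifier would serve for the overlap count.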