Comprehensive comparison of large-scale tissue expression datasets.

Santos A, Tsafou K, Stolte C, Pletscher-Frankild S, O'Donoghue SI, Jensen LJ - PeerJ (2015)

Bottom Line: Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression.


Affiliation: Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

ABSTRACT
For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface.



Figure 5: Analysis of the proteomic datasets. (A) To make the data from HPA IHC and HPM comparable with other datasets, we developed a quality scoring scheme for each. The quality scores show good correlation with the fold enrichment for associations from the UniProtKB and the mRNA reference sets. (B) The distribution of expression breadth is consistent with the results of the transcriptome datasets in the case of HPM, whereas the results for HPA IHC vary qualitatively between confidence levels.
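The expression breadth shown in panel B can be illustrated with a minimal sketch. It assumes that breadth means the number of distinct tissues in which a protein is detected at or above a chosen confidence score; the function names, the score scale, and the example associations are hypothetical and are not taken from the paper.

```python
from collections import Counter

def expression_breadth(associations, min_score):
    """Count the distinct tissues per protein among associations scoring >= min_score.

    `associations` is an iterable of (protein, tissue, score) tuples.
    Proteins with no association above the cutoff are omitted.
    """
    tissues_per_protein = {}
    for protein, tissue, score in associations:
        if score >= min_score:
            tissues_per_protein.setdefault(protein, set()).add(tissue)
    return {protein: len(tissues) for protein, tissues in tissues_per_protein.items()}

def breadth_distribution(associations, min_score):
    """Map each breadth value to the number of proteins that have that breadth."""
    return Counter(expression_breadth(associations, min_score).values())

# Hypothetical example data, not real measurements.
example = [
    ("P1", "liver", 3.0),
    ("P1", "brain", 2.0),
    ("P1", "heart", 1.0),
    ("P2", "liver", 3.0),
]

for cutoff in (1.0, 2.0):
    print(cutoff, dict(breadth_distribution(example, cutoff)))
```

Computing the distribution at several cutoffs is one way to see whether its shape changes with confidence level, which is the kind of comparison the caption describes for HPA IHC.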

Mentions: With the scoring schemes defined, we analyzed the two proteomics datasets with respect to enrichment for associations from both the UniProtKB and mRNA reference sets (Fig. 5A). Higher scores were correlated with higher enrichment, validating that the proposed scoring schemes work. Despite looking at proteins instead of transcripts, the proteomics datasets show worse fold enrichment than the transcriptome datasets when compared to the UniProtKB gold standard. This is consistent with the criticism raised over the quality of the HPM data based on an analysis of olfactory receptors expressed in multiple tissues (Ezkurdia et al., 2014), which demonstrated a high percentage of false positives in this dataset. In the case of HPA IHC, this is especially true for data based on only a single antibody.
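The excerpt above does not spell out how fold enrichment is calculated. As a rough sketch only, it can be read as the fraction of a dataset's protein-tissue pairs found in a gold standard, divided by the fraction expected if pairs were drawn at random from the proteins and tissues that the gold standard covers. This definition, the function names, and the toy data are assumptions for illustration and may differ from the authors' procedure.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (protein, tissue)

def fold_enrichment(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """Observed gold-standard hit rate divided by the hit rate expected by chance."""
    proteins = {p for p, _ in gold}
    tissues = {t for _, t in gold}
    # Only pairs whose protein and tissue both occur in the gold standard can be judged.
    judged = {(p, t) for p, t in predicted if p in proteins and t in tissues}
    if not judged:
        return float("nan")
    observed = len(judged & gold) / len(judged)
    expected = len(gold) / (len(proteins) * len(tissues))
    return observed / expected

def enrichment_by_cutoff(scores: Dict[Pair, float], gold: Set[Pair],
                         cutoffs: Iterable[float]) -> Dict[float, float]:
    """Fold enrichment of the pairs scoring at or above each cutoff."""
    return {
        c: fold_enrichment({pair for pair, s in scores.items() if s >= c}, gold)
        for c in cutoffs
    }

# Hypothetical toy data: two gold-standard pairs and four scored pairs.
gold = {("P1", "liver"), ("P2", "brain")}
scores = {
    ("P1", "liver"): 3.0,
    ("P2", "brain"): 2.5,
    ("P1", "brain"): 1.0,
    ("P2", "liver"): 0.5,
}

print(enrichment_by_cutoff(scores, gold, [0.0, 1.0, 2.0]))
```

In this toy case the enrichment rises as the cutoff increases, which is the pattern described above as evidence that a scoring scheme works.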

