Comprehensive comparison of large-scale tissue expression datasets.

Santos A, Tsafou K, Stolte C, Pletscher-Frankild S, O'Donoghue SI, Jensen LJ - PeerJ (2015)

Bottom Line: Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression.


Affiliation: Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

ABSTRACT
For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface.



Figure 3: Consistency of the transcriptome datasets. We assessed the consistency of the five transcriptome datasets by calculating the overlap of gene–tissue associations for the shared genes and tissues. At all levels of confidence, we observe surprisingly good agreement, with the largest count in each Venn diagram representing associations found by all five datasets.

Mentions: The previous analysis showed that the global trends in terms of tissue specificity are similar across the transcriptome datasets. That, however, does not imply that the datasets necessarily agree on which genes are expressed where. To quantify the agreement, we focused on the five tissues and 3,254 genes covered by all the transcriptome datasets. Comparing the five transcriptome datasets, we saw that genes are assigned to tissues with high consistency between datasets at all three confidence levels (Fig. 3) (P < 10^-15 for all pairwise overlaps). At medium confidence, 39.2% (5679/14504) of gene–tissue associations are common to all datasets and 65.8% (9537/14504) are common to at least four of the five datasets (Data S2).
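
The overlap calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the dataset labels A to E, and the toy gene–tissue pairs are all hypothetical. It also assumes that the denominator is the union of associations reported by at least one dataset, which the text does not state explicitly.

```python
from collections import Counter

def support_counts(datasets):
    """Count, for each (gene, tissue) pair, how many datasets report it.

    `datasets` maps a dataset label to a set of (gene, tissue) pairs that
    has already been restricted to the shared genes and tissues.
    """
    counts = Counter()
    for pairs in datasets.values():
        counts.update(set(pairs))  # each dataset contributes at most once per pair
    return counts

def fraction_supported(counts, min_datasets):
    """Fraction of all observed pairs reported by at least `min_datasets` datasets.

    Assumption: the denominator is the union of pairs seen in any dataset.
    """
    total = len(counts)
    if total == 0:
        return 0.0
    hits = sum(1 for n in counts.values() if n >= min_datasets)
    return hits / total

# Hypothetical toy data: five datasets, four distinct gene-tissue pairs.
toy = {
    "A": {("g1", "heart"), ("g2", "liver"), ("g3", "kidney")},
    "B": {("g1", "heart"), ("g2", "liver"), ("g3", "kidney")},
    "C": {("g1", "heart"), ("g2", "liver")},
    "D": {("g1", "heart"), ("g2", "liver")},
    "E": {("g1", "heart"), ("g4", "heart")},
}

counts = support_counts(toy)
in_all_five = fraction_supported(counts, 5)
in_at_least_four = fraction_supported(counts, 4)

# The percentages quoted in the text follow directly from the reported counts.
pct_all = round(100 * 5679 / 14504, 1)
pct_four_plus = round(100 * 9537 / 14504, 1)
```

On the toy data, one of the four pairs is found by all five datasets and two are found by at least four. The last two lines recompute the quoted 39.2% and 65.8% from the counts given in the text.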
