Parallel deep transcriptome and proteome analysis of zebrafish larvae.

Palmblad M, Henkel CV, Dirks RP, Meijer AH, Deelder AM, Spaink HP - BMC Res Notes (2013)

Bottom Line: We compared Agilent custom-made expression microarrays with Illumina deep sequencing for RNA analysis, showing as expected a high degree of correlation of expression of a common set of 18,230 genes. Gene expression was also found to correlate with the abundance of 963 distinct proteins, with several categories of genes as exceptions. By comparing state-of-the-art transcriptomic and proteomic technologies on samples derived from the same group of organisms we have for the first time benchmarked the differences in these technologies with regard to sensitivity and bias towards detection of particular gene categories in zebrafish.


Affiliation: Center for Proteomics and Metabolomics, Leiden University Medical Center, Zone L04-Q, P.O. Box 9600, 2300 RC, Leiden, The Netherlands. n.m.palmblad@lumc.nl.

ABSTRACT

Background: Sensitivity and throughput of transcriptomic and proteomic technologies have advanced tremendously in recent years. With the use of deep sequencing of RNA samples (RNA-seq) and mass spectrometry technology for protein identification and quantitation, it is now feasible to compare gene and protein expression on a massive scale and for any organism for which genomic data are available. Although these technologies are currently applied to many research questions in various model systems ranging from cell cultures to the whole-organism level, there are few comparative studies of these technologies in the same system, let alone on the same samples. Here we present a comparison between gene and protein expression in embryos of zebrafish, which is an emerging model in disease studies.

Results: We compared Agilent custom-made expression microarrays with Illumina deep sequencing for RNA analysis, showing as expected a high degree of correlation of expression of a common set of 18,230 genes. Gene expression was also found to correlate with the abundance of 963 distinct proteins, with several categories of genes as exceptions. These exceptions include ribosomal proteins, histones and vitellogenins, for which biological and technical explanations are discussed.

Conclusions: By comparing state-of-the-art transcriptomic and proteomic technologies on samples derived from the same group of organisms we have for the first time benchmarked the differences in these technologies with regard to sensitivity and bias towards detection of particular gene categories in zebrafish. Our datasets submitted to public repositories are a good starting point for researchers interested in disease progression in zebrafish at a stage of development highly suited for high-throughput screening technologies.


Figure 3: Correlation between mRNA and protein detection signals. Plots of transcript and protein detection levels for 963 genes detected by all three technologies (left). Every point represents a gene annotation. Correspondence is expressed by Spearman’s correlation coefficients (ρ). In the detailed view of the mRNA abundance assayed by RNA-seq versus protein abundance (right), three categories of proteins/transcripts that do not conform to the overall correlation trend are highlighted: ribosomal (red, GO cellular component ribosome), histones (blue, based on gene descriptions) and vitellogenins (green, based on gene descriptions). Crosses are multiple gene matches (one protein/many genes or many proteins/many genes).

Mentions: For each of the seven coloured categories in the Venn diagram of Figure 1, we compared the detection levels in the boxplots of Figure 2. The total overlap of the annotated genes detected using all three technologies is shown in white. In general, the genes detected using all technologies (Figure 2, first columns) have higher protein/mRNA abundance than genes detected using only a single technology (Figure 2, columns 4-6). For LC-MS/MS and microarray, the last one and two columns, respectively, summarize signals that cannot be linked to Ensembl gene IDs. In the custom microarray used, a majority of these probes were designed for exons that were either dubious or possibly linked to differential splicing. Future reannotation of the zebrafish genome will undoubtedly lead to removal of many of these probes from the design, and they were therefore not analyzed further. It is of interest to note that, outside the set common to all three technologies, the overlap with the proteomics data is larger for RNA-seq (164 annotations, in yellow) than for microarrays (16 annotations, in pink), even though the expression levels are in the same range. This result emphasizes the advantage of an unbiased deep sequencing approach over a biased microarray approach in transcriptome analyses. We mostly focused on the overlapping set of 963 genes detected by all three technologies. As shown in Figure 3, microarray and RNA-seq levels exhibit relatively high correspondence (Spearman’s correlation coefficient 0.62), but with a noticeable bias at high signal strengths. The correlation between the transcriptomic technologies and proteomics is weaker, e.g. 0.14 for RNA-seq versus MS.
However, when we zoom in on the mRNA abundance assayed by RNA-seq versus protein abundance (Figure 3, right), we can see that many genes that do not correlate in expression levels belong to three well-known gene categories: ribosomal (red, GO cellular component ribosome), histones (blue, based on gene descriptions) and vitellogenins (green, based on gene descriptions). Based on the predicted functions of these groups of genes we can explain why there are such distinct differences between the transcriptome and proteome levels. Most obviously, the vitellogenins are maternally expressed proteins whose genes are transcribed in the female liver and not in the embryos [14]. The vitellogenins are transported from the liver to the gonad and deposited in the eggs. Since the proteins are expected to be very stable in the embryo, a much higher level of the protein than of the mRNA is expected at 5 days post fertilization. The much higher level of ribosomal protein transcripts relative to the corresponding proteins can be explained by several of these transcripts also functioning as untranslated RNAs. Histone mRNAs are generally not polyadenylated and will therefore be underrepresented in the RNA-seq data, because polyadenylated mRNA was captured using poly-dT primers prior to random-primed cDNA synthesis. In addition, histones are DNA-binding proteins with many positively charged amino acids, and so ionize and fragment well in positive-mode electrospray tandem mass spectrometry. Leaving out mappings to multiple genes increases the Spearman’s correlation between RNA-seq and MS data to 0.26, and additionally leaving out the three special cases identified and explained above increases it to 0.30.
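
The stepwise filtering described above (first dropping annotations that map to multiple genes, then dropping the three special categories) can be illustrated with a minimal Python sketch. The records below are invented toy values, not data from the study, and the field layout and category labels are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of average ranks, using only the standard library.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy records (hypothetical, for illustration only):
# (annotation, category, maps_to_multiple_genes, rna_level, protein_level)
records = [
    ("gene_a", "other", False, 1.0, 2.0),
    ("gene_b", "other", False, 2.0, 3.0),
    ("gene_c", "other", False, 3.0, 5.0),
    ("gene_d", "other", False, 4.0, 4.0),
    ("gene_e", "vitellogenin", False, 0.5, 9.0),  # protein high, mRNA low
    ("gene_f", "ribosomal", False, 9.0, 1.0),     # mRNA high, protein low
    ("gene_g", "histone", True, 0.2, 8.0),        # also a multi-gene match
]

SPECIAL_CATEGORIES = {"ribosomal", "histone", "vitellogenin"}


def rho_for(rows):
    """Spearman's rho between the RNA and protein columns of the given rows."""
    return spearman_rho([r[3] for r in rows], [r[4] for r in rows])


# Step 1: all annotations. Step 2: unique gene matches only.
# Step 3: unique matches with the three special categories removed.
unique_rows = [r for r in records if not r[2]]
filtered_rows = [r for r in unique_rows if r[1] not in SPECIAL_CATEGORIES]

rho_all = rho_for(records)
rho_unique = rho_for(unique_rows)
rho_filtered = rho_for(filtered_rows)

print(f"all: {rho_all:.2f}, unique: {rho_unique:.2f}, filtered: {rho_filtered:.2f}")
```

With real data, each record would carry the RNA-seq and MS signal for one gene annotation. The toy values are chosen only so that the coefficient rises at each filtering step, mirroring the direction of the reported change and not its magnitude.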

