Comparative analysis of proteome and transcriptome variation in mouse.

Ghazalpour A, Bennett B, Petyuk VA, Orozco L, Hagopian R, Mungrue IN, Farber CR, Sinsheimer J, Kang HM, Furlotte N, Park CC, Wen PZ, Brewer H, Weitz K, Camp DG, Pan C, Yordanova R, Neuhaus I, Tilford C, Siemers N, Gargalovic P, Eskin E, Kirchgessner T, Smith DJ, Smith RD, Lusis AJ - PLoS Genet. (2011)

Bottom Line: For example, differential splicing clearly affects the analyses for certain genes; but, based on deep sequencing, this does not substantially contribute to the overall estimate of the correlation. Using correlation analysis, we found that a low number of clinical trait relationships are preserved between the protein and mRNA gene products and that the majority of such relationships are specific to either the protein levels or transcript levels. In light of the widespread use of high-throughput technologies in both clinical and basic research, the results presented have practical as well as basic implications.


Affiliation: Department of Medicine/Division of Cardiology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America. aghazalp@ucla.edu

ABSTRACT
The relationships between the levels of transcripts and the levels of the proteins they encode have not been examined comprehensively in mammals, although previous work in plants and yeast suggests a surprisingly modest correlation. We have examined this issue using a genetic approach in which natural variations were used to perturb both transcript levels and protein levels among inbred strains of mice. We quantified over 5,000 peptides and over 22,000 transcripts in livers of 97 inbred and recombinant inbred strains and focused on the 7,185 most heritable transcripts and 486 most reliable proteins. The transcript levels were quantified by microarray analysis in three replicates and the proteins were quantified by Liquid Chromatography-Mass Spectrometry using an O(18)-reference-based isotope labeling approach. We show that the levels of transcripts and proteins correlate significantly for only about half of the genes tested, with an average correlation of 0.27, and the correlations of transcripts and proteins varied depending on the cellular location and biological function of the gene. We examined technical and biological factors that could contribute to the modest correlation. For example, differential splicing clearly affects the analyses for certain genes; but, based on deep sequencing, this does not substantially contribute to the overall estimate of the correlation. We also employed genome-wide association analyses to map loci controlling both transcript and protein levels. Surprisingly, little overlap was observed between the protein- and transcript-mapped loci. We have typed numerous clinically relevant traits among the strains, including adiposity, lipoprotein levels, and tissue parameters. Using correlation analysis, we found that a low number of clinical trait relationships are preserved between the protein and mRNA gene products and that the majority of such relationships are specific to either the protein levels or transcript levels. Surprisingly, transcript levels were more strongly correlated with clinical traits than protein levels. In light of the widespread use of high-throughput technologies in both clinical and basic research, the results presented have practical as well as basic implications.
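
The per-gene comparison of transcript and protein levels summarized in the abstract can be illustrated with a minimal sketch. It assumes that each gene has one transcript value and one protein value per strain, and it uses the Pearson coefficient; the abstract does not state which coefficient was used, and significance testing is left out. The gene names and numbers are invented toy data, not values from the study.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def per_gene_correlations(transcripts, proteins):
    """Correlate transcript and protein levels across strains, gene by gene.

    Both arguments map a gene name to a list of per-strain values, with
    strains in the same order. Only genes present in both are compared.
    """
    shared = sorted(set(transcripts) & set(proteins))
    return {gene: pearson(transcripts[gene], proteins[gene]) for gene in shared}


# Toy data (hypothetical): four strains, three genes.
transcripts = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [1.0, 2.0, 3.0, 4.0],
    "gene_c": [2.0, 2.5, 1.0, 3.0],  # no protein measurement, so skipped
}
proteins = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],  # rises with the transcript
    "gene_b": [4.0, 3.0, 2.0, 1.0],  # falls as the transcript rises
}

correlations = per_gene_correlations(transcripts, proteins)
average_correlation = mean(list(correlations.values()))
```

Averaging the per-gene coefficients in this way corresponds to the kind of summary figure the abstract reports, although the study's exact procedure may differ.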

pgen-1001393-g002: Proteome and transcriptome data quality. A) Reliability of peptide measurement in LC-MS. The distribution of variance among the technical replicates in the LC-MS data (grey plot) and in the HMDP population (blue plot). B) The frequency of peptides with varying amount as defined by the “signal to noise” ratio. C) Distribution of heritability (fraction of total variance attributed to genetics) in the transcript dataset. The dashed line depicts the significant heritability estimates (p-value<0.05). D) Comparison of Affymetrix data with the Next Generation Sequencing data. E) Number of peptides per gene in the filtered peptide dataset.

Mentions: From the original 5363 peptides measured, we selected peptides that a) had less than 50% missing measurements in the whole population, b) had no internal lysine or arginine, and c) aligned uniquely to one Ensembl gene. Fifty-four percent of peptides (2893 peptides) passed these initial selection criteria. To assess the quality of the measurements, we investigated the amount of technical noise in the selected peptides. Having the control technical replicates allowed us to measure the reproducibility of the LC-MS measurement and to assess whether the variation in the levels of the selected peptides in the HMDP population was due to technical or genetic variation. The distributions of the variance in the control mice and in the HMDP panel are shown in Figure 2A. The mean and median across the ten replicates were 0.19 and 0.08 (the grey histogram), respectively, suggesting that, for most peptides, the measurements were robust. In contrast, the distribution of the variance was much broader in the genetic population, where the mean and median of variances across all the peptides were 0.2 and 0.3, respectively (Figure 2A, the blue histogram).
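
The three selection criteria and the variance summary described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: "internal" is taken to mean any position before the C-terminal residue, missing measurements are represented as None, the sample variance is used, and the record layout, peptide sequences, and gene labels are hypothetical.

```python
from statistics import mean, median, variance


def has_internal_k_or_r(sequence):
    """True if lysine (K) or arginine (R) occurs before the last residue."""
    return any(residue in "KR" for residue in sequence[:-1])


def passes_filters(peptide, max_missing_fraction=0.5):
    """Apply criteria a) to c) to one peptide record (hypothetical layout)."""
    values = peptide["abundances"]  # one value per strain, None = missing
    missing_fraction = sum(v is None for v in values) / len(values)
    return (
        missing_fraction < max_missing_fraction           # a) < 50% missing
        and not has_internal_k_or_r(peptide["sequence"])  # b) no internal K/R
        and len(peptide["genes"]) == 1                    # c) one gene only
    )


def variance_summary(peptides):
    """Mean and median of per-peptide variances, ignoring missing values."""
    variances = []
    for peptide in peptides:
        observed = [v for v in peptide["abundances"] if v is not None]
        variances.append(variance(observed))  # sample variance (assumption)
    return mean(variances), median(variances)


# Toy records (invented): only the first one satisfies all three criteria.
peptides = [
    {"sequence": "AVLDEK", "genes": {"GENE_A"},
     "abundances": [0.1, 0.3, None, 0.2]},
    {"sequence": "AKLDER", "genes": {"GENE_B"},           # internal K
     "abundances": [0.5, 0.4, 0.6, 0.5]},
    {"sequence": "GLSDVR", "genes": {"GENE_C", "GENE_D"},  # two genes
     "abundances": [0.2, 0.1, 0.3, 0.2]},
    {"sequence": "TTPEAK", "genes": {"GENE_E"},           # 75% missing
     "abundances": [None, None, 0.2, None]},
]

kept = [p for p in peptides if passes_filters(p)]
mean_var, median_var = variance_summary(kept)
```

Running the same summary once on the technical replicates and once on the strain panel would give the two pairs of numbers that the text compares.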

