Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling


ABSTRACT

Accurate annotation of protein-coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite the known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.

DOI: http://dx.doi.org/10.7554/eLife.13328.001


Annotated genes with peptide hits tend to be longer, have higher expression and a distinct amino acid composition. (A) Cumulative distribution of footprint density for genes with at least one unique peptide hit (blue) and genes with no unique peptide hit (red). The median footprint density of genes with a peptide hit is about 125-fold higher than the median footprint density of genes without a peptide hit. (B) Cumulative distribution of protein length for genes with at least one unique peptide hit (blue) and genes with no unique peptide hit (red). Genes with a peptide hit tend to code for proteins that are 20% longer than proteins encoded by genes without a peptide hit. (C-F) Comparison of amino acid composition between tryptic peptides with a mass-spectrum match and tryptic peptides without one. Amino acids, grouped by their electrostatic properties, have distinct compositions between matched and unmatched peptides. Matched peptides tend to be significantly shorter than unmatched peptides, and have a distinct composition of charged amino acids. DOI: http://dx.doi.org/10.7554/eLife.13328.016

Mentions: 1. The median footprint density of annotated coding genes with at least one peptide match is about 125-fold higher than that of coding genes with no peptide match (see Figure 3—figure supplement 4A).
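The comparison in panel A rests on two simple quantities: an empirical cumulative distribution per gene group and a ratio of group medians. The following is a minimal sketch of both, not the authors' analysis code. The function names `empirical_cdf` and `median_fold_change` and the density values are hypothetical, chosen only for illustration.

```python
from statistics import median


def empirical_cdf(values):
    """Pair each sorted observation with its cumulative fraction (rank / n)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def median_fold_change(group_a, group_b):
    """Ratio of the median of group_a to the median of group_b."""
    return median(group_a) / median(group_b)


# Hypothetical footprint densities (illustrative values, not data from the paper).
with_peptide_hit = [50.0, 100.0, 200.0]
without_peptide_hit = [0.5, 1.0, 2.0]

fold = median_fold_change(with_peptide_hit, without_peptide_hit)
cdf_with_hit = empirical_cdf(with_peptide_hit)

print(f"median fold change: {fold:.1f}")
for value, fraction in cdf_with_hit:
    print(f"density <= {value}: {fraction:.2f}")
```

Plotting the two CDFs as step curves on a logarithmic x-axis would give a figure of the same kind as panel A. The axis scaling here is an assumption, since the caption does not state it.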

