The vast, conserved mammalian lincRNome.

Managadze D, Lobkovsky AE, Wolf YI, Shabalina SA, Rogozin IB, Koonin EV - PLoS Comput. Biol. (2013)

Bottom Line: Under the assumption that the sets of experimentally validated lincRNAs are random samples of the lincRNomes of the corresponding species, we estimate the total lincRNome size at approximately 40,000 to 50,000 species, at least twice the number of protein-coding genes. We further estimate that the fraction of the human and mouse euchromatic genomes encoding lincRNAs is more than twofold greater than the fraction of protein-coding sequences. The orthologous mammalian lincRNAs can be predicted to perform equivalent functions; accordingly, it appears likely that thousands of evolutionarily conserved functional roles of lincRNAs remain to be characterized.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America.

ABSTRACT
We compare the sets of experimentally validated long intergenic non-coding (linc)RNAs from human and mouse and apply a maximum likelihood approach to estimate the total number of lincRNA genes as well as the size of the conserved part of the lincRNome. Under the assumption that the sets of experimentally validated lincRNAs are random samples of the lincRNomes of the corresponding species, we estimate the total lincRNome size at approximately 40,000 to 50,000 species, at least twice the number of protein-coding genes. We further estimate that the fraction of the human and mouse euchromatic genomes encoding lincRNAs is more than twofold greater than the fraction of protein-coding sequences. Although the sequences of most lincRNAs are much less strongly conserved than protein sequences, the extent of orthology between the lincRNomes is unexpectedly high, with 60 to 70% of the lincRNA genes shared between human and mouse. The orthologous mammalian lincRNAs can be predicted to perform equivalent functions; accordingly, it appears likely that thousands of evolutionarily conserved functional roles of lincRNAs remain to be characterized.

Figure 1 (pcbi-1002917-g001): Computational pipeline to characterize the lincRNome. The subset of orthologous lincRNAs (Kb) was obtained by comparing genomic positions of mouse and human lincRNA genes (minimal overlap 100 nucleotides), with further manual inspection of the genomic alignments. This comparison yielded 196 unique orthologous pairs of human and mouse lincRNA genes (Kb). Of the 4662 human lincRNAs (Lh), corresponding alignable regions in mouse were detected for 3529. These sequences were designated putative orthologs and checked for evidence of expression using RNAseq data for mouse tissues. Of the 3369 putative lincRNAs for which the exon models could be determined, 2872 showed expression level greater than zero (Kh). Similarly, the subset of mouse lincRNAs with expressed putative orthologs (Km) was identified by searching for evidence of expression in human tissues. Of the 4156 mouse lincRNAs (Lm), corresponding alignable regions with expression level greater than zero were identified for 3157. After applying ORF (<120 nucleotides), indel and expression thresholds, final results (Figure 2 and Table 1) were obtained using a Maximum Likelihood Model with Lm, Lh, Km, Kh and Kb as input parameters (shown by dashed arrows) to estimate the size of the human lincRNome (Nh), the mouse lincRNome (Nm) and the orthologous subset of the two lincRNomes (Nb). For details of the procedures see Methods.
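
The position-based comparison described in the caption (minimal overlap of 100 nucleotides) can be illustrated with a minimal Python sketch. This is not the authors' code: it assumes both genes have already been mapped onto the same reference coordinate system, uses half-open intervals, and the names `Interval`, `overlap_length` and `is_candidate_ortholog`, as well as the coordinates, are hypothetical. The manual inspection of alignments mentioned in the caption is not modeled.

```python
from typing import NamedTuple

# Minimal overlap stated in the figure caption (100 nucleotides).
MIN_OVERLAP_NT = 100


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on one chromosome (assumed convention)."""
    chrom: str
    start: int
    end: int


def overlap_length(a: Interval, b: Interval) -> int:
    """Return the number of shared positions; 0 if disjoint or on different chromosomes."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_candidate_ortholog(a: Interval, b: Interval,
                          min_overlap: int = MIN_OVERLAP_NT) -> bool:
    """True when the two loci share at least `min_overlap` nucleotides."""
    return overlap_length(a, b) >= min_overlap


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    human_gene = Interval("chr1", 1_000, 1_500)
    mapped_mouse_gene = Interval("chr1", 1_350, 2_000)
    print(overlap_length(human_gene, mapped_mouse_gene))         # 150
    print(is_candidate_ortholog(human_gene, mapped_mouse_gene))  # True
```

Pairs passing such a test would only be candidates; in the pipeline described above they were further inspected manually before being counted in Kb.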

Mentions: A computational pipeline was developed to compare the sets of validated lincRNAs from human and mouse and to identify expressed orthologs by mapping the sequences to the respective counterpart genome and searching the available RNAseq data [28] (Figure 1). We then applied a maximum likelihood (ML) technique to estimate the total number of lincRNA genes in the human and mouse genomes as well as the number of orthologous lincRNA genes (see Online Methods). The following simplifying assumptions were made:

