Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

Al-Tobasei R, Paneru B, Salem M - PLoS ONE (2016)

Bottom Line: Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homology to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.


Affiliation: Computational Science Program, Middle Tennessee State University, Murfreesboro, TN, 37132, United States of America.

ABSTRACT
The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes, including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which makes their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the reference genome using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homology to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
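The length and ORF filters described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the ORF definition (ATG to the first in-frame stop codon, forward strand only) is an assumption, and the cutoff of 100 amino acids is one end of the 83-100 range given above. The homology search against the NCBI nr-protein database and the protein-coding-score test are not modeled.

```python
# Illustrative sketch only; names and the ORF definition are assumptions.
MIN_LENGTH_NT = 200   # from the abstract: transcripts shorter than 200 nt are removed
MAX_ORF_AA = 100      # assumed cutoff; the abstract gives a range of 83-100 aa

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Length (in amino acids) of the longest ATG..stop ORF on the forward strand.

    ORFs that run off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                # Count codons from the ATG up to, but excluding, the stop codon.
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_basic_filters(seq: str, max_orf_aa: int = MAX_ORF_AA) -> bool:
    """Keep transcripts of at least 200 nt whose longest ORF is within the cutoff."""
    return len(seq) >= MIN_LENGTH_NT and longest_orf_aa(seq) <= max_orf_aa
```

Lowering `max_orf_aa` to 83 corresponds to the stricter end of the range; the abstract reports that the number of retained lncRNAs depends on the filtering stringency.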



pone.0148940.g003: Classification of lncRNAs based on their intersection with protein-coding genes and number of lncRNAs in each class. The diagram at the top illustrates each class of lncRNAs relative to the nearest protein-coding gene(s), based on genomic position and direction of the transcripts. The table at the bottom gives the number of lncRNAs in each class. Numbers in brackets following data source references indicate the total number of lncRNAs in that class. The letters C, D, S, AS and U indicate the number of convergent, divergent, sense, antisense and unknown-directionality transcripts, respectively.

Mentions: LncRNAs are classified, based on their intersection with protein-coding genes, as genic or intergenic [9]. Some lncRNAs are located in transcriptionally active regions and influence the expression of neighboring genes [8, 73]. The genomic position of lncRNAs relative to protein-coding genes can therefore provide important clues about lncRNA-mediated regulation of protein-coding genes [74]. Our data indicate that 7,847 (14.4%) of the lncRNAs intersected with a protein-coding gene and are thus called genic (Fig 3). Of these, 4,697 (8.6%) were intronic, lying within introns of protein-coding genes without intersecting any exon, and 3,091 (5.6%) were exonic, sharing at least part of a protein-coding exon. Among these intronic and exonic lncRNAs, 248 were sense, 1,488 were antisense, and 6,052 had an unknown orientation. In addition, 59 lncRNAs completely overlapped a protein-coding gene, containing that gene within an intron. Fig 3 and S1 Table show the classification and number of lncRNAs based on their intersection with protein-coding genes. There were 46,656 (85.6%) intergenic lncRNAs in the trout genome that did not intersect with, but were within 15 kb of, the nearest protein-coding gene. Those intergenic lncRNAs were further divided into 3,588 convergent (same sense) and 3,428 divergent (opposite sense). Consistent with our study, previous reports in humans indicate that the majority of lncRNA transcripts do not intersect with protein-coding genes [9].
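The genic classes described above can be sketched in Python as a simple interval comparison. This is an illustration under stated assumptions, not the authors' method: the `Gene` and `Transcript` types and the `classify` function are hypothetical, coordinates are treated as closed intervals, only the first overlapping gene is considered, and the "containing" class is approximated by the lncRNA span covering the whole gene (the lncRNA's own intron structure is not checked). The convergent/divergent subdivision of intergenic lncRNAs and the 15 kb window are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    start: int
    end: int
    strand: str                                 # "+" or "-"
    exons: list = field(default_factory=list)   # list of (start, end) tuples


@dataclass
class Transcript:
    start: int
    end: int
    strand: str                                 # "+", "-", or "." when unknown


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def orientation(t, g):
    """Sense if on the gene's strand, antisense if opposite, else unknown."""
    if t.strand not in {"+", "-"}:
        return "unknown"
    return "sense" if t.strand == g.strand else "antisense"


def classify(t, genes):
    """Return (class, orientation) for a lncRNA relative to protein-coding genes."""
    hits = [g for g in genes if overlaps(t.start, t.end, g.start, g.end)]
    if not hits:
        return ("intergenic", None)
    g = hits[0]  # simplification: use the first overlapping gene only
    if t.start <= g.start and t.end >= g.end:
        # Simplified stand-in for "contains the gene within its intron".
        return ("containing", orientation(t, g))
    if any(overlaps(t.start, t.end, s, e) for s, e in g.exons):
        return ("exonic", orientation(t, g))
    return ("intronic", orientation(t, g))
```

For example, with a "+" strand gene spanning 1000-5000 and exons at 1000-1500 and 4000-5000, a "-" strand transcript at 2000-3000 is classified as intronic and antisense. The counts reported above are internally consistent: 4,697 + 3,091 + 59 = 7,847 genic lncRNAs, and 7,847 + 46,656 = 54,503 in total.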

