Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

Al-Tobasei R, Paneru B, Salem M - PLoS ONE (2016)

Bottom Line: Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homology to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.


Affiliation: Computational Science Program, Middle Tennessee State University, Murfreesboro, TN, 37132, United States of America.

ABSTRACT
The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes, including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which makes their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homology to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.


pone.0148940.g001: Bioinformatics pipeline used in prediction of rainbow trout lncRNAs. LncRNAs were predicted from four different transcriptomic datasets, then all putative lncRNAs from all datasets were BLASTed against each other. A total of 54,503 non-redundant lncRNAs identified in at least 2 of the 4 datasets were chosen for further analyses in order to increase the confidence of lncRNA prediction. Vertical arrows point to the subsequent prediction and filtration steps of the workflow. The first horizontal arrow, pointing to the right, gives the number of initial transcripts predicted from the four datasets. The middle six horizontal arrows indicate the number of transcripts filtered at each step, and the final horizontal arrow points to the number of putative lncRNAs with significant hits to noncoding-RNA databases from each dataset.
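The "at least 2 of the 4 datasets" criterion in the caption can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' code: it assumes the all-against-all BLAST results have already been grouped into non-redundant clusters, and the names `cluster_sources`, `supported_clusters` and `min_datasets`, as well as the dataset labels, are hypothetical.

```python
from typing import Dict, List, Set


def supported_clusters(cluster_sources: Dict[str, Set[str]],
                       min_datasets: int = 2) -> List[str]:
    """Return IDs of clusters whose members come from at least
    `min_datasets` distinct datasets (hypothetical helper).

    `cluster_sources` maps a non-redundant lncRNA cluster ID to the set
    of dataset labels in which a member of that cluster was predicted.
    """
    return sorted(
        cluster_id
        for cluster_id, sources in cluster_sources.items()
        if len(sources) >= min_datasets
    )


# Toy input: dataset labels A-D stand in for the four transcriptomic datasets.
example = {
    "cluster_1": {"A", "B"},        # seen in two datasets: kept
    "cluster_2": {"C"},             # seen in one dataset: dropped
    "cluster_3": {"A", "B", "D"},   # seen in three datasets: kept
}
kept = supported_clusters(example)
```

Raising `min_datasets` makes the selection stricter, which is one plausible way the number of retained lncRNAs could vary with filtering stringency.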

Mentions: The main objective of this study was to identify a comprehensive list of putative lncRNA genes in the rainbow trout genome. To accomplish this, we sequenced poly-A selected cDNA libraries using total RNA isolated from 13 tissues. Recently, we used the same sequencing data to identify protein-coding transcripts in the trout genome [43]. In this study, about 1.167 billion paired-end reads (100 nt) were mapped against a reference rainbow trout genome using the Cufflinks and TopHat software [55, 69], resulting in 231,505 putative transcripts. Several filtration steps were used to distinguish lncRNAs in the transcript list by removing protein-coding transcripts, pseudogenes and other classes of non-coding RNAs, including rRNA, miRNA, tRNA, snRNA and snoRNA (Fig 1). First, all transcripts shorter than 200 nt were removed, and then transcripts with an open reading frame (ORF) longer than 100 amino acids were filtered out. Next, the remaining transcripts were searched with BLASTx against the NCBI non-redundant protein database to eliminate transcripts with sequence similarity to known proteins at a cutoff E-value of ≤ 0.0001. To further filter out remaining protein-coding transcripts, we used the Coding Potential Calculator (CPC) software, which assesses the quality and completeness of the query ORF against proteins in the NCBI database using six biologically meaningful sequence features [56]. These filtration steps left 44,350 transcripts from this dataset that had very little or no evidence of protein-coding ability. Because most small non-coding RNAs, such as miRNA and tRNA, are shorter than 200 nt, the first filtration step should be enough to remove most of them.
To confirm removal of any remaining small non-coding RNAs (tRNA, rRNA, snoRNA, miRNA, siRNA and others), transcripts were searched against multiple RNA databases, including the genomic tRNA database, miRBase, and the LSU (large subunit ribosomal RNA) and SSU (small subunit ribosomal RNA) databases [57–60]. After application of the above filtration steps, we found 44,124 putative lncRNAs from our sequence dataset (Salem et al. [43]). These lncRNAs exhibited little or no evidence of coding potential or of belonging to other non-coding classes of RNA.
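The first two filtration steps described above (minimum length and maximum ORF size) can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes an ORF runs from an ATG to the next in-frame stop codon, scans all six reading frames, and ignores ORFs that run off the end of the transcript. The function names and the toy sequences are hypothetical; only the 200 nt and 100 amino acid thresholds come from the text. The BLASTx, CPC and small-RNA database steps are not covered.

```python
from typing import Dict

MIN_LENGTH_NT = 200   # from the text: transcripts shorter than 200 nt are removed
MAX_ORF_AA = 100      # from the text: ORFs longer than 100 amino acids are removed

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_aa(seq: str) -> int:
    """Length in amino acids of the longest ATG-to-stop ORF in six frames.

    Assumption: only ORFs closed by an in-frame stop codon are counted.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            current = 0  # codons in the currently open ORF; 0 means none open
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if current:
                    if codon in STOP_CODONS:
                        best = max(best, current)
                        current = 0
                    else:
                        current += 1
                elif codon == "ATG":
                    current = 1
    return best


def passes_basic_filters(seq: str) -> bool:
    """True if the transcript survives the length and ORF-size filters."""
    return len(seq) >= MIN_LENGTH_NT and longest_orf_aa(seq) <= MAX_ORF_AA


def filter_transcripts(transcripts: Dict[str, str]) -> Dict[str, str]:
    """Keep only transcripts that pass both filters."""
    return {name: seq for name, seq in transcripts.items()
            if passes_basic_filters(seq)}


# Toy transcripts (synthetic sequences, not real data).
toy = {
    "too_short": "ACGT" * 10,                     # 40 nt
    "long_orf": "ATG" + "GCC" * 150 + "TAA",      # 456 nt, 151-codon ORF
    "candidate": "TAA" * 100,                     # 300 nt, no ATG in any frame
}
candidates = filter_transcripts(toy)
```

In a real workflow the sequences would be read from the assembled transcript FASTA file, and the surviving transcripts would then go on to the similarity-search and coding-potential steps.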

