UMARS: Un-MAppable Reads Solution.

Li SC, Chan WC, Lai CH, Tsai KW, Hsu CN, Jou YS, Chen HC, Chen CH, Lin WC - BMC Bioinformatics (2011)

Bottom Line: However, a fraction of NGS reads failed to be mapped back to the reference sequences. By demonstrating the results of two UMARS alignment cases, we show the applicability of UMARS. Our UMARS pipeline provides another way to examine and recycle the un-mappable reads that are commonly discarded as garbage.


Affiliation: Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.

ABSTRACT

Background: Un-MAppable Reads Solution (UMARS) is a user-friendly web service for retrieving valuable information from sequence reads that cannot be mapped back to reference genomes. Recently, next-generation sequencing (NGS) technology has emerged as a powerful tool for generating high-throughput sequencing data and has been applied to many kinds of biological research. In a typical analysis, adaptor-trimmed NGS reads are first mapped back to reference sequences, such as genomes or transcripts. However, a fraction of NGS reads fail to map back to the reference sequences. Such un-mappable reads are usually attributed to sequencing errors and discarded without further consideration.

Methods: We investigated the possible biological relevance and sources of un-mappable reads. To this end, we developed UMARS to scan un-mappable reads for viral genomic fragments and for exon-exon junctions of novel alternative splicing isoforms. To map the un-mappable reads, we first collected viral genomes and exon-exon junction sequences. We then constructed the UMARS pipeline as an automatic alignment interface.

Results: By demonstrating the results of two UMARS alignment cases, we show the applicability of UMARS. First, we showed that the expected EBV genomic fragments can be detected by UMARS. Second, we detected exon-exon junctions among the un-mappable reads. Further experimental validation supported the reliability of the UMARS pipeline. The UMARS service is freely available to the academic community and can be accessed via http://musk.ibms.sinica.edu.tw/UMARS/.

Conclusions: In this study, we have shown that some un-mappable reads are not caused by sequencing errors. They can originate from viral infection or transcript splicing. Our UMARS pipeline provides another way to examine and recycle the un-mappable reads that are commonly discarded as garbage.

Figure 1: Collection of exon-exon junctions. By our definition, the EEJs can be continuous or discrete. The former represent known alternative splicing products. The latter, however, represent novel alternative splicing isoforms.

Mentions: During maturation, transcripts of eukaryotic genes usually undergo messenger RNA splicing, which can produce many alternative splicing isoforms from one gene. UCSC mapped these splicing isoforms back to the genomes and determined the boundaries and genomic coordinates of the exons, recording the information in refFlat files. As shown in Fig. 1, this coordinate information lets us define the boundaries of exons and introns exactly. We can also define the exon-end fragments at both termini of each exon, start (S) and end (E). By extracting and assembling the exon-end fragments from continuous or discrete exons, we collected 60-nt exon-exon junctions (EEJs). The resulting EEJs are therefore either continuous or discrete, where the former denote known splicing patterns and the latter denote novel ones. As a result, the number of 60-nt EEJs is [formula missing] for each transcript, where n is the number of exons in that transcript. In this way, we collected 60-nt EEJs for 21 species. The refFlat versions, numbers of EEJs and scientific names of these 21 species are available in Additional file 1.
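
The junction-building step described above can be illustrated with a minimal Python sketch. It is not the UMARS implementation. It assumes that each 60-nt EEJ joins the last 30 nt of an upstream exon (E fragment) to the first 30 nt of a downstream exon (S fragment), that only plus-strand transcripts are handled, and that exons shorter than 30 nt are skipped; none of these details is stated in the text. The function names `parse_refflat_exons` and `build_eejs` are hypothetical. Under these assumptions, a transcript with n sufficiently long exons yields one junction per exon pair, that is n(n-1)/2 junctions, which may or may not match the count given in the original paper.

```python
from itertools import combinations

# Assumption: 30 nt from each side of the junction (30 + 30 = 60 nt).
FLANK = 30


def parse_refflat_exons(line):
    """Return (transcript_name, [(start, end), ...]) from one refFlat line.

    refFlat columns: geneName, name, chrom, strand, txStart, txEnd,
    cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    Coordinates are 0-based and half-open; the exon lists end with a comma.
    """
    fields = line.rstrip("\n").split("\t")
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    return fields[1], list(zip(starts, ends))


def build_eejs(chrom_seq, exons, flank=FLANK):
    """Build junctions for every exon pair (i < j) of one plus-strand transcript.

    Returns a list of (i, j, kind, sequence), where kind is "continuous"
    for adjacent exons and "discrete" for non-adjacent ones.
    """
    junctions = []
    for i, j in combinations(range(len(exons)), 2):
        up_start, up_end = exons[i]
        down_start, down_end = exons[j]
        # Simplification: skip exons too short to supply a full flank.
        if up_end - up_start < flank or down_end - down_start < flank:
            continue
        e_fragment = chrom_seq[up_end - flank:up_end]          # end of upstream exon
        s_fragment = chrom_seq[down_start:down_start + flank]  # start of downstream exon
        kind = "continuous" if j == i + 1 else "discrete"
        junctions.append((i, j, kind, e_fragment + s_fragment))
    return junctions


if __name__ == "__main__":
    # Toy chromosome: three 40-nt exons (A, C, G) separated by 10-nt introns (N).
    toy_chrom = "A" * 40 + "N" * 10 + "C" * 40 + "N" * 10 + "G" * 40
    toy_line = "GENE1\tNM_TOY\tchr1\t+\t0\t140\t0\t140\t3\t0,50,100,\t40,90,140,\n"
    name, exons = parse_refflat_exons(toy_line)
    for i, j, kind, seq in build_eejs(toy_chrom, exons):
        print(name, i, j, kind, len(seq))
```

A minus-strand transcript would additionally need the exon order reversed and the sequence reverse-complemented, which this sketch leaves out.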

