Rapid detection and curation of conserved DNA via enhanced-BLAT and EvoPrinterHD analysis.

Yavatkar AS, Lin Y, Ross J, Fann Y, Brody T, Odenwald WF - BMC Genomics (2008)

Bottom Line: An EvoDifferences profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. EvoPrinterHD is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with cis-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, EvoPrinterHD facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.


Affiliation: Division of Intramural Research, Information Technology Program, NINDS, NIH, Bethesda, Maryland, USA. yavatka@ninds.nih.gov

ABSTRACT

Background: Multi-genome comparative analysis has yielded important insights into the molecular details of gene regulation. We have developed EvoPrinter, a web-accessed genomics tool that provides a single uninterrupted view of conserved sequences as they appear in a species of interest. An EvoPrint reveals with near base-pair resolution those sequences that are essential for gene function.

Results: We describe here EvoPrinterHD, a second-generation comparative genomics tool that automatically generates from a single input sequence an enhanced view of sequence conservation between evolutionarily distant species. Currently available for 5 nematode, 3 mosquito, 12 Drosophila, 20 vertebrate, 17 Staphylococcus and 20 enteric bacteria genomes, EvoPrinterHD employs a modified BLAT algorithm [enhanced-BLAT (eBLAT)], which detects up to 75% more conserved bases than the BLAT alignments used in the earlier EvoPrinter program. The new program also identifies conserved sequences within rearranged DNA, highlights repetitive DNA, and detects sequencing gaps. EvoPrinterHD currently holds over 112 billion bp of indexed genomes in memory and allows a subset of genomes to be selected for analysis. An EvoDifferences profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. Finally, EvoPrinterHD incorporates options that allow for (1) re-initiation of the analysis using a different genome's aligning region as the reference DNA to detect species-specific changes in less-conserved regions, (2) rapid extraction and curation of conserved sequences, and (3) for bacteria, identification of unique or uniquely shared sequences present in subsets of genomes.

Conclusion: EvoPrinterHD is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with cis-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, EvoPrinterHD facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.



Figure 3: The EvoPrinterHD repeat finder algorithm identifies repetitive elements within the input DNA. The algorithm superimposes the three highest-scoring eBLAT alignments of the input reference DNA to the reference genome to reveal sequences within the input DNA that are repeated within the input DNA itself and/or elsewhere in the reference genome. Single-copy repeat sequences, identified in either the second or the third highest-scoring eBLAT alignment but not in both, are highlighted by blue-colored bases. Multiple (≥ 3 copies) repeats are highlighted with red-colored bases. Shown is a 1,958 bp genomic fragment that flanks the 3' end of the Caenorhabditis elegans egl-26 gene (+5,290 to +7,248 bp from the start of transcription) that was initially part of a 20 kb input DNA repeat finder readout. Note that the single-copy repeat (blue-colored) sequences that flank the multi-copy repeat sequences (red-colored) indicate that one of the repeat copies located elsewhere in the reference genome is more homologous to the input DNA repeat sequence than are the other members of the repeat family.

Mentions: One prominent feature of all bacterial and metazoan genomes is that they harbor diverse populations of repetitive elements that range in copy number from single duplications to thousands of transposable elements dispersed throughout the genome. Given that many of these repeats contain highly conserved sequences that may interfere with alignments between evolutionarily distant orthologs, it is important to identify the repetitive sequence(s) within the reference genome before comparative analysis is undertaken. To accomplish this, the EvoPrinterHD repeat finder algorithm superimposes the first, second and third highest-scoring eBLAT alignments of the input DNA to its own genome and then color-codes the readout to identify single or multiple repeat sequences within the input reference DNA (Figure 3). Sequences that have one additional copy in the reference genome are noted with blue-colored uppercase bases, while those that are present three or more times are highlighted with red-colored bases. The algorithm also reveals whether one of the multiple repeat sequences is more homologous to the repeat present in the input DNA by highlighting single repeat sequences that flank the core multi-repeat element (Figure 3). Underlining repeat sequences in the EvoPrint and EvoDifference readouts flags potential 'false positive' alignments that have their origin in repetitive elements.
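
The color-coding rule described above can be illustrated with a minimal sketch. This is not the EvoPrinterHD implementation. It assumes that the highest-scoring alignment is the input DNA matching its own locus, and that the second and third highest-scoring alignments have already been reduced to half-open intervals in input coordinates. The function names, the interval representation and the one-letter labels ("B" for blue, "R" for red, "." for no repeat) are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) in input-DNA coordinates.
Interval = Tuple[int, int]


def coverage_mask(length: int, intervals: List[Interval]) -> List[bool]:
    """Mark every input base that falls inside at least one aligned interval."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = True
    return mask


def classify_repeats(length: int,
                     second_hits: List[Interval],
                     third_hits: List[Interval]) -> str:
    """Superimpose the second and third highest-scoring alignments.

    Assumption: the first alignment is the input matching itself, so it
    covers every base and is not passed in.
      "B" = covered by exactly one of the two (one additional copy, blue)
      "R" = covered by both (three or more copies in total, red)
      "." = covered by neither (no repeat detected)
    """
    second = coverage_mask(length, second_hits)
    third = coverage_mask(length, third_hits)
    labels = []
    for in_second, in_third in zip(second, third):
        if in_second and in_third:
            labels.append("R")
        elif in_second or in_third:
            labels.append("B")
        else:
            labels.append(".")
    return "".join(labels)


if __name__ == "__main__":
    # The second-best alignment extends past the third-best on both sides,
    # giving blue flanks around a red core.
    print(classify_repeats(12, [(2, 10)], [(4, 8)]))
```

In this toy input the second-best alignment is longer than the third-best, so the readout shows blue flanks around a red core. This is the pattern that the Figure 3 caption interprets as one repeat copy being more homologous to the input repeat than the other family members.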

