Rapid detection and curation of conserved DNA via enhanced-BLAT and EvoPrinterHD analysis.

Yavatkar AS, Lin Y, Ross J, Fann Y, Brody T, Odenwald WF - BMC Genomics (2008)

Bottom Line: An EvoDifferences profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. EvoPrinterHD is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with cis-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, EvoPrinterHD facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.


Affiliation: 1Division of Intramural Research, Information Technology Program, NINDS, NIH, Bethesda, Maryland, USA. yavatka@ninds.nih.gov

ABSTRACT

Background: Multi-genome comparative analysis has yielded important insights into the molecular details of gene regulation. We have developed EvoPrinter, a web-accessed genomics tool that provides a single uninterrupted view of conserved sequences as they appear in a species of interest. An EvoPrint reveals with near base-pair resolution those sequences that are essential for gene function.

Results: We describe here EvoPrinterHD, a 2nd-generation comparative genomics tool that automatically generates from a single input sequence an enhanced view of sequence conservation between evolutionarily distant species. Currently available for 5 nematode, 3 mosquito, 12 Drosophila, 20 vertebrate, 17 Staphylococcus and 20 enteric bacteria genomes, EvoPrinterHD employs a modified BLAT algorithm [enhanced-BLAT (eBLAT)], which detects up to 75% more conserved bases than identified by the BLAT alignments used in the earlier EvoPrinter program. The new program also identifies conserved sequences within rearranged DNA, highlights repetitive DNA, and detects sequencing gaps. EvoPrinterHD currently holds over 112 billion bp of indexed genomes in memory and allows a subset of genomes to be selected for analysis. An EvoDifferences profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. Finally, EvoPrinterHD incorporates options that allow for (1) re-initiation of the analysis using a different genome's aligning region as the reference DNA to detect species-specific changes in less-conserved regions, (2) rapid extraction and curation of conserved sequences, and (3) for bacteria, identification of unique or uniquely shared sequences present in subsets of genomes.

Conclusion: EvoPrinterHD is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with cis-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, EvoPrinterHD facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.



Figure 2: EvoPrints generated with eBLAT alignments reveal additional conserved sequences when compared to the original method. A) Shown is a composite EvoPrint of the Drosophila melanogaster Krüppel central domain (CD2) enhancer region generated by superimposing an EvoPrint generated from eBLAT alignments and a second prepared from BLAT alignments. Pairwise alignments between D. melanogaster and D. sechellia, D. simulans, D. erecta, D. yakuba, D. ananassae, D. pseudoobscura, D. persimilis, D. virilis, D. willistoni, D. mojavensis and D. grimshawi were used to generate both EvoPrints. Conserved sequences identified by both procedures are shown as uppercase black nucleotides and yellow highlighted nucleotides represent the additional sequences recognized by EvoPrinterHD. The boxed region contains the cis-regulatory DNA required for enhancer function as determined by Hoch et al. [9]. B) An EvoDifferences profile identifies those DNA sequences that are shared by all but one of the species included in the analysis. As in the EvoPrint, black uppercase letters indicate sequences shared by all species and colored uppercase letters, which denote individual species, represent sequences that were not detected by the eBLAT alignment for just one of the genomes included in the EvoPrint analysis (D. erecta, dark-red; D. yakuba, teal; D. pseudoobscura, light-blue; D. persimilis, brown; D. ananassae, pink; D. virilis, orange; D. willistoni, blue; D. mojavensis, green; or D. grimshawi, red). The underline indicates the region of the EvoDifferences profile that is compared with the alignments obtained from the UCSC genome browser (shown in panel C). C) Comparison of the EvoDifferences profile with the UCSC genome alignments. Shown is the underlined sequence in panel (B) aligned to the corresponding alignments obtained at the Drosophila UCSC comparative genome bioinformatics web site.

Another measure of eBLAT efficacy in identifying evolutionary conservation is to compare the detection of conserved sequences when eBLAT vs. BLAT alignments are used to generate an EvoPrint. To demonstrate the increased alignment sensitivity of eBLAT over BLAT in the EvoPrint analysis, the Drosophila melanogaster Krüppel central domain enhancer [19] was EvoPrinted using 11 of the Drosophila species (Figure 2A). The original EvoPrinter (which uses the BLAT algorithm) detected a total of 169 conserved bases, compared with 254 conserved bases identified with an eBLAT-generated EvoPrint, a 50% increase in alignment recognition. In addition, the EvoDifferences profile identified additional bases (shown in color) that are conserved in all but one of the genomes used to generate the EvoPrint (Figure 2B and see below).
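The relationship between an EvoPrint and an EvoDifferences profile, as described in the Figure 2 caption, can be illustrated with a minimal Python sketch. It is not the EvoPrinterHD implementation and does not model eBLAT alignment itself. It assumes each test species has already been reduced to a per-base true/false mask over the reference sequence. The function names, the species labels (sp1 to sp3) and the toy sequence are hypothetical; only the 169 and 254 base counts come from the text above.

```python
from typing import Dict, List, Optional, Tuple


def classify_positions(
    aligned: Dict[str, List[bool]],
) -> Tuple[List[bool], List[Optional[str]]]:
    """For each reference base, report whether every species aligned to it
    and, if exactly one species did not, the name of that species."""
    if not aligned:
        raise ValueError("at least one species mask is required")
    species = list(aligned)
    length = len(aligned[species[0]])
    if any(len(mask) != length for mask in aligned.values()):
        raise ValueError("all masks must cover the same reference length")

    in_all: List[bool] = []
    lost_in: List[Optional[str]] = []
    for i in range(length):
        missing = [s for s in species if not aligned[s][i]]
        in_all.append(not missing)  # EvoPrint: conserved in every species
        # EvoDifferences: conserved in all but exactly one species
        lost_in.append(missing[0] if len(missing) == 1 else None)
    return in_all, lost_in


def render(reference: str, in_all: List[bool]) -> str:
    """Uppercase the bases conserved in all species, lowercase the rest."""
    return "".join(
        base.upper() if keep else base.lower()
        for base, keep in zip(reference, in_all)
    )


def percent_increase(old: int, new: int) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Toy data (hypothetical): three species aligned to a six-base reference.
reference = "ACGTAC"
masks = {
    "sp1": [True, True, True, False, True, False],
    "sp2": [True, True, False, False, True, True],
    "sp3": [True, True, True, False, True, True],
}
in_all, lost_in = classify_positions(masks)
print(render(reference, in_all))  # ACgtAc
print(lost_in)  # [None, None, 'sp2', None, None, 'sp1']

# Base counts reported for the Krüppel CD2 enhancer: BLAT 169, eBLAT 254.
print(round(percent_increase(169, 254), 1))  # 50.3
```

In the toy output, the third and sixth bases are the ones an EvoDifferences profile would color, since each is missing from exactly one species, while the fourth base is absent from all three and stays unmarked. The last line confirms that going from 169 to 254 conserved bases is an increase of roughly 50%.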

