Rapid detection and curation of conserved DNA via enhanced-BLAT and EvoPrinterHD analysis.

Yavatkar AS, Lin Y, Ross J, Fann Y, Brody T, Odenwald WF - BMC Genomics (2008)

Bottom Line: An EvoDifferences profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. EvoPrinterHD is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with cis-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, EvoPrinterHD facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.


Affiliation: Division of Intramural Research, Information Technology Program, NINDS, NIH, Bethesda, Maryland, USA. yavatka@ninds.nih.gov

ABSTRACT

Background: Multi-genome comparative analysis has yielded important insights into the molecular details of gene regulation. We have developed EvoPrinter, a web-accessed genomics tool that provides a single uninterrupted view of conserved sequences as they appear in a species of interest. An EvoPrint reveals with near base-pair resolution those sequences that are essential for gene function.

Results: We describe here EvoPrinterHD, a second-generation comparative genomics tool that automatically generates, from a single input sequence, an enhanced view of sequence conservation between evolutionarily distant species. Currently available for 5 nematode, 3 mosquito, 12 Drosophila, 20 vertebrate, 17 Staphylococcus and 20 enteric bacteria genomes, EvoPrinterHD employs a modified BLAT algorithm [enhanced-BLAT (eBLAT)], which detects up to 75% more conserved bases than the BLAT alignments used in the earlier EvoPrinter program. The new program also identifies conserved sequences within rearranged DNA, highlights repetitive DNA, and detects sequencing gaps. EvoPrinterHD currently holds over 112 billion bp of indexed genomes in memory and allows a subset of genomes to be selected for analysis. An EvoDifferences profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. Finally, EvoPrinterHD incorporates options that allow for (1) re-initiation of the analysis using a different genome's aligning region as the reference DNA to detect species-specific changes in less-conserved regions, (2) rapid extraction and curation of conserved sequences, and (3) for bacteria, identification of unique or uniquely shared sequences present in subsets of genomes.

Conclusion: EvoPrinterHD is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with cis-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, EvoPrinterHD facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.


© Copyright Policy - open-access

Figure 5: Intra-species ceBLATs and composite-EvoPrints identify conserved sequences within the input reference DNA that have rearranged in the aligning regions of other genomes. A) A D. melanogaster (reference DNA) to D. virilis ceBLAT alignment that spans a 3,570 bp sequence located upstream of the fushi tarazu gene (-7,184 to -3,434 bp from its transcription start). Black uppercase nucleotides represent aligning bases found only in the highest scoring D. virilis eBLAT alignment, green bases are aligning bases unique to the second highest scoring alignment, and blue bases are aligning bases unique to the third highest scoring eBLAT alignment. B) An EvoPrint of the input D. melanogaster sequence shown in (A), generated with ceBLATs of the D. simulans, D. sechellia, D. yakuba, D. erecta, D. ananassae, D. pseudoobscura, D. persimilis, D. virilis, D. mojavensis, D. grimshawi and D. willistoni alignments. C) The accompanying EvoDifferences profile of the EvoPrint shown in (B). Black uppercase letters are aligning bases shared by all species examined. Colored uppercase letters, each color denoting an individual species, represent sequences that were not aligned in the ceBLAT for just one of the genomes included in the analysis (D. simulans, teal; D. sechellia, dark-red; D. yakuba, brown; D. erecta, light-blue; D. ananassae, orange; D. pseudoobscura, pink; D. virilis, blue; D. mojavensis, green; or D. grimshawi, red).
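
The classification described for panel (C) can be illustrated with a minimal Python sketch. It is not the EvoPrinterHD implementation: the function name `evodifferences`, the representation of each species' ceBLAT as a set of 0-based reference positions, and the use of lowercase for bases missing in two or more species are all assumptions made for illustration.

```python
def evodifferences(reference, aligned):
    """Label each reference base by how many species fail to align to it.

    reference: the reference DNA string.
    aligned:   hypothetical input, mapping species name -> set of 0-based
               reference positions that align in that species' ceBLAT.
    Returns a list of (base, label) pairs, where label is "all" when every
    species aligns, the species name when exactly one species is missing,
    and None otherwise.
    """
    profile = []
    for pos, base in enumerate(reference):
        missing = [sp for sp, positions in aligned.items() if pos not in positions]
        if not missing:
            profile.append((base.upper(), "all"))       # shared by all species
        elif len(missing) == 1:
            profile.append((base.upper(), missing[0]))  # lost in just one species
        else:
            profile.append((base.lower(), None))        # assumption: shown lowercase
    return profile


# Toy data, invented for this example only.
reference = "ACGTAC"
aligned = {
    "D. simulans": {0, 1, 2, 3},
    "D. yakuba": {0, 1, 2, 4},
    "D. virilis": {0, 1, 3},
}
profile = evodifferences(reference, aligned)
```

In this toy input, position 2 is labeled with D. virilis and position 3 with D. yakuba, because each is the only species lacking an aligned base there; positions 4 and 5 are missing in more than one species and receive no species label.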

Mentions: Once the initial eBLAT alignments are completed, the EvoPrinterHD intra-genomic comparative algorithm automatically determines, for each species: (1) the number of aligning bases in the second and third eBLAT alignments that are not identified in the first (highest scoring) alignment, called the "R" value, which indicates putative rearrangements in the test species; (2) the number of aligning bases in the second and third alignments that also align in the highest scoring alignment, termed the "D" value, for putative duplications; and (3) the number of aligning bases that are shared by all three alignments, indicating conserved sequences within putative repetitive elements. For example, the alignment scorecard of a D. melanogaster 3,570 bp input reference sequence, located 6 kb 5' to the fushi tarazu gene, reveals that 5 of the 11 species included in the analysis have undergone putative rearrangements in their aligning regions compared to the reference genome (Figure 4). The rearrangements within 4 of the 5 genomes (D. mojavensis, D. grimshawi, D. willistoni and D. virilis) flank the aligning bases in each of their highest scoring aligning regions (noted by the color-coded number in the R column) (Figure 4). ceBLATs of these 5 species showed that each contained at least two different multi-species conserved sequence (MCS) rearrangements relative to the input D. melanogaster reference DNA (Figure 5A and data not shown).
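
The three counts described above can be sketched with plain set arithmetic. This is an illustrative Python sketch, not the EvoPrinterHD code: the function name `rearrangement_scores` is hypothetical, each alignment is assumed to be given as the set of reference positions it covers, and "the second and third alignments" is read here as the union of the two.

```python
def rearrangement_scores(first, second, third):
    """Compute R, D and shared-by-all counts for one test species.

    Each argument is an iterable of reference positions covered by one
    eBLAT alignment, ordered from highest to third highest score.
    """
    first, second, third = set(first), set(second), set(third)
    lower = second | third  # assumption: second and third are pooled
    return {
        # aligning bases in the lower-scoring hits absent from the top hit
        "R": len(lower - first),
        # aligning bases in the lower-scoring hits also present in the top hit
        "D": len(lower & first),
        # aligning bases common to all three alignments
        "shared_by_all": len(first & second & third),
    }


# Toy coordinates, invented for this example only.
scores = rearrangement_scores(
    first=range(100, 200),
    second=range(180, 260),
    third=range(190, 210),
)
print(scores)  # {'R': 60, 'D': 20, 'shared_by_all': 10}
```

With these toy ranges, 60 positions appear only in the lower-scoring alignments (a putative rearrangement signal), 20 overlap the top alignment (a putative duplication signal), and 10 are covered by all three.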

