Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish.

Hiller M, Agarwal S, Notwell JH, Parikh R, Guturu H, Wenger AM, Bejerano G - Nucleic Acids Res. (2013)

Bottom Line: Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse.


Affiliation: Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA, Department of Computer Science, Stanford University, Stanford, CA 94305, USA and Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

ABSTRACT
Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.

Figure 1: Zebrafish is currently evolutionarily distant from all other available fish genomes. (A) Phylogeny with branch lengths and clade groupings (solid lines only). The ‘mousefish’, a desirable but currently unavailable teleost genome at human–mouse distance, is discussed in the text. Apart from zebrafish, frog (1.49 subs/site to chicken), lamprey (1.76 subs/site to zebrafish), amphioxus (>2.5 subs/site to lamprey) and C. elegans (1.07 subs/site to Caenorhabditis remanei) are also shown to have phylogenetically isolated genomes. Molecular distances were taken from the UCSC genome browser (28) for the hg18, braFlo1 and ce10 assemblies. (B) Evolutionary distances (neutral substitutions per site) between zebrafish (left) and human (right) to other sequenced species. In contrast to human, the zebrafish genome occupies a phylogenetic outgroup position with the closest sequenced teleosts at a distance of 1.25–1.41 subs/site, which exceeds the distance between the human and chicken genome (1.08 subs/site). (C) The portion of CNEs conserved to mouse that can be discovered in comparisons between human and evolutionarily more distant species can be used to estimate the fraction of zebrafish CNEs visible using the current availability of genomes.
© Copyright Policy - creative-commons

Mentions: Although human, mouse, Drosophila and other species are in the desirable situation of being accompanied by genomes of both evolutionarily close and distant species, many important genomes are phylogenetically isolated in that comparative genomics is restricted to using genomes of other species that are evolutionarily distant, operationally defined here as a distance exceeding 1 neutral substitution per site. Examples include zebrafish (see later in the text), frog, lamprey, amphioxus, sea urchin, hydra, sea anemone and sponges, which are all important models for developmental biology, regeneration, stem cell biology or evolutionary biology (22–27) (Figure 1). Even one of the most important model organisms, Caenorhabditis elegans, is separated from other sequenced nematodes by >1 neutral substitution per site (Figure 1), prompting the community to sequence evolutionarily closer species (29). Finally, some species of interest for evolutionary research, such as the coelacanth or the tuatara, have only one known surviving sister species in their order, and will thus remain in phylogenetic isolation indefinitely. This phylogenetic isolation hampers comparative analysis and results in poor genome annotation.
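The operational definition above (isolation as a distance exceeding 1 neutral substitution per site) can be illustrated with a minimal sketch. It is not the authors' code. The names `NearestNeighbour`, `is_isolated` and `FIGURE_1_DISTANCES` are hypothetical, and the sketch assumes that each distance quoted in the Figure 1 caption is the distance to the nearest sequenced genome. For amphioxus the caption gives only ">2.5", so 2.5 is used as a lower bound.

```python
from dataclasses import dataclass

# Operational threshold from the text: a genome counts as isolated when its
# distance to other sequenced genomes exceeds 1 neutral substitution per site.
ISOLATION_THRESHOLD = 1.0


@dataclass(frozen=True)
class NearestNeighbour:
    """Hypothetical record: a species and its distance to a comparison genome."""
    species: str
    neighbour: str
    distance: float  # neutral substitutions per site


def is_isolated(entry: NearestNeighbour,
                threshold: float = ISOLATION_THRESHOLD) -> bool:
    """Return True when the distance strictly exceeds the threshold."""
    return entry.distance > threshold


# Distances as quoted in the Figure 1 caption. Treating them as
# nearest-neighbour distances is an assumption of this sketch.
FIGURE_1_DISTANCES = [
    NearestNeighbour("zebrafish", "closest sequenced teleost", 1.25),
    NearestNeighbour("frog", "chicken", 1.49),
    NearestNeighbour("lamprey", "zebrafish", 1.76),
    NearestNeighbour("amphioxus", "lamprey", 2.5),  # caption says ">2.5"
    NearestNeighbour("C. elegans", "Caenorhabditis remanei", 1.07),
]

isolated = [e.species for e in FIGURE_1_DISTANCES if is_isolated(e)]
print(isolated)
```

Under these assumptions all five species are classified as isolated, matching the caption. The comparison is strict, so a genome at exactly 1 substitution per site would not count as isolated.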

