Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish.

Hiller M, Agarwal S, Notwell JH, Parikh R, Guturu H, Wenger AM, Bejerano G - Nucleic Acids Res. (2013)

Bottom Line: Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse.


Affiliation: Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA, Department of Computer Science, Stanford University, Stanford, CA 94305, USA and Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

ABSTRACT
Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.

Figure 4: Ancestral reconstruction reveals additional CNE alignments between distant species. (A) Large evolutionary distances between zebrafish and human/mouse can be substantially reduced if (B) reconstructed ancestral sequences are aligned. The phylogenetic tree contains the species used to reconstruct the percomorph and mammalian ancestor. Species used as outgroups are in blue in (B). (C) Sequence identity of zebrafish–human alignments is shown for CNEs that align to human in our multiple alignment and for 1262 CNEs where ancestral reconstruction but not direct alignment detects conservation to human (630 align to a tetrapod but not human in our multiple alignment; 632 have no alignment to any vertebrate). Although alignments detected only using reconstruction have lower sequence identities, even values ∼50% indicate clear conservation between species separated by ≥1.8 neutral substitutions per site. (D) An example where conservation within teleosts and within tetrapods can be used to reconstruct the percomorph and mammalian ancestor of the CNE (1 and 2). The reconstructed ancestral sequences align with high enough sequence identity to detect orthology and anchor an alignment between the human and zebrafish CNEs not visible otherwise (3). The CNE shares conserved synteny with the same putative target gene (4). Blue background is identity to the ancestor in (1 and 2) and sequence identity in (3).
© Copyright Policy - creative-commons

Mentions: We also used ancestral sequence reconstruction (47,48) as an additional approach to reduce large evolutionary distances by aligning reconstructed percomorph ancestral zCNE sequences to reconstructed mammalian ancestral CNE sequences (Figure 4). As the evolutionary distance between the percomorph and mammalian ancestors is only 1.04 neutral substitutions per site, much shorter than the distance of 1.8 (2.01) substitutions per site between zebrafish and human (mouse) (Figure 4), we hoped to detect additional alignments.
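
To give a rough sense of why the shorter ancestor-to-ancestor distance matters, the sketch below converts the three quoted distances into the sequence identity expected for neutrally evolving, ungapped sequence. It assumes the Jukes-Cantor (JC69) substitution model, which is an illustrative simplification chosen here and not necessarily the model used by the authors; the function name `jc69_expected_identity` is hypothetical. Under that assumption, the expected neutral identity is about 0.44 at 1.04 substitutions per site, against about 0.32 at 1.8 and about 0.30 at 2.01, all approaching the 0.25 floor of random sequence.

```python
import math


def jc69_expected_identity(d: float) -> float:
    """Expected fraction of identical sites between two sequences separated
    by d substitutions per site, under the Jukes-Cantor (JC69) model.

    Assumption: JC69 (equal base frequencies, equal rates) is used purely
    for illustration; it ignores indels and rate variation among sites.
    """
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


# Distances (neutral substitutions per site) as quoted in the text.
distances = {
    "percomorph ancestor vs mammalian ancestor": 1.04,
    "zebrafish vs human": 1.8,
    "zebrafish vs mouse": 2.01,
}

for label, d in distances.items():
    identity = jc69_expected_identity(d)
    print(f"{label}: d = {d:.2f}, expected neutral identity = {identity:.3f}")
```

Read this way, an observed identity near 50% between zebrafish and human sits well above the neutral expectation at that distance, which is consistent with the figure's statement that such values indicate conservation. The numbers are only as good as the simplified model behind them.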

