Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish.

Hiller M, Agarwal S, Notwell JH, Parikh R, Guturu H, Wenger AM, Bejerano G - Nucleic Acids Res. (2013)

Bottom Line: Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse.


Affiliation: Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA, Department of Computer Science, Stanford University, Stanford, CA 94305, USA and Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

ABSTRACT
Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified by zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs), which are promising candidates for biologically important cis-regulatory elements, is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.

Figure 3. Transitivity can reveal orthology between distant genomes that is not directly visible. (A) Illustration of the transitivity principle. A zebrafish locus aligns to chicken but not directly to human. However, the chicken locus does align to human, allowing us to infer orthology and anchor an alignment between zebrafish and human. Conceptually, transitivity mimics a multiple alignment using the intermediate species as the reference species. (B) Sequence identity of zebrafish–human/mouse alignments, separating those found only using transitivity (blue) from those aligning directly in the syntenic multiple alignments (gray), suggests that transitivity-inferred alignments also evolve under clear purifying selection. (C) An example where zebrafish has a weaker alignment to human that is not detected in the genome-wide pipeline. However, an anchored alignment using chicken as an intermediate species shows clear orthology between the diverged zebrafish and human sequences. (D) The CNE shown in (C) is in synteny with the PTPRE gene in all three species.
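The transitivity principle in panel (A) can be illustrated with a minimal sketch that treats each pairwise alignment as a map from aligned base positions in one genome to positions in another. The function name, the dictionary representation and the toy coordinates are hypothetical; real pipelines compose chained alignment blocks rather than individual bases.

```python
from typing import Dict


def compose_alignments(a_to_b: Dict[int, int], b_to_c: Dict[int, int]) -> Dict[int, int]:
    """Infer A->C base pairings through an intermediate genome B (hypothetical helper).

    A position in A receives a partner in C only if its partner in B is itself
    aligned to C, mirroring a multiple alignment that uses B as the reference.
    """
    return {a: b_to_c[b] for a, b in a_to_b.items() if b in b_to_c}


# Toy, made-up coordinates: zebrafish -> chicken and chicken -> human.
zebrafish_to_chicken = {100: 500, 101: 501, 102: 502, 103: 503}
chicken_to_human = {501: 9001, 502: 9002, 503: 9003, 504: 9004}

zebrafish_to_human = compose_alignments(zebrafish_to_chicken, chicken_to_human)
print(zebrafish_to_human)  # {101: 9001, 102: 9002, 103: 9003}
```

Zebrafish position 100 gets no human partner because its chicken partner (500) does not align to human, which is the situation transitivity cannot rescue.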

Mentions: Transitivity infers orthology between zebrafish and human/mouse sequences that do not align directly but are each orthologous to a common sequence in a third, related species (Figure 3) (21,45). For each CNE, we obtained the genomic coordinates in any intermediate species from the multiple alignments (first transitive step). Then, we used the UCSC syntenic (orthology) liftOver chains to search for a syntenic alignment between the intermediate species and human/mouse, running liftOver with -minMatch=0.7 (second transitive step). This procedure was repeated for all intermediate species (all tetrapods that align to zebrafish in our multiple alignments). Finally, as a quality-control measure, we only inferred zebrafish–human/mouse homology if the CNE mapped to the same location in the human/mouse genome for all available intermediate species.
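The two transitive steps and the consistency filter above can be sketched as follows. This is a simplified, forward-strand approximation of liftOver's -minMatch threshold (the minimum fraction of input bases that must remap), not the UCSC implementation. The block representation, the helper names lift_interval and consistent_target, and all species and coordinates are hypothetical. Treating "the same location" as overlapping regions on one chromosome, and skipping intermediates whose lift fails, are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Block = Tuple[int, int, int]   # ungapped chain block: (source start, target start, length), 0-based
Region = Tuple[str, int, int]  # (chromosome, start, end), half-open


def lift_interval(start: int, end: int, target_chrom: str,
                  blocks: List[Block], min_match: float = 0.7) -> Optional[Region]:
    """Map [start, end) through forward-strand chain blocks (simplified liftOver).

    Returns the span of mapped target bases if at least `min_match` of the
    input bases fall inside aligned blocks, otherwise None.
    """
    mapped = []
    for src, tgt, size in blocks:
        lo, hi = max(start, src), min(end, src + size)
        if lo < hi:  # the interval overlaps this aligned block
            mapped.append((tgt + (lo - src), tgt + (hi - src)))
    if end <= start or not mapped:
        return None
    covered = sum(b - a for a, b in mapped)
    if covered / (end - start) < min_match:
        return None
    return (target_chrom, min(a for a, _ in mapped), max(b for _, b in mapped))


def consistent_target(lifts: Dict[str, Optional[Region]]) -> Optional[Region]:
    """Accept a CNE's human/mouse location only if all intermediates agree.

    Assumption: intermediates that failed to lift (None) are skipped, and
    "agree" means all lifted regions share a chromosome and overlap.
    Returns their common intersection, or None.
    """
    hits = [r for r in lifts.values() if r is not None]
    if not hits:
        return None
    chroms = {c for c, _, _ in hits}
    start = max(s for _, s, _ in hits)
    end = min(e for _, _, e in hits)
    if len(chroms) != 1 or start >= end:
        return None
    return (hits[0][0], start, end)


# Hypothetical CNE coordinates in two intermediate genomes, each with a made-up chain to human.
chicken_lift = lift_interval(1000, 1100, "chr10", [(990, 50_000, 60), (1060, 50_070, 50)])
lizard_lift = lift_interval(2000, 2100, "chr10", [(2000, 50_010, 100)])
print(chicken_lift, lizard_lift)  # ('chr10', 50010, 50110) ('chr10', 50010, 50110)
print(consistent_target({"chicken": chicken_lift, "lizard": lizard_lift}))  # ('chr10', 50010, 50110)
```

In the chicken case only 90 of 100 bases fall inside aligned blocks, which still passes the 0.7 threshold; an interval with only half its bases aligned would be rejected.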
