A pipeline for the systematic identification of non-redundant full-ORF cDNAs for polymorphic and evolutionary divergent genomes: Application to the ascidian Ciona intestinalis.

Gilchrist MJ, Sobral D, Khoueiry P, Daian F, Laporte B, Patrushev I, Matsumoto J, Dewar K, Hastings KE, Satou Y, Lemaire P, Rothbächer U - Dev. Biol. (2015)

Bottom Line: Marine organism genomes are, however, frequently highly polymorphic and encode proteins that diverge significantly from those of well-annotated model genomes. It is robust to polymorphism, includes paralog calling and does not require evolutionary proximity to well-annotated model organisms. It contains 19,163 full-ORF cDNA clones covering 60% of Ciona coding genes, and full-ORF orthologs for approximately half of curated human disease-associated genes.


Affiliation: Gurdon Institute, Cambridge University, Cambridge, United Kingdom. Electronic address: mike.gilchrist@crick.ac.uk.


Coding genome of Ciona intestinalis. (A) Phylogenetic position of Ciona intestinalis relative to major model organisms, with branch length indicating degree of amino acid divergence (adapted from Putnam et al. (2007)). (B) Length distribution of 5′ UTRs in Ciona intestinalis determined from assembled EST sequence where the open reading frame is probably complete. Red line indicates the proportion at any given length expected to include at least one in-frame stop codon. (C) Lack of conservation of the N-terminus of Ciona intestinalis proteins relative to well-annotated model systems, and compared to Xenopus tropicalis. Comparison of BLASTp alignment data using sets of mutual orthologs between Ciona intestinalis, Xenopus tropicalis, and either human or mouse. Schematic of BLAST alignments indicates how N-terminus divergence is measured.
© Copyright Policy - CC BY

Mentions: In spite of significant scientific interest, there is to our knowledge no marine invertebrate species for which a systematic collection of full-ORF cDNA clones has been developed. A collection of 24,020 cDNA clones was generated in the cephalochordate Branchiostoma floridae (Yu et al., 2008), but no specific attempt was made to select only full-ORF clones, nor to distinguish between recent paralogs and highly polymorphic loci. This may in part be due to the challenge of marine invertebrate genomes: recognition of open reading frames is made harder by the large evolutionary distances to the available non-marine model organisms with substantially mature genome-scale protein annotation. In the present case, C. intestinalis diverged over 500 million years ago from the closest taxa with annotated genomes: vertebrates and cephalochordates (Putnam et al., 2008). Extensive protein divergence (Fig. 1A, adapted from Putnam et al. (2007)) contributes to the difficulty of identifying N-terminal coding sequences of many Ciona proteins by simple comparison to orthologous proteins in the well-annotated vertebrate species (Fig. 1C), an issue worsened by typically short 5′ UTRs, which often lack upstream in-frame STOP codons (Fig. 1B). In addition, many marine invertebrates have high levels of polymorphism and undergo cryptic speciation: allelic variation in C. intestinalis within individuals can be over 1.5% (Dehal et al., 2002), and divergence between the two described subspecies can reach 12% in some loci (Caputi et al., 2007; Nydam and Harrison, 2010). This degree of variation significantly widens the range of sequence identity over which allelic variation at a single locus may be confused with sequence divergence between recent paralogs, and thus complicates gene referencing and non-redundant clone selection.
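
The point about short 5′ UTRs can be made concrete with a small sketch. An upstream in-frame STOP codon is useful evidence that the first ATG is the true start of the ORF, and short UTRs have fewer codons in which one can occur. The excerpt does not say how the expected proportion in Fig. 1B was computed, so the code below assumes a simple model in which every in-frame codon is drawn uniformly from the 64 possible triplets (3 of which are stops). The function names and the default probability are illustrative, not taken from the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_upstream_inframe_stop(utr5: str) -> bool:
    """Return True if the 5' UTR contains a stop codon in frame with the ORF.

    `utr5` is assumed to be the sequence immediately upstream of the start
    codon, so in-frame codons are read in triplets backwards from its 3' end.
    """
    seq = utr5.upper().replace("U", "T")
    # Step back from the start codon three nucleotides at a time.
    for end in range(len(seq), 2, -3):
        if seq[end - 3:end] in STOP_CODONS:
            return True
    return False


def expected_fraction_with_stop(utr_length_nt: int, p_stop: float = 3 / 64) -> float:
    """Expected fraction of UTRs of this length with >= 1 in-frame stop.

    Assumption (not from the source): codons are independent and each is a
    stop with probability `p_stop`; 3/64 corresponds to uniform random codons.
    """
    n_codons = utr_length_nt // 3
    return 1.0 - (1.0 - p_stop) ** n_codons


if __name__ == "__main__":
    for length in (15, 30, 60, 120):
        print(length, round(expected_fraction_with_stop(length), 3))
```

Under this assumed model a UTR needs several dozen nucleotides before an in-frame stop becomes likely, which is consistent with the excerpt's statement that short 5′ UTRs often lack one. Real sequence composition is not uniform, so the numbers should be read as a rough guide only.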

