A pipeline for the systematic identification of non-redundant full-ORF cDNAs for polymorphic and evolutionary divergent genomes: Application to the ascidian Ciona intestinalis.
Bottom Line: Marine organism genomes are, however, frequently highly polymorphic and encode proteins that diverge significantly from those of well-annotated model genomes. It is robust to polymorphism, includes paralog calling and does not require evolutionary proximity to well-annotated model organisms. It contains 19,163 full-ORF cDNA clones covering 60% of Ciona coding genes, and full-ORF orthologs for approximately half of curated human disease-associated genes.
Affiliation: Gurdon Institute, Cambridge University, Cambridge, United Kingdom. Electronic address: firstname.lastname@example.org.
Mentions: In spite of significant scientific interest, there is to our knowledge no marine invertebrate species for which a systematic collection of full-ORF cDNA clones has been developed. A collection of 24,020 cDNA clones was generated in the cephalochordate Branchiostoma floridae (Yu et al., 2008), but no specific attempt was made to select only full-ORF clones, or to distinguish between recent paralogs and highly polymorphic loci. This may in part be due to the challenge posed by marine invertebrate genomes: recognition of open reading frames is made harder by the large evolutionary distances to the available non-marine model organisms with substantially mature genome-scale protein annotation. In the present case, C. intestinalis diverged over 500 million years ago from the closest taxa with annotated genomes: vertebrates and cephalochordates (Putnam et al., 2008). Extensive protein divergence (Fig. 1A, adapted from Putnam et al. (2007)) contributes to the difficulty of identifying the N-terminal coding sequences of many Ciona proteins by simple comparison to orthologous proteins in well-annotated vertebrate species (Fig. 1C), an issue worsened by typically short 5′ UTRs, which often lack upstream in-frame STOP codons (Fig. 1B). In addition, many marine invertebrates have high levels of polymorphism and undergo cryptic speciation: allelic variation in C. intestinalis within individuals can be over 1.5% (Dehal et al., 2002), and divergence between the two described subspecies can reach 12% at some loci (Caputi et al., 2007; Nydam and Harrison, 2010). This degree of variation significantly widens the range of sequence identity over which allelic variation at a single locus may be confused with sequence divergence between recent paralogs, and thus complicates gene referencing and non-redundant clone selection.
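The point about upstream in-frame STOP codons can be illustrated with a minimal sketch. It is not the pipeline described in the paper; the function name and the example sequences are hypothetical. The idea is that when a STOP codon lies upstream of a candidate start ATG and in the same frame, the ORF cannot extend further 5′, so the ATG is supported as a true start. When no such STOP exists, as is often the case with short 5′ UTRs, a 5′-truncated clone cannot be ruled out from the sequence alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_upstream_inframe_stop(cdna: str, atg_index: int) -> bool:
    """Return True if a STOP codon lies upstream of, and in frame with,
    the candidate start codon at atg_index (0-based) in the cDNA."""
    seq = cdna.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    # Step backwards one codon at a time, staying in the frame of the ATG.
    for pos in range(atg_index - 3, -1, -3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return True
    return False


# Hypothetical toy sequences; the candidate ATG is at index 8 in each.
with_stop = "GGTAAGCCATGAAATAG"       # in-frame TAA at index 2
without_stop = "GGCAGGCCATGAAATAG"    # no upstream STOP in frame
out_of_frame = "GTAAGGCCATGAAATAG"    # TAA at index 1, not in frame

print(has_upstream_inframe_stop(with_stop, 8))
print(has_upstream_inframe_stop(without_stop, 8))
print(has_upstream_inframe_stop(out_of_frame, 8))
```

Only the first sequence has its start codon supported in this sense; for the other two, additional evidence, such as comparison to orthologous proteins, would be needed to decide whether the ORF is complete.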