The Prediction and Validation of Small CDSs Expand the Gene Repertoire of the Smallest Known Eukaryotic Genomes.

Belkorchia A, Gasc C, Polonais V, Parisot N, Gallois N, Ribière C, Lerat E, Gaspin C, Pombert JF, Peyret P, Peyretaillade E - PLoS ONE (2015)

Bottom Line: To date, sequencing and annotation of microsporidian genomes have revealed a poor gene complement with highly reduced gene sizes. Most of the newly found genes are present in other distantly related microsporidian species, suggesting their biological relevance. The present study provides a better framework for annotating microsporidian genomes and for training and evaluating new computational methods dedicated to detecting ultra-small genes in various organisms.


Affiliation: Clermont Université, Université d'Auvergne, Laboratoire "Microorganismes: Génome et Environnement", BP 10448, F-63000, Clermont-Ferrand, France; CNRS, UMR 6023, LMGE, F-63171, Aubière, France.

ABSTRACT
The proper prediction of the gene catalogue of an organism is essential to obtain a representative snapshot of its overall lifestyle, especially when it is not amenable to culturing. Microsporidia are obligate intracellular, sometimes hard to culture, eukaryotic parasites known to infect members of every animal phylum. To date, sequencing and annotation of microsporidian genomes have revealed a poor gene complement with highly reduced gene sizes. In the present paper, we investigated whether such gene sizes may have introduced biases into the methodologies used for genome annotation, with an emphasis on small coding sequence (CDS) gene prediction. Using better delineated intergenic regions from four Encephalitozoon genomes, we predicted de novo new small CDSs with sizes ranging from 78 to 255 bp (median 168) and corroborated these predictions by RACE-PCR experiments in Encephalitozoon cuniculi. Most of the newly found genes are present in other distantly related microsporidian species, suggesting their biological relevance. The present study provides a better framework for annotating microsporidian genomes and for training and evaluating new computational methods dedicated to detecting ultra-small genes in various organisms.




pone.0139075.g001: Example of the genomic context of previously annotated genes and newly-identified sCDSs in Encephalitozoon genomes. The transcriptional signals of the newly predicted genes are highlighted in red (promoter signal) and green (polyadenylation signal), respectively. The putative polyadenylation signals of the genes flanking the new sCDSs are highlighted in light blue.

Mentions: Thereafter, using the curated annotations described above, we searched for the presence of short protein-coding gene candidates. Specifically, we searched for transcriptional and/or translational signals in intergenic regions that flanked small open reading frames, with the condition that both signals and ORFs were conserved across the Encephalitozoon genomes. Using this approach, a total of 31 small but highly conserved CDSs were identified in the four Encephalitozoon species (Fig 1, Table 1 and S3 Table). Another sCDS was also found to be shared between E. cuniculi (ECU04_1635) and E. romaleae (EROM_041665). However, its presence could not be ascertained in E. hellem and E. intestinalis because its location, based on syntenic information, falls within unsequenced regions. The proteins encoded by the newly-identified small CDSs range from 25 to 84 amino acids in E. cuniculi (median 55; Table 1) and generally show a high level of similarity across the four Encephalitozoon species, with an average of 72% (min 46%, max 96%; Fig 2 and S1 Fig).
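
The ORF-scanning step of such a search can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the ATG-only start codon, and the standard stop codons are assumptions made for illustration, and the sketch does not check transcriptional signals or conservation across genomes. It scans both strands of an intergenic sequence for ORFs within the 78 to 255 bp range given in the abstract, and converts CDS length to protein length on the assumption that the reported CDS sizes include the stop codon.

```python
# Illustrative sketch only; names and criteria are assumptions, not the published method.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def protein_length(cds_bp):
    """Amino acids encoded by a CDS whose length (in bp) includes the stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return cds_bp // 3 - 1


def find_small_orfs(seq, min_bp=78, max_bp=255):
    """Find ATG-to-stop ORFs whose length lies in [min_bp, max_bp].

    Returns (strand, start, end, orf_sequence) tuples. Coordinates are
    0-based, end-exclusive, and relative to the strand that was scanned.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if min_bp <= length <= max_bp:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return hits
```

Under the stop-codon assumption, `protein_length` maps 78, 168 and 255 bp to 25, 55 and 84 amino acids, which agrees with the size range and median reported for the E. cuniculi proteins. A real search would additionally require the flanking promoter and polyadenylation signals and conservation across the four genomes, as described above.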

