Limits...
Re-annotation of the woodland strawberry (Fragaria vesca) genome.

Darwish O, Shahan R, Liu Z, Slovin JP, Alkharouf NW - BMC Genomics (2015)

Bottom Line: The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes.We have experimentally verified one of the new gene model predictions to validate our results.This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer and Information Sciences, Towson University, 7800 York Road, Towson, Maryland, 21252, USA. ODARWISH@TOWSON.EDU.

ABSTRACT

Background: Fragaria vesca is a low-growing, small-fruited diploid strawberry species commonly called woodland strawberry. It is native to temperate regions of Eurasia and North America and while it produces edible fruits, it is most highly useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011 [Nat Genet 43:223,2011]. The first generation annotation (version 1.1) were developed using GeneMark-ES+[Nuc Acids Res 33:6494,2005]which is a self-training gene prediction tool that relies primarily on the combination of ab initio predictions with mapping high confidence ESTs in addition to mapping gene deserts from transposable elements. Based on over 25 different tissue transcriptomes, we have revised the F. vesca genome annotation, thereby providing several improvements over version 1.1.

Results: The new annotation, which was achieved using Maker, describes many more predicted protein coding genes compared to the GeneMark generated annotation that is currently hosted at the Genome Database for Rosaceae ( http://www.rosaceae.org/ ). Our new annotation also results in an increase in the overall total coding length, and the number of coding regions found. The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes. We have experimentally verified one of the new gene model predictions to validate our results.

Conclusions: Using the RNA-Seq transcriptome sequences from 25 diverse tissue types, the re-annotation pipeline improved existing annotations by increasing the annotation accuracy based on extensive transcriptome data. It uncovered new genes, added exons to current genes, and extended or merged exons. This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae.

Show MeSH

Related in: MedlinePlus

Comparisons of version 1.1 annotation with the TowU_Fve annotation. (A) The version1.1 (peach color) annotation shows two exons connected by an intron. However, leaf RNASeq reads align to the intronic region. The TowU_Fve annotation (red) merges two existing exons by including the intronic region. (B) The version1.1 (peach color) annotation is missing the last exon revealed by cortex tissue RNASeq reads alignments. The TowU_Fve annotation (red) shows the newly predicted gene structure with the addition of the distal exon. (C) The first generation annotation (peach color) shows an absence of a gene between 5613k and 5616k, while the aligned reads from leaf tissue revealed the existence of an expressed gene at that site. The TowU_Fve annotation (red) shows a newly predicted gene (an alpha/beta-hydrolase domain-containing protein) between gene35181 and gene12565.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4318131&req=5

Fig3: Comparisons of version 1.1 annotation with the TowU_Fve annotation. (A) The version1.1 (peach color) annotation shows two exons connected by an intron. However, leaf RNASeq reads align to the intronic region. The TowU_Fve annotation (red) merges two existing exons by including the intronic region. (B) The version1.1 (peach color) annotation is missing the last exon revealed by cortex tissue RNASeq reads alignments. The TowU_Fve annotation (red) shows the newly predicted gene structure with the addition of the distal exon. (C) The first generation annotation (peach color) shows an absence of a gene between 5613k and 5616k, while the aligned reads from leaf tissue revealed the existence of an expressed gene at that site. The TowU_Fve annotation (red) shows a newly predicted gene (an alpha/beta-hydrolase domain-containing protein) between gene35181 and gene12565.

Mentions: The increased numbers of coding regions were discovered based on the RNA-Seq reads from different tissue libraries, as illustrated in Figure 3. For example, Figure 3A illustrates that the transcriptome data uncovered potential splice variation for gene21088, a putative receptor protein kinase, and re-annotation resulted in retention of a previously annotated intron. Figure 3B shows the addition of an exon to gene31621, a bZIP transcription factor. Figure 3C illustrates the discovery of a new gene, a putative hydrolase, absent from previous annotations.Figure 3


Re-annotation of the woodland strawberry (Fragaria vesca) genome.

Darwish O, Shahan R, Liu Z, Slovin JP, Alkharouf NW - BMC Genomics (2015)

Comparisons of version 1.1 annotation with the TowU_Fve annotation. (A) The version1.1 (peach color) annotation shows two exons connected by an intron. However, leaf RNASeq reads align to the intronic region. The TowU_Fve annotation (red) merges two existing exons by including the intronic region. (B) The version1.1 (peach color) annotation is missing the last exon revealed by cortex tissue RNASeq reads alignments. The TowU_Fve annotation (red) shows the newly predicted gene structure with the addition of the distal exon. (C) The first generation annotation (peach color) shows an absence of a gene between 5613k and 5616k, while the aligned reads from leaf tissue revealed the existence of an expressed gene at that site. The TowU_Fve annotation (red) shows a newly predicted gene (an alpha/beta-hydrolase domain-containing protein) between gene35181 and gene12565.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4318131&req=5

Fig3: Comparisons of version 1.1 annotation with the TowU_Fve annotation. (A) The version1.1 (peach color) annotation shows two exons connected by an intron. However, leaf RNASeq reads align to the intronic region. The TowU_Fve annotation (red) merges two existing exons by including the intronic region. (B) The version1.1 (peach color) annotation is missing the last exon revealed by cortex tissue RNASeq reads alignments. The TowU_Fve annotation (red) shows the newly predicted gene structure with the addition of the distal exon. (C) The first generation annotation (peach color) shows an absence of a gene between 5613k and 5616k, while the aligned reads from leaf tissue revealed the existence of an expressed gene at that site. The TowU_Fve annotation (red) shows a newly predicted gene (an alpha/beta-hydrolase domain-containing protein) between gene35181 and gene12565.
Mentions: The increased numbers of coding regions were discovered based on the RNA-Seq reads from different tissue libraries, as illustrated in Figure 3. For example, Figure 3A illustrates that the transcriptome data uncovered potential splice variation for gene21088, a putative receptor protein kinase, and re-annotation resulted in retention of a previously annotated intron. Figure 3B shows the addition of an exon to gene31621, a bZIP transcription factor. Figure 3C illustrates the discovery of a new gene, a putative hydrolase, absent from previous annotations.Figure 3

Bottom Line: The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes.We have experimentally verified one of the new gene model predictions to validate our results.This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer and Information Sciences, Towson University, 7800 York Road, Towson, Maryland, 21252, USA. ODARWISH@TOWSON.EDU.

ABSTRACT

Background: Fragaria vesca is a low-growing, small-fruited diploid strawberry species commonly called woodland strawberry. It is native to temperate regions of Eurasia and North America and while it produces edible fruits, it is most highly useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011 [Nat Genet 43:223,2011]. The first generation annotation (version 1.1) were developed using GeneMark-ES+[Nuc Acids Res 33:6494,2005]which is a self-training gene prediction tool that relies primarily on the combination of ab initio predictions with mapping high confidence ESTs in addition to mapping gene deserts from transposable elements. Based on over 25 different tissue transcriptomes, we have revised the F. vesca genome annotation, thereby providing several improvements over version 1.1.

Results: The new annotation, which was achieved using Maker, describes many more predicted protein coding genes compared to the GeneMark generated annotation that is currently hosted at the Genome Database for Rosaceae ( http://www.rosaceae.org/ ). Our new annotation also results in an increase in the overall total coding length, and the number of coding regions found. The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes. We have experimentally verified one of the new gene model predictions to validate our results.

Conclusions: Using the RNA-Seq transcriptome sequences from 25 diverse tissue types, the re-annotation pipeline improved existing annotations by increasing the annotation accuracy based on extensive transcriptome data. It uncovered new genes, added exons to current genes, and extended or merged exons. This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae.

Show MeSH
Related in: MedlinePlus