Comparative transcriptome analysis within the Lolium/Festuca species complex reveals high sequence conservation.

Czaban A, Sharma S, Byrne SL, Spannagl M, Mayer KF, Asp T - BMC Genomics (2015)

Bottom Line: Our results indicate that VRN2 is a candidate gene for differentiating vernalization and non-vernalization types in the Lolium-Festuca complex. The orthologous genes between the species have a very high percent identity (91.61%), and the majority of gene families were shared among all of them. It is likely that knowledge of the genomes will be largely transferable between species within the complex.


Affiliation: Department of Molecular Biology and Genetics, Aarhus University, Forsøgsvej 1, Slagelse, 4200, Denmark. Adrian.Czaban@mbg.au.dk.

ABSTRACT

Background: The Lolium-Festuca complex incorporates species from the genus Lolium and the broad-leaved fescues, both belonging to the subfamily Pooideae. This subfamily also includes wheat, barley, oat and rye, making it extremely important to world agriculture. Species within the Lolium-Festuca complex show very diverse phenotypes, many of which are related to agronomically important traits. Analysis of sequenced transcriptomes of these non-model species may shed light on the molecular mechanisms underlying this phenotypic diversity.

Results: We have generated de novo transcriptome assemblies for four species from the Lolium-Festuca complex, ranging from 52,166 to 72,133 transcripts per assembly. We have also predicted a set of proteins and validated it against a high-confidence protein database from three closely related species (H. vulgare, B. distachyon and O. sativa). We have obtained gene family clusters for the four species using OrthoMCL and analyzed their inferred phylogenetic relationships. Our results indicate that VRN2 is a candidate gene for differentiating vernalization and non-vernalization types in the Lolium-Festuca complex. Grouping the gene families by their BLAST identity enabled us to divide ortholog groups into those that are very conserved and those that are more evolutionarily relaxed. The ratio of non-synonymous to synonymous substitutions enabled us to pinpoint protein sequences evolving in response to positive selection. These proteins may explain some of the differences between the more stress-tolerant Festuca and the less stress-tolerant Lolium species.

Conclusions: Our data present a comprehensive transcriptome sequence comparison between species from the Lolium-Festuca complex, with the identification of potential candidate genes (such as VRN2) underlying some important phenotypic differences within the complex. The orthologous genes between the species have a very high percent identity (91.61%), and the majority of gene families were shared among all of them. It is likely that knowledge of the genomes will be largely transferable between species within the complex.


Fig. 1: Length distribution graph. A vertical bar chart of the length distribution of transcriptome assembly fragments across the analyzed species. The X-axis represents the length range bins; the Y-axis is the number of transcripts present in each bin.
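A length distribution of this kind can be tabulated directly from an assembly in FASTA format. The following is a minimal sketch in Python, not the authors' procedure: the function names, the 500 bp bin width and the 3000 bp overflow bin are assumptions chosen for illustration, since the bin edges used in the figure are not stated here.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Return {transcript_id: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def bin_lengths(lengths, bin_width=500, max_length=3000):
    """Count transcripts per length bin (bin_width and max_length are assumed values)."""
    counts = Counter()
    for n in lengths:
        if n >= max_length:
            label = f">={max_length}"
        else:
            start = (n // bin_width) * bin_width
            label = f"{start}-{start + bin_width - 1}"
        counts[label] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy input, purely illustrative; a real run would pass an open FASTA file.
    toy = [">t1 example", "ACGT", "ACGT", ">t2", "A" * 1200, ">t3", "A" * 3500]
    print(bin_lengths(read_fasta_lengths(toy).values()))
```

Running the same binning on each of the four assemblies would give the per-species counts that a bar chart like this one plots side by side.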

Mentions: We focused on generating transcriptome assemblies for four species within the Lolium-Festuca complex. Reads were error-corrected using the ALLPATHS-LG tool [30] and assembled using the Trinity software [31] to produce transcriptome assemblies that varied in transcript number between 52,166 and 72,133 after filtering out transcripts with low read support (Table 1). The distribution of transcript length is very similar between the four species (Figure 1), and in all cases a large portion of the assembly is contained within transcripts that are over 1000 bp in length. We have taken several approaches to evaluate the quality of each assembly and determine how comparable the four assemblies are. First, we identified which transcripts from three closely related species (B. distachyon, O. sativa and T. aestivum) share the greatest sequence similarity with transcripts from the four Lolium-Festuca complex species. We then determined how much overlap there was between the transcripts from our de novo assemblies and the transcripts from the related species. A high proportion of the transcripts can be aligned fully (100%) or almost fully (80%) to the transcripts from the related species (Table 2). The highest number of hits was found to the wheat gene set, wheat being the closest relative in this comparison. Second, we used the CEGMA pipeline [32] to evaluate the completeness of our assemblies. This tool assesses the presence and coverage of a set of 248 extremely conserved core eukaryotic genes (CEGs). It is routinely used for evaluating genomic assemblies, but it has also been used for evaluating transcriptome assemblies [33,34]. The percentage of complete CEGs ranged from 88.71 to 95.56, and the percentage of partially complete CEGs ranged from 94.76 to 97.58 (Table 3). The average number of orthologs per CEG and the percentage of detected CEGs that had more than one ortholog were similar across the four species.
Our results point to transcriptome assemblies that reflect a representative portion of the transcriptome complexity and are comparable between the four species.
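The CEG completeness figures are simple proportions of the 248-gene core set. The sketch below shows the arithmetic in Python; the helper name is hypothetical, and the counts in the usage example are back-calculated for illustration (220 and 237 of 248 round to the reported 88.71% and 95.56%), not taken from Table 3.

```python
TOTAL_CEGS = 248  # size of the core eukaryotic gene set stated in the text


def completeness_percent(found, total=TOTAL_CEGS):
    """Percentage of core genes detected, rounded to two decimals."""
    if not 0 <= found <= total:
        raise ValueError("found must be between 0 and total")
    return round(100.0 * found / total, 2)


if __name__ == "__main__":
    # Illustrative counts only (assumption): chosen because they reproduce
    # the reported range of complete CEGs when divided by 248.
    for count in (220, 237):
        print(count, completeness_percent(count))
```

The same function applies to the partially complete category, where 235 and 242 of 248 correspond to the reported 94.76% and 97.58%.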

