De novo assembly and characterisation of the field pea transcriptome using RNA-Seq.

Sudheesh S, Sawbridge TI, Cogan NO, Kennedy P, Forster JW, Kaur S - BMC Genomics (2015)

Bottom Line: Advances in second-generation sequencing and associated bioinformatics analysis now provide unprecedented opportunities for the development of genetic and genomic resources for field pea. This study provided a comprehensive assembled and annotated transcriptome set for field pea that can be used for development of genetic markers, in order to assess genetic diversity, construct linkage maps, perform trait-dissection and implement whole-genome selection strategies in varietal improvement programs, as well as to identify target genes for genetic modification approaches on the basis of annotation and expression analysis. In addition, the reference field pea transcriptome will prove highly valuable for comparative genomics studies and construction of a finalised genome sequence.


Affiliation: Department of Economic Development, Jobs, Transport and Resources, Biosciences Research Division, AgriBio, Centre for AgriBioscience, 5 Ring Road, Bundoora, VIC, 3083, Australia. shimna.sudheesh@ecodev.vic.gov.au.

ABSTRACT

Background: Field pea (Pisum sativum L.) is a cool-season grain legume that is cultivated world-wide for both human consumption and stock-feed purposes. Enhancement of genetic and genomic resources for field pea will permit improved understanding of the control of traits relevant to crop productivity and quality. Advances in second-generation sequencing and associated bioinformatics analysis now provide unprecedented opportunities for the development of such resources. The objective of this study was to perform transcriptome sequencing and characterisation from two genotypes of field pea that differ in terms of seed and plant morphological characteristics.

Results: Transcriptome sequencing was performed with RNA templates from multiple tissues of the field pea genotypes Kaspa and Parafield. Tissue samples were collected at various growth stages, and a total of 23 cDNA libraries were sequenced using Illumina high-throughput sequencing platforms. A total of 407 and 352 million paired-end reads from the Kaspa and Parafield transcriptomes, respectively, were assembled into 129,282 and 149,272 contigs, which were filtered on the basis of known gene annotations, presence of open reading frames (ORFs), reciprocal matches and degree of coverage. Totals of 126,335 contigs from Kaspa and 145,730 from Parafield were subsequently selected as the reference set. Reciprocal sequence analysis revealed that c. 87% of contigs were expressed in both cultivars, while a small proportion were unique to each genotype. Reads from different libraries were aligned to the genotype-specific assemblies in order to identify and characterise expression of contigs on a tissue-specific basis. Of these contigs, 87% were expressed in more than one tissue, while others showed distinct expression patterns in specific tissues, providing unique transcriptome signatures.
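
The tissue-level summary in the Results can be illustrated with a minimal Python sketch. It assumes a table of aligned read counts per contig and tissue is already available; the contig names, tissue names, counts and the `min_reads` threshold are hypothetical and are not taken from the study, which does not state its expression cut-off here.

```python
def tissue_breadth(expression, min_reads=1):
    """Count, for each contig, the tissues with at least min_reads aligned reads."""
    return {
        contig: sum(1 for count in per_tissue.values() if count >= min_reads)
        for contig, per_tissue in expression.items()
    }


def summarise(expression, min_reads=1):
    """Return (% of expressed contigs seen in more than one tissue, tissue-specific contigs)."""
    breadth = tissue_breadth(expression, min_reads)
    # Contigs with no tissue above the threshold are treated as not expressed.
    expressed = {contig: n for contig, n in breadth.items() if n > 0}
    multi = sum(1 for n in expressed.values() if n > 1)
    specific = {contig for contig, n in expressed.items() if n == 1}
    pct_multi = 100.0 * multi / len(expressed) if expressed else 0.0
    return pct_multi, specific


# Hypothetical read counts (contig -> tissue -> aligned reads), for illustration only.
counts = {
    "contig_1": {"leaf": 120, "root": 30, "flower": 0},
    "contig_2": {"leaf": 0, "root": 45, "flower": 0},
    "contig_3": {"leaf": 5, "root": 9, "flower": 14},
    "contig_4": {"leaf": 0, "root": 0, "flower": 0},
}

pct_multi, specific = summarise(counts)
print(f"{pct_multi:.1f}% of expressed contigs occur in more than one tissue")
print("tissue-specific contigs:", sorted(specific))
```

Raising `min_reads` makes the call stricter, so the share of contigs classed as tissue-specific depends directly on that assumed threshold.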

Conclusion: This study provided a comprehensive assembled and annotated transcriptome set for field pea that can be used for development of genetic markers, in order to assess genetic diversity, construct linkage maps, perform trait-dissection and implement whole-genome selection strategies in varietal improvement programs, as well as to identify target genes for genetic modification approaches on the basis of annotation and expression analysis. In addition, the reference field pea transcriptome will prove highly valuable for comparative genomics studies and construction of a finalised genome sequence.


Fig. 3: Sequence conservation of field pea contigs in comparison to sequences from other species. (a) Percentage of sequence similarity of field pea contigs with nr, nt databases and sequences from other plant species; (b) Venn diagram summarising the distribution of BLASTN matches between the Kaspa transcriptome and sequences from three other legume genomes; (c) Venn diagram summarising the distribution of BLASTN matches between the Parafield transcriptome and sequences from three other legume genomes. Numbers within the Venn diagrams indicate the number of sequences sharing similarity using BLASTN, and the numbers within parentheses indicate the percentage of matches in terms of total numbers.
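
The region counts of a three-way Venn diagram such as panels (b) and (c) can be derived with plain set operations. The sketch below is illustrative only: the genome labels A, B and C, the contig identifiers and the total of 10 contigs are placeholders, not data from the study.

```python
def venn3_regions(a, b, c):
    """Count contigs in each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A&B only": len((a & b) - c),
        "A&C only": len((a & c) - b),
        "B&C only": len((b & c) - a),
        "A&B&C": len(a & b & c),
    }


def with_percentages(regions, total):
    """Attach each region's share of the total contig count, as a percentage."""
    return {name: (n, round(100.0 * n / total, 1)) for name, n in regions.items()}


# Placeholder sets of contig IDs with a BLASTN match in each of three genomes.
hits_a = {"c1", "c2", "c3", "c4"}
hits_b = {"c2", "c3", "c5"}
hits_c = {"c3", "c4", "c6"}

regions = venn3_regions(hits_a, hits_b, hits_c)
labelled = with_percentages(regions, total=10)
for name, (n, pct) in labelled.items():
    print(f"{name}: {n} ({pct}%)")
```

Because the seven regions are disjoint, their counts sum to the number of contigs with a match in at least one genome, which gives a quick consistency check on the diagram.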

Mentions: In order to annotate the transcriptomes, all contigs were analysed with BLASTX against the nr database of GenBank.
For the Kaspa transcriptome, BLASTX analysis (Additional file 2) revealed 60,808 sequences (47 %) with significant matches, which were then filtered to remove non-plant sequences. This process resulted in a set of 59,229 sequences corresponding to 27,145 unique gene clusters. The length of the annotated sequences varied from 201 to 7,802 bp, with an average of 809 bp and an N50 of 1,106 bp. There were 34,452 (59 %) annotated sequences ≥ 500 bp, of which 15,867 were longer than 1,000 bp, and the remaining 41 % of sequences were 201–500 bp in size. The E-value distribution of significant hits revealed that 48 % of matched sequences exhibited high levels of similarity (E-value lower than 10⁻⁵⁰) to other legume genomes (Additional file 3, Figure A).
For the Parafield transcriptome, 64,727 sequences (43 %) exhibited significant BLASTX hits (Additional file 2), and after the removal of the non-plant sequences, 63,843 sequences (N50 of 1,083 bp and average of 797 bp) remained, corresponding to 27,655 unique genes. Among the annotated sequences, 36,979 (58 %) were greater than 500 bp in length, whereas 26,863 sequences were 201–500 bp in length. The distribution of significant hits for the Parafield contigs showed that 48 % of the sequences displayed E-values less than 10⁻⁵⁰, while the other matching sequences had E-values in the range between 10⁻⁵⁰ and 10⁻¹⁰ (Additional file 3, Figure A).
The annotated contigs were also examined for the presence of repetitive elements, and c. 1 % of the contigs were annotated as repeat elements such as retrotransposons and gag polyprotein-encoding sequences. The distribution of gene annotations based on BLASTX analysis showed the highest number of hits against sequences of M. truncatula, followed by soybean and the pea protein sequences published so far within the nr database of NCBI (Additional file 3, Figure B).
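
Length statistics of the kind quoted above (range, average, N50 and the share of sequences ≥ 500 bp) can be computed from a list of contig lengths. The following Python sketch uses the common definition of N50, the length at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total. The lengths are made-up toy values and the function names are illustrative, not the authors' pipeline.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths supplied")


def length_summary(lengths, cutoff=500):
    """Summarise contig lengths: range, mean, N50 and share at or above a cutoff."""
    at_or_above = sum(1 for length in lengths if length >= cutoff)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
        "pct_at_or_above_cutoff": 100.0 * at_or_above / len(lengths),
    }


# Toy contig lengths in bp, for illustration only.
toy_lengths = [201, 350, 500, 800, 1200, 2000]
summary = length_summary(toy_lengths)
print(summary)
```

N50 weights long contigs more heavily than the arithmetic mean does, which is why the reported N50 values exceed the reported averages.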
The BLASTN analysis of transcriptome contigs (Additional file 4) identified a higher number of matches (Fig. 3) to the NCBI nt database than the BLASTX analysis against nr. However, most of these additional matches were annotated as retrotransposons and hypothetical proteins, without well-characterised functions. The BLASTN analysis of transcriptome contigs (Additional file 4) against the pea chloroplast genome identified up to 0.17 % of contigs as chloroplast-derived.
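
Hit counts and E-value bands like those reported above can be tallied from BLAST tabular output. The sketch below assumes the standard 12-column tabular format, in which the query ID is column 1 and the E-value is column 11. The 10⁻¹⁰ significance cut-off is an assumption inferred from the E-value range quoted in the text, and the example lines, subject names and contig total are invented. The same counting, applied to matches against a chloroplast genome, would give the share of chloroplast-derived contigs.

```python
def best_evalues(tabular_lines):
    """Return the lowest E-value per query from 12-column BLAST tabular lines."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, evalue = fields[0], float(fields[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def summarise_hits(best, total_contigs, max_evalue=1e-10, strong=1e-50):
    """Count contigs with a significant hit and the share below the 'strong' threshold."""
    significant = [e for e in best.values() if e <= max_evalue]
    n_strong = sum(1 for e in significant if e <= strong)
    return {
        "with_hits": len(significant),
        "pct_with_hits": 100.0 * len(significant) / total_contigs,
        "pct_strong_among_hits": 100.0 * n_strong / len(significant) if significant else 0.0,
    }


# Invented example lines (tab-separated); not output from the study.
example = [
    "c1\tsubjA\t98.5\t400\t6\t0\t1\t400\t10\t409\t2e-80\t700",
    "c1\tsubjB\t90.0\t300\t30\t0\t1\t300\t5\t304\t1e-20\t300",
    "c2\tsubjC\t85.0\t250\t37\t1\t1\t250\t1\t250\t4e-15\t120",
    "c3\tsubjD\t80.0\t100\t20\t0\t1\t100\t1\t100\t0.5\t30",
]

best = best_evalues(example)
stats = summarise_hits(best, total_contigs=4)
print(stats)
```

Keeping only the best E-value per contig means each contig is counted once, regardless of how many database sequences it matches.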

