Transcriptome sequencing of lentil based on second-generation technology permits large-scale unigene assembly and SSR marker discovery.

Kaur S, Cogan NO, Pembleton LW, Shinozuka M, Savin KW, Materne M, Forster JW - BMC Genomics (2011)



Affiliation: Department of Primary Industries, Biosciences Research Division, Victorian AgriBiosciences Centre, La Trobe University Research and Development Park, Bundoora, Australia.

ABSTRACT

Background: Lentil (Lens culinaris Medik.) is a cool-season grain legume that provides a rich source of protein for human consumption. Compared with other Fabaceae species, lentil is relatively underdeveloped in terms of genomic resources, with limited data available. There is hence a significant need to enhance such resources in order to identify novel genes and alleles for molecular breeding to increase crop productivity and quality.

Results: Tissue-specific cDNA samples from six distinct lentil genotypes were sequenced using Roche 454 GS-FLX Titanium technology, generating c. 1.38 × 10⁶ expressed sequence tags (ESTs). De novo assembly generated a total of 15,354 contigs and 68,715 singletons. The complete unigene set was compared against the genome drafts of the model legume Medicago truncatula and the model plant Arabidopsis thaliana, identifying 12,639 and 7,476 unique matches, respectively. When compared to the genome of Glycine max, a total of 20,419 unique hits were observed, corresponding to c. 31% of the known gene space. A total of 25,592 lentil unigenes were subsequently annotated from GenBank. Simple sequence repeat (SSR)-containing ESTs were identified from consensus sequences, and a total of 2,393 primer pairs were designed. A subset of 192 EST-SSR markers was screened for validation across a panel of 12 cultivated lentil genotypes and one wild relative species. A total of 166 primer pairs amplified successfully, of which 47.5% detected genetic polymorphism.
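
The marker-validation figures above can be worked through as simple ratios. The following is a minimal Python sketch using only the counts quoted in the Results; the amplification rate and the approximate number of polymorphic markers are computed here for illustration and are not values reported by the authors.

```python
# Worked arithmetic for the EST-SSR validation figures quoted in the Results.
# Input counts come from the abstract; the derived values are plain ratios.

screened = 192                # EST-SSR primer pairs screened for validation
amplified = 166               # primer pairs that amplified successfully
polymorphic_fraction = 0.475  # fraction of amplified markers detecting polymorphism

# Share of screened primer pairs that gave successful amplification.
amplification_rate = amplified / screened

# 47.5% is itself a rounded percentage, so round to the nearest whole marker.
polymorphic_markers = round(amplified * polymorphic_fraction)

if __name__ == "__main__":
    print(f"Amplification success: {amplification_rate:.1%}")
    print(f"Approximate polymorphic markers: {polymorphic_markers}")
```

On these counts, roughly 86% of the screened primer pairs amplified, and about 79 of them would have detected polymorphism.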

Conclusions: A substantial collection of ESTs has been developed from sequence analysis of lentil genotypes using second-generation technology, permitting unigene definition across a broad range of functional categories. As well as providing resources for functional genomics studies, the unigene set has permitted significant enhancement of the number of publicly-available molecular genetic markers as tools for improvement of this species.



Figure 2: Frequency histogram depicting the distribution of number of contigs as a function of number of reads.

Mentions: A total of 1.38 × 10⁶ reads, corresponding to a cumulative sequence of 448 Mbp, were generated from a range of tissues of six lentil genotypes using GS-FLX Titanium chemistry. Prior to quality filtering, the median read length was 330 bp. Adaptors, primer sequences and 35 nucleotides from both the 5'- and 3'-termini of each read were removed to generate high-confidence reads, and a total of 847,824 high-quality reads were then used for de novo assembly. Clustering and assembly yielded 15,359 contigs and 68,715 singletons, representing a total of 84,074 unigenes (Additional files 1 and 2). The unigene set was then further filtered on length: remnant sequences shorter than 100 bp were excluded from further analysis, leaving 15,354 contigs and 66,652 singletons. Contig length ranged from 114 bp to 6,479 bp, with an average of 717 bp. Contig coverage varied from 1.25-fold to 8,779-fold, with an average of 13.9-fold. The number of reads per contig varied between 2 and 104,007, with an average of 46 (Table 1). The distributions of read length and number of reads per contig are shown in Figures 1 and 2, respectively. After assembly, 9,614 contigs were longer than 500 bp, and only 0.4% of the contigs were shorter than 200 bp. This effect might be due to the short length of individual reads and/or to low coverage of the transcriptome. The majority of contigs (51.5%) were derived from fewer than 10 reads (Figure 2), followed by 21% composed of 11-20 reads per contig. In total, 5.9% of the contigs were composed of more than 100 reads.
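
The post-sequencing bookkeeping described above (end trimming, a length cut-off, contig length summaries and the reads-per-contig classes plotted in Figure 2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: only the 35 nt trim and the 100 bp cut-off come from the text. Adaptor and primer removal is not modelled, the function names are hypothetical, and the bin edges (2-10, 11-20, 21-100, >100 reads) are assumed, since the exact histogram classes are not given here.

```python
from statistics import mean

TRIM_NT = 35   # bases clipped from both the 5' and 3' termini of each read (per the text)
MIN_LEN = 100  # unigenes shorter than this were excluded after assembly (per the text)


def trim_read(seq: str, n: int = TRIM_NT) -> str:
    """Remove n bases from each end; reads too short to survive become ''."""
    if len(seq) <= 2 * n:
        return ""
    return seq[n:len(seq) - n]


def length_filter(unigenes: list[str], min_len: int = MIN_LEN) -> list[str]:
    """Keep only unigenes (contigs or singletons) at least min_len bp long."""
    return [s for s in unigenes if len(s) >= min_len]


def length_summary(contigs: list[str]) -> tuple[int, int, float]:
    """Return (min, max, mean) contig length, as summarised in the text."""
    lengths = [len(s) for s in contigs]
    return min(lengths), max(lengths), mean(lengths)


def bin_reads_per_contig(read_counts: list[int]) -> dict[str, int]:
    """Count contigs per read-number class; bin edges are hypothetical."""
    bins = {"2-10": 0, "11-20": 0, "21-100": 0, ">100": 0}
    for c in read_counts:
        if c <= 10:
            bins["2-10"] += 1
        elif c <= 20:
            bins["11-20"] += 1
        elif c <= 100:
            bins["21-100"] += 1
        else:
            bins[">100"] += 1
    return bins


if __name__ == "__main__":
    # Toy inputs only; these are not data from the study.
    reads = ["ACGT" * 90, "ACGT" * 30]                  # 360 bp and 120 bp reads
    print([len(trim_read(r)) for r in reads])           # [290, 50]
    print(length_filter(["A" * 99, "A" * 114]) == ["A" * 114])
    print(bin_reads_per_contig([2, 5, 12, 46, 150]))
```

The same counts also give a quick consistency check: 15,359 contigs plus 68,715 singletons equals the 84,074 unigenes reported before length filtering.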
