Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding.

Ralph SG, Chun HJ, Cooper D, Kirkpatrick R, Kolosova N, Gunter L, Tuskan GA, Douglas CJ, Holt RA, Jones SJ, Marra MA, Bohlmann J - BMC Genomics (2008)

Bottom Line: We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa x P. deltoides hybrid. The physical FLcDNA clones will serve as useful reagents for functional genomics research in areas such as analysis of gene functions in defense against insects and perennial growth. Sequences from this study have been deposited in NCBI GenBank under the accession numbers EF144175 to EF148838.


Affiliation: Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada. steven.ralph@und.nodak.edu

ABSTRACT

Background: The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions.

Results: As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)-cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa x P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins. By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones for genes that were differentially expressed in poplar leaves attacked by forest tent caterpillars.

Conclusion: This study has generated a high-quality FLcDNA resource for poplar and the third largest FLcDNA collection published to date for any plant species. We successfully used the FLcDNA sequences to reassess gene prediction in the poplar genome sequence, perform comparative sequence annotation, and identify differentially expressed transcripts associated with defense against insects. The FLcDNA sequences will be essential to the ongoing curation and annotation of the poplar genome, in particular for targeting gaps in the current genome assembly and further improvement of gene predictions. The physical FLcDNA clones will serve as useful reagents for functional genomics research in areas such as analysis of gene functions in defense against insects and perennial growth. Sequences from this study have been deposited in NCBI GenBank under the accession numbers EF144175 to EF148838.



Figure 6: Sequence annotation of 4,664 high-quality poplar FLcDNAs against published databases. Panel A shows the percentage of FLcDNAs with similarity to entries in three databases using expect (E) value thresholds of < 1e-05 and < 1e-50: matches to previously published poplar ESTs (i.e., ESTs available in GenBank, excluding ESTs from this study) identified by BLASTN; amino acid sequences in the non-redundant (NR) division of GenBank identified by BLASTX; and The Arabidopsis Information Resource (TAIR) non-redundant Arabidopsis peptide matches identified by BLASTX. Panel B shows a Venn diagram of distinct and overlapping patterns of sequence similarity against the three databases (public poplar ESTs, TAIR, NR) at a BLAST E value threshold of < 1e-05. At this threshold, 95 poplar FLcDNAs had no similarity to sequences in any of the databases examined.
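The three-way partition shown in Panel B amounts to set operations on the clone identifiers that have a hit in each database. The following is a minimal Python sketch of that bookkeeping, not the authors' pipeline; the function name `venn_regions`, the clone IDs, and the toy hit sets are all hypothetical and chosen only for illustration.

```python
def venn_regions(all_ids, hits_by_db):
    """Partition all_ids by the exact set of databases each ID has a hit in.

    Returns a dict mapping a frozenset of database names to the set of IDs
    that hit exactly those databases. The empty frozenset collects IDs with
    no hit in any database.
    """
    regions = {}
    for clone_id in all_ids:
        key = frozenset(db for db, hits in hits_by_db.items() if clone_id in hits)
        regions.setdefault(key, set()).add(clone_id)
    return regions


# Hypothetical toy data: five clones and their hits at a single E value threshold.
all_ids = {"c1", "c2", "c3", "c4", "c5"}
hits_by_db = {
    "poplar_EST": {"c1", "c2", "c3"},
    "TAIR": {"c1", "c2"},
    "NR": {"c1", "c4"},
}

regions = venn_regions(all_ids, hits_by_db)
no_hit = regions.get(frozenset(), set())  # clones with no similarity in any database
```

In this toy example `no_hit` plays the role of the 95 FLcDNAs reported in the caption as having no similarity to sequences in any of the three databases.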

Mentions: Despite the growing research interest in poplar as a model angiosperm tree species and the recent completion of the poplar genome sequence, poplar remains a difficult experimental system with relatively few functionally characterized proteins compared to other established model systems such as Arabidopsis. Therefore, our in silico annotation of poplar FLcDNAs relied largely on comparison with Arabidopsis, together with the NR database of GenBank, which contains sequences from all plants as well as other species. Using BLASTX, we found that the proportion of FLcDNAs with similarity to TAIR Arabidopsis proteins was 87.5% (4,081) at E value < 1e-05 and 55.5% (2,590) at E value < 1e-50 (Figure 6A). Similar values were obtained when using BLASTX to compare against peptides from other species in the NR division of GenBank (88.0% matches at E value < 1e-05 and 56.9% matches at E value < 1e-50) (Figure 6A). As expected, the proportion of poplar FLcDNAs with sequence similarity to previously published poplar ESTs (i.e., ESTs available in the dbEST division of GenBank, excluding ESTs from this study) by BLASTN was very high, with 96.3% (4,496) and 94.3% (4,401) of FLcDNAs having matches with E values < 1e-05 and < 1e-50, respectively (Figure 6A).
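The proportions quoted above are counts of query sequences with at least one hit below a given E value threshold, divided by the total number of queries. The sketch below shows one way to compute such a proportion in Python. It assumes BLAST results in the standard 12-column tabular layout, where the query ID is the first field and the E value is the eleventh; the function name `fraction_with_hit`, the query and subject IDs, and the hit lines are hypothetical and do not come from the study.

```python
def fraction_with_hit(blast_lines, query_ids, max_evalue):
    """Return the fraction of query_ids with at least one hit whose E value
    is strictly below max_evalue.

    Assumes tab-separated lines with the query ID in column 1 and the
    E value in column 11 (standard 12-column tabular BLAST output).
    """
    matched = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, evalue = fields[0], float(fields[10])
        if evalue < max_evalue and query in query_ids:
            matched.add(query)
    return len(matched) / len(query_ids)


# Hypothetical toy input: four queries, two of which have hits.
query_ids = {"clone1", "clone2", "clone3", "clone4"}
blast_lines = [
    "clone1\tsubjA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-60\t220",
    "clone2\tsubjB\t40.0\t150\t90\t2\t1\t450\t5\t154\t1e-10\t60",
    "clone2\tsubjC\t35.0\t100\t65\t1\t10\t310\t1\t100\t0.5\t30",
]

loose = fraction_with_hit(blast_lines, query_ids, 1e-05)
strict = fraction_with_hit(blast_lines, query_ids, 1e-50)
```

Applied to the figures in the text, the same ratio for the TAIR comparison at E value < 1e-05 is 4,081 / 4,664 = 0.875, that is, the reported 87.5%.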

