An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags.

Lee BY, Howe AE, Conte MA, D'Cotta H, Pepey E, Baroiller JF, di Palma F, Carleton KL, Kocher TD - BMC Genomics (2010)

Bottom Line: The ESTs were assembled into 20,190 contigs and 36,028 singletons for a total of 56,218 unique sequences and a total assembled length of 35,168,415 bp. Over the whole project, a unique sequence was discovered for every 2.079 sequence reads. 17,722 (31.5%) of these unique sequences had significant BLAST hits (e-value < 10^-10) to the UniProt database. These sequences are an important resource for studies of gene expression, comparative mapping and annotation of the forthcoming tilapia genome sequence.


Affiliation: Department of Biology, University of Maryland, College Park, Maryland 20742, USA.

ABSTRACT

Background: Large collections of expressed sequence tags (ESTs) are a fundamental resource for analysis of gene expression and annotation of genome sequences. We generated 116,899 ESTs from 17 normalized and two non-normalized cDNA libraries representing 16 tissues from tilapia, a cichlid fish widely used in aquaculture and biological research.

Results: The ESTs were assembled into 20,190 contigs and 36,028 singletons for a total of 56,218 unique sequences and a total assembled length of 35,168,415 bp. Over the whole project, a unique sequence was discovered for every 2.079 sequence reads. 17,722 (31.5%) of these unique sequences had significant BLAST hits (e-value < 10^-10) to the UniProt database.
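The summary figures in the Results follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the variable names are illustrative, and the mean unique-sequence length is a derived value that the abstract does not itself report.

```python
# Counts as reported in the abstract.
contigs = 20_190
singletons = 36_028
total_reads = 116_899          # ESTs generated
assembled_bp = 35_168_415      # total assembled length
uniprot_hits = 17_722          # unique sequences with a significant BLAST hit

# Unique sequences are contigs plus singletons.
unique_sequences = contigs + singletons

# Reads needed, on average, to discover one unique sequence.
reads_per_unique = total_reads / unique_sequences

# Fraction of unique sequences with a significant UniProt hit.
hit_fraction = uniprot_hits / unique_sequences

# Derived here for illustration only; not stated in the abstract.
mean_unique_length = assembled_bp / unique_sequences

print(f"unique sequences:  {unique_sequences:,}")
print(f"reads per unique:  {reads_per_unique:.3f}")
print(f"UniProt hit rate:  {hit_fraction:.1%}")
print(f"mean length (bp):  {mean_unique_length:.1f}")
```

The first three printed values reproduce the 56,218 unique sequences, the 2.079 reads per unique sequence, and the 31.5% hit rate quoted above.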

Conclusion: Normalization of the cDNA pools with double-stranded nuclease allowed us to efficiently sequence a large collection of ESTs. These sequences are an important resource for studies of gene expression, comparative mapping and annotation of the forthcoming tilapia genome sequence.



Figure 5: Distribution of sequence starts and stops on Uniprot entries. a, c - distributions for unigenes. b, d - distributions for unassembled ESTs.

Mentions: 41,129 (35%) of the unassembled ESTs and 17,722 (32%) of the assembled unigenes had a significant (e-value < 10^-10) BLAST match in the UniProt database. We calculated the fraction of ESTs and unigenes that were complete on the 5' and 3' ends by counting the sequences that matched to within 10 amino acids of the 5' and 3' ends of each UniProt entry (Figure 5). This proportion varied with the length of the UniProt entry (Table 3). Sixty-five percent of the ESTs matching UniProt entries <250 aa were complete on the 5' end. For UniProt entries between 251 and 500 amino acids this fraction dropped to 36%. Similarly, 68% of the unigenes matching UniProt entries <250 aa were complete on the 5' end, and the fraction dropped to 36% for UniProt entries between 251 and 500 amino acids. Overall, 13.9% of the ESTs and 13.1% of the unigenes were complete on both the 5' and 3' ends. 17,505 (31%) of the unigenes were mapped to candidate Gene Ontology (GO) terms. Following the application of the annotation rules, 12,792 (22.8%) of the unigenes were annotated with GO terms. The proportion of unigenes annotated with GO terms is shown for each of the three GO functional categories (biological process, molecular function, and cellular component) in Additional file 1: Figure S1. 10,527 (18.7%) of the unigenes had a single-directional best hit to the KEGG pathway database.
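The end-completeness scoring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record, its field names, the 1-based alignment coordinates on the UniProt entry, and the toy data are all hypothetical. Only the 10-amino-acid tolerance comes from the text.

```python
from dataclasses import dataclass

TOLERANCE_AA = 10  # "within 10 amino acids" of either end, per the text


@dataclass
class Hit:
    """Hypothetical record for one best BLAST hit against a UniProt entry."""
    query_id: str
    subject_start: int   # first aligned residue on the UniProt entry (1-based, assumed)
    subject_end: int     # last aligned residue on the UniProt entry (1-based, assumed)
    subject_length: int  # full length of the UniProt entry in amino acids


def is_complete_5prime(hit: Hit, tol: int = TOLERANCE_AA) -> bool:
    # Alignment begins within `tol` residues of the protein's N-terminus.
    return hit.subject_start - 1 <= tol


def is_complete_3prime(hit: Hit, tol: int = TOLERANCE_AA) -> bool:
    # Alignment ends within `tol` residues of the protein's C-terminus.
    return hit.subject_length - hit.subject_end <= tol


def completeness_fractions(hits: list[Hit]) -> dict[str, float]:
    """Fraction of hits complete at the 5' end, the 3' end, and both ends."""
    n = len(hits)
    if n == 0:
        return {"five_prime": 0.0, "three_prime": 0.0, "both": 0.0}
    five = sum(is_complete_5prime(h) for h in hits)
    three = sum(is_complete_3prime(h) for h in hits)
    both = sum(is_complete_5prime(h) and is_complete_3prime(h) for h in hits)
    return {"five_prime": five / n, "three_prime": three / n, "both": both / n}


# Toy data, invented for illustration only.
toy_hits = [
    Hit("seq_a", 1, 200, 205),     # complete on both ends
    Hit("seq_b", 5, 120, 300),     # 5' complete only
    Hit("seq_c", 150, 395, 400),   # 3' complete only
    Hit("seq_d", 60, 200, 480),    # complete on neither end
]
fractions = completeness_fractions(toy_hits)
print(fractions)
```

Grouping the hits by UniProt entry length before calling `completeness_fractions` would give per-length-class proportions of the kind reported in Table 3.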
