Significant loss of sensitivity and specificity in the taxonomic classification occurs when short 16S rRNA gene sequences are used

Source: PubMed Central (PMC5037269)

ABSTRACT

The classification performance of Kraken was evaluated in terms of sensitivity and specificity using short and long 16S rRNA sequences. A total of 440,738 bacterial sequences with complete taxonomic classifications were downloaded from the high-quality ribosomal RNA database SILVA. Amplicons (86,371 sequences; 1450 bp) produced by virtual PCR with primers spanning the V1–V9 region of the 16S rRNA gene were used as the reference. Virtual PCRs of the internal fragments V3–V4, V4–V5 and V3–V5 were then performed, yielding 81,523, 82,334 and 82,998 amplicons, respectively. The depth of taxonomic classification differed among the internal fragments. For instance, the sensitivity and specificity of sequences classified down to subspecies level were higher for the longest internal fragment, V3–V5 (54.0% and 74.6%, respectively), than for the V3–V4 (45.1% and 66.7%) and V4–V5 (41.8% and 64.6%) fragments. A similar pattern was observed for sequences classified only to higher taxonomic ranks (i.e., family, order, class…). The results also show that the internal fragments lose specificity and that some sequences may be misclassified at the deepest taxonomic levels (i.e., species or subspecies). It is concluded that the longer V3–V5 fragment could be considered for large-scale high-throughput sequencing to reduce the loss of sensitivity and specificity.
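The abstract reports sensitivity and specificity at each taxonomic depth, but this excerpt does not define them. The sketch below assumes definitions modelled on the sensitivity/precision measures commonly used to benchmark Kraken:

- **Sensitivity:** the share of all query sequences that receive the correct taxon at a given rank.
- **Specificity:** the share of sequences that received a call at that rank and are correct.

These definitions, the lineage-as-dict representation, the `RANKS` tuple and the function name `sens_spec_at_rank` are illustrative assumptions. They are not the study's code.

```python
from typing import Optional, Sequence

# Ordered taxonomic ranks; this list is an illustrative assumption.
RANKS = ("phylum", "class", "order", "family", "genus", "species", "subspecies")


def sens_spec_at_rank(
    truth: Sequence[dict],
    predicted: Sequence[Optional[dict]],
    rank: str,
) -> tuple[float, float]:
    """Return (sensitivity %, specificity %) at one taxonomic rank.

    Assumed definitions (not confirmed for this study):
      sensitivity = correct calls / all sequences
      specificity = correct calls / sequences with a call at `rank`
    A prediction of None means the sequence was left unclassified.
    """
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    called = correct = 0
    for true_lineage, pred_lineage in zip(truth, predicted):
        call = pred_lineage.get(rank) if pred_lineage else None
        if call is None:
            continue  # no assignment at this depth: counts against sensitivity only
        called += 1
        if call == true_lineage.get(rank):
            correct += 1
    total = len(truth)
    sensitivity = 100.0 * correct / total if total else 0.0
    specificity = 100.0 * correct / called if called else 0.0
    return sensitivity, specificity


# Toy data: reference lineages versus classifier output (None = unclassified).
truth = [
    {"genus": "Escherichia", "species": "Escherichia coli"},
    {"genus": "Bacillus", "species": "Bacillus subtilis"},
    {"genus": "Listeria", "species": "Listeria monocytogenes"},
    {"genus": "Vibrio", "species": "Vibrio cholerae"},
]
predicted = [
    {"genus": "Escherichia", "species": "Escherichia coli"},
    {"genus": "Bacillus", "species": "Bacillus cereus"},  # wrong species
    {"genus": "Listeria"},  # classified to genus only
    None,  # unclassified
]

species_scores = sens_spec_at_rank(truth, predicted, "species")  # (25.0, 50.0)
genus_scores = sens_spec_at_rank(truth, predicted, "genus")  # (75.0, 100.0)
```

With the toy data, species-level sensitivity is 25% because one of four sequences is correct. Specificity is 50% because one of the two species-level calls is correct. Under these assumed definitions, a sequence left unclassified lowers sensitivity but not specificity, so the two figures can diverge.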



Fig. 1. Size distribution of amplicons obtained by virtual PCR of the complete 16S rRNA gene (V1–V9). Amplicons with extreme sizes outside the range of mean ± 2 standard deviations were excluded.
License: CC BY-NC-ND.
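The size filter described in the Fig. 1 caption can be sketched as follows. It keeps amplicons whose length lies within mean ± k standard deviations, with k = 2. The study does not state whether it used the sample or population standard deviation. This sketch assumes the sample SD (`statistics.stdev`), and the function name `filter_by_size` is hypothetical.

```python
import statistics


def filter_by_size(amplicons: dict[str, str], k: float = 2.0):
    """Keep amplicons whose length lies within mean ± k SD.

    Returns (kept, (lower, upper)). Uses the sample standard deviation,
    which is an assumption; the study does not say which SD it used.
    """
    if len(amplicons) < 2:
        raise ValueError("need at least two amplicons to estimate an SD")
    lengths = [len(seq) for seq in amplicons.values()]
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths)
    lower, upper = mean - k * sd, mean + k * sd
    kept = {sid: seq for sid, seq in amplicons.items() if lower <= len(seq) <= upper}
    return kept, (lower, upper)


# Toy amplicons: five near 1450 bp and one short outlier.
toy = {f"seq{i}": "A" * n for i, n in enumerate([1440, 1445, 1450, 1455, 1460, 1300])}
kept, (lower, upper) = filter_by_size(toy)
# mean = 1425, sample SD ≈ 61.6, so the window is ≈ 1301.7 to 1548.3 bp and seq5 is dropped
```

Mean ± 2 SD is sensitive to outliers, because extreme lengths inflate the SD and widen the window. On a large, tightly clustered set such as the 86,371 full-length amplicons, the filter retained 99.1% of the sequences.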

The SILVA database (release 123) contains 440,738 non-redundant sequences with taxonomic descriptions from phylum down to species or strain. However, not all of them were amplified by virtual PCR with the primer set for the complete gene. A total of 102,101 amplicons (23%) were obtained, of which 86,371 were unique full-length fragments with an average size of 1450.3 bp. Retaining only sequences within mean ± 2 standard deviations (S.D.) yielded a homogeneous group that accounted for 99.1% of the sequences. These ranged from 1378 to 1522 bp (1450 ± 1.9 bp; Fig. 1) and covered variable regions V1 to V9. Virtual PCR was then performed on this group of 85,594 sequences with primers amplifying the segments containing hypervariable regions V3–V4, V4–V5 and V3–V5 (Table 1).
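Virtual PCR amounts to three steps on each template:

1. Locate the forward primer.
2. Locate the reverse complement of the reverse primer downstream of it.
3. Extract the span between the two.

The sketch below is a minimal version under stated assumptions:

- Matching is exact apart from IUPAC degenerate bases (no mismatches).
- The returned amplicon includes the primers.
- Templates are searched on the given strand only.
- U is converted to T, since SILVA sequences are typically distributed in the RNA alphabet.

The primers in the example are hypothetical toy sequences. The study's actual primers are those listed in Table 1, and its software and mismatch tolerance are not described in this excerpt. The function names are illustrative.

```python
import re
from typing import Optional

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table covering degenerate codes (R<->Y, K<->M, B<->V, D<->H).
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def virtual_pcr(template: str, fwd: str, rev: str, max_len: int = 2000) -> Optional[str]:
    """Return the first amplicon (primers included) found on the template, or None.

    Exact matching only (no mismatches), on the given strand only.
    """
    template = template.upper().replace("U", "T")
    fwd_re = re.compile(primer_regex(fwd))
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is what appears on the template as written.
    rev_site_re = re.compile(primer_regex(revcomp(rev)))
    for f in fwd_re.finditer(template):
        r = rev_site_re.search(template, f.end())
        if r is not None and r.end() - f.start() <= max_len:
            return template[f.start():r.end()]
    return None


# Toy example with hypothetical primers (not those of Table 1).
template = "GGGG" + "ACGTACGA" + "TTTTCCCC" + "CTTGAAGC" + "AAAA"
amplicon = virtual_pcr(template, fwd="ACGTNCGA", rev="GCTTCRAG")
# amplicon == "ACGTACGATTTTCCCCCTTGAAGC"
```

Templates on which either primer site is absent return None. This mirrors why only 102,101 of the 440,738 SILVA sequences yielded a full-length amplicon, although the study's exact matching rules may differ from this sketch.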
