The R package otu2ot for implementing the entropy decomposition of nucleotide variation in sequence data.

Ramette A, Buttigieg PL - Front Microbiol (2014)

Bottom Line: The aim of this implementation is to facilitate the integration of computational routines, interactive statistical analyses, and visualization into a single framework. These enhancements are especially useful for large datasets, where a manual screening of entropy analysis results and the creation of a full set of OTs may not be feasible. The package and procedures are illustrated by several tutorials and examples.


Affiliation: HGF-MPG Group for Deep Sea Ecology and Technology, Max Planck Institute for Marine Microbiology Bremen, Germany.

ABSTRACT
Oligotyping is a novel, supervised computational method that classifies closely related sequences into "oligotypes" (OTs) based on subtle nucleotide variation (Eren et al., 2013). Its application to microbial datasets has helped reveal ecological patterns that are often hidden by the way sequence data are currently clustered to define operational taxonomic units (OTUs). Here, we implemented the OT entropy decomposition procedure and its unsupervised version, Minimal Entropy Decomposition (MED; Eren et al., 2014c), in the statistical programming language and environment R. The aim of this implementation is to facilitate the integration of computational routines, interactive statistical analyses, and visualization into a single framework. In addition, two complementary approaches are implemented: (1) an analytical method (the broken stick model, BSM) that helps identify OTs of low abundance that could be generated by chance alone, and (2) a one-pass profiling (OP) method that efficiently identifies the OTUs for which subsequent oligotyping is most promising. These enhancements are especially useful for large datasets, where a manual screening of entropy analysis results and the creation of a full set of OTs may not be feasible. The package and procedures are illustrated by several tutorials and examples.
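
The two ideas named in the abstract, per-position entropy and the broken stick model, can be illustrated with a minimal sketch. It is written in Python rather than R and does not use or reproduce the otu2ot API. The function names, the base-2 logarithm, and the filtering rule (keep an OT only if its observed relative abundance exceeds the broken-stick expectation for its rank) are assumptions made for illustration, not the package's documented behavior.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Shannon entropy (base 2, an assumption) of each alignment column.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    n = len(alignment)
    entropies = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        h = 0.0
        for c in counts.values():
            p = c / n
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def broken_stick(n):
    """Expected proportions of n pieces of a randomly broken stick,
    largest first: E_k = (1/n) * sum_{i=k..n} 1/i."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    expected = [0.0] * n
    tail = 0.0
    for k in range(n, 0, -1):  # accumulate the harmonic tail from i = n down to k
        tail += 1.0 / k
        expected[k - 1] = tail / n
    return expected


def retained_by_broken_stick(counts):
    """Hypothetical filter: keep the ranked abundances whose observed
    proportion exceeds the broken-stick expectation for the same rank."""
    total = sum(counts)
    ranked = sorted(counts, reverse=True)
    expected = broken_stick(len(ranked))
    return [c for c, e in zip(ranked, expected) if c / total > e]
```

In oligotyping, the positions with the highest entropy are the candidates used to split an OTU into OTs, so `column_entropy` returns one value per aligned position that can be ranked directly. The broken-stick expectation gives a null distribution of abundances against which low-abundance OTs can be compared.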



Figure 6: Comparison of variances obtained by MED (y axis) vs. OP (x axis) across 67 sequence alignments (OTUs), both after BSM filtering. (A) whole dataset, (B) after rescaling to variances below 400 on each axis, and (C) after removing the three points that made the y = x line deviate (as red dots in A and B). The blue and red lines represent linear confidence (95%) and prediction lines, respectively.

Mentions: After applying BSM filtering to MED- and OP-generated tables, only 79 and 123 datasets still contained OTs, respectively, with 67 datasets in common to both techniques. The comparison of the variance in each dataset across the 67 sequence alignments identified three datasets which were mainly responsible for the departure from an exact match between the variances obtained by the two methods for each dataset analyzed (Figure 6). Removing those three datasets, in which OP identified generally higher variance than MED (Tutorial 4), led to a near 1:1 correspondence between the variance obtained by MED and by OP (Figure 6C).
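
The effect described above, where a few datasets pull the fitted line away from y = x, can be sketched with an ordinary least-squares fit computed with and without the suspect points. This is a minimal Python illustration, not the authors' analysis: the variance values are synthetic placeholders (the source does not list them), and the names `ols_fit`, `op_var`, and `med_var` are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic per-dataset variances (placeholders, NOT values from the paper).
# The last dataset mimics a case where OP reports a much higher variance than MED.
op_var = [10.0, 50.0, 120.0, 200.0, 300.0, 2500.0]
med_var = [11.0, 48.0, 125.0, 195.0, 310.0, 900.0]

# Fit on all datasets, then again after dropping the suspected outlier.
slope_all, intercept_all = ols_fit(op_var, med_var)
slope_clean, intercept_clean = ols_fit(op_var[:-1], med_var[:-1])

print(f"all datasets:    slope = {slope_all:.3f}")
print(f"outlier removed: slope = {slope_clean:.3f}")
```

A slope close to 1 with an intercept close to 0 after removing the outlying datasets corresponds to the near 1:1 correspondence reported for Figure 6C. The confidence and prediction bands shown in the figure are not computed in this sketch.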

