ANDES: Statistical tools for the ANalyses of DEep Sequencing.

Li K, Venter E, Yooseph S, Stockwell TB, Eckerle LD, Denison MR, Spiro DJ, Methé BA - BMC Res Notes (2010)

Bottom Line: Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, and the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values. As new sequencing technologies evolve, deep sequencing will become increasingly cost-efficient and the inter- and intra-sample comparisons of largely homogeneous sequences will become more common. We have provided a software package and demonstrated its application on various empirically derived datasets.


Affiliation: The J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA. kli@jcvi.org.

ABSTRACT

Background: The advancements in DNA sequencing technologies have allowed researchers to progress from the analyses of a single organism towards the deep sequencing of a sample of organisms. With sufficient sequencing depth, it is now possible to detect subtle variations between members of the same species, or between mixed species with shared biomarkers, such as the 16S rRNA gene. However, traditional sequencing analyses of samples from largely homogeneous populations are often still based on multiple sequence alignments (MSA), where each sequence is placed along a separate row and similarities between aligned bases can be followed down each column. While this visual format is intuitive for a small set of aligned sequences, the representation quickly becomes cumbersome as sequencing depths cover loci hundreds or thousands of reads deep.

Findings: We have developed ANDES, a software library and a suite of applications, written in Perl and R, for the statistical ANalyses of DEep Sequencing. The fundamental data structure underlying ANDES is the position profile, which contains the nucleotide distributions for each genomic position resultant from a multiple sequence alignment (MSA). Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, and the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values.
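The position profile described above can be illustrated with a minimal sketch. This is not the ANDES implementation (which is written in Perl and R); the function names `position_profile` and `shannon_entropy`, the toy alignment, and the use of base-2 logarithms are all assumptions made for illustration.

```python
import math
from collections import Counter

def position_profile(alignment):
    """Return one {residue: frequency} dict per column of an MSA.

    `alignment` is a list of equal-length aligned sequences (hypothetical input format).
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):          # walk the alignment column by column
        counts = Counter(column)
        depth = len(column)                 # number of reads covering this position
        profile.append({residue: n / depth for residue, n in counts.items()})
    return profile

def shannon_entropy(distribution):
    """Shannon entropy of one position's distribution (base 2 is an assumption)."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# Toy alignment, four reads deep (made-up data for illustration only).
msa = ["ACGT", "ACGA", "ACTA", "ACTA"]
profile = position_profile(msa)
entropies = [shannon_entropy(d) for d in profile]
```

In this sketch, a fully conserved column has an entropy of 0, and a column split evenly between two bases has an entropy of 1 bit, so the per-position entropies summarize variation without displaying every read.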

Conclusions: As new sequencing technologies evolve, deep sequencing will become increasingly cost-efficient and the inter- and intra-sample comparisons of largely homogeneous sequences will become more common. We have provided a software package and demonstrated its application on various empirically derived datasets. Investigators may download the software from SourceForge at https://sourceforge.net/projects/andestools.




Figure 5: Example calculation of the RMSD value between a biallelic sample, X, and a monoallelic sample, Y. In this case, the intermediate RMSD value of 0.316 was generated because sample X was biallelic, and sample Y was monoallelic but in partial agreement.

Mentions: The minimum RMSD value of 0 occurs when the nucleotide distributions at a given position are identical between the two alignments. The maximum value of 0.632 is generated when the alleles of the two samples are completely different, for example, if sample X consisted of 100% T's and sample Y consisted of 100% G's. Another important RMSD value is 0.316. This value occurs when one sample is biallelic, for example A/T, and the other sample is partially similar, with a single allelic representation of A. See Figures 4 and 5 for sample calculations of the two cases.
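The three reference values quoted above can be reproduced with a short sketch. The normalization is an assumption inferred from the numbers rather than stated in this excerpt: dividing the summed squared differences by five categories (A, C, G, T, and a gap symbol) yields 0.632 and 0.316. The even 50/50 split for the biallelic sample and the function name `rmsd` are likewise illustrative.

```python
import math

# Assumed category set: four nucleotides plus a gap symbol.
# Five categories is inferred from the quoted maximum of 0.632 = sqrt(2 / 5).
RESIDUES = ("A", "C", "G", "T", "-")

def rmsd(profile_x, profile_y):
    """RMSD between two per-position distributions given as {residue: frequency}."""
    squared = sum(
        (profile_x.get(r, 0.0) - profile_y.get(r, 0.0)) ** 2 for r in RESIDUES
    )
    return math.sqrt(squared / len(RESIDUES))

identical = rmsd({"A": 1.0}, {"A": 1.0})            # same distribution in both samples
disjoint = rmsd({"T": 1.0}, {"G": 1.0})             # 100% T versus 100% G
partial = rmsd({"A": 0.5, "T": 0.5}, {"A": 1.0})    # biallelic A/T (assumed 50/50) versus A

print(round(identical, 3), round(disjoint, 3), round(partial, 3))  # 0.0 0.632 0.316
```

For the partial-agreement case, the squared differences are 0.25 for A and 0.25 for T, giving sqrt(0.5 / 5) ≈ 0.316, which matches the intermediate value in Figure 5.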

