Limits...
ANGSD: Analysis of Next Generation Sequencing Data.

Korneliussen TS, Albrechtsen A, Nielsen R - BMC Bioinformatics (2014)

Bottom Line: Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.We present a multithreaded program suite called ANGSD.The program allow the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.

View Article: PubMed Central - PubMed

Affiliation: Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark. thorfinn@binf.ku.dk.

ABSTRACT

Background: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.

Results: We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.

Conclusions: The open source c/c++ program ANGSD is available at http://www.popgen.dk/angsd . The program is tested and validated on GNU/Linux systems. The program facilitates multiple input formats including BAM and imputed beagle genotype probability files. The program allow the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.

Show MeSH
Data formats and call graph.A) Dependency of different data formats and analyses that can be performed in ANGSD. B) Simplified call graph. Red nodes indicate areas that are not threaded. With the exception of file readers, all analyses, printing and cleaning is done by objects derived from the abstract base class called general.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4248462&req=5

Fig1: Data formats and call graph.A) Dependency of different data formats and analyses that can be performed in ANGSD. B) Simplified call graph. Red nodes indicate areas that are not threaded. With the exception of file readers, all analyses, printing and cleaning is done by objects derived from the abstract base class called general.

Mentions: ANGSD can currently parse a variety of different input formats including binary BAM files and mpileup text files. Genotype likelihoods input are supported for simple genotype likelihood formats and it also supports genotype (posterior) probabilities in the BEAGLE [16] format. ANGSD can perform various analyses, but some of these can be limited by the chosen input format e.g., sequencing depth calculation can only be performed on the basis of raw sequencing files and not GLs. The dependency between the different analysis and input formats is depicted in Figure 1. Indexed BAM files facilitate random access and this feature is implemented in ANGSD. Random access is not supported for other file formats.Figure 1


ANGSD: Analysis of Next Generation Sequencing Data.

Korneliussen TS, Albrechtsen A, Nielsen R - BMC Bioinformatics (2014)

Data formats and call graph.A) Dependency of different data formats and analyses that can be performed in ANGSD. B) Simplified call graph. Red nodes indicate areas that are not threaded. With the exception of file readers, all analyses, printing and cleaning is done by objects derived from the abstract base class called general.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4248462&req=5

Fig1: Data formats and call graph.A) Dependency of different data formats and analyses that can be performed in ANGSD. B) Simplified call graph. Red nodes indicate areas that are not threaded. With the exception of file readers, all analyses, printing and cleaning is done by objects derived from the abstract base class called general.
Mentions: ANGSD can currently parse a variety of different input formats including binary BAM files and mpileup text files. Genotype likelihoods input are supported for simple genotype likelihood formats and it also supports genotype (posterior) probabilities in the BEAGLE [16] format. ANGSD can perform various analyses, but some of these can be limited by the chosen input format e.g., sequencing depth calculation can only be performed on the basis of raw sequencing files and not GLs. The dependency between the different analysis and input formats is depicted in Figure 1. Indexed BAM files facilitate random access and this feature is implemented in ANGSD. Random access is not supported for other file formats.Figure 1

Bottom Line: Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.We present a multithreaded program suite called ANGSD.The program allow the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.

View Article: PubMed Central - PubMed

Affiliation: Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark. thorfinn@binf.ku.dk.

ABSTRACT

Background: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.

Results: We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.

Conclusions: The open source c/c++ program ANGSD is available at http://www.popgen.dk/angsd . The program is tested and validated on GNU/Linux systems. The program facilitates multiple input formats including BAM and imputed beagle genotype probability files. The program allow the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.

Show MeSH