Limits...
Genome-wide DNA polymorphism analyses using VariScan.

Hutter S, Vilella AJ, Rozas J - BMC Bioinformatics (2006)

Bottom Line: Additionally, we have also incorporated a graphical-user interface.The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers.VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms.

View Article: PubMed Central - HTML - PubMed

Affiliation: Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain. hutter@zi.biologie.uni-muenchen.de

ABSTRACT

Background: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis.

Results: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers.

Conclusion: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.

Show MeSH

Related in: MedlinePlus

Application of the MRA analysis to the coalescent-simulated data set. The data contains 10 sequences of 2,000,000 bp each, and it was generated applying a per-site value of θ = 0.01. Upon this raw data set, we made two different levels of changes: i) two wide reductions in nucleotide diversity levels (g1: α = 1/3, β = 500,000; g2: α = 1/2, β = 500,000); and ii) 11 local valleys of reduced variability (v1: α = 1/4, β = 20,000; v2: α = 1/4, β = 15,000; v3: α = 1/4, β = 10,000; v4: α = 1/4, β = 5,000; v5: α = 1/4, β = 2,000; v6: α = 1/3, β = 20,000; v7: α = 1/3, β = 10,000; v8: α = 1/3, β = 5,000; v9: α = 1/2, β = 10,000; v10: α = 1/2, β = 5,000; v11: α = 1/2, β = 2,000). (a) nucleotide diversity profile obtained by SW using non-overlapping windows of 50 bp; (b) Signal reconstruction of low-frequency bands with information from 7 to 8 MRA levels, showing the location (in arrows) of 5 depleted-variation regions (v4–5, v8, v10–11; β ≤ 5,000). c) Signal reconstruction from 9 to 12 MRA levels, showing the location (in arrows) of 9 depleted-variation regions (v1–4, v6–10; 5,000 ≤ β ≤ 20,000). d) Signal reconstruction from 13 to 15 MRA levels, showing the location (in arrows) of the two broad areas with reduced levels of variation (g1–2; β = 500,000). The nucleotide sequence positions (X axis) are given in kb..
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC1574356&req=5

Figure 4: Application of the MRA analysis to the coalescent-simulated data set. The data contains 10 sequences of 2,000,000 bp each, and it was generated applying a per-site value of θ = 0.01. Upon this raw data set, we made two different levels of changes: i) two wide reductions in nucleotide diversity levels (g1: α = 1/3, β = 500,000; g2: α = 1/2, β = 500,000); and ii) 11 local valleys of reduced variability (v1: α = 1/4, β = 20,000; v2: α = 1/4, β = 15,000; v3: α = 1/4, β = 10,000; v4: α = 1/4, β = 5,000; v5: α = 1/4, β = 2,000; v6: α = 1/3, β = 20,000; v7: α = 1/3, β = 10,000; v8: α = 1/3, β = 5,000; v9: α = 1/2, β = 10,000; v10: α = 1/2, β = 5,000; v11: α = 1/2, β = 2,000). (a) nucleotide diversity profile obtained by SW using non-overlapping windows of 50 bp; (b) Signal reconstruction of low-frequency bands with information from 7 to 8 MRA levels, showing the location (in arrows) of 5 depleted-variation regions (v4–5, v8, v10–11; β ≤ 5,000). c) Signal reconstruction from 9 to 12 MRA levels, showing the location (in arrows) of 9 depleted-variation regions (v1–4, v6–10; 5,000 ≤ β ≤ 20,000). d) Signal reconstruction from 13 to 15 MRA levels, showing the location (in arrows) of the two broad areas with reduced levels of variation (g1–2; β = 500,000). The nucleotide sequence positions (X axis) are given in kb..

Mentions: We generated random data sets based on the simplest non-recombining coalescent model [6] as follows: i) generation of evolutionary times and the gene genealogy (fixing the number of sequences); ii) incorporation of Poisson-randomly distributed mutations (fixing the population mutation parameter θ). Subsequently, we modify this data set by changing (at specific locations) the applied θ value. In particular, we reduced nucleotide diversity values continuously and symmetrically. We made changes at two different levels: iii) one or more chromosome-wide nucleotide diversity reductions; iv) additional reductions at narrow regions. These changes were conducted by using different intensity (parameter α; the degree of nucleotide diversity reduction) and stretch lengths (parameter β; β specifies the half-length of the affected region) values. Therefore, the simulated data set mimics the effects caused by partial selective sweeps upon different nucleotide diversity levels. The analysis of one of these simulated data files is given in Figure 4. It can be seen that the MRA technique recovers the two different intensity types of distorted regions included in the data: nucleotide diversity reductions affecting small DNA stretches are detected at lower MRA levels while more genome-wide reductions are identified at higher levels.


Genome-wide DNA polymorphism analyses using VariScan.

Hutter S, Vilella AJ, Rozas J - BMC Bioinformatics (2006)

Application of the MRA analysis to the coalescent-simulated data set. The data contains 10 sequences of 2,000,000 bp each, and it was generated applying a per-site value of θ = 0.01. Upon this raw data set, we made two different levels of changes: i) two wide reductions in nucleotide diversity levels (g1: α = 1/3, β = 500,000; g2: α = 1/2, β = 500,000); and ii) 11 local valleys of reduced variability (v1: α = 1/4, β = 20,000; v2: α = 1/4, β = 15,000; v3: α = 1/4, β = 10,000; v4: α = 1/4, β = 5,000; v5: α = 1/4, β = 2,000; v6: α = 1/3, β = 20,000; v7: α = 1/3, β = 10,000; v8: α = 1/3, β = 5,000; v9: α = 1/2, β = 10,000; v10: α = 1/2, β = 5,000; v11: α = 1/2, β = 2,000). (a) nucleotide diversity profile obtained by SW using non-overlapping windows of 50 bp; (b) Signal reconstruction of low-frequency bands with information from 7 to 8 MRA levels, showing the location (in arrows) of 5 depleted-variation regions (v4–5, v8, v10–11; β ≤ 5,000). c) Signal reconstruction from 9 to 12 MRA levels, showing the location (in arrows) of 9 depleted-variation regions (v1–4, v6–10; 5,000 ≤ β ≤ 20,000). d) Signal reconstruction from 13 to 15 MRA levels, showing the location (in arrows) of the two broad areas with reduced levels of variation (g1–2; β = 500,000). The nucleotide sequence positions (X axis) are given in kb..
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC1574356&req=5

Figure 4: Application of the MRA analysis to the coalescent-simulated data set. The data contains 10 sequences of 2,000,000 bp each, and it was generated applying a per-site value of θ = 0.01. Upon this raw data set, we made two different levels of changes: i) two wide reductions in nucleotide diversity levels (g1: α = 1/3, β = 500,000; g2: α = 1/2, β = 500,000); and ii) 11 local valleys of reduced variability (v1: α = 1/4, β = 20,000; v2: α = 1/4, β = 15,000; v3: α = 1/4, β = 10,000; v4: α = 1/4, β = 5,000; v5: α = 1/4, β = 2,000; v6: α = 1/3, β = 20,000; v7: α = 1/3, β = 10,000; v8: α = 1/3, β = 5,000; v9: α = 1/2, β = 10,000; v10: α = 1/2, β = 5,000; v11: α = 1/2, β = 2,000). (a) nucleotide diversity profile obtained by SW using non-overlapping windows of 50 bp; (b) Signal reconstruction of low-frequency bands with information from 7 to 8 MRA levels, showing the location (in arrows) of 5 depleted-variation regions (v4–5, v8, v10–11; β ≤ 5,000). c) Signal reconstruction from 9 to 12 MRA levels, showing the location (in arrows) of 9 depleted-variation regions (v1–4, v6–10; 5,000 ≤ β ≤ 20,000). d) Signal reconstruction from 13 to 15 MRA levels, showing the location (in arrows) of the two broad areas with reduced levels of variation (g1–2; β = 500,000). The nucleotide sequence positions (X axis) are given in kb..
Mentions: We generated random data sets based on the simplest non-recombining coalescent model [6] as follows: i) generation of evolutionary times and the gene genealogy (fixing the number of sequences); ii) incorporation of Poisson-randomly distributed mutations (fixing the population mutation parameter θ). Subsequently, we modify this data set by changing (at specific locations) the applied θ value. In particular, we reduced nucleotide diversity values continuously and symmetrically. We made changes at two different levels: iii) one or more chromosome-wide nucleotide diversity reductions; iv) additional reductions at narrow regions. These changes were conducted by using different intensity (parameter α; the degree of nucleotide diversity reduction) and stretch lengths (parameter β; β specifies the half-length of the affected region) values. Therefore, the simulated data set mimics the effects caused by partial selective sweeps upon different nucleotide diversity levels. The analysis of one of these simulated data files is given in Figure 4. It can be seen that the MRA technique recovers the two different intensity types of distorted regions included in the data: nucleotide diversity reductions affecting small DNA stretches are detected at lower MRA levels while more genome-wide reductions are identified at higher levels.

Bottom Line: Additionally, we have also incorporated a graphical-user interface.The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers.VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms.

View Article: PubMed Central - HTML - PubMed

Affiliation: Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain. hutter@zi.biologie.uni-muenchen.de

ABSTRACT

Background: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis.

Results: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers.

Conclusion: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.

Show MeSH
Related in: MedlinePlus