Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm.

Magi A, Benelli M, Yoon S, Roviello F, Torricelli F - Nucleic Acids Res. (2011)

Bottom Line: Over recent years, the advent of new high-throughput sequencing (HTS) platforms has opened many opportunities for SV discovery, and a very promising approach consists in measuring the depth of coverage (DOC) of reads aligned to the human reference genome. For these reasons, we developed a novel algorithm (JointSLM) that detects common CNVs among individuals by analysing DOC data from multiple samples simultaneously. When we apply JointSLM to chromosome 1 of eight genomes with different ancestry, we identify 3000 regions with recurrent CNVs of different frequency and size: hierarchical clustering on these regions segregates the eight individuals into two groups that reflect their ancestry, demonstrating the potential utility of JointSLM for population genetics studies.


Affiliation: Laboratory Department, Diagnostic Genetic Unit, Careggi Hospital, Florence 5014, Italy. albertomagi@gmail.com

ABSTRACT
The discovery of genomic structural variants (SVs), such as copy number variants (CNVs), is essential for understanding the genetic variation of human populations and complex diseases. Over recent years, the advent of new high-throughput sequencing (HTS) platforms has opened many opportunities for SV discovery, and a very promising approach consists in measuring the depth of coverage (DOC) of reads aligned to the human reference genome. At present, few computational methods have been developed for the analysis of DOC data, and all of them can analyse only one sample at a time. For these reasons, we developed a novel algorithm (JointSLM) that detects common CNVs among individuals by analysing DOC data from multiple samples simultaneously. We test JointSLM performance on synthetic and real data and show its unprecedented resolution, which enables the detection of recurrent CNV regions as small as 500 bp in size. When we apply JointSLM to chromosome 1 of eight genomes with different ancestry, we identify 3000 regions with recurrent CNVs of different frequency and size: hierarchical clustering on these regions segregates the eight individuals into two groups that reflect their ancestry, demonstrating the potential utility of JointSLM for population genetics studies.
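As a rough illustration of what a DOC signal is, the sketch below counts aligned reads in fixed-size windows along a chromosome. It is not the authors' pipeline: the function name `depth_of_coverage`, the 100 bp window, and the choice to assign each read to a window by its start position are all assumptions made for this example.

```python
from collections import Counter

def depth_of_coverage(read_starts, chrom_length, window=100):
    """Count reads per fixed-size window (hypothetical helper, not from the paper).

    Each read is assigned to the window containing its 0-based start position.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = Counter(
        pos // window for pos in read_starts if 0 <= pos < chrom_length
    )
    return [counts.get(i, 0) for i in range(n_windows)]

# Toy data: six read start positions on a 500 bp "chromosome".
reads = [5, 20, 99, 100, 150, 420]
doc = depth_of_coverage(reads, chrom_length=500, window=100)
print(doc)  # [3, 2, 0, 0, 1]
```

A joint analysis of several samples would compute one such vector per sample over the same windows, so that the signals can be compared position by position.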

Figure 2: TPR and FPR for JointSLM, EWT, CBS and GLAD on the synthetic chromosomes data sets. TPR is calculated as the average fraction of correctly detected alterations in each chromosome and the FPR as the average number of FP detected in each chromosome. For JointSLM, we report the results obtained in simulated datasets made of 10, 30 and 50 synthetic chromosomes.

Mentions: To demonstrate the advantages of analysing multiple samples at once with our joint model instead of using single-sample models, we compared the performance of JointSLM with three other algorithms: the CBS (17) and EWT (19) methods, which have already been used in the analysis of DOC data, and the GLAD method (22), previously used for the analysis of array CGH data. To this end, we applied the three methods with default parameter settings to the synthetic data sets of the previous paragraph and calculated the TPR as the fraction of correctly detected alterations and the FPR as the average number of FP detected in each chromosome. To call gains and losses with the CBS algorithm, we used the same thresholds used for the JointSLM algorithm (see Supplementary Data). The results of these analyses and the comparison with JointSLM performance are detailed in Figure 2.
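The two metrics described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the source does not say when an alteration counts as "correctly detected", so the sketch assumes any overlap between a called segment and a true alteration is a hit, and any call overlapping no true alteration is a false positive. The names `overlaps` and `tpr_fpr` and the half-open `(start, end)` interval convention are likewise assumptions.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def tpr_fpr(truth_by_chrom, calls_by_chrom):
    """Average per-chromosome TPR and average FP count (assumed definitions).

    TPR per chromosome: fraction of true alterations hit by at least one call.
    FPR as used here: mean number of calls per chromosome that hit no truth.
    """
    tprs, fps = [], []
    for truth, calls in zip(truth_by_chrom, calls_by_chrom):
        detected = sum(any(overlaps(t, c) for c in calls) for t in truth)
        false_pos = sum(not any(overlaps(c, t) for t in truth) for c in calls)
        tprs.append(detected / len(truth) if truth else 0.0)
        fps.append(false_pos)
    n = len(tprs)
    return sum(tprs) / n, sum(fps) / n

# Toy example with two synthetic chromosomes.
truth = [[(100, 200), (500, 600)], [(50, 80)]]
calls = [[(120, 180), (700, 720)], [(40, 90)]]
tpr, fpr = tpr_fpr(truth, calls)
print(tpr, fpr)  # 0.75 0.5
```

Note that the FPR here is a count per chromosome rather than a rate between 0 and 1, which matches the definition given in the text.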

