Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm.

Magi A, Benelli M, Yoon S, Roviello F, Torricelli F - Nucleic Acids Res. (2011)

Bottom Line: Over recent years, the advent of new high-throughput sequencing (HTS) platforms has opened many opportunities for SV discovery, and a very promising approach consists of measuring the depth of coverage (DOC) of reads aligned to the human reference genome. For these reasons, we developed a novel algorithm (JointSLM) that detects common CNVs among individuals by analysing DOC data from multiple samples simultaneously. When we apply JointSLM to chromosome 1 of eight genomes of different ancestry, we identify 3000 regions with recurrent CNVs of different frequency and size: hierarchical clustering on these regions segregates the eight individuals into two groups that reflect their ancestry, demonstrating the potential utility of JointSLM for population genetics studies.


Affiliation: Laboratory Department, Diagnostic Genetic Unit, Careggi Hospital, Florence 5014, Italy. albertomagi@gmail.com

ABSTRACT
The discovery of genomic structural variants (SVs), such as copy number variants (CNVs), is essential to understanding the genetic variation of human populations and complex diseases. Over recent years, the advent of new high-throughput sequencing (HTS) platforms has opened many opportunities for SV discovery, and a very promising approach consists of measuring the depth of coverage (DOC) of reads aligned to the human reference genome. At present, few computational methods have been developed for the analysis of DOC data, and all of them can analyse only one sample at a time. For these reasons, we developed a novel algorithm (JointSLM) that detects common CNVs among individuals by analysing DOC data from multiple samples simultaneously. We test JointSLM on synthetic and real data and show its unprecedented resolution, which enables the detection of recurrent CNV regions as small as 500 bp. When we apply JointSLM to chromosome 1 of eight genomes of different ancestry, we identify 3000 regions with recurrent CNVs of different frequency and size: hierarchical clustering on these regions segregates the eight individuals into two groups that reflect their ancestry, demonstrating the potential utility of JointSLM for population genetics studies.
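The abstract does not spell out how DOC data are prepared or how the samples are clustered, so the following is only a minimal, hypothetical sketch of the general workflow, not the JointSLM algorithm. It assumes that DOC is measured as the number of read start positions in fixed 500 bp windows (a size chosen to echo the resolution quoted above), converts each window to a log2 ratio against the sample's median window depth, calls gains and losses with a naive fixed threshold in place of JointSLM's multi-sample segmentation, and groups samples by average-linkage hierarchical clustering on the Hamming distance between their call profiles. All function names, the threshold, the pseudocount, and the distance and linkage choices are assumptions.

```python
import math
from statistics import median


def window_counts(read_starts, chrom_len, window=500):
    """Depth of coverage per window: number of read starts falling in each window."""
    counts = [0] * math.ceil(chrom_len / window)
    for pos in read_starts:
        if 0 <= pos < chrom_len:
            counts[pos // window] += 1
    return counts


def log2_ratios(counts):
    """log2 ratio of each window to the sample's median window depth.

    A pseudocount of 1 avoids log2(0) in empty windows (assumption).
    """
    m = median(counts)
    return [math.log2((c + 1) / (m + 1)) for c in counts]


def naive_calls(ratios, threshold=0.5):
    """Per-window call: +1 gain, -1 loss, 0 neutral.

    Hypothetical fixed threshold; NOT JointSLM's joint segmentation.
    """
    return [1 if r > threshold else -1 if r < -threshold else 0 for r in ratios]


def hamming(a, b):
    """Fraction of windows where two call profiles disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(profiles, k=2):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1, c2):
        total = sum(hamming(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair
        del clusters[b]             # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    window = 500
    # Toy depths for 4 hypothetical samples over a 2000 bp region (4 windows):
    # samples 0-1 share a loss in window 1, samples 2-3 share a gain in window 3.
    depths = [[20, 10, 20, 20], [20, 10, 20, 20], [20, 20, 20, 40], [20, 20, 20, 40]]
    reads = [[w * window + i * (window // d) for w, d in enumerate(dp) for i in range(d)]
             for dp in depths]
    calls = [naive_calls(log2_ratios(window_counts(r, 2000, window))) for r in reads]
    print(calls)                      # [[0, -1, 0, 0], [0, -1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
    print(average_linkage(calls, 2))  # [[0, 1], [2, 3]]
```

On real data, the naive per-window threshold would be replaced by a proper segmentation step; JointSLM's stated contribution is performing that analysis on all samples simultaneously rather than one sample at a time.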



Figure 3: Venn diagram comparing the regions called by JointSLM, by PEM-based methods and by the GSV Consortium.

Lastly, we compared our set of calls with the SVs detected by paired-end mapping (PEM)-based approaches. The SVs of two of the individuals considered in this study (YH and NA18507) were previously analysed with PEM-based approaches (23,24). To understand the differences between JointSLM and PEM-based methods in detecting known CNVs, we took the set of CNVs reported by the GSV Consortium as true positives (TP) and determined the proportion of TP identified by each approach. In samples NA18507 and YH, JointSLM identified 290 (39%) and 256 (34%), respectively, of the 748 CNV regions in the validation set, while PEM-based methods detected 125 (17%) and 79 (11%) (see Figure 3).
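The comparison above reduces to counting how many of the 748 validation regions each method recovers and dividing by 748. The criterion for "identified" is not stated in this excerpt, so the sketch below assumes that a validation region counts as detected if any call overlaps it by at least 1 bp, using half-open (start, end) coordinates. The function names and toy intervals are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b overlap by at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]


def detected(calls, truth):
    """Number of truth (validation) regions overlapped by at least one call."""
    return sum(1 for t in truth if any(overlaps(t, c) for c in calls))


def percent(hits, total):
    """Detection rate as a whole-number percentage."""
    return round(100 * hits / total)


if __name__ == "__main__":
    truth = [(1000, 2000), (5000, 5600), (9000, 12000)]     # toy validation regions
    calls = [(1500, 1800), (11900, 12500), (20000, 21000)]  # toy CNV calls
    hits = detected(calls, truth)
    print(hits, percent(hits, len(truth)))  # 2 67
    print(percent(290, 748))                # 39
```

Applied to the counts reported above, 290, 256 and 125 out of 748 round to 39%, 34% and 17%, respectively.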

