A new method to reconstruct recombination events at a genomic scale.

Melé M, Javed A, Pybus M, Calafell F, Parida L, Bertranpetit J, Genographic Consortium Membe - PLoS Comput. Biol. (2010)

Bottom Line: Newer recombinations overwrite traces of past ones, and our results indicate that more recent recombinations are detected by IRiS with greater sensitivity. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across genomes and, for the first time, will allow the use of recombination as a genetic marker to study human genetic variation.


Affiliation: IBE, Institute of Evolutionary Biology (UPF-CSIC), CEXS-UPF-PRBB, Barcelona, Catalonia, Spain.

ABSTRACT
Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the IRiS algorithm, which detects past recombination events from extant sequences and specifies the location of each recombination and which sequences are recombinant. We have validated and calibrated IRiS for the human genome using coalescent simulations replicating standard human demographic history and a variable recombination rate model, and we have fine-tuned IRiS parameters to simultaneously optimize for false discovery rate, sensitivity, and accuracy in placing the recombination events in the sequence. Newer recombinations overwrite traces of past ones, and our results indicate that more recent recombinations are detected by IRiS with greater sensitivity. IRiS analysis of the MS32 region, previously studied using sperm typing, showed good concordance with estimated recombination rates. We also applied IRiS to haplotypes for 18 X-chromosome regions in HapMap Phase III populations. Recombination events detected for each individual were recoded as binary allelic states and combined into recotypes. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across genomes and, for the first time, will allow the use of recombination as a genetic marker to study human genetic variation.


Figure 2: Scheme of the recombination detection process integrating 10 runs of the algorithm. The analyzed dataset is the one shown in Figure 1. (A) Integration of the information from 10 runs regarding the recombination event of sequence 5. For each run of the algorithm, the start and end positions of the network in which the recombination is detected are saved. From run to run, the size of the first column varies (10, 1, 2, 3… up to 9), so the number of runs equals the grain size. At the end, each recombination event has a set of intervals in which it was detected, which can be represented graphically as a distribution. The maximum interval is the region in which the recombination has been seen the maximum number of times. The midpoint of the maximum interval is defined as the estimated breakpoint position. The threshold is the number of times a recombination has to be detected to be considered true. The intersection between the threshold and the detection distribution defines the threshold interval, in which the algorithm guarantees that the recombination event is located. (B) Integration of the information of all detections for the 10 runs of the algorithm. Each line represents a set of sequences in which the same recombination event has been detected; the distribution of the line shows the number of times the event has been detected along the sequence. (C) Final output of the algorithm: breakpoint positions in the first row, recotypes in the remaining rows, and detected recombination events in columns. The presence of a particular recombination event in a particular sequence is represented as a 1, and its absence as a 0. Note that the recotypes represent exactly the coloring of the sequences in Figure 1 and that only recombinations whose distribution rose above the threshold are represented in the recotypes.
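The integration step in panel (A) can be illustrated with a minimal Python sketch. It is not the IRiS implementation: the function names, the 0-based inclusive intervals, the toy data, and the reading of "threshold" as "detected at least this many times" are all assumptions made for illustration.

```python
def detection_distribution(intervals, length):
    """Count, for each SNP position, how many runs detected the event there."""
    counts = [0] * length
    for start, end in intervals:  # assumed 0-based, inclusive
        for pos in range(start, end + 1):
            counts[pos] += 1
    return counts


def longest_run(flags):
    """Return (start, end) of the first longest run of True values, or None."""
    best = None
    start = None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


def summarize_event(intervals, length, threshold):
    """Summarize one event, or return None if it never reaches the threshold."""
    counts = detection_distribution(intervals, length)
    peak = max(counts)
    if peak < threshold:
        return None
    # Maximum interval: where the event was seen the maximum number of times.
    max_interval = longest_run(c == peak for c in counts)
    # Threshold interval: grow outward while the count stays at the threshold.
    lo, hi = max_interval
    while lo > 0 and counts[lo - 1] >= threshold:
        lo -= 1
    while hi < length - 1 and counts[hi + 1] >= threshold:
        hi += 1
    return {
        "max_interval": max_interval,
        "breakpoint_estimate": (max_interval[0] + max_interval[1]) / 2,
        "threshold_interval": (lo, hi),
    }


# Hypothetical detections of one event in four runs over 40 SNP positions.
runs = [(10, 29), (11, 30), (12, 31), (13, 32)]
result = summarize_event(runs, length=40, threshold=3)
```

With these toy intervals the event is seen four times between positions 13 and 29, so the estimated breakpoint is the midpoint, 21.0, and the threshold interval widens to positions 12 through 30.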

Mentions: (A) Input dataset of 10 sequences and 83 SNPs. Colors on sequences represent similar patterns of SNPs, and a change of color along a sequence represents the signal of past recombination events. (B) Recoded matrix. The patterns of SNPs within a column of grain size n (10 SNPs in this example) have been recoded into numbers. Sequences having the same pattern within a column are assigned the same number. Between columns, the same number represents completely different patterns. Unique patterns are assigned the number zero and are not considered. (C) Trees one, two and three, constructed from the recoded matrix. Going from left to right, the recoded matrix is segmented into sets of compatible [30] columns of patterns. Compatibility of columns is checked using a variant of the four gamete test [31] for multi-allelic markers. Each segment is represented as a tree in which the leaf nodes contain the sequences analyzed and the edges contain the patterns inherited, similar to point mutations. Recurrence is not allowed. (D) Networks 1–2 and 2–3, constructed by merging consecutive trees one, two and three pairwise. All the information contained in the two original trees is present in the compatible network. Recombinant sequences are leaf nodes descending from nodes having two parents, which means that they have inherited patterns from two different nodes (similar to an Ancestral Recombination Graph). (E) Information saved for each detected recombination event: the recombinant sequences and the start and end positions of the network. For a more detailed description of the algorithm see [12]. In red, the recombination event that will be further studied in Figure 2.
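Panels (B) and (C) can be sketched in Python as follows. This is an illustration under assumptions, not the published code: `recode` and `columns_compatible` are hypothetical names, the toy haplotypes are invented, and the compatibility check shown is one common multi-state generalization of the four gamete test (two columns are compatible when the graph linking their co-occurring states has no cycle). The text above does not say which variant IRiS actually uses.

```python
from collections import Counter


def recode(sequences, grain):
    """Recode each block of `grain` SNPs into one integer per sequence.

    Sequences sharing a pattern within a block get the same number;
    patterns seen only once get 0. Numbers are not comparable across blocks.
    """
    n_snps = len(sequences[0])
    matrix = [[] for _ in sequences]
    for start in range(0, n_snps, grain):
        patterns = [seq[start:start + grain] for seq in sequences]
        counts = Counter(patterns)
        labels = {}
        for pat in patterns:
            if counts[pat] > 1 and pat not in labels:
                labels[pat] = len(labels) + 1
        for row, pat in zip(matrix, patterns):
            row.append(labels.get(pat, 0))
    return matrix


def columns_compatible(col_a, col_b):
    """Assumed multi-allelic test: state co-occurrence graph must be acyclic.

    Entries equal to 0 (unique patterns) are skipped, as in the caption.
    For two binary columns this reduces to the classic four gamete test.
    """
    pairs = {(a, b) for a, b in zip(col_a, col_b) if a != 0 and b != 0}
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(("a", a)), find(("b", b))
        if root_a == root_b:
            return False  # this edge would close a cycle
        parent[root_a] = root_b
    return True


# Hypothetical toy haplotypes: 5 sequences, 4 SNPs, grain size 2.
haplotypes = ["0011", "0011", "0110", "0110", "1111"]
recoded = recode(haplotypes, grain=2)
first, second = ([row[i] for row in recoded] for i in (0, 1))
compatible = columns_compatible(first, second)
```

Here the fifth sequence carries a unique pattern in the first block, so it is coded 0 there and ignored by the compatibility check. The sketch always starts blocks at position 0, whereas the runs described in Figure 2 vary the size of the first column.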

