HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs.

Russ DE, Ho KY, Longo NS - BMC Bioinformatics (2015)

Bottom Line: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20%, respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

Affiliation: Division of Computational Bioscience, Center for Information Technology, NIH, 12 South Drive, Bethesda, MD, 20892, USA. druss@mail.nih.gov.

ABSTRACT

Background: Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels) and improves the efficiency for large datasets provided by high throughput sequencing.

Results: In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool.

Conclusions: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

Figure 5: The success rate and score distribution of V, D, and J alignments for simulated rearrangements with mutation probability: a) 0%, b) 20%, c) 30%, and d) 50%. The success rate is a function of the alignment score and mutation probability. The success rate (circles with dashed lines) is the frequency at which a V-, D-, or J-alignment in the simulated sequences is correct. The score distribution is calculated from the number of times an alignment score occurs in 10,000 simulated sequences. Scores that occur fewer than 10 times (out of the 10,000 simulations) are not shown, to prevent discontinuities. V-segment scores are normalized by length to account for differences in the size of the V-germlines, which may have 5’ or 3’ truncations. The solid vertical line in the D-alignment panel at score = 9 corresponds to the suggested minimum length of a D-match in JOINSOLVER. The increase in the success rate of low-scoring J-matches is a result of ties in which most or all J germlines are selected.

Mentions: In our simulations, the alignment scores tend to decrease as the mutation probabilities increase. However, for most biologically relevant situations, the success rate remains high. Figure 5 shows the success rate for V, D, and J matching of simulated sequences with mutation probabilities of (a) 0%, (b) 20%, (c) 30%, and (d) 50% as a function of score. This simulation is important because the mutation frequency of real sequences is unknown. The success rate for V-, D-, and J-rearrangement alignments improves as the score increases and is inversely related to the mutation probability. The score distributions are presented so that the reader can focus on the regions where the success rate is most relevant. As seen in the figure, as the mutation probability increases, the alignment score distribution shifts to smaller scores for the V, D, and J alignments. However, the success rate remains high at the peak of the score distribution (i.e., the success rate is high where most of the counts are). The success rate is only shown for scoring bins that have more than 100 counts in the score distribution. The first column in the figure is the success rate for V-matching. For V-alignments, the scores decrease from approximately 1500 to 700 going from a mutation probability of 0% to 20%; however, the success rate remains near 100%.
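The per-score success rate described above can be illustrated with a minimal Python sketch. It is not the HTJoinSolver implementation: the function name `success_rate_by_score`, the `(score, is_correct)` input format, and the `min_count` parameter are all assumptions made for illustration. The figure caption and the text cite different cutoffs for sparsely populated scores (10 and 100 counts), so the cutoff is left as a parameter here.

```python
from collections import defaultdict


def success_rate_by_score(results, min_count=10):
    """Tabulate the score distribution and the success rate per score.

    results   : iterable of (score, is_correct) pairs, one per simulated
                sequence (hypothetical format, assumed for this sketch).
    min_count : scores seen fewer than this many times are dropped, so
                that sparse bins do not produce noisy success rates.

    Returns {score: (count, success_rate)} in order of increasing score.
    """
    counts = defaultdict(int)   # how often each score occurs
    correct = defaultdict(int)  # how often that score was a correct match
    for score, is_correct in results:
        counts[score] += 1
        if is_correct:
            correct[score] += 1
    return {
        score: (n, correct[score] / n)
        for score, n in sorted(counts.items())
        if n >= min_count
    }


# Toy input: score 9 occurs 10 times (8 correct); score 5 occurs only 3 times.
toy = [(9, True)] * 8 + [(9, False)] * 2 + [(5, True)] * 3
print(success_rate_by_score(toy, min_count=10))
```

In this toy input the score of 5 is dropped because it occurs too rarely, which mirrors how sparsely populated scores are omitted from the figure.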