Limits...
HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs.

Russ DE, Ho KY, Longo NS - BMC Bioinformatics (2015)

Bottom Line: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively.The D-segment is much harder to fit even at 20% mutation probability.For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

View Article: PubMed Central - PubMed

Affiliation: Division of Computational Bioscience, Center for Information Technology, NIH, 12 South Drive, Bethesda, MD, 20892, USA. druss@mail.nih.gov.

ABSTRACT

Background: Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels) and improves the efficiency for large datasets provided by high throughput sequencing.

Results: In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool.

Conclusions: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

Show MeSH
The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm. The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm for mutation probabilities between 0 and 80%. The counts on the y-axis are the number artificial rearrangements and the score difference is on the x-axis. In (a-d), the sharp peak at zero shows that the difference in score between the two algorithms is zero for most of the sequences. As the mutation probability increases (e-i), the scale for the counts changes as the difference between the algorithms becomes more apparent. The dotted lines indicate a constant count level at 5000 and 500 across the figure. Negative score differences indicate the approximation had a higher score than the full algorithm.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4492005&req=5

Fig3: The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm. The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm for mutation probabilities between 0 and 80%. The counts on the y-axis are the number artificial rearrangements and the score difference is on the x-axis. In (a-d), the sharp peak at zero shows that the difference in score between the two algorithms is zero for most of the sequences. As the mutation probability increases (e-i), the scale for the counts changes as the difference between the algorithms becomes more apparent. The dotted lines indicate a constant count level at 5000 and 500 across the figure. Negative score differences indicate the approximation had a higher score than the full algorithm.

Mentions: The approximate backward algorithm was designed to quickly estimate the alignment score of a complete DP algorithm for overlapping sequences without sacrificing accuracy. Figure 3 shows the score differences between the approximate algorithm and the full DP algorithm for simulated sequence alignments with mutation probabilities ranging from 0% to 80%. Score differences between the approximate backwards algorithm and the full DP algorithm for sequences with no mutations (Figure 3a) were very rare. For mutation probabilities up to 30% (Figure 3b-d), which far exceeds the ~6% mutation frequency of average memory B cells [7] and includes the elevated nucleotide mutation frequency of some HIV antibodies [20], a sharp peak at zero indicates that the approximate algorithm replicates the expected score very well. As the mutation probability increases (Figure 3e-i), slight differences in scores between the two methods cause the distribution to spread. The dashed lines are presented to emphasize the differences in the Y-axis caused by the wider distribution. These results show that the approximate backwards algorithm can provide a good estimate of the alignment score calculated by a full DP algorithm.Figure 3


HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs.

Russ DE, Ho KY, Longo NS - BMC Bioinformatics (2015)

The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm. The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm for mutation probabilities between 0 and 80%. The counts on the y-axis are the number artificial rearrangements and the score difference is on the x-axis. In (a-d), the sharp peak at zero shows that the difference in score between the two algorithms is zero for most of the sequences. As the mutation probability increases (e-i), the scale for the counts changes as the difference between the algorithms becomes more apparent. The dotted lines indicate a constant count level at 5000 and 500 across the figure. Negative score differences indicate the approximation had a higher score than the full algorithm.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4492005&req=5

Fig3: The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm. The difference in scores between the approximate backwards algorithm and an overlapping DP algorithm for mutation probabilities between 0 and 80%. The counts on the y-axis are the number artificial rearrangements and the score difference is on the x-axis. In (a-d), the sharp peak at zero shows that the difference in score between the two algorithms is zero for most of the sequences. As the mutation probability increases (e-i), the scale for the counts changes as the difference between the algorithms becomes more apparent. The dotted lines indicate a constant count level at 5000 and 500 across the figure. Negative score differences indicate the approximation had a higher score than the full algorithm.
Mentions: The approximate backward algorithm was designed to quickly estimate the alignment score of a complete DP algorithm for overlapping sequences without sacrificing accuracy. Figure 3 shows the score differences between the approximate algorithm and the full DP algorithm for simulated sequence alignments with mutation probabilities ranging from 0% to 80%. Score differences between the approximate backwards algorithm and the full DP algorithm for sequences with no mutations (Figure 3a) were very rare. For mutation probabilities up to 30% (Figure 3b-d), which far exceeds the ~6% mutation frequency of average memory B cells [7] and includes the elevated nucleotide mutation frequency of some HIV antibodies [20], a sharp peak at zero indicates that the approximate algorithm replicates the expected score very well. As the mutation probability increases (Figure 3e-i), slight differences in scores between the two methods cause the distribution to spread. The dashed lines are presented to emphasize the differences in the Y-axis caused by the wider distribution. These results show that the approximate backwards algorithm can provide a good estimate of the alignment score calculated by a full DP algorithm.Figure 3

Bottom Line: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively.The D-segment is much harder to fit even at 20% mutation probability.For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

View Article: PubMed Central - PubMed

Affiliation: Division of Computational Bioscience, Center for Information Technology, NIH, 12 South Drive, Bethesda, MD, 20892, USA. druss@mail.nih.gov.

ABSTRACT

Background: Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels) and improves the efficiency for large datasets provided by high throughput sequencing.

Results: In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool.

Conclusions: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.

Show MeSH