HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs.

Russ DE, Ho KY, Longo NS - BMC Bioinformatics (2015)

Bottom Line: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20%, respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.


Affiliation: Division of Computational Bioscience, Center for Information Technology, NIH, 12 South Drive, Bethesda, MD, 20892, USA. druss@mail.nih.gov.

ABSTRACT

Background: Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels) and improves the efficiency for large datasets provided by high throughput sequencing.

Results: In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool.

Conclusions: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.


Fig4: The success rate for simulated sequences as a function of the mutation probability. At each mutation probability, 10,000 artificial rearrangements were generated. The success rate is the percentage of rearrangements with correctly identified V- (circles), D- (triangles), and J- (pluses) germlines. The All Success (X) line is the percent of rearrangements with all gene utilization correctly identified.

Mentions: As the mutation probability increases, sequence alignment becomes more difficult. The effect of the mutation probability on the success rate can be seen in Figure 4. The success rates for the V, D, and J have sigmoidal curves. The V-segment has a success rate of around 95% even when the mutation probability is approximately 40%. The sigmoidal curve decreases sharply from around 95% success rate to below 10% as the mutation probability changes from 40% to 60%. A leveling off of the success rate at around 3% is due to random matching to the correct V-germline. The sigmoidal shape of the J has a gentler slope and levels off at around 13%, which is less than the 17% expected from randomly matching J-germlines. However, if the highly mutated sequences matched the wrong V, it is possible that there are no nucleotides left for a J match. No J germline would be assigned, which would tend to decrease the J success rate.
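The fall toward a random-matching floor described above can be illustrated with a toy simulation. The sketch below is not HTJoinSolver's algorithm: it assumes ungapped, equal-length sequences, a simple per-base substitution model (each base is replaced by a different base with probability p), and a nearest-germline match by mismatch count. All function names are hypothetical. Under these assumptions, a set of N equally likely germlines has a chance floor of 1/N; the 17% figure quoted for J would be consistent with about six candidates (1/6 ≈ 16.7%), although that count is an assumption here and is not stated in the text.

```python
import random

BASES = "ACGT"


def mutate(seq, p, rng):
    """Replace each base by a different base with probability p (assumed model)."""
    out = []
    for base in seq:
        if rng.random() < p:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_match(query, germlines):
    """Index of the germline with the fewest mismatches (ties go to the lowest index)."""
    return min(range(len(germlines)), key=lambda i: hamming(query, germlines[i]))


def success_rate(germlines, p, trials, rng):
    """Fraction of simulated sequences assigned back to their true germline."""
    hits = 0
    for _ in range(trials):
        true_idx = rng.randrange(len(germlines))
        query = mutate(germlines[true_idx], p, rng)
        if best_match(query, germlines) == true_idx:
            hits += 1
    return hits / trials


def random_baseline(n_germlines):
    """Expected success rate when the match is a uniform random guess."""
    return 1.0 / n_germlines


if __name__ == "__main__":
    rng = random.Random(0)
    # Six random 50-nt "germlines"; both numbers are illustrative choices.
    germlines = ["".join(rng.choice(BASES) for _ in range(50)) for _ in range(6)]
    for p in (0.0, 0.2, 0.4, 0.6, 0.75):
        print(p, success_rate(germlines, p, 2000, rng))
    print("chance floor:", random_baseline(len(germlines)))
```

In this toy model a mutation probability of 0.75 makes every base uniformly random, so the success rate should approach the 1/N floor. Real V, D, and J germlines are far more similar to one another than random sequences, and real alignments allow indels, so the curves in Figure 4 are not expected to match these numbers.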

