HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs.

Russ DE, Ho KY, Longo NS - BMC Bioinformatics (2015)

Bottom Line: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20%, respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.


Affiliation: Division of Computational Bioscience, Center for Information Technology, NIH, 12 South Drive, Bethesda, MD, 20892, USA. druss@mail.nih.gov.

ABSTRACT

Background: Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels) and improves the efficiency for large datasets provided by high throughput sequencing.

Results: In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool.

Conclusions: HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.


© Copyright Policy - open-access

Figure 1: Overview of V(D)J Partitioning. Partitioning Ig VDJ rearrangements at conserved VH & JH motifs for alignment with the approximate backwards algorithm and other DP algorithms. a) A VDJ nucleotide sequence before subdivision and algorithm processing. The dots between CAG…GTA and GGA…CAG represent nucleotides that are omitted for brevity. The V and J motifs, TAT TAC TGT and C TGG GG, respectively, are shown in boldface type. b) The VDJ rearrangement is divided into 2 sections: the 5’ end of the V-segment containing codons 1–101; and the 3’ end of the V-segment, the VD junction, the D-segment, the DJ junction, and the J-segment. The 5’ end of the V-segment is aligned backwards (3’ to 5’), and the rest of the sequence is aligned forwards (5’ to 3’). The V-end is identified and the two parts of the V are merged. c) The rest of the sequence is split just before the J motif, which is where the J-Start DP algorithm aligns the sequence to a J gene (left arrow) and determines the 5’ end of the J-segment. The V-end algorithm is also used to identify the 3’ end of the JH (right arrow). The 5’ and 3’ ends of the J are merged. d) A specialized local DP algorithm is used to align a D-gene within the VD-D-DJ subunit. In the figure, the V-, D-, and J-segments within each partition of the sequence are labeled. The intervening nucleotides labeled VD and DJ represent N addition nucleotides in the junctions.

Mentions: Similar to the original JOINSOLVER algorithm, conserved motifs initiate the alignment process [1]. In preparation for heavy chain VDJ alignment, the rearrangements are split into smaller regions using the conserved 3’ VH-motif “TAT TAC TGT” and JH-motif “C TGG GG”. If a motif is not found, we fall back to other methods of finding the motif, which are described below. Figure 1 provides an overview of the partitioning process with an example sequence. In the figure, many of the V and J nucleotides are replaced with dots to save space. First, the conserved motifs are found in the sequence (Figure 1a; the motifs are in bold). The sequence is then split just before the highly conserved 3’ V-motif (Figure 1b). The sequence on the 5’ side of the V-motif, which includes the nucleotides encoding codons 1–101 of the V-germline using IMGT numbering [13], is aligned using our approximate backwards DP algorithm (3’ to 5’). In the figure, the arrows show the alignment direction. The sequence on the 3’ side of the V-motif consists of the 3’ end of the V-segment, the VD junction, the D-segment, the DJ junction, and the J-segment. Our V-end algorithm, an overlap DP algorithm described below, aligns the sequence on the 3’ side of the V-motif and identifies the end of the V-segment. The two parts of the V-segment are merged to produce a completely aligned V-segment. The remainder of the rearrangement consists of the unaligned VD junction, the D-segment, the DJ junction, and the J-segment.
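
The motif-based splitting described above can be illustrated with a minimal Python sketch. It is not the HTJoinSolver implementation: the function name `partition_at_motifs`, the dictionary keys, and the toy sequence are hypothetical, and only exact motif matches are handled. Two further assumptions are made for illustration: the last occurrence of each motif is used (the J motif nearest the 3’ end, and the last V motif upstream of it), and both cuts are made in one pass, whereas the text identifies the V-end before splitting at the J motif. The fallback used when a motif is missing is not reproduced here; the sketch simply returns `None`.

```python
# Conserved motifs quoted in the text, written without codon spacing.
V_MOTIF = "TATTACTGT"  # 3' VH-motif "TAT TAC TGT"
J_MOTIF = "CTGGGG"     # JH-motif "C TGG GG"


def partition_at_motifs(sequence):
    """Split a rearranged heavy-chain sequence just before each motif.

    Returns a dict of three substrings, or None when either motif has
    no exact match (the paper's fallback search is not modeled here).
    """
    seq = sequence.upper().replace(" ", "")

    # Assumption: take the J motif closest to the 3' end.
    j_pos = seq.rfind(J_MOTIF)
    if j_pos == -1:
        return None

    # Assumption: take the last V motif lying entirely upstream of the J motif.
    v_pos = seq.rfind(V_MOTIF, 0, j_pos)
    if v_pos == -1:
        return None

    return {
        # 5' part of the V-segment; aligned backwards (3' to 5') in the text.
        "v_5prime": seq[:v_pos],
        # Starts at the V motif: 3' end of V, VD junction, D, DJ junction,
        # and any J nucleotides that precede the J motif.
        "v_motif_to_j_motif": seq[v_pos:j_pos],
        # Starts at the J motif and runs to the 3' end of the read.
        "j_motif_onward": seq[j_pos:],
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, far shorter than a real rearrangement.
    toy = "CAGGTGCAGCTG" "TATTACTGT" "GCGAGAGGGTACTACTTTGACTA" "CTGGGG" "CCAGGGAACC"
    for name, part in partition_at_motifs(toy).items():
        print(name, part)
```

Because each cut is made just before a motif, concatenating the three parts reproduces the input, so the pieces can be aligned separately and merged afterwards without losing nucleotides.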

