A simple method to control over-alignment in the MAFFT multiple sequence alignment program.

Katoh K, Standley DM - Bioinformatics (2016)

Bottom Line: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment is recently becoming greater, as low-quality or noisy sequences are increasing in protein sequence databases, due, for example, to sequencing errors and difficulty in gene prediction. This method significantly increases the correctly gapped sites in real examples and in simulations under various conditions.


Affiliation: Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan.


Comparison of sensitivity based on two real protein-based benchmarks. (a) In the PREFAB test, the score (Eq. 7) was computed with the qscore program (Edgar, 2004a) and averaged over the 1682 entries. The scores of BAli-Phy with the two initial states were similar to each other, 0.5465 and 0.5480. (b) In the ‘extended’ subset of OXBench, the column score was computed using the run_metric.pl program (Raghava et al., 2003) and averaged over the 672 entries. For PRANK, the average over 667 entries is shown because it failed in five entries. BAli-Phy was not applicable to this test, because the maximum number of sequences in an MSA in this test is 668.

Mentions: The results of two protein-based benchmarks, PREFAB (Edgar, 2004b) and OXBench (Raghava et al., 2003), are shown in Figure 3. These datasets are based on protein structural alignments. In relatively difficult cases, only short conserved regions (usually functional sites under strong evolutionary constraint) are aligned in the reference, and the benchmark assesses how correctly the tested methods align these sites. With VSM, this type of benchmark score decreased. The amount of decrease depends on the parameter ; small as approaches 0.8 and relatively large when . The benchmark scores of PRANK and BAli-Phy are low, consistent with a previous study (Sievers et al., 2011). This test does not take the over-alignment problem into account.
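
To make the two benchmark scores concrete, the following is a minimal Python sketch under commonly used definitions: a pair-based score (the fraction of residue pairs aligned in the reference that are also aligned in the tested alignment) and a column score (the fraction of reference columns reproduced exactly). This is an illustration only, not the qscore or run_metric.pl implementation. The function names and the toy alignments are hypothetical, and the sketch scores every reference column, whereas real benchmarks typically restrict scoring to the structurally reliable regions.

```python
from itertools import combinations


def indexed_columns(alignment):
    """Return each column as a tuple of residue indices (None for a gap)."""
    counters = [0] * len(alignment)  # next residue index for each sequence
    columns = []
    for col in range(len(alignment[0])):
        entry = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column."""
    pairs = set()
    for column in indexed_columns(alignment):
        present = [(s, r) for s, r in enumerate(column) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def q_score(candidate, reference):
    """Fraction of reference residue pairs also aligned in the candidate."""
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(candidate)) / len(ref_pairs)


def column_score(candidate, reference):
    """Fraction of reference columns reproduced exactly in the candidate."""
    ref_cols = set(indexed_columns(reference))
    return len(ref_cols & set(indexed_columns(candidate))) / len(ref_cols)


# Toy example (hypothetical sequences): the candidate misplaces one gap.
reference = ["AC-DE", "ACGDE"]
candidate = ["ACD-E", "ACGDE"]

print(f"pair-based score: {q_score(candidate, reference):.2f}")
print(f"column score:     {column_score(candidate, reference):.2f}")
```

In a benchmark such as those described above, a score of this kind would be computed per entry and then averaged over all entries. Because both scores count only what the reference aligns, neither penalizes residues that are aligned in the candidate but unaligned in the reference, which is why such tests do not capture over-alignment.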

