A simple method to control over-alignment in the MAFFT multiple sequence alignment program.

Katoh K, Standley DM - Bioinformatics (2016)

Bottom Line: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment has recently grown as low-quality or noisy sequences accumulate in protein sequence databases, owing, for example, to sequencing errors and difficulties in gene prediction. This method significantly increases the number of correctly gapped sites in real examples and in simulations under various conditions.


Affiliation: Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan.


Fig. 2. (a) and (b), MSAs of CDK1 sequences with and without VSM, visualized in Jalview (Waterhouse et al., 2009); (c–e), NYN domain (purple), ZF domain (green) and regions without a predicted structure (gray) in MSAs of the zc3h12a-like and N4BP1-like families, with and without VSM. In (e), an additional option (--leavegappyregion; see Supplemental data for details) was applied that tends to return easy-to-understand alignments in gappy regions.

Mentions: Figure 2 shows two examples that illustrate the efficacy of VSM. For each example, the same sequence dataset was aligned by MAFFT with and without a VSM. Two MSAs of vertebrate CDK1 protein sequences are shown in Figure 2a and b. The sequences are highly conserved, but there are three unusual segments, possibly because of alternative splicing. Conventional MAFFT (G-INS-i without VSM) aligns these unusual segments (Fig. 2a). In contrast, when a VSM is applied with a parameter value of 0.8, the unusual segments are left unaligned (Fig. 2b). Depending on the needs of the downstream analysis, the user can select the appropriate type of alignment.
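The with/without comparison described above could be set up along the following lines. This is only a sketch: the helper `mafft_command` and the file name `cdk1.fasta` are hypothetical, and the flags `--allowshift` and `--unalignlevel` are taken from the MAFFT command-line documentation as recalled, not from this text, so they should be checked against the installed version. It is likewise an assumption that `--unalignlevel` is the option corresponding to the value 0.8 quoted above. Only `--leavegappyregion` is named in the figure caption. The sketch builds the command lines without running them.

```python
import shlex


def mafft_command(input_fasta, vsm_level=None, leave_gappy_region=False):
    """Build an argument list for a G-INS-i run, optionally with VSM.

    Hypothetical helper for illustration; it does not execute MAFFT.
    """
    # G-INS-i corresponds to global pairwise alignment with iterative refinement.
    cmd = ["mafft", "--globalpair", "--maxiterate", "1000"]
    if vsm_level is not None:
        # Assumed flags for enabling VSM (verify against your MAFFT version).
        cmd += ["--allowshift", "--unalignlevel", str(vsm_level)]
    if leave_gappy_region:
        # Option named in the caption of Fig. 2 (panel e).
        cmd.append("--leavegappyregion")
    cmd.append(input_fasta)
    return cmd


# Conventional G-INS-i (as in Fig. 2a) versus G-INS-i with VSM (as in Fig. 2b).
conventional = mafft_command("cdk1.fasta")
with_vsm = mafft_command("cdk1.fasta", vsm_level=0.8)

print(shlex.join(conventional))
print(shlex.join(with_vsm))
```

Each list can be passed to `subprocess.run(..., stdout=...)` to write the alignment to a file. Keeping the two runs identical apart from the VSM flags mirrors the comparison in the text, where the same dataset is aligned both ways.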

