CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

Zhou CL - Source Code Biol Med (2015)

Bottom Line: The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.

Affiliation: Computational Biology Group, Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA 94550 USA.

ABSTRACT

Background: In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure.

Results: This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign were demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins.

Conclusions: CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
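
The combining step described in the abstract can be sketched in a few lines of Python 3. This is a minimal illustration under stated assumptions, not the CombAlign source code: the function name `combine_pairwise`, the input format (a list of two-string tuples), and the choice to left-justify inserted residues inside each gap slot are all hypothetical. It assumes each pairwise alignment is given as two equal-length gapped strings, reference first, and that every pair uses the same reference sequence.

```python
def combine_pairwise(pairs, gap="-"):
    """Merge pairwise alignments sharing one reference into a one-to-many alignment.

    pairs: list of (reference_row, other_row) gapped strings (assumed format).
    Returns a list of rows: the gapped reference first, then one row per pair.
    """
    reference = pairs[0][0].replace(gap, "")
    n = len(reference)
    parsed = []
    for ref_row, other_row in pairs:
        if len(ref_row) != len(other_row):
            raise ValueError("aligned rows must have equal length")
        if ref_row.replace(gap, "") != reference:
            raise ValueError("all pairs must share the same reference sequence")
        inserts = [""] * (n + 1)  # residues inserted before reference position i
        matches = []              # character aligned to reference position i
        i = 0
        for r, o in zip(ref_row, other_row):
            if r == gap:
                if o != gap:
                    inserts[i] += o
            else:
                matches.append(o)
                i += 1
        parsed.append((inserts, matches))

    # Each slot must be wide enough for the longest insertion seen in any pair.
    widths = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]

    # The reference row receives gaps wherever any other protein has an insertion.
    ref_out = [gap * widths[i] + reference[i] for i in range(n)]
    ref_out.append(gap * widths[n])
    rows = ["".join(ref_out)]

    for inserts, matches in parsed:
        out = [inserts[i].ljust(widths[i], gap) + matches[i] for i in range(n)]
        out.append(inserts[n].ljust(widths[n], gap))
        rows.append("".join(out))
    return rows


# Toy input: reference "ACDE" aligned against two made-up sequences.
example_pairs = [("AC-DE", "ACXDE"), ("ACDE-", "A-DEF")]
for row in combine_pairwise(example_pairs):
    print(row)
# AC-DE-
# ACXDE-
# A--DEF
```

In this sketch, residues from different proteins that land in the same inserted column are only padded to a common width; they are not claimed to correspond to one another, since each was aligned to the reference alone.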

Fig. 2: Multiple structure-based sequence alignment (MSSA) of Reston Ebolavirus secreted glycoprotein (sGP) model (reference) aligned with sGP models from four Ebolaviruses. Pairwise TM-align alignments were combined using combAlign.py.

Mentions: A second test case involving structure-based comparison of Reston Ebolavirus sGP with the corresponding proteins from several other Ebolavirus species (Fig. 2) illustrates that combining structure-based alignments can reveal structural (and therefore potential functional) differences that might not be apparent using sequence-only methods (Fig. 3). The CombAlign alignment in Fig. 2 suggests that there may be considerable structural differences between sGP of Reston Ebolavirus and its pathogenic near neighbors in the N-terminal region, in the approximate center of the peptide chain, and in a large portion of the C-terminus, whereas the Clustal Omega [17] alignment depicted in Fig. 3 implies tight global and local correspondences between the residues of these proteins. Of particular note is the divergence seen at the C-terminus, which contains the delta peptide (Fig. 3, box). This region is perfectly aligned at the sequence level, yet displays poor structural homology when examined using structure tools. Corresponding MSSAs were constructed using CombAlign to determine whether any given Ebolavirus sGP (as the reference structure) displayed close structural homology to any other (data not shown), and none was found to align well to any other. This apparent poor structural homology may be due to disorder in this region of the protein. Nonetheless, the MSSA in Fig. 2 supports the use of CombAlign for detecting structural deviations in a protein of interest relative to its structural near neighbors. It has been postulated that the delta peptide may function either to prevent superinfection of producer cells during early stages of infection or to prevent trapping of budding progeny virus [11]. As the function of the delta peptide may be critical to pathogenicity or disease progression, it is interesting to note the apparent structural differences among the sGPs from the species depicted in Fig. 2; this observation would reasonably justify structure-function studies of these peptides in the context of their proposed functions.
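
One way a reader might scan a gapped MSSA for the kind of divergent regions discussed above is to count, for each reference residue, how many of the other proteins have a residue aligned to it. The sketch below is an assumption-laden illustration, not part of CombAlign: the function names `reference_coverage` and `uncovered_regions`, the `min_aligned` threshold, and the toy rows are all hypothetical, and treating a gap opposite a reference residue as a sign of structural difference is only a rough proxy.

```python
def reference_coverage(rows, gap="-"):
    """For each reference residue, count the other rows with a residue aligned to it.

    rows: equal-length gapped strings, reference row first (assumed format).
    """
    reference, others = rows[0], rows[1:]
    coverage = []
    for col, ref_char in enumerate(reference):
        if ref_char == gap:
            continue  # column is an insertion relative to the reference
        coverage.append(sum(1 for row in others if row[col] != gap))
    return coverage


def uncovered_regions(coverage, min_aligned):
    """Return 1-based inclusive (start, end) reference ranges with coverage below min_aligned."""
    regions, start = [], None
    for pos, count in enumerate(coverage, start=1):
        if count < min_aligned:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions


# Toy MSSA: reference "ACDE" with two made-up aligned sequences.
toy_rows = ["AC-DE-", "ACXDE-", "A--DEF"]
toy_coverage = reference_coverage(toy_rows)
print(toy_coverage)                        # [2, 1, 2, 2]
print(uncovered_regions(toy_coverage, 2))  # [(2, 2)]
```

Positions are reported in reference-residue numbering rather than alignment columns, so the ranges can be mapped back onto the reference structure.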

