AUTO-MUTE 2.0: A Portable Framework with Enhanced Capabilities for Predicting Protein Functional Consequences upon Mutation.

Masso M, Vaisman II - Adv Bioinformatics (2014)

Bottom Line: The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run "big data" batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models.

Affiliation: Laboratory for Structural Bioinformatics, School of Systems Biology, George Mason University, Manassas, VA 20110, USA.

ABSTRACT
The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run "big data" batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models.

Figure 1: Delaunay tessellation of the HIV-1 reverse transcriptase enzyme (PDB ID: 1rtjA). Initially, the protein is represented as a discrete set of points in 3D space, corresponding to the C-alpha atomic coordinates of every amino acid residue in the structure. A 3D tetrahedral tiling is then obtained by using these C-alpha points to serve as vertices. The complete tessellation yields hundreds of solid tetrahedra that are packed against one another in the form of a convex hull, filling the space otherwise occupied by the protein structure. Shown here is the modified tessellation obtained by removing all edges longer than 12 Å, which reveals clefts and pockets on the protein surface and ensures that each tetrahedron identifies a quadruplet of interacting amino acid residues at its four vertices via their C-alpha coordinates. Each C-alpha point is typically shared as a vertex by several tetrahedra as a result of their packed arrangement; hence, each amino acid may simultaneously participate in a number of distinct nearest neighbor residue quadruplets.
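
The first step in the caption, reducing a structure to one C-alpha point per residue, can be illustrated with a short sketch. This is not the AUTO-MUTE code: the function names `format_atom` and `extract_ca_points` are hypothetical, the demo records are made-up coordinates rather than data from 1rtj, and keeping only the first model and the first alternate location of each residue is an assumption made here for simplicity. The column positions follow the fixed-width ATOM record layout of the standard PDB file format.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build one fixed-width PDB ATOM record (hypothetical demo helper)."""
    # Columns: record 1-6, serial 7-11, atom name 13-16, altLoc 17,
    # residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54.
    return (f"ATOM  {serial:5d} {name:<4} {res_name:>3} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def extract_ca_points(pdb_lines, chain_id):
    """Return [(residue number, residue name, (x, y, z)), ...] for one chain."""
    points = []
    seen = set()
    for line in pdb_lines:
        record = line[0:6].strip()
        if record == "ENDMDL":
            break  # assumption: use only the first model of a multi-model file
        if record != "ATOM":
            continue  # skips HETATM, so calcium ions named "CA" are ignored
        if line[12:16].strip() != "CA" or line[21:22] != chain_id:
            continue
        key = (line[22:26], line[26:27])  # residue number plus insertion code
        if key in seen:
            continue  # assumption: keep the first alternate location only
        seen.add(key)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        points.append((int(line[22:26]), line[17:20].strip(), xyz))
    return points


# Made-up records: two residues in chain A and one in chain B.
demo = [
    format_atom(1, " N  ", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom(2, " CA ", "MET", "A", 1, 26.266, 25.413, 2.842),
    format_atom(3, " CA ", "GLY", "A", 2, 24.000, 27.500, 4.100),
    format_atom(4, " CA ", "SER", "B", 1, 10.000, 10.000, 10.000),
]
ca = extract_ca_points(demo, "A")
print(ca)
```

Only the two C-alpha atoms of chain A survive; the backbone nitrogen and the chain B residue are skipped. The resulting coordinate list is the kind of point set that would then be passed to a tessellation program.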

To derive the energy function, we selected X-ray crystallographic structures for 1417 single protein chains (http://proteins.gmu.edu/automute/tessellatable1417.txt) with high resolution (≤2.2 Å), sharing low sequence similarity (<30%), from the protein data bank (PDB) [11]. Each structure is abstracted to a collection of points in three-dimensional (3D) space, corresponding to the C-alpha coordinates of all its constituent amino acid residues (i.e., coarse graining of the protein structure at the residue level). The set of C-alpha points associated with a protein structure is used as vertices to create hundreds of nonoverlapping, space-filling, irregular tetrahedra that collectively form a convex hull. This construction is referred to as a Delaunay tessellation in the computational geometry literature [12], and we generate it with the Qhull software package [10]. Each tetrahedron in the tessellation objectively identifies at its four vertices a quadruplet of nearest neighbor residues in the protein structure; however, as an added measure to exclude false-positive residue quadruplet interactions, all tetrahedral simplex edges longer than 12 Å are immediately removed from every tessellation prior to further analysis [7, 8]. Since the tetrahedra forming a Delaunay tessellation are solid and pack against one another (i.e., two adjacent tetrahedra in a tessellation may share one vertex, one edge with two vertices, or one triangular face with three vertices), each C-alpha point generally serves simultaneously as a vertex for several tetrahedra in the tessellation (Figure 1).
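
The 12 Å edge cut-off and the sharing of vertices among tetrahedra can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `filter_tetrahedra` and `quadruplets_per_residue` are hypothetical, the six points and three tetrahedra are a hand-written toy input standing in for Qhull output, and the sketch assumes that removing an over-long edge discards every tetrahedron containing that edge.

```python
from collections import Counter
from itertools import combinations
from math import dist

EDGE_CUTOFF = 12.0  # angstroms, the cut-off stated in the text


def filter_tetrahedra(points, tetrahedra, cutoff=EDGE_CUTOFF):
    """Keep tetrahedra whose six edges are all no longer than the cut-off."""
    kept = []
    for tet in tetrahedra:
        # A tetrahedron has 4 vertices, hence C(4, 2) = 6 edges to check.
        if all(dist(points[i], points[j]) <= cutoff
               for i, j in combinations(tet, 2)):
            kept.append(tet)
    return kept


def quadruplets_per_residue(tetrahedra):
    """Count how many tetrahedra (residue quadruplets) use each vertex."""
    return Counter(index for tet in tetrahedra for index in tet)


# Toy C-alpha coordinates in angstroms (made up for illustration).
points = [
    (0.0, 0.0, 0.0),
    (4.0, 0.0, 0.0),
    (0.0, 4.0, 0.0),
    (0.0, 0.0, 4.0),
    (5.0, 5.0, 5.0),
    (20.0, 5.0, 5.0),  # distant point: every edge to it exceeds 12 angstroms
]
# Tetrahedra as quadruples of indices into `points` (hypothetical input).
tetrahedra = [(0, 1, 2, 3), (1, 2, 3, 4), (1, 2, 4, 5)]

kept = filter_tetrahedra(points, tetrahedra)
counts = quadruplets_per_residue(kept)
print(kept)
print(dict(counts))
```

In this toy input the third tetrahedron is dropped because its edges to the distant point are too long. Points 1, 2, and 3 each remain a vertex of two tetrahedra, which mirrors the observation that one residue can participate in several nearest neighbor quadruplets at once.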

