Integrating protein structural dynamics and evolutionary analysis with Bio3D.

Skjærven L, Yao XQ, Scarabelli G, Grant BJ - BMC Bioinformatics (2014)

Affiliation: Department of Biomedicine, University of Bergen, Bergen, Norway. lars.skjarven@biomed.uib.no.

ABSTRACT

Background: Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information on molecular structure, dynamics and evolution.

Results: Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case.

Conclusions: The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond predicting the dynamics of a single protein and investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform-independent R package under a GPL2 license from http://thegrantlab.org/bio3d/.
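The Results mention quantifying dynamical couplings from normal modes. As an illustration only, the following is a minimal NumPy sketch of the usual normalized cross-correlation matrix built from normal mode vectors, in which mode k contributes to the covariance of atoms i and j in proportion to (v_ik · v_jk)/λ_k. The function name, the 3N × M array layout (x, y, z of each atom stored consecutively) and the exclusion of the zero-eigenvalue rigid-body modes are assumptions made for this sketch; it is not Bio3D's implementation.

```python
import numpy as np


def nma_cross_correlation(modes, eigenvalues):
    """Normalized cross-correlation of atomic fluctuations from normal modes.

    modes: (3N, M) array; column k is mode k, with the x, y, z components
           of each atom stored consecutively (assumed layout).
    eigenvalues: (M,) array of the corresponding non-zero eigenvalues
                 (rigid-body modes with zero eigenvalue assumed removed).
    Returns an (N, N) matrix with entries in [-1, 1].
    """
    modes = np.asarray(modes, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    n_atoms = modes.shape[0] // 3
    # Reshape to (atoms, xyz, modes) so each atom has a 3D displacement per mode.
    v = modes.reshape(n_atoms, 3, -1)
    # Weight each mode by 1/eigenvalue: soft, low-frequency modes dominate.
    weighted = v / eigenvalues
    # Covariance <dr_i . dr_j> up to a constant kT factor: sum over xyz and modes.
    cov = np.einsum("ick,jck->ij", weighted, v)
    # Normalize by the per-atom fluctuation magnitudes (the diagonal).
    scale = np.sqrt(np.diag(cov))
    return cov / np.outer(scale, scale)


# Toy example: 2 atoms, 1 mode in which the atoms move in opposite x directions.
mode = np.array([[1.0], [0.0], [0.0], [-1.0], [0.0], [0.0]]) / np.sqrt(2.0)
print(nma_cross_correlation(mode, [1.0]))  # off-diagonal entries are close to -1
```

Positive entries indicate atoms moving in the same direction and negative entries atoms moving in opposite directions; the constant k_BT factor cancels in the normalization, so it is omitted.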

Figure 1: Example workflow for ensemble NMA and PCA. In this example, the user starts with a single protein identifier, performs a BLAST search to identify related structures, fetches and aligns the identified structures, performs PCA, and calculates the normal modes for each structure to obtain aligned normal mode vectors. Various similarity measures are then available for interpreting the results and comparing subsets of modes.
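The PCA step in Figure 1 can be illustrated with a minimal NumPy sketch, assuming the structures have already been aligned to a common set of equivalent atoms and superposed, and are stored one structure per row as flattened x, y, z coordinates. It performs a standard eigendecomposition of the coordinate covariance matrix; the function name is hypothetical and this is not the code behind Bio3D's pca().

```python
import numpy as np


def ensemble_pca(coords):
    """PCA of an ensemble of aligned, superposed structures.

    coords: (n_structures, 3N) array, one flattened structure per row
            (alignment and superposition are assumed to be done already).
    Returns (variances, components) sorted by decreasing variance;
    components are columns of length 3N, like normal mode vectors.
    """
    x = np.asarray(coords, dtype=float)
    # Center each coordinate on its ensemble mean.
    centered = x - x.mean(axis=0)
    # Sample covariance of the Cartesian coordinates, shape (3N, 3N).
    cov = centered.T @ centered / (x.shape[0] - 1)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Toy usage: 10 hypothetical "structures" of 3 atoms (9 coordinates) each.
rng = np.random.default_rng(1)
ensemble = rng.normal(size=(10, 9))
variances, pcs = ensemble_pca(ensemble)
print(variances[:3] / variances.sum())  # fraction of variance in the first 3 PCs
```

Each returned component has length 3N, the same shape as a normal mode vector over the same atoms, which is what allows principal components and normal modes to be compared directly.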

A typical user workflow for comparing protein flexibility across species is depicted in Figure 1. In this example, we begin by fetching the protein sequence of a PDB structure with the get.seq() function. This sequence is then used in a BLAST or HMMER search of the full PDB database to identify related protein structures (functions blast() or hmmer()). Identified structures can then optionally be downloaded with the function get.pdb() and aligned with the function pdbaln(). The output is a multiple sequence alignment together with aligned coordinate data and associated attributes. Ensemble NMA on all aligned structures can then be carried out with the function nma(). This function returns an “eNMA” object containing aligned eigenvectors, mode fluctuations, and all pairwise root mean square inner product (RMSIP) values. These results are formatted to facilitate direct comparison of flexibility patterns between protein structures, as well as clustering based on pairwise mode similarity. Figure 1 also shows the typical application of principal component analysis (PCA) to the same experimental structures with the function pca(). This yields principal components with the same dimensions as the normal modes, enabling direct comparison of mode fluctuations or, alternatively, of the mode vectors themselves with functions such as rmsip() and overlap(). Many new functions for the analysis of normal modes and principal components are also provided, covering cross-correlation, fluctuations, overlap, vector fields, dynamic subdomain clustering, correlation network analysis and movie generation, along with integrated functions for plotting and visualization.

Multicore support is also included for a number of commonly used functions, enabling a substantial speed-up for time-consuming tasks such as ensemble NMA of large protein families, mode comparison, domain assignment, correlation analysis of multiple structures, and analysis of long-timescale MD simulations. Comprehensive tutorials integrating NMA with PCA, MD simulation data, and additional sequence and structure analysis methods, including correlation network analysis, are available in Additional files 1, 2, 3 and 4.
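The RMSIP and overlap measures mentioned above can be made concrete with a short NumPy sketch using common definitions: for two sets of n orthonormal mode vectors a_i and b_j, RMSIP = sqrt((1/n) Σ_i Σ_j (a_i · b_j)²), and the squared overlap of mode i with a displacement vector d is (a_i · d/|d|)². The function names are illustrative, both inputs are assumed to be defined over the same aligned atoms with orthonormal columns, and Bio3D's rmsip() and overlap() may use different defaults or report their results differently.

```python
import numpy as np


def rmsip(a, b):
    """Root mean square inner product between two sets of mode vectors.

    a, b: (3N, n) arrays with orthonormal columns over the same aligned
          atoms (assumed), e.g. the first n normal modes or PCs.
    Returns a value in [0, 1]; 1 means the two subspaces coincide.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    inner = a.T @ b  # (n, n) matrix of dot products a_i . b_j
    return float(np.sqrt(np.sum(inner ** 2) / a.shape[1]))


def squared_overlap(modes, dv):
    """Squared overlap of each mode with a displacement, plus its running sum.

    modes: (3N, n) array with unit-length columns.
    dv: (3N,) displacement, e.g. the difference of two aligned conformations.
    """
    modes = np.asarray(modes, dtype=float)
    dv = np.asarray(dv, dtype=float)
    unit = dv / np.linalg.norm(dv)  # normalize so overlaps lie in [0, 1]
    sq = (modes.T @ unit) ** 2
    return sq, np.cumsum(sq)


eye = np.eye(6)
print(rmsip(eye[:, :3], eye[:, :3]))  # 1.0
print(rmsip(eye[:, :3], eye[:, 3:]))  # 0.0
```

An RMSIP of 1 means the two mode subspaces coincide and 0 means they are orthogonal; a running sum of squared overlaps approaching 1 indicates that the first few modes capture most of an observed conformational change.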

