Multiple Protein Structures Map Out Intrinsically Favored Conformational Changes
The large number of available HIV-1 protease structures provides a remarkable sampling of the protein's different conformational states and can be viewed as direct structural information about the dynamics of HIV-1 protease. After superimposing the structures, we applied principal component analysis (PCA) to extract the dominant motions present in the experimental structures, for both bound and unbound forms. The first few principal components were strikingly similar to the first few low-frequency normal modes computed from a single representative structure with an elastic network model (ENM). This strongly suggests that the variations among the observed structures, and the corresponding conformational changes, are facilitated by the low-frequency global motions intrinsic to the structure. The similarities were even greater when the approach was applied to an NMR ensemble and to molecular dynamics (MD) trajectories. See: Yang L, Song G, Carriquiry A, Jernigan RL. Close Correspondence between the Motions from Principal Component Analysis of Multiple HIV-1 Protease Structures and Elastic Network Modes. Structure 16, 321–330, 2008. Many more structures are now available, and the primary motions in the first three principal components computed from them are shown in the Figure below.
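The PCA-versus-ENM comparison described above can be sketched as follows. This is a minimal sketch, not the code of the cited study, and it rests on several assumptions: NumPy, Cα coordinates of structures already superimposed on a common frame, and an anisotropic network model (ANM) with a uniform spring constant and a 15 Å cutoff. These parameters and the helper names `anm_modes`, `pca_of_ensemble` and `overlap` are illustrative choices. The overlap between a principal component and an ENM mode is taken as the absolute cosine of the angle between the two 3N-dimensional vectors.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N Hessian of an anisotropic network model with uniform springs."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # no spring beyond the cutoff distance
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def anm_modes(coords, cutoff=15.0):
    """Non-trivial ANM modes, slowest first (6 rigid-body modes dropped)."""
    vals, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    return vals[6:], vecs[:, 6:]


def pca_of_ensemble(ensemble):
    """PCA of an (M, N, 3) array of superimposed structures.

    Returns variances (descending) and the PCs as rows of a (K, 3N) array.
    """
    x = ensemble.reshape(len(ensemble), -1)
    x = x - x.mean(axis=0)  # fluctuations about the mean structure
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    return s ** 2 / (len(ensemble) - 1), vt


def overlap(u, v):
    """Absolute cosine of the angle between two 3N-dimensional motion vectors."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Synthetic check: structures generated by moving a toy reference along its
# slowest ANM mode should yield a first PC that overlaps strongly with it.
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 20.0, size=(40, 3))
_, modes = anm_modes(ref)
slowest = modes[:, 0]
ensemble = np.array([ref + a * slowest.reshape(-1, 3)
                     + rng.normal(0.0, 0.05, size=ref.shape)
                     for a in rng.normal(0.0, 3.0, size=50)])
variances, pcs = pca_of_ensemble(ensemble)
print(f"overlap(PC1, ANM mode 1) = {overlap(pcs[0], slowest):.2f}")
```

The SVD of the centered coordinate matrix gives the same principal components as diagonalizing the 3N x 3N covariance matrix. It avoids building that matrix explicitly, which matters for large proteins.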
As a result, experimental structures can be used directly to model protein dynamics. The number of protein structures deposited in the Protein Data Bank (PDB) has increased dramatically in recent years, and for some specific proteins it is very high. For example, there are over 550 solved structures of HIV-1 protease, a protein essential for the life cycle of the human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. Such a large set of structures of the same protein and its variants samples different conformational states of the protein. Information buried within such a dataset can explain the functional dynamics and structural mechanism of the protein. To extract this dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENMs). PCA is a widely used statistical dimensionality reduction technique for classifying and visualizing high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. An improved ENM that utilizes the variations found within a given set of structures of a protein was also developed. We provide the computer programs, or references to software tools, for accomplishing each step and show how to use them. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
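The chapter's movie programs are not reproduced here. As a minimal sketch of the idea, assuming NumPy and Cα-only coordinates, a movie along one PC or ENM mode can be made in two steps. First, displace the reference structure sinusoidally along the mode. Then write the frames as a multi-model PDB file, which viewers such as PyMOL or VMD can play back. The function names, the amplitude and the placeholder ALA residue names are illustrative assumptions.

```python
import numpy as np


def mode_trajectory(coords, mode, amplitude=3.0, n_frames=20):
    """Frames oscillating sinusoidally along one motion vector.

    coords: (N, 3) reference structure; mode: (3N,) PC or ENM mode.
    The mode is rescaled so the largest per-atom displacement equals
    `amplitude` at the peak of the oscillation.
    """
    vec = mode.reshape(-1, 3)
    vec = vec / np.linalg.norm(vec, axis=1).max()  # largest atom step = 1
    phases = np.sin(2.0 * np.pi * np.arange(n_frames) / n_frames)
    return np.array([coords + amplitude * p * vec for p in phases])


def frames_to_pdb(frames, resname="ALA", chain="A"):
    """Multi-model PDB text with one CA atom per residue (placeholder names)."""
    lines = []
    for m, frame in enumerate(frames, start=1):
        lines.append(f"MODEL     {m:4d}")
        for i, (x, y, z) in enumerate(frame, start=1):
            lines.append(
                f"ATOM  {i:5d}  CA  {resname:3s} {chain}{i:4d}    "
                f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}           C")
        lines.append("ENDMDL")
    lines.append("END")
    return "\n".join(lines) + "\n"


rng = np.random.default_rng(1)
ref = rng.uniform(0.0, 20.0, size=(10, 3))  # toy CA coordinates
mode = rng.normal(size=30)                  # stand-in for a PC or ENM mode
frames = mode_trajectory(ref, mode, amplitude=2.0, n_frames=20)
pdb_text = frames_to_pdb(frames)
```

Writing `pdb_text` to a file, for example with `open(path, "w").write(pdb_text)`, produces a trajectory that loops smoothly because the sine returns to zero after one period.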
New Elastic Network Models with Springs Based on Variability of Distances among Experimental Sets of Structures
In this model, residue pairs whose distances are more variable across the set of structures are assigned weaker springs, and pairs with nearly invariant distances are assigned stronger springs. The result is an elastic network model that can break contacts; in the movies above, the flaps separate. This overcomes a limitation of conventional elastic network models, in which contacts never break but are instead infinitely extensible.
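Such a network can be sketched as follows. The sketch assumes NumPy, an (M, N, 3) array of Cα coordinates, and spring constants equal to the inverse variance of each pairwise distance across the ensemble, k_ij = 1/σ_ij², with σ floored at a small value so that invariant pairs do not get infinite springs. The inverse-variance form, the 13 Å cutoff and the function names are assumptions for illustration; the published model may use a different functional form. Pairwise distances are unchanged by rigid-body motion, so the structures do not need to be superimposed first.

```python
import numpy as np


def variability_springs(ensemble, cutoff=13.0, min_std=0.1):
    """Spring constants k_ij = 1 / std(d_ij)^2 from an (M, N, 3) ensemble.

    Pairs whose mean distance exceeds `cutoff` get no spring (k = 0).
    """
    diff = ensemble[:, :, None, :] - ensemble[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (M, N, N) pairwise distances
    k = 1.0 / np.maximum(dist.std(axis=0), min_std) ** 2
    k[dist.mean(axis=0) > cutoff] = 0.0
    np.fill_diagonal(k, 0.0)
    return k


def weighted_anm_hessian(coords, k):
    """ANM Hessian in which each pair (i, j) has its own spring k[i, j]."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i, j in zip(*np.nonzero(np.triu(k))):
        d = coords[j] - coords[i]
        block = -k[i, j] * np.outer(d, d) / (d @ d)
        hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
        hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
        hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
        hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


# Toy ensemble: residue 0 is far more mobile than the rest, so every
# spring attached to it comes out much weaker.
rng = np.random.default_rng(2)
ref = rng.uniform(0.0, 15.0, size=(20, 3))
noise = np.full((20, 1), 0.02)
noise[0] = 1.0
ensemble = ref + rng.normal(size=(30, 20, 3)) * noise
k = variability_springs(ensemble)
hess = weighted_anm_hessian(ensemble.mean(axis=0), k)
```

In this toy case the springs to the mobile residue are roughly two orders of magnitude weaker than the others. The slowest modes of `hess`, obtained for example with `np.linalg.eigh`, therefore tend to involve the regions that vary most among the structures.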
Force Driven Conformational Transitions
We hypothesize that exothermic chemical reactions can cause highly specific conformational changes. We have seen how ATP hydrolysis, if it produces a highly directional force, can cause the conformational transition in a subunit of GroEL.
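One simple way to explore a directional force with an ENM is linear response theory. This is shown here as a minimal sketch and is not the procedure of the cited GroEL study. A static force vector f applied to the network produces the displacement dr = H^+ f, where H^+ is the pseudo-inverse of the ANM Hessian with the six rigid-body modes removed. The site location, radius, force direction, cutoff and function names below are hypothetical, and the toy coordinates are random.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N Hessian of an anisotropic network model with uniform springs."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def linear_response(hess, force, n_zero=6):
    """Static response dr = H^+ f, excluding the rigid-body (zero) modes."""
    vals, vecs = np.linalg.eigh(hess)
    vals, vecs = vals[n_zero:], vecs[:, n_zero:]
    return vecs @ ((vecs.T @ force) / vals)


def directional_force(coords, site, radius, direction, magnitude=1.0):
    """Identical force vector on every residue within `radius` of `site`."""
    unit = np.asarray(direction, dtype=float)
    unit = unit / np.linalg.norm(unit)
    near = np.linalg.norm(coords - site, axis=1) < radius
    f = np.zeros(coords.shape)
    f[near] = magnitude * unit
    return f.ravel()


# Toy example: push residues near a hypothetical ligand site along +z.
rng = np.random.default_rng(3)
coords = rng.uniform(0.0, 20.0, size=(40, 3))
force = directional_force(coords, site=coords[0], radius=8.0,
                          direction=[0.0, 0.0, 1.0])
dr = linear_response(anm_hessian(coords), force).reshape(-1, 3)
```

The predicted `dr` could then be compared with the observed open-to-closed displacement vectors through their overlap. A single static linear response is only a first approximation for a transition as large as the one in GroEL.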
Two conformations of GroEL, with their right domains aligned, showing the displacement vectors between the open and closed forms on the left; ATP is shown as atoms at the center. The RMSD between the two forms is 14.1 Å. A force in a single direction originating from the ATP can drive this conformational change. The treatment considers ATP hydrolysis to generate a directional ballistic force.
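Aligning two conformations on one domain, and then computing the per-residue displacement vectors and an RMSD, can be sketched with the Kabsch algorithm. The sketch assumes NumPy and two (N, 3) arrays of matched Cα coordinates. The optimal rotation is fitted on a chosen subset of residues and then applied to the whole structure. Here `right_domain` is a hypothetical index range standing in for a real domain definition, and the structures are toy data.

```python
import numpy as np


def kabsch_fit(mobile, target, subset=None):
    """Superimpose `mobile` onto `target` using only the rows in `subset`.

    Returns all of `mobile` after the optimal rotation and translation.
    """
    idx = np.arange(len(mobile)) if subset is None else np.asarray(subset)
    p, q = mobile[idx], target[idx]
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_mean).T @ (q - q_mean)       # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (mobile - p_mean) @ rot.T + q_mean


def rmsd(a, b):
    """Root-mean-square deviation between two matched (N, 3) arrays."""
    return np.sqrt(((a - b) ** 2).sum(axis=1).mean())


# Toy check: a rigidly rotated and shifted copy fits back with RMSD ~ 0.
rng = np.random.default_rng(4)
open_form = rng.uniform(0.0, 30.0, size=(50, 3))
theta = np.radians(40.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
closed_form = open_form @ rz.T + np.array([5.0, -3.0, 2.0])
right_domain = np.arange(25, 50)            # hypothetical domain definition
fitted = kabsch_fit(open_form, closed_form, subset=right_domain)
displacements = closed_form - fitted        # per-residue difference vectors
print(f"RMSD = {rmsd(fitted, closed_form):.3f} Angstrom")
```

Fitting on one domain rather than on all residues makes the displacement vectors express the motion of the rest of the protein relative to that domain.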
Energy landscape for a Monte Carlo simulation between the open and closed forms of GroEL (Liu J, Sankar K, Wang Y, Jia K, Jernigan RL. Directional Force Origination from ATP Hydrolysis Drives the GroEL Conformational Change. Biophys J. 2017;112:1561-1570).
Protein Dynamic Communities from Elastic Network Models Reproduce the Communities Defined by Molecular Dynamics
Dynamic communities in proteins are cohesive structural units, each of which exhibits rigid-body motion. They can correspond to structural domains but are usually smaller parts that move with respect to one another in a protein's internal motions, which are key to its functional dynamics. Previous studies emphasized the importance of these communities for understanding ligand-induced allosteric regulation. They also reported that mutations of key community residues can hinder the transmission of allosteric signals between communities. Communities have usually been identified from molecular dynamics (MD) simulations of ~100 ns or longer, a demanding task for larger proteins. In new work, we demonstrate that the dynamic communities obtained from MD simulations can also be obtained with the simpler elastic network models (ENMs) that our group pioneered. To test this premise, we compared the communities obtained from MD and from ENMs for 44 proteins. We evaluated the correspondence between the communities from the two methods and the agreement between the dynamic cross-correlation data used for community detection. The results show a strong correspondence between the MD and ENM communities, as well as good agreement between the residue cross-correlations. Thus, the dynamic communities from MD can be closely reproduced with ENMs.
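The dynamic cross-correlations used for community detection can be computed from an ENM without any trajectory. The sketch assumes NumPy, Cα coordinates and an ANM with a 15 Å cutoff. The covariance of residue fluctuations is proportional to the pseudo-inverse of the Hessian, and the normalized correlation is C_ij = <dr_i . dr_j> / sqrt(<dr_i^2><dr_j^2>). The edge weights w_ij = -log|C_ij| between contacting residues follow a convention common in MD-based dynamical network analysis. The 7 Å Cα contact cutoff, the use of this weighting, and the community algorithm then run on the graph (for example Girvan-Newman, not shown) are assumptions, not necessarily the settings of the study.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N Hessian of an anisotropic network model with uniform springs."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def enm_cross_correlation(coords, cutoff=15.0):
    """Normalized residue cross-correlation matrix from ANM fluctuations."""
    vals, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    vals, vecs = vals[6:], vecs[:, 6:]  # drop rigid-body modes
    cov = (vecs / vals) @ vecs.T        # pseudo-inverse of the Hessian
    n = len(coords)
    # <dr_i . dr_j> is the trace of the 3 x 3 block (i, j) of the covariance
    dot = cov.reshape(n, 3, n, 3).trace(axis1=1, axis2=3)
    scale = np.sqrt(np.diag(dot))
    return dot / np.outer(scale, scale)


def network_weights(coords, corr, contact_cutoff=7.0):
    """Edge weights -log|C_ij| for residue pairs closer than contact_cutoff."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.full(corr.shape, np.inf)     # inf marks "no edge"
    contact = (dist < contact_cutoff) & ~np.eye(len(coords), dtype=bool)
    w[contact] = -np.log(np.clip(np.abs(corr[contact]), 1e-12, 1.0))
    return w


rng = np.random.default_rng(5)
coords = rng.uniform(0.0, 20.0, size=(40, 3))  # toy CA coordinates
corr = enm_cross_correlation(coords)
weights = network_weights(coords, corr)
```

Strongly correlated neighbors get short edges under this weighting, so shortest-path-based community algorithms group them together.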
The distance map of a protein shows the spatial proximity of its residues, and spatially close residues are naturally expected to have highly correlated dynamics. For both proteins, MD and ENM show high dynamic correlations between spatially close residues. Interestingly, however, these correlations are stronger with the ENM than with MD. Overall, the cross-correlation maps from MD and ENM agree well. For alpha-chymotrypsinogen in particular, the blocks of residues with high dynamic correlations in MD ([1-70], [80-120, 1-70] and [120-220]) are closely replicated by the ENM. Moreover, the correlation patterns of the secondary structure elements are remarkably similar for MD and ENM. These are the helical regions along the diagonal and the anti-parallel beta strands perpendicular to the diagonal, shown in red.
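The agreement between two cross-correlation maps, such as one from MD and one from an ENM, can be summarized in several ways. This minimal sketch assumes NumPy and two N x N correlation matrices for the same residues. It uses the Pearson correlation of their off-diagonal entries. It also reports the mean correlation of residue pairs closer and farther than a distance cutoff, which quantifies the observation that spatially close residues move together. The 7 Å cutoff, the choice of Pearson correlation, the function names and the toy maps are illustrative assumptions, not necessarily the measures used in the study.

```python
import numpy as np


def distance_map(coords):
    """N x N matrix of inter-residue distances."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def map_agreement(corr_a, corr_b):
    """Pearson correlation between the off-diagonal entries of two maps."""
    iu = np.triu_indices_from(corr_a, k=1)
    return np.corrcoef(corr_a[iu], corr_b[iu])[0, 1]


def mean_corr_near_and_far(corr, dist, cutoff=7.0):
    """Mean correlation for residue pairs closer / farther than `cutoff`."""
    iu = np.triu_indices_from(corr, k=1)
    c, d = corr[iu], dist[iu]
    return c[d < cutoff].mean(), c[d >= cutoff].mean()


# Toy maps: a correlation that decays with distance, plus a noisy copy.
rng = np.random.default_rng(6)
coords = rng.uniform(0.0, 20.0, size=(30, 3))
dist = distance_map(coords)
corr_md = np.exp(-dist / 5.0)
noise = rng.normal(0.0, 0.02, size=dist.shape)
corr_enm = corr_md + (noise + noise.T) / 2.0  # keep the map symmetric
agreement = map_agreement(corr_md, corr_enm)
near, far = mean_corr_near_and_far(corr_md, dist)
```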
Unstable Mutants Cause Changes in the Protein Community Structure
In addition, we investigated a set of T4 lysozyme mutants for which experimental structures and stability data are available. We compared the community structures of the stable and unstable mutant forms of T4 lysozyme with those of the wild type. The communities of the unstable mutants differ from those of the wild-type structure, demonstrating that such ENM-based community structures can serve as a means to rapidly identify deleterious mutants.
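One common way to quantify how much a community partition changes between a mutant and the wild type is the normalized mutual information (NMI) between the two residue-to-community label assignments, shown here as a minimal sketch. NMI is 1 for identical partitions, regardless of how the communities are numbered, and near 0 for unrelated ones. The use of NMI, the function name and the toy label lists are assumptions; the cited study may have used a different measure of community agreement.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) for two per-residue label lists."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same residues")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0.0:
        return 1.0  # both partitions place every residue in one community
    return 2.0 * mutual / (h_a + h_b)


# Toy community labels for nine residues (hypothetical data).
wild_type = [0, 0, 0, 1, 1, 1, 2, 2, 2]
relabeled = [5, 5, 5, 7, 7, 7, 9, 9, 9]  # same communities, new numbering
rearranged = [0, 0, 1, 1, 2, 2, 2, 0, 1]
nmi_same = normalized_mutual_information(wild_type, relabeled)
nmi_changed = normalized_mutual_information(wild_type, rearranged)
```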
We considered 16 mutant structures of T4 lysozyme crystallized and reported by Mooers et al. (Mooers BHM, Baase WA, Wray JW, Matthews BW. Contributions of all 20 amino acids at site 96 to the stability and structure of T4 lysozyme. Protein Sci. 2009;18:871–80.), who investigated the effect of mutating Arg96 on the stability of the enzyme. We arbitrarily divided the dataset into two groups. The more unstable mutants (rows 1-8) have stability changes between -4.7 and -2.6, and the less unstable mutants (rows 9-16) have values between -2.6 and 0. For simplicity, we refer to the more unstable group as unstable and the less unstable group as stable. We obtained the dynamic communities with the ENM using all heavy atoms of the atomic protein structures.
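When the ENM is built from all heavy atoms, the atomic fluctuations must be combined into residue-level correlations before communities can be defined. This minimal sketch assumes NumPy, a heavy-atom ANM, and an array giving the residue index of each atom. It uses the correlation of residue centroid displacements: the residue covariance <dR_I . dR_J> is the average of the atomic covariance traces over all atom pairs a in I, b in J. This aggregation rule, the 7 Å cutoff, the helper names and the random toy coordinates are assumptions, not necessarily the procedure used in the study.

```python
import numpy as np


def anm_hessian(coords, cutoff=7.0, gamma=1.0):
    """3M x 3M Hessian of an all-atom anisotropic network model."""
    m = len(coords)
    hess = np.zeros((3 * m, 3 * m))
    for i in range(m):
        for j in range(i + 1, m):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def residue_correlation(atom_coords, residue_of_atom, cutoff=7.0):
    """Residue-level correlation map from an all-heavy-atom ANM."""
    residue_of_atom = np.asarray(residue_of_atom)
    vals, vecs = np.linalg.eigh(anm_hessian(atom_coords, cutoff))
    vals, vecs = vals[6:], vecs[:, 6:]  # drop rigid-body modes
    cov = (vecs / vals) @ vecs.T        # atomic covariance (3M x 3M)
    m = len(atom_coords)
    atom_dot = cov.reshape(m, 3, m, 3).trace(axis1=1, axis2=3)
    residues = np.unique(residue_of_atom)
    # avg[I, a] = 1/n_I if atom a belongs to residue I, else 0
    avg = (residue_of_atom[None, :] == residues[:, None]).astype(float)
    avg /= avg.sum(axis=1, keepdims=True)
    res_dot = avg @ atom_dot @ avg.T    # <dR_I . dR_J> of residue centroids
    scale = np.sqrt(np.diag(res_dot))
    return res_dot / np.outer(scale, scale)


# Toy example: 12 hypothetical residues with 5 heavy atoms each.
rng = np.random.default_rng(7)
atoms = rng.uniform(0.0, 12.0, size=(60, 3))
residue_index = np.repeat(np.arange(12), 5)
res_corr = residue_correlation(atoms, residue_index)
```

The resulting residue map can be passed to the same network construction and community detection used for coarse-grained models. This allows mutant and wild-type communities to be compared at the residue level even though the underlying models contain different atoms.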
Previously, methods for investigating dynamic communities in proteins relied on trajectory data from MD simulations. Analyses of dynamic communities stress the importance of identifying the cohesive parts of proteins for understanding their functional dynamics and the mechanisms of protein function and allostery. However, MD simulations of large macromolecular structures are computationally expensive, and long time-scale simulations are needed to adequately sample the conformational ensemble of any given protein. This can be time-consuming and often requires the highest-performance computers. Thus, there is a significant need for a simpler method of defining these dynamic communities, one that is computationally less expensive yet still agrees well with the results from MD. The ENM-based approach established here meets this need.
The ability of the ENM to discriminate stable from unstable mutants by evaluating community agreement is notable: the community structures of the unstable mutants change much more than those of the stable mutants. We used the atomic (heavy-atom) structures of T4 lysozyme in the ENM, rather than the coarse-grained version, to account for the changes introduced by the mutations. In every case we observed clear changes in the distribution of communities for the unstable mutants.
For further details see Mishra SK, Jernigan RL. Protein Dynamic Communities from Elastic Network Models Align Closely to the Communities Defined by Molecular Dynamics. PLoS One 2018;13:e0199225.