Research Using PIDD

We have developed a computational approach for deriving distance constraints from databases of known protein structures and using them in structure refinement. We calculate the distributions of various types of distances in known protein structures and use them to obtain either the most probable ranges or the mean-force potentials for those distances. We then impose the range constraints on the structures to be refined, or include the mean-force potentials in energy minimization, so that more plausible structural models can be built.

 

 

Distance Constraints

  

We specify a distance by its two corresponding atoms, two associated residues, and the separating residues, assuming that the two atoms are not in the same residue. Let A1 and A2 be the two atoms, R1 and R2 the two residues, and S1, …, SN the separating residues. Then, the distribution of the distance in known protein structures can be represented by a function P[A1,A2,R1,R2,S1,…,SN].
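As an illustration of this definition, the sketch below shows one way the tuple (A1, A2, R1, R2, S1, …, SN) could serve as a lookup key for collecting distance samples. The class name, field names, and the `group_distances` helper are hypothetical and are not taken from the database itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DistanceType:
    """Key identifying one type of distance (all names here are illustrative)."""
    atom1: str                   # A1, e.g. "CA"
    atom2: str                   # A2
    res1: str                    # R1, e.g. "ALA"
    res2: str                    # R2
    separators: Tuple[str, ...]  # S1..SN, the residues between R1 and R2


def group_distances(
    observations: Iterable[Tuple[DistanceType, float]],
) -> Dict[DistanceType, List[float]]:
    """Collect observed distances (in angstroms) under their distance type."""
    samples: Dict[DistanceType, List[float]] = defaultdict(list)
    for key, distance in observations:
        samples[key].append(distance)
    return dict(samples)
```

Because the dataclass is frozen, it is hashable, so each distinct combination of atoms, residues, and separating residues gets its own list of samples from which a distribution P can be estimated.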

 

From the distribution of a particular type of distance in known protein structures, we define a range for that distance whose lower and upper bounds are the mean minus and plus two standard deviations, respectively. This range can then be used as a distance constraint in structure refinement.
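The range described above can be sketched as follows. This is a minimal example that assumes the samples for one distance type are already available as a list of floats. The function names are hypothetical, and using the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics
from typing import Sequence, Tuple


def distance_range(samples: Sequence[float], k: float = 2.0) -> Tuple[float, float]:
    """Return (lower, upper) bounds as mean -/+ k standard deviations."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (assumption)
    return mean - k * sd, mean + k * sd


def violates(distance: float, bounds: Tuple[float, float]) -> bool:
    """True if a distance in the model falls outside the allowed range."""
    lower, upper = bounds
    return distance < lower or distance > upper
```

A refinement procedure could call `violates` on each distance in a candidate structure and penalize or reject the ones that fall outside their range.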

 

Mean-Force Potentials

 

With the range constraints, all distances within the range are treated equally, although in fact they have different probabilities, and even distances outside the range are not completely impossible. To represent the full distribution of a distance in known protein structures, we have also developed potentials of mean force (PMF) for the distances of interest and added them to the energy function used for refinement.
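The text does not give the formula used to build the potentials. A common choice for statistical potentials of this kind is Boltzmann inversion, E(d) = −kT ln(P(d) / P_ref(d)), and the sketch below assumes that form with a uniform reference distribution. The function name, the binning scheme, the pseudocount, and the value of kT are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

KT = 0.592  # kcal/mol at roughly 298 K (assumed temperature and units)


def mean_force_potential(
    samples: Sequence[float],
    d_min: float,
    d_max: float,
    n_bins: int,
    pseudocount: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Histogram the samples and return (bin centers, energies per bin).

    Energies follow E = -kT * ln(P / P_ref) with a uniform P_ref (assumption).
    The pseudocount keeps empty bins from producing infinite energies.
    """
    width = (d_max - d_min) / n_bins
    counts = [pseudocount] * n_bins
    for d in samples:
        if d_min <= d < d_max:
            index = min(int((d - d_min) / width), n_bins - 1)
            counts[index] += 1.0
    total = sum(counts)
    p_ref = 1.0 / n_bins
    centers = [d_min + (i + 0.5) * width for i in range(n_bins)]
    energies = [-KT * math.log((c / total) / p_ref) for c in counts]
    return centers, energies
```

Frequently observed distances receive negative (favorable) energies, while rare distances receive positive but finite energies, which matches the point that distances outside the most probable range are unlikely rather than forbidden.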

 

Figure 3. Illustration of Mean Force Potentials.

 

A total of 70 NMR-determined structures were refined using the database-derived mean-force potentials. The energies of about 80% of the structures were significantly reduced after the refinement (on average, by 7.5%). Most of the ensembles refined with the distance-derived potentials have a higher resolution, and their Ramachandran plots are also improved, compared with those determined by the general protocol.

 

 

Figure 4. The ensemble becomes more compact after the refinement.

 

Table 2. Results of Ramachandran plots.

 

Concluding Remarks

 

We have developed a computational approach for refining protein structures using distance constraints or mean-force potentials derived from databases of known protein structures.

 

We have also developed a protein inter-atomic distance distribution database (PIDD) for computing the distributions of various types of distances and automatically generating distance constraints or mean-force potentials.

 

Work on combining energy minimization, incorporating other types of geometric data, utilizing parallel computing tools, and refining theoretical as well as experimental models is underway.