The purpose of this homework is to explore the Monte Carlo Algorithm and apply it to the simplified protein folding model in 2D.

Monte Carlo Method uses the randomly generated possible solutions to a certain problem in a solution space and test its degree of goodness based on certain physical requirements(W. H. Press 2007). The ways of generating the possible solutions are usually two. First, we can generate the possible solutions totally at random. For example, we use random number generator to do Monte Carlo Integration. Second, we can generate the possible solutions from the previous step by randomly changing some parameters of the previous one. We will use the later one to generate our 2D protein structures in this homework.

The general flow of Monte Carlo Method is shown as follows. Note that we use the term ”conformation space” instead of ”solution space” since we are talking about protein structures here.

- 1.
Start from a initial state in the conformation space.

- 2.
Randomly change the previous state, subjecting to requirement 1.

- 3.
Determine the degree of goodness by criterion 2.

- 4.
Accept/ reject this state by physical rule 3.

- 5.
If it is accepted, pass this state and repeat 2 through 5 for certain number of steps.

- 6.
If it is rejected, do not pass the state and repeat 2 through 4 until the new state is accepted.

The requirement 1, criterion 2 and rule 3 are problem-specific and we will mention these in our protein folding problem.

Proteins are composed of 20 different amino acids (AAs) in a polypeptide chain and due to the mutual interactions between those AAs, proteins will favor some folded states to lower the Gibbs free energy. The interactions are mostly negative because of hydrophobic effects or ion-ion interactions. In order for proteins to perform certain biological functions, their unique structures are essential. We can use a simple bead-and-chain model for a 2D protein chain(Sali 1994), assuming that all the AAs are of the same size and the peptide bond between two AAs is rigid, being only one unit and unstretchable. Each AA occupies one grid point of the 2D space and cannot be in the same point of any other AAs. When protein folds, the non-covalent interactions apply to the two non-bonding AAs separate by one unit. An we can calculate the relative Gibbs free energy \(\Delta G\) by summing all the interactions of non-bonding neighbors.

where \((i,j)\) are the indices of two neighboring AAs of types \((t(i),t(j))\) and \(E\) is an \(20\times 20\) interaction matrix.

With this model in mind, we can determine the requirements mentioned in the previous section.

- 1.
Requirement 1:

- (a)
The modified AA cannot occupy other’s positions.

- (b)
The modified AA must be one unit away from its neighbor(s).

- (c)
The best way to modify an AA’s position is to move \((1,0)\), \((1,1)\), \((0,1)\), \((-1,1)\), \((-1,0)\), \((-1,-1)\), \((0,-1)\) and \((1,-1)\), eight possible changes.

- (a)
- 2.
Criterion 2: Evaluate the interaction energy, \(E_{0}\) and use this number to determine the goodness of the state. The lower, the better.

- 3.
Rule 3:

- (a)
If the new state has lower \(E_{0}\), the protein will adopt this state in order to reach the minimum of the folding landscape.

- (b)
If the new state has higher \(E_{0}\), the protein does not favor such state. However, there is still some probability to jump from lower energy state to higher energy ones, \(P=e^{-(E_{new}-E_{0})/kT}\).

- (a)

Once the model is set and the steps are clear, we can start to do the simulation.

First, we need a general random number generator, myrand(seed). We will test its validity and then apply it to alter the position of a randomly selected AA. Given different seed, the function will give different random number sequences. Our seeds for generating interaction matrix \(E\) and the AA sequence in the protein are two distinct yet fixed value. So we will guarantee that we use exactly the same protein and interactions throughout the calculation. Other than those, the seed will be set by time(NULL) and independent of our bias.

Second, there are several subroutines to do the Monte Carlo calculations and to make sure the protein is subject to some requirements. Note that the information of proteins is stored in a \(45\times 3\) matrix with the first column being AA types, second the x positions and third the y positions.

- 1.
neighbor(): inputs a Protein Vector and outputs the pairs of indices of two non-bonding AAs.

- 2.
Energy(): input pairs of neighbor indices, Protein Vector, interaction matrix \(E\) and outputs the energy \(E_{0}\).

- 3.
n2ndistance(): inputs a Protein Vector and outputs the end-to-end distance of the protein.

- 4.
pcheck(): inputs Protein Vector and the index of certain AA and check if that AA occupies others’ positions. The function outputs true if the protein is not allowed, false otherwise.

- 5.
conformationchange(): input Protein Vector and make a position change to one of its AA and outputs a modified new Protein Vector.

First we tested the random number generator myrand(). The random number generator gives a uniform distribution of numbers between \(0\) and \(1\). And the points \((x_{n+1},x_{n})\) cover the \(1\times 1\) square without noticeable patterns as shown in Fig. 1. We further test it by estimating \(\pi\).

where \(N_{in}\) is the number of points in the quarter circle and \(N\) is the total number of points.As the total number of points increases, the RHS will reach \(\frac{\pi}{4}\) asympotically, as shown in Fig. 2.