Running GROMACS
From Bootable Cluster CD
GROMACS is a freely available molecular dynamics package. For more information, please see About GROMACS. This tutorial walks through using GROMACS to simulate a molecular system. The system we simulate, the villin headpiece protein in water, is small, fast and stable, and it doesn't scale particularly well beyond a few processes. This tutorial assumes that you have booted the BCCD on one or more machines.
Installing GROMACS
First we install GROMACS and the GMXBench molecules:
- Log in to the BCCD environment as root. If you have already logged in as bccd, become root by issuing the su - command (the "switch user" command; with no user name it switches to root). For example:
[bccd@host129]> su -
You will be prompted for the root user's password, which should be letmein (or see the login splash screen for your image's root password).
- Type list-packages at the prompt. From the list that appears, select gromacs and then OK. This is the GROMACS package.
- Once that has finished, issue list-packages again. This time, select gmxbench and then OK. These are sample molecules often run as benchmarks when using GROMACS.
- Type logout to return to the prompt of the bccd user.
Running a Molecular System
Now we are ready to prepare and run a molecular system. Let's start by running on one processor.
Running on One Processor
- Log in to the BCCD environment as bccd. If you are prompted to run the BCCD heartbeat process, choose Yes. If you are currently running as the root user (your prompt will be light blue and end in a hash, #), log out of the root account by typing logout or pressing Ctrl-D, then log in as bccd.
- Type the following command and, when prompted, choose Yes:
bccd-allowall
- Type the following command to find all the other computers running the BCCD on the same network:
bccd-snarfhosts
- Type the following command and note how many hosts (other computers running the BCCD) you have access to:
bccd-checkem machines
- The GMX Benchmarks contain four different simulations that can be run. You can see them by moving to the gmxbench directory and listing its contents:
cd /usr/local/gmxbench/
ls
You should see four directories:
- d.dppc - This is a phospholipid bilayer membrane, similar in structure to a cell membrane. The simulation consists of 1024 dipalmitoylphosphatidylcholine (DPPC) lipids in water.
- d.lzm - This is lysozyme, an antibacterial enzyme found in the human body.
- d.poly-ch2 - This is a 6000-unit polyethylene molecule.
- d.villin - This is the villin headpiece protein, a chain of thirty-six amino acids (its sequence is MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF). The simulation places the protein in water and follows it as it folds. This is the simulation we will run.
- Now create a directory to run the benchmark in:
mkdir ~/gmxtmp
- Next, copy over the input files for the simulation we are going to run. Since we're simulating villin, copy the contents of the villin directory with the following command:
cp /usr/local/gmxbench/d.villin/* ~/gmxtmp
(If you want to use one of the other molecules instead, copy the contents of its directory. For instance, to copy lysozyme, type cp /usr/local/gmxbench/d.lzm/* ~/gmxtmp) We work in a separate directory because GROMACS by default expects all of its files to be in the current working directory.
- Move into that directory:
cd ~/gmxtmp
- Next, prepare the system with the GROMACS pre-processor (grompp). We'll run on one processor first, so issue the command
grompp -np 1 -shuffle -sort -v
The -np 1 argument tells grompp that we're going to run on one processor. The -shuffle argument redistributes the molecules so that the workload is split evenly among the processors, and -sort sorts the molecules by their coordinates, which tends to keep molecules that are close together in space on the same processor. Neither option matters when we use only one processor, but both help performance when we use more than one.
- Finally we're ready to run GROMACS itself. (Here, we're actually using MPI to run GROMACS. For more information on MPI, check out Compiling and Running.) Do this with the command
mpirun -np 1 -machinefile ~/machines /usr/local/bin/mdrun -v -c output.gro
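Once you press Enter, mdrun (the computational workhorse of GROMACS) begins the molecular dynamics simulation; -v makes it report its progress, and -c output.gro names the file in which it writes the final structure. For each atom, mdrun keeps a list of nearby atoms (its neighbor list) and computes the force each of those neighbors exerts on it. After summing all the forces on an atom, it calculates the atom's new position and velocity. The forces and new positions are computed at every timestep, while the neighbor lists are rebuilt every few steps.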
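The one-processor steps above can be collected into a single shell function, which makes the run easy to repeat. This is a sketch under assumptions: it uses the paths from this page, expects ~/machines to have been created by bccd-snarfhosts, and the names run and run_villin_one_proc are hypothetical helpers, not BCCD commands. With DRY_RUN=1 it only prints the commands, which is a safe way to check them first.

```shell
#!/bin/sh
# Sketch: the one-processor villin workflow from this page as one function.
# Assumptions: run and run_villin_one_proc are hypothetical helper names,
# gmxbench is installed under /usr/local/gmxbench, mdrun is at
# /usr/local/bin/mdrun, and ~/machines was created by bccd-snarfhosts.

# run CMD ARGS...: execute the command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# run_villin_one_proc: copy the villin inputs, pre-process for 1 processor, run
run_villin_one_proc() {
    workdir="$HOME/gmxtmp"
    run mkdir -p "$workdir" || return 1
    run cp /usr/local/gmxbench/d.villin/* "$workdir" || return 1
    # cd changes the directory of the shell that calls this function
    run cd "$workdir" || return 1
    run grompp -np 1 -shuffle -sort -v || return 1
    run mpirun -np 1 -machinefile "$HOME/machines" /usr/local/bin/mdrun -v -c output.gro
}
```

For example, after saving it as villin1.sh (a hypothetical file name), type . ./villin1.sh and then DRY_RUN=1 run_villin_one_proc to preview the commands; run_villin_one_proc on its own executes them. Sourcing the file, rather than running it as a separate script, leaves you in ~/gmxtmp afterwards.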
Benchmarking Results
At the end of mdrun's output you will see a number of run-time statistics that can be used for benchmarking. Note the values for Real (s), the wall-clock time (the total elapsed time of the run), and ps/NODE hour (picoseconds of simulated time per node-hour).
Output File Results
The output is also stored in the file we specified when running mdrun: output.gro. This file holds the final positions and velocities of all of the atoms. To scroll through it, use cat output.gro | more (press q to quit).
This file lists every atom in the simulated system: the villin headpiece and the water around it. The file is quite long! For instance, villin's amino acid sequence contains three A's, which stand for alanine, and one free alanine molecule alone has three carbon atoms, seven hydrogen atoms, one nitrogen atom and two oxygen atoms!
We can view just the atoms making up alanine molecules in our output file with the command
cat output.gro | grep ALA
This searches our output file for any lines that contain 'ALA'. The three-letter code of any other amino acid (for example PHE) can be used the same way, and to search for a particular atom you can use its atom name. For instance, cat output.gro | grep ' H ' lists the atoms named exactly H, which are the hydrogens bonded to backbone nitrogens. (We need the spaces because we only want lines with an 'H' by itself; we're not interested in lines containing PHE just because it has an H in it. For the same reason, hydrogens with longer atom names, such as the water hydrogens, which are typically named HW1 and HW2, will not match.)
Your results for alanine molecules should look something like this:
    9ALA      N   87   3.492   2.996   1.618  0.1574  0.2410 -0.0154
    9ALA      H   88   3.519   2.922   1.679  2.0612  0.8963  0.0172
    9ALA     CA   89   3.614   3.063   1.563 -0.6239  0.2846  0.0653
    9ALA     CB   90   3.744   3.018   1.632 -0.3756  0.0682  0.1689
    9ALA      C   91   3.627   3.036   1.412  0.0235 -0.4013  0.1374
    9ALA      O   92   3.560   2.952   1.355 -0.9055 -0.2922  0.6419
   17ALA      N  166   3.249   2.174   1.947  0.8748 -0.1217 -0.4455
   17ALA      H  167   3.163   2.174   1.895  1.5885  1.3499 -1.6779
   17ALA     CA  168   3.262   2.077   2.058 -0.1161  0.1543  0.2606
   17ALA     CB  169   3.269   2.138   2.195 -0.2276  0.7956 -0.3209
   17ALA      C  170   3.139   1.988   2.070  0.5609  0.6531  0.0107
   17ALA      O  171   3.028   2.033   2.034 -0.2431  0.5592 -0.3315
   19ALA      N  189   3.013   1.812   2.337  0.1137  0.0036  0.4017
   19ALA      H  190   3.111   1.813   2.357  0.5014  0.0426 -1.3815
   19ALA     CA  191   2.922   1.861   2.448  0.5072  0.0745  0.3275
   19ALA     CB  192   2.980   1.825   2.580 -0.3012  0.7028  0.5064
   19ALA      C  193   2.908   2.016   2.438 -0.0876 -0.0857  0.1006
   19ALA      O  194   3.007   2.087   2.435 -0.1408 -0.5455  0.2537
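Each line represents one atom. The first six lines belong to the alanine that is the ninth group of atoms (called a residue) in this file; the second and third alanines are the seventeenth and nineteenth residues. The first column combines the residue number and residue name (for example, 9ALA). The second column is the atom name: the first line is N for nitrogen, the second is H for hydrogen, and CA, CB, C and O are the carbons and the oxygen. The third column is the unique number identifying that particular atom. The remaining decimal columns are the x, y and z positions (in nanometers) followed by the x, y and z velocities (in nanometers per picosecond) of that atom. Notice that each alanine appears here as only six atoms, fewer than the free molecule described above: in a protein chain each amino acid gives up the atoms of one water molecule when it forms peptide bonds, and the hydrogens bonded to carbon are not listed separately in this file, apparently because the force field used here merges them into their carbon atoms.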
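Because every atom line has the same columns, a field-aware tool such as awk can select atoms more precisely than grep (for example, by requiring the atom name in column 2 to be exactly H) and can do simple geometry with the coordinates. This sketch assumes the standard .gro layout: a title line, an atom-count line, one line per atom, and a final box line. The function names are hypothetical helpers, not GROMACS tools, and the sample file holds only the first alanine from the output above.

```shell
#!/bin/sh
# Sketch: field-based queries on a .gro file, as an alternative to grep.
# Splitting on whitespace works for a small system like villin; in very
# large systems adjacent columns can run together and a fixed-width parser
# would be needed. Function names are hypothetical helpers.

# atoms_named NAME FILE: atom lines whose atom name (column 2) is exactly NAME
atoms_named() {
    awk -v name="$1" 'NR > 2 && $2 == name' "$2"
}

# residue_atoms RES FILE: atom lines whose column 1 is a number followed by RES
residue_atoms() {
    awk -v res="$1" 'NR > 2 && $1 ~ ("^[0-9]+" res "$")' "$2"
}

# atom_distance I J FILE: distance in nm between atoms numbered I and J
atom_distance() {
    awk -v i="$1" -v j="$2" '
        NR > 2 && $3 == i { xi = $4; yi = $5; zi = $6; found_i = 1 }
        NR > 2 && $3 == j { xj = $4; yj = $5; zj = $6; found_j = 1 }
        END {
            if (!found_i || !found_j) { print "atom not found" > "/dev/stderr"; exit 1 }
            printf "%.4f\n", sqrt((xi - xj)^2 + (yi - yj)^2 + (zi - zj)^2)
        }' "$3"
}

# Demo on an excerpt of output.gro (first alanine only, box line omitted)
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Excerpt of output.gro: residue 9 only
    6
    9ALA      N   87   3.492   2.996   1.618  0.1574  0.2410 -0.0154
    9ALA      H   88   3.519   2.922   1.679  2.0612  0.8963  0.0172
    9ALA     CA   89   3.614   3.063   1.563 -0.6239  0.2846  0.0653
    9ALA     CB   90   3.744   3.018   1.632 -0.3756  0.0682  0.1689
    9ALA      C   91   3.627   3.036   1.412  0.0235 -0.4013  0.1374
    9ALA      O   92   3.560   2.952   1.355 -0.9055 -0.2922  0.6419
EOF

atoms_named H "$sample"          # only the hydrogen, atom 88
residue_atoms ALA "$sample"      # all six atoms of residue 9
atom_distance 87 88 "$sample"    # prints 0.0996
```

As a sanity check, the distance between atoms 87 (N) and 88 (H) comes out at about 0.0996 nm, close to the typical N-H bond length of roughly 0.1 nm. On a real run, pass output.gro instead of the sample file, for example atoms_named H output.gro.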
Two Processors
Now let's run villin on two processors. (This assumes you have access to at least two hosts; see the bccd-checkem machines step above.)
- Make sure you're still in the directory you created; if not, move back into it with the command
cd ~/gmxtmp
- Run the GROMACS pre-processor again. This time, -np 2 indicates that we're going to run on two processors:
grompp -np 2 -shuffle -sort -v
- This time, because we're going to run across multiple machines, we need to copy the files that grompp produced to all of the computers. To do this, first move into your home directory with
cd ~
Next, run the following command:
bccd-syncdir ~/gmxtmp machines
This will tell you the name of the temporary directory it created. You'll need this for the next step!
- Move into the directory that was just created:
cd /tmp/<your directory>
- Now we're ready to run GROMACS again. This time, -np 2 specifies that we're using two processors:
mpirun -np 2 -machinefile ~/machines /usr/local/bin/mdrun -v -c output.gro
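Since mpirun starts its processes on the hosts listed in ~/machines, it can help to confirm that the file lists at least as many hosts as the -np value before a run, so that each process gets its own machine. A minimal sketch, assuming the file holds one host name per line (blank lines and lines starting with # are skipped); count_hosts and enough_hosts are hypothetical helpers. If your file gives a processor count after a host name, this still counts that host once.

```shell
#!/bin/sh
# Sketch: check the host file before picking a processor count.
# Assumption: one host per line; blank lines and # comments are skipped.
# count_hosts and enough_hosts are hypothetical helper names.

# count_hosts FILE: number of host entries in FILE
count_hosts() {
    awk 'NF > 0 && $1 !~ /^#/ { n++ } END { print n + 0 }' "$1"
}

# enough_hosts NP FILE: succeed if FILE lists at least NP hosts
enough_hosts() {
    have=$(count_hosts "$2") || return 1
    if [ "$have" -ge "$1" ]; then
        return 0
    fi
    echo "only $have host(s) listed in $2; fewer than the $1 processes requested" >&2
    return 1
}
```

For example: enough_hosts 2 ~/machines && mpirun -np 2 -machinefile ~/machines /usr/local/bin/mdrun -v -c output.gro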
Benchmarking Results
Again, note the values for Real (s) and ps/NODE hour. How do they compare to the one-processor values? We would expect a speedup with two processors working on the problem, unless the time spent communicating between the two processors outweighs the time saved by splitting up the computation. Did it run in half the time? Did the rate (ps/NODE hour) double?
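These questions can be stated with the usual definitions of parallel speedup and efficiency. Here T_1 is the Real (s) value for the one-processor run and T_p the value for a run on p processors (symbols introduced here for illustration):

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p \, T_p}
```

Running in half the time on two processors corresponds to S_2 = 2 and E_2 = 1; communication overhead shows up as an efficiency below 1.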
Repeat the above for additional processors, record the data, and graph the results.
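Before graphing, the recorded times can be turned into speedup and efficiency columns. A minimal sketch, assuming you keep a plain text file (for example, a hypothetical timings.txt) with one line per run: the number of processors followed by its Real (s) value, including a one-processor run. scaling_table is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compute speedup S_p = T_1 / T_p and efficiency E_p = S_p / p.
# Input: lines of "processors real_seconds"; one line must be for p = 1.
# Lines starting with # are ignored. scaling_table is a hypothetical name.

scaling_table() {
    sort -n "$1" | awk '
        NF >= 2 && $1 !~ /^#/ { p[++n] = $1; t[n] = $2; if ($1 == 1) t1 = $2 }
        END {
            if (t1 == "") { print "no one-processor run found" > "/dev/stderr"; exit 1 }
            printf "%-6s %10s %8s %10s\n", "procs", "real_s", "speedup", "efficiency"
            for (k = 1; k <= n; k++) {
                s = t1 / t[k]
                printf "%-6d %10.2f %8.2f %10.2f\n", p[k], t[k], s, s / p[k]
            }
        }'
}
```

Running scaling_table timings.txt prints one row per run, sorted by processor count; the speedup and efficiency columns can then be graphed against the number of processors.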
Other molecular systems are likely to show different speedup and efficiency. This depends on how well a particular simulation can be split up to run on separate processors, as well as on how large it is and how long it takes to run.

