Description of the program SIMULAID: a collection of utilities
designed
to help setting up and analyze molecular simulations.
Mihaly Mezei
Department of Structural and Chemical Biology,
Mount Sinai School of Medicine,
New York, NY 100102
Mihaly.Mezei@mssm.edu
Jan. 03, 2012.
Reference: M. Mezei, Simulaid: a simulation facilitator and analysis program,
J. Comp. Chem., 31, 2658-2668 (2010).
0. GENERAL INTRODUCTION TO SIMULATION SETUPS
In general, a simulation requires three different types of
information:
- Description of the interaction energy terms
- Description of the molecules in the system
- Description of the boundary conditions.
For analysis, the
- Trajectory (history) file
may also be needed.
Simulaid will help in various aspects of establishing these. In order to
help clarify the various functionalities of Simulaid, a brief
description is given of the various components of these
descriptions. Frequent reference will be made to the PDB (Protein
Data Bank) format that is considered the 'lingua franca' of
molecular simulations (although it has a number of 'dialects')
and to formats used by the simulation packages Amber, Charmm,
Gromos/Gromacs, Discover, Macromodel, and MMC.
0.1. Description of the interaction energy terms
Most simulation packages rely on the concept of
atomtype for
the specification of the interaction potential, supplemented by a
partial charge. It is generally assumed that force
constants, dispersion and exhange repulsion terms are
transferable to a larger extent than the partial charges.
The atomtypes can be defined by alphanumeric labels (Amber,
Charmm,
Gromos/Gromacs, Discover) or simple integers (Macromodel, OPLS)
or both
(MMC). In most cases, a parameter file contains the data
necessary
to extract the various coefficients to be used in the energy
expressions
(Amber, Charmm, Gromos/Gromacs, Discover) but they may be built
into the
program source code (Macromodel) or both (MMC).
0.2. Description of the molecules in the system
Most simulation packages use a hierarchical approach to define
the assembly of atoms to be simulated.
- The largest unit is called a segment in Charmm, a
chain in
PDB. MMC has two segments: the solvent and the
solute. Segments or chains are likely to be different
(macro)molecules, although they don't have to be.
- Each segment is generally broken into residues. A
residue is usually a repeating unit that may occur several times
at different places (i.e., an amino or nucleic acid).
- In some instances (Charmm, Discover) residues are further
partioned into groups whose total charge is likely to be
an integer (in most cases zero).
- In any event, each residue involves a collection of
atoms.
Since residues are considered frequently occuring molecular
units, several packages (Amber, Charmm, Discover) use a so-called
residue topology file (RTF) for their definition. For amino and
nucleic acids the PDB nomenclature is followed, but only
approximately. This standard includes definitions for amino acid
and nucleic acid residues. These definitions consist of a
three-letter residue name and atomnames that can be up to four
characters long.
To arrive at the fully defined system, most packages rely on
an annotated coordinate file containing the Cartesian coordinates
of the atoms in the system as well as on the RTF and parameter
files. For Amber, the coordinate file is in the PDB format while
for Charmm it is in the so-called CRD format. Once the list of
residues and atoms are known to Amber or Charmm, they will turn
to their RTF file where the combination of residue names and
atomnames will provide for each atom their atomtypes and partial
charges. The .car file of Discover, the .dat file of Macromodel,
the .mol2 file of Tripos,
as well as the .slt file of MMC, however will contain not only
the residue names and atomnames, but the atom types and partial
charges as well.
The record formats of the formatted input records can be
listed by Simulaid.
The RTF file of Amber and Charmm also specifies the bonding
pattern. Macromodel and Tripos, on the other hand, require the bond list in
the coordinate file while MMC will deduce the bonding pattern from the
coordinates themselves.
Segments are usually defined by the input stream.
0.3. Description of the boundary conditions.
The simulated system may be a droplet in vacuum or a unit cell
of a periodic system. Each program has to be explicitly
instructed in the input stream as to the size and shape of the periodic cell - the options vary
from package to package. Charmm uses an additional file of the
type .img that specifies the centers of all surrounding cells,
but most packages use built-in routines for the different cell
types allowed.
0.4. Trajectories.
Simulaid can read trajectories in the following formats:
- Charmm DCD
- Amber
- MMC (both binary and ASCII formats)
- Macromodel
- Stack of PDB files
- Stack of Charmm CRD files
If a trajectory is in multiple files then the
file names should be derived from the name of the first segment.
Assuming that the trajectory starts with file x.dcd,
subsequent segments should be called x_2.dcd, x_3.dcd, etc.
NOTE on binary files: it is important to compile Simulad with the
same (type) of compilers as the program that generated the trajectory
was compiled with. Binary files generated with Fortran-77 and Fortran-90
are different; similarly, they can be different for 32-bit and 64-bit systems.
0.5. Clustering options.
Simulaid has implemented three clustering algorithms.
- Single link clustering: For a given threshold distance for any pair
of clusters the shortest distance between representatives of each cluster
is larger than the threshold. For any member of a cluster (of more than one
members) there is at least one other cluster member within the threshold
distance.
- K-means clustering: with the number of clusters set by the user,
the program iteratively assigns the objects to the cluster whose center is
the nearest to it.
The center of the cluster can be established in two ways:
- Pick the member of each cluster as its center whose distance from the
member farthes from is is the minimum. This option is useful when the
distances are not representing cartesian distances.
- Calculate the center-of-mass of each cluster and set the center there.
This is meaningful when the coordinates of the nodes are also known
and the distances represent the cartesian distance between these nodes.
- A clustering based on maximum number of neighbors: Pick the (an) object
that has the most number of neighbors (i.e., within threshold distance);
and call a cluster and remove it and it neighbors from the list.
Adjust the neighbor counts and repeat until the list is exhausted.
Single-link and maximum neighbor based clustering can be run either
by specifying a distance cutoff, resulting in an unspecified number of
clusters or by specifying the number of clusters
in which case the cutoff is udjusted until this number of cluster results.
Furthermore, single-link clustering can be used to do a hierarchical
clustering using several different distance cutoffs. In this case clusters
obtained with a larger cutoff are further clustered with the next smaller
cutoff.
I. DESCRIPTION OF THE FUNCTIONS OF THE PROGRAM
I.0. Online help:
Any prompt containing a '?' within the bracketed field
(e.g., 'Do you want ... [y/n/$] ?' or 'Number of ... [100,$]=')
will respond with an explanation if the
user types just a '?'.
Some of the prompts asking for a selection from a list also
has a help option that can be activated by typing a '?'.
In each case, the prompt is repeated after the explanation has been printed.
I.1. Operations performed by several functions:
When the graphics option (with SGI Iris GL only)
is enabled the user is periodocally
offered the option to modify the display using characters typed
in to the keyboard:
- <Enter>: rotates the molecule by the angle increment
(initially set to 10o) around the selected axis
(initally set to the X axis)
- x, y, or z : changes the rotation axis to X, Y, or Z, resp.
- a: <A>sks for an new angle increment
- d: Toggles <d>epth cueing off/onn
- h: Toggles displaying the <h>ydrogens off/on
- c: Toggles displaying the protein side <c>hains off/on
- v: Toggles displaying the sol<v>ents off/on
- <digit>: Changes the solute linewidth to
the value of <digit>
- s: <S>tops image manipulation
For translation/rotation and trajectory conversions Simulaid
offers the option to reset the system into the periodic cell.
The use of periodic boundary conditions may be restricted to
two or one dimensions; i.e., the resetting is applied only in a selected
coordinae plane or along a selected axis.
This option also selects periodic images of each solute molecule
in such a way that the solute appears as compact as possible.
Since Simulaid considers each solute segment a single molecule,
this may result in moving groups of molecules (e.g., lipids)
always together.
Thus, for functions that may involve resetting the system into
a PBC cell, Simulaid offers the users the option to define a number
of residues as molecular residues, i.e., groups of atoms that
are to be reset into the PBC cell together but
independent of the rest of the solute.
Furthermore, residues can also be defined as ions - these
are also treated individually but they are not moved around only to
keep them in the simulation cell, but not in an
attempt to make the solute more compact,
I.2. Specific operations performed by Simulaid:
- Geometry optimizations
- Find the sphere of smallest radius still enclosing a
set of atoms and translate the atoms to the center of this
smallest enclosing sphere (REF1). This
allows the solvation of a solute with the smallest number of
solvents providing a solvation layer of given minimum thickness.
It is possible to input a solvated system where the solvents are
excluded from the sphere calculations but will be subsequently
translated along with the atoms included (i.e., the solute);
solvents outside a layer of user-specified thickness will be
eliminated. The graphics windows (SGI only) shows the original and reduced
sphere's x-y cross section and the atoms touching the minimal
sphere's surface are highlighted as aterisks. Once the
optimization is completed, the image can be manipulated by
keyboard commands:
- Find the orientation of the solute molecule so that
the volume of the cube or rectangle enclosing it is the smallest
(REF2). Smallest enclosing cube is
optimal for a Delphi run, smallest enclosing rectangle is optimal
for grand-canonical runs where a bulk solvent layer is required.
The user is asked if cube or rectangle is required. The
optimization is performed starting from the input orientation as
well as from a user defined number of random orientations to
provide several local minima of the optimization problem from
which the best will be chosen.
Option is provided for limiting the optimization to rotations around
one of the coordinate axes.
- Find the orientation of the solute molecule that
maximizes the smallest image-image distance (REF1). The optimization is performed
starting from the input orientation as well as from a user
defined number of random orientations to provide several local
minima of the optimization problem. The user has a choice of
optimizing the orientation in a periodic
simulation cell defined on input or the construction of the
smallest possible periodic simulation cell
with a given minimum image-image distance.
Again, option is provided for limiting the optimization
to rotations around one of the coordinate axes.
For the minimization of the cellsize the initial cell
dimensions are obtained from the maximum extension of the
molecule in the X, Y, and Z direction for cubic, rectangular and
hexagonal prism cells, from the smallest enclosing sphere for a
face-centered cubic, hexagonal close packed, or truncated
octahedron cell and from input for a cell defined by a Charmm
image file. Once the best orientation is found, the cell's
dimensions are reduced until the smallest image-image distance
reaches a user-defined minimum. Comparing the runs with different
starting orientations, the program choses the one with the
smallest volume.
The graphics window (SGI only) shows the molecule in its original
orientation and the corresponding PBC cell in yellow and red,
respectively; the molecule in its optimized orietation and and
the corresponding PBC cell in blue and green, respectively; and
the line between the atom and its nearest image that determined
the smallest distance (in purple). Once the optimization is
completed, the image can be manipulated by keyboard commands.
- Perform cleanup operations.
Cleanup operations include:
- ensuring that atom and residue numbers are consecutive
(starting from user-defined atom and residue numbers);
- ensuring that all atoms of a residue are in a contiguous
stretch;
- ensuring that all PDB files have a chain (segment) ID. When
none present in the input file, the first chain id will be 'A'.
Every time a 'TER' record is encountered in the input file the
segment id will be changed to the next upper-case letter,
followed by the next lower case letters and finally, the digits
'0' to '9'.
- ensuring that Macromodel files have both 1-letter and
3-letter residue names.
- For PDB output files, it gives an option to write
CONECT records as well
- for MMC's .slt and Autodock's .pdbqs or .pdbqt
files it provides an option
to adjust charges to result in integral residue charges.
- Perform a variety of data
conversions.
- Conversions changing file formats
- Rearrange atoms within a residue according to a
template. Atomnames within a residue can be sorted and selected
to conform to a prescribed template. This may be useful, for
example, when a trajectory file is created from configurations
that don't conform to the RTF file used by the program reading
the trajectory, or when the configuration contains extra atoms
(such as lone pairs). The atoms and their desired order is
specified by a template file and/or by
the RTF file (in Amber, Charmm or Gromacs formats only)
describing the residues in the desired order. Currently, template
files are provided for Amber, Amber94, and Charmm22. If the
residue is not found in the template or the number of atoms in
the template is less than the number in the structure or if the
correspondence between the two atomsets is not one-to-one then
that residue will be left unchanged. The names of the template
files (found in the installation directory)
are listed by the program. Again, they can be modified by the
user and placed in their own directory to allow the treatment of
non-standard, patched or other terminal residues.
- Convert to a different file syntax. Conversions to
most of the recognized input formats can be requested. The
following limitations apply to these conversions:
- If the input format was PDB, the charges will be left as
zero;
- Macromodel output may select generic atomtypes for the
potential function specification when it fails to recognize the
bonding pattern around an atom
- For Macromodel output, the program may not be able to
determine the proper bond orders. The bond orders will be
especially problematic if the input file does not contain
hydrogens
- InsightII .car files will have atom symbols for potential
types
- Can not convert to Grasp .crg format - use the Create a Grasp .crg file option for that
- Can not convert to Tripos .mol2 format
- Create an input configuration for the MMC Monte Carlo
program. The MMC Monte Carlo program's input requires
coordinates, charges and potential type information for each
atom. There are three options to establish atomtypes and partial
charges:
- For Macromodel input and Insight .car file input (assuming
that the latter has been prepared with AMBER potential types) the
program will use the Amber codes;
- For Charmm or Amber input the atom codes can be established
from the residue and atomnames from the Charmm PSF file or the
Amber top file corresponding to the input structure.
- If no PSF or top file is available,
a conversion files
can be used: one of amb_prot_ua.dat, amb_all.dat,
amb_all94.dat, and cha_all22.dat are provided
in the distribution or a file prepared by the user.
Terminal residues may have to be corrected manually,
though. The Charmm conversion file also has the protonated ASP
and GLU residue information, using residue names ASX and GLX,
respectively.
- For Charmm, Amber and Gromacs the conversion information can
be also extracted from multiple RTF file(s) - the user will be prompted
for the filenames.
- Convert trajectories.
This is currently limited to Charmm, Amber, MMC and Macromodel formats.
During conversions, the user has the option to
- rearrange the atoms within residues.
- place only selected configurations on the output file.
- center the solute molecule(s)
in a periodic cell.
- rotate the whole system for each frame around a selected axis
or by an inputted rotation matrix.
- Split an MMC trajectory generated in the grand-canonical ensemble
into Amber trajectories with fixed number of solvents.
- Create an input for Amsol 3.5. Amsol requires the
specification of atomtypes according to its own convention
besides the atomic coordinates and atomic numbers. This
functionality determines these codes based on the molecule's
connectivity table.
- Create an input for Gaussian's ONIOM option.
Besides the X, Y, Z coordinates of each atom,
this option requires the atomic number, the Amber atom type,
partial charge of the atoms as well as the treatment category (H, M, or L).
This option works analogously to the input for creating input for MMC:
it can use the built-in list for deducing types and charges from atom names
and/or topology file(s) in Amber, Charmm or Gromos/Gromacs format.
The H, M, and L regions can be selected by user provided lists or by
spheres of different radii drawn around a user-selected atom.
For the latter, option is provided to extend the list specified by the
inner spheres to avoid breaking non C-C bonds.
Optionally, it also produces a connectivity input, with bond orders
deduced from the coordinates, just like for conversion to Macromodel
format.
- List the positions of the various items in a coordinate
file record for formatted input types.
- Conversions changing atom and or residue names
- 'Regularize' PDB atomnames. Brookhaven specification
uses the first two characters of the name as the right-adjusted
chemical symbol and put trailing numeric digits to the place of
the first space for hydrogens. This functionality enforces this
convention.
- Undo PDB regularization.This functionality moves all
leading numeric characters found in atomnames to the end of the
name.
- Left-adjust atom names.
- Convert residue and atomnames from different
conventions. Unfortunately, PDB atomnames (and even file
syntax) follow many different conventions and are slightly
different in the various software systems in use. This
functionality modifies atom and residue names
based on the source of the names and the software system in use.
Conversion of the atom and residue names are governed by a file called pdb_nam.dat (found in the
installation directory). Residues or
atomnames not in the file will be left unchanged. At present, the
file contains data to interconvert protein and nucleic acid
residues for the Amber, Amber94, Charmm, Moil, IUPAC, Macromodel,
ChemX and PDB conventions. Conversions from Macromodel would
allow for dropping lone-pair 'atoms' and the change of names of
the form C0* to their corresponding PDB-type names only. As a
result, conversions to Macromodel convention have been disabled.
When converting to PDB convention, the one selected by InsightII
will be used. When converting from PDB convention, a list of
known variants will be examined (most of these variants were
compiled by Dr. D. Strahs). If there is a copy of the conversion file
pdb_nam.dat in the local directory (the directory Simulaid
is run from), it will be used instead, so users can add
additional conversions to the existing residues or add new
residues. A local copy of pdb_nam.dat can also be used for
making selected changes by eliminating or changing conversion
information from the 'master' file.
- Trajectory - structure file conversions
- Unpack a collection of
configurations into single files. This functionality takes a
large file containing a number of configurations and separates
them into single configurations.
There are several options of selecting the files to be unpacked.
An input file XXX.YYY will
result in files XXX.1.YYY, XXX.2.YYY, etc.
- Unpack a trajectory into
single configurations, either in a single file or in separate
files. The selection options and the
naming convention of the new files are the same as for
unpacking configurations (see above).
The format of the unpacked file can be specified by the user.
For PDB output, option is given mark eachs structure as MODEL.
- Convert (pack) a series of configurations into a trajectory
file. This option will create a Charmm, Amber or MMC binary
trajectory file and put the input configuration as well as all
subsequent configurations into this file as consecutive
snapshots.
When packing into MMC binary format, Simulaid will search for
energy information if the input structures are in PDB or Charmm CRD format
s follows:
- In Charmm .CRD files, Checking the title records for
the string '* energy=' or '* Energy=', or '* E=', followed by the energy
- In PDB files, Checking the REMARK records for the string
'REMARK energy=' or 'REMARK Energy=', or 'REMARK E=', followed by
the energy
- Edit the conformation
(translate/rotate, add/delete atoms, etc)
- Edit the configuration. Editing
allows to
- select (keep only one) chain/segment
- excise (drop only one) chain/segment
- keep only the backbone
- keep only the alpha carbons
- drop all aliphatic (bonded to carbon) hydrogens
(and add its charge to the carbon's)
- select atom range to keep
- select atom range to drop
- select residue range to keep
- select residue range to drop
-
Translate and/or rotate the inputted structure
- The following translations can be performed
- Translate by a user-specified vector.
- Translate by a user-specified vector and
reset the molecules into a
user-specified periodic cell.
- Center the solute molecule(s) in a user-specified periodic cell.
For more than one solute molecules, the images resulting in the most
compact solute will be selected.
Translation is useful when switching
between sytems with the origin at the corner (e.g., Amber) and at
the center (e.g., Charmm and MMC) or when a simulation
lets molecules move away from the periodic cell.
- Rotation can be performed
around a user-specified axis by a user-specied angle
or by a user-specified rotation matrix
- Modify the configuration. Modifying
allows to
- replace an atom
- add an atom
When replacing a terminal atom (i.e., on with only one bonded neighbor)
the bond length can be changed. When adding a new atom X, the program
will let the user establish a bonded sequence of heavy atoms,
R3-R2-R1-X and the user will have to specify the R1-X distance,
R2-R1-X angle and the R3-R2-R1-X torsion angle.
- Replace the coordinates of a configuration.
This option allows the replacement of just the coordinates in a structure
without changing the rest of the file.
The replacement can be
- simply a copy of an other structure
- the coordinates of the original structure translated and rotated to
obtain the best overlay to an inputted reference structure
The source of the coordinates or the reference structure can
be in any format except .mol2 for the system to be changed.
- Extract the CA carbons of the protein backbone into a PDB file
- Change the (solvent) water molecules to their gas
phase geometry. Minimizations or molecular dynamics may have been
run without restraining the solvents to their experimental
geometry. This option 'fixes' these solvents.
- Eliminate solvents outside a simulation cell.
Simulation setup may involve the 'soaking' of a solute in some
solvent bath. This option allows the carving out of a periodic
cell of any of the above mentioned types. The graphics window (SGI only)
shows the original system and the PBC cell outside which the
solvents will be eliminated. This image can also be manipulated
by keyboard commands.
- Reverse optimization.
This option transforms back the structure in the original position/orientation.
The translation/rotation information is read from the comment lines
in the input structure file as saved there by Simulaid.
- Create a full unit cell from the asymmetric unit.
This option extracts from the PDB file the transformation information that
defines the symmetry-related images of the asymmetric unit and performs
these transformations.
- Create biological oligomers from the asymmetric unit.
This option extracts from the PDB file the transformation information that
defines the symmetry-related images forming the biological oligomers
and performs these transformations.
- Gather all crystal contacts of the asymmetric unit.
This option gathers all copies of the asymmetric unit that are in contact
with the first copy.
Each copy may be related to the first asymmetric unit by
symmetry transformation and/or unit cell translation.
- Prepare a structure-derived file
(sequence, Charmm RTF, UHBD dat, etc)
- Extract the sequence list
from the inputted structure. For structures carrying residue
information this function will extract the list of the residues
in a user selectable format:
- Charmm segment definition format
- PDB (SEQRES) format
- PIR format (first line: title; subsequent lines:
one letter residue codes without any separator).
- GCG - the proprietary format of the Wisconsin GCG program
- Create a Grasp .crg file from RTF file(s)
- Create a UHBD .dat file from RTF and parameter file(s)
(for now, only in Charmm format)
- Print the centers of neigboring image cells and cell vertices.
The cell centers will written as a pdb file called imagecell.pdb
and the vertices in an other PDB file called pbcvertices.pdb,
complete with CONECT statements to draw the cell polygon's edges.
With the graphics option, the cell will be drawn on the screen as well.
- Generate torsion angle input for Monte Carlo
calculations:
- Torsion angle list in Macromodel (Bmin) format (general or
protein (i.e., with rigid peptide bond))
- Torsion angle list in
MMC format
for proteins
(backbone with rigid peptide bond and sidechains or sidechains only)
or for a general molecule.
- Create Charmm topology file residue records for
residues specified by the user.
When the input is in InsightII .car format, the atom types are
also converted, using the algorithm (and routines) of the program
Intocham.
- Perform a number of conformation analyses on the
solute.
The analyses can be done on the configuration read or on a trajectory
specified by the user.
When analyzing a trajectory, options are provided for
analyzing only selected configurations.
The time axis of plots of the evolution of various properties may be
labeled by the frame number, or in units of time (ps, ns, or ms).
The following analyses are implemented:
- Geometry/topology analyses
- Print a complete list of
neighbors, bond lenths, angles and torsions
and mark atoms that are chiral centers (in some rare cases
the algorithm may miss a subtle chirality).
- Print a complete list of torsion angles and the corresponding
1-4 distances
- Perform a functional group analysis that groups atoms into
functional groups, list these groups and prints the neighbors of
each member of each group
- Print the longest connected chain of the molecule
(one atom per line)
and for each atom prints the longest side chain.
- Print a complete list of bonds types and their lengths, angles and
their values as well as
statistics (min, max, average) of the lengths of bonds and
values on angles between various elements
- Allow the user to change the bond-threshold values for
selected atoms
- Finding and tracking various bonds
- Calculate hydrogen bonds:
- Specify the distance thresholds for hydrogen bonds
- Select atoms whose hydrogen bonds are to be tracked
- Print a list of hydrogen bonds with the geometric parameters.
For a trajectory,
- Plot the history of each hydrogen bond
(i.e., track when it was on or off)
- Calculate the frequencies of residue pairs being hydrogen bonded
and generate a 2D hydrogen-bond map.
- Optionally, calculate the correlation c(i,j)
between selected hydrogen bonds
- Optionally, cluster
these hydrogen bonds using (1-c(i,j))^e as the distance
function; the exponent e can be 1,2,3,... or 1/2, 1/3, 1/4, ...
- Calculate salt bridges:
- Specify the atom name and/or partial charge that can form salt bridge
- Specify the distance threshold for salt bridges
- Select atoms whose salt bridges are to be tracked
- Print a list of salt bridges with their length.
For a trajectory,
- Plot the history of each salt bridge
(i.e., track when it was on or off)
- Calculate the frequencies of residue pairs having a salt bridge
and generate a 2D salt-bridge map.
- Optionally, calculate the correlation c(i,j)
between salt bridges
- Optionally, cluster
these salt bridges using (1-c(i,j))^e as the distance
function; the exponent e can be 1,2,3,... or 1/2, 1/3, 1/4, ...
- Calculate hydrophobic bonds:
- Specify the carbon atom type that are considered hydrophobic
- Specify the distance threshold for hydrophobic bonds
- Select atoms whose hydrophobic bonds are to be tracked
- Print a list of hydrophobic bonds with their length.
For a trajectory,
- Plot the history of each hydrophobic bond
(i.e., track when it was on or off)
- Calculate the frequencies of residue pairs having a hydrophobic bond
and generate a 2D hydrophobic-bond map.
- Optionally, calculate the correlation c(i,j)
between hydrophobic bonds
- Optionally, cluster
these hydrophobic bonds using (1-c(i,j))^e as the distance
function; the exponent e can be 1,2,3,... or 1/2, 1/3, 1/4, ...
- Calculate heavy-atom contact bonds:
- Specify the distance threshold for the contact
- Select atoms whose contacts are to be tracked
For a trajectory,
- Plot the history of each contact
(i.e., track when it was on or off)
- Calculate the frequencies of residue pairs having a contact bond
and generate a 2D contact map.
- Search for hydrogen-bonded bridges between selected solute atoms.
The program asks for a list of solute atoms (specified by atom indices,
or atom names or protein backbone) to start looking for hydrogen-bonded
chains ending in the solute (maximum length considered: 4 bonds)
- Calculate atomic properties to be written on the
data column of the output structure file.
For PDB fomat, the temperature factor column is used,
for Insight and Macromodel the charge column.
- Get the hydrophobicity
(hydropathy) values of the residue the atom belongs to.
The user can choose among Kyte-Doolittle, Eisenberg or White's
scale or input a new one. A good resource (references, values)
for the large number of such scales is the paper
J. Cornette, K. B. Cease, H. Margalit, J. L. Spouge,
J. A. Berzofsky and C. DeLisi,
Hydrophobicity Scales and Computational Techniques for Detecting
Amphipathic Structures in Proteins,
J. Mol. Biol., 195, 659-685 (1987).
- Calculate circular variance of the solute atoms and solvent
molecules w.r.t. the solute
(REF. 3).
- Read in a
DELPHI
potential map (written in InsightII format) and calculate the
potential at the sites of the solute atoms (InsightII has a
DELPHI spectrum built in for coloring).
When a Delphi map is read, option is provided for calculating the
potential at a user-specified site and/or printing a PDB file
of the potentials at a selected subset of the map.
Potential in the vicinity of atoms can be set to zero (using
a user-specified threshold) or interpolated from grids farther.
- Calculate molecular properties
- Calculate a Monte Carlo estimate of the
volume of the solute's combined
vdW spheres, the volume of the shell region excluded by the solvent
(of user-specified radius), the volume of the first solvation shell
and, for multimolecular solutes the volume of the
interface beween the solute molecules
(the combined total and the pairwise values).
A region is considered interface if it is between two atoms on
different molecules and the solvation shells of the two atoms overlap.
For trajectories, a plot of the evolution and the distributions of the
excluded volume and first solvation shell volume is also given, as well as
averages and correlations among the volumes calculated.
- Calculate the principal axes of a selected solute molecule.
For trajectory scan, the angle between the initial and the current
axes are also calculated and plotted.
The initial structure can be set to either the first structure of the
trajectory or the structure inputted at th start.
The calculation of the principal axes can be restricted to a subset of the
atoms in the selected solute molecule.
- Calculate the radius of gyration,
hydrodynamic radius and - when partial charges are available - the
dipole moment of a selected solute molecule.
- RMSD's
- Calculate the RMSD between the reference structure and
the structures in a trajectory.
The atoms used for overlay and calculation of the RMSD can be
limited to a subset of the system (see the
Edit the configuration option).
- Calculate the RMSD map over a trajectory.
The calculations will give
- The RMSD between all pairs of structures in a trajectory,
printed and (optionally) visualized in a color-coded Postscript 2D map.
Option is given to avarege and condnese blocks in the plot.
- The average RMSD will also be
calculated as a function of the run length.
- The distribution of the MSD's, printed and (optionally)
displayed in a Postscript plot.
- The distribution of the MSD's
divided by the distribution from an uncorrelated random
walk (R^2 = c * N), printed and (optionally) displayed in a Postscript plot.
- The structure for which the largest RMSD between itself and
the rest of the structures is minimal
- The structure for which the sum of MSD's between itself and
the rest of the structures is minimal
- Optionally the program can cluster
the structures, using the calculated RMSD's as the distance measure.
The clustering can be also performed using the 2D RMSD values calculated
in a previous Simulaid run
(reading in the result file of extension .rd2).
The clustering is based either on a user-given distance threshold, or on
the user-defined requested number of clusters.
Representative members (to be considered at the center of the cluster)
of the cluster will be selected based on the following criteria:
- The structure with the smallest maximum RMSD with the rest
of the cluster members
- The structure with the smallest average RMSD with the rest
- The structure with the smallest energy within the cluster.
This option is only available if energy was read from the trajectory.
Currently, energies can be read from some of MMC formatted trajectories:
- Binary MMC trajectory
- MMC trajectory that is a set of Charmm .CRD files, having in the
title the string '* energy=' or '* Energy=', or '* E=', followed by
the energy
- MMC trajectory that is a set of PDB files, containing REMARK lines
of the form
'REMARK energy=' or 'REMARK Energy=', or 'REMARK E=', followed by
the energy
When the energy is available, for each cluster the average energy will also be
calculated and printed.
- Optionally, separate trajectories can be generated containing the frames
of the different clusters.
- Calculate the cross RMSD map between two trajectories
run on the same system. The calculations will give
- The color-coded map of the RMSD's
- The distribution of the RMSD's
- If requested, for each structure in the first trajectory
Simulaid will print the number of the one in the
second with the smallest RMSD and the same for the second trajectory.
Structure pairs that are mutully the closest ones are marked.
- If the 2D RMSD result of the two trajectories are available then
Simulaid can read in the .rd2 files of these calculation, the
result file (the file with .rdx extension) of the cross RMSD calculation,
perform a clustering on the .rd2 files and calculate the
overlap among the clusters of the two trajectory.
- Measure various distances
- Print a list of close contacts between solute atoms
- Perform an analysis of the adjaceny matrix defined between residues
based on the idea of
A. Schuyler et al. that looks at the column
sum of successive powers of the adjacency matrix and its generalization
that uses weighted edges.
- Calculate the distance between two selected atoms (including
PBC imaging, if requested).
When scanning a trajectory,
- option is given to use the first atom of the pair from the
inputted reference structure
- the evolution of the distance and its components are plotted
as function of simulation length.
- Find (protein) residue pairs that are in "contact";
one residue should be in a user-specified reference range
and the other in the user-specified neighborhood range.
Contacts are defined in three different ways:
- The distance between the
between representative atoms of the two residues considered
(C(alpha) by default (unless
the user specifies a different representative atom)
is within a certain threshold distance
- the distance between the pair of atoms
(one on each residue under consideration) nearest to each other
is within a certain threshold distance
- There is a pair of atoms (R,N) in the residue pair considered
such that R is closest to N when considering the whole
reference range
and N is closest to R when considering the whole
neighborhood range.
The distance list is followed by a nonredundant list of neigbour residues
as well as a nonredundant list of the non-neighbour residues.
- Check for potentially unphysical distances in the system
- For a set of pairs of atoms calculate the average distances and
their standard deviation and the distribution of the
distances between the atom pairs.
Optionally, atom pairs can be replaced by pairs of small clusters
(e.g., the hydrogens of a given methyl group);
the distance between such clusters is defined as the distance between the
closest pair of two atoms in different clusters.
- Calculate the protein backbone torsion angles Phi and Psi.
The following outputs can be obtained:
- A Ramachandran map of the protein backbone torsions.
For a single configuration, the points are color-coded continuously
with the residue number
on the rainbow scale red-yellow-green-cyan-blue,
and for trajectory scan they are color-coded with the frame number.
- A file with the calculated angles.
For trajectory scan, the distribution of angles in 60o
grids and the
PROSS
assignment is also printed.
- Optionally,
for residues selected by the user (currently up to 25),
the time-evolution of the angles is also shown both with dial plots
and with a path in the Ramachandran map and
plots of the autocorrelation function of the conformation of each residue
is also prepared (to quantify the rate of convergence).
- Calculate selected torsion angles (currently up to 49).
For analyzing trajectories, calculate
their average, CV. If requested, all pairs of angles will be used to calculate
the Pearson correlation coefficent and Fisher-Lee
circular correlation coefficients:
C(a,b) = Dab / (A2*B2)
where
Dab =
SUMNi=1 SUMNj=i+1
sin(ai-aj) * sin(bi-bj)
A = SUMNi=1 SUMNj=i+1
sin(ai-aj)2
B = SUMNi=1 SUMNj=i+1
sin(bi-bj)2
Also, prepare dial plots showing the evolution of the selected torsions.
- Perform the
analysis of proline kinks
as implemented in the program
ProKink
by Visiers et al.
- As an enhacement ot the original algorithm, option is porvided
to fit a plane to the proline ring and use the projection of the
alpha-carbon to this plane for the definition of the wobble and
phase shift angles.
- When reading a trajectory, the proline kink is animated in the
graphics window (SGI only) and the kink angles are plotted both in a second
window and onto a Postscript file.
- The psudorotation angle with the nitrogen as the apex
is also calculated and plotted for each structure.
- For a selected helix
(REF. 8),
- calculate its axis
- find and plot the angles between
the axis and the laboratory frame coordinate axes
- calculate the and plot radius if circle fitted to the alpha carbons
(a bend indicator) and provide a novel shape estimator.
- calculate and plot the length of the helix
- calculate and plot the average turn angle per residue.
When scanning a trajectory, for each frame
- calculate and plot the angle between the axis in the current and in the
reference structure (read in at the start of the run)
- calculate and plot the angle the helix was rotated around
the axis (again, compared to the reference structure).
The overall rotation of the protein can be applied to the reference
helix before the comparison.
- calculate and plot the displacement fo the helix center
compared to the reference structure.
The displacement of the protein's COM can be subtracted.
- calculate the angle between the normals of the plane fitted to the
reference structure and the current structure, rotate both normals
so that the reference planes normal is parallel to the Z axids and plot
the progress of the X-Y projection of the current (rotated) normal.
At the end of the trajectory scan the correlation coefficients among all pairs
of calculated properties will be calculated.
For properties involving two angles the
Fisher-Lee circular correlation coefficient
will also be calculated:
For properties involving one angle only the
Mardia linear-circular correlation coefficient will also be calculated:
C(a,x) = (r122+r122-2*r12*r13*r23)/(1-r232)
where
r12 = corr(sin(a),x); r13 = corr(cos(a),x);
r23 = corr(sin(a),cos(a))
- Calculate pseudorotation angles as defined by
Cremer & Pople
and for 5-membered rings also by
Altona & Sundaralingam.
- Find helices and sheets using the formalism implemented
in the program DSSP (
Kabsch & Sander).
When scanning a trajectory, a plot of the secondary structure
evolution is also prepared.
- Calculate and plot (both in the graphics window (SGI only) and onto a
Postscript file) the
circular variance map of the
solute:
The circular variance map of a macromolecule of N residues
is an NxN matrix:
CVi,j = [ SUMk between i and j
(ri,k/|ri,k|)]/|i-j+1| ,
while its weighted variant is defined as:
CVwi,j =
[ SUMk between i and j ri,k]/
[ SUMk between i and j|ri,k|]
- Calculate the the covariance and correlation matrices
of selected residues over a
trajectory. Both matrices are is printed and a color-coded map
is plotted for both.
Furtermore, if requested, the eigenvalues and eigenvectors of the
covariance matrix will also be calulated.
- Extract and tabulate results for selected properties form the large
output tables created by the script mm_pbsa.pl
that is part of the Amber suite of programs analyzing Amber trajectories.
Simulaid will recognize if the input tables contain residue-residue
contributions or residue-molecule contributions.
The table colums are the selected energy contribution terms, except for
the case when one property is selected to tabulate from from residue-residue
tables - in that case the table printed will have both rows and colums
as residues, the range of which is specified by the user.
In addition, correlations between selected properties can also be calculated.
- General clustering
- Cluster atoms. This option takes the input structure, and
cluster
the atoms based on their distances.
This function can be used for a data set representing a density distributions
- the clustering will delineate the regions of different density maxima.
- Cluster based on an input matrix
This option reads in a matrix representing distances and performs clustering
on it. The input is assumed to be in the following form:
Size of the matrix (N)
N times:
Row number, Row name
N numbers, representing one row of the matrix (separated by spaces)
This matrix can be transformed before clustering using several formulae:
- R(i,j) = 1 - R(i,j)
- R(i,j) = R(i,j)^N
- R(i,j) = R(i,j)^(1/N)
- R(i,j) = (1- R(i,j))^N
- R(i,j) = (1- R(i,j))^(1/N)
- Start logging the keyboard input.
When logging is turned on, the program writes all keyboard input on a file
that can be later used as input to repeat the run:
simulaid < <logfile>
Note that if the run created new files than 'replaying' the log file
will only work if these files are removed, or a line containing y
is inserted at the places where Simulaid would ask permission to overwrite
such files.
- Make the interactive quiz predictable.
In certain cases, Simulaid has extra questions
or omits some questions, depending on the data. This can confuse runs
that use a log file as input. To avoid this problem, when this option
is set, the extra qestions will assume default answers and the questions that
may be eliminated in some cases will not be.
- Animate a trajectory (only when compiled with the
C@GL comments removed)
II.RUNNING THE PROGRAM - INPUT
The program is run interactively: the user is first asked to select the
operation to be performed then (if needed) the
- name of the input STRUCTURE file. If the graphics code is
active the program opens a graphics window and displays the
molecule read. Input syntax can be
- (.CRD) Charmm CRD, either in the original (80 characters/atoms,
4-character names) to the extended format introduced with Charmm 32 (132
characters/atoms, 8-character names, 10-character integers)
- (.pdb) Original Protein Data Bank format - there are
several variants:
- Brookhaven format
- Charmm PDB (Charmm PDB differs from Brookhaven PDB mainly
in the placement of the segment ID)
- Autodock PDBQS - an extension of the PDB format containig
partial charge and solvation parameters (used by Autoock 3)
- Autodock PDBQT - an extension of the PDB format containig additional
partial charge (used by Autoock 4)
- Macromodel
- Macromodel/Xcluster (for more than one conformations, a
Macromodel format conformation is followed by only x,y,z
information as provided by Xcluster)
- (.slt) MMC solute description file format
- (.gro) Gromos/Gromacs coordinate file format
- (.mol2) Tripos Mol2 coordinate file format (only for input)
- (.car) Insight car format
- Free format as defined by the Insight format filenxyz.frm (number of atoms in the first line,
followed by atomic number and x,y,z coordinates in each of the
successive lines)
- Free format as defined by the Insight format filesxyz.frm (like nxyz.frm, except that chemical
symbols are expected instead of atomic numbers)
- Free format as defined by the Insight format filesxyzqr.frm (like sxyz, but appended by charge
and residue number).
- (.crg) Grasp charge file
- If the extension of the input file is neither of the extensions
specified above the program asks the
format of the input data.
- type of calculation to be run,
- name of the output file.
Whenever relevant, it asks the
- simulation cell shape. The
following (periodic) boundary conditions are recognized:
- cubic
- rectangular
- hexagonal prism (both regular or irregular hexagon),
with user-specified coordinate axis along the prism axis
and with user-specified coordiante axis going through the hexagon vertex
(e.g., X axis along the prism axis and vertices
of the hexagon are on the Z axis - as used by MMC)
- face-centered cubic
- truncated octahedron, X axis normal to a square face - as used by Charmm
- truncated octahedron, X axis normal to a hexgon face - as used by Amber
- hexagonal close-packed
- image-file based
- sphere (used only for trimming solvents)
- no boundary conditions
- residue name of the solvent (default: HOH for PDB,
WTR for Insight and TIP3 otherwise)
- number of solvent atoms for solvents other than TIP3,
WTR or HOH.
- For runs optimizing position or orientation it offers the option of excluding
hydrogens.
- Sphere optimization asks the width
of the extra solvent layer - this width will be used for
volume calculations and (when relevant) excluding excess
solvents.
- Orienting run asks the minimum
image-image distance to be achieved. The final cell size will
be determined with this value in sight.
- Trajectory conversions will use the
input configuration to define the order of the atoms in the input
trajectory and may be given a second configuration file to
establish the target trajectory's atom order.
Creation of a Grasp .crg file involves specifying the RTF
file name(s). Note that the Grasp format limits atom names to
four characters and residue names to three characters. If either
of these limits is exceded, the program will truncate the names
and print a warning. Also, the program requires a valid 'input'
filename, but it will be ignored.
Animation runs (only when the graphics code is active, of
course) require the specification of the trajectory file name,
the range and frequency of sampling and the frequency of
pausing for manipulations.
Each solute segment/chain is colored differently (using
six colors, cyclicly repeated:
1:red; 2:green; 3:yellow; 4:blue; 5:pink; 6; cyan),
solvents are colored white.
For simulations with variable box size the smallest and largest
PBC boxes are also drawn.
For multisegment solutes one segment can be highlighted.
After each pass option is
given to repeat the animation and change the value for the
length of waiting time between showing two frames.
Each time the animation pauses the display can also be
manipulated by keyboard commands. The
length of wait is defined in a hardware dependent arbitrary unit.
There is also an option to reorient the system to the orientation
of the initial configuration (useful for animating Monte Carlo
runs).
For most runs with conversions or with
cleanup the program asks the number of
additional structures to process in the same way. This allows
the processing of a batch of files with one run. For Charmm
format output, the segment id of the additional structures will
be appended by 1,2, etc., as long as there is 'room' for this
additional information in the four-digit sequence id (i.e., the
original segment id contained enough blanks).
Additional input
structures may be either in the same file or in separate files -
you will be asked which. Each file may contain several structures
- you will be asked how many. If they are in separate files, and
the first file was called ether A.B or A.1.B then
the remaining files have to be called A.2.B, A.3.B,
etc. Similar naming convention applies to the output files.
The output file(s) will contain the modified molecule. Except
for runs requesting a transformation of syntax, it will use the
same syntax type as the input file. The header comments for
Charmm and PDB formats will be extended with a description of the
transformation(s) done by the program and/or the results of the
optimization.
If the graphics code is active, some functionalities allow the
rotation of the molecule in the graphics window
(keyboard commands).
Running Simulaid on a batch of files
Simulaid has been succesfully run from a c-shell script on a batch of files.
The script requires a copy of all the answers to the interactive quiz
durig the Simulaid run. This can be obtained by running Simulaid
and turn on on input logging.
It is advisable to set the input
predictable in this case.
This list has to be put into the
script file between the simulaid << fin command and the
terminator string fin. In the example below all .pdb
files in the directory are converted from PDB atom/residue name
convention to Charmm's convention.
Also, the generation of the list of files to be processed has
to be adapted to each case.
#! /bin/csh -f
#
# script to convert from PDB name convention to Charmm convention
#
set files = `ls *.pdb`
# The variable $file lists all the .pdb files in this directory
foreach file ($files)
set file1 = `echo $file | sed s/\.pdb/_ch\.pdb/`
# Generated a new file name from the priginal .pdb file name to use as output
echo Reading $file and writing file $file1
simulaid << fin >> junk
p
n
f
$file
b
$file1
y
9
3
fin
end
#rm junk
exit
III. THE OUTPUT ON THE TERMINAL
The terminal output will include the title cards and the
number of atoms found in the input file (the number of hydrogens
also if they are not to be considered) for all types of runs.
- For enclosing sphere calculation:
-
The initial radius found based on the center determined by
shifting along the X, Y, and Z axes by
(X(min)+X(max))/2, (Y(min)+Y(max))/2,
(Z(min)+Z(max))/2, respectively, where
X(min), X(max) are the smallest and largest X
coordinates, etc.;
- The optimized radius;
- The volume of the initial and optimized sphere and the
number of waters required to fill it in.
- The volume of the initial and optimized spheres increased by
the water layer width and the number of waters required to fill
it in.
- The vector shifting the molecule to the center of the
smallest sphere.
- If solvents were present in the input file, the number of
them remaining that is within the layer of specified width around
the optimized sphere.
- For minimizing the bounding box:
- The initial bounding box sixe
- Messages describing the progress of optimization.
- The smallest bounding boxes found in each minimization try
- The optimal bounding box sixe
- The vector shifting the molecule's COM to (0,0,0) and
the rotation matrix to the optimal orientation
- For orientation optimization:
- The shortest distance between atoms on different images in
the original and in the new orientation. The optimization will be
performed starting from the input orientation as well as from a
user-defined number of different random orientations.
- Messages describing the progress of optimization.
- The cell parameter(s), like the edge lengths, corresponding
to the cell shape used in the calculation.
- The cell volume and the number of waters required to fill it
in (without the solute present).
- The vector shifting the molecule's COM to (0,0,0) and
the rotation matrix to the optimal orientation
- For solvent geometry fixing calculations:
- The number of solvents found.
- For elimination of waters outside a simulation cell:
- The number of waters kept and the number of non-water atoms
found.
- The cleanup operations, performed
automatically at the end of the run or at special request. Note,
that automatically invoked cleanup operations start the residue
and atomnumbers from one.
- The number of atoms found detached from their residue
- A message if resequencing was or was not needed
- Conversion output
- For each type of conversion a variety of warnings or error
messages may be obtained indicating that the input file does not
conform to some assumptions or the templates used are not
prepared for the particular input. These messages should be
heeded.
- For residue-name bearing input a sequence list can be written
to a file.
- The analysis option provides a menu from which a choice
should be made and ask if trajectory scan is requested.
After the analysis requested is performed
the program returns to this menu allowing the user to do an other one
or to stop.
If additional configurations are read from the input file,
the last analysis will be repeated.
- Each type of analysis prints the name of the output file
(generated from the input file name by adding a special suffix)
where the requested information is written.
- Various statistics generated over the lists are also printed
on the terminal.
Furthermore, several internal consistency checks are
implemented to increase the chance of detecting any error that
might have remained in the program. Failing such a test results
in a message preceded by PROGRAM ERROR. This always indicates a
bug in the program and should be brought to the attention of the
author, together with the input file that was processed when the
error occurred.
IV. FILE FORMATS
- The name conversion file
pdb_nam.dat is of the following syntax:
- Part 1: nconv,ncol (free format)
- nconv: Number of name conventions included.
- ncol: number of data columns in the file.
- Part 2: 8-character names of the conventions (in one line):
"Amber ", "Amber94 ", "Charmm ",
"Moil ", "IUPAC ",
"Macromod","ChemX ", " PDB "
PDB has to be the last convention.
- Part 3: Data for residue name conversions. Each line has
ncol entries.
- The first entry is a generic residue name.
- Entries two to nconv+1 are the actual residue
names corresponding to possible variants in the various
conventions.
- Entries from nconv+2 to ncol are possible
additional PDB variants. A blank entry means that conversions to
that convention of the corresponding atoms will retain the
original name. An entry of "****" means that conversions to that
convention will result in dropping those atoms.
- This section is concluded with a line for residue name
"DONE".
-
Part 4: Data for atom name conversions. Each line has ncol
entries. The first entry is a generic residue name as defined in
Part 3. Entries two to nconv+1 are the atom names
corresponding to the variants in the various conventions.
Entries from nconv+2 to ncol are possible
additional PDB variants. Entries for PDB varians have to conform
to the PDB convention requiring that atom names for elements with
single-character chemical id be in the second position -
leaving the first one either blank or filled with a number (e.g.,
"_OD1" or "2HA1").
There are a number of entries with blank generic residue names
- they represent names that convert to each other for any
residue. This section is also concluded with a line for residue
name "DONE".
- The template file for the orders of
the atoms within each residue has 8-character entries in the form
"RESN ","ATNM ". The current
versions have been directly generated from the RTF files of Amber
and Charmm.
- The file containing the
information needed to convert to MMC Monte Carlo input has
8-character entries "RESN ",
"ATNM ", "PFCD ",
"Charge","GROUPa b" where
- RESN is the residue name
- ATNM is the atom name
- PFCD is the left-adjusted atom-type name
- Charge is the partial charge given with decimal point
- GROUPa b gives information about the carous charge groups
within a residue. For each group, a=1 for the first atom od the
group and a=0 for the rest; b is the sequence number of the group
within the residue (i.e., b=1,...,b=2,..., etc).
Again, they have been extracted from the RTF files of Amber
and Charmm, respectively. Note that the atom order template file
has the same syntax as the MMC conversion file for the residue
and atomnames thus an MMC conversion file can be also used for
atom order template file.
V. INSTALLATION
The distribution contains the source code and some data files.
Installation of the program requires the following steps:
- Putting these data files into a directory referred to as the
installation directory
(/hosts/fulcrum/homes/molmod/src/simulaid/ in the
distribution file).
- Changing in the block data
- the string MYHOST to a 7-character name of your host
- the string MYPATH to the path of the directory the
conversion files are installed
- the fifth number (00) to the number of characters in /MYPATH/
- Removing the C@**
(i.e., deleting the first four characters in the line)
comments from lines corresponding to your environment
in the subroutine datprt (near the end of the source code) for
the printing of the date on output files as follows:
- "" - SGI IRIX Fortran compiler
- "" - g77 Fortran compiler
- "" - Absoft Fortran compiler
- "" - Intel Fortran compiler
- "" - Hewlett-Packard Fortran compiler
- "" - IBM AIX Fortran compiler
If the program is to be compiled without the graphics
interface (i.e., not to run on an SGI graphics workstation),
simply compile the program with Fortran, e.g.,:
f77 -o simulaid.bin simulaid.f
to obtain the
executable simulaid.bin
- If the program is to be compiled with Fortran95 then there
is a preprocessor in the distribution f77tof95.f and f77tof95_f95.f (the
second one is the Fortran95 version of the first one) that changes
the syntax to conform to Fortran95 requirements.
- If the program is to be run on an SGI graphics workstation
then the compilation has to be preceded by uncommenting this part
of the code (make sure that the graphics libraries are installed
on your system):
cat simulaid.f | sed 's/C@GL//' > simulaid_tmp.f
f77 simulaid_tmp.f -o simulaid.bin -lfgl -lgl rm
simulaid_tmp.f
to obtain the executable
simulaid.bin
The distribution directory also contains the files
nxyz.frm, sxyz.frm and sxyzrq.frm that give
the decription of the data format for some of the InsightII
free-formats. The Charmm image files for the non-rectangular
boundary conditions, hexy.img, hexz.img,
fcc.img, to.img, and hcp.img are also given.
VI. CHANGING DIMENSIONS
The sizes of the arrays are established with parameter statements
throughout the code. Several symbols user used for this purpose.
There are certain relations between these syembols,
so changing one of them is likely to require changes in some others.
Below is a list of these symbols and their relations that have to be
maintained (the program check for violations).
IMPORTANT: Parameter statements for any symbol occur several places
in the program. When a change is required, it has to be carried out
at ALL occurences!
- MAXPHI: maximum number of
Delphi gridpoints
Also, MAXPHI3 is the memory used for 2D-data - see below
the parameter definitions depending on MAXPHI
- MAXREC: Maximum number of records to read
> maximum number of atoms
must be divisible by 20
- MAXRSD: Maximum number of residues
- MAXNEIG: Maximum number of neighbors of an atom in the neighbor list
(7+2*MAXNEIG)*MAXREC < MAXPHI3
- MAXBONDS: Maximum number of bonds (hydrogen, hydrophobic or salt bridge)
- MAX2D: maximum number of frames for the
2-D RMSD map
< MAXBONDS
4*MAXBONDS+MAX2D+2*MAX2D2 < MAXPHI3
- MAXCV: maximum number of residues for the
CV-based domain map
8*MAXCV+5*MAXCV*2 < MAXPHI3
2*MAXRCORR*2+16*MAXRCORR < MAXPHI3
- MAXCONN: maximum number of residues for the
residue distance map
3*MAXCONN*2-11*MAXCONN < MAXPHI3
The current version uses MAXREC=169980, MAXRSD=49999, MAXNEIG=25,
MAXPHI=258, MAX2D=2900, MAXCV=800, MAXRCORR=2800 .
Valid values for a larger version (ca. 1.1 Gb) are
MAXPHI=640, MAX2D=10000, MAXCV=800, MAXRCORR=10000.
In additions to the symbols above, there are other size parameters
whose value can be chosen independently (the current value is in braces):
- MAXFRAMES {25000}: the number of trajectory frames to store for analysis
- MAXCOPY {120}: the number of different data (e.g., torsion angles) per frame
to store for analysis
- MAXHX {50}: the maximum number of residues per protein helix
- MAXBRDIGEATOM {5000}: Maximum number of bridge anchor atoms:
- MAXBRDIGETYPE {500}: Maximum number of bridge destination atoms:
- MAXBRIDGELEN {4}: Maximum number of H bonds in a bridge:
- MAXDDISTR {200}: Maximum number of atom pairs to calculate the distance distribution
- MAXDDBIN {20}: The number of distance bins in the distance distributions
- MAXCDLIST {2000}: The total number of
atom pairs that can be involved
in distance distribution calculation
REFERENCES
1. M. Mezei, Optimal Position of the Solute
for Simulations,
J. Comp. Chem. 18:812-815.
Abstract (free) and Full Text (with subscription)
2. M. Mezei, Finding of the smallest
enclosing cube to improve molecular modeling,
Information Newsletter for Computer Simulation of Condensed
Phases, CCP5, Daresbury Lab., No 47, (2000).
3. M. Mezei, A new method for mapping macromolecular topography,
J. Molec. Modeling and Simul.,
21, 463-472 (2003).
4. C. Altona and M. Sundaralingam,
J. Am. Chem. Soc., 94, 8205 (1972).
5. D. Cremer and J.A. Pople,
J. Am. Chem. Soc., 97, 1354 (1975).
6. W. Kabsch and C. Sander,
Dictionary of protein secondary structure:
Pattern recognition of hydrogen-bonded and geometrical features.
Biopolymers, 22, 2577-2637 (1983).
7. I. Visiers, B.B. Braunheim, and H. Weinstein,
Prokink: a protocol for numerical evaluation of helix
distortions by proline.
Prot. Engrg., 13, 603-606 (2000).
8. M. Mezei and M. Filizola,
TRAJELIX: A computational tool for the geometric characterization of
protein helices during molecular dynamics simulations.
J. Computer-Aided Molecular Design, 20, 97-107 (2006).
9. Computational Methods for Predicting Sites of Functionally Important Dynamics
Adam D. Schuyler, Heather A. Carlson, and Eva L. Feldman,
J. Phys. Chem. B,113, 6613-6622 (2009)
DOI: 10.1021/jp808736c