Mount Sinai School of Medicine, NYU
New York, NY 10029
Sept. 21, 2009.
The screening of a library of ligand by Autodock (Version 3.0.5 or Version 4) requires
Furthermore, the programs in this package have to be compiled with a Fortran compiler (e.g., g77). To obtain an executable form the source file program.f type (replace g77 with your compiler's name, if different)
>g77 -O4 -o program program.f
I. Ligand preparation The programs ans scripts used for praparation make sure that there is a single directory containing files with a single molecule each, with likely valid partial charges.
I.1. Prepare individual .mol2 or .pdb* files from a single file The program splitmol (written by D.A. Gschwend and adapted to xlf by M. Mezei) reads in a file containing several .mol2 or .pdb* structures and creates several files with a limited number (e.g., one) structures each. A sample run looks like this:
% splitmol Running splitmol DAG/MM v.2.16... 24-Oct-0 This program supports command-line arguments. Type splitmol -h for a brief description. Use PDB-type or MOL2 file (<P>/M)? m Enter name of the MOL2 file: ../split_test.mol2 Enter name to add to the molecule number: xxx Enter starting and ending molecule numbers to extract. (Enter 0 0 to do all, enter a negative number to use a list file.) 0 0 Split this file into groups of how many molecules? 1 Label by absolute or relative molecule # (<A>/R)? a Output file names will be of the form <number>.xxx.mol2 All molecules in input will be extracted. Each molecule will be put into an individual file. Absolute molecule numbers will be used. Working... Execution completed. 2 molecules written. 2 file(s) created. %
I.2. Filter the ligand directory
The C-shell script filtermol2.csh filters files by the following criteria:
A sample run looks like this:
% filtermol2.csh
Filtering a directory of .mol2 files - Version 09/26/2008
This script requires the following files to be present in this directory:
The executable program prepmol2
The executable program filtermol2
Name of the input directory containing the ligand .mol2 files=../mol2_files_t
Name of the directory containing the filtered ligand .mol2 files=TEST
Checking for filtermol2
Checking for prepmol2
Maximum molecular weight to keep=250
Current directory: /Users/mezei/autodock4/fullscreen
Number of files found in ../mol2_files_t : 52
Number of files kept: 47
%
A file filtermol2.log will contain detailed information about
the filtering of each ligand.
I.3. Get a list of Autodock 4 atom types used in a ligand library
The C-shell script get_typlist.csh reads all the ligand structures in a directory, extracts a list of Autodock 4 atom types and creates a file all.pdbqt with atoms having all the atomtypes found - this file will be needed for the screening run.
Running get_typlist.csh requires the executable get_typlist and the template file all0.pdbqt. It also uses Python and thus has to set the path to it. Current implementation runs on Faradis only,
A sample run looks like this:
% get_typlist.csh
Gather atom types used in this library - written by Mihaly Mezei
Version 10/22/2007 - Autodock 4
Running Thu Oct 25 11:23:52 EDT 2007
Running on faradis.mssm.edu
Host specific information (to be customized):
MGLROOT = /Library/MGLTools/1.4.5
pythonutil = /Library/MGLTools/1.4.5/MGLToolsPckgs/AutoDockTools/Utilities24
End host specific information
Name of the ligand database file contaning the .mol2 files=../mol2_files_t
/private/var/automount/Users/mezei/autodock4/mol2_files_t
Number of files found in ../mol2_files_t : 3
get_typlist: file 000007.pdbqt opened OK
get_typlist: file all0.pdbqt opened
Read 26 lines from 000007.pdbqt
Read 24 lines from all0.pdbqt
Types found so far= C OA
Types found so far= C OA A Cl NA HD
get_typlist: file all0.pdbqt opened as new
mol2 file No: 1
get_typlist: file 000008.pdbqt opened OK
get_typlist: file all.pdbqt opened
Read 34 lines from 000008.pdbqt
Read 24 lines from all.pdbqt
Types found so far= C OA A Cl NA HD
Types found so far= C OA A Cl NA HD N
get_typlist: file all.pdbqt opened as new
mol2 file No: 2
get_typlist: file 000009.pdbqt opened OK
get_typlist: file all.pdbqt opened
Read 21 lines from 000009.pdbqt
Read 24 lines from all.pdbqt
Types found so far= C OA A Cl NA HD N
mol2 file No: 3
get_typlist: file 000011.pdbqt opened OK
get_typlist: file all.pdbqt opened
Read 23 lines from 000011.pdbqt
Read 24 lines from all.pdbqt
Final all.pdbqt file (copied to /private/var/automount/Users/mezei/autodock4/alkynes):
REMARK 3 active torsions:
REMARK status: ('A' for Active; 'I' for Inactive)
ROOT
ATOM 1 C1 <0> d 1.327 3.455 -0.398 0.00 0.00 0.036 C
ATOM 2 O2 <0> d 1.420 1.980 -0.001 0.00 0.00 -0.032 OA
ATOM 3 C1 <0> d 2.376 1.251 -0.914 0.00 0.00 0.031 A
ATOM 4 C12 <0> d 3.741 1.077 -0.220 0.00 0.00 0.035 Cl
ATOM 5 N5 <0> d 4.048 2.342 0.567 0.00 0.00 0.013 NA
ATOM 6 H19 <0> d 3.145 2.387 1.810 0.00 0.00 0.008 HD
ATOM 7 N9 <0> d 1.805 1.812 1.483 0.00 0.00 0.013 N
ATOM 8 C8 <0> d 0.592 2.429 2.198 0.00 0.00 0.027 C
ATOM 9 C9 <0> d -0.610 1.949 1.340 0.00 0.00 0.006 C
ATOM 10 C10 <0> d -0.017 1.420 0.010 0.00 0.00 0.240 C
ATOM 11 C13 <0> d 3.178 3.801 2.370 0.00 0.00 0.033 C
ATOM 12 C14 <0> d 4.601 4.036 2.906 0.00 0.00 0.070 C
ATOM 13 C15 <0> d 5.624 3.720 1.831 0.00 0.00 -0.059 C
ATOM 14 C16 <0> d 6.588 4.614 1.625 0.00 0.00 -0.092 C
ATOM 15 C17 <0> d 7.646 4.378 0.641 0.00 0.00 0.383 C
ATOM 16 O2 <0> d 8.375 5.279 0.269 0.00 0.00 -0.452 OA
ATOM 17 C18 <0> d 7.797 2.971 0.104 0.00 0.00 0.029 C
ATOM 18 C19 <0> d 6.408 2.416 -0.189 0.00 0.00 0.030 C
ATOM 19 C20 <0> d 5.503 2.449 1.031 0.00 0.00 0.172 C
ENDROOT
TORSDOF 0
Processed 23 .mol2 files
%
I.4. Replace the .mol2 partial charges with AM1-calculated charges
The C-shell script gausscharge.csh runs Gaussian for all ligands in a directory, and replaces the partial charges with the result of the Mulliken population analysis of their AM1 wavefunction. Optionally, the conformation can be minimized with Gaussian. Running gausscharge.csh requires the C-shell script mol2togauss.csh and the executables prepmol2, gausstomol2, mol2togauss. mol2togauss prepares the input file for Gaussian and gausstomol2 extracts the charges from the Gaussian output file and replaces the values in the .mol2 file.
The programs mol2togauss and gausstomol2
can be run independently. The first command-line argument is
the full name of the .mol2 file.
mol2togauss takes the second argumnt as an indicator for
optimization: it will set up input for an optimization run if the
first character is 'y' or 'Y'. For example,
% mol2togauss test.mol2 yes
will create an input file test.mol2.g99
from the structure in test.mol2
asking for optimization.
% gausstomol2 test.mol2
will replace the charges and coordinates of test.mol2
by the charges and coordinates calculated by Gaussian
(read from the Gaussian output file test.mol2.g99out) and write
a new mol2 file called test.mol2.am1.
A sample run looks like this:
% gausscharge.csh Name of the directory containing the ligand .mol2 files=hits_mol2 Name of the directory to put the converted .mol2 files=hits_mol2_am1 First molecule to use [1]= Last molecule to use [1000000]= Name of the Gaussian executable [gaussian]= Is your Gaussian version 03 or higher? (y/n)? n Do you want to remove all Gaussian files (y/n)? n Number of CPUs to use [10]=4 Do you want to optimize the structure as well (y/n)? n Number of CPUs to be used: 4 Current directory: /hosts/fulcrum/home/mezei/MOLMOD/autodock/morph Converting ligands in directory hits_mol2 to directory hits_mol2_am1 Number of files found in hits_mol2 : 2 Trying file 5236332.1.mol2 Calling prepmol2 for ligand No 1 5236332.1.mol2 Prepmol2: file 5236332.1.mol2 opened OK Prepmol2: file 5236332.1.mol2.new opened OK Number of lines= 87 number of atoms= 37 number of bonds= 39 Number of substitutions= 0 Number of atoms dropped= 0 number of bonds dropped= 0 Charge sum= 0.00000 ls: No match. Starting conversion for 5236332.1.mol2 mol2togauss: file 5236332.1.mol2 opened OK mol2togauss: file 5236332.1.mol2.g99 opened OK MOLECULE 5236332.1.mol2 Number of atoms= 37 Charge sum= 0.00000 User = mezei Host = pepi WARNING: Reserve directory /hosts/pepi/reserve/mezei is not found /hosts/pepi/scr/mezei will be used instead more WARNING: You do not even have a scr directory here, the current directory: /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph will be used instead Input file: 5236332.1.mol2.g99 GAUSS_EXEDIR = /hosts/fulcrum/homes/molmod/src/g99 GAUSS_ARCDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph GAUSS_SCRDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph job 5236332.1.mol2 done File 5236332.1.mol2 opened OK File 5236332.1.mol2.am1 opened OK File 5236332.1.mol2.g99out opened OK Sum of rounded charges= -2.6822090E-07 Adjusting the charge of atom 1 with -2.6822090E-07 Finished writing to file 5236332.1.mol2.am1 37 atoms Trying file 5464817.1.mol2 Calling prepmol2 for ligand No 2 5464817.1.mol2 Prepmol2: file 5464817.1.mol2 opened OK Prepmol2: file 5464817.1.mol2.new opened OK Number of lines= 103 number of atoms= 45 number of bonds= 47 Number of substitutions= 0 Number of atoms dropped= 0 number of bonds dropped= 0 Charge sum= 0.00000 ls: No match. Starting conversion for 5464817.1.mol2 mol2togauss: file 5464817.1.mol2 opened OK mol2togauss: file 5464817.1.mol2.g99 opened OK MOLECULE 5464817.1.mol2 Number of atoms= 45 Charge sum= 0.00000 User = mezei Host = pepi WARNING: Reserve directory /hosts/pepi/reserve/mezei is not found /hosts/pepi/scr/mezei will be used instead more WARNING: You do not even have a scr directory here, the current directory: /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph will be used instead Input file: 5464817.1.mol2.g99 GAUSS_EXEDIR = /hosts/fulcrum/homes/molmod/src/g99 GAUSS_ARCDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph GAUSS_SCRDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph job 5464817.1.mol2 done File 5464817.1.mol2 opened OK File 5464817.1.mol2.am1 opened OK File 5464817.1.mol2.g99out opened OK Sum of rounded charges= -4.0026009E-04 Adjusting the charge of atom 1 with -4.0026009E-04 Finished writing to file 5464817.1.mol2.am1 45 atoms %
The screening script fullscreen.csh requires individual .mol2 or .pdbqt files in a single directory. It also requires a sample file all.pdbq (for Autodock 3) or all.pdbqt (for Autodock 4) containing all the atom types that are used in the library.
There are placeholders in the script to specify the path to the
The script fullscreen.csh asks the user to specify
fullscreen.csh creates a directory macro_dock and a log file macro.log, and calls the script screenlist_loop_3.csh or screenlist_loop_4.csh for Autodock 3 or 4, resp. These shell scripts prepare the input for Autogrid, run it and for each ligand prepare the input file for Autodock and run it. They use the executable prepmol2 and a system-dependent script submit_*.csh. For systems using the Sun grid engine the shell script dockit_gridengine.csh will work with Autodock4.
Currently, screenlist_loop_3.csh can be run on the SGI systems served by Fulcrum or on Faradis, on other systems the path to the Autodock executables have to be changed. screenlist_loop_4.csh runs on systems wuth the Sun grid engine (e.g., Faradis and Nagini), on TACC/Ranger using the launcher utility and on any Unix/Linux system using a single CPU. For other queing systems it has to be extended.
The script runs Autogrid on a single compute noder. Once the grid is ready, it submits the allowed number of docking jobs to run on different CPU's. It creates a directory runcount where each job creates a file when starting the run and deleting it when the run ends. The number of files in runcount control the submission of subsequent jobs. Note, that the number of CPU's to be used can be modified during the run - see the instruction printed in the sample run below.
There is also an option to run Autogrid and the various scripts preparing the input for the docking runs (.pdbqt and .dpf files for each ligand) but not running Atutodock. Using this option allows the perparation of a system that can be run on a different system (e.g. on a supercomputer) that has only Autodock installed, but not the script libraries.
This option is complemented by the option of skipping all preliminaries and assume that all grid map (created my Autogrid) and docling input files are already present and just perform the docking runs.
A sample run looks like this:
% fullscreen.csh
Automated screening using Autodock
Written by Mihaly Mezei - version 10/18/2009
Will the grid have more than 128 points in any direction (y/n)? n
Server is Nagini - distributed memory system using the Sun grid engine
Normal size grid
Name of the macromolecule file (without the .pdbqs or .pdbqt)=PAH2_m
PAH2_m.pdbqt found - run will use Autodock 4
Do you have flexible residues (y/n) [n]?
This script requires the following files to be present in this directory:
The executable program prepmol2
The executable program checkresnum
The sample structure file with all atom types all.pdbqt
The awk scripts mod_gpf.awk
The csh scripts add_dpf.csh
The c-shell script screenlist_loop_4.csh
The c-shell script dockit_gridengine.csh
Checking checkresnum
Checking prepmol2
Checking mod_gpf.awk
Checking mod_dpf.awk
Checking screenlist_loop_4.csh
Checking dockit_gridengine.csh
Checking residue numbers in file PAH2_m.pdbqt
Residue number check - Version 09/23/08
Number of CPUs to use [20]=75
NOTE: the number of CPUs can be changed by creating a file in this directory
called PAH2_m.NEWNCPU containing the new number of CPUs and nothing else
First molecule to use [1]=
Last molecule to use [99999999]=
Number of dockings per ligand (50 or more recommended) [50]=100
RMSD tolerance for clustering (in A) [1.0]=2
Maximum number of GA energy evaluations (0 torsions) [250000]=
Maximum number of GA energy evaluations (1-2 torsions) [500000]=
Maximum number of GA energy evaluations (3-5 torsions) [1000000]=
Maximum number of GA energy evaluations (6-10 torsions) [2000000]=
Maximum number of GA energy evaluations (11- torsions) [3000000]=
Maximum number of GA generations [27000]=
Maximum number of GA populations [200]=
Ligand charge option
Add Gasteiger charges : g
Add Kollman charges : k
Keep input charges : i [g]
X[Grid center] (in A) [0]=-1
Y[Grid center] (in A) [0]=21
Z[Grid center] (in A) [0]=9
Grid spacing (in A) [0.375]=
gridspace set to 0.375
Number of gridpoints in the X direction (even number)=96
Number of gridpoints in the Y direction (even number)=96
Number of gridpoints in the Z direction (even number)=96
Do you want to ignore symmetry for ligand pose clustering (y/n) [n]?
Do you want to print all members of the clusters (y/n) [n]?
Do you want to drop files with names of the form H.M.T
where only the M part differs (y/n) [n]?
Do you want to partition the number of dockings for files of the form H.M.T
where only the M part differs (y/n) [y]?
Minimum number of docking attempts per copy [20]=
Name of the directory containing the ligand files=../CHEMBR_chir_opt_pdbqt
Are the ligand files already in .pdbqt form (instead of .mol2) (y/n) y
Do you want to only run the setup but skip docking for now (y/n) [n]? n
Do you want to remove all ligand .mol2, .pdbqt, .dpf files (y/n) [n]? y
All ligand .mol2, .pdbqt, .dpf files will be removed
Number of CPUs to be used: 75
Calculations will be logged on file PAH2_m.log
Docking ligands in directory ../CHEMBR_chir_opt_pdbqt to macromolecule PAH2_m.pdbqt
RMSD tolerance for clustering: 2 A
Files with names of the form H.M.T where only the M part differs will
share the number of docking attempts
Minimum number of docking attempts per copy= 20
Number of dockings/ligand: 100
Maximum number of GA energy evaluations (0 torsions)=250000
Maximum number of GA energy evaluations (1-2 torsions)=500000
Maximum number of GA energy evaluations (3-5 torsions)=1000000
Maximum number of GA energy evaluations (6-10 torsions)=2000000
Maximum number of GA energy evaluations (11- torsions)=3000000
Maximum number of GA generations: 27000
Maximum number of GA populations: 200
Grid center at < -1 , 21 , 9 >
Number of gridpoints in the x, y, and z direction: 96, 96, and 96
Grid spacing = 0.375 A
Gasteiger charges will be added to the ligand
Grids will be generated for atom types (based on all.pdbqt):
1 C
2 OA
3 A
4 N
5 HD
6 Cl
7 NA
8 SA
9 P
10 F
11 Br
12 S
13 I
The rm command (to delete a file) is aliased to
For proper functioning of this script, the alias will be removed
thus files will be removed without confirmation.
OK to submit the run (y/n)? y
[1] 23784
%
In this example, the script recognized our host and used the built-in
values for the platform-dependent information. It your host is not recognized
by the script, you will we prompted for those data.
The script fullscreen.csh keeps track of the jobs running via files in the directory runcount. If a job aborts then the file corresponding to this job (that would be deleted when the job exits normally) may not be deleted. The script cleanruncount.csh checks for such files and offers to delete them.
II.3.Changing the number of CPU's used
The number of CPUs used by a job (i.e., the number of ligands docked simulatanously on different CPU's) can be changed by creating a file in the directory running the job called macro.NEWNCPU containing the new number of CPUs and nothing else. The requested change is executed by attrition: the next job will only be submitted when the currently running jobs number less than the new limit.
These functions include the sorting, filtering and extracting of the results, as well as some house cleaning.
III.1. Gather a list of docking log files
The script getdir.csh looks into the directory of docking log files and extracts the name of the grid parameter file and of all the logfiles. This information is written into a file macro.dir and used by the program dockres.
A sample run looks like this:
% getdir_new.csh Name of the macromolecule file (without the .pdbq*)=cbx_h cbx_h.pdbqt found Existing cbx_h.dir file is removed Directory of .dlg files: cbx_h_dock - OK (y/n)? y Number of files found in cbx_h_dock : 2114 Number of cbx_h .dlg files found in cbx_h_dock : 208 %
III.2. Sort, filter and extract docked poses
The program Dockres gathers the top binders and diplays a variety of statistics, both on the ligand set and on the top binding poses. It writes the result in a file with extension .res and, if requested, extracts docked poses into .pdb files. Since it can be run independently of these utilities, it has a separate documentation.
III.3. Combine results of different screening runs
The program compare_pose reads the .res files created by dockres from docking of the same ligand library to different conformations of the same macromolecule and for each ligand appearing on the top-scoring list gathered by dockres collects the binding free energies and the multiplicities of each hit.
Ligand ID's of the form nnnnnn_aa, nnnnnn_bb are assumed to be stereoisomers of the same molecule. If such ligands are found, the user are given the option of lumping stereoisomers together for calculating the averager free-energy estimates.
Running the program requires the specification of the names of the .res files. A sample run looks like this:
% compare_pose
Name of the (next) .res file (w/o the .res)=pcafpm
Name of the (next) .res file (w/o the .res)=pcafpl
Name of the (next) .res file (w/o the .res)=
Files read:
1 pcafpm.res
2 pcafpl.res
FE 1 lig FE 2 lig FE
-10.6 10817 -12.4 10815
-10.4 1865 -11.8 2029
-10.1 11209 -11.5 2944
-10.0 5784 -11.4 11995
-10.0 2945 -11.3 7831
-9.9 2945 -11.2 5782
-9.9 10817 -11.2 1864
-9.8 5784 -11.1 2944
-9.7 11213 -11.1 2817
-9.6 7833 -11.0 6629
-9.6 1865 -10.8 1864
-9.6 11997 -10.8 2944
-9.6 11209 -10.8 7831
-9.6 12316 -10.6 3793
-9.6 10817 -10.6 10815
-9.4 11213 -10.5 1864
-9.4 9441 -10.4 11207
-9.4 2411 -10.4 2029
-9.3 2945 -10.4 11207
-9.3 3795 -10.4 11995
Ligand 5791272 # 10815 file#= 2 pose= 4 rank= 1
Contact res: 83(THR ) fe= -12.4 multiplicity=43
Ligand 5791272 # 10815 file#= 2 pose= 6 rank= 15
Contact res: 42(GLU ) fe= -10.6 multiplicity=43
Ligand 5791272 # 10815 = -11.5 # of hits= 2 Distribution: 0 2
# of poses= 86 Distribution= 0 86
Ligand 5348595 # 5782 file#= 2 pose= 2 rank= 6
Contact res: 83(THR ) fe= -11.2 multiplicity=15
Ligand 5348595 # 5782 = -11.2 # of hits= 1 Distribution: 0 1
# of poses= 101 Distribution= 0 101
..............
%
(output is truncated)
The ligands are sorted by the average free energy of all poses on all targets.
III.4. 'House-cleaning' of the docking log directory
When a directory contains a large number of files the ls command in the C shell is known to fail. To circumvent this, there are three scripts that clean, copy, compress and uncompress files in a directory, irrespective of the nuber of files there:
clean_dock_dir.csh and
compressdir.csh
accept the directory name as an argument, e.g.,
% compressdir x_dock
will process all files in the directory x_dock.
If no argument is given, each script will prompt for the directory name
to clean, compress or uncompress.