SARAMA

A standalone suite of programs to plot the distribution of residues from a globular protein in the Complementarity Plots (for Linux)

This documentation (html) file is also provided in the downloadable package (In the sub-directory: HELP)

Requirements:

Linux platform (tested on Red Hat Enterprise Linux and openSUSE)

Fortran 90 Compiler or higher (tested for f90, f95, gfortran, ifort)

Perl 5.8.8 or higher

'dos2unix' and 'display' must be installed and available on the command line

DelPhi must be pre-installed on the system and executable under the command: delphi_static

Hydrogen atoms must be added (geometrically fixed) to the input PDB file with REDUCE prior to the calculation

Installation:

gunzip SARAMA.tar.gz

tar -xvf SARAMA.tar

cd SARAMA/

The directory contains 3 shell scripts ( ./install.csh, ./surfNelcomp.csh, ./clean.csh ), sub-directories ( DOC, SRC, HELP, EXEC, LIBR, TESTPDBS ) and a brief guideline (USAGE) for the installation and run commands

SRC contains all source codes (FORTRAN 90 programs, PERL scripts and shell scripts), which are compiled and copied to EXEC upon a successful run of ./install.csh and are cleaned up after a successful run of ./surfNelcomp.csh

TESTPDBS contains a few examples of the input PDB file (in the correct format) required for a successful run of ./surfNelcomp.csh

To initiate the installation, execute the following commands

dos2unix install.csh

chmod +x install.csh

./install.csh [FORTRAN90_Compiler_name, e.g.,f90]

The main shell script (./surfNelcomp.csh) is now ready to be used

./install.csh HAS TO BE EXECUTED PRIOR TO EVERY RUN OF ./surfNelcomp.csh

AFTER AN INCOMPLETE OR ABNORMAL (INTERRUPTED) TERMINATION OF ./surfNelcomp.csh, BE SURE TO RUN ./clean.csh AND THEN ./install.csh PRIOR TO THE NEXT RUN OF THE PROGRAM

Input-Output:

Input:

PDB file in Brookhaven format (tabulated at the end of this page) with a .pdb (lowercase) extension
(headers / comments may or may not be present)

The filename can have any number of characters but must not contain any '.' (or blank space " ") other than the '.' in the extension (.pdb)

e.g., 2haq.pdb, deca.pdb, 1234.pdb, x1c3.pdb, 1abc-wc.pdb, 2haq-00001.pdb etc.
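
The naming rule above can be checked before a file is submitted. The following is a minimal sketch in Python, not part of SARAMA; the function name is hypothetical, and it assumes a bare file name (no directory part) and that an empty name before the extension is not acceptable.

```python
def is_valid_input_name(filename):
    """Return True if a bare file name follows the naming rule stated above."""
    # The only '.' allowed is the one in the lowercase '.pdb' extension.
    if not filename.endswith(".pdb") or filename.count(".") != 1:
        return False
    stem = filename[: -len(".pdb")]
    # Assumption: the part before the extension must be non-empty.
    # Blank spaces are not allowed anywhere in the name.
    return bool(stem) and " " not in stem


for name in ["2haq.pdb", "1abc-wc.pdb", "2haq.v2.pdb", "my file.pdb", "2haq.PDB"]:
    print(name, is_valid_input_name(name))
```

The first two names pass; the remaining three fail because of an extra '.', a blank space and an uppercase extension, respectively.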

The PDB file must not contain more than 500 amino acid residues

The PDB file must consist of only a single polypeptide chain

Residue sequence numbers (for any and all residues) must be restricted to 3 digits (1-999)

Residue sequence numbers must not contain any non-integer characters (e.g., 42A, 78B, etc.)

Atoms must not have multiple occupancies

Only the isolated metal ions listed (in the appropriate format) in the table below will be considered;
all other hetero atoms will be ignored

The program will automatically reject input pdb files in any of the following cases:

1. Incorrect Filename (as specified above)
2. Files including non-naturally occurring amino acids (with record id 'ATOM')
3. Files containing more than one polypeptide chain (oligomeric proteins)
4. Files with no Hydrogen atoms
5. Files with Hydrogen atom types inconsistent with REDUCE format ('H' in the 12-12 column range, e.g., HD11, HE21)
6. Files containing more than 500 residues
7. Files containing residue sequence numbers exceeding 3 digits (999)
8. Files containing redundant residue identities for the same residue position
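
Some of these rejection criteria can be screened in advance. The sketch below is an illustration in Python and not the check that SARAMA itself performs: it covers only cases 3, 4, 6, 7 and 8, reads the standard PDB column positions, and treats any 'ATOM' name that starts with 'H' (after optional leading digits) as a hydrogen. The helper make_atom_line exists only to build sample input.

```python
def precheck_pdb(lines, max_residues=500, max_resno=999):
    """Return a list of detected problems; an empty list means none were found."""
    problems = []
    chains = set()
    identity = {}  # residue number field -> residue names seen at that position
    has_hydrogen = False
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # HETATM records, headers and comments are not examined
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        chains.add(line[21:22])
        # Residue number together with the insertion code column (e.g. '42A').
        resno = (line[22:26] + line[26:27]).strip()
        identity.setdefault(resno, set()).add(res_name)
        if atom_name.lstrip("0123456789").startswith("H"):
            has_hydrogen = True
    if len(chains) > 1:
        problems.append("more than one chain")
    if len(identity) > max_residues:
        problems.append("more than %d residues" % max_residues)
    for resno, names in identity.items():
        if not resno.isdigit() or not 1 <= int(resno) <= max_resno:
            problems.append("residue number not an integer in 1-%d: %s" % (max_resno, resno))
        if len(names) > 1:
            problems.append("several residue identities at position " + resno)
    if not has_hydrogen:
        problems.append("no hydrogen atoms")
    return problems


def make_atom_line(name, res, chain, resno, icode=" "):
    """Build a sample ATOM record (illustration only)."""
    return "ATOM  %5d %-4s %3s %1s%4d%1s   %8.3f%8.3f%8.3f" % (
        1, name, res, chain, resno, icode, 0.0, 0.0, 0.0)


sample = [make_atom_line(" N  ", "MET", "A", 1), make_atom_line(" N  ", "GLY", "B", 2, "A")]
for problem in precheck_pdb(sample):
    print(problem)
```

A file that passes this screen can still be rejected by the program, for example for non-natural amino acids or hydrogen names that do not follow the REDUCE convention.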

Output:

A directory OUT$fn will be created where $fn is the input filename without the extension (e.g., OUT2haq for 2haq.pdb) and all the output files will be saved in it.

Textfiles:

Description	File name with extension	Format
The formatted PDB file	$fn.pdb	standard Brookhaven
The residue sequence file	$fn.res	col1: resno-restype
Metal coordination profile (if isolated metals are present)	$fn.mcores	col1: metal_ion; col2: <=>; col3: coordinating_residue; col4: N_ca
Solvent Accessibility (burial) profile	$fn.bury	col1: resno; col2: restype; col3: SAA (Ang²); col4: burial
The van der Waals surface file	$fn-surf.pdb	standard Brookhaven
Surface Complementarity profile	$fn.Sm	col1: resno; col2: restype; col3: burial; col4: S_m^all; col5: S_m^sc; col6: S_m^mc
Electrostatic Complementarity profile	$fn.Em	col1: resno-restype; col2: burial; col3: E_m^all; col4: E_m^sc; col5: E_m^mc
Joint Complementarity profile for buried residues	$fn.CSplot	col1: resno; col2: restype; col3: burial; col4: S_m^sc; col5: E_m^sc
Residue listing in the three CPs (whether lying in the 'probable', 'less probable' or 'improbable' regions of CP1, CP2, CP3)	$fn-comp.cb (CP1), $fn-comp.pb (CP2), $fn-comp.pe (CP3)	col1: resno; col2: restype; col3: burial; col4: S_m^sc; col5: E_m^sc; col6: P_grid; col7: CP-status
Complementarity and Accessibility scores (see descriptions below)	$fn.CS	CS_l=x1; rGb=y1; P_count=z1; P_Sm=w1; P_Em=v1;
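
For scripted post-processing, the output location and the score line of $fn.CS can be handled as sketched below. This is a minimal Python illustration, not part of SARAMA; the function names are hypothetical, and it assumes that every value in the file is a plain number (no '%' sign or other unit). The P_count value in the sample line is made up for the example.

```python
import os


def output_paths(pdb_filename):
    """Return (output directory, path of the .CS file) for an input file name."""
    fn = pdb_filename[: -len(".pdb")]   # $fn: input file name without extension
    out_dir = "OUT" + fn                # e.g. OUT2haq for 2haq.pdb
    return out_dir, os.path.join(out_dir, fn + ".CS")


def parse_cs(text):
    """Parse 'CS_l=x1; rGb=y1; ...;' into a dictionary of floats."""
    scores = {}
    for item in text.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            scores[key.strip()] = float(value)  # assumption: plain numeric values
    return scores


print(output_paths("2haq.pdb"))
print(parse_cs("CS_l=2.24; rGb=0.055; P_count=5.0; P_Sm=-0.855; P_Em=-1.492;"))
```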

Postscripts:

Description	File name with extension
Distribution of residues in CP1	$fn-cb.ps
Distribution of residues in CP2	$fn-pb.ps
Distribution of residues in CP3	$fn-pe.ps

Table Abbreviations and Footnotes:

$fn : input filename without the extension
resno : residue sequence number
restype : amino acid identity
resno-restype : e.g., '100-LEU', ' 45-TYR'
N_ca : Number of coordinating atoms (of the metal-ligated residue)
SAA : Solvent Accessible Area (in Angstrom²)
For S_m^all, S_m^sc, S_m^mc and E_m^all, E_m^sc, E_m^mc, see Reference
P_grid : Grid Probabilities
CP-status : whether the point lies in the probable, less probable or improbable region of the plot.

The van der Waals surface file (in PDB format, dot surface points named according to atom identity) can easily be viewed in RasMol (preferred display mode: wireframe off, spacefill 30) or any other molecular graphics program

Run Commands:

general usage : ./surfNelcomp.csh -inp [PDB_filename]

example : ./surfNelcomp.csh -inp 2HAQ.pdb

For help : ./surfNelcomp.csh -help

Provision for single residue calculations:

usage : ./surfNelcomp.csh -inp [PDB_filename] target ['residue_sequence_number'-'residue_identity' (in uppercase)]

example : ./surfNelcomp.csh -inp 2HAQ.pdb target 51-PHE
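
When the program is driven from a script, the order of commands described on this page (./install.csh before every run, ./clean.csh first after an interrupted run) can be assembled as follows. This is a hedged Python sketch: plan_commands and format_target are hypothetical helper names, the commands are only built and printed, and the argument layout is copied from the usage lines above.

```python
def format_target(resno, restype):
    """Build the single-residue target string, e.g. '51-PHE' (uppercase identity)."""
    return "%d-%s" % (resno, restype.upper())


def plan_commands(pdb_filename, compiler, target=None, previous_run_failed=False):
    """Return the command lines, in order, for one run of surfNelcomp.csh."""
    commands = []
    if previous_run_failed:
        commands.append(["./clean.csh"])           # recover from an interrupted run
    commands.append(["./install.csh", compiler])   # required prior to every run
    run = ["./surfNelcomp.csh", "-inp", pdb_filename]
    if target is not None:
        run += ["target", target]                  # single-residue calculation
    commands.append(run)
    return commands


for cmd in plan_commands("2HAQ.pdb", "gfortran", target=format_target(51, "phe")):
    print(" ".join(cmd))
```

Each list could then be passed to subprocess.run(cmd, check=True) from inside the SARAMA directory, so that a failing step stops the sequence.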

Description and Rationalization of the scores: (provided as a PDF)

Criteria for successful validation:

A structure should simultaneously attain values higher than the thresholds for both global scores given below:
CS_l : 0.80
rGb : 0.011
Structures scoring below the threshold in either of the two scores need re-investigation

In addition, the local score P_count should be below 15%,
and similar thresholds apply to the scores based on S_m and E_m alone:
P_Sm should be above -1.017
P_Em should be above -1.789
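
The criteria can be applied mechanically to a set of scores. The sketch below is an illustration in Python under two assumptions: the function name is hypothetical, and P_count is supplied as a percentage (the unit in which the value is stored in $fn.CS is not stated here).

```python
# Lower bounds taken from the criteria above; scores must be strictly higher.
THRESHOLDS = {"CS_l": 0.80, "rGb": 0.011, "P_Sm": -1.017, "P_Em": -1.789}
P_COUNT_MAX_PERCENT = 15.0  # P_count must stay below this value


def failed_criteria(scores, p_count_percent):
    """Return the names of the scores that do not meet their threshold."""
    failed = [name for name, limit in THRESHOLDS.items() if scores[name] <= limit]
    if p_count_percent >= P_COUNT_MAX_PERCENT:
        failed.append("P_count")
    return failed


example = {"CS_l": 0.65, "rGb": 0.030, "P_Sm": -0.900, "P_Em": -1.500}
print(failed_criteria(example, p_count_percent=8.0))
```

An empty list means that all criteria are met; any listed name marks a score that calls for re-investigation of the structure.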
========================================================
Average Scores for correctly folded native proteins (DB2):
Standard deviations in parentheses
CS_l: 2.24 (+-0.48), rGb: 0.055 (+-0.022), P_Sm: -0.855 (+-0.054), P_Em: -1.492 (+-0.099)
========================================================
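
As a numerical cross-check, the distance of each threshold from the corresponding average can be expressed in units of the standard deviation. The short Python sketch below uses only the numbers quoted on this page. That the thresholds fall a whole number of standard deviations below the averages is an observation from these numbers, not a statement of how they were derived; the description of the scores remains the authority.

```python
averages = {  # score: (average, standard deviation) for native proteins (DB2)
    "CS_l": (2.24, 0.48),
    "rGb": (0.055, 0.022),
    "P_Sm": (-0.855, 0.054),
    "P_Em": (-1.492, 0.099),
}
thresholds = {"CS_l": 0.80, "rGb": 0.011, "P_Sm": -1.017, "P_Em": -1.789}


def sd_below_average(name):
    """Number of standard deviations by which the threshold lies below the average."""
    mean, sd = averages[name]
    return (mean - thresholds[name]) / sd


for name in averages:
    print(name, round(sd_below_average(name), 2))
```

The result is 3.0 for CS_l, P_Sm and P_Em, and 2.0 for rGb.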

The following isolated metal ions are considered in the calculations:

Metal_ion	PDB NOMENCLATURE	CONVERTED INTO
Fortran format:	atom-res (12x,a4,1x,a3)	atom-res (13x,a3,1x,a3)
Na+	'NA NA'	' NA1 SOD'
Mg+2	'MG MG'	' MG2 MAG'
Al+3	'AL ALF'	' AL3 ALF'
K+	' K K'	' K_1 POT'
Ca+2	'CA CA'	' CA2 CAL'
Mn+2	'MN MN'	' MN2 MNG'
Mn+3	'MN MN3'	' MN3 MNG'
Fe+2	'FE FE2'	' FE2 IRN'
Fe+3	'FE FE'	' FE3 IRN'
Co+2	'CO CO'	' CO2 COB'
Co+3	'CO 3CO'	' CO3 COB'
Ni+2	'NI NI'	' NI2 NIC'
Ni+3	'NI 3NI'	' NI3 NIC'
Cu+	'CU CU1'	' CU1 COP'
Cu+2	'CU CU'	' CU2 COP'
Zn+2	'ZN ZN'	' ZN2 ZNC'
Ag+	'AG AG'	' AG1 SLV'
Cd+2	'CD CD'	' CD2 CDM'
Pt+2	'PT2 TPT'	' PT2 PLT'
Au+	'AU AU'	' AU1 GLD'
Au+3	'AU AU3'	' AU3 GLD'
Hg+2	'HG HG'	' HG2 MRC'
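
The conversion in the table amounts to a lookup keyed on the PDB residue name of the metal ion. The Python sketch below only illustrates that lookup with the names listed above. It is not the code used by the program (which is written in Fortran and Perl), the names METAL_CONVERSION and convert_metal are hypothetical, and the exact column spacing of the converted records is not reproduced.

```python
# PDB residue name -> (converted atom name, converted residue name)
METAL_CONVERSION = {
    "NA": ("NA1", "SOD"), "MG": ("MG2", "MAG"), "ALF": ("AL3", "ALF"),
    "K": ("K_1", "POT"), "CA": ("CA2", "CAL"), "MN": ("MN2", "MNG"),
    "MN3": ("MN3", "MNG"), "FE2": ("FE2", "IRN"), "FE": ("FE3", "IRN"),
    "CO": ("CO2", "COB"), "3CO": ("CO3", "COB"), "NI": ("NI2", "NIC"),
    "3NI": ("NI3", "NIC"), "CU1": ("CU1", "COP"), "CU": ("CU2", "COP"),
    "ZN": ("ZN2", "ZNC"), "AG": ("AG1", "SLV"), "CD": ("CD2", "CDM"),
    "TPT": ("PT2", "PLT"), "AU": ("AU1", "GLD"), "AU3": ("AU3", "GLD"),
    "HG": ("HG2", "MRC"),
}


def convert_metal(pdb_res_name):
    """Return the converted (atom, residue) names, or None if the ion is not listed."""
    return METAL_CONVERSION.get(pdb_res_name.strip())


print(convert_metal(" ZN"))
print(convert_metal("HOH"))
```

A result of None corresponds to a hetero atom that is not in the list and is therefore ignored.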

Water coordinates, if present in the input PDB file, will be trimmed,
since water and surface-bound ligands are modeled as bulk solvent

If atoms or patches of residues are missing in the input PDB file, the (S_m, E_m) values may not be reliable

The PDB file must contain Hydrogen coordinates consistent with the REDUCE format

Atom and residue types must be consistent with the Brookhaven (PDB) format

Field No.	Column range	FORTRAN FORMAT	Description
1.	1-6	A6	Record ID (e.g., ATOM, HETATM)^*
2.	7-11	I5	Atom serial number^#
-	12-12	1X	Blank
3.	13-16	A4^**	Atom name (e.g., " ND1")^*
4.	17-17	A1	Alternative location code (if any)^#
5.	18-20	A3	Standard 3-letter amino acid code for residue^*
-	21-21	1X	Blank
6.	22-22	A1	Chain identifier code^#
-	23-23	1X	Blank
7.	24-26	I3	Residue sequence number^*
8.	27-27	A1	Insertion code (if any)^#
-	28-30	3X	Blank
9.	31-38	F8.3	Atom's x-coordinate^*
10.	39-46	F8.3	Atom's y-coordinate^*
11.	47-54	F8.3	Atom's z-coordinate^*
12.	55-60	F6.2	Occupancy value for atom^#
13.	61-66	F6.2	B-value (thermal factor)^#
-	67-67	1X	Blank
14.	68-70	I3	Footnote number^#

Table Footnotes:

^# These fields might be left blank (preserving the specified format)
^* Mandatory fields
^** An alternative 3-letter atom code (Fortran format: 1X,A3) in the column range 13-16 (i.e., leaving column 13 blank) is also accepted.
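
The column layout in the table maps directly onto fixed-width string slicing. The following Python sketch writes and reads one ATOM record according to the column ranges above. The function names are hypothetical, the footnote field (columns 68-70) is omitted, and optional numeric fields that are left blank are returned as None.

```python
def format_atom(serial, name, resname, chain, resno, x, y, z, occ=1.0, bfac=0.0):
    """Write one record as A6,I5,1X,A4,A1,A3,1X,A1,1X,I3,A1,3X,3F8.3,2F6.2."""
    return "%-6s%5d %-4s%1s%3s %1s %3d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        "ATOM", serial, name, " ", resname, chain, resno, " ", x, y, z, occ, bfac)


def parse_atom(line):
    """Read the fields of one record; column N of the table is index N-1 here."""
    def optional_float(field):
        return float(field) if field.strip() else None

    return {
        "record": line[0:6].strip(),      # columns 1-6
        "atom": line[12:16],              # columns 13-16, kept with its spacing
        "altloc": line[16:17],            # column 17
        "resname": line[17:20],           # columns 18-20
        "chain": line[21:22],             # column 22
        "resno": int(line[23:26]),        # columns 24-26
        "icode": line[26:27],             # column 27
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": optional_float(line[54:60]),  # columns 55-60, may be blank
        "bfactor": optional_float(line[60:66]),    # columns 61-66, may be blank
    }


line = format_atom(1, " ND1", "HIS", "A", 42, 27.340, 24.430, 2.614)
print(line)
print(parse_atom(line))
```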

COMPUTATIONAL TIME:

Both S_m and E_m are computed over dot surface points, so the calculations are computationally intensive.
A PDB file containing ~100 residues takes about 12 minutes on a Dell workstation (Red Hat Enterprise Linux platform)