SARAMA

A standalone suite of programs to plot the distribution of residues from a globular protein in the Complementarity Plots (for Linux)

This documentation (html) file is also provided in the downloadable package (in the sub-directory HELP)

Requirements:

  • Linux platform (tested on Red Hat Enterprise Linux / openSUSE)

  • Fortran 90 Compiler or higher (tested for f90, f95, gfortran, ifort)

  • Perl 5.8.8 or higher

  • 'dos2unix' and 'display' must be installed and runnable from the command line

  • DelPhi must be pre-installed on the system and runnable with the command: delphi_static

  • Hydrogen atoms must be added (geometrically fixed) to the input PDB file by REDUCE prior to the calculation
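
A quick way to confirm that the external tools listed above are reachable is to look them up on the PATH. The following is only a minimal sketch: the executable names "reduce" and "gfortran" are assumptions (the REDUCE binary and the Fortran compiler may be named differently on a given system), and the check does not verify the Perl version.

```python
import shutil

# Names taken from the Requirements list. "reduce" and "gfortran" are
# assumed executable names; adjust them to the local installation.
REQUIRED_TOOLS = ["perl", "dos2unix", "display", "delphi_static", "reduce", "gfortran"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```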

  • Installation:

  • gunzip SARAMA.tar.gz
  • tar -xvf SARAMA.tar
  • cd SARAMA/
  • The directory contains 3 shell scripts ( ./install.csh,  ./surfNelcomp.csh,  ./clean.csh ), sub-directories ( DOC, SRC, HELP, EXEC, LIBR, TESTPDBS ) and a brief guideline (USAGE) for the installation and run commands
  • SRC contains all source code (FORTRAN 90 programs, Perl scripts and shell scripts), which upon a successful run of ./install.csh will be compiled and copied to EXEC, and will be cleaned up after a successful run of ./surfNelcomp.csh
  • TESTPDBS contains a few examples of the input pdb file (in correct format) required for a successful run of ./surfNelcomp.csh

  • In order to initiate installation, execute the following commands:

  • dos2unix install.csh
  • chmod +x install.csh
  • ./install.csh   [FORTRAN90_Compiler_name, e.g.,f90]

  • The main shell script (./surfNelcomp.csh) is now ready to be used

    ./install.csh  HAS TO BE EXECUTED PRIOR TO EVERY RUN OF  ./surfNelcomp.csh

    AFTER AN INCOMPLETE OR ABNORMAL (INTERRUPTED) TERMINATION OF  ./surfNelcomp.csh,  RUN  ./clean.csh  AND THEN  ./install.csh  PRIOR TO THE NEXT RUN OF THE PROGRAM
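
The ordering rules above (install before every run, clean first after an interrupted run) can be captured in a small wrapper. This is a sketch only: plan_run and execute are hypothetical helper names, the default compiler name and the directory name "SARAMA" are assumptions, and the one-time dos2unix / chmod steps are not repeated here.

```python
import subprocess

def plan_run(pdb_file, compiler="gfortran", previous_run_failed=False):
    """Return the commands, in order, needed for one run of surfNelcomp.csh."""
    steps = []
    if previous_run_failed:
        # After an interrupted run, clean up before reinstalling.
        steps.append(["./clean.csh"])
    # install.csh has to precede every run of surfNelcomp.csh.
    steps.append(["./install.csh", compiler])
    steps.append(["./surfNelcomp.csh", "-inp", pdb_file])
    return steps

def execute(steps, sarama_dir="SARAMA"):
    """Run each command inside the SARAMA directory; stop at the first failure."""
    for argv in steps:
        subprocess.run(argv, cwd=sarama_dir, check=True)

for argv in plan_run("2haq.pdb", compiler="f90", previous_run_failed=True):
    print(" ".join(argv))
```

Calling execute(plan_run(...)) would actually launch the scripts; printing the plan first makes it easy to review the sequence.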

    Input-Output:

  • Input:

    PDB file in Brookhaven format (tabulated at the end of this page) with .pdb (lowercase) extension
                  (headers / comments may or may not be present)
  • The filename can have any number of characters but must not contain any '.' (or blank space "  ") other than the '.' in the extension (.pdb)

  • e.g., 2haq.pdb, deca.pdb, 1234.pdb, x1c3.pdb, 1abc-wc.pdb, 2haq-00001.pdb etc.

  • PDB file must not contain more than 500 amino acid residues

  • The PDB file should consist of only a single polypeptide chain

  • Residue sequence numbers (for any and all residues) must be restricted to 3-digits (1-999)

  • Residue sequence numbers must not contain any non-integer characters (e.g., 42A, 78B)

  • Atoms must not have multiple occupancies

  • Only isolated metal ions listed (in appropriate format) in the table below will be considered
    and other hetero atoms will be ignored

  • The program will automatically reject input pdb files in any of the following cases:

    1. Incorrect Filename (as specified above)
    2. Files including non-naturally occurring amino acids (with record id 'ATOM')
    3. Files containing more than one polypeptide chain (oligomeric proteins)
    4. Files with no Hydrogen atoms
    5. Files with Hydrogen atom types inconsistent with REDUCE format ('H' in the 12-12 column range, e.g., HD11, HE21)
    6. Files containing more than 500 residues
    7. Files containing residue sequence numbers exceeding 3 digits (999)
    8. Files containing redundant residue identities for the same residue position
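
The rejection rules above can be approximated with a pre-flight check before launching a run. The sketch below is not SARAMA's own validation: preflight and check_filename are hypothetical helpers, the column positions follow the PDB format table at the end of this page, the residue list is assumed to be the 20 standard amino acid codes, and the hydrogen-name test (names starting with 'H', versus older names starting with a digit) is a simplifying assumption.

```python
import os
import re

STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

def check_filename(path):
    """True for '<stem>.pdb' where the stem has no '.' and no whitespace."""
    return re.fullmatch(r"[^.\s]+\.pdb", os.path.basename(path)) is not None

def preflight(path, lines):
    """Return a sorted list of problems found; an empty list means none."""
    problems = set()
    if not check_filename(path):
        problems.add("incorrect filename")
    chains, residues, has_hydrogen = set(), {}, False
    for raw in lines:
        if not raw.startswith("ATOM"):
            continue  # HETATM and header records are not checked here
        line = raw.rstrip("\n").ljust(54)
        name = line[12:16].strip()                      # columns 13-16
        altloc, resname, chain = line[16], line[17:20], line[21]
        resno, icode = line[22:26].strip(), line[26]    # columns 23-26, 27
        if resname not in STANDARD_RESIDUES:
            problems.add("non-standard residue in ATOM record")
        if altloc != " ":
            problems.add("multiple occupancies (alternate locations)")
        if not resno.isdigit() or len(resno) > 3 or icode != " ":
            problems.add("residue number not a plain integer in 1-999")
        if residues.setdefault(resno, resname) != resname:
            problems.add("redundant residue identity at one position")
        if name.startswith("H"):
            has_hydrogen = True
        elif name[:1].isdigit() and "H" in name:
            problems.add("hydrogen names not in REDUCE style")
        chains.add(chain)
    if len(chains) > 1:
        problems.add("more than one polypeptide chain")
    if len(residues) > 500:
        problems.add("more than 500 residues")
    if not has_hydrogen:
        problems.add("no hydrogen atoms")
    return sorted(problems)
```

A typical call would be preflight("2haq.pdb", open("2haq.pdb")), printing any returned messages before running ./surfNelcomp.csh.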


  • Output:

  • A directory OUT$fn will be created where $fn is the input filename without the extension (e.g., OUT2haq for 2haq.pdb) and all the output files will be saved in it.

  • Textfiles:


    Description                             File name with extension  Format
    --------------------------------------  ------------------------  ------------------------------------------

    The formatted pdb file                  $fn.pdb                   standard Brookhaven

    The residue sequence file               $fn.res                   col1: resno-restype

    Metal coordination profile              $fn.mcores                col1: metal_ion, col2: <=>,
    (if isolated metals are present)                                  col3: coordinating_residue, col4: Nca

    Solvent Accessibility (burial) profile  $fn.bury                  col1: resno, col2: restype,
                                                                      col3: SAA (Ang^2), col4: burial

    The van der Waals surface file          $fn-surf.pdb              standard Brookhaven

    Surface Complementarity profile         $fn.Sm                    col1: resno, col2: restype, col3: burial,
                                                                      col4: Small, col5: Smsc, col6: Smmc

    Electrostatic Complementarity profile   $fn.Sm                    col1: resno-restype, col2: burial,
                                                                      col3: Emall, col4: Emsc, col5: Emmc

    Joint Complementarity profile           $fn.CSplot                col1: resno, col2: restype, col3: burial,
    for buried residues                                               col4: Smsc, col5: Emsc

    Residue listing in the three CPs        $fn-comp.cb (CP1)         col1: resno, col2: restype, col3: burial,
    (whether lying in the 'probable',       $fn-comp.pb (CP2)         col4: Smsc, col5: Emsc, col6: Pgrid,
    'less probable' or 'improbable'         $fn-comp.pe (CP3)         col7: CP-status
    regions of CP1, CP2, CP3)

    Complementarity and Accessibility       $fn.CS                    CSl=x1; rGb=y1; Pcount=z1; PSm=w1; PEm=v1;
    scores (see descriptions below)

  • Postscripts:

    Description                        File name with extension
    ---------------------------------  ------------------------
    Distribution of residues in CP1    $fn-cb.ps
    Distribution of residues in CP2    $fn-pb.ps
    Distribution of residues in CP3    $fn-pe.ps

    Table Abbreviations and Footnotes:

    $fn  :  input filename without the extension
    resno  :  residue sequence number
    restype  :  amino acid identity
    resno-restype  :  e.g., '100-LEU', ' 45-TYR'
    Nca  :  Number of coordinating atoms (of the metal-ligated residue)
    SAA  :  Solvent Accessible Area (in Angstrom^2)
    Small, Smsc, Smmc and Emall, Emsc, Emmc  :  see Reference
    Pgrid  :  Grid Probabilities
    CP-status  :  whether the point lies in the probable, less probable or improbable regions of the plot
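
Because every output name is derived from $fn, the set of files to expect after a run can be listed programmatically, which is handy for checking that a run completed. The sketch below simply restates the file names given in the tables of this section; expected_outputs is a hypothetical helper, and it assumes the OUT$fn directory is created in the current working directory.

```python
import os

# Suffixes as listed in the Textfiles and Postscripts tables.
TEXT_SUFFIXES = [".pdb", ".res", ".bury", "-surf.pdb", ".Sm", ".CSplot",
                 "-comp.cb", "-comp.pb", "-comp.pe", ".CS"]
PS_SUFFIXES = ["-cb.ps", "-pb.ps", "-pe.ps"]

def expected_outputs(pdb_path, with_metals=False):
    """List the output paths expected for an input such as '2haq.pdb'."""
    fn = os.path.basename(pdb_path)[:-len(".pdb")]   # filename without extension
    out_dir = "OUT" + fn
    suffixes = list(TEXT_SUFFIXES)
    if with_metals:
        # The metal coordination profile exists only if isolated metals are present.
        suffixes.append(".mcores")
    suffixes += PS_SUFFIXES
    return [os.path.join(out_dir, fn + suffix) for suffix in suffixes]

for path in expected_outputs("2haq.pdb"):
    print(path, "found" if os.path.exists(path) else "missing")
```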

  • The van der Waals surface file (in PDB format, dot surface points named according to atom identity) can easily be viewed in Rasmol (preferred display mode: wireframe off, spacefill 30) or any other molecular display graphics program



  • Run Commands:

  • general usage : ./surfNelcomp.csh   -inp   [PDB_filename]


  • example : ./surfNelcomp.csh   -inp   2HAQ.pdb


  • For help : ./surfNelcomp.csh   -help


  • Provision for single residue calculations:


  • usage : ./surfNelcomp.csh   -inp   [PDB_filename]   target   ['residue_sequence_number'-'residue_identity' (in uppercase)]


  • example : ./surfNelcomp.csh   -inp   2HAQ.pdb   target   51-PHE
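
When runs are scripted, the argument list for both modes can be assembled by a small helper that also enforces the uppercase residue identity. This is an illustrative sketch; build_command is a hypothetical name and nothing beyond the arguments shown above is assumed.

```python
def build_command(pdb_file, resno=None, restype=None):
    """Build the argument list for a whole-protein or single-residue run."""
    cmd = ["./surfNelcomp.csh", "-inp", pdb_file]
    if resno is not None and restype is not None:
        # The target is written as 'residue_sequence_number'-'RESIDUE_IDENTITY'.
        cmd += ["target", f"{int(resno)}-{restype.upper()}"]
    return cmd

print(" ".join(build_command("2HAQ.pdb")))
print(" ".join(build_command("2HAQ.pdb", 51, "phe")))
```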


  • Description and Rationalization of the scores:

    Criteria for successful validation:

    Any structure should simultaneously attain values higher than the thresholds for both global scores given below:
    CSl : 0.80
    rGb  : 0.011
    Structures scoring below the threshold in either of the two scores need re-investigation

    In addition, the local score Pcount should be below 15%,
    and similar thresholds apply to the scores based on Sm and Em alone:
    PSm should be above -1.017
    PEm should be above -1.789
    ========================================================
    Average Scores for correctly folded native proteins (DB2):
    Standard deviations in parentheses

    CSl: 2.24 (+-0.48), rGb: 0.055 (+-0.022) PSm: -0.855 (+-0.054), PEm: -1.492 (+-0.099)
    ========================================================
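
The validation criteria above can be applied automatically to the contents of the $fn.CS file. The following is a minimal sketch that assumes the file holds 'name=value;' pairs as shown in the output table and that Pcount is stored as a percentage; both points should be verified against an actual output file. parse_scores and failed_checks are hypothetical helper names.

```python
import re

# Thresholds from the criteria above: (comparison, limit).
THRESHOLDS = {
    "CSl": (">", 0.80),
    "rGb": (">", 0.011),
    "Pcount": ("<", 15.0),   # assumed to be expressed in percent
    "PSm": (">", -1.017),
    "PEm": (">", -1.789),
}

def parse_scores(text):
    """Extract 'name=value' pairs such as 'CSl=2.24; rGb=0.055;' into a dict."""
    pairs = re.findall(r"(\w+)\s*=\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)", text)
    return {name: float(value) for name, value in pairs}

def failed_checks(scores):
    """Return the names of scores that are missing or do not meet their threshold."""
    failed = []
    for name, (op, limit) in THRESHOLDS.items():
        value = scores.get(name)
        if value is None:
            failed.append(name)
        elif op == ">" and not value > limit:
            failed.append(name)
        elif op == "<" and not value < limit:
            failed.append(name)
    return failed

example = "CSl=2.24; rGb=0.055; Pcount=5.0; PSm=-0.855; PEm=-1.492;"
print(failed_checks(parse_scores(example)) or "all criteria met")
```

The example string uses the DB2 average values quoted above (with a made-up Pcount of 5.0), so it is expected to pass every check.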

  • The following isolated metal ions are considered in the calculations:

    Metal_ion        PDB nomenclature          Converted into
    Fortran format:  atom-res (12x,a4,1x,a3)   atom-res (13x,a3,1x,a3)
    ---------------  ------------------------  ------------------------
    Na+              'NA    NA'                ' NA1 SOD'
    Mg+2             'MG    MG'                ' MG2 MAG'
    Al+3             'AL   ALF'                ' AL3 ALF'
    K+               ' K     K'                ' K_1 POT'
    Ca+2             'CA    CA'                ' CA2 CAL'
    Mn+2             'MN    MN'                ' MN2 MNG'
    Mn+3             'MN   MN3'                ' MN3 MNG'
    Fe+2             'FE   FE2'                ' FE2 IRN'
    Fe+3             'FE    FE'                ' FE3 IRN'
    Co+2             'CO    CO'                ' CO2 COB'
    Co+3             'CO   3CO'                ' CO3 COB'
    Ni+2             'NI    NI'                ' NI2 NIC'
    Ni+3             'NI   3NI'                ' NI3 NIC'
    Cu+              'CU   CU1'                ' CU1 COP'
    Cu+2             'CU    CU'                ' CU2 COP'
    Zn+2             'ZN    ZN'                ' ZN2 ZNC'
    Ag+              'AG    AG'                ' AG1 SLV'
    Cd+2             'CD    CD'                ' CD2 CDM'
    Pt+2             'PT2   TPT'               ' PT2 PLT'
    Au+              'AU    AU'                ' AU1 GLD'
    Au+3             'AU   AU3'                ' AU3 GLD'
    Hg+2             'HG    HG'                ' HG2 MRC'
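
To make the column layout of this conversion concrete: both quoted strings occupy columns 13-20 of a HETATM record (atom name, one separator column, residue name). The sketch below rewrites those columns for a few entries of the table; it only illustrates the layout and is not SARAMA's internal conversion routine. METAL_RENAME holds a subset of the table, and convert_metal_record is a hypothetical helper.

```python
# Columns 13-20 before and after conversion (subset of the table above).
METAL_RENAME = {
    "NA    NA": " NA1 SOD",
    "MG    MG": " MG2 MAG",
    "CA    CA": " CA2 CAL",
    "ZN    ZN": " ZN2 ZNC",
}

def convert_metal_record(line):
    """Rewrite columns 13-20 of a HETATM line if it names a listed metal ion."""
    if not line.startswith("HETATM"):
        return line
    replacement = METAL_RENAME.get(line[12:20])   # columns 13-20
    if replacement is None:
        return line                               # unlisted hetero atoms are left untouched
    return line[:12] + replacement + line[20:]

record = "HETATM 1234 " + "ZN    ZN" + " A 301      10.000  20.000  30.000  1.00 15.00"
print(convert_metal_record(record))
```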

  • Water coordinates, if present in the input pdb file, will be trimmed,
    since water and surface-bound ligands are modeled as bulk solvent


  • In case of missing atoms / patches of residues in the input pdb, the (Sm, Em) values may not be reliable
  • The PDB file must contain Hydrogen coordinates consistent with REDUCE format

  • Atom and residue types must be consistent with Brookhaven (PDB) format

    Field No.  Column range  FORTRAN format  Description
    ---------  ------------  --------------  ----------------------------------------------
    1.         1-6           A6              Record ID (e.g., ATOM, HETATM)*
    2.         7-11          I5              Atom serial number#
    -          12-12         1X              Blank
    3.         13-16         A4**            Atom name (e.g., " ND1")*
    4.         17-17         A1              Alternative location code (if any)#
    5.         18-20         A3              Standard 3-letter amino acid code for residue*
    -          21-21         1X              Blank
    6.         22-22         A1              Chain identifier code#
    -          23-23         1X              Blank
    7.         24-26         I3              Residue sequence number*
    8.         27-27         A1              Insertion code (if any)#
    -          28-30         3X              Blank
    9.         31-38         F8.3            Atom's x-coordinate*
    10.        39-46         F8.3            Atom's y-coordinate*
    11.        47-54         F8.3            Atom's z-coordinate*
    12.        55-60         F6.2            Occupancy value for atom#
    13.        61-66         F6.2            B-value (thermal factor)#
    -          67-67         1X              Blank
    14.        68-70         I3              Footnote number#


    Table Footnotes:

    # These fields might be left blank (preserving the specified format)
    * Mandatory fields
    ** An alternative 3-letter atom code (Fortran format: 1X,A3) in column range 13-16 (i.e., leaving column 13 blank) is also accepted.
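
The fixed-column layout tabulated above translates directly into string slices. The sketch below reads one ATOM/HETATM record according to that table (1-based column ranges become 0-based Python slices); parse_atom_record is a hypothetical helper, and the optional fields marked '#' are returned as None when blank.

```python
def _optional_float(field):
    """Convert a fixed-width field to float, or None if it is blank."""
    field = field.strip()
    return float(field) if field else None

def parse_atom_record(line):
    """Split one ATOM/HETATM line into fields using the column table above."""
    line = line.rstrip("\n").ljust(70)
    return {
        "record": line[0:6].strip(),            # columns 1-6
        "serial": line[6:11].strip(),           # columns 7-11 (may be blank)
        "atom": line[12:16],                    # columns 13-16, spacing preserved
        "altloc": line[16],                     # column 17
        "resname": line[17:20],                 # columns 18-20
        "chain": line[21],                      # column 22
        "resno": int(line[23:26]),              # columns 24-26
        "icode": line[26],                      # column 27
        "x": float(line[30:38]),                # columns 31-38
        "y": float(line[38:46]),                # columns 39-46
        "z": float(line[46:54]),                # columns 47-54
        "occupancy": _optional_float(line[54:60]),   # columns 55-60
        "bfactor": _optional_float(line[60:66]),     # columns 61-66
    }

example = "ATOM  %5d %-4s%1s%3s %1s %3d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    1, " N  ", " ", "LEU", "A", 100, " ", 11.104, 22.5, -3.25, 1.0, 20.0)
print(parse_atom_record(example))
```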

    COMPUTATIONAL TIME:

    Both Sm and Em are computed over dot surface points, and the calculation is therefore computationally intensive.
    A pdb file containing ~100 residues takes about 12 minutes on a Dell workstation (Red Hat Enterprise Linux platform)