SARAMA

A standalone suite of programs to plot the distribution of residues buried in the interior of a globular protein in the Complementarity Plots (CPs) (for Linux)

&

SARAMAint

The same for residues buried upon complexation at a protein-protein interface (for Linux)

This documentation (html) file is also provided in the downloadable package (In the sub-directory: HELP)

Requirements:

  • Linux platform (tested on Red Hat Enterprise / openSUSE)

  • Fortran 90 compiler or higher (tested with ifort [strongly recommended], f90, f95, gfortran)

  • Perl 5.8.8 or higher

  • 'dos2unix' and the (ghostscript) viewer 'gv' must be installed and working

  • DelPhi must be pre-installed on the system and runnable as the command delphi_static (v.4) / delphi95 (v.6.2)

  • Hydrogen atoms must be added geometrically to the input PDB file by REDUCE (version 2) prior to the calculation
  • If you are using REDUCE v.3, add the hydrogens and then convert the hydrogen atom format to that of REDUCE v.2 using the master script "./addH_reduce3to2.bash" in the sub-directory reduce_scripts/ provided with this distribution
  • Examples of the two formats are given in the files below (found in the reduce_scripts/ sub-directory)
    REDUCE v.2: reducev2_example.pdb
    REDUCE v.3: reducev3_example.pdb
  • If hydrogens are added using MD packages such as AMBER or NAMD, they must also be made compatible with the REDUCE v.2 format; users are encouraged to check this and use their own scripts for such case-specific format conversions

  • Users can now choose either the single-dielectric (SARAMA / SARAMAint) or the multi-dielectric (SARAMA-multidielctric-delphi / SARAMAint-multidielctric-delphi) version of the package to set the protein internal dielectric appropriately at the interior or at the interface.

  • Users are recommended to read additional background literature before using the multi-dielectric DelPhi Gaussian mode

  • All calculations (validation and benchmarking) were performed using
    REDUCE version 2 to build Hydrogen atoms geometrically

    Installation:

  • gunzip SARAMA.tar.gz (or gunzip SARAMAint.tar.gz) [for SARAMA and SARAMAint respectively ]
  • tar -xvf SARAMA.tar (or tar -xvf SARAMAint.tar)
  • cd SARAMA/ (or cd SARAMAint/)
  • The main directory (SARAMA or SARAMAint) contains four shell scripts (./install, ./ComplPlot [./CPint in the case of SARAMAint], ./clean, ./refresh), the sub-directories DOC, SRC, HELP, EXEC, LIBR and TESTPDBS, a file (README.output) describing the output files, and a brief guideline (USAGE) for the installation and run commands.
    It also contains download links for REDUCE (version 2) and DelPhi
  • SRC contains all source codes (Fortran 90 programs, Perl scripts, and Bourne and C shell scripts). A successful run of ./install compiles them and copies them to EXEC, after which the main command script ./ComplPlot can be run
  • TESTPDBS contains a few example input pdb files in the correct format, as required for a successful run of ./ComplPlot or ./CPint (SARAMAint). For SARAMA, a nested sub-directory LARGE contains pdb files with more than 800 residues
  • TESTPDBS also contains another nested sub-directory named FALSE with examples of incorrectly formatted pdb files, which ./ComplPlot or ./CPint (SARAMAint) fail to run and for which error LOGs are returned

  • To initiate installation, execute the following commands

  • dos2unix install
  • chmod +x install
  • ./install   [FORTRAN90_Compiler_name, e.g.,ifort (Strongly recommended)]

  • The main shell script (./ComplPlot or ./CPint (SARAMAint)) is now ready to be used

    ./refresh removes all intermediate files after a successful or failed run of the program,
    whereas ./clean additionally removes all the executables from EXEC and all the library files from the current directory.
    Thus, after cleaning, ./install HAS TO BE EXECUTED PRIOR TO THE NEXT RUN OF ./ComplPlot or ./CPint (SARAMAint)

    AFTER AN INCOMPLETE OR ABNORMAL (INTERRUPTED) TERMINATION OF ./ComplPlot or ./CPint (SARAMAint), BE SURE TO RUN ./clean AND THEN ./install PRIOR TO THE NEXT RUN OF THE PROGRAM

    Input-Output:

  • Input:

    Unless explicitly stated otherwise for SARAMAint, the descriptions below apply to SARAMA

  • PDB file in Brookhaven format (tabulated at the end of this page) with a lowercase .pdb extension (headers / comments may or may not be present)
  • The filename can have any number of characters but must not contain any '.' or blank space other than the '.' in the extension (.pdb)

  • e.g., 2haq.pdb, deca.pdb, 1234.pdb, x1c3.pdb, 1abc-wc.pdb, 2haq-00001.pdb etc.

  • PDB file must not contain more than 990 amino acid residues

  • The file should consist of a single polypeptide chain [in the case of SARAMAint, exactly two polypeptide chains]

  • Residue sequence numbers (for any and all residues) must be restricted to 3 digits (1-999)

  • Residue sequence numbers must not contain any non-integer characters (e.g., 42A, 78B etc.)

  • Atoms must not have multiple occupancies

  • Only isolated metal ions listed (in appropriate format) in the table below will be considered
    and other hetero atoms will be ignored

  • The program will automatically reject input pdb files in any of the following cases:
    (In each case, a LOG file (fch.log) containing the details of the rejection is written to a folder named LOG_$fn (for the input $fn.pdb))

    1. Incorrect filename (as specified above)
    2. Files including non-naturally occurring amino acids (with record ID 'ATOM')
    3. Files containing more than one polypeptide chain (SARAMA) [or a number of chains other than two in the case of SARAMAint]
    4. Files with no hydrogen atoms
    5. Files with hydrogen atom types inconsistent with the REDUCE (version 2) format
    6. Files containing more than 990 residues
    7. Files containing residue sequence numbers exceeding 3 digits (999)
    8. Files containing redundant residue identities for the same residue position
    9. Files containing main-chain (backbone) atoms only
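
The rejection criteria above lend themselves to a quick pre-check before a run. The following is a minimal Python sketch and is not part of the SARAMA distribution: the function names are hypothetical, the column positions follow the format table at the end of this page, and the hydrogen test is only a naming heuristic. It does not check REDUCE v.2 naming, non-standard residues, redundant residue identities, or backbone-only files, so the fch.log written by the program remains the authoritative report.

```python
import os

MAX_RESIDUES = 990  # residue limit stated in the input rules


def filename_problem(path):
    """Return a message if the file name breaks the naming rule, else None."""
    name = os.path.basename(path)
    if not name.endswith(".pdb"):
        return "extension must be lowercase .pdb"
    stem = name[:-len(".pdb")]
    if not stem or "." in stem or any(ch.isspace() for ch in stem):
        return "name must not contain '.' or blanks besides the extension"
    return None


def atom_record_problems(lines, expected_chains=1):
    """Collect rule violations found in the ATOM records (hypothetical helper)."""
    problems = set()
    chains, residues = set(), set()
    saw_hydrogen = False
    for raw in lines:
        if not raw.startswith("ATOM"):
            continue
        line = raw.rstrip("\n").ljust(54)
        chains.add(line[21])                   # column 22: chain identifier
        residues.add((line[21], line[22:27]))  # columns 23-27: number + insertion code
        if line[16] != " ":                    # column 17: alternative location code
            problems.add("multiple occupancies (alternative locations)")
        if line[22] != " ":                    # column 23 must stay blank (3-digit limit)
            problems.add("residue sequence number exceeds 3 digits")
        if line[26] != " ":                    # column 27: insertion code, e.g. 42A
            problems.add("non-integer residue sequence number")
        # Heuristic: hydrogen names look like ' H  ', ' HA ' or '1HB ' (digit prefix).
        if line[12:16].strip().lstrip("0123456789").startswith("H"):
            saw_hydrogen = True
    if len(chains) != expected_chains:
        problems.add("expected %d chain(s), found %d" % (expected_chains, len(chains)))
    if len(residues) > MAX_RESIDUES:
        problems.add("more than %d residues" % MAX_RESIDUES)
    if not saw_hydrogen:
        problems.add("no hydrogen atoms")
    return sorted(problems)
```

For SARAMAint, the same check would be called with expected_chains=2. An empty list only means that none of the rules covered by this sketch were violated.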


  • Output:

  • A directory OUT$fn will be created, where $fn is the input filename without the extension (e.g., OUT2haq for 2haq.pdb), and all the output files will be saved in it.
    Unless explicitly stated otherwise for SARAMAint, the following description of the output files applies to SARAMA

  • Textfiles:


  • Description | File name with extension | Format
    The formatted pdb file | $fn.pdb + $fn_X.pdb, $fn_Y.pdb, $fn_XY.pdb (SARAMAint) | standard Brookhaven
    The residue sequence file | $fn.res + $fn_X-intf.res, $fn_Y-intf.res (SARAMAint) | col1: resno-restype
    Metal coordination profile (if isolated metals are present) | $fn.mcores | col1: metal_ion, col2: <=>, col3: coordinating_residue, col4: Nca
    In addition, a RASMOL script for visualizing the metal ions and the coordinating residues at the interface (SARAMAint) | $fn-met.spt | RASMOL script file
    Solvent Accessibility (burial) profile | $fn.bury | col1: resno, col2: restype, col3: ASA (Ang^2), col4: burial
    The van der Waals surface file | $fn-surf.pdb | standard Brookhaven
    A RASMOL script (.spt) to view the van der Waals surface of the molecular interior (or the interface (SARAMAint)) against the background of the entire molecular (or bimolecular) surface (in different colors) | $fn-rasint.spt; $fn-intsurf.spt (SARAMAint) (requires surfinp.pdb in the same directory to display) | RASMOL script file
    Surface Complementarity profile | $fn.Sm | col1: resno, col2: restype, col3: burial, col4: Small, col5: Smsc, col6: Smmc
    Electrostatic Complementarity profile | $fn.Sm | col1: resno-restype, col2: burial, col3: Emall, col4: Emsc, col5: Emmc
    Joint Complementarity profile for buried residues | $fn.CSplot | col1: resno, col2: restype, col3: burial, col4: Smsc, col5: Emsc
    Residue listing in the three CPs (whether lying in the 'probable', 'less probable' or 'improbable' regions of CP1, CP2, CP3) | $fn-comp.cb (CP1), $fn-comp.pb (CP2), $fn-comp.pe (CP3) | col1: resno, col2: restype, col3: burial, col4: Smsc, col5: Emsc, col6: Pgrid, col7: CP-status
    Complementarity and Accessibility scores (see descriptions below) | $fn.CS | CSl=x1; rGb=y1; Pcount=z1; PSm=w1; PEm=v1;

  • Postscripts: (ghostscript (gv) should be installed and used to view these)

  • Description | File name with extension
    Distribution of residues in CP1 | $fn-cb.ps
    Distribution of residues in CP2 | $fn-pb.ps
    Distribution of residues in CP3 | $fn-pe.ps

    Table Abbreviations and Footnotes:

    $fn : input filename without the extension
    resno : residue sequence number
    restype : amino acid identity
    resno-restype : e.g., '100-LEU', ' 45-TYR'
    Nca : number of coordinating atoms (of the metal-ligated residue)
    ASA : solvent Accessible Surface Area (in Angstrom^2)
    For Small, Smsc, Smmc and Emall, Emsc, Emmc, see the Reference
    Pgrid : Grid Probabilities (Final Pgrid values are computed by bilinear interpolation)
    CP-status : whether the point lies in the probable,  less probable or improbable regions of the plot.
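
The footnote on Pgrid mentions bilinear interpolation. As a generic illustration only, the Python sketch below interpolates a value from the four surrounding grid nodes. The grid layout, origin, spacing and function name are assumptions made for the example; this page does not describe how SARAMA stores or spaces its probability grid.

```python
import math


def bilinear(grid, x0, y0, step, x, y):
    """Interpolate at (x, y), where grid[i][j] holds the value at
    (x0 + i*step, y0 + j*step). Points are assumed to lie inside the grid."""
    fx = (x - x0) / step
    fy = (y - y0) / step
    # Index of the lower-left node of the enclosing cell, clamped to the grid.
    i = min(max(int(math.floor(fx)), 0), len(grid) - 2)
    j = min(max(int(math.floor(fy)), 0), len(grid[0]) - 2)
    tx, ty = fx - i, fy - j  # fractional position inside the cell
    return (grid[i][j] * (1 - tx) * (1 - ty)
            + grid[i + 1][j] * tx * (1 - ty)
            + grid[i][j + 1] * (1 - tx) * ty
            + grid[i + 1][j + 1] * tx * ty)
```

At a grid node the function returns the node value itself; at the centre of a cell it returns the mean of the four corner values.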

  • The van der Waals surface file (in PDB format, with dot surface points named according to atom identity) can easily be viewed in Rasmol (preferred display mode: wireframe off, spacefill 30) or any other molecular graphics program
  • Run Commands:

    SARAMA

  • general usage : ./ComplPlot   -inp   [PDB_filename]
  • example : ./ComplPlot   -inp   2HAQ.pdb
  • For help : ./ComplPlot   -help
  • Provision for single residue calculations:
  • usage : ./ComplPlot   -inp   [PDB_filename]   -tar   ['residue_sequence_number'-'residue_identity' (in uppercase)]
  • example : ./ComplPlot   -inp   2HAQ.pdb   -tar   151-PHE
  • SARAMAint

  • usage : ./CPint   -inp   [PDB_filename]
  • example : ./CPint   -inp   1esv.pdb
  • For help : ./CPint   -help
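
When many structures have to be processed, the run commands above can be assembled programmatically. The Python sketch below only builds the argument list from the options shown on this page (-inp, -tar); the function name is hypothetical, and -tar is offered for ./ComplPlot alone because that is the only usage documented here.

```python
import re


def build_command(pdb_file, target=None, interface=False):
    """Return the argument list for ./ComplPlot (SARAMA) or ./CPint (SARAMAint)."""
    script = "./CPint" if interface else "./ComplPlot"
    cmd = [script, "-inp", pdb_file]
    if target is not None:
        if interface:
            raise ValueError("-tar is documented for ./ComplPlot only")
        # Expected form: residue number (1-3 digits), '-', uppercase 3-letter code.
        if not re.fullmatch(r"\d{1,3}-[A-Z]{3}", target):
            raise ValueError("target must look like 151-PHE")
        cmd += ["-tar", target]
    return cmd
```

The resulting list could be handed to subprocess.run(cmd, check=True), executed from the main SARAMA or SARAMAint directory after a successful ./install.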
  • Description and rationalization of the scores: see the Reference (PDF)

    Criteria for successful validation:

    A structure should simultaneously exceed the thresholds for both global scores given below:
    CSl : 0.80
    rGb : 0.011
    Structures scoring below the threshold in either of the two scores need re-investigation

    In addition,
    the local score, Pcount, should be below 15%,
    AND similar thresholds for scores based on Sm and Em alone :
    PSm should be above -1.017
    PEm should be above -1.789
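
The thresholds above can be applied mechanically to the scores reported in $fn.CS. The Python sketch below is an illustration under two assumptions: that the scores are available as a single line of the form "CSl=x1; rGb=y1; Pcount=z1; PSm=w1; PEm=v1;" as shown in the output table, and that Pcount is expressed in percent. The function names are hypothetical.

```python
def parse_scores(text):
    """Parse 'CSl=x1; rGb=y1; Pcount=z1; PSm=w1; PEm=v1;' into a dict of floats."""
    scores = {}
    for item in text.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            scores[key.strip()] = float(value)
    return scores


def check_scores(scores):
    """Compare each score with the threshold quoted in the validation criteria."""
    checks = {
        "CSl": scores["CSl"] > 0.80,        # global score
        "rGb": scores["rGb"] > 0.011,       # global score
        "Pcount": scores["Pcount"] < 15.0,  # local score, assumed to be in percent
        "PSm": scores["PSm"] > -1.017,
        "PEm": scores["PEm"] > -1.789,
    }
    # A structure failing either global score needs re-investigation.
    checks["needs_reinvestigation"] = not (checks["CSl"] and checks["rGb"])
    return checks
```

As a usage example, check_scores(parse_scores("CSl=2.24; rGb=0.055; Pcount=5.0; PSm=-0.855; PEm=-1.492;")) passes every check; the Pcount value in that line is made up for the example.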
    ========================================================
    Average Scores for correctly folded native proteins (DB2):
    Standard deviations in parentheses

    CSl: 2.24 (+-0.48), rGb: 0.055 (+-0.022), PSm: -0.855 (+-0.054), PEm: -1.492 (+-0.099)
    The main global CP-Scores (CSl & rGb) are also cross-validated (given below) in a Database of 1651 native high-resolution PPI complex crystal structures ('DB3' : SARAMAint)
    CSl: 2.29 (+-0.71), rGb: 0.059 (+-0.022)
    ========================================================

  • The following isolated metal ions are considered in the calculations:

  • Metal_ion | PDB NOMENCLATURE | CONVERTED INTO
    Fortran format: | atom-res (12x,a4,1x,a3) | atom-res (13x,a3,1x,a3)
    Na+ | 'NA    NA' | ' NA1 SOD'
    Mg+2 | 'MG    MG' | ' MG2 MAG'
    Al+3 | 'AL   ALF' | ' AL3 ALF'
    K+ | ' K     K' | ' K_1 POT'
    Ca+2 | 'CA    CA' | ' CA2 CAL'
    Mn+2 | 'MN    MN' | 'MN2 MNG'
    Mn+3 | 'MN   MN3' | ' MN3 MNG'
    Fe+2 | 'FE   FE2' | ' FE2 IRN'
    Fe+3 | 'FE    FE' | ' FE3 IRN'
    Co+2 | 'CO    CO' | ' CO2 COB'
    Co+3 | 'CO   3CO' | ' CO3 COB'
    Ni+2 | 'NI    NI' | ' NI2 NIC'
    Ni+3 | 'NI   3NI' | ' NI3 NIC'
    Cu+ | 'CU   CU1' | ' CU1 COP'
    Cu+2 | 'CU    CU' | ' CU2 COP'
    Zn+2 | 'ZN    ZN' | ' ZN2 ZNC'
    Ag+ | 'AG    AG' | ' AG1 SLV'
    Cd+2 | 'CD    CD' | ' CD2 CDM'
    Pt+2 | 'PT2   TPT' | ' PT2 PLT'
    Au+ | 'AU    AU' | ' AU1 GLD'
    Au+3 | 'AU   AU3' | ' AU3 GLD'
    Hg+2 | 'HG    HG' | ' HG2 MRC'

  • Water coordinates, if present in the input pdb file, will be trimmed,
    since water and surface-bound ligands are modeled as bulk solvent


  • If atoms or patches of residues are missing in the input pdb, the (Sm, Em) values may not be reliable
  • The PDB file must contain hydrogen coordinates consistent with the REDUCE (v.2) format

  • Atom and residue types must be consistent with the Brookhaven (PDB) format

  • Field No. | Column range | FORTRAN FORMAT | Description
    1. | 1-6 | A6 | Record ID (e.g., ATOM, HETATM)*
    2. | 7-11 | I5 | Atom serial number#
    - | 12-12 | 1X | Blank
    3. | 13-16 | A4** | Atom name (e.g., " ND1")*
    4. | 17-17 | A1 | Alternative location code (if any)#
    5. | 18-20 | A3 | Standard 3-letter amino acid code for residue*
    - | 21-21 | 1X | Blank
    6. | 22-22 | A1 | Chain identifier code#
    - | 23-23 | 1X | Blank
    7. | 24-26 | I3 | Residue sequence number*
    8. | 27-27 | A1 | Insertion code (if any)#
    - | 28-30 | 3X | Blank
    9. | 31-38 | F8.3 | Atom's x-coordinate*
    10. | 39-46 | F8.3 | Atom's y-coordinate*
    11. | 47-54 | F8.3 | Atom's z-coordinate*
    12. | 55-60 | F6.2 | Occupancy value for atom#
    13. | 61-66 | F6.2 | B-value (thermal factor)#
    - | 67-67 | 1X | Blank
    14. | 68-70 | I3 | Footnote number#


    Table Footnotes:

    # These fields might be left blank (preserving the specified format)
    * Mandatory fields
    ** An alternative 3-letter atom code (Fortran format: 1X,A3) at column range (13-16) (i.e., leaving column 13 as blank) will also do.
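
The fixed-column layout tabulated above maps directly onto string slices. The Python sketch below reads one ATOM/HETATM line using exactly those column ranges (converted to 0-based slices); the function name and the returned dictionary keys are hypothetical, and fields marked '#' in the table are returned as None or an empty string when blank.

```python
def parse_atom_record(line):
    """Split one ATOM/HETATM line into fields using the tabulated column ranges."""
    line = line.rstrip("\n").ljust(70)

    def opt_float(text):
        text = text.strip()
        return float(text) if text else None

    return {
        "record": line[0:6].strip(),        # columns 1-6
        "serial": int(line[6:11]) if line[6:11].strip() else None,  # columns 7-11
        "atom": line[12:16],                # columns 13-16, kept with its padding
        "altloc": line[16].strip(),         # column 17
        "resname": line[17:20],             # columns 18-20
        "chain": line[21].strip(),          # column 22
        "resno": int(line[23:26]),          # columns 24-26 (I3)
        "icode": line[26].strip(),          # column 27
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "occupancy": opt_float(line[54:60]),  # columns 55-60
        "bfactor": opt_float(line[60:66]),    # columns 61-66
    }
```

The atom name is returned unstripped so that the difference between a 4-character name and the alternative (1X,A3) form mentioned in the footnote stays visible.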

    COMPUTATIONAL TIME:

    Both Sm and Em are computed on dot surface points, so the calculations are inherently computationally intensive.
    For a pdb file containing ~100 residues, SARAMA takes about 8-10 minutes on a Dell workstation (Red Hat Enterprise / CentOS Linux platform).
    In the case of SARAMAint, the computational time is a function of the number of interfacial residues rather than of the size of the whole molecule.