DaliLite.v4 manual

DaliLite.v4 is a light version of the software run by the Dali server. The web server has search and data visualization options which are not included in this package. DaliLite.v4 supports just two core functionalities:

Data import (import.pl): convert PDB entries to Dali's internal data format
Pairwise comparison (dali.pl): structurally align a list of query structures to a list of target structures

System requirements

Linux OS
openmpi
Fortran-90 and C compilers
Perl

openmpi is optional; without it, the software runs serially. Although DaliLite.v4 can use MPI, the present implementation will not work in a distributed cluster; we run it on a multicore server. All nodes must have read access to the internal structure data directories (DALIDATDIR_1, DALIDATDIR_2) and write access to a shared current work directory (CWD).

Installation

download

	cd /home/you
	wget http://ekhidna2.biocenter.helsinki.fi/dali/DaliLite.v4.tar.gz
	tar -zxvf DaliLite.v4.tar.gz

compile

	cd /home/you/DaliLite.v4
	make clean
	make

configure

	Edit the paths of $MPIDALI_BIN and $MPIRUN_EXE at the top of /home/you/DaliLite.v4/bin/mpidali.pm

test
```
	cd /home/you/DaliLite.v4
	./test.csh
```

Data import

This example reads a file in PDB format and creates the corresponding data files, one for each chain, in the data directory /home/you/DaliLite.v4/DAT:

	cd /home/you/DaliLite.v4
	perl -I bin ./bin/import.pl ./toy_PDB/pdb1ppt.ent.gz 1ppt ./DAT/

The first argument of import.pl is the PDB file name. Both uncompressed and compressed (extension .gz) files are accepted. The second argument is a four-character identifier for the structure. The chain identifiers are appended automatically. The resulting five-character identifier is used in Dali's internal database; the example creates the file ./DAT/1pptA.dat. The third argument is the path to the data directory. All query structures should be in one directory (DALIDATDIR_1). All target structures should be in one directory (DALIDATDIR_2). DALIDATDIR_1 and DALIDATDIR_2 can be identical, but usually DALIDATDIR_2 contains public structures downloaded from the Protein Data Bank (PDB) and DALIDATDIR_1 contains private structures.
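
When many PDB files have to be imported, the import.pl call can be generated per file. The following is a minimal sketch, not part of DaliLite: the helper names pdb_id_from_file and import_commands are hypothetical, and the file naming pdbXXXX.ent or pdbXXXX.ent.gz is assumed from the toy_PDB example. Paths are assumed to contain no spaces.

```shell
#!/bin/sh
# Derive the four-character identifier from a file name such as
# pdb1ppt.ent.gz or pdb1ppt.ent (naming convention assumed from toy_PDB).
pdb_id_from_file() {
    name=$(basename "$1")
    name=${name#pdb}                # drop the leading "pdb"
    printf '%s\n' "${name%%.*}"     # drop everything from the first dot
}

# Print one import.pl command per PDB file found in a directory.
# Nothing is executed; pipe the output to sh to run the commands.
import_commands() {
    srcdir=$1
    datdir=$2
    for f in "$srcdir"/pdb*.ent "$srcdir"/pdb*.ent.gz; do
        [ -e "$f" ] || continue     # skip glob patterns that matched nothing
        id=$(pdb_id_from_file "$f")
        printf 'perl -I bin ./bin/import.pl %s %s %s\n' "$f" "$id" "$datdir"
    done
}
```

For example, running import_commands ./toy_PDB ./DAT/ from /home/you/DaliLite.v4 prints the commands for inspection; appending | sh executes them.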

Mirroring PDB

Modify MIRRORDIR and LOGFILE in the script (rsync.sh) below. Run the script from crontab once a week, which matches the update frequency of the PDB. Then extract the list of new PDB entries from the log file and import them as explained above.

	MIRRORDIR=/data/pdb/                            # your top level rsync directory
	LOGFILE=/data/DaliLite/pdb_update.logs          # file for storing logs
	RSYNC=/usr/bin/rsync                            # location of local rsync
	SERVER=rsync.wwpdb.org::ftp                     # RCSB PDB server name
	PORT=33444                                      # port RCSB PDB server is using
	"${RSYNC}" -rlpt -v -z --delete --port="$PORT" "${SERVER}/data/structures/divided/pdb/" "$MIRRORDIR" > "$LOGFILE"
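
Extracting the new entries from the log file could look like the sketch below. It is not part of DaliLite: the function name new_pdb_ids is hypothetical, and the code assumes that rsync -v logs each transferred file on its own line as a path ending in pdbXXXX.ent.gz and marks removed files with a leading "deleting".

```shell
#!/bin/sh
# Print the four-character identifiers of the entries that rsync transferred.
# Assumptions: one path per log line, ending in pdbXXXX.ent.gz; lines that
# start with "deleting" are removals and are ignored.
new_pdb_ids() {
    grep -v '^deleting ' "$1" |
        sed -n 's|^.*pdb\([0-9a-z]\{4\}\)\.ent\.gz$|\1|p' |
        sort -u
}
```

Calling new_pdb_ids /data/DaliLite/pdb_update.logs would then print one identifier per line, ready to be passed to import.pl together with the matching file under MIRRORDIR.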

Pairwise comparison

Create a list of PDB + chain identifiers in file query.list. The query structures must have been imported to DALIDATDIR_1 beforehand.
Create a list of PDB + chain identifiers in file target.list. The target structures must have been imported to DALIDATDIR_2 beforehand.
Compare all query structures against all target structures.
```
	cd /home/you/DaliLite.v4
	perl -I bin ./bin/dali.pl --query query.list --db target.list --dat1 DALIDATDIR_1 --dat2 DALIDATDIR_2 --np npara
```
npara is the number of MPI processes. The default is npara=1, which runs the serial version of the software and does not require openmpi. Output is written to nxxxA.txt, where nxxxA is the query identifier. Targets with a Z-score above 2 are reported.
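
The list files can also be generated from the contents of a data directory. This is a minimal sketch under the assumption that every imported structure is stored as identifier.dat (as in ./DAT/1pptA.dat above); the function name list_imported is hypothetical.

```shell
#!/bin/sh
# Print the identifier of every imported structure in a data directory,
# one per line. Assumption: each structure is stored as <identifier>.dat.
list_imported() {
    for f in "$1"/*.dat; do
        [ -e "$f" ] || continue     # directory contains no .dat files
        basename "$f" .dat
    done
}
```

For example, list_imported DALIDATDIR_2 > target.list would put every structure of the target directory on the target list.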

All-against-all comparison

Create a list of PDB + chain identifiers in file query.list. The query structures must have been imported to DALIDATDIR_1 beforehand.

Do an all-against-all comparison of the query structures.

	cd /home/you/DaliLite.v4
	perl -I bin ./bin/dali.pl --query query.list --matrix --dat1 DALIDATDIR_1 2> /dev/null

The matrix option works like pairwise comparison with identical query and target lists. It generates additional outputs named 'ordered' (pairwise Z-scores), 'newick' and 'newick_unrooted' (dendrograms in Newick format).

Example

Execute these commands:

	cd /home/you/DaliLite.v4
	perl -I bin ./bin/import.pl ./toy_PDB/pdb1ppt.ent.gz 1ppt ./DAT
	perl -I bin ./bin/import.pl ./toy_PDB/pdb1bba.ent.gz 1bba ./DAT
	echo 1pptA > query.list
	echo 1pptA > target.list
	echo 1bbaA >> target.list
	perl -I bin ./bin/dali.pl --query query.list --db target.list

The output is 1pptA.txt which looks like this:

# Job: test
# Query: 1pptA
# No:  Chain   Z    rmsd lali nres  %id PDB  Description
   1:  1ppt-A  7.7  0.0   36    36  100   MOLECULE: AVIAN PANCREATIC POLYPEPTIDE;
   2:  1bba-A  3.6  1.8   33    36   39   MOLECULE: BOVINE PANCREATIC POLYPEPTIDE;

# Structural equivalences
   1: 1ppt-A 1ppt-A     1 -  36 <=>    1 -  36   (GLY    1  - TYR   36  <=> GLY    1  - TYR   36 )
   2: 1ppt-A 1bba-A     1 -  33 <=>    1 -  33   (GLY    1  - ARG   33  <=> ALA    1  - ARG   33 )

# Translation-rotation matrices
-matrix  "1ppt-A 1ppt-A  U(1,.)   1.000000  0.000000 -0.000000            0.000000"
-matrix  "1ppt-A 1ppt-A  U(2,.)  -0.000000  1.000000  0.000000           -0.000000"
-matrix  "1ppt-A 1ppt-A  U(3,.)   0.000000 -0.000000  1.000000           -0.000000"
-matrix  "1ppt-A 1bba-A  U(1,.)   0.631906 -0.761372 -0.144939           -0.890845"
-matrix  "1ppt-A 1bba-A  U(2,.)   0.512616  0.550832 -0.658642          -10.882093"
-matrix  "1ppt-A 1bba-A  U(3,.)   0.581308  0.341902  0.738366            4.946664"
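
For downstream scripting, the hit table at the top of the output can be reduced to chain and Z-score pairs. The sketch below is not part of DaliLite; the function name dali_hits is hypothetical, and it assumes the layout of the sample output above, with the hits listed between the "# No:" header line and the next blank line.

```shell
#!/bin/sh
# Print "chain Z-score" for every hit in the summary table of a result file.
# Assumption: hit lines sit between the "# No:" header and the next blank
# line, with the chain in column 2 and the Z-score in column 3.
dali_hits() {
    awk '
        /^# No:/            { in_table = 1; next }
        in_table && NF == 0 { in_table = 0 }
        in_table            { print $2, $3 }
    ' "$1"
}
```

Applied to the 1pptA.txt shown above, dali_hits 1pptA.txt would print the two lines "1ppt-A 7.7" and "1bba-A 3.6".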

DaliLite writes a number of intermediate files to the current working directory (CWD). If a file named dali.lock is present, you cannot start another DaliLite job in the same directory. The lock file is deleted automatically when the job completes successfully.
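
A wrapper script can check for the lock file before launching a job. This is a minimal sketch with a hypothetical function name, run_if_unlocked; it only tests for the presence of dali.lock in the current directory. Whether a leftover lock file from a failed job is safe to delete is left for the user to decide.

```shell
#!/bin/sh
# Run the given command only if no dali.lock exists in the current directory.
# Returns 1 without running anything when the lock file is present.
run_if_unlocked() {
    if [ -e dali.lock ]; then
        echo "dali.lock found in $(pwd): another job is running or did not complete" >&2
        return 1
    fi
    "$@"
}
```

For example: run_if_unlocked perl -I bin ./bin/dali.pl --query query.list --db target.list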

Version history

October 2018: v.4.0 released

Contact

liisa dot holm at helsinki dot fi