Basic Tutorial

We provide here an example of how to use WatCon, intended for new users. In this tutorial, we use WatCon to analyze conserved water molecules within the PTP1B active site, compare water networks across different PTP1B structures, and quantify the conservation of these networks. For simplicity, this tutorial uses only static crystal structures. Further details on analyzing dynamic structures are available in the User Guide.

Preparing Structures

We first obtain a series of structures directly from the Protein Data Bank (PDB). In this case, we use only crystal structures of PTP1B with the WPD loop in the closed conformation, which amounts to 69 structures.
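The tutorial leaves the download method open. One possible approach, assuming you already have a list of closed WPD-loop PTP1B PDB IDs, is to fetch each entry from the RCSB file-download service into the pdbs directory used in the cleaning step. The helper names rcsb_url and download_pdbs are hypothetical and not part of WatCon.

```python
import os
import urllib.request

# RCSB file-download endpoint for entries in legacy PDB format.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_url(pdb_id: str) -> str:
    """Return the RCSB download URL for a 4-character PDB ID (hypothetical helper)."""
    return RCSB_URL.format(pdb_id=pdb_id.upper())


def download_pdbs(pdb_ids, out_dir="pdbs"):
    """Download each entry into out_dir, skipping files that already exist.

    Hypothetical helper; returns the sorted list of files now in out_dir.
    """
    os.makedirs(out_dir, exist_ok=True)
    for pdb_id in pdb_ids:
        path = os.path.join(out_dir, f"{pdb_id.upper()}.pdb")
        if not os.path.exists(path):
            urllib.request.urlretrieve(rcsb_url(pdb_id), path)
    return sorted(os.listdir(out_dir))
```

Call download_pdbs(pdb_ids) with your own list of IDs. Very large entries are distributed only as mmCIF, so their .pdb download will fail and they need separate handling.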

Note that the higher the resolution of a crystal structure (i.e., the smaller its value in Å), the more water molecules are likely to be resolved in it. Ideally, we therefore want crystal structures with the highest possible resolution for water network analysis.
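As an illustration, assuming the raw files sit in the pdbs directory and carry the standard REMARK 2 resolution record, structures can be ranked by resolution and those above a cutoff dropped. The 2.5 Å default is an arbitrary illustrative choice, not a WatCon recommendation, and read_resolution and filter_by_resolution are hypothetical helpers.

```python
import os
import re

# PDB-format resolution record, e.g. "REMARK   2 RESOLUTION.    1.90 ANGSTROMS."
_RES_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS")


def read_resolution(pdb_path):
    """Return the resolution in Angstroms from REMARK 2, or None if absent."""
    with open(pdb_path) as handle:
        for line in handle:
            match = _RES_RE.match(line)
            if match:
                return float(match.group(1))
    return None


def filter_by_resolution(pdb_dir, max_resolution=2.5):
    """Return (filename, resolution) pairs at or below max_resolution, best first."""
    kept = []
    for name in os.listdir(pdb_dir):
        if not name.endswith(".pdb"):
            continue
        res = read_resolution(os.path.join(pdb_dir, name))
        # Entries without a numeric resolution (e.g. "NOT APPLICABLE") are skipped
        if res is not None and res <= max_resolution:
            kept.append((name, res))
    return sorted(kept, key=lambda pair: pair[1])
```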

1. Clean Raw PDBs

We first clean our PDBs using the pdb4amber tool from AmberTools. Although not strictly necessary, this step rewrites our PDBs in a format that Modeller can read for structural alignment. A simple bash script can process all files at once. The script assumes that all PDB files are stored in a directory named pdbs, and it moves the cleaned files to the clean_pdbs directory.

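#!/bin/bash

mkdir -p clean_pdbs

# Clean every PDB file in pdbs/ with pdb4amber (globbing instead of parsing ls)
for file in pdbs/*.pdb; do
    name=$(basename "$file" .pdb)
    pdb4amber -i "$file" -o "${name}.amber.pdb"
done

# Remove auxiliary files written by pdb4amber that are not needed here
rm -f *sslink* *nonprot* *renum*

mv *.amber.pdb clean_pdbs/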
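Because the analysis depends on crystallographic waters, it can be worth confirming that each cleaned file still contains them. The sketch below counts distinct water residues per file using fixed-column PDB parsing. It assumes waters are named HOH (PDB convention) or WAT (common in Amber-formatted files); count_waters and water_counts are hypothetical helpers, not WatCon functions.

```python
import os

# Residue names treated as water (assumption about your files).
WATER_NAMES = {"HOH", "WAT"}


def count_waters(pdb_path, water_names=WATER_NAMES):
    """Count distinct water residues in a PDB file."""
    residues = set()
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() in water_names:
                # Chain ID, residue number, and insertion code identify one residue
                residues.add((line[21:22], line[22:26].strip(), line[26:27]))
    return len(residues)


def water_counts(pdb_dir="clean_pdbs"):
    """Map each .pdb file in pdb_dir to its number of water residues."""
    return {
        name: count_waters(os.path.join(pdb_dir, name))
        for name in sorted(os.listdir(pdb_dir))
        if name.endswith(".pdb")
    }
```

A file reporting zero waters usually indicates that waters were stripped somewhere upstream (for example, by running pdb4amber with its --dry option).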

2. Create FASTA Files

To perform structural and sequence alignments, FASTA files are needed for all proteins. These can be generated in any manner, but we recommend using the built-in functions in the WatCon.sequence_processing module to create FASTA files directly from the clean_pdbs directory.

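import os
from WatCon.sequence_processing import pdb_to_fastas

fasta_out = 'fasta'
os.makedirs(fasta_out, exist_ok=True)

# Write one FASTA file per cleaned PDB
for file in os.listdir('clean_pdbs'):
    if not file.endswith('.pdb'):
        continue
    name = file.split('.')[0]
    pdb_to_fastas(os.path.join('clean_pdbs', file), fasta_out, name)

# Concatenate all FASTA files into all_fastas.fa, separated by blank lines.
# Done in Python (instead of os.chdir plus a shell loop) so later steps can
# keep using paths relative to the original working directory.
with open('all_fastas.fa', 'w') as combined:
    for fa in sorted(os.listdir(fasta_out)):
        if fa.endswith('.fa'):
            with open(os.path.join(fasta_out, fa)) as handle:
                combined.write(handle.read() + '\n')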
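Before aligning, a quick sanity check is that all_fastas.fa contains one record per structure (69 in this tutorial). The minimal parser below assumes standard FASTA with ">" header lines; read_fasta is a hypothetical helper. Records with duplicate headers overwrite each other, so a count lower than expected can also point to repeated names.

```python
def read_fasta(path):
    """Parse a multi-FASTA file into an ordered {header: sequence} dict."""
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank separators added during concatenation
            if line.startswith(">"):
                header = line[1:].strip()
                records[header] = ""  # a repeated header resets its sequence
            elif header is not None:
                records[header] += line
    return records
```

For example, len(read_fasta('all_fastas.fa')) should equal the number of cleaned PDB files.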

3. Align PDBs

Because WatCon partially relies on the Cartesian positions of water molecules, all structures of interest should be aligned before creating networks (in some cases alignment is not required; see the User Guide for details). Here, we use Modeller to align all structures and save them in the aligned_pdbs directory. These aligned structures contain no waters and are therefore not useful for water network analysis, so we then apply the rotation and translation matrices calculated during the alignment to the entire system, including waters, and save the results to the aligned_with_waters directory.

from WatCon.sequence_processing import perform_structure_alignment, align_with_waters

rotation_info = perform_structure_alignment('clean_pdbs')
align_with_waters('clean_pdbs', rotation_info['Rot'], rotation_info['Trans'], out_dir='aligned_with_waters')
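The reason the rotation and translation can simply be reapplied is that a rigid-body transform moves every atom identically, so waters keep their geometry relative to the protein. The NumPy sketch below shows the idea for an (N, 3) coordinate array. It assumes the convention x' = R x + t with R acting on column vectors; the actual layout of rotation_info['Rot'] and rotation_info['Trans'] may differ, and this is not WatCon's implementation.

```python
import numpy as np


def apply_rigid_transform(coords, rot, trans):
    """Apply x' = R x + t to every row of an (N, 3) coordinate array.

    Assumes rot is a 3x3 rotation matrix acting on column vectors and trans
    is a length-3 translation (a convention assumption).
    """
    coords = np.asarray(coords, dtype=float)
    rot = np.asarray(rot, dtype=float)
    trans = np.asarray(trans, dtype=float).reshape(3)
    # Row-vector form of R x + t, applied to protein and water atoms alike
    return coords @ rot.T + trans
```

Since the transform is rigid, interatomic distances (for example, water-protein hydrogen-bond lengths) are unchanged after alignment.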

4. Create Multiple Sequence Alignment

A multiple sequence alignment can be generated either by submitting the all_fastas.fa file to the CLUSTAL webserver (or another alignment server or method) and converting the output to PIR format, or simply by using WatCon's built-in alignment:

from WatCon.sequence_processing import msa_with_modeller

msa_with_modeller('alignment.txt', 'all_fastas.fa')
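The alignment is what allows residues in different structures to share a common index (the MSA_Resid column described later). The sketch below illustrates that idea for one aligned sequence, mapping residue numbers to 1-based alignment columns. It assumes "-" marks gaps and that residues are numbered consecutively from a hypothetical first_resid parameter; real structures with missing residues need their actual numbering, and WatCon's own scheme may differ.

```python
def residue_to_msa_column(aligned_seq, first_resid=1):
    """Map residue numbers to 1-based alignment column numbers.

    aligned_seq is one row of an MSA with '-' as the gap character;
    first_resid (hypothetical) is the number of the first non-gap residue.
    """
    mapping = {}
    resid = first_resid
    for column, char in enumerate(aligned_seq, start=1):
        if char == "-":
            continue  # gaps occupy a column but no residue
        mapping[resid] = column
        resid += 1
    return mapping
```

For example, residue_to_msa_column("M-KV", first_resid=2) returns {2: 1, 3: 3, 4: 4}, so residue 3 in this structure lines up with column 3 in every other aligned structure.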

Run WatCon

Now that we have prepared our structures, we can run WatCon. The easiest way to do this is with an input file. An example input file is provided in the Getting Started section, and further details are provided in the User Guide.

WatCon can then be called on the command line:

$ python -m WatCon.WatCon --input input_file.txt --name PTP1B_closed

Depending on which analyses you choose to conduct, WatCon creates a series of directories, including watcon_output, cluster_pdbs, msa_classifications, and pymol_projections. The types of files contained in each directory are described below.

  • watcon_output: When WatCon is run with an input file, it creates a watcon_output directory containing .pkl files. Depending on your settings, these hold WaterNetwork objects and calculated metrics, which can be loaded into a follow-up Python script (examples are provided in the other tutorials). To read the data stored in a .pkl file, proceed as follows:

import pickle

with open('/path/to/file.pkl', 'rb') as FILE:
    data = pickle.load(FILE)

network_metrics, networks, cluster_centers, pdb_names = data
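When several runs have written files to watcon_output, they can be collected in one pass. This sketch assumes every .pkl file holds the same 4-item tuple unpacked above; load_watcon_outputs is a hypothetical helper. Unpickling WaterNetwork objects requires WatCon to be importable, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle


def load_watcon_outputs(out_dir="watcon_output"):
    """Load every .pkl file in out_dir into a dict keyed by file name."""
    results = {}
    for name in sorted(os.listdir(out_dir)):
        if not name.endswith(".pkl"):
            continue
        with open(os.path.join(out_dir, name), "rb") as handle:
            network_metrics, networks, cluster_centers, pdb_names = pickle.load(handle)
        results[name] = {
            "metrics": network_metrics,
            "networks": networks,
            "cluster_centers": cluster_centers,
            "pdb_names": pdb_names,
        }
    return results
```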

We also provide several built-in post-analysis features that can be used without accessing these files directly. More details are provided in the next section.

  • cluster_pdbs: WatCon can cluster water positions across multiple structures. If you do so, the cluster centers are saved in PDB format and can be visualized with your favorite molecular visualization software. We recommend displaying the cluster centers together with a protein structure to see more easily where they lie relative to the protein. Because the cluster centers were calculated from the aligned input PDBs, they can be loaded alongside any topology file from this collection without fear of misalignment.

Cluster positions from independent WatCon analyses can be viewed together, but the independent structures must be aligned carefully. Different ways to project cluster centers onto non-aligned structures are described in the User Guide.
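Because the cluster centers share the coordinate frame of the aligned structures, they can also be queried numerically, for example to find the sites near an active-site atom whose coordinates you take from one aligned structure. The sketch assumes a cluster PDB stores one center per ATOM/HETATM record in the standard coordinate columns; read_cluster_centers and centers_within are hypothetical helpers.

```python
import math


def read_cluster_centers(pdb_path):
    """Return (x, y, z) tuples from the ATOM/HETATM records of a PDB file."""
    centers = []
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                # PDB fixed columns 31-38, 39-46, 47-54 hold x, y, z
                centers.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return centers


def centers_within(centers, point, cutoff):
    """Return indices of centers within cutoff (Angstroms) of point."""
    return [i for i, center in enumerate(centers) if math.dist(center, point) <= cutoff]
```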

  • msa_classifications: If you use the two-angle water position classification (explained further in the User Guide), the corresponding .csv files are saved in the msa_classifications directory. These files contain the following columns:
  • Frame Index/PDB ID: Identifier of the particular structure or trajectory frame

  • Resid: Residue number (for a given structure file)

  • MSA_Resid: Common residue indexing based on multiple sequence alignment (MSA)

  • Index_1: Atom index (0-based indexing) of interacting protein atom

  • Index_2: Atom index (0-based indexing) of interacting water atom

  • Protein_Atom: Name of interacting protein atom

  • Classification: ‘backbone’ or ‘side-chain’

  • Protein_Coords: Coordinates of interacting protein atom

  • Water_Coords: Coordinates of interacting water atom

  • Angle_1: Calculated angle from protein atom – water atom – reference 1

  • Angle_2: Calculated angle from protein atom – water atom – reference 2
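The two angle columns follow the ordering "protein atom – water atom – reference", which suggests the water atom is the vertex. Under that assumption, a single angle can be computed as sketched below; how the reference points themselves are defined is described in the User Guide, and angle_at_water is a hypothetical helper rather than WatCon's code.

```python
import numpy as np


def angle_at_water(protein_xyz, water_xyz, reference_xyz):
    """Angle in degrees for protein atom - water atom - reference, vertex at the water."""
    v1 = np.asarray(protein_xyz, dtype=float) - np.asarray(water_xyz, dtype=float)
    v2 = np.asarray(reference_xyz, dtype=float) - np.asarray(water_xyz, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))
```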

Note

Atom indexes use 0-based indexing to ensure consistency with MDAnalysis. However, most structure files use 1-based indexing for atom numbers.
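As an example of working with these files, the sketch below reads one classification file, keeps only side-chain contacts, and converts the 0-based Index_1 into a 1-based number. It assumes the column names listed above appear verbatim in the header; side_chain_contacts is a hypothetical helper. Adding 1 matches the atom serial numbers in a structure file only when atoms are numbered consecutively from 1.

```python
import csv


def side_chain_contacts(csv_file):
    """Yield (MSA_Resid, Protein_Atom, 1-based atom number) for side-chain contacts.

    csv_file is an open text file (or any iterable of lines) from the
    msa_classifications directory.
    """
    for row in csv.DictReader(csv_file):
        if row["Classification"] == "side-chain":
            # Index_1 is 0-based (MDAnalysis convention); shift to 1-based
            yield row["MSA_Resid"], row["Protein_Atom"], int(row["Index_1"]) + 1
```

Usage: with open('msa_classifications/<file>.csv', newline='') as handle: contacts = list(side_chain_contacts(handle)).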

  • pymol_projections: This directory contains .pml (PyMOL) files with connection information to project onto protein structures. To display these files properly, first load the corresponding structure and the trajectory frames of interest into PyMOL. Then load the connection information by entering the following into the PyMOL console:

$ @path/to/pml/file

Note

If using trajectories, be sure to load the trajectory frames into the structure before loading the connection information. To speed up loading, we recommend using the start and stop arguments of PyMOL's load_traj command so that only the relevant frames are loaded into the structure. For example:

$ load /path/to/structure
$ load_traj /path/to/trajectory, structure, start=10, stop=20
$ @/path/to/pml_files/15.pml

Run WatCon Post-Analysis

Once WatCon has been run, a separate input file can be used for post-analysis. An example analysis input file is provided in the Getting Started section. Depending on the specifications, post-analysis produces a series of plots along with PDB and .pml files containing conservation information. Tips on calculating and visualizing conservation scores are given in the User Guide. Post-analysis is run from the command line:

$ python -m WatCon.WatCon --analysis analysis_input.txt
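To build intuition for what a conservation measure captures, the sketch below computes a simple occupancy fraction: for each cluster center, the fraction of aligned structures with at least one water oxygen within a cutoff. This is an illustrative metric chosen here, not WatCon's conservation score (see the User Guide for that). site_occupancy is a hypothetical helper, and the 1.0 Å default cutoff is arbitrary.

```python
import numpy as np


def site_occupancy(centers, waters_per_structure, cutoff=1.0):
    """Fraction of structures with at least one water within cutoff of each center.

    centers: (M, 3) cluster-center coordinates.
    waters_per_structure: list of (N_i, 3) water-oxygen coordinates, one per
    aligned structure. cutoff is in Angstroms (illustrative default).
    """
    if not waters_per_structure:
        raise ValueError("need at least one structure")
    centers = np.asarray(centers, dtype=float)
    occupied = np.zeros(len(centers))
    for waters in waters_per_structure:
        waters = np.asarray(waters, dtype=float).reshape(-1, 3)
        if len(waters) == 0:
            continue  # a structure with no waters occupies no site
        # (M, N_i) distances between every center and this structure's waters
        dists = np.linalg.norm(centers[:, None, :] - waters[None, :, :], axis=-1)
        occupied += (dists <= cutoff).any(axis=1)
    return occupied / len(waters_per_structure)
```

A center occupied in nearly every structure (a fraction close to 1) marks a candidate conserved water site.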

We hope that this tutorial provides a sufficient introduction to the basics of a WatCon analysis. For more specific examples and directed guides, we recommend studying the remaining tutorials. Specific advice for effective WatCon usage is also outlined in the User Guide.