Bioinformatics & Homology Modeling

Week 5

The purpose of today's exercise is to build the structure of a protein, beginning with only the DNA sequence of the gene encoding the protein. WHOA, you say: didn't Dr. Koudelka say that we could not predict structure from sequence? Indeed you cannot; however, you can build a reasonable model of a protein IF there is a structural homologue in the protein database.

The example we will be using in this exercise is dicC. DicC is a division control protein of E. coli. We already know from a number of studies that this protein is a DNA binding protein that recognizes specific DNA sequences using a helix-turn-helix supersecondary structure element. For the purposes of today's exercise, we will pretend that we did not know what gene we had in our hot little hands.

PART I- PRELIMINARY ANALYSIS OF DNA & PROTEIN SEQUENCES

TECHNIQUES USED TO IDENTIFY GENES-DATABASE SEARCHING

TECHNIQUES FOR ANALYZING DNA AND PROTEIN SEQUENCES-GCG PROGRAMS.

TECHNIQUES FOR CHOOSING STRUCTURAL HOMOLOGUES

PART II BUILDING A MODEL PROTEIN

Before we begin, you will first need to copy several files that you will need to have for today's exercises.

Bring one of the blue terminal windows forward

Copy the files you will be working with

At the unix '>' prompt type
mkdir homology (this command creates the directory homology)

then type
cp /nsm/home/koudelka/homology/*<space>homology (note there is a space AFTER cp)

then type

cd homology

If you now type ls, followed by the enter key, there should be 8 files displayed:

dicCprot.seq dic_C.seq homology99_1.psv homology99_2.psv homology99_3.psv homology99_4.psv unknown.txt p22.pdb

PART I-PRELIMINARY ANALYSIS OF DNA & PROTEIN SEQUENCES

A. TECHNIQUES USED TO IDENTIFY GENES-DATABASE SEARCHING

Through various genetic manipulations, you have identified and cloned your candidate gene. The output from your Sequencing Facility was poor and you only have 79 bases of readable DNA sequence. That sequence is stored in a file called unknown.txt. Your first task is to search the sequence database at the National Center for Biotechnology Information (NCBI).

1) Start your web browser by typing 'eng netscape'

Go to the URL http://www.ncbi.nlm.nih.gov/.
Click on the "BLAST" button at the top of the page
The BLAST screen now comes up with several options. You have a choice of several ways to search the database. Among others these are:

blastn is the default program and searches the nucleic acid data with a DNA "probe" sequence.

blastp compares an amino acid query sequence against a protein sequence database

blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database

tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames

tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page.
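The choice among these five programs comes down to what kind of query you have and what kind of database you want to search. A minimal sketch of that decision as a lookup table follows; the function name and the string labels are made up for illustration and are not part of BLAST itself.

```python
# Lookup of BLAST program by (query type, database type, compare both as protein?),
# following the list of programs in this handout. Labels are illustrative only.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", False): "blastx",
    ("protein", "nucleotide", False): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}


def choose_blast_program(query_type, database_type, compare_as_protein=False):
    """Return the BLAST program name for a query/database combination."""
    key = (query_type, database_type, compare_as_protein)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program listed for {key}")
    return BLAST_PROGRAMS[key]


# unknown.txt is DNA and we search a DNA database directly.
print(choose_blast_program("nucleotide", "nucleotide"))
```

For today's exercise the query is DNA and the database is DNA, which is why we pick blastn in the next step.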

Click on Standard nucleotide-nucleotide BLAST [blastn]

ENTER THE SEQUENCE IN THE SEQUENCE WINDOW

Sequence entry can be done manually by typing or by pasting in from a file. We will be pasting in from a file:

At the Unix command '>' prompt type 'cat unknown.txt'

--A DNA sequence of 79 bases in length will appear at the command prompt

Highlight the DNA sequence by placing your cursor on the first base of the sequence, click and hold the left mouse button and drag so that the entire sequence is highlighted and then release the button
Make netscape the active window and place the cursor in the gray sequence query box in the middle of the screen and click on the middle mouse button. The sequence now appears in the query box.

Click on the Submit Query button. In a few seconds, a results screen appears giving you a graphical and text based list of all the sequences that matched your DNA sequence.

Scroll down to the text based list and you will see:

                                                                  Score     E
Sequences producing significant alignments:                       (bits)  Value

gi|1787841|gb|AE000253.1|AE000253 Escherichia coli K12 MG16...     157    8e-37
gi|1742585|dbj|D90800.1|D90800 E.coli genomic DNA, Kohara c...     157    8e-37
gi|1742560|dbj|D90799.1|D90799 E.coli genomic DNA, Kohara c...     157    8e-37
gi|312764|emb|X07465.1|ECDICABC E. coli genes dicA, dicB, d...     157    8e-37
gi|5705995|gb|AC000049.11|AC000049 Homo sapiens Chromosome ...      38    0.53
gi|6598396|gb|AC003680.2|AC003680 Arabidopsis thaliana chro...      36    2.1

The first set of blue highlighted text indicates the database where the sequence is found and the "accession number" that represents the file name of the sequence in the database. Following that is a brief description of the sequence file. The next information is the Score (in bits), a scaled measure of the degree of similarity between your probe sequence and a portion of the sequence in the file. The larger this number is, the better the match. The last value on this line is the "E value", the number of matches with a score this good or better that you would expect to find purely by chance in a database of this size. The smaller this number is, the more likely that your sequence and the sequence in the database represent the same gene.
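To see how the bit score and the E value are tied together, here is a small sketch using the standard relation E = (search space) x 2^(-bit score). The search space value is an ASSUMPTION: it is back-calculated from the hit list shown here (0.53 x 2^38), not taken from NCBI, and the bit scores in the list are rounded, so the numbers only agree approximately.

```python
def expected_hits(bit_score, search_space):
    """Expected number of chance matches scoring at least bit_score.

    Uses E = search_space * 2**(-bit_score), where search_space is the
    effective (query length x database length) product.
    """
    return search_space * 2.0 ** (-bit_score)


# ASSUMPTION: effective search space inferred from the hit list (0.53 * 2**38).
SEARCH_SPACE = 1.46e11

for bits in (157, 38, 36):
    print(bits, f"{expected_hits(bits, SEARCH_SPACE):.2g}")
```

Notice that every extra bit of score halves the E value, which is why the four E. coli hits (157 bits) are overwhelmingly more convincing than the human and Arabidopsis hits (38 and 36 bits).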

Clicking on the "Score" for any entry in the list will move you down the page where you can see the actual alignment between the probe sequence and the sequence in the database. CLICK ON the fourth entry in the list (emb|X07465). You will see a more complete sequence description, including the number of bases in the database entry. PLEASE NOTE that the subject sequence is numbered in descending order, from 269 to 191.

Query: 1   atgcttaaaactgacgctcttttgtatttcggttcaaaaacaaaacttgcacaagcagca 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 269 atgcttaaaactgacgctcttttgtatttcggttcaaaaacaaaacttgcacaagcagca 210

Query: 61  ggtattcgtttggcttcgc 79
           |||||||||||||||||||
Sbjct: 209 ggtattcgtttggcttcgc 191

Click on the accession number emb|X07465|ECDICABC and you are transported to a complete sequence annotation, including any publication information.

Scroll down through the sequence annotation to the FEATURES list. This part of the annotation describes both confirmed and unconfirmed sequence features (promoter sequences, coding sequences, etc.). Note that the first feature:

CDS complement(39..269)
    /note="dicC polypeptide"
    /codon_start=1
    /transl_table=11
    /db_xref="PID:g41277"
    /db_xref="SWISS-PROT:P06965"
    /translation="MLKTDALLYFGSKTKLAQAAGIRLASLYSWKGDLVPEGRAMRLQ
    EASGGELQYDPKVYDEYRKTKRAGRLNNENHS"

indicates that our probe or query sequence is found in the region of the database sequence that encodes the dicC polypeptide. However, the coding sequence lies on the complementary strand (i.e., the reverse complement of the sequence in the database is the "coding sequence"). Note that this was already suggested by the results of the BLAST query, where the subject sequence numbering ran backwards (269 to 191).
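If the idea of a reverse complement is new to you, the following sketch may help. It takes the 79-base query from the alignment, produces the strand that is actually stored in the database entry, and maps query positions onto the descending database numbering. The function names are made up for this handout.

```python
# Complement each base, then reverse the string to read the other strand 5'->3'.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def subject_position(query_position, subject_start=269):
    """Database coordinate for a query position when the hit is on the minus strand."""
    return subject_start - (query_position - 1)


# The query sequence, copied from the BLAST alignment in this handout.
QUERY = (
    "atgcttaaaactgacgctcttttgtatttcggttcaaaaacaaaacttgcacaagcagca"
    "ggtattcgtttggcttcgc"
)

database_strand = reverse_complement(QUERY)  # database bases 191..269 as stored
print(len(QUERY), subject_position(1), subject_position(79))
print(database_strand[-3:])  # the atg start codon appears as cat on the stored strand
```

Query base 1 lands on database base 269 and query base 79 on base 191, exactly as the alignment shows.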

To better visualize the orientation of the coding sequence, we will reformat the display output of the sequence page. In the upper left corner of the display window, next to the 'Display' button, set the display option to Graphics. Click on the Display button. In the view that appears, the entire sequence is depicted as a multicolored bar at the top and below that a detailed view of a region of the sequence is given. Unfortunately, this view defaults to the middle of the sequence. Click near the left (5') end of the multicolored bar at the top to view the detailed display of the region we are interested in. As can be seen, the coding sequence reads in the opposite direction.

We have now learned that we are working with a known gene encoding a protein. Reading some of the references in the annotation will lead you to information regarding the protein's structure & function, which we will come back to later. We will now turn to a more detailed analysis of the DNA & protein sequence.

B. TECHNIQUES FOR ANALYZING DNA AND PROTEIN SEQUENCES-GCG PROGRAMS.

In this part of the class, we will be using another software package owned by the CAMBI Computer Facility, the so-called Genetics Computer Group (GCG) software. This software package contains programs used for analysis of the primary sequences of both protein and DNA.

This software resides on the NSM server pinky.nsm.buffalo.edu. There are several preliminary steps that must be done in order for you to run this software successfully from the SGI machines.

Minimize netscape

2) Setting up the GCG Programs

In one of the "blue" terminal windows:

Add the host "pinky" to the control list

At the Unix '>' prompt, type 'xhost +pinky.nsm.buffalo.edu' (no quotes) The machine echoes that you have added this host to the control list.

type rlogin (cambi1.bio, okeefe.bio, ipsi.med or sabine1.med) at the Unix prompt

Enter your password (the one used on the SGI's).

type 'rlogin pinky.nsm' at the Unix prompt (again no quotes)

Enter your password (the one used on the SGI's).

Set the environmental display variable to display on your screen.

type EXACTLY!

setenv DISPLAY machineyouaresittingat:0.0

This command must be typed exactly as displayed; it is case (capitalization) sensitive. Replace machineyouaresittingat with the name of the machine you are at. It has the name plastered on the front, e.g. azure.eng.buffalo.edu. This name is followed by a colon (:) and 0.0 (zero.zero). There is no echo from this command.

Start the GCG programs

At the unix > prompt type 'eng gcg' (no quotes)

The program now starts.

Open an X-window for displaying 2-D graphical output

At the unix '>' prompt type

xwindows

Hit the ENTER key to accept the COLORWORKSTATION default.

A GCG graphics window now appears.

Sequence analysis using the GCG Programs

a. Mapping the sequence-Generating a restriction map

Type map dic_C.seq

Hit the ENTER key to default through the beginning and end of the sequence regions to be analyzed

Hit the ENTER key to default to accept all ENZYMES for the restriction map

The next subcommand asks you about protein translation frames:

What protein translations do you want:

a) frame 1 b) frame 2 c) frame 3

d) frame 4 e) frame 5 f) frame 6

t)hree forward frames s)ix frames o)pen frames only

n)o protein translation q)uit

Please select (capitalize for 3-letter) (* t *):

At the : prompt type a capital A and hit enter. This selects a three letter designation in coding frame 1.

Hit the ENTER key to accept the default filename dic_C.map.

b. Viewing the output.

Type more<space>dic_C.map

A portion of the sequence with a restriction map above and a protein translation below becomes visible. To look at the rest of the file, hit the space bar until you reach the end of the file and the unix '>' prompt appears again. Note that you also get a list of enzymes that do and do not cut your DNA sequence.

Alternatively you could type

cat<space>dic_C.map

and the entire file would appear. You can then use the window's slider bar to look at the file.
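What map is doing underneath is, at heart, a string search: it looks for each enzyme's recognition sequence in your DNA. A minimal sketch of the idea is given here, using just three well-known enzymes; the function names and the test sequence are hypothetical, and this is not how GCG itself is implemented. Reported positions are where the recognition site starts (1-based), not the exact cut position.

```python
# Recognition sequences for three common enzymes. All three are palindromic,
# so searching one strand is enough.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(dna, sites=RECOGNITION_SITES):
    """Return {enzyme: [1-based start positions of each recognition site]}."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start + 1)
            start = dna.find(site, start + 1)
        hits[enzyme] = positions
    return hits


def split_cutters(hits):
    """Separate enzymes that cut from those that do not, as the map output does."""
    cutters = sorted(enzyme for enzyme, pos in hits.items() if pos)
    non_cutters = sorted(enzyme for enzyme, pos in hits.items() if not pos)
    return cutters, non_cutters


example = "ccgaattcttaagcttgg"  # hypothetical test sequence, not dic_C.seq
hits = find_sites(example)
print(hits)
print(split_cutters(hits))
```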

Protein analysis using the GCG

We will now try to get a preliminary indication of the possible secondary structural elements in the dicC polypeptide. Before doing this, we must first translate our DNA sequence into a protein.

Translating DNA

Type translate dic_C.seq

Hit the ENTER key to default through the beginning and end of the sequence regions to be analyzed

Hit the ENTER key to accept the beginning and end of the sequence regions to be analyzed

Type 'w' to write the translation to a file

Hit the ENTER key to accept the default file name dic_C.pep

Now we will analyze the peptide structure based on the physical properties of its constituent amino acids

Using peptidestructure

Type peptidestructure dic_C.pep

Hit the ENTER key to default through the beginning and end of the sequence regions to be analyzed

Hit the ENTER key to accept the default hydrophilicity index of Kyte-Doolittle

Hit the ENTER key to accept the default output file name dic_C.p2s

Now we will display the results of this analysis in the xwindow we opened earlier.

c. Using plotstructure

Type plotstructure dic_C.p2s

Hit the ENTER key to default through the beginning and end of the sequence regions to be analyzed

Choose a 1 dimensional panel graph plot by typing '1'

Press the <ENTER> key when prompted for a <RETURN>

A plot summarizing the hydrophilicity, surface probability, helicity, β-sheet probability, and several other features of the protein primary sequence appears in the GCG xwindow. Be sure to note where the helices and sheets are predicted to occur.
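The hydrophilicity trace in this plot is essentially a sliding-window average over per-residue values. Here is a rough sketch using the Kyte-Doolittle scale and the dicC sequence from the database annotation. The window size of 7 is an ASSUMPTION (GCG's own window and sign convention may differ); on the Kyte-Doolittle scale as given here, positive values mean hydrophobic.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Average hydropathy over each window of consecutive residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# dicC polypeptide, from the /translation entry in the sequence annotation.
DICC = (
    "MLKTDALLYFGSKTKLAQAAGIRLASLYSWKGDLVPEGRAMRLQ"
    "EASGGELQYDPKVYDEYRKTKRAGRLNNENHS"
)

profile = hydropathy_profile(DICC)
most_hydrophobic = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {most_hydrophobic + 1}")
```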

d. Exiting the GCG

Click the Exit button on the GCG xwindow

In the window connected to pinky, type logout

C. TECHNIQUES FOR CHOOSING STRUCTURAL HOMOLOGUES

The next step in this process is to use the sequence information, along with other information gathered along the way to try and build a model of the dicC protein based on its sequence and structural homologues.

One question you may have is how to decide what are the structural homologues of the protein. Although the computer can HELP, you must do a little literature-reading legwork on your own. A way the computer CAN help is to speed up the literature search. This can be done in BLAST.

Finding references

Click on your minimized netscape icon

Scroll back to the top of the file where the literature references are and find Reference 3

REFERENCE 3 (bases 1 to 4441)

AUTHORS Bejar,S., Bouche,F. and Bouche,J.P.

TITLE Cell division inhibition gene dicB is regulated by a locus similar

to lambdoid bacteriophage immunity loci

JOURNAL Mol. Gen. Genet. 212 (1), 11-19 (1988)

MEDLINE 88232418

Click on the blue Medline reference number and you are transported to a page with the abstract of an article that highlights particular structural homologies between dicC and other proteins, in particular, DNA binding proteins from the bacteriophage P22.
Notice that the abstract states that sequence comparisons indicate that dicA and dicC are similar to genes c2 and cro of bacteriophage P22. Type "bacteriophage P22" (in quotes) AND c2 in the text box at the top.
Before clicking on GO, notice the pulldown menu on the left side of the text box at the top. It is defaulted to PubMed, i.e., the database list of publications. If you examine the other items in the pulldown, you will see that you may choose among many databases, protein, nucleotide, structure, AHH! STRUCTURE!!! Change the setting to structure, then click go!
BINGO, there is a structure for the P22 c2 protein, whose PDB ID code is 1ADR. Click on the 3D Domains and then Related 3D Domains links and you are transported to a wondrous page listing all of the structural homologues of P22 c2 and, by extension, of the protein that you have found. You can then use the PDB ID codes on this page to download the structures for homology modeling.

PART II BUILDING A MODEL PROTEIN

Using the procedure described above, I have identified three structural homologues of the dicC protein. These are λ repressor, 434 repressor, and P22 repressor. I have downloaded these structures and placed them in a folder called homology99_1.psv. We will begin building our model protein by first examining the structures of these proteins.

1) File

Restore Folder- homology99_1.psv

The folder contains three objects; restore them all by making sure there is an * in the objects window. The three objects are three structures of bacteriophage repressors: λ repressor <LAMREPCOR>, 434 repressor <REPH2O> and P22 repressor <P22R>. These proteins are all helix-turn-helix (HTH)-containing proteins. The HTH is colored differently from the rest of the atoms in each of the three proteins: 434 repressor (lilac/white); λ repressor (brown/teal) and P22 repressor (red/yellow).

View each protein individually, paying close attention to the conserved HTH motif. To view each protein:

Object-Blank-on <object>

Repeat until only one object appears on the screen. (Note: Using Blank to turn an object off and on preserves its display characteristics, e.g. when an object is 'unblanked' in these cases, it comes back as a backbone trace. If you had used Molecule-Display/Undisplay, it would have come back as a complete structure.)

Turn all objects back on (unblank them)

2) Open Module-Homology

The purpose of the rest of this exercise is to build a structure for dicC, which is homologous to the three we have displayed on the screen, but whose 3-D structure is unknown. This built protein will draw on known sequence and structural homologies in these HTH proteins.

a) Sequences-

Extract. Extract the protein sequences for the three proteins displayed on the screen by clicking, in turn, on their names in the value aid. Please do this for LAMREPCOR first, followed by P22R and REPH2O. As the first sequence appears, a new window opens on the bottom of the screen. As each subsequent sequence appears, a slider bar appears on the bottom of the sequence window. Also, the structure of each protein becomes represented by a Cα trace. You can use the slider bar to move the sequence back and forth. Clicking on any residue shows the identity in the command window on the very bottom of the Insight screen.

At this point, align the sequences by clicking USING THE MIDDLE MOUSE BUTTON and dragging the second amino acid of each sequence so that Gln (Q) 117 of REPH2O and Q21 of P22R and Q33 of LAMREPCOR are aligned on top of each other. This residue is the first residue of the DNA recognition helix of the HTH motif. To aid you in this, in the sequence box, click color by Cα.

The next part of this exercise involves the identification of structurally conserved regions (SCRs) in the three proteins. This is done by assessing both RMS differences in superpositioning of backbone atoms in homologous positions in the three proteins and a comparison of their sequence similarities.

b) Object-Blank-ON-LAMREPCOR-execute

c) Alignment-

Structure-Manual-Execute

d) Alignment Pairwise_Sequence

Manual

Scoring Matrix → Mutation

Execute

e) Boxes-

Initialize-Click on Q117 of REPH2O and Q21 of P22R and then click on execute. The box now turns green.

IMPORTANT In the sequence window, next to the Mode command, click on "Box"

Using the right mouse button, extend the box three amino acids towards the C-terminus.

What happens?

Note the movement of the structure, and observe the RMS deviation and sequence homology values on the command line in the lower left part of the screen. A good structural alignment has a low RMS value; a good sequence homology has a high homology score. Move the box until it encompasses the sequence of both proteins from Q117/Q21 through V124/V28.
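The RMS value reported on the command line measures how far apart matching atoms are, on average, once the two structures are laid on top of each other. A bare-bones sketch of the calculation follows. It assumes the two coordinate sets are already superimposed and listed in matching order; the superposition itself, which Homology does for you, is not shown. The coordinates below are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical backbone atoms (in Angstroms) from two superimposed fragments.
fragment_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
fragment_2 = [(0.0, 0.0, 0.0), (1.5, 0.0, 1.0), (3.0, 0.5, 0.0)]
print(round(rmsd(fragment_1, fragment_2), 3))
```

A value near zero means the backbones nearly coincide; as you stretch the box over residues whose backbones diverge, the RMS value climbs.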

We now need to compare the sequence and structure homologies between REPH2O and LAMREPCOR and also between P22R and LAMREPCOR. Before we can do this we have to 'freeze' the sequence box we have already created.

f) Boxes Freeze-click in the sequence box you created. The number 0 appears in the white window. That is this box's number. Press execute. The sequence box turns red, indicating that it is now 'frozen'.

Unblank LAMREPCOR and blank P22R

g) Boxes Initialize-Click on REPH2O Q117 and LAMREPCOR Q33. Execute

Extend this box to encompass REPH2O Q117-V124 and LAMREPCOR 33-40.

h) Boxes Freeze-click in the sequence box you created. The number 1 appears in the yellow window. That is this box's number. Press execute. The sequence box turns red, indicating that it is now 'frozen'.

Unblank P22R and blank REPH2O.

i) Boxes Initialize-Click on P22R Q21 and LAMREPCOR Q33. Execute

Extend this box to encompass P22R Q21-V28 and LAMREPCOR 33-40.

j) Boxes Freeze-click in the sequence box you created. The number 2 appears in the yellow window. That is this box's number. Press execute. The sequence box turns red, indicating that it is now 'frozen'.

In a real model building case, you would continue defining as many SCRs as you can find, but since this takes some time and experience, I have done this for you for two more SCRs. These are stored in the folder homology99_2.psv.

DELETE ALL OBJECTS BEFORE RESTORING NEXT FOLDER

Restore Folder-homology99_2.psv

The folder contains three objects; restore them all by typing an * in the object window. The three objects are three structures of bacteriophage repressors: λ repressor <LAMREPCOR>, 434 repressor <REPH2O> and P22 repressor <P22R>. Also, the sequences of these three proteins appear, along with three SCR boxes, the one you defined and two others.

Our next operation is to begin the process of assigning coordinates to our model-built protein. The first step in doing this is to make 'super'-sequence boxes, which summarize all the information in the three overlapping frozen sequence boxes in each of the three different SCRs.

a) Boxes Summarize → Create → Execute

Two things happen. A white box appears around each SCR. A series of subsets is created, four for each protein. List these by

b) Subset List

<objectname>$SCR- contains all the structural information in the three structurally conserved regions for a particular object.

<objectname>$SCR1-3 Each contains the structural information for one structurally conserved region for a particular object.

The next thing we will do is to finally visualize the degree of structural homology between these proteins in the sequence conserved region by superimposing them.

c) Transform Superimpose-

Molecule Pick Level Subset

Superimpose -Backbone

Label Mode-Off

Source P22R$SCR

Target REPH2O$SCR

End definition ON

Execute

The two molecules now superimpose on the screen. Note the RMS difference

d) Transform Superimpose-

Molecule Pick Level Subset

Superimpose -Backbone

Label Mode-Off

Source LAMREPCOR$SCR

Target REPH2O$SCR

End definition ON

Execute

The two molecules now superimpose on the screen. Note the RMS difference. Which pair of molecules is more alike?

We are now finally ready to begin building the new protein's structure

e) Sequences Get → dicCprot.seq

Import Mode → Single

File Fmt → Biosym

Execute

Another sequence line now appears at the bottom of the screen. It is, however, unaligned, both in sequence and proposed structure, with the other three proteins. Aligning it is the next thing we must do. Unfortunately the alignment program in Insight is terrible at recognizing more distant relationships, so I am incorporating information from a GCG alignment I did on these protein sequences. You can also use published alignments at this step. To use this manual alignment strategy:

Alignment

Pairwise_Sequence

OFF

Execute

Now Click on SEQUENCE next to Mode in the Homology window.

Align the sequences by clicking USING THE MIDDLE MOUSE BUTTON and dragging the first residue of dicCprot so that it is aligned with Ile 21 of LAMREPCOR.

DO NOT DO THIS TODAY-BUT IF YOU ARE INTERESTED IN DOING AN AUTOMATIC ALIGNMENT FOLLOW THE PROCEDURE IN f)

f) Alignment Pairwise_Sequences Automatic

Scoring Matrix-Mutation

Gap Penalty-15 (This is very important!!!)

Sequence 1 REPH2O

Sequence 2 dicCprot

Execute

Now we determine which protein is most likely to best predict the structure of the new protein. This is done solely by sequence homology.

g) Alignment Pairwise_Sequences Manual Execute

h) Alignment

Structure Off Execute

If the sequence in the window moves up and out of view, use the scroll slider on the right of the sequence window to bring it back into view.

We assess sequence homology pairwise dicCprot vs. REPH2O, P22R, and LAMREPCOR individually in each of the three SCR regions. This takes some time so you will only do it for one here.

IMPORTANT In the sequence window, next to the Mode command, click on "Box"

i) Boxes Initialize: Pick dicCprot K13 (on the sequence bar) and LAMREPCOR Q33. Execute

Enclose the entire SCR region with the box and record the homology score.

j) Boxes Initialize: Pick DicCprot K13 (on the sequence bar) and REPH2O Q117. Execute

Enclose the entire SCR region with the box and record the homology score.

k) Boxes Initialize: Pick DicCprot K13 (on the sequence bar) and P22R Q21. Execute

Enclose the entire SCR region with the box and record the homology score.

Which pair has the highest homology score?

Normally, we would now delete the unwanted sequence boxes using Boxes-Delete and then repeat the process comparing the DicCprot sequence with the three other proteins in the two remaining SCRs using a process identical to that here. We would then delete the unwanted boxes and freeze the three good ones. I have done that for you, and here are the homology scores:

Protein      SCR1    SCR2    SCR3
LAMREPCOR    -11     -6.28   -8
REPH2O       6.25    16.25   -4
P22R         -5      10      10
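Choosing which protein supplies the coordinates for each SCR is simply a matter of picking the highest homology score in each column. The short sketch below does this with the scores from the table; the function name is made up for illustration.

```python
# Homology scores of dicCprot against each reference protein, copied from the table.
HOMOLOGY_SCORES = {
    "LAMREPCOR": {"SCR1": -11, "SCR2": -6.28, "SCR3": -8},
    "REPH2O": {"SCR1": 6.25, "SCR2": 16.25, "SCR3": -4},
    "P22R": {"SCR1": -5, "SCR2": 10, "SCR3": 10},
}


def best_template_per_scr(scores):
    """For each SCR, return the reference protein with the highest homology score."""
    regions = next(iter(scores.values())).keys()
    return {
        region: max(scores, key=lambda protein: scores[protein][region])
        for region in regions
    }


print(best_template_per_scr(HOMOLOGY_SCORES))
```

REPH2O wins SCR1 and SCR2 and P22R wins SCR3, so those are the templates we will take coordinates from.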

DELETE ALL OBJECTS BEFORE RESTORING THE NEXT FOLDER

PART III

Restore Folder-homology99_3.psv

Use the scroll slider on the right of the sequence window to bring the SCR regions into view

We will now use the SCR region of the three segments to assign coordinates to our new protein. We are using the coordinates of REPH2O for SCRs 1&2 and that of P22R for SCR 3.

a) Sequence

Assign Coords: Pick in the lower row of each of the red boxes and then Execute. (If you have trouble doing this, type 12 in the value aid box and click Execute. Repeat this using Box 13 and Box 14.) After each operation, notice that a new segment of protein structure appears on the screen.

Simplify the display by displaying only the Cα trace of DicCprot using

Molecule

Molecule Pick Level → Molecule

Display-Only

Specified-Backbone

Execute.

This was the easy part. An α-helix is an α-helix. Most any helical structure of the right length will be fine. Now for the hard part: assigning the structures to the loops between the helices. Homology allows you to do this in any of three ways: 1) search the entire PDB with your loop's amino acid sequence and display the structure of the top ten candidates; 2) randomly generate loop structures using the ψ and φ constraints of Ramachandran; 3) assign structure from one of the reference proteins. This last option is best IF the sequence of the unknown loop matches that of one of the known proteins. Since this is not true in our case today, we are left with 1 or 2. Since we don't have the time to search the entire PDB, we will proceed using strategy 2.

b) Loops Generate-Start search at DicCprot A20 (click on this residue in the sequence bar)

End loop at DicCprot A24

Automatically, the number of residues that are FLEXible appears in the FLEX window. Several other settings must be made.

Convergence: 0.001

Internal Overlap: 0.5

External Overlap: 0.5

Iterations: 1000

Execute.

The program generates ten loops and creates them as subsets Choice$1-10. The program leaves you in Loops-Display. You now must evaluate the best loop choice for the region of interest. You can do this by inspection and RMS evaluation.
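To give a feel for what "randomly generating loops within the Ramachandran constraints" means, here is a toy sketch that draws random backbone dihedral angles and keeps only those falling in allowed regions. The rectangular regions are crude ASSUMPTIONS made for illustration, and the sketch ignores loop closure and the overlap checks that Homology applies; it is not the program's actual algorithm.

```python
import random

# ASSUMPTION: rough rectangular stand-ins for allowed Ramachandran regions,
# given as ((phi_min, phi_max), (psi_min, psi_max)) in degrees.
ALLOWED_REGIONS = [
    ((-160.0, -45.0), (-70.0, -20.0)),  # roughly right-handed helical
    ((-160.0, -45.0), (100.0, 180.0)),  # roughly extended (beta)
]


def is_allowed(phi, psi):
    """True if the (phi, psi) pair falls inside any allowed rectangle."""
    return any(
        lo_phi <= phi <= hi_phi and lo_psi <= psi <= hi_psi
        for (lo_phi, hi_phi), (lo_psi, hi_psi) in ALLOWED_REGIONS
    )


def random_residue(rng):
    """Rejection sampling: draw angles until a pair lands in an allowed region."""
    while True:
        phi = rng.uniform(-180.0, 180.0)
        psi = rng.uniform(-180.0, 180.0)
        if is_allowed(phi, psi):
            return phi, psi


def generate_loops(n_residues, n_loops=10, seed=0):
    """Return n_loops candidate loops, each a list of (phi, psi) per residue."""
    rng = random.Random(seed)
    return [
        [random_residue(rng) for _ in range(n_residues)]
        for _ in range(n_loops)
    ]


# A20 through A24 is five residues if both ends are counted (an assumption).
loops = generate_loops(n_residues=5)
print(len(loops), len(loops[0]))
```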

Zoom in on the region where the loop will be placed.

c) Loops Display: Pick Loop choice to display by toggling it to on. Execute

The loop, with side chains, now appears in the region. The RMS difference between the closest structure is calculated and displayed on the screen. Toggle through the loops, one by one, and decide which is the "best" by toggling them on and off and recording the RMS distance. After you have found the best loop, redisplay it. Now we will graft it on to the structure.

d) Loops

Assign Coords: Your best fit loop that you left on in display should be selected. If it is, then just Execute.

In the real case we would repeat this for the other loop between helices 1 & 2. However, in the interests of time, I have done this for you. The completed protein is in the folder homology99_4.psv.

DELETE ALL OBJECTS BEFORE RESTORING THE NEXT FOLDER

PART IV

Restore Folder-homology99_4.psv

On the screen appears the preliminary model structure of the DicCprot. However, certain things are missing, namely, the amino and carboxyl terminal ends. In order to proceed, we must let Homology put these on.

a) Refine

End repair DicCprot Execute

The ends of the molecule are now added in an extended chain conformation.

The molecule is still not done. Two things remain. First, the existing model of the rest of the protein contains several steric overlaps. You may have noticed messages throughout the entire exercise indicating that to you. Second the ends of the molecule almost undoubtedly do not exist as extended chains. They must be minimized into a more reasonable configuration. First, we will identify the bad contacts.

b) Measure Bump Molecule-DicCprot

Overlap value- 0.5 (Å)

Option-intra

Monitor-off (IMPORTANT)

Execute

What appears in the pop-up textport is a large list of contacts within the molecule that are too close to be real. You must run a Discover job to remove these.
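A bump check of this kind can be pictured as comparing every pair of atoms against the sum of their van der Waals radii. The sketch below flags pairs whose spheres interpenetrate by more than the overlap value. This definition of "overlap" is an ASSUMPTION about what the Bump command measures, the atoms are made up, and a real program would also skip atoms that are bonded to each other.

```python
import math
from itertools import combinations

# Van der Waals radii in Angstroms (Bondi values).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}


def find_bumps(atoms, overlap=0.5):
    """Return (name1, name2, distance) for atom pairs that are too close.

    atoms: list of (name, element, (x, y, z)).
    A pair is a bump when (radius1 + radius2 - distance) exceeds overlap.
    ASSUMPTION: bonded neighbours are NOT excluded in this toy version.
    """
    bumps = []
    for (name1, elem1, pos1), (name2, elem2, pos2) in combinations(atoms, 2):
        distance = math.dist(pos1, pos2)
        penetration = VDW_RADII[elem1] + VDW_RADII[elem2] - distance
        if penetration > overlap:
            bumps.append((name1, name2, round(distance, 2)))
    return bumps


# Hypothetical atoms: CA and O sit only 2.0 Angstroms apart; N is far away.
atoms = [
    ("CA", "C", (0.0, 0.0, 0.0)),
    ("O", "O", (2.0, 0.0, 0.0)),
    ("N", "N", (10.0, 0.0, 0.0)),
]
print(find_bumps(atoms))
```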

c) Click FF to show: Forcefield Potentials Molecule DicCprot

VERY IMPORTANT!!!- Click on Fix in each area of the control panel

Execute

d) Open Module-Discover_3

IF THE PROGRAM DOES NOT LET YOU EXIT WITHOUT FIRST EXAMINING WHETHER TO SET POTENTIALS ON CHOICES$1-10 AND THE THREE PROTEINS, JUST CANCEL THROUGH THE LIST

Choose Setup → System

Click the Expert button ON

Select dicCprot as the Assembly/Molecule

Execute

Choose Specify → Non-bonds

Click Dielectric-Distance Dependent

Execute

Choose Calculate → Minimize

Set Run Max Steps 1000

Newton-Off

Execute

Choose D_Run → Run

select dicCprot0 as the job

Execute

After the job starts (as noted by the green text and molecule movements):

h) Background_Job

Select Control_Background_Job

Under Control Mode select Detach from Job

Click on dicCprot0 in the job window

Execute

This allows the job to proceed more quickly because the program now no longer must continually update the display.
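While the job runs, a word on the Dielectric-Distance Dependent setting chosen in the Non-bonds step. With a distance-dependent dielectric, the dielectric "constant" grows with the separation between two charges, which crudely mimics solvent screening when no water is present in the model. The sketch below compares the two forms of Coulomb's law; the form dielectric = epsilon x r is a common convention and an ASSUMPTION here, since the exact expression Discover uses is not given in this handout.

```python
# Coulomb constant in kcal*Angstrom/(mol*e^2), for charges in units of e.
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, r, distance_dependent=True, epsilon=1.0):
    """Electrostatic energy (kcal/mol) between two point charges r Angstroms apart.

    distance_dependent=True uses dielectric = epsilon * r, so the energy
    falls off as 1/r**2 instead of 1/r.
    """
    dielectric = epsilon * r if distance_dependent else epsilon
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


for r in (2.0, 4.0, 8.0):
    constant = coulomb_energy(1.0, -1.0, r, distance_dependent=False)
    screened = coulomb_energy(1.0, -1.0, r)
    print(r, round(constant, 2), round(screened, 2))
```

The further apart two charges are, the more strongly their interaction is damped, so distant charged side chains do not dominate the minimization.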

After you receive notification that the job has finished, we will look at the minimized structure:

i) Molecule Get Archive DicCprot0.cor Execute

This restores the output of the Discover run you set up above. Notice how the molecule has changed. Confirm that Discover has done its job of removing steric overlaps by re-running Bump.

j) Measure Bump Option-intra

Molecule-DicCprot0.cor (IMPORTANT)

Overlap value- 0.5 (Å)

Monitor-off

Execute

The command should execute, but no textport appears. This is because the overlaps have been removed.

You have now successfully created a model of the dicC protein. We will be using this model of the dicC protein later this semester when we analyze the mechanisms of DNA recognition by proteins. Therefore, we will save this structure for later use.
