Computer-based molecular modeling in drug discovery

BioDocFL · Post by **BioDocFL** » Sun May 25, 2008 8:26 am

Computer modeling in drug discovery

I have been asked to comment on a treatment protocol being put forward based on computer modeling of small molecules in the ligand binding sites of receptors. I am not familiar with the protocol but I can comment on some of the many caveats that must be considered when interpreting computer-based modeling of drug-protein interactions. The bottom line is that the modeling by itself is only suggestive of possibilities, not proof. It requires a lot more work with experimental testing using in vitro assays, cell-based assays, structural biology work, and then it can proceed into animal models, and eventually into clinical trials.

First, the computer modeling begins with a 3D structure determined by X-ray crystallography or NMR, or with a homology model built from a similar known structure. All of these approaches have their drawbacks in providing a true representation of the protein. The 3D coordinates are based on a pretty static view of the protein. For X-ray work, the protein needs to be crystallized into a highly repetitive stacking of the homogeneous protein molecule so that, when the X-rays are directed at the crystal, a consistent pattern of diffractions can be seen as the X-rays hit the individual atoms and have their path altered. Doing this with millions of the molecules in a crystal structure then allows for the average position of each atom of the basic molecule to be mathematically determined. These are just average positions and can be off by the resolution distance given for a crystal structure, such as 2 angstroms, which is roughly the length of a typical atom-atom bond (a carbon-carbon single bond is about 1.5 angstroms). So a structure with a 5 angstrom resolution is very crude. A structure is considered to be a good depiction if it is 2 angstroms or less. There are more than 45,000 protein structures posted on the Protein Data Bank (www.rcsb.org) where most researchers go for proteins to model.

Now, in generating a crystal structure, the crystallographer (or structural biologist) has to work out the conditions to get the proteins to stack. If the proteins get too tightly packed, they precipitate out and are just a clump of junk. If the proteins are too loose, they are still in random positions relative to each other so the average positioning of atoms cannot be determined. Even under the best circumstances, some of the amino acids at either end of the protein may remain undefined because they are still free to move about, giving random data as to their position relative to the rest of the protein. You'll notice this in many of the published structures: often residues 1-4 or so are missing. To simplify the work, sometimes the crystallographer will work with only the supposed critical portion of the protein, such as the enzyme active site. This can usually give a more definitive picture of that portion of the protein but it can miss allosteric sites, those sites away from the protein's ligand binding site that, nevertheless, still have a significant effect on the protein activity.

As the crystallographer works out the conditions, the best crystals for X-ray work are often obtained in very harsh conditions, far from the in vivo conditions. So the crystal may only form near 0°C (32°F) and with a high salt content in the buffer. Crystallographers often put heavy metal ions in too since these give strong diffraction patterns that help the crystallographer initially in seeing the overall repetitive pattern of the molecule, i.e., is the molecule always in the same orientation, or do two of them appear as a set related by a 180° rotation, or three at 120°, or some even more complex repetitive pattern. By the time a crystal is obtained, the proteins may be so dense (but still soluble) that they distort each other's atom positions due to the pressures of their stacking. So the static picture that results may have a distorted ligand binding site and be missing some sites on the protein that are important to overall activity. The final structure may also lack required cofactors. And key water molecules may be missing, where a specific water molecule in the structure can provide an electrostatic bridge that helps hold a ligand in the active site.

In the cell, the proteins are very flexible and have a lot of motion, motion of the side-chains off each amino acid, and motion of large domains of the protein. Many proteins are more active when they bind in homodimeric (binding with another of the same protein) or heterodimeric groups (binding with a different protein). Also, some proteins may be embedded in a lipid layer (detergent-like or oily-like) and then move into the more aqueous environment of the cell as they translocate to the nucleus to convey a signal. It is difficult to crystallize the protein in a heterogeneous environment like this but those different environments can be very important in the proteins' normal functioning. The motion of the protein needs to be taken into account since it can be very dramatic. Kinases are a popular target in cancer-related drug discovery but kinases are usually very flexible. The active sites often can be tight or open very wide, giving the possibility that drugs of a range of sizes could be effective inhibitors. This flexibility of protein structures is just now being addressed in virtual screening but is very important.

The software used in molecular modeling and virtual screening (docking thousands of small molecules into a protein's active site one at a time and calculating the potential binding energies to see if each molecule could be a good inhibitor) is very new; perhaps only in the past 10 years has it been effective, and it is getting better. However, many virtual screening labs will use software from more than one vendor so that a consensus is obtained about particular molecules as drug candidates. Using several different software packages to do the same screening means several different algorithms are used in determining whether a particular molecule is a good potential inhibitor. The weaknesses of one algorithm may be compensated by the other algorithms so that the truly better molecules will show the best results virtually. These are only potential candidates. To go so far as to predict binding affinities and other kinetic data, (IC50 values) is well beyond the capabilities of the software now to think that they are accurate values. Virtual results always, always, always need experimental confirmation, always. We usually do not want to show the molecule structure to our chemists until we can confirm the molecule's effect experimentally. The chemists don't want to go on just computer-based (in silico) results.
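To make the consensus idea concrete, here is a minimal sketch of one common way to combine several docking programs: rank the molecules within each program, then order them by their average rank. The program names, molecule names, and scores are made up for illustration, and the sketch assumes a lower (more negative) score means better predicted binding. It is not the scoring used by any particular vendor or lab.

```python
from statistics import mean


def consensus_ranking(scores_by_program):
    """Order molecules by their mean rank across several docking programs.

    scores_by_program maps a program name to {molecule: score}.
    Assumption: a lower (more negative) score means better predicted binding.
    Only molecules scored by every program are ranked.
    """
    molecules = set.intersection(*(set(s) for s in scores_by_program.values()))
    ranks = {m: [] for m in molecules}
    for scores in scores_by_program.values():
        # Rank 1 is the best-scoring molecule for this program.
        ordered = sorted(molecules, key=lambda m: scores[m])
        for position, molecule in enumerate(ordered, start=1):
            ranks[molecule].append(position)
    # Sort by mean rank; break ties by name so the result is deterministic.
    return sorted(molecules, key=lambda m: (mean(ranks[m]), m))


if __name__ == "__main__":
    # Hypothetical scores from three hypothetical docking programs.
    example = {
        "program_a": {"mol1": -9.1, "mol2": -7.0, "mol3": -8.2},
        "program_b": {"mol1": -8.5, "mol2": -8.9, "mol3": -8.7},
        "program_c": {"mol1": -10.0, "mol2": -6.5, "mol3": -7.5},
    }
    print(consensus_ranking(example))
```

Ranks are used instead of raw scores because different programs report scores on different scales. Even a molecule at the top of a consensus list is still only a candidate for experimental testing.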

Once a molecule confirms in the experimental assays, we will try it in cell-based assays to see if it gets into the cell, a major issue. And we will try to determine experimentally if it is hitting the target we expect it to hit. And we will take the molecule and try to co-crystallize it with the protein to see if it is docking in the active site as our original screening predicted.

So, what I am saying is that there is a lot of experimental work that must be done before validity can be given to the virtual work. When we virtually screen a protein target using a library of 50,000 molecules, for example, we do not claim that our virtual screening results give a true drug candidate in the top 5,000 molecules based on virtual scores. We think of it as enriching the top 10-20% to improve our chances of finding a good molecule quickly without having to experimentally screen all 50,000 molecules. The virtual screening can at least eliminate those molecules in the 50,000 that are just completely wrong for the protein, such as being too big for the active site or having the wrong electrostatic charges. This can save us a lot of time and money on experimental work, but doesn't guarantee that any of our molecules are good, and we may need to go to another collection of small molecules to test.
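The enrichment idea can be put into numbers with the usual enrichment factor: the hit rate in the top-scoring slice divided by the hit rate in the whole library. The sketch below uses the 50,000-molecule library and its top 10% (5,000 molecules) from the example above; the counts of true actives are invented purely for illustration.

```python
def enrichment_factor(hits_in_top, n_top, hits_total, n_total):
    """Hit rate in the top-scoring slice divided by the library-wide hit rate.

    A value of 1.0 means the virtual screen did no better than picking
    molecules at random; larger values mean the top slice is enriched.
    """
    if min(n_top, n_total, hits_total) <= 0:
        raise ValueError("counts must be positive")
    # Rearranged from (hits_in_top / n_top) / (hits_total / n_total).
    return (hits_in_top * n_total) / (hits_total * n_top)


if __name__ == "__main__":
    # Hypothetical: 50 true actives in the library, 20 of them in the top 10%.
    print(enrichment_factor(hits_in_top=20, n_top=5_000,
                            hits_total=50, n_total=50_000))
```

In this made-up case the top slice is four times richer in actives than the library as a whole, yet 30 of the 50 actives would still be missed, which is why a good virtual score is not a guarantee.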

All that being said, I was able to use virtual screening to find a drug candidate targeting S-adenosylmethionine decarboxylase back in 2004. It is still being developed towards a preclinical drug, i.e. the chemists are trying various modifications to make it even more potent. In that project I took the 1,990 molecules of the NCI Diversity set, virtually screened them and then had our collaborators test 133 of the top 300. I believe our hit was number 76 out of that. So sometimes virtual screening can be a good start, but it always takes experimental work to confirm it.

Once you have a molecule of interest, how do you know it won't hit other proteins and cause side-effects? That too takes experimental cell-based work but there are attempts to use virtual screening to research this. We are developing a system that takes the protein structures from the Protein Data Bank, docks your molecule of interest into hundreds of different proteins, and reports on those proteins for which there appears to be a strong interaction. Not many labs are doing this yet because of the amount of work involved in preparing the virtual structures of the proteins. We have over 800 proteins in our system now and we are trying to incorporate different conformations of them to show their flexibility. We want to get an NIH grant to develop this further and make it available to other researchers. In the meantime, I am going to meet with our internal computer guys to see if they will take over the software development for the project and let us focus on preparing the proteins for it. We have a unique approach to determining the important interactions. We should have a paper coming out on it soon. We presented the system at a software user conference a year ago; they were very interested in it and liked our scoring approach better than other approaches that could be used. If the system works right, it can point to potential adverse effects. But it can also help with retargeting of known approved drugs; for example, an arthritis drug may hit some proteins that could be of interest in cancer research. We have validated our system with published data (e.g., trichostatin A, a known inhibitor of deacetylases, finds the deacetylases in our protein collection; staurosporine, a known inhibitor of kinases, finds many of the kinases in our protein collection).
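The general shape of such a panel screen can be sketched as below. This is only an illustration of docking one molecule against many proteins and reporting the strong interactions; it is not the system or the scoring approach described above. The `dock` callable, the score threshold, and the protein names are all hypothetical stand-ins, and lower scores are assumed to mean stronger predicted binding.

```python
def profile_off_targets(molecule, protein_panel, dock, threshold=-8.0):
    """Dock one molecule against every protein and report the strong hits.

    dock is any callable taking (molecule, protein) and returning a score;
    here it is a hypothetical stand-in for a real docking run.
    Assumption: lower scores mean stronger predicted binding, so a protein
    is reported when its score is at or below the threshold.
    """
    hits = []
    for protein in protein_panel:
        score = dock(molecule, protein)
        if score <= threshold:
            hits.append((protein, score))
    # Strongest predicted interaction first.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Made-up scores standing in for real docking results.
    fake_scores = {"kinase_a": -9.4, "kinase_b": -8.1, "protease_c": -5.2}

    def fake_dock(molecule, protein):
        return fake_scores[protein]

    print(profile_off_targets("candidate_1", fake_scores, fake_dock))
```

Every protein flagged this way is a prediction to be checked in cell-based assays, whether the goal is spotting a possible side-effect or finding a new use for a known drug.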

So, in conclusion, virtual modeling is helpful but needs to be tied to experimental confirmation. As for the protocol in question, I did not see details on the virtual modeling software used or on experimental confirmation done. I think the author was suggesting that his virtual work should be tested experimentally. The virtual work needs validation before one can develop therapies around it.

Wesley

mrhodes40 · Post by **mrhodes40** » Sun May 25, 2008 10:33 am

Wow, ask a question and look what happened. Thanks Wesley so much for that! What an exciting field and really you made it clear, thank you very much!

The question is related to the experimental online Marshall Protocol, which insists that people with autoimmune disease must AVOID all sources of vitamin D.

This includes people with MS and I know a couple pwMS on this protocol.

No sunlight, no cod liver oil, etc. To see a presentation from the MP site that shows what his theory is (including the molecular modelling), look at this URL:

http://www.marshallprotocol.com/forum2/2274.html

links to the presentation on that page:

American Academy of Environmental Medicine conference on "The Brain and the Environment"; Oct. 26-29, 2006 in Hilton Head, SC- Plenary Sessions Syllabus, 41st Annual Meeting A New Approach to Treating Intraphagocytic CWD Bacterial Pathogens in Sarcoidosis, CFS, Lyme and other Inflammatory Diseases. (Certified for physician CMEs)

The excellent presentation is available online in RealVideo 9 format....you can view it from URL
http://autoimmunityresearch.org/aaem_2006_mirror1.ram
or from URL
http://autoimmunityresearch.org/aaem_2006_mirror2.ram
or from URL
http://autoimmunityresearch.org/aaem_2006.ram

If you look at the presentation, you will get a strong understanding that the reason they believe that vitamin D is so bad for people is that, according to the computer program Dr Marshall uses, vitamin D functions exactly like steroids, which hamper the immune system and so would allow the person's infections to go unchecked and cause their disease.

Because most people do not have the background to understand the molecular genomics, including the doctor I know who prescribes this for people, I asked Wesley if he would please comment on the accuracy of these programs.

If someone is going to force their vitamin D levels into deficient ranges based on a computer program in the face of all the mounting clinical evidence that vitamin D is good for pwMS, then they had better know that such a program is absolutely foolproof in its assessment of that need.

Wesley said:

The bottom line is that the modeling by itself is only suggestive of possibilities, not proof. It requires a lot more work with experimental testing using in vitro assays, cell-based assays, structural biology work, and then it can proceed into animal models, and eventually into clinical trials.

and another that speaks specifically to the idea that the MP is superior to lab research because it is using math:

To go so far as to predict binding affinities and other kinetic data, (IC50 values) is well beyond the capabilities of the software now to think that they are accurate values. Virtual results always, always, always need experimental confirmation, always. We usually do not want to show the molecule structure to our chemists until we can confirm the molecule's effect experimentally.

The MP is not only offering their computer generated material as a possibility, they are offering it as vastly superior information that should be taken OVER lab research.

If one was to take the computer model as if it was accurate, it forces you to believe that in all the years of research and in all the labs all over the world, no one to date has noticed that the inactive form of vitamin d made in the skin inactivates the vitamin d receptor as the MP says it does.

It forces you to accept that vitamin d in physiologic levels is bad for you and you must allow your body to be deficient to recover from your disease.

It forces you to believe that being in the sun is equal to being on steroids.

Even if you believe that maybe cryptic cell wall deficient bacteria play a role in autoimmune disease (possible in my mind), this computer model forces you to reject research that shows that TB patients with HIGH vitamin d status recover better, including intracellular TB, when it is well known that people who are on REAL steroids are MORE at risk for TB.

All in all, if your disease is causing you to consider some different alternative ways of treatment, do not overestimate the importance of the molecular modeling or be overly impressed by it when looking at this idea.

If you want to read another page that is a critical review of the MP, see it here.
http://stuff.mit.edu/people/london/universe.htm

I hope this helps some future people looking for information related to MS and this Marshall Protocol.
Many thanks to Wesley for taking the time to really explain what is going on with the molecular modeling!
marie

Post by **jimmylegs** » Sun May 25, 2008 11:15 am

anybody who studies biochemistry of a single nutrient on a computer without including all the synergistic effects of other components is asking for trouble, in my view.

Lyon · Post by **Lyon** » Sun May 25, 2008 1:27 pm

gwa · Post by **gwa** » Sun May 25, 2008 2:29 pm

This is a very understandable presentation about computer generated studies, Wesley. I appreciate the time it took you to write and explain the topic.

I would not attempt to debate the MIT findings as they seem very conclusive. Several months ago I found my way to the Marshall Protocol and read quite a bit about it. I came away thinking that what he said made no sense, so I guess I am in good company with MIT also disagreeing with him.

gwa