Discovery: an interactive resource for the rational selection and comparison of putative drug target proteins in malaria
Malaria Journal volume 8, Article number: 178 (2009)
Up to half a billion human clinical cases of malaria are reported each year, resulting in about 2.7 million deaths, most of which occur in sub-Saharan Africa. Due to the over- and misuse of anti-malarials, widespread resistance to all the known drugs is increasing at an alarming rate. Rational methods to select new drug target proteins and lead compounds are urgently needed. The Discovery system provides data mining functionality on extensive annotations of five malaria species together with the human and mosquito hosts, enabling the selection of new targets based on multiple protein and ligand properties.
A web-based system was developed where researchers are able to mine information on malaria proteins and predicted ligands, as well as perform comparisons to the human and mosquito host characteristics. Protein features used include: domains, motifs, EC numbers, GO terms, orthologs, protein-protein interactions, protein-ligand interactions and host-pathogen interactions among others. Searching by chemical structure is also available.
An in silico system for the selection of putative drug targets and lead compounds is presented, together with an example study on the bifunctional DHFR-TS from Plasmodium falciparum.
The Discovery system allows for the identification of putative drug targets and lead compounds in Plasmodium species based on the filtering of protein and chemical properties.
Even after a century of attempts to control and eradicate malaria, up to half a billion clinical cases are reported each year, with an annual death toll of 2.7 million. One of the most studied pathogens of all time, the parasite Plasmodium falciparum accounts for 80% of clinical cases and 90% of deaths from malaria. Malaria is a complicated, but curable and preventable disease. Millions of lives can be saved if the disease is detected early and adequately treated. As P. falciparum is becoming increasingly resistant to current anti-malarial drugs and the development of a vaccine is an extremely time-consuming, ongoing process, the death rate is expected to double in the next 20 years. Since malaria drug research is not economically feasible for international pharmaceutical companies, the responsibility to ease the burden of malaria falls upon the shoulders of third world countries.
A major limitation in malaria research is the availability of effective approaches to mine post-genomic data for new putative drug targets and lead compounds. The selection criteria for promising drug targets are therefore of extreme importance to researchers involved in malaria target screening projects and lead design [5, 6]. Currently, the main available sources of protein annotation and interaction data specifically relevant to malaria include PlasmoDB [7, 8], PlasmoMap [9, 10], the Malaria Drug Targets database, the TDR Targets Database [12, 13] and the Plasmodium falciparum Interactome Database.
Although PlasmoDB is a comprehensive annotation database, the focus of the search environment is not specifically directed towards the drug target and lead discovery process, and protein-ligand interactions are not provided. The MDT database does provide some information on potential drug targets and their inhibitors. PlasmoMap allows predicted protein-protein interaction data to be downloaded, and a single PlasmoDB ID may be queried for predicted interactions, but no advanced searches against the data are offered. The TDR Targets database serves as an information repository as well as a tool for the prioritization of targets in whole genomes against neglected tropical diseases including malaria. PlasmoID is a developing project that provides protein interactions for P. falciparum, but the main aim of the project is to build metabolic pathways using protein-protein interactions. A brief summary of key features of these resources is provided in Table 1.
No tool exists to perform advanced interactive mining of protein together with ligand properties in a species-comparative environment. The Discovery project attempts to provide an interface where researchers may efficiently perform the selection of putative drug targets and lead compounds based on relevant properties. Relationships were constructed between the proteins, protein features and chemical compounds in a way that allows powerful searching and efficient navigation between molecules.
Discovery was developed in Python using the TurboGears web development framework with an underlying MySQL database. Protein sequences for P. falciparum, Plasmodium vivax, Plasmodium chabaudi, Plasmodium berghei and Plasmodium yoelii were obtained from PlasmoDB and for Homo sapiens and Anopheles gambiae from Ensembl. Data for ligand-protein interactions were collected from DrugBank, KEGG and PDB-Ligand and protein-protein interaction data from MINT and DIP. Currently, only hits common to both MINT and DIP are displayed, with a MINT cut-off value of 0.3. BLAST searches against PDB-Ligand and DrugBank were performed using NCBI BLAST with an E-value cut-off of 1.0e-6. InterPro analysis was performed using the latest InterProScan from the EBI. Homology models of P. falciparum and human proteins were obtained from ModBase. Custom ortholog clusters were generated with OrthoMCL. An outline of the structure of the Discovery system is provided in Figure 1. Chemical searches, displays and ADMET calculations were implemented using the Marvin applet and JChem Base from ChemAxon. Jmol was used to display the 3D structures and Jalview to display the T-Coffee alignments of the OrthoMCL clusters.
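The E-value filtering step mentioned above can be illustrated with a minimal sketch. It is not the Discovery code itself: it assumes that BLAST results are available in the standard 12-column tabular layout (E-value in the eleventh column), that hits at or below the cut-off are kept, and it uses invented subject identifiers (subject_A, subject_B) purely for illustration.

```python
import csv
import io

E_VALUE_CUTOFF = 1.0e-6  # cut-off stated in the text


def filter_blast_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples for hits at or below the cut-off.

    Assumes the standard 12-column tabular BLAST layout, in which
    column 11 (index 10) holds the E-value. Treating the cut-off as
    inclusive is an assumption of this sketch.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical example rows; only the query ID PFD0830w comes from the text.
EXAMPLE = (
    "PFD0830w\tsubject_A\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-50\t400\n"
    "PFD0830w\tsubject_B\t30.0\t80\t50\t2\t5\t84\t10\t89\t0.002\t35.0\n"
)

if __name__ == "__main__":
    for query, subject, evalue in filter_blast_hits(EXAMPLE):
        print(query, subject, evalue)
```

With the example rows, only the first hit passes the filter, since the second has an E-value of 0.002, well above 1.0e-6.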
The power of the Discovery system is in the rich relations between the different data types in the system. Relations to protein features were created based on existing annotations, or analyses performed in-house. Inter- and intra-species protein-protein interaction predictions were based on orthology to interacting partners in DIP and MINT. Predicted protein-ligand interactions were based on KEGG annotations and on BLAST searches against PDB and DrugBank.
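The orthology-based transfer of interactions described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' implementation: the function names, the data structures, the protein identifiers (P1, pf_A and so on) and the interpretation of the MINT cut-off as "keep scores of 0.3 or higher" are all assumptions.

```python
from itertools import product

MINT_CUTOFF = 0.3  # value from the text; keeping scores >= cut-off is assumed


def shared_interactions(mint_scores, dip_pairs, cutoff=MINT_CUTOFF):
    """Return unordered pairs found in both sources that pass the MINT cut-off.

    mint_scores: dict mapping (protein_a, protein_b) -> MINT score
    dip_pairs:   iterable of (protein_a, protein_b) tuples from DIP
    """
    dip = {frozenset(pair) for pair in dip_pairs}
    return {
        frozenset(pair)
        for pair, score in mint_scores.items()
        if score >= cutoff and frozenset(pair) in dip
    }


def predict_by_orthology(known_pairs, orthologs):
    """Transfer each known interaction to the orthologs of both partners.

    orthologs: dict mapping a reference protein to the set of proteins
    in the species of interest that share its ortholog cluster.
    """
    predicted = set()
    for pair in known_pairs:
        members = tuple(pair)
        a, b = members if len(members) == 2 else (members[0], members[0])
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            predicted.add(frozenset((x, y)))
    return predicted


# Hypothetical data for illustration only.
mint = {("P1", "P2"): 0.45, ("P2", "P3"): 0.20, ("P1", "P4"): 0.80}
dip = [("P2", "P1"), ("P2", "P3")]
ortholog_map = {"P1": {"pf_A"}, "P2": {"pf_B", "pf_C"}}

known = shared_interactions(mint, dip)
predictions = predict_by_orthology(known, ortholog_map)

if __name__ == "__main__":
    for pair in sorted(sorted(p) for p in predictions):
        print(" - ".join(pair))
```

In this toy data set only the P1/P2 pair is present in both sources with a sufficient score, so the predicted interactions are pf_A with pf_B and pf_A with pf_C. Because no statistical measure is attached to a transferred interaction, such predictions are best treated as candidates for further inspection.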
Results and discussion
A brief introduction to the resource
The Discovery resource was designed to provide an effective interface where researchers could view annotations for malaria proteins that are relevant to the drug design process, in an environment where comparisons could efficiently be made between organisms. Users may embark on a search from either a protein or ligand perspective.
When performing a simple search from a protein perspective, users may enter a PlasmoDB/Ensembl ID or a keyword. The user is then presented with either the specified protein, or a list of proteins matching the keyword by description or by one of its relationships. This list is separated by species. The user is then able to further explore the hits of interest. More powerful is the advanced search function, where the user may enter search terms in several categories, and select for the presence or absence of specific relations (Figure 2). The results are returned as above. When exploring the data for a selected protein, the user is presented with a tabbed environment with different categories of data (Figure 3). The first tab shows a summary of the protein's GO and EC terms, a graphical summary of the InterPro features and a view of the protein sequence. Next is a view of the orthologs and paralogs predicted by OrthoMCL, together with a Jalview display of the T-Coffee alignment of these sequences. This is followed by the InterPro analysis results of the protein sequence, the position of the protein in the KEGG metabolic pathway maps and tertiary structure information such as ModBase models or BLAST results against PDB. The protein interaction tab provides results of possible protein-protein interactions based on Y2H studies, PlasmoMAP interactions and orthology to DIP and MINT interacting partners. The ligand interaction tab provides predicted ligand interactions based on KEGG annotations and BLAST results against PDB and DrugBank. Host-pathogen interaction predictions are based on orthology to MINT and DIP interacting partners, with species filtering.
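The advanced search logic described above, combining a keyword with the presence or absence of specific relations and grouping the results by species, can be sketched in a few lines. The sketch works on an in-memory list rather than the MySQL database, and the class, field and category names, as well as the second protein record, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    protein_id: str
    species: str
    description: str
    # relation category -> set of related terms (hypothetical layout)
    relations: dict = field(default_factory=dict)


def advanced_search(proteins, keyword="", required=(), excluded=()):
    """Return {species: [protein IDs]} for proteins matching all criteria.

    required: relation categories that must have at least one entry
    excluded: relation categories that must be empty or missing
    """
    results = {}
    kw = keyword.lower()
    for p in proteins:
        if kw and kw not in p.description.lower():
            continue
        if any(not p.relations.get(cat) for cat in required):
            continue
        if any(p.relations.get(cat) for cat in excluded):
            continue
        results.setdefault(p.species, []).append(p.protein_id)
    return results


# Illustrative records; HYPO_0001 and its annotations are invented.
PROTEINS = [
    Protein(
        "PFD0830w",
        "P. falciparum",
        "bifunctional dihydrofolate reductase-thymidylate synthase",
        {"ligand": {"pyrimethamine"}, "protein_interaction": set()},
    ),
    Protein(
        "HYPO_0001",
        "P. falciparum",
        "hypothetical dihydrofolate-binding protein",
        {"ligand": set(), "protein_interaction": {"HYPO_0002"}},
    ),
]

if __name__ == "__main__":
    print(advanced_search(PROTEINS, "dihydrofolate",
                          required=("ligand",),
                          excluded=("protein_interaction",)))
```

Here the query asks for proteins whose description mentions dihydrofolate, that have at least one predicted ligand and no predicted protein-protein interaction; only PFD0830w satisfies all three conditions.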
When approaching the search from the ligand perspective, the user may search by ligand keyword, or by drawing a chemical structure using the Marvin applet. Different methods are then available to search with the structure against ligands from KEGG, PDB and DrugBank using the ChemAxon JChem Base functionality. Results are displayed as a JChem table, and the user may select a compound to view further information. A view is then provided of the compound's structure, ADMET properties as pre-calculated in JChem Base and possible interacting proteins based on KEGG annotations, and BLAST results against PDB and DrugBank (Figure 4). The query processes followed for the different types of searches described above are summarized in Figure 5.
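As a rough illustration of what a similarity-based structure search does, the sketch below ranks library compounds by the Tanimoto coefficient of their fingerprints against a query. It does not use or reproduce JChem Base; the fingerprints are toy sets of bit positions, the compound names are invented, and the threshold of 0.3 is an arbitrary assumption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of set bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as zero
    return len(fp_a & fp_b) / len(union)


def similarity_search(query_fp, library, threshold=0.3):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy fingerprints; real fingerprints are derived from chemical structures.
LIBRARY = {
    "cmpd_1": {1, 2, 3, 4},
    "cmpd_2": {1, 2, 5, 6},
    "cmpd_3": {7, 8},
}
QUERY = {1, 2, 3, 4}

if __name__ == "__main__":
    for name, score in similarity_search(QUERY, LIBRARY):
        print(f"{name}\t{score:.2f}")
```

Substructure and exact-match searches follow a different logic and are not covered by this sketch.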
A sample investigation: P. falciparum DHFR-TS
Dihydrofolate reductase-thymidylate synthase (DHFR-TS) was selected to demonstrate the data displayed in the system, as it is well annotated and an established drug target. DHFR-TS occurs as a bifunctional protein in malaria, with both the DHFR and TS proteins fused together as a single polypeptide. Dihydrofolate reductase (DHFR, PlasmoDB ID PFD0830w) is inhibited by the class II antifolates, including drugs such as pyrimethamine and proguanil, a prodrug of cycloguanil. All the molecules in the antifolate drug family currently in use were discovered during the Second World War in the 1940s, and since then no new anti-malarial antifolates have been developed that reached phase I/II clinical trials. One of the major causes of antifolate resistance is point mutations in DHFR with a resulting decrease in binding affinity of the inhibitor to the enzyme. Known inhibitors of P. falciparum DHFR include antifolates such as pyrimethamine, proguanil, aminopterin, methotrexate, 2-amino-1,4-dihydro-4,4,7,8-tetramethyl-s-triazino(1,2-a) benzimidazole and Trp-P-2. Aminopterin and methotrexate inhibit virtually all DHFR enzymes, including P. falciparum and human DHFR. Although these drugs cannot be used to treat malaria because of their possible toxicity to the human host, precursors thereof may be used to treat malaria. A third generation antifolate compound, dihydrotriazene (WR99210 or BRL 6231), was also recently found to inhibit P. falciparum DHFR. WR99210 is different from the resistance-compromised pyrimethamine in that it has a flexible side-chain that binds tightly with not only the wild-type enzyme, but also the mutant P. falciparum DHFR, while still retaining anti-malarial activity.
Plasmodium falciparum DHFR-TS was selected in the Discovery system by searching with the keyword dihydrofolate, and selecting the entry 'PFD0830w bifunctional dihydrofolate reductase-thymidylate synthase'. On the summary page the EC annotation shows the two enzyme domains in the protein, EC:1.5.1.3, corresponding to the dihydrofolate reductase enzymes, and EC:2.1.1.45, corresponding to the thymidylate synthases, and links to KEGG are provided. The GO annotations include GO:0006231 (dTMP biosynthetic process), GO:0006545 (glycine biosynthetic process), GO:0006730 (one-carbon compound metabolic process), GO:0009165 (nucleotide biosynthetic process), GO:0004146 (dihydrofolate reductase activity) and GO:0004799 (thymidylate synthase activity). Orthologs, based on OrthoMCL analysis, are indicated as: A. gambiae AGAP010457-PA with no description, three human enzymes, namely ENSP00000314727, ENSP00000314902 and ENSP00000315644, all three of which are thymidylate synthases, a P. berghei protein PB000415.02.0 and a P. chabaudi protein PC000873.02.0, both putative bifunctional dihydrofolate reductase thymidylate synthases. The P. yoelii protein PY04370, a putative thymidylate synthase, was also returned. It should be noted here that the ortholog prediction encounters a special case with bifunctional proteins, as a protein may only occur in one OrthoMCL cluster, which in effect groups bifunctional proteins according to only one of the domains. In this case, the human and mosquito orthologs were for the TS domain of the bifunctional enzyme. The T-Coffee alignment of these proteins is shown in the Jalview applet, enabling further manipulations of the sequences.
When considering the summary of the InterPro results, the first featured annotation common to all species is a BLAST hit (TS) against ProDom, a database of protein domain families automatically generated from the UniProt Knowledge Database. Gene3D G3DSA:3.40.430.10 is present in all the Plasmodium species, but not in human or mosquito. HMMPfam returned two hits, as expected for the bifunctional sequence, namely PF00186, described as DHFR_1, and PF00303, described as Thymidylat_synt.
In human and mosquito, only the TS domain is associated with this ortholog group, as DHFR and TS occur separately in these species. HMMPIR hit PIRSF000389 identifies the protein as a bifunctional DHFR-TS in all the Plasmodium species. PatternScan returns two hits, PS00075 and PS00091, identifying DHFR_1 and thymidylate synthase in the Plasmodium species, except for P. chabaudi, which does not show the DHFR domain (probably due to the shorter sequence). Only the TS domain is shown in human and mosquito. ProfileScan returns ID PS51330, identifying the Plasmodium proteins as DHFR_2. Superfamily identifies both parts of the bifunctional protein in the Plasmodium species as SSF55831 (TS) and SSF53597 (DHFR). HMMTigr TIGR03284 (TS) and Gene3D G3DSA:3.30.572.10 (TS) are present in all species included. This comparison highlights possible areas that may be investigated to identify parasite-specific features in the protein.
Three ModBase models are reported for this protein, based on DHFR-TS from Leishmania major (PDB template 1qzf), P. falciparum (PDB template 1j3q) and TS from Pneumocystis carinii (PDB template 1f28). A range of high-scoring PDB BLAST hits over both DHFR and TS are returned. PFD0830w is predicted through KEGG annotations and BLAST searches to potentially bind with more than a hundred ligands, including the well known inhibitors of DHFR. When searches were performed using common chemical compound names or structures such as pyrimethamine or proguanil, the bifunctional DHFR-TS from P. falciparum was listed as a possible interacting protein. No protein-protein or host-pathogen interactions were predicted for this protein.
This sample investigation effectively highlights the main features of DHFR-TS as described in the Discovery system, including possible orthologs, protein motifs and domains and a set of ligands possibly interacting with this protein.
The Discovery system currently provides basic and advanced interfaces to search protein properties in the resource, as well as a chemical search interface for ligands. Results include orthology, protein features, metabolic information, structural information, and interaction information. Of specific interest is the availability of relationships between proteins and potential interacting ligands, together with ADMET-related properties for the ligands. The system currently has some shortcomings, such as the lack of statistical measures for predicted protein-ligand and protein-protein interactions. This could be improved by incorporating sequence conservation across similar sequences for scoring purposes. Also, the prediction of host-pathogen protein-protein interactions does not yet take similarity, sub-cellular localization and properties such as PEXEL motifs and transmembrane regions into account. Additionally, in the special case of bifunctional proteins, these are currently clustered only with the ortholog of the best-matching domain by OrthoMCL, and alternative approaches need to be investigated for bifunctional matching. These and other shortcomings will be addressed in future releases, yielding further improvements to the system.
It is anticipated that the Discovery system will offer an effective starting point where users interested in drug discovery may obtain relevant information about a pre-selected malaria protein, and discover other proteins of interest by searching according to a selection of protein and ligand properties. The system's further development will serve to make it a valuable tool in the malaria drug discovery process.
Bjorkman A, Bhattarai A: Public health impact of drug resistant Plasmodium falciparum malaria. Acta Trop. 2005, 94: 163-169.
Dyer MD, Murali TM, Sobral BW: Computational prediction of host-pathogen protein-protein interactions. Bioinformatics. 2007, 23: i159-166. 10.1093/bioinformatics/btm208.
Doolittle RF: The grand assault. Nature. 2002, 419: 493-494. 10.1038/419493a.
Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419: 498-511. 10.1038/nature01097.
Birkholtz LM, Bastien O, Wells G, Grando D, Joubert F, Kasam V, Zimmermann M, Ortet P, Jacq N, Saidani N, Roy S, Hofmann-Apitius M, Breton V, Louw AI, Marechal E: Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?. Malar J. 2006, 5: 110-133. 10.1186/1475-2875-5-110.
de Beer TA, Wells GA, Burger PB, Joubert F, Marechal E, Birkholtz L, Louw AI: Antimalarial drug discovery: in silico structural biology and rational drug design. Infect Disord Drug Targets. 2009, 9: 304-318.
Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ, Treatman C, Wang H: PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 2009, 37: D539-543. 10.1093/nar/gkn814.
Date SV, Stoeckert CJ: Computational modeling of the Plasmodium falciparum interactome reveals protein function on a genome-wide scale. Genome Res. 2006, 16: 542-549. 10.1101/gr.4573206.
TDR Targets Database. [http://tdrtargets.org]
Aguero F, Al-Lazikani B, Aslett M, Berriman M, Buckner FS, Campbell RK, Carmona S, Carruthers IM, Chan AW, Chen F, Crowther GJ, Doyle MA, Hertz-Fowler C, Hopkins AL, McAllister G, Nwaka S, Overington JP, Pain A, Paolini GV, Pieper U, Ralph SA, Riechers A, Roos DS, Sali A, Shanmugam D, Suzuki T, Van Voorhis WC, Verlinde CL: Genomic-scale prioritization of drug targets: the TDR Targets database. Nat Rev Drug Discov. 2008, 7: 900-907. 10.1038/nrd2684.
Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Rios D, Schuster M, Slater G, Smedley D, Spooner W, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wilder S, Zadissa A, Birney E, Cunningham F, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Kasprzyk A, Proctor G, Smith J, Searle S, Flicek P: Ensembl 2009. Nucleic Acids Res. 2009, 37: D690-697. 10.1093/nar/gkn828.
Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008, 36: D901-906. 10.1093/nar/gkm958.
Goto S, Okuno Y, Hattori M, Nishioka T, Kanehisa M: LIGAND: database of chemical compounds and reactions in biological pathways. Nucleic Acids Res. 2002, 30: 402-404. 10.1093/nar/30.1.402.
Shin JM, Cho DH: PDB-Ligand: a ligand database based on PDB for the automated and customized classification of ligand-binding structures. Nucleic Acids Res. 2005, 33: D238-241. 10.1093/nar/gki059.
Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G: MINT: the Molecular INTeraction database. Nucleic Acids Res. 2007, 35: D572-574. 10.1093/nar/gkl950.
Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32: D449-451. 10.1093/nar/gkh086.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37: D211-215. 10.1093/nar/gkn785.
Zdobnov EM, Apweiler R: InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.
Pieper U, Eswar N, Webb BM, Eramian D, Kelly L, Barkan DT, Carter H, Mankoo P, Karchin R, Marti-Renom MA, Davis FP, Sali A: MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2009, 37: D347-354. 10.1093/nar/gkn791.
Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13: 2178-2189. 10.1101/gr.1224503.
Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ: Jalview Version 2 – a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009, 25: 1189-1191. 10.1093/bioinformatics/btp033.
Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.
Nzila A: The past, present and future of antifolates in the treatment of Plasmodium falciparum infection. J Antimicrob Chemother. 2006, 57: 1043-1054. 10.1093/jac/dkl104.
Yeh I, Altman RB: Drug Targets for Plasmodium falciparum: a post-genomic review/survey. Mini Rev Med Chem. 2006, 6: 177-202. 10.2174/138955706775475957.
Shallom S, Zhang K, Jiang L, Rathod PK: Essential protein-protein interactions between Plasmodium falciparum thymidylate synthase and dihydrofolate reductase domains. J Biol Chem. 1999, 274: 37781-37786. 10.1074/jbc.274.53.37781.
Toyoda T, Brobey RK, Sano G, Horii T, Tomioka N, Itai A: Lead discovery of inhibitors of the dihydrofolate reductase domain of Plasmodium falciparum dihydrofolate reductase-thymidylate synthase. Biochem Biophys Res Commun. 1997, 235: 515-519. 10.1006/bbrc.1997.6814.
Yuvaniyama J, Chitnumsub P, Kamchonwongpaisan S, Vanichtanankul J, Sirawaraporn W, Taylor P, Walkinshaw MD, Yuthavong Y: Insights into antifolate resistance from malarial DHFR-TS structures. Nat Struct Biol. 2003, 10: 357-365. 10.1038/nsb921.
The authors thank Philip Law for help in displaying metabolic pathways. We are grateful to ChemAxon for the FreeWeb license. This project received funding from the South African National Bioinformatics Network, the National Research Foundation and the SAFeTI programme. FJ would like to thank Eric Marechal for hosting him during his sabbatical, during which a large part of the development took place.
The authors declare that they have no competing interests.
The orthology view functionality was developed by FJ and TAPDB, the function, metabolism and structure views by FJ, RJK and TAPDB, the protein-protein interactions by CMH, the protein-ligand interaction by TAPDB, CMH, FJ, the host-pathogen interaction view by CJO and the chemical searches by FJ and TAPDB. All authors contributed to the writing of the manuscript.