PEST sequences in the malaria parasite Plasmodium falciparum: a genomic study

Background: Inhibitors of the protease calpain are known to have selectively toxic effects on Plasmodium falciparum. The enzyme has a natural inhibitor, calpastatin, and in eukaryotes is responsible for turnover of proteins containing short sequences enriched in certain amino acids (PEST sequences). The genome of P. falciparum was searched for this protease, its natural inhibitor and putative substrates.
Methods: The publicly available P. falciparum genome was found to have too many errors to permit reliable analysis. An earlier annotation of chromosome 2 was instead examined. PEST scores were determined for all annotated proteins. The published genome was searched for calpain and calpastatin homologs.
Results: Typical PEST sequences were found in 13% of the proteins on chromosome 2, including a surprising number of cell-surface proteins. The annotated calpain gene has a non-biological "intron" that appears to have been created to avoid an unrecognized frameshift. Only the catalytic domain has significant similarity with the vertebrate calpains. No calpastatin homologs were found in the published annotation.
Conclusion: A calpain gene is present in the genome and many putative substrates of this enzyme have been found. Calpastatin homologs may be found once the re-annotation is completed. Given the selective toxicity of calpain inhibitors, this enzyme may be worth exploring further as a potential drug target.


Background
Calpain (EC 3.4.22.17) is a Ca2+-dependent cysteine protease first isolated in 1978, with a pH optimum between 7.0 and 8.0. There are at least 15 distinct calpain genes in the human genome, and several have multiple isoforms (up to 10). Along with the ATP-dependent proteasome, calpain appears to be responsible for the majority of non-lysosomal targeted proteolysis. It is a member of the papain superfamily [2], a group of proteases that includes papain, calpain, streptopain, ubiquitin-specific peptidases and many families of viral cysteine endopeptidases. Calpain is a protein of ancient origin, with homologues found in vertebrates, insects, crustaceans, nematodes, fungi, higher plants, Dictyostelium, kinetoplastid Protozoa, and bacteria [2]. It evolved from a gene fusion event between an N-terminal cysteine protease and a C-terminal calmodulin-like protein, an event predating the eukaryote/prokaryote divergence [3].
The enzyme cleaves preferentially on the C-terminal side of tyrosine, methionine or arginine, preceded by leucine or valine (i.e. P1 = Y, M, or R; P2 = L or V according to the established nomenclature [4]). Calpain occurs either as a heterodimer with a small regulatory subunit and a large catalytic subunit or as the catalytic subunit alone [5]. It has been crystallised and its structure has been solved for several species [6,7]. The active site consists of a conserved triad of cysteine, asparagine and histidine. The catalytic domain is divided into two subdomains (2a and 2b), with the cysteine residue lying in domain 2a and the histidine and asparagine in 2b. Calpain has a natural monomeric protein inhibitor, calpastatin [8]. In the presence of Ca2+, calpain undergoes a conformational change, dissociates from or cleaves the associated calpastatin and finally cleaves its own first domain to become fully active.
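The stated P1/P2 preference can be illustrated with a minimal Python sketch. The function name `preferred_cleavage_sites` is hypothetical and is not part of the study; the sketch only marks positions matching the sequence preference described above, which is a tendency rather than a strict rule, and it says nothing about whether calpain actually cleaves at those positions.

```python
# Residues preferred at P1 (cleavage occurs on their C-terminal side)
# and at P2 (the residue immediately N-terminal to P1).
P1_RESIDUES = set("YMR")
P2_RESIDUES = set("LV")


def preferred_cleavage_sites(seq):
    """Return 1-based positions of P1 residues that match the preference.

    A position p is reported when residue p is Y, M or R and residue
    p - 1 is L or V; cleavage would then fall between p and p + 1.
    """
    seq = seq.upper()
    return [
        i + 1  # convert 0-based index of P1 to a 1-based position
        for i in range(1, len(seq))
        if seq[i] in P1_RESIDUES and seq[i - 1] in P2_RESIDUES
    ]


if __name__ == "__main__":
    # Toy peptide: L2-Y3 and V6-R7 match the P2-P1 preference.
    print(preferred_cleavage_sites("ALYGGVRK"))
```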
Substrates of this enzyme appear to be recognised principally by the presence of PEST sequence(s) within the protein [9,10], although exceptions are known [11]. PEST sequences were first described in 1986 [12] and are short subsequences (usually 10-60 residues) within proteins that are bounded by but do not contain basic residues (H, K or R), and are enriched in proline (P), glutamate (E), serine (S), threonine (T) and aspartate (D) residues. An algorithm (the PEST-find score) has been described for assessing the significance of such subsequences: a score of 5 or greater is regarded as significant. PEST sequences are found in ~10% of all cellular proteins in the organisms analysed to date and are typically found in highly regulated proteins. PEST +ve (PEST sequence containing) proteins typically have short half-lives (0.5 to 2 hours) in intact cells compared with most other proteins (>24 hours). In PEST +ve proteins, removal or disruption of the PEST sequence increases the protein's half-life to more "normal" values, while insertion or creation of a new PEST sequence within a PEST -ve (PEST sequence free) protein decreases that protein's half-life to a value typical of a PEST +ve protein.
Two papers describe the effects of calpain inhibitors on P. falciparum. The first [13] described the effect of calpain inhibitors on the invasion of erythrocytes. The authors found that the inhibitors used were ~100 times more potent (IC50 10^-7 M) than the other protease inhibitors examined (chymostatin, leupeptin, pepstatin A and bestatin). Erythrocytes normally contain only calpain 2, and it was not clear at the time whether the effect of these inhibitors resulted from inhibition of the parasite's calpain, the erythrocyte's calpain, or both. This has been clarified recently by Hanspal et al. [14], who reinvestigated this effect in calpain 2 knock-out mice. The mouse erythrocytes were shown to have no detectable calpain activity but still supported the invasion and growth of P. falciparum in culture. Calpain inhibition again prevented re-invasion. A third paper [15] has shown that removal of Ca2+ from the growth medium results in growth arrest in the late trophozoite stage and failure to invade erythrocytes, findings consistent with a role for calpain in the parasite life cycle. With the recent publication of the entire genome [16][17][18], a search for the calpain and calpastatin genes or their homologs and for PEST +ve proteins was undertaken to investigate this further.

Methods
The flat files (version 2) were downloaded from the PlasmoDB [19] web site http://www.plasmodb.org/. The gene coordinates were extracted and then used to build a database of the genes. Multiple errors were found in the annotation, including anomalous start codons (AAA, CAC, GTA, TAG) and termination codons (AAT, ATA, AAG, CTT, GAA, GGG, TTC, TTT), introns of length 2 and 5 bases, unusual intronic splice sites (TA, TT), introns with exceptionally high GC content (up to 44%) and the absence of a number of experimentally known genes. In view of these findings, the alternative annotation of chromosome 2 [20], as revised in September 2002 http://www.wehi.edu.au/MalDB-www/chr2list.html/, was used.
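The kinds of annotation error listed above can be screened for automatically. The following Python sketch is illustrative only and is not the procedure used in this study: the `GeneModel` class, the function names and both thresholds (`MIN_INTRON_LEN`, `MAX_INTRON_GC`) are assumptions introduced here, and canonical GT...AG splice sites and the standard start and stop codons are taken as the expected values.

```python
from dataclasses import dataclass, field

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_INTRON_LEN = 10   # assumed threshold; shorter introns are flagged
MAX_INTRON_GC = 0.30  # assumed threshold for an AT-rich genome


@dataclass
class GeneModel:
    """Hypothetical container for one annotated gene."""
    name: str
    cds: str                                     # spliced coding sequence, 5'->3'
    introns: list = field(default_factory=list)  # intron sequences, 5'->3'


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def check_gene(gene):
    """Return a list of human-readable problems found in a gene model."""
    problems = []
    cds = gene.cds.upper()
    if cds[:3] != "ATG":
        problems.append(f"anomalous start codon {cds[:3]}")
    if cds[-3:] not in STOP_CODONS:
        problems.append(f"anomalous termination codon {cds[-3:]}")
    for number, intron in enumerate(gene.introns, start=1):
        intron = intron.upper()
        if len(intron) < MIN_INTRON_LEN:
            problems.append(f"intron {number} is only {len(intron)} bases long")
        if intron[:2] != "GT" or intron[-2:] != "AG":
            problems.append(
                f"intron {number} has splice sites {intron[:2]}..{intron[-2:]}"
            )
        if gc_fraction(intron) > MAX_INTRON_GC:
            problems.append(f"intron {number} has unusually high GC content")
    return problems


if __name__ == "__main__":
    suspect = GeneModel("toy_gene", cds="AAAGGGAAT", introns=["TA"])
    for problem in check_gene(suspect):
        print(suspect.name, "-", problem)
```

Running such checks over every gene model gives a quick measure of how far an annotation departs from expected gene structure before any downstream analysis is attempted.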
The PEST score was calculated with the standard algorithm [12]. Sequences of 10+ residues bounded by, but not containing, basic residues (H, K or R) are first identified. The mole percent (MP) of P, E/D and S/T in this subsequence is then determined after subtracting one mole equivalent of P, E/D and S/T. The normalised hydrophobicity value of a residue is its Kyte-Doolittle index [21] multiplied by 10 plus 45, giving values between 0 and 90. Stellwagen http://emb1.bcc.univie.ac.at/embnet/tools/bio/PESTfind/ has suggested that a value of 58 for tyrosine, rather than 32 as originally given, gives a more reliable PEST score: the former value was used here. The average hydrophobicity (H_o) of a subsequence is determined by summing, over the residues, the product of the MP of each residue and its normalised hydrophobicity value. The PEST-find score is 0.55 (MP) - 0.5 (H_o). (The original paper has a misprint with PEST-
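The scoring steps described in this section can be sketched in Python as follows. This is one reading of the description, not the code used in the study, and it rests on stated assumptions: "subtracting one mole equivalent of P, E/D and S/T" is interpreted as removing three residues from the count of enriched residues; H_o is computed as a mole-fraction-weighted mean so that it stays on the 0 to 90 scale; only the 20 standard amino acids are expected; and the names `pest_score` and `find_pest` are hypothetical.

```python
import re

# Kyte-Doolittle hydropathy indices [21] for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Normalised hydrophobicity: 10 * index + 45, from 0 (R) to 90 (I).
HYDRO = {aa: round(10 * kd + 45) for aa, kd in KYTE_DOOLITTLE.items()}
HYDRO["Y"] = 58  # revised tyrosine value used in the text (originally 32)

# Runs of 10+ non-basic residues flanked on both sides by H, K or R.
CANDIDATE = re.compile(r"(?<=[HKR])[^HKR]{10,}(?=[HKR])")


def pest_score(subseq):
    """PEST-find score of one candidate: 0.55 * MP - 0.5 * H_o."""
    n = len(subseq)
    enriched = sum(subseq.count(aa) for aa in "PEDST")
    # Assumption: one equivalent each of P, E/D and S/T is discounted.
    mole_percent = 100.0 * max(enriched - 3, 0) / n
    # Assumption: mole-fraction-weighted mean hydrophobicity (0 to 90).
    h_o = sum(HYDRO[aa] for aa in subseq) / n
    return 0.55 * mole_percent - 0.5 * h_o


def find_pest(protein, threshold=5.0):
    """Return (start, end, score) for candidates scoring >= threshold.

    Positions are 1-based and inclusive.
    """
    protein = protein.upper()
    hits = []
    for match in CANDIDATE.finditer(protein):
        score = pest_score(match.group())
        if score >= threshold:
            hits.append((match.start() + 1, match.end(), round(score, 2)))
    return hits


if __name__ == "__main__":
    print(find_pest("KPESTPESTPESTK"))
```

For the toy sequence KPESTPESTPESTK, this reading gives MP = 75 and H_o = 28.5, and hence a score of 27.0, well above the significance threshold of 5.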

Results
The revised annotation of chromosome 2 [20] predicts 206 protein-encoding genes. Forty-four PEST sequences with scores > 5 were found, with lengths varying from 12 to 94 (30.3 ± 19.9) amino acids, in 27 (13.1%) proteins. The proteins fall into four groups: (a) hypothetical proteins (4); (b) DNA-binding proteins (2: origin recognition complex subunit 5 and chromatin-binding protein); (c) metabolic proteins (3: a phosphatase, ribosome releasing factor and ATP-dependent acyl-CoA synthetase); and (d) cell-surface-associated proteins (18: erythrocyte membrane proteins (EMP) 1 and 3, rifins, merozoite surface proteins (MSP) 2 and 5, serine-repeat antigens, transmission blocking target antigen pfs230 and two predicted secreted proteins). The PEST sequences occur throughout proteins, with some bias towards the N-terminal end (60% are found in the first half). The PEST +ve proteins had significantly lower predicted pIs (6.63 and 8.13 respectively: t = 3.86, p < 0.0005) and were significantly longer (1181 and 731 amino acids respectively: t = 2.91, p < 0.004) than the average. Eighteen (66.7%) of the PEST +ve proteins have introns, a figure slightly higher than the mean (57.5%). There was no significant difference in the number of introns per protein (1.12 and 1.32, t = -0.16, p > 0.8).
On chromosome 13, a putative calpain gene (MAL13P1.310) containing a single intron was found. This gene has been discussed in a paper by Wu et al. [22]. The gene is unusually large (2047 residues) and has a biologically implausible "intron" that appears to have been created to avoid an unrecognised frameshift [see Additional file 1]. The 5' end of the gene contains a low-complexity region, and the gene is more than twice the size of other known calpains (generally 600-800 residues). The catalytic domain is the only part of the enzyme with homology with the vertebrate enzymes, and in the P. falciparum gene this domain is unusually distant from the N-terminus (residues 1002-1470): in other organisms the active site lies within 150 residues of the N-terminus. It seems probable that a fusion event has occurred at the original 5' end of the calpain gene with a second, as yet unidentified, gene. If activation of the P. falciparum calpain is similar to that in other organisms, the 5' domain would be removed during activation, and this new element may be responsible for the selective toxicity of the inhibitors or may play some regulatory role.

Discussion
The errors in the published genome were unexpected. A re-annotation of the genome will shortly be completed, and it should then be possible to compare the two annotations (Huestis R., personal communication).
PEST sequences have not been previously described in P. falciparum, and the sequences found here appear to be very similar to those known in other organisms. The presence of PEST sequences in hypothetical proteins, DNA-binding proteins [23] and proteins involved in intermediary metabolism was expected, while the finding that the majority of PEST +ve proteins were surface-exposed proteins was not. The greater length and lower predicted pI of the PEST +ve proteins and the locational bias of the PEST sequences towards the N-terminus are consistent with earlier findings [12].
Several families of surface-exposed proteins are present in chromosome 2: PEST sequences were found in all PfEMPs, MSP 2 and 5, a subset of SERA antigens (6 of 8) and rifins (2 of 7). Cytoadherence-linked asexual genes (CLAGs) and sub-telomeric variable open-reading frames (STEVORs) were all PEST -ve. The presence of PEST sequences in a subset of SERA and rifins is suggestive of differential processing or cellular turnover and may shed some light on the reason for the large number of these genes in the genome.
The presence of PEST sequences in surface-exposed proteins prompted a search for these sequences in other proteins known to be involved in merozoite invasion. MSP-1 and -2, the SERAs, and erythrocyte binding antigen (EBA)-175 are PEST +ve, while apical membrane antigen (AMA)-1, rhoptry associated protein (RAP)-1, -2 and -3, MAEBL, merozoite capping protein-1 and acidic/basic repeat antigen (ABRA) are PEST -ve. Spectrin, band 4.1 and ankyrin are PEST +ve erythrocyte proteins known to be bound by P. falciparum merozoites during the invasion process [25]. Band 3, another PEST +ve erythrocyte protein, is eliminated from the site of contact with the merozoite [26]. The involvement of calpain in other cell fusion reactions [24] and the presence of PEST +ve membrane proteins suggest that this system may be involved in cell adhesion processes in P. falciparum.
As the parasite progresses from the trophozoite to the schizont stage, there is a thirty-fold increase in the level of transcription of calpain [22]. The erythrocyte normally maintains a submicromolar intracellular Ca2+ concentration, which rises thirty-fold as the parasite matures [27]. Given the presence of a PEST sequence in the DNA origin recognition complex protein and the effects of Ca2+ removal on trophozoite-to-schizont progression, it is tempting to speculate that the lack of Ca2+ may inhibit calpain activation and that this is responsible for the effects seen here [28][29][30][31][32]. Bearing in mind a report of a central role for falcipain 1 in merozoite biology [33], the increase in transcriptional levels seen in the schizont and the effect of inhibitors on erythrocyte invasion suggest that calpain too may play a role here.
The P. falciparum calpain gene differs significantly from those found to date in vertebrates, and this may partly explain the selective toxicity of the inhibitors. There are many calpain inhibitors presently available, and the majority of these are small peptides that can be freeze-dried and stored at room temperature. Several have been used in Phase 2 human trials for treatment of myocardial infarction, stroke and cancer. Given the need for novel drugs to treat malaria, the selective toxicity of these inhibitors, the several known crystal structures and the possibility of recognising probable target proteins from the gene sequence, these agents may be worth exploring further.