T-cell epitope polymorphisms of the Plasmodium falciparum circumsporozoite protein among field isolates from Sierra Leone: age-dependent haplotype distribution?

Background In the context of the development of a successful malaria vaccine, understanding the polymorphisms exhibited by malaria antigens in natural parasite populations is crucial for proper vaccine design. Recent observations have indicated that sequence polymorphisms in the C-terminal T-cell epitopes of the Plasmodium falciparum circumsporozoite protein (Pfcsp) are rather low and apparently stable in low endemic areas. This study sought to assess the pattern in a malaria endemic setting in Africa, using samples from Freetown, Sierra Leone. Methods Filter-paper blood samples were collected from subjects at a teaching hospital in Freetown during September–October 2006 and in April–May 2007. The C-terminal portion of the Pfcsp gene spanning the Th2R and Th3R epitopes was amplified and directly sequenced; sequences were analysed with subject parameters and polymorphism patterns in Freetown were compared to that in other malaria endemic areas. Results and Discussion Overall, the genetic diversity in Freetown was high. From a total of 99 sequences, 42 haplotypes were identified with at least three accounting for 44.4% (44/99): the 3D7-type (19.2%), a novel type, P-01 (17.2%), and E12 (8.1%). Interestingly, all were unique to the African sub-region and there appeared to be predilection for certain haplotypes to distribute in certain age-groups: the 3D7 type was detected mainly in hospitalized children under 15 years of age, while the P-01 type was common in adult antenatal females (Pearson Chi-square = 48.750, degrees of freedom = 34, P = 0.049). In contrast, the single-haplotype predominance (proportion > 50%) pattern previously identified in Asia was not detected in Freetown. Conclusion Haplotype distribution of the T-cell epitopes of Pfcsp in Freetown appeared to vary with age in the study population, and the polymorphism patterns were similar to that observed in neighbouring Gambia, but differed significantly at the sequence level from that observed in Asia. The findings further emphasize the role of local factors in generating polymorphisms in the T-cell epitopes of the P. falciparum circumsporozoite protein.


Background
Every year malaria kills more than a million people worldwide [1] and the need for an effective vaccine to boost control efforts cannot be over-emphasized. In the context of the development of a successful malaria vaccine, a major potential obstacle is the issue of genetic polymorphisms exhibited by malaria antigens. While an understanding of these polymorphisms in natural plasmodium populations is not only crucial for proper vaccine design, deployment and evaluation, such data may also provide invaluable insights into host-parasite interaction(s). The Plasmodium falciparum circumsporozoite protein (Pf-CSP) is the most abundant protein on the sporozoite's surface and has a sequence comprising a central repeat that is flanked by polymorphic N-terminal and C-terminal nonrepeat regions. The Pf-CSP antigen is the primary component of the RTS,S [2], which is one of the leading malaria vaccine candidates, and the only one to have shown moderate, but promising, results during Phase II trials in adults in The Gambia [3] and children in Mozambique [4]. Plans for Phase III trials are currently underway in 10 African nations [5]. Although field data on malaria antigen polymorphisms generally have been rather scanty, there is now an improving trend partly due to increased training of and collaboration with researchers in endemic nations.
With regards to the Pfcsp gene, two distinct polymorphisms have been described: polymorphism in the number of NANP repeats and single nucleotide polymorphisms (SNPs) in non-repeat regions. Although polymorphisms in the number of NANP repeats can be extensive even in low endemic areas [6], it has been documented that SNPs in the non-repeat regions (Th2R/Th3R epitopes) in such settings, e.g. Southeast Asia, are stable for an observation period of at least five years in Vietnam [7], or 10 years in Thailand [8]. Moreover, the overall scenario suggests that Asian parasites (with respect to the Th2R/Th3R regions) are somewhat "clonal", with one majority sequence predominating, probably reflecting distinct differences between parasites from Asia (lower endemicity) and Africa (higher endemicity). However, samples examined from Africa at the population level tended to be relatively small and with variation in sampling time-points. These findings prompted the investigation of the diversity of Pfcsp gene in a typical African malaria-endemic setting in Sierra Leone. With no previous data on this important topic in the country, the main aim herein was to collect baseline sequence data and to characterize sequence polymorphisms of the T-cell epitopes of Pf-CSP among field isolates distributing in the capital, Freetown, with an ultimate aim of acquiring insight into host-parasite interaction(s). In addition, this study examined whether there was any significant difference in polymorphism patterns between African and Asian parasites. Results confirmed marked sequence diversity of African isolates as compared to Asia, with at least three major haplotypes in Freetown. Moreover, haplotype distribution in this endemic area appeared to vary with age, and presumably immunity status. Implications of findings are discussed with regards to malaria vaccine design and local malaria epidemiology.

Study area
This study was conducted at the Princess Christian Maternity and Children's Hospital (PCMH), as part of a study on G6PD deficiency polymorphisms, which has been described in [9]. The PCMH is located in the East End of Freetown, where it is surrounded by overcrowded residential and market areas. As the name implies, the hospital not only plays a vital role in maternity medicine, but also serves as one of the main national paediatric centers in the country. Malaria transmission in Freetown is perennial with apparent fluctuations in intensity during the rainy (May-October) and dry (November-April) seasons. Although there have been no recent malaria-related entomologic studies in the capital, earlier studies have identified Anopheles gambiae, Anopheles funestus, Anopheles melas, and Anopheles nili as efficient local malaria vectors, with P. falciparum being the dominant parasite species (reviewed in [10]). Previous research [10] also identified the capital as fluctuating between that of a meso-and hyper-endemic area, but studies are urgently required to properly assess the current situation. Despite the long history of malaria control activities in Sierra Leone, the disease is still a major public health problem and accounts for about 40% of outpatient consultations in the country. Presumptive diagnosis followed by self-treatment is not uncommon, and is thought to contribute greatly to the widespread drug resistance recently documented in the country, warranting a recent shift in national malaria treatment policy [11].

Sample collection
Hospital-based surveys were conducted in mid-September to mid-October 2006 and in mid-April to mid-May 2007. Participants in this study comprised two main groups of individuals residing in Freetown: (i) symptomatic children visiting and/or admitted at the PCMH, as well as (ii) antenatal out-patients visiting the PCMH microbiology laboratory for routine blood tests (e.g. malaria microscopy, differential cell count, haemoglobin levels etc.). Patients were introduced to the study and those consenting were screened for malaria via microscopy, and filter paper blood samples were collected for each subject. Filter paper blood spots were air-dried, sealed in tiny plastic bags and stored at room temperature (25-31°C), until later molecular analysis at Jichi Medical University, Tochigi, Japan. Prior to blood donation, subjects were interviewed and basic demographic information for each participant were recorded including age, sex, ethnicity, and whether they were admitted at the hospital (i.e. inpatients) or only visiting (out-patients). Falciparum parasitaemia (parasite density) was determined for each subject by local microscopists, but due to technical limitations encountered in the field parasite density data was not utilized further.

DNA extraction and sequencing
Genomic DNA was extracted from randomly selected filter paper blood samples using the GFX kit (GE Healthcare, UK), regardless of microscopy results. The C-terminal portion spanning the T-cell epitope regions of the gene of P. falciparum csp was amplified by a semi-nested PCR using primers and cycle conditions described in [12]. In brief, the first PCR used primers 5'-aatcaaggtaatggacaagg-3' (CSo101) and 5'-ctaattaaggaacaagaagg-3 (CSo102); and 1 μl of the PCR product was used in a second round PCR using primers CSo101 and a second reverse primer, 5'ggaacaagaaggataatacc-3' (CSo104) in a 20 μl reaction mixture. Amplified gene fragments were resolved on 2% agarose gels following electrophoresis and visualized by ultraviolet (UV) light. The fragments of positive samples were excised using FUNA Gel pipette tips (Funakoshi, Tokyo, Japan), and diluted with 40-100 μl deionized distilled water depending on band intensity. Five microlitres of diluted DNA was used as template for cycle sequencing in a 10 μl reaction and the reverse primer was used to read the sequences for each fragment on an ABI 310 Genetic Analyzer. In principle, sequences were read once per sample using the second reverse primer (CSo104) but when a new mutation was encountered or when two electrophoregram peaks overlapped at a known mutation site, sequencing was repeated with another independent sample using the forward primer (CSo101).

Sequence alignment and statistical analyses
Sequences were aligned using MEGA4 software [13] and manually edited, with the sequence of the 7G8 strain from Brazil [14] serving as reference. Each csp sequence was treated as an independent entity, coded and statistical analysis was conducted using the SPSS 10.0 software for Windows. The 2-sided Pearson Chi-square test of independence for categorical variables was used to compare the respective frequencies of haplotypes with regards to subject parameters including age, ethnicity, sex, and "status" (i.e. whether hospitalized or not), as well as time of sample collection. This was done in an attempt to examine the hypothesis whether haplotype distribution was associated with age (and presumably immunity status) or any of the other parameters described above. Statistical significance was set at the 5% level (P = 0.05).
In an effort to examine geographical diversity patterns with other parasite populations, the C-terminal portion of recently published Pfcsp sequences from Vietnam [7] (n = 142), from Iran (n = 90) described in [15], and from one cohort (n = 44) of field isolates from neighbouring Gambia [16] were analysed. Commonly used intra-population and inter-population genetic parameters were estimated using DnaSP 4.5 software [17] and compared between African (The Gambia and Sierra Leone) and Asian sequences (Vietnam and Iran). Parameters investigated include nucleotide diversity, pi (π) [18], which is the average number of substitutions between any two sequences, and haplotype (gene) diversity values [18]. In addition, the Wright's Fixation statistic (F ST ) [19], which measures genetic differentiation between sub-populations, and Nei's parameters [18] including the average number of nucleotide differences per site (Dxy), net nucleotide substitution per site (Da), and average proportions of nucleotide substitutions per site (Kxy) between populations, were also estimated.
This study was approved by the Ethical Committees of the PCMH, University of Sierra Leone and Jichi Medical University, Tochigi, Japan, and the partial Pfcsp sequences obtained herein have been submitted to GenBank with accession numbers GQ119637-GQ119678. Table 1 describes the demographic characteristics of all study subjects in the PCMH 2006 study-group who provided blood samples, as well as the number of DNA samples successfully sequenced and analysed. For analytic purposes, subjects from this PCMH 2006 study-group were categorised into three main age groups: (i) below five years of age, (ii) 5-14 years of age, and (iii) 15 years and older. Of 164 filter paper blood samples, DNA was isolated from 102 samples (i.e. isolates) of which 81 showed bands (positive samples) following PCR amplification for the target Pfcsp gene fragment of 321 base pairs (bp), and these were successfully sequenced. Sixty-six sequences were of clear electrophoregram peaks showing no or low (<1/3) background overlap of the dominant peak. These were referred to as "single dominant" sequences and coded as "single-clone" infections. Six samples revealed sequences with electrophoregram peaks overlapping at one known mutation site for which both possible amino acid sequences were deduced from the nucleotide sequence (total n = 6 × 2). Nine samples showed chromatogram peak overlap at two or more known mutation sites (termed "mixed-clone" infections), and were omitted in the analysis presented here.

Haplotype distribution with age, sex and other parameters
Pooling the data and analysis of the whole set of 99 C-terminal csp sequences (including the six double-clone infections) from Freetown identified a total of 42 unique haplotypes for which the respective proportions are depicted in Figure 1. Three major haplotypes were detected: the 3D7-type (19.2%) which is identical to the RTS,S vaccine sequence, a previously undescribed haplotype, P-01 (17.2%), and E-12 (8.1%) a sequence that has been reported from neighbouring Gambia. A majority of the remaining haplotypes comprised mostly minority alleles and almost all were unique to the West African subregion at the nucleotide level. Consistent with previous studies, almost all mutations observed (except for haplotype P-17) were non-synonymous, and clustered in the Tcell epitope regions (Th2R, Th3R) of the gene, although a few sequences had mutations in the region preceding the Th2R (labeled X in Figure 1). Moreover, there was an Arg > Gly substitution (haplotype P-10 in Figure 1) in the highly conserved region II i.e. a region sandwiched between the Th2R and Th3R, which has been implicated in hepatocyte binding [20]. Interestingly, most of the observed haplotypes usually differed from one another by one non-silent point mutation in the whole C-terminal region. For example, the P-10 haplotype differed from the 3D7 sequence only by the amino residue substitution at position 360 in Figure 1, while the P-01 isolate that dominated in antenatal out-patients differed from the P-30 isolate only by the R>T substitution at residue position 337. Haplotypes E12 and Cam3006 differed by one mutation in Th2R, whereas 3D7 and P-03 differed by a mutation in the Th3R. A similar trend was observed for other haplotypes, tempting the speculation that some form of recombination events might be occurring in these C-terminal T-cell epitope regions of CSP in natural parasite populations.
Preliminary analysis for the whole set of 99 sequences revealed a non-significant (Pearson Chi-square value of 93.219, degrees of freedom = 82, P = 0.187), yet unexpected haplotype distribution pattern with respect to age. As shown in Figure 2, the 3D7 and a previously undescribed haplotype (P-01) appeared to predominate in different age groups, in which the former was associated with

Geographical distribution of csp haplotypes
Spatial polymorphism patterns were examined by comparing recently published sympatric (of parasites from the same locality at a given time) csp sequences obtained from hospital-based surveys in different countries [7,15,16]. These data were acquired through direct sequencing of isolates, allowing appropriate comparison to data presented here, and only the region (207 bp) spanning the Th2R and Th3R epitopes was used (i.e. spanning amino acid positions 326 -394  Note that only variable sites immediately after the central repeats (labeled X i.e. amino residues 308 -317), and the regions spanning the Th2R (amino residues 326 -342) and Th3R (amino residues 356 -378) in the C-terminal portion of the gene are shown. Amino residue numbering is with reference to the 7G8 sequence and haplotypes labeled P-01 ~ P-30 are unique to this study. The P-17 haplotype differs from the 3D7 sequence by two silent nucleotide mutations but otherwise has an identical amino acid sequence. As shown, the "Asia-type*" refers to the Th2R*5/Th3R*1 haplotype previously shown to be predominant in Asia, but was not detected in this study. Amino acid sequence the 7G8 sequence was found in all three countries except Vietnam. While this differential csp-haplotype distribution was statistically significant by Chi-square analysis using SPSS (Pearson Chi-square = 766.847, d.f. = 219, P < 0.00001), it is of interest to note that not finding a particular haplotype does not completely preclude its existence in the locality of interest but may simply suggest it might be of low frequency. Another striking feature was that all mutations were almost always restricted to the same positions in the T-cell epitope regions of the gene, regardless of parasite origin. What seemed important was the nature of the substituting amino acid, again suggesting strong functional constraints on the CSP molecule.
As depicted in Table 2, within-population genetic diversity estimates (π, haplotype diversity) reveal consistent patterns in all countries, with the Th2R epitope region being more polymorphic than the Th3R region. However, between-population genetic differentiation parameters (F ST , Dxy, Kxy and Da) indicate regional differences, with African parasites having similar indices than they are to Asian parasites. This data is probably reminiscent of differences in transmission intensities/dynamics between the respective geographic regions.

Discussion
Presented here is the first malaria molecular epidemiologic study in Sierra Leone on the genetic diversity of T-cell epitope regions in the C-terminal portion of the P. falciparum circumsporozoite protein among wild isolates circulating in the capital, Freetown. Contrary to previous data presented from moderate to low endemic settings of malaria [6][7][8]15], results of this study indicate that sequence polymorphisms in the same region of Pfcsp (Th2R/Th3R epitopes) in an African setting are markedly diverse, but again with some degree of restriction. In addition, the single-haplotype predominance of the "Asia type" sequence (proportion > 50%) previously docu-Distribution of csp haplotypes by age category among the PCMH 2006 sequences for which n = 33, 21, and 24 for the 3 respec-tive age groups Note that all sequences from single-clone as well as the six double-clone infections were used in this analysis. The 3D7 haplotype was found more in children aged less than 15 years while the P-01 allele was more prevalent in adult females, who were mostly antenatal follow-up subjects.
Haplotype (n = 78 sequences) P-30 P-29 P-28 P-27 P-26 P-25 P-24 P-23 P-22 P-21 P-17 P-16 P-14 P-13 P-11 P-10 P-09 P-08 P-07 P-06 P-05 P-04 P-02 >/=15 yr mented in Asia (Additional file 1) was not observed in this African setting. Instead three haplotypes accounted for the majority (44.4%) of the sequences i.e. 3D7 (19.2%), P-01 (17.2%) and E12 (8.1%), and with intriguing distribution patterns. The 3D7 allele was associated with hospitalized children in two age groups (<5 years of age and 5-15 years of age), but not with adults, while another previously undescribed haplotype (P-01) was significantly associated with visiting antenatal subjects (P = 0.049). Since most of these antenatal females were of mild or asymptomatic infection, this finding inevitably warrants the speculation whether distribution of P. falciparum csp haplotypes could be associated with disease severity i.e. does naturallyacquired immunity select for certain CSP variants in malaria-endemic areas? While the data presented here may in part support this notion, it may also be argued that this study may have suffered due to limited sample size (thus underpowered), or the findings may have been simply due to chance (borderline P values). Whatever the case, this haplotype distribution pattern in an African endemic area is important in that it adds insight into a relevant issue in vaccine design and evaluation i.e. could there indeed be significant differences of vaccine effects in symptomatic versus asymptomatic malaria individuals, particularly in the context of an African endemic setting where transmission is usually intense? This might be a major issue for consideration in future large-scale longitudinal studies on malaria antigen polymorphisms of vaccine antigens in other endemic nations.
Despite the documented polymorphisms in CSP, the mechanism(s) behind them are poorly understood. Proposed mechanisms include intragenic recombination and natural selection (human immune pressure-or mosquitoinduced). In recent field trials of the most advanced CSP vaccine (RTS,S), it was demonstrated that vaccine-mediated immunity was not strain-specific [21], nor did the vaccine induce T-cell epitope selection [12]. A subsequent study using samples from Thailand further questioned the contribution of human immune pressure towards epitope diversity in the csp [8]. Thus it was suggested that this scenario could be due to a major contributory role of "mosquito-induced" selection by the different vector species transmitting malaria in the different geographic regions [8,15]. However, it appears that this "mosquito-induced" selection may not completely explain the observed differential haplotype distribution patterns in different endemic settings, as documented in this study. Moreover, although one study recently demonstrated that sterile protection is independent of the circumsporozoite protein [22], the role of the human immune system is supported by other studies documenting induced immune responses to CSP [23][24][25][26]. Taken together, the data suggest that the role of the human host's immune pressure cannot be completely excluded, even if it potentially leads to a nonprotective or partially protective immune response, as might be expected in an immune-evasion strategy.
Thus, a more likely explanation would incorporate a complex interplay of a combination of factors including hostimmune selection (presumably due to differential bind- ing of parasite-derived peptides to host molecules), transmission dynamics, strong functional constraints on the protein, as well as the reproductive nature of the parasite. Which factor(s) contribute a major role at any given time and place would depend largely on a "net effect" given the local epidemiologic situation. Generally, regions of high P. falciparum transmission intensity correlate with high parasite prevalence and high genetic diversity of the parasite, with resultant high proportion of mixed genotypes in a given infection. This would increase the probability that a feeding mosquito obtains genetically distinct haplotypes in its blood meal, which in turn increases the probability of out-crossing in the midgut. This is consistent with data presented here considering the numerous CSP haplotypes circulating in this African endemic setting, as well as the tendency for sympatric parasites appearing to be more closely related than their allopatric counterparts ( Figure 1, Table 2). In low transmission areas, it is expected that the low genetic diversity would limit the observed recombination events as detected by common algorithms, with the tendency of having a somewhat "clonal" population pattern in such low transmission areas. The finding of a predominant haplotype using larger samples in Asia e.g. the "Asia-type" (Additional file 1) and, more recently, the predominance of the 7G8 haplotype in Peru [27] would support this scenario. In short, it may be necessary to reflect these epidemiologic differences while interpreting field data on csp or other malaria antigen polymorphisms.
Hopefully future large-scale studies on Pfcsp polymorphism, supplemented with well-coordinated entomologic and sero-immunological analyses, in other African endemic areas would shed more light on this issue, similar to what was recently documented for the P. falciparum merozoite surface protein-1 (MSP-1) in Mali [28].

Conclusion
This study highlights an apparent discordant haplotype distribution of the T-cell epitopes of Pf-CSP with respect to age. In addition, the observed sequence polymorphism patterns were similar to that in neighbouring Gambia, but differed significantly at the sequence level from that observed in Asia. Thus, the interpretation of genetic diversity data of malaria antigen genes should probably reflect not only the antigen but also the local epidemiologic situation. In the context of the RTS,S vaccine, its future deployment in Sierra Leone is envisioned to confer similar protection in children based on extrapolation of efficacy results from The Gambia and Mozambique, because the 3D7 haplotype appears to be prevalent in this endemic setting.