Genetic diversity of vaccine candidate antigens in Plasmodium falciparum isolates from the Amazon basin of Peru

Background Several of the intended Plasmodium falciparum vaccine candidate antigens are highly polymorphic and could render a vaccine ineffective if their antigenic sites were not represented in the vaccine. In this study, genetic variability was characterized in major B- and T-cell epitopes within vaccine candidate antigens in P. falciparum isolates from Peru. Methods DNA sequencing analysis was completed on 139 P. falciparum isolates collected from endemic areas of the Amazon basin in Loreto, Peru, from 1998 to 2006. Genetic diversity was determined in immunologically important regions of circumsporozoite protein (CSP), merozoite surface protein-1 (MSP-1), apical membrane antigen-1 (AMA-1), liver stage antigen-1 (LSA-1) and thrombospondin-related anonymous protein (TRAP). Alleles identified by DNA sequencing were aligned with the vaccine strain 3D7, and DNA polymorphism analysis and pairwise FST comparisons between study years were done using the DnaSP software. Multilocus analysis (MLA) was performed, and the average expected heterozygosity was calculated for each locus and haplotype over time. Results Three different alleles were identified for CSP, seven for MSP-1 Block 2, one for MSP-1 Block 17, three each for AMA-1 and LSA-1, and one for TRAP. There were 24 different haplotypes in 125 infections with complete locus typing for each gene. Conclusion Characterization of the genetic diversity in Plasmodium isolates from the Amazon region of Peru showed that polymorphisms in the T- and B-cell epitopes of these P. falciparum antigens are more similar to those found in India than to those found in Africa. These findings can help in formulating a vaccine for populations with a restricted repertoire of variants.

CSP is one of the best-characterized and most widely accepted vaccine candidates for P. falciparum [4,5]. CSP is a 58-kDa protein and the major antigen on the surface of malaria sporozoites [6,7]. The protein can be subdivided into two non-repetitive regions (the N- and C-termini) and a variable central region consisting of several repeats of four-residue motifs; both the repeat and non-repeat regions exhibit polymorphisms [8-10]. Several T-cell epitopes have been found in the non-repeat regions, while immunodominant B-cell epitopes have been identified in the central repeat region [8,11]. RTS,S/AS02, a P. falciparum vaccine that consists of the repeat and C-terminal regions of CSP, has successfully completed Phase IIb trials in Mozambique [12,13].
Another antigen considered as a vaccine candidate for P. falciparum is MSP-1. MSP-1 is a 195-kDa protein that is cleaved into an 83-kDa N-terminal fragment, two central fragments of 30 and 38 kDa, and a 42-kDa C-terminal fragment [14,15]. Just before invasion, the 42-kDa fragment is further cleaved into 33- and 19-kDa fragments (MSP-1₃₃ and MSP-1₁₉). The MSP-1₁₉ fragment remains anchored to the merozoite surface at the time of erythrocyte invasion and, because of its location, is a major target of naturally acquired antimalarial immunity [16]. Within the coding region of the 83-kDa fragment is Block 2, a principal target of antibodies associated with clinical immunity in African children [17,18]. In contrast to Block 2, the Block 17 portion of Pfmsp-1, which encodes the MSP-1₁₉ fragment, is conserved, with only a few polymorphic sites that produce non-synonymous amino acid changes [16,19].
AMA-1 has also been evaluated for inclusion in a multisubunit vaccine for both P. falciparum and Plasmodium vivax. Recombinant AMA-1 induces protective immune responses in mouse and monkey models of malaria [20,21], and both monoclonal and polyclonal antibodies to AMA-1 inhibit merozoite invasion of erythrocytes [22]. As with the other P. falciparum vaccine candidates, Pfama-1 is highly polymorphic [23-25], with most polymorphisms occurring in domain I [22,23,26], which makes a broadly effective vaccine difficult to create.
The liver stage-specific antigen LSA-1 is well conserved among P. falciparum isolates and is also considered a vaccine candidate. Cytokines such as interferon gamma (IFN-γ) have been implicated in the control of Plasmodium growth and in protection from reinfection with P. falciparum [27]. Studies have shown that the N-terminal and junction (PfLSA-J) regions of the PfLSA-1 protein can induce IFN-γ production by CD8+ T cells in adults [28].
Yet another candidate for inclusion in a P. falciparum vaccine is TRAP [29,30]. As with many of the vaccine targets discussed above, TRAP is highly polymorphic. Studies designed to identify high activity binding peptides (HABPs) in TRAP identified 21 loci, three of which contain B-cell epitopes [31], while other studies using IFN-γ ELISPOT identified two CD8+ T-lymphocyte epitopes [32].
Knowledge of the distribution of polymorphic sites in malaria antigens is necessary for a detailed understanding of their significance for vaccine development. This is the first report of the variants found in this part of the Amazon basin; moreover, this study includes infections occurring early in the emergence of P. falciparum in Peru (1998-1999) [33] as well as more recent infections (2003-2006).

Malaria samples
Plasmodium falciparum isolates were collected from endemic areas in the Peruvian Amazon Department of Loreto from 1998 to 2006 under protocols approved for human use. Loreto is located in northeastern Peru and encompasses 30% of the Peruvian territory. The climate is warm and humid, with temperatures reaching 36°C in the rainy season (December-March) and falling as low as 17-20°C in the driest season (June-July).
There were 139 P. falciparum isolates available for this study, all from infected individuals living in or near the city of Iquitos. Twenty-seven isolates were collected in 1998-1999 from patients diagnosed with severe and complicated malaria (courtesy of Dr. Richard Witzig; IRB NMRCD.2005.0004). Twenty-four samples were from patients enrolled in a sulphadoxine-pyrimethamine in vivo study conducted in 1999 (WRAIR #719). Thirty-two samples were collected from 2003 to 2005 (11 each from 2003 and 2005 and 10 from 2004) in Zungarococha, a community in the San Juan district approximately five kilometers south of Iquitos, from symptomatic and asymptomatic individuals participating in an active malaria detection study (IRB PJT.NMRCD.015; IRB UAB) [33]. Finally, fifty-six samples from 2006 came from individuals presenting with malaria-like illness at clinics in different communities near Iquitos (IRB NMRC.2000.0006). All samples were blood samples confirmed positive for P. falciparum by microscopy. The sequence of the vaccine strain 3D7 [34] was used for comparative genetic analysis with the Peruvian sequences.

Genetic analysis of B- and T-cell epitopes
The genetic diversity of 11 DNA regions encoding mainly T- and B-cell epitopes in CSP (Pfcsp), MSP-1 (Pfmsp-1 Block 2 and Pfmsp-1 Block 17), AMA-1 (Pfama-1), LSA-1 (Pflsa-1) and TRAP (Pfssp-2) was determined in the isolates. DNA was purified from 200 μL of whole blood using the QIAamp DNA Blood Mini Kit. PCR fragments spanning the DNA regions that encode the selected T- or B-cell epitopes were amplified using the primers and conditions listed in Table 1. PCR amplification was performed in a 30-μL reaction mixture containing 0.2 μM each of forward and reverse primers (0.8 μM of each primer for Pfmsp-1), 200 μM of each dNTP, 0.6 units of recombinant DNA polymerase, 3 μL of 10× PCR buffer, 1.5 mM MgCl₂ and 5 μL of template DNA extracted from blood.
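As a worked example of the reaction set-up above, the sketch below converts the stated final concentrations into per-reaction pipetting volumes with C1V1 = C2V2. The stock concentrations (10 μM primers, a premixed dNTP stock at 10 mM of each dNTP, 25 mM MgCl₂) are assumptions chosen for illustration, not values reported in this study, and the polymerase volume is left out because it depends on the enzyme's units per μL.

```python
# Sketch only: per-reaction pipetting volumes for the 30 uL PCR, from C1 * V1 = C2 * V2.
# Final concentrations come from the text; the STOCK concentrations are assumptions.

REACTION_UL = 30.0
PRIMER_STOCK_UM = 10.0    # assumed 10 uM working stock of each primer
DNTP_STOCK_UM = 10_000.0  # assumed premixed stock at 10 mM of each dNTP
BUFFER_STOCK_X = 10.0     # 10x PCR buffer (3 uL in 30 uL gives 1x)
MGCL2_STOCK_MM = 25.0     # assumed 25 mM MgCl2 stock


def volume_ul(stock: float, final: float, total_ul: float = REACTION_UL) -> float:
    """Volume V of stock such that stock * V == final * total_ul."""
    return final * total_ul / stock


def master_mix(primer_um: float = 0.2, template_ul: float = 5.0) -> dict[str, float]:
    """Return uL of each component for one reaction (polymerase omitted)."""
    vols = {
        "forward primer": volume_ul(PRIMER_STOCK_UM, primer_um),
        "reverse primer": volume_ul(PRIMER_STOCK_UM, primer_um),
        "dNTP mix": volume_ul(DNTP_STOCK_UM, 200.0),
        "10x buffer": volume_ul(BUFFER_STOCK_X, 1.0),
        "MgCl2": volume_ul(MGCL2_STOCK_MM, 1.5),
        "template DNA": template_ul,
    }
    # Water brings the listed components up to the reaction volume; the polymerase
    # volume (not modelled) would still have to be subtracted from it.
    vols["water"] = REACTION_UL - sum(vols.values())
    return vols


if __name__ == "__main__":
    for name, ul in master_mix().items():                # 0.2 uM primers
        print(f"{name:15s} {ul:5.2f} uL")
    print(master_mix(primer_um=0.8)["forward primer"])  # Pfmsp-1 uses 0.8 uM primers
```

With these assumed stocks, each primer takes 0.6 μL (2.4 μL at the 0.8 μM used for Pfmsp-1), the buffer takes 3 μL, and about 18.4 μL of water remains before the polymerase volume is subtracted.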
The following PCR conditions were used. Pfcsp: an initial cycle of 3 min at 94°C, 2 min at 58°C and 2 min at 72°C; 32 cycles of 1 min at 94°C, 2 min at 58°C and 1 min at 72°C; and a final extension at 72°C for 10 min, yielding a 318-bp PCR product. Pfama-1: 2 min at 95°C; 30 cycles of 1 min at 95°C, 1 min at 55°C and 2 min at 72°C; and a final extension at 72°C for 2 min, yielding an 803-bp product.
Pflsa-1: 5 min at 94°C; 40 cycles of 30 s at 91°C, 40 s at 45°C and 40 s at 70°C; and a final extension at 70°C for 5 min, yielding a 336-bp product.
Pfssp-2 (Regions II and IV): 5 min at 94°C; 35 cycles of 30 s at 91°C, 40 s at 50°C and 40 s at 70°C; and a final extension at 70°C for 5 min, yielding products of 1000 bp and 650 bp, respectively. PCR products were purified using Qiagen PCR spin columns and sequenced with the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA), the primers listed in Table 1 and an ABI 3100 automated sequencer.
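The cycling programs above can be encoded as data to estimate how long each run occupies a thermocycler. This is a sketch under two assumptions: the first three Pfcsp steps are read as a single initial cycle, and ramp times between temperatures are ignored, so real runs take longer than the totals computed here.

```python
# Sketch: total programmed hold time of each PCR program (ramp times ignored).
# A step is (seconds, temperature_C); a stage is (cycles, [steps]).

Step = tuple[int, float]
Stage = tuple[int, list[Step]]

PROGRAMS: dict[str, list[Stage]] = {
    "Pfcsp": [
        (1, [(180, 94), (120, 58), (120, 72)]),  # assumed single initial cycle
        (32, [(60, 94), (120, 58), (60, 72)]),
        (1, [(600, 72)]),                        # final extension
    ],
    "Pfama-1": [
        (1, [(120, 95)]),
        (30, [(60, 95), (60, 55), (120, 72)]),
        (1, [(120, 72)]),
    ],
    "Pflsa-1": [
        (1, [(300, 94)]),
        (40, [(30, 91), (40, 45), (40, 70)]),
        (1, [(300, 70)]),
    ],
    "Pfssp-2": [
        (1, [(300, 94)]),
        (35, [(30, 91), (40, 50), (40, 70)]),
        (1, [(300, 70)]),
    ],
}


def hold_seconds(program: list[Stage]) -> int:
    """Sum of step durations multiplied by cycle counts."""
    return sum(cycles * sum(sec for sec, _ in steps) for cycles, steps in program)


if __name__ == "__main__":
    for name, prog in PROGRAMS.items():
        print(f"{name}: {hold_seconds(prog) / 60:.1f} min")
```

Under these assumptions the programmed hold times come to 145.0 min for Pfcsp, 124.0 min for Pfama-1, 83.3 min for Pflsa-1 and 74.2 min for Pfssp-2.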

Sequence analysis
Sequences were analysed using Sequencher version 4.7 (Gene Codes Corporation, Michigan), and comparative alignments to 3D7 were performed using MEGA version 3.1 [35]. Additionally, strains 7G8, D6 and W2 were sequenced and used as regional reference strains. Nucleotide diversity, π [36,37], the average number of nucleotide differences per site between any two sequences, and haplotype (gene) diversity [36] were estimated with DnaSP 4.0 [38]. The standard deviation (or standard error) was calculated for both measures. Statistical Analysis Software (SAS version 9.3, Cary, NC) was used to test multilocus linkage association (MLA) with two-tailed Fisher's exact tests (FET), with α = 0.05. Each p-value was multiplied by four to correct for the pairwise comparisons among the four antigens. Antigen diversity was calculated as the heterozygosity of the alleles detected in each antigen, $H = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of allele variant $i$. Wright's fixation index (FST), used to evaluate gene flow among study years, was estimated using DnaSP 4.0.

Pfcsp

(…) Table 3). The Peruvian Pfcsp sequences were more closely related to the 7G8, HB3 and Ven 765 alleles, which was expected given the close geographical origin of the isolates [39]. All polymorphisms were non-synonymous, and most mutations were at the first or second codon position, corroborating earlier studies [7]. In comparison with the 7G8, W2 and HB3 strains, alleles 1 and 2 (…) (Table 3).

[Table: Amino acid sequences for major T and B cell epitopes in vaccine candidate antigens from Peruvian isolates]

Pfmsp-1 Block 2
Allele 1 (K1 type) and Allele 2 (MAD-20 type) were the most frequent alleles in the Peruvian samples. K1-type alleles began with the hexapeptide SAQSGT or SAQSGA and ended with SGPSGT; most of their diversity was due to duplications or deletions of the repeat motifs SAQ, SGT and SGP. The MAD-20-type allele began with SGGSVT and ended with SVASGG, and its diversity was due to repetitions of SGG and SVA. Synonymous substitutions were observed for alanine (GCA/GCT) in K1 isolates and for glycine (GGC/GGT) in MAD-20 isolates. In addition, Block 2 PCR products of the same size did not necessarily correspond to the same allele, since two K1 alleles had the same base-pair length but different sequences (Table 3).
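The family assignment described above can be illustrated with a small rule that checks only the flanking hexapeptides reported for K1 and MAD-20 alleles. This is a hypothetical helper, not the authors' typing procedure, and the example sequences are invented; it also shows why fragment length alone cannot distinguish alleles.

```python
# Hypothetical rule using only the flanking motifs reported in the text;
# real allele typing relies on the full sequence alignment.
K1_STARTS = ("SAQSGT", "SAQSGA")
K1_END = "SGPSGT"
MAD20_START = "SGGSVT"
MAD20_END = "SVASGG"


def block2_family(seq: str) -> str:
    """Return 'K1', 'MAD-20' or 'unclassified' from the flanking hexapeptides."""
    s = seq.upper()
    if s.startswith(K1_STARTS) and s.endswith(K1_END):
        return "K1"
    if s.startswith(MAD20_START) and s.endswith(MAD20_END):
        return "MAD-20"
    return "unclassified"


def count_motif(seq: str, motif: str) -> int:
    """Non-overlapping occurrences of a repeat motif such as 'SAQ' (reading frame not enforced)."""
    return seq.upper().count(motif.upper())


if __name__ == "__main__":
    a = "SAQSGTSAQSAQSGPSGT"  # invented K1-type sequence
    b = "SAQSGASGTSGTSGPSGT"  # invented K1-type sequence, same length as a
    print(block2_family(a), block2_family(b), len(a) == len(b), a == b)
    print({m: (count_motif(a, m), count_motif(b, m)) for m in ("SAQ", "SGT", "SGP")})
```

The two invented sequences are both classed as K1 and have identical length, yet differ in their SAQ and SGT repeat content, mirroring the observation that equal-sized Block 2 products need not be the same allele.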

Pfama-1
The alignment of PfAMA-1 domain I included 128 isolates. Regions 259-271 (R1) and 279-287 (R2) contain naturally immunogenic T- and B-cell epitopes, and non-synonymous substitutions were found within these regions at positions 267 (Glu→Gln) and 282 (Ile→Lys) (Table 2). Across the Peruvian Pfama-1 domain I nucleotide sequences, 24 variable sites were found: 23 produced non-synonymous substitutions and only one was synonymous. Three different alleles were found in Peru. Allele 1 had an amino acid sequence identical to that of the vaccine strain 3D7 and differed from it by only one nucleotide (AAA/AAG) in codon 292 (Lys). This was the most common allele, followed by allele 2 and then allele 3, which was detected only in study-year 2006 (Table 3).
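The synonymous/non-synonymous distinction used throughout these results can be checked codon by codon. The sketch below assumes the standard genetic code, built from the conventional TCAG-ordered 64-letter string; the codon pairs AAA/AAG, GCA/GCT and GGC/GGT come from the text, whereas GAA→CAA is only a hypothetical codon change that would produce the reported Glu→Gln substitution.

```python
from itertools import product

# Standard genetic code (assumed), codons enumerated with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(ref: str, alt: str) -> str:
    """Label a single-codon change as identical, synonymous or non-synonymous."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "identical"
    aa_ref, aa_alt = CODON_TABLE[ref], CODON_TABLE[alt]
    return "synonymous" if aa_ref == aa_alt else f"non-synonymous ({aa_ref}->{aa_alt})"


def changed_positions(ref: str, alt: str) -> list[int]:
    """1-based codon positions that differ (e.g. [3] for a third-position change)."""
    return [i + 1 for i, (x, y) in enumerate(zip(ref.upper(), alt.upper())) if x != y]


if __name__ == "__main__":
    for ref, alt in [("AAA", "AAG"), ("GCA", "GCT"), ("GGC", "GGT"), ("GAA", "CAA")]:
        print(ref, alt, classify_change(ref, alt), changed_positions(ref, alt))
```

The three codon pairs from the text all change only the third position and are synonymous, while the hypothetical first-position change GAA→CAA is non-synonymous.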

Pflsa-1
Sequences of the N-terminal region of Pflsa-1, designated T1, from 137 isolates were aligned with the 3D7 sequence. The N-terminal region of LSA-1 has been shown to induce interferon gamma production in peripheral blood mononuclear cells (PBMC). The alignment showed two non-synonymous substitutions, at aa 85 (Thr→Ser) and aa 96 (Asn→Tyr), and allele 1 was the most abundant T1 allele. Alleles 2 and 3 were less frequent (Table 3); allele 3 had the same nucleotide sequence as 3D7 (Table 2).

Multilocus analysis
For each sample, the combination of alleles detected at MSP-1B2, CSP, AMA-1 and LSA-1 (the multilocus haplotype) was determined. This locus order was based on pairwise multilocus association (MLA) analysis, which showed the most consistent linkage between MSP-1B2 and the other antigens. The result is a four-digit code in which each digit is the allele detected at the respective locus: twenty-four different multilocus haplotypes were detected (…)
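The four-digit multilocus code and a pairwise association test can be sketched as follows. The text does not describe the SAS contingency tables in detail, so this example makes a simplifying assumption: each pairwise test compares presence or absence of one allele at each of two loci in a 2×2 table, analysed with a two-sided Fisher's exact test whose p-value is multiplied by four, as described in Sequence analysis. The helper names and samples are hypothetical.

```python
from collections import Counter
from math import comb

# Locus order used for the four-digit code, as described in the text.
LOCUS_ORDER = ("MSP-1B2", "CSP", "AMA-1", "LSA-1")


def haplotype_code(alleles: dict[str, int]) -> str:
    """Concatenate allele numbers (assumed single-digit) in LOCUS_ORDER."""
    return "".join(str(alleles[locus]) for locus in LOCUS_ORDER)


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for [[a, b], [c, d]]: sum the hypergeometric
    probabilities of all tables with the same margins that are no more likely
    than the observed table."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(c1, x) * comb(n - c1, r1 - x) / comb(n, r1)

    p_obs = prob(a)
    lo, hi = max(0, r1 - (n - c1)), min(r1, c1)
    probs = (prob(x) for x in range(lo, hi + 1))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def pairwise_association(samples, locus_a, allele_a, locus_b, allele_b, n_tests=4):
    """Hypothetical helper: 2x2 presence/absence test of one allele at each locus,
    with the p-value multiplied by n_tests and capped at 1."""
    a = b = c = d = 0
    for s in samples:
        in_a, in_b = s[locus_a] == allele_a, s[locus_b] == allele_b
        if in_a and in_b:
            a += 1
        elif in_a:
            b += 1
        elif in_b:
            c += 1
        else:
            d += 1
    return min(1.0, fisher_exact_2x2(a, b, c, d) * n_tests)


if __name__ == "__main__":
    # Made-up samples for illustration only (not study data).
    samples = (
        [{"MSP-1B2": 1, "CSP": 1, "AMA-1": 1, "LSA-1": 1}] * 5
        + [{"MSP-1B2": 2, "CSP": 2, "AMA-1": 1, "LSA-1": 2}] * 5
    )
    print(Counter(haplotype_code(s) for s in samples))
    print(pairwise_association(samples, "MSP-1B2", 1, "CSP", 1))
```

Reducing multi-allele loci to 2×2 tables keeps the test simple and self-contained; an analysis over full r × c allele tables, as statistical packages such as SAS can perform, may give different p-values.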

Discussion
Investigating the extent of genetic variation can assist the laborious process of selecting antigens for further vaccine development. Several P. falciparum and P. vivax vaccine candidate antigens are highly polymorphic, which could pose a serious problem for vaccines formulated from a single, well-characterized strain. Previous studies have addressed the diversity and maintenance of several malarial vaccine candidates in Africa [6]. This study characterizes genetic variability in T- and B-cell epitopes of vaccine candidate antigens in P. falciparum isolates from the Peruvian Amazon basin.
Pfcsp nucleotide and haplotype diversity values for the Th2R and Th3R regions were higher than those obtained for the other antigens in this study; nevertheless, they were much lower than those reported for Kenya (π = 0.09772 and 0.07442), Gambia (π = 0.08789 and 0.06454) and Venezuela (π = 0.08466 and 0.08025). Th2R nucleotide diversity in Peruvian isolates was, however, similar to that in samples from Brazil, Vietnam and India [7,9]. Allele 1 has previously been reported in Iran [41], Brazil [42] and Gambia [43]; allele 2 in Thailand [44], Iran [41] and Myanmar (de Stricker et al., unpublished); and allele 3 in Venezuela [9].
In addition, haplotype diversity for the entire 3' end of CSP was determined exclusively by the Th2R and Th3R regions, since all 7 variable sites were located in these regions and all of the polymorphisms produced non-synonymous substitutions. Linkage and recombination events between the Th2R and Th3R regions have been described in previous studies [9], and haplotype linkage between Pfcsp Th2R and Th3R polymorphisms was also observed in the present study. Nucleotide and haplotype diversity values for the entire 3' end of CSP across all samples were 0.01102 and 0.46387, respectively. These values show that Peruvian isolates are less diverse than African isolates and more similar to Indian isolates [9].
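The diversity statistics quoted above (π and haplotype diversity), together with the heterozygosity H = 1 − Σp² defined in the Methods, can be computed directly from aligned sequences. This is a minimal sketch, assuming gap-free, equal-length alignments; it follows Nei's definitions (π as the mean number of pairwise differences per site, Hd = n/(n − 1)·(1 − Σp²)), is not DnaSP's code, and omits the variance estimates DnaSP reports. The four sequences are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def heterozygosity(items) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) over the distinct items observed."""
    counts = Counter(items)
    n = sum(counts.values())
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity Hd = n / (n - 1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    return n / (n - 1) * heterozygosity(seqs)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nei's pi: mean number of differences per site over all pairs of sequences.
    Assumes aligned, equal-length sequences without gaps or ambiguity codes."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Invented aligned fragments (not study data): three distinct haplotypes.
    seqs = ["ACGTACGT", "ACGTACGA", "ACGTACGA", "TCGTACGT"]
    print(f"pi = {nucleotide_diversity(seqs):.5f}")  # 7 differences / (6 pairs * 8 sites)
    print(f"Hd = {haplotype_diversity(seqs):.5f}")
    print(f"H  = {heterozygosity(seqs):.5f}")
```

Applied to allele labels instead of sequences (for example `heterozygosity([1, 1, 2, 3])`), the same `heterozygosity` function gives the per-antigen H used in the multilocus analysis.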
CSP is one of the most widely characterized malaria vaccine candidate antigens and the only one whose components have gone as far as a successfully completed phase IIb clinical trial [12,45]; generating relevant genetic, epidemiologic and immunologic data for the CSP gene is therefore important, particularly for regions of low malaria endemicity. Diversity in Pfcsp appears to be regionally restricted: as in India [10], Vietnam [7] and Thailand [5], Peru has low genetic polymorphism, with one predominant allele and a small number of variants. Because polymorphisms are restricted and can be grouped, allelic variants could be included in a polyvalent vaccine that might be widely effective.
Diversity in Pfmsp-1 Block 2 was lower than that reported for P. falciparum isolates from other geographic regions [16]. Neither RO33 nor MR Block 2 alleles were found, although these alleles have been reported in other studies, including in Venezuela [15,46]. The number of alleles detected by direct sequencing was lower than the number detected by PCR-based genotyping of samples from other populations, such as Kenya [15] [48,49]. The 3D7 vaccine strain has the MSP-1₁₉ haplotype ETSRL, suggesting that if immunity conferred by monovalent vaccines is allele-specific, such vaccines would have low overall efficacy in populations where the target allele is in the minority [50].
Results also showed two non-synonymous substitutions in the R1 and R2 regions of Pfama-1. The large number of non-synonymous polymorphisms outside R1 and R2 described in this study suggests that these other regions of AMA-1 could come under intense selective pressure if a full-length peptide vaccine of a single allele type were administered. The overall genetic diversity in Pfama-1 domain I (π = 0.00697) is lower than in Kenyan and Southeast Asian isolates (π = 0.01361 and 0.01196, respectively) [23]. Of the three Pfama-1 alleles found, allele 3 was present only in samples from 2006 and could have arisen either through a new introduction of this R1-R2 haplotype from another geographic region or through recombination between alleles 1 and 2, which were present in earlier study-years.
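Differentiation between study years, such as the appearance of an allele only in 2006, can be summarized with a fixation index. The sketch below uses Nei's G_ST on allele labels, G_ST = (H_T − H_S)/H_T, weighting the two years equally; this is a simpler statistic than, and may differ from, the sequence-based FST estimate DnaSP reports, and the allele lists are hypothetical.

```python
from collections import Counter


def allele_freqs(alleles) -> dict:
    """Relative frequency of each allele label in a list."""
    counts = Counter(alleles)
    n = len(alleles)
    return {a: k / n for a, k in counts.items()}


def gst(year1, year2) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for two samples of allele labels, with
    H_S the mean within-sample heterozygosity and H_T computed from the
    equally weighted mean allele frequencies."""
    f1, f2 = allele_freqs(year1), allele_freqs(year2)
    h_s = ((1 - sum(p * p for p in f1.values())) + (1 - sum(p * p for p in f2.values()))) / 2
    mean_f = [(f1.get(a, 0.0) + f2.get(a, 0.0)) / 2 for a in set(f1) | set(f2)]
    h_t = 1 - sum(p * p for p in mean_f)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    earlier = [1, 1, 1, 2]  # hypothetical allele calls in an early study year
    later = [1, 2, 3, 3]    # hypothetical calls in a later year, with a new allele 3
    print(round(gst(earlier, later), 3))
```

For these made-up lists, H_S = 0.5 and H_T = 0.625, so G_ST = 0.2; identical allele distributions in both years give 0.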
Two non-synonymous and no synonymous substitutions were found in LSA-1 T1 in Peruvian isolates. In comparison, two synonymous substitutions (positions 87 and 104) and three different non-synonymous substitutions (positions 92, 95 and 104) have been reported in Brazilian, Papua New Guinean and Kenyan isolates [51]. These data suggest that the T-cell epitopes of P. falciparum LSA-1 are highly conserved in field isolates from geographically diverse regions with varying transmission patterns, since little allelic diversity is observed. Nevertheless, even in this region of Peru, where transmission is low and recent, two non-synonymous substitutions were detected.
Pfssp-2 HABP sequences were highly conserved among these Peruvian isolates. Part of the peptide 2 sequence has been reported to be a B-cell epitope recognized by sera from malaria-immune humans living in endemic areas [52]. Peptides 3 and 4 are located within a region rich in Asn, Lys and Pro residues. This long stretch of amino acids suggests conformational binding sites, which could be altered by minimal changes in amino acid sequence [31]. For instance, the Peruvian sequences had Pro at site 434 instead of Ser as in the 3D7 sequence; in addition, P3 showed a characteristic sequence in which Asp and Pro occur every fourth residue, which may allow binding in different registers [28]. P1 and P5 have been identified as CD8+ T-lymphocyte epitopes and were also highly conserved, with a single variation at aa 82 (Glu/Asp) relative to the 3D7 strain. (…) [37].
Several studies have demonstrated the importance of T- and B-cell-mediated immunity to malaria and how non-synonymous substitutions may affect the conformation of epitopes and, consequently, the immune response [53]. Nevertheless, further studies are needed to understand the immunological implications of amino acid changes in these malaria vaccine candidate antigens.
Studying different genes and their alleles helps us understand whether they interact to influence malaria infection and whether minimal changes in their sequences could render a vaccine ineffective. One way to improve vaccine effectiveness would be to design a construct from the most common region-specific alleles, taking into account the genetic diversity found in the area. Information on the prevalence and dynamics of polymorphisms in vaccine candidate antigens would allow such a construct to be designed accurately.

Conclusion
Peruvian isolates are less polymorphic than African isolates and more similar to Indian populations. Although conserved epitopes were found in Peru, the uneven geographic distribution of polymorphisms and the large number of alleles distributed worldwide, especially for CSP and MSP-1 Block 2, may reduce the effectiveness of vaccines. The number of allelic variants increased over time in this study, suggesting that even in geographic regions with low transmission, vaccine development strategies should include diversity studies. The uneven geographic distribution of alleles may jeopardize the formulation and use of vaccines directed at specific variable loci, since local variants may not be considered in the vaccine design.