Genetic polymorphism of Plasmodium falciparum circumsporozoite protein (PfCSP) and mismatches against RTS,S/AS01 malaria vaccine observed on Bioko Island, Equatorial Guinea and globally


Background: RTS,S/AS01 is a Plasmodium falciparum circumsporozoite protein (PfCSP)-based malaria vaccine, but genetic polymorphisms of PfCSP in the global P. falciparum population could cause mismatches with the vaccine antigen and reduce vaccine efficacy. This study aimed to investigate the genetic polymorphism and natural selection of PfCSP on Bioko Island and in the global P. falciparum population.
Methods: From January 2011 to December 2018, 148 blood samples were collected from P. falciparum-infected patients on Bioko Island; 96 monoclonal sequences were successfully obtained and analyzed together with 2200 global PfCSP sequences mined from the MalariaGEN Pf3k database and NCBI.
Results: In Bioko, the N-terminus of PfCSP showed limited genetic variation, and the most common numbers of repeat units (NANP/NVDP) in the central region were 40 (35%) and 41 (34%). Most polymorphic sites were found in the Th2R/Th3R region, where natural selection (p>0.05) and recombination occurred. The overall pattern of the Bioko PfCSP gene showed no obvious deviation from that of African mainland PfCSP (Fst=0.00878, p<0.05). The comparative analysis of Bioko and global PfCSP showed varied mutation patterns and obvious geographic differentiation among populations from four continents (p<0.05). The global PfCSP C-terminal sequences clustered into 138 different haplotypes (H_1 to H_138). Only 3.35% of sequences matched the 3D7 vaccine strain haplotype (H_1).
Conclusions: Genetic polymorphism of PfCSP was found to be universal. Overall vaccine efficacy might be influenced by the low proportion of vaccine-matched isolates in the global parasite population. Genetic polymorphism and geographical characteristics should be considered in future improvements of RTS,S/AS01.

Malaria, caused by Plasmodium spp. infections, is one of the most significant life-threatening infectious diseases of humans worldwide. According to the World Malaria Report 2019 (https://www.who.int/news-room/feature-stories/detail/world-malaria-report-2019), an estimated 228 million (95% confidence interval [CI]: 206-258 million) people suffered from malaria infections worldwide in 2018, with 405,000 malaria deaths. Twenty countries accounted for 85% of global malaria cases in 2018; all of these countries are in sub-Saharan Africa, except for India. Resistance to anti-malarial drugs and insecticides, coupled with the lack of an effective vaccine, are the leading factors behind the parasite's continuing burden. Apart from its complex life cycle, which alternates between the human and the mosquito host, the malaria parasite also exhibits stages characterized by extensive genetic and antigenic diversity, which may present obstacles to malaria control measures.
Currently, there is no licensed vaccine against malaria, although considerable effort has gone into developing one. A few potential vaccine candidates targeting the pre-erythrocytic, erythrocytic and sexual stages of Plasmodium falciparum are at various stages of clinical development [1,2]. In 2015, the European Medicines Agency approved the RTS,S/AS01 vaccine for the immunization of children against malaria [3]. This is currently the most clinically advanced malaria vaccine. Phase 3 clinical trials conducted at various sites in Africa showed that the RTS,S/AS01 vaccine has a protective efficacy of 45% in children in the first twenty months after vaccination [4,5]. In 2018, the World Health Organization, through a large-scale pilot malaria vaccine implementation program (MVIP), aimed to introduce this vaccine in three sub-Saharan countries (Ghana, Kenya, Malawi) [3]. RTS,S/AS01 is a pre-erythrocytic stage vaccine based on the P. falciparum circumsporozoite protein (PfCSP) [6,7]. PfCSP, with a molecular mass of about 58 kDa, is predominantly distributed on the surface of sporozoites. The structure of PfCSP can be divided into three distinct regions: a highly variable central repeat region flanked by a conserved N-terminal region and a C-terminal non-repeat region [8]. The central repeat region, which has been recognized as a major target for antibody-mediated neutralization, is rich in Asn-Ala-Asn-Pro (NANP) tandem repeats, contains a small number of Asn-Val-Asp-Pro (NVDP) motifs [8], and constitutes the immunodominant B cell epitopes. The C-terminal non-repeat region includes two polymorphic subregions, Th2R and Th3R, in which T cell epitopes have been identified.
Previous studies revealed a high number of single nucleotide polymorphisms (SNPs) in PfCSP among P. falciparum populations from different geographic regions [9]. Indeed, most P. falciparum vaccine candidate antigens, including PfCSP, show genetic and antigenic polymorphisms in global parasites, which can obstruct or reduce the efficacy of vaccines [10,11]. Therefore, understanding the genetic nature of vaccine candidate antigens in global P. falciparum isolates is critical for designing an effective vaccine. The aims of the present study were to investigate the polymorphism pattern of the PfCSP gene and its diversifying selection in Bioko Island P. falciparum, and to elucidate how the PfCSP gene is differentiated among global P. falciparum populations. Our results fill the gap in PfCSP data for Bioko Island and should be helpful both for understanding the molecular evolution of the PfCSP gene in P. falciparum and for designing peptide vaccines based on the PfCSP antigen.

Study area
The study was carried out in Malabo Regional Hospital and the clinic of the Chinese medical aid team to the Republic of Equatorial Guinea. Ethical approval was obtained from the Ethics Committee of Malabo Regional Hospital. Bioko is an island 32 km off the west coast of Africa and located in the northernmost part of Equatorial Guinea. The island has a population of 334,463 (2015 census), of which approximately 90% live in Malabo (the capital city of Equatorial Guinea) in a humid tropical environment. Malaria due to P. falciparum is still the major public health problem on the island [12].
Since the Bioko Island Malaria Control Project (BIMCP) was launched in 2004, parasite prevalence on Bioko has fallen from over 45% in 2004 to 8.5% in 2016, and the entomological inoculation rate has fallen from more than 1,000 before 2004 to 14 in 2015 (www.mcdinternational.org).

Samples collection
A total of 148 blood spot samples were collected from patients with uncomplicated malaria on Bioko Island between January 2011 and December 2018. Included patients were residents of Bioko Island aged between 4 months and 80 years. Patients were classified as having uncomplicated malaria according to the WHO criteria, defined as a positive smear for P. falciparum and the presence of fever (≥37.5℃). Informed consent was obtained from all participating subjects or their parents.
Dried blood spots were collected on day zero of enrollment by finger prick and spotted onto Whatman 903® filter paper (GE Healthcare, Pittsburgh, USA) for later use. Laboratory screening for malaria was done using rapid diagnostic tests (RDT) and confirmed by microscopic examination of blood smears. For quality control, archived malaria-positive microscope slides were re-examined and parasite density was recorded. The Plasmodium species were identified by real-time PCR followed by high-resolution melting (HRM) [13]. The pGEM-T standard plasmids of four human

Genomic DNA extraction
Parasite genomic DNA was extracted from dried filter blood spots by the Chelex-100 extraction method described in our previous article [14]. The DNA products were collected in sterile tubes and stored at -80℃ until use.

Amplification of the entire CSP gene
The entire PfCSP gene (NCBI Gene ID: 814364) was amplified by nested PCR. For the first-round PCR, 2 µl of genomic DNA was amplified with 0.25 µl 2×HotStart DNA Polymerase, 2 µl dNTP Mixture, 5 µl 5×PCR buffer, 1 µl 10 µmol/L forward primer (5'-CCGGTCATAAATTCTGAATTATCAA-3'), 1 µl 10 µmol/L reverse primer (5'-CTACAATTAATCGCAAACGTA-3'), and sterile ultra-pure water to a final volume of 25 µl. Thermal cycling parameters were as follows: initial denaturation at 95℃ for 3 min; 30 cycles of 98℃ for 10 s and 68℃ for 90 s. For the second-round PCR, 3 µl of the primary PCR product was amplified in a 50 µl reaction volume comprising 0.4 µl HotStart DNA Polymerase, 3.2 µl dNTP Mixture, 8 µl 5×PCR buffer, 1.6 µl 10 µmol/L forward primer (5'-CGTGTAAAAATAAGTAGAAACCACG-3'), 1.6 µl 10 µmol/L reverse primer (5'-GTACAACTCAAACTAAGATGTGTTC-3'), and sterile ultra-pure water to a final volume of 50 µl. The cycling procedure was the same: initial denaturation at 95℃ for 3 min; 30 cycles of 98℃ for 10 s and 68℃ for 90 s. All PCR products were analyzed by 1.2% agarose gel electrophoresis, then purified and sequenced on an ABI 3730XL automated sequencer (Shanghai Yingjun Biotechnology Co., LTD, Guangzhou branch). To ensure sequencing accuracy, at least two clones were sequenced for each isolate. The sequencing primer was the reverse primer of the second-round PCR; all sequences were analyzed and assembled with MEGA 6.0 software [15].

Sequences analysis
The PfCSP sequence of the laboratory-adapted P. falciparum strain 3D7 (NCBI Gene ID: 814364) was included in the alignment as a reference sequence. The number of segregating sites (S), number of haplotypes (H), haplotype diversity (Hd), and observed average pairwise nucleotide diversity (π) were calculated using DnaSP version 6.12.01 [16]. To estimate the stepwise diversity across the sequences, π was also calculated in a sliding window of 10 bases with a step size of 5 bp. To test the null hypothesis of neutrality of PfCSP, the rates of synonymous (dS) and nonsynonymous (dN) substitutions were estimated and compared in MEGA 6.0 using Nei and Gojobori's method [17] with the Jukes and Cantor (JC) correction and 1000 bootstrap replications. Tajima's D test [18] and Fu and Li's D and F statistics [19] were calculated using DnaSP 6.12.01 to test for departure from neutrality (Table 1). The recombination parameter (R), which incorporates the effective population size and the probability of recombination between adjacent nucleotides per generation, and the minimum number of recombination events (Rm) were also estimated using DnaSP 6.12.01 (Table 1).
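The summary statistics named above can be illustrated with a minimal Python sketch. This is not the DnaSP implementation; it assumes gap-free aligned sequences of equal length, uses a toy alignment rather than real PfCSP data, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct base (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def pairwise_differences(a, b, start=0, end=None):
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a[start:end], b[start:end]))


def nucleotide_diversity(seqs, start=0, end=None):
    """Average pairwise differences per site (pi) over columns start..end."""
    length = len(seqs[0][start:end])
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b, start, end) for a, b in pairs)
    return total / (len(pairs) * length)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def sliding_pi(seqs, window=10, step=5):
    """pi in windows of `window` bases, moved along the alignment by `step`."""
    length = len(seqs[0])
    return [(s, nucleotide_diversity(seqs, s, s + window))
            for s in range(0, length - window + 1, step)]


# Toy alignment (NOT real PfCSP data): four sequences, 15 bases each.
toy = [
    "ACGTACGTACGTACG",
    "ACGTACGAACGTACG",
    "ACGTTCGTACGTACG",
    "ACGTACGTACGTACG",
]
print("S  =", segregating_sites(toy))
print("pi =", round(nucleotide_diversity(toy), 4))
print("Hd =", round(haplotype_diversity(toy), 4))
print("sliding pi =", sliding_pi(toy, window=10, step=5))
```

The window and step defaults mirror the 10-base window and 5 bp step described in the text; real alignments with gaps or ambiguous bases would need extra handling that this sketch omits.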
Sequence acquisition and global analysis
Project (release 5) [9] using samtools [20] and vcftools [21]; ii. 453 sequences from the Philippines, Iran, India, Papua New Guinea (PNG), Vanuatu, Solomon Islands, Cameroon, Tanzania, Venezuela and Brazil were obtained from the NCBI database (Additional File 1). Genetic polymorphism and tests of neutrality were calculated for each population using DnaSP 6.12.01 and MEGA 6.0 as described above. A logo plot was constructed for each PfCSP population using the WebLogo program (https://weblogo.berkeley.edu/logo.cgi). To investigate the genetic relationships among global PfCSP haplotypes, a haplotype network for the C-terminal region of PfCSP from Bioko and the other 24 countries and areas listed above was constructed with the Popart program (http://popart.otago.ac.nz) using the Median-Joining method [22].

Prediction of impact of amino acid change upon protein structure
The crystal structure of the CSP C-terminus, PDB ID 3VDK [23], was used in the analysis. The PolyPhen-2 [24] and SIFT [25] online servers were used to predict the potential impact of amino acid substitutions on structure or function. The FoldX plugin [26] in YASARA [27] was used to predict the change in free energy caused by each mutation: ΔΔG(change) = ΔG(mutation) - ΔG(wild-type). As a rule of thumb, ΔΔG(change) > 0 indicates that the mutation is destabilizing, and ΔΔG(change) < 0 indicates that it is stabilizing.
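The sign rule for ΔΔG can be written as a short Python sketch. The free-energy values below are invented for illustration and are not FoldX output; the function names are hypothetical, and the "neutral" label for an exact zero is an added assumption, since the text only defines the two signed cases.

```python
def ddg_change(dg_mutation, dg_wild_type):
    """ddG(change) = dG(mutation) - dG(wild-type), in the same units as the inputs."""
    return dg_mutation - dg_wild_type


def classify_ddg(ddg):
    """Apply the rule of thumb: positive is destabilizing, negative is stabilizing."""
    if ddg > 0:
        return "destabilizing"
    if ddg < 0:
        return "stabilizing"
    return "neutral"  # assumption: the text does not define the zero case


# Hypothetical free energies, NOT real predictions for PfCSP.
example = ddg_change(dg_mutation=-10.5, dg_wild_type=-12.0)
print(example, classify_ddg(example))
```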

Amplification of Bioko PfCSP
Of the 148 blood samples collected on Bioko Island, 118 yielded PfCSP amplicons suitable for sequencing. Of these, 96 full-length monoclonal PfCSP sequences were analyzed in this study and 22 polyclonal ones were excluded. As expected, size variation was observed in the amplified PfCSP. The approximate sizes of the amplified products varied from 1.1 to 1.2 kb, mainly because of differences in the number of tandem repeats in the central repeat region. These nucleotide sequences have been deposited in GenBank under accession numbers MN623126-MN623221.

Genetic polymorphisms of N-terminal region of Bioko and global PfCSP
The N-terminal non-repeat region was relatively conserved in Bioko PfCSP. Compared with the 3D7 reference sequence (XM_001351086), five variations were found in the PfCSP N-terminal region of Bioko parasites: L5F (2.08%, 2/96), R70K (1.04%, 1/96), D82N (1.04%, 1/96), A98G (24%, 23/96) and a 57 bp insertion (encoding the 19 amino acids 80 NNGDNGREGKDEDKRDGNN 81; 50%, 48/96). A comparative analysis of the N-terminal non-repeat region in global PfCSP also showed that the region is relatively well conserved in global parasites. As shown in Figure 1A, the 19-amino-acid insertion and A98G were the two major variations observed in global PfCSP. The insertion and A98G were highly frequent in almost all Asian and Oceanian countries (80% to 100%) but less frequent in African and American isolates (15% to 79%). Some other variations were unevenly distributed geographically and occurred at relatively low frequencies. As shown in Figure  African countries ones. Compared with the patterns of Asia, Africa and America, that of Oceania showed relatively low diversity, especially in the Th2R region, which showed almost no nucleotide diversity (Figure 2).
The parameters associated with nucleotide diversity and natural selection were also evaluated for the C-terminal non-repeat region (311-363) of Bioko and global PfCSP (Table 1). The average number of nucleotide differences (K) of Bioko PfCSP was 5.775 and the overall haplotype diversity (Hd) was 0.962±0.008. The estimated value of dN-dS in Bioko PfCSP was 0.0166 (Table 1). To further analyze natural selection in the C-terminus of Bioko PfCSP, Tajima's test and Fu and Li's tests were performed (Table 1). Tajima's D (-0.68556, p>0.1) and Fu and Li's F and D (-1.23926, p>0.1 and -1.22255, p>0.1, respectively) were all negative.
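For readers unfamiliar with how Tajima's D is obtained from the sample size, the number of segregating sites and the average number of pairwise differences, the following Python sketch implements the standard formula from Tajima (1989). It is an illustration rather than the DnaSP code, it does not compute p-values, and the input numbers are toy values, not the Bioko data.

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_diff):
    """Tajima's D from sample size n, S, and the average pairwise differences k."""
    if n < 4:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    s = segregating_sites
    if s == 0:
        return float("nan")  # no variation, D is undefined

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two estimators of theta, scaled by its standard deviation.
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


# Toy input (NOT the Bioko data): 4 sequences, 2 segregating sites, k = 1.
print(round(tajimas_d(4, 2, 1.0), 3))
```

A negative value arises when the average pairwise difference is smaller than S/a1, that is, when there is an excess of low-frequency variants relative to neutral expectation.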
Globally, Hd in African countries was generally higher than elsewhere (Hd>0.9), which confirms the higher level of genetic diversity in African PfCSP. The global dN-dS values were positive except for Nigeria, and global Tajima's D values deviated from 0 to different extents.
Recombination events were also evaluated in both Bioko and global PfCSP. As shown in Table 1, relatively high recombination parameters were found in all African countries as well as in the Philippines, Bangladesh and Venezuela, while recombination parameters were lower in other countries.
At the amino acid level, the mutation types and their frequencies in the C-terminus (311-363) are presented in Figure 3. In total, 26 logos were generated, one for the 3D7 reference isolate and 25 for isolates from different countries and areas. In Bioko PfCSP, mutations were detected at twelve positions (314, 317, 318, 321, 322, 324, 327, 352, 356, 357, 359, 361). All these positions are located in the two T-cell epitopes (Th2R and Th3R). The overall pattern of Bioko is similar to those of African countries. More kinds of mutations existed in African isolates, as well as in Philippine and Venezuelan isolates. In contrast, the Oceanian mutation patterns tended to be simpler. The rare mutation L320I was found only in the Philippines, while S326A was found only in Venezuela. The high-frequency mutation A361E existed in all 25 countries, while its wild type (A361) was mainly found in Africa. Notably, the wild-type residues at positions 317, 318 and 321 were rarely seen in global PfCSP isolates; instead, K317E, E318K, E318Q and N321K were mainly found at these positions (Figure 3).
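The substitution notation used here (for example K317E: reference residue, position, observed residue) can be derived from aligned peptides with a few lines of Python. The sketch below is illustrative only: the peptides are toy strings, not the real 3D7 or isolate sequences, and the function names are hypothetical.

```python
from collections import Counter


def call_substitutions(reference, isolate, first_position):
    """List substitutions as <ref residue><position><isolate residue>."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{ref}{first_position + i}{alt}"
            for i, (ref, alt) in enumerate(zip(reference, isolate))
            if ref != alt]


def substitution_frequencies(reference, isolates, first_position):
    """Fraction of isolates carrying each substitution."""
    counts = Counter()
    for isolate in isolates:
        counts.update(call_substitutions(reference, isolate, first_position))
    return {name: n / len(isolates) for name, n in counts.items()}


# Toy peptides covering positions 311-321 (NOT real PfCSP sequences).
toy_reference = "AAAAAAKEAAN"
toy_isolates = ["AAAAAAEQAAK", "AAAAAAEKAAK", "AAAAAAKEAAN", "AAAAAAEKAAN"]

print(call_substitutions(toy_reference, toy_isolates[0], first_position=311))
print(substitution_frequencies(toy_reference, toy_isolates, first_position=311))
```

The `first_position` argument anchors the numbering to the first residue of the analyzed segment, so the same function works for any sub-region as long as the peptides are aligned without gaps.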

Discussion
Bioko Island, Equatorial Guinea, is a historically high malaria transmission region [28]. Although the BIMCP was launched on Bioko Island in 2004 and has achieved remarkable results, malaria is still a major health problem in this region. In this study, genetic diversity and natural selection were analyzed in Bioko PfCSP and in global PfCSP sequences mined from databases. In general, the polymorphism patterns of Bioko PfCSP and African mainland PfCSP showed no obvious differentiation, although Bioko Island is geographically relatively isolated. This result could be explained by the previous report that the human population of Bioko is highly mobile [12]. Consistent with global PfCSP, the natural selection analysis suggested that Bioko PfCSP might be under selection, although the result was not statistically significant (p > 0.1). These findings are in line with our prior studies of the P. falciparum merozoite surface protein-1/2 (PfMSP-1/2) and P. falciparum apical membrane antigen-1 (PfAMA-1) genes on Bioko Island [29,30].
The N-terminal region of PfCSP plays an important role in sporozoite invasion of hepatocytes [31]. In both Bioko and global PfCSP, genetic polymorphism of the N-terminal region was relatively low. The 19-amino-acid insertion and A98G were widespread, while several novel mutations were found at low frequency. Previous work showed that antibodies against the N-terminal region can be produced by the host immune system and can partially inhibit sporozoite invasion of hepatocytes in vitro [32]. This evidence supports the potential of the immunogenic and relatively conserved CSP N-terminus as a component of an anti-malarial vaccine.
The central repeat region is an immunodominant epitope of PfCSP and is a component of the RTS,S malaria vaccine [33]. Variation in the number of tetrapeptide repeats is an important cause of PfCSP polymorphism. As expected, our results revealed diversity in the repeat number. Across the global geographic regions analyzed, the majority of samples carried 39 to 44 tetrapeptide repeats. Although the number of tetrapeptide repeats has no significant impact on RTS,S vaccine efficacy [10], it is known to correlate with the stability of the CS protein structure [34]. The mechanism and effect of this variation remain unclear. Given how widespread this variation is, further research on this region is still necessary.
Part of the C-terminal region, including the flexible Th2R and Th3R regions, is also included in the RTS,S malaria vaccine [33]. In the global comparison, abundant polymorphisms were found within the C-terminus of PfCSP, especially in the Th2R and Th3R regions, which are proven T cell immunogenic epitopes [35]. The C-termini of African, Asian, American and Oceanian samples presented their own distinctive diversity patterns. Not surprisingly, more polymorphisms were found in the two larger parasite populations (African and Asian) than in those of America and Oceania. Because of geographical isolation, some mutations showed regional differences: for example, the mutation at position 325 (N325Y) occurred only in Asian countries, S326A was found only in Venezuela, and wild-type A361 was mainly observed in Africa. These observations indicate that continuous monitoring of these regionally characteristic mutations, and exploration of their association with the regional malaria epidemic situation, are necessary.
In the C-terminal haplotype analysis, 29 of 34 Bioko PfCSP haplotypes were shared with samples from the African continent while only 5 were singletons, which implies that Bioko PfCSP is not completely independent of the African continent. Haplotypes from Oceanian PfCSP were clearly more closely related genetically to Asian haplotypes, and the same was seen between parasites from America and Africa. This suggests that worldwide genotypes of the PfCSP C-terminus might be divided into two major groups (Africa & America and Asia & Oceania), which is probably caused by frequent exchange favored by geography. For PfCSP-based vaccine design, this suggests that regional differentiation might need to be taken into consideration.
The absence of wild-type isolates is no longer an uncommon finding [9,36]. On Bioko Island, unsurprisingly, only 2% of samples matched 3D7. This reflects that isolates perfectly matching RTS,S are rare not only on Bioko Island but also around the world. A previous study stated that 3D7-mismatched parasites could probably weaken the efficacy of the RTS,S malaria vaccine [10]. Therefore, the imperfect efficacy of the RTS,S/AS01 vaccine [37] might be associated with the scarcity of 3D7-type isolates that perfectly match the vaccine composition.
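The proportion of vaccine-matched isolates amounts to grouping identical C-terminal sequences into haplotypes and counting how many isolates are identical to the vaccine-strain haplotype. A minimal Python sketch of that bookkeeping follows; the peptides are toy strings, not real PfCSP sequences, the function names are hypothetical, and the ordering of labels after H_1 (by decreasing count) is an assumption made for the example rather than the numbering used in this study.

```python
from collections import Counter


def label_haplotypes(seqs, reference):
    """Assign H_1 to the reference haplotype and H_2, H_3, ... to the rest.

    Non-reference haplotypes are ordered by decreasing count (ties broken
    alphabetically); this ordering is an assumption for illustration.
    """
    counts = Counter(seqs)
    others = sorted((s for s in counts if s != reference),
                    key=lambda s: (-counts[s], s))
    labels = {reference: "H_1"}
    for i, seq in enumerate(others, start=2):
        labels[seq] = f"H_{i}"
    return labels, counts


def vaccine_match_fraction(seqs, reference):
    """Fraction of isolates whose sequence is identical to the reference."""
    return sum(s == reference for s in seqs) / len(seqs)


# Toy peptides (NOT real PfCSP sequences).
toy_reference = "KENA"
toy_isolates = ["EKNE", "EKNE", "KENA", "EQKE", "EKNE", "EQKE", "KENE", "EKNE"]

labels, counts = label_haplotypes(toy_isolates, toy_reference)
for seq, label in sorted(labels.items(), key=lambda kv: int(kv[1][2:])):
    print(label, seq, counts[seq])
print("vaccine-matched fraction:", vaccine_match_fraction(toy_isolates, toy_reference))
```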
Mutation-effect prediction for 28 mutations at 16 polymorphic positions was carried out using two prediction methods (PolyPhen-2 and SIFT) [24,25]. Fifteen of the 28 mutations were predicted to be damaging. Notably, mutations at positions 317, 318, 321, 322, 327 and 354 produced large changes in free energy, which would destabilize the CS protein structure to different extents. In particular, Neafsey et al. previously reported that mismatches against the RTS,S vaccine at positions 317 and 354 could significantly impair vaccine efficacy [10]. Based on these data, P. falciparum may escape the host immune system by altering structural stability through mutations at these specific positions. The specific mechanism is not yet clear. Therefore, deeper exploration and continuous monitoring are still necessary.
The malaria vaccine RTS,S/AS01 is progressing toward being a universally effective malaria vaccine deployed against a human parasite. Based on this study, we put forward several suggestions: 1) The globally high-frequency mutants of the C-terminus, rather than the wild-type residues, are recommended for vaccine composition. 2) The immunogenic N-terminus, together with the central repeat region and the C-terminus, could be tried as a new combination of vaccine components. 3) Regional differences should be considered in improving a universal malaria vaccine, with the major division being between the Asia-Oceania region and the Africa-America region.

Conclusions
In this study, the genetic diversity of Bioko and global PfCSP was analyzed. Genetic polymorphism of PfCSP was found to be universal. In addition, significant geographical differentiation of PfCSP was found around the world, while vaccine-matched isolates were rare worldwide.

Haplotype network of the C-terminal region among global PfCSP isolates. Isolates from four continents and Bioko Island are marked in five color series: blue for Africa, red for Asia, khaki for Oceania, green for America, and yellow for Bioko Island.
