Genetic polymorphism of circumsporozoite protein of Plasmodium falciparum among Chinese migrant workers returning from Africa to Henan Province

Plasmodium falciparum malaria is recognized as a major global public health problem. The malaria vaccine was important because the case fatality rate of falciparum malaria was high. Plasmodium falciparum circumsporozoite protein (PfCSP) is one of the potential vaccine candidates, but the genetic polymorphism of PfCSP raises concerns regarding the efficacy of the vaccine. This study aimed to investigate the genetic polymorphism of PfCSP and provide data for the improvement of PfCSP-based vaccine (RTS,S malaria vaccine). Blood samples were collected from 287 Chinese migrant workers who were infected with P. falciparum and returning from Africa to Henan Province during 2016–2018. The Pfcsp genes were analysed to estimate the genetic diversity of this parasite. The results showed that there were two mutations at the N-terminus of imported Pfcsp in Henan Province, including insertion amino acids (58.71%, 118/201) and A → G (38.81%, 78/201). The number of repeats of tetrapeptide motifs (NANP/NVDP/NPNP/NVDA) in the central repeat region ranged mainly from 39 to 42 (97.51%, 196/201). A total of 14 nonsynonymous amino acid changes were found at the C-terminus. The average nucleotide difference (K) of imported Pfcsp in Henan Province was 5.719, and the haplotype diversity (Hd) was 0.964 ± 0.004. The estimated value of dN-dS was 0.047, indicating that the region may be affected by positive natural selection. The minimum number of recombination events (Rm) of imported Pfcsp in Henan Province was close to that in Africa. The analysis of genetic differentiation showed that there may be moderate differentiation between East Africa and North Africa (Fst = 0.06484), and the levels of differentiation in the other regions were very small (Fst < 0.05). The N-terminus of Pfcsp was relatively conserved, and the central repeat region and the Th2R and Th3R regions of the C-terminus were highly polymorphic. The gene polymorphism pattern among Chinese migrant workers returning from Africa to Henan Province was consistent with that in Africa. The geographical pattern of population differentiation and the evidence of natural selection and gene recombination suggested that the effect of polymorphism on the efficacy of PfCSP-based vaccines should be considered.

1 Department of Parasite Disease Control and Prevention, Henan Center for Disease Control and Prevention, Zhengzhou, China Full list of author information is available at the end of the article Background Malaria, caused by Plasmodium spp. infections, is one of the most significant life-threatening infectious diseases to humans worldwide. According to the World Health Organization (WHO) Malaria Report 2021, the total number of malaria deaths worldwide reached 627,000, equivalent to one death from malaria every minute in 2020. The incidence of malaria increased 69,000 in 2020 compared with that in 2019. However, affected by COVID-19, the diagnosis of malaria has declined, and the number of malaria deaths in sub-Saharan Africa has increased by 13%. [1]. Plasmodium falciparum is the most common parasite causing human malaria. It has the strongest pathogenicity and is also the main cause of severe malaria. Falciparum malaria patients with low immunity or untimely treatment easily develop severe malaria and even die [2,3].
Malaria has historically been a major health problem in Henan Province [4]. In 2010, Henan Province launched an action plan to eliminate malaria, achieved no indigenous infection cases in 2012, and passed the assessment of malaria elimination in 2019 [5][6][7]. Although indigenous malaria transmission has been effectively controlled, the problem of imported malaria infection from abroad has become increasingly prominent. In recent years, with the development of global trade and the transnational economy, especially the increasing number of workers and businessmen in malaria-prone areas, such as Africa and Southeast Asia, overseas imported infections caused by population mobility have become the main source of malaria cases in Henan Province, which has brought new challenges to the overall elimination of malaria [8][9][10]. The vast majority of imported malaria cases come from Africa, and P. falciparum has become responsible for these infections.
Developing a malaria vaccine that provides durable protection against clinical disease and completely prevents infection will be critical for controlling and eliminating malaria. Anti-sporozoite vaccines, such as RTS,S, which target P. falciparum circumsporozoite protein (CSP) expressed on the surface of sporozoites, are leading malaria vaccine candidates undergoing phase III clinical trials in malaria-endemic areas [11,12]. RTS,S malaria vaccine trials showed a significant effect in reducing the malaria incidence in many African countries including Ghana, Kenya, Mozambique, Gambia, Tanzania, and Gabon [13][14][15]. PfCSP is divided into three distinct regions: a highly variable central repeat region flanked by a conserved N-terminal region and a C-terminal nonrepeat region. The central repeat region, which has been recognized as a major target for antibody-mediated neutralization, is rich in Asn-Ala-Asn-Pro (NANP) tandem repeats and contains a small number of Asn-Vla-Asp-Pro (NVDP) motifs. The C-terminal nonrepeat region includes two polymorphism subregions, Th2R and Th3R, where T-cell epitopes have been identified [16][17][18][19]. PfCSP is predominantly distributed on the surface of sporozoites and has a molecular mass of approximately 58 kDa [20]. PfCSP has been found to show various genetic and antigenic polymorphisms in global parasites, which might obstruct or reduce the efficacy of vaccines [19,21,22]. The study of Pfcsp gene polymorphisms is an international research hotspot, but there have been few domestic research reports. This study aimed to determine the molecular characterization of falciparum malaria to produce a genetic characterization of Pfcsp, to understand the molecular evolution of the Pfcsp gene and to provide data for the improvement of PfCSP-based vaccines (including the RTS,S malaria vaccine).

Sample and data collection
Plasmodium falciparum-infected blood samples from patients, including finger-prick blood samples and venous blood samples, were collected. Blood samples (2 mL each) were collected from symptomatic patients before treatment. All of the patients came from Africa in 2016-2018. The samples were confirmed by PCR and microscopy examination. Samples were collected in Ethylene diamine tetraacetic acid (EDTA) tubes and transported to the Henan Province Center for Disease Control and Prevention (Henan CDC) the next day. The samples were stored at − 80 °C until laboratory analysis. A structured questionnaire was used to collect sociodemographic and clinical data from the subjects.

DNA template preparation
Plasmodium falciparum genomic DNA was extracted from the blood samples using a QIAamp DNA Mini kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions. TE buffer (10 Mm Tris-HCL, pH 8.0, 0.1 M EDTA) was used to dissolve the DNA and it was differentiation and the evidence of natural selection and gene recombination suggested that the effect of polymorphism on the efficacy of PfCSP-based vaccines should be considered.
Keywords: Imported malaria, Plasmodium falciparum, Circumparzoite protein, Genetic polymorphism, Henan Province, Africa stored at − 20 °C until use. A 1.5% agarose gel stained with ethidium bromide was used to check the quality of the DNA and it was visualized with UV illumination.

Pfcsp gene amplification and sequencing
A PCR amplification method was used to amplify the Pfcsp gene. The primers were as follows: PfCSP -F (5'-CGT GTA AAA ATA AGT AGA AAC CAC G -3'), PfCSP -R (5'-TGT ACA ACT CAA ACT AAG ATG TGT TC-3') [23]. Amplification reactions were performed in a 60 µL reaction volume containing 2 µL of DNA sample, 30 µL of a 2 × Go Taq Green Master Mix (Promega Inc., Madison, WI, USA), 3 µL of target primers, and 22 µL of ddH 2 O. PCR was performed with the following conditions: 94 ℃ for 1 min, followed by 35 cycles of 94 ℃ for 30 s, 50 ℃ for 30 s, and 72 ℃ for 2 min and a final extension of 72 ℃ for 5 min. Sequencing was conducted by Shanghai DNA Bio Technologies Co., Ltd. (Shanghai, China). All PCR products were analysed using 1.5% agarose gel electrophoresis and were then they were purified and sequenced by using an ABI 3730 × L automated sequencer. To ensure the accuracy of the sequencing, at least two clones for each isolate were sequenced.

Statistical analysis Sequence alignment and amino acid polymorphism analysis
Sequence alignment and analysis were carried out using Bio-Edit software. The amino acid sequences were compared with the 3D7 strain (XM_001351086) as a reference sequence. The sequences of the amplicons were aligned with published data from the 3D7 strain from the NCBI database by BLAST analysis.

Nucleotide polymorphism, natural selection, and gene recombination analysis
Nucleotide polymorphism, natural selection, and gene recombination were analysed using DnaSP 6.12.03 software [24]. For one of the indicators of nucleotide sequence polymorphism, nucleotide diversity (π) was calculated by the Jukes and Cantor method with a sliding window length of 10 bp and step size of 5 bp. The sliding window diagram was used to estimate the stepwise diversity between sequences. The values of segregating sites (S), number of haplotypes (H), and haplotype diversity (Hd) were calculated by DnaSP 6.12.03 software. To test the null hypothesis of Pfcsp neutrality, the rates of synonymous (dS) and nonsynonymous (dN) mutations were estimated and compared by MEGA 7.0.26 software [25].
Tajima's D test (α = 0.05) and Fu and Li's D and F test (α = 0.05) were used to evaluate the neutral theory of natural selection. Tajima's D and Fu and Li's D and F statistics were positive (D > 0, F > 0), indicating that it was a positive selection; A negative statistic (D < 0, F < 0) indicating that it was a negative selection [26].
R represents the occurrence of gene recombination. Ra is the recombination probability between adjacent nucleotides of each generation; Rb refers to the recombination estimation of the whole gene, that is, the effective population size; and Rm is the minimum number of reorganization events.

Population differentiation analysis
For population differentiation analysis, Arlequin 3.5.2.2 software and the R program were used to calculate the Fst index [27]. Fst was used to measure the degree of population differentiation, ranging from 0 to 1. If the Fst was 0 ~ 0.05, it indicated that the genetic differentiation between populations was very small, which cannot be considered; if the Fst was 0.05 ~ 0.15, it indicated that there was moderate genetic differentiation among populations; if the Fst was 0.15 ~ 0.25, it indicated that the genetic differentiation among populations was large; and if the Fst was more than 0.25, it indicated that there was great genetic differentiation among populations [28].

Respondent characteristics
A total of 287 blood samples were collected from patients who were infected with P. falciparum returning from 27 countries of Africa to Henan Province during 2016-2018. The male: female ratio was 56.5:1 (282/5). The age ranged from 19 to 71 years old, of which the proportion of patients who were 18 to 55 years old was 98.61% (283/287). The imported patients all came from African countries, including countries in East Africa, West Africa, South Africa, North Africa, and Central Africa, which accounted for 7.66%, 40.07%, 28.23%, 1.05%, and 22.99% respectively. The positive rate of Pfcsp gene PCR amplification was 91.29% (262/287), and the size of the amplification product was 1100-1300 bp. After gene sequencing, 262 amplified positive products successfully obtained the full-length Pfcsp sequence. Ultimately, 201 full-length monoclonal Pfcsp sequences were analysed in this study, including 83 in West Africa, 57 in South Africa, 48 in Central Africa, 10 in East Africa, and 3 in North Africa, while 61 polyclonal Pfcsp sequences were excluded (Table 1).

Central repeat region gene polymorphism of Pfcsp
As shown in Fig. 1B, thirty-one unique haplotypes were identified in imported PfCSP at the amino acid level. Haplotypes completely consistent with the 3D7 reference sequence were not found. Two repeat haplotypes encoding NPNP and NVDA were found in H28 and H29.

C-terminal amino acid polymorphism of PfCSP
A total of 52 different haplotypes (H1-H52) were identified in the C-terminal nonrepetitive region of imported PfCSP in Henan Province, of which H7 (4.48%, 9/201) was completely consistent with the 3D7 reference sequence. The Th2R and Th3R regions were highly polymorphic, and 14 nonsynonymous amino acid changes were found. A total of 8 of 14 were located in Th2R Fig. 1 A Analysis of the N-terminal polymorphism of imported PfCSP in Henan Province. Dots represent the same residue as the 3D7 reference sequence. Dashes indicate intervals to maximize alignment. The yellow shaded area represents the predicted T-cell epitope area. The underlined conserved motif (KLKQP) was involved in the invasion of spores into mosquito salivary glands and bound to hepatocytes before invasion. The number of samples refers to the frequency of each haplotype. B Polymorphism characteristics of the central repeat region of imported PfCSP in Henan Province; 1: tetrapeptide motif NANP; 2: tetrapeptide motif NVDP; 3: tetrapeptide motif NPNP; 4: tetrapeptide motif NVDA; 1 and 2 represent known tetrapeptide repeats, and yellow shaded areas 3 and 4 represent newly discovered tetrapeptide motifs ( 314 KHIKEYLNKIQNSL 327 ), including K314Q, K317E/T, E318Q/K, N321K/Q, K322T/R/I/ E, Q324K/R, N325Y, L327/I. N352D/G, P354S, D356N, E357Q, D359N, while A361E/I were located in Th3R ( 352 NKPKDELDY-AND 363 ). Th2R and Th3R were identified as T-cell epitope regions (Fig. 2).

C-terminal nucleotide polymorphism, natural selection, and gene recombination of Pfcsp
The nucleotide diversity (π) of the C-terminal nonrepeat region was analysed in the imported Pfcsp of Henan Province. The sliding window diagram showed that the T-cell epitope regions Th2R and Th3R had high nucleotide diversity, while the connecting region between Th2R and Th3R was highly conserved. The nucleotide diversity in the Th2R region was higher than that in the Th3R region in the imported PfCSP of Henan Province (Fig. 3).
For natural selection and gene recombination, the Henan Province imported Pfcsp and African native Pfcsp were evaluated. The average nucleotide difference (K) of the imported Pfcsp in Henan Province was 5.719, and the haplotype diversity (Hd) was 0.964 ± 0.004. The estimated value of dN-dS was 0.047, indicating that the region may be affected by positive natural  (Table 2).

C-terminal population differentiation of Pfcsp
Through the analysis of genetic differentiation between populations, the genetic differentiation among imported Pfcsp populations in Henan Province showed that except that there may be moderate genetic differentiation between East Africa and North Africa (Fst = 0.06484), the genetic differentiation among other regions was very small (Fst < 0.05), which cannot be considered (Table 3).

Discussion
The N-terminal region of PfCSP plays an important role in the process of sporozoite invasion into hepatocytes by mediating or promoting the interaction between sporozoites and host cells [29,30]. The N-terminal genetic polymorphism in the imported PfCSP population in Henan Province was at a low level, and the N-terminal polymorphism pattern was consistent with the African PfCSP polymorphism pattern. This might also be related to the fact that the malaria cases imported from Henan Province were all from Africa. The above results were similar to those of previously reported studies [19][20][21]. According to the results of study by Huang et al., five variations were found in the Pfcsp N-terminal region of Bioko parasites including L5F, R70K, D82N, A98G, and a 57 bp insertion (encoding 19 amino acids: 80 NNGDNGREGKD-EDKRDGNN 81 ) insertion. Two variations were found in the Pfcsp N-terminal region in this study. The study by Huang et al. demonstrated that A98G and 19 amino acid length insertions were universally popular while several novel mutations were found with low frequency [20]. Notably, none of the sequenced Sudanese isolates showed any insertions in the N-terminal region such as the 19 amino acid insertion (NNGDNGREG-KDEDKRDGNN) that was found in the middle of the N-terminal region. This result was attributed to the sample sizes [31]. Larger sample size from other different regions and the selected regions of this study might provide different results if this insertion occurs by chance in the Sudanese Pfcsp gene [32]. The N-terminal region can be an attractive component of PfCSP-based vaccine due to the N-terminus of imported Pfcsp was relatively conserved.
The central repeat region of PfCSP played a key role in sporozoite formation and development [33]. The results of this study showed that the number of repeats of tetrapeptide motifs (NANP/NVDP/NPNP/NVDA) ranged from 38 to 43. Huang et al. reported that the numbers of repetitive sequences (NANP/NVDP) were mainly found as 40 and 41 in Bioko PfCSP [20]. Two repeat haplotypes encoding NPNP and NVDA were found in H28 and H29, and the result differed from the results of Lê et al. [19]: two novel repeat haplotypes, which encode NTNP and NANS motifs, were identified in two haplotypes (H3 and H9) of Myanmar PfCSP. Imported PfCSP in Henan Province had a high number of tetrapeptide repeats in the central repeat region, as 70.15% of Pfcsp had between 40 and 41 repeats. In addition, two novel tetrapeptide motifs NPNP and NVDA were found. Other tetrapeptide motif forms have been reported in the literature, including NVVP, NAKP, NAHP, NAIP, NVNP, NANL, NVAD, NADP, KANP, and SANP. It was unclear how these tetrapeptide motifs changed and how different positions affected the antibody response to CSP. The central repeat region is important in the PfCSP-based vaccine (RTS, S malaria vaccine). However, no studies indicated that the various number of tetrapeptide repeats can or may affect the effectiveness of the RTS, S malaria vaccine [34]. Therefore, the polymorphism in this region requires further in-depth study and analysis.
Abundant polymorphisms were found in the C-terminal analysis of PfCSP, especially in the thrombospondin type-I repeats (TSRs, small adhesive domains containing approximately 60 amino acid residues that mediate a broad range of biological interactions) region (including Th2R and Th3R), which confirmed T-cell immunogenic epitopes. The overall values of Hd (0.964 ± 0.004) in the C-terminal region of PfCSP were higher than those in previously reported studies [19,21]. The genetic diversity in the C-terminal nonrepeat region among global PfCSP has been reported. The overall values for haplotype and nucleotide diversity for the PfCSP C-terminal region were higher in African PfCSP than in PfCSP from other continents, indicating that African PfCSP had a higher level of genetic diversity; the results of Zeeshan et al. was similar to this study [21]. The comparative analysis of the sliding window diagram of π in the C-terminal region showed that there were two peaks in the Th2R and Th3R regions, indicating that the genetic variation was mainly concentrated in these two regions. For natural selection of the C-terminal region, both Tajima's D and Fu and Li's D and F values were positive, indicating that the region may be experiencing positive selection, but these observations might be somewhat different from the study of Huang et al. [20]. Some previous studies revealed that the C-terminal region might be in a state of balanced selection to maintain or produce the genetic diversity of the global PfCSP population, and the value of Tajima's D in other regions were positive and highly polymorphic, which might be due to the balanced selection of this immunogenic epitope by host immune pressure [35][36][37]. The Rm value of imported PfCSP in Henan Province was close to that in Africa, possibly because these were people returning from Africa. Based on the study of Lê et al., these results indicate that high Rm values were predicted for African PfCSP, while lower levels of Rm were identified in PfCSP from other geographical areas, which may be due to the high polyclonal infection rate of this population and the subsequent crossfertilization and active recombination of mosquitoes [19]. The RTS,S vaccine is composed of the C-terminal T-cell epitope, this region can be very important in terms of designing a specific vaccine.
The population differentiation analysis revealed that the genetic relationship between PfCSP in East Africa, West Africa, South Africa, and Central Africa was very close, and there was almost no differentiation, while North Africa and East Africa showed slight differentiation, which may be due to the small sample size of North Africa and the non-representativeness. Thus, the imported PfCSP did not consider geographical differences.
The limitation of this study was that there was no further study on the effects of the amino acid mutations on the structure or function of CSP to predict the effect of amino acid mutations on the efficacy of PfCSP-based vaccines.

Conclusions
PfCSP is a main component of RTS,S, the most advanced malaria vaccine currently, but the genetic diversity in the Pfcsp gene among the different regions may affect the efficacy of the RTS, S malaria vaccine. In this study, the analysis of the genetic diversity of imported PfCSP in Henan Province indicated that N-terminus non-repeat region was relatively conserved, but the central repeat region and the Th2R and Th3R regions of the C-terminus were highly polymorphic. According to natural selection and gene recombination, the maintenance and production of genetic polymorphisms were speculated. The gene polymorphism pattern was consistent with that in Africa. These findings filled in missing data of imported PfCSP data in Henan Province and provided valuable information for the improvement of the PfCSP-based vaccines (including the RTS,S vaccine).