Genetic polymorphism and evidence of signatures of selection in the Plasmodium falciparum circumsporozoite protein gene in Tanzanian regions with different malaria endemicity

Background In 2021 and 2023, the World Health Organization approved RTS,S/AS01 and R21/Matrix M malaria vaccines, respectively, for routine immunization of children in African countries with moderate to high transmission. These vaccines are made of Plasmodium falciparum circumsporozoite protein (PfCSP), but polymorphisms in the gene raise concerns regarding strain-specific responses and the long-term efficacy of these vaccines. This study assessed the Pfcsp genetic diversity, population structure and signatures of selection among parasites from areas of different malaria transmission intensities in Mainland Tanzania, to generate baseline data before the introduction of the malaria vaccines in the country. Methods The analysis involved 589 whole genome sequences generated by and as part of the MalariaGEN Community Project. The samples were collected between 2013 and January 2015 from five regions of Mainland Tanzania: Morogoro and Tanga (Muheza) (moderate transmission areas), and Kagera (Muleba), Lindi (Nachingwea), and Kigoma (Ujiji) (high transmission areas). Wright’s inbreeding coefficient (Fws), Wright’s fixation index (FST), principal component analysis, nucleotide diversity, and Tajima’s D were used to assess within-host parasite diversity, population structure and natural selection. Results Based on Fws (< 0.95), there was high polyclonality (ranging from 69.23% in Nachingwea to 56.9% in Muheza). No population structure was detected in the Pfcsp gene in the five regions (mean FST = 0.0068). The average nucleotide diversity (π), nucleotide differentiation (K) and haplotype diversity (Hd) in the five regions were 4.19, 0.973 and 0.0035, respectively. The C-terminal region of Pfcsp showed high nucleotide diversity at Th2R and Th3R regions. Positive values for the Tajima’s D were observed in the Th2R and Th3R regions consistent with balancing selection. The Pfcsp C-terminal sequences revealed 50 different haplotypes (H_1 to H_50), with only 2% of sequences matching the 3D7 strain haplotype (H_50). Conversely, with the NF54 strain, the Pfcsp C-terminal sequences revealed 49 different haplotypes (H_1 to H_49), with only 0.4% of the sequences matching the NF54 strain (Hap_49). Conclusions The findings demonstrate high diversity of the Pfcsp gene with limited population differentiation. The Pfcsp gene showed positive Tajima’s D values, consistent with balancing selection for variants within Th2R and Th3R regions. The study observed differences between the intended haplotypes incorporated into the design of RTS,S and R21 vaccines and those present in natural parasite populations. Therefore, additional research is warranted, incorporating other regions and more recent data to comprehensively assess trends in genetic diversity within this important gene. Such insights will inform the choice of alleles to be included in the future vaccines.


Background
Malaria remains a major public health problem worldwide, with nearly half of the world's population at risk and lives in areas where malaria transmission occurs, but with the majority of cases and deaths from sub-Saharan (SSA) [1].In 2022, over 95% (233 million) of malaria cases and 96% of all deaths were reported from 29 countries in the World Health Organization (WHO) region of Africa (WHO-AFRO).The majority of the deaths occurred in Nigeria (31%), the Democratic Republic of the Congo (12%), Niger (6%) and the United Republic of Tanzania (4%) [2].In Tanzania, malaria is still a leading cause of morbidity and mortality mainly among children and pregnant women, and about 93% of the population lives in malaria-transmission areas [3].However, in recent years, the pattern of malaria in Tanzania has become heterogeneous, with a consistently higher burden in north-western, western and southern regions where high prevalence in school children (reaching 82%) has been reported, while other regions have consistently low to very low prevalence over the past 20 years [3].
The WHO Global Technical Strategy (GTS) for Malaria 2016-2030 aims to reduce 90% of both malaria incidences and mortality rates by 90% of the 2015 levels by 2030 and proceed to the elimination targets [4].Malaria eradication which is projected by 2050 [5], will greatly depend on the use of new and innovative tools as well as enhanced use of current interventions, including surveillance, monitoring and evaluation (SME), vector control, and case management through parasitological testing with rapid diagnostic tests (RDTs) and treatment with effective anti-malarials (artemisinin-based combination therapy-ACT) [4].However, malaria control strategies face many challenges including emerging drug and insecticide resistance and histidine-rich protein 2/3 (Pfhrp2/3) gene deletions, limited diagnostic capacity, socio-cultural hindrance and a lack of effective vaccines [4,6,7].Urgent measures are required to get malaria elimination efforts on track and progressing to the GTS targets by 2030.
Malaria parasites have a complex life cycle in both human and mosquito hosts, including an asymptomatic stage (pre-erythrocytic), followed by a symptomatic blood stage (erythrocytic) in humans and a sexual stage in the mosquito host.The parasite also exhibits stages characterized by extensive genetic and antigenic diversity and these factors which are associated with the biology of parasites and vectors may present obstacles to anti-malarial control measures [8,9].As with other pathogens, vaccines are among the most successful and cost-effective interventions in the history of public health [10].For effective malaria control and elimination strategies, it is important to develop a malaria vaccine that provides durable protection against clinical malaria and prevents infection.Efforts to develop a malaria vaccine have been ongoing since the 1980s and focused on vaccines targeting specific stages of the parasite life cycle such as pre-erythrocytic, erythrocytic and sexual stage vaccine candidates [11][12][13].Antigens expressed in the pre-erythrocytic (sporozoites and liver) stages represent the ideal vaccine candidate to block the progression to clinical malaria through vaccine-based interventions [14,15].RTS,S/AS01 is a pre-erythrocytic vaccine based on the circumsporozoite protein (PfCSP) of the 3D7 laboratory strain of P. falciparum, which is a clone of the NF54 strain [9].It is the first malaria vaccine to be endorsed by the WHO for routine immunization of children in moderate to high transmission areas [16].The vaccine is made of a fragment of CSP antigen fused with Hepatitis B surface antigen and AS01 adjuvant [17].It targets specific immunogenic epitopes expressed on the surface of the Pfcsp gene and offers better protection against malaria infections [17,18].Clinical trials of the RTS,S vaccine which were conducted in African countries including phase III trials involving children aged 5-17 months and 6-12 weeks from Mozambique, Gabon, Gambia, Ghana, Kenya and Tanzania showed significant protection against natural falciparum infections [19][20][21][22][23].The overall efficacy of the vaccine administered at 0-, 1-, 2-month followed by a fourth dose administered at 18 months was 36.3% (95% CI 31.8-40.5)after an average of 48 months of follow-up [19][20][21][22][23].
The R21/Matrix-M vaccine is the next-generation preerythrocytic vaccine based on CSP, similar to the RTS,S/ AS01 vaccine [24].The R21 vaccine contains only the CSP-HBsAg fusion protein, resulting in a higher density of CSP on the surface antigen compared with the RTS,S/ AS01 vaccine (Fig. 1).The R21 vaccine was created by combining the C-terminal segment of CSP from the P. falciparum strain NF54 [24,25].The utilized CSP portion encompasses 19 NANP repeats sourced from the central repeat region, recognized as a protective B cell epitope, along with the C-terminal region housing essential T cell epitopes.It is the second malaria vaccine endorsed by the WHO, succeeding the RTS,S/AS01 vaccine, and is recommended for use in children residing in regions with low to moderate malaria prevalence [2].This vaccine has successfully completed phase III clinical trials, involving children aged 5-36 months, at five sites in four countries across West Africa (Burkina Faso and Mali) and East Africa (Kenya and Tanzania), each with varying malaria transmission patterns.Results from this study have demonstrated a high efficacy of more than 75% against clinical malaria in African children, addressing both seasonal and perennial transmission settings [26,27].However, a study from elsewhere has indicated that the CSP-based vaccine demonstrates improved efficacy against clinical malaria when infections match the 3D7/NF54 vaccine construct at epitope haplotypes and amino acid positions.This implies that the vaccine's efficacy may partially depend on the prevalence of the C-allele in a given geographical location [28,29].
PfCSP is predominantly a surface protein of sporozoite and plays a critical role in the invasion of hepatocytes (liver cells) [30,31].The gene encoding this protein is located on chromosome 3 of P. falciparum within regions that encode epitopes recognized by the human immune system.The C-terminal repeat region of the Pfcsp gene, which codes for epitopes recognized by anti-CSP antibodies, contains tetrameric repeats that vary in both sequence and number of tetramers [32][33][34].For cellmediated immunity, the RTS,S/AS01 vaccine includes a fragment of the central NANP-NVDP repeats polymorphic B-cell epitope region and a highly polymorphic C-terminal non-repeat epitope region of Pfcsp, which covers CD4+ and CD8+ T-cell epitopes denoted as TH2R and TH3R, respectively [9,35].Previous studies revealed high polymorphisms in these regions within the C-terminal of the Pfcsp gene in natural parasite populations, which might have resulted from natural selection by the host immune system [36,37].
Reports from previous studies of the population structure of P. falciparum genes coding for vaccine antigens, including the Pfcsp showed variable levels of diversity and geographic restriction of specific subgroups which may have an impact on the efficacy of malaria vaccines based on this gene in specific geographic regions [36].In particular, comparative diversity studies of the 3D7 Pfcsp gene showed that only 0.2% to 5.0% of the vaccine strains matched the global Pfcsp gene [38].A larger number of these studies described polymorphisms in the Pfcsp gene at the global level, with limited local-specific studies.
Therefore, there is an urgent need to explore the extent of genetic diversity and natural selection of the malaria vaccine antigen Pfcsp in natural parasite populations in Tanzania and other countries where the RTS,S/AS01 and R21/Matrix-M vaccines will be deployed in the near future.This study investigated the genetic diversity and population structure of the Pfcsp gene within five regions of Tanzania with varying malaria transmission intensities.In addition, the study aimed to determine whether genetic diversity and signatures of selection in this important gene are present.The findings present baseline data before the Ministry of Health rolls out any of the malaria vaccines in an effort to eliminate malaria in Tanzania.
Fig. 1 The diagram illustrates the structure of the circumsporozoite protein subunit vaccines RTS,S and R21.Unlike RTS,S, the R21 particle exclusively comprises the CSP-HBsAg fusion protein

Study area
Field studies that generated samples and data for this study were conducted between May 2013 and January 2015 as part of the Pathogen Diversity Network Africa (PDNA) baseline surveys in 15 countries from Sub-Saharan Africa (SSA) [39][40][41].In Tanzania, the studies covered five districts from five regions with varying malaria transmission that are located more than 800 km apart.The study districts included Muheza in the Tanga region and Morogoro (moderate to low transmission area), and Ujiji in Kigoma, Muleba in Kagera and Nachingwea in Lindi (all with high transmission), as described elsewhere [39] (Fig. 2).

Study design and sampling
The samples were collected through a cross-sectional study (CSS), which was conducted in five districts of Morogoro, Muheza, Muleba, Nachingwea and Ujiji in 2013 while other samples were collected from two therapeutic efficacy studies (TES) which were conducted at Mkuzi Health Centre in 2013, and in Muheza designated district hospital and Ujiji health centre from May 2014 to January 2015 as described elsewhere [39].The TES were conducted according to the WHO protocol as previously described [42,43].

Sample collection, processing, DNA extraction and Sequencing
Whole blood samples were collected from patients with malaria infections who were initially screened with rapid diagnostic tests for malaria (RDTs) and the infections were confirmed using microscopy as described earlier [39].Human white blood cells (WBCs) were depleted from whole blood samples using CF11 cellulose columns [44], and parasite genomic DNA was extracted using QIAamp DNA blood midi kits (Qiagen GmbH, Hilden, Germany) following the manufacturer's instructions.DNA samples were shipped to the Wellcome Trust Sanger Institute in Hinxton, UK, for whole genome sequencing (WGS) using the Illumina HiSeq platform as part of the MalariaGEN Plasmodium falciparum Community Project [45].Illumina sequencing libraries (200 bp insert) were aligned to the reference P. falciparum 3D7 genome, after which variant calling was conducted via the customized GATK pipeline [45].Each Fig. 2 Map of Tanzania showing P. falciparum prevalence as a colour gradient, and the location of the study regions (74) sample was genotyped for 797,000 polymorphic biallelic coding single nucleotide polymorphisms (SNPs) across the genome, ensuring a minimum of 5× paired-end reads coverage across each variant per sample.The dominant allele was retained in the genotype file at loci with mixed reads (reference/non-reference).

Sequence acquisition and pre-processing
WGS data of 589 P. falciparum samples collected from the five districts of Morogoro (n = 32), Muheza (n = 297), Muleba (n = 52), Nachingwea (n = 65) and Ujiji (n = 143) were downloaded from the MalariaGEN Plasmodium falciparum Community (Pf7k) project in variant call format (VCF) at https:// www.malar iagen.net/ resou rce/ 34.The VCF file was generated following standardized protocols [46].In brief, reads mapping to the human reference genome were discarded before conducting any analyses.The remaining reads were then mapped to the P. falciparum 3D7 reference genome using Burrows-Wheeler aligner (bwa) mem version 0.7.15 [47].Binary Alignment Map (BAM) files were created using the Picard tools [48] CleanSam, FixMateInformation and Mark-Duplicates version 2.6.0 and GATK v3 [49] base quality score recalibration.Potential SNPs and indels were discovered by running GATK's HaplotypeCaller independently across each sample BAM file and genotyping these for each of the 16 reference sequences (14 chromosomes, 1 apicoplast and 1 mitochondria) using GATK's Com-bineGVCFs and GenotypeGCVFs.SNPs and indels were filtered using GATK's Variant Quality Score Recalibration (VQSR).All variants with a variant quality score log-odds (VQSLOD) score ≤ 0 were filtered out and functional annotations were applied using snpEff version 4.1.Genome annotation was performed using vcftools version 0.1.10and masked if they were outside the core genome.Only biallelic SNPs from chromosome 3 that passed all VCF filters were retained.Then SNP variants for the entire Pfcsp were extracted from chromosome 3 (position: 221,323-222,516) for further analysis.

Population genetics analysis
The analysis included determining the minor allele frequency (MAF) distribution for all putative SNPs within Pfcsp using Plink1.9, and rare alleles (MAF ≤ 0.01) were removed from the analysis.Wright's inbreeding coefficient (F ws ) [50] was determined using R package moimix [51].F ws refers to the number of different parasite strains contained within an infection, and it ranges from 0 to 1.A sample is classified as having multiple infections (polyclonal) when F ws < 0.95 and a single infection (monoclonal) when F ws ≥ 0.95.A Pearson chi-square test was used to determine the statistical significance of any differences observed in F ws estimates between the populations, and a standard threshold of p < 0.05 was considered statistically significant.
To assess gene flow between parasite populations, genetic differentiation was first estimated using the Wright Fixation index (F ST ) [52] using Vcftools v0.1.5and population structure was determined using principal component analysis (PCA) as implemented in PLINK1.9.F ST < 0.05 indicates minimal population differentiation or gene flow between population pairs.Haplotype diversity (the number of two randomly selected strains within the population having different haplotypes) in Pfcsp was determined by studying the variants in the C-terminal region of the gene (221,422-221,583 on chromosome 3).A total of 229 FASTA DNA sequences were reconstructed from the retained monoclonal VCF samples and the Pfcsp sequence of the 3D7 strain was included as a reference.The monoclonal VCF samples were obtained based on the F ws results, where samples with F ws ≥ 0.95 were considered a single infection (monoclonal).The average number of pairwise nucleotide diversity (K), haplotype diversity (Hd) and nucleotide diversity were calculated using DnaSP version 6.12.03 [53].To assess the genetic relationships between Pfcsp C-terminal haplotypes in the five districts, the haplotype networking for 229 sequences of Pfcsp was analysed using NETWORK version 10.2 with the Median-joining algorithm [54].
Nucleotide diversity (π) was used to measure the degree of polymorphism within the parasite population, and it was analysed by calculating the pairwise difference between all possible pairs of individuals in the samples using Vcftools v0.1.5[55] in the sliding windows, with a size of 1000 bp.The Tajima's D statistical test [56] was used to detect evidence of balancing selection or purifying selection in the Pfcsp.Tajima's D was calculated in Pfcsp monoclonal samples using Vcftools v0.1.5 in 1000 bp sliding windows.Tajima's D test compares the average pairwise differences (π) and the total number of segregating sites (S) [57] and it was assessed using MEGA software version 11.0.13[58] and DnaSP version 6.12.03 [53].

Within-host parasite diversity estimation and statistical analysis
Within-host diversity in the Pfcsp gene was assessed using the inbreeding coefficient (F ws ) and the results indicated high within-host diversity (polyclonality), varying from 69.23% in Nachingwea to 56.9% in Muheza.However, the observed differences were not statistically significant (p = 0.3596) (Fig. 3).

Genetic differentiation and structure analysis
Genetic differentiation in the Pfcsp population was analysed by calculating Wright's fixation index (F ST ).The mean F ST across five districts-Morogoro, Muheza, Muleba, Nachingwea and Ujiji was very low at 0.0068 (which is < 0.05) (Fig. 4a).This indicates low genetic differentiation among the parasite populations or a high exchange of genetic materials despite the districts being geographically separated with a distance of over 800 km.Principal component analysis (Fig. 4b) showed no population structure between the five districts which suggest gene flow between the populations.

Genetic diversity of Pfcsp C-terminal within parasite populations
To assess the extent of genetic diversity and similarity within and between the five populations, an investigation into the diversity in the C-terminal region of Pfcsp (221,422-221,583) was done and summarized in a median-join haplotype diversity networks.In total, 50 haplotypes were observed among the 230 Pfcsp sequences, including the 3D7 and NF54 sequences.Of these, more haplotypes were identified in Ujiji [24] and Muheza [27] while the other districts had fewer haplotypes between 8 and 14 (Morogoro = 8, Muleba, 14, Nachingwea = 14).The RTS,S vaccine haplotype (Pf3D7type) was found in 2% (H_50) of all samples.Four haplotypes (8%) were shared by Pfcsp sequences from all five regions, indicating genetic closeness between these populations.Haplotype 5 emerged as the most prevalent Pfcsp C-terminal haplotype, representing 16.5% (38/230) of the isolates.Among the haplotypes, 50% appeared only once (singleton haplotypes) (Fig. 5a).While with NF54 sequence, 49 haplotypes were observed among the 230 Pfcsp sequences and of these, 10, 19, 20, 52 and 128 were found in the Morogoro, Muleba, Nachingwea, Ujiji and Muheza, respectively.The R21 vaccine haplotype (NF54type) was found in 0.4% (Hap_49) of all samples.Four haplotypes (41.3% of the isolates) were shared by Pfcsp sequences from all five regions, suggesting genetic closeness between these populations.Haplotype 5 was the most prevalent Pfcsp C-terminal haplotype representing 16.5% (38/230) of the isolates.Of the haplotypes, 73.5% (36/49) appeared once (singleton haplotypes) (Fig. 5b).

Nucleotide diversity and signatures of selection in the C-terminal non-repeat region of the Pfcsp gene
The average number of pairwise nucleotide differences (K) was higher in Muleba (4.1) and Ujiji (4.0) compared to other districts with 3.3-3.4(in Morogoro = 3.4,Muheza = 3.4, and Nachingwea = 3.3).The highest haplotype diversity (Hd) was observed in Muleba (0.976 ± 0.008) and Ujijji (0.973 ± 0.005) and the lowest was observed in Morogoro (0.917 ± 0.035).The highest nucleotide diversity (π) was observed in Muleba (0.0038) and Muheza (0.0037) and the lowest was observed in Morogoro (0.0029) but the differences were not significant (p > 0.05).Evidence of natural selection in the C-terminal non-repeat region of the Pfcsp gene was tested using Tajima's D and there were slightly positive values of 1.3, 1.3, 0.76 and 0.16 in Muheza, Muleba, Nachingwea and Ujiji, respectively; while Morogoro showed negative value of-0.21.However, these values suggest evidence of weak balancing selection in the population (Table 1).
The nucleotide diversity peaked at the Th2R and Th3R T-cell epitopes, while the connecting region between Th2R and Th3R was conserved.Nucleotide diversity showed slightly different values between populations according to their geographic origin (Fig. 6), with Muleba showing a higher peak > 0.07 in Th2R epitopes than the other four districts.The extended haplotype homozygosity revealed some extended haplotypes from the focal SNP locus 221,554 in all populations, but no long range haplotypes extended beyond 221,554 (Fig. 7).

Discussion
Tanzania is one of the malaria-endemic countries with a high number of cases and deaths in SSA and malaria is still a major health problem despite intensified interventions in the past two decades [2].Recent studies show that malaria transmission in Tanzania has become heterogeneous with persistently high burden in the western, north-west and southern regions for over 20 years [59], but lower transmission intensities in other regions attributed to the ongoing control efforts.The Tanzanian NMCP is implementing targeted interventions to reduce the burden in areas with high transmission and eliminate malaria in regions that have consistently reported low transmission over the past 20 years [60].Among the interventions, NMCP is considering the deployment and use of malaria vaccines following their approval by WHO in 2021 and 2023 [2,61].This study was therefore undertaken to generate baseline data on the genetic diversity, population structure and signature of selection in the Pfcsp gene before the roll-out of malaria vaccines in Tanzania.The data will potentially be useful in monitoring the performance of the vaccines after their introduction and use in Tanzania to look for shifts in variations-or new variations-that may arise due to vaccine pressure.
In general, the study showed no clear differentiation in the patterns of the Pfcsp gene polymorphism despite the study sites being widely separated geographically (with over 800 km apart), which is consistent with global patterns of Pfcsp variations which have been reported elsewhere [62,63].The lack of clear differentiation was according to the pairwise index of differentiation analysis (F ST below 0.05) and confirmed by PCA.Similar findings were reported in Ghana where pfcsp sequences collected from two sites located over 784 km apart showed a largely homogeneous population with F ST below 0.05, and without any population structure as detected by PCA   [62].This and other studies of the Pfcsp gene suggested a continued gene flow among parasites or human population mixing between the areas which might have contributed to the observed findings [62,64,65].High genetic diversity and a lack of population differentiation may have important implications related to the efficacy and emergence as well spread of vaccine-resistant parasites [66,67].
The C-terminal of the Pfcsp gene displayed high genetic diversity in the study areas.A comparative sliding window plot analysis of nucleotide diversity in the C-terminal region displayed predictable results of two peaks at the Th2R and Th3R regions, suggesting that genetic variations were mainly concentrated in these regions.Differences were also observed among parasites from different geographical locations.The overall nucleotide diversity was 0.0035 with Muleba showing a high peak in Th2R epitopes.A similar pattern of nucleotide polymorphism was found in other African countries including Malawi, The Gambia, Nigeria, Ghana, Mali and Guinea [8,38,62].
The overall haplotype diversity in the C-terminal of the Pfcsp gene in Tanzanian isolates was 0.937 which is similar to those previously observed in Cameroon, Gambia, Ghana, Senegal, Congo, Malawi and Guinea but higher than those observed in non-African countries [8,68].This suggests that the C-terminal of the Pfcsp gene in Africa has a higher level of diversity compared to other parts of the world [37].The values of Tajima's D in the C-terminal region of the Pfcsp exhibited slightly positive values which suggest evidence of weak balancing selection in response to host immune pressure on this region [8].Similar results have been reported showing weak variant-specific selection within this region, as shown by a study in Malawi [69].Evidence of balancing selection has been reported in other malaria vaccine candidates such as the extracellular domains of P. falciparum apical membrane antigen (PfAMA-1), Cell-Traversal protein for Ookinetes and Sporozoites (CelTOS), Thrombospondin Related Adhesion Protein (TRAP), Liver Stage Antigen 1 (LSA1) and merozoite surface proteins (MSP1) [70][71][72][73].Evidence of recent directional selection (iHS) was not observed in all study areas and extended haplotype homozygosity showed no extended haplotypes from the focal SNP locus (221,554) [69,74].
The haplotype network analysis in the C-terminal showed a sequence connection between the five areas, which suggests they were closely related.This relatedness is in agreement with the F ST and PCA results which showed limited separation in isolates from the five districts.Both the RTS,S and R21 vaccines were constructed using the Pfcsp gene from the P. falciparum 3D7 and NF54 strains, respectively.However, in Tanzanian isolates, only 2% of the Pfcsp haplotypes matched those of the 3D7 strain, and 0.4% matched the NF54 strain.Low matching of the 3D7/NF54 to the natural Pfcsp is consistent with other studies conducted across Africa and worldwide, which demonstrate low 3D7 haplotypes and little population structure [8,38,64,68].However, other studies of the genetic diversity and protective efficacy of the RTS,S/AS01 vaccine suggested that high levels of mismatch between natural isolates and 3D7 may weaken vaccine efficacy [28,68].Moreover, the high degree of location-specific Pfcsp diversity observed in Tanzania might result in differences in vaccine efficacy, potentially reducing the effectiveness of the RTS,S/AS01 and R21 vaccines.Monitoring of differential vaccine efficacy according to Pfcsp haplotypes during RTS,S/AS01 and R21 implementation programmes will be valuable, particularly in high transmission areas, where a post-vaccination expansion of non-vaccine haplotypes is likely to be observed which could lead to reduced vaccine efficacy and vaccine breakthrough infections [64].Going forward and ensuring the vaccine is efficacious in all endemic areas, the design of malaria vaccines may consider developing a vaccine that works for specific regions based on the genetic architecture of the parasites found in those areas.

Conclusion
These findings demonstrate a significant level of genetic diversity in the Pfcsp gene within the study areas.Although there is limited population differentiation and structure, there is a high degree of polyclonality, particularly in Morogoro, Muleba and Nachingwea.Furthermore, there is no genetic structure across all the study districts, indicating a high level of gene flow and genetic exchange within the Pfcsp gene.In addition, the C-terminal region of the Pfcsp gene from Muleba and Muheza exhibited higher nucleotide diversity compared to Morogoro, Nachingwea and Ujiji.Additionally, the Pfcsp gene displayed slight positive Tajima's D values in all five studied P. falciparum populations.These findings are consistent with the hypothesis of balancing selection acting on the Th2R and Th3R regions of the gene.However, it is important to note that further studies and continuous monitoring of the Pfcsp gene are necessary.These future investigations should encompass other regions and incorporate more recent data to comprehensively assess the current diversity and dynamics of the Pfcsp gene.By expanding the understanding of the genetic variations within the Pfcsp gene, it creates an opportunity to improve the knowledge of the parasite's adaptability and contribute to the development of effective malaria control strategies.

Fig. 5 aFig. 5 (
Fig. 5 a Haplotype network analysis of the Pfcsp gene among parasites from Morogoro, Muheza, Muleba, Nachingwea, and Ujiji was constructed using the NETWORK program with the Median Joining algorithm.The network revealed 50 haplotypes among 229 Pfcsp sequences.The size of each node indicates the proportion of the total haplotype frequencies.Each haplotype is denoted as "H, " with the vaccine strain 3D7 represented as "3D7" in Hap_50.The color of each node corresponds to the site of sample origin.b Haplotype network analysis of the Pfcsp gene among parasites from Morogoro, Muheza, Muleba, Nachingwea, and Ujiji was constructed using the NETWORK program with the Median Joining algorithm.The network revealed 49 haplotypes among 229 Pfcsp sequences.The size of each node indicates the proportion of the total haplotype frequencies.Each haplotype is donated as "H" with the vaccine strain NF54 in Hap_49.The color of each node corresponds to the site of sample origin (See figure on next page.)

Fig. 6 Fig. 7
Fig. 6 Nucleotide diversity of the C-terminal region of the Pfcsp gene was calculated among 229 parasite sequences from Morogoro, Muheza, Muleba, Nachingwea, and Ujiji.The results indicated high diversity in both Th2R and Th3R, while the connecting region between Th2R and Th3R remained conserved

Table 1
Diversity of the Pfcsp C-terminal region of samples included in the network analysis n number of sequences, h number of unique haplotypes, s number of segregating sites, k average of pairwise nucleotides differences, π nucleotide diversity, Hd haplotype diversity