Characterizing the impact of sustained sulfadoxine/pyrimethamine use upon the Plasmodium falciparum population in Malawi

Malawi experienced prolonged use of sulfadoxine/pyrimethamine (SP) as the front-line anti-malarial drug, with early replacement of chloroquine and delayed introduction of artemisinin-based combination therapy. Extended use of SP, and its continued application in pregnancy is impacting the genomic variation of the Plasmodium falciparum population. Whole genome sequence data of P. falciparum isolates covering 2 years of transmission within Malawi, alongside global datasets, were used. More than 745,000 SNPs were identified, and differences in allele frequencies between countries assessed, as well as genetic regions under positive selection determined. Positive selection signals were identified within dhps, dhfr and gch1, all components of the parasite folate pathway associated with SP resistance. Sitting predominantly on a dhfr triple mutation background, a novel copy number increase of ~twofold was identified in the gch1 promoter. This copy number was almost fixed (96.8% frequency) in Malawi samples, but found at less than 45% frequency in other African populations, and distinct from a whole gene duplication previously reported in Southeast Asian parasites. SP resistance selection pressures have been retained in the Malawian population, with known resistance dhfr mutations at fixation, complemented by a novel gch1 promoter duplication. The effects of the duplication on the fitness costs of SP variants and resistance need to be elucidated.


Background
Malawi suffers a heavy burden of endemic falciparum malaria with year-round transmission that peaks during the long rainy season from early December to May [1]. Malaria still accounts for 40% of hospitalizations in children under 5 years of age and 30% of all outpatient visits [2]. The malaria mortality rate is 63 per 100,000 population, and amongst the highest in East Africa [3] despite the roll out of control measures, such as insecticide-treated bed nets (ITNs), intensive indoor residual spraying (IRS), and artemisinin-based combination therapy (ACT) [2]. As one of the first African countries to switch from chloroquine to sulfadoxine/pyrimethamine (SP) in 1993, and the last to switch from SP to ACT in 2007, Malawi stands out from the rest of Africa in having a significantly prolonged exposure to SP [4]. Whilst this meant the reduced frequency of chloroquine resistance alleles in the Plasmodium population [5], the same cannot currently be said for SP resistance [6]. Resistance to SP is thought to be a cumulative process whereby mutations are successively acquired in both the dhfr (S108N, N51I, C59R, then I164L) and dhps (A437G then K540E) genes. These dhps and dhfr polymorphisms persist at high frequency in the Malawian Plasmodium population despite exposure to SP being reduced. Retention of these variants may be due in part to the use of SP in intermittent preventive treatment for pregnant women (IPTp) or a lower than expected fitness cost associated with these variants [6]. The scaling up of the distribution of ACT and IPTp contributed to a 36% drop in the mortality rate between 2004 and 2014 for children under 5 years of age, to an estimated 85 deaths per 1000 live births [2]. However, the control efforts could be derailed by the use of SP in IPTp strategies in sites where parasites resistant to SP persist, as well as any future emergence and spread of ACT-resistant parasites (as seen in Southeast Asia).
Genetic variation in Plasmodium falciparum is central to the parasite's survival and can potentially undermine malaria control interventions. The evolutionary process enables parasite populations to select for variants that rapidly overcome host immune responses and anti-malarial drugs to establish persistent infections and increase transmission. Therefore, surveying evolutionarily driven genetic changes in P. falciparum and investigating parasite responses to anti-malarial interventions are crucial to efforts to reduce the malaria burden. Where regions are working towards pre-elimination, longitudinal samples are required to monitor parasite transmission dynamics and the efficacy of interventions to control malaria.
Previous work in a rural Malawian P. falciparum population in the Chikwawa district in a single 'baseline' season (n = 69, December 2010 to July 2011) identified several genes encoding merozoite invasion ligands as being retained in the population due to balancing selection [7]. This type of selection actively maintains multiple alleles in the gene pool of a population. Further, by comparing the Malawian P. falciparum population to others in Africa and Southeast Asia, signals of recent positive selection were identified at known drug targets (e.g., dhps, crt and mdr1), metabolic enzymes (e.g., gch1), and in several invasion ligands (e.g., msp3.8, trap and ama1) [7]. Initial analysis provided evidence of population divergence presumably driven by drug selection on crt, dhps and mdr1 genes, and reflects the adaptation of parasite populations to local drug pressure, especially SP. The dhps (sulfadoxine target) and dhfr (pyrimethamine) genes are on the folate biosynthesis pathway of P. falciparum, and in Southeast Asian populations a copy number variant in gch1 (first gene in pathway) is thought to be associated with SP resistance and its persistence [8].
The initial work in Malawi [7] was followed up by including additional 'baseline' season samples (n = 29, total n = 98) and comparing the genetic diversity in 122 isolates collected in the subsequent 2012 dry and wet seasons in the Chikwawa and Zomba districts, located 100 km apart. These regions are sentinel sites in Malawi, chosen for intensive anti-malarial intervention involving ACT, ITNs and IRS. The aim was to identify changes in allele frequency within individual, intra-and inter-season (wet and dry seasons) and identify regions under selection pressure. The findings demonstrate limited variation between the Malawian sub-populations over time and the impact of prolonged exposure of parasites to SP. Particularly, fixation of several known SP resistance SNPs and a novel copy number increase of the gch1 promoter region were identified. Cross-population analysis revealed selective pressure for chloroquine resistance in non-Malawian populations. These populations have experienced prolonged use of chloroquine. Overall, the findings support the use of parasite and population genetic approaches to monitor transmission and the adaptation to drug pressure, and thereby inform the timing and type of interventions to be applied. Existing surveillance could be enhanced with rapid, field-based, genomic tests which genotype the gch1 promoter region as a proxy for SP resistance in an African setting.

Study sites and sample collection
Whole blood samples were collected from October 2010 to November 2012 from children aged 5-28 months recruited in an ongoing ACTia[abbrev?] study within the high-transmission Chikwawa and Zomba regions in Malawi [7]. All individuals recruited had clinical falciparum malaria and received artemether/lumefantrine (AL) or dihydroartemisinin/piperaquine (DHA) treatment post-collection. Written informed consent was obtained from a parent or guardian of each child with the ethics committees of the University of Malawi's College of Medicine and the Liverpool School of Tropical Medicine both approving the study.

Whole-genome sequencing and quality control
Human DNA contamination was reduced through leukocyte-depletion of the blood samples using CF11 column filtration [9]. Purified DNA samples (n = 220) containing less than 30% human DNA were sequenced at the Sanger Institute using Illumina HiSeq2500 technology, with a minimum of 76-base, paired-end, fragment sizes. All short reads were mapped to the 3D7 reference genome (version 3.0) using bwa-mem [10]. SNPs and small indels were called using samtools and bcf/vcftools with default settings [10]. Only those variants with quality scores in excess of 30 (indicating an error rate less than one per 1000 bp) and with minimum coverage of ten were retained [10]. Genotypes at SNP positions were called using ratios of coverage and heterozygous calls were converted to the majority genotype on a 70:30 coverage ratio or greater [7,11,12]. SNPs were excluded from analysis if they had more than 5% mixed or missing genotype calls, or they were positioned within non-unique regions, sub-telomeric regions or within the hypervariable var, rifin and stevor gene families.

Statistical analysis
Population stratification in the isolates was investigated using a principal component analysis (PCA) of the pairwise SNP distances between samples. This approach identified distinct African, Asian and South American clusters, with a further African-only investigation identifying West, Central and East African clusters. Differences in allele frequencies at each SNP were estimated using fixation indexes (F ST ), with genes ranked by their maximum scores [15]. Tajima's D [16] was implemented to identify genomic regions under balancing selection, with a score greater than two suggesting strong balancing selection. Scores were calculated for each gene containing at least four SNPs. All SNP-based analyses were performed using base R functions. Copy number variation in isolates and strains was analysed with DELLY using default settings [17].
Extended haplotype homozogysity (EHH)-based selection analyses, intra-population iHS and inter-population XP-EHH, were performed using selscan [18], using the default minor allele frequency and EHH truncation values of 0.05. Pair-wise country XP-EHH analyses used Malawi as the reference group, and both iHS and XP-EHH values were normalized genome-wide. P values for iHS and XP-EHH estimates were calculated using a Gaussian approximation. A significance threshold of P < 0.00006 was established for both iHS (>4) and XP-EHH (>6), using a simulation approach.

Malawi sub-population analysis over time and location
Potential stratification within the Malawi dataset, across season, year and location of collection, was explored before consideration of shared signals. Of the 220 parasite isolates collected in this study, 85.9% were from the Chikwawa region whilst only 14.1% were from the Zomba region. Season-wise 43.2% were collected in a wet season, 56.8% in a dry season, across three years (2010 5.0%, 2011 39.5%, 2012 55.5%). Allele frequency differences between the location (median F ST 0.003, max. 0.11) and year (median F ST 0.004, max. 0.07) sub-populations were small. Within and between the seasons and location, the number of SNP differences was similar (~9400 SNPs) (see Additional file 1: Table S1). Twenty-six SNPs have an F ST value greater than 0.15 for within-Malawi year, location and season-based sub-populations. Seven of these are intergenic and four within 'conserved unknown' genes (see Additional file 1: Table S2). The remaining 15 SNPs are within putative or known genes, including surfin14.1 (max. F ST 0.172), heat shock protein 90 (max. F ST 0.166) and the immune evasion antigen PfEMP1 (max. F ST 0.157) [19]. Notably, all top pair-wise F ST values above 0.15 were from season-based comparisons. No major differences were detected in allele frequency for known drug resistance mutations (see Additional file 1: Table S3).

Selection within the combined Malawian parasite sub-populations
Given the absence of strong stratification between the sub-populations, the Malawian datasets were combined to identify selection signals within population. Signatures of potential recent positive selection were identified within 13 genes (iHS > 4). Five of these genes are currently uncharacterized, and three within close proximity to genes of known function and established selection sites (see Fig. 1; Additional file 1: Table S4). For example, the PF3D7_1223400 signal (iHS: −4.948) is within 2 kbp of gch1, and loci were identified within 60 kbp of dhps and 8.5 kbp of dhfr, both members of the P. falciparum folate pathway [20]. Selection in these genomic regions within Malawi has been suggested previously [11] and relates to selective pressure due to use of SP.
Additional allele selection signals are present within the erythrocyte invasion critical protein ama1 [21] (iHS: 5.129, 5.017) and the sporozoite surface protein trap (iHS: 5.366, 5.209), both representing hits that have been suggested previously [22,23]. Non-reference allele selection loci were also found within Plasmepsin X (iHS: 4.686), which has a role in ookinete invasion [24]. In contrast, reference allele selection was identified centred on four loci within TRS85 (iHS:  Table S5). These are ama1, required for erythrocyte invasion [25], surf8.2, a group A SURFIN protein not found to be expressed post-invasion [26] but associated with susceptibility to pyrimethamine (inhibits folic acid metabolism via dhfr) [27], and a conserved unknown protein (PF3D7_0710200) within chromosome 7. The ama1 locus has both balancing and positive selection, suggesting positive selection for one specific haplotype alongside the retention of multiple others. Such complex selection has previously been identified as differing between ama1 domains [27,28].

Malawi within a global context
Parasite genetic diversity within Malawi was further contextualized within a global setting. A PCA approach identified distinct African, Asian and South American clusters, as previously reported using large SNP datasets [7,11,12,14,29]. Within the African-only analysis, distinct West, Central and East African clusters were also distinguished, and Malawi was separated from Tanzania and Kenya. (see Additional file 2: Figure S1).
Pair-wise F ST values were calculated genome-wide for Malawi against each country with at least 14 samples. The median F ST values per region per SNP were then estimated to identify those that contained Malawi-unique variants (see Table 1). The top hits included the dhps K540E causal mutation, likely reflecting the prevalence of SP resistance in Malawi and other East African populations considered, compared to elsewhere ( Table 2; Additional file 1: Table  S6). Of the remaining top hits, the study identified the P47 and P230 genes that share roles in gametocyte fertility [30], P48/45 reflecting differences in mosquito vectors [30], and PF3D7_0913900 a putative arginine-tRNA ligase. When considering drug-resistant candidate polymorphism, crt mutations (K76T, Q271E, N326S, I356T) were absent in Malawi, reflecting early withdrawal of chloroquine compared to other African countries (see Table 2; Additional file 1: Table S6). K76T and Q271E mutations are near fixed in Asia, and known to have undergone 'hard' selective sweeps [31]. Compared to other African populations, Malawi has a higher frequency of dhfr triple mutant haplotypes (N51I/C59R/S108N) and dhfr⁄ dhps quintuple mutant genotypes (dhfr N51I/C59R/S108N haplotype and dhps A437G/K540E haplotype). The dhps S436A mutation was at high frequency in West Africa, and almost absent in Malawi. The contributing dhfr quadruple mutation (I164L) and dhfr⁄ dhps sextuple mutant genotypes (dhfr⁄ dhps quintuple mutant genotype and dhfr I164L) were only present in Asia. In Malawi, the dhps A581G mutation, which has been shown to reduce the effectiveness of SP preventive therapy [32] was present at low frequency, leading to the presence of an alternative dhfr⁄ dhps sextuple genotype (dhfr⁄ dhps quintuple genotype and dhfr I164L). In the Malawian population, no variants of the kelch13 gene, previously described in Southeast Asia to be associated with artemisinin resistance [33], were identified. All alternative alleles in kelch13 are present at low frequencies, the maximum at 0.091 for K189T, and reflect previous SNPs frequencies for the rest of Africa [34].
The XP-EHH method was used to identify regions under selection in the Malawi population compared to others. Positive values suggest relative selection in Malawi, whilst negative values suggest selection in non-Malawi (see Additional file 2: Figure S2; Additional file 1: Table S7). 31 genes (|XP-EHH| > 6) were identified, of which 18 are uncharacterized and 14 appear against only one other country. The most striking signals were within PF3D7_1223400 and PF3D7_1223500, both uncharacterized but within 2 kbp of gch1, and indicate relative selection within the Malawian population when compared to Burkina Faso, the DRC, The Gambia, Ghana, Guinea,  Table S7; Additional file 2: Figure S2). Strong shared negative XP-EHH scores were also present within acs8, the uncharacterized PF3D7_1421100 and the SP resistance gene dhps. Strong positive signals, suggestive of selection in the non-Malawian populations, were present within msp10, trap and three genes directly downstream of crt (cg1, glp3, cg2).

Table 1 Differences in allele frequencies between Malawi and other populations (based on pairwise F ST scores)
Values are the median F ST values for each regional population. In italics are regional medians above 0.5  Table 2; Additional file 1: Table S6). The gene duplication appears predominantly on the background of dhfr triple mutant haplotype in West Africa (79.2%) and a quadruple mutant haplotype in Southeast Asia (80.3%) and South Asia (50.0%) (see Table 3). The promoter duplication is seemingly not linked to the dhfr I164L quadruple mutation and sits predominantly on a triple mutant haplotype background in Malawi (99.1%), DRC (87.9%) and West Africa (85.8%) (see Fig. 3). In these populations, the haplotype background of dhps S436A, A437G and K540E appears less important with a number of mutation combinations present ( Table 3). The exception is Malawi where the promoter sits on a predominantly dhps A437G/K540E haplotype background, leading to the high frequency of quintuple mutant genotypes. Coverage analysis of the P. falciparum reference strains 3D7, HB3, DD2, 7G8, and GB4 identified no upstream promoter duplication. Previously identified whole gene duplications in 3D7, 7G8, DD2, and GB4, and the absence of duplication in HB3, were confirmed [36] (see Additional file 2: Figure S4). 38 SNPs (eight non-synonymous) were identified across the  The overall low levels of nucleotide variation and sequence homogeneity support the argument that gch1 copy number variants, rather than associated coding SNPs, are targeted by selection. Examination of the EHH revealed differences in linkage disequilibrium (LD) between continents and within Africa (see Fig. 4; Additional file 2: Figure S2). When considering African populations, there is evidence in non-Malawian populations of near symmetrical decays of EHH to a level close to zero within 25 kb. Malawi LD extends much wider and is consistent with a sweep around the promoter duplication (see Fig. 4).

Gene ID Position Gene Other East Africa DRC West Africa South Asia Southeast Asia South America
To determine whether the gch1 promoter duplication was under relative positive selection in the DRC, Ghana and Guinea, a sub-population was applied XP-EHH analysis, where those with the duplication were compared to those without. This approach identified no significant differential sub-population selection in either the DRC, Ghana or Guinea (|XP-EHH| < 4), although there are near-significant positive selection signals for the duplication-positive sub-populations in Ghana and Guinea (see Additional file 2: Figure S3). This contrasts with Malawi, where the duplication appears to be under active selection, potentially reflecting differences in SP usage (see Fig. 1; Additional file 2: Figure S2).

Discussion
Malaria surveillance is crucial to informing and supporting disease prevention, control and elimination strategies. Access to technological advances and their reduced costs mean monitoring approaches involving genomic data are being implemented within malaria programmes  [7,12,14]. To inform such programmes, genomic differences were investigated in Malawian P. falciparum parasites from two regions across multiple seasons. No major differences were observed between the two regional or temporal sub-populations, potentially due to small sample sizes, a narrow two-year study period, the removal of a minority of SNPs with a high frequency of missing or heterozygous genotypes, and study regions that are only 100 km apart with high human migration between them.
Because of the effects of control measures on Plasmodium genomes, signals of selection and differences in allele frequencies were also investigated across Malawi parasites and compared with those from global datasets. Signals in drug resistance-associated genes, and surfaceassociated proteins that are exposed to the host immune system were detected, including previously described selection in msp10, trap and ama1 [7,23]. Antibodies against the pre-erythrocyte TRAP and erythrocyte stages MSP10 and AMA1 proteins have been detected in anti-malarial-acquired immunity individuals living in malaria-endemic areas [37,38]. Vaccines using TRAP, AMA1 and MSP proteins (e.g., MSP1 and MSP3) are being tested with promising results [39]; as such the MSP10 protein may represent a viable vaccine candidate.
Consistent with previous reports, crt mutations (K76T, Q271E, N326S, I356T) were absent from Malawi [40,41]. Neighbouring countries (as well as others in Africa) that did not switch away from chloroquine as early as Malawi continue to report higher levels of crt mutations. This observation has led to a setting likened to a Malawian island of chloroquine drug sensitivity in a sea of resistance [41]. Three genes, cg1, glp3 and cg6, immediately downstream of the crt gene, displayed high relative positive selection in multiple non-Malawian populations, consistent with hard selective sweeps [31]. In contrast, SP resistance-associated dhfr and dhps alleles are retained at high frequencies in Malawi. Compared to other African populations there is a significantly higher level of  both the dhfr C59R mutation and the quintuple dhfr⁄ dhps mutant genotypes. These frequencies are consistent with previous analyses highlighting the pan-African distribution of the dhfr triple mutant haplotypes in contrast to the East-West African differences in distribution of the A437G and K540E dhps variants [42,43]. The SNP-based work is consistent with the results from microsatellitebased studies, considering the dynamics of strong selection for mutations conferring SP resistance, including support for the observation of the independent origin of sulfadoxine-resistant alleles across a number of regions [42,44,45]. Together, these observations reflect the early decline in chloroquine usage and early introduction and prolonged use of SP in Malawi when compared to other African populations.
In almost all Malawian samples (96.8%), a novel duplication was identified in the promoter of the gch1 gene (973,804-974,240 bp), a member of the folate pathway that includes targets for sulfadoxine and pyrimethamine. This duplication was also detected in other parasites, particularly from West Africa and the DRC, but is nearabsent in Asia. Whole gene duplications have previously been detected in Thailand and Cambodia [33] and were detected in this dataset. Positive selection signals are present across the gch1 region in Malawi, with relative selection in Malawi compared to other African populations where the promoter duplication is also present. Further, samples with the duplication were not found to be under relative selection in the DRC, Ghana or Guinea. Together, the strong evidence of selection for this promoter duplication in Malawi supports its role as being advantageous for P. falciparum parasites in regions with high SP use, particularly where there is a higher frequency of resistance-associated dhfr and possibly dhps variants.
The function of the promoter duplication remains to be established. It is possible that both promoter and whole gene duplications increase gch1 expression in vivo thereby reducing the fitness cost associated with dhfr and dhps variants, which convey resistance to SP. In silico, functional predictions for the promoter duplication  identified multiple TATA, TATAA and TGTAA PfTBP binding motifs [46]. If this duplication acts to reduce the fitness cost of other SP resistance-associated variants, its presence may suggest a more persistent form of resistance which further surveillance will need to confirm. Ongoing fieldwork in Malawi will allow to survey the parasite population during longer periods and to detect genomic changes following the introduction of ACT. Further work should also consider the genomic landscape within other African countries to determine the frequency of the gch1 duplication and other variants associated with SP resistance.

Conclusion
This study reports the persistence of genetic variants associated with SP and chloroquine resistance within the P. falciparum population in Malawi, despite withdrawal of these anti-malarials from front-line use. Signals of positive selection were also identified, which suggest retention of these resistance-associated variants, as well as various life stage-specific surface antigens. Investigation of gch1 copy number variation identified the near fixation of a specific 436 bp promoter duplication within Malawi, present in other African countries but absent from Asian populations. It is most likely that this promoter duplication acts in a similar fashion to the whole gene duplication present in Asian populations, although further experimental work is required to elucidate any functional impact.