Genomic analysis reveals independent evolution of Plasmodium falciparum population in Ethiopia

Deriba Abera Jomo Kenyatta University of Agriculture and Technology College of Pure and Applied Sciences Caleb Kipkurui Kibet Jomo Kenyatta University of Agriculture and Technology Teshome Degefa School of Medical Laboratory Science, Faculty of Health Sciences, Jimma University Lucas Amenga-Etego Western African Center for Cell Biology of Infectious Pathogens WACCBIP), University of Ghana, Accra, Ghana Joel L Bargul Department of Biochemistry, College Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya Lemu Golassa (  lgolassa@gmail.com ) Addis Ababa University, Aklilu Lemma Institute of Pathobiology https://orcid.org/0000-0002-1216-8711


Conclusion
Plasmodium falciparum population in central region of Ethiopia were structurally diverged from both southeast Asian and other East African populations. A low within host diversity is noted among the Ethiopian parasites. Indeed, the parasites carry xed chloroquine resistance markers despite the withdrawal of this drug for the treatment of P. falciparum.

Background
Plasmodium falciparum malaria remains one of the major public health problems worldwide accounting for 228 million cases in 2018 compared to 231 million in 2017, while the number of deaths decreased by just 2.5%, from 415, 000 to 405, 000 during the same period [1]. Sub-Saharan Africa (sSA) still accounts for 94% of the global death. In Ethiopia, more than 75% of the total area is malarious and P. falciparum, and P. vivax co-exist [2] making malaria control more complicated than in other African countries.
Across malaria endemic regions, large-scale deployment of antimalarial drugs has led to the emergence of drug resistance to chloroquine (CQ) and sulfadoxine/pyremethamine (SP) drugs [3][4][5]. Like many other countries, Ethiopia has switched from CQ to SP in 1998 and from SP to AL in 2004 [6] for the treatment of uncomplicated P. falciparum malaria in response to development of parasite resistance. However, CQ remained the rst-line drug for P. vivax treatment in the country [7] leading to continued selection of CQ-resistance markers in P. falciparum as the result of indirect pressure from CQ and the presence of mixed infections (P. falciparum and P. vivax). Similar to CQ and SP, P. falciparum developed resistance to aretemether-lumefantrine (AL) rst at Thai-Cambodian border [8] and recently in East Africa [9]. The continuous development of resistance in P. falciparum to series of rst-line antimalarial hinders malaria prevention, control and elimination efforts.
Of the many pressures known to select for parasite genomics, antimalarial drugs pose tremendous selective pressure on P. falciparum leading to worldwide spread of resistant parasites [10]. It was well noted that P. falciparum resistance has resulted in increased malaria morbidity and mortality across endemic settings. Apart from increased morbidity and mortality, selective sweeps of drug resistance mutations have reduced levels of polymorphism in P. falciparum as these resistant and sensitive strains continue to recombine in mosquitoes [11,12] with perhaps a reduced diversity around the selected loci. However, the greatly reduced level of diversity across the entire P. falciparum genome most likely resulted from a recent severe population bottleneck, which is most plausibly explained by the gorilla-to-human cross-species transmission event [13].
Based on analysis made on 12 strains collected from different countries in Africa and Asia, the average diversity of P. falciparum at four-fold degenerate sites was estimated to be 8 × 10 − 4 per site [13]. However, published mutation rates for P. falciparum were in the range 1-10 × 10 − 9 mutations per site per replication cycle [14,15]. Considering the varying lengths of time that the parasites spend either in the vector or in the mammalian host, P. falciparum parasites are likely to undergo at least 200 replication cycles per year suggesting that the observed level of genetic diversity in P. falciparum could have readily accumulated within the past 10,000 years [10].
In response to exposure to drugs, challenges from host immunity and therapeutic interventions, P. falciparum parasite change and select its new genetic variants [16]. Indeed, high pressure from immunity and drugs are known to select adaptive parasite strains that maintain transmission [17] and therefore many P. falciparum genes encoding immune and drug targets remain under natural selection and show signatures of balancing or directional selection [4,[17][18][19][20][21].

Methodology Study area and population
The study was conducted in West Arsi, Oromia (07′ 17′′ 34.2 S, 038′ 21″ 46.3 W) located about 251 km from Addis Ababa, Ethiopia. This region with distinct wet and dry seasons has an altitude of about 1500 -2300m above sea level with human population of 176,671. The inhabitants of this malarious region have high levels of poverty worsened by the malaria diseases caused predominantly by P. falciparum and P. vivax with a seasonal and unstable pattern of transmission [7].
Sample collection and processing About ~5mL venous blood was collected from July 2012 to December 2013 from consented P. falciparum malaria patients following standard procedures. Sequencing of P. falciparum from leukocyte-depleted infected whole blood was done as described in [27] at the Welcome Sanger Institute as part of the MalariaGEN P. falciparum Community Project (www.malariagen.net/projects). Open source sequence data from Cambodia, Thailand, Democratic Republic of Congo (DR Congo), and Malawi were accessed via the Pf3K project (https://www.malariagen.net/data/pf3k-5).
Short sequence reads were generated on the Illumina HiSeq platform and aligned to Pf3D7 reference (version 3) by burrows-wheeler-aligners (BWA). SNP calling was done following a customized genome analysis tool kit (GATK) pipeline. Each sample was genotyped for polymorphic coding SNPs across the genome, ensuring a minimum of 5x paired-end coverage across each variant per sample. Polymorphic sites within hyper-variable, telomeric, and repetitive sequence regions were excluded. Biallelic high-quality SNPs with mapping quality (MQ) > 20 and Variant Quality Score (VQSLOD) >= 3 in the core region loci with a minor allele frequency of at least 2% and individual sample and SNP-site missing less than 10% across the isolates was extracted and used for downstream analysis.

Analysis of population genetic diversity and within host infection diversity
Pairwise nucleotide diversity (π) [28] was used to determine average pairwise nucleotide genetic diversity using a 5-kb sliding window as described [29]. The genome-wide F WS (inbreeding coe cient within a population) metric was used to calculate within-host diversity as described in [21]. To derive F WS (=1-H W /H S ), within isolate expected heterozygosity (H W ) was calculated from the relative allele frequencies for all genic SNPs, averaged across the genome and compared with the heterozygosity of local population (H S ). F WS value ranged from zero to one, where zero indicates high diversity of infection, and one represents a single infection within the sample as compared to local population diversity. For this analysis, individual alleles with coverage of less than < 5 reads and positions with total coverage of < 20 reads were classi ed as undermined (missing). Isolates with greater than > 5% missing SNP data and SNPs with > 10% missing isolate data were discarded. Isolates with F WS scores of > 0.95 were classi ed as a single predominant genotype infection.
Population structure and admixture analysis Principal component analysis (PCA) was used to estimate population structure using glPCA function in the open source R statistical software version 3.6.2. The rst 10 principal components (PCs) axis were calculated, and the rst three PCs which explained majority of the variation in the data were retained. The data was thinned downed by pruning SNPs with pairwise linkage disequilibrium (LD) by r 2 greater than 0.05 for determining the PCs. The pruned SNP loci employed in glPCA function was used to calculate an allele sharing matrix in custom R scripts. This function use variance between and within groups to determine population genetic structure. A discriminant analysis of principal components (DAPC) [30] was used to transform the PCA data, and perform discriminant analysis on the retained principal components using the adegenet package in the R software version 3.6.2. Population admixture was determined based on spatial modeling of allele sharing among geographical coordinates of sampling sites. DAPC determines ancestry proportions and membership probability modeled on genetic variation across space to determine admixtures as described in [30].
Allele frequency and differentiation analysis Analyses of allele frequency distributions between-population F ST values [31] were calculated using Vcftools or hierfstat package from adegenet in R, after excluding SNPs with greater than 10% missing data. For F ST analysis, missing data were excluded on the SNP basis with the size of each population corrected to account for F ST value difference due to population size variation.

Detection of signatures of natural selection
Within-population Tajima's D index [32] was calculated using Vcftools. Tajima D values were determined for each SNP and the average value for each gene was calculated. Genes with at least ve SNP and positive TajimaD values > 1 were considered as genes under balancing selection.
The standardized integrated haplotype score (IHS) analysis was used to identify positive directional selection signatures by using phased SNP data with allele frequency > 5%. IHS was determined using the rehh package in R software with default parameters [33] after imputing missing SNP data using Beagle version 5.2. The |IHS| > 2.5 (top 1% of the expected distribution ) was used as cut off value [34] to report genes under recent directional selection as reported for genome analysis of West African P. falciparum [17].

Sequencing of Plasmodium falciparum and analysis of allele frequency
High-quality sequence data obtained from 25 P. falciparum clinical isolates collected from the West Arsi of Ethiopia enabled the identi cation of 672,956 biallelic SNPs with less than 10% missing SNPs data and < 5% sample missing data in individual isolate. All isolates had 95.95% (645715/672,956) SNPs call. Sequences from the intergenic regions had lower read coverage compared to those sequences in the coding regions, and as a result, 78.92% In general, all populations had a high percentage of non-synonymous coding SNPs at polymorphic marker consistent with previous ndings [17]. SNPs with minor allele frequency (MAF) <5% were common in all analyzed P. falciparum populations following exclusion of monomorphic SNPs in each population. Further, SNPs with minor allele's frequency of < 5% occurred more frequently in samples from Malawi than in Ethiopian isolates ( Figure 2).

Genetic diversity of Ethiopian Plasmodium falciparum population
The overall π value in Ethiopian P. falciparum population was 0.00022 which is relatively lower than the genetic diversity in Malawian P. falciparum [20]. High variability of genetic diversity π values across the chromosomes was observed in Ethiopia with minimum value of 0.00015 in chromosome 12 and maximum value of 0.00045 in chromosome 4 ( Figure 3). This was consistent with variable genetic diversity π values observed in different chromosomes of other P. falciparum population [35]. The mean F WS scores of Ethiopian P. falciparum population were not signi cantly different from Cambodia's (Welch two Sample t-test, P = 0.42) and Thailand's (Welch two-sample t-test, p = 0.083) at 95% con dence intervals. However, mean F WS was signi cantly higher in Ethiopia compared to DR Congo (Welch two-sample t-test, p = 5.603e -06 ) and Malawi (Welch two-sample t-test, p = 3.242e -08 ) at 95% con dence intervals.
Population structure and admixtures Analysis using PCA revealed the presence of four clear major population groups of isolates, which were coincident with their geographical origins ( Figure 5A- Similarly, the ndings from admixture analysis were consistent with the PCA clustering. The isolates from the three regions were distinguished. This admixture analysis showed that four major components could be differentiated with a cluster value of K = 5. Multiple parasite subpopulations were observed in Malawi and DR Congo parasite populations suggestive of gene ow between these two populations ( Figure 6). There was no detectable gene ow between the isolates from Ethiopia and East African or Southeast Asia.
The clustering of Ethiopian P. falciparum isolates was consistent with the xation index (F ST ) values with or without correcting for sample size. The F ST values of Ethiopian isolates versus those from the two other East African regions (DR Congo and Malawi) ranged from 0.08 to 0.09, while F ST value of Ethiopian P. falciparum versus the two southeast Asian regions (Thailand and Cambodia) was 0.18 (Table 1).

Signatures of selection in the Plasmodium falciparum isolates
The Ethiopian isolates had the average Tajima's D value of 0.18 across the entire genome (One sample t-test, p < 2x10 -16 ). There were 1,450 genes that had at least one SNP with TajimaD value > 1 of which 125 genes had at least ve SNPs with Tajima D values > 1 of which 125 genes had at least ve SNPs with Tajima D values >1 (Additional le 3). These genes include apical membrane antigen-1 (ama1), erythrocyte binding antigen-175 (eba175), merozoites surface protein-1 (msp1),thrombospondin related anonymous protein ( trap), duffy binding like merozoites surface protein (dblmsp), and cytoadherence linked asexual protein 2 (clag2), that were previously reported for the balancing selection [24,29].
The standardized integrated haplotype homozygosity score (IHS) was applied to investigate genome-wide evidence for recent positive directional selection due to drug pressure, immune impact, or other mechanisms. Using |IHS| score of > 2.5 (top 1% of the expected distribution) as a threshold for hits, 36 genes with at least one SNP that could be under signi cant positive selection were identi ed, and out of these, 15 genes had at least two SNPs (Table 2; Figure 7). Table 2: Genes under recent positive directional selection in P. falciparum in Ethiopia as identi ed using the integrated haplotype score at a signi cance threshold of P< 0.01. SNPs and ID stand for single nucleotide polymorphisms and gene identi cation number, respectively.
Thirteen (13) out of above 15 genes under positive directional selection showed both positive balancing and directional selections ( Table 3) and these genes includes the vaccine candidate gene SURF4.2 on chromosome 4 and CLAG8 (cytoadherence linked asexual protein 8) on chromosome 8 [36]. Table 3: Genes under both recent positive directional selection and balancing selection in Ethiopian P. falciparum populations.
Interestingly, our analysis failed to detect selection signals in drug-resistance genes such as pfcrt, pfmdr1, pfdhfr, and pfdhps. The reason could be that IHS may not be suitable for detecting positive selection for those SNPs that have reached or are near xation in the local P. falciparum population [34].
Prevalence of mutations conferring antimalarial drug resistance in Plasmodium falciparum Table 4 shows inter-population differences in the prevalence of drug resistance genes observed among the P. falciparum global datasets analyzed. In tandem with previous studies [20,37] that suggest temporal differences in the geographical distribution of antimalarial drug resistance mutations, we observed that CQ-resistance alleles (pfcrt-K76T, pfcrt-A220S and pfcrt-Q271E) were xed in Ethiopia, Cambodia, and Thailand, regions where malaria transmission rates are comparably low; however, the prevalence of these same alleles were 0% in Malawi and ranged from 66% to 72% in DR Congo.
Sulfadoxine/pyrimethamine drug resistance mutations were also present in pfdhfr and pfdhps genes in all analyzed P. falciparum populations. The major pyrimethamine resistance-conferring alleles such as pfdhfr-N51I and pfdhfr-C59R were also identi ed in all parasite populations with xed or near xation in frequency. pfdhfr-S108N was xed in other P. falciparum populations, except in Ethiopia's. Variable prevalence of drug resistance-conferring alleles was also observed in pfdhps (pfdhps-S436A, pfdhps-G437A, pfdhps-K540N, and pfdhps-A581G), for the parent drug sulfadoxine resistance. In terms of artemisinin resistance, the African population-speci c Pfk13-K189T mutation was observed in Ethiopia (in 20% of the samples), DR Congo (17%), and Malawi (13%). This mutation was previously identi ed in African P. falciparum populations [20,37]. As previously reported [8] the validated and most characterized artemisinin resistance-conferring mutation pfK13-C580Y was identi ed in Cambodia (36% of the samples) as well as in Thailand (26%), but not in Africa.

Discussion
The transmission dynamic coupled with the unique history, ecology and demography of Ethiopia raises interest in the genetics of its parasite population. High resolution whole genome SNP data was used to analyze P. falciparum parasite genetic diversity in central region of Ethiopia and compared with similar parasite data from mainland Africa (DR Congo and Malawi) and Southeast Asian parasites from Cambodia and Thailand. We observe similar MAF across all ve parasite populations with over representation of low frequency (< 5%) variants as previously reported [19,21]. Interestingly, mean F ws values were signi cantly higher in the Ethiopian parasite isolates compared to the other African populations but not the Southeast Asian parasite populations. F ws is genome-wide metric that averages heterozygosity across the genome in comparison with heterozygosity within the local parasite population [21]. Hence it is a measure of within-host diversity of infections that allows us to gauge the potential for inbreeding (or outcrossing). The higher F ws values (> 0.95) in Ethiopia (P. falciparum prevalence of 0.02) [38] and east Asian infections is underscored by the low malaria transmission rates in these settings which supports a higher inbreeding and clonal propagation of infections ( Fig. 4; Additional le 2). Unlike the other East African countries (DR Congo and Malawi) where transmission intensities are higher [20,37], and there was a good distribution of F ws values with majority of infections being polyclonal with high potential for outcrossing (Fig. 4). These ndings are supported by a similar studies that link lower F WS values to in west African where transmission is high [18]. However, of note, is possibility for high F WS values to occur in areas of high transmission intensity if P. falciparum circulates in a geographically isolated community which limits the chance of outcrossing with other genetically distinct P. falciparum parasites as observed in the previous study [21].
Further, we found that overall, genetic diversity (π) in the Ethiopian P. falciparum population was relatively lower than that in the Malawian P. falciparum population. This observation is probably because of the higher potential for outcrossing in Malawi as compared to Ethiopia due to differences in parasites transmission intensity. P. falciparum malaria transmission occurs throughout the year in Malawi [20], unlike Ethiopia where transmission is low and fragmented especially in the region where these samples were collected [38]. Our observed variance in pairwise genetic Diversity (π) across the 14 chromosomes of P. falciparum parasites in Ethiopia with a tendency for higher genetic diversity in the smaller chromosomes, notably 1 and 4, and lowest values in chromosome 12 (Fig. 3), may be explained by differences in recombination activities at chromosomal level as observed in the previous studies [20,35].
An analysis of parasite population structure within and between continents revealed a higher degree of population structure between Ethiopian parasites and other east African parasites and between Southeast Asia and East Africa. However, neither PCA (Fig. 5) nor admixture analysis (Fig. 6) could resolve parasite populations in DR Congo and Malawi. These observations are corroborated by several studies that report regional and inter-continental level structure in global P. falciparum parasite populations [21]. However, the separation of Ethiopian parasites from the two east African populations is worth noting. Notwithstanding the increased human mobility between Addis Ababa and the rest of Africa, particularly east Africa, there remain important barriers to gene ow between parasite populations in central Ethiopia and the rest of the sub-region. We believe one of the factors that severely limit gene ow between Ethiopia and its neighbors is the local malaria transmission intensity as a function of poor vectoral capacity determined by the ecological landscape (highlands).
Against the backdrop of this unique eco-epidemiology of P. falciparum malaria in Ethiopia, TajimaD and HIS identi ed many antigenic genes to be under balancing selection with TajimaD value greater than one. These genes included known vaccine candidates such as ama1, trap, msp1, eba175, and clag2 (Additional le 3) which were previously identi ed in different populations that vary in transmission intensity [24,29,39], to be under balancing selection. Besides, we identi ed 15 genes under positive directional selection by IHS, which includes SURFIN and PHIST families previously suggested to be targets of immunity [24]. We believe the low seasonal transmission in Ethiopia maintains signi cant immune selection pressure on the infection reservoir than drug pressure due to clinical malaria. Therefore, the candidate vaccine antigen loci under balancing selection may be largely due to immune modulation and not positive adaptive selection in uenced by drug pressure. This is supported by a failure to detect selection signatures in known drug target genes such as pfcrt, pfmdr1, pfdhfr, pfdhps, and pfkelch-13. In this study, the ability of IHS to detect selection in these drug resistance genes in Ethiopia may be because the frequency of polymorphisms in these loci are either xed or near xation in the Ethiopian population (Table 4). These ndings are supported by a previous study in Ethiopia which showed that the CQ-resistant haplotype (CVIET) was xed [7]. The continued use of CQ in Ethiopia for the treatment of P. vivax malaria may account for the high prevalence of CQ resistant markers through indirect pressure. Also, pfmdr 1 mutations have been demonstrated to mediate AL resistance and therefore, the high prevalence of pfmdr 1 mutations may signal poor e cacy of AL as rst treatment for P. falciparum malaria in Ethiopia.
Variable prevalence of CQ-resistant polymorphisms were observed only in DR Congo and not in Malawi, evidence that support the complete reversal of CQ susceptibility in Malawi as reported by Ochola etal [20] Undoubtedly, artemisinin resistance has taken root in Southeast Asia. Although no validated pfKelch13 mutation was found in the African samples, 36% and 26% prevalence of pfKelch13-C580Y mutations were reported in Cambodia and Thailand, respectively. However, an uncharacterized pfkelch13 mutation (pfK13-K189T) found at prevalence > 10% in all the African datasets was previously reported in other studies [37] although its role in artemisinin resistance is yet to be determined. Indeed, a study reported mutation in pfkelch13 at amino acid positions less than 441that may not play any role in mediating artemisinin resistance [8]. But, validated pfkelch13-R561H mutation for artemisinin resistance was recently reported in other East African P. falciparum populations [9].

Conclusion
Overall, our study revealed comparably low genetic diversity of P. falciparum parasites in Ethiopia. The majority of infections were of low complexity, demonstrated signi cant population structure with Ethiopian parasites diverged from parasite populations within the sub-region. We highlight limited gene ow between parasite populations in the east African sub-region and Ethiopia. We also reported balancing selection in antigenic loci known to be targets of immunity and adaptive positive selection in SURFIN and PHIST gene families that are potential vaccine antigens. Though selection analysis did not pick up any adaptive mutations in known drug resistant genes, we reported xation of the CQ-resistance pfcrt-K76T genotype and others in Ethiopia and the wild-type genotype (K) in Malawi. We reported no pfKelch13 validated mutations in Ethiopia, DR Congo and Malawi except a PfK13-K189TAfrican speci c uncharacterized mutation. Further molecular studies involving deeper sampling of Ethiopian parasite populations are essential to understand the genetic diversity, gene ow and temporal evolution of drug resistance loci within Ethiopia. Our ndings can be used to support national malaria control decision making for optimal impact in further reducing malaria transmission in Ethiopia.