Barcode of alleles from 11 variable sites
PCR fragments were successfully amplified for 20 of the 24 SNPs. The other four SNPs gave no PCR amplification and were rejected. Assays for only 13 SNPs provided an interpretable LUMINEX signal (Additional file 3); the others were excluded from the assay. Of these 13 SNPs, locus #7 was abandoned because detection on control DNAs was neither reproducible nor accurate, and locus #5 was rejected because genotyping revealed that it was monomorphic. Finally, 11 SNPs were validated for barcoding (BC01 to BC11). Four multiplex PCRs and two multiplex LDRs were set up for the LUMINEX detection according to annealing temperature.
Since the work of Daniels et al. [11], the genome version and annotation have been improved. Databases revealed that 23 of the 24 SNPs were located in a coding region and that they were equally distributed between synonymous and non-synonymous mutations (Additional file 6). Five SNPs were located in subtelomeric regions. Genomic analysis revealed that the lack of detection at locus #11 was due to the presence of two nearly identical copies of the rifin gene in which the SNP is located (Additional file 6). PlasmoDB v11.1 suggested that locus #15 was tri-allelic, and LUMINEX data treatment was adapted for this locus. Locus #24 (BC11) was validated for LUMINEX genotyping despite its low variation in the Cambodian parasite population (Additional file 6). Initial analysis was performed on 533 samples: 79 resulted from mixed infections, 183 gave no significant LUMINEX signal for at least one barcode position and 50 could not be amplified by PCR for at least one locus. Among the 251 rejected samples, 59 showed more than one type of error (Additional file 7). Finally, 282 of the 533 blood samples were successfully genotyped at the 11 SNP loci.
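The sample accounting above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative, and the split into samples with two or three error types rests on the assumption that the three listed categories are the only reasons for rejection.

```python
# Cross-check of the sample accounting reported in the text.
total_samples = 533
mixed_infection = 79      # samples resulting from mixed infection
no_luminex_signal = 183   # no significant LUMINEX signal at >= 1 barcode position
no_pcr_product = 50       # PCR failure at >= 1 locus
rejected = 251            # distinct rejected samples
multi_error = 59          # rejected samples with more than one error type

genotyped = total_samples - rejected

# Summing the three categories counts a sample once per error type.
error_events = mixed_infection + no_luminex_signal + no_pcr_product
extra_counts = error_events - rejected  # surplus caused by multi-error samples

# Assumption: a sample carries at most three error types, so
#   two_types + three_types     = multi_error
#   two_types + 2 * three_types = extra_counts
three_types = extra_counts - multi_error
two_types = multi_error - three_types

print(genotyped, error_events, extra_counts, two_types, three_types)
# prints: 282 312 61 57 2
```

Under that assumption the reported figures are mutually consistent: 57 samples with two error types and 2 samples with all three account for the 59 multi-error samples.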
Allele distribution associated with health centres
Correspondence analysis was performed using the barcodes of all samples (Fig. 1a). Axes 1, 2 and 3 explained 21, 16 and 12 % of the information, respectively. For each SNP, the REF (reference 3D7) and ALT (non-reference) alleles were in opposite quadrants, except for the BC07 barcode position, which was tri-allelic. The BC11_ALT_T allele was located at the centre of the representation because it was present in nearly all samples. Despite the low number of isolates carrying the corresponding BC11_REF_G allele, correspondence analysis showed an association of this allele with eastern and southern Cambodia. The association of alleles with health centres was examined using Between-Class analysis (Fig. 1b). The relative positions of health centres in the Between-Class analysis matched their geographic positions, suggesting that some alleles are associated with the geographic origin of the samples. A strict opposition between eastern and western Cambodia was observed, which may be due to the specific distribution of BC03 and BC05 alleles in samples from these areas (Fig. 1). Northern and southern localities presented a similar distribution pattern in other projections (Additional file 8). Overall, this analysis suggests that allele frequencies agree with the geographic location of health centres. Pursat is the exception in both the correspondence and Between-Class analyses: it is located in the western part of Cambodia but clusters with eastern localities. This discrepancy with its geographic location could be due to the BC06_ALT_T and BC07 allele frequencies.
Allele frequency gradient between localities
The uneven distribution of alleles was confirmed by Chi squared analysis (p value <0.05), and allele frequencies were represented using Weblogo (Fig. 2). Barcode BC11 was excluded from this analysis because of its low variation. The most important allele enrichments were highlighted, based on the Chi squared values, in red for ALT alleles and in blue for REF alleles (Additional file 1). Allele enrichment was not restricted to a single health centre but was often present in geographically close localities. Western Cambodia, including Battambang, Pailin and Pursat provinces, displayed significant enrichment of the ALT alleles for BC01, BC02, BC03, BC07 and BC08 and of the REF alleles for BC04, BC05 and BC06 (Additional file 8). The region of Kampot (Koh Slar and Chhouk health centres) showed strong enrichment of BC02_REF_T and BC04_ALT_T with a quasi-absence of the opposite alleles (Additional files 1, 9B, D). For some loci, the gradient from significant ALT to significant REF allele frequencies was emphasized by health centres located between the two areas, where no significant enrichment could be found for either allele. For example, western Cambodia appeared as the starting point for the diffusion of BC02_ALT_C, whereas northern Cambodia was associated with BC02_REF_T (Additional file 9B); the region of Battambang showed no significant allele enrichment at this locus. An example along the West-East axis was observed for the BC10 locus: the region of Pailin was associated with the BC10_REF_G allele, whereas the BC10_ALT_A allele was found in eastern Cambodia. Accordingly, the Battambang and Pursat health centres, located between these two areas, showed no significant bias in allele frequency (Additional file 9J). Therefore, the association between barcode alleles and localities could suggest the presence of specific subpopulations with fixed alleles and a restricted geographic distribution, together with overlap between these subpopulations or even gene flow.
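To illustrate the kind of test behind these enrichments, the following minimal sketch computes a Pearson Chi squared statistic for one barcode position in one health centre against all other centres. The counts are hypothetical and not taken from the study, the function name `chi2_2x2` is illustrative, and the exact test design used by the authors is not specified in the text.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p value (1 d.f.) for [[a, b], [c, d]].

    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts (NOT from the study): ALT and REF alleles at one barcode
# position in one health centre versus all other health centres.
alt_in_centre, ref_in_centre = 18, 2
alt_elsewhere, ref_elsewhere = 110, 152

stat, p_value = chi2_2x2(alt_in_centre, ref_in_centre, alt_elsewhere, ref_elsewhere)
alt_share_centre = alt_in_centre / (alt_in_centre + ref_in_centre)
alt_share_elsewhere = alt_elsewhere / (alt_elsewhere + ref_elsewhere)
enriched = "ALT" if alt_share_centre > alt_share_elsewhere else "REF"
print(f"chi2 = {stat:.2f}, p = {p_value:.4g}, enriched allele: {enriched}")
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05, the threshold quoted in the text. A tri-allelic position such as BC07 would need a larger table and more degrees of freedom.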
Presence of fixed alleles at the border of Cambodia
The presence of subpopulations was confirmed using an average FST value calculated per health centre. High FST values were observed at localities near the borders of Cambodia (Fig. 3), including Keov Seima (eastern Cambodia). Tasanh and Sampov Loun health centres in western Cambodia were associated with high FST values and, accordingly, the BC02_ALT_C allele was fixed in the Tasanh region. Similarly, BC04_ALT_T and BC09_ALT_C might have contributed to the high FST values in northern localities. Fixation of the BC04_ALT_T allele was also observed in Kampot province (Chhouk HC).
FST analysis and the gradients of allele frequencies across the country (Additional file 9A–J) suggest gene flow in a centripetal orientation. Given their high FST values, the five locations Anlong Veng, Keov Seima, Sampoev Loun, Tasanh and Trapaing Prasat might be associated with parasite subpopulations. Crossing between subpopulations could be responsible for allele diffusion over the country, especially at western Cambodian sites, where the low FST values could result from overlap between subpopulations. This likely reflects gene flow driving the homogenization of the population.
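As an illustration of why a fixed allele raises FST for a health centre, the sketch below computes Wright's FST = (HT - HS) / HT for a single biallelic locus, comparing one centre with the pooled remaining samples. The text does not state which FST estimator was used, so this formula is an assumption; the function names, allele frequencies and sample sizes are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity (gene diversity) 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)

def fst_one_vs_rest(p_centre, n_centre, p_rest, n_rest):
    """Wright's FST = (HT - HS) / HT for one health centre against the pooled rest."""
    n = n_centre + n_rest
    p_total = (p_centre * n_centre + p_rest * n_rest) / n
    h_total = heterozygosity(p_total)
    if h_total == 0.0:
        return 0.0  # locus monomorphic overall: no differentiation to measure
    # Sample-size-weighted mean of the within-group heterozygosities.
    h_within = (heterozygosity(p_centre) * n_centre
                + heterozygosity(p_rest) * n_rest) / n
    return (h_total - h_within) / h_total

# Hypothetical ALT allele frequencies (NOT from the study).
fst_fixed = fst_one_vs_rest(p_centre=1.0, n_centre=20, p_rest=0.4, n_rest=262)
fst_similar = fst_one_vs_rest(p_centre=0.45, n_centre=20, p_rest=0.4, n_rest=262)
print(f"allele fixed in centre: FST = {fst_fixed:.3f}")
print(f"centre close to the rest: FST = {fst_similar:.4f}")
```

A per-centre average, as described in the text, would then be the mean of such values over the barcode loci. A centre whose frequencies sit between those of neighbouring areas yields values near zero, consistent with the overlap interpretation above.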
Identification of emerging subpopulations in Cambodia
The results presented in the sections above suggest that subpopulations were restricted to small geographic areas. Unsupervised clustering runs based on different random subsets of the 282 isolates suggested the existence of 9 robust clusters (referred to as G1–G9; group size, n = 18–44) representative of the parasite subpopulations. The relationship between groups and health centres was established based on the distance of samples to their geographical centroid. None of the groups had samples restricted to a single health centre, and most of the geographical centroids were concentrated in the north-west of the country (Fig. 4).
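One way to attach a p value to the distance of a group's samples to their geographical centroid is a permutation test, sketched below. The text does not say how the p values were obtained, so the permutation approach is an assumption, as are the function names, the random coordinates and the treatment of longitude and latitude as planar coordinates.

```python
import math
import random

def mean_distance_to_centroid(points):
    """Mean Euclidean distance of (x, y) points to their own centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)

def centroid_permutation_p(group, all_points, n_perm=2000, seed=0):
    """Share of random same-size subsets that are at least as compact as the group."""
    rng = random.Random(seed)
    observed = mean_distance_to_centroid(group)
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(all_points, len(group))
        if mean_distance_to_centroid(subset) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Hypothetical sampling locations (NOT from the study), as (longitude, latitude).
rng = random.Random(1)
background = [(rng.uniform(102.5, 107.5), rng.uniform(10.5, 14.5)) for _ in range(60)]
clustered = [(rng.uniform(102.6, 103.0), rng.uniform(12.6, 13.0)) for _ in range(10)]
all_points = background + clustered

p_clustered = centroid_permutation_p(clustered, all_points)
p_dispersed = centroid_permutation_p(background[:10], all_points)
print(f"compact group: p = {p_clustered:.4f}; dispersed group: p = {p_dispersed:.4f}")
```

A geographically compact group gives a small p value, whereas a group scattered over the country gives a large one, which matches the way p values are read in the paragraphs that follow.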
Three groups, G1, G3 and G7, were significantly associated with a specific geographic area (p value <0.05). The samples in these groups were mostly isolated in western Cambodia, but also included samples from the north or the south of the country (Additional file 10). G1 included two samples from eastern Cambodia. G3 samples originated from western and southern Cambodia only. G7 had two samples from southern and one from northern Cambodia. Comparison with previously described Cambodian parasite subpopulations showed that the three groups could be associated with the KH2 and KH3 subpopulations. Accordingly, the samples that were tested carried the C580Y k13 mutation. Weblogos were added to the analysis to illustrate the frequency of alleles at the 11 barcode positions among conserved clusters (Fig. 4). In accordance with the results presented above (Figs. 1, 2), G1 showed conserved alleles at five positions: BC01_T, BC04_A, BC05_G, BC08_A and BC10_G. This genotype was very close to the two barcodes associated with the Pailin and Ou Chra health centres. The group G3 weblogo was more reminiscent of the Promoy HC barcode (Additional file 11). The barcode analysis based on 11 SNPs was thus able to describe conserved subpopulations that emerged recently in western Cambodia concomitantly with artemisinin resistance.
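The per-position allele frequencies that a weblogo displays can be tabulated directly from the barcodes of a group. The sketch below does this for four hypothetical 11-letter barcodes (not taken from the study) and lists positions where one allele reaches a chosen frequency; the function names and the 0.9 threshold are assumptions made for illustration.

```python
from collections import Counter

def position_frequencies(barcodes):
    """Allele frequencies at each barcode position, as a list of dicts."""
    freqs = []
    for i in range(len(barcodes[0])):
        counts = Counter(barcode[i] for barcode in barcodes)
        total = sum(counts.values())
        freqs.append({allele: n / total for allele, n in counts.items()})
    return freqs

def conserved_positions(barcodes, threshold=0.9):
    """Positions (BC01, BC02, ...) whose major allele frequency is >= threshold."""
    conserved = {}
    for i, freq in enumerate(position_frequencies(barcodes), start=1):
        allele, share = max(freq.items(), key=lambda item: item[1])
        if share >= threshold:
            conserved[f"BC{i:02d}"] = allele
    return conserved

# Hypothetical barcodes for one group (NOT from the study), positions BC01..BC11.
group = ["TCAAGTCAGGT", "TTAAGCCACGT", "TCGAGTTAGGT", "TTGAGCTACGT"]
print(conserved_positions(group))
```

In this toy group, BC01, BC04, BC05, BC08, BC10 and BC11 come out as conserved, while the remaining positions are split evenly between two alleles.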
The three groups G2, G4 and G8 are located in the area between the north-western region and the centre of the country (Fig. 4; Additional file 10). The average distance of these groups to the geographical centroid gives p values between 0.1 and 0.3. Two groups, G2 (n = 32) and G4 (n = 27), have samples from various localities. Most of the samples in G2 originate from localities in southern Cambodia and include 10 samples with the C580Y k13 allele. Three barcodes in this group are found in the admixed KHA subpopulation described earlier [8]. The samples in group G4 mostly originate from localities in western Cambodia and include eight samples with the C580Y, two with the R539T and one with the N458Y k13 allele. Two barcodes in this group are found in previously defined parasite subpopulations: one in the KH4 subpopulation (also carrying the Y493H allele) and the other in the KHA subpopulation (also carrying the R539T allele). The samples in group G8 (n = 44) mostly originate from localities in the north of the country (Trapaing Prasat and Anlong Veng health centres) and include four samples with the C580Y and six with the R539T k13 allele. In this group, the 11 barcode loci are conserved in most of the samples, and some samples show variation at the BC01, BC03 and BC08 loci. Three barcodes in this group are found in the previously defined KH3 (shown to carry R539T alleles), KH1 (ancestral population) and KHA (admixed population with C580Y alleles) subpopulations.
The three other groups, G5, G6 and G9, are located close to the centre of the country and show non-significant geographical centroid p values (0.99, 0.99 and 0.44, respectively). The samples in these groups originate from localities all over the country. In group G5 (n = 40), only four samples originate from localities in the south of Cambodia. This group includes four samples with the C580Y, one with the R539T, one with the P553L and one with the Y493H k13 allele. Four barcodes of this group are found in the KH subpopulations: two in KH1 and two in KHA. The samples in group G6 (n = 33) mostly come from localities in the southern and eastern regions of Cambodia. This group includes six samples with the C580Y, one with the I543T and one with the V568G k13 allele. Only one barcode of this group is found in the KH3 subpopulation. The samples in group G9 (n = 27) mostly originate from localities in the eastern and western regions, and 4/5 tested samples are negative for k13 mutant alleles. In this group, two barcodes are found in the KH4 subpopulation, two in the KHA subpopulation, one in the KH1 subpopulation and one in the KH3 subpopulation. The relationship between barcodes matching the KH4 subpopulation and the Y493H allele was confirmed for seven isolates of this group.
The map (Fig. 4) shows a gradient in the distribution of the relevant subpopulations, based on barcode description, from the north-west to the centre of Cambodia, emphasizing gene flow in that direction. The barcodes of the groups that include samples from north-western localities are mostly associated with the KH2 and KH3 subpopulations, and most of the samples carry the C580Y k13 allele only. Moving towards the centre, the barcodes are more often associated with the KHA and KH1 subpopulations and with a specific KH3 subpopulation in the north (G8). These samples mostly carry the C580Y and R539T k13 alleles, and one sample carries the rare N458Y allele. The groups in the centre of the country include more barcodes associated with the KHA and KH1 subpopulations and some barcodes matching the KH3 and KH4 subpopulations. These samples carry C580Y, R539T, Y493H and the three rare mutations P553L, I543T and V568G (Additional file 10). This could suggest that admixed populations with high diversity are located towards the centre of the country.
Mefloquine resistance is strongly associated with northern Cambodia
Mutations in the k13 gene associated with resistance to artemisinin were determined in 98 patients, as described earlier [6]. Of these patients, 70 % were positive for one of the k13 resistance alleles (C580Y, R539T, Y493H, I543T, P553L, V568G and N458Y). Artemisinin resistance was more frequent in western and northern Cambodia (Chi squared test p < 0.01). The mutant alleles Y493H, I543T, P553L, V568G and N458Y were each found once in the 282 isolates. The C580Y allele was the most prevalent (54/68 positive patients) and was present in all the conserved groups. Thirty-seven different barcodes were found among these 54 samples, and no association was found between the C580Y allele and the 11-SNP barcode. R539T was the second most frequent allele (9 of the 68 positive patients), with six isolates belonging to G8, two to G4 and one to G5. Four isolates were from northern, three from western and two from southern Cambodia. The barcodes of these nine samples have BC01_REF_C, BC03_ALT_A, BC04_ALT_T, BC05_REF_G, BC09_ALT_C and BC11_ALT_T in common. All these alleles were significantly associated with the Trapaing Prasat health centre (Fig. 2; Additional file 11).
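The k13 allele counts given in this paragraph can be tallied as follows. This is only a bookkeeping sketch: the counts are those stated in the paragraph (each rare allele counted once), and the variable names are illustrative.

```python
# k13 allele counts as stated in this paragraph (rare alleles found once each).
tested_patients = 98
k13_counts = {
    "C580Y": 54,
    "R539T": 9,
    "Y493H": 1,
    "I543T": 1,
    "P553L": 1,
    "V568G": 1,
    "N458Y": 1,
}

positive = sum(k13_counts.values())
share_positive = 100.0 * positive / tested_patients

print(f"positive patients: {positive} of {tested_patients} ({share_positive:.1f} %)")
for allele, count in k13_counts.items():
    print(f"{allele}: {count}/{positive} = {100.0 * count / positive:.1f} % of positives")
```

The tally gives 68 positive patients, about 69 % of the 98 tested, in line with the 70 % quoted in the text; C580Y accounts for roughly 79 % of the positive samples and R539T for roughly 13 %.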
In vitro IC50 susceptibilities to chloroquine (n = 109), mefloquine (n = 111), piperaquine (n = 103) and quinine (n = 107) were assessed in isolates with a parasitaemia >0.1 % [16]. Samples were distributed among all geographical locations and clustering groups. Piperaquine showed no geographical bias. The susceptibility to chloroquine and mefloquine was lower in eastern Cambodia (Additional file 12). High mefloquine IC50 values were found in isolates from the Promoy, Takavit and Trapaing Prasat health centres (Fig. 5). Mefloquine-resistant parasites in the region between Promoy and Takavit mostly carried the C580Y allele. R539T mutant parasites had significantly high mefloquine IC50 values (Fig. 5), suggesting two geographic loci for mefloquine resistance in Cambodia, one associated with the C580Y allele and one with the R539T allele. A large proportion of G8 samples carried R539T alleles, and most originated from Trapaing Prasat HC in the north. The samples in the G4 and G8 groups showed high mefloquine IC50 values (Additional file 13). FST values were high for Trapaing Prasat HC (Fig. 2). These results suggest the presence of a recently emerging P. falciparum subpopulation in northern Cambodia.