Diversity pattern of Duffy binding protein sequence among Duffy-negatives and Duffy-positives in Sudan

Background Vivax malaria is a leading public health concern worldwide. Due to the high prevalence of Duffy-negative blood group population, Plasmodium vivax in Africa historically is less attributable and remains a neglected disease. The interaction between Duffy binding protein and its cognate receptor, Duffy antigen receptor for chemokine plays a key role in the invasion of red blood cells and serves as a novel vaccine candidate against P. vivax. However, the polymorphic nature of P. vivax Duffy binding protein (DBP), particularly N-terminal cysteine-rich region (PvDBPII), represents a major obstacle for the successful design of a DBP-based vaccine to enable global protection. In this study, the level of pvdbpII sequence variations, Duffy blood group genotypes, number of haplotypes circulating, and the natural selection at pvdbpII in Sudan isolates were analysed and the implication in terms of DBP-based vaccine design was discussed. Methods Forty-two P. vivax-infected blood samples were collected from patients from different areas of Sudan during 2014–2016. For Duffy blood group genotyping, the fragment that indicates GATA-1 transcription factor binding site of the FY gene (− 33T > C) was amplified by PCR and sequenced by direct sequencing. The region II flanking pvdbpII was PCR amplified and sequenced by direct sequencing. The genetic diversity and natural selection of pvdbpII were done using DnaSP ver 5.0 and MEGA ver 5.0 programs. Based on predominant, non-synonymous, single nucleotide polymorphisms (SNPs), prevalence of Sudanese haplotypes was assessed in global isolates. Results Twenty SNPs (14 non-synonymous and 6 synonymous) were identified in pvdbpII among the 42 Sudan P. vivax isolates. Sequence analysis revealed that 11 different PvDBP haplotypes exist in Sudan P. vivax isolates and the region has evolved under positive selection. Among the identified PvDBP haplotypes five PvDBP haplotypes were shared among Duffy-negative as well as Duffy-positive individuals. The high selective pressure was mainly found on the known B cell epitopes (H3) of pvdbpII. Comparison of Sudanese haplotypes, based on 10 predominant non-synonymous SNPs with 10 malaria-endemic countries, demonstrated that Sudanese haplotypes were prevalent in most endemic countries. Conclusion This is the first pvdbp genetic diversity study from an African country. Sudanese isolates display high haplotype diversity and the gene is under selective pressure. Haplotype analysis indicated that Sudanese haplotypes are a representative sample of the global population. However, studies with a large number of samples are needed. These findings would be valuable for the development of PvDBP-based malaria vaccine. Electronic supplementary material The online version of this article (10.1186/s12936-018-2425-z) contains supplementary material, which is available to authorized users.


Background
Vivax malaria is a leading public health concern worldwide [1]. Out of 6 malaria parasites that infect humans, Plasmodium vivax is the second lethal cause of infection after Plasmodium falciparum, which is widely distributed in malaria-endemic regions where approximately 2.85 billion people are at risk [2]. Although global malaria cases are on the decline over the years, with the emergence of drug-resistant P. vivax strains and associated severe and fatal consequences and the ability to produce hypnozoites causing relapse in patients, there are clear indications of notable public health significance [3][4][5][6][7]. Eradication and control interventions targeting P. falciparum are supposed to not be fully effective against P. vivax which is certainly a major challenge for malaria eradication [8]. Therefore, there is an urgent need for a vaccine against this deadly parasite [9].
Plasmodium vivax invasion is primarily dependent on the interaction between Duffy binding protein and its corresponding receptor Duffy antigen receptor for chemokines (DARC) [10]. Recently there are confirmed P. vivax infections among Duffy-negative population, indicating alternate invasion pathways [11][12][13]. Therefore, P. vivax Duffy binding protein (PvDBP) is a novel vaccine candidate because it induces strong immune responses in humans, and anti-DBP antibodies inhibit DBP-DARC interaction in vitro and also block merozoite invasion of human erythrocytes [14].
The erythrocyte-binding motif of DBP resides in the N-terminal cysteine-rich domain (C1-C12) known as DBPII [15]. However, the critical binding motif is a 170 amino acid (291-460 aa) stretch within the 330 aa of DBPII, which includes cysteines C4-C7 [16]. Successful invasion engages dimerization between 2 DBP and 2 DARC molecules [17]. Similar to other blood stage adhesion ligands, DBPII resides under strong immune selective pressure [18]. PvdbpII region is highly polymorphic with 4 times higher substitution rate compared to the rest of the gene, and polymorphisms at certain residues were found to be consistent in global pvdbpII isolates, demonstrating that polymorphisms help in immune evasion mechanism [19,20]. Although such polymorphism is not known to affect the hostparasite binding, anti PvDBP antibodies, detected in residents of malaria-endemic areas, recognize PvDBPII but reveal significant differences regarding inhibitory responses [18].
The variable nature of the PvDBP, in particular DBPII, is a major challenge to the development of a vaccine against vivax malaria, restricting strain-transcending global protection [21]. The haplotype analysis using global pvdbpII sequences previously identified 7 haplotypes that can cover 60% parasite population in 8 malariaendemic countries [22]. Such knowledge is important for the rational design of DBP-based vaccine against vivax malaria. However, existing data lack the information of DBP sequence polymorphism originating from an African country such as Sudan, as African P. vivax studies are neglected because of the high burden of P. falciparum cases and a high percentage of Duffy-negative population, which is usually thought to serve as a selective barrier to P. vivax infection [23]. Moreover, several studies revealed that P. vivax infection in African countries is not as low as was considered before [24]. High numbers of imported cases of P. vivax have also been documented in Chinese workers returning from African countries [25,26]. Additionally, around 86.6 million Duffy-positive individuals were at risk in sub-Saharan countries; more attention to P. vivax in Africa is needed to control this parasite [24]. A recent study showed a 35.6% P. vivax infection rate in Central Sudan, indicating great public health significance of this neglected parasite [27]. However, there is no study of DBP sequence polymorphism from Sudan.
This study was conducted to determine the haplotype diversity of pvdbpII and Duffy blood group genotyping, natural selection of Sudanese pvdbpII and the haplotype prevalence based on predominant, non-synonymous single nucleotide polymorphisms (SNPs) in other endemic countries worldwide. The implication in terms of a DBPbased vaccine design is discussed.

Blood sample collection and DNA preparation and species identification by PCR
Sixty-three blood spots in Whatman filter papers were collected from cross-sectional surveys between 2014 and 2016 during the malaria transmission season (August to February) from malaria patients attending hospitals or health facilities in Sudan (Additional file 1). Initially, all Conclusion: This is the first pvdbp genetic diversity study from an African country. Sudanese isolates display high haplotype diversity and the gene is under selective pressure. Haplotype analysis indicated that Sudanese haplotypes are a representative sample of the global population. However, studies with a large number of samples are needed. These findings would be valuable for the development of PvDBP-based malaria vaccine.
Keywords: Malaria, Plasmodium vivax, Sudan, Haplotype, Genetic diversity samples were screened by rapid diagnostic kit (RDT) and microscopy for P. vivax infection. Genomic DNA was extracted using a QIAamp Blood Mini Kit Qiagen, Valencia, CA, USA) and kept at − 20 °C until used. Then Plasmodium species were identified by 18SrRNA gene-based nested PCR using genus and species-specific primers as described previously [28].

Duffy genotyping
A 252 bp fragment, which indicates GATA-1 transcription factor binding site of the FY gene (− 33T > C), was amplified using nested PCR. The primers were used as follows: Nest 1 Duffy_F: 5′-GGA TGG AGG AGC AGT GAG AG-3′ and Nest 1 Duffy_R: 5′-CAA AGG GAG GGA CAC AAG AG-3′; Nest 2 Duffy_F: 5′-TAG TCC CAA CCA GCC AAA TC-3′ and Nest 2 Duffy_R: 5′-TCA CCC TGT GCA GAC AGT TC-3′. PCR amplification was performed using an Accupower premix (Bioneer, Daejeon, Korea) in a final volume of 20 µL containing 200 pM of each primer, 250 µM of each deoxyribonucleoside triphosphate, 1.5 mM MgCl 2 , 10 mM Tris-hydrochloric acid (pH 9.0), 30 mM KCl, 1.0 units of Taq polymerase and 2 µL of genomic DNA template. For nest 1 PCR reaction, the cycling conditions were: initial denaturing at 94 °C for 5 min followed by 35 cycles of 94 °C for 30 s, 59 °C for 1 min, and 72 °C for 45 s, and a final extension step of 10 min at 72 °C. The nest 2 primers were added with the nest 1 amplicons and amplified using the similar conditions except that the extension time of 25 s at 72 °C during 35 cycles. PCR products were analysed by electrophoresis on 1% agarose gel stained with 0.05% Redsafe dye (iNtRON Biotechnology, Daejeon, Korea). Amplicons were gel purified using DNA purification kit (Macherey-Nagel, Germany) as per manufacturer's instruction. Purified products were sequenced in two directions using Nest 2 Duffy_F and Nest 2 Duffy_R primers by commercial sequencing company (Genotech, Daejeon, Korea) for genotyping. Sequences were aligned with NCBI reference Sequence: NG_011626.3 to detect a polymorphism (− 33T → C) for genotype confirmation.

PCR amplification and sequencing
The region flanking P. vivax dbpII fragment (nucleotide positions 586-1848 bp; aa 196-617) was amplified using nested PCR. Nest 1 forward and reverse primers were Nest 1_F: GAT AAA ACT GGG GAG GAA AAA GAT and Nest 1_R: CCC GTA ACA GCT TTA CCT G, respectively. Nest 1 PCR amplification was performed using an Accupower premix (Bioneer, Daejeon, Korea) in a final volume of 20 µL containing 200 pM of each primer, 250 µM of each deoxyribonucleoside triphosphate, 1.5 mM MgCl 2 , 10 mM Tris-hydrochloric acid (pH 9.0), 30 mM KCl, 1.0 units of Taq polymerase and 3.5 µL of genomic DNA template. PCR cycling conditions were performed as follows: initial denaturing at 94 °C for 5 min followed by 35 cycles of 94 °C for 30 s, 59.6 °C for 30 s, and 72 °C for 1 min 30 s, and a final extension step of 10 min at 72 °C. Nest 2 forward and reverse primers were Nest 2_F: ATG TTA GAT TAT GAG ACA TCT AGC AA and Nest 2_R: AAC AGC TTT ACC TGT GGT AGAAC, respectively. Then nest 2 PCR amplification was performed using the similar reaction condition to nest 1 with the exception of 4 µL of nest 1 amplicon as template and thermal profile were used as follows; initial denaturing at 94 °C for 5 min followed by 35 cycles at 94 °C for 30 s, 57.6 °C for 30 s, and 72 °C for 1 min 30 s, and a final extension step of 10 min at 72 °C. PCR products were analysed by electrophoresis on 1% agarose gel stained with 0.05% Redsafe dye (iNtRON Biotechnology, Daejeon, Korea). Amplicons were gel purified using DNA purification kit (Macherey-Nagel, Germany) as per manufacturer's instruction. Direct sequencing was performed using Nest 2_F and Nest 2_R by a commercial sequencing company (Genotech, Daejeon, Korea).
All sequences generated were assembled and edited using SeqMan and EditSeq, Lasergene v 7.0 (DNASTAR) and aligned using the Clustal W program in MegAlign Lasergene v 7.0 (DNASTAR). Samples having mixed genotype infections were excluded and any singleton site was confirmed by re-sequencing. SNPs at different residues were identified compared to Sal-1 (GenBank accession no. DQ156512) sequence as a reference. The sequences reported here have been deposited in the GenBank database under the accession numbers (MG805616-MG805657).

DNA sequence and polymorphism analysis
DNA sequence polymorphism analysis was investigated on 42 Sudan pvdbpII sequences (nt. 727-1653). Using the DnaSP ver. 5.10.00 [32] following genetic diversity parameters were analysed; a number of segregating sites (S), nucleotide diversity (π), haplotype diversity (Hd), and the average number of pair-wise nucleotide differences within the population (k).
To explore the effect of overall natural selection in the fragment and on the epitope regions, rates of synonymous (dS) and non-synonymous (dN) substitutions was investigated using Nei and Gojobori method [29] with Jukes and Cantor correction. Tajima's D test [30] and Fu and Li's D and F tests [31] were performed with DnaSP ver. 5.10.00 [32]. For epitope regions, 10 known linear B-cell epitopes, which were classified into 3 distinct classes, high inhibitory (H), medium (M) inhibitory and low (L) inhibitory, were examined for natural selection [18].

Analysis of predominant pvdbpII nucleotide haplotypes
In order to determine whether Sudanese haplotypes were prevalent in other endemic countries, 10 predominant non-synonymous SNPs were used, and these haplotypes were compared with haplotype data from 10 endemic countries (Mexico, Colombia, Brazil, Iran, India, Sri Lanka, Myanmar, Thailand, South Korea, Papua New Guinea).

Genetic polymorphism of PvDBPII
Out of the 63 Sudanese isolates, the pvdbpII gene was successfully PCR amplified from 54 isolates. High-quality sequencing results for the PvDBPII fragments (927 bp) were obtained from 42 isolates, which were taken for further analysis. DNA sequence analysis of the 42 pvdb-pII sequences showed that there was no size variation among the sequences compared to reference Sal-1 strain sequence. Twenty SNPs were found, 6 (30.0%) of which were synonymous and 14 (70.0%) were non-synonymous substitutions. Of these 20 SNPs, 7 occurred at the first base of the codon, 5 at the second, and 8 at the third base of the codon, resulting in significant amino acid changes in protein level (Fig. 1a). Among these, 2 singleton nonsynonymous substitutions were detected (D497N and T513K). Five synonymous sites were found in 2.4% R408, 2.4% I380, 7.1% Q432, 2.4% A514, and 2.4% N531. Alignment of deduced amino acid sequences revealed 11 different haplotypes (haplotype 1-11) with amino acid changes at 14 positions in which one showed trimorphic polymorphism (K386N/Q) and the others were dimorphic (Fig. 1a). Duffy-negative P. vivax isolates reside in 6 different haplotypes out of 11 haplotypes (Fig. 1a). Of the 14 non-synonymous substitutions, 13 were reported in the previous study in the parasites circulating in different parts of the world, whereas one point mutation was found unique to current study (D497N, 2.4%,) [33][34][35]. All 12 cysteine residues were conserved among the 42 Sudanese isolates. Moreover, among all non-synonymous sites, frequency of residues K371E (47.6%), D384G (73.8%), N417K (76.2%), L424I (88.1%), W437R (76.2%), and I503K (85.7%) were higher than remaining sites (Fig. 1b). However, haplotype 1 was found dominant (26.2%) among other haplotypes (Fig. 1a). Comparison of the most common amino acid variants observed globally in PvDBPII revealed that Sudan isolates showed a comparatively similar pattern of minor allele frequency to Mexico isolates, but differed from other countries isolates (Papua New Guinea, Colombia, Brazil, Iran, India, Sri Lanka) ( Table 1). Although Mexican isolates showed similar amino acid changes compared to Sudan isolates, 2 variants found in the Sudan isolates (D497N and T513K) were not identified in Mexican isolates [36].

Haplotype diversity and comparison with global populations
Based on the most common 10 non-synonymous SNPs observed globally, 8 haplotypes were found in Sudan (Additional file 2). The distribution of these haplotypes in 10 P. vivax endemic countries worldwide showed that all countries had at least one Sudanese haplotype, (Haplotype 4 in Colombia) and maximum 6 Haplotypes (Haplotype 1, 2, 3, 4, 6, and 9 in Sri Lanka) (Fig. 2) covering varied frequencies of parasite population, with the highest in Mexico (Haplotype 1, 2, 4 and 6, 91.4%) and lowest in Papua New Guinea (Haplotype 3 and 9, 1.8%) (Fig. 2). In a previous study, seven haplotypes were identified to cover an overall 60.0% worldwide parasite population but without any African countries [22]. Three of Table 1 Minor allele frequencies of the most common PvDBPII amino acid variants a The first letter represents the amino acid in that position in the Sal-1 sequence and the other letter represents the substituted amino acid; b [36]; c [22]; d [34]; e [35]; f [33]; g [42] Isolate PvDBPII variants forms on 10 non-synonymous position a the haplotypes in the present study (Haplotype 1, 2 and 6) were shared with those haplotypes.

Nucleotide diversity and natural selection of pvdbpII
Analysis of molecular polymorphism within the 42 pvdbpII sequences revealed lower nucleotide diversity (π = 0.00497) and higher haplotype diversity (Hd = 0.900) which was similar to the diversity indices from several endemic countries [35]. Sliding window plot (window length 100 bp, step size 20 bp) using the DnaSP, revealed that nucleotide diversity was highest at 300-650 nt position of pvdbpII (Fig. 3). A significant positive natural selection (dN − dS = 2.05, p < 0.05) was found within the Sudanese pvdbpII sequences.

Polymorphisms related to B-and T-cell epitopes
Analysis of natural selection in 10 known B-cell epitopes regions within Sudanese isolates indicated that the H3 epitope (residues 384-399 aa) was under positive selection (Table 2) whereas H2, M1, L1, L2, and L4 epitopes were conserved among the 42 Sudanese isolates ( Table 2). The nucleotide diversity at epitope H3 was also high.

Discussion
Plasmodium vivax, the widely distributed malaria parasite is a great public health concern. A broadly effective malaria vaccine is a powerful tool for the control, elimination and eradication of this deadly disease [21]. Scientists across the globe are searching for a good strategy that provides strain transcending as well as long-term protection. However, the high degree of diversity of parasite antigens and the strain-specific immune response are challenging issues in the success of malaria vaccines [18]. In the current study, the genetic diversity and natural selection of DBPII have been investigated among Sudanese P. vivax isolates, and this is the first report of any African origin P. vivax isolates.
Based on amino acid variants, 11 different PvDB-PII haplotypes were identified in the Sudanese isolates. Except for one unique haplotype (Haplotype 11), the majority of the haplotypes identified in this study were previously reported from other geographical areas (Mexico, Brazil, Iran, India, Sri Lanka, Thailand, South Korea) [22,[33][34][35][36]. Haplotype 1 was found to be the most prevalent in Sudanese isolates. Haplotype analysis showed that the number of identified haplotypes among Sudanese P. vivax isolates (26.2%) was higher than those previously reported from Mexico (22.9%) and Myanmar (22.2%) [34,36], but it was lower than those reported from Sri Lankan (33.0%) and Brazilian (27.9%) isolates [35,37].
Although Duffy negativity act as a selective barrier for P. vivax infection, few studies reported Duffy-negative P. vivax infection from several African countries [11][12][13].
Interestingly, in the current study, 7 Duffy-negative P. vivax cases were found, and they reside into six PvDBP haplotypes. However, it was found that 5 PvDBP haplotypes were shared among Duffy-negative as well as Duffy-positive individuals, except for one haplotype (Fig. 1a), which was unique to a Duffy-negative individual. Moreover, the 2 PvDBP Malagasy haplotypes from 2 Duffy-null individuals were also not identical to any of the haplotypes found in Sudan [38]. The findings of the shared PvDBP haplotypes between Duffy-positive and Duffy-negative individuals raise the question whether they have a role in P. vivax infection in Duffy-null individuals or not? The previous study reported that mutations in PvDBP did not relate to the ability of P. vivax to infect Duffy-null individuals [38]. An expansion of PvDBP copy number in Duffy-null P. vivax infection was documented which might allow binding the DBP with another ligand on Duffy-null erythrocyte [38]. Further PvDBP haplotype diversity, as well as a functional study with a large number of P. vivax isolates infecting Duffy-negative individuals will be needed to elucidate the exact function of shared haplotypes between Duffy-positive and Duffy-negative individuals and their role in P. vivax infection. Among the 14 non-synonymous mutations identified in the current study, 13 were previously reported, whereas the D497N was unique to Sudan isolates [33][34][35][36][37]. It was also found that all 12 cysteine residues were conserved, indicating the conserved binding ability to host erythrocytes in the field. However, previous studies revealed that these non-synonymous mutations do not affect the binding to the erythrocyte receptor but rather promote the parasite for immune evasion [18]. Three residues, N417K, W437R and I503K forming an important discontinuous epitope in PvDBP, were described as the main target for binding inhibitory antibodies against erythrocyte binding [39][40][41]. In the present study, N417K (76.2%), W437R (76.2%) and I503K (85.7%) were detected and these 3 SNPs were found simultaneously among 35 isolates (83.3%), which suggests a strong association between these residues in Sudanese P. vivax isolates. The frequency of these 3 SNPs was similar to that reported from Mexico but different from Sri Lanka, Colombia, Iran, and Myanmar [19,34,35,42].
Many previous pvdbpII diversity studies documented high rates of non-synonymous mutations relative to synonymous mutations (dN − dS > 1.9) that reflected a positive natural selection promoting greater diversity of pvdbpII, probably to avoid host immune responses irrespective of the geographical distribution [18]. In this study, analysis of natural selection within the 42 Sudanese P. vivax isolates also identified significant positive selection (dN − dS = 2.05, p < 0.05). These results indicate that by increasing polymorphisms within the pvdbpII, parasites escape from the host immune response [18]. Natural selection in the known B-cell epitope regions in Sudanese isolates indicated that only H3 epitope (aa 384-399) was found to be under strong positive selection that could generate greater diversity. Higher genetic diversity was also found in the H3 epitope region. Tajima's D, Fu and Li's D * , and Fu and Li's F * for the H3 epitope region was positive but not significant. Similar strong selection pressure in this epitope region was observed in Sri Lankan and Iranian isolates [35,42].
The number of migrant workers from P. vivaxendemic countries has risen in most African countries, including Sudan [27,43]. Recently, P. vivax infections have been reported in high numbers in Sudan. This rising trend is possibly due to human migration from several Asian and African countries to work at newly formed petroleum and sugar companies in Sudan [27]. It was hypothesized that Sudanese pvdbpII haplotype might be a representative sample of global P. vivax population and this information could be used for the rational design of a PvDBP-based multi-allele vaccine. On this basis, haplotypes were generated considering 10 most frequent polymorphism sites of pvdbpII and 8 haplotypes were identified from Sudanese P. vivax isolates (Additional file 2). The prevalence of these 8 haplotypes was observed in all 10 malaria-endemic countries with a variable frequency that covers the highest 91.4% parasite population in Mexico and lowest in Papua New Guinea (1.8%) (Fig. 2), with an overall coverage of 36.8% (Fig. 2). However, haplotype analysis from seven endemic countries (Mexico, Brazil, Iran, India, Sri Lanka, Thailand, South Korea) indicated that haplotypes from Sudan covered up to 50% parasite populations. The current study was conducted using a low number of samples from Sudan (n = 42) and the information generated can be considered preliminary, yet it was representative of global P. vivax population and further studies with a large number of samples from Sudan is required to come to conclusions. In the current study among the identified haplotypes, none was identical to the Sal-1 isolate, which was also found absent in Thailand [22]. However, several studies from other malaria-endemic countries documented Sal-1 at low frequencies [22]. Although a phase 1 trial of PvDBP vaccine based on the Sal-1 sequence has been reported [44] it might be an inefficient strategy because it covers only 10% of the worldwide samples [22]. To cover the major portion of the P. vivax population multiallele PvDBP might be a good approach. In a previous study, a similar multi-allele vaccine strategy with a minimum number of haplotypes using worldwide pvdbpII sequences identified seven haplotypes that covered an overall 60% worldwide parasite population but without any African countries [22]. In this study, out of 8 Sudanese haplotypes, 3 haplotypes were shared with that previous study those were (haplotypes 1, 6 and 7) suggested for multi-allele DBP vaccine approach.