Exploration of genetic diversity of Plasmodium vivax circumsporozoite protein (Pvcsp) and Plasmodium vivax sexual stage antigen (Pvs25) among North Indian isolates

Background
Malaria is an important vector-borne disease with high fatality rates in tropical countries. The emergence and spread of novel antigenic variants that escape vaccine-induced immunity might contribute to severe malaria. A high level of polymorphism has been reported among malarial antigens, which are under selection pressure imposed by host immunity. Few reports are available on comparative stage-specific genetic diversity among Plasmodium vivax candidate genes in complicated vivax malaria. The present study was planned to examine the genetic diversity of Pvcsp and Pvs25 among complicated and uncomplicated P. vivax isolates.

Methods
Pvcsp- and Pvs25-specific PCRs and DNA sequencing were performed on P. vivax PCR-positive samples. Genetic diversity was analysed using MEGA version 7.0.21.

Results
The present study was carried out on 143 P. vivax clinical isolates collected at the Postgraduate Institute of Medical Education and Research, Chandigarh. Among the classic and variant types of Pvcsp, VK210 (99%; 115/116) was predominant in both complicated and uncomplicated group isolates. Of the various peptide repeat motifs (PRMs) observed, GDRADGQPA (PRM1) and GDRAAGQPA (PRM2) were the most widely distributed among the P. vivax isolates. Among the Pvs25 sequences, all isolates carried the double mutant (E97Q/I130T), in both the complicated (45/45) and the uncomplicated (81/81) groups.

Conclusion
An analysis of genetic variability enables an understanding of the role of genetic variants in severe vivax malaria.

regions. Population and genetic diversity of P. vivax are important factors in understanding vivax malaria transmission dynamics [7]. Knowledge of the extent of genetic diversity helps to predict how novel antigenic variants emerge and spread. Such variants can lead to drug resistance or escape of vaccine-induced immunity, which might be responsible for the development of severe vivax malaria [8,9]. Over the past few years, extensive studies have been undertaken to understand the genetic diversity of P. falciparum; however, literature on P. vivax genetic diversity remains scarce [10].
Well-characterized polymorphic regions of both pre-erythrocytic and erythrocytic stages of P. vivax have been analysed to study genetic diversity patterns. For population genetics studies, circumsporozoite protein (CSP) (required in the exo-erythrocytic cycle) is an important molecular marker to understand P. vivax diversity [11]. CSP is a prime target for anti-infection vaccines and has been studied extensively in terms of antigenicity and polymorphism. Pvcsp is a single-copy gene encoding the highly immunogenic major sporozoite surface protein [8]. The encoded protein consists of a central domain of tandemly repeated sequences flanked by two non-repetitive conserved domains: RI, a 5-aa sequence at the N terminus, and RII, a type I thrombospondin repeat (TSR) at the C terminus, as shown in Additional file 1: Figure S1A [11]. Three different genotypes (VK210, VK247, P. vivax-like) have been identified for the Pvcsp gene, based on variation in the number of peptide repeat motifs (PRMs) and in the sequences of the central repeat domain [12]. The genotypes are distributed globally with geographic biases: VK210 predominates in endemic regions, while VK247 is reported from regions with cases of mixed infections [13,14].
Several sexual stage antigens have been recognized on the basis of strong immunogenicity and potential transmission-inhibiting activities [15]. The sexual stage antigens are generally of two types. The first type consists of post-fertilization antigens, such as Pfs25 of P. falciparum and Pvs25 and Pvs28 of P. vivax, which are expressed on the zygote and ookinete surface. The second type consists of pre-fertilization antigens expressed on both male and female gamete surfaces of the malaria parasite, such as Pfs48/45 and Pfs230 [16]. Pvs25 consists of a secretory N-terminal signal sequence, four epidermal growth factor (EGF) domains containing 22 cysteine residues, and a hydrophobic C terminus, as shown in Additional file 1: Figure S1B. These EGF domains carry the consensus amino acid sequences of zygote/ookinete surface proteins of malaria parasites [17]. The protein plays an important role in various transformation processes occurring in the midgut of the mosquito. As with other vaccine candidate genes, polymorphism has been reported in the Pvs25 gene, which can hamper vaccine efficacy [18]. The present study was planned to examine the molecular epidemiology and the levels of genetic diversity (Pvcsp and Pvs25) among severe and non-severe P. vivax isolates collected from a tertiary hospital (PGIMER, Chandigarh) for a better understanding of the complexity of infection.

Study subjects
In the present study, a total of 143 P. vivax-positive patients attending PGIMER outpatient and inpatient departments (OPD and IPD) from Chandigarh and its adjoining states (Haryana, Punjab, Uttar Pradesh) fulfilled the inclusion criteria of signs and symptoms of malaria and were enrolled during the period 2013 to 2016. The blood samples were collected aseptically by a trained practitioner and were transported to the laboratory for further processing. All the samples were further confirmed for P. vivax by molecular methods as described earlier by Kaur et al. [5]. WHO-based criteria for severe malaria were followed for the classification of P. vivax-positive patients into complicated and uncomplicated P. vivax groups.

Amplification of Plasmodium vivax genetic diversity associated genes
The reference sequences of P. vivax (GU339059 and AF083502.1) were used to design primers for the P. vivax genetic diversity genes Pvcsp and Pvs25. The sequences of primers used in the study are shown in Additional file 2: Table S1. The nested PCRs for Pvcsp and Pvs25 genes were carried out on all P. vivax-positive samples. A negative control was included in each amplification reaction and precautions were taken to prevent cross-contamination. All PCR reaction mixtures were prepared using high-fidelity Platinum Taq DNA polymerase (Thermo Fisher Scientific, Inc, Wilmington, DE, USA); the final reagent concentrations are shown in Additional file 2: Table S2. The thermal cycling profiles used for the amplification of Pvcsp and Pvs25 are shown in Additional file 2: Table S3.

Nucleotide sequencing and analysis
After visualization of the amplified products of Pvcsp and Pvs25 on gel, the PCR products were purified using the PCR purification kit (Qiagen, Germany) as per the manufacturer's instructions. The nested PCR primers (NF and NR) for Pvcsp and Pvs25 were then used to perform Sanger sequencing (Genewiz INC, NJ, USA) in forward and reverse directions for all the purified products. The obtained sequences were then edited and analysed for intraspecific variation (SNPs), if any, among the sequences [10]. To investigate the genetic diversity among the Pvcsp and Pvs25 genes, MEGA version 7.0.21 software was used [19]. The rates of synonymous (dS) and non-synonymous (dN) substitutions per site were estimated using the Nei and Gojobori method with Jukes and Cantor correction [20]. MEGA was used to test the null hypothesis of strict neutrality of the gene (dN = dS): the dN−dS difference and its standard error were estimated from 1000 bootstrap replications, and a two-tailed Z-test was applied to the difference between dN and dS [21]. Because synonymous substitutions do not affect parasite fitness, they accumulate faster than non-synonymous substitutions when amino acid changes are constrained (dS > dN). On the other hand, a high rate of non-synonymous substitutions is observed if positive selection is maintaining the polymorphism (dS < dN). The null hypothesis was retained when the polymorphism showed no evidence of selection (dS = dN). To analyse the genetic relationship among the present study and worldwide haplotypes, a haplotype network was constructed [22].
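The two numerical steps named above, the Jukes and Cantor correction and the two-tailed Z-test on dN−dS, can be sketched in a few lines. This is a minimal illustration and not the MEGA implementation: the function names are hypothetical, the input proportions and the standard error are made-up values, and the standard error is assumed to have been obtained elsewhere (for example by bootstrap).

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def z_test_neutrality(d_n: float, d_s: float, se_diff: float) -> tuple[float, float]:
    """Two-tailed Z-test of the null hypothesis dN = dS.

    se_diff is the standard error of (dN - dS), assumed to be supplied
    by the caller (e.g. from bootstrap replicates).
    """
    z = (d_n - d_s) / se_diff
    # Two-tailed p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical per-site proportions of non-synonymous and synonymous differences.
d_n = jukes_cantor(0.012)
d_s = jukes_cantor(0.008)
z, p_value = z_test_neutrality(d_n, d_s, se_diff=0.004)  # se_diff is illustrative
print(f"dN={d_n:.5f} dS={d_s:.5f} Z={z:.2f} p={p_value:.3f}")
```

A positive Z with a small p-value would point towards dN > dS, a negative one towards dS > dN.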

Results
In the present study, a total of 143 vivax malaria-positive patients were enrolled, and demographic details and clinical history were collected at the time of sample collection (Table 1). The majority of P. vivax patients were from neighbouring states of Chandigarh: Haryana (29.4%), Punjab (23.7%) and Uttar Pradesh (21.7%). Of the 18.8% (27/143) of P. vivax cases from Chandigarh itself, the majority had a history of travel to adjoining malarious regions.
The nested PCR for Pvcsp and Pvs25 was performed in a total of 143 P. vivax isolates. The sequencing of Pvcsp and Pvs25 fragments was successful in a total of 81% (116/143) and 88% (126/143) of P. vivax-positive clinical isolates, respectively. The multiple sequence alignment (MSA) of the deduced protein sequence was performed as shown in Additional file 3: Figure S2.

Plasmodium vivax circumsporozoite protein sequence analysis
The Pvcsp sequence analysis revealed only single infections of the VK210 and VK247 variant types, without any mixed infection, when compared to the reference Sal-I sequence GU339059. The majority (99%; 115/116) of the P. vivax isolates were of the VK210 variant type of Pvcsp, and only one isolate, in the uncomplicated group, was of the VK247 variant type. No P. vivax-like type of Pvcsp was identified in the clinical isolates. The VK210 isolates (115/116) consisted of variable repeats of mainly two PRMs, GDRADGQPA (PRM1) and GDRAAGQPA (PRM2), followed by the conserved post-repeat sequence GNGAGGQAA (PRM4). Only one of the 115 isolates carried a third type of PRM, GDRAAGLPA (PRM3). The KLKQP pre-repeat sequence was observed in all the clinical isolates of Pvcsp. Table 2 shows the observed non-synonymous substitutions in the PRMs, which gave rise to different PRM types on the basis of different repeat allotypes (RATs). In the complicated group, the rate of non-synonymous substitution per site was higher than the rate of synonymous substitution and higher than in the uncomplicated group, giving a dN−dS of 0.003 ± 0.0010 (SEM), which was not statistically significant (Table 3). The dN/dS ratio observed for Pvcsp in the complicated group was > 1, which suggests that the Pvcsp sequences of this group are under positive selection, compared to those of the uncomplicated group, where the ratio was < 1. The haplotype network analysis showed 28 haplotypes among the studied sequences (Fig. 2).
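The PRM composition described above can be tallied by splitting a VK210 central repeat region into 9-residue units and labelling each unit. The sketch below is illustrative only: it assumes the input has already been trimmed to the repeat region and is in frame, the function name is hypothetical, and the toy sequence is invented rather than taken from any isolate in this study.

```python
from collections import Counter

# Motifs and labels as named in the text.
PRM_LABELS = {
    "GDRADGQPA": "PRM1",
    "GDRAAGQPA": "PRM2",
    "GDRAAGLPA": "PRM3",
    "GNGAGGQAA": "PRM4",  # conserved post-repeat unit
}
MOTIF_LENGTH = 9


def repeat_allotype(repeat_region: str) -> list[str]:
    """Label consecutive 9-residue units of an in-frame repeat region."""
    if len(repeat_region) % MOTIF_LENGTH != 0:
        raise ValueError("repeat region length must be a multiple of 9")
    units = [
        repeat_region[i:i + MOTIF_LENGTH]
        for i in range(0, len(repeat_region), MOTIF_LENGTH)
    ]
    # Units not listed in PRM_LABELS are reported as "other".
    return [PRM_LABELS.get(unit, "other") for unit in units]


# Invented toy region: two PRM1 units, three PRM2 units, then the post-repeat unit.
toy_region = "GDRADGQPA" * 2 + "GDRAAGQPA" * 3 + "GNGAGGQAA"
labels = repeat_allotype(toy_region)
counts = Counter(labels)
print(labels)
print(counts)
```

The ordered list of labels describes the arrangement of repeat units in one sequence, while the counter gives the per-motif frequencies.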

Plasmodium vivax sexual stage antigen 25 (Pvs25) sequence analysis
The  Table 4). The rate of non-synonymous and synonymous

Discussion
Malaria is a disease of global importance and is one of the most important life-threatening parasitic infections affecting human beings. In recent years, an upsurge of severe vivax malaria infection has been reported from various parts of the world, including India [24]. A genetically diverse parasite population is thought to have increased potential to resist anti-malarials, vaccines and the host immune response. Among the various genetic markers, Pvcsp has been used by many researchers from different geographical regions to elucidate population genetics and evolutionary dynamics [8,25,26]. In the present study, the genetic diversity of Pvcsp was estimated among clinical isolates collected from PGIMER, Chandigarh. The results are in concordance with previously published results showing VK210 in 81-100% of isolates, emphasizing the dominance of VK210 over VK247 [6,7,9]. Variations in the number of repeat units, along with differing amino acid and nucleotide sequences in repeat regions of Plasmodium antigens, are suggestive of natural selection pressure imposed by the host immune system [27]. In the present study, GDRADGQPA (PRM1) and GDRAAGQPA (PRM2) were found to be the two major PRMs. Earlier studies have reported the prevalence of these two major PRMs (GDRADGQPA, GDRAAGQPA) in clinical isolates [8,28]. All the isolates carried the same pre-repeat sequence (KLKQP region) and the conserved post-repeat sequence GNGAGGQAA (PRM4). This conserved post-repeat sequence was present as the last unit in all the VK210 isolates, as also reported in previous studies from Iran and Sri Lanka [8,28]. One more peptide repeat motif, GDRAAGLPA (PRM3), was observed at lower frequency (0.9%) among the isolates of the present study. The numbers of these PRMs are the main contributing factor to genetic diversity in the Pvcsp gene among different geographical regions.
The mode of evolution of CSP in P. vivax is thought to be similar to that of CSP in P. falciparum. In P. falciparum it is known to include repeated non-reciprocal, intrahelical recombination events during mitotic DNA replication [29]. This phenomenon might generate novel variants capable of evading the host immune response. Thus, the major factors responsible for sequence evolution in the natural population involve mitotic recombination accompanied by positive selection of new variants in P. vivax [12]. Point mutations and intragenic recombination events might also have played a role in the generation of RATs, which display remarkably similar arrangements, suggestive of a relatively recent origin from a common ancestor [12].
In the present study, the specific arrangement of the two dominant PRMs, GDRADGQPA (PRM1) and GDRAAGQPA (PRM2), gave rise to a total of 28 different haplotypes in Pvcsp. The observed dN/dS ratio was more than 1 in the complicated group of patients, unlike in the uncomplicated group, which is suggestive of positive selection events occurring in the complicated group P. vivax isolates. The observed sequence variation in VK210 between complicated and uncomplicated groups might be due to several factors, such as the distribution pattern of vector species (Anopheles stephensi, Anopheles culicifacies, Anopheles subpictus), differences in the infectivity of different Pvcsp genotypes for these vector species, and/or host immunity against certain genotypes [30,31].
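Counting haplotypes amounts to grouping isolates that share an identical sequence. The following is a minimal sketch of that idea, not the haplotype network software used in the study; the function name, isolate identifiers and toy sequences are all hypothetical, and the sequences are assumed to cover the same trimmed region.

```python
from collections import defaultdict


def group_haplotypes(sequences: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct sequence (haplotype) to the isolates carrying it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for isolate_id, sequence in sequences.items():
        groups[sequence.upper()].append(isolate_id)
    return dict(groups)


# Invented toy data: two isolates share one arrangement of repeat units.
toy_sequences = {
    "isolate_1": "GDRADGQPAGDRAAGQPA",
    "isolate_2": "GDRADGQPAGDRAAGQPA",
    "isolate_3": "GDRAAGQPAGDRADGQPA",
}
haplotypes = group_haplotypes(toy_sequences)
print(len(haplotypes), "haplotypes")
for sequence, isolates in haplotypes.items():
    print(sequence, isolates)
```

Exact string matching is the simplest possible definition of a haplotype; sequences with ambiguous base calls or gaps would need additional handling.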
Pvs25 is one of the most promising vaccine candidate proteins among the cysteine-rich protein family. However, vaccine efficacy could be hindered by the reported antigenic diversity in Pvs25 [32]. In the present study, all isolates in both groups of patients were double mutants carrying the combination E97Q/I130T. These changes lie in the EGF2 (E97Q) and EGF3 (I130T) domains of Pvs25. No novel amino acid substitution was observed in Pvs25 in the present study. Chaurio et al. [22] reported an overall low level of variation in Pvs25, with E97Q in 50% and I130T in 89% of the isolates, lower than in the present study. The dominant prevalence of the double amino acid substitution (97Q/130T) observed in the present study was similar to that in a previous study from Iran, in which 97Q/130T was found in 84% of clinical isolates [18]. In contrast, Prajapati et al. observed a very high rate of non-synonymous substitution in the Pvs25 gene (n = 10 amino acid substitutions) in isolates from India [33], whereas only two non-synonymous amino acid substitutions, forming the double mutant haplotype, were found in the present study. Another study from China reported the 97Q and 130T substitutions in 20% and 100% of the isolates, respectively [34]. The E97Q amino acid substitution has been reported mainly from Asian isolates, i.e., from Bangladesh, Thailand, Indonesia, and South Korea [15,33,35-37], whereas the I130T amino acid substitution has been reported from both Asia and America [34]. The EGF2 and EGF3 domains of Pvs25 were reported to contain epitope recognition sites for blocking antibodies [38]. Although sexual stage-specific proteins such as Pvs25 are adapted to the environment inside the vector's body, where the parasite completes its life cycle, the polymorphism observed among these surface antigens has been associated with selective pressure exerted by the human immune system [39].
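Calling a double mutant such as E97Q/I130T comes down to comparing the residues of each isolate with the reference at the positions of interest. The sketch below illustrates this under simplifying assumptions: residues are supplied as a position-to-amino-acid mapping (a hypothetical representation, standing in for a proper alignment against the reference), the function name is invented, and the reference residues are those implied by the substitution names in the text.

```python
def call_substitutions(reference: dict[int, str], sample: dict[int, str]) -> list[str]:
    """Return substitutions such as 'E97Q' at the reference positions."""
    calls = []
    for position in sorted(reference):
        ref_aa = reference[position]
        alt_aa = sample.get(position)  # None if the position was not sequenced
        if alt_aa is not None and alt_aa != ref_aa:
            calls.append(f"{ref_aa}{position}{alt_aa}")
    return calls


# Reference residues implied by the names E97Q and I130T.
REFERENCE_SITES = {97: "E", 130: "I"}

# Hypothetical isolate carrying both substitutions.
isolate_sites = {97: "Q", 130: "T"}
calls = call_substitutions(REFERENCE_SITES, isolate_sites)
is_double_mutant = calls == ["E97Q", "I130T"]
print(calls, is_double_mutant)
```

Applying the same check to every isolate and counting those flagged would give the proportion of double mutants per group.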

Conclusion
Population genetic studies, such as the present study, are required to understand the population genetic structure and to identify signatures of balancing selection within P. vivax surface antigens. These studies will enable the identification of domains targeted by host immune pressure and an understanding of the mechanism of the host immune response, which will help to identify potential vaccine candidates [40]. A better understanding of genetic variability in different geographical regions will shed light on the role of these genetic variants in severe vivax malaria. These studies will be key for designing and implementing efficacious vaccines. The results of the present study will be used as baseline data for future studies.
Additional file 1: Figure S1. Schematic diagram of A) Pvcsp containing signal sequence (S), RI domain, central repeat region domain, post repeat region (PR), RII region containing thrombospondin repeat (TSR) and an anchor sequence; B) Pvs25 containing signal sequence (SS), four EGF domains, and glycosylphosphatidylinositol (GPI) anchor.
Additional file 2: Table S1. Primers used for the amplification of Pvcsp and Pvs25 genes. Table S2. Final concentration of PCR reagents used for nested and conventional PCRs of Pvcsp and Pvs25. Table S3. Thermal cycling profile used for the amplification of Pvcsp and Pvs25 genes.
Additional file 3: Figure S2. Multiple sequence alignment (MSA) of A) Plasmodium vivax circumsporozoite protein (Pvcsp) and B) Plasmodium vivax sexual stage antigen Pvs25 of the Plasmodium vivax clinical isolates using Clustal X 2.1.