Merozoite surface protein-3 alpha as a genetic marker for epidemiologic studies in Plasmodium vivax: a cautionary note
- Benjamin L Rice†1,
- Mónica M Acosta†1,
- Maria Andreína Pacheco1 and
- Ananias A Escalante1, 2Email author
© Rice et al.; licensee BioMed Central Ltd. 2013
Received: 17 May 2013
Accepted: 18 August 2013
Published: 21 August 2013
Plasmodium vivax is the most widespread of the human malaria parasites in terms of geography, and is thought to present unique challenges to local efforts aimed at control and elimination. Parasite molecular markers can provide much needed data on P. vivax populations, but few such markers have been critically evaluated. One marker that has seen extensive use is the gene encoding merozoite surface protein 3-alpha (MSP-3α), a blood-stage antigen known to be highly variable among P. vivax isolates. Here, a sample of complete msp-3α gene sequences is analysed in order to assess its utility as a molecular marker for epidemiologic investigations.
Amplification, cloning and sequencing of additional P. vivax isolates from different geographic locations, including a set of Venezuelan field isolates (n = 10), yielded a sample of 48 complete msp-3α coding sequences. Characterization of standard population genetic measures of diversity, phylogenetic analysis, and tests for recombination were performed. This allowed comparisons to patterns inferred from the in silico simulation of a polymerase chain reaction restriction fragment length polymorphism (PCR-RFLP) protocol used widely.
The larger sample of MSP-3α diversity revealed incongruence between the observed levels of nucleotide polymorphism, which were high in all populations, and the pattern of PCR-RFLP haplotype diversity. Indeed, PCR-RFLP haplotypes were not informative of a population’s genetic diversity and identical haplotypes could be produced from analogous bands in the commonly used protocol. Evidence of frequent and variable insertion-deletion mutations and recurrent recombination between MSP-3α haplotypes complicated the inference of genetic diversity patterns and reduced the phylogenetic signal.
The genetic diversity of P. vivax msp-3α involves intragenic recombination events. Whereas the high genetic diversity of msp-3α makes it a promising marker for some epidemiological applications, the ability of msp-3α PCR-RFLP analysis to accurately track parasites is limited. Local studies of the circulating alleles are needed before implementing PCR-RFLP approaches. Furthermore, evidence from the global sample analysed here suggests such msp-3α PCR-RFLP methods are not suitable for broad geographic studies or tracking parasite populations for an extended period of time.
KeywordsPlasmodium vivax Merozoite surface protein-3 alpha Molecular markers Parasite diversity Population genetics PCR-RFLP Recombination
Evolutionary genetics theory indicates that the interplay of mutations with multiple, different processes, such as natural selection and recombination, can maintain high levels of genetic variability for a locus. For epidemiological investigations focused on tracking and characterizing parasite populations, it is then important to account for what evolutionary forces have shaped the genetic variation at a locus used as a marker. Understanding such processes allows the user to interpret patterns in an epidemiological context. Indeed, two alleles could be similar by convergence due to natural selection (identical by state) but have totally independent origins [34, 35]. Thus, rather than just focusing on high diversity per se, epidemiologists should consider how evolutionary processes can obscure inferences about population patterns when using specific genetic loci as markers in their investigations.
An important aspect of any technique used in measuring or characterizing genetic diversity is the actual interpretation of the observed polymorphism. Specifically, a critical consideration is whether or not the observation of two identical genetic patterns can be interpreted as evidence that such alleles or genotypes are identical by descent, that is, that they share a common ancestor at a recent time in the past . This allows the user to claim that, in a given epidemiologic context, such identity provides evidence of a relevant event (e.g., recrudescence or separating a local case from an introduction).
To this end, a larger sample set of msp-3α sequences was generated in order to better characterize the pattern of genetic variation in P. vivax populations at this locus. The ability of a commonly used PCR-RFLP protocol to accurately capture the pattern of P. vivax msp-3α diversity was then evaluated. Despite finding substantial nucleotide diversity between isolates, there was limited evidence that variation in the msp-3α coding region was structured by geography or by PCR-RFLP haplotype. Rather, diversity was high and similar across geographic regions, and showed a pattern consistent with recurrent recombination. Such complexities are not easily interpretable without suitable sequencing data. This study stresses the importance of locally evaluating the use of PCR-RFLP protocols targeting msp-3α as a marker in molecular epidemiologic investigations in P. vivax. Furthermore, the problems and limitations observed in msp-3α should be considered in other simple PCR based-fragment size genotyping techniques that target repetitive regions in loci encoding merozoite surface antigens.
To generate a global sample of the genetic diversity of MSP-3α [PlasmoDB  ID: PVX_097720], all publically available sequences were retrieved. In addition, in order to increase the probability of sampling the most divergent msp-3α alleles, sequences were newly obtained from ten geographically and temporally diverse laboratory isolates from across the parasite’s broad distribution (Figure 1 and Additional file 1). These strains (with their year of isolation, if available) were: North Korean (1953), Indonesia I (1990), Thai III, Vietnam-Palo Alto (before 1978), India VII (2001), Salvador I (1970), Panama I (1969), Brazil I (1994), Mauritania I (1998), and Chesson from New Guinea (1944). To analyse msp-3α diversity at a finer, local scale, sequences of 10 clinical isolates collected in Tumeremo, Venezuela (Bolívar State) in 2003-2004 were also generated. Sequences available in the National Center for Biotechnology Information nucleotide database included 17 sequences of isolates collected from three provinces of Thailand in 2001-2002 by Mascorro et al  that allowed assessment of local diversity in another region (accession codes for the P. vivax sequences used are shown in Additional file 1). Combined, a sample of 48 complete P. vivax msp-3α coding sequences was analysed.
To further understand the evolution of the msp-3α gene and its standing diversity, its ortholog in the macaque parasite Plasmodium cynomolgi, the most closely related species to P. vivax known, was amplified, cloned and sequenced from nine strains. The P. cynomolgi samples spanned the species’ South East Asian distribution and included isolates from Peninsular Malaysia (Berok, B Strain, PT1, PT2, Gombak, and Mulligan), Cambodia (Cambodian), Sri Lanka (P. c. ceylonensis), and Burma (RO). Generating population polymorphism data for the gene’s ortholog in a closely related species allows more sophisticated tests of natural selection and the ability to observe the pattern of nucleotide variation in a lineage of the gene unbiased by P. vivax specific demographic events.
Due to the previously known variance in gene length among members of the MSP-3 family of P. vivax (ranging from 1,100 to 3,700 bp) and among MSP-3α alleles, amplification by PCR was performed using two different enzymes in order to increase the probability of amplifying msp-3α alleles that potentially differed in length: AmpliTaq Gold (Applied Biosytems, Roche, USA) and the TaKaRa LA Taq (TaKaRa Mirus Bio Inc, Shiga, Japan) known to better amplify longer products. There were two or more independent reactions per sample. 50 μL PCR reactions included approximately 20 ng of total genomic DNA, 1X PCR buffer, 0.05-0.1 mM of each dNTP, and 0.2-0.4 μM of each primer. Other reagent concentrations and the primers used depended on the polymerase used: TaKaRa reactions included 2.5 U/μM polymerase and primer sequences of 5’ AAG AAA ATT TAC TYS AAA GGS AGT TAA CCG 3’ and 5’ TGT TCT CAA YCG ACA TGC RAA TTR GCT AG 3’; for AmpliTaq Gold reactions, 2.5 mM MgCl2 and 0.03 U/μM of polymerase was used with the primers 5’ ATG AAA CAC ACC CGC AGC GTC 3’ and 5’ GCT CAA AAA TAG GTG ATT CAT ATC GG 3’. The TaKaRa thermocycle protocol was: an initial denaturation step at 94°C for 1 minute; 35 cycles of denaturation at 94°C for 30 seconds followed by annealing and elongation at 65°C for 3 minutes; and a final elongation step of 10 minutes at 72°C. The AmpliTaq Gold protocol was: an initial denaturation step at 94°C for 4 minutes; 35 cycles of denaturation (94°C for 1 minute), annealing (54-60°C for 1 minute), and elongation (72°C for 2 minutes); and a final elongation step of 10 minutes at 72°C. PCR products were excised from 0.9% agarose gels and purified using the QIAquick DNA gel extraction kit (QIAgen™, Hilden, Germany), then cloned using the pGEM®-T Easy Vector System (Promega, WI, USA). Both strands for multiple clones were sequenced using an Applied Biosystems 3730 capillary sequencer. Sequences generated in this study have the NCBI accession codes: [GenBank: KC935422-KC935447].
Evolutionary genetic analyses
In order to calculate genetic diversity and infer phylogenetic relationships, three alignments were constructed: P. vivax and P. cynomolgi species-specific alignments and an interspecies alignment of both P. vivax and P. cynomolgi. Alignment was performed using the MUSCLE algorithm  in SeaView4  on translated sequences, followed by manual editing. Phylogenetic relationships were estimated using Bayesian methods as implemented in MrBayes v3.1.2  and using the neighbor joining (NJ) method  implemented in MEGA5 v5.05 . The distance based NJ tree was constructed using the Kimura 2-parameter model of nucleotide substitution to calculate a pairwise nucleotide distance matrix, with invariant rates among sites and 1000 bootstrap replicates to assess the confidence of clades. The pairwise distance matrix was also used separately to identify pairs of more similar sequences.
For the Bayesian methods, a general time reversible + gamma + evolutionarily invariable (GTR + G + I) model was used. This model was the one with the fewest parameters to best fit the data as estimated by MEGA5. Bayesian support for nodes was inferred in MrBayes using 42 x 106 Markov Chain Monte Carlo (MCMC) steps, with sampling every 100 generations. 50% of the samples were discarded as burn-in. Convergence is reached after the average standard deviation of the posterior probability is below 0.01 and the value of the potential scale reduction factor is between 1.00 and 1.02 .
Genetic diversity among isolates was estimated using the average number of pairwise nucleotide differences per site (overall mean distance, d) in MEGA5. The Jukes-Cantor model of nucleotide substitution was used to correct for multiple mutations when calculating distance, and 1000 bootstrap replicates determined the standard error. To investigate possible signatures of selection, the diversity at synonymous (dS) and nonsynonymous (dN) sites was calculated separately using the Nei-Gojobori method  with the Jukes-Cantor correction. To determine whether the estimated difference between dS and dN deviated significantly from the null expectation, the standard error of dS and dN was calculated with 1000 bootstrap replicates and tested using a two-tailed Z test , a commonly used evolutionary genetic test (e.g., [44–48]). A null hypothesis of neutrality was assumed, with the expectation that synonymous and nonsynonymous positions would have similar levels of diversity (dS = dN). Genetic differentiation between subpopulations was estimated using the FST statistic  as calculated in DnaSP v5 . Two further tests of neutrality were performed in DnaSP, Tajima’s test based on the frequency of variants as summarized by the D statistic , and the McDonald and Kreitman test  comparing polymorphism and divergence in the P. vivax and P. cynomolgi alignment.
To identify putative recombination events, the RDP algorithm  within the RDP3 program  was used to perform an automated screen of the P. vivax alignment. Default parameters for the detection of recombination break points and donor-recipient pairs were used with a significance cut-off of 0.05. Only the events where the donor and recipient sequences could be unambiguously identified were recorded.
Simulated PCR-RFLP analysis
The PCR-RFLP analysis of Bruce et al  that has been frequently used to genotype P. vivax clinical isolates for MSP-3α was simulated in order to compare PCR-RFLP haplotype diversity to the pattern of nucleotide diversity inferred from the set of full length sequences. The protocol is described in Bruce et al . Briefly, msp-3α genotypes are distinguished by variation in gene length due to insertion-deletion (further referred to as indels) mutations and by sequence variability at restriction enzyme cut sites. For the first step, alleles are separated using variation in the size of the product from amplifying the indel rich region of the gene by nested PCR. Sequences of similar length are then further differentiated on the basis of the banding patterns resulting from digestion of the nested PCR product separately with two restriction enzymes (Alu I and Hha I).
In silico digestion was then performed on the sequences using a Perl script (modified from ). The positions of enzyme cute sites were recorded for both enzymes for each sequence and the resultant restriction digestion bands were arranged by size (Figure 2). To be conservative and to identify the most PCR-RFLP haplotypes possible, it was assumed that only bands differing by less than ±30 bp would be indistinguishable during electrophoresis, a very fine scale detection limit for agarose gels with a standard DNA ladder (Life Technologies™, Carlsbad, CA manufacturer’s protocol). Unique combinations of Alu I and Hha I digestion bands for samples within a size class were used to create PCR-RFLP haplotypes. Each sequence within a size class was assigned to the corresponding haplotype. As per the Bruce et al. protocol, bands less than 100 bp in length were disregarded when determining haplotypes.
Results and discussion
Genetic diversity of P. vivax
Genetic polymorphism of MSP-3α in P. vivax and P. cynomolgi populations
Average indel length
Z-stat p value
Sequence polymorphism between isolates varied, but on average there were a high number of segregating (polymorphic) nucleotide sites between isolates in pairwise comparisons (Table 1). Indeed, all P. vivax isolates sampled here could be distinguished by their full length gene sequence. The number of sites differing between isolates ranged from a single missense mutation resulting in an isoleucine to phenylalanine change at residue 74 for two Venezuelan isolates, to over 100 segregating sites for multiple pairs of isolates. The average number of nucleotide differences between sequences in the sample was 38.0, for an average genetic distance of 0.027 ± 0.003 differences per site (Table 1). Diversity was not distributed uniformly in the gene, however, with polymorphism at synonymous sites (dS = 0.044 ± 0.007) significantly higher than at nonsynonymous sites (dN = 0.023 ± 0.003, p = 0.004). This indicates that these aligned sites could be evolving under functional constraints as evidenced by the significant excess of synonymous changes in almost all comparisons from the two-tailed Z test (Table 1).
Diversity among the Thai (d = 0.033 ± 0.002) and Venezuelan (d = 0.031 ± 0.002) clinical isolates was slightly higher but approximated that seen among the 21 geographically and temporally diverse ‘worldwide’ sample of isolates (d = 0.027 ± 0.003). Combining the 27 clinical isolates from these two distant subpopulations with the other 21 diverse isolates did not result in a change in the amount of genetic diversity captured, as average pairwise divergence for the total sample of 48 complete P. vivax msp-3α sequences remained 0.027 ± 0.003 (Table 1).
Genetic diversity among indel haplotypes for MSP-3α
Gene length (bp)
Average pairwise divergence per site (SE)
Geographic origin of isolates
At all informative sites among the group
At sites shared by all isolates
New Guinea, Thailand
Malaysia, El Salvador, Thailand
Mauritania, Bangladesh, Brazil, Thailand
Another observation was that seven of the 10 Venezuelan clinical isolates constituted indel haplotype 7 and had a substantially reduced level of nucleotide diversity (d = 0.007 ± 0.001) (Table 2) relative to the entire P. vivax sample. This explains the reduced indel haplotype diversity and reduced average indel size seen in the Venezuelan population (Table 1). It is worth noting, however, that despite this cluster of seven similar sequences, the other three Venezuelan sequences were highly divergent. Their inclusion raised the average pairwise divergence within Venezuela to a value approximately equivalent to that of the diverse global sample (Table 1).
With this evidence that local genetic diversity in both Thailand and Venezuela recapitulated the level of global MSP-3α diversity, a standard measure of population differentiation, the FST statistic, was used to investigate whether variation at this locus was indeed distributed randomly across populations. The FST value, which compares the amount of diversity within populations to the amount of divergence between, for the Venezuelan and Thai sample populations was moderately high likely due to the geographic distance (FST = 0.118).
Genetic diversity of P. cynomolgi
Plasmodium cynomolgi is the closest known relative of P. vivax[56–58] and it parasitizes simian hosts, mostly of the genus Macaca, in Southeast Asia . Sequencing of its msp-3α ortholog from nine strains revealed extensive nucleotide polymorphism between strains (d = 0.064 ± 0.004), similar to that seen for P. vivax (Table 1). Although the average number of segregating sites among the nine P. cynomolgi sequences was higher than that observed for P. vivax, there was much less indel diversity among the P. cynomolgi strains sampled. The difference between the smallest and largest alleles was only 9 bp as a result, as opposed to a difference of more than 800 bp between some P. vivax isolates. Maximum gene length for P. cynomolgi (1,968 bp) was substantially shorter than for P. vivax (2,601 bp). This observation suggests that P. vivax may have inherited a shorter gene from its common ancestor with P. cynomolgi. In such a case, rather than deletions being responsible for the shorter alleles seen in P. vivax, large insertions in a shorter ancestral sequence would explain the range in allele length. The direction of indel mutations cannot be inferred from alignment, however, and the larger P. vivax alleles have been found repeatedly to be the predominant type (e.g., [13, 26, 30, 31, 33], as was the case in this sample here. Further, diversity among indel haplotypes for sequences with lengths over 2500 bp (indel haplotypes 3-6) varied, but was within the range seen for all haplotype groups in P. vivax (Table 2). Diversity would be expected to be reduced among more recently arisen allele types, but this failed to be the case for either the longer or shorter alleles. Finally, due to the prevalence of indel mutations of varying lengths, gene length varied almost continuously and thus it was not possible to easily identify distinct allele ‘types’ based solely on size in most cases.
As with P. vivax, a significant majority of the diversity seen in P. cynomolgi was constrained to synonymous sites (p = 0.020, Table 1). It is noteworthy, though, that despite it being significantly less than at synonymous sites, diversity at non-synonymous sites in both P. vivax (dN = 0.023 ± 0.003) and P. cynomolgi (dN = 0.058 ± 0.005) was substantial, with amino acid changes accordingly frequent. Intraspecies diversity for MSP-3α was comparable to that seen for other merozoite surface proteins, such as MSP-5 , and the 200 L fragment of MSP-1 , but was substantially higher than for those thought to be evolving under purifying selection such as MSP-7 , MSP-8, and MSP-10 .
To further test for evidence of a distribution of genetic variation that deviated from expectations of neutrality and to take advantage of the newly generated set of P. cynomolgi sequences, the test of McDonald and Kreitman was performed using the P. vivax and P. cynomolgi alignment. Although there were substantially more non-synonymous than synonymous differences fixed between species, the ratio of fixed to polymorphic sites did not differ significantly (Fisher’s exact test, p = 0.80), indicating that the observed patterns of divergence were still consistent with neutrality. Tajima’s tests of neutrality for both P. vivax and P. cynomolgi (D = -0.987 and D = 0.392, respectively) were insignificant as well.
PCR-RFLP haplotype diversity
Genetic diversity among PCR size classes and PCR-RFLP haplotypes for MSP-3α
Band size (bp)
Average pairwise divergence per site (SE)
Geographic origin of isolates
At all informative sites among the group
At sites shared by all isolates
Size Class 1
Asia, Africa, South Americaa
Size Class 2
Size Class 3
Brazil, Thailand, Malaysia
Size Class 4
Thailand, New Guinea
Size Class 5
Korea, Mauritania, Thailand, Brazil
Thailand, New Guinea
In silico digestion of the predicted nested PCR products with Hha I resulted in msp-3α sequences being cut into three to seven bands (Figure 2). Variance among sequences in the size and combinations of these bands, due to sequence diversity at and between cut sites, allowed distinguishing 13 Hha I haplotypes among the 27 sequences in size class 1. Three, two and two different Hha I haplotypes were identified among sequences in size classes 2, 3, and 4, respectively. Sequences also had from 4 to 13 digestion sites for Alu I that resulted in banding patterns that allowed the further discrimination of 10 more msp-3α genotypes, for a total of 31 distinct PCR-RFLP haplotypes from the sample of 48 sequences. As a result, 17 of the 48 sequences were within one of the eight PCR-RFLP haplotypes that was shared by more than one sequence after in silico digestion with both enzymes. This therefore allowed the genetic diversity within these eight haplotypes to be calculated (Table 3). PCR-RFLP analysis, as simulated here, was unable to resolve all samples to distinct genotypes, and in six of the eight PCR-RFLP haplotypes shared by more than one sequence, the sequences within that group originated from distant geographic locations (Table 3).
An initial observation made when comparing these PCR-RFLP alleles to the pattern of nucleotide diversity was that similarly sized amplicons, the basis of distinguishing PCR-RFLP size classes, were often produced by different indel haplotypes. From 27 sequences with gene lengths of 2526-2601 bp (resulting in an 1884-1950 bp amplicon for digestion using the Bruce et al. method) 17 different combinations of indels were observed from sequencing, with indel events averaging 27 bp in length. With numerous indels that averaged a size close to the assumed detection limit of 30 bp segregating into many combinations for class 1, raw amplicon length in many cases simply reflected a convergent sum over these mutation events (indels) and failed to reflect the high levels of MSP-3α polymorphism even within this class. Further, as shown in Table 3, nucleotide diversity within PCR-RFLP size classes (d = 0.024 ± 0.002 to 0.052 ± 0.003) varied, but approximated the amount of diversity seen in the total sample population of globally diverse isolates (d = 0.027 ± 0.003, see Table 1).
All ten Venezuelan MSP-3α isolates fell within the same size class, and included a cluster of 5 isolates (haplotype 2.1) that had markedly less nucleotide diversity (d = 0.002 ± 0.001) than the other PCR-RFLP alleles found in 2 or more samples (d = 0.016 ± 0.002 and 0.058 ± 0.005). However, this reduced PCR-RFLP allelic diversity in Venezuela contradicts that the total nucleotide diversity retained within the Venezuelan sample population was similar to that of Thailand, and the entire sample combined (Table 1), despite that those two groups had much greater allelic diversity. More evidence that MSP-3α genetic variation was not structured by PCR-RFLP haplotypes was that isolates with identical banding patterns, such as PCR-RFLP haplotype 1.1 (Table 3), could again exhibit more average pairwise divergence (d = 0.043 ± 0.003) than was seen on average between sequences in the entire sample (d = 0.027 ± 0.003, Table 1).
The Alu I digestion for 47 of the 48 samples resulted in sequences having two or more different digestion fragments from different regions of the gene that differed by less than the ±30 bp discrimination limit. In three additional cases to the one above, identical Alu I haplotypes would have been inferred erroneously after electrophoresis due to coincident size in non-homologous bands for divergent sequences. Multiple digestion bands coinciding to have a similar size resulted from the Hha I digestion much less frequently, only occurring for six of the 48 isolates; however, similar Hha I banding patterns also often failed to correlate to similarity at the nucleotide sequence level. For example, the average pairwise diversity for three Hha I RFLP haplotypes (haplotypes H1.1, H1.4, and H2.2) was 0.046 ± 0.003, 0.056 ± 0.005, and 0.064 ± 0.005, respectively, all values much greater than was seen on average between sequences chosen randomly (d = 0.027 ± 0.003).
Phylogenetic analysis and recombination
However, an observation from the pairwise distance matrix and one consistent across both phylogenetic methods was that there was no discernible pattern of more similar sequences having a closer geographic origin or more similar PCR-RFLP haplotype. Although five of the Venezuelan clinical isolates that shared the same PCR-RFLP haplotype (haplotype 2.1) grouped together, other Venezuelan sequences obtained from Bolívar State were distributed throughout the tree in both the Bayesian and NJ analyses (Figure 4). Furthermore, the sequences of these clinical isolates collected in Venezuela in 2003-2004 were often most similar to isolates collected distantly in time and space. Examples include Venezuelan isolate 1060, which was most similar to an isolate collected in Brazil before 1988 and an Indonesian isolate from 1990. Another example, Venezuelan isolate 166, grouped with a Vietnamese isolate from before 1978 (Figure 4). The relationship of Venezuelan isolate 1081 to the other P. vivax sequences was difficult to infer, with it having the least genetic distance to two Thai clinical isolates from 2000-2001 in the NJ analysis, but situating within a large, unresolved polytomy in the Bayesian tree (Figure 4). Further, the inability to resolve the cluster of five Venezuelan clinical isolates (isolates 109, 120, 127, 164, 167, and 168) from distant strains (e.g., Panama I, Chesson from New Guinea, Thai III) in either tree (Figure 4) makes it questionable whether MSP3-alpha could be used to resolve relationships at a finer scale. For epidemiologic efforts applied at a local scale, this may preclude using phylogenetics to accurately determine the origins of isolates or migration events with MSP-3α.
This is the pattern expected by recombination shuffling the variation at these genic regions into multiple combinations, and numerous events were detected by the RDP algorithm as well. Recombination was commonly detected between isolates from different geographic regions, between isolates in different PCR-RFLP size classes, and throughout the phylogeny (Figure 4). Frequent recombination would also explain the lack of phylogenetic resolution observed and the failure for Bayesian analysis to reach convergence on an optimal topology. Indeed, Mascorro et al  showed previously that isolates segregated differently into phylogenies made from four different regions of msp-3α, indicating different segments of haplotypes have different evolutionary histories and thus yield a conflicting phylogenetic signal for the entire gene. As a result of recombination acting to shuffle genetic variation at this locus, circulating msp-3α haplotypes are then complex composites of multiple segments that have each likely been differentially shaped by their respective population of origin’s demography or selective environment. This seems a probable explanation for the inability to resolve relationships between distant P. vivax isolates, despite recovering a high number of segregating sites.
In silico simulation of the genotyping protocol used in studies of MSP-3α diversity (see ) and sequence analysis of an augmented sample set detected high allelic diversity in the sample population. Whereas high diversity is considered intrinsically a desirable property in a genetic marker , there were incongruences observed between the patterns of nucleotide diversity at the sequence level and from the PCR-RFLP genotyping. Full length gene sequencing of small sample populations in Thailand and Venezuela failed to reveal substantial differences in nucleotide variation between these two endemic regions though they are thought to have different transmission rates and parasite population sizes [1, 62–64]. Rather, local nucleotide diversity in the two small sample populations closely approximated the global P. vivax MSP-3α diversity from a temporally and geographically diverse sample. These findings suggest that MSP-3α nucleotide diversity is, therefore, unlikely to reflect current P. vivax population trends at the local scale.
While the ability to distinguish parasite haplotypes using the diverse MSP-3α locus was considered of potential use for certain molecular epidemiology applications, such as tracking recrudescence and infection clonality, this investigation finds that these PCR-RFLP haplotypes may not be informative. Specifically, it was noted that identical PCR-RFLP haplotypes were observed between very divergent sequences and that; likely due to recombination, sequences did not contain a strong signal of their evolutionary relationships. Similar patterns have been observed when studying other loci used as markers in depth such as block 2 of MSP-1 in P. falciparum, a marker that is still used in molecular epidemiologic investigations . The situation could worsen in the context of mixed-cloned infections where restriction patterns are likely difficult to interpret. Thus, this study cautions against the use of these PCR-RFLP methods for comparing populations in extended geographic regions or whenever the sampling involves an extended time frame and migration is suspected.
Pilot sequencing studies are needed in order to validate whether msp-3α PCR-RFLP haplotypes can be tracked locally through time and space. Lack of such information hampers the ability of linking PCR-RFLP patterns with epidemiologically relevant events (e.g. a recrudescence or a re-introduction) in a given endemic area. Multi-locus microsatellite  or single nucleotide polymorphism panels  are likely more suitable for molecular epidemiologic studies than msp-3α PCR-RFLP or similar genotyping methods. The choice among these markers, microsatellites or SNPs, should be based on the time scale of the events that the study aims to document . Overall, this study indicates that high polymorphism is a necessary but not sufficient criterion for choosing a genotyping method; it is also important to understand how such variation is generated and maintained in order to interpret the observed patterns in an epidemiological context.
Merozoite surface protein-3 alpha
Polymerase chain reaction
Restriction fragment length polymorphism
Overall mean genetic distance
Diversity at synonymous and nonsynonymous (dN) sites
Markov Chain Monte Carlo.
This work was supported by the US National Institutes of Health (R01 GM080586 to AAE, and the Latin American Center for Malaria Research and Control 1U19AI089702). BLR and MMA were supported in part by the US NIH: grant R25GM071798, ASU PREP for Biomedical Research. The authors thank Dr. William E. Collins at the Centers for Disease Control and Prevention for providing the P. cynomolgi samples, Leopoldo Villegas for providing the samples from Venezuela, and the DNA laboratory at the School of Life Sciences for their technical support. The content is solely the responsibility of the authors and does not represent the official views of the NIH.
- Gething PW, Elyazar IRF, Moyes CL, Smith DL, Battle KE, Guerra CA, Patil AP, Tatem AJ, Howes RE, Myers MF, George DB, Horby P, Wertheim HFL, Price RN, Müeller I, Baird JK, Hay SI: A long neglected world malaria map: Plasmodium vivax endemicity in 2010. PLoS Negl Trop Dis. 2012, 6: e1814-10.1371/journal.pntd.0001814.PubMed CentralView ArticlePubMed
- Carlton JM, Sina BJ, Adams JH: Why is Plasmodium vivax a neglected tropical disease?. PLoS Negl Trop Dis. 2011, 5: e1160-10.1371/journal.pntd.0001160.PubMed CentralView ArticlePubMed
- Carlton JM, Das A, Escalante AA: Genomics, population genetics and evolutionary history of Plasmodium vivax. Adv Parasitol. 2013, 81: 203-222.View ArticlePubMed
- Alonso PL, Tanner M: Public health challenges and prospects for malaria control and elimination. Nat Med. 2013, 19: 150-155. 10.1038/nm.3077.View ArticlePubMed
- Galinski MR, Barnwell JW: Plasmodium vivax: who cares?. Malar J. 2008, 7 (Suppl 1): S9-10.1186/1475-2875-7-S1-S9.PubMed CentralView ArticlePubMed
- Rayner JC, Huber CS, Feldman D, Ingravallo P, Galinski MR, Barnwell JW: Plasmodium vivax merozoite surface protein PvMSP-3 beta is radically polymorphic through mutation and large insertions and deletions. Infect Genet Evol. 2004, 4: 309-319. 10.1016/j.meegid.2004.03.003.View ArticlePubMed
- Mueller I, Galinski MR, Baird JK, Carlton JM, Kochar DK, Alonso PL, del Portillo HA: Key gaps in the knowledge of Plasmodium vivax, a neglected human malaria parasite. Lancet Infect Dis. 2009, 9: 555-566. 10.1016/S1473-3099(09)70177-X.View ArticlePubMed
- Bitencourt AR, Vicentin EC, Jimenez MC, Ricci R, Leite JA, Costa FT, Ferreira LC, Russell B, Nosten F, Rénia L, Galinski MR, Barnwell JW, Rodrigues MM, Soares IS: Antigenicity and immunogenicity of Plasmodium vivax merozoite surface protein-3. PLoS ONE. 2013, 8: e56061-10.1371/journal.pone.0056061.PubMed CentralView ArticlePubMed
- Galinski MR, Corredor-Medina C, Povoa M, Crosby J, Ingravallo P, Barnwell JW: Plasmodium vivax merozoite surface protein-3 contains coiled-coil motifs in an alanine-rich central domain. Mol Biochem Parasitol. 1999, 101: 131-147. 10.1016/S0166-6851(99)00063-8.View ArticlePubMed
- Bruce MC, Galinski MR, Barnwell JW, Snounou G, Day KP: Polymorphism at the merozoite surface protein-3alpha locus of Plasmodium vivax: global and local diversity. Am J Trop Med Hyg. 1999, 61: 518-525.PubMed
- Bruce MC, Galinski MR, Barnwell JW, Donnelly CA, Walmsley M, Alpers MP, Walliker D, Day KP: Genetic diversity and dynamics of Plasmodium falciparum and P. vivax populations in multiply infected children with asymptomatic malaria infections in Papua New Guinea. Parasitology. 2000, 121: 257-272. 10.1017/S0031182099006356.View ArticlePubMed
- Rayner JC, Corredor V, Feldman D, Ingravallo P, Iderabdullah F, Galinski MR, Barnwell JW: Extensive polymorphism in the Plasmodium vivax merozoite surface coat protein MSP-3alpha is limited to specific domains. Parasitology. 2002, 125: 393-405.View ArticlePubMed
- Cui L, Mascorro CN, Fan Q, Rzomp KA, Khuntirat B, Zhou G, Chen H, Yan G, Sattabongkot J: Genetic diversity and multiple infections of Plasmodium vivax malaria in Western Thailand. Am J Trop Med Hyg. 2003, 68: 613-619.PubMed
- Han E-T, Song T-E, Park J-H, Shin E-H, Guk S-M, Kim T-Y, Chai J-Y: Allelic dimorphism in the merozoite surface protein-3alpha in Korean isolates of Plasmodium vivax. Am J Trop Med Hyg. 2004, 71: 745-749.PubMed
- Cole-Tobian JL, Biasor M, King CL: High complexity of Plasmodium vivax infections in Papua New Guinean children. Am J Trop Med Hyg. 2005, 73: 626-633.PubMed
- Mascorro CN, Zhao K, Khuntirat B, Sattabongkot J, Yan G, Escalante AA, Cui L: Molecular evolution and intragenic recombination of the merozoite surface protein MSP-3alpha from the malaria parasite Plasmodium vivax in Thailand. Parasitology. 2005, 131: 25-35. 10.1017/S0031182005007547.View ArticlePubMed
- Ord R, Polley S, Tami A, Sutherland CJ: High sequence diversity and evidence of balancing selection in the Pvmsp3alpha gene of Plasmodium vivax in the Venezuelan Amazon. Mol Biochem Parasitol. 2005, 144: 86-93. 10.1016/j.molbiopara.2005.08.005.View ArticlePubMed
- Kim JR, Imwong M, Nandy A, Chotivanich K, Nontprasert A, Tonomsing N, Maji A, Addy M, Day NPJ, White NJ, Pukrittayakamee S: Genetic diversity of Plasmodium vivax in Kolkata, India. Malar J. 2006, 5: 71-10.1186/1475-2875-5-71.PubMed CentralView ArticlePubMed
- Zakeri S, Barjesteh H, Djadid ND: Merozoite surface protein-3alpha is a reliable marker for population genetic analysis of Plasmodium vivax. Malar J. 2006, 5: 53-10.1186/1475-2875-5-53.PubMed CentralView ArticlePubMed
- Aresha M, Sanath M, Deepika F, Renu W, Anura B, Chanditha H, Wimaladharma A, Rajitha W: Genotyping of Plasmodium vivax infections in Sri Lanka using Pvmsp-3alpha and Pvcs genes as markers: a preliminary report. Trop Biomed. 2008, 25: 100-106.PubMed
- Sutton PL, Neyra V, Hernandez JN, Branch OH: Plasmodium falciparum and Plasmodium vivax infections in the Peruvian Amazon: propagation of complex, multiple allele-type infections without super-infection. Am J Trop Med Hyg. 2009, 81: 950-960. 10.4269/ajtmh.2009.09-0132.PubMed CentralView ArticlePubMed
- Véron V, Legrand E, Yrinesi J, Volney B, Simon S, Carme B: Genetic diversity of msp3alpha and msp1_b5 markers of Plasmodium vivax in French Guiana. Malar J. 2009, 8: 40-10.1186/1475-2875-8-40.PubMed CentralView ArticlePubMed
- Dhangadamajhi G, Rout BK, Kar SK, Ranjit MR: Genetic diversity of Plasmodium vivax in a hyperendemic area predominated by Plasmodium falciparum; a preliminary study. Trop Biomed. 2010, 27: 578-584.PubMed
- Khatoon L, Baliraine FN, Bonizzoni M, Malik SA, Yan G: Genetic structure of Plasmodium vivax and Plasmodium falciparum in the Bannu district of Pakistan. Malar J. 2010, 9: 112-10.1186/1475-2875-9-112.PubMed CentralView ArticlePubMed
- Nam DH, Oh JS, Nam MH, Park HC, Lim CS, Lee WJ, Sattabongkot J, Klein TA, Ayala FJ: Emergence of new alleles of the MSP-3alpha gene in Plasmodium vivax isolates from Korea. Am J Trop Med Hyg. 2010, 82: 522-524. 10.4269/ajtmh.2010.08-0332.PubMed CentralView ArticlePubMed
- Prajapati SK, Joshi H, Valecha N: Plasmodium vivax merozoite surface protein-3 alpha: a high-resolution marker for genetic diversity studies. J Vector Borne Dis. 2010, 47: 85-90.PubMed
- Wickramarachchi T, Premaratne PH, Dias S, Handunnetti SM, Udagama-Randeniya PV: Genetic complexity of Plasmodium vivax infections in Sri Lanka, as reflected at the merozoite-surface-protein-3alpha locus. Ann Trop Med Parasitol. 2010, 104: 95-108. 10.1179/136485910X12607012374190.View ArticlePubMed
- Zakeri S, Raeisi A, Afsharpad M, Kakar Q, Ghasemi F, Atta H, Zamani G, Memon MS, Salehi M, Djadid ND: Molecular characterization of Plasmodium vivax clinical isolates in Pakistan and Iran using pvmsp-1, pvmsp-3alpha and pvcsp genes as molecular markers. Parasitol Int. 2010, 59: 15-21. 10.1016/j.parint.2009.06.006.View ArticlePubMed
- Zakeri S, Safi N, Afsharpad M, Butt W, Ghasemi F, Mehrizi AA, Atta H, Zamani G, Djadid ND: Genetic structure of Plasmodium vivax isolates from two malaria endemic areas in Afghanistan. Acta Trop. 2010, 113: 12-19. 10.1016/j.actatropica.2009.08.025.View ArticlePubMed
- Ribeiro RS, Ladeira L, Rezende AM, Fontes CJF, Carvalho LH, de Brito CFA: Analysis of the genetic variability of PvMSP-3α among Plasmodium vivax in Brazilian field isolates. Mem Inst Oswaldo Cruz. 2011, 106 (Suppl 1): 27-33.View ArticlePubMed
- Rungsihirunrat K, Chaijaroenkul W, Siripoon N, Seugorn A, Na-Bangchang K: Genotyping of polymorphic marker (MSP3α and MSP3β) genes of Plasmodium vivax field isolates from malaria endemic of Thailand. Trop Med Int Health. 2011, 16: 794-801. 10.1111/j.1365-3156.2011.02771.x.View ArticlePubMed
- Schousboe ML, Rajakaruna RS, Amerasinghe PH, Konradsen F, Ord R, Pearce R, Bygbjerg IC, Roper C, Alifrangis M: Analysis of polymorphisms in the merozoite surface protein-3α gene and two microsatellite loci in Sri Lankan Plasmodium vivax: evidence of population substructure in Sri Lanka. Am J Trop Med Hyg. 2011, 85: 994-1001. 10.4269/ajtmh.2011.11-0338.PubMed CentralView ArticlePubMed
- Zhong D, Bonizzoni M, Zhou G, Wang G, Chen B, Vardo-Zalik A, Cui L, Yan G, Zheng B: Genetic diversity of Plasmodium vivax malaria in China and Myanmar. Infect Genet Evol. 2011, 11: 1419-1425. 10.1016/j.meegid.2011.05.009.PubMed CentralView ArticlePubMed
- Gillespie JH: Population Genetics: A Concise Guide. 2004, Baltimore: Johns Hopkins University Press
- Takala SL, Escalante AA, Branch OH, Kariuki S, Biswas S, Chaiyaroj SC, Lal AA: Genetic diversity in the Block 2 region of the merozoite surface protein 1 (MSP-1) of Plasmodium falciparum: additional complexity and selection and convergence in fragment size polymorphism. Infect Genet Evol. 2006, 6: 417-424. 10.1016/j.meegid.2006.01.009.PubMed CentralView ArticlePubMed
- Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ, Treatman C, Wang H: PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 2009, 37: D539-D543. 10.1093/nar/gkn814. Database issuePubMed CentralView ArticlePubMed
- Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinforma. 2004, 5: 113-10.1186/1471-2105-5-113.View Article
- Gouy M, Guindon S, Gascuel O: SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010, 27: 221-224. 10.1093/molbev/msp259.View ArticlePubMed
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.View ArticlePubMed
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.PubMed
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. 10.1093/molbev/msr121.PubMed CentralView ArticlePubMed
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.PubMed
- Nei M, Kumar S: Molecular Evolution and Phylogenetics. 2000, New York: Oxford University Press
- Pacheco MA, Elango AP, Rahman AA, Fisher D, Collins WE, Barnwell JW, Escalante AA: Evidence of purifying selection on merozoite surface protein 8 (MSP8) and 10 (MSP10) in Plasmodium spp. Infect Genet Evol. 2012, 12: 978-986. 10.1016/j.meegid.2012.02.009.PubMed CentralView ArticlePubMed
- Pacheco MA, Poe AC, Collins WE, Lal AA, Tanabe K, Kariuki SK, Udhayakumar V, Escalante AA: A comparative study of the genetic diversity of the 42kDa fragment of the merozoite surface protein 1 in Plasmodium falciparum and P. vivax. Infect Genet Evol. 2007, 7: 180-187. 10.1016/j.meegid.2006.08.002.PubMed CentralView ArticlePubMed
- Tanabe K, Arisue N, Palacpac NMQ, Yagi M, Tougan T, Honma H, Ferreira MU, Färnert A, Björkman A, Kaneko A, Nakamura M, Hirayama K, Mita T, Horii T: Geographic differentiation of polymorphism in the Plasmodium falciparum malaria vaccine candidate gene SERA5. Vaccine. 2012, 30: 1583-1593. 10.1016/j.vaccine.2011.12.124.View ArticlePubMed
- Weedall GD, Conway DJ: Detecting signatures of balancing selection to identify targets of anti-parasite immunity. Trends Parasitol. 2010, 26: 363-369. 10.1016/j.pt.2010.04.002.View ArticlePubMed
- Escalante AA, Cornejo OE, Rojas A, Udhayakumar V, Lal AA: Assessing the effect of natural selection in malaria parasites. Trends Parasitol. 2004, 20: 388-395. 10.1016/j.pt.2004.06.002.View ArticlePubMed
- Hudson RR, Slatkin M, Maddison WP: Estimation of levels of gene flow from DNA sequence data. Genetics. 1992, 132: 583-589.PubMed CentralPubMed
- Librado P, Rozas J: DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009, 25: 1451-1452. 10.1093/bioinformatics/btp187.View ArticlePubMed
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.PubMed CentralPubMed
- McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991, 351: 652-654. 10.1038/351652a0.View ArticlePubMed
- Martin D, Rybicki E: RDP: detection of recombination amongst aligned sequences. Bioinformatics. 2000, 16: 562-563. 10.1093/bioinformatics/16.6.562.View ArticlePubMed
- Martin DP, Lemey P, Lott M, Moulton V, Posada D, Lefeuvre P: RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics. 2010, 26: 2462-2463. 10.1093/bioinformatics/btq467.PubMed CentralView ArticlePubMed
- Tisdall J: Beginning Perl for Bioinformatics. 2001, Sebastopol, CA: O’Reilly Media
- Coatney RG, Collins WE, Warren M, Contacos PG: The Primate Malarias. 1971, Washington, DC: US Government Printing Office
- Pacheco MA, Battistuzzi FU, Junge RE, Cornejo OE, Williams CV, Landau I, Rabetafika L, Snounou G, Jones-Engel L, Escalante AA: Timing the origin of human malarias: the lemur puzzle. BMC Evol Biol. 2011, 11: 299-10.1186/1471-2148-11-299.PubMed CentralView ArticlePubMed
- Pacheco MA, Reid MJC, Schillaci MA, Lowenberger CA, Galdikas BMF, Jones-Engel L, Escalante AA: The origin of malarial parasites in orangutans. PLoS ONE. 2012, 7: e34990-10.1371/journal.pone.0034990.PubMed CentralView ArticlePubMed
- Putaporntip C, Udomsangpetch R, Pattanawong U, Cui L, Jongwutiwes S: Genetic diversity of the Plasmodium vivax merozoite surface protein-5 locus from diverse geographic origins. Gene. 2010, 456: 24-35. 10.1016/j.gene.2010.02.007.PubMed CentralView ArticlePubMed
- Valderrama-Aguirre A, Zúñiga-Soto E, Mariño-Ramírez L, Moreno LÁ, Escalante AA, Arévalo-Herrera M, Herrera S: Polymorphism of the Pv200L fragment of merozoite surface protein-1 of Plasmodium vivax in clinical isolates from the Pacific coast of Colombia. Am J Trop Med Hyg. 2011, 84 (2 Suppl): 64-70. 10.4269/ajtmh.2011.09-0517.PubMed CentralView ArticlePubMed
- Garzón-Ospina D, Romero-Murillo L, Tobón LF, Patarroyo MA: Low genetic polymorphism of merozoite surface proteins 7 and 10 in Colombian Plasmodium vivax isolates. Infect Genet Evol. 2011, 11: 528-531. 10.1016/j.meegid.2010.12.002.View ArticlePubMed
- Battle KE, Gething PW, Elyazar IRF, Moyes CL, Sinka ME, Howes RE, Guerra CA, Price RN, Baird KJ, Hay SI: The global public health significance of Plasmodium vivax. Adv Parasitol. 2012, 80: 1-111.View ArticlePubMed
- Cornejo OE, Escalante AA: The origin and age of Plasmodium vivax. Trends Parasitol. 2006, 22: 558-563. 10.1016/j.pt.2006.09.007.PubMed CentralView ArticlePubMed
- Taylor JE, Pacheco MA, Bacon DJ, Beg MA, Dantas-Machado RL, Fairhurst RM, Herrera S, Kim JY, Menard D, Póvoa MM, Villegas L, Mulyanto SG, Cui L, Zeyrek FY, Escalante AA: The evolutionary history of Plasmodium vivax as inferred from mitochondrial genomes: parasite genetic diversity in the Americas. Mol Biol Evol.
- Al-Abd NM, Mahdy MA, Al-Mekhlafi AM, Snounou G, Abdul-Majid NB, Al-Mekhlafi HM, Fong MY: The Suitability of P. falciparum Merozoite Surface Proteins 1 and 2 as Genetic Markers for In Vivo Drug Trials in Yemen. PLoS ONE. 2013, 8: e67853-10.1371/journal.pone.0067853.PubMed CentralView ArticlePubMed
- Chenet SM, Schneider KA, Villegas L, Escalante AA: Local population structure of Plasmodium: impact on malaria control and elimination. Malar J. 2012, 11: 412-10.1186/1475-2875-11-412.PubMed CentralView ArticlePubMed
- Daniels R, Volkman SK, Milner DA, Mahesh N, Neafsey DE, Park DJ, Rosen D, Angelino E, Sabeti PC, Wirth DF, Wiegand RC: A general SNP-based molecular barcode for Plasmodium falciparum identification and tracking. Malar J. 2008, 7: 223-10.1186/1475-2875-7-223.PubMed CentralView ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.