Spatiotemporal mathematical modelling of mutations of the dhps gene in African Plasmodium falciparum

Background Plasmodium falciparum has repeatedly evolved resistance to first-line anti-malarial drugs, thwarting efforts to control and eliminate the disease and in some period of time this contributed largely to an increase in mortality. Here a mathematical model was developed to map the spatiotemporal trends in the distribution of mutations in the P. falciparum dihydropteroate synthetase (dhps) gene that confer resistance to the anti-malarial sulphadoxine, and are a useful marker for the combination of alleles in dhfr and dhps that is highly correlated with resistance to sulphadoxine-pyrimethamine (SP). The aim of this study was to present a proof of concept for spatiotemporal modelling of trends in anti-malarial drug resistance that can be applied to monitor trends in resistance to components of artemisinin combination therapy (ACT) or other anti-malarials, as they emerge or spread. Methods Prevalence measurements of single nucleotide polymorphisms in three codon positions of the dihydropteroate synthetase (dhps) gene from published studies of dhps mutations across Africa were used. A model-based geostatistics approach was adopted to create predictive surfaces of the dhps540E mutation over the spatial domain of sub-Saharan Africa from 1990-2010. The statistical model was implemented within a Bayesian framework and hence quantified the associated uncertainty of the prediction of the prevalence of the dhps540E mutation in sub-Saharan Africa. Conclusions The maps presented visualize the changing prevalence of the dhps540E mutation in sub-Saharan Africa. These allow prediction of space-time trends in the parasite resistance to SP, and provide probability distributions of resistance prevalence in places where no data are available as well as insight on the spread of resistance in a way that the data alone do not allow. The results of this work will be extended to design optimal sampling strategies for the future molecular surveillance of resistance, providing a proof of concept for similar techniques to design optimal strategies to monitor resistance to ACT.

The unstructured components, ε(x,t) , were assumed to be independent and identically distributed with zero mean and variance ε(x,t) V~N (0,V ) .
The random field, f (x,t) , was modeled as a stationary Gaussian process, with mean function µ(x,t) and covariance function C(x,t) : f (x,t) θ M ,θ C~G P(µ( x,t),C(x,t)) , where and are vectors of parameters that specify the mean and covariance function, respectively. Specifically, it was assumed that the mean function varies linearly in time and with malaria transmission intensity where θ M = {β 0 ,β 1 ,β 2 } . The covariance function was chosen to be a version of the spatio-temporal structure advocated by Stein [6] and adopted previously by Hay et al. [7] and Gething et al. [8]. The covariance between a study conducted at location in year , and a study performed at in year was given by where is the gamma function, is the modified Bessel function of the second kind of order , Δt = t i − t j and The distance was given by is the great circle distance between locations and . In the notation adopted here, the parameter refers to the temporal scale factor, to the temporal limiting correlation, to the partial sill and to the spatial range. The covariance parameters were: The joint probability model for the dhps540E observations, the structured and unstructured components, given the model parameters, the space-time location of the data and the sample sizes was therefore given by where N + , N, X and t are the augmented set of positive dhps540E responses, number of samples tested, location and time of the study, respectively, for the set of studies.
That is, for example,

Inclusion of the dhps437G and dhps581G markers
To incorporate the information contained within the dhps437G and dhps581G marker data into the model framework outlined in the previous section, factor potentials were introduced that multiply the joint probability model [9,10]. Essentially, the presence of dhps437G or dhps581G data at a location without dhps540E data placed an upper (dhps437G) or lower (dhps581G) constraint on the predictive dhps540E prevalence allowed by the model.
For a space-time location (x k ,t k ) that was not associated with a dhps540E observation, but was associated with dhps437G data ( of the samples are positive for the dhps437G marker), the likelihood, given the random field and unstructured random component was modified: is an indicator function such that ) is the predicted model prevalence of the dhps540E marker at location and time . A separate factor potential (and hence indicator function) was used for each space-time location where dhps437G data was available but dhps540E data was not.
In a similar fashion, dhps581G data (N 581 was incorporated into the model by multiplying the likelihood by the indicator function

Prior specification
Priors were specified for the mean and covariance parameters The logarithm of the partial sill ( ) and the spatial range ( ) were assigned skew-normal priors: The temporal scale ( ) was given a relatively vague prior φ t~E xponential(0.1).
Noninformative priors were specified for the regression coefficients in the mean function of the Gaussian process p(β 0 ,β 1 ,β 2 ) ∝1 .
The inverse of the variance of the unstructured random component (1 / V ) was assigned a diffuse Gamma prior with mean 0.25

A2.2 Implementation
The implementation of the model proceeds with two main steps: inference and prediction, as detailed below.

Parameters estimation (inference stage)
In the parameter estimation stage, the output of the MBG model was the posterior probability distribution of the model parameters, given the observed data. Samples were drawn from the posterior distribution of the model mean and covariance parameters ( θ M ,θ C ,V ) and the random field ( f (x i ,t i ) ) at each location where dhps540E, 437G or 581G data was available, using a MCMC approach. The MCMC algorithm was implemented in the Python [11] package PyMC [12]. PyMC is an open-source Python module that implements Bayesian statistical models and fitting algorithms, including MCMC.
The mean and covariance parameters ( θ M ,θ C ,V ) were updated jointly within the MCMC algorithm using Metropolis steps while the values of the space-time random field at the data locations and times were updated using Gibbs steps. The unstructured random components ( ε(x i ,t i ) ) were updated separately using Metropolis steps.

Spatio-temporal mapping (prediction stage)
In the prediction stage, the output was the posterior distribution of the prevalence of where µ j (x k ,2010) and C j (x k ,2010, x k ,2010) are scalar quantities of the mean function value (Eq 1) and the covariance function value (Eq 2) in 2010 at , respectively, f j (X,t) , µ j (X,t) and C j (X,t, x k ,2010) are vectors of length of the random field, mean function and covariance function values at the data locations: and is an by matrix with elements C j (X,t, X,t) r, p = C j (x r ,t r x p ,t p ) r, p . The subscript denotes that the quantity was evaluated with the sample from the posterior distribution.
To this f j (x k ,2010) sample, the unstructured component (drawn from

A2.3 Transmission intensity
The Malaria Atlas Project (MAP) has developed spatio-temporal MBG frameworks to generate world P. falciparum endemicity maps, the most recent of which was created for 2010 [8]. The computational demands to generate predictive P. falciparum maps at each year from 1990-2010 are substantial under the current MAP framework. In this model, only the spatial predictions for 2010 were included in the mean function of the Gaussian process. That is; µ(x,t) = β 0 + β 1 t + β 2 m(x,2010) In this way, the P. falciparum transmission was incorporated into the model as a mechanism for how the prevalence of dhps540E changes spatially.