Childhood malaria case incidence in Malawi between 2004 and 2017: spatio-temporal modelling of climate and non-climate factors

Background Malaria transmission is influenced by a complex interplay of factors including climate, socio-economic, environmental factors and interventions. Malaria control efforts across Africa have shown a mixed impact. Climate driven factors may play an increasing role with climate change. Efforts to strengthen routine facility-based monthly malaria data collection across Africa create an increasingly valuable data source to interpret burden trends and monitor control programme progress. A better understanding of the association with other climatic and non-climatic drivers of malaria incidence over time and space may help guide and interpret the impact of interventions. Methods Routine monthly paediatric outpatient clinical malaria case data were compiled from 27 districts in Malawi between 2004 and 2017, and analysed in combination with data on climatic, environmental, socio-economic and interventional factors and district level population estimates. A spatio-temporal generalized linear mixed model was fitted using Bayesian inference, in order to quantify the strength of association of the various risk factors with district-level variation in clinical malaria rates in Malawi, and visualized using maps. Results Between 2004 and 2017 reported childhood clinical malaria case rates showed a slight increase, from 50 to 53 cases per 1000 population, with considerable variation across the country between climatic zones. Climatic and environmental factors, including average monthly air temperature and rainfall anomalies, normalized difference vegetative index (NDVI) and RDT use for diagnosis showed a significant relationship with malaria incidence. Temperature in the current month and in each of the 3 months prior showed a significant relationship with the disease incidence unlike rainfall anomaly which was associated with malaria incidence at only three months prior. Estimated risk maps show relatively high risk along the lake and Shire valley regions of Malawi. Conclusion The modelling approach can identify locations likely to have unusually high or low risk of malaria incidence across Malawi, and distinguishes between contributions to risk that can be explained by measured risk-factors and unexplained residual spatial variation. Also, spatial statistical methods applied to readily available routine data provides an alternative information source that can supplement survey data in policy development and implementation to direct surveillance and intervention efforts.


Calculation of expected cases and SMR
The expected malaria case-counts e st were calculated by multiplying the overall malaria risk for Malawi, π and the population of each district p st , i.e. e st = p st π. The overall malaria risk for Malawi is given by dividing the total number of cases in Malawi with the total population, i,e, π = yst pst where y st is the observed malaria counts in district s at time t and p st is the corresponding population. The logarithm of the expected malaria counts is then included in the model as offset with a coefficient of 1 and hence no effect on the response variable. The standardised morbidity ration (SMR) is given by the ratio of observed estimated cases, i.e. SM R = yst est . The maximum likelihood estimate (MLE) of the risk R st of a district in a GLM without random effects is the corresponding SMR, i.e. R st = yst est . In the mixed model setting, the posterior mean of the relative risk for each district is therefore a weighted average of the SMR for the district and the prior mean of the relative risk in the overall spatial region giving rise to smoother estimates than those provided by raw SMR estimates.

Exploratory analysis
To provide a more detailed description of the national malaria incidence pattern, Figure S1 shows a decomposition of the monthly incidence series into seasonal, trend and residual components using the loess method (Cleveland, Cleveland, McRae, & Terpenning, 1990). The trend component in Figure S1 shows a steady rise between 2004 and 2010, followed by a sharper drop between 2010 and 2013, and another rise since 2013. Both climatic and non-climatic factors could affect this. The period 2010 to 2013 coincided with a number of national-level policy interventions, whilst reporting mechanisms over time could also induce spurious trends; see below for further discussion.
The overall reporting rate from all the health facilities in the Malawi health system has been rising steadily over the years (Ministry of Health (MOH), 2015). Districts with better infrastructure are more likely to deliver better quality data. In general, data quality varies between districts.
Within the climatic zones, districts also show different patterns of malaria incidence. Climate is likely to be responsible, in part, for the observed differences in the incidence due to variable climatic conditions between zones. Figure S2 shows monthly malaria incidence in selected districts from each of the climatic zones. From the figure, it is apparent that malaria is endemic to all regions and zones and that there is inter-annual variation in the incidence. The problem of low data quality, partly caused by a gap in reporting is highlighted in one of the districts.

Model fitting
We use a two-stage strategy to fit a general model within the class defined by equations In the first stage, we fit a standard Poisson log-linear model, i.e. omitting the random effects U st in equation 2, to identify potentially important covariates. This is a conservative strategy in the sense that failure to take account of random effects typically leads to spuriously small standard errors of regression parameters.
In the second stage, we fit a full, mixed effects model including the covariates identified in stage one. We estimate the parameters β and θ by Bayesian inference. This requires us to specify prior distributions for the parameter vectors β and θ.
In what follows, we specify independent priors for β and θ, and use Y and U as shorthand for the complete sets of values of Y st and U st , respectively. Writing [·] to denote "the distribution of" and a vertical bar to denote conditioning, this results in the following model structure, For parameter estimation, we apply Bayes' theorem to obtain the joint posterior distribution of β,θ and the random effects U to give The integral in the denominator of 4 is analytically intractable, but we can sample from the joint posterior using Markov chain Monte Carlo (MCMC) methods; we used both Metropolis-Hastings and Gibbs sampling. We generated 3 chains of length 300,000 after a burn-in of 50,000 iterations, retaining every fiftieth iteration to obtain a sample of 5000 approximately independent realisation from the joint posterior distribution of β, θ and U for post-processing. To assess the convergence of the chains, we used visual inspection of trace plots and the Geweke statistic (Geweke et al., 1991) as well as the Gelman-Rubin diagnostic (Gelman, Rubin, et al., 1992). To carry out the Geweke test, a chain is divided into early and late windows containing the iterates. If the chain has achieved stationarity, the means of the values in the two windows are similar. The statistic is given by the difference between the two means divided by the standard error of their difference. The Z statistic is the test statistic and Z ∼ N (0, 1) if the chain has converged.

Predictive mapping
A standard approach to mapping results is to calculate point predictions of the relative risks, R st , which is defined as; see, for example (Best, Richardson, & Thomson, 2005). The minimum mean square point predictor of R st (or of any other quantity of interest) is its conditional expectation given Y , i.e. its expectation under the posterior distribution, which we estimate as the observed mean over MCMC samples. A useful refinement is to map separately the conditional expectations for a multiplicative decomposition of R st into the explained risk, exp(x st β) and the unexplained risk exp(U st ), again estimated as the observed means over MCMC samples. A useful way to convey uncertainty in predictions, especially in situations where relative risk thresholds are linked to policy interventions, is using probability exceedance maps (Diggle et al., 2007;Diggle, Moraga, Rowlingson, Taylor, et al., 2013;Diggle & Giorgi, 2015). In these, the mapped quantity is the posterior probability that θ st exceeds a specified threshold; where predictions are precise, the mapped probabilities will be close to zero or one.
A third option is to map quantiles of the posterior distributions of R st . For example, posterior medians are an alternative to posterior expectations as point predictors, whilst 2.5% and 97.5% quantiles would correspond to point-wise 95% credible intervals.

Discussion
To check whether the included climate and non-climate variables fully explain the observed recent increase in malaria cases, we fitted an additional model with ITN in addition to the model in the main paper without the ITN. Results of observed malaria risk show comparable results indicating both factors do not explain away the increase in malaria. The figure below shows the total, unexplained and explained variation by the model covariates.  Figure S4: (A) Overall variation, (B)Explained and (C) Unexplained variation