As malaria transmission declines, it becomes increasingly focal. To target resources accordingly, an understanding of transmission risk over fine scales is required. Traditionally, such risk mapping is done using cross-sectional infection prevalence surveys, such as malaria indicator surveys. In low transmission settings such surveys do not produce the number of positives required for risk mapping or decision making [4]. While a handful of malaria elimination programmes, such as Swaziland [5] and the Solomon Islands and Vanuatu [31], map the households of malaria cases, facilitating fine scale risk mapping, most countries have to rely on health facility level case data. This paper describes a method that uses routine health facility malaria case data in conjunction with freely available remotely sensed data to predict malaria risk, and associated uncertainty, down to a scale of 1 km².
While this example is restricted to the cross-scale prediction of malaria risk from health facility level data in Swaziland, this approach holds promise for cross-scale modelling and prediction of malaria in other transmission settings. In particular, this method is well suited to situations where prevalence is low (i.e. <3%) but the number of cases is still too high to allow follow up and mapping. The implementation of this approach in other settings requires several considerations. Firstly, in this study, the estimation of catchment boundaries was done only for those health facilities offering malaria diagnosis. In settings where this information is not known, catchment boundaries would have to be generated for each health facility, which could affect predictions. Secondly, only cases classified as local were used in the modelling process. If cases are not correctly classified according to origin of infection (i.e. if the proportion of cases that are imported is under- or overestimated), this approach is likely to lead to unreliable predictions. Thirdly, in Swaziland, catchment areas are reasonably small, due to the relatively small size of the country and the high population coverage of the public health system. How well this method works in settings where catchment areas are larger, and encompass a wider range of environmental conditions, is not clear. Fourthly, while treatment-seeking data are collected as part of MIS or Demographic and Health Surveys, for some countries this information does not exist. In these settings, a relationship could be assumed using data from neighbouring countries or, ideally, a representative survey could be undertaken. Conducting sensitivity analyses, varying rates of treatment seeking, could also be a useful tool to assess the impact of this parameter on model predictions. This was not done here as it was beyond the scope of the study, which focuses on demonstrating proof of concept.
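As a rough illustration of the kind of sensitivity analysis mentioned above, the following minimal sketch rescales reported cases by an assumed treatment-seeking rate and reports the resulting incidence. It is not the model used in this study: the function names, the catchment figures and the simple assumption that reported cases are a fixed fraction of true cases are all hypothetical and chosen only for illustration.

```python
def adjusted_incidence(reported_cases, population, seeking_rate):
    """Cases per 1,000 population after scaling for cases that never present.

    Illustrative assumption: reported cases are a simple fraction
    `seeking_rate` of all true cases in the catchment.
    """
    if not 0 < seeking_rate <= 1:
        raise ValueError("seeking_rate must be in (0, 1]")
    estimated_true_cases = reported_cases / seeking_rate
    return 1000 * estimated_true_cases / population


def sensitivity_table(reported_cases, population, seeking_rates):
    """Return (rate, adjusted incidence) pairs for each assumed rate."""
    return [
        (rate, adjusted_incidence(reported_cases, population, rate))
        for rate in seeking_rates
    ]


# Hypothetical catchment: 12 reported cases among 8,000 people.
for rate, incidence in sensitivity_table(12, 8000, [0.98, 0.90, 0.80, 0.70]):
    print(f"seeking rate {rate:.2f}: {incidence:.2f} cases per 1,000")
```

In a full analysis, the varied rate would be fed through the whole modelling chain rather than applied as a single scaling factor, so that its effect on the pixel-level predictions and their uncertainty could be assessed.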
Fifthly, in settings with different health systems, for example where surveillance consists of community- and facility-based case detection or where the private sector plays a more prominent role, slightly different models would be required. Finally, the specific models used should be appropriate for the case data used. In this study, logistic regression was used to estimate pixel-level relationships due to the very low numbers of cases across the country. In other settings, employing alternative methods, such as zero-inflated, negative binomial or Poisson regression models, may be more appropriate.
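One informal way to judge which count model might suit a given set of case data is to compare the variance of the counts with their mean and the observed share of zeros with the share a Poisson distribution would imply. The sketch below does this for a set of hypothetical catchment-level case counts; the data and the function name are assumptions made for illustration, and the check is a rough heuristic rather than a formal model comparison.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarise overdispersion and excess zeros in a list of case counts."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    expected_zero = math.exp(-m)  # P(count = 0) under Poisson with mean m
    return {
        "mean": m,
        "variance": v,
        # A ratio well above 1 hints that a negative binomial model
        # may fit better than a Poisson model.
        "dispersion_index": v / m if m > 0 else float("nan"),
        # Many more zeros than Poisson predicts hints at zero inflation.
        "observed_zero_fraction": observed_zero,
        "poisson_zero_fraction": expected_zero,
    }


# Hypothetical case counts for 16 catchments.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 9, 0, 0, 3, 0]
for name, value in dispersion_summary(counts).items():
    print(f"{name}: {value:.3f}")
```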
Though not the main focus of this study, results from the model suggest that a positive relationship between temperature and risk of malaria exists at the pixel level. This fits with our understanding of the disease, as warmer temperatures are more conducive to parasite development. Unusually, the model seemed to suggest that areas further from health facilities were at reduced risk. This may be due in part to correlations between distance to health facility and other variables not included in the modelling process. For example, imported cases, proximity to which has been shown to be a risk factor for being a local case [14], could be less common in more isolated communities. Equally, population density could be much lower in more isolated communities which may reduce transmission potential.
Maps of the catchment level random effect term show that areas in the north east of the country tended to deviate the most from model predictions using temperature and distance to nearest facility. This suggests that there are other factors associated with risk in these areas that are not formally accounted for in the model. For those areas with positive random effect values (i.e. higher than expected risk given the environment), this could be due to the presence of imported cases, or to areas missed by IRS or ITN distribution. For those areas with negative random effect values (i.e. lower than expected risk given the environment), this could be due to high coverage of IRS or ITNs, or the presence of other interventions not included in the modelling process. Equally, differences between areas could indicate differences in surveillance capacity at the facility level in terms of diagnosis and reporting. Irrespective, mapping random effects is useful as it allows questions and hypotheses to be raised, other predictor variables to be considered or dubious data to be identified. Furthermore, it highlights catchment areas with no cases that may be suitable for transmission and areas unsuitable for transmission in which cases occur.
While the modelling framework has generated encouraging results with regard to cross-scale prediction, the approach outlined here has several important limitations and considerations for implementation elsewhere. Firstly, despite finding a relationship between treatment-seeking in the public sector and travel time to nearest health facility, the overall rate of treatment-seeking in the public sector was extremely high at 98%. While Swaziland is a small country and does not have a large private sector, this percentage appears high. Unfortunately, the more frequently used question to assess treatment-seeking behaviour, which asks where, if at all, those with a reported fever in the previous two weeks sought treatment, could not be used due to small sample sizes.
Secondly, it was assumed that individuals with malaria seek treatment at their closest health facility by travel time. While this is likely to be true in many cases, choice of health facility may be influenced by other factors such as the type and quality of service provided [32] as well as the cost of travel [33]. More complex catchment models, which include competition between different types of facilities and allow overlapping catchment areas [34], may improve the predictive accuracy of cross-scale predictive models. Equally, travel models using local data on travel preferences may improve predictions.
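The nearest-facility assumption described above can be sketched as follows: each pixel is assigned to the facility with the shortest travel time, and pixel populations are then summed by facility. This is a minimal sketch with hypothetical pixel identifiers, facility names, travel times and populations; it does not reproduce the travel-time surface or catchment estimation used in this study, and it does not allow the overlapping catchments discussed above.

```python
def assign_catchments(travel_times):
    """Map each pixel to the facility with the shortest travel time.

    `travel_times` has the form {pixel_id: {facility_id: minutes}}.
    """
    return {
        pixel: min(times, key=times.get)
        for pixel, times in travel_times.items()
    }


def catchment_populations(assignment, pixel_population):
    """Sum pixel populations within each facility's catchment."""
    totals = {}
    for pixel, facility in assignment.items():
        totals[facility] = totals.get(facility, 0) + pixel_population[pixel]
    return totals


# Hypothetical travel times (minutes) from three pixels to two facilities.
travel_times = {
    "px1": {"clinic_A": 15, "clinic_B": 40},
    "px2": {"clinic_A": 35, "clinic_B": 20},
    "px3": {"clinic_A": 50, "clinic_B": 45},
}
pixel_population = {"px1": 120, "px2": 80, "px3": 30}

assignment = assign_catchments(travel_times)
print(assignment)
print(catchment_populations(assignment, pixel_population))
```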
Thirdly, a relatively simple linear modelling approach was used to estimate the pixel scale relationships between malaria risk and covariates. There are several possible improvements which could lead to more accurate predictions. For example, generalized additive models, which would allow for more complex non-linear relationships, could be explored. Similarly, including terms to account for spatial autocorrelation at the catchment and/or pixel level might lead to more robust estimation of relationships and could help with predictions. Furthermore, a consideration of the environmental and ecological conditions at the time point at which the case occurred, or lagged as appropriate, would likely improve the accuracy of predictions and would allow time specific predictions to be made. With the availability of remotely-sensed data increasing, accessing such data should become progressively more straightforward. Additionally, use of techniques such as lasso regression for covariate selection provides potential for a more automated modelling approach. While there are still a number of challenges to overcome, this raises the possibility of making these types of predictive models accessible to non-experts within malaria control programmes. These issues were not explored here due to computational limits, but are the focus of future studies.
Fourthly, it should be noted that not all cases that were diagnosed in Swaziland between 2011 and 2013 received an investigation, due to either resource constraints or failure to make contact with the case. The data used to build and validate these models therefore do not necessarily represent the full picture of malaria in the country. While we do not believe this introduced any bias in this case, such an issue illustrates the benefit of conducting case investigation at the health facility, which, as of 2014, is done in Swaziland. This also highlights the fact that, in contrast to higher transmission settings, where predictions can be validated against gold-standard cross-sectional survey data, a comparison with passively detected geolocated cases is the only method to validate predictions in this setting.
Finally, Swaziland is one of only a few programmes to have information on the household location of cases, enabling validation of the fine scale risk maps. If this information is known, there is no need for cross-scale prediction and modelling can be done using the locations of case households [14]. If this information is not known, the Bayesian modelling approach described here can be used to generate estimates of uncertainty in the predictions. That said, household investigation and active surveillance should still be encouraged. High rates of active testing are generally believed to be a requirement for any elimination programme [35]. Furthermore, visiting case households provides an opportunity for additional targeted interventions such as presumptive treatment, ITN distribution or IRS and allows an assessment of household risk factors. These results do, however, suggest that programmes can obtain a detailed understanding of the heterogeneity of malaria transmission without these specific data.
While this paper focusses on the prediction of fine scale malaria transmission risk from health facility data, this modelling framework has potential utility in other multi-scale modelling problems. This could be, for example, fine scale prediction of risk from district or school level disease data. Equally, this method could be used to look at the impact of fine scale interventions, such as village level ITN distributions, on catchment level malaria incidence. Similarly, this approach could be applied to modelling and predicting other environmentally and ecologically driven diseases such as Plasmodium vivax malaria [36], schistosomiasis [28, 37], soil-transmitted helminths [38, 39] and lymphatic filariasis [40]. This is particularly true for low transmission settings, where large scale prevalence surveys become inefficient due to the very large sample sizes required to find positives [3].
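To give a sense of why prevalence surveys become inefficient at low transmission, the sketch below computes the smallest simple random sample needed to find at least one positive with a chosen probability, using the standard relationship P(at least one positive) = 1 − (1 − p)^n. The prevalence values, the 95% target and the function name are illustrative assumptions; real surveys would also need to allow for clustering, imperfect diagnostics and the much larger numbers of positives needed for risk mapping.

```python
import math


def min_sample_size(prevalence, confidence=0.95):
    """Smallest n such that 1 - (1 - prevalence)**n >= confidence.

    Assumes independent sampling and a perfect diagnostic test.
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be in (0, 1)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))


# Illustrative prevalence values, from moderate to very low transmission.
for prevalence in (0.10, 0.03, 0.01, 0.001):
    n = min_sample_size(prevalence)
    print(f"prevalence {prevalence:.3f}: sample at least {n} people")
```

The required sample grows roughly in inverse proportion to prevalence, which is the practical obstacle that motivates the use of routine case data in such settings.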