Setting, design and sample
Ghana is in West Africa and covers a total area of 238,538 km². It lies between latitudes 4° and 12° N and longitudes 4° W and 2° E. It is bordered by the Gulf of Guinea to the south, Côte d’Ivoire to the west, Togo to the east, and Burkina Faso to the north. Ghana currently has 16 administrative regions.
This study used data from the 2019 Ghana Malaria Indicator Survey (GMIS), conducted under the DHS Program [11]. The 2019 GMIS is the second round of the survey, the first having been conducted in 2016. It provides population-based estimates of malaria indicators to supplement the routine administrative data collected in the country, which are used to inform strategic planning and evaluation of the Ghana Malaria Control Programme [11]. Data were collected by computer-assisted personal interviewing (CAPI) and cover malaria prevention, treatment, and prevalence. The data are freely available online at the DHS MEASURE Program website [17]. Parental or guardian consent was sought for children aged 6–59 months who were tested for anaemia and malaria infection, and the results of these tests were recorded on a biomarker questionnaire. This study used the data on under-five children from the biomarker dataset, which contains malaria rapid diagnostic test (RDT) results for 2,867 under-five children residing in 192 geographical locations (clusters). A detailed description of the survey methods employed in the 2019 GMIS is available elsewhere [11].
The Ghana Malaria Indicator Survey (GMIS) is based on a two-stage sampling design. The sampling was based on ten administrative regions. Each region was divided into urban and rural areas, resulting in twenty sampling strata. Enumeration areas (EAs) were sampled from each stratum. In the first stage, 200 EAs (97 in urban areas and 103 in rural areas) were selected with probability proportional to EA size. In the second stage, approximately 30 households were selected from each cluster, giving a total sample of 6,002 households, of which 5,833 were occupied at the time of fieldwork. Of the occupied households, 5,799 were interviewed, a response rate of 99.4%. Of the 5,246 eligible women, 5,181 women aged 15–49 years (a response rate of 98.8%) who were either permanent residents of the selected households or visitors who stayed in the household the night before the survey were interviewed. All children aged 6–59 months from the interviewed households were eligible for malaria testing upon parental or guardian consent [11].
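The first-stage selection can be illustrated with a minimal sketch of probability-proportional-to-size (PPS) sampling. The source states only that EAs were selected with probability proportional to size; the systematic variant used here, the function name `pps_systematic`, and the EA sizes are assumptions made for illustration and are not taken from the survey.

```python
import random

def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS)."""
    total = sum(sizes)
    step = total / n                       # sampling interval on the cumulative size scale
    start = rng.random() * step            # random start in [0, step)
    selected, cum, idx = [], 0.0, 0
    for k in range(n):
        target = start + k * step
        # advance to the unit whose cumulative size range contains the target
        while idx < len(sizes) - 1 and cum + sizes[idx] <= target:
            cum += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

# Hypothetical household counts for eight EAs in one stratum
sizes = [120, 80, 200, 150, 90, 60, 300, 100]
selected = pps_systematic(sizes, 3, random.Random(1))
```

In this sketch an EA larger than the sampling interval could be selected more than once; the hypothetical sizes above are all smaller than the interval, so the three selected EAs are distinct.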
Outcome variable
The outcome variable of interest is the number of under-five children with a positive rapid diagnostic test (RDT) result in each sampled cluster. The malaria test was conducted on a drop of blood using the SD BIOLINE Malaria Ag Pf RDT, which tests for one antigen, histidine-rich protein II (HRP-II), specific to Plasmodium falciparum, the major cause of malaria in Ghana. The RDT kit produces a result in 15 min [11].
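As an illustration of how the cluster-level outcome is formed, the following minimal sketch aggregates child-level RDT results into the number of positives and the number tested per cluster. The record layout and the function name `cluster_counts` are hypothetical and do not reflect the structure of the GMIS biomarker file.

```python
from collections import defaultdict

def cluster_counts(records):
    """Aggregate child-level RDT results into per-cluster (positives, tested)."""
    counts = defaultdict(lambda: [0, 0])   # cluster id -> [positives, tested]
    for cluster_id, rdt_positive in records:
        counts[cluster_id][0] += int(rdt_positive)
        counts[cluster_id][1] += 1
    return {cid: tuple(pair) for cid, pair in counts.items()}

# Hypothetical child-level records: (cluster id, RDT positive?)
records = [(1, True), (1, False), (1, False), (2, True), (2, True), (3, False)]
counts = cluster_counts(records)
```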
Covariates
Although the main goal of the study is to predict and map under-five malaria risk, the study adjusted for selected environmental factors to examine how far these factors explain the spatial variability in under-five malaria risk across Ghana. These factors include insecticide-treated net (ITN) coverage (i.e., the proportion of the population protected by ITNs), travel time (time required to reach a high-density urban centre), aridity (ranging from most arid to most wet), enhanced vegetation index (ranging from least vegetation to most vegetation), annual temperature (mean temperature), and precipitation (average monthly precipitation). A detailed description of the methods and procedures used to generate these geospatial covariates, and of their sources, is published elsewhere [18]. These environmental and climatic covariates were chosen based on the available literature on predictors of malaria and other health outcomes [19,20,21,22]. Following the recommended strategy [20, 23], the study accounted for the displacement of the GPS coordinates of the sampled cluster locations by creating 2 km buffers for urban clusters and 5 km buffers for rural clusters, to ensure that the correct cluster centroids were captured in the analysis.
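The buffering step can be sketched as follows, under the assumption that a covariate is summarised by averaging the raster cells that fall within the buffer around each displaced cluster location. The averaging rule, the function names, and the coordinates and values are illustrative assumptions; the source specifies only the buffer radii.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    r = 6371.0                                   # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def buffered_mean(cluster, cells):
    """Average a covariate over the raster cells inside the cluster's buffer."""
    radius = 2.0 if cluster["urban"] else 5.0    # km, the radii stated in the text
    inside = [value for lat, lon, value in cells
              if haversine_km(cluster["lat"], cluster["lon"], lat, lon) <= radius]
    return sum(inside) / len(inside) if inside else None

# Hypothetical raster cells: (latitude, longitude, covariate value)
cells = [(5.60, -0.20, 0.30), (5.61, -0.20, 0.50), (5.64, -0.20, 0.90)]
urban_mean = buffered_mean({"lat": 5.60, "lon": -0.20, "urban": True}, cells)
rural_mean = buffered_mean({"lat": 5.60, "lon": -0.20, "urban": False}, cells)
```

In this example the third cell lies roughly 4.4 km from the cluster, so it contributes to the rural (5 km) average but not to the urban (2 km) one.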
Geospatial analysis
Model formulation
Following a previous modelling approach [20], we employed a Bayesian geospatial model [20, 24] to study spatial risk in under-five malaria while adjusting for environmental predictors. Let \({Y}_{i}\) be the number of under-five children with a positive RDT result out of the total \({N}_{i}\) under-five children sampled in geographical cluster \(i\). Given the true malaria risk \(P\left({z}_{i}\right)\) at location \({z}_{i}\), the number of under-five children with a positive RDT result follows a binomial distribution formulated as:
$${Y}_{i}|P\left({z}_{i}\right) \sim Binomial\left({N}_{i}, P\left({z}_{i}\right)\right),$$
$$logit\left(P\left({z}_{i}\right)\right)= {\beta }_{0}+{\boldsymbol{d}}{\left({\boldsymbol{x}}_{i}\right)}^{\prime}\beta +S\left({z}_{i}\right),$$
where \({\beta }_{0}\) is the intercept, which was assigned the default Gaussian prior with mean zero and precision zero (i.e., a flat prior), \(d\left(.\right)\) is a vector of observed environmental predictors of the outcome variable \(Y\), \(\beta\) is a vector of regression coefficients for the covariates, which was assigned the default Gaussian prior with mean zero and precision 0.001, and \(S\left(.\right)\) is a spatially structured random effect that follows a zero-mean Gaussian process with variance \({\sigma }^{2}\) and a given correlation function
$$\rho \left(u\right)=corr\left\{S\left({z}_{i}\right), S({z}_{j})\right\}$$
where \(u\) is the Euclidean distance between locations \({z}_{i}\) and \({z}_{j}\). Various parametric families for \(\rho \left(u\right)\) are outlined by Diggle (2007) [25]. The current analysis uses the Matérn class of covariance functions [26], given by
$$Cov\left(S\left({z}_{i}\right),S\left({z}_{j}\right)\right)=\frac{{\sigma }^{2}}{{2}^{v-1}\Gamma \left(v\right)}{\left(\kappa ||{z}_{i}-{z}_{j}||\right)}^{v}{K}_{v}\left(\kappa ||{z}_{i}-{z}_{j}||\right).$$
Here, \(||.||\) denotes Euclidean distance; \({\sigma }^{2}\) is the spatial variance; \(v>0\) is the shape parameter, which determines the smoothness of \(S\left(z\right)\) in the sense that \(S\left(z\right)\) is \(\lceil v\rceil -1\) times mean-square differentiable; \(\kappa >0\) is the scale parameter, related to the practical range \(\rho =\frac{\sqrt{8v}}{\kappa }\) (distinct from the correlation function \(\rho \left(u\right)\)), the distance at which the spatial correlation falls to approximately 0.1 and is considered negligible; and \({K}_{v}(.)\) is the modified Bessel function of the second kind of order \(v\).
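To make the Matérn covariance and the practical range concrete, the following is a minimal numerical sketch using only the Python standard library. It is not the R-INLA implementation used in the study. The function names, the numerical integration settings, and the parameter values are illustrative assumptions, and the Bessel function is approximated from its integral representation rather than taken from a library.

```python
import math

def bessel_k(v, x, upper=20.0, steps=4000):
    """Approximate the modified Bessel function of the second kind, K_v(x), x > 0.

    Uses K_v(x) = integral from 0 to infinity of exp(-x cosh t) cosh(v t) dt,
    evaluated with the trapezoidal rule on [0, upper].
    """
    h = upper / steps
    total = 0.0
    for j in range(steps + 1):
        t = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(v * t)
    return total * h

def matern_cov(d, sigma2, kappa, v):
    """Matern covariance at distance d with variance sigma2, scale kappa, shape v."""
    if d == 0.0:
        return sigma2                      # limiting value at zero distance
    x = kappa * d
    return sigma2 / (2 ** (v - 1) * math.gamma(v)) * x ** v * bessel_k(v, x)

# Hypothetical parameter values, chosen only for illustration
kappa, v = 2.0, 1.0
practical_range = math.sqrt(8 * v) / kappa
corr_at_range = matern_cov(practical_range, 1.0, kappa, v)   # close to 0.1
```

For a shape parameter of 0.5 the Matérn covariance reduces to the exponential covariance, the spatial variance multiplied by exp(−κd), which offers a simple check on the numerical approximation.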
The model was implemented using the Integrated Nested Laplace Approximation (INLA) approach [27] with the Stochastic Partial Differential Equation (SPDE) strategy [28]. Following a previous study [20], a mesh for inference and prediction was created for the SPDE strategy because, unlike areal data, the geostatistical data points in this study do not have the explicit neighbours that the SPDE strategy requires. The mesh creation is described in Additional file 1, and detailed procedures for mesh creation are published elsewhere [20, 29].
In this study, nine (9) models were set up: two (2) non-spatial models with different sets of covariates, one spatial model without covariates, and five (5) spatial models with different sets of covariates. The Watanabe-Akaike information criterion (WAIC) was used to assess how well each of these nine (9) models fits the data and to select the best-fitting model among the competing models. The uncertainty in the fitted model estimates was quantified by estimating 95% credible intervals and standard errors, and these uncertainties were mapped continuously across the whole of Ghana. Furthermore, the study compared the predictive maps from the spatial model with covariates and the spatial model without covariates to examine whether the included covariates explained some of the differences in predicted malaria prevalence. The study investigated how well the predictive model performs in the presence of new data through a cross-validation procedure that splits the data into training and validation sets, a common and generally accepted model validation approach in this area [30]. The R-INLA package [29, 31] was used for all the analyses.
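The model comparison can be illustrated with a minimal sketch of the WAIC computed from posterior draws under the binomial-logit likelihood. In the study this quantity is reported by R-INLA; the sketch below only shows the standard definition, WAIC = −2(lppd − p_WAIC), where lppd is the log pointwise predictive density and p_WAIC is the sum over clusters of the posterior variance of the log-likelihood. The function names, data, and posterior draws are hypothetical.

```python
import math
from statistics import variance

def binom_loglik(y, n, eta):
    """Log-likelihood of y positives out of n children, with logit(p) = eta."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log(1.0 - p))

def waic(y, n, eta_draws):
    """WAIC from posterior draws; eta_draws[s][i] is draw s of the linear predictor for cluster i."""
    lppd, p_waic = 0.0, 0.0
    for i in range(len(y)):
        ll = [binom_loglik(y[i], n[i], draw[i]) for draw in eta_draws]
        m = max(ll)                        # log-sum-exp for numerical stability
        lppd += m + math.log(sum(math.exp(v - m) for v in ll) / len(ll))
        p_waic += variance(ll)             # sample variance across draws
    return -2.0 * (lppd - p_waic)

# Hypothetical data for two clusters and two sets of posterior draws
y, n = [3, 10], [20, 20]
waic_good = waic(y, n, [[-1.7, 0.0], [-1.8, 0.1]])   # draws near the observed prevalences
waic_bad = waic(y, n, [[0.0, -2.0], [0.1, -1.9]])    # draws far from the observed prevalences
```

A lower WAIC indicates a better trade-off between fit and complexity, so in this toy example the first set of draws is preferred.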
Model validation
It is critical to examine how well the predictive model performs, especially in the presence of new data. This study employed a cross-validation approach to assess the out-of-sample predictive performance of the model. First, the data were split into training and validation sets, with a seed of 123 set to make the partition reproducible. The model was trained on 75% of the samples and tested on the remaining 25%. Predictive performance was assessed by plotting the observed against the predicted malaria prevalence and estimating the correlation between them.
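The validation procedure can be sketched as follows. This is an illustration only: it assumes the split is made over the 192 clusters, which the source does not state, and a Python seed of 123 will not reproduce the partition obtained with the same seed in R. The function names are hypothetical.

```python
import math
import random

def train_validation_split(n_items, train_fraction, seed):
    """Return reproducible (train, validation) index lists."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)       # seeded shuffle makes the partition reproducible
    cut = round(train_fraction * n_items)
    return idx[:cut], idx[cut:]

def pearson(x, y):
    """Pearson correlation between observed and predicted values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# 75% of the clusters for training, 25% for validation
train, valid = train_validation_split(192, 0.75, seed=123)
```

After fitting the model on the training clusters, `pearson` would be applied to the observed and predicted prevalence in the validation clusters.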
Interactive web-based mapping of the predicted malaria prevalence
Targeted policy and intervention strategies require readily available, quality data, especially for malaria surveillance in settings where public health resources are limited and universal intervention is practically impossible. To support policymakers, the study produced interactive web-based maps of the predicted malaria prevalence to improve visualization and identification of higher-risk communities for urgent intervention and further research. The spatsurv, rgdal, leaflet, and sp packages in R version 4.2.0 and RStudio [32, 33] were used to develop the interactive web-based maps.
Ethical consideration
Permission was granted by the DHS MEASURE Program to use the 2019 GMIS data for the study. The data are freely available after a simple registration and access request at https://dhsprogram.com/data/dataset_admin/index.cfm. The protocol for the 2019 GMIS was approved by the Ghana Health Service Ethical Review Committee and ICF’s Institutional Review Board [11].
The role of the funding source
The present study did not receive any support from any funding source. Also, the funders of the original survey played no role in the design, data collection, analysis, interpretation, writing of the manuscript, or the decision to submit this manuscript. The author confirms that he has full access to all the data in this study and accepts responsibility for the decision to submit for publication.