Bayesian spatio-temporal analysis of malaria prevalence in children between 2 and 10 years of age in Gabon

Background Gabon still bears significant malaria burden despite numerous efforts. To reduce this burden, policy-makers need strategies to design effective interventions. Besides, malaria distribution is well known to be related to the meteorological conditions. In Gabon, there is limited knowledge of the spatio-temporal effect or the environmental factors on this distribution. This study aimed to investigate on the spatio-temporal effects and environmental factors on the distribution of malaria prevalence among children 2–10 years of age in Gabon. Methods The study used cross-sectional data from the Demographic Health Survey (DHS) carried out in 2000, 2005, 2010, and 2015. The malaria prevalence was obtained by considering the weighting scheme and using the space–time smoothing model. Spatial autocorrelation was inferred using the Moran’s I index, and hotspots were identified with the local statistic Getis-Ord General Gi. For the effect of covariates on the prevalence, several spatial methods implemented in the Integrated Nested Laplace Approximation (INLA) approach using Stochastic Partial Differential Equations (SPDE) were compared. Results The study considered 336 clusters, with 153 (46%) in rural and 183 (54%) in urban areas. The prevalence was highest in the Estuaire province in 2000, reaching 46%. It decreased until 2010, exhibiting strong spatial correlation (P < 0.001), decreasing slowly with distance. Hotspots were identified in north-western and western Gabon. Using the Spatial Durbin Error Model (SDEM), the relationship between the prevalence and insecticide-treated bed nets (ITNs) coverage was decreasing after 20% of coverage. The prevalence in a cluster decreased significantly with the increase per percentage of ITNs coverage in the nearby clusters, and per degree Celsius of day land surface temperature in the same cluster. It slightly increased with the number of wet days and mean temperature per month in neighbouring clusters. Conclusions In summary, this study showed evidence of strong spatial effect influencing malaria prevalence in household clusters. Increasing ITN coverage by 20% and prioritizing hotspots are essential policy recommendations. The effects of environmental factors should be considered, and collaboration with the national meteorological department (DGM) for early warning systems is needed. Supplementary Information The online version contains supplementary material available at 10.1186/s12936-024-04880-8.


Visualization of malaria prevalence
As shown in Figure 1 (see appendix), the distribution of the prevalence was right skewed.To reduce the skewness and to approximate by the normal distribution, the prevalence was transformed using the log or the square root.

Figure 2: Overview of transformation of the prevalence using normal plot probability to approximate normal distribution
However, as shown in Figure 3, the log transformation helped to obtain a more symmetric distribution but bimodal.From the normal probability plot, some departures from the ideal line at the start and the end was observed.For the inverse square root transformation, the distribution was also bimodal, somehow symmetric but with tail.From the normal probability plot, there was some departures from the ideal line only at the starting point.In contrast to the log transformation, the mean and median were slightly different for the inverse transformation (0.08 vs 0.02).Therefore, more symmetric with the log transformation than the inverse square root transformation.Hence, normality was assumed or even the theorem of central limit (TCL) was used to assume normality even for this bimodal distribution.The same was found for the distribution by type of residence (Figure 4).The formula is provided in the supplementary document suggested to approximate the weight, including the correction for non-response between the clusters was as follows: ,  ∈ [, ] and    =    the denormalized type of the household weight provided in the DHS data as HV005 variable. is the stratum from  to , which represents the number of strata by rural and urban area for all the regions used as stratification, and  represents the cluster.  is the total number of clusters from the census in the stratum ,    is the number of clusters interviewed in stratum ,   the average number of households by sampling strata for each cluster from the census data found in Table A.3 of the Gabon DHS final report,   the number of households completed, and  is the number of total households according to the census frame.For this work,  was set to 0.5.The sensitivity analysis could be performed for the other values [29].

Model building
As shown in Table 7, the results suggested that using the model with the random slope on time improved the fit of the model significantly.The results need to be interpreted with caution, due to collinearity of the parameters and the assumptions.(1) Multi-collinearity As shown in Table 8, collinearity was detected for the variables annual precipitation and aridity.The VIF was above 10, therefore annual precipitation was removed from the model.Also, the problem of multicollinearity was detected in the association of precipitation variable with diurnal day land temperature, land surface temperature, wet days and proximity to water.Therefore, precipitation was dropped from the model.The growing season was removed as there was some evidence of correlation with aridity as shown in the plot.However, aridity was used in the model because some studies have suggested an association with the prevalence of malaria ( 14).After the exclusion, the VIF was re-calculated for the remaining variables.Thus, in Table 8, the VIF was less than 10 for all the variables after removing some variables.(2) Comparison of the models: day land surface and land surface temperature After evaluating the multicollinearity, the following models were compared: 1) model with covariates + day land temperature and 2) model with covariates + land surface temperature.The idea was to reduce as much as possible the multicollinearity by removing potential variables which were not improving the fit of the model.As a result, from the Table 9, the model with day land surface temperature was the best as the AIC was the lowest.(3) comparing the models: including and excluding diurnal temperature In Table 10, the AIC for the model without diurnal temperature was the lowest.Hence, diurnal temperature was excluded from the model.The following variables were the final variables selected: Proximity to Water, all population count, aridity, day land surface temperature, enhanced vegetation index, ITN coverage, mean temperature, wealth index, wet days and rainfall.For this analysis the main variables were ITNs coverage, rainfall, aridity and wet days.(4) Varying the intercept and slope on ITN coverage and year After selecting this model, to assess if the fit of the model was better when the random slope is introduced on the variable year and ITNs coverage, the two models were compared.In Table 11, the model including slopes on both ITNs coverage and year was better than the model including the random slope only on year (ANOVA; p-value < 0.05).The random slope was considered only for the variables ITN and year.In fact, if rainfall, aridity and the other variables were used with a random slope, the number of observations in the data would not allow to fit a good model.Also, according to our description, ITNs coverage appeared to be the most varying in space and time.

Diagnostics of the selected model
As shown in Figure 8, the variance was not constant as the vertical spread was not the same for different values on the x-axis.The probability normal plot either for the random effect or the residuals showed that there was a slight departure from normal distribution as points were more deviating from the ideal line.However, the log transformation was better in terms of reducing the variance to become more constant, and stretching the points better than the untransformed data.The departure from the normal distribution either for the random effect or the residuals was reduced.As shown in Figure 9, after using the square root transformation, the residuals and the random effects became more normal, and the variance became more constant than observed with the log transformation.Using the inverse square root transformation, the random effect for year and ITN did not approximate the normal distribution more than the square root transformation (see appendix table (Erreur !Source du renvoi introuvable.).

Final Conclusion:
Despite little deviation from the assumption, the final model was interpreted using either the log or the square root transformation, because the linear model is robust for small deviations.The interest here was in the random effect, hence the model with square root transformation could be used.Indeed, with this model, the assumptions were less violated for the random effect (68).According to this model (Figure 10), the average effect of the prevalence was different from cluster to cluster when not adjusting for the other variables.This result was significant because the credible interval for the intercept did not contan 0. The effect of year and ITNs coverage on the prevalence by cluster was significant.It was represented by those black points who did not cross the red line with their credible interval.Using the model with log transformation, the same result was also observed.However, no significant effect on the year was obtained for the model using the inverse square root transformation (see appendix).

Multiple linear regression of malaria prevalence
Multiple linear regression for different years was performed using variables selected previously.However, as there was no data available for the variables ITNs coverage and all population count for the year 2000, the linear regression was conducted for this year without these two variables.The results with and without extreme values found on the variable population count were compared.The results of the regression were not modified except the coefficient which became significant after removing the extreme values.Hence, all the analysis was performed without these values.For all the years, the F-test showed with overwhelming evidence that at least one variable was associated with the outcome among all the variables selected.The assumption of linearity was assumed looking to the scatterplot presented in Erreur !Source du renvoi introuvable..
After using either the log or square root transformation, the assumption of normality and homoscedasticity were met (Figure 14).For the assumption of independence on the residuals, overwhelming evidence of strong spatial autocorrelation in the residuals for each year was found in Table 12: Spatial autocorrelation on residuals and malaria prevalence over time.After including variables to explain the variation of the prevalence, the spatial autocorrelation observed on the prevalence was slightly reduced but not enough.
A small effect of time was observed from the year 2000 to 2005, 2005 to 2010, 2010 to 2015, that is, a difference of spatial autocorrelation of 0.12, 0.26, and 0.06 respectively.This difference was almost constant when considering only the dependent variable.This OLS regression showed that ITNs coverage was significantly associated with the prevalence, in such a way that one-unit increase was associated with 10% decrease of the prevalence in a linear fashion.Its magnitude was varying slightly by year.The prevalence was increasing by 2% with a one unit increased of wet days.Wealth index was also increasing with the prevalence.In fact, almost all the variables were significantly related with the prevalence of malaria.It was also noted that the coefficients were varying slightly by year (Table 13).

Spatial varying coefficients
No evidence was found to support that the relationship for some variables was varying by location.However, only day land surface temperature and ITNs coverage were plotted to represent the variation showed in Figure 15.All the other variables were not showing any pattern (see supplementary).The patterns observed assumed that the increase of the ITNs coverage was associated with the decrease of the prevalence in the center of the country, and a slight increase around.An increase of the day land surface temperature had was associated with the decrease of the prevalence in the center part.But they were not significant.

Linear models
From M1, after adjusting with M2, M3 (enviromental variables), and M4 (full model), the spatial variation was reduced by 68%, 88%, and 76% (26%, 9%, and 27%) for SPTOLS (space-time correlated GAM) respectively (Table 15).The best model with small DIC was the full model.The value of the range showed a strong spatial correlation decreasing slowly up to 142 km (290 km with GAM).
For all the sub-models, the spatial variance was obviously greater than the measure error by more than 95%.The coefficients for time () were almost the same for all the three sub-models.No evidence for the time effect was found since the credible interval contains 0. The coefficient for time was too high and the credible interval too tight.For the replicated GAM, M3 was better than M2, while for the space-time correlated model, M2 was better than M3 with the lowest DIC.Using the full model, increasing the ITNs coverage (or population) by 1%, decreased significantly (and slightly) malaria prevalence by about 52% (3%); increasing the night land temperature by 1-degree increased malaria prevalence by 3% (supplementary material).Since no evidence of the effect of the time on the prevalence was found, the model without time by taking the mean value for all the years was run.

Computing models
Using the Moran index, or the Lagrange multiplier (LM), no evidence of spatial dependence was observed.The DICs of the spatial lag models were smaller than the DIC of the GAM ran previously (-3125.58 vs -1744.26).Based on the ML (Maximum Likelihood), the best model was the SDEM.The fit with ML and INLA, gave approximately the same values (Table 16).

Impact
As shown in Table 16, since the indirect effect of ITNs coverage was positive and significant, therefore for a particular cluster, increasing the ITNs coverage (day land surface) in its nearby clusters (same cluster) was found to be significantly associated with the decrease of the prevalence in this cluster by 21% (2%).An increase of wet days (mean temperature) in a cluster (nearby clusters) was associated with an increase of the prevalence in the cluster by 3% (2%).There was a significant difference of the prevalence between the type of residence.From M0, the spatial variation was reduced by 33% , 13%, 8% when considering the M4, the model with environmental variables (M3) and M2, respectively.Observing particularly the curve for the ITNs coverage, it appeared that malaria prevalence started to decrease only after 20% of the coverage (supplementary material).

Figure 1 :
Figure 1: Panel (A), (B) and (C) show the distribution of the prevalence using histogram before and after transforming with log and square root for each year.Panel (D), (E) and (F) show the density in overall for each transformation After the transformations, no one of the other transformations (Figure 2) was following the normal distribution even by type of residence.

Figure 3 :
Figure 3: Distribution of the prevalence using normal plot probability to compare log and inverse square root prevalence in overall (left) and by type of residence (Urban-Rural -right)

Figure 4 :Figure 5 :
Figure 4: Panel (A), (B) and (C) show the distribution of the prevalence using histogram before transforming and after transforming with log and square root Prevalence in overall by type of residence (Urban-Rural)

Figure 6 :
Figure 6: Comparison of variance with estimate obtained using smoothing of the sample design.Panel (A) and Panel (B) show the estimation obtained without considering the design, and panel (C) and Panel (D) are the estimation obtained when the survey design included.For each plot the direct model is plotted against the smoothed model.

Figure 7 :
Figure 7: Correlation matrix estimating the correlation between the covariates and malaria prevalence.The black cross means no evidence of association between two variables.

Figure 8 :
Figure 8: Comparison of residuals from the log transformed and the untransformed prevalence.The plot at the left and right represents the residual variance of log transformed and for untransformed malaria prevalence respectively

Figure 9 :
Figure 9: Comparison of residual's variance and distribution for the diagnostic of the model between log and square root transformation.Panel A and B represent the random effects, C to E the residuals of model using the log, D and F the residual of the model using square root transformation on the outcome.

Figure 10 :Figure 11 :
Figure 10: Effect of ITNs coverage and year on the prevalence from each cluster

Figure 12 :Figure 13 :
Figure 12: Effect of ITN and year on the prevalence from each cluster: square root transformation of malaria prevalence

Figure 14 :
Figure 14: Residuals inspection: The pair of plots represent the diagnostic for each year respectively 2000, 2005, 2010 and 2015

Figure 15 :
Figure 15: Spatial varying coefficient -visualization of the prevalence of malaria with Day land surface temperature (left) and ITN (right)

Figure 16 :Figure 17 :
Figure 16: Smoother of malaria prevalence and covariates estimated from GAM with replicated method

Figure 20 :
Figure 20: Spatial autocorrelation on residuals and malaria prevalence over time.There was overwhelming evidence of the observed spatial autocoorelation (P < 0.001).

Table 1 :
Distribution of malaria prevalence by type of residence and year

Table 2 :
Covariates distribution by year and type of residence

Table 3 : Comparison of models based DIC for interaction effect Random Walk type Space-time type DIC Interaction
Cluster-specific temporal trends spatially correlated -or spatial patterns temporally correlated rw = random walk, DIC = Deviance Information Criterion

Table 4 :
DIC model interaction vs without interaction a Spatial fraction is the mixing parameter in the model; b CI = Credible Interval

Table 5 :
Space-time smoothing estimation of malaria prevalence

Table 6 :
Space-time smoothing estimation prevalence by year, province and type of residence from the unit level model

Table 7 :
Random intercept vs random slope year

Table 8 :
Multicollinearity after removing variables

Table 9 :
Model + Day land surface vs Model + Land surface temperature -AIC

Table 10 :
model with diurnall vs without diurnal

Table 11 :
Random slope year vs Random slopes ITN + year

Table 12 :
Spatial autocorrelation on residuals and malaria prevalence over time

Table 13 :
Ordinary least square regression for each year

Table 14 :
Diagnostic of GWR method

Table 15 :
Spatial parameters GAM with all variables considered as non-linear

Table 17 :
Checking spatial autocorrelation on the residuals

Table 18 :
DIC for all GAMs

Table 19 :
Comparison of rho from ML vs INLA P-value from Likelihood Ratio test; b P-value from Wald test.Both tests are for evaluating if the spatial correlation coefficient is different from 0 a