Complete sources of cluster variation on the risk of under-five malaria in Uganda: a multilevel-weighted mixed effects logistic regression model approach

Background Malaria, a major cause of mortality worldwide is linked to a web of determinants ranging from individual to contextual factors. This calls for examining the magnitude of the effect of clustering within malaria data. Regrettably, researchers usually ignore cluster variation on the risk of malaria and also apply final survey weights in multilevel modelling instead of multilevel weights. This most likely produces biased estimates, misleads inference and lowers study power. The objective of this study was to determine the complete sources of cluster variation on the risk of under-five malaria and risk factors associated with under-five malaria in Uganda. Methods This study applied a multilevel-weighted mixed effects logistic regression model to account for both individual and contextual factors. Results Every additional year in a child’s age was positively associated with malaria infection (AOR = 1.42; 95% CI 1.33–1.52). Children whose mothers had at least a secondary school education were less likely to suffer from malaria infection (AOR = 0.53; 95% CI 0.30–0.95) as well as those who dwelled in households in the two highest wealth quintiles (AOR = 0.42; 95% CI 0.27–0.64). An increase in altitude by 1 m was negatively associated with malaria infection (AOR = 0.98; 95% CI 0.97–0.99). About 77% of the total variation in the positive testing for malaria was attributable to differences between enumeration areas (ICC = 0.77; p < 0.001). Conclusions Interventions towards reducing the burden of under-five malaria should be prioritized to improve individual-level characteristics compared to household-level features. Enumeration area (EA) specific interventions may be more effective compared to household specific interventions.


Background
Malaria, a major cause of mortality worldwide [1] is caused by protozoa of the genus Plasmodium [2].Within the two years of the start of COVID-19 pandemic, malaria endemic countries, Uganda inclusive, reported more than 101 million cases [3].Uganda emerges as one of the six countries that account for more than half of all malaria cases worldwide [4] where children under five years of age are the most vulnerable group [5].
Malaria is linked to a web of determinants ranging from individual to contextual [6][7][8][9].Hence, it's vital to take advantage of the opportunity provided by malaria data, to critically examine the magnitude of the effect of clustering of data points [10].
Issues that researchers usually overlook while modelling multilevel data based on complex survey design are; use of final survey weights (single level weights that are only appropriate for single level analysis) instead of multilevel weights or level-specific weights [11] and ignoring cluster variation on the risk of study outcomes/diseases [12].This most likely produces biased estimates, misleads inference and lowers study power.Besides, it is critical to consider several metrics like ICC as it was the case of this study, to identify risk factors of disease and derive appropriate public health interventions while analysing data from multilevel study designs [13].Failure to do so can lead to inappropriate interventions for prevention and control of diseases like malaria and associated adverse consequences.Although some studies based on a multilevel design have adjusted for clustering in the data, a number of them have not considered determining the complete sources of cluster variation on malaria which may be an important statistic to guide critical levels of public health interventions and future study designs [14,15].
The objective of this study was, therefore, to determine the complete source of cluster variation on the risk of malaria, and to identify risk factors associated with under-five malaria in Uganda.

Data source and study population
This study made use of secondary data based on a twostage cluster and stratified sampling technique from the Uganda Malaria Indicator Survey (UMIS) of 2018/19.The first stage of sampling involved selecting sample points (clusters) from the sampling frames.A total of 320 clusters were selected with probability proportional to size from the enumeration areas (EAs) covered in the 2014 National Population and Housing Census (NPHC).The second stage of sampling involved systematic selection of households.Twenty-eight households were selected from each EA, for a total sample size of 8,878 households.The primary objective of the 2018-19 UMIS is to provide up-to-date estimates of basic demographic and health indicators related to malaria.Specifically, the 2018/19 UMIS collected information on vector control interventions such as mosquito nets and indoor residual spraying of insecticides, on intermittent preventive treatment of malaria in pregnant women, on care-seeking and treatment of fever in children, and malaria knowledge, behaviour, and practices.All women age 15-49 who were either permanent residents of the selected households or visitors who stayed in the household the night before the survey were eligible to be interviewed.After a parent's or guardian's consent was obtained, children age 0-59 months were tested for anaemia and malaria infection.The study population consisted of 7,632 children less than 5 years of age who were tested for anaemia and malaria infection by a team of two health technicians, respectively [16].The selection of the final study sample is as shown in Fig. 1.

Analysis model
The dataset was first explored for preparation purposes.Before any analysis was conducted, the data were sorted, some variables recoded while other variables and some observations that were not of interest to the research problem were eliminated.Categorical variables were represented as counts and percentages.Collinearity was assessed among independent variables using a correlation matrix.Variables with correlation coefficient of 0.4 and above were not included in the same model.The survey design estimation command (svy) in Stata 15.0 (StataCorp, College Station, TX) was used to conduct descriptive analysis, accounting for the level weights.The level of statistical significance was p < 0.05 for all analyses.Overall, four multivariable models were considered; the first model neither adjusted for weighting nor cluster variation in the risk of under-five malaria; the second model only adjusted for cluster variation; the third model only adjusted for weighting; and the forth model adjusted for both weighting and cluster variation.A model was, therefore, considered to best fit the data if it had lower design factor (deft) values in general.Lower deft values are associated with lower loss of precision of model estimates [17].The design factor (deft) was calculated as follows: where; deff is the design effect.rho is the intra-class cor- relation for the variable in question.n is the size of the cluster.
To assess the association between malaria infection in under-five children and individual, household, and enumeration area factors, a multilevel-weighted mixed effects logistic regression model (chosen among the four compared models as the best model) was specified to account for contextual within-household and within-EA correlations [18][19][20].The model is represented as below: where; ln is the natural logarithm.p ijk is the probability of testing positive for malaria for the ith under 5-year-old child in household j and EA k. β 0 is the mean log-odds of malaria across household and EA.
X ijk is a level 1 covariate for the ith child in household j and EA k.
β 1 represents the slope associated with X ijk which rep- resents the relationship between the level 1 covariates and the log-odds of malaria.
η k is the random effect for EA k. ξ jk is the household random effect.Bivariate multilevel weighted-mixed effects logistic regression was conducted, using each of the individual, household, and community level risk factors as predictors and malaria test result as the outcome.Individual predictors with p < 0.20 were considered for inclusion in the multivariable multilevel logistic regression models.The multivariable analysis was conducted in a sequential process resulting into several models.Model 0 (the null model) was fitted to decompose the total variance of malaria risk between the cluster and level-1 covariates.It only included the random intercept to assess EA and household contribution to malaria risk before adding explanatory variables.The null model established the degree of variance at the cluster level in order to validate the use of multilevel modeling.Model 1 contains individual (level-1) variables; model 2 has household (level-2) variables in addition to variables in model 1; model 3 includes EA (level-3) variables in addition to variables in model 2. Model 3 was selected as the final model that was used to identify factors associated with malaria risk in under-five children since it was the most complete among the three models.To measure the extent to which individuals within the same group are more similar to each other than they are to individuals in different groups, intra-class correlation coefficient (ICC) was used [21].A higher proportion of the ICC was linked to a higher general contextual effect [22].The formula for the ICC is presented as below: where V A is the cluster or area level variance and π 2 /3 is a scalar that corresponds to the individual level variance.When the contribution to the overall ICC of a level(s) was very low (< 10%) its effect was considered insignificant and hence, the random effects component(s) at the specific level(s) was considered insignificant.

Weighting
Since the sample for this study is a two-stage stratified cluster sample, level weights were calculated separately, based on sampling probabilities for each sampling stage and cluster.In this study, level weights were estimated using a framework for approximating level weights in  1.

Regional variation in prevalence of malaria among the under-five children
Malaria prevalence showed great variability across regions.Of the 15 statistical regions in Uganda, five of them had malaria prevalence above the national average of 21.1%.Prevalence was highest in the West Nile region (42.6%),followed by Karamoja region (41.5%),Busoga region (39.3%),Acholi region (31.7%) and Lango region (22.7%) as shown in Fig. 2.

Factors associated with malaria risk in under-five children
Table 2 shows results of the multivariable multilevelweighted mixed effects logistic regression modelling at various levels.It is by this model that risk factors of under-five malaria were determined.At the individual level, every additional year in a child's age was associated with 42% higher odds of malaria infection (AOR = 1.42; 95% CI 1.33-1.52).Also, children whose mothers had at least a secondary school education had about 47% lower odds of malaria infection compared to those whose mothers were uneducated (AOR = 0.53; 95% CI 0.30-0.95).At household level, children who dwelled in households in the two highest wealth quintiles had lower odds of malaria infection compared to those in the two lowest wealth quintiles (AOR = 0.42; 95% CI 0.27-0.64).At enumeration area level, an increase in altitude by 1 m was associated with slightly lower odds of under-five malaria infection (AOR = 0.98; 95% CI 0.97-0.99).
The 'sex of child' variable was maintained in the final model regardless of its statistical insignificance because of its biological importance.Variables 'has bed net' and 'HH sprayed' were maintained because of their relational

Region
Fig. 2 Weighted regional ranking of prevalence of under-five malaria importance to the research problem, according to previous research literature.

Measure of sources of cluster variation in the risk of malaria
Figure 3 shows the proportional contribution of ICC across levels of the nested data and the error term.This informed the choice of the random effect term that was included in the mixed effects models.That is, EA.The EA random effect estimate for measuring variation in Table 3 shows significant variation in underfive risk of malaria.The random effect of Model-0 (null model) shows that there was statistically significant variation in the odds of a positive test for malaria across EAs (variance = 11.16;95% CI 8.55-14.56).The ICC value for a positive malaria test between EAs (ICC = 0.77; p < 0.001) indicates that 77% of the total variance in the positive testing for malaria was attributable to differences between EAs.Hence variation in the risk of underfive malaria was mainly attributed to EAs compared to  households.In addition, moving from Model-0 to Model-3, area variance reduced by 29.6% (from 11.16 to 7.86).Area variance steadily reduced as more fixed effects were added into the model.The percentage reduction in area variance in each model equates the percent of variance contributed by fixed effects at the specific level.Moreover, the variation across EAs remained statistically significant throughout the three models (1, 2, and 3).The ICC values showed heterogeneity between EAs.The ICC values in models 1, 2 and 3 indicate that; 44% was attributable to EAs differences after adjusting for individual factors, 29% after adjusting for household factors and 18% after adjusting for EA factors as indicated in Table 3.

Discussion
Identifying risk factors of malaria is critical in designing interventions and evaluating existing ones.This study did not find any sex difference among children underfive years of age with malaria.These results are consistent with similar studies based on multilevel analysis in Ethiopia [23] and Nigeria [24].In addition, according to this study's findings, children with educated mothers were less likely to have malaria compared to those whose mothers were uneducated.These results are similar to previous findings where prevalence of under-five malaria drastically decreases with increase in a mother's education [25] since education has previously been associated with use of malaria prevention measures like mosquito nets [16] as educated mothers are more prone to using these preventive measures.
At the household level, wealth index was significantly associated with malaria parasitemia.These finding are in agreement with previous studies [25,26].This is partly due to the fact that a household's economic status that can be represented by wealth index affects other factors like housing conditions which in turn have an effect on malaria prevalence in the affected households [16].Also, the nature of residence like in this study, has been found to be significantly associated with under-five malaria infection like it is the case with other studies [26,27].
Determining sources of cluster variation is important in designing future study designs and identifying effective and efficient levels at which interventions should be designed.The study presented a significantly higher ICC at EA level compared to other levels.Computing the ICC to estimate power and sample sizes is problematic given the difficulty in estimating variances a priori.Therefore, the ICC is typically estimated using the values reported in previous research [28].Hence the calculated ICC and deft from this study can be used in computation of power and effective sample size in future studies of MIS and other similar surveys based on complex designs like Demographic and Health Survey (DHS), Multiple Indicator Cluster Survey (MICS) and Performance Monitoring for Action (PMA) surveys.
Ideally, more units should be samples at EA level compared to household level for future designs of studies based of multi-stage (specifically two-stage) sampling since ICC at EA level was higher than that at household level.These results are consistent with a study [28] that found out that as a general rule of thumb, increasing the sample size at the highest level that is, sampling more groups will do more to increase power than increasing the number of individuals in the groups.However, the problem is that increasing sample size at higher levels is more difficult and costly than increasing the sample within each group.Thus, the increases in study power may come at a substantial cost.
ICC is used in equations along with the cluster size and the number of clusters to calculate the effective sample size (ESS) which is the sample size in clustered samples as compared with the number of subjects actually enrolled in the study.As an example, proper accounting for correlation among subjects in a cluster almost always results in a net loss of power, requiring increased total subject recruitment.Increasing the number of clusters enhances power more efficiently than does increasing the number of subjects within a cluster [29].
In addition, ICC guides level-based interventions.The higher this proportion, the higher is the general contextual effect [22].As an example, when the ICC value is lower at a specific level, then interventions need not to be done at that level but at a level where ICC is higher as it is the case of the study findings.High values of ICC therefore inform intervention to be done at specific levels [30].This study similarly suggested interventions based at a level with higher ICC value.Instead of household-specific interventions, this study recommends EA-specific interventions.

Study strength and limitations
The multilevel design of the MIS that allows contextual analysis was the strength of this study.The limitations were parasitaemia was diagnosed using rapid diagnostic test (RDT) which is not the gold standard.However, both microscopy (gold standard) and RDT have always yielded very close results.Also, given that the data resulted from a cross-sectional design, causal relationship between explanatory variables and under-five parasitaemia could not be guaranteed.This was a drawback concerning the prediction capability of the final model.Also, although some variables deemed important in analysis were not among the collected data, the variables available in the data were sufficient to address the study objective.Another limitation was potential recall bias because household and children characteristics were purely based on self-report by survey respondents.The potential bias was however minimal since response was only required about events from the most recent past and by eligible respondents.

Conclusions
Individual or child characteristics were more significantly associated to malaria risk compared to household and EA characteristics.Even after accounting for individual, household and EA level fixed effects; variation remained significant at EA level hence, cluster variation was substantial at EA level.Interventions towards reducing the burden of under-five malaria should be prioritized to improve individual-level characteristics compared to household-level features.This is because more variables were significant at child-level compared to household-level.Also, EA-specific interventions towards malaria control may be more effective compared to household level interventions.
• fast, convenient online submission • thorough peer review by experienced researchers in your field • rapid publication on acceptance • support for research data, including large and complex data types • gold Open Access which fosters wider collaboration and increased citations maximum visibility for your research: over 100M website views per year

•
At BMC, research is always in progress.

Learn more biomedcentral.com/submissions
Ready to submit your research Ready to submit your research ?Choose BMC and benefit from: ? Choose BMC and benefit from: [11] chart showing selection of the study participantsMalaria Indicator Surveys (MIS) proposed by the Demographic Health Survey program[11].The framework required data that is included in the publicly available UMIS datasets and final report.
The rest of the results are presented in Table

Table 1
Distribution of under-five children by selected background characteristics

Table 2
Factors associated with prevalence and risk of malaria in under-five children AOR: Adjusted odds ratio, CI: Confidence interval, EA: Enumeration area, HH: Household ** p < 0.001, *p < 0.05

Table 3
Measures of EA level variations in the risk of under-five malaria CI: Confidence interval, ICC Intra-class correlation coefficient ** p < 0.001