Age and geographic patterns of Plasmodium falciparum malaria infection in a representative sample of children living in Burkitt lymphoma-endemic areas of northern Uganda

Background
Falciparum malaria is an important risk factor for African Burkitt lymphoma (BL), but few studies have evaluated malaria patterns in healthy BL-age children in populations where both diseases are endemic. To obtain accurate current data, patterns of asymptomatic malaria were investigated in northern Uganda, where BL is endemic.

Methods
Between 2011 and 2015, 1150 apparently healthy children under 15 years old were sampled from 100 villages in northern Uganda using a stratified, multi-stage, cluster survey design. Falciparum malaria prevalence (pfPR) was assessed by questionnaire, rapid diagnostic test (RDT) and thick film microscopy (TFM). Weighted pfPR and unadjusted and adjusted associations of prevalence with covariates were calculated using logistic models and survey methods.

Results
Based on 1143 children successfully tested, weighted pfPR was 54.8% by RDT and 43.4% by TFM. Compared with TFM, RDT sensitivity and specificity were 97.5% and 77.8%, respectively, because RDTs detect malaria antigens, which persist in peripheral blood after clinical malaria; results based on RDT are therefore reported. Weighted pfPR increased from 40% in children aged under 2 years to 61.8% in children aged 6–8 years (odds ratio (OR) 2.42, 95% confidence interval (CI) 1.26–4.65), then fell slightly to 49% in those aged 12–15 years. Geometric mean parasite density was 1805.5 parasites/µL (95% CI 1344.6–2424.3) among TFM-positive participants; it was higher in children aged <5 years at 5092.9/µL (95% CI 2892.7–8966.8) and lower in those aged ≥10 years at 983.8/µL (95% CI 472.7–2047.4; P = 0.001). Weighted pfPR was lower in children residing in sub-regions employing indoor residual spraying (IRS) than in those residing in non-IRS sub-regions (32.8% versus 65.7%; OR 0.26, 95% CI 0.14–0.46). However, pfPR varied within both IRS (3.2–55.3%) and non-IRS sub-regions (29.8–75.8%; P for heterogeneity <0.001). pfPR was inversely correlated with the income of the child's mother (P = 0.011) and positively correlated with enrollment in the wet season (P = 0.076), but was not associated with sex.

Conclusions
The study observed high but geographically and demographically heterogeneous patterns of asymptomatic malaria prevalence among children living in northern Uganda. These results provide important baseline data that will enable precise evaluation of associations between malaria and BL.

Electronic supplementary material
The online version of this article (doi:10.1186/s12936-017-1778-z) contains supplementary material, which is available to authorized users.


Appendix 1. Calculation of sampling weights
Primary sampling weights. In the first stage of sampling, 100 Enumeration Areas (EAs), which constituted the primary sampling units (PSUs), were selected from the following four strata: near ambient water (wet) with low population density (rural) (WR), far from ambient water (dry) with low population density (DR), near ambient water with high population density (urban) (WU), and far from ambient water with high population density (DU). Of the 100 EAs, 12 were selected in the pilot phase of the study, followed by sampling of an additional 88 EAs for the main study. Specifically, we sampled (4, 4, 2, 2) pilot EAs from (

Tertiary sampling weights. At this level, children were sampled from a list of all eligible children in a given village, conditioning on age group at the time of census (0-<3, 3-<6, 6-<9, 9-<12, 12+ years of age) and sex. The number of eligible children in a given age group was adjusted for infant and child mortality as estimated by the UBOS in the 2006 survey [1]. According to UBOS, infant mortality (mortality in the first year of life) was 76 per 1000 live births, and child mortality (mortality between age one and five) was 67 per 1000 live births. Because the lowest two age groups for sampling weights were 0-<3 and 3-<6, the mortality rates were approximated for those age groups. The mortality rate for children aged 2-<6 was assumed to be constant. The mortality rate for children in the 0-<3 age group was then approximated as the infant mortality rate plus the average annual child mortality rate over two years: (76 + 67/4*2)/1000 = 0.1095. The mortality rate for children in the 3-<6 age group was approximated by the average annual child mortality rate over a three-year period: (67/4*3)/1000 = 0.05025. The number of eligible children in the sampling weight calculation was therefore multiplied by (1000 - 76 - 67/4*2)/1000 = 0.8905 for the 0-<3 age group and by (1000 - 67/4*3)/1000 = 0.94975 for the 3-<6 age group.
Thus, the tertiary sampling weights were equal to the number of eligible children in a given sex and age group (adjusted for mortality in the lowest two age groups) divided by the number selected in that sex and age group in a given village. For the subjects who were originally sampled in the pilot villages (n = 113), tertiary sampling weights were additionally multiplied by the number of eligible pilot controls divided by the number of selected pilot subjects in a given sex and age group.
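
The mortality adjustment and the resulting tertiary weight can be checked with a few lines of R. This is only an illustrative sketch: the mortality figures are those quoted above, while the village cell (40 eligible children, 5 selected) and all variable names are hypothetical and do not come from the study data.

```r
# Mortality figures quoted in the text (UBOS 2006 survey), per 1000 live births
infant_mortality <- 76  # first year of life
child_mortality  <- 67  # between ages one and five (a four-year span)

# Average annual child mortality, assuming a constant rate over the four years
annual_child <- child_mortality / 4

# Approximate mortality for the two youngest sampling age groups
mort_0_3 <- (infant_mortality + annual_child * 2) / 1000  # 0-<3 years
mort_3_6 <- (annual_child * 3) / 1000                     # 3-<6 years

# Survival multipliers applied to the number of eligible children
surv_0_3 <- 1 - mort_0_3
surv_3_6 <- 1 - mort_3_6

# Hypothetical sex-by-age cell in one village (made-up counts):
# 40 eligible children aged 0-<3, of whom 5 were selected
n_eligible <- 40
n_selected <- 5
tertiary_weight <- n_eligible * surv_0_3 / n_selected

print(c(mort_0_3 = mort_0_3, mort_3_6 = mort_3_6,
        surv_0_3 = surv_0_3, surv_3_6 = surv_3_6,
        tertiary_weight = tertiary_weight))
```

For subjects from pilot villages, the weight computed this way would additionally be multiplied by the pilot ratio described above.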

Appendix 2. Computational details and R code used for the analysis
Sampling design setup
The multistage sampling design and the computation of the sampling weights (denoted by the variable "weights1234" below) are described in detail in Figure 1 in the main paper and in Appendix 1. In brief, the sampling design of our study is defined by strata (4 strata, referred to as "sampling.strata" from here on), enumeration areas (EAs), which are the primary sampling units (PSUs), and village. In the computations, we accommodate clustering at the village level, noting that there is a one-to-one correspondence between PSU and village.
The 'survey' package (version 3.31) in R [2, 3] was used for all the weighted analyses. In this package the design is incorporated by
    d.svy <- svydesign(ids = ~village, weights = ~weights1234, strata = ~sampling.strata, data = d)
The svydesign object d.svy is passed to all the functions that follow. The presentation below describes the R functions used for each Table.

1. Weighted contingency tables: totals and percentages (Table 1)
    svytotal(~interaction(malaria_RDT, malaria_microscopy), design = d.svy)
The percentages are the counts in each cell divided by their sum.
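
As a rough illustration of how weighted cell totals become percentages, the sketch below builds a small simulated data set and computes the weighted RDT-by-microscopy table in base R. All data values are made up, the 8-village layout is an assumption chosen only so that each stratum has two clusters, and the base R calculation reproduces only the point estimates (not the design-based standard errors). If the 'survey' package is installed, the same percentages are also derived from the svytotal call shown above.

```r
set.seed(1)

# Simulated stand-in data: 8 hypothetical villages, 2 per stratum, 10 children each
d <- data.frame(
  village         = rep(1:8, each = 10),
  sampling.strata = rep(c("WR", "DR", "WU", "DU"), each = 20),
  weights1234     = runif(80, min = 50, max = 150)
)
d$malaria_microscopy <- rbinom(80, 1, 0.4)
# In this toy example the RDT result agrees with microscopy about 85% of the time
agree <- runif(80) < 0.85
d$malaria_RDT <- ifelse(agree, d$malaria_microscopy, 1 - d$malaria_microscopy)

# Weighted cell totals: sum of sampling weights in each RDT x microscopy cell
cell_totals <- xtabs(weights1234 ~ malaria_RDT + malaria_microscopy, data = d)

# Percentages: each weighted cell total divided by the sum over all cells
cell_percent <- 100 * cell_totals / sum(cell_totals)
print(round(cell_percent, 1))

# Optional comparison with the 'survey' package, if it is available
if (requireNamespace("survey", quietly = TRUE)) {
  d.svy <- survey::svydesign(ids = ~village, weights = ~weights1234,
                             strata = ~sampling.strata, data = d)
  tot <- survey::svytotal(~interaction(malaria_RDT, malaria_microscopy),
                          design = d.svy)
  print(round(100 * coef(tot) / sum(coef(tot)), 1))
}
```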

Weighted proportions and odds ratios (ORs) and 95% confidence intervals (CIs) (Table 3, Supplementary Table 1)
To obtain weighted proportions we used the function:
    svyby(~outcome, by = ~covariate, FUN = svymean, design = d.svy)
This function was called separately for each covariate in the rows of the table. The outcome for Table 2 is "malaria positivity by RDT"; for Supplementary Table 2 it is "microscopy" and "joint RDT and microscopy results".
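
For a 0/1 outcome, the weighted proportion within each covariate level is the weighted mean of the outcome in that level. The sketch below shows this on simulated data, using sex as a stand-in covariate; the data, the 8-village layout and the choice of covariate are assumptions for illustration only. The base R step gives point estimates; the optional svyby call, run only if the 'survey' package is installed, also returns design-based standard errors.

```r
set.seed(2)

# Simulated stand-in data (all values are made up)
d <- data.frame(
  village         = rep(1:8, each = 10),
  sampling.strata = rep(c("WR", "DR", "WU", "DU"), each = 20),
  weights1234     = runif(80, min = 50, max = 150),
  sex             = sample(c("female", "male"), 80, replace = TRUE),
  malaria_RDT     = rbinom(80, 1, 0.5)
)

# Weighted proportion of RDT positives within each level of the covariate
weighted_prop <- sapply(split(d, d$sex), function(g) {
  weighted.mean(g$malaria_RDT, g$weights1234)
})
print(round(weighted_prop, 3))

# Optional: the same estimates with standard errors from the 'survey' package
if (requireNamespace("survey", quietly = TRUE)) {
  d.svy <- survey::svydesign(ids = ~village, weights = ~weights1234,
                             strata = ~sampling.strata, data = d)
  print(survey::svyby(~malaria_RDT, by = ~sex, FUN = survey::svymean,
                      design = d.svy))
}
```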
P-values for the association of a covariate with the outcome in the unadjusted and adjusted logistic models were calculated from the corresponding svyglm object using:
    regTermTest(svyglm.object, test.terms = ~ covariate, method = 'Wald')
where strata.water is a strata variable based on distance of the village from water. To assess the age effect in more detail, we fit a model with linear and quadratic terms for age, coded using the midpoints of the 0-2, 3-5, 6-8, 9-11 and 12-15 year intervals, using:

P-values for the overall test of heterogeneity (
Model with a linear and quadratic effect of age on RDT positivity:
    svyglm.object.m2 <- svyglm(outcome ~ age.group.midpoint + I(age.group.midpoint^2), family = quasibinomial(), design = d.svy)
with age.group.midpoint coded as a numeric variable of midpoints of age groups listed in the table.
The overall p-value for the age effect in the model with a linear and quadratic effect of age was computed from:
    regTermTest(svyglm.object.m2, test.terms = ~ age.group.midpoint + I(age.group.midpoint^2), method = 'Wald')
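
The midpoint coding and the quadratic age model can be sketched on simulated data as follows. The numeric midpoints used here (1, 4, 7, 10 and 13.5 for the 0-2, 3-5, 6-8, 9-11 and 12-15 year intervals) are an assumption, since the exact values are not stated in the text; the data, the 20-village layout and the simulated age effect are likewise made up. The model is fitted only if the 'survey' package is installed.

```r
set.seed(3)
n <- 200

# Simulated stand-in data: 20 hypothetical villages, 5 per stratum
d <- data.frame(
  village         = rep(1:20, each = 10),
  sampling.strata = rep(c("WR", "DR", "WU", "DU"), each = 50),
  weights1234     = runif(n, min = 50, max = 150),
  age             = sample(0:15, n, replace = TRUE)  # age in completed years
)

# Map age in years to the midpoint of its age group (assumed midpoints)
breaks    <- c(0, 3, 6, 9, 12, 16)  # intervals [0,3), [3,6), [6,9), [9,12), [12,16)
midpoints <- c(1, 4, 7, 10, 13.5)
group_idx <- cut(d$age, breaks = breaks, right = FALSE, labels = FALSE)
d$age.group.midpoint <- midpoints[group_idx]

# Simulated 0/1 outcome whose log-odds rise and then fall with age
log_odds  <- -0.6 + 0.3 * d$age.group.midpoint - 0.02 * d$age.group.midpoint^2
d$outcome <- rbinom(n, 1, plogis(log_odds))

if (requireNamespace("survey", quietly = TRUE)) {
  d.svy <- survey::svydesign(ids = ~village, weights = ~weights1234,
                             strata = ~sampling.strata, data = d)
  # Survey-weighted logistic model with linear and quadratic age terms
  svyglm.object.m2 <- survey::svyglm(
    outcome ~ age.group.midpoint + I(age.group.midpoint^2),
    family = quasibinomial(), design = d.svy)
  # Joint Wald test of both age terms
  print(survey::regTermTest(
    svyglm.object.m2,
    test.terms = ~ age.group.midpoint + I(age.group.midpoint^2),
    method = "Wald"))
}
```

Testing both terms jointly gives a single p-value for any age effect, linear or curved, which is why the two terms appear together in test.terms.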