Global variation in major health problems
Across the six WHO world regions, inequality was greater for each of the infectious diseases than for the major non-communicable diseases (Fig. 1). Inequality was most extreme for malaria burden (Gini coefficient, G = 0.77, 95% CI 0.66–0.81), and bootstrap confidence intervals showed it to be significantly higher than for each of the other diseases (Fig. 1 and Additional file 1: Table S1).
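A Gini coefficient with a bootstrap confidence interval of the kind reported above could be computed along the following lines. This is a minimal sketch. It assumes an unweighted Gini over per-region values and a simple percentile bootstrap that resamples regions with replacement. The weighting and bootstrap variant behind the published estimates may differ, and the function names and input values are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def gini(values: Sequence[float]) -> float:
    """Unweighted Gini coefficient.

    Uses the sorted-rank form, which equals the mean absolute difference
    between all pairs of values divided by twice the mean.
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("G is undefined for an empty or all-zero set")
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def bootstrap_ci(values: Sequence[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample units with replacement, recompute G."""
    rng = random.Random(seed)
    n = len(values)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        if sum(sample) > 0:  # skip the rare all-zero resample, where G is undefined
            stats.append(gini(sample))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


# Hypothetical per-region burden values, for illustration only.
rates = [220.0, 55.0, 7.0, 4.0, 2.0, 0.0]
g = gini(rates)
lo, hi = bootstrap_ci(rates)
print(f"G = {g:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Under this unweighted formula, G for six units cannot exceed (n − 1)/n = 5/6 ≈ 0.83. Every resampled value is therefore capped there as well.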
Removing the European WHO region, which had no reported malaria cases in 2018, did not greatly reduce the index of inequality (G = 0.73, 95% CI 0.59–0.77). Removing the African WHO region, which contains approximately 90% of all malaria cases, showed residual inequality among the remaining regions to be much lower but still substantial (G = 0.40, 95% CI 0.26–0.54). This reflects that most other cases globally occur in Asia or the Western Pacific. Estimates from the World Malaria Report showed no decline in global inequality between 2010 and 2018 (G remained between 0.76 and 0.78 in each year; Additional file 1: Table S2).
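The region-removal checks above amount to a leave-one-out sensitivity analysis. The sketch below recomputes G after dropping each unit in turn. It uses an unweighted Gini, and the unit labels and values are hypothetical placeholders, not the World Malaria Report estimates.

```python
from typing import Dict, List


def gini(values: List[float]) -> float:
    """Unweighted Gini coefficient (sorted-rank form)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("G is undefined for an empty or all-zero set")
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def leave_one_out(burden: Dict[str, float]) -> Dict[str, float]:
    """Recompute G after dropping each unit in turn (a simple sensitivity check)."""
    return {
        dropped: gini([v for name, v in burden.items() if name != dropped])
        for dropped in burden
    }


# Hypothetical burden values for four units, for illustration only.
burden = {"A": 100.0, "B": 2.0, "C": 1.0, "D": 0.0}
print(round(gini(list(burden.values())), 3))
for name, g in leave_one_out(burden).items():
    print(name, round(g, 3))
```

In this toy example, dropping the zero-burden unit "D" lowers G only modestly. Dropping the dominant unit "A" lowers it much more.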
Variation in malaria within West Africa
As the African region carries the majority of the malaria burden, and more than half of the cases are estimated to occur in West Africa, we analysed inequality among the 16 countries that constitute the West African sub-region according to the UN definition. Estimated malaria burden, expressed relative to each country's population, varied among West African countries between 2010 and 2018, and the Gini coefficient reveals persistent inequality (Fig. 2 and Additional file 1: Table S3). Malaria declined moderately overall, with notable reductions in a few countries. Even so, the average burden remains high and variation among countries persists (Fig. 2a). Accordingly, the Gini coefficient remained high, between 0.27 and 0.32 in each year (Fig. 2b). In 2018, the final year estimated, there was still marked inequality in malaria burden among countries in West Africa (G = 0.31, 95% CI 0.22–0.39), with no indication of any reduction since 2010.
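Because burden is expressed relative to each country's population, one option is a population-weighted Gini, in which every person is assigned their country's rate. The text does not state whether country values were weighted by population, so this is an assumption. The case and population figures below are hypothetical, not estimates for any West African country.

```python
from typing import Sequence


def weighted_gini(rates: Sequence[float], weights: Sequence[float]) -> float:
    """Population-weighted Gini coefficient of per-capita rates.

    Each person is assigned the rate of their country, giving
    G = sum_ij w_i w_j |x_i - x_j| / (2 W^2 mu),
    where W is the total weight and mu the weighted mean rate.
    With equal weights this reduces to the unweighted Gini.
    """
    W = sum(weights)
    mu = sum(w * x for w, x in zip(weights, rates)) / W
    if mu == 0:
        raise ValueError("G is undefined when every rate is zero")
    num = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(rates, weights)
        for xj, wj in zip(rates, weights)
    )
    return num / (2 * W * W * mu)


# Hypothetical cases and populations for three countries in two years.
data = {
    2010: {"cases": [9.0e6, 2.0e6, 0.5e6], "pop": [30e6, 10e6, 5e6]},
    2018: {"cases": [8.0e6, 1.8e6, 0.3e6], "pop": [34e6, 11e6, 6e6]},
}
for year, d in data.items():
    rates = [c / p for c, p in zip(d["cases"], d["pop"])]
    print(year, round(weighted_gini(rates, d["pop"]), 3))
```

Tracking this value year by year gives a series comparable in spirit to Fig. 2b. The double loop is quadratic in the number of countries, which is negligible for 16 units.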
Sub-national variation in malaria within high burden countries in West Africa
Sub-national variation within four of the highest-burden countries was analysed using Malaria Indicator Survey data on malaria infection prevalence in children under 5 years of age. Surveys conducted between 2014 and 2016 allowed prevalence to be analysed for each principal administrative division within each country: States in Nigeria, Regions in Ghana and Burkina Faso, and Districts in Sierra Leone (Fig. 3 and Additional file 1: Table S4). Although Sierra Leone had the highest mean prevalence, sub-national inequality in malaria parasite prevalence was greatest in Nigeria (G = 0.30, 95% CI 0.26–0.35), followed by Burkina Faso (G = 0.25, 95% CI 0.19–0.29). Sub-national inequality was more moderate within Ghana (G = 0.18, 95% CI 0.12–0.25) and Sierra Leone (G = 0.17, 95% CI 0.12–0.22).
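Comparing two countries by whether their separate confidence intervals overlap is a conservative approach. An alternative is to bootstrap the difference in G directly. The sketch below is an assumption, not the analysis reported above. It resamples administrative units within each country independently and uses hypothetical prevalence values.

```python
import random
from typing import Sequence, Tuple


def gini(values: Sequence[float]) -> float:
    """Unweighted Gini coefficient (sorted-rank form); assumes a positive total."""
    x = sorted(values)
    n = len(x)
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * weighted / (n * sum(x)) - (n + 1) / n


def bootstrap_gini_difference(a: Sequence[float], b: Sequence[float],
                              n_boot: int = 2000, alpha: float = 0.05,
                              seed: int = 7) -> Tuple[float, float, float]:
    """Point estimate and percentile CI for G(a) - G(b).

    Administrative units are resampled with replacement within each
    country independently; prevalences are assumed strictly positive.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(gini(ra) - gini(rb))
    diffs.sort()
    k = int(alpha / 2 * n_boot)
    return gini(a) - gini(b), diffs[k], diffs[n_boot - k - 1]


# Hypothetical sub-national prevalences (proportions) for two countries.
country_x = [0.05, 0.10, 0.40, 0.60, 0.30, 0.02, 0.50, 0.25]
country_y = [0.30, 0.32, 0.35, 0.28, 0.31, 0.33]
est, lo, hi = bootstrap_gini_difference(country_x, country_y)
print(f"difference = {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

If the interval for the difference excludes zero, that suggests a difference in sub-national inequality between the two countries under this resampling scheme.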
Seasonal malaria heterogeneity among villages in a highly endemic area
Local variation was investigated among villages in the Garki Project, a large study conducted in northern Nigeria in the early 1970s [25]. P. falciparum prevalence was compared among 16 villages with data at eight survey timepoints (each separated by approximately 10 weeks) during the pre-intervention phase of the project (Fig. 4a and Additional file 1: Table S5). Prevalence showed an overall seasonal peak in the late wet season and immediate post-wet season (surveys 5 and 6). It was lower during the annual dry seasons (surveys 2 and 3 in one year, and surveys 7 and 8 in the following year). The Gini coefficients were highest at survey timepoints 2, 3 and 7, which coincide with the dry seasons (Fig. 4b). This demonstrates that the Gini coefficient can track local variation in epidemiology, including temporal changes and the effects of seasonality. To test whether variation among villages was consistent across years, the rank order of village prevalence at survey 2 and survey 7 (similar points in consecutive dry seasons) was compared and found to be significantly correlated (Spearman’s rho = 0.55, P = 0.028). This indicates that a significant component of the inter-village variation persisted for at least a year.
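A rank-correlation test between two surveys of the same villages can be sketched with the standard library alone. Here Spearman's rho is computed as the Pearson correlation of ranks, with tied values given their average rank. The P-value comes from a permutation test. That choice is an assumption, since the text does not state how P = 0.028 was obtained, and a t-approximation would be an equally standard choice. The village values are hypothetical placeholders, not Garki data.

```python
import random
from typing import List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p(x: Sequence[float], y: Sequence[float],
                  n_perm: int = 10000, seed: int = 3) -> float:
    """Two-sided Monte Carlo permutation P-value for rho (shuffles y against x)."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical prevalences for the same villages in two dry-season surveys.
survey_2 = [0.42, 0.55, 0.31, 0.60, 0.48, 0.37, 0.52, 0.45]
survey_7 = [0.40, 0.50, 0.35, 0.58, 0.41, 0.30, 0.57, 0.49]
print(round(spearman(survey_2, survey_7), 2), permutation_p(survey_2, survey_7))
```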
Data were then analysed from a study of 20 villages in The Gambia [26], where malaria endemicity is lower and surveys were conducted shortly after malaria had declined significantly throughout the country [27, 28]. Across all surveyed villages, using the data provided in the original publication, heterogeneity was higher in the dry season (G = 0.55) than in the wet season (G = 0.40) (Additional file 1: Figure S1). Restricting the analysis to the eastern part of The Gambia, where malaria prevalence is highest, also showed greater variation among villages in the dry season than in the wet season (Additional file 1: Figure S1). Variation among villages was generally higher than in the Garki Project data.
Comparison to the coefficient of variation
The Gini coefficient values presented here correlate strongly with values of the coefficient of variation (CV, the standard deviation divided by the mean) calculated for each of the same datasets (Spearman’s rho = 0.982, P < 10⁻⁴). Gini coefficient values ranged from 0.05 to 0.77 and CV values from 9% to 198%, with a strong correlation over the whole range (Fig. 5). Because the Gini coefficient is bounded between 0 and 1 and has bootstrap confidence intervals (not plotted in Fig. 5), it offers interpretive advantages over the unbounded CV.
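The contrast between a bounded and an unbounded index is clearest in the most extreme case, where one unit carries the whole burden. The sketch computes both indices for that case. It assumes the CV uses the population standard deviation (dividing by n). The text does not say which form was used, and the sample form would give somewhat larger values.

```python
import math
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Unweighted Gini coefficient (sorted-rank form)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation (population SD / mean), as a percentage."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return 100.0 * sd / mean


# Most extreme case: one unit carries the whole burden.
for n in (2, 6, 16, 101):
    extreme = [1.0] + [0.0] * (n - 1)
    print(n, round(gini(extreme), 3), round(cv_percent(extreme), 1))
```

Analytically, G = (n − 1)/n in this case, which approaches but never exceeds 1. By contrast, CV = √(n − 1) × 100%, which gives 100% for two units and 1,000% for 101 units.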
Discussion
As shown here, the Gini coefficient effectively summarizes inequality in malaria burden among populations in a single index. Among the leading global infectious and non-communicable public health problems, malaria shows the greatest inequality among world regions. Its Gini coefficient of 0.77 is closer to the theoretical maximum of 1.0 than to zero, which would indicate an equitable distribution. This coefficient has not declined in recent years. Increased effort is therefore clearly needed to reduce the malaria burden in the most highly affected African region, while sustaining recent reductions elsewhere. This global need is already qualitatively clear [4]. However, the Gini coefficient highlights how extreme the situation is for malaria compared with other diseases, and it shows that inequality can be measured, which is essential for monitoring future progress.
Equally important, the Gini coefficient is also useful for summarizing inequality at other population levels, from regional to local. West Africa is the sub-region with the highest overall malaria burden globally. Within it, the coefficient shows that inequality among countries has not declined in recent years, reflecting that relative reductions in malaria burden have not been particularly large in the countries with the most malaria. Moreover, the degree of sub-national inequality differs significantly among the four high-burden West African countries analysed. For example, national Malaria Indicator Surveys, which use broadly comparable methods, show more inequality in infection prevalence among states in Nigeria than among the major administrative areas within Ghana or Sierra Leone. The causes of such sub-national inequality are likely to be complex and require more research attention to inform future malaria control.
The Gini coefficient is sensitive to village-level, area-level and seasonal variation, as illustrated here by re-analyses of survey data from research studies previously conducted in different parts of West Africa. The coefficient has features that potentially make it a more useful descriptor of epidemiological heterogeneity than other summary indices. It has defined lower and upper bounds of 0 (perfect equality) and 1 (perfect inequality), whereas the coefficient of variation (CV), which is based on the standard deviation, has no upper bound. Although the two indices are strongly correlated in their quantification of heterogeneity, the CV summarizes variation on an unbounded scale that can exceed 100%. The Gini coefficient is therefore more appropriate for use in epidemiological studies and disease reports, enabling a more standardized quantitative interpretation of inequality.
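The bounds of the Gini coefficient follow from its Lorenz-curve form, G = 1 − 2A, where A is the area under the Lorenz curve. For non-negative values the curve lies between the line of equality (A = 1/2, G = 0) and the lower-right corner (A → 0, G → 1). The minimal sketch below gives each unit equal weight. A population-weighted version would instead advance the x-axis by each unit's population share.

```python
from typing import List, Sequence, Tuple


def lorenz_points(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Cumulative share of units vs cumulative share of burden, sorted ascending."""
    x = sorted(values)
    n, total = len(x), sum(x)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, xi in enumerate(x, start=1):
        running += xi
        points.append((k / n, running / total))
    return points


def gini_from_lorenz(values: Sequence[float]) -> float:
    """G = 1 - 2 * (area under the Lorenz curve), area by the trapezoid rule."""
    pts = lorenz_points(values)
    area = sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )
    return 1.0 - 2.0 * area


print(gini_from_lorenz([5, 5, 5, 5]), gini_from_lorenz([0, 0, 0, 1]))
```

With four equal values the area is 1/2 and G = 0. If one of four units carries everything, the area is 1/8 and G = 0.75.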
While the Gini coefficient is a useful descriptor, its limitations should be considered. Technically, bootstrap resampling is a generally robust method of calculating confidence intervals for the Gini coefficient. However, it has been suggested that bootstrap confidence intervals may be too narrow for small samples from uniform, normal or lognormal distributions [14, 29], and the robustness of these intervals increases with the number of sampled populations. Statistical methods have been developed to mitigate this issue by approximating the Lorenz curve of a lognormal distribution [30]. These could be investigated in future to check the sensitivity of the confidence interval estimates. Also, although this was not a particular issue with the data analysed here, Gini coefficients could be skewed by ‘small number bias’ if based on samples of populations with extremely low prevalence. In such situations, sampling noise systematically inflates the Gini coefficient.
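The ‘small number bias’ described above can be illustrated by simulation. The sketch assumes that every village shares the same true prevalence, so the true Gini is 0. Each village survey is assumed to test a fixed number of individuals with binomial sampling. The village count, sample size and prevalences are hypothetical choices, not drawn from the analysed studies.

```python
import random
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Unweighted Gini coefficient; NaN when no positives were observed."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if total == 0:
        return float("nan")
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def simulate_small_number_bias(true_prev: float, n_villages: int = 16,
                               n_tested: int = 100, n_reps: int = 200,
                               seed: int = 11) -> float:
    """Mean Gini of observed prevalences when every village has the SAME
    true prevalence, so any G > 0 reflects binomial sampling noise only."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        observed = [
            sum(rng.random() < true_prev for _ in range(n_tested)) / n_tested
            for _ in range(n_villages)
        ]
        g = gini(observed)
        if g == g:  # drop NaN replicates in which no infection was detected
            results.append(g)
    return sum(results) / len(results)


for prev in (0.40, 0.10, 0.01):
    print(prev, round(simulate_small_number_bias(prev), 3))
```

Under these assumptions the mean simulated G stays small at 40% prevalence but rises markedly at 1% prevalence, even though true inequality is zero in every scenario.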
Epidemiologically, the Gini coefficient must be recognized as a simple relative measure that does not capture absolute differences, and different distributions of measurements may produce the same Gini coefficient. Demographic and socioeconomic differences, as well as ecological, genetic and geographical determinants, all combine [31, 32]. The Gini coefficient neither explores nor separately accounts for the effect of each. Indeed, in its more usual application to economic inequality, it has been pointed out that use of the Gini coefficient should not preclude other analyses that help explain the underlying heterogeneity [33]. Clearly, the coefficient does not substitute for separate analyses of hypothesized epidemiological determinants, or for maps of disease distribution, where these are available or can be estimated [34, 35]. Instead, it should be applied alongside more detailed or qualitative data. It can then be used to advocate a focus on the populations most affected and where malaria control is most needed, with the aim of reducing the extreme inequity that continues to prevail at multiple levels.
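Two properties mentioned above can be checked directly: G is a relative measure, and different distributions can share one value. The sketch uses an unweighted Gini and arbitrary illustrative numbers.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Unweighted Gini coefficient (sorted-rank form of the mean absolute difference)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# 1. Relative measure: multiplying every value by 10 leaves G unchanged,
#    so a uniformly tenfold higher burden is invisible to G.
low_burden = [0.01, 0.02, 0.03]
high_burden = [10 * v for v in low_burden]

# 2. Different distributions, same G (both equal 1/3):
one_unit_spared = [0.0, 1.0, 1.0]   # one unit has none, the others are equal
one_unit_hotspot = [1.0, 1.0, 4.0]  # one unit has four times the others

print(gini(low_burden), gini(high_burden))
print(gini(one_unit_spared), gini(one_unit_hotspot))
```

The second pair describes epidemiologically different situations, a single spared population versus a single hotspot, that G alone cannot distinguish.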