The temporal lagged association between meteorological factors and malaria in 30 counties in south-west China: a multilevel distributed lag non-linear analysis

Background The association between malaria and meteorological factors is complex because it is both lagged and non-linear. Studies that did not fully account for these characteristics have reached inconsistent conclusions. Investigating the lagged correlation pattern between malaria and climatic variables may improve the understanding of the association and lead to better prediction models. This is especially beneficial for south-west China, which is a high-incidence area in China.
Methods Thirty counties in south-west China were selected, and the corresponding weekly malaria cases and four weekly meteorological variables were collected from 2004 to 2009. The Multilevel Distributed Lag Non-linear Model (MDLNM) was used to study the temporal lagged correlation between weekly malaria cases and weekly meteorological factors. The counties were divided into two groups, hot weather and cold weather, in order to compare the association under different climatic conditions and to improve reliability and generalizability within similar climatic conditions.
Results Rainfall was associated with malaria cases in both hot and cold weather counties with a lagged correlation, and its lag range was longer than those of the other meteorological factors. The lag range was also longer in hot weather counties than in cold weather counties. Relative humidity was correlated with malaria cases at early and late lags in hot weather counties. Minimum temperature had a longer lag range and larger correlation coefficients in hot weather counties than in cold weather counties. Maximum temperature was only associated with malaria cases at early lags.
Conclusion Using weekly malaria cases and meteorological information, this work studied the temporal lagged association pattern between malaria cases and meteorological information in south-west China. The results suggest that different meteorological factors show distinct patterns and magnitudes of lagged correlation, and that the patterns depend on the climatic condition. Existing inconsistent findings on the lags of climatic factors could be due either to the invalid assumption of a single fixed lag or to the distinct temperature conditions of different study sites. The lag pattern of meteorological factors should be considered in the development of malaria early warning systems.


Background
Malaria is an important cause of death and illness in children and adults in tropical countries. Globally, the World Health Organization estimates that in 2010, 219 million clinical cases of malaria occurred, and 660,000 people died of malaria [1]. Despite significant reductions in the overall burden of malaria in the 20th century, malaria remains a significant public health issue in China, especially in the southern part of the mainland. Particularly, Yunnan Province used to be the highest endemic province [2]. For south-west China, the majority of previous studies focused on spatiotemporal pattern for mortality or morbidity [3][4][5], or pathogenic classifications of reported cases [6].
Malaria is transmitted by female mosquitoes of the genus Anopheles, and the transmission and prevalence of malaria are influenced by many factors, in which meteorological factors are considered to play a crucial role. However, researchers still have a poor understanding of the mechanistic link between climate and malaria risk [7][8][9][10]. Many studies were conducted to explore the link with inconsistent findings reported, and the nature and extent of the link remains highly controversial [11][12][13][14]. See [7] for a recent review of existing studies supporting and rebutting the role of climatic change as a driving force for highland invasion by malaria. Some existing studies in China made contradictory conclusions [15]. For example, while [16][17][18] found that rainfall was closely related to malaria incidence, [19][20][21] failed to identify such an association. Similar inconsistent results were also reported in sub-Saharan Africa [22].
Biologically, climate is fundamentally associated with malaria incidence through its effects on both the mosquito vector and the development of the malaria parasite inside the vector. Two aspects of the meteorological effects require special attention: the lag and the non-linearity. On the one hand, most time series studies have provided evidence of an association between meteorological variables and malaria, typically at a single lag of 0, 1 or 2 months [23][24][25][26][27]. However, the assumption of a single fixed lag is not plausible for describing population-level associations. From the perspective of biological mechanism, several periods need to be considered for the lag effect, such as the time for the mosquito to develop, the development period of the parasite within the mosquito, and the incubation period of the parasite within the human body. The duration of every stage varies, resulting in a smoothly varying lag distribution at the population level between climatic factors and malaria cases. On the other hand, a non-linear effect has been recognized for temperature, and a substantial number of laboratory and epidemiological studies have confirmed the non-linear correlation between temperature and malaria [13,18,28-30]. Similar potential non-linear correlations have also been proposed for rainfall [30][31][32].
The association between malaria and meteorological factors is complex due to the above two characteristics. Existing inconsistent findings may have two causes. On the one hand, regional variation means that distinct regions have different association patterns. On the other hand, invalid statistical assumptions arise either from misspecification of a single fixed lag or from the assumption of a linear relationship.
At present, few studies have examined the pattern of the delayed effect of meteorological factors after accounting for nonlinearity. Moreover, the comprehensive lag pattern of meteorological factors has not been examined in China. One study [33] investigated the lag pattern of rainfall in Anhui Province in China using monthly data. However, monthly data are relatively coarse for describing a lag pattern, and other crucial meteorological variables were not included in that analysis.
The purpose of this work is to explore the lagged association between meteorological variables and malaria in south-west China. Specifically, a Multilevel Distributed Lag Non-linear Model (MDLNM) was used to study the temporal lagged correlation between weekly malaria cases and weekly meteorological factors using data from 2004 to 2009 in 30 counties in south-west China. With these more reasonable models, a better understanding can be obtained of the association between climatic variables and malaria transmission, testing the biological hypothesis at the epidemiological level. The results may also help to improve forecasting of changes in malaria incidence, which would inform public health authorities on how to distribute resources effectively for malaria control programmes.

Study sites
South-west China (21°14′ to 34°31′N, 97°35′ to 110°19′E) consists of four provinces, Sichuan, Chongqing, Yunnan and Guizhou. The area has a population of 189,977,077 (sixth national census in 2010) and encompasses 1,137,570 square kilometres. There are 483 counties (county-level cities and districts). Thirty counties were selected as the study sites based on the availability of malaria and meteorological data. The malaria data covered all 483 counties, while only 131 counties had a daily meteorological record; these datasets are described in detail in the next section. The counties with both malaria and meteorological data were sorted by average annual incidence, and the top 30 counties were included in the analysis. See Figure 1 for a map of the 483 counties in south-west China and the selected 30 counties.
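The selection rule described above (keep counties that have both data sources, rank by average annual incidence, take the top 30) can be sketched as follows. This is a minimal illustration only: the `County` record, its field names and the incidence unit are hypothetical and not taken from the study's data files.

```python
from dataclasses import dataclass


@dataclass
class County:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    mean_annual_incidence: float  # e.g. cases per 10,000 population per year
    has_weather_station: bool     # True if a daily meteorological record exists


def select_study_counties(counties, n=30):
    """Keep counties with both data sources, then take the n with the highest incidence."""
    eligible = [c for c in counties if c.has_weather_station]
    ranked = sorted(eligible, key=lambda c: c.mean_annual_incidence, reverse=True)
    return ranked[:n]
```

Counties without a monitoring station are dropped before ranking, so a high-incidence county without meteorological data never enters the study set.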

Data collection and management
Meteorological data were collected from the publicly available Chinese Meteorological Data Sharing Service System [34], which was constructed by the Chinese National Meteorological Information Centre. There are 836 meteorological monitoring stations with daily records in the whole of China, 131 of them in the south-west. Approximately 3-4 counties (483/131) share one monitoring station, and no county has two monitoring stations. A monitoring station was assumed to represent the county in which it is located. This assumption has been widely made in existing studies, both for malaria [4,16-18,20,27,35] and for other mosquito-borne diseases [36]. As mentioned in the last section, the monitoring stations located in the high-incidence counties, and the corresponding counties, were used. Four weekly meteorological variables from July 2003 to December 2009 were obtained for the 30 selected counties: rainfall, mean relative humidity, mean minimum temperature and mean maximum temperature. Rainfall and temperature are measured in millimetres (mm) and degrees centigrade (°C), respectively.
Weekly malaria cases in the 30 counties from 2004 to 2009 were obtained from the Chinese Centre for Disease Control and Prevention (CCDC). At the county level, it is reasonable to assume that malaria heterogeneity is not great, which is a usual assumption in existing studies [5,37,38]. In addition, as the interest is in the effect of climatic variables, heterogeneity caused by other factors should not influence the result unless those factors are related to the meteorological variables. The malaria data collection was facilitated by the Chinese Information System for Infectious Diseases Control and Prevention (CISIDCP). CISIDCP was established on the basis of individual cases and public health emergencies. A Virtual Private Network (VPN) was constructed, and information on individual cases is reported directly to the national database through the internet. This system covers all health data sources and reports new malaria cases to CCDC within 24 hours [39]. Although the malaria cases observed in the 30 counties included both Plasmodium vivax and Plasmodium falciparum infections, most records did not distinguish between the parasites. Population data for every county from 2004 to 2009 were retrieved from the National Bureau of Statistics of China.

Stratification by temperature
The 30 counties were divided into 15 hot weather and 15 cold weather counties according to the mean minimum temperature, in order to examine the differences between these two groups. Moreover, this approach will lead to more reliable and precise estimates for a given condition, making more generalizable results within similar climatic conditions.
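A minimal sketch of such a stratification is given below. The text states only that the split was made according to mean minimum temperature and produced two groups of 15; ranking the counties and cutting at the median is an assumption made here for illustration, and the function and argument names are hypothetical.

```python
def stratify_by_min_temperature(mean_min_temp):
    """Split counties into two equally sized groups by mean minimum temperature.

    mean_min_temp: dict mapping county name -> long-run mean of the weekly
    minimum temperature (degrees C). Assumed rule: rank and cut at the median.
    With an odd number of counties the extra county falls in the hot group.
    """
    ranked = sorted(mean_min_temp, key=mean_min_temp.get)  # coldest first
    half = len(ranked) // 2
    return {"cold": ranked[:half], "hot": ranked[half:]}
```

Applied to 30 counties, this rule yields 15 cold weather and 15 hot weather counties, matching the group sizes reported above.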

Multilevel distributed lag non-linear models
The methodology of DLNMs was originally developed for time series data, and a thorough methodological overview is given in [40][41][42]. Distributed Lag Non-linear Models (DLNM) represent a modelling framework to describe simultaneously non-linear and delayed dependencies. To convey the basic idea of the DLNM, the model in this section includes just one meteorological factor, rainfall. The extension to the other factors is straightforward and is presented in the next section. The expected number of cases $E(Y_{it})$ in week t in county i was modelled by Poisson regression,

$$\log E(Y_{it}) = \log(d_{it}) + \beta_{i0} + \sum_{l=L^{r}_{\min}}^{L^{r}_{\max}} f\left(x_{i(t-l),r}, \beta_{rl}\right) \qquad (1)$$

Here, $d_{it}$ is the population in county i in week t; $\beta_{i0}$ is the intercept for county i; $L^{r}_{\min}$ and $L^{r}_{\max}$ represent the minimum and maximum of the lag range; and $x_{i(t-l),r}$ is the rainfall in county i in week t − l. Model (1) involves two basis functions, for the non-linear and lag effects, respectively. One function is $f(x_{i(t-l),r}, \beta_{rl})$, which is the non-linear effect of the rainfall l weeks before. Many functional forms can be chosen for $f(x_{i(t-l),r}, \beta_{rl})$, such as a polynomial function. The other function constrains the parameters $\beta_{rl}$. Since rainfall in weeks close together is substantially correlated, the above regression will have a high degree of collinearity, which will result in unstable estimates of the individual $\beta_{rl}$'s. To gain more efficiency and more insight into the distributed effect of rainfall over time, it is useful to constrain the $\beta_{rl}$'s. If this is done flexibly, substantial gains in reducing the noise of the unconstrained distributed lag model can be obtained, with minimal bias [43].
The next section concentrates on the choices of the two basis functions and the range of the lag effect, $L^{r}_{\min}$ and $L^{r}_{\max}$.
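To make the idea of constraining the lag coefficients concrete, the sketch below builds a matrix of lagged rainfall and replaces its many collinear columns by a few polynomial-weighted sums. It is an illustration under stated assumptions, not the authors' implementation: the rainfall series is synthetic, the effect of rainfall is taken as linear for simplicity, the lag window of 4 to 15 weeks is the one specified for rainfall in the following section, and the helper names `lag_matrix` and `polynomial_lag_basis` are hypothetical.

```python
import numpy as np


def lag_matrix(x, lag_min, lag_max):
    """Column j holds x[t - (lag_min + j)]; weeks without a full history are dropped."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(lag_min, lag_max + 1)
    rows = np.arange(lag_max, len(x))          # first week with all lags available
    return np.column_stack([x[rows - l] for l in lags]), lags


def polynomial_lag_basis(lags, degree):
    """Matrix C with C[l, k] = s_l ** k (s = lag rescaled to [0, 1]), so beta = C @ theta."""
    scaled = (lags - lags.min()) / max(lags.max() - lags.min(), 1)
    return np.vander(scaled, degree + 1, increasing=True)


rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=10.0, size=300)   # synthetic weekly rainfall (mm)

Q, lags = lag_matrix(rain, lag_min=4, lag_max=15)   # 12 unconstrained lag columns
C = polynomial_lag_basis(lags, degree=3)            # 12 lags -> 4 polynomial terms
Z = Q @ C                                           # 4 constrained regressors
```

Regressing on the columns of `Z` estimates four coefficients `theta`, and the implied lag curve is recovered as `C @ theta`. The 12 lag-specific coefficients are therefore forced to vary smoothly with the lag, which is what stabilises the estimates.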

Lag range specification and other implement issues
The ranges of lag effects for the four meteorological variables were chosen according to [44], which gave an extensive overview of the lag range based on laboratory findings. 3-10 weeks were considered for temperatures. For the rainfall, instead of 4-12 weeks in [44], 4-15 weeks were specified to account for the possibility of longer range, which were reported in existing studies [32,45]. Relative humidity adopted the same lag range as rainfall.
The 3rd-order polynomial was used for both the non-linear and lag effects of meteorological variables. This choice was partly due to the flexibility of the 3rd-order polynomial and partly due to the requirement of parsimony.
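The combination of a polynomial in the exposure with a polynomial in the lag can be sketched as a cross-basis, as below. This is an illustrative construction under assumptions, not the code used in the study: the function name `cross_basis` is hypothetical, the exposure basis is taken as the powers 1 to 3 without an intercept, the lag basis as the powers 0 to 3 of the rescaled lag, and `lag_max` must exceed `lag_min`.

```python
import numpy as np


def cross_basis(x, lag_min, lag_max, var_degree=3, lag_degree=3):
    """Tensor product of a polynomial in the exposure and a polynomial in the lag.

    Returns an array with one row per week having a full lag history and
    var_degree * (lag_degree + 1) columns. Requires lag_max > lag_min.
    """
    x = np.asarray(x, dtype=float)
    lags = np.arange(lag_min, lag_max + 1)
    rows = np.arange(lag_max, len(x))
    Q = np.column_stack([x[rows - l] for l in lags])            # (weeks, n_lags)

    # Exposure basis: x, x^2, ..., x^var_degree evaluated at every lagged value.
    var_basis = np.stack([Q ** j for j in range(1, var_degree + 1)], axis=-1)

    # Lag basis: 1, s, ..., s^lag_degree with s the lag rescaled to [0, 1].
    s = (lags - lag_min) / (lag_max - lag_min)
    lag_basis = np.vander(s, lag_degree + 1, increasing=True)   # (n_lags, lag_degree + 1)

    # Sum over lags: W[t, j, k] = sum_l Q[t, l]^(j+1) * s_l^k
    W = np.einsum("tlj,lk->tjk", var_basis, lag_basis)
    return W.reshape(len(rows), -1)
```

With third-order polynomials in both dimensions, each meteorological variable contributes 12 columns regardless of the number of lags, which reflects the parsimony argument given above.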
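Correlations within one county would be greater than those between counties because of unmeasured (or perhaps unmeasurable) county-specific covariates. Therefore, $\beta_{i0}$ was given a multilevel structure as a random intercept, following a normal distribution with mean $\beta_0$ and variance $\sigma_0^2$. Including all meteorological variables results in the final model

$$\log E(Y_{it}) = \log(d_{it}) + \beta_{i0} + \sum_{l=L^{r}_{\min}}^{L^{r}_{\max}} f\left(x_{i(t-l),r}, \beta_{rl}\right) + \sum_{l=L^{h}_{\min}}^{L^{h}_{\max}} f\left(x_{i(t-l),h}, \beta_{hl}\right) + \sum_{l=L^{T_n}_{\min}}^{L^{T_n}_{\max}} f\left(x_{i(t-l),T_n}, \beta_{T_n l}\right) + \sum_{l=L^{T_x}_{\min}}^{L^{T_x}_{\max}} f\left(x_{i(t-l),T_x}, \beta_{T_x l}\right), \qquad \beta_{i0} \sim N(\beta_0, \sigma_0^2) \qquad (2)$$

where $x_{it,h}$, $x_{it,T_n}$ and $x_{it,T_x}$ denote the average relative humidity, the average minimum temperature and the average maximum temperature in county i in week t, respectively. $\beta_0$ is the average intercept over all counties, and $\sigma_0^2$ characterizes the variation of county-specific intercepts around the average intercept.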
One consequence of the stratification by temperature was that the two groups do not have the same range for the meteorological factors, especially for the temperatures. In addition, the lag pattern could differ at different meteorological values. For example, the lag pattern for weekly rainfall might differ between 13.1 mm and 26.1 mm of weekly rainfall. To deal with these two issues, for each of the four meteorological variables we selected three equally spaced values within the interval on which the two groups overlap substantially, in order to make the two groups comparable and to reveal the pattern of lag effects at different values of each variable. Zero was used as the reference value for all four climatic factors when reporting the results.
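The choice of three equally spaced values on an interval covered by both groups can be sketched as follows. The rule used here (25th percentile of the hot weather group as the lower end, 75th percentile of the cold weather group as the upper end) is the one the descriptive analysis reports for minimum temperature; applying the same rule to every variable is an assumption, and the function name `common_values` is hypothetical.

```python
import numpy as np


def common_values(hot, cold, n=3):
    """n equally spaced values between the 25th percentile of the hot group
    and the 75th percentile of the cold group (assumed overlap interval)."""
    lower = np.percentile(hot, 25)
    upper = np.percentile(cold, 75)
    return np.linspace(lower, upper, n)


# Endpoints reported for weekly mean minimum temperature (degrees C):
min_temp_values = np.linspace(11.72, 16.8, 3)   # the midpoint is 14.26
```

Evaluating the lag curves at the same three values in both groups is what allows the hot and cold weather panels to be compared directly.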
Lastly, as a sensitivity analysis for the choice of the constrained lag function, we also fitted different functional forms for the lag effect, specifically a 4th-order polynomial and a B-spline. The results showed no significant change. All analyses were implemented in R, a free software programming language and environment for statistical computing and graphics [46]. Specifically, we used two add-on packages, dlnm [47] and lme4 [48].

Descriptive analysis
A total of 21,944 malaria cases were reported in the selected 30 counties in south-west China from 2004 to 2009. Table 1 presents the descriptive analysis for the 30 counties, while Table 2 shows the Spearman correlation between meteorological variables. Minimum temperature was positively correlated with the remaining climatic factors, most strongly with maximum temperature (r = 0.782). Maximum temperature showed a weak correlation with rainfall and relative humidity. Finally, rainfall and relative humidity had a relatively strong correlation (r = 0.585). Tests of significance are omitted here, as such a large sample size almost always leads to a small P value, which is uninformative [49].
Figure 2 compares the meteorological variables between hot and cold weather counties. While the temperatures differ distinctly between the two groups, rainfall and relative humidity show similar distributions. Based on Figure 2, three common values for each meteorological variable were selected to make the hot and cold weather groups comparable. Taking minimum temperature as an example, 11.72°C was the 25th percentile of the weekly mean minimum temperature in hot weather counties, and 16.8°C was the 75th percentile of the weekly mean minimum temperature in cold weather counties. These two values and their mean (14.26°C) are used to report the lag pattern in the next section, as both groups cover these values. The same procedure was applied to the other three meteorological variables.

Multilevel distributed lag non-linear models
Figure 3 shows the estimates of the distributed lag between rainfall and malaria cases. First, the distributed lag curves have the same overall trend, an inverse-U shape, with the estimated relative risk increasing in the first half and decreasing in the second half. For the hot weather counties, the correlation becomes significant at approximately the 7th week, peaks during the 11th-12th weeks and ends with a non-significant correlation in the last week. The range of significant correlations in cold weather counties is markedly shorter than that in hot weather counties. Unlike in the hot weather counties, the lag range in cold weather counties widens as rainfall increases. Furthermore, at each rainfall value, the hot weather counties present higher relative risks than the cold weather counties. In addition, in both hot and cold weather counties the relative risk increases with rainfall, and the increase is greater at lower rainfall: almost 100 times from 0.1 mm to 13.1 mm, but little from 13.1 mm to 26.1 mm.
Figure 4 gives the estimates of the distributed lag relationship between relative humidity and malaria cases. In hot weather counties, there is a significant positive but decreasing correlation during the 4th-5th weeks; the correlation becomes non-significant in the middle of the lag range and is significant again during the 13th-15th weeks. By contrast, the association with relative humidity in cold weather counties is non-significant over almost the whole range.
Figure 5 shows the estimates of the relationship between minimum temperature and malaria cases. In the hot weather counties, minimum temperature shows a consistently significant association with malaria, usually starting from the 4th week. In contrast, the cold weather counties show a limited range of statistically significant association, usually ending at the 7th week. At the same minimum temperature value, the correlation in hot weather counties is greater than that in cold weather counties. The 3rd week in hot weather counties shows a statistically significant negative correlation, but the 95% confidence interval of its relative risk lies close to 1.
Figure 6 shows the estimates of the distributed lag between maximum temperature and malaria cases. The association is similar for hot and cold weather counties, with a significant association during the 3rd-4th weeks.
In Figures 4, 5 and 6, the functional form and magnitude of the effect are almost the same across the three selected values of each of the three climatic variables.

Discussion
Like all mosquitoes, anophelines go through four stages in their life cycle: egg, larva, pupa, and adult [50]. The first three stages are aquatic and also depend on temperature. The adult stage is when the female Anopheles mosquito acts as the malaria vector [51]. Once adult mosquitoes have emerged, temperature, humidity, and rainfall determine their chances of survival. To transmit malaria successfully, female Anopheles must survive long enough after becoming infected to allow the parasites they harbour to complete their growth cycle [52]. Furthermore, a more favourable climatic environment also shortens the time required for parasite development in the mosquito (the extrinsic incubation period) [53]. In summary, meteorological variables can affect malaria cases through their effects both on every stage of the mosquito and on the parasite within the mosquito. The results show that different meteorological factors have distinct patterns and magnitudes of lagged correlation. Rainfall is associated with malaria cases in both hot and cold weather counties with a delayed correlation and a relatively long lag range, suggesting that rainfall may create collections of water that support the whole process of mosquito development. Furthermore, the lag range is longer in hot weather counties than in cold weather counties, which is biologically plausible, as the temperature must be warm enough to support the development of mosquitoes and parasites. In addition, although greater rainfall is associated with higher relative risks for malaria cases in both hot and cold weather counties, the increase is greatest when rainfall is low and weaker when rainfall is high. A saturation effect may explain this phenomenon, in the sense that when rainfall is sufficient, additional rainfall contributes little to the development of mosquitoes and parasites.
While relative humidity is not statistically significant in the cold weather counties, the association is statistically significant at early and late lags in the hot weather counties. A possible explanation is that, when the temperature is not low, relative humidity primarily contributes to the maturation of parasites (early lags) and to the early development of mosquitoes (late lags).
Minimum temperature has a longer lag range and larger correlation coefficients in the hot weather counties than in the cold weather counties. This contradicts some existing studies [44,54], which found that small increases in temperature have a greater effect on malaria transmission in areas with lower average temperatures. The discrepancy may result from the large difference in prevalence between the hot and cold weather counties, which can be inferred from the incidence. According to Table 1, hot weather counties have markedly higher incidence than cold weather counties, with six-year incidences of 407/10,000 and 66/10,000, respectively. More infected persons give mosquitoes a greater chance of exposure, which could compensate for the smaller incremental effect of minimum temperature in warmer counties. Maximum temperature is only statistically associated with malaria cases at early lags, implying that maximum temperature may contribute to the maturation of the parasite. The functional form of the lag is relatively stable for all four meteorological variables within a climatic condition. On the other hand, in terms of magnitude, rainfall shows variation among different values while the other three climatic factors do not. This indicates that the lag pattern is crucial and largely determines the variation of the effect. It also reflects the non-linearity.
Spearman correlation between meteorological variables shows strong correlation between maximum and minimum temperatures on the one hand, and between rainfall and relative humidity on the other hand. This highlights the importance of including a comprehensive set of climatic variables in the model to avoid invalid association.
Existing studies concluded different lags for meteorological factors, and our work gives two possible reasons.
On the one hand, existing studies usually assume a fixed lag and use statistical methods to select the statistically best lag. This approach ignores the variation in lag time, leading to imprecise estimates of the lag. On the other hand, distinct temperature conditions lead to different lag patterns, and therefore existing conclusions can only be generalized to similar climatic conditions. The goal of this study is scientific understanding, not prediction, but the results may provide suggestions for future prediction models. China is implementing a National Malaria Elimination Programme, and the southern border areas, particularly in Yunnan, will be among the hardest places in which to eliminate the disease [55]. One measure is to predict malaria cases and release an early warning signal when necessary. The results imply that the traditional moving average or fixed lag methods should be modified to weight different time intervals, so as to take account of the biological mechanism.
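The difference between a plain moving average and a lag-weighted predictor can be sketched as follows. This is an illustration only: the function name `lagged_index` is hypothetical, and the inverse-U weights are made-up numbers chosen to mimic the shape of the rainfall lag curve, not estimates from this study.

```python
import numpy as np


def lagged_index(x, weights, lag_min):
    """Weighted sum of x over lags lag_min, lag_min + 1, ..., one weight per lag.

    weights[j] multiplies x[t - (lag_min + j)]. Weeks without a full lag
    history are returned as NaN.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    lag_max = lag_min + len(w) - 1
    out = np.full(len(x), np.nan)
    for t in range(lag_max, len(x)):
        window = x[t - lag_max : t - lag_min + 1][::-1]   # window[j] = x[t - (lag_min + j)]
        out[t] = window @ w
    return out


x = np.arange(20.0)                       # placeholder weekly series
flat = np.ones(5) / 5                     # moving average: equal weight at every lag
hump = np.array([1, 2, 3, 2, 1]) / 9      # hypothetical inverse-U weights

moving_average = lagged_index(x, flat, lag_min=4)
weighted = lagged_index(x, hump, lag_min=4)
```

A fixed-lag predictor is the special case in which a single weight equals 1 and all others are 0; a distributed lag estimate supplies a data-driven alternative to either choice.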
The nonlinearity was not extensively examined, since the focus is on the lag pattern. Furthermore, the lag pattern did not vary significantly with the value of the meteorological variables, indicating that the lag pattern is relatively stable within a climatic condition. This study still has several limitations. First, as with all observational studies of malaria and meteorological factors, it is likely that some confounders influence the result [56]. The 30 counties might have different preventive measures (of different magnitudes) to combat malaria, and their populations may also behave differently, for example in the use of nets. Including a county-specific random effect cannot fully eliminate this potential bias. Second, the quality and completeness of the data may have changed over the six-year period [57][58][59]. The change mainly occurs in the time dimension, with the best quality in 2009 [59]. Third, finite and pre-defined lag ranges were used for the meteorological variables. The lag lengths were chosen mainly according to [44], which provided both biological reasoning and empirical evidence. Finally, the lag patterns of P. vivax and P. falciparum malaria could be different. As in some existing studies [5], separate analyses by parasite were not made, owing to a lack of detailed information on P. vivax and P. falciparum in this study. The difference may come from the incubation period, the time between the initial malaria infection and symptoms. However, the incubation period generally ranges from 9 to 14 days for P. falciparum and 12 to 18 days for P. vivax [60], and therefore the general lag pattern should not differ greatly. Nonetheless, further epidemiological research is warranted to explore possible differences in lag patterns.
Figure 5 The estimates of distributed lag between minimum temperature and malaria cases (axes: week lag, logRR).

Conclusions
Using weekly malaria cases and meteorological information, this work studied the temporal lagged association pattern between malaria cases and meteorological information over six years (2004-2009) in 30 counties in south-west China. The results can be viewed as supplementary information for the existing inconsistent findings on the lag pattern, especially for China, where no similar study has been reported before. Different meteorological factors show distinct patterns and magnitudes of lagged correlation, and the patterns depend on the climatic condition. Therefore, existing inconsistent findings on the lags of climatic factors could be due either to the invalid assumption of a single fixed lag or to the distinct temperature conditions of different study sites. The lag pattern of meteorological factors should be considered in the development of malaria early warning systems, and how to incorporate the lag pattern into a prediction model is an open question.
Figure 6 The estimates of distributed lag between maximum temperature and malaria cases (axes: week lag, logRR). The red line is the estimated distributed lag, with shaded bands indicating its 95% confidence interval. A & B show the scenario for 23.74°C weekly mean maximum temperature; C & D show the scenario for 24.84°C weekly mean maximum temperature, and E & F show the scenario for 25.93°C weekly mean maximum temperature. A, C and E are in the hot weather counties, while B, D and F are in the cold weather counties.