Development of temporal modelling for forecasting and prediction of malaria infections using time-series and ARIMAX analyses: A case study in endemic districts of Bhutan

Background Malaria still remains a public health problem in some districts of Bhutan despite marked reduction of cases in last few years. To strengthen the country's prevention and control measures, this study was carried out to develop forecasting and prediction models of malaria incidence in the endemic districts of Bhutan using time series and ARIMAX. Methods This study was carried out retrospectively using the monthly reported malaria cases from the health centres to Vector-borne Disease Control Programme (VDCP) and the meteorological data from Meteorological Unit, Department of Energy, Ministry of Economic Affairs. Time series analysis was performed on monthly malaria cases, from 1994 to 2008, in seven malaria endemic districts. The time series models derived from a multiplicative seasonal autoregressive integrated moving average (ARIMA) was deployed to identify the best model using data from 1994 to 2006. The best-fit model was selected for each individual district and for the overall endemic area was developed and the monthly cases from January to December 2009 and 2010 were forecasted. In developing the prediction model, the monthly reported malaria cases and the meteorological factors from 1996 to 2008 of the seven districts were analysed. The method of ARIMAX modelling was employed to determine predictors of malaria of the subsequent month. Results It was found that the ARIMA (p, d, q) (P, D, Q)s model (p and P representing the auto regressive and seasonal autoregressive; d and D representing the non-seasonal differences and seasonal differencing; and q and Q the moving average parameters and seasonal moving average parameters, respectively and s representing the length of the seasonal period) for the overall endemic districts was (2,1,1)(0,1,1)12; the modelling data from each district revealed two most common ARIMA models including (2,1,1)(0,1,1)12 and (1,1,1)(0,1,1)12. The forecasted monthly malaria cases from January to December 2009 and 2010 varied from 15 to 82 cases in 2009 and 67 to 149 cases in 2010, where population in 2009 was 285,375 and the expected population of 2010 to be 289,085. The ARIMAX model of monthly cases and climatic factors showed considerable variations among the different districts. In general, the mean maximum temperature lagged at one month was a strong positive predictor of an increased malaria cases for four districts. The monthly number of cases of the previous month was also a significant predictor in one district, whereas no variable could predict malaria cases for two districts. Conclusions The ARIMA models of time-series analysis were useful in forecasting the number of cases in the endemic areas of Bhutan. There was no consistency in the predictors of malaria cases when using ARIMAX model with selected lag times and climatic predictors. The ARIMA forecasting models could be employed for planning and managing malaria prevention and control programme in Bhutan.


Background
There are an estimated 300-500 million clinical cases of malaria reported each year [1][2][3][4]. One factor contributing to this problem is the known global climate change; this is considered as a big challenge in fight against the scourge of malaria [5][6][7]. In Bhutan, despite many years of prevention and control measures, malaria still remains a public health problem in low lying areas of the country adjacent to India. In some districts, the transmission persistently occurs throughout the year. Besides its impact on social, demographic and economic disruptions, malaria is considered a threat as it may introduce malaria into all temperate climates and nonmalarious areas across Bhutan.
Historically, the first malaria survey in Bhutan was conducted in 1962 [8]. There was a steady increase in the number of cases rising from 518 in 1965 to reach a peak of 39,852 cases with 62 deaths in 1994, followed by a decline to 392 cases with two deaths in 2008 as a result of intensive prevention and control measures. The highest number of cases was recorded in 1994 with 39,852 cases, and the highest number of deaths was also recorded in 1994 with 62 deaths. The Annual Parasite Incidence (API) per 1,000 population at risk of malaria in Bhutan was 3.98% in the year 2006 [9]. However, the numbers of cases remain persistently high in seven endemic districts of Bhutan with malaria transmission occurring throughout the year. The seven endemic districts of Bhutan are: Chukha, Dagana, Pemagatshel, Samdrup Jongkhar, Samtse, Sarpang and Zhemgang.
Malaria transmission is clearly associated with the rainy season [10][11][12][13]. Prevention and control measures have been intensified during the summer months-May to July, since the transmission is increased during these months. The control measures also have changed over the years. Indoor residual spraying (IRS) using dichlorodiphenyltrichloroethane (DDT) was implemented since 1964 with three rounds and later two rounds until 1994. With reports of resistance to DDT in some parts of the world, and global concern over environmental concerns, DDT was replaced by Deltamethrin (synthetic pyrethoid) from 1995. However, DDT is used in the neighbouring state of Assam in India which borders five of the seven malaria endemic districts of Bhutan, since the main vector Anopheles minimus is still sensitive to DDT [14]. Insecticide treated-bed nets (ITN) became the main control strategy from 1998 with focal IRS being used during outbreaks and emergencies, and in high Plasmodium falciparum transmission areas with API >10% [8].
With the growing evidence of the effectiveness of LLINs in reducing the mortality and morbidity of malaria by preventing man-mosquito contact [15], from 2006 Bhutan distributed over 100,000 long-lasting insecticidal nets (LLIN), supported by grants from Global Fund to Fight AIDS, Tuberculosis and Malaria (GFATM). Early diagnosis and prompt treatment (EDPT) remains a cornerstone of malaria control and is provided by microscopy facilities at all levels of health centres and supplemented by rapid diagnostic kits (RDT). The present treatment regimen of Plasmodium falciparum is the combination of artemether and lumefantrine (Coar-tem®) administered over three days (except for pregnant woman). Plasmodium vivax is treated with chloroquine (25 mg/kg) for three days and primaquine (0.25 mg/kg) administered over 14-days [16,17].
Although there has been a marked reduction of malaria cases due to the implementation of these integrated control measures by the VDCP for the whole country, the malaria problem is still a major threat to the country and requires a well developed strategic plan and resource preparation for prevention and control, and eventual elimination. Forecasting malaria enables suitable allocation of resources and forestalls outbreaks and epidemics. There has been no such modelling of malaria in Bhutan before. This study aimed to propose a forecasting model based on the time series analysis. In addition, this study also aimed at proposing a prediction model incorporating climatic factors such as temperature, humidity and rainfall; which are important in the development of malaria parasites and vector bionomics [18][19][20][21].

Study area
Seven malaria endemic districts of total 20 districts in Bhutan were selected for this study as they have been identified since malaria transmission occur throughout the year; these districts are Chukha, Dagana, Pemagatshel, Samdrup Jongkhar, Samtse, Sarpang and Zhemgang districts ( Figure 1). These districts lie in the foothills of Himalayas neighbouring India. There are four seasons in Bhutan, each season lasts about three months. The general climate in these districts is sub-tropical with heavy rainfall in the summer season lasting from May to July. Malaria transmission is intense during the summer season, the gradual decline in malaria transmission is observed during winter season when the ambient temperature is cooler. The total population of these seven districts, as of 2008, was 277,257. However, populations from different parts of Bhutan usually come to these districts for business and other works because major commercial hubs are located there.

Malaria data
The monthly incidence of malaria cases were obtained from Vector-borne Disease Control Programme (VDCP), Department of Public Health, Ministry of Health, Royal Government of Bhutan. Malaria cases from various levels of health centres are reported to the programme every month. These health centres provide free malaria diagnosis either by microscopy or by using rapid diagnostic test (RDT). All malaria cases are treated with the national standard regimen.
Malaria distribution varies greatly between the districts and sub districts. Figures 2 and 3 shows the malaria case and incidence rate distribution down to the sub district level with some sub districts reporting as many as 438 cases, where as other some sub districts do not report any malaria cases. Figure 3 shows the malaria incidence rate per 1000 population at the sub district level.

Meteorological data
A climatic record from 1996 to 2008 was obtained from the Meteorological Unit, Department of Energy, Ministry of Economic Affairs, Royal Government of Bhutan. Daily reported climatic variables include mean minimum and maximum temperature, humidity and rainfall; these variables are collected and recorded at the weather stations in each districts. The meteorological Unit maintains the records of all the district climatic variables at the central level. Overall, the maximum temperature of the whole region varies between 20°C and 30°C; mean minimum temperature ranges between 10°C and 20°C. Monthly relative humidity ranges between 60% and 90%. Amount of rainfall greatly varies from month to month, ranging from zero to 1000 mm per month ( Figure 4).

ARIMA Modelling Methods
The forecasting model proposed in this study was the multiplicative seasonal Auto-regressive Integrated   Moving Average (ARIMA). A seasonal ARIMA (p, d, q) (P, D, Q) s model where: ▪ p and Pthe auto regressive and seasonal autoregressive, respectively; ▪ d and Dthe non-seasonal differences and seasonal differencing, respectively; ▪ q and Qthe moving average parameters and seasonal moving average parameters, respectively. ▪ s representing the length of the seasonal period.
A stationary time series is a time series whose statistics do not change over time. Such statistics are typically the mean and the variance. There are three types of non-stationarity; series that have a non-stationary mean; series that have a non-stationary variance; and series with a periodic or seasonal component [22].
In this research, the data series was non-stationary ( Figure 5-Top) but after taking first difference, the mean is constant (Figure 5-Bottom). But, there is no evident that the original data series has a variance that gets larger over time. However, in our study non-stationary mean was dealt by taking first differences and a strong seasonal component with a season of length by taking a seasonal difference.
From the autocorrelation functions (ACF) and partial autocorrelation functions (PACF), plausible models were identified. The forecasting models were developed for each selected district as well as the overall endemic districts. Time series data from 1994 to 2006 were used as a training set while data from 2007 to 2008 were used as a testing set ( Figure 6). The model diagnostic was performed using Bayesian Information Criteria (BIC) and p-value. The lowest BIC values with p value less than 0.05 was considered good model [23][24][25].
The good ARIMA models from different data series were explored in which the actual cases and predicted cases were closely matched as shown in Figure 6-Bottom. The mean average percentage errors (MAPE) were computed. The best model with the least MAPE was used to forecast the malaria cases for the year 2009 and 2010, respectively.

ARIMAX Modelling Methods
ARIMAX model is an extension of ARIMA modelling in attempt to predict the malaria cases using the climatic factors and the number of cases in a previous month. The predictors in the model included the number of cases in the previous month, mean maximum and minimum temperature, relative humidity and rainfall lagged at one month.
The modelling of ARIMA and ARIMAX models were performed using STATA Intercooled 9. The use of data was permitted by the Ministry of Health and Ministry of Economic Affairs in Bhutan.

Overall malaria incidence
The number of malaria cases of the seven malaria endemic districts decreased from 38,735 in 1994 to 6,781 in 1998. An outbreak of malaria was observed in 1999 when the incidence dramatically increased to 12,109. A significant decrease in malaria cases was observed again since the year 2005. The number of malaria cases fell to a low of 292 in 2008. Malaria distribution also varies greatly across the districts and sub districts (Figure 3). Some sub district reported malaria as many as 438 malaria cases, while some districts did not observe any malaria cases.
The best model was fitted to forecast the malaria cases for 2009 and 2010. Sarpang district was forecasted to report the highest number of cases with 350 and 915, followed by Samdrup Jongkhar district with 131 and 258 cases, respectively. The lowest cases were forecasted to be reported by Zhemgang district with nine cases for 2009 followed by Samtse district with 11 cases,  respectively. In 2010, Samtse was predicted to report the least cases with five followed by Pemagatshel district with 13 cases, respectively (Figure 7 and Tables 2 and 3).

Prediction Models with Covariates
The covariates fitted in the models included the number of malaria cases in the previous month, mean maximum and minimum temperature, humidity and rainfall, all lagged at one month period. Pemagatshel and Zhemgang did not have any significant predictors (climatic variables and the previous month's malaria cases) of malaria for the subsequent month. The best model for Chhukha district was the model II with a BIC of 1107.768. The significant predictors was mean minimum temperature (p = 0.002) when the malaria cases of the previous month, mean minimum temperature and humidity lagged at one month were fitted. Previous month's malaria cases were a significant predictor of subsequent month's malaria cases for Dagana districts in all the five models. However, model III was the best model with the lowest BIC of 748.5172. The different predictors that were fitted in the model III were malaria cases, maximum temperature and rainfall. Model III with previous month's malaria cases, maximum temperature and rainfall was the best model for Samdrup Jongkhar district. But two of the covariates, maximum temperature (p < 0.001) and rainfall (p < 0.001) were significant predictors for Samdrup Jongkhar in the Model IV. Samtse, Sarpang and all endemic districts in the model I consisting of previous month's malaria cases, and mean minimum temperature was the best model. However, only significant predictor was mean minimum temperature ( Table 4).

Models based on the time series (ARIMA)
This study provides an example of applying a simple ARIMA model to forecast malaria cases in a low malaria-transmission area, where targeted interventions are highly recommended for most effective malaria control. This model was developed according to the trend of the malaria cases over the years and presuming pattern stability of all other conditions such as climatic factors, control and preventive measures, treatment seeking behaviour and migration of people. The developed models were validated and appeared to fit well at both each district level and the overall districts, providing tolerable error levels in forecasting.
Different ARIMA models were found for different districts, which is consistent with the findings of the study by Briet et al [26]. This suggests that each district would have its own particular trend. Districts close to each other are likely to have similar disease transmission patterns corresponding to their spatial and climatic similarity. However, the ARIMA models of the selected endemic districts in Bhutan did not follow this hypothesis in terms of spatial location. In some areas, the high malaria incidence district is adjacent to the very low malaria incidence district. On the other hand, different malaria trends were observed among districts that have similar climatic characteristics. The difference in malaria trends across districts may be partly explained by the difference in control measures or patients' seeking behaviours. The control measures in these endemic districts remain same. However, the actual implementation of the control measures such as IRS may vary from districts to districts. This difference could be due to logistic reasons such as the availability of the people to carry out this activity.

Models based on time series with covariates (ARIMAX)
In exploring different prediction models by fitting covariates to the time series data, no single best model was found; some covariates were found significant in  The temperature is also an important factor that determines the rate of development of parasites in the mosquitoes. Two districts did not have any significant covariates as predictors of malaria cases. This can be explained by the fact that the malaria cases in these districts were rather low with no malaria cases in certain months, therefore the malaria cases of the preceding month was not a significant predictor, or at worst could lead to inaccurate prediction, similar to a study in Sri Lanka [26]. Similar findings in which monthly malaria cases in the previous month were not effective in forecasting were reported by Hay et al [27]. The climatic factors that were not significant predictors particularly in these two districts could be explained such that the meteorological stations collecting climatic factors are from a different location from where malaria cases usually occurred. This indicates as one of the main limitations of using the climatic factors as the predictors for forecasting in certain situation/location [28].
Temperature was found as an important predictor for three districts and overall districts. Temperature affects -the mosquito bionomics through the time required for development of the ookinete, the egg of the parasite, in the midgut of the anopheline mosquito, which decreases as temperature increases from 21°C to 27°C [29]. Increase in temperature also decreases the interval between mosquitoes' blood meals there by shortening the incubation periods of the plasmodium parasites in the mosquitoes and the number of times eggs are laid by the mosquitoes [5,21]. A decrease in temperature would have the opposite effect.

Conclusions
These forecasting models developed in the study provide the VDCP with expected malaria cases in advance, which would be a useful guidance for timely prevention and control measures to be effectively planned. The knowledge of the spatial distribution of the malaria at the sub districts would also greatly aid in the targeting the control measures, even though the forecasting is feasible at the district level data due to the small number of cases at the sub district level. Thus, VDCP can use the forecasting model to estimate the number of malaria cases at the district while targeting the control measures at the sub district using the spatial distribution of malaria cases of the previous years. The prediction model based on the time series and climatic factors were developed and showed different predictors for different districts. Some districts did not have any covariates as predictor of malaria. Model based on time series would provide forecasting for longer period unlike the models fitted with other covariates like climatic factors, which is best suited for forecasting in shorter period.
The main limitation of models based on time series is that they provide forecasting for longer period. In contrast, the models that are fitted with other covariates such as climatic factors may be best suited for forecasting within a shorter period. The time series model proposed in the study can be applied to other diseases such as dengue. However, further research is recommended to evaluate the effectiveness of integrating the forecasting model into the existing malaria control programme in terms of its impact in reducing the disease occurrence and also the cost of control interventions.