### Study Area

Karuzi is a province in the central-eastern part of Burundi, at an altitude of 1500 – 1900 m. The province has a tropical climate characterized by a rainy season from October through April and a dry season from May through September, with mean annual minimum temperatures of 10.5 – 13°C and maximum temperatures of 25.5 – 28.5°C. Vegetation in the province consists primarily of palm and banana trees, with pine forests in the hills and cereal crops in the valleys. The mean NDVI is 0.36 from July through October and 0.53 from November through June. The population of Karuzi is 302,062, and the province is subdivided into seven communes. The health infrastructure consists of one 100-bed hospital and 11 health centers with a total of 311 beds [14].

### Data collection

#### Incidence of malaria

A case of malaria was defined as a patient seeking medical attention with fever over 38°C and no signs of acute respiratory infection, urinary infection, otitis, meningitis, measles or abscesses. This is the definition of a "case of malaria" used in all Burundi health facilities at the time of the study and for notification purposes, and it did not change during the study period. Only 5 – 20% of clinical cases had microbiological confirmation in non-epidemic periods and no more than 2% during outbreaks, depending on the health facility. The health services compile all notifications of malaria consultations on a monthly basis. The cumulative number of notifications is used as the numerator for the incidence rate, and the denominator is the total population of the province according to the population census, adjusted for the growth factor, which is 1.32 for the period 1995–2000 and 3.29 for the period 2000–2005 [15]. This study used the 1997–2003 series of these incidence rates, expressed per 100 inhabitants of Karuzi.
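The rate calculation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the growth factor is an annual percentage applied with compound growth, which the text does not state, and all input figures are hypothetical.

```python
def projected_population(census_population, annual_growth_percent, years_elapsed):
    """Project a census count forward, ASSUMING compound annual growth in percent."""
    return census_population * (1 + annual_growth_percent / 100) ** years_elapsed


def incidence_per_100(notified_cases, population):
    """Incidence rate expressed per 100 inhabitants."""
    return 100 * notified_cases / population


# Hypothetical inputs for illustration only (not the study's data).
population = projected_population(300_000, annual_growth_percent=1.32, years_elapsed=2)
rate = incidence_per_100(notified_cases=9_000, population=population)
print(round(population), round(rate, 2))
```

If the growth factor in [15] is defined differently (for example, as a cumulative factor over the five-year period), only `projected_population` would need to change.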

#### Rainfall

Monthly cumulative precipitation for 1997–2003, measured in millimeters, was obtained from the Karuzi meteorological station of the Burundi Geographic Institute.

#### Temperature

Minimum and maximum monthly temperatures for 1997–2003, measured in degrees centigrade, were obtained from the same source as the rainfall data.

#### Vegetation density

The mean NDVI per month in Karuzi for 1997–2003 was obtained from images taken with the Advanced Very High Resolution Radiometer (AVHRR) sensor on board the National Oceanic and Atmospheric Administration (NOAA) satellites, with a resolution of 8 km, on a scale of 0–0.7 [16].

### Epidemiological assumptions

The relation between malaria transmission and various factors was described by MacDonald in 1957 [17]. The main factors affecting transmission are vector population density, transmission capacity (based on vector survival and duration of the extrinsic incubation period – EIP) and immunity of the susceptible human host. Other factors such as strain virulence are of negligible importance. Of the meteorological data available in this study, rainfall influences the vector population (by increasing the capacity of larva production and maturation) and is reflected in the vegetation index, and temperature influences the transmission capacity (with higher temperatures shortening the EIP). This hypothesis is based on the fact that in tropical areas at altitudes over 1200 m, the most important factor limiting malaria transmission is the minimum temperature, because parasite development (sexual reproduction and development of sporozoites) is interrupted at temperatures lower than 16°C.
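The effect of temperature on the EIP can be illustrated with a degree-day approximation. The sketch below is not part of this study: it assumes the commonly cited approximation for *Plasmodium falciparum*, usually attributed to Detinova, of about 111 degree-days above a 16°C threshold. The function name and the example temperatures are hypothetical.

```python
import math


def eip_days(mean_temp_c, degree_days=111.0, threshold_c=16.0):
    """Approximate EIP in days under an ASSUMED degree-day relation.

    Returns infinity at or below the threshold, where parasite
    development is taken to be interrupted.
    """
    if mean_temp_c <= threshold_c:
        return math.inf
    return degree_days / (mean_temp_c - threshold_c)


# Illustrative temperatures only; warmer months give a shorter EIP.
for temp in (15, 18, 20, 25):
    print(temp, eip_days(temp))
```

Because the relation is hyperbolic, a rise of a few degrees just above the threshold shortens the EIP far more than the same rise at higher temperatures. This is consistent with the emphasis placed on temperature at altitude.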

In general, tropical areas between latitudes 25°N and 25°S at elevations of 1000–2000 m have enough rainfall to maintain abundant marshy areas where the vector larvae develop, so rainfall is often not the limiting factor. In epidemic situations, some of the factors that could plausibly explain fluctuations in transmission are: 1) higher minimum temperature, permitting prolongation of seasonal transmission and a "staircase" effect of repeated superinfections with increased parasitemia and anaemia up to clinical thresholds; 2) higher maximum temperature, shortening the EIP and producing an exponential effect on vector transmission capacity; 3) more abundant rainfall, with a consequent increase in vegetation density, resulting in a larger vector population and a linear increase in transmission; and 4) an increased population reservoir of the parasite, which increases the speed of transmission.

### Modelling assumptions

Taking the above-mentioned epidemiological assumptions into account, the following general form of the model is proposed to estimate the expected malaria incidence rate. Let *I*_{t} represent the malaria incidence rate in month *t*; *R*_{t} the cumulative precipitation for that month; *T*_{t} either the mean minimum temperature or the mean maximum temperature for that month; *V*_{t} the mean vegetation density for that month; *p* the seasonal period of oscillation of the previous three variables; and *I*_{t+k} the malaria incidence rate for a future month that is *k* months after *t*. The relation of influence among these variables is then as follows:

$$\sum \alpha I_{t} \ast \sum \beta \sin\left[(2\pi/p)\, R_{t} \ast T_{t} \ast V_{t}\right] \rightarrow I_{t+k} \qquad (1)$$

This relation expresses that a linear or cumulative combination of previous values of the incidence rate, as an estimator of population reservoir, and the cumulative combination of past levels of rain, temperature and vegetation density, as estimators of vector capacity, combine to influence future values of the incidence rate. The term that includes rainfall, temperature and vegetation density implies that the malaria incidence oscillates with a period that is proportional to their common seasonality. In expression (1) α is the linear regression coefficient for the incidence rate, and β is a parameter that determines the amplitude of seasonal oscillation estimated by regression. The use of * as an operator to link the components expresses the lack of *a priori* knowledge of how they are interrelated – interrelations that will be determined by trial and error. The model combines all those terms having significant autocorrelation and cross-correlation coefficients with the incidence rate in their corresponding lags at a significance level of p ≤ 0.05.
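One possible way to make this general relation concrete is sketched below. It is an illustrative reading, not the fitted model. It assumes the * operator is instantiated as addition, uses a single lag and a single covariate (rainfall), and estimates the intercept, α and β by ordinary least squares. The series are synthetic and noise-free, and all function and variable names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    cols = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(cols)] for i in range(cols)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(cols)]
    return solve_linear(xtx, xty)


def seasonal_term(value, p):
    """Oscillatory component sin[(2*pi/p) * value] for one covariate."""
    return math.sin(2 * math.pi / p * value)


# Synthetic, noise-free data generated from known coefficients (assumed values).
P, K = 12, 1                 # seasonal period and forecast lead, in months
TRUE = (1.0, 0.5, 2.0)       # intercept, alpha, beta
rain = [60 + 50 * math.sin(2 * math.pi * t / P) for t in range(60)]
incidence = [5.0]
for t in range(59):
    incidence.append(TRUE[0] + TRUE[1] * incidence[t] + TRUE[2] * seasonal_term(rain[t], P))

# Regress I_{t+K} on I_t and the seasonal term at month t.
rows = [[1.0, incidence[t], seasonal_term(rain[t], P)] for t in range(len(rain) - K)]
target = [incidence[t + K] for t in range(len(rain) - K)]
intercept, alpha, beta = fit_ols(rows, target)
print(intercept, alpha, beta)
```

With real data, the lags and the way the terms are combined would be chosen from the autocorrelation and cross-correlation analysis rather than fixed in advance as they are here.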

### Data processing

The following steps were carried out: a) exploration of the serial incidence rates, temperatures, precipitation and vegetation to identify regularities; b) trend analysis and periodogram of the incidence rate with Fast Fourier and Tukey Transforms to identify the periodic oscillations to be modeled; the sub-series covering the last seasonal periods was then set aside for validation, and the rainfall, temperature and vegetation series were shortened accordingly; c) correlograms of the simple autocorrelation function (ACF) and partial autocorrelation function (PACF) of the incidence rate, with lags equal to its period of oscillation, followed by identification, adjustment and evaluation of the autoregressive integrated moving average (ARIMA) equation that explains the rate by its previous values, for use as a term in model (1); d) periodograms with Fast Fourier Transform of serial rainfall, temperature and vegetation to identify seasonal oscillations and their period *p* in (1), and cross-correlations of these three data series with the ARIMA residuals of the serial incidence rate to identify the lags of their influence; e) combination of the ARIMA terms and the oscillatory component to shape model (1), and estimation of the linear regression coefficients of the terms and of the goodness-of-fit of the model; f) successive entry in the model of the serial incidence rates, rainfall values, temperature and vegetation, at their corresponding lags, to obtain the expected malaria incidence rates for each point of the temporal window of the series; g) testing of the model on the sub-series of malaria incidence rates that had been set aside, with reliability assessed by comparing each predicted rate with that observed for the corresponding month.
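Two of the steps above, the periodogram and the search for the lag of influence, can be sketched as follows. This is a simplified illustration under stated assumptions. It uses a direct discrete Fourier transform without Tukey smoothing, and it cross-correlates a driver with the raw incidence series rather than with ARIMA residuals. The data are synthetic, with a 12-month cycle and a built-in 2-month lag, and all names are hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Return (period_in_months, power) pairs for the nonzero Fourier frequencies."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        result.append((n / k, abs(coef) ** 2 / n))
    return result


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(driver, response, lag):
    """Correlation between driver[t] and response[t + lag], for lag >= 0."""
    return pearson(driver[: len(driver) - lag], response[lag:])


# Synthetic monthly series over 7 years (84 months); incidence trails rain by 2 months.
months = range(84)
rain = [100 + 80 * math.sin(2 * math.pi * t / 12) for t in months]
incidence = [5 + 3 * math.sin(2 * math.pi * (t - 2) / 12) for t in months]

dominant_period = max(periodogram(incidence), key=lambda pair: pair[1])[0]
best_lag = max(range(7), key=lambda lag: lagged_correlation(rain, incidence, lag))
print(dominant_period, best_lag)
```

On real series, the peak would be less sharp, and lags would be retained only where the cross-correlation coefficient is statistically significant.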

The reliability criteria for the forecast consisted of verifying that: i) the difference between the predicted and observed values is white noise, or a normal random variable with a mean of 0 and standard deviation of 1; the randomness of the difference is tested with the periodogram of the data and the runs test, and normality is tested with the histogram and the Kolmogorov-Smirnov test; ii) no more than 5% of the differences fall outside the limits of the 95% confidence interval; values falling outside the confidence interval are counted in the scattergram of the difference (y axis) with respect to the observed rate (x axis); and iii) the differences do not tend to increase or decrease as the observed rate increases, that is, the precision of the forecast does not depend on the magnitude of the rate. To test the third criterion, the correlation between the difference and the observed rate is estimated with the Pearson linear correlation coefficient, and trend analysis is used to test the statistical significance of the slope of the trend of the difference with respect to the observed rate.
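Criteria ii) and iii) can be sketched as follows. This is a minimal illustration under stated assumptions: the 95% limits are taken as the mean difference ± 1.96 standard deviations of the differences, which the text does not specify, and the runs and Kolmogorov-Smirnov tests of criterion i) are omitted. The function name and the example values are hypothetical.

```python
import math
import statistics


def forecast_checks(observed, predicted, z=1.96):
    """Summarize predicted-minus-observed differences for a validation sub-series."""
    diffs = [p - o for o, p in zip(observed, predicted)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    # ASSUMED 95% limits: mean difference +/- z standard deviations.
    lower, upper = mean_diff - z * sd_diff, mean_diff + z * sd_diff
    outside = sum(1 for d in diffs if d < lower or d > upper)
    # Pearson correlation between the difference and the observed rate.
    mean_obs = statistics.mean(observed)
    sxy = sum((d - mean_diff) * (o - mean_obs) for d, o in zip(diffs, observed))
    sxx = sum((d - mean_diff) ** 2 for d in diffs)
    syy = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "fraction_outside": outside / len(diffs),
        "r_diff_vs_observed": sxy / math.sqrt(sxx * syy),
    }


# Hypothetical rates per 100 inhabitants, for illustration only.
checks = forecast_checks(observed=[1.0, 2.0, 3.0, 4.0], predicted=[1.5, 1.5, 3.5, 3.5])
print(checks)
```

A forecast would pass criterion ii) when `fraction_outside` is at most 0.05, and criterion iii) when `r_diff_vs_observed` is not significantly different from zero.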

The images of the vegetation index were processed using WinDisp 4. Data processing was performed using the statistical packages SPSS™ 13.0 from SPSS Co., and Statgraphics Plus^{©} 5.1 from Statistical Graphics Co. A 2-tailed significance level of 0.05 was established for all tests.