A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asympotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates
© Jacob et al; licensee BioMed Central Ltd. 2009
Received: 2 April 2009
Accepted: 21 September 2009
Published: 21 September 2009
Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecological sampled Anopheles aquatic habitat covariates. A test for diagnostic checking error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitats clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extends a normal regression analysis previously considered in the literature.
Field and remote-sampled data were collected during July 2006 to December 2007 in Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, distributions, and to generate global autocorrelation statistics from the ecological sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's Indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e. negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means were defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analyses was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix.
By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecological sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat.
An autocorrelation error covariance matrix and a spatial filter analyses can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity.
The autoregressive conditional variance (i.e., nuisance parameter) is important in mapping Anopheles arabiensis Patton, as it is used in habitat prediction and confidence intervals, tests of hypotheses, spectral estimates, and for estimating prediction error in the model . Nuisance parameters are often variances, but there are exceptions: for example, in an errors-in-variables model, generated from An. arabiensis aquatic habitat parameter estimates, the unknown true habitat location of each observation is a nuisance parameter . Stochastic models have been generated with non-linear nuisance parameters for examining the interrelationship between mosquito productivity and oviposition of gravid mosquitoes . By designing a model that explicitly features non-stationary behavior of An. arabiensis aquatic habitat data, a hierarchy of conditional variance components can be linked by applying Bayes theorem [4–6]. Commonly, having obtained the joint conditional distribution of all of the unknown random variables, given the known sampled habitat covariates, by applying Bayes theorem, nuisance variables are marginalized to obtain the conditional distribution for determining ecological parameters associated with georeferenced anopheline aquatic habitat data. However, even though this generalized treatment of the conditional variance can generate an autoregressive error model, the residual estimates will not be able to spatially target prolific An. arabiensis aquatic habitats based on larval/pupal productivity. Treatments of anopheline aquatic habitat perturbations should be based on surveillance of larvae in the most productive areas of an ecosystem [1, 2]. Additionally, residual-based diagnostics for multivariate heteroscedasticity from previously constructed An. arabiensis aquatic habitat models has revealed that errors in variance uncertainty estimation can substantially alter numerical predictions of a model by inflating the value of test statistic thereby, increasing the chance of a Type I error - incorrect rejection of the null hypothesis, H0: no spatial autocorrelation [1, 2]. Autocorrelation is a characteristic of data derived from a process that is articulated in one or more spatial dimensions which can describe the error structure of ecological sampled data . Thus, autoregression forecasts of An. arabiensis aquatic habitat locations requires an absolute relative prediction error estimator to identify prolific habitats for developing habitat-based intervention models for implementing Integrated Vector Management (IVM).
Traditionally, the random error terms in Gaussian autoregressive models have been posited as a proper conditional autoregressive (PCAR) or as an improper conditional autoregressive (ICAR) specification for identifying spatial trends in residual parameter estimates . The normal distribution in these models furnishes a feasible prior distribution for coefficients while the error variance prior distribution often is represented in the gamma distribution. Statistical criteria in autoregressive coefficients are crucially dependent on such assumptions as normality and homogeneity . However, the CAR prior is usually improper, making it imperative to constantly check the propriety of the joint posterior . Even though this problem can be remedied by introducing a constrained autoregressive parameter to ensure a proper joint distribution for a resulting multivariate model, input errors and structural data errors still can give rise to complex error structures, including heteroscedasticity and nonstationarity. Maximum likelihood estimation which ignores heteroskedasticity yields inconsistent estimates of the variance-covariance matrix and renders likelihood ratio tests with restrictions which make assumptions of the Gauss-Markov theorem of independence among sampled habitat covariates inappropriate . These prediction errors can lead to overconfidence in the estimates of parameter values, or to errors in an An. arabiensis aquatic habitat model being compensated by large residual variances.
For determining spatial errors in an An. arabiensis aquatic habitat model, Bayesian geostatistical kriging models of the form described in Diggle et al.  has on occasion been used as opposed to the CAR model. The Bayesian kriging model assumes that autoregressive errors are modeled using a multivariate Gaussian distribution with an uncertainty covariance matrix expressed as a parametric function of the distance between pairs of georeferenced data points. Another uncertainty estimator, for spatial simulation models generated from field and remote-sampled An. arabiensis aquatic habitat parameters, is a relative error norm technique which normalizes the difference between model predictions and sampled predictor variables and computes residual estimates for discrete and continuous domain problems . Using this technique, multiplicative errors can be treated in the same way as in an autoregressive model using log-transformed habitat data (i.e., larval/pupal counts). The error model is then evaluated using predictions based on some optimal parameter set. Another metric involves measuring uncertainty estimates through statistical distributions and classical hypothesis testing . Examples of the metric includes the Bayesian Model Averaging (BMA) , which for ecological sampled An. arabiensis aquatic habitat covariates can be applied by directly likelihood weighting the outputs of multivariate analyses either by using deterministic or stochastic techniques. Subsequently, the predictive error distributions obtained with these models can be combined using BMA techniques to obtain a multi-model prediction of An. arabiensis aquatic habitat locations.
Although relative norm and BMA can explicitly model the covariance structure of the error terms in an An. arabiensis aquatic habitat model, the output will be in the form of global parameter estimates. These global estimates can indicate how reliable results from an An. arabiensis aquatic habitat model are but, like any global statistic these accuracy assessments will summarize the standard error from many sampled habitat locations. Standard heuristic approach to anopheline aquatic habitat model selection is to measure when global residual error variance begins to stabilize . However, global statistics will summarize standard error from many sampled habitat locations, thus making it difficult for spatial assessment of predictive error at a single sampled habitat. Moreover, if global parameter estimates are used for evaluating autoregressive residual coefficients, then the assumption is that parasitological indicators of An. arabiensis aquatic habitats are homogenous in their quantitative predictions. For example, the assumption must be made that contacts between hosts and blood feeding mosquitoes are uniformly distributed in the focal area, whereas studies has shown blood feedings of mosquitoes tend to aggregate in geographic space [1, 2].
Local spatial autocorrelation indices [17, 18] may provide a method for assessing variance uncertainty estimates in models generated from field and remote-sampled of An. arabiensis aquatic habitats covariates. By far, the most popular test for spatial autocorrelation is based on the Moran I test statistic. In essence, this test statistic is formulated as a properly normalized quadratic in terms of the variables that are being tested for spatial correlation. Moran's original specification standardizes the variables by subtracting the sample mean, and then deflating by an appropriate factor. The error variance-covariance matrix appearing in the quadratic form, based on the non-independence of the sampled observations, is a spatially weighted matrix. The eigendecomposition of this matrix may have interesting properties in various contexts for mapping variance uncertainty in Bayesian probabilistic models using distribution properties of Moran's I and generalized linear models. Algorithms that assume independently-distributed errors of An. arabiensis aquatic habitats may formally establish an asymptotic distribution of the Moran test statistic for determining spatial correlation in models for quantifying variance uncertainty estimates.
In this research, error propagation in Bayesian regression coefficients was spatially quantified using Monte Carlo Markov Chain (MCMC) methods, and ecological parameters of individual sampled riceland An. arabiensis aquatic habitats. The MCMC methods are a class of powerful stochastic algorithms, which provides a means for taking spatially dependentsamples from probability distributions, by generating a set of random samples from an arbitrary probability density function (pdf), which in Bayesian analysis is the posterior distribution . Essentially all inference about uncertainty in Bayesian regression models, generated from ecological sampled covariates of anopheline aquatic habitats have revealed high reliability in their prediction estimates . Spatial filtering techniques were then used, which included the eigendecomposition of a spatial weighted matrix, using the non-linear regression estimates generated from the Bayesian framework. Spatial eigendecomposition models can focus on an error specification, at the habitat level, using a mean response that forces the auto-model spatial dependency parameter value to zero [1, 2]. In this research, the eigenvector filtering approach promoted by Griffith et al  and Getis and Griffith  was used, which is a non-parametric technique that removes the inherent spatial autocorrelation from generalized linear regression models by treating it as a missing variables (i.e., first order) effect. The aim of non-parametric spatial filtering is to control for residual latent autocorrelation at the individual habitat level, with a set of proxy variables rather than to identify a global autocorrelation parameter for a spatial process . An autoregressive variance uncertainty analyses for heteroskedastic error modeling was then performed using autocorrelation indices in which conditional means and residual variances were specified. Given valid assumptions about the nature of variance uncertainty estimates in Bayesian applications, autocorrelating error residuals in a spatial weights matrix may provide a method for predicting clusters of An. arabiensis aquatic habitats.
Additionally, testing variance uncertainty estimates from a spatial autocorrelation error matrix may reveal pertinent statistics (e.g., y-intercept, slope coefficients, standard errors, t-values, residuals, and diagnostic test results) for determining the relative plausibility of a model for correctly statistically prioritizing sampled covariates of An. arabiensis aquatic habitats based on larval/pupal productivity. These statistical approaches may also infer correlates of species abundance data (Poisson or normally distributed response), for other mosquito species and insect research, while accounting for spatial autocorrelation in model error residuals using autocovariate regression, spatial eigenvector mapping, generalized least squares (conditional and simultaneous) autoregressive models and generalized estimating equations. Therefore, our objectives in this research were to: (1) generate global autocorrelation statistics for decomposing sampled An. arabiensis aquatic habitat parameters into spatial eigenvectors using a Poisson model with a gamma-distributed mean; (2) perform a Bayesian regression analyses incorporating a MCMC algorithm using field and remote sampled predictor variables; and, (3) autocorrelate all uncertainty coefficients in a spatially weighted matrix to determine variance uncertainty in an An. arabiensis aquatic habitat model.
Field sampling strategy
The rates parameter λ i (X i ) was both the mean and the variance of the Poisson distribution for an An. arabiensis aquatic habitat i. The dependent variable was the total larval/pupal count in an An. arabiensis aquatic habitat. The regression analyses were performed in SAS PROCREG. The sampled habitat data were log-transformed before analyses to normalize the distribution and minimize standard error. All of the covariate estimates for the models were tested for multicollinearity, using partial F test in SAS, and no problematic correlations were found.
Bayesian estimation procedures
In this research Bayesian estimation and MCMC methods were used to model the sampled covariates of An. arabiensis aquatic habitats in the study site. In the Bayesian paradigm, hierarchical models can be used to model heterogeneity of variances on the log- scale . In this research, the natural logarithms of variances were modeled using a linear model to account for heterogeneity of the variances (on a logarithmic scale), in terms of the predictor variables sampled. In an anopheline aquatic habitat model, an environment-specific variance parameter is considered to be an independent draw from a random sampling distribution .
The MCMC sampling began with conditional (marginal) probability distributions, and the parameter estimates that were obtained using pseudo-likelihood estimation (i.e., an autoregressive term estimated with a conventional regression procedure). This involved estimating covariate coefficients (β) and ρ as though the field and remote-sampled observations were independent. MCMC outputs can sample values for an anopheline aquatic habitat parameter drawn from the joint posterior probability distribution . In the first stage of the Bayesian analyses, a likelihood model was specified for the vector of the An. arabiensis aquatic habitats larval/pupal counts. At the second stage, predictor variables of the sampled An. arabiensis aquatic habitats were analyzed for specifying a prior model.
Three chains were estimated for the variables in each potential model. Samples were discarded to allow the model to stabilize and the next 10,000 samples, after burn in, were used to derive parameter estimates. Discarding the first set of "burn-in" iterations can ensure that the chain has reached steady state, when estimating Monte Carlo parameters, such as posterior means from sampled anopheline habitat covariates . After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the model.
The Monte Carlo method of error propagation assumed that the distribution of error variables for each of the input data layers, generated in WinBUGS® from the ecological sampled An. arabiensis aquatic habitats parameters, were known. For each of the data layers an error surface was simulated by drawing, at random, from an error pool defined by the geographic distribution of the sampled habitat data. Error surfaces were added to the input data layers and the model was run using the resulting data error layers as input. The process was repeated so that, for each run, a new realization of an error surface was generated for each input data layer. The results of each run were accumulated and a running mean and standard deviation surface for the output was calculated. This process continued until the running mean stabilized. Since the random error visualizations were both positive and negative, the stable running mean were taken as the true model output surface, and the standard deviation surface was used as a measure of relative error. A simple summary was generated, showing posterior mean, median and standard deviation, with a 95% posterior credible interval.
Models were compared using the Deviance Information Criterion (DIC) in WinBUGS®, where , was the sum of the posterior mean of the deviance, (D), a measure of goodness-of-fit, and the effective number of parameters (p D ), a measure of model complexity. A measure of goodness-of-fit based on the DIC values was applied and an R2DIC, calculated in line with the standard R2 measure for the regression models. This was defined as: where DIC k was the DIC value for model k under evaluation, DICmax was the DIC value for one-fixed parameter model and was the posterior deviance from the model .
Checking the statistical efficiency of the MCMC Sequence
for lag l ≥ 0.
After compilation, the files contained some initial values for the parameters selected in the model. After careful inspection of the data, no aberrant values, leading to numerical overflow were found.
Spatial autocorrelation error matrix
All residual estimates from the Bayesian model were then evaluated in a spatial error (SE) model. An autoregressive model was employed that used a sampled habitat variable, Y, as a function of nearby sampled habitat Y values [i.e., an autoregressive response (AR) or spatial linear (SL) specification] and/or the residuals of Y as a function of nearby Y residuals [i.e., an AR or SE specification]. Distance between sampled habitats was defined in terms of an n-by-n geographic weights matrix, C, whose c ij values were 1 if the sampled An. arabiensis aquatic habitat locations i and j were deemed nearby, and 0 otherwise. Adjusting this matrix by dividing each row entry by its row sum, with the row sums given by C1, converted this matrix to matrix W .
where H = (I - 11 T /n) was an orthogonal projector verifying that H = H2, (i.e., H was independent). Features of matrix W for analyzing sampled covariates of An. arabiensis aquatic habitats include that it: is a stochastic matrix, expresses each observed value yi as a function of the average of habitat location i's nearby habitat larval/pupal counts, and allows a single spatial autoregressive parameter, ρ, to have a maximum value of 1 .
Simultaneous autoregressive model (SAR) specifications
where μ was the scalar conditional mean of Y, and ε was an n-by-1 error vector whose elements were statistically independent and identically distributed (iid) normally random variates. The spatial covariance matrix for equation (2.1), using the sampled anopheline aquatic habitat covariates was E [(Y - μ l)' (Y - μ l)] = Σ = [(I - ρ W')(I - ρ W)]-1σ2, where E (●) denoted the calculus of expectations, I was the n-by-n identity matrix denoting the matrix transpose operation, and σ2 was the error variance.
If either ρ+ = 0 (and hence I+ = 0 and I- = I) or ρ- = 0 (and hence I- = 0 and I+ = I), then equation (2.3) reduces to equation (2.1) . This indicator variables classification was made in accordance with the quadrants of the corresponding Moran scatterplot generated using the sampled An. arabiensis. aquatic habitat covariates sampled in the study site.
The SF model specification
where μ the scalar mean of Y, Ek was an n-by-k matrix containing the subset of k <<n eigenvectors selected with a stepwise regression technique, and β was a k-by-1 vector of regression coefficients 
A number of the eigenvectors were extracted from (I - 11'/n) C (I - 11'/n), which were affiliated with geographic patterns of the sampled An. arabiensis aquatic habitat covariates, in the study site, portraying a negligible degree of spatial autocorrelation. Consequently, only k of the n eigenvectors was of interest for generating a candidate set for a stepwise regression procedure. Candidate eigenvector represents a level of spatial autocorrelation which can account for the redundant information in orthogonal anopheline aquatic habitat map patterns 
Of note is that because the 2-by-2 square tessellation rendered a repeated eigenvalue.
To identify spatial clusters of An. arabiensis aquatic habitats, Thiessen polygon surface partitioning were generated to construct geographic neighbor matrices, which also were used in the spatial autocorrelation analysis. Entries in matrix were 1, if two sampled An. arabiensis aquatic habitats shared a common Thiessen polygon boundary and 0, otherwise. Next, the linkage structure for each surface was edited to remove unlikely geographic neighbors to identify pairs of sampled An. arabiensis aquatic habitats sharing a common Thiessen polygon boundary. Attention was restricted to those map patterns associated with at least a minimum level of spatial autocorrelation, which, for implementation purposes, was defined by |MCj/MCmax| > 0.25, where MCj denoted the j th value and MCmax, the maximum value of MC. This threshold value allowed two candidate sets of eigenvectors to be considered for substantial positive and substantial negative spatial autocorrelation respectively. These statistics indicated that the detected negative spatial autocorrelation may be considered to be statistically significant, based upon a randomization perspective. Of note, is that the ratio of the PRESS (i.e., predicted error sum of squares) statistic to the sum of squared errors from the MC scatterplot trend line was 1.27 which was well within two standard deviations of the average standard prediction error value (roughly 1.13) for a sampled An. arabiensis aquatic habitat in the study site. Because larval/pupal counts were being analyzed, a Poisson spatial filter model specification was employed in this research [1, 2]. Detected overdispersion (i.e., extra-Poisson variation) results in its mean being specified as gamma distributed .
where μ i was the expected mean larval/pupal count for habitat location i, μ was an n-by-1 vector of expected larval/pupal counts, LN denoted the natural logarithm (i.e., the generalized linear model link function), α was an intercept term, and η was the negative binomial dispersion parameter. This log-linear equation had no error term; rather, estimation was executed assuming a negative binomial random variable.
Eigenfunctions of a spatial weighting matrix
The upper and lower bounds for a spatial matrix generated using Morans indices (I) can be given by λmax(n/1 T W 1) and λmin(n/1 T W 1) where λmax and λmin which are the extreme eigenvalues of Ω = HWH . Hence, in this research, the eigenvectors of Ω were vectors with unit norm maximizing Moran's I. The eigenvalues of this matrix were equal to Moran's I coefficients of spatial autocorrelation post-multiplied by a constant. Eigenvectors associated with high positive (or negative) eigenvalues have high positive (or negative) autocorrelation . The eigenvectors associated with eigenvalues with extremely small absolute values correspond to low spatial autocorrelation and are not suitable for defining spatial structures 
From autocorrelation to correlation coefficient
The term cor2 (u i , z) represented the part of the variance of z that was explained by u i in the An. arabiensis aquatic habitat model z = β i u i + e i . This quantity was equal to . By definition, the eigenvectors u i were orthogonal, and therefore, regression coefficients of the linear models z = β i u i + e i were those of the multiple regression model z = Uβ + ε = β i u i + ⋯ + β n-run-r+ ε.
The distribution of the error residuals in the autocovariance matrix
The maximum value of I was obtained by all of the variation of z, as explained by the eigenvector u1, which corresponded to the highest eigenvalue λ1 in the spatial autocorrelation error matrix. In this research, cor2 (u i , z) = 1 (and cor2 (u i , z) = 0 for i ≠ 1) and the maximum value of I, was deduced for Equation (2.9), which was equal to Imax = λ1(n/1 T W 1). The minimum value of I in the error matrix was obtained as all the variation of z was explained by the eigenvector un-rcorresponding to the lowest eigenvalue λ n-r generated in the An. arabiensis aquatic habitat model. This minimum value was equal to Imin = λn-r(n/1 T W 1). If the ecological sampled predictor variable was not spatialized, the part of the variance explained by each eigenvector was equal, on average, to cor2 (u i , z) = 1/n-1. Because the field and remote-sampled An. arabiensis aquatic habitat variables in z were randomly permuted, it was assumed that we would obtain this result. In this research the set of n! random permutations, revealed that . It was easily demonstrated that and it followed that .
Information collected in the rice fields of Karima study site for analysis in SAS
Total larval count (dependent variable)
0 = not turbid, 1 = turbid
Distance to animal
Comparison of improvement of fit measured by likelihood ratio between unadjusted and adjusted effects models, and full main effects and interactions and saturated models for the Karima study site
Improvement χ 2
Improvement χ 2
1st Degree Interactions
Improvement of Fit of the WinBUGS Hierarchical Bayesian Model (HBM) model
Improvement χ 2
Improvement χ 2
Results of SAS regression used to estimate prior distribution of coefficients for WinBUGS MCMC analysis
Coefficient parameters estimates for WinBUGS Bayesian model
Spatial analysis of residual errors and habitat depths for the Karima study site
Raw Count data (unadjusted for habitat factors)
Moran's I Coefficient (Z)
Moran's I Coefficient (Z)
Depth of habitat
Moran's Coefficient I (Z)
Poisson spatial filtering model results for Anopheles arabiensis larval mosquito counts by study site
SF: # of eigenvectors
Positive SA SF: # of eigenvectors
Positive SASF: MC
Positive SA SF: GR
Positive SA SF pseudo-R2
Negative SA SF: # of eigenvectors
Negative SA SF: MC
Negative SA SF: GR
Negative SA SF pseudo-R2
In the Bayesian analyses, all "high risk" habitats were identified and ranked based on the sampled ecological covariates and larval/pupal productivity. Parameter estimates were used to define expectations for prior distributions in the autoregressive framework, which revealed that the sampled covariate number of tillers was a significant predictor variable, positively associated with An. arabiensis aquatic habitats in the study site. The abundance of An. arabiensis has been associated with early vegetative stage of the rice growth [20, 21]. At the tillering stage, in Karima rice fields, there is addition of inorganic nitrogenous fertilizers . The addition of the nitrogenous fertilizers can act as the attractant for oviposition by gravid An. arabiensis mosquitoes. Broadcasting nitrogenous fertilizers in rice fields has been found to enhance mosquito larval populations [24, 25]. For effective control of developmental stages of mosquito larvae, the application of larvicides should be done at the vegetative stage and the larvicides should persist until the beginning of the reproductive stage of the rice .
The summarization of the simulated posterior distribution correctly accounted for the error of estimation of all sampled An. arabiensis aquatic habitat parameters; each simulated posterior distribution represented an "average" over the joint posterior distributions of all other parameters in the model so that any uncertainty estimation of the sampled predictor variables was fully accounted for in both the mean or the mode of simulated posteriors and in the dispersion of the posterior. Because Bayesian statistical analysis is involved, prior distributions need to be posited for each varying quantity: the response variable, each variable coefficient, the spatial autoregressive parameter, the error variance, and the random error term . This "Bayesian averaging" over the uncertainty of estimation is a very desirable property of Bayesian frameworks for modeling An. arabiensis aquatic habitat covariates as any predictor variable, depending strongly on poorly estimated parameters, will have relatively flat posteriors i.e., the posterior will be a direct indication that the available precision on the parameter is very poor. In this research, the DIC comprised two goodness-of-fit measures and the posterior distribution of the deviance, which was the number of effective parameters for measuring complexity in the An. arabiensis aquatic habitat model. Aquatic habitats with high larval/pupal count, were compared with the results of a Monte Carlo simulation, which established the probabilities and occurrences of highly productive habitats in the study site based on larval/pupal productivity.
The spatial filter analyses used geographic weights matrices and a stepwise negative binomial regression routine, to select eigenvectors as regressors. This eigenvector spatial filtering approach added a minimally sufficient set of eigenvectors as proxy-variables to a set of linear predictors of An. arabiensis aquatic habitats. The regression residuals represented spatially independent variable components. The eigenvectors yielded distinct An. arabiensis aquatic habitat map patterns for description of the latent autocorrelation in the sampled data. There was positive autocorrelation in the residual spatial pattern: similar log-larval/pupal counts of An. arabiensis aquatic habitats aggregated in geographic space based on the sampled covariate depth of habitat.
Positive autocorrelation pattern in An. arabiensis aquatic habitat covariates is often driven by multiple causes that may be exogenous (e.g. autocorrelated environment disturbance) and/or endogenous (conspecific attraction, dispersal limitations, demography) [1, 2]. For example, positive autocorrelation patterns of anopheline aquatic habitats can be influenced by environmental landscape , vector control activities , host density , proximity to larval habitats and blood-meal hosts , quality of the larval habitats , availability of domestic animals  and inter-human variation in mosquito preferences, based on host odors and other cues . Positive autocorrelation may be also due to common local weather patterns that cause habitats to spatially cluster and partially govern anopheline larval/pupal population dynamics [1, 2]. Climatic factors particularly temperature, precipitation and relative humidity, predicts to a large degree the natural distribution of An. arabiensis aquatic habitats , as well as ecological factors, such as predation, parasitism, cannibalism, availability of blood meal hosts and quality of larval habitats . Additionally, mosquito species differ in their habitat preference and disproportionately utilize available aquatic habitats. For example, some species, such as An. funestus, thrive in permanent and marshy water bodies  and others, including An. gambiae and An. arabiensis, prefer small pools of water that are sun-lit and devoid of vegetation . Mosquitoes also differ in their foraging behavior, as well as host choice and resting behavior , which can effect clustering of An. arabiensis aquatic habitats based on larval/pupal productivity. Furthermore, socio-economic/demographic dimensions in riceland environments may tend to impact upon contagion diffusion, inducing An. arabiensis aquatic habitats to cluster together in geographic space . For example, the number of sleepers, the house roof materials (grass thatch, iron, or tile roof) have significant effects on the number of mosquitoes caught .
A graduated, systematic MCMC sampling methodology that uses a spatial autocorrelation error matrix for Gaussian variance estimation, can adjust for sampled ecological covariates, which can identify more clustering of An. arabiensis aquatic habitats within riceland areas than techniques that use a random sampling strategy. A major advantage of using autocorrelation indices is that the sampling error distributions are well-defined. Thus, if the epidemiological data about the hotspots (clusters of An. arabiensis) is correct, based on targeted MCMC surveillance, then using autocorrelation indices for variance uncertainty estimation can yield model outputs with higher sensitivity for detection of highly productive An. arabiensis aquatic habitats, than random surveillance for riceland larval control operations. The statistical significance of spatio-temporal autocorrelation patterns found in the model can be directly assessed using standard normal deviates (z scores). Since it is more feasible to expand intensified surveys to targeted An. arabiensis aquatic habitats, based on spatially selected potential foci , a systematic MCMC surveillance sampling frame, using a spatial autocorrelation error matrix for estimating variance uncertainty, can focus on specific habitats, which would allow for intensified entomologic surveillance at prolific habitats, while not increasing overall sampling efforts. Random interventions are excessive and wasteful because the vectors are not themselves randomly distributed .
A spatial autocorrelation error matrix can also locally calculate total, omission and commission errors using one assessment and report each conditional-variance term in the model using the original sampled data units. These error residuals will be reported as one value for each sampled habitat location rather than being a combination of habitat values. The ability to adequately reflect the spatial dependence in individual sampled habitat covariates comprehensively in an important advantage of using autocorrelation indices for variance uncertainty estimation in an An. arabiensis aquatic habitat model. The strategy of targeted interventions is to recognize the importance of the variation in mosquito production among individual habitat breeding sites throughout the rice cycle . Autocorrelation indices should not be interpreted, however, as a direct estimate of the correlation parameter: a spatial stochastic model, such as a first-order conditional autoregressive (spatial Markov) model, must first be specified to enable parameter estimation. Although in this research the discussion was centered on malaria vectors, specifically of the An. gambiae complex, the framework and derived guidelines described are applicable to integrated control programs for other mosquito species and insect born diseases.
The Bayesian regression analyses revealed that the sampled covariate number of tillers was positively associated with prolific An. arabiensis aquatic habitats based on larval/pupal productivity in the study site. A spatial filter analyses selected eigenvectors as regressors, resulting in spatial autocorrelation being filtered out of the residuals of the ecological sampled data. The spatial filtering analyses transformed all variables, containing spatial dependence, into covariates free of spatial dependence by partitioning the original georeferenced An. arabiensis aquatic habitat attribute variable, within a generalized linear model framework, into two synthetic variates: (1) a spatial filter variate capturing latent spatial dependency, that otherwise would have remained in the response residuals, and (2) a nonspatial variate that was free of spatial dependence. The eigenfunction spatial filter derived from the MC determined the mean, variance and statistical distribution characterizations and descriptions of the sampled covariates at each individual habitat. The spatial autocorrelation residual error analyses using the estimates from the Monte Carlo simulation suggested positive autocorrelation of the An. arabiensis aquatic habitats based on the covariate depth of habitat. The spatial autocorrelation error matrix revealed the presence of roughly 19% redundant information in the An. arabiensis aquatic habitat parameter estimates. The spatially adjusted models identified the clustering patterns of the sampled An. arabiensis aquatic habitat in the ecological datasets while accounting for all conditional heteroscedastic error terms in the models. Autocorrelation indices can enable significance testing of An. arabiensis aquatic habitat models using field and remote sampled explanatory variables which can be very useful for model improvement and resource allocation for implementing mosquito control strategies in riceland areas.
We would like to thank the data collection efforts of the ICIPE Mwea Rice Mosquito Team: provided by James Wauna, Peter Barasa, Nelson M. Muchiri, Gladis Kamari, William M. Waweru, Christine W. Maina, Peter M. Mutiga, Irene Kamau, Paul K. Mwangi, Nicholus G. Kamari, Martin Njigoya and Naftaly Gichuki at the Mwea Divison in Kenya, for conducting the study. We would also like to thank Dr. C. Mutero for providing data for the various base maps of the study site. This research was funded by the National Institute of Health Grant U01A154889 (Novak Robert) University of Alabama at Birmingham.
- Jacob BG, Griffith DA, Gunter JT, Muturi EJ, Caamano EX, Shililu JI, Githure JI, Regens JL, Novak RJ: Spatial filtering specification for an auto-negative binomial model of Anopheles arabiensis aquatic habitats. Transactions in GIS. 2008, 12: 243-259.Google Scholar
- Jacob BG, Griffith DA, Novak RJ: Decomposing malaria mosquito aquatic habitat data into spatial autocorrelation eigenvectors in a SAS/GIS® module. Transactions in GIS. 2008, 12: 341-364. 10.1111/j.1467-9671.2008.01104.x.View ArticleGoogle Scholar
- Jacob BG, Griffith DA, Gunter JT, Muturi EJ, Caamano EX, Githure JI, Regens JL, Novak RJ: Quantifying stochastic error propagation in Bayesian parametric estimates using non-linear parameters of Anopheles gambiae s.l. habitats". International Journal of Remote Sensing. 2009,Google Scholar
- Kleinschmidt I, Omumbo J, Briet O, Giesen van de N, Soboga N, Mensah NK, Windmeijer P, Moussa M, Teuscher T: An empirical malaria distribution map for West Africa. Tropical Medicine & International Health. 2001, 6: 779-10.1046/j.1365-3156.2001.00790.x.View ArticleGoogle Scholar
- Diggle P, Moyeed R, Rowlingson B, Thomson M: Childhood malaria in the Gambia: a case-study in model-based geostatistics. Journal of the Royal Statistical Society, Series C (Applied Statistics). 2002, 51 (4): 493-506. 10.1111/1467-9876.00283.View ArticleGoogle Scholar
- Gao X, Asami Y, Chung CF: An empirical evaluation of spatial regression models. Computers and Geosciences. 2006, 32 (8): 1040-1051. 10.1016/j.cageo.2006.02.010.View ArticleGoogle Scholar
- Sun D, Tsutakawa RK, Kim HS, He Z: Spatio-temporal interaction with disease mapping. Statistics in Medicine. 2000, 19 (15): 2015-2035. 10.1002/1097-0258(20000815)19:15<2015::AID-SIM422>3.0.CO;2-E.View ArticlePubMedGoogle Scholar
- Diggle PJ, Tawn JA, Moyeed RA: Model-based geostatistics. Applied Statistics. 1998, 47: 299-350.Google Scholar
- Oberkampf WL, Trucano TG: Verification and validation in computational fluid dynamics. Progress in Aerospace Sciences. 2002, 38 (3): 209-272. 10.1016/S0376-0421(02)00005-2.View ArticleGoogle Scholar
- Hills RG, Leslie IH: Statistical Validation of Engineering and Scientific Models: Validation Experiments to Application. 2003, Alburquerque: Sandia National Laboratories, 92-View ArticleGoogle Scholar
- Beven K, Binley A: The future of distributed models: Model calibration and uncertainty prediction. Hydrological Processes. 2006, 6 (3): 279-298. 10.1002/hyp.3360060305.View ArticleGoogle Scholar
- Draper D: Assessment and Propagation of Model Uncertainty. Journal of the Royal Statistical Society: Series B (Methodological). 1995, 57 (1): 45-97.Google Scholar
- Hoeting JA, Madigan D, Raftery AE, Volinsky CT: Bayesian model averaging: a turorial (with comments by M. Clyde, David Draper and E.I. George, and a rejoiner by the authors). Statistical Sciences. 1999, 14 (4): 382-417. 10.1214/ss/1009212519.View ArticleGoogle Scholar
- Woolhouse MEJ, Dye C, Etard J, Smith T, Charlwood JD, Garnett GP, Hagan P, Hii JLK, Ndhlovu PD, Quinnell RJ: Heterogeneities in the transmission of infectious agents: Implications for the design of control programs. Proceedings of the National Academy of Sciences. 1997, 94 (1): 338-342. 10.1073/pnas.94.1.338.View ArticleGoogle Scholar
- Hills RG, Leslie IH: Statistical Validation of Engineering and Scientific Models: Validation Experiments to Application. SAND 2001-0312. 2003, Albuquerque: Sandia National LaboratoriesGoogle Scholar
- Henebry GM: Spatial model error analysis using autocorrelation indices. Ecological Modeling. 1995, 82 (1): 75-91. 10.1016/0304-3800(94)00074-R.View ArticleGoogle Scholar
- Griffith DA: A linear regression solution to the spatial autocorrelation problem. Journal of Geographical Systems. 2000, 2 (2): 141-156. 10.1007/PL00011451.View ArticleGoogle Scholar
- Getis A, Griffith DA: Comparative spatial filtering in regression analysis. Geographical Analysis. 2002, 34: 130-140. 10.1353/geo.2002.0009.View ArticleGoogle Scholar
- Griffith DA: Spatial autocorrelation on spatial filtering. Springer. 2003Google Scholar
- Muturi EJ, Mwangangi J, Shililu J, Jacob BG, Mbogo C, Githure J, Novak RJ: Environmental factors associated with the distribution of Anopheles arabiensis and Culex quinquefasciatus in a rice agro-ecosystem in Mwea, Kenya. Journal of Vector Ecology. 2008, 33 (1): 56-63. 10.3376/1081-1710(2008)33[56:EFAWTD]2.0.CO;2.View ArticlePubMedGoogle Scholar
- Mwangangi J, Muturi EJ, Shililu J, Muriu S, Jacob B, Kabiru E, Mbogo C, Githure J, Novak RJ: Environmental covariates of Anopheles arabiensis in a rice agroecosystem. Journal of the American Mosquito Control Association. 2007, 23 (4): 13-22. 10.2987/5605.1.View ArticleGoogle Scholar
- Muturi EJ, Mwangangi J, Shililu J, Muriu S, Jacob B, Kabiru E, Gu W, Mbogo C, Githure J, Novak RJ: Mosquito species succession and physicochemical factors affecting their abundance in rice fields in Mwea, Kenya. Journal of Medical Entomology. 2007, 44 (2): 336-344. 10.1603/0022-2585(2007)44[336:MSSAPF]2.0.CO;2.View ArticlePubMedGoogle Scholar
- De Jong P, Sprenger C, van Veen A: On extreme values of Morans I and Gearys c. Regional Studies and Urban Economics Entomology. 1984, 37 (4): 491-496.Google Scholar
- Mutero C, Blank H, Konradsen F, Hoek van der W: Water management for controlling the breeding of Anopheles mosquitoes in rice irrigation schemes in Kenya. Acta Tropica. 2000, 76: 253-263. 10.1016/S0001-706X(00)00109-1.View ArticlePubMedGoogle Scholar
- Mutero CM, Ng' ang'a PN, Wekoyela P, Githure J, Konradsen F: Ammonium sulphate fertilizer increases larval populations of Anopheles arabiensis and culicine mosquitoes in rice fields. Acta Tropca. 2004, 76: 253-263. 10.1016/S0001-706X(00)00109-1.View ArticleGoogle Scholar
- Jacob BG, Arheart KL, Griffith DA, Mbogo CM, Githeko AK, Regens JL, Githure JI, Novak RJ, Beier JC: Evaluation of environmental data for identification of Anopheles (Diptera: Culicidae) aquatic larval habitats in Kisumu and Malindi, Kenya. Journal of Medical Entomology. 2005, 42 (5): 751-755. 10.1603/0022-2585(2005)042[0751:EOEDFI]2.0.CO;2.PubMed CentralView ArticlePubMedGoogle Scholar
- Jacob BG, Muturi EJ, Mwangangi JM, Funes J, Caamano EX, Muriu S, Shililu J, Githure J, Novak RJ: Remote and field level quantification of vegetation covariates for malaria mapping in three rice agro-village complexes in Central Kenya. International Journal of Health Geographics. 2007, 6: 21-28. 10.1186/1476-072X-6-21.PubMed CentralView ArticlePubMedGoogle Scholar
- Shililu T, Ghebremeskel T, Mengistu S, Fekadu H, Zerom M, Mbogo C, Githure J, Gu WD, Novak RJ, Beier JC: Distribution of Anopheline mosquitoes in Eritrea. American Journal of Tropical Medicine and Hygiene. 2003, 69: 295-302.PubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.