Effective population size of Anopheles funestus chromosomal forms in Burkina Faso

Background As Anopheles funestus is one of the principal Afro-tropical malaria vectors, a more complete understanding of its population structure is desirable. In West and Central Africa, An. funestus population structure is complicated by the coexistence of two assortatively mating chromosomal forms. Effective population size (Ne) is a key parameter in understanding patterns and levels of intraspecific variation, as it reflects the role of genetic drift. Here, Ne was estimated from both chromosomal forms, Kiribina and Folonzo, in Burkina Faso. Methods Short-term Ne was estimated by evaluating variation at 16 microsatellite loci across temporal samples collected annually from 2000–2002. Estimates were based on standardized variance in allele frequencies or a maximum likelihood method. Long-term Ne was estimated from genetic diversity estimates using mtDNA sequences and microsatellites. Results For both forms, short-term and long-term Ne estimates were on the order of 103 and 105, respectively. Long-term Ne estimates were larger when based on loci from chromosome 3R (both inside and outside of inversions) than loci outside of this arm. Conclusion Ne values indicate that An. funestus is not subject to seasonal bottlenecks. Though not statistically different because of large and overlapping confidence intervals, short-term Ne estimates were consistently smaller for Kiribina than Folonzo, possibly due to exploitation of different breeding sites: permanent for Folonzo and intermittent for Kiribina. The higher long-term Ne estimates on 3R, the arm carrying the two inversions mainly responsible for defining the chromosomal forms, give natural selection broader scope and merit further study.


Background
The efficient application of malaria control methods that target the mosquito vector depends upon knowledge of its population genetic structure. This information can improve current insecticide-based strategies and aid in the management of insecticide resistance, but it is also essential to future genetic control or modification strategies that aim to reduce, eliminate or replace vector popula-tions with non-vectors. Unfortunately, present understanding of the population structure of any malaria vector is insufficient to underpin a genetic control programme, and nowhere is this shortfall more critical than in sub-Saharan Africa where three widespread species (Anopheles gambiae, Anopheles arabiensis and Anopheles funestus) are responsible for transmitting most of the 1-3 million fatal cases each year [1].
An. funestus, one of the most anthropophilic vectors known, exploits permanent or semi-permanent breeding sites such as marshes or rice fields. Its population density peaks in the dry season, extending malaria transmission by relay after An. gambiae and An. arabiensis populations have declined [2]. A highly polymorphic species, its population structure appears quite shallow across continental Africa. Evidence from microsatellite and mtDNA markers suggests a division between populations on either side of the Great Rift Valley complex, but little differentiation among populations within these regions, even between locations spanning several thousand kilometers [3]. However, in Burkina Faso, West Africa, analysis of polymorphic chromosomal inversions has revealed cryptic complexities in population structure. A temporally and spatially stable pattern of inversion polymorphism in sympatric and synchronous samples of An. funestus is inconsistent with random mating, suggesting the presence of two assortatively mating chromosomal forms designated Folonzo and Kiribina [4,5]. The relative abundance of Kiribina near rice cultivation and Folonzo near marshy areas suggests some partitioning of larval habitats, and the latter form is more likely to rest indoors, feed on humans, and be infected with malaria parasites [4]. At the molecular level, differentiation between forms is slight and is accounted for mainly by markers mapping to the chromosome arm (3R) bearing the principal inversions whose frequencies define the forms [6,7]. The working hypothesis is that Folonzo and Kiribina forms of An. funestus are incipient species whose distinctions are linked, at least in part, to chromosomal inversions.
Effective population size (N e ) is a central parameter in the description of population structure, though notoriously difficult to estimate with precision [8]. N e is inversely related to genetic drift, the rate of fluctuation in allele frequencies caused by random sampling [9], and can be defined as "the size of an ideal population that exhibits the same rate of drift as the actual population it characterizes" [10]. In an ideal population, one which is randomly mating, constant in size and uniform in reproductive potential, N e is equal to N, the census population size. In actual populations, N e is usually less than N, because of large variance in reproductive success among individuals. This is especially true where population size fluctuates seasonally, as is the case for the primary malaria vectors of Africa, because N e approximates the harmonic mean of single-generation sizes and, therefore, more closely reflects the lowest values. N e has been estimated for An. gambiae and An. arabiensis by a number of authors in different parts of Africa using indirect genetic methods and direct mark-release-recapture [11][12][13][14][15][16]; values were typically at least 10 3 , inconsistent with seasonal bottlenecks. Because no estimates were available for An. funestus, N e was estimated for Folonzo and Kiribina forms in Burkina Faso using temporal variation at 16 microsatellite loci across 3 consecutive years, and sequence variation in an 834 bp region of the mitochondrial DNA ND5 gene.

Microsatellites and mtDNA
Genomic DNA was extracted from single mosquito carcasses [7]. Prior to analysis by PCR, genomic DNA was diluted 1:10 in H 2 O (~5 ng/ul). Morphological identification of An. funestus was verified on each specimen by a modified rDNA-based PCR assay [7].
For 2001, microsatellite data of Michel et al. [7] were taken from 50 randomly chosen specimens of both chromosomal forms. For 2000 and 2002, the same 16 physically mapped microsatellites were PCR amplified from 50 specimens of each form (exact sample size given in [Additional file 1]). Products were diluted, pool-plexed (two groups of eight loci each), genotyped on a Beckman-Coulter CEQ8000, and sized with software provided (see [7] for detailed methods). Fstat2.9.3.2 [17] was used to estimate allelic richness (R s ), deviations from Hardy-Weinberg equilibrium (inbreeding coefficient, F IS ) and linkage disequilibrium. The impact of suspected null alleles was explored by using MICRO-CHECKER [18] to adjust allele and genotype frequencies based on the Brookfield 2 estimate [19], followed by repeated analyses on the null allele-adjusted data set, as described previously [7]. No significant changes in outcome were observed. Microsatellite differentiation (F ST ) between Folonzo and Kiribina samples from each year was computed with Microsatellite Analyzer [20]. There was no significant mtDNA or microsatellite differentiation between these locales for samples of the same chromosomal form [6].

Estimates of short-term N e
Short-term N e was estimated using temporal changes in microsatellite allele frequencies. Under the assumption of no mutation, selection or migration, changes in allele frequency are the result of genetic drift, whose strength is inversely related to population size. Both a moment estimator (based on the standardized variance in allele frequency change, F [21]) and a probability method (the maximum likelihood method of [22]) were used to calculateshort-term N e from temporal samples. Although the probabilistic approach has been shown to have higher accuracy and precision [8], the moment method of Waples [21] (hereafter, F-statistic method) was the approach adopted for estimates in other vectors; including it facilitated comparison of N e between vector species. The F-statistic method was implemented using the software programme NeEstimator [23]. The maximum likelihood method (hereafter, ML method) was implemented using the programme MLNE 2.0 [24]. To reduce computational burden, maximum population size was set initially at 15,000. Subsequently, analyses were repeated using a maximum size of 25,000. In both cases, the upper bound of the 95% confidence interval (CI) always reached the maximum limit for Folonzo (and did so for one sampling period for Kiribina). Thus, the upper bound CI was not determined in these cases. Following other authors [12][13][14][15], N e was evaluated under the conservative assumption of 12 generations per year, but the effect of fewer (10) or more (20) generations per year was also explored.

Estimates of long-term N e
Long-term N e was estimated from current genetic variation (θ = 4N e μ for autosomal loci, where μ is mutation rate), under the assumption of mutation-drift equilibrium (MDE). For microsatellites, current genetic variation was represented by Xu and Fu's [25] estimator, θ F , based on sample homozygosity under the single-step stepwise mutation model. The average mutation rate for dinucleotide microsatellite repeats is unknown for An. funestus, as for An. gambiae. In the latter species, an upper-bound estimate of 10 -4 mutations per locus per generation based on the data of Zheng et al. [26,27] has been used [12], but this value is likely an overestimate. Accordingly, a slower mutation rate of 9.3 × 10 -6 mutations per microsatellite locus per generation was adopted, based on the estimates derived from dinucleotide microsatellites in Drosophila melanogaster [28]. Mitochondrial values of θ were estimated from mean pairwise sequence differences per site (π [29]) and from the number of segregating sites among sequences (S [29]) as calculated in DNASP v 4.10.1 [30]. Assuming a mutation rate of 5.7 × 10 -8 per base per gamete [31], long-term N e was estimated using the equation θ = N e μ.

Results and discussion
Sample sizes and summary polymorphism statistics are presented in [Additional file 1]. Allelic richness and heterozygosity were relatively high across years and chromosomal forms (mean R s per locus ranged from 4.4 to 21.0; mean H o per locus ranged from 0.41 to 0.81). All loci were found to be in linkage equilibrium. Significant Hardy-Weinberg deficits occurred in 12 of 96 possible tests, but were clustered at 3 loci (AFND12, FunD, and AFUB12) suspected of harboring null alleles [6,7].
For all three sampling years there was slight but significant differentiation between the chromosomal forms, with frequency differences between loci on chromosome 3R (both inside and outside of inversions) accounting for much but not all of the differentiation (Table 1). Table 2 presents estimates of short-term N e across sampling intervals of one year (2000)(2001)(2001)(2002) and two years (2000)(2001)(2002) for both chromosomal forms. The number of generations per year is uncertain, but 12 are plausible and the actual number should fall within the extreme values employed of 10 to 20. Between these values, all N e estimates were on the order of 10 3 , indicating the absence of seasonal bottlenecks. This is consistent with results from An. gambiae and An. arabiensis, even under severe climatic conditions, on islands, and following insecticide spray campaigns (e.g., [13,14]).

Short-term N e
Regardless of the estimation method, N e values for the Kiribina form at each time point were at least 2-3 times smaller than those for the Folonzo form, though the 95% CIs overlapped. The upper bound of the 95% CI for , the upper bound of the 95% CI for Kiribina was always defined, and ranged from ~1,300-14,000. Ecological data concerning the differences between these chromosomal forms is very incomplete, owing to the lack of molecular markers and the consequent necessity of determining karyotype from semi-gravid females by laborious cytogenetic methods. Preliminary indications are that Kiribina may prefer anthropogenic breeding sites such as rice fields, whereas Folonzo predominates in association with more natural and permanent habitats such as marshes and swamps. If confirmed by follow-up studies, the different breeding habitats may help explain differences in N e , because rice fields do not produce anophelines continuously. Rice fields are flooded in June/July and harvested in October/November, with a second crop sometimes grown between March and June. Even during these intervals, the ability of An. funestus to exploit rice fields depends upon the stage of rice growth [32].
Estimates of N e produced by the F-statistic and ML methods were very similar, except for the two-year sampling interval for the Folonzo form, where N e values estimated from F were several-fold larger than those from the ML method and were inconsistent with both of the corresponding one-year sampling intervals. Unlike the ML method, the F-statistic method is known to have an upward bias if rare alleles are present, and it is possible that the longer sampling interval resulted in more rare alleles that led to an overestimate of N e [22]. Because of the inconsistent performance of the F-statistic method and the greater precision of the ML method [8], ML estimates of N e should be given greater weight. However, it should be noted that both methods assume the absence of mutation, selection and migration. While the first assumption is not unreasonable over short sampling intervals, and direct selection on microsatellite markers seems unlikely, the assumption of an isolated population without immigration may not be [24]. In the short-term, ignoring immigration, if it actually exists, leads to underestimates of N e [8]. Both methods also assume constant population size and discrete generations, which are not realistic for An. funestus. Fluctuating population size with  overlapping generations can lead to overestimates of N e [33].

Long-term N e
Unlike short-term N e that reflects recent effects on genetic variation in a focal population, long-term N e reflects evolutionary forces and demographic processes over a much greater geographic and historical frame (on the order of N e generations; [8]), thus it approaches the effective size of the entire species. Long-term N e was estimated from genetic diversity based on microsatellites and mtDNA sequences from samples collected in 2002 (Table 3). Microsatellite data from other years gave very similar estimates and, therefore, are not shown. As expected, longterm estimates are 2 orders of magnitude larger than the short-term N e values (10 5 versus 10 3 ). Moreover, the estimates from microsatellites and mtDNA are roughly in accord. Departing from recent convention for estimates of N e in other Afro-tropical vector species, 9.3 × 10 -6 was assumed as the mutation rate for microsatellites, an order of magnitude slower than the upper-bound estimate for An. gambiae [12] but consistent with measurements from D. melanogaster [28]. The mtDNA mutation rate was assumed to be 5.7 × 10 -8 . The uncertainties in these mutation rates as applied to An. funestus mean that the longterm N e values lack precision. It also should be noted that because of recent population expansion, An. funestus populations west of the Rift Valley (including those from Burkina Faso) violate the assumption of MDE under which long-term N e is derived [3,6]; the departure from MDE is suggested by divergent estimates of θ from S or π seen in Table 3. The degree to which the long-term estimates of N e are biased by population expansion is not known. However, long-term N e estimates from Burkina Faso can be compared to long-term estimates from countries east of the Rift Valley where the evidence does not support a population expansion [3]. Using values of gene diversity derived from the data of Michel et al [3], who employed 10 of the 16  Comparison between Folonzo and Kiribina N e values is not hindered by uncertainty in mutation rate, as this rate should be the same for both forms at a given set of loci.
The N e estimates from mtDNA are essentially identical between the forms. Across all 16 microsatellite loci, it appears that N e for Folonzo is slightly larger than that for Kiribina. However, partitioning the loci into two groupsthose residing on chromosome 3R or those that map elsewhere in the genome -reveals that the difference between forms can be explained by loci on 3R, the arm that carries the two main chromosomal inversions involved in distinguishing the forms [4,5]. There is higher diversity on this arm in Folonzo than in Kiribina. This is not altogether surprising, as the deterministic algorithm which defines these forms allows more polymorphism in Folonzo with respect to alternative arrangements on 3R. However, this is not the entire story. The relative level of genetic diversity at the 5 loci on 3R versus the 11 loci outside 3R is significantly higher in both forms (Mann-Whitney test: Folonzo U = 43.5, P < 0.03; Kiribina U = 49.5, P < 0.01). The reason(s) for this pattern are not clear from the present data, but higher genetic diversity provides more scope to natural selection and further investigation of chromosome 3R seems warranted.

Conclusion
The present estimates of effective population size for An. funestus, though approximate, preclude strong seasonal bottlenecks. Based on microsatellite variation between temporal samples of An. funestus in Burkina Faso, shortterm estimates of N e were on the order of 10 3 and were consistently smaller for the Kiribina than the Folonzo chromosomal form, possibly related to the preference of Kiribina for breeding in rice fields that are not in continuous production. Long-term estimates of N e were consistent between classes of marker (microsatellites or mtDNA), and were two orders of magnitude larger than short-term estimates. Although long-term N e did not differ between chromosomal forms, in each case there was significantly higher genetic diversity on the chromosome arm (3R) that carries the principal inversions defining the two taxa, a finding that merits further study.
As N e reflects the strength of genetic drift, it is a central parameter in descriptions of population genetic structure.
The N e estimates derived here provide some insight into the complex population structure of An. funestus in Burkina Faso where the two chromosomal forms coexist. However, the geographic extent and nature of these forms has not been well characterized outside of Burkina Faso. In general, An. funestus is a heterogeneous species that occupies diverse habitats across Africa, where its population structure and history may differ. A more complete understanding of the forces that structure genetic variation in An. funestus will depend upon additional studies in other parts of its range.

Authors' contributions
APM participated in field collections, conducted the microsatellite genotyping, performed the data analysis, and drafted the manuscript. OG performed karyotype analysis and mtDNA sequencing and analysis. WG participated in collection efforts and performed karyotype analysis. N'FS, CC and NJB participated in the design and coordination