Genetic population structure of Anopheles gambiae in Equatorial Guinea

Background Patterns of genetic structure among mosquito vector populations on islands have received particular attention, as islands are considered potentially suitable sites for experimental trials of transgenic-based malaria control strategies. In this study, levels of genetic differentiation were estimated between populations of Anopheles gambiae s.s. from the islands of Bioko and Annobón, and from continental Equatorial Guinea (EG) and Gabon. Methods Eleven microsatellite loci located on chromosome 3 were genotyped in three island samples (two from Bioko and one from Annobón) and three mainland samples (two from EG and one from Gabon). Four samples belonged to the M molecular form and two to the S-form. Microsatellite data were used to estimate genetic diversity parameters, perform demographic equilibrium tests and analyse population differentiation. Results High levels of genetic differentiation were found between the more geographically remote island of Annobón and the continent, contrasting with the shallow differentiation between Bioko, the island closest to the mainland, and continental localities. In Bioko, differentiation between M and S forms was higher than that observed between island and mainland samples of the same molecular form. Conclusion The observed patterns of population structure seem to be governed by both physical (the ocean) and biological (the M-S form discontinuity) barriers to gene flow. The significant degree of genetic isolation between M and S forms, detected by microsatellite loci located outside the "genomic islands" of speciation identified in A. gambiae s.s., further supports the hypothesis of on-going incipient speciation within this species. The implications of these findings for vector control strategies are discussed.


Background
Malaria is an infectious disease that causes 300-500 million clinical cases and 1.5-3 million deaths every year, mainly among children under five years of age in sub-Saharan Africa [1]. Classical vector control strategies deployed in endemic areas of Africa, such as insecticide-impregnated bed nets or indoor residual spraying, have not been as effective as expected, and malaria incidence is increasing. Factors involved in this failure include the lack of sustainability of vector control programmes and the emergence of insecticide resistance in mosquitoes [2].
Genetically based methods have been proposed for malaria vector control. These methods focus mainly on altering vectorial capacity through the genetic transformation of natural vector populations, either by introducing refractoriness genes or by sterile insect techniques [3]. Knowledge of the genetic structure of vector species is, therefore, an essential requirement, as it should help not only to predict the spread of genes of interest, such as insecticide resistance or refractoriness genes, but also to identify heterogeneities in disease transmission due to distinct vector populations [4]. The most effective Afrotropical malaria vectors belong to the Anopheles gambiae complex, which comprises seven sibling species. Within the complex, A. gambiae sensu stricto (s.s.) is the most synanthropic species and shows remarkable genetic heterogeneity [5,6]. Cytogenetic analysis has revealed different chromosomal arrangements associated with paracentric inversions [5]. This has led to the description of five chromosomal forms based on differences in the frequencies of polymorphic arrangements, geographical distribution and ecological data [5,7]. Furthermore, analysis of the X-linked ribosomal DNA cluster suggested further genetic subdivision within A. gambiae s.s. and led to the description of two molecular forms, provisionally named M and S, defined by sequence differences in transcribed and non-transcribed rDNA spacers (ITS and IGS, respectively) [8,9]. Although the offspring of crosses between M and S forms are viable and fertile [10], M-S hybrids and cross-mating between the two forms are rarely observed in nature [6,11]. Genetic differentiation between the molecular forms of this primary vector is of paramount relevance for the implementation and monitoring of its control, as illustrated by the extreme differences found in the distribution of knockdown resistance mutations among sympatric M and S form populations [12,13].
Previous population genetic studies pointed to a shallow population structure within the major malaria vectors throughout the African continent, possibly as a result of recent population expansion leading to substantial retention of ancestral polymorphism [14,15]. The few cases of significant population differentiation have been attributed to barriers to gene flow, either physical or, in the case of the M-S form partitioning in A. gambiae s.s., biological [16-19]. However, recent studies suggest further subdivision within each of the molecular forms, as evidenced by significant levels of genetic differentiation among populations of different chromosomal forms revealed by microsatellite and AFLP markers [20,21].
In Equatorial Guinea, malaria is one of the main causes of morbidity and mortality and is transmitted mainly by vectors of the A. gambiae complex [22]. On the island of Bioko, as in mainland Equatorial Guinea, M and S forms are known to occur in sympatry. Several vector control measures are being implemented, including insecticide-treated bed nets and indoor residual spraying [23]. However, studies on the genetic structure of A. gambiae s.s. in Equatorial Guinea remain scarce. The geography of the country, which comprises both insular and continental regions, is likely to promote greater biological heterogeneity among its vector populations. This may have important implications for the design and implementation of nationwide malaria vector control programmes. In addition, islands are regarded as potential sites for experimental releases of transgenic mosquitoes for malaria control, which increases the need for genetic studies of their vector populations [18,24].
In this study, microsatellite markers were used to estimate levels of genetic differentiation between populations of A. gambiae s.s. from the islands of Bioko and Annobón and from continental localities of Equatorial Guinea and Gabon, in order to determine the extent of population substructuring and its association with barriers to gene flow.
[…] until DNA extraction, which was performed according to Collins et al [26]. Species identification within the A. gambiae complex was done by PCR according to Scott et al [27]. Anopheles gambiae s.s. molecular forms were determined as described in Favia et al [28]. Although cytological analysis was not performed, the Forest cytoform of A. gambiae s.s. is likely to be the only one present in these localities [5,22].
An additional sample from Libreville (0°23'N/9°27'E), Gabon, was also included in the analysis. This sample was collected in 2000 and is composed of S-form A. gambiae s.s. [29].

Microsatellite analysis
Eleven microsatellite loci [17,30] were genotyped: Ag3H128, Ag3H249, Ag3H119, Ag3H242, Ag3H577, Ag3H555, Ag3H59, Ag3H758, Ag3H88, Ag3H93 and 45C1. Only loci on chromosome 3 were used, to avoid possible bias due to selective effects associated with paracentric inversions or with putative reproductive isolation regions, which are known to occur on chromosomes 2 and X [31,32]. Each locus was amplified by PCR using forward primers fluorescently labelled with FAM, NED or HEX [33]. Amplified fragments were separated by capillary electrophoresis on an automatic sequencer (ABI 3730, Applied Biosystems) and sized using the software GeneMarker (SoftGenetics, USA).
Figure 1 Collection sites in Equatorial Guinea and Gabon.

Data analysis
Genetic diversity by locus and sample was characterized by estimates of unbiased expected heterozygosity (He [34]) and allelic richness [35], both available in FSTAT v 2.9.3.2 [36]. Allelic richness was used instead of the number of alleles per locus because it accounts for differences in sample size. As a further control for sample size, both estimates were re-calculated using randomly selected subsamples from each locality, of size equal to the smallest sample size. Genotypic frequencies were tested for departures from Hardy-Weinberg equilibrium (HWE) proportions by exact probability tests performed in GENEPOP v. 3.4 [37]. Linkage disequilibrium between loci was tested by exact tests on contingency tables, also available in GENEPOP, to confirm that loci were independent. […] was employed. Data were run using 10,000 simulations and a significance threshold of α = 0.01.
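The two diversity statistics described above can be illustrated with a short Python sketch. It assumes genotypes for one locus in one sample are stored as pairs of allele sizes (a hypothetical layout, not the FSTAT input format) and uses the textbook small-sample-corrected estimator He = 2N/(2N - 1)(1 - Σ p_i²) together with allelic richness computed by rarefaction (Hurlbert's formula, as applied to allelic richness by El Mousadik and Petit). FSTAT's own outputs may include further corrections not reproduced here, and all function names are illustrative.

```python
from collections import Counter
from math import comb


def allele_counts(genotypes):
    """Count allele copies at one locus; genotypes are (allele1, allele2) pairs."""
    return Counter(a for pair in genotypes for a in pair)


def unbiased_he(genotypes):
    """Expected heterozygosity corrected for sample size: 2N/(2N-1) * (1 - sum p_i^2)."""
    counts = allele_counts(genotypes)
    n = sum(counts.values())  # number of gene copies (2N for diploids)
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


def rarefied_richness(genotypes, g):
    """Expected number of distinct alleles in a random draw of g gene copies."""
    counts = allele_counts(genotypes)
    n = sum(counts.values())
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the number of gene copies")
    total = comb(n, g)
    # An allele is missed only if all g copies come from the other n - n_i copies;
    # math.comb returns 0 when n - n_i < g, i.e. the allele is certainly sampled.
    return sum(1 - comb(n - c, g) / total for c in counts.values())


# Hypothetical genotypes (allele sizes in bp) for four individuals at one locus
toy = [(150, 152), (150, 150), (152, 154), (154, 154)]
print(round(unbiased_he(toy), 3))           # 0.75
print(round(rarefied_richness(toy, 2), 3))  # 1.75
```

In a design like this one, g would typically be set to twice the smallest number of individuals genotyped at the locus across samples, so that every sample is compared at the same number of gene copies.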
Finally, a Bayesian approach implemented in STRUCTURE 2 [45] was used to infer the number of clusters (K) in the data set without prior information on sampling locations. A model in which allele frequencies are correlated among populations was assumed (λ was set at 1, the default value). The software was run under the admixture model, allowing for some mixed ancestry within individuals, and α was allowed to vary. Twenty independent runs were performed for each value of K (K = 1 to 9), each with a burn-in period of 100,000 iterations followed by 100,000 replications. The method of Evanno et al [46] was used to determine the most likely number of clusters. This approach uses an ad hoc quantity, ∆K, based on the second-order rate of change of the log-likelihood of the data between successive values of K.
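The ΔK statistic of Evanno et al [46] can be sketched in a few lines of Python. The input is assumed to be a mapping from each K to the ln P(D) values of its independent runs (twenty per K in this study); the sketch takes the second-order difference |L(K+1) - 2L(K) + L(K-1)| on the mean ln P(D) across runs, a common simplification, and divides it by the standard deviation of ln P(D) at that K. ΔK is therefore undefined at the smallest and largest K tested. The function name and the toy likelihood values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta_K} from a mapping K -> list of ln P(D) values (one per run).

    K values are assumed to be consecutive integers with at least two runs each.
    """
    ks = sorted(lnp_by_k)
    mean_lnp = {k: mean(lnp_by_k[k]) for k in ks}
    delta_k = {}
    for k in ks[1:-1]:  # the second difference needs both K-1 and K+1
        second_diff = abs(mean_lnp[k + 1] - 2 * mean_lnp[k] + mean_lnp[k - 1])
        sd = stdev(lnp_by_k[k])
        delta_k[k] = second_diff / sd if sd > 0 else float("inf")
    return delta_k


# Hypothetical ln P(D) values from two runs per K
runs = {
    1: [-1001.0, -999.0],
    2: [-951.0, -949.0],
    3: [-851.0, -849.0],
    4: [-849.0, -847.0],
    5: [-848.0, -846.0],
}
scores = evanno_delta_k(runs)
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

The peak of ΔK, rather than the maximum ln P(D) itself, is taken as the most likely K, because ln P(D) often keeps increasing slowly beyond the true number of clusters.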
Whenever multiple tests were performed the nominal significance level (α = 0.05) was adjusted by the sequential Bonferroni procedure [47].
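The sequential Bonferroni procedure [47] can be sketched as a step-down rule: the k P-values are sorted in ascending order, the i-th smallest is compared with α/(k - i + 1), and testing stops at the first non-significant result. The sketch below implements this Holm form of the procedure; the function name and the example P-values are hypothetical.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Holm's step-down correction; returns a list of booleans (True = significant)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    significant = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # all remaining (larger) P-values are also non-significant
    return significant


# Four hypothetical P-values, e.g. from HWE tests at one locus across samples
print(sequential_bonferroni([0.01, 0.04, 0.03, 0.005]))
# [True, False, False, True]
```

Here 0.005 is compared with 0.05/4 and 0.01 with 0.05/3, both significant; 0.03 fails against 0.05/2, so 0.04 is not tested further.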

Species and molecular form identification
A total of 213 female A. gambiae s.s. were analysed in this study. Of these, 133 individuals belonged to the M molecular form, corresponding to the samples of Ngonamanga (45) and Bata (28) on the continent, Malabo (36) on the island of Bioko, and Annobón (24). The samples of Sácriba (35), in Bioko, and Gabon (45) were composed of S-form individuals. Both molecular forms were found in sympatry in Ngonamanga and in both localities of Bioko island. However, the low numbers (N < 20) of S-form individuals collected in these localities (or of M-form individuals in the case of Sácriba) precluded further analyses. The samples of Annobón and Bata contained only M-form individuals, and in Gabon only the S-form has been reported [29].

Within population genetic variability
Polymorphism at microsatellite loci varied, with allelic richness per locus ranging between four (Ag3H577 and 45C1) and 11 (Ag3H128). Two loci, Ag3H555 and 45C1, were monomorphic in Annobón. This island showed the lowest average allelic richness (3) compared with all other localities (7-8) and also had the lowest mean expected heterozygosity (0.436). This lower genetic diversity cannot be explained by the small sample size of Annobón, as comparable differences were obtained when the data were reanalysed using randomly selected sub-samples of N = 24 for all sites other than Annobón (…).
[…] assigned to Bata. Similarly, within the S-form, over 30% of the individuals from Libreville were mis-assigned to Sácriba, in Bioko.

Bayesian cluster analysis performed with STRUCTURE [45] showed that the most likely value was K = 3 (Figure 2a). This corresponds to three distinct genetic clusters (…).
[…] In other islands in close proximity to the mainland, genetic diversity was also similar to that of adjacent continental populations [24]. In contrast, the sample from the island of Annobón showed much lower levels of genetic diversity than the Bioko samples. Bioko and Annobón lie at opposite ends of a volcanic chain in the Gulf of Guinea, which also includes the archipelago of São Tomé and Príncipe (STP). On these islands, estimates of genetic diversity were intermediate between those found in Bioko and Annobón (He: 0.45-0.55 [18]). Overall, these findings agree with the principles of island biogeography, according to which biological (and genetic) diversity is positively correlated with island size and negatively correlated with distance from the mainland [50].

Discussion
The heterozygosity tests suggested an expansion process in both M and S-form continental populations of A. gambiae s.s., in agreement with previous work based on mainland populations of this species [51]. Within the island of Bioko, the differences found between M and S samples may indicate different demographic histories. While the M-form was found at mutation-drift equilibrium (MDE), the S-form appears to be expanding. This pattern could be due to different timings of arrival of the two molecular forms on the island. The M-[…]
[…] Microsatellite allele frequencies in A. gambiae s.l. tend to vary little over generations, reflecting large effective population sizes [53,54]. The higher differentiation between M and S forms was also evident from the assignment tests performed in the present study, in which most mis-assignments occurred between samples of the same molecular form regardless of their geographic origin. Bayesian cluster analysis further supported this partitioning by assigning M-form and S-form samples to two separate clusters, again independently of sample location. In a previous study, a significant FST estimate (0.070) had also been obtained by microsatellite analysis between sympatric M and S forms from Malabo [55]. Altogether, these results agree with the notion of a biological discontinuity within A. gambiae s.s., and with M and S forms being the likely result of an on-going incipient speciation process [6]. Evidence of limited gene flow between molecular forms has been described in other West African countries using different genetic markers [4,8,13,56,57]. However, several studies, some of them based on microsatellites, suggest that the highest genetic differentiation between M and S forms is restricted mainly to certain genomic regions, particularly the low-recombination centromeric regions of chromosomes X and 2L [32,58-61]. This led those authors to hypothesise that these regions contain genes responsible for reproductive isolation.
In this study, high differentiation between M and S-forms was detected using microsatellites mapped on chromosome 3, i.e. outside the regions where putative isolation genes are thought to occur, reinforcing the idea of high levels of genetic isolation between molecular forms in this geographic region. Similarly, Wondji et al [16] observed high differentiation between sympatric M and S-forms in Cameroon using microsatellites located outside the centromeric regions of chromosomes 2L and X. Although their results may appear to conflict with those of Turner et al [59], given that both studies were based on samples from Cameroon, this may not be the case, as different genetic markers (microsatellites and microarray probes, respectively) were used. Microsatellites detect allele frequency differences in highly polymorphic regions of the genome, whereas microarray hybridization approaches detect differentiation in regions where polymorphism within each form is low relative to the differences between forms, as is the case for centromeric regions. On the other hand, in a recent microsatellite-based study carried out in Ghana, levels of population differentiation in A. gambiae s.s. were attributable more to ecological zones than to the M-S molecular form partitioning [49]. These apparent differences suggest that, although incipient speciation is clearly on-going within A. gambiae s.s., the degree of isolation between its reproductive units is likely to vary across the species' eco-geographic range.

Figure 2 Bayesian cluster analysis using STRUCTURE.
Within the M-form, the low levels of differentiation between the Bioko sample and those from continental Equatorial Guinea suggest that gene flow between this island and the mainland is likely to occur. Reimer et al [55] detected slightly higher levels of population differentiation between Bioko island (Malabo) and sites in the nearest continental country, Cameroon (FST: 0.038-0.057). Because Malabo is the capital of the country, its daily air and sea connections with continental Equatorial Guinea (Bata) may promote gene flow through human-mediated transport of mosquitoes. Several studies have provided evidence of human activities promoting gene flow in mosquito populations between islands, or between islands and the mainland [62]. Conversely, the highest levels of population differentiation were found in all comparisons involving the M-form sample from Annobón. This supports a higher degree of isolation of this island and agrees with previous studies showing that the ocean and other extensive water bodies act as physical barriers to gene flow in anopheline species [18,19,24,63]. Similarly, microsatellite-based studies conducted in the neighbouring STP islands also showed high levels of differentiation from the continent (FST: 0.118-0.250) [18], and subsequent sequence analysis of rDNA and mitochondrial DNA regions suggests only two main colonization events of A. gambiae s.s. into these islands [Marshal et al, unpublished].

Conclusion
In the present study, strong population substructure was detected in A. gambiae s.s. from Equatorial Guinea. Patterns of genetic differentiation are most likely governed by both physical/geographic (the ocean) and biological (the M-S form discontinuity) barriers to gene flow. These findings have important practical implications for the management of vector control strategies. The biological partitioning between M and S forms may influence the evolution of genes of interest such as insecticide resistance genes. An unusual frequency of knockdown resistance (kdr) mutations has been detected in the M-form population of Bioko, contrasting with the absence of these alleles in the S-form of this island [55]. A detailed characterization of the distribution of M and S forms at a local level, together with continuous monitoring of kdr mutations within each form, would therefore be desirable for a rational management of insecticides for malaria control. The close proximity to the mainland and low differentiation from it, coupled with the genetic isolation found between sympatric M and S form populations, could make Bioko inappropriate for initial experimental releases of genetically modified mosquitoes, as only part of the vector population might be affected. On the other hand, in Annobón the presence of a single molecular form, coupled with the island's greater geographic and genetic isolation, might render it comparatively more suitable for transgenic-based malaria control.

Authors' contributions
MM was involved in the design of the survey, microsatellite genotyping, data analysis and manuscript preparation. PS participated in data analysis and in drafting the manuscript. JLV carried out microsatellite and data analysis. JC participated in field surveys and helped to draft the manuscript. PB and AL participated in molecular analyses and in the elaboration of the manuscript. FS and AC were involved in sample collections and molecular analyses and revised the manuscript. VER participated in the design of the study and revised the manuscript. JP conceived and co-supervised the study, assisted with data analysis and coordinated the drafting of the manuscript. AB participated in the conception and design of the study, revised the manuscript and provided overall supervision of the work. All authors read and approved the final manuscript.