Classification of populations of An. gambiae s.s. into chromosomal forms using four paracentric chromosome inversions (2Rb, c, j and u) has been useful in identifying discrete sub-populations in Mali. Extending this interpretation to other areas in Africa has proven difficult because a significant proportion of karyotypes are ambiguous with respect to the classification scheme developed and used in Mali. Additional information, including molecular form, the 2La inversion and multi-locus microsatellite genotypes, was employed in an attempt to obtain higher resolution of population structure in this species.
Seven groups were identified in the Bayesian analysis using molecular form, 2La, 2Rj, 2Rb, 2Rc, 2Rd, and 2Ru inversion genotypes as markers (Figure 2A). These seven groups are Forest-M, Forest-S, Godola-S, Savanna-S, Bamako, Mopti-M bc/bc, and other Mopti-M forms (Figure 2A). Each group showed distinctive inversion polymorphism, and each occupied a different land cover type (Figure 1). Bayesian analysis using microsatellite data for the most part supported the analysis based on chromosome inversions and molecular form, with some notable exceptions (Figures 2B and 2C). The microsatellite analyses at K = 4 and K = 5 resolved groups with roughly equivalent likelihoods (Figure 3B). Microsatellites failed to resolve the Godola-S group within the range of K values (K = [1, 8]). At K = 4 the S form was divided into the Forest-S form and all other S forms. At K = 5 the other S form group was split into the Bamako and Savanna forms.
The M form
The Bayesian analysis based on molecular form and karyotype divided M form populations into three discrete groups. The Forest-M form was characterized as having the standard arrangement for all six major chromosomal inversions, namely 2La, 2Rj, 2Rb, 2Rc, 2Rd and 2Ru. The Mopti-M form was further divided into two groups: one fixed for the 2Rbc inverted arrangement, and the other comprising the remaining M forms with the 2La/a genotype. The Forest-M form was the most distinct (Figures 2 and 4). The high degree of genetic divergence between the Forest-M form and all the others was indicated not only by the 2R chromosome arrangement, but also by 20 microsatellite markers on the second and third chromosomes (Table 3, Figure 4). Moreover, the Forest-M form showed a strong association with very wet environments, distinguishing this form from the Mopti-M form, which is most abundant in much drier habitats (Figures 1 and 6, Table 4).
Despite occupying the same ecological zone and being identical with respect to karyotype (homosequential), the level of genetic differentiation (FST = 0.04263, P = 0.0000) between the Forest-M and Forest-S forms was higher than among any of the other subspecific forms of An. gambiae s.s. (Table 3, Figure 4). The relationship between the Forest-M and Forest-S forms is clearly different from the relationship between the Mopti-M and Savanna-S forms in Mali. Hybridization between the Mopti-M and Savanna-S forms, although rare, has been reported repeatedly [8, 39–41], whereas reproductive isolation between sympatric Forest-M and Forest-S forms appears to be complete [14, 15, 35]. These results demonstrate that reproductive isolation, and not ecological barriers, plays the major role in limiting gene flow between the Forest-M and Forest-S forms in Cameroon and between the Mopti-M and Savanna-S forms in Mali.
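To make the notion of a pairwise FST concrete, the following is a minimal sketch of a heterozygosity-based estimate, FST = (HT - HS) / HT, for a single microsatellite locus and two populations. It is an illustration only: the estimator used to produce Table 3 is not specified here and may well differ (for example, a variance-component estimator with permutation-based P values), and the function names and allele sizes are hypothetical.

```python
from collections import Counter


def allele_freqs(alleles):
    """Return {allele: frequency} for a list of allele calls at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_het(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst_one_locus(pop1, pop2):
    """Heterozygosity-based FST for two populations at one locus.

    Assumption: both populations are weighted equally, with no
    sample-size correction. This is not the paper's estimator.
    """
    f1, f2 = allele_freqs(pop1), allele_freqs(pop2)
    # HS: mean within-population expected heterozygosity
    hs = (expected_het(f1) + expected_het(f2)) / 2
    # HT: expected heterozygosity of the averaged allele frequencies
    pooled = {a: (f1.get(a, 0.0) + f2.get(a, 0.0)) / 2 for a in set(f1) | set(f2)}
    ht = expected_het(pooled)
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical allele sizes (in base pairs), not data from this study
forest_m = [100, 100, 102, 102, 102, 104]
forest_s = [100, 104, 104, 104, 106, 106]
print(round(fst_one_locus(forest_m, forest_s), 4))
```

Two populations with identical allele frequencies give a value of 0, and two populations fixed for different alleles give 1. A multi-locus value would normally combine the per-locus components rather than average the per-locus ratios.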
The M forms were further sub-divided into two groups based on the Bayesian analysis of karyotype data (Mopti-M and "Other"-M, Figure 2A), while microsatellite data structured them as a single group (Figure 2B and 2C). The Mopti-M form is characterized as being fixed for the double homozygous b/c karyotype, whereas the "Other"-M form group carried the 2Rb and c inversions in other combinations or were standard karyotype. Both were fixed for the 2La inversion. Although genetic differentiation between the two groups as measured by pair-wise FST is significant, the degree of divergence is relatively small and insufficient to classify them as two separate groups, that is as two distinct gene pools.
The S form
The Bayesian clustering analysis based on molecular form and karyotype grouped S form populations into four groups: Forest-S, Godola-S, Savanna-S and Bamako-S (Figure 2A). The Forest-S group includes individuals with the standard (non-inverted) karyotype and individuals carrying the 2La and/or 2Rb inversions. Forest-S 2Rb heterozygotes were located in woody savanna habitats (Ndop and Foumbot), while only Forest-S individuals with the standard arrangement were collected from evergreen forest habitats (Mutengene and Tiko). This distribution may reflect selection for the 2Rb inversion in drier habitats. S form 2Rb homozygotes were classified as Godola-S form if the 2Rd inversion is also present, Savanna-S form if no other inversions occur on chromosome 2R, or Bamako-S if the 2Rj, 2Rc and 2Ru inversions occur together.
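The verbal rule for S form 2Rb homozygotes can be written as a small decision function. The sketch below is a paraphrase of that summary rule, not the Bayesian clustering itself. The karyotype encoding (number of inverted copies per inversion), the function name, the order in which the Godola-S and Bamako-S conditions are tested, and the treatment of non-homozygotes as Forest-S are all assumptions made for illustration.

```python
OTHER_2R = ("2Rj", "2Rc", "2Rd", "2Ru")  # 2R inversions other than 2Rb


def classify_s_form(karyotype):
    """Assign an S form individual to a group from its karyotype.

    karyotype: dict mapping inversion name -> inverted copies (0, 1 or 2);
    missing keys are treated as the standard arrangement (0 copies).
    """
    def copies(name):
        return karyotype.get(name, 0)

    if copies("2Rb") == 2:  # 2Rb homozygote
        if copies("2Rd") > 0:
            return "Godola-S"
        if all(copies(inv) > 0 for inv in ("2Rj", "2Rc", "2Ru")):
            return "Bamako-S"
        if all(copies(inv) == 0 for inv in OTHER_2R):
            return "Savanna-S"
        return "unclassified"

    # Assumption: standard karyotypes and 2La and/or 2Rb carriers with no
    # other 2R inversion are treated as Forest-S.
    if all(copies(inv) == 0 for inv in OTHER_2R):
        return "Forest-S"
    return "unclassified"


print(classify_s_form({"2Rb": 2, "2Rd": 1}))
print(classify_s_form({"2La": 2, "2Rb": 1}))
```

Karyotypes that match none of the stated conditions are returned as "unclassified" rather than forced into a group, which mirrors the ambiguity that motivated the use of additional markers.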
The Bamako-S form was not resolved by the analysis based on microsatellites with K = 4, but was resolved at K = 5. The Godola-S form, however, was not resolved by the microsatellite data at either K = 4 or K = 5. The Godola-S group represents a distinctive group with respect to inversion polymorphism, notably a high frequency of the 2Rd inversion and 2Rb and c combinations not typically found in individuals within the S molecular form. However, the distinctive karyotypic polymorphism was not captured by the Bayesian analysis of microsatellite polymorphism.
Analysis of levels of genetic divergence, described with microsatellite-based FST values (Table 3, Figure 4), suggests several relationships among the An. gambiae populations studied here. The major division distinguishes the Forest-M form from all others (Figure 4). S form populations form four groups.
Interestingly, the Bamako-S and Savanna-S forms are the least diverged of the four, although it is the Bamako chromosomal form that has received the most attention, including the suggestion that it represents a distinct species [37, 42].
Forces affecting the distribution of forms
Recently, Yawson et al. reported higher levels of genetic divergence among populations of the same form from different environments than among populations of different forms from the same environment in Ghana and Burkina Faso. This led them to conclude that barriers to gene flow among populations are due more to ecological barriers than to reproductive isolation between molecular forms. The two sites in Cameroon from which we described the Forest-M form occur within a region that is more or less contiguous with the mangrove strand sites in southern Ghana from which Yawson et al. collected M forms for their study. Furthermore, the 1 km resolution land-cover map of Africa by Mayaux et al. indicates that land cover in northern Ghana is similar to that in southern Mali. These observations suggest that the M form collected by Yawson et al. in northern Ghana represents the Mopti-M form, while the M form they collected in the mangrove strand of southern Ghana is of the Forest-M form. It is, therefore, likely that what Yawson et al. describe as intra-form comparisons of M form populations are in fact inter-form comparisons between Forest-M and Mopti-M forms. Overall, the results suggest that reproductive isolation among forms plays an important role in restricting gene flow among populations. Evidence laid out here does, however, support the idea that ecological barriers also play a significant role.
The correlations between 2La and 2Rb inversion frequencies and precipitation (Figure 6) were significant. In this study, precipitation is used as a proxy for habitat type, rather than as the causal factor for the distribution of 2La and 2Rb. Inversion polymorphisms in natural populations of An. gambiae s.s. are very complex, and simple summary statistics, such as annual precipitation, are not sufficient to capture the diverse forces driving their distribution in nature. For example, the Godola collection has an unusual concentration of the 2Rd inversion, which is difficult to explain by precipitation alone. Variation in rainfall was an obvious first environmental factor to consider; however, describing this variable is not as trivial as it may seem. Rainfall is bounded below at 0 (no rain), and most of the regions included in this study have both rainy and dry seasons. Regions with high precipitation show greater variation in rainfall. Therefore, the correlation between genetic markers and the standard deviation of precipitation showed the same trend as the correlation with precipitation. Moreover, the standard deviation in rainfall is far greater than the mean (Table 4), illustrating the volatile nature of precipitation data. Examining the relationship between the distribution of genetic markers and rainfall in places like Tiko and Mutengene is further complicated because they have two rainy and two dry seasons. Other measures of rainfall variation, such as the number of consecutive dry days or the length of the dry season, could be explored. Summary statistics that better reflect rainfall patterns need to be developed, and other environmental parameters need to be investigated.
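As an illustration of the rainfall summaries discussed above, the sketch below computes three quantities from a rainfall series: the coefficient of variation (standard deviation relative to the mean), the longest run of consecutive dry days, and a Pearson correlation of the kind that could relate a site-level rainfall statistic to an inversion frequency. All function names, the dry-day threshold, and the numbers are hypothetical; none of them are taken from Table 4 or Figure 6.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (NaN if mean is 0)."""
    mean = statistics.mean(values)
    return statistics.pstdev(values) / mean if mean else float("nan")


def longest_dry_run(daily_mm, threshold=0.0):
    """Length of the longest run of days with rainfall <= threshold (mm)."""
    longest = current = 0
    for mm in daily_mm:
        if mm <= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily rainfall (mm): many zeros and a few wet days
daily = [0, 0, 5, 0, 0, 0, 2, 30, 0, 0]
print(coefficient_of_variation(daily))  # exceeds 1 when SD is larger than the mean
print(longest_dry_run(daily))

# Hypothetical per-site values: annual precipitation (mm) and 2La frequency
precip = [600, 900, 1500, 2400, 3000]
freq_2la = [0.95, 0.80, 0.40, 0.10, 0.05]
print(pearson_r(precip, freq_2la))
```

A series dominated by zero-rain days readily yields a standard deviation larger than the mean, which is one reason the mean alone is a poor descriptor of sites with pronounced dry seasons.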
Microsatellites did detect population groups according to ecology and molecular form. Division of the Mopti-M, Savanna-S and Bamako-S groups is consistent with previous studies. Subdivision of the M form into Mopti-M and Forest-M forms has strong support in all of the Bayesian analyses presented here, as does subdivision of the S form into Savanna-S and Forest-S forms, although examination of Figures 2B and 2C suggests that the latter division is less clear.