The genetic structure of a population is shaped by interactions between the behaviour of individuals and their prevailing environment. In combination, these factors can influence the magnitude of gene flow within and between populations and the genetic structure of populations of a species throughout its range. Ecological diversification has been suggested to be one of the factors that can interrupt the movement of genes by creating landscape or genetic barriers, such as rapid chromosomal evolution, that may ultimately result in reproductive divergence and speciation. Understanding the relationship between individuals and their surrounding environment can, therefore, provide information about the movement of genes from one individual or population to another. This information is important not only for understanding species evolution but also, in the case of disease vectors, for their control.
The method commonly used to study patterns of gene flow involves identifying genetic units within an ecological landscape and the features that help to shape the spatial distribution of these units. The approach entails describing genetic boundaries among pre-selected collection sites, presumed to be Mendelian populations, and estimating levels of gene flow among them using FST values or other parameters. Using this information, it may be possible to deduce which features or factors are responsible for restricting or promoting the movement of genes between or within these populations [5, 6]. Although this method has been useful, it has drawbacks. For example, it may not be informative for small areas, and it relies on prior assumptions about population limits [4, 7, 8]. Bayesian clustering methods have proven useful because they use individual genotypes as the sole source of information and partition individuals into genetic units whose genotype frequencies are in Hardy-Weinberg equilibrium. These methods have gained popularity and have been applied to population genetic studies of a range of organisms, including humans, animals [4, 11] and plants.
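As an illustration of the site-based approach described above, the following minimal sketch computes Wright's FST for a single locus as (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within sites and H_T is the expected heterozygosity of the pooled sample, both calculated under Hardy-Weinberg proportions. The function names and the allele frequencies are hypothetical and chosen only for illustration; this is not the estimator used in this study, and published analyses normally rely on sample-size-corrected estimators averaged over loci.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def wright_fst(site_freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one locus.

    site_freqs: one list of allele frequencies per collection site, all
    listing the same alleles in the same order. Sites are weighted
    equally (an assumption made for simplicity).
    """
    n_sites = len(site_freqs)
    n_alleles = len(site_freqs[0])

    # Mean within-site expected heterozygosity (H_S).
    h_s = sum(expected_heterozygosity(f) for f in site_freqs) / n_sites

    # Pooled allele frequencies and total expected heterozygosity (H_T).
    pooled = [sum(f[a] for f in site_freqs) / n_sites for a in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)

    if h_t == 0.0:
        return 0.0  # locus is monomorphic everywhere; no differentiation
    return (h_t - h_s) / h_t


# Hypothetical frequencies of three microsatellite alleles at two sites.
sites = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
fst = wright_fst(sites)
```

Values near 0 indicate that the sites share similar allele frequencies, whereas values approaching 1 indicate that the sites are fixed for different alleles. The calculation presupposes that the collection sites are the correct population units, which is the assumption that individual-based clustering avoids.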
Mosquitoes of the Anopheles gambiae complex include the two primary vectors (An. gambiae s.s. and Anopheles arabiensis) of human malaria in sub-Saharan Africa. Malaria accounts for an estimated 240 million cases and 280,000 deaths worldwide, with over 80% occurring in Africa. These two species are the most widespread members of the An. gambiae complex and major vectors of malaria. Although they are commonly found occupying similar ecological niches, An. gambiae s.s. is associated with more humid climates than An. arabiensis, which has a greater tolerance for drier environments [14, 15]. Additionally, An. gambiae s.s. is highly anthropophagic [13, 16], endophagic and typically endophilic, whereas An. arabiensis is more zoophagic, exophagic, and exophilic [18, 19]. A strong pre-copulatory barrier exists between the two species. Although the post-mating isolation mechanism is incomplete, hybrids, which are fertile, are competitively inferior, as evidenced by the rarity of hybrids in nature (0.02-0.76%) [6, 21].
Population substructure is more pronounced in An. gambiae s.s. and is thought to be influenced by environmental heterogeneity. For example, based on chromosomal inversions, five distinct An. gambiae s.s. subpopulations, which exist in sympatry, have been revealed in West and Central Africa [6, 23]. Studies from northern, southern and western Africa have reported some degree of genetic differentiation between An. arabiensis populations [24–26]. However, neither physical barriers nor geographic distance has been reported to drive An. arabiensis population differentiation [26, 27], except in island populations, whose genetic differentiation has been associated with historical drift.
Anopheles gambiae s.s. subpopulations have been studied intensively and are of great epidemiological importance, as they have been suggested to undermine available malaria vector control efforts. For example, the two molecular forms (S and M) of An. gambiae s.s. have been reported to respond differently to control interventions: the S form has developed resistance to pyrethroids (the insecticides used for impregnating bed-nets), while the M form remains largely susceptible [28, 29]. In addition to undermining current control interventions, such population subdivision is expected to pose further challenges to the application of new genetic control approaches, because it may require genetic modification of multiple strains for successful introduction and spread of desired traits into wild populations [5, 18]. Therefore, understanding the genetic structure and relative amount of gene flow taking place within and among wild populations is an important component of effective planning and implementation of available insecticide-based vector control approaches. Additionally, poor understanding of the genetic structure and level of gene flow between target populations may undermine proposed genetic control strategies, especially those that aim at reducing mating success of genetically-modified and/or sterile male mosquitoes from natural populations [30, 31]. The existence of genetic substructure within vector populations would create barriers that may restrict the spread of desired genes [5, 32].
This study characterised the population structure of An. gambiae s.s. and An. arabiensis within the Kilombero Valley (6,650 km²) in southern Tanzania. Although transmission intensities have fallen markedly, this valley has experienced some of the most intense malaria transmission in the world [33, 34]. Epidemiological studies in the valley revealed that malaria transmission intensities, as indexed by the entomological inoculation rate (EIR, the number of infective bites a person is exposed to in a year), are very high and range from 100 to thousands of infective bites per annum [33, 35–37]. It remains unclear whether such high levels of transmission can be attributed to environmental factors that affect mosquito population density and distribution within the valley [38, 39] or whether genetic factors that increase vectorial capacity play a larger role. Several studies have examined the ecology and population dynamics of malaria vectors within the Kilombero Valley [36, 38, 39], but none has addressed the population structure of An. gambiae s.s. and An. arabiensis there. In this study, therefore, Bayesian clustering analysis of 13 polymorphic microsatellite loci, originally designed for use in An. gambiae s.s., was used to characterise the genetic structure of each of the two malaria vectors throughout the Kilombero Valley. The hypothesis that physical distance between collection sites may affect gene flow was tested, and discrete genetic units within each of the two species were identified.
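The hypothesis that physical distance between collection sites affects gene flow is commonly examined by correlating pairwise geographic distance with pairwise genetic distance and assessing significance by permutation (a Mantel test). The sketch below is one simple way to do this and is not necessarily the procedure used in this study; the function names, the four sites and both distance matrices are hypothetical and serve only to show the mechanics.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the above-diagonal entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test: does genetic distance rise with geographic distance?

    geo, gen: symmetric square matrices (lists of lists) over the same sites.
    Returns (observed correlation, permutation p-value).
    """
    rng = random.Random(seed)
    n = len(geo)
    geo_vec = upper_triangle(geo)
    observed = pearson(geo_vec, upper_triangle(gen))

    at_least = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Relabel sites in the genetic matrix (rows and columns together).
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(geo_vec, upper_triangle(permuted)) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)


# Hypothetical example: four sites along a transect, 10 km apart, with
# made-up genetic distances that increase linearly with separation.
geo_km = [[10.0 * abs(i - j) for j in range(4)] for i in range(4)]
gen_dist = [[0.01 * abs(i - j) for j in range(4)] for i in range(4)]
r, p_value = mantel(geo_km, gen_dist)
```

Rows and columns are permuted together because pairwise distances that share a site are not independent, so an ordinary correlation test on the flattened values would overstate significance. With only a handful of sites the number of distinct permutations is small, which limits how low the p-value can go.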