Simulations: genomic signatures of successful control
The population genetic parameters from the simulated data sets are shown in Fig. 1. Strong population decline a long time in the past gave too few segregating sites for analyses (due to a long-term low Ne), so some curves are truncated. As expected, population declines lead to reductions in π, θW and ρ, and increases in Tajima’s D, with larger reductions in population size producing larger and more rapid effects on the population genomic statistics. (Simulating reductions even further in the past than 10,000 generations resulted in Tajimas’s D returning towards zero (results not shown); this reflects the fact that the population has remained small but stable for a long enough time equilibrium to be restored.) Importantly, for all scenarios, ρ responded faster and/or by a larger relative amount than the other three statistics, suggesting it can be a more sensitive measure of population control.
Empirical data from East Africa
Linkage disequilibrium
To compare linkage disequilibrium among populations and species, the analysis was restricted to the autosomes, and because inversions greatly increase linkage disequilibrium (see Additional file 2), SNPs in the segregating inversions in the samples (2La, 2Rb and 3Ra) were removed from all calculations in all species whether the inversion was segregating or not, to ensure only collinear regions of the genome were used. For each chromosome arm, pairwise linkage disequilibrium (r2) between polymorphic sites in different RADseq tag locations was calculated and averaged in bins from 103 to 107 bp (Fig. 2). For all four chromosome arms, linkage disequilibrium is substantially higher in the Kilifi population of An. gambiae than in any of the other populations. In these other populations linkage disequilibrium is at background levels (i.e., equal to that between polymorphisms on different chromosomes – Fig. 2d) at distances of 1–10 kb or greater, whereas for An. gambiae from Kilifi this only occurs at distances greater than 1 Mb. Linkage disequilibrium between polymorphisms in the same RADseq tag location (length 110 bp) were also analysed, though there are many fewer such pairs of polymorphisms and so data was combined across the chromosomal arms (Fig. 2e). There is some tendency for linkage disequilibrium to be higher in Kilifi An. gambiae than in other populations, particularly for markers separated by 50–110 bp, though the difference is not as dramatic.
Population recombination parameter ρ
In Fig. 3, estimates of ρ and ρ/θW for each species, sampling location and chromosome arm are shown. For ease of comparison, θW and Tajima’s D from [13] are also given. Setting aside the population of An. gambiae from Kilifi, one can see that ρ is relatively consistent across species, populations and chromosome arms, with the exception of 2L for An. gambiae in Muheza and 2R for An. arabiensis in Moshi, for both of which ρ is much reduced (Fig. 3a). Both these samples have segregating inversions in the relevant arm which presumably accounts for the reduction, though it is not clear why the effect is not seen for 2R in the other populations of An. arabiensis. ρ is also somewhat reduced in 2R for An. gambiae in Muheza; there is an inversion segregating on this arm in East African An. gambiae, which, even if not segregating in these samples, would reduce recombination and could account for the lower value. Apart from these exceptions, the ratio ρ/θW is close to its neutral expectation (=r/u), though consistently somewhat lower (Fig. 3b), perhaps because r has been over-estimated, or u under-estimated, or because of population structure [30].
The results from the Kilifi population of An. gambiae are markedly different, with ρ low in all arms. Comparison with the Muheza population demonstrates a statistically significant difference (paired t test, t = 2.39, p = 0.04). The reduction in ρ in Kilifi vs Muheza is between 11 % (2L) and 99.9 % (3R), with an average across chromosomes of 82 %. This difference is substantially greater than the reductions seen in diversity measures π and θW. As a result, the ratio ρ/θW for An. gambiae from Kilifi is more than 1000-fold lower then neutral expectations. The comparison among populations and species is perhaps clearest for chromosome arm 3L, which does not have segregating inversions in any of these three mosquito species (hatched bars in Fig. 3).
Estimating the timing and severity of population crash from ρ
To investigate the combination of timing and extent of population decline consistent with the observed value of ρ for Kilifi An. gambiae, population crashes of varying magnitude and at varying times in the past were simulated, in each case keeping the final value of π as close as possible to that currently observed in this population (π = 0.0081). Simulations of a population decline of 10−5 more than 100 generations in the past gave highly variable results and so were excluded from further analysis. For the remaining simulations a linear regression model was fitted using Fit Model in JMP v12.0.1 (SAS Institute Inc., Cary, NC). The best fitting model was
$${\text{Log}}10\,\left( \uprho \right) = 1.977 - 0.983 \times {\text{Log}}10\,\left( {{{{\text{N}}_{1} } \mathord{\left/ {\vphantom {{{\text{N}}_{1} } {{\text{N}}_{2} }}} \right. \kern-\nulldelimiterspace} {{\text{N}}_{2} }}} \right) - 0.717 \times {\text{Log}}10\,\left( {{\text{Generations}}} \right)$$
where ρ = ρ per base + 1/50,000 (to remove zero values), Generations = generations since population crash, N1 = pre-crash population size, N2 = post-crash population size. The model r2 was 0.884. A contour plot of the model is shown in Fig. 4, which also shows the contour line corresponding to the ρ value from Kilifi gambiae 3L [with the same transformation as the simulations; Log10 (ρ per base + 1/50,000)]. The ρ value for Kilifi is compatible with a range of population crash sizes from 5 × 10−4 to 4 × 10−6 corresponding to times in the past from 10,000 generations ago to ten generations ago. The first distribution of bed nets in Kilifi occurred 17 years before the mosquitoes used in this study were collected, which equates to ~170 generations. According to the model, if the crash occurred between 10 and 170 generations ago, this would correspond to a large crash of between 3 × 10−5 (at 170 generations ago) and 4 × 10−6 (at 10 generations ago) of the ancestral population. In terms of a reduction in mosquito numbers, a reduction of 3 × 10−5 would mean that a starting population of one million mosquitoes per square km would be reduced to just 30 mosquitoes per square km.
Runs of homozygosity
At least one long run of homozygosity (>4.5 Mb) was observed in 7 of the 13 Kilifi An. gambiae individuals, despite having RADseq data instead of full genome sequences. Examples of these are shown in Fig. 5. The runs of homozygosity in different individuals are in different genomic locations, and so cannot be attributed to a selective sweep. Anopheles gambiae from Muheza and all other species and populations do not show any such runs of homozygosity.