Remarkable diversity of intron-1 of the para voltage-gated sodium channel gene in an Anopheles gambiae/Anopheles coluzzii hybrid zone

Genomic differentiation between Anopheles gambiae and Anopheles coluzzii - the major malaria vectors in sub-Saharan Africa - is localized into large “islands” toward the centromeres of chromosome-X and the two autosomes. Linkage disequilibrium between these genomic islands was first detected between species-specific polymorphisms within ribosomal DNA genes (IGS-rDNA) on the X-chromosome and a single variant at position 702 of intron 1 (Int-1702) of the para Voltage-Gated Sodium Channel (VGSC) gene on chromosome arm 2 L. Intron-1 sequence data from West and Central Africa revealed two clearly distinct and species-specific haplogroups, each characterized by very low polymorphism, which has been attributed to a selective sweep. The aim of this study was to analyse Int-1 sequence diversity in A. gambiae and A. coluzzii populations from the Far-West of their range, in order to assess whether this selective-sweep signature could persist in a zone of high interspecific hybridization. A 531 bp region of VGSC Int-1 was sequenced in 21 A. coluzzii, 31 A. gambiae, and 12 hybrids from The Gambia and Guinea Bissau, located within the Far-West geographical region, and in 53 A. gambiae s.l. samples from the rest of the range. Far-West samples exhibit dramatic Int-1 polymorphism, far higher within each country than observed throughout the rest of the species range. Moreover, patterning of haplotypes within A. coluzzii confirms previous evidence of a macro-geographic subdivision into a West and a Central African genetic cluster, and reveals a possible genetic distinction of A. coluzzii populations from the Far-West. The results suggest a relaxation of selective pressures acting across the VGSC gene region in the hybrid zone. Genetic differentiation in the Far-West could be attributable to a founder effect within A. coluzzii, with subsequent extensive gene flow with secondarily-colonizing A. gambiae, potentially yielding a novel insight on the dynamic processes impacting genetic divergence of these key malaria vectors.


Background
Mosquito species belonging to the Afro-tropical Anopheles gambiae complex represent a valuable model for studies of ecological speciation ('speciation with gene flow') [1,2]. In fact, owing to the major role of some of these species in malaria transmission, their genetic divergence has been studied extensively for more than half century, revealing repeated events of 'ecotypic speciation' [3,4]. These studies have revealed the existence of morphologically indistinguishable, but chromosomally/ genetically distinct species with very different roles as malaria vectors (A. gambiae sensu stricto, Anopheles arabiensis, Anopheles quadriannulatus, Anopheles amharicus, Anopheles melas, Anopheles merus, Anopheles bwambae) [5,6]. In addition to these sibling species, which are isolated by both pre-and post-mating mechanisms, more recent studies have highlighted a further subdivision within the nominal species A. gambiae s.s., the most synanthropic and efficient malaria vector of the complex [7,8]. These studies have revealed the existence of two taxonomic units initially named the 'M' and 'S' molecular forms and now formally raised to species as Anopheles coluzzii and A. gambiae s.s. (hereafter referred as A. gambiae) [9], respectively. The two species are isolated only by (partially understood) pre-mating mechanisms and show limited genomic differentiation, which is localized most prominently in low-recombination pericentromeric regions of chromosome-X and of the two autosomes [10][11][12][13]. These regions have been postulated to represent 'genomic islands of speciation' , expanding in size by selection across linked loci connected to reproductive isolation, as predicted by ecological speciation with gene flow models [1,2]. However, their large size is probably enhanced by locally reduced recombination [10,14,15], leading to suggestions that high centromeric differentiation is primarily a result of recurrent background selection and hitchhiking unrelated to speciation [2,16]. Interestingly, the species-specific linkage disequilibrium among the three physically-unlinked centromeric regions is widespread [13], but has been lost at the western extreme of their range (i.e. the 'Far-West' , from The Gambia to Guinea Bissau). In this putative secondary contact zone between the two species, high frequencies of hybrids are found and pronounced inter-specific differentiation is maintained only on the chromosome-X centromere [17][18][19][20][21].
The first evidence of genetic linkage between the chromosome-X and chromosome-2 genomic island regions came from pre-genomic studies showing that the SNPs within the ribosomal intergenic spacer region (IGS-rDNA), which define the two species [7], are in strong linkage disequilibrium (LD) with a SNP at position 702 of Intron-1 (hereafter named Int-1 702 ) of the para Voltage-Gated Sodium Channel (VGSC) gene [22,23]. The VGSC gene has been studied extensively because of the presence of two mutations in the exon immediately downstream of Int-1, causing a change from Leucine to Serine (L1014S) or to Phenylalanine (L1014F). Each mutation can confer knock-down resistance (kdr) to DDT and pyrethroid insecticides [22,23]. In West and Central Africa the Int-1 702 SNP, which exhibits diagnostic nucleotides for each species (i.e. A. coluzzii = Int-1 C ; A. gambiae = Int-1 T ), defines two clearly distinct and species-specific haplotype groups each characterized by very low polymorphism. This led to the hypothesis of a selective sweep, pre-dating kdr mutations, centred on favourable variants in nearby genes that might contribute to the segregation of the two species [23].
The aim of this study was to investigate sequence diversity of the Int-1 region of the VGSC in A. gambiae and A. coluzzii populations from the Far-West to assess whether the selective-sweep observed in the rest of the species range is maintained in the face of high interspecific gene flow.

Methods
Anopheles gambiae sensu lato sample identification and kdr genotyping Anopheles gambiae s.l. specimens were identified as A. arabiensis, A. quadriannulatus, A. melas or A. merus using the method of Scott et al. [24], and as A. coluzzii or A. gambiae s.s. by using both PCR-RFLP of the intergenic spacer (IGS) rDNA region [25] and the SINE-PCR [26] methods. The latter method is based on an A. coluzzii-specific and irreversible single-locus insertion of a SINE200 retrotransposon in the X-chromosome centromeric region, about 1.5 Mb from IGS region containing A. coluzzii vs A. gambiae species-specific SNPs. The kdr 1014 locus was genotyped by either allelespecific-PCR (AS-PCR) or Hot Oligonucleotide Ligation Assay (HOLA) methods [27][28][29] in A. coluzzii and A. gambiae. Genotypes of kdr 1014 position in A. arabiensis, A. quadriannulatus, A. melas and A. merus were obtained by direct sequencing of a 531 bp fragment [27].

Data analysis
DnaSP v. 5.10.01 [33] was used to produce estimates of Int-1 polymorphism and perform Tajima's D and Fu and Li's D* and F* [34,35] neutrality tests. Genealogical relationships among Int-1 haplotypes were reconstructed by computing parsimony networks using TCS 1.21 [36]. Haplotypes names were retained for those previously reported in [23], while novel M1-and S1-related haplotypes were either named with consecutive numbers or with specific codes when exclusive of the Far-West region (i.e. GU for Guinea Bissau, GA for The Gambia, and GUGA for both countries). Frequency and distribution of Int-1 haplotypes in the Far West, West and Central African geographic regions in A. coluzzii, A. gambiae and putative hybrids (by merging original and previous data from [23]) were also computed. Inter-and intra-specific Fst estimates of genetic differentiation were calculated using Arlequin 3.11 [37].

Results
A 531 bp region of the VGSC gene Int-1 was sequenced in 21 A. coluzzii, 31 A. gambiae, 12 A. gambiae x A. coluzzii hybrids from the Far-West region. These sequences were aligned with those obtained by Gentile et al. [23] and with 53 additional original sequences from A. coluzzii and A. gambiae populations from the rest of the distribution range, as well as from other species of the complex (Additional file 1: Table S1), resulting in a total of 490 sequences. Note that A. coluzzii and A. gambiae were identified by both the IGS marker and SINE insertion, which provided completely consistent identification in all samples, with the notable exceptions of those from The Gambia (55% inconsistent) and Guinea Bissau (51% inconsistent). Inconsistently identified individuals were classified based on the SINE marker, as this is not biased toward the hybrid genotype by intra-chromosome recombination known to occur in the IGS multicopy rDNA region [38]. Specimens were genotyped for kdrmutations: all carried the wild type/insecticide kds susceptible allele of VGSC gene (i.e. TTA) with the exception of two out of eight individuals from Rwanda carrying the TCA allele (hereafter L1014S) in homozygosis and three carrying the TTT allele (hereafter L1014F) in heterozygosis (Additional file 2: Table S2).
Variable sites among haplotypes, and their relationships among different members of the A. gambiae complex, are shown in Figure 1 and in Figure 2, respectively. All species are well-separated from each other and from A. coluzzii and A. gambiae (except for one A. arabiensis individual from Senegal sharing the S1-haplotype with A. gambiae). Interestingly, A. gambiae carries a C to T mutation at site 702 (hereafter Int-1 702 , in red in Figure 1) separating it from all other species. This mutation was already shown by Gentile et al. [23] to separate "S-molecular form-S1 (Int-1 T )" from "M-form-M1 (Int-1 C )" haplotypes and closely-related locale-specific variants. In the Gambian and Guinean samples, however, A. coluzzii-specific Int-1 C haplotypes are also found in A. gambiae, but no A. gambiae-specific Int-1 T haplotypes are found in A. coluzzii (Figures 2b and 2c; Additional file 2: Table S2). This result does not change if the species are identified using the IGS marker rather than by SINE-PCR, as presented so far.
Overall, Int-1 haplotypes are not uniformly distributed across the ranges of A. gambiae and A. coluzzii ( Figure 3). Populations from the Far-West region (i.e. Senegal, The Gambia, Guinea Bissau, Guinea Conakry) differ from all others by the exclusive presence of the M3 haplotype (freq = 18% in A. gambiae, 32% in A. coluzzii and 16% in hybrids). Moreover, populations from The Gambia and Guinea Bissau possess 15 exclusive haplotypes (freq = 20% in both A. gambiae and A. coluzzii, 29% in hybrids), highlighted in yellow in Figure 2. M1 is the most frequent haplotype in both West and in Central African A. coluzzii populations, while the M5 haplotype reaches frequencies up to 30% in West African samples and is absent in Central ones. Haplotype S1 is almost the sole haplotype found in western A. gambiae populations (Mali, Ivory Coast, Burkina Faso and Nigeria; freq = 99%), whereas S1-related haplotypes are much more frequent in central ones (i.e. Cameroon and Angola; freq. = 22%). Finally, both S1 and the A. coluzzii-specific M1 haplotype are found at comparable frequencies (nearly 50% each) in A. gambiae from Rwanda (Additional file 2: Table S2).
Differentiation between A. coluzzii and A. gambiae in Central Africa (Fst = 0.69) is comparable to that observed between each species and A. arabiensis (Fst = 0.70 and 0.77, respectively), but decreases westwards (Fst in West-Africa = 0.51 and in Far-West region = 0.16) (Figure 4). Intraspecific Fst values are slightly lower between West and Central populations of each species (Fst = 0.04 and 0.14) than between Far-West samples and West and Central ones (Fst ranging from 0.19 to 0.31) (Figure 4). Table 1 shows genetic diversity and summary statistics for A. coluzzii and A. gambiae samples from Far-West, West, Central Africa, as well as from Rwanda, including original data and those from West/Central Africa reported by Gentile et al. [23]. In general, nucleotide/ haplotype diversity is low in samples from West and Central Africa (π = 0.02-0.21), but increases in the Far-West region (π = 0.47-0.49). Statistics applied to detect departures from neutrality reveal a weak trend toward positive values in A. gambiae and A. coluzzii (and hybrids) from "Far-West", whereas negative values were scored in both species from West and Central Africa

Discussion
Data on Int-1 of the VGSC gene highlight dramatic genetic differences between A. coluzzii and A. gambiae populations from West and Central Africa and those from the western extreme of the species range, where inter-specific geneflow is elevated and inter-specific differentiation reduced.
In West and Central Africa, Int-1 genetic differentiation between A. coluzzii and A. gambiae is high and comparable to that observed in their sibling A. arabiensis ( Figure 4). Across this wide geographical region, as previously shown [23], the two species exhibit low nucleotide/haplotype diversity and are strongly segregated based on two main Int-1 haplotypes (A. coluzzii-M1 and A. gambiae-S1, separated by a single C-T mutational 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9  6 8 9 0 0 1 1 1 2 2 3 5 5 6 7 7 8 8 0 1 2 4 4 5 5 7 7 7 7 7 8 9 9 9 9 9 9 9 9 9 0 0 0 0 1 2 3 5 6 6 6 7 8 0 1 1 2 2 2 3 3 3 5 6 7 8 8 9 9 4 4 6  5 6 1 3 9 0 3 6 4 5 0 3 5 5 2 3 2 3 8 0 5 6 9 3 4 3 6 7 8 9 4 1 2 3 4 5 6 7 8 9 0 1 2 3 0 5 4 0 2 6 9 5 3 4 1 4 4 7 8 0 5 9 5 9 1 1 6 2 6 0 3 8 Figure 2 Parsimony-based networks of genealogical relationships among VGSC Intron-1 haplotypes in the Anopheles gambiae complex. a) Network built with original data collected from all studied species of the A. gambiae complex, with the exception of Anopheles melas and Anopheles merus, whose haplotypes exceeded the 95% threshold of TCS connection limit. Haplotypes are represented by pies whose sizes are proportional to frequencies in the sample and coloured as follows: blue = Anopheles arabiensis (AR), red = Anopheles quadriannulatus (QD), violet = Anopheles coluzzii, green = Anopheles gambiae (haplotypes are named and numbered according to Gentile et al. [23]), yellow = private haplotypes from either Guinea Bissau (GU), The Gambia (GA), or from both Countries (GUGA); sequential codes are used to name A. coluzzii (i.e. M7, M8) and A. gambiae (S7-S10) novel haplotypes not exclusive to The Gambia and Guinea Bissau; M1, M3, M5, S1 and S7, which are not completely segregated between the two species in the Far-West and/or Rwanda, are shaded. Below: VGSC Int-1 network for M (A. coluzzii) and S (A. gambiae) molecular forms as in Gentile et al. [23]; b) and c) networks only including A. coluzzii, A. gambiae and hybrids haplotypes from The Gambia and Guinea Bissau, respectively. White squares report numbers of alleles for A. coluzzii (M), A. gambiae (S) and hybrids (MS) (identified based on SINE-PCR [26]) included in each haplotype. step at site Int-1 702 ) and species-specific rare variants stemming from these. To explain this low polymorphism, a selective sweep centered on a favourable variant in a nearby gene was suggested [22,23]. Since the VGSC gene includes mutations conferring kdr resistance to insecticides, a possible hypothesis is that Int-1 haplotypes are in linkage with these strongly selected alleles. Introgression of kdr alleles and adjacent genomic regions has repeatedly been reported from different geographic areas [19,39]. Indeed, whole genome sequence data have recently shown that over 3 Mb has introgressed from A. gambiae to A. coluzzii in Ghana along with the kdr L1014F mutation [40]. It is possible that selection on kdr-associated Int-1 haplotypes may have had a role in reducing diversity, as kdr-resistance has been reported in some of the A. gambiae analysed populations particularly from West Africa [41,42] [but not in the Far-West region, Pinto et al., unpublished observations; see discussion below]. However, the close physical proximity of kdr and Int-1 would make recombination between resistant and susceptible kdr alleles and their linked Int-1 polymorphisms highly unlikely over a short timescale. Moreover, in A. coluzzii, as the samples analyzed came from populations where kdr-alleles were either absent or present at moderate to low frequencies (i.e. populations from Benin, 40.0%, Nigeria, 19.5% and Cameroon, 6.3% [42]), and are  thus likely to have introgressed very recently [19,22,39]. Recent genomic studies have given additional hints for understanding the reduction in genetic variation at Int-1 and its linkage disequilibrium with markers on the physically unlinked X-centromeric region defining the two species. In fact, as already mentioned, the VGSC gene is located within the chromosome-2 "genomic island" of highest divergence between A. coluzzii and A. gambiae. Under the "speciation island" [10] scenario, it can be hypothesized that a hitchhiking effect on Int-1 has occurred due to diversifying selection on a chromosome-2 "island" gene participating in the building-up of premating barriers, or conferring differential ecological adaptation and niche segregation between A. gambiae and A. coluzzii. Alternatively, under the "incidental island" scenario [2,13,16], substitutions at Int-1 702 may have become fixed after species splitting and accumulated little genetic differentiation due to reduced recombination in the chromosome-2 centromeric region. Interestingly, however, the association between chromosome-X and −2 "islands" is neither observed in Rwanda (Additional file 2: Table S2) nor in Tanzania [41]. In these East African sites, both M1 and S1 haplotypes were found segregating in A. gambiae populations. Thus, if Int-1 C represents the ancestral allele in the A. gambiae complex (Figure 1), then Int-1 C/T may be considered an ancestral polymorphism retained in A. gambiae populations from East Africa (where A. coluzzii is absent), which became fixed in westward sympatric areas after the splitting of the two species.

M1 T C T G G G A A A A A A T A C T C T G C T A A T A T ---G T G C C A A C G C A A T C C A C T G A T C G C C A A T C G G A G C T T C T T C C C
In the Far-West region, a strong reduction of interspecific genetic divergence between A. gambiae and A. coluzzii is found -as indicated by the lower Fst observed in this region (0.14) as opposed to the rest of the range (0.51-0.69) (Figure 4) -and a preferential introgression of M1-related Int-1 haplotypes ("typical" of A. coluzzii) into A. gambiae is observed. These data are consistent with previous studies showing weak association between chromosome-X and −2 centromeric regions and occurrence of asymmetric introgression from A. coluzzii into A. gambiae in the westernmost extreme of their range [17][18][19][20][21]. Furthermore, 15 exclusive Far-West haplotypes were inferred through PHASE [32] and found interspersed and connected to M1 and S1 geographically widespread variants and to the Far-Westspecific M3 (Figure 2). The presence of such private haplotypes might indicate that selective pressures on the chromosome-2 centromere observed in Central African populations (Table 1) are relaxed in the Far-West. Note that, although recombination along the centromeric 500-bp Int-1 fragment analyzed would normally be considered minimal, some reduction in accuracy might occur when reconstructing haplotypes using the PHASE algorithm in the Far-West region, where LD along the 2 L-centromere is known to be lower than in the rest of the species range [19]. However, the PHASE results are supported by summary statistics (Table 1) also indicating extreme Int-1 diversity and recent introgression events in Far-West populations of both species.
There are contrasting possible explanations for the remarkable Int-1 polymorphism observed in the Far-West region. Under the 'speciation island' hypothesis [10] relaxation of diversifying selection on a key isolating trait on chromosome-2 centromeric "island" (of which Int-1 is a part) may have contributed to weaken pre-mating barriers between A. gambiae and A. coluzzii and promote a higher rate of gene-flow in the Far-West region. This hypothesis, however, is in contrast with data from other West and Central African areas, where introgression from A. gambiae to A. coluzzii of a kdr-related genomic portion in linkage with Int-1 does not produce an increase in hybridization rates [21]. Alternatively, the genomic region linked to Int-1 may be not related to speciation [2,16] and the observed pattern in the Far-West region could be attributed to a relaxation of purifying selection operating separately within each species on adaptive genetic traits not directly (or only weakly) involved in reproductive isolation. Hence, following this hypothesis, increased Int-1 polymorphism in the "Far-West" region might be the consequence of an increased recombination rate within the 2 L-centromeric "island" (and Int-1) following disruption of linked (background) selection. Resolution of these competing hypotheses requires assessment of the role of hybridization on the extent of linkage disruption throughout the 2 L-"island" and understanding of whether and how this might affect association with traits critical to speciation.
Finally, the frequency and distribution of Int-1 haplotypes within A. coluzzii across its range provides some hints on further intra-specific geographical patterns ( Figure 3). In fact, populations from the West and Far-West regions are characterized by the exclusive presence of haplotype M5, not observed in those from Central Africa. This is consistent with results obtained by other nuclear markers (e.g. microsatellites) showing a macrogeographic subdivision into two distinct West and Central African genetic clusters, corresponding to the forestsavannah biome transition, which may have acted as an ecological barrier to gene flow [26,43,44]. Moreover, the high frequency of the Far-West exclusive M3-haplotype (separated from the major and widespread M1-haplotype by 5 mutational steps) allows speculation that a founder effect (followed by either selection or drift) affected A. coluzzii populations colonizing this region in the past. This last point merits further investigation through a multi-locus approach at a wider genome scale to shed light on the genetic characteristics of source populations originating the A. gambiae/A. coluzzii hybrid zone.