Lineage divergence detected in the malaria vector Anopheles marajoara (Diptera: Culicidae) in Amazonian Brazil

Background Cryptic species complexes are common among anophelines. Previous phylogenetic analysis based on the complete mtDNA COI gene sequences detected paraphyly in the Neotropical malaria vector Anopheles marajoara. The "Folmer region" detects a single taxon using a 3% divergence threshold. Methods To test the paraphyletic hypothesis and examine the utility of the Folmer region, genealogical trees based on a concatenated (white + 3' COI sequences) dataset and pairwise differentiation of COI fragments were examined. The population structure and demographic history were based on partial COI sequences for 294 individuals from 14 localities in Amazonian Brazil. 109 individuals from 12 localities were sequenced for the nDNA white gene, and 57 individuals from 11 localities were sequenced for the ribosomal DNA (rDNA) internal transcribed spacer 2 (ITS2). Results Distinct A. marajoara lineages were detected by combined genealogical analysis and were also supported among COI haplotypes using a median joining network and AMOVA, with time since divergence during the Pleistocene (<100,000 ya). COI sequences at the 3' end were more variable, demonstrating significant pairwise differentiation (3.82%) compared to the more moderate 2.92% detected by the Folmer region. Lineage 1 was present in all localities, whereas lineage 2 was restricted mainly to the west. Mismatch distributions for both lineages were bimodal, likely due to multiple colonization events and spatial expansion (~798 - 81,045 ya). There appears to be gene flow within, not between lineages, and a partial barrier was detected near Rio Jari in Amapá state, separating western and eastern populations. In contrast, both nDNA data sets (white gene sequences with or without the retention of the 4th intron, and ITS2 sequences and length) detected a single A. marajoara lineage. Conclusions Strong support for combined data with significant differentiation detected in the COI and absent in the nDNA suggest that the divergence is recent, and detectable only by the faster evolving mtDNA. A within subgenus threshold of >2% may be more appropriate among sister taxa in cryptic anopheline complexes than the standard 3%. Differences in demographic history and climatic changes may have contributed to mtDNA lineage divergence in A. marajoara.


Background
About 1.8 million species are known on Earth, including more than 1 million insects, 250,000 higher plants and 69,000 fungi [1]. The Amazon comprises much of this biodiversity and is considered the largest gene reserve in the world, with an estimated 14% of all plant and animal species within its boundaries [2]. Because speciation is not always accompanied by morphological change [3], the true number of biological species is likely to be greater than current estimates [4]. Genetic analysis plays an increasingly important role in identifying changes in population structure and elucidating taxonomic status and phylogenetic relationships.
Various methods have been used to investigate species delimitation and identifications in the Albitarsis Complex [11,20,21], culminating, most recently, in the recognition of six species [22], but see Bourke et al [17]. Genealogical analyses of complete mtDNA COI (Cytochrome oxidase I) sequences found that A. marajoara is paraphyletic, and may consist of at least two phylogenetic species [10] or lineages [3,23], one of which may be A. janconnae.
The mitochondrial genome has been used extensively in studies of molecular evolution [24] and the COI gene has resolved evolutionary relationships among closely related species for a wide range of taxa [25,26], including insects [27,28] and cryptic species complexes [29,30]. The Folmer region, 648-bp at the 5' end of the COI mitochondrial gene, has emerged as the standard barcode region [31,32]. Interspecific divergence within insects almost always exceeds 3% and this value has been used as a speciation threshold [33]. The true test of DNA barcode precision would include comparisons with sister species [34]. The utility of DNA barcoding among insects is still being debated, because of success in revealing cryptic species [31,35,36] on one hand, and the inability to reliably detect species boundaries on the other [36][37][38]. An estimated 20% failure rate has been noted at the species level due to non-monophyly [39], which increases among insects due to overlapping ranges of intra-and interspecific sequence divergences [40]. Together these findings suggest that the threshold level could be set lower than 3% to minimize false negatives [36].
Accurate morphological identification of the adult females of species within the Albitarsis Complex is virtually impossible, and the ITS2 has become a recognized molecular tool for identification [21,41,42]. ITS2 variation is low within a species due to homogenization and fixation while the overall fragment length is generally variable between species [20,[43][44][45]. As such it can usually resolve phylogenetic relationships at different taxonomic levels, and detect recently diverged taxa such as sibling species of mosquitoes [46].
Members of a species are rarely distributed homogeneously in space, and population subdivision can occur in response to geographical boundaries, social behaviour and genetic variation [47]. Patterns in biological sequence data that arise from ancestry can be useful in determining the structure and boundaries of a given species [48]. The objective of this study was to test the hypothesis of paraphyly [10] using a combined data set (white gene + COI) and to evaluate the population structure of A. marajoara to address the following questions: 1) is the proposed paraphyletic status supported; 2) what is the level of genetic differentiation between populations; 3) can lineages of A. marajoara be distinguished by the Barcode of Life (BOLD) 3% species threshold; and 4) can genetic differentiation be explained by demographic phenomena, geographic boundaries and (or) natural selection.

Mosquito collection
Adult female mosquitoes from seven localities spanning roughly 890 kilometers of a transect along the Amazon River were collected using Shannon traps adjacent to breeding habitats between Amazonas state near Urucara along the Amazon river and the tributary of Rio Paru entering the Amazon river. They were also collected from Itaituba, south of the Amazon River, Pará State, in 2005 (collection protocol approved by the New York State Department of Health IRB and Brazilian Instituto Evandro Chagas, Belém, Pará state Ethical Committee). Mosquitoes were identified morphologically using the key of Deane et al. [49] as A. albitarsis s.l. Previously extracted specimens collected between 1995 and 2001, using a human landing catch protocol approved by the University of Vermont and the Brazilian Instituto Evandro Chagas, Belém, Pará state Ethical Committee, from seven localities in northeastern Pará and Amapá states of Brazil and from Itaituba, south of the Amazon River in Pará State, were also included [50]. Genomic DNA was extracted using the Puregene DNA isolation kit (Gentra Systems) and maintained at Griffin Laboratory at -80°C. Ten to 27 mosquitoes per site were selected for DNA amplification and sequencing of the COI (Table 1). A subset of four to 15 samples for the white gene and four to seven samples for ITS2 were amplified and sequenced (Table 1).

Amplification and sequencing
A 1200-bp fragment of the COI gene was amplified using the forward primer UEA3 and the reverse primer UEA10 [28]. An 822-bp fragment from the 3' end using PCR primers: 2195D (5'-TGATTYTTTGGTCATCCNGAAGT-3'; a modification of C1-J-2195 for amplification in Collembola [51]) and FLY10A (5'-AATGCACTAATCTGCCA TATTAG-3'; a modification of TL2-N-3014 for amplification in Simuliid flies [52]) was previously amplified for the northeastern collection samples [50]. Individual PCR reactions were preformed using Ready-To-Go-PCR bead (Amersham Pharmacia/Biotech, NJ, USA) and run on a PTC 100 or 200 series thermal cycler (Biorad, Inc.), or a PTC-100 (MJ Research, Inc.), using the conditions stipulated in Mirabello and Conn [53]. The Applied Genomics Technology Core (Wadsworth Center) carried out the sequencing. The forward and reverse COI sequences were aligned using Sequencher 3.0 (Gene Codes Corps, MI, USA), grouped together by sight and trimmed in PAUP*, version 4.0 [54]. The complete overlap of both COI primer sets created a 488-bp fragment (Additional file 1).
An 800-bp fragment of the white gene was amplified using the W2R and WF primers, with the PCR conditions as reported in Mirabello and Conn [55]. A 500-bp fragment of the ribosomal ITS2 was amplified using the primers 18S and 28S with the parameters in Li and Wilkerson [20]. The PCR products were cleaned, sequenced and aligned creating a 496-bp fragment of the white gene and 361-bp of the ITS2 with complete forward and reverse overlap for each marker. Unique sequences for all markers are available in GenBank [GenBank: HQ026025-HQ026113].

Phylogenetic relationship
All A. albitarsis s.l. sequences were identified to species by multiple gene ( Genealogical trees were estimated using the concatenated (white + COI) data set. ITS2 sequences were excluded from these analyses because of relatively limited sample size. Maximum Parsimony trees were generated in PAUP*, with one hundred replicates of a heuristic search performed with an initial random stepwise addition of sequences and TBR branch swapping. Branch support was estimated from 1,000 replicates of a bootstrap search. Bayesian inference (BI) analysis was performed with Mr. Bayes version 3.1 [56,57], partitioned by gene, using the model of nucleotide substitution (TPM3uf+G and TIM1+I) that best fit the white gene and COI respectively, determined with jModelTest [58,59]. The settings were two simultaneous, independent runs of the Markov Chain Monte Carlo (MCMC) for 4 million generations, sampling every 1,000 generations with a 'burnin' of 25%. The outgroup A. albimanus in the Albimanus Section of Nyssorhynchus (unpublished sequence) was chosen based upon its phylogenetic position in Sallum et al [60]. Estimates of time to coalescence were calculated for the COI fragment only and compared using θ S values [61] and BEAST [62].

Genetic variation
Genetic structure of two lineages was examined by analysis of molecular variance (AMOVA), a method of estimating population variance directly from molecular data, using Arlequin version 3.1.1 [63]. In addition, spatial analysis of molecular variance (SAMOVA), version 1.0 [64] was used to cluster the 488-bp COI sequence data into genetically and geographically homogeneous populations. SAMOVA generates F statistics (F SC , F ST , F CT ) using the AMOVA approach, into K groups to maximize the between group variation. SAMOVA estimates were computed for K = 2-13 with 1,000 simulated annealing steps from each of 100 sets of initial starting conditions.
A comparison of the pairwise divergences calculated by DnaSP between COI lineages using the 648-bp 5' end and the 488-bp 3' end tested the utility of the Folmer region and the 3% species threshold using 10 to 13 individuals from each lineage, three closely related taxa and an outgroup (Anopheles darlingi). Anopheles darlingi is more appropriate than A. albimanus for sister taxa comparisons as both it, and A. marajoara, are in the Argyritarsis Section [65]. A species screening threshold (SST) [66] may be more appropriate and can be calculated as 10× the mean intraspecific variation for a group [67]. The standard sequence threshold was calculated and examined for Nyssorhynchus using 4-58 archived Gen-Bank sequences for known species.

Population structure and demographic history
A statistical parsimony network estimated genealogical relationships among COI haplotypes with a 93% sequence identity and 95% identity for each of the nuclear markers using TCS 1.13 [68]. Homoplasy in all networks was resolved using the algorithm estimation rules in Crandall and Templeton [69]. The differentiation and polymorphism statistics for COI sequences by species or lineage and locality were computed in DnaSP, Version 4.0 [70], and the hypothesis of strict neutrality was examined using the statistics D T [71], D and F [72], and R 2 [73] which are based on the frequencies of segregating sites, and Fu's F S [74], based on the haplotype distribution. Tajima's D and both Fu and Li's D* and F* are the most effective tests to detect background selection, whereas Fu's F S and R 2 are among the most powerful tests to detect population expansion [73]. All neutrality tests were calculated using DnaSP, Version 4.0 or MEGA version 3.1 [75].
The mismatch distribution (simulated in Arlequin) is a frequency distribution of the observed number of pairwise nucleotide differences. The shape of the distribution is highly informative and able to differentiate between a population expansion and equilibrium [76], whereas the smoothness (raggedness statistic) distinguishes the fit of the empirical data to the model [77]. Statistically significant differences between observed and simulated distributions were evaluated with the sum of square deviations (SSD) to reject the hypothesis of demographic expansion [78].

Phylogenetic relationship
The estimated MP and BI trees, each with 42 parsimoniously informative sites, indicated two distinct lineages of A. marajoara with varying levels of support ( Figure 1). The more derived lineage 1 remains ambiguous (only moderate support, 0.66 and 52%); however it does have the broadest distribution and contains samples from near (~56 km) the type locality on Marajó Island, Pará state, Brazil [79]. Lineage 2, in contrast, is more restricted geographically and has higher support (1.00 and 89%, Figure 1). The relatively low BI branch support (0.66) for A. marajoara lineage 1 coupled with the moderate support among some of the subdivisions suggests this lineage is paraphyletic and that haplotypes that would otherwise increase lineage support are missing (extinct or not sampled). Within both lineages, smaller subdivisions, not related to geography, were apparent between the haplotypes that are separated by seven mutations in lineage 1 (between haplotypes 11 and 13) and 10 mutations in lineage 2 (haplotypes 48 and 49), illustrated in the network ( Figure 2).
The estimated time to coalescence for the two COI lineages was obtained using the equations, θ s = 2Neμ, [29] and 4N e = years since coalescence [80]. Based upon the 488-bp fragment, θ s is 7.530 (SD = 1.857). Therefore, the estimate of the time to coalescence is 232,476 -384,716 years ago, in the Pleistocene. A similar estimate using BEAST, the standard arthropod mtDNA mutation rate of 2.3% per million years [81], an applied HKY model with gamma distribution assuming constant size and a relaxed molecular clock, yielded an estimate of 3.07 mya. The discrepancy between coalescent estimates is likely due to the 10 generations per year that is factored into the Walton equation [29]. Although the incorporation of fossil calibration points can generate  Table 1). more reliable divergence estimates [82], no data were available because the mosquito fossil record is too recent and rare [83].

Genetic variation
The between lineage variance for the Folmer region of the two lineages of A. marajoara (2.94%) is marginally below the 3% BOLD threshold. However, it is comparable to among species comparisons within the Albitarsis Complex (e.g., A. albitarsis s.s. vs. A. oryzalimnetes, 2.64%; Additional file 2). The 3' end of the COI (above diagonal, Additional file 2) is less conserved, resulting in higher estimates of divergence. Among closely related Nyssorhynchus spp. (Table 2), the average intraspecific variation is 0.0116, and suggests that a SST of 11.6% might suffice within this subgenus. However, this estimate is limited as only 7 out of 32 currently recognized species [65] were available. Intraspecific nucleotide differences among Nyssorhynchus species were consistently less than 2%, indicating that the higher estimate (0.021) of combined A. marajoara lineages likely represents an overlooked species complex. AMOVA indicated that 61.94% (p = 0.0000) of the variance was explained by between-group variation of A. marajoara lineages 1 and 2. SAMOVA analysis, providing resolution for lineage 1 only, defined two groups that correspond to northeastern and western Amazonia, with 61.36% regional variation (Figure 3). Several COI haplotypes were found in both the northeastern and western populations of lineage 1 indicating at least some gene flow across the geographic barrier. High variation, especially between lineages 1 and 2, and between geographic populations in both COI and white genes, is strongly supported by population differentiation statistics, although the G ST values were not significantly different (Table 3). Furthermore, the K t values were much greater for the COI gene (11.9 and 6.96), compared with the white gene (1.0).

Population structure and demographic history
Analysis of mitochondrial data from 294 samples among seven riverine and seven non-riverine localities in Amazonas, Pará, and Amapá states, Brazil (Figure 3) led to the discovery of two co-occuring, but distinct A. marajoara lineages that are unable to be connected using statistical parsimony (Figure 2). Therefore, a median joining network that uses alternate algorithms to remove homoplasy was conducted and indicated that a minimum of 13 mutational steps separate the lineages, shown in Figure 2. Seven fixed mutations were estimated between lineages based on DnaSP. Additionally, A. oryzalimenetes was identified from Itaituba and from Macapá, a new record (Figure 3).
COI haplotypes were detected (Figure 2). Of the A. marajoara haplotypes, 26 (44.8%) were shared among localities, and 32 (55.2%) were unique (Additional file 3). The most common haplotypes were 1 (n = 16), 2 (n = 26), 3 (n = 20) and 5 (n = 78) of A. marajoara lineage 1, and haplotypes 43 (n = 9) and 46 (n = 9), 50 (n = 32), and 56 (n = 11) of A. marajoara lineage 2 ( Figure  2). There was a relatively high proportion of singletons (32, 55.17%) in the A. marajoara lineages combined, with 24/42 for lineage 1, and 9/16 for lineage 2. Overall, both A. marajoara lineages were indicative of a category II pattern [26] characterized by pronounced genetic gaps between some branches and the co-distribution of principle lineages over a wide area, which could theoretically arise in a species with large evolutionary N e (effective population size) and high gene flow. Lineage 1 contains a star shaped node surrounding haplotype 5, with short branches and an excess of singleton mutations, predominantly from the northeastern localities, which suggests a demographic expansion, background selection or selective sweep [74,84]. In contrast, lineage 2 indicates balancing selection with its longer branches, missing haplotypes and near equal distribution of shared haplotypes and single mutations, consistent with a signal of an older lineage. It could be argued that haplotype 50 (lineage 2, Figure 2) and the 5 singletons that arise from it also constitute a star-shaped node, which suggest recent expansion. Lineage 2 appears to be restricted to the westernmost localities. Statistical parsimony networks of nuclear data (white gene and ITS2) containing comparative numbers of specimens of both COI-defined lineages identified a single A. marajoara lineage ( Figure 4A, B). Haplotype clusters ( Figure 4A) correspond primarily to northeastern and western populations with a few outliers including haplotype L and two individuals in haplotype A. Both lineages retained the 4 th intron in the white gene, which is absent from all other Albitarsis Complex members [11]. Additionally, the ITS2 length was identical (361-bp) in all sequences, independent of the lineage. Copy number in microsatellite regions at positions 118 (GT) and 273 (GA) and the 2-bp indel at position 271 were not congruent with either lineage or geography, but consistent with findings by Li and Wilkerson [20,42]. Genetic polymorphism analyses of the COI sequences indicated that haplotype diversity was greater among lineage 1 populations, whereas lineage 2 localities exhibited slightly higher nucleotide diversity (Additional file 3). In lineage 1, there was greater diversity among the western localities (1-8), compared to northeastern Amazon (9)(10)(11)(12)(13)(14). The white gene exhibited lower nucleotide diversities overall compared to COI, and some differences in diversity between populations, particularly in Tartarugalzinho (locality 12), where haplotypes D and T are present, separated by 11 mutation steps (Additional file 4). Moderate nucleotide diversity in the COI data coupled with the single A. marajoara lineage detected with the white gene may reflect ancestral polymorphism as a result of co-occurence and haplotype mixing [85].
The high COI haplotype diversity combined with relatively low nucleotide diversity in both lineages (Additional file 3) suggests a population bottleneck followed by an expansion. The negative values for three neutrality tests in lineage 1 (Table 4) may reflect an excess of rare polymorphisms consistent with either positive selection or an increase in population size, although only one of these tests (F s ) was significant. In contrast, the positive values in lineage 2 indicate an excess of intermediatefrequency alleles in a population, which can result from either balancing selection or population bottlenecks [86]. Only the northeastern subdivision of lineage 1 revealed strong support for a population expansion event. The comparison between the northeastern and western populations within lineage 1 supports the SAMOVA findings of a geographic barrier, depicted in Figure 3.
The mismatch distribution model for sudden expansion was marginally significant for both lineages. Both exhibited a bi-modal distribution ( Figure 5) with a complete separation and significant raggedness values, suggesting constant population size [87]. However, an alternative interpretation would be that there were two expansions [88] dated to the Pleistocene (lineages 1 and 2, 142,203 (3,678-220,163) and 114,713 (8,545-230,635) ybp, respectively). The lineage 1 northeastern population ( Figure 2) expanded more recently, approximately 5,000 years ago, during the Holocene.

Discussion
The combined sequence results support divergence in A. marajoara depicting separate lineages, distinct from A. janconnae. The more ancestral lineage 2 appears monophyletic, whereas moderate substructure within lineage 1 suggests paraphyly. Nuclear markers, however, consistently depict a single A. marajoara lineage.
Discrepancies between mtDNA and nDNA are not enough to refute speciation, for example the newly recognized species A. janconnae and A. marajoara form a single lineage using the protein coding nuclear white gene and ITS2 sequences [20,21], but complete COI sequences [10] indicate monophyly of each taxon, and morphometrics analysis depicts A. janconnae as a separate taxon in the complex [22]. Nuclear mutation rates are generally slower than mitochondrial ones [89], but see Hellberg [90], thus the lack of divergence detected in the nuclear sequences in A. marajoara in the present study suggests either the differentiation is restricted to the mitochondrial genome, or it is recent and not yet visible in the nuclear genome. Given that lineage 1 has a broader distribution and includes samples from near the type locality, lineage 1 would be more appropriately named A. marajoara s.s. The status of lineage 2 is not yet resolved; it may be a new cryptic species belonging to the complex as suggested by Wilkerson et al. (2005) from Manaus, Brazil [41] and confirmed by JF Ruiz (personal communication) using the COI barcode region of the Albitarsis subgroup across South America.
Genetic variation between lineages was explained by 61.94% between group differences. Pairwise divergence estimates based on the 488-bp fragment confirmed 3.82% divergence between lineages (Table 3), which is comparable to the values observed among well-known Albitarsis Complex species (for example, 3.52% between A. albitarsis s.s and A. oryzalimnetes). These data may support cryptic or incipient speciation [91]. The effectiveness of the Folmer region and the BOLD threshold of 3% are questionable for at least some species of Nyssorhynchus because there is only moderate support between known species A. oryzalminetes and A. albitarsis s.s. If the average intraspecific variation of the Folmer region for the subgenus (11.6%) is used as a species threshold, the pairwise divergences of all sister taxa (including A. darlingi and A. albimanus) do not exceed this standard. Such a high SST is likely to result in false negatives even among recognized species, therefore separate intraspecific values and interspecific divergences were examined for resolution, where intraspecific COI variation was <2% and interspecific variation was between 6-10%, comparable to other anopheline species differences [ [92,93], unpublished data].
Pleistocene divergence in mtDNA has also been detected for a wide range of insect taxa across Central and South America, including L. longipalpis [94], H. erato [81], R. prolixus [95], and mosquitoes A. darlingi [53] and A. albimanus [85] suggesting climatic changes may be a common force driving Neotropical speciation. Pleistocene divergence coupled with paleoclimate records of the Miocene describing the world as warmer and exhibiting greater humidity and precipitation [96]; Table 3 Inter-and intra-population differentiation of A. marajoara lineages and populations on either side of the SAMOVA generated boundary. [97] support the more ancestral depiction of lineage 2. Pleistocene climatic changes, including lower temperature, precipitation and carbon dioxide levels than those exhibited today, may have contributed to the divergence of lineage 1 [98,99]. Lineages co-occur along the Amazon River with no obvious species diversity gradients, although data indicate several "high diversity localities" including Santarém, which is a recognized hotspot. However, this may be an artefact resulting from the relative logistical ease  white gene sequences and their corresponding haplotypes and B) 53 ITS2 sequences. There was no distinction between lineages; grey haplotypes are all from western localities (1-8 Figure 1; Table 1) and light-colored haplotypes are all from northeastern localities (9)(10)(11)(12)(13)(14).
of access to these localities and the many studies that have been conducted there [100]. The geographic barrier detected within A. marajoara lineage 1 is in the same region as the mtDNA divisions between northeastern and central western Amazon populations in A. darlingi [101], A. nuneztovari [102] and Atta species ants [103]. With similar distributions and geographic boundaries noted in both mosquitoes and ants, it seems unlikely that the boundary is the result of dispersal abilities. The detection of shared haplotypes in the present study on both sides of the barrier indicates a permeable boundary, suggesting the more geographically restricted lineage 2 may be an artefact of incomplete sampling, or perhaps a result of climate variation.
Climate, particularly rainfall, is a strong descriptor of broad-scale species-richness patterns in the tropics [104]. Vasconcelos et al [105] noted a restricted geographic distribution of ants in the western part of the basin, which to a great extent reflected a west to east gradient of decreasing rainfall. Additionally, Sombroek [106] explained an intricate pattern of continuous rainfall in the western interior of the Amazon, in contrast to the coasts, with a pronounced dry season containing a dry belt or corridor occurring near Rio Jari, Amapá state (near the SAMOVA barrier for A. marajoara lineage 1). Miocene flooding [103] and subsequent Pleistocene climatic events may explain the porous nature of the barrier.
Recent deforestation and habitat fragmentation resulting from the 1967 initiation of the single largest forest plantation in Latin America, an agro-forestry venture known as the Jari project [107] in northern Brazil could have contributed to an increase in A. marajoara abundance in this region. Clearing of forest, in combination with an increase in human and domestic animal host abundance, may have led to an A. marajoara population increase in Macapá, the capital of Amapá -state, in northeastern Amazonian Brazil [15].
Among the riverine localities, the hypothesis of sudden population expansion could not be rejected, although the mismatch distributions were smooth and bimodal for both lineages. The reduced genetic variation of A. marajoara lineage 1 in the northeastern Amazon region coupled with the unimodal distribution and the star-like pattern of the haplotype network are consistent with a rapid population expansion in the northeast. A. marajoara tends to breed in marshy sunlit pools and may have expanded into the northeast where savannah and agricultural habitats would be favorable. The population expansion occurred during the Holocene, possibly indicating a recent colonization. This explanation is supported by the expansion of savannah during periods of fire-associated drought and extinction-recolonization of rainforest tree populations in lowland areas during the Holocene [108][109][110]. Moderate gene flow in lineage 1 may hamper local vector control in Amazonas, Pará and Amapá states because of potential reinvasion and the spread of insecticide resistant genes [111].

Conclusions
Anopheles marajoara is an important vector in the savannah areas of South America [13], and with some of the highest incidences of malaria occurring in the states of Pará and Amapá [14,15,112] it is likely that A. marajoara will continue to increase its regional importance in malaria transmission. Additional studies will be needed to determine whether the subdivision resulting in the two co-occurring lineages in A. marajoara is strictly the product of genetic variation and evolutionary processes impacted by geographical barriers, or variations in ecology and behaviour that may have lead to niche divergences and microallopatry [113][114][115][116]. If both lineages are implicated as vectors, the characteristics of their breeding sites, their behaviours and their migration history could be used to predict changes in malaria transmission patterns in the Amazon basin [15] and other endemic regions, and provide useful information for targeted vector control strategies.

Additional material
Additional file 1: Complete COI with fragment orientation and overlap. Representative schematic of the COI gene, Folmer region and fragments; light blue bar indicating the region amplified by primers UEA3 and UEA10; the purple bar depicting fragment previously amplified from primers 2195D and C1-J-2195. *, denotes the piece of the COI that was used for the phylogeography and population structure analysis.
Additional file 2: D XY with Jukes Cantor pairwise divergence between species and lineages based upon the 648-bp Folmer region (below) and the 488-bp COI fragment (above).
Additional file 3: Description of shared COI haplotypes and genetic polymorphism statistics for A. marajoara lineages and A. oryzalimnetes. N, the number of sequences; 1-Σf i 2 , haplotype diversity; Π, nucleotide diversity; SD, standard deviation; P, polymorphic sites; and K, average number of differences. Underlining indicates shared haplotypes and bold indicates diversity among lineages or species.
Additional file 4: Description of shared white haplotypes and genetic polymorphism statistics for A. marajoara s.l.