A microarray-based system for the simultaneous analysis of single nucleotide polymorphisms in human genes involved in the metabolism of anti-malarial drugs

Background In order to provide a cost-effective tool to analyse pharmacogenetic markers in malaria treatment, DNA microarray technology was compared with sequencing of polymerase chain reaction (PCR) fragments to detect single nucleotide polymorphisms (SNPs) in a larger number of samples. Methods The microarray was developed to affordably generate SNP data of genes encoding the human cytochrome P450 enzyme family (CYP) and N-acetyltransferase-2 (NAT2) involved in anti-malarial drug metabolism and with known polymorphisms, i.e. CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, and NAT2. Results For some SNPs, i.e. CYP2A6*2, CYP2B6*5, CYP2C8*3, CYP2C9*3/*5, CYP2C19*3, CYP2D6*4 and NAT2*6/*7/*14, agreement between both techniques ranged from substantial to almost perfect (kappa index between 0.61 and 1.00), whilst for other SNPs a large variability from slight to substantial agreement (kappa index between 0.39 and 1.00) was found, e.g. CYP2D6*17 (2850C>T), CYP3A4*1B and CYP3A5*3. Conclusion The major limitation of the microarray technology for this purpose was its lack of robustness, reflected in a large number of missing data points or in insufficient specificity.


Background
Drug action depends on how drugs are metabolized, and differences in the activity of metabolizing enzymes can significantly contribute to the efficacy of drugs [1,2]. This might also be true for drugs given to treat malaria. The objective was to analyse single nucleotide polymorphisms (SNPs) in genes encoding enzymes implicated in metabolizing anti-malarial drugs in order to determine the contribution of these enzymes to the pharmacokinetics of the specific drugs. There are many methods available for the detection of SNPs (for review see [3,4]). These methods are based either on allele-specific hybridization or on a primer extension reaction. Many of these techniques are time-consuming, expensive and/or not suitable for use in resource-poor countries. Previously, a cost-effective DNA microarray-based technique [5] to detect SNPs associated with drug resistance in malaria parasites in a larger number of samples was developed and successfully used. To test whether this system could also be used for SNP determination in metabolizing enzyme genes, microarray-determined SNPs were compared with sequencing. For this, a microarray was developed to affordably generate SNP data on genes encoding the human cytochrome P450 enzyme family (CYP) and N-acetyltransferase-2 (NAT2) involved in anti-malarial drug metabolism. The performance of this microarray had to be determined subsequently. The microarray was designed to analyse enzyme genes with known polymorphisms such as CYP2A6 and CYP2B6 for the artemisinins, CYP2C8 for amodiaquine, chloroquine and dapsone, CYP2C9 and NAT2 for dapsone and sulphamethoxazole, CYP2C19 for dapsone and proguanil, CYP2D6 for chloroquine and halofantrine, CYP3A4 for the artemisinins, chloroquine, dapsone, halofantrine, lumefantrine, mefloquine, primaquine and quinine, and CYP3A5 for artemether, β-arteether, chloroquine, mefloquine, quinine and sulphadoxine.
For certain anti-malarial drugs (piperaquine, pyrimethamine and pyronaridine) the metabolic pathway is not yet well known, while others (atovaquone and doxycycline) are barely metabolized at all [30][31][32][33][34][35].

Study population
During an in vivo drug efficacy study in patients of all ages with uncomplicated malaria, venous blood samples (anticoagulated using EDTA) were obtained after informed consent from 125 patients in Northern and Western Cambodia (64 in 2007 at Phnom Dék Health Centre, Rovieng district, Preah Vihear province, and 61 in 2008 at Pramoy Health Centre, Veal Veng district, Pursat province) and 149 patients in Central Tanzania (in 2008 at Kibaoni Health Centre, Kilombero district, Morogoro region).

Sequencing
Genomic DNA was extracted from 200 μl whole blood using the QIAamp 96 DNA Blood Kit (QIAGEN GmbH, Germany) according to the manufacturer's instructions.
Target sequences in cyp and nat2 genes of these samples were amplified using PCR. The PCR primers used are listed in Table 1. The amplified regions contained SNPs which are known to alter the function of enzymes involved in the metabolism of anti-malarial drugs (for target loci and effect of the SNP see Table 1). The PCR master mix contained 1 × reaction buffer B (Solis BioDyne, Estonia), 1 × solution S (Solis BioDyne, Estonia), 10 μM of each primer in 1 × Tris-EDTA (see Table 1, Operon Biotechnologies GmbH, Germany), MgCl2 (according to Table 1, Solis BioDyne, Estonia), 2 mM of each dNTP in Tris-HCl 10 mM, pH 7.4 (GE Healthcare, Switzerland) and 2 U FIREPol DNA polymerase (Solis BioDyne, Estonia). 1 μl of extracted DNA was mixed with 24 μl of PCR master mix. The PCR protocol was 3 min at 96°C followed by 40 cycles (30 sec at 96°C, 1 min 30 sec at 56-64°C according to Table 1 and 1 min 30 sec at 72°C) with a final elongation for 10 min at 72°C. PCR products were purified and sequenced by Macrogen (Macrogen Ltd., Korea). ABI Prism AutoAssembler version 1.4.0 (Applied Biosystems) was used for assembly and analysis of sequences. The genotype of each patient was then assessed visually. Aliquots of the same PCR products were used for primer extension and microarray analysis.
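As a rough planning aid, the programmed hold time of the cycling protocol above can be added up as in the following sketch. The function name `pcr_hold_time_min` is hypothetical, and the calculation deliberately ignores temperature ramping, which depends on the thermocycler and is not stated in the text.

```python
def pcr_hold_time_min(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time in minutes (temperature ramping ignored)."""
    return (initial_s + cycles * sum(step_seconds) + final_s) / 60


# Values taken from the protocol: 3 min initial denaturation,
# 40 cycles of (30 s, 1 min 30 s, 1 min 30 s), 10 min final elongation.
total_min = pcr_hold_time_min(180, 40, [30, 90, 90], 600)
print(total_min)  # 153.0
```

The real run time will be longer than this figure because of ramping between temperature steps.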

Patient selection for DNA microarray validation
Of the 274 patients from Cambodia and Tanzania, 96 were selected (to fit one 96-well plate) for validation of the microarray. Samples were ranked first by the number of successfully sequenced SNPs and then by their ID number. For all 96 selected samples, at least 16 out of 18 SNPs had been successfully sequenced.
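The selection rule can be sketched as a two-key sort. This is only an illustration: the function name `select_for_validation` and the tuple layout are hypothetical, and the tie-break by ascending ID number is an assumption, since the text does not state the direction of the ID ordering.

```python
def select_for_validation(samples, plate_size=96):
    """Pick samples for one plate.

    samples: list of (sample_id, n_sequenced_snps) tuples.
    Ranking: more successfully sequenced SNPs first; ties are broken by
    ascending sample ID (assumed, not stated in the text).
    """
    ranked = sorted(samples, key=lambda s: (-s[1], s[0]))
    return ranked[:plate_size]


# Toy example with a plate size of 3 instead of 96.
toy = [(3, 18), (1, 17), (2, 18), (4, 16)]
print(select_for_validation(toy, plate_size=3))  # [(2, 18), (3, 18), (1, 17)]
```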

Extension control and elimination of non-incorporated nucleotides
As extension control for the microarray, amplified nested PCR products from the Plasmodium falciparum chloroquine resistance transporter gene (pfcrt) from strains 3D7 (wild-type at loci pfcrt76 and pfcrt97) and K1 (mutation at locus pfcrt76 and wild-type at pfcrt97) were mixed at a ratio of 55%:45%. Primers and PCR conditions have been described elsewhere [5].
To eliminate non-incorporated nucleotides prior to primer extension, all nested PCR products of one blood sample were pooled and 10 μl of the pooled PCR products and 0.5 μl of the extension control mix were digested with 8 U shrimp alkaline phosphatase (SAP) and 4 μl 10 × SAP buffer (both Amersham Biosciences, Freiburg, Germany) in a reaction volume of 48 μl for 1 h at 37°C. SAP was inactivated by incubating samples for 15 min at 90°C.

Primer extension and denaturation
Since the microarray scanner used supported only two fluorescence channels simultaneously, and because of the high sequence similarity of the cyp genes, a strategy of three parallel reactions with different primer and dye combinations had to be applied. Different extension primer mixes (I, II and III) were prepared according to Table 2, each with a total volume of 320 μl containing the corresponding primers (62.5 nM final concentration of each primer, Operon Biotechnologies GmbH, Germany) diluted in 1 × Tris-EDTA (TE). Afterwards, three extension mixes (I, II and III) were prepared (final volume of 8 μl) containing 2 U of HOT TERMIPol® DNA Polymerase (Solis BioDyne, Estonia), 1.8 μl of 10 × Reaction Buffer C (Solis BioDyne, Estonia), 2.5 mM MgCl2, and 0.625 μM of the corresponding Cy3- and Cy5-labeled ddNTPs (Perkin Elmer, Schwerzenbach, Switzerland); ddNTP mixes are listed in Table 3. Then, 12 μl of SAP-digested PCR product of each patient were mixed with 8 μl of extension mix I, II or III, respectively. The following primer extension protocol was used: 1 min at 94°C followed by 40 cycles of 10 sec at 94°C and 40 sec at 50°C. The three mixes of each patient were then pooled again in a 96-well plate and mixed with 10 μl of the hybridization buffer. The hybridization buffer contained 37.5 μM EDTA pH 8.0, 7 pM of two differently labelled positive hybridization controls 5'-GCCTCCACGCACGTTGTGATATGTA-[Cy3]-3' and 5'-CTGTGACAGAGCCAACACGCAGTCT-[Cy5]-5' (Operon Biotechnologies GmbH, Germany), and 3% sodium dodecyl sulphate (SDS). The plate was incubated for 1 min at 94°C and immediately chilled on ice for 2 min.

Table 1 (columns: SNP, primer, sequence, MgCl2 [μl], T [°C]; first entry: CYP2A6*2 (479T>A, L160H), forward and reverse primers). Primers highlighted in bold were used for sequencing. SNP positions are indicated in brackets; they were obtained from the Home Page of the Human Cytochrome P450 (CYP) Allele Nomenclature Committee [41], and the Consensus Human Arylamine N-Acetyltransferase Gene Nomenclature [44].

Microarray production and microarray hybridization
Aldehyde-activated ArrayIt® SuperAldehyde 2 glass slides with SuperMask™ 16 (EBN European Biotech Network, Dolembreux, Belgium) were used. Oligonucleotides (Operon Biotechnologies GmbH, Germany) corresponding to the antisense DNA of the extension primers, extension controls (Table 2) and positive hybridization controls were spotted onto the microarrays in triplicate. The spotting was done by the DNA Array Facility of the Center for Integrative Genomics, University of Lausanne, Switzerland, using solutions of 50 μM oligonucleotide in 180 mM phosphate buffer (pH 8.0). All oligonucleotides had a C7-aminolinker attached to the 3' end. Anchor oligonucleotides prelabeled with Cy3 and Cy5 and four oligonucleotides with a random sequence were added as positive and negative controls, respectively. Of the pooled and denatured primer extension reaction mixture, 35 μl were transferred to one well of the microarray and 6 μl of 20 × Standard Saline Citrate (SSC) were added. In each well, representing a single microarray, DNA from one patient was hybridized. Hybridization was carried out in a humid chamber at 50°C for 90 min. After hybridization, the slide was washed at room temperature in 2 × SSC with 0.2% SDS for 10 min, followed by a wash with 2 × SSC for 10 min, and a final wash with 2 × SSC plus 2% ethanol for 2 min. These three washes together constitute the first wash step. Slides were dried with compressed air.
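The SSC concentration during hybridization follows from a simple dilution calculation. The sketch below assumes that the 35 μl of pooled extension mixture contributes no SSC of its own, which the text does not state explicitly; the helper name `final_concentration` is hypothetical.

```python
def final_concentration(stock_conc, stock_vol, other_vol):
    """Concentration after mixing stock_vol of stock with other_vol of diluent."""
    return stock_conc * stock_vol / (stock_vol + other_vol)


# 6 ul of 20x SSC added to 35 ul of extension mixture (assumed SSC-free).
ssc_final = final_concentration(20, 6, 35)
print(round(ssc_final, 2))  # 2.93, i.e. roughly 3x SSC in the well
```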

Data acquisition
Microarrays were scanned at 635 nm and 532 nm using an Axon 4100A fluorescence scanner (Bucher Biotec AG, Basel, Switzerland). After the first scan, slides were washed and scanned again; after each wash, Cy3 and Cy5 images were acquired and analysed using the Axon GenePix Pro software (version 6.0). An in-house Perl script based on Kestler's statistics module [36] was used to call SNPs based on probe signal intensities. The script calculates receiver operating characteristic (ROC) curves using signal intensity values from the set of positive and negative controls for each hybridization. Hybridization-specific thresholds that maximize both sensitivity and specificity were then used to make SNP calls.
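The idea of a hybridization-specific threshold derived from control spots can be illustrated as follows. This is not the authors' Perl script: it is a minimal Python sketch that assumes the threshold is chosen by maximizing Youden's J (sensitivity + specificity − 1), one common way to balance the two quantities. The function name `roc_threshold` and the intensity values are hypothetical.

```python
def roc_threshold(positives, negatives):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    positives / negatives: signal intensities of the positive and negative
    control spots of one hybridization. A spot counts as positive when its
    intensity is >= threshold. On ties, the lowest threshold is kept.
    """
    best = None
    for t in sorted(set(positives) | set(negatives)):
        sens = sum(x >= t for x in positives) / len(positives)
        spec = sum(x < t for x in negatives) / len(negatives)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]


# Hypothetical control intensities for a single well.
pos_controls = [5200, 4800, 6100, 3900]
neg_controls = [300, 450, 500, 800]
threshold, sens, spec = roc_threshold(pos_controls, neg_controls)
print(threshold, sens, spec)  # 3900 1.0 1.0
```

A probe signal at or above `threshold` would then be treated as a detected allele in that well; how the two dye channels are combined into a genotype is not described in the text and is left out here.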

Data comparison
SNP data gathered from sequencing and from microarray analysis were compared. The kappa statistic was used to evaluate agreement between the categorical data obtained by the two methods. The kappa index was calculated from contingency tables as follows: $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance. The kappa index was interpreted based on the criteria of Landis and Koch [37]. Hardy-Weinberg equilibrium was tested using the chi-square Hardy-Weinberg equilibrium test calculator for bi-allelic markers of the Online Encyclopedia for Genetic Epidemiology studies [38].
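For illustration, Cohen's kappa for paired genotype calls and its verbal interpretation according to the Landis and Koch categories can be computed as in the sketch below. The function names and the toy genotype labels are hypothetical, and the handling of samples without a microarray signal is not specified in the text, so such samples are simply assumed to have been excluded beforehand.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equally long lists of categorical calls."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("need two non-empty lists of equal length")
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from the marginal frequencies of each category.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal agreement category after Landis and Koch."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy data: genotype calls for six samples by both methods.
sequencing = ["wt", "wt", "het", "mut", "wt", "het"]
microarray = ["wt", "wt", "het", "mut", "het", "het"]
k = cohen_kappa(sequencing, microarray)
print(round(k, 3), landis_koch(k))  # 0.739 substantial
```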

Ethical approval
All the applied protocols were approved by the ethics committee of the two cantons of Basel (Ethikkommission beider Basel, EKBB) and the responsible local authorities (i.e. in Tanzania from the Institutional Review Board of the Ifakara Health Institute and the National Institute for Medical Research Review Board and in Cambodia from the National Ethics Committee for Health Research). Blood samples were collected following written informed consent in the respective local language (Khmer or Swahili) from the participants or their guardians.

Results
Agreement between results obtained from sequencing and from the microarray on 18 SNPs within eight cyp isoenzyme genes and the nat2 gene from 26 Cambodian and 70 Tanzanian malaria patients was tested. The results are summarized in Table 4. For some SNPs agreement ranged from substantial to almost perfect, whilst for other SNPs a large variability from slight to substantial agreement was found.

Discussion
Comparison of data generated by microarray analysis with sequencing showed that the performance of the DNA  [40]. Furthermore, among most of these SNPs a considerable number of patient samples failed to yield a signal on the microarray (e.g. CYP2D6*17 (2850C>T) and CYP3A4*1B). The failure rate was clearly higher in samples from Tanzania, which might also be due to a lower quality of DNA arising from sub-optimal storage conditions after blood withdrawal. Sequencing data agreed with published reference sequences from public sources (Human Cytochrome P450 (CYP) Allele Nomenclature Committee [41]). The majority of SNPs genotyped by sequencing were found to be in Hardy-Weinberg equilibrium (exceptions were CYP2B6*5 in Cambodians and CYP2D6*4 in Tanzanians), suggesting that sampling was unbiased. The cases where population data generated by microarray were found not to be in Hardy-Weinberg proportions could be attributed to the large number of patient samples that failed to yield a signal on the microarray, resulting in a smaller sample size.
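The Hardy-Weinberg check for a bi-allelic marker can be reproduced with a one-degree-of-freedom chi-square goodness-of-fit test. The sketch below is a generic implementation and not the online calculator used in the study [38]; the function name and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_wt, n_het, n_mut):
    """Chi-square statistic (1 degree of freedom) for Hardy-Weinberg proportions."""
    n = n_wt + n_het + n_mut
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_wt + n_het) / (2 * n)  # frequency of the wild-type allele
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_wt, n_het, n_mut)
    # Genotype classes with zero expected count are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

balanced = hwe_chi_square(25, 50, 25)   # genotype counts match expectation
no_hets = hwe_chi_square(50, 0, 50)     # complete heterozygote deficit
print(balanced, balanced > CRITICAL_1DF_005)  # 0.0 False
print(no_hets, no_hets > CRITICAL_1DF_005)    # 100.0 True
```

With small samples, for instance after many microarray failures, expected counts in the rarest genotype class become small and the chi-square approximation is less reliable; an exact test may then be preferable.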
Because cyps evolved from a single ancestor [1,42], they show very close sequence similarity, which in turn makes it difficult to design gene-specific primers, in particular extension primers that have to be placed at a defined position. It therefore became almost impossible to develop a single multiplex PCR, and thus the microarray method described here is time-consuming and laborious. This is in contrast to a similar microarray developed for the analysis of drug resistance associated SNPs in P. falciparum genes that permits the simultaneous analysis of many SNPs in hundreds of samples in a very short time period (approximately 15 h for four 96-well plates) with significantly reduced costs compared to other systems [5].
Furthermore, the costs of sequencing have decreased considerably during recent years and this trend may well continue. On the other hand, the costs of microarray reagents (especially Cy3- and Cy5-labeled ddNTPs that are used in three combinations) and glass slides for arraying have increased and are unlikely to decrease over time. The costs of the microarray technology are therefore no longer considerably lower, and the overall costs of both methods have become comparable.
While the SNP analysis microarray has been successfully used to analyse point mutations in drug resistance associated genes in Plasmodium [5,43], it seems to fail with closely related genes, such as the human cyp genes. For the latter, sequencing appears to be a more reliable method.

Conclusion
Although the microarray allows the simultaneous determination of many SNPs, the lack of robustness of the approach described here prohibits its wide use in pharmacogenetics, and sequencing appears to be the more reliable technique. With the availability of large sequencing capacities worldwide, molecular-epidemiological studies using sequencing for a limited number of SNPs in CYP genes in a large population are feasible.