Test characteristics of two rapid antigen detection tests (SD FK50 and SD FK60) for the diagnosis of malaria in returned travellers

Background Two malaria rapid diagnostic tests were evaluated in a travel clinic setting: the SD FK50 Malaria Ag Plasmodium falciparum test (a two-band test) and the SD FK60 Malaria Ag P. falciparum/Pan test (a three-band test). Methods A panel of stored whole blood samples (n = 452 and n = 614 for FK50 and FK60, respectively) from returned travellers was used. The reference method was microscopy with PCR in case of discordant results. Results For both tests, overall sensitivity for the detection of P. falciparum was 93.5%, reaching 97.6% and 100% at parasite densities above 100 and 1,000/μl respectively. Overall sensitivities for Plasmodium vivax, Plasmodium ovale and Plasmodium malariae for the FK60 test were 87.5%, 76.3% and 45.2%, but they reached 92.6% and 90.5% for P. vivax and P. ovale at parasite densities above 500/μl. Specificities were above 95% for all species and both tests when corrected by PCR, with visible histidine-rich protein-2 lines for P. malariae (n = 3) and P. vivax and P. ovale (1 sample each). Line intensities were reproducible and correlated to parasite densities. The FK60 tests provided clues to estimate parasite densities for P. falciparum below or above 1,000/μl. Conclusion Both the FK50 and FK60 performed well for the diagnosis of P. falciparum in the present setting, and the FK60 for the diagnosis of P. vivax and P. ovale at parasite densities > 500/μl. The potential use of the FK60 as a semi-quantitative estimation of parasite density needs to be further explored.

2 (HRP-2) or P. falciparum-specific parasite lactate dehydrogenase (pLDH). Newer generation three-band tests display a control line and two test lines, one for detection of P. falciparum-specific antigen and another for detection of antigens common to the four species, such as pan-Plasmodium-specific parasite lactate dehydrogenase or aldolase. Simplified "one-step" malaria RDTs have been marketed. Unlike their predecessors, the one-step RDTs only require one or two manipulations, i.e. application of blood and a running buffer. They are expected to improve performance among laboratory and clinical staff who use RDTs only occasionally. Indeed, multistep RDTs have been demonstrated to require considerable training to reach optimal sensitivity [2]. Many brands are marketed, but published reports are only available for a small number of them [3-6]. The World Health Organization, through the Regional Office for the Western Pacific, lists a number of malaria RDTs, which are produced in compliance with ISO 13485:2003 [7].
The SD FK50 Malaria Ag P. falciparum test (Standard Diagnostics, Hagal-Dong, Korea) and the SD FK60 Malaria Ag P. falciparum/Pan (Standard Diagnostics) are one-step malaria diagnostic tests in a cassette format, in a two- and three-band design, respectively. In this study, their performance was assessed when challenged with a collection of stored blood samples of returned travellers in a reference centre. For convenience, these tests will be referred to as FK50 and FK60, respectively.

Study design
Both kits were retrospectively evaluated in a reference laboratory on a panel of stored blood samples obtained from returned travellers suspected of malaria. The reference method was microscopy, performed at presentation of the patient. All discordant results were subsequently analysed by PCR, and test characteristics were recalculated according to the PCR-corrected results.

Patients and Materials
EDTA-blood samples from patients attending the outpatient clinic of the Institute of Tropical Medicine (ITM), Antwerp, Belgium, or those that were sent by Belgian laboratories to ITM for confirmation in the scope of the national reference function were used. Patients included European travellers returning from malaria-endemic areas and, to a lesser extent, natives of endemic countries returning from visiting friends and relatives. The samples were submitted as part of the diagnostic protocol for suspected malaria. Samples had been collected from January 1996 to October 2007 and had been stored at -70°C. Diagnosis was based on standard microscopy. Among these samples, a panel was selected based on relevant representation of the four malaria species (P. falciparum, Plasmodium vivax, Plasmodium ovale and Plasmodium malariae) and different parasite densities. For the FK60, additional samples of non-falciparum species were included. Samples without malaria parasites (negative samples) were collected prospectively during the period of September and October 2007 from returned travellers attending the outpatient clinic of ITM and for whom a thick film, requested as part of work-up of suspected malaria, did not show any malaria parasites.

Reference method
Diagnosis of malaria, species identification and determination of parasite density were done by microscopy. According to standard practice at the ITM, thick and thin blood films were prepared, stained with Giemsa (pH 8.0) and examined by light microscopy at ×500 magnification. A thick film was examined for 15 minutes, with a minimum of 200 fields read, before the blood film was reported negative. Parasite densities were estimated by counting asexual parasites against 200 white blood cells (WBC) in thick blood films and converting this number to parasites/μl using the actual WBC count or, when this was not available, the standard value of 8,000 WBC/μl [4]. In the remainder of this text, parasite densities are expressed as counts (of asexual parasites)/μl (of whole blood).
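The conversion from a thick-film count to a parasite density described above can be sketched as follows. This is a minimal illustration, not the laboratory's actual procedure; the function name `parasite_density` and its parameters are hypothetical.

```python
# Illustrative sketch of the thick-film conversion described in the text.
# Function and parameter names are hypothetical, not taken from the study.

STANDARD_WBC_PER_UL = 8000  # fallback value stated in the text


def parasite_density(parasites_counted, wbc_counted=200, wbc_per_ul=None):
    """Return asexual parasites per microlitre of whole blood.

    parasites_counted: asexual parasites counted in the thick film
    wbc_counted:       white blood cells counted alongside (200 in the text)
    wbc_per_ul:        patient's actual WBC count, or None if unavailable
    """
    if wbc_counted <= 0:
        raise ValueError("wbc_counted must be positive")
    if wbc_per_ul is None:
        # No measured WBC count available: use the standard value
        wbc_per_ul = STANDARD_WBC_PER_UL
    return parasites_counted * wbc_per_ul / wbc_counted


if __name__ == "__main__":
    # 5 parasites against 200 WBC, standard WBC count assumed
    print(parasite_density(5))
    # Same film, but with a measured WBC count of 4,000/ul
    print(parasite_density(5, wbc_per_ul=4000))
```

With the standard WBC value, each parasite counted against 200 WBC corresponds to 40 parasites/μl, which sets the resolution of the density estimate.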

Test platforms
The FK50 is a two-band RDT targeting HRP-2 antigen. Results are expressed as positive or negative for P. falciparum. The FK60 is a three-band test targeting HRP-2 and pLDH. The presence of a HRP-2 line together with a pLDH line indicates an infection with P. falciparum or a mixed infection with P. falciparum and one or more of the other Plasmodium species. The presence of a unique HRP-2 line refers to an infection with P. falciparum, whereas a unique pLDH line indicates infection with one or more of the other Plasmodium species. Both assays are lateral flow immunochromatographic antigen detection tests in a cassette format.
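The interpretation rules for the three-band FK60 can be written down as a small decision function. The sketch below only restates the rules given in the text (including the handling of an absent control line, described under Test procedure); the function name and the returned labels are hypothetical.

```python
# Illustrative restatement of the FK60 line interpretation rules.
# Names and labels are hypothetical, chosen for this sketch only.


def interpret_fk60(control_line, hrp2_line, pldh_line):
    """Map the three visible lines of an FK60 cassette to a reading."""
    if not control_line:
        # No control line: the test is invalid and must be repeated
        return "invalid"
    if hrp2_line and pldh_line:
        return "P. falciparum or mixed infection including P. falciparum"
    if hrp2_line:
        return "P. falciparum"
    if pldh_line:
        return "non-falciparum species"
    return "negative"


if __name__ == "__main__":
    print(interpret_fk60(True, True, False))
    print(interpret_fk60(True, False, True))
```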

Test procedure
Tests were performed according to the instructions of the manufacturer, except that samples (5 μl) were loaded with a transfer pipette (Finnpipette, Helsinki, Finland) instead of the plastic tube supplied by the manufacturer and that a scoring system was used to assess the intensity of the test lines. In cases for which the control line did not appear, the results were interpreted as invalid and the tests were repeated. In order to score test line intensities, the scoring system of Bell and co-workers [8] was used and five categories were defined: None (no line visible), Faint (barely visible line), Weak (paler than the control line), Medium (equal to the control line) or Strong (stronger than the control line). To assure timely readings, tests were carried out in time-controlled batches of five samples. Readings were performed by three consecutive observers, the first of whom was invariably the one who had performed the test procedure. Observers were blinded to the results of microscopy and to each other's readings. Readings were carried out in daylight assisted by a standard electric bulb, between 20 and 30 minutes after application of the sample and buffer. The results considered for analysis were based on consensus agreement, meaning that the same result was observed by at least two out of three observers. Where there was no consensus (in rare cases for the FK60), the results of the first observer were considered. To assess inter-observer agreement, results of positive and negative readings as well as line intensity readings were considered. To assess test reproducibility, a panel of six positive samples for P. falciparum with various parasite densities (116, 200, 1,123, 2,900, 138,000 and 275,000/μl) was tested on five occasions.
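The consensus rule for the three readings (at least two observers agreeing, otherwise the first observer's result) can be sketched as below. The function name `consensus_reading` is hypothetical; the intensity categories are those defined in the text.

```python
# Illustrative sketch of the two-out-of-three consensus rule.
# The function name is hypothetical; categories follow the text.
from collections import Counter

CATEGORIES = ("None", "Faint", "Weak", "Medium", "Strong")


def consensus_reading(readings):
    """Return the consensus of three line intensity readings.

    readings: sequence of three category labels, with the first
              observer (who performed the test) in position 0.
    """
    if len(readings) != 3:
        raise ValueError("exactly three readings are expected")
    for reading in readings:
        if reading not in CATEGORIES:
            raise ValueError(f"unknown category: {reading!r}")
    value, count = Counter(readings).most_common(1)[0]
    if count >= 2:
        return value
    # No two observers agree: fall back to the first observer
    return readings[0]


if __name__ == "__main__":
    print(consensus_reading(["Faint", "Weak", "Weak"]))
    print(consensus_reading(["Faint", "Weak", "Medium"]))
```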

Statistical analysis
For the FK50, true positive results were defined as those with a HRP-2 line visible in samples with P. falciparum seen at microscopy, and true negative results as those with no HRP-2 line visible in microscopy-negative samples. Incorrect test results included false-negative samples (those with a microscopic diagnosis of P. falciparum but no test line visible), false-positive samples (microscopy-negative samples showing a HRP-2 line) and species misidentifications (non-falciparum species showing a HRP-2 line). For the FK60, samples infected with P. falciparum and the non-falciparum species were considered separately and the control panels included both microscopy-negative samples and samples infected by the non-falciparum species and P. falciparum respectively, as shown in Tables 1 and 2. Of note is that, for both tests, samples with pure gametocytaemia were included among the positive P. falciparum samples. To avoid complex problems of interpretation, microscopically identified mixed infections were considered separately and not included in the calculations of test characteristics.
Sensitivity and specificity were calculated with 95% confidence intervals (C.I.) and differences were tested for significance using the chi-square test or, in case of small sample sizes, a two-tailed Fisher's exact test. A p value < 0.05 was considered significant. In addition, positive and negative likelihood ratios (LHR+ and LHR-) were calculated. Likelihood ratios provide direct information on the test's power to include (LHR+ > 10) or exclude (LHR- < 0.1) a disease without being influenced by its prevalence [3]. Inter-observer agreements were assessed using the kappa statistic for paired observers and percentage agreements for all three observers combined. Associations between line intensity readings and parasite densities were assessed for strength of association with Cramer's V for categorical variables, using interpretative criteria published previously [9].

Analysis of discordant results
For discordant results (i.e. false-negative and false-positive results and species misidentifications) a species-specific polymerase chain reaction (PCR) was performed. Primer and probe sequences detecting small subunit 18S rRNA genes were selected according to Rougemont et al. [10].

Ethical review
The study was reviewed and approved by the Institutional Review Board of ITM and by the Ethical Committee of Antwerp University, Belgium.

Sensitivity, specificity and likelihood ratios
No invalid test results were observed. Additional file 1 shows the overall and detailed test characteristics matched for parasite densities. Sensitivity increased marginally when samples with pure gametocytaemia were excluded. Sensitivity was related to parasite density, with values at parasite densities < 100/μl significantly lower than those at parasite densities > 100/μl (78.9% and 97.6%, respectively; p < 0.001). Fifteen out of 21 false-negative samples had parasite densities < 100/μl (including three samples with pure gametocytaemia); the remaining six had parasite densities ranging from 122 to 400/μl. All these infections had been acquired in Africa. Of the four samples with mixed infection, a single one (with P. falciparum and P. ovale) gave a false-negative result. The overall specificity was 94.4%, with seven samples that were incorrectly identified: a visible test line was observed among four microscopy-negative samples and three P. malariae samples. The exclusion power was excellent (LHR- < 0.10) except for parasite densities less than 100/μl; the inclusion power was also excellent (LHR+ > 10). Table 3 lists the line intensity readings related to parasite density. Line intensity readings were significantly related to parasite densities with a substantial correlation (V = 0.434, p < 0.001), but there was considerable overlap between categories. Faint or weak line intensities occurred in 98/305 (32.1%) true positive results, mostly but not exclusively at low parasite densities, and in all seven false positive results and misidentifications.

Inter-observer agreement and reproducibility
The inter-observer agreement for positive and negative readings was high, with 97.3% overall agreement between the three observers and kappa values of 0.95-0.98. Of the four samples with mixed infection, a single one (with P. falciparum and P. ovale at a parasite density of 700/μl) showed only a pLDH line; the other mixed infections were correctly identified.

Line intensity readings
As for the FK50, faint and weak line intensities for the HRP-2 line occurred mostly but not exclusively at low parasite densities (Table 4). In addition, the presence of medium or strong pLDH line intensity was invariably associated with parasite densities exceeding 1,000/μl in the case of P. falciparum and 500/μl in the case of the non-falciparum species, except for one P. falciparum sample with pure gametocytaemia (12,700/μl). Of interest, among the 17 P. falciparum samples with pure gametocytaemia, there were 14 with HRP-2 lines visible compared to six with pLDH lines visible. HRP-2 lines in samples with non-falciparum species gave faint or weak line intensities except for a single P. malariae sample with strong line intensity. This latter sample however proved to be a mixed infection with P. falciparum/P. malariae upon PCR analysis (see below).

Inter-observer agreement and reproducibility
Both target lines performed well for inter-observer agreements, although the results of the HRP-2 line were better than the pLDH line (

Analysis of discordant results
PCR analysis of all discordances resulted in the following corrections: two samples that were microscopically diagnosed as P. ovale and that did not show any test line proved to be mixed P. falciparum/P. ovale infections, and one P. malariae sample that showed cross-reaction with the HRP-2 line proved to be a mixed P. falciparum/P. malariae infection. When correcting for these results, specificity for the detection of P. falciparum and sensitivity for the diagnosis of P. malariae increased slightly (Additional files 2 and 3).

Discussion
In this study, the performance of two one-step malaria rapid diagnostic tests, the SD FK50 Malaria Ag P. falciparum test (a two-band HRP-2 test) and the SD FK60 Malaria Ag P. falciparum/Pan test (a three-band HRP-2 and pLDH test) was evaluated on large panels of stored samples obtained from international travellers. For both tests, overall sensitivity for the detection of P. falciparum was 93.5%, reaching 97.6% and 100% at parasite densities above 100 and 1,000/μl respectively. Overall sensitivities for P. vivax, P. ovale and P. malariae for the FK60 test were 87.5%, 76.3% and 45.2%, but they reached 92.6% and 90.5% for P. vivax and P. ovale at parasite densities > 500/μl. Specificities were above 95% for all species. Inter-observer agreement and reproducibility were high for both tests.
One of the limitations of the present study was its retrospective design, which made it difficult to trace back causes of discordant results such as previous therapy and the presence of interfering factors (e.g. rheumatoid factor). Further, the study population was not completely homogeneous and the small numbers of semi-immune immigrants (who may tolerate low-level parasite densities [5]) were not identified. Another possible limitation is the fact that stored blood samples were used for analysis. Although, on theoretical grounds, there have been concerns about the stability of the target antigens under these conditions [11], previous evaluations of RDTs have been performed on stored samples [12,13] and a prospective evaluation of fresh and stored samples revealed similar results in case of the HRP-2 antigen [14]. In the present study, no obvious differences in test performance were found for samples stored for several (> 2) years compared to those stored for a shorter period (results not shown), and the samples had not been exposed to repeated thawing and freezing. Further, it should be realised that the present study design did not consider the performance of these RDTs when applied in clinical diagnosis by laboratory technicians in non-endemic settings, who have little exposure to malaria-positive samples [1,2]. Assessing samples with different parasite densities should be part of the laboratory validation when introducing RDTs in clinical practice, for instance, to train the occasional reader to interpret faint line intensities as positive results. Finally, it should be realised that the present study was performed in a reference setting, with the availability of expert microscopy, trained observers and optimal environmental conditions. Likewise, a calibrated transfer pipette was used instead of the manufacturer's transfer device, in order to ensure correct volume transfer [11]. However, an evaluation of such a test in a reference setting is a logical first step preceding in-depth evaluations and field trials [11].
For both tests and the diagnosis of P. falciparum, the sensitivities were in line with those reported for other HRP-2 tests in returned travellers, with sensitivities ranging from 80% to 99%, depending on the setting and parasite densities [3,15-22]. However, most of these studies, in particular the systematic reviews, addressed the multistep RDTs that have been available on the market for a long time [3,5,6,23]. By contrast, evaluations of most of the other RDTs displayed on the WHO website are pending [7]. For the non-falciparum species, the reported sensitivities vary, with decreasing sensitivities for P. vivax followed by either P. ovale or P. malariae [3,4,15,19,24]. In a recent meta-analysis on malaria RDTs in international travellers [3], sensitivities for P. ovale and P. malariae ranging from 36% to 95% were mentioned. Part of this wide range in sensitivities is probably due to small sample sizes in different studies. The sample sizes in the present study enabled us to calculate test characteristics with narrower confidence intervals, and consequently this study demonstrates a significantly lower sensitivity for detection of P. malariae, as compared to P. ovale and P. vivax, even at parasite densities above 500/μl. Considering the present methods, two other remarks are to be made. First, in the case of P. falciparum, samples with only gametocytes were considered as part of the microscopy-positive samples. From the standpoint of travel medicine, this is a recommended choice, but gametocytes may be present even after successful eradication of the asexual forms [3,5]. Moving these pure gametocytaemia samples to the "non-malaria" category in the present study collection would add slightly to the sensitivity and the LHR- of both tests (Additional files 1 and 2), at a considerable cost of specificity (87.0% and 93.3% for FK50 and FK60 respectively), but with LHR+ still above 10.
Second, our control panel included not only microscopy-negative samples but also samples from other Plasmodium species, and we consequently scored species misidentifications as incorrect diagnoses. One could argue that species misidentification can be tolerated as long as the diagnosis of malaria is not missed. Competent malaria diagnosis however requires distinction between at least P. falciparum and the other species, as prognosis, therapy, follow-up and epidemiology are different. With regard to both tests, it is reassuring that among the present samples, P. falciparum was not misidentified as a non-falciparum species (with the exception of a single failure of HRP-2 reactivity in a mixed P. falciparum/P. ovale infection), and that misidentification only occurred in the other direction, i.e. non-falciparum species (especially P. malariae) were misidentified as P. falciparum. The impact of adding other species to the control group however was low in terms of test characteristics: limiting the control panel to exclusively the negative samples would result in a slight increase in specificity (95.8% and 98.9% for the FK50 and FK60 in case of P. falciparum respectively), a slight increase in sensitivity for P. vivax and P. ovale (88.8% and 77.5% respectively) and a moderate increase in sensitivity for P. malariae (58.1%). Of note are the false positive reactions for the HRP-2 line among the non-falciparum species, in particular among 10% of our P. malariae samples. HRP-2 cross-reactions have been reported for P. vivax and P. malariae, but not for P. ovale [25,26].
For all species, declining sensitivities at lower parasite densities were observed. For P. falciparum this is a well-known phenomenon [3-6,11]. The present study demonstrated this decline for the non-falciparum species as well, indicating a breakpoint at 500/μl. In line with the results from the meta-analysis mentioned above [3], most of the P. falciparum false-negative results in this study occurred in samples with parasite densities < 100/μl. Although the failure to detect high parasite densities is also mentioned as a pitfall of malaria RDTs [3-6], no false-negative results were found in the present study at parasite densities above 400/μl. It should be noted however that false-negative results at elevated parasite densities are rare events that await prospective surveillance by incident reporting.
Further, the results did not show any relation between geographic distribution and false-negative results by HRP-2 due to possible genetic variations in HRP-2 target [27], but the majority of samples were acquired in Africa, and more samples should be tested from the Asia-Pacific to rule out an influence of such variations.
In contrast to most other studies, reproducibility and inter-observer agreements of both tests were assessed in the present study. Line intensity readings (and consequently test results) showed high inter-observer agreements and were also reproducible upon repeat testing, but performances were clearly better for the HRP-2 line as compared to the pLDH line. For the latter line, the preponderance of weak and faint readings for the non-falciparum species is of concern. Furthermore, for the detection of P. falciparum, the three-band FK60 test performed as well as the two-band FK50. This is of note, as one could expect that the three-band test, which has to meet two optimums of antigen-antibody interactions, would perform somewhat less well than the two-band test. Although the present devices are not designed to use line intensities as a tool for grading parasite densities, this study also explored their possible diagnostic value. In line with other findings, line intensities were related to parasite density [17,26,28] but considerable overlaps precluded their use as a semi-quantitative estimation of parasite density. However, the FK60 test provided interesting clues to parasite densities below or above 1,000/μl for P. falciparum (the unique presence of a HRP-2 line and the presence of medium or strong pLDH line respectively). A similar approach has been described for a HRP-2 and aldolase three-band RDT, for which co-reactivity of both test lines pointed to parasite densities of = 40,000/μl [29]. Further product research might refine and expand these possibilities, thereby enlarging the scope of malaria RDT applications [6].

Conclusion
Taking into account their sensitivity and specificity, inter-observer agreement and reproducibility, it is clear that the FK50 and the FK60 test devices are a valuable adjunct to microscopy for the diagnosis of malaria in a non-endemic setting. They share the limitations of other malaria rapid diagnostic tests, in particular the limited exclusion power for P. falciparum malaria at low parasite densities and a lower sensitivity for the non-falciparum species, especially P. malariae. Possible test improvements, apart from the sensitivity, would be an increase in intensity of the pLDH line and the exploration of the semi-quantitative estimation of parasite densities.