A review of the WHO malaria rapid diagnostic test product testing programme (2008–2018): performance, procurement and policy

Malaria rapid diagnostic tests (RDTs) emerged in the early 1990s into largely unregulated markets, and uncertain field performance was a major concern for the acceptance of tests for malaria case management. This, combined with the need to guide procurement decisions of UN agencies and WHO Member States, led to the creation of an independent, internationally coordinated RDT evaluation programme aiming to provide comparative performance data of commercially available RDTs. Products were assessed against Plasmodium falciparum and Plasmodium vivax samples diluted to two densities, along with malaria-negative samples from healthy individuals, and from people with immunological abnormalities or non-malarial infections. Three measures were established as indicators of performance: (i) panel detection score (PDS) determined against low density panels prepared from P. falciparum and P. vivax wild-type samples, (ii) false positive rate, and (iii) invalid rate; minimum criteria were defined for each. Over eight rounds of the programme, 332 products were tested. Between Rounds 1 and 8, substantial improvements were seen in all performance measures. The proportion of products meeting all criteria increased from 26.8% (11/41) in Round 1 to 79.4% (27/34) in Round 8. While products submitted to further evaluation rounds under compulsory re-testing did not show improvement, those voluntarily resubmitted showed significant increases in P. falciparum (p = 0.002) and P. vivax PDS (p < 0.001), with more products meeting the criteria upon re-testing. Through this programme, the differentiation of products based on comparative performance, combined with policy changes, has been influential in the acceptance of malaria RDTs as a case-management tool, enabling a policy of parasite-based diagnosis prior to treatment.
Publication of product testing results has produced a transparent market allowing users and procurers to clearly identify appropriate products for their situation, and could form a model for introduction of other, broad-scale diagnostics.

on detection of parasite antigens in patient blood, such as histidine rich protein 2 (HRP2) expressed by Plasmodium falciparum and/or Plasmodium lactate dehydrogenase (pLDH) expressed by all human malaria species [3]. RDTs attracted interest since they offer accurate diagnosis while circumventing obstacles faced when using microscopy in peripheral health care settings, including cost of equipment, unstable reagents, and the need for electricity and skilled personnel [2]. RDTs are relatively easy to use and provide a rapid time to result (< 30 min) [3].
The first malaria RDTs emerged in the early 1990s [4], and the World Health Organization (WHO) held its first meeting on rapid diagnostic testing in 1999 [2]. While adoption was slow, reports suggested they could be a useful tool [5]. Rapid expansion in the number of products occurred by the early 2000s. However, reports of variable field performance underscored the need to develop guidance to aid national malaria programmes on RDT procurement and implementation [6][7][8]. Concern regarding weak in vitro diagnostic (IVD) regulation in many endemic countries, combined with the absence of an independent evaluation process and the lack of product validation standards, led the WHO and other agencies to create an international quality control programme for malaria RDTs [2], focussed around independent product testing and lot testing.

Development of the WHO RDT evaluation programme (product testing and lot testing)
Development of a coordinated effort to quality control malaria RDTs pre-purchase (product testing) and post-purchase (lot testing) began in 2002 at the WHO Regional Office for the Western Pacific (WPRO) as a collaboration with the Special Programme for Research and Training in Tropical Diseases (TDR) and the WHO Roll Back Malaria Programme. In 2003 WPRO convened a multi-partner consultation including the Philippines Research Institute for Tropical Medicine (RITM), the Institut Pasteur du Cambodge (IPC)/Cambodian National Malaria Centre (CNM), TDR, WHO-RBM, US Centers for Disease Control and Prevention (CDC), and the Hospital for Tropical Diseases (HTD) [9]. Subsequently, standard operating procedures (SOPs) were developed, and collection of wild type P. falciparum and Plasmodium vivax samples was undertaken in 12 countries in Africa, Asia, and South America [10]. Samples were characterized by microscopy and polymerase chain reaction (PCR), followed by ELISA-based quantification of the parasite antigens HRP2, pLDH and aldolase. Only samples that contained monoinfections with either P. falciparum or P. vivax and had antigen above a minimum threshold consistent with clinical infection were included [9,11].
After 4 years of development, specimen collection and piloting, in 2007, the WHO and the Foundation for Innovative New Diagnostics (FIND) implemented lot testing services (testing a sample of a production lot) on a limited basis at RITM and IPC/CNM. Soon after, WPRO issued recommendations that procurers only purchase products manufactured under the ISO 13485 standard, and submit a sample from each production lot for lot testing. However, comparative performance assessment was still needed to guide initial procurement decisions. Therefore, in 2008, the WHO invited ISO 13485-certified manufacturers to participate in the first round of 'product testing' to be conducted at the CDC, which assessed detection accuracy, reliability, and heat stability of commercially available RDTs against a large panel of P. falciparum, P. vivax and negative samples, to enable WHO to develop evidence-based recommendations on product selection (Fig. 1) [12]. Following consultations in 2009, the WHO established minimum recommended procurement criteria based on these product performance evaluations and compliance with ISO 13485. A panel detection score (PDS) of ≥ 50% was recommended against the 200 parasites/μL density for P. falciparum and P. vivax, ideally higher in low-transmission settings. A false positive rate of < 10% and an invalid rate of < 5% were recommended in all transmission settings. Criteria were tightened in 2012 by the WHO Malaria Policy Advisory Committee (MPAC) to a PDS of ≥ 75% against the 200 parasites/μL density for both species in all transmission settings [13].
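The three thresholds above can be expressed as a simple eligibility check. The following is a minimal sketch, not the programme's own tooling: the class and function names are hypothetical, the thresholds are the 2012 values quoted in the text, and it assumes a PDS is only assessed for the species a product targets.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text (2012 MPAC revision); constant names are illustrative.
MIN_PDS = 75.0             # PDS must be >= 75% at 200 parasites/uL
MAX_FALSE_POSITIVE = 10.0  # false positive rate must be < 10%
MAX_INVALID = 5.0          # invalid rate must be < 5%


@dataclass
class ProductResult:
    """Hypothetical summary of one product's results, all values in percent."""
    pf_pds: Optional[float]  # None if the product does not target P. falciparum
    pv_pds: Optional[float]  # None if the product does not target P. vivax
    false_positive_rate: float
    invalid_rate: float


def meets_criteria(result: ProductResult) -> bool:
    """Return True if every assessed PDS and both error rates pass the thresholds."""
    pds_values = [v for v in (result.pf_pds, result.pv_pds) if v is not None]
    if not pds_values:
        return False  # nothing to assess (assumption: such a product cannot pass)
    pds_ok = all(v >= MIN_PDS for v in pds_values)
    return (
        pds_ok
        and result.false_positive_rate < MAX_FALSE_POSITIVE
        and result.invalid_rate < MAX_INVALID
    )
```

Because a combination product must pass on both species, a single low PDS is enough to fail it under this check, for example `meets_criteria(ProductResult(90.0, 60.0, 2.0, 0.1))` evaluates to `False`.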

Overview of product testing procedures
Prior to each round of product testing, WHO issued a call for expression of interest to invite manufacturers to submit products for assessment. Manufacturers must have had a valid ISO 13485:2003 certificate to participate, and those accepted needed to submit more than 1000 RDTs from 2 lots, for each product. Evaluation was performed using cryo-preserved blood samples, with testing divided into two phases. During Phase 1, products were screened against 20 cultured P. falciparum parasites diluted in whole blood to 200 parasites/µL, with each sample being tested on two RDTs from each lot. A higher density of 2000 parasites/µL was also tested on one RDT from each lot. Products needed to meet a PDS of ≥ 80% against the 2000 parasites/µL density samples to proceed to Phase 2.
The Phase 2 panel comprised approximately 100 wild type P. falciparum samples as paired dilutions at 200 and 2000 parasites/µL (or 5000 parasites/µL in early panel iterations), 35 wild type P. vivax pairs, and 100 malaria-negative samples, confirmed by microscopy and PCR, from transmission-free populations with no recent history of exposure to malaria. Half of the negative samples contained no known pathogens or immunological factors (clean negatives), and the other half contained pathogens or immunological factors (dirty negatives). When wild type samples were depleted following a testing round, they were replaced with new samples, ensuring no statistical difference in the distribution of panel antigen concentration between rounds [10].
During evaluation, RDT results were read by two trained personnel; the first reader determined results at the minimum manufacturer stated time and the second reader as soon as possible thereafter (< 30 min). The second reader was blinded to results from the first read. Test line intensity was recorded on a scale of 0 (no band) to 4 (strong band) using standard colour charts, with intensities 1-4 classified as positive. The PDS was used as the performance measure to score products in each phase. Since Phase 1 acted as a screening step, only PDS measured in Phase 2 was used for product assessment. Results from the first read were used to determine PDS.
The PDS measure was developed to reflect both product sensitivity and reproducibility. A sample (at 200 parasites/µL) registered as "detected" only if all four tests run against it, two from each of two manufacturing lots, were positive; the PDS is the percentage of samples the product detected (Fig. 2). Thus it formed a more stringent measure than the more traditional measure of sensitivity.
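The all-four-tests rule can be illustrated with a short calculation. This is a sketch only: the function names and the toy panel are hypothetical, and the per-test positivity is included purely as a contrast to show why the PDS is the stricter figure. Band intensities follow the 0 to 4 scale described above, with 1 to 4 read as positive.

```python
from typing import Dict, List

# One entry per sample: a list of lots, each lot a list of band intensities (0-4).
Panel = Dict[str, List[List[int]]]


def is_positive(intensity: int) -> bool:
    """Intensities 1-4 are classified as positive, 0 as negative."""
    return intensity >= 1


def sample_detected(lots: List[List[int]]) -> bool:
    """A sample counts as detected only if every test in every lot is positive."""
    return all(is_positive(i) for lot in lots for i in lot)


def panel_detection_score(panel: Panel) -> float:
    """Percentage of samples detected by all tests across both lots."""
    if not panel:
        raise ValueError("panel is empty")
    detected = sum(sample_detected(lots) for lots in panel.values())
    return 100.0 * detected / len(panel)


def per_test_positivity(panel: Panel) -> float:
    """Percentage of individual tests read as positive (for contrast only)."""
    tests = [i for lots in panel.values() for lot in lots for i in lot]
    if not tests:
        raise ValueError("panel contains no tests")
    return 100.0 * sum(is_positive(i) for i in tests) / len(tests)


# Hypothetical toy panel: four samples, two lots, two RDTs per lot.
toy_panel: Panel = {
    "sample_1": [[3, 2], [2, 2]],  # all four positive: detected
    "sample_2": [[2, 1], [3, 0]],  # one negative test: not detected
    "sample_3": [[1, 0], [2, 2]],  # one negative test: not detected
    "sample_4": [[4, 4], [3, 4]],  # all four positive: detected
}
```

On this toy panel 14 of 16 individual tests are positive (87.5%), yet only two of four samples are detected on all four tests, so the PDS is 50%.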
Product false positive rate was reported (i) overall, (ii) against each type of negative specimen, and (iii) as incorrect species detection. An invalid rate was reported for all products, with an invalid test defined as an absence of control line at the time of reading. Invalid tests were not repeated during product testing.
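As a rough illustration of the two rates, the sketch below tallies readings from malaria-negative samples. The reading labels and function names are hypothetical, and it assumes invalid tests are excluded from the false positive denominator, which the text does not specify.

```python
from collections import Counter
from typing import List


def false_positive_rate(readings: List[str]) -> float:
    """Percent of valid tests on malaria-negative samples that were read positive.

    Assumption: invalid tests are left out of the denominator.
    """
    counts = Counter(readings)
    valid = counts["positive"] + counts["negative"]
    if valid == 0:
        raise ValueError("no valid tests")
    return 100.0 * counts["positive"] / valid


def invalid_rate(readings: List[str]) -> float:
    """Percent of all tests with no control line at the time of reading."""
    if not readings:
        raise ValueError("no tests")
    return 100.0 * sum(r == "invalid" for r in readings) / len(readings)


# Hypothetical readings on negative samples: 3 positive, 46 negative, 1 invalid.
toy_readings = ["positive"] * 3 + ["negative"] * 46 + ["invalid"]
```

With these toy readings the false positive rate is 3/49 (about 6.1%) and the invalid rate is 1/50 (2%).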

Uptake of invitation to participate in WHO product testing programme
The number of requests from manufacturers to submit products for testing generally increased over the eight rounds (Fig. 1). In five of the eight rounds the demand for testing exceeded the capacity of the testing laboratory and therefore each manufacturer was permitted to submit a limited number of products. In some cases manufacturers withdrew initial interest and, therefore, the final number of products tested in each round differed from the original expression of interest (Fig. 1, Table 1).
In total 332 products were evaluated over the eight rounds of testing; 227 were unique [14], with the remainder (105) being resubmitted products that had been evaluated in previous rounds (Fig. 1). While some manufacturers voluntarily resubmitted products, compulsory re-testing was introduced in Round 5 to ensure products were re-evaluated at least every 5 years. The aim of this repeat assessment was to confirm that performance was maintained over time. Only the most recent results were included in the published WHO performance measures. Products not re-submitted to compulsory testing were removed from subsequent performance reports [10], the associated WHO information note, and the online database of results. Overall 33 products were assessed twice, 21 were evaluated three times, and five, two and one products were assessed four, five, and six times, respectively [10].

False positivity and invalid rates
The false positivity rates on clean negative samples varied between rounds (Fig. 4). The proportion of products with a high false positive rate (> 10%) increased between Rounds 1 and 5, with 19% (8/42) of Round 5 products having a false positive rate > 10%. By Round 8, this trend had reversed, with just 5.9% (2/34) of products having a false positive rate > 10%. The number of products with a high invalid rate was low overall; only two products had invalid rates > 5%.

Products meeting all WHO recommended performance criteria
As of Round 8, 89 products have met all three performance criteria, including 36 P. falciparum, 26 P. falciparum and pan, 21 P. falciparum and P. vivax/Pvom (vivax, malariae, ovale), 4 pan only, one product detecting P. falciparum on one line with a separate line detecting P. falciparum and P. vivax together, and one product detecting P. falciparum on one line with a separate line detecting P. vivax and pan. Between Rounds 1 and 8, the proportion of products eligible for procurement based on performance indicators approximately tripled, from 26.8% (11/41) to 79.4% (27/34) (Fig. 5). Since combination RDTs detecting both P. falciparum and P. vivax must have a PDS meeting the WHO criteria for both species, a lower proportion of combination RDTs tends to meet the performance criteria.

Compulsory retesting
Twenty-two, 19, 30 and 27 products were due for compulsory resubmission in Rounds 5 through 8. However, only 19 of these were actually resubmitted; 10 in Round  Table 2. Among the 19 compulsory resubmitted products, the P. falciparum PDS significantly decreased with a median change of 6.8% (IQR: 2.5-8.4; Wilcoxon Signed Rank Test, p = 0.006). Only eight of these 19 products detected P. vivax, and all except one were above the recommended PDS threshold of ≥ 75%. There was no significant change in the P. vivax PDS (median change = − 0.4%, IQR: − 10.0 to 5.4; Wilcoxon Signed Rank Test, p = 0.273). Overall there was a significant decrease in median false positive rate of 1.6% (IQR: 0-2.6, Wilcoxon Signed Rank Test, p = 0.033). Seventeen out of 19 products met the procurement criteria on either initial or repeat evaluation, with 12 meeting the criteria at both evaluation points.

Voluntary retesting
Of the 53 products voluntarily resubmitted, there was a significant improvement in mean P. falciparum PDS of 9.7% (95% CI 4.9-14.5%; paired t-test, p < 0.001), and a non-significant decrease in the mean false positive rate of 0.1% (95% CI − 5.9 to 5.8%; paired t-test, p = 0.98). Among the 37 P. vivax detecting products, significant P. vivax PDS improvements were observed with a mean change of 35.5% (95% CI 22.8-48.3%; paired t-test, p < 0.001). Fifteen products met the procurement criteria on initial evaluation, compared with 31 on repeat evaluation; 13 products met procurement criteria at both evaluation points.
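The paired comparison used here works on the per-product change between first and repeat evaluation. The following is a minimal sketch of the paired t statistic using only the Python standard library; the function name and the toy PDS values are hypothetical and are not programme data. Converting the statistic to a p-value needs the t distribution, for example via `scipy.stats.ttest_rel`, which is not used here to keep the sketch dependency-free.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(
    first: Sequence[float], repeat: Sequence[float]
) -> Tuple[float, float, int]:
    """Return (mean change, t statistic, degrees of freedom) for paired scores."""
    if len(first) != len(repeat):
        raise ValueError("paired samples must have equal length")
    if len(first) < 2:
        raise ValueError("need at least two pairs")
    # Per-product change: repeat evaluation minus first evaluation.
    diffs = [b - a for a, b in zip(first, repeat)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean change, using the sample standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all changes are identical; t statistic is undefined")
    return mean_diff, mean_diff / std_err, len(diffs) - 1


# Hypothetical PDS values (percent) for four products, first and repeat evaluation.
toy_first = [50.0, 60.0, 70.0, 80.0]
toy_repeat = [60.0, 65.0, 85.0, 90.0]
toy_mean_change, toy_t, toy_df = paired_t_statistic(toy_first, toy_repeat)
```

For the toy values the changes are 10, 5, 15 and 10 points, giving a mean change of 10 points on 3 degrees of freedom.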

Reflection on impacts of product testing programme
Spawned by challenges of field studies, weak IVD regulation, and the need to expand access to high quality malaria diagnosis, the WHO Malaria RDT Product Testing Programme has over the past decade generated performance data on 332 products. Through direct feedback to manufacturers and global stakeholder dissemination and communication efforts, the Round 1 report catalysed an evolution of malaria diagnostic testing by revealing a subset of high-performing products [15]. This provided a pivotal body of evidence that supported the 2010 WHO Malaria Treatment Guidelines recommending RDTs as an acceptable alternative to microscopy. It was in fact on the basis of these data and reports of health worker competency at performing malaria RDTs [16] that WHO evidence-based policy and procurement recommendations were developed [13], which in turn informed major donor policies [10,14,17]. The product testing results also provided detailed information for manufacturers, which sometimes resulted in changes in the instructions for use (IFU). For instance, observations from Round 1 showed the results from the second RDT read were often better than those from the first read at the manufacturers' recommended reading time. This information was fed back to manufacturers, with many subsequently changing their IFU to increase the recommended reading times from 15 to 20 min.
The comprehensive testing protocol and transparent reporting of results not only facilitated product selection, but also generated performance-based competition between manufacturers seeking to capture a larger market share. This was associated with a substantial improvement in test performance, while prices have fallen [18,19]. After 2010, when the WHO introduced a policy of parasite-based diagnosis by RDT or microscopy prior to treatment in all cases of suspected malaria [17], there was an upsurge in the number of manufacturers interested in participating in product testing. Allowing manufacturers to voluntarily resubmit products for testing provided a unique opportunity to observe how products evolved as manufacturers strove to improve them and demonstrate a high PDS.
Beyond positive changes in RDT performance, uptake and use in practice, there is evidence that the programme has influenced the RDT marketplace. Specifically, FIND conducted a manufacturer survey which showed the proportion of RDTs sold with a PDS ≥ 75% more than doubled from 23% in 2007, to 57% in 2009 and tripled by 2010 to 78%, coinciding with the release of the first and second product testing reports [20]. Driven by widespread compliance with WHO recommended performance criteria, this proportion further increased to 93% in 2014 [21]. Similarly, data gathered from major public sector RDT procurers showed a market shift towards procurement of only high performing products; while products purchased in 2009 included several with a sizable market share that did not meet performance criteria, this proportion decreased each year and since 2014 almost 100% of procured products met WHO performance criteria [19]. Furthermore, the market has consolidated around two suppliers who manufactured the highest-performing tests across several rounds of product testing [10,18].
Between 2009 and 2019, all major public sector procurers have continuously had in place policies stating that diagnostic test budgets can only be spent on RDTs that are recommended by the WHO. WHO recommendations on procurement of RDTs have evolved over the past decade, being initially based on the results of product testing between 2009 and 2017, followed by a requirement for WHO prequalification for P. falciparum-only HRP2 RDTs in 2018 and also for RDT combination tests in 2019. An exception exists in which non-WHO prequalified RDTs that meet performance criteria and specifically target non-HRP2 antigens can be used as an interim measure in areas where pfhrp2 deletions are prevalent [14,[22][23][24]. Several manufacturers have achieved WHO prequalification status [25]. The results of product testing, which constitute the independent laboratory evaluation component of the prequalification process, were used by the WHO PQ programme in prioritizing applications, which include a product dossier and manufacturing site inspection(s) to review the quality management system.

Lot testing
Lot performance variation is an issue for all diagnostics. The product testing programme tested RDTs from two different lots selected and supplied by manufacturers. There is no guarantee that results for the two lots submitted for evaluation are representative of every subsequent lot. Therefore, the WHO recommends both proactive and reactive post-market surveillance to identify sub-standard lots prior to and/or post field deployment, and continues to support the needs of the global community through centralized testing at the Research Institute for Tropical Medicine, Philippines. The WHO has also supported local capacity development for lot verification of malaria RDTs in Nigeria (ANDI Centre of Excellence for Malaria diagnosis, University of Lagos) and India (National Institute of Malaria Research) [26,27].

Conclusions
The objective of the WHO malaria RDT product testing programme was to provide independent comparative performance data to guide procurement decisions of UN agencies and WHO Member States. Through close collaboration with FIND, CDC and several other partners, this objective has not only been repeatedly fulfilled, but the programme has also influenced policy, clinical and manufacturer practice and helped shape the global market. Ultimately, it has driven improved product performance by establishing broadly accepted minimum performance criteria [22,28,29], making reference materials available that match that benchmark [30], and keeping the field open and regularly renewed, to encourage innovation and a competitive market. Since the programme's inception, an estimated 1.3 billion RDTs have been procured in the public sector without any verified case of large-scale product/lot failure of WHO recommended products.
The RDT evaluation programme also served as a model for establishing and ensuring performance standards for RDTs detecting other diseases. To date, RDT evaluation programmes for leishmaniasis [31] and Ebola [32] have been established using protocols adapted from malaria product testing. While significant gains have been made, there are still areas requiring attention to ensure effective case management, such as assessing RDT performance against Plasmodium malariae, Plasmodium ovale and Plasmodium knowlesi, and P. falciparum lacking HRP2.