The effect of varying analytical methods on estimates of anti-malarial clinical efficacy

Background Analytical approaches for the interpretation of anti-malarial clinical trials vary considerably. The aim of this study was to quantify the magnitude of the differences between efficacy estimates derived from these approaches and identify the factors underlying these differences.
Methods Data from studies conducted in Africa and Thailand were compiled and the risk estimates of treatment failure, adjusted and unadjusted by genotyping, were derived by three methods (intention to treat (ITT), modified intention to treat (mITT) and per protocol (PP)) and then compared.
Results 29 clinical trials (15 from Africa and 14 from Thailand) with a total of 65 treatment arms (38 from Africa and 27 from Thailand) were included in the analysis. Of the 15,409 patients enrolled, 2,637 (17.1%) had incomplete follow-up for the unadjusted analysis and 4,489 (33.4%) for the adjusted analysis. Estimates of treatment failure were consistently higher when derived from the ITT or PP analyses compared to the mITT approach. In the unadjusted analyses the median difference between the ITT and mITT estimates was greater in Thai studies (11.4% [range 2.1–31.8]) compared to African studies (1.8% [range 0–11.7]). In the adjusted analyses the median difference between PP and mITT estimates was 1.7%, but ranged from 0 to 30.9%. The discrepancy between estimates was correlated significantly with the proportion of patients with incomplete follow-up; p < 0.0001. The proportion of treatment arms with a major difference (> 5%) between adjusted PP and mITT estimates was 28% (16/57), and was greater in African (37%, 14/38) than in Thai studies (11%, 2/19). In the African studies, a major difference in the adjusted estimates was significantly more likely in high transmission sites (62%, 8/13) than in moderate transmission sites (24%, 6/25); p = 0.035.
Conclusion Estimates of anti-malarial clinical efficacy vary significantly depending on the analytical methodology from which they are derived. In order to monitor temporal and spatial trends in anti-malarial efficacy, standardized analytical tools need to be applied in a transparent and systematic manner.


Background
In the past decade, the number of anti-malarial clinical trials has increased significantly. In Africa alone, the number of such studies published between 2001 and 2005 increased three-fold compared to the number published in the preceding five years [1]. This increase is primarily due to greater awareness of the emergence of multidrug resistant strains of Plasmodium falciparum and to the introduction of new treatment regimens such as artemisinin combination therapy (ACT). In addition, study designs have evolved to include a longer duration of follow-up and the inclusion of genotyping to distinguish recrudescence from new infection [1].
Anti-malarial clinical trials are usually conducted either to compare two or more treatment regimens (comparative trials) or to monitor for the emergence of anti-malarial resistance over time and in different geographical areas. The World Health Organization (WHO) currently recommends that countries change their anti-malarial treatment policy when the cure rate for the current recommended therapy falls below 90% and that a new anti-malarial treatment policy be adopted only when a therapy has an average cure rate ≥ 95% [2]. The WHO also recommends the use of survival analysis to generate efficacy estimates; however, in practice researchers adopt a variety of statistical methods tailored to the rationale of the specific study [3][4][5]. The derived estimates are confounded further by variations in the PCR correction methods used to distinguish recrudescent infections from new infections [1,6]. These methodological differences undermine attempts to monitor and compare cure rates between locations and over time and significantly limit the utility of clinical trials to guide policy [7].
In general, anti-malarial efficacy can be calculated using three approaches: per protocol, intention-to-treat and modified intention-to-treat. In the per protocol analysis (PP) the evaluable population includes only those patients who are followed throughout the protocol-defined follow-up period and in whom a clear treatment outcome can be determined. In this approach, patients deviating from the protocol, such as those who do not complete follow-up, are excluded from the analysis. Intention-to-treat analysis (ITT) adopts a conservative approach often advocated for comparative drug trials, in which all patients randomized to treatment are included in the analysis and patients with incomplete follow-up who do not reach the primary outcome of interest are generally considered treatment failures. In the third approach, the modified intention-to-treat analysis (mITT), survival analysis is used and patients with incomplete follow-up who do not reach the primary outcome of interest are included in the analysis as non-failures, but censored on the last day of follow-up. WHO guidelines and several recent consensus papers advocate modified ITT survival analysis as the most appropriate method for monitoring anti-malarial efficacy [3,5,7,8].
The aim of the current study was to quantify the magnitude of the differences between efficacy estimates derived from survival analysis of a mITT approach with that of simple proportions from PP and ITT approaches and to identify factors that influence these differences. Data were compiled from 29 comparative anti-malarial clinical trials conducted in Africa and Thailand and the derived estimates of treatment failure compared.

Data sources for analysis comparisons
Individual patient data were available from 14 comparative clinical trials conducted in Thailand between 1993 and 2005 and from 15 comparative clinical trials conducted in Uganda and Burkina Faso between 2003 and 2007 (Additional File 1). Data were included only for patients enrolled with uncomplicated malaria due to P. falciparum. Drug treatment was supervised in all patients, with daily observation until at least day 3 followed by weekly visits up to 28, 42, or 63 days.

Thai studies
The studies in Thailand were carried out in a camp for displaced persons of the Karen ethnic minority on the western border of Thailand [9]. Transmission of malaria here is unstable and seasonal, with peaks in May through July and December through January [10]. The estimated entomological inoculation rate (EIR) and corresponding incidence of malaria are low (approximately 0.5 to 1.5 cases/person/year), with prevalence rates of 1-4% for P. falciparum. Overall, P. falciparum accounts for 37% of malaria infections, with the remainder due to P. vivax. All P. falciparum infections and approximately 90% of P. vivax infections are symptomatic. In Thailand, patients of all ages were enrolled, provided that they weighed more than five kilograms. Pregnant women and patients with severe disease were excluded.

African studies
The studies in Africa were conducted in Bobo-Dioulasso, Burkina Faso and in several study sites in Uganda. Patients recruited were six months of age or older with no evidence of severe disease. Plasmodium falciparum accounts for nearly 100% of all malaria cases in these regions. In Burkina Faso, malaria is seasonal, with transmission peaking during the rainy season from May to October. All patients were recruited from governmental health clinics. Studies in Uganda were conducted in areas of moderate to high transmission intensity, with peaks during two rainy seasons from March to May and from August to September. Three studies, in Kampala, Apac, and Tororo, were conducted in children only. Patients were recruited from district health clinics participating in the Ugandan Malaria Surveillance Project, household sampling, or from other outpatient clinics.

Malaria outcome classification
The key parameters for deriving the efficacy estimates were coded identically for all studies, as described previously [7]. Outcomes were classified according to the 2006 WHO guidelines as adequate clinical and parasitological response (ACPR), early treatment failure (ETF), late clinical failure (LCF), late parasitological failure (LPF), or follow-up interrupted (Table 1). For 24 of the 29 (83%) studies, parasites were genotyped to distinguish recrudescent and new infections due to P. falciparum, as previously described [1,11]. All ETFs were considered to be due to recrudescence. Patients meeting the criteria for LCF or LPF in whom genotyping was done but results were inconclusive or unavailable were classified as unsuccessfully genotyped.

Statistical analyses
The risks of failure, unadjusted and adjusted by genotyping, were derived for each treatment arm of the individual studies and compared using three analytical methods: per protocol (PP), intention-to-treat (ITT), and modified intention-to-treat (mITT). Although the general principles behind these analytical approaches are well described, in practice subtle differences arise in the way in which the outcome measures may be classified. For the purpose of the present analysis, treatment outcomes were classified as summarized in Table 2. In the ITT analyses, the evaluable population for both the unadjusted and adjusted calculations included all patients enrolled in the study. In the PP analysis, the evaluable population included only patients classified as ACPR or recurrent parasitemia with P. falciparum (ETF, LPF, LCF) in the unadjusted calculations and only patients classified as ACPR, ETF or LCF/LPF due to recrudescence in the adjusted calculations. In the mITT analyses, the evaluable population for both the unadjusted and adjusted calculations included all patients enrolled in the study, with the exception that LCF/LPF outcomes with unsuccessful genotyping were excluded from the adjusted calculations. In the PP and ITT analyses, the risk of failure for each treatment group was calculated as the number of patients classified as failures (the numerator) divided by the number of patients in the evaluable population (the denominator). In the mITT analyses, the risk of failure was calculated using the Kaplan-Meier product limit formula, with data censored for patients who were not classified as failures or whose follow-up was interrupted. For the unadjusted calculations, patients with follow-up interrupted and non-falciparum new infections were censored on the last day of observation. For the adjusted calculations, censored patients also included those with new P. falciparum infections.
Risk of failure estimates derived from the three analytical methods are provided in Additional File 2.
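The three adjusted calculations described above can be illustrated with a minimal Python sketch on a hypothetical treatment arm. The outcome codes, the `Patient` record and the toy cohort are illustrative assumptions and do not reproduce the study's coding scheme or data. Because Table 2 is not reproduced here, the ITT function assumes that every outcome other than ACPR counts as a failure.

```python
from dataclasses import dataclass

# Hypothetical outcome codes for this sketch (not the study's coding scheme).
ACPR = "ACPR"                # adequate clinical and parasitological response
RECRUDESCENCE = "RECRUD"     # ETF, or LCF/LPF genotyped as recrudescence
NEW_PF = "NEW_PF"            # LCF/LPF genotyped as new P. falciparum infection
INTERRUPTED = "INTERRUPTED"  # follow-up interrupted
UNGENOTYPED = "UNGENOTYPED"  # LCF/LPF with unsuccessful genotyping


@dataclass
class Patient:
    day: int       # day of the outcome, or last day of observation
    outcome: str


def pp_risk(patients):
    """Adjusted per protocol: only ACPR and recrudescence are evaluable."""
    evaluable = [p for p in patients if p.outcome in (ACPR, RECRUDESCENCE)]
    failures = sum(p.outcome == RECRUDESCENCE for p in evaluable)
    return failures / len(evaluable)


def itt_risk(patients):
    """ITT: all enrolled patients; ASSUMPTION: anything but ACPR is a failure."""
    failures = sum(p.outcome != ACPR for p in patients)
    return failures / len(patients)


def mitt_risk(patients):
    """Adjusted mITT: Kaplan-Meier product limit estimate of cumulative risk.

    Unsuccessfully genotyped patients are excluded; every patient who is not
    a recrudescence is censored on the last day of observation.
    """
    evaluable = [p for p in patients if p.outcome != UNGENOTYPED]
    at_risk = len(evaluable)
    survival = 1.0
    for day in sorted({p.day for p in evaluable}):
        on_day = [p for p in evaluable if p.day == day]
        failures = sum(p.outcome == RECRUDESCENCE for p in on_day)
        survival *= 1 - failures / at_risk
        at_risk -= len(on_day)  # failures and censored patients leave the risk set
    return 1 - survival


# Toy arm of 20 patients followed for 28 days (invented numbers).
cohort = (
    [Patient(14, RECRUDESCENCE)] * 2
    + [Patient(21, NEW_PF)] * 4
    + [Patient(7, INTERRUPTED)] * 2
    + [Patient(28, UNGENOTYPED)] * 1
    + [Patient(28, ACPR)] * 11
)

print(f"PP   {pp_risk(cohort):.3f}")    # 2/13
print(f"ITT  {itt_risk(cohort):.3f}")   # 9/20
print(f"mITT {mitt_risk(cohort):.3f}")  # 2/17
```

In this toy arm the ordering ITT > PP > mITT arises because patients with new infections are dropped from the PP denominator but remain in the mITT risk set until the day they are censored.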
The relationship between the proportion of patients with incomplete follow-up and the difference in risk between each pair of methods for estimating the risk of failure was then assessed. Incomplete follow-up included any outcome category (listed in Table 2) for which the classification as success, failure, censored or excluded differed between any of the three analytical methods. In the unadjusted analyses, incomplete follow-up was defined as any patient in whom follow-up was interrupted and those with non-falciparum new infections. In the adjusted analyses, incomplete follow-up was defined as any patient in whom follow-up was interrupted, those with non-falciparum new infections, those with P. falciparum new infections, and those with unsuccessful genotyping.
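The two definitions of incomplete follow-up can be expressed as a short Python sketch. The category names and the example counts are hypothetical; the denominator is taken to be all patients enrolled in the arm, which is consistent with the overall figures reported in the abstract.

```python
def incomplete_follow_up_proportion(counts, adjusted):
    """Proportion of enrolled patients with incomplete follow-up in one arm.

    counts maps an outcome category (hypothetical names) to a patient count.
    """
    categories = ["interrupted", "non_pf_new_infection"]
    if adjusted:
        # The adjusted definition adds new P. falciparum infections and
        # recurrences that could not be genotyped.
        categories += ["pf_new_infection", "unsuccessful_genotyping"]
    enrolled = sum(counts.values())
    return sum(counts.get(c, 0) for c in categories) / enrolled


# Invented counts for a single treatment arm of 100 patients.
arm = {
    "acpr": 70,
    "recrudescence": 5,
    "interrupted": 10,
    "non_pf_new_infection": 5,
    "pf_new_infection": 8,
    "unsuccessful_genotyping": 2,
}

unadjusted = incomplete_follow_up_proportion(arm, adjusted=False)  # 15/100
adjusted = incomplete_follow_up_proportion(arm, adjusted=True)     # 25/100
```

By construction the adjusted proportion can never be smaller than the unadjusted one for the same arm.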
Since study characteristics differed systematically between Thailand and Africa, stratified analyses were used to evaluate factors that may contribute to the pairwise differences in the risk of failure between the analytical methods for both the adjusted and unadjusted calculations. The following potential factors associated with incomplete follow-up were included in the analysis: the location of the study (Africa or Thailand), the duration of study follow-up (28, 42, or 63 days), and malaria transmission intensity (classified as low (EIR < 1), moderate (EIR 1 to 100) and high (EIR > 100)).
All analyses were performed with Stata, version 10 (StataCorp, College Station, Texas). A p-value < 0.05 was considered statistically significant.
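The transmission intensity classification can be written as a small helper. This is a sketch only; the function name is hypothetical and the cut-offs are those stated above, with both boundary values (EIR of 1 and of 100) treated as moderate.

```python
def transmission_intensity(eir):
    """Classify a site by entomological inoculation rate (EIR)."""
    if eir < 0:
        raise ValueError("EIR cannot be negative")
    if eir < 1:
        return "low"
    if eir <= 100:
        return "moderate"
    return "high"


# Invented EIR values, one per class plus the two boundaries.
examples = {eir: transmission_intensity(eir) for eir in (0.5, 1, 100, 150)}
```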

Results
In total, 29 drug studies were included in the analysis, with 65 treatment arms that enrolled 15,409 patients. Five (17%) trials in Thailand that included eight treatment arms were conducted prior to the introduction of genotyping and thus were not included in the adjusted analyses. Of the 15 studies conducted in Africa, the duration of follow-up was 28 days in 12 (80%) studies and 42 days in 3 (20%) studies. Ten (67%) trials conducted in Africa were conducted in areas of moderate transmission intensity and the remainder were conducted in areas of high transmission. Of the 14 studies conducted in Thailand, the duration of follow-up was 28 days for one (7%) study, 42 days for five (36%) studies, and 63 days for eight (57%) studies. All Thai studies were conducted in an area of low intensity transmission. Clinical outcomes for each location (Africa and Thailand) are summarized in Table 1.

Incomplete follow-up
For analyses unadjusted by genotyping, incomplete follow-up can be divided into two categories: patients whose follow-up is interrupted prior to reaching a defined endpoint (i.e. lost to follow-up) and recurrent malaria due to non-falciparum infections (

Comparison of ITT and mITT analyses
The unadjusted risk of treatment failure derived by ITT analysis was consistently higher than that derived by mITT analysis (median difference = 4.7% [-0.3-31.8%]) (

Comparison of PP and mITT analyses
The unadjusted risk of treatment failure derived from the PP analyses was consistently higher than that derived from the mITT analyses (Table 4). The median difference (PP-mITT) in Thailand was 1.9% (range 0-10.6) and was correlated with both the proportion of patients with incomplete follow-up (p = 0.02) and the duration of the study (p = 0.03). The difference in the estimates was significantly smaller in African studies (median = 0.1% In the adjusted analyses the median difference between estimates was 1.7% (range 0-30.9) and was correlated significantly with the proportion of patients with incomplete follow-up (p < 0.0001; Figure 2) in both Africa and Thailand. The difference was greater in Africa (median 3.2% [range 0-30.9]) compared to Thailand (median 1.0% [range 0-6.9]; p = 0.033).

Discussion
Anti-malarial drug clinical trials are conducted both to monitor anti-malarial drug resistance and to compare treatment regimens. As in all clinical trials, protocol violations and incomplete patient follow-up challenge the analysis and interpretation of the results. Malaria studies are, by their nature, logistically difficult, often being conducted in poorly resourced communities and prone to varying patient adherence to protocols. In addition to problems related to protocol adherence, anti-malarial clinical trials are also confounded by interrupted follow-up resulting from recurrent infections, either by the same or different malaria species. The statistical approach to deal with these challenges can vary according to the rationale of the study [3,12]. For instance, in comparative studies a conservative approach (i.e. intention to treat, ITT), in which all patients are included in the analysis but those with incomplete follow-up are classified as treatment failures, is often advocated. In contrast, when monitoring anti-malarial drug resistance, the objective is to determine the risk of failure, with failure limited to those with a clearly inadequate response to therapy. Patients with incomplete follow-up can be either dropped from the analysis (e.g. per protocol, PP), or included in a survival analysis with censoring as "non-failures" on the last day of follow-up (modified intention to treat, mITT). The WHO currently recommends the latter as the preferred method of analysis of malaria drug efficacy studies [8], although it accepts the option of per protocol analysis. In this paper, three analytical methods were compared using drug trials conducted in Thailand, Uganda, and Burkina Faso to determine the degree of variation in the derived estimates of efficacy and the factors underlying this variation.
[Figure: Relationship between incomplete follow-up and the risk difference between adjusted estimates from PP and mITT analysis]
The studies presented come from two highly experienced research groups, and although the proportion of patients with interrupted follow-up (i.e. incomplete follow-up due to reasons other than recurrent infections) was generally low, this rose to as high as 36%. Interrupted follow-up was greater in the Thai studies compared to those conducted in Africa, in part because of the longer duration of study follow-up in Thailand. The occurrence of new P. falciparum infections or relapse of P. vivax infection generally requires retreatment and termination of the primary study. Even in study populations with the highest adherence rates, these proportions can often exceed a third of all patients enrolled (Additional File 1), reducing considerably the per protocol population. Predictably, incomplete follow-up was higher for the adjusted estimates, which distinguish recurrent infections, and in the African studies this was more apparent in studies conducted in areas of high transmission.
The proportion of patients with incomplete follow-up has significant implications for the derived estimates of treatment efficacy. Both the ITT and PP methods consistently over-estimated the risk of failure when compared to the preferred mITT method, the discrepancy in risk estimates varying from trivial to highly significant. For example, in the comparison of the unadjusted ITT and mITT failure estimates, the difference in estimates exceeded 5% in 46% (30/65) of treatment arms, with one having a difference of 31.8%. The bias was most pronounced in Thailand due to the high percentage of patients with incomplete follow-up. These findings highlight that although the ITT method of analysis has utility for conservatively comparing treatment arms within a comparative drug trial, it is significantly biased when deriving point estimates of efficacy for comparison over time or geographical location.
New infection with P. falciparum constituted an additional confounding factor for the adjusted analyses (PP-mITT). Whereas individuals with such infections are removed from the PP analysis, they are censored in the mITT analysis after contributing a period of observation to the cumulative risk during which treatment failure was not observed. As a consequence, the PP analysis consistently overestimates treatment failure compared to that derived by the mITT survival analysis (median = 1.7% [IQR 0.5-5.6]). In 28% (16/57) of cases this difference exceeded an absolute value of 5%. The discrepancy was particularly apparent in the high transmission sites in Africa where reinfections were highest. The differences in risk estimates were lower for the unadjusted analyses, although in Thailand, high relapse rates with P. vivax and greater loss to follow-up resulted in 18.5% (5/27) of PP estimates deviating by more than 5% from the mITT estimate.
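The mechanism can be made explicit with a simplified worked example. The setup is an illustrative assumption, not the study data: of n patients enrolled, f have a recrudescence, all on day t_1, c have a new infection, all on day t_2, and the remainder complete follow-up without failure.

```latex
% PP: patients with new infections are removed from the denominator.
\hat{R}_{\mathrm{PP}} = \frac{f}{n - c}

% mITT (Kaplan-Meier): patients with new infections are censored on day t_2.
% If recrudescences precede new infections (t_1 < t_2), all n patients are
% still at risk when the failures occur:
\hat{R}_{\mathrm{mITT}} = 1 - \left(1 - \frac{f}{n}\right) = \frac{f}{n}

% If new infections precede recrudescences (t_2 < t_1), the risk set has
% already shrunk to n - c and the two estimates coincide:
\hat{R}_{\mathrm{mITT}} = 1 - \left(1 - \frac{f}{n - c}\right) = \frac{f}{n - c}

% Difference in the first case:
\hat{R}_{\mathrm{PP}} - \hat{R}_{\mathrm{mITT}}
  = \frac{f}{n - c} - \frac{f}{n}
  = \frac{f\,c}{n\,(n - c)}

% Example: n = 100, f = 5, c = 50 gives
% \hat{R}_{PP} = 0.10, \hat{R}_{mITT} = 0.05, difference = 0.05.
```

Under these assumptions the PP estimate is never lower than the mITT estimate, and the gap widens both with the number of new infections and with the length of time that censored patients remain under observation before they are censored.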
Survival analysis is being used increasingly to derive estimates of anti-malarial treatment efficacy. Although the ease of calculating the simple proportions in the PP analysis retains its appeal, and these estimates continue to be reported frequently in the literature, caution is needed when generating temporal and geographical trends using different analytical methods. This is particularly true for studies with poorer patient adherence to follow-up, higher incidence of P. vivax relapse, or a high incidence of new P. falciparum infections. Furthermore, since the proportion of reinfections and relapses observed in clinical trials is dependent upon the efficacy of the drug and its pharmacokinetic properties, the potential bias introduced by methodologies has implications for the comparative analysis of antimalarials.
Given the variations in study methods, survival analysis remains the preferred approach for monitoring in vivo efficacy. First, survival analysis allows for all available data to contribute to the analysis, thus increasing the precision of the derived estimates. Second, it avoids systematic biases introduced by dropping from the analysis patients who do not complete follow-up (PP) or classifying patients who do not complete follow-up as failures (ITT). Finally, survival analysis allows for data from patients with different follow-up periods to be combined to generate efficacy estimates at different time points, thus enabling direct comparison between studies with different lengths of follow-up [7].
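The last point can be illustrated with a minimal Python sketch that pools two hypothetical arms with 28-day and 63-day follow-up and reads the Kaplan-Meier cumulative risk of failure at days 28, 42 and 63. The function names and all patient records are invented for illustration.

```python
def km_failure_curve(records):
    """Kaplan-Meier cumulative risk of failure.

    records: iterable of (day, failed) pairs, where failed is True for a
    treatment failure and False for a patient censored on that day.
    Returns a list of (day, cumulative_risk) steps, one per failure day.
    """
    records = list(records)
    at_risk = len(records)
    survival = 1.0
    curve = []
    for day in sorted({d for d, _ in records}):
        on_day = [failed for d, failed in records if d == day]
        failures = sum(on_day)
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((day, 1 - survival))
        at_risk -= len(on_day)
    return curve


def risk_at(curve, day):
    """Cumulative risk at a given day (value of the last step at or before it)."""
    risk = 0.0
    for step_day, step_risk in curve:
        if step_day <= day:
            risk = step_risk
    return risk


# Hypothetical arm A, 28-day follow-up: 2 failures, 48 censored on day 28.
arm_a = [(14, True)] * 2 + [(28, False)] * 48
# Hypothetical arm B, 63-day follow-up: 4 failures, 46 censored on day 63.
arm_b = [(21, True)] + [(35, True)] * 2 + [(56, True)] + [(63, False)] * 46

pooled = arm_a + arm_b
curve = km_failure_curve(pooled)
for day in (28, 42, 63):
    print(f"day {day}: cumulative risk {risk_at(curve, day):.3f}")
```

In this sketch the estimates beyond day 28 rest only on the patients from arm B, so their precision is lower than that of the day 28 estimate.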
Over the last decade it has become evident that the wider availability of highly effective anti-malarial regimens must be an integral part of any realistic approach to achieving the global elimination of malaria [13]. Current international guidelines advocate that new anti-malarial treatments should be introduced only if they yield cure rates greater than 90%. Once introduced, the efficacy of such novel regimens needs to be monitored regularly in order to detect early signs of declining efficacy. Even small fluctuations in risk estimates or wide confidence intervals can have huge implications for policy makers. In order to monitor temporal and spatial trends in anti-malarial efficacy, in vivo efficacy data need to be collated at an individual patient level and standardized analytical tools applied in a transparent and systematic manner [7]. The recently launched WorldWide Antimalarial Resistance Network (WWARN; http://www.wwarn.org) aims to do precisely that: to gather global anti-malarial efficacy data and provide open access to their uniform interpretation.