- Open Access
Patient variability in the blood-stage dynamics of Plasmodium falciparum captured by clustering historical data
Malaria Journal volume 21, Article number: 300 (2022)
Mathematical models provide an understanding of the dynamics of a Plasmodium falciparum blood-stage infection (within-host models), and can predict the impact of control strategies that affect the blood-stage of malaria. However, the dynamics of P. falciparum blood-stage infections are highly variable between individuals. Within-host models use different techniques to capture this inter-individual variation. This struggle may be unnecessary because patients can be clustered according to similar key within-host dynamics. This study aimed to identify clusters of patients with similar parasitaemia profiles so that future mathematical models can include an improved understanding of within-host variation.
Patients’ parasitaemia data were analyzed to identify (i) clusters of patients (from 35 patients) that have a similar overall parasitaemia profile and (ii) clusters of patients (from 100 patients) that have a similar first wave of parasitaemia. For each cluster analysis, patients were clustered based on key features which previous models used to summarize parasitaemia dynamics. The clustering analyses were performed using a finite mixture model. The centroid values of the clusters were used to parameterize two established within-host models to generate parasitaemia profiles. These profiles (that used the novel centroid parameterization) were compared with profiles that used individual-specific parameterization (as in the original models), as well as profiles that ignored individual variation (using overall means for parameterization).
To capture the variation of within-host dynamics, when studying the overall parasitaemia profile, two clusters efficiently grouped patients based on their infection length and the height of the first parasitaemia peak. When studying the first wave of parasitaemia, five clusters efficiently grouped patients based on the height of the peak and the speed of the clearance following the peak of parasitaemia. The clusters were based on features that summarize the strength of patient innate and adaptive immune responses. Parameterizing previous within host-models based on cluster centroid values accurately predict individual patient parasitaemia profiles.
This study confirms that patients have personalized immune responses, which explains the variation of parasitaemia dynamics. Clustering can guide the optimal inclusion of within-host variation in future studies, and inform the design and parameterization of population-based models.
Malaria continues to be a global health priority. In 2020, the World Health Organization (WHO) estimated that malaria caused 241 million cases and 627,000 deaths worldwide . Plasmodium falciparum was responsible for the majority of the morbidity and mortality associated with malaria . Most cases and deaths occurred in the African region, and mainly affected children under 5 years old who had not yet developed an efficient adaptive immune response .
The P. falciparum life cycle involves several stages in two different hosts: within the insect vector (female Anopheles mosquito) and within the human host, including a liver-stage and blood-stage cycle [2, 3]. The dynamics of the blood-stage have a great influence on the epidemiology of malaria. For the individual, the morbidity and mortality of patients depend on the parasite density during the blood-stage [2,3,4,5,6]. For the population, the blood-stage dynamics influence the time that patients are infectious and their infectiousness to mosquitoes, which determine the transmission of malaria [2, 3, 7, 8].
The most detailed data of blood-stage dynamics of P. falciparum is from the malariatherapy studies . The studies were conducted by the National Institutes of Health laboratories in the USA when malariatherapy was used to treat neurosyphilis (1931–1963) . All patients were Afro-American adults suffering from neurosyphilis and had no previous exposure to malaria . The dataset contains daily measurements of the P. falciparum density for 334 patients from the infection day until the day that they recovered, due to treatment or spontaneously . This dataset highlighted the extreme inter-patient variability of parasitaemia density profiles during the blood-stage, for example, the natural infection lengths of patients varied between 37 and 405 days . These variations are due to the complex interactions between the host and the parasite, variations among individual immune responses, and between parasite strains [10,11,12,13,14,15,16,17,18,19,20].
Many within-host mathematical models have investigated blood-stage dynamics of P. falciparum, and reproduced the parasitaemia dynamics of malariatherapy patients by considering the variations of host immune responses and parasite strains [21,22,23,24,25,26,27,28]. However, because of the extreme variation between individuals, predicting a parasitaemia profile for a specific patient generally relied on model parameters that were case-specific or stochastically chosen from assumed distributions. Furthermore, models were often calibrated to patient parasitaemia profiles using summary statistics which ignored individual variation across these statistics [21,22,23,24,25,26,27,28]. Consequently, some of these models cannot be used to predict patient a specific parasitaemia profile without knowing the parasitaemia profile of the patient beforehand. Moreover, a recent review and analysis highlighted that the outcomes of within-host models are likely not robust, as a slight variation of parameter value created totally different predictions . This review suggested that the efficiency of within-host models to capture variation in infection dynamics depends more on the selection of the best-fitting simulations than the value of the parameters .
Including patient variation of within-host dynamics in transmission models for malaria, at the population level, is also challenging. Compartmental models, such as [30, 31], assume that all hosts have homogenous within-host dynamics. Therefore, their use is limited when estimating the effect of control strategies that directly impact the asexual stage. In contrast, individual-based models (IBMs) include individual variability of blood-stage dynamics . This can make them computationally difficult to simulate , and they require many parameters, so fitting to data is complex [33, 34].
This paper aimed to find the balance between assuming a homogeneous population and modelling each individual specifically. Attributes that capture the essential information of the parasitaemia profile for an individual were selected. Clustering based on these attributes provided centroid values, so that most of the individual variation could be accounted with few additional parameters.
First, clusters of patients with similar full parasitaemia dynamics were identified. From the malariatherapy studies described above , only 35 patients had a complete parasitaemia profile. Thus a second cluster analysis was performed to identify patients with a similar first wave of parasitaemia. For this analysis, the sample size increased to 100 individuals and thus accounted for more individual variation. Then, the ability of the clusters to capture the variability of within-host dynamics was evaluated using within-host models [21, 22]. Two within-host models were used: one that predicted the parasitaemia density for a patient throughout an untreated infection (Molineaux et al. ), and one that predicted the first wave in the parasitaemia density (Dietz et al. ). Parasitaemia profiles were predicted using case-specific parameters (as in the original papers), using cluster-specific parameters, and lastly, using parameters derived from the means of the whole dataset. These predictions were compared to assess the ability of the clusters to capture the variability of within-host dynamics.
Moreover, the distribution of non-parasitaemia attributes, such as the sex of the patient and the strains used for the infection, were compared across the clusters. Identifying clusters of patients with similar parasitaemia dynamics will improve the inclusion of patient within-host variation in future within-host and population-based models whilst remaining computationally inexpensive.
In this section, the methods for both datasets and clustering analysis are provided separately. The first section describes the data, the potential attributes which could be used to cluster the data, and the clustering method. Then, the second section describes the method to evaluate the clusters ability to capture patients variability using within-host models [21, 22]. The last section describes the method used to investigate how non-parasitaemia attributes, such as the sex of the patient and the strains used for the infection, correlate with the clusters.
Clustering the full parasitaemia profile
Data from patients that had a complete natural parasitaemia dynamic profile, not affected by curative treatment, were used to identify clusters of patients with a similar natural parasitaemia dynamic . In the malariatherapy dataset, 35 patients fulfilled these criteria. In this subset, 16 patients received a low-dose treatment (quinine, chloroguanide, or chloroquine). However, these patients were included in the analysis because the treatment had only a limited and short effect on the parasitaemia dynamic . Patients were infected either by inoculation of infected blood (18 cases) or by mosquito bites (17 cases). Patients were inoculated with different strains of P. falciparum: 17 with El Limon, 17 with Sante Cooper, and 1 with the McLendon strain . Parasitaemia levels were measured daily by microscopy [the detection limit was equal to ten parasitized red blood cells per microlitre (PRBC/µl)].
To identify clusters of patients with similar natural parasitaemia dynamics, the 35 patients were clustered based on nine key features used by several modelling papers to summarize a patient’s parasitaemia profile (Table 1) . Using these features as attributes, the data was clustered with all possible combinations of pairs of attributes. Clustering with three or more attributes was not feasible because the dataset was relatively small, so three or more attributes led to unreliable clusters.
Clustering assumed that the features were independent, which, strictly speaking, they were not in this case, as all nine features arose from the parasitaemia profile. Nonetheless, clustering remained a relevant analysis technique since the pairwise dependencies between the features were not straightforward. For all combinations of two standardized key features, clusters of individuals were identified using a finite mixture model fitted by expectation maximization . This method was used because it allowed for more variance in the size and shape of the clusters and more variance in the range of the key features compared to classical clustering approaches, such as k-means clustering.
The optimal number of clusters was determined for each combination using the Bayesian Information Criterion index (BIC) . The optimal number of clusters is a balance between a good fit (how well the cluster represented each patient which belonged to that cluster) and the number of clustering parameters (number of clusters). Later, when using the cluster centroids to predict the parasitaemia profiles, the effect of having more than the ‘optimal’ number of clusters was also explored.
Finally, the robustness of the clusters was assessed using leave-one-out analysis. The leave-one-out analysis involved performing the clustering without one patient and then assigning this patient to a cluster based on the chosen attributes. This process was repeated for all patients. A successful cluster assigned the left-out patient to the cluster to which they were assigned when they were included. The proportion of patients correctly assigned to their cluster was referred as the robustness score, such that a score of unity was perfect.
Evaluating the clusters
Molineaux et al.  considered many biological aspects of the dynamics and their variability: intraconal antigenic variation (50 variants); variation of the variant baseline growth rate (variable among variant and host); innate immune response (variable among host); acquired variant-specific immune response corresponding to the variant-specific antibodies that target Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) expressed on the red blood cell membrane; acquired variant transcending immune response (variable among host) representing the antibodies that target merozoite surface proteins; and the measurement error . Molineaux et al.  predicted an individual parasitaemia density on a given day based on the parasitaemia density 48 h previously, the multiplication rate of the antigenic variants, and the immune responses. The strength of the immune responses was determined by two critical parasite densities that were specific to each patient:
Pm was the host critical density that determined the transcending adaptive immune system strength (named Pm* in Molineaux et al. ). It was based on the infection length of the patients ((ix) from Table 1).
For each patient, the multiplication rates for the variant’s baseline growth rates were sampled from a normal distribution centred around 16 . The final simulated profile was determined by comparing (using chi-squared) the nine key features (Table 1) from 50 simulated profiles with the key features from the actual patient profile. Note that Molineaux et al.  used case-specific data when first defining Pc and Pm and again when choosing from the 50 simulated profiles.
The novelty of this approach is to predict the parasitaemia dynamic of patients based on their respective clusters by parameterizing the model of Molineaux et al.  using cluster-specific parameter values for Pc and Pm instead of patient-specific parameter values. This means that all patients from the same cluster had the same parasitaemia profile based on the centroid values of that cluster. Note that the strength of the acquired variant-specific immune response in the model of Molineaux et al.  was not patient-specific and was, therefore, not relevant for clustering. For comparison, the model was also parameterized to each patient, replicating Molineaux et al.  (which corresponded to parameterizing the model to 35 clusters that each contain only one patient), and to all patients (which corresponded to parameterizing the model to one cluster that contains all patients). For each parameterization, the same methodology as Molineaux et al.  was followed except for the parameterization step and the selection of the best simulation (Table 2). Then, for each parameterization, the absolute error in the model’s prediction of the infection length, the first parasitaemia maximum of each patient, and a weighted error of these two errors was calculated. The weighted error was defined as the sum of the standardized error of the infection length and the standardized error of the log of the first parasitaemia maximum.
Clustering analysis of first parasitaemia wave
The same subset of patients used by Dietz et al.  was selected from the full malariatherapy dataset to identify clusters of patients that have a similar first wave of parasitaemia. This subset fulfilled the following criteria: The maximum of the first wave was the absolute maximum of the case, the patient had no curative or suppressive treatment, the patient had no missing parasitology data on even days, the days in which the patient had a fever were known, the patient had at least five parasitaemia measurements, the first observed density was less than 320 PRBC/µl, the log(maximum /last density) was higher than 0.6, and the log(last/penultimate density) was lower than 0.3 . These criteria were fulfilled by 100 patients. Patients were either infected by infected blood inoculation (67 cases) or by mosquito bites (33 cases) . Patients were infected with different strains of P. falciparum: 50 patients with McLendon, 25 with El Limone, 21 with Sante Cooper, 3 with Colombia, and 1 with the Costa strain . Parasitaemia levels were measured daily by microscopy (detection limit equal to 10 PRBC/µl) .
To identify clusters of patients with a similar first wave of parasitaemia, the 100 patients were clustered using five key features that summarized the dynamics of the first wave of parasitaemia, and were used as case-specific parameters by Dietz et al.  (Table 3). As in the previous section, for all pairwise combinations of standardized key features, clusters of individuals were identified using a finite mixture model fitted by expectation maximization . As before, the optimal number of clusters was determined for each combination using the BIC, and the robustness of the clusters was assessed using leave-one-out analysis .
Evaluating the clusters
The model from Dietz et al.  was used to assess how well each cluster predicted the first wave of parasitaemia for each individual. This model reproduced the first wave for the same 100 patients. Dietz et al.  simplified the model from Molineaux et al.  to consider only one PfEMP1 variant, and thus, instead of modelling three immunity components, only two were required: innate immunity and acquired immunity. Dietz et al. used four case-specific parameters to reproduce the natural first wave of patients (Table 4) .
To predict the typical first parasitaemia wave of patients based on their respective cluster, the model was parameterized to the clusters using the cluster-specific versions of these parameters (Tables 4 and 5). Dietz et al.  used the Powell Hill-climbing algorithm to calculate the case-specific parameters. Here, the parameters were directly calculated as defined in Table 4. To improve the accuracy of the prediction, the calculation of the cumulative effective density (Pm) was modified: 4830 divided by the ratio between the maximum density and the last density of the first parasitaemia wave. The improvement in the accuracy of the model is detailed in Additional file 1.
As before, the model was parameterized to each patient (which corresponded to parameterizing the model to 100 clusters that each contain only one patient) and to all patients (which corresponded to parameterizing the model to one cluster that contains all patients) (Table 5). The error of the predicted parasitaemia measurements was calculated for each patient.
For both clustering analyses, the distributions of all available external attributes across the identified clusters were examined: the sex of patients, the use of a repressive treatment, the strains used for the infection (Santee Cooper, Mc Lendon, El Limon, Colombia, Costa), and the method of infection (blood induction or sporozoite induction). For the full parasitaemia profile, the patient’s fever and gametocyte profiles were additionally compared for each cluster. This comparison was not made for the first wave of parasitaemia, as many patients received treatment that biased their fever and gametocyte profile.
Clustering analysis of parasitaemia profile
The strongest pairwise clustering was found to contain the length of infection (ix) and the density of the first maximum of parasitaemia (iii) (Fig. 1). The optimal number of clusters defined by the BIC was two (robustness score = 32/35 = 0.91). Patients were grouped in a big cluster of 30 patients (cluster A) and a small cluster of five patients (cluster B) (Fig. 1). Patients in cluster A had a long infection length and high density at the first parasitaemia maximum (centroids value of cluster A: 233 days, 4.91 log(PRBC/µl). In contrast, patients in cluster B had a shorter infection length and a lower first maximum of parasitaemia (centroids value of cluster B: 97 days, 4.01 log(PRBC/µl). This indicates that patients with a higher first peak of parasitaemia tended to be infected for a longer period.
The clustering results supported the modelling choice in Molineaux et al.  to use parameters Pc and Pm, which are based on the first parasitaemia maximum and the last positive days, respectively . Parameterizing the model from Molineaux et al.  to the clusters required simply using the centroid values of the clusters instead of case-specific values for Pc and Pm
Prediction of patient’s parasitaemia profiles
The typical parasitaemia profiles for patients within each cluster were for each cluster predicted (Fig. 2). To analyze the effect of the number of clusters on the error, the typical parasitaemia profiles from clustering the data to 3, 4, 6 and 8 clusters were also predicted.
Generally, the combined mean error decreased when the number of clusters increased, as did its standard deviation (Fig. 3A). With only one cluster, the mean weighted error was 3.17. By parameterizing the model to the two identified clusters, the mean weighted error was reduced to 2.34 (reduction of 26.2% compared to one cluster). This decrease was the biggest drop in the weighted error, which arose from adding only one additional cluster. For example, a third cluster only slightly reduced the mean weighted error further to 2.15 (reduction of 32.2% compared to one cluster). Nevertheless, as expected, the model parameterized to the 35 clusters had the minimum error, with a mean weighted error of 0.97 (reduction of 69.4% compared to one cluster).
The errors for the two parasitaemia values, the error in the infection length and the first maximum of parasitaemia, were reduced by parameterizing the model to the two identified clusters (Fig. 3B and C), compared to using one cluster only. The error for the infection length predictions was reduced slightly more (reduction of 29.1% compared to one cluster) than the error for predictions of the maximum of parasitaemia (reduction of 23.6% compared to one cluster). Note that the variation in the infection length over all patients was higher than the variation in the first peak of parasitaemia, suggesting that the variation of the adaptive immune response is stronger than the variation of the innate immune response. In contrast, when comparing the model parameterized to one cluster over the model parameterized to 35 clusters, the error for the first parasitaemia maximum was reduced more (error reduction of 88.6% compared to one cluster) than the error for the infection length of patients (error reduction of 51.0% compared to one cluster). This highlights that Molineaux et al.  captured the first parasitaemia maximum better than the infection length of patients.
The weighted error was examined separately for clusters A and B under the three different parameterizations. When the model was parameterized to 1 cluster, the weighted error was higher in cluster B (mean error 5.83) than in cluster A (mean error 2.48), meaning that patients in cluster B (the smaller cluster) were considerably different to the overall average (Fig. 4A). This hypothesis was also supported by the drop in the weighted error (from 5.83 to 1.81) for cluster B once the parameterization considered these patients separately (Fig. 4B). The weighted error of cluster A dropped when the number of clusters increased from one to two, but this drop was less dramatic (from 2.48 to 2.26). As expected, parameterizing to 35 clusters reduced the error further (mean error for cluster A: 0.94, cluster B: 0.32) (Fig. 4C). In this case, the reason that cluster A had a higher weighted error may be due to more inter-patient variation .
Comparing non-parasitaemia attributes and clusters
When comparing non-parasitaemia attributes amongst the two clusters, patients from cluster B (the cluster with low parasitaemia density) did not receive treatment (Additional file 1: Figure S3), which was expected because treatment was only given to patients with a high parasitaemia load. Moreover, there was no relationship between clusters and the parasite strain, nor the method of infection, meaning that these factors did not have a strong influence on the within-host dynamic (Additional file 1: Figure S3). However, the ten female patients (out of 35) all belonged to cluster A, meaning that from this dataset, all female patients experienced a high parasitaemia load and long infection length (Additional file 1: Figure S3).
In addition, the gametocyte profile of patients also varied by clusters. Patients from cluster A had a higher maximum peak of gametocyte density and number of days with detected gametocytes (Additional file 1: Figure S4). This observation was probably because parasitaemia density and gametocyte density are related, and the fact that patients from cluster A had a longer infection length and higher parasitaemia than patients from cluster B. In addition, these results could also be because some patients in cluster A received a low-dose treatment that could have triggered the generation of gametocytes. This result highlighted that patients from cluster A were probably more infectious than patients from cluster B.
There was also a straightforward relationship between the clusters and the fever of the patients. Patients from cluster A generally experienced fever for a longer period, and they had a higher maximum fever temperature than patients in cluster B (Additional file 1: Figure S5). This was because patients from cluster A had a longer infection length and a higher parasitaemia first peak (the fever often peaks at the beginning of the infection). These results indicate that patients from cluster A were more likely to suffer from severe symptoms than patients from cluster B.
Clustering analysis of the first parasitaemia wave
The pair of attributes that gave the strongest clustering was the cumulative effective density (iii) and the maximum density of parasitaemia (v) (see Table 3 for definitions). In Dietz et al. the cumulative effective density (iii) captured the strength of the adaptive immune response, and thus determined the decreasing slope of the first parasitaemia wave . The parasitaemia maximum (v) depended on both the multiplication factor of the parasite, and the host innate immune response . Therefore, the clustering analysis confirmed that all the different shapes of the first parasitaemia waves were captured using these key features.
Due to the high variability of the data, the data were preliminary divided based on the result of the first cluster analysis. Recall that the first cluster analysis highlighted that patients either have a large first peak of parasitaemia (higher than 4.3(log(PRBC/µl)) or a small first peak (lower than 4.3 log(PRBC/µl)). After this preliminary division, two independent cluster analyses were performed on the two subsets. Two clusters, clusters 2 and 4, were identified in the subset of patients that had a low first peak of parasitaemia (centroids cluster 2: 4.04 log(PRBC/µl), cluster 4: 4.02 log(PRBC/µl) (Fig. 5). Three clusters, clusters 1, 3 and 5, were identified in the subset of patients that had a high first peak of parasitaemia (centroids cluster 1: 4.71 log(PRBC/µl), cluster 3: 4.78 log(PRBC/µl), cluster 5: 4.90 log(PRBC/µl)) (Fig. 5). Clusters 1, 2, 3, 4, and 5 grouped 45, 21, 16, 6, and 12 individuals, respectively. Moreover, in both subsets, some clusters had a low centroid value of the cumulative effective density (cluster 1: 0.63 log(PRBC/µl), cluster 2: 1.00 log(PRBC/µl)), and some clusters had a high centroid value of cumulative effective density (cluster 4: 5.02 log(PRBC/µl), cluster 5: 5.90 log(PRBC/µl)). In the subset of high first peaks of parasitaemia (clusters 1, 3 and 5), one cluster had a medium centroid value for the cumulative effective density (cluster 3: 3.12 log(PRBC/µl)).
Predictions of patients’ first parasitaemia wave
The model from Dietz et al.  was parameterized to the five identified clusters. For each cluster, the cumulative effective density was equal to the cluster centroid value. The three other parameters (Table 4) were estimated as the median parameter values of patients belonging to the cluster. Note that the onset of the adaptive immunity did not vary across the clusters (Fig. 6A). However, as expected, clusters of patients with a lower maximum density of parasitaemia had a lower multiplication rate and critical effective density (which captured the strength of the innate immunity) (Fig. 6B and C).
As expected, the predicted maximums of the first parasitaemia wave of patients from clusters 2 and 4 were smaller (4.51, 4.48 log(PRBC/µl) respectively) than the maximum parasitaemia of patients from clusters 1, 3, and 5 (4.77, 4.74, 4.8 log(PRBC/µl) respectively) (Fig. 7). In addition, the typical first parasitaemia wave for patients in clusters 1 and 2 were long (39 and 35 days respectively) (Fig. 7A and B). Their first parasitaemia wave was characterized by a strong innate immune response, active for a long period, and a progressive increase of the adaptive immune response, which caused a progressive decrease in the parasitaemia density. In contrast, the typical first parasitaemia wave for patients in clusters 4 and 5 were short (23 and 21 days, respectively) (Fig. 7D and E). Their adaptive immune response became highly efficient quickly, which caused a rapid reduction of the parasitaemia density. Consequently, the innate immune response was not activated for a long time. Finally, patients in cluster 3 were between these two behaviours with a typical first wave lasting 27 days (Fig. 7C).
The error in the predictions of the model from Dietz et al.  for each patient was examined. The model was parameterized to each patient (100 clusters), to the cluster centroids (5 clusters), and to all patients (1 cluster). As expected, the error of patient’s parasitaemia measurement decreased as the number of clusters increased (Fig. 8). When the model was parameterized to all patients, the mean error was 1.93 (log(PRBC/µl)). When the model was parameterized to the five clusters, the mean error was greatly reduced (mean error 0.90 (log(PRBC/µl)), and had a similar value to when the model was parameterized to each patient (mean error 0.66 (log(PRBC/µl))). These results mean that the five clusters could effectively capture the variation of the first wave.
When the model was parameterized using the overall means (one cluster), the error was higher for clusters 4 and 5 than for other clusters (Fig. 9A). When the model was parameterized to the five clusters, the error was reduced in every cluster (Fig. 9B). However, the distribution of the error was higher for clusters 4 and 5 than for the other clusters. This was also the case when the model was parameterized to each patient (Fig. 9C). This suggests that although clusters 1 and 2 were considerably larger than cluster 4, cluster 5 had more variation.
Comparing non-parasitaemia attributes and clusters
When comparing non-parasitaemia attributes amongst the five clusters, the results agreed with the analysis on the smaller dataset (Additional file 1: Figure S6). That is, the strains used to infect patients and the method of infection did not vary between the clusters of the first wave of parasitaemia. In addition, as with cluster B from the previous analysis, all female patients belonged to clusters characterized by a high maximum of parasitaemia (clusters 1, 3, and 5).
This study performed cluster analyses on two datasets of patients’ parasitaemia profiles based on key features which summarized the strengths of the innate and adaptive immune responses. The first dataset was the full, untreated, parasitaemia profile, and the second was the first wave of parasitaemia. The analysis provides insights in itself. For example, the clustering analysis of the full parasitaemia profile identified a smaller group of patients whose peak parasitaemia density and infection length were not well represented by the overall means. With regards to quantifying recovery and transmission rates in a population, the analysis provided reasonable estimates that largely account for heterogeneity without overloading population models with a set of parameters for each individual. Below the implications of the study are detailed.
Two clusters were identified when analysing the parasitaemia dynamics over a full infection: a large cluster of 85% of patients that had a long infection length and a high parasitaemia maximum (cluster A), and a small cluster of 15% of patients that had a short infection length and a low maximum of parasitaemia (cluster B). These results suggest that patients with a small first wave tend to have shorter infections. According to previous studies and models, the height of the first peak of parasitaemia, and the length of infection, are attributes that capture the strength of the patients innate and adaptative immunity [17, 21, 29, 37]. Consequently, these results highlighted that patients have variable immune response strengths, as reported in previous studies [10,11,12, 17,18,19,20], and suggest that patients with a strong innate immune response tend to have a strong adaptive immune response too.
Only two clusters were sufficient to capture the extreme variability of the full, untreated, parasitaemia profiles, and thus patients’ parasitaemia profiles were predicted without using case-specific data. Comparing the parameterization of one cluster to two clusters showed that only one extra cluster reduced the weighted error by 26.29%. The weighted error was especially reduced for patients with a short infection length and a low maximum of parasitaemia (cluster B). Therefore, patients in this cluster were not well represented by overall mean values. Exploring the effect of dividing the dataset into more clusters shows that it would be necessary to include many more clusters (6–8 clusters in total) to reduce the error further. However, additional parameters would increase the complexity of a model. Therefore, it is most efficient to consider only two clusters to capture within-host variation in a simple mechanistic within-host model, and to inform different recovery rates in a population-based model.
When the model from Molineaux et al.  was parameterized to each patient, the error of the infection length was high compared to the error of the first peak of parasitaemia. Therefore, the model from Molineaux et al.  is better at capturing the variation in the first peak of parasitaemia than the variation in infection length. This accuracy difference is probably because of the model stochasticity for the multiplication rate. Consider that for each patient, 50 simulations were run with different values for the multiplication factor. The resulting profiles had similar values for the parasitaemia maximum. However, the range for the predicted infection length was highly variable. Therefore, predicting the patient infection length was more sensitive to the parasite multiplication rate. This suggests that the accuracy of the predicted infection length in Molineaux et al.  is driven more by selecting the best simulation, as opposed to parameterizing the model to the data. The same conclusion was reached by a recent review of current within-host models  and a recent modelling study , which showed that a slight variation in a parameter caused a significant variability of infection length. Note that when the Molineaux et al.  model was used, a similar prediction error of the first parasitaemia maximum (0.037 log(PRBC/µl)) was obtained, but the predicted infection length of patients was on average 25 days closer to the actual infection length. This difference may have arisen from the use of different parasite multiplication factors, or they may have selected from the 50 simulations with more weighting on matching the infection length.
The resolution of the analysis was increased by clustering patients only on the first wave of parasitaemia. As in the first analysis, a minority of patients had a low first wave, and a majority of patients had a high first wave. Patients with a small first wave were clustered in a manner that separated those with a quick (cluster 2) or a slow (cluster 4) decrease of the first wave. In contrast, patients with a high first wave were clustered in a manner that separated those with a quick (cluster 1), medium (cluster 3) or slow (cluster 5) decrease of the first wave. In the model from Dietz et al. , the first peak of parasitaemia captured the strength of the innate immune response, and the decrease of the first wave captured the strength of the adaptive immune response. In consequence, these results also confirm that the strength of the immune responses among patients varies greatly. The results further highlighted that patients have even more immune response variability during the first wave than the complete parasitaemia profile. This variability is particularly relevant for patients with an initial high parasitaemia density.
For each cluster of the first wave of parasitaemia, the typical dynamics of the first wave of patients were predicted using the model from Dietz et al., and similar predictions were obtained . Moreover, parameterizing the model using five clusters instead of one cluster reduced the error by 53.4%. In comparison, parameterizing the model using 100 clusters (each patient) instead of one cluster reduced the error by 65.8%. This suggests that the clusters are extremely efficient at capturing the variability in the first parasitaemia waves. Consequently, it is most efficient to consider five clusters to capture within-host variation in a simple mechanistic within-host model, and to inform different recovery rates in a population-based model that investigates the impact of interventions that effect the first parasitaemia wave, such as blood-stage vaccine and drugs.
Investigating the distribution of non-parasitaemia attributes across clusters highlighted which non-parasitaemia attributes can potentially impact within-host dynamics. Interestingly, Female patients did not experience malaria infections with low parasitaemia density and short infection length. Variation in the infection length of malaria between genders supports recent studies . The method of inducing infection and the strain used to infect patients did not vary between clusters. Consequently, the results suggest that these factors do not strongly influence the within-host dynamics in naive individuals. Other external attributes, such as age, could not be compared with the clusters due to the lack of data. Nevertheless, age plays an important role in determining parasitaemia dynamics in the field, as when individuals get older, they gradually develop immunity to malaria due to repeated exposure to the parasites. However, in this study, all patients were naïve to malaria. Thus, age would probably be less critical than observed in the field.
The clustering analyses were performed on subsets of patients from the malariatherapy dataset, thus the conclusions are limited to this dataset, noting there is no other fully detailed set for a complementary analysis. There are limitations of the dataset that may affect the conclusions. First, the detection limit of the microscope was high (10 PRBC/µl). As a consequence, infection lengths may be underestimated in this study . Second, in the malariatherapy dataset, all patients were Afro-American adults suffering from neurosyphilis and naïve to malaria . These patients do not represent all populations at risk for P. falciparum (for example, children under 5 years old) [1, 5]. Lastly, in the malariatherapy study, patients received suppressive treatment when they exhibited clinical symptoms due to high parasitaemia density . The clustering analysis of the first parasitaemia wave included 100 patients who did not receive suppressive treatment during the first parasitaemia wave. Therefore, this dataset did not include patients experiencing very high parasitaemia density. This means that although severe malaria is infrequent in the general population [1, 5], these patients are not represented in this dataset.
Clusters of patients that have similar parasitaemia dynamics were identified using key features that captured the strength of the innate and adaptative immune response. Non-parasitaemia attributes, other than the biological sex of the patient, were not associated with variations in the within-host dynamic. Thus, these results confirm that individuals have personalized and variable strengths of both immune responses, which causes inter-patient variation in the within-host dynamics in naïve individuals. The results suggest that by capturing the variability of patients’ immune response, the variability of within-host dynamics could be efficiently captured. That is, the parasitaemia profiles of patients were reasonably predicted by parameterizing previous within-host models to the identified clusters. The identified clusters can be used to include within-host variation in simplified mechanistic within-host models, and to inform categories of patients in population-based models.
Availability of data and materials
The data analysed during the current study are publicly available . The cleaned datasets used for the analysis can be requested to the corresponding author.
Bayesian information criterion
Individual based model
Plasmodium falciparum erythrocyte membrane protein 1
Parasitized red blood cells
Red blood cells
World health Organization
WHO. World malaria report 2020: 20 years of global progress and challenges. Geneva: World Health Organization. 2020 [cited 2022 Oct 4]. Available from: https://www.who.int/publications/i/item/9789240015791.
Tuteja R. Malaria—an overview. FEBS J. 2007;274:4670–9.
Cowman AF, Healer J, Marapana D, Marsh K. Malaria: biology and disease. Cell. 2016;167:610–24.
Milner DA. Malaria pathogenesis. Cold Spring Harb Perspect Med. 2018;8:a025569.
Mackintosh CL, Beeson JG, Marsh K. Clinical features and pathogenesis of severe malaria. Trends Parasitol. 2004;20:597–603.
Miller LH, Baruch DI, Marsh K, Doumbo OK. The pathogenic basis of malaria. Nature. 2002;415:673–9.
Meibalan E, Marti M. Biology of malaria transmission. Cold Spring Harb Perspect Med. 2017;7:a025452.
Ross A, Killeen G, Smith T. Relationships between host infectivity to mosquitoes and asexual parasite density in Plasmodium falciparum. Am J Trop Med Hyg. 2006;75:32–7.
Collins WE, Jeffery GM. A retrospective examination of sporozoite and trophozoite-induced infections with Plasmodium falciparum: development of parasitologic and clinical immunity during primary infection. Am J Trop Med Hyg. 1999;66:492–502.
Natama HM, Rovira-Vallbona E, Krit M, Guetens P, Sorgho H, Somé MA, et al. Genetic variation in the immune system and malaria susceptibility in infants: a nested case–control study in Nanoro. Burkina Faso Malar J. 2021;20:94.
Flori L, Delahaye NF, Iraqi FA, Hernandez-Valladares FF, Rihet P. TNF as a malaria candidate gene: polymorphism-screening and family-based association analysis of mild malaria attack and parasitemia in Burkina Faso. Genes Immun. 2005;6:472–80.
de Mendonça VRR, Goncalves MS, Barral-Netto M. The host genetic diversity in malaria infection. J Trop Med. 2012;2012:940616.
Hamilton R, Boots M, Paterson S. The effect of host heterogeneity and parasite intragenomic interactions on parasite population structure. Proc Biol Sci. 2005;272:1647–53.
Sondo P, Derra K, Lefevre T, Diallo-Nakanabo S, Tarngda Z, Zampa O, et al. Genetically diverse Plasmodium falciparum infections, within-host competition and symptomatic malaria in humans. Sci Rep. 2019;9:127.
Ariey F, Hommel D, Le Scanf C, Duchemin JB, Peneau C, Hulin A, et al. Association of severe malaria with a specific Plasmodium falciparum genotype in French Guiana. J Infect Dis. 2001;184:237–41.
Gupta S, Hill AV, Kwiatkowski D, Greenwood AM, Greenwood BM, Day KP. Parasite virulence and disease patterns in Plasmodium falciparum malaria. Proc Natl Acad Sci USA. 1994;91:3715–9.
Molineaux L, Träuble M, Collins WE, Jeffery GM, Dietz K. Malaria therapy reinoculation data suggest individual variation of an innate immune response and independent acquisition of antiparasitic and antitoxic immunities. Trans R Soc Trop Med Hyg. 2002;96:205–9.
Walther M, Woodruff J, Edele F, Jeffries D, Tongren JE, King E, et al. Innate immune responses to human malaria: heterogeneous cytokine responses to blood-stage Plasmodium falciparum correlate with parasitological and clinical outcomes. J Immunol. 2006;177:5736–45.
Modiano D, Chiucchiuini A, Petrarca V, Sirima BS, Luoni G, Roggero MA, et al. Interethnic differences in the humoral response to non-repetitive regions of the Plasmodium falciparum circumsporozoite protein. Am J Trop Med Hyg. 1999;61:663–7.
Korbel DS, Newman KC, Almeida CR, Davis DM, Riley EM. Heterogeneous human NK cell responses to Plasmodium falciparum-infected erythrocytes. J Immunol. 2005;175:7466–73.
Molineaux L, Diebner HH, Eichner M, Collins WE, Jeffery GM, Dietz K. Plasmodium falciparum parasitaemia described by a new mathematical model. Parasitology. 2001;122:379–91.
Dietz K, Raddatz G, Molineaux L. Mathematical model of the first wave of Plasmodium falciparum asexual parasitemia in non-immune and vaccinated individuals. Am J Trop Med Hyg. 2006;75:46–55.
Johnston GL, Smith DL, Fidock DA. Malaria’s missing number: calculating the human component of R0 by a within-host mechanistic model of Plasmodium falciparum infection and transmission. PLoS Comput Biol. 2013;9:e1003025.
Challenger JD, Bruxvoort K, Ghani AC, Okell LC. Assessing the impact of imperfect adherence to artemether-lumefantrine on malaria treatment outcomes using within-host modelling. Nat Commun. 2017;8:1373.
Childs LM, Buckee CO. Dissecting the determinants of malaria chronicity: why within-host models struggle to reproduce infection dynamics. J R Soc Interface. 2015;12:20141379.
Paget-McNicol S, Gatton M, Hastings I, Saul A. The Plasmodium falciparum var gene switching rate, switching mechanism and patterns of parasite recrudescence described by mathematical modelling. Parasitology. 2002;124:225–35.
Gatton ML, Cheng Q. Investigating antigenic variation and other parasite-host interactions in Plasmodium falciparum infections in naïve hosts. Parasitology. 2004;128:367–76.
Eckhoff PP. falciparum infection durations and infectiousness are shaped by antigenic variation and innate and adaptive host immunity in a mathematical model. PLoS ONE. 2012;7:e44950.
Camponovo F, Lee T, Russell J, et al. Mechanistic within-host models of the asexual Plasmodium falciparum infection: a review and analytical assessment. Malar J. 2021;20:309.
Ross R. Some a priori pathometric equations. BMJ. 1915;1:546–7.
Macdonald G. The epidemiology and control of malaria. London: Oxford University Press; 1957.
Bershteyn A, Gerardin J, Bridenbecker D, Lorton CW, Bloedow J, Baker RS, et al. Implementation and applications of EMOD, an individual-based multi-disease modeling platform. Pathog Dis. 2018;76:fty059.
Gu W, Killeen GF, Mbogo CM, Regens JL, Githure JI, Beier JC. An individual-based model of Plasmodium falciparum malaria transmission on the coast of Kenya. Trans R Soc Trop Med Hyg. 2003;97:43–50.
Smith T, Killeen GF, Maire N, Ross A, Molineaux L, Tediosi F, et al. Mathematical modeling of the impact of malaria vaccines on the clinical epidemiology and natural history of Plasmodium falciparum malaria: overview. Am J Trop Med Hyg. 2006;75(Suppl 2):1–10.
Fraley C, Raftery AE, Scrucca L, Brendan T, Fop M. Gaussian mixture modelling for model-based clustering, classification, and density estimation. 2018 [cited 2022 Oct 4]. Available from: https://cran.r-project.org/web/packages/mclust/mclust.pdf.
Coelho CH, Doritchamou JYA, Zaidi I, Duffy PE. Advances in malaria vaccine development: report from the 2017 malaria vaccine symposium. NPJ Vaccines. 2017;2:34.
Kwiatkowski D. Malarial toxins and the regulation of parasite density. Parasitol Today. 1995;11:206–12.
Briggs J, Teyssier N, Nankabirwa JI, Rek J, Jagannathan P, Arinaitwe E, et al. Sex-based differences in clearance of chronic Plasmodium falciparum infection. eLife. 2020;9:e59872.
We sincerely acknowledge all members of the Infectious Disease Modelling Groups of the Swiss Tropical and Public Health Institute for their precious advice.
This research was funded under Prof. Melissa Penny’s Swiss National Science Foundation Professorship PP00P3_170702 and Tamsin Lee’s Marie Curie Individual Fellowship 839121, Horizon 2020.
Ethics approval and consent to participate
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Predicted dynamic of the first parasitaemia wave using parameters as in Dietz et al. (2006). Observed and predicted parasitaemia dynamics and the predicted immune responses of patients (A) 46, (B) 224, and (C) 26 when the constants were calculated as defined by Dietz et al. (2006). Figure S2. Predicted dynamic of the first wave of parasitaemia using parameters as defined here. Observed and predicted parasitaemia dynamics and the predicted immune responses of patients (A) 46, (B) 224, and (C) 26, when Pm was calculated as defined here. Figure S3. Variation of non-parasitaemia attributes across clusters of the full parasitaemia profiles. Visualization of the distribution of (A) the sex of the patients, (B) repressive treatment usage, (C) the way patients were infected, and (D) the strain used to infect patients among the two clusters of the full parasitaemia profile. Figure S4. Variation of patient gametocyte profiles across clusters of the full parasitaemia profiles. Boxplot of the distribution of (A) the number of days with detected gametocytes and (B) the maximum density of gametocytes of the patients belonging to the two clusters of the full parasitaemia profile. Figure S5. Variation of the patient fever profiles across clusters of the full parasitaemia profiles. Boxplot of the distribution of (A) the maximum value of fever and (B) the number of days with fever of the patients belonging to the two clusters of the full parasitaemia profile. Figure S6. Variation of non-parasitaemia attributes across clusters of the first wave of parasitaemia. Visualization of the distribution of (A) the sex of the patients, (B) the way patients were infected, and (C) the strain used to infect patients among the fives clusters of the first wave of parasitaemia.
About this article
Cite this article
Masserey, T., Penny, M.A. & Lee, T.E. Patient variability in the blood-stage dynamics of Plasmodium falciparum captured by clustering historical data. Malar J 21, 300 (2022). https://doi.org/10.1186/s12936-022-04317-0
- Plasmodium falciparum
- Blood-stage infections
- Patient variability
- Cluster analysis
- Mathematical model