The effects of a partitioned var gene repertoire of Plasmodium falciparum on antigenic diversity and the acquisition of clinical immunity

Background: The human malaria parasite Plasmodium falciparum exploits antigenic diversity and within-host antigenic variation to evade the host's immune system. Of particular importance are the highly polymorphic var genes that encode the family of cell surface antigens PfEMP1 (Plasmodium falciparum Erythrocyte Membrane Protein 1). It has recently been shown, however, that in spite of their extreme diversity these genes fall into distinct groups according to chromosomal location or sequence similarity, and that recombination may be confined within these groups.

Methods: This study presents a mathematical analysis of how recombination hierarchies affect diversity and, by using simple stochastic simulations, investigates how intra- and inter-genic diversity influence the rate at which individuals acquire clinical immunity.

Results: The analysis demonstrates that the partitioning of the var gene repertoire has a limiting effect on the total diversity attainable through recombination and that the limiting effect is strongly influenced by the respective sizes of the partitions. Furthermore, by associating expression of one of the groups with severe malaria, it is demonstrated how a small number of infections can be sufficient to protect against disease despite a seemingly limitless number of possible non-identical repertoires.

Conclusion: Recombination hierarchies within the var gene repertoire of P. falciparum have a severe effect on strain diversity and the process of acquiring immunity against clinical malaria. Future studies will show how the existence of these recombining groups can offer an evolutionary advantage in spite of their restriction on diversity.

with age and exposure, young children seem to be protected against the most severe forms of the disease after only a few episodes [9].
One of the main targets of protective immune responses against P. falciparum is the family of highly polymorphic variant surface antigens (VSA) [3][10][11][12][13], such as the Plasmodium falciparum Erythrocyte Membrane Protein 1, PfEMP1. These proteins are important virulence factors as they mediate cytoadherence to a variety of host cell receptors and cause sequestration of infected erythrocytes in vital organs such as the brain or placenta, which is a key element in the pathology of malaria [14][15][16][17]. PfEMP1 is encoded by a family of about 50-60 highly variable var genes per genome and is expressed in a mutually exclusive fashion on the surface of infected red blood cells in a process known as clonal antigenic variation [18][19][20][21]. On a population level, the enormous sequence diversity of var genes facilitates the sequential reinfection of hosts, with the antigenic profile of newly infecting parasites appearing to correspond to a 'hole' in the antibody repertoire of the host [22][23][24]. Thus, immune responses elicited by one infection may not provide protection against a pathogen with a different set of var genes or a similar pathogen expressing a different subset of its var gene repertoire [11].
The extensive sequence diversity of PfEMP1 observed in the field is predominantly a consequence of high allelic and ectopic recombination rates [25][26][27][28]. However, the sequencing of the 3D7 malaria lab strain revealed genetic structuring in which var genes can be grouped according to their 5' upstream sequence (Ups) and chromosomal location [29], dividing them into three major groups (A, B and C) and two intermediate groups (B/A and B/C) [30][31][32]. Bull et al. [22] proposed a different grouping based on features of short sequence tags within the Duffy-binding-like (DBL) α domain, which can be sampled from most var genes using universal primers [28]. Even though this grouping is based on only a small portion of the gene, it corresponds well with the groupings based on whole genome sequences, and it has become apparent that recombination is, at least to some extent, confined within these groups. Through the comparison of different P. falciparum isolates from diverse geographical regions, for example, Kraemer et al. [32] illustrated that these groups may be evolving independently of each other. Furthermore, it appears that these groups may have clinical significance, possibly due to functional differences. Several studies have found that severe disease in young children is associated with the expression of var groups A and B/A [33][34][35]. Rosetting is another well-known virulence factor, and recent studies have identified associations between the rosetting phenotypes and the expression of particular groups of var genes [22,36,37]. These studies suggest that a relatively confined group of genes is responsible for most of the clinical infections caused by P. falciparum; this may also explain why immunity to clinical malaria is acquired early in life (depending on transmission intensity) and after only a few episodes [9].
Understanding the implications of restricted recombination between groups of var genes is key for a proper assessment of PfEMP1 as a vaccine candidate [2,38]. For example, if there exists a small genetically isolated partition that can be associated with severe disease, this subset would be a good vaccine candidate in much the same way that var2CSA, whose expression has been associated with pregnancy-associated malaria [39,40], is being used as the basis for developing a pregnancy-associated malaria vaccine (see [41] for a review of the progress). However, not much is known about the direct consequences of a recombination hierarchy for the acquisition of immunity and the diversity of the global var gene population. This question is explored here by first demonstrating that any partitioning of the antigenic repertoire into genetically isolated groups has a limiting effect on the total diversity that can be created through recombination, and then by analysing how diversity affects an individual's development of immunity to clinical malarial disease.

Methods and Results
In the following, an antigenic type or strain of P. falciparum is defined by its full repertoire of var genes. Two strains are said to have an overlapping repertoire if they have one or more var genes in common. In accordance with a recent model of antigenic variation in malaria, it is assumed that each PfEMP1 contains a unique major, immunogenic epitope [42]. The epitope itself might be encoded by a single combination of polymorphic building blocks within the gene such that each new combination, as a result of ectopic recombination between two genes, generates a novel epitope. For simplicity, it is assumed that the population is at equilibrium with regard to these epitopes and that every possible combination is currently in circulation within the parasite population. Although this assumption leads to an overestimation of diversity, it allows the following analysis to be carried out without considering ectopic recombination and the generation of new variants. Therefore, recombination is considered only between two P. falciparum strains when it results in the generation of novel strains that differ in their antigenic repertoire from their 'parent' strains; this concept is illustrated in Figure 1.
Diversity is considered at two separate levels: intra-genic and inter-genic. The degree of intra-genic diversity, defined as n, refers to the number of unique epitopes at the genome level, whereas inter-genic diversity, defined as N, refers to the number of epitopes at the population level. Because each var gene is assumed to contain a single unique epitope, 'gene' and 'epitope' are interchangeable in this context.

Diversity reduction through segmentation
Each P. falciparum genome contains around 60 var genes, i.e. 60 unique epitopes. The extent of diversity at the population level, however, is much higher. As a result, allelic recombination between distinct P. falciparum strains can lead to an enormous number of different combinations of genes. As a numerical example, with N = 800 distinct epitopes within a local setting and n = 60 epitopes per genome, the number of possible non-identical repertoires that can be created through recombination without constraints, referred to as $D_T$, is given as

$$D_T = \binom{N}{n} = \binom{800}{60}. \qquad (1)$$

As a consequence of dividing the antigenic repertoire into recombining groups, epitopes are only available to the groups they are associated with. If the repertoire is divided into two partitions or recombining groups, such that $n_1 + n_2 = n$ and $N_1 + N_2 = N$, the number of possible strains, $D_P$, is given by

$$D_P = \binom{N_1}{n_1}\binom{N_2}{n_2}. \qquad (2)$$

However, the pool size of possible strains that can be created from genomes in which recombination is partitioned is always smaller than the pool size that can be created from the same number of epitopes in a non-partitioned system, that is,

$$\binom{N_1}{n_1}\binom{N_2}{n_2} < \binom{N_1 + N_2}{n_1 + n_2} = \binom{N}{n}. \qquad (3)$$

This follows from Vandermonde's identity, $\binom{N_1 + N_2}{n} = \sum_k \binom{N_1}{k}\binom{N_2}{n-k}$, in which the partitioned count is only one of the terms of the sum. Therefore, recombination hierarchies have a limiting effect on the level of strain diversity. More importantly, the finer the partitioning, the greater the limiting effect. As an example, assume that the repertoire is equally divided into five recombining groups such that each group has the same number of epitopes, both within the genome and within the population, i.e. $n_i = n/5 = 12$ and $N_i = N/5 = 160$. The reduction in diversity with this grouping can be calculated by the ratio $D_P/D_T$ (i.e. the number of combinations with partitioning divided by the number of combinations without partitioning):

$$\frac{D_P}{D_T} = \frac{\binom{160}{12}^5}{\binom{800}{60}}. \qquad (4)$$

This means that the potential pool size would be reduced by nearly four orders of magnitude by such a partitioning.
Note that the existence of recombining blocks also restricts the number of possible epitopes that can be created through ectopic recombination (because the availability of 'building blocks' is limited); this puts a further limitation on the number of possible strains. Therefore, the diversity reduction ratio serves only as a lower bound for the constraining effect of recombination hierarchies.
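As a rough numerical check of the partitioning argument, the pool sizes can be evaluated with exact integer arithmetic. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the values are those of the worked example (800 epitopes in the population, 60 per genome, five equal groups).

```python
from math import comb, log10


def unpartitioned_pool(N, n):
    """Number of repertoires of size n drawn from N epitopes (no partitioning)."""
    return comb(N, n)


def partitioned_pool(group_pool_sizes, group_genome_sizes):
    """Number of repertoires when each group i contributes n_i of its own N_i epitopes."""
    total = 1
    for N_i, n_i in zip(group_pool_sizes, group_genome_sizes):
        total *= comb(N_i, n_i)
    return total


# Worked example from the text: N = 800, n = 60, five equal recombining groups.
D_T = unpartitioned_pool(800, 60)
D_P = partitioned_pool([160] * 5, [12] * 5)
reduction = D_P / D_T  # diversity reduction ratio D_P / D_T

print(f"log10(D_T) = {log10(D_T):.2f}")
print(f"log10(D_P) = {log10(D_P):.2f}")
print(f"D_P / D_T  = {reduction:.3e}")
```

Because Python integers have arbitrary precision, the binomial coefficients are computed exactly and only the final ratio is converted to a floating-point number.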

Intra- and inter-genic diversity
Allocating genes or epitopes to distinct recombining groups limits the total number of strains, but the group size distribution itself determines the extent to which strain diversity is constrained. For example, if a total pool of 800 epitopes, with 60 epitopes per genome, is divided into two equally sized recombining groups, the reduction in diversity is nearly twice as much as if one group were 10 times bigger than the other group. (Naturally, the extreme case in which one group contains almost all of the epitopes effectively equals an unconstrained system, which has the highest level of diversity.) Therefore, the number of strains that can be created in a partitioned system is greater if the partitions are unequal in size; this holds true for any number of partitions.

Figure 1 Homologous recombination between two strains. Recombination between two strains (strain 1, blue, and strain 2, red), shown as two cross-over events, is considered to be confined within recombining groups (here group 1 and group 2). Each box represents a particular gene. In this process, two strains emerge (strain 3 and strain 4) that differ in their antigenic repertoire from their parent strains.
What about the relationship between the intra- and inter-genic group sizes? Bull et al. [22] reported similar frequency distributions of groups of var sequence tags within the parasite genome and within the whole population, despite the fact that the groups themselves vary considerably in size. That is, the ratios of intra- to inter-genic group sizes, $n_i/N_i$, are equal across the groups. Assuming a population pool of N epitopes, with each parasite genome containing n epitopes divided into two major groups of sizes $n_1$ and $n_2$, one can calculate the diversity reduction $D_P/D_T$ for varying intra- to inter-genic group size ratios, $n_1/N_1$ and $n_2/N_2$. Figure 2 shows how the diversity reduction ratio $D_P/D_T$ changes with inter-genic group size $N_1$ for three different values of $n_1$ for a system with N = 800 and n = 60 ($n_2$ and $N_2$ are determined by $n_2 = n - n_1$ and $N_2 = N - N_1$). In all three cases, $D_P/D_T$ assumes a maximum exactly when $n_1/N_1 = n_2/N_2$. For example, in the case where $n_1 = 10$ and $n_2 = 50$, the smallest reduction in strain diversity is found when $N_1 = 133$ and $N_2 = N - N_1 = 800 - 133 = 667$, i.e. when $n_1/N_1 = n_2/N_2 = n/N = 3/40$ or, equivalently, $n_1/n_2 = N_1/N_2 = 1/5$. This clearly demonstrates that the number of possible strains the parasite can create through recombination is least restricted (greatest value of $D_P/D_T$) when the frequency distributions of the different groups are equal within parasite genomes and across the whole population. Due to symmetry, the same results can be obtained for any number of partitions.

Figure 2 Diversity reduction.
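The search for the least restrictive split can be reproduced by scanning all admissible values of the inter-genic group size. The sketch below assumes the two-group setting described above (N = 800, n = 60); the function names are hypothetical, and the comparison is done on exact integers so that rounding cannot affect the location of the maximum.

```python
from math import comb


def diversity_ratio(N, n, N1, n1):
    """D_P / D_T for two groups with (N1, n1) and (N - N1, n - n1) epitopes."""
    return comb(N1, n1) * comb(N - N1, n - n1) / comb(N, n)


def best_split(N, n, n1):
    """Inter-genic group size N1 that maximises D_P for a fixed genomic split n1."""
    n2 = n - n1
    # N1 must hold at least n1 epitopes and leave at least n2 for the second group.
    candidates = range(n1, N - n2 + 1)
    # D_T does not depend on N1, so comparing the exact integer D_P is sufficient.
    return max(candidates, key=lambda N1: comb(N1, n1) * comb(N - N1, n2))


for n1 in (10, 30):
    N1 = best_split(800, 60, n1)
    print(n1, N1, n1 / N1, (60 - n1) / (800 - N1), diversity_ratio(800, 60, N1, n1))
```

At the optimum the two printed ratios, $n_1/N_1$ and $n_2/N_2$, agree up to the rounding imposed by integer group sizes.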

Overlapping repertoires
Immunity against severe malaria develops rapidly after relatively few clinical episodes, and recent studies have shown that severe disease may correlate with the expression of particular groups of var genes. Here it is assumed that each genome contains between six and 10 genes of this "severe disease" group, and it is further assumed that each gene will be expressed during the course of infection. In the following, $n_{SD}$ and $N_{SD}$ define the number of severe disease (SD) associated genes within the genome and within the population, respectively. With an arbitrary intra- to inter-genic diversity ratio $n_{SD}/N_{SD} = 1/20$, a pool of 120-200 SD epitopes can be found within the population, and the boundaries for the total number of unique SD repertoires are given as

$$\binom{120}{6} \le \binom{N_{SD}}{n_{SD}} \le \binom{200}{10}. \qquad (5)$$

From now on, only epitopes associated with severe disease are considered, which allows the omission of the subscripts, i.e. $n_{SD} = n$ and $N_{SD} = N$. Although the number of possible combinations of epitopes is huge, a large proportion of these have one or more epitopes in common. The proportion of combinations, i.e. strains, that share exactly i epitopes with a given strain can be expressed as

$$s_i = \frac{\binom{n}{i}\binom{N-n}{n-i}}{\binom{N}{n}}, \qquad (6)$$

and the proportion of strains that overlap in their repertoire of severe disease associated genes by at least one epitope is found as $\sum_i s_i$, which can also be expressed as

$$\sum_{i=1}^{n} s_i = 1 - \frac{\binom{N-n}{n}}{\binom{N}{n}}, \qquad (7)$$

where the last term denotes the proportion of types with no overlap. For the lower and upper bound systems, (N, n) = (120, 6) and (200, 10), the proportion of overlapping types is 27% and 41%, respectively. Therefore, under the assumption of equal intra- to inter-genic diversity ratios, larger groups tend to produce more overlapping combinations, as the probability of two types having epitopes in common increases with the size of the genomic repertoire.
The assumption that every gene will be expressed during infection is of course artificial, but serves to illustrate the counterintuitive concept that overlap between strains actually increases with diversity when the intra to inter-genic diversity ratio is approximately equal. This will be shown to have consequences for the rate at which individuals acquire immunity through a series of infections.
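The overlap proportions quoted for the two bounding systems can be checked directly from the counting argument: the strains sharing exactly i epitopes with a given strain are those that pick i of its n epitopes and the remaining n - i from the other N - n. The sketch below is illustrative only, with hypothetical function names.

```python
from math import comb


def share_exactly(N, n, i):
    """Proportion of strains sharing exactly i epitopes with a given strain."""
    return comb(n, i) * comb(N - n, n - i) / comb(N, n)


def overlap_fraction(N, n):
    """Proportion of strains sharing at least one epitope with a given strain."""
    return 1 - comb(N - n, n) / comb(N, n)


for N, n in ((120, 6), (200, 10)):
    by_sum = sum(share_exactly(N, n, i) for i in range(1, n + 1))
    print(f"(N, n) = ({N}, {n}): overlap = {overlap_fraction(N, n):.3f}, "
          f"sum over i >= 1 = {by_sum:.3f}")
```

Both routes, summing the individual proportions and subtracting the no-overlap term from one, give the same value, which is a convenient consistency check.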

Acquired immunity against severe disease
For an individual host suffering a series of infections, $e_t$ defines the number of SD epitopes he/she has been exposed to after the t-th infection. Recall that each type contains n epitopes, so the probability of getting infected with m novel epitopes at the next infection is equal to the proportion of strains that contain exactly n - m epitopes the host has previously been exposed to (note, it is assumed that all n genes will be expressed during the infection). That is,

$$P(e_{t+1} = e_t + m) = \frac{\binom{e_t}{n-m}\binom{N-e_t}{m}}{\binom{N}{n}}. \qquad (8)$$

Moreover, $P(e_1 = n) = 1$, as the first infection necessarily results in n novel epitopes. Figure 3 illustrates the result of a stochastic simulation of individual infection histories where the outcome of each infection is determined by (8). Figure 3(a) and 3(b) show the build-up of immunity in terms of exposed epitopes with infection history for the two systems (N, n) = (120, 6) and (200, 10). From these plots one can notice two phenomena. First, defining $h_t := 1 - e_t/N$ as the proportion of epitopes an individual has not yet experienced after the t-th infection, i.e. the gap in the host's immune repertoire, $h_t$ decays exponentially with t (shown as insets in Figure 3(a) and 3(b)). Second, the rate at which $h_t$ decays is equal for both systems (N, n) = (120, 6) and (200, 10). Therefore, the rate at which individuals get exposed to novel epitopes is mostly dependent on the ratio between the number of epitopes one experiences at each infection and the total number of epitopes within the population, n/N.
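A stochastic simulation of this kind can be sketched in a few lines. The version below is an assumption-laden illustration rather than the code used for Figure 3: each infecting strain is drawn as a uniformly random set of n distinct epitopes out of N, which reproduces the transition probabilities described above, and all names and run counts are hypothetical.

```python
import random


def simulate_exposure(N, n, infections, rng):
    """Return e_t, the cumulative number of distinct epitopes seen after each infection."""
    seen = set()
    history = []
    for _ in range(infections):
        strain = rng.sample(range(N), n)  # n distinct epitopes, all expressed
        seen.update(strain)
        history.append(len(seen))
    return history


def mean_gap(N, n, infections, runs=200, seed=1):
    """Average gap h_t = 1 - e_t / N over independent simulated hosts."""
    rng = random.Random(seed)
    totals = [0.0] * infections
    for _ in range(runs):
        for t, e_t in enumerate(simulate_exposure(N, n, infections, rng)):
            totals[t] += 1 - e_t / N
    return [total / runs for total in totals]


for N, n in ((120, 6), (200, 10)):
    gap = mean_gap(N, n, infections=20)
    # Each epitope is missed by one infection with probability 1 - n/N,
    # so the expected gap after t infections is (1 - n/N) ** t.
    print(N, n, [round(h, 3) for h in gap[:5]], round((1 - n / N) ** 5, 3))
```

Since a given epitope escapes each infection independently with probability 1 - n/N, the expected gap is $(1 - n/N)^t$, which is why the decay rate depends only on the ratio n/N.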
Here, it is assumed that the risk of developing severe disease corresponds to the probability of getting infected with a strain that does not contain any SD associated epitopes the host has previously been exposed to. Although this is a crude assumption, it is easy to see how the results obtained here can be applied to cases where minimal overlap can also cause severe disease, or even to the case when not all genes are being expressed during infection. In terms of an individual's infection history, the risk of severe disease is then given as the probability of experiencing n novel epitopes at the (t + 1)-th infection. That is,

$$P_{SD}(t+1) = P(e_{t+1} = e_t + n) = \frac{\binom{N-e_t}{n}}{\binom{N}{n}}. \qquad (9)$$

In line with the gap in the host's immune repertoire, $h_t$, the risk of severe disease $P_{SD}$ decays exponentially with infection history, and it is worth mentioning that even after a limited number of infections the risk of getting infected by an entirely novel strain is minimal. For example, with 200 SD associated genes in the community and 10 per genome, the risk of severe disease is one in 1,000 after only 13 infections. However, it takes more than 22 infections if there are 120 SD genes in the community and six per genome, despite the fact that the total number of possible combinations in this case is less by over six orders of magnitude.
In contrast to the acquisition of an immune repertoire (through exposure to novel epitopes), the change in risk of severe disease with infection history is strongly and separately dependent on both the intra- and inter-genic diversity levels. Figure 4 shows how $P_{SD}$ depends on the genetic diversity of these epitopes in the population, N, keeping the intra- to inter-genic diversity ratio, n/N, constant. A surprising effect of maintaining a constant n/N ratio is that enhanced diversity leads to a sharper drop in the risk of severe disease with every infection. For example, with 12 such genes per genome and 240 within the population, 11 infections can be sufficient to bring the risk of contracting a discordant type down to less than one in 1,000. On the other hand, with only six SD associated genes per genome and a total of 120 in the population, it takes twice as many infections to gain the same level of protection.
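The dependence of the risk on group size can be explored without simulation by averaging over infection histories. The sketch below rests on an additional observation that is not stated in the analysis itself and should be read as an assumption-based shortcut: if strains are drawn independently and uniformly from all combinations, each earlier strain misses all n epitopes of the next strain with probability $\binom{N-n}{n}/\binom{N}{n}$, so the expected risk after t infections is that probability raised to the power t. Function names and the 1-in-1,000 threshold argument are hypothetical.

```python
from math import comb


def expected_risk(N, n, t):
    """Expected probability that the next strain shares no epitope with t earlier strains."""
    no_overlap = comb(N - n, n) / comb(N, n)
    return no_overlap ** t


def infections_needed(N, n, threshold=1e-3):
    """Smallest number of infections after which the expected risk is below threshold."""
    t = 0
    while expected_risk(N, n, t) >= threshold:
        t += 1
    return t


# Constant n/N = 1/20 with increasing group size, as in Figure 4.
for N, n in ((120, 6), (200, 10), (240, 12)):
    print(f"(N, n) = ({N}, {n}): infections needed = {infections_needed(N, n)}")
```

The thresholds obtained this way should land close to the simulated figures quoted in the text, and they show the same trend: at a fixed ratio n/N, larger groups reach a given level of protection after fewer infections.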
This phenomenon occurs because larger groups tend to produce more overlapping combinations (under the assumption that the ratio between the number of epitopes per genome and the total number of epitopes in the population is constant). Furthermore, hosts experience a higher number of epitopes through each infection. As a result, after relatively few infections, the overall number of exposed epitopes is bigger than for smaller groups (again, under the assumption that all genes are being expressed). Because of the extensive overlap of strains in the population, the risk of getting infected by a novel combination decays more rapidly. Note that these results rest on the assumption that all epitopes are expressed during infection, although there is little data on var gene expression patterns in vivo. From Figure 4 it is clear, however, that the rate at which clinical protection is acquired strongly correlates with the number of genes that the host experiences per infection. If, for example, only a subset of genes is expressed during infection or reaches a sufficient level to trigger an immune response, this would have a negative effect on the acquisition of immunity.

Figure 3 Simulation of infection histories.

Recombination and immune selection
So far it has been assumed that all possible combinations of var genes occur within the parasite population at equal prevalence. Another theoretical extreme can be found within a totally structured population. Mathematical models have shown that strong immune selection can cause antigenically diverse pathogen populations to segregate into sets of strains that are marked by discordant antigenic repertoires [43,44]. Within this framework, assuming strong selection pressure and an entirely discordant pathogen population, an estimate for the lower bound of the total diversity of strains is $\eta := N/n$, i.e. the total number of unique var epitopes is divided into discrete, non-overlapping types. This hypothetical extreme of a completely strain structured population means that each infection is with a strain that is either novel or overlaps entirely with the ones the host has previously been exposed to. The probability of severe disease is then simply given as

$$P_{SD} = 1 - \frac{e}{\eta}, \qquad (10)$$

where e is the number of strains the host has experienced. However, a pathogen population that is structured into discordant types may actually lower the rate at which individuals build up protection against severe disease. Figure 5 shows how the risk of disease decreases with infection history under the assumption of total strain structure compared to that of a homogeneous setting where all possible combinations are equally prevalent in the population. These two extremes define the boundaries of a region where the actual drop in the risk of severe disease can be expected. This intermediate region is very likely to be influenced by the local setting, especially the rate and mode of transmission.

Figure 4 Risk of severe disease. Risk of severe disease at different levels of diversity N, with n/N = const = 0.05. As the level of inter-genic diversity increases, the incremental risk of contracting a discordant strain decreases significantly. Each line represents the average of 100 simulations. The inset shows how the total number of strains (log scale) that can be generated through recombination increases with N.
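The two bounding cases can be compared numerically. The sketch below is a hedged illustration, not the procedure used for Figure 5: it assumes that in the structured population each infection is with one of N/n discrete strains chosen uniformly at random, and it uses the independence argument that the next strain avoids each earlier strain with a fixed probability. All function names are hypothetical.

```python
import random
from math import comb


def structured_risk_history(N, n, infections, rng):
    """Risk 1 - e/eta before each infection in a fully strain structured population."""
    eta = N // n  # number of discrete, non-overlapping strains
    seen = set()
    risks = []
    for _ in range(infections):
        risks.append(1 - len(seen) / eta)
        seen.add(rng.randrange(eta))  # infection with a uniformly chosen strain
    return risks


def expected_structured_risk(N, n, t):
    """Expected risk after t infections: the next strain differs from all t earlier ones."""
    eta = N // n
    return (1 - 1 / eta) ** t


def expected_homogeneous_risk(N, n, t):
    """Expected risk after t infections when all combinations are equally prevalent."""
    return (comb(N - n, n) / comb(N, n)) ** t


for t in (1, 5, 10, 20):
    print(t, round(expected_structured_risk(120, 6, t), 4),
          round(expected_homogeneous_risk(120, 6, t), 4))
```

Under these assumptions the structured population retains the higher risk at every point in the infection history, in line with the statement that strain structure may slow the build-up of protection.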

Discussion and Conclusion
Despite different approaches to classifying the var genes of P. falciparum, it appears that recombination between genes and gene segments is, at least to some extent, confined within a number of groups. It has been speculated that this partitioning has evolved to accommodate various sets of genes with distinct antigenic and adhesive characteristics within each parasite genome [31,32]. Here it has been shown that confining recombination to specific groups also has a limiting effect on the overall level of diversity that can be attained by the parasite.
Figure 5 Strain structured vs. homogeneous population. Comparison between the decline in risk of severe disease in a homogeneous distribution of all possible types and a strain structured population. Both hypothetical extremes, homogeneous distribution and complete strain structure, define the boundaries of a region (grey) within which the risk will realistically decay with exposure. N = 120, n = 6, η = 20.

Within the model framework, an equal ratio of intra- to inter-genic diversity allows the parasite to generate a maximum level of strain diversity within the constraints of such a recombination hierarchy. In fact, the limiting effect of repertoire partitioning is strongly dependent on the number of partitions, their respective sizes and the distribution of epitopes among them. Interestingly, the lowest pathogen population diversity will be attained if the partitions are exactly equal in size, whereas pathogen populations with asymmetric group sizes are able to maintain a higher level of diversity. This prediction is consistent with studies showing wide variations among group sizes, although the sizes themselves seem remarkably stable. That maximum diversity is obtained when intra- to inter-genic diversity ratios are equal is particularly interesting in light of findings that the groups defined by Bull et al. [22] do seem to show very similar frequency distributions among the group members within a single parasite and the population as a whole, despite the fact that the relationship between the genomic group size and the total pool of genes associated with the group is non-linear. That is, the ratio between the number of genes associated with a particular group within the genome and the number of those genes within the population is approximately the same for every group. One of the main caveats of the analysis, however, is the restriction to homologous recombination alone.
Ectopic recombination is known to be a major generator of var gene diversity. It can be expected that the rate at which novel and functional genes are created and introduced into the population is lower than the rate of homologous recombination. Therefore, the analysis can be seen as being applied to a snapshot of the current diversity within the population. The assumption that all possible combinations exist within the population, however, might compensate for disregarding the generation of novel variants. The exact level of diversity within a population is not known, and it is also not known whether diversity is constant, fluctuating or growing, or how fast the turnover of var genes is. These are important considerations for future work.
The practical implication of categorizing var genes is to find associations between certain disease symptoms and the expression of particular groups of genes. Several studies have already found correlations between specific groups and clinical manifestations, such as severe childhood disease or rosetting, although more research is required to establish the exact nature of these associations. For the purpose of this analysis it was assumed that each genome contained a certain number of disease-associated genes and that clinical protection is the result of having been exposed to a majority of circulating variants. More precisely, the risk of severe disease was assumed to correlate with the probability of getting infected by a strain with a completely discordant set of these genes. This was done for practical reasons only, although it is conceivable that any pre-existing immunity, in terms of specific antibodies, is likely to offer some degree of protection. Furthermore, the process of acquiring immunity is multifactorial and depends on more than the exposure to PfEMP1. Host factors and other parasite proteins, such as the vaccine candidate merozoite surface protein (MSP), might also be crucial during this process [45][46][47]. The diversity of these proteins is far more limited than that of PfEMP1, however, such that within the model presented here, a much steeper reduction in the probability of developing clinical disease with each infection can be expected. The analytical framework presented here can easily be extended to include different levels of diversity to account for more than one target of protective immune responses. However, if both targets are equally strongly associated with protective immunity, then the least diverse target would dominate the rate at which protection is acquired.
In terms of the actual number of genes an individual is exposed to during infection, it was found that a larger group size, i.e. a larger number of genes per genome and a proportionally larger number of genes across the population, resulted in a more significant decrease in the risk of severe disease with each infection. This occurred for two reasons: (i) larger groups tended to produce a higher proportion of strains that overlap in their antigenic repertoire, which lowered the per-infection risk of encountering a completely novel strain; and (ii) larger groups led to hosts being exposed to a higher number of variants per infection, which resulted in a further decrease in risk. The analysis presented here is based on the crucial assumption that every gene will be expressed during infection. This, however, is biologically questionable. As a result, the gradient shown for the decline in the risk of severe disease is probably steeper than it would be if only a certain portion of the repertoire were expressed. Nevertheless, the model serves to illustrate how the risk of severe disease can fall rapidly in spite of vast pathogen diversity. Variant expression itself appears to be influenced by the immune status of the host [22]. Although no specific order has been established in which the variants are expressed during infection, there is some evidence that antibodies against particular groups of genes (groups A and B/A) are acquired earlier in life, which hints towards a predominance of these particular groups [8]. This apparent predominance might also help to explain why protection against the most severe form of malaria is acquired early in life.
The gradient in the decline in risk of severe disease is also influenced by the parasite population structure. Theoretical models have suggested that immune selection can segregate P. falciparum into discrete strains with minimal overlap in their antigenic repertoire [43,44]. This has been supported by field studies that also found little or no overlap among DBL-α sequences between different isolates [48,49]. These studies are not entirely conclusive, as they relied on a relatively small number of isolates or came from regions of low to moderate transmission intensity. The latter is expected to have a major influence on population structure, as allelic recombination is hugely favoured when transmission is intense. Nevertheless, as Figure 5 clearly shows, population structure can have a huge impact on the rate at which individuals acquire protective immunity. Both hypothetical extremes, a strain structured population and one where all possible combinations co-circulate, can be seen as setting an upper and lower level for the rate at which the risk of severe disease decreases with infection history. The realistic rate can then be expected to lie somewhere between those boundaries, determined, for example, by the intensity of transmission.
The evolutionary forces that determine the size of each var gene group are not clear but one can envisage a complex and multi-factorial interplay of trade-offs between disease, infection length and host adaptation (including both immune evasion and cytoadhesion). For example, high levels of parasitaemia correlate with transmissibility but also with the risk of host death. Increasing the number of disease-associated genes might, therefore, be detrimental for the pathogen's fitness. On the other hand, retaining a certain number of these genes may confer an evolutionary advantage over parasites that carry only genes associated with mild disease because of increased transmissibility. This is highly speculative but the fact that clinical symptoms so far have only been found to correlate with the expression of a relatively small set of genes within the repertoire seems to support these claims. Ongoing studies will help to clarify how selection pressure has shaped and partitioned the var gene repertoire of P. falciparum and how it impacts on the expression of different subsets of genes. This knowledge is key to understanding host-pathogen interaction which might aid the design of new and ongoing intervention strategies.