
Integrative analysis of intraerythrocytic differentially expressed transcripts yields novel insights into the biology of Plasmodium falciparum



The intraerythrocytic development of Plasmodium falciparum, the most virulent human malaria parasite, involves asexual and gametocyte stages. There has been a significant increase in disparate datasets derived from genomic and post-genomic analysis of the parasite, which necessitates integrated analysis from which biological processes important to the survival of the parasite can be determined.


In order to resolve genes associated with stage differentially expressed transcripts, we have developed and implemented an integrative approach that combines evidence from P. falciparum expressed sequence tags (ESTs), genomic, microarray, proteomic and gene ontology data.


A total of 143 gametocyte-overexpressed and 51 asexual-overexpressed transcripts were identified. A subset of 74 genes associated with these transcripts showed evidence of stage-correlated protein expression, of which 53 have not been experimentally characterised. Our study has revealed (1) possible regulatory mechanisms in malaria parasites' gametocyte maturation, (2) correlation between EST and microarray data for a P. falciparum gene family to present unique EST-derived information, (3) candidate drug and antigenic targets on which computational and experimental studies can be performed, and (4) the need for more empirical studies on gene and protein expression in malaria parasites.


Applying different domains of data to the same underlying gene set has yielded novel insights into the biology of the parasite and presents an approach to appraise critically the data quality of post-genomic datasets from malaria parasites.


Pathogen bioinformatics has been developed and applied to discover novel genes and to search for virulence-associated genes by combining approaches that assay gene expression, adaptive evolution and gene transfer [1–3]. In this study, layers of data about Plasmodium falciparum, obtained with gene transcript and genome sequencing as well as gene and protein expression profiling technologies, were integrated to reveal insights into previously undiscovered regulation during intraerythrocytic development. Genes that merit further analysis are described. This integrative approach uses an evidence-based assessment of disparate datasets, similar to gene structure prediction approaches that rely on accumulation of evidence such as similarity to known genes, nucleotide compositional features, intron/exon boundaries and promoter sequences [4].

The high malaria burden in Africa [5, 6] necessitates increased efforts to understand the biology of the pathogen with a view to discovering new drugs, candidate vaccines and diagnostics, as well as improving existing ones. The publication of the genomes of the human malaria parasite P. falciparum and the rodent malaria parasite Plasmodium yoelii, as well as ongoing sequencing projects of other Plasmodium species, presents new opportunities to achieve these goals [7–9]. In addition, there have been efforts to obtain and analyse, on a large scale, gene expression profiles (transcriptome) of Plasmodium species using Expressed Sequence Tags (ESTs) [1, 10–13], full-length cDNAs [14], Serial Analysis of Gene Expression (SAGE) [15, 16] and microarrays [17–19]. Protein expression profiles (proteome) of particular stages of the P. falciparum life cycle are also available [20, 21].

The random single-pass sequencing of a cDNA library to generate short (200–500 bp) nucleotide sequences that tag an expressed gene sequence is an established method of gene discovery [22, 23]. EST gene indices are generated by computer-based methods to organise these tags by assigning them into groups to remove redundancies and yield reconstructed transcripts that represent consensus sequences of each group [22, 24, 25]. These indices are being used to understand the complexity of the human genome, especially in providing information on alternative transcripts, non-translated transcripts, truly unique genes and extremely short genes that will complement the genome data [25]. The availability of the complete genome of P. falciparum 3D7 makes it possible to provide similar information for the parasite. In fact, additional EST and full-length cDNA sequences are required to improve the current annotation and verify predicted genes [7]. EST sequencing projects on Plasmodium have identified novel genes [1, 10, 13] but only limited analyses have been performed on ESTs for coordinate and differential gene expression [13].

Plasmodium ESTs from a variety of cDNA libraries are available in the GenBank EST database (dbEST). As of February 2003, 11 libraries (nine asexual, one sporozoite and one gametocyte) were available in dbEST. ESTs from some of these libraries have been indexed [1, 10, 13, 26]. Microarrays, mRNA differential display and EST-based analysis have been used to study transcriptional differences between asexual and gametocyte stages of P. falciparum, revealing stage-specific genes [13, 17, 27]. These studies were done prior to the publication of the genome sequence of strain 3D7. Furthermore, in the case of Li and colleagues [13], the functional annotation was selective. An EST-based analysis with an improved functional annotation that combines the automated annotation from P. falciparum gene indices and the curated annotation in the Plasmodium Genome Database (PlasmoDB) [28] is needed. In addition, integration of proteomic data with such analysis has been recognized as an important component in drug target identification and validation in the human genome [29].

The number of ESTs used to generate a consensus sequence in a gene index can provide a rough estimate of the mRNA abundance in the tissue or cell of origin [23]. Furthermore, statistical tests have been developed to identify genes that are differentially expressed (significantly overexpressed) in a particular tissue compared to one or more other tissues [30, 31]. The differences in EST counts have been applied to understand gene expression in different metabolic pathways, tissues or stages [32–34]. These differences appear to correlate with the biology of the tissue or stage under investigation. Microarray and SAGE methods are narrower in scope but sensitive for differential gene expression studies and can be used to validate broader EST-based analysis [13].

The life cycle of P. falciparum involves stages in the female anopheline mosquito vector and stages in the human host [35]. The parasite goes through pre-erythrocytic and intraerythrocytic stages in the human host. The pre-erythrocytic stage involves invasion and growth within liver cells, whereas the intraerythrocytic cycle is a multi-stage process, which includes differentiation into asexual stages (rings, merozoites, trophozoites and schizonts) as well as sexual stages (male and female gametocytes). The clinical symptoms of malaria are produced primarily as a consequence of the asexual life cycle, while the sexual cycle, which can be divided into early (I-II) and late (III-V) gametocyte stages [36], is necessary for the development of the parasite in the mosquito. The intensive research on gene expression in the asexual stage compared to gametocyte stage can be inferred from the number of cDNA libraries deposited in the dbEST as mentioned above. The late (mature) stage gametocyte cDNA library (ID:10054) should contain transcripts important for gametocyte maturation and also formation of gametes and fertilization [37]. The availability of a cDNA library of 3D7 (ID:9765) asexual mixed stage (rings, trophozoites and schizonts) and genome data from the same strain presents an opportunity to determine differentially expressed transcripts between the two libraries.

Transcription and translation in malaria parasites are complex and characterized by features such as multiple transcripts, antisense transcripts, stage-specific transcripts, chromosomal clusters encoding co-expressed proteins, unspliced mRNA, gene family member-specific expression and translational control [20, 38, 39]. These features contribute to parasite fitness and the ability to undergo a complex life cycle. Understanding the role of these features in the regulation of important intraerythrocytic biological processes can deliver new tools for malaria control. For example, a proportion of genes involved in glycolysis, proteolysis and apicoplast targeting of nuclear-encoded genes are thought to be regulated during the transition from asexual to sexual stages [7, 40]. The integration of data from EST sequencing with those from genomic, microarray and proteomic technologies could provide insights into molecular mechanisms that contribute to the regulation of these processes.

The significant increase in disparate datasets from genome sequencing and post-genomic analysis of P. falciparum necessitates delivery of integrated analysis from which biological processes important to the survival of the parasite can be determined. The integrated approach developed has identified stage-overexpressed genes with computational and experimental evidence to support their functional analysis. Furthermore, the approach is demonstrated as a means to appraise critically the data quality of the increasing number of post-genomic datasets from malaria parasites.


Integrative analysis approach

The integrative analysis approach that was used to combine genomic, expressed sequence tag, microarray, proteomic and gene ontology data from P. falciparum 3D7 is presented in Figure 1. The starting integrative criterion was significant overexpression of a transcript in a stage relative to the other stage. Criteria used and their acceptable ranges are presented in Table 1.

Figure 1

Simplified flowchart of integrative analysis of Plasmodium falciparum data. Flowchart symbols: rounded rectangle, start or end; rectangle, process; diamond, decision.

Table 1 Threshold values for steps in integrative analysis of Plasmodium falciparum data

Expressed sequence tags and transcript reconstruction

Expressed Sequence Tags derived from P. falciparum 3D7 mixed asexual stage (dbEST ID: 9765) and gametocyte (III-V) stages (dbEST ID: 10054) cDNA libraries were retrieved using Sequence Retrieval System (SRS) version 7.02 from EMBL database (Release 74, March 2003). These sets of ESTs were sequenced by Washington University Plasmodium EST Project [13]. A total of 15,126 ESTs consisting of 11,872 asexual and 3,254 gametocyte ESTs were downloaded. Transcript reconstruction of these ESTs was performed using stackPACK clustering system version 2.2 [22, 24] as described previously for reconstructing Plasmodium transcripts [1]. Briefly, the process starts with removal of artifactual sequences such as repeats and vector sequences. The "clean" sequences are grouped using a loose clustering approach into clusters and the clusters assembled into contigs. The alignments of sequences that make up these assembled clusters are analysed to produce consensus sequences of maximal length representing the reconstructed transcripts. stackPACK was chosen for its ability to provide extended consensus sequences [41] (Hide et al. in preparation). Clusters containing only a single sequence are called singletons. A gene index, manufactured by such a method, is therefore a non-redundant representation of a set of reconstructed gene fragments that approximates to the best available representation of genes for that organism. The clustering was unsupervised in that known sequences such as mRNA, full-length cDNA, previously reconstructed ESTs or exon constructs were not used to guide the process. This type of clustering was required to provide valid input data for the software used to calculate the differential expression statistics applied in this study.

Differential gene expression analysis

Audic-Claverie (AC) and the Chi-square (χ²) 2 × 2 statistical tests for differential gene expression were used to identify stage-overexpressed transcripts. These pairwise tag statistics are based on EST counts of contigs (assembled clusters) with at least five ESTs since, for a 95% confidence interval, the first value that is significantly different from 0 is 5 [30, 32].

The calculation of these statistics was implemented with the web version of IDEG6 software, with a significance threshold of 0.05 [31]. A suite of PERL scripts was written to extract EST counts from the output of stackPACK 2.2 and present the input dataset in the format required by IDEG6. Data extracted from the output file of IDEG6 were (1) contig description; (2) observed and normalised EST counts from the two libraries; and (3) probability that a transcript is differentially expressed, as represented by P-values for the two tests. Transcripts for which the P-values for both statistics were less than 0.05 were taken as differentially expressed. Since these statistics identify differentially expressed transcripts, the terms asexual-overexpressed and gametocyte-overexpressed were used for transcripts (or genes) with significant overexpression in the mixed asexual stage and late stage gametocytes respectively.
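The decision rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reimplementation of IDEG6: the function names are hypothetical, the Audic-Claverie P-value is taken as the smaller one-sided tail of the published conditional probability p(y|x), and the chi-square test is the uncorrected 2 × 2 form with one degree of freedom. IDEG6 may differ in details such as continuity or multiple-test corrections.

```python
import math


def audic_claverie_prob(x, y, n1, n2):
    """Audic-Claverie p(y | x): probability of y ESTs in library 2 (size n2)
    given x ESTs for the same transcript in library 1 (size n1)."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + r))
    return math.exp(log_p)


def audic_claverie_pvalue(x, y, n1, n2):
    """Assumption: report the smaller one-sided cumulative tail of p(y | x)."""
    below = sum(audic_claverie_prob(x, k, n1, n2) for k in range(y))
    lower = below + audic_claverie_prob(x, y, n1, n2)  # P(Y <= y)
    upper = max(0.0, 1.0 - below)                      # P(Y >= y)
    return min(lower, upper)


def chi_square_2x2(x, y, n1, n2):
    """Uncorrected chi-square test on the table [[x, n1 - x], [y, n2 - y]]."""
    a, b, c, d = x, n1 - x, y, n2 - y
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    stat = (n1 + n2) * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def is_differentially_expressed(x, y, n1, n2, alpha=0.05, min_ests=5):
    """Rule from the text: at least five ESTs and both P-values below alpha."""
    if x + y < min_ests:
        return False
    _, chi_p = chi_square_2x2(x, y, n1, n2)
    return audic_claverie_pvalue(x, y, n1, n2) < alpha and chi_p < alpha


if __name__ == "__main__":
    # Library sizes from the text; the EST counts (0 and 20) are hypothetical.
    print(is_differentially_expressed(0, 20, 11872, 3254))
```

Requiring both tests to agree is conservative: a transcript flagged by only one statistic is not reported.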

Protein expression profiles and functional annotation of transcripts

Annotated protein predictions (release 4.0) of the whole genome sequence of P. falciparum 3D7 were obtained from the PlasmoDB website. A total of 5,334 predicted protein sequences were obtained. The overview page for each gene was retrieved using wget and saved as a Hypertext Markup Language (HTML) file on a local computer to allow ease of manipulation without accessing the database over the Internet. A PERL script was used to query each page for the word sporozoite, merozoite, trophozoite or gametocyte, preceded by an apostrophe (') and followed by a specific text; for the gametocyte, for example, the search text was 'gametocyte stage peptide fragment(s) detected by mass spectrometry'. A match of this text was taken as evidence of protein expression at that stage and was assigned 1; no match was assigned 0. Thus, a 4-digit binary accession that indicates evidence for expression in sporozoite, merozoite, trophozoite and gametocyte is used to represent the 15 protein expression profiles presented by Florens et al. [20], with an additional accession for lack of evidence in all stages (0000).
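The 4-digit binary accession can be illustrated with a short Python sketch. The original work used a PERL script; the version below is only an approximation, and it assumes that the marker text for the other three stages follows the same wording as the gametocyte example quoted above. The function and constant names are hypothetical.

```python
# Stage order follows the text: sporozoite, merozoite, trophozoite, gametocyte.
STAGES = ("sporozoite", "merozoite", "trophozoite", "gametocyte")

# Assumption: every stage uses the wording quoted for the gametocyte stage,
# including the leading apostrophe.
MARKER = "'{stage} stage peptide fragment(s) detected by mass spectrometry"


def protein_expression_profile(page_text):
    """Return a 4-character string of 0s and 1s, one digit per stage."""
    return "".join(
        "1" if MARKER.format(stage=stage) in page_text else "0"
        for stage in STAGES
    )


if __name__ == "__main__":
    # Hypothetical fragment of a saved gene overview page.
    page = "'gametocyte stage peptide fragment(s) detected by mass spectrometry'"
    print(protein_expression_profile(page))
```

Four binary digits give 16 possible accessions, which matches the 15 profiles with evidence in at least one stage plus 0000.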

Reconstructed transcripts were annotated on the basis of similarity searches using NCBI BLASTX version 2.2.1 against predicted proteins of P. falciparum 3D7. The statistical significance cut-off was set at an E-value of 10⁻¹⁰, following Carlton et al. [1]. Since an unsupervised clustering was performed, the annotations obtained were correlated, to support the functional annotation, with the TIGR P. falciparum Gene Index (Version 6.0, Release Date – January 11, 2003) and the Apicomplexan EST Database (ApiESTDB). Both these indices were generated with supervised clustering. The correlation was done by computational extraction of the annotation associated with each TIGR Tentative Consensus (TC), followed by manual checking to determine if the annotation obtained in our analysis was identical to that of the TIGR TCs. This was done only for differentially expressed contigs. If the annotations were not identical, the reconstructed sequence was excluded from further analysis. ApiESTDB was consulted when additional support was required to make a decision.
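As an illustration of the E-value cut-off, the sketch below keeps the best hit per reconstructed transcript from BLAST results. It assumes the results were saved in the standard 12-column tab-delimited BLAST format, in which the E-value is the eleventh column; the source does not state which output format was used, and it does not say whether the threshold was inclusive, so `<=` is also an assumption. The function name and the identifiers in the usage lines are hypothetical.

```python
def significant_best_hits(tabular_lines, max_evalue=1e-10):
    """Map each query to its lowest-E-value hit that passes the cut-off."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    # Hypothetical rows: query, subject, %identity, length, mismatches, gaps,
    # q.start, q.end, s.start, s.end, E-value, bit score.
    rows = [
        "contig_A\tprotein_1\t98.0\t150\t3\t0\t1\t450\t1\t150\t2e-35\t150",
        "contig_A\tprotein_2\t60.0\t80\t32\t0\t1\t240\t5\t84\t1e-12\t70",
        "contig_B\tprotein_3\t40.0\t50\t30\t0\t1\t150\t9\t58\t0.001\t35",
    ]
    print(significant_best_hits(rows))
```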

Mining gene ontology annotation associated with transcripts

Genes classified as being involved in glycolysis (GO:0006096), proteolysis (GO:0006508) or targeted to the plastid (GO:0009536) were retrieved by searching PlasmoDB gene overview page for the respective GO identification (ID) number in a similar way as described for the protein expression profile except the search text was the respective GO ID preceded by the greater than sign (>) for example >GO:0006096. This text limits the search to the Gene Ontology section of the gene overview page. The number of genes retrieved was: 20 for glycolysis, 98 for proteolysis and 553 for plastid component. This corresponds to values obtained from the web-based PlasmoDB query page.
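The same text-matching idea applies to the Gene Ontology search. The Python sketch below checks a saved gene overview page for each GO ID preceded by the greater-than sign, as described above. The dictionary keys and the function name are hypothetical labels, and the HTML fragment in the usage lines is invented for illustration.

```python
# GO identifiers as given in the text; the short labels are for readability.
GO_TERMS = {
    "glycolysis": "GO:0006096",
    "proteolysis": "GO:0006508",
    "plastid": "GO:0009536",
}


def go_categories(page_text):
    """Return the labels whose GO ID appears in the page preceded by '>'."""
    return [label for label, go_id in GO_TERMS.items()
            if ">" + go_id in page_text]


if __name__ == "__main__":
    # Hypothetical HTML: one tagged GO ID and one plain-text mention.
    page = "<td>GO:0006096</td> see also GO:0006508"
    print(go_categories(page))
```

The leading `>` is what restricts a match to a GO ID that directly follows an HTML tag, so a plain-text mention elsewhere on the page is ignored.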

Correlation of EST-based abundance with microarray expression levels

The numbers of ESTs used to generate a reconstructed sequence were retrieved from the FASTA sequence description line of all reconstructed sequences generated by stackPACK 2.2. The levels of expression or average signal intensities obtained from microarray experiments on the serine repeat antigen (SERA) gene family of P. falciparum [19, 42–44] were compared with the levels of expression obtained using ESTs. This gene family is characterised by a cysteine proteinase framework [39] and was selected because its members are annotated as being involved in proteolysis. The availability of published microarray studies on this family facilitated comparative analysis with EST data.


Transcript reconstruction and functional annotation of transcripts

Transcript reconstruction using stackPACK 2.2 resulted in 1,760 contigs and 3,391 singletons. A total of 569 transcripts had an EST count of at least five ESTs. Functional annotation by similarity searching was performed for all reconstructed transcripts. A total of 210 transcripts that were differentially expressed were manually checked for correlation with TIGR and/or ApiESTDB P. falciparum gene indices. This process yielded 194 transcripts with correlated functional annotation.

Differentially expressed transcripts and protein expression profiling

The majority of the stage-overexpressed transcripts were from the late gametocyte stage. However, the mixed asexual stage had the highest percentage (83%) of genes with evidence of protein expression in the same stage (stage-correlated protein expression) compared to 31% for the late gametocyte stage. The observations are summarised in Tables 2 to 5. The 194 transcripts differentially expressed between the two libraries consisted of 51 from the mixed asexual stage and 143 from the late gametocyte stage. The complete list with transcript identification used in this study, correlated transcripts in the TIGR P. falciparum gene index, gene locus name, gene product description, representative EST or ESTs (for genes with representation from both libraries), observed and normalized EST counts for the two stages, as well as protein expression profile, are presented in the additional files 1 and 2 for mixed asexual stage and late gametocyte stage respectively. A list of stage-overexpressed transcripts that match those of Li et al. [13] is presented in additional file 3.

Table 2 Summary of functional annotation and protein expression of Plasmodium falciparum transcripts
Table 3 Asexual-overexpressed Plasmodium falciparum transcripts
Table 4 Gametocyte-overexpressed Plasmodium falciparum transcripts
Table 5 Distribution of protein expression profiles for Plasmodium falciparum stage-overexpressed genes

A total of 128 gametocyte-overexpressed and 48 asexual-overexpressed transcripts had a significant match with the predicted P. falciparum 3D7 proteins. Seventy-four genes (40 asexual-overexpressed, 34 gametocyte-overexpressed) showed evidence of stage-correlated protein expression (Tables 3 and 4). The well-studied S-antigen (PF10_0343) is one of the 8 asexual-overexpressed genes without stage-correlated protein expression. Four gametocyte-overexpressed genes (PFB0730w, PFI1210w, PF10_0115 and PFL0105w) had more than one reconstructed transcript. Multiple transcripts were generated when the reconstructed transcripts associated with a gene were not contiguous, and thus were not assembled into the same contig. Fifty-three of the 74 genes were classified as novel in that the description of the gene product is either labelled hypothetical protein or contains the word putative.

In order to identify gametocyte-overexpressed genes that also have stage-correlated protein expression in the proteomics data of Lasonder et al. [21], the spreadsheet file containing 1,289 unique malaria proteins from that study was processed to yield a 3-digit binary accession representing evidence for protein expression of genes in trophozoites/schizonts, gametocytes and gametes. Fifteen of the 34 gametocyte-overexpressed genes were detected by both proteomic analyses (Table 6). Our analysis points to the need to clarify potential confusion in the annotation of the sexual stage specific protein precursor or Pfs16 (PFD0310w), a known marker for the earliest events of sexual differentiation [45]. The locus name (PF11_0318) of another gene, PF16, may be assigned to this gene [21]. PF16 has sequence similarity to a sperm flagella protein localized to the central pair of the axoneme. The gametocyte-overexpressed gene identified in this study was confirmed to be Pfs16 and not PF16 by the identical functional annotation of the associated consensus sequence from this study and that in the TIGR P. falciparum gene index.

Table 6 Gametocyte-overexpressed Plasmodium falciparum genes with correlated protein expression in two proteomic studies

The identified asexual-overexpressed genes that have been experimentally characterised have known roles in protein degradation, purine salvage, rhoptry biogenesis and protein trafficking, schizont rupture, merozoite invasion, phospholipid biosynthesis, nuclear metabolism, oxidative stress defense, cell proliferation and membrane biogenesis.

Mining gene ontology annotation associated with transcripts

Glyceraldehyde-3-phosphate dehydrogenase (PF14_0598) and ATP-dependent phosphofructokinase (PF11_0294) are two of 20 genes known to be involved in glycolysis. They demonstrate differential expression and show evidence of stage-correlated protein expression.

Microarray average intensities [19] available in PlasmoDB for PF11_0294 support its gametocyte-overexpression when compared to a closely related gene, PFI0755c that also codes for a phosphofructokinase and shows protein expression in intraerythrocytic stages [20, 21]. The microarray expression values for PFI0755c in trophozoite and schizont stages are 17,223.33 and 7,894 respectively in contrast to ~1,600 in both stages for PF11_0294. Inspection of the predicted protein features of PF11_0294 revealed the presence of two protein domains: gonadotropin-releasing domain, GnRH (Pfam ID: PF00446) and laminin N-terminal (Domain VI) (Pfam ID: PF00055). These domains are found in proteins that are extracellular and have a role in regulation of germ cell development.

PFB0340c, a cysteine protease and member of the SERA gene family, was significantly overexpressed in the mixed asexual stage. Other genes in the SERA family for which EST data were available were checked for correlation of functional annotation and their EST counts retrieved. As shown in Table 7, the EST counts were variable across the gene family, consistent with microarray-based studies [42–44]. There was EST evidence for expression of PFB0345c (SERA4), PFB0340c (SERA5) and PFB0335c (SERA6), the three central genes that were demonstrated to be essential for asexual stage growth [42]. The GenBank accession numbers of a representative EST from these genes are BI936220, BI815392 and BQ633262 respectively. PFB0340c showed the highest EST count and microarray intensity values during asexual development of the parasite. Furthermore, multiple contigs mapped to this gene, which may represent alternative transcripts.

Table 7 Correlation of EST abundance and microarray intensity associated with SERA gene family

Out of the 17 transcripts (four asexual and 13 gametocyte) associated with genes targeted to the apicoplast, only two genes: MAL13P1.281 and PFE0145w have similarities to known genes (glutamate-tRNA ligase and 50S ribosomal subunit protein L28). There was evidence of protein expression in at least one asexual stage for two (PF07_0087, PF14_0543) of the four asexual-overexpressed genes (Table 3). Six gametocyte-overexpressed genes showed evidence for expression in the sporozoite stage while only PF11_0525 showed evidence in the sporozoite and gametocyte stages. PF11_0525 has predicted protein motifs that indicate its likely function. The domains are IQ (calmodulin-binding motif, Pfam ID: PF00612) and LysM (lysin motif, Pfam ID: PF01476), which is a general peptidoglycan-binding module. A list of apicoplast-targeted genes with stage-overexpressed transcripts is presented in additional file 4.


An integrative approach was used to determine genes associated with transcripts differentially expressed between mixed asexual stage and late stage gametocyte parasites. The publication of the genome sequence of two malaria parasites presents opportunities for post-genomic era malaria research including gene discovery and comprehensive understanding of gene expression [46]. The study has revealed (1) possible regulatory mechanisms in malaria parasites' gametocyte maturation, (2) correlation between EST and microarray data for a P. falciparum gene family to present unique EST-derived information, (3) candidate genes on which computational and experimental studies can be performed, and (4) the need for more empirical studies on gene and protein expression in malaria parasites.

A total of 569 contigs were used to determine stage-overexpression. This represents 366 more contigs than described by Li et al. [13], reflecting inclusion of new mixed asexual stage ESTs deposited after March 2002. Only 21 of the 24 significantly stage-specific transcripts identified by Li et al. [13] were among our stage-overexpressed transcripts after correlation of functional annotation. Both studies demonstrate the asexual-overexpression of the gene for glyceraldehyde-3-phosphate dehydrogenase (GAPDH), an important gene in the glycolytic pathway [47].

Gene and protein expression were observed, as well as protein domain evidence for specialization or adaptation of ATP-dependent phosphofructokinase (PF11_0294) for metabolic coupling of glucose utilization and maturation of gametocytes in malaria parasites. This enzyme is of major regulatory importance in Plasmodium and has been characterised only in Plasmodium berghei [48]. In addition, it has been proposed as a potential drug target in protozoan parasites [49]. Two genes (PF11_0294, PFI0755c) annotated as phosphofructokinase are present in the genome [7]. This is consistent with the fact that many key enzymes in the glycolytic pathway occur as isoenzymes [48]. Interestingly, PF11_0294 possesses a gonadotropin-releasing domain GnRH and laminin N-terminal (Domain VI) that are thought to regulate germ cell development. PFI0755c does not contain these domains.

PF11_0525 is the only apicoplast-targeted gene associated with a gametocyte-overexpressed transcript that showed stage-correlated protein expression. The fact that germ cell biology is conserved in evolution enables us to speculate on the possible roles of this protein. The calmodulin (CaM) binding site has been extensively studied in a sperm autoantigen (Sp17), which is a zona binding protein and a member of the family of CaM binding proteins that contain the IQ motif in the CaM binding domain. This domain has a regulatory role and undergoes proteolytic processing at the initiation of an acrosome reaction [50]. Some bacterial proteins such as hydrolytic enzymes contain the general peptidoglycan-binding module (LysM) and have a role in cell-wall penetration [51]. PF11_0525 does not have evidence of a bipartite peptide for apicoplast targeting and thus may be targeted via a different mechanism to the organelle or it may no longer function in the plastid.

The EST counts of the SERA gene family are comparable with the gene expression levels observed in microarray experiments. Both technologies agree that expression levels of members are variable, as is expression of the central genes during the asexual stage of the parasite. PFB0340c (SERA5) is the first described member of the family [39] and is also a malaria vaccine candidate [52]. The EST counts observed for PFB0340c are consistent with high gene expression levels in trophozoites and schizonts in published microarray experiments. Specifically, Miller et al. [42] and Aoki et al. [52] observed PFB0340c to be substantially more strongly transcribed than other SERA genes.

The increasing amount of published and unpublished data from microarray, SAGE, EST and differential display on malaria parasites shows that pairwise correlation is required. Comparison of such datasets obtained from different gene expression technologies can complement less sensitive technologies, hence adding value to data generation from these methods. For example, this study provides identity of ESTs and also potential alternative transcripts that can be used to further characterize the SERA central genes. Furthermore, PFB0325c (SERA8) did not have EST evidence consistent with low or absent expression observed in the microarray studies. However, there was evidence of its expression in the sporozoite stage, indicating the gene may be functional in other stages of the life cycle as speculated by Miller et al. [42]. Large-scale comparative expression analysis of gene families in multiple malaria parasites is needed to advance the knowledge of their evolution and their role during intraerythrocytic development.

The two uncharacterized genes for which we speculate on functional insights, PF11_0294 and PF11_0525, have putative orthologues in P. yoelii yoelii (PY05918 and PY06990 respectively) [8] and were also detected in two independent proteomic analyses as expressed in the mature gametocyte stage [20, 21]. These observations strengthen the need for further studies on these genes and the possibility of studies with model malaria parasites. In general, various categories of candidate genes were provided that can be intensively studied as drug targets, antigenic targets, epidemiological or clinical markers. Eighty-seven of the 121 gametocyte-overexpressed genes did not show evidence of stage-correlated protein expression, while 15 of those with such evidence were corroborated by the two proteomics studies. These corroborated genes represent a set of gametocyte-overexpressed genes with correlated transcription and translation data and are thus candidates for studies on gametocyte maturation in malaria parasites. A shortlist of stage-overexpressed genes targeted to the plastid is presented to facilitate studies to understand the regulation of plastid metabolism in malaria parasites.

This study has identified the lack of correlation between gene and protein expression of the asexual-overexpressed S-antigen, consistent with observations from published proteome analysis [20]. This observation, those from the gametocyte-overexpressed transcripts, and the comparison of outputs from EST clustering efforts demonstrate that our integrative approach can be used to compare outputs of different post-genomic analyses. The analysis indicates the need for additional empirical studies on gene and protein expression in malaria parasites. Such studies could improve current understanding of discrepancies between gene and protein expression profiling data as well as the detection of proteins with unique characteristics such as proteolytic processing, post-translational modification and sub-cellular location.


The value of integrating a variety of datasets to unravel undiscovered regulation in biological processes during the gametocyte maturation stages of P. falciparum was demonstrated. Furthermore, comparative analysis of EST and microarray data was performed on the SERA gene family to advance the knowledge of their gene regulation and additional functional genomics reagents were presented to facilitate their study. Finally, the integrative approach was shown as a means to appraise critically the data quality of the increasing number of post-genomic datasets from malaria parasites.


  1.

    Carlton JM, Muller R, Yowell CA, Fluegge MR, Sturrock KA, Pritt JR, Vargas-Serrato E, Galinski MR, Barnwell JW, Mulder N, Kanapin A, Cawley SE, Hide WA, Dame JB: Profiling the malaria genome: a gene survey of three species of malaria parasite with comparison to other apicomplexan species. Mol Biochem Parasitol. 2001, 118: 201-210. 10.1016/S0166-6851(01)00371-1.

  2.

    Davids W, Gamieldien J, Liberles DA, Hide W: Positive selection scanning reveals decoupling of enzymatic activities of carbamoyl phosphate synthetase in Helicobacter pylori. J Mol Evol. 2002, 54: 458-464. 10.1007/s00239-001-0029-6.

  3.

    Gamieldien J, Ptitsyn A, Hide W: Eukaryotic genes in Mycobacterium tuberculosis could have a role in pathogenesis and immunomodulation. Trends Genet. 2002, 18: 5-8. 10.1016/S0168-9525(01)02529-X.

  4.

    Mathe C, Sagot MF, Schiex T, Rouze P: Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res. 2002, 30: 4103-4117. 10.1093/nar/gkf543.

  5.

    Breman JG: The ears of the hippopotamus: manifestations, determinants, and estimates of the malaria burden. Am J Trop Med Hyg. 2001, 64: 1-11.

  6.

    WHO/UNICEF: The Africa Malaria Report 2003. 2003, Geneva: WHO/UNICEF

  7.

    Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419: 498-511. 10.1038/nature01097.

  8.

    Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ: Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature. 2002, 419: 512-519. 10.1038/nature01099.

  9.

    Carlton J: The Plasmodium vivax genome sequencing project. Trends Parasitol. 2003, 19: 227-231. 10.1016/S1471-4922(03)00066-7.

  10.

    Kappe SH, Gardner MJ, Brown SM, Ross J, Matuschewski K, Ribeiro JM, Adams JH, Quackenbush J, Cho J, Carucci DJ, Hoffman SL, Nussenzweig V: Exploring the transcriptome of the malaria sporozoite stage. Proc Natl Acad Sci U S A. 2001, 98: 9895-9900. 10.1073/pnas.171185198.

  11.

    Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J: The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 2001, 29: 159-164. 10.1093/nar/29.1.159.

  12.

    Kongkasuriyachai D, Kumar N: Functional characterisation of sexual stage specific proteins in Plasmodium falciparum. Int J Parasitol. 2002, 32: 1559-1566. 10.1016/S0020-7519(02)00184-4.

  13.

    Li L, Brunk BP, Kissinger JC, Pape D, Tang K, Cole RH, Martin J, Wylie T, Dante M, Fogarty SJ, Howe DK, Liberator P, Diaz C, Anderson J, White M, Jerome ME, Johnson EA, Radke JA, Stoeckert CJ, Waterston RH, Clifton SW, Roos DS, Sibley LD: Gene discovery in the apicomplexa as revealed by EST sequencing and assembly of a comparative gene database. Genome Res. 2003, 13: 443-454. 10.1101/gr.693203.

  14.

    Watanabe J, Sasaki M, Suzuki Y, Sugano S: Analysis of transcriptomes of human malaria parasite Plasmodium falciparum using full-length enriched library: identification of novel genes and diverse transcription start sites of messenger RNAs. Gene. 2002, 291: 105-113. 10.1016/S0378-1119(02)00552-8.

  15.

    Munasinghe A, Patankar S, Cook BP, Madden SL, Martin RK, Kyle DE, Shoaibi A, Cummings LM, Wirth DF: Serial analysis of gene expression (SAGE) in Plasmodium falciparum: application of the technique to A-T rich genomes. Mol Biochem Parasitol. 2001, 113: 23-34. 10.1016/S0166-6851(00)00378-9.

  16.

    Patankar S, Munasinghe A, Shoaibi A, Cummings LM, Wirth DF: Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite. Mol Biol Cell. 2001, 12: 3114-3125.

  17.

    Hayward RE, DeRisi JL, Alfadhli S, Kaslow DC, Brown PO, Rathod PK: Shotgun DNA microarrays and stage-specific gene expression in Plasmodium falciparum malaria. Mol Microbiol. 2000, 35: 6-14. 10.1046/j.1365-2958.2000.01730.x.

  18.

    Ben Mamoun C, Gluzman IY, Hott C, MacMillan SK, Amarakone AS, Anderson DL, Carlton JM, Dame JB, Chakrabarti D, Martin RK, Brownstein BH, Goldberg DE: Co-ordinated programme of gene expression during asexual intraerythrocytic development of the human malaria parasite Plasmodium falciparum revealed by microarray analysis. Mol Microbiol. 2001, 39: 26-36. 10.1046/j.1365-2958.2001.02222.x.

  19.

    Bozdech Z, Zhu J, Joachimiak MP, Cohen FE, Pulliam B, DeRisi JL: Expression profiling of the schizont and trophozoite stages of Plasmodium falciparum with a long-oligonucleotide microarray. Genome Biol. 2003, 4: R9. 10.1186/gb-2003-4-2-r9.

  20.

    Florens L, Washburn MP, Raine JD, Anthony RM, Grainger M, Haynes JD, Moch JK, Muster N, Sacci JB, Tabb DL, Witney AA, Wolters D, Wu Y, Gardner MJ, Holder AA, Sinden RE, Yates JR, Carucci DJ: A proteomic view of the Plasmodium falciparum life cycle. Nature. 2002, 419: 520-526. 10.1038/nature01107.

  21.

    Lasonder E, Ishihama Y, Andersen JS, Vermunt AM, Pain A, Sauerwein RW, Eling WM, Hall N, Waters AP, Stunnenberg HG, Mann M: Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature. 2002, 419: 537-542. 10.1038/nature01111.

  22.

    Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA: A comprehensive approach to clustering of expressed human gene sequence: the sequence tag alignment and consensus knowledge base. Genome Res. 1999, 9: 1143-1155. 10.1101/gr.9.11.1143.

  23.

    Okubo K, Hori N, Matoba R, Niiyama T, Fukushima A, Kojima Y, Matsubara K: Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nat Genet. 1992, 2: 173-179.

  24.

    Christoffels A, van Gelder A, Greyling G, Miller R, Hide T, Hide W: STACK: Sequence Tag Alignment and Consensus Knowledgebase. Nucleic Acids Res. 2001, 29: 234-238. 10.1093/nar/29.1.234.

  25.

    Yuan J, Liu Y, Wang Y, Xie G, Blevins R: Genome analysis with gene-indexing databases. Pharmacol Ther. 2001, 91: 115-132. 10.1016/S0163-7258(01)00151-6.

  26.

    Lee Y, Sultana R, Pertea G, Cho J, Karamycheva S, Tsai J, Parvizi B, Cheung F, Antonescu V, White J, Holt I, Liang F, Quackenbush J: Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res. 2002, 12: 493-502. 10.1101/gr.212002.

  27.

    Cui L, Rzomp KA, Fan Q, Martin SK, Williams J: Plasmodium falciparum: differential display analysis of gene expression during gametocytogenesis. Exp Parasitol. 2001, 99: 244-254. 10.1006/expr.2001.4669.

  28.

    Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, Ginsburg H, Gupta D, Kissinger JC, Labo P, Li L, Mailman MD, Milgram AJ, Pearson DS, Roos DS, Schug J, Stoeckert CJ, Whetzel P: PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 2003, 31: 212-215. 10.1093/nar/gkg081.

  29.

    Chanda SK, Caldwell JS: Fulfilling the promise: drug discovery in the post-genomic era. Drug Discov Today. 2003, 8: 168-174. 10.1016/S1359-6446(02)02595-3.

  30.

    Audic S, Claverie JM: The significance of digital gene expression profiles. Genome Res. 1997, 7: 986-995.

  31.

    Romualdi C, Bortoluzzi S, D'Alessi F, Danieli GA: IDEG6: a web tool for detection of differentially expressed genes in multiple tag sampling experiments. Physiol Genomics. 2003, 12: 159-162.

  32.

    Mekhedov S, de Ilarduya OM, Ohlrogge J: Toward a functional catalog of the plant genome. A survey of genes for lipid biosynthesis. Plant Physiol. 2000, 122: 389-402. 10.1104/pp.122.2.389.

  33.

    Lizotte-Waniewski M, Tawe W, Guiliano DB, Lu W, Liu J, Williams SA, Lustigman S: Identification of potential vaccine and drug target candidates by expressed sequence tag analysis and immunoscreening of Onchocerca volvulus larval cDNA libraries. Infect Immun. 2000, 68: 3491-3501. 10.1128/IAI.68.6.3491-3501.2000.

  34.

    Megy K, Audic S, Claverie JM: Heart-specific genes revealed by expressed sequence tag (EST) sampling. Genome Biol. 2002, 3: RESEARCH0074.

  35.

    Miller LH, Baruch DI, Marsh K, Doumbo OK: The pathogenic basis of malaria. Nature. 2002, 415: 673-679. 10.1038/415673a.

  36.

    Day KP, Hayward RE, Smith D, Culvenor JG: CD36-dependent adhesion and knob expression of the transmission stages of Plasmodium falciparum is stage specific. Mol Biochem Parasitol. 1998, 93: 167-177. 10.1016/S0166-6851(98)00040-1.

  37.

    Sinden R: Gametocytes and sexual development. In Malaria parasite biology, pathogenesis, and protection. Edited by: Sherman IW. 1998, Washington, DC: ASM Press, 25-47.

  38.

    Black CG, Wang L, Hibbs AR, Werner E, Coppel RL: Identification of the Plasmodium chabaudi homologue of merozoite surface proteins 4 and 5 of Plasmodium falciparum. Infect Immun. 1999, 67: 2075-2081.

  39.

    Mercereau-Puijalon O, Barale JC, Bischoff E: Three multigene families in Plasmodium parasites: facts and questions. Int J Parasitol. 2002, 32: 1323-1344. 10.1016/S0020-7519(02)00111-X.

  40.

    Lang-Unnasch N, Murphy AD: Metabolic changes of the malaria parasite during the transition from the human to the mosquito host. Annu Rev Microbiol. 1998, 52: 561-590. 10.1146/annurev.micro.52.1.561.

  41.

    Burke J, Davison D, Hide W: d2_cluster: a validated method for clustering EST and full-length cDNA sequences. Genome Res. 1999, 9: 1135-1142. 10.1101/gr.9.11.1135.

  42.

    Miller SK, Good RT, Drew DR, Delorenzi M, Sanders PR, Hodder AN, Speed TP, Cowman AF, Koning-Ward TF, Crabb BS: A subset of Plasmodium falciparum SERA genes are expressed and appear to play an important role in the erythrocytic cycle. J Biol Chem. 2002, 277: 47524-47532. 10.1074/jbc.M206974200.

  43.

    Le Roch KG, Zhou Y, Batalov S, Winzeler EA: Monitoring the chromosome 2 intraerythrocytic transcriptome of Plasmodium falciparum using oligonucleotide arrays. Am J Trop Med Hyg. 2002, 67: 233-243.

  44.

    Wu Y, Wang X, Liu X, Wang Y: Data-mining approaches reveal hidden families of proteases in the genome of malaria parasite. Genome Res. 2003, 13: 601-616. 10.1101/gr.913403.

  45.

    Dechering KJ, Kaan AM, Mbacham W, Wirth DF, Eling W, Konings RN, Stunnenberg HG: Isolation and functional characterization of two distinct sexual-stage-specific promoters of the human malaria parasite Plasmodium falciparum. Mol Cell Biol. 1999, 19: 967-978.

  46.

    Horrocks P, Bowman S, Kyes S, Waters AP, Craig A: Entering the post-genomic era of malaria research. Bull World Health Organ. 2000, 78: 1424-1437.

  47.

    Campanale N, Nickel C, Daubenberger CA, Wehlan DA, Gorman JJ, Klonis N, Becker K, Tilley L: Identification and characterization of heme-interacting proteins in the malaria parasite, Plasmodium falciparum. J Biol Chem. 2003, 278: 27354-27361. 10.1074/jbc.M303634200.

  48.

    Sherman IW: Carbohydrate metabolism of asexual stages. In Malaria parasite biology, pathogenesis, and protection. Edited by: Sherman IW. 1998, Washington, DC: ASM Press, 135-145.

  49.

    Chi AS, Deng Z, Albach RA, Kemp RG: The two phosphofructokinase gene products of Entamoeba histolytica. J Biol Chem. 2001, 276: 19974-19981. 10.1074/jbc.M011584200.

  50.

    Wen Y, Richardson RT, O'Rand MG: Processing of the sperm protein Sp17 during the acrosome reaction and characterization as a calmodulin binding protein. Dev Biol. 1999, 206: 113-122. 10.1006/dbio.1998.9137.

  51.

    Bateman A, Bycroft M: The structure of a LysM domain from E. coli membrane-bound lytic murein transglycosylase D (MltD). J Mol Biol. 2000, 299: 1113-1119. 10.1006/jmbi.2000.3778.

  52.

    Aoki S, Li J, Itagaki S, Okech BA, Egwang TG, Matsuoka H, Palacpac NM, Mitamura T, Horii T: Serine repeat antigen (SERA5) is predominantly expressed among the SERA multigene family of Plasmodium falciparum, and the acquired antibody titers correlate with serum inhibition of the parasite growth. J Biol Chem. 2002, 277: 47533-47540. 10.1074/jbc.M207145200.



The authors thank colleagues at the South African National Bioinformatics Institute for useful suggestions and staff of Electric Genetics for stackPACK support. RDI is a Claude Harris Leon Foundation Fellow and thanks the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR) and the Malaria Research and Reference Reagent Resource Center (MR4) for grants to attend workshops on Malaria Bioinformatics and Microarrays.

Author information



Corresponding author

Correspondence to Raphael D Isokpehi.


About this article

Cite this article

Isokpehi, R.D., Hide, W.A. Integrative analysis of intraerythrocytic differentially expressed transcripts yields novel insights into the biology of Plasmodium falciparum. Malar J 2, 38 (2003).



  • Malaria Parasite
  • Protein Expression Profile
  • Asexual Stage
  • Gametocyte Stage
  • Reconstructed Transcript