
Table 1 Comparison of Plasmodium falciparum, Saccharomyces cerevisiae, Arabidopsis thaliana and Homo sapiens genomic statistics

From: Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

 

| Statistic | Plasmodium falciparum | Saccharomyces cerevisiae | Arabidopsis thaliana | Homo sapiens |
|---|---|---|---|---|
| Genome general statistics | | | | |
| No. of chromosomes | 14 | 16 | 5 | 22 + X/Y |
| Size (bp) | 22,853,764 | 12,495,682 | 115,409,949 | 3,272,187,692 |
| Average (A+T) % | 80.6 | 61.7 | 65.1 | 59.0 |
| Estimated number of genes | 5,268 | 5,770 | 25,498 | 31,778 |
| Average gene length (bp) | 2,283 | 1,424 | 1,310 | 1,340 |
| % of coding genome | 53 | 66 | 29 | 9 |
| Initial annotation based on sequence similarity (BLAST or *Smith-Waterman E-values) | | | | |
| Proportion of predicted protein sequences having a detectable similarity to sequences, in other organisms, of known function at the initial genome release date | 34 % | 75 % | 69 % | 59 %* |
| Proportion of predicted protein sequences without any detectable similarity to sequences in other organisms at the initial genome release date, i.e. "no BLASTP match to known proteins" (estimates based on published data and local BLAST searches) | 61 % | < 8 % | < 20 % | 15 %* |
| Proportion of predicted protein sequences of totally unknown function (hypothetical proteins = with similarity to sequences of unknown function + without any detectable similarity to sequences in other organisms) | 66 % | 16 % | 31 % | 41 %* |
| Average characteristics of open reading frames | | | | |
| Exons: No. per gene | 2.39 | 1.05 | 5.18 | 12.1 |
| Exons: (A+T) % | 76.3 | 60 | 55 | 52 |
| Exons: average length (bp) | 949 | 1,356 | 253 | 111 |
| Introns: (A+T) % | 86.5 | 64 | 66 | 60 |
| Intergenic regions: (A+T) % | 86.4 | 64 | 66 | 60 |

The data presented are compiled from [22] for Plasmodium falciparum, [190] for yeast (completed with statistics made available via the Comprehensive Yeast Genome Database website, [191]), the Arabidopsis Genome Initiative [192] for Arabidopsis, and the International Human Genome Sequencing Consortium [193] and [194] for human (completed with statistics made available via Ensembl, [195]). These statistics describe each genome at its complete genome release date; they have been continuously updated since then.
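
As an illustration of how the genome size and gene count figures in Table 1 can be compared across organisms, the following minimal Python sketch derives a gene density (genes per megabase) for each genome. The dictionary name `GENOMES` and the function `genes_per_mb` are hypothetical names introduced here for illustration; gene density is not a quantity reported in the table itself, only a simple ratio of two of its rows.

```python
# Values copied from Table 1: (genome size in bp, estimated number of genes).
# GENOMES and genes_per_mb are illustrative names, not part of the source.
GENOMES = {
    "Plasmodium falciparum": (22_853_764, 5_268),
    "Saccharomyces cerevisiae": (12_495_682, 5_770),
    "Arabidopsis thaliana": (115_409_949, 25_498),
    "Homo sapiens": (3_272_187_692, 31_778),
}


def genes_per_mb(size_bp: int, n_genes: int) -> float:
    """Return gene density in genes per megabase (1 Mb = 1,000,000 bp)."""
    return n_genes / (size_bp / 1_000_000)


if __name__ == "__main__":
    for organism, (size_bp, n_genes) in GENOMES.items():
        density = genes_per_mb(size_bp, n_genes)
        print(f"{organism}: {density:.1f} genes per Mb")
```

Under this simple ratio, the two smaller genomes come out far denser in genes than the human genome, which is in line with the "% of coding genome" row of the table.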