
 

 

Preliminary analysis of biological relevance

This is a preliminary analysis of the biological relevance of the results presented in Table I below. We know that it is not a thorough comparative analysis, but we think the data suggest the following features, which we highlight:

  1. As already known, the (CG)n sequence occurs with low frequency. Nevertheless, in most eukaryotes it occurs more often than expected on a random basis (x>0).
  2. Among simple repeats of 4 bases or less that contain only A and T, the (AATT)n repeat is the least frequent. If AT-rich repeats were shown to play a role, this repeat would appear to be the least suitable.
  3. Among the prokaryotes studied, it is surprising that Mycoplasma genitalium, in spite of its small genome, contains an A19 string. Other more complex repeats in this organism have been described by Hancock (1996).
  4. Mycobacterium leprae has a unique bacterial genome (Cole et al., 2001), since more than half of the genome is not functional. It contains long AT repeats (17 and 18 times), which are unusual in bacterial genomes.
  5. Among the eukaryotes, we have looked at only a small subset of the sequences available. The presence of some extremely long exact repeats in Dictyostelium discoideum, (AAT)88, and in Caenorhabditis elegans, G483 and (AT)337, is striking. Long (AAT)n and (AAAT)n repeats have also been found in Drosophila. Similar repeats might be present in other eukaryotes if additional sequences were analyzed.
  6. In the yeasts no such extremely long repeats have been found. The longest is (AAT)35.
  7. The highly reduced genome of the eukaryotic algal nucleus in Guillardia theta (Douglas et al., 2001) still contains some significant repeats.
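
The copy numbers quoted above relate to the lengths reported in Table I through the size of the repeat unit: (AAT)88, for example, corresponds to 264 bases. The Python sketch below shows one possible way to measure the longest exact tandem repeat of a unit in a sequence. It is not the mrepatt implementation; the function name is hypothetical, and it assumes a single strand is scanned, with no special treatment of reverse complements or rotated units.

```python
import re


def longest_tandem_repeat(sequence, unit):
    """Return (copies, length_in_bases) of the longest exact tandem run of `unit`.

    Hypothetical helper for illustration only; matching is case-insensitive.
    """
    if not unit:
        raise ValueError("unit must be a non-empty string")
    # A lookahead lets runs be tested at every start position, so a long run
    # that begins inside a shorter one is not missed.
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))", re.IGNORECASE)
    longest = max((len(m.group(1)) for m in pattern.finditer(sequence)), default=0)
    return longest // len(unit), longest


if __name__ == "__main__":
    # Toy sequence, not real genome data.
    copies, bases = longest_tandem_repeat("ccaataataatgg", "aat")
    print(f"(AAT){copies} = {bases} bases")
```

This version rescans each run from every start position, so it is only meant for small examples; a genome-scale search would need a linear-time scan.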

References

  Cole,S.T., Eiglmeier,K., Parkhill,J., James,K.D., Thomson,N.R., Wheeler,P.R., Honoré,N., Garnier,T., Churcher,C., Harris,D. et al. (2001) Massive gene decay in the leprosy bacillus, Nature, 409, 1007-1011.
  Douglas,S., Zauner,S., Fraunholz,M., Beaton,M., Penny,S., Deng,L.-T., Wu,X., Reith,M., Cavalier-Smith,T. and Maier,U.-G. (2001) The highly reduced genome of an enslaved algal nucleus, Nature, 410, 1091-1096.
  Hancock,J.M. (1996) Simple sequences in a 'minimal' genome, Nature Genetics, 14, 14-15.

 

Table I



Organism                                      size(Mb)  a         c         ac        at        cg        ct        aat       aaat      aatt
Bacteria1                                     26.6      16(1)     15(2)     12(1)     12(1)     12(1)     12(0)     18(3)     16(1)     12(0)
Archaeoglobus fulgidus                        2.08      15(1)     9(-1)     8(0)      8(1)      6(-1)     10(0)     12(1)     12(1)     8(0)
Escherichia coli K12                          4.42      9(-1)     10(-1)    10(0)     10(0)     10(0)     10(0)     12(0)     12(0)     12(0)
Mycobacterium leprae                          3.12      9(-1)     22(6)     18(4)     36(19)    12(0)     8(-1)     9(0)      8(0)      8(0)
Mycoplasma pneumoniae                         0.78      16(1)     8(-1)     10(0)     8(0)      8(0)      22(8)     9(0)      8(0)      8(0)
Mycoplasma genitalium                         0.55      19(1)     7(0)      8(0)      8(-1)     4(0)      8(0)      12(0)     12(0)     12(0)

Dictyostelium discoideum                      49.96     206(63)   247(153)  74(40)    214(102)  8(2)      106(68)   264(113)  136(50)   52(16)
Trypanosoma brucei (chromosomes 1,9,10)       10.72     36(10)    19(5)     118(61)   58(31)    10(0)     30(12)    189(105)  40(15)    16(2)

Schizosaccharomyces pombe                     12.39     39(9)     19(7)     36(15)    36(14)    8(0)      30(11)    33(10)    24(5)     16(2)
Saccharomyces cerevisiae                      11.21     42(12)    19(6)     60(30)    40(16)    10(1)     62(30)    105(49)   52(20)    16(1)

Arabidopsis thaliana                          113.02    47(13)    27(11)    50(23)    70(33)    12(2)     194(106)  177(86)   24(3)     20(3)
Guillardia theta                              0.53      20(2)     12(4)     20(9)     10(0)     8(2)      24(10)    15(1)     12(0)     12(-1)

Homo sapiens (chromosomes 20,21,22)           122.20    60(22)    23(5)     98(51)    94(53)    20(11)    88(42)    60(28)    56(24)    28(8)
Drosophila melanogaster                       118.43    71(24)    25(9)     118(65)   80(39)    12(0)     76(41)    252(127)  440(219)  40(13)
Caenorhabditis elegans (chrom. 1,2,3,4,5,10)  95.46     45(9)     483(308)  222(133)  674(308)  16(4)     192(112)  84(38)    168(77)   20(3)
Mus musculus (chromosomes 1,2,3)              21.21     49(19)    31(11)    74(36)    114(67)   20(13)    100(48)   60(29)    60(28)    32(12)

Bacteria1: this group is composed of Aeropyrum pernix, Chlamydia muridarum, Chlamydia trachomatis, Chlamydia pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydia pneumoniae J138, Pyrococcus abyssi, Pyrococcus horikoshii, Thermoplasma volcanium, Clostridium perfringens, Pseudomonas aeruginosa, Mycobacterium tuberculosis and Treponema pallidum.

Each pattern cell has the form l(x), where l is the maximum length, in bases, of the repeat found for this pattern, and x, given in parentheses, is derived from the exponent of the ratio r = number of sequences found / number of sequences expected. Writing r = a*10^b, x is a whole number chosen so that x = b for a < 3 and x = b+1 for a >= 3. Example: for the pattern at, if the longest repeat is “atatat” and the ratio is 3.4*10^1, the cell value is 6(2).
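
The rounding rule for x can be sketched in Python as follows. This is only an illustration: the function names are hypothetical and not part of mrepatt, the mantissa a is assumed to be normalised to 1 <= a < 10 (the text does not state this explicitly), and the observed and expected counts are assumed to be supplied by the caller.

```python
def ratio_exponent(found, expected):
    """Return x for r = found / expected = a*10^b: x = b if a < 3, else b + 1.

    Assumes a is normalised to 1 <= a < 10 (an assumption, see lead-in).
    """
    if found <= 0 or expected <= 0:
        raise ValueError("found and expected must be positive")
    ratio = found / expected
    # Scientific notation yields the mantissa a and exponent b directly,
    # e.g. 34.0 -> "3.400000e+01".
    mantissa_text, exponent_text = f"{ratio:.6e}".split("e")
    a = float(mantissa_text)
    b = int(exponent_text)
    return b if a < 3 else b + 1


def format_cell(max_length, found, expected):
    """Format a table cell as l(x), e.g. 6(2)."""
    return f"{max_length}({ratio_exponent(found, expected)})"


if __name__ == "__main__":
    # The example from the text: longest repeat "atatat", ratio 3.4*10^1.
    print(format_cell(6, 34, 1))
```

With this rule, x acts as a rounded base-10 logarithm of the ratio, so x > 0 means the pattern was found clearly more often than expected and x < 0 clearly less often.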



alggen group, mrepatt by Ph.d. J.A.Subirana - Ph.d. X.Messeguer - Roman Roset
rroset@lsi.upc.es