Preliminary analysis of biological relevance
This is a preliminary analysis of the biological relevance of the results presented in Table I below. It is not a thorough comparative analysis, but we think the data suggest the following features, which stand out:
|1.||As is already known, the (CG)n sequence occurs with low frequency. Nevertheless, in most eukaryotes it occurs more often than expected on a random basis (x>0).|
|2.||Among simple repeats of 4 bases or fewer that contain only A and T, the (AATT)n repeat is the least frequent. If AT-rich repeats were shown to play a role, this repeat would appear to be less suitable for it.|
|3.||Among the prokaryotes studied, it is surprising to note that Mycoplasma genitalium, in spite of its small genome, contains an A19 string. Other more complex repeats in this organism have been described by Hancock (1996).|
|4.||Mycobacterium leprae has a unique bacterial genome (Cole et al., 2001), since more than half the genome is not functional. It contains long AT repeats (17 and 18 units), which are unusual in bacterial genomes.|
|5.||Among the eukaryotes, we have looked at only a small subset of the sequences available. The presence of some extremely long exact repeats in Dictyostelium discoideum, (AAT)88, and in Caenorhabditis elegans, G483 and (AT)337, is striking. Long (AAT)n and (AAAT)n repeats have also been found in Drosophila. Similar repeats might be present in other eukaryotes if additional sequences were analyzed.|
|6.||In the yeasts no such extremely long repeats have been found. The longest is (AAT)35.|
|7.||The highly reduced genome of the eukaryotic algal nucleus in Guillardia theta (Douglas et al., 2001) still contains some significant repeats.|
|Cole,S.T., Eiglmeier,K., Parkhill,J., James,K.D., Thomson,N.R., Wheeler,P.R., Honoré,N., Garnier,T., Churcher,C., Harris,D. et al. (2001) Massive gene decay in the leprosy bacillus, Nature, 409, 1007-1011.|
|Douglas,S., Zauner,S., Fraunholz,M., Beaton,M., Penny,S., Deng,L.-T., Wu,X., Reith,M., Cavalier-Smith,T. and Maier,U.-G. (2001) The highly reduced genome of an enslaved algal nucleus, Nature, 410, 1091-1096.|
|Hancock,J.M. (1996) Simple sequences in a 'minimal' genome, Nature Genetics, 14, 14-15.|
Bacteria1: this group is composed of Aeropyrum pernix, Chlamydia muridarum, Chlamydia trachomatis, Chlamydia pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydia pneumoniae J138, Pyrococcus abyssi, Pyrococcus horikoshii, Thermoplasma volcanium, Clostridium perfringens, Pseudomonas aeruginosa, Mycobacterium tuberculosis and Treponema pallidum.
Each cell for a pattern has the form l(x), where l is the maximum length, in bases, of the repeat found for this pattern, and x, given in parentheses, is the rounded exponent of the ratio r = (number of sequences found) / (number of expected sequences) = a*10^b. The exponent is simplified to a whole number such that x = b for a < 3 and x = b+1 for a >= 3. Example: the pattern is at, the maximum repeat is atatat and the ratio is 3.4*10^1. Since a = 3.4 >= 3, x = 1+1 = 2, and the cell value is 6(2).
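The cell encoding described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to build Table I: the function names rounded_exponent and cell_value are hypothetical, and the sketch assumes that the ratio r is strictly positive and that a is normalised so that 1 <= a < 10.

```python
def rounded_exponent(ratio: float) -> int:
    """Return x for r = a*10^b: x = b if a < 3, otherwise x = b + 1.

    Assumption (not stated in the text): r > 0 and 1 <= a < 10.
    """
    if ratio <= 0:
        raise ValueError("the ratio must be strictly positive")
    # Scientific notation yields the mantissa a and the exponent b directly,
    # e.g. 34.0 -> "3.400000e+01" -> a = 3.4, b = 1.
    mantissa, exponent = f"{ratio:.6e}".split("e")
    a, b = float(mantissa), int(exponent)
    return b if a < 3 else b + 1


def cell_value(longest_repeat: str, ratio: float) -> str:
    """Format a table cell as l(x), where l is the length in bases."""
    return f"{len(longest_repeat)}({rounded_exponent(ratio)})"


# Example from the text: pattern at, longest repeat atatat, ratio 3.4*10^1.
print(cell_value("atatat", 3.4e1))  # prints 6(2)
```

Under this reading, a ratio of 2.0*10^1 gives x = 1, and a ratio below 1 such as 5*10^-1 gives x = 0.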