| 18.05.2005 |
|
All background upstream sequence sets were updated. Each background now has two models,
full and clean. The full model is the sequence set as it appears in ENSEMBL
or TAIR; the clean model is the same set with sequences containing letters other than A, C, G or T
removed. Since the number of sequences does not differ much between the two sequence sets,
the clean version is the recommended one.
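
As an illustration of how a clean set relates to a full set, here is a minimal Python sketch. It is not the script used for the update: the function name `clean_sequences`, the record layout (header, sequence) and the example data are hypothetical, and treating lowercase letters as equivalent to uppercase is an assumption.

```python
import re

# A sequence is "clean" if it consists only of A, C, G and T.
_ACGT_ONLY = re.compile(r"[ACGT]+")

def clean_sequences(records):
    """Return the (header, sequence) pairs whose sequence has only A, C, G, T.

    Assumption: case is ignored, so soft-masked (lowercase) bases are accepted.
    """
    return [(header, seq) for header, seq in records
            if _ACGT_ONLY.fullmatch(seq.upper())]

# Hypothetical "full" set; geneB contains the ambiguity code N.
full = [
    ("geneA", "ACGTTGCA"),
    ("geneB", "ACGNNTGA"),
    ("geneC", "ttgaca"),
]
clean = clean_sequences(full)
```

With this example input, only `geneB` is dropped, because it is the only sequence containing a letter other than A, C, G or T.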
|
|
Another change is that all upstream sequence sets except A. thaliana are now updated from
ENSEMBL. This may have the biggest effect on the S. cerevisiae background, which was previously
updated using RSA-tools.
|
|
Because the ENSEMBL ENSMART tool had a bug during the updates (the one-sequence-per-gene function
was not working), the background upstream sequence sets were filtered manually. The filtering was performed with
MySQL, accepting each instance with the same ENSEMBL Gene ID only once. Although this is
not the optimal way to filter the sequences, it keeps the old and new background
models consistent. (The bug was reported nearly a week ago but has still not been fixed.)
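
The one-sequence-per-gene filtering can be sketched in Python as follows. This is not the MySQL query that was actually run: the function name `one_per_gene`, the record layout (gene ID, sequence) and the gene IDs are hypothetical, and keeping the first occurrence of each gene ID is an assumption.

```python
def one_per_gene(records):
    """Keep one (gene_id, sequence) pair per gene ID.

    Assumption: the first record seen for a gene ID is the one kept.
    Input order is preserved.
    """
    seen = set()
    kept = []
    for gene_id, seq in records:
        if gene_id not in seen:
            seen.add(gene_id)
            kept.append((gene_id, seq))
    return kept

# Hypothetical export with two entries for GENE_0001.
export = [
    ("GENE_0001", "ACGTACGT"),
    ("GENE_0002", "TTGACATA"),
    ("GENE_0001", "ACGTACGA"),
]
filtered = one_per_gene(export)
```

Here the second `GENE_0001` entry is discarded, so each gene contributes exactly one upstream sequence to the background.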
|