Abstract
We introduce an algorithm, lllama, which combines simple pattern recognizers into a general method for estimating the entropy of a sequence. Each pattern recognizer exploits a partial match between subsequences to build a model of the sequence. Since the primary features of interest in biological sequence domains are subsequences with small variations in exact composition, lllama is particularly suited to such domains. We describe two methods, lllama-length and lllama-alone, which use this entropy estimate to perform maximum a posteriori classification. We apply these methods to several problems in three-dimensional structure classification of short DNA sequences. The results include a surprisingly low 3.6% error rate in predicting helical conformation of oligonucleotides. We compare our results to those obtained using more traditional methods for automated generation of classifiers.
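To make the link between entropy estimation and maximum a posteriori classification concrete, the following is a minimal sketch and not the lllama algorithm itself. It swaps in a plain fixed-order Markov model with add-one smoothing in place of lllama's partial-match pattern recognizers, and all names (`train`, `code_length`, `classify`), the order `k`, the class labels, and the toy sequences are assumptions made for illustration. The idea it shows is that a sequence is assigned to the class whose model encodes it in the fewest bits, after adjusting for the class prior.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed DNA alphabet


def train(seqs, k=2):
    """Count next-symbol occurrences after each length-k context."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    return counts


def code_length(seq, counts, k=2):
    """Bits needed to encode seq under the model (add-one smoothing).

    Dividing by the number of predicted symbols gives a per-symbol
    entropy estimate. The first k symbols are not scored.
    """
    bits = 0.0
    for i in range(k, len(seq)):
        ctx = counts.get(seq[i - k:i], {})
        total = sum(ctx.values()) + len(ALPHABET)
        p = (ctx.get(seq[i], 0) + 1) / total
        bits -= math.log2(p)
    return bits


def classify(seq, models, priors, k=2):
    """MAP decision: maximise log2 P(c) + log2 P(seq | c)."""
    return max(
        models,
        key=lambda c: math.log2(priors[c]) - code_length(seq, models[c], k),
    )


# Toy data, invented for this sketch only.
models = {
    "class1": train(["AAAAAAAAAA", "AAAAAATAAA"]),
    "class2": train(["GCGCGCGCGC", "CGCGCGCGCG"]),
}
priors = {"class1": 0.5, "class2": 0.5}
label = classify("AAAAAAAA", models, priors)
```

With equal priors the decision reduces to picking the class with the shortest code length; unequal priors shift the decision by the difference in their log probabilities.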