Logo image
Kernel Methods and Algorithms for General Sequence Analysis
Technical documentation   Open access

Kernel Methods and Algorithms for General Sequence Analysis

Pavel Kuksa, Pai-Hsi Huang and Vladimir Pavlovic
Rutgers University
2008
DOI:
https://doi.org/10.7282/T3ZP49HN

Abstract

Problems of analysis and modeling of sequential data arise in many practical applications. In this work, we develop efficient algorithms and methods for general sequence analysis. In particular, we propose novel ways of modeling sequences under complex transformations (such as multiple insertions, deletions, mutations) and present a new family of similarity measures (kernels), spatial string kernels, that can be computed very efficiently and show state-of-the-art performance on a variety of distinct classification tasks. We also present new algorithms for approximate (e.g. with mismatches) string comparison that improve currently known time bounds for such tasks and show order-of-magnitude running time improvements. In an extensive set of experiments on many challenging classification problems, such as detecting homology (evolutionary similarity) of remotely related proteins, categorizing texts, and performing classification of music samples, proposed algorithms and measures display state-of-the-art classification performance and run substantially faster than existing methods. We solve these problems in both binary and multi-class settings, as well as apply our methods to large-scale datasets with partially labeled samples.
pdf
rutgers-lib-57377_PDF-1281.52 kBDownloadView
Version of Record (VoR) Technical Documentation Open Access
url
Report an accessibility issueView
Please complete a content remediation request to report an accessibility issue with a library electronic resource, website, or service.

Metrics

290 File downloads
122 Record Views

Details

Logo image