Kernel Methods and Algorithms for General Sequence Analysis

Pavel Kuksa; Pai-Hsi Huang; Vladimir Pavlovic

doi:10.7282/T3ZP49HN

Technical documentation

Open access

Rutgers University

2008

DOI: https://doi.org/10.7282/T3ZP49HN

Abstract

Problems of analyzing and modeling sequential data arise in many practical applications. In this work, we develop efficient algorithms and methods for general sequence analysis. In particular, we propose novel ways of modeling sequences under complex transformations (such as multiple insertions, deletions, and mutations) and present a new family of similarity measures (kernels), the spatial string kernels, which can be computed very efficiently and show state-of-the-art performance on a variety of distinct classification tasks. We also present new algorithms for approximate string comparison (e.g., with mismatches) that improve on currently known time bounds for such tasks and yield order-of-magnitude running-time improvements. In an extensive set of experiments on challenging classification problems, such as detecting homology (evolutionary similarity) of remotely related proteins, categorizing texts, and classifying music samples, the proposed algorithms and measures display state-of-the-art classification performance and run substantially faster than existing methods. We solve these problems in both binary and multi-class settings, and we also apply our methods to large-scale datasets with partially labeled samples.
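
For readers unfamiliar with string kernels, the following minimal sketch may help illustrate what a sequence similarity measure of this general kind computes. It is not the spatial string kernel or any algorithm from the report; it is the simpler, well-known k-mer spectrum kernel, swapped in here purely for illustration. The function names `kmer_counts` and `spectrum_kernel` and the default `k = 3` are assumptions made for this example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every contiguous substring of length k (k-mer) in seq."""
    # For sequences shorter than k the range is empty, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Illustrative spectrum kernel: inner product of k-mer count vectors.

    This is a stand-in example, NOT the spatial string kernel
    described in the abstract.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Iterate over the smaller table; a Counter returns 0 for missing keys.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(count * cy[kmer] for kmer, count in cx.items())


if __name__ == "__main__":
    # Two short, made-up sequences that share some 3-mers.
    print(spectrum_kernel("MKVLAAGIV", "MKVLSAGIV", k=3))
```

Exact-match counting like this treats a single substitution as a complete mismatch for every k-mer that overlaps it, which is one reason kernels that tolerate mismatches and other transformations, such as those the abstract describes, are of interest.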

Details

Title: Kernel Methods and Algorithms for General Sequence Analysis
Creators: Pavel Kuksa (Author) - Computer Science (New Brunswick)
          Pai-Hsi Huang (Author) - Computer Science (New Brunswick)
          Vladimir Pavlovic (Author) - Computer Science (New Brunswick)
Date published: 2008
Publisher: Rutgers University
Number of pages: 22 p.
Academic Unit: School of Arts and Sciences; Computer Science (SAS)
Language: English
Resource Type: Technical documentation
Comment: Technical report DCS-TR-630
Identifiers: 991031549908504646