Abstract
Addressing noise and uncertainty in training data is an important issue in inductive learning: because training data constitutes the primary basis for generalization, inductive learners are necessarily sensitive to its imperfections. Some of today's more popular off-the-shelf learners ignore the presence of imperfect data or invoke statistically motivated post-processing to compensate for its unwanted effects; none exploits specific knowledge of noise or uncertainty. Several research projects have taken a step in this direction by explicitly addressing noise in training data. Unfortunately, these efforts are limited because they depend on particular models of environmental noise, overly restrictive concept-description languages, and sometimes unrealistic sample complexity. This dissertation describes a knowledge-based approach that uses uncertain reasoning to overcome these limitations. In what follows, learning from imperfect data is formulated as the search for a hypothesis with maximum {\em a posteriori} probability. Implementing the search as incremental probabilistic-evidence combination extends the range of useful uncertainty models to those described by discrete probability distributions. From this formulation I built a novel conjunction learner and, from it, an iterative DNF learner. On standard datasets, where strong knowledge is unavailable, the DNF learner is competitive with conventional learners. In experiments using synthetic data, where strong knowledge is available, the knowledge-based learners are superior to their more familiar, conventional counterparts. To demonstrate that problem-specific uncertainty models can be engineered and used effectively in practical problems, the evidence-combination approach was applied to a difficult open problem in molecular biology: learning to recognize promoter sequences in {\em E.~coli}.
Earlier efforts notwithstanding, the inherent uncertainty about the location of the biologically active regions in the raw DNA data invalidates the direct application of many standard inductive learning methods. Here, knowledge from molecular biology was instead used to engineer models of three domain uncertainties and a mapping from raw sequence data to a plausible and focused evidence representation. The evidence-combination approach then yields classifiers that are accurate and credible, and the best yet developed for this important problem.