The effect of class distribution on classifier learning

Gary M. Weiss; Foster Provost

doi:10.7282/t3-v9kt-9510

Technical documentation

Open access

Rutgers University

2001

DOI: https://doi.org/10.7282/t3-v9kt-9510

Abstract

Many of today's large data sets must be reduced in size before invoking inductive algorithms, both because of the costs of procuring and processing the data and because most of these algorithms cannot handle enormous amounts of data. In these cases it is important to select the training data carefully so that the impact on classifier performance is minimized. A tacit assumption behind much research on classifier induction is that the class distribution of the training data should match the "natural" distribution of the data. In this paper we analyze the relationship between training class distribution and classifier performance on 25 data sets and show that the natural distribution usually is not the best distribution for learning; a different class distribution should generally be chosen when the data set size must be limited. We also explain how changing the class distribution of the training set affects classifier learning and why one training distribution might be better than another.
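To make the setting in the abstract concrete, the following is a minimal sketch of drawing a fixed-size training set with a chosen class distribution. It is an illustration only and is not taken from the report: the function name `sample_with_class_distribution`, its parameters, and the toy data (1000 examples with a 10% positive "natural" distribution) are all assumptions made for this example.

```python
import random
from collections import Counter


def sample_with_class_distribution(examples, labels, n_train,
                                   positive_fraction, positive_label=1, seed=0):
    """Draw n_train examples without replacement so that about
    positive_fraction of them carry positive_label.

    Hypothetical helper for illustration; not the procedure from the report.
    """
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must lie in [0, 1]")

    rng = random.Random(seed)

    # Split example indices by class.
    pos = [i for i, y in enumerate(labels) if y == positive_label]
    neg = [i for i, y in enumerate(labels) if y != positive_label]

    # Per-class counts for the requested training distribution.
    n_pos = round(n_train * positive_fraction)
    n_neg = n_train - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough examples of one class for this distribution")

    # Sample each class separately, then shuffle the combined selection.
    chosen = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(chosen)
    return [examples[i] for i in chosen], [labels[i] for i in chosen]


# Toy data (assumed): 100 positive and 900 negative examples.
labels = [1] * 100 + [0] * 900
examples = list(range(1000))

# Same training-set size, two different class distributions.
X_nat, y_nat = sample_with_class_distribution(examples, labels, 100, 0.10)
X_bal, y_bal = sample_with_class_distribution(examples, labels, 100, 0.50)

print(Counter(y_nat), Counter(y_bal))
```

Holding `n_train` fixed while varying `positive_fraction` is what lets the effect of the class distribution be separated from the effect of the training-set size. A classifier trained on a non-natural distribution would typically still be evaluated on data that follows the natural distribution.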

Files and links (2)

pdf: ml-tr-43 (125.83 kB)

Author's Original (AO) Open Access

Details

Title: The effect of class distribution on classifier learning
Creators: Gary M. Weiss (Author) - Computer Science (New Brunswick); Foster Provost (Author) - Stern School of Business, New York University
Date published: 2001
Publisher: Rutgers University
Number of pages: 1 online resource (6 pages), illustrations
Academic Unit: School of Arts and Sciences; Computer Science (SAS)
Language: English
Resource Type: Technical documentation
Comment: Technical report ml-tr-43
Identifiers: 991031549986704646