2001 | OriginalPaper | Buchkapitel
Improving Identification of Difficult Small Classes by Balancing Class Distribution
verfasst von : Jorma Laurikkala
Erschienen in: Artificial Intelligence in Medicine
Verlag: Springer Berlin Heidelberg
Enthalten in: Professional Book Archive
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
We studied three methods to improve identification of difficult small classes by balancing imbalanced class distribution with data reduction. The new method, neighborhood cleaning rule (NCL), outperformed simple random and one-sided selection methods in experiments with ten data sets. All reduction methods improved identification of small classes (20–30%), but the differences were insignificant. However, significant differences in accuracies, true-positive rates and true-negative rates obtained with the 3-nearest neighbor method and C4.5 from the reduced data favored NCL. The results suggest that NCL is a useful method for improving the modeling of difficult small classes, and for building classifiers to identify these classes from the real-world data.