Binary classification algorithms are often used in situations when one of the two classes is extremely rare. A common practice is to oversample units of the rare class when forming the training set. For some classification algorithms, like logistic classification, there are thorethical results that justify such an approach. Similar results are not available for other popular classification algorithms like classification trees. In this paper the use of balanced datasets, when dealing with rare classes, for tree classifiers and boosting algorithms is discussed and results from analyzing a real dataset and a simulated dataset are reported.
Selecting the training set in classification problems with rare events
SCARPA, BRUNO;TORELLI, NICOLA
2004
Abstract
Binary classification algorithms are often used in situations when one of the two classes is extremely rare. A common practice is to oversample units of the rare class when forming the training set. For some classification algorithms, like logistic classification, there are thorethical results that justify such an approach. Similar results are not available for other popular classification algorithms like classification trees. In this paper the use of balanced datasets, when dealing with rare classes, for tree classifiers and boosting algorithms is discussed and results from analyzing a real dataset and a simulated dataset are reported.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.