Supervised learning with imbalanced data sets: an overview

Castro, Cristiano Leite de; Braga, Antônio Pádua

doi:10.1590/S0103-17592011000500002

Sba: Controle & Automação Sociedade Brasileira de Automática

Intelligent Systems • Sba: Controle & Automação 22 (5) • Oct 2011 • https://doi.org/10.1590/S0103-17592011000500002

Traditional learning algorithms trained on complex and highly imbalanced data sets may have difficulty distinguishing between examples of the different classes. They tend to produce classification models that are biased toward the overrepresented (majority) class, resulting in a low recognition rate for the minority class. This paper surveys this problem, which has attracted the interest of many researchers in recent years. Within the scope of two-class classification tasks, it presents concepts related to the nature of the class imbalance problem and to evaluation metrics, including the foundations of ROC (Receiver Operating Characteristic) analysis, together with the state of the art in proposed solutions. The paper closes with a brief discussion of how the subject can be extended to multiclass learning.

imbalanced data sets; supervised learning; evaluation metrics; ROC analysis; resampling methods; cost-sensitive approach
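
The majority-class bias described in the abstract can be illustrated with a minimal sketch in plain Python. The data below are hypothetical (95 majority examples labeled 0 and 5 minority examples labeled 1), and the function names `confusion_counts` and `rates` are illustrative choices, not taken from the paper. The sketch shows why overall accuracy can look good while the minority class is never recognized, which is the motivation for ROC-style metrics such as the true positive rate and false positive rate.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, false positives, true negatives."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def rates(y_true, y_pred, positive=1):
    """Return (accuracy, true positive rate, false positive rate)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # minority-class recognition rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # majority examples flagged as minority
    return accuracy, tpr, fpr


# Hypothetical imbalanced data set: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
y_majority = [0] * 100

accuracy, tpr, fpr = rates(y_true, y_majority)
print(f"accuracy={accuracy:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Under these assumed data, the always-majority model scores high on accuracy yet has a true positive rate of zero, so it sits at the (FPR, TPR) origin of ROC space. Accuracy alone would hide that failure.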

Sociedade Brasileira de Automática, Secretaria da SBA, FEEC - Unicamp, BLOCO B - LE51, Av. Albert Einstein, 400, Cidade Universitária Zeferino Vaz, Distrito de Barão Geraldo, 13083-852 - Campinas - SP - Brazil, Tel.: (55 19) 3521 3824, Fax: (55 19) 3521 3866
E-mail: revista_sba@fee.unicamp.br
