
Morphological Classification of Galaxies in Unbalanced Datasets

This work was partially funded by the Fundação de Amparo à Pesquisa do Estado de São Paulo (2014/25302-2) and by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (200959/2010-7).

ABSTRACT

Galaxies exhibit a variety of morphologies, which are an important source of information for cosmology. The Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) provides images of thousands of distant galaxies, too many to classify manually. It is therefore important to develop automatic classifiers that accurately predict morphology from such images. However, standard prediction techniques have low predictive power on unbalanced datasets such as CANDELS. This work studies three classification approaches developed to improve classification on unbalanced data, using CANDELS as a test case. We consider the problems of classifying galaxies as regular and as mergers. We show that oversampling and changing the classification cutoff were effective in improving merger classification, but less effective for classifying regular galaxies. We also show that all classification methods used (classification trees, random forests and penalized logistic regression) yielded similar predictions, which suggests that better predictions could only be obtained by including new summary statistics of the images or by acquiring larger datasets.
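
The two remedies named above, oversampling and changing the cutoff, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `oversample_minority` and `classify_with_cutoff`, the toy data, and the coding of mergers as 1 and non-mergers as 0 are all assumptions made for illustration. Oversampling here means randomly duplicating minority-class observations until the classes are equally frequent; changing the cutoff means labelling an observation as a merger whenever its predicted probability exceeds a threshold other than the usual 0.5.

```python
import random
from collections import Counter


def oversample_minority(features, labels, seed=0):
    """Randomly duplicate rows of the rarer classes until every class
    is as frequent as the most common one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_features, new_labels = list(features), list(labels)
    for cls, n in counts.items():
        # Rows of the original data that belong to this class.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            new_features.append(rng.choice(pool))
            new_labels.append(cls)
    return new_features, new_labels


def classify_with_cutoff(probabilities, cutoff=0.5):
    """Label an observation 1 (assumed: merger) when its predicted
    probability is at least `cutoff`, and 0 otherwise."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Hypothetical toy data: 8 non-mergers (0) and 2 mergers (1).
toy_features = [[float(i)] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2

balanced_features, balanced_labels = oversample_minority(toy_features, toy_labels)
print(Counter(balanced_labels))

# Hypothetical predicted merger probabilities from some fitted classifier.
toy_probs = [0.1, 0.3, 0.6]
print(classify_with_cutoff(toy_probs, cutoff=0.5))
print(classify_with_cutoff(toy_probs, cutoff=0.2))
```

Lowering the cutoff labels more observations as mergers, which trades more false positives for fewer missed mergers. Oversampling should be applied to the training set only, so that the evaluation data keep the original class proportions.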

Keywords:
Classification; unbalanced datasets; machine learning

Sociedade Brasileira de Matemática Aplicada e Computacional, Rua Maestro João Seppe, nº 900, 16º andar, Sala 163, 13561-120 São Carlos - SP, Brazil. Tel. / Fax: (55 16) 3412-9752
E-mail: sbmac@sbmac.org.br