Data mining-based technique on sheep breed certification

ABSTRACT

This study aimed to develop a method based on data mining techniques to select key single-nucleotide polymorphism (SNP) markers for the sheep breeds Crioula, Morada Nova and Santa Inês. We gathered data from the International Sheep Consortium on 72 animals belonging to these breeds; each animal has 49,034 SNP markers. Because the number of attributes (markers) is much greater than the number of observations (animals), the LASSO (Least Absolute Shrinkage and Selection Operator), Random Forest and Boosting prediction methods, which incorporate attribute selection, were used to generate predictive models. The results showed that the predictive models selected the main SNP markers for sheep breed identification. LASSO selected 29 relevant markers, while Random Forest and Boosting selected 27 and 20 major markers, respectively. By intersecting the markers selected by the three models, we identified a subset of 18 markers with major potential for sheep breed identification.

single-nucleotide polymorphism; feature selection; predictive modeling; penalized regression
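
The final step described in the abstract, intersecting the marker sets chosen by each model, can be illustrated with a minimal sketch. The function name `consensus_markers` and the marker IDs below are hypothetical and chosen only for illustration; they are not the study's code or results, and the abstract does not state how the intersection was computed.

```python
from collections import Counter


def consensus_markers(selections):
    """Return the markers chosen by every model, plus per-marker vote counts.

    `selections` maps a model name to the list of marker IDs it selected.
    """
    if not selections:
        return [], Counter()
    # Count each marker at most once per model.
    votes = Counter(m for chosen in selections.values() for m in set(chosen))
    n_models = len(selections)
    # A marker is in the consensus only if all models selected it.
    shared = sorted(m for m, v in votes.items() if v == n_models)
    return shared, votes


# Hypothetical marker IDs, for illustration only (not the study's data).
selected = {
    "lasso": ["snp_a", "snp_b", "snp_c", "snp_d"],
    "random_forest": ["snp_b", "snp_c", "snp_d", "snp_e"],
    "boosting": ["snp_c", "snp_d", "snp_f"],
}

shared, votes = consensus_markers(selected)
print(shared)  # markers selected by all three models
```

Keeping the vote counts alongside the strict intersection also makes it possible to inspect markers chosen by two of the three models, should a less conservative subset be of interest.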

SBEA - Associação Brasileira de Engenharia Agrícola, Departamento de Engenharia e Ciências Exatas FCAV/UNESP, Prof. Paulo Donato Castellane, km 5, 14884.900 | Jaboticabal - SP - Brazil, Tel./Fax: +55 16 3209 7619
E-mail: revistasbea@sbea.org.br