A Critique Empirical Evaluation of Relevance Computation for Focused Web Crawlers

HIGHLIGHTS

  • This paper presents a survey on focused web crawlers.

  • This paper presents the challenges in focused crawling research.

  • This paper presents the highlights and hindrances of existing focused web crawlers.

  • This paper also presents the future scope for research in focused web crawling.

Abstract

As the Internet continues its rapid growth, demand for efficient and economical crawling methods is clearly rising. Consequently, many innovative techniques have been proposed for efficient crawling, the most significant being focused crawlers. A focused crawler searches for web pages that are relevant to topics defined in advance. Focused crawlers appeal to search engines because of their efficient filtering and reduced memory and time consumption. This paper presents a survey of web crawling organized around relevance computation. Fifty-two focused crawlers from the existing literature are categorized into four classes: classic focused crawlers, semantic focused crawlers, learning focused crawlers, and ontology learning focused crawlers. The prerequisites and strengths of each are discussed with respect to harvest rate, target recall, precision, and F1-score. Shortcomings, strategies, and future directions are also suggested.
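For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how they are commonly computed. It assumes the usual definitions (harvest rate as the fraction of crawled pages that are relevant, target recall as the fraction of a known target set that was crawled, F1 as the harmonic mean of precision and recall); the paper's own definitions may differ. The function names and the toy URL sets are hypothetical and do not come from the paper.

```python
from typing import Set


def harvest_rate(crawled: Set[str], relevant: Set[str]) -> float:
    """Fraction of crawled pages that are relevant (assumed definition)."""
    if not crawled:
        return 0.0
    return len(crawled & relevant) / len(crawled)


def target_recall(crawled: Set[str], targets: Set[str]) -> float:
    """Fraction of a known target set that the crawler reached (assumed definition)."""
    if not targets:
        return 0.0
    return len(crawled & targets) / len(targets)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): four crawled pages, three relevant target pages.
crawled_pages = {"a", "b", "c", "d"}
relevant_pages = {"a", "b", "e"}

# In this sketch, precision is taken to coincide with harvest rate.
precision = harvest_rate(crawled_pages, relevant_pages)
recall = target_recall(crawled_pages, relevant_pages)
f1 = f1_score(precision, recall)
```

With this toy data, two of the four crawled pages are relevant and two of the three targets are reached, so a crawler that fetches fewer off-topic pages would raise the first figure without necessarily changing the second.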

Keywords:
Web Crawler; Focused Crawler; Semantic Crawler; Learning Crawler; Machine Learning; Ontology

Instituto de Tecnologia do Paraná - Tecpar Rua Prof. Algacyr Munhoz Mader, 3775 - CIC, 81350-010 Curitiba PR Brazil, Tel.: +55 41 3316-3052/3054, Fax: +55 41 3346-2872 - Curitiba - PR - Brazil
E-mail: babt@tecpar.br