0000000001132291
AUTHOR
Marius André Sætren
Semi-automatic web resource discovery using ontology-focused crawling
2005
Master's thesis in Information and Communication Technology, 2005, Høgskolen i Agder (Agder University College), Grimstad.
The enormous amount of information available on the Internet makes it difficult to find relevant resources with regular breadth-first crawlers. Focused crawlers seek to download only the web pages that are relevant to the user and to avoid downloading irrelevant ones. Ontologies have recently been proposed as a tool for defining the target domain of a focused crawler. In this project we have developed a prototype of an ontology-focused crawler by writing additional modules for the open-source Java crawler Heritrix. In one of the modules we have developed, …
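The difference between breadth-first and focused crawling can be illustrated with a small best-first crawler. This is a minimal sketch under stated assumptions, not the thesis's Heritrix modules, and it uses no Heritrix API. The ontology is reduced to a hypothetical map of concept terms to weights, the "web" is a hypothetical in-memory map from URL to page text and outgoing links, and a page's relevance is the summed weight of ontology terms in its text divided by its token count. Only pages scoring at or above a threshold are kept, and only their links are added to the frontier, prioritised by the parent page's score.

```java
import java.util.*;

// Hypothetical sketch of ontology-focused (best-first) crawling over an in-memory "web".
public class FocusedCrawlerSketch {

    // Assumed page model: text plus outgoing links (stands in for a fetched HTML page).
    static final class Page {
        final String text;
        final List<String> links;
        Page(String text, List<String> links) { this.text = text; this.links = links; }
    }

    // Assumed relevance measure: summed ontology-term weights, normalised by token count.
    static double relevance(String text, Map<String, Double> ontology) {
        String[] tokens = text.toLowerCase().split("\\W+");
        double score = 0.0;
        for (String t : tokens) score += ontology.getOrDefault(t, 0.0);
        return tokens.length == 0 ? 0.0 : score / tokens.length;
    }

    // Best-first crawl: the frontier is ordered by the relevance of the page that linked to each URL.
    static List<String> crawl(Map<String, Page> web, String seed, Map<String, Double> ontology,
                              double threshold, int maxPages) {
        PriorityQueue<Map.Entry<String, Double>> frontier =
            new PriorityQueue<>((a, b) -> Double.compare(b.getValue(), a.getValue()));
        frontier.add(Map.entry(seed, Double.MAX_VALUE)); // the seed is always fetched first
        Set<String> seen = new HashSet<>(List.of(seed));
        List<String> relevant = new ArrayList<>();
        int fetched = 0;
        while (!frontier.isEmpty() && fetched < maxPages) {
            String url = frontier.poll().getKey();
            Page page = web.get(url); // stands in for an HTTP download
            fetched++;
            if (page == null) continue;
            double score = relevance(page.text, ontology);
            if (score < threshold) continue; // irrelevant: neither recorded nor expanded
            relevant.add(url);
            for (String link : page.links) {
                if (seen.add(link)) frontier.add(Map.entry(link, score));
            }
        }
        return relevant;
    }

    // Hypothetical example data.
    static Map<String, Double> exampleOntology() {
        return Map.of("ontology", 1.0, "semantic", 0.8, "crawler", 0.5, "concept", 0.6,
                      "class", 0.4, "property", 0.4, "reasoning", 0.5);
    }

    static Map<String, Page> exampleWeb() {
        return Map.of(
            "seed", new Page("ontology crawler semantic web", List.of("a", "b")),
            "a", new Page("ontology concept class property", List.of("c")),
            "b", new Page("football match results today", List.of("d")),
            "c", new Page("semantic ontology reasoning", List.of()),
            "d", new Page("ontology ontology ontology", List.of()));
    }

    public static void main(String[] args) {
        System.out.println(crawl(exampleWeb(), "seed", exampleOntology(), 0.3, 10)); // prints [seed, a, c]
    }
}
```

Page "d" is never fetched even though it would score highly, because the only path to it runs through the irrelevant page "b". This is the usual trade-off of pruning a focused crawl at irrelevant pages: fewer wasted downloads, at the cost of missing relevant pages that are reachable only through off-topic ones.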