
Database Research Group (CIS)
Document Type
Journal Article
Date of this Version
March 2001
Abstract
The Web has so far been incredibly successful at delivering information to human users. It has been so successful, in fact, that there is now an urgent need to go beyond the browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for the generation of wrappers for Web sources that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats such as XML; (3) visual tools to make the engineering of wrappers faster and easier.
Keywords
web, XML, information extraction, wrappers
Date Posted: 08 June 2007
This document has been peer reviewed.

Comments
Postprint version. Published in Data and Knowledge Engineering, Volume 36, Issue 3, 2001, pages 283-316.
Publisher URL: http://dx.doi.org/10.1016/S0169-023X(00)00051-3