Database Research Group (CIS)

Document Type

Journal Article

Date of this Version

March 2001

Comments

Postprint version. Published in Data and Knowledge Engineering, Volume 36, Issue 3, 2001, pages 283-316.
Publisher URL: http://dx.doi.org/10.1016/S0169-023X(00)00051-3

Abstract

The Web so far has been incredibly successful at delivering information to human users. It has been so successful, in fact, that there is now an urgent need to go beyond the browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for generating wrappers for Web sources that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats such as XML; and (3) visual tools to make the engineering of wrappers faster and easier.
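
To make the idea of a wrapper concrete, the following is a minimal sketch of the general pattern the abstract describes: extract a structure from an HTML page, then map it to XML. It is not W4F and does not use W4F's extraction language; it relies only on the Python standard library, and the names `ListItemExtractor`, `wrap`, `SAMPLE_HTML`, and the `titles`/`title` element names are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ListItemExtractor(HTMLParser):
    """Hypothetical extraction rule: collect the text of every <li> element."""

    def __init__(self):
        super().__init__()
        self._in_li = False   # True while the parser is inside an <li>
        self._buf = []        # text fragments of the current <li>
        self.items = []       # extracted strings, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            self.items.append("".join(self._buf).strip())
            self._in_li = False

    def handle_data(self, data):
        # Nested inline markup such as <b> is ignored; only its text is kept.
        if self._in_li:
            self._buf.append(data)


def wrap(html: str) -> str:
    """Extract list items from an HTML string and map them to an XML string."""
    extractor = ListItemExtractor()
    extractor.feed(html)
    extractor.close()

    # Mapping step: one <title> child per extracted item (assumed schema).
    root = ET.Element("titles")
    for text in extractor.items:
        ET.SubElement(root, "title").text = text
    return ET.tostring(root, encoding="unicode")


# Illustrative input only; a real wrapper would first retrieve the page.
SAMPLE_HTML = (
    "<html><body><ul>"
    "<li>First <b>title</b></li>"
    "<li>Second title</li>"
    "</ul></body></html>"
)
xml_output = wrap(SAMPLE_HTML)
```

Here the extraction rule is hard-coded in a parser subclass, which is exactly the kind of hand-written effort a declarative extraction language and mapping are meant to reduce.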

Keywords

web, XML, information extraction, wrappers

Date Posted: 08 June 2007

This document has been peer reviewed.