WysiWyg Web Wrapper Factory (W4F)

Penn collection
Database Research Group (CIS)
Subject
Web wrapper
information extraction
HTML parsing
HTML to XML conversion
Author
Sahuguet, Arnaud
Azavant, Fabien
Abstract

In this paper, we present W4F, a toolkit for generating wrappers for Web sources. W4F consists of a retrieval language to identify Web sources, a declarative extraction language (the HTML Extraction Language) to express robust extraction rules, and a mapping interface to export the extracted information into user-defined data structures. To make wrapper creation fast and easy, the toolkit offers WYSIWYG support through wizards. Together, these components permit the fast, semi-automatic generation of ready-to-go wrappers provided as Java classes. W4F has been successfully used to generate wrappers for database systems and software agents, making the content of Web sources easily accessible to any kind of application.
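The three layers named in the abstract (retrieval, extraction, mapping) can be illustrated with a minimal sketch. This is not W4F's API and does not use the HTML Extraction Language; it is written in Python for brevity, whereas W4F itself emits Java classes. Every name here (`Movie`, `TagTextCollector`, `extract`, `wrap`, `SAMPLE_PAGE`) is hypothetical, the retrieval step is a stub that returns a fixed string instead of fetching a page, and the extraction rule is reduced to "text of every element with a given tag".

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List


@dataclass
class Movie:
    """Hypothetical user-defined target structure for the mapping layer."""
    title: str
    genres: List[str]


class TagTextCollector(HTMLParser):
    """Collects the text found inside every occurrence of one tag name."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0                  # > 0 while inside the tag of interest
        self.chunks: List[str] = []     # text pieces of the current element
        self.results: List[str] = []    # one entry per completed element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.chunks).strip())
                self.chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract(html: str, tag: str) -> List[str]:
    """Extraction layer stand-in: text of every <tag> element, in order."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.results


def wrap(retrieve: Callable[[], str]) -> Movie:
    """Chain the three layers: retrieve, extract, map."""
    html = retrieve()                    # retrieval layer (stubbed by caller)
    titles = extract(html, "h1")         # extraction layer
    genres = extract(html, "li")
    return Movie(title=titles[0], genres=genres)  # mapping layer


# Assumed sample input standing in for a retrieved Web page.
SAMPLE_PAGE = (
    "<html><body><h1>Casablanca</h1>"
    "<ul><li>Drama</li><li>Romance</li></ul></body></html>"
)

movie = wrap(lambda: SAMPLE_PAGE)
print(movie)
```

Keeping retrieval as a callable passed into `wrap` mirrors the separation of layers the abstract describes: the source can change (a file, an HTTP fetch) without touching the extraction or mapping code.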

Publication date
1999
Comments
Working Paper, 1999, 22 pages. Unpublished.