Web Ecology: Recycling HTML pages as XML documents using W4F

Penn collection
Database Research Group (CIS)
Author
Sahuguet, Arnaud
Azavant, Fabien
Abstract

In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit for generating wrappers for Web data sources. Key features of W4F include an expressive language for extracting information from HTML pages in a structured way, a mapping for exporting the extracted information as XML documents, and visual tools that assist the user during wrapper creation. Moreover, the entire description of a wrapper is fully declarative. As an illustration, we demonstrate how to use W4F to create XML gateways that transparently serve HTML pages, on the fly, as XML documents together with their DTDs.
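The abstract describes a pipeline that extracts structured data from an HTML page and re-exports it as an XML document with a DTD. The following is a minimal sketch of that idea only. It does not use W4F or its extraction language (W4F itself is a Java toolkit). The table layout, the `FIELDS` mapping, the `papers` DTD, and the names `TableRowExtractor` and `html_to_xml` are all hypothetical. The sketch relies solely on Python's standard `html.parser` and `xml.etree.ElementTree` modules.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRowExtractor(HTMLParser):
    """Hypothetical extractor: collects the text of <td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being read, if any
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows (<th> only) yield no <td> cells and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical mapping from column position to XML element name.
FIELDS = ("title", "year")

# Hypothetical DTD describing the exported XML documents.
DTD = """<!DOCTYPE papers [
<!ELEMENT papers (paper*)>
<!ELEMENT paper (title, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT year (#PCDATA)>
]>"""


def html_to_xml(html: str) -> str:
    """Extract table rows from `html` and serialize them as XML preceded by the DTD."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element("papers")
    for row in parser.rows:
        if len(row) < len(FIELDS):
            continue  # skip rows that would not conform to the DTD
        paper = ET.SubElement(root, "paper")
        for name, value in zip(FIELDS, row):
            ET.SubElement(paper, name).text = value
    return DTD + "\n" + ET.tostring(root, encoding="unicode")
```

For example, `html_to_xml("<table><tr><th>Title</th><th>Year</th></tr><tr><td>Web Ecology</td><td>1999</td></tr></table>")` returns the DTD followed by `<papers><paper><title>Web Ecology</title><year>1999</year></paper></papers>`. Here the extraction rules and the XML mapping are hard-coded; the abstract's point is that W4F instead keeps the entire wrapper description declarative.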

Date of presentation
1999
Conference name
WebDB '99
Comments
Postprint version. Copyright ACM, 1999. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in WebDB '99.