
Database Research Group (CIS)
Document Type
Working Paper
Date of this Version
January 1999
Abstract
The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of information published on the Web is generated by underlying databases, and this proportion keeps increasing. In some cases, database access is only granted through a Web gateway using forms as a query language and HTML as a display vehicle. To permit interoperation (between Web sources and legacy databases or among Web sources themselves), there is a strong need for Web wrappers.
Web wrappers share some of the characteristics of standard database wrappers, but the underlying data sources usually offer very limited query capabilities, and the structure of the result (due to HTML shortcomings) might be loose and unstable. To overcome these problems, we divide the architecture of our Web wrappers into three components: (1) fetching the document, (2) extracting the information from its HTML formatting, and (3) mapping the information into a structure that can be used by applications (such as mediators).
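The three-component split described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `fetch`, `extract_rows`, `map_rows`, and `wrap` are hypothetical, and the sketch assumes the source returns its results as an HTML table whose first row holds the column names. Only the standard library is used.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen


def fetch(url: str, opener: Callable = urlopen) -> str:
    """Component 1: retrieve the raw HTML document.

    `opener` is a hypothetical hook so the fetch step can be swapped out
    (for example in tests) without touching extraction or mapping.
    """
    with opener(url) as response:
        return response.read().decode("utf-8", errors="replace")


class _TableRowParser(HTMLParser):
    """Collects the text of every table cell, row by row.

    Tolerates omitted </td> and </tr> tags, since real pages are loose.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html: str) -> List[List[str]]:
    """Component 2: pull the information out of its HTML formatting."""
    parser = _TableRowParser()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open by truncated markup
    return parser.rows


def map_rows(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Component 3: map rows into records an application can consume.

    Assumption: the first row names the columns.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def wrap(url: str, opener: Callable = urlopen) -> List[Dict[str, str]]:
    """Chain the three components into one wrapper call."""
    return map_rows(extract_rows(fetch(url, opener)))
```

Keeping the stages separate means that a change in the page layout should only require touching `extract_rows`, while a change in what the consuming application expects should only require touching `map_rows`.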
Date Posted: 28 June 2007
Comments
Database Research Group.