Taming Web Sources with "Minute-Made" Wrappers
Penn collection
Degree type
Discipline
Subject
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Contributor
Abstract
The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of information published on the Web is generated by underlying databases and this proportion keeps increasing. In some cases, database access is only granted through a Web gateway using forms as a query language and HTML as a display vehicle. In order to permit inter-operation (between Web sources and legacy databases or among Web sources themselves) there is a strong need for Web wrappers. Web wrappers share some of the characteristics of standard database wrappers but usually the underlying data sources offer very limited query capabilities and the struc- ture of the result (due to HTML shortcomings) might be loose and unstable. To overcome these problems, we divide the architecture of our Web wrappers into three components: (1) fetching the document, (2) extracting the information from its HTML formatting, and (3) mapping the information into a structure that can be used by applications (such as mediators).