May 1987


University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-87-31.


This paper discusses a system for talking about objects and spatial relations. The work was done in the context of a project called Landscan, for Language-Driven Scene Analyser. The system takes natural-language questions about a partially analysed image of a scene, extends the analysis of the scene as necessary, and responds with information about the objects the scene contains. Image processing and reasoning about the scene are guided by the input query. Landscan comprises (1) a vision system, which is responsible for image processing and object recognition; (2) a language processor, which is responsible for understanding the input queries; and (3) a reasoning agent, which determines what is already known or knowable about the subject of the query, formulates requests for data to the vision system as necessary, and compiles those data into meaningful answers.

This report is concerned with the last two: the language processor and the reasoning agent. Since most queries in this context concern objects and their spatial relations, the report describes a computational treatment of Herskovits' work on locative expressions and evaluates the usefulness of Herskovits' approach for this system. It also proposes a general design for the reasoner/interface, outlines the protocols required for the language and vision systems to interact with it, and points out aspects of the project needing particular attention. Because of the very ambitious scope of the Landscan project, on many issues the report can do no more than point the way to further exploration.



