Operations, Information and Decisions Papers

Document Type

Journal Article

Date of this Version

12-2011

Publication Source

Information Systems Research

Volume

22

Issue

4

Start Page

739

Last Page

755

DOI

10.1287/isre.1100.0287

Abstract

Information specialists in enterprises regularly use distributed information retrieval (DIR) systems that query a large number of information retrieval (IR) systems, merge the retrieved results, and display them to users. There can be considerable heterogeneity in the quality of results returned by different IR servers. Further, because different servers handle collections of different sizes and have different processing and bandwidth capacities, there can be considerable heterogeneity in their response times. The broker in the DIR system has to decide which servers to query, how long to wait for responses, and which retrieved results to display, based on the benefits and costs imposed on users. The benefit of querying more servers and waiting longer is the ability to retrieve more documents. The costs may take the form of access fees charged by IR servers or the users' cost of waiting for the servers to respond. We formulate the broker's decision problem as a stochastic mixed-integer program and present analytical solutions for the problem. Using data gathered from FedStats, a system that queries IR engines of several U.S. federal agencies, we demonstrate that the technique can significantly increase the utility from DIR systems. Finally, simulations suggest that the technique can be applied to solve the broker's decision problem under more complex decision environments.
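
To make the trade-off described in the abstract concrete, the following is a minimal illustrative sketch and not the paper's stochastic mixed-integer program. It assumes, purely for illustration, that each server has an exponentially distributed response time, a known expected number of useful documents, and a fixed access fee, and that the user incurs a linear cost for waiting until a single deadline. The names `Server`, `expected_utility`, `best_plan`, `doc_value`, and `wait_cost`, and all numbers, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    expected_docs: float  # assumed expected number of useful documents
    rate: float           # assumed rate of an exponential response time
    fee: float            # assumed access fee charged when queried


def respond_prob(server, wait):
    """Probability the server answers within `wait` (exponential assumption)."""
    return 1.0 - math.exp(-server.rate * wait)


def expected_utility(servers, wait, doc_value, wait_cost):
    """Expected document benefit minus access fees minus linear waiting cost."""
    gain = sum(
        doc_value * s.expected_docs * respond_prob(s, wait) - s.fee
        for s in servers
    )
    return gain - wait_cost * wait


def best_plan(servers, wait_grid, doc_value=1.0, wait_cost=0.5):
    """Grid search over deadlines; at each deadline keep only the servers
    whose expected benefit exceeds their fee (the toy objective is separable)."""
    best = (0.0, 0.0, [])  # baseline: query nobody, wait zero
    for wait in wait_grid:
        chosen = [
            s for s in servers
            if doc_value * s.expected_docs * respond_prob(s, wait) > s.fee
        ]
        utility = expected_utility(chosen, wait, doc_value, wait_cost)
        if utility > best[0]:
            best = (utility, wait, [s.name for s in chosen])
    return best


if __name__ == "__main__":
    servers = [
        Server("fast", expected_docs=10.0, rate=2.0, fee=1.0),
        Server("slow", expected_docs=2.0, rate=0.1, fee=3.0),
    ]
    grid = [i / 10 for i in range(1, 51)]
    print(best_plan(servers, grid))
```

In this toy setting the slow, expensive server is never worth querying, because its fee exceeds the most it could ever contribute, and the chosen deadline balances the diminishing chance of further responses against the growing cost of waiting.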

Keywords

distributed information retrieval (IR), personalization, utility theory, optimal operational decisions, source selection, query termination, stochastic modeling

Date Posted: 27 November 2017

This document has been peer reviewed.