Querying Provenance for Ranking and Recommending

As has been frequently observed in the literature, there is a strong connection between a derived data item’s provenance and its authoritativeness, utility, relevance, or probability. A standard way of obtaining a score for a derived tuple is to first assign scores to the “base” tuples from which it is derived, and then use the semantics of the query and the score measure to derive a value for the tuple. This “provenance-enabled” scoring has led to a variety of scenarios where a tuple’s intrinsic value is based on its provenance, independent of whatever other tuples exist in the data set. However, there is another class of applications, revolving around sharing and recommendation, in which our goal may be to rank tuples by their “importance” or by the structure of their connectivity within the provenance graph. We argue that the most natural approach is to exploit the structure of a provenance graph to rank and recommend “interesting” or “relevant” items to users, based on global and/or local provenance graph structure and random-walk-based algorithms. We further argue that it is desirable to have a high-level declarative language to extract portions of the provenance graph and then apply the random walk computations. We extend the ProQL provenance query language to support a wide array of random walk algorithms in a high-level way, and identify opportunities for query optimization.
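To make the idea of random-walk-based ranking over a provenance graph concrete, the following is a minimal sketch of one such algorithm, a random walk with restart (personalized PageRank), written in plain Python. It is not ProQL and is not taken from the paper: the function name `random_walk_with_restart`, the parameters `restart` and `iterations`, and the toy graph with base tuples `b1`..`b3` and derived tuples `t1`..`t3` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.15, iterations=50):
    """Score nodes by a walk that jumps back to `seeds` with probability `restart`.

    `edges` is a list of directed (source, target) pairs; `seeds` is the set
    of nodes the walk restarts from (for example, items a user already likes).
    """
    seeds = set(seeds)
    nodes = sorted({n for edge in edges for n in edge} | seeds)
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)

    # Restart distribution: uniform over the seed nodes.
    restart_dist = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart_dist)

    for _ in range(iterations):
        nxt = {n: restart * restart_dist[n] for n in nodes}
        for n in nodes:
            mass = (1.0 - restart) * score[n]
            if out[n]:
                # Spread the remaining mass evenly over outgoing edges.
                share = mass / len(out[n])
                for m in out[n]:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += mass * restart_dist[m]
        score = nxt
    return score


# Hypothetical provenance graph: base tuples b1..b3 feed derived tuples t1..t3.
derivations = [("b1", "t1"), ("b2", "t1"), ("b2", "t2"),
               ("b3", "t2"), ("t1", "t3"), ("t2", "t3")]
# Assumption: the walk may follow derivation edges in both directions.
edges = derivations + [(dst, src) for src, dst in derivations]

scores = random_walk_with_restart(edges, seeds={"b1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch, tuples that are closely connected to the seed `b1` through shared derivations receive higher scores than tuples that are further away, which is the kind of local, structure-based relevance the abstract describes. A declarative language would first select the subgraph to walk over; here that selection is simply the hand-written `derivations` list.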

Date of presentation

2012-06-01

Conference name

Departmental Papers (CIS)

Conference dates

2023-05-17T07:12:16.000

Comments

Ives, Z., Haeberlen, A., Feng, T., & Gatterbauer, W., Querying Provenance for Ranking and Recommending, 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP'12), June 2012, https://www.usenix.org/conference/tapp12/querying-provenance-ranking-and-recommending

Collection

Presentations