Consolidated analysis of critical business information distributed across
structured and unstructured data is a key enabler for next-generation
business intelligence and search. In this work, we address the problem of
linking a given text document with relevant structured data, retrieved
automatically from an RDBMS. We have developed a prototype system, called
EROCS, that views the structured data as a predefined set of ``entities''
and identifies the entities that best match the given document. EROCS also
embeds the identified entities in the document, effectively creating links
between the structured data and segments within the document. Unlike prior
approaches, EROCS identifies such links even when the relevant entity is
not explicitly mentioned in the document. EROCS uses sophisticated
optimizations to perform this task while keeping the amount of
information retrieved from the database to a minimum.
(Paper appeared in VLDB 2006)
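The idea of treating database content as a fixed set of entities and matching document segments against them can be illustrated with a toy sketch. This is not the EROCS algorithm. It assumes a simple, swapped-in scoring scheme: each entity is represented by the terms in its attribute values, and each shared term contributes 1/k to the score, where k is the number of entities that contain it. The entity data and the names `ENTITIES`, `entity_terms`, `best_entity`, and `link_segments` are hypothetical. The sketch also holds every entity in memory, so it does nothing to limit the amount of information retrieved from the database.

```python
import re
from collections import Counter

# Hypothetical entities: each id maps to attribute values that might be gathered
# from a row and its related rows. Illustrative data only.
ENTITIES = {
    "cust-1": ["Alice Smith", "Boston", "laptop", "warranty"],
    "cust-2": ["Bob Jones", "Denver", "printer", "toner"],
    "cust-3": ["Carol White", "Boston", "printer", "warranty"],
}


def tokenize(text):
    """Return lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def entity_terms(entities):
    """Map each entity id to the set of terms in its attribute values."""
    return {eid: {t for v in values for t in tokenize(v)}
            for eid, values in entities.items()}


def best_entity(segment, terms_by_entity, weight):
    """Return the highest-scoring entity id for a segment, or None if no term matches."""
    seg_terms = set(tokenize(segment))
    scores = {eid: sum(weight[t] for t in seg_terms & terms)
              for eid, terms in terms_by_entity.items()}
    eid = max(scores, key=scores.get)  # on ties, the first entity in dict order wins
    return eid if scores[eid] > 0 else None


def link_segments(document, entities):
    """Split a document into sentences and link each one to its best-matching entity."""
    terms_by_entity = entity_terms(entities)
    # A term shared by k entities contributes 1/k, so distinctive terms count more.
    df = Counter(t for terms in terms_by_entity.values() for t in terms)
    weight = {t: 1.0 / c for t, c in df.items()}
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(seg, best_entity(seg, terms_by_entity, weight)) for seg in segments]


if __name__ == "__main__":
    doc = ("The customer in Boston called about a broken printer. "
           "Toner for the Denver office ran out. Lunch was late.")
    for segment, eid in link_segments(doc, ENTITIES):
        print(eid, "<-", segment)
```

In this example, the first sentence is linked to `cust-3` even though the customer is never named, because the combination of "Boston" and "printer" occurs only in that entity. The last sentence shares no terms with any entity and is left unlinked.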