With critical business information distributed across both structured
and unstructured data sources, enterprises are increasingly realizing the
importance of seamlessly integrating relevant structured and unstructured
data. Existing information integration solutions typically address this
issue by providing a single point of access for both structured and
unstructured data sources. This is not enough, since the application still
needs to formulate the SQL logic to retrieve the needed structured data
on one hand, and identify a set of keywords to retrieve the related
unstructured data on the other. This is a limitation because (a) the same
information need must be formulated in two disparate paradigms,
which is redundant effort, and (b) in many cases, it is hard (even
impossible) for the application to identify the keywords needed to
retrieve the related unstructured data. The SCORE project addresses this
limitation by following a novel approach to information integration.
In this approach, the application specifies its information needs using only
a SQL query on the structured data, and the system automatically "translates"
this query into a set of keywords that can be used to retrieve relevant
unstructured data. In this paper, we describe the techniques used in SCORE
for this query translation, and also present an experimental study that
illustrates the effectiveness of these techniques.