Systems for extracting interesting structure from databases,
especially large data stores, are becoming a necessity. The
existing data access model is clearly hitting its limits. Data
mining methods provide a way to address some of these
limitations. These methods have their origins in statistics,
databases, pattern recognition, learning, visualization,
and parallel computing. I'll outline some recent advances
towards scaling mining algorithms to large databases, and cover
the research challenges and opportunities posed by the problem
of extracting models from massive data sets. The talk will
particularly focus on the decomposition of algorithms so they
work effectively with a database system backend. Operating under
such scalability constraints poses interesting problems for
how models can be built and what methods are practical. Some
applications will be used to motivate and illustrate the
techniques. Of special interest is an application in scientific
data analysis: the automated cataloging and analysis of the 2
billion astronomical objects detectable in the Second Palomar
Observatory Sky Survey conducted by Caltech.