Users are increasingly relying on search engines to obtain
useful information from the web. It is becoming more and more
difficult for users to find relevant information as a large number of
documents are returned as a result of a search. Hence, in order to
make the search effective, it is necessary to categorize documents into sets
(i.e., clusters) based on subject or similarity. A way to cluster
documents based on relative similarity between them will be explored
in this talk. The documents are scanned and important keywords or
document representatives are obtained from each document. Weights are
assigned to these keywords based on their location in the document,
frequency and various other factors. We will then discuss the
Row-Column Iterative Algorithm, which is applied to the set of N
documents to form clusters based on relative similarity of
documents. We will also discuss some ongoing research projects.
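
The keyword-weighting and similarity steps described above can be illustrated with a minimal sketch. It is not the method presented in the talk: the title boost of 2.0, the stopword list, the use of cosine similarity, and all function names are assumptions made for illustration. The Row-Column Iterative Algorithm itself is not reproduced here, since the abstract does not specify it; the sketch only builds the kind of N x N similarity matrix that a clustering step could take as input.

```python
import math
import re
from collections import Counter

# Assumed values for illustration only; the abstract does not specify them.
TITLE_BOOST = 2.0
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def keyword_weights(title, body):
    """Weight keywords by frequency, with extra weight for title occurrences."""
    weights = Counter()
    for word in tokenize(body):
        weights[word] += 1.0          # frequency: one unit per body occurrence
    for word in tokenize(title):
        weights[word] += TITLE_BOOST  # location: title occurrences count more
    return dict(weights)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Build the N x N pairwise similarity matrix for (title, body) pairs."""
    vectors = [keyword_weights(title, body) for title, body in docs]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

Given a list of (title, body) pairs, similarity_matrix returns a matrix in which documents that share heavily weighted keywords score closer to 1 and documents with no keywords in common score 0.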