-------------------------------------------------------------------
Lecture
-------------------------------------------------------------------
Search Engines for Indian Languages
Dr. TV Prabhakar
Department of Computer Science and Engineering,
Indian Institute of Technology, Kanpur, India
Abstract. There is a great need for search engines for web documents
written in languages other than English. In this talk, we describe the
design issues of a search engine for Indian languages. After introducing
Indian-language technologies for the web, we describe the
implementation of two search engines for Indian languages, one for
documents in ISCII and the other for documents in Unicode. The software
allows full-text indexing and searching of a database of documents
written in any Brahmi-based Indian language. The search engines gather
HTML documents from the web, index and compress them, and then search
them for the given keywords. The main features of the search engines
are phonetic tolerance, morphological analysis, compression and
indexing, leading and trailing substring matches for keywords, and
search through compressed documents. Performance results show that the
search engines achieve a compression of almost 80 percent with
appreciable precision and recall.
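
To make two of the listed features concrete, here is a minimal Python
sketch of phonetic tolerance and leading/trailing substring matching
over a small inverted index. It is not the speaker's implementation:
the folding table (long vowel signs mapped to short ones, chandrabindu
mapped to anusvara), the choice of Devanagari in Unicode, and the
names fold, build_index and search are all assumptions made for
illustration. Compression and morphological analysis are left out.

```python
import unicodedata
from collections import defaultdict

# Hypothetical folding table (an assumption, not from the talk): map
# Devanagari long vowel signs to short ones and chandrabindu to
# anusvara, so that common spelling variants share one index key.
FOLD = str.maketrans({
    "\u0940": "\u093F",  # vowel sign II -> vowel sign I
    "\u0942": "\u0941",  # vowel sign UU -> vowel sign U
    "\u0901": "\u0902",  # chandrabindu -> anusvara
})


def fold(term):
    """Normalize to NFC, then apply the illustrative phonetic folding."""
    return unicodedata.normalize("NFC", term).translate(FOLD)


def build_index(docs):
    """Map each folded, whitespace-separated term to the ids of the
    documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[fold(token)].add(doc_id)
    return index


def search(index, keyword, mode="exact"):
    """Return ids of documents matching the keyword.

    mode is "exact", "leading" (indexed term starts with the keyword)
    or "trailing" (indexed term ends with the keyword).
    """
    key = fold(keyword)
    if mode == "exact":
        return set(index.get(key, set()))
    if mode == "leading":
        matches = lambda term: term.startswith(key)
    elif mode == "trailing":
        matches = lambda term: term.endswith(key)
    else:
        raise ValueError("unknown mode: " + mode)
    hits = set()
    # Linear scan over the vocabulary; fine for a sketch, too slow
    # for a large index.
    for term, doc_ids in index.items():
        if matches(term):
            hits |= doc_ids
    return hits
```

Because both the indexed terms and the query pass through fold, a
keyword spelled with a long vowel sign also finds documents that spell
it with the short one. The leading and trailing modes scan every
indexed term, which is acceptable only at this scale.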
-------------------------------------------------------------------
The sigmod-japan mailing list has moved to sigmod-japan@sigmodj.is.uec.ac.jp.
For any questions about sigmod-japan (registration, removal, etc.),
please e-mail helpdesk@sigmodj.is.uec.ac.jp.