Supervised Latent Semantic Indexing for document categorization

Jian-Tao Sun, Zheng Chen, Hua-Jun Zeng, Yu-Chang Lu, Chun-Yi Shi, Wei-Ying Ma

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

38 Citations (Scopus)

Abstract

Latent Semantic Indexing (LSI) is a successful technology in information retrieval (IR) that attempts to explore the latent semantics implied by a query or a document by representing them in a dimension-reduced space. However, LSI is not optimal for document categorization tasks because it aims to find the most representative features for document representation rather than the most discriminative ones. In this paper, we propose Supervised LSI (SLSI), which iteratively selects the most discriminative basis vectors using the training data. The extracted vectors are then used to project the documents into a reduced-dimensional space for better classification. Experimental evaluations show that the SLSI approach leads to dramatic dimension reduction while achieving good classification results. © 2004 IEEE.
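
The contrast the abstract draws between representative and discriminative basis vectors can be illustrated with a small sketch. This is not the SLSI algorithm from the paper, whose details the abstract does not give. It assumes a single-pass selection using a Fisher-style ratio of between-class to within-class variance, where the paper describes an iterative procedure. The function names (`lsi_basis`, `fisher_scores`, `select_discriminative_basis`) and the toy data are hypothetical.

```python
import numpy as np

def lsi_basis(X, k):
    """Plain LSI: top-k left singular vectors of a terms x documents matrix."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def fisher_scores(Z, y):
    """Per-component between-class / within-class variance ratio.

    Z is documents x components, y holds one class label per document.
    This scoring rule is an assumption made for illustration only.
    """
    mean_all = Z.mean(axis=0)
    between = np.zeros(Z.shape[1])
    within = np.zeros(Z.shape[1])
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * (Zc.mean(axis=0) - mean_all) ** 2
        within += ((Zc - Zc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def select_discriminative_basis(X, y, k, m):
    """Keep the m of the top-k LSI vectors that best separate the classes."""
    U = lsi_basis(X, k)
    Z = X.T @ U                      # project documents onto the k vectors
    keep = np.argsort(fisher_scores(Z, y))[::-1][:m]
    return U[:, keep]

# Toy data: term 0 varies strongly but ignores the class labels,
# term 1 varies weakly but tracks them.
X = np.array([[10.0, -10.0, 10.0, -10.0],
              [1.0, 2.0, -1.0, -2.0]])
y = np.array([0, 0, 1, 1])

unsupervised = lsi_basis(X, 1)                     # follows term 0
supervised = select_discriminative_basis(X, y, 2, 1)  # follows term 1
```

On this toy matrix the largest singular vector aligns with the high-variance term, while the supervised selection keeps the low-variance vector that actually separates the two classes.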
Original language: English
Title of host publication: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
Pages: 535-538
Publication status: Published - 2004
Externally published: Yes
Event: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004 - Brighton, United Kingdom
Duration: 1 Nov 2004 to 4 Nov 2004

Publication series

Name: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004

Conference

Conference: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
Place: United Kingdom
City: Brighton
Period: 1/11/04 to 4/11/04
