TY - GEN
T1 - Supervised Latent Semantic Indexing for document categorization
AU - Sun, Jian-Tao
AU - Chen, Zheng
AU - Zeng, Hua-Jun
AU - Lu, Yu-Chang
AU - Shi, Chun-Yi
AU - Ma, Wei-Ying
PY - 2004
Y1 - 2004
N2 - Latent Semantic Indexing (LSI) is a successful technology in information retrieval (IR) which attempts to explore the latent semantics implied by a query or a document through representing them in a dimension-reduced space. However, LSI is not optimal for document categorization tasks because it aims to find the most representative features for document representation rather than the most discriminative ones. In this paper, we propose Supervised LSI (SLSI) which selects the most discriminative basis vectors using the training data iteratively. The extracted vectors are then used to project the documents into a reduced dimensional space for better classification. Experimental evaluations show that the SLSI approach leads to dramatic dimension reduction while achieving good classification results. © 2004 IEEE.
UR - http://www.scopus.com/inward/record.url?scp=19544389297&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-19544389297&origin=recordpage
U2 - 10.1109/ICDM.2004.10004
DO - 10.1109/ICDM.2004.10004
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 0769521428
SN - 9780769521428
T3 - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
SP - 535
EP - 538
BT - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
T2 - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
Y2 - 1 November 2004 through 4 November 2004
ER -