TY - JOUR
T1 - Multilayer SOM with tree-structured data for efficient document retrieval and plagiarism detection
AU - Chow, Tommy W.S.
AU - Rahman, M. K. M.
PY - 2009
Y1 - 2009
AB - This paper proposes a new document retrieval (DR) and plagiarism detection (PD) system using multilayer self-organizing map (MLSOM). A document is modeled by a rich tree-structured representation, and a SOM-based system is used as a computationally effective solution. Instead of relying on keywords/lines, the proposed scheme compares a full document as a query for performing retrieval and PD. The tree-structured representation hierarchically includes document features as document, pages, and paragraphs. Thus, it can reflect underlying context that is difficult to acquire from the currently used word-frequency information. We show that the tree-structured data is effective for DR and PD. To handle tree-structured representation in an efficient way, we use an MLSOM algorithm, which was previously developed by the authors for the application of image retrieval. In this study, it serves as an effective clustering algorithm. Using the MLSOM, local matching techniques are developed for comparing text documents. Two novel MLSOM-based PD methods are proposed. Detailed simulations are conducted and the experimental results corroborate that the proposed approach is computationally efficient and accurate for DR and PD. © 2009 IEEE.
KW - Document retrieval (DR)
KW - Multilayer self-organizing map (MLSOM)
KW - Plagiarism detection (PD)
KW - Tree-structured representation
UR - http://www.scopus.com/inward/record.url?scp=70349246347&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-70349246347&origin=recordpage
U2 - 10.1109/TNN.2009.2023394
DO - 10.1109/TNN.2009.2023394
M3 - RGC 22 - Publication in policy or professional journal
C2 - 19643706
SN - 1045-9227
VL - 20
SP - 1385
EP - 1402
JO - IEEE Transactions on Neural Networks
JF - IEEE Transactions on Neural Networks
IS - 9
ER -