Fitness assessment of document model

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review

1 Scopus Citations
View graph of relations

Author(s)

Detail(s)

Original languageEnglish
Pages (from-to)893-903
Journal / PublicationInternational Journal of Systems Science
Volume37
Issue number13
Publication statusPublished - 20 Oct 2006
Externally publishedYes

Abstract

Document classification is a supervised machine learning process, where predefined category labels are assigned to documents based on the hypothesis derived from training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, "Resource selection for domain-specific cross-lingual IR", in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffied: United Kingdom, pp. 154-161.] pointed out that the effectiveness of document classification system may vary in different domains. This implies that the quality of document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on model candidates. However, this kind of evaluation methods may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still competent to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrated how the fitness of models are assessed. The results of our work contribute to the researches of feature selection, dimensionality reduction and document classification.

Research Area(s)

  • Document classification, Document model, Fitness measurement

Bibliographic Note

Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].

Citation Format(s)

Fitness assessment of document model. / Chen, Ding-Yi; Li, Xue; Dong, Zhao Yang et al.
In: International Journal of Systems Science, Vol. 37, No. 13, 20.10.2006, p. 893-903.

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review