Identification of discriminative features for biological event extraction through linguistically informed feature selection

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 22_Publication in policy or professional journal


Original language: English
Pages (from-to): 1032-1036
Journal / Publication: Journal of Food, Agriculture and Environment
Issue number: 1
Publication status: Published - 2013


Machine learning classifiers have achieved strong performance in biomedical event extraction. For example, the support vector machine (SVM) classifiers in the Turku Event Extraction System achieved the best performance in the BioNLP09 task. Such classifiers typically rely on large feature sets. Despite their robust performance, recent research has suggested that feature sets produced through automatic training need to be reduced in size in order to improve system performance. The current paper seeks ways to reduce the size of feature sets by investigating the contribution of four feature sets constructed according to lexical, grammatical, syntactic and semantic information. It reports an experiment based on BioNLP data prepared by the Turku team for biological event extraction and examines how far the dimension of the feature sets can be reduced while the classifier still achieves similar performance. The importance of each feature set is evaluated with an SVM classifier. Our experiments demonstrate that constructing feature sets according to lexical, grammatical and syntactic information can reduce the set size by as much as 86% while maintaining comparable performance, substantially alleviating the feature dimension issue. The experiments also show that a hybrid feature set combining lexical and semantic information achieves the second highest accuracy, indicating the feasibility of constructing an optimal feature set through dimension reduction and feature combination. We conclude that the experiments reported here provide empirical evidence for the importance of linguistic information, in addition to domain knowledge, in constructing high-performance feature sets for biomedical event extraction.
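
The group-wise size reduction described in the abstract can be illustrated with a minimal sketch. It is not the paper's or the Turku system's implementation: the feature names, the `lex:`/`gram:`/`syn:`/`sem:` prefixes, and the toy instances are all hypothetical, and no classifier is trained. It only shows how restricting a feature set to chosen linguistic groups shrinks the feature-space dimension.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical convention: each feature name starts with its linguistic
# group, e.g. "lex:token=binds". The paper does not specify a naming scheme.
Instance = Dict[str, float]


def group_of(feature_name: str) -> str:
    """Return the group prefix of a feature name such as 'lex:token=binds'."""
    return feature_name.split(":", 1)[0]


def select_groups(instance: Instance, keep: Set[str]) -> Instance:
    """Keep only the features whose group prefix is in `keep`."""
    return {name: value for name, value in instance.items()
            if group_of(name) in keep}


def vocabulary(instances: Iterable[Instance]) -> Set[str]:
    """Collect the distinct feature names, i.e. the feature-space dimension."""
    names: Set[str] = set()
    for instance in instances:
        names.update(instance)
    return names


def size_reduction(full: List[Instance], reduced: List[Instance]) -> float:
    """Fraction of the original dimension removed by the selection."""
    return 1.0 - len(vocabulary(reduced)) / len(vocabulary(full))


# Toy instances (invented for illustration only).
instances: List[Instance] = [
    {"lex:token=binds": 1.0, "gram:pos=VBZ": 1.0,
     "syn:dep=nsubj": 1.0, "sem:type=Protein": 1.0},
    {"lex:token=expression": 1.0, "gram:pos=NN": 1.0,
     "syn:dep=prep_of": 1.0, "sem:type=Protein": 1.0},
]

# A hybrid lexical + semantic set, mirroring the combination named above.
hybrid = [select_groups(inst, {"lex", "sem"}) for inst in instances]

print(len(vocabulary(instances)), "features before selection")
print(len(vocabulary(hybrid)), "features after selection")
print(f"size reduction: {size_reduction(instances, hybrid):.0%}")
```

In an actual experiment, each reduced set would then be passed to an SVM classifier and its accuracy compared against the full feature set.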

Research Area(s)

  • Event extraction, Feature selection, Linguistic features, Semantic information, Support vector machine, Syntactic information, Turku event extraction system