A Novel Hierarchical Discourse Model for Scientific Article and It's Efficient Top-K Resampling-based Text Classification Approach

Min Gao, Chun-Hua Chen*, Zhi-Han Gao, Wei-Long Chen, Yuan Ren, Sam Kwong, Zhi-Hui Zhan*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

3 Citations (Scopus)

Abstract

Scientific articles contain rich knowledge that can significantly assist scientific research, but it is difficult to extract this knowledge precisely because of the complexity of the discourse structure of scientific articles. To provide more accurate scientific research knowledge for researchers in a specific academic domain, it is necessary to study the discourse structure of scientific articles in that domain and to propose an approach that automatically annotates discourse information in articles. Unfortunately, few works have studied the discourse structure of domain scientific articles and the corresponding automatic discourse annotation. To fill this gap, we take scientific articles in the wastewater-based epidemiology domain as a case study of how to annotate discourse information automatically and efficiently. This paper makes three contributions. First, we propose a hierarchical discourse model with two layers to cover all potential discourses in scientific articles from various domains. Specifically, the first layer defines four core discourse concepts that describe the main process of scientific research and can be applied to scientific articles in any domain. The second layer defines a fine-grained domain-specific structure, which can accurately describe the entire research content of a specific domain. Second, based on the proposed model, we build a corpus of 100 annotated scientific articles in the wastewater-based epidemiology domain. Third, based on the model and dataset, we propose a simple yet efficient Top-K resampling-based approach to train a more effective classifier for automatic annotation. Extensive experiments verify the effectiveness and efficiency of the proposed hierarchical discourse model and the Top-K resampling-based classification approach.
Original language: English
Title of host publication: 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC) - Proceedings
Publisher: IEEE
Pages: 774-781
ISBN (Electronic): 978-1-6654-5258-8
Publication status: Published - 2022
Event: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Prague, Czech Republic
Duration: 9 Oct 2022 to 12 Oct 2022

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Volume: 2022-October
ISSN (Print): 1062-922X

Conference

Conference: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022
Place: Czech Republic
City: Prague
Period: 9/10/22 to 12/10/22

Research Keywords

  • automatic annotation
  • discourse
  • scientific articles
  • text classification
