Annotation, processing and visualization of functional-semantic information
功能語義信息的標註, 處理及可視化
Student thesis: Doctoral Thesis
Author(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 15 Jul 2014 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(ffdb06f7-76f2-47cc-823a-86755e199529).html |
---|---|
Abstract
The functional-semantic approach to the description and analysis of language
has gained increasing influence and adoption as a powerful alternative to traditional
formalist theories. Rhetorical Structure Theory (RST) and Systemic Functional
Linguistics (SFL) are two prominent functionally oriented linguistic frameworks that
have important applications in a wide range of areas including natural language
understanding, discourse analysis, natural language generation and dialogue systems.
They provide descriptions of the functional-semantic structure of text at two
complementary levels: the beyond-clause, discourse-level structure accounting for
the overall architecture of text, and the clause-level structure accounting for the
detailed texture consisting of clausal constituents that serve specific functional roles.
Together the two frameworks provide a holistic approach to the exploration of
functional-semantic patterning in text.
However, the application of such frameworks has been severely limited by the
overwhelming complexity and inefficiency of manual analysis, especially when
applied to large-scale texts. Meanwhile, a lack of corpus resources and
computational tools has made it difficult to automate applications of such theories to
deal with interesting, larger-scale problems.
This dissertation aims to explore an automated, data-driven approach to
analyzing functional-semantic information from the perspective of RST and SFL.
This approach to functional-semantic processing consists of several interrelated subproblems:
1) representing and annotating textual data with functional-semantic
information, 2) computing and coding functional components and integrating them
with existing resources, and 3) presenting the annotated/computed functional-semantic
coding in a clear and intuitive manner.
For these problems we present the design and development of a platform
dealing with the annotation, computational processing and visualization of
functional-semantic information. We survey the major issues and present
initial solutions to some of the key problems in building the core components of the
proposed platform, drawing on data-driven machine learning methods and
synthesizing them into an integrated pipeline to achieve maximal automation. In the
first stages of the pipeline, we addressed several infrastructural issues in
collaborative annotation of functional-semantic information and compiled a
first-of-its-kind functional corpus annotated with the transitivity structure. Next, training
state-of-the-art machine learning algorithms on the annotated samples, we automated
the coding of the two complementary levels of functional-semantic analysis. Finally,
we proposed an innovative visualization interface for more effective analysis of the
coded information. The resulting platform, constructed following this pipeline,
demonstrates great potential for large-scale text analysis as well as a number of other
applications.
- Functionalism (Linguistics), Semantics, Data processing, Information visualization, Computational linguistics