Annotation, processing and visualization of functional-semantic information
Student thesis: Doctoral Thesis
The functional-semantic approach to the description and analysis of language has gained increasing influence and adoption as a powerful alternative to traditional formalist theories. Rhetorical Structure Theory (RST) and Systemic Functional Linguistics (SFL) are two prominent functionally oriented linguistic frameworks with important applications in a wide range of areas, including natural language understanding, discourse analysis, natural language generation and dialogue systems. They describe the functional-semantic structure of text at two complementary levels: the discourse-level structure beyond the clause, which accounts for the overall architecture of a text, and the clause-level structure, which accounts for its detailed texture of clausal constituents serving specific functional roles. Together, the two frameworks provide a holistic approach to exploring functional-semantic patterning in text. However, their application has been severely limited by the complexity and inefficiency of manual analysis, especially for large-scale texts. Meanwhile, a lack of corpus resources and computational tools has made it difficult to automate the application of these theories to interesting, larger-scale problems. This dissertation explores an automated, data-driven approach to analyzing functional-semantic information from the perspective of RST and SFL. The approach consists of three interrelated subproblems: 1) representing and annotating textual data with functional-semantic information, 2) computing and coding functional components and integrating them with existing resources, and 3) presenting the annotated or computed functional-semantic coding in a clear and intuitive manner. To address these problems, we present the design and development of a platform for the annotation, computational processing and visualization of functional-semantic information.
We survey the major issues and present initial solutions to some of the key problems in building the core components of the proposed platform, drawing on data-driven machine learning methods and synthesizing them in an integrated pipeline to achieve maximal automation. In the first stages of the pipeline, we address several infrastructural issues in the collaborative annotation of functional-semantic information and compile a first-of-its-kind functional corpus annotated with transitivity structure. Next, by training state-of-the-art machine learning algorithms on the annotated samples, we automate the coding of the two complementary levels of functional-semantic analysis. Finally, we propose an innovative visualization interface for more effective analysis of the coded information. The resulting platform, built following this pipeline, demonstrates great potential for large-scale text analysis as well as a number of other applications.
- Functionalism (Linguistics), Semantics, Data processing, Information visualization, Computational linguistics