A corpus-based study on zero anaphora resolution in Chinese discourse
Student thesis: Doctoral Thesis
This research empirically explores the way that a zero pronoun refers to its antecedent, and implements systems which automatically resolve Chinese zero anaphora. It is proposed that for the purpose of discourse analysis, each individual clause can be defined on the basis of one predicate-argument structure. A zero pronoun occurs if an obligatory argument of a predicate is absent from the surface structure of the clause to which the predicate corresponds. As semantics and reasoning are avoided as much as possible in this study, the information annotated in our corpus is deliberately limited to shallow syntactic information and anaphoric links. Zero anaphora is categorized into intra- and inter-clausal anaphora. Drawing on insights from the study of empty categories and binding principles [Chomsky 1981, 1982], we put an emphasis on the grammatical and pragmatic correlations that hold between a zero pronoun and its antecedent. The correlations are expressed in terms of the paths between a zero pronoun and its antecedent, and the categories of these paths. We implement two systems for zero anaphora resolution. One system operates within the framework of Centering Theory. By taking advantage of the hierarchical structure of a discourse, we extend the centering algorithm in such a way that Cp is the preferred interpretation of a zero pronoun in both inter- and intra-clausal anaphora. The other system makes use of a machine-learning algorithm, the decision tree, to resolve zero anaphora. The results show that while the factors representing the predictions based on Centering Theory capture the anaphora related to salience, those characterizing the categories of the relation between a zero pronoun and its antecedent candidate capture the anaphora with other correlations. Several decision tree-based systems outperform the system based on Centering Theory.
It is concluded that the resolution of zero anaphora is not only a matter of competition in salience, but is also related to the grammatical and pragmatic correlations between a zero pronoun and its antecedent. The hierarchical structure of a discourse is a significant factor in the assessment of salience and correlations. When this factor is taken into account, the systems achieve promising results in the resolution of zero anaphora in Chinese discourse.
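The definition of a zero pronoun used here (an obligatory argument of a predicate that is missing from the clause's surface structure) and the idea of taking Cp as its preferred interpretation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the thesis's implementation: the toy valency lexicon `OBLIGATORY_ARGS`, the role names, the subject-before-object ranking in `ROLE_RANK`, and the functions `find_zero_pronouns`, `preferred_center`, and `resolve` are all hypothetical. The sketch also treats the discourse as a flat sequence of clauses, so it ignores the hierarchical discourse structure and the intra-clausal cases that the thesis addresses.

```python
from dataclasses import dataclass, field

# Hypothetical valency lexicon: obligatory argument roles per predicate.
# Predicates are written in pinyin; glosses are given in the comments.
OBLIGATORY_ARGS = {
    "mai": ("subject", "object"),     # "buy"
    "xihuan": ("subject", "object"),  # "like"
    "zou": ("subject",),              # "leave"
}

# Assumed salience ranking of grammatical roles (lower value = more salient).
ROLE_RANK = {"subject": 0, "object": 1}


@dataclass
class Clause:
    """One clause, modelled as a single predicate-argument structure."""
    predicate: str
    arguments: dict = field(default_factory=dict)  # role -> overt noun phrase


def find_zero_pronouns(clause):
    """Return the obligatory roles that have no overt argument in the clause."""
    required = OBLIGATORY_ARGS.get(clause.predicate, ())
    return [role for role in required if role not in clause.arguments]


def preferred_center(arguments):
    """Return the highest-ranked entity (Cp) among role -> entity pairs."""
    if not arguments:
        return None
    best_role = min(arguments, key=lambda role: ROLE_RANK.get(role, len(ROLE_RANK)))
    return arguments[best_role]


def resolve(clauses):
    """Link each zero pronoun to the Cp of the preceding clause.

    Returns a list of (clause_index, role, antecedent) tuples; the antecedent
    is None when no preceding clause supplies a Cp.
    """
    links = []
    cp = None
    for index, clause in enumerate(clauses):
        resolved = dict(clause.arguments)
        for role in find_zero_pronouns(clause):
            links.append((index, role, cp))
            if cp is not None:
                resolved[role] = cp  # the resolved entity now fills the role
        new_cp = preferred_center(resolved)
        if new_cp is not None:
            cp = new_cp
    return links


# Toy discourse: "Zhangsan bought a book, (he) likes novels, (he) left."
clauses = [
    Clause("mai", {"subject": "Zhangsan", "object": "shu"}),
    Clause("xihuan", {"object": "xiaoshuo"}),
    Clause("zou"),
]
links = resolve(clauses)
```

In this toy discourse both omitted subjects are linked to the subject of the first clause. A salience-only strategy of this kind is exactly what the thesis reports as insufficient on its own, which is why its decision tree systems add features describing the relation between a zero pronoun and each antecedent candidate.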
- Discourse analysis, Chinese language, Corpora (Linguistics), Anaphora