A corpus-based study on zero anaphora resolution in Chinese discourse
基於語料庫的漢語零形代詞的指代確認
Student thesis: Doctoral Thesis
Author(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 15 Feb 2008 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(8516a896-1a10-4585-808b-d4eb342f8a5e).html |
---|---|
Abstract
This research empirically explores the way that a zero pronoun refers to
its antecedent, and implements systems which automatically resolve
Chinese zero anaphora. It is proposed that for the purpose of discourse
analysis, each individual clause can be defined on the basis of one
predicate-argument structure. A zero pronoun occurs if an obligatory
argument of a predicate is absent from the surface structure of a clause to
which the predicate corresponds. As semantics and reasoning are avoided
as much as possible in this study, the information annotated in our corpus
is deliberately limited to shallow syntactic information and anaphoric links.
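The definition above (a zero pronoun is an obligatory argument missing from the surface clause) can be illustrated with a minimal sketch. The lexicon `OBLIGATORY_ROLES`, the `Clause` type, and the role labels are hypothetical and introduced only for illustration; they are not taken from the thesis or its annotation scheme.

```python
from dataclasses import dataclass, field

# Hypothetical valency lexicon: the obligatory argument roles of each predicate.
# These entries are illustrative assumptions, not data from the thesis corpus.
OBLIGATORY_ROLES = {
    "買": ("subject", "object"),  # "buy"
    "來": ("subject",),           # "come"
}


@dataclass
class Clause:
    """One clause, defined by a single predicate-argument structure."""
    predicate: str
    # Maps a role name to the surface string that realizes it in the clause.
    arguments: dict = field(default_factory=dict)


def find_zero_pronouns(clause):
    """Return the obligatory roles that have no surface realization."""
    required = OBLIGATORY_ROLES.get(clause.predicate, ())
    return [role for role in required if role not in clause.arguments]


# "張三來了，(Ø) 買了書": the second clause has no overt subject.
first = Clause("來", {"subject": "張三"})
second = Clause("買", {"object": "書"})
zero_roles = find_zero_pronouns(second)
```

Here `zero_roles` holds the roles that would be annotated as zero pronouns in the second clause, using only shallow argument information and no semantic reasoning.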
Zero anaphora is categorized into intra- and inter-clausal anaphora.
Drawing on insights from the study of empty categories and binding principles
[Chomsky 1981, 1982], we emphasize the grammatical and
pragmatic correlations that hold between a zero pronoun and its antecedent.
The correlations are expressed in terms of the paths between a zero
pronoun and its antecedent, and the categories of these paths.
We implement two systems for zero anaphora resolution. One system
operates within the framework of Centering Theory. By taking advantage
of the hierarchical structure of a discourse, we extend the centering
algorithm in such a way that Cp is the preferred interpretation of a zero
pronoun in both inter- and intra-clausal anaphora. The other system
makes use of a machine-learning algorithm, the decision tree, to resolve
zero anaphora. The results show that while the factors representing the
predictions based on Centering Theory capture the anaphora related to
salience, those characterizing the category of the relation between a
zero pronoun and its antecedent candidate capture the anaphora involving
other correlations. Several decision tree-based systems outperform the
system based on Centering Theory.
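The centering-based preference described above can be sketched in its simplest form: the zero pronoun is resolved to Cp, the highest-ranked forward-looking center of the preceding clause. The ranking in `ROLE_RANK` and the function names are assumptions made for illustration; the thesis's extension over the hierarchical structure of a discourse is not reproduced here.

```python
# Assumed salience ranking of grammatical roles (lower value = more salient).
# This ordering is an illustrative assumption, not the ranking used in the thesis.
ROLE_RANK = {"topic": 0, "subject": 1, "object": 2, "other": 3}


def preferred_center(forward_centers):
    """Return Cp, the highest-ranked entity among the forward-looking centers.

    forward_centers is a list of (entity, role) pairs for one clause.
    Returns None when the clause introduces no entities.
    """
    if not forward_centers:
        return None
    # Roles missing from ROLE_RANK are ranked after all known roles.
    best = min(forward_centers, key=lambda c: ROLE_RANK.get(c[1], len(ROLE_RANK)))
    return best[0]


def resolve_zero_pronoun(previous_clause_centers):
    """Pick the Cp of the previous clause as the antecedent of a zero pronoun."""
    return preferred_center(previous_clause_centers)


# Previous clause: "張三買了書" ("Zhang San bought a book").
antecedent = resolve_zero_pronoun([("書", "object"), ("張三", "subject")])
```

A salience-only rule of this kind cannot capture the non-salience correlations mentioned above, which is consistent with the reported advantage of the decision tree-based systems.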
It is concluded that the resolution of zero anaphora is not only a matter of
competition in salience, but also depends on the grammatical and
pragmatic correlations between a zero pronoun and its antecedent. The
hierarchical structure of a discourse is a significant factor in the
assessment of salience and correlations. When this factor is taken into
account, the systems present promising results in the resolution of zero
anaphora in Chinese discourse.
- Discourse analysis, Chinese language, Corpora (Linguistics), Anaphora