
Collaborative matching for sentence alignment

Xiaojun Quan*, Chunyu Kit, Wuya Chen

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Existing sentence alignment methods are founded fundamentally on sentence length and lexical correspondences. Methods based on the former generally follow the length proportionality assumption, namely that the lengths of sentences in one language tend to be proportional to those of their translations, and are known to adapt poorly to new languages and corpora. In this paper, we attempt to interpret this assumption from a new perspective via the notion of collaborative matching, based on the observation that sentences can work collaboratively during alignment rather than separately as in previous studies. Our approach is intended to be independent of any specific language and corpus, so that it can be applied adaptively to a variety of texts without relying on any prior knowledge about them. We use one-to-one sentence alignment to illustrate this approach and implement two specific alignment methods, which are evaluated on six bilingual corpora of different languages and domains. Experimental results confirm the effectiveness of this collaborative matching approach.
Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data
Subtitle of host publication: 17th China National Conference, CCL 2018, and 6th International Symposium, NLP-NABD 2018, Proceedings
Editors: Maosong Sun, Ting Liu, Xiaojie Wang, Zhiyuan Liu, Yang Liu
Publisher: Springer, Cham
Pages: 39-52
ISBN (Electronic): 978-3-030-01716-3
ISBN (Print): 978-3-030-01715-6
Publication status: Published - Oct 2018
Event: 17th China National Conference on Computational Linguistics, CCL 2018 and 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2018 - Changsha, China
Duration: 19 Oct 2018 to 21 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11221 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th China National Conference on Computational Linguistics, CCL 2018 and 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2018
Place: China
City: Changsha
Period: 19/10/18 to 21/10/18

Research Keywords

  • Machine translation
  • Sentence alignment
