Abstract
This paper presents our recent work on harvesting English-Chinese bitexts of the laws of Hong Kong from the Web and aligning them down to the subparagraph level by exploiting the numbering system of the legal text hierarchy. The basic methodology and practical techniques are reported in detail. The resulting bilingual corpus, comprising 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specialized domain of Hong Kong laws, and it is particularly valuable for empirical MT research. This work also lays a foundation for exploring and harvesting English-Chinese bitexts in larger volume from the Web.
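The idea of aligning by numbering can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that both language versions mark subsections, paragraphs, and subparagraphs with the same labels in the forms `(1)`, `(a)`, and `(i)` at the start of a line. The function names `level_of`, `segment`, and `align` are hypothetical, and the sample sentences are invented, not taken from any ordinance.

```python
import re

# Assumed marker shape: "(1)", "(a)" or "(ii)" at the start of a line.
MARKER = re.compile(r"^\s*\((\d+|[a-z]+)\)\s*(.*)$")
ROMAN = re.compile(r"^[ivx]+$")


def level_of(label):
    """Guess the hierarchy depth of a label (a simplification:
    letters such as 'i', 'v', 'x' are always read as roman numerals)."""
    if label.isdigit():
        return 0  # subsection, e.g. (1)
    if ROMAN.match(label):
        return 2  # subparagraph, e.g. (ii)
    return 1      # paragraph, e.g. (a)


def segment(lines):
    """Map a hierarchical key such as ('1', 'a', 'i') to its text."""
    path = [None, None, None]
    segments = {}
    key = None
    for line in lines:
        if not line.strip():
            continue
        m = MARKER.match(line)
        if m:
            label, text = m.group(1), m.group(2)
            lvl = level_of(label)
            path[lvl] = label
            # A new marker resets every deeper level.
            for deeper in range(lvl + 1, 3):
                path[deeper] = None
            key = tuple(p for p in path if p is not None)
            segments[key] = text.strip()
        elif key is not None:
            # Unmarked line: treat it as a continuation of the current segment.
            segments[key] = (segments[key] + " " + line.strip()).strip()
    return segments


def align(en_lines, zh_lines):
    """Pair segments whose numbering keys occur in both versions."""
    en = segment(en_lines)
    zh = segment(zh_lines)
    return [(k, en[k], zh[k]) for k in en if k in zh]


# Invented toy input, for illustration only.
en = [
    "(1) A person must hold a licence.",
    "(a) The licence is valid for one year.",
    "(i) Renewal is by application.",
    "(2) This section does not apply to visitors.",
]
zh = [
    "(1) 任何人必須持有牌照。",
    "(a) 牌照有效期為一年。",
    "(i) 續期須提出申請。",
    "(2) 本條不適用於訪客。",
]

pairs = align(en, zh)
for key, en_text, zh_text in pairs:
    print("".join(f"({k})" for k in key), "|", en_text, "|", zh_text)
```

Keying each segment by its full path, rather than by the bare label, keeps `(a)` under subsection `(1)` distinct from `(a)` under subsection `(2)`. Segments whose key appears in only one language are simply dropped in this sketch.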
| Original language | English |
|---|---|
| Title of host publication | 5th Workshop on Asian Language Resources, ALR 2005 and 1st Symposium on Asian Language Resources Network, ALRN 2005 - Proceedings |
| Publisher | Asian Federation of Natural Language Processing |
| Pages | 71-78 |
| Publication status | Published - 2005 |
| Event | 5th Workshop on Asian Language Resources, ALR 2005 and 1st Symposium on Asian Language Resources Network, ALRN 2005, Jeju Island, Republic of Korea. Duration: 14 Oct 2005 → … (https://aclanthology.org/volumes/I05-4/) |
Conference
| Conference | 5th Workshop on Asian Language Resources, ALR 2005 and 1st Symposium on Asian Language Resources Network, ALRN 2005 |
|---|---|
| Place | Korea, Republic of |
| City | Jeju Island |
| Period | 14/10/05 → … |
Paper title: 'Harvesting the bitexts of the laws of Hong Kong from the web'.