
Harvesting the bitexts of the laws of Hong Kong from the web

  • Chunyu Kit
  • Xiaoyue Liu
  • KingKui Sin
  • Jonathan J. Webster

    Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-reviewed

    Abstract

    In this paper we present our recent work on harvesting English-Chinese bitexts of the laws of Hong Kong from the Web and aligning them to the subparagraph level by utilizing the numbering system of the legal text hierarchy. The basic methodology and practical techniques are reported in detail. The resulting bilingual corpus, comprising 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specialized domain of HK laws. It is particularly valuable for empirical MT research. This work has also laid a foundation for exploring and harvesting English-Chinese bitexts in larger volume from the Web.
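The core alignment idea in the abstract is to pair English and Chinese segments through the shared numbering of the legal text hierarchy. It can be illustrated with a small sketch. This is not the authors' implementation. The following are all hypothetical simplifications:

* the label forms `3.`, `(1)` and `(a)`;
* the assumption that both language versions carry identical labels;
* the helper names `segment` and `align`.

Roman-numeral subparagraphs are not handled.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical label forms (a simplification of the real numbering scheme):
#   "3."  -> section     (level 0)
#   "(1)" -> subsection  (level 1)
#   "(a)" -> paragraph   (level 2)
LABEL_PATTERNS = [
    (0, re.compile(r"^(\d+[A-Z]*)\.(?:\s+|$)")),
    (1, re.compile(r"^\((\d+[A-Z]*)\)\s*")),
    (2, re.compile(r"^\(([a-z]+)\)\s*")),
]

Key = Tuple[str, ...]


def segment(lines: List[str]) -> Dict[Key, str]:
    """Map each hierarchical number path, e.g. ("3", "1", "a"), to its text."""
    path: List[str] = []
    segments: Dict[Key, str] = {}
    for raw in lines:
        line = raw.strip()
        for level, pattern in LABEL_PATTERNS:
            m = pattern.match(line)
            if m:
                # Keep the ancestors above this level, then append the new label.
                path = path[:level] + [m.group(1)]
                line = line[m.end():]
                break
        if not path:
            continue  # text before the first label is ignored in this sketch
        key = tuple(path)
        # Unlabelled lines are treated as continuations of the current segment.
        segments[key] = (segments.get(key, "") + " " + line).strip()
    return segments


def align(en_lines: List[str], zh_lines: List[str]) -> List[Tuple[Key, str, str]]:
    """Pair English and Chinese segments that share the same number path."""
    en, zh = segment(en_lines), segment(zh_lines)
    return [(key, en[key], zh[key]) for key in en if key in zh]


if __name__ == "__main__":
    en = ["3. Interpretation", "(1) In this Ordinance", "(a) the first item;"]
    zh = ["3. 釋義", "(1) 在本條例中", "(a) 第一項；"]
    for key, e, z in align(en, zh):
        print(key, e, "|", z)
```

Each segment is keyed by its full number path (for example `("3", "1", "a")`) rather than by its position. As a result, a subparagraph missing from one version is left unpaired instead of shifting every later pair out of alignment.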
    Original language: English
    Title of host publication: 5th Workshop on Asian Language Resources, ALR 2005 and 1st Symposium on Asian Language Resources Network, ALRN 2005 - Proceedings
    Publisher: Asian Federation of Natural Language Processing
    Pages: 71-78
    Publication status: Published - 2005
    Event: 5th Workshop on Asian Language Resources, ALR 2005 and 1st Symposium on Asian Language Resources Network, ALRN 2005 - Jeju Island, Korea, Republic of
    Duration: 14 Oct 2005 → …
    https://aclanthology.org/volumes/I05-4/

    Publication series

    Name: 5th Workshop on Asian Language Resources, ALR 2005 and 1st Symposium on Asian Language Resources Network, ALRN 2005 - Proceedings

    Conference

    Conference: 5th Workshop on Asian Language Resources, ALR 2005 and 1st Symposium on Asian Language Resources Network, ALRN 2005
    Place: Korea, Republic of
    City: Jeju Island
    Period: 14/10/05 → …

