Abstract
When biographical data extracted from more than 2,000 local gazetteers (difangzhi) is integrated into the China Biographical Database (CBDB), records of the same historical person have to be identified and linked. This is the “disambiguation” step in the datafication of biographical information for prosopographical databases. The data is intended to populate CBDB, a relational database with biographical information about approximately 471k individuals (as of November 2020), which is designed to support statistical, social network, spatial, and other kinds of analysis. Traditional Chinese naming customs pose major challenges to this disambiguation, however, because identical names are common, especially in a local gazetteer dataset containing 120k records and 90k unique names of government officials from imperial China. In addition, useful variables are missing from numerous entries in those gazetteers. In my conference presentation, I lay out solutions for disambiguating identical personal names in Chinese script. First, individuals who repeatedly held official posts in the same locality are identified digitally and then disambiguated. Second, the overlap of content between different gazetteers is cross-tabulated, and the overlapping entries in those titles are processed on that basis. Finally, the remaining data is corroborated with external datasets, e.g. the China Government Employee Database – Qing (CGED-Q) developed by the Lee-Campbell research group. With these workflows, 51k personal names from premodern China are disambiguated with optimal precision and unprecedented efficiency. Such a task is only feasible when done digitally, and it serves as an example of what digital humanities can achieve for research on Chinese history. The techniques explored in this study will also be useful for disambiguation and Named Entity Recognition of other large-scale data in non-Latin scripts.
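The first workflow (linking records of people who repeatedly held posts in the same locality) can be illustrated with a minimal Python sketch. It is not the author's implementation: the `PostingRecord` fields, the `max_gap_years` threshold, and the sample rows are all hypothetical, and the rule used here (same name, same locality, years close together) is only one plausible way to decide that two records refer to the same person.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PostingRecord:
    """One gazetteer entry; all field names are hypothetical."""
    record_id: int
    name: str       # personal name in Chinese script
    locality: str   # place where the post was held
    year: int       # year the post was taken up


def group_same_locality(records, max_gap_years=30):
    """Cluster records that share name and locality and lie close in time.

    Assumption: two same-name postings in one locality within
    max_gap_years of each other belong to the same person.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[(rec.name, rec.locality)].append(rec)

    clusters = []
    for recs in buckets.values():
        recs.sort(key=lambda r: r.year)
        current = [recs[0]]
        for rec in recs[1:]:
            if rec.year - current[-1].year <= max_gap_years:
                current.append(rec)
            else:
                # Gap too large: treat as a different person with the same name.
                clusters.append(current)
                current = [rec]
        clusters.append(current)
    return clusters


# Invented sample rows, for illustration only.
records = [
    PostingRecord(1, "王安", "蘇州", 1520),
    PostingRecord(2, "王安", "蘇州", 1526),
    PostingRecord(3, "王安", "蘇州", 1701),
    PostingRecord(4, "王安", "杭州", 1522),
]
clusters = group_same_locality(records)
cluster_ids = sorted(sorted(r.record_id for r in c) for c in clusters)
print(cluster_ids)
```

In this toy input, records 1 and 2 are linked, while record 3 (same name and place, but far later) and record 4 (same name, different place) stay separate. Records left unlinked by such a rule would be candidates for the later steps, such as cross-gazetteer comparison or corroboration against an external dataset.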
| Original language | English |
|---|---|
| Pages | 65 |
| Publication status | Published - May 2021 |
| Event | International Conference on Digital Representation and Research in Art, Humanities and Culture (DH 2020), Hang Seng University of Hong Kong or Online, Hong Kong, China. Duration: 6 May 2021 → 7 May 2021. https://dh2020.hsu.edu.hk/ |
Conference
| Conference | International Conference on Digital Representation and Research in Art, Humanities and Culture (DH 2020) |
|---|---|
| Abbreviated title | DH2020 |
| Place | Hong Kong, China |
| Period | 6/05/21 → 7/05/21 |
| Internet address | https://dh2020.hsu.edu.hk/ |
Title: Disambiguating Names of Chinese Historical Figures in Local Gazetteers Digitally