Word embeddings and semantic shifts in historical Spanish: Methodological considerations

Hai Hu, Patrícia Amaral*, Sandra Kübler

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

6 Citations (Scopus)

Abstract

Word embeddings have recently been applied to detect and explore changes in word meaning in large historical corpora. While word embeddings are useful in many Natural Language Processing tasks, a number of questions concerning the accuracy and applicability of these methods for historical data need to be addressed. The literature on the stability and replicability of these embeddings is scarce, especially for small corpora, which are common in historical work. It also remains unclear whether methods used to evaluate embeddings on contemporary data can also be used for historical data sets. Our overarching goal is to use word embeddings to investigate semantic shifts in the history of Spanish. In the work presented here, we focus on the methodological questions that arise. First, we examine the stability and applicability of three commonly used word embedding models on a small corpus of medieval and classical Spanish. Comparing our results with a study on the word algo as a test case, we show that a rank-averaging method can produce more stable results from the embeddings. We corroborate previous theoretical work while demonstrating the applicability of our method when training word embeddings on small corpora for the analysis of semantic change. Second, we investigate how best to evaluate different embedding models. We show that an existing analogy test cannot be used without modification. Our new analogy test, consisting of roughly 10,000 questions for medieval and classical Spanish, will be released with the article. © The Author(s) 2021
Original language: English
Pages (from-to): 441-461
Number of pages: 21
Journal: Digital Scholarship in the Humanities
Volume: 37
Issue number: 2
Online published: 25 Aug 2021
Publication status: Published - Jun 2022
Externally published: Yes

Funding

This project was partially supported by the Indiana University Institute for Digital Arts and Humanities (IDAH) and the Indiana University New Frontiers in the Arts and Humanities Program through two fellowships awarded to Patrícia Amaral.
