TY - JOUR
T1 - Coding latent concepts
T2 - a human and LLM-coordinated content analysis procedure
AU - Fan, Jia
AU - Ai, Yushi
AU - Liu, Xiaofan
AU - Deng, Yilin
AU - Li, Yongning
N1 - Publisher Copyright: © 2024 Eastern Communication Association.
PY - 2024/10/3
Y1 - 2024/10/3
N2 - Measuring complex and latent concepts at a large scale poses significant challenges for communication researchers. While computational and crowdsourced methods offer solutions, they often require high professional thresholds or incur significant costs. The recent advent of large language models has revolutionized content analysis. This paper employs a human and LLM-coordinated analysis procedure to measure complex and latent concepts in 1,000 public comments, exemplified by the multidimensional concept of “deliberativeness.” We showcase the collaboration between humans and LLMs in completing complex coding tasks by designing and refining a codebook for human use and corresponding prompts for LLMs. Surprisingly, we find that fine-tuned GPT-3.5-turbo-1106 with smaller datasets can surpass GPT-4o-2024-05-13’s performance and match manual content analyses. This paper provides communication researchers with an efficient and cost-effective reference for measuring latent concepts.
AB - Measuring complex and latent concepts at a large scale poses significant challenges for communication researchers. While computational and crowdsourced methods offer solutions, they often require high professional thresholds or incur significant costs. The recent advent of large language models has revolutionized content analysis. This paper employs a human and LLM-coordinated analysis procedure to measure complex and latent concepts in 1,000 public comments, exemplified by the multidimensional concept of “deliberativeness.” We showcase the collaboration between humans and LLMs in completing complex coding tasks by designing and refining a codebook for human use and corresponding prompts for LLMs. Surprisingly, we find that fine-tuned GPT-3.5-turbo-1106 with smaller datasets can surpass GPT-4o-2024-05-13’s performance and match manual content analyses. This paper provides communication researchers with an efficient and cost-effective reference for measuring latent concepts.
KW - complex and latent concepts
KW - Content analysis
KW - large language model
UR - http://www.scopus.com/inward/record.url?scp=85205692258&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85205692258&origin=recordpage
U2 - 10.1080/08824096.2024.2410263
DO - 10.1080/08824096.2024.2410263
M3 - RGC 21 - Publication in refereed journal
AN - SCOPUS:85205692258
SN - 0882-4096
VL - 41
SP - 324
EP - 334
JO - Communication Research Reports
JF - Communication Research Reports
IS - 5
ER -