基于大语言模型的自杀意念文本数据增强与识别技术

Translated title of the contribution: Suicidal ideation data augmentation and recognition technology based on large language models
  • 章彦博 (Co-first Author)
  • 黄峰 (Co-first Author)
  • 莫柳铃
  • 刘晓倩
  • 朱廷劭 *
  • *Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Suicide constitutes a significant global public health challenge, with the World Health Organization reporting substantial annual mortality rates. Traditional suicide detection methods primarily depend on self-assessment scales and clinical evaluations, which require considerable resources and rely on patients actively seeking assistance. The integrated motivational-volitional (IMV) model offers a theoretical framework for comprehending suicidal behavior progression, with suicidal ideation serving as a critical risk indicator. While text-based analysis presents a promising non-invasive approach for early identification, it encounters technical challenges due to limited annotated data and linguistic complexity. Large Language Models (LLMs) offer unprecedented capabilities in language understanding and generation, potentially addressing these challenges through their ability to comprehend diverse expressions of suicidal ideation and generate high-quality training data.
This research employed a two-stage design leveraging LLMs to address the challenge of limited training data for suicidal ideation recognition. In Study I, we selected ChatGLM3-6B and Qwen-7B-Chat as foundation LLMs and implemented both zero-shot and few-shot learning approaches combined with supervised learning strategies. We extracted examples from an original dataset of Weibo comments to create high-quality training data for the LLMs. Comparative experiments evaluated model performance, with human coders assessing the quality of LLM-generated texts using established suicide risk evaluation criteria. In Study II, we evaluated the impact of LLM-based data augmentation on recognition models by comparing traditional machine learning approaches with LLM-based methods trained on both original and augmented datasets, measuring performance through accuracy and true negative rate metrics.
In Study I, the two self-developed LLM-based models demonstrated excellent performance in suicidal ideation data augmentation, significantly outperforming baseline models according to comprehensive evaluation metrics. The success of these LLM-enhanced models highlighted the effectiveness of high-quality data construction through advanced language modeling capabilities. In Study II, all experimental models trained on LLM-augmented data significantly outperformed their corresponding baseline models in both accuracy and true negative rate. The highest-performing model utilized the ChatGLM3-6B architecture with few-shot learning, showing marked improvements compared to its baseline counterpart. These findings demonstrate the substantial impact of LLM-based data augmentation on model generalization ability, particularly in capturing diverse and subtle expressions of suicidal ideation that traditional approaches often miss.
This study validates the effectiveness of LLM-based data augmentation methods in enhancing suicidal ideation recognition while addressing data scarcity challenges. The non-invasive approach developed through LLM technology has the potential to provide timely and effective early warning of suicide risk while protecting user privacy. This research contributes to both theoretical understanding of LLMs' capabilities in complex psychological text processing and practical applications in mental health monitoring. Future research should explore cross-platform applicability of LLMs, model interpretability, and ethical considerations to further advance this promising technology in suicide prevention and broader mental health applications.
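Study II reports two metrics, accuracy and true negative rate. As an illustration only, the sketch below shows how they are conventionally computed from binary predictions. The function name `accuracy_and_tnr` and the label convention (1 = suicidal ideation present, 0 = absent) are assumptions made for this example; the abstract does not describe the authors' evaluation code, and the labels used here are made up rather than taken from the study.

```python
from typing import Sequence, Tuple


def accuracy_and_tnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, true negative rate) for binary labels.

    Assumed convention (not stated in the abstract):
    1 = text expresses suicidal ideation, 0 = it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Accuracy: share of all texts classified correctly.
    accuracy = (tp + tn) / len(pairs)

    # True negative rate (specificity): share of non-ideation texts
    # that the classifier correctly leaves unflagged.
    negatives = tn + fp
    tnr = tn / negatives if negatives else float("nan")
    return accuracy, tnr


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
acc, tnr = accuracy_and_tnr(y_true, y_pred)
print(f"accuracy={acc:.2f}, TNR={tnr:.2f}")
```

Reporting the true negative rate alongside accuracy shows how often ordinary, non-ideation texts are wrongly flagged, which accuracy alone can hide when the two classes are imbalanced.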
Original language: Chinese (Simplified)
Pages (from-to): 987-1000
Number of pages: 14
Journal: 心理学报 (Acta Psychologica Sinica)
Volume: 57
Issue number: 6
Online published: 15 Apr 2025
Publication status: Published - 25 Jun 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Research Keywords

  • 自杀意念
  • 数据增强
  • 自杀文本识别
  • 大语言模型
  • 人工智能
  • suicidal ideation
  • data augmentation
  • suicide text recognition
  • large language models
  • artificial intelligence
