TY - CONF
T1 - Research on Development of LLMs and Manual Comparison of Applications
AU - Liu, Xiaoya
AU - Li, Jiayi
AU - Bai, Tianhao
AU - Gao, Jingtong
AU - Zhang, Pengle
AU - Zhao, Xiangyu
N1 - Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
PY - 2024
Y1 - 2024
N2 - This paper provides a systematic analysis of the real-world applications of large language models (LLMs) in human-computer interaction, emphasizing their performance and effectiveness. In recent years, the advanced capabilities of LLMs have revolutionized this field, leading to widespread adoption across both academic and practical domains. However, the lack of comprehensive assessments of the practical performance of these models has made it difficult for researchers and practitioners to distinguish their respective capabilities and performance differences. This study examines how LLMs are applied to reasoning tasks in natural language processing, offering a detailed perspective that enhances the understanding and application of these models for both researchers and practitioners. It assesses the performance of leading open-source LLMs on the Moss dataset, focusing on their effectiveness, reliability, and applicability in real-world scenarios. Through meticulous manual comparison and evaluation across eleven key performance metrics, this research reveals performance disparities among these models in practical tasks. These comparative analyses are intended to guide future investigations toward a nuanced understanding of LLM capabilities and limitations, addressing the evolving needs of academia and industry. Future work will expand this analysis to a broader spectrum of models and tasks, providing deeper insights and actionable recommendations for both the research and practical communities. © 2024 IEEE.
AB - This paper provides a systematic analysis of the real-world applications of large language models (LLMs) in human-computer interaction, emphasizing their performance and effectiveness. In recent years, the advanced capabilities of LLMs have revolutionized this field, leading to widespread adoption across both academic and practical domains. However, the lack of comprehensive assessments of the practical performance of these models has made it difficult for researchers and practitioners to distinguish their respective capabilities and performance differences. This study examines how LLMs are applied to reasoning tasks in natural language processing, offering a detailed perspective that enhances the understanding and application of these models for both researchers and practitioners. It assesses the performance of leading open-source LLMs on the Moss dataset, focusing on their effectiveness, reliability, and applicability in real-world scenarios. Through meticulous manual comparison and evaluation across eleven key performance metrics, this research reveals performance disparities among these models in practical tasks. These comparative analyses are intended to guide future investigations toward a nuanced understanding of LLM capabilities and limitations, addressing the evolving needs of academia and industry. Future work will expand this analysis to a broader spectrum of models and tasks, providing deeper insights and actionable recommendations for both the research and practical communities. © 2024 IEEE.
KW - Application
KW - LLMs
KW - Manual Metrics
UR - http://www.scopus.com/inward/record.url?scp=85219184875&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85219184875&origin=recordpage
U2 - 10.1109/BigDIA63733.2024.10808821
DO - 10.1109/BigDIA63733.2024.10808821
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 979-8-3503-5463-8
T3 - International Conference on Big Data and Information Analytics, BigDIA
SP - 23
EP - 30
BT - The 10th International Conference on Big Data and Information Analytics (BigDIA 2024) - Proceedings
PB - IEEE
T2 - 10th International Conference on Big Data and Information Analytics (BigDIA 2024)
Y2 - 25 October 2024 through 28 October 2024
ER -