Open Access

ARTICLE

GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models

Weisi Chen1,*, Walayat Hussain2,*, Junjie Chen1

1 School of Software Engineering, Xiamen University of Technology, Xiamen, 361024, China
2 Peter Faber Business School, Australian Catholic University, North Sydney, 2060, Australia

* Corresponding Authors: Weisi Chen; Walayat Hussain

Computers, Materials & Continua 2025, 85(1), 1559-1583. https://doi.org/10.32604/cmc.2025.065916

Abstract

Topic modeling is a fundamental technique of content analysis in natural language processing, widely applied in domains such as social sciences and finance. In the era of digital communication, social scientists increasingly rely on large-scale social media data to explore public discourse, collective behavior, and emerging social concerns. However, traditional models like Latent Dirichlet Allocation (LDA) and neural topic models like BERTopic struggle to capture deep semantic structures in short-text datasets, especially in complex non-English languages like Chinese. This paper presents Generative Language Model Topic (GLMTopic), a novel hybrid topic modeling framework leveraging the capabilities of large language models, designed to support social science research by uncovering coherent and interpretable themes from Chinese social media platforms. GLMTopic integrates Adaptive Community-enhanced Graph Embedding for advanced semantic representation, Uniform Manifold Approximation and Projection-based (UMAP-based) dimensionality reduction, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering, and large language model-powered (LLM-powered) representation tuning to generate more contextually relevant and interpretable topics. By reducing dependence on extensive text preprocessing and on human expert intervention in post-analysis topic label annotation, GLMTopic enables a fully automated and user-friendly topic extraction process. Experimental evaluations on a social media dataset sourced from Weibo demonstrate that GLMTopic outperforms LDA and BERTopic in coherence score and in usability through automated interpretation, providing a more scalable and semantically accurate solution for Chinese topic modeling. Future research will explore optimizing computational efficiency, integrating knowledge graphs and sentiment analysis into more sophisticated workflows, and extending the framework to real-time and multilingual topic modeling.
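The pipeline described in the abstract (document embeddings → UMAP dimensionality reduction → HDBSCAN clustering → LLM-based topic labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: synthetic random vectors stand in for real sentence embeddings, PCA stands in for UMAP, and scikit-learn's DBSCAN stands in for HDBSCAN, so only the overall reduce-then-cluster structure is shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two synthetic "topic" clusters of 384-dimensional document embeddings
# (real embeddings would come from a sentence encoder over Weibo posts)
emb = np.vstack([
    rng.normal(0.0, 0.05, size=(50, 384)),
    rng.normal(1.0, 0.05, size=(50, 384)),
])

# Step 1: dimensionality reduction (the paper uses UMAP; PCA here for brevity)
reduced = PCA(n_components=5, random_state=0).fit_transform(emb)

# Step 2: density-based clustering (the paper uses HDBSCAN; DBSCAN here)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(reduced)

# Each non-noise cluster is a candidate topic; in GLMTopic an LLM would
# then generate a human-readable label for each cluster's top documents.
n_topics = len(set(labels) - {-1})
print(n_topics)
```

In the full framework, the final labeling step replaces manual expert annotation: representative documents from each cluster are passed to a large language model, which returns a coherent topic label.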

Keywords

Topic modeling; large language model; deep learning; natural language processing; text mining

Cite This Article

APA Style
Chen, W., Hussain, W., & Chen, J. (2025). GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models. Computers, Materials & Continua, 85(1), 1559–1583. https://doi.org/10.32604/cmc.2025.065916
Vancouver Style
Chen W, Hussain W, Chen J. GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models. Comput Mater Contin. 2025;85(1):1559–1583. https://doi.org/10.32604/cmc.2025.065916
IEEE Style
W. Chen, W. Hussain, and J. Chen, “GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models,” Comput. Mater. Contin., vol. 85, no. 1, pp. 1559–1583, 2025. https://doi.org/10.32604/cmc.2025.065916



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.