Open Access

ARTICLE

GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models

Weisi Chen1,*, Walayat Hussain2,*, Junjie Chen1

1 School of Software Engineering, Xiamen University of Technology, Xiamen, 361024, China
2 Peter Faber Business School, Australian Catholic University, North Sydney, 2060, Australia

* Corresponding Authors: Weisi Chen; Walayat Hussain

Computers, Materials & Continua 2025, 85(1), 1559-1583. https://doi.org/10.32604/cmc.2025.065916

Abstract

Topic modeling is a fundamental technique of content analysis in natural language processing, widely applied in domains such as social sciences and finance. In the era of digital communication, social scientists increasingly rely on large-scale social media data to explore public discourse, collective behavior, and emerging social concerns. However, traditional models like Latent Dirichlet Allocation (LDA) and neural topic models like BERTopic struggle to capture deep semantic structures in short-text datasets, especially in complex non-English languages like Chinese. This paper presents Generative Language Model Topic (GLMTopic), a novel hybrid topic modeling framework leveraging the capabilities of large language models, designed to support social science research by uncovering coherent and interpretable themes from Chinese social media platforms. GLMTopic integrates Adaptive Community-enhanced Graph Embedding for advanced semantic representation, Uniform Manifold Approximation and Projection-based (UMAP-based) dimensionality reduction, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering, and large language model-powered (LLM-powered) representation tuning to generate more contextually relevant and interpretable topics. By reducing dependence on extensive text preprocessing and on human expert intervention in post-analysis topic label annotation, GLMTopic enables a fully automated and user-friendly topic extraction process. Experimental evaluations on a social media dataset sourced from Weibo demonstrate that GLMTopic outperforms LDA and BERTopic in coherence score and in usability through automated interpretation, providing a more scalable and semantically accurate solution for Chinese topic modeling. Future research will explore optimizing computational efficiency, integrating knowledge graphs and sentiment analysis into more sophisticated workflows, and extending the framework to real-time and multilingual topic modeling.
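The pipeline described in the abstract (document embeddings → UMAP dimensionality reduction → HDBSCAN clustering → LLM-based topic labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: synthetic random vectors stand in for real sentence embeddings, PCA stands in for UMAP, and scikit-learn's DBSCAN stands in for HDBSCAN, so only the overall reduce-then-cluster structure is shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two synthetic "topic" clusters of 384-dimensional document embeddings
# (real embeddings would come from a sentence encoder over Weibo posts)
emb = np.vstack([
    rng.normal(0.0, 0.05, size=(50, 384)),
    rng.normal(1.0, 0.05, size=(50, 384)),
])

# Step 1: dimensionality reduction (the paper uses UMAP; PCA here for brevity)
reduced = PCA(n_components=5, random_state=0).fit_transform(emb)

# Step 2: density-based clustering (the paper uses HDBSCAN; DBSCAN here)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(reduced)

# Each non-noise cluster is a candidate topic; in GLMTopic an LLM would
# then generate a human-readable label for each cluster's top documents.
n_topics = len(set(labels) - {-1})
print(n_topics)
```

In the full framework, the final labeling step replaces manual expert annotation: representative documents from each cluster are passed to a large language model, which returns a coherent topic label.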

Keywords

Topic modeling; large language model; deep learning; natural language processing; text mining

Cite This Article

APA Style
Chen, W., Hussain, W., & Chen, J. (2025). GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models. Computers, Materials & Continua, 85(1), 1559–1583. https://doi.org/10.32604/cmc.2025.065916
Vancouver Style
Chen W, Hussain W, Chen J. GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models. Comput Mater Contin. 2025;85(1):1559–1583. https://doi.org/10.32604/cmc.2025.065916
IEEE Style
W. Chen, W. Hussain, and J. Chen, “GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models,” Comput. Mater. Contin., vol. 85, no. 1, pp. 1559–1583, 2025. https://doi.org/10.32604/cmc.2025.065916



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.