
Open Access


Multi-Label Chinese Comments Categorization: Comparison of Multi-Label Learning Algorithms

Jiahui He1, Chaozhi Wang1, Hongyu Wu1, Leiming Yan1,*, Christian Lu2

1 School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, 210044, China.
2 School of Information Technology, Deakin University, Victoria, Australia.

*Corresponding Author: Leiming Yan. Email: email.

Journal of New Media 2019, 1(2), 51-61.


Multi-label text categorization is the problem of assigning multiple labels to a text via a multi-label learning algorithm. Classifying text in Asian languages such as Chinese differs from classifying languages such as English, which use spaces to separate words. Before classification, a word segmentation step must convert the continuous character stream into a list of separate words, which is then mapped to a vector of fixed dimension. Multi-label learning algorithms generally fall into two categories: problem transformation methods and algorithm adaptation methods. This work uses customers' comments about hotels as the training data set, with labels covering all aspects of hotel evaluation, and analyzes and compares the performance of several multi-label learning algorithms on Chinese text classification. The experiments cover three problem transformation methods, based on the Support Vector Machine, Random Forest, and k-Nearest-Neighbor classifiers, and one algorithm adaptation method based on a Convolutional Neural Network. The experimental results show that the Support Vector Machine achieves the best performance.
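The pipeline the abstract describes — segment the Chinese text into words, vectorize it, then train one binary classifier per label (the binary-relevance problem transformation) — can be sketched as follows. This is a minimal illustration, not the paper's actual code: the four toy comments and aspect labels are invented, the tokens are pre-segmented and joined with spaces (as a segmenter such as jieba would produce), and scikit-learn's `OneVsRestClassifier` with `LinearSVC` stands in for the SVM-based problem transformation method.

```python
# Binary relevance sketch: one linear SVM per label over TF-IDF features.
# Toy data stands in for segmented hotel comments; labels are
# hypothetical aspect tags, not the paper's dataset.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

comments = [
    "房间 干净 服务 热情",       # clean room, friendly service
    "位置 方便 早餐 丰富",       # convenient location, rich breakfast
    "房间 小 隔音 差",           # small room, poor soundproofing
    "服务 态度 好 位置 不错",    # good service, decent location
]
labels = [["room", "service"], ["location", "food"],
          ["room"], ["service", "location"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)            # multi-hot label matrix

# token_pattern keeps single-character Chinese words that the
# default pattern (2+ word characters) would silently drop
vec = TfidfVectorizer(token_pattern=r"(?u)\S+")
X = vec.fit_transform(comments)

# Problem transformation: the multi-label task becomes one
# independent binary SVM per label
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, Y)

pred = clf.predict(vec.transform(["房间 干净 服务 好"]))
print(mlb.inverse_transform(pred))
```

A Random Forest or k-NN variant is obtained by swapping `LinearSVC()` for `RandomForestClassifier()` or `KNeighborsClassifier()` inside the same wrapper, which is what makes problem transformation methods easy to compare head-to-head.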


Cite This Article

J. He, C. Wang, H. Wu, L. Yan and C. Lu, "Multi-label Chinese comments categorization: comparison of multi-label learning algorithms," Journal of New Media, vol. 1, no. 2, pp. 51–61, 2019.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.