HongGeun Ji1,2, Soyoung Oh1, Jina Kim3, Seong Choi1,2, Eunil Park1,2,*
CMC-Computers, Materials & Continua, Vol.70, No.1, pp. 669-678, 2022, DOI:10.32604/cmc.2022.019521
Abstract In the field of natural language processing (NLP), advances in neural machine translation have paved the way for cross-lingual research. Yet most NLP studies have evaluated their proposed language models on well-refined datasets. We investigate whether a machine translation approach is suitable for multilingual analysis of unrefined datasets, in particular chat messages on Twitch. To address this question, we collected a dataset comprising 7,066,854 and 3,365,569 chat messages from English and Korean streams, respectively. We employed several machine learning classifiers and neural networks with two different types of embedding: word-sequence embedding and the…