
Open Access

ARTICLE

New Generation Model of Word Vector Representation Based on CBOW or Skip-Gram

Zeyu Xiong1,*, Qiangqiang Shen1, Yueshan Xiong1, Yijie Wang1, Weizi Li2
1 HPCL, School of Computer Science, National University of Defense Technology, Changsha, 410073, China.
2 Department of Computer Science, University of North Carolina at Chapel Hill, 27599, USA.
* Corresponding Author: Zeyu Xiong. Email: .

Computers, Materials & Continua 2019, 60(1), 259-273. https://doi.org/10.32604/cmc.2019.05155

Abstract

Word vector representation is widely used in natural language processing (NLP) tasks. Most word vectors are generated by probabilistic models whose bag-of-words features have two major weaknesses: they lose the order of the words, and they ignore the semantics of the words. Recently, the neural-network language models CBOW and Skip-Gram have been developed as continuous-space language models that represent words as high-dimensional real-valued vectors. These vector representations have demonstrated promising results in various NLP tasks because they capture syntactic and contextual regularities in language well. In this paper, we propose a new strategy that performs optimization on contiguous subsets of documents and combines the resulting vectors by a regression method; on this basis, two new models for word vector learning, CBOW-OR and SkipGram-OR, are established. Experimental results show three things. First, for some word pairs, the cosine distance obtained by CBOW-OR (or SkipGram-OR) is generally larger and more reasonable than that obtained by CBOW (or Skip-Gram). Second, the vector spaces of Skip-Gram and SkipGram-OR keep the same structural properties under Euclidean distance. Third, SkipGram-OR achieves higher overall performance in retrieving related word pairs. Both CBOW-OR and SkipGram-OR are inherently parallel models and can be expected to apply to large-scale information processing.
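For readers unfamiliar with the two base models, the sketch below shows how CBOW and Skip-Gram turn a token sequence into training examples with a symmetric context window. It also shows the two distance measures the abstract compares models by. This is a generic illustration, not the authors' CBOW-OR or SkipGram-OR code. The function names and the window parameter are assumptions chosen for illustration. Conventions for "cosine distance" vary (some texts use the cosine of the angle itself, others 1 − cos), so the sketch returns the cosine similarity cos(u, v).

```python
import math
from typing import List, Sequence, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-Gram: each centre word predicts every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding context words jointly predict the centre word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    print(skipgram_pairs(tokens, 1)[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
    print(cbow_examples(tokens, 1)[2])    # (['cat', 'on'], 'sat')
```

Skip-Gram produces one (centre, context) pair per neighbour, so it yields more training examples per sentence than CBOW, which produces one example per position.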

Keywords

Distributed word vector, continuous-space language model, hierarchical softmax.
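The keyword "hierarchical softmax" refers to the standard word2vec technique for avoiding a full softmax over the vocabulary. P(word | context vector h) is computed as a product of binary sigmoid decisions along the word's path in a binary tree. The minimal sketch below assumes a hypothetical four-word vocabulary on a hand-built balanced tree (word2vec itself builds a Huffman tree from word frequencies). It also assumes the branch convention of the original word2vec C code, where bit 0 maps to σ and bit 1 maps to 1 − σ. It is not taken from this paper. The node vectors and h are made-up values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hs_probability(h: Sequence[float],
                   path: List[int],
                   codes: List[int],
                   node_vecs: List[Sequence[float]]) -> float:
    """P(word | h) as a product of binary left/right decisions along the
    word's root-to-leaf path in the tree."""
    p = 1.0
    for node, bit in zip(path, codes):
        s = sigmoid(sum(a * b for a, b in zip(node_vecs[node], h)))
        # Assumed word2vec convention: bit 0 -> sigma, bit 1 -> 1 - sigma.
        p *= s if bit == 0 else 1.0 - s
    return p


# Hypothetical 4-word vocabulary on a hand-built balanced tree.
# Inner node 0 is the root; bit 0 leads to inner node 1, bit 1 to inner node 2.
TREE: Dict[str, Tuple[List[int], List[int]]] = {
    "w0": ([0, 1], [0, 0]),
    "w1": ([0, 1], [0, 1]),
    "w2": ([0, 2], [1, 0]),
    "w3": ([0, 2], [1, 1]),
}

if __name__ == "__main__":
    h = [0.5, -0.2]                                      # made-up context vector
    node_vecs = [[0.1, 0.3], [-0.4, 0.2], [0.7, -0.1]]   # one vector per inner node
    probs = {w: hs_probability(h, p, c, node_vecs) for w, (p, c) in TREE.items()}
    print(probs)
    print(sum(probs.values()))  # equals 1 up to floating-point rounding
```

Each probability costs one sigmoid per tree level, so the cost is logarithmic in the vocabulary size rather than linear. The probabilities still sum to one because the two branches at every inner node split the mass as σ and 1 − σ.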

Cite This Article

Z. Xiong, Q. Shen, Y. Xiong, Y. Wang and W. Li, "New generation model of word vector representation based on CBOW or Skip-Gram," Computers, Materials & Continua, vol. 60, no. 1, pp. 259–273, 2019.


This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.