People frequently interact with each other through conversation. In particular, we communicate through dialogue and exchange emotions and information through it. Emotion is an essential characteristic of natural language. Conversational artificial intelligence (AI) comprises the technologies that allow computers to communicate like humans. For a computer to interact like a human being, it must understand the emotions inherent in a conversation and generate appropriate responses. However, existing dialogue systems focus only on improving natural language understanding or natural language generation and ignore emotion. We propose a chatbot based on emotion, an essential element of conversation. EP-Bot (an Empathetic PolarisX-based chatbot) is an empathetic chatbot that better understands a person’s utterance by utilizing PolarisX, an auto-growing knowledge graph. PolarisX extracts new relationship information and expands the knowledge graph automatically, which helps computers understand human common sense. The proposed EP-Bot extracts knowledge graph embeddings using PolarisX and detects the emotion and dialog act of an utterance. It then generates the next utterance using these embeddings. EP-Bot can thus understand and generate conversations that reflect a person’s common sense, emotion, and intention. We verify the novelty and accuracy of EP-Bot through experiments.
Artificial intelligence technology is growing rapidly thanks to the availability of big data and the development of technologies that can process it. In particular, deep learning achieves high performance in areas such as image processing and computer vision and has opened up a wide range of possibilities. Recently, with the advent of language models, natural language processing has also become an area of active study [
Conversational artificial intelligence (AI) technology is emerging as research in related fields becomes active and achieves good results in each area. It is not a single research field or technology, but a combination of technologies that allow chatbots and voice assistants to communicate as if they were human. Large amounts of data, together with natural language processing (NLP) technology, are used to enable computers to interact with people [
Natural language processing research consists of various tasks, such as named entity recognition, summarization, and sentence classification. It is broadly divided into natural language understanding (NLU) and natural language generation (NLG), which together allow computers to process natural language as a human being does. NLP research is a crucial area in conversational AI research. However, it is considered a difficult task because computers must learn every aspect of language explicitly, unlike people, who learn to communicate naturally through social and cultural learning [
A person communicates with others through conversation. The utterances people exchange contain not only explicitly stated information but also information embedded within the sentences, such as the intention or emotion of the utterance. Emotion is a fundamental characteristic of language, especially since people convey emotions through natural language. However, many conversational AI studies focus only on improving the ability to understand natural language and generate answers [
After all, the purpose of conversational AI research is to enable computers to communicate like humans, and emotional characteristics are essential for this. Computers should be able to understand the emotions implicit in a conversation and generate appropriate answers based on them. Dialogue systems that are not trained on emotional information sometimes generate inappropriate responses. If a friend tells us “I failed the exam,” how do we usually answer? The sentence contains no explicit emotion word, but we can grasp the context and see that the friend feels bad or sad. A dialogue system that has not learned these emotions can give a wrong answer or, based on sentence similarity, give the same response it would give to “I passed the exam,” such as “Congratulations.”
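To make this failure mode concrete, here is a minimal Python sketch of a purely similarity-based responder. It assumes word-overlap (Jaccard) similarity and a tiny hypothetical response store; real retrieval systems use richer similarity measures, but the effect is similar: the two exam sentences share three of their five distinct words, so the sad utterance is matched to the happy prompt.

```python
def tokens(text: str) -> set:
    """Lower-cased word set, ignoring trailing punctuation."""
    return set(text.lower().strip(".!?").split())

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

# Hypothetical response store: prompt -> canned reply.
RESPONSES = {
    "I passed the exam": "Congratulations.",
    "What time is it": "It is noon.",
}

def retrieve(utterance: str) -> str:
    """Return the reply whose stored prompt is most similar to the utterance."""
    best_prompt = max(RESPONSES, key=lambda prompt: jaccard(utterance, prompt))
    return RESPONSES[best_prompt]

print(jaccard("I failed the exam", "I passed the exam"))  # 0.6
print(retrieve("I failed the exam"))                      # Congratulations.
```

Because nothing in this pipeline represents emotion, no choice of threshold can separate the two utterances; the emotional signal has to be modeled explicitly.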
We propose a dialogue system based on emotion, which is an essential factor in conversation. The system identifies emotions in an utterance and generates appropriate answers based on the identified emotions. In addition, knowledge graphs are used to understand a person’s utterance better. A knowledge graph is a graph that represents human knowledge; examples include WordNet [
In this paper, we make it possible to respond to newly coined words and newly emerging knowledge by utilizing PolarisX [
We propose EP-Bot (an Empathetic PolarisX-based chatbot), an empathetic chatbot that uses an auto-growing knowledge graph. EP-Bot utilizes PolarisX, a knowledge graph that expands automatically, to understand the emotions and intentions inherent in a conversation more as a human being does. It then generates appropriate answers to the utterance using the extracted emotions and intentions.
Various conversational artificial intelligence technologies are already used in our daily lives, including Google Assistant and Apple’s Siri. With the rapid development of language-related technology, agents based on conversational AI technology communicate with people to understand their intentions and provide them with necessary information. Among these technologies, the development of language models in particular has enabled a great deal of research across natural language processing, including natural language understanding and natural language generation.
A language model assigns a probability to a sentence. Language models are mostly divided into those based on statistical techniques and those based on neural networks. The recent emergence of neural network-based language models has driven the rapid growth of natural language processing technologies. Models such as ELMo [
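As a concrete illustration of assigning a probability to a sentence, the following sketch uses the simplest statistical model, a bigram model estimated by counting over a made-up three-sentence corpus. It applies the chain rule under a first-order Markov assumption; neural language models replace these count-based conditional probabilities with a network’s predictions.

```python
from collections import Counter

# Made-up training corpus; <s> and </s> mark the sentence boundaries.
CORPUS = ["i passed the exam", "i failed the exam", "i passed the test"]

history_counts = Counter()   # how often each word appears as the previous word
bigram_counts = Counter()    # how often each (previous, next) pair appears
for sentence in CORPUS:
    words = ["<s>"] + sentence.split() + ["</s>"]
    history_counts.update(words[:-1])
    bigram_counts.update(zip(words[:-1], words[1:]))

def bigram_prob(prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev), with no smoothing."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

def sentence_prob(sentence: str) -> float:
    """Chain rule with a bigram approximation: P(w1..wn) ~ prod_i P(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        prob *= bigram_prob(prev, word)
    return prob

for s in ["i passed the exam", "i failed the exam", "exam the passed i"]:
    print(s, sentence_prob(s))
```

Here “i passed the exam” receives probability 4/9 and “i failed the exam” 2/9, while the scrambled “exam the passed i” receives 0 because its first bigram never occurs; without smoothing, a single unseen bigram zeroes out the whole sentence.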
With the emergence of pre-trained deep learning-based language models in 2018, language models achieved state-of-the-art performance on major benchmarks. ELMo [
Both ELMo and GPT achieved the best performance on benchmarks at the time of their release, but because they are structured as forward or backward models, they are prone to incorrect predictions. BERT, a bidirectional model that looks at the words both before and after the target word simultaneously, aims to solve this problem. It greatly improved performance by proposing a bidirectional pre-trained language model that uses only the encoder of the Transformer. With the unveiling of deep learning-based language models such as ELMo, GPT, and BERT, the natural language processing field is developing rapidly [
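The difference in available context can be shown with a small sketch. The function below is a hypothetical helper that returns what a forward, backward, or bidirectional (masked) model can condition on when predicting one target word. With the target “bank”, the forward view misses the disambiguating words “deposit money”, the backward view misses the left context, and only the bidirectional view sees both.

```python
def visible_context(tokens, target, direction):
    """Tokens a model may condition on when predicting tokens[target]."""
    if direction == "forward":        # left-to-right language model (GPT-style)
        return tokens[:target]
    if direction == "backward":       # right-to-left language model
        return tokens[target + 1:]
    if direction == "bidirectional":  # masked language model (BERT-style)
        return tokens[:target] + tokens[target + 1:]
    raise ValueError(f"unknown direction: {direction}")

sentence = "i went to the bank to deposit money".split()
target = sentence.index("bank")
for direction in ("forward", "backward", "bidirectional"):
    print(direction, visible_context(sentence, target, direction))
```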
Conversational AI is an area encompassing many areas of natural language processing. By combining NLU, which understands a person’s language, with NLG technologies that create new natural language, we can implement interactive artificial intelligence that interacts with people. With the rapid development of pre-trained deep learning-based language models, research on interactive artificial intelligence technology is also being actively carried out [
A chatbot is the model that interacts with people most directly by applying interactive artificial intelligence technology. Recently, chatbots have also been trained with knowledge bases or knowledge graphs to improve their performance. These are used to understand a person’s intentions and to learn necessary information [
However, many existing chatbots are built for specific purposes, such as delivering information or performing tasks, so they often fail to respond appropriately to the emotional information in human speech. In conversation, emotion is a significant factor. The same sentence often implies different meanings depending on the emotion, so if a dialogue system cannot identify emotional information, it cannot give a proper answer [
A knowledge graph is a graph that represents the relational meaning of words by linking words to one another. Because knowledge graphs are built on relationships between words, they can help computers infer the meaning of words in different contexts as humans perceive them [
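A knowledge graph of this kind can be sketched as a set of (head, relation, tail) triples. The class below is a minimal illustration, not PolarisX’s data model; the relation names follow the ConceptNet convention (IsA, Causes) and the triples are made up. Following two hops from “fail exam” reaches “emotion” via “sadness”, which is the kind of implicit link that lets a system recognize that “I failed the exam” is emotionally negative even though it contains no emotion word.

```python
from collections import defaultdict
from typing import Optional

class KnowledgeGraph:
    """Minimal directed graph with labeled edges, stored as (head, relation, tail) triples."""

    def __init__(self) -> None:
        self._out = defaultdict(set)   # head -> {(relation, tail), ...}

    def add(self, head: str, relation: str, tail: str) -> None:
        self._out[head].add((relation, tail))

    def neighbors(self, head: str, relation: Optional[str] = None) -> set:
        """Tails one hop from head, optionally restricted to one relation."""
        return {tail for rel, tail in self._out.get(head, set())
                if relation is None or rel == relation}

    def two_hop(self, head: str) -> set:
        """(relation1, middle, relation2, tail) paths of length two starting at head."""
        return {(r1, mid, r2, tail)
                for r1, mid in self._out.get(head, set())
                for r2, tail in self._out.get(mid, set())}

kg = KnowledgeGraph()
for triple in [("fail exam", "Causes", "sadness"),
               ("pass exam", "Causes", "happiness"),
               ("sadness", "IsA", "emotion"),
               ("happiness", "IsA", "emotion")]:
    kg.add(*triple)

print(kg.neighbors("fail exam"))   # {'sadness'}
print(kg.two_hop("fail exam"))     # {('Causes', 'sadness', 'IsA', 'emotion')}
```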
However, many existing knowledge graphs cannot continue to expand because they are based only on data in specific languages, such as English, or on existing wiki data. PolarisX [
We propose an emotion-based chatbot that uses a pre-trained language model and PolarisX, an automatically expanding knowledge graph. To overcome the limitations of models that use only existing language models or only emotional information, we use PolarisX to build a chatbot model that can communicate based on human emotions and dialog acts.
Since people acquire various kinds of information through dialogue, conversational AI must also be able to understand that information and use it to generate the next utterance on its own. People use conversational AI either as a personal secretary that helps with their work or as a friend to talk with naturally. However, many existing studies treat these as separate problems, building either secretary-style models or everyday-conversation models.
For people, conversations are not just for acquiring information; they also serve purposes such as emotional exchange and interaction. Because conversational AI has permeated everyday life mainly to perform functional tasks, chatbots may fail to understand conversations that have no goal to carry out, or may give incorrect answers. As shown in
We propose a chatbot model that serves both as a secretary, obtaining the necessary information through dialogue, and as a partner for everyday conversation. Using PolarisX, a knowledge graph that expands automatically, it delivers accurate information to users. We propose EP-Bot, an emotion-based dialogue generation model that utilizes the information inherent in a conversation, such as its emotion and intent.
EP-Bot is an emotion-based chatbot that enables both information acquisition and daily conversation. The various elements inherent in a conversation must be identified and used to provide information and to generate appropriate answers according to the user’s emotions. To this end, we use PolarisX, an automatically expanding knowledge graph, and build a model that detects emotions and intentions and finally generates the next utterance.
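The data flow described here (analyze the utterance, then condition the response on that analysis) can be outlined as follows. Every component is a hypothetical stand-in: keyword lexicons replace the learned emotion and act detectors, a one-entry dictionary replaces PolarisX, and fixed templates replace the GPT-based generator. The point is only that “I failed the exam” and “I passed the exam” now lead to different responses because the emotion is detected first.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the learned components.
EMOTION_CUES = {"failed": "sadness", "lost": "sadness", "passed": "happiness", "won": "happiness"}
KNOWLEDGE = {"exam": [("exam", "IsA", "test")]}   # stand-in for knowledge graph lookups

@dataclass
class Analysis:
    emotion: str
    act: str
    facts: list = field(default_factory=list)

def analyze(utterance: str) -> Analysis:
    """Step 1: extract emotion, dialog act, and related knowledge from the utterance."""
    words = utterance.lower().strip(" ?.!").split()
    emotion = next((EMOTION_CUES[w] for w in words if w in EMOTION_CUES), "no emotion")
    act = "question" if utterance.rstrip().endswith("?") else "inform"
    facts = [fact for w in words for fact in KNOWLEDGE.get(w, [])]
    return Analysis(emotion, act, facts)

def respond(analysis: Analysis) -> str:
    """Step 2: choose a reply conditioned on the analysis (templates stand in for GPT)."""
    if analysis.emotion == "sadness":
        return "I'm sorry to hear that. Do you want to talk about it?"
    if analysis.emotion == "happiness":
        return "Congratulations!"
    if analysis.act == "question":
        return "Let me look that up for you."
    return "I see."

for text in ["I failed the exam", "I passed the exam", "When is the exam?"]:
    a = analyze(text)
    print(text, "->", a.emotion, a.act, a.facts, "->", respond(a))
```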
To use the information inherent in a conversation to give an appropriate answer, the model must first be able to extract that information. A conversation embeds a great deal of information, but we selectively identify and use only some of it. We extract relationships between the entities in a sentence to provide the information a person wants to obtain from an utterance. We also extract dialogue act and emotion information to respond appropriately to a person’s intentions and to help identify their emotions.
A person’s common sense can be expressed as relationships between different entities. Linking two entities through the relationship between them allows knowledge to be represented as a graph. Many existing knowledge bases create knowledge graphs by linking entities through relationships. However, they are limited because most of their data is in a particular language or because they cannot incorporate new knowledge. PolarisX is an automatically expanding knowledge graph that extracts relationships from news data and links them to the knowledge graph, so that it can cope with new words as well as various languages. Automatically expanding knowledge graphs make it possible to deliver more meaningful knowledge to people.
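The auto-growing idea can be illustrated with a toy loop that extracts triples from incoming sentences and adds only those the graph does not already contain. PolarisX extracts relations with a learned model; here a single hypothetical regular expression for “X is a Y” sentences stands in for that extractor, and the news sentences are invented.

```python
import re

# Hypothetical stand-in for a learned relation extractor:
# "<head> is a/an <tail>." -> (head, "IsA", tail)
IS_A = re.compile(r"^(?P<head>[\w\s-]+?) is an? (?P<tail>[\w\s-]+?)\.?$", re.IGNORECASE)

def extract_triples(sentence: str) -> list:
    match = IS_A.match(sentence.strip())
    if match is None:
        return []
    return [(match["head"].lower(), "IsA", match["tail"].lower())]

def grow(graph: set, sentences: list) -> list:
    """Add newly extracted triples to graph (a set of triples); return the ones added."""
    added = []
    for sentence in sentences:
        for triple in extract_triples(sentence):
            if triple not in graph:
                graph.add(triple)
                added.append(triple)
    return added

graph = {("exam", "IsA", "test")}
news = [
    "Metaverse is a virtual world.",   # new knowledge, added
    "Exam is a test.",                 # already known, skipped
    "Stocks fell sharply today.",      # no pattern match, ignored
]
print(grow(graph, news))   # [('metaverse', 'IsA', 'virtual world')]
print(len(graph))          # 2
```

Running the same loop on each new batch of articles keeps the graph growing without manual curation; the quality of what gets added depends entirely on the extractor, which is why a learned model is used in practice.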
In EP-Bot, the model that uses all of the previously extracted PolarisX, emotion, and act embeddings to generate an appropriate next utterance is very important. We use the OpenAI GPT model [
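This section does not spell out how the three embeddings enter GPT. One common approach, used by TransferTransfo-style dialogue models, is to add extra embeddings to each token’s input embedding before the first Transformer layer. The sketch below assumes that approach; the lookup tables are random placeholders, and `kg_vector` is a hypothetical stand-in for a PolarisX-derived embedding.

```python
import random

DIM = 8
random.seed(0)

def table(keys):
    """Hypothetical lookup table: one random DIM-dimensional vector per key."""
    return {k: [random.gauss(0.0, 0.02) for _ in range(DIM)] for k in keys}

token_emb = table(["i", "failed", "the", "exam"])
position_emb = table(range(16))
emotion_emb = table(["no emotion", "sadness", "happiness"])
act_emb = table(["inform", "question", "directive", "commissive"])

def add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vectors)]

def input_embeddings(tokens, emotion, act, kg_vector):
    """Per-token input = token + position + emotion + act + knowledge graph vector."""
    return [add(token_emb[t], position_emb[i], emotion_emb[emotion], act_emb[act], kg_vector)
            for i, t in enumerate(tokens)]

kg_vector = [0.0] * DIM   # placeholder for a PolarisX-derived embedding
x = input_embeddings(["i", "failed", "the", "exam"], "sadness", "inform", kg_vector)
print(len(x), len(x[0]))  # 4 8
```

Summation keeps the input width equal to the model’s hidden size, so a pre-trained GPT can be fine-tuned without changing its architecture; concatenation followed by a projection is an alternative that would require a new input layer.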
The OpenAI GPT model is a powerful pre-trained language model proposed by OpenAI that can perform various NLP tasks. It was proposed to address the limitations of prior state-of-the-art NLP models trained with supervised learning. It uses a multi-layer Transformer decoder [
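The property of a Transformer decoder that makes left-to-right generation possible is the causal mask: each position attends only to itself and earlier positions. The sketch below computes a single head of scaled dot-product attention with that mask in plain Python; a real GPT layer adds learned query, key, and value projections, multiple heads, residual connections, and feed-forward sublayers, all omitted here.

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal (look-ahead) mask.

    q, k, v are lists of n vectors, one per position. Position i attends only to
    positions 0..i, so no output depends on later tokens.
    """
    d = len(q[0])
    outputs = []
    for i in range(len(q)):
        # Scores for visible positions only; skipping future positions is
        # equivalent to giving them zero weight after the softmax.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        outputs.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return outputs

zeros = [[0.0, 0.0]] * 3
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_self_attention(zeros, zeros, values))
```

With all-zero queries and keys, every visible position gets equal weight, so output 0 equals the first value vector and output 1 is the mean of the first two; changing the third value vector leaves both unchanged.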
We propose EP-Bot, an emotion-based chatbot. EP-Bot utilizes PolarisX, an automatically expanding knowledge graph, to better understand the emotions and intentions inherent in a conversation. We use the extracted embeddings to generate the next utterance. We experiment with the proposed model to verify its performance.
To verify the proposed model, we use a server running Ubuntu 18.04 with an AMD Ryzen Threadripper 1950X 16-core processor and 125 GB of RAM. The model is trained on an NVIDIA GeForce RTX 2080 SUPER GPU.
Implementing and experimenting with emotion-based chatbots requires datasets that contain emotional data, not just conversational data. We utilize DailyDialog [
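For reference, a minimal parser for DailyDialog-style annotations is sketched below. It assumes the layout of the public release, in which each line of the text file is one dialogue whose utterances are separated by `__eou__`, and the corresponding lines of the act and emotion files hold one space-separated integer label per utterance. The label names follow the dataset’s documentation but should be checked against the copy in use; the example line is invented.

```python
# Label names as documented for DailyDialog (assumed here; verify against your copy).
EMOTIONS = {0: "no emotion", 1: "anger", 2: "disgust", 3: "fear",
            4: "happiness", 5: "sadness", 6: "surprise"}
ACTS = {1: "inform", 2: "question", 3: "directive", 4: "commissive"}

def parse_dialogue(text_line: str, act_line: str, emotion_line: str) -> list:
    """Align one dialogue's utterances with their dialog act and emotion labels."""
    utterances = [u.strip() for u in text_line.split("__eou__") if u.strip()]
    acts = [ACTS[int(a)] for a in act_line.split()]
    emotions = [EMOTIONS[int(e)] for e in emotion_line.split()]
    if not (len(utterances) == len(acts) == len(emotions)):
        raise ValueError("utterance/label count mismatch")
    return list(zip(utterances, acts, emotions))

# Invented example in the assumed format.
example = parse_dialogue(
    "I failed the exam . __eou__ Oh no , I am sorry to hear that . __eou__",
    "1 1",
    "5 0",
)
for utterance, act, emotion in example:
    print(f"[{act} / {emotion}] {utterance}")
```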
EP-Bot analyzes the emotion and intention of a given utterance and then generates the next utterance. For this purpose, when an utterance is given, it first extracts the utterance’s emotion and intention, so it must classify the speaker’s emotions and intentions accurately. We conduct an accuracy experiment on the emotion-act detection model used in EP-Bot. We train the model on the training set of the DailyDialog dataset and evaluate it on the validation set. To validate the knowledge graph embedding, we also compare a model without the knowledge graph against a model based on knowledge graph embeddings.
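One common way to build a joint emotion-act detector is a shared encoder with two classification heads trained on the sum of their cross-entropy losses. The sketch below assumes that design (the text does not confirm it) and assumes the knowledge graph embedding is simply concatenated to the encoder’s sentence vector. The vectors and weights are toy placeholders, with 7 emotion classes and 4 act classes as in DailyDialog.

```python
import math

def linear(weights, bias, x):
    """y = Wx + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def joint_loss(features, heads, emotion_label, act_label):
    """Multi-task loss: emotion-head cross-entropy plus act-head cross-entropy."""
    emotion_logp = log_softmax(linear(*heads["emotion"], features))
    act_logp = log_softmax(linear(*heads["act"], features))
    return -(emotion_logp[emotion_label] + act_logp[act_label])

def predict(features, heads):
    """Return (emotion index, act index) with the highest scores."""
    emo = linear(*heads["emotion"], features)
    act = linear(*heads["act"], features)
    return emo.index(max(emo)), act.index(max(act))

sentence_vector = [0.2, -0.1, 0.4]       # placeholder for an encoder output such as [CLS]
kg_vector = [0.3, 0.0]                   # placeholder for a knowledge graph embedding
features = sentence_vector + kg_vector   # list concatenation: 5-dimensional features

# Untrained heads with all-zero parameters: every class is equally likely.
heads = {
    "emotion": (zeros(7, len(features)), [0.0] * 7),
    "act": (zeros(4, len(features)), [0.0] * 4),
}
print(joint_loss(features, heads, emotion_label=5, act_label=1))  # log(7) + log(4), about 3.33
```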
Model | Emotion-F1 | Act-F1 | Average-F1 |
---|---|---|---|
Baseline | |||
Electra-small | 83.12 | 82.95 | 83.03 |
BERT-base | 83.44 | 82.48 | 82.96 |
ALBERT-base | 83.42 | 82.70 | 83.06 |
With KG Embedding | |||
Electra-small | 83.51 | 84.07 | 83.79 |
BERT-base | 84.09 | 84.04 | 84.06 |
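The Average-F1 column appears to be the arithmetic mean of Emotion-F1 and Act-F1, to within 0.005; this is an inference from the numbers rather than something the text states. The short check below recomputes the mean for each row of the table.

```python
# (Emotion-F1, Act-F1, reported Average-F1), copied from the table.
ROWS = {
    "Baseline, Electra-small": (83.12, 82.95, 83.03),
    "Baseline, BERT-base": (83.44, 82.48, 82.96),
    "Baseline, ALBERT-base": (83.42, 82.70, 83.06),
    "With KG Embedding, Electra-small": (83.51, 84.07, 83.79),
    "With KG Embedding, BERT-base": (84.09, 84.04, 84.06),
}

def mean_f1(emotion_f1: float, act_f1: float) -> float:
    return (emotion_f1 + act_f1) / 2

for name, (emotion_f1, act_f1, reported) in ROWS.items():
    print(f"{name}: mean = {mean_f1(emotion_f1, act_f1):.3f}, reported = {reported:.2f}")
```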
The EP-Bot we propose utilizes knowledge graphs, emotions, and dialog acts. Since EP-Bot is a conversational chatbot, the performance of the model that generates the next utterance in response to a given utterance is essential. However, selecting evaluation criteria is challenging because there is no unambiguous way to evaluate dialog generation models. We use hit ratio, perplexity, F1-score, and BLEU score, which are commonly used to evaluate existing dialog generation models.
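The sketch below shows how three of these metrics are commonly computed, under stated assumptions: perplexity as the exponential of the mean per-token negative log-likelihood, Hit@1 as the fraction of examples in which the gold response scores highest among a set of candidate responses, and F1 as a simplified unigram-overlap F1 between generated and reference responses. The exact variants behind the reported numbers are not specified here. BLEU is omitted, since it is normally computed with an established implementation such as NLTK’s or sacreBLEU.

```python
import math
from collections import Counter

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood (natural log) per reference token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def hit_at_1(candidate_scores, gold_indices):
    """Fraction of examples whose gold candidate receives the highest score."""
    hits = 0
    for scores, gold in zip(candidate_scores, gold_indices):
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == gold:
            hits += 1
    return hits / len(gold_indices)

def token_f1(prediction, reference):
    """Unigram-overlap F1 between a generated response and a reference response."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(perplexity([math.log(0.5)] * 4))                       # about 2.0
print(hit_at_1([[0.1, 0.9], [0.8, 0.2]], [1, 1]))           # 0.5
print(token_f1("i am so sorry", "i am sorry to hear that"))  # 0.6
```

A perplexity of 10 corresponds to a geometric-mean probability of 0.1 per reference token, which gives a rough sense of scale for the perplexity values reported for these models. Note also that token F1 rewards lexical overlap with the single reference, so an appropriate but differently worded empathetic reply can score low.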
We conduct comparative experiments with existing studies to evaluate the performance of the proposed model. The comparison includes a model using Seq2Seq [
Model | Hit@1 | PPL | F1 | BLEU |
---|---|---|---|---|
Models without emotion | ||||
Seq2Seq + Attention | 9.41 | 129.3 | 10.22 | 5.58 |
Transformer ranker | 17.20 | – | 26.37 | 15.79 |
OpenAI GPT without emotion | 75.01 | 10.19 | 18.2 | 3.755 |
Models with emotion | ||||
EmpTransfo | 77.25 | 10.63 | 19.39 | 3.99 |
EmpTransfo + action+topic | 78.47 | 9.04 | 17.27 | 2.45 |
EP-bot without KG, act | 71.49 | 9.90 | 14.43 | 3.70 |
The next-utterance generation model in EP-Bot shows some improvement over existing studies by utilizing knowledge graphs, emotions, and dialog acts. However, its F1 and BLEU scores are somewhat lower than those of previous studies. Because the existing Seq2Seq and Transformer models focus only on producing the correct answer regardless of the user’s feelings, they can score well on word-overlap metrics such as F1 and BLEU. EP-Bot focuses more on people’s feelings, so its scores on these sentence-accuracy metrics may be somewhat lower. Nevertheless, it is meaningful that EP-Bot achieves good perplexity, which is used as a key evaluation metric.
Conversational AI is a convergence technology that narrows the gap between computer and human interaction so that computers can communicate like humans. With advances in technologies such as big data, machine learning, and natural language processing, conversational AI is also developing rapidly. However, existing chatbots are mainly designed to carry out specific tasks, so in general conversation they often fail to understand what is implicit in a person’s speech or give wrong answers. For a chatbot to talk more like a person, it must identify and use the emotions and intentions contained in the dialogue.
We propose a chatbot based on the emotions inherent in a person’s utterance. In particular, we use PolarisX, an automatically expanding knowledge graph, to improve the ability to identify emotions and intentions. Knowledge graphs can help computers understand the common sense contained in conversations. We proposed EP-Bot, a model that extracts the emotions and intentions inherent in utterances based on knowledge graphs and generates the next utterance based on them, and verified it through experiments.
To communicate like humans, computers need to understand people better by learning their emotions and intentions. EP-Bot, which we propose, is expected to be used in various ways as a model that uses knowledge graphs to enable emotional and intentional conversations.
Conversations contain a variety of information beyond the relationship, emotion, and dialog act information we use. In future work, we would like to extract more of the information contained in conversations and find the features that affect the performance of the dialogue model, in order to improve the performance of the proposed EP-Bot. We also plan to build an environment in which people can communicate with EP-Bot in a real dialogue setting.