Translation software has become an important tool for communication across languages. Expectations for translation quality continue to rise, reflecting the desire for barrier-free cultural exchange. Even with a large corpus, the performance of statistical machine translation based on words and phrases is limited by the small size of its modeling units. Previous statistical methods rely primarily on the size of the corpus and the number of its statistical results to avoid ambiguity in translation, ignoring context. To support the ongoing improvement of translation methods built upon deep learning, we propose a translation algorithm based on the Hidden Markov Model that improves the use of context during translation. During translation, our Hidden Markov Model prediction chain selects the phrases with the highest resulting probability to form a sentence. The collection of all generated sentences forms a topic sequence. Using probabilities and article sequences determined from the training set, our method applies the Hidden Markov Model a second time to form the final translation, improving contextual relevance. This algorithm improves the accuracy of translation, avoids invalid word combinations, and enhances the readability and meaning of the resulting translation.

Language translation has undergone a major change in recent years. Traditional statistical machine translators considered only the linear relationship between words and neglected sentence structure and context. Differences in word order between languages limited the overall translation performance of these methods. However, rapid developments in deep learning have pushed translation toward intelligent systems. As a specific type of machine learning, deep learning offers great performance and flexibility [

The earliest mathematical models for statistical machine translation are five word-to-word models originally proposed by IBM researchers, termed IBM Model 1 through IBM Model 5. Google's earlier online translation system was based on statistical machine translation: it created a translation corpus by searching a large number of bilingual web pages, selected the most common correspondences between words, and generated translation results according to the mathematical model. The Internet now provides an abundant corpus, forming a foundation for the development and improvement of statistics-based machine translation methods. A few years ago, Google began to use a recurrent neural network (RNN) for translation, directly learning mappings from an input sequence (such as a sentence in one language) to an output sequence (the same sentence in another language). Existing neural machine translation (NMT) approaches are constrained by their one-way decoders. A one-way decoder cannot predict target words according to the context still to be generated, but only according to historical information. However, the dependency between words is uncertain, and historical information may not be sufficient to predict the target words. The quality of translation is greatly influenced by the dependencies between words [

We present a translation method that combines the Hidden Markov Model (HMM) with context. The input text is first processed by an HMM with the phrase as the translation unit. The method then processes the resulting sequence of sentences again, with the sentence as the translation unit, to improve the accuracy of translation.

Machine translation has flourished since its emergence. With the help of a growing corpus, automatic translation has advanced from low-quality results that do not pay attention to grammatical analysis to higher-quality results from analyzing sentence structure and grammar. At present, improving the quality and efficiency of machine translation remains a difficult problem. It is worth exploring better methods of translation that incorporate context.

We offer a new perspective on translation algorithms. Early methods followed a one-to-one direct translation mode, which established the foundation for the noise-channel theory of statistical machine translation as well as the groundwork for later intelligent translation. At the time, this theory was a big step forward for machine translation, but it was only a beginning in terms of language order and structure [

Japanese translation experts proposed case-based machine translation using a source language instance sentence library. Translation takes place by comparing input sentences to examples in the database and outputting the translations for the most similar ones. The target sentence is then processed further to obtain the final translation. However, the amount of memory available for translation and the system’s coverage of a language determine the quality of results. Further, not all users use the same definition of similarity. Therefore, case-based translation requires more effort to be successful [

By 2014, the Seq2Seq model had quickly risen in popularity thanks to its ability to capture long-distance information. In this method, the encoder compresses an entire input sentence into a vector of fixed dimensions, and the decoder generates the output sentence from this vector. Integrating an attention mechanism improves the ability to learn features of long sentences and strengthens the representation of source-language sequences [

Chen et al. [

Additionally, in 2014, a graph-based method was proposed by Narouel et al. [

In 2017, Chinea-Rios et al. [

Using the model framework of Seq2Seq, Bapna et al. [

To support languages with large differences in syntax, Socher et al. [

In our method, we determine the topic of each sentence in a coherent document, with the document thus described as a sequence of sentence topics. However, topics are interrelated, and topic changes are continuous, similar to a relationship diagram. This topic sequence forms the document coherence chain. Our coherent capture framework for statistical machine translation uses a document coherence chain built using the Hidden Markov Model.

For review, the Hidden Markov Model is defined as a quintuple λ = (S, O, A, B, Π), where S is the set of hidden states, O is the set of possible observations, A is the state transition probability matrix, B is the observation probability matrix, and Π is the initial state distribution.

A is the state transition probability matrix, A = [a_ij], where a_ij = P(q_{t+1} = s_j | q_t = s_i) is the probability of moving from state s_i to state s_j.

B is the probability matrix of the observations, B = [b_j(k)], where b_j(k) = P(o_t = v_k | q_t = s_j) is the probability of emitting observation v_k from state s_j.
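As a minimal illustration, the quintuple can be written down directly in code. The state and observation sets below are hypothetical placeholders, not values taken from our corpus; the only requirement is that each row of A and B, and Π itself, forms a probability distribution.

```python
import numpy as np

# Hypothetical state set S (hidden phrase choices) and observation set O (source words).
states = ["s1", "s2"]
observations = ["v1", "v2", "v3"]

# A: state transition probability matrix, a_ij = P(q_{t+1} = s_j | q_t = s_i).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# B: observation probability matrix, b_j(k) = P(o_t = v_k | q_t = s_j).
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# Pi: initial state distribution.
Pi = np.array([0.6, 0.4])

# Every row of A and B, and Pi itself, must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(Pi.sum(), 1.0)
```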

For the state set S = {s_1, s_2, …, s_N} and observation set V = {v_1, v_2, …, v_M}, a hidden state sequence I = (i_1, i_2, …, i_T) generates an observation sequence O = (o_1, o_2, …, o_T).

The question that requires prediction is the decoding problem: given the model λ and an observation sequence O, find the state sequence I* = argmax_I P(I | O, λ) with the highest conditional probability.

This formula is equivalent to maximizing the joint probability, I* = argmax_I P(I, O | λ), since P(O | λ) does not depend on the state path.

Both the Gaussian mixture model (GMM) and the dynamic neural network (DNN) as shown in

In the Hidden Markov chain, there are many values of state

The HMM prediction of the translation chain requires finding a path in the graph whose corresponding probability is the maximum. In the case of

Our overall approach implements the following requirements and calculations.

In the case of

Assuming that there are

According to the preceding requirements, when calculating the shortest path ending at a given node at time t, for t from 2 to T, the shortest path to each node must pass through the shortest partial path to one of the nodes at time t − 1.

In order to record the intermediate results, two variables are introduced. The first, δ_t(i), is the maximum probability among all single paths that end in state i at time t:

δ_t(i) = max_{i_1, …, i_{t−1}} P(i_t = i, i_{t−1}, …, i_1, o_t, …, o_1 | λ),

where the recursion δ_{t+1}(i) = max_j [δ_t(j) a_ji] · b_i(o_{t+1}) carries the computation forward.

Among all the single paths of highest probability ending in state i at time t, the second variable, ψ_t(i), records the state at time t − 1 on that best path: ψ_t(i) = argmax_j [δ_{t−1}(j) a_ji].

The inputs are the model λ = (A, B, Π) and the observation sequence O = (o_1, o_2, …, o_T).

The output is the optimal path I* = (i_1*, i_2*, …, i_T*).

The programmed steps of our method are as follows.

Initialize the parameters: δ_1(i) = π_i b_i(o_1) and ψ_1(i) = 0 for each state i.

According to formulas (

We terminate the calculations at t = T, where P* = max_i δ_T(i) is the maximum probability and i_T* = argmax_i δ_T(i) is the final state of the optimal path.

Backtracking with i_t* = ψ_{t+1}(i_{t+1}*) for t = T − 1, …, 1 then results in the optimal path I* = (i_1*, i_2*, …, i_T*).
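The steps above are the standard Viterbi recursion. A compact sketch follows; the transition, observation, and initial probabilities here are toy values chosen for illustration, not probabilities estimated from a corpus.

```python
import numpy as np

def viterbi(A, B, Pi, obs):
    """Return the most probable state path and its probability.

    A: (N, N) transition matrix, B: (N, M) observation matrix,
    Pi: (N,) initial distribution, obs: list of observation indices.
    """
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))           # delta[t, i]: best path probability ending in state i at t
    psi = np.zeros((T, N), dtype=int)  # psi[t, i]: predecessor state on that best path

    delta[0] = Pi * B[:, obs[0]]       # initialization: delta_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        for i in range(N):
            scores = delta[t - 1] * A[:, i]          # delta_{t-1}(j) * a_ji for all j
            psi[t, i] = np.argmax(scores)
            delta[t, i] = scores[psi[t, i]] * B[i, obs[t]]

    # Termination: P* = max_i delta_T(i); backtrack through psi for the path.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

# Toy example (hypothetical parameters, not from the paper's training data).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])
path, prob = viterbi(A, B, Pi, [0, 1, 2])
# path is the optimal state sequence; prob is P* for that path.
```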

We input the new observation sequence into the HMM and obtain the new sequence, which is the translation result.

We divide the different translation results into different dice, with each die regarded as one translation result. By training all of the dice on the data set, we obtain the probability of the corresponding results of each die. Our method puts all of the probabilities into a matrix and compares the probabilities of each result using the Viterbi algorithm. We select the die with the highest probability to determine the final translated sentence. We then take all of the translated sentences as different dice outcomes and apply the Viterbi algorithm again to obtain the translation with the maximum probability according to the calculated probabilities.

We use a double-layer LSTM network to train on the data; the details of the training configuration are shown in

Layer (type) | Output shape | Param # | Connected to
---|---|---|---
input_1 (InputLayer) | (None, None) | 0 |
input_2 (InputLayer) | (None, None) | 0 |
embedding_1 (Embedding) | (None, None, 128) | 896000 | input_1[0][0]
embedding_2 (Embedding) | (None, None, 128) | 1280000 | input_2[0][0]
cu_dnnlstm_1 (CuDNNLSTM) | (None, None, 256) | 395264 | embedding_1[0][0]
cu_dnnlstm_3 (CuDNNLSTM) | (None, None, 256) | 395264 | embedding_2[0][0], cu_dnnlstm_1[0][1], cu_dnnlstm_1[0][2]
cu_dnnlstm_2 (CuDNNLSTM) | (None, 256) | 526336 | cu_dnnlstm_1[0][0]
cu_dnnlstm_4 (CuDNNLSTM) | (None, None, 256) | 526336 | cu_dnnlstm_3[0][0], cu_dnnlstm_2[0][1], cu_dnnlstm_2[0][2]
dense_1 (Dense) | (None, None, 10000) | 2570000 | cu_dnnlstm_4[0][0]
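The parameter counts in the table can be checked arithmetically. A small sketch, assuming source and target vocabulary sizes of 7,000 and 10,000 (inferred from the embedding and dense rows; the text does not state them explicitly) and the CuDNN LSTM convention of two bias vectors per gate:

```python
def embedding_params(vocab, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab * dim

def cudnnlstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and two biases (CuDNN layout).
    return 4 * (units * input_dim + units * units + 2 * units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

assert embedding_params(7000, 128) == 896000      # embedding_1
assert embedding_params(10000, 128) == 1280000    # embedding_2
assert cudnnlstm_params(128, 256) == 395264       # cu_dnnlstm_1 / cu_dnnlstm_3
assert cudnnlstm_params(256, 256) == 526336       # cu_dnnlstm_2 / cu_dnnlstm_4
assert dense_params(256, 10000) == 2570000        # dense_1
```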

As shown in

We align each word and then take the state transition probability matrix and observation state probability matrix as input. Finally, we list the shortest path of each sentence in the article and form the article sequence.

As shown in

For word alignment, the forward and backtracking algorithms adopt one-to-many or one-to-one mappings and retain context dependency throughout the alignment process. However, the Hidden Markov Model does not require one-to-many alignment. When we start translating from the second word, we deduce the optimal solution of the second word from the maximum probability of the first word. Further, when using the HMM to segment sentences, the optimal solution of each sentence is integrated into a new text to form an article sequence. This simplification is possible because the Hidden Markov Model does not need to store the context dependency. We can directly use the article sequence as a new variable and execute another Hidden Markov Model operation.
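A toy sketch of this two-stage idea: decode each sentence with one HMM, concatenate the per-sentence results into an article sequence, and decode that sequence again. All transition and emission values here are hypothetical placeholders, as is the choice to use each sentence's final decoded state as its article-level observation.

```python
import numpy as np

def viterbi(A, B, Pi, obs):
    # Standard Viterbi decode: most probable state path for the observations.
    delta = Pi * B[:, obs[0]]
    psi = []
    for t in range(1, len(obs)):
        scores = delta[:, None] * A            # scores[j, i]: come from j, go to i
        psi.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta.argmax())]
    for back in reversed(psi):
        path.append(int(back[path[-1]]))
    return path[::-1]

# Stage 1: decode each sentence independently (word-level HMM, toy parameters).
A1 = np.array([[0.8, 0.2], [0.3, 0.7]])
B1 = np.array([[0.6, 0.4], [0.2, 0.8]])
Pi1 = np.array([0.5, 0.5])
sentences = [[0, 1, 1], [1, 0]]
sentence_states = [viterbi(A1, B1, Pi1, s) for s in sentences]

# Stage 2: treat each sentence's final decoded state as one observation in the
# article sequence and decode again (sentence-level HMM, toy parameters).
article_obs = [states[-1] for states in sentence_states]
A2 = np.array([[0.9, 0.1], [0.5, 0.5]])
B2 = np.array([[0.7, 0.3], [0.4, 0.6]])
Pi2 = np.array([0.6, 0.4])
article_path = viterbi(A2, B2, Pi2, article_obs)
```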

Second, we use the HMM chain (see

In addition,

According to the experimental data, the accuracy of using the entire sentence when constructing context is much higher than using only the probabilities associated with each topic word, as shown in

Word | P | Word | P
---|---|---|---
United | 0.0209182 | Russia | 0.00637757
States | 0.0203053 | Security | 0.00617798
China | 0.00922345 | International | 0.00601291
Countries | 0.00842481 | |
Military | 0.00749308 | Action | 0.000886684
Defense | 0.00702691 | |
Bush | 0.00658136 | Movement | 0.000151846

In this study, we explored context-based processing for machine language translation. We used a Hidden Markov Model to decompose target sentences and identify possible translation paths. Through forward and backward tracking, our model calculates the probability of each translation result to form the article sequence. Each sentence is then taken as a translation unit, with consideration of the mutual influence between sentences. The Hidden Markov Model calculates the maximum probabilities to determine the best contextual results. However, the experiments also reveal deficiencies, such as the small size of the datasets stored in the database. We plan to increase the size of our datasets in future work to improve performance. With the continuing growth of the language corpus, we expect the meaning of machine-translated sentences to move closer to the original meaning.