With the advent of the big data era, security issues in artificial intelligence (AI) and data analysis are attracting research attention. In the metaverse, which is expected to become a virtual asset, users’ communication, character movement, text elements, and so on must integrate the real and the virtual. However, these elements can be exposed to threats, particularly from hackers. For example, users’ assets can be exposed through the notices and mail alerts that operators regularly send to users. Hacker threats are expected to increase, mainly because texts are naturally anonymous. It is therefore necessary to apply the natural language processing technology of AI, especially term frequency-inverse document frequency, word2vec, gated recurrent unit, recurrent neural network, and long short-term memory. Several application versions of these techniques are in use, and research on tasks and performance for applying the algorithms is underway. We propose a grouping algorithm that focuses on securing various bridgehead strategies to secure topics for security and safety within the metaverse. The algorithm comprises three modules: extracting topics from attacks, managing dimensions, and performing grouping. With it, we create 24 topic-based models. When verified against normal and spam mail attacks, the algorithm increased the accuracy of the previous application versions by ∼0.4%–1.5%.
The governance basis of the metaverse is security. Anomaly detection is mainly employed to protect user safety in programs and applications. Accordingly, companies mainly use natural language processing (NLP) technology to analyze the subject of texts exchanged between users and the movement intentions of their characters. For example, spam carrying the malicious intent of hackers is one type of attack. Considerable research has been conducted on extracting textual information from datasets to analyze real spam. Machine learning (ML) methods, such as support vector machine (SVM) and logistic regression (LR), have been widely used as representative analysis methods to detect malicious spam. Additionally, deep learning (DL) methods, such as recurrent neural networks (RNN) and deep neural networks, are widely used at present. However, as big data are generated within the metaverse, where numerous users will come together in the future, security issues will become more prominent. Because the actions that users exchange through various contents are recorded and affect others, a stable security strategy is required to analyze a subject. Various computational techniques, including term frequency-inverse document frequency (TF-IDF), word2vec, and RNN, have been developed, with accuracies of ∼95% or more. In particular, RNN now reaches an accuracy of ∼98%. However, further research is needed to increase accuracy so that security can cope with various situations. In this study, we propose a grouping bridgehead (GB) algorithm for better performance and a more stable security strategy than existing algorithms. The GB algorithm has three modules: extracting topics from attacks, managing dimensions, and performing grouping. It is a learning method in which these modules are stacked on top of one another. In terms of F1-score, recall, and precision, we confirmed that the proposed algorithm performed well compared with existing ML and DL counterpart algorithms. The novelties of this study are as follows.
(1) Improvement compared with existing performance through the GB algorithm strategy for security.
(2) Comparative analysis through various topic-based models via NLP technology.
(3) A framework for security within the metaverse.
The remainder of this article is organized as follows. Section 2 examines cases from previous studies on how artificial intelligence (AI) and NLP technology are used in various fields; related studies on various data are also investigated. Section 3 explains the purpose of this study as well as the principles and development motives of the models proposed herein for security; further, it explains the methodology in order. Section 4 reports the collection process of data used in this study and the performance our methodology achieves. Section 5 summarizes this study, lists application areas of this study, and introduces future development directions.
In this section, we present relevant studies within the metaverse.
A previous study presented the characteristics of the metaverse, the technologies it requires, and the problems of existing devices [
Various machine learning, deep learning, and computational linguistic techniques have been used to classify texts, including observation records and articles in healthcare data. One study collected comments on Reddit related to COVID-19 subjects (approximately 90) for approximately three months and performed sentiment analysis (positive, negative, and neutral) using machine and deep learning techniques (long short-term memory (LSTM), support vector machine (SVM), naive Bayes, logistic regression (LR), and k-nearest neighbor (KNN)) [
A multilingual learning environment was built within the metaverse to collect information, and attempts were made to analyze records through problem-based learning [
In this study, we focused on the topics of deep learning (DL), machine learning (ML), and documentation for our research. One study created a system for monitoring widespread penetration to counter network attacks on social media [
We address security issues in the metaverse in this study. First, an anonymous attack through malicious email is assumed. The device (cell phone, computer, etc.) that receives it becomes infected with a virus. Types of damage from such attacks include falsifying invoices, falsifying resumes of job seekers, and passing on information to others. Additionally, files containing vital information can be spread to an unspecified number of people. The attack is mainly carried out through URL links. Email is unstructured text, and security against email attacks has been studied using NLP technology. One study examined information management technology to secure competitiveness for security by applying NLP to information processing outsourcing [
The purpose of a second type of attack is to plant a virus, which causes severe damage such as monitoring and manipulating the screens of users’ devices (interphones, laptops, etc.) to shoot and distribute illegal images. Such attacks primarily target media connected to a home network. Security has been studied using NLP technology in preparation for attacks on the Internet Protocol (IP) addresses and domains of home networks. One study examined sequence search and cost requirements for datasets [
Conversations may also occur with fraudulent intent. The hacker’s purpose is to implement language barriers to generate bills, steal users’ login information, perform recording scams, and then threaten the users. NLP technology is used to classify unstructured texts on social network services (SNS). One study proposed speech- and word-based metrics to solve the attack problem of private backdoors [
[Figure/table items: unique ID/password, etc. (pilot); avatar and behavior (customized); cyber environment (platform); controller (if needed); storage service (cloud, etc.); software/hardware environment (spec); user communication (produced by contents in metaverse and pilot); agent tech; devices (PC, notebook, etc.); computing; NLP; etc.]
Multiple learning methods are used to obtain high-quality learning data and algorithm performance. The method first analyzes topics according to the fraud spam attack, classifies topics with high similarity, and converts them with a weight calculation technique for multiclassification; optimization is then performed. Next, topics are analyzed according to the malicious mail attack, and similar topics are classified and weighted for classification. The dimension is then managed and the optimal value is converted. For message extraction, the topic is repeatedly divided into subtopics, multiclassified by calculating weights, and subjected to dimensional decomposition. In this way, the optimal value is converted up to N times, and the obtained values are continuously combined. Iteration continues until the final procedure is reached, after which group learning is performed and the final output is extracted. The VR and NLP technology interaction systems excelled in maintenance technology, which will be discussed in detail in Section 4. The algorithm in this study has four main proposed modules. The first module classifies the subject using the latent Dirichlet allocation (LDA) algorithm. The formula, as used in past studies and in this study, is as follows [
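As an illustration only, the topic-extraction step can be sketched with latent Dirichlet allocation as implemented in scikit-learn. The toy messages, the number of topics, and all variable names below are assumptions made for this sketch; they are not taken from the study's dataset or configuration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy messages (hypothetical; not from the study's dataset).
docs = [
    "win a free prize now click the link",
    "free prize claim your reward click now",
    "meeting agenda for the project review",
    "please review the project report before the meeting",
]

# Bag-of-words counts: rows are documents, columns are vocabulary terms.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Fit LDA with an assumed number of topics (2, chosen only for illustration).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape: (n_docs, n_topics)

# Each row is a per-document topic distribution that sums to 1.
for text, dist in zip(docs, doc_topics):
    print(dist.round(2), text)
```

The per-document topic distributions in `doc_topics` are the kind of topic vector that later steps could weight and group; how the study actually weights them is not specified here.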
Term frequency-topic inverse document frequency with a singular value decomposition (TTIS) is a model that combines the dimensional decomposition technique with regularization to mitigate the sparsity problem of the document-term matrix by analyzing the document topic as Basis_i. The advantage of TTIS is that it creates a synergistic effect over the conventional language methods on social media. Second, term frequency-topic inverse document frequency (TFTIDF) performs standardization and topic-based document analysis on specific documents, inverse numbers, etc. using a probabilistic technique; it was found to be more effective than the conventional model. Third, term frequency-inverse document frequency in singular value decomposition (TIS) calculates the importance from the existing term frequency and document frequency and then performs singular value decomposition.
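A minimal sketch of the plain TF-IDF followed by singular value decomposition pipeline (closest to the TIS description) is given below, using scikit-learn. It does not implement the topic-weighted TTIS or TFTIDF variants, whose exact weighting is not given here; the toy documents and the choice of two components are assumptions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy messages (hypothetical; not from the study's dataset).
docs = [
    "win a free prize now click the link",
    "free prize claim your reward click now",
    "meeting agenda for the project review",
    "please review the project report before the meeting",
]

# TF-IDF weights: a sparse document-term matrix of shape (n_docs, n_terms).
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

# Truncated SVD projects the sparse matrix onto a few dense components,
# which is one common way to reduce the sparsity of document-term features.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)  # dense, shape: (n_docs, 2)

print(X.shape, "->", X_reduced.shape)
```

`TruncatedSVD` is used rather than a full SVD because it accepts sparse input directly, which suits TF-IDF matrices.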
σi(t) is a model that performs malicious detection classification by LR. A linear vector is generated for the existing document-word topic classification weights {w0, w1, w2, …, wn−1}. For data processing, we trained several computational models to detect malicious classifications, including SVM and k-nearest neighbor (KNN). SVM classified the variables using the hyperplane equation Wx+b, whereas KNN grouped the variables by calculating the distance over a set of t∈T. A typical expression for KNN is as follows [
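The three baseline classifiers named above (LR, SVM, and KNN) can be sketched on TF-IDF features as follows. This is a hedged illustration with scikit-learn defaults; the toy texts, labels, and hyperparameters (for example `n_neighbors=3`) are assumptions, not the settings used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labeled messages (hypothetical): 1 = malicious/spam, 0 = normal.
texts = [
    "win a free prize now click the link",
    "free prize claim your reward click now",
    "urgent verify your account password here",
    "claim free money click this link today",
    "meeting agenda for the project review",
    "please review the project report before the meeting",
    "lunch tomorrow with the design team",
    "notes from the weekly project meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

models = {
    "LR": LogisticRegression(max_iter=1000),     # sigmoid over a linear score
    "SVM": LinearSVC(),                          # separating hyperplane Wx + b
    "KNN": KNeighborsClassifier(n_neighbors=3),  # vote among nearest samples
}

predictions = {}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    predictions[name] = pipe.predict(["claim your free prize now"])[0]

print(predictions)
```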
An ensemble model was employed as a representative method. Multiple trees were configured to classify properties and thereby classify malicious attacks. The expression is as follows [
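Assuming the tree-based ensemble is a random forest (the text does not name the exact model), a minimal scikit-learn sketch looks as follows. The toy data and the number of trees are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy labeled messages (hypothetical): 1 = malicious/spam, 0 = normal.
texts = [
    "win a free prize now click the link",
    "free prize claim your reward click now",
    "urgent verify your account password here",
    "claim free money click this link today",
    "meeting agenda for the project review",
    "please review the project report before the meeting",
    "lunch tomorrow with the design team",
    "notes from the weekly project meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# A forest of decision trees, each fit on a bootstrap sample of the data.
forest = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
forest.fit(texts, labels)

# predict_proba averages the class-probability estimates of the trees.
proba = forest.predict_proba(["click the link to claim your prize"])
print(proba)
```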
After the predicted distribution for the basis classifier was calculated as a post-processing step in each procedure, the layers were increased and optimization was performed.
As the layers are continuously stacked, they must be merged. The topic grouping layer is allocated in a stack structure to improve the existing topic and to end the procedure.
The expression is as follows:
The topic group layer (TGL) calculates the probability of spam given the jth category in collection_v (fraud conversation, email, alarm, message attack, otherwise) to maximize the likelihood, and it merges the topic model. The expression is as follows:
The final En*(DL, ML) results from recruiting the DL and ML complex data processors for each case at each Nth bridgehead stack. The expression for Estimator I-1 is as follows:
For the entire dataset and the prediction data (X′, Y′), the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are counted in a classification matrix and used to measure accuracy. The model formula is as follows [
Precision was measured for basic classification detection. In the proposed algorithm, given FP and TP, precision is calculated by dividing TP by FP + TP. Recall was measured to detect the malicious classification; given FN and TP, recall is calculated by dividing TP by FN + TP.
The balance between precision and recall was also evaluated by measuring the F1 score.
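The four evaluation measures follow directly from the confusion-matrix counts. The sketch below uses the standard definitions (accuracy as the share of correct predictions, F1 as the harmonic mean of precision and recall); the function name and the example counts are hypothetical.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(metrics)
```

The zero-denominator guards return 0.0 when a measure is undefined, for example when a model predicts no positives at all.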
The algorithmic process of the grouping bridgehead (GB) model is as follows. The parameters are used to estimate the binary dataset and to perform bridgehead grouping learning under the assumption that malicious data have penetrated. N represents the number of layers, Bcl the number of classifiers, W the topic weight, Tj the topic count, Pn the estimator’s procedure, and r the dimension split count. The metaverse keeps user communication data in storage and processes the data needed for analysis. The collection in MC is parsed and then built. Constructing the topic vector yields the overall topic classification vector required for bridgehead analysis. Dimension management is optional and is applied when computing on big data. After a data processor is configured using ML and DL, the calculations for each procedure are learned, and this is repeated until convergence. After w is updated according to the collection, the topic probabilities are computed, a vector is generated, and the final model is computed iteratively. After the maximum stacking length is calculated according to the collection, the endpoint is determined and grouping is performed. This GB strategy is expected to excel in maintaining the interaction between VR and NLP technology.
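The layered grouping idea resembles stacked generalization, in which the outputs of several base classifiers feed a final estimator. The sketch below uses scikit-learn's `StackingClassifier` as a stand-in; it is not the GB algorithm itself, and the base models, the single stacking layer, `cv=2`, and the toy data are all assumptions made for illustration.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labeled messages (hypothetical): 1 = malicious/spam, 0 = normal.
texts = [
    "win a free prize now click the link",
    "free prize claim your reward click now",
    "urgent verify your account password here",
    "claim free money click this link today",
    "meeting agenda for the project review",
    "please review the project report before the meeting",
    "lunch tomorrow with the design team",
    "notes from the weekly project meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Base classifiers whose cross-validated outputs become meta-features.
base_estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", LinearSVC()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
]

# The final estimator learns how to combine the base outputs.
stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(),
    cv=2,  # small fold count only because the toy dataset is tiny
)

model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)

prediction = model.predict(["free prize click now"])[0]
print(prediction)
```

Deeper stacks could be formed by using another `StackingClassifier` as the final estimator, which loosely mirrors increasing the number of layers N.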
Dataset | Environment | Algorithms |
---|---|---|
UCI Mail dataset | Tensorflow/Keras/Scikit-learn/Python 3.9 | Grouping |
The performance measured for each layer of the basic models employed in this study is summarized. As the bridgehead layer increased, the performance was compared in various manners. The accuracy measurements of the training data are shown in
GRU had an accuracy of 0.9816 after being trained five times, and recurrent neural network (RNN), k-nearest neighbor (KNN), support vector machine (SVM), and logistic regression (LR) had accuracies of 0.9797 (Basis 4), 0.9749, 0.9723 (Basis 1), and 0.9705 (Basis 1), respectively. The proposed model has room for better performance through adjustment of the GB strategy parameters. There is an optimal number of layers, and considerable synergy is expected when using the GB strategy.
In this study, we applied AI-based linguistic computing technology to effectively process big data for security in cyberspace in view of the metaverse era, which comprises several assets. Accordingly, a topic-based grouping bridgehead model was developed to solve the security response problem. Each model presented in this study comprises three modules: topic classification; establishment and confirmation of malicious bridgehead groupings; and dimension management. In the experiment, the accuracy, F1 score, recall, and precision improved by ∼1%–30%, 1%–17%, 1%–18%, and 1%–22% or more, respectively, compared with the existing models. The proposed methodology proved to be effective in protecting users from malicious infiltration by hackers within the twin space. In this security strategy study on preventing penetration by hacker attacks, we found that filtering performed well for the topic classification, malicious group identification, dimension management, and token classification strategies, which shows the effect of addressing the feature problem reported in previous studies. The proposed model has the potential to be further developed by extending it to attacks through email, messages, and alarms, and to communication attacks that users may experience in the metaverse. Developing the model further requires the interaction of experts, and the model will assist in decision-making.
This research was supported by the SungKyunKwan University and the BK21 FOUR (Graduate School Innovation) funded by the Ministry of Education (MOE, Korea) and National Research Foundation of Korea (NRF).