The rapid growth of social media use opens up new challenges and opportunities for analyzing various aspects and patterns of communication. In text mining, several techniques are available, such as information extraction, clustering, summarization, and classification. This study presents a text mining framework that consists of four phases: retrieving, processing, indexing, and association rule mining. The framework is applied, using the association rule mining technique, to find the terms associated with the Huawei P30 Pro phone. Customer reviews were extracted from several websites and Facebook groups, such as review.cnet.com, CNET Technology on Facebook, and amazon.com, where customers from all over the world post their comments on cell phones. In total, 192 reviews of the Huawei P30 Pro were collected and evaluated with text mining techniques. The findings show that the Huawei P30 Pro has strong points such as the best safety, a high-quality camera, a battery that lasts more than 24 hours, and a very fast processor. This paper aims to show that text mining reduces human effort by identifying the significant documents. This helps customers make better-informed product choices and, at the same time, lets sales managers learn how their products were received by customers.
Since the rise in social media usage in the last decade, as an additional source to traditional media, individuals have been looking to gain information from the crowd. Social media data can be analyzed to gain insights into issues, trends, influential actors, and other kinds of information [
Text Mining (TM) is defined as the process of extracting meaningful information from collected text data. Before any data mining technique is applied, the preprocessing operations, an important part of the TM process, should be taken into consideration [
In the field of text data analysis, there are several applications, such as information extraction, summarization, document classification, and clustering. With the rise of web technology, the vast amount of textual documentation is being studied more intensively and needs to be processed properly to help researchers obtain meaningful information through data mining techniques [
The objective of this study is to discuss how applying association rules with text mining helps researchers understand the importance of an item (product) on social media without having to review a huge amount of customer data. In this study, the researcher analyzed consumer feedback on the Huawei P30 Pro phone, and text mining techniques were used to evaluate how words are used together with other words and how customers responded to this phone.
The rest of this paper is organized as follows: Section 2 discusses the background of the text mining process, Section 3 presents previous studies that focused on text mining techniques, Section 4 presents the proposed framework for text mining, Section 5 discusses the methodology of this study, and Section 6 discusses the analysis and results of applying the proposed framework to customers’ reviews on social media.
TM is the extraction of meaningful information from text: unstructured data or natural language text is taken as the input and processed to obtain structured text, from which patterns are extracted [
—Document Gathering: In the first step, the text documents are collected in different formats, such as HTML, PDF, or Word [
—Document Pre-Processing:
In the second step, redundancies and inconsistencies are removed, words are separated, and stemming is applied, so that the documents are prepared for the next three stages, as follows [
Tokenization:
The given document string is split into single units, or tokens [
Removal of Stop Words:
Common words such as a, an, but, and, of, and the are removed in this step.
Stemming:
Stemming reduces a group of very similar words with the same meaning to their stem, the base form of a specific word [
Text Transformation: since the text document is a collection of words and their occurrences [
Feature Selection: this method removes irrelevant features from the input [
—Pattern Selection: the conventional process of data mining is combined with the process of text mining in this stage [
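The text transformation and feature selection steps listed above can be illustrated with a minimal sketch. It assumes the documents are already tokenized, represents each one as a vector of word counts, and drops words that occur in too few documents. The document-frequency threshold `min_df`, the function names, and the sample tokens are assumptions made for illustration; the source does not state which feature selection criterion is used.

```python
from collections import Counter


def document_frequency(tokenized_docs):
    """Count, for each word, the number of documents that contain it."""
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    return df


def select_features(tokenized_docs, min_df=2):
    """Keep only words that occur in at least min_df documents (assumed criterion)."""
    df = document_frequency(tokenized_docs)
    return sorted(word for word, n in df.items() if n >= min_df)


def to_vectors(tokenized_docs, vocabulary):
    """Text transformation: one word-count vector per document."""
    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vectors


# Hypothetical, already preprocessed documents.
docs = [
    ["camera", "great", "battery"],
    ["battery", "last", "battery"],
    ["camera", "screen"],
]
vocabulary = select_features(docs, min_df=2)
vectors = to_vectors(docs, vocabulary)
```

With these sample documents only "battery" and "camera" survive the threshold, so each document becomes a two-element count vector.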
Technique | Description |
---|---|
Information extraction (stemming) | The task of automatically extracting structured information from unstructured and semi-structured machine-readable documents [ |
Summarization | Reduces a text document to a summary of the most important points of the original document. |
Topic tracking | Keeps a user profile based on previous searches and predicts other documents of interest from that profile [ |
Classification | Counts the words in a document and decides the subject of the document from those counts [ |
Categorization | The task of assigning free-text documents to predefined categories [ |
Clustering | The most important unsupervised learning problem: finding structure in a collection of unlabeled data [ |
Concept linkage | Links related documents through the concepts they share, so that users navigate between documents instead of searching for them. |
Natural language processing | Designs and builds computer systems that analyze, understand, and generate natural language [ |
Stop-word list | Removes unimportant words such as “a”, “the”, “so”, and so on. |
The Internet is an environment in which a huge and expanding amount of data can be collected. Data can be broadly divided into two types, qualitative and quantitative data [
Social network investment is a form of consumption and the various types of returns on social capital, such as economic returns [
In this section, previous studies of text mining on social media are presented, including studies of the effects of social media on customers’ online purchasing and of text analysis through machine learning.
Authors in [
Authors in [
While authors described in [
Authors in [
They indicated that the analysis of sentiment is a specific form of text analysis for the identification of valence and the analysis of subjectivity of user-generated content (UGC).
Authors in [
Authors in [
Authors in [
Supervised learning refers to a machine learning classification technique that uses a set of labeled training data to determine class labels for unseen instances. Common classification algorithms include K-Nearest Neighbors, Support Vector Machine (SVM), Logistic Regression, and Naive Bayes (NB) [
The lexicon-based approach compares the features of the text with predefined positive and negative sentiment lexicons and determines whether the document has a more positive or negative tone. Supervised classification methods also exist for UGC valence classification. However, a limitation of the lexicon-based approach to sentiment detection in online reviews is that it is highly domain-dependent.
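As a minimal sketch of the lexicon-based approach described above, the following counts how many words of a text appear in a positive and in a negative lexicon and reports the dominant tone. The two word lists and the function name are hypothetical and far smaller than any real sentiment lexicon; they also illustrate the domain dependence, since a word such as "cheap" can be positive for price and negative for build quality.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons for phone reviews.
POSITIVE = {"great", "awesome", "excellent", "perfect", "amazing", "love"}
NEGATIVE = {"overpriced", "waste", "disappointed", "poor"}


def lexicon_polarity(text):
    """Return 'positive', 'negative', or 'neutral' from lexicon word counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"
```

For example, `lexicon_polarity("Great camera, awesome battery")` matches two positive words and no negative ones, so the text is labeled positive.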
Authors in [
By comparing the performance of various classification techniques (‘helpfulness analysis’), authors in [
The process of Social Media Analytics (SMA) proposed by [
The following SM analytical methods are applied after a thorough review of the SMA methods used to accomplish these processes, such as text analysis; sentiment analysis; content analysis; trend analysis; predictive analytics; social network analysis; spatial analysis; and comparative analysis.
In this section, the proposed framework for text mining is presented. It has four main phases (retrieving data, processing, indexing, and association rule mining), as illustrated in (
In this study, the researcher gathered consumer feedback from review.cnet.com, CNET Technology on Facebook, and amazon.com. Several file formats (RTF, txt, doc, etc.) are accepted at this stage and are converted into XML format at the processing stage.
The processing phase has three sub-steps: transformation, filtration, and stemming of the documents. First, the text gathered from the different sources is transformed. Next, the filtering process removes unimportant words, such as grammatical words (common adverbs, articles, determiners, pronouns, prepositions, and non-informative verbs such as “be”), from the document content: all words listed as stop words are eliminated, and special characters, parentheses, and commas are then replaced with spaces between words in the converted document. After the filtration process is complete, word stemming begins, which removes each word’s prefixes and suffixes. A stemming dictionary (lexicon) is used as the stemming algorithm.
Techniques for the automated production of indexes associated with documents usually rely on frequency-based weighting schemes. The TF-IDF (Term Frequency, Inverse Document Frequency) weighting scheme is used to assign higher weights to distinguishing terms in a document, and it is the most widely used weighting scheme [
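The filtration and stemming sub-steps can be sketched as follows. The stop-word list and the stemming lexicon shown here are small hypothetical samples, not the ones used in the study. The sketch replaces special characters before removing stop words so that a stop word followed by a comma is still matched; that ordering is a choice of this sketch.

```python
import re

# Hypothetical stop-word list; the study's actual list is not given.
STOP_WORDS = {"a", "an", "the", "and", "of", "is", "are", "be", "it", "this"}

# Hypothetical stemming dictionary (lexicon): inflected form -> stem.
STEM_LEXICON = {
    "lasts": "last",
    "lasting": "last",
    "phones": "phone",
    "cameras": "camera",
    "charging": "charge",
    "charger": "charge",
}


def replace_special_characters(text):
    """Replace punctuation, parentheses, and other symbols with spaces."""
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)


def stem_with_lexicon(token):
    """Look the token up in the lexicon; unknown words are left unchanged."""
    return STEM_LEXICON.get(token, token)


def process(text):
    """Clean the text, drop stop words, and stem the remaining tokens."""
    tokens = replace_special_characters(text).lower().split()
    return [stem_with_lexicon(t) for t in tokens if t not in STOP_WORDS]


tokens = process("The battery lasts (really), and the phones are charging fast!")
```

A dictionary-based stemmer only normalizes words that are present in the lexicon, which keeps it predictable but makes its coverage depend entirely on how complete the lexicon is.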
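In its standard form, the weight of term $t_j$ in document $d_i$ is

$$
w(i,j) = \begin{cases} N_{d_i,t_j} \cdot \log \dfrac{|C|}{N_{t_j}} & \text{if } N_{d_i,t_j} \geq 1 \\ 0 & \text{if } N_{d_i,t_j} = 0 \end{cases}
$$

where $N_{d_i,t_j}$ is the number of times the term $t_j$ occurs in document $d_i$, $N_{t_j}$ refers to the number of documents in collection $C$ in which $t_j$ occurs at least once, and $|C|$ is the number of documents in $C$. In the second clause, the value of $w(i,j)$ is zero because the term does not occur in the document. The inverse document frequency factor is

$$
\mathrm{idf}(t_j) = \log \frac{|C|}{N_{t_j}}
$$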
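A minimal sketch of TF-IDF weighting over tokenized documents is given below. It assumes the common form in which the raw term count is multiplied by the logarithm of the collection size divided by the term's document frequency; the natural logarithm, the function name, and the sample documents are assumptions. Terms that do not occur in a document are simply absent from its weight dictionary, which corresponds to a weight of zero.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {term: weight} dictionary per document."""
    n_docs = len(tokenized_docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized_docs:
        tf = Counter(tokens)  # raw term frequency in this document
        weights.append(
            {term: tf[term] * math.log(n_docs / df[term]) for term in tf}
        )
    return weights


# Hypothetical preprocessed reviews.
docs = [
    ["camera", "great"],
    ["camera", "battery", "battery"],
    ["camera", "price"],
]
weights = tf_idf(docs)
```

In this sample "camera" occurs in every document, so its weight is zero everywhere, while "battery" occurs twice in a single document and receives the highest weight.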
In this phase, an algorithm is used to find out the related words that are frequently used and to generate the confidence and lift factors on these words that will be helpful to make association rules. For text mining using the association rule, the Frequent Pattern Growth (FP-Growth) algorithm is used.
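The support, confidence, and lift of a rule can be computed directly from their usual definitions, as in the sketch below. Each review is treated as a set of words (a transaction). The function names and the four sample transactions are assumptions for illustration, and the sketch scans all transactions rather than using FP-Growth.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def rule_metrics(premise, conclusion, transactions):
    """Return (support, confidence, lift) for the rule premise -> conclusion.

    Assumes the premise and the conclusion each occur at least once.
    """
    rule_support = support(set(premise) | set(conclusion), transactions)
    confidence = rule_support / support(premise, transactions)
    lift = confidence / support(conclusion, transactions)
    return rule_support, confidence, lift


# Hypothetical reviews, each reduced to its set of words.
transactions = [
    {"huawei", "mate", "camera"},
    {"huawei", "camera", "great"},
    {"iphone", "apple"},
    {"samsung", "galaxy", "note"},
]
rule_support, confidence, lift = rule_metrics({"mate"}, {"huawei"}, transactions)
```

Here the rule mate → huawei holds in one of four reviews (support 0.25), every review with "mate" also contains "huawei" (confidence 1.0), and "huawei" occurs in half the reviews, so the lift is 2.0. A lift above 1 indicates that the premise and conclusion occur together more often than would be expected if they were independent.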
The FP-Growth algorithm is more applicable than the Apriori algorithm. It represents the database in the form of a frequent pattern tree or FP tree whose purpose is to mine the most frequent pattern [
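The FP tree itself can be built in two passes over the data, as in the minimal sketch below. The first pass counts items and discards infrequent ones; the second inserts each transaction, with its items ordered by decreasing frequency, into a prefix tree whose nodes carry counts. The class and function names, the `min_count` threshold, and the sample transactions are assumptions. The sketch covers tree construction only and omits the header links and the recursive mining of conditional pattern bases that a full FP-Growth implementation needs.

```python
from collections import Counter


class FPNode:
    """One node of the FP tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count=2):
    """Build an FP tree; returns (root, frequent item counts)."""
    # Pass 1: count in how many transactions each item occurs.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert frequent items, most frequent first (ties by name).
    for transaction in transactions:
        items = sorted(
            (i for i in set(transaction) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical reviews as word lists.
transactions = [
    ["huawei", "pro", "camera"],
    ["huawei", "pro", "great"],
    ["huawei", "mate"],
    ["iphone", "apple"],
]
root, frequent = build_fp_tree(transactions, min_count=2)
```

Because transactions that share frequent items also share a path from the root, the tree is a compressed representation of the database, which is why no repeated candidate generation over the raw data is needed.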
Customer reviews were collected from several sites and Facebook groups, such as review.cnet.com, CNET Technology on Facebook, and amazon.com, where customers from everywhere post their comments about mobile phones. In this study, a total of 192 reviews of the Huawei P30 Pro were collected from these Facebook groups and websites and analyzed with text mining techniques.
Next, stop words, which carry no significant information and occur very frequently (such as ‘a’, ‘an’, ‘is’, and ‘are’), were removed through the stop-word process. After that, the filtering process removed unimportant grammatical words (common adverbs, articles, determiners, pronouns, prepositions, and non-informative verbs such as “be”) from the document content. Next, a stemming dictionary (lexicon) was used as the stemming algorithm.
Afterward, the indexing phase began: the TF-IDF weight of each word in each document was computed, and a matrix of the words with their TF-IDF scores was created.
Next step, the X-mean [
X-means clustering was applied to the collected data and produced three clusters, which were identified as technical feedback, emotional feedback, and smartphone brand feedback. Some of the words are given in (
Cluster_0 | Cluster_1 | Cluster_2 |
---|---|---|
Charger | love | Galaxy |
Android | Fast | Mate |
Face | Awesome | Pro |
Recognition | Overpriced | iPhone |
Camera | Work | Huawei |
Buy | Cheap | Apple |
Screen | Worth | Samsung |
Lightning | Great | Product |
Device | Quality | Device |
Features | Waste | Phone |
USB | Life | Note |
Headphone | Look | Mobile |
Jack | Lot | Model |
Lasts | Old | S |
Display | Perfect | P |
Adapter | Price | |
Battery | Simple | |
Plug | Smaller | |
Sound | Excellent | |
Service | Fact | |
Cable | Amazing | |
Build | Available | |
Disappointed | | |
Big | | |
Simple | | |
Outstanding | | |
Cluster_0 represents customers’ remarks that focus on the technical aspects of the Huawei P30 Pro, Cluster_1 represents remarks that give emotional feedback, and Cluster_2 represents customers’ comparisons between several products and brands (Huawei, iPhone, Samsung), concerning the features of the Huawei P30 Pro, iPhone 11, and Samsung Galaxy Note 10 Plus.
Word counts can be used to determine which words are the most meaningful in the output. The words Huawei, pro, and mate occurred very frequently in the reviews. (
Word | Freq. | Word | Freq. | Word | Freq. | Word | Freq. |
---|---|---|---|---|---|---|---|
phone | 151 | S | 216 | Mate | 42 | Lasts | 4 |
iPhone | 80 | Samsung | 44 | Charge | 13 | LCD | 6 |
Face | 4 | Scratches | 4 | Android | 6 | Life | 6 |
Recognition | 4 | Se | 54 | Awesome | 19 | Look | 10 |
Fast | 10 | Service | 13 | Overpriced | 4 | Lot | 10 |
Love | 6 | Simple | 17 | Screen | 16 | Mobile | 4 |
Features | 12 | Smaller | 4 | Lightning | 4 | model | 5 |
Green | 4 | Sound | 9 | Device | 6 | Nice | 8 |
Pro | 107 | T | 287 | Beautiful | 4 | Note | 42 |
Huawei | 224 | Thanks | 4 | Camera | 30 | Old | 6 |
Buy | 12 | Think | 4 | Waste | 9 | OLED | 4 |
Use | 20 | Time | 8 | Water | 11 | Outstanding | 4 |
Money | 8 | USB | 8 | Work | 14 | p | 316 |
Great | 24 | Verizon | 4 | World | 4 | Get | 10 |
Quality | 21 | Price | 13 | Worth | 6 | Good | 26 |
i | 361 | Problem | 6 | Perfect | 12 | Got | 6 |
m | 204 | Product | 8 | Phones | 8 | Headphone | 6 |
Speed | 4 | Review | 8 | Plug | 9 | Jack | 10 |
Fact | 10 | Display | 4 | Connect | 6 | Charger | 4 |
Feature | 15 | Example | 4 | Day | 6 | Cheap | 6 |
Galaxy | 36 | Excellent | 6 | Disappointed | 4 | Compare | 4 |
Bluetooth | 11 | Available | 15 | Actually | 4 | Amazing | 18 |
Build | 4 | Average | 4 | Adapter | 4 | Apple | 24 |
c | 170 | Battery | 18 | Cable | 6 | Big | 6 |
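Word counts of this kind can be produced with a short sketch such as the one below, which lowercases each review, splits it into alphanumeric tokens, and tallies them. The function name and the two sample reviews are assumptions. Splitting at every non-alphanumeric character also breaks contractions apart, which would be one plausible explanation for single-letter entries such as "s", "t", and "m" in the table, although the source does not say how its tokens were produced.

```python
import re
from collections import Counter


def word_frequencies(reviews):
    """Count how often each lowercase alphanumeric token occurs."""
    counts = Counter()
    for review in reviews:
        counts.update(re.findall(r"[a-z0-9]+", review.lower()))
    return counts


# Hypothetical reviews containing contractions.
reviews = [
    "It's a great phone, I'm happy",
    "Great camera, it doesn't lag",
]
counts = word_frequencies(reviews)
```

With these two reviews, "great" and "it" are each counted twice, and the fragments "s", "m", and "t" left over from "It's", "I'm", and "doesn't" are each counted once.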
Association rule mining reveals how words relate to other words and how often they occur together in the documents. In this phase, the FP-Growth algorithm is used to extract the related words that are used together repeatedly and to generate the confidence and lift factors for these words, which help in forming association rules.
In this study, (
Premises | Conclusion | Support | Confidence | Lift |
---|---|---|---|---|
Phone, pro, camera | Huawei | 0.052083333 | 1 | 2.042553 |
Phone, pro, great | Huawei | 0.052083333 | 1 | 2.042553 |
Phone, p, camera | Huawei | 0.052083333 | 1 | 2.042553 |
Phone, p, great | Huawei | 0.052083333 | 1 | 2.042553 |
Pro, camera | Huawei, p | 0.09375 | 1 | 3.310345 |
p, camera | Huawei, pro | 0.09375 | 1 | 3.310345 |
Pro, p, camera | Huawei | 0.09375 | 1 | 2.042553 |
p, good | Huawei, pro | 0.0625 | 1 | 3.310345 |
Pro, p, good | Huawei | 0.0625 | 1 | 2.042553 |
Pro, great | Huawei, p | 0.0625 | 1 | 3.310345 |
p, great | Huawei, pro | 0.0625 | 1 | 3.310345 |
Pro, p, great | Huawei | 0.0625 | 1 | 2.042553 |
Phone, pro, camera | Huawei, p | 0.052083333 | 1 | 3.310345 |
Phone, p, camera | Huawei, pro | 0.052083333 | 1 | 3.310345 |
Phone, pro, p, camera | Huawei | 0.052083333 | 1 | 2.042553 |
Phone, pro, great | Huawei, p | 0.052083333 | 1 | 3.310345 |
Phone, p, great | Huawei, pro | 0.052083333 | 1 | 3.310345 |
Phone, pro, p, great | Huawei | 0.052083333 | 1 | 2.042553 |
Phone, note | Samsung, galaxy | 0.052083333 | 1 | 6.857143 |
Phone, note, galaxy | Samsung | 0.052083333 | 1 | 5.647059 |
Apple | iPhone | 0.114583333 | 1 | 3.428571 |
Mate | Huawei | 0.1875 | 1 | 2.042553 |
Phone, p | Huawei | 0.145833333 | 1 | 2.042553 |
Phone, mate | Huawei | 0.052083333 | 1 | 2.042553 |
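The values in the table can be translated back into review counts with a little arithmetic, sketched below. The sketch assumes that support is the fraction of the 192 reviews matching the whole rule and that lift is confidence divided by the support of the conclusion, which are the usual definitions; the function names are hypothetical, and the three rules are copied from the table.

```python
N_REVIEWS = 192

# (premise, conclusion, support, confidence, lift), copied from the table.
rules = [
    ("mate", "huawei", 0.1875, 1.0, 2.042553),
    ("apple", "iphone", 0.114583333, 1.0, 3.428571),
    ("phone, note, galaxy", "samsung", 0.052083333, 1.0, 5.647059),
]


def reviews_matching_rule(rule_support, n_reviews=N_REVIEWS):
    """Number of reviews that contain both premise and conclusion."""
    return round(rule_support * n_reviews)


def reviews_containing_conclusion(confidence, lift, n_reviews=N_REVIEWS):
    """Invert lift = confidence / support(conclusion) to get a review count."""
    return round(confidence / lift * n_reviews)


summary = {
    (premise, conclusion): (
        reviews_matching_rule(s),
        reviews_containing_conclusion(c, l),
    )
    for premise, conclusion, s, c, l in rules
}
```

Under these assumptions, the rule Mate → Huawei corresponds to 36 reviews that contain both words, and its lift implies that "Huawei" appears in 94 of the 192 reviews. A confidence of 1 means that every review containing the premise also contains the conclusion.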
Text mining reduces human effort by identifying the significant documents. Not all 192 customer reviews had to be read to understand customers’ opinions of the Huawei P30 Pro, which was the subject of a large portion of the reviews. Some users expressed loyalty to the iPhone in their feedback, and others compared the phone with the Samsung Galaxy Note 10 Plus.
Customers who love Samsung say that it is easy to use and well priced, but others find the Samsung charger poor. Huawei lovers, in particular, say that the Huawei P30 Pro has strong points such as the best safety, a high-quality camera, a battery that lasts more than 24 h, and a very good processor.