TY - EJOU
AU - Ghani, Norjihan Binti Abdul
AU - Hamid, Suraya
AU - Ahmad, Muneer
AU - Saadi, Younes
AU - Jhanjhi, N.Z.
AU - Alzain, Mohammed A.
AU - Masud, Mehedi
TI - Tracking Dengue on Twitter Using Hybrid Filtration-Polarity and Apache Flume
T2 - Computer Systems Science and Engineering
PY - 2022
VL - 40
IS - 3
SN -
AB - The World Health Organization (WHO) terms dengue a serious illness that impacts almost half of the world’s population and has no specific treatment. Early and accurate detection of its spread in affected regions can save precious lives. Despite the severity of the disease, only a few notable works apply sentiment analysis to mine accurate insights from social media text streams. However, the massive data explosion in recent years has made storing and processing large amounts of data difficult, as reliable mechanisms to gather the data and suitable techniques to extract meaningful insights from it are required. This research study proposes a sentiment analysis polarity approach for collecting data and extracting relevant information about dengue via Apache Hadoop. The method consists of two main parts: the first part collects data from social media using Apache Flume, while the second part focuses on querying and extracting relevant information via the hybrid filtration-polarity algorithm using Apache Hive. To overcome the noisy and unstructured nature of the data, the process of extracting information is characterized by pre- and post-filtration phases. As a result, only by integrating Flume and Hive with filtration and polarity analysis can a reliable sentiment analysis technique be offered to collect and process large-scale data from the social network. We introduce how the Apache Hadoop ecosystem – Flume and Hive – can provide a sentiment analysis capability by storing and processing large amounts of data. An important finding of this paper is that developing efficient sentiment analysis applications for detecting diseases can be more reliable through the use of the Hadoop ecosystem components than through the use of normal machines.
KW - Big data analysis; data filtration; text analysis; sentiment analysis; social media; event detection
DO - 10.32604/csse.2022.018467