Social media platforms have proven to be effective for gathering information during emergency events caused by natural or human-made disasters. Emergency response authorities, law enforcement agencies, and the public can use this information to gain situational awareness and improve disaster response. In emergencies, rapid responses are needed to address victims' requests for help. The research community has developed many social media-based platforms that have been used effectively for emergency response and coordination in the past. However, most present deployments of such platforms in crisis management are not automated, and their operational success largely depends on experts who analyze the information manually and coordinate with relevant humanitarian agencies or law enforcement authorities to initiate emergency response operations. It has therefore become essential to automatically identify the types of urgent needs expressed in millions of posts and to seamlessly deliver the relevant information to the appropriate agency for a timely response. This research project aims to develop a generalized Information Technology (IT) solution for emergency response and disaster management that integrates social media data as its core component. In this paper, we focus on text analysis techniques that can help emergency response authorities automatically filter through the sheer volume of gathered information to support their relief efforts. More specifically, we apply state-of-the-art Natural Language Processing (NLP), Machine Learning (ML), and Deep Learning (DL) techniques, ranging from unsupervised to supervised learning, for an in-depth analysis of social media data aimed at extracting real-time information on a critical event to facilitate emergency response in a crisis. As a proof of concept, a case study on the COVID-19 pandemic using data collected from Twitter is presented, providing evidence that the scientific and operational goals have been achieved.
During the last few decades, the frequency of natural disasters has increased, according to the Emergency Events Database [
Recently, the ubiquitous connectivity and proliferation of social networks have opened up new opportunities for crisis management through crowdsourcing. One such crowdsourcing tool is Ushahidi [
In this project, we investigate a cloud computing-based big data framework that will enable us to utilize heterogeneous data sources and sophisticated machine learning techniques to gather and process information intelligently, provide emergency workers with useful insights for making informed decisions, and guide the general public on how to stay safe during emergencies. Such a comprehensive framework will help the Kingdom develop a comprehensive Disaster Risk Management (DRM) capability for automatically predicting hazards, early warning, risk assessment, and risk mitigation, including the coordination of emergency activities and evacuation. The main thrust is to develop an information system product that dynamically extracts data from diverse social media channels; stores, manages, analyzes, and interprets these data in real time; and disseminates the resulting information to decision-makers in a format appropriate for carrying out their tasks. The proposed framework performs the following tasks. First, it dynamically captures multilingual (Arabic, English, and other languages) and multimodal (text, audio, video, and images) data from various social media channels in real time. Second, it applies language-specific models to translate the multilingual and multimodal contents into a unified ontology (knowledge base). Third, it uses machine learning and artificial intelligence techniques for intelligent inference from the knowledge base. Last, it interprets the results, presents the information on an interactive dashboard, and disseminates it to relevant stakeholders.
As a part of the project, this paper investigates the existing platforms developed to support crisis-related activities and highlights their benefits and shortcomings. An end-to-end disaster support system is proposed that builds upon existing work on social media use during mass emergencies and disasters. Then, a methodological approach based on state-of-the-art AI techniques, ranging from unsupervised to supervised learning, for an in-depth analysis of social media data collected during disasters is described.
Next, a case study is described in which the proposed methodology is applied to Twitter data collected during the COVID-19 pandemic to gain a comprehensive understanding of this real-world, devastating crisis event. Specifically, we use topic modeling techniques to understand the different topics discussed in disaster-related posts. To help concerned authorities quickly sift through big crisis data, we employ clustering techniques to group semantically similar messages and find high-level categories. To help humanitarian organizations fulfill their specific information needs, we use deep learning-based classification techniques to classify social media messages into humanitarian categories.
The rest of the paper is organized as follows: Section 2 investigates the existing platforms developed to support crisis-related activities and highlights their benefits and shortcomings. Section 3 discusses the architecture of the proposed IT solution for developing DRM capability for end-to-end crisis response and management. Section 4 describes the machine learning pipeline for in-depth analysis and mapping of social media posts to relevant humanitarian categories for effective response. The results of the proposed machine learning pipeline are discussed in Section 5 using a COVID-19 case study. Finally, the conclusion and future directions are presented in Section 6.
This section discusses different social media platforms developed to extract crisis-related information from social media to support disaster-related activities.
The platform “Tweak the Tweet” [
Likewise, MicroMappers [
Ushahidi [
SensePlace2 [
Although the system has a strong focus on situational awareness, which is useful for visualizing the temporal, spatial, and thematic aspects of a crisis, it does not create actionable reports for crisis management and decision making.
Emergency Situation Awareness (ESA
AIDR (Artificial Intelligence for Disaster Response) [
Most of the systems discussed above are built around the concept of a visual display of crisis-related social media data according to its temporal, spatial, and thematic aspects for situational awareness. The visual elements are powered by different computational capabilities, such as extracting relevant information using specific criteria and Natural Language Processing (NLP) techniques, including Named Entity Recognition (NER) and linking entities to concepts. The findings from these platforms suggest that the more situational awareness formal humanitarian organizations and people have, the better prepared they are to make informed decisions. Some of these platforms focus on creating actionable reports to support disaster response and relief activities. However, the information needed to create actionable reports requires crowdsourcing to manually tag predefined categories, which is neither scalable nor feasible. Furthermore, there is a lack of literature on a cohesive information extraction pipeline that automatically extracts relevant information, creates actionable reports, and delivers the information seamlessly to relevant response agencies. This paper proposes a cohesive machine learning pipeline, consisting of information extraction, clustering, and classification, that extracts relevant information for disaster relief workers.
This research project aims to develop a cloud-based integrated solution for disaster and emergency management using social media analytics. The main thrust is augmenting existing sensor-based Disaster Risk Management (DRM) systems with social media capabilities by keeping the public in the loop (human sensors). The development of such a system will enable relevant disaster management authorities to integrate and access data from several internet-based social data sources and apply semantic analysis to generate actions to be executed in response to presented contents. The generated results will be used by relevant emergency monitoring and disaster management agencies for emergency response, early warning, risk assessment, and risk mitigation, including coordination of emergency activities.
The workflow can be triggered automatically upon detecting an event, or it can be initiated manually by an operator on deployment.
Several parameters need to be specified to start the crawler. The location-based crawler requires a predefined area of interest and a time window size for all social media networks configured for crawling. The location is provided using static location coordinates (i.e., longitude and latitude) through the Google API. For keyword search, an operator needs to specify search terms or use predefined terms stored in the language database to initiate the search. The crawler then starts retrieving posts matching the search terms. The multi-language component's goal is to provide the capability of crawling content uploaded in different languages. To achieve this, our system provides a language translation service using the Google/Microsoft language translation APIs to translate posts into the target language and stores language-specific keywords in the language knowledge base. Upon setting these parameters, the application starts receiving data from social media sites, such as Twitter, Facebook, YouTube, and news feeds. The content from one or more sources may come in various forms, including text (posts, comments on blog posts, news, etc.) and media uploaded with the post (image and video content with rich metadata such as location and text). Following data collection and translation, the crawled data is transformed into a suitable format so that preprocessing techniques can be applied to text, images, and video before semantic analysis.
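To make the crawler configuration concrete, the following is a minimal sketch of a keyword- and location-based Twitter crawler built with Tweepy (v3.x) against the Streaming API; the credentials, search terms, bounding box, and output file are illustrative placeholders, not the project's actual configuration.

```python
import json
import tweepy

# Placeholder credentials and parameters (not the project's actual configuration).
CONSUMER_KEY, CONSUMER_SECRET = "xxx", "xxx"
ACCESS_TOKEN, ACCESS_SECRET = "xxx", "xxx"
SEARCH_TERMS = ["earthquake", "flood", "help needed"]   # keyword search terms
AREA_OF_INTEREST = [46.2, 24.2, 47.1, 25.1]             # [sw_lon, sw_lat, ne_lon, ne_lat]

class CrisisStreamListener(tweepy.StreamListener):
    """Writes each matching post to a line-delimited JSON file for later preprocessing."""

    def on_status(self, status):
        record = {
            "text": status.text,
            "created_at": str(status.created_at),
            "coordinates": status.coordinates,
            "lang": status.lang,
        }
        with open("crawled_posts.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def on_error(self, status_code):
        # Stop streaming on rate-limit errors.
        return status_code != 420

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
stream = tweepy.Stream(auth=auth, listener=CrisisStreamListener())

# Keyword-based and location-based filters (matched with OR semantics by the API).
stream.filter(track=SEARCH_TERMS, locations=AREA_OF_INTEREST, languages=["en", "ar"])
```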
Following preprocessing, the data is transformed into a format suitable for subsequent analysis and stored in the event database. The automatic reasoning and inference module performs analysis on the event data, such as semantic analysis, topic extraction, classification, and image and video analysis. As a result, the system identifies themes and clusters together similar posts, topics, conversations, and content. Classification maps the contents into predefined categories. However, as social media contents change quickly, it is not practically feasible to define every disaster category in advance. A disaster ontology will be developed to map metadata derived from social data to the matching class in the ontology. Hence, ontology creation and alignment are immediate future work under this research project. At this stage, we assume that the ontology will have classes for different disaster types, as well as information about relevant relief organizations and their locations.
To effectively monitor and visualize crisis-related social media contents, a web-based interface will be developed with the following functionalities:
As a part of the bigger project described in Section 3, in this paper we propose a machine learning pipeline to uncover useful patterns of emergency events using one source of social media data (Twitter) and one type of data (text). We used several natural language processing techniques to process social media posts. Our machine learning pipeline includes data preparation and textual analysis. Data preparation consists of two steps: (1) data collection and preprocessing, and (2) feature extraction. After preprocessing the raw dataset and extracting the features, we proceed to the textual analysis stage, which includes (1) topic modeling, (2) clustering, and (3) classification. As a result, disaster responders can sift through relevant social media data according to a specific humanitarian category. The following sub-sections explain the specific steps, equations, and algorithms in the proposed pipeline.
The first module collects and prepares the data for analysis. We used Twitter's Streaming API with filters based on search terms. Twitter data is unstructured and varies significantly in readability, grammar, and sentence structure. The accuracy of any natural language processing technique is strongly affected by the informal nature of the language used in tweets. The first step in constructing this pipeline is therefore to prepare a dataset in which noise is reduced to an acceptable level without omitting relevant tweets at the preprocessing stage. Thus, we preprocess the tweet text before using it for language analysis.
We remove URLs, mentions, hashtags, emojis, smileys, special characters, and stop words, as they do not add valuable semantic information for further analysis. The remaining words are tokenized and converted into lowercase to decrease the influence of typos. In the next step, a normalization component is applied to make the tweets more formal. Tweet text is usually ill-formed and full of misspellings, e.g., earthquak (earthquake), missin (missing), ovrcme (overcome); shortened words, e.g., pls (please), srsly (seriously), govt (government), msg (message); and words with repeated characters, e.g., soooo depressed (so depressed), okkk (ok). Additionally, people use phonetic substitutions, e.g., 2nd (second), 2morrow (tomorrow), 4ever (forever), 4g8 (forget), w8 (wait), and words without spaces, e.g., prayforIran (pray for Iran), weneedsdonations (we need donations), to fit more words within the 140-character limit of a Twitter post [
The first module employs a series of regular expressions and linguistic utilities to normalize the text of the tweets. The spelling correction function, for example, utilizes scspell (a spell checker for source code),
After normalization, we found several unrecognized terms which are not available in the English dictionary. For those terms, we used the WordNet
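For illustration, the following is a minimal sketch of this preprocessing step using regular expressions and NLTK; the normalization dictionary shown here is a tiny, hypothetical subset of the slang, phonetic-substitution, and spelling-correction resources (e.g., scspell and WordNet) described above.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import TweetTokenizer

nltk.download("stopwords", quiet=True)

STOPWORDS = set(stopwords.words("english"))
TOKENIZER = TweetTokenizer(preserve_case=False, reduce_len=True)  # "soooo" -> "sooo"

# A small, illustrative normalization dictionary for slang and phonetic substitutions.
NORMALIZATION = {"pls": "please", "srsly": "seriously", "govt": "government",
                 "msg": "message", "2morrow": "tomorrow", "w8": "wait"}

def preprocess(tweet: str) -> list:
    """Remove noise and return a list of lowercase, normalized tokens."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", tweet)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)                   # mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)                # emojis, digits, special chars
    tokens = TOKENIZER.tokenize(text)
    tokens = [NORMALIZATION.get(t, t) for t in tokens]      # slang normalization
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

print(preprocess("Pls stay home!! covid spreading fast https://t.co/xyz @WHO #StaySafe"))
# ['please', 'stay', 'home', 'covid', 'spreading', 'fast']
```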
After basic preprocessing, we created the text corpus for further linguistic analysis by converting the preprocessed text into a bag-of-words representation of unigrams and bigrams. Previous studies have found that these two features perform well when used for similar classification tasks [
Each term is weighted using the standard TF-IDF scheme, $w_{i,j} = tf_{i,j} \times \log(N / df_i)$, where $tf_{i,j}$ is the frequency of term $t_i$ in document $d_j$, $df_i$ is the number of documents containing term $t_i$, and $N$ is the total number of documents in the corpus.
By calculating these weights for all terms in the corpus, we create a vector for each document. In this study, posts are treated as documents when constructing the term vectors, and each document is denoted by a vector of its term weights.
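A minimal sketch of this feature extraction step with scikit-learn is given below; the toy corpus and parameter values are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative corpus of preprocessed posts (each post is one document).
corpus = [
    "please stay home stop spread virus",
    "new cases confirmed death toll rising",
    "fever cough shortness breath symptoms",
]

# Unigram + bigram bag-of-words with TF-IDF weighting.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
doc_term_matrix = vectorizer.fit_transform(corpus)   # shape: (n_documents, n_terms)

print(doc_term_matrix.shape)
print(vectorizer.get_feature_names_out()[:10])       # first few unigram/bigram features
```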
The next module in the pipeline facilitates the automatic extraction of thematic structure from the preprocessed textual corpus. This is done using topic modeling, the most commonly used unsupervised learning technique for this purpose. We used LDA [
LDA is a generative probabilistic model which assumes that each topic is a mixture of a set of words and that each document is a mixture of probable topics. LDA is able to identify topics by treating them as latent random variables inferred from the text documents using a hierarchical Bayesian analysis technique [
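As a sketch of how such a topic model can be trained, the following uses gensim's LdaModel on a toy tokenized corpus; the documents, number of topics, and hyperparameters are placeholders rather than the settings used in the case study.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Illustrative tokenized documents (output of the preprocessing step).
docs = [
    ["stay", "home", "stop", "spread", "virus"],
    ["new", "cases", "confirmed", "death", "toll"],
    ["fever", "cough", "shortness", "breath", "symptoms"],
]

dictionary = Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

# Train LDA: each topic is a distribution over words, each document a mixture of topics.
lda = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=42)

for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [w for w, p in words])

# Per-document topic mixture (used later for labeling and clustering).
print(lda.get_document_topics(bow_corpus[0]))
```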
LDA's main weakness here is that it returns topics as numbered word distributions rather than topic names. To overcome this, Ramage et al. [
In contrast to previous approaches, this paper addresses the labeling of topics exposing event-related content that might not have a counterpart in existing external sources. Based on the observation that the semantic similarities of a collection of documents can serve as a label representing that collection, we propose generating topic label candidates based on the semantic relevance of each document in the observed corpus. In our automatic label identification approach, the task is to discover a sequence of words for each topic that best summarizes the documents most strongly associated with that topic.
Given the set of documents associated with a topic, each document is weighted according to how strongly it is associated with that topic.
Once each document has been weighted, the documents can be ranked by their weights, from which the top-ranked documents are selected as the most representative of the topic. TF-IDF scores are then computed for the terms occurring in these top-ranked documents.
The top-scoring terms are selected as label candidates for the topic, extending the top topic words with more topic-coherent words drawn from the highest-ranked documents.
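The sketch below illustrates the overall idea of this labeling step under simplifying assumptions: documents are weighted by their LDA topic proportions, the top-ranked documents per topic are selected, and the highest-scoring TF-IDF terms extracted from them serve as label candidates. The function name, thresholds, and use of scikit-learn are illustrative choices, not the exact algorithm used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def label_candidates(texts, doc_topic_matrix, topic_id, top_docs=50, top_terms=5):
    """Return candidate label terms for one topic (illustrative sketch).

    texts            : list of preprocessed documents (strings)
    doc_topic_matrix : array of shape (n_docs, n_topics) with LDA topic proportions
    """
    # 1. Weight each document by its proportion for the target topic and rank them.
    weights = doc_topic_matrix[:, topic_id]
    ranked = np.argsort(weights)[::-1][:top_docs]
    top_documents = [texts[i] for i in ranked]

    # 2. Compute TF-IDF over the top-ranked documents of this topic.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    tfidf = vectorizer.fit_transform(top_documents)

    # 3. Aggregate TF-IDF scores per term and return the highest-scoring terms.
    scores = np.asarray(tfidf.sum(axis=0)).ravel()
    terms = vectorizer.get_feature_names_out()
    best = np.argsort(scores)[::-1][:top_terms]
    return [terms[i] for i in best]
```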
One limitation of topic modeling is that individual documents are represented as a mixture of latent semantic topics in a document collection. It is essential to classify the posts in terms of events related to different humanitarian categories, which will help emergency operators sift through the top-ranked posts under the same category to further understand the issues reported in the posts. These classes should ideally be representative of the underlying data, in the sense that they reflect the problems caused by the disaster event. For this purpose, we use clustering techniques on the potentially relevant topics obtained from the topic model in the previous step. Automatically generated clusters are then manually inspected by human experts, who assign a category name/label to each cluster.
To apply cluster analysis, we obtained the topic proportion vector of each document from the LDA topic matrix. Previous studies have demonstrated the effectiveness of clustering induced by topic modeling [
In the first method, LDA-derived topics are used as the corpus for cluster analysis, where each document (tweet)
In the second method, we applied feature-based cluster analysis. The algorithm starts by extracting features from the LDA-induced topic-word matrix (
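A minimal sketch of topic-induced clustering is shown below, assuming the LDA document-topic proportions are used as features and k-means as the clustering function; the number of clusters and helper names are illustrative.

```python
from gensim.matutils import corpus2dense
from sklearn.cluster import KMeans

def cluster_documents(lda, bow_corpus, n_clusters=8):
    """Group documents into disjoint clusters using their LDA topic proportions (sketch)."""
    # Document-topic matrix: one row per document, one column per topic.
    doc_topics = corpus2dense(lda[bow_corpus], num_terms=lda.num_topics).T

    # k-means groups documents with similar topic mixtures together.
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    return kmeans.fit_predict(doc_topics)
```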
Social media users tend to post real-time situational information that could be used by disaster relief and response agencies for effective decision making. However, it is essential to classify the posts into different humanitarian categories for effective processing. After classification, the dataset becomes more informative for assessing the situation from the perspective of a specific response agency. Several studies have used [
Devlin et al. [
We adopted multiple baseline models to assess the performance of the fine-tuned BERT model. The baseline models include a CNN with GlobalMaxPooling, a Deep Pyramid Convolutional Neural Network (DPCNN), and a Recurrent Neural Network (RNN) with GRU units, an attention layer, and a hidden dense layer on top. The performance of the different models is evaluated by calculating accuracies on the validation and test datasets.
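As an example of one baseline architecture, the following is a minimal Keras sketch of the CNN with GlobalMaxPooling variant; the vocabulary size, sequence length, and other hyperparameters are illustrative and not the values used in the experiments.

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters (not the values used in the experiments).
VOCAB_SIZE, MAX_LEN, EMBED_DIM, NUM_CLASSES = 20000, 50, 128, 9

def build_cnn_maxpool_baseline():
    """CNN baseline: embedding -> 1D convolution -> global max pooling -> dense softmax."""
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_maxpool_baseline()
model.summary()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, batch_size=32)
```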
We label each post with a humanitarian category using the classifier so that new posts on a given disaster type can be automatically classified with no manual effort. The performance of the BERT-based classifier is discussed in the results section of the case study.
The proposed architecture has been validated on Twitter data collected for a recent biological disaster, i.e., COVID-19. We decided to focus on biological disasters and their impact on society given the world's current pandemic situation. However, the system can be extended to any other type of natural or human-made disaster. For all experiments, we used the Python programming language.
We used the Twitter Streaming API to collect data using keywords related to COVID-19. The following keywords were used to collect the data:
A total of 150,000 tweets were collected for experimentation purposes. After initial preprocessing and removing duplicates, the dataset size was reduced to 80,000 tweets. For each tweet, the original post (tweet text), timestamp, and geographic information (coordinates) were obtained. All these fields are considered useful in deriving patterns and identifying the purpose of the posts.
In the next step, the topic modeling algorithm LDA was applied to extract semantic topics from the preprocessed social data. The workflow of topic modeling consists of data processing, model training, parameter tuning, and identifying relevant topics. To select and realize the most efficient workflow, several topic models were trained by orchestrating the preprocessing steps and applying different combinations of hyperparameters. For determining the most appropriate number of topics, we used the gensim coherence model. We chose the 8 topics returned by LDA for the selected dataset because this number of topics achieved the highest coherence score (0.45).
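A sketch of this coherence-based selection of the number of topics with gensim is given below; the candidate topic counts and the "c_v" coherence measure are assumptions for illustration.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

def select_num_topics(docs, candidate_counts=range(4, 13)):
    """Train LDA for several topic counts and return the one with the highest coherence."""
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    best_k, best_score = None, -1.0
    for k in candidate_counts:
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                       passes=10, random_state=42)
        coherence = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                                   coherence="c_v").get_coherence()
        print(f"num_topics={k}, coherence={coherence:.3f}")
        if coherence > best_score:
            best_k, best_score = k, coherence
    return best_k, best_score
```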
We examined the topic-word distributions for the 8 topics and obtained the top 10 keywords for each topic.
Unlike the traditional clustering approach, where each data point belongs to exactly one cluster/topic, in topic modeling a data point can belong to multiple topics. For example, a tweet can talk about both symptoms and treatments. For this reason, in
Topic No. | Keywords |
---|---|
1 | Stop_spread, stay, give, virus, government, spread, place, risk, stay_safe |
2 | Worker, country, pandemic, schools, jobs, baby_wipe, world, hospital, public, employ |
3 | Listen, police, health, post, leaders, government, bring, hospitals, follow, world_health |
4 | Health, back, home, continue, water, together, mask, lockdown, face |
5 | Life, die, new_cases, leave, death, confirmed_cases, deadly, pandemic, bad, test |
6 | Symptoms, positive, outbreak, fever, deadly, virus, case_positive, confirm, cases, signs |
7 | Follow, drugs, care, share, worker, lie, figure, vaccine, check, water |
8 | Covid, people, time, home, keep, allow, water, care, week, today |
In this scenario, the keyword-based approach faces difficulties: since social media posts can be about anything happening in the world, creating an exhaustive list of keywords for every event is not practically feasible. The other approach uses the top topic words/terms to assign a semantic label to a topic. However, the top topic words do not always reveal the words most relevant to the information cluster of a topic. Our proposed tf-idf based summarization algorithm generates the labels for topics belonging to different categories by extending the top topic words with more topic-coherent words from the highest-ranked documents under the target topic. In
 | Topic 1 | Topic 2 |
---|---|---|
TT | Stop_spread, stay, give, virus | Worker, country, pandemic, schools |
TF-IDF | Disease_transmission, stop_spread, self_quarantine | People_fight, baby_wipe, school_close, job_loss |
To evaluate the accuracy of the automatic labeling algorithm, we manually labeled 2000 tweets based on human interpretation and compared the results to understand whether the algorithm can generate labels comparable to human labels. For this purpose, we randomly sampled tweets from each topic in order to understand the themes of these topics. Our research team of 20 members discussed the terms in each topic and grouped them into nine common themes (see
Topic number | Top keywords | Predicted labels | Human label |
---|---|---|---|
1 | Stop_spread, stay, give, virus, government, spread, place, risk, stay_safe | Disease_transmission, stop_spread, self_quarantine | Transmission |
2 | Worker, country, pandemic, baby_wipe, schools, jobs, world, hospital, public, employ | People_fight, baby_wipe, school_close, job_loss | Impact (affected people) |
3 | Listen, police, health, post, leaders, government, bring, hospitals, follow, world_health | World_health, CDC, government | Authorities |
4 | Health, back, home, continue, water, together, mask, lockdown, face | Stay_home, lockdown, face_mask | Prevention |
5 | Life, die, new_cases, leave, death, confirmed_cases, deadly, pandemic, bad, test | New_cases, confirmed_cases, death | Reports (death, new cases) |
6 | Symptoms, positive, outbreak, fever, deadly, virus, case_positive, confirm, cases, signs | Shortness_breath, cough | Sign and symptom |
7 | Follow, drugs, care, share, worker, lie, figure, vaccine, check, water | New_vaccine, drugs, salt_water | Treatment |
8 | Covid, people, time, home, keep, allow, water, care, week, today | Covid, care, share | Other useful information |
9 | NA | NA | Irrelevant |
A two-dimensional confusion matrix demonstrates the accuracy of the proposed automatic labeling algorithm by showing the number of actual and predicted labels. The confusion matrix consists of four values: true positives (TP), posts both manually and automatically labeled as positive; false negatives (FN), posts manually labeled as positive but automatically labeled as negative; false positives (FP), posts only automatically labeled as positive; and true negatives (TN), posts both manually and automatically labeled as negative (see
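For illustration, such a confusion matrix and the derived per-class metrics can be computed with scikit-learn as sketched below; the label lists are placeholders.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Placeholder manual (true) and automatically predicted labels for a handful of tweets.
manual_labels    = ["Transmission", "Prevention", "Treatment", "Prevention", "Transmission"]
predicted_labels = ["Transmission", "Prevention", "Prevention", "Prevention", "Transmission"]

# Rows are manual labels, columns are predicted labels.
cm = confusion_matrix(manual_labels, predicted_labels)
print(cm)
print(classification_report(manual_labels, predicted_labels, zero_division=0))
```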
The semantic categories for each document induced by the topic model are distributed over multiple topics. We used clustering (described in Section 4.4) to group documents into distinct clusters. Each clustering function takes the LDA-generated corpus as input and assigns documents to disjoint clusters. The results for the 8 topics are shown in
At first, we used the document-topic matrix generated by the topic model in the previous step to assign each document to its most probable topic. The scatter plot of the results for the 8 topics is shown in
Classifying the social media posts into humanitarian categories is important for capturing the affected areas' needs. We adopted the humanitarian categories and labeled data from Alam et al. [
- Reports: tweets related to different reports, e.g., reports of deaths due to the disease, confirmed cases, and the number of cases reported.
- Signs or symptoms: text containing symptoms such as fever, cough, diarrhea, and shortness of breath, or questions related to these symptoms.
- Disease transmission: text or questions related to disease transmission.
- Disease prevention: questions or suggestions related to the prevention of the disease, or references to a new prevention strategy, e.g., travel ban, self-isolation, washing hands, social distancing, quarantine.
- Disease treatment: questions or suggestions with regard to disease treatments.
- Impact: e.g., unemployment, economic impact, reports about people affected by the disease.
- Authorities: text related to government policies, WHO initiatives, etc.
- Other useful information: other helpful information related to the disease itself.
- Not related or irrelevant: text irrelevant to the situation.
We trained a classifier by fine-tuning a pre-trained BERT transformer [
We used the BERT
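A minimal fine-tuning sketch using the Hugging Face Transformers library is given below for illustration; the "bert-base-uncased" checkpoint, hyperparameters, and toy examples are assumptions rather than the exact training setup used in this work.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

NUM_CLASSES = 9  # humanitarian categories

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=NUM_CLASSES)

# Illustrative labeled examples: (tweet text, category index).
texts = ["New confirmed cases reported today", "Wash your hands and stay home"]
labels = [0, 3]

enc = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # a few fine-tuning epochs
    for input_ids, attention_mask, y in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()   # cross-entropy loss computed internally
        optimizer.step()
```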
Categories | No. of labeled tweets | No. of test tweets |
---|---|---|
Reports | 350 | 3250 |
Signs or symptoms | 286 | 1540 |
Transmission | 83 | 253 |
Prevention | 267 | 2425 |
Treatment | 239 | 1358 |
Impact | 450 | 733 |
Authorities | 81 | 248 |
Other useful information | 174 | 1478 |
Irrelevant | 70 | 3534 |
We used several standard classification models as baselines: a GRU network, a CNN with GlobalMaxPooling, a CNN-GRU hybrid, a Deep Pyramid Convolutional Neural Network (DPCNN), and a Recurrent Neural Network (RNN) with GRU units, an attention layer, and a hidden dense layer on top. The validation and testing accuracy for the fine-tuned BERT and baseline models are shown in
Models | Validation accuracy (%) | Test accuracy (%) |
---|---|---|
GRU | 80.3 | 73.6 |
CNN (maxPooling) | 72.5 | 64.7 |
CNN GRU | 71.14 | 66.8 |
DPCNN | 73.25 | 68.6 |
RNN | 81.45 | 70.65 |
Fine-tuned BERT | 91.78 | 76.52 |
The fine-tuned BERT classifier achieved a validation accuracy of 91.78%, which is much higher than the accuracy of the standard models, ranging from 71% to 81%. The fine-tuned BERT model also outperforms the other baseline deep learning models in terms of test accuracy. It is worth mentioning that this is a multi-class classification problem (nine classes); hence, the validation accuracy is not as high as in typical binary classification tasks. Also, due to the limited labeled dataset (2000 examples), the model could not learn enough features to generalize to unseen text, hence the lower test accuracy of 76.52%. A more rigorous dataset labeling effort is required to address these problems. However, the classification results are reasonably acceptable for capturing most of the information categories and provide evidence of the capability of a fine-tuned BERT model for precise classification.
This work presented a machine learning pipeline to automatically map disaster events across different humanitarian organizations to support their relief efforts. The pipeline integrates topic modeling, clustering, and classification with the capability to detect evolving disaster events and map them across different humanitarian categories using social media data. The application of the proposed pipeline was demonstrated in a case study related to the COVID-19 virus using Twitter data. The results suggest the following: (1) the integrated, enhanced topic summarization method is useful for detecting coherent topics and can be used to predict corresponding labels for real-time situation analysis; (2) in comparison to traditional clustering techniques, the topic-induced clustering method is more useful for grouping social media posts across different classes; and (3) the fine-tuned BERT-based classifier performs better than standard deep learning classifiers in classifying tweets into different humanitarian categories.
The proposed machine learning pipeline offers important directions for future research. First, the proposed pipeline has been validated only on the COVID-19 case study; other disaster domains, such as floods, traffic and industrial accidents, and earthquakes, can be further integrated into the pipeline. However, the lack of broadly applicable humanitarian categories across different disaster domains might impair this process. Second, the integration of other intelligent techniques to uncover more situational awareness, such as disaster-affected areas, entities mentioned in the posts, and additional useful information, would better support response organizations in their relief work. Third, the integration of other data sources, such as different social media channels (Facebook, Instagram, etc.), news articles, and remote sensing data, would help gain better awareness of a disaster-affected region. However, integrating heterogeneous data into a unified format is a challenge. Finally, the integration of techniques to identify the credibility of social media posts and information bias is an important area to be addressed.