MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech

Farah Mohammad; Khulood Mohammed

doi:10.32604/cmc.2024.056666

Open Access

ARTICLE

Farah Mohammad^1,2,*, Khulood Mohammed Al Mansoor^3

1 Center of Excellence and Information Assurance (CoEIA), King Saud University, Riyadh, 11543, Saudi Arabia
2 Department of Computer Science and Technology, Arab East Colleges, Riyadh, 11583, Saudi Arabia
3 Self-Development Skills Department, King Saud University, Riyadh, 11543, Saudi Arabia

* Corresponding Author: Farah Mohammad. Email: email

Computers, Materials & Continua 2024, 81(3), 4125-4147. https://doi.org/10.32604/cmc.2024.056666

Received 27 July 2024; Accepted 30 October 2024; Issue published 19 December 2024

Abstract

Depression is a prevalent mental health issue affecting individuals of all age groups globally. As with other mental health disorders, diagnosing depression presents significant challenges for medical practitioners and clinical experts, primarily due to societal stigma and a lack of awareness and acceptance. Although medical interventions such as therapies, medications, and brain stimulation therapy provide hope for treatment, there is still a gap in the efficient detection of depression. Traditional methods, like in-person therapies, are both time-consuming and labor-intensive, emphasizing the necessity for technological assistance, especially through Artificial Intelligence. As an alternative, depression is in most cases diagnosed through questionnaire-based mental status assessments. However, this method often produces inconsistent and inaccurate results. Additionally, there is currently no comprehensive diagnostic framework that achieves accurate and robust diagnostic outcomes. For a considerable time, researchers have sought methods to identify symptoms of depression through individuals’ speech and responses, leveraging automation systems and computer technology. This research proposes MDD, which is composed of multimodal data collection, preprocessing, and feature extraction (utilizing the T5 model for text features and the WaveNet model for speech features). Canonical Correlation Analysis (CCA) is then used to create correlated projections of text and audio features, followed by feature fusion through concatenation. Finally, depression detection is performed using a neural network with a sigmoid output layer. The proposed model was evaluated on three datasets. On the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset, it attained an accuracy of 92.75%, precision of 92.05%, and recall of 92.22%. On the E-DAIC dataset, it achieved an accuracy of 91.74%, precision of 90.35%, and recall of 90.95%. On the CD-III dataset (Custom Dataset for Depression), it achieved an accuracy of 93.05%, precision of 92.12%, and recall of 92.85%. These results underscore the model’s robust capability in accurately diagnosing depressive disorder, demonstrating the efficacy of advanced feature extraction methods and an improved classification algorithm.
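The fusion stage described in the abstract (CCA projections of the two modalities, concatenation, then a sigmoid-output network) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: random arrays stand in for the T5 text features and WaveNet audio features, the helper names `cca_fit` and `mlp_forward` are hypothetical, the feature dimensions and the single ReLU hidden layer are arbitrary, and the classifier weights are untrained. It shows the data flow only and does not reproduce the reported results.

```python
import numpy as np

def cca_fit(X, Y, k, reg=1e-6):
    """Top-k canonical directions via whitening plus SVD (hypothetical helper)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt[:k].T)
    return mx, my, Wx, Wy, s[:k]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(F, W1, b1, W2, b2):
    """One ReLU hidden layer followed by a single sigmoid output unit."""
    h = np.maximum(0.0, F @ W1 + b1)
    return sigmoid(h @ W2 + b2).ravel()

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for per-sample text and audio embeddings (assumption).
latent = rng.normal(size=(n, 2))
text_feats = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(n, 8))
audio_feats = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n, 6))

# 1) CCA: project both modalities onto maximally correlated directions.
mt, ma, Wt, Wa, corrs = cca_fit(text_feats, audio_feats, k=2)
Zt = (text_feats - mt) @ Wt
Za = (audio_feats - ma) @ Wa

# 2) Fusion by concatenating the two projected views.
fused = np.concatenate([Zt, Za], axis=1)

# 3) Untrained classifier head: each output is a value in (0, 1).
W1, b1 = 0.1 * rng.normal(size=(fused.shape[1], 8)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros(1)
probs = mlp_forward(fused, W1, b1, W2, b2)

print("canonical correlations:", corrs)
print("fused shape:", fused.shape)
```

In a trained system the sigmoid outputs would be thresholded (commonly at 0.5) to produce the depressed/non-depressed label; here the weights are random, so the values carry no diagnostic meaning.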

Keywords

Depression; deep learning; T5; WaveNet; CCA; neural network

Cite This Article

APA Style

Mohammad, F., Mansoor, K.M.A. (2024). MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech. Computers, Materials & Continua, 81(3), 4125–4147. https://doi.org/10.32604/cmc.2024.056666

Vancouver Style

Mohammad F, Mansoor KMA. MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech. Comput Mater Contin. 2024;81(3):4125–4147. https://doi.org/10.32604/cmc.2024.056666

IEEE Style

F. Mohammad and K. M. A. Mansoor, “MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech,” Comput. Mater. Contin., vol. 81, no. 3, pp. 4125–4147, 2024. https://doi.org/10.32604/cmc.2024.056666


Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
