Open Access

ARTICLE

MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech

Farah Mohammad1,2,*, Khulood Mohammed Al Mansoor3

1 Center of Excellence and Information Assurance (CoEIA), King Saud University, Riyadh, 11543, Saudi Arabia
2 Department of Computer Science, and Technology, Arab East Colleges, Riyadh, 11583, Saudi Arabia
3 Self-Development Skills Department, King Saud University, Riyadh, 11543, Saudi Arabia

* Corresponding Author: Farah Mohammad.

Computers, Materials & Continua 2024, 81(3), 4125-4147. https://doi.org/10.32604/cmc.2024.056666

Abstract

Depression is a prevalent mental health issue affecting individuals of all age groups globally. As with other mental health disorders, diagnosing depression presents significant challenges for medical practitioners and clinical experts, primarily because of societal stigma and a lack of awareness and acceptance. Although medical interventions such as therapies, medications, and brain stimulation therapy provide hope for treatment, a gap remains in the efficient detection of depression. Traditional methods, such as in-person therapies, are both time-consuming and labor-intensive, which underscores the need for technological assistance, especially through Artificial Intelligence. As an alternative, depression is in most cases diagnosed through questionnaire-based mental status assessments; however, this method often produces inconsistent and inaccurate results. Additionally, there is currently no comprehensive diagnostic framework that achieves accurate and robust diagnostic outcomes. For a considerable time, researchers have sought methods to identify symptoms of depression from individuals’ speech and responses, leveraging automated systems and computer technology. This research proposes MDD, which is composed of multimodal data collection, preprocessing, and feature extraction (using the T5 model for text features and the WaveNet model for speech features). Canonical Correlation Analysis (CCA) is then used to create correlated projections of the text and audio features, followed by feature fusion through concatenation. Finally, depression detection is performed using a neural network with a sigmoid output layer. The proposed model achieved strong performance across three datasets. On the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset, it attained an accuracy of 92.75%, precision of 92.05%, and recall of 92.22%. On the E-DAIC dataset, it achieved an accuracy of 91.74%, precision of 90.35%, and recall of 90.95%. On the CD-III dataset (Custom Dataset for Depression), it achieved an accuracy of 93.05%, precision of 92.12%, and recall of 92.85%. These results underscore the model’s capability to diagnose depressive disorder accurately and demonstrate the efficacy of advanced feature extraction methods and an improved classification algorithm.
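The fusion stage described in the abstract (CCA projections of the two feature sets, concatenation, then a network with a sigmoid output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The feature matrices below are synthetic stand-ins for T5 and WaveNet embeddings, and the function names, dimensions, regularization constant, and the single tanh hidden layer are all assumptions made for illustration.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA: return projections, means, and top-k correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Sx, Sy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    U, s, Vt = np.linalg.svd(Sx @ Cxy @ Sy)
    return Sx @ U[:, :k], Sy @ Vt.T[:, :k], mx, my, s[:k]

def fuse(X, Y, Wx, Wy, mx, my):
    """Project each modality with CCA, then concatenate the projections."""
    return np.concatenate([(X - mx) @ Wx, (Y - my) @ Wy], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_classifier(F, y, hidden=8, lr=0.1, epochs=300, seed=0):
    """One tanh hidden layer and a sigmoid output, trained by gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (F.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(F @ W1 + b1)
        p = sigmoid(H @ W2 + b2)
        g = (p - y) / len(y)               # gradient of mean BCE w.r.t. logits
        gH = np.outer(g, W2) * (1 - H**2)  # backpropagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (F.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return lambda F_new: sigmoid(np.tanh(F_new @ W1 + b1) @ W2 + b2)

# Synthetic stand-ins (assumption): a shared latent signal drives both modalities.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 2))
text_feats = latent @ rng.normal(size=(2, 16)) + 0.5 * rng.normal(size=(n, 16))
audio_feats = latent @ rng.normal(size=(2, 12)) + 0.5 * rng.normal(size=(n, 12))
labels = (latent[:, 0] > 0).astype(float)  # hypothetical binary labels

Wx, Wy, mx, my, corrs = fit_cca(text_feats, audio_feats, k=4)
fused = fuse(text_feats, audio_feats, Wx, Wy, mx, my)
predict = train_classifier(fused, labels)
probs = predict(fused)  # probabilities in (0, 1); threshold to get a class
```

In this sketch, `corrs` holds the canonical correlations in descending order and `fused` has twice as many columns as the number of retained components. In practice the CCA projections and the classifier would be fitted on a training split only and then applied to held-out data.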

Keywords

Depression; deep learning; T5; WaveNet; CCA; neural network

Cite This Article

APA Style
Mohammad, F., Mansoor, K.M.A. (2024). MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech. Computers, Materials & Continua, 81(3), 4125–4147. https://doi.org/10.32604/cmc.2024.056666
Vancouver Style
Mohammad F, Mansoor KMA. MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech. Comput Mater Contin. 2024;81(3):4125–4147. https://doi.org/10.32604/cmc.2024.056666
IEEE Style
F. Mohammad and K. M. A. Mansoor, “MDD: A Unified Multimodal Deep Learning Approach for Depression Diagnosis Based on Text and Audio Speech,” Comput. Mater. Contin., vol. 81, no. 3, pp. 4125–4147, 2024. https://doi.org/10.32604/cmc.2024.056666



Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.