Deep Learning-Based Sign Language Recognition for Hearing and Speaking Impaired People
Department of Information Technology, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif, 21944, Saudi Arabia
* Corresponding Author: Mrim M. Alnfiai. Email:
Intelligent Automation & Soft Computing 2023, 36(2), 1653-1669. https://doi.org/10.32604/iasc.2023.033577
Received 21 June 2022; Accepted 04 August 2022; Issue published 05 January 2023
AbstractSign language is mainly utilized in communication with people who have hearing disabilities. Sign language is used to communicate with people having developmental impairments who have some or no interaction skills. The interaction via Sign language becomes a fruitful means of communication for hearing and speech impaired persons. A Hand gesture recognition system finds helpful for deaf and dumb people by making use of human computer interface (HCI) and convolutional neural networks (CNN) for identifying the static indications of Indian Sign Language (ISL). This study introduces a shark smell optimization with deep learning based automated sign language recognition (SSODL-ASLR) model for hearing and speaking impaired people. The presented SSODL-ASLR technique majorly concentrates on the recognition and classification of sign language provided by deaf and dumb people. The presented SSODL-ASLR model encompasses a two stage process namely sign language detection and sign language classification. In the first stage, the Mask Region based Convolution Neural Network (Mask RCNN) model is exploited for sign language recognition. Secondly, SSO algorithm with soft margin support vector machine (SM-SVM) model can be utilized for sign language classification. To assure the enhanced classification performance of the SSODL-ASLR model, a brief set of simulations was carried out. The extensive results portrayed the supremacy of the SSODL-ASLR model over other techniques.
Recently, the population of deaf-dumb victims has raised due to birth defects and other problems. A deaf and mute individual may not able to interact with ordinary people by relying certain kinds of transmission mechanisms . The gesture displays certain physical actions of the hand which conveys a part of information. Gesture recognition was the analytical clarification of movement of a person via information processing mechanisms. Verbal communication offers the most effectual conversation platform for mute person for speaking with ordinary people . Only some individual realizes the meaning of sign. Usually, deaf person was deprived of usual contact with other typical individuals in the society. Human computer interface (HCI) was an exciting system amongst devices and individuals. This was an exciting research area which aims at the usage and formation of computer technology and particularly, combined communications amongst devices and humans. HCI structure was amazingly protracted and enhanced by the technical transition . On the basis of fruitful usability, evolving technologies apply modern user lines like Speech recognition, Non-Touch, and Gesture. It is a complex and expensive technology to reach [4,5]. This newly applied technology was after combined as implying particular applications on the basis of cost-effectiveness and demand. For addressing the complexities, numerous authors were attempting for developing such interferences with regard to robustness, performance, and accessibility . The optimum model could have several standard features like scalability, simplicity, flexibility, and precision. Nowadays, the human gesture turns to be an extensive HCI application, and the usage of human gestures, that fulfils all such standards, was rising quickly [7,8].
Sign language (SL) always considered the main way of verbal interaction amongst individuals who were both dumb and deaf . Whereas interacting such people is very helpless and therefore were only depends on hand gestures. Visual signs and gestures were an energetic part of automated sign language (ASL) which offers deaf individuals reliable and easy communication . It has well-defined code gestures in which every sign carries a specific meaning relating to communication. There were several methods to seek gestural data. But limiting to only major kinds there were 2 significant familiar kinds they are Vision-related and Sensor-related techniques . The sensor-related technique gathers data from the glove produced by hand movement. In the vision-related technique, the image was considered by using cameras . This technique indulges the image qualities like texture part and coloring which was necessary to the specific hand gesture.
In , a comparative analysis of different gesture recognition methods including convolutional neural network (CNN) and machine learning (ML) procedures has been deliberated and verified for realistic performance. A hierarchical neural network, pre-trained VGG16 with fine-tuning, and VGG16 with transfer learning were analyzed according to a trained parameter count. The model was trained on a self-developed datasets comprising images of Indian Sign Language (ISL) representation of each twenty-six English alphabet. In , the authors shows that a later fusion technique to multi-modality in sign language detection increases the entire capacity of algorithm when compared to singular approach of Leap Motion data and image classification. Using a larger synchronous dataset of eighteen BSL gestures gathered from different subjects, two deep neural networks (DNNs) were compared and benchmarked for deriving an optimal topology. The Vision model was carried out by a CNN and optimized artificial neural network (ANN), and the Leap Motion mechanism is carried out using an evolutionary search of ANN model.
Kumar et al.  developed the usage of graph matching (GM) to allow three dimensional motion capture for Indian sign language detection. The sign recognition and classification problems to interpret three dimensional motion signs are taken into account an adoptive GM (AGM) problems. But, the existing model to solve an AGM issue have two most important disadvantages. Firstly, spatial matching is implemented on a number of frames with a definite set of nodes. Then, temporal matching splits the whole three dimensional datasets into a definite set of pyramids. In , the authors conducted a complete systematic mapping of translation-assisting techniques to the provided sign language. The mapping has regarded the primary guideline for systematic review that is, pertains software engineering because it is necessary to take responsible for multidisciplinary fields of education, accessibility, human computer communication, and natural language processing. A continuous improvement of software tools named SYstematic Mapping and Parallel Loading Engine (SYMPLE) enabled the construction and querying of a base set of candidate studies. Parvez et al.  related the gap among conventional teaching and the technology-based methods that are employed to teach arithmetical concepts. The participant was separated into developed mobile applications and conventional approaches (board and flash cards). The variance in the performance these groups is assessed by accompanying quizzes.
This study introduces a shark smell optimization with deep learning based automated sign language recognition (SSODL-ASLR) model for hearing and speaking impaired people. The presented SSODL-ASLR technique majorly concentrates on the recognition and classification of sign language provided by deaf and dumb people. The presented SSODL-ASLR model encompasses a two stage process namely sign language detection and sign language classification. At the first stage, the Mask Region based Convolution Neural Netwokr (Mask RCNN) model is exploited for sign language recognition. Secondly, SSO algorithm with soft margin support vector machine (SM-SVM) model was utilized for sign language classification. To assure the enhanced classification performance of the SSODL-ASLR model, a brief set of simulations was carried out.
In this study, a new SSODL-ASLR technique was introduced for the recognition of SL for hearing and speaking impaired people. The presented SSODL-ASLR technique majorly concentrates on the recognition and classification of sign language provided by deaf and dumb people. The presented SSODL-ASLR model encompasses a two stage process namely sign language detection and sign language classification.
In the first stage, the Mask RCNN model is exploited for sign language recognition. Mask RCNN was a DNN mainly focused on solving instance segmentation issues in computer vision or ML . In other words, it can separate different objects in an image or a video. The feature pyramid network (FPN) to object detection, a 1st block structure of Mask RCNN is accountable for extracting features. The regional proposal network (RPN), the 2nd part of Mask RCNN, and shares whole image convolutional features with detection network thus approximately assisting cost-free RPN. Once the extended Fast RCNN processes the Mask RCNN with added a branch to forecast an object mask from equivalent with the suggested branch to bound box recognition. The RPN is carried out from Mask RCNN instead of selective search and thus RPN shares the convolution features of entire map with detection network. It forecasts fused boundary location and objects score at each place, and it is fully convolution network (FCN). Based on the Mask RCNN, it presented an algorithm to improve the speed using side fusion feature pyramid network (SF-FPN) with Resnet-86 and enlighten the performance. In such cases, the dataset, FPN architecture, and RPN variable setting were improved. The improved method developed in such cases is appreciating the segmentation, recognition, and detection of target at the same time. Fig. 1 illustrates the infrastructure of Mask RCNN method.
During the multitasking, loss function is carried out by trainable Mask CNN with three segments namely classification locate regression loss of bounding box, loss of mask, and loss of bounding box as follows.
From the expression, indicates the notable probability for ROI in classifier loss as well as employed to ground truth as one that ROI was taken into account as zero or foreground. represents the vector of accurate control to identify bounding boxes and characterize the ground truth in location regression loss whereby denotes the loss function to estimate the regression mistake. Each ROI detects the outcomes of dimension via mask branch and encoded K mask alongside resolution of . The loss of mask is regarded by Average Binary Cross-entropy Loss to carry out sigmoid function on each pixel from ROI. In class , mask loss was illustrated in above equation.
In this study, the SM-SVM model is utilized for sign language classification. SM-SVM aim is to expand the Maximal Separating Margin SVM (hard margin SVM), such that the hyperplane enables a noisy dataset to exist. In the event, a variable , called Slack factor was presented to accountable for the number of violations of classifier .
The Primal Problem is defined by
The Dual Problem Function of soft margin was equated by
KKT complementary conditions in inseparable instance are represented as follows
From the expression, denotes the Lagrange Multipliers respective to that was presented for enforcing the non-negativity of Where derivative of Lagrange function for primitive problem regarding is zero, the computation of the derivative produces. Fig. 2 showcases the overview of SVM hyperplane.
As a result, the optimum weight is formulated by:
The optimum bias is attained using samples in the trainset set where and , as follows
To optimally modify the SM-SVM parameter values, the SSO algorithm has been exploited in this study. As the optimum hunter by their nature, the sharks are foraging nature that rotate and drives frontward which is very effectual from determining prey . An optimized approach for simulating shark foraging is most effectual optimized approach. To certain places, the shark transfers at speed to particles that take intensive scent, so initial velocity vector is formulated as:
The sharks take inertia once it swims, so velocity equation of every dimensional is provided under,
In which , and ; implies the count of dimensional; signifies the count of velocity vectors (size of shark populations); denotes the count of iterations; signifies the objective function; demonstrates the gradient co-efficient; implies the weighted co-efficient, therefore it can be random value amongst 0 and 1, as well & represented the 2 arbitrary values from range of zero’s and one’s. The speed of sharks is needed for preventing boundary and particular speed limitation as given below:
whereas denotes speed limit factors of iteration. The sharks take a new place because of moving forward, and is determined as the previous place and speed which is provided as:
In which denotes the time interval of iteration. Also the moving forward, sharks commonly rotate along its path for seeking strong scent particles and improve its direction of motion that is actual direction of moving.
The rotating shark moves from the closed range which could not fundamentally a circle. During the view of optimized, the shark executes local searching at all the phases to determine optimum candidate solutions. The searching equation to this place was provided under as:
whereas signifies the amount of points at all the phases of place searches; denotes the random value from the range of −1 and 1. If the shark determines a strong odor point from the rotation, it moved towards the point and endures the search direction. The position search procedure was provided from the subsequent formula,
As aforementioned, has been obtained in the linear motion and has reached in the rotational motion. The shark is choosing the candidate solution with higher computation index value as shark following place . For boosting the convergence rate of SSO technique, the OBL model was utilized and so improves the quality of primary population solutions. It discovers either opposite or original direction solutions. The opposite number x is represented as a real value from the range of . The opposite number of x is determined as :
The aforementioned formula endures normalized if it obtains executed to search region with many dimensional. For normalizing it, the search agent and the respective opposite solution is demonstrated as:
The value of every component from is represented as:
At this point, the fitness function can be . When the fitness value of opposite solutions surpass the original solution x, then ; else .
The proposed model is simulated using Python 3.6.5 tool. The proposed model is experimented on PC i5-8600k, GeForce 1050Ti 4GB, 16GB RAM, 250GB SSD, and 1TB HDD. In this section, the sign language recognition performance of the SSODL-ASLR model is tested using a dataset comprising 3600 samples with 36 classes as illustrated in Tab. 1. A few sample American Sign Language Code images are displayed in Fig. 3.
Fig. 4 implies the confusion matrix formed by the SSODL-ASLR model on the applied 70% of training (TR) data. The figure indicated that the SSODL-ASLR model has proficiently recognized all the 36 class labels on 70% of TR data.
The classifier output of the SSODL-ASLR model is derived on 70% of TR data in Tab. 2. The experimental outcomes demonstrated the SSODL-ASLR method has resulted in effectual outcomes over other models. For instance, on class label 1, the SSODL-ASLR model has provided of 99.29%. Moreover, on class label 10, the SSODL-ASLR model has offered of 99.09%. Eventually, on class label 20, the SSODL-ASLR algorithm has rendered of 99.01%. Meanwhile, on class label 36, the SSODL-ASLR algorithm has presented of 99.13%.
Fig. 5 exhibits average classifier outcomes of the SSODL-ASLR model on 70% of TR data. The figure reported that the SSODL-ASLR model has reached effectual classification outcomes with average of 99.05%, of 82.82%, of 99.51%, of 82.81%, and Mathew Correlation Coefficient (MCC) of 83.28%.
Fig. 6 portrays the confusion matrix formed by the SSODL-ASLR method on the applied 30% of testing (TS) data. The figure signifies the SSODL-ASLR technique has proficiently recognized all the 36 class labels on 30% of TS data.
The classifier output of the SSODL-ASLR approach is derived on 30% of TS data in Tab. 3. The experimental outcomes established the SSODL-ASLR technique has resulted in effectual outcomes over other models. For example, on class label 1, the SSODL-ASLR approach has offered of 98.24%. Further, on class label 10, the SSODL-ASLR approach has presented of 98.70%. Eventually, on class label 20, the SSODL-ASLR model has granted of 99.17%. At the same time, on class label 36, the SSODL-ASLR model has offered of 99.35%.
Fig. 7 displays an average classifier outcome of the SSODL-ASLR model on 30% of TS data. The figure reported that the SSODL-ASLR model has reached effectual classification outcomes with average of 98.97%, of 81.48%, of 99.47%, of 81.37%, and MCC of 80.91%.
The training accuracy (TA) and validation accuracy (VA) acquired by the SSODL-ASLR method on test dataset is established in Fig. 8. The experimental outcome inferred the SSODL-ASLR technique has achieved maximal values of TA and VA. Predominantly the VA is greater than TA.
The training loss (TL) and validation loss (VL) attained by the SSODL-ASLR approach on test dataset are established in Fig. 9. The experimental outcome implied the SSODL-ASLR algorithm has exhibited least values of TL and VL. In specific, the VL is lesser than TL.
Tab. 4 exemplifies a comparative assessment of the SSODL-ASLR model with recent models . Fig. 10 reports a brief comparison study of the SSODL-ASLR model with latest methods interms of . The results pointed out that the SSODL-ASLR model has offered higher of 98.97% whereas the SVM, decision tree (DT), k-nearest neighbour (KNN), DNN, LeNet, and multilayer perceptron (MLP) models have exhibited ineffectual performance with lower values.
The results highlight the SSODL-ASLR model has granted higher and of 81.48% and 99.47% whereas the SVM, DT, KNN, DNN, LeNet, and MLP models have displayed ineffectual performance with lower and values. These results and discussion reported that the SSODL-ASLR model has shown effectual performance over other ML and DL models.
In this study, a new SSODL-ASLR technique was introduced for the recognition of sign language for hearing and speaking impaired people. The presented SSODL-ASLR technique majorly concentrates on the recognition and classification of sign language provided by deaf and dumb people. The presented SSODL-ASLR model encompasses a two stage process namely sign language detection and sign language classification. Primarily, the Mask RCNN model is exploited for sign language recognition. In the next stage, the SSO algorithm with SM-SVM model is utilized for sign language classification. To assure the enhanced classification performance of the SSODL-ASLR model, a brief set of simulations was carried out. The extensive results portrayed the supremacy of the SSODL-ASLR model over other techniques.
Funding Statement: The author received no specific funding for this study.
Conflicts of Interest: The author declares that he has no conflicts of interest to report regarding the present study.
- M. A. Ahmed, B. B. Zaidan, A. A. Zaidan, M. M. Salih and M. M. bin Lakulu, “A review on systems-based sensory gloves for sign language recognition state of the art between 2007 and 2017,” Sensors, vol. 18, no. 7, pp. 2208, 2018.
- M. Friedner, “Sign language as virus: Stigma and relationality in urban India,” Medical Anthropology, vol. 37, no. 5, pp. 359–372, 2018.
- X. Jiang, S. C. Satapathy, L. Yang, S. -H. Wang and Y. -D. Zhang, “A survey on artificial intelligence in Chinese sign language recognition,” Arabian Journal for Science and Engineering, vol. 45, no. 12, pp. 9859–9894, 2020.
- V. S. Pineda and J. Corburn, “Correction to: Disability, urban health equity, and the coronavirus pandemic: Promoting cities for all,” Journal of Urban Health, vol. 98, no. 2, pp. 308–308, 2021.
- M. Al Duhayyim, H. Mesfer Alshahrani, F. N. Al-Wesabi, M. Abdullah Al-Hagery, A. Mustafa Hilal et al., “Intelligent machine learning based EEG signal classification model,” Computers, Materials & Continua, vol. 71, no. 1, pp. 1821–1835, 2022.
- P. Kumar, R. Saini, P. P. Roy and D. P. Dogra, “A position and rotation invariant framework for sign language recognition (SLR) using Kinect,” Multimedia Tools and Applications, vol. 77, no. 7, pp. 8823–8846, 2018.
- S. A. Qureshi, S. E. Raza, L. Hussain, A. A. Malibari, M. K. Nour et al., “Intelligent ultra-light deep learning model for multi-class brain tumor detection,” Applied Sciences, 2022. https://doi.org/10.3390/app12083715.
- N. S. Khan, A. Abid and K. Abid, “A novel natural language processing (nlp)–based machine translation model for English to Pakistan sign language translation,” Cognitive Computation, vol. 12, no. 4, pp. 748–765, 2020.
- M. Khari, A. K. Garg, R. G. Crespo and E. Verdú, “Gesture recognition of rgb and rgb-d static images using convolutional neural networks,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 7, pp. 22, 201
- S. Shivashankara and S. Srinath, “American sign language recognition system: An optimal approach,” International Journal of Image, Graphics and Signal Processing, vol. 11, no. 8, pp. 18, 2018.
- J. G. Ruiz, C. M. T. González, A. T. Fettmilch, A. P. Roescher, L. E. Hernández et al., “Perspective and evolution of gesture recognition for sign language: A review,” Sensors, vol. 20, no. 12, pp. 3571, 2020.
- N. B. Ibrahim, H. H. Zayed and M. M. Selim, “Advances, challenges and opportunities in continuous sign language recognition,” Journal of Engineering and Applied Science, vol. 15, no. 5, pp. 1205–1227, 2019.
- A. Sharma, N. Sharma, Y. Saxena, A. Singh and D. Sadhya, “Benchmarking deep neural network approaches for Indian sign language recognition,” Neural Computing and Applications, vol. 33, no. 12, pp. 6685–6696, 2021.
- J. J. Bird, A. Ekárt and D. R. Faria, “British sign language recognition via late fusion of computer vision and leap motion with transfer learning to American sign language,” Sensors, vol. 20, no. 18, pp. 5151, 2020.
- D. A. Kumar, A. S. C. S. Sastry, P. V. V. Kishore and E. K. Kumar, “Indian sign language recognition using graph matching on 3D motion captured signs,” Multimedia Tools and Applications, vol. 77, no. 24, pp. 32063–32091, 2018.
- L. N. Zeledón, J. Peral, A. Ferrández and M. C. Rivas, “A systematic mapping of translation-enabling technologies for sign languages,” Electronics, vol. 8, no. 9, pp. 1047, 2019.
- K. Parvez, M. Khan, J. Iqbal, M. Tahir, A. Alghamdi et al., “Measuring effectiveness of mobile application in learning basic mathematical concepts using sign language,” Sustainability, vol. 11, no. 11, pp. 3064, 2019.
- I. P. Borrero, D. M. Santos, M. J. V. Vazquez and M. E. G. Arias, “A new deep-learning strawberry instance segmentation methodology based on a fully convolutional neural network,” Neural Computing & Applications, vol. 33, no. 22, pp. 15059–15071, 2021.
- H. Liang, H. Bai, N. Liu and X. Sui, “Polarized skylight compass based on a soft-margin support vector machine working in cloudy conditions,” Applied Optics, vol. 59, no. 5, pp. 1271, 2020.
- L. Wang, X. Wang, Z. Sheng and S. Lu, “Multi-objective shark smell optimization algorithm using incorporated composite angle cosine for automatic train operation,” Energies, vol. 13, no. 3, pp. 714, 20
- T. Chong and B. Lee, “American sign language recognition using leap motion controller with machine learning approach,” Sensors, vol. 18, no. 10, pp. 3554, 2018.