Indian Sign Language Alphabet Recognition and Speech Synthesis Using a Hybrid Deep Learning Approach

Authors

  • Aswani Sivan, Bharathiar University, India
  • Chandra Eswaran, Bharathiar University, India

Abstract

Indian Sign Language (ISL) is vital for communication among India’s hearing-impaired community. However, the lack of standardised datasets and reliable recognition frameworks has hampered the use of ISL in modern assistive technology. This paper presents a deep learning-based approach to robust ISL alphabet recognition, with an emphasis on both accuracy and practical use. A curated dataset of static ISL alphabet signs was created by combining authoritative visual references from the official Indian Sign Language website and the Ramakrishna Mission Vivekananda Educational and Research Institute (RKMVERI). Multiple deep learning models were trained and evaluated, including a CNN, ResNet-50, DenseNet-121, VGG16, MobileNetV2, and EfficientNet-B0; a new hybrid CNN-ResNet architecture outperformed all of these baselines, achieving 98% classification accuracy. Furthermore, the framework is extended to support real-time applications, combining webcam-based capture with immediate conversion of recognized signs to text and synthesized speech. A comprehensive performance evaluation, including confusion matrix analysis and ROC curves, demonstrates the solution’s robustness and practical applicability. By enabling accurate, real-time ISL recognition with voice feedback, this research enhances accessibility, promotes inclusive education, and paves the way for scalable sign language translation systems in real-world human-machine interaction scenarios.
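
The confusion matrix analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the three-letter label set, and the toy predictions below are assumptions chosen only to show how a confusion matrix, overall accuracy, and per-class recall relate to one another.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Overall accuracy: diagonal (correct) counts divided by all counts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def per_class_recall(matrix, labels):
    """Recall per class: correct predictions for a class over all its true samples."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else 0.0
    return recall


# Hypothetical toy data: three alphabet classes, six test samples.
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "C", "C", "C"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm)
print(accuracy(cm))
print(per_class_recall(cm, labels))
```

In this toy case one "B" sample is misread as "C", which appears as an off-diagonal entry in the "B" row; inspecting such entries is how a confusion matrix reveals which signs a classifier tends to mix up.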

Keywords:

Indian Sign Language, deep learning, CNN–ResNet hybrid model, real-time recognition, text-to-speech, assistive technology
