Open Access

ARTICLE

Syntactic and Socially Responsible Machine Translation: A POS and DEP Integrated Framework for English–Tamil

Rama Sugavanam*, Mythili Ramu
Department of Information Technology, School of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram Campus, Chennai, India
* Corresponding Author: Rama Sugavanam. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.071469

Received 06 August 2025; Accepted 06 January 2026; Published online 28 January 2026

Abstract

English-to-Tamil Neural Machine Translation (NMT) faces several challenges due to Tamil’s rich morphology, free word order, and limited annotated corpora. Although existing transformer-based models offer strong baselines, they fall short on syntactic awareness and on detecting and managing offensive content in noisy, informal text. In this paper, we present POSDEP-Offense-Trans, a multi-task NMT framework that combines Part-of-Speech (POS) tagging and Dependency Parsing (DEP) with a robust offensive language classification module. Our architecture enriches the Transformer encoder with syntax-aware embeddings and syntax-guided attention mechanisms. It also incorporates a structure-aware contrastive loss that reinforces syntactic consistency, and it deploys auxiliary classification heads for POS tagging, dependency parsing, and multi-class offensive language detection. The offensive language classifier operates at both the sentence and token levels and is guided by syntactic features and formal finite-automata rules that model offensive language structures: hate speech, profanity, sarcasm, and threats. Using this architecture, we construct a syntactically enriched, socially annotated corpus. Experimental results show improvements in translation quality, with a BLEU score of 33.5, UAS/LAS parsing accuracies of 92.4% and 90%, and a 4.5% F1-score gain in offensive content detection compared with baseline POS + DEP + Offense models. The proposed model also achieved 92.3% in offensive content neutralization, as confirmed by ablation studies. This comprehensive English–Tamil NMT model unifies syntactic modelling and ethical filtering, laying the groundwork for applications in social media moderation, hate speech mitigation, and policy-compliant multilingual content generation.
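As a rough illustration of what "syntax-aware embeddings" could mean in practice, the sketch below adds a POS-tag embedding and a dependency-label embedding to each token embedding before the result would enter the encoder. The abstract does not say how the three signals are combined, so element-wise summation is an assumption here, and the names `make_table` and `syntax_aware_embed`, the table sizes, and the random initialisation are all hypothetical.

```python
import random


def make_table(size, dim, seed):
    """Build a hypothetical embedding table: `size` rows of `dim` floats."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]


def syntax_aware_embed(token_ids, pos_ids, dep_ids,
                       tok_table, pos_table, dep_table):
    """Return one vector per token: token + POS + DEP embeddings.

    Summation is an assumption made for this sketch; concatenation or
    gating would be equally plausible ways to combine the three signals.
    """
    if not (len(token_ids) == len(pos_ids) == len(dep_ids)):
        raise ValueError("token, POS, and DEP sequences must align")
    combined = []
    for t, p, d in zip(token_ids, pos_ids, dep_ids):
        combined.append([
            a + b + c
            for a, b, c in zip(tok_table[t], pos_table[p], dep_table[d])
        ])
    return combined


# Toy vocabulary sizes and dimension, chosen only for the example.
tok_table = make_table(size=10, dim=4, seed=0)
pos_table = make_table(size=5, dim=4, seed=1)
dep_table = make_table(size=6, dim=4, seed=2)

# A three-token sentence with aligned POS and dependency label ids.
embedded = syntax_aware_embed([1, 2, 3], [0, 1, 2], [3, 4, 5],
                              tok_table, pos_table, dep_table)
```

Summation keeps the encoder input dimension unchanged, which is why it is a common choice for injecting extra token-level features; it should not be read as the authors' confirmed design.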

Keywords

POS-aware NMT; dependency parsing; syntax-guided attention; multi-task learning; offensive language detection; offensive language neutralization; English–Tamil neural machine translation