Effective Token Masking Augmentation Using Term-Document Frequency for Language Model-Based Legal Case Classification

Ye-Chan Park¹, Mohd Asyraf Zulkifley², Bong-Soo Sohn³, Jaesung Lee⁴,*
1 Department of Artificial Intelligence, Chung-Ang University, Seoul, 06974, Republic of Korea
2 Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Bangi, 43600, Malaysia
3 School of Computer Science and Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
4 AI/ML Innovation Research Center, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
* Corresponding Author: Jaesung Lee. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.074141

Received 03 October 2025; Accepted 20 November 2025; Published online 19 December 2025

Abstract

Legal case classification assigns legal documents to predefined categories, which facilitates legal information retrieval and case management. However, real-world legal datasets often suffer from class imbalance because case types are unevenly distributed across legal domains. This imbalance biases model performance: accuracy is high for overrepresented categories and poor for minority classes. To address this issue, we propose a data augmentation method that selectively masks unimportant terms within a document while preserving the terms that are key from the perspective of the legal domain. This approach increases data diversity and improves the generalization capability of conventional models. In our experiments, the proposed augmentation strategy consistently improved accuracy and F1 score across all evaluated models, validating its effectiveness for legal case classification.
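
The abstract does not spell out how term importance is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a plain TF-IDF score as the term-document frequency measure, whitespace-style pre-tokenized input, and a BERT-style `[MASK]` placeholder; the function names and the `mask_ratio` and `keep_ratio` parameters are hypothetical and introduced here for illustration.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"  # assumed placeholder token (BERT-style); not specified in the abstract


def tfidf_scores(docs):
    """Return one {token: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores


def mask_unimportant(doc, doc_scores, mask_ratio=0.3, keep_ratio=0.5, rng=None):
    """Mask tokens sampled only from the lowest-scoring positions.

    keep_ratio: fraction of top-scoring positions that are never masked (assumption).
    mask_ratio: fraction of all positions to replace with MASK (assumption).
    """
    rng = rng or random.Random(0)
    # Positions sorted from least to most important.
    order = sorted(range(len(doc)), key=lambda i: doc_scores[doc[i]])
    n_candidates = len(doc) - int(len(doc) * keep_ratio)
    candidates = order[:n_candidates]
    n_mask = min(len(candidates), int(len(doc) * mask_ratio))
    chosen = set(rng.sample(candidates, n_mask))
    return [MASK if i in chosen else tok for i, tok in enumerate(doc)]


docs = [
    ["the", "court", "held", "the", "contract", "void"],
    ["the", "court", "dismissed", "the", "appeal"],
    ["the", "tenant", "breached", "the", "lease"],
]
scores = tfidf_scores(docs)
augmented = mask_unimportant(docs[0], scores[0], rng=random.Random(1))
print(augmented)
```

Calling `mask_unimportant` repeatedly with different random generators on documents from a minority class would yield several masked variants of each document, which is one way such an augmentation could be used to rebalance a training set. Sampling from the low-scoring candidates, rather than always masking the same lowest-scoring tokens, is a design choice made here so that the variants differ from one another.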

Keywords

Legal case classification; class imbalance; data augmentation; token masking; legal NLP
