Automating the Initial Development of Intent-Based Task-Oriented Dialog Systems Using Large Language Models: Experiences and Challenges
Ksenia Kharitonova1, David Pérez-Fernández2, Zoraida Callejas1,3, David Griol1,3,*
1 Department of Software Engineering, University of Granada, Granada, Spain
2 Department of Mathematics, Universidad Autónoma de Madrid, Madrid, Spain
3 Research Centre for Information and Communication Technologies (CITIC-UGR), University of Granada, Granada, Spain
* Corresponding Author: David Griol. Email:
(This article belongs to the Special Issue: Security and Robustness of Large Language Models (LLMs))
Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.075777
Received 08 November 2025; Accepted 30 December 2025; Published online 14 January 2026
Abstract
Building reliable intent-based, task-oriented dialog systems typically requires substantial manual effort: designers must derive intents, entities, responses, and control logic from raw conversational data, then iterate until the assistant behaves consistently. This paper investigates how far large language models (LLMs) can automate that development. We use two reference corpora, Let's Go (English, public transport) and MEDIA (French, hotel booking), to prompt four LLM families (GPT-4o, Claude, Gemini, Mistral Small) and generate the core specifications required by the Rasa platform: intent sets with example utterances, entity definitions with slot mappings, response templates, and basic dialog flows. To structure this process, we introduce a model- and platform-agnostic pipeline with two phases. The first normalizes and validates the LLM-generated artifacts, enforcing cross-file consistency and making slot usage explicit. The second uses a lightweight dialog harness that runs scripted tests and incrementally patches failure points until conversations complete reliably. Across eight projects, all models required some targeted repairs before training. After applying our pipeline, all projects reached ≥70% task completion (many above 84%), while NLU performance ranged from macro-F1 scores in the mid-0.60s to 1.0, depending on domain breadth. These results show that, with modest guidance, current LLMs can produce workable end-to-end dialog prototypes directly from raw transcripts. Our main contributions are: (i) a reusable bootstrap method aligned with industry domain-specific languages (DSLs), (ii) a small set of high-impact corrective patterns, and (iii) a simple but effective harness for closed-loop refinement across conversational platforms.
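To make the first pipeline phase concrete, the following is a minimal sketch of a cross-file consistency check over LLM-generated Rasa artifacts. The file layout (domain.yml, data/nlu.yml) follows Rasa's conventions, but the specific checks shown are illustrative assumptions, not the authors' exact implementation.

```python
"""Sketch of phase 1: normalize LLM-generated Rasa artifacts and flag
cross-file inconsistencies. Illustrative, not the authors' actual code."""
import re
import yaml

def load(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}

domain = load("domain.yml")   # intents, slots, responses
nlu = load("data/nlu.yml")    # intents with example utterances

# Intents with training examples must be declared in the domain, and vice
# versa; LLM outputs frequently drift between the two files.
declared = {i if isinstance(i, str) else next(iter(i))
            for i in domain.get("intents", [])}
trained = {item["intent"] for item in nlu.get("nlu", []) if "intent" in item}

for intent in sorted(trained - declared):
    print(f"PATCH: intent '{intent}' has examples but is absent from domain.yml")
for intent in sorted(declared - trained):
    print(f"PATCH: intent '{intent}' is declared but has no training examples")

# Slots interpolated in response templates ({slot_name}) must exist in the
# slot inventory, otherwise rendering fails at runtime.
slots = set(domain.get("slots", {}))
for name, variants in domain.get("responses", {}).items():
    for variant in variants:
        for slot in re.findall(r"\{(\w+)\}", variant.get("text", "")):
            if slot not in slots:
                print(f"PATCH: response '{name}' uses undefined slot '{slot}'")
```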
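The second phase can likewise be summarized as a small closed loop. The sketch below assumes Rasa's default behavior of writing failed test stories to results/failed_test_stories.yml; MAX_ROUNDS and apply_patch are hypothetical names standing in for the paper's corrective patterns.

```python
"""Sketch of phase 2: rerun scripted conversations and patch failure
points until they pass. Illustrative, not the authors' actual harness."""
import subprocess
from pathlib import Path

MAX_ROUNDS = 5  # assumed cap on repair iterations

def failed_stories() -> str:
    """Train the assistant, run the scripted test stories, and return any
    failure report Rasa produced (empty string if all stories passed)."""
    subprocess.run(["rasa", "train"], check=True)
    subprocess.run(["rasa", "test", "core", "--stories", "tests/"], check=False)
    report = Path("results/failed_test_stories.yml")
    return report.read_text(encoding="utf-8") if report.exists() else ""

def apply_patch(failures: str) -> None:
    """Placeholder for the high-impact corrective patterns, e.g. adding a
    missing slot mapping or response template at the reported failure."""
    raise NotImplementedError

for _ in range(MAX_ROUNDS):
    failures = failed_stories()
    if not failures:
        break  # all scripted conversations complete reliably
    apply_patch(failures)
```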
Keywords
Task-oriented dialog systems; large language models (LLMs); Rasa; dialog automation; natural language understanding (NLU); slot filling; conversational AI; human-in-the-loop NLP