Open Access
ARTICLE
TATA: A Trust-Aware Task-Oriented Agent Framework for Industrial Intelligence Scenarios
1 School of Computer Science and Engineering, Northeastern University, Shenyang, China
2 Neusoft Research Institute, Neusoft Group, Shenyang, China
3 School of Information Science and Engineering, Shenyang Ligong University, Shenyang, China
* Corresponding Author: Yingyou Wen. Email:
Computers, Materials & Continua 2026, 88(2), 78 https://doi.org/10.32604/cmc.2026.083087
Received 29 March 2026; Accepted 14 May 2026; Issue published 15 June 2026
Abstract
The rapid advancement of edge intelligence in Industrial Internet of Things (IIoT) is transforming human–computer interaction from conventional “command execution” to complex “human–AI deep collaboration”. Within such safety-critical industrial environments, establishing robust mutual understanding and trust mechanisms becomes a significant prerequisite for decision reliability and efficiency. However, existing industrial interaction systems predominantly focus on task progression and explicit command responses, lacking fine-grained, dynamic tracking of operators’ trust states, cognitive evolution, and behavioral dynamics. Moreover, current LLM-based user simulators used in evaluation often exhibit an “over-cooperation” bias, failing to capture the cognitive conflicts and trust crises characteristic of high-pressure, high-risk industrial conditions. To address these challenges, we first propose a trust-aware user behavior model, which utilizes an LLM-parameterized Hidden Markov Model (HMM) to formalize collaborative trust as a dynamic latent variable, thereby structurally characterizing the psychological and behavioral dynamics of operators across multi-turn interactions. Building on this, we introduce TATA, a task-oriented agent framework integrating trust-awareness and cognitive alignment. Through a dual-track state monitoring mechanism and adaptive interaction policy coordination, TATA effectively advances collaborative tasks and fosters relationship maintenance in realistic collaborative environments. Comprehensive evaluations on six industrial task scenarios demonstrate that TATA achieves an optimal balance between collaboration depth and task efficiency, outperforming the strongest baseline by achieving 1.6 to 2.6 times higher collaboration efficiency and an absolute increase of over 15 percentage points in task completion rate.
These findings provide valuable insights for developing resilient and adaptive deep human-AI collaboration tailored to IIoT scenarios.
The rapid convergence of AI and the Industrial Internet of Things (IIoT) is fundamentally transforming modern industrial systems, enabling intelligent monitoring, predictive maintenance, and data-driven decision-making across smart factories, power grids, and logistics networks [1]. At the center of this transformation, AI agents powered by Large Language Models (LLMs) [2,3] have emerged as a critical operational engine, supporting complex industrial decision-making and task execution through natural language interaction [4]. This shift is driving a transition in human–machine interaction (HMI) paradigms, moving from traditional rule-based command matching toward conversational human–AI collaboration (HAC) systems. Under this emerging interaction mode, operators can engage in multi-turn dialogue with AI agents to collaboratively address highly specialized industrial tasks through iterative communication rather than one-shot instruction execution [2,3,5].
Nevertheless, the effective operation of current AI-driven industrial collaboration systems fundamentally relies on the assumption that operators are able to provide clear and sufficiently detailed task specifications. This prerequisite often fails in dynamic industrial scenarios that require exploratory problem-solving and domain expertise [6]. In practice, when confronted with unexpected situations, industrial operators can typically observe only preliminary symptoms of a problem. Despite possessing extensive tacit experience, they often struggle to fully articulate precise system requirements, potential constraints, and underlying logic at the early stages of a task [7].
Consider a real-world industrial scenario: an operator is troubleshooting an unexpected temperature anomaly in a chemical reactor. Under the traditional command-execution interaction mode, the operator must diagnose the exact cause by themselves and issue explicit commands (e.g., “Check the status of cooling valve V-102.”). This places the entire cognitive burden on the human operator, who may not have access to complete real-time monitoring data. In contrast, in a collaborative dialogue scenario, the AI agent proactively engages in joint problem-solving. It can ask exploratory questions (e.g., “Has the coolant flow rate changed in the past hour?”) and adjust its subsequent dialogue policy based on the operator’s responses (e.g., “I’m not sure if that’s the issue, but I recall encountering a similar situation last month.”). This collaborative mode reduces the operator’s cognitive load and builds problem localization and shared understanding through multi-turn interactions, capabilities that are critically important in industrial edge computing environments with extremely high requirements for safety and real-time performance. If an AI agent engages in “premature execution” or makes misaligned decisions based on incomplete information, it will not only waste the already limited computational and communication resources but also trigger cascading operational risks. Ultimately, this form of ineffective (or even harmful) interaction can rapidly erode operators’ trust in AI systems [8,9], thereby directly undermining the adoption and effectiveness of HAC decision-making.
To address these challenges, recent research has begun to explore deep human–AI collaboration mechanisms from a cognitive science perspective [6,7]. Based on the Common Ground theory [10,11], Li et al. proposed the Cognitive Collaborative Dialogue (CCD), a task that emphasizes that the user’s understanding of the problem and the cognition of the solution co-evolve through the interaction process between both parties. Through collaborative dialogue, AI agents can gradually guide users to jointly construct a shared understanding of the problem space and ultimately generate customized solutions. This insight aligns closely with the requirements of IIoT scenarios. In complex industrial decision-making, it is often necessary to deeply integrate AI-analyzed data insights and collected industrial environment information with the tacit expertise of human experts. This “cognitive alignment” is a gradual process that requires the continuous establishment of mutual trust and consensus, rather than a one-time execution of instructions.
Although the CCD framework has demonstrated effectiveness in collaborative interaction and task completion at a macro level, existing approaches lack comprehensive modeling and fine-grained tracking of the dynamic trust and behavioral evolution of both humans and agents during dialogue. Specifically, current practices in evaluating multi-turn interaction systems and constructing user simulators predominantly rely on mechanical behavior switching [12], predefined information disclosure strategies [13], or static persona templates [14], making it difficult to capture the psychological dynamics of real human users as they evolve with content and collaboration depth throughout the conversation. More critically, influenced by reinforcement learning from human feedback (RLHF) during model training, LLM-driven user simulators often exhibit an excessively cooperative attitude [15], tending to accommodate any guidance from the agent and rarely displaying genuine resistance, skepticism, or emotional fluctuations even when cognitive conflicts or trust decline occur. This idealized simulated behavior obscures the trust crises and collaboration barriers that agents may encounter in real-world applications. Simultaneously, AI agents themselves lack explicit mechanisms to perceive and respond to cognitive shifts (e.g., cognitive evolution, adjustment, or conflict) during each interaction turn. Under such settings, it becomes difficult to simulate authentic HAC processes, ultimately leading to overestimated evaluation results and creating “fake success” in collaborative effectiveness.
To bridge this critical gap, we address both user modeling and agent design, aiming to achieve robust HAC in complex industrial decision-making environments. As shown in Fig. 1, we first propose a trust-aware user behavior model. This method formalizes the user’s “collaborative trust level” during dialogue as a dynamic latent variable within an LLM-parameterized Hidden Markov Model (HMM), thereby structurally characterizing the psychological and behavioral dynamics encountered in real-world interactions. Based on these insights, we present TATA, a task-oriented agent framework integrating trust-awareness and cognitive alignment. TATA comprises three core modules: a dialogue trajectory planner, a dual-track state monitor, and an adaptive interaction coordinator. The planner forecasts future dialogue trajectories; the monitor jointly assesses task-level cognitive development and relationship-level affective vigilance signals. Guided by a partially observable Markov decision process (POMDP), the coordinator leverages these dual-track observations to dynamically adjust interaction policies, balancing task advancement and trust maintenance throughout the collaboration process. We additionally introduce a set of targeted trust dynamics evaluation metrics for more fine-grained assessment of HAC quality throughout the interaction.

Figure 1: Overview of the whole interaction framework. (1) Trust-aware user modeling (left): using an LLM-driven HMM to model users’ trust dynamics and behavior based on internal state and dialogue history; (2) TATA framework (right): featuring a dual-track state monitor for tracking task and emotional states, and a POMDP-based coordinator to update belief states and select optimal policies for response generation and trajectory regulation.
In summary, our main contributions are as follows:
• We propose a trust-aware user behavior modeling approach that represents collaborative trust as a dynamic latent variable, establishing a rigorous and practically grounded methodological basis for evaluating HAC in industrial decision-making scenarios.
• We develop the TATA agent framework, which integrates dual-track information fusion and adaptive policy regulation mechanisms to enhance collaboration capabilities in dynamic industrial environments.
• We introduce a process-oriented trust dynamics evaluation system, offering new insights into the deployment of trustworthy AI collaboration systems in industrial intelligence environments.
2.1 Task-Oriented Dialogue Agents and Industrial Applications
Task-oriented dialogue systems are transitioning from traditional pipeline architectures to LLM-driven agents [16,17]. Current studies primarily leverage LLMs to enhance task planning [18,19], dialogue state tracking [20], and tool invocation [21,22] to improve execution efficiency. However, most such agents typically employ self-iterative unidirectional mechanisms that lack user engagement, rendering them susceptible to failure when user information proves insufficient or ambiguous [23]. Consequently, proactive agents [24] have emerged as a prominent research focus. Moving beyond passive response patterns, these agents optimize interactions by actively clarifying ambiguities, guiding the conversational flow, and preemptively proposing solutions. Representative approaches include specialized model training [25], Chain-of-Thought (CoT) reasoning [26], and knowledge graph inference [27].
Regarding practical applications, the capabilities of LLMs and agents have catalyzed an expansion of task-oriented dialogue agents from everyday domains to complex industrial production [28]. Under the “Human-Centric” paradigm of Industry 5.0, human-machine interaction within IIoT environments is evolving from conventional unidirectional static command parsing toward dynamic, mixed-initiative decision-making and deep collaboration [29,30]. To address the challenges posed by high-noise, highly dynamic environments with stringent fault-tolerance costs, recent research prioritizes intent recognition and mixed-initiative dialogue mechanisms in complex scenarios to enable more flexible task allocation and execution [31–33]. For instance, a recent study proposed a mixed-initiative dialogue system that dynamically assesses human collaborative willingness based on historical interactions to optimize human-machine task allocation [33]. Additionally, for high-risk industrial collaborative scenarios, researchers have introduced dialogue-based explainable safety mechanisms designed to communicate underlying safety constraints and operational logic transparently to operators, thereby maintaining shared situational awareness during task interruption and recovery [34].
Nevertheless, existing proactive interaction strategies exhibit significant limitations. Most studies still conceptualize humans as static instruction sources or assume that users exhibit perfect cooperation during evaluation [33]. Consequently, the proactivity of current systems remains largely confined to few-turn information acquisition and slot filling, lacking mechanisms for dynamic trust tracking, goal negotiation, and cognitive alignment over extended task durations [35]. These constraints substantially limit their practical applicability in authentic industrial intelligence scenarios.
User simulation serves as a critical technique for generating interactive dialogue data, widely employed for evaluating and training conversational agents [36]. Compared with conventional user simulators, LLM-based approaches exhibit superior cross-domain adaptability and behavioral diversity [37,38]. LLMs emulate users through persona simulation [14], role-playing [39], and individual behavior modeling [40]. Typical research approaches include prompt engineering [41], retrieval-augmented generation [42], efficient fine-tuning [43], and reinforcement learning with preference optimization [44]. However, most of these methods rely on static user profiles or are confined to the task objectives themselves, overlooking the dynamic cognitive processes and collaborative participation mechanisms inherent in human-AI interaction. Furthermore, existing LLM-based user simulators may inherently exhibit an “over-cooperation” or “sycophancy” bias [15,45], tending to overly accommodate the other party and avoid cognitive conflicts. In user simulation scenarios for evaluating human-AI collaboration, an overly compliant simulator fails to reproduce the cognitive conflicts, skepticism, or trust crises that may arise during real collaborative processes, leading to inflated evaluation results. This is the problem that our proposed approach explicitly addresses.
As a significant extension of traditional Task-Oriented Dialogue (TOD) in complex scenarios, Cognitive Collaborative Dialogue (CCD) is proposed to address tasks characterized by high exploration, professional expertise, and personalization [6]. Unlike conventional TOD systems that provide predetermined answers or mechanically execute “slot-filling” queries, CCD necessitates multi-turn progressive interactions between agents and users to co-construct shared understanding of underspecified problems [13].
Formally, a CCD task with
However, a notable challenge exists in basic CCD task settings: users’ internal trust states and cognitive evolution are implicit and unobservable. As a result, existing frameworks tend to advance tasks in a mechanical manner, lacking mechanisms to perceive and resolve implicit cognitive conflicts. More importantly, most existing evaluation environments rely on static user simulators that may exhibit a tendency toward “over-cooperation” [15], further obscuring the trust crises that could arise in real HAC.
Therefore, to authentically assess and address these issues, it is imperative to first model real user behaviors dynamically within the CCD context (see Section 3.2), and subsequently design agent frameworks capable of operating effectively in such complex and dynamic environments (see Section 3.3). An overview of the trust-aware human-AI collaborative dialogue framework is shown in Fig. 1.
3.2 Trust-Aware User Behavior Modeling
Effective collaboration relies on the co-evolution of cognitive states between dialogue participants and the establishment of trust in this process [10,46]. However, conventional user modeling approaches often oversimplify user behavior by employing static persona templates or random information disclosure strategies, neglecting the reality that user actions in real interactions are dynamic manifestations of evolving internal trust and cognitive states [7,13,14].
In highly exploratory collaboration scenarios, “trust” is not a static value but a complex cognitive process that dynamically evolves through multidimensional interactions [47]. Based on Trajectory Epistemic Network Analysis (T-ENA) conducted on real human subjects interacting with conversational agents [48], Li et al. demonstrated that human trust manifests interdependently through analytic processes (e.g., systematic evaluation and scrutiny) and affective processes (e.g., emotional experiences).
Drawing on these theoretical and empirical foundations, we propose a trust-aware user behavior modeling approach (Fig. 1-Left) that formalizes the user’s “cognition–trust–behavior” process during multi-turn collaboration as an LLM-parameterized Hidden Markov Model (HMM), denoted by
Notably, distinct from trustworthiness as an intrinsic property of AI models [49], collaborative trust in this study is defined as the operator’s dynamic, latent cognitive-affective state. This state reflects their willingness to rely on the AI agent, disclose task-relevant information to it, and follow its guidance during multi-turn collaborative problem-solving in industrial scenarios. Collaborative trust is a relational construct that co-evolves through the interaction process. Formally, it is defined as a discrete hidden state variable
• Negative Collaboration (
• Neutral Collaboration (
• Positive Collaboration (
While the classic HMM assumes a time-invariant state transition matrix (
This conditional transition function is approximated deterministically by the LLM according to the following criteria:
• Trust Increase: The agent accurately uncovers implicit elements within
• Trust Decrease: The agent exhibits forgetfulness (e.g., repeatedly asking for information already provided), logical inconsistencies, or makes suggestions that conflict with previously established user preferences.
• Trust Maintenance: Routine transitional exchanges or stable information sharing.
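The three criteria above can be read as a small deterministic transition rule over the three collaboration levels. The following is a minimal sketch under stated assumptions: the signal labels (`INCREASE`, `MAINTAIN`, `DECREASE`), the one-level step size, and the clamping at the boundaries are illustrative choices, not the paper's specification, and in the actual model the judgement is produced by the LLM rather than passed in as a label.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Three collaboration levels used as the hidden trust state."""
    NEGATIVE = 0
    NEUTRAL = 1
    POSITIVE = 2


# Hypothetical labels for the judgement made about the agent's last turn.
INCREASE, MAINTAIN, DECREASE = "increase", "maintain", "decrease"

# Assumed step size per signal (one level); the real value is inferred by the LLM.
STEP = {INCREASE: +1, MAINTAIN: 0, DECREASE: -1}


def next_trust(current: Trust, signal: str) -> Trust:
    """Deterministic transition: shift by the signal's step and clamp to the valid range."""
    shifted = int(current) + STEP[signal]
    clamped = max(int(Trust.NEGATIVE), min(int(Trust.POSITIVE), shifted))
    return Trust(clamped)
```

For example, `next_trust(Trust.NEUTRAL, DECREASE)` yields `Trust.NEGATIVE`, while a further `DECREASE` leaves the state at `Trust.NEGATIVE` because of the clamp.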
The LLM infers the value
Furthermore, existing research [48,51,52] indicates a systematic decoupling between latent trust states and explicit behavioral manifestations due to the influence of cognitive load, emotion regulation, or habitual communication patterns. Therefore, we introduce a parametric noise model for the emission process linking the user’s intrinsic trust states to their overt behaviors. With probability
where
In summary, the complete dialogue process with
This generative process yields a complete user utterance sequence
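To make the generative process concrete, the sketch below rolls a latent trust chain forward and emits one behavior label per turn through a noisy emission step. It is an illustration only: the behavior labels, the `noise` parameter, the convention that noise replaces the aligned behavior with a uniformly chosen other one, and the integer `signals` standing in for the LLM's per-turn judgement are all assumptions introduced here.

```python
import random

STATES = ["negative", "neutral", "positive"]

# Hypothetical behavior that each trust state would produce without noise.
ALIGNED_BEHAVIOR = {
    "negative": "resist",
    "neutral": "answer_briefly",
    "positive": "disclose_actively",
}
BEHAVIORS = list(ALIGNED_BEHAVIOR.values())


def emit_behavior(state: str, noise: float, rng: random.Random) -> str:
    """With probability `noise`, decouple the behavior from the latent state."""
    if rng.random() < noise:
        others = [b for b in BEHAVIORS if b != ALIGNED_BEHAVIOR[state]]
        return rng.choice(others)
    return ALIGNED_BEHAVIOR[state]


def simulate(signals, initial="neutral", noise=0.1, seed=0):
    """Return a list of (latent state, emitted behavior) pairs, one per turn.

    `signals` holds one of -1, 0, +1 per turn, standing in for the LLM's
    judgement of the agent's previous utterance.
    """
    rng = random.Random(seed)
    idx = STATES.index(initial)
    trajectory = []
    for delta in signals:
        # Latent transition: move at most one level, clamped to the state range.
        idx = max(0, min(len(STATES) - 1, idx + delta))
        state = STATES[idx]
        trajectory.append((state, emit_behavior(state, noise, rng)))
    return trajectory
```

With `noise=0.0` every emitted behavior matches its latent state, which corresponds to a fully transparent user; raising `noise` produces the decoupling between latent trust and overt behavior that the emission model is meant to capture.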

In the complex, dynamic user environments described above, the asymmetry between “task progression” and “user state” makes it difficult for conventional agents to perceive and handle cognitive conflicts during interactions, which can easily lead to a breakdown in trust and failure of collaboration. To address this issue, we propose the TATA agent framework. This framework is grounded in the dynamic evolution of user trust observed in authentic human-AI interactions and aims to enable deep collaborative dialogues through mechanisms for bidirectional cognitive alignment and trust maintenance.
TATA comprises three core modules (Fig. 1-Right): the Trajectory Planner, the Dual-track State Monitor, and the Adaptive Interaction Coordinator. Among these, the Trajectory Planner is responsible for establishing the high-level task orientation. In this work, we adopt the construction methodology proposed in [6]. Specifically, given the user’s initial goal (
Here,
On the other hand, the Dual-track State Monitor and the Adaptive Interaction Coordinator operate in tandem, utilizing a real-time dual-track “cognitive & affective” monitoring mechanism and adaptive policy regulation. This enables the agent to dynamically adjust the dialogue trajectory and select appropriate interaction policies, thereby facilitating effective task completion and fostering deep collaboration.
3.3.1 Dual-Track State Monitor
In exploratory cognitive collaboration, a user’s internal trust state and cognitive evolution are inherently unobservable. As discussed in Section 3.2, the user’s true trust level is modeled as a latent variable, while observable behaviors constitute only noisy and indirect manifestations of this state. If the agent generates responses solely based on the surface semantics of user utterances, it becomes difficult to detect cognitive shifts and trust fluctuations that have already occurred but have not yet been explicitly expressed.
To address this limitation, we design a “dual-track state monitor” that simultaneously extracts signals from real-time working memory and the user utterance at each turn along two dimensions: cognitive development
• Consistency or Evolution (
• Alteration (
• Conflict (
Concurrently, the emotional vigilance state
These two states jointly provide a real-time assessment of the current task status and the interpersonal relationship within the dialogue. The above process can be formally expressed as:
where
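As a rough illustration of the monitor's output, the sketch below represents one turn's dual-track observation as a pair of a cognitive development label and a vigilance flag. Everything beyond the three cognitive categories is an assumption: the binary `vigilant` flag is a simplification of the emotional vigilance track, the cue lists are hypothetical, keyword matching merely stands in for the LLM-based extraction, and working memory is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Cognitive(Enum):
    CONSISTENT_OR_EVOLVING = "consistency_or_evolution"
    ALTERATION = "alteration"
    CONFLICT = "conflict"


@dataclass(frozen=True)
class DualTrackObservation:
    cognitive: Cognitive
    vigilant: bool  # assumed binary simplification of the emotional vigilance track


# Hypothetical cue lists; an LLM classifier replaces these in a real monitor.
CONFLICT_CUES = ("that's wrong", "i already told you", "that contradicts")
ALTERATION_CUES = ("actually", "instead", "on second thought")
VIGILANCE_CUES = ("not sure i trust", "i already told you", "are you even listening")


def monitor(utterance: str) -> DualTrackObservation:
    """Extract both tracks from a single user utterance."""
    text = utterance.lower()
    if any(cue in text for cue in CONFLICT_CUES):
        cognitive = Cognitive.CONFLICT
    elif any(cue in text for cue in ALTERATION_CUES):
        cognitive = Cognitive.ALTERATION
    else:
        cognitive = Cognitive.CONSISTENT_OR_EVOLVING
    vigilant = any(cue in text for cue in VIGILANCE_CUES)
    return DualTrackObservation(cognitive, vigilant)
```

Keeping the two tracks as separate fields reflects the design intent described above: a turn can signal a task-level change (alteration) without any relationship-level warning, or the reverse.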
3.3.2 Adaptive Interaction Coordinator
Based on the external observations and perceptions of the user’s true trust state and dialogue state provided by the Monitor module, the interaction coordinator’s policy decision can be essentially formulated as a sequential decision-making problem under uncertainty. To achieve optimal interaction control, we adopt the theoretical perspective of the Partially Observable Markov Decision Process (POMDP) to structure this process. Specifically, we employ the POMDP tuple
Here, the hidden state space
Since the agent cannot directly access the true hidden state, the Coordinator maintains a belief state distribution
Given that TATA operates within an open-ended natural language space where computing high-dimensional integrals is computationally intractable, we employ the powerful contextual reasoning capabilities of LLMs to perform a parametric approximation of this process. Specifically, the LLM takes as input the dialogue history
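For intuition about what the LLM is approximating, the standard belief update can be computed exactly when the hidden state space is small and discrete. The sketch below is a textbook Bayes-filter step, not TATA's implementation; the function name, the three-state example, and the tabular `transition` and `likelihood` inputs are assumptions made for illustration.

```python
def update_belief(belief, transition, likelihood):
    """One discrete Bayes-filter step over hidden states.

    belief[s]         : prior probability of state s
    transition[s][s2] : P(s2 | s, action) for the action just taken
    likelihood[s2]    : P(observation | s2) for the observation just received
    """
    n = len(belief)
    # Prediction: push the prior through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = [likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]
```

With a uniform prior, an identity transition, and likelihoods `[0.1, 0.3, 0.6]`, the posterior is proportional to the likelihoods themselves. In open-ended dialogue neither table is available in closed form, which is why the update is delegated to the LLM.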
In the TATA framework, the reward
• Task Advancement (
• Emotional Empathy (
• Intent Transparency (
• Global Breakdown (
Furthermore, the coordinator dynamically selects the most appropriate interaction policy
Notably, the proposed framework does not aim to compute an exact solution to the POMDP. Rather, it serves as a structural constraint that guides the coordinator’s adaptive decision-making process. It clearly defines the state space the agent needs to track, the boundaries of available policies, and the theoretical foundation for decision optimization. Under this guidance, the coordinator leverages the reasoning capabilities of the LLM as an approximate solver to perform trust maintenance and policy selection.
Through this mechanism, the dialogue agent treats the user as a collaborative partner with complex internal states, enabling dynamic calibration of interaction policies during the conversation and thereby supporting real bidirectional cognitive collaboration.
Building upon prior research on complex task-oriented dialogue [6,54], this study adopts a dialogue generation method for comparative evaluation and incorporates several competitive baseline methods to address such scenarios. We adapt the scenario selection and user profiling methodologies from CCD to industrial intelligence applications. Drawing upon established research, we select six representative experimental environments: (1) intelligent logistics route planning, (2) equipment fault diagnosis, (3) production resource allocation, (4) energy management and scheduling, (5) quality control and defect detection, and (6) supply chain risk assessment [28,55,56]. Each scenario exhibits high exploratory complexity, domain specificity, and dynamic constraint satisfaction, which require operators and agents to construct mutual understanding through multi-turn dialogue rather than single-turn instruction execution. For each domain, we develop 15 industrial operator profiles representing varying experience levels and cognitive states, yielding 90 instances in total. These profiles are designed to reflect authentic role characteristics and cognitive diversity observed in real-world industrial settings (see Appendix A for an example profile).
To realistically simulate the psychological dynamics of human operators engaged in HAC at the network edge, and to mitigate the widely observed “over-cooperation” bias associated with using standard LLMs as user simulators, we implement the proposed trust-aware user (operator) behavior model, as detailed in Section 3.2. The specific parameter settings and execution logic are summarized as follows:
Initial Distribution and Emission Probability: During the dialogue initialization phase, we establish the user’s prior trust state distribution as
Behavior Sampling and Dialogue Generation: In each dialogue turn
To further enhance the authenticity and exploratory capacity of the interaction, we introduce additional “alteration noise” during dialogues. Specifically, when the user’s trust state
We compare our method against five representative baselines: (1) ReAct [57], an agent design paradigm enabling autonomous “think-act-observe” cycles that integrate logical reasoning with tool utilization to accomplish complex tasks; (2) Proactive [26], an agent framework employing proactive methodologies that affords two alternative actions—seeking clarification or maintaining inaction; (3) ProCoT [26], a strategy enhancing LLM proactivity by incorporating reasoning and planning steps into prompts, thereby generating descriptive deliberations requisite for decision-making and response generation; (4) Direct-Prompting [6], a robust baseline wherein the LLM is directly prompted to engage in collaborative dialogue; and (5) CoCo-Agent [6], an advanced cognitive collaboration framework featuring trajectory planning, interaction coordination, and monitoring regulation capabilities.
In the experiments, all aforementioned methods are evaluated through interactions with the proposed trust-aware user simulator. Furthermore, to assess the performance of TATA under conventional user modeling settings, we additionally employ the “hybrid user simulator” introduced in [6]. In each dialogue turn, this simulator randomly selects either a proactive or a passive behavior mode with equal probability to generate responses, thereby serving as a robustness stress test to emulate highly dynamic and unpredictable conversational environments. Consequently, this simulator is also employed as a comparative baseline.
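The per-turn mode switching of the hybrid simulator can be sketched as follows. Only the equal-probability choice between two modes comes from the description above; the function names, the `known_facts` list, and what each mode actually says are hypothetical stand-ins for the LLM-generated responses.

```python
import random

MODES = ("proactive", "passive")


def pick_mode(rng: random.Random) -> str:
    """Choose the behavior mode for this turn with equal probability."""
    return rng.choice(MODES)


def hybrid_user_turn(known_facts, rng: random.Random):
    """Return (mode, reply) for one simulated user turn.

    Assumed behavior: a proactive user volunteers everything it knows,
    while a passive user gives only a minimal answer.
    """
    mode = pick_mode(rng)
    if mode == "proactive":
        reply = " ".join(known_facts)
    else:
        reply = known_facts[0] if known_facts else "I am not sure."
    return mode, reply
```

Because the mode is resampled independently each turn and ignores the dialogue so far, the simulator is unpredictable but carries no persistent internal state, which is the contrast with the trust-aware model of Section 3.2.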
In this study, we strictly follow the setup of [6] for both the user simulator and the evaluation judge, employing GPT-4.1 as the backbone model in each case (temperature = 0). This choice intentionally aligns with the original CCD experimental protocol to control confounding factors, ensuring that any performance gains observed are attributable to the proposed framework rather than differences in the underlying LLMs.
Although using the same backbone model for simulation and evaluation may theoretically introduce self-preference bias, we mitigate this potential dependency and ensure generalization through three mechanisms: (1) Multi-variant evaluation: We evaluate all methods using Qwen-2.5-72B [58] and GPT-4.1 [59] as backbone models with temperature set to 0.6; (2) Human verification: We manually audit a randomly selected 5% subset of GPT-4.1 evaluation outputs that pass validity checks to ensure scoring reliability; (3) Statistical robustness: Dialogues are generated following the procedure in Algorithm 1 and §4.1. Specifically, a user profile is randomly sampled from the 90 available user profiles and input to the user simulator, which then interacts with the agent until the agent produces a final solution or the dialogue reaches 30 turns. Using this procedure, we generate 200 dialogues, from which 100 are randomly sampled, and all metrics are averaged over three iterations. This generation process is consistent with the settings in [6].
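The generation and sampling procedure described above can be outlined as a small harness. This is a sketch under stated assumptions, not Algorithm 1: `agent_step`, `user_step`, and `score` are hypothetical callables, and the reading that each of the three iterations resamples 100 dialogues from the same pool of 200 is one possible interpretation of the text.

```python
import random
from statistics import mean

MAX_TURNS = 30      # dialogue is cut off after 30 turns
NUM_PROFILES = 90   # 6 scenarios x 15 operator profiles


def run_dialogue(profile_id, agent_step, user_step):
    """Alternate user and agent turns until a final solution or the turn limit."""
    history = []
    for _ in range(MAX_TURNS):
        history.append(("user", user_step(profile_id, history)))
        reply, is_final = agent_step(history)
        history.append(("agent", reply))
        if is_final:
            break
    return history


def evaluate(agent_step, user_step, score,
             n_generate=200, n_sample=100, n_iterations=3, seed=0):
    """Generate dialogues, then average a metric over repeated random subsets."""
    rng = random.Random(seed)
    dialogues = [
        run_dialogue(rng.randrange(NUM_PROFILES), agent_step, user_step)
        for _ in range(n_generate)
    ]
    iteration_means = []
    for _ in range(n_iterations):
        subset = rng.sample(dialogues, n_sample)  # sampling without replacement
        iteration_means.append(mean(score(d) for d in subset))
    return mean(iteration_means)
```

A usage example would pass an agent wrapper that returns `(reply, is_final)` and a `score` function such as the number of turns in the dialogue.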
Notably, the trust state (
To comprehensively evaluate the performance of agents in dynamic cognitive collaboration, we adopt the integrated evaluation framework of CCD tasks, which assesses both the quality of cognitive collaboration and task completion capability. Additionally, to capture the dynamic evolution of user trust and cognitive states in real-world interactions, we introduce a new set of joint trust and efficiency metrics. Specifically, our evaluation encompasses the following aspects:
• Cognitive Collaboration Quality:
– Cognition Coverage Index (CCI): Proportion of the user’s initial cognitive states accurately identified by the agent.
– Cognition Gain Index (CGI): Degree of new cognitive elements discovered or existing elements enhanced during interaction.
– CoCo-F1: Harmonic mean of CCI and CGI, reflecting overall collaborative effectiveness.
See Appendix B.1 for detailed formulation.
• Task Completion Capability:
– Inform*: Similarity between the agent’s final solution and the user’s initial goal (normalized to [0, 1]).
– Success*: Proportion of cognitive elements successfully addressed in the final solution. See Appendix B.2 for detailed formulation.
• Trust Maintenance Capability: To quantitatively evaluate the agent’s ability to maintain and repair trust, we map the latent trust state
where
– Average Trust Momentum (ATM): Measures the mean accumulated trust momentum over the entire dialogue:
A higher ATM indicates more persistent and stable collaborative trust. The exponential accumulation mechanism ensures that sustained deterioration in trust is penalized far more heavily than occasional fluctuations.
– Trust Recovery Rate (TRR): The proportion of low-trust events after which the agent achieves a significant recovery in trust momentum within the subsequent
where
– Trust Collapse Rate (TCR): The probability of low-trust states during the dialogue:
where
For the aforementioned metrics of collaborative quality and task completion, we adopt a hybrid evaluation framework that integrates LLM-assisted processing with direct numerical computation [61–63].
Specifically, for the cognitive collaboration metrics, following the approach of Li et al. [6], we treat the user’s initial utterance
In addition, for the three trust-related collaboration metrics proposed in this work, we compute them by directly tracking the state of
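The directly computed metrics can be illustrated with a short sketch. CoCo-F1 follows its stated definition as the harmonic mean of CCI and CGI. The two trust metrics are simplified stand-ins rather than the paper's formulas: TCR is taken as the fraction of turns spent in the low-trust state, a low-trust event is assumed to be an entry into that state, and recovery is assumed to mean leaving it within the window, whereas the paper defines recovery in terms of trust momentum. ATM is omitted because its accumulation rule is not reproduced here.

```python
LOW = "negative"  # label of the low-trust state in the tracked sequence


def coco_f1(cci: float, cgi: float) -> float:
    """Harmonic mean of coverage (CCI) and gain (CGI)."""
    if cci + cgi == 0:
        return 0.0
    return 2 * cci * cgi / (cci + cgi)


def trust_collapse_rate(states) -> float:
    """Fraction of turns in which the tracked trust state is low."""
    if not states:
        return 0.0
    return sum(s == LOW for s in states) / len(states)


def trust_recovery_rate(states, window: int) -> float:
    """Share of low-trust events followed by a non-low state within `window` turns."""
    events = [
        t for t, s in enumerate(states)
        if s == LOW and (t == 0 or states[t - 1] != LOW)
    ]
    if not events:
        return 0.0  # assumed convention when no low-trust event occurs
    recovered = sum(
        any(s != LOW for s in states[t + 1 : t + 1 + window]) for t in events
    )
    return recovered / len(events)
```

For the sequence `["neutral", "negative", "negative", "neutral", "positive", "negative"]`, half of the turns are low-trust, and of the two low-trust events only the first is followed by a recovery, which requires a window of at least two turns.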
4.5.1 Analysis of Collaboration Quality and Task Completion
Table 1 compares the performance of different methods under both hybrid user simulators and trust-aware user simulators, covering multiple evaluation metrics, including average dialogue turns, collaboration quality (

As shown in Table 1, for both user types, traditional baseline models (e.g., ReAct, Proactive, ProCoT and Direct) exhibit minimal capacity for sustaining multi-turn interactions, with an average turn count (
On the other hand, CoCo-Agent, which is specifically designed for collaborative dialogue, achieves the highest cognitive collaboration scores under both user settings (with
Under the “trust-aware user modeling” and “alteration noise” mechanisms introduced in this work, CoCoAgent proves to lack effective task state awareness. When confronted with potential shifts in user ideas or cognitive conflicts, it behaves in an overly compliant, appeasing manner, continuously adjusting the dialogue trajectory to accommodate the user’s divergent thinking and expand the scope of discussion, which ultimately results in topic drift and solution non-convergence. In industrial environments where safety and latency constraints are stringent, such uncontrolled exploration without effective convergence control may introduce substantial operational risks. Furthermore, the performance discrepancy observed between the two user settings provides additional evidence that conventional static simulators, lacking psychological state modeling, can mask trust-related breakdowns and collaboration barriers that may arise in real HAC, thereby artificially inflating evaluation outcomes.
By comparison, the proposed TATA demonstrates consistently balanced and robust performance across all experimental settings. Under the trust-aware user scenario, the TATA (GPT-4.1) variant maintains high-quality cognitive collaboration (
This demonstrates that TATA’s adaptive interaction coordinator effectively determines when to engage in divergent cognitive exploration (using
4.5.2 Analysis of Dynamic Trust Maintenance
For the evaluation of trust maintenance capability, we computed the relevant metrics for 200 dialogue samples under each method according to Eqs. (9)–(12). Table 2 and Fig. 2-left show the average trust momentum (ATM), trust collapse rate (TCR), and trust recovery rate (TRR, window size = [1, 5] turns) recorded by the user simulator for all comparison methods.
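As an illustration of how the three trust metrics could be computed from a recorded trajectory, the following minimal Python sketch takes the per-turn trust momentum of one dialogue as given. The threshold `low_threshold`, the margin `recovery_margin`, the function names, and the choice to count every low-trust turn as one event are assumptions made for this sketch only; they are not the exact definitions of Eqs. (9)–(12).

```python
from typing import Sequence


def average_trust_momentum(momentum: Sequence[float]) -> float:
    """Mean of the per-turn trust momentum values of one dialogue."""
    if not momentum:
        raise ValueError("momentum sequence must not be empty")
    return sum(momentum) / len(momentum)


def trust_collapse_rate(momentum: Sequence[float], low_threshold: float) -> float:
    """Fraction of turns whose momentum lies below the (assumed) low-trust threshold."""
    if not momentum:
        raise ValueError("momentum sequence must not be empty")
    return sum(m < low_threshold for m in momentum) / len(momentum)


def trust_recovery_rate(
    momentum: Sequence[float],
    low_threshold: float,
    recovery_margin: float,
    window: int,
) -> float:
    """Fraction of low-trust turns that are followed, within `window` turns,
    by a momentum gain of at least `recovery_margin` (an assumed criterion)."""
    events = [t for t, m in enumerate(momentum) if m < low_threshold]
    if not events:
        return 0.0  # convention chosen here: no low-trust event, nothing to recover
    recovered = 0
    for t in events:
        later = momentum[t + 1 : t + 1 + window]
        if any(v - momentum[t] >= recovery_margin for v in later):
            recovered += 1
    return recovered / len(events)


# Hypothetical trajectory, used only to exercise the functions.
trajectory = [1.0, 0.2, 0.3, 0.9, 1.1]
atm = average_trust_momentum(trajectory)
tcr = trust_collapse_rate(trajectory, low_threshold=0.5)
trr_by_window = {
    k: trust_recovery_rate(trajectory, low_threshold=0.5, recovery_margin=0.5, window=k)
    for k in range(1, 6)
}
```

Sweeping `window` from 1 to 5, as in the dictionary above, mirrors the window sizes reported for TRR; averaging each metric over all dialogues of a method would then give one value per method.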


Figure 2: The trust recovery rate (TRR) across window size (left) and the trust momentum trajectory across the dialogue (right) for all methods.
As shown in the table, traditional baseline models exhibit extremely low average trust scores (
Furthermore, as shown in Fig. 2-left, when trust drops to a low level, methods with collaborative capabilities demonstrate a significant improvement in recovery performance over the subsequent 1–5 turns as the window size increases, whereas the recovery rates of other methods are mostly below 0.6 or even 0. Notably, at the minimal window size (
To visualize trust dynamics across multi-turn interactions, we tracked the average trust momentum trajectories of each method throughout the entire dialogue (Fig. 2-right). The analysis further confirms TATA’s mechanistic advantages in dynamic interactions:
(1) Traditional baseline methods typically end the dialogue prematurely within 3–5 turns, accompanied by a sharp decline in trust momentum (e.g., ProCoT drops to
We employ Qwen2.5-72B as the backbone model and conduct a series of ablation studies to validate the efficacy of the proposed method. Given the tight input-output dependencies among the Planner, Monitor, and Coordinator, direct module removal would disrupt system operation. Therefore, we construct degraded variants by preserving the overall interaction skeleton while disabling only the core functionality of the target module, and compare these against the full model.
Specifically, we evaluate: (1) Full TATA, retaining all modules; (2) TATA w/o Monitor, preserving the working memory and task execution framework but disabling the dual-track state monitor with fixed cognitive development
Results in Tables 3 and 4 demonstrate that removing any core module causes significant performance degradation. When dual-track state monitoring is ablated (w/o Monitor), the agent loses real-time perception of implicit cognitive shifts and emotional fluctuations. This not only diminishes cognitive collaboration quality (CoCo-F1) and task success rate (Success*), but also, because the agent can no longer perceive user states, causes average trust momentum (ATM) to plummet from 1.335 to 0.796, accompanied by a 20.3% decline in short-term trust repair capacity (

Similarly, removing the adaptive Coordinator (w/o Coordinator) reduces the agent to a sequential trajectory executor incapable of flexibly invoking strategies to advance collaboration depth or maintain relationships (CoCo-F1 and Success* dropping to minimum values of 0.432 and 0.712, respectively). This results in an almost doubled trust collapse rate (TCR: 0.094) and a 26.8% reduction in short-term trust repair capacity (
This study investigates the critical challenges of task-level cognitive alignment and relationship-level trust maintenance within HAC systems operating in highly dynamic and complex industrial intelligence scenarios.
To address the over-idealization of existing evaluation environments, we first develop a trust-aware user behavior model grounded in theoretical and empirical research from cognitive science and trust dynamics. By formalizing collaborative trust as a dynamic latent variable within an HMM, the proposed model characterizes operators’ psychological fluctuations and cognitive evolution across multi-turn interactions, thereby providing a realistically grounded and reproducible testbed for evaluating the reliability of industrial collaborative intelligence systems. Building upon this, we propose the TATA agent framework. Beyond macro-level task planning, TATA introduces a dual-track cognitive/emotional state monitor and dynamically calibrates the policy space under the guidance of an adaptive interaction coordinator. This enables AI agents to conduct targeted interactions that enhance collaborative efficiency while reducing redundant interaction turns. Furthermore, we introduce a novel set of process-oriented trust evaluation metrics and conduct comparative experiments across six representative complex industrial decision-making scenarios. Extensive experimental results demonstrate that TATA achieves an effective balance among task completion capability, collaborative decision efficiency, and relationship stability in HAC scenarios.
We anticipate that this work will provide new insights into human-in-the-loop (HITL) intelligent systems for industrial applications. Future research may further explore extending the TATA framework to multi-agent or multi-user collaborative settings within cloud-edge-device coordinated architectures.
Although the trust-aware user modeling approach and the proposed TATA agent framework demonstrate significant improvements in collaborative efficiency across industrial task domains, several limitations remain for further investigation: (1) Regarding user behavior simulation, although a structured HMM framework and noise mechanisms were introduced based on empirical observations, the LLM-based simulator still cannot fully replicate the authentic behavioral patterns of real industrial operators under complex conditions, such as varying levels of professional expertise and psychological stress tolerance. (2) In terms of mechanism design, while employing an LLM as an approximator for Bayesian belief updating and POMDP policy selection circumvents the challenge of high-dimensional computation, it introduces inherent uncontrollability of LLMs, such as sensitivity to prompt design and potential hallucination risks. (3) The current framework lacks deep integration with structured industrial knowledge bases, which limits the precision of agent recommendations in highly specialized scenarios. Furthermore, due to constraints imposed by the current simulation-based validation environment, this study has not yet conducted benchmarking of inference latency or computational overhead when deploying the system on real industrial edge devices. Future work will focus on human-in-the-loop (HITL) dataset collection, the integration of domain knowledge and sensor data, and the exploration of lightweight neural trust tracking methods combined with model quantization techniques, aiming to achieve a balanced trade-off between computational cost and collaboration depth.
Acknowledgement: Not applicable.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: Pan Li: Conceptualization, methodology and writing—original draft preparation; Zhi Li: Investigation, conceptualization and validation; Yingyou Wen: Supervision and project administration. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The raw data supporting the conclusions of this article will be made available by the authors on reasonable request.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
Appendix A Examples of Industrial Scenarios and User Profiles
The following is an example of a specific operator profile for the “Equipment Failure Diagnosis” domain:
- “user_id”: “equipment_23”
- “occupation”: “Equipment maintenance engineer”
- “background”: “Mechanical engineering major”
- “experience”: “17 years of industrial field experience”
- “initial goal (
- “initial cognitive state (
(1) “known symptoms”: “Temperature occasionally exceeds setpoint”
(2) “constraints”: “Cannot immediately shut down for comprehensive inspection”
(3) “preferences”: “Prefer to check common fault points first”
(4) “knowledge”: “Encountered a similar issue last month caused by a stuck cooling valve”
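For readers who wish to load such profiles into a simulator, the record above could be represented as a small typed structure. The sketch below is one possible encoding, not the format used in this work: the class and field names are hypothetical, and the goal text is left unset because it is not reproduced in the listing above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CognitiveState:
    """The four initial cognitive elements listed in the profile."""
    known_symptoms: str
    constraints: str
    preferences: str
    knowledge: str


@dataclass
class OperatorProfile:
    """Hypothetical container for one operator profile."""
    user_id: str
    occupation: str
    background: str
    experience: str
    initial_cognitive_state: CognitiveState
    initial_goal: Optional[str] = None  # goal text not reproduced in this sketch


profile = OperatorProfile(
    user_id="equipment_23",
    occupation="Equipment maintenance engineer",
    background="Mechanical engineering major",
    experience="17 years of industrial field experience",
    initial_cognitive_state=CognitiveState(
        known_symptoms="Temperature occasionally exceeds setpoint",
        constraints="Cannot immediately shut down for comprehensive inspection",
        preferences="Prefer to check common fault points first",
        knowledge="Encountered a similar issue last month caused by a stuck cooling valve",
    ),
)

# A plain dictionary view, e.g. for serialization into a simulator prompt.
profile_dict = asdict(profile)
```

Keeping the cognitive elements in a separate structure makes it straightforward to count how many of them an agent elicits or addresses when computing the coverage-style metrics in Appendix B.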
Appendix B Evaluation Metric Details
Appendix B.1 Cognitive Collaboration Quality Metric Details
• Initial Cognition Coverage Index (CCI): Evaluates the ability to elicit and identify the user’s cognitive starting points during multi-turn interactions.
Here,
• Cognition Gain Index (CGI): Evaluates the incremental cognitive gains achieved during the interaction, including the discovery of new elements and the semantic evolution of existing ones.
where
• Cognitive Collaboration F1 (CoCo-F1): The harmonic mean (F1-Score) of CCI and CGI as the core composite metric:
Higher scores indicate a more in-depth and comprehensive collaboration, which quantitatively reflects the process of understanding co-construction and evolution.
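The composite score can be illustrated with a short Python sketch of the harmonic mean of CCI and CGI. The function name and the convention of returning 0 when both inputs are 0 are assumptions made for this illustration.

```python
def coco_f1(cci: float, cgi: float) -> float:
    """Harmonic mean of CCI and CGI, both expected in [0, 1]."""
    if cci + cgi == 0:
        return 0.0  # assumed convention to avoid division by zero
    return 2 * cci * cgi / (cci + cgi)


# Hypothetical values: strong initial coverage but weak cognitive gain.
score = coco_f1(0.9, 0.3)
```

Because the harmonic mean is dominated by the smaller input, an agent cannot reach a high CoCo-F1 through coverage alone or gain alone; in the hypothetical call above the score stays well below the arithmetic mean of the two inputs.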
Appendix B.2 Task Completion Capability Metric Details
• Success*: Calculates the proportion of collected cognitive elements that are successfully incorporated into the agent’s final response.
where
1 The study recruited 24 participants and collected 1981 lines of dialogue text during the completion of 12 decision-making tasks with a conversational agent.
References
1. Joshi B, Singh A, Kumar N, Rautela S. Fuzzy-deep learning-based artificial intelligence for edge computing and real-time decision-making in uncertain IoT environments. In: Proceedings of the 2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT); 2025 Feb 21–22; Bhimtal, Nainital, India. p. 1301–6.
2. ManusAI. Leave it to manus. 2025 [cited 2025 Oct 9]. Available from: https://manus.im/.
3. Yang H, Yue S, He Y. Auto-GPT for online decision making: benchmarks and additional opinions. arXiv:2306.02224. 2023.
4. Cheng J, Kang H, Shao Y, Li N, Chen P, Wang R, et al. Survey on efficient large language models: principles, algorithms, applications, and open issues. IEEE Trans Neural Netw Learn Syst. 2026;37(5):2025–45. doi:10.1109/TNNLS.2025.3628671.
5. Yuan S, Song K, Chen J, Tan X, Shen Y, Ren K, et al. EasyTool: enhancing LLM-based agents with concise tool instruction. In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); 2025 Apr 29–May 4; Albuquerque, New Mexico. p. 951–72.
6. Li P, Wang J, Yang Q, Guo T, Liu Y, Wen Y. From answering to discussing: advancing human-AI cognitive collaboration in dialogue agents. Inf Process Manag. 2026;63(5):104711.
7. Kim TS, Lee Y, Yu J, Chung JJY, Kim J. DiscoverLLM: from executing intents to discovering them. arXiv:2602.03429. 2026.
8. Zhang Z, Zhao H. Advances in multi-turn dialogue comprehension: a survey. arXiv:2103.03125. 2021.
9. Becker J. Multi-agent large language models for conversational task-solving. arXiv:2410.22932. 2024.
10. Clark HH. Using language. Cambridge, UK: Cambridge University Press; 1996.
11. Tolzin A, Janson A. Uncovering the mechanisms of common ground in human-agent interaction: review and future directions for conversational agent research. Internet Res. 2026;36(1):292–315. doi:10.1108/intr-06-2023-0514.
12. Dong W, Chen S, Yang Y. Protod: proactive task-oriented dialogue system based on large language model. In: Proceedings of the 31st International Conference on Computational Linguistics; 2025 Jan 19–24; Abu Dhabi, United Arab Emirates. p. 9147–64.
13. Laban P, Hayashi H, Zhou Y, Neville J. LLMs get lost in multi-turn conversation. arXiv:2505.06120. 2025.
14. Zhao Z, Vania C, Kayal S, Khan N, Cohen SB, Yilmaz E. PersonaLens: a benchmark for personalization evaluation in conversational AI assistants. arXiv:2506.09902. 2025.
15. Sharma M, Tong M, Korbak T, Duvenaud D, Askell A, Bowman SR, et al. Towards understanding sycophancy in language models. arXiv:2310.13548. 2023.
16. Yi Z, Ouyang J, Xu Z, Liu Y, Liao T, Luo H, et al. A survey on recent advances in LLM-based multi-turn dialogue systems. ACM Comput Surv. 2025;58(6):1–38. doi:10.1145/3771090.
17. Luo J, Zhang W, Yuan Y, Zhao Y, Yang J, Gu Y, et al. Large language model agent: a survey on methodology, applications and challenges. arXiv:2503.21460. 2025.
18. Wang L, Xu W, Lan Y, Hu Z, Lan Y, Lee RKW, et al. Plan-and-solve prompting: improving zero-shot chain-of-thought reasoning by large language models. arXiv:2305.04091. 2023.
19. Zhang D, Zhoubian S, Hu Z, Yue Y, Dong Y, Tang J. ReST-MCTS*: LLM self-training via process reward guided tree search. Adv Neural Inf Process Syst. 2024;37:64735–72. doi:10.52202/079017-2066.
20. Feng Y, Rahmani HA, Lipani A, Yilmaz E. Towards asking clarification questions for information seeking on task-oriented dialogues. arXiv:2305.13690. 2023.
21. Qin Y, Liang S, Ye Y, Zhu K, Yan L, Lu Y, et al. ToolLLM: facilitating large language models to master 16,000+ real-world APIs. arXiv:2307.16789. 2023.
22. Hudecek V, Dušek O. Are LLMs all you need for task-oriented dialogue. arXiv:2304.06556. 2023.
23. Qian C, He B, Zhuang Z, Deng J, Qin Y, Cong X, et al. Tell me more! Towards implicit user intention understanding of language model driven agents. arXiv:2402.09205. 2024.
24. Deng Y, Lei W, Lam W, Chua TS. A survey on proactive dialogue systems: problems, methods, and prospects. arXiv:2305.02750. 2023.
25. Zhang X, Deng Y, Ren Z, Ng SK, Chua TS. Ask-before-plan: proactive language agents for real-world planning. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024; 2024 Nov 12–16; Miami, FL, USA. p. 10836–63.
26. Deng Y, Liao L, Chen L, Wang H, Lei W, Chua TS. Prompting and evaluating large language models for proactive dialogues: clarification, target-guided, and non-collaboration. arXiv:2305.13626. 2023.
27. Besta M, Blach N, Kubicek A, Gerstenberger R, Podstawski M, Gianinazzi L, et al. Graph of thoughts: solving elaborate problems with large language models. Proc AAAI Conf Artif Intell. 2024;38:17682–90.
28. Fernandez C, Fernández I, Aceta C. LAMIA: an LLM approach for task-oriented dialogue systems in industry 5.0. In: Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology; 2025 May 27–30; Bilbao, Spain. p. 27–30.
29. Adel A. Future of industry 5.0 in society: human-centric solutions, challenges and prospective research areas. J Cloud Comput. 2022;11(1):40. doi:10.1186/s13677-022-00314-5.
30. Tóth A, Nagy L, Kennedy R, Bohuš B, Abonyi J, Ruppert T. The human-centric industry 5.0 collaboration architecture. MethodsX. 2023;11(16):102260. doi:10.1016/j.mex.2023.102260.
31. Kraus M, Wagner N, Minker W. Modelling and predicting trust for developing proactive dialogue strategies in mixed-initiative interaction. In: Proceedings of the 2021 International Conference on Multimodal Interaction; 2021 Oct 18–22; Montréal, QC, Canada. p. 131–40.
32. Peng J, Kimmig A, Wang D, Niu Z, Tao X, Ovtcharova J. Intention recognition-based human-machine interaction for mixed flow assembly. J Manuf Syst. 2024;72(4):229–44. doi:10.1016/j.jmsy.2023.11.021.
33. Yu A, Li C, Macesanu L, Balaji A, Ray R, Mooney R, et al. Mixed-initiative dialog for human-robot collaborative manipulation. arXiv:2508.05535. 2025.
34. Xu Y, Zhan X, Kaltungo AY, Ng MS, Ishizawa T, Fujimoto K, et al. Dialogue based interactive explanations for safety decisions in human robot collaboration. arXiv:2604.05896. 2026.
35. Xi Z, Chen W, Guo X, He W, Ding Y, Hong B, et al. The rise and potential of large language model based agents: a survey. Sci China Inf Sci. 2025;68(2):121101. doi:10.1007/s11432-024-4222-0.
36. Kong C, Fan Y, Wan X, Jiang F, Wang B. PlatoLM: teaching LLMs in multi-round dialogue via a user simulator. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2024 Aug 11–16; Bangkok, Thailand. p. 7841–63.
37. Ni B, Wang Y, Wang L, Kveton B, Dernoncourt F, Xia Y, et al. A survey on LLM-based conversational user simulation. In: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers); 2026 Mar 24–29; Rabat, Morocco. p. 4266–301.
38. Ahmad A, Hillmann S, Möller S. Simulating user diversity in task-oriented dialogue systems using large language models. arXiv:2502.12813. 2025.
39. Wang N, Peng Z, Que H, Liu J, Zhou W, Wu Y, et al. RoleLLM: benchmarking, eliciting, and enhancing role-playing abilities of large language models. In: Findings of the Association for Computational Linguistics: ACL 2024; 2024 Aug 11–16; Bangkok, Thailand. p. 14743–77.
40. Lin J, Tomlin N, Andreas J, Eisner J. Decision-oriented dialogue for human-AI collaboration. Trans Assoc Comput Linguist. 2024;12(1):892–911. doi:10.1162/tacl_a_00679.
41. Luo X, Tang Z, Wang J, Zhang X. DuetSim: building user simulator with dual large language models for task-oriented dialogues. arXiv:2405.13028. 2024.
42. Wang X, Sen P, Li R, Yilmaz E. Adaptive retrieval-augmented generation for conversational systems. In: Findings of the Association for Computational Linguistics: NAACL 2025; 2025 Apr 29–May 4; Albuquerque, New Mexico. p. 491–503.
43. Naous T, Laban P, Xu W, Neville J. Flipping the dialogue: training and evaluating user language models. arXiv:2510.06552. 2025.
44. Rafailov R, Sharma A, Mitchell E, Manning CD, Ermon S, Finn C. Direct preference optimization: your language model is secretly a reward model. Adv Neural Inf Process Syst. 2023;36:53728–41.
45. Cheng M, Yu S, Lee C, Khadpe P, Ibrahim L, Jurafsky D. ELEPHANT: measuring and understanding social sycophancy in LLMs. arXiv:2505.13995. 2025.
46. Lee JD, See KA. Trust in automation: designing for appropriate reliance. Hum Factors. 2004;46(1):50–80.
47. Li M, Erickson IM, Cross EV, Lee JD. Estimating trust in conversational agent with lexical and acoustic features. Proc Hum Factors Ergon Soc Annu Meet. 2022;66(1):544–8. doi:10.1177/1071181322661147.
48. Li M, Kamaraj AV, Lee JD. Modeling trust dimensions and dynamics in human-agent conversation: a trajectory epistemic network analysis approach. Int J Hum Comput Interact. 2024;40(14):3571–82.
49. Huang Y, Sun L, Wang H, Wu S, Zhang Q, Li Y, et al. TrustLLM: trustworthiness in large language models. arXiv:2401.05561. 2024.
50. Liu J, Tan YK, Fu B, Lim KH. From intents to conversations: generating intent-driven dialogues with contrastive learning for multi-turn classification. arXiv:2411.14252. 2025.
51. Gross JJ. Emotion regulation: affective, cognitive, and social consequences. Psychophysiology. 2002;39(3):281–91.
52. Hoff KA, Bashir M. Trust in automation: integrating empirical evidence on factors that influence trust. Hum Factors. 2015;57(3):407–34. doi:10.1177/0018720814547570.
53. Spaan MT. Partially observable Markov decision processes. In: Reinforcement learning: state-of-the-art. Berlin/Heidelberg, Germany: Springer; 2012. p. 387–414.
54. Li P, Yang Q, Xu S, Li X, Li Z, Wang C, et al. Adaptive-TOD: an LLM-driven and adaptive agent for diverse interaction modes. Neurocomputing. 2025;652:130991.
55. Kumar N, Kumar RR. Human-AI collaboration in operations and supply chain management: a systematic literature review. Manag Rev Q. 2025;24(3):691. doi:10.1007/s11301-025-00575-9.
56. Wang K, Du N. Real-time monitoring and energy consumption management strategy of cold chain logistics based on the internet of things. Energy Inform. 2025;8(1):34. doi:10.1186/s42162-025-00493-w.
57. Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, et al. ReAct: synergizing reasoning and acting in language models. In: Proceedings of the 11th International Conference on Learning Representations (ICLR); 2023 May 1–5; Kigali, Rwanda.
58. Yang A, Yang B, Hui B, Zheng B, Yu B, Zhou C, et al. Qwen2 technical report. arXiv:2407.10671. 2024.
59. OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, et al. GPT-4 technical report. arXiv:2303.08774. 2024.
60. Zhang M, Press O, Merrill W, Liu A, Smith NA. How language model hallucinations can snowball. arXiv:2305.13534. 2023.
61. Qian K, Beirami A, Kottur S, Shayandeh S, Crook P, Geramifard A, et al. Database search results disambiguation for task-oriented dialog systems. arXiv:2112.08351. 2021.
62. Liu Y, Iter D, Xu Y, Wang S, Xu R, Zhu C. G-Eval: NLG evaluation using GPT-4 with better human alignment. arXiv:2303.16634. 2023.
63. Zheng L, Chiang WL, Sheng Y, Zhuang S, Wu Z, Zhuang Y, et al. Judging LLM-as-a-judge with MT-bench and chatbot arena. Adv Neural Inf Process Syst. 2023;36:46595–623. doi:10.52202/075280-2020.
64. Budzianowski P, Wen TH, Tseng BH, Casanueva I, Ultes S, Ramadan O, et al. MultiWOZ - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; 2018 Oct 31–Nov 4; Brussels, Belgium.
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.