Shuozhi Pei – SDCA PG Degree Show 2024

“This study investigates how different chatbot explanation modes (textual, rules, and mixed) impact trust, comprehension, and mental effort in healthcare. Overall, the mixed mode was found to balance clarity and engagement, optimizing user experience for non-experts.”

Problem Statement

A person sitting in front of a computer screen, connected via a cable to a head model, representing the challenges of integrating technology and human interaction in problem-solving.

Healthcare chatbots aim to tackle limited resources, rising chronic diseases, and unequal access to care. However, they often fail to process complex information and provide clear explanations for non-experts, causing misunderstandings and reduced trust. Therefore, it is essential to evaluate and optimize explanation modes – textual, rules, and mixed – to improve user trust and comprehension while reducing cognitive load. Enhancing these modes is key to making chatbots more reliable, user-friendly, and effective for non-experts managing their health.

Research Aim

Two healthcare professionals in lab coats standing and discussing, with a heart symbol and an electrocardiogram line in the background, representing healthcare and medical consultation.

This study aims to design and evaluate suitable chatbot explanation modes for non-experts. Three different modes are proposed, namely Textual mode, Rules mode and Mixed mode, to evaluate the impact of different explanation formats on users’ trust, comprehension and mental effort.

Research Questions

What are the needs and requirements for explaining information in healthcare chatbots?
What is the effect of textual, rules, and mixed explanation modes on individuals’ perceived trust?
How do the three explanation modes affect an individual’s mental effort?
What impact do the three explanation modes have on an individual’s comprehension?

Research Objectives

To organise a focus group to get feedback and suggestions on healthcare chatbot design.
To ideate and generate two different explanation formats (Rules and Mixed format), while the text format uses the existing model.
To make two separate prototypes of healthcare chatbots with the rules and mixed formats for the medical enquiry process, which will be compared with the traditional textual mode.
To conduct a within-subject controlled study where all participants are exposed to the three modes of healthcare chatbots.

Research Method

Research plan diagram showing the design thinking process with stages: Empathize, Define, Ideate, Prototype, and Test. It includes steps for user-centered design, participatory design activities, prototype summarization, formative testing, pilot study, and experiment.

The research plan follows the Design Thinking methodology, combining it with user-centered design principles to create and test healthcare chatbot prototypes. It begins with understanding user needs through participatory design: a focus group was conducted to define requirements, which informed three chatbot prototypes. Formative testing and a pilot study refined the main features of the prototypes, followed by a final experiment to answer the research questions and check that the prototypes meet user needs.

Prototype Development

Focus Group

Participants provided key feedback on healthcare chatbots, highlighting areas for improvement. Trustworthiness is crucial; current outputs are too complex, and clear use of specialized terminology is needed. Guidance and topic direction are essential for understanding user descriptions. Users prefer personalized responses over generic ones and expect chatbots to show empathy to enhance comfort. Additionally, they desire instant feedback and integration of diverse information. Past negative experiences, such as incorrect or repetitive information, have affected trust. Overall, chatbots should focus on clear output, professional guidance, emotional support, and immediate feedback to meet user needs.

Inspired by cognitive load theory, I ideated solutions to minimize users’ mental effort. I recognized the potential of rule-based learning to improve decision-making and align with user preferences for structured information. Information transparency also proved important.

The Main Features in 3 Explanation Modes

Textual mode offers a natural conversational flow, essentially plain NLP-generated replies. Rules mode gives structured guidance and always presents its explanation in an “IF-THEN” format. Mixed mode combines both: it gives a natural reply with emotion that also carries the “IF-THEN” logical structure.

A table comparing three chatbot modes: Textual, Rules, and Mixed. It outlines their descriptions, guidelines, examples, and the type of model used, including AI chatbot, Rule-based Chatbot, and Hybrid Chatbots.
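
To make the difference between the three modes concrete, here is a minimal Python sketch. It is not the prototypes' actual implementation (the study used a Wizard of Oz setup, not code like this). The `Finding` class, the `explain` function, the empathy sentence, and the sleep/headache example are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One piece of advice, split into a condition and its consequence."""
    condition: str  # hypothetical example text, not taken from the study
    outcome: str


# Assumed wording for the emotional part of a Mixed-mode reply.
EMPATHY = "I'm sorry you are feeling unwell."


def explain(finding: Finding, mode: str) -> str:
    """Render the same finding in Textual, Rules, or Mixed mode."""
    text = f"When {finding.condition}, {finding.outcome}."
    rule = f"IF {finding.condition}, THEN {finding.outcome}."
    if mode == "textual":
        return text  # natural conversational sentence
    if mode == "rules":
        return rule  # structured IF-THEN statement only
    if mode == "mixed":
        return f"{EMPATHY} {rule}"  # empathy plus IF-THEN structure
    raise ValueError(f"unknown mode: {mode}")


example = Finding(
    condition="you sleep fewer than six hours a night",
    outcome="your headaches may become more frequent",
)
for mode in ("textual", "rules", "mixed"):
    print(f"{mode}: {explain(example, mode)}")
```

Keeping the condition and outcome as separate fields means the same content can be shown in any of the three formats, which mirrors how the study compares modes on identical information.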

Prototype version 1

Two images showing a chatbot prototype on Figma, featuring button interactions and fixed dialogue. Next to it, a person interacts with the prototype on a tablet. Text below outlines generated features and usability test results, highlighting issues like limited user control and lack of explicit status feedback.

The first prototype, developed in Figma, used button interactions to simulate chatbot dialogues. I carried out preliminary usability testing using heuristic principles and a SWOT analysis.

Prototype version 2

For the second prototype, I incorporated this feedback and adopted the Wizard of Oz method. I trained ChatGPT and generated a script on the topic. A wizard assistant then refined the responses from ChatGPT and sent them via WhatsApp.

Formative test (Version 1)

Formative test (Version 2)

SWOT analysis chart with four quadrants. Strengths include simple interface design and operation. Weaknesses highlight limited user control and lack of help links. Opportunities suggest using text input and speech recognition. Threats point out user trust decline, distraction, and reduced willingness to engage.

The SWOT analysis shows that while the prototype’s simple interface and operation reduce user input errors, it has weaknesses such as limited user control and a lack of support features. Opportunities include integrating text input and speech recognition to enhance user flexibility. However, threats such as reduced user trust and engagement could affect the study’s validity if these weaknesses are not addressed.

In the end, I chose prototype version 2 and applied the Wizard of Oz approach in the experiment.

Evaluation

Overview of a within-subject experiment involving 28 participants interacting with three chatbot prototypes: Mixed mode, Rules mode, and Textual mode. The study includes filling out four questionnaires on mental effort, perceived trust, and usefulness, along with a semi-structured interview. Tasks include identifying causes of symptoms, learning about habit-cause links, and decision-making on management advice.

For the evaluation, I used a within-subject experimental design in which 28 participants interacted with three chatbot prototypes: Mixed, Rules, and Textual mode. With each prototype, participants completed three tasks, covering both easy and difficult ones.

After the tasks, participants filled out four questionnaires assessing Mental Effort, Perceived Trust, Perceived Usefulness, and Comprehension. A wizard behind the scenes assisted the interaction to ensure the dialogue followed the designed map. This setup allowed me to closely monitor how each mode influenced the users’ experience and to gather data for analysis.

The evaluation used both quantitative and qualitative methods. Quantitatively, a one-way within-subjects design compared the three prototypes. Descriptive analysis included mean scores for Perceived Trust and Mental Effort, median scores for Perceived Usefulness, and contingency tables for Comprehension. Normality was checked with the Shapiro-Wilk test; depending on its p-value, the analysis used either a parametric test (ANOVA) or non-parametric tests (Friedman’s, Wilcoxon). Qualitatively, semi-structured interviews were analysed using an inductive, experiential approach in six steps to understand user interactions.
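
As an illustration of the non-parametric branch, the sketch below computes Friedman’s test for three related conditions using only the Python standard library. It is a simplified sketch, not the analysis script used in the study: the function names are hypothetical, the scores are made-up numbers, and no tie correction is applied (statistical packages such as SciPy’s `scipy.stats.friedmanchisquare` do apply one). With three conditions the test has 2 degrees of freedom, so the chi-square p-value reduces to `exp(-Q / 2)`.

```python
import math


def rank_row(values):
    """Rank one participant's scores (1 = lowest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def friedman_three_conditions(scores):
    """scores: one [textual, rules, mixed] row per participant.

    Returns (Q, p) without tie correction.
    """
    k = 3
    n = len(scores)
    if n == 0 or any(len(row) != k for row in scores):
        raise ValueError("expected a non-empty list of 3-score rows")
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    p = math.exp(-q / 2)  # chi-square survival function for df = k - 1 = 2
    return q, p


# Made-up mental-effort scores for four hypothetical participants.
demo = [[5, 4, 2], [6, 5, 3], [4, 3, 1], [7, 6, 2]]
q, p = friedman_three_conditions(demo)
print(f"Q = {q:.2f}, p = {p:.4f}")  # prints "Q = 8.00, p = 0.0183"
```

A significant Friedman result only says that at least one mode differs; pairwise follow-up comparisons, such as the Wilcoxon tests mentioned above, are needed to locate the difference.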

Overview of quantitative and qualitative analysis methods. Quantitative analysis includes descriptive (mean scores, median scores, contingency tables) and statistical analysis (Shapiro Wilk test, ANOVA, Friedman’s, Wilcoxon, McNemar tests). Qualitative analysis involves semi-structured interviews using phenomenology and thematic analysis with six steps, from familiarizing with data to creating an affinity map.

Key Results

A participant engaged in a usability testing session, using a tablet for interaction. In the background, a whiteboard with handwritten notes and diagrams is visible, indicating a brainstorming or analysis process.

Quantitative Results: The mixed mode outperformed both the textual and rules modes in Perceived Trust and Mental Effort, as indicated by lower mean scores and statistically significant differences. For Comprehension, the Rules mode was most effective, particularly for those with lower numerical skills, though these differences were not statistically significant.

Qualitative Analysis: Thematic analysis revealed four key themes: Users prefer simple, clear explanations for effective information retrieval. The mixed mode, blending structured logic and human interaction, was favored but can still cause information overload. Trust increased with clear, consistent explanations, and emotional engagement further boosted trust. Structured formats like ‘if-then’ statements helped reduce mental effort.

Discussion on RQ

The mixed mode ranked highest in trust due to its combination of sentiment analysis with structured statements, enhancing user confidence. For comprehension, the rules mode proved most effective, particularly for individuals with lower numerical skills, aligning with human learning theory. Regarding mental effort, the mixed mode was again preferred, as it reduced cognitive load by incorporating multiple text formats. Overall, the mixed mode offers a balanced approach between trust and mental effort, while the rules mode excels in enhancing comprehension.

A participant acting as the ‘wizard’ in a Wizard of Oz method, sitting at a table and interacting with a laptop screen. This setup is part of a simulated conversation test to evaluate chatbot responses.

Guidelines

Use Structured Information: Apply “IF-THEN” statements and highlight key points to improve clarity and reduce mental effort.
Balance Natural and Structured Interaction: Blend Textual and Rules modes, using natural language for simple information and structured responses for key details.
Increase Interactivity: Add prompts for diagnosing symptoms and offer flexible interactions to improve understanding.
Simplify Access: Use familiar platforms, quick-reply buttons, and personal explanations to make information easier to find.

Implications

Illustration of a laptop and smartphone connected by lines, symbolizing cross-device compatibility and interaction, representing the implications of designing responsive digital solutions.

Theoretical: The study supports cognitive load theory, showing that mixed mode improves trust and comprehension with less effort, and challenges the effectiveness of traditional textual modes.

For UX Designers: Prioritize mixed mode for complex tasks, balance emotional support with structured logic, and offer user control to enhance the experience.

Practical Use: The findings can guide the design of chatbots in telemedicine, health education, and assistive technologies for a more user-friendly experience.