Assessing Impact of Intelligibility on Understanding Context-Aware Applications.

January 18th, 2010

We sought to explore how much better participants could understand intelligent, decision-based applications when provided with explanations. In particular, we investigated differences in understanding and resulting trust when participants were provided with one of four types of explanations compared to receiving no explanations (None). Each explanation type answers one type of question:

  1. Why did the application do X?
  2. Why did it not do Y?
  3. How (under what condition) does it do Y?
  4. What if there is a change W, what would happen?

We showed participants an online abstracted application with anonymous inputs and outputs and asked them to learn how the application makes decisions by viewing 24 examples of its performance. The 158 participants recruited were divided evenly into groups: each of four groups received one of the four types of explanations, and one group received no explanation. We subsequently measured their understanding by testing whether they could predict missing inputs and outputs in 15 test cases, and by asking them to explain how they thought the application reasons. We also measured their level of trust in the application's output.

We found that participants who received Why and Why Not explanations understood and trusted the application better than those who received How To and What If explanations.

abbox -results


Motivation

… to be linked …

Method

To simulate a context-aware application, we defined a wearable device that detects whether a user is exercising, based on three sensed contexts of Body Temperature, Heart Rate, and Pace. It uses a decision tree to make decisions.
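
As a rough illustration of the kind of model described here, the following Python sketch hard-codes a small decision tree over the three sensed contexts. The thresholds, units, and branch order are hypothetical placeholders chosen for illustration; they are not the values used in the study.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    body_temperature: float  # degrees Celsius (assumed unit)
    heart_rate: float        # beats per minute (assumed unit)
    pace: float              # steps per minute (assumed unit)


def is_exercising(r: Reading) -> bool:
    """Walk a tiny hand-written decision tree; every threshold is hypothetical."""
    if r.pace > 100:
        # Fast pace: also require an elevated heart rate.
        return r.heart_rate > 110
    # Slow pace: require both a high heart rate and a raised body temperature.
    return r.heart_rate > 130 and r.body_temperature > 37.5


print(is_exercising(Reading(body_temperature=37.0, heart_rate=120, pace=140)))
print(is_exercising(Reading(body_temperature=36.8, heart_rate=70, pace=20)))
```

Each path from the root to a leaf is a conjunction of threshold tests, which is what makes a decision tree convenient to explain: the tests along the path taken are the reasons for the output.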

abbox - concrete - tree


Since we are concerned with the information the application provides rather than its user interface, we showed participants a simple “black box” user interface (UI) that displays only inputs and outputs.

abbox - concrete


We found that with the concrete domain information, users relied heavily on their prior knowledge of how a device could detect exercise and gained little from the various explanations the application provided. So we abstracted away the domain information and represented the inputs and outputs as anonymous symbolic labels. The following figure shows the UI with the anonymized model.

abbox - abstract


To allow participants to learn how the application makes decisions, we showed them the black box interface with 24 examples of different inputs and resulting outputs. Some participants received one of the four explanation types (Why, Why Not, How To, What If), and those in the baseline condition received no explanations (None). Why and Why Not explanations are simply textual, while How To and What If explanations require users to interact with the interface to create a query.

Why: Output classified as b because A = 5, and C = 3
Why Not: Output not classified as a because A = 5, but not C > 3
How To: (interactive, see figure)

abbox - howto

What If: (interactive, see figure)

abbox - whatif

None: No explanation provided
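
One way to picture how Why and Why Not explanations like the textual examples above could be generated is to evaluate a rule condition by condition and report which conditions held. The Python sketch below assumes a single hypothetical rule (output a when A > 4 and C > 3, otherwise b). The rule, the function names, and the non-interactive `how_to_a` and `what_if` helpers are illustrative assumptions, not the study's model, generator, or interfaces.

```python
import operator

# Hypothetical rule: output is "a" only if every condition holds, otherwise "b".
RULE_A = [("A", ">", 4), ("C", ">", 3)]
OPS = {">": operator.gt, "<": operator.lt}


def holds(cond, inputs):
    name, op, threshold = cond
    return OPS[op](inputs[name], threshold)


def classify(inputs):
    return "a" if all(holds(c, inputs) for c in RULE_A) else "b"


def why(inputs):
    """List the input values the rule tested to reach the actual output."""
    facts = ", and ".join(f"{name} = {inputs[name]}" for name, _, _ in RULE_A)
    return f"Output classified as {classify(inputs)} because {facts}"


def why_not_a(inputs):
    """Report which conditions for "a" held and which failed."""
    if classify(inputs) == "a":
        return "Output was classified as a"
    met = [f"{c[0]} = {inputs[c[0]]}" for c in RULE_A if holds(c, inputs)]
    failed = [f"not {c[0]} {c[1]} {c[2]}" for c in RULE_A if not holds(c, inputs)]
    parts = ([", ".join(met)] if met else []) + [" and ".join(failed)]
    return "Output not classified as a because " + ", but ".join(parts)


def how_to_a():
    """State the conditions under which the output would be "a"."""
    return "Output is a when " + " and ".join(f"{n} {op} {t}" for n, op, t in RULE_A)


def what_if(inputs, **changes):
    """Re-run the rule on a modified copy of the inputs."""
    return classify({**inputs, **changes})


example = {"A": 5, "B": 2, "C": 3}
print(why(example))           # Output classified as b because A = 5, and C = 3
print(why_not_a(example))     # Output not classified as a because A = 5, but not C > 3
print(how_to_a())             # Output is a when A > 4 and C > 3
print(what_if(example, C=4))  # a
```

Note that the Why Not text has to combine a satisfied condition with a negated one, while the Why text only lists values.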

Screenshot of an example during the learning phase showing an example explanation, and a note-taking text box (for participants’ convenience):

abbox - abstract - learning


After the learning phase, participants are tested on their understanding with a two-part quiz. First, they answer 15 test cases in which at least one input or output is blank, and they fill in the value they think should fit.
Second, they are shown three complete examples and are asked to explain how they think the application decided on the outputs.
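
To make the fill-in-the-blanks idea concrete, here is a minimal Python sketch of one plausible way to check answers: complete the test case with the participant's value and see whether it agrees with the model. The `model` rule, the field names, and the scoring function are assumptions for illustration only (one blank per case is assumed); the study's actual scoring procedure is not described here.

```python
def model(a: int, c: int) -> str:
    """Hypothetical stand-in for the application's decision rule."""
    return "a" if a > 4 and c > 3 else "b"


def is_consistent(case: dict, answer) -> bool:
    """Fill the blank (None) field with `answer` and check it against the model."""
    filled = {k: (answer if v is None else v) for k, v in case.items()}
    return model(filled["A"], filled["C"]) == filled["output"]


def score(cases, answers) -> float:
    """Fraction of test cases whose completed form agrees with the model."""
    return sum(is_consistent(c, a) for c, a in zip(cases, answers)) / len(cases)


cases = [
    {"A": 5, "C": 3, "output": None},     # blank output
    {"A": 5, "C": None, "output": "a"},   # blank input
    {"A": None, "C": 6, "output": "b"},   # blank input
]
# The first two answers agree with the hypothetical model; the third does not.
print(score(cases, ["b", 4, 7]))
```

Checking consistency rather than comparing against a single key matters when an input is blank, because several input values can lead to the same output.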


Fill-in-the-Blanks Test

abbox - abstract - quiz

Reasoning Test

abbox - abstract - quiz2

Measures

We are interested in measuring the extent of the participants’ understanding, and how that impacts their task (quiz) performance and trust of the application output.

Performance: quiz answer correctness; quiz completion time
Understanding: posited reasons; Why and Why Not reasoning; mental model of how the application decides
Trust: reported trust

User reasoning and mental models were coded with the following coding scheme:

 
Guess / Unintelligible: No reason given, guessed, or reason incoherent
Some Logic: E.g. inputs odd/even, one input largest
Inequality: Mentioned inequalities, but wrong values or relations
Partially Correct: At least one wrong or extraneous inequality
Fully Correct: Two correct inequalities

Participants

We recruited 53 participants for Experiment 1 and 158 participants for Experiment 2 from Amazon Mechanical Turk.

Results

For the concrete application about exercise detection, the results indicate that participants understood the application's decisions better when provided with explanations. However, there were no significant differences across explanation types.


abbox -results - correctness - concrete


abbox -results - correctness reasons - concrete


When more explanation types were tested with the abstract application, participants given Why and Why Not explanations understood the application better than those given How To and What If explanations, and their trust followed the same pattern.



abbox -results - correctness


abbox - results - correctness reasons


abbox - results - understanding


abbox - results - trust


abbox - abstract - learning


Discussion

Why vs. Why Not

Examining the reasons users gave, we found that automatically generated Why explanations allowed users to understand more precisely how the system functions for individual instances than Why Not explanations did. Why Not participants tended to learn only part of the reasoning trace, and treated the two rules separately rather than associating them. This failure in rule conjunction could be due to the negative wording (i.e. “but” and “not”) in the Why Not explanation. Understanding a Why Not explanation and forming such a rule conjunction takes more mental effort than participants in the Why condition had to expend, which could explain the differences we observed.

Why & Why Not vs. How To & What If

We believe participants did not perform as well with the interactive explanation facilities because they were unfamiliar with them and the interfaces were complex. Some participants indicated that they did not understand how to use them.

Our results suggest that developers should provide Why explanations as the primary form of explanation and Why Not as a secondary form, if provided. Our results may suggest the ineffectiveness of How To and What If explanations, but these intelligibility types may be more useful for other types of tasks, particularly those relating to figuring out how to execute certain system functionality, rather than interpreting or evaluating.

Impact of Prior Knowledge

We found in Experiment 1 (concrete application) that participants formed less accurate and precise mental models of the system, compared to those in Experiment 2 (abstract). This could be due to participants applying their prior knowledge of exercising to understanding how the system works and not paying careful attention to the explanations, as evidenced by the reasons they provided.

From the Lab to the Real World

Even though Why Not explanations were not as effective as Why explanations, we believe such explanations would be important with real-world applications. In reality, users would ask Why questions when they lack an understanding of how the application works, but Why Not questions when they expect certain results that the application did not produce.

Implications for Context-Aware Applications

(See publication).

Publication

Lim, B. Y., Dey, A. K., Avrahami, D. 2009. Why and Why Not Explanations Improve the Intelligibility of Context-Aware Intelligent Systems. In Proceedings of the 27th international Conference on Human Factors in Computing Systems (Boston, MA, USA, April 04 - 09, 2009). CHI '09. ACM, New York, NY, 2119-2128. DOI=10.1145/1518701.1519023. (Nominated for Best Paper).
