Lecture 9: Qualitative Evaluation
Shengdong Zhao
Acknowledgement:
Some material in this lecture is from Saul Greenberg, Maneesh Agrawala, Scott Klemmer, Richard Davis, and others.
Used with permission.
The Design Process: Design → Prototype → Evaluate
[Koberg & Bagnall] Design Thinking Workshop
Naturalistic approach
•Observation occurs in a realistic setting
– real life
•Problems
– hard to arrange and do
– time consuming
– may not generalize
Usability engineering approach
Is the test result relevant to the usability of real products in real use outside the lab?
Problems
– non-typical users
– non-typical tasks
– different physical environment
– different social context
• experimenter vs. boss
•Partial Solution
– use real users
– use tasks from task-centered system design
– make the environment similar to the real situation
Discount usability evaluation
•Low cost methods to gather usability problems
– approximate: capture most large and many minor problems
•How?
– Qualitative:
• observe interactions
• gather explanations
• produces description
• anecdotes, transcripts, problem areas, critical incidents…
– Quantitative*
• count, log, measure user actions
• speed, error rate, counts of activities
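To make "count, log, measure" concrete, here is a minimal Python sketch (not from the lecture; all names are hypothetical) of logging user actions so speed and error rate can be computed afterwards:

import time

class ActionLog:
    """Minimal illustrative logger for user actions (hypothetical)."""
    def __init__(self):
        self.events = []  # list of (timestamp, action_name, is_error)

    def record(self, action, is_error=False):
        self.events.append((time.time(), action, is_error))

    def task_time(self):
        # seconds from first to last logged action
        if len(self.events) < 2:
            return 0.0
        return self.events[-1][0] - self.events[0][0]

    def error_rate(self):
        # fraction of logged actions flagged as errors
        if not self.events:
            return 0.0
        return sum(1 for _, _, err in self.events if err) / len(self.events)

# Example: log = ActionLog(); log.record("open_menu"); log.record("wrong_icon", is_error=True)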
Qualitative vs. Quantitative
This week: Qualitative (words). Next week: Quantitative (numbers).
Discount usability evaluation
•Methods
– inspection
– extracting the conceptual model
– direct observation
• think-aloud
• constructive interaction
• retrospective think-aloud
– query techniques (interviews and questionnaires)
– continuous evaluation (user feedback and field studies)
Inspection
• Designer tries the system (or prototype)
– does the system “feel right”?
– benefits
• catch major problems early
– problems
• not reliable
• not valid
• intuitions can be wrong
• Inspection methods
– task-centered walkthroughs
– heuristic evaluation
Heuristic Evaluation
Usability Heuristics
“Rules of thumb” describing features of usable systems
– Can be used as design principles
– Can be used to evaluate a design
Example: Minimize users’ memory load
Pros and cons
– Easy and inexpensive
• No users needed
• Catches many design flaws
– More difficult than it seems
• Not a simple checklist
• Cannot assess how well the interface will address user goals
Heuristic Evaluation
Developed by Jakob Nielsen and Rolf Molich (1990); revised by Nielsen (1994)
Original Heuristics
H1-1: Simple and natural dialog
H1-2: Speak the users’ language
H1-3: Minimize users’ memory load
H1-4: Consistency
H1-5: Feedback
H1-6: Clearly marked exits
H1-7: Shortcuts
H1-8: Precise & constructive error messages
H1-9: Prevent errors
H1-10: Help and documentation
Revised Heuristics
Also developed by Nielsen.
– Based on factor analysis of 249 usability problems
– A prioritized, independent set of heuristics
H2-1: Visibility of system status
H2-2: Match system and real world
H2-3: User control and freedom
H2-4: Consistency and standards
H2-5: Error prevention
H2-6: Recognition rather than recall
H2-7: Flexibility and efficiency of use
H2-8: Aesthetic and minimalist design
H2-9: Help users recognize, diagnose and recover from errors
H2-10: Help and documentation
Heuristic: Visibility (Feedback)
H2-1: Visibility of system status
[Screenshot: progress dialog reading "searching database for matches"]
Users should always be kept aware of what is going on!
Feedback: toolbar, cursor, ink
Heuristics (H2-2): Match System & World
Speak the users' language
•Example: withdrawing money at an ATM
•Use meaningful mnemonics, icons, and abbreviations
Heuristics (H2-3): Control & Freedom
Provide "exits" for mistaken choices: undo, redo
Don't force users down fixed paths …
Heuristics: Control & Freedom
• Mark exits: Users don’t like to be trapped!
• Strategies
– Cancel button (or Esc key) for dialog
• Make the cancel button responsive!
– Universal undo
Heuristics: Consistency
H2-4: Consistency and standards
Heuristics: Errors and Memory
H2-5: Error prevention
H2-6: Recognition rather than recall
– Make objects, actions, options, & directions visible or easily
retrievable
Heuristics: Errors and Memory
• Promote recognition over recall
– Recognition is easier than recall
• Describe expected input clearly
– Don’t allow for incorrect input
Heuristics: Flexibility
H2-7: Flexibility and efficiency of use
– Accelerators for experts (e.g., gestures, shortcuts)
• Example: the Edit menu lists Cut (Ctrl-X), Copy (Ctrl-C), Paste (Ctrl-V)
– Allow users to tailor frequent actions (e.g., macros)
Heuristics: Aesthetics
H2-8: Aesthetic and minimalist design
– No irrelevant information in dialogues
Heuristics: Help Users
H2-9: Help users recognize, diagnose, and recover
from errors
[Screenshot: good error messages, from Cooper's "About Face 2.0"]
Heuristics: Docs
H2-10: Help and documentation
– Easy to search
– Focused on the user’s task
– List concrete steps to carry out
– Not too long
The Process of Heuristic Evaluation
Phases of Heuristic Eval. (1-2)
1) Pre-evaluation training
2) Evaluation
– Individuals evaluate interface then aggregate results
– Work in 2 passes
• Overview -> Details
– Each evaluator produces list of problems
Phases of Heuristic Eval. (3-4)
3) Severity rating
– cosmetic << minor << major << catastrophic
4) Debriefing
– Discuss outcome
– Suggest solutions
– Assess difficulty to fix
Examples
Can’t copy info from one window to another
– Violates “User control and freedom” (H2-3)
– Violates “Recognition rather than recall” (H2-6)
– Violates “Flexibility and efficiency of use” (H2-7)
– Fix: allow copying
Typography uses mix of upper/lower case formats and fonts
– Violates “Consistency and standards” (H2-4)
– Slows users down
– Fix: pick a single format for entire interface
– Probably wouldn’t be found by user testing
Severity Rating
Used to allocate resources to fix problems
Estimates the need for further usability effort
Combination of
– Frequency
– Impact
– Persistence (one time or repeating)
Should be calculated after all evaluations are in
Should be done independently by all judges
Levels of Severity
0 - don’t agree that this is a usability problem
1 - cosmetic problem
2 - minor usability problem
3 - major usability problem; important to fix
4 - usability catastrophe; imperative to fix
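A minimal Python sketch (problem names and ratings are invented for illustration) of how these independent 0-4 ratings from several judges might be averaged and ranked to allocate fixing effort:

from statistics import mean

# Hypothetical problems, each rated independently on the 0-4 scale above
ratings = {
    "Save vs. Write file wording (H2-4)": [3, 4, 3],
    "Mixed upper/lower case typography (H2-4)": [1, 2, 1],
}

# Rank problems by mean severity across judges, most severe first
for problem, scores in sorted(ratings.items(),
                              key=lambda kv: mean(kv[1]), reverse=True):
    print(f"{mean(scores):.1f}  {problem}  (judges: {scores})")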
Severity Ratings Example
1. [H2-4 Consistency] [Severity 3] [Fix 0]
The interface used the string "Save" on the first screen for saving the user's file, but used the string "Write file" on the second screen. Users may be confused by this different terminology for the same function.
Debriefing
• Conduct with evaluators, observers, and development
team members
• Discuss general characteristics of UI
• Suggest improvements to address major usability problems
• Development team rates how hard things are to fix
• Make it a brainstorming session
– Hold criticism until the end of the session
Number of Evaluators
Single evaluator achieves poor results
– Only finds 35% of usability problems
– 5 evaluators find ~ 75% of usability problems
– Why not more evaluators? 10? 20?
• Adding evaluators costs more
• Many evaluators won’t find many more problems
But it always depends on the market for the product:
– for popular products, even small bugs carry high support costs
Decreasing Returns
[Graphs: problems found and benefits/cost as a function of the number of evaluators. Caveat: graphs are for a specific example.]
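The shape of these curves is often modeled (e.g., Nielsen & Landauer, 1993) as 1 - (1 - p)^i for i evaluators. A sketch, assuming each evaluator independently finds a fixed fraction p of problems; note the slide's own figures (35% for one, ~75% for five) do not fit a single p exactly, since real problems vary in how easy they are to find:

# Proportion of problems found by i independent evaluators, assuming each
# finds a fixed fraction p on their own (p = 0.31 is one value reported in
# the literature; the lecture's 35%/75% figures come from empirical data).
p = 0.31
for i in (1, 2, 3, 5, 10, 15):
    found = 1 - (1 - p) ** i
    print(f"{i:2d} evaluator(s): {found:.0%} of problems found")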
Summary
• Heuristic evaluation is a discount method
• Have evaluators go through the UI twice
• Have evaluators independently rate severity
• Combine the findings from 3 to 5 evaluators
• Discuss problems with design team
• Cheaper alternative to user testing
– Finds different problems, so good to alternate
In-class Exercise
Discount usability evaluation
•Methods
– inspection
– extracting the conceptual model
– direct observation
• think-aloud
• constructive interaction
• retrospective think-aloud
– query techniques (interviews and questionnaires)
– continuous evaluation (user feedback and field studies)
Conceptual model extraction
•How?
– show the user static images of
• the prototype or screens
– ask the user to explain
• the function of each screen element
• how they would perform a particular task
•What?
– Initial conceptual model (first time)
– Formative conceptual model (later)
•Value?
– good for eliciting people’s understanding before & after use
– poor for examining system exploration and learning
Direct observations
•Evaluator observes users interacting with system
– in lab:
• user asked to complete a set of pre-determined tasks
– in field:
• user goes through normal duties
•Value?
– excellent at identifying gross design/interface problems
– validity depends on how controlled/contrived the situation is
Simple observation method
•User is given the task
•Evaluator just watches the user
•Problem
– does not give insight into the user’s decision process or attitude
Think aloud method
Users speak their thoughts while doing the task
(“Hmm, what does this do? I’ll try it… Oops, now what happened?”)
•gives insight into what the user is thinking
•most widely used evaluation method in industry
•However:
– unnatural (awkward and uncomfortable)
– hard to talk while concentrating
– may alter the way users do the task
Initial Conceptual Model and Think Aloud Exercise
Techniques used:
• Conceptual Model
• Think Aloud
Problems of Think aloud method
However:
– unnatural (awkward and uncomfortable)
– hard to talk while concentrating
– may alter the way users do the task
(“Hmm, what does this do? I’ll try it… Oops, now what happened?”)
Constructive interaction method
•Two people work together on a task
– monitor their normal conversations
(“Oh, I think you clicked on the wrong icon.” “Now, why did it do that?”)
•Co-discovery learning
– use a semi-knowledgeable “coach” and a novice
– only the novice uses the interface
• novice asks questions
• coach responds
– gives insights into two user groups
RTA – Retrospective Think Aloud
•Users first complete the task and verbalize afterwards
•The process is observed and recorded with notes
•Benefits
– verbalizing at a higher level
– more relaxed
– fabrication is not a problem
Ref: Zhiwei Guan, Shirley Lee, Elisabeth Cuddihy, Judith Ramey
Comparing Eye-tracking Patterns
What you have learned
• Why do we need evaluation?
• Different stages where usability evaluation applies
• What does a usability room look like?
• A number of discount usability evaluation methods
– Inspection
– Initial Conceptual Model
– Direct observation
• Think aloud
• Constructive interaction method
• RTA (Retrospective Think Aloud)
Steps to Prepare and Conduct a User Study
Preparing for a User Test
• Objective: narrow or broad?
• Design the tasks
• Decide whether to record video/audio
• Choose the setting
• Representative users
User Test
• Roles:
– Greeter
– Facilitator: helps users to think aloud…
– Observers: record “critical incidents”
Critical Incidents
• Critical incidents are unusual or interesting
events during the study.
• Most of them are usability problems.
• They may also be moments when the user:
– got stuck,
– suddenly understood something, or
– said “that’s cool”, etc.
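A sketch of how observers might time-stamp critical incidents during a session (the structure and names below are my own, not from the lecture):

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Incident:
    time: datetime
    kind: str   # e.g., "stuck", "insight", "delight"
    note: str

@dataclass
class Session:
    participant: str
    incidents: list = field(default_factory=list)

    def log(self, kind, note):
        # time-stamp each incident as it is observed
        self.incidents.append(Incident(datetime.now(), kind, note))

# Example:
# s = Session("P03")
# s.log("stuck", "could not find Save on screen 2")
# s.log("delight", "said 'that's cool' when undo worked")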
The User Test
• The actual user test will look something like
this:
– Greet the user
– Explain the test
– Collect Demographic Information
– Get user’s signed consent
– Demo the system
– Run the test (maybe ½ hour)
– Post-Interview & Questionnaire
– Debrief
10 steps to better evaluation
1. Introduce yourself
– some background will
help relax the subject.
10 steps
2. Describe the purpose of the observation (in
general terms), and set the participant at ease
– You're helping us by trying out this product in its
early stages.
– If you have trouble with some of the tasks, it's the
product's fault, not yours. Don't feel bad; that's
exactly what we're looking for.
10 steps (contd.)
3. Tell the participant that it's okay to quit at
any time, e.g.:
– Although I don't know of any reason for this to happen,
if you should become uncomfortable or find this test
objectionable in any way, you are free to quit at any
time.
10 steps (contd.)
4. Talk about the equipment in the room.
– Explain the purpose of each piece of equipment
(hardware, software, video camera, microphones, etc.)
and how it is used in the test.
10 steps (contd.)
5. Explain how to “think aloud.”
– Explain why you want participants to think aloud, and
demonstrate how to do it. E.g.:
– We have found that we get a great deal of information
from these informal tests if we ask people to think aloud.
Would you like me to demonstrate?
10 steps (contd.)
6. Explain that you cannot provide help.
10 steps (contd.)
7. Describe the tasks and introduce the
product.
– Explain what the participant should do and in
what order. Give the participant written
instructions for the tasks.
– However, don’t demonstrate what
you’re trying to test.
10 steps (contd.)
8. Ask if there are any questions before you
start; then begin the observation.
10 steps (contd.)
9. Conclude the observation. When the test is
over:
– Explain what you were trying to find.
– Answer any remaining questions.
– Discuss any interesting behaviors you would like the
participant to explain.
10 steps (contd.)
10. Use the results.
– When you see participants making mistakes, you
should attribute the difficulties to faulty product
design, not to the participant.
Using the Results
• Update task analysis and rethink design
– Rate severity & ease of fixing problems
– Fix both severe problems & make the easy fixes
• Will thinking aloud give the right answers?
– Not always
– If you ask a question, people will always give an answer, even if it has nothing to do with the facts
– Try to avoid leading questions
Questions?
High-level summary:
• Follow a loose master-apprentice model
• Observe, but help the user describe what they’re
doing
• Keep the user at ease
How many users should you observe?
• Problems
– observing many users is expensive
– but individual differences matter
• best user 10x faster than slowest
• best 25% of users ~2x faster than slowest 25%
• Partial solution
– reasonable number of users
with reasonable range
– big problems usually detected with 3-5 users
– small problems / fine measures need many users
In-class Exercise
• Procedure
– Greet the user
– Explain the test
• (How to be consistent?)
– Collect demographic information (how?)
– Get user’s signed consent
– Demo the system
• (how to be consistent in demoing the system to different users?)
– Run the test (maybe ½ hour)
• (What test and how to conduct it?)
– Post-study questionnaire
• (What type of questions do you want to ask?)
– Debrief
• Work in pairs, and describe briefly what you will do in each step
What have we prepared?
Pre-Study Questionnaire
Example Questionnaires
http://oldwww.acm.org/perlman/question.html
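As a toy illustration only (the items below are invented; use the validated instruments at the URL above for real studies), a post-study Likert questionnaire might be administered and scored like this:

from statistics import mean

# Hypothetical 5-point Likert items for a post-study questionnaire
items = [
    "The system was easy to learn.",
    "I could recover from mistakes quickly.",
    "The terminology matched what I expected.",
]

def administer(questions):
    responses = []
    for q in questions:
        score = int(input(f"{q} (1=strongly disagree ... 5=strongly agree): "))
        responses.append(score)
    return responses

# responses = administer(items)
# print(f"Mean rating: {mean(responses):.1f}")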
A Simple Questionnaire
A More Comprehensive One
Even More Comprehensive
Next Time
• Quantitative Evaluation