Evaluation
Why Evaluate?
• In HCI we evaluate interfaces and
systems to:
– Determine how usable they are for different
user groups
– Identify good and bad features to inform
future design
– Compare design choices to assist us in
making decisions
– Observe the effects of specific interfaces on
users
Why now?
Evaluation is a key component of HCI
Evaluation is a process, not an event
Design ideas come from evaluating existing technologies
Making things better starts with evaluation
Evaluation Methods
• Inspection methods (no users needed!)
– Heuristic evaluations
– Walkthroughs
– Other Inspections
• User Tests (users needed!)
– Observations/Ethnography
– Usability tests/ Controlled Experiments
Heuristic Evaluation
• Heuristic evaluation (what is it?)
– Method for finding usability problems
– Popularised by Jakob Nielsen
• “Discount” usability engineering
– Use with working interface or scenario
– Convenient
– Fast
– Easy to use
Heuristic Evaluation
• Systematic inspection to see if the interface complies with guidelines
• Method
– 3-5 inspectors
– usability engineers, end users, double experts…
– inspect interface in isolation (~1–2 hours for simple
interfaces)
• compare notes afterwards
– single evaluator only catches ~35% of usability problems,
5 evaluators catch 75%
• Works for paper, prototypes, and working systems
Points of Variation
• Evaluators
• Heuristics used
• Method employed during inspection
Evaluators
• These people can be novices or experts
– “novice evaluators”
– “regular specialists”
– “double specialists” (- Nielsen)
• Each evaluator finds different problems
• The best evaluators find both hard and easy
problems
Heuristics
• Heuristics are rules that are used to
inform the inspection…
• There are many heuristic sets
Nielsen's Heuristics
Visibility of system status
Match between system & real world
User control and freedom
Consistency & standards
Error prevention
Recognition rather than recall
Flexibility & efficiency of use
Aesthetic and minimalist design
Help users recognise, diagnose, and recover from errors
Help & documentation
Example 1. Visibility of system status
What is “reasonable time”?
0.1 sec: Feels immediate to the user. No
additional feedback needed.
1.0 sec: Tolerable, but doesn’t feel
immediate. Some feedback needed.
10 sec: Maximum duration for keeping
user’s focus on the action.
For longer delays, use % done progress
bars.
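As a rough illustration only, a minimal Python sketch of how these thresholds might drive feedback in code; the constant names, the long_task stand-in and the timings are assumptions of the sketch, not part of the slides:

import time

# Thresholds from the slide (response-time limits popularised by Nielsen).
IMMEDIATE = 0.1   # feels instantaneous to the user: no extra feedback needed
FLOW = 1.0        # noticeable but tolerable: a simple "working..." cue is enough
                  # beyond ~10 s, a percent-done progress bar is needed

def long_task(steps=20):
    """Stand-in for a slow operation that reports its own progress."""
    for i in range(steps):
        time.sleep(0.05)              # pretend to do some work
        yield (i + 1) / steps         # fraction completed so far

def run_with_feedback(expected_seconds):
    """Choose a feedback style based on how long the task is expected to take."""
    if expected_seconds <= IMMEDIATE:
        list(long_task())             # fast enough: no feedback needed
    elif expected_seconds <= FLOW:
        print("Working...")           # lightweight busy indicator
        list(long_task())
    else:
        for fraction in long_task():  # long delay: show % done
            print(f"\r{fraction:.0%} done", end="", flush=True)
        print()

run_with_feedback(expected_seconds=12)   # prints a percent-done progress line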
Example 2. Consistency & Standards
Example 3. Aesthetic and minimalist
design
Phases of a heuristic evaluation
1. Pre-evaluation training – give evaluators
needed domain knowledge and information
on the scenario
2. Evaluate interface independently
3. Rate each problem for severity
4. Aggregate results
5. Debrief: Report the results to the interface
designers
Severity ratings
Each evaluator rates individually:
0 - don’t agree that this is a usability problem
1 - cosmetic problem
2 - minor usability problem
3 - major usability problem; important to fix
4 - usability catastrophe; imperative to fix
Consider both impact and frequency.
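One way the aggregation step (phases 3-4) might look in practice, as a minimal sketch; the problem names and scores below are invented for illustration:

from statistics import mean

# Hypothetical ratings: each evaluator scores each problem 0-4 independently.
ratings = {
    "No undo on delete":         [4, 3, 4],
    "Inconsistent button order": [2, 2, 1],
    "Jargon in error message":   [3, 2, 3],
}

# Aggregate: average severity per problem, sorted worst-first for the debrief.
aggregated = sorted(((mean(scores), problem) for problem, scores in ratings.items()),
                    reverse=True)

for severity, problem in aggregated:
    print(f"{severity:.1f}  {problem}")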
Styles of Heuristic evaluation
Problems found by a single inspector
Problems found by multiple inspectors
Individuals vs. teams
Goal or task?
Structured or free exploration?
Problems found by a single inspector
Average over six case studies
35% of all usability problems;
42% of the major problems
32% of the minor problems
Not great, but finding some problems with one
evaluator is much better than finding no
problems with no evaluators!
Problems found by a single inspector
Varies according to
difficulty of the interface being evaluated
the expertise of the inspectors
Average problems found by:
novice evaluators - no usability expertise - 22%
regular specialists - expertise in usability - 41%
double specialists - experience in both usability and the
particular kind of interface being evaluated – 60%
also find domain-related problems
Tradeoff
novices poorer, but cheaper!
Problems found by multiple
evaluators
3-5 evaluators find 66-75% of usability problems
different people find different usability problems
only modest overlap between the sets of problems found
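The diminishing-returns pattern behind these figures can be sketched with a simple independence model (the independence assumption is this sketch's, not the slides'); lam is the single-evaluator detection rate, taken as ~0.35 here:

# Back-of-envelope model for reasoning about how many evaluators to use.
def proportion_found(n_evaluators, lam=0.35):
    """Expected share of problems found by n independent evaluators."""
    return 1 - (1 - lam) ** n_evaluators

for n in (1, 3, 5, 10):
    print(n, f"{proportion_found(n):.0%}")
# With lam = 0.35 this gives roughly 35%, 73%, 88%, 99% -- the exact figures
# depend on lam, which varies with interface difficulty and evaluator skill.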
Individuals vs. teams
Nielsen
recommends individual evaluators inspect the
interface alone
Why?
evaluation is not influenced by others
independent and unbiased
greater variability in the kinds of errors found
no overhead required to organize group meetings
Self Guided vs. Scenario Exploration
Self-guided
open-ended exploration
Not necessarily task-directed
good for exploring diverse aspects of the interface, and to
follow potential pitfalls
Scenarios
step through the interface using representative end user
tasks
ensures problems are identified in relevant portions of the interface
ensures that specific features of interest are evaluated
but limits the scope of the evaluation - problems can be
missed
How useful are they?
Inspection methods are discount methods
for practitioners. They are not rigorous
scientific methods.
All inspection methods are subjective.
No inspection method can compensate for
inexperience or poor judgement.
Using multiple analysts results in an inter-subjective
synthesis.
How useful are multiple analysts?
However, this also
a) raises the false alarm rate, unless a voting
system is applied
b) reduces the hit rate if a voting system is applied!
Group synthesis of a prioritized problem list seems
to be the most effective current practical approach.
Ethnography
Observation of users in their natural
environment e.g. where the product is
used
Can lead to insight into
Problems (amount and significance) in
interaction
Ideas for solutions
– http://www.youtube.com/watch?v=vbx739sIS00
A bit like a professional stalker/ interviewer
Ethnography
Examples of data collected
Conversations and semi-structured interviews
Researcher observations and answers to questions
Descriptions of activities or environments
Memos and notices in the environment
User stories
Ethnography
Benefits
High ecological validity
Great for identifying how design fits into the “real
world”
Drawbacks
Lack of control in design
Data can be tricky and cumbersome to analyse
Video and audio coding, etc.
Fluidity of interpretation
Information free-for-all
Controlled Experiments/ User
Studies
A more scientific method
Control is key
Reduction of confounds
Aim to investigate hypotheses about how
the designs affect:
User Performance (Time or Error rate)
Satisfaction
Emotions/other psychological constructs
Pre-defined task/goal
Controlled Experiments/ User
Studies
Comparison of design solutions
Results can feedback into redesign
Typically termed usability engineering
Robust study design
Randomisation/Counterbalancing
Ensures effect is due to the manipulation of
your independent variable
Example: A/B testing
Two minor variants of a web page
Show design A to every even-numbered
visitor to the web site
Show design B to every odd-numbered visitor
Monitor the site to see which has the higher
dwell rate/click-through rate
Choose the better design
Repeat (a minimal sketch follows)
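A sketch of the assignment and comparison logic; the visitor counts and click probabilities below are invented for illustration, and in practice the click data would come from site logs or an analytics tool:

from random import random

# Minimal A/B sketch: assign by visitor-number parity, then compare
# click-through rates (CTR) for the two designs.
clicks = {"A": 0, "B": 0}
visits = {"A": 0, "B": 0}

def click_probability(design):
    # Stand-in for real user behaviour; hypothetical values.
    return 0.12 if design == "A" else 0.10

for visitor_id in range(10_000):
    design = "A" if visitor_id % 2 == 0 else "B"   # even -> A, odd -> B
    visits[design] += 1
    if random() < click_probability(design):
        clicks[design] += 1

for design in ("A", "B"):
    print(design, f"CTR = {clicks[design] / visits[design]:.3f}")
# Keep whichever design has the reliably higher rate, redeploy, and repeat.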
Good news
Google can do this for you
https://support.google.com/analytics/bin/answer.py?hl=en&answer=1745147&topic=1745207&ctx=topic
Variables in Controlled Experiments
Independent variables (IVs)
Variables controlled by the experimenter
Design option
Interaction at Time 1 and Time 2
Dependent variables (DVs)
Variables being observed
Completion time (for efficiency)
Satisfaction Measure (SUMI)
Types of Experiment Design
Between-subjects
Within-subjects
Benefits and drawbacks
This will link to how you analyse your
data (more about this later)
Between-subjects positives: independent groups; no practice/experience effect
Between-subjects negatives: individual abilities affect the data (although this can be minimised by random allocation to conditions); heavy need for participants for a valid experiment
Within-subjects positives: takes individual differences into account; fewer participants needed for robust statistics
Within-subjects negatives: practice effect (although this can be minimised by counterbalancing of conditions)
The ecological validity conundrum
Controlled experiments are useful
Causal inference
Specificity of effect (sort of)
Replicable and robust
But are they realistic?
Artificiality of scenario/lab environment
Hawthorne effect
Do they hinder creative design?
We can never tell whether a variable is influenced by something we haven't measured; in fact it is likely, e.g. individual differences in users' cognitive ability or personality, but random allocation of users to conditions helps with this.
An Example
Designing IT devices
for health
professionals
Is this a good
environment to test in
for this device?
Probably not….
Increasing ecological validity in
experiments
Use representative participants
Make the environment as realistic as
possible
Make the tasks and scenario as realistic as
possible
Which is the most valid method?
Triangulation is key, and some methods will be more valid in certain scenarios: if you already have designs you want to test, experiments might be good, but at an early stage inspection methods or observations may be better.
It also depends on whether you want to be theoretical, i.e. see the effect of interfaces on users (in which case the psychological methods of controlled experiments will give you sound scientific data), or want to design a product where causal inference may not be so important.
Dependent on constraints (time/budget)
Statistics for evaluation
Data Types
Quantitative
Interval/Ratio
Temperature, height, weight, questionnaire scale
(?)
Qualitative
Ordinal/Nominal
The ranked rating of 3 interfaces
Number of times an option is selected
Data Analysis
Your data type will influence how you
analyse your data
Parametric- Interval/Ratio
Non Parametric- Ordinal/Nominal
Study design will also affect analysis
Between or Within Subjects Analysis
Correlation Analysis
Statistical Assumptions
Very important and again will influence
your analysis
The most important of these needs to be demonstrated: normality of the data……
[Figure: normal (bell curve) distribution, with labels "Tall", "Medium Height", "Smaller". Caption: "For whom the bell (curve) tolls…."]
Other assumptions of parametric
analysis
Interval/Ratio data
Equality of variance/ Sphericity
Depends on study design
Independence of data
Depends on study design
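A quick way to check the key normality assumption before choosing parametric tests, sketched on made-up completion-time data; the Shapiro-Wilk test and the scipy tooling are this sketch's choices, not named on the slides:

import numpy as np
from scipy import stats

# Hypothetical completion times (seconds) from a usability test.
rng = np.random.default_rng(42)
times = rng.normal(loc=30, scale=5, size=25)

# Shapiro-Wilk: a common check of the normality assumption.
statistic, p = stats.shapiro(times)
print(f"W = {statistic:.3f}, p = {p:.3f}")
# p > 0.05 -> no evidence against normality, so parametric tests are reasonable;
# p < 0.05 -> consider non-parametric alternatives or a transformation.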
Help….my data meets none of these!
Non-parametric analysis should be used
But….
Less power than parametric
Lose quantity differences when comparing
measures
Ranked data
Statistical Significance
What does it mean?
The probability that a difference/relationship between the groups/variables at least this large would occur by chance alone
Conventional levels
p<0.05, p<0.01, p<0.001
Infer strength of relationship
Available tests
Correlation analysis (Pearson’s r)
Linear relationship between two continuous variables
Pearson’s r= strength of that relationship
+ or - = Direction
No causality, only relationship!
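As a minimal sketch with invented satisfaction/completion-time data (using scipy, an assumption about tooling):

from scipy import stats

# Hypothetical data: satisfaction score vs. task completion time per participant.
satisfaction = [6.5, 5.0, 7.0, 4.5, 6.0, 3.5, 5.5, 4.0]
seconds      = [30,  45,  28,  52,  35,  60,  40,  55]

r, p = stats.pearsonr(satisfaction, seconds)
print(f"r = {r:.2f}, p = {p:.3f}")
# r near -1: strong negative linear relationship (slower tasks, lower satisfaction);
# it says nothing about which variable causes the other.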
Student t-test
Compares means of 2 groups on the DV to see if they
are significantly different
E.g. Interface 1 vs Interface 2
Between (independent) or Within (dependent) t-tests
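A sketch of both flavours on invented timing data; which test applies depends on whether the study was between- or within-subjects:

from scipy import stats

# Hypothetical completion times (seconds) for two interface designs.
interface_1 = [31, 28, 35, 30, 33, 29, 34, 32]
interface_2 = [38, 36, 41, 35, 39, 37, 40, 36]

# Between-subjects (different users in each group): independent t-test.
t_ind, p_ind = stats.ttest_ind(interface_1, interface_2)

# Within-subjects (same users tried both designs): paired/dependent t-test.
t_rel, p_rel = stats.ttest_rel(interface_1, interface_2)

print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
print(f"paired:      t = {t_rel:.2f}, p = {p_rel:.4f}")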
Available tests
ANOVA
Compares means of 3 or more groups on the
DV to see if they are significantly different
Between, Within and Mixed
Interaction Effects
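A between-subjects one-way ANOVA sketch on invented data; within-subjects and mixed designs need repeated-measures machinery not shown here:

from scipy import stats

# Hypothetical completion times (seconds) for three interface designs,
# with different participants in each group (between-subjects).
design_a = [31, 28, 35, 30, 33]
design_b = [38, 36, 41, 35, 39]
design_c = [30, 29, 32, 31, 28]

f, p = stats.f_oneway(design_a, design_b, design_c)
print(f"F = {f:.2f}, p = {p:.4f}")
# A significant p only says the group means are not all equal; follow-up
# (post-hoc) comparisons are needed to find which designs actually differ.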
The Importance of N
The number of participants (N) is important
Effect size/Statistical Power
Central limit theorem and normality of data
Reduces effects of outliers on statistics
Representative sample
Nielsen’s 5 = bad stats if used for experiments
Why?
Hello Participants!!
Poor generalisability from these sets of users: where would they fit on the normal distribution?
The Importance of Test Focus
Family-wise error rate
As you increase the number of tests on the
data, the chance of a false positive
(Type 1 error) increases
Keep sight of what you are measuring
E.g. Spurious correlations (Long hair and IQ)
With lots of tests (e.g. Correlation matrix)
the strength of effect is important
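A small sketch of why this matters, with the Bonferroni correction as one common (if conservative) fix; the calculation assumes the tests are independent, which is an idealisation:

# How the chance of at least one false positive grows with the number of tests.
alpha = 0.05

for n_tests in (1, 5, 10, 20):
    familywise = 1 - (1 - alpha) ** n_tests      # P(at least one Type 1 error)
    bonferroni = alpha / n_tests                 # per-test threshold to keep ~0.05
    print(f"{n_tests:2d} tests: familywise ~ {familywise:.2f}, "
          f"Bonferroni cutoff = {bonferroni:.4f}")
# 20 tests at p < 0.05 already give a ~64% chance of a spurious "significant" result.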
What we have covered today
Evaluation methods
No users needed (e.g. Heuristic Eval, Cognitive
Walkthrough)
Users needed (e.g. Ethnography, Experiments)
Comparative validity of these methods
Statistics in evaluation
Data types
Assumptions
Tests
Critical aspects of analysis design
Some Resources
Methods
Book: Cairns & Cox (2008) Research Methods for Human-Computer Interaction.
(Also covered in all good HCI texts)
Jakob Nielsen’s Alertbox Site
www.useit.com/alertbox/
Statistics
Andy Field’s Statistics Hell Site
www.statisticshell.com - actually more heaven than
hell