AMEE Guide 49 - OSCE Evaluation
AMEE GUIDE
Assessment 49
The Guides are designed for use by individual teachers to inform their practice and can be used to support
staff development programmes.
‘Living Guides’: An important feature of this new Guide series is the concept of supplements, which
will provide a continuing source of information on the topic. Published supplements will be available for
download.
If you would like to contribute a supplement based on your own experience, please contact the Guides
Series Editor, Professor Trevor Gibbs (tjg.gibbs@gmail.com).
Supplements may comprise either a ‘Viewpoint’, when you communicate your views and comments on
the Guide or the topic more generally, or a ‘Practical Application’, where you report on implementation
of some aspect of the subject of the Guide in your own situation. Submissions for consideration for inclusion
as a Guide supplement should be a maximum of 1,000 words.
Institution/Corresponding address:
Godfrey Pell, Principal Statistician, Medical Education Unit, Leeds Institute of Medical Education,
Worsley Building, University of Leeds, Leeds LS2 9JT, UK
Tel: +44 (0)113 23434378
Fax: +44 (0)113 23432597
Email: G.Pell@leeds.ac.uk
The authors:
Godfrey Pell is a Senior Statistician who has a strong background in management. Before joining the
University of Leeds he was with the Centre for Higher Education Practice at the Open University. Current
research includes standard setting for practical assessment in higher education, and the value of short
term interventionist programmes in literacy.
Richard Fuller is a Consultant Physician, and Director of the Leeds MB ChB undergraduate degree
programme within the Institute of Medical Education. His research interests include clinical assessment,
in particular monitoring and improving the quality of the OSCE.
Matthew Homer is a Research Fellow at the University of Leeds, working in both the Schools of Medicine
and Education. He works on a range of research projects and provides general statistical support to
colleagues. His research interests include the statistical side of assessment, particularly related to OSCEs.
Trudie Roberts is a Consultant Physician, a Professor of Medical Education and is the Director of the
Leeds Institute of Medical Education. Her research interests include clinical assessment.
© AMEE 2011
ISBN: 978-1-903934-62-3
Guide 49: How to measure the quality of the OSCE: A review of metrics
Contents
Abstract
Introduction
Conclusions
References
Abstract
With an increasing use of criterion based assessment techniques in both
undergraduate and postgraduate healthcare programmes, there is a
consequent need to ensure the quality and rigour of these assessments.
The obvious question for those responsible for delivering assessment is how
is this ‘quality’ measured, and what mechanisms might there be that allow
improvements in assessment quality over time to be demonstrated? Whilst a
small base of literature exists, few papers give more than one or two metrics.
In this Guide, aimed at assessment practitioners, the authors aim to review the
metrics that are available for measuring quality and indicate how a rounded
picture of OSCE assessment quality may be constructed by using a variety of
such measures, and also to consider which characteristics of the OSCE are
appropriately judged by which measure(s). The authors will discuss the quality
issues both at the individual station level and across the complete clinical
assessment as a whole, using a series of ‘worked examples’ drawn from OSCE
data sets from the authors’ institution.
• When judging the quality of an OSCE, it is very important to employ more than
one metric to gain an all-round view of the assessment quality.
• The routine use of metrics within OSCE quality improvement allows a clear
method of measuring the effects of change.
Introduction
With increasing scrutiny of the techniques used to support high level decision
making in academic disciplines, Criterion Based Assessment (CBA) delivers a
reliable and structured methodological approach. As a competency-based
methodology, CBA allows the delivery of ’high stakes’ summative assessment
(e.g. qualifying level or degree level examinations), and the demonstration
of high levels of both reliability and validity. This assessment methodology is
attractive, with a number of key benefits over more ’traditional’ unstructured
forms of assessment (e.g. viva voce) in that it is absolutist, carefully
standardised for all candidates, and assessments are clearly designed and
closely linked with performance objectives. These objectives can be clearly
mapped against curricular outcomes, and where appropriate, standards
laid down by regulatory and licensing bodies, that are available to students
and teachers alike. As such, CBA methodology has seen a wide application
beyond summative assessments, extending into the delivery of a variety of
work-based assessment tools across a range of academic disciplines (Norcini,
2007; Postgraduate Medical Education and Training Board, 2009). CBA is
also now being used in the UK in the recruitment of junior doctors, using a
structured interview similar to that used for selecting admissions to higher
education programmes (Eva et al., 2004).
Understanding quality in OSCE assessments
– General Principles
This Guide will examine the metrics available, using final year OSCE results
from recent years as exemplars of how exactly these metrics can be
employed to measure the quality of the assessment. It is important to
recognise that a review of the OSCE metrics is only part of the overall process
of reviewing OSCE quality – which needs to embrace all relationships in the
wider assessment process (Figure 1).
Figure 1
OSCE quality assurance and improvement – a complex process
[Diagram of interlinked elements: curriculum blueprinting & assessment innovation; staff development & examiner training; reviewing poor metrics – assessing causes, modelling solutions; simulation – patients & technology; standard setting; institutional engagement – oversight and strategy.]
Which method of standard setting?
The method of standard setting will determine the metrics available for use
in assessing quality. Standards can be relative (e.g. norm referenced) or
absolute, based either on the test item (Ebel & Angoff), or the performance
of the candidate (borderline methods). With the requirement for standards
to be defensible, evidenced and acceptable (Norcini, 2003), absolute
standards are generally used. Whilst all methods of standard setting will
generate a number of post-hoc metrics (e.g. station pass rates, fixed
effects (time of assessment, comparison across sites) or frequency of mark
distribution), it is important to choose a method of standard setting that
generates additional quality measures. At present, a large number of
institutions favour borderline methods, but only the regression method will give
some indication of the relationship between global grade and checklist score
and also the level of discrimination between weaker and stronger students.
Table 1 highlights the key differences between different borderline methods,
and what they contribute to assessment metrics.
Table 1
Comparison of the borderline methods of standard setting
The authors favour the borderline regression method because it uses all
the assessment interactions between assessors and candidates, and these
interactions are ‘real’. It is objectively based on pre-determined criteria, using
a large number of assessors and generates a wide range of metrics.
• Students who perform very badly and obtain a near zero checklist score.
• Students who achieve a creditable checklist score but who fail to impress
the assessor overall.
How to generate station level quality metrics
Table 2 details a ‘standard’ report of metrics from a typical OSCE (20 stations
over two days, total testing time ~ three hours, spread over four examination
centres). This typically involves ~250 candidates, 500 assessors and 150
simulated patients, and healthy patient volunteers with stable clinical signs
(used for physical examination). Candidates are required to meet a passing
profile comprising an overall pass score, a minimum number of stations
passed (preventing excessive compensation, and adding fidelity to the
requirement for a competent ‘all-round’ doctor) and a minimum number
of acceptable patient ratings. Assessors complete an item checklist, and
then an overall global grade (the global grades in our OSCEs are recorded
numerically as 0=Clear fail, 1=Borderline, 2=Clear pass, 3=Very good pass,
4=Excellent pass).
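For readers implementing a similar profile, the sketch below (Python, with invented threshold values; the real cut scores come from standard setting and institutional policy) shows the conjunctive nature of such a passing rule: a candidate must satisfy every element, not merely the overall score.

```python
# Illustrative sketch of a conjunctive OSCE passing profile.
# The threshold values below are hypothetical, not the authors' actual standards.

def passes_osce(total_score, stations_passed, acceptable_sp_ratings,
                pass_score=55.0, min_stations=14, min_sp_ratings=16):
    """Return True only if all three elements of the passing profile are met."""
    return (total_score >= pass_score
            and stations_passed >= min_stations
            and acceptable_sp_ratings >= min_sp_ratings)

# A candidate with a good overall score can still fail the profile
# if too few individual stations were passed.
print(passes_osce(total_score=61.2, stations_passed=12, acceptable_sp_ratings=18))  # False
```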
The borderline regression method was used for standard setting (Pell &
Roberts, 2006). Typically such an OSCE will generate roughly 60,000 data
items (i.e. individual student-level checklist marks), which form a valuable
resource for allowing quality measurement and improvement. As a result
of utilising such data, we have seen our own OSCEs deliver progressively
more innovation, whilst simultaneously maintaining or improving the levels of
reliability.
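As a rough illustration of how a borderline regression cut score is commonly derived (a sketch of the general technique, not necessarily the authors' exact implementation), the station checklist scores are regressed on the 0–4 global grades and the pass mark is read off at the borderline grade of 1. The data below are invented.

```python
import numpy as np

def borderline_regression_cut_score(checklist, global_grade, borderline_grade=1.0):
    """Regress checklist score on global grade and return the predicted
    checklist score at the borderline grade, i.e. the station pass mark."""
    slope, intercept = np.polyfit(global_grade, checklist, deg=1)
    return intercept + slope * borderline_grade

# Invented example data for one station: grades 0-4, checklist marked out of 30.
grades = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 4])
scores = np.array([9, 14, 16, 19, 21, 20, 24, 25, 27, 29])
print(round(borderline_regression_cut_score(scores, grades), 2))
```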
Under any of the borderline methods of standard setting, where a global
grade is awarded in addition to the checklist score, accompanying metrics
are useful in measuring the quality of the assessments. For other types of
standard setting, where such a global grade does not form part of the
standard setting procedure (e.g. Ebel & Angoff), inter-grade discrimination and
coefficient of determination (R2) will not apply (Cusimano, 1996).
Table 2
Final year OSCE Metrics
Columns: Station; Cronbach’s alpha if item deleted; R2; Inter-grade discrimination; Number of failures; Between-group variation (%). Number of candidates = 241. [Station-level values not reproduced.]
Metric 1 – Cronbach’s Alpha
A selection of these overall summary metrics will be used in this Guide to
illustrate the use of psychometric data ‘in action’, and to outline approaches
to identifying and managing unsatisfactory station-level assessment
performance. We have chosen older OSCE data to illustrate this Guide, to
highlight quality issues, and subsequent actions and improvements.
Since alpha tends to increase with the number of items in the assessment, the
resulting ‘alpha if item deleted’ scores should all be lower than the overall
alpha score if the item/station has performed well. Where this is not the case,
this may be caused by any of the following reasons (a worked sketch follows the list):
• The item is measuring a different construct from the rest of the set of items.
• There are teaching issues – either the topic being tested has not been well
taught, or has been taught to a different standard across different groups
of candidates.
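The sketch below illustrates the ‘alpha if item deleted’ check on a simple candidates-by-stations score matrix. The data are simulated and the functions are illustrative only; any statistics package will report the same quantities.

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = candidates, columns = stations."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def alpha_if_item_deleted(scores):
    """Overall alpha recomputed with each station removed in turn."""
    return np.array([cronbach_alpha(np.delete(scores, j, axis=1))
                     for j in range(scores.shape[1])])

# Simulated data: 200 candidates x 20 stations driven by a common ability,
# with the last station deliberately measuring something unrelated.
rng = np.random.default_rng(0)
ability = rng.normal(0, 1, size=(200, 1))
scores = 20 + 4 * ability + rng.normal(0, 3, size=(200, 20))
scores[:, 19] = rng.normal(20, 5, size=200)

overall = cronbach_alpha(scores)
deleted = alpha_if_item_deleted(scores)
# Stations whose removal *raises* alpha merit review for the reasons listed above.
print([j + 1 for j in range(scores.shape[1]) if deleted[j] > overall])
```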
It is also possible for alpha to be too high (e.g. >0.9), possibly indicating
redundancy in the assessment, whilst low alpha scores can sometimes be
attributed to large differences in station mean scores rather than being the
result of poorly designed stations.
We should point out that in the authors’ medical school, and in many similar
institutions throughout the UK, over 1,000 assessors are required for the
OSCE assessment season (usually comprising 2-3 large scale examinations
as previously described). Consequently, recruiting sufficient assessors of
acceptable quality is a perennial issue, so it is not possible to implement
double-marking arrangements that would then make the employment of
G-theory worthwhile in terms of more accurately quantifying differences in
assessors. Such types of analysis are more complex than those covered in this
Guide, and often require the use of additional, less user-friendly, software.
An individual, institution-based decision to use G-theory or Cronbach’s alpha
should be made in context with delivery requirements and any constraints.

The hawks and doves effect, either within an individual station, or aggregated
to significant site effects, may have the effect of inflating the alpha value.
However, it is highly likely that this effect will lead to unsatisfactory metrics
in the areas of coefficient of determination, between-group within-station
error variance, and, possibly, in fixed effect site differences as we will explore
later in this Guide. Our philosophy is that one metric alone, including alpha,
is always insufficient in judging quality, and that in the case of an OSCE with
a high alpha but other poor metrics, this would not indicate a high quality
assessment.
global ratings are accounted for by variation in their checklist scores. In
contrast, station 19 is less satisfactory with an R2 value of 0.404. This was a new
station focussing on patient safety and the management of a needlestick
injury. To understand why R2 was low, it is helpful to examine the relationship
graphically (for example, using SPSS Curve estimation) to investigate the
precise nature of the association between checklist and global grade – see
Figure 2. In this figure, assessor global grades are shown on the x-axis and the
total item checklist score is plotted on the y-axis. Clustered checklist scores
are indicated by the size of the black circle, as shown in the key. SPSS can
calculate the R2 coefficient for polynomials of different degree, and thereby
provide additional information on the degree of linearity in the relationship.
We would recommend always plotting a scatter graph of checklist marks
against global ratings as routine good practice, regardless of station metrics.

In station 19 we can see that there are two main problems – a wide spread
of marks for each global grade, and a very wide spread of marks for which
the fail grade (0 on the x-axis) has been awarded. This indicates that some
students have acquired many of the marks from the item checklist, but their
overall performance has raised concerns in the assessor leading to a global
fail grade.
Figure 2
Curve estimation (Station 19) – Assessor checklist score (y) versus global grade (x)
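For routine review outside SPSS, an equivalent scatter graph can be produced in a few lines; the sketch below uses matplotlib with invented data, scaling each point so that its size reflects how many candidates share the same (grade, score) pair, mirroring the circle sizes in Figure 2.

```python
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

# Invented station data: global grade (x) and checklist score (y) per candidate.
grades = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4])
scores = np.array([6, 18, 14, 15, 15, 20, 21, 21, 24, 26, 27, 29])

counts = Counter(zip(grades, scores))
xs, ys = zip(*counts.keys())
sizes = [40 * c for c in counts.values()]  # larger circle = more candidates

plt.scatter(xs, ys, s=sizes, color="black")
plt.xlabel("Assessor global grade (0 = Clear fail ... 4 = Excellent pass)")
plt.ylabel("Checklist score")
plt.title("Checklist score versus global grade (illustrative data)")
plt.show()
```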
This unsatisfactory relationship between checklist marks and global ratings
causes some degree of non-linearity, as demonstrated in the accompanying
Table 3 (produced by SPSS), where it is clear graphically that the best fit
is cubic. Note that, mathematically speaking, a cubic will always
produce a better fit, but parsimony dictates that the difference between
the two fits has to be statistically significant for a higher order model to be
preferred. In this example the fit of the cubic polynomial is significantly better
than that of the linear. The key point to note is whether the cubic expression
is the result of an underlying relationship or as a result of outliers, resulting
from inappropriate checklist design or unacceptable assessor behaviour in
marking. In making this judgement, readers should review the distribution of
marks seen on the scattergraph. Our own experience suggests that where
station metrics are generally of good quality, a departure from strict linearity
is not a cause for concern.
Table 3
Curve estimation table (Station 19)
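A comparison equivalent to the SPSS curve-estimation output can be sketched with a nested-model F-test: the cubic is preferred over the linear fit only if the reduction in residual error is statistically significant. The code and data below are illustrative only.

```python
import numpy as np
from scipy import stats

def residual_ss(x, y, degree):
    """Residual sum of squares for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return float(resid @ resid)

def nested_f_test(x, y, d_small=1, d_large=3):
    """F-test comparing a linear (d_small) with a cubic (d_large) polynomial."""
    rss_small = residual_ss(x, y, d_small)
    rss_large = residual_ss(x, y, d_large)
    df_diff = d_large - d_small
    df_resid = len(y) - (d_large + 1)
    f = ((rss_small - rss_large) / df_diff) / (rss_large / df_resid)
    return f, stats.f.sf(f, df_diff, df_resid)

# Invented data: an essentially linear grade/score relationship plus noise.
rng = np.random.default_rng(1)
grades = rng.integers(0, 5, size=120).astype(float)
scores = 8 + 5 * grades + rng.normal(0, 3, size=120)
f, p = nested_f_test(grades, scores)
print(f"F = {f:.2f}, p = {p:.3f}  (prefer the cubic only if the improvement is significant)")
```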
Where very poor student performance in terms of the checklist score occurs,
consideration needs to be given to whether these very low scores should be
excluded from standard setting to avoid excessive impact on overall passing
scores in a downward direction.
Figure 3
Curve estimation (Station 14) – Assessor checklist score (y) versus global grade (x)
As previously described, other psychometric data can be used to investigate
station design and performance in order to identify problems. Failure rates
may be used to review the impact of a change in teaching on a particular
topic, with higher failure rates indicating where a review of content and
teaching methods can inform course design. There are no major outliers for
this metric in Table 2, but the difficulties with station 19 have allowed us to
identify and deliver additional teaching around elements of patient safety
within the final year curriculum, and introduce this specific safety focus into
checklists.
In the ideal assessment process, all the variation in marks will be due
to differences in student performance, and not due to differences in
environment (e.g. local variations in layout or equipment), location (e.g.
hospital based sites having different local policies for management of clinical
conditions), or differences of assessor attitude (i.e. hawks & doves). There are
two ways of measuring such effects, either by performing a one-way ANOVA
on the station (e.g. with the assessor as a fixed effect), or by computing
the proportion of total variance which is group specific. The latter allows
an estimation of the proportion of variation in checklist scores that is due
to student performance as distinct from other possible factors mentioned
above, although this is usually given as the proportion of variance which is
circuit specific.
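The second of these calculations can be sketched as follows: for a single station, the percentage of total checklist-score variance that lies between groups (circuits) rather than between candidates within a group. The column names and data are invented for illustration.

```python
import pandas as pd

def between_group_variance_pct(df, score_col="checklist", group_col="circuit"):
    """Percentage of a station's total score variation that is between-group."""
    grand_mean = df[score_col].mean()
    total_ss = ((df[score_col] - grand_mean) ** 2).sum()
    between_ss = sum(len(g) * (g[score_col].mean() - grand_mean) ** 2
                     for _, g in df.groupby(group_col))
    return 100 * between_ss / total_ss if total_ss else 0.0

# Invented example: one station run on three circuits.
df = pd.DataFrame({
    "circuit":   ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "checklist": [22, 24, 23, 25, 18, 17, 19, 20, 23, 22, 24, 21],
})
print(round(between_group_variance_pct(df), 1))
```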
From Table 2, stations 6, 17 and 19 give cause for concern with regard to this
metric, with the highest levels of between-group variance. In addition, station
6 has a poor R2, and the overall combination of poor metrics at this station
tells us that the poor R2 was probably due to poor checklist design. These
observations prompted a review of the design of station 6, and the checklist
was found to consist of a large number of low level criteria where weaker
candidates could attain high scores through ‘process’ only. In other words,
there was a likely mismatch between the nature of the checklist, and the
aims and objectives of the station as understood by the assessors. Hence, in
redesigning the station, a number of the low-level criteria were chunked (that
is, grouped together to form a higher level criterion) in order to facilitate the
assessment of higher level processes as originally intended.
Station 17 tells a different story, as the good R2 coupled with the high
between-group variation indicates that assessors are marking consistently
within groups, but that there is a distinct hawks and doves effect between
groups. In such a case, this ought to be further investigated by undertaking a
one-way ANOVA analysis to determine whether this is an individual assessor
or a site phenomenon. The amount of variance attributable to different sites
is subsumed in the simple computation of within-station between-group
variance as described above. However, its significance may be determined
using a one-way ANOVA analysis with sites as fixed effects.
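The significance test itself can be run with any standard one-way ANOVA routine; a minimal sketch using scipy, with invented per-site checklist scores for a single station:

```python
from scipy import stats

# Invented checklist scores for one station at three examination sites.
site_a = [22, 24, 23, 25, 26, 24]
site_b = [18, 17, 19, 20, 18, 21]
site_c = [23, 22, 24, 21, 25, 23]

f_stat, p_value = stats.f_oneway(site_a, site_b, site_c)
# A small p-value points to a genuine site (or hawk/dove) effect on marks,
# prompting a closer look at assessor allocation and training.
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```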
Other important fixed effects can also be identified through the use of
ANOVA, for example assessor training effects, staff/student gender effects,
and associated interactions, all of which have been previously described
(Pell et al., 2008) and which underline the need for complete and enhanced
assessor training, as previously highlighted (Holmboe, 2004).
Metric 7 – Standardised patient ratings
Most centres that use simulated/standardised patients (SPs) require them to
rate candidates, and this typically follows an intensive training programme.
Within our own institution, SPs would be asked a question such as “Would you
like to consult again with this doctor?” with a range of responses (strongly
agree, agree, neither agree nor disagree, disagree, strongly disagree), the
two latter responses being regarded as adverse. Akin to Metric 4 (Number of
station failures), a higher than normal proportion of candidates (e.g. >10%)
receiving adverse SP ratings may indicate problems. There is no available
literature on what constitutes an ‘acceptable’ range of SP ratings at station
level, so we have chosen an arbitrary cut off figure of 10%. The critical issue
here is that other station metrics should be reviewed, and the impact on SP
ratings monitored in response to training or other interventions.
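A simple screening calculation consistent with this approach is sketched below; it flags stations where the proportion of adverse SP ratings exceeds the (arbitrary) 10% threshold. Column names and data are invented.

```python
import pandas as pd

ADVERSE = {"disagree", "strongly disagree"}

def stations_with_adverse_sp_ratings(df, threshold=0.10):
    """Return stations whose proportion of adverse SP ratings exceeds the threshold."""
    adverse_rate = (df.assign(adverse=df["sp_rating"].isin(ADVERSE))
                      .groupby("station")["adverse"].mean())
    return adverse_rate[adverse_rate > threshold]

# Invented ratings for two stations (10 candidates each).
df = pd.DataFrame({
    "station": [7] * 10 + [19] * 10,
    "sp_rating": ["agree"] * 9 + ["disagree"]
                 + ["agree"] * 7 + ["disagree", "strongly disagree",
                                    "neither agree nor disagree"],
})
print(stations_with_adverse_sp_ratings(df))
```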
If this is coupled with a higher than normal failure rate, it could be the result
of inadequate teaching of the topic. Adverse values of this metric are often
accompanied by high rates of between-group variance; assessors viewing
candidates exhibiting a lower than expected level of competence often
have difficulty in achieving consistency.
Actions to resolve this would typically include a review of the station content
and translation to the item checklist. Reviewing grade descriptors and
support material for assessors at station level should help overcome the
mismatch revealed by the poor R2 and higher error variance.
Figure 4
Curve estimation (Station 8) – Assessor checklist score (y) versus global grade (x)
[Scatter plot of observed scores with linear, quadratic and cubic fits; checklist scores (0–50) plotted against global grades (0–4).]
The curve shows wide performance variance at each grade level. The good
R2 suggests that the variation lies in assessor global ratings rather than in
assessor checklist scoring, consistent with a hawks and doves effect.
Figure 5
Curve estimation (Station 9) – Assessor checklist score (y) versus global grade (x)
[Scatter plot of observed scores with linear, quadratic and cubic fits; checklist scores (0–30) plotted against global grades (0–4).]
Action to investigate and improve this would focus on assessor support
material in relation to global ratings.
• Ensuring that stations carry the same provision of equipment (or permit
flexibility if students are taught different approaches with different
equipment).
Other sources of error variance can occur during the delivery of the OSCE:
• Assessors who arrive late and miss the pre-assessment briefing and who
therefore fail to adhere adequately to the prescribed methodology.
• Biased real patients (e.g. gender or race bias). Simulated patients receive
training on how to interact with candidates, but it is usually not possible to
train the majority of real patients to the same level.
needs) located within a single specific site needs to be discounted when
computing the adjustment level.
3. Removal of a station: Again, this is a rare event and the criteria for this are
usually multiple adverse metrics, the result of which would disadvantage
students to such an extent that the assessment decisions are indefensible
against appeal.
Conclusions
Using a series of worked examples and ‘live data’, this Guide focuses on
commonly used OSCE metrics and how they can be used to identify and
manage problems, and how such an approach helps to anticipate future
issues at the school/single institution level. This methodology therefore
naturally feeds into the wider assessment processes as described in Figure 1.
At station level, improvements prompted by reviewing these metrics have included:
• Chunking to allow for higher level criteria commensurate with the stage of
student progression, allowing assessment of higher level, less process driven
performance
• Ensuring that checklist criteria have three instead of two anchors where
appropriate, thereby allowing greater discrimination by assessors.
The presence of high failure rates at particular stations has led to a revisiting
of the teaching of specific parts of the curriculum, and was followed by
changes in the way things were taught, resulting in improved student
performance as measured in subsequent OSCEs.
• Improved assessor briefings prior to the assessment.
• Dummy runs before the formal assessment for both assessors and SPs (this
is only really practicable where student numbers are relatively small, e.g.
resits, and in dental OSCEs with smaller cohorts of students).
The need for all the above improvements would be unlikely to have been
apparent from using a single reliability metric, such as Cronbach’s alpha or
the G Coefficient. It is only when a family of metrics is used that a true picture
of quality can be obtained and the deficient areas identified. Adopting this
approach will be rewarded with a steady improvement in the delivery and
standard of clinical assessment.
References
COHEN DS, COLLIVER JA, ROBBS RS & SWARTZ MH (1997). A Large-Scale Study of the
Reliabilities of Checklist Scores and Ratings of Interpersonal and Communication
Skills Evaluated on a Standardized-Patient Examination. Advances in Health Sciences
Education, 1: 209-213.
EVA KW, ROSENFELD J, REITER H & NORMAN GR (2004). An Admissions OSCE: the
multiple mini-interview. Medical Education, 38: 314-326.
FIELD A (2000). Discovering Statistics (using SPSS for Windows), p.130 (Sage Publications, London).
HOLMBOE E (2004). Faculty and the observation of trainees’ clinical skills: Problems and
opportunities. Academic Medicine, 79(1): 16-22.
HOMER M & PELL G (2009). The impact of the inclusion of simulated patient ratings on
the reliability of OSCE assessments under the borderline regression method. Medical
Teacher, 31(5): 420-425.
PELL G, HOMER M & ROBERTS TE (2008). Assessor Training: Its Effects on Criterion Based
Assessment in a Medical Context. International Journal of Research & Method in
Education, 31(2): 143-154.
PELL G & ROBERTS TE (2006). Setting standards for student assessment. International
Journal of Research & Method in Education, 29(1): 91-103.
ROBERTS C, NEWBLE D, JOLLY B, REED M & HAMPTON K (2006). Assuring the quality of
high-stakes undergraduate assessments of clinical competence. Medical Teacher,
28(6): 535-543.
STEVENS J (1992). Applied multivariate statistics for the social sciences (2nd ed), Chapter 4: 151-182 (Erlbaum, Hillsdale, NJ).
CIZEK GJ & BUNCH MB (2007). Standard Setting (1st ed) (Sage Publications, London).
About AMEE
What is AMEE?
AMEE is an association for all with an interest in medical and healthcare professions education,
with members throughout the world. AMEE’s interests span the continuum of education from
undergraduate/basic training, through postgraduate/specialist training, to continuing professional
development/continuing medical education.
• Conferences: Since 1973 AMEE has been organising an annual conference, held in a European
city. The conference now attracts over 2300 participants from 80 countries.
• Courses: AMEE offers a series of courses at AMEE and other major medical education conferences
relating to teaching, assessment, research and technology in medical education.
• MedEdWorld: AMEE’s exciting new initiative has been established to help all concerned with
medical education to keep up to date with developments in the field, to promote networking
and sharing of ideas and resources between members and to promote collaborative learning
between students and teachers internationally.
• Medical Teacher: AMEE produces a leading international journal, Medical Teacher, published 12
times a year, included in the membership fee for individual and student members.
• Education Guides: AMEE also produces a series of education guides on a range of topics, including
Best Evidence Medical Education Guides reporting results of BEME Systematic Reviews in medical
education.
• Best Evidence Medical Education (BEME): AMEE is a leading player in the BEME initiative which aims
to create a culture of the use of best evidence in making decisions about teaching in medical and
healthcare professions education.
Membership categories
• Individual and student members (£85/£39 a year): Receive Medical Teacher (12 issues a year, hard
copy and online access), free membership of MedEdWorld, discount on conference attendance
and discount on publications.
• Institutional membership (£200 a year): Receive free membership of MedEdWorld for the institution,
discount on conference attendance for members of the institution and discount on publications.
If you would like more information about AMEE and its activities, please contact the AMEE Office:
Association for Medical Education in Europe (AMEE), Tay Park House, 484 Perth Road, Dundee DD2 1LR, UK
Tel: +44 (0)1382 381953; Fax: +44 (0)1382 381987; Email: amee@dundee.ac.uk
www.amee.org
ISBN: 978-1-903934-62-3 Scottish Charity No. SC 031618