ACTION UNDERSTANDING

Angelika Lingnau
University of Regensburg

Paul Downing
Bangor University

Understanding actions from vision is a multi-faceted process that serves many
behavioural goals, and is served by diverse mechanisms and brain systems. […]
research inspired by ‘mirror neurons’ and related concepts.
www.cambridge.org
Information on this title: www.cambridge.org/9781009476010
DOI: 10.1017/9781009386630
© Angelika Lingnau and Paul Downing 2024
This publication is in copyright. Subject to statutory exception and to the provisions
of relevant collective licensing agreements, no reproduction of any part may take
place without the written permission of Cambridge University Press & Assessment.
When citing this work, please include a reference to the DOI 10.1017/9781009386630
First published 2024
A catalogue record for this publication is available from the British Library.
ISBN 978-1-009-47601-0 Hardback
Elements in Perception
DOI: 10.1017/9781009386630
First published online: March 2024
Angelika Lingnau
University of Regensburg
Paul Downing
Bangor University
Author for correspondence: Angelika Lingnau,
Angelika.Lingnau@psychologie.uni-regensburg.de
Contents
1 Introduction
4 Brain Mechanisms
6 Concluding Remarks
References
1 Introduction
1.1 Motivation
Our experience of everyday social life is deeply shaped by the actions that
we see others perform: consider a parent carefully watching her infant try
to feed herself, a fan watching a tennis match, or a pottery student observ-
ing her teacher throw a pot. Although we may sometimes pause momentar-
ily in puzzlement (what is my neighbour doing up there on his roof?) or be
caught by surprise (by a partner’s sudden romantic gesture), we normally
understand others’ actions quickly and without a feeling of expending
much effort. By doing so, we unlock answers to important questions
about the world around us: What will happen next? How could I learn to
do that? How should I behave in a similar situation? What are those people
like?
How, then, do we understand observed actions? The simplicity of this
question, and the fluency of action understanding, obscure the complexity
of the underlying mental and neural processes. To start to answer it, and in
contrast to several recent valuable perspectives (e.g. Kilner, 2011;
Oosterhof et al., 2013; Pitcher & Ungerleider, 2021; Tarhan & Konkle,
2020; Thompson et al., 2019; Tucciarelli et al., 2019; Wurm & Caramazza,
2021) we do not focus first on possible brain mechanisms (including the
possible role of mirror neurons; see Bonini et al., 2022; Heyes & Catmur,
2022). Instead, first thinking about the problem in terms of Marr’s (1982)
computational level, we ask: why would an observer attend to the actions
of others? A reasonable answer to this question might be: Observers attend
others’ actions to learn about the meaning and outcomes of different action
kinds; to establish causal links between actors’ actions and their goals,
states, traits, and beliefs; and to use that learned knowledge to make
predictions about the social and physical environment, and to extend
one’s own action repertoire. (Although beyond the focus of this review,
we also sometimes attend others’ actions for pure enjoyment, e.g. when
watching ballet or figure skating; e.g. Christensen & Calvo-Merino, 2013;
Orgs et al., 2013). Achieving these multiple complex aims requires suitable
mental representations and processes – algorithms in Marr’s (1982) terms.
That is the main focus of Section 2 of this article. In Section 4, we go on to
describe key neuroscientific evidence on action understanding (focusing on
Marr’s implementation level), drawing links to the concepts and constructs
described in Section 2. In the final section, we identify directions for future
research that are highlighted by this review.
(but see Camponogara et al., 2017; Repp & Knoblich, 2004, for discussion of
action understanding in other modalities). Evidence from animals is reviewed
for its influences on thinking about human action understanding. We set aside
the interpretation of actions and interactions that are conveyed symbolically,
such as the decisions of a partner in an economic game like the Prisoners’
Dilemma (e.g. Axelrod, 1980). Finally, we focus on understanding by typical
healthy adult observers in exclusion of neuropsychological or neuropsychiatric
populations. The logic for this is that while action understanding difficulties are
associated with (for example) autism, schizophrenia, or semantic dementia, it is
not clear that this is necessarily a central feature of those conditions (see e.g.
Cappa et al., 1998; Cusack et al., 2015; Frith & Done, 1988). Action clearly is
central to apraxia; however, in that case, definitions and diagnostics tend to focus
on patients’ production of appropriate gestures and skilled actions, particularly
those relevant to tool use (Baumard & Le Gall, 2021) rather than understanding
per se (but see e.g. Kalénine et al., 2010). That said, these difficulties may be
informative for our thinking about the different computations and algorithms
involved in action understanding; the same caveat applies to developmental
evidence (Reddy & Uithol, 2016; Southgate, 2013).
Other, more specific action-related topics have recently been reviewed else-
where: these include the perception of social interactions (McMahon & Isik,
2023; Papeo, 2020; Quadflieg & Westmoreland, 2019), the execution of joint or
collaborative actions (Azaad et al., 2021; Sebanz & Knoblich, 2021), and visual
perception of biological motion, especially from ‘point-light’ displays (Blake &
Shiffrar, 2007; Thompson & Parasuraman, 2012; Troje & Basbaum, 2008).
observer can make spatially and temporally precise predictions about how an
action will unfold (McDonough et al., 2019), and about the target of a reaching
movement or the intended use of a grasped object (Ambrosini et al., 2011, 2015;
Amoruso & Finisguerra, 2019; Amoruso & Urgesi, 2016). At the same time,
our semantic knowledge about different kinds of actions includes descriptions
of their typical aims, and of the kinds of events that typically tend to follow
(cf. Schank & Abelson, 1977). For example, observing a friend hand-washing
the dishes implies that next they will be dried and put away. Finally, observing
an action supports inferences about an actor’s underlying goals and beliefs,
enabling predictions about what future actions would be consistent with those
beliefs, or further those goals, and indeed how that actor might behave in new
situations even into the distant future.
different viewpoints, lighting effects, occlusion, and other visual variables, just as
in visual object recognition (see also Perrett et al., 1989). Further, a given action
(e.g. chopping vegetables) may be carried out by many possible actors, using
many possible objects, in many possible locations. That problem of generaliza-
tion is complemented by the problem of specificity, which requires correctly
excluding from a category exemplars that do not belong. Taking an analogy
from objects, for example, one must understand that a robin (canonical exemplar)
and a penguin (unusual exemplar) are both birds, but that a bat, despite numerous
shared features with the bird category, is not. Figure 2 illustrates that similar
problems arise for action understanding, where the challenge is to correctly
include visually diverse exemplars while excluding attractive foils.
Finally (and also like objects), actions are well described by taxonomies that
include an abstract (or ‘superordinate’) level, a basic level, and a subordinate
level (Rosch et al., 1976; Zhuang & Lingnau, 2022). For example, ‘playing
tennis’ may describe an action at the basic level that is part of a superordinate
category ‘sporting activities’ and also includes the subordinate level ‘perform-
ing a forehand volley’. The basic level has been proposed to play a key role in
object categorization, e.g. as evidenced by the number of features used to
describe objects, and the speed of processing (Rosch et al., 1976). Zhuang &
Lingnau (2022) recently reported similar results for actions. Specifically, parti-
cipants produced the highest number of features to describe actions at the basic
level (see also Morris & Murphy, 1990; Rifkin, 1985). Moreover, they verified
action categories faster and more accurately at the basic and the subordinate
level in comparison to the superordinate level. These findings suggest that the
taxonomical levels of description proposed for objects have a homology in the
long-term representation of action knowledge.
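To make these taxonomic levels concrete, the following minimal sketch (in Python; the tree, its labels, and the helper function are our own illustrative assumptions rather than material from Rosch et al., 1976 or Zhuang & Lingnau, 2022) represents a small action taxonomy with superordinate, basic, and subordinate levels and looks up the level of a given label.

# Hypothetical action taxonomy: superordinate -> basic -> subordinate.
# All labels are illustrative only.
action_taxonomy = {
    "sporting activities": {                                            # superordinate level
        "playing tennis": ["performing a forehand volley", "serving"],  # basic -> subordinates
        "swimming": ["swimming breaststroke", "diving from the block"],
    },
    "food preparation": {
        "chopping vegetables": ["dicing an onion", "slicing a carrot"],
        "cooking pasta": ["boiling spaghetti"],
    },
}

def level_of(label):
    """Return 'superordinate', 'basic', or 'subordinate' for a known label."""
    for superordinate, basics in action_taxonomy.items():
        if label == superordinate:
            return "superordinate"
        for basic, subordinates in basics.items():
            if label == basic:
                return "basic"
            if label in subordinates:
                return "subordinate"
    return None

print(level_of("playing tennis"))                # -> 'basic'
print(level_of("performing a forehand volley"))  # -> 'subordinate'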
Action Spaces
Figure 3 Illustration of the action ‘spaces’ idea. Action kinds may be construed
as atom-like points in representational spaces, the dimensions of which may
correspond to psychologically meaningful distinctions. Positions of actions
reflect their values on hypothetical mental dimensions. Distances between
actions are proportional to subjective judgments of the similarity between them.
Here we present only a reduced example for the sake of clarity; realistic action
spaces would involve many more actions and dimensions.
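As a toy illustration of the ‘spaces’ idea (a sketch of our own: the dimensions, coordinates, and action labels below are invented for exposition and are not drawn from any dataset cited here), each action kind can be assigned coordinates on a few hypothetical dimensions, with Euclidean distance standing in for judged dissimilarity.

import numpy as np

# Hypothetical coordinates on three illustrative dimensions:
# [object-directedness, movement amplitude, sociality]
action_space = {
    "chopping vegetables": np.array([0.9, 0.3, 0.1]),
    "washing dishes":      np.array([0.8, 0.4, 0.1]),
    "waving hello":        np.array([0.0, 0.5, 0.9]),
    "hugging":             np.array([0.0, 0.6, 1.0]),
}

def dissimilarity(a, b):
    """Euclidean distance between two actions in the hypothetical space."""
    return float(np.linalg.norm(action_space[a] - action_space[b]))

# Nearby points stand for actions judged to be similar:
print(dissimilarity("chopping vegetables", "washing dishes"))  # small
print(dissimilarity("chopping vegetables", "hugging"))         # large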
Action Frames
The ‘space’ metaphor is very powerful for capturing the key dimensions of action
knowledge as well as subjective judgments of the similarity of different kinds of
action. One limitation of the approach, however, is that it obscures some of the
rich internal structure that constitutes our knowledge of familiar actions. This is
not easily captured in a dimensional representation that treats action concepts as
single points in a mental space. Accordingly, building on previous conceptions of
knowledge frames (Minsky, 1975) and scripts (Bower et al., 1979; Schank &
Abelson, 1977) here we consider the idea of action frames. Related ideas have
also been more recently explored in the context of action understanding (Aksoy
et al., 2017; Chersi et al., 2011; Zacks et al., 2007), although these have tended to
focus more narrowly on specific issues such as the sequential nature of actions.
An action frame may be seen as a schematic representation that describes,
abstractly, important features of an action, such as its intended outcomes or
goals; means by which the goals typically are achieved; and the kinds of move-
ments, postures, objects, and locations associated with that kind of action
(Figure 4). These associations are assumed to be picked up from statistical co-
occurrences in our natural environment. Action frames may help to identify
action kinds by interacting with the output of perceptual systems that recognize
objects and scenes (Epstein & Baker, 2019), detect and classify people (Pitcher &
Ungerleider, 2021), and estimate their poses and movements (Giese & Poggio,
2003). These perceptual systems analyse an observed action, abstracting over
some details (e.g. the colour of a knife) while emphasizing others (e.g. its position
relative to the ingredients, and its motion relative to the movements of the chef).
Consistent evidence gathered in the perceptual systems and schematic action
frame representations mutually reinforce each other, whereas inconsistent evi-
dence leads to suppression. Recent perspectives have also highlighted the per-
ceptual significance of typical relationships amongst scene elements (Bach et al.,
2005; Green & Hummel, 2006; Hafri & Firestone, 2021; Kaiser et al., 2019),
which will also have diagnostic value for distinguishing among different kinds of actions.
Action frames cannot be limited to the directly observable elements that constitute an action; they also need to
include descriptions of the expected mental states of the actors. Finally, they
also require access to more general semantic knowledge of the physical and
social world. This includes, for example, knowledge about typical cause-and-
effect relationships (cooking pasta makes it soft and edible; stealing from
someone makes them angry). Likewise, we deploy knowledge about the ways
in which the properties of objects like tools make them suited to specific kinds
of manipulations for specific kinds of outcomes – the shape, hardness, and
weight distribution of a hammer makes it useful for driving in nails (e.g.
Buxbaum et al., 2014; Osiurak & Badets, 2016; see also Binkofski &
Buxbaum, 2013).
Action frames as described here might offer several useful properties. First,
they may describe the highly predictable way in which actions generally unfold
over time that is not readily captured by a semantic space of actions. For
example, purchasing food ingredients is not just semantically related to cook-
ing; one typically precedes the other in a predictable way. Likewise, at a finer
grain, preparing a soup may include obtaining, washing, peeling, and slicing
vegetables, sub-actions that only make sense in a specific order. These regular-
ities enable an observer to anticipate what is likely to follow next (Aksoy et al.,
2017; Chersi et al., 2011; Schank & Abelson, 1977; Zacks et al., 2007). It may
be difficult to capture these kinds of relationships in a scheme in which action
kinds are considered as ‘points’ in a multidimensional Euclidean space. A more
abstract and compositional representation may be better suited to capture the
temporal and causal relationships that describe typical chains of actions.
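One way to picture an action frame is as a structured record that lists an action’s goal, its typical objects and locations, and an ordered chain of sub-actions. The sketch below is purely illustrative (the fields and the ‘preparing a soup’ content are our own assumptions about what such a frame might contain); it shows how sequential structure, which a single point in a Euclidean space cannot express, can be represented and used to anticipate the next step.

from dataclasses import dataclass, field

@dataclass
class ActionFrame:
    """Schematic, illustrative representation of knowledge about one action kind."""
    name: str
    goal: str
    typical_objects: list = field(default_factory=list)
    typical_locations: list = field(default_factory=list)
    sub_actions: list = field(default_factory=list)  # ordered: earlier steps precede later ones

prepare_soup = ActionFrame(
    name="preparing a soup",
    goal="produce an edible soup",
    typical_objects=["knife", "pot", "vegetables"],
    typical_locations=["kitchen"],
    sub_actions=["obtain vegetables", "wash vegetables",
                 "peel vegetables", "slice vegetables", "simmer"],
)

def predict_next(frame, observed_step):
    """Anticipate the sub-action that typically follows an observed step."""
    steps = frame.sub_actions
    if observed_step in steps and steps.index(observed_step) + 1 < len(steps):
        return steps[steps.index(observed_step) + 1]
    return None

print(predict_next(prepare_soup, "peel vegetables"))  # -> 'slice vegetables'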
Prediction
From the action frames perspective, expectations – for example, evidence that
an action will unfold in a kitchen – allow relevant action frames (e.g. for cooking,
eating, and washing up) to compete with and suppress less relevant ones. In turn,
this enables pre-activation of cooking-relevant objects (e.g. a knife), again at the
expense of other unrelated objects (e.g. pliers). The net effect of these competitive
interactions should be a relative advantage in understanding actions that are
consistent with expectations, by suppression of unlikely alternatives. Indeed,
when actions are embedded in an incongruent context, they take longer to be
processed in comparison to actions embedded in a neutral or congruent context
(Wurm & Schubotz, 2012, 2017). Likewise, ambiguous actions are recognized
with higher accuracy when taking place in a congruent context in comparison to
incongruent or neutral contexts (Wurm & Schubotz, 2017; Wurm et al., 2017a).
Here, the surrounding context (e.g. the emotional facial expression of an agent)
shapes the interpretation of the action (e.g. an approaching fist with the intention to
punch or to greet the observer with a fist bump; see e.g. Kroczek et al., 2021), just
as ambiguous objects (e.g. Brandman & Peelen, 2017) and emotional facial
expressions (Aviezer et al., 2012) are interpreted in reference to their immediate
context in the domains of scene and body perception.
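The competitive logic sketched above can be caricatured as a context-driven prior over action frames combined with perceptual evidence via Bayes’ rule. The frame labels and numbers below are invented purely for illustration; the point is only that a congruent context can resolve an otherwise ambiguous percept.

# Hypothetical prior over action frames given a kitchen context, and
# hypothetical perceptual likelihoods for an ambiguous hand movement.
context_prior = {"cooking": 0.5, "washing up": 0.3, "repairing": 0.2}
likelihood    = {"cooking": 0.4, "washing up": 0.4, "repairing": 0.4}  # ambiguous evidence

def posterior(prior, likelihood):
    """Combine the context prior with perceptual evidence (Bayes' rule, normalized)."""
    unnormalized = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}

print(posterior(context_prior, likelihood))
# The kitchen context favours 'cooking' even though the evidence alone is ambiguous.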
To summarize, here we have considered two complementary perspectives on
how the mind organizes long-term knowledge about familiar actions. These are
not mutually exclusive ideas: as action understanding is so complex, each
perspective may better describe different aspects of what we know about actions,
how that knowledge is applied to understanding ‘what’ a given action is, and how
that supports predictions about the actors and events that we interact with.
Observational Learning
Observational learning (sometimes ‘social learning’) refers, in the broadest
sense, to acquiring knowledge about the contingencies between behaviour and
action dynamics were not relevant. Other studies have examined observational
learning with actions that involve more continuous variables. For example,
Mattar & Gribble (2005) required participants to make simple reaches under
the influence of an unseen ‘force field’ that deflected those movements.
Participants who first watched another actor perform this task before attempting
it showed stable benefits (e.g. smaller disruptions to their own reach trajectories)
compared to controls. Notably, this observational learning remained essentially
intact even when it took place under a demanding concurrent cognitive load,
suggesting a relatively automatic and implicit form of learning (see also
Section 3).
Conversely, other work has examined the transfer of motor learning to visual
action judgments. Casile & Giese (2006) demonstrated how learning to perform
an unusual pattern of walking movements selectively improved visual detection
of those movements when they were rendered as point-light animations. In
a more naturalistic context, Aglioti et al. (2008) demonstrated that experienced
basketball players made better predictions about the outcome of observed free
throws in comparison to individuals with similar visual experience (experi-
enced coaches, sports journalists) and to novices. Improved performance of
players in this example, compared to experienced coaches, invites the interpret-
ation that motor experience specifically contributes to improved action under-
standing. In a similar vein, Knoblich & Flach (2001) found that participants
were better able to judge from a video where a thrown dart would land, when
that video depicted a previous throw that they had performed themselves,
compared to another thrower. An important feature of each of these motor-to-
vision studies is that the observed actions were seen from a side view, that is,
one that is normally unavailable for one’s own actions. Therefore, the learning
exhibited in those situations must extend over modalities (from motor to visual)
and must also generalize across visual perspective.
The preceding findings imply a close overlap between an observer’s own
motor repertoire and her ability to understand actions. Yet other findings show
that these two variables can be dissociated. For example, a series of studies of
individuals with congenital dysplasia who lack upper limbs (and therefore have
no upper-limb motor representations), revealed essentially normal performance
in a variety of tasks. These included different aspects of action understanding,
including the ability to name pantomimes and point-light animations, to learn
new actions, and to predict the outcome of basketball free-throws (Vannuscorps
& Caramazza, 2016; but see Vannuscorps & Caramazza, 2023). Developmental
studies reveal similar dissociations; for example, three-month-old infants have
been shown to interpret observed actions as goal-directed before they are able to
perform reach and grasp actions themselves (Liu et al., 2019; see also
Southgate, 2013). In sum, whereas several studies suggest that the ability to
detect subtle differences in the kinematics of observed movements is modified
by the observer’s experience, relevant motor experience is not always
a necessary requirement for the ability to understand actions.
Imitation
not match. Variants of this procedure have been developed to understand this
compatibility effect, to identify its neural correlates (Darda & Ramsey, 2019), to
assess its malleability following training (Catmur et al., 2007), and to test the
claim that it is ‘automatic’ (Cracco et al., 2018).
In contrast to these relatively simple and controlled tasks, researchers in
social psychology have asked whether, in more naturalistic settings, participants
tend to unwittingly mimic the movements or body postures of confederates. For
example, Chartrand & Bargh (1999) reported a ‘chameleon effect’ whereby
individuals may unintentionally match others’ overt behaviours, and moreover
that the experience of being imitated in this fashion increases liking. In general,
then, there is some evidence of the tendency for irrelevant or incidental actions
of others to influence the observer’s own concurrent behaviours, even in the
absence of an explicit goal to imitate.
A final important distinction is that between imitation and emulation, where
the latter refers to an achievement of the same end state via different specific
motoric means (see also Bekkering et al., 2000; Csibra, 2008; Heyes, 2001;
Tomasello et al., 1993). For example, given no specific instructions, preschool
children will tend to emulate the target of an action (e.g. reaching for the right
ear) instead of producing a faithful copy of the observed action (e.g. reaching for
the right ear with the contralateral hand; Bekkering et al., 2000). This finding
illustrates that actions may normally be understood by default from the ‘inten-
tional stance’ – as deliberate and rational behaviours, performed by an agent for
a reason – a topic we return to in Section 2.3.
actors that perform those actions. In the following section, we examine how
observed actions also provide evidence about more complex mental states
such as goals and beliefs.
a person switching on a light with her knee (which makes sense if the hands of
the actor are occupied, but not if they are empty). The error signals that are
generated by such unusual actions would normally trigger a search for an
explanation, just as would be expected for other violations of expectations
(such as seeing a rowboat in a desert landscape; Brandman & Peelen, 2017;
Oliva & Torralba, 2007). Generally, when there is a significant mismatch
between a percept and one’s expectations or action knowledge, a more explicit
and effortful process is engaged to understand the action. To what extent does
that search involve representing the actor’s mental states?
One approach to examining mental state attribution in action understanding is
by reverse inference1 from the activity of brain regions that are thought to
support such ‘mentalizing’, as revealed in false-belief or perspective-taking
tasks (e.g. Saxe & Kanwisher, 2003; Schurz et al., 2014). Unusual actions
(switching on a light with the knee) recruit such brain regions more when
they are presented in an implausible context (actor’s hands are free) relative
to a more plausible context (the hands are otherwise occupied; Brass et al.,
2007). The logic is that the implausible action elicits an attempt to identify an
account of the situation, which by default is one that relies on representing the
mental states of the actor.
A related topic in social psychology (e.g. Ambady & Rosenthal, 1992; Estes,
1938; Tamir & Thornton, 2018) concerns how action understanding provides cues
about the states and traits of an actor (Bach & Schenke, 2017). Here we are
concerned with the meaning and outcomes of the action, rather than the dynamics
as reviewed in Section ‘How’ beyond Observational Learning. For example,
1 Applied to neuroimaging, ‘reverse inference’ describes estimating the cognitive processes
involved in a task on the basis of the brain regions that are engaged by that task (in fMRI, for
example). While sometimes used pejoratively, reverse inference may be a strong form of
induction where the activity of the region in question is consistently selective across different
contexts (Poldrack, 2006).
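Poldrack (2006) casts this point in Bayesian terms: writing M for the mental process of interest and A for activation of the region, the strength of the reverse inference depends on the region’s selectivity,

\[ P(M \mid A) = \frac{P(A \mid M)\,P(M)}{P(A \mid M)\,P(M) + P(A \mid \neg M)\,P(\neg M)}, \]

so confidence in the inference is high only when the region is rarely active in the absence of the process (i.e. when P(A | ¬M) is low).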
the answer depends on the goals of the observer: action understanding is not
monolithic. While there are important examples that cross the boundaries, the
tasks of classifying an action, understanding how an action is carried out, and
understanding the intentions of the actors, draw on different mental capacities.
Broadly, classifying actions requires a rich semantic ‘database’ of our long-term
knowledge about actions; attention to the means by which an action is per-
formed implicates implicit, motoric knowledge as well; and adopting the
intentional stance to make inferences about others’ mental states requires
implicit theories of how traits, states, intentions, and behaviour interact.
the automaticity of action understanding, given its multifaceted nature, and its
dependence on distinct processes as well as contextual factors including the
observer’s own experience and goals. To focus more closely on the question,
here we consider several conceptions of automaticity that have been put forward
in the social cognition literature (Bargh, 1989). To simplify the discussion, in
each case we refer to examples that have used the ‘automatic imitation’ task
(Brass et al., 2000; see above) as a proxy measure of understanding a simple
viewed movement.
First, what aspects of action understanding proceed even when they are not
relevant to the task at hand? Say the observer is trying to find a friend who is
performing on a crowded stage; to what extent does he also represent the
performer’s actions even though these are not relevant to his goal? In the context
of the automatic imitation task, Hemed et al. (2021) approached this issue by
including incompatible finger movements that were also never task relevant
(and so not part of the participants’ response set). Such irrelevant movements
did not affect task performance, providing one example of the attentional
filtering of action even in a very minimalistic setting. In other words, there is
a limit to the automaticity of processing even simple movements viewed in
isolation.
Second, what aspects of action understanding are resistant to top-down
control, which is to say they are carried out even when the observer deliberately
tries not to do so? Chong et al. (2009) reported that the ‘automatic imitation’ of
a viewed grasping action (measured via response compatibility effects) was
eliminated when participants’ attention was directed to another object presented
at the same location. Here, again we see evidence against strong ‘automaticity’
in the finding that even a single, foveated action will affect the observer’s
behaviour less to the extent that it is not in the focus of selective attention.
Third, to what extent does action understanding persist in a complex visual
environment, or under increased mental load? For example, in daily life, an
action may be observed in a serene setting (watching the only other patient in
a dentist’s waiting room) or in a complex one (watching fans in a sporting
arena). At the same time, one may be free of distraction, or alternatively heavily
distracted by another ongoing mental task (e.g. attending an online meeting
while also home-schooling). These examples highlight the dimensions of per-
ceptual and cognitive load, which deeply affect everyday cognition (Lavie &
Dalton, 2014). Several recent studies have explored the effects of perceptual
load (Catmur, 2016; Thompson et al., 2023) and cognitive load (Ramsey et al.,
2019) on tasks that require either explicit action category judgments or measure
action perception implicitly (but see Benoni, 2018). The general strategy is to
assess how an action task is impacted by a second concurrent task, performed at
racket may be more relevant in the second example. This intention to select
aspects of the action may fail, in the sense that there may be processing of
irrelevant aspects of the action as well. For one example, on the principles of
object-based attention (Cavanagh et al., 2023), attempting to focus on the
movement of the arm may necessarily entail selection of the tennis racket it
holds as well. Similarly, based on neuroimaging studies, Spunt & Lieberman
(2013) have suggested that focusing attention on ‘why’ an action is executed
also elicits a representation of ‘how’ it is executed, even if the latter is not task
relevant.
Finally, attention is sometimes construed as the selection of internal repre-
sentations or templates, for example to support visual search for a certain target
item such as a face or house (Chun et al., 2011; Peelen & Kastner, 2014;
Serences et al., 2004). Applied to actions, we can think about search templates
in the frameworks of action spaces and action frames (Section 2.1). In terms
of action spaces, attention might ‘reshape’ representational geometries
(see also Edelman, 1998; Kriegeskorte & Kievit, 2013; Nosofsky et al.,
1986). As an example, attending closely to the location in which an action
takes place (e.g. a kitchen) might effectively ‘expand’ the representational
space of kitchen-related actions, and ‘compress’ the space around other
actions (see also Nastase et al., 2017; Shahdloo et al., 2022; Wurm &
Schubotz, 2012, 2017). In this metaphor, ‘expanding’ dimensions of
a representational space implies enhancing distinctions that are relevant to
that dimension (e.g. amongst different kinds of slicing, chopping, and
grating) and de-emphasizing other distinctions that are not relevant
(Figure 5). In contrast, in terms of action frames, attention might facilitate
or inhibit the connections between different scene elements (cf. Figure 4B)
or between different action frames (Figure 4C), again to highlight those that
are contextually relevant.
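One simple way to operationalize this ‘expansion’ and ‘compression’ (a sketch of our own, loosely in the spirit of the attention-weighted similarity models cited above; the dimensions, coordinates, and weights are invented) is to re-weight the dimensions of the space when computing distances.

import numpy as np

# Toy action space on hypothetical dimensions
# [food-relatedness, movement amplitude, sociality]; coordinates are invented.
actions = {
    "slicing": np.array([0.9, 0.2, 0.1]),
    "grating": np.array([0.8, 0.3, 0.1]),
    "waving":  np.array([0.0, 0.5, 0.9]),
    "hugging": np.array([0.0, 0.6, 1.0]),
}

def attended_distance(a, b, weights):
    """Weighted Euclidean distance: large weights 'expand' a dimension,
    small weights 'compress' it."""
    diff = actions[a] - actions[b]
    return float(np.sqrt(np.sum(weights * diff ** 2)))

neutral = np.array([1.0, 1.0, 1.0])
kitchen = np.array([3.0, 1.0, 0.2])  # attending to kitchen-relevant distinctions

# Attending to the kitchen context enlarges distinctions amongst kitchen-related actions ...
print(attended_distance("slicing", "grating", neutral),
      attended_distance("slicing", "grating", kitchen))
# ... and shrinks distinctions amongst actions that differ mainly on unattended dimensions.
print(attended_distance("waving", "hugging", neutral),
      attended_distance("waving", "hugging", kitchen))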
To briefly summarize Section 3: while we argue that a general answer to the
question ‘is action understanding automatic’ must be ‘no’, much remains to be
learned about how different senses of automaticity apply to different contexts.
We suggest that the concepts and approaches developed in the study of visual
attention in general are well suited to test assumptions about the representations
captured in action spaces and action frames. This broader approach, we suggest,
will be more fruitful than seeking a simple answer to the question of whether or
not action understanding proceeds automatically.
4 Brain Mechanisms
In the preceding sections, we focused on the mental processes and representations
that enable action understanding. Next, we review evidence and theories about
the brain regions, networks, and distributed patterns of activity that support action
understanding tasks. Neuroscientific studies in this area have been very strongly
influenced by the discovery of the ‘mirror neuron’ and related theoretical views
on the contribution of the motor system to visual action understanding.
Accordingly, we structure this section roughly chronologically to track initial
findings and conceptions of mirror neurons, following subsequent waves of
human neuroimaging and non-human primate studies, and finally to consider
more recently emerging theoretical perspectives. Specifically, we start our jour-
ney in Section 4.1 by briefly reviewing evidence for visual action-selective
neurons in the macaque superior temporal sulcus (STS). We then review in
Section 4.2 the initial reports and key findings about ‘mirror neurons’ in macaque
premotor cortex. Section 4.3 reviews studies inspired by those findings that
sought signatures of a human ‘mirror neuron system’. These have used several
methods to probe the activity of motor regions in visual action understanding
tasks, and to identify potential markers of ‘mirror-like’ representations. More
recently, as we see in Section 4.4, several groups have turned away from the
emphasis on motor representations, to instead draw methodological and theoret-
ical parallels between action understanding and research on visual object percep-
tion. Finally, in Section 4.5, we come full circle to consider more recent
discoveries about mirror neurons in the macaque, and to review how thinking
has evolved about possible alternative functional roles of mirror neurons or
Later, action sensitive neurons with more complex properties were dis-
covered in the same general region. As mentioned earlier, under natural condi-
Kilner & Lemon (2013), Rizzolatti & Sinigaglia (2016), Heyes & Catmur
(2022), and Bonini et al. (2022)).
Based on the initial discovery of mirror neurons, Di Pellegrino et al. (1992)
concluded that premotor cortex not only retrieves appropriate motor acts in
response to sensory stimuli (such as the shape and size of objects), but also in
response to the meaning of the motor acts of another individual. In other words,
the authors argued that these neurons provide an explicit representation of the
link between the execution of a motor act and its visual appearance when
performed by another individual (Di Pellegrino et al., 1992). Gallese et al.
(1996) went further to propose that mirror neurons play a role in action
understanding of motor events, which they defined as ‘the capacity to recognize
that an individual is performing an action, to differentiate this action from others
analogous to it, and to use this information in order to act appropriately’. In line
with the division between the ventral and dorsal pathways (Goodale & Milner,
1992; Ungerleider & Mishkin, 1982), the authors argued that neurons in STS
understanding. Note the contrast between this perspective and the descriptions of
action spaces and action frames (Section 2), which describe our rich semantic
knowledge about actions that is not obviously motoric in nature.
Claims that mirror neurons constitute a solution to the problem of action
understanding, and that this takes place automatically, have remained contro-
versial. For example, single cell recordings are correlational, so they do not
allow inferences regarding a causal role of measured neurons in the tasks under
investigation (see also Caramazza et al., 2014; Hickok et al., 2009; Thompson
et al., 2019). So it remains unknown whether mirror neurons play a causal role
in action understanding in the macaque, a problem that is exacerbated because
identifying suitable tasks and measures of ‘understanding’ in non-human pri-
mates is not trivial. Moreover, for practical reasons, studies of mirror neurons
have focused on immediate reach-to-grasp movements targeting food or other
desirable objects in most cases. It is therefore not clear how these kinds of
findings generalize to the wide repertoire of actions (see also Sliwa & Freiwald,
2017) performed with various body parts, objects and tools in human daily life.
We return in Section 4.5 to more elaborate arguments and debates about the
role of mirror neurons. First, however, we review key points in the large
literature on human observers that has been directly inspired by the discovery
of mirror neurons and by the initial ideas about their possible functional roles.
[Figure: panels (a)–(e) contrasting action execution and action observation. Recoverable labels include FDI and OP recordings (mV), an MEG trace (fT/cm, –1 to 2 s), grasping observation versus object observation, observation and execution views, and similar versus dissimilar activity patterns plotted over time (seconds).]
actions (Finisguerra et al., 2018) or the congruence between the context and the
action (Amoruso & Urgesi, 2016; Betti et al., 2022). Findings like these have
contributed to a debate about whether MEPs reflect an automatic motor reson-
ance, or whether instead they are also modulated by top-down influences (for
a review, see Amoruso & Finisguerra, 2019).
Another series of experiments exploited the mu rhythm, a frequency in the
cortical EEG signal in the range of 8–13 Hz over sensorimotor cortex. In general
terms, the mu rhythm is suppressed during selective attention and motor
preparation, and the mu rhythm can be sensitive to the type of movement and
handedness (for review, see Hobson & Bishop, 2007). Mu suppression, like
MEPs, has been used as an index of motor system activity during the passive
observation of others’ actions. In common with the properties ascribed to some
mirror neurons, for example, suppression of the mu rhythm is stronger during
the observation of a precision grip of an object compared to a mimicked
precision grip in the absence of an object (e.g. Muthukumaraswamy et al.,
2004a, b). However, the view of the mu rhythm as an index of human mirror
neuron activity also remains debated (e.g. Hobson & Bishop, 2017). In particu-
lar, it is not straightforward to determine whether a suppression in the 8–13 Hz
window originates from sensorimotor areas, or whether it instead stems from
a modulation of the alpha rhythm originating from occipital cortex. This
alternative indicates that the modulation of the mu rhythm during action
observation might instead, or additionally, reflect visual attention or perceptual
processes.
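For readers unfamiliar with the measure, the following generic sketch (not the analysis pipeline of any study cited here; the simulated signal, sampling rate, and band limits are illustrative assumptions) shows how mu-band power might be quantified from a single sensorimotor EEG channel, with suppression then expressed relative to a baseline.

import numpy as np
from scipy.signal import welch

fs = 250                                # sampling rate (Hz), illustrative
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
eeg = rng.normal(size=t.size) + 0.5 * np.sin(2 * np.pi * 10 * t)  # noise plus a 10 Hz component

def mu_power(signal, fs, band=(8, 13)):
    """Average spectral power in the mu band (8-13 Hz), estimated with Welch's method."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(psd[mask].mean())

baseline_power = mu_power(eeg, fs)
print(baseline_power)
# Mu suppression is typically reported relative to baseline, e.g.
# suppression = (observation_power - baseline_power) / baseline_power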
Studies from the brain stimulation and mu rhythm lines of work have been
useful to explore how the states of the observers’ motor system are influenced
by what the observer sees and understands about an action. However, the
functional implications of some of these findings remain debated, in that several
interpretations remain about what processes these neural measures reveal.
Human Neuroimaging
Early human neuroimaging studies using PET (Grafton et al., 1996; Rizzolatti
et al., 1996) and fMRI (Iacoboni et al., 1999) adopted the logic that anatomical
overlap between brain areas that are recruited during the observation of actions,
and the execution, imagination, or imitation of actions, would provide evidence
of ‘mirror-like’ human brain representations. Some common findings in these
initial studies laid the foundation for later human neuroimaging investigations.
For example, fMRI studies demonstrated that during passive observation of
goal-directed actions, participants recruit a consistent set of brain regions includ-
ing the ventral premotor cortex (PMv) extending into the posterior IFG, the
preSMA, somatosensory cortex, anterior and superior sections of the parietal
cortex, and portions of the lateral occipitotemporal cortex (see Figure 7B). As
a shorthand, these regions are often collectively referred to as the ‘action
observation network’. Later studies showed how parts of this network (pre-
motor, parietal, and somatosensory areas) also overlap with the areas involved
during motor imagery and/or movement execution (for meta analyses, see e.g.
Arioli & Canessa, 2019; Caspers et al., 2010; Hardwick et al., 2018; but see
Turella et al., 2009). Together, findings like these have been taken to show
a common neural representation of the corresponding visual and motor aspects
of actions, as a possible system-level homologue of the mirror neuron.
Expertise
If one’s own motor representations play a causal role in action understanding, it
stands to reason that the richness of those representations should influence the
nature of understanding. Accordingly, several studies have examined how
different kinds and levels of action expertise (and specifically motor expertise)
change the way these actions are processed in brain regions of the action
observation network. The general logic is that relative to the novice, an expert’s
richer motor representations of an action repertoire enable an improved, or even
qualitatively different, understanding of observed actions from that domain.
Observers’ expertise modulates fMRI activity within the action observation
network (see Turella et al., 2013, for a review). For example, one series of
studies examined brain responses of expert dancers from two disciplines (ballet
and capoeira). In their domain of expertise, dancers exhibited more activity in
premotor and parietal regions relative to dance movements of the other domain
(Calvo-Merino et al., 2005) and to dance movements of the expert domain that
were motorically but not visually familiar (Calvo-Merino et al., 2006; see also
Cross et al., 2006, and Jola et al., 2012). The interpretation of these findings was
that motoric aspects of dance expertise influenced the way that experts visually
perceived and understood actions, by way of a cross-modal visuo-motor
representation.
An apparent paradox in this literature is that in some cases the effect of
experience appears to decrease rather than increase the activity in action
observation regions (see e.g. Gardner et al., 2017). For example, Petrini et al.
(2011) found such a pattern of results when comparing the neural activity
elicited by observing ‘point light’ animations of drumming actions, in experi-
enced versus novice drummers. These divergent effects may reflect two differ-
ent facets of expertise: on the one hand, expertise (e.g. with performing a class
of actions) provides a rich framework by which observed actions may be
assigned meanings that are not accessible to novices; hence a relative increase
in activity in relevant regions for experts. In contrast, expertise also entails
familiarity with actions from the relevant domain, supporting an improved
ability to predict what will be seen next. Indeed, the literature on perceptual
expectations emphasizes the suppressing effect of expectations on neural activ-
ity in line with predictive coding models (Summerfield et al., 2008).
‘how’ of an action (Rizzolatti & Craighero, 2004; Rizzolatti & Sinigaglia, 2010;
Caspers et al., 2010), whereas activity in regions linked to mentalizing tasks is
taken to reveal an effort to understand the intentions behind an action (‘why’;
e.g. Van Overwalle, 2009; Van Overwalle & Baetens, 2009). More generally,
these studies reinforce the view discussed in Section 3, namely that action
understanding is not reflex-like, but rather recruits neural processes that adapt
to serve the observer’s current goals.
and the intraparietal sulcus. These carried information about body parts and the
target of an action during the passive observation of short naturalistic video
clips. Responses in four of the identified clusters were organized by the spatial
scale of the action (e.g. from small, precise movements involving the hands to
large movements involving the entire body). Using a similar approach,
Thornton & Tamir (2022) were able to decode amongst observed actions on
the basis of their six-dimensional ACT-FAST taxonomy, based on fMRI activity
measured from a widespread set of occipitotemporal, parietal and frontal
regions. Finally, using EEG during passive observation of short video clips
depicting everyday actions in combination with behavioural ratings, Dima et al.
(2022) observed a temporal gradient in action representations. Over a period
from 60 to 800 ms, the shape of action ‘spaces’ changed from an emphasis on
visual features, to action-related features, and then to social-affective features.
Together, studies like these show how specific action-space models can be
developed and tested on the basis of neuroimaging data.
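Methodologically, studies of this kind typically rely on some form of representational similarity analysis: a model dissimilarity matrix derived from behavioural judgments or a feature taxonomy is compared with a neural dissimilarity matrix computed from activity patterns. The following minimal sketch (with random numbers standing in for real data; it is not the pipeline of any specific study cited here) illustrates that logic.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_actions, n_voxels, n_features = 10, 200, 6

# Stand-ins for measured activity patterns (actions x voxels) and for
# a behavioural/feature model of the same actions (actions x features).
neural_patterns = rng.normal(size=(n_actions, n_voxels))
model_features  = rng.normal(size=(n_actions, n_features))

# Representational dissimilarity matrices in condensed form:
# one pairwise dissimilarity per pair of actions.
neural_rdm = pdist(neural_patterns, metric="correlation")
model_rdm  = pdist(model_features,  metric="euclidean")

# Agreement between the neural and the model action 'space'.
rho, p = spearmanr(neural_rdm, model_rdm)
print(f"model-neural RDM correlation: rho = {rho:.2f}, p = {p:.3f}")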
Multiple studies have found particularly strong evidence that LOTC plays
a role in representing action spaces. For example, Tucciarelli et al. (2019)
found that patterns of activity across the LOTC best capture the semantic
similarity structure of observed actions, when variability due to specific action
features such as body parts, scenes, and objects is removed. In that study, actions
related to locomotion, communication, and food formed clusters both in the
behaviourally determined and in the neural action space. Given evidence for
abstract action ‘spaces’ in LOTC, is there evidence of any anatomical organiza-
tion to the patterns of activity within this region? We have previously made the
case for representational gradients across the LOTC, such that the way it encodes
an action property (e.g. the extent to which it is person or object-directed) varies
continuously across the region. Specific proposed gradients include a posterior-
anterior gradient for the dimensions concrete-abstract and visual-multimodal, and
a dorsal-ventral gradient for the dimensions intentional-perceptual and animate
versus inanimate (e.g. Papeo et al., 2019; Tarhan et al., 2021; Wurm et al., 2017b;
for reviews, see Lingnau & Downing, 2015; Wurm & Caramazza, 2022).
Together, this family of findings shows that the representational similarity
approach can test hypotheses about how action knowledge is captured in
distributed patterns of brain activity. Moreover, these studies have highlighted
the role of the LOTC, and point to several action-relevant features that are
captured in this region. At the same time, this review highlights that there is not
yet consensus on a single set of organizing dimensions. Indeed, given the
flexibility with which observers can process an action depending on their
attentional state or task set, such a consensus may not be expected.
newer evidence, and then go on to describe more recent perspectives that extend
beyond the idea of mirroring in action understanding.
A key family of findings is that mirror neuron responses are in some cases
influenced by contextual factors. As an example, Csibra (2008) pointed out that
the reach-to-place and the reach-to-eat conditions used in the study by Fogassi
et al. (2005) differed with respect to the object (food versus non-food) and the
presence or absence of a container. The role of context is also explicitly
highlighted in a computational model for the execution and recognition of
action sequences proposed by Chersi et al. (2011). Likewise, several studies
demonstrated that mirror neuron responses distinguish between peripersonal and
extrapersonal space (Caggiano et al., 2009; Maranesi et al., 2017) and are sensitive to
the subjective value of an object that is the target of an action (Caggiano et al., 2012).
Further, some F5
mirror neurons are sensitive to the difference between visual stimuli that either
caused or did not cause an action (e.g. a hand, represented as a disc, reaching,
holding and moving an object, compared to a control condition with a similar
movement pattern in which the disc made no contact with the object; Caggiano
et al., 2016). This difference was obtained for naturalistic stimuli, and also for
abstract stimuli depicting the same causal (or non-causal) relationships, suggest-
ing a broader role in understanding events beyond observed motor behaviours.
Further, some mirror neurons have properties that suggest they form
a representation of an upcoming action based on the action affordances that an
object presents (Bonini et al., 2014; see also Bach et al., 2014). (‘Affordances’
refer to aspects of an object that are closely linked to a particular kind of action,
such as the handles of objects such as pans or mugs.) This class of so-called
‘canonical’ mirror neurons discharges both during an observed action (e.g. grasp-
ing a large cone with a whole hand grip) and during the presentation of an object
for which that same grip would be appropriate (e.g. a large cone). Further, the
firing rate of the majority of such neurons is suppressed when the object is
presented behind a transparent plastic barrier (Bonini et al., 2014), suggesting
that these neurons only fire when it is actually possible for the monkey to interact
with the object. This pattern of findings implies a pragmatic coding of an observed
object by mirror neurons, in the sense that the representation is influenced by
context and the potential for an overt action. While this observation does not
necessarily apply to all mirror neurons, it does strongly imply that mirror neuron
activity may at least in part support the preparation to act on an object, in contrast
to contributing to a more receptive understanding process.
Together, findings like these highlight the contribution of the object, the
context and the potential to perform an action in shaping mirror neuron activity,
in line with a network-level approach to action understanding (see also Bonini
et al., 2022). Inspired by findings like these, and by other theoretical
embedded in a wider network of brain areas, some of which are more special-
ized for a visual analysis of the observed action. Collectively, these develop-
ments reduce the focus on mirror neurons per se as providing a unified, abstract
representation of actions at the pinnacle of an action understanding system.
What emerges instead is a view of mirror neurons operating as part of a wider
set of processes in which they may provide a concrete representation of
observed actions that is closely related to the preparation of corresponding
motor plans.
action observation tasks (see also Section 4). Yet more recent work implicating
parietal and occipitotemporal regions in rich action knowledge points to further
targets for intervention, and predictions about how disrupting those regions
should impact on action understanding behaviours.
Biologically inspired models of action understanding have been developed to
explain manual reaching and grasping (e.g. Fleischer et al., 2013) and have been
inspired by predictive coding and Bayesian modelling (e.g. Bach & Schenke,
2017; Baker et al., 2009; Kilner et al., 2007; Oztop et al., 2005). Extending this
line of research towards a wider range of actions while incorporating the rich
sources of information that are known to contribute to processing the ‘What,
How and Why’ of actions would be fruitful for the generation of new testable
hypotheses. More specifically, potential lines for this modelling work will be to
more explicitly incorporate (a) the role of information obtained about actions
from different perceptual systems that analyse objects, scenes, postures and
movements and the way this information is combined; and (b) the observer’s
6 Concluding Remarks
Action understanding, like other kinds of understanding, is a complex construct.
It covers a broad class of behaviours that are aimed at learning about events in
the world, and about the links between cause and effect, including physical and
mental causes. Accordingly, a key message of this review is that multiple kinds
of cognitive processes and representations are implicated in action understand-
ing, and the nature of these depends on the experience and the goals of the
observer.
Many recent treatments of the topic of action understanding begin with the
mirror neuron system and work outwards from observations about their proper-
ties and ostensibly analogous properties of the human brain and behaviour. This
approach has clearly been productive, as witnessed by the resulting explosion of
empirical findings and theoretical perspectives. However, it has also sometimes
begged the question by assuming a role for mirror neurons and then seeking that
role, and in some cases fitting definitions of action understanding around the
resulting findings – a form of reverse inference that may be in part responsible
for perpetuating controversies around this topic.
In contrast, we have started by asking first why an observer might attend
others’ actions – what goals this might serve – and then in turn what cognitive
and neural machinery might be necessary to achieve those goals. As a guiding
framework, we were led by three broad themes: understanding what an action is,
how it is carried out, and why it is performed. While these distinctions highlight
different requirements of cognitive systems for action understanding, it is also
clear that crosstalk amongst these action understanding goals and the implicated
systems is probably the norm, rather than the exception, in real-world behaviour.
One point that emerges repeatedly is that predictive processes of various
kinds are central to action understanding. These include, for example, abstract
predictions that might be made about a hypothetical actor, to guess what kind of
action she might carry out given her aims; predictions about the kind of action
that is observed, and the intended outcomes, based on the metric details of the
actor’s grasp and eye movements, objects and the scene (see also Wurm &
Schubotz, 2012, 2017); and predictions about the traits of a specific actor, and
her future behaviours, based on the evidence of her current actions. Prediction,
of course, is arguably central to all forms of perception and of understanding
(Kilner et al., 2007). Forming a meaningful model of the world involves the
processing of information about what might come next, and also about the
possible outcomes of one’s own behaviours. In this light, the connection
between prediction and action understanding may not be a unique one, but
actions, even simple ones, are simply a very rich source of different kinds of
cues about the social and physical world.
In sum, we believe that progress in understanding action understanding
profits from a focus on diverse kinds of observer goals, and available cues to
support those goals. We believe that this approach opens up new avenues for
research, especially where paradigms and methods from the domain of object
recognition can be transferred to action understanding. We hope that this review
inspires the current and next generation of researchers to pick up these threads
and to carry out future studies along these lines.
References
Abdollahi, R. O., Jastorff, J., & Orban, G. A. (2013). Common and segregated
processing of observed actions in human SPL. Cerebral Cortex, 23(11),
2734–2753.
Adams Jr., R. B., Ambady, N., Nakayama, K., & Shimojo, S. (Eds.). (2011).
The Science of Social Vision (Vol. 7). Oxford University Press.
Aflalo, T., Zhang, C. Y., Rosario, E. R., et al. (2020). A shared neural substrate
for action verbs and observed actions in human posterior parietal cortex.
Science Advances, 6(43), 1–16.
Aglioti, S. M., Cesari, P., Romani, M., & Urgesi, C. (2008). Action anticipation
and motor resonance in elite basketball players. Nature Neuroscience, 11(9),
1109–1116.
Aksoy, E. E., Orhan, A., & Wörgötter, F. (2017). Semantic decomposition and
recognition of long and complex manipulation action sequences.
International Journal of Computer Vision, 122(1), 84–115. https://doi.org/
10.1007/s11263-016-0956-8.
Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as
predictors of interpersonal consequences: A meta-analysis. Psychological
Bulletin, 111(2), 256–274.
Ambrosini, E., Costantini, M., & Sinigaglia, C. (2011). Grasping with the eyes.
Journal of Neurophysiology, 106(3), 1437–1442.
Ambrosini, E., Pezzulo, G., & Costantini, M. (2015). The eye in hand:
Predicting others’ behavior by integrating multiple sources of information.
Journal of Neurophysiology, 113(7), 2271–2279.
Amoruso, L., & Finisguerra, A. (2019). Low or high-level motor coding?
The role of stimulus complexity. Frontiers in Human Neuroscience,
13, 1–9.
Amoruso, L., & Urgesi, C. (2016). Contextual modulation of motor reson-
ance during the observation of everyday actions. NeuroImage, 134,
74–84.
Anzellotti, S., & Coutanche, M. N. (2018). Beyond functional connectivity:
Investigating networks of multivariate representations. Trends in Cognitive
Sciences, 22, 258–269.
Anzellotti, S., Caramazza, A., & Saxe, R. (2017). Multivariate pattern
dependence. PLoS Computational Biology, 20, 1–20. https://doi.org/10.1371/
journal.pcbi.1005799.
178, 509–517.
Bach, P., Nicholson, T., & Hudson, M. (2014). The affordance-matching
hypothesis: How objects guide action understanding and prediction.
Frontiers in Human Neuroscience, 8, 1–13.
Baker, C. L., Saxe, R., & Tenenbaum, J. B. (2009). Action understanding as
inverse planning. Cognition, 113, 329–349.
Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017). Rational
quantitative attribution of beliefs, desires and percepts in human mentalizing.
Nature Human Behaviour, 1(4), 1–10.
Baldissera, F., Cavallari, P., Craighero, L., & Fadiga, L. (2001). Modulation of
spinal excitability during observation of hand actions in humans. European
Journal of Neuroscience, 13(1), 190–194.
Bandura, A., & Jeffrey, R. W. (1973). Role of symbolic coding and rehearsal
processes in observational learning. Journal of Personality and Social
Psychology, 26(1), 122–130.
Bandura, A., & Walters, R. H. (1977). Social Learning Theory (Vol. 1). Englewood Cliffs, NJ: Prentice Hall.
Bar, M., Kassam, K. S., Ghuman, A. S., et al. (2006). Top-down facilitation of
visual recognition. Proceedings of the National Academy of Sciences, 103(2),
449–454.
Bargh, J. A. (1989). Conditional automaticity: Varieties of automatic influence
in social perception and cognition. Unintended Thought, 3–51.
Baumard, J., & Le Gall, D. (2021). The challenge of apraxia: Toward an
operational definition? Cortex, 141, 66–80.
Bekkering, H., Wohlschlager, A., & Gattis, M. (2000). Imitation of gestures in
children is goal-directed. The Quarterly Journal of Experimental
Psychology: Section A, 53(1), 153–164.
Benoni, H. (2018). Can automaticity be verified utilizing a perceptual load
manipulation? Psychonomic Bulletin & Review, 25(6), 2037–2046.
Bestmann, S., & Krakauer, J. W. (2015). The uses and interpretations of the
motor-evoked potential for understanding behaviour. Experimental Brain
Research, 233, 679–689.
Betti, S., Finisguerra, A., Amoruso, L., & Urgesi, C. (2022). Contextual priors
guide perception and motor responses to observed actions. Cerebral Cortex,
32(3), 608–625.
Beymer, D., & Poggio, T. (1996). Image representations for visual learning.
Science, 272(5270), 1905–1909.
Binkofski, F., & Buxbaum, L. J. (2013). Two action systems in the human brain.
Brain and Language, 127(2), 222–229.
Bird, G., Osman, M., Saggerson, A., & Heyes, C. (2005). Sequence learning by
action, observation and action observation. British Journal of Psychology, 96
(3), 371–388.
Blake, R., & Shiffrar, M. (2007). Perception of human motion. Annual Review
of Psychology, 58, 47–73.
Bonini, L., Rozzi, S., Serventi, F. U., et al. (2010). Ventral premotor and inferior
parietal cortices make distinct contribution to action organization and inten-
tion understanding. Cerebral Cortex, 20, 1372–1385.
Bonini, L., & Ferrari, P. F. (2011). Evolution of mirror systems: A simple
mechanism for complex cognitive functions. Annals of the New York
Academy of Sciences, 1225(1), 166–175.
Bonini, L., Maranesi, M., Livi, A., Fogassi, L., & Rizzolatti, G. (2014).
Space-dependent representation of objects’ and other’s action in monkey
ventral premotor grasping neurons. Journal of Neuroscience, 34(11),
4108–4119.
Bonini, L., Rotunno, C., Arcuri, E., & Gallese, V. (2022). Mirror neurons 30 years later: Implications and applications. Trends in Cognitive Sciences, 26(9), 767–781.
Bower, G. H., Black, J. B., & Turner, T. J. (1979). Scripts in memory for text.
Cognitive Psychology, 11(2), 177–220.
Bowers, J. S., Malhotra, G., Dujmović, M., et al. (2022). Deep problems with neural
network models of human vision. Behavioral and Brain Sciences, 1–74.
Brandman, T., & Peelen, M. V. (2017). Interaction between scene and object
processing revealed by human fMRI and MEG decoding. Journal of
Neuroscience, 37(32), 7700–7710.
Brass, M., Bekkering, H., Wohlschläger, A., & Prinz, W. (2000). Compatibility
between observed and executed finger movements: Comparing symbolic,
spatial, and imitative cues. Brain and Cognition, 44(2), 124–143.
Brass, M., Schmitt, R. M., Spengler, S., & Gergely, G. (2007). Investigating
action understanding: Inferential processes versus action simulation. Current
Biology, 17(24), 2117–2121.
Brincat, S. L., & Connor, C. E. (2004). Underlying principles of visual shape
selectivity in posterior inferotemporal cortex. Nature Neuroscience, 7, 880–886.
Buxbaum, L. J., Shapiro, A. D., & Coslett, H. B. (2014). Critical brain regions
for tool-related and imitative actions: A componential analysis. Brain, 137
(7), 1971–1985.
Cadieu, C. F., Hong, H., Yamins, D. L., et al. (2014). Deep neural networks rival
the representation of primate IT cortex for core visual object recognition.
PLoS Computational Biology, 10(12), 1–18.
Caggiano, V., Fogassi, L., Rizzolatti, G., Thier, P., & Casile, A. (2009). Mirror
neurons differentially encode the peripersonal and extrapersonal space of
monkeys. Science, 324(5925), 403–406.
Caggiano, V., Fogassi, L., Rizzolatti, G., et al. (2011). View-based encoding of
actions in mirror neurons of area f5 in macaque premotor cortex. Current
Biology, 21(2), 144–148.
Caggiano, V., Fogassi, L., Rizzolatti, G., et al. (2012). Mirror neurons encode
the subjective value of an observed action. Proceedings of the National
Academy of Sciences, 109(29), 11848–11853.
Caggiano, V., Pomper, J. K., Fleischer, F., et al. (2013). Mirror neurons in
monkey area F5 do not adapt to the observation of repeated actions. Nature
Communications, 4(1), 1–8.
Caggiano, V., Fleischer, F., Pomper, J. K., Giese, M. A., & Thier, P. (2016).
Mirror neurons in monkey premotor area F5 show tuning for critical features
of visual causality perception. Current Biology, 26(22), 3077–3082.
Calvo-Merino, B., Glaser, D. E., Grèzes, J., Passingham, R. E., & Haggard, P.
(2005). Action observation and acquired motor skills: An fMRI study with
expert dancers. Cerebral Cortex, 15(8), 1243–1249.
Calvo-Merino, B., Grèzes, J., Glaser, D. E., Passingham, R. E., & Haggard, P.
(2006). Seeing or doing? Influence of visual and motor familiarity in action
observation. Current Biology, 16(19), 1905–1910.
Camponogara, I., Rodger, M., Craig, C., & Cesari, P. (2017). Expert players
accurately detect an opponent’s movement intentions through sound alone.
Journal of Experimental Psychology: Human Perception and Performance,
43(2), 348–359.
Cappa, S. F., Binetti, G., Pezzini, A., et al. (1998). Object and action naming
in Alzheimer’s disease and frontotemporal dementia. Neurology, 50(2),
351–355.
Caramazza, A., Anzellotti, S., Strnad, L., & Lingnau, A. (2014). Embodied cogni-
tion and mirror neurons: A critical assessment. Annual Review of Neuroscience,
37, 1–15.
Casile, A., & Giese, M. A. (2006). Nonvisual motor training influences bio-
logical motion perception. Current Biology, 16(1), 69–74.
Caspers, S., Zilles, K., Laird, A. R., & Eickhoff, S. B. (2010). ALE
meta-analysis of action observation and imitation in the human brain.
Neuroimage, 50(3), 1148–1167.
Catmur, C. (2016). Automatic imitation? Imitative compatibility affects
responses at high perceptual load. Journal of Experimental Psychology:
Human Perception and Performance, 42(4), 530–539.
Catmur, C., Walsh, V., & Heyes, C. (2007). Sensorimotor learning configures
the human mirror system. Current Biology, 17(17), 1527–1531.
Cattaneo, L., Sandrini, M., & Schwarzbach, J. (2010). State-dependent TMS
reveals a hierarchical representation of observed acts in the temporal, parietal
and premotor cortices. Cerebral Cortex, 20(9), 2252–2258.
Cavallo, A., Koul, A., Ansuini, C., Capozzi, F., & Becchio, C. (2016). Decoding
intentions from movement kinematics. Scientific Reports, 6(1), 1–8.
Cavanagh, P., Caplovitz, G. P., Lytchenko, T. K., Maechler, M. R., Tse, P. U., &
Sheinberg, D. L. (2023). The architecture of object-based attention.
Psychonomic Bulletin & Review, 1–25.
Cerliani, L., Bhandari, R., De Angelis, L., et al. (2022). Predictive coding
during action observation – A depth-resolved intersubject functional correl-
ation study at 7T. Cortex, 148, 121–138.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception–
behavior link and social interaction. Journal of Personality and Social
Psychology, 76(6), 893–910.
Chersi, F., Ferrari, P. F., & Fogassi, L. (2011). Neuronal chains for actions in the
parietal lobe: A computational model. PLoS ONE, 6(11), 1–15.
Chong, T. T. J., Cunnington, R., Williams, M. A., Kanwisher, N., &
Mattingley, J. B. (2008). fMRI adaptation reveals mirror neurons in human
inferior parietal cortex. Current Biology, 18(20), 1576–1580.
Chong, T. T. J., Cunnington, R., Williams, M. A., & Mattingley, J. B. (2009).
The role of selective attention in matching observed and executed actions.
Neuropsychologia, 47(3), 786–795.
Christensen, J. F., & Calvo-Merino, B. (2013). Dance as a subject for empirical
aesthetics. Psychology of Aesthetics, Creativity, and the Arts, 7(1), 76–88.
Chun, M. M., Golomb, J. D., & Turk-Browne, N. B. (2011). A taxonomy of
external and internal attention. Annual Review of Psychology, 62(1), 73–101.
Cichy, R. M., & Kaiser, D. (2019). Deep neural networks as scientific models.
Trends in Cognitive Sciences, 23, 305–317.
Cisek, P. (2007). Cortical mechanisms of action selection: The affordance
competition hypothesis. Philosophical Transactions of the Royal Society B:
Biological Sciences, 362(1485), 1585–1599.
Cisek, P. (2019). Resynthesizing behavior through phylogenetic refinement.
Attention, Perception, & Psychophysics, 81, 2265–2287.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory.
Journal of Verbal Learning and Verbal Behavior, 8(2), 240–247.
Cook, R., Bird, G., Catmur, C., Press, C., & Heyes, C. (2014). Mirror neurons:
From origin to function. Behavioral and Brain Sciences, 37(2), 177–192.
Cracco, E., Bardi, L., Desmet, C., et al. (2018). Automatic imitation: A meta-analysis. Psychological Bulletin, 144(5), 453–500.
de Lange, F. P., Spronk, M., Willems, R. M., Toni, I., & Bekkering, H. (2008).
Complementary systems for understanding action intentions. Current
Biology, 18(6), 454–457.
de Lange, F. P., Heilbron, M., & Kok, P. (2018). How do expectations shape
perception? Trends in Cognitive Sciences, 22(9), 764–779.
Dennett, D. C. (1987). The Intentional Stance. MIT press.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992).
Understanding motor events: A neurophysiological study. Experimental
Brain Research, 91, 176–180.
Dima, D. C., Tomita, T. M., Honey, C. J., & Isik, L. (2022). Social-affective
features drive human representations of observed actions. eLife, 11, 1–22.
Dinstein, I., Hasson, U., Rubin, N., & Heeger, D. J. (2007). Brain areas selective
for both observed and executed movements. Journal of Neurophysiology, 98
(3), 1415–1427.
Dinstein, I., Thomas, C., Behrmann, M., & Heeger, D. J. (2008). A mirror up to
nature. Current Biology, 18(1), R13–R18.
Donnarumma, F., Costantini, M., Ambrosini, E., Friston, K., & Pezzulo, G.
(2017). Action perception as hypothesis testing. Cortex, 89, 45–60.
Dungan, J. A., Stepanovic, M., & Young, L. (2016). Theory of mind for
processing unexpected events across contexts. Social Cognitive and
Affective Neuroscience, 11(8), 1183–1192.
Edelman, S. (1998). Representation is representation of similarities. Behavioral
and Brain Sciences, 21, 449–498.
Epstein, R. A., & Baker, C. I. (2019). Scene perception in the human brain. Annual Review of Vision Science, 5, 373–397.
Green, C., & Hummel, J. E. (2006). Familiar interacting object pairs are percep-
tually grouped. Journal of Experimental Psychology: Human Perception and
Performance, 32(5), 1107–1119.
Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the
functional properties of human cortical neurons. Acta Psychologica, 107(1–
3), 293–321.
Güçlü, U., & van Gerven, M. A. J. (2015). Deep neural networks reveal
a gradient in the complexity of neural representations across the ventral
stream. Journal of Neuroscience, 35, 10005–10014.
Hafri, A., & Firestone, C. (2021). The perception of relations. Trends in
Cognitive Sciences, 25(6), 475–492.
Hafri, A., Trueswell, J. C., & Epstein, R. A. (2017). Neural representations of
observed actions generalize across static and dynamic visual input. Journal of
Neuroscience, 37(11), 3056–3071.
Hamilton, A. F., & Grafton, S. T. (2007). The motor hierarchy: From kinematics
to goals and intentions. Sensorimotor Foundations of Higher Cognition, 22,
381–408.
Hamilton, A. F., & Grafton, S. T. (2008). Action outcomes are represented in
human inferior frontoparietal cortex. Cerebral Cortex, 18(5), 1160–1168.
Hamilton, A. F. D. C., & Grafton, S. T. (2006). Goal representation in
human anterior intraparietal sulcus. Journal of Neuroscience, 26(4),
1133–1137.
Hardwick, R. M., Caspers, S., Eickhoff, S. B., & Swinnen, S. P. (2018). Neural correlates of action: Comparing meta-analyses of imagery, observation, and execution. Neuroscience & Biobehavioral Reviews, 94, 31–44.
Kalénine, S., Buxbaum, L. J., & Coslett, H. B. (2010). Critical brain regions for
action recognition: Lesion symptom mapping in left hemisphere stroke.
Brain, 133(11), 3269–3280.
Kelly, S. W., Burton, A. M., Riedel, B., & Lynch, E. (2003). Sequence learning
by action and observation: Evidence for separate mechanisms. British
Journal of Psychology, 94(3), 355–372.
Kemmerer, D. (2021). What modulates the Mirror Neuron System during action
observation? Multiple factors involving the action, the actor, the observer, the
relationship between actor and observer, and the context. Progress in Neurobiology, 205, 1–24.
Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form.
Proceedings of the National Academy of Sciences, 105(31), 10687–10692.
Kilner, J. M. (2011). More than one pathway to action understanding. Trends in
Cognitive Sciences, 15(8), 352–357.
Kilner, J. M., Friston, K. J., & Frith, C. D. (2007). Predictive coding: An account
of the mirror neuron system. Cognitive Processing, 8, 159–166.
Kriegeskorte, N., Mur, M., Ruff, D. A., et al. (2008b). Matching categorical
object representations in inferior temporal cortex of man and monkey.
Neuron, 60(6), 1126–1141.
Kroczek, L. O., Lingnau, A., Schwind, V., Wolff, C., & Mühlberger, A. (2021).
Angry facial expressions bias towards aversive actions. PLoS ONE, 16(9),
1–13.
Lanzilotto, M., Maranesi, M., Livi, A., et al. (2020). Stable readout of observed
actions from format-dependent activity of monkey’s anterior intraparietal
neurons. Proceedings of the National Academy of Sciences, 117(28),
16596–16605.
Lavie, N., & Dalton, P. (2014). Load theory of attention and cognitive control.
The Oxford Handbook of Attention, 56–75.
Levin, B. (1993). English Verb Classes and Alternations. Chicago: The
University of Chicago Press.
Lingnau, A., & Downing, P. E. (2015). The lateral occipitotemporal cortex in
action. Trends in Cognitive Sciences, 19(5), 268–277.
Lingnau, A., & Petris, S. (2013). Action understanding inside and outside the
motor system: The role of task difficulty. Cerebral Cortex, 23(6), 1342–1350.
https://doi.org/10.1093/cercor/bhs112.
Lingnau, A., Gesierich, B., & Caramazza, A. (2009). Asymmetric fMRI adap-
tation reveals no evidence for mirror neurons in humans. Proceedings of the
National Academy of Sciences, 106(24), 9925–9930.
Liu, S., Brooks, N. B., & Spelke, E. S. (2019). Origins of the concepts cause,
cost, and goal in prereaching infants. Proceedings of the National Academy of Sciences, 116(36), 17747–17752.
Repp, B. H., & Knoblich, G. (2004). Perceiving action identity: How pianists
recognize their own performances. Psychological Science, 15(9), 604–609.
Rifkin, A. (1985). Evidence for a basic level in event taxonomies. Memory &
Cognition, 13(6), 538–556.
Riley, M. R., & Constantinidis, C. (2016). Role of prefrontal persistent activity
in working memory. Frontiers in Systems Neuroscience, 9, 1–14.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual
Review of Neuroscience, 27, 169–192.
Rizzolatti, G., & Fogassi, L. (2014). The mirror mechanism: Recent findings
and perspectives. Philosophical Transactions of the Royal Society B:
Biological Sciences, 369(1644), 1–12.
Rizzolatti, G., & Sinigaglia, C. (2010). The functional role of the parieto-frontal
mirror circuit: Interpretations and misinterpretations. Nature Reviews
Neuroscience, 11(4), 264–274.
Rizzolatti, G., & Sinigaglia, C. (2016). The mirror mechanism: A basic prin-
ciple of brain function. Nature Reviews Neuroscience, 17(12), 757–765.
Rizzolatti, G., Scandolara, C., Gentilucci, M., & Camarda, R. (1981). Response
properties and behavioral modulation of ‘mouth’ neurons of the postarcuate
cortex (area 6) in macaque monkeys. Brain Research, 225(2), 421–424.
Rizzolatti, G., Camarda, R., Fogassi, L., et al. (1988). Functional organization
of inferior area 6 in the macaque monkey: II. Area F5 and the control of distal
movements. Experimental Brain Research, 71, 491–507.
Rizzolatti, G., Fadiga, L., Matelli, M., et al. (1996). Localization of grasp representations in humans by PET: 1. Observation versus execution. Experimental Brain Research, 111, 246–252.
Schultz, J., & Frith, C. D. (2022). Animacy and the prediction of behaviour.
Neuroscience & Biobehavioral Reviews, 140, 1–11.
Schurz, M., Radua, J., Aichhorn, M., Richlan, F., & Perner, J. (2014).
Fractionating theory of mind: A meta-analysis of functional brain imaging
studies. Neuroscience & Biobehavioral Reviews, 42, 9–34.
Sebanz, N., & Knoblich, G. (2021). Progress in joint-action research. Current
Directions in Psychological Science, 30(2), 138–143.
Seeliger, K., Ambrogioni, L., Güçlütürk, Y., et al. (2021). End-to-end neural
system identification with neural information flow. PLoS Computational
Biology, 17(2), 1–22.
Seger, C. A. (1997). Two forms of sequential implicit learning. Consciousness
and Cognition, 6(1), 108–131.
Serences, J. T., Schwarzbach, J., Courtney, S. M., Golay, X., & Yantis, S.
(2004). Control of object-based attention in human cortex. Cerebral
Cortex, 14(12), 1346–1357.
Shahdloo, M., Çelik, E., Urgen, B. A., Gallant, J. L., & Çukur, T. (2022). Task-
dependent warping of semantic representations during search for visual
action categories. Journal of Neuroscience, 42(35), 6782–6799.
Shepard, R. N. (1958). Stimulus and response generalization: Tests of a model
relating generalization to distance in psychological space. Journal of
Experimental Psychology, 55(6), 509–523.
Singer, J. M., & Sheinberg, D. L. (2010). Temporal cortex neurons encode
articulated actions as slow sequences of integrated poses. Journal of
Neuroscience, 30(8), 3133–3145.
Sliwa, J., & Freiwald, W. A. (2017). A dedicated network for social interaction
processing in the primate brain. Science, 356, 745–749.
Southgate, V. (2013). Do infants provide evidence that the mirror system is involved
in action understanding? Consciousness and Cognition, 22(3), 1114–1121.
Spoerer, C. J., McClure, P., & Kriegeskorte, N. (2017). Recurrent convolutional
neural networks: A better model of biological object recognition. Frontiers in
Psychology, 8, 1–14.
Spunt, R. P., & Lieberman, M. D. (2013). The busy social brain: Evidence for
automaticity and control in the neural systems supporting social cognition
and action understanding. Psychological Science, 24(1), 80–86.
Spunt, R. P., & Lieberman, M. D. (2014). Automaticity, control, and the social
brain. In J. W. Sherman, B. Gawronski, & Y. Trope (Eds.), Dual-process
theories of the social mind (pp. 279–298). New York, NY: Guilford Press.
Spunt, R. P., Satpute, A. B., & Lieberman, M. D. (2011). Identifying the what, why, and how of an observed action: An fMRI study of mentalizing and mechanizing during action observation. Journal of Cognitive Neuroscience, 23(1), 63–74.
Thompson, E. L., Bird, G., & Catmur, C. (2019). Conceptualizing and testing
action understanding. Neuroscience & Biobehavioral Reviews, 105,
106–114.
Thompson, E. L., Long, E. L., Bird, G., & Catmur, C. (2023). Is action
understanding an automatic process? Both cognitive and perceptual process-
ing are required for the identification of actions and intentions. Quarterly
Journal of Experimental Psychology, 76(1), 70–83.
Thompson, J., & Parasuraman, R. (2012). Attention, biological motion, and
action recognition. Neuroimage, 59(1), 4–13.
Thornton, M. A., & Tamir, D. I. (2021a). People accurately predict the transition
probabilities between actions. Science Advances, 7, 1–12. https://doi.org/
10.1126/sciadv.abd4995.
Thornton, M. A., & Tamir, D. I. (2021b). Perceiving actions before they happen:
Psychological dimensions scaffold neural action prediction. Social Cognitive
and Affective Neuroscience, 16(8), 807–815.
Yau, J. M., Pasupathy, A., Brincat, S. L., & Connor, C. E. (2013). Curvature
processing dynamics in macaque area V4. Cerebral Cortex, 23, 198–209.
Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., & Reynolds, J. R.
(2007). Event perception: A mind/brain perspective. Psychological Bulletin,
133(2), 273–293.
Zhuang, T., & Lingnau, A. (2022). The characterization of actions at the
superordinate, basic and subordinate level. Psychological Research, 86(6),
1871–1891.
Zhuang, T., Kabulska, Z., & Lingnau, A. (2023). The representation of observed
actions at the subordinate, basic and superordinate level. Journal of
Neuroscience, 43(48), 8219–8230.
Acknowledgments
Our thanks to Jens Schwarzbach, Marius Zimmermann, Moritz Wurm, Deyan
Mitev, Maximilian Reger, Marisa Birk, Federica Danaj, Zuzanna Kabulska, and
Filip Djurovic for helpful discussions and comments on previous versions
of this manuscript. A.L. was supported by a DFG Heisenberg-Professorship
(LI 2840/2-1).
Perception
James T. Enns
The University of British Columbia
Editor James T. Enns is Professor at the University of British Columbia, where he
researches the interaction of perception, attention, emotion, and social factors. He has
previously been Editor of the Journal of Experimental Psychology: Human Perception and
Performance and an Associate Editor at Psychological Science, Consciousness and Cognition,
Attention, Perception, & Psychophysics, and Visual Cognition.
Editorial Board
Gregory Francis Purdue University
Kimberly Jameson University of California, Irvine
Tyler Lorig Washington and Lee University
Rob Gray Arizona State University
Salvador Soto-Faraco Universitat Pompeu Fabra
Ecological Psychology
Miguel Segundo-Ortin and Vicente Raja
Representing Variability: How Do We Process the Heterogeneity in the Visual
Environment?
Andrey Chetverikov and Árni Kristjánsson
Action Understanding
Angelika Lingnau and Paul Downing