MNL 13-2020
Bleibaum, Editor
ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959
Copyright © 2020 ASTM International, West Conshohocken, PA. All rights reserved. This material may
not be reproduced or copied, in whole or in part, in any printed, mechanical, electronic, film, or other
distribution and storage media, without the written consent of the publisher.
Photocopy Rights
Authorization to photocopy items for internal, personal, or educational classroom use, or the internal,
personal, or educational classroom use of specific clients, is granted by ASTM International provided that
the appropriate fee is paid to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923,
Tel: (978) 646-2600; http://www.copyright.com/
Publisher:
ASTM International
100 Barr Harbor Drive
PO Box C700
West Conshohocken, PA 19428-2959
Phone: (610) 832-9585 Fax: (610) 832-9555
ISBN 978-0-8031-7118-3
ISBN-EB: 978-0-8031-7119-0
ASTM International is not responsible, as a body, for the statements and opinions advanced in the publi-
cation. ASTM International does not endorse any products represented in this publication.
Printed in Hanover, PA
December 2020
well beyond these product categories, and includes agriculture, personal care, home
care, auto care, pet care products, office supplies, paper and textiles, handheld devices
(e.g., shavers, toothbrushes, hair dryers), art supplies, perceptions of luxury
products or services, currency, furniture, etc.
The application of descriptive analysis is limited only by one’s imagination, time,
and budget. The wide array of products currently being evaluated using descriptive
analysis techniques, or at a minimum, some form of attribute evaluations, illustrates
their diversity and value to corporations.
Descriptive analysis is indeed a sophisticated and complex tool from several per-
spectives, as it:
• is the sensory technique that provides qualitative descriptions and quantitative
measures (and temporal behavior) of products, based on the perceptions of a
group of qualified assessors,
• requires the screening and training of assessors, although the amount of training
varies greatly between descriptive methods. In addition, it requires in-depth
discussions for a panel to develop the most appropriate evaluation procedures and
comprehensive descriptive language (these requirements apply only to the
fundamental methods covered in this manual).
Our distinguished chapter authors for the second edition of this manual all have
extensive global experience working with multinational organizations through their
thirty-plus year careers across the above-mentioned categories. These authors volun-
teered their time and expertise in order to share these methods with interested readers,
especially our next generation of sensory scientists.
Since the last edition, it is fair to say that the techniques themselves and
applications of the methods have expanded dramatically. Although the fundamental
methods have been updated, their basic techniques have changed only slightly,
because they are robust. Furthermore, these fundamental methods
represent the foundation of many new descriptive approaches. This second edition
focuses on the fundamental descriptive methods incorporating any updates and
new developments along with case studies. Our goal with this edition is to provide
the reader with each dedicated practitioner’s view of the descriptive methodology
covered.
This manual does not represent an exhaustive discussion of all of the published
descriptive analysis methods, as some other books do. However, members of ASTM
International Committee E-18 on Sensory Evaluation agreed that the newer methods
are all based on the core descriptive analysis fundamental methods, which are covered
in this manual.
To illustrate the uniqueness of the individual methods, each chapter contains the
background and core philosophy, the assessors’ qualification criteria, the panel mod-
erators’ qualifications, the language development process, along with data collection,
analysis, and reporting of results. The fundamental methods differ from each other in
overall philosophy; panel selection, training, and language development processes;
and the scales used to collect data (e.g., scale length, type, anchors, and usage). Notably,
these methods also vary in the way data are collected and used; some use consensus
while others are data and statistics driven. Finally, methods differ in techniques and
statistical approaches to evaluate panel performance, along with data analysis, and
reporting. It is up to the practitioner to determine the level of evidence or statistical
robustness required (e.g., exploratory versus legal dispute) when using descriptive
analysis. Each chapter also contains practical considerations for adaptation to other
categories.
In addition to the four methods previously published in the first edition, this
manual includes Free Choice Profiling and Temporal Methods. These were included
(1) to exemplify the nature and philosophy of a “nontraditional” method (i.e., Free
Choice Profiling) and (2) to illustrate the methodology that captures product percep-
tions over time (temporal methods), which has gained interest and value in the evalu-
ation of products.
All of these methods have tremendous applications in ways yet to be discovered.
Also yet to be seen are the new disciplines that will embrace and use these valuable
sensory tools!
Rebecca N. Bleibaum
Dragonfly SCI, Inc.
Santa Rosa, CA, USA
Alejandra M. Muñoz
IRIS: International Resources for Insights and Solutions, LLC
Mountainside, NJ, USA
For additional reading on this topic, the reader is directed to the following selected
references:
Standard Guide for Two Sensory Descriptive Analysis Approaches for Skin Creams and Lotions,
ASTM E1490 (2019) (West Conshohocken, PA: ASTM International, approved November 1,
2019), http://doi.org/10.1520/E1490-19
L. Stapleton, “Descriptive Analysis,” in Sensory Testing Methods, 3rd ed., ed. M. B. Wolf (West
Conshohocken, PA: ASTM International, 2020), 79–98.
Standard Guide for Time-Intensity Evaluation of Sensory Attributes, ASTM E1909-13 (West
Conshohocken, PA: ASTM International, approved October 1, 2013), http://doi.org/10.1520/
E1909-13
M. A. Brandt, E. Z. Skinner, and J. A. Coleman, “Texture Profile Method,” Journal of Food Science
28, no. 4 (1963): 404–409.
J. F. Caul, “The Profile Method of Flavor Analysis,” Advances in Food Research 7 (1957): 1–40.
M. C. Meilgaard, G. V. Civille, and B. T. Carr, Sensory Evaluation Techniques, 5th ed. (Boca Raton,
FL: CRC Press, 2015).
J. Delarue, B. Lawlor, and M. Rogeaux, eds., Rapid Sensory Profiling Techniques and Related Methods:
Applications in New Product Development and Consumer Research (Cambridge, UK:
Woodhead Publishing, 2014).
M. C. J. Gacula, Descriptive Sensory Analysis in Practice (Washington, DC: Food and Nutrition
Press, 1997).
J. Hort, S. E. Kemp, and T. Hollowood, Time-Dependent Measures of Perception in Sensory
Evaluation (Oxford, UK: Wiley-Blackwell, 2017).
S. E. Kemp, J. Hort, and T. Hollowood, (eds.), Descriptive Analysis in Sensory Evaluation
(West Sussex, UK: John Wiley & Sons, Ltd., 2018).
H. T. Lawless and H. Heymann, Sensory Evaluation of Food: Principles and Practices, 2nd ed.
(New York: Springer, 2010).
M. Lindstrom, Small Data: The Tiny Clues That Uncover Huge Trends (New York, NY: St. Martin’s
Press, 2016).
H. Stone, J. L. Sidel, S. Oliver, A. Woolsey, and R. C. Singleton, “Sensory Evaluation by Quantitative
Descriptive Analysis,” Food Technology 28, no. 11 (1974): 24–34.
H. Stone, R. N. Bleibaum, and H. A. Thomas, Sensory Evaluation Practices, 5th ed. (London:
Elsevier/Academic Press, in press).
A. A. Williams and G. B. Arnold, “A New Approach to Sensory Analysis of Foods and Beverages,”
in Progress in Flavour Research, Proceedings of the 4th Weurman Flavour Research
Symposium, ed. J. Adda (Amsterdam, The Netherlands: Elsevier, 1984), 35–50.
A. A. Williams and S. P. Langron, “The Use of Free-Choice Profiling for the Evaluation
of Commercial Ports,” Journal of the Science of Food and Agriculture 35, no. 5 (1984):
558–568.
Introduction
Consensus Profile Methods evolved from the original descriptive method, the Flavor
Profile,1,2 and describe methods that use a group decision-making descriptive process.
In a previous edition of this manual, the Flavor Profile Method was described, but its
adaptation to and more common use as a more generic Consensus Profile Method
resulted in a name change and a change of focus of this chapter.
The Consensus Method is based on the concept that a product’s profile consists of
identifiable sensory modalities (e.g., appearance, taste, odor) and a combination of
impressions not separately identifiable. The method consists of formal procedures for
describing and assessing the characteristics in a reproducible manner. The separate
characteristics contributing to the overall sensory impression of the product are iden-
tified and their intensity assessed to build a description of the product.
This “method” is not a single prescriptive tool for conducting a study but rather
describes a set of processes in which sensory descriptive analyses are performed by a
group of assessors to reach an agreed-on unified profile by consensus. The first adapta-
tion of the original Flavor Profile was to the consensus Texture Profile Method,3 and
consensus profiling has since been adapted for use with various scale lengths (7, 10,
and 15 points)4–6 and to other modalities such as vision7,8 and for such aspects as mea-
suring/comparing attributes among differing stages such as application, wearability,
and the removal of cosmetics.9
Consensus Profiling may be partial, meaning that it includes only a portion of the
process (e.g., determining attributes), or full, where the panel develops the full profile
on the basis of consensus methods, such as occurred with the Flavor Profile.
Partial consensus profiling is quite common in the initial development of lexicons
and sensory scorecards for later tests. In many cases, panels are asked to develop a set
of attributes for all of the modalities of interest that will be used later in testing either
1 Center for Sensory Analysis and Consumer Behavior, Kansas State University, 1310 Research Park Dr.,
Manhattan, KS 66502, USA. E. C., https://orcid.org/0000-0002-2480-0200
DOI: 10.1520/MNL1320170037
Copyright © 2020 by ASTM International, 100 Barr Harbor Dr., PO Box C700, West Conshohocken, PA 19428-2959
ASTM International is not responsible, as a body, for the statements and opinions expressed in this chapter.
ASTM International does not endorse any products represented in this chapter.
on a routine basis or for a set of products in further testing. Those attributes often are
developed on the basis of consensus methods, where assessors discuss potential terms,
come to some agreement on what terms will be included, and may even determine
definitions and references for those terms depending on the sensory methods to be
used. That list of terms then is made into a ballot and is used in other sensory methods
for individual measurement by assessors.
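The pooling of proposed terms into a working ballot can be sketched in code. The following Python sketch is illustrative only: the assessor names, terms, definitions, and the majority-count rule are all invented assumptions, not part of any fundamental method; in practice, inclusion is settled by panel discussion rather than by a vote count.

```python
# Minimal sketch of turning panel term development into a ballot.
# All names, terms, and the majority rule are invented for illustration.
proposed = {
    "assessor_1": {"cooked apple", "brick red", "waxy"},
    "assessor_2": {"cooked apple", "brick red"},
    "assessor_3": {"cooked apple", "waxy", "astringent"},
}
definitions = {
    "cooked apple": "flavor of apples heated in water (e.g., applesauce)",
}

# Count how many assessors proposed each term.
counts = {}
for terms in proposed.values():
    for term in terms:
        counts[term] = counts.get(term, 0) + 1

# Keep terms proposed by a simple majority; attach any agreed definition.
majority = len(proposed) // 2 + 1
ballot = [(term, definitions.get(term, "to be defined in discussion"))
          for term, n in sorted(counts.items()) if n >= majority]

for term, definition in ballot:
    print(f"{term}: {definition}")
```

In practice, the resulting list of terms, definitions, and references would then be formatted as the scorecard used by individual assessors in later tests.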
Core Philosophy
The essence of Consensus Profiling as embodied initially in the Flavor Profile is that it
allows a group of assessors to describe, in detail, the perceptual characteristics of prod-
ucts in a way that will be meaningful to the end user. The concept is that there is a way
to describe those characteristics that makes sense, that the “way” is not necessarily
determined in advance (i.e., there is not a set of standard attributes, a set order of
appearance, etc.), and that a group of trained assessors will discover and profile that
product through testing, discussion, and description. The format of the report may
take different forms such as tables, graphics, or descriptive paragraphs (e.g., storytell-
ing). Civille et al. state that “[c]onsensus evaluations allow highly trained, experi-
enced panelists to determine, discuss and conclude the best way to describe the
individual attributes of each sample.”10
Any number of different types of things can be “profiled.” Throughout this chap-
ter the term product is used to mean any product, service, information, or other stim-
ulus that can be evaluated using sensory descriptive methods. The product may not be
an actual physical product in the sense that many sensory analysts recognize a food,
lotion, cosmetic, automobile paint finish, etc., as a product. For example, profiles can
be developed for the sensory aspects of nutrition education campaigns (e.g., the colors,
fonts, graphics, crowdedness, layout) to compare with standards established by con-
sumers,11,12 instructions for recipes (e.g., fonts, types of instructions, graphics, style of
instructions),13 a library website (e.g., font, layout, readability, clickability, usability
aspects),14 or a concept for an app to take orders at a quick-service restaurant.
Consensus profiles may include a number of different aspects such as attributes;
intensities; order of appearance of the attributes; overall or integrated components
such as amplitude, balance, and blendedness; total impact; and other aspects that the
panelists or researchers believe are important.
Overview of Method
The process of Consensus Profiling as initiated by the Flavor Profile Method is essen-
tially the same regardless of whether a partial or full profile is completed using the
method. In a full consensus profile, the process simply is repeated or integrated
together to determine other aspects of the profile in addition to the attributes or what-
ever portion was to be developed in the partial profile.
The process usually consists of seven or eight steps depending on whether initially
recruiting the panel is included in the process. In this chapter, we consider recruiting the
panel a preliminary step and thus refer to seven steps of training, testing, and developing
profiles. Likewise, what happens to the profile once it is developed is considered outside the
process of gathering the data, although it is integral to the use and benefit of the method.
The seven steps are as follows:
1. Panel screening and selection
2. Panel training
3. Performance trials to validate the panel
4. Orientation to the defined product category
5. Product presentation—assessors begin individual evaluation
6. Assessors discuss and reevaluate to reach a consensus on the profile
7. Consensus profile created
Those steps (starting usually with Step 4) are repeated as needed to evaluate addi-
tional samples. A diagram of these steps is shown in figure 1. Aspects of these steps will
be described in more detail later in various parts of this chapter.
FIG. 1 Steps in Consensus Profiling Methods (including the Flavor Profile Method).
Assessors
Assessors may come from either a technical or nontechnical background; both will
need training, although that training may be of slightly different types. In the original
Flavor Profile, assessors with a technical background often were chosen because they
were deemed to be on site and available and have knowledge of flavors and ingredients
that could be helpful to the developer. However, later variations of the method often
chose to use assessors specifically hired to conduct the testing because scientists
already in the company were constrained by time and unavailable for training and
testing and there was the potential for biased knowledge about the products they were
testing. Regardless of background, assessors need the following major characteristics
to succeed as panelists: eagerness to participate, time to participate, physiological abil-
ity and innate aptitude, a willingness to learn the techniques necessary for the tasks,
and a disposition for working positively with others using unconditional positive
regard.
Assessors may be recruited from within an organization if the position of assessor
is clearly identified as part of the individual’s responsibilities and priorities. More fre-
quently, assessors are recruited from outside the organization and work as assessors
(often part time) as their primary job within the company. The reason for this is the
amount of time that is required for panel training, orientation, and testing, which for
many Consensus Profile panels is extensive. For example, Caul2 described the training
for the Flavor Profile in terms of 6 to 12 months, and other authors have described
training in terms of hours19 or panel sessions and time.20
It is essential that the individuals selected actually want to participate and are
selected on the basis of the criteria above. Selecting assessors simply because of their
job title, other responsibilities that suggest they “should” be on the panel, time avail-
ability, or their status within the organization is not acceptable. Assessors who are told
to participate, have many other responsibilities, do not have the aptitude for the testing
because they cannot differentiate among samples during screening sessions, or are
selected because they, as bosses, “have all the answers” (they do not when it comes to
sensory skills) make poor panelists.
Panel Training
After selection, assessors for full Consensus Profile Method panels are trained to
improve their abilities to (a) describe sensory perceptions, (b) score the perceptions
reliably on the designated scale, and (c) use consensus methods to profile actual prod-
ucts. Each of these skills is a key aspect of the training. Training increases reliability
and the ability to differentiate small differences in key aspects.19 The duration of the
training will vary depending on the purpose of the panel. If the panel is expected to be
capable of describing a variety of products, a training period of 4 to 6 months is
required. This includes approximately 120 hours of training and 40–60 hours of prac-
tice for the panel. Training for a single type or category of product can be accomplished
in a shorter time.
The structured training consists of the following:
1. A basic course of instruction that includes lectures and demonstrations on the nature
of the senses, basic requirements for panel work, and techniques and procedures for
reproducible sensory testing. This usually requires 3 days.
2. Evaluation of products with a range of relatively easy perceptible differences that
have been selected for their particular teaching value. In this step three to four sets of
products usually containing about four products each are used. For each set, approx-
imately 2–4 hours (usually broken into two to three days or sessions) are spent on the
evaluation of the set of products, with a discussion of the development of attributes
and their references, discussion of scaling, and development of rapport among the
panelists. During this step in training three aspects are of particular interest:
• Describing perceptions in detail: It is essential that assessors grasp the
importance of using specific terms instead of general terms as descriptors. For
example, general terms such as “apple” (flavor), “red” (color), and “soft”
(texture) are shown to the assessors as being too broad. Instead, terms must be
described in much more specific ways. For example, is the apple flavor cooked
or raw? Is the red color cherry or brick, or can we use a specific color number
from a visual color system? Is a soft texture the result of fuzziness or the lack of
stiffness? Getting the assessors to effectively describe products is usually quite
simple once they conceptualize that they must describe the specific perceptual
properties (i.e., what is noticed) and not the ingredients, processes, mechanics,
or other preparation, chemical, or manufacturing processes.
• Comparing intensities: Assessors must begin to understand the types of scales
that will be used. For example, is it a universal-type scale or product-specific
scale? How many points are on the scale? Are there anchors present? How
many anchors are there? Assessors must become comfortable discerning and
stating that one product is higher or lower in intensity than another and begin
to easily compare intensities of individual attributes.
• The concept of unconditional positive regard:21 In consensus panel work this
concept means accepting that other assessors may provide conflicting
information during evaluations but that through meaningful discussion an
agreed-on working profile can be developed to provide the appropriate
description for the product. This does not mean that assessors must
immediately agree with everyone else. It simply means that no matter how
different the opinions are, each person trusts that the intent of everyone’s input
is to progress22 (i.e., get to a better profile than anyone could have created alone).
Data Collection
For the Flavor Profile and several Consensus Profile derivatives, a single product is
evaluated at one time, and the complete profile is determined before moving to the
next product. In some Consensus Profile Methods, especially if there are many prod-
ucts to be profiled, the panel may examine a subset of products, establish tentative
profiles for those products, and then compare those products to each other to ensure
that the data provide the essential information before reevaluating all of the products
to develop final profiles. The process for profiling is as follows:
1. The development of consensus profiles begins with an understanding of the sensory
aspects of the category of product to be evaluated, if available. The sensory ana-
lyst may provide the panel with reports generated previously or reports that have
been published related to the attributes and testing of similar or related products.
If such materials are available, assessors are made aware that they provide informa-
tion that may, when needed, be adapted for the study in question. Note that in some
instances an end user may have some “standard terms” or “standard references” that
should not be changed. In that case the panel needs to be aware of that information
and will need to use those terms or references “as is” unless discussed with the end
user first. If the terms or references could result in misunderstandings (e.g., a term
that means one thing to a sensory panel but may be used differently by industry
experts), such information should be explained in notes attached to the profiles.
2. For evaluation, the assessors begin by looking, hearing, feeling, tasting, and smelling—
whatever is needed for the objectives of the evaluation. In the Flavor Profile, assessors
develop an individual profile, either partial or full, first on a blank paper or computer
screen. The assessor typically writes down a list of each attribute detected. Then the
assessor might score each of those attributes on a scale. Zero would not be used on
the scale because if the score were zero, the characteristic would not be listed and
would not need to be scored. In some derivative Consensus Profile Methods, the
panelists start with a sheet of terms and references and mark or score those they
believe are relevant to the product being evaluated. In fact, in some cases only a
subset of key attributes may be listed on the ballot to be evaluated. Either system is
appropriate. The blank-sheet method is considered the “purest” in that it does not
prompt assessors but may suffer from the exclusion of a particular attribute in the
quest to notice other attributes. It may not be appropriate if specific attributes are the
focus of the project. The list-of-attributes approach reminds panelists to look for each
attribute. However, it might encourage some panelists to find attributes that may not
really be present.
3. Regardless of whether a Flavor Profile or a more generic Consensus Profile Method is
being used, after all panelists have individually profiled the product, the panel leader,
who is a member of the panel, leads a discussion. First the assessors might discuss
what attributes were evaluated, and the leader would compose a list. Discussion con-
tinues, and depending on the objectives and further use of the terminology list (e.g.,
are these attributes part of a general list that will be used to create a ballot to evalu-
ate samples by individual assessors later or are they the specific list of attributes for
this product?), the group may quickly come to agree on a list of terms or may need
to spend more time developing a more focused lexicon. The panel then reevaluates
the product, examines the master list, discusses terms they did not agree on, and
describes the attribute in more detail (e.g., a description, where it appears, attributes
it follows, and other information). This is done to help other assessors determine
whether the attribute is one they omitted accidentally, is one they used a different
term to describe, is one they did not find, or whether they disagree with the other assessor
for some other reason. Usually, after a short discussion, the attributes can be identi-
fied in the product. If they cannot agree, additional evaluation of the product using
other samples, new reference materials, and further description and elaboration of
the attribute in question, including bringing in other examples of the attribute for the
assessors to examine, may be done to bring the group to consensus.
4. At that point, the panel typically addresses intensities of attributes that have not
already been determined, again using discussion as needed. Depending on the
objectives of the study, such aspects as order of appearance of the attributes, the time
sequence of an attribute’s increase or decrease in intensity, dominance of attributes,
or other aspects such as how attributes interact with each other (e.g., amplitude in the
Flavor Profile) may be evaluated.
5. The panel writes the profile in the format determined either by the client or in the
format best suited to the use of the data. Multiple formats, such as a tabular format
with additional graphical components and paragraphs describing unique properties
or interactions that may need to be addressed, are fine. The key to the Consensus
Profile Method is understanding by the end user.
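The bookkeeping in Steps 2 and 3—pooling individual blank-sheet profiles into a master attribute list and flagging terms that not every assessor reported—can be sketched as follows. The assessors, attributes, and intensity values are invented for illustration, and the final agreed profile comes from panel discussion, not from any computation (it is a consensus, not an average).

```python
# Sketch of pooling individual blank-sheet profiles (invented data).
# Zero is never recorded: an unperceived attribute is simply absent
# from an assessor's individual profile.
individual = {
    "assessor_1": {"oily surface": 6, "crispness": 8, "salt": 5},
    "assessor_2": {"oily surface": 7, "crispness": 8},
    "assessor_3": {"crispness": 7, "salt": 5, "heated-oil note": 2},
}

# The panel leader composes a master list from everyone's terms.
master = sorted({attr for profile in individual.values() for attr in profile})

# Attributes not reported by every assessor are flagged for discussion.
needs_discussion = [attr for attr in master
                    if any(attr not in profile
                           for profile in individual.values())]

print("master list:", master)
print("discuss before consensus:", needs_discussion)

# After discussion and reevaluation, the panel agrees on one intensity
# per attribute; the values below are hypothetical agreed results.
consensus = {"oily surface": 6, "crispness": 8, "salt": 5, "heated-oil note": 2}
```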
Reporting of Results
As mentioned, Consensus Profiles can take many forms. Traditional Flavor Profiles
used a tabular form24,25 that included the attribute and intensity in order of appear-
ance. In addition to the tabular profile, the product often was described in paragraph
form to explain the profile.24 Other forms include graphical forms similar to any other
descriptive analysis such as spider plots, bar charts, trees, or in the case of data ana-
lyzed by multivariate methods, maps of products and attributes in multidimensional
spaces (e.g., principal component analysis biplots).4,26–29 In other cases, descriptive
paragraphs have been used to describe the product on the basis of results from the
consensus profile, such as the following summary of eating a manufactured potato
crisp provided by a panel:
For oiliness: Before even picking up the product one could tell that it likely would
be oily because of the 3–6 oil blotches on each chip giving them an inconsistent
greasy appearance. As the product was picked up one felt the need to get a napkin
to wipe the hands because the product had a high degree of oily/greasy feel on the
surface. On putting the chips in the mouth, there was an initially moderate oily/
greasy feel on whatever surfaces (e.g., lips, tongue) the chip touched, but interest-
ingly on biting, the chip seemed slightly dry. While chewing, the dryness grew to
a moderate to high intensity; a moderate starchy feel grew. The flavor did impart
some slight oiliness to it, but it was described as more of a “heated oil” note that
was just beginning to turn rancid, which appeared right at the end of chewing at
a slight level.
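Where multiple consensus profiles are analyzed by multivariate methods, a principal component map of products and attributes can be computed from the product-by-attribute intensity table. A minimal NumPy sketch, using invented products and intensities rather than data from any study:

```python
import numpy as np

# Hypothetical consensus intensity profiles: rows are products,
# columns are attributes. All names and values are invented.
attributes = ["oily surface", "crispness", "salt", "heated-oil note"]
profiles = np.array([
    [6.0, 8.5, 5.0, 2.0],   # crisp A
    [3.5, 9.0, 4.0, 0.5],   # crisp B
    [7.5, 6.0, 6.5, 3.0],   # crisp C
    [4.0, 7.5, 5.5, 1.0],   # crisp D
])

# Center each attribute, then use the SVD to obtain principal component
# scores (product coordinates) and loadings (attribute directions).
centered = profiles - profiles.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * s       # products in the component space
loadings = Vt        # attributes in the component space

for name, (pc1, pc2) in zip("ABCD", scores[:, :2]):
    print(f"crisp {name}: PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

Plotting the first two columns of the scores together with the corresponding loadings yields the kind of biplot referenced above.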
Practical Considerations
Consensus profiling is a relatively simple concept but may be difficult to execute in
practice without considerable training. As opposed to individual scoring, consensus
profiles use the concept that the panel of assessors must come to agreement on each of
the aspects evaluated. The idea is simple enough. Instead of averaging scores over the
group, the group must come to an agreement on what the appropriate attributes,
scores, etc., are. Consensus methods usually allow the assessors to use whatever
attributes they believe are appropriate (i.e., they start with a “blank ballot”), but they differ
from methods, such as free-choice profiling, that allow all individuals to use any words
to describe their perceptions. With consensus methods all assessors must discuss and
come to agreement on the terms, whereas in methods such as free-choice profiling
statistics are used to attempt to relate information to determine similarity in
meaning.
Because Consensus Profiling requires a team decision, it is essential that one
member not dominate the group. A “bully” cannot be tolerated in any Consensus Pro-
file panel, either from the standpoint of a satisfactory end profile (it cannot be one
person’s viewpoint) or from the standpoint of pleasant and respectful group dynamics.
The key is open communication and the use of unconditional positive regard. Success-
ful consensus panels use a range of knowledge, experience, and decision-making skills
to facilitate agreement on the profile. This stimulates thinking and brings new ideas
(e.g., attributes that might not be noticed by everyone) to the group. This is common in
other fields such as psychology and medicine and often helps in successful business
decision making. The key is not to get distracted and bogged down in discussions that
are better held during external training or outside of the panel session; doing so frus-
trates assessors and derails an efficient profile process. When a good panel leader sees
that happening, the best course of action is to call a halt to that discussion, move to a
different aspect of the profile, and return to that aspect with refocused vigor later in the
discussion.
Much has been discussed over the years about specific disadvantages of the
Consensus Profile Method and its predecessor methods, such as the Flavor Profile,
and it is essential that those criticisms be addressed here. In essence, they relate
to the time and cost of panel training, individual variation that is unaccounted for, bias
related to discussion, lack of statistical analyses, and panel size. The first and last of
these issues are related.
It is true that a consensus panel needs training—sometimes extensive training—
and requires time to calibrate intensities and develop cohesion for discussion and
consensus development, depending on the projects it conducts. Of course, much of
this can be said for many descriptive methods that are not based on consumer vocab-
ulary and are expected to provide a high level of detail to the end user. Likewise, the
number of panelists needed in a test is a function of the level of training and reproduc-
ibility, not a perceived “magic number” with no actual basis in science. Studies have
repeatedly shown that smaller numbers of highly trained panelists can perform as well
or better than a larger number of less well-trained panelists.19,30 A large study of deci-
sion making by teams concluded that groups of 5 to 6 members were optimal where
discussion and decision making are paramount.31
Situational Adaptations
Time sequencing has been an integral part of the Flavor Profile from the beginning of
the method, as evidenced by the concept of order of appearance. Later Consensus Pro-
file Methods have adapted this concept to provide order of appearance of attributes in
single modalities, matched order of appearance in multiple modalities, or temporal
profiling conducted as separate sensory tests. Temporal profiling has been included in some Consensus Profile studies to characterize the rise and fall of attributes over time during the profile. This use, which takes additional time, is typically conducted separately from the initial profile. After the initial profile is established and the basic order of appearance of the attributes is determined, the panelists reevaluate key attributes over time, obtaining consensus data that are converted into a graph or line chart showing how the characteristics interact in terms of their appearance, rise, fall, and disappearance in the profile. Such information potentially allows the time of appearance and disappearance of certain key attributes to be adjusted for products as they are
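The conversion of consensus time data into a summary of each attribute's appearance, peak, and disappearance can be sketched in a few lines. The attributes, times, and intensities below are hypothetical; real data would be the consensus values agreed on by the panel:

```python
# Hypothetical consensus time-intensity data: for each attribute, a list of
# (time in seconds, consensus intensity) points from the panel.
profile = {
    "sweet":  [(0, 0.0), (2, 1.5), (5, 3.0), (10, 2.0), (20, 0.0)],
    "bitter": [(0, 0.0), (2, 0.0), (5, 1.0), (10, 2.5), (20, 1.0)],
}

def summarize(points):
    """Return (appearance, peak, disappearance) times for one attribute."""
    appearance = next((t for t, i in points if i > 0), None)   # first nonzero
    peak = max(points, key=lambda p: p[1])[0]                  # time of maximum
    # Disappearance: first time after the peak at which intensity returns to 0
    # (None if the attribute is still perceived at the end of the evaluation).
    disappearance = next((t for t, i in points if t > peak and i == 0), None)
    return appearance, peak, disappearance

for attribute, points in profile.items():
    print(attribute, summarize(points))
```

These summaries, together with the full curves, are what a line chart of the rise and fall of attributes would display.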
References
1. S. E. Cairncross and L. B. Sjöström, “Flavor Profiles—A New Approach to Flavor
Problems,” Food Technology 4 (1950): 308–311.
2. J. F. Caul, “The Profile Method of Flavor Analysis,” in Advances in Food Research, ed. E. M.
Mrak and G. F. Stewart (New York: Academic Press, 1957), 1–40.
4. P. A. Prell and F. M. Sawyer, “Flavor Profiles of 17 Species of North Atlantic Fish,” Journal
of Food Science 54 (1988): 1036–1042.
6. T. R. Jaffe, H. Wang, and E. Chambers IV, “Determination of a Lexicon for the Sensory
Flavor Attributes of Smoked Food Products,” Journal of Sensory Studies 32 (2017):
e12262.
7. C. Maughan, E. Chambers IV, and S. Godwin, “A Procedure for Validating the Use of
Photographs as Surrogates for Samples in Sensory Measurement of Appearance: An
Example with Color of Cooked Turkey Patties,” Journal of Sensory Studies 31 (2016): 507–513.
8. L. Dooley, K. Adhikari, and E. Chambers IV, “A General Lexicon for Sensory Analysis of
Texture and Appearance of Lip Products,” Journal of Sensory Studies 24 (2009):
581–600.
9. C. Sun, K. Koppel, and E. Chambers IV, “An Initial Lexicon of Sensory Properties for Nail
Polish,” International Journal of Cosmetic Science 36 (2014): 262–272.
13. A. Lezama-Solano and E. Chambers IV, “Development and Validation of a Recipe Method
for Doughs,” Foods 7 (2018), https://doi.org/10.3390/foods7100163
15. H. Wang, X. Zhang, H. Suo, X. Zhao, and J. Kan, “Aroma and Flavor Characteristics of
Commercial Chinese Traditional Bacon from Different Geographical Regions,” Journal of
Sensory Studies 34 (2019), https://doi.org/10.1111/joss.12475
16. E. Chambers IV, J. Lee, S. Chun, and A. Miller, “Development of a Lexicon for
Commercially Available Cabbage (Baechu) Kimchi,” Journal of Sensory Studies 27 (2012):
511–518.
18. K. Sanchez and E. Chambers IV, “How Does Product Preparation Affect Sensory
Properties? An Example with Coffee,” Journal of Sensory Studies 30 (2015): 499–511.
20. H. Kim, J. Lee, and B. Kim, “Development of an Initial Lexicon for and Impact of Forms
(Cube, Liquid, Powder) on Chicken Stock and Comparison to Consumer Acceptance,”
Journal of Sensory Studies 32 (2017), https://doi.org/10.1111/joss.12251
21. C. R. Rogers, Client-Centered Therapy: Its Current Practice, Implications and Theory
(Boston: Houghton Mifflin, 1951).
22. C. Rogers and B. F. Skinner, “Some Issues Concerning the Control of Human Behavior,”
Science 124 (1956): 1057–1065.
23. T. J. Chermack and K. Nimon, “Drivers and Outcomes of Scenario Planning: A Canonical
Correlation Analysis,” European Journal of Training and Development 37 (2013): 811–834.
25. C. Y. Chang and E. Chambers IV, “Flavor Characterization of Breads Made from Hard Red
Winter Wheat and Hard White Winter Wheat,” Cereal Chemistry 69 (1992): 556–559.
29. V. Lotong, D. H. Chambers, C. Dus, E. Chambers IV, and G. V. Civille, “Matching Results of
Two Independent Highly Trained Sensory Panels Using Different Descriptive Analysis
Methods,” Journal of Sensory Studies 17 (2007): 429–444.
30. E. Chambers IV, J. A. Bowers, and A. D. Dayton, “Statistical Designs and Panel Training/
Experience for Sensory Analysis,” Journal of Food Science 46 (1981): 1902–1906.
31. J. S. Muller and E. Wittenberg, “Is Your Team Too Big? Too Small? What Is the Right Number?”
The Wharton School, University of Pennsylvania, 2006, http://web.archive.org/save/
https://knowledge.wharton.upenn.edu/article/is-your-team-too-big-too-small-whats-the-
right-number-2/
32. A. M. Muñoz and G. V. Civille, “Universal, Product and Attribute Specific Scaling and the
Development of Common Lexicons in Descriptive Analysis,” Journal of Sensory Studies
(1998): 57–75.
33. M. Meilgaard, G. V. Civille, and B. T. Carr, Sensory Evaluation Techniques, 2nd ed. (Boca
Raton, FL: CRC Press, 1991).
34. R. L. Hall, “Flavor Study Approaches at McCormick & Company Inc.,” in Flavor Research
and Food Acceptance, ed. A. D. Little (New York: Reinhold, 1958), 224–240.
Introduction
The Texture Profile Method was developed in the 1960s by General Foods researchers
and published by Brandt et al.1 This sensory descriptive method’s philosophy and
approach were based on two important methods/contributions: the Flavor Profile
Method2,3 and the work by Szczesniak4 and Szczesniak et al.5 on the classification of
textural characteristics and the development of rating/intensity scales.
The use and adaptation of these two technically strong methodological approaches
provide the Texture Profile Method its fundamental core characteristics and philoso-
phy, specifically:
• A descriptive method whose objective, similar to that of the Flavor Profile Method, is to develop well-trained sensory panels highly skilled in evaluating both the qualitative and quantitative descriptive components of texture. Trained Texture Profile panels evaluate products' perceived sensory rheological properties in a technical, rigorous, and comprehensive way.
• A method that incorporates rheological concepts into a sensory descriptive
approach in order to analytically define the perceived texture attributes, establish
controlled and detailed evaluation procedures, and quantify the perceived sensory
texture properties.
The Texture Profile Method was developed to evaluate the texture attributes of
foods. However, its fundamental principles have been the basis for the development of
methods for evaluating sensory rheological/texture properties of nonfoods. This
chapter discusses several of these adaptations.
1 IRIS: International Resources for Insights and Solutions, LLC, 234 Robin Hood Rd., Mountainside, NJ 07092, USA
https://orcid.org/0000-0001-5506-742X
DOI: 10.1520/MNL1320160028
developed in food rheology by these researchers. The Flavor Profile Method was well
established by the 1960s but mainly addressed flavor evaluation. Thus, there was a
need for developing another sensory approach that focused on the sensory texture
evaluations of foods. The core characteristics of the Texture Profile Method, which
were established on the basis of the work mentioned above, are described in the
two sections that follow.
language and detailed definitions and evaluation procedures, and all other con-
cepts established in 1963 have been the basis for many applications and for the
development of texture language and evaluation procedures for diverse products,
including nonfoods, ever since.6 Examples of the impact of the Texture Profile
Method on other profile derivative descriptive methods, and other methodologies
and developments, are covered later under the section “Adaptations.”
CORE PHILOSOPHY
The adaptation and use of the contributions and principles of the Flavor Profile Method
and the work by Szczesniak and coworkers in 1963 established the core philosophy and
characterizing features of the Texture Profile Method:
• A technical descriptive method that follows a carefully planned panel screening
and training process designed to establish well-trained texture panels
• A rheologically based sensory approach that focuses on developing and using
accurate and rheologically based sensory language and controlled product
presentation and evaluation procedures to ascertain high-quality and thorough
texture measurements
• A comprehensive descriptive method that provides information on products' mechanical, geometrical, and fat/moisture-related properties and their structural changes/transformations when manipulated, thus capturing the perceived texture changes throughout all manipulation stages (from initial through residual)
• A descriptive method that establishes the practice and delineates the advantages of
developing and using quantitative/intensity references to score the attributes’
intensities
Other notable publications have also discussed the growth of the Texture Profile Method.8–10 It is therefore important that this publication cover the evolution of the Texture Profile Method and the developments listed above. In applicable sections
of this chapter the advancement and modifications of the original Texture Profile
Method are highlighted and discussed, specifically how the methodology has been
adapted and is currently being used in food and nonfood texture evaluation.
Crispness: The amount of snap, as measured by force and noise, released from the chip upon the
first bite
(soggy to crispy)
Place another half between the molars. Using a steady force, bite through the chip twice to
evaluate:
Denseness: The compactness of the cross section of the sample while biting completely through
with the molars
(airy to dense/compact)
Cohesiveness of mass: The degree to which the mass holds together after chewing
(not cohesive/loose mass to cohesive mass)
Graininess of mass: The amount of small particles perceived by the tongue when the mass is gently
compressed between the tongue and palate
(not grainy to grainy)
Moistness of mass: The amount of moistness of the bolus after a prescribed number of chews
(dry mass to moist mass)
Tooth packing: The amount of sample left within the crevices of the teeth after swallowing/expectorating
(no tooth pack to product packs in teeth)
I. First Chew
Place sample between molar teeth, bite, and evaluate for:
1. Hardness: Force required to bite through sample.
2. Adhesiveness: Degree sample sticks to teeth.
3. Cohesiveness: Degree to which sample deforms rather than ruptures.
4. Smoothness: Degree to which sample is free of grits and/or grains.
II. Chewdown
Place sample between molar teeth, chew, and evaluate for:
1. Chewiness: Number of chews necessary to prepare sample for swallowing.
2. Gumminess: Amount of energy required to disintegrate sample to a state ready for swallowing.
3. Adhesiveness: Degree to which sample sticks to (a or b) during chewing.
a. Roof of mouth (10–15 chews)
b. Teeth
4. Cohesiveness of mass: Degree to which sample holds together.
5. Denseness: Compactness of sample.
6. Moisture absorption: Degree to which sample absorbs saliva.
a. Rate
b. Amount
7. Crystalline: Degree to which sample is granular.
III. Breakdown
Describe changes occurring during breakdown.
IV. Residual
After swallowing the sample evaluate for:
1. Ease: Degree to which prepared sample is readily swallowed.
2. Chalkiness: Degree to which mouth feels dry or chalky after all of the sample has been swallowed.
3. Grittiness: Degree to which mouth contains small particles after all of the sample has been
swallowed.
4. Toothpacking: Degree to which sample remains in teeth.
With permission.
shown in Tables 1 and 2 is thorough because it captures all perceived texture attributes
throughout the complete product manipulation, providing information on the prod-
uct’s textural changes throughout its manipulation.
In the case of foods, the Texture Profile evaluation includes the complete evalua-
tion of texture perceptions, from (a) initial contact with the product (i.e., initial surface
characteristics as perceived manually or when in contact with lips or mouth surfaces),
to (b) how the product behaves and disintegrates/melts down in the mouth, to (c) how
TABLE 3 Examples of texture terms used in the Texture Profile Method and other sensory texture evaluations7
Adhesiveness Force required to remove the material that adheres to a specific surface.
Adhesiveness to lips Degree to which the product adheres to the lips following slight
compression.
Adhesiveness to palate Force required to remove the product completely from the palate with the
tongue following complete compression between tongue and palate.
Adhesiveness to teeth Amount of product adhering to the teeth after mastication.
Self-adhesiveness in the mouth Force required to separate individual pieces with the tongue.
Self-adhesiveness outside the mouth Force required to separate individual pieces with the back of a spoon (contents of a standard cup placed on a plate).
Bounce Resilience, rate at which the sample returns to the original shape after
partial compression.
Chewiness Number of chews (at 1 chew/s) needed to masticate the sample to a
consistency suitable for swallowing.
Coarseness Degree to which the mass feels coarse during product mastication.
Cohesiveness Degree to which the sample deforms before rupturing when biting with
molars.
Cohesiveness of mass Degree to which the bolus holds together after product mastication.
Denseness Compactness of cross section of the sample after biting completely
through with the molars.
Dryness Degree to which the sample feels dry in the mouth.
Fracturability Force with which the sample crumbles, cracks or shatters.
Fracturability encompasses crumbliness, crispness, crunchiness, and
brittleness.
Graininess Degree to which a sample contains small grainy particles.
Gumminess Energy required to disintegrate a semi-solid food to a state ready for
swallowing.
Hardness Force required to deform the product a given distance, that is, force to
compress between molars, bite through with incisors, compress between
tongue and palate.
Heaviness Weight of product perceived when first placed on tongue.
Moisture absorption Amount of saliva absorbed by product.
Moisture release Amount of wetness/juiciness released from sample.
Mouthcoating Type and degree of coating in the mouth after manipulation (for example,
fat/oil).
Roughness Degree of abrasiveness of product’s surface perceived by the tongue.
Slipperiness Degree to which the product slides over the tongue.
Smoothness Absence of any particles, lumps, bumps, etc. in the product.
Springiness Degree to which the product returns to its original size/shape after partial
compression (without failure) between the tongue and palate or teeth.
(continued)
TABLE 3 Examples of texture terms used in the Texture Profile Method and other sensory texture evaluations7 (continued)
Swallow, ease of Degree to which the chewed mass can be readily swallowed.
Tooth packing Degree to which the product sticks in the teeth.
Uniformity Degree to which the sample is even throughout.
Uniformity of chew Degree to which the chewing characteristics of the product are even
throughout mastication.
Uniformity of bite Evenness of force through bite.
Viscosity Force required to draw a liquid from a spoon over the tongue.
Wetness Amount of moisture perceived on product’s surface.
With permission.
the residual texture characteristics left in the mouth once the product is swallowed or
expectorated are perceived.
In the case of nonfoods, such as in the evaluation of creams/lotions, the texture/
skinfeel profile approach11 also captures the perceived texture/skinfeel attributes
throughout the entire product manipulation: from initial contact with the product
(i.e., pickup/initial characteristics as perceived manually), to how the product is per-
ceived on the skin during application, to the residual (“after feel”) sensory character-
istics left by the product on the skin once it is absorbed or the manipulation cycle
ends.
While many research and development (R&D) projects require this complete evaluation, often a simpler assessment is sufficient. Cases for which a reduced and focused texture profile evaluation is warranted are discussed later under "Adaptations."
work on scales in 1963 but that they had been addressed by Muñoz.12 The original
intensity reference scales5 have also been adapted for their use in other countries.16–18
Assessors
TARGET PANEL SIZE
Among the Flavor Profile Method’s techniques adapted by the Texture Profile Method
was the use of consensus for data collection. In a consensus setup, and based on the
Flavor Profile Method’s techniques, a small panel size is used. Thus, in its early years,
the Texture Profile Method used only six to nine panel members.1 However, larger
numbers have been used, and currently groups of 15 to 20 assessors are usually trained
as texture profilists.12,16,19,20
Availability
The training of a Texture Profile panel is an involved and relatively long process.
Therefore, it is important to communicate the time and commitment requirements to
prospective assessors. They must be available for all training and practice sessions, as well as able to participate in projects as needed. Therefore, the training and practice
schedules and product evaluation needs must be communicated. When employees
participate in the program, it is important that the schedule requirements also be com-
municated to the assessors’ management.
Personality
A positive attitude and ability to work with others are key factors to explore when
selecting candidates.
In the training and operation of a Texture Profile panel, a great deal of interaction
occurs among panel members. As described below in the panel training process, asses-
sors work as a group in the completion of many tasks. These tasks include effectively
reaching agreement in the development of the texture protocol (defining attributes and
developing evaluation procedures), in the review and discussion of references and
individual results, etc. Therefore, people with extreme personalities, such as exces-
sively shy or dominating candidates, should be avoided. Extremely shy candidates do
not actively participate in panel discussions. On the other hand, extremely dominating
assessors are controlling and negatively affect the group dynamics and the interaction
with the panel leader.
Assessors should have a positive attitude, be open to all input provided by other
panel members, positively share their opinions, actively participate in group discus-
sions to address differences, and effectively try to reach agreement in establishing tex-
ture evaluation procedures and protocols. Avoiding extreme personalities is more
crucial when the panel results are collected through consensus. An effective and posi-
tive group interaction must occur in order to be able to successfully reach consensus.
The best way to assess the candidates’ personalities and hopefully unveil extreme
personalities is by conducting one-on-one interviews.
Acuity
When the Texture Profile Method was developed and used in the 1960s and 1970s,
only one texture acuity test was used in the panel screening process: hardness.
Reference materials from the original standard hardness scale were presented to can-
didates, who were asked to arrange them in an increasing order of hardness.18,23
The screening of texture profilists currently includes more than one texture
attribute.6 The attributes are selected to cover the most important texture attributes
of the product categories in the program. Moreover, in most cases a panel is trained
not only in the Texture Profile Method’s procedures but also in other sensory
dimensions. Even when the Texture Profile Method was developed, panels were
also trained in the Flavor Profile Method to evaluate flavor in addition to
texture.1,23 In these instances, the acuity screening should include flavor or other
sensory exercises to explore acuity in flavor or in other sensory dimensions. In the
case of nonfoods, acuity exercises to explore the acuity/perception of the sensory
attributes being measured should be conducted.24
sensory scientist or project manager with additional duties. In general, the Texture
Profile panel leader’s responsibilities are as follows:
• Schedule assessors for all panel sessions and activities described below
• Screen candidates
• Design and conduct (or be involved in) the training and practice sessions
• Conduct or supervise the execution of panel sessions
• Supervise sample and reference preparation
• Monitor the panel
• Record, compile, and report panel results
In addition, the panel leader might be a project manager with additional responsibilities, such as designing projects, interpreting and reporting results, managing resources, and providing guidance to other staff in the administration and performance of the daily operation of the Texture Profile program.
approaches are briefly described in several sections of this chapter (e.g., “Adaptations”).
For the philosophical and practical considerations of universal and product-specific
panels, the reader is directed to the discussion presented by Muñoz and Civille.26
The training of a universal Texture Profile panel initially involves a considerable
amount of learning because the panel learns all the Texture Profile technical concepts,
language, evaluation practices, the scaling approach, and intensity references. This
method’s universal training consists of orientation and practice sessions. During the
orientations, the panel learns the philosophy and elements of the Texture Profile
Method and completes product evaluations. During the practice sessions, the panel
practices all concepts learned in the orientations and completes product evaluations.
Early publications23 report that a training program to fully train the panel in the
Texture Profile Method would involve 2 weeks of daily orientation sessions (each last-
ing 2–3 hours), followed by about 6 months of hourly practice sessions (4–5 times per
week). As currently practiced, the duration of a universal Texture Profile training pro-
gram is determined by the number and type of product categories included and the
assessors’ schedules. In general, current Texture Profile panels might require two orientation periods of 3–4 days each (the second conducted a few weeks after the first, with weeks of practice in between) and 3–6 months of practice sessions (3–4 times per week). The training of the Texture Profile panel covers the following:7
• Basic concepts of texture/rheology and texture perception
• Principles of the Texture Profile Method
• Use of intensity reference scales to demonstrate specific texture characteristics and
the procedure to quantify their intensities
• Evaluation of practice samples
• Expansion of the basic method’s procedures to specific products
Civille and Szczesniak23 presented an overview of the suggested training ele-
ments, such as basic concepts, the components of both the Flavor and Texture Profile
Methods, the definitions of texture attributes, the use of intensity reference scales,
practice in the use of scales, and products’ texture evaluations. A complete description
of the Texture Profile training approach is given in the next section. This approach is
typical for the training of a Texture Profile panel1,4,5 that is universal in nature, since
the training program includes a large variety of products. This approach is easily
adapted for training programs on nonfood texture that maintain the core technical
philosophy of the Texture Profile Method.
The panel reviews each of the references using the protocol described in Part 1.
In reviewing each of the reference points, the panel learns the evaluation procedure
and the attribute by experiencing low, medium, and high intensities of that attribute.
The above procedure is followed for all the attributes chosen by the trainer. This initial review gives the panel core knowledge not only of the main texture attributes but also, more importantly, of the required evaluation procedures and the direction of each scale. The intensity scales are reviewed several times during the training.
In addition, these scales are periodically reviewed after the training is completed and
during project work, as needed.
Data Collection
CONSENSUS
The section “Background and Core Philosophy” presents the Flavor Profile Method’s
techniques that were adapted by the Texture Profile Method. A few of these techniques
included the use of the original 5-point (or the modified 7- or 14-point) Flavor Profile
scale and collecting data through consensus.
The process followed by a Texture Profile panel to reach consensus has been
described by Brandt et al.1 and Muñoz et al.7 Profilists score samples independently
and a discussion follows. Individual scores for each attribute are tallied and shown to
the group for the discussion. The panel reviews and discusses individual results, shar-
ing opinions and reaching consensus. In these discussions and in the case of disagree-
ments, the attributes, definitions, evaluation procedures, and references are reviewed.
When needed, samples in question are reevaluated in an attempt to reach consensus.
Texture Profile scores are reported as consensus values.7,23
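The tallying of individual scores for group discussion, described above, can be illustrated with a minimal sketch. The panel size, attribute names, and scores below are hypothetical:

```python
from collections import Counter
from statistics import mean

# Hypothetical individual scores from six profilists on one sample
# (e.g., a 14-point scale), collected before the consensus discussion.
scores = {
    "hardness":     [8, 9, 8, 7, 9, 8],
    "cohesiveness": [5, 5, 6, 5, 4, 5],
}

for attribute, values in scores.items():
    tally = Counter(values)                 # distribution shown to the group
    spread = max(values) - min(values)      # large spread flags a disagreement
    print(f"{attribute}: tally={dict(sorted(tally.items()))}, "
          f"mean={mean(values):.1f}, spread={spread}")
```

A display of this kind lets the panel leader see at a glance which attributes need discussion and possible reevaluation before a consensus value is agreed on.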
Currently, most Texture Profile panels provide individual scores (see next
section). However, collecting data through consensus in the Texture Profile Method
and in other descriptive methods is still practiced. This approach is controversial.
While some researchers criticize the technique, others highlight its advantages.
Several researchers have assessed the value of consensus and/or have discussed its advantages and disadvantages.6,27–29
Scales
Currently, Texture Profile data are collected using a variety of scales and treated as
individual responses.9,12,15,20,30–32 Assessors evaluate the products individually using
the protocol developed to score all products’ attributes. Different scales (e.g., line scales; 10-, 15-, or 100-point scales) are currently used, and a few researchers have also used magnitude estimation in Texture Profile studies.33
Replications
While original Texture Profile Method practices did not address replication, most current Texture Profile panels incorporate replication in the evaluation of both foods and nonfoods.
Data Analysis
Texture Profile data are currently analyzed using diverse statistical analyses. When
describing how the Texture Profile Method had evolved since its inception, Larmond9
and Skinner10 discussed the use of statistical analyses of Texture Profile data. Both
authors also emphasized the use of multivariate statistical analysis approaches to ana-
lyze and present Texture Profile results and showed examples.
Researchers and practitioners who have trained and used Texture Profile panels
have used a variety of statistical techniques to analyze and present their results. These
techniques include the use of analysis of variance (ANOVA), correlation analysis, and
multivariate analysis. For example, Dransfield et al.20,32 described the way meat Texture
Profile results were analyzed through ANOVA, correlation, principal coordinate anal-
ysis, and principal component analysis (PCA). Other authors have reported the use of
similar statistical approaches (ANOVA, correlation analysis, and multivariate analy-
sis) for the analysis of Texture Profile data.15,22,30
In summary, interval Texture Profile data are currently treated like any other interval descriptive data, using the multitude of univariate and multivariate statistical techniques available to sensory practitioners. The specific analyses are chosen based on the test design and study objectives.
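As a minimal sketch of the univariate side, the F statistic of a one-way ANOVA for a product effect can be computed directly from the ratings. The products and scores below are hypothetical:

```python
# One-way ANOVA by hand: does mean hardness differ among three products?
# Ratings are invented 15-point-scale scores from four panel evaluations each.
groups = {
    "product_A": [8.0, 8.5, 7.5, 8.0],
    "product_B": [5.0, 5.5, 6.0, 5.5],
    "product_C": [8.5, 9.0, 8.0, 8.5],
}

all_vals = [v for g in groups.values() for v in g]
grand = sum(all_vals) / len(all_vals)

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(all_vals) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between},{df_within}) = {f_stat:.2f}")  # F(2,9) = 62.00
```

In practice a statistical package would supply the p-value and post hoc comparisons; the sketch only shows the arithmetic behind the product effect.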
Texture Profile data can be intrinsically variable: in foods because of assessors’ differences in mouth size/volume, teeth, saliva, dental status, etc., and in nonfoods because of assessors’ differences in skin, hair type, etc. However, the variability of this method’s results due to scoring can be lowered through the use of intensity reference scales. This practice, which aids in yielding lower variability in the data, provides additional discriminative power to the statistical techniques used.
SAMPLE SELECTION
Samples to be evaluated should be carefully selected in a validation study. They should
be sufficiently but not extremely different. One or several sets of samples can be
included in the validation study. Romero del Castillo et al.34 covered the importance of
the samples’ characteristics in validation studies and described the selection and
preparation of samples to validate a texture panel. When samples with known vari-
ables are selected, it is possible to assess the panel performance and determine whether
the panel is ready for routine projects. A profile panel is expected to show differences
in the attributes known to be different.6
one evaluation to the other, and the ability of panel members to agree with one another.
Skinner10 mentioned the importance of checking individual and group data
repeatability.
Currently, because different scales are used to obtain attribute ratings, diverse
criteria can be studied to evaluate the Texture Profile panel performance. The most
important measures used to assess and monitor individual Texture Profile assessors and the panel as a whole are repeatability, scale usage, discrimination ability/sensitivity, agreement, and reproducibility.35
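Two of these measures, repeatability and agreement, can be sketched numerically. The assessor names and scores below are hypothetical; a real check would use the panel's replicate evaluations of the same sample:

```python
# Hypothetical repeatability check: each assessor scored the same sample in
# two replicate sessions on the same attribute.
replicates = {
    "assessor_1": (7.5, 7.0),
    "assessor_2": (6.0, 8.5),   # large replicate gap: a retraining candidate
    "assessor_3": (8.0, 8.0),
}

panel_mean = sum(sum(pair) / 2 for pair in replicates.values()) / len(replicates)

for name, (rep1, rep2) in replicates.items():
    repeatability = abs(rep1 - rep2)                  # within-assessor consistency
    agreement = abs((rep1 + rep2) / 2 - panel_mean)   # deviation from the panel
    print(f"{name}: |rep1 - rep2| = {repeatability:.1f}, "
          f"deviation from panel mean = {agreement:.2f}")
```

Thresholds for "acceptable" repeatability and agreement are a judgment call for the panel leader and depend on the scale in use; the sketch only shows what is being compared.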
Several of these performance measures have been used by other researchers to
assess texture profile panels. For example, Thybo and Martens13 published a valuable
approach for assessing the performance of a Texture Profile panel, such as its discrim-
ination ability and reproducibility, using several univariate and multivariate statistical
techniques, such as ANOVA and discriminant partial least-squares regression.
Rousset-Akrim et al.21 used a variety of statistical techniques, such as correlation, ANOVA, and stepwise discriminant analysis, in their research on selection tests for discerning “an efficient assessor” in texture profiling.
The topic and techniques for panel performance have been discussed extensively
in other publications. Thus, the reader is encouraged to consult other publications that
have discussed the importance of these measures,35 provided recommendations, and
covered suggested statistical approaches to assess profile data and sensory profile pan-
els’ performance.36–38
Reporting of Results
Texture Profile results are summarized in reports or presentations.
CONTENTS OF REPORTS
Muñoz et al.7 recommended that a Texture Profile report include the objectives of the
study; identification and preparation of the samples; the ballot (showing the phases of
evaluation, terminology [attribute definitions and evaluation procedures], and inten-
sity scales used); summary of the techniques used; tables, charts, and graphs; and a
discussion of the results focused on differences and similarities among the test
samples.
Civille and Szczesniak23 also recommended that the Texture Profile report present the frame of reference, i.e., how the panel was “standardized” prior to the evaluation and the reference standards used to orient the panel. Additional important
characteristics that Muñoz and Keane6 recommend in a Flavor or Texture Profile
report are: executive summary, background, sensory test design, summary of data
analysis, conclusions, and recommendations.
PRESENTATION OF RESULTS
When consensus results are reported, charts or graphs are the most effective way to display them. Figure 1 shows the distinctive graphic representation of Texture Profile results that was commonly used in the method’s early years and remains typical of the Texture Profile Method.7,10
Tables and a variety of graphs are currently used to depict Texture Profile results, depending on the study objectives, the test design, and the corresponding statistical analysis completed. The section “Data Collection” above presented a few of the notable publications that have shown the use of assessors’ individual scores, the corresponding statistical analysis, and the presentation of Texture Profile results.
As examples, below are a few of the presentation formats that have been used to
report Texture Profile results, which are specific to the studies completed and the data
analysis performed.
Table 4 shows an example of tabular Texture Profile panel evaluation results. This
table shows selected information from the complete results presented by Dransfield et
al.20 on texture evaluation of different types of meats and preparation procedures.
Figures 2 and 3 are examples of current graphical representations of texture data.
Figure 2 depicts the Texture Profile panel results of herbal gels as a function of concen-
tration, studied and reported by Cui et al.16 These authors also showed the correlation
and data relationship analysis results and plots of the Texture Profile and instrumental
measures studied in their research.
Figure 3 is an example of a multivariate map of texture results. This format is now commonly used to depict diverse and sometimes complex texture data. The multivariate statistical analyses applied today allow texture results to be depicted in multivariate maps showing attribute relationships and other specific analysis results. For example, figure 3 shows the PCA map (PC2 vs. PC1) of the texture and
appearance evaluation results of dry dog food products.39 Other presentations of the
statistical analysis results or Texture Profile and other texture evaluations include the
discriminant partial least-squares regression maps of texture research results on pota-
toes,13 preference mapping of the texture of dulce de leche,40 the texture and appear-
ance evaluation results of lip products,41 etc.
In general, the presentation format of Texture Profile results depends on the test
design, the data analyses conducted, and the individual researchers’ preferences.
Meat/preparation     n    IR    RUB   Fco   CT    J     Res   T     Ch
Chicken
  Roast              3    4.1   0.8   5.1   0.4   6.8   2.1   4.2   1.4
  Casserole          3    5.1   1.1   5.4   0.4   6.1   2.5   4.9   1.6
Beef
  Microwave          5    7.7   4.0   8.2   4.4   7.0   2.9   8.2   4.8
  Grill              3    5.1   1.6   6.3   1.3   8.4   1.4   5.3   2.3
Lamb
  Roast              4    5.6   2.0   6.2   1.4   7.6   1.7   5.8   2.8
  Processed          1    2.9   0.7   5.4   1.1   6.6   2.5   3.1   0.7
Pork
  Roast              11   6.0   1.8   6.2   0.8   6.3   2.5   6.0   2.3
  Processed          1    4.3   1.4   4.8   1.0   6.5   1.5   4.4   0.8
Note: Only selected information from the original source results is presented.
Values are the means of n samples of diverse types of meat/cooking for: IR = initial resistance; RUB =
rubberiness; Fco = fiber cohesiveness; CT = connective tissue; J = juiciness; Res = residue; T = toughness;
Ch = chewiness.
FIG. 2 Texture profile panel scores of herbal gel samples as a function of the ingredients' concentration.16 (With permission).
FIG. 3 Principal components analysis map (PC2 vs. PC1) of the texture and appearance evaluation results of dry dog food products.39 (With permission).
Adaptations
Several of the modifications and advances of the original Texture Profile Method
have been mentioned throughout this chapter. In this section, a summary of this
method’s adaptations is presented under two main categories: (a) evolution of the
Texture Profile Method and development of other texture evaluation methodology
(covering Texture Profile shortened/focused program, Derivative Profile Methods,
Temporal Texture Evaluation Methods and Descriptive Analysis Technical Texture
Evaluations) and (b) adaptation of the Texture Profile Method for the evaluation of
nonfoods (covering Skinfeel, Handfeel and other products [e.g., lip products, pet
food, etc.]).
• Focused/attribute panels
For specific objectives (e.g., shelf-life studies, category-specific evaluations), attribute
Texture Profile panels might be trained to focus only on specific attributes of interest.
• QC/QA sensory programs
This application is another type of a focused program, in which only specific
attributes need to be evaluated. In this application, selected texture and possibly
other sensory attributes that have been identified as variable or with production
issues are included in the evaluation.42
• Others
texture evaluations, specifically the way (a) texture attributes should be clearly and
technically named and defined, (b) a detailed texture evaluation procedure or protocol
should be developed, (c) proper texture scale anchors should be established (e.g., soft to
firm, airy to dense, etc.), and (d) quantitative/intensity references can be used.
In summary, texture evaluations are technical in nature and methods or evalua-
tions that apply any of the above features adapt/use the fundamental characteristics of
the Texture Profile Method.
Handfeel Evaluations
The principles of the Texture Profile were also applied to develop the sensory and
descriptive methodology for the evaluation of fabrics and paper products. Parallel to
the developments of the Texture Profile Method, Brand47 developed and published the
fundamental concepts of what he called “aesthetics of fabrics,” incorporating appear-
ance and handle. These concepts provided the framework for assessing handfeel char-
acteristics of fabrics. Brand outlined the aesthetics “concepts” of fabric (defined as the
factors or elements whose relationship defines aesthetics) as body, cover, surface tex-
ture, resilience, drape, and style.
Summary
The Texture Profile Method represents a milestone in the evolution of sensory science
and descriptive analysis. This method is a notable contribution not only because it was
one of the first descriptive methods established, but also because it truly provided the
foundations for all sensory evaluations of the texture/rheological properties of
products.
This method’s contributions are palpable when used as originally developed and
in all its adaptations in descriptive methods. Specifically, in terms of adaptations, the
Texture Profile Method’s techniques have been replicated by other descriptive meth-
ods or the techniques have been used as the basis for texture evaluations in all descrip-
tive methods.
In terms of contributions, the Texture Profile Method provided to the field of
sensory science the following key philosophical concepts and methodological
approaches, which continue to be used as originally developed or in a modified form:
• The value of developing and using precise and analytical texture attributes,
definitions, and evaluation procedures.
• Together with the Flavor Profile Method, the use of consensus as a dynamic panel
practice to discuss results or to reach agreement, or both.
• The practice of using intensity/quantitative references to provide the benefits
described by the founders of the Texture Profile Method and by other professionals
who have adopted this approach.
• The pioneering and early methodology of temporal evaluations because the Texture
Profile Method captures temporal changes of the products’ texture throughout
their manipulation, from initial contact to complete use (i.e., complete chew-down
in foods, complete lotion absorption in skin, etc.).
This chapter’s main objectives were to summarize (a) the original method’s contribu-
tions, key characteristics, needs, and steps for establishing a texture profile descriptive
program and (b) the way many of the philosophical concepts and approaches devel-
oped by the original Texture Profile Method have been updated, as well as how they
were adapted and are being used in other descriptive methods.
The original Texture Profile Method was developed as a global/universal method
for evaluating the texture of many products, mainly foods. Over the years this meth-
od’s practices have been adapted in the following ways, demonstrating the impact that
this method has had in the evolution of descriptive analysis:
• Evolution of the Texture Profile Method and development of other methodology
(e.g., development of texture derivative profile methods, temporal methodology,
etc.)
• Adaptation of the Texture Profile Method for the evaluation of nonfoods (e.g.,
lotions, hair, paper and lip products, pet food, etc.)
References
1. M. A. Brandt, E. Z. Skinner, and J. A. Coleman, “The Texture Profile Method,” Journal of
Food Science 28 (1963): 404–409.
3. J. F. Caul, "The Profile Method of Flavor Analysis," Advances in Food Research 7 (1957):
1–40.
6. A. M. Muñoz and P. A. Keane, “Original Flavor and Texture Profile and Modified/Derivative
Profile Descriptive Methods,” in Descriptive Analysis in Sensory Evaluation, ed. S. E. Kemp,
J. Hort, and T. Hollowood (Hoboken, NJ: John Wiley & Sons, 2018), 237–286.
9. E. Larmond, “Beyond the Texture Profile,” in Food Structure: Its Creation and Evaluation,
ed. J. M. V. Blanshard and J. R. Mitchell (Oxford, UK: Butterworth-Heinemann, 1988),
449–464.
10. E. Z. Skinner, “The Texture Profile Method,” in Applied Sensory Analysis of Foods, ed.
H. Moskowitz (Boca Raton, FL: CRC Press, 1988), 89–110.
11. N. O. Schwartz, “Adaptation of the Sensory Texture Profile Method to Skin Care Products,”
Journal of Texture Studies 6 (1975): 33–42.
16. S. Cui, K. Yu, Z. Hu, and G. Wei, “Texture Profile of Herbal Gel Affected by Instrumental
Parameters and Correlations of Instrumental and Sensory Evaluation,” Journal of Texture
Studies 42 (2011): 349–358.
17. G. Hough, A. Contarini, and A. M. Muñoz, “Training a Texture Profile Panel and
Constructing Standard Rating Scales in Argentina,” Journal of Texture Studies 25 (1994):
45–57.
21. S. Rousset-Akrim, J-F. Martin, C. Pilandon, and C. Touraille, “Research of Selective Tests
for Discerning ‘An Efficient Assessor’ in Texture Profiling,” Journal of Sensory Studies
10 (1995): 217–237.
22. B. W. Berry and G. V. Civille, “Development of a Texture Profile Panel for Evaluating
Restructured Beef Steaks Varying in Meat Particle Size,” Journal of Sensory Studies
1 (1986): 15–26.
23. G. V. Civille and A. S. Szczesniak, “Guidelines to Training a Texture Profile Panel,” Journal
of Texture Studies 4 (1973): 204–223.
24. Standard Guide for Two Sensory Descriptive Analysis Approaches for Skin Creams and
Lotions, ASTM E1490–19 (West Conshohocken, PA: ASTM International, approved
November 1, 2019), http://doi.org/10.1520/E1490-19
26. A. M. Muñoz and G. V. Civille, “Universal, Product and Attribute Specific Scaling and the
Development of Common Lexicons in Descriptive Analysis,” Journal of Sensory Studies
13 (1998): 57–75.
27. E. Chambers IV, “Consensus Methods for Descriptive Analysis,” in Descriptive Analysis in
Sensory Evaluation, ed. S. E. Kemp, J. Hort, and T. Hollowood (Hoboken, NJ: John Wiley &
Sons, 2018), 213–236.
31. L. B. Aust, L. P. Oddo, J. E. Wild, and O. H. Mills, “The Descriptive Analysis of Skin Care
Products by a Trained Panel of Judges,” Journal of the Society of Cosmetic Chemists
38 (1987): 443–449.
33. A. V. Cardello, A. Matas, and J. Sweeney, “The Standard Scales of Texture: Rescaling by
Magnitude Estimation,” Journal of Food Science 47 (1982): 1738–1740.
34. R. Romero del Castillo, J. Valero, F. Casañas, and E. Costell, “Training, Validation and
Maintenance of a Panel to Evaluate the Texture of Dry Beans (Phaseolus Vulgaris L.),”
Journal of Sensory Studies 23 (2008): 303–319.
35. Standard Guide for Measuring and Tracking Performance of Assessors on a Descriptive
Sensory Panel, ASTM E3000-18 (West Conshohocken, PA: ASTM International, approved
April 1, 2018), http://doi.org/10.1520/E3000-18
36. J. A. McEwan, E. A. Hunter, L. J. Van Gemert, and P. Lea, “Proficiency Testing for Sensory
Profile Panels: Measuring Panel Performance,” Food Quality and Preference 13 (2002):
181–190.
37. E. A. Hunter and D. D. Muir, “A Comparison of Two Multivariate Methods for the Analysis
of Sensory Profile Data,” Journal of Sensory Studies 10 (1995): 89–104.
38. T. Næs and R. Solheim, “Detection and Interpretation of Variation within and between
Assessors in Sensory Profiling,” Journal of Sensory Studies 6 (1991): 159–177.
39. B. D. Donfrancesco, K. Koppel, and E. Chambers IV, “An Initial Lexicon for Sensory
Properties of Dry Dog Food,” Journal of Sensory Studies 27 (2012): 498–510.
40. G. Ares, A. Giménez, and A. Gámbaro, “Preference Mapping of Texture of Dulce de Leche,”
Journal of Sensory Studies 21 (2006): 553–571.
41. L. M. Dooley, K. Adhikari, and E. Chambers IV, “A General Lexicon for Sensory Analysis of
Texture and Appearance of Lip Products,” Journal of Sensory Studies 24 (2009):
581–600.
45. C. Kuesten, J. Bi, and Y. Feng, “Exploring Taffy Product Consumption Experiences Using a
Multi-Attribute Time–Intensity (MATI) Method,” Food Quality and Preference 30 (2013):
260–273.
46. L. M. Duizer, E. A. Gullett, and C. Findlay, “The Relationship between Sensory Time-
Intensity, Physiological Electromyography and Instrumental Texture Profile Analysis
Measurements of Beef Tenderness,” Meat Science 42 (1996): 215–224.
49. Standard Guide for Sensory Evaluation of Household Hard Surface-Cleaning Products with
Emphasis on Spray Triggers, ASTM E2346 (West Conshohocken, PA: ASTM International,
approved November 1, 2015), http://doi.org/10.1520/E2346_E2346M-15
Introduction
Quantitative descriptive analysis (QDA) is a unique system for obtaining quantitative
sensory perceptual information about products. The system relies on a group of quali-
fied subjects who are trained to verbalize their sensory perceptions for attributes of
products. Individual responses are collected in replicate, yielding a quantitative data
set for further statistical analysis, a key component of the QDA method.
The QDA method can be used across the product life cycle, including, for exam-
ple, product development, ingredient substitution, monitoring competition, sensory
claims, and quality control. The QDA method is based on experimental and behav-
ioral psychology and relies on repeated measures and statistical analysis to determine
significant effects.
One value of a quantitative data set is its ability to identify relationships among a QDA panel's description of the products, the technical language used by experts to describe those products, and the preferences of a larger consumer population for those products.
This information provides a basis for relating specific ingredients and processing vari-
ables to consumer perception.
The QDA method was first published in 1974 by Stone et al.,1 introducing a proce-
dure for quantifying sensory descriptive perceptions. The method also introduced
several concepts new to descriptive testing, including individual data rather than consensus opinion, an unstructured graphic rating scale, replication, and specified statistical treatment of the resulting data. QDA is consumer-oriented in panel selection,
training, and attribute language and is useful for a wide variety of product develop-
ment and marketing activities, including product optimization research. This chapter
describes several product applications and more rapid approaches, including
1 Dragonfly SCI, Inc., 2360 Mendocino Ave., Ste. A2-375, Santa Rosa, CA 95403, USA
2 University of California, Davis, Continuing and Professional Education, Sensory & Consumer Science Certificate Program, Davis, CA, USA
3 Tragon Corp. (deceased)
DOI: 10.1520/MNL1320170005
diagnostic descriptive analysis (DDA), which includes more subjects, selected attri-
butes, and fewer replications; and the master QDA panel approach, which utilizes
subjects who are professionals and have training in their area of expertise (e.g., chefs,
brewmasters, winemakers, cosmetologists). It also describes the procedure and ratio-
nale and experimental and behavioral sciences that have supported the methodology.
Overview of Method
QDA is a consumer behavioral approach that uses trained descriptive panels to mea-
sure a product’s sensory characteristics. Panel members use their senses to perceive
product similarities and differences and articulate those perceptions in their own
words.
The QDA method involves measuring perceptions derived from products as the
source of the stimulation. The method embraces the uniqueness of our measuring
instrument (humans) based on physiological, psychological, and behavioral differ-
ences, along with experiential differences with a specific product or related products.
The responses reflect what is perceived in its entirety—how a product looks, smells,
feels (in the hand and mouth), and tastes; or, for nonfood items, the product experience
before, during, and after usage.
This matter of "perceived in its entirety" is an aspect of the sensory process that is often disregarded or not fully appreciated by descriptive analysis methods. When planning a test and considering the evaluation process, there is a tendency to think in narrow terms of flavor or texture; however, this is not the entire picture of what is
perceived. These sensory interrelationships must be considered regardless of the type
of test that is planned. If not accounted for, useful information may be lost, and the
overall value of the sensory resource may be compromised. The overall goal of the
QDA method is to provide a complete measure of the perceived sensory experience of
the products under study.
The QDA methodology has been applied to a wide variety of consumer goods,
including all types of foods and beverages, personal and household care, office sup-
plies, apparel, furniture, wearable electronics, and appliances. The application of the
QDA methodology is limited only by the creativity of the sensory scientist.
In the QDA method, qualitative references are used on an as-needed basis to clarify the definitions of sensory attributes and to ensure that all subjects have a common experience. References can themselves be complex, and the sensory scientist must take care that they serve these two purposes. The reference material may consist of a complex finished product (sour lemon candy), less complex products (lemon juice), or individual ingredients such as lemon peels, lemon oil, or lemon extract. Each reference is carefully selected, when deemed necessary by the panel moderator, to aid panel discussions, provide a common sensory experience, and promote a common language among the panel members.
provide directions as to how best to proceed with the qualifying process. Each product
category and each test type will have some unique requirements. Therefore, a basic set
of guidelines is far more useful for the sensory professional when developing a pool of
qualified subjects.
Subject guidelines to consider include the following:
• Participation should be voluntary and can be stopped at any time.
• All subjects’ personal information should be kept confidential.
• Any allergic or medical conditions that may impact testing should be determined.
• Subjects should like and be average or above-average users of the product categories
under study.
• Subjects should have demonstrated sensory skill within the product categories
under study.
People have a wide range of sensory skills in terms of their sensitivities as well as
in their abilities to articulate what they perceive. There is currently no way of knowing
in advance who these people are. However, it has been well established that consumers
who use a product frequently tend to be more skilled (than those who do not or use it
infrequently). Also, one’s ability to discriminate among products of the type being
tested is a very good indicator of sensory skill. The combination of these two criteria
serves as a reasonable basis for selecting those individuals best suited for language
development.
These procedures are designed to identify those individuals who are most likely to
perceive differences, articulate the basis for those differences, and provide intensity
judgments with a high degree of reliability. However, even with this screening, those
individuals who meet all criteria will not be equally sensitive and reliable. The screen-
ing process is intended to identify those individuals who are insensitive or unreliable,
or both, and eliminate them from further consideration as subjects, leaving a group of
people who are reasonably homogeneous in their sensory skills.
Potential subjects are prerecruited following a specific script (usually via tele-
phone or online) that begins with appropriate category usage qualification. The inter-
view includes an assessment of availability, verbal fluency, interest, and comfort with
participating in a group activity.
Once a potential set of qualified volunteers has been identified, the sensory staff
should expect to begin with some form of orientation and screening within a few
weeks. Failure to initiate screening within a few weeks will reduce interest and make it
more difficult to recruit others. The orientation includes a brief explanation (about 10
minutes) of sensory science followed by a booth-test demonstration. This first test in a
booth is planned to yield almost unanimous agreement among the subjects and allow
them to gain familiarity with evaluating products, and this should have a direct and
positive impact on motivation.
Orientation and screening should be done with not more than about 10 to 12
individuals at a time. This ensures that each person is given adequate attention and
that all of his or her questions are answered. It also makes for a more manageable effort
for the experimenter. The objective is to make each person feel that his or her responses
are important and a benefit to the program. At the same time, it also is important to
avoid creating the concept of an elitist group.
The actual screening will vary depending on each company’s products; how-
ever, the common point is that the screening tests use products of interest (i.e.,
products that will be evaluated in tests that follow). Sensory acuity screening is best
done using the discrimination model to determine whether subjects can find the
known product differences. While procedures such as ranking and scoring can be
used to supplement discrimination testing, they are of secondary importance in
assessing sensory skill.
The primary purposes of screening are to identify and eliminate those individuals
who cannot follow instructions, are insensitive to differences among products being
evaluated, exhibit little interest in the activity, or lack verbal fluency. Perhaps most
importantly, the screening should identify those individuals who can detect differ-
ences at better than chance among the products of interest. These are the ones identi-
fied as qualified based on sensory acuity. This can easily be achieved in about 5 to 6
hours (over several days and sessions), even starting with individuals with no prior
experience (i.e., naive consumers).
Once qualified in the preinterview, about 30 subjects (in groups of about 10)
report to a central location testing facility and are given a series of up to 20 discrimina-
tion tests over the course of several days. These tests are designed to cover the range of
products in the category (or categories) of interest and cover each modality that will be
evaluated by the panel, including differences before, during, and after usage. The dis-
crimination tests range from easy to moderate to difficult and are based on the product
set(s) of interest. Subjects scoring significantly above chance in the series of screening tests (e.g., ≥70% correct in duo-trio testing) qualify as having satisfactory sensory acuity.
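Whether a screening score beats chance can be checked with a one-sided binomial tail; chance performance in a duo-trio test is 1/2. The 15-of-20 subject below is hypothetical, and the exact pass criterion is at the sensory professional's discretion.

```python
from math import comb

def binomial_p_value(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided P(X >= correct) if the subject were guessing at chance."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical subject: 15 correct out of 20 duo-trio trials (75% correct).
p = binomial_p_value(15, 20)   # ~0.021, better than chance at alpha = 0.05
qualified = p < 0.05
```

For easier tests (e.g., triangle tests), `p_chance` would be 1/3 instead of 1/2.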
To establish an array of discrimination tests, the sensory professional must bench-
screen the category of products to be tested, identify observable differences, and pre-
pare a series of product pairs that include differences before, during, and after usage.
The number of pairs should be up to 20; with replication, this provides up to 30 or 40
trials/judgments, which is sufficient for subjects to demonstrate their abilities and for
the sensory professional to be able to classify individuals on the basis of their sensitiv-
ity and reliability. The effectiveness of this approach is based on the subsequent test
performance of those individuals. The degree of difficulty of the product pairs should
be considered. If more than a few product pairs are easy to differentiate, the screening process will not be effective, as almost everyone will qualify. Only after this procedure
has been used once or twice will the sensory professional know whether pairings rep-
resent the desired easy, moderate, and difficult options. After screening, about 12 sub-
jects will be identified as potential candidates for language development on the basis of
their sensory acuity and their availability to complete the project.
session, to select products or product pairings, or both, and to determine the emphasis
that will be given to specific product characteristics (color, flavor, etc.). An additional
90 to 120 minutes of moderator time is required after each session to record the infor-
mation developed in the session, organize the paperwork for the next session, and so
forth. Panel moderators usually have an assistant to help with these activities, such as
creating and editing the scorecard and definitions of terms and preparing products.
The overall objectives of the language-development sessions are to develop a
scorecard with a list of attributes that reflect the subjects' perceptions in common,
everyday language. The panel creates a set of definitions for those attributes and an
evaluation procedure for a set of products. Because it is the subjects who will evaluate
the product, it is important to keep in mind that these activities reflect what they per-
ceive, not what you, the panel moderator, would like them to perceive. Within this
framework, there are a series of specific tasks that need to be accomplished:
• Develop a language (the attributes) that fully describes the products
• Develop clear definitions/explanations for each attribute
• Order the attributes within each modality (appearance, aroma, etc.)
• Develop and document the evaluation procedures
• Develop a frame of reference for scale use
• Practice scoring products
To begin the language development process, an individual and group orientation
is provided for the subjects. During the orientation, the panel moderator facilitates
introductions and introduces the general concepts of language development, describ-
ing their sensations and perceptions of the product category. This introduction and
orientation should take no longer than 20 minutes.
Immediately following the orientation, the language development process begins
as a group activity. Panel members are provided with an appropriate amount of the
product for evaluation—more than they will actually use. The first product given to
each subject should be the one most “typical” for the category (e.g., a benchmark,
gold-standard control, or one that defines the category), along with a category sheet.
Each panel member is asked to divide his or her perceptions into categories/modalities
such as before usage (visual, aroma, etc.), during usage (application, flavor, mouthfeel,
handfeel, etc.), and after usage. Once all panel members have written down their indi-
vidual perceptions, the panel moderator will call on all subjects to describe what they
have written, tracking each response on the board. This process is repeated for three or four products that best represent the range of products in the research; by that point, typically 90% of the words needed to describe the product category will have been generated.
Products are thoughtfully selected by the panel moderator to ensure the range of
differences within the product array of interest has been provided to the panel. Each
panel session is about 90 minutes in length, and it may require more than one session
to describe three or four products because of the physical nature of the category. The
language development process is iterative (i.e., the words generated in the first few ses-
sions to cover the product category are reviewed, discussed, and defined). The subjects
then practice scoring products using an unstructured graphic rating scale (fig. 1). They
develop a comprehensive list of words to describe the product array and the specific evaluation procedures that are most typical for the category of interest. In addition, the subjects decide upon appropriate anchor words for each scale (such as weak to strong or slightly to very).
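Scoring a mark on an unstructured graphic line scale amounts to measuring its distance from the left anchor and rescaling. In the sketch below, the 150-mm line length and 0-to-60 score range are illustrative conventions, not fixed by the method.

```python
def line_scale_score(mark_mm: float, line_mm: float = 150.0,
                     scale_max: float = 60.0) -> float:
    """Convert a subject's mark (distance from the left anchor, in mm)
    on an unstructured line scale into a numeric score."""
    if not 0.0 <= mark_mm <= line_mm:
        raise ValueError("mark must lie on the line")
    return mark_mm / line_mm * scale_max

# A mark at the midpoint of the line scores at the middle of the range.
midpoint_score = line_scale_score(75.0)   # -> 30.0
```

With computerized data collection the same conversion happens implicitly when the click position on the on-screen scale is recorded.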
The panel moderator creates definitions for each attribute scored based on input
from the panel so that the final definitions represent a true group consensus. The defi-
nitions are always present in the data-collection sessions so that subjects can reference
the meaning of each attribute. This panel method is designed to provide direct con-
sumer feedback to the technical developers and marketing teams on how these prod-
ucts are similar and different based on their sensory properties, absent of brand and
imagery.
During language development, subjects practice scoring and discuss their per-
ceptions on the scale. Scale context is established by exposing subjects to the actual
range represented by the stimuli (i.e., products) in the category or in the project.
Subjects establish their own scale location for products included in the range of, and
practice with, the training stimuli. It is not expected, or necessary, that all subjects
perceive or score the low-intensity product at the same exact scale location. Individual
subject differences due to scale location are accounted for, and extracted with, the
analysis of variance (ANOVA) model used to analyze the data.
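A minimal numeric sketch of this point, with hypothetical ratings: subtracting each subject's own mean, which is the component the subject term of the ANOVA model absorbs, removes scale-location differences while leaving product differences intact.

```python
import numpy as np

# Hypothetical ratings: rows = subjects, columns = products.
# Subjects sit at different locations on the scale, but all
# separate the three products in the same way.
ratings = np.array([
    [2.0, 4.0, 6.0],   # subject using the low end of the scale
    [5.0, 7.0, 9.0],   # subject using the high end of the scale
    [3.5, 5.5, 7.5],
])

# Centering each subject's scores removes individual scale location;
# the product differences survive unchanged.
centered = ratings - ratings.mean(axis=1, keepdims=True)
product_effects = centered.mean(axis=0)   # -> [-2.0, 0.0, 2.0]
```

In a full analysis the same adjustment falls out of fitting subject and product as effects in the ANOVA model rather than by explicit pre-centering.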
In training sessions in which subjects' scores are distributed over an unusually large scale range, those scores are shown (e.g., on a whiteboard, flipchart, or computer screen)
and the panel discusses whether that range is consistent with their perception. The
subjects decide for themselves whether their score represents what they perceive. As
indicated in the previous paragraph, differences in scale location are accounted for. All that is required is that subjects remain reasonably consistent (i.e., normal expected variation) in applying their internalized frame of reference and scale; through repeated product exposure and practice with the scale, subjects will generally demonstrate increased consistency in their ratings. The initial reaction from most individuals
is to look for “the correct location” on the scale or to see whether the person adjacent to
them is making a mark in the same place.
In the third and subsequent sessions, qualitative reference materials may be used
to help subjects where necessary. Generally, references are presented without identification, only with a letter or number, and subjects are asked to write a word or two to
describe what they perceive. The panel is then told that these are presented to provide
a common experience and asked whether these references match any of the attributes
on their scorecard. Any material can be used; however, the raw materials used to for-
mulate the products may be most helpful in this regard. Subjects may also want to
bring materials from home if they believe it will help to explain a particular sensation.
This is perfectly acceptable, but here too, the panel moderator should exercise control
so that it does not become a mission (i.e., each subject feels he or she must bring some-
thing). Sometimes references encourage the subjects to talk with each other, which is
essential if the attributes are to be understood in a similar manner. References may be
useful in clarifying attribute meaning and in deciding which attributes are most useful
to the panel in describing the product array. Where references are helpful, they should
be documented so they are available for future applications. References also can be
helpful when training new subjects; however, it is not necessary, required, or recom-
mended to have a reference for every attribute. When references are used, they should
be identified and sampled in relation to the specific attribute(s) being discussed.
Some comments are warranted about the QDA language that is used to character-
ize the products. Because it is the basis for a descriptive test, questions are often raised
as to how subjects develop attributes, how one knows if the attributes are correct, and
how one knows when there are enough attributes.
First, the reader should keep in mind that the words used by the subjects are labels
to represent sensations. For example, if a panel uses the words “salty flavor” it does not
necessarily mean that there is a direct relationship with the amount of salt (or of
sodium ions) in a product’s formula. Likewise, attributes such as fruit flavor or artifi-
cial are most likely composites of several ingredients rather than a specific ingredient.
They represent a panel consensus as to what word (or group of words) to use to best
represent a particular sensation in the context of that set of products—hence, the
observation that the attributes should be viewed as labels and not as having a direct
relationship with a specific ingredient or having some greater meaning beyond the
product category being evaluated. Information about the effects of ingredients can be
developed through use of appropriate design studies. It is most important, at least in
the short term, for the panel to develop a list of attributes that enables them to describe
product similarities and differences in a reliable and valid manner.
Theoretically, there is no limit as to the number of attributes that a panel can
develop to fully describe a set of products. However, there are practical limits that
reflect the ability of humans to process information. In addition, time spent in
discussion helps to minimize the duplication of attributes. Empirically, it has been
observed that most product scorecards have about 40 attributes (heterogeneous prod-
ucts could have more attributes), and these attributes are grouped by modality, e.g.,
appearance, aroma, etc. While the total number of attributes can be quite large, the
number associated with each modality will be much smaller. For example, there could
be 10 attributes to encompass product appearance, another 7 or 8 for aroma, 10 to 12
more for flavor, and so on. Because each modality is evaluated sequentially, the total
number of attributes is much less important. As previously stated, there is nothing
special about the number of attributes or the specific words the subjects use, within
certain constraints. It is better to have too many, rather than too few, attributes. Too
few attributes result in less sensitivity and less product differentiation. Too many
attributes add some panel time, but they can result in improved product differentiation.
Because all subjects will not be equally sensitive to all attributes, having a few
additional attributes increases the likelihood that product differentiation will occur. It
reduces frustration for some of the subjects to include attributes that some of them
detect better than others. During training the subjects have numerous opportunities
to evaluate products and to use the attributes. It is the panel moderator’s responsibility
to help the subjects address the different “perceptions” and where possible to resolve
any disagreements that arise over the presence or absence of a particular attribute.
However, it is risky to decide, a priori, what attributes should or should not be retained
without first obtaining sufficient responses from each subject in a pilot test. Training is
a group effort, and it takes time for the subjects to develop a working relationship with
one another and to be able to describe their perceptions. It is relatively easy for an
inexperienced panel moderator to be misled into concluding too early in the training
process that the panel is or is not sensitive to a particular attribute or to be unduly
influenced by one or two subjects. Entirely different results may be obtained when the
subjects score the products in the booths. A critical element in this training process is
to be sure that the subjects have sufficient time to practice evaluating products and
discuss their perceptions as a panel. The discussions help to resolve the issue of whether
there is too much overlap or too many words to represent a particular sensation.
Therefore, it is best to leave in some attributes for which the subjects are having difficulty
reaching a consensus; only after a test will it be possible to assess how well those
attributes are being used. However, the panel leader must keep in mind that there are
limits as to how much information humans can process. If there are too many choices,
sensitivity is reduced, and variability increases substantially. Clearly, if there are 60 or
70 attributes for a homogeneous product, that is probably too many.
It has been suggested that one can reduce the number of attributes through the
use of factor analysis of responses obtained during training. Such action is not recom-
mended for numerous reasons, including the preliminary nature of the language, the
limited database, the psychological impact of the loss of attributes proposed by certain
subjects, and so forth.
The subjects are told they can use any words they want to describe their percep-
tions provided they do not use words that relate to product liking, connote quality, or
are technical in nature. For example, phrases such as “a product I would like,” “good for
me,” and “is not my brand” are actively discouraged by the panel leader. Technical
jargon is especially problematic; the same words will have very different meanings for different subjects.
In every way, the scorecard is a product of the subjects. Care should be taken to
avoid forcing them to use attributes with which they are uncomfortable; the resultant
data will be biased or have considerably more variability. Also, if an attribute is to be
dropped (not used), the reason should be clear, and the subjects should discuss their
decision. Deleting an attribute that the panel considers important risks losing an
important attribute, undermines their self-confidence, and implies there are correct
answers to the test.
Although the panel leader should avoid providing attributes that the panel may
feel obligated to use (especially during the first two or three sessions), anything to help
the panel’s understanding should be used (e.g., if a panel member says something has
a musty/earthy odor, it is fine to ask whether it is somewhat like the odor of a dirt cellar,
raw mushrooms, or soil on potato skins to help describe it). As a general rule, subjects
should be told that if an attribute proposed by the panel leader is not helpful then they
can drop it by a simple vote. This procedure serves as a reminder that they (the sub-
jects) are in charge of the process.
If everything tried (discussion, reference, etc.) fails to bring about a consensus
agreement for an attribute, it may be better to allow the subjects to remove it from the
scorecard. If subjects feel frustrated, it will affect their responses to other attributes.
The data obtained from that attribute will most likely be uninformative. On the other
hand, even if only two or three subjects are responding consistently to an attribute that
others cannot perceive, it is better to retain that attribute. As always, the decision is the
subjects’ responsibility, although the leader may have to make the final decision if there
is disagreement.
There is no such thing as a problem subject—only a subject who has a problem. If
it cannot be solved, then you will likely have a frustrated/unhappy subject. The panel
leader develops the panel by keeping an open atmosphere, enhancing subject self-es-
teem by active listening, providing feedback as needed, and helping subjects to describe
a sensation when words will not come. Other subjects may follow the leader’s example
and aid each other in clarifying descriptions. Sometimes one encounters an individual
who has functioned effectively during screening but is so uncomfortable in the group
sessions that they ask to be excused. This is a reasonable request and should be granted.
Alternatively, there are individuals whose behavior is so disruptive that they should be
excused rather than allowed to destroy the panel’s effectiveness. These are not easy
decisions but reflect actual situations.
The initial goal is for subjects to describe products in their own words. When a
subject has difficulty with a description, the panel leader may make suggestions; how-
ever, subjects should first be encouraged to help one another. The panel leader must be
objective (i.e., without judgment of right/wrong, good/bad). Because of the high degree
of group dynamics involved, subjects are more sensitive to the responses of others than
they might otherwise be.
While the inexperienced panel leader will worry that subjects will not develop
many words, it is often the converse that occurs (i.e., the subjects will develop many
words). The task for the panel leader in this case is to focus the panel on consolidation.
A second issue is the ability to control the group dynamics without appearing to con-
trol it. At some point, usually by the third session, the subjects develop sufficient
self-confidence, and there is a tendency for them to have simultaneous conversations.
While such discussions may appear to be useful, they are not, and the panel leader will
need to control this activity.
Once a scorecard (fig. 2) and attribute definitions (fig. 3) have been developed and
used in a test, the panel will be able to use them more effectively as more tests are initiated.
Prior to each test, the panel meets for a session, usually 90 minutes in duration, to
review the scorecard and make any changes based on the products in the new test.
Here the panel leader provides the scorecard, list of definitions, and a product. After
the second product, the panel leader will focus on selected groupings of attributes
reflecting the current products and results from the previous test. The panel leader will
need to review past results and identify specific attributes that warrant attention (e.g.,
product differentiation was lost due to interaction).
Data Collection
Replication and analyses (ANOVA) take into account differences in scale use; this
determines whether the mean is a realistic measure of the distribution of the scores.
Products are evaluated in individually controlled evaluation areas (e.g., booths) or in a
typical usage situation (at home, work, or play).
Strengths of perceptions are converted to numerical values through the use of a
line or graphic rating scale. The QDA scale developed by Stone et al. in 1974 is a 6-inch
(∼15 cm) unstructured line scale with two anchors, each 1/2 inch (1.25 cm) from the
end of the scale. These anchors are typically labeled low/weak on the left and high/
strong on the right. The strength of a perception is recorded by placing a vertical mark
at the point on the scale that best represents that individual’s perception of that characteristic. The
effectiveness of the scale is based on the subject experiencing the range of products (the
stimuli) being tested.
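The arithmetic of converting a mark to a score is straightforward. The sketch below is an illustrative conversion, not an official QDA specification: it assumes scores are expressed on a 0–60 scale (the 6-in. line measured in tenths of an inch), which is consistent with the product means reported later in this chapter.

```python
def mark_to_score(mark_inches: float, line_length: float = 6.0) -> float:
    """Convert the position of a vertical mark on an unstructured line
    scale (measured in inches from the left end) to a 0-60 score,
    i.e., the distance along the 6-inch line in tenths of an inch."""
    if not 0.0 <= mark_inches <= line_length:
        raise ValueError("mark must lie on the line")
    return round(mark_inches * 10.0, 1)

# A mark 3.1 inches from the left end of the line scores 31.0.
print(mark_to_score(3.1))
```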
The pilot test data are analyzed to determine individual panel performance and
attribute agreement, not product differences. The data should be analyzed thoroughly with
one- and two-way ANOVA for each sensory attribute to determine whether the panel,
as a whole, scored products differently from one another. Multiple-comparison proce-
dures (such as Duncan’s new multiple range test) are calculated after the ANOVA to
identify statistically significant differences among products for each sensory attribute.
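These analyses can be reproduced with standard statistical software. The sketch below, with fabricated product names and scores, uses SciPy’s one-way ANOVA; because Duncan’s new multiple range test is not available in SciPy, Bonferroni-corrected pairwise t-tests are shown as a stand-in for the post hoc step.

```python
from itertools import combinations
from scipy import stats

# Attribute scores (0-60 line-scale values) for three products,
# one value per subject; the data are fabricated for illustration.
scores = {
    "Product A": [51, 48, 55, 50, 53, 49],
    "Product B": [35, 38, 33, 36, 40, 34],
    "Product C": [36, 34, 39, 37, 33, 38],
}

# One-way ANOVA: did the panel score the products differently?
f_stat, p_value = stats.f_oneway(*scores.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Post hoc stand-in: pairwise t-tests with a Bonferroni correction.
pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted threshold
for a, b in pairs:
    t, p = stats.ttest_ind(scores[a], scores[b])
    verdict = "different" if p < alpha else "not separable"
    print(f"{a} vs {b}: p = {p:.4f} -> {verdict}")
```

Dedicated sensory or statistical packages should be used when Duncan’s procedure specifically is required.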
[Excerpt from a scorecard with attribute definitions (cf. fig. 3); truncated in the source:]

APPEARANCE
Judge Instructions: Look at the cup of product, pick up, examine closely, then evaluate:
BROWN COLOR (light-dark): Overall impression of brown color of the product, ranging from a light brown to a darker brown to a deep or very dark brown color.
THICKNESS (thin-thick): Overall impression of the thickness of the product, ranging from thin and watery to thick and viscous.
FOAMY (slightly-very): Overall impression of the amount of foam on and in the product.

AROMA
Judge Instructions: Pick up the cup of product, hold close to nose, smell, and then evaluate:
CHOCOLATE (weak-strong): Intensity of distinct chocolate aroma of any type; like chocolate chips or candy bars.
DAIRY/MILKY (weak-strong): Intensity of milk or dairy aroma; reminiscent of the lactic tang with fresh dairy milk.
MALT (low-high): Intensity of malt or malted aroma.

FLAVOR
Calculations for panel performance measures include six key measures: standard devi-
ation, crossover, scale range usage, discrimination, scale position usage, and subject
attribute decision influence. This panel performance analysis is conducted with the
pilot test data, and with each subsequent QDA study to monitor subject behavior.
More on this topic will be covered in the next section, “Panel Performance Measures.”
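The exact formulas behind these measures are part of the QDA methodology; simplified, illustrative approximations of three of them (scale range usage, replicate standard deviation, and discrimination) can be sketched as follows, using fabricated subject names and ratings.

```python
import numpy as np
from scipy import stats

# scores[subject][product] -> list of replicate ratings (0-60 scale).
# Fabricated data: two subjects rating three products twice each.
scores = {
    "S1": {"P1": [50, 52], "P2": [20, 24], "P3": [35, 33]},
    "S2": {"P1": [30, 31], "P2": [28, 33], "P3": [29, 30]},
}

for subject, by_product in scores.items():
    all_ratings = [r for reps in by_product.values() for r in reps]
    # Scale range usage: the span of the scale this subject actually used.
    scale_range = max(all_ratings) - min(all_ratings)
    # Replicate standard deviation, averaged across products.
    rep_sd = np.mean([np.std(reps, ddof=1) for reps in by_product.values()])
    # Discrimination: one-way ANOVA across products for this subject alone.
    f_stat, p = stats.f_oneway(*by_product.values())
    print(f"{subject}: range={scale_range}, rep SD={rep_sd:.2f}, "
          f"discrimination p={p:.3f}")
```

In this fabricated example, S1 separates the products consistently while S2 uses only a narrow slice of the scale and does not discriminate.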
The pilot test data are only for the sensory scientist to make decisions on areas for
remedial panel training if necessary. Based on results of the pilot test, the panel is
reconvened to discuss attributes and products in which the panel leader seeks clarifi-
cation of the definitions or evaluation methods or both. After initial training and pilot
testing, remedial training sessions may be scheduled if necessary, or the panel may
begin their data-collection sessions.
Reporting of Results
QDA analyses are designed to provide an in-depth understanding of these sensory
similarities and differences among products and include the following:
• One-way ANOVA for each sensory attribute to measure subject consistency and to
identify product differences
• Two-way ANOVA for each sensory attribute to determine whether the panel, as a
whole, scored the products as different from one another
Practical Considerations
The product developer often needs to make changes quickly to accommodate project
objectives. In general, changes can be readily incorporated into the QDA process.
Several practical considerations and helpful suggestions are discussed below. The
panel moderator’s responsibility is to manage the language, help subjects clarify any
confusing words, and reduce the redundancy of words used in describing the product.
Most projects evolve, and descriptive analysis may be useful at various stages of devel-
opment. To best accommodate the research objectives, amendments to the original
test design or additional testing may be required.
Rapid Methods
If there are resource constraints or if there is less risk in decision making, consider
options for more rapid language development and data collection. Within the QDA
methodology, this is referred to as diagnostic descriptive analysis, or DDA. There
are a couple of approaches that are generally used. The first is to conduct a full QDA
approach and then select a subset of attributes based on various data-reduction
techniques such as PCA or with a more in-depth understanding of the important
sensory attributes that have the greatest impact on consumer behavior and pur-
chase interest. The research team could select key attributes on the basis of their
product knowledge and on previous results. Keep in mind that the subjects should
still have direct input to ensure that the reduced language is well defined and reflects
differences that they have observed.
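The PCA-based reduction can be illustrated with a short script. The sketch below runs a PCA (via SVD) on a fabricated product-by-attribute matrix of panel means and flags, for each component that explains a meaningful share of variance, the attribute loading most heavily on it; the attribute names, data, and the 10% retention rule are illustrative, not prescribed by the method.

```python
import numpy as np

# Product-by-attribute matrix of panel mean scores (fabricated:
# 6 products x 4 attributes on the 0-60 line scale).
attributes = ["brown color", "thickness", "foamy", "chocolate flavor"]
X = np.array([
    [51, 17, 60, 45],
    [49, 16, 49, 42],
    [42, 11, 48, 41],
    [40, 10, 42, 40],
    [35,  9, 35, 36],
    [28,  7, 33, 34],
], dtype=float)

# PCA via SVD of the column-centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)   # fraction of variance per component
loadings = Vt                     # rows = components, cols = attributes

# One simple reduction rule: keep the attribute that loads most heavily
# on each component explaining more than 10% of the variance.
keep = {attributes[np.argmax(np.abs(loadings[i]))]
        for i, frac in enumerate(explained) if frac > 0.10}
print(f"variance explained: {np.round(explained, 2)}")
print(f"candidate attributes to retain: {sorted(keep)}")
```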
Another common use of DDA involves screening previously trained subjects
qualified as likers and users (or potential likers and users) for a new product category.
One could screen potential subjects with as few as three to six discrimination tests to
qualify them, and then conduct only two to three language sessions to develop a short-
ened set of around 12 to 15 attributes. It is advised to use the full panel of 12 to 15
subjects; two replications of the product set may suffice.
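The chance that a non-discriminating candidate passes such a screen purely by guessing is easy to compute. The sketch below assumes triangle tests (guessing probability 1/3) and an illustrative pass rule of 5 correct out of 6; both assumptions are for illustration only.

```python
from scipy.stats import binom

# Probability that a candidate with no real discrimination ability
# passes the screen by guessing alone. Triangle tests are assumed
# (guessing probability 1/3); the 5-of-6 pass rule is illustrative.
n_tests, required, p_guess = 6, 5, 1 / 3

p_pass_by_chance = binom.sf(required - 1, n_tests, p_guess)
print(f"P(pass >= {required} of {n_tests} by chance) = {p_pass_by_chance:.4f}")
```

Tightening or loosening the pass rule trades off the risk of qualifying guessers against the risk of rejecting genuine discriminators.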
p value: 0.05
Interaction as error value: 0.10
Post hoc method: Duncan’s
(Within each column, mean scores followed by a common letter are not significantly different.)

Brown Color              Thickness                Foamy
Product 2   51.08 A      Product 14  16.92 A      Product 8   60.29 A
Product 9   49.75 AB     Product 7   16.79 A      Product 16  49.42 B
Product 13  42.29 ABC    Product 15  16.12 AB     Product 13  48.00 B
Product 12  40.54 BC     Product 8   11.50 ABC    Product 11  42.00 BC
Product 7   35.08 CD     Product 13  11.13 ABC    Product 15  41.75 BC
Product 14  33.92 CD     Product 16  10.67 ABC    Product 3   35.75 CD
Product 15  31.96 CDE    Product 2   9.96  ABCD   Product 2   34.79 CD
Product 16  28.58 DEF    Product 4   8.79  BCD    Product 4   34.46 CD
Product 8   27.79 DEF    Product 5   7.38  CD     Product 10  32.67 CDE
Product 4   24.37 DEFG   Product 12  6.67  CD     Product 12  32.33 CDE
Product 3   22.50 EFG    Product 11  6.21  CD     Product 9   30.63 DE
Product 5   20.17 FG     Product 10  6.00  CD     Product 5   27.50 DEF

[Column headings for the following three attributes are missing from the source.]
Product 11  19.50 DEFG   Product 9   22.25 CDE    Product 2   5.83  BCD
Product 9   18.96 DEFG   Product 16  22.25 CDE    Product 5   4.75  CD
Product 4   18.67 DEFG   Product 15  19.67 CDEF   Product 7   4.67  CD
Product 8   17.21 EFGH   Product 1   17.71 DEFG   Product 10  4.17  CD
Product 10  16.92 EFGH   Product 6   13.42 EFGH   Product 6   3.96  CD
Product 6   11.79 FGHI   Product 14  11.50 FGH    Product 4   3.92  CD
Product 1   8.17  GHI    Product 12  10.71 FGH    Product 8   3.62  D
Product 7   5.87  HI     Product 7   8.83  GH     Product 3   3.50  D
Product 5   3.50  I      Product 2   6.79  H      Product 11  3.42  D
Product 2   3.29  I      Product 5   6.04  H      Product 1   2.92  D

Chocolate Flavor         Dairy/Milky Flavor       Powdered Milk Flavor
Product 14  44.71 A      Product 8   27.25 A      Product 15  18.08 A
Product 6   42.50 AB     Product 3   26.25 AB     Product 12  16.04 AB
Product 3   41.75 AB     Product 11  25.38 AB     Product 14  15.00 AB
Product 13  40.75 AB     Product 2   24.46 ABC    Product 13  14.42 AB
Product 5   40.54 AB     Product 10  24.08 ABC    Product 8   12.12 ABC
Product 11  40.25 AB     Product 6   22.58 ABCD   Product 9   12.08 ABC
Product 9   39.46 AB     Product 13  22.21 ABCD   Product 16  11.96 ABC
Product 16  39.42 AB     Product 16  21.58 ABCDE  Product 2   10.46 ABC
Product 10  39.00 AB     Product 5   21.33 ABCDE  Product 7   9.17  BC
Product 2   38.25 AB     Product 9   20.21 BCDE   Product 10  8.96  BC
Product 15  36.46 AB     Product 15  19.71 BCDE   Product 3   8.75  BC
Product 7   34.96 B      Product 12  18.37 CDE    Product 11  8.46  BC
Product 12  34.17 B      Product 4   18.21 CDE    Product 6   7.71  BC
Product 8   33.75 B      Product 14  16.42 DE     Product 5   5.58  C
Product 4   22.04 C      Product 1   16.17 DE     Product 1   5.21  C
Product 1   12.08 D      Product 7   15.21 E      Product 4   5.04  C
These types of DDA programs are more common with nonflagship products such
as the lower-risk second- and third-tier products.
Even with these shortened programs, repetitions are included, as well as timed rest
intervals between evaluations, and data are collected in appropriate test environments
following best sensory practices.
Adaptations
The QDA methodology is readily adapted to any consumer product, as the overar-
ching objective is to describe and measure the product’s sensory experience. Each
product comes with its own set of challenges, and the sensory professional must
decide on testing protocols and methodology on the basis of the research objective.
The method has been adapted for a wide variety of products around the world,
including, but not limited to, food and confections, beverages (including wine and
spirits), tobacco, apparel, footwear, sporting goods, office supplies, writing instru-
ments, personal care, household care, art supplies, furniture, automotives, and pet
foods, among others.
To adapt the QDA methodology, the sensory scientist must understand how
product preparation and testing protocols can be designed to best reflect the end use
of that product. Research design choices may impact the selection of the test location
and tasks associated with language development and data collection. For some cate-
gories, it may be more appropriate to conduct an in-home evaluation or combine it
with a laboratory procedure and an away-from-the-lab or in-home procedure.
Generally, the overall objective is to work with the constraints of the product category
and develop a consumer-based language that best reflects the product’s sensory expe-
rience. Many consumer-usage situations, especially for nonfood products, cannot
fully be replicated in a laboratory environment. The sensory professional must deter-
mine how best to get as close as possible to typical behavior associated with consumer
usage. In this way, the QDA results can be readily correlated with consumer affective
behavior to determine how the products’ sensory experience affects consumer
perception.
For example, with running shoes, subjects could hold and bend the shoes, put
them on in the laboratory or a central location, and then walk around a room or build-
ing and describe their initial product sensory impressions. However, the running
experience cannot be fully understood without a variety of surfaces, distances, and
running times, along with considerations for indoor/outdoor and various weather
conditions. The sensory scientist must make key decisions during the recruitment of
subjects/runners, the distance, frequency, duration, and training level necessary to
meet the project objectives. These types of sensory procedural decisions must be dis-
cussed before initiating the panel to ensure that they meet the business objectives and
that the results reflect the intended usability of the data. This is just one example of how
the QDA methodology may be adapted for any product of interest.
With widespread usage of direct-data entry and web-based systems, both labora-
tory and out-of-laboratory techniques will increase in usage and application. QDA is a
flexible methodology and is easily adapted to a wide variety of products for which the
laboratory may not be the best location for data collection.
Summary
As stated previously, the QDA methodology has been applied to a wide variety of con-
sumer goods, and the applications are limited only by the creativity of the sensory
science, development, and brand-building teams. The real power of the method comes
in the researcher’s approach to language development. It must start with a curious
mind and a panel of subjects with known sensory acuity who are likers and users of the
product. A key element of the moderator’s role is to understand what to measure and
how to measure it—not to teach the subjects anything but to learn from them. How do
individuals on the panel approach the category to begin with? How do they talk about
their observations, and how do they evaluate and make decisions?
For the sensory scientist, the QDA methodology is a dynamic and flexible system
but requires that the panel moderator make numerous decisions before, during, and
after the language development process. Key decisions must be made when organizing,
recruiting, and screening the panel; when observing behavior throughout the language
development process; and when reviewing results and making decisions on the findings.
All of these aspects are important and can affect the usefulness of the data. The
sensory scientist must decide on the appropriate experimental design; when and where
products will be evaluated; and, last but not least, how data will be analyzed and key
findings and conclusions reported. This is not unique to the QDA method but is an important
aspect of reviewing data and drawing appropriate conclusions from results. Without
sufficient knowledge about human behavior and the research/project objectives, incor-
rect product decisions could be reached to the detriment of our sensory science, to the
business, and to the product category.
From a business perspective, small data derived from QDA studies can provide
robust guidance to the development and marketing teams.7 QDA has been an essential
part of brand building and brand protection for well over 40 years, and it is continuing
to find new applications across a host of new categories. Consumer research is in a hard
data-driven era, and companies are looking for additional insight to understand prod-
ucts and how consumers view and interact with them to discover improvement and
new product opportunities. For example, the results can be analyzed with simple and
complex consumer data to understand relationships between sensory attributes and
consumer ratings with single correlations (fig. 6) or with multisegment data (table 2
and fig. 7).
FIG. 6 XY graph relating consumer liking to QDA data with a single correlation.
TABLE 2 Correlation matrix with QDA attributes and CLT data with subgroups (e.g.,
brand user, preference segments)
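A single correlation of the kind plotted in fig. 6 can be computed directly from panel attribute means and consumer liking means for the same products; all values in the sketch below are fabricated for illustration.

```python
from scipy.stats import pearsonr

# Panel mean attribute intensities (0-60 line scale) and consumer mean
# liking (9-point hedonic) for the same products; fabricated values.
chocolate_flavor = [44.7, 42.5, 41.8, 40.8, 34.2, 22.0, 12.1]
consumer_liking  = [7.1, 6.9, 6.8, 6.5, 5.9, 5.0, 4.2]

r, p = pearsonr(chocolate_flavor, consumer_liking)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A strong correlation across products suggests the attribute is a driver of liking, though causal claims require designed studies, as noted earlier in the chapter.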
References
1. H. Stone, J. Sidel, S. Oliver, A. Woolsey, and R. C. Singleton, “Sensory Evaluation by
Quantitative Descriptive Analysis,” Food Technology 28, no. 11 (1974): 23–34.
2. H. Stone, R. N. Bleibaum, and H. A. Thomas, Sensory Evaluation Practices, 4th ed. (San
Diego, CA: Academic Press, 2012).
3. J. L. Sidel, R. N. Bleibaum, and K. W. C. Tao, “Quantitative Descriptive Analysis (QDA),” in
Descriptive Analysis in Sensory Evaluation, ed. S. E. Kemp, J. Hort, and T. Hollowood
(London: Wiley-Blackwell, 2018), 287–318.
4. R. N. Bleibaum, M. J. Kern, and H. Thomas, “Contextual Product Testing for Small to
Medium Sized Enterprises (SMEs),” in Context: The Effects of Environment on Product
Design and Evaluation, ed. H. Meiselman (Duxford, UK: Woodhead Publishing, 2019),
501–520.
5. Standard Guide for Sensory Claim Substantiation, ASTM E1958-16a (West Conshohocken,
PA: ASTM International, approved October 1, 2016), http://doi.org/10.1520/E1958-16A
6. R. M. Corbin, R. N. Bleibaum, T. Jirgal, D. Mallen, and C. A. Van Dongen, Practical Guide to
Comparative Advertising: Dare to Compare (San Diego, CA: Academic Press, 2018).
7. M. Lindstrom, Small Data: The Tiny Clues That Uncover Huge Trends (New York: St. Martin’s
Press, 2016).
Introduction
The Spectrum™ Descriptive Analysis Method was developed by Gail Vance Civille
in the 1970s. The Spectrum Method expands on the rigorous training and structure of
the Flavor and Texture Profile Methods with the use of a more discriminating scale
(151 points) and statistical techniques to improve data interpretation. The Spectrum
Method has been used for a variety of consumer products beyond the world of food
and beverages, including personal care,1,2 fabrics, and paper.3
Objective Setting
Spectrum panel results allow for documentation of a product’s sensory characteristics
to guide product ideation, product maintenance, product development, quality con-
trol/assurance, and advertising claims. The rationale for the Spectrum Method is to
have technical terms and a universal scale that provide product guidance to product
developers at the bench and quality control engineers at the plant. The data are meant
to be used like other technical data to make decisions and gather critical product infor-
mation that is objective and consistent.
1 Sensory Spectrum Inc., 554 Central Ave., New Providence, NJ 07974, USA;
G. C. https://orcid.org/0000-0002-5007-7546, K. O. https://orcid.org/0000-0003-2298-535X
DOI: 10.1520/MNL1320150029
With the universal scale, all attributes in all samples are rated using the same
scale, and samples may be compared over time, across panels, and across sessions.
Products can also be compared across categories because all attributes are rated rela-
tive to the universal scale and not to each other or within a particular category.
A key challenge in sensory science is overcoming the disconnect between how
people express what they perceive (usually consumers) compared with what is actually
perceived (panels). By training panelists to recognize sensory attributes and rate them
on a universal intensity scale, the Spectrum Method helps to bridge this gap. The cali-
brated data obtained from Spectrum panels can be used with the same level of confi-
dence that a researcher or product developer would expect from instrumental data.
Overview of Method
The Spectrum Method uses highly trained panelists to descriptively profile the sensory
attributes of consumer products. Depending on the product category, these profiles
may include appearance, aroma/fragrance, flavor (including aromatics, basic tastes,
and chemical feeling factors), texture, fabric/paper handfeel, skinfeel, and product
sound. The panelists are calibrated to the sensory attributes in each of these modalities
using qualitative and quantitative references and a universal intensity scale.
For ease of reading, many topics discussed in this chapter will use food flavor and
texture as examples. The beauty of the Spectrum Method, however, is that it can be uni-
versally used to evaluate the sensory properties of virtually any consumer product,
including fragrance, personal care, hair care, paper, fabric, and home care products. For
example, the principles that apply to flavor evaluation also apply to fragrance evaluation.
QUALITATIVE REFERENCES
Qualitative references are used to clarify the definition of each sensory attribute and to
ensure that all panelists have learned and internalized these definitions. A qualitative
reference has four features: a name that is specific to the attribute (e.g., distilled lemon), a
physical reference that demonstrates the attribute (e.g., lemon oil), examples of products
with the attributes (e.g., lemon-lime beverage, lemon candy), and a definition to explain
the attribute (e.g., the aromatics associated with distilled lemon oil).4 References can
include chemical odorants/flavorants (e.g., benzaldehyde), simple ingredients (e.g., white
vinegar), and controlled processes (e.g., coffee at different brew times or dilutions). Refer-
ences are also helpful for teasing apart nuanced differences in sensory character. For
example, the use of references can help separate the smoky flavor of bacon into ash, phe-
nolic, and wood smoke (these attributes would be grouped under “Smoke Complex”).
The more experienced the panelists, the more they can detect and describe these nuances.
Qualitative references can also be demonstrated by examples. An example predominates
in one attribute but is not a singular instance of that attribute (references,
on the other hand, are singular). Sweetened condensed milk would be an example of
the attribute “caramelized,” while caramelized sugar would be a reference. Examples
can help panelists see attributes in a broader context and more complex environment.
QUANTITATIVE REFERENCES
A quantitative reference demonstrates an attribute at a specific intensity (e.g., Sour 2,
Sour 5, Sour 10, Sour 15). For a given attribute scale, references should be available for
several points on the scale. Well-chosen references help to reduce panel variability,
ensuring that data can be compared across time and products. Table 1 shows the Spec-
trum universal scale for aromatics, with commonly used quantitative references. Spec-
trum intensity references for basic tastes, texture, skinfeel, and handfeel attributes
have been published elsewhere.5 These references have been generated collectively over
several replicates and several panels.
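A hedged sketch of how quantitative references might be organized in software: the Sour 2/5/10/15 anchor points come from the example above, but the reference names and the helper function are illustrative, not the published Spectrum reference set.

```python
# Quantitative references anchor specific points on the universal
# intensity scale. The reference names below are placeholders; the
# intensity points follow the Sour 2/5/10/15 example in the text.
references = {
    "sour": {2.0: "reference A", 5.0: "reference B",
             10.0: "reference C", 15.0: "reference D"},
}

def bracketing_references(attribute: str, rating: float):
    """Return the reference intensities just below and at/above a rating,
    so a rating can be checked against known anchor points."""
    points = sorted(references[attribute])
    below = max((p for p in points if p <= rating), default=None)
    above = min((p for p in points if p >= rating), default=None)
    return below, above

print(bracketing_references("sour", 7.0))  # -> (5.0, 10.0)
```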
Lexicon Development
The Spectrum lexicon development process has been covered extensively in other pub-
lications.5–7 Fig. 1 outlines the five primary steps in lexicon development (by a trained
panel). A shortened lexicon development procedure is demonstrated for the flavor and
texture of macaroni and cheese in the sections that follow.
1. Sample selection: Products are gathered that cover the full frame of reference
available when developing a new lexicon. Ultimately, a set of 6 to 12 representative
samples is chosen. Four samples were chosen for the macaroni and cheese demonstration.
2. Term generation: The panel convenes to evaluate the frame of reference samples and
generate a list of terms. The panelists taste the samples and describe them in their
own words. Together, the panel discusses the samples and puts together an exhaus-
tive list of sensory attributes (possibly grouped into categories or complexes).
3. Use of references: A reference is found for each of the attributes listed in Step 2. Pan-
elists review the references and come to agreement on the appropriate definition for
each attribute. Macaroni and cheese references are shown in table 2.
TABLE 2 References used in the lexicon development for macaroni and cheese
4. Examples: Examples that encompass various sensory attributes are shown to the
panel. An evaluation of these examples allows the panelists to see the attributes in
context to help them further refine the lexicon. Macaroni and cheese examples are
shown in table 3.
5. Lexicon refinement: The final step in lexicon development involves refining the
attribute list by eliminating redundant and integrated, consumer-type terms (e.g.,
“creamy”). The result is a working lexicon that is ready for validation. It is important to
remember that a lexicon can always be updated, changed, or modified. As new samples are introduced, new attributes may be added. The “finished” macaroni and cheese flavor
lexicon is listed in table 4. After a new lexicon is developed, and before it is used for
product evaluations, it must be validated. Typically, two samples are evaluated using the
lexicon to demonstrate its ability to differentiate and profile products in the category.
New texture attributes occasionally arise during evaluation. New scales (with appro-
priate references) can be created to capture these new attributes.
In addition to attribute definitions, the techniques used to evaluate texture attri-
butes must be clearly defined. Texture evaluation techniques include identifying the
stage at which a particular attribute is experienced (i.e., first bite, chewdown, residual),
the amount of sample required for evaluation (i.e., an average bite), and any additional
manipulations needed (i.e., move the tongue around the outside of the mass). Table 5
shows an example of a texture attribute (hardness), its definition and technique, and its
references along the universal intensity scale. Table 6 shows the texture lexicon for
macaroni and cheese.
RECRUITING
Prospective panelists go through several stages of screening and qualification. Panel-
ists may be recruited internally or externally. Internally recruited panelists are conve-
nient because they are already located in the building and may have some knowledge
of the sensory evaluation process. However, people already employed by the company
must carve time out of their current work schedule to attend trainings and practice
sessions. Additionally, their prior knowledge of the company’s products may introduce
bias in evaluations. Externally recruited panelists may be more reliable and may have
more time to devote to the panel. Because they are more naive in regard to sensory
evaluation, however, they may require more extensive training. Also, because they will
likely be working only part time as panelists, their availability to come in for evalua-
tions on short notice may be limited.
PRESCREENING
Typically, 60 to 80 people are prescreened for a Spectrum panel, with the goal of 34 to
45 moving to the acuity screening step and finally 18 to 20 participating in the training
program (fig. 2). The final panel ideally has 12 to 15 members. A primary consideration
is availability for training, practice sessions, and afterward, routine panel work. An
accurate schedule of sessions should be available during this period. Potential panelists should be
available for a minimum of 80% of sessions during a specified time frame (typically
2–3 years). Prescreening for a food panel should also serve to eliminate those who
require special diets (medical or otherwise), take medication that affects taste percep-
tion, have food allergies, or have dentures/dental work that may affect texture percep-
tion. Potential skinfeel panelists must not be sensitive or allergic to ingredients
commonly found in fragrances, lotions, etc. Potential panelists should also express
interest in learning more about sensory evaluation of the product category. Prescreen-
ing is typically completed via online questionnaire, although it may also be completed
over the phone or in person.
TABLE 7 Standard evaluation protocol and lexicon for lotions and creams
Product Appearance
In a petri dish, the panel leader dispenses the product in a spiral the size of a nickel, filling it in from the edge to the center. Evaluate for:
Attribute Definition Qualitative Reference
Product Pickup
Using automatic pipette, panel leader delivers 0.1 cc of product to the tip of the thumb or index finger.
Compress product slowly between index finger and thumb one time and then separate fingers. Evaluate
for:
Firmness: Force required to fully compress product between thumb and index finger [no force—high force]. Reference: petrolatum
Stickiness: Force required to separate fingertips [no force/not sticky—high force/very sticky]. Reference: petrolatum
Compress and separate product between the index finger and thumb three times and evaluate for:
Cohesiveness: Amount sample strings rather than breaks when fingers are separated [no strings—high strings]. Reference: petrolatum
Amount of Peaking: Degree to which product makes stiff peaks on fingertips [flat/no peaks—stiff peaks]. Reference: petrolatum
Rub-Out
Using automatic pipette, panel leader delivers 0.05 cc of product to center of 5-cm-diameter circle on
volar forearm. Spread the measured amount of product within the circle using index or middle finger,
using a gentle circular motion—stroke at a rate of two strokes per second. After three rubs, evaluate for:
Wetness: Amount of water perceived while rubbing [none—high amount]. Reference: water
Spreadability: Ease of moving product over the skin [difficult/drag—easy/slip]. Reference: baby oil or light mineral oil
(continued)
TABLE 7 Standard evaluation protocol and lexicon for lotions and creams (continued)
Wax: Amount of wax perceived in the product during rub-out [none—extreme]. Reference: surface of wax taper candle; cheese wax
Grease: Amount of grease perceived in the product during rub-out [none—extreme]. Reference: petrolatum
Continue rubbing and evaluate for:
Rubs to Absorbency: The number of rubs at which the product loses its wet, moist feel and a resistance to continued rubbing is perceived [upper limit = 120 rubs]. Reference: Vaseline Total Moisture Body Lotion; light mineral oil
Afterfeel (immediate—can be repeated at additional time points)
Visually analyze the forearm test site and evaluate for:
Gloss: Amount or degree of light reflected off skin [dull/matte—shiny]. Reference: baby oil or light mineral oil
Tap cleansed finger lightly over application site and evaluate for:
Stickiness: Degree to which fingers adhere to residual product [not sticky—very sticky]. Reference: petrolatum
Stroke cleansed fingers (1–2 strokes) lightly across skin and evaluate for:
Slipperiness: Ease of moving fingers across skin [difficult/drag—easy/slip]. Reference: untreated volar forearm skin; light mineral oil
ACUITY SCREENING
Potential panelists who successfully complete prescreening are invited to participate in
a two-part screening session—sensory acuity testing and a personal interview. Mock
panel sessions may also be part of the screening process.
Acuity testing gauges candidates' sensitivity to the relevant stimuli and their ability to use scales and references. Texture, skinfeel, and handfeel acuity can be
assessed by asking potential panelists to rank or rate the references for specific attri-
bute scales.10,11
Personal interview
The interview (by the panel leader or panel trainer) serves to gauge the potential pan-
elist’s interest in and commitment to the job, their ability to work in a team setting,
their general ability to learn the skills needed, and their ability to follow instructions.
Spectrum panel leaders have ideally been previously trained as panelists and so
are familiar with the methodology and have considerable skills of their own. Alterna-
tively, new panel leaders may observe an existing panel for several months. Spectrum
panel leaders should also have the demonstrated ability to manage, teach, and commu-
nicate within a group. An additional asset is a basic understanding of statistical analy-
sis, which is helpful for monitoring panel data and performance.
PANEL FUNCTION
The primary day-to-day function of the panel leader is to lead the panel discussions
during product evaluation. Spectrum evaluation sessions are very lively and everyone is
an active participant, especially when data are collected by consensus. The panel leader’s
job is to manage these group discussions to ensure that the panelists work together as a
team to come to a uniform, agreed upon intensity rating for each sensory attribute. The
panel leader’s guidance should be free from bias, giving equal voice to each panelist and
refraining from excessively positive or negative judgments of panelists’ comments. All
efforts should be made to avoid influencing the outcome of the evaluations.
PANEL LIFE
The panel leader facilitates the practical as well as cultural aspects of day-to-day panel
functioning. The practicalities include managing work schedules, pay, time off, and
individual performance monitoring. Culturally, the panel leader is responsible for
maintaining panel morale. The panel leader should be positive, prompt, professional,
and well prepared for each day’s session. The panel leader should also actively partici-
pate in all evaluation sessions.
Panel Training
LENGTH OF TRAINING
Spectrum panelists complete a minimum of 100 hours of training for each sensory
modality (flavor, texture, etc.) before beginning to evaluate products. Panels may be
trained to evaluate one or all modalities and one type or a variety of products. The 3- to
4-month training period includes orientation weeks, led by an experienced trainer,
interspersed with practice weeks, led by the panel leader (table 8).
Phase           Duration (weeks)   Hours per Week   Total Hours
Orientation 1   1                  24               24
Practice        5–7                4–6              26–30
Orientation 2   1                  24               24
Practice        5–7                4–6              26–30
Total           12–16              4–24             100+
TRAINING OVERVIEW
The first orientation week covers basic sensory principles and Spectrum descriptive
analysis methodology. For food panels, these concepts include basic tastes, aromatics,
chemical feeling factors, texture, and appearance. The panelists also learn about objec-
tive versus subjective evaluation and the principles of scaling to help orient them to the
type of data they will be providing. Panelists start to develop their evaluation skills by
developing lexicons for a range of product categories using the standard lexicon devel-
opment procedure. References are used extensively throughout training. During lexi-
con development qualitative references are used to clarify terms. The panel trainer
then introduces the universal scale and the use of intensity references. In this way,
panelists learn the procedure while developing their evaluation skills and familiarity
with those products. Panelists also gain practice working as a group to define the
terms, evaluation procedures, and references needed so the entire panel works as
a unit.
Subsequent orientation weeks are tailored to the needs of the panel’s future work
(e.g., a wine panel will be trained on grapes and dark fruit). Practice sessions allow
panelists to hone their skills and gain exposure to more and more product categories
as well as attributes and their references.
During training, the role of the trainer or panel leader is to coach the panelists
through the exercises. Guidance will likely include additional references to clarify
terms and intensities, assistance in assigning terms when only integrated or consumer
terms are offered by the new panelists, and clarification around terms where two or
more panelists may be assigning different terms to the same attribute. For example,
Panelist 1 perceives orange, Panelist 2 perceives ginger, and Panelist 3 perceives cilan-
tro. All of these have a citrus component or character associated with them. It is the
panel leader’s responsibility to show this connection to the panel and work to assign
which citrus term they can all agree upon. Panel leaders also often point out instances
in which panelists’ newly learned sensory skills can be applied in other situations (to
other categories/modalities). For example, a woody note observed in coffee may also
appear in chocolate. Spectrum panel leaders do not provide the lexicon for the panel
but encourage exploration of nuances. If panelists describe an Oreo as vanilla, they get
to see several references for vanilla and vanillin and learn by evaluating that the cor-
rect descriptor for the sweet aromatic in an Oreo is vanillin.
TEXTURE
Texture training involves orienting panelists to the fundamental optical, rheological,
and physiological concepts of texture evaluation. They learn how to assess geometric
and mechanical properties of products, moistness-related attributes, and the sounds
products make during manipulation. Panelists then learn specific texture attribute
scales and the corresponding references and evaluation techniques. Panelists also
develop their skills by practicing texture evaluations of various products.
Panel Validation
Validation is an important step in ensuring the panel is acting as a calibrated mea-
surement instrument. Validation first occurs at the end of training and then annu-
ally or semiannually to monitor panel (group) and panelist (individual)
performance. An important part of validation is choosing appropriate decision
criteria and identifying what course of action will be taken for each outcome. Deci-
sion criteria may vary depending on product, modality, and the level of training of
the panel. Panelists who fail validation are given probationary status and undergo
retraining. Panelists may return to regular panel evaluations after the successful
completion of validation.
VALIDATION PROCEDURE
Validation assesses the precision, accuracy, and sensitivity of both the individual pan-
elists and the panel as a whole. These three areas can be addressed simultaneously by
implementing a well-designed validation study.
Panelists should be validated on products and attributes that are likely to occur in
their day-to-day evaluations. For example, a panel that evaluates potato chips should
be validated on not only potato and oil aromatics but also common off notes, such as
painty and cardboard. Four to six samples are typically used for validation. The sample
set as a whole should showcase differences (varying from small to large) in all or most
sensory attributes. Samples for which known product profiles exist (blind controls)
should also be included. Panelists should evaluate two to three replications of each
validation sample, in random order, each with a different three-digit blinding code.
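The serving design just described (randomized order, two to three replicates per sample, each presentation under a unique three-digit blinding code) can be sketched in Python. The sample names and random seed below are placeholders, not part of the Spectrum protocol:

```python
import random

def serving_plan(samples, n_reps=3, seed=42):
    """Randomized, blinded serving order: each validation sample appears
    n_reps times, each presentation under a unique three-digit code."""
    rng = random.Random(seed)
    servings = [s for s in samples for _ in range(n_reps)]
    rng.shuffle(servings)                                 # random presentation order
    codes = rng.sample(range(100, 1000), len(servings))   # unique 3-digit blinding codes
    return list(zip(codes, servings))

plan = serving_plan(["control", "test A", "test B", "blind control"])
for code, sample in plan:
    print(code, sample)
```

In practice the order would also be balanced across panelists; a balanced serving design could replace the simple shuffle used here.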
Precision. Precision refers to the ability of panelists to repeatedly rate the same sample
in the same way. For individual panelists, standard deviation (across replicates of
the same sample) for one or more attributes serves as a good measure of precision.
Standard deviations greater than a specified decision criterion (dependent on the
product, modality, and panelist’s level of training) indicate that a panelist is not
sufficiently replicating his or her ratings for that attribute.
Accuracy. Accuracy is the degree to which a panelist’s ratings agree with a known sam-
ple profile (control). By-attribute accuracy may be assessed by subtracting the
expected rating for the control from the panelist’s average rating across replicates of
blind controls and converting the difference to a percentage of the total scale length.
(Alternatively, if a sample with a known profile is unavailable, panelists’ data may
be compared to the panel mean or median data.) Above a certain percentage (deci-
sion criterion), a panelist is considered insufficiently accurate.
Sensitivity. Sensitivity is the degree to which panelists can detect small differences,
measured as panelists’ ability to discriminate among samples. To determine dis-
crimination ability, each panelist’s data are subjected to an analysis of variance
(ANOVA). A p value for the sample effect that is less than a specified cutoff (deci-
sion criterion) indicates that the panelist is successfully discriminating among
samples for that attribute.
Overall precision, accuracy, and sensitivity (across attributes) may be examined by
calculating the frequency with which panelists “pass” based on the decision criteria.
Panelists’ abilities to both discriminate (sensitivity) and replicate (precision) can
be assessed visually by plotting their mean standard deviation versus their mean
p value (fig. 3). Panelists with low p values and low standard deviations have high sen-
sitivity and precision and are performing better than those with high p values and high
standard deviations.
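The three measures can be computed directly from a panelist's raw ratings. The following sketch uses hypothetical ratings of one attribute on a 15-point scale; the sample names, values, and thresholds are illustrative assumptions, not values from the manual:

```python
import numpy as np

# Hypothetical ratings by one panelist: 3 samples x 3 replicates of one attribute.
ratings = {
    "sample A": [4.0, 4.5, 4.0],   # blind control, expected rating 4.0
    "sample B": [7.5, 8.0, 7.5],
    "sample C": [2.0, 2.5, 2.0],
}
SCALE_LENGTH = 15.0

# Precision: standard deviation across replicates of the same sample.
precision = {s: float(np.std(r, ddof=1)) for s, r in ratings.items()}

# Accuracy: deviation of the panelist's mean from the expected control rating,
# expressed as a percentage of the total scale length.
accuracy_pct = abs(np.mean(ratings["sample A"]) - 4.0) / SCALE_LENGTH * 100

# Sensitivity: one-way ANOVA F statistic across samples; a large F (small p)
# indicates the panelist discriminates among samples on this attribute.
groups = [np.asarray(g, dtype=float) for g in ratings.values()]
grand = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between, df_within = len(groups) - 1, sum(len(g) for g in groups) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
# (With SciPy available, scipy.stats.f.sf(f_stat, df_between, df_within)
#  converts this F statistic to the p value discussed above.)

print(precision, round(accuracy_pct, 1), round(f_stat, 1))
```

Each result would then be compared against the study's chosen decision criteria to flag panelists for retraining.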
ways to show the panelists that their work is valued. A descriptive panel should also
operate with a sense of fun. Although evaluations can be stressful at times, it is import-
ant to remember that it is not brain surgery. No one’s life hangs in the balance.
Data Collection
Spectrum descriptive analysis data may be collected individually or by consensus, and
there are benefits to each. When a study requires statistical analysis, individual data are
best. In this case, 6 to 12 panelists and 2 or more replications are required. Consensus
data are used when directional or nuanced information is needed. The process of con-
sensus data collection allows the panel to discuss attributes and intensities while the
products are being evaluated. The discussions that occur during evaluation ensure that
panelists are defining attributes similarly and rating them the same way. For example, a
product may have a browned dairy note that some panelists call “cooked milk” and
some call “caramelized.” Through these discussions, the panel as a whole can decide in
which attribute to put that particular flavor note. Attributes may also be added to the
lexicon during panel discussions. Consensus data are not simply an average of individual panelists' data; they are intensity ratings agreed upon by all panelists. Statistical analysis can be done with consensus data only when multiple replications are completed.
REPORTING
Descriptive analysis reports must clearly and concisely communicate the results of a
project to all interested parties. These parties may have differing degrees of sensory
knowledge, so results should be reported at varying levels. The use of graphs and visual
displays is highly encouraged. Reports should contain the test objectives, an executive
summary of the results, and key recommendations right up front. This information is
followed by more detailed sections on methodology, sample information, evaluation
procedures, data analysis, and results. Appendices containing additional information
(attribute definitions, prep instructions, etc.) may also be included.
PERCEPTUAL MAPPING
When many products must be compared, perceptual mapping is a useful tool. Percep-
tual mapping aids in summarizing descriptive data according to key attributes that
contribute most to the sensory variability within a specific product category. Spectrum
descriptive data are subjected to principal component analysis to group like attributes
and reduce the data to orthogonal (independent) factors. These factors are called key
sensory dimensions of the perceptual space. The dimensions may be mapped two at a
time to provide a visual representation of the space. Products may be added to the
maps to understand similarities and differences among products along the depicted
sensory dimensions. See fig. 4 for an example of a perceptual map of the orange juice
category. Attributes used in the evaluation are shown in table 9.
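As a sketch of the principal component step, the code below reduces a hypothetical product-by-attribute matrix (invented values, not the orange juice data of fig. 4) to orthogonal sensory dimensions via singular value decomposition:

```python
import numpy as np

# Hypothetical descriptive means: 5 products x 4 attributes.
attributes = ["orange complex", "cooked orange", "sweet", "sour"]
X = np.array([
    [7.0, 2.0, 6.5, 3.0],
    [5.5, 4.0, 5.0, 4.5],
    [6.0, 3.0, 6.0, 3.5],
    [3.0, 6.5, 4.0, 6.0],
    [4.0, 5.5, 4.5, 5.5],
])

# Principal component analysis via SVD of the column-centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                   # product coordinates on the sensory dimensions
loadings = Vt.T                  # attribute contributions to each dimension
explained = s**2 / (s**2).sum()  # share of variance per dimension

print("variance explained by dimensions 1-2:", round(float(explained[:2].sum()), 3))
```

Plotting the first two columns of `scores`, with `loadings` overlaid, yields a two-dimensional perceptual map of the kind shown in fig. 4.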
Practical Considerations
As a panel becomes more and more experienced, the Spectrum Method may be
adapted to allow for more efficient product evaluations. After a panel has had extensive
training on one product category, the lexicon developed for that category can be used
as a starting point for subsequent evaluations of similar products. Lexicons do not
need to be developed from scratch every time if the panel and project managers have a
TABLE 9 Attributes used in the orange juice evaluation
Appearance: Color intensity; Chroma; Opacity
Aromatics: Orange complex; Raw orange; Cooked orange; Distilled orange oil; Expressed orange oil; Fruity/floral; Other citrus; Albedo; Other fruit; Sweet aromatic; Green; Fermented; Hydrolyzed oil; Cardboard/oxidized; Vinyl
Basic Tastes: Sweet; Sour; Bitter; Balance and blend
Chemical Feeling Factors: Astringent; Burn
Texture: Viscosity
Situational Adaptations
Descriptive analysis data generated using the Spectrum Method combine synergistically with other types of data to provide even greater guidance to researchers than any of these methods alone.
Degree of difference. A Spectrum descriptive profile can easily be augmented with a
degree of difference (DOD) score. DOD scores indicate how different, perceptually,
two samples are from one another (e.g., how different a test sample is from a con-
trol). DOD is rated on a 10-point scale, with 0 = no difference and 10 = very large
difference. DOD scores can be helpful in both research and development and qual-
ity control applications.
Quality. Spectrum panels can use their highly developed descriptive vocabulary to
qualitatively describe sample differences. Product quality ratings are appropriate
when panelists have extensive experience with the product category. Panelists dis-
cuss and classify descriptive attributes according to their appropriateness for the
category. These attributes may be considered positive, as in expected “on notes,” or
negative, as in unexpected “off notes.” Quality ratings are generally given on a
10-point scale.
Product grouping/sorting/napping. Spectrum panels can use their clearly defined tech-
nical language in product sorting exercises, and the resulting data can be easily
translated to research needs.
Fidelity. Fidelity is an overall rating that takes into account the degree to which the
sample of interest matches an identified target or expectation.
Balance and blend. Balance and blend measures the degree to which the individual
attributes of a product blend together, making it difficult to identify each compo-
nent. In products with high balance and blend, nothing is “sticking out” (i.e., attri-
butes are balanced), and all attributes overlap so closely that pulling the profile
apart into discrete components is very difficult (i.e., attributes are blended).
Product-adapted universal scaling. For panels that primarily evaluate products within
the same or similar product categories, the universal scale may be adapted accord-
ingly. Intensities of specific attributes from a specific product category are rated
using the universal scale as a base and thus may be related back to the overall uni-
versal scale.
Flavor pairings. The breadth and depth of sensory expertise of a well-trained descrip-
tive panel may be leveraged in nontraditional ways. In a flavor pairings session,
creative problem-solving techniques (more commonly used in the consumer
Advanced Applications
KEY DRIVERS AND PREDICTED LIKING
Spectrum descriptive data can, in combination with consumer data, help researchers
assess which sensory attributes drive consumer liking of a product category. Data from
perceptual mapping, such as the orange juice map shown in fig. 4, are combined with
consumer acceptance data using multiple linear regression. Statistical models gener-
ated from key-driver studies can also be used to predict liking of additional products
in a category and to generate sensory profiles of “ideal” products within the sensory
space.
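A minimal sketch of the regression step follows, with invented dimension scores and liking means (illustrative only, not data from an actual key-driver study):

```python
import numpy as np

# Hypothetical inputs: perceptual-map scores for 6 products on two sensory
# dimensions, and mean consumer liking for each (9-point hedonic scale).
dims = np.array([
    [ 1.8, -0.4],
    [ 0.9,  0.6],
    [ 0.2, -0.2],
    [-0.5,  0.9],
    [-1.0, -0.3],
    [-1.4, -0.6],
])
liking = np.array([7.1, 6.8, 6.2, 5.9, 5.1, 4.8])

# Multiple linear regression: liking ~ intercept + b1*dim1 + b2*dim2.
design = np.column_stack([np.ones(len(dims)), dims])
coefs, *_ = np.linalg.lstsq(design, liking, rcond=None)

# Predict liking of a new product from its position in the sensory space.
new_scores = np.array([0.5, 0.0])
predicted = coefs[0] + new_scores @ coefs[1:]
print("key-driver weights:", np.round(coefs[1:], 2),
      "predicted liking:", round(float(predicted), 2))
```

The fitted weights indicate which sensory dimensions drive liking; maximizing the fitted model over the sensory space sketches the "ideal" product profile mentioned above.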
References
1. N. O. Schwartz, “Adaptation of the Sensory Texture Profile Method to Skin Care
Products,” Journal of Texture Studies 6 (1975): 33–42.
6. Lexicon for Sensory Evaluation: Aroma, Flavor, Texture, and Appearance, (West
Conshohocken, PA: ASTM International, 2011).
10. Standard Guide for Two Sensory Descriptive Analysis Approaches for Skin Creams and Lotions,
ASTM E1490-19 (West Conshohocken, PA: ASTM International, approved November 1, 2019),
http://doi.org/10.1520/E1490-19
11. Standard Guide for Descriptive Analysis of Shampoo Performance, ASTM E2082-12
(West Conshohocken, PA: ASTM International, approved October 15, 2012),
http://doi.org/10.1520/E2082-12
Introduction
Free choice profiling (FCP), unlike conventional descriptive methods, is a descriptive
sensory technique in which each assessor uses his or her own objective terms for pro-
file analysis and is not required to explain or provide definitions for these terms.1 This
approach, in which attribute intensities of products are rated, is based on the premise
that assessors do not differ in their perceptions but merely in the way in which they
describe them.2
Conventional descriptive methods such as Quantitative Descriptive Analysis
(e.g., Tragon QDA developed in 1974 by Stone and Sidel3) and Spectrum™ Descriptive
Analysis4 have consensual steps (see respective chapters in this manual for details).
FCP also differs from conventional descriptive methods in that there is no consensus
among assessors on which terms should be used for profiling; thus, actual descriptor
terms vary in kind and quantity between assessors.
The training of panelists is another major factor that distinguishes conventional
descriptive methods from FCP. FCP is considered a rapid alternative to conventional
descriptive analysis because it does not require extensive training of assessors.1 Train-
ing may take approximately 1 h4 and focuses on understanding the concept of subjec-
tive versus objective terms for vocabulary development, consistent use of a descriptor
term once the assessor has defined that term in his or her mind, and how to use scales
to score intensities. FCP also has no established training protocols; the rigor and
period of training typically depend on the study objective, complexity of the products,
and whether assessors are technical professionals or naive consumers. For example,
the training time for FCP may be limited to practicing scale usage by naive consumers
who are heavy product users5 or 10 half-hour sessions for the texture of complex prod-
ucts6 compared with 8–12 h for QDA and 100 h or more training time per modality for
the Spectrum method.3
1 Givaudan Flavors Corp., 1199 Edison Dr., Cincinnati, OH 45216, USA. https://orcid.org/0000-0001-5447-9561
DOI: 10.1520/MNL1320150034
Core Philosophy
FCP is based on the concept that assessors do not differ in their perceptions of prod-
ucts but mainly in the way that they describe them1 because of their individual expe-
riences and familiarity with the products.7 FCP contains some principles of
conventional descriptive methods in that assessors must be able to reliably detect,
describe, and quantify attribute differences between products. However, unlike con-
ventional descriptive methods, FCP does not require agreement in the usage and
interpretation of the terms used to describe the products.5 Yet insights from FCP can provide information on consumers' perceptions of products and the key features they use to differentiate among them.8 Thus, the FCP approach benefits from terms used
uniquely by consumers, which is not typical with conventional descriptive
methods.9
FCP originated in food sensory science1 but has since been applied to explore
characteristics of nonfood items, including personal care products, and expressive
qualities of animals and landscapes. In this chapter, and throughout this manual, the
term product refers to any stimulus that may be described by sensory profiling (e.g.,
food, beverage, household and personal care products, animal behavior, and other
environmental stimuli that may be rated objectively).
intensity scales. The data set from each assessor is a two-way table or rectangular array
of attribute intensities per product and can thus be treated as a (n × p) matrix of n
products described across p attributes.
During GPA, the matrices from all assessors undergo simultaneous transfor-
mations of translation, rotation, and scaling.36,37 Translation corrects for intensity
variations from using different parts of the scale by moving the centroids of each
configuration (a separate configuration is first developed from each assessor’s scores)
to a common origin. Rotation accounts for differences in the descriptive terms used
by the assessors. Scaling corrects for the range effects by generating weighted scaling
factors to compensate for individual scale usage differences (e.g., narrow scale ranges
versus wide scale ranges). The weighted scaling factors either stretch or shrink the
volumes of the individual configurations to make them as similar as possible. Sev-
eral iterations of these transformations are performed to decrease the distance
between the individual configurations (from each assessor’s matrix) and maximize
agreement between assessors on each product.37 Finally, GPA generates an optimum
consensus configuration of the samples for different principal axes combinations,
and thus provides information on the sensory spatial interrelationships between the
samples.
Results from GPA also include information on the efficacy of the transformations,
and performance of the assessors in terms of their agreement on each product and
comparison of their individual configurations with the consensus configuration.
Agreement between the shape and orientation of the individual configurations with
each other (and hence agreement with the consensus configuration) is more important
than agreement on scale usage.37 This is because similar shapes and orientations show
alignment on perceptions of product attributes by the assessors.37 Results from
optional permutation tests included with the GPA would indicate if a true consensus
was reached after the transformations, and suggest the appropriate number of factors
(principal axes) to retain.
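The three transformations can be sketched as follows. This is a simplified GPA (equal-sized matrices are assumed; in practice, matrices with differing numbers of attributes are first zero-padded to a common width), not a full implementation with permutation tests:

```python
import numpy as np

def gpa(configs, n_iter=20):
    """Simplified generalized Procrustes analysis on a list of
    n_products x p matrices (one per assessor)."""
    # Translation: move each configuration's centroid to a common origin.
    X = [np.asarray(c, float) - np.asarray(c, float).mean(axis=0) for c in configs]
    # Scaling: normalize each configuration to unit Frobenius norm.
    X = [c / np.linalg.norm(c) for c in X]
    consensus = np.mean(X, axis=0)
    for _ in range(n_iter):
        for i, c in enumerate(X):
            # Rotation: orthogonal Procrustes fit of c onto the consensus.
            U, _, Vt = np.linalg.svd(c.T @ consensus)
            X[i] = c @ (U @ Vt)
        consensus = np.mean(X, axis=0)
    return consensus, X

# Two assessors perceiving the same 4 products, but using different terms
# (a rotation) and different parts of the scale (a stretch plus an offset):
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
B = (A @ R) * 2.5 + 1.0
consensus, aligned = gpa([A, B])
print("configurations agree:", np.allclose(aligned[0], aligned[1], atol=1e-6))
```

Because the second assessor's matrix differs from the first only by rotation, scaling, and translation, the transformed configurations coincide, illustrating the premise that assessors differ in description rather than perception.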
Conducting FCP
The main objective of FCP is to limit the time spent on training assessors to reach
consensus on the use of descriptor terms (lexicon), some of which might not be famil-
iar to them or applicable for a given study. Thus, the main distinguishing feature of
FCP from conventional descriptive methods is the development of individual descrip-
tive terms by each assessor. However, established sensory protocols/best practices
should be incorporated as much as possible to enable the collection of reliable data and
to obtain meaningful results.
The FCP method may be used with a small group of assessors trained specifi-
cally for FCP (referred to as FCP panel herein) or with naive consumers who are
familiar with the product. The sensory professional should ensure that all potential
assessors, irrespective of training experience, are articulate in critically examining
and describing important product attributes and are willing to participate in the
necessary tasks.
STAGES IN FCP
The following stages should be included in FCP, but approaches to Stages 1–6 would
vary depending on whether a small number of assessors are specifically screened and
validated for their ability to detect, describe, and quantify differences between prod-
ucts (FCP panel) or a large number of naive consumers are used.
1. Assessor screening and qualifications
2. Training and selection
3. Vocabulary development
4. Scaling
5. Validation of assessors’ performance
6. Sample evaluation and data collection
7. Data analysis and interpretation
8. Reporting results
Stages 1–6 for a FCP panel are more rigorous than those for naive consumers because FCP panels have fewer respondents; with a smaller number of assessors, the reliability of the data is expected to increase with increased training.
FCP Panel
Potential assessors for an FCP panel are typically recruited from a pool of employees,
students, or professionals who have experience with the product. Recruitment criteria
should include good health to make reliable judgments of the product type and good
communication skills.42 Candidates should be screened for normal sensory acuity and
the ability to discriminate and reproduce results.4,42
At least two to three times the required number of qualified assessors (e.g., 40–50
candidates) should be screened to cater for failures and dropouts. A minimum of eight
trained panelists is recommended to achieve stable data in conventional descriptive
methods,43 thus suggesting the need for more than eight assessors for an FCP panel
because they undergo less rigorous training. For example, in a study with different biscuit
formulations, 15 persons were retained as assessors from the 48 that were screened.16
The types of products used for screening should be relevant to those that would be
in the actual study. Screening tests should be conducted across all modalities that
would be germane to the study objective. Types of tests may include matching, dis-
crimination, and intensity rating; and standard screening tests4,42 should be adapted
for the products to be tested. A series of discrimination tests (e.g., Triangle, Duo-Trio)
and intensity-ranking tests should always be conducted to facilitate the selection of
assessors who can reliably perceive and quantify differences between products. To
qualify for subsequent training, candidates should score at least 60% correct from a
series of discrimination tests with relevant products.4,42
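To put the pass criterion in perspective, one can compute the probability of meeting it by guessing alone (chance performance on a triangle test is 1/3). A minimal Python sketch; the test counts are illustrative, not prescribed by the method:

```python
from math import comb

def p_consecutive_correct(n_tests: int, p_guess: float = 1 / 3) -> float:
    """Probability of passing n consecutive triangle tests purely by guessing."""
    return p_guess ** n_tests

def p_at_least(k: int, n: int, p_guess: float = 1 / 3) -> float:
    """Binomial tail: probability of >= k correct out of n tests by guessing."""
    return sum(comb(n, i) * p_guess**i * (1 - p_guess) ** (n - i)
               for i in range(k, n + 1))

# Passing three consecutive triangle tests by chance: (1/3)**3, about 0.037
p3 = p_consecutive_correct(3)
# Scoring at least 60% correct (6 of 10) by chance
p60 = p_at_least(6, 10)
```

Both probabilities are small, which is why such criteria effectively screen out candidates who cannot reliably discriminate between products.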
Specific examples of screening tests that have been used with potential FCP can-
didates are as follows:
• For food and beverages:
¡ Biscuits containing 70% and 110% of sugar in a standard formulation and
predetermined to be significantly different (95% confidence) by a 30-member
triangle test panel were used. To qualify for training, each candidate was
required to pass three consecutive triangle tests with the two biscuit samples.16
• For personal care products:
¡ Ability to identify and describe fragrances/scents or textures and estimate
proportions of shaded areas in scaling exercises.4
¡ Ability to detect and describe sensations during and after the application of
sticks with different conditioning agents.44
• For animal-behavior studies:
¡ In reported studies, students of animal science or behavior, and livestock or pet
owners are typically recruited by e-mail or questionnaire, and no additional
screening tests are applied.32,33,45,46
• For environmental studies:
¡ Candidates are recruited from persons with the relevant technical expertise.
For example, in a study to assess landscape expressivity,34 graduate students in
holistic science whose academic program included landscape quality were
recruited and were not required to take additional screening tests.
Naive Consumers
Candidates are typically recruited with a prescreener questionnaire targeting persons
who are heavy users of the product but who have no formal training in sensory evaluation
and no affiliations to consumer/market research. The prescreener should contain
sections to assess respondents’ abilities to describe and quantify product characteris-
tics. For example, if the study objective is to evaluate the flavors of products, one would
include questions asking respondents to describe noticeable flavors of specific items,
along with scaling exercises to indicate the proportions of shaded areas.4 Depending
on the goal of the study, demographic quotas may need to be considered, and candi-
dates may be recruited from one geographical region or from multiple regions.
Candidates who pass the prescreener are invited on-site to receive additional
information about the study protocol, and interviewed to determine their potential
ability to participate in FCP. In some instances, additional screening tests are admin-
istered to naive consumers. For example, in a study in which flavor nuances between
black coffee products were being investigated, candidates who were regular coffee
drinkers had to score greater than 70% for odor recognition and 100% accuracy for
basic tastes recognition.21
VOCABULARY DEVELOPMENT
In this phase, assessors are required to generate their own terms to best describe the
products of interest. The main criteria are that only objective terms are developed
and assessors must be consistent in the definition and use of each term throughout the
study. These criteria must be properly explained to and understood by the assessors
before commencing the exercise. Several approaches to facilitate vocabulary develop-
ment may be used. Typical approaches are outlined below.
• Individual Sample Description (ISD)
¡ Assessors may be presented with samples from all of the study products one at
a time and asked to provide spontaneous information/descriptors about the
modality/modalities of interest (e.g., listing the odor attributes of dried parsley
samples;15 appearance, odor, flavor, and texture of dry-cured ham;5 and aroma
attributes of whole cone hops and dry-hopped beer12).
¡ For animal-behavior studies, assessors may watch individual video footage of
each animal to capture the behavior of interest of a group of animals (e.g.,
pregnant ewes/sheep,31 domestic dogs,32 dogs in shelters,33 pigs,45 and lambs46).
Video clips are typically 2–4 min long, after which assessors would have
approximately 2–3 min to write down terms that best describe the behavioral
and/or emotional expressions of the animals.
¡ For landscape studies, assessors may be presented with monadic sequential
digital images of the actual landscapes in addition to site visits.34
• All Sample Description
¡ In some animal studies, multiple animals are observed in each video clip to
assess their group behavioral responses, such as social interaction of dairy
cows (e.g., social licking and head butting47) and environmental challenges to
horses and ponies.30
• Comparative Products Method
¡ A version of Kelly’s Repertory Grid Method (RGM) may be used in which
triads of products/samples with accompanying elicitations are presented to the
assessors.25,48,49 This is a systematic approach in which two of the products
within each triad are randomly grouped such that each product appears in at
least one triad, and one product from each triad is carried over to the next
triad. Assessors are asked to describe how the two grouped products are
similar to each other and different from the third one.49,50 The products may be
actual products in the study,25,40,51 or they may be related to the study
products.50 In either scenario, assessors are encouraged to list all of the
differences and similarities they can perceive, and these terms are used
subsequently to develop their vocabulary list. RGM is typically used with naive
consumers to trigger the generation of descriptors but is criticized for being
difficult and time-consuming to implement.52
¡ Samples of actual products may also be presented in pairs to assessors who are
asked to describe their similarities and differences.21,22
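The RGM triad scheme described above — random pairing with one product carried over so that every product appears in at least one triad — can be sketched as follows. This is an illustrative design generator under the constraints stated in the text, not a published algorithm:

```python
import random

def make_triads(products, seed=0):
    """Build a sequence of RGM triads: each triad shares one carried-over
    product with the next, and every product appears in at least one triad."""
    rng = random.Random(seed)
    pool = products[:]
    rng.shuffle(pool)
    triads = []
    carry = pool.pop()
    while len(pool) >= 2:
        a, b = pool.pop(), pool.pop()
        triads.append((carry, a, b))
        carry = rng.choice((a, b))   # carry one product into the next triad
    return triads

triads = make_triads(["A", "B", "C", "D", "E", "G", "H"])
```

For each generated triad, the assessor would then be asked how the two grouped products are similar to each other and different from the third.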
The study objective and product type should be considered when selecting a
method for vocabulary development. For example, in the ISD approach, assessors are
asked to describe products individually, whereas in RGM they are required to com-
pare samples. One study reported that ISD and RGM approaches for evaluating choc-
olate produced sample configurations that were very similar, as were interpretations
of the main perceptual dimensions.52 However, it has been proposed that ISD might
be more appropriate when the main objective is to discover the consumers’ vocabu-
lary, but RGM might be more useful when the goal is to discriminate among product
profiles.49
The sensory professional should review the terms developed by each assessor
and provide assistance, as needed, in finalizing their vocabulary. For example, the
sensory moderator should inspect the terms for synonyms and discuss their mean-
ings with the particular assessor to guide the selection of the most relevant term. If
there are an extensive number of terms for a given modality, the assessor and sensory
moderator should discuss their definitions to ensure that all are essential (e.g., no
redundancy) and that the assessor would not be overwhelmed by having all of them
on the ballot.
SCALING
Candidates are introduced to scaling with the actual type of scale for the study. There
is no standard scale for FCP. Scales may vary in length (e.g., 6 points, 6 inches, 15 cm)
and type (e.g., category, structured, unstructured/continuous with or without anchors).
However, scale characteristics4 should be carefully chosen by the sensory professional
based on study objectives and product type. Any words, anchors, or graphics used with
the scale must be descriptive in nature and not related to hedonic sentiments.
FCP Panel
Examples of scales and their applications with FCP are listed below:
• 7-point continuous scale with anchors at 1 = null or very slight and 7 = very intense
for evaluating the odor, flavor, and textural attributes of ewes’ milk cheeses40
• 9-cm unstructured scale anchored at endpoints with intensity terms for
evaluating the appearance, aroma, flavor, and texture of biscuits with different
nutritional formulations16
• 10-cm unstructured line scale anchored at the ends with the terms not perceptible
and strongly perceptible for evaluating the effect of different drying protocols on
the odor of dried parsley15
• 125-mm-length visual analog scale anchored at 0 = minimum and 125 = maximum
for scoring the behavioral expressions of dogs32
• 12.5-cm unstructured line scale anchored, respectively, on the left and right sides
with 0 (descriptor not detected) and 100 (descriptor detected at the highest
intensity) for investigating the behavioral expression of lambs when exposed to
tactile interactions with humans46
• 125-mm-length visual analog scale ranging from minimum (quality absent) to
maximum (quality could not be more dominant) for assessing the expressive
qualities of a range of landscapes34
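Because reported FCP scales differ in length and units, a sensory professional may wish to map raw marks onto a common range before comparing or archiving data (GPA's own scaling step handles size differences during analysis). A minimal linear rescaling sketch; the example marks are illustrative:

```python
def rescale(mark: float, scale_min: float, scale_max: float,
            lo: float = 0.0, hi: float = 100.0) -> float:
    """Linearly map a raw mark from its native scale onto a common lo-hi range."""
    return lo + (mark - scale_min) * (hi - lo) / (scale_max - scale_min)

mid_9cm = rescale(4.5, 0, 9)       # midpoint of a 9-cm line
top_vas = rescale(125, 0, 125)     # top of a 125-mm visual analog scale
```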
ASSESSOR TRAINING
For training sessions, individual ballots or score cards are prepared by the sensory
professional for each assessor from their own list of terms (from the Vocabulary
Development stage). Assessors are shown how to use the ballots, after which they
would rate attribute intensities of one or more replicates of the products using the
prescribed scale. In situations in which the stimuli to be evaluated are not tangible
products (e.g., animal behavior or environmental factors), relevant or representative
graphics (e.g., photographs, video clips) or other items and/or site visits may be used
for training, and assessors would likewise score attribute intensities. Irrespective of
type, products to be evaluated should be blind-coded and randomized, and product
characteristics (e.g., spicy flavor, alcohol content), site location, and potential fatigue
should be considered in determining the number of products to be evaluated in a
given session.42
FCP Panel
Training sessions are also opportunities for monitoring scale usage and for revising
anchors or ballots, or both, as necessary. After two replicates of the products have been
evaluated, the sensory professional should confirm the adequacy of scale and ballot
with each assessor. Considerations would include how the assessor is representing
slight versus moderate versus high versus very high intensities on the scale; how useful
the anchors are in their meanings and positions on the scale, and if any of them need
to be revised to home in on intensity ranges specific to the products; and whether there
are terms that definitely need to be deleted from or added to the ballot.
FCP Panel
Data collected from each assessor during training must be evaluated by the sensory
professional for consistency in terminology usage and scoring, ideally between repli-
cate product samples. The rigor of validation will depend on the study objective and
experience of the assessors with the products. In some instances, ANOVA on replicate
data from each assessor might be essential to monitor consistency in scoring relative
intensities (i.e., no significant difference between replicates for profiles of the same
product). The sensory professional would determine the number of training sessions
required to achieve this goal. Two training sessions are typically sufficient, but one
session might suffice when assessors have been prescreened for relevant technical
expertise. Data should also be inspected for terms that are used with very low fre-
quency, and these terms should be reviewed with the respective assessors to determine
whether they should be removed from the ballot or replaced by more suitable alterna-
tives. For example, in animal-behavior studies, descriptors with negative meanings
might need to be converted to the positive form (e.g., “unhappy” to “happy”) to
facilitate uniformity in the use of the scales.46
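Where full ANOVA software is not at hand, a quick screening check of an assessor's replicate consistency can be sketched as a Pearson correlation between the two replicate profiles — a simpler stand-in for the ANOVA check described above, with invented scores for illustration:

```python
from statistics import mean

def replicate_consistency(rep1, rep2):
    """Pearson correlation between an assessor's two replicate profiles:
    a high r suggests stable scoring of relative intensities."""
    m1, m2 = mean(rep1), mean(rep2)
    num = sum((a - m1) * (b - m2) for a, b in zip(rep1, rep2))
    den = (sum((a - m1) ** 2 for a in rep1)
           * sum((b - m2) ** 2 for b in rep2)) ** 0.5
    return num / den

r = replicate_consistency([2, 5, 7, 3], [3, 5, 6, 3])
```

A low correlation would flag the assessor for further review or retraining before sample evaluation.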
FCP Panel
At least 10 assessors who provided reliable and consistent data during the training are
selected to form the FCP panel. However, this number may be increased depending on
the nature of the product and whether the assessors have limited experience with the
product, FCP method, or other sensory practices. The sensory professional should
keep in mind that a minimum of 8 trained assessors is recommended to obtain stable
data when using conventional descriptive methods.43 The number of assessors on FCP
panels ranges from 10 to 18 in reported studies.
FCP Panel
From reported studies, assessors in FCP panels typically evaluate one or two repli-
cates of the product depending on their experience with the product and its degree of
complexity, but up to four replications13 may be necessary if a detailed characteriza-
tion of products is the objective. In comprehensive studies on animal-behavioral
responses, video clips are usually viewed and evaluated in duplicate, and the two sets
of scores are entered into one matrix defined by the number of animals observed and
the number of terms used by the individual assessor.33 At least duplicate evaluations
of all products should be carried out by FCP panels to enhance the stability and inter-
pretation of the data.
Data Preparation
Prior to statistical analysis, the data from each assessor must be formatted in a two-
way table with rows for the products and columns for the numerical scores of the
attributes or descriptor terms. The latter are then configured as a series of matrices,
where each matrix represents the intensities of n products across p attributes for one
assessor. Most software packages have options to cater for unequal numbers of col-
umns among the matrices arising from the diverse quantity of terms used by assessors.
However, if such options are unavailable, one would need to pad each matrix with
zero-intensity columns per product to achieve equal numbers of columns across all
matrices.32,37
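The zero-padding workaround can be sketched in a few lines; the matrix values are illustrative:

```python
def pad_matrices(matrices):
    """Pad each assessor's (products x attributes) matrix with zero columns so
    all matrices have the same width (only needed when the GPA software cannot
    handle unequal column counts)."""
    width = max(len(row) for m in matrices for row in m)
    return [[row + [0] * (width - len(row)) for row in m] for m in matrices]

m_p1 = [[2, 3, 6], [3, 5, 2]]   # assessor with 3 attributes, 2 products
m_p4 = [[1, 4], [2, 6]]         # assessor with 2 attributes, 2 products
padded = pad_matrices([m_p1, m_p4])
```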
Data Analysis
During data input, one would include information on the number of matrices (config-
urations) and indicate if their dimensions are equal or to be defined. The latter option
would require denoting the number of attribute columns per assessor. In FCP, asses-
sors typically evaluate all products, thereby allowing an algorithm based on Gower’s
GPA approach36 to be applicable. However, if one or more assessors did not evaluate all
products, some software packages offer an alternative Commandeur algorithm53 to
account for the missing data.
Several statistical packages support GPA, and thus care should be taken in select-
ing tests that are applicable to sensory data and the objectives of the particular study.
Basic steps to include in the GPA are translation, rotation, scaling, and Procrustes
Analysis of Variance (PANOVA). It is also advisable to incorporate a Consensus Test
(i.e., a permutation test to determine whether a true consensus is reached after the GPA
transformations). The consensus profile is defined in terms of its geometrical proper-
ties only.32,37 If available, a Dimensions Test might also be a helpful option to determine
the appropriate number of factors (dimensions) to keep in the consensus configuration.
For example, if Greek and high-protein yogurts were located on the negative side of
Dimension 1 (e.g., ranged from −2.63 to −4.59), and regular and light yogurts were on
the positive side of Dimension 1 (e.g., ranged from 1.98 to 5.72), one would conclude
that the Greek and high-protein yogurts were perceived as white, thick, dull, rough,
and grainy looking, whereas the regular and light yogurts had glossy, creamy, smooth,
and silky appearances.
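The translation, rotation, and scaling steps at the core of GPA can be illustrated with an ordinary Procrustes superimposition of one configuration onto another; GPA iterates these same transformations across all assessors' configurations until a consensus is reached. A sketch assuming numpy is available:

```python
import numpy as np

def procrustes_align(X, Y):
    """Ordinary Procrustes: translate, rotate, and scale Y to best match X
    in the least-squares sense."""
    Xc = X - X.mean(axis=0)            # translation: center both configurations
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt                         # optimal rotation/reflection matrix
    scale = s.sum() / (Yc ** 2).sum()  # least-squares isotropic scaling
    return scale * Yc @ R

# A rotated-and-scaled copy of a configuration aligns back onto the original:
X = np.array([[0., 0.], [1., 0.], [0., 1.]])
theta = np.pi / 4
Rm = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
Y = 2.0 * X @ Rm
aligned = procrustes_align(X, Y)
```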
REPORTING RESULTS
One would indicate the number of spatial dimensions that the products occupy based
on the number of axes that were important in explaining their variability. One would
then discuss the descriptor terms and products that aligned with the important axes
and focus on the ones that loaded highly (typically ≥0.5, and at a minimum ≥0.3) with the
respective axes. This would facilitate describing the attributes and product properties
associated with each dimension as well as profile differences between the products (as
discussed/exemplified in the previous section). The sensory professional should be
mindful of project objectives when interpreting and reporting results (e.g., selecting
names for axes that provide meaningful and relevant insights to the project). Graphical
representations of the important dimensions of the consensus configurations, and
tables listing descriptor terms that correlated highly with these dimensions should be
included to enhance discussion of similarities and differences in product profiles.
Advantages of FCP
• Is faster than conventional descriptive methods: short-term training allows for
quick responses and insights for business and market needs.
• Increases the number of potential descriptors for samples: can enrich sensory
descriptive vocabulary, especially when working with complex products or
scenarios (e.g., animal behavior, environment). In a study to evaluate aromas of
whole-cone hops and dry-hopped beer, FCP generated more than 180 valid aroma
descriptors versus only 14 developed by a conventional descriptive panel.12
• Individual assessor reproducibility is similar to that of conventional methods
because assessors use their own terminology.
• Facilitates profiling of samples because assessors feel more comfortable/can better
relate to using their own terms.1
• Affords less frustration to panel leaders because there is no need to force
agreement on the use of common terminology by assessors.
• Avoids potential challenges that might occur during consensual steps (e.g.,
different conceptions of the same stimulus,57 unbalanced group dynamics within
the panel, or a panel leader with a dominant personality58).
• Is cheaper because panel maintenance associated with some conventional
methods (e.g., Spectrum Descriptive Analysis, Consensus Profiling) is not
necessary: thus less time and resources are required.
• Requires only short-term commitment from assessors compared with long-term
commitment needed for some conventional methods (e.g., Spectrum Descriptive
Analysis, Consensus Profiling).
• Has potential for yielding insights on product perceptions from target consumers
versus technical descriptions only from trained panelists.
Disadvantages
• No formal or prescribed lexicon is provided to assessors, and assessors do not
define their descriptor terms; thus, the inconsistent use of terms by assessors,
especially naive consumers, is possible.
• Terms used by assessors might be too personal or difficult to interpret by the
sensory professional, or terms might be related to product benefits.56
• Ballot construction is time-consuming: the sensory professional must prepare a
separate ballot for each assessor containing their own descriptor terms
(vocabulary).
• Data are not robust: individual attribute means cannot be calculated, and thus
FCP does not facilitate the determination of significant flavor profile differences
between products (e.g., using ANOVA) or generation of numerical detailed
profiles (e.g., spider or radial graphs) typically associated with conventional
descriptive analysis methods.
• Results are not reproducible or stable over time or within a specific sensory space:
FCP does not cater for the addition and analysis of supplementary data from similar
types of products collected at a different time point.
Flash Profile
In the Flash Profile (FP) method, which was derived from FCP, assessors also develop
their own descriptor terms; and like FCP, FP has been carried out with trained panels,
semi-trained assessors, and naive consumers.2 However, in FP, there is a simultaneous
presentation of the whole sample set, and products are ranked for each attribute.59
Typically, FP is carried out in two sessions or in one session with two steps. In the first
step, assessors taste all the products comparatively to generate terms to best discrimi-
nate between the products. In the second step, they rank the products from low to high
for each descriptor.59 For data analysis, numbers are assigned to the relative ranks of
the products for each term, and (n × p) matrices are developed for GPA.2 The use of
ranking in FP has been criticized as time-consuming and more difficult than FCP
because assessors have to re-taste samples, but re-tasting might facilitate greater dis-
crimination between the products.60 From studies in which the two methods have
been directly compared, recommendations favor using FCP for large numbers of prod-
ucts and FP when there is a small product set.60
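The FP rank-to-matrix step can be sketched as follows; the product and descriptor names are invented for illustration:

```python
def fp_matrix(products, rankings):
    """Build one assessor's (n products x p descriptors) matrix of rank scores
    for Flash Profile; `rankings` maps each descriptor to that assessor's
    low-to-high ordering of the products."""
    return [[rankings[d].index(prod) + 1 for d in rankings]
            for prod in products]

products = ["A", "B", "C"]
rankings = {"bitter": ["B", "A", "C"],   # B lowest, C highest in bitterness
            "sweet":  ["C", "B", "A"]}
matrix = fp_matrix(products, rankings)
```

Each assessor's matrix would then be submitted to GPA exactly as in FCP.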
Check-All-That-Apply
Both FCP and Check-All-That-Apply (CATA) require no consensus on descriptor
terms selected by assessors for product evaluation. However, the CATA method uses a
ballot consisting of product descriptor terms that are preselected by the sensory profes-
sional. The selection of descriptor terms is one of the main ways in which CATA differs
from FCP. The CATA terms may originate from trained assessor panels or previous
qualitative or quantitative consumer studies, or both,2 and unlike FCP, assessors have
no input in vocabulary development. Another major difference between FCP and
CATA is that the latter requires assessors to simply select (check/tick) all of the terms
they consider as descriptors of the product, but there is no attribute intensity rating. A
disadvantage of CATA is that terms might not be selected because assessors were neu-
tral or undecided about them or because they did not pay attention to them.2 The
CATA method requires minimal instruction, no training, is relatively easy to perform,
and is completed quickly.61 Frequency scores from the binary data collected from
CATA can be subjected to Cochran’s Q test to determine whether there are significant
differences between the products for their respective attributes, a feat that is not possi-
ble with FCP data. However, insights on spatial sensory interrelationships of products
may be obtained by both methods: using Correspondence Analysis on CATA data and
GPA on FCP data.
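Cochran's Q for one CATA attribute can be computed directly from the binary check matrix; a sketch with invented data (rows = assessors, columns = products):

```python
def cochrans_q(data):
    """Cochran's Q statistic for binary CATA data on one attribute: rows are
    assessors, columns are products, cells are 1 if the attribute was checked.
    Under H0, Q ~ chi-squared with (k - 1) degrees of freedom."""
    k = len(data[0])                                   # number of products
    col = [sum(row[j] for row in data) for j in range(k)]
    rowt = [sum(row) for row in data]
    n = sum(rowt)
    return ((k - 1) * (k * sum(c * c for c in col) - n * n)
            / (k * n - sum(r * r for r in rowt)))

# Four assessors, three products, one attribute (illustrative values):
q = cochrans_q([[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0]])
```

The resulting Q would be compared against a chi-squared critical value with k − 1 degrees of freedom to judge significance.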
Sorting/Multidimensional Sorting
Sorting is a technique in which products are grouped in two dimensions by assessors
according to their overall perceptions or perceptions of specific attributes of the prod-
ucts. It requires little or no assessor training, and is based on the concept that the more
frequently two products are grouped together the more similar they are. Major differ-
ences from FCP are that in Sorting no language or descriptor terms are developed or
used by the assessors, and there is no quantitative rating of attributes. For data analy-
sis, a matrix developed by summing the number of times pairs of products are sorted
into the same group (across all assessors) is subjected to multidimensional scaling
(MDS). Compared with the sensory dimensions developed from GPA on FCP data,
those from MDS on Sorting data might illustrate product relationships from a more
integrated perspective than from separable attributes.62 However, unlike FCP, the
dimensions created by MDS from Sorting data are not readily interpretable unless the
sensory professional has additional knowledge of the stimuli.
Data Source
The data used for this case study were part of a beer data set provided by Hal MacFie
(Hal MacFie Sensory Training Limited, Keynsham, United Kingdom). Vocabulary for
the original beer data was developed using the RGM. Eight assessors with technical
expertise or consumption experience with beers, or both, evaluated 12 freshly poured
draft beers of different brands using 9-point intensity scales.
Data Preparation
This case study focused on data from one modality only—appearance of the beers—to
facilitate visualization of the accompanying tables and graphics. The number of
descriptor terms (attributes) developed by the assessors for the appearance of the beers
ranged from 3 (Assessor P4) to 10 (Assessor P6) and are listed in table 1.
Data were prepared in an appropriate format to use XLSTAT Premium version
2019.3.2 for GPA analysis. The data set from each assessor was formatted in a two-way
table or matrix with rows for the 12 products (A–M) and columns for the numerical
scores of the attributes. Table 2 illustrates the matrix for Assessor P1: the attribute
intensity scores for each product are in columns 2–9.
Data from the eight assessors were configured as a series of matrices, where each
matrix represented intensities of the 12 products across the particular number of
attributes per assessor. Table 3 shows the matrices derived from Assessors P1 and P2:
data from Assessor P1 are shown in columns 1–8, and data from Assessor P2 are
shown in columns 9–15. Data sets from the other six panelists (not shown here) were
merged in a similar format to develop the complete series of matrices. Information on
the number of columns designated for attributes from each assessor was tabulated
separately for data-entry purposes (e.g., 8 columns for Assessor P1 and 7 columns for
Assessor P2).
During data input, information on the number of products, assessors, and attri-
bute columns per assessor were defined. (XLSTAT has options to account for unequal
numbers of attribute columns from assessors, so there was no need to add columns
with zeroes to each matrix.)
Data Analysis
Obtaining meaningful insights from FCP data relies heavily on properly conducting
GPA. Thus, the sensory professional should have prior training in using suitable statis-
tical software, including the selection of appropriate options; otherwise, a statistician
should be consulted.

TABLE 2 Attribute intensity scores from Assessor P1 for the appearance of the 12 beers

Product  Dark        Amount   Light       Carbonated  Amount    Long-Lasting  Hearty      Fizzy
         Appearance  of Head  Appearance  Appearance  of Color  Head          Appearance  Appearance
A        2           3        6           5           3         5             2           5
B        3           5        2           3           5         5             5           5
C        2           5        7           6           2         6             2           6
D        2           5        4           5           4         5             4           5
E        1           5        5           5           4         5             5           6
G        2           3        3           4           3         3             3           5
H        5           4        2           2           6         5             6           3
I        2           4        4           3           4         3             3           3
J        2           3        3           5           4         3             3           5
K        2           3        5           5           5         4             4           5
L        6           5        1           4           6         4             5           5
M        6           5        1           2           7         3             6           2

TABLE 3 Matrices derived from attribute intensity scores from Assessors P1 and P2

Columns 1–8 (Assessor P1): Dark Appearance, Amount of Head, Light Appearance,
Carbonated Appearance, Amount of Color, Long-Lasting Head, Hearty Appearance,
Fizzy Appearance. Columns 9–15 (Assessor P2): Light Appearance, Amount of Color,
Constant Head, Thin, Watery Head, Long-Lasting Head, Frothy Head, Golden Color.

A  2  3  6  5  3  5  2  5  |  6  2  4  5  3  3  3
B  3  5  2  3  5  5  5  5  |  4  4  2  6  2  2  5
C  2  5  7  6  2  6  2  6  |  6  2  5  2  6  6  2
D  2  5  4  5  4  5  4  5  |  5  1  7  1  1  6  2
E  1  5  5  5  4  5  5  6  |  6  3  2  6  5  2  3
G  2  3  3  4  3  3  3  5  |  2  5  5  3  5  5  6
H  5  4  2  2  6  5  6  3  |  4  5  6  1  6  6  6
I  2  4  4  3  4  3  3  3  |  4  4  6  2  6  6  5
J  2  3  3  5  4  3  3  5  |  2  6  5  2  5  5  6
K  2  3  5  5  5  4  4  5  |  2  5  5  3  5  4  6
L  6  5  1  4  6  4  5  5  |  1  6  5  1  5  5  3
M  6  5  1  2  7  3  6  2  |  1  7  6  1  6  6  2

Note: Scores from Assessor P1 are shown in columns 1–8; scores from Assessor P2 are
shown in columns 9–15.

The data-analysis procedures and options used in this case study are provided below
but may have to be altered depending on the study objective and the capabilities of
the statistical software.
• The GPA was set to perform a maximum of 100 iterations: this is a typical number
of iterations to request to allow the GPA algorithm to develop and test a sufficient
quantity of consensus spaces from the data to get the best possible fit for the
consensus configuration.
• Gower’s GPA approach36 was selected because the 12 products were evaluated by
all 8 assessors.
• Steps included in the GPA were translation, rotation, scaling, PANOVA, a
Consensus Test, and a Dimensions Test.
• For the Consensus and Dimension tests, 300 permutations each were requested:
typically a minimum of 200 is used. These permutation tests are necessary
because the amount of variance explained “on its own” cannot give an indication
of the true fit of the consensus configuration from the data (because assessors
used different descriptors to score attribute intensities of the products) and should
be tested against the variance from random permutations to ensure that results
from the original data did not occur by chance. During the Consensus and
Dimension tests, respective results of the consensus variance and number of
dimensions from the original data were tested (at 95% confidence) against results
from the 300 random permutations of the various attribute intensity scores for
each product.
• Finally, a PCA was conducted to obtain an optimal graphical representation of the
consensus configuration and facilitate its interpretation.
FIG. 1 Consensus Test results: variance explained by original data that generated the
consensus configuration versus variance explained by 300 random permutations of
the data.
variability between the volume that each of the individual configurations occupied in
the sensory space.
Results from the Consensus Test (95% confidence) are shown in figure 1, in which
the horizontal axis depicts the variance explained by the original data (that generated
the Consensus Configuration) and by the 300 random permutations of the data. The
histogram indicates the distribution of the variance from 300 random permutations of
the data: the proportion of variance explained from the majority of random permuta-
tions is less than 0.25, with a few permutations explaining almost 0.28 of the variance
(fig. 1). The Rc value of 0.319, which corresponds to the proportion of the original vari-
ance explained by the consensus configuration, is significantly greater than the results
obtained from the random permutations of the data, thus indicating that the consen-
sus configuration was not developed by chance.
Figure 2 shows the results of the Dimensions Test on the consensus configuration. The
horizontal axes in figures 2A and 2B show the values for F, where F represents the
following ratio: (variance between products)/(variance between assessors).
For each dimension in the consensus configuration, the F value from the original
data was tested against the F value obtained from 300 random permutations of the
data in that dimension: products were considered significantly separated in a given
dimension if the F value from the original data was larger than the highest F value
from the random permutations. Figure 2A shows that the F value (34.292) from the
original data in Dimension F1 was significantly greater than the distribution (histo-
gram) of F values from the 300 random permutations of the data in that dimension.
However, the F value (9.989) from the original data in Dimension F2 (fig. 2B) was not
significantly higher than the F values from the 300 random permutations (some ran-
dom permutations had F values as high as 10.3). Therefore, one would conclude that in
the consensus configuration, the products were significantly separated (95% confi-
dence) on Dimension F1 but not on Dimension F2.
FIG. 2 Dimensions Test results: the significant number (95% confidence) of
dimensions in which the products were separated in the consensus configuration.
(A) Test in Dimension F1: original data significantly different from 300 random
permutations of the data; (B) Test in Dimension F2: original data not significantly
different from 300 random permutations of the data.
Eigenvalues from the PCA (table 5) were also used to determine the appropriate
number of dimensions or axes to retain for discussion and display. The eigenvalues
(table 5) showed that after PCA transformation of the Consensus Configuration, fac-
tors/axes F1 and F2 represented 66.9% and 15.5%, respectively, of the total variability
compared with less than 7% variability from the other factors. These results (table 5)
suggested the retention of two axes for display, and further indicated that the
variability in appearance attributes of the products was captured mainly along the
first axis (F1) and to a much lesser extent along the second axis (F2), agreeing with
the previous Dimensions Test on the consensus configuration.
TABLE 5 Eigenvalues from the PCA
F1 F2 F3 F4 F5 F6
Note: F1–F6 denotes the first six axes (factors) from the PCA.
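The axis-retention decision from eigenvalues can be reproduced with a short numpy sketch. The data below are illustrative (chosen so that the first two axes dominate, mimicking the case study's 66.9%/15.5% split); they are not the chapter's consensus configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy consensus configuration: 13 products in a 6-dimensional space whose
# variance is concentrated in the first two directions (illustrative only).
X = rng.normal(size=(13, 6)) * np.array([4.0, 2.0, 0.8, 0.6, 0.5, 0.4])

# PCA via eigendecomposition of the covariance matrix of the centered data.
Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending order
explained = eigvals / eigvals.sum()

for i, prop in enumerate(explained, start=1):
    print(f"F{i}: {100 * prop:.1f}% of total variability")

# One common rule: retain axes until a cumulative-variance threshold is met
# (the chapter keeps F1 and F2, which together exceed 80%).
n_retain = int(np.searchsorted(np.cumsum(explained), 0.80) + 1)
print("axes retained:", n_retain)
```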
To understand how the attributes aligned with the two main axes, correlations
between descriptor terms and Dimensions F1 and F2 from the individual assessors
(table 6) were inspected. Terms displaying high (≥0.5) positive and negative correlations
with the dimensions were considered important for this case study example. Depend-
ing on a study objective, terms with correlations greater than ±0.3 may also be exam-
ined by the sensory professional to obtain more details on product profile perceptions.
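The correlation screening described above amounts to correlating each descriptor's intensity scores with the product coordinates on each dimension and flagging |r| ≥ 0.5. The sketch below uses invented descriptor names and toy scores, not the case-study data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: one assessor's intensity scores for 13 products on three descriptor
# terms, plus the products' coordinates on consensus Dimensions F1 and F2.
f1 = np.linspace(-2, 2, 13)
f2 = rng.normal(size=13)
attributes = {
    "dark color":  0.9 * f1 + 0.2 * rng.normal(size=13),
    "light color": -0.9 * f1 + 0.2 * rng.normal(size=13),
    "thick head":  0.8 * f2 + 0.3 * rng.normal(size=13),
}

# Correlate each descriptor with each dimension; flag |r| >= 0.5 as
# "important", mirroring the chapter's screening rule.
corrs = {}
for term, scores in attributes.items():
    r1 = np.corrcoef(scores, f1)[0, 1]
    r2 = np.corrcoef(scores, f2)[0, 1]
    corrs[term] = (r1, r2)
    flag = "important" if max(abs(r1), abs(r2)) >= 0.5 else "-"
    print(f"{term:12s} F1: {r1:+.2f}  F2: {r2:+.2f}  {flag}")
```

A table of such correlations, one block per assessor, is what table 6 summarizes.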
TABLE 6 Correlation between assessor attributes and dimensions in the consensus
configuration
Note: Numerical values in F1–F4 denote the correlation between the assessors’ descriptor terms and the
first four axes (dimensions).
Table 6 shows that the terms dark appearance or dark color were used by all assessors
except Assessor P2 and showed a high positive correlation (greater than +0.75) with
Dimension F1. The terms light color or light appearance were used by six assessors and
correlated negatively (more negative than −0.75) with Dimension F1 for four assessors
(P1, P2, P5, and P6), negatively (−0.74) with Dimension F2 for Assessor P4, but
positively (+0.79) with Dimension F1 for Assessor P3 (table 6). Terms
used by only one or two assessors were amount of color, caramel-color brown, depth of
shade, golden-brown appearance, golden color, thick appearance, hearty appearance,
and thick head, which correlated positively with Dimension F1; and yellow color, fizzy
FIG. 3 Projection of individual assessor terms for color on Dimension F1 of the
consensus configuration.
TABLE 7 PCA results showing the product coordinates on the first four axes
Product F1 F2 F3 F4
FIG. 4 Relative positions of the products (A–M) on Dimensions F1 and F2 after the
PCA.
TABLE 8 Residuals by object: the residual variance from each product after the GPA
A 30.19
B 45.52
C 39.42
D 49.45
E 28.66
G 35.21
H 32.59
I 35.45
J 49.39
K 36.79
L 44.22
M 30.57
TABLE 9 Coordinates for Products J and M on the first two dimensions from
individual assessors
Note: F1 and F2 denote the coordinates of Products J and M on the first two dimensions of the individual
configurations developed from the scores of each assessor (P1–P8).
The coordinates for Product J on F1 from the individual assessors ranged from −1.95
to 2.15 (disagreement on its position on F1); hence, the larger numerical residual for
Product J (vs. Product M) previously observed in table 8.
The “Residuals by Configuration” results (table 10) show the residual variance
from each assessor after the consensus configuration was generated by the GPA and
indicate the variation between the individual configurations from each assessor
and the final consensus configuration. The configuration from Assessor P6 fit the
consensus configuration best (smallest residual variance, 41.82), whereas the
configurations of Assessors P2, P5, and P8 fit least well (residual variances of 64.36,
64.29, and 64.66, respectively; table 10). A revisit of table 6 provided further
explanation for the differences in residual variance between these assessors: the
descriptors developed by Assessor P6 correlated with Dimensions F1, F2, and F3 in a
similar manner to those in the consensus configuration, whereas descriptors for
head from Assessor P5 correlated very highly with Dimension F4 but not F2, and
light color from Assessor P8 correlated negatively with Dimension F3 instead of
Dimension F1.
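Both residual tables follow from the same quantity: the squared distance between each point in an assessor's transformed configuration and the corresponding point in the consensus. Summing over assessors gives the residual by object (table 8); summing over products gives the residual by configuration (table 10). The sketch below simulates already-transformed configurations rather than running a full GPA; the noisy assessor and ambiguous product are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy transformed configurations: 8 assessors x 13 products x 2 dimensions,
# simulated as consensus + noise, with one noisier assessor and one
# ambiguous product (illustrative only).
consensus = rng.normal(size=(13, 2))
configs = np.stack([consensus + 0.3 * rng.normal(size=(13, 2)) for _ in range(8)])
configs[5] += 0.6 * rng.normal(size=(13, 2))    # assessor index 5 fits worse
configs[:, 9] += 0.8 * rng.normal(size=(8, 2))  # product index 9 is ambiguous

center = configs.mean(axis=0)  # the consensus configuration is the mean

# Squared deviations of each assessor's points from the consensus points.
sq = ((configs - center) ** 2).sum(axis=2)  # shape: (assessors, products)

residual_by_object = sq.sum(axis=0)         # one value per product (cf. table 8)
residual_by_configuration = sq.sum(axis=1)  # one value per assessor (cf. table 10)

print("residuals by object (per product):", np.round(residual_by_object, 2))
print("residuals by configuration (per assessor):",
      np.round(residual_by_configuration, 2))
```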
The “Scaling Factors” (table 11) indicate the relative scale ranges used by the
assessors and how they were transformed during the GPA. Assessors P1, P5, P6,
and P7 had scaling factors close to 1, indicating a relatively similar scale-range use
(table 11). For those assessors who used a relatively narrower scale range (e.g.,
Assessor P3, whose actual intensity scores were mainly between 2 and 5) com-
pared with the other assessors, the GPA algorithm compensated with a high scale
factor (1.28 for Assessor P3; table 11) to get a better fit (expansion) of the volume of
P3’s configuration to the consensus configuration. On the other hand, Assessor P2
used a relatively wider scale range (P2’s actual scores from the raw data were
between 1 and 7), and thus the GPA algorithm used a (reduction) scale factor of
0.81 on the volume of P2’s configuration to improve its fit to the volume of the
consensus configuration (table 11).
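The direction of the scaling factors (expansion above 1 for narrow-range scorers, reduction below 1 for wide-range scorers) can be seen with a least-squares isotropic scaling sketch. This is a simplified stand-in for the scaling step inside a full GPA, which also rotates, translates, and rescales to preserve total variance; the data are toy values.

```python
import numpy as np

rng = np.random.default_rng(5)

def ls_scale(X, Y):
    # Least-squares isotropic scaling factor s minimizing ||s*X - Y||^2,
    # given by s = <X, Y> / <X, X>.
    return float(np.sum(X * Y) / np.sum(X * X))

# Toy consensus and two assessors: one scoring on a narrow range (needs
# expansion, factor > 1) and one on a wide range (needs shrinking, factor < 1).
consensus = rng.normal(size=(13, 2))
narrow = 0.7 * consensus + 0.05 * rng.normal(size=(13, 2))
wide = 1.3 * consensus + 0.05 * rng.normal(size=(13, 2))

print(f"scale for narrow-range assessor: {ls_scale(narrow, consensus):.2f}")
print(f"scale for wide-range assessor:   {ls_scale(wide, consensus):.2f}")
```

This mirrors the pattern in table 11: Assessor P3's narrow-range configuration received a factor of 1.28, while Assessor P2's wide-range configuration received 0.81.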
TABLE 10 Residuals by configuration: residual variance from each assessor’s
configuration after the consensus configuration was developed by the GPA
P1 57.64
P2 64.36
P3 54.46
P4 61.90
P5 64.29
P6 41.82
P7 48.34
P8 64.66
TABLE 11 Scaling factors used on the individual configurations from each assessor
P1 0.99
P2 0.81
P3 1.28
P4 1.23
P5 0.98
P6 0.92
P7 0.93
P8 1.13
Acknowledgments
The author thanks Hal MacFie for providing the data that were used for statistical
analysis and development of the case study presented in this chapter.
References
1. A. A. Williams and S. P. Langron, “The Use of Free-Choice Profiling for the Evaluation of
Commercial Ports,” Journal of the Science of Food and Agriculture 35, no. 5 (1984):
558–568, https://doi.org/10.1002/jsfa.2740350513
2. P. Varela and G. Ares, “Sensory Profiling, the Blurred Line between Sensory and Consumer
Science. A Review of Novel Methods for Product Characterization,” Food Research
International 48, no. 2 (2012): 893–908, https://doi.org/10.1016/j.foodres.2012.06.037
16. D. A. M. dos Santos, J. Lobo, L. M. Araujo, and P. S. Marcellini, “Free Choice Profiling,
Acceptance and Purchase Intention in the Evaluation of Different Biscuit Formulations,”
Ciencia e Agrotecnologia 39, no. 6 (2015): 613–623.
23. M. B. dos Santos Scholz, C. S. G. Kitzberger, N. Durand, and M. Rakocevic, “From the Field
to Coffee Cup: Impact of Planting Design on Chlorogenic Acid Isomers and Other
Compounds in Coffee Beans and Sensory Attributes of Coffee Beverage,” European Food
Research and Technology 244, no. 10 (2018): 1793–1802, https://doi.org/10.1007/
s00217-018-3091-7
24. Y. Xia, F. Zhong, Y. Chang, and Y. Li, “An Aromatic Lexicon Development for Soymilks,”
International Journal of Food Properties 18, no. 1 (2015): 125–136, https://doi.org/
10.1080/10942912.2013.780255
26. L. Torri, M. Piochi, R. Marchiani, G. Zeppa, C. Dinnella, and E. Monteleone, “A Sensory- and
Consumer-Based Approach to Optimize Cheese Enrichment with Grape Skin Powders,”
Journal of Dairy Science 99, no. 1 (2016): 194–204, https://doi.org/10.3168/jds.2015-9922
36. J. C. Gower, “Generalized Procrustes Analysis,” Psychometrika 40, no. 1 (1975): 33–50.
38. E. van der Burg, J. de Leeuw, and G. Dijksterhuis, “OVERALS: Nonlinear Canonical
Correlation with k Sets of Variables,” Computational Statistics and Data Analysis 18, no. 1
(1994): 141–163, https://doi.org/10.1016/0167-9473(94)90136-8
39. E. van der Burg and G. Dijksterhuis, “Generalised Canonical Analysis of Individual Sensory
Profiles and Instrumental Data,” Data Handling in Science and Technology 16 (1996):
221–258.
40. P. Bárcenas, F. J. Pérez Elortondo, and M. Albisu, “Comparison of Free Choice Profiling,
Direct Similarity Measurements and Hedonic Data for Ewes’ Milk Cheeses Sensory
Evaluation,” International Dairy Journal 13, no. 1 (2003): 67–77, https://doi.org/10.1016/
S0958-6946(02)00139-5
42. Guidelines for the Selection and Training of Sensory Panel Members (West Conshohocken,
PA: ASTM International, 1981), https://doi.org/10.1520/STP758-EB
43. H. Heymann, B. Machado, L. Torri, and A. L. Robinson, “How Many Judges Should One
Use for Sensory Descriptive Analysis?” Journal of Sensory Studies 27 (2012): 111–122.
48. D. M. H. Thomson and J. A. McEwan, “An Application of the Repertory Grid Method
to Investigate Consumer Perceptions of Foods,” Appetite 10, no. 3 (1988): 181–193,
https://doi.org/10.1016/0195-6663(88)90011-6
54. I. N. Wakeling, M. M. Raats, and H. J. H. MacFie, “A New Significance Test for Consensus in
Generalized Procrustes Analysis,” Journal of Sensory Studies 7, no. 2 (1992): 91–96,
https://doi.org/10.1111/j.1745-459X.1992.tb00526.x
56. O. Lazo, A. Claret, and L. Guerrero, “A Comparison of Two Methods for Generating
Descriptive Attributes with Trained Assessors: Check-All-That-Apply (CATA) vs. Free
Choice Profiling (FCP),” Journal of Sensory Studies 31 (2016): 163–176, https://doi.org/
10.1111/joss.12202
61. L. Dooley, Y.-S. Lee, and J.-F. Meullenet, “The Application of Check-All-That-Apply (CATA)
Consumer Profiling to Preference Mapping of Vanilla Ice Cream and Its Comparison to
Classical External Preference Mapping,” Food Quality and Preference 21, no. 4 (2010):
394–401, https://doi.org/10.1016/j.foodqual.2009.10.002
62. A. Saint-Eve, E. Kora, and N. Martin, “Impact of the Olfactory Quality and Chemical
Complexity of the Flavouring Agent on the Texture of Low Fat Stirred Yogurts Assessed
by Three Different Sensory Methodologies,” Food Quality and Preference 15 (2004):
655–668, https://doi.org/10.1016/j.foodqual.2003.09.002
Introduction
Temporal methods encompass a broad range of techniques to gather information
about the time course of sensory sensations. Methodological developments have
focused on measuring temporality within descriptive analysis work, although research
has studied temporal liking as well. This chapter focuses solely on the analytical mea-
sure of temporal events.
One of the earliest examples of evaluating temporal characteristics appeared in Flavor
Profiling,1 in which the order in which flavor attributes appeared was recorded, along
with the quantification of aftertastes. Larson-Powers and Pangborn2 detailed the time-intensity
method, which focused on the exclusive measurement of intensity changes of a single
taste sensation from onset through aftertaste. Since that time there have been many
methods developed to expand the measurement of sensations relative to time, some
focused on attribute intensity changes and others focused on attribute perceptual
changes. Examples of these methods are classified in table 1.
1 P&D Consulting LLC, 57 N St. NW, Unit 134, Washington, DC 20001, USA
2 Compusense Inc., 255 Speedvale Ave. West, Guelph, ON, Canada N1H 1C5
https://orcid.org/0000-0002-2999-2061
DOI: 10.1520/MNL1320160016
Note: Superscript numbers correspond to references in the reference list at the end of the chapter.
the skin). Changes that might occur beyond a single exposure period are generally not
evaluated using the temporal methods described here but are more in the realm of
longer-term methods (e.g., shelf-life studies or extended-use testing).
Overview of Method
Temporal methods measure the changes in attribute appearance or attribute intensi-
ties over time and are typically performed by a trained panel of assessors. They provide
additional information about the full product experience that cannot be captured
using traditional descriptive analysis, which is focused on maximal intensities at one
point in time. Temporal methods are concerned with attribute changes within the
immediate product experience and do not extend beyond the single exposure period.
Subjects/Assessors
Generally, temporal evaluations are conducted by trained panelists. As traditional
descriptive analysis will typically accompany temporal work as a means of
understanding the product experience more fully, it is efficient to use the same panel-
ists to conduct both evaluations. The advantage of using current descriptive panelists
is a reduction in training time, as they will already be familiar with the product, the
terminology to describe the product, and the data-entry system.
Given the similarity to descriptive panel work, the same numbers, background,
and expertise are expected of temporal method panelists. However, given the difficult
cognitive and physical tasks involved in some methods, not all existing panelists may
be successful in capturing their temporal responses. Thus, recruiting and training
additional panelists may have to be considered.
Data Collection
The type and number of attributes collected are as listed in table 2.
Reporting of Results
Time will always be a key data point in any temporal method, whether a specific time
unit or an ordering of the attributes. The data for analysis will vary depending on the
method, as shown in table 3.
Practical Considerations
Choosing a temporal method depends on the attribute change being measured. In
selecting a method, consider the temporal aspect being affected by the product change,
as well as the capabilities and resources of the researcher.
TABLE 3 Key parameters and analysis appropriate for each temporal method
Note: Details on each method are provided in the cited literature references and in the ASTM guides on
time intensity and on other temporal methods. ANOVA = analysis of variance; AUC = area under the curve.
Situational Adaptations
Temporal methods should be used very judiciously, as they typically require highly
trained panelists, a specific data-entry system, and an experienced researcher to exe-
cute the study well. When a full multi-attribute study cannot be conducted because of
time and resource constraints, focusing on only one or a few attributes of the temporal
experience may be appropriate. For example, sometimes
the early onset of a key attribute is critical to acceptability, and simply asking an asses-
sor to check the first appearing attribute on a list will suffice to understand this. In
another example, simply asking about any sensations remaining at a time point after
product application can determine whether the product is meeting specifications for
efficacy.
References
1. S. E. Cairncross and L. B. Sjöström, “Flavor Profiles—A New Approach to Flavor
Problems,” Food Technology (1950): 308–311.