Food Quality and Preference: Leticia Vidal, Gastón Ares, Duncan I. Hedderley, Michael Meyners, Sara R. Jaeger
Article info

Article history: Received 22 July 2016; Received in revised form 10 October 2016; Accepted 17 December 2016; Available online 21 December 2016.

Keywords: Research methodology; Sensory characterization; Consumer profiling.

Corresponding author: L. Vidal. E-mail address: lvidal@fq.edu.uy.

http://dx.doi.org/10.1016/j.foodqual.2016.12.013

Abstract

Rate-all-that-apply (RATA) questions are a variation of check-all-that-apply (CATA) questions in which consumers are asked to indicate whether terms from a list apply to describe a given product and, if they do so, to rate their intensity. RATA questions have been argued to provide more insights than CATA questions for sensory characterization with consumers. The present research is, to date, the most exhaustive comparison of CATA and RATA with regard to term usage, sample discrimination and sample configurations. A total of seven studies with 860 consumers were conducted with different product categories. A between-subjects design was used in all studies to compare the two methodologies. Confirming past studies, results from RATA and CATA were very similar. Minor differences between RATA and CATA were found, but these were study and term specific, and general superiority of one methodology over the other was not established, as opposed to what previous studies had suggested. Instead, results indicate that each method might have advantages over the other for certain product characteristics. A strong linear relationship was established between mean RATA scores and CATA term citation frequencies, demonstrating clearly that CATA questions differentiate among samples based on the relative strength/weakness of sample characteristics. Collecting data as RATA but analysing them as CATA was inferior to the use of mean RATA scores, and is not recommended. The comparison of RATA data using mean scores and Dravnieks' scores showed no advantage of the latter, and it is recommended that simple mean scores be used. Overall, results from the present work show that RATA is not necessarily an improvement over CATA questions and that, for consumer research, the decision to add an attribute intensity rating step depends on the aim of the study and the specific characteristics of the sample set.
1. Introduction

Sensory product characterisation is a cornerstone activity in sensory and consumer science (Lawless & Heymann, 2010). Interest in alternative methodologies for sensory characterisation with consumers has substantially increased in the last decade as the line between trained assessors and consumers has blurred (Ares, 2015; Meiselman, 2013; Varela & Ares, 2012). For many products, consumers have been reported to provide sensory spaces that are highly similar to those obtained using descriptive analysis with trained assessors (Ares et al., 2015; Cadena et al., 2014; Dehlholm, Brockhoff, Mejnert, Aaslyng, & Bredie, 2012; Dooley, Lee, & Meullenet, 2010; Dos Santos et al., 2015; Lelièvre, Chollet, Abdi, & Valentin, 2008; Moussaoui & Varela, 2010).

Check-all-that-apply (CATA) questions, a methodology in which consumers are presented with a list of terms and asked to select all those that apply to the focal sample, have become one of the most popular approaches for sensory product characterization with consumers (Ares & Jaeger, 2015). The structured format of CATA questions enables data collection and analysis from large consumer samples easily and quickly (Ares & Varela, 2014). Research has shown that data from CATA questions are valid and repeatable (Ares et al., 2014; Ares et al., 2015; Jaeger et al., 2013) and that they are not likely to bias hedonic scores (Jaeger & Ares, 2014; Jaeger, Giacalone, et al., 2013).

The simplicity of CATA questions is also a potential limitation. The binary response format does not allow direct measurement of the intensity of sensory attributes, which could potentially hinder discrimination of samples with subtle sensory differences (Meyners, Jaeger, & Ares, 2016).
In order to overcome this limitation, approaches that combine CATA data and intensity measurements are emerging (e.g., Reinbach, Giacalone, Ribeiro, Bredie, & Frøst, 2014). In the present research, focus is directed to rate-all-that-apply (RATA) questions, in which consumers are asked to indicate whether terms from a list apply to describe the focal sample and, if they do so, to rate their intensity using a 3- or 5-point scale (Ares, Bruzzone, et al., 2014).

Comparison of sensory characterizations performed using CATA and RATA questions has shown that both methodologies provide similar information about samples (Ares, Bruzzone, et al., 2014; Reinbach et al., 2014). The validity and reproducibility of RATA data have also been confirmed, and recommendations for data analysis are emerging (Giacalone & Hedelund, 2016; Meyners et al., 2016).

However, RATA questions are still an under-explored variant of CATA questions and their potential to improve sample discrimination remains unconfirmed. Ares, Bruzzone et al. (2014) reported evidence of greater discriminative capacity for RATA questions compared to simple CATA questions: the percentage of terms for which significant differences among samples were identified was higher for the RATA variant than for the CATA variant in three of four consumer studies. However, reported comparisons are limited so far and the potential superiority of RATA questions still needs to be proven, particularly considering the increasing popularity of the methodology (Franco-Luesma et al., 2016; Oppermann, de Graaf, Scholten, Stieger, & Piqueras-Fiszman, 2017; Waehrens, Zhang, Hedelund, Petersen, & Byrne, 2016). In this context, the present research expands the methodological comparison of CATA and RATA questions, focusing on term use, sample discrimination and sample configurations. Different approaches to the analysis of RATA data are also considered in order to provide recommendations for practitioners on this topic. Overall, the present research contributes further insight into the differences between CATA and RATA questions, as well as guidelines for the implementation of RATA questions.

2. Materials and methods

2.1. Participants

Studies 1–6 were conducted in Auckland (New Zealand), whereas Study 7 was conducted in Montevideo (Uruguay).

In New Zealand, participants were registered on a database maintained by a professional recruitment firm and were screened in accordance with eligibility criteria for each of the studies. In Uruguay, participants were recruited from the consumer database of the Sensometrics & Consumer Science research group of Universidad de la República (Uruguay), based on their consumption of the focal products. In all the studies participants gave informed consent and were compensated for their participation.

Participants were aged between 18 and 71 years, and the percentage of female participants ranged from 33% to 71%. The consumer samples comprised varying household compositions, income levels, education levels, etc., but were not representative of the general populations of Auckland and Montevideo.

2.2. Samples

Six product categories were tested (Table 1). In Studies 1–2, samples were advanced selections grown under commercial conditions and commercially available apple cultivars. In Studies 3, 4, 6 and 7, samples were products commercially available in New Zealand or Uruguay, purchased from local supermarkets. In Study 5, samples were raspberry coulis made from frozen berries purchased at a local supplier (Gilmores Wholesale Food and Beverage Supplies, Auckland, NZ). Defrosted berries were pulped in a Cuisinart blender. Samples were created by manipulation of sweetness (added sucrose to 3% or 6%), berry flavour (1% sucrose and 0.2% Boysenberry essence from Blue Pacific), or acidity (added malic and tartaric acids (Sigma Aldrich, Saint Louis, MO), at 0.1% and 0.4%, respectively). Coulis samples were measured into 150 g quantities, frozen, and defrosted prior to assessment.

Serving sizes were always sufficient to allow 2–3 bites/sips per sample in all studies. Samples were always presented in cups labelled with 3-digit random codes, at room temperature, except for Study 7, where the orange-flavoured drink samples were presented at 10 °C.
Table 1
Overview of the seven studies comparing rate-all-that-apply (RATA) and check-all-that-apply (CATA) questions for product sensory characterization by consumers.

Study ID | Consumers completing the task (RATA) | Consumers completing the task (CATA) | Product category                 | Number of samples | Number of sensory terms
1        | 56                                   | 54                                   | Apple                            | 4                 | 16
2        | 52                                   | 53                                   | Apple                            | 4                 | 16
3        | 56                                   | 59                                   | Peanuts                          | 3                 | 12
4        | 60                                   | 56                                   | Tinned pineapple                 | 4                 | 12
5        | 101                                  | 102                                  | Raspberry coulis                 | 5                 | 12
6        | 54                                   | 55                                   | Fruitcake                        | 5                 | 12
7        | 53                                   | 49                                   | Orange-flavoured powdered drinks | 4                 | 16
[…] In the RATA variant, consumers rated the intensity of applicable terms using a 3-point structured scale ('low', 'medium' and 'high'). Consumers were told to leave the scale blank in the case of non-applicable terms.

The sensory terms used in each study were selected based on pilot work or previous research using the same product categories. The lists comprised 12 or 16 terms and covered multiple sensory modalities (appearance, aroma, flavour/taste, texture, aftertaste, mouthfeel) (available upon request). Based on recommendations by Ares, Etchemendy, et al. (2014), the order in which the terms were listed for both CATA and RATA questions was different for each product and each participant, following a Williams' Latin Square design.

Products were presented sequentially, following Williams' designs. Data collection took place in standard sensory booths, under white lighting, controlled temperature (20–23 °C) and airflow conditions.

2.4. Data analysis

In accordance with the stated aim, and drawing on past methodological research on CATA and related methods, a number of analyses were performed, focusing on sample characterization, discrimination, sample configurations and stability of the results. These analyses were performed using data from CATA and RATA questions.

RATA questions enabled two approaches to analysis: converting RATA data to CATA (RATA-as-CATA) by collapsing responses to two levels (0 if the attribute was not selected as applicable for describing the focal sample, or 1 if the attribute was selected as applicable, regardless of its intensity rating), or treating RATA data as continuous (RATA scores) by expanding the scale to 4 points (0–3) (Meyners et al., 2016).

2.4.1. Term usage

For each study, the frequency of use of each term for each sample in CATA and RATA-as-CATA questions was calculated by counting the number of participants who selected the term for describing each sample. Fisher's exact test (Fisher, 1954) was used to evaluate the existence of significant differences between CATA and RATA questions at the aggregate level and for each of the terms.

The frequency distribution of RATA scores at the aggregate level was determined. The mean RATA scores for each term and sample were graphed as a function of the percentage of consumers who selected the term using CATA questions, and the R² of the linear relationship was calculated.

2.4.2. Significant differences among samples

For CATA questions and RATA-as-CATA data, Cochran's Q test (Manoukian, 1986) was carried out to identify significant differences among samples for each of the sensory terms. Pairwise comparisons were performed using the sign test, as proposed by Meyners, Castura, and Carr (2013) and Meyners and Castura (2014).

RATA scores were analysed following the recommendations of Meyners et al. (2016). ANOVA was performed considering sample and consumer as fixed effects. In addition, t statistics for all pairwise comparisons of samples, based on the pooled variance, were computed.

2.4.3. Sample configurations

Correspondence analysis (CA) was performed on the frequency table of CATA and RATA-as-CATA data. CA was performed considering chi-square distances, as recommended by Vidal, Tárrega, Antúnez, Ares, and Jaeger (2015). Sample configurations were not obtained in Study 3, as only 3 samples were evaluated.

For RATA scores, sample configurations were obtained using Principal Component Analysis (PCA) on both the arithmetic mean values and Dravnieks' scores (Dravnieks, 1982). For each attribute and sample, the Dravnieks' score is calculated as the square root of the product of the proportion of consumers who selected the attribute for describing the sample and the average intensity score of those consumers (ignoring consumers who did not select the attribute). The rationale for considering Dravnieks' scores is that they may provide a better summary of the intensity scores obtained in a RATA question than arithmetic means, as they balance frequency of use and average intensity by taking into account only the scores given by those consumers who considered the attribute applicable for describing the focal sample, and weighting this by the usage rate.

Confidence ellipses around samples were constructed using a truncated total bootstrapping approach in which only the first two dimensions of the configurations were considered (Cadoret & Husson, 2013). Similarity between the sample and term configurations in the first two dimensions, obtained using data from CATA and RATA questions, was evaluated using the RV coefficient (Robert & Escoufier, 1976). In order to visually compare the configurations, a Procrustes rotation was used, considering the configuration obtained using CATA questions as reference.

2.4.4. Stability of the results

The stability of the results was evaluated using a bootstrapping re-sampling approach (Ares, Tárrega, Izquierdo, & Jaeger, 2014). The bootstrapping process consisted of extracting random subsets of different size (m = 5, 10, 15, 20, …, N) from the original data with N consumers, using sampling with replacement. For each m, 1000 random subsets were obtained and sample configurations were computed for each subset. The agreement between the sample and term configurations (first two dimensions) of each subset and the reference configuration (obtained with all the consumers) was evaluated by computing the RV coefficient between their coordinates. Average values for the 1000 random subsets of size equal to the total number of consumers in each study (N) were calculated and used as an index of stability.

All statistical analyses were performed using the R language (R Core Team, 2015). FactoMineR (Lê, Josse, & Husson, 2008) was used to perform CA and PCA and to calculate RV coefficients.

3. Results

3.1. Frequency of use of sensory terms and RATA scores

In six of the seven studies, consumers used a significantly larger number of terms (p < 0.0001) for describing samples when using RATA questions compared to CATA questions (Table 2a). In these studies, the percentage of terms for which frequency of use significantly increased ranged from 38% to 67% (Table 2b). The average increase in the frequency of use of the terms ranged between 17% and 55% (Table 2c). An exception to these trends was Study 3, in which no significant differences in the frequency of use of the terms between CATA and RATA questions were found, either at the aggregate level or for any of the individual terms (Table 2a and c). Compared to the rest of the studies, this study included fewer samples, with larger differences among them (three peanut samples: dry roasted, honey coated and salted).

For completeness, the distribution of RATA intensity scores is shown in Table 2d. In six of the seven studies (except Study 5) the middle point of the intensity scale (2: 'medium') was the most frequently used, reaching an average frequency of use that ranged from 13% to 21%. However, the 'low' intensity anchor was almost as frequently used.
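As a concrete illustration of the term-usage summaries described in Section 2.4.1 (and reported in Table 2 below), the following base-R sketch computes citation frequencies, the aggregate distribution of RATA intensity scores, Fisher's exact test on overall term usage, and the linear relationship between mean RATA scores and CATA citation proportions. The data frames cata and rata and their column names are hypothetical placeholders for the study data (one row per consumer × sample × term); this is a minimal sketch, not the authors' original analysis code.

```r
## Assumed (hypothetical) layout: one row per consumer x sample x term.
##   cata: consumer, sample, term, checked (0/1)
##   rata: consumer, sample, term, rating  (0 = not applicable, 1 = low, 2 = medium, 3 = high)

# RATA-as-CATA: collapse ratings to applicable (1) / not applicable (0)
rata$checked <- as.integer(rata$rating > 0)

# Citation frequency of each term for each sample (counts of consumers)
freq_cata <- xtabs(checked ~ sample + term, data = cata)
freq_rata <- xtabs(checked ~ sample + term, data = rata)

# Distribution of RATA intensity scores at the aggregate level (cf. Table 2d)
round(100 * prop.table(table(rata$rating)), 1)

# Fisher's exact test on aggregate term usage: selected vs. not selected, by question type
usage <- rbind(CATA = c(sum(cata$checked), sum(1 - cata$checked)),
               RATA = c(sum(rata$checked), sum(1 - rata$checked)))
fisher.test(usage)

# Linear relationship between mean RATA scores and CATA citation proportions
mean_rata <- aggregate(rating  ~ sample + term, data = rata, FUN = mean)
prop_cata <- aggregate(checked ~ sample + term, data = cata, FUN = mean)
both <- merge(mean_rata, prop_cata, by = c("sample", "term"))
summary(lm(rating ~ checked, data = both))$r.squared   # R^2 of the linear fit
```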
Table 2
Summary of results regarding term usage for sensory characterizations by consumers obtained with CATA and RATA questions across seven studies.

Study (product)       | a. Average % of terms used, CATA / RATA* | b. % of terms with significantly increased frequency of use under RATA | c. Average increase in frequency of use under RATA | d. Distribution of RATA intensity scores 0/1/2/3 (%)**
1 – Apple             | 27a / 39b | 63% | 55% | 61 / 12 / 18 / 9
2 – Apple             | 26a / 36b | 63% | 46% | 64 / 12 / 15 / 9
3 – Peanuts           | 38a / 38a |  0% | 10% | 62 / 14 / 16 / 8
4 – Tinned pineapple  | 41a / 51b | 58% | 33% | 51 / 18 / 21 / 10
5 – Raspberry coulis  | 30a / 36b | 67% | 24% | 64 / 16 / 13 / 7
6 – Fruitcake         | 44a / 51b | 42% | 17% | 49 / 19 / 20 / 12
7 – Powdered drinks   | 26a / 31b | 38% | 36% | 69 / 8 / 13 / 11

* Average percentage of terms used for describing samples; within a study, percentages with different letters are significantly different at p < 0.05 according to Fisher's exact test.
** For RATA scores, 0 = 'not applicable', 1 = 'low', 2 = 'medium' and 3 = 'high'.
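For reference, a minimal sketch of the per-term tests of Section 2.4.2, whose outcomes are summarised in Table 3: Cochran's Q test for binary CATA (or RATA-as-CATA) data, implemented directly from its standard formula, and a two-way fixed-effects ANOVA for RATA scores. The term 'Sweet' is used only as an example, and the data layout is the hypothetical one assumed in the previous sketch.

```r
# Cochran's Q test for one term; X is a binary consumers x samples matrix
cochran_q <- function(X) {
  k  <- ncol(X); Cj <- colSums(X); Ri <- rowSums(X); N <- sum(X)
  Q  <- k * (k - 1) * sum((Cj - N / k)^2) / sum(Ri * (k - Ri))
  c(Q = Q, df = k - 1, p.value = pchisq(Q, df = k - 1, lower.tail = FALSE))
}

X <- as.matrix(xtabs(checked ~ consumer + sample, data = subset(cata, term == "Sweet")))
cochran_q(X)

# RATA scores: ANOVA with sample and consumer as fixed effects (Meyners et al., 2016)
d   <- subset(rata, term == "Sweet")
fit <- aov(rating ~ factor(sample) + factor(consumer), data = d)
summary(fit)
```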
Table 3
Percentage of terms with significant differences among samples and percentage of significant pairwise comparisons for CATA questions, RATA questions (RATA scores), and RATA when treated as CATA (RATA-as-CATA), at a 5% significance level. A comparison of the results obtained using CATA and RATA questions is also shown (significant with both methods, with neither, with CATA only, or with RATA scores only).

Terms with significant differences among samples (%):
Study                 | CATA | RATA-as-CATA | RATA scores | Both | None | CATA only | RATA scores only
1 – Apple             | 69   | 63           | 69          | 56   | 19   | 13        | 13
2 – Apple             | 75   | 50           | 63          | 56   | 19   | 19        | 6
3 – Peanuts           | 83   | 75           | 92          | 83   | 8    | 0         | 8
4 – Tinned pineapple  | 100  | 75           | 83          | 83   | 0    | 17        | 0
5 – Raspberry coulis  | 75   | 58           | 50          | 50   | 25   | 25        | 0
6 – Fruit cake        | 100  | 92           | 100         | 100  | 0    | 0         | 0
7 – Powdered drinks   | 31   | 50           | 50          | 25   | 44   | 6         | 25

Significant pairwise comparisons (%):
Study                 | CATA | RATA-as-CATA | RATA scores | Both | None | CATA only | RATA scores only
1 – Apple             | 33   | 33           | 38          | 26   | 55   | 7         | 11
2 – Apple             | 35   | 22           | 31          | 24   | 68   | 11        | 6
3 – Peanuts           | 61   | 53           | 61          | 50   | 28   | 11        | 11
4 – Tinned pineapple  | 46   | 32           | 51          | 31   | 33   | 15        | 21
5 – Raspberry coulis  | 36   | 29           | 30          | 23   | 57   | 13        | 8
6 – Fruit cake        | 55   | 53           | 63          | 46   | 28   | 9         | 17
7 – Powdered drinks   | 27   | 34           | 34          | 18   | 56   | 9         | 17
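The pairwise comparisons summarised in Table 3 rest on the sign test for the binary data and on pooled-variance t statistics for RATA scores (Section 2.4.2). A sketch under the same assumptions follows; the sample codes "S1" and "S2", like the term "Sweet", are placeholders rather than actual study labels.

```r
# Sign test between two samples for one term (binary CATA or RATA-as-CATA data)
sign_test <- function(d, a_term, s1, s2) {
  x  <- subset(d, term == a_term & sample == s1)[, c("consumer", "checked")]
  y  <- subset(d, term == a_term & sample == s2)[, c("consumer", "checked")]
  xy <- merge(x, y, by = "consumer", suffixes = c(".1", ".2"))
  n10 <- sum(xy$checked.1 == 1 & xy$checked.2 == 0)   # checked for s1 only
  n01 <- sum(xy$checked.1 == 0 & xy$checked.2 == 1)   # checked for s2 only
  if (n10 + n01 == 0) return(NA)
  binom.test(n10, n10 + n01, p = 0.5)$p.value
}
sign_test(cata, "Sweet", "S1", "S2")

# Pooled-variance t statistic for RATA scores (one term, samples S1 vs. S2)
d   <- subset(rata, term == "Sweet")
fit <- aov(rating ~ factor(sample) + factor(consumer), data = d)
mse <- deviance(fit) / df.residual(fit)                        # pooled error variance
m   <- tapply(d$rating, d$sample, mean)
n   <- tapply(d$rating, d$sample, length)
t12 <- (m["S1"] - m["S2"]) / sqrt(mse * (1 / n["S1"] + 1 / n["S2"]))
p12 <- 2 * pt(abs(t12), df.residual(fit), lower.tail = FALSE)  # two-sided p-value
```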
Table 4
Overview of differences in sample discrimination by CATA and RATA questions, both in terms of ability to identify significant differences among samples and the number of significant pairwise comparisons, for each of the seven consumer studies. For each term, the maximum and minimum frequency of use across samples (%, CATA questions) and the maximum and minimum mean score across samples (RATA questions) are shown between brackets.

Study 1 – Apple
  Greater discrimination with CATA: Astringent/drying (15–42 / 0.1–0.3), Pink-ish coloured skin (2–22 / 0.3–0.5), Tough skin (30–65 / 1.1–1.7)
  Greater discrimination with RATA: Firm (7–44 / 0.4–1.2), Floral (20–28 / 0.3–0.9), Green/grassy (6–16 / 0.1–0.4)

Study 2 – Apple
  Greater discrimination with CATA: Bland (6–23 / 0.2–0.5), Firm (17–38 / 0.4–0.8), Lingering flavour (11–32 / 0.4–0.5), Tough skin (19–53 / 0.5–1.1), Uneven spread of skin colours (11–45 / 0.4–0.9)
  Greater discrimination with RATA: Pink-ish coloured skin (9–17 / 0.3–0.6), Sweet (26–72 / 0.8–1.8)

Study 3 – Peanuts
  Greater discrimination with CATA: Bland (0–58 / 0.1–1.3), Crunchy (80–98 / 1.7–2.1), Roasted (58–85 / 1.0–1.4)
  Greater discrimination with RATA: Visible spices/salt (51–66 / 0.8–1.4)

Study 4 – Tinned pineapple
  Greater discrimination with CATA: Fibrous (21–41 / 0.9–1.2), Fresh (16–36 / 0.6–0.7), Mango flavour (0–2 / 0.1–0.3), Off-flavour (14–38 / 0.4–0.9)
  Greater discrimination with RATA: Crunchy (34–63 / 0.6–1.4), Flavoursome (16–50 / 0.6–1.3), Juicy (45–71 / 1.0–1.7), Pineapple flavour (55–79 / 1.1–1.7), Ripe (29–54 / 0.7–1.6), Sweet (37–73 / 0.7–1.8), Yellow colour (48–86 / 0.8–2.7)

Study 5 – Raspberry coulis
  Greater discrimination with CATA: Boysenberry (17–35 / 0.4–0.7), Fruitiness (19–52 / 0.9–1.3), Green grape (9–24 / 0.2–0.4), Green stalks (9–28 / 0.1–0.4), Plum (15–35 / 0.3–0.5), Sweet (9–58 / 0.2–1.2)
  Greater discrimination with RATA: Jammy (10–39 / 0.4–0.8), Sour (41–92 / 0.8–2.8), Strawberry (6–23 / 0.1–0.4)

Study 6 – Fruit cake
  Greater discrimination with CATA: Crumbly (3–31 / 0.1–0.5), Dates (13–53 / 0.2–0.9), Doughy (22–62 / 0.6–1.1)
  Greater discrimination with RATA: Baking spices (22–75 / 0.3–2.0), Fruity (18–75 / 0.5–2.3), Golden syrup (36–54 / 0.7–1.6), Moist (27–76 / 0.7–1.9), Raisins/sultanas (36–98 / 0.5–2.6), Sticky (40–91 / 0.8–2.0)

Study 7 – Powdered drinks
  Greater discrimination with CATA: Mandarin flavour (10–31 / 0.4–0.8)
  Greater discrimination with RATA: Sour (20–88 / 0.5–2.2), Intense flavour (41–59 / 0.8–1.5), Off-flavour (12–31 / 0.3–0.4), Smooth (2–12 / 0.1–0.6), Concentrated (35–53 / 0.4–1.3), Diluted (0–10 / 0.0–0.8)
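Section 3.3 below compares sample and term configurations obtained from CATA and RATA data via RV coefficients (Table 5). The following sketch outlines that step as described in Section 2.4.3, using FactoMineR for CA and PCA and a small helper for the RV coefficient; the RATA-score matrices (arithmetic means and Dravnieks' scores) are derived from the same hypothetical data layout as above, and the code is illustrative rather than the authors' own.

```r
library(FactoMineR)

# RV coefficient between two configurations (Robert & Escoufier, 1976)
rv_coef <- function(X, Y) {
  X <- scale(as.matrix(X), scale = FALSE); Y <- scale(as.matrix(Y), scale = FALSE)
  A <- tcrossprod(X); B <- tcrossprod(Y)           # X X' and Y Y'
  sum(A * B) / sqrt(sum(A * A) * sum(B * B))
}

# CATA: correspondence analysis on the sample x term frequency table
ca_cata <- CA(as.data.frame.matrix(freq_cata), graph = FALSE)

# RATA scores: PCA on arithmetic means and on Dravnieks' scores
n_st     <- xtabs(~ sample + term, data = rata)                   # consumers per cell
mean_mat <- xtabs(rating  ~ sample + term, data = rata) / n_st    # arithmetic mean (0-3)
p_sel    <- xtabs(checked ~ sample + term, data = rata) / n_st    # proportion selecting
m_sel    <- xtabs(rating  ~ sample + term, data = rata) /
            pmax(xtabs(checked ~ sample + term, data = rata), 1)  # mean among selectors
drav     <- sqrt(p_sel * m_sel)                                   # Dravnieks' score

pca_mean <- PCA(as.data.frame.matrix(mean_mat), graph = FALSE)
pca_drav <- PCA(as.data.frame.matrix(drav),     graph = FALSE)

# RV between sample configurations in the first two dimensions (cf. Table 5a)
rv_coef(ca_cata$row$coord[, 1:2], pca_mean$ind$coord[, 1:2])
```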
3.3. Sample and term configurations

Across the six studies for which sample configurations were obtained, the percentage of variance explained by the first two dimensions of Correspondence Analysis (CA) and Principal Component Analysis (PCA) ranged between 75.3% and 95.7%.

As shown in Table 5a, in four of the six studies (Studies 2, 5, 6 and 7) sample configurations from CATA and RATA questions tended to be similar, regardless of the data analysis approach for RATA. In these studies the RV coefficients between sample configurations obtained using CATA and RATA were higher than 0.80. In the remaining two studies (1 and 4) the RV coefficients between sample configurations were lower than 0.80, which indicates potential differences between RATA and CATA questions in the conclusions regarding similarities and differences among samples.

Fig. 3 shows sample configurations for four studies. Sample configurations were almost identical for CATA and RATA questions. As an example, Fig. 3a and d show the high similarity of sample configurations obtained in Studies 1 and 6, respectively. Differences in conclusions regarding similarities and differences were identified in some of the studies, even when the RV coefficients between sample configurations were high. For example, in Study 2, when samples were evaluated using CATA questions, samples S2 and S3 were located away from the other two samples (Fig. 3b). The same information was also obtained when RATA data were treated as CATA. However, when RATA scores were considered and analysed using PCA on arithmetic or Dravnieks' means, sample S2 was located apart from sample S3 and close to the other two samples. Similarly, in Study 4 sample S3 was located close to sample S2 when RATA scores were considered, whereas it was located on the opposite side of the first dimension in the CA performed on data from CATA questions or when RATA data were treated as CATA (Fig. 3c).

Regarding sample discrimination in the first two dimensions of the sample configurations, no clear superiority of one methodology over the other was observed. In some of the studies sample discrimination was higher for RATA based on intensity scores than for CATA questions. For example, in Studies 4 and 6 (Fig. 3c and d, respectively) the confidence ellipses of some of the samples overlapped in the sample configurations obtained using CATA questions, whereas they did not overlap in the configurations obtained by analysing the scores from RATA questions.
Table 5
Summary of results regarding sample and term configurations for sensory characterizations with consumers obtained with CATA and RATA questions across six of the seven studies. RATA data were analysed as CATA (RATA-as-CATA) and using PCA based on mean RATA scores (RATA PCA) and Dravnieks' means (RATA Dravnieks). Columns correspond to Studies 1 (Apple), 2 (Apple), 4 (Tinned pineapple), 5 (Raspberry coulis), 6 (Fruitcake) and 7 (Powdered drinks).

a. RV between sample configurations        | 1    | 2    | 4    | 5    | 6    | 7
  CATA vs. RATA-as-CATA                     | 0.75 | 0.99 | 0.80 | 0.96 | 0.99 | 0.98
  CATA vs. RATA PCA                         | 0.71 | 0.84 | 0.63 | 0.96 | 0.88 | 0.94
  CATA vs. RATA Dravnieks                   | 0.61 | 0.81 | 0.64 | 0.96 | 0.88 | 0.95
  RATA-as-CATA vs. RATA PCA                 | 0.90 | 0.84 | 0.76 | 0.99 | 0.85 | 0.97
  RATA-as-CATA vs. RATA Dravnieks           | 0.89 | 0.85 | 0.73 | 0.99 | 0.86 | 0.97
  RATA PCA vs. RATA Dravnieks               | 0.98 | 0.94 | 0.99 | 1.00 | 1.00 | 1.00

b. RV between term configurations          | 1    | 2    | 4    | 5    | 6    | 7
  CATA vs. RATA-as-CATA                     | 0.34 | 0.57 | 0.56 | 0.92 | 0.95 | 0.77
  CATA vs. RATA PCA                         | 0.43 | 0.38 | 0.23 | 0.72 | 0.71 | 0.52
  CATA vs. RATA Dravnieks                   | 0.40 | 0.34 | 0.20 | 0.72 | 0.71 | 0.53
  RATA-as-CATA vs. RATA PCA                 | 0.71 | 0.67 | 0.49 | 0.76 | 0.60 | 0.78
  RATA-as-CATA vs. RATA Dravnieks           | 0.72 | 0.71 | 0.46 | 0.77 | 0.60 | 0.79
  RATA PCA vs. RATA Dravnieks               | 0.98 | 0.92 | 0.99 | 1.00 | 0.99 | 0.97

Stability of configurations (average RV coefficient across simulations for the total number of consumers):

c. Sample configurations                    | 1    | 2    | 4    | 5    | 6    | 7
  CATA                                      | 0.97 | 0.94 | 0.93 | 0.98 | 0.98 | 0.96
  RATA-as-CATA                              | 0.95 | 0.93 | 0.90 | 0.97 | 0.98 | 0.98
  RATA PCA                                  | 0.92 | 0.86 | 0.94 | 0.94 | 0.95 | 0.95
  RATA Dravnieks                            | 0.91 | 0.82 | 0.93 | 0.94 | 0.99 | 0.96

d. Term configurations                      | 1    | 2    | 4    | 5    | 6    | 7
  CATA                                      | 0.85 | 0.83 | 0.90 | 0.95 | 0.96 | 0.85
  RATA-as-CATA                              | 0.86 | 0.81 | 0.84 | 0.93 | 0.97 | 0.87
  RATA PCA                                  | 0.83 | 0.72 | 0.87 | 0.90 | 0.94 | 0.77
  RATA Dravnieks                            | 0.81 | 0.68 | 0.86 | 0.90 | 0.93 | 0.78

Notes. Study 3 was excluded from these analyses because only 3 samples were evaluated.
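Table 5c and d report, for each study, the average RV coefficient between configurations obtained from bootstrap subsets and the reference configuration (Section 2.4.4). The resampling scheme can be sketched as follows for the CATA data of one study, reusing rv_coef() and the hypothetical data layout from the previous sketches; subsets of size m < N and the term configurations are handled analogously, and the seed is arbitrary.

```r
set.seed(2016)                                   # arbitrary seed for reproducibility
consumers <- unique(cata$consumer)
N <- length(consumers)

ref <- CA(as.data.frame.matrix(xtabs(checked ~ sample + term, data = cata)),
          graph = FALSE)$row$coord[, 1:2]        # reference sample configuration

boot_rv <- replicate(1000, {
  picked <- sample(consumers, N, replace = TRUE) # virtual panel of size N
  resamp <- do.call(rbind, lapply(picked, function(id) cata[cata$consumer == id, ]))
  tab    <- xtabs(checked ~ sample + term, data = resamp)
  tab    <- tab[, colSums(tab) > 0, drop = FALSE]   # CA needs non-empty columns
  conf   <- CA(as.data.frame.matrix(tab), graph = FALSE)$row$coord[, 1:2]
  rv_coef(ref, conf)
})

mean(boot_rv)                                    # stability index of the kind reported in Table 5c
```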
On the contrary, in other studies the opposite trend was found, as exemplified in Fig. 3b for Study 2.

Finally, it can also be seen from Fig. 3b and c that in several studies the confidence ellipses obtained using truncated total bootstrapping tended to be larger in the configurations obtained through the analysis of RATA scores than in the configurations obtained using CATA data or by treating RATA-as-CATA.

The RV coefficients between term configurations in the first and second dimensions obtained from CATA and RATA, analysed using PCA on arithmetic and Dravnieks' means, were lower than those from sample configurations (Table 5a and b). In four of the six studies the RV coefficients between term configurations of CATA and RATA were lower than 0.60, which indicates differences in the way in which consumers used the terms for describing samples with the two methodologies. In Studies 5 and 6 the RV coefficients between term configurations from CATA questions and RATA-as-CATA were higher than 0.90, suggesting good agreement. In these studies the RV coefficients between term configurations of CATA and RATA scores were close to 0.70, which indicated moderate agreement.

Regarding the different statistical approaches used to analyse RATA scores, results did not largely differ. Sample configurations obtained using PCA on arithmetic or Dravnieks' means were highly similar, as evidenced by RV coefficients higher than 0.94 in five of the six studies (Table 5a). When RATA data were considered as CATA and analysed using CA, sample configurations were similar to those obtained using PCA on arithmetic or Dravnieks' RATA means (RV > 0.73). However, when RATA data were analysed as CATA, the RV coefficients between term configurations were lower, particularly in Studies 4 and 6 (Table 5b).

As shown in Table 5c and d, the average RV coefficients of sample and term configurations for a sample size equal to the total number of consumers in the studies were similar for CATA and RATA questions in the six studies, regardless of the approach used for RATA data analysis. However, in Study 2, the average RV coefficients of sample and term configurations tended to be lower for RATA questions, analysed using both arithmetic and Dravnieks' means, compared to the simple CATA questions. A similar trend was observed for term configurations in Study 7. In contrast, none of the studies showed a clearly superior stability of sample or term configurations for RATA over CATA questions. However, differences in the type of data involved in CATA and RATA should be taken into account. As RATA is analysed considering a 4-point scale, variability can and usually will be larger than for CATA, which consists of binary data. Therefore, the variability of RATA scores encountered in the bootstrapping approach can be expected to be higher than that of CATA data, such that a lower average RV coefficient across simulations is to be expected. This is also supported by the fact that the RV coefficients for sample as well as term configurations from CA treating RATA-as-CATA are typically larger (similar if the RV approaches 1) than those using PCA on RATA data.

4. Discussion

The present work further explored the use of RATA questions, a rating variant of CATA questions, for sensory characterization with consumers by comparing RATA and CATA across seven studies with different product categories. The key findings are discussed below.

4.1. CATA and RATA term use and perceived attribute intensity

With regard to term use, it was found in six of the seven studies that asking consumers to rate the intensity of the terms they selected as applicable (i.e., RATA) led to an increase in the total number of selected terms, confirming results from previous studies (Ares, Bruzzone, et al., 2014). The increase in frequency of use was found for the majority of the terms included in the CATA/RATA question. These results can be attributed to two potential effects. First, the greater cognitive effort necessary to answer RATA questions compared to CATA questions may have discouraged consumers from using satisficing response strategies (Sudman & Bradburn, 1992). Secondly, the use of a rating step may have caused a change in the cognitive strategy used by consumers to evaluate the samples.
[…] A strong linear relationship was observed between mean RATA scores and CATA term citation frequencies, in agreement with previous studies that compared frequency of use of CATA terms and attribute intensities measured using structured and unstructured scales (Ares et al., 2015; Bruzzone et al., 2012, 2015). This suggests that the concern that prompted the development of hybrid CATA-rating methods in the first instance, including RATA, may be exaggerated. Albeit in an indirect manner, the use of CATA questions by consumers can deliver measurements of perceived intensity of sensory attributes.

4.2. Sample characterisation and discrimination

Across the seven studies, CATA and RATA questions did not differ in the identification of the most salient sensory characteristics of the products, and conclusions regarding significant differences among samples were identical for the majority of the terms and pairwise comparisons. Sample and term configurations obtained using frequency of term use in CATA questions and RATA scores were also similar. Thus, neither method was clearly superior to the other, in agreement with results reported by Reinbach et al. (2014).

As possibly indicated by the results of Study 3, there were test-specific situations where RATA did not lead to greater sample discrimination than CATA, such as when a few samples with large differences are tested using a low number of terms. This was a distinguishing feature of Study 3 relative to the other six studies. In such instances, the additional attention to the task that RATA encourages may not be needed.

In instances where differences between CATA and RATA results were found, a few general trends were noted. CATA questions tended to be slightly more discriminative than RATA questions for terms related to minor sensory characteristics, attributes that appeared in low intensity for only a few samples, and attributes that may be less simple for consumers to quantify. If this result is robust and CATA/RATA method differences systematically exist for attributes that are present in low intensities and possibly only for a subset of samples, are hard for consumers to quantify, or are possibly ambiguous, then it may also be relevant to consider whether a need exists for revision of the terms used in a focal study. Should terms that are ambiguous (e.g., off-flavour) be used at all? What about specific and technical sensory terms such as pungent, mineral or brittle? Are they compatible with the seemingly agreed upon but ill-defined notion that the terms used in CATA/RATA should be "consumer friendly"?

RATA tended to be slightly more discriminative than CATA for terms related to salient sensory characteristics that were more or less applicable to describe all samples. These characteristics may be familiar or common to participants, who may find it easy to quantify their intensity using rating scales. Therefore, in terms of sample discrimination, RATA questions may offer a slight improvement over CATA questions for sample sets with subtle perceptual differences, in which samples are not expected to differ in the type of terms that apply but in the intensity of their sensory characteristics. Further research seems necessary to confirm this hypothesis.

Overall, although RATA and CATA tended to perform similarly, they may have strengths for different types of attributes when it comes to identifying differences among samples. Evidence of this had not been reported earlier. In this sense, further research comparing the discrimination of CATA and RATA questions for specific sensory characteristics with other methodologies can contribute to our understanding of the differences between the methods, enabling informed decisions by practitioners. For example, it would be relevant to establish whether RATA yields sensory profiles that are more similar to those from descriptive panels, which could point to the method placing consumers in a more analytical mind-set. It would also be relevant to establish whether sensory characteristics present in all the samples in a study, say sweet and acid/sour in apple, are always best assessed with RATA. Tentatively, this may be the case when differences in attribute intensities between samples are rather "smaller" than "larger", but how big these magnitudes of difference should be is unknown and likely to be product specific.

Despite somewhat higher sample discrimination for certain attributes being achievable by RATA, such an outcome is not necessarily better in consumer testing. For example, RATA questions may place consumers in a mind-set where they pay greater attention to attributes than they would in natural eating situations, making the method too sensitive. Conversely, CATA questions may allow a more spontaneous evaluation, but be somewhat less sensitive because they encourage less attention to samples. Although the use of different testing protocols to enhance detection of sample differences has previously been shown (e.g., for hedonic testing using monadic vs. side-by-side sample presentation: McBride, 1986), it was noted long ago that the practical importance of these differences could be exaggerated (Amerine, Pangborn, & Roessler, 1965).

4.3. Recommendations regarding the analysis of RATA data

Results from the present work also provided recommendations for the analysis of RATA data. First of all, treating RATA-as-CATA led to a consistent decrease in sample discrimination. Although analysing RATA questions as if they were CATA questions is a practice previously reported in the literature (e.g., Oppermann et al., 2017), our results suggest that practitioners should refrain from using this approach.

However, no major differences were found between the sample and term configurations obtained using PCA on arithmetic or Dravnieks' means, in agreement with results reported by Meyners et al. (2016). Therefore, considering that no evidence was obtained justifying the extra computational effort involved in the calculation of Dravnieks' means, sample and term configurations from RATA data can be obtained using arithmetic means.

Further research on how to analyse RATA data is, however, still necessary. One of the specific aspects that should be investigated is how to convert RATA intensity responses to numerical values. In the present work it was decided, somewhat arbitrarily, to assign 'not applicable' to 0, 'low intensity' to 1, 'moderate intensity' to 2 and 'high intensity' to 3. Whether this is appropriate is open for discussion. Labelled magnitude scales (e.g., the Labelled Magnitude Scale (LMS: Green et al., 1996) or the Labelled Affective Magnitude scale (LAM: Cardello & Schutz, 2004)) show that intensity categories are not linearly spaced. It is also not obvious that setting 'not applicable' to 0 is "correct" in the sense that it equates the distance between 'not applicable' and 'low' intensity to the distance between 'low' and 'medium' intensity. Besides, the 0 on this scale does not necessarily mean complete absence of the attribute; it could mean that the intensity is "below a certain threshold", and that threshold might differ from the one that assessors would apply in a classical descriptive analysis due to the nature of the task. To investigate whether equating a non-ticked attribute to a 0 value is a reasonable assumption would require, for instance, a comparison between RATA and a descriptive analysis with a similar group, on a 0–4 scale. Doing so is beyond the scope of this research.

5. Conclusions

To the best of our knowledge, this is the most exhaustive comparison of CATA and RATA to date. Results from this work show that RATA questions are not necessarily an improvement over CATA questions in terms of sample discrimination and attribute intensity measurement, thereby failing to confirm the superiority reported by Ares, Bruzzone et al. (2014). Both methodologies seem to be able to identify the main similarities and differences in the sensory characteristics of samples, but may have advantages for the identification of differences among samples in different types of attributes.
The decision to add a rating step to a CATA question depends on the aim of the study and the specific characteristics of the sample set. In cases where differences among samples rely on the absence or presence of the attributes on the list, CATA questions should be preferred, as this is a less analytical and thus more natural task for consumers. RATA questions may only be recommended when the aim of the study is to assess sets of samples which differ in the relative intensity of salient sensory characteristics that are familiar to consumers and apply to describe most of the focal samples. The results also indicate that collecting RATA data but analysing them as CATA data should be avoided.

Acknowledgements

Staff at Plant & Food Research are thanked for help in planning and collection of data, in particular S.L. Chheang, D. Jin, M.K. Beresford, and K. Kam. Financial support was received from Comisión Sectorial de Investigación Científica (Universidad de la República – Uruguay) and the New Zealand Ministry for Business, Innovation & Employment and Plant & Food Research.

References

Amerine, M. A., Pangborn, R. M., & Roessler, E. B. (1965). Principles of sensory evaluation of food (p. 427). New York: Academic Press.
Ares, G. (2015). Methodological challenges in sensory characterization. Current Opinion in Food Science, 3, 1–5.
Ares, G., Antúnez, L., Bruzzone, F., Vidal, L., Giménez, A., Pineau, B., et al. (2015). Comparison of sensory product profiles generated by trained assessors and consumers using CATA questions: Four case studies with complex and/or similar samples. Food Quality and Preference, 45, 75–86.
Ares, G., Antúnez, L., Giménez, A., Roigard, C. M., Pineau, B., Hunter, D. C., et al. (2014). Further investigations into the reproducibility of check-all-that-apply (CATA) questions for sensory product characterization elicited by consumers. Food Quality and Preference, 36, 111–121.
Ares, G., Bruzzone, F., Vidal, L., Cadena, R. S., Giménez, A., Pineau, B., et al. (2014b). Evaluation of a rating-based variant of Check-All-That-Apply questions: Rate-All-That-Apply (RATA). Food Quality and Preference, 36, 87–95.
Ares, G., Etchemendy, R., Antúnez, L., Vidal, L., Giménez, A., & Jaeger, S. (2014). Visual attention by consumers to check-all-that-apply questions: Insights to support methodological development. Food Quality and Preference, 32, 210–220.
Ares, G., & Jaeger, S. R. (2015). Check-all-that-apply (CATA) questions with consumers in practice. Experimental considerations and impact on outcome. In J. Delarue, J. B. Lawlor, & M. Rogeaux (Eds.), Rapid sensory profiling techniques and related methods (pp. 227–245). Sawston, Cambridge: Woodhead Publishing.
Ares, G., Tárrega, A., Izquierdo, L., & Jaeger, S. R. (2014). Investigation of the number of consumers necessary to obtain stable sample and descriptor configurations from check-all-that-apply (CATA) questions. Food Quality and Preference, 31, 135–141.
Ares, G., & Varela, P. (2014). Comparison of novel methodologies for sensory characterization. In P. Varela & G. Ares (Eds.), Novel techniques in sensory characterization and consumer profiling (pp. 365–389). Boca Raton: CRC Press.
Bruzzone, F., Ares, G., & Giménez, A. (2012). Consumers' texture perception of milk desserts. II – Comparison with trained assessors' data. Journal of Texture Studies, 43, 214–226.
Bruzzone, F., Vidal, L., Antúnez, L., Giménez, A., Deliza, A., & Ares, G. (2015). Comparison of intensity scales and CATA questions in new product development: Sensory characterisation and directions for product reformulation of milk desserts. Food Quality and Preference, 44, 183–193.
Cadena, R. S., Caimi, D., Jaunarena, I., Lorenzo, I., Vidal, L., Ares, G., et al. (2014). Comparison of rapid sensory characterization methodologies for the development of functional yogurts. Food Research International, 64, 446–455.
Cadoret, M., & Husson, F. (2013). Construction and evaluation of confidence ellipses applied at sensory data. Food Quality and Preference, 28, 106–115.
Cardello, A. V., & Schutz, H. G. (2004). Numerical scale point locations for constructing the LAM (labeled affective magnitude) scale. Journal of Sensory Studies, 19, 341–346.
Dehlholm, C., Brockhoff, P. B., Mejnert, L., Aaslyng, M. D., & Bredie, W. L. P. (2012). Rapid descriptive sensory methods – comparison of free multiple sorting, partial napping, napping, flash profiling and conventional profiling. Food Quality and Preference, 26, 267–277.
Dooley, L., Lee, Y. S., & Meullenet, J. F. (2010). The application of check-all-that-apply (CATA) consumer profiling to preference mapping of vanilla ice cream and its comparison to classical external preference mapping. Food Quality and Preference, 21, 394–401.
Dos Santos, B. A., Bastianello Campagnol, P. C., da Cruz, A. G., Galvão, M. T. E. L., Monteiro, R. A., Wagner, R., et al. (2015). Check all that apply and free listing to describe the sensory characteristics of low sodium dry fermented sausages: Comparison with trained panel. Food Research International, 76, 725–734.
Dravnieks, A. (1982). Odor quality: Semantically generated multidimensional profiles are stable. Science, 218, 799–801.
Fisher, R. A. (1954). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Franco-Luesma, E., Sáenz-Navajas, M.-P., Valentin, D., Ballester, J., Rodrigues, H., & Ferreira, V. (2016). Study of the effect of H2S, MeSH and DMS on the sensory profile of wine model solutions by Rate-All-That-Apply (RATA). Food Research International, 87, 152–160.
Giacalone, D., & Hedelund, P. I. (2016). Rate-all-that-apply (RATA) with semi-trained assessors: An investigation of the method reproducibility at assessor-, attribute- and panel-level. Food Quality and Preference, 51, 65–71.
Green, B., Dalton, P., Cowart, B., Shaffer, G., Rankin, K., & Higgins, J. (1996). Evaluating the 'Labeled Magnitude Scale' for measuring sensations of taste and smell. Chemical Senses, 21, 323–334.
Jaeger, S. R., & Ares, G. (2014). Lack of evidence that concurrent sensory product characterisation using CATA questions bias hedonic scores. Food Quality and Preference, 35, 1–5.
Jaeger, S. R., Chheang, S. L., Yin, J., Bava, C. M., Gimenez, A., Vidal, L., et al. (2013). Check-all-that-apply (CATA) responses elicited by consumers: Within-assessor reproducibility and stability of sensory product characterizations. Food Quality and Preference, 30, 56–67.
Jaeger, S. R., Giacalone, D., Roigard, C. M., Pineau, B., Vidal, L., Giménez, A., et al. (2013). Investigation of bias of hedonic scores when co-eliciting product attribute information using CATA questions. Food Quality and Preference, 30, 242–249.
Lawless, H. T., & Heymann, H. (2010). Sensory evaluation of food. Principles and practices (2nd ed.). New York: Springer.
Lê, S., Josse, J., & Husson, F. (2008). FactoMineR: An R package for multivariate analysis. Journal of Statistical Software, 25(1), 1–18.
Lelièvre, M., Chollet, S., Abdi, H., & Valentin, D. (2008). What is the validity of the sorting task for describing beers? A study using trained and untrained assessors. Food Quality and Preference, 19, 697–703.
Manoukian, E. B. (1986). Mathematical nonparametric statistics. New York, NY: Gordon & Breach.
McBride, R. L. (1986). Hedonic rating of food: single or side-by-side sample presentation? Journal of Food Technology, 21, 355–363.
Meiselman, H. L. (2013). The future in sensory/consumer research: …evolving to a better science. Food Quality and Preference, 27, 208–214.
Meyners, M., Castura, J. C., & Carr, B. T. (2013). Existing and new approaches for the analysis of CATA data. Food Quality and Preference, 30, 309–319.
Meyners, M., & Castura, J. C. (2014). Check-all-that-apply questions. In P. Varela & G. Ares (Eds.), Novel techniques in sensory characterization and consumer profiling (pp. 271–305). Boca Raton, FL: CRC Press.
Meyners, M., Jaeger, S. R., & Ares, G. (2016). On the analysis of Rate-All-That-Apply (RATA) data. Food Quality and Preference, 49, 1–10.
Moussaoui, K. A., & Varela, P. (2010). Exploring consumer product profiling techniques and their linkage to a quantitative descriptive analysis. Food Quality and Preference, 21, 1088–1099.
Oppermann, A. K. L., de Graaf, C., Scholten, E., Stieger, M., & Piqueras-Fiszman, B. (2017). Comparison of Rate-All-That-Apply (RATA) and Descriptive sensory Analysis (DA) of model double emulsions with subtle perceptual differences. Food Quality and Preference, 56, 55–68.
R Core Team (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Reinbach, H. C., Giacalone, D., Ribeiro, L. M., Bredie, W. L. P., & Frøst, M. B. (2014). Comparison of three sensory profiling methods based on consumer perception: CATA, CATA with intensity and Napping®. Food Quality and Preference, 32, 160–166.
Robert, P., & Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: The RV coefficient. Applied Statistics, 25, 257–265.
Sudman, S., & Bradburn, N. M. (1992). Asking questions. San Francisco, CA: Jossey-Bass.
Varela, P., & Ares, G. (2012). Sensory profiling, the blurred line between sensory and consumer science. A review of novel methods for product characterization. Food Research International, 48, 893–908.
Vidal, L., Tárrega, A., Antúnez, L., Ares, G., & Jaeger, S. R. (2015). Comparison of Correspondence Analysis based on Hellinger and chi-square distances to obtain sensory spaces from Check-All-That-Apply (CATA) questions. Food Quality and Preference, 43, 106–112.
Waehrens, S. S., Zhang, S., Hedelund, P. I., Petersen, M. A., & Byrne, D. V. (2016). Application of the fast sensory method 'Rate-All-That-Apply' in chocolate Quality Control compared with DHS-GC-MS. International Journal of Food Science and Technology, 51, 1877–1887.