Public Transport Service Quality Analysis
Transport Policy
journal homepage: www.elsevier.com/locate/tranpol
Article info

Article history:
Received 5 January 2016
Received in revised form 11 March 2016
Accepted 4 April 2016

Keywords:
Public transport
Customer satisfaction Surveys (CSS)
Service Quality (SQ)
User Perception
Factorial Analysis
MIMIC models

Abstract

Customer Satisfaction Surveys (CSS) have become an important tool for public transport planners, as improvements in the perceived quality of certain service attributes can lead to greater use of public transport and lower traffic pollution. The literature shows that the importance of quality attributes has until now been estimated indirectly, as they are derived from the Customer Satisfaction Index using various different and complex techniques. Little work has been dedicated to its direct estimation (stated importance) by designing ad-hoc surveys, an approach that represents a considerable reduction in the length of the questionnaire.
This paper contributes to the limited existing literature by developing a survey technique based on hierarchy processes to estimate the stated importance of quality attributes, and compares the results with the derived importance obtained using conventional surveys with the same sample. The added value of this research is that it provides the first comparison between two quality survey methods using the same real case study in Madrid (Spain). The results achieved using this pioneer survey method (293 valid questionnaires) were validated using conventional face-to-face surveys (520 valid questionnaires). Factorial analysis, multiple regression analysis and Multiple Indicators Multiple Causes (MIMIC) models were applied to the conventional survey sample to analyse and derive the importance of the attributes. The results clearly show that, after a few teething troubles, the stated importance of quality attributes can be estimated directly, thus providing transport management companies with a simple and useful tool to implement in their Customer Satisfaction Surveys (CSS), and narrowing the gap between practitioners’ needs and scientific research.
© 2016 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tranpol.2016.04.003
0967-070X/© 2016 Elsevier Ltd. All rights reserved.
B. Guirao et al. / Transport Policy 49 (2016) 68–77 69
survey depends strongly on the approach used to estimate the relative importance of the attributes to the customers. In conventional CSS designed by companies to obtain a general satisfaction index (CSI), it is necessary to consider both the attribute-performance rating and attribute-importance measures when the operator's priority is to improve or sustain the current overall SQ. This dual target often requires a long questionnaire, although researchers only use the results of the first part (attribute-performance rating), as the attribute importance can be indirectly derived from the attribute-performance rating. This problem has already been debated in the literature, as described below.
Weinstein (2000) was the first author to clearly distinguish two main approaches to estimate attribute importance: stated importance and derived importance. Stated importance involves asking customers to rate each attribute on a scale of importance; this is the more intuitive and direct of the two methods, but requires a significant increase in the length of the questionnaire (which can lower the overall response rate and the accuracy of the survey). It can also sometimes fail to differentiate sufficiently between mean importance ratings; if customers score nearly all the measures near the top of the scale, certain attributes may be rated as important even though they in fact have little influence on overall satisfaction. As this is the more intuitive and direct of the two methods, operating companies have tended to use this type of questionnaire, while the scientific research has focused on more complex methodologies using the derived importance approach.
The derived importance approach is less intuitive and is based on “deriving” a measure of attribute importance by statistically testing the strength of the relationship of individual attributes with overall satisfaction. A simple conventional attribute rating survey is needed to derive importance, and this type of questionnaire is always included in the CSS. Recent literature is now set on seeking other alternatives to the methods commonly used until now to derive importance, namely: (a) bivariate Pearson correlations, (b) factor analysis, and (c) multiple regression analysis. These other alternatives include structural equation models (SEM), based on a multivariate technique combining regression, factor analysis and analysis of variance to estimate interrelated dependence relationships simultaneously. This approach allows a phenomenon to be modelled by considering both the unobserved “latent” constructs and the observed indicators that describe the phenomenon. SEM has also been adopted to measure customer satisfaction in several public transport services such as metropolitan public transport (Lai and Chen, 2011; Shen et al., 2016). More recently, de Oña et al. (2012) have used decision trees to derive attribute importance in public transport quality, and a new methodology of “index numbers” has been developed to monitor the evolution of attribute importance throughout successive CSS (de Oña et al., 2016). However, these last complex methodologies are not based on stated attribute importance from the CSI, but on derived importance. As far as the authors are aware, there are no studies comparing the different methodologies for obtaining attribute importance using the same case study data (or even a comparison between the most commonly used derived importance methodologies).
The possibility of comparing techniques and estimating stated importance has been practically abandoned by academics, but other survey formats could have been tested and studied, such as ranking attributes using hierarchy process together with stated preference techniques. Analytic Hierarchy Process (AHP) is a general theory of measurement used to derive ratio scales from both discrete and continuous paired comparisons (Saaty, 1987), which may be taken from actual measurements or from a basic scale that reflects the relative strength of preferences and feelings. Pairwise comparisons are fundamental in the use of AHP, although this theory can be extrapolated to a three-option choice. Three principles guide problem-solving using the AHP: decomposition, comparative judgments and synthesis of priorities. In our service quality case study, the decomposition is based on the selection of the attributes to be ranked and on the comparative judgments given by the surveys. The AHP priorities are synthesised from the second level down by multiplying local priorities by the priority of their corresponding criterion in the level above, and adding a level for each element according to the criteria it affects. This gives the composite or global priority of that element, which in turn is used to weight the local priorities of the elements in the level below.
Aydin et al. (2015) recently used a type of AHP methodology, FAHP (Fuzzy Analytic Hierarchy Process), to measure the performance of rail transit lines in Istanbul; however the FAHP was applied to fix the weights of the main “criteria” (train comfort, ticketing, information system, accessibility, station comfort, fare and time) based on the unbiased opinions of experts. The weights of the sub-criteria were simultaneously calculated by trapezoidal fuzzy numbers based on customer responses. The FAHP was therefore not directly applied to the CSS itself, as in our case.
In designing the survey questionnaire using an AHP process, some practical ideas have been borrowed from the stated preferences experiments in transportation described by Saako (2001) in order to collect useful data with as little bias as possible. Stated preference surveys have been used in transportation to analyse alternative trip choices (each alternative is composed of various attributes), but we have found no literature that ranks simple quality attributes, although statistically the problem to be solved is fairly similar. Ampt and Meyburg (1995) suggest a maximum of 9–16 options as acceptable in this type of stated preference surveys, with most current designs now adopting the lower end of this range. With a maximum of nine options for the respondent to ponder, this severely limits the number of attributes that can be considered. Our Customer Satisfaction surveys consider over 10–15 attributes, so this limitation must be overcome while allowing the consideration of more attributes and/or more attribute levels. One of the strategies proposed by Pearmain et al. (1991) in stated preference surveys is to separate the options into “blocks”, so that the full choice set is completed by groups of respondents, but with each group responding to a different sub-set of options. Each group responds to a full-factorial design within each sub-set of options, and the responses from the different sub-groups can be assumed to be sufficiently homogeneous to provide the full picture when combined.
As part of a research project led by the Madrid Polytechnic University, the authors of this paper had the opportunity to design an ad-hoc CSS, based on this previous literature, in a Spanish case study: the Madrid-Tres Cantos corridor, with four urban bus lines (operated by the company ALSA). A new type of survey questionnaire (to state importance) was tested using a more sophisticated process of hierarchy, separating the options into blocks and reducing the length of the survey questionnaire (not all users were asked for the same attribute ranking). In order to validate this new stated importance method, a conventional survey was also required (designed to derive importance), and the whole campaign was based on face-to-face surveys (293 surveys to state attribute importance and 520 to derive importance). As the face-to-face survey campaign was starting to become very costly, additional research based on Quick Response (QR) code surveys was also implemented in the study. A third type of questionnaire was therefore designed for the QR survey (also derived-importance) and uploaded to the operating company's (ALSA) website. The QR code is a simple way of providing the user with a virtual link to the questionnaire in order to test how to reduce the cost of future SQ survey campaigns using new Intelligent Transport Systems (ITS). The results of the QR research have been published recently (Guirao et al., 2015), and this article shows the results of the main
Table 2
Conventional survey collection per bus line. Sample rates and questionnaires collected per user and trip profile.
User activity
Working 112 (54.1%) 68 (58.6%) 17 (18.7%) 62 (58.5%) 259 (49.8%)
Unemployed 11 (5.3%) 6 (5.2%) 1 (1.1%) 2 (1.9%) 20 (3.8%)
Retired 26 (12.6%) 9 (7.8%) 6 (6.6%) 6 (5.7%) 47 (9.0%)
Student 43 (20.8%) 26 (22.4%) 67 (73.6%) 29 (27.4%) 165 (31.7%)
Other 15 (7.3%) 7 (6.0%) 0 (0.0%) 7 (6.6%) 29 (5.6%)
Ticket
Single 10 (4.8%) 6 (5.2%) 0 (0.0%) 7 (6.6%) 23 (4.4%)
10 trips 16 (7.7%) 10 (8.6%) 2 (2.2%) 5 (4.7%) 33 (6.3%)
Season ticket 176 (85.0%) 99 (85.3%) 89 (97.8%) 94 (88.7%) 458 (88.1%)
Other 5 (2.4%) 1 (0.9%) 0 (0.0%) 0 (0.0%) 6 (1.2%)
Frequency of trip
≥5 days 142 (68.6%) 84 (72.4%) 65 (71.4%) 73 (68.9%) 364 (70.0%)
3–4 days 22 (10.6%) 14 (12.1%) 13 (14.3%) 11 (10.4%) 60 (11.5%)
1–2 days 31 (15.0%) 9 (7.8%) 10 (11.0%) 13 (12.3%) 63 (12.1%)
Less than 1 day 12 (5.8%) 9 (7.8%) 3 (3.3%) 9 (8.5%) 33 (6.3%)
Trip purpose
Work 117 (56.5%) 65 (56.0%) 15 (16.5%) 63 (59.4%) 260 (50.0%)
Study 38 (18.4%) 23 (19.8%) 71 (78.0%) 25 (23.6%) 157 (30.2%)
Medical 11 (5.3%) 8 (6.9%) 0 (0.0%) 4 (3.8%) 23 (4.4%)
Leisure 10 (4.8%) 3 (2.6%) 0 (0.0%) 3 (2.8%) 16 (3.1%)
Other 31 (15.0%) 17 (14.7%) 5 (5.5%) 11 (10.4%) 64 (12.3%)
Age
≤23 48 (23.2%) 22 (19.0%) 60 (65.9%) 30 (28.3%) 160 (30.7%)
From 23 to 35 59 (28.5%) 33 (28.4%) 19 (20.9%) 24 (22.6%) 135 (25.9%)
From 36 to 50 38 (18.4%) 30 (25.9%) 7 (7.7%) 29 (27.4%) 104 (20.0%)
≥50 62 (30.0%) 31 (26.7%) 5 (5.5%) 23 (21.7%) 121 (23.2%)
Gender
Male 66 (31.9%) 37 (31.9%) 33 (36.3%) 41 (38.7%) 177 (34.0%)
Female 141 (68.1%) 79 (68.1%) 58 (63.7%) 65 (61.3%) 343 (66.0%)
TOTAL 207 (39.8%) 116 (22.3%) 91 (17.5%) 106 (20.4%) 520 (100%)
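As an illustration of the Pearson chi-squared independence check applied to categorical survey variables across bus lines, the following minimal sketch computes the statistic for the gender-by-bus-line counts of Table 2. The choice of gender as the example variable, the function name `chi_squared` and the 5% significance level are assumptions made for this example; it is not a reproduction of the authors' own computation.

```python
# Pearson chi-squared test of independence on a contingency table,
# using the gender-by-bus-line counts of Table 2 (columns in table order).
observed = [
    [66, 37, 33, 41],    # Male
    [141, 79, 58, 65],   # Female
]

def chi_squared(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of row and column variables
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

stat, dof = chi_squared(observed)

# Standard chi-squared critical value for 3 degrees of freedom at alpha = 0.05
# (the significance level is an assumption of this example).
CRITICAL_5PCT_3DOF = 7.815
independent = stat < CRITICAL_5PCT_3DOF
print(f"chi2 = {stat:.2f}, dof = {dof}, independence not rejected: {independent}")
```

A statistic below the critical value means that the hypothesis of independence between the variable and the bus line cannot be rejected at the chosen level; the same function can be applied to any other block of Table 2.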
service:

● Route (bus route).
● Connections (connection with other lines and transport modes).
● Punctuality (on-time performance).
● Frequency (timetable and headway).
● Access (ease of access to the bus stop from origin – home, work, university, etc.).
● Information-incidents (delays, breakdowns, changes in the line, etc.).
● Cleanliness (cleanliness of the bus).
● Information-service (timetables, routes, etc.).
● Journey time (of the route).
● Comfort (air conditioning, seating, etc.).
● Information and communication technologies (ICTs) (internet on board, mobile payment, real-time information screens both on board and at stops).
● Shelters (along the route).
● Bus driving security.
● Customer attention from the bus driver.
● Possibility of sitting during the journey.

Faced with the impossibility of developing a focus group of corridor users, twelve attributes were selected based on a sociological study (on SQ attributes for periurban lines) carried out by the Madrid's Regional Transport Consortium (CRTM, 2005). Finally the last three on the list (bus driving security, customer attention from the bus driver and the possibility of sitting during the journey) were introduced at the request of the operating company and located in a different part of the conventional survey.
The statistical mode and median of the results of the analysis of these abovementioned bus lines show that most of the variables have an average and median with the semantic meaning “Good”. Only the variable “Frequency” has a semantic value “Not Good” for the median, which indicates the importance of this variable and how it is valued by respondents. The statistical analysis by line does not reveal any substantial difference, except in the case of the valuation of ICTs by the users of bus 714, who describe it as “very good”. A preliminary aggregated analysis of the conventional survey is shown in Table 3, with the average rating of each attribute-performance. The three best rated attributes (over 7.0 out of 10.0) are bus cleanliness, access to bus stops and the possibility of sitting during the journey, while the three worst rated are ICTs, information about incidents and frequency. It should be noted that global customer satisfaction on this line is high (7.0 out of a maximum score of 10.0). These data allow the importance of each attribute to be estimated mathematically, although the most intuitive and direct method for operating companies would be to ask the customers directly which attributes they consider more important from a general point of view – not necessarily linked to their trip experience – when answering the survey. Although
operating companies have tended to use – and most continue to use – this type of “stated format”, the required length of the questionnaire is excessive and can lower the overall response rate and the accuracy of the survey. The following section contains a proposed design for a new type of stated importance questionnaire together with its application to the case study.

Table 3
Average rating of each attribute-performance in Madrid-Tres Cantos corridor.

Rated variables Rating (over 10)
Cleanliness 7.72
Access 4.57
Possibility of sitting during the journey 7.48
Journey time 7.36
Customer attention from the bus driver 7.28
Comfort 7.04
Connections 7.00
Punctuality 6.96
Bus driving security 6.86
Route 6.85
Information-service 6.81
Shelters 6.75
Frequency 5.64
Information-incidents 4.59
Information and communication technologies (ICTs) 3.28

4. A proposal for a stated importance survey

The stated importance survey was carried out in the Madrid-Tres Cantos corridor in March 2013, but on a different date from the conventional survey, in order to avoid biases or “contamination” between them. The new questionnaire was designed to include the same 15 attributes as in the conventional survey but these were offered to the customers in four different sub-sets of attributes (blocks) according to the literature review and in order to reduce the length of the survey. The customers were asked to identify the three most important attributes in each sub-set, and to rank them in descending order of importance. This solution allowed the number of attributes to be reduced to a smaller ranking, thereby improving the reliability of the survey process.
The first questions in the survey concerned user and trip profile, and these were common to all the users surveyed. In contrast, the attribute importance questions were organised in four scale cards and the customers were assigned only one, with no more than eight attributes. One of the main problems with earlier long stated preference surveys was that they sometimes failed to differentiate sufficiently between mean importance ratings if customers rated nearly all the measures near the top of the scale. Certain attributes could therefore be rated as important even though they in fact have little influence on overall satisfaction. To avoid this type of bias, the attributes in each card and their order of appearance were selected according to the following guidelines:

● Each card includes a total of seven or eight attributes (almost half the total attributes).
● Each attribute appears only twice; that is, on only two of the four available cards.
● Each time an attribute appears, attempts were made to change its order of appearance, alternating the top and bottom positions in the cards. To achieve this target, all the attributes meet the requirement that the difference between their two appearances is at least two positions (at the top or bottom of the scale).

Table 4 shows the four scale cards used in the case study (namely A1, A2, A3 and A4). 293 valid surveys were collected with this type of questionnaire: 79 from Card 1 (A1), 79 from Card 2 (A2), 74 from Card 3 (A3), and 61 from Card 4 (A4). The number of cards collected per bus line ranged from 41.98% on Line 712 to 18.77% on Line 714 (with 19.45% on Line 713 and 19.8% on Line 716). This means that each survey has an error of around 11% for high confidence intervals. However, as each card has a scale with several attributes, we obtained three pairs of discrete choices per user, and the error per survey thus drops to 6% for a confidence interval of 95.5%. Obviously, this simplification would have been unnecessary had we collected a higher number of stated importance surveys, but does not invalidate the results. Moreover, the user profile registered in the stated survey is consistent with the one obtained through the conventional survey and with the information on demand provided by the operating company.
Table 4 shows the structure of each card and a preliminary analysis of the attribute importance results depending on the number of times the attribute is in first, second and third position. Each time an attribute is in first place in a survey, it is assigned a value of 3.0. This value is 2.0 in second place, and 1.0 in third place. Table 4 shows the score given to each attribute for each type of card. The number of valid surveys obtained per card must be taken into account to guarantee statistically robust results. Each card contains seven or eight attributes and it is also necessary to average (or weight) the number of times an attribute appears in the top three positions. For example, in card 1 the score for punctuality has been divided by 474 (the sum of all the scores in this card); this percentage (out of ten) is shown in the last column of Table 4. Once these values have been calculated, the scores are aggregated for each attribute from two different cards, but considering the total range of scores; the highest score corresponds to the “punctuality” attribute on card 1 (4.43) while the lowest corresponds to “ICTs” on card 2. We therefore assigned the value 10.0 to the highest score and 0.0 to the lowest, interpolating the intermediate scores (see the last column in the table). Table 5 shows the final aggregation per attribute, and the ranking of attributes in terms of their importance for users.
Punctuality, frequency and driving security can be seen to be the three most important attributes for customers, while ICTs, bus driver attention and incident information appear at the bottom of the table. According to Table 3, two of the least important attributes for users are also the worst rated in the conventional survey (ICTs and incident information). After defining this pioneer survey tool, we validated and analysed our results using the conventional survey database for the same corridor.

5. Validation of the stated importance survey

The stated importance survey was validated based on the conventional survey analysis, in which the same 15 attributes were rated using a 5-point Likert scale and subsequently normalised with a 0–10 scale during data processing. The number of valid questionnaires collected in the conventional survey (520) shows a uniform error of 4.4% for a confidence interval of 95.5%.
Before deriving the attribute importance mathematically from the conventional survey, the valid surveys were analysed in depth using different statistical techniques. An independence test was first carried out considering the different bus lines to check that the samples were independent and unbiased. As the variables were categorical, this test was done by estimating the Pearson Chi-squared (χ²). The Chi-square goodness-of-fit test revealed that there is sample independence for most of the variables; that is, the survey answers do not depend on the chosen segment although there are variables that show different behaviours. For example, bus route perception, frequency and information-service depend on the bus line considered. Age affects the perception of the connection to other transport modes, frequency and information-
Table 4
Structure of the four ranking cards in the stated importance survey, and number of times an attribute appears in the first, second and third position of the ranking.
Card 1 First position Second position Third position Score 1 Score 2 (over the card) Score 3 (over all cards)
Table 5
Final ranking of attributes according to importance.
Variable Ranking Points over 100

Table 6
Cluster analysis results according to bus lines. Number of clusters in two stages.
Line Estimation Clusters in two stages Total
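The three scores reported for each ranking card (points from positions, share of the card's points out of ten, and the rescaled 0–10 value) can be sketched as follows. This is a minimal illustration of the scoring rule described in Section 4, not the authors' own code: the function names and the position counts are hypothetical, with the punctuality counts chosen only so that its share reproduces the 4.43 quoted for card 1, and the rescaling is applied here to a single set of attributes rather than across all four cards.

```python
def raw_score(first, second, third):
    """Score 1: 3 points per first place, 2 per second, 1 per third."""
    return 3 * first + 2 * second + 1 * third

def card_share(score, n_surveys):
    """Score 2: share of the card's points, expressed out of ten.

    Each valid survey distributes 3 + 2 + 1 = 6 points, so a card with
    79 surveys distributes 474 points in total.
    """
    return 10 * score / (6 * n_surveys)

def rescale(values):
    """Score 3: linear interpolation, lowest value -> 0.0, highest -> 10.0."""
    lo, hi = min(values.values()), max(values.values())
    return {name: 10 * (v - lo) / (hi - lo) for name, v in values.items()}

# HYPOTHETICAL counts of (first, second, third) positions on one card;
# they are illustrative and are not taken from Table 4.
counts = {
    "punctuality": (50, 20, 20),
    "comfort": (10, 15, 20),
    "ICTs": (1, 2, 3),
}
n_surveys = 79  # valid surveys collected for card 1

shares = {name: card_share(raw_score(*c), n_surveys) for name, c in counts.items()}
final = rescale(shares)

for name in counts:
    print(f"{name}: share = {shares[name]:.2f}, rescaled = {final[name]:.2f}")
```

Dividing by the total points of the card makes cards with different numbers of valid surveys comparable before the scores of an attribute's two appearances are aggregated.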
underlying variables or factors that explain the pattern of correlations within a set of observed variables. After several trials, three indicators were identified with high factor loading (>0.5), and in order to explain the variance of 56.1% with a KMO index (Kaiser-Meyer-Olkin) of 0.803, all the 11 attributes were included in the analysis, except “shelters”, due to its loading factor (lower than 0.5). The first factor identified was designated SERVICE, as it describes the quality attributes associated to the characteristics of the service operation such as punctuality, frequency, information-service and information-incidents. The second factor was called INTEGRATION as it captures concepts associated to the inclusion of the bus in the transportation systems, such as access to bus stops, connections to other modes, journey time and route. Finally the third factor, identified as SUPPLEMENTARY FEATURES, includes attributes that are usually secondary to the users such as comfort, cleanliness and ICTs.
Fig. 2 shows the path diagram with the significant parameters and relations (p < 0.1) from the best MIMIC model estimation obtained with the following modelling fit indexes: root mean square error of approximation (RMSEA = 0.079), comparative fit index (CFI = 0.956) and adjusted goodness-of-fit statistic (AGFI = 0.820). The MIMIC estimation was obtained using the AMOS program from the SPSS package (Arbuckle, 2013). This diagram shows how the relations between the observable variables and the three main factors (Service, Integration and Supplementary Features) are weaker (0.3) than the relation between factors and quality attributes. The results show more clearly how gender mainly affects the perception of attributes associated to factors of Integration and Service, while users with work purpose trips are more sensitive to Service and Integration attributes. Finally, the age of the customers conditions the perception of the attributes linked only to the Supplementary Features of the service, and usually means that the older the users, the lower the rating of comfort, cleanliness and ICTs. These results are consistent with those obtained using the frequency analysis of the attribute ratings in the conventional survey, in which the work-based trips are not as sensitive to the comfort attribute, and Supplementary features and Integration attributes are worse rated by women than by men.
The results obtained with the study of latent variables inevitably led to the issue of the disaggregation of attribute importance according to user and trip profile. Two of the most important attributes in the stated importance survey were punctuality and frequency, and both attributes are directly linked to work-based trips. This is consistent with the survey's main statistics, as more than 70% of the users surveyed were workers (49.8%) or students (31.7%).
In addition to these supplementary studies, the conventional survey allowed us to derive the attribute importance and compare the results with those obtained from the stated importance survey. Multiple regression analysis was used to design a model in which the dependent variable was overall service satisfaction (CSI) – whose values were collected from the last question in the conventional survey – and the independent variables were the 15 quality attributes. The coefficients for each attribute therefore represented the average weight (or importance) given by the users. Table 7 shows the descriptive statistics of the variables from the multiple regression. Specifically it can be seen that the attributes considered most important are “frequency”, “punctuality”, “route” and “bus driving security”, three of which are also the most important attributes in the stated importance survey, providing a consistent validation of the top positions in the ranking. There is also consistency in the worst positions for the rest of the attributes (“ICTs” and “Information-incidents”) and in some intermediate positions (“journey time” and “bus seating position”), although some differences can be seen in the rest of the ranking positions.
It is clear from the stated importance survey that attributes linked to supplementary features (cleanliness, comfort, ICTs, bus driving attention) are worse ranked than those associated to service (punctuality, frequency, bus driving security and information service), which are in the top positions. Attributes included in the concept of integration (connections, access, route, journey time) are in the middle of the ranking list. These results are clearly connected to the MIMIC model designed to analyse the presence of latent variables in this case study, and also consistent with the trip and user profile statistics (more than 49% work trips).
In the derived importance ranking, the main differences in the attribute position correspond to the integration category (“route”, “access”, “connections”) and two in the “supplementary features” category (“comfort” and “bus driver attention”). As “route” was the second most important attribute, we revised the survey format for differences in the wording or interpretation of the route question in the conventional survey and the stated importance survey. We found one small difference in the wording that may have affected the results: while in the stated survey the concept of route was explained in brackets (“itinerary and stops”), in the conventional survey no explanation was given of this category. This means that the interpretation of what the route actually involves (location of stops, distance between stops, adaptation to urban sprawl and
Fig. 2. Path diagram with the significant parameters and relations (p < 0.1) of the best MIMIC model estimation (RMSEA = 0.079, CFI = 0.956, AGFI = 0.820).
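The derived importance approach based on multiple regression (overall satisfaction regressed on the attribute ratings, with each coefficient read as the weight of that attribute) can be illustrated with the minimal sketch below. Everything in it is an assumption made for the example: the data are synthetic, the three attribute names and their "true" weights are hypothetical, and the small least-squares solver stands in for whatever statistical package was actually used in the study.

```python
import random

def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# SYNTHETIC data: 200 respondents rating three attributes on a 0-10 scale.
random.seed(0)
names = ["frequency", "punctuality", "cleanliness"]
true_weights = [0.5, 0.3, 0.1]  # HYPOTHETICAL weights used to generate the data
X = [[random.uniform(0, 10) for _ in names] for _ in range(200)]
y = [sum(w * v for w, v in zip(true_weights, x)) + random.gauss(0, 0.3) for x in X]

coefs = ols(X, y)                       # coefs[0] is the intercept
importance = dict(zip(names, coefs[1:]))
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Sorting the attributes by their estimated coefficient gives a derived importance ranking that can then be set against a stated importance ranking for the same attributes.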