
Award Price Estimator For Public Procurement Auctions Using Machine Learning Algorithms: Case Study With Tenders From Spain

This study evaluates various machine learning algorithms for predicting award prices in public procurement auctions in Spain, comparing traditional methods like linear regression and random forest with less common techniques such as isotonic regression and artificial neural networks. The research aims to enhance the accuracy of award price estimators, which can significantly benefit public procurement agencies and small businesses. The findings suggest that machine learning can improve the efficiency and transparency of public procurement processes.


Studies in Informatics and Control, 30(4) 67-76, December 2021 ISSN: 1220-1766 eISSN: 1841-429X 67

Award Price Estimator for Public Procurement Auctions Using Machine Learning Algorithms: Case Study with Tenders from Spain
Manuel J. GARCIA RODRIGUEZ1, Vicente RODRIGUEZ MONTEQUIN1*,
Andoni ARANGUREN UBIERNA2, Roberto SANTANA HERMIDA2,
Basilio SIERRA ARAUJO2, Ana ZELAIA JAUREGI2
1 Project Engineering Area, University of Oviedo, Oviedo, 33004, Spain
manueljgarciar@gmail.com, montequi@uniovi.es (*Corresponding author)
2 Department of Computer Sciences and Artificial Intelligence, University of the Basque Country, San Sebastián, 20018, Spain
andoni.aranguren@gmail.com, roberto.santana@ehu.eus, b.sierra@ehu.eus, ana.zelaia@ehu.eus
Abstract: The public procurement process plays an important role in the efficient use of public resources. In this context,
the evaluation of machine learning techniques that are able to predict the award price is a relevant research topic. In this
paper, the suitability of a representative set of machine learning algorithms is evaluated for this problem. The traditional
regression methods, such as linear regression and random forest, are compared with the less investigated paradigms, such as
isotonic regression and popular artificial neural network models. Extensive experiments are conducted based on the Spanish
public procurement announcements (tenders) dataset and employ diverse error metrics and implementations in WEKA and
Tensorflow 2.
Keywords: Machine learning, Neural networks, Public procurement, Spanish tender.

1. Introduction
The importance of public procurement is well known. In terms of projects and cost, the largest adjudicators of a country are the public procurement agencies. For example, the public authorities of the European Union (EU) spent around 14% of their GDP (around €2 trillion) on public procurement (purchase of services, works and supplies) in 2017 (European Commission, 2017). Therefore, improving public procurement can yield enormous savings: even a 1% efficiency gain could save €20 billion per year. It is crucial to analyse public procurement notices (also called auctions, requests for tender or simply tenders) in order to understand their behaviour in terms of prices. Through the use of new technologies such as machine learning (ML), new tools can be created to improve these public procurement processes.

ML involves computer algorithms that are used for knowledge discovery from large amounts of data. It is considered to be a type of artificial intelligence (AI), and it is regarded as one of the most disruptive innovations and a strong enabler of competitive advantages. While ML has been around for more than 60 years, it has only recently shown significant potential for disrupting economies and societies (Lee & Shin, 2020). Mirroring a trend that has gathered pace over the last 5 years in the private sector worldwide, the adoption of AI within public administration processes has the potential to provide enormous benefits. It improves the efficiency and effectiveness of policy making and service delivery to businesses and citizens, ultimately enhancing their level of satisfaction and trust in the quality of public service (Kuziemski & Misuraca, 2020).

The award price estimator is a regression problem. A tender has known input features x (e.g., date, tender price, type of contract, and public procurement agency) and an unknown output feature y (award price). The tender price, which is calculated by the public procurement agency, is the key input parameter to the award price estimator. The tender price is the theoretical price, and the estimator adjusts it according to real and changing market conditions to predict the award price offered by the winning bidder.

The aim of this article is to improve the accuracy of the award price estimator studied previously in (García Rodríguez et al., 2019a). That article applied only one algorithm (random forest) to predict the award price, and it was validated over two tender datasets from Spain and Europe. Further, this article increases the prediction accuracy, and it compares four algorithms: linear

https://doi.org/10.24846/v30i4y202106 ICI Bucharest © Copyright 2012-2021. All rights reserved


68 Garcia Rodriguez M. J. , Rodriguez Montequin V., Aranguren Ubierna A., Santana Hermida R. , Sierra Araujo B. , Zelaia Jauregi A.

regression, isotonic regression, random forest and artificial neural network. The last two algorithms are ML methods, particularly supervised learning.

An award price estimator would produce significant benefits. It would be an excellent tool for the cost planning of public tendering agencies by allowing them to have more realistic budgets. Additionally, such a price estimator would provide support to small- and medium-sized enterprises (SMEs) that play a crucial role in most economies. For example, SMEs represent 50% of the GDP in the EU (European Commission, 2020). However, they have difficulty when competing on equal terms with big suppliers in the public procurement space. Other benefits could be the reduction of fraud between bidders, which would improve the transparency of the process and lead to better quantification of the product quality.

The paper begins with reviewing the literature and identifying the research gap to be examined (Section 2). Then, the dataset of public procurement auctions, the ML algorithms being compared (random forest, linear regression, isotonic regression and artificial neural networks (ANNs)) and the error metrics that are used are described (Section 3). Next, the major quantitative results of the experimental analysis are summarized for identifying the best ML algorithm to predict the award price (Section 4). Finally, some concluding remarks, limitations, and avenues for future research are presented (Section 5).

2. Literature Review

While an increasing number of studies in public procurement is being published every year, an overview of the field is missing. In the literature on public procurement, an ambiguous wording is usually used, and a consensus on the terminology and concepts involved has not been reached yet (Obwegeser & Müller, 2018). Technological and organizational challenges faced during public electronic procurement processes are not well understood despite past studies focusing on these topics (Mohungoo, Brown & Kabanda, 2020). The data analysis of public tenders can provide valuable information for different stakeholders: public tendering agencies, public procurement managers, project managers, executives, politicians and, indirectly, citizens. In the particular case of Spain, an initial analysis published in (García Rodríguez et al., 2019b) explains the Spanish public tendering system and the potential applications and benefits of employing massive data processing.

The past decades have seen the rapid development of the computer hardware, communication technologies and computer sciences (artificial intelligence and big data). These new technologies make it possible to implement the informatization of conventional public procurement tendering processes. Public procurement has the typical objectives of the private sector: to acquire the right goods or services from the right supplier, at the right price, at the highest service level, and considering laws and norms requirements. But it also requires strict compliance with the principles of non-discrimination, free competition, and transparency of the awarding procedures (Dotoli, Epicoco & Falagario, 2020).

There is extensive literature about prediction techniques (forecasting) and data analysis in public tendering. There are mainly two approaches: statistical models (e.g., mathematical algorithms) and statistical learning (e.g., ML algorithms). There is not a clear demarcation or boundary between both approaches because research on ML also covers the conception of mathematical algorithms. Thus, statistics and ML are closely related fields in terms of methods, but distinct with regard to their principal goal: statistics draws population inferences from a sample, while ML finds generalizable predictive patterns (Bzdok, Altman & Krzywinski, 2018).

Statistical models are the traditional or conventional approach used to analyse and validate hypotheses. For example, there are models for statistical relationships for tender forecasting in capped tender (Ballesteros-Pérez, González-Cruz & Cañavate-Grimal, 2012), scoring probability graphs (Ballesteros-Pérez, González-Cruz & Cañavate-Grimal, 2013), multicriteria decision making (Dotoli, Epicoco & Falagario, 2020), the probability of bidder participation (Ballesteros-Pérez et al., 2015; Ballesteros-Pérez et al., 2016), and the optimal

https://www.sic.ici.ro
Award Price Estimator for Public Procurement Auctions Using Machine Learning Algorithms... 69

bidder participation to achieve the lowest procurement prices (Onur & Tas, 2019). There is also a mathematical model where the bidders are evaluated on the basis of price and quality through a score function (Lorentziadis, 2020), the detection of groups of bidders in collusive auctions (also called not competitive tenders or bid-rigging cartels) (Conley & Decarolis, 2016) or discriminatory competitive procedures in public procurement with unverifiable quality (Albano, Cesi & Iozzi, 2017).

On the other hand, a variety of ML techniques has also been successfully applied to public procurement and created empirical models. For example, among the particular problems addressed by this type of algorithm are those related to the behaviour of bidders: the estimation of the number of bidders in tenders (KNN) (Gorgun, Kutlu & Onur Tas, 2020), the identification of the optimal bidder (fuzzy logic) (Wang et al., 2014), creating a search engine of suppliers to recommend potential bidders for a characterized tender (random forest) (García Rodríguez et al., 2020), the detection of collusive auctions (ensemble method) (Huber & Imhof, 2019), or the proposal of an objective system (key performance indicators) for supporting the estimators (benchmarking) during the tender evaluation process (ANNs) (Bilal & Oyedele, 2020).

However, there are almost no studies about award price forecasting, so there is a research gap. The first holistic approach that considers all kinds of tenders (multi-sectorial) and a large volume of tenders is (García Rodríguez et al., 2019a), whose dataset is used in this article. Previously, two articles created award price estimators with ML algorithms, but they were applied only to construction auctions: bridge projects (Chou et al., 2015) and highway procurement (Kim & Jung, 2019). It is typical to find literature focused only on public procurement for construction or civil engineering projects; this is mainly because they are the biggest and most important projects in public procurement (García Rodríguez et al., 2019a). This paper is the first attempt to compare different algorithms in order to improve the accuracy of award price forecasting in multi-sectorial tenders.

In conclusion, this article is a true reflection of the applicability of ML in public procurement. The fundamental insight behind this breakthrough is as much statistical as computational. Artificial intelligence became possible once researchers stopped approaching intelligence tasks procedurally and began tackling them empirically (Mullainathan & Spiess, 2017). ML algorithms produce a powerful, flexible way of making quality predictions, but they have a weakness: they do not contain strong assumptions and instead contain mostly unverifiable assumptions due to the fact that ML approaches do not generally produce stable estimates of the underlying parameters (Mullainathan & Spiess, 2017).

3. Experimental Procedures

The main objective of this work is to analyze different ML paradigms for predicting the award (winning) price of Spanish public tenders. In this section, the dataset and the learning models are presented, and details about the error metrics and validation method are given.

3.1 Dataset

The original data were extracted from the information files published by the Spanish Ministry of Finance (see Data Availability). They contained information about tenders published between 2012 and 2018. The data were preprocessed for a preliminary study published in (García Rodríguez et al., 2019a) and a dataset of 58,337 Spanish tenders was obtained. To compare the results, the same dataset was used in the experiments presented in this article.

Tenders in the dataset were defined by 14 input variables that provided the following information: the name of the public procurement agency that made the tender, geographical information about the agency (municipality, province, region, and wider region code), the tender price (the amount of budgeted bidding), the duration (days to execute the contract), the type of work according to the common procurement vocabulary (CPV) in two levels of detail, the type of contract defined by legislation (in two levels of detail), the procedure by which the contract was awarded, the urgency level


and the date of agreement in the award of the contract. Note that during preprocessing, all this information was converted to integer values to make it suitable for the learning methods being evaluated. The output variable was the award price, which is the amount offered by the winning bidder of the contract.

3.2 Machine Learning Algorithms

The random forest for regression ML model was selected to create an award price predictor in the preliminary study (García Rodríguez et al., 2019b). The research presented here aimed to investigate a wider range of ML paradigms, compare them and select the most suitable one for the task. Models for regression need to be selected, since the output variable to predict is the award price. Very widely used supervised ML algorithms were considered: random forest, linear regression, isotonic regression and artificial neural networks (ANNs). A brief description of them is presented here.

A random forest algorithm (Breiman, 2001) is a combination of tree predictors, where each tree depends on the values of a random vector independently sampled and with the same distribution for all the trees in the forest. The prediction of the ensemble is computed by averaging the predictions of the individual models. It is a typical example of an ensemble method that reduces the bias of individual models and provides a more flexible predictor that is less prone to overfitting.

While the random forest model is robust, there are situations where simpler algorithms, like linear regression, could produce better results. This explains the convenience of evaluating the performance of linear regression for award price estimation.

Linear regression (equation 1) is a machine learning technique used to model the linear relationship between the input variables $x_i$ and the output variable $y$:

$$y = \sum_{i=1}^{n} (\beta_i x_i + \varepsilon) \tag{1}$$

where $\beta_i$ are the parameters that measure the influence of the input variables, and $\varepsilon$ is a constant value.

Another technique that is increasingly applied to regression problems is isotonic regression (equation 2). This method tries to find a line as close to the observations as possible:

$$\min_{g} \sum_{i=1}^{m} w_i \left( g(x_i) - f(x_i) \right)^2 \tag{2}$$

where $x_i$ are the input variables, $g$ is the isotonic estimator, $f$ is a function, $w_i$ are the weights and $m$ is the number of observations. This method produced a series of predictions for the training data that were the closest to the targets in terms of the mean square error (MSE). These predictions were interpolated to predict unseen data. The predictions from the isotonic regression thus formed a function that was piecewise linear (Chakravarti, 1989).

For the three ML algorithms presented in this section, the implementations available in WEKA (Hall et al., 2009; Witten et al., 2011) were used in the experiments. WEKA is a machine learning platform developed by the University of Waikato that supports a large number of learning algorithms (Waikako University, 2021).

3.3 ANNs

Recently, ANNs have re-emerged as a powerful tool to deal with a variety of ML problems. In particular, they have been applied to regression problems where the input data can be noisy or not fully observed. An ANN is a computational model inspired by biological neural networks. It consists of a collection of units or nodes (artificial neurons) organized in connected layers. The parameters of the model are the weights and biases associated with the connections. Information is processed from the input layer to the output layer.

The learning process is based on minimizing a cost function (also known as loss function) that evaluates the performance of the network for the given task. Backpropagation is used to learn the weights associated with the connections. One of the ANNs used in this work is a multi-layer perceptron (MLP) (Hastie, Tibshirani & Friedman, 2009) implemented in WEKA (Hall et al., 2009; Witten et al., 2011).
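As an illustration of the isotonic regression of equation 2, the following Python sketch fits a non-decreasing function with the pool-adjacent-violators algorithm and then interpolates linearly between the fitted points to predict unseen inputs. It is not the WEKA implementation used in the experiments: the function names `pava` and `interpolate`, the use of a single input variable sorted in increasing order, and the toy numbers are assumptions made only for this example.

```python
from bisect import bisect_left


def pava(targets, weights=None):
    """Weighted least-squares non-decreasing fit (pool adjacent violators).

    `targets` must be ordered by increasing input value (assumption).
    """
    if weights is None:
        weights = [1.0] * len(targets)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for t, w in zip(targets, weights):
        blocks.append([float(t), float(w), 1])
        # Merge backwards while two neighbouring blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def interpolate(x_train, fitted, x_new):
    """Piecewise-linear prediction; x_train must be strictly increasing."""
    if x_new <= x_train[0]:
        return fitted[0]
    if x_new >= x_train[-1]:
        return fitted[-1]
    j = bisect_left(x_train, x_new)  # x_train[j - 1] < x_new <= x_train[j]
    x0, x1 = x_train[j - 1], x_train[j]
    t = (x_new - x0) / (x1 - x0)
    return fitted[j - 1] + t * (fitted[j] - fitted[j - 1])


# Hypothetical toy data: one sorted input variable and noisy targets.
x_train = [10, 20, 30, 40, 50, 60]
y_train = [1, 3, 2, 4, 3.5, 5]
fitted = pava(y_train)
prediction = interpolate(x_train, fitted, 35)
```

In this toy run the two out-of-order pairs (3, 2) and (4, 3.5) are pooled into their means, so the fitted values never decrease, and the prediction for an unseen input lies on the straight segment between its two neighbours.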


3.4 ANN Optimization (Deep Learning)

In addition to using the MLP implementation in WEKA, a set of ANN architectures that represented a different number of layers (to evaluate the impact of the depth) and a different number of neurons in each layer was selected. The particular choice of the number of neurons is arbitrary and was intended to keep a balance between the goals of increasing the capacity of the model and keeping a manageable complexity.

The selected ANN architectures were the following: two architectures of one hidden layer with 16 nodes and 32 nodes; five architectures of three hidden layers of 16 nodes in each (16-16-16), 32 nodes in each (32-32-32), and a different number of nodes in each (16-8-16, 32-8-32, 32-16-32); and an architecture of five hidden layers with 32-16-8-16-32 nodes (see Figure 1). For each of these ANN designs, different activation functions, loss functions and gradient descent optimization algorithms were evaluated.

The activation function determines the type of non-linear transformation made to the linear combination of the weights and input neurons. In most cases, the rectified linear unit (ReLU) general activation function is used. Recently, the scaled exponential linear unit (SeLU) activation function (Klambauer et al., 2017) has been reported to produce promising results. This is an activation function that induces self-normalizing properties.

Regarding the choice of ReLU and SeLU, preliminary experiments were made with other activation functions including a sigmoid. Due to the page number restrictions and the poor results achieved with these functions, it was decided to include only the results for the ReLU and SeLU. It is emphasized that both functions are theoretically more sound since they address the vanishing and exploding gradient problems experienced by the sigmoid and hyperbolic tangent functions. They can be used for all the main neural network paradigms (i.e., MLPs, CNNs, and RNNs). In particular, SeLU, one of the newest activation functions proposed in the literature, was introduced with an eye on standard feed-forward neural networks and not envisioning CNNs.

Regarding the selection of the regression loss-functions, the common ones were used: MSE or the sum of squared distances, mean absolute error (MAE) or the sum of absolute differences (see subsection 3.5). Among the available gradient descent optimization algorithms commonly used, Adagrad (Duchi, Bartlett & Wainwright, 2012) and Adam and Adamax (Kingma & Ba, 2014) were selected for the experiments (Keras, 2021). For the optimization process, the maximum number of epochs (times the learning algorithm iterated through the training dataset) was set to 50,000.

The eight ANN structures combined with two activation functions, two loss functions and three optimizers provided 96 different ANN designs. Only the training dataset (46,670 tenders) was used for the optimization process. Two different validation frameworks were evaluated: a train/test

Figure 1. ANN structures


division (Hold-out 80/20) and a K-fold cross-validation with K=10. Figure 2 and Figure 3 show the results obtained for the four error metrics (MAE, root mean square error (RMSE), relative absolute error (RAE) and root relative square error (RRSE)). The rows of the tables correspond to the eight different ANN architectures, the columns correspond to the two activation functions (RELU, SELU), two loss functions (MAE, MSE) and three optimizers (Adam, Adamax, Adagrad). The best results (minimum error) are coloured in green, the intermediate ones are in orange and the worst are in red.

A set of ANN configurations that performed well during the optimization phase was selected for the final test:

- ANN1: Three hidden layers with 16-8-16 nodes, SeLU activation function, MAE loss function and Adam optimizer (see Figure 2);

- ANN2: Three hidden layers with 32-8-32 nodes, SeLU activation function, MAE loss function and Adagrad optimizer (see Figure 3);

- ANN3: Three hidden layers with 16-8-16 nodes, ReLU activation function, MAE loss function and Adagrad optimizer (see Figure 2);

- ANN4: Three hidden layers with 16 nodes in each, ReLU activation function, MSE loss function and Adam optimizer (see Figure 3).

The optimization for the ANN architectures was performed using Tensorflow (Abadi et al., 2016; Tensorflow, 2021), which is an open source software library for ML. It is a very appropriate platform to evaluate with different ANN architectures. Keras is an API designed to simplify the use of Tensorflow.

3.5 Error Metrics

In this subsection, the error metrics used to measure the deviation of the predicted values compared to the real ones are presented.

The MAE (equation 3) and RMSE (equation 4) are two of the most common metrics used to measure accuracy in absolute terms for continuous variables. The MAE measures the average magnitude of the errors in a set of predictions without considering their direction. It was also used as the loss function for the ANNs in the present experiments. The RMSE is a quadratic scoring rule that also measures the average magnitude of the error:

$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} |r_i - p_i| \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (r_i - p_i)^2} \tag{4}$$

where $r_i$ are the actual observations (the true values), $p_i$ are the predicted values and $m$ is the number of observations.

In relative terms, the RAE (equation 5) and RRSE (equation 6) calculate the error values as a ratio (percentage):

$$\mathrm{RAE} = \frac{\sum_{i=1}^{m} |r_i - p_i|}{\sum_{i=1}^{m} |r_i - \bar{r}|} \tag{5}$$

$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{m} (r_i - p_i)^2}{\sum_{i=1}^{m} (r_i - \bar{r})^2}} \tag{6}$$

where $\bar{r}$ is the mean of the actual observations. Values over 100% appear when the absolute or quadratic difference between the predicted values and the actual observations are bigger than the differences between the actual observations and their means.

Finally, the MSE (equation 7) is used as a loss function in the present experiments with the ANNs:

$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (r_i - p_i)^2 \tag{7}$$

3.6 Validation

To evaluate the performance of the different ML paradigms, models were trained with 80% of the tenders (46,670). The remaining 20% (11,667) were used as a test. The same validation framework and train/test division of the dataset were used in the preliminary study.

4. Experimental Results

The validation was performed for the 11,667 tenders in the test dataset. The random forest model was considered to be the baseline because it was used in the preliminary study and therefore enabled the comparison of the behaviour of different ML paradigms.
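The four error metrics of Section 3.5 (equations 3 to 6) can be written compactly in code. The sketch below is a minimal illustration rather than the WEKA or TensorFlow implementation used in the experiments; the function name `error_metrics` and the toy values are hypothetical.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for paired observations (equations 3 to 6)."""
    m = len(actual)
    mean_actual = sum(actual) / m
    abs_err = [abs(r - p) for r, p in zip(actual, predicted)]
    sq_err = [(r - p) ** 2 for r, p in zip(actual, predicted)]
    # Reference errors of the trivial predictor that always returns the mean.
    abs_ref = sum(abs(r - mean_actual) for r in actual)
    sq_ref = sum((r - mean_actual) ** 2 for r in actual)
    return {
        "MAE": sum(abs_err) / m,
        "RMSE": math.sqrt(sum(sq_err) / m),
        "RAE": sum(abs_err) / abs_ref,          # ratio; multiply by 100 for %
        "RRSE": math.sqrt(sum(sq_err) / sq_ref),
    }


# Hypothetical award prices (toy values, not taken from the dataset).
actual = [1.0, 2.0, 3.0, 4.0]
mean_baseline = error_metrics(actual, [2.5, 2.5, 2.5, 2.5])
constant_guess = error_metrics(actual, [2.0, 2.0, 2.0, 2.0])
```

Predicting the mean of the observations for every tender gives RAE = RRSE = 1 (that is, 100%), which is why values above 100% indicate a model that, on that metric, does worse than this trivial predictor.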


Activation                ReLU                                            SeLU
Loss function             MAE                     MSE                     MAE                     MSE
Metric    Layer structure Adam    Adamax  Adagrad Adam    Adamax  Adagrad Adam    Adamax  Adagrad Adam    Adamax  Adagrad
MAE (M€)  16x1            0.94    0.77    0.88    1.04    0.89    0.73    0.80    0.75    0.86    0.89    0.83    0.75
MAE (M€)  32x1            0.91    0.76    0.94    0.92    0.86    0.75    0.82    0.75    1.03    0.94    0.88    0.74
MAE (M€)  16x3            0.78    0.75    0.76    0.73    0.80    0.80    0.77    0.75    0.79    0.80    0.86    0.80
MAE (M€)  32x3            0.78    0.74    0.76    0.94    0.77    0.74    0.78    0.74    0.79    0.80    0.82    0.73
MAE (M€)  16-8-16         0.78    0.76    0.67    0.73    0.79    0.75    0.55    0.76    0.80    0.81    0.77    0.73
MAE (M€)  32-8-32         1.11    1.03    0.86    0.80    0.79    0.74    0.95    1.25    0.86    0.77    0.77    0.85
MAE (M€)  32-16-32        0.90    1.01    0.82    0.79    0.85    0.73    1.06    1.27    0.80    0.83    0.83    0.82
MAE (M€)  32-16-8-16-32   1.00    1.23    0.84    0.83    0.83    0.78    0.98    1.22    0.81    0.76    0.78    0.76
RMSE (M€) 16x1            10.80   10.86   10.78   10.62   10.62   10.74   10.84   10.68   10.45   10.59   10.63   10.64
RMSE (M€) 32x1            10.68   10.76   10.63   10.69   10.61   10.65   10.71   10.71   10.68   10.70   10.62   10.56
RMSE (M€) 16x3            10.72   10.73   10.61   10.68   10.66   10.57   10.81   10.47   10.54   10.66   10.71   10.56
RMSE (M€) 32x3            10.95   10.53   10.60   10.60   10.51   10.66   10.82   10.50   10.63   10.62   10.62   10.62
RMSE (M€) 16-8-16         10.77   10.59   10.36   10.64   10.57   10.58   10.11   10.68   10.60   10.68   10.56   10.58
RMSE (M€) 32-8-32         10.91   10.79   10.86   10.70   10.71   10.67   10.86   10.86   10.93   10.73   10.64   10.64
RMSE (M€) 32-16-32        10.91   10.99   10.92   10.70   10.65   10.58   10.89   10.92   10.91   10.71   10.71   10.71
RMSE (M€) 32-16-8-16-32   10.95   10.84   10.86   10.70   10.73   10.68   10.92   10.84   10.83   10.71   10.72   10.71
RAE (%)   16x1            145     119     137     160     137     112     123     116     133     137     127     115
RAE (%)   32x1            140     118     145     142     133     115     127     116     159     145     136     114
RAE (%)   16x3            120     116     118     113     123     123     119     115     122     123     132     124
RAE (%)   32x3            121     115     118     145     118     114     120     114     122     124     127     112
RAE (%)   16-8-16         120     117     104     113     121     115     85      117     123     125     119     113
RAE (%)   32-8-32         171     159     133     124     122     115     147     192     133     119     118     131
RAE (%)   32-16-32        139     156     126     122     131     112     164     195     124     128     128     127
RAE (%)   32-16-8-16-32   155     189     129     128     129     120     151     188     125     117     120     117
RRSE (%)  16x1            109.4   110.0   109.2   107.5   107.6   108.8   109.8   108.2   105.8   107.2   107.6   107.8
RRSE (%)  32x1            108.2   109.0   107.7   108.3   107.5   107.8   108.5   108.5   108.1   108.4   107.5   107.0
RRSE (%)  16x3            108.5   108.6   107.5   108.2   108.0   107.1   109.5   106.1   106.7   108.0   108.5   107.0
RRSE (%)  32x3            110.9   106.7   107.4   107.4   106.5   107.9   109.6   106.3   107.6   107.5   107.5   107.6
RRSE (%)  16-8-16         109.1   107.3   104.9   107.8   107.0   107.1   102.4   108.2   107.4   108.2   106.9   107.2
RRSE (%)  32-8-32         110.5   109.2   110.0   108.4   108.5   108.1   110.0   110.0   110.7   108.6   107.7   107.8
RRSE (%)  32-16-32        110.5   111.3   110.6   108.4   107.9   107.2   110.3   110.6   110.5   108.5   108.5   108.4
RRSE (%)  32-16-8-16-32   110.9   109.8   110.0   108.3   108.6   108.2   110.6   109.8   109.7   108.4   108.5   108.5

Colour legend (percentile and corresponding percentile value):
Percentile  100     75      59      41      25      16      8       0
MAE (M€)    1.27    0.87    0.82    0.79    0.76    0.75    0.73    0.55
RMSE (M€)   10.99   10.78   10.71   10.66   10.62   10.58   10.55   10.11
RAE (%)     195     134     127     121     118     115     113     85
RRSE (%)    111.3   109.2   108.5   108.0   107.5   107.2   106.8   102.4

Figure 2. Error metrics (MAE, RMSE, RAE and RRSE) for different ANN configurations with validation framework train/test division (hold-out 80/20)
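The 8 x 12 grid reported in Figure 2 is the Cartesian product of the design choices listed in Section 3.4. The following sketch enumerates that search space to confirm the count of 96 designs; it does not train any network, and the variable names and lowercase string labels are assumptions chosen for this example rather than the authors' code.

```python
from itertools import product

# Hidden-layer structures described in Section 3.4 (nodes per hidden layer).
layer_structures = [
    (16,), (32,),                      # one hidden layer
    (16, 16, 16), (32, 32, 32),        # three equal hidden layers
    (16, 8, 16), (32, 8, 32), (32, 16, 32),
    (32, 16, 8, 16, 32),               # five hidden layers
]
activations = ["relu", "selu"]
losses = ["mae", "mse"]
optimizers = ["adam", "adamax", "adagrad"]

# Every combination of structure, activation, loss and optimizer.
designs = list(product(layer_structures, activations, losses, optimizers))
columns_per_structure = len(activations) * len(losses) * len(optimizers)

# ANN1 as described in the text: 16-8-16 nodes, SeLU, MAE loss, Adam.
ann1 = ((16, 8, 16), "selu", "mae", "adam")
```

Here `len(designs)` is 96 and `columns_per_structure` is 12, matching the eight rows and twelve columns of each metric block in the figure.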
ANN configuration Experimental results

Activation function RELU SELU

Regression loss function Mean Absolute Error Mean Squared Error Mean Absolute Error Mean Squared Error Colour legend
Error
metrics Optimizer Percentile
Adam Adamax Adagrad Adam Adamax Adagrad Adam Adamax Adagrad Adam Adamax Adagrad Percentile
Layer structure value
16x1 0.99M€ 0.90M€ 0.96M€ 0.97M€ 1.01M€ 0.89M€ 0.97M€ 0.89M€ 0.94M€ 1.08M€ 1.02M€ 0.88M€ 1.25M€ 100
32x1 1.15M€ 0.91M€ 0.95M€ 1.01M€ 1.14M€ 0.92M€ 1.01M€ 0.90M€ 1.02M€ 0.97M€ 1.01M€ 0.91M€ 0.97M€ 75
16x3 0.89M€ 0.89M€ 0.88M€ 0.81M€ 0.88M€ 0.89M€ 0.90M€ 0.87M€ 0.91M€ 0.91M€ 0.99M€ 0.89M€ 0.91M€ 59
MAE 32x3 0.89M€ 0.88M€ 0.86M€ 0.91M€ 0.83M€ 0.98M€ 0.90M€ 0.90M€ 0.86M€ 0.82M€ 0.83M€ 0.98M€ 0.89M€ 41
(M€) 16-8-16 0.84M€ 0.90M€ 0.91M€ 0.95M€ 1.04M€ 0.89M€ 0.81M€ 0.90M€ 0.91M€ 0.87M€ 1.02M€ 0.91M€ 0.88M€ 25
32-8-32 0.89M€ 0.91M€ 0.85M€ 0.86M€ 0.90M€ 0.84M€ 0.82M€ 0.83M€ 0.77M€ 0.92M€ 0.95M€ 0.87M€ 0.86M€ 16
32-16-32 0.89M€ 0.90M€ 0.89M€ 0.88M€ 0.99M€ 1.09M€ 0.88M€ 0.89M€ 0.89M€ 0.96M€ 1.15M€ 1.01M€ 0.83M€ 8
32-16-8-16-32 0.83M€ 0.87M€ 0.87M€ 0.98M€ 0.91M€ 0.88M€ 0.85M€ 0.89M€ 0.80M€ 0.86M€ 1.25M€ 0.87M€ 0.77M€ 0
16x1 14.30M€ 14.29M€ 14.25M€ 14.26M€ 14.29M€ 14.24M€ 14.26M€ 14.26M€ 14.26M€ 14.30M€ 14.27M€ 14.25M€ 14.32M€ 100
32x1 14.30M€ 14.31M€ 14.26M€ 14.25M€ 14.29M€ 14.23M€ 14.31M€ 14.28M€ 14.24M€ 14.32M€ 14.24M€ 14.27M€ 14.27M€ 75
16x3 14.26M€ 14.29M€ 14.27M€ 14.18M€ 14.22M€ 14.23M€ 14.32M€ 14.24M€ 14.27M€ 14.19M€ 14.21M€ 14.23M€ 14.26M€ 59
RMSE 32x3 14.30M€ 14.28M€ 14.26M€ 14.25M€ 14.21M€ 14.28M€ 14.32M€ 14.28M€ 14.27M€ 14.25M€ 14.19M€ 14.22M€ 14.24M€ 41
(M€) 16-8-16 14.24M€ 14.24M€ 14.25M€ 14.23M€ 14.24M€ 14.23M€ 14.21M€ 14.27M€ 14.27M€ 14.18M€ 14.23M€ 14.22M€ 14.23M€ 25
32-8-32 14.27M€ 14.29M€ 14.18M€ 14.27M€ 14.25M€ 14.18M€ 14.30M€ 14.19M€ 14.08M€ 14.24M€ 14.25M€ 14.25M€ 14.22M€ 16
32-16-32 14.25M€ 14.28M€ 14.24M€ 14.29M€ 14.21M€ 14.25M€ 14.25M€ 14.23M€ 14.24M€ 14.23M€ 14.21M€ 14.23M€ 14.19M€ 8
32-16-8-16-32 14.21M€ 14.26M€ 14.23M€ 14.26M€ 14.26M€ 14.23M€ 14.25M€ 14.25M€ 14.16M€ 14.23M€ 14.30M€ 14.23M€ 14.08M€ 0
16x1 124% 112% 121% 121% 127% 112% 122% 112% 118% 136% 128% 110% 156% 100
32x1 144% 114% 119% 127% 143% 115% 127% 113% 128% 122% 126% 114% 121% 75
16x3 112% 111% 110% 102% 111% 112% 113% 109% 113% 115% 124% 112% 114% 59
RAE 32x3 112% 111% 108% 114% 104% 123% 112% 113% 108% 103% 103% 123% 112% 41
(%) 16-8-16 106% 113% 114% 119% 130% 112% 102% 112% 114% 109% 128% 115% 110% 25
32-8-32 111% 114% 107% 108% 113% 105% 103% 104% 96% 116% 119% 109% 108% 16
32-16-32 111% 112% 112% 110% 124% 137% 111% 112% 111% 121% 144% 127% 104% 8
32-16-8-16-32 104% 109% 108% 123% 115% 110% 107% 112% 101% 108% 156% 110% 96% 0
16x1 103.0% 102.9% 102.6% 102.7% 103.0% 102.6% 102.7% 102.7% 102.7% 103.0% 102.8% 102.7% 103.2% 100
32x1 103.0% 103.1% 102.7% 102.6% 103.0% 102.5% 103.1% 102.9% 102.6% 103.2% 102.6% 102.8% 102.8% 75
16x3 102.7% 102.9% 102.8% 102.1% 102.4% 102.5% 103.1% 102.6% 102.8% 102.3% 102.4% 102.5% 102.7% 59
RRSE 32x3 103.0% 102.8% 102.8% 102.6% 102.4% 102.8% 103.1% 102.9% 102.8% 102.7% 102.2% 102.4% 102.6% 41
(%) 16-8-16 102.6% 102.6% 102.7% 102.5% 102.6% 102.5% 102.3% 102.8% 102.8% 102.1% 102.5% 102.5% 102.5% 25
32-8-32 102.8% 103.0% 102.1% 102.8% 102.6% 102.2% 103.0% 102.2% 101.4% 102.6% 102.7% 102.6% 102.4% 16
32-16-32 102.6% 102.9% 102.6% 102.9% 102.4% 102.6% 102.7% 102.5% 102.6% 102.5% 102.3% 102.5% 102.2% 8
32-16-8-16-32 102.4% 102.7% 102.5% 102.8% 102.8% 102.5% 102.6% 102.7% 102.0% 102.5% 103.0% 102.5% 101.4% 0

Figure 3. Error metrics (MAE, RMSE, RAE and RRSE) for different ANN configurations with validation
framework K-fold cross-validation (K=10)
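
The four error metrics and the K-fold split named in the caption can be written out explicitly. The sketch below assumes the standard textbook definitions, with RAE and RRSE normalised by the error of a predictor that always returns the mean of the evaluated actual values. Under that assumption, a relative error above 100% means the model does worse than this trivial predictor. The function names, the toy numbers and the unshuffled fold split are illustrative assumptions, not the authors' code.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE (relative errors as fractions, 1.0 = 100%)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Errors of the trivial "always predict the mean" model (assumed baseline).
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": sum(abs_err) / base_abs,
        "RRSE": math.sqrt(sum(sq_err) / base_sq),
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one.

    Simplification: samples are split in their given order, without shuffling.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Toy usage with made-up values (not data from the study).
toy_actual = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [1.0, 2.0, 3.0, 8.0]
toy_scores = error_metrics(toy_actual, toy_predicted)
toy_folds = list(k_fold_indices(25, k=10))
```

In a full evaluation, each fold's test indices would be scored with `error_metrics` and the per-fold values averaged. Note how a single large miss in the toy data affects RMSE and RRSE much more than MAE and RAE, which is why the two families of metrics can rank models differently.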


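One of the configurations compared in Figure 3 can be sketched in Keras as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the label reading ("16x3" as three hidden layers of 16 nodes, "32-8-32" as the widths of successive hidden layers), the single linear output node and the names `parse_layer_structure`, `build_ann` and `n_inputs` are hypothetical. The default arguments mirror the SELU activation, MAE loss and Adagrad optimizer that the text reports for the ANN2 model.

```python
def parse_layer_structure(label):
    """Turn a Figure 3 layer-structure label into a list of hidden-layer widths.

    Assumed reading: "16x3" means 3 hidden layers of 16 nodes each;
    "32-8-32" lists the width of each hidden layer in order.
    """
    if "x" in label:
        width, depth = label.split("x")
        return [int(width)] * int(depth)
    return [int(w) for w in label.split("-")]


def build_ann(n_inputs, layer_label="32-8-32", optimizer="adagrad",
              activation="selu", loss="mae"):
    """Build and compile a feed-forward regressor (requires TensorFlow)."""
    # Imported lazily so the parser above can be used without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for width in parse_layer_structure(layer_label):
        model.add(keras.layers.Dense(width, activation=activation))
    # Single linear output node: the predicted award price (assumption).
    model.add(keras.layers.Dense(1))
    # "adam", "adamax" and "adagrad" are valid Keras optimizer identifiers.
    model.compile(optimizer=optimizer, loss=loss)
    return model


hidden_layers = parse_layer_structure("32-8-32")
```

Calling `build_ann(n_inputs=10, optimizer="adamax")`, for example, would give the same layer structure trained with a different optimizer, which is the kind of variation the figure compares; the value 10 is arbitrary.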
Table 1 shows that the results obtained with the random forest model were improved upon for all the error metrics (the lowest errors are marked with *). The linear regression model did not perform well: its results are worse than those of the random forest model for all the error metrics. Therefore, it was concluded that this model is not appropriate for the problem at hand. Isotonic regression and MLP performed better. In fact, both improved on the results obtained with the random forest model for some of the error metrics. Isotonic regression improved all the error metrics. MLP substantially improved the results for the RMSE and RRSE error metrics (values marked with *), which are the best among all the models.

For all the error metrics, the ANNs are the models that obtained the best results (values marked with *). The ANN2 architecture improved on the simple MLP and had the best MAE and RAE errors. It comprises only 3 hidden layers with 32-8-32 nodes, the SeLU activation function and the MAE loss function. It was trained using the Adagrad optimizer and appears to be a very promising configuration in terms of the MAE. The simplicity of this network design makes it very suitable in terms of generalization to other data.

Similarly, when considering the RMSE metric, the MLP with default parameters outperformed all the other configurations. Relative errors for these two ANN configurations were also very good. The ANN2 model had the best RAE, and the MLP model had the best RRSE. These results confirmed that these are the best ANN designs among the ones evaluated herein. Experts may select ANN (MLP) or ANN2 depending on the risk they are taking: ANN2 minimizes the absolute error, while ANN (MLP) obtains the minimum value for the square of the errors, which could be considered riskier bidding.

Table 1. Results compared with the baseline model (random forest)

Model                           MAE           RMSE             RAE      RRSE
Random Forest (baseline model)  179,247.80€   6,621,784.24€    31.11%   74.86%
Linear regression               228,491.36€   15,535,231.61€   39.66%   175.63%
Isotonic regression             136,971.39€   5,648,693.54€    23.76%   63.86%
ANN (MLP)                       270,953.50€   1,974,981.24€*   47.03%   22.33%*
ANN1                            140,763.27€   7,416,004.50€    23.03%   83.84%
ANN2                            123,570.91€*  5,110,687.50€    20.22%*  57.78%
ANN3                            157,181.00€   9,543,883.00€    25.71%   107.90%
ANN4                            124,035.82€   3,304,259.20€    20.29%   37.36%

Summarizing, ANNs are very promising models for award price prediction. The quality of the final predictions is very good considering that only 96 ANN designs were tested.

5. Conclusion and Future Work

While the importance of using public datasets to make more efficient use of public resources is generally acknowledged, the choice of the particular type of ML technique to apply to each problem is not straightforward. For award price prediction in public procurement auctions, it was previously reported that the random forest model is an efficient algorithm. The present paper investigates this question considering a larger set of ML models. Extensive experiments were conducted aiming to predict the award price of Spanish tenders.

The contributions of this study are the following. Using different metrics, it was demonstrated that ANNs and isotonic regression can improve on the performance of random forests for the award price estimation of public procurement auctions. Furthermore, the influence of the neural network hyperparameters and gradient optimizers on the performance of the ANN was evaluated in detail, and it was concluded that a careful choice of hyperparameters can further improve the predictions of the model.

These experiments used different error metrics, and the performance of different ML paradigms was evaluated. Upon analysing the obtained results, it was concluded that, among the methods that are not based on ANNs, isotonic regression is the model that gives the best results. Using its implementation in WEKA, it was corroborated that it is a fast and efficient method for training and testing. However, according to all the error metrics considered, the ANN models can outperform the results from isotonic regression. It was shown that a hyperparameter optimization phase can contribute to improving the predictions made by the ANNs.

There are a number of ways in which this work could be extended. Procurement datasets are updated daily, so the size of the dataset can be increased. An update of the dataset to include tender information up to 2021 and a re-evaluation of the performance of the ML algorithms are planned. In addition, three interesting input variables that have not yet been used and that could improve the accuracy of the award price estimator were discovered during the analysis. These variables are the weighting of the price criterion, the number of bidders for each tender, and their economic offers. Unfortunately, this information has not been consistently collected in the Spanish public procurement datasets until now. When these values become available, they will be added to the input variables of this study.

Data Availability

The processed data used to support the findings of this study are available from the corresponding author upon request. The raw data from Spain are available from the Ministry of Finance, Spain. Open data of Spanish tenders are hosted at:

https://www.hacienda.gob.es/es-ES/GobiernoAbierto/Datos%20Abiertos/Paginas/licitaciones_plataforma_contratacion.aspx

REFERENCES
Abadi, M. et al. (2016). TensorFlow: Large- in project tendering, Expert Systems with Applications,
Scale Machine Learning on Heterogeneous 147, 113194. DOI: 10.1016/j.eswa.2020.113194
Distributed Systems. Available at: <http://arxiv.org/
abs/1603.04467>, last accessed: 20 May, 2021. Breiman, L. (2001). Random forests, Machine Learning,
45(1), 5-32. DOI: 10.1023/A:1010933404324
Albano, G. L., Cesi, B. & Iozzi, A. (2017). Public
procurement with unverifiable quality: The case Bzdok, D., Altman, N. & Krzywinski, M. (2018).
for discriminatory competitive procedures, Journal Statistics versus machine learning, Nature Methods,
of Public Economics, 145, 14-26. DOI: 10.1016/j. 15(4), 233-234. DOI: 10.1038/nmeth.4642
jpubeco.2016.11.004
Chakravarti, N. (1989). Isotonic Median Regression:
Ballesteros-Pérez, P., Campo-Hitschfeld, M., Mora- A Linear Programming Approach, Mathematics of
Meliá, D. & Domínguez, D. (2015). Modeling Operations Research, 14(2), 303-308. DOI: 10.1287/
bidding competitiveness and position performance moor.14.2.303
in multi-attribute construction auctions, Operations
Research Perspectives, 2, 24–35. DOI: 10.1016/j. Chou, J. S., Lin, C.-W., Pham, A.-D. & Shao, J.-
orp.2015.02.001 Y. (2015). Optimized artificial intelligence models
for predicting project award price, Automation
Ballesteros-Pérez, P., González-Cruz, M. C. in Construction, 54, 106-115. DOI: 10.1016/j.
& Cañavate-Grimal, A. (2012). Mathematical autcon.2015.02.006
relationships between scoring parameters in
capped tendering, International Journal of Project Conley, T. G. & Decarolis, F. (2016). Detecting
Management, 30(7), 850-862. DOI: 10.1016/j. Bidders Groups in Collusive Auctions, American
ijproman.2012.01.008 Economic Journal: Microeconomics, 8(2), 1-38. DOI:
10.1257/mic.20130254
Ballesteros-Pérez, P., González-Cruz, M. C. &
Cañavate-Grimal, A. (2013). On competitive bidding: Dotoli, M., Epicoco, N. & Falagario, M. (2020).
Scoring and position probability graphs, International Multi-Criteria Decision Making techniques for the
Journal of Project Management, 31(3), 434-448. DOI: management of public procurement tenders: A case
10.1016/j.ijproman.2012.09.012 study, Applied Soft Computing, 88, 106064. DOI:
10.1016/j.asoc.2020.106064
Ballesteros-Pérez, P., Skitmore, M., Pellicer, E. &
Gutierrez, J. H. (2016). Improving the estimation of Duchi, J. C., Bartlett, P. L. & Wainwright, M. J. (2012).
probability of bidder participation in procurement Randomized smoothing for (parallel) stochastic
auctions, International Journal of Project optimization. In 2012 IEEE 51st IEEE Conference on
Management, 34(2), 158-172. DOI: 10.1016/j. Decision and Control (CDC) (pp. 5442-5444). IEEE.
ijproman.2015.11.001 DOI: 10.1109/CDC.2012.6426698

Bilal, M. & Oyedele, L. O. (2020). Big Data with deep European Commission (2017). Public procurement,
learning for benchmarking profitability performance European semester thematic factsheet. Available at:

ICI Bucharest © Copyright 2012-2021. All rights reserved


76 Garcia Rodriguez M. J. , Rodriguez Montequin V., Aranguren Ubierna A., Santana Hermida R. , Sierra Araujo B. , Zelaia Jauregi A.

<https://ec.europa.eu/info/sites/info/files/file_import/ Klambauer, G., Unterthiner, T., Mayr, A. & Hochreiter,


european-semester_thematic-factsheet_public- S. (2017). Self-Normalizing Neural Networks,
procurement_en_0.pdf >, last accessed: 20 May, 2021. Advances in Neural Information Processing Systems,
2017(Decem), 972-981. Available at: <http://arxiv.
European Commission (2020). Unleashing the Full org/abs/1706.02515>, last accessed: 20 May, 2021.
Potential of SMEs, p. 3. DOI: 10.2775/296379
Kuziemski, M. & Misuraca, G. (2020). AI governance
García Rodríguez, M. J., Montequín, V. R., Ortega- in the public sector: Three tales from the frontiers of
Fernández, F. & Balsera, J. (2019a). Public automated decision-making in democratic settings,
Procurement Announcements in Spain: Regulations, Telecommunications Policy, 44(6), 101976. DOI:
Data Analysis, and Award Price Estimator Using 10.1016/j.telpol.2020.101976.
Machine Learning, Complexity, 2019(v), 1-20. DOI:
10.1155/2019/2360610 Lee, I. & Shin, Y. J. (2020). Machine learning for
enterprises: Applications, algorithm selection, and
García Rodríguez, M. J., Montequín, V. R., Ortega- challenges, Business Horizons, 63(2), 157-170. DOI:
Fernández, F. & Balsera, J. (2019b). Spanish Public 10.1016/j.bushor.2019.10.005.
Procurement: Legislation, open data source and
extracting valuable information of procurement Lorentziadis, P. L. (2020). Competitive bidding in
announcements, Procedia Computer Science, 164, asymmetric multidimensional public procurement,
441-448. DOI: 10.1016/j.procs.2019.12.204 European Journal of Operational Research, 282(1),
211-220. DOI: 10.1016/j.ejor.2019.09.005
García Rodríguez, M. J., Montequín, V. R., Ortega-
Fernández, F. & Balsera, J. (2020). Bidders Mohungoo, I., Brown, I. & Kabanda, S. (2020) A
Recommender for Public Procurement Auctions Systematic Review of Implementation Challenges in
Using Machine Learning: Data Analysis, Algorithm, Public E-Procurement, Lecture Notes in Computer
and Case Study with Tenders from Spain, Complexity, Science (including subseries Lecture Notes in Artificial
1, 1-20. DOI: 10.1155/2020/8858258 Intelligence and Lecture Notes in Bioinformatics)
(2020) 12067 LNCS, 46-58. Springer International
Gorgun, M. K., Kutlu, M. & Onur Tas, B. K. Publishing. DOI: 10.1007/978-3-030-45002-1_5
(2020). Predicting The Number of Bidders in
Public Procurement. In 2020 5th International Mullainathan, S. & Spiess, J. (2017). Machine
Conference on Computer Science and Engineering Learning: An Applied Econometric Approach,
(UBMK). (pp. 360-365). IEEE. DOI: 10.1109/ Journal of Economic Perspectives, 31(2), 87-106.
UBMK50275.2020.9219404 DOI: 10.1257/jep.31.2.87

Hall, M. Frank, E., Holmes, G., Bernhard Pfahringer, Obwegeser, N. & Müller, S. D. (2018). Innovation
B., Reutemann, P. & Witten, I. H. (2009). The and public procurement: Terminology, concepts, and
WEKA data mining software, ACM SIGKDD applications, Technovation, 74–75(April 2016), 1-17.
Explorations Newsletter, 11(1), 10-18. DOI: DOI: 10.1016/j.technovation.2018.02.015
10.1145/1656274.1656278
Onur, I. & Tas, B. K. O. (2019). Optimal bidder
Hastie, T., Tibshirani, R. & Friedman, J. (2009). The participation in public procurement auctions,
Elements of Statistical Learning, second edition. New International Tax and Public Finance, 26(3), 595-617.
York, NY: Springer New York. Part of the Springer DOI: 10.1007/s10797-018-9515-2
Series in Statistics book series. DOI: 10.1007/978-0-
387-84858-7 TensorFlow (2021). An end-to-end open source
machine learning platform. Available at:<https://
Huber, M. & Imhof, D. (2019). Machine learning with www.tensorflow.org>, last accessed: 20 May, 2021.
screens for detecting bid-rigging cartels, International
Journal of Industrial Organization, 65, 277-301. DOI: Waikako University (2021). Weka 3: Machine
10.1016/j.ijindorg.2019.04.002 Learning Software in Java. Available at: <http://old-
www.cms.waikato.ac.nz/ml/weka/>, last accessed: 20
Keras. (2021). Optimizers. Available at: <https://keras. May, 2021.
io/api/optimizers/>, last accessed: 20 May, 2021.
Wang, Y. Xi, C., Zhang, S., Yu, D., Zhang, W. & Li,
Kim, J. M. & Jung, H. (2019). Predicting bid Y. (2014). A combination of extended fuzzy AHP and
prices by using machine learning methods, Fuzzy GRA for government e-tendering in hybrid
Applied Economics, 51(19), 2011-2018. DOI: fuzzy environment, Scientific World Journal, 2014(1),
10.1080/00036846.2018.1537477 123675. DOI: 10.1155/2014/123675

Kingma, D. P. & Ba, J. (2014). Adam: A Method Witten, I., Frank, E. & Hall, M. (2011). Data Mining:
for Stochastic Optimization. In 3rd International Practical Machine Learning Tools and Techniques.
Conference on Learning Representations, ICLR 2015 Third. Boston: Elsevier. DOI: 10.1016/C2009-0-
- Conference Track Proceedings (pp. 1–15). Available 19715-5
at: <http://arxiv.org/abs/1412.6980>, last accessed: 20
May, 2021.
