Cryptocurrency Price Forecasting Using XGBoost Regressor and Technical Indicators

This study presents a machine learning approach using the XGBoost regressor model to predict cryptocurrency prices, specifically focusing on Bitcoin. The model incorporates various technical indicators and historical market data to enhance prediction accuracy, demonstrating significant improvements over traditional methods. The research highlights the importance of machine learning in navigating the complexities of the volatile cryptocurrency market, providing valuable insights for traders and investors.


Cryptocurrency Price Forecasting Using XGBoost Regressor and Technical Indicators

Abdelatif Hafid¹, Maad Ebrahim², Ali Alfatemi³, Mohamed Rahouti³, and Diogo Oliveira⁴
¹ ESISA Analytica, Higher School of Engineering in Applied Sciences, Fez, Morocco. abdelatif.hafid@yahoo.com
² GAIA, Ericsson, Montreal, Canada. maad.ebrahim@ericsson.com
³ Computer and Information Science, Fordham University, NY 10023, USA. mrahouti@fordham.edu; aalfatemi@fordham.edu
⁴ College of IST, Penn State University, Monaca, PA 15601, USA. dko5179@psu.edu

arXiv:2407.11786v1 [cs.LG] 16 Jul 2024

Abstract—The rapid growth of the stock market has attracted many investors due to its potential for significant profits. However, predicting stock prices accurately is difficult because financial markets are complex and constantly changing. This is especially true for the cryptocurrency market, which is known for its extreme volatility, making it challenging for traders and investors to make wise and profitable decisions. This study introduces a machine learning approach to predict cryptocurrency prices. Specifically, we make use of important technical indicators such as Exponential Moving Average (EMA) and Moving Average Convergence Divergence (MACD) to train and feed the XGBoost regressor model. We demonstrate our approach through an analysis focusing on the closing prices of Bitcoin cryptocurrency. We evaluate the model’s performance through various simulations, showing promising results that suggest its usefulness in guiding cryptocurrency traders and investors in dynamic market conditions.

Index Terms—Artificial intelligence, Bitcoin, Machine learning, Market forecasting, Price prediction, Regression analysis, XGBoost.

I. INTRODUCTION

Over the past few years, the rapid expansion of the stock market has made it an appealing option for investors seeking high returns and easy access. However, investing in stocks carries inherent risks, underscoring the need for a well-defined investment strategy. Traditionally, investors relied on empirical methods such as technical analysis, guided by financial expertise. With the widespread adoption of financial technology (FinTech), statistical models incorporating machine learning techniques have emerged for forecasting stock price movements. This shift has demonstrated significant success across various markets, including the S&P 500, NASDAQ [1], and the cryptocurrency market [2], [3]. In this research, our emphasis is on the cryptocurrency market, a dynamic force in finance, with a particular focus on Bitcoin price prediction [4].

Furthermore, Blockchain technology, the backbone of cryptocurrencies, has gained substantial attention in the banking and financial industry due to its secure and transparent decentralized database [5]. Despite the advantages of abundant market data and continuous trading, the cryptocurrency market faces challenges such as high price volatility and relatively smaller capitalization. Success in cryptocurrency financial trading hinges on the careful analysis and selection of data, making the development of machine learning models crucial for extracting meaningful insights. Models such as Long Short-Term Memory (LSTM) and Random Forest (RF) are instrumental in predicting cryptocurrency prices by leveraging historical data and patterns, thereby aiding effective decision-making in this volatile market. Despite the potential, there have been limited studies attempting to create successful trading strategies in the cryptocurrency market.

With the advent of FinTech, machine learning models have been increasingly adopted to forecast stock price movements, transforming the landscape of financial analysis and trading. These models leverage large datasets and complex algorithms to identify patterns and predict future price trends, which has led to notable success across various markets, including the S&P 500 and NASDAQ [1]. In the cryptocurrency market, which is characterized by its high volatility and rapid price fluctuations, machine learning techniques have proven particularly valuable. Studies have demonstrated the efficacy of deep learning methods, such as Stacked Denoising Autoencoders (SDAE) and LSTM networks, in predicting Bitcoin prices with high accuracy [2], [3]. These models utilize a variety of inputs, including historical price data, trading volume, public sentiment, and macroeconomic indicators, to generate predictions that can guide investment decisions. The integration of machine learning into FinTech has thus provided investors with powerful tools to navigate the complexities of financial markets, enhancing their ability to make informed and strategic trading decisions.

Despite the advantages of the cryptocurrency market, such as abundant market data and continuous trading, it faces significant challenges like high price volatility and relatively smaller capitalization. Successful trading in this market depends on careful data analysis and selection, making the development of machine learning models crucial for extracting meaningful insights. Models like LSTM and RF are instrumental in predicting cryptocurrency prices by utilizing historical data
and patterns, thus aiding effective decision-making in this volatile landscape. While there have been limited studies on developing successful trading strategies in the cryptocurrency market, our research aims to bridge this gap by introducing a novel machine learning strategy using the XGBoost regressor model, which incorporates essential technical indicators and historical data to enhance financial trading strategies.

This research introduces an efficient machine learning approach for forecasting cryptocurrency prices, specifically focusing on Bitcoin. The motivation behind this study stems from the inherent volatility and complexity of the cryptocurrency market, which pose significant challenges for traders and investors. Traditional methods of technical analysis and empirical strategies are often insufficient in predicting price movements in such a dynamic environment. To address this, we propose using the XGBoost regressor model, a powerful machine learning technique known for its robustness and accuracy. Our methodology integrates a comprehensive set of technical indicators, including the Exponential Moving Average (EMA), Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and other relevant metrics derived from historical market data. The data is sourced from Binance via its API, covering a detailed time span with high-frequency intervals, which allows for capturing rapid market changes.

The proposed model undergoes extensive preprocessing and feature engineering to enhance its predictive capabilities. By employing regularization techniques, we mitigate the risk of overfitting and fine-tune the model parameters through a grid search for optimal performance. Our results demonstrate that the XGBoost regressor model significantly improves prediction accuracy, evidenced by low Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) values, as well as a near-perfect R-squared value. This study contributes to the state-of-the-art by providing a robust and scalable solution for cryptocurrency price prediction, leveraging advanced machine learning techniques to navigate the complexities of financial markets and aiding in informed decision-making for traders and investors.

The key contributions of this paper are summarized as follows:
• Introduce an efficient machine learning strategy using the XGBoost regressor model for cryptocurrency price prediction.
• Integrate a comprehensive set of technical indicators, including EMA, MACD, RSI, and others, with historical market data.
• Employ regularization techniques to mitigate overfitting and fine-tune model parameters through grid search.
• Demonstrate significant improvements in prediction accuracy with low MAE, RMSE, and a near-perfect R-squared value.
• Provide a robust and scalable solution for navigating the complexities of financial markets, aiding informed decision-making for traders and investors.

The rest of the paper is organized as follows. Section II reviews the most relevant existing works. Section III explains how we collected and prepared the data. Section IV proposes the machine learning model and its mathematical formulation. In Section V, we evaluate our proposed model. Section V also provides a comparison between the proposed work and existing studies in the literature. Finally, Section VI concludes the paper.

II. RELATED WORK

The effort to forecast cryptocurrency prices has garnered significant interest in recent years, leading to the development of various methods to address this complex problem [6]. This section reviews advanced studies that employ machine learning for predicting cryptocurrency prices, with a particular focus on Bitcoin due to its dominant position and the extensive availability of data.

Among these advancements, machine learning has significantly impacted cryptocurrency price forecasting by providing models that adeptly navigate the complex and volatile digital currency market [7]. These methods range from simple regression models to advanced deep learning networks, each capable of detecting patterns and predicting future prices based on historical data [8].

Cryptocurrency value fluctuations are influenced by numerous factors, which has prompted the adoption of machine learning for price prediction [9], [10]. For instance, Greaves and Au [11] investigated using network attributes and machine learning to predict Bitcoin prices. Similarly, Jang and Lee [12] combined blockchain-related features, time series analysis, and Bayesian neural networks (BNNs) for Bitcoin price analysis.

Building on this foundation, further research in [13], [14], and [15] has applied machine learning to Bitcoin price forecasting. Saad et al. [15] not only predicted prices but also identified critical network attributes and user behaviors influencing price variations in Bitcoin and Ethereum [16], alongside the supply and demand dynamics of cryptocurrencies. Additionally, Sin and Wang [17] utilized neural networks for price predictions, leveraging blockchain data features.

Continuing this trend, Christoforou et al. [18] developed a Bitcoin price prediction model using neural networks, focusing on factors affecting price volatility and utilizing blockchain data and network activity metrics for forecasting. Furthermore, Chen et al. [19] and Akyildirim et al. [20] demonstrated the application of machine learning in forecasting Bitcoin prices and mid-price movement of Bitcoin futures, respectively. These studies highlight the ability of machine learning to harness vast datasets and identify complex patterns, enhancing predictive accuracy beyond traditional statistical approaches.

Moreover, some studies have demonstrated the effectiveness of combining machine learning techniques with blockchain data for cryptocurrency price forecasting. For example, Martin et al. [21] introduced a hybrid method that merges diverse data and analytical techniques, enhancing accuracy in this complex field. Liu et al. [22] focused on optimizing performance and interpretability in financial time series, showcasing the benefits of combining various machine learning approaches. He et al. [23] developed a deep learning ensemble model for financial time series forecasting, applicable to cryptocurrencies, illustrating the increased reliability and accuracy of multiple deep learning [24] strategies.

Additionally, Nazareth and Reddy [8] reviewed machine learning in finance, highlighting hybrid models’ effectiveness in handling financial market complexities. Further research by Nagula and Alexakis [25], Petrovic et al. [26], Gupta and Nalavade [27], and Luo et al. [28] underscores the success of diverse computational techniques in improving Bitcoin price predictions, advancing sophisticated, accurate models for cryptocurrency investments.

In conclusion, machine learning not only excels in predictive accuracy but also in adaptability and scalability, both of which are essential as the cryptocurrency market evolves. With the capacity to update models with new data, machine learning remains a vital tool for cryptocurrency trading and investment, ensuring timely and precise forecasts [19], [20].

Unlike existing studies, our work introduces a novel machine learning strategy that leverages the XGBoost regressor model, combining a range of technical indicators such as EMA and MACD with historical data for Bitcoin price prediction. This approach emphasizes the use of regularization techniques to prevent overfitting and the fine-tuning of model parameters for enhanced accuracy. Our methodology stands out by effectively integrating diverse datasets and analytical techniques, ensuring robust and precise predictions in the highly volatile cryptocurrency market.

III. DATA

This section covers the essential notations and abbreviations, explains the data collection process, details the preprocessing steps, and discusses the engineering of additional features. Table I provides the definitions of the parameters and abbreviations used in this paper.

A. Data Collection and Preprocessing

We obtained Bitcoin historical market data from Binance via the Binance API [29]. The dataset spans from February 1, 2021, to February 1, 2022, with a time interval of 15 minutes (Δt = 15 minutes). This interval was chosen for its balance between capturing detailed market fluctuations and maintaining accuracy. The data is split into 80% for the training set and 20% for the testing set. The choice of a shorter time interval is particularly important due to the high volatility of the Bitcoin market, where rapid changes are frequent. In such highly volatile markets, shorter intervals are essential for accurately capturing these swift price movements, unlike in less volatile markets where longer intervals might suffice.

Figure 1 illustrates the Bitcoin close price over time in USD. The x-axis represents dates, while the y-axis represents the price in USD. The plot provides a visual representation of the fluctuation in Bitcoin’s closing price over the observed period, enabling insights into the cryptocurrency’s price trend and volatility.

Fig. 1: Bitcoin close price over time.

We use StandardScaler from the sklearn.preprocessing module to scale the data. Let $x_{ij}$ denote the elements of the matrix $X$, where $i$ is the row index (sample) and $j$ is the column index (feature). The transformation applied by the StandardScaler to each feature $j$ is as follows:

1) Compute the mean and standard deviation of feature $j$:

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}, \qquad \sigma_j = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (x_{ij} - \mu_j)^2}$$

where $m$ is the number of samples (rows) and $x_{ij}$ is the element at the $i$-th row and $j$-th column of $X$.

2) Apply the transformation to each element of feature $j$:

$$x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$

where $x'_{ij}$ is the scaled value of $x_{ij}$.

B. Feature Engineering

In this section, we elaborate on the various features incorporated in this case study, employing both historical market data and technical indicators.

1) Historical Data: In historical data analysis, we utilize various metrics to understand the behavior of Bitcoin prices within specific time periods. These metrics include:
• Open price ($O_p$): The initial price of Bitcoin at the beginning of a specific time period.
• Highest price ($H_p$): The maximum price of Bitcoin recorded during a time period.
• Lowest price ($L_p$): The minimum price of Bitcoin recorded during a time period.
• Close price ($C_p$): The final price of Bitcoin at the end of a time period.
• Trading volume (V): The total amount of Bitcoin traded within a time period.
• Quote Asset Volume (QAV): The total trading value of Bitcoin within a time period.
• Number of Trades (NOT): The total number of trades executed during a time period.
• Total Buy Base Volume (TBBV): The total volume of Bitcoin bought during a time period.
• Total Buy Quote Volume (TBQV): The total value of Bitcoin bought during a time period.
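
The data preparation described in Section III can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the field order follows the candlestick (kline) rows returned by Binance's REST API, the helper names (`parse_kline`, `chronological_split`, `fit_scaler`, `transform`) are hypothetical, the rows are synthetic, and the split is assumed to be chronological because the paper only states the 80%/20% proportions. Scaling statistics are computed on the training portion only and then reused for the test portion.

```python
import statistics

# Field order of one kline row as returned by Binance's REST klines endpoint,
# renamed to the paper's notation. All values used below are synthetic.
KLINE_FIELDS = ["open_time", "Op", "Hp", "Lp", "Cp", "V", "close_time",
                "QAV", "NOT", "TBBV", "TBQV", "ignore"]
FEATURES = ["Op", "Hp", "Lp", "Cp", "V", "QAV", "NOT", "TBBV", "TBQV"]


def parse_kline(row):
    """Turn one raw kline row into a dict of float features."""
    record = dict(zip(KLINE_FIELDS, row))
    return {name: float(record[name]) for name in FEATURES}


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling (an assumption; the paper
    only states the 80%/20% proportions)."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_scaler(rows):
    """Per-feature mean and population standard deviation (the 1/m form);
    a zero deviation is replaced by 1 to avoid division by zero."""
    stats = {}
    for name in FEATURES:
        column = [r[name] for r in rows]
        stats[name] = (statistics.fmean(column),
                       statistics.pstdev(column) or 1.0)
    return stats


def transform(rows, stats):
    """Apply x' = (x - mu) / sigma using previously fitted statistics."""
    return [{n: (r[n] - stats[n][0]) / stats[n][1] for n in FEATURES}
            for r in rows]


# Ten synthetic 15-minute candles (not real market data).
raw = [[i * 900_000, str(100 + i), str(102 + i), str(99 + i), str(101 + i),
        str(10 + i), (i + 1) * 900_000 - 1, str(1000 + 10 * i), 50 + i,
        str(5 + i), str(500 + 5 * i), "0"] for i in range(10)]

rows = [parse_kline(r) for r in raw]
train, test = chronological_split(rows)
stats = fit_scaler(train)            # fit on the training set only
train_scaled = transform(train, stats)
test_scaled = transform(test, stats)
print(len(train), len(test))         # 8 2
```

Fitting the statistics on the training rows alone keeps information from the test period out of the scaled training features.
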
TABLE I: Notations and abbreviations used in the paper.

Notation | Description
---------|------------
$m$ | Total number of samples or observations
$m_{train}$ | Number of training samples
$m_{test}$ | Number of testing samples
$n$ | Number of features
$C_p^{(i)}$ | Close price at time $t_i$
$O_p^{(i)}$ | Opening price at time $t_i$
$H_p^{(i)}$ | High price at time $t_i$
$L_p^{(i)}$ | Low price at time $t_i$
$V^{(i)}$ | Volume of the cryptocurrency being traded at time $t_i$
$\mathrm{QAV}^{(i)}$ | Total trading value at time $t_i$
$\mathrm{NOT}^{(i)}$ | Number of trades at time $t_i$
$\mathrm{TBBV}^{(i)}$ | Total volume of Bitcoin bought at time $t_i$
$\mathrm{RSI}_\alpha^{(i)}$ | Relative strength index at time $t_i$ within a time period $\alpha$
$\mathrm{MACD}^{(i)}$ | Moving average convergence divergence at time $t_i$
$\mathrm{EMA}_\alpha^{(i)}$ | Exponential moving average at time $t_i$ within a period of time $\alpha$
$\mathrm{PROC}_\alpha^{(i)}$ | Price rate of change at time $t_i$ within a period of time $\alpha$
$\%K_\alpha^{(i)}$ | Stochastic oscillator at time $t_i$ within a period of time $\alpha$
$\mathrm{MOM}_\alpha^{(i)}$ | Momentum at time $t_i$ within a period of time $\alpha$
$\eta$ | Learning rate
$\lambda$, $\alpha$ | Regularization parameters
$N$ | Number of trees
$\Delta t$ | Time interval
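
The indicator notation in Table I can be made concrete with a short sketch. The paper does not spell out the formulas, so the functions below use common textbook definitions and should be read as assumptions: the EMA is seeded with the first price, MACD uses the conventional 12/26 periods, and RSI uses simple averages of gains and losses rather than Wilder's smoothing. Function names and the sample prices are illustrative only.

```python
def ema(prices, period):
    """Exponential moving average with smoothing factor 2 / (period + 1),
    seeded with the first price."""
    k = 2.0 / (period + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(k * p + (1 - k) * out[-1])
    return out


def macd(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA (12/26 are conventional defaults,
    not values stated in the paper)."""
    return [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]


def momentum(prices, period):
    """MOM: price change over `period` steps; None until enough history."""
    return [None if i < period else prices[i] - prices[i - period]
            for i in range(len(prices))]


def proc(prices, period):
    """Price rate of change in percent over `period` steps."""
    return [None if i < period
            else 100.0 * (prices[i] - prices[i - period]) / prices[i - period]
            for i in range(len(prices))]


def rsi(prices, period):
    """RSI from simple averages of gains and losses over the last `period`
    price changes (Wilder's smoothed variant is a common alternative)."""
    out = [None] * len(prices)
    for i in range(period, len(prices)):
        changes = [prices[j] - prices[j - 1]
                   for j in range(i - period + 1, i + 1)]
        gain = sum(c for c in changes if c > 0) / period
        loss = -sum(c for c in changes if c < 0) / period
        out[i] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out


def stochastic_k(highs, lows, closes, period):
    """%K: position of the close within the high-low range of the window."""
    out = [None] * len(closes)
    for i in range(period - 1, len(closes)):
        hi = max(highs[i - period + 1:i + 1])
        lo = min(lows[i - period + 1:i + 1])
        out[i] = 50.0 if hi == lo else 100.0 * (closes[i] - lo) / (hi - lo)
    return out


# Synthetic closing prices (not real market data).
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]

print(momentum(closes, 3)[-1])         # 106 - 101 = 5.0
print(round(proc(closes, 3)[-1], 4))   # 100 * 5 / 101 = 4.9505
print(round(rsi(closes, 5)[-1], 4))    # 80.0
print(stochastic_k(highs, lows, closes, 3)[-1])   # 50.0
```

The leading `None` entries mark time steps without enough history; in practice such rows would be dropped before training.
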

2) Technical Indicators: Technical analysis is a trading discipline used to assess investments and pinpoint trading opportunities by analyzing statistical trends derived from trading activity, such as price movements and volume [30]. In this study, we explore indicators to feed our machine learning model, such as EMA, MACD, relative strength index, momentum, price rate of change, and stochastic oscillator.

We employ EMA with different periods, where $\mathrm{EMA}_{10}$, $\mathrm{EMA}_{30}$, and $\mathrm{EMA}_{200}$ represent the exponentially weighted average price of Bitcoin over the last 10, 30, and 200 periods, respectively. To measure the magnitude of recent price changes and evaluate overbought or oversold conditions, we use RSI. Specifically, $\mathrm{RSI}_{10}$, $\mathrm{RSI}_{14}$, $\mathrm{RSI}_{30}$, and $\mathrm{RSI}_{200}$ assess price changes over 10, 14, 30, and 200 periods, respectively. In addition, we apply Momentum (MOM) indicators to gauge the rate of change in Bitcoin prices, with $\mathrm{MOM}_{10}$ and $\mathrm{MOM}_{30}$ reflecting changes over the last 10 and 30 periods, respectively.

Furthermore, we incorporate MACD, a trend-following momentum indicator that illustrates the relationship between two moving averages of Bitcoin prices. Additionally, we use $\%K_{10}$, $\%K_{30}$, and $\%K_{200}$ as components of the stochastic oscillator, which compare the current price of Bitcoin to its price range over the last 10, 30, and 200 periods, respectively. Finally, we include the Price Rate of Change with 9 periods ($\mathrm{PROC}_{9}$), measuring the percentage change in Bitcoin prices over the last 9 periods.

IV. METHODOLOGY

This section details the proposed methodology for our machine learning approach to cryptocurrency price forecasting. Algorithm 1 outlines our machine learning approach for cryptocurrency price forecasting using the XGBoost regressor model combined with various technical indicators such as EMA, MACD, RSI, and more. The process includes data collection and preprocessing, feature engineering, model training with hyperparameter tuning, and model evaluation. Details of this methodology are discussed in the following subsections.

Let $(x^{(i)}, y^{(i)})$ denote a single sample (observation). The set of samples is represented by:

$$S = \left\{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)}) \right\}$$

where $x^{(i)} \in \mathbb{R}^n$ and $y^{(i)} = C_p^{(i)}$.

Considering both technical indicators and historical data for price prediction necessitates the integration of diverse datasets. To achieve this, we combine technical indicators and historical data as inputs to our model. The feature vector at a given time $t_i$ can be expressed as follows:
Algorithm 1 Cryptocurrency price forecasting using XGBoost regressor and technical indicators.
Input: Historical market data H = {h_1, h_2, ..., h_T}, technical indicators T = {t_1, t_2, ..., t_N}, target variable Y = {y_1, y_2, ..., y_T}
Output: Trained XGBoost regressor model M
procedure DATA PREPARATION
    // Collect and preprocess data
    D ← Collect data from Binance API
    D_train, D_test ← Split data into training and testing
    // Scale features
    D_train ← StandardScaler.fit_transform(D_train)
    D_test ← StandardScaler.transform(D_test)
end procedure
procedure FEATURE ENGINEERING
    // Extract historical data features
    H ← Extract historical data features from D
    // Calculate technical indicators
    T ← Calculate technical indicators (EMA, MACD, RSI, etc.)
    // Combine features
    X ← H ∪ T
end procedure
procedure MODEL TRAINING
    // Initialize XGBoost regressor
    M ← XGBoost Regressor
    // Define hyperparameter grid
    G ← {η, D_max, W_min, N, λ, α, γ, S, C}
    // Perform grid search with cross-validation
    θ ← GridSearchCV(M, G, scoring=RMSE)
    // Train model with best hyperparameters θ
    M ← XGBoost Regressor(θ).fit(X_train, Y_train)
end procedure
procedure MODEL EVALUATION
    // Predict on test set
    Ŷ ← M.predict(X_test)
    // Calculate evaluation metrics
    MAE ← $\frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
    RMSE ← $\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
    R² ← $1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
    // Display results
    Display (MAE, RMSE, R²)
end procedure
return Trained model M
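
The grid-search step of Algorithm 1 can be illustrated by enumerating the grid of Table II. The sketch below only counts and lists candidate configurations; it does not train any model. The dictionary keys are parameter names of XGBoost's scikit-learn wrapper, and mapping S to `subsample` and C to `colsample_bytree` is an assumption, since the paper only describes them as subsampling ratios. The fold count is likewise hypothetical because the paper does not state it.

```python
from itertools import product

# Grid values from Table II, keyed by XGBoost scikit-learn parameter names.
# The S -> subsample and C -> colsample_bytree mapping is an assumption.
param_grid = {
    "n_estimators": [300, 400],          # N
    "learning_rate": [0.01, 0.1, 0.2],   # eta
    "max_depth": [3, 4],                 # D_max
    "min_child_weight": [1, 3],          # W_min
    "subsample": [0.8, 1.0],             # S
    "colsample_bytree": [0.8, 1.0],      # C
    "gamma": [0, 0.1],                   # gamma
    "reg_alpha": [0.5, 1],               # alpha (L1)
    "reg_lambda": [0.5, 1],              # lambda (L2)
}


def iter_grid(grid):
    """Yield every combination of the grid as a parameter dictionary."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(iter_grid(param_grid))
n_candidates = len(candidates)           # 3 * 2**8 = 768

# Hypothetical fold count; each candidate is fitted once per fold.
n_folds = 5
total_fits = n_candidates * n_folds      # 3840

# The configuration reported as best in the paper is one of the grid points.
reported_best = {
    "n_estimators": 300, "learning_rate": 0.2, "max_depth": 4,
    "min_child_weight": 3, "subsample": 1.0, "colsample_bytree": 0.8,
    "gamma": 0, "reg_alpha": 1, "reg_lambda": 0.5,
}
print(n_candidates, total_fits, reported_best in candidates)  # 768 3840 True
```

A dictionary of this shape is what an exhaustive grid search would consume, scoring each candidate by cross-validated RMSE and keeping the lowest.
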

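$$x^{(i)} = \left[ C_p^{(i)},\ V^{(i)},\ \mathrm{QAV}^{(i)},\ \mathrm{NOT}^{(i)},\ \mathrm{TBBV}^{(i)},\ \mathrm{RSI}_{14}^{(i)},\ \mathrm{RSI}_{30}^{(i)},\ \mathrm{RSI}_{200}^{(i)},\ \mathrm{MOM}_{10}^{(i)},\ \mathrm{MOM}_{30}^{(i)},\ \mathrm{MACD}^{(i)},\ \mathrm{PROC}_{9}^{(i)},\ \mathrm{EMA}_{10}^{(i)},\ \mathrm{EMA}_{30}^{(i)},\ \mathrm{EMA}_{200}^{(i)},\ \%K_{10}^{(i)},\ \%K_{30}^{(i)},\ \%K_{200}^{(i)} \right]^{\top}, \quad x^{(i)} \in \mathbb{R}^n \tag{1}$$

To extend the generality of our model, we stack all feature vectors into a matrix $X$ whose rows are samples and whose columns are the features $C_p, V, \ldots, \%K_{200}$:

$$X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
x_{31} & x_{32} & \cdots & x_{3n} \\
\vdots & \vdots & \cdots & \vdots \\
x_{m-1,1} & x_{m-1,2} & \cdots & x_{m-1,n} \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{bmatrix} \tag{2}$$

Where:

$$C_p = \begin{bmatrix} C_p^{(1)} \\ C_p^{(2)} \\ \vdots \\ C_p^{(m)} \end{bmatrix}, \quad
V = \begin{bmatrix} V^{(1)} \\ V^{(2)} \\ \vdots \\ V^{(m)} \end{bmatrix}, \quad \ldots, \quad
\%K_{200} = \begin{bmatrix} \%K_{200}^{(1)} \\ \%K_{200}^{(2)} \\ \vdots \\ \%K_{200}^{(m)} \end{bmatrix}$$

The output vector can then be expressed as follows: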
$$Y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

In this case study, the problem is to minimize the cost function for XGBoost regressor, which is a regularized finite-sum minimization problem defined as:

$$\min_{\Theta} J(\Theta) := \sum_{i=1}^{m_{train}} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} R(f_k) \tag{3}$$

Where:
• $\Theta$ represents the set of parameters to be learned during training.
• $L(y_i, \hat{y}_i)$ is the loss function that measures the difference between the true target value $y_i$ and the predicted target value $\hat{y}_i$ for the $i$-th instance. In the context of this case study, we employ the squared-error loss, whose average over the samples is the mean squared error (MSE):

$$L(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2 \tag{4}$$

Here, $y_i$ is the true target value for sample $i$, and $\hat{y}_i$ is the predicted target value for sample $i$.
• $R(f_k)$ represents the regularization term for each tree to control its complexity. It typically includes both L1 and L2 regularization. Assuming $T$ is the number of leaves in tree $f_k$ and $w_{j,k}$ is the weight for leaf $j$ in tree $f_k$, the regularization term for tree $f_k$ is:

$$R(f_k) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_{j,k}^2 + \alpha \sum_{j=1}^{T} |w_{j,k}| \tag{5}$$

The regularization terms $R(f_k)$ help control the complexity of individual trees in the ensemble, preventing overfitting.

During training, XGBoost regressor aims to find the set of parameters ($\Theta$) that minimizes the overall cost function. The optimization is typically performed using techniques like gradient boosting, which involves iteratively adding weak learners to the ensemble to reduce the residual errors [31].

TABLE II: Parameter grid for GridSearchCV.

Parameter | Values
----------|-------
$N$ | 300, 400
$\eta$ | 0.01, 0.1, 0.2
$D_{max}$ | 3, 4
$W_{min}$ | 1, 3
$S$ | 0.8, 1.0
$C$ | 0.8, 1.0
$\gamma$ | 0, 0.1
$\alpha$ | 0.5, 1
$\lambda$ | 0.5, 1

Table II presents a parameter grid used in GridSearchCV, a technique for hyperparameter tuning in machine learning models. Hyperparameters are predefined settings that control the learning process of algorithms. The table lists various hyperparameters commonly used in the XGBoost regressor model, a popular gradient boosting framework [31]. Each hyperparameter is accompanied by its corresponding values that are explored during the grid search process. For instance, $N$ represents the number of estimators (trees) in the XGBoost model, with values of 300 and 400 being considered. Similarly, $\eta$ denotes the learning rate, with potential values of 0.01, 0.1, and 0.2.

Other hyperparameters include $D_{max}$ for maximum depth of trees, $W_{min}$ for minimum child weight, $S$ for subsampling ratio, $C$ for column subsampling ratio, $\gamma$ for minimum loss reduction required to make further splits, $\alpha$ for L1 regularization term on weights, and $\lambda$ for L2 regularization term on weights.

This parameter grid serves as a roadmap for systematically exploring various combinations of hyperparameters to identify the optimal configuration for the XGBoost model, thereby enhancing its predictive performance. The best combination of hyperparameters for the XGBoost model was selected based on the smallest RMSE. The chosen parameters are as follows:

$$\Theta = \begin{pmatrix}
C & \gamma & \eta & D_{max} & W_{min} & N & \alpha & \lambda & S \\
0.8 & 0 & 0.2 & 4 & 3 & 300 & 1 & 0.5 & 1.0
\end{pmatrix}$$

Finally, the RMSE achieved with this parameter combination is the smallest observed during the hyperparameter tuning process.

V. RESULTS AND ANALYSIS

In this section, we provide simulation-based evaluations of the proposed machine learning model. In particular, we compute the Mean Absolute Error (MAE), RMSE, and R-squared ($R^2$).

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{6}$$

MAE provides a simple and straightforward interpretation of the average absolute deviation between the predicted and actual values. It is easy to understand and is less sensitive to outliers compared to other metrics like RMSE.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{7}$$

RMSE provides a measure of the average magnitude of prediction errors in the same units as the target variable. It penalizes larger errors more heavily than MAE, making it particularly useful when large errors are undesirable.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{8}$$

where $\bar{y}$ is the mean of the actual values of the target variable.

The $R^2$ score indicates how well the model fits the data relative to a simple baseline model (e.g., a model that always predicts the mean). It is at most 1 and can be negative when the model performs worse than that baseline;
higher values indicate a better fit. R2 score is widely used for comparing different models and assessing overall model performance.

TABLE III: Model evaluation metrics.

Metric | Value
-------|------
RMSE | 59.9504
MAE | 46.2229
R2 | 0.9999

Table III presents key evaluation metrics for our regression model. The RMSE is 59.9504, indicating the square root of the average squared difference between predicted and actual values. The MAE is 46.2229, indicating the average absolute difference between predicted and actual values. The model’s R2 Score is 0.9999, reflecting an exceptionally strong fit to the data. Overall, the model demonstrates high accuracy and predictive capability, with low errors and a near-perfect R2 score.

Another way to assess the performance of the XGBoost Regressor model is to analyze the relationship between the predicted values and the residuals. Let $y_{test}$ be the true target values from the test dataset, $\hat{y}_{pred}$ be the predicted target values from the model, and $\varepsilon$ be the residuals calculated as $\varepsilon = y_{test} - \hat{y}_{pred}$.

Figure 2 shows a scatter plot of the residuals against the predicted values. The plot displays the relationship between the predicted values (scaled by 1000) and the residuals (scaled by 10). A horizontal dashed line at y = 0 indicates perfect prediction, where residuals are centered around zero. The plot illustrates the model’s ability to predict accurately across the range of predicted values.

Fig. 2: Scatter plot showing the residuals against the predicted values.

Furthermore, Figure 3 presents a scatter plot depicting the comparison between predicted values (in 1000s) and actual values (in 1000s). The diagonal dashed red line represents ideal prediction, where actual values align perfectly with predicted values. This plot offers insight into the model’s efficacy across the spectrum of actual values, showcasing its predictive performance.

Fig. 3: Scatter plot of actual vs. predicted values.

A. State-of-the-Art Comparison

Lastly, this subsection provides a comparison between the work proposed in this paper and existing studies in the literature.

Table IV provides a comprehensive comparison of various machine learning approaches in financial forecasting and trading. Shynkevich et al. [32] leverage machine learning algorithms on daily stock price time series, achieving optimal performance by analyzing different forecast horizons and input window lengths. Similarly, Liu et al. [2] employ SDAE deep learning models utilizing historical data, public attention, and macroeconomic factors, which result in superior prediction accuracy. In addition, Jaquart et al. [33] implement ensemble machine learning models on cryptocurrency market data (streamed from CoinGecko [35]), producing statistically significant predictions and incorporating a long-short portfolio strategy. Furthermore, Hafid et al. [3] use a Random Forest classifier with historical data and a few technical indicators to achieve high accuracy in market trend prediction, effectively signaling buy and sell moments.

Saad et al. [15] integrate economic theories with machine learning, analyzing user and network activity to attain high accuracy in price prediction and offer insights into network dynamics. Moreover, Akyildirim et al. [34] apply SVM, LR, ANN, and RF algorithms on historical price data and technical indicators, demonstrating consistent predictive accuracy and trend predictability. In contrast, this paper introduces a novel approach using an XGBoost regressor with technical indicators and historical data, achieving low MAE, RMSE, and an R2 value close to 1, thereby contributing a new machine learning strategy to the field.
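
The evaluation quantities of Section V, Eqs. (6) to (8) and the residuals ε = y_test − ŷ_pred, can be sketched directly from their definitions. The numbers below are synthetic and chosen so the arithmetic is easy to follow; they are not the paper's data, and the function names are illustrative. The second prediction vector shows R² dropping below zero for predictions that are worse than always predicting the mean.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (6)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (7)."""
    squared = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (8)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def residuals(y_true, y_pred):
    """Residuals epsilon = y_test - y_pred, as plotted against predictions."""
    return [y - p for y, p in zip(y_true, y_pred)]


# Synthetic values for illustration only.
y_test = [100.0, 102.0, 101.0, 105.0]
y_good = [101.0, 101.0, 102.0, 104.0]
y_poor = [105.0, 101.0, 102.0, 100.0]

print(mae(y_test, y_good), rmse(y_test, y_good))   # 1.0 1.0
print(round(r2(y_test, y_good), 4))                # 0.7143
print(round(r2(y_test, y_poor), 4))                # -2.7143
print(residuals(y_test, y_good))                   # [-1.0, 1.0, -1.0, 1.0]
```

Because MAE and RMSE are expressed in the units of the target, their size is best judged relative to the price level of the series being predicted.
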
TABLE IV: Comparison of machine learning approaches in financial forecasting and trading.

| Paper | Methodology | Data Utilization | Model Performance | Contribution to Field |
|---|---|---|---|---|
| Shynkevich et al. [32] | Machine learning algorithms | Daily price time series for stocks | Optimal performance with varied metrics | Impact of forecast horizon and input window length |
| Liu et al. [2] | SDAE, deep learning | Historical data, public attention, macroeconomic environment | Superior performance in predictions | Improved prediction with SDAE, RMAE, and DA |
| Jaquart et al. [33] | Ensemble machine learning models | Cryptocurrency market data | Statistically significant predictions | Long-short portfolio strategy |
| Hafid et al. [3] | RF classifier | Historical data, few technical indicators | High accuracy in market trend prediction | Effective market trend prediction (buy & sell) |
| Saad et al. [15] | Integration of economic theories with machine learning | User and network activity | High accuracy in price prediction | Understanding network dynamics |
| Akyildirim et al. [34] | SVM, LR, ANN, RF | Historical price data, technical indicators | Predictive accuracy in price trends | Evidence of predictability in trends |
| This paper | XGBoost regressor | Technical indicators, historical data | Low MAE, RMSE, $R^2$ ≈ 1 | Novel machine learning strategy |
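
The technical-indicator inputs listed for this paper can be illustrated with a short sketch of three widely used indicators: EMA, MACD, and RSI. This is only an example under stated assumptions, not the authors' feature pipeline: the window lengths (12, 26, 14) are conventional defaults, the EMA is seeded with the first price, the RSI uses simple averages rather than Wilder smoothing, and the price series is made up.

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1).

    Assumption: the series is seeded with the first observation.
    """
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def macd(values, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA, element-wise."""
    return [f - s for f, s in zip(ema(values, fast), ema(values, slow))]


def rsi(values, period=14):
    """RSI over the last `period` price changes.

    Assumption: average gain and loss are simple means (no Wilder smoothing).
    """
    if len(values) < period + 1:
        raise ValueError("need at least period + 1 prices")
    changes = [b - a for a, b in zip(values[-period - 1:-1], values[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical closing prices, for illustration only.
closes = [100.0 + (i % 5) - 0.3 * i for i in range(40)]

features = {
    "ema_12": ema(closes, 12)[-1],
    "macd": macd(closes)[-1],
    "rsi_14": rsi(closes, 14),
}
print(features)
```

Values such as these, computed per time step alongside historical prices and volume, are the kind of feature columns that a regressor could be trained on.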

VI. CONCLUSION

Our research highlights the efficacy of the XGBoost regressor model in forecasting Bitcoin prices using a combination of technical indicators and historical market data. The model’s performance, as evidenced by the low Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) along with a near-perfect $R^2$ value, underscores its potential in providing accurate and reliable predictions in the highly volatile cryptocurrency market. By incorporating regularization techniques to mitigate overfitting and fine-tuning model parameters through an extensive grid search, we have achieved a robust predictive model. Furthermore, the use of various technical indicators such as the Exponential Moving Average (EMA), Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and others, in conjunction with historical prices and volume data, has proven effective in enhancing the model’s predictive capabilities. This approach not only offers a comprehensive analysis of market trends but also facilitates better decision-making for traders and investors.
This work contributes to the field of financial forecasting, particularly in the domain of cryptocurrency price prediction. The findings suggest that machine learning models, when properly calibrated and integrated with relevant technical indicators, can serve as powerful tools for navigating the complexities of financial markets. Future research could further explore the integration of additional data sources and advanced machine learning techniques to continue improving the accuracy and applicability of such models in dynamic trading environments.

REFERENCES
[1] Y.-L. Hsu, Y.-C. Tsai, and C.-T. Li, “Fingat: Financial graph attention networks for recommending top-k profitable stocks,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 1, pp. 469–481, 2021.
[2] M. Liu, G. Li, J. Li, X. Zhu, and Y. Yao, “Forecasting the price of bitcoin using deep learning,” Finance research letters, vol. 40, p. 101755, 2021.
[3] A. Hafid, A. S. Hafid, and D. Makrakis, “Bitcoin price prediction using machine learning and technical indicators,” in International Symposium on Distributed Computing and Artificial Intelligence. Springer, 2023, pp. 275–284.
[4] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Decentralized business review, 2008. [Online]. Available: http://dx.doi.org/10.2139/ssrn.3440802
[5] A. Hafid, A. S. Hafid, and M. Samih, “Scaling blockchains: A comprehensive survey,” IEEE Access, vol. 8, pp. 125244–125262, 2020.
[6] H. Sebastião and P. Godinho, “Forecasting and trading cryptocurrencies with machine learning under changing market conditions,” Financial Innovation, vol. 7, no. 1, pp. 1–30, 2021.
[7] A. M. Khedr et al., “Cryptocurrency price prediction using traditional statistical and machine-learning techniques: A survey,” Intelligent Systems in Accounting, Finance and Management, vol. 28, no. 1, pp. 3–34, 2021.
[8] N. Nazareth and Y. V. R. Reddy, “Financial applications of machine learning: A literature review,” Expert Systems with Applications, vol. 219, p. 119640, 2023.
[9] S. Tanwar et al., “Machine learning adoption in blockchain-based smart applications: The challenges, and a way forward,” IEEE Access, vol. 8, pp. 474–488, 2019.
[10] J. B. Awotunde et al., “Machine learning algorithm for cryptocurrencies price prediction,” in Artificial Intelligence for Cyber Security: Methods, Issues and Possible Horizons or Opportunities. Springer, 2021, pp. 421–447.
[11] A. Greaves and B. Au, “Using the bitcoin transaction graph to predict the price of bitcoin,” No Data, 2015.
[12] H. Jang and J. Lee, “An empirical study on modeling and prediction of bitcoin prices with bayesian neural networks based on blockchain information,” IEEE Access, vol. 6, pp. 5427–5437, 2017.
[13] S. McNally et al., “Predicting the price of bitcoin using machine learning,” in 2018 26th euromicro international conference on parallel, distributed and network-based processing (PDP). IEEE, 2018, pp. 339–343.
[14] S. Velankar et al., “Bitcoin price prediction using machine learning,” in 2018 20th International Conference on Advanced Communication Technology (ICACT). IEEE, 2018, pp. 144–147.
[15] M. Saad et al., “Toward characterizing blockchain-based cryptocurrencies for highly accurate predictions,” IEEE Systems Journal, vol. 14, no. 1, pp. 321–332, 2019.
[16] G. Wood et al., “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum project yellow paper, vol. 151, no. 2014, pp. 1–32, 2014.
[17] E. Sin and L. Wang, “Bitcoin price prediction using ensembles of neural networks,” in International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE, 2017, pp. 666–671.
[18] E. Christoforou et al., “Neural networks for cryptocurrency evaluation and price fluctuation forecasting,” in Mathematical Research for Blockchain Economy. Springer, 2020, pp. 133–149.
[19] Z. Chen, C. Li, and W. Sun, “Bitcoin price prediction using machine learning: An approach to sample dimension engineering,” Journal of Computational and Applied Mathematics, vol. 365, p. 112395, 2020. [Online]. Available: https://doi.org/10.1016/j.cam.2019.112395
[20] E. Akyildirim et al., “Forecasting mid-price movement of bitcoin futures using machine learning,” Annals of Operations Research, vol. 330, no. 1, pp. 553–584, 2023.
[21] K. Martin et al., “Combining blockchain and machine learning to forecast cryptocurrency prices,” in International Conference on Blockchain Computing and Applications (BCCA). IEEE, 2020, pp. 52–58.
[22] S. Liu et al., “Financial time-series forecasting: Towards synergizing performance and interpretability within a hybrid machine learning approach,” arXiv preprint arXiv:2401.00534, 2023.
[23] K. He et al., “Financial time series forecasting with the deep learning ensemble model,” Mathematics, vol. 11, no. 4, p. 1054, 2023.
[24] A. Alfatemi, M. Rahouti, R. Amin, S. ALJamal, K. Xiong, and Y. Xin, “Advancing ddos attack detection: A synergistic approach using deep residual neural networks and synthetic oversampling,” arXiv preprint arXiv:2401.03116, 2024.
[25] P. K. Nagula and C. Alexakis, “A new hybrid machine learning model for predicting the bitcoin (BTC-USD) price,” Journal of Behavioral and Experimental Finance, vol. 36, p. 100741, 2022.
[26] A. Petrovic et al., “Cryptocurrency price prediction by using hybrid machine learning and beetle antennae search approach,” in Telecommunications Forum (TELFOR). IEEE, 2021, pp. 1–4.
[27] R. Gupta and J. E. Nalavade, “Metaheuristic assisted hybrid classifier for bitcoin price prediction,” Cybernetics and Systems, vol. 54, no. 7, pp. 1037–1061, 2023.
[28] C. Luo et al., “Bitcoin price forecasting: an integrated approach using hybrid lstm-elm models,” Mathematical Problems in Engineering, vol. 2022, 2022.
[29] “Data from binance api,” https://www.binance.com/.
[30] S. B. Achelis, “Technical analysis from a to z,” 2001.
[31] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794.
[32] Y. Shynkevich, T. M. McGinnity, S. A. Coleman, A. Belatreche, and Y. Li, “Forecasting price movements using technical indicators: Investigating the impact of varying input window length,” Neurocomputing, vol. 264, pp. 71–88, 2017.
[33] P. Jaquart, S. Köpke, and C. Weinhardt, “Machine learning for cryptocurrency market prediction and trading,” The Journal of Finance and Data Science, vol. 8, pp. 331–352, 2022.
[34] E. Akyildirim, A. Goncu, and A. Sensoy, “Prediction of cryptocurrency returns using machine learning,” Annals of Operations Research, vol. 297, pp. 3–36, 2021.
[35] “CoinGecko Methodology,” https://www.coingecko.com/en/methodology, accessed: February 7, 2024.
