                                         College of Computing and IT, Arab Academy for Science, Technology and Maritime Transport,
                                         Alexandria 5517220, Egypt; saleh.mesbah@aast.edu (S.M.E.); waleedf@aast.edu (M.W.F.)
                                         * Correspondence: alamirlabib@yahoo.com
                                         Abstract: Stock value prediction and trading, a captivating and complex research domain, continues
                                         to draw heightened attention. Ensuring profitable returns in stock market investments demands
                                         precise and timely decision-making. The evolution of technology has introduced advanced predictive
                                         algorithms, reshaping investment strategies. Essential to this transformation is the profound reliance
                                         on historical data analysis, driving the automation of decisions, particularly in individual stock
                                         contexts. Recent strides in deep reinforcement learning algorithms have emerged as a focal point for
                                         researchers, offering promising avenues in stock market predictions. In contrast to prevailing models
                                         rooted in artificial neural network (ANN) and long short-term memory (LSTM) algorithms, this study
                                         introduces a pioneering approach. By integrating ANN, LSTM, and natural language processing
                                         (NLP) techniques with the deep Q network (DQN), this research crafts a novel architecture tailored
                                         specifically for stock market prediction. At its core, this innovative framework harnesses the wealth of
historical stock data, with a keen focus on gold stocks. Augmented by the analysis of social media and news data drawn from sources such as S&P, Yahoo, NASDAQ, and various gold market-related channels, this study gains depth and comprehensiveness. The predictive prowess of the developed
                                         model is exemplified in its ability to forecast the opening stock value for the subsequent day, a feat
                                         validated across exhaustive datasets. Through rigorous comparative analysis against benchmark
                                         algorithms, the research spotlights the unparalleled accuracy and efficacy of the proposed combined
                                         algorithmic architecture. This study not only presents a compelling demonstration of predictive
                                         analytics but also engages in critical analysis, illuminating the intricate dynamics of the stock market.
                                         Ultimately, this research contributes valuable insights and sets new horizons in the realm of stock
market predictions.

Keywords: stock trading markets; deep reinforcement learning; DRL; neural networks; stock prediction; variational mode decomposition; BERT
                                  2. Related Work
                                       Stock price prediction efforts have centered on supervised learning techniques, such
                                  as neural networks, random forests, and regression methods [11]. A detailed analysis
                                  by authors [12] underscored the dependency of supervised models on historical data,
                                  revealing constraints that often lead to inaccurate predictions. In a separate study [13],
                                  speech and deep learning (DL) techniques were applied to stock prediction using Google
                                  stock datasets from NASDAQ. The research demonstrated that employing 2D principal
                                  component analysis (PCA) with deep neural networks (DNN) outperformed the results
obtained with two-directional PCA combined with radial basis function neural network
                                  (RBFNN), highlighting the efficacy of specific methodologies in enhancing accuracy. An-
                                  other comprehensive survey [14] explored various DL methods, including CNN, LSTM,
                                  DNN, RNN, RL, and others, in conjunction with natural language processing (NLP) and
                                  WaveNet. Utilizing datasets sourced from foreign exchange stocks in Forex markets, the
                                  study employed metrics like mean absolute percentage error (MAPE), root mean square
                                  error (RMSE), mean square error (MSE), and the Sharpe ratio to evaluate performance.
                                  The findings highlighted the prominence of RL and DNN in stock prediction research,
                                  indicating the increasing popularity of these methods in financial modeling. While this
                                  study covered a wide array of prediction techniques, it notably emphasized the absence
of results related to combining multiple DL methods for stock prediction. In different
                                  studies [15,16], four DL models utilizing data from NYSE and NSE markets were examined:
                                  MLP, RNN, CNN, and LSTM. These models, when trained separately, identified trend
                                  patterns in stock markets, providing insights into shared dynamics between the two stock
                                  markets. Notably, the CNN-based model exhibited superior results in predicting stock
prices for specific businesses. However, this study did not explore hybrid networks, leaving unexplored potential in creating combined models for stock prediction.
                                  3. Background
                                       This section provides essential context for understanding the research presented in
                                  this paper.
Figure 1. The architecture of an artificial neural network.
3.2. Recurrent Neural Network

Recurrent neural networks (RNNs) excel in processing sequential data. They possess a memory feature, retaining information from previous steps in a sequence, as shown in Figure 2. RNNs incorporate inputs (“x”), outputs (“h”), and hidden neurons (“A”). A self-loop on hidden neurons signifies input from the previous time step (“t − 1”). For instance, if the input sequence comprises six days of stock opening price data, the network unfurls into six layers, each corresponding to the opening stock price of a single day. However, a significant challenge confronting RNNs is the vanishing gradient problem, which has been effectively addressed through various techniques, including the incorporation of long short-term memory (LSTM) units into the network.

Figure 2. Unfolded recurrent neural network.
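To make the unrolling concrete, the sketch below (illustrative only, not the paper's implementation; the weights are random placeholders and the prices are hypothetical) steps a vanilla RNN cell through six days of opening prices, one unrolled layer per day:

```python
import numpy as np

# Minimal sketch of a vanilla RNN cell unrolled over six days of opening
# prices; weights are random placeholders, prices are hypothetical values.
rng = np.random.default_rng(0)
hidden_size = 8
W_xh = rng.normal(scale=0.1, size=(1, hidden_size))            # input "x" -> hidden "A"
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # self-loop: h_{t-1} -> h_t
b_h = np.zeros(hidden_size)

opening_prices = [1820.5, 1824.1, 1819.8, 1831.0, 1828.4, 1835.2]

h = np.zeros(hidden_size)             # hidden state carried between time steps
for x_t in opening_prices:            # one unrolled layer per day
    h = np.tanh(np.array([x_t]) @ W_xh + h @ W_hh + b_h)
print(h)                              # output "h" after the six-day sequence
```

Because each step reuses the previous hidden state, gradients flowing back through many such steps can shrink toward zero, which is the vanishing gradient problem that the LSTM units discussed next address.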
3.3. LSTM

LSTM enhances RNNs’ memory, crucial for handling sequential financial data. LSTM units, integrated into RNNs, have three gates: input gate (i), forget gate (f), and output gate (o). These gates use sigmoid functions to write, delete, and read information, addressing long-term dependencies and preserving data patterns. In the LSTM architecture illustrated in Figure 3, three gates play pivotal roles:
1. Input Gate (i): This gate facilitates the addition of new information to the cell state.
2. Forget Gate (f): The forget gate selectively discards information that is no longer relevant or required by the model.
3. Output Gate (o): Responsible for choosing the information to be presented as the output.

Each of these gates operates utilizing sigmoid functions, transforming values into a range from zero to one. This mechanism empowers LSTMs to adeptly write, delete, and read information from their memory, rendering them exceptionally skilled at handling long-term dependencies and preserving crucial patterns in data. Crucially, LSTMs address the challenge of the vanishing gradient, ensuring that gradient values remain steep enough during training. This characteristic significantly reduces training times and markedly enhances accuracy, establishing LSTMs as a foundational technology in the domain of sequence prediction, especially for intricate datasets prevalent in financial markets.

Figure 3. LSTM architecture.
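The gate mechanics can be sketched in a few lines; the following single-step LSTM cell (a minimal illustration with untrained random placeholder weights, assuming one input feature) mirrors the three gates listed above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal one-step LSTM sketch; weights are untrained random placeholders.
rng = np.random.default_rng(1)
n_in, n_hid = 1, 8
W = {g: rng.normal(scale=0.1, size=(n_in + n_hid, n_hid)) for g in "ifoc"}
b = {g: np.zeros(n_hid) for g in "ifoc"}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(z @ W["i"] + b["i"])        # input gate: what to write
    f = sigmoid(z @ W["f"] + b["f"])        # forget gate: what to discard
    o = sigmoid(z @ W["o"] + b["o"])        # output gate: what to expose
    c_tilde = np.tanh(z @ W["c"] + b["c"])  # candidate cell update
    c = f * c_prev + i * c_tilde            # new cell state
    h = o * np.tanh(c)                      # new hidden state
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for price in [1820.5, 1824.1, 1819.8]:      # hypothetical opening prices
    h, c = lstm_step(np.array([price]), h, c)
```

The additive cell-state update (f * c_prev + i * c_tilde) is what lets gradients survive across long sequences instead of vanishing.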
3.4. Reinforcement Learning

Reinforcement learning involves an agent making decisions in different scenarios. It comprises the agent, environment, actions, rewards, and observations. Reinforcement learning faces challenges such as excessive reinforcements and high computational costs, especially for complex problems. The dynamics of reinforcement learning are encapsulated in Figure 4, illustrating the interaction between the agent and its environment. Notably, states in this framework are stochastic, meaning the agent remains unaware of the subsequent state, even when repeating the same action.

Figure 4. The reinforcement learning process.
Within the realm of reinforcement learning, several crucial quantities are determined:
•	Reward: A scalar value from the environment that evaluates the preceding action. Rewards can be positive or negative, contingent upon the nature of the environment and the agent’s action.
•	Policy: This guides the agent in deciding the subsequent action based on the current state, helping the agent navigate its actions effectively.
•	Value (V): Represents the long-term return, factoring in discount rates, rather than focusing solely on short-term rewards (R).
•	Action Value: Like the reward value, but incorporates additional parameters from the current action. This metric guides the agent in optimizing its actions within the given environment.
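These quantities come together in the interaction loop of Figure 4. The toy sketch below (purely illustrative: a made-up stochastic environment and a random placeholder policy, not the paper's trading environment) runs one short episode of that loop:

```python
import random

ACTIONS = ["buy", "hold", "sell"]

def step(state, action):
    """Toy stochastic environment: the same action can lead to different states."""
    next_state = random.randrange(5)        # stochastic transition
    reward = random.uniform(-1.0, 1.0)      # scalar reward evaluating the action
    return next_state, reward

def policy(state):
    return random.choice(ACTIONS)           # placeholder policy

state, total_reward = 0, 0.0
for t in range(10):
    action = policy(state)                  # policy maps state -> action
    state, reward = step(state, action)     # environment returns next state, reward
    total_reward += reward
print(total_reward)                         # cumulative reward over the episode
```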
                                       Despite the advantages of reinforcement learning over supervised learning models, it
                                  does come with certain drawbacks. These challenges include issues related to excessive
reinforcements, which can lead to erroneous outcomes. Additionally, reinforcement learning methods are primarily employed for solving intricate problems, requiring substantial
                                  volumes of data and significant computational resources. The maintenance costs associated
                                  with this approach are also notably high.
                                       This study focuses on predicting gold prices based on next-day tweets sourced from
                                  news and media datasets. Gold prices exhibit rapid fluctuations daily, necessitating a robust
                                  prediction strategy. To achieve accurate predictions, this research employs a comprehensive
                                  approach integrating deep reinforcement learning (DRL), long short-term memory (LSTM),
                                  variational mode decomposition (VMD), and natural language processing (NLP). The
                                  prediction time spans from 2012 to 2019, utilizing tweets related to gold prices. DRL
                                  is enhanced by incorporating sentiment analysis of media news feeds and Twitter data,
                                  elevating prediction accuracy. The dataset used for this analysis was retrieved from the
link https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-in-commodity-market-gold, accessed on 1 February 2023. The dataset spans from 2000 to 2021.
Figure 5. The DRL process.
Another objective of reinforcement learning is to maximize the cumulative reward instead of the immediate reward [29]. Suppose the cumulative reward is represented by G_t and the immediate reward by R_t:

E[G_t] = E[R_{t+1} + R_{t+2} + \cdots + R_T]    (1)
In Equation (1), the reward is received at a terminal state T. This implies that Equation (1) holds when the problem ends in a terminal state T, also known as the episodic task [30]. In problems involving continuous data, the terminal state is not available, i.e., T = ∞. A discount factor γ, with 0 ≤ γ ≤ 1, is therefore introduced in Equation (2), which represents the cumulative reward:

G_t = \gamma^0 R_{t+1} + \gamma^1 R_{t+2} + \gamma^2 R_{t+3} + \cdots + \gamma^{k-1} R_{t+k} + \cdots    (2)
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}    (3)
To choose an action in a given state, the agent relies on value functions, which estimate the value of candidate actions. The agent determines the value functions based on
                                  what future actions will be taken [31]. Bellman’s equations are essential in RL, as they
                                  provide the fundamental property for value functions and solve MDPs. Bellman’s equations
                                  support the value function by calculating the sum of all possibilities of expected returns
                                  and weighing each return by its probability of occurrence in a policy [32].
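As a concrete reading of Equation (3), the short snippet below accumulates a discounted return for an illustrative reward sequence:

```python
# Minimal sketch: the discounted return G_t of Equation (3) for a finite
# reward sequence; the rewards are illustrative values only.
def discounted_return(rewards, gamma=0.95):
    g = 0.0
    for k, r in enumerate(rewards):    # rewards[k] plays the role of R_{t+k+1}
        g += (gamma ** k) * r
    return g

print(discounted_return([1.0, -0.5, 2.0, 0.0, 1.5]))  # ≈ 3.55
```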
                                        The actor–critic approach will form the policy as the actor will select actions, and the
                                  critic will evaluate the chosen actions. Hence, in this approach, the policy parameters θ
                                  will be adjusted for the actor to maximize the reward predicted by the critic. Here, the
value function estimate for the current state serves as a baseline to accelerate learning.
                                  The policy parameter θ of the actor is adjusted to maximize the total future reward. Policy
                                  learning is done by maximizing the value function [36].
DRL is an actor–critic-based value learning method that balances current and future rewards [37]. The stock prediction problem can be formulated by describing the
                                  state space, action space, and reward function. Here, the state space is the environment
                                  designed to support single or multiple stock trading by considering the number of assets to
                                  trade in the market. The state space will show a linear increase with increasing assets. The
                                  state space has two components: the position state and the market signals. The position
                                  state will provide the cash balance and shares owned in each asset, and the market signals
                                  will contain all necessary market features for the asset as tuples [38]. The information is
provided to the agent to make predictions of market movement. Here, the information encodes the technical-analysis hypothesis that the future behavior of the financial market can be inferred from its past trends. The information also draws on economic and industry conditions, media, and news releases.
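As an illustration of this two-part state space, the sketch below uses hypothetical field names (not taken from the paper's code): a position state holding the cash balance and shares owned, and market signals holding per-asset feature tuples:

```python
from dataclasses import dataclass

# Hypothetical sketch of the two-part trading state described above.
@dataclass
class PositionState:
    cash_balance: float
    shares_owned: dict        # asset ticker -> number of shares

@dataclass
class MarketSignals:
    features: dict            # asset ticker -> tuple of market features

@dataclass
class TradingState:
    position: PositionState
    signals: MarketSignals

state = TradingState(
    position=PositionState(cash_balance=10_000.0, shares_owned={"GOLD": 5}),
    signals=MarketSignals(features={"GOLD": (1820.5, 1835.2, 0.42)}),  # e.g., open, high, sentiment
)
```

Adding another asset adds one entry to each dictionary, matching the linear growth of the state space noted above.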
3.9. TF-IDF

TF-IDF stands for term frequency–inverse document frequency. It is used for document search: it takes a query as input and finds the relevant documents as output. It is a statistical technique for measuring the importance of a word inside a document. It computes the frequency of a word within a document and weighs it against the frequency of that word across all documents. The assumption is that if a word is repeated many times in one document yet rarely appears in other documents, the word is vital for that document.
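A brief sketch with scikit-learn's TfidfVectorizer (toy headlines, not the paper's dataset) illustrates this weighting:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Minimal sketch: TF-IDF over a toy corpus of news headlines.
corpus = [
    "gold prices rally as markets open higher",
    "tech stocks slide while gold holds steady",
    "central bank policy lifts bond yields",
]
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)   # one row per document

# Words frequent in one document but rare in the others score highest.
print(dict(zip(vectorizer.get_feature_names_out(), tfidf.toarray()[0].round(2))))
```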
                                  3.10. BERT
                                       Bidirectional encoder representations from transformers (BERT) is based on deep
                                  learning transformers for natural language processing. BERT is trained bidirectionally,
                                  which means it analyzes the word and the surrounding words in both directions. Reading
                                  in both directions allows the model to understand the context deeply. BERT models are
                                  already pretrained, so they already know the word representation and the relationships
                                  between them. BERT is a generic model that can be fine-tuned for specific tasks like
sentiment analysis. BERT contains a stack of transformer encoder layers. It has two versions, the base version and the large one, with the large version generally yielding the strongest results.
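For instance, a fine-tuned BERT-style sentiment classifier can be exercised through the Hugging Face pipeline API; the default checkpoint used here is a stand-in, not the model fine-tuned in this study:

```python
from transformers import pipeline

# Minimal sketch: sentiment scoring with a pretrained transformer checkpoint.
classifier = pipeline("sentiment-analysis")
result = classifier("Gold prices surge as investors seek safe-haven assets")
print(result)  # e.g., [{'label': 'POSITIVE', 'score': 0.99}]
```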
                                  4. Problem Statement
                                        In the complex landscape of stock markets, the central objective of trading resides in
                                  the precise forecasting of stock prices. This accuracy is paramount, as it directly influences
                                  investors’ confidence, shaping their decisions on whether to buy, hold, or sell stocks amid
                                  the inherent risks of the market. Extensive scholarly research emphasizes the critical
                                  necessity for efficiency in addressing the challenges associated with stock price prediction.
                                  Efficient predictions are not just advantageous but pivotal, empowering investors with the
                                  knowledge needed for astute decision-making. Market efficiency, a foundational concept
                                  in this domain, refers to the phenomenon where stock prices authentically mirror the
                                  information available in the current trading markets. It is essential to recognize that
                                           these price adjustments might not solely stem from new information; rather, they can be
                                           influenced by existing data, leading to outcomes that are inherently unpredictable. In
                                           this context, our research endeavors to enhance the precision of stock price predictions,
                                           addressing the need for informed and confident decision-making among investors.
Figure 6. The architecture with components of the proposed stock prediction model.
In order to facilitate the implementation of the proposed framework, the code is divided into three major modules that can be summarized in Algorithm 1.
BERT is used in NLP tasks such as predicting the next sentence [41]. In NLP, mixed models tend to provide the best results with BERT; for instance, combining TFIDF, SVM, and BERT yields better sentiment output from the dataset. The sentiments are further classified into four categories: extremely positive, positive, negative, and extremely negative. NLP supports investors in classifying whether the news is positive or negative in order to decide whether to sell, buy, or hold a stock.
                                        In this phase, news data are fed to the natural language processing module to decide
                                  whether the news is positive or negative. The BERT model is used along with TFIDF in
                                  this task to achieve the most accurate results. Fine-tuning BERT is achieved by applying
                                  a binary classifier on top of BERT. This NLP phase involves the stages of preprocessing,
                                  modeling, and prediction.
•    Preprocessing: In this phase, the news dataset obtained from media or tweets is preprocessed. The preprocessing involves reading the dataset, tokenizing the sentences, converting words to lowercase, removing stop words, stemming the sentences, and finally grouping words with the same meaning through lemmatization (a sketch of these steps follows this list).
•    Modeling: This step involves feature extraction for the model and sentiment analysis. Sentiment analysis first converts the tokens into a dictionary, and the dataset is then split for training and testing the model. The model is built using an artificial neural network classifier.
•    Prediction: This step receives the testing news data and predicts whether the sentiment is positive or negative. This result is concatenated with the historical dataset.
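As referenced in the preprocessing item above, here is a minimal sketch of those steps using NLTK; the tokenizer, stemmer, and lemmatizer choices are assumptions, since the paper does not name its preprocessing library.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK resources.
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)


def preprocess(sentence: str) -> list[str]:
    # Tokenize and lowercase.
    tokens = [t.lower() for t in nltk.word_tokenize(sentence)]
    # Drop punctuation and stop words.
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t.isalpha() and t not in stops]
    # Stem, then lemmatize, so words sharing a meaning are grouped.
    stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
    return [lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens]


print(preprocess("Gold futures are rising sharply after the announcement"))
```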
Figure 7. The architecture of VMD plus LSTM.
5.3. The Deep Reinforcement Learning Phase

The last phase is the DRL model, from which the final decision is generated. The input to this phase is the output from the sentiment analysis module, the predicted prices from the LSTM, and some technical indicators. The DRL used in this phase is deep Q learning with a replay buffer. The neural network is trained to generate the Q values for all the possible actions based on the current environment state, which is fed to the neural network as input.

Therefore, the proposed architecture and algorithm depend on historical and media or news datasets. The architecture consists of three phases: NLP, prediction, and DRL. The combined algorithm of sentiment analysis and DRL is used to obtain predictions for stocks.
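To make the training loop concrete, here is a compact deep Q learning sketch with a replay buffer in PyTorch; the layer sizes, buffer capacity, and hyper-parameters are illustrative assumptions, not the configuration reported in Table 4.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4       # assumed dimensions, for illustration
GAMMA, BATCH_SIZE, LR = 0.99, 32, 1e-3


def build_net() -> nn.Sequential:
    # Network mapping an environment state to one Q value per action.
    return nn.Sequential(
        nn.Linear(STATE_DIM, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, N_ACTIONS),
    )


main_net, target_net = build_net(), build_net()
target_net.load_state_dict(main_net.state_dict())
optimizer = torch.optim.Adam(main_net.parameters(), lr=LR)
replay = deque(maxlen=10_000)     # the replay buffer of past transitions


def train_step() -> None:
    # Sample a mini-batch of (state, action, reward, next_state) tuples
    # and regress the main network's Q values toward bootstrapped targets
    # from the target network (terminal-state handling omitted for brevity).
    if len(replay) < BATCH_SIZE:
        return
    states, actions, rewards, next_states = map(
        torch.stack, zip(*random.sample(replay, BATCH_SIZE))
    )
    with torch.no_grad():
        targets = rewards + GAMMA * target_net(next_states).max(dim=1).values
    q_values = main_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Periodically copying the main network's weights into the target network stabilizes the bootstrapped targets during training.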
6. Implementation and Discussion of Results

The implementation of our framework is carried out utilizing cloud GPUs, leveraging the advantages of cloud computing for enhanced processing capabilities. Rigorous evaluation and fine-tuning of each code module are conducted to ensure optimal accuracy at every phase. The efficiency of the proposed framework is comprehensively evaluated and compared with benchmark trading strategies to validate its effectiveness.
6.1. Sentiment Analysis Phase

In the sentiment analysis phase, various classification algorithms coupled with different preprocessing models are tested to determine the most accurate algorithm. The results, as shown in Table 1, underscore the superiority of the combination of TFIDF and BERT, which yielded a remarkable accuracy of 96.8%. Extensive analytics, including classification techniques and model overfitting identification, were performed. Visualization, especially using artificial neural networks (ANN) with BERT and TFIDF, played a crucial role in comprehending the training-prediction dynamics. The ANN model exhibited exceptional performance, boasting an accuracy rate of 97%, as depicted in Figure 8.

Table 1. Findings on gold data using sentiment analysis.
It is important to note that the accurate prediction from this phase leads to accurate decisions from the DRL phase. The efficiency of this prediction is evaluated, and the results are shown in Figure 9, comparing the actual and predicted prices. The figure shows that our prediction module works very well, as there is a significant correlation between the actual and the predicted prices.
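The agreement shown in Figure 9 can be quantified by the correlation between the two price series, as in this sketch; the numbers below are made-up stand-ins for the actual test data.

```python
import numpy as np

# Illustrative stand-ins; Figure 9 plots the actual gold test data.
actual = np.array([1800.2, 1812.5, 1805.0, 1820.1, 1831.4])
predicted = np.array([1798.7, 1810.9, 1807.3, 1818.6, 1829.8])

# Pearson correlation between the two series; a value near 1 means
# the predicted prices closely track the actual prices.
corr = np.corrcoef(actual, predicted)[0, 1]
print(f"correlation: {corr:.3f}")
```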
Figure 9. Actual prices vs. predicted prices.
6.3. Final Decision Phase

The next phase is the deep reinforcement learning phase, which will make the final decision. The implementation relies on the famous architecture of deep Q learning, which belongs to the value-based category of DRL algorithms. Table 4 shows the configuration for the implemented network. The DQN relies on a replay buffer and two deep neural networks: one is the main network, and the other is the target network. Both networks have the same architecture with three layers.
Table 4. Hyper-parameters adopted in the implemented DRL algorithm.
The final decision phase employs deep reinforcement learning (DRL), specifically the deep Q learning architecture, a value-based DRL algorithm. The implementation details are provided in Table 4. The state representation includes factors like historical and predicted prices, sentiment analysis outputs, and technical indicators like relative strength index (RSI) and momentum (MOM). The action space consists of four actions: buy, buy more, sell, and sell more.

The efficiency of the entire framework is deeply rooted in the accurate predictions from the stock price prediction phase. The DRL model's capability to make informed decisions based on these predictions is crucial for successful trading strategies.
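A sketch of how the state vector and the four-action space described above could be assembled is shown below; the feature names, values, and vector layout are illustrative assumptions rather than the paper's exact encoding.

```python
from dataclasses import dataclass
from enum import IntEnum

import numpy as np


class Action(IntEnum):
    # The four actions of the final decision phase.
    BUY = 0
    BUY_MORE = 1
    SELL = 2
    SELL_MORE = 3


@dataclass
class MarketState:
    close: float      # latest historical closing price
    predicted: float  # next-day price from the prediction phase
    sentiment: float  # sentiment output (e.g., +1 positive, -1 negative)
    rsi: float        # relative strength index
    mom: float        # momentum indicator

    def to_vector(self) -> np.ndarray:
        # Flatten the state into the input vector fed to the DQN.
        return np.array(
            [self.close, self.predicted, self.sentiment, self.rsi, self.mom],
            dtype=np.float32,
        )


state = MarketState(close=1950.2, predicted=1961.7, sentiment=1.0, rsi=58.3, mom=4.1)
print(state.to_vector(), [a.name for a in Action])
```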
6.4. Algorithms in Comparison

The gold dataset was processed using benchmark algorithms, namely, the best stock benchmark, the buy-and-hold benchmark, and constantly rebalanced portfolios (CRPs). These algorithms provided results that were compared with the proposed architecture. The metrics and values determined using these algorithms are provided in Table 5; the values obtained are rounded to the nearest whole number. The classical buy-and-hold benchmark is quite simple: the user buys gold with all their money at the beginning of the period, waits until the end of the period, and then sells all the gold; the total profit is the difference between their wealth at the start and at the end of the period.
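As a quick illustration, the buy-and-hold benchmark reduces to a few lines; the prices and starting cash below are assumed values, not figures from the gold dataset.

```python
# Buy-and-hold benchmark: invest all cash at the first price and
# liquidate at the last one. Prices are illustrative values only.
prices = [1800.0, 1825.5, 1790.3, 1880.0]
initial_cash = 10_000.0

units = initial_cash / prices[0]             # gold bought at the start
profit = units * prices[-1] - initial_cash   # wealth change over the period
print(f"buy-and-hold profit: {profit:.2f}")
```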
$$\mathrm{AWR}_T = \frac{\sum_{t=1}^{T} (P_t \text{ or } L_t)}{\mathrm{Cash}_{t=0}} \tag{5}$$
$$\mathrm{AMDD}_T = \frac{\sum_{t=1}^{T} \mathrm{MDD}_t}{T} \tag{7}$$
•    Calmar ratio
     This calculates the mean value of the accumulated wealth rate with respect to the maximum of the maximum drawdown values, as given by the following equation.

$$\mathrm{CR}_T = \frac{\mathrm{mean}(\mathrm{AWR}_T)}{\max_t(\mathrm{MDD}_t)} \tag{8}$$
$$\mathrm{SR}_T = \frac{\mathrm{mean}(\mathrm{AWR}_T)}{\mathrm{Std}(\mathrm{AWR}_T)} \tag{11}$$

$$\mathrm{ASR}_T = \frac{\sum_{t=1}^{T} \mathrm{SR}_t}{T} \tag{12}$$
•    Annualized Return Rate and Annualized Sharpe Ratio
     The annualized terms mean calculating the values with respect to a full year. They are calculated with the same equations, but with the trading period set to 365 days.
$$\mathrm{RSI} = 100 - \frac{100}{1 + \mathrm{RS}} \tag{13}$$
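For reference, these metrics can be computed with NumPy as in the sketch below; the per-period drawdown and the RS term use standard formulations, which are assumptions where the paper's exact definitions (such as Equations (6), (9), and (10)) are not restated here.

```python
import numpy as np


def max_drawdown(wealth: np.ndarray) -> float:
    # Largest relative drop of the wealth curve from its running peak;
    # the per-period MDD_t values feed the AMDD average in Eq. (7).
    peaks = np.maximum.accumulate(wealth)
    return float(np.max((peaks - wealth) / peaks))


def sharpe_ratio(awr: np.ndarray) -> float:
    # Eq. (11): mean of the wealth-rate series over its standard deviation.
    return float(awr.mean() / awr.std())


def calmar_ratio(awr: np.ndarray, wealth: np.ndarray) -> float:
    # Mean accumulated wealth rate relative to the maximum drawdown,
    # following the Calmar ratio definition given in the text.
    return float(awr.mean() / max_drawdown(wealth))


def rsi(prices: np.ndarray, period: int = 14) -> float:
    # Eq. (13): RSI = 100 - 100 / (1 + RS), with RS the ratio of the
    # average gain to the average loss over the lookback period.
    deltas = np.diff(prices[-(period + 1):])
    gains = deltas[deltas > 0].sum()
    losses = -deltas[deltas < 0].sum()
    rs = gains / losses if losses > 0 else np.inf
    return float(100.0 - 100.0 / (1.0 + rs))
```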
fifth, the prices for the upcoming five days are predicted by the VMD plus LSTM; and sixth, the sentiment analysis module generates the sentiment for the current day.
6.9. Proposed Framework Results Comparison

The results from the proposed framework are compared with the benchmark trading strategies mentioned above. The results showed that the proposed framework outperformed the other algorithms in different evaluation criteria, as shown in Table 5.

The values for the performance metrics are obtained from the same gold dataset used earlier. The DQN results were compared with those of the other algorithms, and graphs were obtained to show the performance of each algorithm. The metrics for the annualized wealth rate are shown in Figure 10.
Figure 10. Graph showing metrics for annualized wealth rate.
The graph of the performance metrics for the average maximum drawdown algorithm is provided in Figure 11.
Figure 11. Graph showing the comparison of metrics for the average maximum drawdown algorithm.
In Figure 10, the peaks indicate the amount of profit possible at a certain point in time. The graphs show that, regarding the annualized wealth rate, the proposed algorithm outperforms the other algorithms and hence is effective in predicting stock value. Likewise, in Figure 11, the peaks of the proposed algorithm indicate that its results outperform the other baseline algorithms. In addition, the NLP processing and the combined RNN, DQN, and VMD architecture provide better prediction results.
                                   6.10.2.
                                    6.10.2.Effect
                                            Effectof  ofUsing
                                                         UsingSentiment
                                                                 SentimentAnalysis
                                                                              AnalysisModule
                                                                                           Moduleon  onthe
                                                                                                         theFramework
                                                                                                              FrameworkPerformance
                                                                                                                             Performance
                                         In
                                          In the same context, other experiments are conducted to emphasizethe
                                            the   same    context,   other  experiments       are conducted    to  emphasize       theefficiency
                                                                                                                                         efficiencyof  of
                                   using
                                    using sentiment
                                            sentiment analysis
                                                           analysis ininour
                                                                         ourproposed
                                                                               proposedalgorithm.
                                                                                             algorithm. Figure
                                                                                                           Figure 1212 shows
                                                                                                                        shows thetheperformance
                                                                                                                                       performance
                                   improvement
                                    improvement achieved
                                                       achieved by  by adding
adding the sentiment analysis module to our algorithm. The experiments are done for different numbers of episodes. Each number of episodes is done at least 10 times, and the average is taken. In these experiments, the performance is measured as follows. The current day's closing price is compared with the previous day's closing price. If there is a price increase and the algorithm decides to sell, this is considered the correct action. On the other hand, if the algorithm decides to buy, this is considered a wrong action. The performance here is calculated as the percentage of the correct actions relative to the algorithm's total number of actions.
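For concreteness, this evaluation rule can be written as a few lines of code. The sketch below is ours, not the authors' implementation: the function and variable names are placeholders, and counting "buy" as correct on a price decrease is our symmetric reading of the stated rule, which only spells out the price-increase case.

```python
# Illustrative sketch of the evaluation rule described above (names are ours).
# actions[t] is the algorithm's decision ("sell" or "buy") on day t;
# closing_prices[t] is that day's closing price. Day 0 has no previous close.

def action_accuracy(closing_prices, actions):
    correct, total = 0, 0
    for t in range(1, len(closing_prices)):
        price_rose = closing_prices[t] > closing_prices[t - 1]
        # Stated rule: on a price increase, "sell" is correct and "buy" is wrong.
        # Assumed symmetric rule: otherwise, "buy" counts as correct.
        if (price_rose and actions[t] == "sell") or (not price_rose and actions[t] == "buy"):
            correct += 1
        total += 1
    return 100.0 * correct / total  # percentage of correct actions

print(action_accuracy([100.0, 102.0, 101.0, 103.0], ["buy", "sell", "buy", "sell"]))  # 100.0
```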
Figure 12. Effect of using the sentiment analysis module.
                                   7. Conclusions
                                        This research introduces a novel architecture that combines various prediction al-
                                   gorithms to tackle the challenges of stock value prediction with exceptional accuracy.
                                   Specifically focusing on gold datasets, the study aimed to forecast gold prices for investors.
                                   The input data encompassed gold datasets from reputable sources such as S&P, Yahoo, and
                                   NASDAQ, representing standard stock market data. The predictive framework employed
                                  natural language processing (NLP) to process sentiments extracted from social media
feeds, long short-term memory (LSTM) networks to analyze historical data, variational mode
                                  decomposition (VMD) for feature selection, and artificial neural networks (ANNs) to make
predictions. Additionally, the research integrated deep reinforcement learning (DRL) algorithms and deep Q networks (DQNs) to blend the sentiment signal with the outputs of the other algorithms, enabling the prediction of the next day's opening stock value from the previous day's
                                  data. The processes developed for training and testing data were meticulously presented,
                                  forming the foundation of the prediction model. Comparative analysis was conducted
                                  with benchmark performance metrics, including the best stock benchmark, buy-and-hold
                                  benchmark, constant rebalanced portfolios, and DQN. Through rigorous evaluation, the
proposed architecture demonstrated superior accuracy across the performance metrics. Graphical representations were used to highlight peaks, i.e., high values at specific times or on specific days, relative to the benchmarks. The comparison clearly showed that the DQN-based model outperformed the existing algorithms, underscoring the potential of the proposed architecture for high-precision stock prediction.
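To make the module composition summarized above concrete, the following minimal sketch shows one way the described components could be wired together. Every name here (vmd, lstm, ann, sentiment_model, dqn_agent and their methods) is a placeholder of our own; the paper does not publish this interface.

```python
# Structural sketch of the described pipeline (all names are placeholders).
def predict_next_open(price_history, social_posts,
                      vmd, lstm, ann, sentiment_model, dqn_agent):
    modes = vmd.decompose(price_history)             # VMD: denoise / select features
    trend_features = lstm.encode(modes)              # LSTM: summarize historical data
    price_estimate = ann.predict(trend_features)     # ANN: next-day price estimate
    sentiment = sentiment_model.score(social_posts)  # NLP: social-media sentiment
    state = (price_estimate, sentiment)              # DQN: blend signals into a state
    return dqn_agent.act(state)                      # predicted next-day opening value
```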
Future work that extends this research into real-time applications within dynamic environments, such as live stock markets, holds immense promise. Such applications
                                  could provide invaluable insights into the model’s effectiveness and adaptability across
                                  different market scenarios. Moreover, the framework’s generic nature, as demonstrated in
                                  this study, suggests its versatility for application across diverse products beyond gold. This
                                  versatility transforms the model into a powerful tool for traders and investors in various
sectors. Subsequent studies focusing on real-time, live stock market data not only stand
                                  to validate the framework’s effectiveness but also pave the way for tailored adaptations
                                  customized to specific industries and the unique intricacies of each market.
                                        The proposed framework contains three main modules. Each module can be enhanced
with different techniques. In the sentiment analysis module, the proposed framework used classification techniques to judge whether a sentence is positive or negative. An alternative primary technique is the lexicon-based approach, in which a language dictionary is used to perform the sentiment analysis.
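As a sketch of the lexicon-based alternative mentioned above, the snippet below scores a sentence with NLTK's VADER lexicon. This is our illustration of the technique, not the classifier the proposed framework actually used.

```python
# Lexicon-based sentiment scoring with NLTK's VADER lexicon (our illustration;
# the proposed framework itself used a classification approach).
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # fetch the dictionary once
sia = SentimentIntensityAnalyzer()

scores = sia.polarity_scores("Gold prices are expected to rally strongly.")
label = "positive" if scores["compound"] >= 0 else "negative"
print(scores, label)
```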
                                        In the price prediction module, the proposed framework considered the stock historical
                                  prices as a signal and used VMD as a signal-processing technique to decompose the signal
into sub-signals and remove the signal noise. Several other signal-processing techniques can be used for noise removal; this area remains open to research, and alternative denoising techniques may further enhance this module.
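As a sketch of this VMD-based denoising step, the snippet below decomposes a toy price signal with the third-party vmdpy package and drops the highest-frequency mode as noise. The package choice, the parameter values, and the drop-one-mode heuristic are our assumptions, not the paper's settings.

```python
# Hedged sketch of VMD denoising using the third-party vmdpy package
# (an assumption; the paper does not name its VMD implementation or settings).
import numpy as np
from vmdpy import VMD

prices = np.sin(np.linspace(0, 10, 256)) + 0.1 * np.random.randn(256)  # toy signal

alpha, tau, K, DC, init, tol = 2000, 0.0, 4, 0, 1, 1e-7  # commonly used VMD settings
u, u_hat, omega = VMD(prices, alpha, tau, K, DC, init, tol)  # u: K decomposed modes

# Simple heuristic: rebuild the signal from all but the mode with the highest
# final center frequency, treating that mode as noise.
order = np.argsort(omega[-1])
denoised = u[order[:-1]].sum(axis=0)
```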
Finally, decision-making is undertaken by the deep reinforcement learning network. Several DRL techniques can be utilized in this module, with results that may be better or worse than those of the implemented DQN.
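As one illustration of how alternative agents could be slotted into this module, the sketch below swaps between several standard DRL algorithms via the Stable-Baselines3 library. This is our assumption for illustration only: the library choice, the environment, and the training budget are not part of the paper.

```python
# Hedged sketch: swapping alternative DRL algorithms into the decision module
# using Stable-Baselines3 (our illustration; not the paper's implementation).
# "env" is assumed to be a Gymnasium-style trading environment exposing the
# blended price/sentiment state and buy/sell actions.
from stable_baselines3 import A2C, DQN, PPO

def train_decision_agent(env, algo="DQN", steps=50_000):
    agents = {"DQN": DQN, "A2C": A2C, "PPO": PPO}
    model = agents[algo]("MlpPolicy", env, verbose=0)  # same interface for each algorithm
    model.learn(total_timesteps=steps)
    return model
```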
                                  Author Contributions: Methodology, A.L.A., S.M.E. and M.W.F.; Software, A.L.A.; Supervision,
                                  S.M.E. and M.W.F. All authors have read and agreed to the published version of the manuscript.
                                  Funding: This research received no external funding.
                                  Data Availability Statement: Publicly available datasets were analyzed in this study. This data can
                                  be found here: https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-in-commodity-
                                  market-gold (accessed on 1 February 2023).
                                  Conflicts of Interest: The authors declare no conflict of interest.