Stock Prediction Using ML/DL
Bachelor of Engineering
by
                          Approval Sheet
                                                      Examiners
                                                         1._________________
                                                         2._________________
Date: 29/05/2021
Place: Bandra, Mumbai.
                                   Declaration
We declare that this written submission represents our ideas in our own words and where
others’ ideas or words have been included, we have adequately cited and referenced the sources.
We also declare that we have adhered to all principles of academic honesty and integrity and
have not misrepresented or fabricated or falsified any idea/data/fact/source in our submission.
We understand that any violation of the above will be cause for disciplinary action by the
Institute and can also invoke penal action from the sources which have thus not been properly
cited or from whom proper permission has not been taken when needed.
In today’s age, some businesses need to adopt machine learning techniques. This project aims
to create an autonomous platform for researchers and laypeople who work with data, which
automatically cleans the data and suggests a machine learning approach for understanding it and
getting better value out of it. The proposed project includes a minimalistic user interface and a
complex backend that caters to users’ needs, such as better data visualization and summarization,
using pandas in Python and Chart.js. The ultimate goal is autonomy: the user should not have to
write a single line of code. Data cleaning in the Flask backend has been generalized for datasets
with different structures. Algorithms are allotted according to the target columns set by the user
in the UI, and users can try different ML approaches simply by choosing options in the UI. The
project also includes complementary features, such as automatic web scraping to gather news
related to the stock market, and a chatbot that can be used to query stock information and provide
the necessary knowledge. This way, a layperson can be introduced to the power of machine learning
to help them trade efficiently.
                            Acknowledgements
We have great pleasure in presenting the report on "Stock Prediction using ML/DL". We
take this opportunity to express our sincere thanks to our guide Dr Nilesh Patil, C.R.C.E,
Bandra (W), Mumbai, for providing technical guidance and suggestions regarding the direction of
this work. We enjoyed discussing the work progress with him during our visits to the
department.
We thank Dr Jagruti Save, Head of the Information Technology Department, the Principal and the
management of C.R.C.E., Mumbai for encouraging us and providing the necessary infrastructure
for pursuing the project.
We also thank all non-teaching staff for their valuable support in completing our project.
Abstract
List of Figures
Glossary
 1.   Introduction
       1.1.   Background
       1.2.   Motivation
       1.3.   Objectives
 2.   Related Theory
       2.1.   Knowledge Discovery in Databases
       2.2.   Technical Indicators
              2.2.1.   MACD (Moving Average Convergence Divergence)
              2.2.2.   MACD Histogram
              2.2.3.   RSI (Relative Strength Index)
       2.3.   Machine Learning
              2.3.1.   Support Vector Regression
              2.3.2.   Random Forest Regressor
              2.3.3.   Gradient Boosting Regressor
              2.3.4.   LSTM
       2.4.   Data Analysis
       2.5.   Natural Language Processing
       2.6.   Web Scraping
              2.6.1.   The Crawler
              2.6.2.   The Scraper
       2.7.   RSS Feeds
 3.   Literature Review
 4.   Problem Statement
       4.1.   Drawbacks of current trading platforms
       4.2.   Solution to the Above Problems
 5.   Project Description
       5.1.   Overview of the project
       5.2.   Module Description
              5.2.1.   Modules
                       5.2.1.1.   Stock Price Prediction
                       5.2.1.2.   News Feed
                       5.2.1.3.   Trading Agent
                       5.2.1.4.   Sentiment Analysis of News Articles
                       5.2.1.5.   Chatbot
       5.3.   Data Flow Diagram Level 0
       5.4.   Data Flow Diagram Level 1
       5.5.   Data Flow Diagram Level 2
       5.6.   Architecture/Block Diagram
       5.7.   Sequence Diagram
       5.8.   Use Case Diagram
       5.9.   Dataset Design
       5.10.  Splitting Train-Test Data
       5.11.  Building the models
       5.12.  Input Design
       5.13.  Output Design
 6.   Implementation Results
       6.1.   Stock Predictions
       6.2.   News Updates
       6.3.   Sentiment Analysis of News Articles
       6.4.   Chatbot
 7.   Applications
 8.   Conclusion And Future Enhancements
References
                                     List of Figures
2.1: Knowledge Data Discovery
5.19: A code snippet for generating a Chart.js line graph after giving data points as input.
5.21: A code snippet for generating an RSS feed after giving an RSS feed source as input.
Chapter 1
Introduction
1.1. Background
       Machine learning has surpassed humans in pattern-recognition speed and statistical
accuracy, owing to the computational speed of modern processors. Because the technology has
only recently boomed, it is finding countless applications in various domains.
       One such application of machine learning is stock prediction. India allowed algorithmic
trading only in 2008, but algorithms already control nearly a third of all trades. In the West,
where the phenomenon first surfaced nearly three decades ago, algorithms were running about
50% of US equity trading volumes by 2012. In the foreign exchange markets, which run 23 hours
a day, algorithms account for 80% of trading volumes.
       This led us to develop a solution that lets the common man make the best use of such
technology. Users can use the website, which fetches predicted data and other information as per
their needs. Users can search for a particular stock along with the amount they can invest, and
the machine learning models then predict values for the searched stock and give the necessary
output to the user. Users can also use a chatbot for basic information such as stock prices,
stock news and stock market information.
       Once the values are entered, different models predict future stock values. Users can
select the models they want, and the values predicted by each selected model are shown in a chart
drawn with Chart.js. The website also marks the buying and selling times for the searched stock
on the same chart and shows the estimated return for the amount the user is willing to invest.
       Training is driven by the user's search: the algorithm selects the dataset of the searched
stock and trains the models on it. Users can view a comparison graph of past values and predicted
values to get a sense of the prediction accuracy, and can then select which model to rely on when
investing in that particular stock.
1.2. Motivation
During the unfortunate COVID-19 pandemic, while stuck in our homes hoping the outbreak would
quickly be brought under control, we noticed a drastic decline in the global economy and crashes
in the stock market. This motivated us to explore the stock market, and we came to understand its
widespread importance to the Indian economy. We then realised how much it would benefit the
Indian audience if there were software that eased the process of trading. We further realised how
important it is to know future stock prices in advance, and thus originated the idea of predicting
stock prices. We also noticed that the majority of people were scared to invest through trading
platforms because the platforms were too complicated, so we decided to create a platform that
guides the user at every step of the trading process.
1.3. Objectives
● To provide users with a complete platform where they can understand market trends.
● To facilitate a smooth trading process by providing users with graphs depicting the future
  trends that might occur and the best probable points to buy/sell stocks.
● To provide users with the latest stock market news, which helps them keep in touch with
  the latest updates of the market.
● To enhance user experience and assist users with frequently asked questions, index values,
  top gainers/losers and all such relevant information via a virtual AI assistant.
Chapter 2
Related Theory
1. Data Cleaning: "Data cleaning is defined as the removal of noisy and irrelevant data from
    the collection." [1]
 ● Cleaning in the case of missing values.
 ● Cleaning noisy data, where noise is a random error or variance.
 ● Cleaning with data discrepancy detection and data transformation tools.
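The three cleaning cases above can be illustrated with pandas, which the project uses for data handling. This is a minimal sketch, not the project's generalized cleaning code: it assumes a price table with `date` and `close` columns, and the function name `clean_prices`, the 5-day median window and the 50% `max_jump` threshold are hypothetical choices.

```python
import numpy as np
import pandas as pd


def clean_prices(df: pd.DataFrame, max_jump: float = 0.5) -> pd.DataFrame:
    """Illustrative cleaning pass for a table with 'date' and 'close' columns."""
    out = df.copy()
    # Irregular headers: strip stray white space and lower-case the column names.
    out.columns = [str(c).strip().lower() for c in out.columns]
    # Repeated records: drop exact duplicate rows.
    out = out.drop_duplicates()
    # Unparseable dates become NaT and are discarded; rows are then put in time order.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out = out.dropna(subset=["date"]).sort_values("date").reset_index(drop=True)
    # Missing values: carry the last known close forward.
    out["close"] = out["close"].ffill()
    # Noise: replace closes that differ from a centred 5-day rolling median by more
    # than max_jump (50% by default) with that median.
    median = out["close"].rolling(5, center=True, min_periods=1).median()
    spikes = (out["close"] - median).abs() / median > max_jump
    out.loc[spikes, "close"] = median[spikes]
    return out


# Hypothetical raw data: messy header, a duplicate row, a gap, a spike and a bad date.
raw = pd.DataFrame({
    " Date ": ["2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-04", "not a date"],
    "Close": [100.0, np.nan, np.nan, 1000.0, 102.0, 101.0],
})
cleaned = clean_prices(raw)
print(cleaned)
```

The centred rolling median uses neighbouring days on both sides, which is acceptable when tidying historical data but would leak future prices into a live prediction pipeline.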
3. Data Selection: "Data selection is defined as the process where data relevant to the
   analysis is decided and retrieved from the data collection." [1]
● Data selection using neural networks.
● Data selection using decision trees.
● Data selection using Naive Bayes.
● Data selection using clustering, regression, etc.
5. Data Mining: "Data mining is defined as clever techniques that are applied to extract
   patterns potentially useful." [1]
● Transforms task-relevant data into patterns.
● Decides the purpose of the model using classification or characterization.
A. Generate reports.
B. Generate tables.
consists of three bands: Middle, Upper, and Lower. A window of 20 periods is used to calculate all
three bands. The Middle Band is the 20-period simple moving average, and the Upper and Lower Bands
lie 2 standard deviations (computed over the same 20 periods) above and below the Middle Band.
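The band construction described above can be sketched with pandas rolling windows. This is a minimal sketch assuming daily closing prices in a pandas Series; the function name `bollinger_bands` and the sample prices are hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since some tools use the sample version instead.

```python
import pandas as pd


def bollinger_bands(close: pd.Series, window: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Middle = rolling mean; Upper/Lower = Middle +/- k rolling standard deviations."""
    middle = close.rolling(window).mean()
    # ddof=0: population standard deviation over the window (an assumed convention).
    std = close.rolling(window).std(ddof=0)
    return pd.DataFrame({"middle": middle, "upper": middle + k * std, "lower": middle - k * std})


# Hypothetical closing prices cycling through 100..104.
closes = pd.Series([float(100 + (i % 5)) for i in range(30)])
bands = bollinger_bands(closes)
print(bands.tail(3))  # the first window - 1 rows are NaN (not enough history yet)
```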
       Machine Learning is the field of study that gives computers the capability to learn without
being explicitly programmed. ML is one of the most exciting technologies one has ever
encountered. As the name implies, it gives computers the trait that makes them more like humans:
the ability to learn. Machine learning is actively used today, perhaps in far more places than one
might expect. [2]
SVR is the regression counterpart of SVM: SVR works with continuous values, whereas SVM is used
for classification. [3]
1. Kernel: The function used to map lower-dimensional data into a higher-dimensional space.
2. Hyper Plane: In SVM, this is the separation line between the data classes. In SVR, however, we
  define it as the line that helps us predict the continuous (target) value.
3. Boundary line: In SVM, there are two lines besides the Hyper Plane that create a margin. The
  support vectors can lie on the boundary lines or outside them, and the boundary lines separate
  the two classes. In SVR the concept is similar: the two boundary lines lie at a distance ε
  (epsilon) on either side of the hyperplane, and points inside this margin incur no error.
4. Support vectors: These are the data points closest to the boundary, i.e. at the minimum
  distance from it; in SVR, they are the points lying on or outside the boundary lines.
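To make the epsilon margin concrete, here is a minimal sketch of a linear SVR trained by plain subgradient descent in NumPy. It is an illustrative assumption, not the project's implementation: a library such as scikit-learn's `SVR` solves the same epsilon-insensitive problem with a dedicated solver and supports non-linear kernels (the Kernel term above), which this sketch omits. The names `fit_linear_svr`, `epsilon`, `C`, `lr` and `epochs` are hypothetical.

```python
import numpy as np


def fit_linear_svr(X, y, epsilon=0.1, C=100.0, lr=0.001, epochs=5000):
    """Subgradient descent on 0.5*||w||^2 + C * mean(max(0, |y - (X @ w + b)| - epsilon))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        residual = y - (X @ w + b)
        # Only points outside the epsilon-tube (beyond the boundary lines) contribute.
        outside = np.abs(residual) > epsilon
        sign = np.sign(residual) * outside
        grad_w = w - C * (X.T @ sign) / n
        grad_b = -C * sign.sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: a noiseless straight line y = 3x + 1.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0
w, b = fit_linear_svr(X, y)
pred = X @ w + b
# Points on or outside the tube play the role of support vectors.
support = np.abs(y - pred) >= 0.1 - 1e-9
print(w, b, int(support.sum()))
```

Because errors smaller than epsilon cost nothing, the fitted slope is pulled slightly below 3 by the regularization term until a few points at the ends touch the tube.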
         A Random Forest uses multiple decision trees as its base learning models. Rows and
features are randomly sampled from the dataset to form a separate sample dataset for each tree;
this sampling step is called bootstrapping. For regression, the trees' predictions are averaged to
give the final output.
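The row and feature sampling described above can be sketched in plain Python. This assumes hypothetical daily rows with `open`, `high` and `volume` features and a `target` value, and the helper `bootstrap_samples` is hypothetical; as the text describes, it draws one feature subset per tree, whereas scikit-learn's `RandomForestRegressor` instead re-samples candidate features at every split (its `max_features` parameter).

```python
import random


def bootstrap_samples(rows, feature_names, n_trees=3, n_features=2, seed=0):
    """Build one (sample dataset, feature subset) pair per tree."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trees):
        # Row sampling: draw as many rows as the dataset has, with replacement (bootstrap).
        row_sample = [rng.choice(rows) for _ in range(len(rows))]
        # Feature sampling: choose a subset of columns without replacement.
        features = rng.sample(feature_names, n_features)
        dataset = [{**{f: r[f] for f in features}, "target": r["target"]} for r in row_sample]
        samples.append((dataset, features))
    return samples


# Hypothetical daily rows: three features and the next day's close as the target.
rows = [
    {"open": 100 + i, "high": 102 + i, "volume": 1000 * (i + 1), "target": 101 + i}
    for i in range(6)
]
samples = bootstrap_samples(rows, ["open", "high", "volume"])
for dataset, features in samples:
    print(features, [r["target"] for r in dataset])
```

Each tree would then be trained on its own `dataset`, and the forest's prediction is the average of the trees' predictions.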
         The Random Forest regression technique is approached like any other machine learning
technique:
●   Define a specific question and identify the data source that provides the required data.
●   Make sure the data is in an accessible format; otherwise, convert it to the required format.
●   Identify all noticeable anomalies and missing data points that must be handled to obtain the
    required data.
●   Create a machine learning model.
●   Set the baseline performance the model should achieve.
●   Train the machine learning model on the data.
●   Evaluate the model on the test data.
●   Compare the model's predictions with the actual test values using performance metrics.
●   If the model doesn't satisfy your expectations, try improving it, updating your data or
    using another data modelling technique.
●   Finally, interpret the results you have obtained and report them accordingly.
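Several of the steps above (a chronological train/test split, a baseline, a trained model and a comparison of metrics) fit in a short NumPy sketch. The prices are synthetic and hypothetical, the baseline is the naive "tomorrow equals today" forecast, and ordinary least squares on the two previous closes stands in for the model purely to keep the sketch dependency-free; a Random Forest or any other regressor would slot into the same place.

```python
import numpy as np

# Hypothetical daily closing prices (synthetic: trend plus oscillation).
t = np.arange(120)
prices = 100 + 0.5 * t + 2 * np.sin(t)

# Features: the previous two closes; target: the next close.
X = np.column_stack([prices[1:-1], prices[:-2]])
y = prices[2:]

# Chronological split: a time series is never shuffled before splitting.
split = int(0.8 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Baseline: predict that tomorrow's close equals today's close.
baseline_pred = X_test[:, 0]

# Model: ordinary least squares with an intercept column.
A_train = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
model_pred = np.column_stack([X_test, np.ones(len(X_test))]) @ coef


def mae(actual, predicted):
    """Mean absolute error, the performance metric used for the comparison."""
    return float(np.mean(np.abs(actual - predicted)))


print(f"baseline MAE: {mae(y_test, baseline_pred):.3f}")
print(f"model MAE:    {mae(y_test, model_pred):.3f}")
```

A model is only worth keeping if it beats the baseline on the held-out period; otherwise the last two steps of the list apply.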
         When gradient boosting is used to predict a continuous value, such as age, weight, or
cost, we are using gradient boosting for regression. This is not the same as linear regression,
and its configuration is a bit different from the one used for classification, so this section
focuses on regression.
        In gradient boosting, decision trees are used as weak learners. A decision tree solves a
machine learning problem by transforming the data into a tree representation. Each internal
node of the tree denotes an attribute, and each leaf node holds a prediction: a class label in
classification, or a numeric value in regression. In general, the loss function for regression
problems is the squared error, and the loss function needs to be differentiable.
        As in linear regression, Gradient Boosting Regression uses the concept of residuals. The
algorithm calculates the difference between the current prediction and the known correct target
value; this difference is called the residual. Gradient Boosting Regression then trains a weak
model that maps the features to that residual. The residual predicted by the weak model is added
to the existing model's prediction, which nudges the model towards the correct target. Repeating
this step improves the overall model prediction.
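Following the residual-fitting loop described above, the sketch below boosts one-split regression trees (decision stumps) on a single feature, using squared-error loss, whose negative gradient is exactly the residual. It is a teaching sketch under those assumptions: the function names are hypothetical, and each weak model's output is scaled by a learning rate (shrinkage), a common refinement. It is not a substitute for a library implementation such as scikit-learn's `GradientBoostingRegressor`.

```python
import numpy as np


def fit_stump(x, r):
    """Weak learner: the single threshold split on x that best fits residuals r."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_mean, right_mean = r[left].mean(), r[~left].mean()
        sse = ((r[left] - left_mean) ** 2).sum() + ((r[~left] - right_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda q: np.where(q <= threshold, left_mean, right_mean)


def gradient_boost(x, y, n_rounds=200, learning_rate=0.1):
    base = float(y.mean())                      # initial model: a constant prediction
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                     # negative gradient of 0.5 * squared error
        stump = fit_stump(x, residual)          # weak model mapping the feature to the residual
        pred = pred + learning_rate * stump(x)  # nudge predictions towards the target
        stumps.append(stump)
    return lambda q: base + learning_rate * sum(s(q) for s in stumps)


x = np.linspace(0.0, 10.0, 50)
y = np.sin(x)
model = gradient_boost(x, y)
mse = float(np.mean((model(x) - y) ** 2))
print(f"training MSE: {mse:.4f} (variance of y: {np.var(y):.4f})")
```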
2.3.4. LSTM
         LSTM, which stands for Long Short-Term Memory, is a special kind of RNN that can learn
long-term dependencies. LSTMs were introduced by Hochreiter & Schmidhuber (1997) and were
refined and popularized by many people in subsequent work. They work very well on a large
variety of problems and are now widely used.
         LSTMs are explicitly designed to avoid the long-term dependency problem: remembering
information for long periods of time is practically their default behaviour.
         All recurrent neural networks have the form of a chain of repeating modules of neural
network. In standard RNNs, this repeating module has a very simple structure, such as a single
tanh layer. [6]
Figure 2.5: The repeating module in a standard RNN contains a single layer.
         LSTMs also have this chain-like structure, but the repeating module is structured
differently. Instead of a single neural network layer, there are four, interacting in a very
special way.
Figure 2.6: The repeating module in an LSTM contains four interacting layers.
         In the diagram above, each line carries an entire vector, from the output of one node to
the inputs of others. The pink circles represent pointwise operations, such as vector addition,
while the yellow boxes are learned neural network layers. Lines merging denote concatenation,
while a line forking denotes its content being copied and the copies going to different
locations.
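The four interacting layers in Figure 2.6 can be written out as one time step of an LSTM cell. The NumPy sketch below follows the standard formulation (forget, input and output gates plus a tanh candidate layer); the stacked weight layout, gate order, sizes, inputs and random weights are assumptions chosen for illustration, and no training is shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; W: (4H, D), U: (4H, H), b: (4H,), rows stacked as f, i, g, o."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # every layer sees the input and the previous output
    f = sigmoid(z[0:H])             # forget gate layer (sigmoid)
    i = sigmoid(z[H:2 * H])         # input gate layer (sigmoid)
    g = np.tanh(z[2 * H:3 * H])     # candidate values layer (tanh)
    o = sigmoid(z[3 * H:4 * H])     # output gate layer (sigmoid)
    c = f * c_prev + i * g          # pointwise multiply/add: update the cell state
    h = o * np.tanh(c)              # pointwise: expose a filtered view of the cell state
    return h, c


rng = np.random.default_rng(0)
D, H = 1, 4                         # hypothetical sizes: 1 input feature, 4 hidden units
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for price_change in [0.5, -0.2, 0.1, 0.3, -0.4]:   # hypothetical normalised inputs
    h, c = lstm_step(np.array([price_change]), h, c, W, U, b)
print(h)
```

The cell state `c` is only modified by pointwise multiplication and addition, which is what lets information flow across many time steps.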
       The data analysis process [4] consists of gathering information with a suitable application
or tool that lets you explore the data and find patterns in it. Based on those patterns, you can
make decisions or draw conclusions. Data analysis consists of the following phases:
• Data Requirement Gathering
First of all, you should decide whether data analysis is needed at all. For that, find the reason
or purpose for doing it, and choose which type of data analysis you want to perform. Then decide
what to analyze and how to measure it: you should know why you are investigating and which
measures the analysis requires.
• Data Collection
After gathering requirements, you will have an idea of what you have to measure and what your
findings should be. Based on these requirements, collect your data. Once the data is collected,
remember that it must be processed or organized for analysis. Because data is collected from
various sources, you must keep a record of the source of the
• Data Cleaning
Some of the data you collected may be useless or irrelevant to the purpose of the analysis, so it
should be cleaned. The collected data can contain repeated records, white spaces or errors, and it
must be made clean and error-free. This phase must be completed before the analysis, because
cleaner data brings the output of the analysis closer to the expected outcome.
• Data Analysis
Once the data is collected, cleaned, and processed, it is ready for analysis. As you manipulate
the data, you may find that you have exactly the information you need, or that you need to gather
more data. In this phase, you can use data analysis tools and any software that, based on the
• Data Interpretation
After analyzing your data, you can interpret the results. You can choose how to express or
communicate your analysis: simply in words, or in a table or chart. Then use the results of
your data analysis process to decide your best course of action.
• Data Visualization
Data visualizations are very common in day-to-day life; they often appear in the form of charts
and graphs. In other words, data is shown graphically so that it is easier for the human brain to
understand and process. Data visualization is often used to discover unknown facts and trends: by
observing relationships and comparing datasets, you can find meaningful information.
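As an illustration of the visualization step, the List of Figures mentions a snippet that generates a Chart.js line graph from data points. The Python sketch below shows one way a backend could assemble such a chart configuration as JSON for the browser; it is not the project's snippet. The `type`/`data`/`labels`/`datasets` layout follows Chart.js's config structure, while the function name `line_chart_config`, the sample values and the idea of serving it from a Flask route are assumptions.

```python
import json


def line_chart_config(labels, series):
    """Build a Chart.js line-chart config; `series` maps each dataset label to its points."""
    return {
        "type": "line",
        "data": {
            "labels": labels,
            "datasets": [
                # None becomes JSON null, which Chart.js leaves as a gap in the line.
                {"label": name, "data": points, "fill": False}
                for name, points in series.items()
            ],
        },
        "options": {"responsive": True},
    }


config = line_chart_config(
    ["Mon", "Tue", "Wed", "Thu"],
    {
        "Actual close": [101.0, 102.5, 101.8, None],
        "Predicted close": [None, None, 101.9, 103.1],
    },
)
# The JSON string could be returned by a (hypothetical) Flask route and passed to
# `new Chart(ctx, config)` in the browser.
payload = json.dumps(config)
print(payload)
```

Plotting actual and predicted values as two datasets on the same labels gives the past-versus-predicted comparison graph described in the introduction.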
Two key techniques used in natural language processing are syntactic and semantic analysis.
       "Syntax is the arrangement of words in a sentence to make grammatical sense." NLP uses
syntax to assess meaning from a language based on grammatical rules. Syntax techniques
include word segmentation (which divides a large piece of text into units), parsing
(grammatical analysis of a sentence), morphological segmentation (which splits words into
smaller parts called morphemes), sentence breaking (which places sentence boundaries in large
texts), and stemming (which reduces inflected words to their root forms).
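Three of the syntax techniques above (sentence breaking, word segmentation and stemming) can be shown with a toy sketch that uses only the standard library. The regular expressions, the suffix list and the function names are simplifying assumptions; libraries such as NLTK provide far more robust tokenizers and stemmers (for example, the Porter stemmer).

```python
import re


def sentence_break(text):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by white space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_segment(sentence):
    """Treat runs of letters, digits and apostrophes as words."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def naive_stem(word):
    """Strip one common English suffix, keeping a stem of at least three letters."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "Markets rallied today. Investors are buying banking stocks!"
sentences = sentence_break(text)
for sentence in sentences:
    print([naive_stem(w.lower()) for w in word_segment(sentence)])
```

The crude rules also show why rule-based approaches struggle: "rallied" becomes "ralli", and abbreviations such as "Ltd." would wrongly end a sentence.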
       Semantics involves the use of, and meaning behind, words. NLP applies algorithms to
understand the meaning and structure of sentences. Semantic techniques include word sense
disambiguation (which derives the meaning of a word from its context), named entity recognition
(which determines words that can be categorized into groups), and natural language generation
(which uses a database to determine the semantics behind words).
       Current approaches to NLP are based on deep learning, a type of AI that examines and uses
patterns in data to improve a program's understanding. Deep learning models require massive
amounts of labelled data to train on and identify relevant correlations, and assembling this kind
of big data set is currently one of the main hurdles in NLP.
       Earlier approaches to NLP were more rules-based: simpler machine learning algorithms were
told what words and phrases to look for in text and were given specific responses for when those
phrases appeared. Deep learning, by contrast, is a more flexible, intuitive approach in which
algorithms learn to identify speakers' intent from many examples, much like how a child learns
human language.
       Three tools commonly used for NLP are NLTK, Gensim, and Intel NLP Architect.
NLTK (Natural Language Toolkit) is an open-source Python module with data sets and tutorials.
Gensim is a Python library for topic modelling and document indexing. Intel NLP Architect is
another Python library, for deep learning topologies and techniques.
       "Web scraping, also known as web data extraction, is the process of retrieving or
“scraping” data from a website." [5] Unlike the tedious process of extracting data manually, web
scraping uses automation to retrieve hundreds, millions, or even billions of data points from the
internet.
       Beyond being a modern convenience, web scraping is powerful because it can build and power
some of the world's most innovative business applications. Some companies use web-scraped data
to improve their operations, informing everything from executive decisions to individual customer
service experiences. [13]
"A web crawler, which we generally call a “spider,” is an artificial intelligence that browses the
internet to index and searches for content by following links and exploring, like a person with
too much time on their hands." [5]
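The link-following behaviour of a crawler amounts to a breadth-first traversal of pages. In this sketch the "internet" is a hypothetical in-memory dictionary mapping each page to the links found on it, and the `crawl` function is hypothetical; a real crawler would fetch and parse each page instead and should respect the site's robots.txt.

```python
from collections import deque

# Hypothetical site: each page URL maps to the links found on that page.
SITE = {
    "/": ["/markets", "/news"],
    "/markets": ["/markets/nifty", "/"],
    "/news": ["/news/1", "/news/2", "/markets"],
    "/markets/nifty": [],
    "/news/1": ["/news/2"],
    "/news/2": [],
}


def crawl(start, get_links, max_pages=100):
    """Breadth-first crawl: visit every reachable page once by following links."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)              # "index" the page
        for link in get_links(page):
            if link not in seen:        # avoid revisiting pages (and infinite loops)
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("/", lambda url: SITE.get(url, []))
print(visited)
```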
A web scraper is a specialized tool designed to accurately and quickly extract data from a web
page. Web scrapers vary widely in design and complexity, depending on the project.
1. First, a scraper is developed specifically for the project, designed to target and extract the
   required data from the chosen websites.
2. The data is retrieved in HTML format, after which it is carefully parsed to extract the raw
   data from the noise surrounding it. Depending on the project, the data can be as simple as a
   name and address in some cases, and as complex as high-dimensional weather and seed
   germination data in others.
3. Finally, the data is stored in the format and to the exact specifications of the project.
   Some companies use third-party applications or databases to view and manipulate the data
   as they choose, while others prefer it in a simple, raw format, generally CSV, TSV or
   JSON.
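Steps 2 and 3 above (parse the HTML, then store the result as CSV) can be sketched with Python's standard library. The HTML snippet, the `headline` class name and the `HeadlineParser` class are hypothetical; in practice the page would first be downloaded (for example with `urllib.request.urlopen`), and many projects use a dedicated parsing library instead of `html.parser`.

```python
import csv
import io
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text and href of every <a class="headline"> link, ignoring the rest."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "headline":
            self._current_href = attrs.get("href")
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:  # only keep text inside a headline link
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._current_text).strip()
            self.rows.append({"title": title, "url": self._current_href})
            self._current_href = None


# Hypothetical page content; a real scraper would download this HTML first.
html = """
<html><body>
  <div class="ad">Open a trading account today!</div>
  <a class="headline" href="/news/1">Sensex closes higher</a>
  <p>Unrelated text</p>
  <a class="headline" href="/news/2">Nifty <b>hits</b> record</a>
</body></html>
"""

parser = HeadlineParser()
parser.feed(html)
parser.close()

# Store the extracted rows as CSV (written to memory here instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
writer.writeheader()
writer.writerows(parser.rows)
print(buffer.getvalue())
```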
       The flexibility and scalability of web scraping allow even very specific project
requirements to be met. Fashion retailers inform their designers about upcoming trends based on
web-scraped insights, investors time their stock positions, and marketing teams gain deep insights
into the competition, all thanks to the growing adoption of web scraping as an intrinsic part of
everyday business.
       This information is downloaded by an RSS feed reader, which converts the files into an
easy-to-read list of the latest updates from websites. It gives you news headlines, summaries,
notifications, and links back to articles on your favorite web pages. Because the content is
distributed in real time, a well-maintained RSS feed always reflects the website's latest
publications. RSS feeds let you create your own customized eZine with the most up-to-date content
from the topics and websites you are interested in.
       A website or podcast publishes an RSS feed that keeps a list of its new updates or
notifications. You can check this list yourself, or subscribe to the feed so that the updates
arrive in your feed reader. This keeps you informed of updates immediately.
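Reading a feed, as a feed reader does, mostly means parsing its XML. The sketch below uses `xml.etree.ElementTree` on an inline RSS 2.0 sample; the feed contents, the example.com links and the `read_feed` helper are hypothetical, and a real application would download the feed first or use a dedicated library such as feedparser.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed (normally downloaded from the news site's feed URL).
RSS = """<rss version="2.0"><channel>
  <title>Hypothetical Market News</title>
  <item>
    <title>Sensex closes higher</title>
    <link>https://example.com/news/1</link>
    <pubDate>Fri, 28 May 2021 16:00:00 +0530</pubDate>
  </item>
  <item>
    <title>Nifty hits record</title>
    <link>https://example.com/news/2</link>
    <pubDate>Fri, 28 May 2021 17:30:00 +0530</pubDate>
  </item>
</channel></rss>"""


def read_feed(xml_text):
    """Return one dict per <item>: its title, link and publication date."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]


for entry in read_feed(RSS):
    print(entry["published"], entry["title"], entry["link"])
```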
Chapter 3
Literature Review
       This chapter sheds some light on how machine learning and deep learning help traders
trade more efficiently and with ease.
1. Stock Price Prediction Using LSTM, RNN and CNN-Sliding Window Model
2. Stock price prediction using DEEP learning algorithm and its comparison
   with machine learning algorithms
https://www.researchgate.net/profile/Gholamreza_Mansourfar/publication/337735594_Stock_price_prediction_using_DEEP_learning_algorithm_and_its_comparison_with_machine_learning_algorithms/links/5e995014299bf13079a1f806/Stock-price-prediction-using-DEEP-learning-algorithm-and-its-comparison-with-machine-learning-algorithms.pdf
Technique and Algorithm: ANN, SVM, Random Forest, Deep Learning, LSTM
Objective: This study seeks to evaluate the prediction power of machine-learning models in a
stock market. The data used in this study include the daily close price data of the iShares MSCI
United Kingdom exchange-traded fund from January 2015 to June 2018. The prediction is carried
out with four machine-learning models.
Research Gap: It is recommended to investigate different types of LSTM models, such as
stacked LSTMs, encoder–decoder LSTMs, bidirectional LSTMs, CNN LSTMs, and generative
LSTM models, in the prediction of stock prices. Finally, because only time series data was used
in this study, the role of other influential factors in stock price prediction was not investigated;
researchers are therefore recommended to consider the role of other factors in future studies and
compare the results obtained with the results of this study.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.333.3366&rep=rep1&type=pdf
Technique and Algorithm: Bag of Words, Noun Phrases and Named Entities for textual analysis;
Genetic Algorithm, Naïve Bayes and SVM as machine learning algorithms.
Objective: The research examines a predictive machine learning approach for financial news
articles analysis using several different textual representations: Bag of Words, Noun Phrases, and
Named Entities. It was found that a Proper Noun scheme performs better than the de facto
standard of Bag of Words in the metrics considered.
Research Gap: The dataset used is very small, so further research must be done with a larger
dataset.
Publication Details: 2019 IEEE Fifth International Conference on Big Data Computing Service
and Applications (BigDataService)
https://ieeexplore.ieee.org/abstract/document/8848203
Technique and Algorithm: ARIMA, Facebook Prophet and RNN LSTM
Objective: This paper aims to improve the accuracy of stock price predictions by gathering a
large amount of time series data and analyzing it in relation to related news articles, using deep
learning models.
Research Gap: The models did not perform well in cases where stock prices are low or highly
volatile.
Chapter 4
Problem Statement
○   After studying the current platforms available for the Indian stock market, it was seen that
    the majority of them do not provide a range of different models. This restricts users to the
    single model provided. Our research showed that different models provide varying accuracy
    for different stocks. So, even if a platform provides the best-performing model overall,
    there is no guarantee that this model will perform accurately for every stock.
○   We tried the major platforms available on the Internet and found that none of them suggested
    when you should buy a stock and when you should sell it. Not all traders are experienced;
    some are just beginning their trading career. An option that tells them when to buy and when
    to sell a stock would benefit them a lot.
○   The major websites also could not tell users how much they could earn by the end of a
    stipulated time if they invested a particular sum according to the platform's algorithm.
    Virtual trading apps are available, but these apps do not tell how much return a user would
    get if a certain amount were invested in particular stocks.
○   It was also seen that major stock trading websites had no option to view recent updates on
    the stock market or on the companies whose stocks are traded. Usually, a beginner has no idea
    what is going on in the market and starts trading anyway. This lack of knowledge can lead a
    beginner to more losses than successful trades.
○   Given all the missing features stated above, there was no single platform that provided all
    of them as a single package. The user has to research on one platform and trade on another,
    which makes the process hectic. The entire trading process becomes too time-consuming when
    you have to constantly switch between platforms to do different kinds of work.
1. Current stock market applications do not provide different models to predict stock market
    values. In this application we have used several models, such as Support Vector Regression,
    Random Forest Regressor, Gradient Boosting Regressor and Long Short-Term Memory networks,
    to predict the values of the stock the user has searched for. The user can then select the
    model that shows the best accuracy.
2. When a user searches for any particular stock, the application also tells the user when to buy
    and when to sell the stock for optimal returns. This way users can trade stocks in the way
    they feel is best.
3. The trading agent calculates the estimated amount the user would get by buying or selling the
    stock at the times suggested by the agent. This estimate helps users know how much they
    would earn by trading and plan for the future accordingly.
4. Recent news about the stock and the company is provided to the user so that any trader can
    stay up to date and decide whether to buy or sell stocks. This news helps traders stay aware
    of the company and plan their trades.
5. We have a standalone app that provides many features, such as predicting stock prices,
    predicting the buy and sell times for the best returns, a chatbot that gives basic information
    about stocks and the stock market, and a page for up-to-date news.
Chapter 5
Project Description
       The stock market plays a very important role in our country's economy. While investment
in the stock market has become more popular in recent times, overall participation remains low.
Only about 2.78 crore Indians, roughly 2% of the country's population, invest in the stock
market. More than 50% of Americans own shares, and in neighbouring China about 7% of people
invest in the stock market. Against these figures, India's 2% shows that there is huge headroom
for the Indian stock market to grow.
       With the increasing demand for stockbrokers, there has been a need for a better and more
efficient platform for people to trade easily.
       As the flowchart (Figure 5.1) implies, the system's flow is fairly generic at the start,
i.e., the same as any user-based website. The difference starts after the creation of the project,
where the user can:
   ●   Enter the stock name and investment amount to view the predicted graphs and the points
       at which to buy and sell.
   ●    View the trending news feed of the Indian stock market.
   ●   Interact with the AI chatbot
   ●   Find out the sentiment score of the news articles of the entered company name. A
       sentiment score allows the user to judge how a particular event in the world would affect
       the stock prices.
5.2.1. Modules
The modules implemented in the project with their explanation are as follows:
       The major issue faced by investors is the challenge to predict the price of the stock in the
future. With the help of machine learning and deep learning it is possible to make the predictions
for future stock prices depending on the trends experienced in the past.
       In order to make the predictions, we can make use of technical indicators, which help
study certain elements of the stock's historical data, such as trend, volatility and
oscillations. These technical indicators are mainly used to produce better insights into the
future movement of the stock. They therefore play a vital role in prediction when they are
supplied to the models together with the stock prices.
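To show what the indicator columns contain, here is a minimal plain-Python sketch of MACD, RSI and Bollinger Bands. It is not the TA-Lib code the project uses. The periods are conventional choices that we assume: 12/26/9 for MACD, 14 for RSI, and 20 closes with 2 standard deviations for Bollinger Bands. The RSI also uses simple rather than smoothed averages, and the EMA is seeded with the first close, so the values will differ somewhat from TA-Lib's output.

```python
from statistics import mean, pstdev

def ema(values, period):
    """Exponential moving average, seeded with the first value (assumption)."""
    k = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(v * k + out[-1] * (1 - k))
    return out

def macd(closes, fast=12, slow=26, signal=9):
    """MACD line, signal line and histogram (MACD minus signal)."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram

def rsi(closes, period=14):
    """RSI from simple averages of the gains and losses over the last `period` changes."""
    out = [None] * period  # undefined until `period` price changes are available
    for i in range(period, len(closes)):
        changes = [closes[j] - closes[j - 1] for j in range(i - period + 1, i + 1)]
        gain = sum(c for c in changes if c > 0) / period
        loss = sum(-c for c in changes if c < 0) / period
        out.append(100.0 if loss == 0 else 100 - 100 / (1 + gain / loss))
    return out

def bollinger(closes, period=20, width=2):
    """(lower, middle, upper) bands; None until `period` closes are available."""
    bands = []
    for i in range(len(closes)):
        if i + 1 < period:
            bands.append(None)
            continue
        window = closes[i + 1 - period:i + 1]
        mid, sd = mean(window), pstdev(window)
        bands.append((mid - width * sd, mid, mid + width * sd))
    return bands
```

Each function returns one value per closing price, so the results can be added directly as columns of the final data frame.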
       After generating the indicator values using the TA-Lib library in Python, we create a final
data frame, which is then passed to our machine learning models. We use 6 models, which
independently predict the closing price of a given stock. The models we use are:
a. LSTM
       LSTM solves the vanishing gradient problem and gives much better accuracy than a plain
RNN. An LSTM unit is made up of three gates and one cell state, which carries additional
interactions between time steps. The forget gate multiplies the previous hidden state and the
current input by its weights and passes the result through a sigmoid activation. The input gate
and the output gate are computed in the same way, along with the intermediate (candidate) cell
state C̃t. The new cell state Ct is the product of the forget gate and the old cell state, plus
the product of the input gate and the intermediate cell state (see Fig. 1). Finally, the cell
state is passed through a tanh activation and multiplied by the output gate.
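The gate computations described above can be written as the standard LSTM equations, in the notation of the LSTM explanation linked in the references (colah.github.io). Here σ is the sigmoid function and ⊙ is element-wise multiplication. The term [h_{t-1}, x_t] is the previous hidden state concatenated with the current input, and the W and b terms are learned weights and biases. The symbols are ours and are not taken from the project code.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```

The additive update of C_t is what lets gradients flow across many time steps. This is the reason LSTMs avoid the vanishing gradients of a plain RNN.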
        We use SVR to predict continuous values; it is the regression counterpart of SVM
classification. SVM can be used for both linear and non-linear regression, in which case it is
referred to as SVR. The width of the margin (the "street" around the fitted function) is
controlled by a hyperparameter, epsilon. SVR performs linear regression in a higher-dimensional
space in which, conceptually, each training data point contributes its own dimension.
Evaluating the kernel between a test point and a training point gives the coordinate of the test
point in that dimension. In SVR we try to fit the best function within a certain threshold
(epsilon). The linear, polynomial and RBF kernels differ in the shape of the hyperplane decision
boundary they can form. The RBF kernel gives more accurate results than the linear and
polynomial kernels, although those take less time.
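Two ingredients from the description above can be sketched in plain Python. The first is the epsilon-insensitive "street", and the second is the kernel that measures similarity between a test point and each training point. The kernel formulas follow the parameterisation scikit-learn documents for its SVR (gamma, coef0 and degree). The parameter values here are illustrative, not the ones used in the project.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.1):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon street cost nothing; outside it, cost grows linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def kernel_features(test_point, training_points, kernel=rbf_kernel):
    """Coordinates of a test point in the space where each training point is one dimension."""
    return [kernel(test_point, p) for p in training_points]
```

A trained SVR predicts f(x) = Σ β_i K(x_i, x) + b, a weighted sum of these kernel features. Only the support vectors (points on or outside the street) receive non-zero weights β_i.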
        Random Forest is a method that can perform both regression and classification, using
multiple decision trees to produce the output. In our RF model, the x values are the technical
indicators (independent variables) and the y value is the strategy signal (dependent variable).
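A random forest averages many decision trees. Each tree is trained on a bootstrap sample of the rows and a random subset of the features. The sketch below shows that idea with depth-one trees (stumps) in plain Python to keep it short. A full implementation such as scikit-learn's RandomForestRegressor grows much deeper trees. The tree count and feature-subset size here are illustrative assumptions.

```python
import random
from statistics import mean

def fit_stump(X, y, feature_ids):
    """Depth-one regression tree: the best (feature, threshold) split by squared error."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # every sampled feature is constant: predict the mean
        m = mean(y)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm

def fit_random_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Average of stumps, each fitted on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        feats = rng.sample(range(n_features), min(max_features, n_features))
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: mean(tree(row) for tree in trees)
```

In the project's setting, each row of X would hold the technical-indicator values for one day and y the corresponding target. Averaging over bootstrapped trees reduces the variance of any single tree.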
        Basically, GBR works with the boosting technique. It first computes a base model that
gives a single output (the average of the dependent variable). It then fits many decision trees
in sequence, each on the residuals (errors) of the current ensemble. Each new model is chosen to
best correct the previous ones and is combined with them, minimising the overall prediction
error. The key idea of GBR is that each model sets the target output for the next model.
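The boosting loop described above starts from the mean, then repeatedly fits a small tree to the residuals and adds a damped copy of it. It is short enough to sketch directly. This sketch assumes squared-error loss, a single numeric feature, depth-one trees and a learning rate of 0.1. These are illustrative choices, not the project's configuration.

```python
from statistics import mean

def fit_stump(x, residuals):
    """Depth-one tree on one feature: best threshold under squared error.
    Assumes x contains at least two distinct values."""
    best = None  # (sse, threshold, left_mean, right_mean)
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def fit_gbr(x, y, n_stages=50, learning_rate=0.1):
    """Gradient boosting for squared error with stumps as the weak learners."""
    base = mean(y)  # stage 0: the average of the target
    stages = []
    preds = [base] * len(y)
    for _ in range(n_stages):
        # residuals = negative gradient of the squared error at the current predictions
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)  # the next model's target is the current error
        stages.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]
    return lambda v: base + learning_rate * sum(s(v) for s in stages)
```

With squared-error loss the residuals are exactly the negative gradient of the loss, which is where the name "gradient boosting" comes from. The learning rate trades more stages for less overfitting.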
        News on shares and stocks adds weight to our speculations and helps investors get the
best deals without having to consult a broker every single time. The value of reading daily
stock news is hard to overstate, since real success or failure depends on how well the investor
can make effective decisions based on real facts. Stock market news alerts investors to
important events in the business world.
        To implement a running news feed, we used RSS feeds to fetch financial news from a
credible source and display it on the website. RSS feeds provide timely updates published by the
source itself. Because the feeds are updated constantly, we needed a real-time news feature that
displays news as and when the RSS feeds are updated. To do so, we used the RSS widget provided
by surfing-waves.com. The widget lets you select up to 10 different RSS feeds and display them
simultaneously, and it also lets you choose the widget's style. The built-in styles are very
basic, so we used our own custom CSS to improve the look of the widget. The CSS file for the
news feed widget had to be hosted on an HTTPS server.
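In the project, the surfing-waves.com widget does the fetching and rendering. For readers who want to see what consuming a feed involves, the sketch below parses an RSS 2.0 document with Python's standard library. It assumes the usual channel/item/title/link/pubDate layout and is not part of the project.

```python
import urllib.request
import xml.etree.ElementTree as ET

def parse_rss(xml_text, limit=10):
    """Return (title, link, pubDate) tuples for the items of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default="").strip(),
            item.findtext("link", default="").strip(),
            item.findtext("pubDate", default="").strip(),
        ))
        if len(items) >= limit:
            break
    return items

def fetch_feed(url, limit=10):
    """Download a feed over HTTP(S) and parse it (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_rss(resp.read(), limit)
```

Polling `fetch_feed` periodically and re-rendering the list would give a similar live effect without a third-party widget. The trade-off is that the styling and caching then become the site's responsibility.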
        We implemented a stock trading bot using Deep Reinforcement Learning, specifically Deep
Q-learning. In general, Reinforcement Learning is a family of machine learning techniques for
creating intelligent agents that learn from their environment through interaction, finding the
right policy by trial and error. This is especially useful for real-world tasks where supervised
learning might not be the best approach. Typical reasons are the nature of the task itself or a
lack of appropriately labelled data.
        The important idea here is that this approach can be applied to any real-world task
that can be loosely described as a Markovian process. At each step, the agent observes the
current state and performs an action (buy/sell/hold). It then observes the subsequent state and
receives a reward signal (the difference in portfolio position). Lastly, it adjusts its
parameters based on the gradient of the computed loss.
       The Q-learning algorithm has received several improvements over the years, and a few of
them have been implemented in this project:
1. Vanilla DQN
3. Double DQN
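Vanilla DQN and Double DQN differ in how they compute the regression target for one transition (s, a, r, s'). In the sketch below, the next-state Q-values are passed in as plain lists indexed by action; the action encoding and the discount factor gamma are illustrative assumptions, not values from the project code. Vanilla DQN takes the maximum over the target network's Q-values. Double DQN instead lets the online network pick the next action and the target network evaluate it, which reduces over-estimation.

```python
def dqn_target(reward, q_next_target, gamma=0.95, done=False):
    """Vanilla DQN: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(q_next_target)

def double_dqn_target(reward, q_next_online, q_next_target, gamma=0.95, done=False):
    """Double DQN: the online network picks a', the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[best_action]
```

In either variant, the network is trained to minimise the squared difference between Q(s, a) and this target.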
Drawbacks:
● At any given state, the agent can only decide to buy or sell one stock at a time. This keeps
  things as simple as possible, since deciding how much stock to buy or sell is a separate
  portfolio redistribution problem.
● Training is preferably done on the CPU because of its sequential nature. After each episode of
  trading, we replay the experience (1 epoch over a small minibatch) and update the model
  parameters.
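The replay step mentioned above relies on a memory of past transitions from which small random minibatches are drawn. A minimal uniform replay memory might look like this. The capacity, batch size and tuple layout are our assumptions. A prioritized variant would instead sample transitions in proportion to their error.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=1000, seed=None):
        self.buffer = deque(maxlen=capacity)  # the oldest experiences drop out first
        self.rng = random.Random(seed)

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Random minibatch used for one replay epoch after an episode."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Sampling at random breaks the correlation between consecutive trading days, which is why replay stabilises training compared with learning from transitions in order.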
         Sentiment Analysis, or Opinion Mining, refers to the use of NLP, text analysis and
computational linguistics to determine subjective information or the emotional state of the
writer/subject/topic. It is often applied to reviews, where it saves businesses a lot of time in
understanding opinions [9].
Methodology
          The concept is straightforward. We crawl the Business Times for any Facebook-related
article, mine the text and compute the sentiment for each day. We buy 10 shares of the stock
when the sentiment score rises above 0.5 and sell when it falls below -0.5. Note that sentiment
ranges from -1 to 1, with 0 being neutral, and that we use the previous day's sentiment to trade
on the present day.
Scraping
The sentiment is stored in a dictionary e.g. {datetime.date(2018,7,5):-0.59,...,}
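The trading rule described above can be sketched on top of the date-keyed dictionary. In this sketch, several articles on the same day are averaged into one score (an assumption). "Previous day" is taken to mean the previous date that has a score, since weekends and holidays have no trading.

```python
import datetime as dt
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

def daily_average(article_scores: Iterable[Tuple[dt.date, float]]) -> Dict[dt.date, float]:
    """Collapse per-article compound scores into one score per day."""
    by_day = defaultdict(list)
    for day, score in article_scores:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}

def sentiment_signals(daily_sentiment: Dict[dt.date, float],
                      buy_above=0.5, sell_below=-0.5, shares=10):
    """Map each day to ("BUY" | "SELL", shares) using the previous recorded day's score."""
    orders = {}
    days = sorted(daily_sentiment)
    for prev_day, today in zip(days, days[1:]):
        score = daily_sentiment[prev_day]  # yesterday's sentiment drives today's trade
        if score > buy_above:
            orders[today] = ("BUY", shares)
        elif score < sell_below:
            orders[today] = ("SELL", shares)
    return orders
```

Using only the previous day's score avoids look-ahead bias: a trade on day t never depends on news published later on day t.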
            The sentiment score of a text can be obtained by summing the intensity of each
word in the text and then normalizing the total. VADER's ratings use five heuristics to analyze
the sentiment:
4. Conjunctions (shift in sentiment polarity, with the later clause dictating polarity):
   I love pizza, but I really hate Pizza Hut (bad review)
5. Preceding tri-gram (identifying reversed polarity by examining the tri-gram before the
   lexical feature): Canadian Pizza is not really all that great.
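The two heuristics listed above can be illustrated with a toy scorer in the spirit of VADER. The word valences below are made-up illustrative values, and only the conjunction ("but") and preceding-tri-gram negation rules are implemented. The constants are the ones we understand the reference VADER implementation to use: 0.5/1.5 weighting around "but", a negation multiplier of -0.74 and the normalisation x/√(x² + 15). Treat this as a sketch, not a replacement for the library.

```python
import math

# Illustrative valences only; the real VADER lexicon has thousands of scored entries.
LEXICON = {"love": 3.2, "hate": -2.7, "great": 3.1, "profit": 1.9, "loss": -1.6}
NEGATIONS = {"not", "never", "no", "isn't", "don't"}
N_SCALAR = -0.74  # negation multiplier
ALPHA = 15        # normalisation constant

def normalize(total, alpha=ALPHA):
    """Squash an unbounded sum of valences into (-1, 1)."""
    return total / math.sqrt(total * total + alpha)

def compound(text, lexicon=LEXICON):
    words = text.lower().replace(",", " ").replace(".", " ").split()
    valences = []
    for i, word in enumerate(words):
        v = lexicon.get(word, 0.0)
        # Heuristic 5: a negation in the preceding tri-gram flips (and damps) polarity
        if v and any(w in NEGATIONS for w in words[max(0, i - 3):i]):
            v *= N_SCALAR
        valences.append(v)
    # Heuristic 4: after "but", the later clause dominates the earlier one
    if "but" in words:
        b = words.index("but")
        valences = [v * 0.5 if i < b else v * 1.5 if i > b else v
                    for i, v in enumerate(valences)]
    return normalize(sum(valences))
```

For example, "I love pizza, but I really hate Pizza Hut" comes out negative, because the weight shifts onto "hate". Adding finance-specific terms is a matter of updating the lexicon dictionary.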
            VADER, however, focuses on social media and short texts, whereas financial news is
quite different in style. The VADER lexicon can therefore be extended with words and sentiment
scores from other sources/lexicons, such as the Loughran-McDonald Financial Sentiment Word
Lists.
5.2.1.5. Chatbot
           The stock markets have taken an exceptional leap in the past years, with massive
growth in investors and shares in the market. However, this hype has also increased the
uncertainty of stocks, making them even riskier than they should be. Yet we see a huge number of
people reap rewards from their investments in the market. This can only be achieved by being
aware of all the latest news about the market, which is a crucial step in placing trades and can
certainly help users reap better returns. But with a market of such size, it is really difficult
to keep track of various stocks and their related information. This motivated us to build a
chatbot that helps users get real-time news updates, stock prices and most of the information
they need to make better investments. We do all the heavy lifting while users reap the benefits.
Our chatbot is designed to scrape the internet to fetch real-time news and provide users with the
most suitable results.
1. Provides opening, closing, high and low prices of any stock for a particular date
In this fast-paced life, time is of the essence for any task we perform, and finding the prices of
a particular stock for a specific date can take a great deal of time and effort. Our chatbot makes
this much more efficient and easy. Users simply ask a question mentioning the stock name, the
date and the type of price they would like to know, and the chatbot returns what they were
looking for within a few seconds.
Technologies Used:
Dialogflow
        Dialogflow is a Google-owned framework that enables users to develop human-computer
interaction technologies that support Natural Language Processing (NLP). It lets users build
conversational programs that interact with end users through natural language.
Flask
        Flask is a lightweight WSGI web application framework. It is designed to make getting
started quick and easy, with the ability to scale up to complex applications. The backend for the
chatbot was done using a webhook which was hosted via Flask.
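A webhook of this kind receives the JSON that Dialogflow (ES, v2 API) posts for a matched intent and answers with a `fulfillmentText` field. The sketch below keeps the request handling in a plain function so it can be tested without a server. The parameter names (`stock`, `date`, `price_type`) and the `get_price` lookup are hypothetical, and the actual StockTracker intents may be structured differently.

```python
def handle_webhook(request_json, get_price):
    """Build a Dialogflow ES fulfillment response for a price query.

    get_price(symbol, date, price_type) is a hypothetical lookup supplied by the caller.
    """
    params = request_json.get("queryResult", {}).get("parameters", {})
    symbol = params.get("stock")  # parameter names are assumptions
    date = params.get("date")
    price_type = params.get("price_type", "close")
    if not symbol or not date:
        return {"fulfillmentText": "Please tell me the stock name and the date."}
    price = get_price(symbol, date, price_type)
    return {"fulfillmentText": f"The {price_type} price of {symbol} on {date} was {price:.2f}."}

def create_app(get_price):
    """Wire the handler into a Flask app (requires Flask); not called at import time."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/webhook", methods=["POST"])
    def webhook():
        return jsonify(handle_webhook(request.get_json(), get_price))

    return app
```

Running `create_app(lookup).run(port=5000)` and pointing an ngrok tunnel at that port would give Dialogflow a public HTTPS URL to call.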
Ngrok Server
        Ngrok is a reverse proxy that creates a secure tunnel from a public endpoint to a locally
running web service. ngrok captures and analyzes all traffic over the tunnel for later inspection
and replay.
Botcopy
       Botcopy is a UI for AI chatbots, which helps to integrate Dialogflow chatbots on any
website. We have linked the AI chatbot with Botcopy which provides a seamless UI experience on
our website. The figure shows the console of Botcopy linked with our StockTracker AI chatbot on
Dialogflow.
A DFD Level 2 offers a more detailed overview of the system than the previous DFD
Level 1. The figure 5.7 shows the DFD Level 2 of our system.
A block diagram is an approach to describing system structure. Figure 5.8 shows the block
diagram of our system.
A sequence diagram describes how the various entities of the system interact with each
other in a time sequence. The figure 5.9 shows the sequence diagram of our system.
A use case diagram shows the relationships between the actors and the use cases. Figure 5.10
shows the use case diagram of our system.
       The data set is generated when the user gives the input of the company name. The data is
fetched online from Yahoo Finance API. The data set consists of seven fields namely-
   ➢ Date
   ➢ High
   ➢ Low
   ➢ Open
   ➢ Close
   ➢ Volume
   ➢ Adj Close (adjusted closing price)
       After fetching the data from Yahoo Finance, we need to clean the data and keep only the
fields that are required by our prediction models.
We then add a few more columns to create the final data frame that will be passed as input to our
machine learning models.
       The columns added are the MACD, MACD Histogram, RSI and Bollinger Bands indicator
values, along with a Next Move column, which is used to train the models. The final dataframe
looks as shown below:
       After creating the final data frame, we now come to the part where we need to train our
models on the data fetched and make predictions with the trained models. We take 90% of the
data for training our models and test them on the remaining 10%, in order to cover as many price
movements as possible.
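The report does not spell out how the Next Move column is derived. The sketch below assumes it is the following day's closing price, the value the models predict. It also assumes that the 90/10 split is chronological, with the most recent 10% held out. Both points are assumptions for illustration.

```python
def add_next_move(closes):
    """Pair each day's close with the next day's close (assumed meaning of Next Move).
    The last day has no label and is dropped."""
    return [(today, tomorrow) for today, tomorrow in zip(closes, closes[1:])]

def chronological_split(rows, train_fraction=0.9):
    """First 90% of rows for training, last 10% for testing, without shuffling,
    so no model is trained on data from after the test period."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

Shuffling before splitting would leak future prices into training and make test scores look better than they would be in live trading.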
        The figures below illustrate the configuration with which we have developed, trained and
 tested our models for the final data frame created.
           The user would enter the stock name to get details regarding the stock prices and the
 suggested trading strategy.
          This page asks the user for the stock symbol of the company and the maximum number of
 articles to analyse. It then displays the sentiment analysis of the news articles pertaining to
 that company.
1. Dataset processing
               {
                   label: "Predicted Values",
                   borderColor: 'rgb(51, 133, 255)',
                   data: data.predicted_valuesSVR_Poly,
                   lineTension: 0,
                   fill: false
               }
               ]
          },
Figure 5.19 : A code snippet for generating a Chart.JS line graph after giving data points
                                        as input.
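import altair as alt

# position and actions are assumed to be lists aligned with the rows of df
def visualize(df, position, actions):
    df['position'] = position
    df['action'] = actions
    # line chart of the agent's position (price) over time
    actual = alt.Chart(df).mark_line(
        color='green',
        opacity=0.5
    ).encode(
        x='date:T',
        y=alt.Y('position:Q', axis=alt.Axis(title='Price'))
    ).interactive(
        bind_y=False
    )
    # BUY and SELL actions drawn as points; HOLD rows are filtered out
    points = alt.Chart(df).transform_filter(
        alt.datum.action != 'HOLD'
    ).mark_point(
        filled=True
    ).encode(
        x=alt.X('date:T', axis=alt.Axis(title='Date')),
        y=alt.Y('position:Q', axis=alt.Axis(title='Price')),
        color='action'
    ).interactive(bind_y=False)
    # overlay the line and the action markers in one chart
    chart = alt.layer(actual, points)
    return chart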
<script type="text/javascript">
rssfeed_url = new Array();
rssfeed_url[0] = "https://www.indiainfoline.com/rss/news.xml";
rssfeed_frame_width = "100%";
rssfeed_frame_height = "500px";
rssfeed_scroll = "on";
rssfeed_scroll_step = "6";
rssfeed_scroll_bar = "off";
rssfeed_target = "_blank";
rssfeed_css_url = "https://c92ffefa54a5.ngrok.io/static/news.css";
rssfeed_title = "off";
rssfeed_item_date = "off";
rssfeed_item_description = "on";
rssfeed_item_description_length = "3000";
rssfeed_cache = "de201ee2e3c4faf95eefcb55ba1c9a24";
</script>
<script
type="text/javascript"
src="//feed.surfing-waves.com/js/rss-feed.js"
></script>
  Figure 5.21 shows a code snippet for generating an RSS feed after giving RSS feed source as
                                            input.
Chapter 6
6. Implementation Results
1.1 LSTM
1.2 SVR-Linear
1.3 SVR-Polynomial
1.4 SVR-Rbf
Performance Metrics
The values displayed in figure 6.14 are the mean squared errors between our models' predictions
and the actual values.
The figure 6.16 shows the news updates live feed implemented using RSS Feed widgets.
The following figure shows the output of the sentiment analysis performed on news articles [12].
6.4. Chatbot
      The following figures show the various commands that work on the Dialogflow interface.
Figure 6.18 below shows how the chatbot returns the opening/closing/high/low price of a
stock on the input date.
   Figure 6.19 below shows how the chatbot returns the latest gainer/list of gainers on the
current financial day.
   Figure 6.20 below shows how the chatbot returns the latest loser/list of losers on the
current financial day.
   Figure 6.21 below shows how the chatbot returns the index quotes of a particular NIFTY
index.
   Figure 6.22 below shows how the chatbot returns the advances/declines of a particular
NIFTY index.
   Figure 6.23 below shows how the bot responds when the user asks small-talk questions
unrelated to stock markets.
6.4.7 ngrok
The following Figure 6.24 shows the console of ngrok, which is used to expose a local
development server to the Internet.
6.4.8 http-server
Chapter 7
Applications
Chapter 8
        The proposed system has scope in terms of ease of use, enhanced visualization, a
user-friendly interface and integration with a wide range of services. In order to stay ahead in
the stock market, it is crucial to be well aware of the various news updates. These are provided
to the user conveniently, along with links to the articles for detailed information.
        In addition to the news updates, our system is capable of running sentiment analysis
using NLTK to inform the user of the likely effect of the news (positive or negative). This can
give the investor a slight edge over others.
        Coming to the centre of attraction of our application, the stock prediction and trading
agent: our system is capable of predicting prices for any stock listed on the NSE. We run
predictions on 6 different AI models, namely LSTM, SVR (linear kernel), SVR (polynomial
kernel), SVR (RBF kernel), Random Forest Regressor and Gradient Boosting Regressor. Once we
predict the closing prices for the provided stock name, we pass these predictions to our trading
agent to generate a strategy that aims to maximise profit.
        Future enhancements could involve using the latest news articles and relevant updates
related to the stock in the task of predicting prices. We could also improve the accuracy of the
stock prediction models by using reinforcement learning to learn from faulty predictions. The
proposed system does not maintain the state of the user's work (the user cannot view previously
analysed stocks). Another feature worth adding would be the ability for the system to place
trades in the live market after displaying a simulation to the user.
References
[4]   (2017, April) What is data analysis- types, process, methods, techniques.
      https://www.guru99.com/what-is-data-analysis.html
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
[7]   Yadav, Sameer. (2017), “STOCK MARKET VOLATILITY - A STUDY OF INDIAN STOCK
      MARKET”, Global Journal for Research Analysis, 6. 629-632.
[8]   Kustrin, Snezana & Beresford, Rosemary. (2000). “Basic concepts of artificial neural
      network (ANN) modeling and its application in pharmaceutical research”. Journal of
      pharmaceutical and biomedical analysis, 22. 717-27. 10.1016/S0731-7085(99)00272-1.
[9]   Shetty, Nisha & Pathak, Ashish. (2017), “Indian Stock Market Prediction Using Machine
      Learning and Sentiment Analysis”, 10.1007/978-981-10-8055-5_53.
[10] https://blog.quantinsti.com/predicting-stock-trends-technical-analysis-random-forests/
[11]    Vaanchitha Kalyanaraman, Sarah Kazi, Rohan Tondulkar, Sangeeta Oswal, “Sentiment
        Analysis on News Articles for Stocks”, 2014 8th Asia Modelling Symposium
[12] (2018, November) Algorithmic Trading using Sentiment Analysis on News Articles
        https://towardsdatascience.com/https-towardsdatascience-com-algorithmic-trading-using-sentiment-analysis-on-news-articles-83db77966704
https://info.scrapinghub.com/web-scraping-guide/beginners-guide-to-web-scraping
[14]    https://rss.com/blog/how-do-rss-feeds-work/
Appendix A
Research Paper
        Stock Price Prediction and Trading Agent using ML/DL
          Nilesh Patil¹*, Chinmay Gawde², Ethan Palani³, and Jeswin Thomas⁴
 ¹ Assistant Professor, Fr. Conceicao Rodrigues College of Engineering, Mumbai.
   ¹,* nilesh.patil@fragnel.edu.in
 ²,³,⁴ Student, Fr. Conceicao Rodrigues College of Engineering, Mumbai.
   ² gawdechinmay@gmail.com
1 Introduction
Machine Learning has far surpassed the limits of speed and computation set by the
human mind. It is a powerful concept that has boomed recently, and there are
countless applications of the technology in various domains. One such application of
machine learning is stock price prediction. Algorithmic trading was legalised in India
only in 2008, whereas in foreign countries algorithmic trading accounts for nearly
one-third of total trading. The concept of algorithmic trading was introduced nearly
thirty years ago in the US. The predictions are based on past and present data, most
commonly through the analysis of trends. It can be broken down into
the following terms [1]:
These predictions help stock market brokers plan recuperative courses of action in
times of serious trouble [2]. If algorithms can provide close to accurate price
predictions, traders can maximise their return on investment [3]. Some people are of
the opinion that one cannot predict the flow of the stock market, while others argue
that it can be predicted with a fair amount of accuracy [4]. However, it is difficult for
investors to analyze and predict market trends using all of this information [5]. This
led us to develop a solution that lets the common man make the best use of such
technology. Users can use the website, which fetches predicted data and other
information as per their needs. Users can search for a particular stock along with an
amount they can invest. The machine learning models then predict values for the
searched stock and give the necessary output to the user. Users can also use a chatbot
for basic information such as stock prices, stock news and stock market information.
2 Review of Literature
has higher accuracy compared to PCA representing data over the data set which is
untransformed. He observed that patterns of classification accuracy emerge as the
number of hidden layers in the DNN model increases. Naik [15] mainly focused on
predicting stock prices for short-term trading, proposing technical indicator feature
selection and an accurate prediction model for stocks. Results showed that the ANN
outperformed other existing works.
3 Methodology
In order to make the predictions, we can make use of technical indicators, which
help study certain elements of the stock's historical data, such as trend, volatility
and oscillations. These technical indicators are mainly used to produce better
insights into the future movement of the stock. They therefore play a vital role in
prediction when they are supplied to the models together with the stock prices.
After generating the indicator values using the TA-Lib library in Python, we create
a final data frame, which is then passed to our machine learning models. We use 6
models, which independently predict the closing price of a given stock. The models
we use are:
3.1 LSTM
Random Forest is a method that can perform both regression and classification,
using multiple trees to produce the output. In our RF model, the x values are the
technical indicators (independent variables) and the y value is the strategy signal
(dependent variable).
GBR works with the boosting technique. It first computes a base model that gives a
single output (the average of the dependent variable). It then fits many decision trees
in sequence, each on the residuals (errors) of the current ensemble. Each new model
is chosen to best correct the previous ones and is combined with them, minimising the
overall prediction error. The key idea of GBR is that each model sets the target output
for the next model.
A stock trading bot has been implemented which uses Deep Q-learning. Deep
Q-learning is a form of Deep Reinforcement Learning, a family of machine learning
techniques that allows us to build intelligent agents that learn through interaction
with their environment. The agent improves itself by trial and error. This technique is
very useful for real-world tasks where supervised learning is not the best approach,
for example when the data has no proper labels or information is missing. The main
point here is that the technique can be applied to any real-life task that can be
described as a Markov process.
In Deep Q-learning, the agent first observes the initial state and performs an action
(buy, hold or sell). It then observes the next state, receives a reward signal and
updates its parameters based on the gradient of the loss.
Some of the Q-learning algorithms which we have implemented in this research are:
1. Vanilla Deep Q-Networks
2. Double Deep Q-Networks
3. Dueling Network Architectures
4. Deep Q-Networks with fixed target distribution
5. Prioritized Experience Replay
Drawbacks
1. To keep things simple, the agent can only do one thing at a time, that is, either
   buy or sell one share. Deciding how many stocks to buy or sell is a separate
   problem.
2. The state is an n-day window of the closing price of the stock being traded,
   after which a sigmoid function is applied to normalize the values to the interval
   [0, 1].
3. Because training is sequential, the agents are trained on the CPU. After each
   trading episode, the experiences are replayed and the parameters updated.
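The n-day window state from point 2 can be sketched as follows. We assume, as in common DQN trading-bot implementations, that the window holds consecutive differences of the closing price before the sigmoid is applied. We also assume days before the series starts are padded with the first price. Both are assumptions, not confirmed details of the project.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def get_state(prices, t, n_days):
    """State at day t: sigmoid of the n_days - 1 successive price differences
    in the n-day window ending at t (padded with the first price if needed)."""
    start = t - n_days + 1
    if start >= 0:
        window = prices[start:t + 1]
    else:
        window = [prices[0]] * (-start) + prices[:t + 1]
    return [sigmoid(window[i + 1] - window[i]) for i in range(n_days - 1)]
```

Unchanged prices map to 0.5, rises map above it and falls below it. This keeps every input in [0, 1] regardless of the stock's price level.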
4 Results
Performance Metrics
We evaluate the performance of the models based on 3 metrics: Mean Absolute
Error, Mean Squared Error and Directional Symmetry.
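For reference, the three metrics can be computed as follows. MAE and MSE are standard. For Directional Symmetry we use the common definition: the percentage of consecutive steps in which the predicted and actual series move in the same direction. This may differ in detail (for example, in how ties are counted) from the definition used to produce the reported results.

```python
def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def directional_symmetry(actual, predicted):
    """Percentage of steps where predicted and actual prices move in the same direction
    (a zero change on either side is counted as agreement)."""
    hits = sum(
        1
        for t in range(1, len(actual))
        if (actual[t] - actual[t - 1]) * (predicted[t] - predicted[t - 1]) >= 0
    )
    return 100.0 * hits / (len(actual) - 1)
```

MAE and MSE measure how far the predicted prices are from the actual ones. Directional Symmetry measures whether the predicted movement is right, which matters more to a trading agent that only decides when to buy or sell.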
5 Conclusion
The proposed system has scope in terms of ease of use, enhanced visualization, a
user-friendly interface and integration with a wide range of services. In order to
stay ahead in the stock market, it is crucial to be well aware of the various news
updates. These are provided to the user conveniently, along with links to the
articles for detailed information.
In addition to the news updates, our system is capable of running sentiment
analysis using NLTK to inform the user of the likely effect of the news (positive or
negative). This can give the investor a slight edge over others.
Coming to the centre of attraction of our application, the stock prediction and
trading agent: our system is capable of predicting prices for any stock listed on the
NSE. We run predictions on 6 different AI models, namely LSTM, SVR (linear
kernel), SVR (polynomial kernel), SVR (RBF kernel), Random Forest Regressor
and Gradient Boosting Regressor. Once we predict the closing prices for the
provided stock name, we pass these predictions to our trading agent to generate a
strategy that aims to maximise profit.
Future enhancements could involve using the latest news articles and relevant
updates related to the stock in the task of predicting prices. We could also improve
the accuracy of the stock prediction models by using reinforcement learning to learn
from faulty predictions. The proposed system does not maintain the state of the
user's work (the user cannot view previously analysed stocks). Another feature worth
adding would be the ability for the system to place trades in the live market after
displaying a simulation to the user.
References
2. K. Khare, O. Darekar, P. Gupta and V. Z. Attar: Short term stock price prediction using
    deep learning 2nd IEEE International Conference on Recent Trends in Electronics, In-
    formation & Communication Technology (RTEICT), Bangalore, India, pp. 482-486
    (2017)
3. Nikou, Mahla & Mansourfar, Gholamreza & Bagherzadeh: Stock price prediction using
    DEEP learning algorithm and its comparison with machine learning algorithms. Intelli-
    gent Systems in Accounting, Finance and Management (2019)
4. Mehtab, Sidra & Sen, Jaydip.: A Robust Predictive Model for Stock Price Prediction
    Using Deep Learning and Natural Language Processing (2019)
5. Akita, Ryo & Yoshihara, Akira & Matsubara, Takashi & Uehara, Kuniaki.: Deep learn-
    ing for stock prediction using numerical and textual information. 1-6. (2016)
6. Huynh, Huy & Dang, L. Minh & Duong, Duc.: A New Model for Stock Price Movements
    Prediction Using Deep Neural Network (2017)
7. M, Hiransha & Gopalakrishnan, E. A & Menon, Vijay & Kp, Soman.: NSE Stock Market
    Prediction Using Deep-Learning Models. Procedia Computer Science (2018)
8. Singh, Ritika & Srivastava, Shashi.: Stock prediction using deep learning. Multimedia
    Tools and Applications (2017)
9. Yoshihara, Akira & Fujikawa, Kazuki & Seki, Kazuhiro & Uehara, Kuniaki.: Predicting
    Stock Market Trends by Recurrent Deep Neural Networks. 8862. 759-769. (2014)
10. Akita, Ryo & Yoshihara, Akira & Matsubara, Takashi & Uehara, Kuniaki.: Deep learn-
    ing for stock prediction using numerical and textual information. 1-6. (2016)
11. Jeong, Gyeeun & Kim, Ha.: Improving Financial Trading Decisions Using Deep Q-learn-
    ing: Predicting the Number of Shares, Action Strategies, and Transfer Learning. Expert
    Systems with Applications (2018)
12. Li, Yuming & Ni, Pin & Chang, Victor.: Application of deep reinforcement learning in
    stock trading strategies and stock forecasting. Computing. 102. (2020)
13. Yuan, Yuyu & Wen, Wen & Yang, Jincui.: Using Data Augmentation Based Reinforce-
    ment Learning for Daily Stock Trading. Electronics. 9. 1384. (2020)
14. Zhong, Xiao & Enke, David.: Predicting the daily return direction of the stock market
    using hybrid machine learning algorithms. Financial Innovation (2019)
15. Naik, Nagaraj & Mohan, Biju.: Optimal Feature Selection of Technical Indicator and
    Stock Prediction Using Machine Learning Technique (2019)