STOCK ANALYSIS AND PREDICTION USING BIG DATA ANALYTICS
                              A Major Project Report
Submitted in partial fulfilment of the requirements for the award of the degree of
Bachelor of Engineering
in
Computer Science and Engineering
Under the guidance of
Dr. A. Ushasree
Assistant Professor, Dept. of CSE
    DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
                     Gokaraju Lailavathi Engineering College
                          (Affiliated to Osmania University)
                        Bachupally, Kukatpally, Hyderabad – 500090
                                     (2024-2025)
CERTIFICATE
This is to certify that the Major Project report entitled “Stock Analysis and Prediction
Using Big Data Analytics”, being submitted by Deepthi Mane (245621733097), Muttineni
Sadhvika (245621733104) and Nalla Bhargavi (24562173312) in partial fulfilment of the
requirements for the award of the degree of Bachelor of Engineering in “Computer Science
and Engineering”, O.U., Hyderabad, during the year 2024-2025, is a record of bonafide work
carried out by them under my guidance. The results presented in this project have been
verified and are found to be satisfactory.
External Examiner(s)
DECLARATION
                                  ACKNOWLEDGEMENT
Our deepest gratitude goes to our mentor, Dr. A.Ushasree, Assistant Professor, Gokaraju
Lailavathi Engineering College. Her unwavering support, insightful guidance, and
constructive feedback have been invaluable throughout our research journey.
Our sincere appreciation extends to Dr. Padmalaya Nayak, Professor & Head of the
Department, Gokaraju Lailavathi Engineering College, for her motivational words and
invaluable input that helped shape our project.
Our thanks also go to all faculty members and staff of the Computer Science and Engineering
Department for their support and encouragement throughout our academic journey.
Lastly, we are deeply indebted to our friends and families for their unwavering emotional
support and understanding, which have been crucial in helping us complete this project.
                                 TABLE OF CONTENTS
                           LIST OF FIGURES
LIST OF TABLES
                                      ABSTRACT
Big data analytics have become integral across various sectors for accurate prediction, pattern
recognition, and insightful analysis of massive datasets. These techniques enable the
discovery of valuable insights that would otherwise remain hidden in traditional data
processing systems. In this paper, we propose a robust Cloudera-Hadoop-based data pipeline
designed to handle and analyse data of any scale and type, providing a scalable and
fault-tolerant architecture for real-time analytics. The pipeline integrates the Apache Hadoop
big-data ecosystem, leveraging HDFS for distributed storage and YARN for resource
management, ensuring efficient processing of high-volume stock market data. As a use case,
we focus on selected stocks from the US stock market and utilize real-time financial data
sourced from Yahoo Finance to analyse and predict daily stock gains. The data pipeline
preprocesses raw stock data, which is then split into training and testing datasets. To enhance
forecasting accuracy, we employ a hybrid approach combining the Autoregressive Integrated
Moving Average (ARIMA) model with Long Short-Term Memory (LSTM) networks.
ARIMA captures linear trends in the data, while LSTM models address non-linear patterns,
providing a comprehensive analysis. The performance of each model is evaluated using
metrics such as R-squared, and the best-performing model is used to forecast stocks with
high daily returns. The proposed solution demonstrates the capability of big data technologies
in financial analytics, providing a scalable, efficient, and accurate mechanism for real-time
stock prediction. This framework can be extended to other domains where large-scale,
real-time data processing and analytics are required.
                                    CHAPTER-1
INTRODUCTION
Big data has become critically important to the growth of many different sectors. Business
organizations employ it extensively to derive important business insights and intelligence,
and the healthcare sector uses it to discover important patterns and knowledge that improve
modern healthcare systems. Big data is likewise significant for the information technology
and cloud computing sectors. Recently, the finance and banking sectors have used big data to
track financial market activity: big data analytics and network analytics have been applied to
catch illegal trading in the financial markets, while traders, big banks, financial institutions
and companies have used big data to generate trade analytics for high-frequency trading. Big
data analytics has also helped detect illegal activities such as money laundering and financial
fraud. In this paper, we build a system that analyses US oil stocks to predict daily gains based
on real-time data from Yahoo Finance. All 13 stocks in the US Oil Fund are selected, and
their daily gain data are divided into training and test sets to predict the stocks with high
daily gains using the Machine Learning module of Spark. Based on our analysis, we propose
a robust Cloudera-Hadoop-based data pipeline to perform this analysis for any type and scale
of data. Studying the live-stream data of US oil stock prices helps us better understand how
the US Oil index affects the prices of other stocks in US oil funds, and it can help identify
profitable stocks for the US oil stock trading community.
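The daily-gain computation at the core of this analysis can be sketched with pandas. In the deployed pipeline the closing prices would stream in from Yahoo Finance; the yfinance call shown in the comment is an assumed fetch mechanism, and the toy price series below stands in for real data.

```python
import pandas as pd

def daily_gains(close: pd.Series) -> pd.Series:
    """Percentage daily gain: (close_t - close_{t-1}) / close_{t-1} * 100."""
    return close.pct_change().mul(100).dropna()

# Real prices would come from Yahoo Finance, e.g. (assumed call):
#   close = yf.download("USO", period="1y")["Close"]
close = pd.Series([100.0, 102.0, 99.96, 101.0])  # toy closing prices
gains = daily_gains(close)
print(gains.round(2).tolist())  # [2.0, -2.0, 1.04]
```

Ranking stocks by these gains is what the later train/test split and models operate on.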
1.1 OBJECTIVE
1. The goal is to harness the capabilities of big data by developing robust machine learning
models, specifically ARIMA and LSTM algorithms. These models are intended to analyze
vast quantities of data, with the overarching objective of predicting stock movements.
Emphasizing the significance of big data processing, the project aims to uncover meaningful
insights for stock traders and facilitate data-driven decision-making.
2. The objective is to implement PySpark, a Python API for Apache Spark, with the goal of
efficiently processing large volumes of stock data. Acknowledging the streaming nature of
this data, the project aims to ensure its capacity to handle substantial datasets and maintain
responsiveness to real-time changes in stock prices.
3. The project's objective is to apply the XGBoost algorithm to refine and enhance the
relevance of features within the stock data. By systematically removing irrelevant variables,
the goal is to improve the accuracy of predictions made by the machine learning models. This
contributes to more reliable outcomes, aligning with the overarching goal of providing
valuable insights for stock traders.
4. The project aims to assess the performance of the ARIMA and LSTM algorithms using the
R-squared metric. This provides a quantitative measure of how well the models predict stock
movements. With a higher R-squared value signifying a more effective predictive capability,
the goal is to aid in selecting the most suitable algorithm for comprehensive stock analysis.
This project develops a stock analysis and prediction system using big data analytics. In
existing systems, illicit trading in the financial markets was detected through the use of
network analytics and big data analytics. In a similar vein, traders, large banks, corporations,
and financial organizations used big data to produce trade analytics used in high-frequency
trading.
1.3.1 DISADVANTAGES OF EXISTING SYSTEM:
                                         CHAPTER 2
                                  LITERATURE SURVEY
  2.1 Price Trend Prediction of Stock Market Using Outlier Data Mining
  Algorithm:
ABSTRACT: In this paper we present a novel data mining approach to predict the long-term
  behavior of stock trends. Traditional techniques on stock trend prediction have shown their
  limitations when using time series algorithms or volatility modelling on price sequence. In
  our research, a novel outlier mining algorithm is proposed to detect anomalies on the basis of
volume sequences of high-frequency tick-by-tick data from the stock market. Such anomalous
trades always influence the stock price in the market. By using the cluster
  information of such anomalies, our approach predicts the stock trend effectively in the real
  world market. Experiment results show that our proposed approach makes profits on the
  Chinese stock market, especially in a long-term usage.
2.3    Stock market: Statistical analysis of its indexes and its constituents:
ABSTRACT:The ever-changing realm of the stock market is constantly thriving under the
process of modifications and alterations. Thus, making a profit from it is hard and requires
intensive planning. This fact makes stock market analysis the first and foremost priority for
any financial investment. The behavioural nature of stock prices, which tend to rise and fall
unexpectedly, leads to a volatile scenario. To acquire insight and extract the best returns, a
thorough and consistent analysis is the most popular and tested approach. This paper aims to
determine top high performing stocks having good returns under a given index that would be
most safe and beneficial for investment. Using historical data we were able to obtain top
stocks that are advisable for investment. We also verified our results by analyzing
contemporary data similarly and found out that the performance and returns of these stocks
were still high irrespective of volatility.
ABSTRACT: Big data is a new and emerging buzzword in today's times. The stock market is
an ever-evolving, volatile, uncertain and intriguing niche, and an important area of finance
for business growth and prediction. The stock market has to deal
with a large amount of vast and distinct data to function and draw meaningful conclusions.
Stock market trends depend broadly on two analyses; technical and fundamental. Technical
analysis is carried out using historical trends and market values. On the other hand,
fundamental analysis is done based on the sentiments, values and social media data and
responses. Since large, complex, and exponentially growing data is involved,
we use big data analysis to help assist in the prediction and drawing accurate business
decisions and profitable investments.
2.5 Stock market prediction: A big data approach:
able to predict the sentiment and the stock performance and its recent news and social data
are also closely correlated.
CHAPTER-3
           SOFTWARE REQUIREMENT SPECIFICATION
Software requirements deal with defining software resource requirements and prerequisites
that need to be installed on a computer to provide optimal functioning of an application.
These requirements or prerequisites are generally not included in the software installation
package and need to be installed separately before the software is installed.
Operating system is one of the first requirements mentioned when defining system
requirements (software). Software may not be compatible with different versions of the same
line of operating systems, although some measure of backward compatibility is often maintained.
For example, most software designed for Microsoft Windows XP does not run on Microsoft
Windows 98, although the converse is not always true. Similarly, software designed using
newer features of Linux Kernel v2.6 generally does not run or compile properly (or at all) on
Linux distributions using Kernel v2.2 or v2.4.
APIs and drivers – Software making extensive use of special hardware devices, like high-end
display adapters, needs special API or newer device drivers. A good example is DirectX,
which is a collection of APIs for handling tasks related to multimedia, especially game
programming, on Microsoft platforms.
Web browser – Most web applications and software depending heavily on Internet
technologies make use of the default browser installed on the system. Microsoft Internet
Explorer is a frequent choice of software running on Microsoft Windows, which makes use
of ActiveX controls, despite their vulnerabilities.
   1. Software : Anaconda
   2. Primary Language : Python
   3. Web Framework : Flask
   4. Development Environment : Jupyter Notebook
   5. Database : SQLite3
     6. Front-End Technologies : HTML, CSS, JavaScript and Bootstrap4
The most common set of requirements defined by any operating system or software application
is the physical computer resources, also known as hardware. A hardware requirements list is
  often accompanied by a hardware compatibility list (HCL), especially in case of operating
  systems. An HCL lists tested, compatible, and sometimes incompatible hardware devices for
  a particular operating system or application. The following subsections discuss the various
  aspects of hardware requirements.
Architecture – All computer operating systems are designed for a particular computer
  architecture. Most software applications are limited to particular operating systems running
  on particular architectures. Although architecture-independent operating systems and
  applications exist, most need to be recompiled to run on a new architecture. See also a list of
  common operating systems and their supporting architectures.
Processing power – The power of the central processing unit (CPU) is a fundamental system
  requirement for any software. Most software running on x86 architecture defines processing
  power as the model and the clock speed of the CPU. Many other features of a CPU that
  influence its speed and power, like bus speed, cache, and MIPS are often ignored. This
  definition of power is often erroneous, as AMD Athlon and Intel Pentium CPUs at similar
  clock speed often have different throughput speeds. Intel Pentium CPUs have enjoyed a
  considerable degree of popularity, and are often mentioned in this category.
Memory – All software, when run, resides in the random access memory (RAM) of a computer.
  Memory requirements are defined after considering demands of the application, operating
  system, supporting software and files, and other running processes. Optimal performance of
  other unrelated software running on a multi-tasking computer system is also considered when
  defining this requirement.
Display adapter – Software requiring a better than average computer graphics display, like
graphics editors and high-end games, often define high-end display adapters in the system
requirements.
Peripherals – Some software applications need to make extensive and/or special use of some
peripherals, demanding the higher performance or functionality of such peripherals. Such
peripherals include CD-ROM drives, keyboards, pointing devices, network devices, etc.
● Usability Requirement
● Serviceability Requirement
● Manageability Requirement
● Recoverability Requirement
● Security Requirement
● Data Integrity Requirement
● Capacity Requirement
● Availability Requirement
● Scalability Requirement
● Interoperability Requirement
● Reliability Requirement
● Maintainability Requirement
● Regulatory Requirement
● Environmental Requirement
CHAPTER-4
                                  SYSTEM DESIGN
Unified Modelling Language (UML) Diagrams are a standardized way of visualizing the
design of a system. UML is widely used in software engineering for specifying, visualizing,
constructing, and documenting the artifacts of software systems. It is similar to the blueprints
used in other fields of engineering. UML is not a programming language; it is rather a visual
language. UML diagrams provide a graphical representation of software components, their
relationships, and their interactions, which helps in understanding, designing, and managing
complex systems. UML has been revised over the years and is reviewed periodically.
   ● Serves as comprehensive documentation that can be referred to throughout the project
lifecycle, including during maintenance and future development.
   ● Specifies the structure and behaviour of the system using detailed diagrams, aiding in
clear and precise communication of requirements and design.
   ● Is closely linked with object-oriented analysis and design, and makes use of elements
such as classes, objects, attributes, and relationships.
 Support for Analysis and Design: Aid in the analysis of requirements and the design of
systems with various diagrams addressing different concerns.
Tool Support: Enable automated tools for creating, managing, and analysing diagrams to
enhance productivity and accuracy.
Integration: Integrate with other modelling languages and methodologies for a coherent
modelling approach.
Structural Diagrams:
Capture the static aspects or structure of the system. Structural diagrams include: Class
Diagrams, Object Diagrams, Component Diagrams and Deployment Diagrams.
Behavior Diagrams:
Capture dynamic aspects or behavior of the system. Behavior diagrams include: Use Case
Diagrams, State Diagrams, Activity Diagrams and Interaction Diagrams.
       A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors
in the system can be depicted.
Class diagram:
       The class diagram is used to refine the use case diagram and define a detailed design
of the system. The class diagram classifies the actors defined in the use case diagram into a
set of interrelated classes. The relationship or association between the classes can be either an
"is-a" or "has-a" relationship. Each class in the class diagram may be capable of providing
certain functionalities. These functionalities provided by the class are termed "methods" of
the class. Apart from this, each class may have certain "attributes" that uniquely identify the
class.
Activity diagram:
         The process flows in the system are captured in the activity diagram. Similar to a state
diagram, an activity diagram also consists of activities, actions, transitions, initial and final
states, and guard conditions.
Sequence diagram:
    A sequence diagram represents the interaction between different objects in the system.
The important aspect of a sequence diagram is that it is time-ordered. This means that the
exact sequence of the interactions between the objects is represented step by step. Different
objects in the sequence diagram interact with each other by passing "messages".
Component diagram:
       The component diagram represents the high-level parts that make up the system. This
diagram depicts, at a high level, what components form part of the system and how they are
interrelated. A component diagram depicts the components culled after the system has
undergone the development or construction phase.
                                        Chapter-5
                                 METHODOLOGY
5.1 MODULES:
2. Data Normalization:
Normalization is crucial when dealing with features that may have different units or ranges,
ensuring that the algorithms can learn effectively from the data.
Apply the XGBoost algorithm to perform feature engineering and select relevant features
while excluding irrelevant ones.
XGBoost can help enhance the model's ability to make accurate predictions by focusing on
the most impactful features.
 Use the preprocessed dataset to train an ARIMA model, specifying the order of the model
(p, d, q).
The ARIMA model is a time series model that captures temporal dependencies in the data.
R-squared measures how well the model explains the variance in the test data.
LSTM models are effective for capturing long-term dependencies in sequential data.
Compare the R-squared value with the ARIMA model to determine which model performs
better.
Algorithms:
Definition: LSTM (Long Short-Term Memory) is a type of recurrent neural network designed
to learn long-term dependencies in sequential data.
Why Used in the Project: LSTM is employed in the project for its capability to analyze and
learn patterns in time-series data, which is crucial for predicting stock movements. Its ability
to capture long-term dependencies in the sequential nature of stock prices makes it
well-suited for forecasting stock prices.
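The LSTM consumes the price series as fixed-length sliding windows of shape (samples, timesteps, features). The network itself would typically be a Keras LSTM stack; the framework-free sketch below shows only this windowing step, with make_windows being an illustrative helper, not part of the project's listing.

```python
import numpy as np

def make_windows(series: np.ndarray, timesteps: int):
    """Slice a 1-D series into (samples, timesteps, 1) inputs and
    next-step targets, the input shape an LSTM layer expects."""
    X, y = [], []
    for i in range(len(series) - timesteps):
        X.append(series[i:i + timesteps])
        y.append(series[i + timesteps])
    return np.array(X).reshape(-1, timesteps, 1), np.array(y)

prices = np.arange(10, dtype=float)  # stand-in for scaled closing prices
X, y = make_windows(prices, timesteps=3)
print(X.shape, y.shape)  # (7, 3, 1) (7,)
```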
Definition: ARIMA is a statistical method used for time-series analysis and forecasting. It
consists of three main components: AutoRegressive (AR), Integrated (I), and Moving
Average (MA). ARIMA models are effective in capturing trends and seasonality within
time-series data.
Why Used in the Project: ARIMA is applied in the project due to its effectiveness in
modeling time-series data, making it suitable for predicting stock price movements over time.
Its ability to account for trends and seasonality in the data complements the project's goal of
forecasting daily gains or losses in the stock market.
Definition: XGBoost (Extreme Gradient Boosting) is an ensemble learning algorithm based
on gradient-boosted decision trees.
Why Used in the Project: XGBoost is utilized in the project for feature refinement within
the stock data. Its capability to handle large datasets and prioritize the most influential
features makes it suitable for improving the accuracy of predictions. By removing irrelevant
variables, XGBoost contributes to enhancing the overall performance of the machine
learning models in the project.
The Stock Prediction System aims to forecast future stock prices using big data analytics and
machine learning. It follows a structured methodology involving data collection,
preprocessing, feature engineering, model selection, training, optimization, performance
evaluation, and forecasting. This approach ensures the system delivers reliable predictions for
financial decision-making.
The first step in stock prediction involves acquiring high-quality historical stock price data
from sources such as Yahoo Finance, Alpha Vantage, and Quandl. Additional sources
include financial reports and news sentiment analysis.
Once collected, the raw data undergoes preprocessing to remove inconsistencies and enhance
accuracy. This includes handling missing values using interpolation, normalizing numerical
features with Min-Max scaling, and detecting outliers using Boxplot analysis or Isolation
Forest algorithms. These steps ensure that the dataset remains clean and structured for further
analysis.
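These preprocessing steps can be sketched with pandas alone; the toy column stands in for the real dataset, and the manual Min-Max formula mirrors what sklearn's MinMaxScaler computes.

```python
import pandas as pd

# Toy closing-price column with one missing value.
df = pd.DataFrame({"Close": [10.0, None, 14.0, 20.0]})

# 1. Fill the gap by linear interpolation.
df["Close"] = df["Close"].interpolate(method="linear")

# 2. Min-Max scale into [0, 1] (the same formula MinMaxScaler applies).
lo, hi = df["Close"].min(), df["Close"].max()
df["Close_scaled"] = (df["Close"] - lo) / (hi - lo)

print(df["Close"].tolist())         # [10.0, 12.0, 14.0, 20.0]
print(df["Close_scaled"].tolist())  # [0.0, 0.2, 0.4, 1.0]
```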
2. Feature Engineering
To improve predictive accuracy, feature engineering is applied to enhance the dataset with
technical indicators. Moving Averages (SMA & EMA) help track trends over different time
periods, while the Relative Strength Index (RSI) measures stock momentum to indicate
whether a stock is overbought or oversold.
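A minimal pandas sketch of these indicators; the price series and 3-day windows are illustrative, and real usage would typically apply 14-day or longer windows.

```python
import pandas as pd

close = pd.Series([44.0, 44.5, 43.8, 44.2, 45.0, 45.5, 45.2, 46.0])

# Simple and exponential moving averages over a 3-day window.
sma3 = close.rolling(window=3).mean()
ema3 = close.ewm(span=3, adjust=False).mean()

# RSI over a 3-day window (simple-mean variant of Wilder's formula).
delta = close.diff()
gain = delta.clip(lower=0).rolling(window=3).mean()
loss = (-delta.clip(upper=0)).rolling(window=3).mean()
rsi = 100 - 100 / (1 + gain / loss)

print(round(float(sma3.iloc[-1]), 2))  # 45.57
```

These columns are appended to the dataset as extra features for the models.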
Several machine learning models are tested to determine the most accurate prediction algorithm.
The ARIMA (AutoRegressive Integrated Moving Average) model is ideal for time-series
  forecasting, capturing trends based on historical stock behavior. For deeper pattern
  recognition, LSTM (Long Short-Term Memory Neural Network) is used to analyze
  sequential dependencies in financial data. Additionally, Random Forest Regression, an
  ensemble learning method, utilizes multiple decision trees to provide robust predictions by
  minimizing overfitting.
To ensure model effectiveness, these methods are benchmarked against traditional approaches
  such as Linear Regression and Support Vector Regression (SVR).
The dataset is divided into training (80%), validation (10%), and test (10%) sets to ensure
proper evaluation of the models. Hyperparameter tuning plays a vital role in improving
  accuracy. Optimization techniques such as Grid Search and Random Search help refine
  parameters like learning rate, number of layers, and activation functions. Neural networks are
  optimized using Adam and RMSprop optimizers, while cross-validation methods such as
  k-fold validation prevent overfitting.
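The 80/10/10 split can be sketched as follows; chronological_split is an illustrative helper, and the key point is that time-series data must not be shuffled before splitting.

```python
import numpy as np

def chronological_split(data, train=0.8, val=0.1):
    """Time-ordered 80/10/10 split: no shuffling, so the test set is
    always the most recent data, as time-series evaluation requires."""
    n = len(data)
    i, j = int(n * train), int(n * (train + val))
    return data[:i], data[i:j], data[j:]

series = np.arange(100)
tr, va, te = chronological_split(series)
print(len(tr), len(va), len(te))  # 80 10 10
```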
Once the model is trained, stock price forecasts are generated for short-term (5-30 days) and
long-term (quarterly) predictions. The system provides interactive graphs to visualize
  prediction results. Line charts show predicted vs. actual stock movements, while candlestick
  charts highlight market opening and closing prices. Users can also access heatmaps and
  correlation matrices to analyze relationships between indicators. Dynamic filters allow users
  to adjust forecast parameters based on stock symbols, timeframes, and volatility levels.
6. Performance Evaluation
To validate prediction accuracy, the system employs multiple evaluation metrics. The Mean
  Squared Error (MSE) measures deviations from actual stock prices, while Root Mean
  Squared Error (RMSE) evaluates prediction stability. The R² Score determines how well the
predicted data aligns with real market trends. Additionally, Mean Absolute Percentage Error
(MAPE) quantifies forecast errors, ensuring precise adjustments for more reliable predictions.
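These four metrics can be computed directly with numpy; the evaluate helper and the toy actual/predicted values are illustrative, not the project's results.

```python
import numpy as np

def evaluate(actual, predicted):
    """MSE, RMSE, MAPE and R-squared for a forecast (illustrative helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((actual - predicted) ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    r2 = 1 - np.sum((actual - predicted) ** 2) / np.sum((actual - actual.mean()) ** 2)
    return {"MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

scores = evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(round(scores["R2"], 3))    # 0.945
print(round(scores["MAPE"], 3))  # 8.333
```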
Because financial markets are highly volatile, implementing error reduction techniques is
essential. Bayesian Optimization enhances hyperparameter tuning, while hybrid model
integration combines LSTM and ARIMA to improve accuracy. Sentiment analysis
adjustments refine predictions by incorporating investor mood swings, while ensemble
learning techniques merge outputs from multiple models for better stability.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
from datetime import datetime
import yfinance as yf
from alpha_vantage.timeseries import TimeSeries
from statsmodels.tsa.arima.model import ARIMA
import tweepy
import preprocessor as p
import re
import constants as ct
import nltk
nltk.download('punkt')
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from flask import Flask, render_template, request

app = Flask(__name__)

# Disable caching on the client side
@app.after_request
def add_header(response):
    response.headers['Pragma'] = 'no-cache'
    response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
    response.headers['Expires'] = '0'
    return response
@app.route('/')
def index():
    return render_template('index.html')

@app.route('/insertintotable', methods=['POST'])
def insertintotable():
    nm = request.form['nm']
def get_historical(quote):
    end = datetime.now()
    start = datetime(end.year - 2, end.month, end.day)
    data = yf.download(quote, start=start, end=end)  # reconstructed Yahoo Finance fetch
    df = pd.DataFrame(data=data)
    df.to_csv('' + quote + '.csv')
    if df.empty:
        # Fall back to Alpha Vantage when Yahoo Finance returns no data
        ts = TimeSeries(key='N6A6QT6IBFJOPJ70', output_format='pandas')
        data, meta_data = ts.get_daily_adjusted(symbol=quote, outputsize='full')  # reconstructed call
        # Format df
        data = data.head(503).iloc[::-1]
        data = data.reset_index()
        df = pd.DataFrame()
        df['Date'] = data['date']
        df['Open'] = data['1. open']
        df['High'] = data['2. high']
        df['Low'] = data['3. low']
        df['Close'] = data['4. close']
        df['Volume'] = data['6. volume']
        df.to_csv('' + quote + '.csv', index=False)
    return
def ARIMA_ALGO(df):
    uniqueVals = df["Code"].unique()
    df = df.set_index("Code")

    def parser(x):
        return datetime.strptime(x, '%Y-%m-%d')

    # Rolling one-step-ahead forecast (reconstructed from the truncated listing)
    def arima_model(train, test):
        history = [x for x in train]
        predictions = list()
        for t in range(len(test)):
            model_fit = ARIMA(history, order=(6, 1, 0)).fit()
            predictions.append(model_fit.forecast()[0])
            history.append(test[t])
        return predictions
                                    CHAPTER -6
                                       TESTING
Testing is like a detective's investigation into a product. Its main job is to uncover any hidden
flaws or weaknesses that could cause problems later on. By carefully checking each part
—from small components to the whole system—testing ensures everything works as it
should. This helps make sure the software meets what users expect and doesn't let them down
unexpectedly. Different types of tests serve different purposes, like checking specific
requirements or making sure everything works together smoothly. Ultimately, testing is about
making sure the software is reliable and meets high standards before it reaches users.
Unit testing is like checking each ingredient before baking a cake. It's about testing small,
specific parts of your code—like checking that each ingredient is fresh and measured
correctly. By doing this, you ensure that each piece of your code works as intended before
putting everything together. Unit tests help catch mistakes early, maintain the overall quality
of your code, and build confidence that your software will work smoothly when it's all
combined.
We'll take the product out for a spin in real-world situations to see how it holds up, making
sure everything works as intended through hands-on testing. At the same time, we'll craft
detailed tests to check each function thoroughly, ensuring they meet our standards and
perform reliably.
Test Objectives
   1. Every field entry needs to function correctly. When you click on the predict button,
       it should give the right prediction.
   2. The entry screen should be responsive, showing messages and responses
       instantly without any delays.
Features to be Tested
           1. Make sure that all entries are in the correct format.
           2. We shouldn't have any duplicates entered.
Integration testing is like putting together pieces of a puzzle to see if they fit perfectly. It
checks how different parts of a software system work together to make sure they collaborate
smoothly and perform their tasks correctly when combined. This testing ensures that the
entire system functions seamlessly as intended.
Test Results: All the test cases mentioned above passed successfully. No
defects encountered.
Acceptance testing is like giving the software a trial run in a real-life situation before it goes
live. It's about making sure the software does everything it's supposed to do and meets all the
goals we set for it. This testing phase helps us decide if the software is ready to be used by
the people who will rely on it every day.
Test Case No. 1
Table 6.5.2 - Test Case 2: Feature Extraction Validation
                                          CHAPTER 7
Results
The Welcome Page of the Stock Prediction System introduces users to the platform with a
simple and intuitive interface. It includes options for Sign Up and Home, allowing new users
to create an account and returning users to access stock analysis features directly. The design
ensures easy navigation, guiding users toward financial predictions and insights. A visually
appealing header displays the platform’s name, reinforcing the purpose of the tool.
The Sign-In Page is designed to provide users with secure access to the Stock Prediction
System. The interface includes two primary options: Sign In for existing users and Sign Up
for new registrations. Users must enter their email and password to access the dashboard,
ensuring a protected login experience.
Additionally, a "Forgot Password" option allows users to recover their credentials in case of
access issues. The design prioritizes user convenience, featuring a clean layout with intuitive
buttons and responsive fields.
Figure 7.3 - Crypto Symbol Entry Page
This page prompts users to enter the cryptocurrency symbol for analysis. Users can input the
ticker symbol of the crypto asset they want to track, such as BTC for Bitcoin, ETH for
Ethereum, or XRP for Ripple. The system uses the entered symbol to fetch real-time market
data, historical price trends, and predictive insights.
The interface includes an input field with placeholder text guiding users to enter the correct
crypto symbol. A validation mechanism ensures the symbol is recognized, preventing errors
caused by invalid entries. Once a valid symbol is provided, users can proceed to view charts,
predictions, and trading recommendations.
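The validation mechanism could be sketched as below; the hard-coded symbol set stands in for whatever live exchange lookup the real system uses, so both the set and the function names are illustrative assumptions:

```python
# Stand-in for a live exchange lookup: a small set of known tickers.
KNOWN_SYMBOLS = {"BTC", "ETH", "XRP"}

def normalise_symbol(raw):
    """Trim whitespace and upper-case the user's input so that
    ' btc ' and 'BTC' are treated as the same symbol."""
    return raw.strip().upper()

def is_valid_symbol(raw):
    """True only when the normalised input is a recognised ticker."""
    return normalise_symbol(raw) in KNOWN_SYMBOLS
```

Rejecting unknown symbols before any data is fetched prevents the downstream errors the page description mentions.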
Figure 7.5 - Stock Prediction Model Based on Investor Sentiments and
Optimized Deep Learning
This image illustrates a stock prediction model integrating investor sentiment analysis with
deep learning algorithms. The system analyzes news, social media, and market trends to
gauge sentiment—classifying it as positive, neutral, or negative. Optimized LSTM and CNN
models process financial data, refining stock forecasts based on historical patterns and
emotional market responses. The visual elements highlight neural network layers, sentiment
classification, and predicted stock movements, enhancing investment strategies with real-
time sentiment insights.
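The three-way classification can be illustrated with a toy lexicon-based scorer. The model described above uses LSTM/CNN networks; this word-count stand-in, with invented word lists, only mimics the positive/neutral/negative decision for illustration:

```python
# Invented mini-lexicons for illustration only.
POSITIVE = {"gain", "rally", "bullish", "surge", "profit"}
NEGATIVE = {"loss", "crash", "bearish", "plunge", "selloff"}

def classify_sentiment(text):
    """Label text positive, negative, or neutral by counting
    lexicon hits; a real system would use a trained model."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, "stocks rally on profit news" scores two positive hits and is labelled positive.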
Figure 7.6-Stock Prediction Model Evaluation and Market Trends Analysis
This section presents various graphs analyzing stock trends and prediction model accuracy.
Figure 7.6 visualizes recent trends in Google cryptocurrency prices, tracking historical
fluctuations, volatility, and trading volume to assess market sentiment and investor behavior.
Figure 7.7 evaluates the ARIMA model’s accuracy, displaying error metrics like RMSE,
MAPE, and R² score, which indicate its effectiveness in predicting linear time-series trends.
Figure 7.8 focuses on the LSTM model’s accuracy, highlighting its ability to capture
long-term dependencies in financial data, making it superior for volatile stock forecasting.
Figure 7.9 assesses XGBoost’s predictive performance, illustrating its precision, recall, and
error rates, showcasing how boosted decision trees refine feature selection and enhance stock
forecast accuracy. These graphical insights collectively provide a comprehensive evaluation
of stock prediction methodologies.
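The error metrics named above have standard definitions, which a small sketch can make concrete (pure-Python versions; a real evaluation would typically use a library such as scikit-learn):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((a - p)^2))."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Lower RMSE and MAPE, and an R² closer to 1, indicate a better fit, which is how Figures 7.7 to 7.9 compare the models.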
Figure 7.10 - Final Test Report and Stock Selling Decision
This image presents the comprehensive test report summarizing the performance evaluations
of different stock prediction models. The table includes results from unit testing, integration
testing, and model accuracy assessments, ensuring the system functions correctly and
provides reliable forecasts. Key metrics such as error rates, processing speed, and prediction
accuracy are displayed to validate the effectiveness of the algorithms.
Additionally, the image features a "Sell or Not Sell" decision panel, where stocks are
analyzed based on market trends and forecasted movements. The system categorizes stocks
into "Recommended for Sale" or "Hold for Future Gains", helping investors make informed
trading decisions. Color-coded indicators highlight profitable opportunities, reinforcing
strategic investment choices.
                                      CHAPTER 8
CONCLUSION
8.1. CONCLUSION
In this paper, big data analytics is used for efficient stock market analysis and
prediction. The stock market is a domain in which uncertainty and the inability to accurately
predict stock values may result in huge financial losses. Through our work we were able
to propose an approach that identifies stocks with positive everyday return margins, which
can be suggested as potential stocks for enhanced trading. This approach acts as
a Hadoop-based pipeline that learns from past data and decides, based on streaming
updates, which US stocks are profitable to trade in. We also identify scope for
improvement in future directions. We intend to extend our study by automating the
analysis process using a scheduling module, so as to obtain periodic recommendations for
trading US stocks. We also plan to test neural-network-based learning models, rather
than linear regression, aiming to predict US stock prices more accurately.
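The "positive everyday return margin" screen described above can be sketched in a few lines. The tickers and prices here are invented, and the real pipeline runs over Hadoop rather than in-memory lists, so this is only a shape of the idea:

```python
def daily_returns(closes):
    """Per-day fractional return from a list of closing prices."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]

def has_positive_margin(closes):
    """True when every daily return in the history is positive."""
    return all(r > 0 for r in daily_returns(closes))

def screen_stocks(price_history):
    """Return the tickers whose every daily return was positive."""
    return [t for t, closes in price_history.items() if has_positive_margin(closes)]
```

For example, given histories for two hypothetical tickers, only the one that rose every day is recommended: `screen_stocks({"AAA": [10, 11, 12], "BBB": [10, 9, 12]})` returns `["AAA"]`.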
Future iterations may include sentiment analysis of news articles and social media posts
related to the stock market, providing traders with additional contextual information for
improved decision-making. Interactive visualization tools and dashboards can enhance user
experience by enabling deeper exploration of stock market trends and predictions through
customizable charts and real-time updates.
Building predictive APIs would allow integration with trading platforms and financial
applications, offering traders convenient access to real-time stock market predictions within
their existing workflows. Future iterations could explore reinforcement learning techniques to
dynamically optimize trading strategies, enabling the system to adapt to changing market
conditions for more responsive decision-making.
                                   REFERENCES
[1] K. Albeladi, B. Zafar, and A. Mueen, “Time series forecasting using LSTM and
ARIMA,” International Journal of Advanced Computer Science and Applications, vol. 14,
no. 1, pp. 313–320, 2023.
[2] U. M. Sirisha, M. C. Belavagi, and G. Attigeri, “Profit prediction using ARIMA,
SARIMA and LSTM models in time series forecasting: A comparison,” IEEE Access,
vol. 10, pp. 124715–124727, 2022.
[3] M. J. Awan, M. S. Rahim, H. Nobanee, A. Munawar, A. Yasin, and A. M. Zain Azlan,
“Social Media and Stock Market Prediction: A Big Data Approach,” Computers,
Materials & Continua, vol. 67, no. 2, 2021.
[4] Z. Peng, “Stocks Analysis and Prediction Using Big Data Analytics,” in 2019
International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS),
Changsha, China, 2019, pp. 309–312.
[5] P. Singh and A. Thakral, “Stock market: Statistical analysis of its indexes and its
constituents,” in 2017 International Conference On Smart Technologies For Smart Nation
(SmartTechCon), Bangalore, 2017, pp. 962–966.
[6] S. Tiwari, A. Bharadwaj, and S. Gupta, “Stock price prediction using data analytics,” in
2017 International Conference on Advances in Computing, Communication and Control
(ICAC3), Mumbai, 2017, pp. 1–5.
[7] L. Zhao and L. Wang, “Price Trend Prediction of Stock Market Using Outlier Data
Mining Algorithm,” in 2015 IEEE Fifth International Conference on Big Data and Cloud
Computing, Dalian, China, 2015, pp. 93–98.
[8] G. V. Attigeri, Manohara Pai M M, R. M. Pai, and A. Nayak, “Stock market prediction: A
big data approach,” in TENCON 2015 - 2015 IEEE Region 10 Conference, Macao, 2015, pp.
1–5.
[9] W.-Y. Huang, A.-P. Chen, Y.-H. Hsu, H.-Y. Chang, and M.-W. Tsai, “Applying Market
Profile Theory to Analyze Financial Big Data and Discover Financial Market Trading
Behavior - A Case Study of Taiwan Futures Market,” in 2016 7th International Conference
on Cloud Computing and Big Data (CCBD), Macau, China, 2016, pp. 166–169.
                                             39
[10] S. Jeon, B. Hong, J. Kim, and H. Lee, “Stock Price Prediction based on Stock Big Data
and Pattern Graph Analysis,” in Proceedings of the International Conference on Internet of
Things and Big Data, Rome, Italy, 2016, pp. 223–231.
[11] R. Choudhry and K. Garg, “A Hybrid Machine Learning System for Stock Market
Forecasting,” vol. 2, no. 3, p. 4, 2008.
[12] K. Kim, “Financial time series forecasting using support vector machines,”
Neurocomputing, vol. 55, no. 1–2, pp. 307–319, Sep. 2003.
[13] M. Makrehchi, S. Shah, and W. Liao, “Stock Prediction Using Event-Based Sentiment
Analysis,” in 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence
(WI) and Intelligent Agent Technologies (IAT), Atlanta, GA, USA, 2013, pp. 337–342.
[15] M. D. Jaweed and J. Jebathangam, “Analysis of stock market by using Big Data
Processing Environment,” International Journal of Pure and Applied Mathematics,
vol. 119.