A PROJECT SPECIFICATION REPORT
ON
SENTIMENT ANALYSIS FOR STOCK MARKET PREDICTION
                          Submitted by
                     L SHYAM        B111562
        in partial fulfillment for the award of the degree
                                 of
                      Bachelor of Technology
                                 IN
              Computer Science and Engineering
                       under the guidance of
                           Mr. Anjaneyulu
       Rajiv Gandhi University of Knowledge Technologies
                  Basar, Telangana - 504107
                         March-2017
                                      Abstract
              Stock prices change every day as a result of market forces. Share prices
change because of supply and demand: when demand for a stock exceeds its supply, the
price moves up, and when supply exceeds demand, the price falls. When the economy
slows, stock prices tend to be more mixed. Markets may take two years or more to form
a bottom or a top, which makes it difficult to determine when the market has hit a
top or a bottom.
             More than 70% of Indians are economically illiterate. Unfortunately, our
educational system has also been unable to create a platform in this field. As
technical students, we have a strong desire to strengthen this field with technology.
Although many technical analysis tools are available in the market that perform
stock price prediction based on historical data, we could hardly find a tool that
predicts the stock price based on sentiment analysis.
                              Table of Contents
Abstract
1. Introduction
2. Literature survey
    2.1 Prediction of Stock Performance in the Indian Stock Market Using Logistic
           Regression
    2.2 Sentiment Analysis with the Naive Bayes Classifier
3. Overview of Task Specifications and Project Schedule
4. Review of Tasks
5. Interim Results
6. Future Plans
7. Conclusion
1. Introduction:
      In finance, the stock market and its trends are extremely volatile in nature.
This attracts researchers who try to capture the volatility and predict the market's
next moves. Investors and market analysts study market behaviour and plan their buy
or sell strategies accordingly. Because the stock market produces a large amount of
data every day, it is very difficult for an individual to consider all the current and
past information when predicting the future trend of a stock. There are two main
methods for forecasting market trends: technical analysis and fundamental analysis.
Technical analysis considers past price and volume to predict the future trend,
whereas fundamental analysis of a business involves analyzing its financial data to
gain insights. The efficacy of both technical and fundamental analysis is disputed by
the efficient-market hypothesis, which states that stock market prices are essentially
unpredictable. This research follows the fundamental analysis technique to discover
the future trend of a stock by treating news articles about a company as the prime
information, and tries to classify each news item as good (positive) or bad
(negative). If the news sentiment is positive, the stock price is more likely to go
up, and if the news sentiment is negative, the stock price may go down. This research
is an attempt to build a model that predicts news polarity, which may affect changes
in stock trends; in other words, it checks the impact of news articles on stock
prices. We use supervised machine learning for classification, together with other
text mining techniques, to determine news polarity. The model should also be able to
classify unseen news that was not used to build the classifier.
2. Literature survey:
   2.1 Prediction of Stock Performance in the Indian Stock Market
          Using Logistic Regression:
           The authors use logistic regression (LR) and various financial ratios as
independent variables to investigate indicators that significantly affect the
performance of stocks actively traded on the Indian stock market. The study sample
consists of the ratios of 30 large market capitalization companies over a four-year
period. The study identifies and examines eight financial ratios that can classify the
companies up to a 74.6% level of accuracy into two categories – “good” or “poor” –
based on their rate of return. The paper asserts that the model developed can enhance
an investor's stock price forecasting ability. Macroeconomic variables, which can
also influence the share price, were not taken into account, however. The paper
discusses the practical implications of using the LR method to predict the probability
of good stock performance. The authors state that the model can be used by investors,
fund managers, and investment companies to enhance their ability to select
outperforming stocks.
2.2 Sentiment Analysis with the Naive Bayes Classifier:
        The Naive Bayes classifier is based on the bag-of-words model. In the
simplest use of the bag-of-words model, we check whether each word of the text
document appears in a list of positive words or in a list of negative words. If the
word appears in the positive list, the total score of the text is increased by 1; if
it appears in the negative list, the score is decreased by 1. If the total score at
the end is positive, the text is classified as positive, and if it is negative, the
text is classified as negative.
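The word-list scoring described above can be sketched in a few lines of Python. This is a minimal illustration only: the two word lists and the function names are hypothetical, and returning "neutral" for a score of zero is an assumption, since the description above covers only positive and negative totals.

```python
# Hypothetical word lists for illustration; a real lexicon would be much larger.
POSITIVE_WORDS = {"gain", "profit", "growth", "up", "strong"}
NEGATIVE_WORDS = {"loss", "fall", "decline", "down", "weak"}


def lexicon_score(text):
    """Add 1 for every positive word and subtract 1 for every negative word."""
    score = 0
    for word in text.lower().split():
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    return score


def classify_by_lexicon(text):
    """Map the total score to a label (zero -> "neutral" is an assumption)."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, classify_by_lexicon("Strong profit growth") scores three positive words and is labelled positive, while a text with no listed words scores zero.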
       With the Naive Bayes model, we do not take only a small set of positive and
negative words into account, but all the words the NB classifier was trained with,
i.e. all words present in the training set.
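To make the difference from word-list scoring concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier written with the Python standard library only. It is not the project's implementation: the function names, the tiny training set, and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_texts):
    """Count words per label over a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training texts
    vocabulary = set()
    for text, label in labeled_texts:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return {
        "word_counts": word_counts,
        "label_counts": label_counts,
        "vocabulary": vocabulary,
    }


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_log_prob = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        log_prob = math.log(doc_count / total_docs)  # class prior
        total_words = sum(model["word_counts"][label].values())
        for word in text.lower().split():
            if word not in model["vocabulary"]:
                continue  # words never seen in training are ignored
            count = model["word_counts"][label][word]
            # Add-one smoothing (an assumption) avoids zero probabilities.
            log_prob += math.log((count + 1) / (total_words + vocab_size))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Hypothetical training examples, for illustration only.
TRAINING_DATA = [
    ("profits rise on strong sales", "positive"),
    ("shares gain after record growth", "positive"),
    ("stock falls on weak earnings", "negative"),
    ("company reports heavy loss", "negative"),
]
model = train_naive_bayes(TRAINING_DATA)
```

Every word in the training set contributes to the decision here, not only the words on a fixed positive or negative list.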
3. Overview of Task Specifications and Project Schedule:
TASK                                      PERIOD / EXPECTED DURATION
Studying different text mining methods    Continuing from Dec-15
and learning NLTK in Python
Obtaining Twitter credentials and         Dec-20 to Dec-25
installation of required software
Dynamically collecting tweets             Jan-05 to Jan-15
from Twitter
Preprocessing the tweets                  Jan-28 - ongoing / 45 days
    Extracting the tweet
    Removing HTML characters
    Tokenizing the tweets
    Removing stop words
    Feature extraction
Building a training data set              5 days
Classifying the overall sentiment of the  3 days
tweets as positive, negative or neutral
Generating the signal                     2 days
Testing                                   1 day
Improving the efficiency
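The preprocessing sub-tasks listed in the schedule can be sketched as a small pipeline. This is an illustrative sketch using only the Python standard library: the function names, the short stop-word list, and the regular expression used for tokenizing are assumptions (the project itself plans to use NLTK), and "removing HTML characters" is interpreted here as decoding HTML entities such as &amp;.

```python
import html
import re

# Hypothetical stop-word list; a real one (e.g. from NLTK) is much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on", "and", "for", "at"}


def remove_html_chars(text):
    """Decode HTML entities such as &amp; back into plain characters."""
    return html.unescape(text)


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that carry little sentiment information."""
    return [token for token in tokens if token not in STOP_WORDS]


def extract_features(tokens):
    """Build a bag-of-words feature dictionary: word -> True."""
    return {token: True for token in tokens}


def preprocess(tweet_text):
    """Run the steps in the order given in the schedule."""
    cleaned = remove_html_chars(tweet_text)
    tokens = remove_stop_words(tokenize(cleaned))
    return extract_features(tokens)
```

The word -> True dictionary is one common feature format for bag-of-words classifiers; whether the project uses this format is not stated in the report.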
4. Review of Tasks:
TASK                                      STATUS
Learning NLTK in Python                   Completed
Installation of required software         Completed
(NLTK, tweepy)
Dynamically collecting tweets             Completed
from Twitter
Preprocessing the tweets                  Completed
    Extracting the tweet
    Tokenizing the tweets
    Removing stop words
    Feature extraction
Building a training data set              To do
Classifying the overall sentiment of the  To do
tweets as positive, negative or neutral
Generating the signal                     To do
5. Interim Results:
We obtained the Twitter credentials (consumer key, secret key and access token) needed
to use the Twitter API in our program.
We have written a program to collect tweets from Twitter and another program to
extract the tweet.
We have written a program to break the text into tokens.
We have written a program to remove stop words from the tokenized text.
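As an illustration of the tweet extraction step mentioned above, the sketch below pulls the tweet text out of raw JSON records. It assumes that the collected tweets are stored as one JSON object per line and that the text is held in a "text" field, as in Twitter's classic status objects; the function names and the sample record are hypothetical and not taken from the project's programs.

```python
import json


def extract_tweet_text(raw_json):
    """Return the text of one tweet given its raw JSON string."""
    status = json.loads(raw_json)
    return status.get("text", "")


def extract_all(lines):
    """Extract the text from every non-empty line of collected tweets."""
    return [extract_tweet_text(line) for line in lines if line.strip()]


# Hypothetical sample record, for illustration only.
SAMPLE_TWEET = (
    '{"id": 1, "text": "Sensex closes higher", '
    '"user": {"screen_name": "example_user"}}'
)
```

Records without a "text" field yield an empty string here, so later steps can skip them.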
6. Future Plans:
           We still have to build a training data set and train the classifier on it,
classify the overall sentiment of the tweets as positive, negative or neutral, and
generate the signal.
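The report does not yet define how the signal is generated, so the following is only one possible sketch. It assumes the signal is derived from the share of positive versus negative tweets; the function name, the "buy" / "sell" / "hold" labels and the 0.2 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def generate_signal(labels, threshold=0.2):
    """Turn a list of per-tweet sentiment labels into a single signal.

    The net sentiment is (positive - negative) / total; the threshold
    value is an arbitrary assumption, not a tuned parameter.
    """
    total = len(labels)
    if total == 0:
        return "hold"  # no tweets, no evidence either way
    counts = Counter(labels)
    net_sentiment = (counts["positive"] - counts["negative"]) / total
    if net_sentiment > threshold:
        return "buy"
    if net_sentiment < -threshold:
        return "sell"
    return "hold"
```

Neutral tweets count towards the total, so a large share of neutral tweets pulls the net sentiment towards zero and the signal towards "hold".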
7. Conclusion:
          We are using the Naive Bayes classification technique to analyse the
sentiment of different data in order to predict the stock market. New approaches are
emerging for in-depth analysis of stock price variations. Sentiment analysis can be
used on its own to forecast stock prices. We are trying to propose a sentiment
analysis method that provides better accuracy than other methods.
Suggestions by PEC Members
Member1:
Member2:
Member3: