Final Report Data Mining

This document appears to be a final project report for a Twitter data analysis project on the topic of demonetization in India. It includes sections on introduction and objectives, literature review on related work, experimental details on machine learning methods and the dataset, and planned sections on results and discussions, summary and conclusions, and references. The introduction provides background on demonetization in India and the objective to perform sentiment analysis on Twitter data related to this topic. The literature review covers related work on analyzing information and sentiment on Twitter. Experimental details include plans to use machine learning clustering and classification methods like K-means on a Twitter dataset from Kaggle about demonetization.

Uploaded by

Venna Vamsikrishna


School of Information Technology & Engineering

M-Tech Software Engineering

SWE2009 - DATA MINING TECHNIQUES
TWITTER DATA ANALYSIS
FINAL REVIEW

Group Members
Name Reg.no
V.VAMSI KRISHNA 16MIS0170
K. SAITEJA 16MIS0272

Submitted to
Faculty: Prof.SUDHA.M
SLOT: B2+TB2

CERTIFICATE

This is to certify that the Project work entitled “TWITTER DATA ANALYSIS”
that is being submitted by “VAMSI KRISHNA AND K. SAITEJA” in M. Tech
(S.E) for SWE2009: DATA MINING TECHNIQUES is a record of bona fide
work done under my supervision. The contents of this Project work, in full or in
part, have neither been taken from any other source nor been submitted for
any other course.

Signature of faculty

(SUDHA.M)

ACKNOWLEDGEMENT

We are thankful to the Department, through which we have gained confidence
in innovative thinking and enhanced our professional skills so as to become
competent in this field.

In carrying out our project, we relied on the help and guidance of several respected
persons, who deserve our greatest gratitude. The completion of this project gives
us much pleasure. We would like to thank Prof. SUDHA.M, SITE,
VIT University, for guiding us well throughout numerous
consultations. We would also like to extend our deepest gratitude to all those
who have directly and indirectly guided us in this project.

Thank you,

V.VAMSI KRISHNA 16MIS0170

K. SAITEJA 16MIS0272

TABLE OF CONTENTS

S.no Topics
Abstract
1. Introduction
1.1 Introduction
1.2 Objective of the work
1.3 Scope of the work
2. Literature review
2.1 Introduction
2.2 Background
2.3 Challenges
2.4 Problem definition and approach
3. Experimental details
3.1 Machine learning methods
3.2 Design framework
3.3 Dataset, Data source, characterization, Pre-processing
3.4 Processing techniques
4. Results and Discussions
5. Summary and Conclusions
6. References

Abstract

Demonetization is the withdrawal of a particular form of currency (such as currency notes)
from circulation. On November 8, 2016, India’s Prime Minister announced that 86% of the
country’s currency would be rendered null and void within 50 days: all 500 and 1,000 rupee
notes, the country’s most widely used denominations, would be withdrawn from circulation,
while a new 2,000 rupee note would be introduced. The move was presented as a crackdown on
corruption and on the country’s booming, under-regulated and virtually untaxed grassroots
economy. To gauge the public response to demonetization, we apply sentiment analysis to a
Twitter data set and examine the types of users, the number of tweets per hour, and the rate
of increase in the number of tweets, so that we can assess users’ interest in demonetization.

CHAPTER-1

Introduction

Twitter is a micro-blogging website that has become increasingly popular with the online
community. Users post short messages, known as tweets, which are limited to 140
characters. Through tweets, users share their personal opinions on many subjects, discuss
current topics and write about life events. The platform is favoured by many users because it
has no political or economic restrictions and is easily available to a large number of people.
As the number of users increases, micro-blogging platforms are becoming a place to find strong
viewpoints and sentiment. People use Twitter for forecasting and analysis in many different areas.

Objective of the work

Scope of the work

CHAPTER-2

Literature Survey

Recent advancements in Web technologies have attracted a large number of internet users
to online social networks such as Facebook and Twitter for varied purposes, including event
updates and data sharing. As a result, social network applications are emerging as a powerful
online tool for users to express and share their views with other users around the globe. Twitter
is one such social media application with a large and rapidly growing user base. It has become
the most popular micro-blogging social networking website, on which users share their views in
the form of very short messages, limited to 140 characters, called “tweets”. Besides event updates
and data sharing, Twitter is also used for many other purposes, including product
marketing, political campaigns, and market research. In addition, users turn to Twitter
to express their opinions and views about prominent issues of day-to-day life, whether
social, political, or entertainment-related. Analysing tweets to spot emerging issues and trends
and to assess public opinion concerning topics and events is of considerable interest to various
stakeholders, including government, companies, and security agencies.

Background

In this section, we present the functional details of our proposed tweet mining approach, which
aims to classify tweets based on their relatedness to various events. Figure 1 presents the
workflow of the proposed method and highlights the functional details of the various working
modules. Tweet crawling retrieves tweets from the server and stores them on the local
machine for analysis. Tweet pre-processing and tokenization extracts the tweet
contents, filters out unwanted constituents such as embedded emoticons and URLs, and tokenizes
them into 1-grams for further processing. Feature extraction and social network generation
identifies significant key terms in the tweets using the Latent Dirichlet Allocation (LDA)
method and uses them to model the tweets as a social network. Finally, Markov clustering is
applied to the generated social network to crystallize it into various clusters, each one
representing a particular event.
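
The pre-processing and tokenization step described above can be sketched as follows. This is a minimal illustration, not the code used in the project: the regular expressions and the function name tokenize_tweet are assumptions, and the emoticon pattern covers only simple ASCII emoticons such as :) or :( (Unicode emoji are not handled).

```python
import re

# Assumed patterns, for illustration only.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Simple ASCII emoticons such as :) ;-) :( =D, only when not glued to a word.
EMOTICON_PATTERN = re.compile(r"(?<!\w)[:;=][-o*']?[)(\]\[dDpP/\\|](?!\w)")
# A token is a run of letters, digits, or the characters # @ _ '
TOKEN_PATTERN = re.compile(r"[a-z0-9#@_']+")


def tokenize_tweet(text):
    """Strip URLs and emoticons, then split the tweet into lowercase 1-grams."""
    text = URL_PATTERN.sub(" ", text)
    text = EMOTICON_PATTERN.sub(" ", text)
    return TOKEN_PATTERN.findall(text.lower())


if __name__ == "__main__":
    print(tokenize_tweet("Long queues at ATMs :( http://t.co/abc #demonetization"))
```

Hashtags and mentions are kept as single tokens here so that they can later serve as key terms; whether to keep or strip them is a design choice the report does not specify.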

Challenges

The breadth and variety of information on Twitter make it one of the most
appropriate virtual environments for information monitoring and tracking. In this paper, the
authors review different information analysis techniques, starting with the analysis of
hashtags, Twitter’s network topology, event spread over the network, and identification of
influence, and ending with sentiment analysis. Future research and development work is also
addressed.

Problem definition and approach

The project addresses the problem of sentiment analysis on Twitter, that is, classifying tweets
according to the sentiment expressed in them: positive, negative or neutral. Twitter is an online
micro-blogging and social-networking platform which allows users to write short status
updates with a maximum length of 140 characters. It is a rapidly expanding service with over 200
million registered users, of whom 100 million are active users; half of those log on to Twitter
on a daily basis, generating nearly 250 million tweets per day. Given this large volume of
usage, we hope to obtain a reflection of public sentiment by analysing the sentiments
expressed in the tweets. Analysing public sentiment is important for many applications, such
as firms trying to find out how the market responds to their products, predicting political
elections, and predicting socioeconomic phenomena such as stock exchange movements. The aim of this
project is to develop a functional classifier for accurate and automatic sentiment classification
of an unknown tweet stream.
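
To make the three-way classification concrete, the following is a minimal lexicon-based sketch. It is not the classifier built in this project: the word lists POSITIVE and NEGATIVE are hypothetical and deliberately tiny, and a real system would use a trained model or a full sentiment lexicon.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"good", "great", "support", "welcome", "bold"}
NEGATIVE = {"bad", "chaos", "queue", "fail", "suffer"}


def classify_sentiment(tweet):
    """Label a tweet positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["Great and bold move",
                 "Long queue and chaos at the bank",
                 "New notes were issued today"]:
        print(text, "->", classify_sentiment(text))
```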

CHAPTER-3

Experimental Details

Machine learning methods

Clustering with K-means

Given k, the k-means algorithm is implemented in four steps:

1. Partition the objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters of the current partitioning.
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when the assignment no longer changes.
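
The four steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the project's code: points are tuples of numbers, distance is squared Euclidean, the initial partition is a shuffled round-robin split, and an empty cluster (which the steps above do not address) is re-seeded with a random point.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Step 1: partition the objects into k nonempty subsets (shuffled round-robin).
    order = list(range(n))
    rng.shuffle(order)
    labels = [0] * n
    for position, index in enumerate(order):
        labels[index] = position % k

    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid (mean point) of each current cluster.
        centroids = []
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if not members:
                # Assumption: re-seed an empty cluster with a random point.
                members = [points[rng.randrange(n)]]
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))

        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]

        # Step 4: stop when the assignment no longer changes, else repeat from Step 2.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, 2))
```

The max_iter cap is a safeguard added for the sketch; on well-separated data such as the example, the loop stops after a few passes.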

Design Framework

Data Set

Data Source

https://www.kaggle.com/zoupet/exploratory-data-analysis.

Pre-processing

• Replaced 0 values by the mean, but no performance improvement was observed while
evaluating models.
• Replacing missing values by the mean improved performance while evaluating
models.
• Dropped rows with 0 values; performance seemed to improve, but the dataset was reduced
to half.
• Split the data into train and test sets.
• Applied feature selection, but saw little change in performance, so those code lines were disabled.
• Total number of rows: 912
• Total number of noisy rows: 49
• Total number of rows with missing values: 56
• Total number of outliers: 11
• Total number of errors: 28
• Total number of rows after data pre-processing: 768 (912 - 49 - 56 - 11 - 28)
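
Two of the steps listed above, mean imputation of missing values and the train/test split, can be sketched as follows. This is an illustration only: rows are assumed to be dictionaries, the column name retweet_count and the 25% test fraction are hypothetical, and the project itself may have used a library such as pandas instead.

```python
import random


def impute_mean(rows, column):
    """Return copies of rows with missing (None) values in `column` set to the column mean."""
    present = [r[column] for r in rows if r[column] is not None]
    if not present:
        raise ValueError("column has no observed values to average")
    mean = sum(present) / len(present)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical rows; "retweet_count" is an assumed column name.
    rows = [{"retweet_count": 2}, {"retweet_count": None},
            {"retweet_count": 4}, {"retweet_count": 6}]
    clean = impute_mean(rows, "retweet_count")
    train, test = train_test_split(clean)
    print(clean, len(train), len(test))
```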

Processing techniques

Sentimental Analysis

We took the sample dataset, pre-processed and transformed it, and collected the relevant
information using Python in a Jupyter notebook. A word cloud shows the most common words
used with the tag ”Narendhramodhi”, and tweet counts are displayed as ratios. A pie chart
and other plots show the most used sources of the tweets, such as iPhone or Android. The
main aim of the proposed system is sentiment analysis. The same code and method can be
applied to other datasets to identify the most commonly used text. We also tried another
data path and obtained about 95% correct output. The approach can be applied to real-life
scenarios to analyse responses.
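
The counts behind a word cloud and a source pie chart can be sketched as below. This is a minimal illustration rather than the notebook code: the stop-word list is a hypothetical placeholder, and the input lists stand in for the tweet text and source columns of the dataset.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"}


def top_words(tweets, n=10):
    """Return the n most frequent non-stop-words, the input to a word cloud."""
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


def source_shares(sources):
    """Return each source's share of all tweets, the input to a pie chart."""
    counts = Counter(sources)
    total = sum(counts.values())
    return {source: count / total for source, count in counts.items()}


if __name__ == "__main__":
    print(top_words(["Modi on demonetization", "demonetization and the queues"], 3))
    print(source_shares(["iPhone", "Android", "Android", "Web"]))
```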

CHAPTER-4

Results and discussions

Getting data

For the word Narendra Modi

For the word Terrorist

Time series plotting

Number of retweets per hour
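
A per-hour retweet series of this kind could be computed as in the sketch below. The record layout (a timestamp string paired with a retweet count), the timestamp format and the sample values are all assumptions made for illustration; they are not taken from the dataset.

```python
from collections import Counter
from datetime import datetime


def retweets_per_hour(records, time_format="%Y-%m-%d %H:%M:%S"):
    """Sum retweet counts into hourly buckets, returned in chronological order."""
    totals = Counter()
    for created, retweets in records:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.strptime(created, time_format).replace(minute=0, second=0)
        totals[hour] += retweets
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Hypothetical sample records: (created timestamp, retweet count).
    sample = [("2016-11-23 18:40:30", 5),
              ("2016-11-23 18:05:00", 2),
              ("2016-11-23 19:00:01", 1)]
    for hour, total in retweets_per_hour(sample).items():
        print(hour, total)
```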

Source of retweets

Number of retweets by source

Cluster plotting

Correlation analysis

Sentiment Analysis

CHAPTER-5

Summary and conclusions

The results above show, for each component of the dataset, the number of tweets per time
period, the device used to post the tweets, and the correlation analysis. We ran the
experiment on almost 700 records from the dataset and obtained the output, but it took
almost 25 minutes to produce. This project gave us experience in analysing data across
different categories. The approach can be applied to real-life scenarios to classify
responses as positive, negative or neutral.

References

[1] Chung, J. E., & Mustafaraj, E. (2011, August). Can collective sentiment expressed on twitter predict
political elections?. In Twenty-Fifth AAAI Conference on Artificial Intelligence.

[2] Pak, A., & Paroubek, P. (2010, May). Twitter as a corpus for sentiment analysis and opinion mining.
In LREC (Vol. 10, No. 2010, pp. 1320-1326).

[3] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users: Real-time event detection
by social sensors,” in Proceedings of the 19th international conference on World wide web, 2010, pp.
851–860.

[4] M. Cheong and V. Lee, “A study on detecting patterns in twitter intratopic user and message
clustering,” in Proceedings of the 2010 20th International Conference on Pattern Recognition, 2010, pp.
3125–3128.

[5] M. Thelwall, K. Buckley, and G. Paltoglou, “Sentiment in twitter events,” Journal of the American
Society for Information Science and Technology, vol. 62, no. 2, pp. 406–418, 2011.
