0% found this document useful (0 votes)

120 views17 pages

Sentiment Analysis of Reddit Comments On Israel and Palestine Conflict: A Social Discourse Study

The document discusses analyzing sentiment in Reddit comments about the Israel-Palestine conflict. It loads comment data from a CSV file into a Pandas dataframe with over 189600 rows. It examines the distribution of subreddits in the data, finding the largest proportions come from IsraelPalestine, worldnews, AskMiddleEast, and CombatFootage. Visualizations are created to illustrate the subreddit counts and proportions.

Uploaded by

Ahtisham Khalid

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

120 views17 pages

Sentiment Analysis of Reddit Comments On Israel and Palestine Conflict: A Social Discourse Study

Uploaded by

Ahtisham Khalid

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 17

Sentiment Analysis of Reddit Comments on Israel and Palestine

Conflict: A Social Discourse Study

In [1]: # An Machine Learning project by: Prof. Nirmal Gaud

# Contact: ds.ml.projects.sessions.1@gmail.com

In [2]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

In [3]: import warnings

warnings.filterwarnings('ignore')

In [4]: df = pd.read_csv('pls_isl_conflict_comments.csv')

In [5]: df.head()

Out[5]: comment_id score self_text subreddit created_time

0 k5480sx 1 Exactly! I can remember the humanitarian aid s... worldnews 16-10-2023 19:39

1 k547q14 1 *We are the only part of the World that has Fr... Palestine 16-10-2023 19:36

2 k547elf 1 I don’t make Israeli strategy, nor am I Israel... worldnews 16-10-2023 19:34

3 k54742r 1 These people didn't vote Hamas in or something... worldnews 16-10-2023 19:32

4 k5473zi 1 We don't care what you do. We just want to liv... worldnews 16-10-2023 19:32

In [6]: df.tail()

Out[6]: comment_id score self_text subreddit created_time

189626 k3sdwfc 42 US. This is bullshit Palestine 07-10-2023 05:20

189627 k3sdixt 1 I am in the United States and it has the dotte... Palestine 07-10-2023 05:17

189628 k3sccp2 54 In which country are you?\nSometimes maps adap... Palestine 07-10-2023 05:08

189629 k3ritvj 116 You can't give up on something you only preten... worldnews 07-10-2023 01:46

189630 k3riboh 30 > The head of Islamic Jihad denounced Arab ... worldnews 07-10-2023 01:42

In [7]: df.shape

(189631, 5)
Out[7]:

In [8]: df.columns

Index(['comment_id', 'score', 'self_text', 'subreddit', 'created_time'], dtype='object')

Out[8]:

In [9]: df.duplicated().sum()

0
Out[9]:

In [10]: df.isnull().sum()
comment_id 0
Out[10]:
score 0
self_text 0
subreddit 0
created_time 0
dtype: int64

In [11]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 189631 entries, 0 to 189630
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 comment_id 189631 non-null object
1 score 189631 non-null int64
2 self_text 189631 non-null object
3 subreddit 189631 non-null object
4 created_time 189631 non-null object
dtypes: int64(1), object(4)
memory usage: 7.2+ MB

In [12]: df.describe()

Out[12]: score

count 189631.000000

mean 28.583607

std 179.946085

min -934.000000

25% 1.000000

50% 2.000000

75% 10.000000

max 16463.000000

In [13]: df.nunique()

comment_id 189631
Out[13]:
score 1761
self_text 186338
subreddit 14
created_time 13562
dtype: int64

In [14]: object_columns = df.select_dtypes(include=['object']).columns

print("Object type columns:")
print(object_columns)

numerical_columns = df.select_dtypes(include=['int', 'float']).columns

print("\nNumerical type columns:")
print(numerical_columns)

Object type columns:

Index(['comment_id', 'self_text', 'subreddit', 'created_time'], dtype='object')

Numerical type columns:

Index(['score'], dtype='object')

In [15]: df['subreddit'].unique()

array(['worldnews', 'Palestine', 'IsraelPalestine', 'TerrifyingAsFuck',

Out[15]:
'worldnewsvideo', 'AskMiddleEast', 'CombatFootage',
'PublicFreakout', 'NonCredibleDefense', 'IsrealPalestineWar_23',
'CrazyFuckingVideos', 'AbruptChaos', 'NoahGetTheBoat',
'ActualPublicFreakouts'], dtype=object)

In [16]: df['subreddit'].value_counts()

IsraelPalestine 52622
Out[16]:
worldnews 36204
AskMiddleEast 28107
CombatFootage 27901
PublicFreakout 14255
NonCredibleDefense 13865
Palestine 6968
worldnewsvideo 5598
IsrealPalestineWar_23 2537
TerrifyingAsFuck 546
NoahGetTheBoat 498
AbruptChaos 200
CrazyFuckingVideos 197
ActualPublicFreakouts 133
Name: subreddit, dtype: int64

In [17]: plt.figure(figsize=(15,6))
sns.countplot(df['subreddit'], data = df, palette = 'hls')
plt.xticks(rotation = -45)
plt.show()
In [18]: plt.figure(figsize=(30,20))
plt.pie(df['subreddit'].value_counts(), labels=df['subreddit'].value_counts().index,
autopct='%1.1f%%', textprops={ 'fontsize': 25,
'color': 'black',
'weight': 'bold',
'family': 'serif' })
hfont = {'fontname':'serif', 'weight': 'bold'}
plt.title('Subreddit', size=20, **hfont)
plt.show()

In [19]: fig = go.Figure(data=[go.Bar(x=df['subreddit'].value_counts().index,

y=df['subreddit'].value_counts())])
fig.update_layout(title='Subreddit', xaxis_title='Subreddit', yaxis_title="Count")
fig.show()
Subreddit

50k

40k
Count

30k

20k

10k

0
Isr wo As Co Pu No Pa wo Isr Te No Ab Cr Ac
ae rld kM mb bli nC les rld ea rri ah ru az tu
lPa ne idd atF cF re tin ne lPa fyi Ge pt yF alP
les ws leE re d ibl e w l es ngA tT C ha u c ub
tin o o a k s vid tin sF h os k i ng lic
as tag o u eD u eB Fr
e t e t efe eo eW ck oa Vid ea
ns ar t eo ko
e _2 s ut
3 s

Subreddit

In [20]: fig = px.pie(df, names='subreddit', title = 'Subreddit')

fig.show()

Subreddit

IsraelPalestine
worldnews
AskMiddleEast
CombatFootage
19.1%
PublicFreakout
27.7% NonCredibleDefense
Palestine
worldnewsvideo
IsrealPalestineWar_23
14.8% TerrifyingAsFuck
0.0701% NoahGetTheBoat
2.9 0.104% AbruptChaos
5% 0.105%
3. CrazyFuckingVideos
67 0.263%
%
0.288% ActualPublicFreakouts
14.7% 1.34%
7.31%
7.52%

In [21]: plt.figure(figsize=(15,6))
sns.histplot(df['score'], kde = True, bins = 5, palette = 'hls')
plt.show()
In [22]: plt.figure(figsize=(15,6))
sns.distplot(df['score'], kde = True, bins = 5)
plt.show()

In [23]: plt.figure(figsize=(15,6))
sns.boxplot(df['score'], data = df, palette = 'hls')
plt.show()

In [24]: plt.figure(figsize=(15,6))
sns.violinplot(df['score'], data = df, palette = 'hls')
plt.show()
In [25]: fig = go.Figure(data=[go.Histogram(x=df['score'], nbinsx=5)])
fig.update_layout(title='Histogram of score', xaxis_title='score', yaxis_title='Count')
fig.show()

Histogram of score

180k

160k

140k

120k

100k
Count

80k

60k

40k

20k

0
−5k 0 5k 10k 15k 20k

score

In [26]: fig = px.box(df, y='score', title=f'Box Plot Score')

fig.show()
Box Plot Score

15k

10k
score

In [27]: fig = px.violin(df, y='score', title=f'Box Plot Score')

fig.show()

Box Plot Score

15k

10k
score

In [28]: df_new = df.copy()

In [29]: def clean_text(text):

text = text.lower()
return text.strip()

In [30]: df_new['self_text'] = df_new['self_text'].apply(lambda x: clean_text(x))

In [31]: import string

string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
Out[31]:

In [32]: def remove_punctuation(text):

punctuationfree="".join([i for i in text if i not in string.punctuation])
return punctuationfree

In [33]: df_new['self_text']= df_new['self_text'].apply(lambda x:remove_punctuation(x))

In [34]: import re
In [35]: def tokenization(text):
tokens = re.split('W+',text)
return tokens

In [36]: df_new['self_text']= df_new['self_text'].apply(lambda x: tokenization(x))

In [37]: import nltk

from wordcloud import WordCloud

In [38]: nltk.download('vader_lexicon')
nltk.download('stopwords')

[nltk_data] Downloading package vader_lexicon to

[nltk_data] C:\Users\hp5cd\AppData\Roaming\nltk_data...
[nltk_data] Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data] C:\Users\hp5cd\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!
True
Out[38]:

In [39]: stopwords = nltk.corpus.stopwords.words('english')

In [40]: def remove_stopwords(text):

output= " ".join(i for i in text if i not in stopwords)
return output

In [41]: df_new['self_text']= df_new['self_text'].apply(lambda x:remove_stopwords(x))

In [42]: from nltk.stem import WordNetLemmatizer

In [43]: wordnet_lemmatizer = WordNetLemmatizer()

In [44]: def lemmatizer(text):

lemm_text = "".join([wordnet_lemmatizer.lemmatize(word) for word in text])
return lemm_text

In [45]: df_new['self_text']=df_new['self_text'].apply(lambda x:lemmatizer(x))

In [46]: def clean_text(text):

text = re.sub('\[.*\]','', text).strip()
text = re.sub('\S*\d\S*\s*','', text).strip()
return text.strip()

In [47]: df_new['self_text'] = df_new['self_text'].apply(lambda x: clean_text(x))

In [48]: def remove_urls(vTEXT):

vTEXT = re.sub(r'(https|http)?:\/\/(\w|\.|\/|\?|\=|\&|\%)*\b', '', vTEXT, flags=re.MULTILINE)
return(vTEXT)

In [49]: df_new['self_text'] = df_new['self_text'].apply(lambda x: remove_urls(x))

In [50]: def remove_digits(text):

clean_text = re.sub(r"\b[0-9]+\b\s*", "", text)
return(text)

In [51]: df_new['self_text'] = df_new['self_text'].apply(lambda x: remove_digits(x))

In [52]: def remove_emojis(data):

emoji_pattern = re.compile("["
u"\U0001F600-\U0001F64F" # emoticons
u"\U0001F300-\U0001F5FF" # symbols & pictographs
u"\U0001F680-\U0001F6FF" # transport & map symbols
u"\U0001F1E0-\U0001F1FF" # flags (iOS)
"]+", flags=re.UNICODE)
return re.sub(emoji_pattern, '', data)

In [53]: df_new['self_text'] = df_new['self_text'].apply(lambda x: remove_emojis(x))

In [54]: df_new['self_text'] = df_new['self_text'].apply(lambda x:re.sub(r'\s+[a-zA-Z]\s+', '', x))

In [55]: df_new['self_text'] = df_new['self_text'].apply(lambda x:re.sub(r'\s+', ' ', x, flags=re.I))

In [56]: df_new
Out[56]: comment_id score self_text subreddit created_time

0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 16-10-2023 19:39

1 k547q14 1 we are the only part of the world that has fre... Palestine 16-10-2023 19:36

2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 16-10-2023 19:34

3 k54742r 1 these people didnt vote hamas in or something ... worldnews 16-10-2023 19:32

4 k5473zi 1 we dont care what you do we just want to live ... worldnews 16-10-2023 19:32

... ... ... ... ... ...

189626 k3sdwfc 42 us this is bullshit Palestine 07-10-2023 05:20

189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 07-10-2023 05:17

189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 07-10-2023 05:08

189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 07-10-2023 01:46

189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 07-10-2023 01:42

189631 rows × 5 columns

In [57]: from textblob import TextBlob

In [58]: df_new['sentiment'] = df_new['self_text'].apply(lambda x: TextBlob(str(x)).sentiment.polarity)

In [59]: df_new

Out[59]: comment_id score self_text subreddit created_time sentiment

0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 16-10-2023 19:39 0.000000

1 k547q14 1 we are the only part of the world that has fre... Palestine 16-10-2023 19:36 0.000000

2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 16-10-2023 19:34 0.305159

3 k54742r 1 these people didnt vote hamas in or something ... worldnews 16-10-2023 19:32 0.045000

4 k5473zi 1 we dont care what you do we just want to live ... worldnews 16-10-2023 19:32 -0.347643

... ... ... ... ... ... ...

189626 k3sdwfc 42 us this is bullshit Palestine 07-10-2023 05:20 0.000000

189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 07-10-2023 05:17 0.000000

189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 07-10-2023 05:08 0.000000

189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 07-10-2023 01:46 -0.300000

189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 07-10-2023 01:42 -0.032143

189631 rows × 6 columns

In [60]: sentiment_correlation = df_new[['score', 'sentiment']].corr()

print('Correlation between "score" and sentiment:')
print(sentiment_correlation)

Correlation between "score" and sentiment:

score sentiment
score 1.00000 -0.00933
sentiment -0.00933 1.00000

In [61]: average_score_per_subreddit = df_new.groupby('subreddit')['score'].mean()

print('Average score per subreddit:')
print(average_score_per_subreddit)

Average score per subreddit:

subreddit
AbruptChaos 8.715000
ActualPublicFreakouts 60.563910
AskMiddleEast 4.736400
CombatFootage 40.808430
CrazyFuckingVideos 11.223350
IsraelPalestine 1.436205
IsrealPalestineWar_23 1.746945
NoahGetTheBoat 12.160643
NonCredibleDefense 33.659719
Palestine 11.412600
PublicFreakout 47.413750
TerrifyingAsFuck 36.661172
worldnews 76.210336
worldnewsvideo 8.808324
Name: score, dtype: float64

In [62]: df_new['created_time'] = pd.to_datetime(df_new['created_time'])

In [63]: score_over_time = df_new.set_index('created_time').resample('D')['score'].mean()

print('Score trends over time:')
print(score_over_time)
Score trends over time:
created_time
2023-07-10 120.231962
2023-07-11 NaN
2023-07-12 NaN
2023-07-13 NaN
2023-07-14 NaN
...
2023-12-06 NaN
2023-12-07 NaN
2023-12-08 NaN
2023-12-09 NaN
2023-12-10 28.593894
Freq: D, Name: score, Length: 154, dtype: float64

In [64]: df_new['text_length'] = df_new['self_text'].apply(len)

In [65]: length_correlation = df_new[['score', 'text_length']].corr()

print('Correlation between "score" and text length:')
print(length_correlation)

Correlation between "score" and text length:

score text_length
score 1.00000 -0.01091
text_length -0.01091 1.00000

In [66]: from collections import Counter

In [67]: def word_frequency(word):

return df_new[df_new['self_text'].str.contains(word, case=False, na=False)]['score'].mean()

word_example_frequency = word_frequency('example')
print('Average score for comments containing the word "example":', word_example_frequency)

Average score for comments containing the word "example": 14.987375415282392

In [68]: plt.figure(figsize=(15,6))
plt.hist(df_new['text_length'], bins=30, edgecolor='black')
plt.xlabel('Text Length')
plt.ylabel('Frequency')
plt.title('Histogram of Text Length')
plt.show()

In [69]: from sklearn.feature_extraction.text import CountVectorizer

from sklearn.decomposition import LatentDirichletAllocation as LDA

In [70]: text_data = df_new['self_text'].astype(str)

In [71]: vectorizer = CountVectorizer(max_df=0.85, stop_words='english')

text_vectorized = vectorizer.fit_transform(text_data)

In [72]: num_topics = 5
lda = LDA(n_components=num_topics, random_state=42)
lda.fit(text_vectorized)

Out[72]: ▾ LatentDirichletAllocation
LatentDirichletAllocation(n_components=5, random_state=42)

In [73]: for topic_idx, topic in enumerate(lda.components_):

print(f"Topic {topic_idx + 1}:")
print([vectorizer.get_feature_names()[i] for i in topic.argsort()[:-10 - 1:-1]])
print()
Topic 1:
['iran', 'just', 'russia', 'israel', 'ukraine', 'like', 'dont', 'going', 'weapons', 'time']

Topic 2:
['hamas', 'israel', 'people', 'gaza', 'civilians', 'palestinians', 'just', 'israeli', 'like', 'war']

Topic 3:
['like', 'just', 'people', 'say', 'video', 'news', 'im', 'post', 'good', 'media']

Topic 4:
['comments', 'action', 'questions', 'comment', 'concerns', 'contact', 'based', 'automatically', 'performed', 'moderators']

Topic 5:
['israel', 'jews', 'land', 'palestine', 'people', 'palestinians', 'arab', 'like', 'jewish', 'state']

In [74]: df_new

Out[74]: comment_id score self_text subreddit created_time sentiment text_length

0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 2023-10-16 19:39:00 0.000000 278

1 k547q14 1 we are the only part of the world that has fre... Palestine 2023-10-16 19:36:00 0.000000 148

2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 2023-10-16 19:34:00 0.305159 285

3 k54742r 1 these people didnt vote hamas in or something ... worldnews 2023-10-16 19:32:00 0.045000 630

4 k5473zi 1 we dont care what you do we just want to live ... worldnews 2023-10-16 19:32:00 -0.347643 293

... ... ... ... ... ... ... ...

189626 k3sdwfc 42 us this is bullshit Palestine 2023-07-10 05:20:00 0.000000 19

189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 2023-07-10 05:17:00 0.000000 120

189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 2023-07-10 05:08:00 0.000000 123

189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 2023-07-10 01:46:00 -0.300000 79

189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 2023-07-10 01:42:00 -0.032143 1232

189631 rows × 7 columns

In [75]: highest_score_index = df_new['score'].idxmax()

lowest_score_index = df_new['score'].idxmin()

highest_score_text = df_new.loc[highest_score_index, 'self_text']

highest_score = df_new.loc[highest_score_index, 'score']

lowest_score_text = df_new.loc[lowest_score_index, 'self_text']

lowest_score = df_new.loc[lowest_score_index, 'score']

print(f"Comment with the highest score ({highest_score}):")

print(highest_score_text)
print("\n")

print(f"Comment with the lowest score ({lowest_score}):")

print(lowest_score_text)

Comment with the highest score (16463):

that’s pretty damning for netanyahu and israeli intelligence no

Comment with the lowest score (-934):

too bad its pretty much pointless when they fire thousands of rockets

In [76]: def categorize_sentiment(polarity):

if polarity > 0.05:
return 'Positive'
elif polarity < -0.05:
return 'Negative'
else:
return 'Neutral'

In [77]: df_new['sentiment_category'] = df_new['sentiment'].apply(categorize_sentiment)

In [78]: df_new
Out[78]: comment_id score self_text subreddit created_time sentiment text_length sentiment_category

0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 2023-10-16 19:39:00 0.000000 278 Neutral

1 k547q14 1 we are the only part of the world that has fre... Palestine 2023-10-16 19:36:00 0.000000 148 Neutral

2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 2023-10-16 19:34:00 0.305159 285 Positive

3 k54742r 1 these people didnt vote hamas in or something ... worldnews 2023-10-16 19:32:00 0.045000 630 Neutral

4 k5473zi 1 we dont care what you do we just want to live ... worldnews 2023-10-16 19:32:00 -0.347643 293 Negative

... ... ... ... ... ... ... ... ...

189626 k3sdwfc 42 us this is bullshit Palestine 2023-07-10 05:20:00 0.000000 19 Neutral

189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 2023-07-10 05:17:00 0.000000 120 Neutral

189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 2023-07-10 05:08:00 0.000000 123 Neutral

189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 2023-07-10 01:46:00 -0.300000 79 Negative

189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 2023-07-10 01:42:00 -0.032143 1232 Neutral

189631 rows × 8 columns

In [79]: df_new['sentiment_category'].unique()

array(['Neutral', 'Positive', 'Negative'], dtype=object)

Out[79]:

In [80]: df_new['sentiment_category'].value_counts()

Neutral 79689
Out[80]:
Positive 68529
Negative 41413
Name: sentiment_category, dtype: int64

In [81]: plt.figure(figsize=(15,6))
sns.countplot(df_new['sentiment_category'], data = df_new, palette = 'hls')
plt.show()

In [82]: plt.figure(figsize=(30,20))
plt.pie(df_new['sentiment_category'].value_counts(), labels=df_new['sentiment_category'].value_counts().index,
autopct='%1.1f%%', textprops={ 'fontsize': 25,
'color': 'black',
'weight': 'bold',
'family': 'serif' })
hfont = {'fontname':'serif', 'weight': 'bold'}
plt.title('Sentiment Category', size=20, **hfont)
plt.show()
In [83]: def map_sentiment_to_numeric(sentiment_category):
if sentiment_category == 'Positive':
return 1
elif sentiment_category == 'Negative':
return -1
else:
return 0

In [84]: df_new['sentiment_numeric'] = df_new['sentiment_category'].apply(map_sentiment_to_numeric)

In [85]: df3 = df_new[['self_text', 'sentiment_numeric']]

In [86]: df3
Out[86]: self_text sentiment_numeric

0 exactlycan remember the humanitarian aid strea... 0

1 we are the only part of the world that has fre... 0

2 i don’t make israeli strategy nor amisraeli or... 1

3 these people didnt vote hamas in or something ... 0

4 we dont care what you do we just want to live ... -1

... ... ...

189626 us this is bullshit 0

189627 i am in the united states and it has the dotte... 0

189628 in which country are you sometimes maps adapt ... 0

189629 you cant give up on something you only pretend... -1

189630 gt the head of islamic jihad denounced arab at... 0

189631 rows × 2 columns

In [87]: positive_text_data = ' '.join(df_new[df3['sentiment_numeric'] == 1]['self_text'])

if positive_text_data:
wordcloud = WordCloud(background_color='white').generate(positive_text_data)

fig, ax = plt.subplots(figsize=(30, 10))

ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
plt.show()
else:
print('No positive text data to generate a word cloud.')

In [88]: negative_text_data = ' '.join(df_new[df3['sentiment_numeric'] == -1]['self_text'])

if negative_text_data:
wordcloud = WordCloud(background_color='white').generate(negative_text_data)

fig, ax = plt.subplots(figsize=(30, 10))

ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
plt.show()
else:
print('No negative text data to generate a word cloud.')
In [89]: neutral_text_data = ' '.join(df_new[df3['sentiment_numeric'] == 0]['self_text'])

if neutral_text_data:
wordcloud = WordCloud(background_color='white').generate(neutral_text_data)

fig, ax = plt.subplots(figsize=(30, 10))

ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
plt.show()
else:
print('No neutral text data to generate a word cloud.')

In [90]: from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import precision_recall_fscore_support
from sklearn.feature_extraction.text import TfidfVectorizer

In [91]: X = df_new['self_text']
y = df_new['sentiment_numeric']

In [92]: tfidf_vectorizer = TfidfVectorizer(max_features=5000)

X_tfidf = tfidf_vectorizer.fit_transform(X)

In [93]: X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)

In [94]: from sklearn.linear_model import LogisticRegression

In [95]: lr_classifier = LogisticRegression()

lr_classifier.fit(X_train, y_train)
Out[95]: ▾ LogisticRegression

LogisticRegression()

In [96]: y_pred = lr_classifier.predict(X_test)

lr_accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", lr_accuracy)

Accuracy: 0.8297255253513328

In [97]: precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average='weighted')

print('Precision:', precision)
print('Recall:', recall)
print('F1 Score:', f1)

Precision: 0.8311469792237322
Recall: 0.8297255253513328
F1 Score: 0.8291814267939249

In [98]: from sklearn.tree import DecisionTreeClassifier

In [99]: dt_classifier = DecisionTreeClassifier()

dt_classifier.fit(X_train, y_train)

Out[99]: ▾ DecisionTreeClassifier

DecisionTreeClassifier()

In [100… y_pred = dt_classifier.predict(X_test)

dt_accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", dt_accuracy)

Accuracy: 0.7498615761858307

In [101… precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average='weighted')

print('Precision:', precision)
print('Recall:', recall)
print('F1 Score:', f1)

Precision: 0.7489914531442408
Recall: 0.7498615761858307
F1 Score: 0.7493180803889777

In [102… accuracies = {
'Logistic Regression': lr_accuracy,
'Decision Tree': dt_accuracy,
}

In [103… fig = go.Figure(

data=[
go.Bar(x=list(accuracies.keys()), y=list(accuracies.values()))
],
layout={
'title': 'Model Comparison: Accuracy',
'xaxis': {'title': 'Models'},
'yaxis': {'title': 'Accuracy'}
}
)
fig.show()
Model Comparison: Accuracy

0.8

0.7

0.6

0.5
Accuracy

0.4

0.3

0.2

0.1

0
Logistic Regression Decision Tree

Models

10 Streamlit
No ratings yet
10 Streamlit
7 pages
Chandru Lab 3
No ratings yet
Chandru Lab 3
7 pages
Prototype 1
No ratings yet
Prototype 1
10 pages
Part C Assignment No 2 Mini Project On Twitter 1
No ratings yet
Part C Assignment No 2 Mini Project On Twitter 1
9 pages
Twitter Analysis
No ratings yet
Twitter Analysis
5 pages
Analyzing Social Media Data in Python Chapter2
No ratings yet
Analyzing Social Media Data in Python Chapter2
30 pages
Russia Vs Ukraine Tweets Analysis
No ratings yet
Russia Vs Ukraine Tweets Analysis
20 pages
Sma Exp 03 Code Print
No ratings yet
Sma Exp 03 Code Print
5 pages
Twitter Sentiment Analysis Guide
No ratings yet
Twitter Sentiment Analysis Guide
7 pages
Analyzing Social Media Data in Python Chapter4
No ratings yet
Analyzing Social Media Data in Python Chapter4
20 pages
Twitter Sentiment Analysis
No ratings yet
Twitter Sentiment Analysis
13 pages
Twitter Data Analysis with Python
No ratings yet
Twitter Data Analysis with Python
21 pages
SMA4
No ratings yet
SMA4
5 pages
Sma Exp 09 Code Print
No ratings yet
Sma Exp 09 Code Print
5 pages
Sample Report 5
No ratings yet
Sample Report 5
73 pages
Sentiment Analysis for Tweets
No ratings yet
Sentiment Analysis for Tweets
11 pages
Sentiment Analysis - Comparing Algorithms Accuracy
No ratings yet
Sentiment Analysis - Comparing Algorithms Accuracy
22 pages
Part C - Assignment No. 2 Mini-Project On Twitter
No ratings yet
Part C - Assignment No. 2 Mini-Project On Twitter
7 pages
Data Minning Assignment #1: Submitted By: Rahul Kumar Roll No: 160BTCCSE010 Class: CSE A, 3rd Year
No ratings yet
Data Minning Assignment #1: Submitted By: Rahul Kumar Roll No: 160BTCCSE010 Class: CSE A, 3rd Year
9 pages
ChatGPT Tweets Analysis & Visualization
No ratings yet
ChatGPT Tweets Analysis & Visualization
50 pages
Dsbda 4
No ratings yet
Dsbda 4
13 pages
Document From Gr7
No ratings yet
Document From Gr7
29 pages
Data Analysis for Social Media Posts
No ratings yet
Data Analysis for Social Media Posts
10 pages
Tbower Finalwriteup v1
No ratings yet
Tbower Finalwriteup v1
12 pages
Package Sentimentr': R Topics Documented
No ratings yet
Package Sentimentr': R Topics Documented
49 pages
EXP5
No ratings yet
EXP5
15 pages
Python Twitter Sentiment Guide
No ratings yet
Python Twitter Sentiment Guide
21 pages
Analyzing Political Discourse On Reddit
No ratings yet
Analyzing Political Discourse On Reddit
20 pages
Social Media Sentimental Analysis 1
No ratings yet
Social Media Sentimental Analysis 1
30 pages
Best Prog
No ratings yet
Best Prog
4 pages
Engagement and Sentiment of Ukraine-Russia Posts
No ratings yet
Engagement and Sentiment of Ukraine-Russia Posts
13 pages
Streamlit: Build Python Data Apps
No ratings yet
Streamlit: Build Python Data Apps
22 pages
Python Data Analysis for Students
No ratings yet
Python Data Analysis for Students
22 pages
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
No ratings yet
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
15 pages
Pandas EDA for Data Science Students
No ratings yet
Pandas EDA for Data Science Students
20 pages
Twitter Sentiment Analysis Guide
No ratings yet
Twitter Sentiment Analysis Guide
23 pages
All Exp Lab
No ratings yet
All Exp Lab
15 pages
Twitter As Data PDF
No ratings yet
Twitter As Data PDF
116 pages
TP 10 Big Data (Ega Sarmita) PDF
No ratings yet
TP 10 Big Data (Ega Sarmita) PDF
6 pages
Code
No ratings yet
Code
13 pages
Reddit Post Popularity Prediction
No ratings yet
Reddit Post Popularity Prediction
1 page
Twitter Sentiment Analysis Russia-Ukraine-War - Report File
No ratings yet
Twitter Sentiment Analysis Russia-Ukraine-War - Report File
40 pages
Course3 Notes
No ratings yet
Course3 Notes
44 pages
PRO Level Data Visualization Cheat Sheet
No ratings yet
PRO Level Data Visualization Cheat Sheet
15 pages
@PowerBI - Ir - Data Visualization Cheat Sheet
No ratings yet
@PowerBI - Ir - Data Visualization Cheat Sheet
15 pages
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
No ratings yet
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
14 pages
Adithiyaa BR 23MBA0018 SMA DA Text Mining PDF
No ratings yet
Adithiyaa BR 23MBA0018 SMA DA Text Mining PDF
6 pages
Predictive Modeling
No ratings yet
Predictive Modeling
13 pages
Problem Statement
No ratings yet
Problem Statement
10 pages
Cmu ML 14 101
No ratings yet
Cmu ML 14 101
133 pages
Data Visualization - New
No ratings yet
Data Visualization - New
5 pages
Reddit Data Sentiment Analysis
No ratings yet
Reddit Data Sentiment Analysis
3 pages
The Chai Squad Project Paper
No ratings yet
The Chai Squad Project Paper
6 pages
Data Science Project
No ratings yet
Data Science Project
34 pages
Sentiment Analysis On User-Generated Tweets
No ratings yet
Sentiment Analysis On User-Generated Tweets
15 pages
Sma QB Solution Tt2
No ratings yet
Sma QB Solution Tt2
40 pages
Mapping Global Data Sets - Json
100% (1)
Mapping Global Data Sets - Json
15 pages
Python for Twitter Data Analysis
No ratings yet
Python for Twitter Data Analysis
21 pages
Total Likes Type Category Post Month Post Weekday Post Hour Paid Lifetime Post Total Reach Lifetime Post Total Impressions
No ratings yet
Total Likes Type Category Post Month Post Weekday Post Hour Paid Lifetime Post Total Reach Lifetime Post Total Impressions
10 pages
Karl Marx Theory
No ratings yet
Karl Marx Theory
95 pages
Fourth Quarter CSS (Computer Systems Servicing)
No ratings yet
Fourth Quarter CSS (Computer Systems Servicing)
54 pages
Automate Oracle HFM with SFCC
No ratings yet
Automate Oracle HFM with SFCC
10 pages
Stability Analysis of Earth Dam by Geostudio
No ratings yet
Stability Analysis of Earth Dam by Geostudio
10 pages
Manual de Servicio Compact Touch Quantel Medical
No ratings yet
Manual de Servicio Compact Touch Quantel Medical
67 pages
Amul PPT Mba - Dox
No ratings yet
Amul PPT Mba - Dox
74 pages
Fuel Injection System
100% (1)
Fuel Injection System
49 pages
04 Task Performance PLATECH
100% (1)
04 Task Performance PLATECH
3 pages
Livro Geografia 2023 1anoCARAC PDF
No ratings yet
Livro Geografia 2023 1anoCARAC PDF
102 pages
KMU v. The Director General (NEDA)
No ratings yet
KMU v. The Director General (NEDA)
13 pages
Load Cell Weigh In Motion Proposal
No ratings yet
Load Cell Weigh In Motion Proposal
23 pages
Syllabus 03
No ratings yet
Syllabus 03
2 pages
18 U.S.C. 1030 (A) (2) (C) - Without Authorization
No ratings yet
18 U.S.C. 1030 (A) (2) (C) - Without Authorization
15 pages
Advanced Web Designing
No ratings yet
Advanced Web Designing
11 pages
668eb7550817e7867f7b26f8 NX 2312 Install WNT
No ratings yet
668eb7550817e7867f7b26f8 NX 2312 Install WNT
46 pages
Unit 4 - Designing User Interface With View (MSBTE MAD 22617 MCQS) PDF
100% (1)
Unit 4 - Designing User Interface With View (MSBTE MAD 22617 MCQS) PDF
8 pages
COME 2202 NAHPI Introduction To Computer Networks Course Outline April 2024docx
No ratings yet
COME 2202 NAHPI Introduction To Computer Networks Course Outline April 2024docx
24 pages
PURELAB Chorus 1 Complete and Chorus 2+ Quick Reference Guide GUID40961 Version 01
No ratings yet
PURELAB Chorus 1 Complete and Chorus 2+ Quick Reference Guide GUID40961 Version 01
18 pages
ME6703 CIM Important Questions Unit 1
No ratings yet
ME6703 CIM Important Questions Unit 1
2 pages
GIS/RS Training Evaluation Test
100% (4)
GIS/RS Training Evaluation Test
3 pages
ACA (15CS72) MODULE-1: 1.0 Objective
No ratings yet
ACA (15CS72) MODULE-1: 1.0 Objective
61 pages
Google Workspace Skills Checklist - ALX Foundations
No ratings yet
Google Workspace Skills Checklist - ALX Foundations
16 pages
Automation and Manufacturing Quiz
No ratings yet
Automation and Manufacturing Quiz
4 pages
Cyberoam IView Linux Installation Guide
No ratings yet
Cyberoam IView Linux Installation Guide
11 pages
The Lean Startup Quiz Answer
100% (2)
The Lean Startup Quiz Answer
6 pages
It Analyst Resume
100% (1)
It Analyst Resume
7 pages
Mechatronics & Robotics Course Guide
No ratings yet
Mechatronics & Robotics Course Guide
8 pages
OpenText Media Management 16.3 - Administration Guide English (MEDMGT160300-AGD-EN-02) PDF
No ratings yet
OpenText Media Management 16.3 - Administration Guide English (MEDMGT160300-AGD-EN-02) PDF
306 pages
CR02AM: Mitsubishi Semiconductor Thyristor
No ratings yet
CR02AM: Mitsubishi Semiconductor Thyristor
6 pages
Draw The General View of Telecommunication and Explain The Function of The Each Unit?
No ratings yet
Draw The General View of Telecommunication and Explain The Function of The Each Unit?
22 pages