Sentiment Analysis of Reddit Comments On Israel and Palestine Conflict: A Social Discourse Study
In [4]: df = pd.read_csv('pls_isl_conflict_comments.csv')
In [5]: df.head()
Out[5]: comment_id score self_text subreddit created_time
0 k5480sx 1 Exactly! I can remember the humanitarian aid s... worldnews 16-10-2023 19:39
1 k547q14 1 *We are the only part of the World that has Fr... Palestine 16-10-2023 19:36
2 k547elf 1 I don’t make Israeli strategy, nor am I Israel... worldnews 16-10-2023 19:34
3 k54742r 1 These people didn't vote Hamas in or something... worldnews 16-10-2023 19:32
4 k5473zi 1 We don't care what you do. We just want to liv... worldnews 16-10-2023 19:32
In [6]: df.tail()
Out[6]: comment_id score self_text subreddit created_time
189627 k3sdixt 1 I am in the United States and it has the dotte... Palestine 07-10-2023 05:17
189628 k3sccp2 54 In which country are you?\nSometimes maps adap... Palestine 07-10-2023 05:08
189629 k3ritvj 116 You can't give up on something you only preten... worldnews 07-10-2023 01:46
189630 k3riboh 30 > The head of Islamic Jihad denounced Arab ... worldnews 07-10-2023 01:42
In [7]: df.shape
Out[7]: (189631, 5)
In [8]: df.columns
In [9]: df.duplicated().sum()
Out[9]: 0
In [10]: df.isnull().sum()
Out[10]: comment_id 0
score 0
self_text 0
subreddit 0
created_time 0
dtype: int64
In [11]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 189631 entries, 0 to 189630
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 comment_id 189631 non-null object
1 score 189631 non-null int64
2 self_text 189631 non-null object
3 subreddit 189631 non-null object
4 created_time 189631 non-null object
dtypes: int64(1), object(4)
memory usage: 7.2+ MB
In [12]: df.describe()
Out[12]: score
count 189631.000000
mean 28.583607
std 179.946085
min -934.000000
25% 1.000000
50% 2.000000
75% 10.000000
max 16463.000000
In [13]: df.nunique()
Out[13]: comment_id 189631
score 1761
self_text 186338
subreddit 14
created_time 13562
dtype: int64
In [15]: df['subreddit'].unique()
In [16]: df['subreddit'].value_counts()
Out[16]: IsraelPalestine 52622
worldnews 36204
AskMiddleEast 28107
CombatFootage 27901
PublicFreakout 14255
NonCredibleDefense 13865
Palestine 6968
worldnewsvideo 5598
IsrealPalestineWar_23 2537
TerrifyingAsFuck 546
NoahGetTheBoat 498
AbruptChaos 200
CrazyFuckingVideos 197
ActualPublicFreakouts 133
Name: subreddit, dtype: int64
In [17]: plt.figure(figsize=(15,6))
sns.countplot(x='subreddit', data=df, palette='hls')
plt.xticks(rotation = -45)
plt.show()
In [18]: plt.figure(figsize=(30,20))
plt.pie(df['subreddit'].value_counts(), labels=df['subreddit'].value_counts().index,
autopct='%1.1f%%', textprops={ 'fontsize': 25,
'color': 'black',
'weight': 'bold',
'family': 'serif' })
hfont = {'fontname':'serif', 'weight': 'bold'}
plt.title('Subreddit', size=20, **hfont)
plt.show()
[Figure: count of comments per subreddit; x-axis: Subreddit, y-axis: Count]
[Figure: pie chart titled 'Subreddit' showing each subreddit's share of comments, from 27.7% (IsraelPalestine) down to 0.0701% (ActualPublicFreakouts)]
In [21]: plt.figure(figsize=(15,6))
sns.histplot(df['score'], kde=True, bins=5)
plt.show()
In [22]: plt.figure(figsize=(15,6))
sns.histplot(df['score'], kde=True, bins=5, stat='density')  # distplot is deprecated in seaborn
plt.show()
In [23]: plt.figure(figsize=(15,6))
sns.boxplot(x='score', data=df, palette='hls')
plt.show()
In [24]: plt.figure(figsize=(15,6))
sns.violinplot(x='score', data=df, palette='hls')
plt.show()
In [25]: fig = go.Figure(data=[go.Histogram(x=df['score'], nbinsx=5)])
fig.update_layout(title='Histogram of score', xaxis_title='score', yaxis_title='Count')
fig.show()
[Figure: 'Histogram of score'; x-axis: score, y-axis: Count]
[Figures: two further plots of score (axis up to 15k); plot type not recoverable]
Out[31]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
In [34]: import re
In [35]: def tokenization(text):
    # Split on runs of non-word characters; the raw string keeps the backslash in \W
    tokens = re.split(r'\W+', text)
    return tokens
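The cleaning steps implied by the surrounding cells (a punctuation list, an `re`-based tokenizer, and lower-cased `self_text` in the later outputs) can be sketched as below. The exact cleaning cells are not shown in this export, so the helper names `remove_punctuation`, `tokenize_words`, and `clean_and_tokenize` are illustrative assumptions rather than the notebook's own functions.

```python
import re
import string


def remove_punctuation(text):
    # Drop every character listed in string.punctuation.
    return text.translate(str.maketrans('', '', string.punctuation))


def tokenize_words(text):
    # Split on runs of non-word characters and discard empty strings.
    return [tok for tok in re.split(r'\W+', text) if tok]


def clean_and_tokenize(text):
    # Hypothetical pipeline: strip punctuation, lower-case, then tokenize.
    return tokenize_words(remove_punctuation(text).lower())


tokens = clean_and_tokenize("These people didn't vote Hamas in!")
print(tokens)
```

Note that `string.punctuation` only covers ASCII characters, so a typographic apostrophe such as the one in "don’t" would survive the first step.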
In [38]: nltk.download('vader_lexicon')
nltk.download('stopwords')
In [56]: df_new
Out[56]: comment_id score self_text subreddit created_time
0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 16-10-2023 19:39
1 k547q14 1 we are the only part of the world that has fre... Palestine 16-10-2023 19:36
2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 16-10-2023 19:34
3 k54742r 1 these people didnt vote hamas in or something ... worldnews 16-10-2023 19:32
4 k5473zi 1 we dont care what you do we just want to live ... worldnews 16-10-2023 19:32
189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 07-10-2023 05:17
189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 07-10-2023 05:08
189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 07-10-2023 01:46
189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 07-10-2023 01:42
In [59]: df_new
Out[59]: comment_id score self_text subreddit created_time sentiment
0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 16-10-2023 19:39 0.000000
1 k547q14 1 we are the only part of the world that has fre... Palestine 16-10-2023 19:36 0.000000
2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 16-10-2023 19:34 0.305159
3 k54742r 1 these people didnt vote hamas in or something ... worldnews 16-10-2023 19:32 0.045000
4 k5473zi 1 we dont care what you do we just want to live ... worldnews 16-10-2023 19:32 -0.347643
189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 07-10-2023 05:17 0.000000
189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 07-10-2023 05:08 0.000000
189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 07-10-2023 01:46 -0.300000
189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 07-10-2023 01:42 -0.032143
word_example_frequency = word_frequency('example')
print('Average score for comments containing the word "example":', word_example_frequency)
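The definition of `word_frequency` is not shown in this export. Going only by the printed message, it may return the average score of comments containing a given word; the sketch below shows one way such a helper could look. The function name `average_score_for_word`, its parameters, and the toy data are all assumptions made for illustration.

```python
import re

import pandas as pd


def average_score_for_word(frame, word, text_col='self_text', score_col='score'):
    # Match the word on word boundaries, ignoring case.
    pattern = r'\b' + re.escape(word) + r'\b'
    mask = frame[text_col].str.contains(pattern, case=False, regex=True, na=False)
    # The mean of an empty selection is NaN, which signals "no matching comments".
    return frame.loc[mask, score_col].mean()


# Toy data standing in for df_new.
demo = pd.DataFrame({
    'self_text': ['for example this', 'no match here', 'another Example'],
    'score': [4, 10, 2],
})
result = average_score_for_word(demo, 'example')
print(result)
```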
In [68]: plt.figure(figsize=(15,6))
plt.hist(df_new['text_length'], bins=30, edgecolor='black')
plt.xlabel('Text Length')
plt.ylabel('Frequency')
plt.title('Histogram of Text Length')
plt.show()
In [72]: num_topics = 5
lda = LDA(n_components=num_topics, random_state=42)
lda.fit(text_vectorized)
Out[72]: LatentDirichletAllocation(n_components=5, random_state=42)
Topic 2:
['hamas', 'israel', 'people', 'gaza', 'civilians', 'palestinians', 'just', 'israeli', 'like', 'war']
Topic 3:
['like', 'just', 'people', 'say', 'video', 'news', 'im', 'post', 'good', 'media']
Topic 4:
['comments', 'action', 'questions', 'comment', 'concerns', 'contact', 'based', 'automatically', 'performed', 'moderators']
Topic 5:
['israel', 'jews', 'land', 'palestine', 'people', 'palestinians', 'arab', 'like', 'jewish', 'state']
In [74]: df_new
Out[74]: comment_id score self_text subreddit created_time sentiment text_length
0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 2023-10-16 19:39:00 0.000000 278
1 k547q14 1 we are the only part of the world that has fre... Palestine 2023-10-16 19:36:00 0.000000 148
2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 2023-10-16 19:34:00 0.305159 285
3 k54742r 1 these people didnt vote hamas in or something ... worldnews 2023-10-16 19:32:00 0.045000 630
4 k5473zi 1 we dont care what you do we just want to live ... worldnews 2023-10-16 19:32:00 -0.347643 293
189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 2023-07-10 05:17:00 0.000000 120
189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 2023-07-10 05:08:00 0.000000 123
189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 2023-07-10 01:46:00 -0.300000 79
189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 2023-07-10 01:42:00 -0.032143 1232
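In the output above, rows whose raw timestamp was `07-10-2023` now read `2023-07-10`, while `16-10-2023` became `2023-10-16`. That pattern is consistent with day-first strings being parsed without an explicit format, so that ambiguous dates are read month-first. The conversion cell is not shown, so the following is only a sketch of one way to avoid the ambiguity, by passing the format explicitly.

```python
import pandas as pd

# Two raw timestamps in the day-month-year layout used by the CSV.
raw = pd.Series(['16-10-2023 19:39', '07-10-2023 05:17'])

# An explicit format removes any guessing about day/month order.
parsed = pd.to_datetime(raw, format='%d-%m-%Y %H:%M')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
```

With this format both timestamps fall in October 2023, which matches the date range seen in the raw `created_time` column.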
lowest_score_index = df_new['score'].idxmin()
In [78]: df_new
Out[78]: comment_id score self_text subreddit created_time sentiment text_length sentiment_category
0 k5480sx 1 exactlycan remember the humanitarian aid strea... worldnews 2023-10-16 19:39:00 0.000000 278 Neutral
1 k547q14 1 we are the only part of the world that has fre... Palestine 2023-10-16 19:36:00 0.000000 148 Neutral
2 k547elf 1 i don’t make israeli strategy nor amisraeli or... worldnews 2023-10-16 19:34:00 0.305159 285 Positive
3 k54742r 1 these people didnt vote hamas in or something ... worldnews 2023-10-16 19:32:00 0.045000 630 Neutral
4 k5473zi 1 we dont care what you do we just want to live ... worldnews 2023-10-16 19:32:00 -0.347643 293 Negative
189627 k3sdixt 1 i am in the united states and it has the dotte... Palestine 2023-07-10 05:17:00 0.000000 120 Neutral
189628 k3sccp2 54 in which country are you sometimes maps adapt ... Palestine 2023-07-10 05:08:00 0.000000 123 Neutral
189629 k3ritvj 116 you cant give up on something you only pretend... worldnews 2023-07-10 01:46:00 -0.300000 79 Negative
189630 k3riboh 30 gt the head of islamic jihad denounced arab at... worldnews 2023-07-10 01:42:00 -0.032143 1232 Neutral
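The `sentiment_category` column appears to come from thresholding `sentiment`: in the rows shown, 0.045 and -0.032 are labelled Neutral, while 0.305 is Positive and -0.300 is Negative. The actual cut-off is not shown in this export; the value 0.05 below is an assumption that happens to reproduce those rows, and the function name `categorize_sentiment` is hypothetical.

```python
def categorize_sentiment(score, threshold=0.05):
    # threshold is an assumed cut-off, not taken from the notebook.
    if score > threshold:
        return 'Positive'
    if score < -threshold:
        return 'Negative'
    return 'Neutral'


# Sentiment values copied from the rows displayed above.
sample_scores = [0.000000, 0.305159, 0.045000, -0.347643, -0.300000, -0.032143]
categories = [categorize_sentiment(s) for s in sample_scores]
print(categories)
```

On a DataFrame the same function could be applied with `df_new['sentiment'].apply(categorize_sentiment)`.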
In [79]: df_new['sentiment_category'].unique()
In [80]: df_new['sentiment_category'].value_counts()
Out[80]: Neutral 79689
Positive 68529
Negative 41413
Name: sentiment_category, dtype: int64
In [81]: plt.figure(figsize=(15,6))
sns.countplot(x='sentiment_category', data=df_new, palette='hls')
plt.show()
In [82]: plt.figure(figsize=(30,20))
plt.pie(df_new['sentiment_category'].value_counts(), labels=df_new['sentiment_category'].value_counts().index,
autopct='%1.1f%%', textprops={ 'fontsize': 25,
'color': 'black',
'weight': 'bold',
'family': 'serif' })
hfont = {'fontname':'serif', 'weight': 'bold'}
plt.title('Sentiment Category', size=20, **hfont)
plt.show()
In [83]: def map_sentiment_to_numeric(sentiment_category):
    if sentiment_category == 'Positive':
        return 1
    elif sentiment_category == 'Negative':
        return -1
    else:
        return 0
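The cell that applies this mapping is not shown. As a sketch, the same encoding can be expressed as a dictionary lookup on the `sentiment_category` column to produce the `sentiment_numeric` column used later; the constant name `SENTIMENT_TO_NUMERIC` and the toy frame are assumptions.

```python
import pandas as pd

# Same encoding as map_sentiment_to_numeric, written as a lookup table.
SENTIMENT_TO_NUMERIC = {'Positive': 1, 'Negative': -1, 'Neutral': 0}

# Toy frame standing in for df_new.
demo = pd.DataFrame({'sentiment_category': ['Neutral', 'Positive', 'Negative']})
demo['sentiment_numeric'] = demo['sentiment_category'].map(SENTIMENT_TO_NUMERIC)
print(demo['sentiment_numeric'].tolist())
```

One difference worth keeping in mind: the function returns 0 for any unexpected label, whereas `Series.map` with a dictionary yields NaN for labels missing from the dictionary, which makes stray categories easier to spot.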
In [86]: df3
Out[86]: self_text sentiment_numeric
if positive_text_data:
    wordcloud = WordCloud(background_color='white').generate(positive_text_data)
if negative_text_data:
    wordcloud = WordCloud(background_color='white').generate(negative_text_data)
if neutral_text_data:
    wordcloud = WordCloud(background_color='white').generate(neutral_text_data)
In [91]: X = df_new['self_text']
y = df_new['sentiment_numeric']
LogisticRegression()
Accuracy: 0.8297255253513328
Precision: 0.8311469792237322
Recall: 0.8297255253513328
F1 Score: 0.8291814267939249
Out[99]: DecisionTreeClassifier()
Accuracy: 0.7498615761858307
Precision: 0.7489914531442408
Recall: 0.7498615761858307
F1 Score: 0.7493180803889777
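For both models the reported Recall equals the Accuracy to every digit, which is what support-weighted averaging produces, since weighted recall reduces to overall accuracy. The training and evaluation cells are not shown, so the sketch below is only one plausible setup: the TF-IDF features, the toy data, and the split parameters are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy three-class data standing in for self_text / sentiment_numeric.
texts = [
    'good great love', 'great good happy', 'love happy good', 'happy great love',
    'bad awful hate', 'awful bad sad', 'hate sad bad', 'sad awful hate',
    'map country border', 'border map line', 'country line map', 'line border country',
]
labels = [1, 1, 1, 1, -1, -1, -1, -1, 0, 0, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Vectorizer and classifier chained so the test set is transformed consistently.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted', zero_division=0)
recall = recall_score(y_test, y_pred, average='weighted', zero_division=0)
f1 = f1_score(y_test, y_pred, average='weighted', zero_division=0)

print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
print('F1 Score:', f1)
```

Swapping `LogisticRegression` for `DecisionTreeClassifier` in the pipeline gives the second model under the same evaluation.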
In [102… accuracies = {
    'Logistic Regression': lr_accuracy,
    'Decision Tree': dt_accuracy,
}
[Figure: bar chart of accuracy by model (Logistic Regression, Decision Tree); x-axis: Models, y-axis: Accuracy]