COM6115 Text Processing
Assessment: Sentiment Analysis
Quick Summary
To better understand the strengths and limitations of Bayesian text classification, in this
assignment you will investigate Sentiment Analysis using two sentiment datasets that will
be provided to you. You will also be provided with a Python script that implements Naive
Bayes. You will need to write a report (no more than 1000 words) describing your results
and findings.
Note: This assessment accounts for 30% of your total mark for the course. Your report may
be submitted for a plagiarism check (e.g., Turnitin). For any clarification on this assessment,
please contact Dr Chenghua Lin (c.lin@sheffield.ac.uk) or Prof. Aline Villavicencio
(a.villavicencio@sheffield.ac.uk).
Assessment Tasks
STEP 1: Download the data from Blackboard. This contains the following:
   1. A dataset with snippets of movie reviews from the Rotten Tomatoes website (one text
      file for positive reviews and one text file for negative reviews).
           1. rt-polarity.pos
           2. rt-polarity.neg
   2. A smaller dataset with snippets of reviews for Nokia phones (again, 2 files)
           1. nokia-pos.txt
           2. nokia-neg.txt
   3. A sentiment dictionary of positive and negative sentiment words:
           1. negative-words.txt contains 4783 negative-sentiment words
           2. positive-words.txt contains 2006 positive-sentiment words
   4. The Python script with my implementation of Naive Bayes, a knowledge-based
      classifier that uses the sentiment dictionary, and some helper functions:
            1. Sentiment.py (you will need Python 3 to run it)
STEP 2: Familiarise yourself with the code:
   1. The code splits the Rotten Tomatoes Data into a training and test set in readFiles(),
      then builds the p(word|sentiment) model on the training data in trainBayes(), and
      finally applies Naive Bayes to the test data in testBayes().
   2. Write a function which prints out Accuracy, Precision, Recall and F-measure for
      the test data (see the sketch after this list).                              [5 pt]
   3. Run the code and report the classification results.                               [5 pt]
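A minimal sketch of the metrics function asked for in item 2 is given below. It assumes you
have already tallied the four confusion-matrix counts (true/false positives and negatives,
treating positive sentiment as the positive class) while classifying the test reviews; the
function and variable names are illustrative and are not part of Sentiment.py.

    # Illustrative sketch only: the names below are not from Sentiment.py.
    def print_metrics(tp, fp, tn, fn):
        """Print Accuracy, Precision, Recall and F-measure from the
        confusion-matrix counts, with 'positive' as the positive class."""
        total = tp + fp + tn + fn
        accuracy  = (tp + tn) / total if total else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall    = tp / (tp + fn) if (tp + fn) else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if (precision + recall) else 0.0)
        print(f"Accuracy:  {accuracy:.3f}")
        print(f"Precision: {precision:.3f}")
        print(f"Recall:    {recall:.3f}")
        print(f"F-measure: {f_measure:.3f}")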
STEP 3: Run Naive Bayes on other data:
   1. In the Python script, towards the end of the file (lines 272 and 274), uncomment
      the other two calls to testBayes(). These run Naive Bayes on the training data and
      on the Nokia product reviews.
   2. What do you observe? Why are the results so different?                          [10 pt]
STEP 4: What is being learnt by the model?
     1. Which are the most useful words for predicting sentiment? The code you have
        downloaded contains another function mostUseful() that prints the most useful words
        for deciding sentiment.                                                        [5 pt]
     2. Uncomment the call to mostUseful(pWordPos, pWordNeg, pWord, 50) at the bottom
        of the program, and run the code again. This prints the words with the highest
        predictive value. Are the words selected by the model good sentiment terms? How
        many are in the sentiment dictionary? (One way to check the overlap is sketched
        after this list.)                                                        [5 pt]
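If it helps with the dictionary-overlap question, the sketch below shows one way to count
how many of the printed words appear in the two word lists. It assumes you copy the top
words into a Python list and that the dictionary files contain one word per line; the function
name is illustrative and not part of Sentiment.py.

    # Illustrative sketch only: count how many of the model's top words
    # appear in the provided sentiment word lists.
    def count_dictionary_overlap(top_words,
                                 pos_path="positive-words.txt",
                                 neg_path="negative-words.txt"):
        def load_words(path):
            with open(path, encoding="latin-1") as f:
                # skip blank lines and any comment lines such lists may contain
                return {line.strip() for line in f
                        if line.strip() and not line.startswith(";")}
        dictionary = load_words(pos_path) | load_words(neg_path)
        overlap = [w for w in top_words if w in dictionary]
        print(f"{len(overlap)} of {len(top_words)} top words are in the dictionary")
        return overlap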
STEP 5: How does a rule-based system compare?
     1. Add some code for the function testDictionary() which will print out Accuracy,
        Precision, Recall and F-measure for the test data.                              [5 pt]
        Uncomment the three lines towards the end of the program that call the function
        testDictionary() and run the program again. This classifier simply counts the
        positive and negative words present in a review and predicts the class with the
        larger count.
     2. How does the dictionary-based approach compare to Naive Bayes on the two
        domains? What conclusions do you draw about statistical and rule-based approaches
        from your observations?                                                          [5 pt]
     3. Write a new function to improve the rule-based system, e.g., to take into account
        negation, diminisher rules, etc. (a possible starting point is sketched after this
        list). Run the program again and analyse the results on both datasets.        [25 pt]
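As a starting point only, the sketch below shows one common way to handle negation and
diminishers in a dictionary-based scorer: a negation word flips the polarity of the next few
sentiment words, and a diminisher reduces their weight. The names and word lists here are
illustrative, not the ones used in Sentiment.py, so you will need to adapt the idea to the
data structures the script already uses.

    # Illustrative sketch only: rule-based scoring with simple negation and
    # diminisher handling. pos_words and neg_words are assumed to be sets.
    NEGATIONS   = {"not", "no", "never", "n't", "hardly"}
    DIMINISHERS = {"slightly", "somewhat", "barely", "rather"}

    def rule_based_score(tokens, pos_words, neg_words, window=3):
        """Return a sentiment score; a positive score predicts 'positive'."""
        score = 0.0
        negate_left = 0      # how many upcoming tokens are still negated
        diminish_left = 0    # how many upcoming tokens are still diminished
        for token in tokens:
            word = token.lower()
            if word in NEGATIONS:
                negate_left = window
                continue
            if word in DIMINISHERS:
                diminish_left = window
                continue
            value = 0.0
            if word in pos_words:
                value = 1.0
            elif word in neg_words:
                value = -1.0
            if value:
                if negate_left > 0:
                    value = -value       # negation flips polarity
                if diminish_left > 0:
                    value *= 0.5         # diminisher halves the weight
            score += value
            negate_left = max(0, negate_left - 1)
            diminish_left = max(0, diminish_left - 1)
        return score

Predicting 'positive' when the score is greater than zero (and 'negative' otherwise) keeps the
decision rule directly comparable to the original word-counting approach.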
STEP 6: Error Analysis.
     1. Comment out all but one of the testBayes/testDictionary calls.
     2. At the top of the program, set PRINT_ERRORS=1
     3. Run the program again, and it will print out the mistakes made. List the mistakes
        in the report.                                                                  [5 pt]
     4. Please explain why the model is making mistakes (e.g., analyse the errors and report
        any patterns or generalisations; a simple tallying sketch follows this list).  [15 pt]
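If it helps to organise the error analysis, a small tallying sketch such as the one below can
group the misclassified reviews by simple properties (for example length, or the presence of
negation words). The names are illustrative, and the list of misclassified reviews is assumed
to have been collected from the errors printed by the program.

    # Illustrative sketch only: tally simple properties of misclassified
    # reviews to look for patterns ('errors' is a list of review strings).
    from collections import Counter

    def summarise_errors(errors):
        negation_words = {"not", "no", "never", "n't"}
        stats = Counter()
        for review in errors:
            tokens = review.lower().split()
            stats["total"] += 1
            if negation_words & set(tokens):
                stats["contains negation"] += 1
            if len(tokens) <= 10:
                stats["ten words or fewer"] += 1
        for name, count in stats.most_common():
            print(f"{name}: {count}")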
Marking Criteria
Submit a report describing your results and findings, following the tasks detailed in the 6
steps above.
1.   Quality of the report, including structure and clarity. No more than 1000 words. [15 pt]
2.   Step 2 [10 pt]
3.   Step 3 [10 pt]
4.   Step 4 [10 pt]
5.   Step 5 [35 pt]
6.   Step 6 [20 pt]
Submission Guideline
You should submit a PDF version of your report along with your code via Blackboard by
23:59 on Friday 9th December 2022. The name of the PDF file should have the form
“COM6115_Assessment-SA_<Your Surname>_<Your First Name>_<Your Student ID>”. For
instance, “COM6115_Assessment-SA_Smith_John_XXXXX.pdf”, where XXXXX is your
student ID.