
Attachment 1

This document outlines an assessment task involving sentiment analysis using naive Bayes classification. Students will analyze sentiment in movie and product reviews using a provided Python script implementing naive Bayes. The task involves familiarizing themselves with the code, running it on different datasets, analyzing the most useful words, comparing naive Bayes to a rule-based approach, and explaining errors. Students must write a report of no more than 1000 words following the outlined steps and submit it by the deadline for marking.


COM6115 Text Processing

Assessment: Sentiment Analysis

Quick Summary
To better understand the strengths and limitations of Bayesian text classification, in this
assignment you will investigate Sentiment Analysis using two sentiment datasets that will
be provided. You will also be provided with a Python script that implements Naive Bayes.
You will need to write a report (no more than 1000 words) describing your results and
findings.
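The provided Sentiment.py already implements Naive Bayes; purely as a reminder of the underlying idea, a minimal sketch might look as follows (function and variable names here are illustrative, not those used in the script):

```python
import math
from collections import Counter

def train_bayes(pos_docs, neg_docs):
    """Estimate p(word|sentiment) with add-one smoothing (illustrative sketch)."""
    pos_counts = Counter(w for doc in pos_docs for w in doc.split())
    neg_counts = Counter(w for doc in neg_docs for w in doc.split())
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    p_word_pos = {w: (pos_counts[w] + 1) / pos_total for w in vocab}
    p_word_neg = {w: (neg_counts[w] + 1) / neg_total for w in vocab}
    return p_word_pos, p_word_neg

def classify(doc, p_word_pos, p_word_neg, p_pos=0.5):
    """Return 'pos' or 'neg' by comparing log-probabilities of the two classes."""
    log_pos = math.log(p_pos)
    log_neg = math.log(1 - p_pos)
    for w in doc.split():
        if w in p_word_pos:  # ignore words unseen in training
            log_pos += math.log(p_word_pos[w])
            log_neg += math.log(p_word_neg[w])
    return "pos" if log_pos >= log_neg else "neg"
```

Working in log space avoids numerical underflow when multiplying many small probabilities; the script's actual implementation may differ in its smoothing and handling of unseen words.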

Note: This assessment accounts for 30% of your total mark for the course. Your report may
be submitted for a plagiarism check (e.g., Turnitin). For any clarification on this assessment,
please contact Dr Chenghua Lin (c.lin@sheffield.ac.uk) or Prof. Aline Villavicencio
(a.villavicencio@sheffield.ac.uk).

Assessment Tasks
STEP 1: Download the data from Blackboard. This contains the following:

1. A dataset with snippets of movie reviews from the Rotten Tomatoes website (one text
file for positive reviews and one text file for negative reviews).
1. rt-polarity.pos
2. rt-polarity.neg
2. A smaller dataset with snippets of reviews for Nokia phones (again, 2 files)
1. nokia-pos.txt
2. nokia-neg.txt
3. A sentiment dictionary of positive and negative sentiment words:
1. negative-words.txt contains 4783 negative-sentiment words
2. positive-words.txt contains 2006 positive-sentiment words
4. The python script with my implementation of Naive Bayes, a knowledge-based
classifier using the sentiment dictionary, as well as some helper functions:
1. Sentiment.py (you will need Python3 to run)

STEP 2: Familiarise yourself with the code:

1. The code splits the Rotten Tomatoes Data into a training and test set in readFiles(),
then builds the p(word|sentiment) model on the training data in trainBayes(), and
finally applies Naive Bayes to the test data in testBayes().
2. Write a function which will print out Accuracy, Precision, Recall and F-measure for
the test data. [5 pt]
3. Run the code and report the classification results. [5 pt]
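The metrics function in Step 2.2 could be sketched along these lines, treating "pos" as the target class (the function and variable names are illustrative; you will need to adapt them to how Sentiment.py stores its predictions):

```python
def print_metrics(predictions, gold, target="pos"):
    """Print Accuracy, Precision, Recall and F-measure for the target class."""
    tp = sum(1 for p, g in zip(predictions, gold) if p == target and g == target)
    fp = sum(1 for p, g in zip(predictions, gold) if p == target and g != target)
    fn = sum(1 for p, g in zip(predictions, gold) if p != target and g == target)
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    print(f"Accuracy:  {accuracy:.3f}")
    print(f"Precision: {precision:.3f}")
    print(f"Recall:    {recall:.3f}")
    print(f"F-measure: {f_measure:.3f}")
    return accuracy, precision, recall, f_measure
```

Note the guard clauses for empty denominators: a classifier that never predicts the target class would otherwise cause a division by zero.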

STEP 3: Run Naive Bayes on other data:

1. In the python script, towards the end of the file (lines 272 and 274), uncomment the
other two calls to testBayes(). These run Naive Bayes on the training data and on the
Nokia product reviews.
2. What do you observe? Why are the results so different? [10 pt]
STEP 4: What is being learnt by the model?

1. Which are the most useful words for predicting sentiment? The code you have
downloaded contains another function mostUseful() that prints the most useful words
for deciding sentiment. [5 pt]
2. Uncomment the call to mostUseful(pWordPos, pWordNeg, pWord, 50) at the bottom
of the program, and run the code again. This prints the words with the highest
predictive value. Are the words selected by the model good sentiment terms? How
many are in the sentiment dictionary? [5 pt]
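One common way such "most useful word" rankings are computed is by the ratio p(word|pos) / p(word|neg): words whose probabilities differ most between the classes carry the most evidence. The sketch below is an assumption about the general idea, not the actual implementation of mostUseful() in Sentiment.py (whose signature also takes pWord):

```python
import math

def most_useful(p_word_pos, p_word_neg, n):
    """Return the n words whose conditional probabilities differ most
    between classes, ranked by |log(p(word|pos) / p(word|neg))|."""
    scored = {w: abs(math.log(p_word_pos[w] / p_word_neg[w]))
              for w in p_word_pos if w in p_word_neg}
    return sorted(scored, key=scored.get, reverse=True)[:n]
```

Words with a score near zero (ratio near 1, e.g. function words like "the") are uninformative, which is why they should not appear near the top of the list.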

STEP 5: How does a rule-based system compare?

1. Add code for the function testDictionary() which will print out Accuracy,
Precision, Recall and F-measure for the test data. [5 pt]
Then uncomment the three lines towards the end of the program that call the function
testDictionary() and run the program again. All this code does is add up the numbers of
negative and positive words present in a review and predict the class with the larger count.
2. How does the dictionary-based approach compare to Naive Bayes on the two
domains? What conclusions do you draw about statistical and rule-based approaches
from your observations? [5 pt]
3. Write a new function to improve the rule-based system, e.g., to take into account
negation, diminisher rules, etc. Run the program again and analyse the results on both
datasets. [25 pt]
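As one possible direction for Step 5.3, a rule-based classifier can flip a sentiment word's polarity when it follows a negator, so that "not good" counts as negative evidence. The sketch below is illustrative only: the negator list, the one-token window, and the tie-breaking rule are all assumptions you would want to refine and justify in your report.

```python
NEGATORS = {"not", "no", "never", "n't", "hardly"}

def dictionary_classify(review, pos_words, neg_words):
    """Count positive vs negative sentiment words, flipping a word's
    polarity when the immediately preceding token is a negator."""
    tokens = review.lower().split()
    pos_count = neg_count = 0
    for i, tok in enumerate(tokens):
        negated = i > 0 and tokens[i - 1] in NEGATORS
        if tok in pos_words:
            if negated:
                neg_count += 1   # e.g. "not good" counts as negative
            else:
                pos_count += 1
        elif tok in neg_words:
            if negated:
                pos_count += 1   # e.g. "not bad" counts as positive
            else:
                neg_count += 1
    return "pos" if pos_count >= neg_count else "neg"
```

A diminisher rule (e.g. down-weighting sentiment words after "slightly" or "somewhat") could be added in the same loop by using fractional weights instead of unit counts.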

STEP 6: Error Analysis.

1. Comment out all but one of the testBayes/testDictionary calls.
2. At the top of the program, set PRINT_ERRORS=1
3. Run the program again, and it will print out the mistakes made. List the mistakes in
the report. [5 pt]
4. Please explain why the model is making mistakes (e.g., analyse the errors and report
any patterns or generalisations). [15 pt]

Marking Criteria

Submit a report to describe your results and findings by following the tasks detailed in the 6
steps above.

1. Quality of the report, including structure and clarity. No more than 1000 words. [15 pt]
2. Step 2 [10 pt]
3. Step 3 [10 pt]
4. Step 4 [10 pt]
5. Step 5 [35 pt]
6. Step 6 [20 pt]

Submission Guideline

You should submit a PDF version of your report along with your code via Blackboard by
23:59 on Friday 9th December 2022. The name of the PDF file should have the form
“COM6115_Assessment-SA_<your Surname>_<your first name>_<your Student ID>”. For
instance, “COM6115_Assessment-SA_Smith_John_XXXXX.pdf”, where XXXXX is your
student ID.
