Assignment 2 - Text Compression

The assignment focuses on the Huffman algorithm for text compression, requiring groups of three to analyze character frequencies from a provided message. Students will create a frequency tally, merge notes based on frequency, and derive binary encodings for each character. The assignment includes six questions to assess understanding of the process and its outcomes, with a submission deadline of February 18th.

CS101 – Fundamentals of Computer and Information Sciences – LIU 1 of 5

Assignment 2 – text compression

due in class on Tue 18 Feb (40 points)

This assignment is an activity for groups of three. We’ll work on it in class on Wed 5
Feb, and then your group must submit one set of responses to the six questions before
the deadline.

Introduction

In this activity, we will investigate the Huffman algorithm for text compression. You’ve
already seen one example of a Huffman encoding, represented by the strange-looking
tree on the handout labeled “variable-bit Huffman encoding.”
You will follow the Huffman algorithm and create a tree of your own, based on the
character frequencies of a message that I provide.

Phase 1: count letter frequency

Start with a stack of blank sticky notes and the message you were given. We’re going to
consider each of the characters in your message, in order. Suppose the first character
is a G. We would write the G on a sticky note – roughly at the center left – and also begin
a tally in the lower left corner. Leave some space above and below the character, as
shown:

Move on to the next character in your message. Assuming it is a different character,
make a new sticky for that one.
When you encounter a character that you’ve seen before, do not create a new note,
but instead update the tally on the existing note containing that character. In this
example, we’ve just seen the character E for the third time:
2 of 5 Prof. League – Spring 2014 – Assignment 2 – text compression

Continue doing this for the entire length of your message. You will now have a count
of the frequencies of each character. Write the frequency in conventional (base ten)
notation in the upper left. Here’s a small sample:

In the next section, we will process these characters in order from lowest frequency
to highest. So you may want to take a moment now to arrange them in roughly that
order on your desktop.
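
If you want to double-check your tally afterwards, the counting in Phase 1 can be sketched in a few lines of Python. This is only an illustration: the message below is a made-up placeholder, not the one your group was given, and the variable names are arbitrary.

```python
from collections import Counter

# Hypothetical sample message; substitute the one your group was given.
message = "GEESE GEEK"

# Counter plays the role of the sticky notes: one entry per distinct character,
# with the tally kept as the entry's value.
tally = Counter(message)

# Arrange the "notes" from lowest frequency to highest.
ordered = sorted(tally.items(), key=lambda pair: pair[1])

for char, freq in ordered:
    print(repr(char), freq)
```

Note that the space counts as a character too, just as it would on a sticky note.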
Question 1: How many distinct characters did your message contain?
Question 2: If we were using a fixed-width encoding, how many bits would you need
to represent just those characters?
Question 3: What is the most frequent character in your message, and how many
times did it appear?
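
For Question 2, recall that b bits can distinguish 2^b different characters, so we need the smallest b with 2^b at least as large as the number of distinct characters. A minimal sketch of that calculation follows; the function name is hypothetical, and treating a one-character alphabet as needing one bit is an assumption made here, not something the handout states.

```python
def fixed_width_bits(distinct_count: int) -> int:
    """Smallest number of bits b such that 2**b >= distinct_count."""
    if distinct_count < 1:
        raise ValueError("need at least one distinct character")
    # (n - 1).bit_length() equals the ceiling of log2(n) for n >= 1,
    # and avoids floating-point rounding.
    # Assumption: a single-character alphabet still uses 1 bit per character.
    return max(1, (distinct_count - 1).bit_length())


# Illustrative counts only; use the count from your own message.
for n in (2, 5, 8, 9):
    print(n, "distinct characters ->", fixed_width_bits(n), "bits each")
```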

Phase 2: merge tiles

The algorithm continues by repeatedly merging sticky notes, as described here. Start
by choosing two notes with the lowest frequencies. Probably you had several characters
with a frequency of one, so you can just choose two of them arbitrarily. Place them
side by side in your work area:

In the section beneath the character and starting from the right, write a zero on the
left note and a one on the right note:

Then, stick the right one onto the bottom of the left one. Cross out the frequencies and
replace the top one with their sum – in this case, 1+1 is 2:

Now you will treat this ‘merged’ note as if it were a single one, so place it somewhere
among other characters with frequency=2.
Continue merging together your lowest-frequency letters like this. It’s okay to pair
a frequency=1 with a frequency=2 if it’s the last frequency=1 remaining – then the
merged frequency would be 3.
Before long, you’ll have to merge notes that themselves are already merged. In the
example below, we paired the last frequency=1 character (K) with a group (MJ) that
has frequency=2:

As before, we write a zero on the left note. And we write ones on all of the right notes…
to the left of whatever code is already there:

Again, stick the right note onto the bottom of the left one. Cross out the frequencies
and replace the top one with their sum – in this case, 1+2 is 3.

Continue merging notes using this technique until every character in your message is
merged into one big note. Then you will have a distinct binary encoding underneath
each character. Probably you should take a photo of your encoding, and/or write down
the bits produced for each character elsewhere.
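
The merging procedure can also be sketched in Python, which may help you check your sticky-note result. This is one possible rendering under stated assumptions, not a required part of the assignment: the function name `huffman_codes` is made up, a priority queue stands in for "choose the two lowest-frequency notes", and ties are broken by the order in which notes were created. Since you may break ties differently by hand, your individual codes can differ from the program's while still being valid.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(message: str) -> dict:
    """Mimic the sticky-note merging; returns {character: bit string}."""
    tally = Counter(message)
    if not tally:
        return {}
    if len(tally) == 1:
        # Assumption: a message with one distinct character gets the code "0".
        return {next(iter(tally)): "0"}

    order = count()  # tie-breaker so the heap never has to compare dicts
    # Each heap entry is one "note": (frequency, tie-breaker, {char: code so far}).
    heap = [(freq, next(order), {char: ""}) for char, freq in tally.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Choose the two notes with the lowest frequencies.
        left_freq, _, left = heapq.heappop(heap)
        right_freq, _, right = heapq.heappop(heap)
        # Write a 0 to the left of every code on the left note,
        # and a 1 to the left of every code on the right note.
        merged = {char: "0" + code for char, code in left.items()}
        merged.update({char: "1" + code for char, code in right.items()})
        # Cross out the old frequencies and replace them with their sum.
        heapq.heappush(heap, (left_freq + right_freq, next(order), merged))

    return heap[0][2]


# Hypothetical sample message; substitute your own.
for char, code in sorted(huffman_codes("GEESE GEEK").items()):
    print(repr(char), code)
```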
Question 4: How many bits are used to represent the most frequent character in your
message?
Question 5: What is the greatest number of bits used to encode any character in your
message?
Question 6: Use the character encodings you produced to encode the entire message
you were given. How many bits are used, in total?
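
For Question 6, the total is the sum, over all distinct characters, of (frequency × code length). Here is a small sketch using the example codes from the tree section of this handout (00 for K, 010 for M, 011 for J, 1 for E). The message "EKEMEJE" and the function names are invented purely for illustration.

```python
from collections import Counter

# Example encodings taken from this handout's small tree example.
codes = {"K": "00", "M": "010", "J": "011", "E": "1"}

# Hypothetical message that uses only those four characters.
message = "EKEMEJE"


def encode(text: str, table: dict) -> str:
    """Concatenate the bit string for each character, in order."""
    return "".join(table[char] for char in text)


def total_bits(text: str, table: dict) -> int:
    """Sum of frequency times code length over the distinct characters."""
    tally = Counter(text)
    return sum(freq * len(table[char]) for char, freq in tally.items())


bits = encode(message, codes)
print(bits, len(bits), total_bits(message, codes))
```

Both ways of counting should agree: the length of the encoded bit string equals the frequency-weighted sum of the code lengths.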

Visualize encoding as a tree

As in the handout on variable-bit Huffman encoding, the character encodings you
produced should fit nicely into a binary tree. I’ll do a small example below. Our algorithm
has produced the encodings 00 for K, 010 for M, 011 for J, and 1 for E.
We interpret a 0 as choosing the left path in a binary tree, and 1 as the right path. So
to get to the K from the root we would go left, twice. For the E, we go right just once.
The M and J both have the prefix 01, so they sit at a “sub-tree” reached by going left
then right.
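
The same left/right reading can be sketched in Python. In this illustration, which is an assumed representation and not the only way to do it, the tree is a nested dictionary where the key "0" is the left branch, the key "1" is the right branch, and a leaf is simply the character itself. The function names are hypothetical.

```python
def build_tree(codes: dict) -> dict:
    """Build a nested-dict tree from {character: bit string}."""
    root = {}
    for char, code in codes.items():
        node = root
        # Walk (or create) the internal nodes for all bits but the last.
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        # The last bit leads to the leaf holding the character.
        node[code[-1]] = char
    return root


def decode(bits: str, tree: dict) -> str:
    """Follow 0 = left, 1 = right from the root; restart after each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = tree
    return "".join(out)


# Encodings from the small example above.
codes = {"K": "00", "M": "010", "J": "011", "E": "1"}
tree = build_tree(codes)
print(tree)
print(decode("100101010111", tree))
```

Because no code is a prefix of another, the decoder never has to guess where one character ends and the next begins.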

Task: Draw the entire tree corresponding to the character encoding you produced
using the Huffman algorithm.
