Lesson7 LZ77
Lesson7 LZ77
1 Introduction / Overview
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
Overview
In this chapter, you are going to learn about:
• Introduction to LZ77
• How to make encoder and decoder of LZ77
• Examples on encoder and decoder of LZ77
1. Introduction > 1.2 Learning Content
Image Processing
I. General knowledge in image p 1. Introduction to Image Processing
rocessing and multimedia 2. Data Structure and Color of Images
3. Ms. Visual Studio 2008 and OpenCV
4. Introduction to Multimedia Systems
5. Introduction to Video and Lossless Compression
6. Huffman Coding
7. LZ77
8. LZ78
9. LZW
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
Objective
Upon completion of this chapter, you will be able to:
• Understand concept of LZ77
• Use method of compression and decompression of LZ77 for applying to text.
1. Introduction > 1.5 Keywords
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
Keywords Description
Metadata is XML-formatted data that defines the characteristics of an update, including its title, description, rul
es for determining whether the update is applicable to a client computer, and instructions for installin
g the update content.
Input stream is the sequence of bytes to be compressed.
Coding position is the position of the byte in the input stream that is currently being coded (the beginning of the look
ahead buffer).
Lookahead buffer is the byte sequence from the coding position to the end of the input stream.
Window is a buffer of size that indicates the number of bytes from the coding position backward. The window
is empty at the beginning of the compression, then grows to size as the input stream is processed.
Once it reaches size, it then "slides" along with the coding position.
Pointer is information about the starting offset of the match in the window and its length. The starting offset i
s expressed as the count of bytes from the coding position backwards into the window. The length is
the number of bytes to read forward from the starting offset.
Match is the string that is used to find a match of the byte sequence between the lookahead buffer and the
window.
2. Learn> Topic: 1.1 Introduction to LZ77
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Q: What is LZ77?
▪ A: LZ77 is come from the full word “Lempel-Ziv Data Compression Algorithm, 19
77”.
▪ The data compression algorithm developed in 1977 by Abraham Lempel and Jac
ob Ziv.
▪ It became a basis for enabling data transmission via the Internet in an efficient w
ay.
▪ It contributed significantly in making the Internet a global communication medium
(1)
Learning
.
Contents
LempelZiv Sign
6
2. Learn> Topic: 1.1 Introduction to LZ77
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
7
2. Learn> Topic: 1.2 Algorithm of LZ77
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ The LZ77 compression algorithm is used to analyse input data and determine ho
w to reduce the size of that input data by replacing redundant information with met
adata.
▪ The encoder needs to keep this data to look for matches, and the decoder needs
to keep this data to interpret the matches the encoder refers to.
8
2. Learn> Topic: 1.2 Algorithm of LZ77
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
9
2. Learn> Topic: 1.2 Algorithm of LZ77
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
begin
fill view from input
while (view is not empty) do
(1) begin
Learning
Contents
find longest prefix p of view starting in coded part
i := position of p in window
j := length of p
X := first char after p in view
output(i, j, X)
add j+1 chars
end
end
10
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 1:
➢ Compare 6 characters from first block with second block.
(1) abcaacdaabcdabbbab
Learning
Contents o “abcaac” ≠ “abcdab” → move 1 character from first block
o “bcaacd” ≠ “abcdab” → move 1 character from first block
o “caacda” ≠ “abcdab” → no more character from first block.
o So, remove 1 character at the end from second block.
o It rests 5 characters from second block: “abcda”.
11
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 1 (cont.):
➢ Compare 5 characters from first block with second block.
abcaacdaabcdabbbab
12
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 1 (cont.):
➢ Compare 4 characters from first block with second block.
abcaacdaabcdabbbab
13
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 1 (cont.):
➢ Compare 3 characters from first block with second block.
abcaacdaabcdabbbab
o “abc” = “abc” → match or equal each other
o Start to give index in the first block (8 character) from 1 by reversing fr
om the end to the beginning.
(1) abcaacdaabcdabbbab
Learning 87654321
Contents
o Use formula: Codeword<position, length, C(x)> (x is next character
from the last matching character in second block)
o If no match all character from first block and second block, use formul
a: Codeword<0, 0, C(0)> (0 is first character in second block).
o According to formula, we get: Codeword<8, 3, C(d)> (n=length=3)
14
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
15
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 2 (cont.):
➢ Compare 5 characters from first block with second block.
acdaabcdabbbab
16
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 2 (cont.):
➢ Compare 4 characters from first block with second block.
acdaabcdabbbab
17
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 2 (cont.):
➢ Compare 3 characters from first block with second block.
acdaabcdabbbab
18
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 2 (cont.):
➢ Compare 2 characters from first block with second block.
acdaabcdabbbab
19
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
20
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 3 (cont.):
➢ Compare 2 characters from first block with second block.
aabcdabbbab
21
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Step 3 (cont.):
➢ Compare 1 characters from first block with second block.
aabcdabbbab
22
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
23
2. Learn> Topic: 1.3 Encoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
24
2. Learn> Topic: 1.4 Decoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
abcaacdaabc
87654321
abcaacdaabcd
87654321
25
2. Learn> Topic: 1.4 Decoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
abcaacdaabcd
abcaacdaabcdabb
87654321
26
2. Learn> Topic: 1.4 Decoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
abcaacdaabcdabb
abcaacdaabcdabbba
87654321
27
2. Learn> Topic: 1.4 Decoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
abcaacdaabcdabbba
a b c a a c d a a b c d a b b b a b EOF
87654321
➢ Decoder: “abcaacdaabcdabbbab”
28
2. Learn> Topic: 1.4 Decoder
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Conclusion:
➢ The result of decoder must exactly the same as input string.
➢ Number of steps in decoder part must equal to number of steps in encoder
part.
➢ Unless, you have to check your encoder part and decoder part again.
➢ If you do wrong in any steps in encoder or decoder, you will get wrong resu
lt.
➢ Be careful at moving window and matching characters!
(1)
Learning
Contents
29
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Example1: Input string: “bcabadbcac”. Find encoder and decoder of LZ77? Assu
me that we take first block = 5 and second block = 3.
➢ bcabadbcac
30
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
cabadbcac
54321
➢ Codeword<3, 1, C(c)> (n=1)
31
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
badbcac
54321
➢ Codeword<4, 1, C(c)> (n=1)
32
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
33
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Decoder:
➢ Input string: {“bcaba”, <0, 0, C(d)>, <3, 1, C(c)>, <4, 1, C(c)>}
34
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
bcabad
35
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
bcabadbc
bcabadbcac
54321
➢ Decoder: “bcabadbcac”
36
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
aabacdabaccbadb
654321
➢ Codeword<5, 3, C(c)> (n=3)
37
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
cdabaccbadb
654321
➢ Codeword<6, 1, C(b)> (n=1)
38
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
abaccbadb
654321
➢ Codeword<6, 1, C(d)> (n=1)
39
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
accbadb
654321
➢ Codeword<3, 1, null> (n=1)
➢ So, we stop here.
40
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Encoder = {“aabacd”, <5, 3, C(c)>, <6, 1, C(b)>, <6, 1, C(d)>, <3, 1, null>}
Input string
aabacdabaccbadb
(1)
Learning
Contents
Encoder
41
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
▪ Decoder:
➢ Input string: {“aabacd”, <5, 3, C(c)>, <6, 1, C(b)>, <6, 1, C(d)>, <3, 1, null>
}
aabacdaba
654321
aabacdabac
654321
42
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
aabacdabac
aabacdabaccb
654321
43
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
aabacdabaccb
aabacdabaccbad
654321
44
2. Learn> Topic: 2. Examples
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
aabacdabaccbad
a a b a c d a b a c c b a d b EOF
654321
➢ Decoder: “aabacdabaccbadb”
45
4. Outro > 4.1 Summarize
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
Summarize
▪ LZ77 is come from the full word “Lempel-Ziv Data Compression Algorithm, 1
977”. It became a basis for enabling data transmission via the Internet in an ef
ficient way. It contributed significantly in making the Internet a global commun
ication medium.
Reference
▪ https://en.wikipedia.org/wiki/LZ77_and_LZ78
▪ https://msdn.microsoft.com/en-us/library/ee916854.aspx
▪ http://www.stringology.org/DataCompression/lz77/index_en.html
▪ https://sites.google.com/site/datacompressionguide/lz77
4. Outro > 4.3 Assignment
A : Text-based + Audio
□ B : Text-based + Video
□ C : Only Video
Assignment
Overview
• Introduce to LZ78
• Concept of LZ78
LZ78
1. Concept of LZ78
Next Lesson 2. Examples
Title