2/27/2015
Data Compression
Eng. Reem Sabah Ahmed
Computer Engineering Department
Engineering College
Al-Nahrain University
Data Compression
Introduction
Lossless
and lossy
compression.
Run-Length coding
Delta coding.
Huffman
method(coding)
Huffman
method(decoding)
Arithmetic
method(coding)
Arithmetic
method(decoding)
Dictionary methods
Lempel-Ziv
Algorithm
Image compression
Video compression
2/27/2015
Run-Length Coding
Run length encoding (RLE) is a simple technique to
compress digital data by representing successive runs
of the same value in the data as the value followed
by the count, rather than the original run of values.
The goal is to reduce the amount of data needed to
be stored or transmitted.
Run length encoding (RLE) is one very simple lossless
data compression algorithm called run-length
encoding that can be very useful in some cases.
Run-length encoding (RLE) is a very simple form of
data compression in which consecutive sequences
of the same data value (runs) are stored or
transmitted as a single data value and count,
rather than as the original individual data
elements. This is particularly useful for data that
contains many such runs such as simple graphic
images and faxed documents. If the original
doesnt have many runs, it could increase rather
than decrease the size of the data file or
transmission.
2/27/2015
A fax machine scans a page and represents that image as
black or white pixels, which are sent over telephone
lines to another fax machine that will print the pixels
onto a blank page. The total number of pixels to be
transmitted may be very large which would result in
lengthy transmission times. Because fax images often
have large blocks of white (e.g. margins and inter-line
spacing) or black (e.g. horizontal lines) they are readily
amenable to run-length encoding.
this compression under the category of lossless
compression algorithms. The decoded data will be the
exact same as the original data.
The following are the main considerations that apply to
this technique:
Text in a natural language (as well as names) may have many
doubles and a few triplesas in AAA (an acronym), abbess,
Emmanuelle, bookkeeper, arrowwood, freeer (old usage),
and hostessship (used by Shakespeare)but longer runs are
limited to consecutive spaces and periods.
In a bi-level image there are two types of symbols, namely
black and white pixels, so runs of pixels alternate between
these two colors, which implies that RLE can compress such
an image by replacing each run with its length.
2/27/2015
In general, the data to be compressed includes several
types of symbols, so RLE compresses a run by replacing
it with a pair (length, symbol).
If the run is short, such a pair may be longer than the
run of symbols, thereby leading to expansion. A
reasonable solution is to write such short runs on the
output in raw format, so at least they do not cause
expansion. However, this raises the question of
distinguishing between pairs and raw items, because in
the output file both types are binary strings. A practical
RLE program must therefore precede each pair and each
raw item with a 1-bit indicator.
Thus, a pair becomes the triplet (0, length, symbol) and a
raw item becomes the pair (1, symbol).
Runs have different lengths, which is why the pairs and
triplets have different lengths. It therefore makes sense
to replace each by a variable-length code and write the
codes on the output. Thus, RLE is normally just one step
in a multistep compression algorithm that may include a
transform, variable-length codes, and perhaps also
quantization.
2/27/2015
Examples of Run Length coding
Example1
As a basic example, consider the following string of numbers:
555588822222
There is a fair amount of redundancy there. In RLE notation, this
same string could be expressed as:
453852
This indicates a run of 4 5 numbers, followed by a run of 3 8
numbers, followed by a run of 5 2 numbers. Suppose that each
number was represented by a byte on disk. The original encoding
requires 12 bytes. The RLE encoding requires 6 bytes. 2:1
compression in this simple case not bad.
Example 2
consider the following number string:
123456
Apply the same RLE compression scheme as before:
111213141516
So the original string was 6 bytes and the RLE string is 12
bytes. That did not go too well as the compression ratio is
now 1:2. This is why pure RLE implementations have a mode
for encoding strings of dissimilar numbers as well. It usually
works by encoding a negative number that, when negated,
will give the number of succeeding bytes that are not RLE
data.
2/27/2015
Example 2 Cont.
Since the above example contains a string of 6 dissimilar
numbers, the encoding would be:
-6 1 2 3 4 5 6
A decoder would see that the RLE code is negative, negate it
to get 6, and then copy 6 bytes from the encoded byte stream
to the decoded byte stream.
Example 3
Using the basic concepts described so far, lets decode the
following byte stream:
4 5 -6 1 2 3 4 5 6 3 8 5 2
This is a combination of examples from above and decodes
to:
555512345688822222
Work through it by hand as necessary. So the encoded string
is 13 bytes and the decoded string is 18 bytes. Modest
compression. RLE compression naturally works best where
there are long runs of the same pixel value. At worst, there
will be no compression if there are almost no pairs of
adjacent pixels with the same value.
2/27/2015
Summery
The basic RLE principle is to detect sequences of repeated
data values and after that to replace this sequence with two
elements:
- the number of the same characters.
- the character itself.
Useful for compressing data that contains repeated
values
Very simple compared with other compression
techniques
Reversible (Lossless) compression: decompression is just
as easy
Question?
2/27/2015
Thank you