Text Representation and Compression
Module 2
              Dr Pushpavathi.K.P, BMSCE
                          Types of text
• Unformatted text – Plain text.
• Formatted text – Rich text.
• Hypertext.
                     Unformatted text
• ASCII character set.
• Printable characters- alphabetic, numeric, punctuation.
• Control characters:
1. Format control characters – BS, SP, DEL, ESC.
2. Information separators – FS, RS.
3. Transmission control characters – SOH, STX, ETX, ACK, NAK, SYN.
• Mosaic characters – a supplementary version of the character set, used to
    create relatively simple graphical images.
   Character set to produce unformatted text
ASCII
character set
  Character set to produce unformatted text (cntd.)
Mosaic characters
                         Formatted text
• Publishing sector.
• Characters of different styles and variable size.
• Structure a document into chapters, paragraph.
• Graphics and pictures inserted at appropriate points.
• Formatting is achieved by entering specific commands.
• Each command begins with a reserved format control character.
• The control character is followed by numeric or alphabetic characters.
                   Formatted text (cntd.)
a) formatted text string b) printed version of string
                         Hyper Text
• Pages
• Defined linkage points-hyperlinks
• Electronic version of the documents.
• Browser.
• Home page
• URL
• HTML- presentation of the document.
• Directives – page-formatting commands sandwiched between a pair of tags,
  e.g. <p>, <B>text</B>
Electronic document written using hypertext
Digital document
                Digital document (cntd.)
• Vertical resolution – 3.85 or 7.7 lines/mm (100 or 200 lines/inch).
• Scanner output – horizontal resolution of approx. 8 pels/mm.
• Bitonal images – black and white only.
• One binary digit represents each pel:
                      0 for a white pel, 1 for a black pel.
• Typical image – a stream of about 2 million bits, uncompressed.
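As a rough sanity check on the "2 million bits" figure, a sketch assuming an A4 page (210 mm x 297 mm) scanned at about 8 pels/mm horizontally and 3.85 lines/mm vertically, one bit per pel:

```python
# Uncompressed size of a scanned A4 page at the Group 3 base resolution.
# Page dimensions and resolutions are the standard values quoted above.
pels_per_line = 210 * 8             # ~8 pels/mm across a 210 mm line -> 1680
lines_per_page = int(297 * 3.85)    # 3.85 lines/mm down a 297 mm page -> 1143
total_bits = pels_per_line * lines_per_page  # 1 bit per pel (bitonal image)
print(total_bits)  # ~1.9 million bits, consistent with "2 million bits"
```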
              Facsimile conversion codes
• ITU-T Group 3 and Group 4 standards.
a) Termination codes – run lengths from 0 to 63 pels.
b) Make-up codes – run lengths that are multiples of 64 pels.
• A table of code words was produced based on the relative frequency of
  occurrence of runs of white and black pels in scanned lines.
• The code words are fixed (static).
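The splitting of a run into a make-up part and a termination part can be sketched as follows (the actual ITU-T code-word tables are omitted; only the arithmetic split is shown, and the function name is illustrative):

```python
def split_run(run_length):
    """Split a pel run into a make-up part (multiple of 64, encoded with a
    make-up code word if non-zero) and a termination part (0-63, always
    encoded with a termination code word)."""
    makeup = (run_length // 64) * 64   # largest multiple of 64 in the run
    termination = run_length % 64      # remaining 0-63 pels
    return makeup, termination

print(split_run(130))  # a run of 130 pels -> make-up 128 + termination 2
```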
Facsimile conversion codes (cntd.)
                 Compression principles
• Source encoders and destination decoders.
• Lossless and lossy compression.
• Entropy encoding.
• Source encoding.
Source Encoders and Destination Decoders
          Lossless and lossy compression
• Lossless compression algorithm – reversible.
• Example – text file.
• Lossy compression algorithm – reproduces a version of the original, not an
  exact copy.
• The higher the level of compression, the more approximate the received
  version becomes.
• Example – images, audio and video streams.
                     Entropy encoding:
• Lossless and independent.
• Information representation.
• Two examples:
1. Run length encoding.
2. Statistical encoding.
                 Entropy encoding (cntd.)
• Entropy of the source – the minimum average number of bits required to
  transmit a particular source stream.
• Qualitatively, entropy is the amount of useful information in the stream.
• Ebits(C) = -log2(P_C), where Ebits(C) is the information content of
  character C in bits, and P_C is the probability of occurrence of C in the
  given data.
• If the stream contains n distinct characters or symbols:
  H = -sum(i=1..n) P_i log2(P_i) = sum(i=1..n) P_i log2(1/P_i)
• The higher the probability of occurrence of a symbol, the lower its
  information content.
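A minimal sketch of the entropy formula above (the function name is illustrative):

```python
import math

def entropy(probs):
    """Average information per symbol: H = -sum(P_i * log2(P_i)) bits."""
    return -sum(p * math.log2(p) for p in probs)

# Example: four symbols with unequal probabilities.
h = entropy([0.5, 0.25, 0.125, 0.125])
print(h)  # 1.75 bits/symbol, below the 2 bits a fixed-length code would need
```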
                   Run length encoding:
• Repetitive Character encoding
• Used when the source information comprises long substrings of the same
  character or bit.
• The source is transmitted as a different set of code words.
• Each code word indicates the repeated bit or character and the number of
  repetitions in the substring.
• Destination knows the set of code words being used.
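A minimal run-length encoder/decoder over a bit string, assuming (bit, count) pairs as the transmitted code words:

```python
def rle_encode(bits):
    """Encode a binary string as a list of (bit, run-length) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                       # extend the current run
        runs.append((bits[i], j - i))
        i = j
    return runs

def rle_decode(runs):
    """Rebuild the original string from (bit, run-length) pairs."""
    return "".join(bit * count for bit, count in runs)

encoded = rle_encode("0000001111100")
print(encoded)  # [('0', 6), ('1', 5), ('0', 2)]
assert rle_decode(encoded) == "0000001111100"
```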
                   Statistical encoding:
• Variable length code word.
• Prefix property.
• Huffman encoding algorithm.
• Entropy.
• Efficiency of encoding scheme =
  entropy of source / average number of bits per codeword,
  where average number of bits per codeword = sum(i=1..n) N_i P_i
  (N_i = length of codeword i, P_i = its probability of occurrence).
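A worked example of the efficiency formula, using hypothetical probabilities and codeword lengths N_i:

```python
import math

# Hypothetical source: four symbols with the given probabilities, encoded
# with variable-length codewords of 1, 2, 3 and 3 bits respectively.
probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

entropy = -sum(p * math.log2(p) for p in probs)       # 1.75 bits/symbol
avg_len = sum(n * p for n, p in zip(lengths, probs))  # sum(N_i * P_i) = 1.75
print(entropy / avg_len)  # efficiency = 1.0: this code meets the entropy bound
```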
                        Source encoding:
• Differential encoding.
• Transform encoding.
                   Differential encoding:
• Amplitude of signal covers larger range, but difference in amplitude
  between successive values is relatively small.
• Smaller set of codewords.
• Indicates difference in amplitude between current signal and
  immediately preceding value.
• Can be lossless or lossy, depending on the number of bits used to encode
  the difference values.
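The idea can be sketched as follows (lossless variant: the exact differences are kept; function names are illustrative):

```python
def delta_encode(samples):
    """Send the first sample in full, then only successive differences."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Rebuild the signal by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

signal = [1000, 1003, 1001, 1004, 1006]
diffs = delta_encode(signal)
print(diffs)  # [1000, 3, -2, 3, 2] - small differences need fewer bits
assert delta_decode(diffs) == signal
```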
                    Transform encoding
• Transforming the source information.
• No loss of information.
• Digitization of an image produces a two-dimensional matrix of pixel values.
• Rate of change in magnitude across the image – spatial frequency.
• An image has both horizontal and vertical frequency components.
• The human eye is less sensitive to higher spatial frequencies.
• DCT.
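A one-dimensional DCT sketch (type-II, unnormalized) showing how a block with no spatial variation concentrates all its energy in the first (DC) coefficient; real codecs apply a normalized 2-D version to 8x8 blocks:

```python
import math

def dct_1d(x):
    """Type-II DCT: expresses samples as a sum of cosines of increasing
    spatial frequency; coefficient 0 is the DC (average) term."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]

# A constant block (zero spatial frequency): all energy lands in the DC
# coefficient, and the higher-frequency coefficients are ~0.
coeffs = dct_1d([5.0, 5.0, 5.0, 5.0])
print(coeffs)  # approximately [20.0, 0, 0, 0]
```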
Transform encoding: pixel pattern
Transform encoding : DCT principle
                    Text Compression
• Compression algorithm must be lossless.
• Entropy encoding – Statistical encoding.
• Two types of Statistical encoding:
   1. Optimum set of codewords – Huffman coding, arithmetic coding.
   2. Variable-length strings of characters – Lempel-Ziv (LZ) algorithm.
• Two types of coding :
1. Static coding.
2. Dynamic or adaptive coding.
                  Static Huffman Coding
• The character string to be transmitted is first analyzed.
• The character types and their relative frequencies are determined.
• Coding operation involves creating an unbalanced tree.
• Huffman code tree – Binary tree.
• Branches are assigned the value 0 or 1.
• Root node (top), branch node , leaf node (termination point).
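A sketch of static Huffman code construction: a min-heap repeatedly merges the two least-frequent nodes, and each leaf's codeword is the path of 0/1 branch labels from the root (tie-breaking and the function name are illustrative):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a static Huffman code for the characters of `text`.
    Each heap entry is (frequency, tie-breaker, {char: partial codeword})."""
    heap = [(freq, i, {ch: ""})
            for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # least frequent -> 0 branch
        f2, _, c2 = heapq.heappop(heap)  # next least     -> 1 branch
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("AAAABBCD")
print(codes)  # A (most frequent) gets a 1-bit code; C and D get 3-bit codes
```

The prefix property holds by construction: a character's codeword ends at a leaf, so it can never be the start of another codeword.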
                   Huffman code tree
Example - character string : AAAABBCD (Final tree with codes)
                 Decoding Algorithm
• Assume the table of code words is available at the receiver.
• Received bit stream – held in the variable BIT-STREAM.
• Bits of the code word being matched – held in the variable CODEWORD.
• Decoded ASCII characters – written to the variable RECEIVE-BUFFER.
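The decoding loop described above can be sketched as follows (variable names mirror BIT-STREAM, CODEWORD and RECEIVE-BUFFER; the codeword table shown is a hypothetical one for the AAAABBCD example, with A='0', B='10', C='110', D='111'):

```python
def huffman_decode(bit_stream, codewords):
    """Walk the received bits, emitting a character whenever the accumulated
    bits match a codeword; the prefix property guarantees a unique match."""
    lookup = {code: ch for ch, code in codewords.items()}
    receive_buffer, codeword = [], ""    # CODEWORD starts empty
    for bit in bit_stream:               # consume BIT-STREAM bit by bit
        codeword += bit
        if codeword in lookup:           # match: move character to buffer
            receive_buffer.append(lookup[codeword])
            codeword = ""
    return "".join(receive_buffer)

table = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(huffman_decode("00001010110111", table))  # -> AAAABBCD
```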
Decoding (cntd.)
• The set of code words depends on the data being compressed.
• This can be handled in two ways: static or adaptive coding.
• Adaptive compression builds the code word set as the data is processed.
• Disadvantage of static coding – a new set of code words must be sent for
  each new data set – overheads.
       Dynamic (adaptive) Huffman Coding
• Both transmitter and receiver build the Huffman tree –dynamically.
• If the character to be transmitted is present in tree – its codeword is
  determined and sent in normal way.
• If not present (first occurrence) – character is transmitted in
  uncompressed form.
• Tree is updated by – 1. incrementing the frequency of occurrence
              or           2. introducing the new character to the tree.
                     Adaptive Huffman tree
Ref: Multimedia Communication: Components, Techniques, Standards – Krishna Kumar D N
Incrementing the count : if next symbol is ‘A’
Sibling property - swapping
One more increment in ‘A’
Swapping of internal nodes
Receiving two more ‘A’ s
                     Arithmetic Coding
• More complicated than Huffman coding.
• Assigns one code for each encoded string of characters – unlike
  Huffman that use separate codeword for each character.
• Coding start with a certain interval – read input symbol by symbol –
  use the probability of each symbol to narrow the interval
• High probability symbols contribute fewer bits to output
• Output is interpreted as a number in the range [0, 1).
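A sketch of the interval-narrowing loop, using the probabilities from the slides' example (e = 0.3, n = 0.3, t = 0.2, w = 0.1, . = 0.1); the order in which symbols are assigned their segments of [0, 1), and the lowercase message "went." with its final period, are assumptions:

```python
def arithmetic_encode(message, probs):
    """Narrow the interval [low, high) once per symbol; any number inside
    the final interval identifies the whole message."""
    # Cumulative ranges: each symbol owns a sub-segment of [0, 1).
    ranges, cum = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (cum, cum + p)
        cum += p
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low                 # high-probability symbols shrink
        lo_frac, hi_frac = ranges[sym]     # the interval less -> fewer bits
        low, high = low + width * lo_frac, low + width * hi_frac
    return low, high  # transmit any value in [low, high)

probs = {"e": 0.3, "n": 0.3, "t": 0.2, "w": 0.1, ".": 0.1}
low, high = arithmetic_encode("went.", probs)
print(low, high)
```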
               Arithmetic Coding (cntd.)
Example – encoding the string ‘WENT’ with symbol probabilities e = 0.3, n = 0.3, t = 0.2, w = 0.1, . = 0.1
             Arithmetic Coding (decoding)
• Decoder knows:
   • Set of characters present in the received encoded messages
   • Segment to which each character has been assigned
   • Its related range.
• Follow the same procedure as in encoder.
• The number of decimal digits increases linearly with the number of
  characters in the string.
• Complex messages –first fragmented into multiple smaller strings –
  encoded separately
• Resulting set of code words – sent as blocks of floating point numbers
  (binary) in known formats.
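The decoder's segment-search-and-rescale procedure can be sketched as follows (segment order is assumed, matching the encoder sketch; 0.8161 is a value chosen inside the interval that the same probabilities produce for the message "went."):

```python
def arithmetic_decode(value, probs, n_symbols):
    """Reverse the encoder: find which segment the value falls in, emit
    that symbol, then rescale the value into that segment and repeat."""
    ranges, cum = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (cum, cum + p)
        cum += p
    out = []
    for _ in range(n_symbols):
        for sym, (lo, hi) in ranges.items():
            if lo <= value < hi:
                out.append(sym)
                value = (value - lo) / (hi - lo)  # rescale into [0, 1)
                break
    return "".join(out)

probs = {"e": 0.3, "n": 0.3, "t": 0.2, "w": 0.1, ".": 0.1}
print(arithmetic_decode(0.8161, probs, 5))  # -> went.
```

The decoder must be told how many symbols to extract (or a terminating symbol such as "." must be reserved), since many messages share a prefix of the same interval.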
Lempel-Ziv Coding
• Uses strings of characters for the coding instead of single characters.
• Table holds character string of the text (all possible words).
• Encoder sends only the index of the word.
• Decoder use this index to access the word.
• Table is used as dictionary.
• Dictionary based compression algorithm.
• Word-processing packages have dictionaries of about 25,000 words, requiring
  a 15-bit index.
• Shorter words give a lower compression ratio; longer words give a higher one.
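A toy illustration of word-dictionary coding and the index-size arithmetic (the dictionary contents here are made up):

```python
import math

# Word-based dictionary coding: send a word's index, not its characters.
dictionary = ["a", "an", "the", "multimedia", "compression"]  # hypothetical
bits_per_index = math.ceil(math.log2(len(dictionary)))        # 3 bits here

def encode(words):
    """Replace each word with its dictionary index."""
    return [dictionary.index(w) for w in words]

def decode(idxs):
    """Look each index back up in the shared dictionary."""
    return [dictionary[i] for i in idxs]

msg = ["the", "multimedia", "compression"]
assert decode(encode(msg)) == msg
# A 25,000-word dictionary needs ceil(log2(25000)) bits per index:
print(math.ceil(math.log2(25000)))  # 15
```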
Lempel-Ziv-Welch Coding
• The contents of the dictionary are built dynamically by both encoder and
  decoder.
• Initially the dictionary contains only the basic character set, e.g. ASCII.
• The compression level depends on the number of entries in the dictionary,
  which determines the number of bits needed for an index.
• Ex : this is simple as it is
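A minimal LZW encoder for the example above, simplified to seed the dictionary with only the characters actually used (a real implementation starts from the full ASCII set):

```python
def lzw_encode(text):
    """LZW: start the dictionary with single characters, extend it with each
    new string seen, and transmit only dictionary indices."""
    # Initial dictionary: the character set (here, just the characters used).
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    out, current = [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                    # keep extending the match
        else:
            out.append(dictionary[current])  # emit index of longest match
            dictionary[current + ch] = len(dictionary)  # add new entry
            current = ch
    out.append(dictionary[current])
    return out, dictionary

indices, d = lzw_encode("this is simple as it is")
print(indices)
print(len(d))  # the dictionary has grown beyond the initial character set
```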
Lempel-Ziv-Welch (LZW) compression algorithm
Basic operation
Reference
• Multimedia Communications: Applications, Networks, Protocols and
  Standards – Fred Halsall.