Multimedia Networking
(Networked) Multimedia Applications
•Definition: Networked applications that employ audio or
video
•Commonly used nowadays (e.g., YouTube, BBC iPlayer,
Netflix, Skype)
•It would take an individual more than 5 million years to
watch the amount of video that will cross global IP networks
each month in 2021*.
•IP video traffic will be 82 percent of all the consumer
Internet traffic by 2021*.
*Source: Cisco Visual Networking Index: Forecast and Methodology, 2016– 2021
(Networked) Multimedia Applications
•Traditional elastic applications like email, file transfer
and web browsing
–Are delay-tolerant but loss-intolerant
• While multimedia applications
–Can tolerate loss
–But not delay (and jitter), especially for interactive audio/video
communication and live streaming
For conversational voice, delays ≤ 150 ms give the best user experience, while delays
> 400 ms are unacceptable
Outline
• Multimedia data (audio and video)
–Features
–Compression
–Service requirements
–Design issues and protocols
Multimedia Data
-Properties of Audio Data
-Properties of Video Data
Properties of Audio Data
(Pulse Code Modulation)
-Digital audio data has much lower bandwidth
requirements than video.
-The analog audio signal is sampled at a fixed rate (e.g. 8000
samples/sec; the human hearing and voice range is about 20 Hz to 20
kHz, most sensitive at 2 to 4 kHz). The value of each sample is a
real value.
Properties of Audio Data
(Pulse Code Modulation)
-Each sample value is then rounded to one of a finite
number of values. This operation is called quantization.
-The finite values are called quantization values, and their number is
always a power of 2, e.g. 256 quantization values are 2^8.
[Figure: analog audio signal amplitude vs. time, sampled at a rate of
N samples/sec; each sample's analog value is rounded to the nearest
quantized value, introducing a quantization error.]
Properties of Audio Data
(Pulse Code Modulation)
-The number of bits required to send a sample is thus 8, i.e. one byte, in our
example.
-Once each sample is assigned a quantization value, a digital
signal is formed by concatenating the corresponding
quantized samples. In our example the rate of the digital signal is
8000 samples/sec × 8 bits/sample = 64000 bps.
-When the digital signal is played back via an audio speaker, it is
decoded back into an analog audio signal; but due to the loss introduced
in the sampling and quantization steps, the original signal cannot be
recovered exactly.
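The sampling/quantization/bit-rate arithmetic above can be sketched as follows (a minimal illustration; the quantizer range [-1, 1] and the 256-level count are assumed from the example):

```python
# Minimal PCM sketch: sample values in [-1, 1] are rounded to one of 256
# quantization values (2^8), so each sample needs 8 bits; at 8000 samples/sec
# the digital signal rate is 64000 bps, as in the example.
import math

def quantize(sample, levels=256, lo=-1.0, hi=1.0):
    """Round a real-valued sample to the nearest of `levels` evenly spaced values."""
    step = (hi - lo) / (levels - 1)
    index = round((sample - lo) / step)
    return lo + index * step   # quantized value; it differs from the sample by the quantization error

bits_per_sample = int(math.log2(256))   # 8 bits per sample
rate_bps = 8000 * bits_per_sample       # 64000 bps digital signal rate
print(bits_per_sample, rate_bps)
```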
Properties of Audio Data
(Pulse Code Modulation)
-Audio CD also uses PCM: sampling rate of 44100 samples per
second and 16 bits per sample on a linear scale
–705.6 Kbps (mono) and 1.411 Mbps (stereo).
-PCM-encoded speech and music is rarely used on the Internet;
instead, compression is applied to reduce the bit rate.
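These figures follow directly from the CD parameters (44100 samples/sec, 16 bits/sample):

```python
# Audio CD (PCM) bit-rate arithmetic from the slide's parameters.
sample_rate = 44100      # samples per second
bits_per_sample = 16     # linear quantization
mono_bps = sample_rate * bits_per_sample   # 705600 bps = 705.6 Kbps
stereo_bps = 2 * mono_bps                  # 1411200 bps ≈ 1.411 Mbps
print(mono_bps, stereo_bps)
```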
Properties of Audio Data
(Audio Compression Overview)
•Audio transmitted on the Internet is typically
compressed to a reduced bit rate.
•Human speech offers a great potential for
compression, to as low as 2.4 Kbps
•MPEG-1 Audio Layer 3 (MP3): popular audio
compression technique for music streaming;
encodes at different bit rates, the most
common being 128 Kbps.
•Ears are more sensitive than eyes: audio
glitches need to be kept minimal to
ensure a good user experience
Properties of Audio Data
(MP3 Compression)
Also Known as Perceptual Coding
–Remove frequencies that the human ear cannot hear (i.e. very high and very low
frequencies are removed).
–Also, if a louder and a softer sound are played at the same time, the
filtering ignores the softer one (i.e. simultaneous/temporal masking). Our
ears and minds cannot separate events that are close in time.
–As a sound becomes quieter and quieter, humans are able to make out less
and less detail. The encoder thus does not save every single detail of quiet sounds
(i.e. minimum audition threshold)
–Reduction by a factor of 10 (33MB songs on CD can be compressed
to about 3MB)
Properties of Audio Data
(MP3 Compression steps)
1.Filter Bank: It divides sounds into sub-bands of frequency
2.Psychoacoustic model: It utilizes the concept of auditory masking, that
determines what can and cannot be heard in each sub-band.
3.Quantization and Coding: Bit code allocation to remaining
samples.
4.Bit stream formatting block: Accumulates all the information and
processes it into a bit-stream.
Properties of Video Data (High bit
rate requirement)
•High bit rate compared to music streaming and image
transfers.
–From 100 Kbps for low quality video conferencing to over 3 Mbps for
streaming HD movies
–Internet video streaming and downloads will grow to more than 80% of
all consumer Internet traffic by 2019 (Data via Cisco)
Comparison of bit rate requirements of three internet applications.
Properties of Video Data
(Video Compression Overview)
•Video is a sequence of images that are typically displayed at a
constant rate, typically referred to as frames per second (fps),
e.g., 24 or 30 fps
Properties of Video Data
(Video Compression Overview)
•Image is an array of pixels, each encoded into a number of
bits to represent luminance and color.
• Two types of redundancies in Video
–Spatial redundancy: An image with a large portion of white space or
similar color can be compressed substantially without quality degradation.
–Temporal redundancy: When an image and the subsequent image are the
same, there is no need to encode the second image.
Properties of Video Data
(Video Compression Overview)
•Video can exploit both spatial and temporal
redundancy
•Multiple versions are created of same video for adaptive
delivery depending on network conditions (end-to-end
available bandwidth) and host characteristics
spatial coding example: instead of sending N values of the same color (all
purple), send only two values: the color value (purple) and the number of
repeated values (N)
temporal coding example: instead of sending the complete frame i+1, send
only the differences from frame i
[Figure: frame i and frame i+1]
Properties of Video Data
(Spatial Compression: JPEG)
Properties of Video Data
(Spatial Compression: JPEG)
1. Preprocessing
A. Convert RGB to YCbCr and divide into 8x8 blocks
B. Subtract 127 from each pixel intensity
Properties of Video Data
(Spatial Compression: JPEG)
2. Transformation: C=UBU' (where U is a fixed 8x8 DCT matrix and B is a block from
the preprocessing step). The DCT pushes high-intensity values to the upper-left corner.
[Figure: U (fixed block) and C (DCT block)]
Properties of Video Data
(Spatial Compression: JPEG)
3. Quantization: Elements near zero will be converted to zero and other elements will
be shrunk so that their values are closer to zero. Divide each element in block C by the
corresponding element in block Z (a predefined quantization table) and round off the
resultant value, giving block Q.
[Figure: blocks C, Z and Q]
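As a rough sketch of the quantization step (the coefficient and table values below are made-up examples, not the standard JPEG tables):

```python
# Divide each DCT coefficient C[i][j] by quantization-table entry Z[i][j]
# and round; large divisors shrink coefficients toward zero.
C = [[-415.4, -30.2], [4.5, -21.9]]   # corner of a DCT block (made-up values)
Z = [[16, 11], [12, 12]]              # matching corner of a quantization table
Q = [[round(C[i][j] / Z[i][j]) for j in range(2)] for i in range(2)]
print(Q)   # small integers; near-zero coefficients become exactly zero
```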
Properties of Video Data
(Spatial Compression: JPEG)
4. Encoding: Original image: 160*240*8 = 307,200 bits
Transformed image: 85,143 bits (a saving of over 70%)
(Mostly a few non-zero values followed by strings of zeros, represented as EoF)
Properties of Video Data
(Temporal Compression: MPEG)
1. Intra-frame (I)
2. Predicted frame (P)
3. Bi-directional frame (B)
Properties of Video Data
(Temporal Compression: MPEG)
-An MPEG (Moving Picture Experts Group) video is broken up into Groups of
Pictures (GoP)
- A GoP consists of I, P and B frames.
- Transmit order: IPBBBPBBB
- The first frame must be an I frame: it contains the entire picture.
P frame
- A P frame contains the difference in information from the I
frame.
- So it may contain some 40 to 50% of the information that the I
frame originally contains
B frame
- A B frame shows how things move in between (across a 16x16
macroblock)
- It relates to both the I and P frames
- It contains information on how to render the objects in the
macroblocks (such as their directions or motion vectors)
Properties of Video Data
(Temporal Compression: MPEG)
- I and P frames in a GoP are generated and transmitted before the B frames
- But each frame is given a sequence number
- The decoder at the receiving end then positions them correctly before
decompression.
GoP display order: IBBBPBBBP
Only the differences between frames are encoded within each GoP
Reference: http://joshweeklyatsit.blogspot.co.uk/2016/03/interframe-and-intra-frame.html
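The reordering the decoder performs can be sketched as follows (frame labels follow the slide's IPBBBPBBB transmit order; the display indices are illustrative):

```python
# Frames are transmitted I/P-first, but each carries a display sequence
# number; the receiver sorts by that number to restore display order.
transmit_order = [('I', 0), ('P', 4), ('B', 1), ('B', 2), ('B', 3),
                  ('P', 8), ('B', 5), ('B', 6), ('B', 7)]   # (type, display index)
display_order = ''.join(t for t, _ in sorted(transmit_order, key=lambda f: f[1]))
print(display_order)   # IBBBPBBBP, matching the GoP display order
```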
Networked Multimedia Applications
3 Types
1.Streaming stored audio/video (e.g., Netflix, Hulu,
Kankan, BBC iPlayer, YouTube, YouKu, Dailymotion)
2.Streaming live audio/video (e.g., Internet radio, IPTV)
3.Conversational voice and video over Internet (e.g., Skype,
Google Talk)
1. Streaming Stored Audio and Video
(Features)
• Media is pre-recorded.
–Online movie watching sites (Netflix, Hulu, Lovefilm, Kankan, etc.)
–Catch-up TV (BBC iPlayer, etc.)
–Online video sharing sites (YouTube, Dailymotion, YouKu, etc.)
•Streaming stored video contains both video and audio
components
–Streaming stored audio alone requires much lower bit rates, hence is less
challenging
–Streaming stored video sometimes also referred to as video- on-demand
(VoD)
1. Streaming Stored Audio and Video
(Storage/ sharing)
• Multimedia content stored on servers, possibly in
different encodings.
•Content is often placed on a content distribution network
(CDN) , rather than a single data centre, for faster access.
•With P2P streaming applications, peers hold different chunks
of content and they collectively form the “server”. The chunks
arrive from different peers speeding up sharing.
1. Streaming Stored Audio and Video
(3 Requirements)
–Streaming
Video plays out a few seconds after the beginning of the download
Simultaneous playout of the earlier part and download of the later part
This is called streaming (as opposed to download-and-play)
–Interactivity
Pause, reposition forward, reposition backward, fast-forward
Response within a few seconds
–Continuous
Once playout begins, it proceeds according to the original timing of the recording.
Not receiving frames in time causes stalls on the client side and degrades
user experience
1. Streaming Stored Video
(Streaming: no jitters)
[Figure: cumulative data vs. time — (1) video recorded (e.g., 30 frames/sec),
(2) video sent, (3) video received and played out at the client (30 frames/sec)
after a network delay (fixed in this example).
Streaming: at this time, the client is playing out the early part of the video
while the server is still sending the later part.]
1. Streaming Stored Audio and Video
(Continuous)
•Continuous playout and interactivity require low and stable
delays
•However it is common for Internet paths to exhibit
variable end-to-end delays and available bandwidth
–This is especially the case for long paths with many links; long paths also
increase the initial playout delay
–Retransmissions of lost or dropped packets with TCP
–Packet jitter
1. Streaming Stored Audio and Video
(Continuous)
• The most important performance measure is average throughput
–With buffering and prefetching, continuous playout can be maintained
even if the throughput fluctuates.
–The average throughput should be at least equal to the bit rate of the video itself.
1. Streaming Stored Video
(Client-Side Buffering)
•Common technique employed in all forms of streaming video systems to
absorb network delay variability and enable continuous playout
–The higher the delay variability, the longer the playout delay needed
[Figure: constant-bit-rate video transmission → variable network delay →
client video reception → constant-bit-rate video playout at the client;
cumulative data vs. time, showing the buffered video and the client playout
delay.]
1. Streaming Stored Video
(A different view of Client-Side Buffering)
[Figure: video server → variable fill rate x(t) → client application buffer
(size B, fill level Q(t)) → playout at rate r (e.g., CBR).]
1. Streaming Stored Video
(A different view of Client-Side Buffering)
1. Initial fill of the buffer until playout begins at tp
2. Playout begins at tp
3. The buffer fill level varies over time as the fill rate x(t) varies while the
playout rate r is constant
1. Streaming Stored Video
(A different view of Client-Side Buffering)
Key parameters: average fill rate (xavg), playout rate (r)
• xavg < r: buffer eventually empties (causing freezing of video playout until buffer again
fills)
• xavg > r: buffer will not empty, provided initial playout delay is large enough to absorb
variability in x(t)
• initial playout delay tradeoff: buffer starvation less likely with larger delay, but
larger delay until user begins watching
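A toy simulation of this buffer model, with made-up fill rates whose average xavg exceeds r (a sketch, not a real player):

```python
# Buffer level Q(t): add x(t) each tick; after the initial playout delay,
# drain at constant rate r; clamping at 0 would model a playout freeze.
def buffer_levels(x, r, playout_delay):
    q, levels = 0.0, []
    for t, fill in enumerate(x):
        q += fill
        if t >= playout_delay:
            q = max(q - r, 0.0)   # hitting 0 here means starvation/freezing
        levels.append(q)
    return levels

x = [3, 1, 4, 0, 2, 3, 1, 4]      # variable fill rate; average 2.25 > r
print(buffer_levels(x, r=2, playout_delay=2))
```

With xavg > r and a large enough initial delay, the level never reaches zero, matching the slide's condition for continuous playout.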
1. Streaming Stored Video
(3 Categories)
• Three types, depending on mode of transport:
1.UDP streaming
2.HTTP streaming
3.Adaptive HTTP streaming
•Majority of current systems use (adaptive) HTTP streaming
• Optimisation strategies
–Client buffering and prefetching
A common characteristic of all three categories, used to mitigate
the effect of varying end-to-end delay and bandwidth.
–Adapting video quality to available bandwidth
–CDN based distribution
1. Streaming Stored Video
(Category1: UDP Streaming)
•Audio/video chunks are placed inside Real-Time Transport
Protocol (RTP) or other similar protocol packets, which become
the payload in UDP packets
•The server transmission rate matches the client video
consumption rate.
–Example: If the client video consumption rate is 2 Mbps, and each packet
contains 8000 bits, then the server should transmit one UDP packet every
8000 bits / 2 Mbps = 4 msec.
–The transmission rate can be oblivious to congestion levels
–A small client-side buffer holding 2–5 seconds of video is used to remove
network jitter
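The pacing arithmetic from the example:

```python
# Pacing from the example: 2 Mbps consumption rate, 8000-bit packets
# -> one UDP packet roughly every 4 ms.
packet_bits = 8000
consumption_bps = 2_000_000
interval_ms = packet_bits / consumption_bps * 1000   # ≈ 4 ms between packets
print(interval_ms)
```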
1. Streaming Stored Video
(Category1: UDP Streaming)
• For interactivity (i.e. pause, reposition, resume etc.), a
separate parallel “control” connection to the server via Real-
Time Streaming Protocol (RTSP) is used.
Limitations:
• Due to the unpredictable and varying amount of available bandwidth,
constant-rate streaming may fail to provide continuous playout:
freezing and skipped frames
• The requirement for a media control server (e.g., RTSP server) to
support interactivity increases cost and complexity, and limits
scalability
• UDP traffic may be blocked by many firewalls
1. Streaming Stored Video
(Category2: HTTP Streaming)
• The multimedia file is stored at an HTTP server as an ordinary
file.
• Retrieved via its URL:
• The client opens a TCP connection with the server and sends an HTTP GET
request for the URL
• The server returns an HTTP response message (the video file)
• The client buffers incoming video; when the buffer reaches a certain level,
the playout begins.
• YouTube and Netflix use HTTP streaming over TCP
1. Streaming Stored Video
(Category2: HTTP Streaming)
• Features
• HTTP/TCP passes more easily through firewalls and NATs
• Can manage without a media control server (like RTSP), thus scalable
• The fill rate fluctuates due to TCP congestion control and
retransmissions (in-order delivery)
• However, the server keeps sending bits at the maximum rate that TCP allows
• A form of prefetching from the client perspective handles jitter;
additionally, a larger playout delay smooths the TCP delivery rate
• Early termination and repositioning of the video
(interactivity):
• HTTP byte-range header (in the HTTP GET message)
• The server forgets about the earlier request and starts sending bytes from the
point specified in the header
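A minimal sketch of such a repositioning request using Python's standard library (the URL is hypothetical; only the Range header construction is shown, and no request is actually sent):

```python
# Build (but do not send) a ranged GET: the Range header tells the server
# to start sending bytes from the given offset.
import urllib.request

offset = 1_000_000   # e.g., reposition 1 MB into the video file
req = urllib.request.Request('http://example.com/video.mp4',
                             headers={'Range': f'bytes={offset}-'})
print(req.get_header('Range'))   # bytes=1000000-
```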
1. Streaming Stored Video
(Category2: HTTP Streaming)
Streaming stored video over HTTP/TCP
1. Streaming Stored Video
(Category 3: Adaptive HTTP Streaming via DASH)
• In HTTP streaming (YouTube at its inception):
• Every user receives the same encoding of the video, despite differences in
bandwidth across users and bandwidth variation over time.
• DASH: Dynamic Adaptive Streaming over HTTP
• Server:
• divides the video file into multiple chunks
• each chunk is stored encoded at several different rates (e.g. suitable for
3G or fibre access)
–manifest file: provides URLs for the different chunks
• Client:
• periodically measures server-to-client bandwidth
• consulting the manifest, requests one chunk at a time
• chooses the maximum coding rate sustainable given the current bandwidth,
via a rate determination algorithm
• can choose different coding rates at different points in time (depending on the
available bandwidth at any given point in time)
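A sketch of a (much simplified) rate-determination step; the encoding rates and the 0.8 safety factor are assumptions for illustration, not part of the DASH standard:

```python
# Pick the highest available encoding rate that the measured bandwidth can
# sustain, falling back to the lowest rate when bandwidth is very poor.
def choose_rate(rates_bps, measured_bw_bps, safety=0.8):
    usable = [r for r in sorted(rates_bps) if r <= safety * measured_bw_bps]
    return usable[-1] if usable else min(rates_bps)

rates = [300_000, 700_000, 1_500_000, 3_000_000]       # illustrative chunk encodings
print(choose_rate(rates, measured_bw_bps=2_000_000))   # picks 1_500_000
```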
1. Streaming Stored Video
(Category 3: Adaptive HTTP Streaming via DASH)
Reference: https://bitmovin.com/dynamic-adaptive-streaming-http-mpeg-dash/
1. Streaming Stored Video
(Category 3: Adaptive HTTP Streaming via DASH)
• Features:
• Reduces startup delay and buffering stalls
• By-passes NATs and Firewalls by using HTTP
• “Intelligence” at client: client determines
–when to request chunk (so that buffer starvation, or overflow does not
occur)
–what encoding rate to request (higher quality when more bandwidth
available)
–where to request chunk (can request from URL server that is “close” to
client or has high available bandwidth)
• A byproduct: improved server-side scalability
Content Distribution Network
CDN
(Content Distribution Challenge)
•Many Internet video companies distribute on-demand
multi-Mbps streams to millions of users
•Challenge: how to stream content (selected from millions of
videos) to hundreds of thousands of simultaneous users while
providing continuous playout and high interactivity?
•Option 1: single, large “mega-server”/massive data center
–single point of failure
–point of network congestion
–long path to distant clients: if the client is far from the data center
CDN
(Without and With CDN)
CDN
(A better approach)
• A CDN consists of multiple server clusters distributed
geographically around the world
• Stores copies of videos and other web content (including documents,
images and audio)
• Serves a user request for content from the “closest” CDN
location holding the requested content
• A CDN can be:
• Private CDN: owned by the content provider itself (e.g. Google's
CDN distributes YouTube videos and other types of content)
• Third-party CDN: distributes content on behalf of multiple
content providers (Akamai's CDN distributes Netflix and Hulu
content, among others)
CDN
(A better approach)
Two approaches for placement of CDN server clusters
1.Enter deep: Push CDN servers deep into many access networks, by
deploying server clusters in access ISPs all over the world
- Close to users; improves user-perceived delay and throughput by
decreasing the number of links and routers traversed (used by Akamai, 1700 locations)
- The task of maintaining and managing the clusters becomes challenging
2.Bring home: Build large clusters at a smaller number (tens) of key
locations and connect these clusters using a private high-speed network
(used by Limelight).
- Typically near Tier-1 ISPs.
- Content owners such as media companies and e-commerce vendors pay CDN
operators to deliver their content to their end users.
CDN
(CDNs deployed near Point of Presence)
Google has Bring Home CDNs deployed near Tier 1 ISPs
CDN
(CDN operation)
Suppose a content provider, NetCinema, employs the third-party CDN KingCDN to distribute videos
to its customers
1. The user visits the web page at NetCinema and clicks the video link
http://video.netcinema.com/6Y7B23V
2. The user's host sends a DNS query for video.netcinema.com
3a. The user's LDNS relays the DNS query to the authoritative DNS server for
NetCinema.
3b. Seeing the string 'video', instead of returning an IP address, the
authoritative DNS server returns KingCDN's domain, e.g.
a1105.kingcdn.com, to the LDNS
4a. The LDNS query then enters KingCDN's private DNS infrastructure.
4b. KingCDN's DNS system eventually returns the IP address of the content server to
the LDNS.
5. The LDNS forwards the IP address of the content server to the client.
6. The client then establishes a TCP connection with the server and issues an HTTP GET
request for the video (the manifest file helps the client choose the appropriate version of the video).
CDN
(CDN operation)
DNS directs user request to CDN
CDN
(Selection Strategies)
Problem: how does CDN DNS select “good” CDN cluster to stream
requested content to client?
Some possible solution strategies:
1) Pick the CDN cluster geographically closest to the client's LDNS server; several
drawbacks:
- Geographically closest does not mean closest in network terms (i.e. path length)
- May not be a good choice when clients use remote local DNS servers
- Network dynamics are not taken into account (ignores the variation in delay and
available bandwidth over time)
2) Measurement based selection of best CDN cluster.
- Active measurement of delay and loss through dedicated probes.
- Passive monitoring of recent/ongoing traffic between clients and CDN servers
(observing delay suffered by SYNACK and ACK messages during 3-way handshake)
CDN
(Selection Strategies)
•3. IP anycast: The routers in the Internet route the client's packet to the
“closest” cluster, as determined by BGP.
–In IP anycast, the CDN company assigns the same IP address to each of its
cluster locations
–When a BGP router receives multiple route advertisements for this same IP
address, it treats these advertisements as providing different paths to the same
physical location.
–Following standard operating procedures, the BGP router will then pick the
“best” route to the IP address (e.g. the closest as determined by AS-hop counts)
according to its route-selection mechanism.
–This approach has the advantage of finding the cluster that is closest to the
client rather than the cluster that is closest to the client's LDNS.
–It does not take into account the dynamic nature of the Internet over short time
scales.
CDN Cluster Selection Strategies
Using IP anycast to route clients to closest CDN cluster
CDN Cluster Selection Strategies
Other factors impacting the cluster selection strategy:
– Delay
– Loss
– Bandwidth performance
– Load on the cluster
– ISP delivery cost
(i.e. contractual relationships between ISPs and the cluster operators)
Peer-to-Peer (P2P) Assisted Streaming
P2P Video Streaming
Mostly used in Live Video Streaming: (e.g. P2PTV)
- Each user, while downloading a video stream, is simultaneously also
uploading that stream to other users, thus contributing to the overall
available bandwidth.
- The arriving streams are typically a few minutes time-delayed compared
to the original sources.
- If a user wishes to view a certain channel, he is directed to the "tracker
server" for that channel in order to obtain addresses of peers who
distribute that channel; it then contacts these peers to receive the feed.
- The tracker records the user's address, so that it can be given to other
users who wish to view the same channel.
- The need for a tracker can also be eliminated by the use of distributed
hash table (DHT) technology, where the tracker is deployed at each peer
node (out of scope of this study)
P2P Video Streaming
Case Study 1: Netflix
• Netflix
• 30% downstream US traffic in 2011
• Leading service provider for online movies and TV shows
• Owns very little infrastructure, uses 3rd party services.
• Rents servers, bandwidth, storage and database services from 3rd parties.
• Employs both CDN and adaptive streaming over HTTP
• Amazon cloud
•Four major components
• Registration and payment servers (Own). Handles registration of new account and
credit card payments.
• Content ingestion: Before Netflix distributes a movie to its customers, it must first
ingest and process the movie. Netflix uploads studio master to Amazon cloud
• Content processing: Creates multiple versions of movie (different encodings), suitable
for diverse client video players and data rates.
• Uploading versions to the CDNs: Upload versions from cloud to CDNs
• Multiple CDN providers and clients
• Three 3rd party CDNs host/stream Netflix content: Akamai, Limelight, Level-3
Case Study 1: Netflix
Netflix video streaming platform
Case Study 2: YouTube
• Half a billion videos in its library and half a billion video views
per day (A study conducted in 2011)
• World's largest video-sharing site
• Extensive use of CDN technology
• Google uses its private CDN to distribute YouTube videos.
• Google has installed server clusters in many different locations.
• Google uses DNS to redirect a customer request to a specific
cluster.
Case Study 2: YouTube
• Selection strategy:
• Mostly the cluster that results in the lowest RTT to the client.
• Sometimes, to balance load, a client is directed to a more distant
cluster
• If a cluster does not have the requested video, instead of fetching it
from somewhere else, the client is redirected to another cluster.
• Previously YouTube used HTTP streaming with a small
number of different versions of each video, each with a different
bit rate and quality.
• Recently YouTube has adopted adaptive HTTP streaming
• Once a video is uploaded to YouTube, Google creates
multiple versions of it at its data centers.
Case Study3: KanKan
• Netflix and YouTube both use CDNs and therefore
have to bear the cost
• Kankan avoids CDNs, thereby reducing its
infrastructure and bandwidth costs
• Kankan is the leading P2P-based video-on-demand provider in
China (has over 30 million unique users per month)
• When a peer wants to see a video, it:
• Contacts a tracker (centralized or peer-based DHT) to discover
other peers in the system that have a copy of that video
• Then requests chunks of the video file in parallel
from the other peers that have the file
• Requests are preferentially made for the chunks to be viewed in the
near future (i.e. to ensure continuous playout)
2. Streaming Live Audio and Video
(Introduction)
• Live streaming comes from a content source such as
video cameras and microphones. It is made available at
the same time as the event
being filmed occurs.
• Similar to traditional broadcast radio and TV, except that
transmission is on Internet.
2. Streaming Live Audio and Video
(Features)
• Applications
– Internet Radio (Talk Show)
– TV broadcast (News, TV shows)
– Live sporting event, IPTV (pplive, ppstream etc)
• Streaming (as with streaming stored multimedia)
– Playback buffer
– Playback can lag tens of seconds after transmission
– Delay constraints more stringent than streaming stored video but less than
conversational voice/video
• Interactivity
– Fast forward (impossible) BUT Rewind, pause (possible)
• Continuity
– Average throughput greater than video rate desired
– Forward error correction (FEC) more effective than reactive loss recovery
2. Streaming Live Audio and Video
(Distribution)
• Distribution of live audio/video to many receivers can be
done via:
– Network layer approaches
Multiple unicast streams
IP multicast streams
– Application layer approach
Multicast using P2P networks or CDNs.
• Network layer approaches
– Multiple Unicast streams
– One-to-one connection between the server and a client
– Each client receives a distinct stream
–Burden upon the system and links (due to redundant packet generation
and transmission), but enables interactivity.
2. Streaming Live Audio and Video
(Distribution)
• IP multicast streams
• A single flow, from server to a group of clients
• IP packets have multicast address in class D
• Minor burden on server (sends a single copy)
• Better capacity utilization of network (single copy of message
on a link)
• Requirements
• Group management
• Packet replication at network nodes (from a single input port
to many output ports).
• All clients receive the same stream and do not have control of
content playback
2. Streaming Live Audio and Video
(Distribution)
3. Conversational Voice and Video over
Internet
• Includes group meetings involving more than two participants
• Voice/video generated by every member (E.g., Skype,
Google Talk)
3. Conversational Voice and Video over
Internet
• We will focus on conversational voice over Internet /
Internet telephony / Voice-over-IP (VoIP)
– Real-time conversational voice over the Internet (i.e.
Internet Telephony)
– Protocols: RTP, SIP
3. Conversational Voice and Video over Internet
(VoIP: Characteristics)
• Digitised voice encapsulated in packets and transported between 2 or
more VoIP call participants
• Highly delay sensitive
≤ 150ms ideal
> 400ms unacceptable
• Loss-tolerant
Occasional glitches in video/audio playback
Loss recovery schemes and error concealment: FEC, interleaving
3. Conversational Voice and Video over Internet
(VoIP: Best Effort delivery and its limitations)
• IP provides best-effort delivery:
– The service tries its best to move a packet from source to destination
– BUT it guarantees neither speedy delivery nor zero packet loss
• Limitations of best-effort IP service in the context of VoIP
– Example:
The sender generates bytes at a rate of 8,000 bytes per second; every 20 msecs the
sender gathers these bytes into a chunk.
A chunk with a header is encapsulated in a UDP segment. The number of bytes in
a chunk is (20 msecs)(8,000 B/sec) = 160 B.
With a constant end-to-end delay, packets arrive at the receiver periodically every
20 msecs.
If, unfortunately, the delay is varying and the network is congested, then some
packets will arrive with delay and some packets will be lost
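The chunk-size arithmetic from the example:

```python
# Chunk size from the example: 8000 bytes/sec gathered every 20 msecs.
bytes_per_sec = 8000
chunk_interval_s = 0.020
chunk_bytes = round(bytes_per_sec * chunk_interval_s)
print(chunk_bytes)   # 160 bytes per chunk, one chunk every 20 msecs
```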
3. Conversational Voice and Video over Internet
(VoIP: Best Effort delivery and its limitations)
– Packet Loss
TCP could solve it by retransmission, BUT retransmission is unacceptable for
conversational VoIP applications. Most existing VoIP applications use UDP.
Packet loss rates from 1 to 20% can be tolerated, depending on how the voice is
encoded and transmitted, and how the loss is concealed (FEC)
If the loss is larger than what is tolerable, nothing can be done to recover
acceptable quality
3. Conversational Voice and Video over Internet
(VoIP: Best Effort delivery and its limitations)
– End-to-End Delay
End-to-end delay is the sum of:
o Transmission, processing and queuing delay at each router
o Propagation delay of each link on the path
o End-system processing delays
o Packets with delay > 400msec are disregarded by receiver.
3. Conversational Voice and Video over Internet
(VoIP: Best Effort delivery and its limitations)
– Packet Jitter
– Varying queuing delays (per packet) at routers on the path towards the destination:
some packets arrive in a burst, others with greater delay
– An earlier-transmitted packet may arrive later, while a later-transmitted packet
may arrive earlier (when transmitted on paths having different packet loads)
– If the receiver plays out chunks as soon as they arrive, the
resulting audio quality can easily become unintelligible.
[Figure: perfect stream vs. jittered stream]
3. Conversational Voice and Video over Internet
(VoIP: Removing Jitter at Receiver for audio)
• Fortunately, jitter can often be removed by using sequence
numbers, timestamps, and a playout delay
• Timestamps tell when each chunk was generated and at what time
gap to play it
• Playout of chunks is delayed at the receiver; the delay can be either fixed
or adaptive. This enables continuous audio playout.
– Fixed playout: Each chunk is generated at t time, played at t+q time. Any delayed
chunk is discarded. (has delay-loss trade-off). Long playout delay not good for
conversation.
– Adaptive playout: Estimate the network delay and the variance of the
network delay, and adjusts the playout delay accordingly at the beginning of each
talk spurt. So for each talk spurt the delay of playout may be different
3. Conversational Voice and Video over Internet
(VoIP: Delayed Playout)
3. Conversational Voice and Video over Internet
(VoIP: Removing Jitter at Receiver for audio)
– Adaptive playout algorithm:
– ti: time when packet i is generated by the sender
– ri: time when packet i is received by the receiver
– pi: time when packet i is played out by the receiver
– di: average network delay estimated when packet i is received:
di = (1-u)di-1 + u(ri-ti), where u = 0.001
– vi: average deviation of the delay from the estimated average delay:
vi = (1-u)vi-1 + u|ri-ti-di|
– pi: playout time for the first packet in a talk spurt:
pi = ti + di + Kvi, where K = 4
– qi = pi-ti (playout delay for packet i)
– The playout delay for every packet j in a talk spurt whose first packet is i is:
pj = tj + qi
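The estimator updates above can be sketched as follows (u = 0.001 and K = 4 as on the slide; the send/receive times are made-up):

```python
# Exponentially weighted estimates of average delay (d) and deviation (v),
# then the playout time of the first packet of a talk spurt.
U, K = 0.001, 4

def update(d, v, t_i, r_i):
    d = (1 - U) * d + U * (r_i - t_i)          # di = (1-u)di-1 + u(ri-ti)
    v = (1 - U) * v + U * abs(r_i - t_i - d)   # vi = (1-u)vi-1 + u|ri-ti-di|
    return d, v

d, v = 0.0, 0.0
for t_i, r_i in [(0.00, 0.10), (0.02, 0.13), (0.04, 0.15)]:  # made-up times (sec)
    d, v = update(d, v, t_i, r_i)

t_first = 0.06                  # first packet of the next talk spurt
p_first = t_first + d + K * v   # pi = ti + di + K*vi
q = p_first - t_first           # qi: playout delay reused for the whole spurt
print(d, v, q)
```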
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• A packet is lost either if it never arrives at the receiver or
if it arrives after its scheduled playout time.
• Retransmission of a packet that has missed playout
deadline serves no purpose.
• Loss recovery schemes used by VoIP
– Forward Error Correction
– Interleaving
– Error Concealment
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Forward Error Correction
– Adds redundant information to the original packet stream
– The redundant information helps recover lost information
– First mechanism:
Send a redundant chunk after every n chunks
The redundant chunk is generated by XORing the n original chunks
If one packet in the group of n+1 is lost, it can be fully recovered
If more than one packet is lost, the loss cannot be recovered
If n is small, a larger fraction of lost packets can be recovered; the
downside is an increased transmission rate
Example: if n = 3, the transmission rate rises by 1/3, or 33%
This scheme also raises the playout delay: the receiver has to wait for the
entire group of packets to arrive.
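The XOR mechanism can be sketched directly. This is a minimal illustration of the scheme described above (n = 3 chunks plus one redundant chunk); the byte strings are arbitrary stand-ins for audio chunks.

```python
# Toy sketch of XOR-based FEC: one redundant chunk per group of n chunks.
def make_redundant(chunks):
    """XOR n equal-length chunks into one redundant chunk."""
    red = bytearray(len(chunks[0]))
    for c in chunks:
        for j, b in enumerate(c):
            red[j] ^= b
    return bytes(red)

def recover(received, redundant):
    """Recover the single missing chunk (the None entry) from a group."""
    missing = [i for i, c in enumerate(received) if c is None]
    if len(missing) != 1:
        return None          # zero losses (nothing to do) or >1 (unrecoverable)
    rec = bytearray(redundant)
    for c in received:
        if c is not None:
            for j, b in enumerate(c):
                rec[j] ^= b
    return bytes(rec)

group = [b"abcd", b"efgh", b"ijkl"]   # n = 3 original chunks
red = make_redundant(group)
lost = [group[0], None, group[2]]     # chunk 2 lost in transit
assert recover(lost, red) == b"efgh"  # fully recovered from the XOR chunk
```

XORing the redundant chunk with the n-1 surviving chunks cancels them out, leaving exactly the missing chunk.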
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Forward Error Correction
– Second mechanism:
Send a lower-resolution stream as the redundant information.
Each nth nominal bit-rate chunk is grouped with a low-bit-rate copy of the
(n-1)th chunk, carrying the same information as that earlier chunk but at
lower quality.
When losses are non-consecutive and a nominal bit-rate chunk is lost, the
low-quality copy is played instead.
Compared with the earlier mechanism, the receiver only has to wait for two
chunks before playback, so the playout delay is small.
If the low-quality stream has a much smaller bit-rate, the rise in
transmission rate is marginal.
To recover from consecutive losses, each nth nominal chunk can be grouped
with low-bit-rate copies of chunks (n-1) and (n-2), or (n-1) and (n-3).
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Forward Error Correction
• Piggybacking lower quality redundant information
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Interleaving
– The original consecutive units are separated by a distance in
transmission stream
– Mitigates effect of packet loss
– Example:
Each unit is 5 msec
Each chunk is 20 msec
So each chunk has 4 units: the first chunk would normally carry units
1, 2, 3 and 4, and so on.
But in transmission the first chunk carries units 1, 5, 9 and 13, the
second chunk carries units 2, 6, 10 and 14, and so on.
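The example above can be sketched as a column-wise regrouping of units. This is a minimal illustration with 16 numbered units and 4 units per chunk, matching the slide's numbers.

```python
# Sketch of interleaving: chunk k carries units k, k+s, k+2s, ... where the
# stride s equals the number of chunks in the group.
def interleave(units, per_chunk=4):
    n_chunks = len(units) // per_chunk
    return [[units[k + j * n_chunks] for j in range(per_chunk)]
            for k in range(n_chunks)]

units = list(range(1, 17))          # 16 units of 5 ms each
chunks = interleave(units)
assert chunks[0] == [1, 5, 9, 13]   # as in the slide's example
assert chunks[1] == [2, 6, 10, 14]

# Losing chunk 0 now costs four isolated 5 ms gaps (units 1, 5, 9, 13)
# instead of one contiguous 20 ms gap (units 1-4).
```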
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Interleaving
• The loss of a single packet from an interleaved stream results in
multiple small gaps in the reconstructed stream, as opposed to
the single large gap that would occur in a non-interleaved stream.
This improves the perceived quality of an audio stream.
• It also has low overhead: it does not raise the bandwidth
requirement.
• The disadvantage of interleaving is that it increases latency
– This limits its use for VoIP
– BUT it can be used for streaming stored video/audio
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Interleaving
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Error Concealment (Packet Repetition):
• Attempts to produce a replacement for a lost packet that is similar to the
original
• Possible since audio signals, and in particular speech, exhibit large amounts
of short-term self-similarity.
• Works for relatively small loss rates (< 15%) and for small packets (4-
40 msec)
• Packet repetition: fill the gap with a copy of the packet that arrived
immediately before the loss. Low computational complexity; performs
reasonably well.
(Figure: the lost packet between the preceding and succeeding packets is
replaced by a repetition of the preceding packet.)
3. Conversational Voice and Video over Internet
(VoIP: Recovering from packet Loss)
• Error Concealment (Interpolation):
• Uses the audio before and after the loss to interpolate (estimate)
the information in the lost packet.
• Performs better than packet repetition BUT is somewhat
computationally expensive
(Figure: the lost packet between the preceding and succeeding packets is
replaced by an interpolated packet.)
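The two concealment schemes can be contrasted in a few lines. This is a toy sketch on lists of sample values (standing in for decoded audio), not a real codec's concealment algorithm; real interpolation is more elaborate than the per-sample averaging shown here.

```python
# Sketch of the two error-concealment schemes on toy sample arrays.
def conceal_repeat(prev_pkt, _next_pkt):
    """Packet repetition: replay the packet received just before the loss."""
    return list(prev_pkt)

def conceal_interpolate(prev_pkt, next_pkt):
    """Interpolation: estimate lost samples from the surrounding packets
    (here, naively, as the per-sample average)."""
    return [(a + b) // 2 for a, b in zip(prev_pkt, next_pkt)]

prev_pkt, next_pkt = [100, 110, 120], [140, 150, 160]
assert conceal_repeat(prev_pkt, next_pkt) == [100, 110, 120]
assert conceal_interpolate(prev_pkt, next_pkt) == [120, 130, 140]
```

Repetition is nearly free; interpolation costs a pass over the samples but tracks the signal's trend, which is why it sounds better.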
3. Conversational Voice and Video over Internet
(Case Study: Skype)
• Supports a variety of conversational voice/video-over-IP services: host-
to-host VoIP, host-to-phone, phone-to-host, and multi-party host-to-host
video conferencing
• A host is again any Internet-connected IP device (including
PCs, tablets, and smartphones.)
• A property of Microsoft
• A proprietary system:
– Control and media messages are encrypted, so one is unable to see how it
operates
– What we know about it comes mainly from reverse engineering by researchers
using measurements
3. Conversational Voice and Video over Internet
(Case Study: Skype)
• Uses a wide variety of audio and video codecs
– Offering different audio/video rates and qualities
– Video rates range from 30 kbps to 1 Mbps
– Audio codecs use a higher sampling rate of 16,000
samples/second.
The plain old telephone system (POTS) provides 8,000 samples/sec
Skype audio quality is usually better than POTS
• Control packets are sent over TCP
• By default, audio and video packets are sent over UDP
– Uses FEC for loss recovery
• Adapts audio and video streams to current network conditions by varying
the audio/video bit-rate and FEC overhead
3. Conversational Voice and Video over Internet
(Case Study: Skype has an overlay network)
• A hierarchical overlay network with
– Super nodes (SNs)/ Super Peers and
– Skype clients / client peers
(Figure: Skype clients (SC) attach to super nodes (SN), which form a super
node overlay network; a Skype login server connects to the overlay.)
3. Conversational Voice and Video over Internet
(Case Study: Skype A P2P system)
• User location:
• Skype maintains a distributed index that maps Skype usernames to current IP
addresses (and port numbers).
• The index is quite possibly organized as a Distributed Hash Table (DHT)
across the super peers.
(Figure: Alice searches the distributed index to find Bob's IP address.)
3. Conversational Voice and Video over Internet
(Case Study: Skype A P2P system)
• NAT Traversal:
• Problem: if both parties behind NATs
– NAT prevents outside peer from
initiating connection to a peer behind
it
– Peer behind NAT can however initiate
connection to outside
• Super peers can enable
connection between client peers
(that are behind NAT)
3. Conversational Voice and Video over Internet
(Case Study: Skype A P2P system)
• Host-to-host Internet
telephony is inherently P2P
• SN as a relay
– Clients maintain open connections to
their respective SNs
– The caller client notifies its SN that it
wants to connect to the callee
– The SNs for the caller and callee
connect with each other
– Another SN is identified as a relay
between the two parties; both
connect to that SN and engage in the
call
3. Conversational Voice and Video over Internet
(Case Study: Multi-party Skype conferences)
• Multi-Party = number of participants, N > 2
• With audio streams, the conference initiator collects streams
from all parties, combines them and distributes back the
combined stream
– reduces communication overhead from N(N-1) to
2(N-1) streams
• With video streams, each participant’s video stream sent to a
server cluster which in turn sends to the rest of the participants
– N(N-1) streams sent in total but does not stress the upstream access
links which typically have much lower bandwidth than downstream
links
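The stream-count arithmetic above is easy to check. A minimal sketch, with function names invented purely for illustration:

```python
# Stream counts for an N-party conference, per the slide's analysis.
def full_mesh_streams(n):
    """Every party sends its stream to every other party: N(N-1) streams."""
    return n * (n - 1)

def initiator_mix_streams(n):
    """Audio mixing: N-1 streams up to the initiator, N-1 mixed streams
    back down: 2(N-1) streams."""
    return 2 * (n - 1)

# Four-party call: mixing cuts 12 streams down to 6.
assert full_mesh_streams(4) == 12
assert initiator_mix_streams(4) == 6
```

For video, the server-cluster approach still carries N(N-1) streams in total, but each participant uploads its stream only once, sparing the slow upstream link.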
Protocols for Real-Time Interactive
Applications
Protocols for Real Time Interactive
Applications
• Real Time Protocol
• Real Time Control Protocol
• Session Initiation Protocol
• H.323
Protocols for Real Time Interactive
Applications
Real-Time Protocol (RTP)
• RTCP is RTP's companion protocol
• Complementary to SIP and H.323
• It runs on top of UDP
• Specifies packet structure for packets carrying audio,
video data
• RFC 3550
• RTP runs in end systems only
– (not by intermediate routers)
• Interoperability:
– If two VoIP applications run RTP,
– they may be able to work together
Protocols for Real Time Interactive
Applications
Real-Time Protocol (RTP)
• Audio and video each get a separate RTP stream
• BUT if the encoding process bundles audio and video
into one stream, then one RTP stream is generated (e.g., in
MPEG 1 & 2)
• RTP packet provides
– payload type identification
– packet sequence numbering
– time stamping
• RTP does not provide any mechanism to
ensure timely data delivery or other
Quality of Service (QoS) guarantees
Protocols for Real Time Interactive
Applications
(RTP Example)
Consider sending 64 kbps PCM-encoded voice over RTP
• Application collects encoded data in chunks,
• e.g., every 20 msec = 160 bytes in a chunk
• audio chunk + RTP header form RTP packet, which
is encapsulated in UDP segment
• RTP header indicates type of audio encoding in each
packet
– Sender can change encoding during conference
(UDP HEADER, (RTP HEADER,(CHUNK)))
Protocols for Real Time Interactive
Applications
(RTP Header Format)
Payload type | Sequence number | Timestamp | Synchronization Source ID | Miscellaneous fields
• Payload type (7 bits): indicates type of encoding currently being used
– Payload type 0: PCM mu-law, 64 kbps
– Payload type 3: GSM, 13 kbps
– Payload type 7: LPC, 2.4 kbps
– Payload type 26: Motion JPEG
– Payload type 31: H.261
– Payload type 33: MPEG2 video
• If sender changes encoding during call, sender informs receiver via
payload type field
• Sequence number (16 bits): increment by one for each RTP packet
sent
– detect packet loss, restore packet sequence
Protocols for Real Time Interactive
Applications
(RTP Header Format)
Payload type | Sequence number | Timestamp | Synchronization Source ID | Miscellaneous fields
• Timestamp field (32 bits long): sampling instant of first
byte in this RTP data packet
– for audio, timestamp clock increments by one for each sampling
period (e.g., each 125 µsecs for 8 KHz sampling clock)
– if application generates chunks of 160 encoded samples,
timestamp increases by 160 for each RTP packet when source is
active. Timestamp clock continues to increase at constant rate
even when source is inactive.
– For removing jitter and for synchronous playout at receiver.
• Synchronization Source or SSRC field (32 bits long):
identifies source of RTP stream. Each stream in RTP
session has distinct SSRC.
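The fixed RTP header fields described above can be packed in a few lines. This is a sketch of the 12-byte fixed header from RFC 3550, assuming no padding, no extension, no CSRC entries, and a cleared marker bit; the example values (seq 42, SSRC 0x1234) are arbitrary.

```python
import struct

# Minimal 12-byte RTP fixed header (RFC 3550) with the fields from the slide.
def rtp_header(payload_type, seq, timestamp, ssrc):
    v_p_x_cc = 2 << 6            # version 2, padding/extension bits 0, CC = 0
    m_pt = payload_type & 0x7F   # marker bit 0, 7-bit payload type
    # Network byte order: two bytes of flags, 16-bit seq, 32-bit ts, 32-bit SSRC
    return struct.pack("!BBHII", v_p_x_cc, m_pt, seq, timestamp, ssrc)

# PCM mu-law (payload type 0), 160 samples per 20 ms chunk at 8 kHz sampling:
hdr = rtp_header(payload_type=0, seq=42, timestamp=42 * 160, ssrc=0x1234)
assert len(hdr) == 12
```

Note how the timestamp advances by 160 per packet (one tick per sample), exactly as described for the 64 kbps PCM example.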
Protocols for Real Time Interactive
Applications
(RTCP: Real-Time Control Protocol)
• RTCP works in conjunction with RTP
– Each participant in RTP session periodically
sends RTCP control packets to all other
participants
• For an RTP session
– There is typically a single multi-cast
address
– All RTP and RTCP packets belonging to
same session uses same multicast address
– RTP and RTCP are distinguished by port
number, RTCP port#= RTP port#+1
Protocols for Real Time Interactive
Applications
(RTCP: Real-Time Control Protocol)
• RTCP packets
– Do not encapsulate chunks of video/audio
– Reports statistics periodically
• Each RTCP packet contains sender and/or receiver
reports useful to applications
– Number of packets sent
– Number of packets lost
– Inter-arrival jitter
• Feedback used to control performance
– Sender may modify its transmissions based on feedback
Protocols for Real Time Interactive
Applications
(RTCP Packet Types: Sender Report)
• For each RTP stream that a sender is transmitting, the sender creates
and transmits an RTCP sender report.
• A sender report includes information about the RTP stream, including:
• SSRC of the RTP stream: identification number of the RTP stream.
• Timestamp and wall clock time: of the most recently generated RTP
packet in the stream.
• Number of packets: sent in the RTP stream.
• Number of bytes: sent in the RTP stream.
• The sender report enables the receiver to synchronize different media
streams within an RTP session
Sender Report
(SSRC, Timestamp, Wall-clock time,
# of sent packets, # of sent bytes)
Protocols for Real Time Interactive
Applications
(RTCP Packet Types: Source Descriptors)
• For each RTP stream that a sender is transmitting, the sender also
creates and transmits a source description packet.
• These packets contain information about the source:
• SSRC of the associated RTP stream
• E-mail address of the sender.
• Sender's name.
• The application that generates the RTP stream
• These packets provide a mapping between the source identifier (that is,
the SSRC) and the user/host name.
• The sender report, receiver report and source descriptor can be
concatenated into a single packet. The resulting packet is then
encapsulated into a UDP segment.
Protocols for Real Time Interactive
Applications
(RTCP Packet Types: Receiver Report)
• Receiver generates a report for each RTP stream that it receives.
• It aggregates its reception reports into a single RTCP packet and sends
it into the multicast tree that connects all the session's participants.
• Important information in the reception report includes:
• SSRC of the RTP stream: For which the reception report is generated.
• Fraction of Packets Lost within RTP stream: If a sender receives
reception reports indicating that the receivers are receiving only a
small fraction of the sender’s transmitted packets, it can switch to a
lower encoding rate; with aim to improve the reception rate.
• Last sequence number: of the RTP packet received.
• Inter-arrival jitter: which is a smoothed estimate of the variation in the
inter-arrival time between successive packets in the RTP stream.
Receiver Report
(SSRC, Fraction of lost Packets,
Last seq. number, Inter-arrival jitter)
Protocols for Real Time Interactive
Applications
(RTCP Bandwidth Scaling)
• RTCP has scaling problem:
– An RTP session consists of one
sender and a large number of
receivers
– If each of the receivers periodically
generates RTCP packets, then the
aggregate transmission rate of
RTCP packets can greatly exceed
the rate of the RTP packets sent by
the sender
– The RTP traffic sent into the multicast
tree does not change as the
number of receivers increases,
whereas the amount of RTCP
traffic grows linearly with the
number of receivers.
Protocols for Real Time Interactive
Applications
(RTCP Bandwidth Scaling)
• RTCP attempts to limit its traffic to 5% of the session bandwidth
• Example: one sender, sending video at 2 Mbps
– RTCP attempts to limit RTCP traffic to 100 kbps
– RTCP gives 75% of this rate to the receivers and the
remaining 25% to the sender
• The 75 kbps is equally shared among receivers:
– with R receivers, each receiver gets to send RTCP
traffic at 75/R kbps
• The sender gets to send RTCP traffic at 25 kbps
• A participant determines its RTCP packet transmission
period by calculating the average RTCP packet size (across
the entire session) and dividing by its allocated rate
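The bandwidth-scaling arithmetic above can be worked through directly. A minimal sketch; the 100-byte average RTCP packet size at the end is an assumed value for illustration.

```python
# The slide's arithmetic: RTCP capped at 5% of session bandwidth,
# 25% of that to the sender, 75% shared equally among R receivers.
def rtcp_shares(session_bw_kbps, receivers):
    rtcp_bw = 0.05 * session_bw_kbps
    sender_share = 0.25 * rtcp_bw
    per_receiver = 0.75 * rtcp_bw / receivers
    return rtcp_bw, sender_share, per_receiver

# 2 Mbps video session with 10 receivers:
total, sender, per_rx = rtcp_shares(2000, receivers=10)
assert total == 100.0 and sender == 25.0 and per_rx == 7.5   # kbps

# Transmission period = average RTCP packet size / allocated rate.
avg_pkt_bits = 800                      # assumed 100-byte RTCP packet
period_s = avg_pkt_bits / (per_rx * 1000)
```

With these numbers, each receiver sends one RTCP packet roughly every 107 ms; as R grows, the per-receiver share shrinks and the reporting period stretches, keeping aggregate RTCP traffic bounded.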
Protocols for Real Time Interactive
Applications
(SIP: Session Initiation Protocol, RFC3261)
Long-term vision:
• All telephone calls, video
conference calls take place over
Internet
• People identified by names or
e-mail addresses, rather than
by phone numbers
• Can reach callee (if callee so
desires), no matter where callee
roams, no matter what IP
device callee is currently using.
– Callee can be at home, office or
driving etc.
– Callee can be at PC, PDA, Phone
etc.
Protocols for Real Time Interactive
Applications
SIP Example: Setting up a call to a known IP address
• Alice's SIP INVITE message indicates her port number, IP address, and
the encoding she prefers to receive (PCM µ-law)
• Bob's 200 OK message indicates his port number, IP address, and
preferred encoding (GSM)
• SIP messages can be sent over TCP or UDP; here they are sent over
UDP
• The default SIP port number is 5060
• If Bob could not use PCM µ-law encoding, he would simply send a 606
Not Acceptable message along with the encoding schemes he can
use; Alice would then send a new INVITE with a new encoding scheme.
• Bob can reject the call by sending "busy", "gone", "payment
required" or "forbidden" messages.
Protocols for Real Time Interactive
Applications
SIP Example: Setting up call to a known IP address
Protocols for Real Time Interactive
Applications
SIP Addresses
• SIP addresses resembles e-mail addresses
• Example: bob@domain.com
• When Alice's SIP device sends an INVITE message, the message
would include Bob's e-mail-like address.
• The SIP infrastructure then routes the message to the IP device that
Bob is currently using.
• Other possible forms can be:
– Bob's legacy phone number
– Bob's first/middle/last name (assuming it is unique)
• SIP addresses can be included in web pages (for visitors to call on
them), just like people's e-mail addresses (using a mailto URL)
Protocols for Real Time Interactive
Applications
SIP Messages
• SIP INVITE message from Alice to Bob, with Alice not
knowing IP address of the SIP device Bob is currently using.
Protocols for Real Time Interactive
Applications
SIP Name Translation and User Location
• Caller wants to call callee:
– But only has callee’s name or e-mail address.
– Need to get IP address of callee’s current host:
– User moves around
– DHCP protocol
– User has different IP devices (PC, smartphone, car device)
• SIP Proxy helps in mapping a callee’s name to IP address
• Result can be based on:
– Time of day (work, home)
– Caller (don’t want boss to call you at home)
– Status of callee (calls sent to voicemail when callee is already talking to
someone)
Protocols for Real Time Interactive
Applications
SIP Name Translation and User Location
• SIP registrar: Every SIP user has an associated server called
registrar. Whenever a user launches a SIP application on a
device, the application sends SIP register message to the
registrar informing registrar of its current IP address.
• When the device changes, a new SIP REGISTER message is sent.
• If the device and IP address are unchanged, a refresh message is
sent to the registrar (every 3600 sec), stating that the IP address is
still valid.
SIP REGISTER message:
REGISTER sip:domain.com SIP/2.0
Via: SIP/2.0/UDP 193.64.210.89
From: sip:bob@domain.com
To: sip:bob@domain.com
Expires: 3600
Protocols for Real Time Interactive Applications
SIP Name Translation and User Location
• SIP Proxy:
– Alice sends invite message to her proxy server
Ø contains address sip:bob@domain.com
Ø proxy responsible for routing SIP messages to callee, possibly through multiple
proxies
– Bob sends response back through same set of SIP proxies
– Proxy returns Bob’s SIP response message to Alice
– Contains Bob’s IP address
Protocols for Real Time Interactive Applications
SIP Example: jim@umass.edu calls keith@poly.edu
1. Jim sends an INVITE message to the UMass SIP proxy.
2. The UMass proxy forwards the request to the Poly registrar server.
3. The Poly server returns a redirect response, indicating that it should try
keith@eurecom.fr.
4. The UMass proxy forwards the request to the Eurecom registrar server.
5. The Eurecom registrar forwards the INVITE to 197.87.54.21, which is
running Keith's SIP client.
6-8. The SIP response is returned to Jim through the same chain of servers.
9. Data flows directly between the two SIP clients
(128.119.40.186 and 197.87.54.21).
Protocols for Real Time Interactive
Applications H.323 [ITU-T]
• An alternative to SIP
• Popular for real-time audio and video conferencing.
• The standard also covers how Internet end systems
attach to telephones in the circuit-switched network
Protocols for Real Time Interactive
Applications H.323 [ITU-T]
• Terminals (what people see/hear)
– Communication devices
– Multimedia PCs, any stand-alone device, a simple telephone
– Must support audio (video) communication
• Gateways (control and ‘routing’)
– Speaks H.323 protocols on Internet side and PSTN protocols on telephone side
– Data format translation
– Audio/video codec translation
– Call setup, termination from both sides of the network
Protocols for Real Time Interactive
Applications H.323 [ITU-T]
• Multipoint Control Units (optional)
– Provides capability of video-conferencing with more than one party.
– Acts as a co-ordinator of multiparty conferences
• Gatekeepers (brain of H.323, access to other environments)
– Address translation
– Admissions Control
– Bandwidth Control
– Zone Management (Zone consists of terminals in Gatekeeper’s jurisdiction)
– Routing Capabilities
Protocols for Real Time Interactive
Applications H.323
• An umbrella specification that includes following
specifications:
• How end-points negotiate common audio/video
encodings
• How audio and video chunks are encapsulated and sent
over the network.
• How endpoints communicate with their respective
gatekeepers
• How internet phones communicate with PSTN phones via
a gateway
Protocols for Real Time Interactive Applications
H.323
• H.323 defines protocols:
– G.711: audio encoding scheme (PCM), 8000 samples/sec with 8 bits per
sample, giving uncompressed speech at 64 kbps
– G.723.1: also a speech encoding scheme. This algorithm gives an output rate
of either 6.4 kbps or 5.3 kbps (compression factors of 10 and 12)
– H.245: since multiple compression algorithms are permitted, a protocol is
needed to allow terminals to negotiate which one they are going to use.
H.245 also negotiates other aspects such as bit rate.
– RTP: actual data transmission
– RTCP: for control of the RTP channel
– Q.931: for establishing and releasing connections, providing dial tones,
making ringing sounds, etc.
– H.225: terminals talk to the gatekeeper, including about
Ø RAS (registration/admission/status)
Ø Terminals joining/leaving a zone
Ø Requesting/returning bandwidth
Ø Status updates
Protocols for Real Time Interactive Applications
H.323 stack
Protocols for Real Time Interactive Applications
H.323 stack
• How Protocol stack is used
• Consider a PC on a LAN calling a remote telephone
• The PC has to discover the gatekeeper, so it broadcasts a UDP gatekeeper
discovery message to port 1718.
• When the gatekeeper responds, the PC learns its IP address
• Now the PC registers with the gatekeeper by sending it a RAS message in a
UDP packet
• After the gatekeeper accepts, the PC sends the gatekeeper a RAS admission
message requesting bandwidth
• After bandwidth is granted, call setup may begin. The PC now establishes a TCP
connection to the gatekeeper to begin call setup. It sends a Q.931 SETUP
message over the connection, specifying the number of the telephone being called.
• The gatekeeper responds with a Q.931 CALL PROCEEDING message to ACK
correct receipt of the request, and forwards the SETUP message to the
gateway.
Protocols for Real Time Interactive Applications
H.323 stack
• How Protocol stack is used
• The gateway, which is half computer and half telephone, then makes an ordinary
telephone call to the desired number.
• The end office to which the telephone is attached rings the called telephone and
also sends back a Q.931 ALERT message to tell the calling PC that ringing has
begun.
• When the person at the other end picks up the telephone, the end office sends
back a Q.931 CONNECT message to the PC.
• Once the connection is established, the gatekeeper is no longer in the loop, but
the gateway is; subsequent packets go directly to the gateway. The H.245 protocol
is now used to negotiate the parameters of the call.
• H.245 uses a control channel; each side announces its capabilities (i.e., whether it
can handle video or conference calls, which codecs it supports, etc.).
• Once each side knows what the other can handle, two unidirectional data channels
are set up, and a codec and the other parameters are assigned to each one.
Protocols for Real Time Interactive Applications
H.323 stack
• How Protocol stack is used
• Data flow is done by RTP
• RTCP manages RTP and plays a role in congestion control. If video is present,
RTCP handles synchronization of audio and video.
• When either party hangs up, the Q.931 call signaling channel is used to tear down
the connection
• When the call terminates, the calling PC contacts the gatekeeper again with a RAS
message to release the bandwidth.
Protocols for Real Time Interactive Applications
H.323 vs SIP
H.323 [ITU-T]:
• Complete, vertically integrated suite of protocols for multimedia
conferencing: signaling, registration, admission control, transport, codecs
• Comes from the ITU (telephony)
SIP [IETF]:
• Embraces simplicity
• Single component. Works with RTP, but does not mandate it. Can be
combined with other protocols, services
• Borrows much of its concepts from HTTP
Network Support for
Multimedia
Providing Multiple Classes of Service
(3 network layer approaches to support multimedia app)
1. Making the best of best-effort service
• Where packet loss and excessive delay rarely occur
• When demand increases, ISPs deploy more bandwidth and switching
capacity to ensure satisfactory loss and delay
2. Differentiated services
• Not a single one-size-fits-all best-effort service
• One class of traffic might be given strict priority over another class
of traffic when both types of traffic are queued at a router
– Packet marking
– Packet scheduling
• The type-of-service field in the IPv4 header
– indicates the quality of service desired by a packet
3. Per-connection QoS guarantees
• Each instance of the application explicitly reserves bandwidth and end-to-end
performance
Providing Multiple Classes of Service
(Dimensioning Best-Effort Networks)
Multimedia communication needs:
• Low end-to-end delay/jitter and loss
The problem occurs
• When congestion occurs
Simple solution
• Throw money at the problem and avoid network contention
• This could be achieved with no changes to the Internet best-effort
architecture.
Problems with this solution
• Not practical from an ISP's business standpoint
• How much capacity to provide at network links in a given topology
to achieve a given level of performance (bandwidth provisioning)?
• How to design a network topology to achieve a given level of end-to-
end performance (network dimensioning)?
Providing Multiple Classes of Service
(Dimensioning Best-Effort Networks)
• The following issues must be addressed to predict
– application-level performance between two network end points,
– and provision enough capacity to meet the performance
requirements.
• Models of traffic demand between network end points. Models may
need to be specified
– Call level (for example, users “arriving” to the network)
– Packet level (packets being generated by ongoing applications).
– Note that workload may change over time.
• Well-defined performance requirements.
– E.g. to support delay-sensitive traffic (a threshold value)
• Models to predict end-to-end performance for a given workload
model, and techniques to find a minimal cost bandwidth allocation
that will result in all user requirements being met.
Providing Multiple Classes of Service
(Providing Multiple Classes of Service)
Simplest enhancement to the one-size-fits-all best-effort service is
• To divide traffic into classes
• Provide different levels of service to these different classes of traffic.
– For example, an ISP might well want to provide a higher class of
service to delay-sensitive Voice-over-IP or teleconferencing traffic
(and charge more for this service!) than to elastic traffic such as
email or HTTP.
– An ISP may provide a higher quality of service to customers
willing to pay more for this improved service. A number of ISPs
have adopted such tiered levels of service, with platinum-service
subscribers receiving better performance than gold- or silver-
service subscribers
Economy class < business class < first class passengers
Providing Multiple Classes of Service
(Motivating Scenarios)
• Suppose two application packet flows
originate on Host H1 and H2 on one
LAN
• The packets are destined for Hosts H3
and H4 on another LAN
• The routers on the two LANs are
connected with a 1.5Mbps link
• The speed of the LANs is significantly
higher than 1.5 Mbps
• Both packet loss and delay will occur
if the combined sending rate of H1 and
H2 exceeds 1.5 Mbps
• Suppose one flow is from a
1 Mbps audio application and the other
is HTTP web-browsing traffic
Providing Multiple Classes of Service
(Motivating Scenarios)
• In the best-effort Internet
– FIFO order
– A burst of packets from web browsing
may delay audio packets or cause them to be lost
• Multiple classes
– Since web browsing is an elastic
application, the problem can be solved by
giving priority to the audio stream
– An audio packet in the router buffer
will always be transmitted before any web
traffic packet
– The 1.5 Mbps link is effectively dedicated to audio
traffic
– Web traffic flows only when there is no
audio traffic
– Each type of traffic must be marked so the
router can distinguish between them
Providing Multiple Classes of Service
(Motivating Scenarios)
• Multiple classes
– In the current scenario the audio traffic is
1 Mbps,
– so the web traffic can still have the
remaining 0.5 Mbps
– BUT what if the audio application
starts transmitting at 1.5 Mbps?
– The web traffic will starve
– Similarly, many audio flows
sharing a link will starve FTP
traffic if priority is given to
audio traffic.
• Degree of isolation
– So that one class of traffic can be
protected from the other class.
Providing Multiple Classes of Service
(Motivating Scenarios)
Degree of isolation:
- Traffic policing
- Makes sure that a traffic class obeys
certain criteria (such as a rate <= 1 Mbps)
- If the traffic class exceeds the
threshold, packets may be delayed or
discarded at routers.
- Traffic policing and marking are
both implemented at the network edge
Providing Multiple Classes of Service
(Motivating Scenarios)
Degree of isolation:
- Explicit allocation of a fixed traffic rate
- E.g., 1 Mbps for audio
- 0.5 Mbps for web browsing
- In this case, the audio and web browsing
applications see logical channels with 1
and 0.5 Mbps capacity
- An application can only use the
bandwidth allocated to it; it cannot
use the rest of the bandwidth even if it
is available.
- E.g., if the audio app is silent, HTTP will
still not be able to transmit at more than
0.5 Mbps.
• Use it or lose it
– Not optimal
Providing Multiple Classes of Service
(Motivating Scenarios)
Scheduling Mechanisms .
- First in-first out (FIFO)
• Packets arriving at the link output queue wait for transmission if the link
is currently busy transmitting another packet.
• If there is not sufficient buffering space to hold the arriving packet, the
queue’s packet-discarding policy then determines
– whether the packet will be dropped (lost) or
– whether other packets will be removed from the queue.
Providing Multiple Classes of Service
(Motivating Scenarios)
Scheduling Mechanisms .
- Priority Queuing
• Packets are divided into classes and marked accordingly (i.e. TOS field in
IPv4).
• Each class has its own queue
• Packets will be transmitted from the highest priority class that has a
nonempty queue (that is, has packets waiting for transmission).
• The choice among packets in the same priority class is typically done in a
FIFO manner .
Providing Multiple Classes of Service
(Motivating Scenarios)
Scheduling Mechanisms .
- Priority Queuing
• Operation of priority queue
Providing Multiple Classes of Service
(Motivating Scenarios)
Scheduling Mechanisms .
- Round Robin
• Packets are sorted into classes just like in priority queuing
• Rather than a strict selection based on priority, a round-robin process is
used:
– a class 1 packet is transmitted, followed by a class 2 packet, and so on
• A work-conserving round-robin discipline:
– one that looks for a packet of a given class but finds none will immediately
check the next class in the round-robin sequence.
Providing Multiple Classes of Service
(Motivating Scenarios)
Scheduling Mechanisms .
- Weighted Fair Queue (WFQ)
• A generalized abstraction of Round Robin queue
• Arriving packets are classified and queued in appropriate waiting area
– a WFQ scheduler will serve classes in a circular manner
• Here each class receives a differential amount of service in any interval of
time. Specifically, each class, i, is assigned a weight, wi.
• During any interval of time during which there are class i packets to send,
class i will then be guaranteed to receive a fraction of service equal to
wi/(Σwj), where the sum in the denominator is taken over all classes that
also have packets queued for transmission.
• Thus, for a link with transmission rate R, class i will always achieve a
throughput of at least R · wi/(Σwj).
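The guaranteed-rate formula is just a weighted share of the link rate. A minimal sketch, with an invented three-class example (the 1.5 Mbps link and weights 3, 2, 1 are illustrative values, not from the slides):

```python
# WFQ guaranteed throughput: class i gets at least R * w_i / sum(w_j).
def wfq_min_rate(link_rate, weights, i):
    return link_rate * weights[i] / sum(weights)

# A 1.5 Mbps link shared by three classes with weights 3, 2 and 1:
rates = [wfq_min_rate(1.5e6, [3, 2, 1], i) for i in range(3)]
assert rates == [750_000.0, 500_000.0, 250_000.0]   # bits/sec
```

When some classes have no queued packets, the denominator shrinks to the backlogged classes only, so active classes receive more than this floor.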
Providing Multiple Classes of Service
(Motivating Scenarios)
Scheduling Mechanisms .
- Weighted Fair Queue (WFQ)
Providing Multiple Classes of Service
(Traffic Policing)
Goal: limit traffic to not exceed declared parameters
Three commonly used parameters:
• (long term) average rate: how many packets can be sent
per unit time (in the long run)
• crucial question: what is the interval length?
• 100 packets per sec and 6000 packets per min have the same average!
• peak rate: e.g., 6000 packets per min (ppm) average and
1500 pps peak rate
• (max.) burst size: max number of packets sent
consecutively (with no intervening idle)
Providing Multiple Classes of Service
(Leaky Bucket Mechanism)
Token bucket: limits input to specified average rate and
burst size
• Bucket can hold b tokens
• Tokens generated at rate r tokens/sec unless bucket full
• Over an interval of length t, the number of packets
admitted ≤ rt + b
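The token bucket policer described above can be sketched in Python (illustrative, not production code): tokens accumulate at rate r up to depth b, and a packet conforms only if a token is available when it arrives, which enforces the rt + b bound.

```python
class TokenBucket:
    """Token bucket policer: tokens accumulate at rate r up to depth b.
    A packet conforms iff a token is available on arrival, so over any
    interval of length t at most r*t + b packets are admitted."""

    def __init__(self, r, b):
        self.r = r            # token generation rate (tokens/sec)
        self.b = b            # bucket depth (max burst size)
        self.tokens = b       # start with a full bucket
        self.last = 0.0       # time of last arrival

    def conforms(self, now):
        # add tokens earned since the last arrival, capped at b
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True       # in profile: admit the packet
        return False          # out of profile: drop, mark, or delay

tb = TokenBucket(r=10, b=3)                 # 10 tokens/sec, burst of 3
burst = [tb.conforms(0.0) for _ in range(5)]
print(burst)  # [True, True, True, False, False] - only b back-to-back packets pass
```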
Providing Multiple Classes of Service
(Use of Link Scheduling and Policing
Mechanisms to achieve QoS Guarantees)
• Example: combination of leaky bucket and WFQ to
place an upper bound on delay, a QoS guarantee
[Figure: arriving traffic is policed by a token bucket (token rate r, bucket
size b) and then served by a WFQ scheduler on a link of rate R]
• per-flow rate = R · w1 / (Σj wj)
• Dmax = b1 · (Σj wj) / (R · w1)
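The delay bound follows from the two mechanisms combined: the token bucket caps the backlog of flow 1 at b1, and WFQ guarantees that this backlog drains at rate at least R · w1 / (Σj wj). A small Python sketch of the arithmetic:

```python
def wfq_max_delay(b, R, weights, i=0):
    """Worst-case queuing delay for a flow policed by a token bucket of
    depth b feeding WFQ class i on a link of rate R: the whole burst b
    drains at the guaranteed rate R * w_i / sum(w_j), so
    d_max = b * sum(w_j) / (R * w_i)."""
    guaranteed_rate = R * weights[i] / sum(weights)
    return b / guaranteed_rate

# Burst of 10 packets, a 100 packets/sec link, two equally weighted classes:
print(wfq_max_delay(b=10, R=100, weights=[1, 1]))  # 0.2 seconds
```

This is the sense in which leaky bucket policing plus WFQ scheduling yields a hard, per-flow QoS guarantee on delay.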
Multimedia Networking
Network Support for Multimedia
Applications
Protocols for Real Time Interactive
Applications
• Differentiated Services (DiffServ)
• Per-Connection Quality of Service
Guarantees (IntServ)
Differentiated Services
Introduction
• Ability to handle different classes of traffic in different
ways within the Internet in a scalable manner.
• Millions of simultaneous source-destination traffic flows
may be present at a backbone router.
• Scalability is met by placing only simple
functionality within the network core, with more
complex control operations being implemented at
the network’s edge
Differentiated Services
Functional Elements
• Edge Functions:
– Packet classification
Ø At the incoming edge of the network (that is, at either a Diffserv-
capable host that generates traffic or at the first Diffserv-capable
router that the traffic passes through), arriving packets are marked.
Ø Differentiated Services (DS) field in IPv4 (Type of services) and IPv6
(Traffic class) header is set to some value.
Ø DSCP: DiffServ Code Point (6 bits)
– Traffic conditioning
Ø Packet marking, metering, testing with contracted profile, shaping
and dropping.
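The marking step above writes the 6-bit DSCP into the upper bits of the IPv4 TOS / IPv6 Traffic Class byte; the low 2 bits are used for ECN. A small bit-manipulation sketch (the helper name is ours):

```python
def set_dscp(tos_byte, dscp):
    """Write a 6-bit DSCP value into the upper six bits of the IPv4
    TOS / IPv6 Traffic Class byte, preserving the low 2 (ECN) bits."""
    assert 0 <= dscp < 64, "DSCP is a 6-bit field"
    return (dscp << 2) | (tos_byte & 0x03)

EF = 46                         # Expedited Forwarding codepoint (101110)
print(hex(set_dscp(0x00, EF)))  # 0xb8
```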
Differentiated Services
Sub Functions
• Traffic Profile:
– Some of the end-nodes have an upper bound on their sending
rate
– E.g., a limit on the peak rate or burstiness of the packet flow
– As long as the user sends packets into the network in a way that
conforms to the negotiated traffic profile, the packets receive
their priority marking and are forwarded along their route to the
destination.
– On the other hand, if the traffic profile is violated, out-of-profile
packets might be marked differently, might be shaped (for
example, delayed so that a maximum rate constraint would be
observed), or might be dropped at the network edge
Differentiated Services
Sub Functions
• Metering Function:
– It is to compare the incoming packet flow with the negotiated
traffic profile and
– To determine whether a packet is within the negotiated traffic
profile.
– The actual decision about whether to immediately remark,
forward, delay, or drop a packet is a policy issue determined by
the network administrator and is not specified in the Diffserv
architecture.
Differentiated Services
Sub Functions
Traffic Conditioning at Edge Router
Differentiated Services
Functional Elements
• Core Function:
– Forwarding
Ø When a DS-marked packet arrives at a Diffserv capable router, the
packet is forwarded onto its next hop according to the so-called per-
hop behavior (PHB) associated with that packet’s class.
Ø The per-hop behavior influences how a router’s buffers and link
bandwidth are shared among the competing classes of traffic.
Ø A crucial tenet of the Diffserv architecture is that a router’s per-hop
behavior will be based only on packet markings, that is, the class of
traffic to which a packet belongs.
Differentiated Services
Sub Functions
• Per-hop behavior (PHB):
– A PHB can result in different classes of traffic receiving different
performance (that is, different externally observable forwarding
behaviors).
– While a PHB defines differences in performance (behavior)
among classes, it does not mandate any particular mechanism for
achieving these behaviors.
– E.g., a PHB would not require that a particular packet-queuing
discipline (for example, a priority queue versus a WFQ queue
versus a FCFS queue) be used to achieve a particular behavior.
– Differences in performance must be observable and hence
measurable.
Differentiated Services
Functional Elements
Differentiated Services
Two PHBs
• Expedited Forwarding PHB:
– The departure rate of a class of traffic from a router must equal or
exceed a configured rate.
– EF is supported by a specific queue at the router
• Assured Forwarding PHB:
– Divides traffic into four classes
– Each AF class will have its own queue at the router
– Each AF class is guaranteed to be provided with some minimum
amount of bandwidth and buffering such that
Ø AF1 > AF2 > AF3 > AF4
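The AF codepoints follow a regular bit pattern (RFC 2597): the class number occupies the top three DSCP bits and the drop precedence the next two, so the codepoint for AFxy is (x << 3) | (y << 1). A short sketch:

```python
def af_dscp(af_class, drop_prec):
    """DSCP codepoint for Assured Forwarding class AFxy (RFC 2597):
    the 6-bit value is (class << 3) | (drop_precedence << 1)."""
    assert 1 <= af_class <= 4 and 1 <= drop_prec <= 3
    return (af_class << 3) | (drop_prec << 1)

print(af_dscp(1, 1))  # 10 -> AF11
print(af_dscp(4, 1))  # 34 -> AF41
```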
Differentiated Services
End-End DiffServ
• In order to provide end-to-end Diffserv service:
– All the ISPs between the end systems must not only provide this
service, but must also cooperate and make settlements in order
to offer end customers true end-to-end service.
• Second, if Diffserv were actually in place and the network
ran at only moderate load:
– Most of the time there would be no perceived difference between
best-effort service and Diffserv.
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
• In DiffServ with proper network dimensioning, the
highest class of service can:
– Indeed achieve extremely low packet loss and delay—essentially
circuit-like performance.
– But can the network guarantee that an ongoing flow in a high-
priority traffic class will continue to receive such service
throughout the flow’s duration using only the mechanisms that
we have described so far?
– It cannot
• This is why additional network mechanisms and protocols are
required to provide a hard service guarantee to individual connections.
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
• Ex: Two audio applications
– Each transmitting at 1 Mbps
– Sharing a 1.5 Mbps link
– Both belong to the same class
– The router will treat each similarly
– Each traffic stream will lose 25% of its packets
– This is unacceptable QoS: both applications are unusable.
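The 25% figure comes from simple arithmetic: 2 Mbps is offered to a 1.5 Mbps link, so the excess 0.5 Mbps must be dropped, and with both flows treated identically each loses the same fraction.

```python
# Two 1 Mbps audio flows sharing a 1.5 Mbps best-effort link:
offered = 2 * 1.0          # total offered load, Mbps
capacity = 1.5             # link capacity, Mbps
loss_fraction = (offered - capacity) / offered
print(loss_fraction)       # 0.25 -> each flow loses 25% of its packets
```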
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
• Ex: Two audio applications
– Both the applications cannot be satisfied simultaneously
– Question: How to resolve the problem?
– Answer: One of the application flow be blocked [Telephone
network is an example]
– By explicitly admitting or blocking flows based on their resource
requirements, the network can guarantee that admitted flows
will be able to receive their requested QoS.
• Call Admission:
– An application declares the resources its flow will need
– The network either accepts or rejects the flow
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
• Ensuring an application flow gets its desired QoS
• Resource Reservation
– To guarantee a call gets its desired QoS from a network, it must
specify:
Ø The resources that it needs (e.g., link bandwidth, buffers, etc.)
– Once a call reserves the resources, it has on-demand access to those
resources throughout its duration.
– If a call reserves and receives a guarantee of x Mbps of link
bandwidth, and never transmits at a rate greater than x , the call
will see loss- and delay-free performance.
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
• Ensuring an application flow gets its desired QoS
• Call Admission
– Since resources are not infinite, when an application asks for
resources for a call:
Ø The network accepts the call if resources are available.
Ø Or it blocks the call if there are not enough resources. In such a case
the application may try again until the required resources become
available (i.e., are released by others).
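The accept/block decision can be illustrated with a toy admission controller (a hypothetical sketch; real IntServ admission also accounts for buffers and delay bounds, not just bandwidth):

```python
class AdmissionController:
    """Toy call admission: accept a flow only if its requested
    bandwidth still fits within the remaining link capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0.0

    def request(self, bandwidth):
        if self.reserved + bandwidth <= self.capacity:
            self.reserved += bandwidth   # reserve resources for the flow
            return True                  # call admitted
        return False                     # call blocked; caller may retry

    def release(self, bandwidth):
        self.reserved -= bandwidth       # flow ended: free its reservation

link = AdmissionController(capacity=1.5)  # Mbps, as in the audio example
print(link.request(1.0))  # True  - first audio flow admitted
print(link.request(1.0))  # False - second flow blocked (would overload link)
link.release(1.0)
print(link.request(1.0))  # True  - admitted once resources are released
```

This mirrors the telephone-network model: an admitted call gets its requested QoS for its whole duration, at the cost of some calls being blocked.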
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
• Ensuring an application flow gets its desired QoS
• Call setup signaling:
– A signaling protocol is needed to ensure
Øthe per-hop allocation of local resources,
Øas well as the overall end-to-end decision of whether or not the
call has been able to reserve sufficient resources at each and
every router on the end-to-end path.
ØRSVP protocol is a call setup protocol,
ØIn ATM networks, Q.2931 does this job
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
Integrated Services
Per-connection Quality of Service (QoS) Guarantee
Multimedia Networking Summary
• Types of Data
– Audio
Ø Analog to Digital Conversion
Ø Encoding Schemes
q Pulse Code Modulation (Sampling, Quantization)
q MP3 (filters out imperceptible frequencies)
– Video
Ø Consist of Frames/ second
Ø Each frame is an image
Ø Compression/ Encoding
q Exploits Spatial Redundancy (e.g. JPEG)
q Exploits Temporal Redundancy (e.g. MPEG)
Multimedia Networking Summary
• Type of Applications
– Streaming Stored Audio/video
Ø UDP streaming
Ø HTTP streaming (HTTP/TCP)
Ø Adaptive HTTP streaming (HTTP/TCP)
– Streaming Live Audio/Video
Ø Multiple unicast IP
Ø Multicast IP
Ø Application layer P2P/ CDN
– Conversational VoIP
Ø RTP/UDP, TCP
Ø FEC, Interleaving, Error Concealment
Ø Playout Delay
Multimedia Networking
Summary
• Video/Audio Distribution
– Content Delivery Network
– Peer to Peer Network
• Conversational VoIP Protocols
– Real-time Transport Protocol (RTP) for audio/video transfer
– RTP Control Protocol (RTCP) for control messages
– Session Initiation Protocol (SIP) (enables conversational VoIP)
Ø E-mail like addresses
Ø Terminals
Ø Proxy
Ø Registrar
– H.323
Ø A suite of protocols
Ø Terminals, Gatekeeper, MCU and Gateway
Multimedia Networking
Summary
• Guarantee QoS
– Best Effort Network
– Multiple Classes
– Per-connection QoS guarantee