Network dimensioning for voice over IP
Tuomo Hakala
tuomo.hakala@kolumbus.fi
Abstract

This article concentrates on the issues of network dimensioning for
voice over IP (VoIP). The network being dimensioned is the IP network
between VoIP user devices. First, a short introduction to VoIP in
general is given. Second, the issues in network dimensioning for VoIP
are identified. Third, the bandwidth requirements of VoIP are
calculated. Fourth, basic approaches to Quality of Service are
discussed, and finally conclusions are drawn.

1.  Introduction

VoIP represents the best opportunity so far for voice and data
convergence, and it is now one of the fastest-growing industries [12].
An IP network carrying both voice and data is easier to manage than
separate voice and data networks. A VoIP call uses less bandwidth than
a circuit-switched call. VoIP also makes new services possible.

IP networks that offer only best-effort service, like the current
Internet, cannot satisfy the Quality of Service (QoS) requirements of
VoIP. This is primarily because of the variable queuing delays and
packet loss during network congestion [12].

The end-to-end Quality of Service of VoIP is composed of factors
related to the network and factors related to the applications. Factors
related to the network are [13] [1]:

•    Network delay
•    Network jitter
•    Network packet loss and desequencing

Factors related to the applications are:

•    Overall packet loss
•    Jitter buffers
•    Codec performance
•    Overall delay

Currently there are several approaches to improve the audio quality of
VoIP [7]:

•    Integrated Services (IntServ) is a stateful approach where
     resources are reserved in the network before data starts to flow
     along the reserved path. [8]
•    Differentiated Services (DiffServ) is a stateless approach where
     real-time traffic is marked to get preferred treatment in the
     network. [9]
•    Forward Error Correction (FEC) algorithms reduce the impact of
     data loss by sending redundant data along with the audio data. The
     redundant data helps to reconstruct lost data. [10] [14]
•    Loss Concealment algorithms try to reduce the impact of data loss
     by replacing the lost audio with an approximation. [11]

Forward Error Correction and Loss Concealment algorithms are methods
used in the VoIP user devices. IntServ and DiffServ are methods used in
the IP network.

2.  The issues

During an average conversation, each party usually talks only about 35
percent of the time. Most codecs, the techniques used to transform
voice into data, can detect silences. With this voice activity
detection, data is transmitted only when needed. When several
conversations are multiplexed on a single transmission line,
statistical multiplexing can be used, which leads to more efficient use
of bandwidth.

When a VoIP packet is transferred through an IP network, it experiences
delay that is caused by:

•    Transmission delay between the nodes, which depends on the frame
     size and the transmission speed
•    Queuing delay in the nodes because of buffering
•    Switching and processing delay in the nodes, the time to switch a
     frame from an input port to an output port
•    Propagation delay, which depends on the characteristics of the
     transmission media and the distance between the nodes

The use of statistical multiplexing means that the delay of the packets
sent within a conversation will vary. This varying delay is called
jitter. The jitter must be minimized in the network, and the remaining
jitter needs to be corrected by the receiving side using jitter buffers
to keep the original speech intelligible. Jitter buffers increase the
overall delay.

Several technologies enable the use of statistical multiplexing and the
mixing of voice and data on the same transmission lines. Such
technologies are voice over frame relay, voice over ATM and VoIP. VoIP
is the most flexible technology because it does not require virtual
channels to be set up between the parties having a conversation. Also,
VoIP scales much better in terms of connectivity than frame relay or
ATM.

In IP networks, routers are the devices that perform the statistical
multiplexing. IP packets belonging to the same conversation may take
different routes with different delays, and therefore they may be
received in a different order than the one in which they were sent.
This is called desequencing.

When the buffers in the network nodes overflow because of network
congestion, some packets are lost, and this loss must be handled by the
receiving side. Resending lost speech makes no sense because of the
added delay. [1]

The bandwidth required by VoIP must be calculated considering the
bandwidth requirements of a single conversation and the number of
conversations on each link in the network. Acceptable packet loss and
the buffering capacity of the nodes in the network must be considered
as well. Delay and jitter must be minimized in the network.

The receiving side must take care of the remaining network jitter and
the desequencing of packets. The Real-time Transport Protocol (RTP) was
designed to allow the receiver to make this correction [4]. RTP
includes:

•    Information on the type of data transported
•    Timestamps
•    Sequence numbers

The RTP Control Protocol (RTCP) conveys feedback on the quality of the
transmission, and it can also carry information on the identity of the
participants [4]. RTP and RTCP are mostly used on top of the User
Datagram Protocol (UDP) [3], which provides port numbers and a
checksum. The use of UDP also enables the use of IP multicast, i.e.
sending packets to IP multicast addresses. This means that an RTP
stream generated by a single source can be received by several
destinations. [1]

3.  Bandwidth requirements

Codecs that support Voice Activity Detection (VAD) use less bandwidth
in silence periods (m) than in active speech periods (M). M and m
values for popular codecs are shown in Table 1 [1].

Table 1: M and m values for popular codecs

Codec                   M (kbit/s)   m (kbit/s)
G.723.1 (5.3 kbit/s)    8            3.73
G.723.1 (6.4 kbit/s)    9.07         3.73
Lucent SX7003P          20.27        13.87

The M values in Table 1 include the transport overheads of the RTP, UDP
and IP headers, which are shown in Table 2 [1].

Table 2: Transport overheads

Protocol                                   Overhead (octets)
IPv4 (Internet Protocol version 4) [2]     20
UDP (User Datagram Protocol) [3]           8
RTP (Real-time Transport Protocol) [4]     12

The activity rate a of a one-way voice channel is the proportion of the
total conversation time that consists of active speech intervals. An
average value is usually 0.35, but to be on the safe side a = 0.5
should be used in the calculations.
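As a rough check of how the header overheads in Table 2 enter the M and m values in Table 1, the following minimal Python sketch recomputes the G.723.1 rows. The packetization is not given in the text: four 30 ms frames per packet, frame sizes of 20 and 24 octets, and a 4-octet silence frame are assumptions, chosen here because they reproduce the table values.

```python
# Header overheads from Table 2, in octets.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
OVERHEAD = IPV4_HEADER + UDP_HEADER + RTP_HEADER  # 40 octets per packet


def packet_bitrate_kbits(frame_octets, frames_per_packet=4, frame_ms=30):
    """One-way bitrate in kbit/s for packets carrying several codec frames.

    frames_per_packet and frame_ms are assumptions, not values from the text.
    """
    packet_bits = (frame_octets * frames_per_packet + OVERHEAD) * 8
    packet_interval_ms = frames_per_packet * frame_ms
    return packet_bits / packet_interval_ms  # bits per millisecond = kbit/s


# Assumed G.723.1 frame sizes: 20 octets (5.3 kbit/s), 24 octets (6.4 kbit/s)
# and a 4-octet frame during silence.
M_53 = packet_bitrate_kbits(20)
M_64 = packet_bitrate_kbits(24)
m_silence = packet_bitrate_kbits(4)
print(round(M_53, 2), round(M_64, 2), round(m_silence, 2))
```

Under these assumptions the 40 octets of headers account for a third of the active bitrate at 5.3 kbit/s and for most of the silence bitrate, which is why packing several frames into one packet matters for dimensioning.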
The average bitrate of a single one-way voice channel during a
conversation is

        Average_bitrate = Ma + m(1-a)                    (1)

where

        M = active bitrate (kbit/s)
        m = silence bitrate (kbit/s)
        a = activity rate (fraction between 0 and 1)

For N simultaneous conversations using the same codec and with no
buffering requirements for the nodes in the network, the average
one-way bitrate is

        N*Average_bitrate = N(Ma + m(1-a))               (2)

This formula gives the bandwidth needed when zero packet loss is
required.

Buffering in the network increases jitter and therefore reduces
interactivity. It is good practice to dimension VoIP links on the
assumption that there is no buffering in the network. This leads to
some overprovisioning on slow links, but the spare capacity can be used
by non-real-time traffic in an IP network designed for both voice and
data [1]. In an IP network with mixed voice and data, the bandwidth
requirements of VoIP are small compared to the bandwidth used for data
in today's IP networks.

When bandwidth needs to be reserved for voice in an IP network designed
for both voice and data, information must be gathered on who phones
where, how often and for how long. When an existing circuit-switched
telephone network is to be replaced by VoIP, this information can be
derived from the phone bills of a reference period [1]. When a VoIP
network is planned as a new implementation and no statistics or phone
bills are available, the voice traffic can be calculated using the
Erlang model [1].

An optimal route through the network is chosen for each call,
considering the cost of each link per unit of bandwidth. After this, it
is possible to calculate the number of simultaneous calls on each link
at any given time. The peak number of simultaneous busy hour calls is
used to calculate the link bandwidth needed for zero packet loss using
formula (2).

4.  Delay, jitter and packet loss

When the bandwidth required by VoIP is calculated for zero packet loss
and no buffering is assumed in the network, the network delay and
jitter are minimized. In an IP network with mixed voice and data
traffic, some mechanism must be used to ensure that the bandwidth
calculated for VoIP is not used by other real-time or non-real-time
traffic. When the calculations for VoIP assume zero packet loss in the
network, the buffers in the network nodes must not fill with packets of
other traffic types, because this would cause VoIP packets to be
dropped. Likewise, when the calculations assume that there is no
buffering in the network nodes, because buffering would increase delay
and jitter, VoIP packets must be sent first to the outgoing link even
when packets of other traffic types are waiting in the buffers.

First of all, VoIP traffic must be differentiated from other traffic
types in the network so that it can be given better treatment. The
routers, the nodes in the IP network, can differentiate traffic
according to source and destination IP addresses, protocol type, port
numbers and the Differentiated Services (DS) field. The DS field is the
type of service (TOS) byte in IPv4 and the traffic class byte in IPv6.

There are two basic approaches that can be used to improve the quality
of VoIP in an IP network with mixed voice and data traffic [7]:

•    Integrated Services (IntServ) is a stateful approach where
     resources are reserved in the network before data starts to flow
     along the reserved path. [8]
•    Differentiated Services (DiffServ) is a stateless approach where
     real-time traffic is marked to get preferred treatment in the
     network. [9] [5]

4.1.  Integrated Services (IntServ)

The IntServ model proposes two service classes in addition to
best-effort service: guaranteed service and controlled-load service.
Guaranteed service is for applications requiring a fixed delay bound.
Controlled-load service is for applications requiring reliable and
enhanced best-effort service. [12]

IntServ requires that resources are explicitly managed for each
real-time application. Routers must reserve resources (e.g. bandwidth
and buffer space) in order to provide a specific QoS for each packet
flow. This requires flow-specific state in the routers. [12]

The four components of IntServ are:
•    Flow specification - The flowspec describes the characteristics of
     the flow. It has two separate parts, the Tspec (describes the
     flow's traffic characteristics) and the Rspec (specifies the
     service requested from the network).
•    Signaling protocol - e.g. Resource ReSerVation Protocol (RSVP) [6]
•    Admission control routine - determines whether a request for
     resources can be granted.
•    Packet classifier and scheduler - packets entering a router are
     classified, put in the appropriate queue and then scheduled
     accordingly.

4.2.  Differentiated Services (DiffServ)

In the DiffServ model, traffic entering an IP network is classified,
marked, policed and shaped at the edge of the network.

The packets are then assigned to different behavior aggregates (BA).
Each BA is identified by a single DiffServ CodePoint (DSCP). Users
request a specific performance level per packet by marking the DiffServ
field of each packet with a specific DSCP value, which specifies the
Per-Hop Behavior (PHB) within the provider's network. Packets are
forwarded within the core of the network according to the PHB.

The four components of DiffServ are [12]:

•    Services - A service defines the characteristics of packet
     transmission in one direction over a path in a network. DiffServ
     can be provided by two approaches:

     •    Quantitative DiffServ - QoS is specified in deterministically
          or statistically quantitative terms of throughput, delay,
          jitter and/or loss.
     •    Priority based DiffServ - Services are specified in terms of
          a relative priority of access to network resources.

•    Conditioning Functions and PHB - A user and a service provider
     must have a service level agreement (SLA) in place that specifies
     the supported service classes and the amount of traffic allowed in
     each class. Individual packets have DiffServ (DS) fields that
     indicate the desired service, and these DS fields can be marked at
     hosts, at the access router or at the edge router in the service
     provider network. Packets are classified, policed and possibly
     shaped at the ingress of the service provider network according to
     the rules derived from the SLA. Between the domains of service
     provider networks, DS fields may be re-marked if the SLA between
     the two service providers so defines. These traffic control
     functions at hosts, access routers or edge routers are generically
     called traffic conditioning. Per-hop behaviors (PHBs) are defined
     to allocate buffer and bandwidth resources at each node among
     traffic streams. A PHB is applied to a DiffServ behavior aggregate
     at a DiffServ-compliant node.
•    DS CodePoint - The DS field is the type of service (TOS) field in
     IPv4 and the traffic class byte in IPv6. Six bits of this DS field
     are used as a codepoint (DSCP) to select the PHB for a packet at
     each node.
•    A node mechanism for achieving PHB - Buffer management and packet
     scheduling mechanisms are used in nodes to achieve a certain PHB.
     PHBs are defined as behavior characteristics relevant to service
     provisioning policies instead of particular implementation
     mechanisms. Various implementation mechanisms may be suitable for
     a particular PHB group.

5.  Conclusions

The issues to be considered in network dimensioning for VoIP are
bandwidth, delay, jitter, desequencing and packet loss. The bandwidth
required by VoIP must be calculated considering the bandwidth
requirements of a single conversation and the number of conversations
on each link in the network. Acceptable packet loss and the buffering
capacity of the nodes in the network must be considered as well. When
the bandwidth required by VoIP is calculated for zero packet loss and
no buffering is assumed in the network, the network delay and jitter
are minimized. The receiving side must correct the remaining network
jitter and the desequencing of packets. In an IP network with mixed
voice and data traffic, some mechanism must be used to ensure that the
bandwidth calculated for VoIP is not used by other real-time or
non-real-time traffic. There are two basic approaches to achieve this:
Integrated Services (IntServ) and Differentiated Services (DiffServ).
IntServ is a stateful approach where resources are reserved in the
network before data starts to flow along the reserved path. DiffServ is
a stateless approach where real-time traffic is marked to get preferred
treatment in the network.
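The dimensioning procedure of this article, formulas (1) and (2) applied to the peak number of simultaneous calls on a link, can be sketched in Python as follows. This is a minimal illustration, not a definitive method: the text only refers to "the Erlang model", so the use of the Erlang B recurrence is an assumption, and the offered traffic of 10 erlangs and the 1 % blocking target are hypothetical inputs.

```python
def average_bitrate(M, m, a):
    """Formula (1): average one-way bitrate of one voice channel (kbit/s)."""
    return M * a + m * (1 - a)


def link_bandwidth(n_calls, M, m, a):
    """Formula (2): one-way bandwidth for n_calls simultaneous conversations."""
    return n_calls * average_bitrate(M, m, a)


def erlang_b(offered_erlangs, channels):
    """Blocking probability computed with the Erlang B recurrence."""
    b = 1.0
    for k in range(1, channels + 1):
        b = offered_erlangs * b / (k + offered_erlangs * b)
    return b


def channels_needed(offered_erlangs, max_blocking):
    """Smallest number of simultaneous calls that meets the blocking target."""
    n = 0
    while erlang_b(offered_erlangs, n) > max_blocking:
        n += 1
    return n


# Hypothetical inputs: 10 erlangs of busy hour traffic and a 1 % blocking
# target. M and m are the G.723.1 (6.4 kbit/s) values from Table 1, and
# a = 0.5 is the conservative activity rate recommended in the text.
n = channels_needed(10.0, 0.01)
bandwidth = link_bandwidth(n, M=9.07, m=3.73, a=0.5)
print(n, round(bandwidth, 1))
```

With these inputs the sketch arrives at 18 simultaneous calls and a one-way link bandwidth of about 115.2 kbit/s. Reusing the same functions with a = 0.35 shows how much headroom the conservative activity rate adds.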
References
[1]  Hersent, Olivier; Gurle, David; Petit, Jean-Pierre: IP Telephony,
     Packet-based multimedia communications systems; Great Britain,
     2000, www.awl.com/cseng/, ISBN 0-201-61910-5

[2]  Postel, Jon: Internet Protocol, RFC 791, September 1981

[3]  Postel, Jon: User Datagram Protocol, RFC 768, 28 August 1980

[4]  Schulzrinne, Henning; Casner, Stephen L.; Frederick, Ron;
     Jacobson, Van: RTP: A Transport Protocol for Real-Time
     Applications, RFC 1889, January 1996

[5]  Blake, Steven; Black, David L.; Carlson, Mark A.; Davies, Elwyn;
     Wang, Zheng; Weiss, Walter: An Architecture for Differentiated
     Services, RFC 2475, December 1998

[6]  Mankin, A.; Baker, Fred; Braden, Bob; Bradner, Scott; O'Dell,
     Michael; Romanow, Allyn; Weinrib, Abel; Zhang, Lixia: Resource
     ReSerVation Protocol (RSVP), Version 1 Applicability Statement,
     Some Guidelines on Deployment, RFC 2208, September 1997

[7]  Trends in the Internet Telephony,
     http://www.fokus.gmd.de/research/cc/glone/projects/ipt/
     (11 March 2001)

[8]  IETF Integrated Services (IntServ) Working Group charter,
     http://www.ietf.org/html.charters/intserv-charter.html
     (11 March 2001)

[9]  IETF Differentiated Services (DiffServ) Working Group charter,
     http://www.ietf.org/html.charters/diffserv-charter.html
     (11 March 2001)

[10] Speech Property-Based FEC (SPB-FEC),
     http://www.fokus.gmd.de/research/cc/glone/products/voice/spb-fec/
     (11 March 2001)

[11] Adaptive Packetization / Concealment (AP/C),
     http://www.fokus.gmd.de/research/cc/glone/products/voice/apc/
     (11 March 2001)

[12] Li, Bo; Hamdi, Mounir; Jiang, Dongyi; Cao, Xi-Ren: QoS-Enabled
     Voice Support in the Next-Generation Internet: Issues, Existing
     Approaches and Challenges; IEEE Communications Magazine, April
     2000

[13] Telecommunications and Internet Protocol Harmonization Over
     Networks, ETSI Project TIPHON,
     http://webapp.etsi.org/tbhomepage/TBDetails.asp?TB_ID=291&TB_NAME=TIPHON
     (12 March 2001)

[14] Padhye, Chinmay; Christensen, Kenneth J.; Moreno, Wilfrido: A New
     Adaptive FEC Loss Control Algorithm for Voice Over IP
     Applications; IEEE 2000