A Streaming Video Primer

This article is aimed at AV consultants and integrators in particular, to give both groups more confidence in using streaming-based solutions in place of traditional video transmission methods and products. We wrote it to help demystify the subject of streaming video, also known as IPTV or "video over IP".

Please note that the article simplifies some of the technical detail... we did not want to get bogged down in the full complexity of the subject beyond what is necessary for a solid high-level understanding.

We conclude by offering a list of questions that should be reviewed when evaluating how best to use streaming in a particular situation.

Why should I send video over I.P.?

The simplest reason is the cost of installing dedicated cabling infrastructure. Structured twisted-pair cabling, switching and routing are ubiquitous in corporate and educational environments. As the bandwidth and overall capacity of corporate data networks have increased dramatically over the past decade, it has become mainstream for that same infrastructure to carry real-time voice traffic in addition to data.

A more significant reason, though, is flexibility. Streaming video can easily be transmitted to anywhere the I.P. network goes, in contrast to dedicated point-to-point video links. Video over Cat6, as provided for by the HDBaseT standard for example, has its place, but streaming has many other functional advantages.

Having said that, there will be the odd occasion where streaming may not be the answer. For example, if you need zero (or close to zero) latency from end-to-end, as you might require in a remote KVM situation, then other solutions may be more appropriate. (A subject for a future article!)

How did this all get started?

The precursor to video-over-IP was surely voice-over-IP (VoIP), which is no longer considered a novel use of corporate I.T. infrastructure. In its early days, though, it created new challenges for network administrators, who had to manage additional technical parameters such as jitter, latency and traffic prioritisation in order to guarantee call quality.

We are now at the early stages of this phenomenon being repeated with respect to the carriage of video. Of course, virtually everyone in the world has first-hand experience of video being delivered to their doorstep in the form of data packets, courtesy of YouTube. To begin with, this relied on mostly proprietary technologies: protocols (RTMP) and software (e.g. the Flash player) developed by Adobe, and hence there was a certain degree of mystique about how it all worked. But the pressure for open standards and platforms has given us alternative ways of achieving the same thing.

And indeed, what works for delivering recorded video to many millions of end-users over the Internet does not necessarily suit the simpler point-to-point situations that exist within corporate walls.

In the corporate environment, video generally needs to be moved from one location to another either on a point-to-point basis, or "broadcast" to multiple listeners connected to the WAN in a way that is conceptually similar to how conventional MATV achieves this with RF signal modulation, amplification and distribution.

How is video turned into data?

Let us assume for the purpose of this discussion that we are confining ourselves to video that is already digital in nature. The HDMI output of a computer is a classic example. This signal contains digital video in a "raw", uncompressed form: it consists mainly of primary colour signals (RGB) and timing, along with auxiliary information, including audio, which is interleaved into the signal during the blanking intervals.

A video signal consists of a series of "frames" that are displayed on a TV or computer monitor, most often at around 60 Hz, or 60 frames per second. Turning these signals directly into data packets is therefore relatively straightforward; in a sense this is essentially what HDBaseT does in the context of running video over dedicated Cat-6 cable.

However, these uncompressed data streams consume enormous bandwidth, and so they need to be compressed in some way before they can be used on a typical corporate I.P. network. For example, one 1080p60 RGB signal requires about 3 Gigabits per second of bandwidth. Most Ethernet switches today offer only 100 Megabits per second, or perhaps 1 Gigabit per second, to a device connected to a port, so clearly it is not going to be possible to carry uncompressed video in this form through a typical I.P. network.
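
As a quick sanity check on that figure, the arithmetic is simply pixels per frame, times bits per pixel, times frames per second. A minimal sketch (it ignores blanking intervals and audio):

    # Back-of-the-envelope bandwidth for uncompressed 1080p60 RGB video.
    width, height = 1920, 1080       # pixels per frame
    bits_per_pixel = 24              # 8 bits each for R, G and B
    frames_per_second = 60

    bits_per_second = width * height * bits_per_pixel * frames_per_second
    print(f"{bits_per_second / 1e9:.2f} Gbit/s")   # roughly 2.99 Gbit/s, i.e. about 3 Gigabits per second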

Fortunately, most video traffic - especially content that is film-based - is highly compressible without significant compromise in picture quality.

There is more than one technique for compressing video for streaming, and they are normally used in combination: colour compression, spatial compression and temporal compression. Let's go under the covers of this a bit more...

Colour Compression

You may be familiar with the RGB representation of colour: a single pixel is described by three values, one representing the strength of each primary colour. In most desktop computing, each value is allocated 8 binary bits, giving 256 possible values from 0 to 255. A value of 0,0,0 thus represents black, 255,255,255 represents white, and most visible colours can be represented by combinations of these three values. Representing pixels this way therefore requires 24 bits (or 3 bytes) per pixel.

This is not the only way to represent a colour pixel, though. Another representation, called YUV, models colour more like the way the human eye perceives it. It turns out the human eye is rather more sensitive to light intensity (the "luminance") than to the actual colour information (the "chroma"), so some of that chroma information can be thrown away without losing much in picture quality.
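
To make the luminance idea concrete, here is a small illustrative sketch using the widely published BT.601 weighting (the exact coefficients vary between standards) to derive the luma value from an RGB pixel:

    def luma_bt601(r, g, b):
        """Approximate perceived brightness (luma) of an 8-bit RGB pixel, BT.601 weights."""
        return 0.299 * r + 0.587 * g + 0.114 * b

    # Pure green looks far brighter to the eye than pure blue,
    # even though both use the same 8-bit component value:
    print(luma_bt601(0, 255, 0))   # ~149.7
    print(luma_bt601(0, 0, 255))   # ~29.1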

Furthermore, in a process known as "subsampling", the colour information is carried at a lower resolution than the luminance information, making further savings in bandwidth. You may have seen reference to the colour pixel formats 4:4:4, 4:2:2 and 4:2:0. These represent different levels of YUV subsampling. 4:4:4 is the best quality (i.e. no chroma subsampling at all) and uses essentially the same bandwidth as uncompressed 24-bit RGB; 4:2:2 and 4:2:0 progressively discard chroma resolution in exchange for lower bandwidth. If you need to move a video signal with a lot of fine detail in it (lines, small text fonts), as you might see in an Excel spreadsheet for example, then 4:4:4 may be a requirement. Otherwise you may be fine with one of the more aggressive profiles.
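
To illustrate the savings, the average bits per pixel for each scheme (assuming 8 bits per sample, and before any further codec compression) works out as in this simple sketch:

    # Average bits per pixel for 8-bit YUV at the common subsampling ratios.
    # 4:2:2 halves the chroma resolution horizontally; 4:2:0 halves it in both
    # directions, so each chroma sample is shared by four pixels.
    formats = {
        "RGB / YUV 4:4:4": 8 + 8 + 8,        # 24 bpp - full luma and full chroma
        "YUV 4:2:2":       8 + 8/2 + 8/2,    # 16 bpp - one third saved
        "YUV 4:2:0":       8 + 8/4 + 8/4,    # 12 bpp - half saved
    }
    for name, bpp in formats.items():
        print(f"{name}: {bpp:g} bits per pixel")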

Temporal Compression

Instead of sending every frame in a video as an independent block of data, we can analyse what is happening across a group of successive frames and see whether there is scope for saving bandwidth over time. This is typically what happens inside a video compression "codec". One of the most widely supported such codecs is the H.264 standard.

These algorithms essentially reduce a stream of video into a series of full frames, with a larger number of intermediate frames between them that represent "deltas", or changes in pixel values. Thus, if the scene is relatively static, the savings in bandwidth over that period can be very large. A good encoder implementation will allow a lot of control over things like the Group of Pictures (GOP) structure (i.e. the ratio of full frames to delta frames), the amount of loss that can be tolerated, and the trade-off between loss and latency.
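
To get a feel for why the GOP structure matters, here is an illustrative sketch; the frame sizes are purely hypothetical assumptions, not figures for any particular encoder:

    # Hypothetical sizes for a full (I) frame and a small delta (P) frame.
    i_frame_bits = 800_000    # assumed size of a full frame
    p_frame_bits = 80_000     # assumed size of a delta frame
    fps = 60

    for gop_length in (1, 15, 60):   # 1 means every frame is a full frame
        avg_frame_bits = (i_frame_bits + (gop_length - 1) * p_frame_bits) / gop_length
        print(f"GOP {gop_length:>2}: {avg_frame_bits * fps / 1e6:.1f} Mbit/s average")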


Latency is a term describing how long it takes to move a frame of video through a transport process. The latency of moving video to a screen over HDMI is extremely low, but at the cost of very high bandwidth. Conversely, there is non-zero latency associated with packet-encoded video, simply because the encoder has to look at a whole series of frames before it can perform the (temporal) compression. On the encode side this latency can be anything from 50 ms to several hundred ms. That is before allowing for deliberate buffering introduced on the decode side to keep video playback smooth rather than stuttering when the amount of available bandwidth is highly variable and not well controlled.

In practice, good H.264 implementations can achieve "glass to glass" latency of around 100 ms for full HD video (1080p60).
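
One simple way to reason about the end-to-end figure is as a budget of individual contributions. The values below are purely illustrative assumptions rather than measurements of any particular product:

    # Illustrative glass-to-glass latency budget (all values are assumptions).
    budget_ms = {
        "capture / frame acquisition": 17,   # roughly one frame period at 60 fps
        "encode (incl. look-ahead)":   35,
        "network transit":              5,
        "decode buffer":               30,
        "display refresh":             17,
    }
    print(f"total: {sum(budget_ms.values())} ms glass to glass")   # ~104 ms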

As a point of contrast, HLS streaming, as happens when you watch video in an HTML5 browser, involves a high degree of buffering on an end-to-end basis, so the latency can be 15 or 30 seconds or more. This is unacceptable in many AV scenarios, especially control rooms watching live camera feeds.

Delivering Streams over I.P. - Session-based (TCP) or Sessionless (UDP)

A really fundamental aspect of video streaming over an I.P. network is the way the packets are structured for transmission.

Most I.P. traffic, for example between a desktop PC and a server, can be broken down into two fundamental types:

  • Session-based (TCP), which means the data packets are guaranteed to be reliably delivered, and in the sequence they were generated; and
  • Datagram-based (UDP), which offers performance advantages for certain applications, but with no guarantee of delivery or reception order.

HTTP is an example of a session-based TCP protocol. If a packet gets lost along the way, there is a mechanism for retransmitting the missing piece, so that everything is delivered and arrives in the right order.

Conversely, dynamic media such as VoIP and streaming video - where low latency matters more than the occasional lost packet - is transmitted simply as a series of packets without all that session-based overhead. The two predominant UDP stream formats (loosely called protocols, though that is really a misnomer here) are MPEG-TS and RTP. Both are widely used.
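
At the socket level, sending a media packet over UDP really is a matter of firing datagrams at an address and port with no session setup, which is exactly why delivery and ordering are not guaranteed. A minimal sketch (the address, port and payload are placeholders):

    import socket

    # Fire-and-forget: no connection, no handshake, no retransmission.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"one chunk of encoded media"        # placeholder payload
    sock.sendto(payload, ("192.0.2.10", 5004))     # placeholder receiver address and port
    sock.close()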

RTP is often used in conjunction with companion protocols. One is RTCP, a "control" protocol whose purpose is not to carry media but to feed back statistics about the quality of delivery (packet loss, jitter and so on) between the endpoints.

Session control is handled by RTSP, the "Real Time Streaming Protocol". The instruction to pause or resume the playback of a video, for example, is carried by RTSP, whereas the media itself is delivered on a separate "channel", so to speak, via RTP (with RTCP feedback alongside it).
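
For the curious, each RTP media packet begins with a small fixed header (12 bytes, defined in RFC 3550) containing a sequence number and timestamp that the receiver uses to detect loss, re-order packets and reconstruct timing. A minimal parsing sketch:

    import struct

    def parse_rtp_header(packet: bytes) -> dict:
        """Unpack the 12-byte fixed RTP header (RFC 3550)."""
        first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
        return {
            "version":      first >> 6,       # should always be 2
            "payload_type": second & 0x7F,    # identifies the codec in use
            "sequence":     seq,              # lets the receiver spot loss and reordering
            "timestamp":    timestamp,        # media clock for smooth playback
            "ssrc":         ssrc,             # identifies the sending source
        }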

In contrast, MPEG-TS may be used without a control protocol at all, simply because it has its roots in broadcast applications. (Digital terrestrial TV, DVB-T, carries its video as an MPEG transport stream rather than RTP.)
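
MPEG-TS chops the stream into fixed 188-byte packets, each starting with a sync byte (0x47) and carrying a 13-bit packet identifier (PID) that tells the receiver which elementary stream (video, audio or programme data) the packet belongs to. A minimal sketch of extracting the PID from one packet:

    def ts_packet_pid(packet: bytes) -> int:
        """Return the 13-bit PID of a single 188-byte MPEG-TS packet."""
        if len(packet) != 188 or packet[0] != 0x47:
            raise ValueError("not a valid transport stream packet")
        return ((packet[1] & 0x1F) << 8) | packet[2]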

Delivering Streams over I.P. - Unicast, Multicast and Broadcast

The second key concept with I.P. is how packets are addressed. Every packet that flows on an I.P. network has a source and destination address.

When the destination corresponds to a single endpoint, this is called a unicast (or "one to one") packet. Any session-based traffic, such as HTTP or RTSP, is inherently unicast because of the need to manage delivery, re-transmission and sequencing on a "per session" basis.

When the destination addresses "all stations" on a section of the network, or the entire network, this is called a broadcast. Broadcasts are only used in fairly narrow situations, as otherwise they would result in large amounts of traffic being carried around parts of the network where it did not need to go, clogging it up in the process. It is in fact quite common for broadcasts to be blocked from flowing through routers, otherwise low-speed links used to connect branch offices, for example, can end up severely impacted by unnecessary traffic propagation.

The third type of addressing scheme, which is actually the most interesting for video, is called multicast. Put simply, this is a way of transmitting a stream of data to multiple parties, with that traffic only flowing through the parts of the LAN (or WAN) it needs to traverse to reach all the parties involved. Multicast is an efficiency measure. If it is disabled in the network, as is still common in many corporate networks, then all conversations have to be unicast. The problem with that is, if you have three parties all wanting to watch a stream at the same time, the sender has to stream it individually three times, once to each party. This is clearly not very scalable: the sender may not have enough horsepower to manage this for a larger number of end clients, and the network may become saturated with what is essentially duplicate information.
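
On the receiving side, a client "tunes in" to a multicast stream by joining the relevant group address, which is what prompts the network (via IGMP) to deliver that traffic to its segment. A minimal receiver sketch, with a placeholder group address and port:

    import socket
    import struct

    GROUP, PORT = "239.1.1.1", 5004   # placeholder multicast group and port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # Ask the operating system (and, via IGMP, the network) to send us this group's traffic.
    membership = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)

    data, sender = sock.recvfrom(2048)
    print(f"received {len(data)} bytes from {sender}")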

What about content protected by DRM?

Some content, for example the output of a Blu-ray player, is protected with digital rights mechanisms such as HDCP. This is essentially an end-to-end encryption scheme to prevent the video signal from being visible in its unprotected form. Unfortunately this is an area where the objectives of HDCP (to prevent "broadcast"-style redistribution) directly conflict with what streaming aims to do, especially in combination with I.P. multicast. This may or may not be an obstacle in certain corporate situations.

Key questions to consider when evaluating the use of streaming:

  • What is the nature of the content? (film, TV, desktop video?)

  • What frame rate is required? (15, 25, 30, 50, 60 fps?)

  • What frame resolution is required? (720p, 1080p, 4K?)

  • What colour space is necessary? (RGB, YUV 4:2:0, 4:2:2 or 4:4:4?)

  • How many concurrent streams is the network backbone expected to carry?

  • Is there sufficient bandwidth in the backbone of the I.P. switch(es)?

  • Is there Quality of Service management policy on the LAN/WAN?

  • Are separate VLANs used for AV traffic? (Only relevant if all encoders and decoders are placed in that VLAN together... but routing to other VLANs may still be required.)

  • Is there a need for compatibility with existing streams making use of RTSP, RTP, HLS or MPEG-TS protocols?

  • Is the network enabled for multicast? (NB, the Internet does not support multicast, but VPN tunnels using the Internet for carriage may well)

  • Which party is responsible for the administration of multicast addresses in the organisation?


Transmitting video over I.P. is now mainstream, and second-generation products and platforms are appearing in the marketplace offering sophisticated functionality and scalability. As a consultant or integrator, it is crucial to have a suite of streaming solutions at your disposal.

Examples from PanoTek's range are the CES-400 and CES-1200 encoding appliances.

You are cordially invited to a webinar workshop demonstrating our CES-400 product this Wednesday, 21st February at 1:30pm AEDT. Please register your interest in attending the webinar by visiting the workshop registration page.