MPEG Transport Stream

The MPEG-2 standard is defined by ISO/IEC 13818 as "the generic coding of moving pictures and associated audio information." It combines lossy video compression and lossy audio compression to comply with bandwidth requirements. The basic structure of all MPEG compression systems is asymmetric because the encoder is always more sophisticated than the decoder.

MPEG encoders are always algorithmic. The better ones are also adaptive, using a feedback path. MPEG decoders are not adaptive and perform a fixed function. This works well for applications like broadcasting, where the number of expensive complex encoders is few and the number of simple inexpensive decoders is enormous.

The MPEG standard provides little information about encoder processes and operation. Rather, MPEG-2 specifies how a decoder interprets metadata in a bit stream. The metadata tells the decoder the rate at which the video was encoded, defines the audio coding, and identifies channels and other vital stream information.

A decoder that successfully deciphers MPEG streams is called compliant. The beauty of MPEG is that it allows different encoder designs to evolve simultaneously. Generic low-cost and proprietary high-performance encoders and encoding schemes all work because they are all designed to communicate with the compliant decoder base.

Stream Structures
An MPEG-2 stream can be an Elementary Stream (ES), a Packetized Elementary Stream (PES) or a Transport Stream (TS). The ES and PES originate as, and are stored as, files. Individual ESs are essentially endless because an ES is as long as the program itself.

Starting with analog video and audio content, individual ESs are created by applying MPEG-2 compression algorithms to the source content in the MPEG-2 encoder. This process is typically called ingest. The encoder creates an individual compressed ES for each audio and video stream. The output of an optimally functioning encoder will appear transparent when decoded in a set-top box and displayed on a professional video monitor.

A good ES depends on several factors, beginning with the quality of the original source material and the care used in monitoring and controlling audio and video variables when material is ingested. The better the baseband signal, the better the quality of the digital file. ES quality is also influenced by the encoded stream bit rate and by how well the encoder applies its MPEG-2 compression algorithms within the allowable bit rate.

MPEG-2 has two main compression components: intraframe spatial compression and interframe motion compression. Encoders use a variety of techniques, some proprietary, to stay within the maximum allowed bit rate while allocating bits to both compression components. This balancing act can sometimes be unsuccessful. It is a tradeoff between allocating bits for detail in a single frame and bits to represent frame-to-frame motion changes.

Researchers are still investigating what constitutes a good picture. Presently, there is no direct correlation between the data in the ES and subjective picture quality. For now, the best way of checking encoding quality is with the human eye, after decoding.

The Packetized ES
Each ES is broken into variable-length packets. The result is a PES containing a header and payload bytes. The header includes information about the encoding process required by the MPEG decoder to decompress the ES.

Each individual ES results in an individual PES. At this point, audio and video information still resides in separate PESs. The PES is primarily a logical construct and is not actually intended to be used for interchange, transport or interoperability. The PES also serves as a common conversion point between TSs and Program Streams (PSs).

Both the TS and PS are formed by packetizing PES files. During the formation of the TS, additional packets, containing tables needed to demultiplex the TS, are inserted. These tables are collectively called PSI and will be addressed in detail later.

Some packets contain timing information for their associated program, called the program clock reference (PCR). The PCR is inserted into one of the optional header fields of the TS packet. Recovery of the PCR allows the decoder to synchronize its clock to the rate of the original encoder clock.
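The PCR recovery described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names extract_pcr and build_pcr_packet are hypothetical, and the field layout assumed here (a 33-bit base and a 9-bit extension in the adaptation field, combined into 27 MHz ticks) is the author of this sketch's reading of ISO/IEC 13818-1 and should be checked against the standard.

```python
def extract_pcr(packet: bytes):
    """Return the PCR in 27 MHz ticks, or None if the packet carries no PCR."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid 188-byte TS packet")
    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control not in (2, 3):  # no adaptation field present
        return None
    if packet[4] < 7:  # too short to hold the flags byte plus 6 PCR bytes
        return None
    if not packet[5] & 0x10:  # PCR_flag is not set
        return None
    b = packet[6:12]
    # 33-bit base (90 kHz units), 6 reserved bits, 9-bit extension (27 MHz units)
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    extension = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + extension


def build_pcr_packet(pid: int, base: int, extension: int) -> bytes:
    """Hypothetical helper: build an adaptation-field-only packet with a PCR."""
    pcr_bytes = bytes([
        (base >> 25) & 0xFF,
        (base >> 17) & 0xFF,
        (base >> 9) & 0xFF,
        (base >> 1) & 0xFF,
        ((base & 0x01) << 7) | 0x7E | ((extension >> 8) & 0x01),
        extension & 0xFF,
    ])
    header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x20])
    # Length byte (183), flags byte with PCR_flag set, PCR, then stuffing bytes
    adaptation = bytes([183, 0x10]) + pcr_bytes + bytes([0xFF] * 176)
    return header + adaptation


# One second of encoder time: 90,000 base units at 90 kHz
pcr_ticks = extract_pcr(build_pcr_packet(0x0100, 90000, 0))
pcr_seconds = pcr_ticks / 27_000_000
```

A decoder would compare successive PCR values against its own clock to steer that clock toward the encoder's rate; the sketch only shows how the value is read out of a packet.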

Null packets, containing a dummy payload, may also be inserted to fill the intervals between information-bearing packets.

TS packets are fixed in length at 188 bytes with a minimum 4-byte header and a maximum 184-byte payload. Key fields in the minimum 4-byte header are the sync byte and the Packet ID (PID). The sync byte's function is indicated by its name. It is a fixed 8-bit value (0x47) that marks the beginning of a TS packet.
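As a rough illustration of the 4-byte header, the sketch below pulls out the sync byte, the PID and a few neighboring fields. The names TsHeader and parse_ts_header are hypothetical, and the bit positions (sync byte 0x47, 13-bit PID, 2-bit adaptation field control, 4-bit continuity counter) are assumptions based on the usual reading of ISO/IEC 13818-1 rather than anything stated in this article.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    pid: int
    payload_unit_start: bool
    adaptation_field_control: int
    continuity_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of a single TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("TS packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    # The PID is 13 bits: the low 5 bits of byte 1 followed by all of byte 2
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    return TsHeader(
        pid=pid,
        payload_unit_start=bool(packet[1] & 0x40),
        adaptation_field_control=(packet[3] >> 4) & 0x03,
        continuity_counter=packet[3] & 0x0F,
    )


# Example packet: PID 0x0100, payload only, continuity counter 5
example_packet = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
header = parse_ts_header(example_packet)
```

In a real analyzer the sync byte check would also be used to find packet boundaries in a raw byte stream; here the packet is assumed to be already aligned.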

The PID
The PID is a unique address identifier. Every video and audio stream as well as each PSI table needs a unique PID. The PID value is provisioned in the MPEG multiplexing equipment. Certain PID values are reserved or specified by organizations such as the Digital Video Broadcasting Group (DVB) and the Advanced Television Systems Committee (ATSC) for electronic program guides.

In order to reconstruct a program from all its video, audio and table components, it is necessary to ensure that the PID assignment is done correctly and that there is consistency between PSI table contents and the associated video and audio streams. This is one of the more critical points in an MPEG-2 stream.

There are four other important fields in the TS header. One is the continuity counter. It is a 4-bit field that counts repeatedly from zero through 15 for each PID, and it is used to determine whether packets have been lost or repeated. Second is the discontinuity indicator. It indicates a time base (PCR) and continuity counter discontinuity, which allows the decoder to handle such discontinuities. Third is the random access indicator. It indicates that the next PES packet in the PID stream contains a video-sequence header or the first byte of an audio frame. Fourth is the splice countdown. It indicates the number of packets of the same PID remaining before the splice point, where a new PES packet begins.
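The continuity counter check lends itself to a short sketch. The function find_continuity_errors below is hypothetical and deliberately simplified: it assumes every packet it sees carries a payload, it skips null packets (assumed here to use PID 0x1FFF), and it ignores the discontinuity indicator, which a real analyzer would have to honor.

```python
NULL_PID = 0x1FFF  # assumed PID of null packets, whose counter is not checked


def find_continuity_errors(packets):
    """Report counter problems in a sequence of (pid, continuity_counter) pairs.

    Returns a list of (packet_index, pid, issue), where issue is "lost" or
    "repeated". The counter is tracked separately for each PID.
    """
    last_counter = {}
    errors = []
    for index, (pid, counter) in enumerate(packets):
        if pid == NULL_PID:
            continue
        previous = last_counter.get(pid)
        if previous is not None:
            if counter == previous:
                errors.append((index, pid, "repeated"))
            elif counter != (previous + 1) % 16:  # 4-bit counter wraps at 16
                errors.append((index, pid, "lost"))
        last_counter[pid] = counter
    return errors


observed = [
    (0x0100, 14),
    (0x0101, 3),
    (0x0100, 15),
    (0x0100, 0),   # wraps from 15 to 0: no error
    (0x0101, 5),   # counter 4 never arrived
    (0x0100, 0),   # same counter twice in a row
]
continuity_errors = find_continuity_errors(observed)
```

Keeping one counter per PID matters because packets from different streams are interleaved in the TS, so consecutive packets rarely belong to the same PID.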

PSI
As noted earlier, the packets inserted during the formation of the TS carry the tables needed to demultiplex it. These tables, collectively called Program Specific Information (PSI), are part of the TS and are required for sorting out which PIDs belong to which programs.

To identify which audio and video PIDs contain the content of a particular program, a Program Map Table (PMT) must be decoded. Each program requires its own PMT with a unique PID value.

In order to determine which PID contains the desired program's PMT, the Program Association Table (PAT) must be decoded. The PAT is the master PSI table with PID value always equal to zero (PID = 0). If the PAT cannot be found and decoded in the TS, then no programs can be found, decompressed, or viewed.
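The lookup from PAT to PMT PIDs can be sketched as follows. The function parse_pat_section is hypothetical, and the section layout it assumes (an 8-byte section header, 4-byte entries of program number plus 13-bit PID, and a trailing 4-byte CRC) is this sketch's reading of ISO/IEC 13818-1. It operates on an already reassembled PAT section, does not verify the CRC, and the example bytes carry a placeholder CRC rather than a valid one.

```python
def parse_pat_section(section: bytes) -> dict:
    """Map program_number -> PMT PID for one complete PAT section."""
    if section[0] != 0x00:
        raise ValueError("not a PAT section (table_id must be 0x00)")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    # section_length counts the bytes after its own field, CRC included
    end = 3 + section_length - 4
    if end > len(section):
        raise ValueError("section is truncated")
    programs = {}
    for offset in range(8, end, 4):
        program_number = (section[offset] << 8) | section[offset + 1]
        pid = ((section[offset + 2] & 0x1F) << 8) | section[offset + 3]
        if program_number != 0:  # entry 0 points at a network PID, not a PMT
            programs[program_number] = pid
    return programs


# Illustrative section with two programs; the last four bytes are a
# placeholder, not a real CRC
example_pat = bytes([
    0x00, 0xB0, 0x11,        # table_id, flags, section_length = 17
    0x00, 0x01,              # transport_stream_id
    0xC1, 0x00, 0x00,        # version/current flag, section numbers
    0x00, 0x01, 0xE1, 0x00,  # program 1 -> PMT on PID 0x0100
    0x00, 0x02, 0xE2, 0x00,  # program 2 -> PMT on PID 0x0200
    0x00, 0x00, 0x00, 0x00,  # CRC placeholder
])
pmt_pids = parse_pat_section(example_pat)
```

Each PID in the result would then be demultiplexed and its PMT decoded to find the audio and video PIDs of that program.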

For a set-top box or ATSC tuner to successfully perform the program recovery and decompression process, the PSI tables must be sent periodically and with a fast enough repetition rate that they don’t delay channel-surfing viewers. Thus, checking the PSI tables for correct syntax and repetition rate is a vital part of MPEG testing.

Testing PSI involves verifying the accuracy and consistency of PSI contents. As programs change or multiplexer provisioning is modified, some problems may occur. One problem is an unreferenced PID: packets with a given PID value are present in the TS, but that PID is not referenced in any table.

The converse problem is a missing PID: a PSI table refers to a PID value, but no packets with that value are present in the TS.
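Both checks reduce to comparing two sets: the PIDs actually seen in the TS and the PIDs named in the PSI tables. The sketch below assumes those two collections have already been gathered; the function compare_pids and its well_known parameter are hypothetical, and the default exclusions (PID 0 for the PAT and 0x1FFF for null packets) are assumptions that a real test set would extend with the PIDs reserved by DVB or ATSC.

```python
def compare_pids(observed_pids, referenced_pids, well_known=(0x0000, 0x1FFF)):
    """Classify PIDs as unreferenced or missing.

    observed_pids:   PIDs of packets actually present in the TS
    referenced_pids: PIDs named in the PSI tables
    well_known:      PIDs that are legitimate without being referenced
    """
    observed = set(observed_pids) - set(well_known)
    referenced = set(referenced_pids) - set(well_known)
    return {
        "unreferenced": sorted(observed - referenced),  # in the TS, in no table
        "missing": sorted(referenced - observed),       # in a table, not in the TS
    }


seen_in_stream = [0x0000, 0x0100, 0x0101, 0x0102, 0x0300, 0x1FFF]
named_in_tables = [0x0100, 0x0101, 0x0102, 0x0200]
pid_report = compare_pids(seen_in_stream, named_in_tables)
```

An empty report from this kind of check says only that the tables and the stream agree with each other, not that the content on each PID is the right program.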

Another useful PSI test is a check of program content. Just because there are no unreferenced or missing PIDs indicated does not mean that the viewer is receiving the correct program. There may also be a mismatch of the audio content from one program being delivered with the video content from another program. Because MPEG allows multiple audio channels for multiple languages, an air-check can ensure that viewers are receiving the correct language.

It is possible to use a set-top box and television to do the air check, but a better way would be to use an MPEG test set that incorporates all the PSI table checks plus a built-in decompressor with picture and audio display. This would allow you to correlate PSI contents and actual program content as well as allow a quick visual and aural check of ES.

By Ned Soseman, Broadcast Engineering