This article is from the MPEG FAQ, by Frank Gadegast phade@cs.tu-berlin.de with numerous contributions by others.
The MPEG-1 specification (official title: ISO/IEC 11172, Information
technology -- Coding of moving pictures and associated audio for
digital storage media at up to about 1.5 Mbit/s, copyright 1993)
consists of five parts. Each document is a part of ISO/IEC 11172. The
first three parts reached International Standard (IS) status in 1993,
and Part 4 reached IS in 1994. Part 5 is expected to go IS in mid-1995.
Part 1---Systems: The first part of the MPEG standard has two primary
purposes: 1) a syntax for transporting packets of audio and video
bitstreams over digital channels and storage media (DSM), and 2) a
syntax for synchronizing the video and audio streams.
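As a minimal sketch of the packet syntax (illustrative only; the function and message strings below are my own, though the code values come from ISO/IEC 11172-1), a demultiplexer locates each construct by its start code: the 24-bit prefix 0x000001 followed by one code byte.

```python
def classify_start_codes(data: bytes):
    """Yield (offset, description) for each MPEG-1 system start code in data."""
    results = []
    i = 0
    while i + 3 < len(data):
        # Every MPEG start code begins with the 24-bit prefix 0x000001.
        if data[i] == 0 and data[i + 1] == 0 and data[i + 2] == 1:
            code = data[i + 3]
            if code == 0xBA:
                kind = "pack header (carries the system clock reference)"
            elif code == 0xBB:
                kind = "system header"
            elif 0xC0 <= code <= 0xDF:
                kind = f"audio packet, stream {code - 0xC0}"
            elif 0xE0 <= code <= 0xEF:
                kind = f"video packet, stream {code - 0xE0}"
            elif code == 0xB9:
                kind = "ISO 11172 end code"
            else:
                kind = f"other start code 0x{code:02X}"
            results.append((i, kind))
            i += 4
        else:
            i += 1
    return results
```

A real demultiplexer would go on to read packet lengths and the SCR/PTS timestamp fields that drive audio/video synchronization; this sketch stops at locating and naming the start codes.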
Part 2---Video: describes syntax (header and bitstream elements) and
semantics (algorithms telling what to do with the bits). Video breaks
the image sequence into a series of nested layers, each containing a
finer granularity of sample clusters (sequence, picture, slice,
macroblock, block, sample/coefficient). At each layer, algorithms are
made available which can be used in combination to achieve efficient
compression. The syntax also provides a number of different means for
assisting decoders in synchronization, random access, buffer
regulation, and error recovery. The highest layer, sequence, defines
the frame rate and picture pixel dimensions for the encoded image
sequence.
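The upper layers of that hierarchy are each introduced by their own start code in the bitstream, which is what makes random access and error recovery possible. A small sketch (the function name and descriptions are mine; the code values are from ISO/IEC 11172-2):

```python
def video_layer(start_code_byte: int) -> str:
    """Map the byte following the 0x000001 prefix to the video layer it opens.

    Note: the macroblock and block layers are not byte-aligned and have
    no start codes of their own; a decoder reaches them by parsing the
    slice data.
    """
    if start_code_byte == 0xB3:
        return "sequence header (frame rate, picture dimensions)"
    if start_code_byte == 0xB8:
        return "group of pictures (random-access point)"
    if start_code_byte == 0x00:
        return "picture"
    if 0x01 <= start_code_byte <= 0xAF:
        # The slice start-code value doubles as the slice's vertical position.
        return f"slice (vertical position {start_code_byte})"
    return "not a video-layer start code"
```

Because a decoder can resynchronize at any slice start code, a transmission error corrupts at most the remainder of one slice rather than the whole picture.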
Part 3---Audio: describes syntax and semantics for three classes of
compression methods. Known as Layers I, II, and III, the classes trade
increased syntax and coding complexity for improved coding efficiency
at lower bitrates. Layer II is the industry favorite, applied
almost exclusively in satellite broadcasting (Hughes DSS) and compact
disc video (White Book). Layer I is similar in complexity,
efficiency, and syntax to the coding used in the Sony MiniDisc and the
Philips Digital Compact Cassette (DCC). Layer III has found a home in
ISDN, satellite, and Internet audio applications. The sweet spots for
the three layers are 384 kbit/s (DCC), 224 kbit/s (CD Video, DSS), and
128 kbit/s (ISDN/Internet), respectively.
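All three layers share the same 32-bit frame header, and the 2-bit layer field is how a decoder tells them apart. A minimal sketch (function name mine; the field layout and code values are from ISO/IEC 11172-3):

```python
def audio_layer(header: bytes) -> str:
    """Return the layer encoded in a 4-byte MPEG-1 audio frame header."""
    word = int.from_bytes(header[:4], "big")
    # Bits 31..20: the 12-bit syncword, all ones.
    if (word >> 20) & 0xFFF != 0xFFF:
        raise ValueError("no syncword")
    # Bit 19 is the ID bit; bits 18..17 are the layer field.
    layer_bits = (word >> 17) & 0x3
    # Per the spec: 11 = Layer I, 10 = Layer II, 01 = Layer III, 00 = reserved.
    return {3: "Layer I", 2: "Layer II", 1: "Layer III"}.get(layer_bits, "reserved")
```

The remaining header bits (bitrate index, sampling frequency, mode, and so on) are what a full decoder would read next; only the layer field is decoded here.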
Part 4---Conformance: (circa 1992) defines the meaning of MPEG
conformance for all three parts (Systems, Video, and Audio), and
provides two sets of test guidelines for determining compliance in
bitstreams and decoders. MPEG does not directly address encoder
compliance.
Part 5---Software Simulation: contains example ANSI C software
encoders and compliant decoders for video and audio. An example
systems codec is also provided, which can multiplex and demultiplex
separate video and audio elementary streams contained in computer
data files.
As of March 1995, the MPEG-2 volume consists of a total of 9 parts
under ISO/IEC 13818. Part 2 was jointly developed with the ITU-T,
where it is known as Recommendation H.262. The full title is:
ISO/IEC 13818, Information technology -- Generic coding of moving
pictures and associated audio. The first five parts are organized in
the same fashion as MPEG-1 (Systems, Video, Audio, Conformance, and
Software). The four additional parts are listed below:
Part 6 Digital Storage Medium Command and Control (DSM-CC): provides a
syntax for controlling VCR-style playback and random access of
bitstreams encoded onto digital storage media such as compact disc.
Playback commands include Still Frame, Fast Forward, Advance, and Goto.
Part 7 Non-Backwards Compatible Audio (NBC): addresses the need for a
new syntax to efficiently decorrelate discrete multichannel surround
sound audio. By contrast, MPEG-2 audio (13818-3) attempts to code the
surround channels as ancillary data alongside the MPEG-1
backwards-compatible Left and Right channels. This allows existing
MPEG-1 decoders to parse and decode only the two primary channels while
ignoring the side channels (parsing them to /dev/null). This is
analogous to the Base Layer concept in MPEG-2 scalable video. NBC
candidates include non-compatible syntaxes such as Dolby AC-3. The
final document is not expected until 1996.
Part 8 10-bit video extension: introduced in late 1994, this
extension to the video part (13818-2) describes the syntax and
semantics for a coded representation of video with 10 bits of sample
precision. The primary application is studio video (distribution,
editing, archiving). Methods have been investigated by Kodak and
Tektronix which employ spatial scalability, where the 8-bit signal
becomes the Base Layer, and the 2-bit differential signal is coded as
an Enhancement Layer. The final document is not expected until 1997 or
1998. [Part 8 will be withdrawn]
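A toy sketch of the base/enhancement idea (my own illustration, not the investigated coding schemes, which code a differential signal rather than raw bits): a 10-bit studio sample can be carried as an 8-bit base layer that legacy decoders use directly, plus 2 extra bits that only enhancement-capable decoders recombine.

```python
def split_10bit(sample: int) -> tuple[int, int]:
    """Split a 10-bit sample (0..1023) into (base, enhancement)."""
    base = sample >> 2           # top 8 bits: usable by an 8-bit decoder
    enhancement = sample & 0x3   # bottom 2 bits: enhancement-layer data
    return base, enhancement

def reconstruct(base: int, enhancement: int) -> int:
    """Recombine the layers into the original 10-bit sample."""
    return (base << 2) | enhancement
```

The split is lossless when both layers are decoded, while a base-only decoder simply sees a correctly scaled 8-bit picture.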
Part 9 Real-time Interface (RTI): defines a syntax for video on demand
control signals between set-top boxes and head-end servers.