This article is from the CD-Recordable FAQ, by Andy McFadden (firstname.lastname@example.org) with numerous contributions by others.
EFM encoding is applied to a series of bytes called a "frame". Some
sources -- including the SCSI-3 MMC specification -- refer to a CD sector
as a "frame", but that's incorrect usage. A frame holds 24 bytes of user
data, 1 byte of subcode data, and 8 bytes of parity (error correction),
for a total of 33 bytes.
When read from the disc, each frame is preceded by a 24-bit synchronization
pattern and 3 merging bits. The sync pattern is unique, appearing nowhere
else on the disc, so it lets the drive reliably find the start of the frame.
(The pattern is 100000000001000000000010: three transitions separated by
two consecutive 11T intervals, a sequence that can't occur anywhere else
because the merging bits are specifically chosen to prevent it.) If you don't
understand why having a sync field is important, remember that every time
the read head seeks to a new part of the disc or is confused by a scratch,
it has to start reading in the middle of a stream of 1s and 0s and try to
make sense of what it's reading. Until it sees a synchronization pattern,
it has no idea if it's reading the start or middle of a frame, or even if
it's at the start or middle of an EFM word.
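The search for the sync pattern can be sketched in a few lines. This is a minimal illustration that assumes the demodulated channel bits are available as a Python string of '0' and '1' characters (a readability assumption; a real drive does this in hardware on a clocked bit stream). `find_frame_starts` is a hypothetical name. It returns the offset just past each sync pattern and its 3 merging bits, which is where the first 14-bit EFM word of the frame begins.

```python
from typing import List

# 24-bit frame sync pattern quoted in the text
SYNC = "100000000001000000000010"
MERGING_BITS = 3


def find_frame_starts(stream: str) -> List[int]:
    """Return the offset of the first EFM word of each frame in `stream`.

    `stream` is a hypothetical string of '0'/'1' channel bits. The reader may
    have started anywhere in it, so we simply scan for the sync pattern.
    """
    starts = []
    pos = stream.find(SYNC)
    while pos != -1:
        # Skip the sync pattern and the merging bits that follow it.
        starts.append(pos + len(SYNC) + MERGING_BITS)
        pos = stream.find(SYNC, pos + 1)
    return starts


# Usage: two 588-bit frames, with reading starting 2 bits into earlier data.
frame = SYNC + "0" * (588 - len(SYNC))  # payload bits are placeholders
print(find_frame_starts("11" + frame + frame))  # [29, 617]
```

Until the first sync is seen, the two leading bits are meaningless to the reader, which is exactly the situation described above after a seek or a scratch.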
Each of the 33 bytes in the frame is read as a 14-bit EFM value followed by
3 merging bits. This means there are 588 (24 + 3 + (14+3)*33) "channel bits"
in a frame. This 588-bit structure is called a "Channel Frame".
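The 588 figure can be checked directly, and the same sketch shows the run-length rule that makes the sync pattern stand out. EFM is a run-length-limited code: between two consecutive 1s in the channel bits there must be at least 2 and at most 10 zeros (3T to 11T), and the sync pattern uses the maximum run twice in a row. `runs_ok` is a hypothetical helper that only checks runs between 1s inside the string it is given; it ignores how runs continue across word boundaries, which is what the merging bits take care of.

```python
SYNC = "100000000001000000000010"

# sync + its merging bits + 33 EFM words of 14 bits, each followed by 3 merging bits
CHANNEL_BITS_PER_FRAME = 24 + 3 + (14 + 3) * 33


def runs_ok(bits: str, d: int = 2, k: int = 10) -> bool:
    """True if every run of 0s between two consecutive 1s has length d..k."""
    ones = [i for i, b in enumerate(bits) if b == "1"]
    return all(d <= right - left - 1 <= k for left, right in zip(ones, ones[1:]))


print(CHANNEL_BITS_PER_FRAME)         # 588
print(runs_ok(SYNC))                  # True: two runs of 10 zeros (11T each)
print(runs_ok("101"))                 # False: only 1 zero between 1s
print(runs_ok("1" + "0" * 11 + "1"))  # False: 11 zeros would be a 12T run
```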
Once EFM is decoded and the merging bits discarded, we are left with an
"F3 Frame". The subcode byte is removed, and the remaining data (now an
"F2 Frame") is passed into the CIRC (Cross-Interleave Reed-Solomon) decoder.
The decoder is a large part of the reason CDs and CD-ROMs work at all.
The raw error rate from a CD is around 1 error per 100K to 1 million bits.
That's pretty good, but at 4 million bits per second (588 channel bits
per frame x 98 frames per sector x 75 sectors per second = 4.3218Mbps),
the errors add up quickly. CIRC encoding takes the 192 bits (24 bytes)
of data, adds 64 bits (8 bytes) of parity, shuffles them around, and
performs some weird math involving Galois Fields (Reed-Solomon codes over
GF(2^8)). On playback, the bits are processed by two error correction
stages, referred to as C1 and C2. How well those stages are coping can be
expressed as a set of error counts.
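The "weird math" is arithmetic in GF(2^8), where each byte is treated as a polynomial over bits: addition is XOR, and multiplication is carry-less multiplication reduced by a fixed degree-8 polynomial. A minimal sketch, assuming the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D); it shows only the field arithmetic the Reed-Solomon codes are built on, not the C1/C2 decoders themselves. `gf_mul` and `gf_inv` are hypothetical names.

```python
FIELD_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as elements of GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a       # adding field elements is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:         # an x^8 term appeared: reduce it
            a ^= FIELD_POLY
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^255 == 1 for nonzero a, so a^-1 == a^254."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, power, e = 1, a, 254
    while e:                  # square-and-multiply
        if e & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        e >>= 1
    return result


print(hex(gf_mul(0x80, 2)))        # 0x1d: x^7 * x wraps around via the polynomial
print(gf_mul(0x53, gf_inv(0x53)))  # 1
```

Because every nonzero byte has an inverse, the decoder can solve equations over bytes exactly, which is what lets Reed-Solomon codes locate and fix bad symbols rather than just detect them.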
Errors are labeled "E" followed by a two-digit number: the first digit is
the number of bad symbols and the second is the CIRC decoder stage. The E11
count indicates the number of single-symbol (correctable) errors in the C1
decoder. E21 indicates double-symbol (correctable) errors in C1, and E31
indicates triple-symbol (uncorrectable at C1) errors in C1. The sum of these
counts is the Block Error Rate (BLER), a measure of both correctable and
uncorrectable errors. The CD standard sets the acceptable limit at 220 BLER
errors per second, averaged over a 10-second stretch.
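The naming scheme and the BLER limit can be expressed in a few lines. A minimal sketch: `error_label` and `worst_bler` are hypothetical helpers, and the per-second counts are assumed to come from some scanning tool as (E11, E21, E31) tuples. Two further assumptions are labeled in the code: three or more bad symbols are reported in the "3" bucket, and "averaged over a 10-second stretch" is read as every sliding 10-second window.

```python
from typing import List, Tuple

BLER_LIMIT = 220  # errors per second, averaged over 10 seconds (from the text)


def error_label(bad_symbols: int, stage: int) -> str:
    """Name an error as E<bad symbols><decoder stage>."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 (C1) or 2 (C2)")
    if bad_symbols < 1:
        raise ValueError("no error to label")
    # Assumption: three or more bad symbols are all counted as "3".
    return f"E{min(bad_symbols, 3)}{stage}"


def worst_bler(per_second: List[Tuple[int, int, int]], window: int = 10) -> float:
    """Highest average of E11 + E21 + E31 over any `window` consecutive seconds.

    Assumption: the 10-second average is evaluated as a sliding window.
    """
    totals = [e11 + e21 + e31 for e11, e21, e31 in per_second]
    if len(totals) < window:
        raise ValueError(f"need at least {window} seconds of counts")
    return max(sum(totals[i:i + window]) / window
               for i in range(len(totals) - window + 1))


samples = [(150, 20, 5)] * 9 + [(2000, 0, 0)]  # made-up counts for illustration
print(error_label(1, 1), error_label(3, 2))      # E11 E32
print(worst_bler(samples) <= BLER_LIMIT)         # False (357.5 > 220)
```

Note how a single bad second is enough to push the 10-second average over the limit in this made-up example, even though the other nine seconds are well under it.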
The E12 count indicates the number of single-symbol (correctable) errors
in the C2 decoder. Because the data is deinterleaved between the C1 and C2
stages, one E31 error can generate up to 30 E12 errors, so a high E12 count
is not a problem in itself. E22 counts double-symbol (correctable) errors,
which are a bad sign. The sum of E21 and E22 forms a burst error count
(BST), which can be used to identify physical defects on a disc.
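Why one bad C1 frame can turn into many E12 errors is easier to see with a toy model of the deinterleaver. This sketch assumes each of the 28 symbols leaving C1 goes to C2 through its own delay line, with symbol position i delayed by 4*i frames; that delay pattern is an illustrative assumption, and the real CIRC layout differs in detail. `spread_bad_frame` is a hypothetical name.

```python
from collections import Counter

SYMBOLS_PER_C1_FRAME = 28  # 32 bytes into C1, minus 4 bytes of C1 parity
DELAY_STEP = 4             # assumed delay increment, in frames


def spread_bad_frame(bad_frame: int) -> Counter:
    """Count bad symbols per C2 input frame when every symbol of
    C1 output frame `bad_frame` is wrong (toy delay-line model)."""
    hits = Counter()
    for i in range(SYMBOLS_PER_C1_FRAME):
        c2_frame = bad_frame + DELAY_STEP * i  # symbol i is delayed by 4*i frames
        hits[c2_frame] += 1
    return hits


hits = spread_bad_frame(100)
print(len(hits), max(hits.values()))  # 28 1
```

In this toy model, one fully corrupted C1 frame becomes a single bad symbol in each of 28 different C2 frames, and each of those shows up as an easily corrected E12 error.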
Any E32 errors, representing triple-symbol (uncorrectable) errors in the C2
decoder, result in damaged data. On an audio CD the player interpolates
across the damage; on a CD-ROM the damaged data must be repaired at a higher
level. (This, incidentally, explains how some forms of audio CD copy
protection work. The CD author introduces deliberate uncorrectable errors
to the CD. An audio player will inaudibly interpolate across them, but a
CD-ROM drive performing digital audio extraction will simply return the bad
bits.) Some software, e.g. Plextor's PlexTools, refers to E32 errors as
"CU errors".
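The text only says that an audio player interpolates across uncorrectable samples. The sketch below assumes the simplest version of that: linear interpolation between the nearest good samples, with flagged samples represented as `None`. Real players may conceal errors differently, including by muting. `conceal` is a hypothetical name.

```python
from typing import List, Optional


def conceal(samples: List[Optional[int]]) -> List[int]:
    """Replace None (uncorrectable) samples by linear interpolation between
    the nearest good neighbours; at the edges, repeat the nearest good sample."""
    good = [i for i, s in enumerate(samples) if s is not None]
    if not good:
        return [0] * len(samples)  # nothing to interpolate from: mute
    out = []
    for i, s in enumerate(samples):
        if s is not None:
            out.append(s)
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out.append(samples[right])
        elif right is None:
            out.append(samples[left])
        else:
            frac = (i - left) / (right - left)
            out.append(round(samples[left] + frac * (samples[right] - samples[left])))
    return out


print(conceal([0, 10, None, 30]))  # [0, 10, 20, 30]
```

A drive doing digital audio extraction has no such step, which is why the deliberately damaged samples come through as-is.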
With CIRC, the bit error rate is reduced to roughly one in 10 to 100
billion. The 24 bytes that come out of the CIRC decoder are referred to as
an "F1 Frame".
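The numbers in the last few paragraphs can be put side by side. This worked example assumes the raw error rate applies to channel bits and the post-CIRC rate applies to the 24-byte F1 user data; the text doesn't say exactly which bits each figure counts.

```python
CHANNEL_BITS_PER_FRAME = 588
USER_BYTES_PER_FRAME = 24  # F1 frame
FRAMES_PER_SECTOR = 98
SECTORS_PER_SECOND = 75    # 1x speed

frames_per_second = FRAMES_PER_SECTOR * SECTORS_PER_SECOND  # 7350
channel_bps = CHANNEL_BITS_PER_FRAME * frames_per_second    # 4,321,800
user_bps = USER_BYTES_PER_FRAME * 8 * frames_per_second     # 1,411,200


def errors_per_second(bps: float, bits_per_error: float) -> float:
    return bps / bits_per_error


# Before CIRC: roughly 4 to 43 raw errors every second.
for bits_per_error in (1e5, 1e6):
    rate = errors_per_second(channel_bps, bits_per_error)
    print(f"raw, 1 in {bits_per_error:.0e}: {rate:.1f} errors/s")

# After CIRC: hours between errors in the user data.
for bits_per_error in (1e10, 1e11):
    hours = bits_per_error / user_bps / 3600
    print(f"CIRC, 1 in {bits_per_error:.0e}: one error every {hours:.1f} h")
```

As a sanity check, the user data rate of 1,411,200 bits per second is exactly 44,100 samples/s x 16 bits x 2 channels, the audio CD rate.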
It's worth noting that the subcode channels are not CIRC-encoded, and hence
are the least-reliable storage directly accessible to the user. EFM itself
offers some ability to detect single-bit errors, because only 256 of the
16,384 possible 14-bit combinations are valid. Without any parity bits,
though, the best the drive can do is tell you that it failed to read the
data correctly. The Q subcode channel, which can hold vital information
about the disc, is protected by a 16-bit CRC.
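A sketch of how a program might check the Q channel's CRC. It assumes details not stated above: the CRC uses the polynomial x^16 + x^12 + x^5 + 1 (0x1021) with a zero initial value, covers the 10 bytes of control, ADR, and data in each sector's Q subcode, and is stored big-endian with its bits inverted. `crc16_0x1021` and `q_crc_ok` are hypothetical names. As the text says, a failed check only tells you the data is bad; there is nothing to correct it with.

```python
def crc16_0x1021(data: bytes) -> int:
    """MSB-first CRC-16, polynomial x^16 + x^12 + x^5 + 1, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def q_crc_ok(q: bytes) -> bool:
    """Check 12 bytes of Q subcode: 10 bytes of control/ADR/data + 2 CRC bytes.

    Assumes the stored CRC is big-endian and bit-inverted.
    """
    if len(q) != 12:
        raise ValueError("expected 12 bytes of Q subcode")
    stored = int.from_bytes(q[10:12], "big")
    return stored == (crc16_0x1021(q[:10]) ^ 0xFFFF)


payload = bytes(range(10))  # placeholder control/ADR/data bytes
good = payload + (crc16_0x1021(payload) ^ 0xFFFF).to_bytes(2, "big")
bad = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit
print(q_crc_ok(good), q_crc_ok(bad))      # True False
```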