This article is from the MPEG FAQ, by Frank Gadegast phade@cs.tu-berlin.de with numerous contributions by others.
Well, with audio you basically have two alternatives. Either
you sample less often or you sample with less resolution (fewer
than 16 bits per sample). If you want quality you can't do much
with the sample frequency. Humans can hear sounds with
frequencies from about 20Hz to 20kHz. According to the Nyquist
theorem you must sample at a rate of at least twice the highest
frequency you want to reproduce. Allowing for imperfect
filters, a 44.1kHz sampling rate is a fair minimum. So
you either set out to prove the Nyquist theorem is wrong or
go to work on reducing the resolution. The MPEG committee
chose the latter.
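To put numbers on the two alternatives, here is a minimal Python
sketch. The function and constant names are made up for this
illustration, and the data rate is for a single uncompressed
channel; only the 20kHz, 44.1kHz and 16 bit figures come from
the text above.

```python
# Sketch of the two levers: sample rate and bits per sample.
HIGHEST_AUDIBLE_HZ = 20_000   # upper limit of hearing quoted in the text
CD_RATE_HZ = 44_100           # sampling rate quoted in the text
CD_BITS = 16                  # resolution quoted in the text

def min_sampling_rate_hz(highest_freq_hz):
    """Nyquist: sample at no less than twice the highest frequency."""
    return 2 * highest_freq_hz

def bits_per_second(rate_hz, bits_per_sample):
    """Raw data rate of one uncompressed channel."""
    return rate_hz * bits_per_sample

nyquist_minimum_hz = min_sampling_rate_hz(HIGHEST_AUDIBLE_HZ)
# Gap between the top of the audible band and half the sampling rate,
# which is the room an imperfect filter has to work with.
filter_headroom_hz = CD_RATE_HZ / 2 - HIGHEST_AUDIBLE_HZ

print("Nyquist minimum:", nyquist_minimum_hz, "Hz")
print("Room left for the filter:", filter_headroom_hz, "Hz")
print("16 bit:", bits_per_second(CD_RATE_HZ, CD_BITS), "bit/s per channel")
print(" 8 bit:", bits_per_second(CD_RATE_HZ, 8), "bit/s per channel")
```

Halving the resolution halves the data rate without touching the
sample rate, which is why the resolution is the lever worth
pulling.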
Now, the real reason for using 16 bits is to get a good
signal-to-noise (s/n) ratio. The noise we're talking
about here is quantization noise from the digitizing
process. For each bit you add, you get about 6dB
better s/n. (6dB corresponds to a doubling of the signal
amplitude.) Sixteen bits give roughly 96dB in theory, and
CD-audio achieves about 90dB s/n in practice. This
matches the dynamic range of the ear fairly well. That is, you
will not hear any noise coming from the system itself (well,
there are still some people arguing about that, but let's not
worry about them for the moment).
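The 6dB-per-bit rule falls straight out of the decibel
definition: every extra bit doubles the number of amplitude
steps, and a factor of two in amplitude is 20*log10(2) dB. The
sketch below uses that simple rule only; the function names are
invented for the example, and the result is a theoretical
ceiling, not a measurement of any real converter.

```python
import math

def db_per_bit():
    """Gain in s/n from one extra bit (one doubling of amplitude steps)."""
    return 20 * math.log10(2)

def theoretical_snr_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

print("per bit:", round(db_per_bit(), 2), "dB")
for bits in (8, 16):
    print(bits, "bits:", round(theoretical_snr_db(bits), 1), "dB")
```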
So what happens when you sample at 8-bit resolution? You get
a very noticeable noise floor in your recording, since the
6dB-per-bit rule leaves only about 48dB s/n. You can
easily hear this in silent moments in the music or between
words or sentences if your recording is a human voice.
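You can also measure the noise floor instead of taking the rule
of thumb on trust. The following is a rough sketch, assuming a
full-scale 997Hz test tone, a simple round-to-nearest quantizer
and one second of samples at 44.1kHz; all names and the test
tone are choices made for this example, not part of MPEG.

```python
import math

def quantize(x, bits):
    """Round x in [-1.0, 1.0] to the nearest level of a signed sample."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale

def measured_snr_db(bits, num_samples=44_100, freq_hz=997.0, rate_hz=44_100.0):
    """Quantize a full-scale sine and compare signal power to error power."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = math.sin(2.0 * math.pi * freq_hz * n / rate_hz)
        err = quantize(x, bits) - x   # the quantization noise itself
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)

snr_8 = measured_snr_db(8)
snr_16 = measured_snr_db(16)
print("8 bit: ", round(snr_8, 1), "dB")
print("16 bit:", round(snr_16, 1), "dB")
print("difference:", round(snr_16 - snr_8, 1), "dB")
```

The two figures should land within a couple of dB of the rule of
thumb, and the gap between them should be close to 8 x 6dB. That
gap is the noise floor you hear in the silent moments.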
Waitaminnit. You don't notice any noise in loud passages,
right? That is because the loud signal drowns out the noise.
This is the masking effect and is the key to MPEG Audio
coding. Stuff like the masking effect belongs to a science
called psychoacoustics that deals with the way the human
brain perceives sound.
And MPEG uses psychoacoustic principles to decide where it
can get away with lower resolution.
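The loud-versus-quiet observation can be put into rough numbers.
The sketch below is NOT a masking model and is not how MPEG
decides anything; it only shows how far a passage sits above a
fixed quantization noise floor. The two passage levels (-3dB and
-40dB relative to full scale) are hypothetical values picked for
illustration.

```python
import math

def noise_floor_dbfs(bits):
    """Quantization noise floor relative to full scale (6dB-per-bit rule)."""
    return -20 * math.log10(2 ** bits)

def margin_over_noise_db(passage_level_dbfs, bits):
    """How far a passage at the given level sits above the noise floor."""
    return passage_level_dbfs - noise_floor_dbfs(bits)

LOUD_PASSAGE_DBFS = -3.0     # hypothetical loud passage
QUIET_PASSAGE_DBFS = -40.0   # hypothetical quiet passage

loud_margin = margin_over_noise_db(LOUD_PASSAGE_DBFS, 8)
quiet_margin = margin_over_noise_db(QUIET_PASSAGE_DBFS, 8)
print("loud passage over 8 bit noise: ", round(loud_margin, 1), "dB")
print("quiet passage over 8 bit noise:", round(quiet_margin, 1), "dB")
```

With the same noise floor, the loud passage towers over the
noise while the quiet one barely clears it, which is when the
noise becomes audible.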
 