This article is from the MPEG FAQ, by Frank Gadegast (phade@cs.tu-berlin.de), with numerous contributions by others.
Oh, right. Well, I don't know as much about the audio compression.
Basically, they use very carefully developed psychoacoustic models,
derived from experiments with the best obtainable listeners, to
pick out the parts of the sound that you can't hear. There are
so-called "masking" effects: for example, a loud component at one
frequency will prevent you from hearing quieter components at
nearby frequencies, and how much quieter a component has to be,
as a function of its distance in frequency from the masker, is
described by an empirically measured curve. There are similar
temporal masking effects, as well as some more complicated
interactions where a temporal effect can unmask a frequency, and
vice versa.
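To make the masking idea concrete, here is a minimal Python sketch of simultaneous (frequency) masking by a single loud tone. It is not the MPEG psychoacoustic model. The Zwicker and Terhardt Bark-scale approximation, the triangular spreading curve, and the constants offset_db, lower_slope and upper_slope are illustrative assumptions standing in for the "empirical curve" mentioned above. Temporal masking is not modelled at all.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker/Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_threshold_db(masker_hz: float, masker_db: float, probe_hz: float,
                         offset_db: float = 10.0,
                         lower_slope: float = 27.0,
                         upper_slope: float = 10.0) -> float:
    """Level (dB) below which a probe tone is assumed inaudible next to one masker.

    Assumed triangular spreading function on the Bark scale: the threshold
    peaks offset_db below the masker level and falls off by lower_slope
    dB/Bark below the masker and upper_slope dB/Bark above it.
    These constants are illustrative, not taken from the MPEG tables.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)  # distance in Bark
    slope = lower_slope if dz < 0 else upper_slope
    return masker_db - offset_db - slope * abs(dz)


def is_masked(masker_hz: float, masker_db: float,
              probe_hz: float, probe_db: float) -> bool:
    """True if the probe lies under the masking threshold (can be dropped)."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


if __name__ == "__main__":
    # A loud 80 dB tone at 1 kHz and a quieter 50 dB probe tone.
    print(is_masked(1000, 80, 1100, 50))  # True: close in frequency, hidden
    print(is_masked(1000, 80, 4000, 50))  # False: far away, still audible
```

The steeper slope below the masker than above it reflects the commonly reported asymmetry of masking, which spreads more towards higher frequencies than towards lower ones.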
The sound is broken up into spectral chunks with a hybrid scheme
that combines cosine transforms with subband filter banks, and the
psychoacoustic model is expressed in terms of those chunks. Whatever
can be removed or reduced in precision is, and the remainder is
sent. It's a little more complicated than that, since the available
bits have to be allocated across the bands. And, of course, what is
sent is entropy coded.
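The bit allocation step can be sketched the same way. Below is a minimal greedy allocator in Python. It assumes a signal-to-mask ratio (SMR) in dB has already been computed for each band, that each extra bit per sample lowers quantization noise by roughly 6.02 dB, and that each band carries samples_per_band samples per frame. The names and constants (smr_db, budget, samples_per_band=12, max_bits=15) are hypothetical. The scheme is closer in spirit to Layers I and II than to the iterative quantization loops of Layer III, and the entropy coding step is not shown.

```python
from typing import List


def allocate_bits(smr_db: List[float], budget: int,
                  samples_per_band: int = 12, max_bits: int = 15) -> List[int]:
    """Greedy allocation: repeatedly give one more bit per sample to the band
    whose quantization noise is most audible (highest noise-to-mask ratio).

    Assumptions (hypothetical, for illustration only):
      - SNR of a band quantized with b bits per sample is about 6.02 * b dB,
      - NMR = SMR - SNR, so noise is inaudible once NMR <= 0,
      - one extra bit per sample costs samples_per_band bits of the budget.
    """
    bits = [0] * len(smr_db)
    while budget >= samples_per_band:
        # Noise-to-mask ratio of every band that can still take another bit.
        candidates = [
            (smr - 6.02 * b, i)
            for i, (smr, b) in enumerate(zip(smr_db, bits))
            if b < max_bits
        ]
        if not candidates:
            break  # every band is already at max_bits
        nmr, band = max(candidates)
        if nmr <= 0:
            break  # remaining noise is below the masking threshold everywhere
        bits[band] += 1
        budget -= samples_per_band
    return bits


if __name__ == "__main__":
    smr = [20.0, 5.0, -3.0]  # third band is entirely masked
    print(allocate_bits(smr, budget=1000))  # [4, 1, 0]
    print(allocate_bits(smr, budget=36))    # [3, 0, 0]: tight budget
```

A band whose SMR is already at or below 0 dB (everything in it lies under the masking threshold) gets no bits at all, which is the "can be removed" case; when the budget is tight, the band whose noise would be most audible is served first.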
 
Continue to: