This article is from the MPEG FAQ, by Frank Gadegast phade@cs.tu-berlin.de with numerous contributions by others.
A. Experiments showed that little compaction gain could be achieved with
larger transform sizes, especially in light of the increased
implementation complexity. A fast DCT algorithm will require roughly
double the number of arithmetic operations per sample when the linear
transform point size is doubled. Naturally, the best compaction
efficiency has been demonstrated using locally adaptive block sizes
(e.g. 16x16, 16x8, 8x8, 8x4, and 4x4). [See Gary Sullivan and Rich
Baker, "Efficient Quadtree Coding of Images and Video," ICASSP 91,
pp. 2661-2664.]
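
The diminishing return from larger transforms can be illustrated with
a small calculation. The sketch below is not taken from the MPEG
experiments: it assumes a one-dimensional, unit-variance AR(1) source
with correlation 0.95 (a common textbook stand-in for image rows) and
uses transform coding gain, the ratio of the arithmetic to the
geometric mean of the coefficient variances, as the compaction
measure. The function names and the choice of rho are illustrative.

```python
import math

def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of length n."""
    return [
        [math.sqrt((1.0 if k == 0 else 2.0) / n)
         * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
         for i in range(n)]
        for k in range(n)
    ]

def coding_gain_db(n, rho=0.95):
    """Coding gain (dB) of an n-point DCT on an assumed AR(1) source."""
    variances = []
    for row in dct_basis(n):
        # Variance of one coefficient: row * R * row^T, R[i][j] = rho^|i-j|.
        v = sum(row[i] * row[j] * rho ** abs(i - j)
                for i in range(n) for j in range(n))
        variances.append(v)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return 10.0 * math.log10(arithmetic / geometric)

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, round(coding_gain_db(n), 2))
```

Under this model the step from 8 to 16 points adds less than 1 dB of
coding gain, and the gain is bounded for any size by the source's
theoretical limit, 10*log10(1/(1 - rho^2)). Real images are not AR(1)
processes, so the numbers are only indicative.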
Inevitably, adaptive transform block sizes introduce additional
side information overhead while forcing the decoder to implement
programmable or hardwired recursive DCT algorithms. If the DCT size
becomes too large, then more edges (local discontinuities) and the like
are absorbed into a single transform block, so that Gibbs (ringing)
artifacts and other unpleasant phenomena propagate over a wider area.
Finally, with larger transform sizes, the DC term becomes even more
sensitive to quantization noise.
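
The wider spread of ringing can be seen in a one-dimensional toy
example. The sketch below is an illustration under stated assumptions,
not a model of any MPEG encoder: it uses a direct (slow) orthonormal
DCT, a 16-sample step edge, and simply zeroes the upper half of each
block's coefficients as a crude stand-in for coarse quantization. The
helper names are hypothetical.

```python
import math

def _scale(k, n):
    """Orthonormal DCT scale factor for coefficient index k."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)

def dct(x):
    """Orthonormal DCT-II of a list of samples (direct O(N^2) form)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                           for i in range(n))
        for k in range(n)
    ]

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n))
        for i in range(n)
    ]

def blockwise_error(signal, block, keep):
    """Per-sample absolute error after keeping only the `keep` lowest
    coefficients of each `block`-point transform."""
    err = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        coeffs = dct(chunk)
        coeffs[keep:] = [0.0] * (block - keep)  # discard high frequencies
        recon = idct(coeffs)
        err.extend(abs(a - b) for a, b in zip(chunk, recon))
    return err

# Step edge at sample 11: flat at 0, then flat at 100.
signal = [0.0] * 11 + [100.0] * 5
err8 = blockwise_error(signal, 8, 4)     # two 8-point blocks
err16 = blockwise_error(signal, 16, 8)   # one 16-point block

if __name__ == "__main__":
    print("8-point :", [round(e, 1) for e in err8])
    print("16-point:", [round(e, 1) for e in err16])
```

With 8-point blocks the flat first block (samples 0 to 7) is
reconstructed without error, and the ringing stays inside the block
that contains the edge. With a single 16-point block the same edge
disturbs samples 0 to 7 as well.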
 