What exactly is the quantization matrix? I get that increasing the image quality will lower the quantization coefficients (so fewer cells will zero out), but I'm not sure where those coefficients come from in the first place.
Quantization in general is a way to reduce the number of bits needed to store something by reducing its precision. One of the assumptions of JPEG is that we can generally ignore high-frequency changes because the eye isn't as sensitive to them. I think each element of the quantization matrix is the step size used for the corresponding cosine frequency: each DCT coefficient is divided by its matrix entry and rounded to the nearest integer, and the decoder multiplies back by the same entry when reconstructing the image. This would make sense because the values in the upper left are small, so the low-frequency coefficients are stored on a fine grid, whereas the values toward the bottom right are large, so the high-frequency coefficients are stored on a much coarser grid. A nice consequence is that many of the high-frequency cells round to 0. As for where the coefficients come from, I think they were chosen after careful experimentation on what the eye is sensitive to, much like the other experiments in this lecture.
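To make the step-size idea concrete, here is a minimal Python sketch. It assumes the table in question is the example luminance table from Annex K of the JPEG standard, and it borrows the quality-scaling rule used by the libjpeg reference encoder; both are assumptions about what the lecture showed. The function names and the sample coefficient block are made up for illustration.

```python
# Example luminance quantization table from Annex K of the JPEG standard
# (assumed here to be the table under discussion).
BASE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(base, quality):
    """Scale a base table for a quality setting in 1..100 (libjpeg-style rule)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Each entry is scaled by a percentage, then clamped to the range 1..255.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row] for row in base]


def quantize(coeffs, table):
    """Divide each DCT coefficient by its step size and round to an integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply each stored integer by its step size."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


def count_zeros(block):
    return sum(value == 0 for row in block for value in row)


# Hypothetical coefficient block whose magnitude falls off with frequency,
# standing in for the DCT of a smooth 8x8 image patch.
coeffs = [[round(400 / (1 + u + v) ** 2) for v in range(8)] for u in range(8)]

for quality in (10, 50, 90):
    table = scale_table(BASE_Q, quality)
    print(quality, count_zeros(quantize(coeffs, table)))
```

Lower quality settings scale every step size up, so more cells round to zero; higher settings shrink the step sizes and fewer cells are lost. The reconstruction error in each cell is at most half of that cell's step size.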
I have a follow-up question. Why are the values not monotonically increasing horizontally and vertically? For example, in the last row, why does it peak somewhere in the middle? Shouldn't we give the biggest step size to the highest-frequency component? Or is this JPEG's way of compensating for generally ignoring the edges?
@kapalani. Great question. It's because the human eye has its greatest contrast sensitivity at low-to-medium spatial frequencies (contrast sensitivity is the ability to distinguish between two different luminances). You'll notice that the quantization table values dip slightly in the first third of the table and then get larger. This dip roughly matches the human contrast sensitivity function, which peaks at 2-5 cycles per degree.
See the subsection "Contrast Sensitivity" in this article: https://en.wikipedia.org/wiki/Contrast_(vision)
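The non-monotonic pattern can be checked directly on the table. This is a small sketch, again assuming the table being discussed is the Annex K example luminance table (an assumption); the helper name is hypothetical.

```python
# Example luminance quantization table from Annex K of the JPEG standard
# (assumed here to be the table under discussion).
BASE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def is_nondecreasing(values):
    """True if the sequence never goes down from one entry to the next."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Which rows (horizontal frequency) and columns (vertical frequency)
# increase monotonically?
rows_monotone = [is_nondecreasing(row) for row in BASE_Q]
cols_monotone = [is_nondecreasing(col) for col in zip(*BASE_Q)]

# Where does the last row peak?
last_row = BASE_Q[-1]
peak_index = last_row.index(max(last_row))

# Where is the smallest step size, i.e. the most finely preserved frequency?
smallest = min(min(row) for row in BASE_Q)
smallest_at = [(u, v) for u in range(8) for v in range(8) if BASE_Q[u][v] == smallest]

print("monotone rows:", rows_monotone)
print("monotone columns:", cols_monotone)
print("last row peaks at index", peak_index)
print("smallest step", smallest, "at", smallest_at)
```

With this table, no row is monotone, and the smallest step size sits a couple of entries away from the DC corner rather than at it, which is the dip described above.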
It seems like JPEG would be a plausible compression format if EITHER real-world images are dominated by low frequencies OR human vision is insensitive to high frequencies. But through luck (or evolution?) BOTH are true, which makes JPEG even more effective.
Well, if high frequencies were dominant in the real world and measuring those frequencies was really important (e.g., predators or food only appeared as high-frequency information), then survival of the fittest would say that the most fit species would have good eyes for that information!