How JPEG Compression Actually Works

JPEG is the cockroach of file formats. It was standardized in 1992, predates the graphical web browser, and has survived every attempt to replace it. WebP, AVIF, HEIC — they all compress better, yet JPEG remains the most widely used image format on the internet. Part of that is inertia. But part of it is that JPEG's compression pipeline is genuinely brilliant — a masterclass in applied signal processing that's worth understanding even if you never write an encoder.

Most developers treat JPEG as a black box: put an image in, get a smaller file out, crank the quality slider until it looks acceptable. But understanding what happens inside the box changes how you think about images, compression, and the trade-offs between file size and visual quality.

The Pipeline at a Glance

JPEG compression is a pipeline of five stages, each doing something clever. The magic is that they compose: each stage sets up the next one to be more effective.

  1. Color space conversion — RGB to YCbCr (separate brightness from color)
  2. Chroma subsampling — reduce resolution of color channels
  3. Discrete Cosine Transform (DCT) — convert spatial data to frequency data
  4. Quantization — discard high-frequency detail (this is the lossy step)
  5. Entropy coding — lossless compression of the quantized data

Steps 1 and 2 exploit how human vision works. Steps 3 and 4 exploit properties of natural images. Step 5 is standard data compression. Each step is individually simple. The insight is the sequence.

Step 1: Color Space Conversion

Your camera captures images in RGB — red, green, and blue channels, each with 8 bits per pixel. JPEG's first move is to convert this into YCbCr: one luminance (brightness) channel and two chrominance (color) channels. This isn't compression yet — no data is lost. It's a coordinate transform, like rotating axes.

Why bother? Because human eyes are far more sensitive to brightness than to color. You can see incredibly fine detail in a black-and-white photograph, but your ability to perceive color resolution is much coarser. By separating brightness from color, JPEG can treat them differently — preserving brightness detail while aggressively reducing color detail.

RGB to YCbCr conversion (simplified):
Y  =  0.299R + 0.587G + 0.114B    (luminance — brightness)
Cb = -0.169R - 0.331G + 0.500B    (blue chrominance)
Cr =  0.500R - 0.419G - 0.081B    (red chrominance)
Note: green dominates the luminance calculation because
human eyes have far more green-sensitive cone cells.
This isn't arbitrary — it matches the physiology.
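The conversion above can be sketched directly in Python. This is a minimal per-pixel sketch, not how a real encoder works (encoders operate on whole arrays); note that the JFIF convention also adds 128 to the chrominance channels so they fit in an unsigned byte, which the simplified formulas above omit:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr using the coefficients
    from the text. The +128 offset centers the chrominance channels
    (JFIF convention) so they fit in an unsigned byte."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128
    return y, cb, cr

# A pure gray pixel carries no color information: both chrominance
# channels sit at their neutral midpoint of 128.
print([round(v, 3) for v in rgb_to_ycbcr(128, 128, 128)])
# → [128.0, 128.0, 128.0]
```

Note that the transform is invertible — given Y, Cb, and Cr you can recover R, G, and B exactly (up to rounding), which is why this step alone loses nothing.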

Step 2: Chroma Subsampling

This is where the first real data reduction happens. Since humans can't perceive color at full spatial resolution, JPEG reduces the resolution of the Cb and Cr channels — typically by half in both dimensions. This is called 4:2:0 subsampling. For every four luminance samples, there is one Cb sample and one Cr sample.

The result: you've just cut the raw data by half (from 3 full-resolution channels to 1 full + 2 quarter-resolution channels), and the image looks virtually identical to the original. Try it yourself — take any photograph, reduce the color channel resolution by 4x, and you'll struggle to see the difference. Your brain fills in the color detail from the luminance information.

This doesn't work equally well for all images. Photographs compress beautifully because natural scenes have smooth color gradients. But images with sharp color boundaries — like text on a colored background, a red circle on white, or pixel art — show visible color fringing after chroma subsampling. This is why JPEG is bad at text: the sharp color edges bleed.
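Conceptually, the reduction is just averaging each 2×2 neighborhood of a chroma channel. A minimal sketch (real encoders use filtered resampling, but averaging captures the idea):

```python
def subsample_420(channel):
    """4:2:0-style chroma subsampling: average each 2x2 block,
    halving the channel's resolution in both dimensions.
    `channel` is a list of rows with even dimensions."""
    h, w = len(channel), len(channel[0])
    return [
        [(channel[y][x] + channel[y][x + 1] +
          channel[y + 1][x] + channel[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

# A 4x4 chroma plane becomes 2x2: the raw sample count drops by 4x.
cb = [[100, 102, 200, 202],
      [104, 106, 204, 206],
      [ 10,  12,  20,  22],
      [ 14,  16,  24,  26]]
print(subsample_420(cb))  # → [[103.0, 203.0], [13.0, 23.0]]
```

Applied to both chroma channels, this is where the "3 channels down to 1.5" saving comes from: the luminance channel keeps every sample, while Cb and Cr each keep one quarter of theirs.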

Step 3: The Discrete Cosine Transform

The DCT is where JPEG gets interesting. The image is divided into 8×8 pixel blocks, and each block is transformed from spatial domain (pixel brightness values) into frequency domain (how much of each spatial frequency is present).

Think of it this way: an 8×8 block of pixels can be represented as a weighted sum of 64 basis patterns. The first pattern is a flat block (the average value). The next patterns add horizontal and vertical gradients. Higher-numbered patterns add increasingly fine detail — rapid oscillations in brightness.

An 8x8 DCT block conceptually:
DC ──► Low frequency ──────────────► High frequency
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ AVG │  →  │  →  │  →  │  →  │  →  │  →  │  →  │ Low freq
│     │     │     │     │     │     │     │     │   ↓
│  ↓  │  ↘  │     │     │     │     │     │     │
│     │     │  ↘  │     │     │     │     │     │
│     │     │     │  ↘  │     │     │     │     │
│     │     │     │     │  ↘  │     │     │     │
│     │     │     │     │     │  ↘  │     │     │
│     │     │     │     │     │     │  ↘  │     │
│     │     │     │     │     │     │     │ FINE │ High freq
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
Top-left = low frequency (smooth gradients)
Bottom-right = high frequency (sharp edges, fine detail)
Natural images: most energy concentrated in top-left
Text/line art: significant energy in bottom-right

The DCT itself is lossless — you can perfectly reconstruct the original 8×8 block from its 64 DCT coefficients. But here's the critical property: for natural images (photographs, gradients, organic textures), most of the energy is concentrated in the low-frequency coefficients. The high-frequency coefficients — representing fine detail — tend to be small. This is what makes the next step work.

Why 8×8? It's a compromise. Larger blocks (16×16, 32×32) would concentrate energy better, but they're more expensive to compute and cause visible 'blocking' artifacts at low quality settings. Smaller blocks (4×4) would reduce blocking but wouldn't concentrate energy as well. 8×8 turned out to be a sweet spot that's endured for over 30 years.
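The energy-concentration property is easy to verify with a direct, unoptimized 2-D DCT-II. This is a sketch for clarity — production encoders use fast factorizations (such as AAN) and first level-shift samples by subtracting 128, which only affects the DC term:

```python
import math

def dct_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block, the transform JPEG applies
    per block. O(N^4) for readability, not speed."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            out[u][v] = cu * cv * s
    return out

# A smooth horizontal gradient — the "almost free" case: out of 64
# coefficients, only a handful are non-zero, all in the top row.
gradient = [[x * 16 for x in range(8)] for _ in range(8)]
coeffs = dct_8x8(gradient)
nonzero = sum(1 for row in coeffs for c in row if abs(c) > 1e-6)
print(nonzero)  # → 5
```

Five coefficients out of 64 describe the entire block exactly, before any quantization — a concrete instance of the energy concentration that makes the next step so effective.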

Step 4: Quantization — Where Data Dies

This is the lossy step. Every DCT coefficient is divided by a corresponding value from a quantization table, then rounded to the nearest integer. Large divisors (for high-frequency coefficients) round small values to zero. Those zeros are gone forever — this is where information is permanently discarded.

Example quantization (quality ~50):
DCT coefficient: 42.7    Quantization divisor: 16    Result: round(42.7/16) = 3
DCT coefficient: 8.3     Quantization divisor: 40    Result: round(8.3/40)  = 0
DCT coefficient: -2.1    Quantization divisor: 99    Result: round(-2.1/99) = 0
The quality slider in your image editor controls the quantization table.
Quality 100: divisors close to 1 (almost no rounding, huge file)
Quality 75:  moderate divisors (good balance, typical web use)
Quality 50:  large divisors (noticeable artifacts, small file)
Quality 10:  very large divisors (blocky mess, tiny file)

The quantization table is the single most important factor in JPEG quality. The JPEG standard defines a default table, but encoders can use any table they want. Modern encoders like MozJPEG use optimized tables that squeeze more quality out of smaller files by tuning divisors to human perceptual sensitivity — being more aggressive on frequencies that humans are less sensitive to.

After quantization, a typical 8×8 block at moderate quality has maybe 6-10 non-zero coefficients out of 64. The rest are zero. This massive reduction in non-zero values is what makes the final compression step effective.

Step 5: Entropy Coding

The final step is lossless compression of the quantized coefficients. JPEG uses a combination of run-length encoding (RLE) and Huffman coding. The coefficients are read in a zigzag pattern from the 8×8 block — starting from the DC (top-left) coefficient and zigzagging through increasing frequencies. This groups all the trailing zeros together, which RLE handles efficiently: instead of storing '0, 0, 0, 0, 0, 0, ..., 0', it stores 'forty-seven zeros.'

Huffman coding then assigns shorter bit sequences to more common values and longer sequences to rare values. Since most non-zero coefficients are small integers (1, -1, 2, -2), they get very short codes. The result: the quantized 8×8 block, which started as 64 values, typically compresses to 20-40 bytes depending on image content and quality setting.
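The zigzag scan and the zero run-length step can be sketched as follows (Huffman coding is omitted; the 'EOB' string stands in for JPEG's end-of-block symbol):

```python
def zigzag_indices(n=8):
    """(row, col) pairs in zigzag order: walk the anti-diagonals,
    alternating direction, so coefficients come out roughly
    low-frequency first."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length(block):
    """Read a quantized 8x8 block in zigzag order and encode each
    non-zero value as a (zeros_skipped, value) pair, replacing the
    entire run of trailing zeros with one end-of-block marker."""
    scan = [block[r][c] for r, c in zigzag_indices()]
    while scan and scan[-1] == 0:   # strip the trailing zeros...
        scan.pop()
    out, run = [], 0
    for v in scan:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append("EOB")               # ...and mark where they began
    return out

# A typical post-quantization block: a few low-frequency survivors,
# 60 zeros. The zeros collapse into run counts and one marker.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][1] = 26, -3, 2, 1
print(run_length(block))  # → [(0, 26), (0, -3), (0, 2), (5, 1), 'EOB']
```

Five symbols describe all 64 coefficients — which is why the quantization step's job of manufacturing zeros matters so much: every zeroed coefficient either joins a run count or disappears into the end-of-block marker.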

Why Some Images Compress Better Than Others

Understanding the pipeline explains compression behavior that otherwise seems mysterious.

  • Photographs compress well because they have smooth color gradients (chroma subsampling works great) and most energy is in low-frequency DCT coefficients (quantization zeroes out many coefficients without visible impact).
  • Text and line art compress poorly because sharp edges produce significant high-frequency DCT coefficients that can't be zeroed without visible artifacts. Quantizing them away produces ringing — faint halos and echoes around text edges — and at low quality the 8×8 block boundaries show up as visible seams.
  • Noise destroys compression because sensor noise is high-frequency random data. Every noisy pixel adds non-zero high-frequency coefficients that resist quantization. A noisy photo at quality 80 can be 3-5x larger than a clean photo of the same scene at the same quality.
  • Gradients are almost free because a smooth gradient across an 8×8 block is represented by just 2-3 DCT coefficients. The rest are zero even without quantization.
  • Re-saving degrades quality because each save cycle applies quantization again. The rounding errors accumulate. Saving a JPEG 10 times at quality 75 is noticeably worse than saving once at quality 75. This is why you should always edit from the original source file.

The Quality Slider Is Not Linear

Most developers treat the JPEG quality slider as if it's a linear trade-off: quality 50 is half the quality of quality 100, and half the file size. Neither is true.

Quality 100 is not 'lossless' — it still applies quantization, just with small divisors. The file is huge (often larger than PNG) and visually indistinguishable from quality 95. Quality 95 is visually indistinguishable from quality 90 for photographs. The perceptual difference between quality 85 and quality 75 is larger than between 100 and 85. Below quality 50, artifacts become severe and file size reductions slow down.

The sweet spot for web images is usually quality 75-85. Below 75, artifacts are visible at normal viewing size. Above 85, the file size increases substantially with negligible perceptual improvement. MozJPEG's default quality of 75 is well-chosen — it hits the knee of the quality-vs-size curve for most photographs.

Modern JPEG Encoders Are Smarter Than You Think

The JPEG standard defines the decode format, not the encoder. This means encoder implementations can be wildly different in compression efficiency while producing spec-compliant files. MozJPEG, developed by Mozilla, produces files 5-15% smaller than libjpeg at the same visual quality through several techniques.

  • Trellis quantization — instead of independently rounding each DCT coefficient, trellis quantization considers how rounding decisions affect the entropy coding stage and makes globally better choices.
  • Optimized Huffman tables — the default JPEG Huffman tables are generic. MozJPEG generates custom tables tuned to the actual coefficient distribution in each image.
  • Progressive encoding — progressive JPEGs store coefficients in multiple passes (coarse first, then refinement). This often compresses better than sequential encoding because the coarse pass has more predictable coefficient distributions.
  • Adaptive quantization tables — instead of one quantization table for the whole image, adaptive encoders tune quantization per-region, being more aggressive in busy areas (where artifacts are less visible) and more conservative in smooth areas.

If you're serving images on the web, switching from the default libjpeg encoder to MozJPEG is one of the easiest performance wins available. Most image CDNs and processing libraries support it. The encode is slower (irrelevant for pre-processed static images) and the output is smaller at the same quality.

JPEG vs the New Formats

WebP, AVIF, and HEIC all use more modern compression techniques — larger transform blocks, in-loop deblocking filters, intra-prediction (using neighboring blocks to predict the current block). They compress 20-50% better than JPEG at the same visual quality. AVIF in particular, based on the AV1 video codec, is impressively efficient.

So why is JPEG still everywhere? Universal support. Every browser, every image viewer, every operating system, every camera, every phone, every piece of software that handles images supports JPEG. WebP is close to universal in browsers but lacking in native OS support. AVIF support is still inconsistent. HEIC is mainly Apple's ecosystem. For a format that needs to work everywhere, JPEG is still the only safe choice.

The practical advice: use AVIF or WebP with JPEG fallback if your serving infrastructure supports it (most CDNs do via content negotiation). If you can only use one format, JPEG encoded with MozJPEG at quality 80 is hard to beat for photographs. For screenshots, diagrams, and text-heavy images, PNG remains the better choice: its lossless compression preserves exactly the sharp edges that JPEG's pipeline mangles.

The 8×8 Block That Conquered the World

JPEG's compression pipeline is a textbook example of engineering that works with human perception rather than against it. Every stage exploits a specific property — color insensitivity, frequency distribution of natural images, perceptual masking of quantization noise — and they compose into a system that reduces files by 10-20x with minimal visible impact. Thirty-four years later, the same 8×8 DCT blocks that ran on 1992 hardware still carry most of the images on the internet. That's the mark of a design that got the fundamentals right.