Image Compression
In the evenings he would feast on ancient books
to the lazy enchanting lap of wavelets ...
from Vladimir Nabokov's Invitation to a Beheading
Heavily updated in May and June of 1998.
Sections
Lossy
Lossless
WaveVideo
Finished Sept. 1999, finally released October, 2004 !!
WaveVideo.zip (300k zip). Full source + exe + crblib so it stands alone.
Compile with MSVC6. Uses DirectDraw 4.
The highlight of WaveVideo is a very fast full-frame Wavelet image decoder. See also this article -
Wavelet Video notes
NOTEZ : the compiled version uses the D97/C22 transform; for speed, make sure you use the C22/Haar
transform set.
Also, I do silly things like ftoi() that are not needed and could be made way faster with SSE2. The true
speed of the algorithm is probably well beyond the experimental numbers.
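As an illustration of the SSE2 point, here is a minimal sketch: one _mm_cvtps_epi32 converts four
floats at once, versus four calls to ftoi(). The helper name is made up for illustration; this is
not code from WaveVideo.

#include <emmintrin.h>   /* SSE2 intrinsics */

/* hypothetical helper : convert 4 floats to 4 ints in one SSE2 instruction ;
   rounding follows the MXCSR mode (round-to-nearest by default) */
static void ftoi4_sse2(const float *src, int *dst)
{
    __m128  f = _mm_loadu_ps(src);          /* load 4 floats       */
    __m128i i = _mm_cvtps_epi32(f);         /* convert in one shot */
    _mm_storeu_si128((__m128i *)dst, i);    /* store 4 ints        */
}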
excerpt from the readme :
Basically WaveVideo works in-place on an image. It is a full-image wavelet transform,
it is not block-based at all. It is also a true Wavelet transform (CDF22 and Daubechies 9/7
are supported), not just a trivial Haar wavelet.
The decode works with a few circular row buffers. The size of the buffers needed depends on
the span of the transform; e.g. Haar needs 2 rows, CDF22 needs 3 rows, D97 needs 7 rows.
Very good speed can be achieved with a horizontal CDF22 transform and a vertical Haar transform.
I've got some nice hand-written diagrams of the memory flow, but nothing digital.
Basically you first decode the wavelet coefficients for the needed rows into the temp LH,HL,HH
bands. You have the previous LL band already. You never actually decode the full image, you
decode rows as you need them. Then you do the horizontal inverse transform on the LL+LH and
the HL+HH to make the L and H temp vertical rows; you build these into temp circular row buffers
until you have enough to do inverse vertical transforms, and then you do that into the output
image.
Each level of wavelet inverse transform works like this :
[coded stream] -> Coefficient decoder -> [wavelet coefficients for LH,HL,HH bands in temp buffers]
[previously decoded LL band] + [wavelet coefficients for LH,HL,HH bands] ->
horizontal inverse transform -> [temp L and H vertical rows] ->
vertical inverse transform -> [decoded image rows]
the decoded rows are thus streamed out in order. So, we are streaming in the coded bits and
the previous LL band, and streaming out a decoded image.
On the final level, we also do a YUV->RGB transform (the coded stream is in YUV).
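To make that data flow concrete, here is a small, self-contained sketch of one inverse level done
in this streaming style. To keep it short it uses the integer Haar (S-transform) both horizontally
and vertically, so each band row yields two finished image rows immediately; the names, the maximum
width, and the arithmetic right shift of negative ints are assumptions, not WaveVideo's actual code.

/* horizontal inverse Haar (S-transform) : (low,high) rows of length w -> one row of 2w */
static void haar_inv_row(const int *lo, const int *hi, int *out, int w)
{
    int i, odd;
    for (i = 0; i < w; i++) {
        odd = lo[i] - (hi[i] >> 1);   /* undo the update step  */
        out[2*i]   = odd + hi[i];     /* undo the predict step */
        out[2*i+1] = odd;
    }
}

#define MAXW 1024   /* assumed maximum band width for the temp rows */

/* one inverse level : each band is h rows of w coefficients ; decoded rows of
   length 2w stream out, in order, through the caller-supplied emit_row() */
static void inverse_level_streamed(const int *LL, const int *LH,
                                   const int *HL, const int *HH,
                                   int w, int h,
                                   void (*emit_row)(const int *row, int y))
{
    int L[2*MAXW], H[2*MAXW], top[2*MAXW], bot[2*MAXW];
    int x, y;

    for (y = 0; y < h; y++) {
        /* horizontal inverse : (LL,LH) -> L temp row , (HL,HH) -> H temp row */
        haar_inv_row(LL + y*w, LH + y*w, L, w);
        haar_inv_row(HL + y*w, HH + y*w, H, w);

        /* vertical inverse (Haar) : each (L,H) pair yields two finished rows */
        for (x = 0; x < 2*w; x++) {
            bot[x] = L[x] - (H[x] >> 1);
            top[x] = bot[x] + H[x];
        }
        emit_row(top, 2*y);
        emit_row(bot, 2*y + 1);
    }
}

With a CDF22 or D97 vertical transform you cannot emit a row until 3 or 7 of these L/H temp rows
are available, which is exactly what the circular row buffers are for; the streaming structure is
otherwise unchanged.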
WaveCode
- Get the source code to WaveCode (Zip,150k) .
This is wavelet, DCT, S+P (tdec), and the kitchen sink.
- For the merely-curious (or users of bad compilers), you can
download a v1.7 executable for Windows NT/95 (Zip,141k)
- Added a VC6 project, April 2003
- Wavecode v1.0 released May 23, 1998
- Wavecode v1.1 released May 26, 1998
- Wavecode v1.2 released June 16, 1998
- Wavecode v1.3 got lost in the mail :^)
- Wavecode v1.4 released June 26, 1998
adds many new coders, a new "stop at bitrate" and a seek-best-quantizer for bitrate,
adds zerotrees to some coders, and tunes some coders for speed. My DPCM on the LL
band is quite poor, so performance may even be worse than v1.2. The coder "bp_bin"
uses many ideas stolen from Xiaolin Wu's excellent ECECOW. The "bp bin fast" is an
almost-as-good coder that is much faster; try also the "bp bin fast zerotree" for
blazing speed and compression.
- Wavecode v1.7 released August 19, 1998
- Adds a slick new DPCM for the LL that makes a fast estimate of the Gaussian variance,
then uses a semi-Gaussian code for that variance.
- Adds many Lifting Transforms : cdf22, cdf24, bcw3, d4, daub97 (a lifting sketch follows this list)
- Adds Wavelet Packets; we don't prune the full tree right now, we only split forward
as deep as we can figure is optimal
- adds more coders still, and god knows what else
- does color images in YUV and sorts bands on activity
- More information someday soon; some performance numbers in the table below.
- Take a look at Lena in Wavelet, Fourier, and Lifting (15k,gif) bases.
(no quantization, that is, quantizer=1.0)
- I've often complained that the simple histograms used to estimate the rate for use in
rate-distortion optimal quantization are unreliable (see, for example, a full discussion of the standard
method by Dominik Schmidt, which I downloaded locally and can't find on the web anymore...).
Anyhoo, here is Lena at 0.25 bpp (8k,gif) with various quantizer values; the goal of the optimal quantizer
is to move around on this curve and find the quantizer which minimizes the error. As you can
see, this curve is not simple, and not monotonic.
- More...
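Here is the promised minimal lifting sketch: one 1-D level of the integer cdf22 (5/3) transform,
with a simple symmetric edge rule. It shows the predict/update structure shared by the lifting
transforms listed above; it is a sketch, not the actual WaveCode implementation, and it assumes
arithmetic right shift of negative ints.

/* forward : x[0..n-1] (n even) -> low[0..n/2-1] , high[0..n/2-1] */
static void cdf22_forward(const int *x, int *low, int *high, int n)
{
    int i, h = n/2, e0, e1, d0;

    /* predict : high = odd sample minus the average of its two even neighbors */
    for (i = 0; i < h; i++) {
        e0 = x[2*i];
        e1 = (2*i+2 < n) ? x[2*i+2] : x[2*i];    /* symmetric edge */
        high[i] = x[2*i+1] - ((e0 + e1) >> 1);
    }
    /* update : low = even sample plus a quarter of its two neighboring highs */
    for (i = 0; i < h; i++) {
        d0 = (i > 0) ? high[i-1] : high[i];      /* symmetric edge */
        low[i] = x[2*i] + ((d0 + high[i] + 2) >> 2);
    }
}

/* inverse : run the same steps backwards with the signs flipped ;
   reconstruction is exact in integers */
static void cdf22_inverse(const int *low, const int *high, int *x, int n)
{
    int i, h = n/2, e0, e1, d0;

    for (i = 0; i < h; i++) {
        d0 = (i > 0) ? high[i-1] : high[i];
        x[2*i] = low[i] - ((d0 + high[i] + 2) >> 2);   /* undo update  */
    }
    for (i = 0; i < h; i++) {
        e0 = x[2*i];
        e1 = (2*i+2 < n) ? x[2*i+2] : x[2*i];
        x[2*i+1] = high[i] + ((e0 + e1) >> 1);         /* undo predict */
    }
}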
DCT Based
We've got a nice family of DCT coders for you. All use a DCT Transform
shamelessly stolen from the Independent JPEG Group (and simplified, and
consolidated). We beat JPEG as measured by MSE (mean-squared-error) or
PSNR (= 20*log10(256/rmse)), but that's no surprise since JPEG is tuned for
*visual* quality, not MSE quality; in fact arith-coded JPEG with flat
quantizers is quite competitive with the state-of-the-art wavelet coders.
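For reference, a tiny sketch of the MSE and PSNR computation by exactly the convention used in the
table below (PSNR = 20*log10(256/RMSE)); this is a standalone helper, not part of the DCT package.

#include <math.h>

/* mean squared error between two 8-bit images of n pixels */
static double mse_of(const unsigned char *a, const unsigned char *b, long n)
{
    double sum = 0.0, d;
    long i;
    for (i = 0; i < n; i++) {
        d = (double)a[i] - (double)b[i];
        sum += d * d;
    }
    return sum / (double)n;
}

/* PSNR in dB from the MSE, using this page's 256 convention */
static double psnr_of(double mse)
{
    return 20.0 * log10(256.0 / sqrt(mse));
}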
- Read the Bloom Public License
- You need crblib to compile this source.
- DCT (Zip,14k) coder. Many advances over JPEG are incorporated (rate-distortion optimal
thresholding, order-1 context coding, and other state-of-the-art techniques), but they help
very little. Equipped with vast status reports, this is a nice toolbox for examining DCT.
- EZ DCT (Zip,12k) coder. An idea shamelessly stolen from Xiong et al. ("A DCT-Based
Embedded Image Coder") is to apply the "EZ" scheme to DCT data. We tree-structure the
coefficients of each block, rather than zig-zag scan. We simply transmit bitplanes. There are
no "zero trees" or "significance maps" - we refuse to introduce jargon which is irrelevant
to the coding; all an EZ coder really does is code the bitplanes instead of the values.
This coder is *incredibly* simple (and fast), and is quite competitive with the state-of-the-art.
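To show how little machinery that takes, here is a minimal sketch of coding one block of quantized
coefficients by bitplanes, most significant plane first. put_bit() stands in for whatever binary
(arithmetic) coder you like, and the context numbers are illustrative; this is not the actual
ezdct interface.

/* code the magnitudes of n quantized coefficients one bitplane at a time ;
   a sign bit is sent the first time a coefficient becomes significant */
static void code_block_bitplanes(const int *coef, int n, int nplanes,
                                 void (*put_bit)(int bit, int context))
{
    int plane, i, mag, was_sig;

    for (plane = nplanes - 1; plane >= 0; plane--) {
        for (i = 0; i < n; i++) {
            mag = coef[i] < 0 ? -coef[i] : coef[i];
            was_sig = (mag >> (plane + 1)) != 0;       /* already significant ? */
            put_bit((mag >> plane) & 1, was_sig);      /* this plane's bit      */
            if (!was_sig && ((mag >> plane) & 1))
                put_bit(coef[i] < 0, 2);               /* sign, sent only once  */
        }
    }
}

The decoder runs the same loops, reading bits instead of writing them, so it can stop after any
plane and still have a usable (embedded) approximation of every coefficient.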
- Report of Results on Lena (512x512 Grey):

| coder                     | bit rate (bpp) | MSE   | PSNR (dB) |
| JPEG base, flat-q=75      | 0.25           | 59.1  | 30.45     |
| JPEG huff, flat-q=58      | 0.25           | 44.5  | 31.68     |
| JPEG arit, flat-q=54      | 0.25           | 41.3  | 32.00     |
| EZW                       | 0.25           | 31.4  | 33.2      |
| SPIHT (improved EZW)      | 0.25           | 29.2  | 33.5      |
| WaveCode v1.7 (bitplane)  | 0.25           | 24.9  | 34.2      |
| SFQ                       | 0.25           | 24.3  | 34.3      |
|                           |                |       |           |
| baseline jpeg             | 0.50           | 22.32 | 34.7      |
| jpeg-arith                | 0.50           | 20.00 | 35.15     |
| flat-q jpeg, huffman      | 0.492          | 19.45 | 35.3      |
| ezdct                     | 0.50           | 19.00 | 35.38     |
| dct                       | 0.50           | 18.53 | 35.49     |
| tuned jpeg-huff           | 0.50           | 18.5  | 35.5      |
| flat-q jpeg, arith        | 0.503          | 17.92 | 35.6      |
| EZW                       | 0.50           | 15.4  | 36.3      |
| SPIHT                     | 0.50           | 13.5  | 36.9      |
| WaveCode v1.5             | 0.50           | 12.9  | 37.0      |
| SFQ                       | 0.50           | 11.83 | 37.4      |
Key:
- dct and ezdct are our DCT coders.
- WaveCode is our generic transform-coder package. Results are
for the Daubechies 9/7 wavelet transform (with uniform quantizers)
unless otherwise stated.
- baseline jpeg is the very stupid JPEG which is usually set up and knocked
down like a bowling pin : it uses pre-computed entropy tables. The "optimized"
JPEG uses per-file computed Huffman tables, and is a much more reasonable comparator.
- jpeg-arith is the ISO-Spec implementation of JPEG arithmetic coding (Get it).
Note how well the flat-q jpeg arith compares with EZW.
- flat-q jpeg is JPEG with flat quantizers (25 for the arithcoding version,
27 for the "optimized" huffman version). Be careful when comparing, because I couldn't
get these to hit 0.50 bpp exactly. Note that the flat-q huffman beats the arith-coding
jpeg with visual quantizers, and at a lower bitrate, clearly indicating that
the quantizers are the single most important thing in JPEG.
- tuned jpeg-huff is baseline JPEG with optimized (computed) Huffman tables,
and rate-distortion optimized quantization tables. This process is *very* slow, and
helps a whole lot. It should be noted that this is only mildly better than using flat
quantizers; it is foolish to compare tuned quantizers vs. baseline JPEG using an MSE
or PSNR rating.
- EZW is the famous coder of Shapiro. It beats ezDCT presumably only
because of the superiority of the wavelet transform.
- SPIHT is the improved version of EZW by Said&Pearlman. It is
structurally identical to EZW, but modernized.
- SFQ is the R-D optimal EZ coder of Xiong et al. ("Space-Frequency Quantization").
They also do optimal scalar quantization and so-called zerotree prediction. It's amusing
that SFQ at a rate of 0.25 has the same quality as JPEG baseline at twice the rate!!
A sample run of DCT
Q:\dct>dct c:\ccc\images\peppers.512.raw 512 512 1 83
dct v0.5, (c) 1998 by Charles Bloom
peppers.512.raw : 262144 -> 29840 = 0.910 bpb = 8.784 to 1 , 169125 byps
DC dpcm : 4096 -> 3370 = 6.582 bpb = 1.215 to 1
AC order-0-1 : 258048 -> 22103 = 0.685 bpb = 11.674 to 1
AC order(-1) : 56 -> 55 = 7.857 bpb = 1.018 to 1
signs : 4321 -> 4312 = 7.983 bpb = 1.002 to 1
error: av= 2.85,max= 29,mse= 13.712,rmse= 3.70,psnr= 36.79,perc= 3.67
performance, MSE = 0.640665 , RMSE = 2.372391 , percep = 2.394120
n0 = 33.197403 % , n1 = 7.017136 %, zero_len = 136453 , av = 33.313721
0p0 = 83.736185 % , 0p1 = 12.543942 % , 0pbig = 3.719872 %
nsign = 14.6774 % , same = 33.9069 % , diff = 28.8621 % , noc = 37.2310 %
Some figures are nice. The average EOB length of zeros is 33.3 ;
the non-EOB'd values are in turn 33% zeros. We also get the delightful
figure that zero predicts another zero 84% of the time, and a large value only
follows a zero 4% of the time.
15% of pixels needed signs; 34% of these signs were the same as their parent,
29% were different, and 37% had a zero parent (no sign). This slight
skewing saved a whopping 9 bytes of the total file length. :^)
The "percep" error measure gives less weight to high frequency errors;
it is an rmse error, so compare 3.67 percep and 3.70 rmse, you see not much
difference (this is because we use uniform quantizers).
Lossless
The coders here are reasonably practical compared to the state of the art (CALIC,
context-based image coders (UCM)). I conjecture that the best
image coder would be a PPMZ modified to work on 2-d arrays
of data (2-d contexts), but I don't have time to make that
modification. (Some other subtleties may be necessary,
like a preliminary transform of basis to YUV or delta-codes,
and/or using the other color-planes to predict the current
pixel: R predicts G, G predicts B, etc.)
- Read the Bloom Public License
- You need crblib to compile this source.
- Get my DPCM (Zip,8k) (Differential Pulse Code Modulation) coder.
The name is an anachronism - the DPCM coder is the simplest sensible
image coder; we take advantage of the fact that images tend to be
smooth. I use ideas shamelessly stolen from CALIC and others, so this source
code is for academic experimentation only!! Some parts are patented by other people,
so do some research before using it in commerce.
Added 3-28-98 : "imppm" deterministic pre-pass with a PPM (with my SEE) helps a bit.
We now slightly beat the state-of-the-art, but I think with the techniques in here,
we should be doing even better still. There's some fine-tuning which is very delicate.
Noted 4-28-98 : there is a minor bug which causes stalls on rare files
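As a flavor of what a DPCM image coder does, here is a minimal sketch of the prediction step using
the simple MED ("median") predictor as a stand-in; the coder in the zip uses CALIC-style prediction
and context modeling, which is considerably more elaborate.

/* predict the current pixel from its west, north, and north-west neighbors ;
   the coder then entropy-codes the residual (actual pixel minus prediction) */
static int med_predict(int w, int n, int nw)
{
    int mx = (w > n) ? w : n;
    int mn = (w < n) ? w : n;
    if (nw >= mx) return mn;   /* probable edge : take the smaller neighbor */
    if (nw <= mn) return mx;   /* probable edge : take the larger neighbor  */
    return w + n - nw;         /* smooth region : planar prediction         */
}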
- A tree decomposition (zip,3k) coder. This is an extremely small
and fast little coder, with performance roughly on par with CALIC's 1-off mode (lossy, with a
largest error of +/-1, and a square-average error of zero). The advantage over CALIC's
lossy mode is the extreme simplicity and speed of this coder, along with its progressive
transmission. Unfortunately, I see very little room for improvement here.
- Its big brother is EZ tree-dec (zip,4k). This is
inspired by the "EZ" wavelet and DCT coders. The transform is the same simple one as
in the above tdec, but is now kept lossless by going to 10-bit pots. An EZ pyramid
scheme is used to code bit-planes of coefficients. We get about 0.4 bpb worse than
the state of the art lossless coders, at the benefit of extreme simplicity. This is
a nice way to learn the EZ scheme on a simple coder.
Our performance is quite encouraging, considering that our transform is
basically a 2x2 DCT. (Apparently I rediscovered the
so-called "S-transform", the trivial Haar "wavelet"; a sketch of it follows below.) I get 4.640 bpp on
Lena which equals the best lossless-JPEG (not saying much) and
makes a mockery of the 4.77 figure that Said & Pearlman quote for the S-Transform.
As usual, authors do not give proper credit to the paper-doll algorithms they
knock down.
Now try EZ tree-dec IV (zip,7k). I got the basic
idea from S & P's S+P coder, went off to play with it and tuned the params,
and lo, reproduced their coder exactly. Strangely, their compression beats ours
by about 0.2 bpp. Also strangely, this coder seems to work best at levels==0,
that is, pure DPCM ! This is a disappointment and presented only for
instructional value (or maybe you can tell me what went wrong).
(btw the EZ tree-dec coders are now available as a subset of WaveCode, and
the ones in WaveCode are better; the code here is self-contained and
embedded so may still be valuable for instruction).
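For reference, here is the S-transform (integer Haar) on a single pair of samples, as a sketch
rather than the exact code in the zips above; apply it horizontally and then vertically to get the
2x2 decomposition the tdec coders use. It assumes arithmetic right shift of negative ints.

/* forward : a,b -> low (rounded average) , high (difference) ; lossless in integers */
static void s_forward(int a, int b, int *low, int *high)
{
    *high = a - b;
    *low  = b + (*high >> 1);     /* = floor((a+b)/2) */
}

/* inverse : exact reconstruction of a and b */
static void s_inverse(int low, int high, int *a, int *b)
{
    *b = low - (high >> 1);
    *a = *b + high;
}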
- The "Tapestry" Image (512x512 x 8bit grey). I created this as a test, and it's so
interesting that I post it in its own right. This is a natural image,
copied around to be extremely repetitive. This image is a challenge
to image compressors, which are generally very bad at this sort of thing
(or good at it, like PNG and Rkive, while being bad on standard tests
like Lena). For example, PPMZ v9.1 compresses Tapestry to 0.736 bpp
( download it , 24k ) , while EZ-Tdec-3 packs it to 7.289 bpp,
almost ten times worse!!! In fact this is one of the most (apparently) incompressible
non-random images I have ever found.
Charles Bloom / cb at my domain send me email