PPMZ v7.6 by Charles Bloom
cbloom@mail.utexas.edu

report is in output-bits-per-input-byte on the Calgary Corpus
* indicates the Best item

Coder   :   0       1       2       3       4       5       6     Guess
Order   :   8       5       4       3       2       1       0     varies

BIB     : 1.801   1.798*  1.823   2.075   2.618   3.442   5.208   1.798*
BOOK1   : 2.305   2.267*  2.272   2.438   2.889   3.595   4.531   2.267*
BOOK2   : 1.925   1.923*  1.961   2.241   2.874   3.765   4.758   1.923*
GEO     : 4.639   4.640   4.629   4.598   4.476*  4.620   5.648   4.476*
NEWS    : 2.306*  2.315   2.346   2.619   3.233   4.114   5.084   2.306*
OBJ1    : 3.718   3.713   3.711*  3.740   3.844   4.435   5.668   3.711*
OBJ2    : 2.287*  2.334   2.392   2.696   3.070   4.069   6.056   2.287*
PAPER1  : 2.284*  2.290   2.299   2.441   2.888   3.797   4.948   2.284*
PAPER2  : 2.286   2.276*  2.282   2.405   2.842   3.612   4.600   2.276*
PIC     : 0.811*  0.814   0.816   0.835   0.833   0.840   1.017   0.811*
PROGC   : 2.316*  2.330   2.339   2.466   2.869   3.814   5.177   2.316*
PROGL   : 1.518*  1.566   1.601   1.866   2.371   3.292   4.580   1.518*
PROGP   : 1.568*  1.603   1.622   1.802   2.252   3.340   4.841   1.568*
TRANS   : 1.282*  1.331   1.366   1.716   2.331   3.461   5.401   1.282*

AVERAGE :                                                         2.202

Local Order Estimating (LOE) Coders

Coder   :   9       10      11      12      13     Best

BIB     : 1.771*  1.776   1.791   1.777   1.833   1.771
BOOK1   : 2.235*  2.243   2.265   2.258   2.251   2.235
BOOK2   : 1.886*  1.890   1.907   1.894   1.942   1.886
GEO     : 4.581   4.590   4.601   4.604   4.475*  4.475
NEWS    : 2.280*  2.281   2.293   2.285   2.371   2.280
OBJ1    : 3.731   3.726   3.729   3.723*  3.877   3.723
OBJ2    : 2.302   2.297*  2.302   2.291   2.453   2.297
PAPER1  : 2.265   2.263*  2.269   2.266   2.393   2.263
PAPER2  : 2.244*  2.246   2.260   2.257   2.302   2.244
PIC     : 0.798   0.800   0.803   0.804   0.793*  0.793
PROGC   : 2.297   2.295*  2.306   2.295*  2.446   2.295
PROGL   : 1.505*  1.506   1.519   1.505*  1.574   1.505
PROGP   : 1.552   1.552   1.562   1.550*  1.653   1.550
TRANS   : 1.273*  1.276   1.291   1.277   1.347   1.273

AVERAGE : 2.194   2.196   2.207   2.199   2.265   2.185

Technical Notes:

The LOE coders use this heuristic:

    each assigns an Order Rating (OR) to each PPM Order
    the Order with the highest OR is coded from
    all higher Orders are updated, lower ones are not

The OR heuristics for each coder are:

    let P = probability of the Most Probable Symbol in context
    let E = probability of Escape

    Coder9  : OR = P
    Coder10 : OR = 20*ln(P) + ln(E)
    Coder11 : OR = 6*ln(P) + ln(E)
    Coder12 : OR = P*ln(P) + (1-P)*( ln(E) - 10*ln(2) )
    Coder13 : OR = P*ln(P) + (P-E-1)*8*ln(2)

These are equivalent to coding from the Order with the lowest Entropy (H), with

    Coder9  : H = log2(1/P)
    Coder10 : H = log2(1/P) + (1/20)*log2(1/E)
    Coder11 : H = log2(1/P) + (1/6)*log2(1/E)
    Coder12 : H = P*log2(1/P) + (1-P)*( log2(1/E) + 10 )
    Coder13 : H = P*log2(1/P) + (1-P)*8 + E*8

I think it should be easy to see from each of these how they are
trying to minimize output length. e.g. Coder12 guesses there is a
probability P for a length of log2(1/P) to be output, and if P is not
output, then an Escape of length log2(1/E) is written followed by
10 bits from lower orders.
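
The selection rule above can be sketched in a few lines of Python. This is
only an illustration, not code from PPMZ: the function names and the
per-Order (P, E) numbers in `contexts` are made up for the example. It checks
that picking the highest OR and picking the lowest H select the same Order
for each coder, since each H is just -OR divided by a positive constant
(or, for Coder9, a decreasing function of P).

```python
import math

# Order Rating (higher is better), copied from the formulas in the notes.
# p = probability of the most probable symbol, e = probability of escape.
ORDER_RATING = {
    9:  lambda p, e: p,
    10: lambda p, e: 20 * math.log(p) + math.log(e),
    11: lambda p, e: 6 * math.log(p) + math.log(e),
    12: lambda p, e: p * math.log(p) + (1 - p) * (math.log(e) - 10 * math.log(2)),
    13: lambda p, e: p * math.log(p) + (p - e - 1) * 8 * math.log(2),
}

# Entropy form (lower is better), in bits, also copied from the notes.
ENTROPY = {
    9:  lambda p, e: math.log2(1 / p),
    10: lambda p, e: math.log2(1 / p) + (1 / 20) * math.log2(1 / e),
    11: lambda p, e: math.log2(1 / p) + (1 / 6) * math.log2(1 / e),
    12: lambda p, e: p * math.log2(1 / p) + (1 - p) * (math.log2(1 / e) + 10),
    13: lambda p, e: p * math.log2(1 / p) + (1 - p) * 8 + e * 8,
}


def best_order_by_rating(coder, contexts):
    """Return the Order whose (p, e) pair gets the highest Order Rating."""
    return max(contexts, key=lambda order: ORDER_RATING[coder](*contexts[order]))


def best_order_by_entropy(coder, contexts):
    """Return the Order whose (p, e) pair gets the lowest entropy estimate."""
    return min(contexts, key=lambda order: ENTROPY[coder](*contexts[order]))


# Hypothetical statistics: Order -> (P, E). These numbers are invented.
contexts = {5: (0.90, 0.08), 3: (0.70, 0.05), 2: (0.60, 0.02), 1: (0.40, 0.01)}

for coder in sorted(ORDER_RATING):
    by_or = best_order_by_rating(coder, contexts)
    by_h = best_order_by_entropy(coder, contexts)
    assert by_or == by_h  # the two forms pick the same Order
    print(f"Coder{coder}: code from Order {by_or}, "
          f"H = {ENTROPY[coder](*contexts[by_or]):.3f} bits")
```

The sketch only covers choosing which Order to code from; updating the
higher Orders and the actual PPM coding are left out.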