11-6-95
It is well known that Markov-Modelling compression algorithms can also be used for prediction. However, this result is also true of all compression algorithms. In prediction, the goal is to use the preceeding set of data, S, to guess the probability that the next character will be n, (where 0 <= n < N). Let Si = S + i ( where + denotes string concatenation) let C(X) be a reversible transform (aka a compressor). let L(C(X)) be the length of the output of C(X) let Li = L(C(S+i)) let DiffLi = Li - L(C(S)) if ( DiffLi <= 0 for any i, this method does not apply ) ( DiffLi == 0 is imaginable, DiffLi < 0 is strange ) now PP(S,i) = 2^( - DiffLi) PP(S,i) = Predicted_Probability of i following S -- for example : let C(X) be an order-0 statistical compressor. Then Li = L(C(S+i)) = L(C(S)) + L(C(i)) and L(C(i)) = - log2( P(i) ) so DiffLi = -log2( P(i) ) then PP(S,i) = 2^( -DiffLi ) = 2^( log2( P(i) ) ) = P(i) !! The example of a high-order markov modeller is similar. --- the advantage of this definition of PP(S,i) is that it is totally general, and proves that compression and prediction are equivalent in all cases.
Charles Bloom / cb at my domain