cb's rants

The older rants are semi-regularly moved off this page. You can always read the old rants :
03/2008 to 08/2008
11/2007 to 03/2008
07/2006 to 11/2007
12/2005 to 07/2006
06/2005 to 12/2005
01/1999 to 06/2005

Go to the new cbloom rants @ blogspot

You can see some of my photos at Flickr .


We need to get away to somewhere sunny and warm this winter. Hawaii makes the most sense from here, but Hawaii weather in the winter is unreliable, and maybe a bit expensive and of course full of damn Americans (big advantage : nonstop flights, no passport control, dear god I hate airports). The Riviera Maya is pretty convenient and cheap and the weather is great there in winter. It's nice and all but I've been there a lot and it's not amazing (plus : nonstop flights to Cancun, though very few of them and expensive).

I'd like to go to the British Virgin Islands maybe or something but it seems like a huge pain in the ass to get there. Not really worth it unless I can stay a week. Or a month. Or maybe a few years.


I think the realtime raytracing guys are rather too optimistic. The thing is, first hit raytracing against solid objects is just not interesting. It gives you absolutely zero win over rasterization, and rasterization is way way faster. They brag about 100 M rays a second, but rasterization is doing many Billions of pixels per second now.

I should back up and say they are doing very fast raytracing for coherent rays using packets that they run through SIMD, and they can do stuff like break the screen into tiles for good parallelism on multi-core or multi-proc systems. Those kind of things are the exact same optimizations we do in rasterization, and they only work for plain old rendering a big image from the camera. Also the raytracing stuff is only really fast on static scenes which is a huge problem; we need to be moving towards everything-is-dynamic worlds.

In fact, if you like, you can think of rasterization as a very clever optimization for first hit ray-tracing. They share the exact same strengths and weaknesses - they both only work well when you're shooting a ton of "rays" from the same spot with lots of coherence. They both exploit parallelism by making quads and doing them at the same time down a SIMD path, and they both fall apart if you do lots of different objects, or rays in random directions etc.

Raytracing is of course very interesting, but the raytracing that you really *want* to do is also the exact thing that's very slow. You want N bounce raytracing, and for it to really be a win over current rasterization techniques you need to be able to do good sampling at the really hard edge-on cases where your rays diverge badly and you need to shoot lots of rays and do monte carlo integration or something, eg. to get better fresnel reflections on the edge-on surfaces of objects.

It also totally tilts me the way raytracing guys grossly overstate the importance of refraction and caustics. Improving the realism of our wine glass rendering is about 1000th on the list of things we need to make games look better.

Things that would actually help :

1. Better basic lighting & shadowing. This is something that hybrid raytracing could help in the near term. I mean eventually we'd like full realtime GI, solving the rendering equation every frame, but in the near term even just better shadow maps that don't have any sampling problems would be awesome, which is something hybrid raytracing could do. If you could render your scene with rasterization and also cast 100M rays per frame for shadows, you could probably get some semi-soft shadow area light source effects, which would be cool.

2. Volumetric effects, and light & shadow through objects. Dust clouds with lighting attenuation and scattering. Light through cloth and skin and leaves.

You don't really get the cool benefits from raytracing until you can do something like 1000 non-coherent rays per pixel (to cast around and get lighting and reflections and scattering and so on). At 1000x1000 at 100 fps, that's 100 Billion rays per second (and non-coherent). We're very far away from that. Even on something like Larrabee 3 you'll be better off rasterizing and maybe just using rays for shadows.


TextBlog : .NET C# app to edit your Blogger blog as Text : TextBlog.zip (64k)

Hopefully I didn't leave my password in there or anything.

The Simple Blogger layout I've got is pretty butt ugly, but it's the only one without a ton of shit in the side pane. I hate that fucking side pane, it takes up like half the screen and makes every blog all narrow and turns a tiny bit of text into five hundred pages.


Oh duh. I just realized there's a much better way to cycle through the images in cbsaver. I should just enumerate all the images at startup, then do a random permutation of the whole set, and then just step through that list in a straight line. That gaurantees that you see every image once before you repeat, which is pretty much exactly what you want for this kind of random slideshow.

The only nagging imperfection with that is that it rebuilds the list each time it runs. I'd like the permutation to be persistent across runs. One cheezy way to accomplish that would be using the last access time.


The new cbloom rants at blogspot is almost working. If one of my Google buddies who reads this can hook me up with someone at Blogger to get past this 50 post a day limit I'll be up and running. It's not up to date yet because I can only put up 50 posts a day >:|

I'm going to post the source code to my TextBlog uploader. Even if you don't like blogging from Text it's a pretty handy way to back up your blog, because it can download the whole blog to files, then you could edit some of those if you want, and it can restore your whole blog from those files. Would be a very handy way to do something like a mass find-replace on all your blog entries.

It's been a good day. The Hamachi stuff works, and I got to listen to music at work finally. I've been thinking this HTPC project has been a huge pain in the butt and I wish I was still just listening to CD's, but then something like that comes together and it's all worth it.

I got my Oodle space shooter game sort of working. It's a little fly around in Galaxy, but it loads models and textures async through Oodle. As usual it's hard enough to actually find enough content to stress the async at all. It loads like 20 megs of models in less than a second. It's still just loading single-file packages; next up : some logical pagacking, tracking how the app loads data, defining load group markers, compressing bundles, bundle baking processes, etc. etc.


So, I wrote a while ago about figuring out a way to VPN from work to home through my router and DHCP and all that. It seemed like a real pain in the ass. Then someone pointed me at Hamachi , and BLAMMO it just works and takes like 2 seconds to set up. In fact the slowest part was that I went through their fucking tutorial cuz I thought it might have something useful in it. No, it does, not, skip it. Just create a new network, give yourself a password, then log all your machines into that network. Super blammo it works fantastique! formidable!


God damn casts are the fucking devil. I just found a huge bug in Galaxy3 where I was assuming that one struct was a prefix of another struct. In particular I was assuming that a vertex with Diffuse, 2 UVs, and a Normal, had as its prefix a vert with just Diffuse and a UV. Eight years ago or so when I wrote it, that was true. Then four years ago I changed some of the structs and it became broken, and I only just found it now. Yuck.

Obviously you want to avoid casts, but sometimes you have to do them. One thing I've started doing is making little helper functions instead of just casting. Like :

gVertexDUV & gVertPrefixDUV( gVertexDUV2N & vertex )
	return (gVertexDUV &) vertex;

then use it like :

FuncOnDUV( gVertPrefixDUV( m_vertexDUV2N ) );

instead of just doing :

FuncOnDUV( (gVertexDUV &) ( m_vertexDUV2N ) );

Now, that on its own already does a little bit for you, because it means that if you change the type of m_vertexDUV2N you will get a compile error, unlike the normal cast which will just keep casting anything. That is, the function more narrowly specifies, I want to cast X to Y, not just a cast of any type to any other type (you could get the same constraint by using static_cast and specifying both template arguments manually instead of letting one be inferred).

But the real win comes if you start checking the cast in the function :

// MEMBER_OFFSET tells you the offset of a member in a type
#define MEMBER_OFFSET(type,member)	( (size_t) &(((type *)0)->member) )

gVertexDUV & gVertPrefixDUV( gVertexDUV2N & vertex )
	COMPILER_ASSERT( sizeof(gVertexDUV) <= sizeof(gVertexDUV2N) );
	COMPILER_ASSERT( MEMBER_OFFSET(gVertexDUV,diffuse) == gVertexDUV(gVertexDUV2N,diffuse) );
	COMPILER_ASSERT( MEMBER_OFFSET(gVertexDUV,uv1) == gVertexDUV(gVertexDUV2N,uv1) );
	return (gVertexDUV &) vertex;

Stuff like that. Now you have a semi-safe cast that will warn you if you change the structs and mess up the cast.

I've also been using a little utility that looks like static_cast<> but ensures the value is of the same size :

// same_size_cast like static_cast or whatever
template < typename t_to, typename t_fm >
t_to same_size_cast( const t_fm & from )
	RR_COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
	return *( (const t_to *) &from );

That's okay. I really hate the compiler warning about settings signed to unsigned and all that, because it makes me put casts in to make the warning go away, and that actually makes the code much less safe, because now it's casting god knows what if I change the type.


So I'm finally writing the Blogger API app to post my blog for me. It's going pretty well. It works like this :

download all the posts from Blogger

parse rants.html and break it into separate posts and try to give them unique logical names

compare all the posts from parsing with what I downloaded

update any changed posts, upload any new posts

and that's all working. I have run into a few problems :

1. Blogger puts a limit of 50 posts per day on you. For my initial upload of the old rants that is really fucking annoying. It also doesn't error at all when the limit kicks in, the posts just stop showing up, so I wasted some time going "WTF is happening".

2. Fucking Blogger by default converts slash-n (LF) into a BR tag. Ugh. Even when I tell it my content-type is HTML it still does the conversion. There's a "convertLineBreaks" tag in the feed but you can't put it, only get it. Again I was pulling my hair out on that one until I found that you can disable it from the web interface.

3. The documentation is actually pretty balls. It looks good at first because the "Getting Started" guide is really nice and I'm humming along all happy, and then I start actually wanting to know things, and there's just zero detailed documentation about what all these fields are and what values are okay and what they do to the server.

4. C# is a fucking pain in the ass. Sarcastic example of how to make int i = 1;

ValueHolder< int > i = System.Int.Clone( System.ConstantIntegerValue.1 );


The NFL officiating is really fucking impressive. The guys conference and try to get the call right, they have people all over the field, and if all that fails the replay system is really good. They actually seem to care about letting the play decide the game. Contrast to basketball and soccer which are both so retardedly horribly officiated (and the result of games depends so strongly on those bogus calls) that they're just unwatchable.

The only sport that maybe has it even better than the NFL is Rugby, where the head referee is almost an active player and has a huge amount of freedom to call or not call things. That's necessary because the rules of rugby are sort of vague and if you played by the letter of The Law the games would be retarded with penalties being called constantly. Rugby refs get to use their judgement and let a lot of things slide but try to keep things fair overall. They also will get right in there and yell at players instead of stopping play, like "let go of him!" "roll away!" "let it go!" (talking about the ball), etc. It's really a wonderful thing when the head ref is good. The bad thing is that it does give the ref a lot of power in affecting games, and it works horribly if the ref is not very good.


I found a really nice page on all the Windows Services . So that you can turn off all the rotten fucking nonsense that MS wants you to run.


The Maximum Compression guy has some bizarro efficiency measure . The right way to measure a compressor is the total time to compress, transmit & decompress over a channel of a certain speed. In my case I mainly only care about decompression, so I measure the quality in terms of the time to transmit the compressed data & the time to decompress.

Here's a CSV of his data with a reasonable efficiency rating. I also make a "True Decomp" time by subtracting off 3 seconds to remove the executable load overhead and the time to write the uncompressed data to disk. Copy-paste this to a file and load with excel as a CSV. The numbers that matter are Decomp+Load at 6 MB/sec and Decomp+Load at 1 MB/sec

number,archiver,switches,solid,size,ratio (%),Comp time,Decomp time,Efficiency (lower is better),,,True Decomp,,Load 6MB/sec,,Load 1MB/sec,,Decomp+Load 6Ms,,D+L 1MBs
99,CABARC 1.00.0106,LZX:17,N,102561882,67.58,175,7,14394,,,4, ,17.093647,,102.561882,,21.093647,,106.561882
45,THOR 0.96,e5,Y,112275112,64.51,24,6,5779,,,3, ,18.71251867,,112.275112,,21.71251867,,115.275112
18,WinRAR 3.71,Normal solid 4096Kb,N,90540300,71.38,152,10,3309,,,7, ,15.09005,,90.5403,,22.09005,,97.5403
73,WINIMP 1.21,1 BEST MM 16Mb B=1020Kb,N,97241095,69.26,204,9,9316,,,6, ,16.20684917,,97.241095,,22.20684917,,103.241095
59,WINIMP 1.21,1 BEST MM,N,99025199,68.7,137,9,7621,,,6, ,16.50419983,,99.025199,,22.50419983,,105.025199
36,WINIMP 1.21,Normal MM,N,100399936,68.26,82,9,5310,,,6, ,16.73332267,,100.399936,,22.73332267,,106.399936
76,GZIP 1.2.4,(none),Y,115442964,63.51,28,7,9570,,,4, ,19.240494,,115.442964,,23.240494,,119.442964
66,AIN 2.32,(none),N,115576936,63.47,25,7,8672,,,4, ,19.26282267,,115.576936,,23.26282267,,119.576936
98,BIX 1.00 b7,(none),N,99694432,68.49,237,10,14196,,,7, ,16.61573867,,99.694432,,23.61573867,,106.694432
80,LHARK 0.4d,(none),N,113439063,64.14,37,8,10132,,,5, ,18.9065105,,113.439063,,23.9065105,,118.439063
63,PKZIP 2.50,#NAME?,N,114022347,63.96,28,8,8178,,,5, ,19.0037245,,114.022347,,24.0037245,,119.022347
52,PKZIP 2.50,(none),N,114501263,63.81,22,8,6775,,,5, ,19.08354383,,114.501263,,24.08354383,,119.501263
85,ESP 1.92,(none),N,115164577,63.6,32,8,10605,,,5, ,19.19409617,,115.164577,,24.19409617,,120.164577
87,DZip 2.90,(none),N,115269882,63.56,33,8,11065,,,5, ,19.211647,,115.269882,,24.211647,,120.269882
86,File2Pack 2.0,(none),N,115305901,63.55,32,8,10772,,,5, ,19.21765017,,115.305901,,24.21765017,,120.305901
41,THOR 0.96,e3,Y,124588656,60.62,6,7,5638,,,4, ,20.764776,,124.588656,,24.764776,,128.588656

Don't put too much stock in these numbers, his timing is very rough (as is my "true decomp" time estimate). Also stuff like RAR and WINIMP is taking advantage of special multimedia recognition to do so well. Anyway, what you can see is that on modern DVD media that does between 1MB/sec and 6 MB/sec, you really cannot afford to spend much time in the decompressor. The slower the channel, the more important it is to spend a lot of time compressing really well, if the channel is infinitely fast then memcpy is the best option. Note that loading from a hard drive you really can't afford to decompress at all (and you shouldn't really be streaming or pointer fixing either, flat load baby!).

If you refer algebra, I ran these numbers :

6 MB/sec compressed data rate

4.0 bpb at 120 MB/sec

( 256 * 4.0 / 8 ) / 6 + ( 256 / 120 ) = 23.47 seconds

3.75 bpb at 80 MB/sec

( 256 * 3.75 / 8 ) / 6 + ( 256 / 80 ) = 23.20 seconds (** winner)

3.6 bpb at 40 MB/sec

( 256 * 3.60 / 8 ) / 6 + ( 256 / 40 ) = 25.60 seconds

3.5 bpb at 20 MB/sec

( 256 * 3.50 / 8 ) / 6 + ( 256 / 20 ) = 31.50 seconds


2 MB/sec compressed data rate

4.0 bpb at 120 MB/sec

( 256 * 4.0 / 8 ) / 2 + ( 256 / 120 ) = 66.13 seconds

3.75 bpb at 80 MB/sec

( 256 * 3.75 / 8 ) / 2 + ( 256 / 80 ) = 63.20 seconds (** winner)

3.6 bpb at 40 MB/sec

( 256 * 3.60 / 8 ) / 2 + ( 256 / 40 ) = 64.00 seconds

3.5 bpb at 20 MB/sec

( 256 * 3.50 / 8 ) / 2 + ( 256 / 20 ) = 68.80 seconds


4.0 bpb at 120 MB/sec is stuff like THOR,quicklz,LZRW,LZO
3.75 bpb at 80 MB/sec is stuff like LZX, and my Oodle-LZH
3.6 bpb at 40 MB/sec is stuff like RAR & Quantum
3.5 bpb at 20 MB/sec is stuff like LZMA/7-zip

Even though those efficiency numbers are very rough, I was surprised to see Zip was up there doing quite well. A lot of people claim to be "better than zip" , but they measure that in the useless metric of how much compression they get. The real measure is the full time to load compressed data into memory uncompressed.

I've been thinking about making my LZ optimal parser able to output Zip code streams. I think it would be the best Zip recompressor (7-zip also has a very good zip recompressor, I'd like to test myself against that). Unfortunately the source code for both 7-zip and InfoZip is a total mess and untangling that to be able to output a Zip code stream seems like an awful lot of work for a pointless curiosity project.

ADDENDUM : I should note that the total time is actually not really the best measure unless you are completely stalling on the load. Hopefully the actual IO is async and you do other work, which means actually that the decoders which spend more time in IO and less time on CPU work are preferred. The slower decoder with higher compression may look good in a pure race on blocked IO, but it's preventing other stuff from using the CPU. The only time a slow decoder really makes sense is when the CPU is very fast and the IO channel is very slow.


You can't use floating points in compression generally because it makes you non-platform-independent due to little differences in the internal precision and so on. But if you were okay with being locked to one platform and one particular FPU (for example, say on Xenon or Cell), I wonder if you could do a very fast arithmetic coder using floating points. A normal arithmetic coder does an integer divide which is usually slower than an FPU divide by quite a bit. And you might even be able to use the fast reciprocal estimate instead of the divide, as a standard kind of giving of some compression for speed. Of course you have to turn your float back into an int to get bits in & out of the code stream and such, but you may be able to do that quickly with a magic number addition or something like that, since you know your floats will be in a limited range. I think the hardest part would be making sure all the float->int parts were fast, not the range update.


I've been daydreaming about Everquest lately. I vividly remember places in the EQ world as if they were real places where I spent time in my life. The giant wood bridge to the Karanas, the Aviak towers in South Karana with Elephants wandering around; the burrows full of gnolls; the desert south of freeport with the big giant who stomped me, and the desert next to that with all the crocodiles, the swamps full of frogloks and the city of the trolls, the dungeon of Guk, etc.

I played most of the races just to see their different areas and feel what it was like to grow up there. Barbarians in the frozen north, wood elves in the misty forest of giant trees. It was such a huge world, especially for the time, you could just explore for days. And it was really well designed with interesting hidden stuff that kept you finding new things all the time if you wandered about. Each area had cute little decorations and context and NPC's related to the stuff going on in that area, with lots of hints of stuff you would see at higher levels and hints about the history of the game world.

It was a whole alternate universe you could dive into and run around in for hours. A lot of the magic of that was because it was so open. There were basically no keys or doors in the entire game. You could go anywhere, and players of various levels did mix quite often. It's the most perfect execution I've ever seen of a game where your level is the "key" to various areas. You could go to the hardest dungeons right away if you want, you're just going to die quickly. Even at low levels you could cast invisibility and go on a risky raid into high level dungeons to check things out and it was fascinating and terrifying hoping your invisibility wouldn't wear off. One of the great hurdles early on was trying to make the run from Freeport to Qeynos at low level, you'd beg someone to give you fleet feet and invisibility so you could get past the minotaurs and beholders in the mountain pass.

Of course a lot of the fun came from having other people in it. Even if you were solo'ing you could discover new things and then talk to your friends and be like "man have you seen this!?". I haven't really played WoW but I feel like it has ruined the things I really loved about the game; I liked not knowing the right place to be or the right item to get, I didn't really like the sitting around power leveling; I liked the fear of dying and having it be a really singificant disaster; I liked that it took some effort to explore and get away from other players who were overcrowding the most ideal power leveling spots and all camping for the same item.

There are a few games in my life that really stick with me to this day. "Faery Tale" on the Amiga was one. It was very similar in that it was a huge open world you could explore, with neat special places, and odd magical fairy tale stuff, like blowing the shell to ride the turtle. Another was "Out of this World" / "Another World". The gameplay was pretty shitty, but the art style was just so cool, and the universe was strange and gorgeous, it was like the Half Life 2 of that era.


Urg, I'm trying to get my home machine set up as a VPN server so I can get to it from work and I can't get it to accept a PPTP connection.

I did find the MS Support Tools which has a pptpsrv and pptpclnt to at least be able to test the ports.

you need to make sure TCP port 1723 (for PPTP) or UDP port 500 (for IPsec) is open for the VPN to communicate through the firewall. You also need to open up IP port 47 (for PPTP) or IP port 50 (for IPsec). The IP port is not TCP/UDP or ICMP it's IP.

WTF does that mean? IP Protocol 47 (GRE) ? WTF. Urg. Networking is such complete balls.


I bought an M-Audio Audiophile 2496 for my Media PC because the motherboard audio was ass quality. I wanted something that had direct connections to RCA out, none of that headphone jack bullshit, those connectors are so shitty, I hate that they've become standard for computer audio output (of course if you're on digital you should use SPDIF or HDMI or whatever and you're fine). The sound quality of this thing is very good, I can turn the speakers way up and don't hear any noise. It was kind of a pain getting the driver to work right because as usual with all this high end audio shit they have to do everything some weird nonstandard way that doesn't play well with normal windows apps.

Anyway, I have a weird complaint about it. It's really fucking loud! The signal level coming out of it is super high, I have turn the computer's software volume level way down and also turn my amp way down to get a reasonable volume level. I don't like turning the software volume down because it worries me that the driver is doing something awful like just compressing the dynamic range into a tiny portion of the 16 bit wave (I have no idea if that's actually true). I generally prefer to have the computer output full volume and just turn it down at the amplifier, but this card is too loud to do that.

Oh, most of you would probably hate this card, it does no 3d and no surround sound processing. Just stereo. Yum.


I used to think about getting some kind of wacky physics or math related tattoo. OMG I'm glad I didn't. These Equation Tattoos are hillarious ! The best quote :

I’m just happy that the artist now understands what a Swartzchild Radius is.

Then there's this girl who starts out promising ...

Girl with Shrodinger equation

Someday i hope to be a wacky, flannel-sportin’ physicist.

Um, cool. Wait, what? Flannel?

my tattoo is schroedinger’s equation for the wave function of a particle. i chose this equation because its elegance & symmetry reflect that of our multiverse, & also because it describes the fundamental source of quantum weirdness.

Um, okay, not really, quantum wierdness comes from the fact that the probability is the square of the shrodinger wave function, the actual wave function is a pretty ordinary wave function. But okay, close enough.

time travel, quantum computers no matter what happens in my life, there is an infinitely Glorious Plan swirling all about us

Alright, now, see, that's your problem. That um, Glorious Plan you got swirling all around you is why you ain't ever going to see inside the ivory tower of physics.

I would be honored to be included among the ranks of badass scientists all over the world. oh, & if you have any pull with any preeminent physicists, tell Brian Greene to return my fan mail! :]

I'm afraid my dear the best you could ever be for Brain Greene is a cum dumpster.

And of course the hot chick is a marine biologist, and she's dumb as nails. Umm, I got it cuz like, jellyfish are cool, or something.

These are actually pretty rad :






I'm not a chronic insomniac. Like many folks, I am prone to occasional nights of bad sleeplessness maybe once a month. These are awful. Nothing is worse than lying awake in bed for six hours, mind racing, all the while knowing you'll be useless the next day.

It's funny to me how some people's "awful, nothing worse" is other people's every day. I think most people who complain a lot just have no idea how many people suffer through great trials constantly.


We went hiking last saturday. Meh it was pretty but wet and cold, not really enjoyable. Anyway, I pulled over at some spot on the dirt forest service road to the trail so I could pee. I got out and stepped off the road a bit, and saw on the ground at my feet : 1) a bunch of shotgun shells on the ground, 2) some crushed cans of Miller Lite, 3) a used condom.


I've been trying to write the Oodle APIs nicely as I go, in the simple RAD way that's very close to plain C, but it's really slowing me down to figure out the right API as I go. I think I've just decided to stop doing that. I'm going to just get it all working whatever way is easiest for me, and then once it's all working I'll rewrite the external APIs from scratch. That should result in much better APIs since I'll have the functionality done and really understand it, and I think it will be faster overall too since I can rough out the functionality more comfortably if I just stop worrying about what the API looks like.


I keep being told RefCounting is bad for threads. Really? I mean obviously if you do something retarded like mutex to touch the refcount that's bad, but you don't need to do that, you can just InterlockedIncrement. Also if you pass them around by value all the time so that the ref count goes up and down all time, that would be bad, but you don't need to do that either. I just want a way to have objects count the # of people that want them to stick around.

I'm thinking about making the strings in Oodle refcounted. Currently I just use char buffers everywhere, but that's awfully fat and I spend time strcpy'ing them around. So I could make them COW refcounted thingies and I think it just works. Another option would be to have a big static dictionary of every filename ever, and then I can just pass around read-only char *'s to that dictionary. That's actually a good solution *if* you are using most of those strings at any time, but could be very bad if you are generally only using a few of them at any one time.


Ugh. My fucking upstairs neighbors are killing me. They were dragging furniture across the floor again at 3 in the morning. I got no sleep yet again and feel like total ass. When I'm home I'm a tense ball of quivering nerves. I haven't really relaxed in weeks. All I want is peace and no stress. Normally if someone is making my life unpleasant I just cut them out, avoid them, but I can't do that with the neighbors. I'm kind of tempted to just move, but ugh moving is such a pain in the ass, and no resonable amount of throwing money at moving seems to make it any easier. I've talked to the upstairs neighbor, but I don't think it does any good. Maybe someone sweet and cute could talk to them and get them to change, but I'm no good at talking to people, I try to be nice and nonconfrontation but I always just come off as an asshole and they have no motivation to help me. People only change for two reasons : 1. the threat of some consequences, 2. if they like you. I have no stick to use against them and I can't charm them. I hate feeling powerless. Ugh fuck bleck.


Some compression ideas I'll probably never work on :

Lossless image coder. See for prior art GLICBAWLS, TMW, MRP, CALIC, MEC. Lossless image coding has not really gotten the heavy work that other types of compression has and there's a lot of interesting stuff still to do. There are several fun issues :

When to use DPCM (LP, Linear Prediction) or not. For smooth part of the image you want ot use an LP, but there are digital parts and pattern parts as well that you could detect and do something better there. Eg. for text layed on top of a natural image, you want to adaptively per pixel identify the type of predictor and switch between a smooth/LP and a pattern/context type of coder. For example BMF has two separate models for smooth or digital images, you really should be picking the right one per pixel.

How to do the LP. There are a few pieces of this. For each pixel, you want the right LP for that environment, eg. some weighting of what neighbors to predict from and what shape of slope to predict with. Then you also need an error estimate. This is really just a machine learning problem. You are predicting a variable from a symmetric source and you need to predict the average and sdev, and your prediction should minimize entropy. There are lots of interesting pieces here. GLICBAWLS is very simple and close to the best, it trains an LP for each pixel based on the local neighborhood, but that is both very slow and also obviously can be beat in terms of compression by doing something like CALIC to characterize the local neighborhood and use different weights (GLICBAWLS always uses the same weights just based on distance). eg. in some areas you might only want a horizontal neighborhood because you detect that the vertical neighbor pixels are just random and not useful.

My idea is roughly to build a piecewise-regression model from the ML literature, using something like DT -> LP. The rough algorithm goes like this :

1. Start with some categories of predictors.  The fixed function CALIC gradient predictors might be good, something like :
	{N , W , NW , (N+W)/2 , N+W-NW , a larger neighborhood smooth predictor }

2. Classify each pixel by which predictor has the lowest error on it

3. Build a decision tree which maps each neighborhood to the desired predicition classification
	(this is a bit tricky because the error should not be classification mistakes, it should be entropy)

4. Now classify each neighborhood with the DT you made.  Construct optimal LP for each class.  Go to #2

This is rough and there are some missing pieces; for one thing you need to estimate the confidence in each neighborhood; again the confidence could be estimated by a function on the neighboring pixels, and also as a Markov model of the neighboring errors. You might also still want to do shape-context correction ala CALIC.

There are obvious things that current LP coders are not doing, but they're already super slow, so the question is how to do better and also be faster.

ADDENDUM : another interesting thing to explore would be using the other channels better. Linear correlation is not the place to look. I already did some work on finding optimal transforms for image coding, but in general you could do a lot more than just transforms. Even after a transform, the channels are still highly spatially correlated; that is, the places that are non-linear in one channel are very likely to be non-linear in another channel, that is LP errors occur in the same places.

High Compression LZ ; I'd like to take another shot at doing an LZ coder to beat LZMA on compression. The basics are obvious, arithmetic coding, context coding, forward optimal parse with "flexible parsing". The main new thing I'd like to try is an ROLZ with offset ordering. That is, rather than offsets just being 1 to window_size, make them 1= most likely, 2 = second most likely, etc. using some ordering that the decoder can reproduce pretty easily. PMR's make sense there, as does a simple ordering with context modeling. eg. contexts predict likely matches. A big deal is that a good flexible parse with the ROLZ can be a lot smarter about taking advance of those special offsets that code cheaply. You could also code lengths as deltas from 2nd best though I'm not sure that's worth the computation.

One idea is to not send big offsets ever. Long matches can be coded very efficiency by waiting for a context to predict them. That is, if you code from order-2 contexts with ROLZ, that means you give up 2 match len to get the prediction, so instead of sending {offset,Len} you send two literals and then {context,Len-2}. That's good for long matches. For short matches, you could have something special and local, like maybe only code up to offset 1024 or something. Short matches are only really important for binary files, which do a lot of 3 & 4 long matches, and those have very regular patterns.

This idea is not fleshed out, one problem with HCLZ is you can easily fall into the trap of just making it a context coder basically. Even LZMA has basically fallen into that trap, it's not much faster than Shkarin's PPMd.

Blocksort without MTF ; I've had this idea for a long time, and I know people have done it before and it's also sort of pointless just like HCLZ. The blocksort + MTF works well, but it destroys all your good extra information that you could use to get more compression. You'd like to get compression closer to PPM but keep the better speed of blocksort. Generally as you play with blocksort to try to get closer to PPM, you also become slower than PPM, but never get as good compression, so it's totally pointless. Anyhoo.

The idea is just to try to do the Blocksort without MTF. If you just look back at the previous symbols in the post-sort stream, those are the symbols predicted by your current context. You can almost recreate your context by looking backwards and looking for an inflection point where the local statistics rapidly change. Once you have that you have a local context and you can do "PPM". Obviously you can't do that exactly but maybe you can get close.

One bad thing about the MTF is that it keeps the tail ordering around too much. That is, the ordering of symbols past #16 or so is pretty much junk. You really want to just use the MTF to code the first 16 or 32 symbols, and then after that send an escape and code the remainder with their order-0 statistics, which doesn't change. When you transition from one major context group to another, you basically want to throw away your whole MTF order and restart with the order-0 order.

My basic idea is to use the power of "SE" (Secondary Estimation). It goes like this :

As you scan over the post-sort blocksort symbols, track the count of the last 8 symbols in one table using a sliding window,
in another table track the last 16, in another the last 32, etc.  (this is very fast with a sliding window update)

To code the current symbol, you code from the last-8 , if its not there you escape and code from the last-16, etc.
(with exclusion of course)

To code from the last-8 you do NOT just the counts.  Instead you use Secondary Estimation using the Markov *state* information of
where the symbols were seen.  That's basically the 64-bit word of what the last 8 symbols were.

Obviously 64-bits is too big to use in an SE table, but you can almost always pack that down.

if there was only 1 char in the last 8 -> that's one table entry
if there were 2 chars, then it's a binary pattern, and you know how many chars of each which gives you a permutation index.
eg. one char was seen 7 times, one char was seen 1 time, so you have
  00000001 = 0
  00000010 = 1
  00000100 = 2

it's obvious from those that they should have very different prediction characteristics, eg.


should predict the 0 very strongly and have a tiny tiny weight for the 1, while


should have a decent weight on the 1 since it might be the beginning of a context transition, while


is actually clearly an alternating state and should strongly predict a 0
(though the adaptive SE will tell us if this is a fluke or not)



should basicaly predict 50/50, it says we're in a context where 0 and 1 are both likely.

Once you get more than something like 4 different chars in the last 8 you stop doing this so carefully and just use something more like a plain order-0 coder because the number of permutations gets too large for the SE to have good statistics. But 2 and 3 chars in the last 8 is the most important case, as we know from PPM most of our compression comes from the very deterministic contexts.


The new Google features for research papers are very awesome. The "All N Versions" lets you find a PDF you can actually download; they've had it for a while but now there's the green triangle thing that points you straight at an actual copy you can get.


Sometimes these days I start to feel like one of those nutters who says things like "it doesn't matter who I vote for, the whole system's broken, they're all crooks". I don't believe that at all, and in fact I think that's quite foolish. There's a huge difference between the crooks. Just because the system is horrible doesn't mean you can't still make a big difference. There's a huge huge difference between the numbers 1 and 10, even though they're both really far away from 100 where you'd like them to be, you'd be retarded to not exercise your right to choose between the 1 and the 10.


Widespread stock ownership is destroying American corporations. The idea of corporate governance is that the executives work for the benefit of the stock-holders. That works well when the stock is held by a few magnates, like JP Morgan or Carl Icahn or Warren Buffet, they actually know what's going on, they can see when the CEO is raping the company and taking the profit for himself, but when the stock is owned by a million grannies with 401k's, they have no idea WTF is going on and the idea that they are holding the executives responsible is nonsense. It allows companies to be less transparent, to get away with more, and for incompetent executives to keep bouncing from job to job. Obviously there're advantages to having a liquid market where you can be evaluated by your merits rather than by the whims of someone like Morgan, but I contend that the market would function better if the minimum stock purchase was $10,000 or so. Stock analysts actually did honest due dilligence back when they were primarily doing analysis to make purchases themselves (with their bank's capital). When they started just pimping stocks for the public to buy all connection to reality went out the window.


I've lost my zeal for Obama. For a while there I was pretty high on Obama, because he seemed to be actually honestly addressing issues and speaking directly and intelligently about what's happening in the world. Obviously he's still better than McCain or W. but a few things are disturbing me. For one Obama seems to be promising free money to everyone these days. I know it's the campaign pressure, but it's sad to see him cave in to that and offer stimulus and tax cuts and money money hey vote for me I'll give you money!

The reason for America being fucked up and the positive change that's happening is going to be obscured. Obama is going to inherit a complete clusterfuck. He will have no budget to be able to do anything significant. In 4 years we're going to be in the shitter and Republicans will be able to say "Dems have been in control for 4 years and look at the shit we're in". Democrats will become complacent because "they won".

We're missing our chance to actually do something significant to fix the US government. Crises are one of the few chances that we have to actually get major changes through, when enough people are upset and the usual interests that stop any change are sidelined. Lately the Shadow Powers have been very slick about using crises to improve their positions. After 9/11 they quickly got the Patriot Act and all kinds of other nonsense through to be able to spy on americans and give more power to the executive and so on. With this financial crisis they get to give out masses of money. Getting Obama in will kill the momentum for real change.

A sampling of things we actually need :

1. Require transparency of the executive. Everyone in the executive branch except the president must answer congressional subpeonas. That's like required by the constitution or something, but I'd love to see a "Prime Minister's Minutes" thing for the President but we know that won't happen.

2. Require the executive to enforce the laws as intended. That means allocating budget to laws even if the President doesn't like them. Put an end to this idea of signing statements.

3. Require transparency of the Pentagon budget. No more multi-billion dollar secret budget for the Pentagon. The claim that it's crucial to national security is just nonsense these days. Our security from Al Qaeda is not going to be compromised if we disclose that an extra $6 billion went to Lockheed for "research".

4. Eliminate the Electoral College. The allocation of votes based on the senate gives unfair extra votes to tiny states with microscopic populations that provide very little to America. Each person's vote should have equal importance, this system gives much greater value to votes from tiny states. A vote in Alaska is worth 3* a vote in California. (Pop 670000 in Alaska = 3 electoral votes, Pop 36458000 in CA = 55 electoral votes). WTF, how are we the fucking majority of people tolerating this - hey your Californians, your vote is worth 1/3 as much as a vote in Alaska, step the fuck up and change that! This is the reason the Republicans have been able to win with the "small town" strategy for the last few elections.

5. Eliminate the Senate. Similar to the previous. Also, greatly reduce the number of people in the house, maybe like 150 is reasonable. It's too much, it's impossible for voters to keep track of who their congressman is and what they're doing.

6. Make lobbying illegal. All lobbying. Yes, there are some benefits to lobbying but the costs have grown to far outweigh the benefits.

7. Require all candidates to only use public financing. Similar to the previous.

8. Restore the media ownership laws we had until Clinton & Bush tore them up. Basically don't allow one company to control all the news media in a single market.

9. Restore the bank laws Clinton repealed. Give government insurance only to plain banks with lots of supervision and plenty of cash cushion. Banks with insurance are not allowed to be connected to any other financial institution.

10. Eliminate all the government-corporate complexes in which the government enforces monopolies and gaurantees pricing for companies that are weakly regulated and get to keep their profits. In some cases this would mean nationalizing more, in many cases it would mean removing government intervention in fixing markets. For example, airwaves and cable lines should be changed to an open bidding system. Transmission on the power grid should be opened up more, but management of the grid itself should be nationalized more. Get rid of the Medicare Prescription plan and instead require pharma companies to sell pills to the government at cost. If you want to have the privilege of selling pills for profit in America, you have to sell at cost to people on Medicare.

11. Require the power companies to fix their plants and get up to current environmental standards, eliminate the "old source" loop holes. The idea that this would be a massive cost is a complete fallacy. In most cases the improvements in efficiency would actually decrease long term cost to the consumer, they just don't want to do it because it's a big hurt to short term profit which is how executives makes their paycheck.

12. Stop subsidizing the automobile by pumping money into roads, gas companies, cheap oil leases, etc. Make drivers pay for their cars, AND tax them the cost of their pollution.

13. Spend more on the things that are going to save us. Technology infrastructure, public transit, education, etc. And by education I mean real substantive education for the brightest who really matter, not rubber stamp colleges and primary schools teaching memorization for standardized tests.

Anyway, there's a lot more. You can read my political rant from the 2000 election and pretty much all that still applies because we never fucking do anything significant that would actually make a big difference and structurally change how broken the system is.

Let me say that I do not dismiss the importance of competent beaurocrats. I think under W we got see just how important it is to have good people even in a horrible beaurocracy, and a bad president can really chase away the competent technocrats, while a good one can give people hope and bring a lot of good people into the horrible jobs in Washington.


A few more hashing notes and then I have to move on :

1. Sean pointed out there are cases where the static jumplist thing totally kicks ass. For example, if your data element is a large object and you have them in a flat array anyway, like eg. its 1024 byte file information record. In this case you don't want a normal hash to hold the data by value because it's large and you don't want to waste a bunch of space on the empties. The jump table can index directly into the flat array of elements, so the only memory use is for the index table itself, which is very small. In fact, the number of indexes need is around (N/4) and the size of the index is 4 bytes, so for N elements the index is just N bytes. Note that you can only index the array on one key this way, because it requires reordering the original array for the jump index.

2. Classic style hashes like khash are really bad as "multimaps" in the STL lingo. When you put in a bunch of identical values it makes them locally N^2 because even fancy reprobing can't get identical values to separate. The STLport style of hash handles being "multi" a lot better. Basically just don't try to make classic hashes be "multi" ever. (to some extent any multi-hash is locally N^2 inherently, but the STLport style can insert quickly, give you an equal_range easily, and then once you have it it's ust a linear walk over all the equal key elements, not a bunch of jumping around the table).

3. For the sorted list you could do an interpolation search instead of a binary search. The hashes, which I store, are actually nicely linearly distributed, so you can get a very good estimate of where to jump in. A pure Interpolation search is very slow and kind of silly actually. Once you do your first interpolation to get an initial index, further interpolation doesn't make much sense. You can almost just do a first interpolation and them linear search from there. In fact I just tried that and it works very well :

Binary search on sorted list : 370 clocks

Full interpolation search : 450 clocks

One interpolation then linear search : 300 clocks

Implicit binary tree search : 290 clocks

STLport hash : 150 clocks

jump list : 114 clocks

So obviously there's something of value there, but the one interpolation is sometimes pretty far off and then it linear searches a whole lot. What it could do is do one interpolation, and then do a binary growing search, like +1,+2,+4 to find the range that the value lies in, and then once it finds the range do a binary or linear search to find the value, but I tried that and it was slower. Also note the fast version above is using >> 32 to divide by the hash range and I assume that the hash is using almost all of the 32 bit range. Divides are way too slow to be in the search.


So I benchmarked khash from Attractive Chaos ; it's a little bit slow, but the memory use is very low. It's basically a plain old flat mod-prime hash table like the old Knuth days. It uses a form of linear rehash reprobing. It has a separate flag array for empties which is a poor decision IMO, that means separate cache misses, and you pretty much always have key values that you can use for EMPTY and DELETED. It's better to just have the key array. He also has the data in a seperate array which is again bad for cache, but I'm not using that since I'm just doing a "set" not a "map".

On the plus side, the memory use is quite low. One thing I hadn't really thought of is that using mod-prime hashes lets you size your hash much closer to the size that you really want. Like in one test I have 8950 words, which with a pow2 means I jump up to 16384 and waste a lot of space. With mod-prime I can have a table of primes and get close to the fill size I want, like for 75% I can use 11933 which is almost perfect.

The hash for strings in khash is a variant of a "Bernstein" hash which is very special-cased to ASCII. It's actually a very poor hash, but it works totally fine when you mod-prime it. mod-prime just like magically fixes crappy hashes and makes them okay. I tried putting some better hash functions in khash and they don't really make a difference. I saw the same thing with STLport hash which also uses a mod-prime - it's very insensitive to the hash function.

Anyway khash performs similarly to the STLport hash but with lower memory use, so that's cool. Basically a good old fashioned reprobing hash is totally still a good thing. Sean's "stb.h" has a very similar hash implementation that is better in some ways. STB doesn't have seperate flags, it uses the EMPTY and DELETED keys. STB uses pow-2 sizes, but I'm not sure that's a win, the mod-prime is pretty nice for these hashes.

Very large benchmark :

num words : 775481
num queries : 1602134

jumplist             : 0.13 seconds, 249.35 clocks, rate= 12.04 M/s
mem use: 10398156 = 13.41/per

strhash              : 0.14 seconds, 255.68 clocks, rate= 11.75 M/s
mem use: 12582912 = 16.23/per

std::hashset         : 0.17 seconds, 311.93 clocks, rate= 9.63 M/s

khash                : 0.16 seconds, 303.12 clocks, rate= 9.91 M/s
khash mem use : 6684696 = 8.62/per

sorted               : 0.43 seconds, 809.12 clocks, rate= 3.71 M/s
mem use: 6203848 = 8.00/per

implicit bt          : 0.33 seconds, 621.21 clocks, rate= 4.83 M/s
mem use: 8388608 = 10.82/per

Small benchmark :

num words : 8960
num queries : 1602134

jumplist             : 0.06 seconds, 113.15 clocks, rate= 26.54 M/s
mem use: 137220 = 15.31/per

strhash              : 0.07 seconds, 138.33 clocks, rate= 21.71 M/s
mem use: 196608 = 21.94/per

std::hashset         : 0.08 seconds, 150.36 clocks, rate= 19.97 M/s

khash                : 0.11 seconds, 197.32 clocks, rate= 15.22 M/s
khash mem use : 52232 = 5.83/per

sorted               : 0.20 seconds, 367.69 clocks, rate= 8.17 M/s
mem use: 71680 = 8.00/per

implicit bt          : 0.16 seconds, 294.28 clocks, rate= 10.21 M/s
mem use: 131072 = 14.63/per

Very Large benchmark is all the unique words of 4 chars or more in the 10^8 byte enwik8 data set.


Holy crap Pushing Daisies is so awful. It's sickly sweet and cheesy, which would actually be okay if it had any heart or realism to it. It's very obviously trying to copy Amelie and other similar movies that copy the style of fables in modern settings with pumped up saturation and magical realism, but it's all sacharine with no body behind it. It's disgusting. I find garbage like that offensive, worse than nothing, because it makes it harder for me to appreciate the sweet fables that I do love.


Hashing scares me. Any time I see a hash function I feel great skepticism. I have no confidence that a hash is actually working well unless I test it in the real situation it's being used. Hashing is fast *if* everything is working perfectly, if the hash table is actually sized right for the data, if the hash function is actually randomly distributing your actual keys, if the reprobe is not walking all over unnecessarilly, etc.

Preface : What I want to do is hash strings for lookup. I want low memory use. I'm going to store a (char *) and a 32 bit hash value in each element. I only do a full strcmp if the 32-bit hash values are equal. The time that I want to minimize is the combined time of the hashing function and the lookup using the hash. So the hash computation should be fast and also good enough so that the lookup performs well. Hashes that don't randomize well will result in table collisions and also unnecessary strcmp's being done which is slow. The 32-bit hash value is used in the elements, but generally a much smaller hash size is used for the table. Ok, now let's get on with it...

The STLport hash is a linked list of all the elements, and the hash table gives you jump pointers into that list. When you make a query with a given key, it makes hash(key) and then you know that your element is between

begin = jump_table[ hash(key) ]
end = jump_table[ hash(key) + 1 ]
where begin and end are pointers into the big linked list. This lets it easily make an iterator that can walk over all the data in a deterministic order (though not sorted order, unless the hash function is lexicographic which it probably isn't). It also makes insert and remove very fast because you just find yourself then unlink the node. STLport grows the hash table pretty aggressively, and just uses a linear list within each segment. You could grow the jump table more slowly, and then use a sorted list on each segment and do a binary search. This is equivalent to the conceptual model of a hash table where each bucket points at a linked list, ala PkZip.

A few things of note :

1. STLport grows the hash table based on *total occupancy* not the max occupancy of any bucket. That means if your hash function is retardedly broken, it can perform really badly, lookups can become O(N). That's really your fault. On the plus side, because it's not a reprober, if it gets all screwed up in some part of the hash, the other parts will still function file, eg. screwups don't leak all over the way they do in reprobers.

2. The STLport hash is both faster and uses less memory than map<> or almost any self-balancing binary tree type thing. Pretty much any time your key is something that can be easily hashed, you should use a hash.

3. The STLport hash always makes the table prime number in size and uses a mod to convert your hash() into a table index. Basically they don't trust you to be making decent hashes (wisely so). For example if you just have an integer index, you can return that as your hash and it will work fine.

4. For static data you could of course do something neater by just keeping the elements in a vector<> and making a jump table that indexes into the vector. Also if you know you can make good hashes you could use a pow two size and a mask instead of a mod.

A normal table-based hash with reprobing on collision can be faster. I tried making one that's minimum size. I want to use a pow2 to avoid a divide, so I just find the next pow2 up and use that, meaning I have very little empty space. A normal reprobing hash can perform very badly when there's not much empty space because you have to keep walking until you see empty to know you've visited everything that goes to your bucket.

The first thing I tried as reprobing with a step of 1. There's a lot of fancy stuff on reprobring with rehashing and whatnot, but I've read that those things are terrible for cache and it's better just to reprobe by stepping linearly, so I'm trying that. ADDENDUM : more on this later.

The next is to count how many elements really belong in each bucket. When you insert something in a bucket, you bump its count, and then if its full you reprobe and add it to the actual first empty spot you find (and don't bump that count). Then when you lookup you only need to keep reprobing until you've seen count elements that hash the same as your query. For very full hash tables this means much less reprobring, it was about 2X faster for me (though it makes the hash table bigger because you have to store that count).

Test of finding all the words in "book1" and "book2"

sorted list                   : 537.19 clocks
reorg as implicit binary tree : 348.14 clocks

std::set                      : 664.89 clocks
std::hashset                  : 253.01 clocks

hash with 1-reprobe & count   : 212.33 clocks
static vector jump table hash : 152.29 clocks

The winner for me is the "static vector jump table hash" , which is basically the STLport hash modification #4 that I described above; the elements are just stored flat in an array, and the hash bucket tells you the start index of the data that goes in that bucket. I found that a linear search is much faster than a binary search within the bucket, but I do still sort within the bucket so that I could early return as soon as I see a value larger than the query in the linear search walk. Note that the jump table is actually *smaller* than the hash because it doesn't need any empty space.

Memory use :

num words : 41509

sorted list lookup:
mem use: 332072 = 8.00/per

	sorted list = 8 bytes per element (zero overhead)
	element = 4 byte hash + 4 byte pointer

implicit btree lookup:
mem use: 524288 = 12.63/per

	implicit btree = sorted list with empty spaces to make perfect pow 2 tree
	worst case almost doubles memory use when size = pow2 + 1
	best case is zero overhead when size is exactly a pow2

jumplist lookup:
mem use: 594220 = 14.32/per

	jumplist = 8 bytes per element + 4 bytes per hash entry; 16 bit hash

strhash lookup:
mem use: 786432 = 18.95/per

	hash is 12 bytes per element + overhead for empty space

I should note that I'm hashing words here, which are quite short, which means the overhead of the hash is crucial. Hashes with expensive rollups don't work. If you were hashing something very very long, it might be wise to use a hash like Bob Jenkins' which rolls up to a loop that works on 3 32-bit ints at a time.

Links : Paul Hsieh hash ; I found this to be slow. It's certainly not magic.

Bob Jenkin's page ; nice page with a summary of tons of hash functions and discussion of what makes good hash functions. His "lookup3" is very complex and slow on small blocks but apparently fast and good on large blocks.

Murmur Hash good simple hash, works on 32-bits at a time so probably fast for big blocks, but was slow for me (though a lot faster than Paul Hsieh).

FNV Hash is the winner. This is the most basic hash and it's also the best for this application. It's just multiply by a prime and xor in the next byte. Apparently the primes he lists here are especially good, and you should take some care in how you make your final hash value from the 32-bit hash. In my experiments it really doesn't matter what prime I use, any prime bigger than 256 seems to work fine.

I think hashes where you do a bunch of shifts are probably pointless. The reason is that one multiply by a good magic number is equivalent to a whole mess of shifts ! eg. x*5 = (x << 2 ) + x , etc. On almost all modern chips a multiply and shift are very nearly the same speed, so hashes with one multiply are faster than hashes that do a bunch of shifts. (sadly there are some major new architectures where that isn't true; multiply and shift by a variable are both slow, while shift by a constant is much faster)

Okay, I counted collisions too.

	// 220 clocks :
	// StringHash collisions : 4620821
	//return rrCRC_Block((const U8 *)string,(S32)strlen(string));
	// 190 clocks :
	// StringHash collisions : 5520089
	//return PaulHsiehHash(string,(S32)strlen(string));

	// 180 clocks :
	// StringHash collisions : 4372337
	//return MurmurHash2(string,(int)strlen(string),0x12345678);

	// 171 :
	// StringHash collisions : 4815568
	//return oneAtATimeHashStr(string);
	// 152 clocks :
	// StringHash collisions : 4192267
	//return 4001 * stb_hash((char *)string);
	// 146 :
	// StringHash collisions : 6157277
	//return bernstein(string);

	// 144 :
	// StringHash collisions : 6603083
	return alphaNumHashStr(string);
	// 142 :
	// StringHash collisions : 5892286
	//return FNVHashStr(string);
The best hash in terms of avoiding collisions is the STB hash, which is :

unsigned int stb_hash(char *str)
   unsigned int hash = 0;
   while (*str)
      hash = (hash << 7) + (hash >> 25) + *str++;
   return hash + (hash >> 16);

The top three best hashes are some of the worst in terms of avoiding collisions. But they're also super simple. The time I'm measuring is the total time to compute the hash & use it for lookup. The lookup is pretty damn fast so any extra work in computing the hash has got to be worth it.

BTW conventional wisdom says the hash value in stb_hash should not be initialized to zero, because that makes all the lengths of pure-zero input all map to the hash value zero. Of course that is irrelevant if you're hashing strings. In fact, using anything but 0 there seems to hurt the number of collisions. Also note the shift 7 & 25 is a rotate around the 32 bits which can be done in one instruction on x86, which would make it very fast. This hash has some bad theoretical degeneracies, and Bob Jenkins actually lists this exact hash as an example of a bad hash, but on my test data it has the fewest collisions of any of them! (I haven't tested Bob's full strength hard core lookup3).

Again the murmur looks very good; if you're just going to take a hash and drop it in, which I strongly discourage, murmur might be the way to go. Also I should note that the big flat hash table with reprobing can be very fast if you let it be very big and have lots of empty space. In my experiments it needs around 50% empty space to be fast, even at 25% empty space it gets very slow, beyond 50% empty space doesn't help much more.

I also generally don't like checking performance in ways like this. The real application is queries that will happen randomly in the middle of other application processing. A hash functions with a table lookup might perform very well in an isolated test, but when you drop that in the middle of a real app, the table is not in cache and your hash function also dirties the cache which slows down the rest of your app.

ADDENDUM on collisions : So handling collisions well is actually very important. In fact, with good collision handling the reprobing hash can be almost as fast as the static-jump-list hash. In latest test static jump list is 114 clocks and reprobe hashing is 119 clocks. That's with a 75% full hash table. (you may see authors around the net using 25-50% full hash tables; those people are memory wasters and their speed numbers are lies!).

So the +1 reprobing is in fact great for cache but it does suffer badly in areas of degeneracy. Areas of degeneracy arise where a bunch of values get put in the same bucket. This can happen even if your hash is a perfect hash in the sense that random input = random output, but you might just give it input that has many values that hash to the same spot. When you do +1 probing in those spots it's locally N^2 because for each value you walk on all the other values trying to find empty space.

One good way to reprobe that I got from Sean is to use bits of the hash that were not used in the hash indexing. That is, you computed a 32 bit hash value, but then you only used 16 bits to index the hash table, usually using something like xor folding :

index = (h >> 16) ^ h;

So, try to make a reprobe that uses the *other* 16 bits of the hash. For example :
reprobe = (h>>8) * 173 + 1;


reprobe = h + (h>>6) + (h>>19);
or something from the FNV page. What this does is take two elements that has to the same bucket and makes them reprobe to different buckets (hopefully) so they no longer collide. Yay. But just using that as a linear reprobe is not the best thing.

Another thing you can do is a quadratic reprobe. Your reprobe step starts at 1 and then grows quadratically. Warning : some quadratic steps do not ever cover all values, so they might be an infinite loop. The Google Sparse Hash guy claims that triangular steps do cover all values but I haven't seen anyone else claim that so YMMV. (ADDENDUM : apparently it's in Knuth and also on Wikipedia - yes Triangular numbers do in fact cover all steps). A triangular number is n*(n+1)/2 , that's {1,3,6,10,15,...}. Anyway, what the quadratic step does for you is it makes neighbors not stomp on each other. That is if you have one bucket at hash "h" with 10 elements in it and a bucket at hash "h+1" with 10 elements, if you just do +1 reprobing they'll be all up in each other's business and make 20 solid elements with no space. The quadratic step makes big spaces between and they don't run into each other.

The best reprobe that I found is a hybrid, basically it goes like this :

reprobe with +1 twice

then reprobe with a rehash like (h>>8) * 173 + 1;

then reprobe with a linearly growing step : 1 then 2,3,4
This is faster than any other reprobe that I tried by a few clocks. Annoted what we get is :
reprobe with +1 twice
	-> very good for cache in the cases where we get lucky and have empty space right next to us
	-> basically this is so fast its free if we got lucky, helps more for more empty tables

then reprobe with a rehash like (h>>8) * 173 + 1;
	-> breaks the original hash degeneracy, but it's a very evil random cache miss
	-> (btw you could prefect this address before doing the two +1's which would probably make us even faster)

then reprobe with a linearly growing step : 1 then 2,3,4
	-> that's a triangular number from the original index
That's the reprobe that makes us almost as fast as the static jump list, even at 75% fill. BTW the rough numbers were :

Reprobe +1 : 170 clocks
Reprobe with rehash : 130
Reprobe with quadratic : 127
Reprobe with rehash then quadratic : 125
Reprobe with +1 then rehash then quadratic : 119

Static jump list : 115

ADDENDUM more links : Google Sparse Hash Implementation Details is a nice document; it's a cool idea. Basically it's a big flat put the values in the table and reprobe kind of hash, but he uses a really neat sparse table thing instead of an array, which lets you have more empty space in your hash without wasting memory on the empties. Obviously it's not the fastest thing in the world but if you're in a memory-constrained situation it's hot (not memory constrained like small memory, but like hey my search cache machine is using 3.9 GB already).

Cuckoo Hashing looks pretty interesting. I definitely think this could be faster than regular reprobe hashing, and in fact it's ideal for my case of put-rare-get-common.

Python dictionary implementation notes Some good stuff including notes on cache locality in reprobing.

little page with numbers of reprobes for different load factors

Attractive Chaos has a big hash table comparison. His hash appears to be very ordinary, but he claims it's the best, so I'll benchmark his hash tomorrow for completeness...


I had like 8 transitions in cbsaver but I've disabled them all, and now it only does a slow gamma-correct cross fade. Gamma correction makes the brightness change linear so it's less distracting. IMO lots of flash and sizzle in screen savers is very bad, particularly for a living room HTPC screen saver, I want it to be very mild and inobtrustive. The latest version is in exe : cbsaver zip .


For static data (or semi-static) and very low memory use, one of the best data structures for lookup is just a sorted array. I've written about this a bunch in the past long ago, so you can look back if you like, but hey you basically sort & then you binary search, and it has zero memory use for the lookup because you're just holding the data you had to hold anyway. Part of the reason it's good is that in the common case of small lookups, hey all your memory is right there in a row and it all comes in to cache, and you never have to jump around pointer chasing, and you can easily do things like once you get down to a range of 8 elements or so in your binary search you switch to linear.

While that lookup is log(N) it can be a little slow for big trees, partly because logN gets big, but mainly because you jump all over in memory and cache miss at every branch. Rather than using a plain sorted list, you could do a binary tree style of reshuffle, eg. first sort, and then reorder the list to {N/2, 0,N, N/4,3N/4, N/8,3N/8,5N/8,7N/8 , ... } so that as you do a binary search the data you want to lookup is right near you in cache, of course this is just a binary tree with implicit child indexes, it's just a reshuffling of the array it's not creating any new nodes. Actually I guess the best layout is something like a van Emde Boas tree; it's intuitively obvious that you don't just want all the elements at the same depth next to each other, rather you want to gather subtrees near the leaves so they are consecutive chunks in memory. One problem I'm having is that I don't want to store indexes to kids, I want to implicitly index the tree, and it's not clear to me how to implicitly index a vEB tree.

BTW this is sort of like doing a B-tree but having 16 elements in each level of the B-tree and using a binary search to find your element in each step of the B-tree. So really you still just have a binary tree overall, but it's organized differently in memory. This is simple and all, and in fact pretty good, but the problem is that choice of # of elements at each level of the B-tree is very strongly tied to your architecture's cache line size, which is evil for libraries. See Cache Oblivious Binary Trees .

There must be lots of research on this but I'm having trouble finding it. What I'm interested in is speeding this up simply, or adding a small acceleration structure, something that's O( sqrt(N) ) in size, not O( N ) like a full hash.

For example it seems obvious that you should be able to do a jump-ahead structure to get past half the depth. One obvious thing would be to hash the lookup key down to log(N)/2 bits, then make a seperate sorted list for each of the hash values.


I really think the poor should rise up, and I encourage them to do so. Fancy cars should be stolen. Ostentatious mansions should be broken into. People walking down the street in fancy garb should be robbed. It is illogical to not do so. It's a necessary check on society.

The upper class simply does the logical thing which gives them the most immediate benefit - stomp all over the lower class and all those weaker than them. They will continue to do so until they have a reason not to.

One of the things I learned in poker over time is that you should never stop running over someone. That is, your gut will tell you to slow down. You think there's no way you should be able to get away with the absolute obvious larceny that you're setting up. You gut is wrong. When someone lets you take advantage of it, you just do it, and do it again, and again and again, your gut will be screaming "they must see what we're doing, slow down! try to hide it more, ease up a bit" , no, your gut is wrong, just keep robbing them. Until they actually give you some clear sign that they are going to respond correctly your to overplaying, just keep on overplaying. (specifically in poker this goes for being super tight just as much as it does for doing lots of stealing; when you're playing super tight waiting to trap some maniac, you starting thinking "god there's no way he's going to give me action, I'm playing so tight, I need to pretend to be looser to trick him", no, no you don't, until he actually shows you that he can slow down and stop being a maniac because you're so tight, you just keep on waiting to trap him).

The upper class is like "um, we're gonna just cut all the taxes for the rich" and the lower class is like "um, shucks, okay, sounds good". WTF, you retards, you deserve it. When somebody overplays a certain style, it's only wrong because it opens up a possible correction. If that correction is not taken, then the overplay is not wrong.


I've been feeling a little lost the last few days working on Oodle. The low level stuff is coming along well, but the high level APIs are a big scary void of possibilities. I feel like I need a more realistic test app to really stress the system and to see how easy the APIs are to work with and all that. I've got unit tests and such but that doesn't really do it, I want a mini game that loads data the way games do and pages data the way games do, with stuff streaming as you walk around, and then big loads at area boundaries with a prefetch and a transition area. I'm uncomfortable writing libraries without a user; I at least like to have *me* as a user, but more would be even better.

One of the tricky things about the high level is how easy should I make it for games to do the broken stuff they do now, like for example the Unreal style of having everything in giant packages. I generally want to make the APIs encourage people to do things the "right way", and make that as easy as possible, but at the same time I do have to provide some reasonable fallbacks for more broken behavior.

Basically the "right way" is having static resource usage information and letting me make bundles of resources however I see fit, and then letting me load the bundles in and out of memory based on the game's usage and prefetch hints (or, you know, if you want to define bundles in logical units that are reasonably small and atomic that's okay too). There are two main kinds of "wrong way". One is a system that just says "okay load everything" at the start of the game. The other "wrong way" is just randomly requesting content synchronously in the middle of frames with no static resource analysis or prefetch to help me out, like just randomly going "texture * cur = load( rand() ); int a = cur->bytes[3];"

I've got a few ideas of little fake games to make. One idea is to make a "Sparse Texture Buffer" (STB) (aka SVT) test app that does the whole texture-page loading thing. I could cook up 1 GB of texture and put it on a quad and fly around on that and page texture. That might be amusing and make people say "oo" but it's not really a very good test of general game content loading. I might do it anyway just because it's easy and sounds fun.

Another idea is to try to do some geometry loading for an actual walkthrough type of thing. One problem with that is getting enough geometry to actual be too big for memory and stress and the paging. I'd have to somehow find a hell of a lot of geometry, or just fake it by duping a few files and calling them different names, or do some procedural geometry generation which I guess could be fun.

One idea that popped in my head was doing a little real-time raytracer. The fun thing about doing a realtime raytracer for paging is that I can do the infinite geometry load-on-demand thing. The basic idea goes like this :

Fire all the rays from the camera
	(group in packets of 4 or 16 or something)
	(process in a coherent tile order for locality)

Intersect rays with objects in front-to-back order

When a ray collides with a resident object its done

If a ray hits a non-resident object, request that object to page in,
	and push that ray to the deferred queue

After you process all the rays you can, see if any of the objects we requested became resident
	if so, collide them
	if not, wait a little if we have enough frame time

If we're running out of frame time and the objects still aren't resident,
	then collide the lower LOD version that's in memory, or some other fallback

this gives you perfect object residency which is just-in-time (if the IO returns fast enough and you don't stress it too hard). Because the rays tell you what to load you get perfect occlusion residency information. There's other paging stuff to make it work better, such as prefetching objects that are very likely to be hit at the start of the frame and also prefetching in the direction of camera translation to anticipate.

I could do the Id oct-tree thing, or just do a more normal objects that are made of kd-trees and triangles kind of thing. Writing a little packet kdtree raytracer is actually pretty easy, but I still might not do this because it seems like getting off track a bit too much. And I still have the problem of getting enough geometry to actually stress the pager.

ADDENDUM : hmm , yeah I should probably just find some reasonably simple open source game to plug into and replace its IO. Anybody have a good idea of a game I can adapt whose code base won't break my brain?


Links about Inigo Quilez and real time Ambient Occlusion. Some of the demo videos are quite beautiful, I like the art style.

Realtime Ambient Occlusion, part 2 - GameDev.Net Discussion Forums
Real-Time Rendering 2006 - DEMO
meshula.net » Screen Space Ambient Occlusion
Inyigo Quilez - home page
Inyigo Quilez - fractals, computer graphics, mathematics, demoscene and more
Inigo Quilez's videos on Vimeo
3d geometry + pixel effects cube field on Vimeo
WaveSlice on Vimeo


I've been reading through the papers from Interactive Raytracing 2008 . The talk slides posted there are all pretty worthless, but you can copy-paste the names of the talks into Google and find the actual papers. The one on BSP's is pretty good, there's a lot of solid stuff. ADDENDUM : Here are all the papers

I like the paper by Andrew Kensler on tree rotations to optimize trees with simulated annealing. (there's a lot of good stuff at his site if you click his name). It reminded me of this old talk by Sedgewick on Left Leaning Red Black Trees ; I can't remember if I linked that up when I first read it, it's very elegant and a nice clean way to understand self-balancing binary tree algorithms. Not really super related to the Kensler paper, just fired that neuron for me.

The simulated annealing thing is a fun approach. It can be used on any kind of problem that you can quickly find an approximate greedy solution to, but where the true global optimum search is very hard, but you can locally measure how a change affects the global cost. It should now be obvious to the reader that the LZ optimal parse we just talked about is one of these problems. If you think of it as a shortest path search, you can see that you can seed the path with a greedy normal parse to get an initial decent solution, then you could consider making local changes and see if they improve it, like replacing any match with a shorter match and then filling in the blank with the best match there. If you just make all the local improvements that improve data rate, you will get closer to the true optimum but you won't reach it because you'll be stuck in a local minimum, so simulated annealing to the rescue to let you agitate out of the trough and temporarily screw up your LZ code stream before you settle back down.


I've seen a lot of bad movies recently (such as "That Obscure Object of Desire" and "A Scene at the Sea" , wow, huge wastes of time). Fortunately that streak ended last night with "Ratcatcher". It's a simple contemplation of growing up in squalor. It's full of quiet moments of painful beauty.

It follows in the line of "Kes" by Ken Loach or the early TV stuff by Mike Leigh; there are quite a few movies about poor kids in the north of the UK; another good one is "Small Faces". But the imagery of "Ratcatcher" is very poetic.


I don't really understand why people like the MacBook. They're way overpriced (like the ThinkPad), about 50% over the price of a similar Dell and double the price of a similar no-name. They're fucking widescreen which I hate. Hello !? WTF, text pages are fucking vertical. 4:3 is about the widest usable aspect, and in fact I actually like 3:4 better; when I had a monitor that would rotate vertical I would do it a lot, and it's super nice for web pages or reading PDF's or something. Widescreen on a laptop is useless and retarded, it means I get like 50 lines of text vertical. I'd like a 16" laptop that does 1600x1200. Then they do the fucking retarded awful thing of having a wide form factor, but not using all that width for the keyboard ! WTF ! The tiny laptop keyboard is one of the worst usability problems and you're making the thing fucking widescreen, make the keyboard wide! Oh no, we need one and a half inches of worthless aluminum on each side. No comprendo.


You need youth to innovate. I used to think about all the many things that computers could easily do that they didn't; I would rant and rave about how we could so easily be doing this and that if only people would get their heads out of their asses. Now I just wish I could do the few simple things I like to do without it being a frustrating ordeal.


Urg; Nifty Perforce looks pretty cool, but I can't use it because its VS 2005 and I'm stuck on 2003 because the new version of Visual Assist is like completely different than the one I'm used to. I really fucking hate relying on anybody else's programs because they always fucking suck. The only decent thing in the world are the things I make myself. Or something less extreme and crotchety.

Anyway, if you're on VC 2005 then Nifty Perforce looks like it fixes a lot of the P4SCC awfulness, basically by just being much less obtrusive. It has the only integration feature that I actually want, which is the automatic checkout on edit of readonly files.


I made the Dave Rosengarten Molten Chocolate Cake the other night. It was quite excellent, the best molten choco cake recipe I've found yet. It's still not quite perfect though; the outside parts that are more cooked are too cakey and tasteless. I want the outside bits to be fudgy and caramelized, very crispy on top. The center was just about perfect, and I think part of that was because his recipe involves actually putting melted chocolate in the middle rather than just relying on the batter being less cooked in the middle the way most molten chocos do.


OMG the VC perforce integration is such a fucking disaster. Any time I open an old project from Oddworld it's like 15 minutes of clicking dialog boxes that randomly pop up then waiting while it stalls out trying to find the fucking perforce server for those files. ARG.


"bool" in C++ fucking sucks. For one thing, its size is undefined, so you can't ever store it, and yet tons of people do, which makes huge bugs if you ever switch compilers and the size of bool changes. (that "size of bool is undefined" was a stroke of genius by the standards committee, assuming their goal was to cause maximum heart damage). Another problem is that it silently converts into "int" which is pointless and evil so far as I can tell. The last big problem is that if you're trying to make code that works in C++ and C there appears to be no good way to set up a "bool-alike" that works in C.

I want a fricking thing that is either on or off, and can only be converted between ints with an actual boolean operation, like

int x = b ? 1:0;


bool b = (x >= 3);

If I was only in C++ I could fix the size thing. The best way to do it is to make your own class bool, like :

class Bool
	Bool( bool b ) : m_val(b) { }
	operator bool () { return m_val ? true : false; }
	S32 m_val;

which makes you act like bool but gaurantees a fixed size. But you can't integrate that with C.

To integrate with C I see only two options, either typedef Bool to an int, or typedef Bool to an enum. The enum is different but not really much better; enums still silently convert to int, but at least it does prevent int conversion to bool, but then it also doesn't convert actual booleans to the enum.

Example of bool horribleness :

enum EBool

void testb(bool x,int y);
void testi(int x,int y);
void teste(EBool x,int y);

static void test()
	testb(true,3); // good
	testi(true,3); // silently bad
	teste(true,3); // error

	testb(4,7); // warning
	testb(1,7); // silently bad
	testi(4,7); // good
	teste(4,7); // error

	testb(eTrue,3); // silently bad
	testi(eTrue,3); // silently bad
	teste(eTrue,3); // good

It's easy enough to say "don't use bool" but I don't see a way around it. Obviously I could not ever store bools in structs (though that's hard to enforce and just leads to evil things like BOOL). The worse situation is function args like in our example above. I want to be able to make a function that takes a bool arg, and actually *requires* the client to pass in either "true" or "false", not some random int (or 0 or 1). In particular, when I change a function signature, I want compile errors when people call with the old signature.

In C++ I can do this :

class Bool
	explicit Bool( bool b ) : m_val(b) { }
	static const Bool T;
	static const Bool F;
	bool operator == ( const Bool & b ) const { return m_val == b.m_val; }
	bool operator != ( const Bool & b ) const { return m_val != b.m_val; }
	bool operator ! () const { return ! m_val; }
	S32 m_val;

const Bool Bool::T( true  );
const Bool Bool::F( false );

#define True	Bool::T
#define False	Bool::F

Which actually works pretty well and accomplishes most of what I want, but it's a little clumsy and doesn't work with C at all.


Scalloped Potatoes (au Gratin) so I can do it again :

Oven to 400 (385 on my over-hot oven)

Slice :
ham in bite size pieces, about 1/2 cup
4 cloves of garlic into small dice
1/2 yellow onion (no dice)
4-6 yukon gold potatos in thin rounds (about 0.5 cm)

Boil 1 cup of cream in microwave, 2-3 minutes

Begin layering : potatos, ham, onion, garlic,
salt & pepper on each layer, a little swiss cheese
Build up all the layers
pour cream on top, should come about half way up the potatoes
top with more swiss (not to cover) + parmesan & grated nutmeg

Bake 40-45 minutes

Top should be brown all over and potatoes should be soft
If potatoes are soft & top is not brown, broil a bit
If top is brown and potatoes are not soft, cover in foil & cook more

Nom! It should be really sweet from the caramel of the onion and garlic, it should be nutty and smokey from the ham and cheese. Seattle is going to make me fat.


What actually happens when you have a big static array in your executable? Like

static char c_bigBuff[65536] = { 0 };
I believe that executables have a special "zeroed statics" section, and when the exe loader does its thing, it grabs pages pages for them and just points them at the zeroed pages. I remember on the XBox I was worried about our static initializers taking up space in the executable, and it turns out *zeroed* static values do not take up any space, but non-zero ones do obviously.

I do know that Windows has a page zeroer thread which is always running. It's at the lowest possible priority and all other threads are higher. It runs in the idle and just zeros free pages. I don't know if these are actually used to speed up page zeroing, like when I ask for a zeroed page do it give me a pre-zeroed one if possible and not burn any time doing the memset?

Anyhoo, somebody school me.

Okay, yeah, this stuff goes in the "BSS" in most executable formats. BSS is sometimes called "uninitialized" data though of course it's really zero-initiliazed, but this comes from an evil fucking old C-ism :

Uninitialized statics are gauranteed to be zeroed in C !!

void func()
	int x;
	static int y;

	... code ...

Here "x" is undefined (maybe a bug), but "y" is defined by the standard to be zero. Yay C. That's evil.

Apparently under Linux, when you load your exe the BSS points at a virtual COW "zero page". If you only ever read those zeros it never copies.

Whoah, Won just blew my mind. Obviously all pages that the OS gives to an app are always zeroed! This is for "security" just like how sectors of disk that you get access to by extending a file are zeroed. When the executable is loaded, all the pages that it consumes are zeroed before it does anything. The way the static data is loaded apparently is just that the data with non-zero initializers is put at the front of the section, so the exe just asks for a big block, and then values are only copied into the front, the rest stays zero. So if you have a big static array, it's getting zeroed whether you ask for it or not.

So the question then is how does VirtualAlloc zero memory, and I assume the answer is that it tries to use pages already zeroed by the zero-thread, but if there aren't any available it memsets.

Okay yeah, I've had that confirmed by hearsay anyway. In fact, free pages in windows go into a dirty pool, the background zeroer thread takes pages from dirty and puts them in a zero pool, new allocs try to take pages from the zero pool. There's this really sweet looking app : MemInfo from Alex Ionescu , but it only runs on Vista. Actually Vista has a lot of sweet new APIs that let you do some new low level things. Anyway it shows you info about how many pages are in each type of pool.


I've always wanted a text-to-blogger posting app for my blog; I thought about writing it myself a little while ago with the Blogger API, but they only support Managed C and such so I gave up. I was thinking maybe I should hire an Indian to do it at one of those code-for-cash web sites.


I made this cool looking test image a while ago. It's a delta of the original image from the lapped compressed & decompressed image, eg. it shows where the errors are in the decompressed image. I find these kind of scientific images really beautiful. You can actually see the shapes of the lap filter very clearly in some parts.


RRRAAAAAARG fuck I hate the way TiVo just starts playing the TV randomly when you leave it on the main menu for a while. WTF WTF WTF so retarded. I might have to throw the thing away and get my media PC fixed up for TV recording just because of this fucking incompetent fucking awful usability decision. I'm sitting here typing away some nice gentle rants about puppies and rainbows and the fucking TV suddenly explodes into a cacophony of redneck screaming as the TiVo decides to start playing live TV and it happens to be fucking WWE wrestling.


There's this super fucking evil thing about unicode that I wrote about here before when I wrote all the junk about Unicode. There are multiple possible encodings of the exact same string because you can encode many accented characters either as a single combined symbol or as a decomposed symbol (the accent is a separate symbol). Windows treats the two different encodings as two separate valid file names, so you can have both encodings in the same directory (I'm not sure how any other OS handles this case). Windows will also fail to open the file if you try to open with an equivalent string that's encoded differently.

What that means is if you ever take your string through a different code type, you might lose your ability to address your file. For example, if you convert the file name from UTF-16 to UTF-8 (and you don't preserve exactly whether the accents were decomposed or not), you might lose the ability to address the file. If you take the file through the windows "A" code page, even if it can represent the name perfectly, you might lose track of the file.

This could be fixed if the OS always used a canonical encoding.


I'm going to write a little tool soon so that I can view the performance of my threads as a timeline. It will show what all the threads are doing at each second, and operations of each type will get a certain color; sleeps will be gray, any time the main thread is stalled out waiting on another thread will be red. I'll post things like when did an IO prefetech get fired, when did the IO request for that actually come in, then you can see when the IO Threads actually start doing that op, when they finish it, and then when the main thread gets it back; if the main thread blocks on that IO you'll see that, and there will be lines hooking up the same operation's journey across the various threads.

It's almost exactly Pix. I wish I could just put my data in the pix viewer and not have to write a GUI app (I hate writing GUI apps). It's frustrating that what I want is right there but I can't use it cuz it's not open source.

Really a nice thread timeline viewer that shows your time slices and mutexes and all that is something that should be in VC or whatever. It's a super useful standard app.

Oh, apparently NVidia has this AgPerfMon thing which also is pretty much what I want but also doesn't help me. Also : dear god, that looks way too complicated.

I've been thinking about doing another app to dump everything about your machine to a text file. Kind of like the stuff Valve does for graphics, but I also want everything about all your device drivers, what kind of disk hardware you have, stuff about your PCI bus, your network setup, all your BIOS settings, stuff like your IDE master/slave configs, is stuff in PIO mode, etc. That should totally be a standard app.


At Oddworld we set up a thing where any crashes on the artists or designers machines would grab a stack trace and email it to all the programmers. It was super awesome and handy and part of why we ran so clean all the time. For all of you who don't have that kind of system set up, you should, and I just found a really easy way for you to get on it : BlackBox at CodeProject is a DLL that hooks in minidumping; all you have to do is LoadLibrary on it in your app to pull in the DLL and it makes you behave ten times better.

Another thing I really liked was just a stupid option I had in the game that would optionally load an arbitrary text file and echo that file to the output window. That way I could tell it to echo any error logs that I got from artists, and all the errors were logged like " file (line) : string " so that I could just go through the VC output window and double click them and go straight to code.

These kind of things don't actually save you a huge amount of time, but they make debugging and responding to errors much much more pleasant, which makes it far more likely that programmers will actually respond when the artists are running around like chicken little with their hair on fire saying "the build is broken! the build is broken!".


Well, I couldn't resist, I looked at my finances. Holy crap, I'm broke. About two years ago I decided the US was headed for the crapper so I got almost all my money out of the US. Sounds like a good idea, right? My European funds are down about 40% this year. My emerging market funds are down about 50%. That's worse than the S&P which is down about 30%. WTF. I especially thought Europe was a pretty safe bet because even just the falling dollar is like free growth in stocks traded in Euros.

The other thing that kind of surprised me is that my TIPS are down. The only money I have in the US is in TIPS (Treasury Inflation Protected Securities), which are supposed to be even less volatile than normal treasuries. WTF TIPS are down 4% this year. The fucking super safe bond is not supposed to go down, yo!

I definitely should've cashed out all my savings and gone to live on a beach in Thailand back when I was considering that a few years ago.


On LZ Optimal Parsing

So the LZ-Huffman I've done for RAD/Oodle does some new stuff with optimal parsing. I want to write up what I'm doing so I remember it, and I'm also going to try to write about the semi-optimal parsing that a lot of people are doing now.

First let's review a bit. In LZ coding you have a lot of choices about how you code things. Greedily choosing the longest match at each point dOpoes not produce the best code stream. There is a very good ancient paper by Storer and Szymanski that introducing optimal parsing, and their method is both fast and 100% optimal, but it is only "optimal" in the sense of coding the fewest number of literals possible. See my old notes on Storer-Szymaski optimal parsing . Now, for the old LZSS coder that used fixed bit coding (8 bit literals, 24 bit matches, or whatever), that optimality is also the same as code length optimality. But in reality what we care about is code length optimality - we want to produce the shortest possible file - which is not necessarilly the same as coding the fewest possible literals.

There are sort of two interesting domains of modern LZ coders, and optimal parsing matters a whole lot to both of them. One domain is the super fast, super simple LZ coder like my LZH. Optimal parsing is important to LZH because the matches are so important to compression, you really want to get the best matches you can. The other domain is something like LZMA or BALZ (an ROLZ) that does a lot of context coding of literals. In those cases, the context coder for literals is so good that you really want to only code good matches, and often the literals will be so cheap you want to send them as literals; also the cost of literals and matches is highly variable for those coders.

Optimal parsing is a path search in a graph. For game developers you can think about A* as I talk about this. You start yourself at the head of the file, and your goal is to walk to the end of the file in the shortest possible path. From each node in the graph, your exit paths are the possible coding operations at that point in the file, eg. write a literal, write match X of length Y. Each coding operation has a certain cost which is equal to the number of bits output if you do that coding operation, and then you step along that link to the next point in the file as specified by the match length.

Note that with the original LZSS we only needed to consider either matching or not matching at each position in the file, since any time you wrote a match the best thing to do was to write the maximum match length. With true code-optimal parsing, you have to consider all possible match offsets and all possible match lengths at each location (!!). This is a huge number of coding choices, but we can rule out almost all of them. (more on this later).

So, the brute force retarded optimal parser should be obvious at this point. Start with an empty code stream and push position 0 on the stack. Repeat forever : pop the stack. Try all coding operations at the position specified in the stack. Push all of those coding operations on the stack with the code stream they made, and the position of the end of the match. This works fine and produces a 100% optimal parse, but is insanely slow. If you have N coding choices at each position it makes N^L operations for a length L file.

With adaptive coding (eg. you're using an adaptive entropy coder for literals matches and offsets), you cannot get a 100% optimal parse unless you do this full search, because it does not have the optimal subgraph property. That is if you have the optimal path from A to C, and that path goes through B, that optimal path is not the same as taking he optimal path from A to B and the optimal path from B to C. To speed things up we are going to start making approximations, and we are going to generally assume a looser version of the optimal subgraph property. What this means in terms of LZ coding is that if you are at a certain position in the file, the best way to code the whole file does not necessarily include the best way to code from {begin} to the current position.

Let's revisit the original LZSS parse now. LZSS does no entropy coding, so it doesn't matter how you get to a given byte of the stream, you will code the remainder in the same way. This is obviously a "Dynamic Programming" problem. You have shared sub-portions of the problem that can be collapsed by storing their result in a table. The optimal path from point P to the end is the same regardless of how you get to P. You could now proceed by doing the brute force approach and search depth-first, and each time you find a path, you store all the sub-paths that you visited, and then you reuse them instead of recomputing them. Or you can do it a more elegant way which is identical, which is to walk backwards through the file and at each position P store the optimal path from the P to the end. When you find the optimal path at P obviously all the future subgraphs that you might need are already done because you're going backwards. So, we now think of the LZSS optimal parse as a way of solving the graph search using dynamic programming.

The case of a static entropy coder is simpler so let me focus on that first. Offsets and lengths and literals are written in some # of bits that is known in advance and we wish to make the shortest possible code stream. We can do a sort of LZSS backwards parse to do this quickly. It's not as simple as LZSS because we cannot assume that the best match choice at each step is the longest one.

Optimal Parser for Static Statistics :

We're going to walk backwards like LZSS. At each position already considered (future positions in the file), we have already found the optimal choice at that spot. We have various coding choices at the current position, either literal or various {offset,length} possibilities. The total cost of each of those coices is : {Cost to code the current choice} + {Table cost at current pos + match length} , that is the already computed cost to finish the walk after you step match length.

It should be obvious that we can solve this just by considering every coding operation possible at each position and walk backwards. That gives us a total cost of O(N*L) , which is a lot better than O(N^L) , but it's still slow because N is very large. So let's start making approximations.

What are all these coding choices that we have? We found various offsets that we matched at least min match length (usually 3). This gives us a bunch of possible offsets and the maximum match length at each offset : {Off1,Len1} , {Off2,Len2}. In general we also have to consider all lengths >= 3 at each offset.

The first approximation is the assumption that for any given match length, the lowest code cost comes from the smallest possible offset that matches with that length. That is, we assume that if Off1 < Off2 , then cost(Off1) <= cost(Off2). This is certainly true for all non-pathological files in the real world. This kind of assumption is also self-satisfying, that is by making this assumption, the optimal parser will prefer shorter offsets, which makes them more likely, which makes their code length shorter. It's a positive feedback loop; I'll talk more about feedback later.

With this approximation, we no longer need to consider all offsets and all match lengths. Now we only need to consider all match lengths, and the offset for any length is chosen for us - it's the lowest offset that codes that length. Our code choices now look like :

you find a match of length 7 at Off1, length 11 at Off2, length 22 at Off3
with Off1 < Off2 < Off3

choose from :

Length 3-7 at Off1
Length 8-11 at Off2
Lebgth 12-22 at Off3

Okay, so we could do this, but it's still too slow. We need to reduce our choices further.

One option is to only consider the longest possible length at each offset. That is, assume that if you code a length at a given offset, the best choice is the longest length at that offset. So in our example you would only choose from {7 at Off1, 11 at Off2, 22 at Off3}. That is sort of okay, but it actually gives up a lot of compression. It misses too many good choices. Apparently there are some good coding paths that involve taking a shorter than maximal match.

To get to the really cool optimization, we need to stop and think about this intuitively. Why would you not write a maximal match? It should almost always give us a shorter total code len, because writing a longer match is very cheap and lets us not code symbols after the match. The cost of a given match is

Cost(offset) + Cost(match length) + CostTable( at pos + match length )

Let's assume for now that the cost of the offset and the match length are seperable (they aren't used to predict each other). The cost of each match length is an entropy code that generally codes shorter lengths in fewer bits. It's sub-linear in match length, but it is generally increasing. However, it increases less than the average cost of future symbols. Note that for long match lengths, the cost of len and (len+1) may be exactly equal depending on how you code your lengths. For example of you use a semi-log2 code, then the cost of many large match lengths is identical. That means coding longer matches is free in terms of the current code cost.

The CostTable generally increases backwards linearly proportional to the average entropy. That is :

CostTable[ {end} ] = 0
CostTable[ end - 1 ] = cost to code form (end-1) to end
CostTable[ end - N ] = N * H

on average ; where H is the average entropy of one symbol

However, even though CostTable is linearly increasing backwards on average, it is not always. Consider for example what CostTable looks like in the neighborhood of a very good very long match. Say at some position P this long match is available with length L. At position (P+1) the best coding choice will be that same match with length (L-1), same at (P+2) etc. The cost table in this area will look like :

CostTable[ P-1 ] = Cost to code literal at (P-1) + Cost(L) + CostTable[ P + L ];
CostTable[ P   ] = Cost(L  ) + CostTable[ P + L ];
CostTable[ P+1 ] = Cost(L-1) + CostTable[ P + L ];
CostTable[ P+2 ] = Cost(L-2) + CostTable[ P + L ];
CostTable[ P+3 ] = Cost(L-3) + CostTable[ P + L ];

If L is large, like say 100 or more, the Cost(L) and the Cost(L-1) will be very very similar. In this region, CostTable[] is increasing as you step backwards, but it's only increasing very slowly, far more slowly than the average slope of H. We see that CostTable goes backwards like N*H in a line, but it deviates up and down across the line. Some areas you code worse than average entropy, some areas you code much better than average entropy. The areas where you code better than average entropy are candidates for choosing a shorter than maximal match length!

What are we really doing as we step back through the file and choose the best match length? We have this constant table Cost(L) that tells us the cost of coding each match length. This table is increasing, it's smallest at Cost(3) and largest at Cost(infinity). We find the maximum match length at a given offset and the minimum at a given offset (the minimum is the smallest length that can't be coded from a lower offset). We take the Cost(L) table in that range and add it onto CostTable() in that range, then pick the lowest value after the sum. Generally CostTable increases backwards faster than Cost(L) increases forwards, but not always! As we saw in the example above with a long match, CostTable can increase backwards slower than Cost(L) increases forwards.

So, how do we find this spots quickly? Well, when we are looking at coding at spot P, we have already visited all the CostTable spots > P. What we'd like to know is, Is CostTable ahead of us increasing sublinearly? And if so, where is it MOST sublinear? That is, Where is CostTable[P] as far below (end - P)*H ? That will tell us the most likely spot to code a less than maximal match.

We could build a binary search tree for these spots as we go backwards, which would be logN and all, but I used a simpler solution. At each spot P, I store the best sublinear spot preceding that spot (preceding in file order, actually occuring later in the backwards walk). That is :

I chose a coding at pos P and updated CostTable[P]

BestSpotPreceding[P] = P;
int step = 1;
while( CostTable[P] + step * TinyCost < CostTable[P + step] )
	BestSpotPreceding[P + step] = P;

Normally when CostTable is increasing a lot as you step back, this does not get triggered at all. That is, CostTable[P+1] is usually a H smaller than CostTable[P]. But in weird cases, the cost of coding this symbol was almost free, so BestSpotPreceding points back to us. BestSpotPreceding gets updated in waves that terminate at cheap code spots. Normally you keep stepping back, then you hit a really cheap code spot and a wave percolates forward, filling out BestSpotPreceding, then you step some more normally, then you hit a cheap code spot and it waves through again. Sort of carries in an arithmetic coder. Anyway, this is technically still O(N*L), but it's a very cheap operation and it's way way less than N*L in practice.

So, how do we use this? Instead of considering only the maximum length at each offset ( {7 at Off1, 11 at Off2, 22 at Off3} in our previous example ), we also consider matches at BestSpotPreceding[] in that offset. That is :

Look in 3-7 at Off1
Consider a 7 at Off1
Also consider BestSpotPreceding[Pos + 7] at Off1 (if it's >= 3 )

Look in 8-11 at Off2
Consider an 11 at Off2
Also consider BestSpotPreceding[Pos + 11] at Off2 (if it's >= 8)

Look in 12-22 at Off3
Consider a 22 at Off3
Also consider BestSpotPreceding[Pos + 22] at Off3 (if it's >= 12)

And those are all the matches we have to consider. I did this and found it hurt less than 0.001 bpb compared to the full optimal parse that considers all possible coding operations at each position.

ADDENDUM #1 : I should mention there's another cool way to get this optimization if you don't want to do the BestSpotPreceding thing. We again want to ramp up our intuition for the problem. The cases where we need to make a non-trivial decision about match lengths are cases when we are writing two matches in a row, both greater than minimum length, and they overlap. That means there's a big range of ways we can code the same block. For example :


At pos P we find a match of length 7
At pos P+4 we find a match of length 10

The area from pos P to [P+14] can be coded with two matches, the ways are :


The area of overlap of the matches all codes to the same offset, it's the same offset that was found at P+4, but if you match past that at P, then you just match less of it. The amount of overlap is the number of different coding choices we have.

We're coding the same two offsets and skipping the same number of symbols, so the only difference in code cost is the cost to code the match lengths. The code costs of the choices are :

Cost(len 4) + Cost(len 10)
Cost(len 5) + Cost(len 9)
Cost(len 6) + Cost(len 8)
Cost(len 7) + Cost(len 7)

But we don't even need to compute these. The cost of coding a match len is almost always increasing in len, and increasing more slowly for larger lens, that is :

(len1 > len2) implies codecost( len1 ) >= codecost( len2 )

(len1 > len2) implies ( codecost( len1 +1 ) - codecost(len1) ) <= ( codecost( len2 + 1 ) - codecost( len2 ) )

That is, code costs go up for larger lens, but they go up more slowly. It's like a sqrt(x) function - sharp at the beginning and then levels out. That means the lowest total code cost is the one that's most skewed. You want to extend the longer match as much as possible because that's very cheap, and you want to shorten the shorter match as much as possible because that saves you a lot. Intuitively you can think the difference between coding a len of 3 and 4 is very big, but the difference between 511 and 512 is tiny tiny.

In our example that means the best choice is :

Cost(len 4) + Cost(len 10)

The simple heuristic to do this in general is :

Consider coding option at pos P
Find max match len possible at each offset
Consider coding with max match len

len = max match len
while ( matchoffset[ P + len - 1] == matchoffset[ P + len ] )
if ( len != max match len )
	consider coding at len

Note the similarity with the BestSpotPreceding heuristic. We try coding with our longest match len. Then we also try coding with the *shortest* match len that still reaches the same match after we jump forward. We're trying the two most skew possibilities, our match as long as possible and our match as short as possible, constrained to be getting the same match following. (matchoffset is already computed because we're going backwards).

ADDENDUM #2 : More early outs. LowestCostPreceding. @@ todo : write this.

There are some more minor issues we need to consider.

The first one is that if we are doing a Huffman (or static arithmetic) coder to entropy code the length, offset & literal, then we can't actually know the code length in advance. And the way we choose to do the coding will affect the statistics which will affect the code lengths, which affects how we choose to do the coding, ad infinitum. It's a feedback loop.

As usual in these kind of optimization searches, we make the semi-static assumption. We make a good guess for the code lengths, find the optimal parse for those code lengths, use that parse to guess new code lengths, optimal parse again, etc. Hopefully this smoothly converges. Well, it doesn't. The problem is that the coding alphabet is actually pretty sparse. On medium size files there are not that many literals and there are plenty of offsets and match lengths you might not code at all in one particular encoding choice. For example, in one pass you might not code any matches of length {31} at all. Now when you do the next pass you need something reasonable in there. If you use the actual code lengths, you can strangely skew how you do the coding, and it can actually ping pong a lot in the search and not settle down.

To improve this, I suggest smoothing the counts in the early passes. That is, the chance of a length 31 match should be somewhere between the chance of a length 30 and length 32 match, regardless of the coincidence that we happened not to code with it in the last pass. To do this, I fit a laplacian probability model to lengths and offsets, and in the early passes I blend that model in to smooth out the regularize the distribution. As I do more passes I decrease the weight of that blended simple model and use the raw counts more.

In practice, you can avoid doing multiple passes of the actual optimal parser without hurting compression too much. You can do a first pass with just a normal non optimal parser and build statistics from that. Smooth those statistics with a regularizing simple model. Then do the optimal parse with those statistics, and you're done. Doing more passes will improve the coding decisions, though.

The other issue is that it's still a bit slow. Most of the slowness comes from the areas where very long matches are possible. If there's a long match of length L at pos P, we also consider (L-1) at (P+1), and (L-2) at (P+2), etc., and for each of those we generally find many closer offsets to also consider, roughly O(L) of them, which makes us O(L^2) over that region. On super degerate files like "pic" in the Calgary Corpus, this makes us very slow. However, it's easy to fix. In this area with the long match, it really doesn't matter much what we do, because all of them are good choices. This is a property that in areas of super high compression, the importance to the whole file is much less. You want to maximize your wins on the hard to compress spots, because in terms of output bytes they are more important. So, in these big long match areas, we just cut out the whole area inside the long match, and we always code the long match at the beginning. (actually I allow coding at P and P+1 up to P+K for a small K ; K is like 4 and L is like 100).

Optimal Parser for Adaptive Coders :

Now, with real modern coders you aren't doing static statistics, they're adapting as you go. You could just make a guess at the overall statistics and use the semi-static optimal parser that I just described, and then code the stream with your adaptive coders. That's better than not optimal parsing at all, but it's giving up a lot. Adaptive coders get a big win by changing their statistics as they scan through the file, and an optimal parser can make much better decisions if it uses the actual local statistics. For example, in one part of the file, literals might code very well, and the optimal parser should favor literals more, in another part of the file there might be lots of very short match lengths, and the optimal parser should code for that, etc.

One way to solve this is to use a different kind of semi-static approximation and optimal parse forward in blocks. For each block, you pretend your coder are static and use some fixed code lengths. Now, run the semi-static optimal parser that I described above to find the optimal coding for the next 1000 bytes or so. Now step up through that coding and output codes with your adaptive entropy coder. Only output 900 bytes or so, don't go all the way to the end of the block so that you avoid edge effects. Now hold use the current statistics to find the optimal parse for the next 1000 bytes. This in fact works very well, and I believe was first used by RKive many years ago. He called it "dynamic programming" which is sort of right in the sense that all backward optimal parses are a type of dynamic programming.

There's another form of adaptive optimal parse that a lot of people do now using forward walking. It's called "flexible parsing" by the encode.ru guy, and I believe a form of this is used in LZMA.

The simplest form of "flexible parse" is just the old "lazy evaluation" of Zip. Lazy matching walks through the file and considers either writing {match} or {literal + match at next pos}. It's a slightly non-greedy parser. Flexible parsing is basically the same thing, but we consider a few different choices.

To do this, we run through the whole file first and find the best match at each position. (you don't actually have to do this in practice, you could just look ahead some amount N from the current position to avoid multiple passes). In any cases, the match search at each pos is only done once, and we have them ready far enough in the future to do the flexible parse.

So, we consider a few coding choices at the current location. We choose the one that gives the lowest code cost to the end of the file. To make good decisions, it needs to also consider the coding after you take the current operation. That is, it's like a recursive function :

CodeCostRecurive( P )
	consider n in N choices
		cost of choice n = Cost(code n) + CodeCostRecurive( P + len(n) )
	return best of N

now of course if we actually did this full recursion we would be doing the whole O(N^L) tree like we didn't want to do. But we don't need to do that because we're only using this to make the decision at the current byte. That is, we can use the "short horizon" assumption - coding choices far in the future don't matter that much to our current coding choice. It's like only considering a few moves ahead in a Chess program (which is a bad thing to do there, but a good thing here). In fact, you only need to look 2 moves ahead here to make good coding choices.

Also, you don't need to consider very many N choices. N = 4 or 8 is plenty. Certainly consider coding a literal and coding the longest possible match, and then heuristically pick some other likely good choices in between (such as the longest matches available with some shorter offsets, or the longest matches possible from previous-match-references if you have those).

How do we terminate the recursion? After 2 steps (considering one op and then one more op after that) we just need an estimate of the cost of coding the rest of the file. That's easy, it's just :

CodeCostEstimate( P ) = ( {end} - P ) * H

That is, just assume the remainder is coded at the entropy. The entropy you use here should be a conservative overestimate of the entropy; for example just use the entropy of what you've done so far and add a fudge factor. This ensures that you actually prefer writing matches and don't do something strange if you start the file in a very low entropy region.

This flexible parse sounds kind of expensive, but in fact with N = 4, it's only 16 table lookups to compute all the code lengths, then we pick the best one. This of course is not a true optimal parse, but it's pretty damn good, and the fact that it's using the true current adaptive statistics makes up for the fact that it's not doing the best possible parse.

ADDENDUM : I should note that this kind of forward parse is also better at making good decisions with PMR's (Previous Match References). PMR's are cheap to code offsets, so they should always be considered in the optimal parse when they are possible matches. Going backwards you can't make good decisions about trying to use PMR's because you haven't done the earlier parts of the file yet.


On the Art of Good Arithmetic Coder Use

I think it's somewhat misleading to have these arithmetic coder libraries. People think they can take them and just "do arithmetic coding". The reality is that the coder is only a small part of the arithmetic coding process, and it's the easiest part. The other parts are subtle and somewhat of an art.

The other crucial parts are how you model your symbol, how you present symbols to the coder, how you break up the alphabet, and different type of adaptation.

If you have a large alphabet, like 256 symbols or more, you don't want to just code with that alphabet. It has few problems. For one, decoding is slow, because you need to do two divides, and then you have to binary search to find your symbol from a cumulative probability. Aside from the speed issue, you're not compressing super well. The problem is you have to assign probabilities to a whole mess of symbols. To get good information on all those symbols, you need to see a lot of characters to code. It's sort of like the DSP sampling problem - to get information on a big range of frequencies in audio you need a very large sampling window, which makes your temporal coherence really poor. In compression, many statistics change very rapidly, so you want quick adaptation, but if you try to do that on a large alphabet you won't be gathering good information on all the symbols. Peter Fenwick has some good old papers on multi-speed adaptation by decomposition for blocksorted values.

In compression most of our alphabets are highly skewed, or at least you can decompose them into a highly skewed portion and a flat portion. Highly skewed alphabets can be coded very fast using the right approaches. First of all you generally want to sort them so that the most probable symbol is 0, the next most probable is 1, etc. (you might not actually run a sort, you may just do this conceptually so that you can address them in order from MPS to LPS). For example, blocksort output or wavelet coefficients already are sorted in this order. Now you can do cumulative probability searches and updates much faster by always starting at 0 at summing towards 0, so that you rarely walk very far into the tree.

You can also decompose your alphabet into symbols that more accurately represent its "nature" which will give you better adaptation. The natural alphabet is the one that mimics the actual source that generated your code stream. Of course that's impossible to know, but you can make good guesses. Consider an example :

The source generates symbols thusly :

It chooses Source1 or Source2 with probability P
Source1 codes {a,b,c,d} with fixed probabilities P1(a),P1(b),...
Source2 codes {e,f,g,h} with fixed probabilities P2(e),P2(f),...

The probability P of sources 1 and 2 changes over time with a semi-random walk
After each coding event, it either does P += 0.1 * (1 - P) or P -= 0.1 * P

If we tried to just code this with an alphabet {a,b,c,d,e,f,g,h} and track the adaptation
we would have a hell of a time and not do very well.

Instead if we decompose the alphabet and code a binary symbol {abcd,efgh} and then code
each half, we can easily do very well.  The coding of the sub-alphabets {a,b,c,d} and
{e,f,g,h} can adapt very slowly and gather a lot of statistics to learn the probabilities
P1 and P2 very well.  The coding of the binary symbol can adapt quickly and learn the
current state of the random decision probability P.

This may seem rather contrived, but in fact lots of real world sources look exactly like this. For example if you're trying to compress HTML, it switches between basically looking like English text and looking like markup code. Each of those is a seperate set of symbols and probabilities. The probabilites of the characters within each set are roughly constantly (not really, but they're relatively constant compared to the abruptness of the switch), but where a switch is made is random and hard to predict so the probability of being in one section or another needs to be learned very quickly and adapt very quickly.

We can see how different rates of adaptation can greatly improve compression.

Good decomposition also improves coding speed. The main way we get this is by judicious use of binary coding. Binary arithmetic coders are much faster - especially in the decoder. A binary arithmetic decoder can do the decode, modeling, and symbol find all in about 30 clocks and without any division. Compare that to a multisymbol decoder which is around 70 clocks just for the decode (two divides), and that doesn't even include the modeling and symbol finding, which is like 200+ clocks.

Now, on general alphabets you could decompose your multisymbol alphabet into a series of binary arithmetic codes. The best possible way to do this is with a Huffman tree! The Huffman tree tries to make each binary decision as close to 50/50 as possible. It gives you the minimum total code length in binary symbols if you just wrote the Huffman codes, which means it gives you the minimum number of coding operations if you use it to do binary arithmetic coding. That is, you're making a binary tree of coding choices for your alphabet but you're skewing your tree so that you get to the more probable symbols with fewer branches down the tree.

(BTW using the Huffman tree like this is good for other applications. Say for example you're trying to make the best possible binary search tree. Many people just use balanced trees, but that's only optimal if all the entries have equal probability, which is rarely the case. With non-equal probabilities, the best possible binary search tree is the Huffman tree! Many people also use self-balancing binary trees with some sort of cheesy heuristic like moving recent nodes near the head. In fact the best way to do self-balancing binary trees with non equal probabilities is just an adaptive huffman tree, which has logN updates just like all the balanced trees and has the added bonus of actually being the right thing to do; BTW to really get that right you need some information about the statistics of queries; eg. are they from a constant-probability source, or is it a local source with very fast adaptation?).

Anyhoo, in practice you don't really ever want to do this Huffman thing. You sorted your alphabet and you usually know a lot about it so you can choose a good way to code it. You're trying to decompose your alphabet into roughly equal probability coding operations, not because of compression, but because that gives you the minimum number of coding operations.

A very common case is a log2 alphabet. You have symbols from 0 to N. 0 is most probable. The probabilities are close to geometric like {P,P^2,P^3,...} A good way to code this is to write the log2 and then the remainder. The log2 symbol goes from 0 to log2(N) and contains most of the good information about your symbol. The nice thing is the log2 is a very small alphabet, like if N is 256 the log2 only goes up to 9. That means coding it is fast and you can adapt very quickly. The remainder for small log2's is also small and tracks quickly. The remainder at the end is a big alphabet, but that's super rare so we don't care about it.

Most people now code LZ77 offsets and lengths using some kind of semi-log2 code. It's also a decent way to code wavelet or FFT amplitudes. As an example, for LZ77 match lengths you might do a semi-log2 code with a binary arithmetic coder. The {3,4,5,6} is super important and has most of the probability. So first code a binary symbol that's {3456} vs. {7+}. Now if it's 3456 send two more binary codes. If it's {7+} do a log2 code.

Another common case is the case that the 0 and 1 are super super probable and everything else is sort of irrelevant. This is common for example in wavelets or DCT images at high compression levels where 90% of the values have been quantized down to 0 or 1. You can do custom things like code run lengths of 0's, or code binary decisions first for {01},{2+} , but actually a decent way to generally handle any highly skewed alphabet is a unary code. A unary code is the huffman code for a geometric distribution in the case of P = 50% , that is {1/2,1/4,1/8,1/16,...} ; we code our symbols with a series of binary arithmetic codings of the unary representation. Note that this does not imply that we are assuming anything about the actual probability distribution matching the unary distribution - the arithmetic coder will adapt and match whatever distribution - it's just that we are optimal in terms of the minimum number of coding operations only when the probability distribution is equal to the unary distribution.

In practice I use four arithmetic coder models :

1. A binary model, I usually use my rung/ladder but you can use the fixed-at-pow2 fractional modeller too.

2. A small alphabet model for 0-20 symbols with skewed distribution. This sorts symbols from MPS to LPS and does linear searches and probability accumulates. It's good for order-N adaptive context modeling, N > 0

3. A Fenwick Tree for large alphabets with adaptive statistics. The Fenwick Tree is a binary tree for cumulative probabilities with logN updates. This is what I use for adaptive order-0 modeling, but really I try to avoid it as much as possible, because as I've said here, large alphabet adaptive modeling just sucks.

4. A Deferred Summation semi-adaptive order-0 model. This is good for the semi-static parts of a decomposed alphabet, such as the remainder portion of a log2 decomposition.

Something I haven't mentioned that's also very valuable is direct modeling of probability distributions. eg. if you know your probabilites are Laplacian, you should just model the laplacian distribution directly, don't try to model each symbol's probability directly. The easiest way to do this usually is to track the average value, and then use a formula to turn the average into probabilities. In some cases this can also make for very good decoding, because you can make a formula to go from a decoded cumulative probabilty directly to a symbol.

ADDENDUM BTW : Ascii characters actually decompose really nicely. The top 3 bits is a "selector" and the bottom 5 bits is a "selection". The probabilities of the bottom 5 bits need a lot of accuracy and change slowly, the probabilities of the top 3 bits change very quickly based on what part of the file you're in. You can beat 8-bit order0 by doing a separated 3-bit then 5-bit. Of course this is how ascii was intentionally designed :

    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F  0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
1   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /  0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
2   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O  P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
3   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o  p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ DEL


The new 520 bridge isn't even scheduled to be open until 2016. Which means it will actually be like 2020 or something. It's fucking ridiculous. It's such a huge cost to the local economy to have all this gridlock, this is what you need government and taxes for. When you have fucked up problems that are hurting your whole city, as much money as possible should be spent to fix them as quickly as possible.


Well, I've learned why my non-buffered writes are so much faster than buffered. You can read the MS KB article : Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP

The first issue is that buffered writes of course still go through the page cache like I described previously. There's no way to specify that a cache page is being completely stomped, so they are validated automatically by the page manager when they are touched.

Even on an empty file that you just created they get touched. In order to do an async write, you had to do a SetFileSize to reserve room for yourself (extending file size is always synchronous, so you have to bump the file size up in advance). Windows won't let you just extend the file size without validating the data in the new file. In particular, it stomps zeros in there. What this means is the first time you write to a page it gets zeroed before your data goes in. The reason they do this is for security. If you could just extend your file size and they didn't stomp zeros, you could see sectors on disk that had been marked as deleted but not wiped, so you could look at someone else's data.

To avoid this hit, they nicely tell you that you can use this handy function SetFileValidData() which marks the data up to some length as valid so it won't get zeroed. Unfortunately, only administrators are allowed to use this function so you can't put it in an application at all.

Sigh. The result is that non-buffered writes are way faster than buffered writes, and also that overlapped writes are almost never actually asynchronous (when writing to portions of the file you haven't previously touched, they are synchronous, if you write back over something you touched it is async).

Of course, like most Windows security measures, this is all pretty retarded. Anybody who actually wants to install a malicious app that peeks at your deleted sectors can still easily do so by installing something at the device layer. The only actual consequence of this security measure is that normal apps that just want to write data are slower.


I've been trying not to pay any attention to this financial crisis. I hate worrying about money, it doesn't interest me and I don't want to spend any brain power on it, and I really don't need any stress of any kind. But it's becoming hard to ignore.

I think the situation is actually rather more dire than most commentators are making it out to be. The biggest problem is not a "credit crunch". It's that the US has basically had zero growth in real industries (eg. not finance or real estate) in 10 years, and there's no sign that that will turn around any time in the future. The government has done nothing in terms of investing in education or research or improving the business environment for companies that could actually help us. And now it no longer has the cash to do so.

There are various nightmare scenarios that I don't think are terribly far fetched. They mainly involve the world getting fed up with our financial crisis and turning to other currencies and treasuries. OPEC could switch to pricing oil in Euros. China might stop investing in treasuries, and perhaps worse could let the Yuan float and stop buying dollars at a fixed rate. Those things in combination could lead to a huge plummet in the dollar.

The US government is absolutely fucked and I don't see how we're going to get out of it. We've got a huge defecit, taxes are cut and need to be raised, the burden of medicare and social security is ever increasing, and we've already been cheating our future by already not putting the spending we desperately need into things like infrastructure, polluting less, etc. bills that we'll have to pay.

Now I don't think we're going to be poor any time soon, but we might have to start living like Europeans or something. You know, consuming less, driving smaller cars, turning off lights when not in the room, running the heat and AC less, wearing clothes for more than 6 months, buying condos or duplexes instead of giant houses in the suburbs. The horror!!!

I actually think the end result may be that the Republicans get their wish and one of the major pillars of our government might be demolished. Something like Social Security or Medicare. (of course they would never stop funding their buddies in defence). Cutting either of those entitlements is politically impossible in normal situations, but if you manage to completely bankrupt the government there might be no choice.


I've been getting email blacklist rejections from people.

Hmm, it appears to be an "SPF" issue. WTF there's new internet standards nobody told me about blocking my mails!


Arithmetic coders throw away accuracy in lots of little places. A full precision range update would be :

low += (symlow * range) / symtot;
range = (symrange * range) / symtot;

Intead we do :

U32 r = range / symtot;
low += symlow * r;
range = symrange * r;

This has changed the parentheses of the multiplication, which lets us do math in 32 bits and also saves a divide. It's inaccuracte, we would waste some range, so we check for being at the end of the range and put the wasted space on there :

if ( (symlow + symrange) == symtot )
	U32 t = symlow * (range / symtot);
	low += t;
	range -= t;
	U32 r = range / symtot;
	low += symlow * r;
	range = symrange * r;

We can see the waste by comparing the range update for the top symbol in the two cases :

case1 = range - symlow * (range / symtot);
case2 = symrange * (range / symtot);

waste = case1 - case2;
waste = range - symlow * (range / symtot) - symrange * (range / symtot);
waste = range - (symtot - symrange) * (range / symtot) - symrange * (range / symtot);
waste = range - symtot * (range / symtot);
waste = range % symtot;

Putting the waste on the last symbol is not great. To make the most of it, you should really reorganize your alphabet so that the most probable symbol is the last one. Symtot is usually around (1<<14) which means the waste is on average (1<<13). Range is between (1<<24) and (1<<32) so the fractional waste is between 2^-11 and 2^-19. This appears to come out to about a waste of 1 byte every 4096 on text with the alphabet rearranged to put the space (0x20) at the end.

I should note that many many people have arithcoders that do this check to put the wasted range on the last symbol, but they *don't* do the work to make sure the last symbol is the MPS (the most probable symbol). If you don't do that, it's rather pointless. With english text for example you might be putting the extra range on the symbol "255" which basically never happens, so you are doing work for nothing.

Sean asked a good question that had me a little confused yesterday. Why do some of the renormalization updates shift in 1's to "high" and some don't? You might see :

while ( range < MinRange )
	*outptr++ = low >> 24;
	low <<= 8;
	range <<= 8;

or :

while ( range < MinRange )
	*outptr++ = low >> 24;
	low <<= 8;
	range <<= 8;
	range += 255;

I thought maybe there was some subtle math reason why you would choose not to put in the 255. Nope. It's just another approximation + optimization. Shifting in the 1 bits to range is the more accurate correct way to do it, but at that point range is already in 2^24 to 2^32, so adding on 255 doesn't amount to much. In practice on my real file tests it doesn't even save 1 byte. I guess it should save something like 1 every 2^20. So, most people just leave it out to save the 1 instruction.

Here are some numbers :

r:\bigtest.bin : inLen : 25651384

True Entropy : 18090226.9 : 5.642

// these are all using cum prob max = 1<<14 and doing a divide for range

radArith cycles: 64.9
radArith : encLen : 18090853 = 5.642 bpb

// radArith is a Michael Schindler 31 bit range encoder with check to put waste range on top sym

Arithmetic_Codec cycles: 58.2
Arithmetic_Codec : encLen : 18091236 = 5.642 bpb

// Arithmetic_Codec is FastAC as modified by me with no check for waste on top sym

rrArithCoder cycles: 57.3
rrArithCoder : encLen : 18091236 = 5.642 bpb

// rrArithCoder is my reimplemented 32 bit range coder similar to FastAC ; yay I beat them by 1 clock !

rrArithBinaryModel cycles: 35.1
rrArithBinaryModel : encLen : 18844782 = 5.877 bpb

// rrArithBinaryModel is a bit-by-bit adaptive model + coder ; that's cycles *per bit* , it's 280 cycles/byte

rrArithCoderRawBits cycles: 24.1
rrArithCoderRawBits : encLen : 25651385 = 8.000 bpb

// rrArithCoderRawBits just puts the bytes out through the arithmetic coder
// this basically just measures the speed of the renorm loop

rrHuffman cycles: 14.5
rrHuffman : encLen : 18179876 = 5.670 bpb

// Huffman encoding is still 4X faster and doesn't cost you a whole lot in compression efficiency


A little more on arithmetic coding ...

Frédéric Kayser sent me the link to this FastAC page with a bunch of papers and source code by Amir Said. It's pretty good stuff, I thought I'd give a synposis of what's in there :

The coder is almost a vanilla Michael Schindler style "range coder". There are a few little tweaks in there that differ from the standard implementation.

1. Instead of a "virtual queue" for the bytes that might carry, he goes ahead and just writes them out, and then if a carry is detected he goes back through the output bytes and increments them while they're 0xFF. This violates one of the rules of standard arithmetic coder design, which is that you never touch output bytes after you stream them out, so that they can be transmitted or written to disk or whatever. Now of course if you're just outputting to a big buffer in memory his way is totally fine, and it appears to be faster than bothering with all the variables to account for the virtual queue.

2. He actually uses the full 32 bits for range instead of 31 bits like in the standard Schindler. We use 31 bits so that we can see the carry in the top bit. Said has a neat little trick for detecting the carry :

U32 old_low = range_low;

range = range_high - range_low;
range_high = range_low + symhigh * (range / symtot);
range_low += symlow * (range / symtot);

if ( range_low < old_low ) // carry

(This maybe could be even faster if there were an elegant way to directly get the carry flag in C). Using the full 32 bits is nice because it makes you byte aligned and eliminates some little ugly bit junk you have to do with shifting bytes 7 and 1 to get that last bit right. This lets you avoid a LOT of shift and mask work which is a big win.

Now, as for the models, his "Adapative_Data_Model" is 100% exactly my Deferred Summation model from DCC96. You can read about the original DefSum in my papers section or get the ancient code in the statistical coders section. From what I can tell there's nothing new or interesting in the modeling, it's 100% just deferred summation to get you power of two CumProbMax so you can turn your division into a shift, he uses my fast decode table, etc.

Anyway, the implementation is pretty good and the source code is pretty clean, so it's a decent place to start. However, I don't recommend actually using it as is. The models are retarded and the API interface is very bad.

The reason why "Deferred Summation" never amounted to much is that arithmetic coding a big hunk of symbols with order-0 modeling is a retarded thing to do. You just never actually want to do that. If you're doing arithmetic coding, you're also doing higher order modeling or LZ coding, or something, so that you don't just have static statistics. Providing API's that are locked to these silly models is lame. The correct API interfaces for an arithmetic coder are the ones I have in "arithc" , just

Encode( symlow, symfreq, symtot );
EncodeTotIsPow2( symlow, symfreq, symtotshift );

The binary models are also silly. He does deferred summation for the bit model as well, which is not necessary. You can either do the implicitly rescaling fixed-tot bit model that I just wrote about recently, or you can use a rung-ladder model like I have in crblib which uses a table to find power of 2 probabilities from a state.

Said's codec is in fact very fast. It's almost 10% faster that any arithmetic coder I've ever seen. That's pretty damn good. I'm going to modify it a bit to provide what I think are reasonable API's and post it.

Okay here it is : cb_Arithmetic_Codec.zip ; a tiny .h with a modified version of FastAC. This is pretty rough, I tried to modify the original FastAC code as little as possible.


Random file stuff I've learned :

Good article on Windows file caching . The Windows file cache architecture is pretty sweet really. Files are cached in 256k blocks. The caching is also file "buffering". That is, file IO in the application doesn't get any buffer just for that app - you get access to the cache pages, that is your buffer. It means multiple apps on the same file share buffers. This is all pretty sweet. The caching is just a file mapping of pages in the system address space. When you create a memory mapped file, you are just getting direct access to those pages, which you can then lock to get pointers in your address space. You can lock & unlock the pages just like any pages using VirtualAlloc and VirtualFree. When you write to pages, you can control them getting flushed out the write cache by unlocking them.

Windows file caching is *extremely* greedy. It will basically throw away anything it has cached in order to cache the latest request. This makes it look very good in benchmarks, but can be pretty retarded in practice. For example, I was testing by doing linear streaming IO across a 2 GB file. Windows caches every single byte of the 2 GB file so that future touches to it are instant; in order to do that, it will evict the baby and the bathwater from the cache (including stuff like the cache of the NTFS MFT table).

One thing I haven't been able to figure out is a way to manually control the prefetching that windows does. When you do a bunch of ReadFile calls in a row, the cache will asynchronously prefetch ahead the next 256k chunk. This means that even if you just call the synchronous ReadFile (or fread) to get bits of the file, the return is almost immediate. What I would like, however, is a way to manually say "prefetch at this byte". The only way I can do that at the moment is to issue a read at that byte on a thread, but that's not really ideal, it would be better to be able to signal the underlying cache directly.

Also it appears that if you use memory mapped files, windows stops doing the smart async prefetch ahead. Say you memory map a huge file, then you run through streaming touching pages in order. Windows uses the virtual page fault from the memory manager to interrupt and pull in pages from disk. It seems to not prefetch because you get a lot of faults and your app just stalls when you get to pages that aren't in memory yet. Memory mapped files are pretty awesome, it's cool that you just get access to the cache, but without the ability to request prefetches they're pretty worthless since the plain old ReadFile beats them so badly. (if you are in fact doing totally random file accesses on a huge file, then they are just about perfect).

Stdio in MSVC has a 4k buffer and then calls straight to ReadFile to fill it. The 4k buffer isn't really providing you any "buffering" it just hides function call overhead and makes the reads page aligned; the real buffering is inside ReadFile.

If you're linking with the multithreaded libs, stdio becomes outrageously slow. Personally I like my threading model to work such that individual objects, such as a "FILE" are not protected from concurrent use by the library. That is, once the API hands out an atomic object to the app, it is up to the app if it wants to protect that object from multiple threads accessing it or not. The library should only do protection from access to shared/global data structures. (I know they can't do this with stdio because they want to make sure it's safe, and they want to protect stdout/stdin/stdierr). Anyway, the result is that "getc" goes from being around 10 clocks to being around 170 clocks. Fortunately, you can fix that pretty easily by just bringing back the macro yourself. Just define this :

#define macrogetc(_stream)     (--(_stream)->_cnt >= 0 ? ((U8)*(_stream)->_ptr++) : _filbuf(_stream))

And use it to do your IO when you know the FILE is only being used by one thread.

Something I just discovered today : while Windows buffering is in fact very good for reads, it sucks for writes. If I write out a stream of data through a buffered file, I only get around 25 MB/sec. If I write out chunks with async non-buffered calls, I get 100 MB/sec. (conversely, buffered streaming input is actually faster than doing a bunch of async non-buffered reads).


WTF's :

#1. WTF, tabs in HTML are 8 characters. Nobody in their right mind wants fucking 8 column indents. Yeah yeah I know that's the original standard size for a tab but WTF were those people thinking. A tab is 4.

#2. When I say "PRE" in my HTML that means fucking PRE! Stop trying to turn my less than signs into fucking HTML codes! I SAID "PRE"!


From the MSDN :

Because buffer addresses for read and write operations must be sector-aligned, the application must have direct 
control of how these buffers are allocated. One way to sector-align buffers is to use the VirtualAlloc function 
to allocate the buffers. Consider the following:

    * VirtualAlloc allocates memory that is aligned on addresses that are integer multiples of the system's page size.
	 Page size is 4,096 bytes on x64 and x86 or 8,192 bytes for the Intel Itanium processor family. For additional
	 information, see the GetSystemInfo function.
    * Sector size is typically 512 to 4,096 bytes for direct-access storage devices (hard drives) and 2,048 bytes for CD-ROMs.
    * Both page and sector sizes are powers of 2.

I'm guessing that if you ever tried to use a sector size larger than 4096, a lot of apps would fail. And by "a lot" I mean 100% of apps that do non-buffered IO.

This is an example of terrible API's, IMO. It's extremely hard to do the "right" thing (for each file name, figure out what drive that file is on, ask for the sector size of that drive, allocate buffers aligned to the sector size of that drive), and it's very easy to do something that will almost always work - just use VirtualAlloc to get 4k aligned pages and use that. Which is why everyone just uses VirtualAlloc. Until one day it doesn't work.


Fuck I need to be able to make VC never ever never try to fucking get missing items from source control. That's such a broken and horrible feature. Sometimes it completely hangs VC. At other times, like when I have items in perforce open for delete but not yet submitted, it fucking creates 0 byte local files for the ones I've got open for delete. WTF.


I'd like to do some backpacking, but with the state of my back I don't think it's a good idea, like ever. Backpacking is just out for the rest of this sad life of mine. But it occurs to me I should just hire porters. Get a "sherpa" (and by that I mean a Mexican) to carry my gear. He'd be happy to get $10/hour, and the cost of that is pretty microscopic to me in the scheme of things. Hell, paying the sherpa is cheaper than the cost of the gas to drive to the trail head. Fuck he's cheaper than the fucking hiking pants at REI that are like $250 for fucking hiking pants. I'll just wear no pants and hire another Mexican to follow me around and rub my legs to keep them warm.

We currently live in a society of grossly different costs and wages, but we're not acting like we do. It's sort of a taboo to have servants and treat the underclass as shittily as we logically should.


The lack of color adjustment on the Dell 30" is fucking retarded. What did they save, like ten bucks by leaving that out of the circuitry ? I've always found "color temperature" the easiest and fastest way to adjust a monitor to look decent in different room lighting environments.


Urg, somebody help me out, I can't find this with Goog :

There's a way to encode the integers in binary such that larger values always have more or equal number of bits "on" (=1) than smaller values. Eg. if X >= Y , then NumBitsOn(X) >= NumBitsOn(Y). There's a cool way to make this encoding using just a simple bit trick, but I don't remember where I saw it.

For example one possible encoding of 3 bit integers is :

0 : 000
1 : 001
2 : 010
3 : 100
4 : 011
5 : 101
6 : 110
7 : 111

Maybe my memory is wrong that I saw a bit trick for this. Certainly I recall seeing the trick for Gray Codes, but that's not quite the same thing.


Followup on the "Russian Range Coder". So, first of all let me stress once again that this thing is pointless and you should not use it. It's an "optimization" that gives up some compression and doesn't actually make you faster at all. You should never do approximations or optimizations unless they actually help, even if they look like they should be faster. Best to stick with full quality everywhere lest it come back to bite you.

But I was thinking about it and I realized you can improve the renorm loop. The original loop in the case of underflow just truncates the range down to the bottom part to make the top bits always equal to the top bits of "low". But that can be really bad in some cases, since you might be straddling a boundary, but most of your range could be above the straddle, not below. It's easy enough to just check whether you have more range above or below the straddle :

    while(rc->range < (1<<16))
		int above = (rc->low + rc->range) & 0xFFFF;
		int below = (~rc->low) & 0xFFFF;
		//int above = rc->range - 1 - below;
		U32 hi = rc->low + rc->range;
		U32 hi_below = rc->low + below;
		U32 lo_above = hi & 0xFFFF0000;
		if ( above > below )
			rc->low += rc->range - above;
			rc->range = above;	
			rc->range = below;
		*rc->ptr++ = (U8) (rc->low >> (RANGE_SIZE - 8));
		rc->range <<= 8;
		rc->low <<= 8;

That is, we either push the top bits of low up to the top bits of hi, or push the top bits of hi down to the top bits of low, depending on which gives us more code space. Again, this doesn't hurt speed at all because it's rare. In fact, that's why you should just use a proper virtual-queue range coder that handles underflow right.

Just to make it super clear what's happening, here is an actual example :

    rc->low + rc->range            0x4d0043e4
    rc->low                        0x4cffa580
    above                          0x000043e4
    below                          0x00005a7f
    rc->low + below                0x4cffffff
    rc->low + rc->range - above    0x4d000000

Some numbers :

m:\ccc\book1 : inLen : 768,771

True Entropy : 435,042.6 : 4.527 bpb

radArith : encLen : 435,116 = 4.528 bpb
	bytes exceeding entropy : 73

original RRC : encLen : 435,284 = 4.530 bpb
	bytes exceeding entropy : 241

cbloom RRC : encLen : 435,216 = 4.529 bpb
	bytes exceeding entropy : 173

radArith cycles: 69.6
original rc cycles: 74.2
cbloom rc cycles: 70.3

Here "radArith" is a normal Michael Schindler range coder with virtual queue. My modified "russian range coder" has about half the waste of the original (compared to the canonical range coder, not compared to entropy). Note that the arithmetic coders here are all using a max cumulative frequency of 16384, while the "true entropy" uses the exact character counts in the file.

BTW these are the links to the authors of the "Russian Range Coder" - they seem to all be dead. If anyone has live links to these people let me know and I'll hook them up.


  Arithmetic coder based on "carryless rangecoder" by D. Subbotin.
  * http://www.softcomplete.com/algo/pack/rus-range.txt
    (Original C++ code by Subbotin)
  * http://hem.spray.se/mikael.lundqvist/
    (C impelementation by M. Lundqvist)
  * H. Okumura: C magazine, Vol.14, No.7, pp.13-35, Jul. 2002
    (Tutorial of data compression with sample codes, in japanese).


It's been a really rough week. I caught a little cold for the first few days and just felt like ass. Then Wednesday morning for no apparent reason my back just gave out. It happens to me about once a year. It almost always happens in the morning, in the first ten minutes after waking up. That's the most dangerous time, when your spine has moved around in your sleep, and then suddenly you put the load of your weight on it by standing up, and something goes horribly wrong. All of a sudden a nerve was getting pinched, like a hot knife jabbed in my back, and I collapsed on the floor to try to get some relief. My whole back instantly knotted up to try to protect the nerve, which of course just makes it worse. I lay on the floor for hours that morning just breathing through the pain. The whole week has been agony, I'm still in intense pain, it's no longer just in that one spot, it's spread all over my back now as the knotted up wrenching of the muscles has pulled on many parts and everything hurts now. But the worst is over I think, and fortunately Alissa has been an angel giving me back rubs and helping me through it.

It's a random moment that the back rebels, but of course it's because something bad has been happening to it. I have a pretty good guess. After years of being on my own schedule and doing lots of random physical activity I suddenly went back to sitting at a desk for way too many hours every day. Then add in moving loads around, the fact that I stopped working out for a month, and then the kicker was that in the last few weeks I did start working out again. Being sedentary at a desk for 10+ hours and then suddenly doing 1 hour of really hard workout is a sure way to hurt yourself.


Night in Seattle. A trash can sits on the curb across the street from out apartment. It's a quiet residential road, no cars are passing by. A pedestrian walks down the sidewalk and suddenly kicks over the trash can. It noisily bounces into the middle of the street. He walks on somewhat sheepishly, looking around to see if anyone noticed, as if he just tripped and is trying to pretend nothing happened. I'm surprised and amused. A few seconds later another pedestrian walks up from the opposite direction. He pauses for a second and looks at the trash can, walks into the street and picks it up and restores it to its original position on the curb, then continues on. I'm even more surprised.


I have a ton of grey t-shirts, like maybe ten of them, nearly identical. I don't want to wear grey t-shirts any more. I've been past that phase for a long time now, but I keep wearing them because I'm cheap and lazy, they're in my drawer and I don't want to have to shop.

I miss the heat of San Francisco. Sunny days lying out in Dolores park, Alissa in her bathing suit, that guy who looked like Phillip Seymour Hoffman and took his shirt off and started doing these amazing athletic hula-hoop moves. Sitting in my apartment where it was so hot I had to sit around in my underwear, or go out and sit on the stoop. I miss the liveliness, all the wild people around, walking out of my apartment to get groceries and being in a zoo of young friendly hip people. I miss the feeling of being part of something bigger, like everyone in the city was enjoying and suffering through the same shit, and fuck the tourists and bridgers.

I miss watching football on Sunday with my brothers, going out to the street at half time to throw the ball around, in the horrible humid heat of Houston. I miss watching football on Sunday in San Luis Obispo, usually alone, but with that glorious private patio. I would usually barbecue something for myself, chinese style chicken thighs, or a slow cooked tri-tip with dry rub, eat several times, get drunk, run around miming football moves inside my house of glass, then get bored and go stand around outside half naked and half drunk and hit fruit into the neighbor's yard with my baseball bat.


Good stuff at Trader Joe's right now :

Petit Verdot TJ label wine from Paso Robles (actually Creston). It's $9.99, very balanced, quite earthy with notes of leather and spice, kind of like grenache. Plus who the fuck has had Petit Verdot?

The new hot dog buns are sublime. They're extremely fresh and tender, unsliced, they're almost more like a dinner roll than a traditional hot dog bun, they taste fresh baked and melt in your mouth. They should not be grilled or broiled, that's murder to hot dog buns. They should be very lightly steamed to moisten and warm them, as is traditional in Chicago.


Rant on New Arithmetic Coders

Can we fucking stop calling arithmetic coding "range coding" !? It's fucking arithmetic coding. "Range coding" differs from the ancient CACM arithmetic coder in exactly two ways :

1. It uses a "virtual queue" for the carry propagation, exactly as I described in my VirtQ paper . (warning : ancient!)

2. It introduces an approximation to the range update which lets it use more bits by changing the order of multiplication :


range = range_high - range_low + 1;
range_high = range_low + (symhigh * range) / symtot - 1;
range_low += (symlow * range) / symtot;

"range coder" :

range = range_high - range_low;
range_high = range_low + symhigh * (range / symtot);
range_low += symlow * (range / symtot);

This small rearrangement of the math does in fact make a huge difference, but dude look, it's literally just moving where you put the parentheses.

Of course it means you save a divide, and it means range can go up to 31 bits since you never multiply by it. Getting range up to 31 bits means you can afford to shuffle out whole bytes at a time. In CACM the multiplicataion has to fit in 32 bits so your range can only be 16 bits so outputting bytes is a no go; range stays between 15 and 16 bits all the time in CACM while a "range coder" can go down to 23 bits; when it's down at 23 bits you only have 9 bits of coding precision because of the approximation of doing the divide before the multiply.

I'm sick of all these web pages that are like "LZMA uses a statistical algorithm called range coding". No, it uses a fucking entropy coder called arithmetic coding.

Okay, this was just an angry rant, but it reminds me of something interesting. I recently stumbled on some new arithmetic coders and they blew my mind for a minute until I figured out how they work. (BTW For reference, the canonical "range coder" is by Michael Schindler )

(BTW #2 it's sort of curious to me historically that I missed the range coder; I was so close to inventing it myself, but somehow got right next to it and didn't see it. All the pieces of the range coder were thought of previously by various people, including the original CACM guys, such as trying to use 32 bits, shifting out bytes, approximating the division, and the virtual queue, but on their own each of those ideas doesn't quite work or make a big difference, nobody put those ideas together and realized that together they make this "range coder" thing that's a big improvement).

The new arithmetic coders I've discovered are the fpaq0p binary range coder and "rc" the "russian range coder". FPAQ0p is by Ilia Muraviev, the guy from "encode.ru" who's done lots of good stuff; you can find it on Matt Mahoney's PAQ page . The "russian range coder" used to have a web page but it appears to have disappeared. I found the code in MRP (Minimum Rate Predictors, a decent lossless image coder).

The "fpaq0p" arithmetic encoder is brutally simple. It's what you would write if you just heard of arithmetic encoding; in fact it was my first idea for a coder ever, but it didn't work, so everybody dropped it. Well, it's back. If you remember the super basic idea of an arithmetic coder is you're specifying a code stream by writing an infinite precision number; at any time you know your desired number is in some interval, low and high, you shrink the interval down to specify the symbol, and to do it in finite precision you shift out top bits of low & high as they become the same. (See my ancient write up of basic arithcoding )

So what would you write?

range_low = 0;
range_high = 0xFFFFFFFF; // 32 bits on

range = range_high - range_low;
range_high = range_low + symhigh * (range / symtot);
range_low += symlow * (range / symtot);

while( (range_low>>24) == (range_high>>24) )
	*outPtr+= = (range_low>>24);
	range_low <<= 8;
	range_high <<= 8;
	range_high += 0xFF;

Okay, good job, but this is badly broken. First of all, you're overflowing your 32 bit code. Second of all, it doesn't handle carry propagation, and it doesn't handle underflow. The problem with carries is that range can step up to the next bit and push a bit over code size. The problem with underflow is that we aren't handling the case of the range getting very very small but while the top bytes are different.

In particular, when range_high comes down close to range_low from above and range_low comes up close to range_high from below, but the top bytes are different, eg, they're like 0x800000AA and 0x7FFFFFBB , the top bytes are different so they don't get shifted out, but the range can get arbitrarily small. (in particular the top byte of "low" is X, the top byte of "hi" is X+1, the second byte of low is 0xFF and the second byte of hi is 0x00).

But fpaq0p works, so how does it do it?

The main trick is to only do binary coding. Then you just don't worry about underflow. With the underflow problem, your range can get as small as only 1 , with 0x01000000 and 0x00FFFFFF, but you implicitly add 1 to the top, so you still have enough range to code 2 symbols, which is what you need for your binary alphabet to be decodable. Any alphabet larger than binary is not codable there.

The other minor trick is just about where you do your "+1". The high is always implicitly +1 above what you actually store in range_high. We started it with 0xFFFFFFFF but we want an implicit FFFFF... repeating to infinity after it. That's why we shift in FF, but we also have to do it right in the range update in a way that doesn't blow the 32 bits or ever generate a carry.

So the working fpaq0p coder is :

range_low = 0;
range_high = 0xFFFFFFFF; // 32 bits on

range = range_high - range_low;
range_mid = range_low + symp0 * (range / symtot);
if ( bit )
	range_low = range_mid+1;
	range_high = range_mid;

while( (range_low>>24) == (range_high>>24) )
	*outPtr+= = (range_low>>24);
	range_low <<= 8;
	range_high <<= 8;
	range_high += 0xFF;

Which looks like it's doing the same trivial thing, but is subtly different.

Note that in the horrible underflow case your coding precision can get as bad as only 1 bit (that is, the probabilities are forced to 50/50 regardless of what they really are). Obviously that hurts coding, but it's actually not that bad, because it's statistically incredibly rare. In fact it happens something like 2^-24 of the time, since all spots on the code range are equally likely. And even when it does happen you only lose a few bits of coding efficiency.


To do this faster :

	while( (range_low>>24) == (range_high>>24) )

You can do :

	while( ((x1 ^ x2)>>24) == 0 )


	while( ((x1 ^ x2)&0xFF000000) == 0 )


	while( (x1 ^ x2) < (1<<24) )

but in practice it doesn't really matter because you're stalled out on branches anyway.

Also in a binary arithmetic coder you never actually divide by symtot. You use symtot = some fixed power of 2, and you do a probability update that keeps tot the same. My code's got this old rung/ladder thing that does different styles of adaptation with a power of 2 total. Or you can just do the super simple thing that everybody seems to do these days :

#define symbits		(12) // or whatever < 14
#define symtot		(1<< symbits)
#define updshift	(6) // or anything in {1,symbits-1}

void update(int & p0,bool bit)

	if ( bit )
		p0 -= p0 >> updshift;
		p0 += (symtot - p0) >> updshift;

Here "updshift" controls how fast your adaptation is vs. how precise you become in the long term. This update makes no distinction between the initial rollup when you have no stats and the adapted state, which the rung/ladder does. This is an approximation of a normal n0/n1 counter with an implicit max count of (1 << updshift). Compare to an n0/n1 update :

void update(int & p0,bool bit)
	int n0 = p0;
	int n1 = symtot - p0;
	int increment = symtot >> updshift;
	if ( bit )
		n1 += increment;
		n0 += increment;
	p0 = (n0 * symtot) / (n0 + n1);

We can see the equivalence and approximation by looking at the bit on case :

	I'll use := to mean assignment while I do maths :

	p0 := (n0 * symtot) / (n0 + n1);
	p0 := (p0 * symtot) / (symtot + incr);
	p0 := (p0 * symtot)*(symtot - incr) / (symtot + incr)*(symtot - incr);
		now approximate incr^2 is much less than symtot^2
	p0 := (p0 * symtot)*(symtot - incr) / (symtot^2);
	p0 := (p0)*(symtot - incr) / (symtot);
	p0 := p0 - (p0*incr)/symtot);
	p0 -= p0/(symtot/incr);
	p0 -= p0>>updshift;

BTW this approximation goes back to the ancient stuff by Howard & Vitter on fast arithmetic coding. Of course in practice you also want to merge your modelling with arithmetic coding since they do the same random branch on bit.

Most people these days are good at designing their compressor to drive the binary arithmetic coder well. A well driven binary arithmetic coder is generally coding probabilities close to 50%. That's better because it keeps precision high and keeps you in the accurate part of the coarse probability update approximation. It's also better because it means you're calling the coder many fewer times. If you only call the arithmetic coder with probablities close to 50% it means you're only doing coding roughly as many times as the number of bits you output, which is low. You accomplish this by coding symbols that are "natural" for the problem. For example, for LZ77 offsets you code in the log2 of the offset (similarly with BWT MTF codes). etc. etc.

So I also found this range coder implementation called the "Russian Range Coder". It's extremely similar to the standard Michael Schindler coder except the renorm & bit output loop is slightly different :

"Russian Range Coder" :

    while((rc->low ^ (rc->low + rc->range)) < (1<<24))
		*rc->ptr++ = (U8) (rc->low >> 24);
		rc->range <<= 8;
		rc->low <<= 8;
    while(rc->range < (1 << 16))
		*rc->ptr++ = (U8) (rc->low >> 24);
		rc->range = ((~rc->low) & 0xFFFF) << 8;
		rc->low <<= 8;

So what's going on here? The first part we can recognize is exactly the simple loop from fpaq0p that just outputs the top byte when it's the same. But if you just do that you're fucked for multisymbol alphabets because of underflow, so looky here, the second part handles that. It checks if the range is getting too small to resolve symbols. But normally in the virtual queue / schindler coder that would mean setting up a carry queue and such. This coder avoids that. It does it by just giving up a huge chunk of the range.

The bad case occurs when the range is small and the top bytes are off by 1. The second bytes are 0x00 and 0xFF. With a virtual queue, you would put the top byte of "low" in the virtual queue spot, and count the number of 0xFF bytes but don't write them yet. If you get a carry, you propagate the carry by writing the top of low+1, then implicitly flipping the 0xFF's to 0x00's and writing them. This is the virtual Q / schindler way.

Here we don't check for underflow and do the carry and such, we just smash down the "high" to the "low". We know the top bytes are not equal, so the next bytes must be 00 and FF, eg. we have something like 0x0B00XXXX and 0x0AFFXXXX , we just smash the high down to 0x0AFFFFFF. This bit :

	rc->range = ((~rc->low) & 0xFFFF) << 8;
	rc->low <<= 8;

is the same as :

	high = low | 0xFFFF;
	range = high - low;
	range <<= 8;
	low <<= 8;

which I think makes it clearer what's happening. You just chop down to make your top bytes the same, eliminating the bad straddle situation which caused underflow. Now you can renormalize the simple way by shifting out equal top bytes and get a big enough range back.

Now obviously this is an approximation that loses some efficiency, you're throwing away on average half your range in the cases that you have this bad underflow, but again it's pretty rare. I don't know the exact probabilities, but the measured efficiency is very close to the full coder.

In practice, however, this coder is not any faster than the full correct schindler coder, so there's no point to using it. It is however, very simple and elegant once you see how it works.

ps. the other new thing is the Jarek Duda lattice coder thing which I have yet to understand. It seems to offer no advantages and many disadvantages, but I would like to grok it for my own brain's happiness.


I miss Ebert and Roeper. I haven't added anything to my Netflix queue in months. Roeper's opinion is pretty worthless, but I liked that Michael guy, and anyway the best part was just seeing the actual clips from the movies, which gives you a better feel for whether you'll like it or not.


Random weird stuff about Seattle :

Cars park any which way they want. You see them parked nose-to-nose and butt-to-butt. I always thought it was a dumb law that you hard to park all the same direction. I got a ticket for it once in Pasadena; my uncle even told me "hey, you shouldn't park backwards they'll give you a ticket" and I was like "pshaw, no way they write that ticket, I didn't even know it was illegal". Boom I got the ticket. Anyway, it looks really weird to me now.

They let power lines go right through the middle of trees. In California if a tree gets anywhere near a power line they chop it back (and by "it" I mean the tree, not the power line). It was done to some of my favorite trees and it pissed me off at the time, but here there are power lines literally going through trees. The trees sway in the wind and tug on the wire and nobody thinks it's a big deal. I don't exactly understand, but hey, I like trees.

There are goths! WTF! It's especially funny because the goths are sort of demure and domestic like everyone here; I see them in couples shopping at the grocery store in 6 inch boots laced up to their knees and long velvet coats, quietly discussing which brand of pasta sauce to buy.


People who do not know how to use the grocery store self checkout should not try it for the first time during evening grocery rush hour. WTF. Ridiculous lack of courtesy. People older than 60 should *never* use the self checkout, or any other electronic device where others have to wait behind them. In fact people over 60 really shouldn't leave their homes.

The people who live in our building are fucking WEIRD. Not the stompy coke-head who lives above us, that's just normal weirdness. I mean everyone else in the building. They creep around and try to avoid eye contact. They have not said hi once. A few days ago Alissa and I got to the front door and one of our neighbors that I've never seen before was at the mail boxes. They started frantically hurrying with the mail so they could get it before we opened the door, and they ran up the stairs ahead of us without a look or a hi. When we first moved in I saw a few of the neighbors and tried to wave or make eye contact and they actively looked away. A few weeks ago I got a package from UPS and another neighbor got the package at the same time, so we both went to the front door. I could see the terror on his face as soon as he saw me. I said "hey" , no response. At the front door he chatted with the UPS guy, we both got our stuff, and when we shut the door I said "howdy", no response.

Drivers up here are fucking assholes. I mean willfull look you in the eye fucking assholes. It really bothers me. Every day when I drive, or even when I walk (!!) they do something to really piss me off, and it turns me off humanity in general up here. It seems like everyone up here is a selfish mother fucking cock who has no respect for other people's time or lives. It's bad that they're asses to other drivers, but people are fucking asses and reckless around pedestrians, which is so unnacceptable. Cars just do not stop for peds here. You can be like a quarter of the way across the street and cars on the same side as you will still just keep blazing through at full speed. And I'm not talking about J-walking, I'm talking about crossing at the corner. If you just stand at a corner and wait, the cars will literally never stop for you. It's fucking whack.


So I looked into LZMA / 7-Zip a bit. So far as I know there's no real description of the algorithm available, but there is source code here . This is from just browsing the source code for a bit, so take it with a bit of skepticism.

LZMA is an arithmetic coded LZ + context coder. It has a couple of clever ideas that make it perform very well. In fact it compresses better than Quantum and is also faster (at compression - Quantum is faster at decompression, in fact the only disadvantage of LZMA is that its decomp is not super fast).

The match finder is very fast. I can't figure out what it's doing exactly. It makes strong hashes using the CRC. The docs claim it then goes to a patricia trie, but my god I can't see that in the code.

It does lots of binary arithcoding. In fact I don't think it has a multisymbol arithcoder at all, it just breaks larger alphabets into a series of binary codes. He does a pretty good job of breaking up the binary tree so that the common cases of short lengths and low offsets get written in very few binary coding ops (that is, the coding choices of 0 or 1 are close to equal probability).

There are some clever models. In general it's well tweaked to use few contexts but capture the salient information in those contexts. For example, the order-1 literal model only uses the top N bits of the previous char as context, since for most data the top bits are the good predictors and the bottom bits are semi-random. One clever model is using the bottom 2 bits of position as part of the literal context. This is great for capturing the very common pattern on binary data of a 0 every fourth byte, or any every-2 or every-4 bytes kind of patterns (such as when you write a bunch of 32 bit floats and the exponent part is the same on all of them).

It also does a semi-optimal parse. From what I can tell it looks like it finds the match possibilities N steps ahead, and then does a kind of semi-backward cost evaluation from the end of that sequence to pick the best choice at the current coding spot. I believe N is only 4, so this is sort of like the old "lazy matching" but it looks ahead 4, and rather than a rough heuristic like most people use in lazy matching he actually evaluates a coding cost like a true optimal parse.

On text it's very good, but on binary it really kicks ass, I'm guessing the biggest win is because of the pos-conditioned contexts which are really good for binary.


Is it possible to make the MSVC help actually behave reasonably? Like #1) never ever never fucking pop up that Dynamic help nonsense. Make that thing go away forever. #2) just show me the fucking Platform SDK. When I search for "BeginPaint" I want the fucking Win32 API help on BeginPaint, not fucking Windows CE, not fucking the BeginPaint member function on CWnd or whatever. I know I can do the "Platform SDK" filter, but that's not exactly what I want either because that just blocks out other categories, I want to be able to set the search order. If I just include the Platform SDK I don't get any compiler help or XBox help or whatever, but if I include those then I pull in junk I want and I make the default action of "F1" be all retarded. Urg.


One of my huge pet peeves is when people are like "oh yeah, do whatever you want, I don't care" and then when you actually do something or suggest something you discover they actually do have very specific desires and just didn't tell you. Like you're trying to decide where to go eat and they're like "oh, I don't care", so you suggest a place, and they go "mmm, no" , you suggest another "not that", urgh, fuck, so obviously you do care, why don't you give me some fucking clue what the boundaries are so I can work within them.

I think this is the same thing that really frustrates video game artists. If you give them really specific rules where they know things will be okay, then they can just relax and go to town and explore what they can do within those rules. The thing that really pisses you off is when you think you have freedom, and you try things, and you keep getting told, no, not that, not that, not that.


There's a certain file IO library for a certain console platform that recently came out which is rather similar to the Oodle stuff I'm working on. I think that's actually good for me, since people who use that will already have set up their game to work with a decent layer, so they can easily switch their calls to talk to me instead, and they'll get platform independence and all the other good stuff I do.

Anyhoo, this certain library is a ridiculously overcomplicated mess of heavy C++. And it's on the least C++ friendly major platform in existence. WTF. The whole advantage of making libraries for a single platform is that you can make things very light and easy to use and tailored to the strengths of that platform.


I just don't understand the anger against regulation and big government. People act like regulation killed their mother. Here we are in the middle of the nation's second largest ever financial collapse, caused almost 100% by lack of regulation, and still people are railing about how government needs to be smaller. Taxes have been slashed for the super rich, and 90% of americans are far worse off for it (even though their taxes were slightly cut, the cost to them in inflation and loss of services and the national debt and the failure of our government to take care of the country is worth much more) - and yet still people rail about how taxes are too high. Certainly most of the politicians are simply in the pocket of industries, when they push for deregulation it's just so their donor can make more money at the expensive of the safety and security of the average citizen, but a lot of the voters actually believe this malarkey.

Bill Clinton was on the Daily Show last week, and he was charming and charismatic as ever. He gave a very simple analysis of the problem, which I largely agree with. After the dot com crash, we were basically in a recession. The Fed and GWB tried to block the recession by cutting taxes and slashing the Fed overnight rate to ridiculous lows, pumping the US economy full of tons of cheap credit. Unfortunately that money wasn't invested into anything productive since all our industries were tanking, the only thing profitable they could find to jam it in was real estate. It's easy to be seduced by Bill, but we should remember that he's one of the people behind this mess. The 1999 bank deregulation (also here ) that he passed was the opening of the floodgates for financial irresponsibility. Those regulatations were put in place in the depression because the financial industry had been so reckless and caused that mess, and guess what, you repeal them and they immediately do it again, big surprise! (Clinton's first step was followed up whole heartedly by GWB, who gutted the SEC and just about every other regulatory agency through appointment of people who were incompetent or simply not interested in enforcing the law; this is the real tragic legacy of the Bush administration IMO, not the Iraq war).

Of course nobody's talking about fixing the corruption that would really help. How about opening up the government-sponsored mopolies of the cable and cell networks to true open competition so that we can get some innovation and compete with other countries on tech infrastructure? How about opening up immigration for qualified applicants? Tech companies desperately want to hire more skilled labor; hell, the entire Indian tech boom would be diminished or delayed if we'd just let them come to America the way they all wanted to.

This crisis also brings into focus something I have often written about here : the corruption & giant loop hole of the corporate veil. It's almost like you can call yourself a corporation, then go on a killing spree, then when the cops come looking for you, you go "oh yeah, that corporation really fucked up, unfortunately it went bankrupt and dissolved" and the cops just say okay and go home. You can basically do whatever the fuck you want as a corporation, no laws or code of ethics binds you in any way, because if your malfeasance ever catches up with you, you just go bankrupt and everyone walks away clean. In the mean time you paid yourself huge profits. This is assuming that everyone in charge is smart enough to not link themselves personally to any of the bad deeds or do something stupid with finances to pierce the corporate veil, but that's not a question of how bad their deeds are, it's just a question of how good their lawyers are and how careful they are about working the loop hole.


It actually sort of sucks that there are so many super tuned tweaked compressors out there. It's really stifling to innovation. It means if you try something new and interesting, it's almost gauranteed to look bad by comparison because it's not as refined. This is a general problem that I often see in development. Somebody has a stupid O(N^2) sort that they put in assembly and optimized for ages. You try something better like an O(N log N) sort, but the speed numbers in your test application are worse. Well, that's not fair, your new algorithm hasn't gotten all the same tweaking and refinement. But now if I want to compare a bunch of possible solutions, I have to waste a ton of time tweaking them all out. The very existance of that one super optimized possibility really fucks up my entire algorithm search.

The thing that makes this super extra hard is that different algorithms lend themselves to more tweaking. So you really can't just implement them all in a rough high-level fashion and know that you're picking the right thing. In compression for example, you might look at Block Sorting and PPM and say hey these are both very simple elegant high level algorithms, and they perform very comparably. But they are not equal. PPM lends itself to far more tweaking and refinement and tricks and specialization, while with block sorting you're stuck in a dead end where the general solution is all you can do. Of course you see the same thing in speed and optimization.


On LZ and ACB

LZ77 coders are hugely inefficient. Everybody knows the basic idea of LZ77. You compress data using back references to previous data. You either write a match, consisting of an offset & length, or a literal, consisting of one raw character.

The inefficiencies are intuitively obvious. If the coding was perfect, then if you were the decoder, the encoded stream should be totally random, you shouldn't be able to predict the code stream, but of course you can. One way is with context modeling. If you see a context like "what the ", then it's highly likely the next sequence is "fuck" so offsets which point to that are far more likely. Similarly, certain offsets point at certain words, and there are natural likely match lengths for those as well.

Someday soon I'm going to write about how I'm doing "Optimal Parsing". But the very fact that you can do optimal parsing is a sign of the flaw of LZ77. The encoder has many choices of different ways to write the code stream, which means there is massive redundancy in the code space. There are many different LZ77 code streams which all correspond to the same input. The optimal parser tries to pick the best one of these, but it doesn't change the fact that all the ways to encode that you did not choose are just wasted probability space in the coded bits.

Let's look at how you might reduce the redundancy in LZ77 if you for some reason had to do LZ coding and wanted to go as far with it as you can :

First of all, whether or not a match will be found is highly predictable. Most high compression LZ coders do a lot of context modeling for the match flag, because you can tell by looking at the local neighborhood whether it's an area that's likely to match. There are also finite-state patterns that arise in many types of data (particularly binary data), where it may be very likely that the data goes match-literal-match-literal , or mat-lit-lit-mat-lit-lit-mat , or whatever. Apparently LZMA does this.

Modeling the redundancy in the offset and the length are more interesting and difficult. The first thing we see is that they are not really two separate parameters, they are highly correlated, but not in a trivial way. For example, say we simply find the longest possible match and send the {offset,length} in the normal LZ77 way. There were very similar shorter matches that we passed up. When we send just the {offset} , the decoder knows what the match string we want looks like, and it knows that we passed up all the other matches, so this offset must be the longest one. That means it can mostly infer the match length. The decoder gets the offset, that specifies the match string, then the decoder finds the other string in the window that has the longest prefix length with the match string. The actual match must be at least the prefix length + 1. You can then either just output that match, or you could send the amount that the lengths exceeds that.

Let me show an example for clarity :

T A B C X X A B C D T T * A B C D ...

current code pointer is at (*)

encoder send match offset "-6"

decoder sees the match string "ABCDTT"

it looks for the longest other matching prefix and finds "ABCXX" with shared length 3

therefore we must match of at least length 4

decoder outputs "ABCD"
(then possibly also an additional match length starting at 0)

Now, we're still sending the offsets raw, but of course the offset is highly predictable too. Given the current context (or whatever type of model you want to use), you're much more likely to match certain offsets than others. You shouldn't just code the offset as a backward index into the window, you should assign a probability to each. The "ROLZ" and "PMR" LZ coders of today do this to some extent.

(aside : ROLZ = reduced offset LZ; basically means only code matches that share some amount of context, perhaps also MTF the offsets within the contexts; you code many fewer matches, but that's a good thing because ROLZ coders generally use a strong context model for literals; in fact if you use a powerful PPM for literals then the fewer matches you code the better your compression is. PMR = previous match reference, just codes references to previous matches in fewer bits because they are more likely, encoder also favors choosing PMR's over other offsets).

We can code an arbitrary offset with some modeling by reordering them. Make offset 0 be the most likely, up to offset N the least likely. One way to do this is to order by context match length. Say the longest context match is L symbols, then all the offsets that match with a context of length L are first, then all the offsets that match at length (L-1), down to length (0). Within each context, order so that more recently seen & used offsets are lower. So to code an offset, you still find the longest match, then you see where that offset is in the list ordered by likelihood, and output that index. (alternatively you could just context code the offset in the normal context way).

This is very closely related to LZP, Block Sorting, and Fenwick's Symbol Ranking stuff ( for example ).

We can do something similar in a slightly different way by sorting. Consider all previous possible match spots. Sort them by their context string, backwards, like for PPM. Now to match your current string, find the longest match with your forward string. This is the offset you want to send. To send it, find your current context in the sorted list - you'll find the longest match (actually you'll be between two strings in the list). Send the difference between your position in the list and the position of the longest match. Hopefully this delta is usually small, because the best match usually comes from a context similar to yours. The length of match is like before - the offset implicility send the length, you send the additional length after the implicit.

Obviously our goal has been to make the offset and match length both very close to zero. I mentioned block sort and symbol ranking before; the statistics of the codes we're generating here should look very much like those statistics.

Well, holy crap, this is exactly ACB !! (ACB = Associative Coding by Buyanovsky). I found a paper on ACB by Tomas Skopal or you can read this or this on my site.

ACB was never given much attention for a few reasons. 1) George's english is awful and he could never write a decent paper on it, so it never got into academia, his terminology is all non-standard and very confusing, 2) it's way the hell off the optimal curve in terms of speed vs. compression. It's slower than PPM and can't compress as well. It is however, interesting to me in a theoretical sense, because it's sort of like as good as you can do with dictionary compression, and as we see it's extremely similar to LZP, block sorting, and symbol ranking. In fact all of these coders wind up sucking in the same way, because by simply turning your correlation knowledge into an index with low mean, you're giving away a lot of good local information that a plain context modeller could do more with.


I stumbled on some interesting interviews with Stepanov, the main guy behind the STL, here and here . He's a bit of a nut; there are lots of little funny gems in there, like : "I think that object orientedness is almost as much of a hoax as Artificial Intelligence."

Followup on yesterday's rant : apparently the guy who did the STL sort is Dave Musser; he has an article on his introsort which is apparently the good sort that's in the SGI/STLport STL.


Sarah Silverman is coming back! I heard she was cancelled but she's not, OMG OMG OMG.

Plus "Sunny" is back on. Season 3 was kind of hit & miss, partly because they started using writers other than the original creators, such as in "The Gang Gets Invincible" and "The Gang Gets Taken Hostage" which were very sitcom. Hopefully the new stuff is back on form.


If you search for pages on sorting (real world practical fast sorting), you might find Paul Hsieh's page or this comparison and good implementation or this detailed page with lots of graphs . But all of them are slower than std::sort. Where's the guy who wrote std::sort ? Why doesn't he have a web page? And why aren't there tons of web pages of sorts that beat it? std::sort should certainly NOT be the fastest sort at everything. It's a very very very good implementation, but it's a compromise that trades off reasonable code size and flexibility, and of course sorts are fickle bitches and it should be easy to cook up some test set where your weird sort is better than std::sort. It's kind of like these guys came up with compression algorithms that are worse than pkzip, but they published them anyway, when something better is right there for anyone to use for free.

Of course it's all sort of pointless because if you actually are doing something where sort speed really matters you should use a sort that's customized for your application. For string suffixes I'm currently using Michael Maniscalco's . For floats you can use Pierre Terdiman's radix sort. Hmm.. I just searched for the link and Google tells me the site will harm me. Better wear an e-condom if you go to coder corner. Oh yeah, apparently Michael Herf has an improved version of Pierre's radix, so just go there.

Actually it's not pointless. Having the std::sort as a good baseline is extremely valuable. It's a sanity check that lets you know if you're doing something reasonable. Good libraries like this are extremely valuable *even when you don't use them* because you can test your code against them and know you're not off in lala land. Stuff like dlmalloc and tcmalloc are great for the same reason; I've never actually shipped a product with either, but I consider them invaluable because I can easily link with them and make sure my allocator is at least as good as they are.


I found a cool windows tip . Windows uses the read only flag on *dirs* to mark them as special. If you're like me and you hate all the windows special folder shit, you can disable it easily by doing "attrib /d -r dirname". For example this will make it stop calling the fucking "My Videos" folder in shared documents "Shared Videos". Just show me the real fucking dir name mkay thanks.


I think the obsession with perfecting minutia is very unhealthy and illogical. Everything is so incredibly fucked up that going off and iterating over and over on some little thing is usually not the best bang for your buck. Certainly in game development you usually want to attack your biggest flaws, which are all over in systems that people don't usually spend much time on (UI, load times, controls, etc.) At RAD I'm seeing some of the upside to iterating on minutia. When you make one really great tool, even if it is rather narrow in focus and everything around it still sucks, it means you can use that one tool with confidence and not worry about it sucking.


Compression's really fun to work. For one thing the algorithms are fun, the theory is complex and mathematical, but the practice involves a lot of trial and error and hacking. It's a nice intersection of science and engineering. And then you have an absolute metric to test yourself on. In a lot of fields I find people going "oh yeah our method for this and that is awesome". Is it? Really? Did you test it? Against the known best algorithms, on the exact same data that they used, with a non-subjective measure of performance? No, you probably didn't, and in fact your method probably sucks bawls and you're just kidding yourself. Compression lets you check your numbers and there's no way to hide.


So here's my screen saver : cbsaver zip 112k ; with source code ; it's crufty as hell but IMO it works pretty nicely. It doesn't do multimon or a lot of other junk screen savers should do, like the preview drawing is just a pink fill. I should also make the transitions actually use MMX or something fast, since like hey I work for the company that does the fastest damn software color junk in the world.


Fucking upstairs neighbor woke up at 6 and clomped around in the bedroom over my head. I lay in bed clutching the pillow over my ears, dreaming of beating the bloody shit out of him (literally), imagining the pleasant weight of a hammer as I swung it and cracked his skull, I pictured lurking in the stairwell waiting for him to come down and jumping out and pushing him down the stairs. I haven't slept well in days and it's making me depressed. I have no energy, a mild headache, and a sour demeanor. Some days it feels like all I do is fucking drive back and forth to work. I hate all this rushing. I cram breakfast down my throat at 10 AM when I realize I've been working all morning and haven't eaten yet and I need to get to work. At night I get home at 9 and realize I'm starving because I skipped lunch and have to just cram dinner down my throat. I like to take my meals slow, cook, enjoy it. Eating lunch feels like such a huge waste of time, I feel like I have no hours to work if I take a lunch break.

The commute here is really awful. It's almost impossible to avoid traffic. It's trafficy in the morning from about 7 AM to 10 AM. In the evening it's trafficy from 3 PM to 8 PM. The really frustrating thing to me is that it's such a short distance, and there really aren't that many cars on the road. The traffic is 90% just because these people are all fucking retards. They see a car half a mile ahead touch the brakes, so they slam on the brakes, and then everyone behind them slams on the brakes and we get a huge lockup. I hate to be that California guy who's always talking about how it's so much better there, but my god the drivers here are so much worse. Every single merge is a clusterfuck. The people merging don't get up to speed, the people being merged into don't change lanes, and they try to be "courteous" by slamming on the brakes to let the damn slow merger in.

Fuck, this rant tires and bores me.

I'm sick of my energy and vitality being drained by people who are just fucking bores or assholes or morons. I don't want to deal with any of you. I can't teach you, I can't improve you, you fucking suck, get away from me and stop dragging me down. I want a cabin in the woods. Perhaps a yurt on a mining claim. I'll run naked in the grass, feel the sun on my skin, jump in the freezing water, slink through the forest, speak with the deer, and be happy.

The drivers here are like holier-than-thou know-it-all sort of nosey minister's wife think they know best, but also retarded. You get a lot of scenes like when two people pull up to a stop sign and they both sit there and wave for each other to go. That is not courtesy, that's just retarded. And in fact its very discourteous to the people behind you. Somebody just fucking go!


So I wrote my own screen saver. I just wanted a fucking decent "My Pictures" screen saver. My media PC runs on an old CRT TV, and those things have this property that if the screen image ever changes really abruptly, they make a weird hiss sound, like electricity crackling. So I tried turning on "transition" in the My Pictures screen saver. LOL. What it does is first completely blacks the screen, and then does a transition effect to bring in the new image. Then when it's time to bring in the next image, pop to black, then some lame effect.

So my screen saver just cycles through your pictures and actually does smooth transitions. And hey, I also made it grab the desktop and use that as the first image to transition from, so it comes on smoothly too. And I wrote a couple little transition effects to start. I'm doing them in software in C so they're slow as hell at the moment (at 1920x1200 or higher). It would be kind of fun to do transitions with D3D but I don't really want to worry about my screensaver using the 3d hardware right in all cases that it can come on.

I was pleased to find that MS actually made it pretty nice and easy to make screensavers. Hey it's just an EXE that you rename to a ".SCR" and you can do whatever you want in there.

Anyhoo, I'll post it in a few days once I get a freaking settings dialog box and such done.


It's interesting to me the way H264 is designed. The coder is so simple, it really relies on the good motion compensation + bit allocation to do a good job, and it does very well. What this means is that the when image is easily predicted from motion vectors, you get a very high quality reproduction. When the image does not map to motion vectors, you get noise until it settles down. This is exactly the right way to create error from a human visual point of view. As long as there are still or simple translation elements that our eyes can track, we are very sensitive to error, and H264 does very well in those cases. When something changes in a non-linear way, like it rapidly changes direction randomly, or a new object is created via explosion or expulsion or whatever, then it appears distorted until it becomes linear, but that's exactly when our own visual system sort of fuzzes out the object anyway!

You can see my point better perhaps if you contrast with the opposite way of doing things. Something like motion-jpeg2000 might have roughly the same CPU usage, but it's not doing any motion comp and instead spends that CPU time on a better transform & still image coder. The total error rate is the same (pretend), but the motion-jpeg2000 coder is putting error all over the place, and we'll see horrible wiggles in still spots or simple translating spots.


As a walker and cyclist, I fucking hate hybrids. They creep up on me silently, I pull out to make a left turn and WTF there's a car behind me! I rely on my ears to know when cars are coming up from behind me. They should stick baseball cards in their spokes.


I've been noticing recently that everyone seems even more retarded than usual. I just got a hint what the problem might be. Low dose daily Cialis. Everyone's walking around with a halfie and no blood in their brain.



Fu Manchu emoticon


I just had this idea for embedded unit tests. My big problem with unit tests has always been that they're off in seperate code that's hard to maintain and gets out of date and is too much of a pain to create and people don't do it. Also it's not as valuable as an assert that's right there in the code as documentation when you read it.

Now, I have a COMPILER_ASSERT thing and love that, but it can't call functions, you can only check constant conditions. What if you could do a compiler assert that was right there, something like :

REQUIRE( ilog2ceil(256) == 8 );
REQUIRE( ilog2ceil(11) == 4 );

Now, in the actual compile there would just be a

#define REQUIRE(exp)

that makes those expressions do nothing, but you would run your own custom build step that scans the file for "REQUIRE" and grabs all those expressions and puts them in their own little .cpp file, compiles it and runs it with

#define REQUIRE(exp)	if ( ! (exp) ) { make bad error report, send people email, whatever }

The stuff in REQUIRE might use things in the file, so the easiest thing would be for the autogen to make the test function at the bottom of each .cpp Then call them all from main. The autogen would also have to disable the original main somehow to run its own main. In VC you would make the autogen thing a custom build step as part of your project build, so that a unit test failure would result in a failure to build.

Actually I can do something without autogen that's not bad. With cblib right now I can do :


That executes at startup so it can't make the build fail, but hey I can do it right now without any autogen code madness.

For longer unit tests you can just put them in a function, like :

static bool CheckLotsOfStuff()
	.. blah blah ... do unit testy stuff here ...
	return true;

REQUIRE( CheckLotsOfStuff() );


Followups :

Lomont says you can't rely on the behavior of shifts on negative numbers. IMO if you start dealing with really weird platofrms then you need to make wrapper functions for everything and customize them for each platform.

I've been thinking about negatives and two's complement and such. It's one of those things that's common knowledge and all, but it seems new and confusing every time I think about it.

The easiest way for me to understand it is to think about ints as a ring. Unsigned ints are a ring that go from [0,FFFF] and then loop back around. Don't think about the integer line as a straight line, it's a circle like a clock. You count up to 65535 then +1 and you're back at 0. Similarly you can subtract by going backwards on the clock, so 3 - 7 = 65532. Sometimes I get confused when I'm doing subtracts with unsigned values, but of course subtracting unsigned just means walk backwards around the ring.

Now, signed ints are just exactly the same thing, but offset into the middle of the range. Note with signed ints, 0 is part of the "positive" half of the range. It helps to think of 0 as positive, and the actual middle of the range is at -0.5. So the positives are [0,32767] and negative is [-32768,-1]. This is also a ring, it loops around, when you take 32767 and add 1 you get -32768.

Conceptually you could pretend that a signed value is unsigned offset by half, like S = U - 32768. That's not how the bits are actually stored, but it gives you equivalent operations. It makes it obvious that addition and subtraction and so on produce the exact same values, because you're just offset by a constant amount around the ring.

In terms of data compression if you think about the ring, in unsigned ints, 0 and FFFF are actually neighbors, but they look very different bitwise. As an example, we can consider the JPEG-LS or PNG style bytewise delta operations :

D = A - B;
A = D + B;
These are reversible in 8 bits. That's not completely obvious. The range of D is actually [-255,255] , it would seem we've expanded our 8 bit values to 9 bits. But depending on the value of B, not all of that range is possible. A - B is the delta from B, which is the number of steps on the ring from B in the positive direction. If A is less than B, you'll loop all the way around the ring, eg. 3 - 7 = 252 steps around the ring forward. That number of steps is always in [0,255]. To undo the delta you just take the same number of steps. Obviously you don't want to transmit this because it's offset, but
D = A - B + 128;
A = D + B - 128;
is nicely centered. Again the ring wraps when A-B = 127 to 128 you get a bitwise discontinuity, but that's okay because in compression that should be very rare, and those values have equal probability.

Speaking of probabilities, I wrote a little thing before about how to shift and preserve average by making half round down and half round up. It's an exercise for the reader to make that work on negative numbers too (aka, I don't want to do it myself).

Writing with Chris it made me realize that division has a very weird property with negatives. Int division goes to zero (in C). Shifts always go "down" (towards minus infinity (on x86 anyway)). Int division basically takes the abs, does a positive divide, then puts the sign back (it's not actually done this way of course, that's just the concept). Shifts are actual bitwise register shifts, so the top sign bit gets in. What this means is that integer division actually makes you switch directions of rounding at zero. That never really seemed like a big deal to me, but man it's really weird. For example, if you have a bunch of signed ints and take the average, now you divide them all by two in ints - take the average of the new numbers and multiply that up by two. With real numbers you'd have the same value. With positive ints, what you get is everything shifted down towards zero a bit, so the result is less by 0.5. If you had a bunch of numbers whose average was zero before, the average is still zero! Because half went down and half went up!

In terms of probabilities, if you have a bunch of ints scattered evenly over the int line, and you divide them all by two - the probabilities of the result are no longer evenly distributed! The probabilty of a zero is 3/2 the probability of any other value. All values in the result come from 2 values in the source, eg. {7,6} -> 3 , except for zero which comes from 3 values! {-1,0,1} -> 0. This makes it obvious why you can never use division of negatives for wavelet transforms or any kind of reversible mapping, because you are mapping too many values to zero. BTW in a "deadzone" style quantizer we actually do this exact thing on purpose, we make the zero more likely than anything else.


Ten years ago people were predicting the end of consumer electronics. The argument basically went like this : computers can do everything - play CD's, be a word processor, be a TV, etc. so eventually you will just have one powerful computer in your house and all the consumer electronics will go away. The idea was that the consumer would rather have one multi-purpose device that can do anything rather than a bunch of single-function devices.

The reality is that powerful complex devices like a multipurpose computer don't work very well. It's hard to make good UI's that are right for all those different functions. It's hard to make absolutely sure that they work right all the time and never crash, because they're so complex. And it's hard to teach your grandma how to use them.

The exact opposite of that old prediction now seems to be the trend of the future - rather than 1 powerful computer, you will still have tons of single-function devices - they will just all be computers! A computer to watch TV, a computer for video games, a computer to browse the web, a computer for business software. They'll be very powerful, but very cheap, with special single-function software that's easy to use and reliable. This is of course already happening.


Macros are really the fucking devil. It's unfortunate that they still are faster than inline functions. Maybe on MS compilers with LTCG the inline function is pretty close, but most compilers just don't inline like they should. Macros are hard to use, you have no argument info, when you pass in the wrong thing the error messages are impossible, they aren't type safe which can make them fail very mysteriously, you can't trace in with the debugger; I'm finding myself wasting a lot of time debugging stupid things because I'm using the macros wrong. If I never use another macro again in my life it would be a happy life indeed. It would be ideal to just write inline functions and have them act like macros in the compiler.

I really fucking hate unsigned types. I've got unsigned types all over for no good reason at the moment and they keep giving me bugs where I try to count backwards or something, like

for( ; pos>=0 ; pos--)

Hey I just wrote an infinite loop cuz freaking pos is unsigned and I wasn't thinking about that and I don't want to have to worry about that all the time. IMO it's best to just use signed types unless you specifically need to allow two billion - and as Won points out if you actually are trying to handle two billion in an unsigned int you have to be super careful (eg. never add two numbers that can cause a carry ,etc.).

I really really like parameter checking, but people mostly do it wrong and don't get the point. Checking pointers for NULL, for example, is pretty worthless (unless it's a release-mode check that throws; I'll assume your parameter checking is just an assert). When someone passes in a NULL they will get a very obvious crash, the system provides a nice message for that already, and you can trivially see what caused the problem by going "hey that pointer is NULL". The good parameter checking is for conditions that will cause non-obvious failures that may be hard to debug and may not cause a crash at all, just incorrect operation. For example, things like :

U32 ShiftRight(U32 value, U32 shift)
	ASSERT( shift >= 0 && shift < 32 ); // !! 32 is not valid !
	return ( value >> shift );


void memcpy256(void * dest,void * src,int count)
	ASSERT( abs( (int)dest - (int) src ) >= 32 ); // must be >= 256 bits apart !!

	// ... do 256 bit memcpy

The other thing people often do wrong in parameter checking is go the opposite way and check too strictly on conditions that are just how they think things should be, but aren't actually necessary for the function to get right. Stuff like :

void Matrix::Transpose()
	ASSERT( IsOrthonormal() );

	// ...

where some jackass put too strict an assert because he was thinking you wanted the inverse of the matrix, but that's not the only reason to use transpose. Asserts should check for cases that will cause failure, not checks for being in the expected bounds of operation. Another common kind of bogus parameter check I see stuff like :

float lerp(float from,float to,float t)
	ASSERT( t >= 0 && t <= 1 );

	return from + (to - from) * t;

Umm, no, this parameter check is all wrong. It doesn't prevent any bugs, extrapolating is perfectly valid and will return a reasonable and correct answer.

I also think that result checking is perhaps more important than parameter checking. Especially in cases where a function does something really complex that you're not sure you got right, but is easy to check. For example :

uint ilog2ceilasm(uint val)
		FILD val
		FSTP val
		mov eax,val
		add eax,0x7FFFFF // 1<<23 - 1
		shr eax,23
		sub eax,127

uint ilog2ceil(uint val)
	ASSERT( val > 0 );

	uint ret = ilog2ceilasm(val);

	ASSERT( (1 << ret) >= val && val > (1 << (ret-1)) );

	return ret;

The result check does more than just make sure I wrote my code right. It also implicitly checks the parameters because invalid parameters will cause the result verification to fail. It also provides a constraint for modifications. It makes it clear what contract this function has made with its clients, so that if you try to optimize in the future you know exactly what you must satisfy, and the assert tells you if you screw up. (obviously we'd love compile time automatic dynamic analysis instead of relying on the actual bad case to run and trigger the assert)

Of course asserts and parameter checking and such should never be used at all on stuff that comes from users or artists. That stuff should be manually checked with code and they should be informed about bounds violation etc. We had a pretty good rule at Oddworld I think - any condition violation that can only be generated by coders errors is an assert and you can halt the game if necessary, any condition violation that can be caused by bad data or artist error should be a log/throw/warning & you try to keep running despite the error.


Fucking Eudora 5 handles multiple personalities really badly. When you reply to a mail, it should always use the personality that received that mail as the originator of the reply. It doesn't. Instead the "current personality" is sticky and gets set when you do anything with that personality, including just receiving mail. This is screwing up my new rad personality.

And no I won't use any web mail ever, or any type of mail where they aren't just raw text files on my own computer. I tried going up to Eudora 7 a while ago and there were some things I didn't like but I forget what they were, and I don't recall if they fixed the dumb personality behavior either.

How fucking hard is it to write a decent mail client? It's one of those things I should just do myself. Right after I write a fucking music library browser and a fucking image editor. (yes, I'm looking at you, iTunes and Photoshop).


Wow, some wacky shit has been going on here.

A few nights ago a girl came on my front lawn. I had just gotten home from work around 9:30 and was setting up my laptop and stuff when I hear this loud moaning coming from somewhere. I listened to try to hear where it was coming from, and it seemed like it was coming from the window, so I walked over and looked outside. There was a couple on the sidewalk, arms around each other, and the guy had his hand up the girls skirt, vigorously finger-fucking her. She literally collapsed onto the lawn and he fell down with her and kept going at it for a minute or so with her wailing. My front lawn is not large and in no way is it at all private, there are apartments all around and no bushes or any kind of shelter. It was quite surprising.

Tonight I watched my upstairs neighbor buy drugs. I was standing near the front window lifting weights and watching TV. I hear neighbor come pounding down the stairs, and just as he does a car pulls up in front of the building and drives right up onto the curb (our street is only one lane wide, there's nowhere to pull over). Upstairs neighbor gets in the passenger side of the car and they just sit there. I can see very clearly into the car whenever the door is opened because the light comes on, but it's less clear once he closes the door. I see the driver go into the center console and pull out a ziploc bag. Upstairs neighbor rustles around in his pockets. After 30 seconds or so, the passenger door opens again and I see upstairs neighbor making sure what's in his pocket is definitely inside. He gets out and comes back in, car drives off. This explains the coming home at 3 or 4 AM and clomping around on the floor above my brain. I just wish he would get on heroin instead of coke.


Block-based compressors can be lapped and use fancy coders and all that to compete with wavelets in most every way. But there is a really nice thing about wavelet compression as opposed to block-based. It's the resolution independence. If I take a photo at 1024x1024, and take the same photo at 2048x2048 and compress them both - the wavelet coder has the smaller one as a subset of what it does; it basically just keep doing the same thing, it still codes the LL DPCM band at the same resolution, 32x32 or whatever you chose. In contrast, the block-based coder is now working on very different data. It still works on 8x8 blocks, but they are now half as big in real space, they contain detail at a different scale. The DC coefficient set is now 256x256 instead of 128x128 and the DPCM properties of it are quite different.

One thing this means is that the block based coders that are super tuned (like JPEG,MPEG,H264) are actually dialed in for a certain era and a certain typical image resolution.


Vector dot products and matrix multiplies need to be accumulated in doubles. This is super obvious and I'm sure all the scientific computing people are going "duh" but I bet 99.9% of game engines don't do this. A matrix multiply is just a bunch of dot products, so let's consider the dot product :

dot = ax * bx + ay * by + az * bz;


dot = x + y + z;

since all that really matters here is the accumulation. I'm going to assume that this is done like :
dot = x + y;
dot += z;
though the exact order of summation depends on the language convention and compiler and whatnot. The really bad case happens when 'x' is small, 'y' is large, and 'z' is large and nearly exactly negative 'y'. When you compute (x+y) the precision in x is killed, then you add on 'z' and it cancels y, but you get something that's nowhere near where it should be (in a relative sense), eg. (x+y)+z is way off from x+(y+z).

This may seem rather obscure and arcane but in fact it happens all the time in very common cases. When 'a' and 'b' are two vectors that are mostly in the YZ plane (small x component) and are nearly 90 degrees off each other, this is what happens.

Its also super trivial to fix and the performance cost is near zero on platforms where the native float is double anyway like the PC.

I've had so many problems in the last few years with mathematical code failing due to precision, that I now just use doubles everywhere until I see some very good reason to go to floats. Stuff like the SVD and PCA can very easily lead to total garbage if you don't have enough precision in your temporaries. I learn the lessons of coding over and over. Don't prematurely optimize. Always be extra safe and robust until you have a good reason not to be. And check your results even when you're sure you don't have to. I only know that I have had precision problems because I do things like after I get the singular vectors out of the SVD, I go back through and dot them all against each other and assert that they're actually orthogonal to each other.


I've been trying to figure out what's happened in data compression in the last 10 years. Here's my quick run-down :

H.264 : the new video codec; a very simple 4x4 transform used with highly tuned quantizers and predictive coders, and very sophisticated motion estimation and bit allocation. Many different profiles support different bit rates. There are *TONS* of papers on this beast and every aspect has been highly tuned.

PPMii / PPMd / PPMonstr / Durilca : stuff by Dmitry Shkarin. This was a while ago, it's the successor to PPMZ and very good. Some code and papers are available. One of the things that he does that everyone does now is detect data types and do some special munging to make the data more compressible.

PAQ by Matt Mahoney : bitwise coder with lots of context models and weighting, sort of like CTW, but really more like PPMZ. Uses SSE similar to the SEE in PPMZ. Model weighting is well done, uses many types of models such as models with skips. The best compressor at the moment, very slow. Bitwise compressors like PAQ are somewhat easier to write than byte-wise, because you don't have the issue of escapes. Escapes and sparse contexts are really hard to get perfectly right, with bits you always have a 0 and 1, you just need to put a probability on each.

Preprocessors & dictionaries are super common now. Lots of people use a variant of an "LZP" preprocessor, Shkarin and Mahoney both have one. Most of the top compressors will decompress already packed data so they can recompress it, they do an E8E9 transform on exes, and most are conditioned with some training model. (E8E9 transform changes relative jumps to absolute jumps; that way function calls to "strcpy" or whatever always look the same in bytes, so they're much more compressible). Most compressors also detect binary data semi-automatically and use skip-contexts.

ROLZ : aka "reduced offset LZ" ; apparently this is in RKive and many of the Russians are using it, eg. I believe it's in 7-Zip. I haven't found any actual papers or good descriptions of how they do it exactly, but the basic idea is to do something like an LZP to build a list of only a few possible match indexes using context prediction, write a code to select one of those with context modeling.

There are a bunch of big packages with source code, such as 7-Zip, LZMA, FreeArc, BALZ, and I'm sure many more. While this is cool, they seem to be huge messes with little or poor documentation. Holy crap the Russians write really god awful unreadable code. A lot of them look they were intentionally written for an obfuscated C coding contest, stuff like :

m = H(++*p, i & ( -i >> DS )) && *++p | ( dec(m.[(i>>MS)&0xFF) + t );
These are some actual lines of code from Shkarin's PPMD (and seriously, he is definitely one of the cleanest Russian coders) :
    SubRange.HighCount=(SubRange.scale += (SubRange.LowCount=LoCnt));
        psee2c += 2*(2*NumStats < t+NumMasked)+Flags;
        (FoundState=p)->Freq=(HiCnt += 4);  SummFreq += 4;

In the super high lossless compression area, there's a bunch of talk in the Russian forums such as encode.ru . It's super hard to actually find the good information in there though.

Lapped image transforms. Malvar's page at MS is a good place to start on this. Basically improves quality & compression of block based image coding by using basis functions that extend past the blocks. An alternative viewpoint is that a Lifting based smearing filter is run on the blocks at decompress, and the inverse filter is run at compress.

There's some new interesting stuff on blocksort. Jurgen Abel after BWT , and Michael Maniscalco has a bunch of stuff.

For lossless image compression, it seems to be stuff like GLICBALS and MRP , basically still DPCM but with rigorous linear predictor training.

There are some new arithmetic coders. One I found is "rc.c" in the MRP distribution which is called the "Russian Range coder". It's slightly different than the standard Schindler implementation that I use, but seems to preform almost identically. Another one is coder used by PAQ, the best one seems to be "fpaq0p". Again it's a very slightly different binary arithcoder implementation than the one I use but seems to perform almost identically. The big new one is the strange paper by Jarek Duda called "Asymmetric Binary System". The paper is totally unreadable IMO, it's more useful to try to find people's notes on it around the web, such as this .


All the theory of digital signal processing can get a little silly. People do all this analysis of the aliasing properties and frequency response of these various filters, they come up with these massively complex things like using sinc functions windows by Kaiser-Bessel functions - and then you wind up using it to change an image size by 2X which means you actually only evaluate those functions at 5 discrete values. So it's just a table of 5 floats that looks sort of like a hump.

I'm doing a little lapped image transform. To do a 4x4 DCT you need an 8x8 lap window, but by symmetry and reversibility it's separable to 8 taps, 4 are just mirrored, and half of those are constrained by the mirrored sum = 1 rule, so there are only 2 free floats. So your lap window can be a kaiser or a sin(sin^2) window, or you can just parameterize by two floats and tweak them to maximize PSNR and subjective quality.


I'm pretty fucking delighted with safeprintf. I forget that I'm even using it most of the time, since I mapped the names "printf" and such to redirect through my safeprintf. And then today I run my app and it says :

safeprintf type error; wanted (int), found (float)
safeprintf err in fmt "got option : quantizer = %d

but then it just keeps rolling like nothing happened!


So my main project for RAD that I'll be on for the next N months is a big file system / data packaging / DVD / paging thing. If you are a game developer and have thoughts on your content pipeline & loading system, I'd like to hear it; particularly if you like your pipeline and think it would be good for me to learn from, or if you have some ideas on how things should be, or if you think it really needs to handle a certain something for you to use it.


So I did my first project for RAD, which was using the PCA to find the KLT for color conversions. We're also going to be doing optimal channel decorrelation for audio, and in fact it can be used for other things, such as coordinates in geometry; basically any channels of data that are correlated is interesting to consider.

A comprehensive list of (decent) color conversions for image compression :

General note : color conversions which involve floats should & can be used for lossy image compression, but require a whole pipeline made for floats; many color conversion scale the components so again your quantization system must be aware of that. Lossless color conversion that go int->int are easier to analyze. They generally use lifting of some sort.

YUV or YCbCbr like in JPEG ; this is lossy and crappy.

KLT : matrix multiply by the PCA of the color planes of the image.

fixed KLT : constant KLT matrix optimized on the Kodak image set; you can find this in the FastVDO H.264 paper.

DCT3 : 3-sample DCT. Whoah, when I thought of this it gave me a huge "duh" moment. It's actually a very very good float->float matrix color conversion. Implemented as just a 3x3 matrix multiply. In fact there's a theoretical result that the DCT optimally decorrelates any data which is just Gaussian noise with order-1 correlation, and in fact hey color is very close to that.

YCoCg : both lossless and lossy versions. See Malvar papers at microsoft research or his H.264 submission. BTW this is equivalent to doing Haar[R,B] then Haar[G,B] (the Haar acts in place like {x,y} <- { (x+y)/2, y-x })

CREW aka RCT aka J2K-lossless : older lossless transform; pretty much always worse than YCoCg

FastVDO : another lossless transform proposed for H.264 ; matches the KLT slightly better than YCoCg , but actually usually codes worse than YCoCg.

This last one leads me to a general issue that was somewhat confounding :

Decorrelation is not the same as real world coding performance. That is, the most decorrelating color transform (the KLT) is not the best for coding in most cases. In fact, the KLT was quite poor. I did come up with some heuristic tricks to make a pseudo-KLT that does code quite well.

There's a theoretical measure of "coding gain" and the KLT maximizes that, but when run through real coders it falls down. I'm not sure at this point exactly what's happening. I have some theories; one issue is that the original RGB is not a Gaussian float, it's integers, so things don't behave smoothly; for example, long long ago I wrote on here about how D(Q) is not smooth in the real world, that is the distortion for a given quantizer does not increase monotonically with Q; it has special peaks when Q hits rational numbers, because those values map ints to ints better. All the theoretical literature on rate-distortion is almost garbage because D(Q) and R(Q) are so non-smooth in the real world. My other theories are that the oblique rotations the KLT sometimes takes is essentially making the bottom bit random which is hurting the spatial prediction of later coding stages.

One interesting case for games is compressing images with an alpha channel. In that case, the alpha channel can be losslessly predicted from a linear combination of RGB, which is a very good model of many alpha channels, which leads to them being packed in only a few bytes.


Seattle's really quite beautiful all summer long; our neighborhood here is delightful, tree lined streets and old houses. There are tons of great parks and we've been tossing the frisbee when I get away from work.


WTF , there's no fucking perforce option to tell it to NOT FUCKING CHANGE MY LINE ENDINGS !! That's so fucking retarded. Those guys definitely suffer from the deplorable mania of "we know the Right Way and we don't care if 99% of our user base wants something different, we refuse to give it to them". Witness for example their refusal to provide some candy syntax for a rename despite it being the #1 question in the FAQ for lo these last 10 years.


Int division takes you toward zero; shifting right takes you toward -infinity :
-1/4 == 0
-1>>2 == -1
-5/4 == -1
-5>>2 == -2

To divide an int by 2 and preserve exact average of a bunch of ints, you need to push the odd values up half the time and down half the time; this is :

( val + ((val>>1)&1) ) >> 1

One way to get rid of signs is to interleave the positive and negative halves of the lines : (0,1,2) -> (0,2,4) and (-1,-2,-3) -> (1,3,5)

if ( val >= 0 ) return val*2;
else return -val*2 - 1;
I wouldn't actually code from this directly, however, what you can do is first code the 0 or not-0 flag, then code the sign bit, which is just (val&1) , then code the magnitude :
if ( positive ) mag = val>>1;
else mag = (val+1)>>1;

To code a value with a Laplacian probability distribution ( P(x) = c * r^x ), simply using a Golomb code with W = average ; this code is :

top = value / W
bottom = value mod W = value - top * W
send top with unary
send bottom raw


Some further little notes about PCA :

The primary principal component is just a linear fit to the data; then the other axes are perp and point in the next primary axes of variance. This is why a lot of people heuristically think of it as the "best fit" to the data.

For doing 2-means or BSP splitting, the primary principal component of course tells you the axis to split along. To find the best BSP plane, you can search along that axis in O(NlogN) by sorting all the data by their dot along that axis and trying the plane between each consecutive pair of points. Since you're incrementally stepping you can just increment to count the number of points on each side, as well as the mean & sdev of points of each side.

You should remove the means to calculate the covariance, but of course you don't have to remove the means to transform the data. The mean is just multiplied by the basis transform and becomes the mean in the new coordinate system. If you do remove the mean, then of course the mean in the new space is also zero.

You should *NOT* divide by sdevs in the covariance. The true "correlation" does divide by sdevs to give you a relative correlation measure that's scaled to [-1,1], where 1 = 100% correlation. The "covariance" without the sdevs divided out will have units of [value]^2 , just like the variance. A good thing to display is the *root* of the covariance, since that's in [value] units. I actually find root-covariance a more useful measure than the true correlation, because it tells you how significant the value correlation is in the original scale, and it won't exaggerate up small wiggles; obviously this depends on the problem you are solving. eg. [4040404] [1010101] has a root-covariance of 2, a correlation of 1.0 , while [808080808] [80808080] has a root covariance of 8.


Some things naive wavelet coders get wrong :

1. Extra truncations that just throw away data. One way to say this is they basically do math in floats and then convert to int, and do it multiple times. Often they do their math "in ints" and right shift away extra bits, which is the same mistake. The easiest way to get this right is just to read in your input image to floats, do your color transform, your wavelet transform, all in floats. Then when you quantize, quantize float->int and do your coding on ints. (of course you could also just code the floats, instead of doing bit masks for bit plane coding you check if value >= thresh).

2. Quantizing wrong. Quantizing is pretty simple if you think about it in terms of buckets and medians. You're taking a certain range and labeling that with an int. When you dequantize back to floats, you should restore to the middle of the range. (actually you should restore to the most likely value in the range, which might not be the middle in a typical skewed distribution, but that's a very minor detail). Many wavelet quantizers intentionally use an ESZZ (Extra Size Zero Zone) quantizer, which is an artificial way of getting more zeros, which is generally a win because the coder likes lots of zeros - but an ESZZ is wrong on the DC LL band !. When you have zeros you don't need to send sign bits. One thing people frequently get wrong is handling negatives wrong; you have to be careful about trying to divide negatives and do float-to-ints and all that; I find it easiest to take the sign off and do my math then put the sign back.

3. Not accounting for norm scaling. If all your transforms were orthonormal you wouldn't have to do this, but most wavelet transforms are not. What that means is that a value of 1 in the LL and a value of 1 in the HH do not have the same L2 norm after untransforming. That means you can't quantize them with the same value, or use the same bitplane truncation. Quantizers need to be scaled, as do any rate-distortion functions.

4. Not sending the most important bits first (assuming you want an "embedded" aka truncatable stream, which you do). Obviously when you are planning to chop off the end of your data you need to sort it by importance. To really get this right you need to be able to shuffle in different bits. You want to send the Y LL first, then the U LL, V LL, then the Y LH, etc. but not necessarilly in a fixed order; in some images you might want to send the whole Y band before you send any U. That's global level stuff that even value-based coders should do. In the extreme case you can do things like the bit-plane coder EBCOT that splits the bits of a row into chunks to try to get the most important bits first in the stream. Note that each bit of the compressed stream always has the same amount of information; what we want to do is put the bits that have the biggest L2 norm significance first; that is to say, bits that wind up affecting a large region of the image with large value changes.


There are two ways to think about the Haar transform. Haar takes two values {A,B} and makes the average and difference. Naively it seems like this might be impossible to do losslessly in integers because the average has a divide by two. It's crucial that you make average, not sum, because you don't want "bit expansion" (that is, requiring one more bit each time you do a Haar). BTW Haar aka S-transform.

One way is this magic sort of trick with where the LSB goes. You form

L = (A+B)/2 
H = B - A

(all divides floored on integers); how do I get back my original values? The trick is getting the bottom bit right, and the key is that when H is odd it means you need to set a bottom bit on either A or B; if H is positive you should put it on B, if negative put it on A. This means :

A = L + (1-H)/2
B = L + (H+1)/2

An easier way to see it is via the generic lifting/predictor mechanism. You can always create a transform that is reversible by taking a series of predictor steps. One predictor step is :

A -= P(B)
You predict A only using B, and subtract it off; you can invert it of course just by doing +=. To make the Haar transform this way you do :
B -= A;
A += B/2;
To invert you flip the signs and reverse the order :
A -= B/2;
B += A;

P() the lifting predictor can actually be any function, it need not be linear to be reversible, but if we restrict ourselves to linear transforms, then it can be written as a 2x2 matrix multiply :

A = [ 1 X ] [ A ]
B   [ 0 1 ] [ B ]
Any transform like this with 1's on the diagonal and half the matrix zero is a valid lifting transform. This is a "shear". One useful fact is that any rotation can be written as a shear. In particular :
[ c -s ] = [ 1   0 ] [ 1  -sc ] [ c  0  ]
[ s  c ]   [ s/c 1 ] [ 0   1  ] [ 0 1/c ]

t = (c-1)/s
[ c -s ] = [ 1 t ] [ 1 0 ] [ 1 t ]
[ s  c ]   [ 0 1 ] [ s 1 ] [ 0 1 ]
Form A uses a scale first which is not reversible, form B is pure shears. Form B is good except when the sine "s" becomes small, in which case the temp value "t" becomes very large. Even ignoring divergence, a large "t" value is very bad for accuracy and bit expansion. You don't really want to be doing lifting shears with multipliers much bigger than 2.0 ; fortunately, when "s" is small, "c" is very close to 1 (or -1), which means the scale term in form A is nearly identity. In that case we can just approximate it as exactly one and not do the scale step at all. If we drop the scale a more accurate lifting rotation is just :
near s = 0
[ c -s ] ~= [ 1 0 ] [ 1 -s ] (* -1 if c < 0)
[ s  c ]    [ s 1 ] [ 0  1 ]
Using this, any matrix that can be written as rotations (eg. any orthonormal matrix) can be decomposed into lossless integer lifting steps.

There's a beautiful paper by Srinivasan called Modulo Transforms - An Alternative to Lifting . I don't think it's actually very practical because division and mod are so expensive, but it is a very nice explanation of integer transforms. The description of the quantization lattice and volume expansion is the best I've seen.

Let's look at the Haar again. It starts as a 45 degree rotation :

[ sqrt(1/2)  -sqrt(1/2) ] =  sqrt(1/2) * [ 1 -1 ]
[ sqrt(1/2)   sqrt(1/2) ]                [ 1  1 ]
This is orthonormal, which means it's L2 preserving, and BTW it's what the KLT would tell you to do if your data is 100% correlated; that is, the fit is through x = y.

To do this on ints, the scaling factor sqrt(1/2) ruins int->int , so the easiest thing to do is just don't do the scale :

[ 1 -1 ]
[ 1  1 ]
BTW this is called the "Hadamard transform". It's int->int, but it's expanding. It uniformly expands the L2 norm by *2 (the determinant of the matrix is the expansion factor). On ints, this expansion means that half of the output values in the valid range are impossible. In particular the two outputs always have the same bottom bit. In the Srinivasan paper he has some nice pictures of how int transforms give you a lattice of valid values with lots of blanks. Obviously we can get rid of the volume expansion by quantizing the output space. This just means dividing one of the two outputs by 2.

This is where it gets subtle. We can divide either of the two outputs by 2. Both choices eliminate the volume expansion, but neither choice is L2 preserving, and the L2 norm of the two choices is different.

V(x) = < x^2 > is the variance of x assuming zero mean
< y > means average of y

The Hadamard expansion is :

V(a+b) + V(a-b) = 2 * ( V(a) + V(b) )

The normal Haar (quantizing the sum) gives variance :

V( (a+b)/2 ) + V( a-b ) = 5/4 * ( V(a) + V(b) ) - 3/2 * < a*b >

Quantizing the difference gives :

V(a+b) + V( (a-b)/2 ) = 5/4 * ( V(a) + V(b) ) + 3/2 * < a*b >
Note that quantizing the difference is the same as negating one of the values before doing the Haar.

You can see the Haar transform is not variance preserving. It reduces the variance if the correlation between the values is high enough, in particular :

Haar reduces variance if :

< a*b >  >=  (1/6) ( V(a) + V(b) )

This is interesting; the Haar may actually be better than the original 45 degree rotation it approximates. There is some implicit data modelling going on in the Haar; like most wavelets it is not norm preserving, not orthonormal, and it behaves well (reduces norm) when the input data fits the predicted pattern.

Addendum : I found another nice paper ( PL Haar PDF ), on something called the "PL Haar" (Piecewise Linear Haar) ; the PLHaar transform is itself interesting, but aside from that they go over the same points I'm talking about here re: the Haar and range expansion, etc. , and they have some nice pictures.

BTW you can see the implicit modeling I'm talking about in the PL Haar very neatly as well. By design, the PLHaar preserves Linf norm, not L2 norm. L2 norm is (roughly) what matters for coding. In the square domain, stuff in the corners has the max L2 norm, and stuff along the sides has the lower L2 norm. By doing the PL 45 degree rotation, PLHaar takes the large L2 norm stuff along the x=y lines and rotates them to the sides of the matrix, giving them the lower norm.


I've been working way too hard; I'm having trouble getting anything done, partly because I'm really uncomfortable in the new coding environment, and it makes me want to just work harder to get past the problems. I have this personality flaw where when I'm focused on some problem, I just can't stop thinking about it. I wake up first thing in the morning and todo lists start popping into my head and I feel antsy until I start doing them. When I finally decide to quit working for the day, my brain is fried, I'm exhausted, I can hardly manage a grunt hello for my sweet girlfriend, and then the coding problems start going around in my head again. It's never pressure from the job that does this, it's the obsessive in me.

I miss the creature comforts of home at the office. The home is really a lovely place to be, I've got natural light, wood floors, windows that actually open for fresh air, and also crucially a kitchen. I miss being able to stop coding and go cook myself something. I've found myself feeling miserable at the office, and I think it's largely the light and air. I'm very sensitive to environments; of course it's my own fault for staying inside there too much.

I think the ideal work situation for me would be lounging in a lush courtyard like some greek philosopher, snacking on olives and almonds in the warmth and fresh air, chatting with other wise men, tended by attractive youths, and occasionally doling out words of wisdom to customers.

Universities have pretty sweet offices in the older buildings, courtyards or lawns around, lots of trees and windows, cozy rooms for the offices with lots of brick and wood and windows that open; they feel warm and cozy, where you could hole up for hours and puff on a pipe and think about general relativity.


URGH why in the fuck is the windows Task Manager not always resident!?!? An OS should have a tiny super-stable core that maybe even lives in ROM. Then there should be a next super minimal layer that can run very simple apps, and stuff like a light task manager should live in that layer. Then all the complicated stuff can get built on that, and the complicated stuff can crash and be all kinds of mess and never take down the little layer that lets you still control your machine.


Anybody out there have some large lossless test images? I'm having trouble finding good ones; all the old test images are like 512x512 and not really representative of modern content. Maybe some of you photographers that take RAW format photos could send me some.

Ah, Won pointed out to me that Google has a filetype thing in image search so you can ask for big PNG's.


Yay, our media machine is sort of working. I bailed on the LCD and am just giving up on TV for now, I'll keep using my TiVo for what tiny bit of real TV I watch. But I have video out and the music library interface of MediaPortal is pretty okay.

MediaPortal has a really nice crossfade when you fuck around with music; it's so obviously the right thing to do, like when you skip or mute or anything it's not abrupt. The default setting is like 4 seconds which is retarded awful laggy, but you can turn that down to like 0.5 seconds and it's nice.

The LEDs are retardedly bright; I need to just unplug them. All this makes me appreciate my old analog amp even more. It's got nice subtle LEDs. It's got a heavy smooth big knob for the volume. All the buttons are switches that actually flip so you can tell which way they're set and they feel solid. And there's never any incompatibility problem. Audio comes in on wires. Audio goes out on wires. That's it.



Known Problems

      (Bug #14194) 
      Error adding files already under source control to project: 
      "chmod: : 
      The system cannot find the file specified."

      Workaround: Quit and relaunch the IDE, open the project,
      and revert the file, then retry.

Translation : "our shit is totally fucking broken, but we're gonna release it anyway. Deal with it."


I can't get Visual Assist X to behave like Visual Assist .NET ; it keeps wanting to either not do completion or do this god-awful fucking listbox popup thing. For the moment I've reverted back to using the old version, but if there's a VAX user out there who can help me, please do.


WTF, Yelp just closed my account. As best I can figure out, I wrote some negative reviews of places, and the people who run those businesses went on a Flag war flagging me like crazy to get the reviews removed and my account banned. I'm torn between just saying "fuck them" and staying away, or making a bunch of fake accounts and messing with them.


I'm sitting at home waiting for the traffic on the 520 to die down. I've been ready to work for an hour. This fucking sucks. I like to wake up in the morning and start working right away.


"At work" is like the opposite of "in bed" ; you can stick it on any phrase and it instantly becomes uninteresting and unappealing. Like :

"I have some good ideas of things to do" ... at work.

You get excited for a second, hey he has some good ideas, oh, it's at work, bah, boring.

"Something funny happened today" ... at work.

Oh dear god please no more work stories they are not funny.


We were talking about SVD and PCA and KLT and such yesterday and it made me realize I'm still a little fuzzy on that stuff and it could use a really simple explanation. They're actually all exactly the same thing, and they're also closely related to eigen decomposition. Basically the PCA (Principal Component Analysis) or SVD (Singular Value Decomposition) gives you a new coordinate space (aka a set of basis vectors) that decorrelate the input data and concentrate as much energy as possible on the primary axes. If you like, they find the subset of the space that your data primarily spans, and the magnitudes tell you to what extent your data really needs all the extra dimensions. If you prefer to think in terms of transform theory, they construct the best possible linear matrix transform for decorrelating and concentrating data (like what the DCT does for JPEG). The KLT is just the name of the transform when you use the PCA vectors as your matrix transform.

As usual the Wikipedia pages are good. I'm going to be a bit sloppy with my maths and just try to hand wave a bit so you get the concept.

After I posted this Ignacio pointed me at a very cool tutorial by Jon Shlens : his page contains that and some other very cool stuff; it's a good tutorial, similar to mine but more correct and it's much more readable since it's PDF and all that.

There are two ways of thinking about this. One is you have a bunch of K-element vectors, N of them, it's like a bunch of samples of an experiment; you can of course put them together in a KxN matrix; in this mode you are usually looking for K-element basis vectors. The other way to think about it is just that you have some KxN matrix and you wish to find an approximation of it. In either case we'll call this matrix of KxN things M.

You need K basis vectors to span your K element vectors, of course. You will also get K N-element "cobasis" vectors. Your basis transform on your vectors will be a KxK matrix. Obviously the identity matrix is no transform and it corresponds to using a basis like {1000},{0100},{0010},{0001}. That is, instead of thinking of your data element as {a,b,c} think of it as a*(100) + b*(010) + c*(001). That original basis is not necessarily the best, or even the "right" one.

We can think about this in terms of data compression. For example maybe you have a sound file with K tracks and N samples in each track. If you just compress the tracks one by one in that space, there's a lot of correlation between the tracks you're wasting. What you want is to find a different basis in which all the same parts of the tracks are mashed together and each track only has the new unique bits of that track. Then, for example if two tracks are exactly the same, a whole track will be zeros. In general you can model any linear relationship between tracks with a linear transform like a matrix multiply, eg. stuff like D = A + B - C can be completely modeled out.

The best possible linear basis transform is the PCA. The PCA finds the "principal components" of the set. These basis functions are orthogonal and completely decorrelate the dimensions (in the strict linear correlation sense; more on this later). We can figure out the PCA from this :

The "covariance" of dimension i with dimension j is :

Cij = Sum(n to N) { M'in * M'jn }

(M' = M with the mean of each dimension subtracted off, note this is almost the "correlation" but the sdevs are not divided out). I'm going to drop the prime on the M and just assume that you took out the mean from now, but make sure you do that.

Of course we can just write this same thing in matrix notation :

C = M'^T * M'

^T = Tranpose

The diagonal elegents of C are the variances of the elements (the true correlation with yourself is 1). C is symmetric, and the off-axis elements tell you the correlations of one dimension with another. We want a basis transform that makes the cross-correlations zero. That is we want to find :

D = B * C * B^T

where D is some diagonal matrix and B is a basis transform.

For those who know something about eigenvectors you can see that the basis transform we want is just the eigenvectors of C.

The eigenvector has the property (by definition) that

C * e_i = k_i * e_i

That is, multiplying by C gives you the same vector scaled by some number. That is, C does not rotate the eigenvector at all, it stays in the same direction. That means if the eigenvectors were our basis vectors, then multiplying by C on any vector would only scale the coordinates in each dimension, it would not ever munge one coordinate into another.

Any matrix can always be written as the outer product of a spanning set of vectors. That is :

C = Sum{ij} Cij * |i> * where |i> is the ith basis vector and is the jth basis vector transposed.

If we write it in terms of the eigenvectors, then C must be diagonal because we know C turns e_i into e_i , thus all the off-diagonal terms must be zero, so

C = Sum{i} k_i * |e_i> * Since = delta_ij , that is they are orthonormal. The basis matrix B that we wanted is just all the eigenvectors stacked up, and the diagonal matrix D, that we get, is just the eigenvalues down the diagonal.

So, we see that the eigenvectors of C are a diagonal representation of C, which means that the non-self correlations are zero, which is exactly what we want, which means this B is exactly the basis transform we want for PCA.

Now, the eigenvalues also tell us how much energy is in each axis. If you look back at the definition of the correlation matrix, you see the diagonal terms are just the L2 norm of that dimension. Now if we look at C in terms of eigenvectors we can see the eigenvalues of C are just equal to the L2 norms of the data in that direction. If we sort the eigenvalues so k0 is the biggest, k1 is the next biggest, etc. this is the PCA.

Now let me look at the exact same thing a completely different way in terms of successive approximation.

You want to construct basis vectors which are orthogonal and decorrelating. You want the 0th to have most of the signal energy, then the 1st the next most, etc. You can do this just by learning each one one by one.

First find a basis vector b such that the linear correlation of b to the vectors in M is maximized. That's actually the same as minimizing the least square error of a fit with arbitrary coefficient :

Err = Sum{i in K,n in N} ( (Min - an * bi)^2 }

for some vector a ; b is a basis vector that has K element, a is a "cobasis" that has N elements. Either you can think about this as trying to find b and the a's are just arbitrary coefficients, or you can think of it as trying to approximate the matrix with an outer product of two vectors.

This gives your first b. Then take |a> The result that we get is a representation of M in terms of a sequence of outer products :

M = Sum{i} |ai> * The a are N-vectors and the b are K-vectors, our bases. If we normalize the a and b and pull out their lengths this is :

M = Sum{i} si * |a'i> * or M = A * S * B

Where S is a diagonal matrix of si values, A is NxK and B is KxK. Now C the correlation matrix = M^T * M = B^T * S * A^T*A * A * B = B^T * S*S * B

This is exactly like our eigenvectors decomposition, except we have S*S instead of the eigenvalues. Well they must be the same thing. So these B that we found by successive fitting are in fact exactly the same as the eigenvectors of the correlation.

Well this successive fitting is just the SVD, the Singular Value Decomposition. The S are the singular values - and we see now that the S are the square root of the eigenvalues of the correlation matrix. Or rather, if you square the singular values you get the L2 norm of the data in each basis, and they are sorted from largest to smallest.

What we have found is a K -> K orthogonal basis transform that takes us to a coordinate space where the dimensions are linearly decorrelated and as much energy (in an L2) sense is in the lowest dimensions.

This is often done as a way of reducing the number of dimensions you have to deal with. You can see how much L2 energy is in each dimension, and just say I'll use enough dimensions to capture 99% of the energy and just ignore the rest. It also lets you see when you have redundant dimensions. If one dimension can be written as a linear combo of another dimension, that's equivalent to saying you actually have lower dimensional data embedded in higher dimension space, like your data is actually planar but it's embedded in 3d space. The PCA vectors will be two primary vectors in the plane of the data, and one vector with very low energy orthogonal to them.

When this is done in data compression, it's often called the Karhunen-Loève transform (KLT). PCA/SVD/KLT is obviously good for data compression not just because it docorrelates (linearly), but also because it puts as much energy as possible up front and then you can just truncate the stream.

The KLT is orthonormal, which means it's just a rotation. Orthonormal transforms are not required for reversibility; obviously any matrix that has an inverse is a reversible transform. The nice thing about Orthonormal is that the L2 norm of the data is preserved, which means that the magnitudes of the transformed data is in a space you can understand and compare to untransformed norms. That means you can measure the distortion in transformed space, you can quantize in transformed space, etc. and not have to worry about how that scales to the original space. In fact Wavelet transforms are not orthogonal, but are reversible, and the "short time fourier transform" that's sometimes used for audio compression is not even reversible the way it's often implemented. I should also note that if your input data is in integers, an arbitrary matrix may not map ints to ints and thus not be reversible. However, since the KLT is orthonormal it can be turned into a lifting transform that does map ints to ints, at the cost of not exactly decorrelating. (actually, most Wavelet transform implementations are not norm-preserving, which is fine, but the vast majority of implementations that I've seen screw up and fail to account for that when doing quantization and measuring error).

It's interesting to compare the KLT to something like the Fourier transform. The KLT has the property that the best subset of the data is the prefix. That is, the best possible 4 value representation is just the first 4 values, not ever the 5th or the 6th. (this is true over the whole set that the KLT was made for, not necessarilly true on any one given sample).

The Fourier Transform, or more correctly the DFT, is also a basis transform that can be written as a matrix transform. (we don't actually implement the DFT as a matrix transform because a matrix multiply is N^2 and the DFT can be done in NlogN due to the magic of the FFT). The Fourier Transform also has the optimal prefix property (called the "least squares property") - that is the first K terms of the DFT are the best possible K terms of any trigonometric transform (that is, no later terms could be substituted). This is very useful because it means we can truncate the terms correctly, the first ones are always the most important. However, unlike the KLT, the DFT is not necessarilly fully decorrelating, and that means that the energy in the lower dimensions is not maximized.

You could imagine taking a ton of images and gathering all the runs of 8 pixels into samples, so you have K=8 and N samples. Do the PCA to find the best spanning 8-vectors and use these instead of the DCT in something like JPEG. What you would find is that your PCA vectors would look very much like the DCT basis vectors. And you broke the magic NlogN fastness of the FFT.

BTW people in video games have long used the eigenvectors of the correlation matrix (aka the PCA) as basis vectors to find bounding volumes around points. This is complete crap. The PCA has absolutely zero connection to what axes would make good bounding volume axes. There are established very good algorithms for fitting bounding volumes that work in totally different ways.

Why don't people use KLT transforms more in data compression? Well, for one thing, you have to transmit your basis. If you use something like the DCT the basis is implicit, you're assuming some knowledge about the type of data you're working on which saves you bits in the stream. Also obviously it's slower, though the slowness on encode isn't as big a deal as the slowness on decode (there are lots of applications where slow encode would be fine). The main reason is that getting your transform perfect is actually not that important to modern data compression. If you are using a fancy quantizer and a fancy context coder on the back end, then mistakes in the transform can largely be hidden. For example, if your transform is not perfectly decorrelating, your entropy coder can largely model any correlation that's left over and hide that mistake. The simpler your back end is, the more your transform matters.


I want to set up a server at home so I can get to my home machines from work. It's somewhat complicated by the fact that I don't have static IP's at home (though I do have an IP at cbloom.com that maybe I could redirect to my home machines somehow).

There seem to be a bunch of remote control apps that are sort of intended for this. GoToMyPC is the old craptastic one; the newer ones are LogMeIn, WebEx PCNow, and BeAnywhere.

Really I don't want a custom web interface; I want to hook up to my home machines over regular Windows file sharing. I guess I could have my home machines dial in to the VPN before I leave then I should be able to get to them, but that messes up the network at home and relies on me making sure I put them in the VPN whenever I want them.


I went in for my first day of work today. Ugh, work is tiring. I fell right back into bad work habits : I got really thirsty and had to pee really bad, and rather than get up and get a drink and pee, I kept sitting there working away at the bastard computer. On the plus side, the work situation is really good and I'm excited to do some good coding.

I set up a VPN to RAD on my laptop, and I've been having this problem that the fucking VPN connection dialog randomly pops up when I'm home browsing. Fucking Windows. This happens because VPN is treated like a dial up adapter, and Windows has junk in it to automatically dial your dialup connection whenever you're disconnected. The best fix is to go to Control Panel -> Internet Options -> Connections and select "Never Dial a Connection".


GLICBAWLS is a really sweet little lossless image compression by B. Meyer (the guy who made TMW). It's a beautiful, simple, elegant paper (get the PDF here ) ; the only thing I don't like about is the way he introduces the one-over-sigma weighting at the very end as if it's no big deal. It uses a clever running sum that graphics people should be very familiar with. I'm still not sure if it should be pronounced "glick balls" or "gee lick balls".

Holy fucking crap. g-lick-balls is not only one of the sweetest and most obviously right little compression algorithms ever, with the best acronym, it also has the absolute coolest source code . The original web page and PDF are gone from the net now, but fortunately these pieces remain. The hint text that went with the source code also contains some tidbits that weren't in the PDF article.

One of the other major lossless image compression algorithms is RkIm, part of the Rkive suite by my old compadre Malcolm Taylor. Malcolm was always a bit of a troubled weird character, and he seems to have disappeared off the face of the earth. His page is gone and I can't find any information about him or his work around on the net any more. Malcolm made the original Rkive right around the same time I wrote the original PPMZ; we were working in very similar territory, we had the top two text compressors in the world for a long time, he helped me work out many of the details of PPMZ and he made the best implementation of it. He was just a hobbyist, not employed, and I invited him to come out to Wild Tangent to interview when I worked there. He came out and was kind of weird, and then disappeared; he never replied to any followup emails or calls. I wonder what's become of him.


So I installed Media Portal and the initial thought is that it's way better than Media Center. It's way more configurable and actually has display modes that show your music in usable ways. Like it actually lets you use your directory structure, and it has various list modes including ones that show a decent number of tracks. It lets you build ans use M3U playlists pretty well, etc.

MediaPortal is a hacked together mess and it's an absolute bitch trying to figure out how to do things with it. The documentation is minimal and what's there is way out of date. Tons of plugins for it are for old versions and no longer work any more and to figure it all out you have to just go searching around the forums which are a total mess. (BTW this idea that just having forums to document your software is okay is totally false; forums are an okay way for people to interact during development, but they do not provide a useable way for new people to learn about something because you can't figure out what information is still true or not).

MediaPortal starts up really slow by default, but you can go into the settings, turn on expert mode, and disable lots of stuff like animations and all the plugins you don't want.

Setting up MediaPortal for a remote is a huge bitch. The best way with my system seems to be to set MediaPortal for generic HID input, which makes it take key presses, and then set the iMon thing up to output the right key presses. There are config files for iMon -> MediaPortal floating around the net, but none of them is "official" so you just have to randomly try some and then tweak things yourself. Media Portal's key mapping functionality is also semi-broken, you can't access all the functions from keys and you can't make them context-sensitive.

Oh, I also found out Requiem needs to be run from the machine where you authorized your iTunes; apparently iTunes does some encryption using the MAC address of the machine. Also there's some weird thing with an "SC Info.txt" file that you can search about if Requiem gives you trouble. It also makes zero output so it's hard to tell when it's working or not, and it just crashes if you give it bad args. But it's still the only decent un-DRM solution.

If you want to make an HTPC and have it Just Work you probably have to go all-MS. Buy a Windows MCE Remote from MS, use the IR receiver that comes with it, disable the IR in the case, and just use Media Center as your player, and put all your content in My Documents. Don't bother trying to make your case's LCD or IR work and just go with what the MS stuff can do and it should all work okay. While I found MS's Media Center to be ass for music, it does seem pretty nice for TV if that's what you want.

Anyhoo, the HTPC with Media Portal is now sort of mostly working. The LCD doesn't work yet, but fuck it, it sucks anyway. AMD's Cool & Quiet doesn't seem to be working, but fuck it, it's cool and quiet enough already. UltraVNC does work with MediaPortal so I can remote that way. The default MP skin is ugly and the other skins I tried are broken, but whatever, fuck it.

In theory MP is open source so I could build it myself and fix the things I don't like. I've never had success with building one of these big messy open source apps before, though, so I'm very scared.

A few big todos remaining :

1. I've got a big problem with the music mirrors on our laptops. Ali and I both have our old copies of music and we may download new stuff in the future; I need a way to get that stuff into the share easily. The problem is I've already renamed a bunch of stuff on the share machine and deleted our duplicates, so I can't just xcopy. Yucky bleck.

Really the whole interface with exposing the share to her iTunes is yeck. When she gets new stuff I need to copy it to the share, and possibly change the file names to regularize because fucking music file names are randomly broken (the same band will be given slightly different names - for example you would have "AC-DC", "AC DC" and "ACDC"). I need to remember what was copied so that I don't copy it again. To expose the share, I need to get it into her iTunes, but without duplicating the stuff she has locally. And when I add new stuff to the share I want it to get into her iTunes quickly without doing a full dir scan. Ass ass ass fuck. There are some apps that do some of these things - Dupe Eliminator and iTunes Library Updater but they're both ass in various ways. I've used iTLU on my local machine and it's slow as all fuck, and you have to manually run it to get your new music in. Bleck, better solution is needed.

2. I want a figure out a way to just send playlists from other machines. I want to be able to browse the music share from my laptop and pick some files and click something on my laptop and have it start playing on the HTPC. The HTPC remote interface is okay for playing one album, but if you want to put together a playlist you really want your computer with a desk and a mouse. I'm sure I could do this pretty easily myself if I could write a Plugin for MediaPortal. This is relatively easy if I'm making playlists in WinAmp on my machine that directly refer to files in the Share, and the just sending the M3U file over the net to the HTPC machine; I'm sure I could rig that up pretty easily. It's much harder to rig so that Ali can make playlists in her iTunes and get those to play; for one thing her iTunes may refer to local copies of a song that is also on the music share. Like I said before there's still the possibility of just beaming your audio out from your machine to the HTPC, but that's not ideal because you'd like to be able to fire a playlist off and then do other things on your laptop and have the music keep playing. You know, like a fucking CD does.


I'm trying to copy all our music to the HTPC over the net and it's going outrageously slow. Alissa's 50 G has taken about 12 hours. Obviously I'm not using Explorer's copy, I'm using XCOPY which is supposed to be max speed. I'm also using mapped network drives which is supposed to help. I turned off all our Firewalls which was slowing things down, but it's still nutso slow. I also turned off various file info displays in Explorer and turned off the scheduled tasks discovery. Still slow.

This is stuff I've already done : Slow network browsing in Windows XP , Speed up Windows 2k/XP network , Hide the Scheduled Tasks and Printers folder in the network share view , InfoCellar XP Home Networking Problems

Part of the problem may be that the music folder has many thousands of files in it, and the MS network file system may have some kind of retarded N^2 thing. I am already hitting that fucking NTFS stupidity on the HTPC machine in which NTFS just fucks up completely on large directories (the directory B+ tree gets scattered around in fragments).

It's possible Windows network use is just retarded, like it sends 10X as much net traffic as the number of bytes you want to send.

BTW robocopy is pretty cool; I like the syntax of (from dir) (to dir) (file filter).


Fucking Avira has smart extensions on but it still thinks it needs to scan all my fucking MP3 files. No, there is not a fucking virus in my music data, and even if there was it wouldn't be executed DUCY !?!

In other news Alissa had a virus on her iPod in the autorun.inf ; Windows is so fucking retarded that it by default runs the autoruns on any removable storage device that's plugged in, so if you plugged the iPod in any random machine - blammo infected. We haven't detected it because of course it happens to be not plugged in when the scheduled virus checks run. On a similar note, Windows handling of cached writes to removable storage is ass bad. Caching writes is on by default, and that very quickly leads to corrupt storage. On my machine of course I have cached writes off for removable devices, but if I plug in to some random machine they're on and my disk can get hammered. Yay.

BTW, the many ways cached writes are fucked in windows : it doesn't flush the writes even after sitting idle for minutes, so you can quite sensibly pull the disk minutes later and not have complete files; it doesn't flush before going in to standby, and then doesn't wake up slept disks when it does try to flush; the "remove this device" icon dealy to tell it to flush often just refuses to work, saying some app is still using the disk when that's damn well not true (which leads you to always ignore the warnings to not remove a device since they're so often false you get in the habit of just pulling it anyway).


I'm seeing some really weird shit with DOS and backslash. Apparently in "cmd" the backslash works as an escape character. Not for normal characters, like \X for any alpha X is not an escape, but \" and \\ are special. \" is a quote char, and \\ is a single backslash. Also, the DOS 'echo' command is special in that it doesn't do any fancy processing. I'm not sure if this escaping is being done by the cmd environment, or if it's being done by the MSVC runtime before it gets to argc/argv.

In particular, I made a little program called "echoargs" that just outputs argc & argv. It makes the following output :

c:\>echoargs "hello world\"
argc : 2
1:hello world"

A more full line :

c:\src\chuksh>echoargs "hello world\" stuff\\" crap
argc : 3
1:hello world" stuff\

WTF this is seriously screwy. Supposedly the escape char in DOS is the caret ^ , but it's doing some escaping with backslash !? WTF

I can sort of hack a fix for my shell apps by looking for a single quote char in the middle of args and guessing where that came from, but seriously !?


God damn one of my great computer pet peeves is when I start up some big batch process to run over night, then I wake up and check on it, only to find that 15 minutes in it decided to put up some fucking prompt and stop running. Fuck you. I've written before about how you should always try to check all your possible failure cases and prompt the user right up front, but it occurs to me there's another pattern that people fail to do : defer questions to the end. When you hit a case that you need to prompt about, defer it to the end, keep processing as long as you can until you have to prompt. That way the user can walk away and then handle all the prompts at once.

Let me use a big dir copy as an example. Say I want to copy a big dir over to some other spot. First you should gather the file list and immediately check for things you might want to prompt - overwrite questions or make dir questions for example. Immediately prompt those at the beginning. Now go ahead and start doing the copy. Since you are not holding an exclusive lock on the drives, the files could change as you're copying, and unexpected situations might arise that require more prompting. If they do - defer them to the end. Finish all the copying you can, and then you might have a few little prompt cases at the end to wrap it up.

The Windows install is like the fucking devil for this. It takes almost an hour and it just randomly pops up prompts throughout, so you have to sit and watch it the whole time.


So far this fucking media PC is a giant frustration. It kind of amazes me that so many people out there build their own computers. It's still just such a cluster-fuck of incompatibility and broken ass drivers and Windows bugs. I would've really struggled if I didn't have another computer to use to get on the net and google about these problems, and if my google-fu wasn't so fucking advanced I never would've made it.

Apparently Windows XP SP3 has a well known crash bug, it fails to boot if you have a firewire drive hooked up. Yay MS. So I have to remember to unplug my external drive every time I reboot or it hangs.

The fucking LCD front panel is worthless; the hardware is terrible, it's too bright and the contrast is awful and the visible angle range is like one degree. The software for it is even worse, I have yet to succeed in actually getting it to show the media information like it's supposed to.

MS's Media Center app is no good. It's sort of okay for video I guess, but it's totally broken for big music collections. Just opening up "My Music" it stalls out for 60 seconds, and browsing around is just impossible with how slow & badly designed the GUI is (it shows 9 albums per page). Plus it does the fucking evil iTunes thing where it thinks it knows how to organize my music better than I do. JUST SHOW ME MY FUCKING DIRECTORY STRUCTURE!! Those 5000 songs I have randomly downloaded and stuffed in a folder called "misc" , no I do NOT want those to get all mixed in with my purchased music when you sort by artist or song title. Urgh.

So now I get to try some of the other media center apps, like MediaPortal. Or I could just write my own. It's a pretty fucking trivial app. Take remote control commands, browse playlists, play music, yay.

If I can't find better software to run this thing it's going to wind up being just a disk server + networked audio player. So like we can use our laptops to pick songs and tell it to play remotely and it will output to the stereo. Actually I want that kind of interface no matter what, so I'll have to find an elegant way to make that work. The ugly way is just to remote desktop to the media pc and set up plays there. Another ugly way is to use something like PulseAudio to beam your laptop's audio out to the media pc.


I've been doing some architectural history investigations walking around Seattle. Just by looking at the buildings in the neighborhoods, it's evident that Seattle was really prosperous around 1900-1925. There are lots of huge mansions from that time, and the old apartment complexes are really high quality, all brick with lots of nice architectural details, leaded glass, etc. Also around that time the city built a public transit system, re-graded the whole downtown area, raised Pioner Square up 15 feet, and built the city park system. Olmsted was called in to do the parks, which is sort of equivalent to hiring Frank Gehry to build you something today - it's a way for the up & coming podunk towns to show off.

You can also see that were was a big crash after that. There's almost no construction at all visible from 1930-1960, and then there's a bunch of really awful stuff from the 60's and 70's. You can be walking down a nice street full of old houses from the 20's and suddenly there's a ramshackle apartment complex from the 70's. That's a clear sign of an economic collapse, the families living in the houses bailed out for some reason and shitty apartments got built.


Mercedes and Audis are symbols of prosperity specifically because they are such horrible values; if they were more reasonably priced or higher quality, they wouldn't be status symbols, they would just be expensive reasonable choices. Their status comes from the fact that only someone filthy rich could care about money so little to make such a poor purchase. This kind of thing is the white person's version of "bling" - retardedly overspending on highly visible items to make yourself seem more prosperous than you really are. I guess you could toss fancy AV equipment in there and lots of other things.


My new computer came and I spent half of yesterday putting it together. It's been a really long time since I put together a computer completely from scratch, and it's not too much fun. The Taiwanese case/mobo manufacturers could really learn something about assembly and instructions from Ikea. I spent almost an hour puzzling over how to hook up the "power sw" cables. The front panel hookup instructions consist solely of one loose sheet of paper that has this confounding diagram on it. The actual inside front of the case looks like this . I eventually figured out the diagram was of the inside front of the panel, but upside down, and with all the metal of the case removed. So it looks nothing like what you see. And even then I was like, okay, WTF, A to A-prime !?

Anyway, after much googling I found others had the same problem. Many people in fact shorted out their front panel display by hooking things up wrong. If anybody tries it, here's what you do :

1. Look at the inside of the front panel from behind. On your left is the HD area, on your right is the DVD area. There's a bundle of cables coming from the left, such as "reset", "hdd led" and "power sw".

2. The "power sw" cable from the left is black and white. Run that power cable over to the right side through the little hole at the bottom, and across the DVD area and up to the top right corner - there's a little exposed spot with two pins to plug the cable in to.

3. From the right side, there is a smaller bundle of cables coming out of the panel. One is red and black and labeled "power sw" just like the black and white one. Run this one through the hole by the power supply and over to the power button on the motherboard's F_PANEL pins.

4. Also from the right side there's a thick black cable with a USB adapter in the middle of it and a 5-pin head. It took me a minute to figure this out (again there's zero directions for this thing). Obviously it's a cable for external USB converted to internal USB. The USB pins on your motherboard are 2x5 blocks that normally hook up two USB cables, we're going to put this thing on just half of one of those. The motherboard pins have one missing pin and the 1x5 head on this cable has one blocked so it's obvious how to put that in.

I also had one fuck of a time getting the Scythe Mini Ninja cooler on. The socket AM2 cooler attachment is supposed to be really easy, but the Scythe wasn't set up for the right depth, so you have to take the clips off (they come in the shallowest of 3 settings) and move them to the middle setting.

Other random notes for anyone actually trying to build one of these things : Make sure to install the chip and cooler before you put the mobo in. Also install the right plastic air blockers before you screw in the mobo, it's impossible to change them once screwed in. See silentpcreview for HD mounting - it should have the top facing the mobo and be closest to the mobo. The little metal cable clips on the inside are designed to be bent in order to release and secure cables, you don't have to try to wiggle cables onto them without bending them like I did. The PSU is supposed to have a black and blue cable to control its fan; mine doesn't have one, whatever.

To get Windows to install I had to unplug the front panel USB and set the SATA drives to IDE mode. (apparently there's a well known BSOD crash in the Windows installer if you have drives in AHCI). Once Windows installs successfully, you can install the Mobo and Panel drivers and then put the hardware back.

I made a few little mistakes with the build. I got a SATA DVD drive solely for the thinner cable; I've read in reviews that getting an IDE cable around this case is a bitch, and in fact I see that would be true. However, the PSU only has one SATA power out cable, so I don't have enough SATA power for an HD and a DVD. D'oh. Easy fix with a Molex-SATA adapter cable. I also probably should've gotten a WiFi card so I don't have to run a cable across the living room (and a wifi card is like the same price as a cable these days!).

The biggest problem for me is that this mobo (AMD 780G) doesn't support any analog TV out, only digital. Apparently the majority of AMD/ATI cards do support composite/svideo/whatever output through their VGA or DVI port with just a cable adapter, but this one doesn't. A note on those cables : there are lots of things like the ATI DVI to Component Adapter or VGA to Component Cable which seem like they might help. These cables are not converters, they are just dumb electricity re-routers. The normal signal coming out of a DVI-I or VGA wire does NOT work with an analog video TV. What these cables do is let your video card output a special analog TV compatible signal through the DVI or VGA pins. Your video card must be able to the conversion internally and it just re-uses the same pins to send a different signal. The brilliant people at AMD didn't include that circuit in the integrated graphics, presumably to save some money, even though this an HTPC-intended part and lots of people still have TV's that don't have any digital inputs. Yay.

So my first thought was just fuck it I'll buy a new TV, but then I calmed down and realized I should just get a $30 video card with TV out. Anyway, I'm pissed about the lack of analog out because the whole idea was to build a super minimal cool quiet cheap machine, and part of that was not having a fucking video card which blocks air flow and makes a bunch of unnecessary heat. One thing to note, if you put in a video card, the onboard video output (VGA, DVI, HDMI) is disabled, so you must get a video card that has all the outputs you want.

Also, WTF, all these hardware review websites are kind of pissing me off. Search for something like "ATI 3450 review" and you will find about 100 web sites, every single one of them copying the Tom's Hardware style, every one doing a shitty job of it, none of them actually understanding the hardware well, just running all the same benchmarks, not looking at things like features or compatibility. Stop it. Go back to playing Starcraft and pretending you're a 1337 HaX0r

I was thinking if I'm going to get a video card I may as well get a TV tuner, but there seems to still be some major fuckup with HD TV tuner's that's scaring me. The ATI All-In-Wonders for example apparently only work in Vista, not XP. Windows Media Center apparently won't record HD. It seems to all be related to some fucking ass licking DRM bullshit which is fucking retarded. My original plan was to not get a TV tuner until that shit was settled down a bit, but now my hand is being pressed. Urgh!

On the plus side, the Scythe fans kick major ass, they feel like they blow a lot of air to me and they are super quiet. Also the WD drive in the rubber grommets is almost silent. The only time I hear it is if I put my head close to the case and listen for it parking after a few seconds of idle, it makes a little tick sound.

On the minus side of the quiet front, the DVD drive is loud as hell. I did a little searching for "quiet DVD drive" and didn't turn up anything, they all seem to be super loud. Also the activity lights on the case are pure evil for watching video, but that's easy to fix with a piece of black electrical tape. It would be ideal to have a software toggle for the activity lights.


This apartment has really old bouncy wood floors that squeak when you walk around. The problem is, when I play music with any bass even at moderate volume it makes the whole floor vibrate like a giant drum head. Currently I've got some 3M Bumpons on my speakers which gives them a little bit of vibrational decoupling from the floor, but it's not great. I'm not sure how much of the floor shaking is coming from the direct contact with the speakers and how much is coming just from the sound waves, which is unfixable.

The only better idea I've got is to get a thick piece of acoustic damping foam (something like this ) and put it under the speakers.

BTW I fucking love the 3M Bumpons; they're awesome for lifting up your electronics a little to get better air flow between your components, and they do provide a nice little bit of vibrational decoupling which makes things like your external hard drives quieter.


There's a Sushi place in Seattle called "Imo". I have to go just so I can write a review that says "Imo is meh, IMO".


Seattle SODO does a weird zig and there's a spot where 2nd and 3rd Ave intersect . That also happens to be where Armandino's Salumi is located. I always think of those weird impossible street intersections as a stitch in the space-time-continuum. Hopefully the Enterprise won't come by and fire any Tachyons at it, they would ruin the portal to the Salumi planet.


So Alissa's got all this music she bought from iTunes long ago with the evil DRM shit in it. I want to put it in the music share but I don't want the fucking devil program that is iTunes to get anywhere near my media PC. So I started looking around for un-DRM solutions.

There are a whole host of retarded solutions out there. "NoteBurner" uses a virtual CD device so that you can go into iTunes and burn to CD and writes the raw audio to the virtual device and then it encodes from that to MP3. "TuneBite" actually hooks the wave output of any app and can encode it; to convert audio it plays the DRM music through a decoder device and reads the output. Another popular one is "dbPowerAmp" which uses a similar method (I'm not sure which exactly). These might be useful if for some reason you couldn't access the actual music data bits, but of course they are totally retarded when you have the music file sitting right there and all you need to do is pull off the DRM tag.

The real solution is "Requiem", created by the HYMN project. It's no longer available from their main servers but it can be found around the net. It can crack all your files in like 1 second because it isn't actually playing and re-encoding them like all the "legal" dumb solutions. And it has a simple command line interface. Yay.


Any bakery that sells frosted cakes with writing on them definitely sucks. Any bakery that sells cupcakes probably sucks. Any bakery that offers a whole bunch of types of breads (from challah to pugliese to brioche) probably sucks. Good places do just a few things and do them masterfully.


I just randomly found this really nice 2+2 Poker learning Anthology that somebody made (it's an index of posts on a poker learning web forum). If you're seriously trying to learn poker (in particular, online no limit hold'em) this is a great place to start.


I'm trying to get over my love of IKEA and start buying better stuff; "better" being purely defined by how you snobby pricks judge me for having IKEA stuff. I still think most of their stuff is wonderfully fuctional; it's simple and practical and well designed for actual use. But you all associate it with poor college kids and scoff when you walk in my door, so I'm trying to change. (I'm mainly talking about bookcases and shelving stuff and desks and kitchen stuff here; obviously their couches and whatnot are junk).

Anyhoo, I have not lost my love of the Galant desk ; it's the most perfect desk I've ever seen. It has a very thin table top which lets you get your keyboard very close to your knees, and it's height adjustable which is absolutely mandatory in a work desk IMO. Years ago I went shopping for desks with no price limit, just trying to find the best thing I could, and all the supposedly fancy desks were all garbage, with no height adjustment and way too much wood between lap and hands.

I just bought a new one and took my old one in to work. Unfortunately, they have changed it a bit since I got my old one some 8 years ago. The old one had just one big piece of tubular steel down the middle for support. It was far enough in that your knees never got to it. The new one has a box frame of much thinner steel tubes, but one of them is quite close and my knees do reach it.


I remember now why I didn't get out in nature that much when I lived here before. There's tons of great camping around, but I rarely did it. It's because I fucking despise camping in the rain. And it's always fucking raining up here. It's such a tease, it's like seeing a hot naked girl and not being allowed to touch.


I bought an Ultimate Power Shower Head for our bathroom here. It's okay if rather overpriced for how cheaply it's made. It's definitely better than the P.O.S. adjustable massaging thing that was on there. Those massaging shower heads are absolutely worthless for the purpose of showering, they have a million settings and every single one of them makes worthless weak or sparse stream patterns. The only thing they're actually good for is female masturbation. But I digress. The big improvement really came from taking the old head off and cleaning out the gunk that had gathered in the pipe, there was a huge mound of detritus piled up right behind the shower head, and of course the ancient shower head was 75% clogged by dirt and hard water deposits.

It made me think of a bunch of little shit you should always do when moving in to a new place -

Replace shower head, clean out pipe. Similarly if the sinks are running with bad flow this is often just gunk in the aerator; you can take off the nozzle and clean it out or replace it for $1.

Just replace screens on vents; of course replace any air filters if possible.

Replace the toilet seat. Ours was broken (like everything in this place), but hell it's only $10 for a toilet seat and god knows what diseases the previous resident had, don't bother cleaning it, just buy a new one.

I'm sure there are some other good little things to do that cost almost nothing and take little time and make a big difference. I've replaced most of the light bulbs too just because the previous ones had no concept of good lighting design; eg. a compact flourescent is great for the kitchen but not anywhere else, any bulbs with direct line-of-sight exposure need to be soft white, but bulbs behind shades generally should not be soft white, etc.


Wierd public restroom behavior :

Guys whistling or humming. Yes, we know you're uncomfortable, be quiet.

The stand back and pelvis thrust. This is when the guy stands like two feet back from the urinal and then leans back and thrusts his pelvis way forward.

The both hands on the hips. Like the guy is trying to tell the urinal who's boss. Actually this is usually done by business-douche type guys. Maybe they have their hands permanently affixed to their hips to indicate their importance. A much more rare variant is the both hands behind the head position, aka the "urinal is giving me a blowjob" position. Both very creepy. Keep a hand on your hose at all times folks.

The no-unbutton. This is when you leave your pants buttoned and feed your weiner through the trap door in your underwear. Are you Hasidic or a Never-Nude ? No? Then stop it.

The talker. This is such an obvious social deviance that I hardly need to even mention it. There is absolutely no talking of any kind in a public restroom until you are both at the sink washing your hands. If there's some emergency and you need to talk to someone who's at a urinal, stand directly behind them, do not walk around to the side of them. If you are inside a stall you must never initiate conversation except for toilet paper emergencies. I've had several bosses that would break this rule; again it seems to be common among boss types who lose touch with social norms due to their inflated self-worth. I would get a call on my office phone like "hey I've got a question about blah blah can you come talk to me?" I say "okay, where are you?" , "in the can" , oh no no no you cannot call me for a meeting when you're in the bathroom.


AC-DC adapters with the brick directly attached to the plug are the fucking devil.

It's kind of weird that I'm happy to shell out $1500 for a couch but I won't spent over $700 on a computer. Part of it is because I know computers so much better and it's so much easier to find out about products and comparison shop and all that. Furniture is one of the great scam markets. That and cars and gyms are some of the scummiest sales businesses around. Pretty much all furniture is listed at a 100% markup or more, and they all do their damndest to keep you from comparison shopping or getting real product information. For example, the majority of furniture and mattresses are actually made by only 3 or 4 major manufacturers. It's all the same product, but they stick different corporate names on it and give them different product names, and try to hide the true origins of the product so that you can't trace them back. (for example, Simmons and Serta mattresses are actually made by the same people, the same is true of sofas).


How to make sandwiches more yummy :

Make sandwich as usual, with sliced untoasted bread. Heat frying pan and butter very liberally. When hot, put sandwich in pan and cover with lid. Flip when brown. Do not burn butter.

The result is crisp buttery bread on the outside and nice warm sandwich in the middle. Pure yum. This is the deluxe conclusion of something I've been doing a long time - when you make a sandwich and toast the bread, butter the bread on the *outside* not on the inside. Butter on the outside means you taste it more when you eat the sandwich because it's the first thing to hit your tongue. This works great for hamburger buns too. There's no need to put butter on the inside, the filling is moist enough, but a nice coat of the butter on the outside enhances the flavor and texture of the initial attack of the bun.


My TV signal quality sucks here; I always split my cable twice, once off to the internet, then again to split to the TiVo and TV. I cannot fucking stand watching live TV through the TiVo, the latency of channel switching is infuriating (for similar reason I will never buy a "fancy" digital TV that takes like 3 seconds to change channels; WTF back in the 50's they had the technology to flip through channels instantly!!). Anyway the result of all this splitting is a lot of noise. Each splitter introduces 3.5 dB apparently; I'm not sure why it's necessary for the splitters to introduce so much noise. You can get good quality RF splitters, but they cost like $500. I can't really figure out a solution, I guess I just live with shitty TV.

Addendum : I'm smoking crack. The problem apparently is noise right at the receiver device (the TV or whatever). I'm not sure exactly where the noise is coming from but it seems to be a constant power noise source. The problem is that splitting cuts the signal power in half, so that makes this external noise 2x higher in a relative sense. The 3.5 dB on these splitters comes from the fact that 3 dB is about a factor of 2x. (10 dB = 10x in log scale, so 2x = 10*log10(2) = 3.0103). Anyway you can solve this by just amplifying your signal before you split. You can get a 10 dB amplifier for like $30 so I'm trying that.

Final result : I bought a signal amp and it fixed the bad signal quality. Yay.


I want to get a terabyte disk for us to put all our music and videos and stuff on, but then I was thinking hey maybe I should make it a network device so we can both access it more easily. Then I was thinking, hell if I make it a network device, maybe I should just get a little machine and make it a server. Then if I make a disk server machine, I may as well make it a media center machine and hook it up to the TV so I can just watch video off it directly and play MP3's from it to the stereo.

So I started looking into that a bit. Doing some kind of simple RAID would be pretty sweet, but that seems to still be outrageously expensive (just a good RAID controller is like $500). TV out cards are cheap as hell these days; TV in might be fun too, I could make it a DVR. I'd love to use a quiet mini case, but all the super mini PC's (like the AOpen or Shuttle) don't take full size hard drives (and probably don't have the juice to play 1080p video anyway).

I guess there could be some value in running Linux on it, but I have zero experience with that. I wonder how good MS's XP authenticity checks are these days. I suppose eventually this box should also be my Blu-Ray player but there's no rush to that.

Any advice or experience appreciated.

I found this Mvix thing; it's a pretty sweet deal, it's all ready to go, but I don't like the fact that it's not just a computer. I gather it's running a custom version of Linux and there's some bruhaha because they're not releasing the code they're supposed to under GPL.

It's pretty sweet that people are making good home-theater style PC cases now. I really like the look of the Antec Fusion which apparently was designed with the help of the SilentPCReview guys. The case prices are pretty outrageous; you can get whole computers for the price of one of these HTPC cases. For example this Visionman system for $469 is almost ready to go.

If you're a money-is-no-object kind of guy the EndPCNoise mCubed is close to perfect. It's 100% fanless; it uses the solid aluminum case as a big heatsink with heat pipes hooked to the CPU; I've been dreaming of systems like this for a long time and they are finally here, albeit with an outrageous $2500 price tag.

Addendum : So I spec'ed a machine and made a Newegg Wishlist with all the parts. $662 for the whole thing. Some notes on the parts :

The Antec Fusion case is partly designed by the SilentPCReview guys. It's got good air flow and vibration-damping mounting brackets. Cool. It has an IR Receiver and front display panel for song info and whatnot.

The Scythe Mini Ninja CPU cooler was also designed by them specifically for this case. It has horizontal vanes so that transverse air flow will cool the CPU. Most CPU heat sinks have vertical vanes and are designed for a fan to be mounted on top. The Antec case is specifically designed to have air flowing across the chips, so putting a vertical vane cooler in would wreck it.

The AMD 780G mobo is perfect for HTPC right now. It's got plenty fast integrated graphics to do HD H.264 decoding without pegging the CPU. It also includes all the good video outputs. At idle it's actually cooler than the older 690G mobos, but at full load the northbridge with the GPU in it gets very hot. Hopefully will be okay.

Hard drives you have two options depending on your planned use : the Sumsung F1 is very fast and cool and quiet and cheap. The WD GP is slower. I went with the GP for my build because I imagine this machine will be sitting on but completely idle a lot, and the GP will park its heads in idle which will extend its life and also reduce noise & power when not in use.

For CPU I went with the 4850e which is the fastest chip at 45W, which hopefully will be cool enough to run with no fan on the heat sink. If you want to put a fan on the CPU there are various good 65W choices.

The Scythe SFF21D is the quietest case fan available; it doesn't push much air though, so hopefully it will be good enough for this build. BTW make sure you install the fans to blow out of the case not into it.

My reference material :

Visionman DTA-1A7800 Slim Desktop PC - AMD Athlon X2 4200, 2GB DDR2-800, 500GB SATA II, DVDRW-LS, Radeon 3200 Onboard, Gigab
SysOpt.com - A Case for the Ultimate Media Center Antec Fusion Review
Scythe Ninja Mini CPU heatsink silentpcreview.com
MvixUSA Revolutionizing Portable Entertainment - Mvix Multimedia Centers Mvix (MX-780HD) Wireless HD Media Center [wHDMI]
MATX Computer Cases, ATX Computer Cases, HTPC Comptuer Cases TigerDirect.com
How To Build A High-Class Media PC With Antec's Fusion Media Center Case - Personal Technology - Network Computing
Build It A Living-Room Media Center PC - Solutions by PC Magazine
Build a Quiet, Efficient Home Theater PC - Motherboard ASUS M2NPV-VM - How To by ExtremeTech
Brian's Media Center Edition - Introduction
bit-tech.net Review - AMD's 780G integrated graphics chipset
Asus M2A-VM HDMI AM2 mATX motherboard silentpcreview.com
Antec NSK2400 Fusion Media PC Case silentpcreview.com
AMD Cool & Quiet replacement
AMD Athlon 4850e & 780G as HTPC Platform TweakTown
AMD 780G Best Ever Integrated Mainstream Chipset silentpcreview.com

BTW my link lists like this are autogenerated from a bunch of .url files in a dir. I run the following batch file :

call spawnm -i url2href *.url >> r:\hrefs.txt

spawnm and url2href are programs in the ChukSH distribution.


So I'm in a war with our upstairs neighbor already. Yesterday some courier came to the house and banged on the door; I was like, yo, WTF? and he was like "hey I have this letter for someone can I drop it off?" so I let him in. As soon as I let him in, it occurred to me that this smelled a lot like a summons, having previously encoutered that when a violent wierdo acquaintance was trying to dodge being served divorce papers by his wife and he asked me to not answer the door if someone with papers was standing outside. Anyway, that was yesterday, and last night the guy in E comes home and stomps around an awful lot. Then at 2 in the morning he wakes up and bangs and stomps all over. I was just about to go up and talk to him when he quit it. This morning when I went out in the hall I found the summons papers balled up in front of my door. Ah, the upstairs neighbor has left a turd of wrath on my doormat.

On the plus side it let me read his summons. Apparently he defaulted on some loan and some property of his is being foreclosed. The summons is for somewhere in New York, and amusingly, he apparently has had some 25 aliases. Apparently this guy is one of those scammers who goes around taking different identities, pretends to be rich, gets loans purportedly to start a business or whatever, then runs away around the country.

This would all be just rather amusing except for the fact that he lives above us, which means he has the great strategic advantage of higher ground in this war. I guess I'm going to go apologize tonight so that he'll quit this stomping.


God damn fucking fuck. Have I mentioned our landlord is the devil? He's supposed to come do the grout Tuesday. He just doesn't show, no call. I call him Wes to tell him I'm just gonna do his work and empty out the garage (in which the previous tenant left a fucking car engine), he says he'll come do the grout Thursday. Thursday he shows up and clears our stuff out of the kitchen and then discovers he's out of grout. He says he'll be back Friday some time. WTF WTF.

I hate fucking bills that require you to put a stamp on to pay them. I mean I guess these days you have your online pay and credit cards and whatnot so it makes some sense to not include postage, but back in the day it was fucking retarded. Just put a fucking postage paid on it and stick the cost in the bill for me. It's fucking better for both of us, it saves me the trouble of dealing with a stamp, it's cheaper because you get the bulk rate, and it makes it more likely you get your bill back promptly.

Fucking Picasa is not a normal web page, so I can't middle-click photos in the gallery to open them in tabs, and the fucking gallery takes an hour to load when I hit the back button. Also fucking Yelp does some retarded as thing where it stalls out really slowly when you *close* a page. Urg. Actually Yelp does a lot of retarded ass shit, like when you click on the "show me stuff near this address" and it first spends a bunch of time loading a page not related to that address, and then after it's done loading it posts your request and then spends a bunch of time loading that. Urg Web 2.0 can bite me.


Basic fucking courtesy when you live in an old building with wood floors :

Put felt pads on your furniture so it doesn't make that horrible scraping sound when you move it, the wood-on-wood squeal is like nails on a chalkboard.

Wear socks or slippers in the house; certainly never wear hard-soled shoes or high heels.

Try to walk with a shuffling step, not a stomp. You can pretty easily just slide your feed and not make any impact sounds at all.


The abbreviation for Land o' Lakes butter at QFC is "LOL BTR" . LOL indeed.


Our landlord is a fucking disaster; he's one of those people who just never has his shit together and is always an hour late to everything and rushing to the next thing, and doesn't do what you tell him unless you keep calling and reminding him, and shows up really late to appointments. I really fucking hate dealing with those people, they inject massive amounts of stress into my life. They make everything seem way harder than it really is, I hate having one in the office.

It's one of those relationships that I really hate. I really want to just fucking bitch this guy out, he's flaked and fucked up so bad on our place, but I need him to do a bunch of repairs and take care of the place, so I want to be on his good side. At the moment I've resisted telling him off but it hurts me.


I want a voice recorder for commuting; I always have tons of ideas in the car and forget them by the time I get to the office; sometimes I try to grab a pen and paper and write things down which is pretty insanely hazardous. It seems kind of retarded to buy a voice recorder these days when you could just use your iPhone or whatever.

It's sad that the speech-to-text movement has died. It definitely has died. 5 years ago it was very high profile, IBM was bragging about Dragon, MS was bragging about this big name research group they were assembling, lots of voice recorders advertised their automatic text capabilities. Now the voice recorders don't advertise that at all, they just present themselves as audio recorders, and in fact a whole mess of listen-and-type-along software has popped up in the wake of the dying speech-to-text behemoth.

The first voice recorder I had years ago came with a light version of the IBM speech-to-text engine, and it was indeed worthless. I spent about 30 minutes reading the training text into it (you read 5 minute chunks, and it kept telling me insufficient recognition, train more), and after that it proceeded to mangle the recognition so badly as to be useless. I had thought that some errors would be no big deal, presumably they would be phonetic errors and I would still get the gist of it and be able to fix it or see what I meant, but in fact the mess that came out was so bad it took minutes of trying to figure out WTF I meant.


I know I'm going to ripped a new one for this, but I think SF has better coffee than Seattle. I know, I know, that's hitting you right in the mommy-daddy buttons. Let me clarify; I think Seattle probably has better roasters (meh?), and Seattle definitely has far better trained baristas, and much higher average skill at making espresso drinks like caps and lattes and whatnot. But I believe the true expression of coffee flavor is in drip coffee, and Seattle just hasn't even woken up to that fact yet. I don't mean the normal shit drip coffee that you still get in Seattle, I mean single-cup hand dripped coffee like I've been doing at home for years, and you can get at Philz or Ritual or Blue Bottle or John Campbell's in SF. (also BTW IMO the Clover machine is not the best drip coffee; it's interesting, but the coffee it makes is too light and fruity; I prefer the strong Philz-style filter-dripped "turkish style" coffee).

BTW despite all the Seattle ripping, I'm really mostly liking it up here. Our neighborhood is pretty super convenient for nice walking. We have two cool parks for hanging, lots of coffee shop, groceries, decent eating, clubs, and the quiet residential streets are a nice change from the constant noise and dirt of SF.


If you Google "Trader Joe's Madison Seattle" it gives you the one in Queen Anne, not the one on Madison. WTF Google. How did you expect me to work for you when you do retarded shit like that?


Homeless are like pigeons; yes, they're an annoying dirty infestation, but you can't really blame them - it's the people who feed them that are to blame. It's really the exact same kind of problem; you rarely see it, but there are retarded weirdos who come out and dump masses of bird seed to the pigeons, and they just multiply to fill the available niche; there must be a parallel for the homeless; we need a "Please don't feed the homeless" ad campaign.

(BTW I'm talking about your Haight Street/Capitol Hill type of homeless here; it's a very different flavor of homeless than say your Tenderloin homeless. The Haight/Cap homeless are mostly kids and drunks who have opted out of jobs because they got off track, but could get back on; if you stopped giving to them, they wouldn't die (mostly), they could get it together. The Loin homeless are another story, they would wind up eating each other or quietly wasting away)


I haven't met any of our apartment neighborhors yet; I spotted one going into the garage to get his bicycle when I was sitting on the front stoop, but he avoided eye contact and dodged me. We've been making a hell of a racket moving in, and the walls and floors here are really thin, I feel guilty.


I am now officially moved to Seattle.

Moved rants to oldrants6. Recent actual techy highlights :

SmoothDriver page

GoldBullion poker AI here

New tabview in ChukSH ; Casey's tabview was like a perfect minimal bicycle, light and fast and elegant; I strapped a noisy inefficient polluting lawnmower engine to it. But mine goes faster ;) (also a bunch of unicode utils in ChukSH)

More : 03/2008 to 08/2008

Back to the Index