cbloom.com



01-09-12 | LZ Optimal Parse with A Star Part 5

Wrapping up the series with lots of numbers.

Previous parts :

cbloom rants 12-17-11 - LZ Optimal Parse with A Star Part 4
cbloom rants 12-17-11 - LZ Optimal Parse with A Star Part 3
cbloom rants 12-17-11 - LZ Optimal Parse with A Star Part 2
cbloom rants 10-24-11 - LZ Optimal Parse with A Star Part 1

I'm delirious with fever right now so I might write something inane, but I'm so bored of lying in bed so I'm trying to wrap this up. Anyhoo..

So first of all we have to talk a bit about what we're comparing the A Star parse to.

"Normal" is a complex forward lazy parse using heuristics to guide parsing, as described in Part 1. "Fast" is like Normal but uses simpler heuristics and simpler match finder.

"Chain" is more interesting. Chain is a complex "lazy"-type parser which considers N decisions ahead (eg. Chain 4 considers 4 decisions ahead). It works thusly :

Chain Parse : first do a full parse of the file using some other parser; this provides a baseline cost to end from each point. Now do a forward parse. At each position, consider all match and literal options. For each option, step ahead by that option and consider all the options at the next position. Add up the cost of each coding step. After N steps (for chain N) add on the cost to end from the first baseline parse. Go back to the original position and finalize the choice with the lowest cost. Basically it's a full graph walk for N steps, then use an estimate of the cost to the end from the final nodes of that sub-graph.
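To make the walk concrete, here is a minimal Python sketch of a Chain N parse on a toy problem. It is an illustration under stated assumptions, not the parser used for the numbers below: the names `chain_parse`, `options_at`, `code_cost` and the toy tables are all hypothetical, the only adaptive state is a single "last offset" that makes a repeated offset cheaper, and the bit costs (9 per literal, 24 per match, 12 per repeat-offset match) are invented. The baseline `cost_to_end` array is hard-coded from a static-cost backward parse of the same toy input.

```python
from math import inf

def chain_parse(n, options_at, code_cost, cost_to_end, depth):
    """Forward parse of n bytes that looks `depth` (>= 1) coding steps ahead.

    options_at(pos) -> list of (length, offset); offset 0 means a literal.
    code_cost(length, offset, state) -> (cost in bits, next adaptive state).
    cost_to_end[pos] -> baseline estimate of the cost to code pos..n-1.
    """
    def lookahead(pos, state, steps_left):
        if pos >= n:
            return 0.0                   # reached the end, nothing left to code
        if steps_left == 0:
            return cost_to_end[pos]      # out of steps: use the baseline estimate
        best = inf
        for length, offset in options_at(pos):
            cost, next_state = code_cost(length, offset, state)
            best = min(best, cost + lookahead(pos + length, next_state, steps_left - 1))
        return best

    pos, state, total, parse = 0, None, 0.0, []
    while pos < n:
        best, best_opt = inf, None
        for length, offset in options_at(pos):
            cost, next_state = code_cost(length, offset, state)
            estimate = cost + lookahead(pos + length, next_state, depth - 1)
            if estimate < best:          # ties keep the first option seen
                best, best_opt = estimate, (length, offset, cost, next_state)
        length, offset, cost, state = best_opt   # finalize only the first step
        parse.append((length, offset))
        total += cost
        pos += length
    return parse, total

# Toy input (hypothetical): 8 bytes, a few matches, literal always available.
LITERAL = (1, 0)
TOY_MATCHES = {0: [(4, 9), (3, 5)], 3: [(4, 5)], 4: [(4, 7)]}

def toy_options(pos):
    return [LITERAL] + TOY_MATCHES.get(pos, [])

def toy_cost(length, offset, last_offset):
    if offset == 0:
        return 9.0, last_offset          # literal: state unchanged
    return (12.0 if offset == last_offset else 24.0), offset

# Static baseline (every match priced at 24 bits), index n holds 0.
TOY_COST_TO_END = [48.0, 51.0, 42.0, 33.0, 24.0, 27.0, 18.0, 9.0, 0.0]

chain1 = chain_parse(8, toy_options, toy_cost, TOY_COST_TO_END, depth=1)
chain2 = chain_parse(8, toy_options, toy_cost, TOY_COST_TO_END, depth=2)
```

On this toy input Chain 1 trusts the static baseline and takes the offset-9 match (48 bits total), while Chain 2 sees that the offset-5 match sets up a cheap repeat offset and finishes in 45 bits.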

To make Chain parsing viable you have to reduce the number of match options to a maximum of 8 or so. Still Chain N has a complexity of 8^N , so it becomes slow very quickly as N grows.

Chain forward parse is significantly better than LZSS style backwards optimal parse for these LZ coders that have important adaptive state. The baseline parse I use for Chain actually is a backwards LZSS optimal parse, so you can see how it does by looking at the "Chain 0" results.
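For reference, a backwards LZSS-style optimal parse with static costs can be sketched in a few lines of Python. This is a generic textbook version, not the baseline parser used here: `backward_optimal_parse`, the `matches` table and the cost values are assumptions for illustration, and real coders have costs that depend on adaptive state, which is exactly what this kind of parse can't see.

```python
def backward_optimal_parse(n, matches, literal_cost, match_cost):
    """Static-cost backward parse over n bytes.

    matches maps pos -> list of (length, offset) candidates at that position.
    match_cost(length, offset) -> cost in bits (assumed independent of state).
    Returns (cost_to_end, choice): cost_to_end[pos] is the cheapest cost to
    code bytes pos..n-1, and choice[pos] is the step that achieves it.
    """
    cost_to_end = [0.0] * (n + 1)        # cost_to_end[n] == 0: nothing left
    choice = [None] * n
    for pos in range(n - 1, -1, -1):     # walk from the end toward the start
        best = literal_cost + cost_to_end[pos + 1]
        best_choice = ("lit", 1, 0)
        for length, offset in matches.get(pos, []):
            if pos + length > n:
                continue                 # match would run past the end
            cost = match_cost(length, offset) + cost_to_end[pos + length]
            if cost < best:
                best, best_choice = cost, ("match", length, offset)
        cost_to_end[pos] = best
        choice[pos] = best_choice
    return cost_to_end, choice

# Hypothetical example: 6 bytes, one length-4 match available at position 1.
costs, choices = backward_optimal_parse(
    6, {1: [(4, 1)]}, literal_cost=9.0, match_cost=lambda length, offset: 20.0)
```

The `cost_to_end` array is the piece the Chain parser reuses as its estimate once it runs out of look-ahead steps.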


First overall results. Chain 6 is the largest number of steps I can run in reasonable time, and AStar 2048 means the quantum length for dividing up the file for AStar was 2048.

file  raw  Fast  Normal  Chain 6  AStar 2048
lzt00 16914 5179 5016 4923 4920
lzt01 200000 198313 198321 198312 198312
lzt02 755121 181109 177792 173220 173315
lzt03 3471552 1746443 1713023 1698949 1690655
lzt04 48649 13088 12412 10407 10249
lzt05 927796 368346 367598 355804 354230
lzt06 563160 352827 351051 344721 343173
lzt07 500000 226533 215996 209133 208566
lzt08 355400 250503 249987 230541 230220
lzt09 786488 302927 287479 268544 265525
lzt10 154624 11508 10958 10307 10291
lzt11 58524 20553 19628 19139 19087
lzt12 164423 29001 26488 23966 23622
lzt13 1041576 935484 931415 924510 922745
lzt14 102400 47690 47298 46417 46350
lzt15 34664 10832 10688 10269 10260
lzt16 21504 10110 10055 9952 9927
lzt17 53161 19526 18514 17971 17970
lzt18 102400 64280 63251 59772 59635
lzt19 768771 322951 288872 269132 269162
lzt20 1179702 888881 872315 856369 855588
lzt21 679936 91677 88011 83529 83184
lzt22 400000 287715 284378 279674 279459
lzt23 1048576 807253 804048 798369 798334
lzt24 3471552 1418076 1411387 1399197 1388105
lzt25 1029744 113085 107882 97320 100175
lzt26 262144 212445 210836 207701 207552
lzt27 857241 237253 235137 222023 220837
lzt28 1591760 332660 308940 260547 252808
lzt29 3953035 1193914 1180823 1147160 1135603
lzt30 100000 100001 100001 100001 100001
total - 10800163 10609600 10337879 10289860


Now the number of chain steps for the Chain parser (U is the raw size, N is Normal, and O0 - O6 are Chain 0 through Chain 6) :

file U N O0 O1 O2 O3 O4 O5 O6
lzt00 16914 5016 5024 4922 4922 4922 4922 4923 4923
lzt01 200000 198321 198321 198312 198312 198312 198312 198312 198312
lzt02 755121 177792 177877 175905 174835 174073 173759 173509 173220
lzt03 3471552 1713023 1712337 1704417 1703873 1702651 1701635 1700282 1698949
lzt04 48649 12412 11315 10516 10481 10457 10427 10416 10407
lzt05 927796 367598 368729 365743 364332 360630 356403 355968 355804
lzt06 563160 351051 350995 346856 345500 344778 344739 344702 344721
lzt07 500000 215996 215644 211336 209481 209259 209244 209138 209133
lzt08 355400 249987 249372 239375 237320 231554 231435 233324 230541
lzt09 786488 287479 284875 280683 275679 270721 269754 269107 268544
lzt10 154624 10958 10792 10367 10335 10330 10311 10301 10307
lzt11 58524 19628 19604 19247 19175 19225 19162 19159 19139
lzt12 164423 26488 25644 24217 24177 24094 24108 24011 23966
lzt13 1041576 931415 931415 929713 927841 926162 924515 924513 924510
lzt14 102400 47298 47300 46518 46483 46461 46437 46429 46417
lzt15 34664 10688 10656 10317 10301 10275 10278 10267 10269
lzt16 21504 10055 10053 9960 9966 9959 9952 9948 9952
lzt17 53161 18514 18549 17971 17970 17974 17971 17973 17971
lzt18 102400 63251 63248 59863 59850 59799 59790 59764 59772
lzt19 768771 288872 281959 277661 273316 269157 269141 269133 269132
lzt20 1179702 872315 872022 868088 865376 863236 859727 856408 856369
lzt21 679936 88011 88068 84848 83851 83733 83674 83599 83529
lzt22 400000 284378 284297 281902 279711 279685 279689 279696 279674
lzt23 1048576 804048 804064 802742 801324 799891 798367 798368 798369
lzt24 3471552 1411387 1410226 1404736 1403314 1402345 1401064 1400193 1399197
lzt25 1029744 107882 107414 99839 100154 99710 98552 98132 97320
lzt26 262144 210836 210855 207775 207763 207738 207725 207706 207701
lzt27 857241 235137 236568 233524 228073 223123 222884 222540 222023
lzt28 1591760 308940 295072 286018 276905 273520 269611 264726 260547
lzt29 3953035 1180823 1183407 1180733 1177854 1170944 1162310 1152482 1147160
lzt30 100000 100001 100001 100001 100001 100001 100001 100001 100001
total - 10609600 10585703 10494105 10448475 10404719 10375899 10355030 10337879

Some notes : up to 6 (the most I can run) more chain steps is better - for the sum, but not for all files. In some cases, more steps is worse, which should never really happen, but it's an issue of approximate optimal parsers I'll discuss later. (*)

On most files, going past 4 chain steps helps very little, but on some files it seems to monotonically keep improving. For example lzt29 stands out. Those files are ones that get helped the most by AStar.


Now the effect of quantum size on AStar. In all cases I only output codes from the first 3/4 of each quantum.
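One way to picture the quantum scheme is as overlapping windows: parse a full quantum, keep the codes for the first 3/4 of it, and restart the next quantum at the point where you stopped keeping codes. The Python sketch below is an assumption about the bookkeeping, not the actual implementation; `quantum_windows` is a hypothetical helper, and it ignores the detail that a real code may straddle the 3/4 mark.

```python
def quantum_windows(file_size, quantum, keep_num=3, keep_den=4):
    """Yield (start, end, commit) byte ranges for a quantized parse.

    Each window [start, end) is parsed as a unit, but only codes covering
    [start, commit) are output; the next window begins at commit.
    """
    start = 0
    while start < file_size:
        end = min(start + quantum, file_size)
        if end == file_size:
            commit = end                 # last window: keep everything
        else:
            commit = start + quantum * keep_num // keep_den
        yield start, end, commit
        start = commit

windows = list(quantum_windows(2000, 1024))
```

With a 1024 quantum each window commits 768 bytes, so the last quarter of every window exists only to give the parser some look-ahead past the codes it actually emits.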

file raw 256 512 1024 2048 4096 8192 16384
lzt00 16914 4923 4923 4920 4920 4920 4921 4921
lzt01 200000 198312 198312 198312 198312 198312 198314 198314
lzt02 755121 175242 173355 173368 173315 173331 173454 173479
lzt03 3471552 1699795 1691530 1690878 1690655 1690594 1690603 1690617
lzt04 48649 10243 10245 10234 10249 10248 10241 10241
lzt05 927796 357166 354629 354235 354230 354233 354242 354257
lzt06 563160 346663 343202 343139 343173 343194 343263 343238
lzt07 500000 209934 208669 208584 208566 208556 208553 208562
lzt08 355400 228389 229447 229975 230220 230300 230374 230408
lzt09 786488 266571 265564 265487 265525 265559 265542 265527
lzt10 154624 10701 10468 10330 10291 10273 10273 10272
lzt11 58524 19139 19123 19096 19087 19085 19084 19084
lzt12 164423 23712 23654 23616 23622 23628 23630 23627
lzt13 1041576 923258 922853 922747 922745 922753 922751 922753
lzt14 102400 46397 46364 46351 46350 46350 46348 46350
lzt15 34664 10376 10272 10260 10260 10251 10258 10254
lzt16 21504 9944 9931 9926 9927 9927 9927 9927
lzt17 53161 17937 17970 17968 17970 17969 17969 17969
lzt18 102400 59703 59613 59632 59635 59637 59640 59640
lzt19 768771 269213 269151 269128 269162 269193 269218 269229
lzt20 1179702 855992 855580 855478 855588 855671 855685 855707
lzt21 679936 83882 83291 83215 83184 83172 83171 83169
lzt22 400000 279803 279368 279414 279459 279605 279630 279647
lzt23 1048576 798325 798319 798321 798334 798354 798357 798358
lzt24 3471552 1393742 1388636 1388031 1388105 1388317 1388628 1388671
lzt25 1029744 97910 101246 101302 100175 100484 100272 100149
lzt26 262144 207779 207563 207541 207552 207559 207577 207576
lzt27 857241 222229 220832 220770 220837 220773 220756 220757
lzt28 1591760 256404 253257 252933 252808 252737 252735 252699
lzt29 3953035 1136193 1135442 1135543 1135603 1135710 1135689 1135713
lzt30 100000 100001 100001 100001 100001 100001 100001 100001
total - 10319878 10292810 10290735 10289860 10290696 10291106 10291116

The best sum is at 2048, but 1024 is a lot faster and almost the same.

Again, as in the previous note at (*), we should really see only improvement with larger quantum sizes, but past 2048 we start seeing it go backwards in some cases.


Lastly a look at where the AStar parse is spending its time. This is for a 1024 quantum.

The x axis here is the log2 of the number of nodes visited to parse a quantum. So, log2=20 means a million nodes were needed to parse that quantum. So for speed purposes a cell one to the right is twice as bad. The values in the cells are the percentage of quanta in the file that needed that number of nodes.

(note : log2=20 means one million nodes were visited to output 768 bytes worth of codes, so it's quite a lot)
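A histogram like this can be built from per-quantum node counts along the following lines. This is only a sketch: `log2_histogram` is a hypothetical helper, and it assumes the bucket is the floor of log2 (whether the table rounds or floors isn't stated).

```python
from collections import Counter

def log2_histogram(node_counts):
    """Map floor(log2(nodes visited)) -> percentage of quanta in that bucket.

    node_counts holds one positive integer per quantum.
    """
    # For a positive int n, n.bit_length() - 1 equals floor(log2(n)).
    buckets = Counter(n.bit_length() - 1 for n in node_counts)
    total = len(node_counts)
    return {b: round(100.0 * c / total, 2) for b, c in sorted(buckets.items())}

example = log2_histogram([300, 500, 1024, 1500])
```

Here two quanta land in the log2=8 bucket (256 to 511 nodes) and two in the log2=10 bucket (1024 to 2047 nodes), so each gets 50 percent.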

log2 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
lzt00 0 0 0 18.18 59.09 18.18 4.55
lzt01 3.75 0.75 41.2 34.08 13.86 5.62 0.75
lzt02 1.81 1.36 25.37 34.09 13.59 13.02 8.15 1.93 0.23 0.23
lzt03 1.46 1.18 17.51 18.46 14.16 13.17 6.95 4.81 3.66 4.81 9.54 2.79 0.96 0.11 0.03
lzt04 1.67 0 0 1.67 0 21.67 5 18.33 3.33 10 16.67 16.67 5
lzt05 0.59 0.25 4.41 10.77 9.92 18.32 13.23 10.09 9.67 6.02 12.47 3.22 0.51 0.08 0.08
lzt06 0.8 0.93 6.81 23.77 14.69 16.96 21.09 11.48 2.67 0.8
lzt07 0.46 0.46 8.66 7.88 6.8 15.3 17 14.53 5.56 9.58 8.19 4.79 0.31 0.31
lzt08 0 0 0 0 0 0 1.68 1.68 1.47 27.67 53.88 11.95 1.68
lzt09 0.29 0.48 0.76 0.86 0.95 3.9 28.07 47.76 16.18 0.38
lzt10 0 0.56 10.17 12.99 9.04 9.04 10.17 41.24 4.52 0.56 1.13
lzt11 0 0 7.89 10.53 14.47 17.11 6.58 9.21 21.05 10.53 2.63
lzt12 0 0 0 0 0 4.27 28.91 59.24 7.58
lzt13 0 0 0.07 0.14 0.57 1.72 3.36 5.72 39.24 42.03 7.08 0.07
lzt14 0 0.83 0 2.5 8.33 34.17 42.5 5 2.5 1.67 0.83 0 0.83
lzt15 0 2.27 4.55 15.91 13.64 15.91 13.64 6.82 11.36 11.36 4.55
lzt16 0 0 3.57 0 14.29 42.86 32.14 3.57
lzt17 1.39 1.39 2.78 1.39 4.17 75 13.89
lzt18 0 0 0 0 0 0.72 0 2.17 2.9 11.59 56.52 23.19 2.9
lzt19 0 0 1.26 2.81 0.39 7.56 87.11 0.87
lzt20 0 0.13 2.08 2.02 4.29 67.07 24.29 0.06
lzt21 0.2 0.78 6.07 6.07 5.28 19.77 35.62 22.9 1.96 0.2 0.2
lzt22 0 0.56 2.98 5.59 26.82 62.94 1.12
lzt23 0 0 0 0 0 0.07 1.35 2.63 0.92 70.88 23.15 0.14 0.36 0.5
lzt24 0.44 0.61 4.14 37.41 7.62 12.68 12.72 8.52 6.11 5.19 3.11 0.94 0.31 0.04
lzt25 0.22 0.43 1.52 1.74 2.68 6.44 15.69 27.19 30.22 13.09 0.72
lzt26 0 0 0 1.15 3.15 2.58 77.65 14.61 0.57
lzt27 0.61 0.1 7.55 6.53 1.22 4.39 5 4.08 7.76 44.8 16.43 1.43
lzt28 0.25 0.1 3.71 0.94 0.74 6.77 15.56 10.08 10.97 14.82 18.68 11.41 4.05 1.24 0.1
lzt29 0.3 0.73 1.61 22.37 5.28 6.16 26.34 2.97 0.48 0.85 19.63 12.47 0.73
lzt30 3.7 0.74 47.41 34.07 12.59 0.74

Well there's no easy answer; the files are all very different in character.

In many cases the A Star parse is reasonably fast (comparable to Chain 3 or something). But in some cases it's quite slow, eg. lzt04, lzt08, lzt28.


Okay, I think that's all the data. We have one point to discuss :

(*) = in all these types of endeavors, we see these anomalies where, as we give the optimizer more space to make decisions, it gets better for a while, then starts getting worse. I saw the same thing, but more extreme, with video coding.

Basically what causes this is that you aren't optimizing for your real final goal. If you were optimizing for the total output size, then giving it more freedom should never hurt. But you aren't. With Chain N or with A Star in both cases you are optimizing just some local portion, and it turns out that if you let it make really aggressive decisions trying to optimize the local bit, that can hurt overall.

A similar issue happens with a Huffman optimal parse, because you are using the Huffman code lengths from the previous parse to do the current parse. That's fine as long as your parse is reasonably similar, but if you let the optimal parser really go nuts, it can start to get pretty far off those statistics, which makes it wrong, so that more optimizing actually gives worse results.

With video coding the main issue I had was that the optimization was generally local (eg. just on one macro block at a time or some such), but it of course affects the future as a source for motion compensation (and in other ways), and it turns out if you do really aggressive optimization on the local decisions, that can wind up hurting overall.

A similar thing can happen in image and video coding if you let optimization proceed very aggressively, because you have to use some simple analytic criterion (such as RMSE - though even if you use a fancier metric the same problems arise). The issue is that the coder can wind up finding strange states that are a good trade-off for RMSE, but wind up looking just horrible visually.

Obviously the correct solution is to optimize with the true final goal in mind. But that's not always possible, either computationally, or because the final goal is subjective.

Generally the solution is to moderate the optimization in some way. You have some heuristic idea of what kind of solutions will provide good globally optimal solutions. (for example, in image/video coding, you might require that the bit rate allocation not create too big of a difference between adjacent blocks). So you sort of want to guide your optimization to start around where you suspect the answer to be, and then you tune it so that you don't allow it to be too aggressive in making whatever decision it thinks is locally optimal.


01-07-12 | Protectionism

There are some basic economics that I just don't understand, and a lot of the time the accepted "right answer" conflicts with common sense.

One example is it seems to me that buying something locally made is better for the local area than buying something made far away. (for "locality" you may substitute "state" or "nation" or whatever region you want to divide things by).

For me this conjures bad memories of the anti-"Jap" "USA USA" crowd of the '80's that had "buy American" bumper stickers and such ; something in my moral fibers says you should buy the best quality cheapest product. But I don't think that's true.

Consider for the moment the case that there are two products, identical in price and quality. One is locally made, one is foreign made. I contend it is better for the local area (and usually better for you personally) to buy the locally made product. When you do that, the money goes to someone who lives nearby, who spends that money again, and that person spends it, again, etc. This makes the local area prosperous.

(in a purely selfish sense, whether or not making the local area prosperous is good for you or not depends on the details of your situation; if you are a merchant or an altruist, it is good for you; but if your business is international and you would prefer local property values to be low, it might be bad for you; we will assume for the moment that you want the local area to benefit).

So there is some value to buying local and keeping money and industry circulating locally.

So, even if the local product is somewhat more expensive, it still might be better overall if you bought that instead of the foreign product. You have to weigh the benefit of both; the region gains some utility from access to cheaper foreign products, but that is traded off against not circulating that money around the local economy. eg. there's some break even point (in terms of overall utility) ; maybe if the local product costs 20% more, that's the actual break even point.

Of course consumers should not have to make that decision themselves, they should just be able to buy the cheapest product. The correct way to fix that is with government - one of the valuable things that government can do is to make apparent price equal to actual price (eg. to make price proportional to utility, or to move long term costs forward, etc.), or to use laws to bias pricing so that logical purchasing decisions lead to the greatest overall utility (eg. putting penalties on products that help you but hurt others).

The obvious way to make the prices match utility here is either with tariffs on imports, or subsidies for local production. This is called "protectionism" to attack it, but it seems to me it's just a way of getting the benefit of circulating those dollars locally.

I'm a little disturbed by my conclusion because it's awfully close to the anti-globalization crackpots who claim that modern government financial policy benefits "wall street not main street" (and other slogans).


Granted, in reality, it's too late to go back to pre-1990's protectionism. The cat is out of the bag. And of course in reality protectionism degrades into political gifts for corrupt corporations. But we can ignore those issues for the theoretical discussion.

Also, if you are an extremely altruistic chap you might question the whole goal of maximizing the benefit to your locality (nation/state/fiefdom/whatever). You might say the goal of policies should be to maximize the good for the world. But for the moment let's ignore that and assume that the government of a nation should act to maximize benefit for that nation.


01-06-12 | Surveying is a Powder Keg

So I got my property surveyed a while ago, because there were some boundaries I wasn't totally sure about, and wanted to see how much space I had for fences, etc.

I should have realized this but did not - surveying is a powder keg. When the surveyor comes out and puts out his stakes and flags, it's like a siren call for neighborhood crazies to come around and dispute the line.

Basically people are retarded and unreasonable, and before just calmly talking to you, they assume you will ask them to take down their fence which is across the line (or whatever). Depending on where you live, the exact force of crazy takes different forms; out in the country you might get shot at; over in the suburbs you might get served with a notice of adverse possession. All just because you hired a surveyor, before you even consider doing anything about it. The crazies seem to see the surveyor's flag as a declaration of war.

The mistake I made was I got the survey and then I wanted some time to think about how I was going to fence things, I didn't just jam up the fences right away. That gave the crazies time, and this is what they did :

(pink flag stake is official from the surveyor of course, and crazy neighbor has put up his own line two feet into my property from the official mark)

Well done, crazy neighbor.

A bit of forehead vein popping and yelling at them got them to remove the post, but I expect more complications of this issue before it is done.

(BTW yelling at people is never satisfying in real life the way it is in fiction. In fiction there's this myth that people are actually good and reasonable, and were just doing something bad for a moment, and when you yell at them they realize their mistake and reform; like if Gordon Ramsay walked in at the right moment and yelled at Hitler he would be like "oh gosh, you're right, I'm so ashamed, I'll try to do better". Furthermore in fiction, they respond to the yelling either by yelling back, giving a satisfying argument, or by accepting the scolding and apologizing. In real life that never happens, what really happens is they try to change the subject, or turn it around and somehow blame you, or make excuses, or bring up random other points that don't matter to the issue (*), and it just leaves you feeling derailed).

(* = this is maybe the most common and most effective response - the completely random diversion into some other story that just leaves me going "WTF?" and totally takes the wind out of my sails. The other super effective tactic that I've noticed from like car salesmen and contractors and such is to just completely stonewall you, like they tack on some $500 extra fee that they specifically agreed they wouldn't do, and you're like "hey WTF is this fee that you said wouldn't be there" and they just act like they did nothing wrong and of course you're going to go along with them, like "yep that's the $500 extra rapeage surcharge, so you can pay now by giving me your bank account and social security number..." , wait, what? I'm in the middle of yelling at you, you can't just act like your way is the only way).


01-06-12 | Nice Wiring Bub

I took off a horrible track light (blyeck, so tacky, track lights and can lights are the worst), and underneath I found this gem :

at first I thought it was just a big wad of masking tape on the end of the wire (bad enough, not an actual insulator, and a fire hazard), but upon peeling the masking tape I found this :

Oh, of course. You spliced on an extra 1 inch of wire, no wire nut, wrapped in masking tape - and the thing that most boggles my mind is that the wire ends are not even twisted in any kind of sane way, they are just randomly balled up around each other. Not to mention the original wires are plenty long without the splice.

Pretty impressive piece of fail.

BTW one of the hazards of old knob & tube wires is that the insulation is only rated to 60 C, but newer light fixtures are allowed to heat up to 90 C (which new Romex can handle). So you need to be careful when installing new light fixtures, and at the very least don't over-bulb (*). One way to solve this (without a ton of rewiring) is to back up the knob and tube a few feet away, put a junction box there, then run new Romex for the last few feet.

(* = I just love to over-bulb ; I can never get enough light; I used to put 100 W's in everything whenever I moved into an apartment. Here I might be a bit more careful about that, because they do generate a lot more heat (BTW I despise the gross inhumane light of CFL's, but one advantage of them with old wiring is they draw much less power (which keeps the wires cool) and they themselves are cool (which doesn't heat up the light fixture and box)). I'm not a huge fan of dimmers (fucking 75 W is already dark enough, I don't need any less than that), but if I could install an *amplifier* that let me over-drive the bulbs that would be sweet (but not in my house, which is apparently wired by paper clips and masking tape)).

It's kind of scary what kinds of disasters can be hiding inside your walls that you don't know about upon purchase (or maybe ever, until they cause water damage or a fire or whatever). I really like doing home improvement work in the garage and the basement, because the walls are unfinished so I can see where the studs are, which is so handy, I can see all the wire runs and junction boxes. It's totally superior.

Covering up your walls is super over-rated. I think if I was designing a modern house it would be all Pompidou Center style with color-coded pipes running around where I could directly access the electricity, water, etc.

If you want a more old fashioned home look, you could still do all your major wire runs around the ceiling and then cover them with a removable wood crown molding piece. That way if you want to get into the wires, you just pop off the crown molding and you have a wooden box for access.


01-04-12 | Two laws you should hate

NDAA : makes legal the GWB/Obama policy of indefinite detainment (outside of war zone, Geneva convention, or any legal jurisdiction), and unilateral assassination orders -

» Obama’s Signing Statement on NDAA I have the power to detain Americans… but I won’t Alex Jones' Infowars There's a war on
The NDAA's historic assault on American liberty Jonathan Turley Comment is free guardian.co.uk
The Hit List The Public Applauds As President Obama Kills Two Citizens As A Presidential Prerogative « JONATHAN TURLEY
Senate Votes Overwhelmingly To Allow Indefinite Detention of Citizens « JONATHAN TURLEY

SOPA : basically makes free speech on the internet impossible, by making site hosts legally liable for any content posted on them. Basically allows private companies to censor the internet at will.

Stop Online Piracy Act - Wikipedia, the free encyclopedia
SOPA for Dummies - Google Docs
House takes Senate's bad Internet censorship bill, tries making it worse

SOPA might not pass in its worst form, but some lobbyists are pushing very hard for something like this, so the internet is going to get censored unless we fight it very hard.


01-04-12 | Double Pane Glass is a scam

"Replacement Windows" are shit sold by the window industry to sucker homeowners. The tiny gap (typically less than 1 inch) in a standard double pane IGU (integrate glass unit) is no better (and sometimes worse) than a traditional window + storm.

Throwing out perfectly good lovely old windows for "environmental" reasons is of course retarded; if you want more air proofing and don't already have storm windows, just get some good storms and you're done.

Replacement windows are almost always uglier than good old wood windows, which architecturally fit the house and have nice wavy old glass.

Furthermore, they cannot be maintained and repaired in the same way as an old window + storm. When an IGU fails (which they do in 10-20 years typically) it cannot be repaired, it has to be replaced. Old wood windows can be easily taken apart, cleaned, resealed, and can last 100 years. (Vinyl windows and caulk and foam seal strips and so on are similarly problematic - they seem great at first, but they all decay badly in sun and weather, so have to be replaced regularly and can't really be maintained).

Replacement windows are usually shoved inside the existing window framing, and add their own thin frame which makes the opening smaller and adds an extra ugly architectural detail.

It's a standard "sustainable" bullshit corporate ripoff. To sell you some new crap and get you to throw away your perfectly good old windows. (it's like the wonderful irony of "sustainable christmas trees").

This guy addresses the issue in much more detail.


01-04-12 | Police Brutality

Much has been made of the outrageous treatment of Occupy protestors by police. But I believe it's been small potatoes compared to the rampant, systemic brutality which pervades our nation's police departments. It's fallen out of the news because we're bored of it (we've seen black guys getting beaten by cops a million times) and because it doesn't affect the wealthy, but it has really not gotten better (or not enough, anyway).

Police in America are de facto above the law. They violate human rights at will, with rarely a punishment greater than suspension or transfer.

Here in Seattle things have gotten shockingly bad, so bad in fact that even the DOJ has made an official report about how bad our police department is. (original here) . What is Seattle's official response? Not to do anything about it, it's to question the methodology of the report. Shameful.

To Protect & Skull-Fuck - Page 1 - News - Seattle - Seattle Weekly
SPDecay - Page 1 - News - Seattle - Seattle Weekly
Seattle sues attorney over public records request Local & Regional Seattle News, Weather, Sports, Breaking News KOMO News
Seattle Police Department Sued by KOMO News for Not Releasing Dash-Cam Videos - Seattle News - The Daily Weekly
Seattle Police A Department in Denial - Page 1 - News - Seattle - Seattle Weekly

One of the few ways that people are getting any justice these days is by getting the dash cam footage to prove that the cops' lies are in fact lies. (see for example Ian Birk lying about John T. Williams "lunging" at him (eerily similar to the very tragic case of Otto Zehm in Spokane in which officers also lied and claimed he "lunged" (with a soda pop bottle, which led them to kill him))). The result of course is that SPD is doing what it can to stop the dash cam system. They are now suing to stop releases of footage under public records disclosures, and have "accidentally" deleted many thousands of hours of footage.

Police Chief Diaz needs to be fired.

But in the larger picture, the big problem is the stonewall and loyalty attitude of police departments; that when there is a case where a police officer may have killed a civilian without cause, their attitude is not to investigate and apologize, it's to cover it up, close ranks, support the officer, etc. This attitude makes not just the few bad cops responsible, but every cop who treats his compatriots as beyond reproach or above the law. Loyalty to evil is not admirable. (ask Joe Paterno).

Following a rash of unjustified killings in the 80's, many laws were passed that make it somewhat more difficult for police officers to use their guns. But the gap has been filled by stompings, clubbings, and taserings.

spokane police abuses summary

previous post at cbloomrants

Of course part of the problem is that there is a decent portion of the population that thinks "tough policing" is a good thing.


12-20-11 | Grocery Store Lines

Grocery store lines are a microcosm for how fucking shitty almost every human is, in almost every way.

First of all, you have the fact that nobody shows any basic human courtesy to each other. I almost never see someone with a ton of items let ahead the person with one item. But of course when I let the person with one item go ahead of me, they invariably do something fucked up like ask for a pack of cigarettes (which always takes forever in US grocery stores) or pay with food stamps or some shit. (aside : why is it always such a clusterfuck to pay with food stamps in some groceries? they must have done it a million times, but the checker always acts like someone handed them monopoly money, and the manager has to get called, and forms are filled out, wtf). Of course the people who are paying with coupons and civil war scrip never warn the person lining up behind them that maybe they should pick a different line.

But when the lines get long you really start to see people's souls.

There's the people who stand around and chat right in the middle of the lines. I watch people over and over asking "are you in line? oh, no? okay". Hmm, maybe you should get the fuck out of the line area to have your chit chat!

Then there's the people who can't seem to run a line in a reasonable direction and wind up blocking all the aisles or running it into another line. Invariably it takes a manager to come over and tell people to "please line up over here" since god knows they aren't going to sort themselves out.

Then you get the people who start stamping around and huffing and quickly looking from one side to another like this is the greatest injustice since slavery. You can just see wheels spinning in their heads about how "ridiculous this is" and so on.

There are the people who think that being really pushy in the back of the line is going to speed things up. We're eight people away from the register and they keep jamming their cart into my feet because the person three ahead of us moved. I get out of line to grab a magazine (leaving my cart) and they push into the gap where my body was. Whoah, slow down dick-face, we're still twenty feet from the register, you can chill a little now.

On the flip side are the people who are absurdly slow about getting their checkout done with. (and of course to double-down on dickishness, it's often the same people who were impatient and pushy when they were way back in line (*)).

There are two general classes of people who fail to check out quickly :

1. The epically incompetent. These people pay with a check, either because they are ancient geezers (excusable) or because they think cards are somehow inferior or checks are more convenient (inexcusable). They might go digging around in their purse for half an hour trying to find exact change, or somehow still don't know they can scan their card before the checker is done.

2. The intentionally slow. These people think everyone needs to chill and slow down; what's the rush? They might chat with the checker a bit. They think everyone else is rotten for being in such a hurry. OMG you epic douchebag; it's fine if you want to live a slow, relaxed, life, but it's not okay to impose that on everyone behind you in line. You probably drive slowly too and think that everyone behind you is in the wrong for wanting to go faster. You probably have a "keep your laws off my body" bumper sticker, and fail to see that your own behavior is the same kind of selfish forcing of your values on others.

(* = the double-dick seems to be the norm for airplane passengers, who are inevitably annoying and pushy and do a lot of huffing when they are way back in line, but when they actually get up to the TSA guy they still have their shoes on, don't have their id in hand, are drinking a bunch of liquid, and act like it's some big surprise. Same thing with the overhead bin stowage of course).


12-19-11 | SRAM

SRAM is rolling out a big promotional campaign this year, trying to convince people that their components are actually superior.

They've signed up lots of the pro teams. Just as with car racing, do not be misled by what the pros use. I see lots of morons on forums saying "well the race team uses this, it must be great". No, the reason the race team uses it is because they are paid to use it.

SRAM double-tap shifters are fucking *awful*. Absolutely retarded. Imagine your right mouse button was taken away and instead you had to double-tap the left to accomplish that function. Yep, it's horrible.

Double-press GUI is always horrible and always should only be used as a last resort. We use it sometimes in games because there just aren't enough buttons on console controllers, but a smart game designer knows that only the secondary functions should go on double-tap buttons (the same goes for "hold" buttons) and the twitch functions should be on their own dedicated button.

Actually it's even worse than that. They don't do the right thing when you're at the edge of the gear shift limits. So like if you are at the low end, you can't go any lower (which you would accomplish by double-tapping), it will still let you single tap (to up-shift). So you're riding up a steep hill and you want a lower gear, you go to double-tap, and oh fuck half way through it the lever won't let you do the double-tap, but you've already single-tapped. There's no way to back out of it, when you let go it will up-shift you and you'll be fucked.

The STI system is just one million billion times better. But it's patented. That's why all these shift levers are so dang expensive, because they're patented.

It's also why there has to be a new lever system every year, a new size of bottom bracket, a new headset system - it's so that the manufacturer can patent it and/or make an exclusive line so that they can rip you off. The old system was perfectly fine functionally, the problem with it was that generic brands were starting to come out with cheap decent components for that system. We can't have that.

It's just like medicine of course, though with medicine it's much more diabolical.

Certainly with medicine it's obvious that there should be laws that prevent the pointless pushing of the new expensive product when it's not actually any better than cheap old solutions.

But I think it would actually be in the world's best interest to have a similar law for everything. It would be hard to phrase and hard to enforce, but the idea is something like - you must make components that are compatible with others on the market unless the incompatibility is for a necessary functional reason. It's actually much better for the free market and competition if products can plug and play and the consumer can choose based on price and functionality, not compatibility with some bullshit proprietary interface.

One that annoys me is car parts; most of the car parts for a Porsche or BMW or whatever are actually identical to the ones for a cheaper car (like a VW for example) - but they intentionally make the interface ever so slightly different so that you can't just go buy the cheaper part. The parts are all made by Bosch or whichever major parts supplier anyway, it's not like you get a better brand of part for the money. The interesting thing to me is that the car maker doesn't really benefit from this, it's the part maker who does, so there must be some kind of collusion where the car maker gets a kickback in exchange for using the proprietary part.

Maybe the most obvious example is car wheels. Wheels are wheels, there's no need for them to be car specific, but the auto manufacturers intentionally use different bolt spacings (5x130, 4x110, etc) so that you can't go buy cheap mass market wheels for your fancy car. You can cross-shop the exact same wheel with different bolt spacings, and the price difference can be 2X or more.


12-17-11 | LZ Optimal Parse with A Star Part 4

Continuing ...
Part 1
Part 2
Part 3

So we have our A star parse from last time.

First of all, when we "early out" we still actually fill out that hash_node. That is, you pop a certain "arrival", then you evaluate the early out conditions and decide this arrival is not worth pursuing. You need to make a hash_node and mark it as a dead end, so that when you pop earlier arrivals that see this node, they won't try to visit it again.

One option would be to use a separate hash of just bools that mark dead ends. This could be a super-efficient smaller hash table of bit flags or bloom filters or something, which would save memory and perhaps speed.

I didn't do this because you can get some win from considering parses that have been "early outed". What you do is when you decide to "early out" an arrival, you will not walk to any future nodes that are not yet done, but you *will* consider paths that go to nodes that were already there. In pseudo-code :


pop an arrival

check arrival early outs and just set a flag

for all coding choices at current pos
{
  find next_node
  if next_node exists
    compute cost to end
  else
    if ! early out flag
       push next_node on arrivals stack
}

So the early out stops you from creating any new nodes in the graph walk that you wouldn't have visited anyway, but you can still find new connections through that graph. What this lets you do in practice is drive the early out thresholds tighter.
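The loop above can be sketched in C++ roughly like this. It's only a sketch under assumptions: `NodeKey`, `Step`, `ExpandResult` and `expand_arrival` are names I made up for illustration, the hash is a plain `std::unordered_map` holding only finished nodes, and the re-push of the current node and the dead-end marking from the real parse are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Hypothetical packed {pos,state} key and one coding choice leading to it.
typedef uint64_t NodeKey;
struct Step { NodeKey next_node; int64_t bits; };

struct ExpandResult
{
    bool    found_path;    // false => caller would record this node as a dead end
    int64_t cost_to_tail;  // best (bits + next.cost_to_tail) over finished successors
    int     num_pushed;    // how many new arrivals were created
};

// Walk all coding choices of one popped arrival.
// Finished successors are always scored; unfinished ones are only pushed
// when the arrival was NOT flagged as an early out.
ExpandResult expand_arrival(const std::vector<Step>& steps,
                            const std::unordered_map<NodeKey, int64_t>& cost_to_tail,
                            bool early_out,
                            std::vector<NodeKey>& arrivals)
{
    ExpandResult r = { false, std::numeric_limits<int64_t>::max(), 0 };
    for (const Step& s : steps)
    {
        assert(s.bits >= 0);
        auto it = cost_to_tail.find(s.next_node);
        if (it != cost_to_tail.end())
        {
            // next_node exists : compute cost to end through it
            const int64_t c = s.bits + it->second;
            if (c < r.cost_to_tail) { r.cost_to_tail = c; r.found_path = true; }
        }
        else if (!early_out)
        {
            // next_node not done yet and we still like this arrival : walk to it later
            arrivals.push_back(s.next_node);
            r.num_pushed++;
        }
    }
    return r;
}
```

With the flag set, the function never grows the arrivals stack, but it still returns a real cost whenever some successor was already finished, which is the "new connections through the existing graph" effect.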

The other subtlety is that it helps a lot to actually have two (or more) stages of early out. Rather than just stop considering all exit coding choices once you don't like your arrival, you have a couple of levels. If your arrival looks sort of bad but not terrible, then you still consider some of the coding choices. Instead of considering 8 or 16 coding choices, you reduce it to 2 or 4 which you believe are likely advantageous.

The exact details depend on the structure of your back end coder, but some examples of "likely advantageous" coding choices that you would consider in the intermediate early out case : if you have a "repeat recent offset" structure like LZX/LZMA, then those are obvious things to include. Another one might be RLE or continue-previous type of match codes. Another would be the literal, if it codes below a certain number of bits with the current statistics. Also the longest match if it's longer than a certain amount.

Okay, so our A star is working now, but we have a problem. We're still just not getting enough early outs, and if you run this on a big file it will take forever (sometimes).
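As a rough illustration of the intermediate level, here is a sketch of a choice filter. Everything in it is an assumption for the example: the `Choice` fields, the `EarlyOutLevel` enum, and the two tuning parameters `cheap_literal_bits` and `long_match_len` are hypothetical, and a real coder would have its own notion of which codes are "likely advantageous".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class EarlyOutLevel { None, Partial, Full };

// Hypothetical description of one coding choice at the current position.
struct Choice
{
    bool is_literal;
    bool is_repeat_offset;  // LZX/LZMA style "repeat recent offset" match
    int  len;               // bytes advanced
    int  bits;              // estimated cost with the current statistics
};

// None    : consider everything.
// Partial : keep only repeat-offset matches, cheap literals, and the longest
//           match when it is long enough.
// Full    : consider nothing.
std::vector<Choice> choices_to_consider(const std::vector<Choice>& all,
                                        EarlyOutLevel level,
                                        int cheap_literal_bits,
                                        int long_match_len)
{
    assert(cheap_literal_bits >= 0 && long_match_len >= 0);
    if (level == EarlyOutLevel::None) return all;
    std::vector<Choice> kept;
    if (level == EarlyOutLevel::Full) return kept;

    // index of the longest match, or all.size() if there is no match at all
    size_t longest = all.size();
    for (size_t i = 0; i < all.size(); i++)
        if (!all[i].is_literal && (longest == all.size() || all[i].len > all[longest].len))
            longest = i;

    for (size_t i = 0; i < all.size(); i++)
    {
        const Choice& c = all[i];
        const bool keep = c.is_repeat_offset
                       || (c.is_literal && c.bits < cheap_literal_bits)
                       || (i == longest && c.len >= long_match_len);
        if (keep) kept.push_back(c);
    }
    return kept;
}
```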

The solution is to use another aspect we expect from our LZ back end, which is "semi-locality". Locality means that a decision we make now will not have a huge effect way in the future. Yes, it has some effect, because it may change the state and that affects the future, but over time the state changes so many times and adapts to future coding that the decision 4000 bytes ago doesn't matter all that much.

Another key point is that the bad (slow) case occurs when there are lots of parses that cost approximately the same. Because of our early out structure, if there is a really good cheap parse we will generally converge towards it, and then the other choices will be more expensive and they will early out and we won't consider too many paths. We only get into bad degeneracy if there are lots of parses with similar cost. And the thing is, in that case we really don't care which one we pick. So when we find an area of the file that has a huge branching factor that's hard to make a decision about, we are imperfect but it doesn't cost us much overall.

The result is that we can cut up the file to make the parse space tractable. What I do is work in "quanta". You take the current chunk of the file as your quantum and parse it as if it was its own little file. The parse at the beginning of the quantum will be mostly unaffected by the quantum cut, but the parse at the end will be highly affected by the false EOF, so you just throw it out. That is, advance through the first 50% or 75% of the parse, and then start the next quantum there.

There is one special case for the quantum cutting which is long matches that extend past the end of the quantum. What you would see is when outputting the first 50% of the parse, the last code will be a match that goes to the end of the quantum. Instead I just output the full length of the match. This is not ideal but the loss is negligible.
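Here is a sketch of the quantum loop, including the long-match special case. The names are made up for illustration: `parse_chunk` stands in for whatever parser you run on one quantum, and the `full_len` field is my assumption for how the parser would report a match that was clipped at the quantum end. I commit the first 75% here; 50% would work the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Code
{
    size_t len;       // length as parsed inside the quantum
    size_t full_len;  // true match length, possibly running past the quantum end
};

// Hypothetical per-quantum parser : returns the codes covering [begin,end).
typedef std::function<std::vector<Code>(size_t begin, size_t end)> ChunkParser;

// Returns the lengths of the codes that get committed for the whole file.
std::vector<size_t> parse_in_quanta(size_t file_len, size_t quantum_len,
                                    const ChunkParser& parse_chunk)
{
    assert(quantum_len > 0);
    std::vector<size_t> out_lens;
    size_t pos = 0;
    while (pos < file_len)
    {
        const size_t begin = pos;
        const size_t end = std::min(begin + quantum_len, file_len);
        const std::vector<Code> codes = parse_chunk(begin, end);

        // the tail of the quantum is polluted by the false EOF, so only
        // commit the first 75% (unless this quantum reaches the real EOF)
        size_t keep_until = end;
        if (end != file_len)
            keep_until = std::max(begin + 1, begin + (end - begin) * 3 / 4);

        for (const Code& c : codes)
        {
            if (pos >= keep_until) break;
            assert(c.len > 0 && pos + c.len <= end);
            size_t emit = c.len;
            // special case : a committed match that runs to the quantum end
            // was clipped by the cut, so output its full length instead
            if (pos + c.len == end && c.full_len > c.len)
                emit = std::min(c.full_len, file_len - pos);
            out_lens.push_back(emit);
            pos += emit;
        }
        assert(pos > begin); // the parser must make progress
    }
    return out_lens;
}
```

The next quantum simply starts wherever the last committed code ended, so nothing else needs to know about the cut.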

For speed you can go even further and use adaptive quantum lens. On highly degenerate parts of the file, there may be a huge node space to parse that doesn't get early-out'ed. When you detect one of these, you can just reduce the quantum len for that part of the file. eg. you start with a quantum length of 4096 ; if as you are parsing that quantum you find that the hash table occupancy is beyond some threshold (like 1 million nodes for example), you decide the branching factor is too great and reduce the quantum length to 2048 and resume parsing on just the beginning of that chunk. You might hit 1 million nodes again, then you reduce to 1024, etc.
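The halving logic is tiny, but for concreteness here is a sketch. `try_parse` is a hypothetical callback that parses `len` bytes starting at `begin` and returns false if the hash table went past `max_nodes`; `min_len` is an assumed floor below which you stop shrinking and just accept whatever the parse costs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical budgeted parser : true if the quantum was parsed within max_nodes.
typedef std::function<bool(size_t begin, size_t len, size_t max_nodes)> BudgetedParser;

// Returns the quantum length that was finally used for the chunk at 'begin'.
size_t parse_with_adaptive_quantum(size_t begin, size_t initial_len, size_t min_len,
                                   size_t max_nodes, const BudgetedParser& try_parse)
{
    assert(min_len > 0 && initial_len >= min_len);
    size_t len = initial_len;
    // halve the quantum each time the node budget is exceeded
    while (!try_parse(begin, len, max_nodes) && len > min_len)
        len = std::max(min_len, len / 2);
    return len;
}
```

Starting from 4096 with a 1 million node budget this tries 4096, 2048, 1024, and so on, which matches the example above.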

That's it! Probably a followup post with some results numbers and maybe some more notes about subtle issues. I could also do several long posts about ideas I tried that didn't work which I think are sort of interesting.


12-17-11 | LZ Optimal Parse with A Star Part 3

Continuing ...
Part 1
Part 2

At the end of Part 2 we looked at how to do a forward LZSS optimal parse. Now we're going to add adaptive "state" to the mix.

Each node in the walk of parses represents a certain {Pos,State} pair. There are now too many possible nodes to store them all, so we can't just use an array to store all {Pos,State} nodes we have visited. Hopefully we will not visit them all, so we will store the ones we do visit in a hash table.

We are parsing forward, so for any node we visit (a {Pos,State} will be called a "node") we know how we got there. There can be many ways of reaching the same node, but we only care about the cheapest one. So we only need to store one entering link into each node, and the total cost from the beginning of the path to get to that node.

If you think about the flow of how the forward LZSS parse completes, it's sort of like an ice tendril reaching out which then suddenly crystallizes. You start at the beginning and you are always pushing the longest length choice first - that is, you are taking big steps into the parse towards the end without filling in all the gaps. Once you get to the end with that first long path (which is actually the greedy parse - the parse made by taking the longest match available at each step), then it starts popping backwards and filling in all the gaps. It then does all the dense work, filling backwards towards the beginning.

So it's like the parse goes in two directions - reaching from the beginning to get to the end (with nodes that don't have enough information), and then densely bubbling back from the end (and making final decisions). (if I was less lazy I would make a video of this).

Anyhoo, we'll make that structure more explicit. The hash table, for each node, stores the cost to get to the end from that node, and the coding choice that gives that cost.

The forward parse uses entry links, which I will henceforth call "arrivals". This is a destination node (a {pos,state}), and the cost from the beginning. (you don't need to store how you got here from the beginning since that can be reproduced at the end by rewalking from the beginning).


Full cost of parse through this node =

arrival.cost_from_head + hash_node.cost_to_tail

Once a node has a cost in the hash table, it is done, because it had all the information it needed at that node. But more arrivals can come in later as we fill in the gaps, so the full cost from the beginning of the parse to the end of the parse is not known.

Okay, so let's start looking at the parse, based on our simple LZSS pseudo-code from last time :


hash table of node-to-end costs starts empty
stack of arrivals from head starts empty

Push {Pos 1,state initial} on stack of arrivals

While stack is not empty :

pop stack; gives you an arrival to node {P,state}

see if node {P,state} is already in the hash
if so
{
  total cost is arrival.cost_from_head + hash_node.cost_to_tail
  done with this arrival
  continue (back to stack popping);
}

For each coding choice {C} at the current pos
{
  find next_state = state transition from cur state after coding choice C
  next_pos = P + C.len
  next_node = {next_pos,next_state}

  if next_node is in the hash table :
  {
    compute cost to end from code cost of {C} plus next_node.cost_to_tail
  }
  else
  {
    push next_node to the arrivals stack (*1)
  }
}

if no pushes were done
{
  then processing of current node is done
  choose the best cost to end from the choices above
  create a node {P,state} in the hash with that cost
}

(*1 = if any pushes are done, then the current node is also repushed first (before other pushes). The pushes should be done in order from lowest pos to highest pos, just as with LZSS, so that the deep walk is done first).

So, we have a parse, but it's walking every node, which is way too many. Currently this is a full graph walk. What we need are some early outs to avoid walking the whole thing.

The key is to use our intuition about LZ parsing a bit. Because we step deep first, we quickly get one parse for the whole segment (the greedy parse). Then we start stepping back and considering variations on that parse.

The parse doesn't collapse the way it did with LZSS because of the presence of state. That is, say I parsed to the end and now I'm bubbling back and I get back to some pos P. I already walked the long length, so I'm going to consider a shorter one. When I walk to the shorter one with LZSS, the positions I need would already be done. But now, the nodes aren't done, but importantly the positions have been visited. That is -


At pos P, state S
many future node positions are already done
 (I already walked the longest match length forward)

eg. maybe {P+3, S1} and {P+5, S2} and {P+7, S3} have been done

I consider a shorter length now; eg. to {P+2,S4}

from there I consider {P+5, S5}

the node is not done, but a different state at P+5 was done.

If the state didn't matter, we would be able to reuse that node and collapse back to O(N) like LZSS.

Now of course state does matter, but crucially it doesn't matter *that much*. In particular, there is sort of a limit on how much it can help.

Consider for example if "state" is some semi-adaptive statistics. Those statistics are adaptive, so if you go far enough into the future, the state will adapt to the coding parse, and the initial state won't have helped that much. So maybe the initial state helps a lot for the next 8 coding steps. And maybe it helps at most 4 bits each time. Then having a better initial state can help at most 32 bits.

When you see that some other parse has been through this same position P, albeit with different state at this position, if that parse has completed and has a total cost, then we know it is the optimal cost through that node, not just the greedy parse or whatever. That is, whenever a hash node has a cost_to_tail, it is the optimal parse cost to tail. If there is a good parse later on in the file, the optimal parse is going to find that parse, even if it starts from a non-ideal state.

This is the form of our early outs :


When you pop an arrival to node {P,S} , look at the best cost to arrive to pos P for any state, 

if arrival.cost_from_head - best_cost_from_head[P] > threshold
  -> early out

if arrival.cost_from_head + best_cost_to_tail[P] > best_cost_total + threshold
  -> early out

where we've introduced two arrays that track the best seen cost from head and to tail at each pos, regardless of state. We also keep a best total cost, which is initially set to infinity until we get through a total parse, and then is updated any time we see a new whole-walk cost.
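The two tests can be written out as a small C++ function. This is a sketch under assumptions: the struct and function names are mine, costs are in bits, and I use a large sentinel `kInfinity` for "not seen yet" so that a test is skipped when its table entry is still unknown. With the illustrative numbers from above (a better state helping at most 4 bits for 8 steps) the threshold would be 32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinel for "no cost seen yet"; kept well below the max to avoid overflow.
const int64_t kInfinity = std::numeric_limits<int64_t>::max() / 4;

// Per-position bests, regardless of state (hypothetical layout).
struct EarlyOutTables
{
    std::vector<int64_t> best_cost_from_head;
    std::vector<int64_t> best_cost_to_tail;
    int64_t              best_cost_total;
};

// True if an arrival at 'pos' with this cost from head is not worth pursuing.
bool should_early_out(const EarlyOutTables& t, size_t pos,
                      int64_t arrival_cost_from_head, int64_t threshold)
{
    assert(pos < t.best_cost_from_head.size() && pos < t.best_cost_to_tail.size());

    // test 1 : some other arrival reached this pos much more cheaply
    const int64_t from_head = t.best_cost_from_head[pos];
    if (from_head < kInfinity && arrival_cost_from_head - from_head > threshold)
        return true;

    // test 2 : even with the best known tail, we can't get near the best total
    const int64_t to_tail = t.best_cost_to_tail[pos];
    if (to_tail < kInfinity && t.best_cost_total < kInfinity &&
        arrival_cost_from_head + to_tail > t.best_cost_total + threshold)
        return true;

    return false;
}
```

Until a first whole parse has completed, `best_cost_total` is still infinity and only the first test can fire.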

This is just A star. From each node we are trying to find a lower bound for the cost to get to the end. What we use is previous encodings from that position to the end, and we assume that starting from a different state can't help more than some amount.

Next time, some subtleties.


12-17-11 | LZ Optimal Parse with A Star Part 2

Okay, optimal parsing with A star. (BTW "optimal" parsing here is really a misnomer that goes back to the LZSS backwards parse where it really was optimal; with a non-trivial coder you can't really do an optimal parse, we really mean "more optimal" (than greedy/lazy type parses)).

Part 1 was just a warmup, but may get you in the mood.

The reason for using A Star is to handle LZ parsing when you have adaptive state. The state changes as you step through the parse forward, so it's hard to deal with this in an LZSS style backwards parse. See some previous notes on backwards parsing and LZ here : 1 , 2 , 3

So, the "state" of the coder is something like maybe an adaptive statistical model, maybe the LZMA "markov chain" state machine variable, maybe an LZX style recent offset cache (also used in LZMA). I will assume that the state can be packed into a not too huge size, maybe 32 bytes or so, but that the count of states is too large to just try them all (eg. more than 256 states). (*1)

(*1 - in the case that you can collapse the entire state of the coder into a reasonably small number of states (256 or so) then different approaches can be used; perhaps more on this some day; but basically any adaptive statistical state or recent offset makes the state space too large for this).

Trying all parses is impossible even for the tiniest of files. At each position you have something like 1-16 options. (actually sometimes more than 16, but you can limit the choices without much penalty (*2)). You always have the choice of a literal, when you have a match there are typically several offsets, and several lengths per offset to consider. If the state of the coder is changed by the parse choice, then you have to consider different offsets even if they code to the same number of bits in the current decision, because they affect the state in the future.

(*2 - the details of this depend on the back end of the coder. For example, if your offset coder is very simple, something like just Golomb type (NOSB) coding, then you know that only the shortest offset for a given length needs to be considered. Another simplification, used in LZMA, is that only the longest length for a given offset is considered. In some coders it helps to consider shorter length choices as well; in general for a match of length L you need to consider all lengths in [2,L], but in practice you can reduce that large set by picking a few "inflection points" (perhaps more on this some day)).

Okay, a few more generalities. Let's revisit the LZSS backwards optimal parser. It came from a forward style parser, which we can implement with "dynamic programming" ; like this :


At pos P , consider the set of possible coding choices {C}

For each choice (ci), find the cost of the choice, plus the cost after that choice :
{

  Cost to end [ci] = Current cost of choice C [ci] + Best cost to end [ P + C[ci].len ]

}

choose ci as best Cost to end
Best cost to end[ P ] = Cost to end [ best ci ]

You may note that if you do this walking forward, then the "Best cost to end" at the next position may not be computed yet. If so, then you suspend the current computation and step ahead to do that, then eventually come back and finish the current decision.

Of course with LZSS the simpler way to do it is just to parse backwards from the end, because that ensures the future costs are already done when you need them. But let's stick with the forward parse because we need to introduce adaptive state.

The forward parse LZSS (with no state) is still O(N) just like the backward parse (this time cost assumes the string matching is free or previously done, and that you consider a fixed number of match choices, not proportional to the number of matches or length of matches, which would ruin the O(N) property) - it just requires more book keeping.

In full detail a forward LZSS looks like this :


Set "best cost to end" for all positions to "uncomputed"

Push Pos 1 on stack of needed positions.

While stack is not empty :

pop stack; gives you a pos P

If any of the positions that I need ( P + C.len ) are not done :
{
  push self (P) back on stack
  push all positions ( P + C.len ) on stack
    in order from lowest to highest pos
}
else
{
  make a choice as above and fill "best cost to end" at pos P
}

If you could not make a choice the first time you visit pos P, then because of the order that we push things on the stack, when you come back and pop P the second time it's guaranteed that everything needed is done. Therefore each position is visited at most twice. Therefore it's still O(N).

We push from lowest to highest len, so that the pops are highest pos first. This makes us do later positions first; that way earlier positions are more likely to have everything they need already done.
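For reference, the stack-based forward LZSS above might look like this in C++. It's a sketch with made-up names: `Choice` and `forward_lzss_costs` are hypothetical, positions are 0-based, the per-position choice lists are assumed to be precomputed and sorted by increasing len, and costs are fixed (no state). One small deviation from the pseudo-code: positions that are already done are not pushed again, and a pop of an already-done position is just skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical coding choice : advance 'len' bytes at a cost of 'bits'.
struct Choice { size_t len; uint64_t bits; };

const uint64_t kUncomputed = std::numeric_limits<uint64_t>::max();

// choices[p] = options at position p, sorted by increasing len.
// Returns best cost to end per position; best_choice[p] is the winning index.
std::vector<uint64_t> forward_lzss_costs(const std::vector<std::vector<Choice> >& choices,
                                         std::vector<int>& best_choice)
{
    const size_t n = choices.size();
    std::vector<uint64_t> cost_to_end(n + 1, kUncomputed);
    cost_to_end[n] = 0;  // nothing left to code at the end of the file
    best_choice.assign(n, -1);

    std::vector<size_t> stack;
    if (n > 0) stack.push_back(0);
    while (!stack.empty())
    {
        const size_t p = stack.back();
        stack.pop_back();
        if (cost_to_end[p] != kUncomputed) continue;  // finished via another path
        assert(!choices[p].empty());                  // e.g. a literal is always possible

        bool all_done = true;
        for (const Choice& c : choices[p])
        {
            assert(c.len >= 1 && p + c.len <= n);
            if (cost_to_end[p + c.len] == kUncomputed) all_done = false;
        }

        if (!all_done)
        {
            // suspend : re-push self, then needed positions lowest to highest,
            // so the highest pos is popped (and finished) first
            stack.push_back(p);
            for (const Choice& c : choices[p])
                if (cost_to_end[p + c.len] == kUncomputed)
                    stack.push_back(p + c.len);
        }
        else
        {
            for (size_t i = 0; i < choices[p].size(); i++)
            {
                const uint64_t cost = choices[p][i].bits + cost_to_end[p + choices[p][i].len];
                if (cost < cost_to_end[p]) { cost_to_end[p] = cost; best_choice[p] = (int)i; }
            }
        }
    }
    return cost_to_end;
}
```

Positions that no choice ever lands on stay at `kUncomputed`, which is the forward parse's one practical difference from the backward one: it only does work for positions that are actually reachable.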

Of course with LZSS this is silly, you should just go backwards, but we'll use it to inspire the next step.

To be continued...


12-12-11 | Things I want and cannot find

1. A true "sauna" near Seattle. A hot room right next to a lake, so I can steam it up and then go jump in the cold lake. We're in the perfect place for it, we have lots of nice cold swimmable lakes, and there are tons of Swedes around here, and yet I can't find one.

There are plenty of "saunas" at spas and health clubs, but without the lake swim it's fucking bullshit. I imagine some rich guy has one at his lakefront home, but I can't get the hookup.

2. A secluded cabin rental. I like the idea of going out in the woods and writing code alone. I can wear some flannel and chop wood for the fire like a real Northwesterner. But all the cabin rentals I can find are in sort of "cabin communities" or near a highway, or some shit. I want a place where you can look out of the big picture window and just see scenery for miles.

3. A good river swim. I found a bunch in CA but I can't find any up here. An ideal river swim has a nice deep "hole" due to rocks or waterfall or something. It should be a 4-5 mile hike from the closest parking to cut down on traffic (or be up a rough road or something, not just right off a highway). Ideally it should not be straight out of snow melt so that it's not ball-breaking freezing cold even in the middle of summer.

4. A nice place to ride. A country lane, no cars, good pavement. God damn I miss this.


12-12-11 | Sense

One of the most important skills in an employee is the sense to know when to ask for help and when not to. To know when they should just make a decision on their own vs. ask to make sure their choice is okay. To know when they need to call a meeting about something vs. when not to disturb others about it.

It's incredibly rare actually to find someone who has the sense to get this just right. I think it's very undervalued.

When you're a manager, the most awesome thing you can have is an employee you can trust. That means no unpleasant surprises. If you give them a task, it will be done on time, or you will be notified with enough notice to take action. You won't find out that they're slipping when it's too late to do anything about it. You won't have them claim to be done and then upon inspection find out that they've done it all wrong. You can just assign the task off and then you don't have to worry about it any more. You don't have to follow up and keep pinging them for status updates.

Someone with a great deal of sense will just know to give you status updates at the appropriate intervals. Not too often that they waste your time, but not too infrequently - they should always come in just before you start wondering "WTF happened to this task?".

One of the most crucial things is knowing what decisions they need to get approval for. It sucks to have an employee who asks about every little thing. "should I put this button here or here? should I make another file for this code or put it in this file?" Just make a fucking decision yourself, I don't care! But it also sucks to have someone go off and do all kinds of crazy shit without asking, like "oh yeah I ripped out the old animation system and am doing a new one" ; uh, you did what? and you didn't ask me first? WTF. Both are very common.

Of course the definition of the "right amount of approval" depends on the manager, and a key part of having good "sense" is actually social adaptation - it's about adapting to your situation and learning what is wanted of you. Many of the type-A left-brain coders never get this; part of your job as an employee is always interacting with other human beings, even if it's only with your boss, and there is no rational absolute answer about the right way to communicate, you have to feel it out and adapt.

Of course part of the role of a good manager is to teach these things, and to help people who may have good skills but not much "sense".

It's actually more annoying in personal life than in business life. For example you're having a dinner party and somebody volunteers to bring the wine, and then they show up with none, or they show up with a box of ripple. WTF dude, I could have just gotten it myself, if you're going to drop the ball, you need to notify someone with sufficient warning.

The annoying thing about the non-business world is you can't check up on them; like "hey can you give me a status update on that wine purchasing?" because you would be considered a huge dick.


A lot of this goes along with what I call "basic professionalism". Like if I assign you a crucial task that I need done today, don't go home without checking in with me and telling me it's done or not. If you think I assigned you too much and you can't get it done in time, don't go pout, come and tell me about it.

Another aspect of "basic professionalism" is knowing when to shut up. Like if you think the company is going in the wrong direction - raise the issue to your managers, that's good, if you have a good boss they want that feedback. But after they call a meeting and everyone disagrees with you and the decision is made to go on the path you don't like - it's time to shut up about it. We don't want to hear complaints every day.

A related aspect is knowing who it's appropriate to say things to. When we have someone from the publisher touring the studio, that is not the time to point out that you don't like the design of the lead character.

"Basic professionalism" is sort of a level below having good "sense" but it's also actually surprisingly hard to find.


One of the worst situations is to have someone who is not great about "sense" or "basic professionalism" but is touchy about it. Most people are not perfect on these points, and that's okay, but if you're not then you need a certain amount of supervision. That's just the way work gets done, but some people act like it's a personal affront to be monitored.

Like they occasionally drop the ball on tasks, so you decide, okay I just have to ask for daily status reports. Then they get all pissy about it, "don't you trust me" or it's "too much bureaucracy" blah blah.

Or if they don't come to you and ask questions at the appropriate time, then you have to pre-screen all their approaches. Like sometimes you assign them a task and they'll just go off and start doing it wrong without saying anything. Now what you have to do is when you assign a task you have to say "can you tell me how you're going to approach this?" to make sure they don't say something nutso.


12-09-11 | Kittens

We want to get a kitten (since we have a stable house now), and I would like to just get a kitten from a home but WTF they don't exist any more.

When I was a kid, every couple of weeks some family in the neighborhood would have kittens and put out a sign. You could go to their house and see the kittens play and pick one. You could see if they were coming from a good home where they got socialized. You could see how old they were to know they weren't separated from their mom at too young an age.

There just isn't any of this anymore. It seems to have all been corporatized into kitten adoption centers.

Yeah yeah yeah you should adopt a deformed adult cat that drools and has mange. No thanks. Part of the reason why I can't just find a normal home to adopt from is all the pet-adoption-nazis make it so that you can't use craigslist (or whatever) to find pets.

Also all the adoption agencies have strict spay/neuter at time of adoption rules. When I was a kid when we would get a cat sometimes we would not spay right away so that we could get a batch of kittens and keep one and give the rest away. It was delightful to have a line of generations and a bunch of kittens to play with. That tradition seems to be all gone. The result is that the babies only come from strays or weirdos outside the system. It's sort of like if all law abiding citizens were castrated, then the only children in the world would be from criminals.

Boo.


Also, in other cat news, it turns out our professional cat sitter grossly overfed our cat while we were away in Hawaii. This despite verbal and written instructions on the correct amount to feed her. So she's sickly obese on our return.

How fucking hard is it to follow basic instructions? Jesus christ, I'm trying not to be a rich old crank, but I can't help thinking things like "it's so hard to find good help" and that the poor are poor because they're fucking retarded. (*). Half a cup means fucking half a cup, not "oh, I'll feed her until she stops eating like she's starving". You are the fucking help, you don't get to make your own decisions when I gave you specific orders.

What makes it even worse is that she (the cat sitter) gave us the usual condescending "I know so much about cats" bullshit when we interviewed her. Hey lady, I've watched an episode of the Dog Whisperer too, I'm not impressed by your amateur pet psychiatry.

* = you would think that it should be easy to find someone who could like get your groceries for you, or build you a fence, or pay your bills, or whatever, but it's actually really hard. It's amazing how badly the average person will fuck up the most basic assignments. To get someone that is smart enough that you can trust them to do those things, you have to hire someone in the top 1% of intellects, someone who could make $100k a year. It's actually sort of easy to hire someone really smart who costs a lot of money, and it's easy to hire someone who is just manual labor that you have to constantly supervise, but to hire someone in between that you can trust enough not to supervise but doesn't cost a fortune is hard because people are epic fuck ups.


12-08-11 | Some Semaphores

In case you don't agree with Boost that Semaphore is too "error prone" , or if you don't agree with C++0x that semaphore is unnecessary because it can be implemented from condition_var (do I need to point out why that is ridiculous reasoning for a library writer?) - here are some semaphores for you.

I've posted a fastsemaphore before, but here's a more complete version that can wrap a base semaphore.


template< typename t_base_sem >
class fastsemaphore_t
{
private:
    t_base_sem m_base_sem;
    atomic<int> m_count;

public:
    fastsemaphore_t(int count = 0)
    :   m_count(count)
    {
        RL_ASSERT(count > -1);
    }

    ~fastsemaphore_t()
    {
    }

    void post()
    {
        if (m_count($).fetch_add(1,mo_acq_rel) < 0)
        {
            m_base_sem.post();
        }
    }

    void post(int count)
    {
        int prev = m_count($).fetch_add(count,mo_acq_rel);
        if ( prev < 0)
        {
            int num_waiters = -prev;
            int num_to_wake = MIN(num_waiters,count);
            // use N-wake if available in base sem :
            // m_base_sem.post(num_to_wake);
            for(int i=0;i<num_to_wake;i++)
            {
                m_base_sem.post();
            }
        }
    }
    
    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        int c = m_count($).load(mo_acquire);
        while ( c > 0 )
        {
            if ( m_count($).compare_exchange_weak(c,c-1,mo_acq_rel) )
                return true;
            // c was reloaded
            // backoff here optional
        }
        return false;
    }
        
    void wait_no_spin()
    {
        if (m_count($).fetch_add(-1,mo_acq_rel) < 1)
        {
            m_base_sem.wait();
        }
    }
    
    void wait()
    {
        int spin_count = 1; // ! set this for your system
        while(spin_count--)
        {
            if ( try_wait() ) 
                return;
        }
        
        wait_no_spin();
    }
    
    
    int debug_get_count() { return m_count($).load(); }
};

When m_count is negative, its magnitude is the number of waiters (plus or minus threads that are about to wait, or about to be woken).

Personally I think the base semaphore that fastsem wraps should just be your OS semaphore and don't worry about it. It only gets invoked for thread wake/sleep so who cares.

But you can easily make Semaphore from CondVar and then put fastsemaphore on top of that. (note the semaphore from condvar wake N is not awesome because CV typically doesn't provide wake N, only wake 1 or wake all).
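For reference, a minimal sketch of what a semaphore from CondVar could look like in standard C++. The class name `semaphore_from_condvar` is made up for illustration, and this is just one plausible way to write it, not the version I actually use. Note how the wake-N path has to fall back to notify_all :

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical base semaphore built from a mutex + condition variable.
// Intended to sit under a fastsemaphore wrapper, so the slow path is all it handles.
class semaphore_from_condvar
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_count;

public:
    explicit semaphore_from_condvar(int count = 0) : m_count(count)
    {
        assert(count >= 0);
    }

    void post()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_count;
        }
        m_cv.notify_one();
    }

    // wake N : the CV has no notify_n, so for N > 1 we use notify_all,
    // which may wake more threads than there is count for
    void post(int count)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_count += count;
        }
        if (count == 1) m_cv.notify_one();
        else            m_cv.notify_all();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        // the predicate form re-checks the count after spurious wakeups
        m_cv.wait(lock, [this] { return m_count > 0; });
        --m_count;
    }

    bool try_wait()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_count <= 0) return false;
        --m_count;
        return true;
    }
};
```

Threads woken by notify_all that find no count just go back to sleep inside the predicate wait, which is correct but wasteful.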

Wrapping fastsem around NT's Keyed Events is particularly trivial because of the semantics of the Keyed Event Release. NtReleaseKeyedEvent blocks until there is a waiter to wake if no one is waiting yet. I've noted in the past that a Win32 event is a lot like a semaphore with a max count of 1 ; a problem with building a Semaphore from a normal Event would be that if you Set it when it's already Set, you effectively run into the max count and lose your Set, but this is impossible with KeyedEvent. With KeyedEvent you get exactly one wake from Wait for each Release.

So, if we wrap up keyed_event for convenience :


struct keyed_event
{
    HANDLE  m_keyedEvent;

    enum { WAITKEY_SHIFT = 1 };

    keyed_event()
    {
        NtCreateKeyedEvent(&m_keyedEvent,EVENT_ALL_ACCESS,NULL,0);
    }
    ~keyed_event()
    {
        CloseHandle(m_keyedEvent);
    }

    void wait(intptr_t key)
    {
        RL_ASSERT( (key&1) == 0 );
        NtWaitForKeyedEvent(m_keyedEvent,(PVOID)(key),FALSE,NULL);
    }

    void post(intptr_t key)
    {
        RL_ASSERT( (key&1) == 0 );
        NtReleaseKeyedEvent(m_keyedEvent,(PVOID)(key),FALSE,NULL);
    }
};

Then the base sem from KE is trivial :


struct base_semaphore_from_keyed_event
{
    keyed_event ke;

    base_semaphore_from_keyed_event() { }
    ~base_semaphore_from_keyed_event() { }
    
    // keyed_event exposes post/wait taking an intptr_t key
    void post() { ke.post((intptr_t)this); }
    void wait() { ke.wait((intptr_t)this); }
};

(note this is a silly way to use KE just for testing purposes; in practice it would be shared, not one per sem - that's sort of the whole point of KE).

(note that you don't ever use this base_sem directly, you use it with a fastsemaphore wrapper).

I also revisited the semaphore_from_waitset that I talked about a few posts ago. The best I can come up with is something like this :


class semaphore_from_waitset
{
    waitset_simple m_waitset;
    std::atomic<int> m_count;

public:
    semaphore_from_waitset(int count = 0)
    :   m_waitset(), m_count(count) // same order as the member declarations
    {
        RL_ASSERT(count >= 0);
    }

    ~semaphore_from_waitset()
    {
    }

public:
    void post()
    {
        m_count($).fetch_add(1,mo_acq_rel);
        m_waitset.notify_one();
    }

    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        int c = m_count($).load(mo_acquire);
        while ( c > 0 )
        {
            if ( m_count($).compare_exchange_weak(c,c-1,mo_acq_rel) )
                return true;
            // c was reloaded
        }
        return false;
    }

    void wait(wait_thread_context * cntx)
    {
        for(;;)
        {
            // could spin a few times on this :
            if ( try_wait() )
                return;
    
            // no count available, get ready to wait
            waiter w(cntx);
            m_waitset.prepare_wait(&w);
            
            // double check :
            if ( try_wait() )
            {
                // (*1)
                m_waitset.retire_wait(&w);
                // pass on the notify :
                int signalled = w.flag($).load(mo_acquire);
                if ( signalled )
                    m_waitset.notify_one();
                return;
            }
            
            w.wait();
            m_waitset.retire_wait(&w);
            // loop and try again
        }
    }
    
    void wait()
    {
        wait_thread_context cntx;
        wait(&cntx);
    }
};

The funny bit is at (*1). Recall before we talked about a race that can happen if two threads post and two other threads pop. If one of the poppers gets through to *1 , it dec'ed the sem but is still in the waitset, one pusher might then signal this thread, which is a wasted signal, and the other waiter will not get a signal, and you have a "deadlock" (not a true deadlock, but an unexpected permanent sleep, which I will henceforth call a deadlock).

You can fix that by detecting if you received a signal while you were in the waitset. That's what's done here now. While it is not completely ideal from a performance perspective, it's a rare race case, and even when it happens the penalty is small. I still don't recommend using semaphore_from_waitset unless you have a comprehensive waitset-based system.

(note that in practice you would never make a wait_thread_context on the stack as in the example code ; if you have a waitset-based system it would be in the TLS)

Another note :

I have mentioned before the idea of "direct handoff" semaphores. That is, making it such that thread wakeup implies you get to dec count. For example "base_semaphore_from_keyed_event" above is a direct-handoff semaphore. This is as opposed to "optimistic" semaphores, in which the wakeup just means "you *might* get to dec count" and then you have to try_wait again when you wake up.

Direct handoff is neat because it guarantees a minimum number of thread wakeups - you never wake up a thread which then fails to dec count. But they are in fact not awesome. The problem is that you essentially have some of your semaphore count tied up in limbo while the thread wakeup is happening (which is not a trivial amount of time).

The scenario is like this :


1. thread 1 does a sem.wait

2. thread 2 does a sem.post 
  the sem is "direct handoff" the count is given to thread 1
  thread 1 starts to wake up

3. thread 3 (or thread 2) now decides it can do some consuming
  and tries a sem.wait
  there is no sem count so it goes to sleep

4. thread 1 wakes up and processes its received count

You have actually increased latency to process the message posted by the sem, by the time between steps 3 and 4.

Basically by not pre-deciding who will get the sem count, you leave the opportunity for someone else to get it sooner, and sooner is better.

Finally let's have a gander at the Linux sem : sem_post and sem_wait

If we strip away some of the gunk, it's just :


sem_post()
{

    atomic_add( & sem->value , 1);

    atomic_full_barrier (); // (*1)

    int w = sem->nwaiters; // (*2)

    if ( w > 0 )
    {
        futex_wake( & sem->value, 1 );  // wake 1
    }

}

sem_wait()
{
    if ( try_wait() ) return;

    atomic_add( & sem->nwaiters , 1);

    for(;;)
    {
        if ( try_wait() ) break;

        futex_wait( & sem->value, 0 ); // wait if sem value == 0
    }

    atomic_add( & sem->nwaiters , -1);
}

Some quick notes : I believe the barrier at (*1) is unnecessary ; they should be doing an acq_rel inc on sem->value instead. However, as noted in the previous post about "producer-consumer" failures, if your producer is not strongly synchronized it's possible that this barrier helps hide/prevent bugs. Also at (*2) in the code they load nwaiters with plain C which is very sloppy; you should always load lock-free shared variables with an explicit load() call that specifies memory ordering. I believe the ordering constraint there is the load of nwaiters needs to stay after the store to value; the easiest way is to make the inc on value be an RMW acq_rel.

The similarity with waitset should be obvious, but I'll make it super-clear :


sem_post()
{

    atomic_add( & sem->value , 1);
    atomic_full_barrier ();

    // waitset.notify_one :
    {
        int w = sem->nwaiters;
        if ( w > 0 )
        {
            futex_wake( & sem->value, 1 );  // wake 1
        }
    }
}

sem_wait()
{
    if ( try_wait() ) return;

    // waitset.prepare_wait :
    atomic_add( & sem->nwaiters , 1);

    for(;;)
    {
        // standard double-check :
        if ( try_wait() ) break;

        // waitset.wait()
        // (*3)
        futex_wait( & sem->value, 0 ); // wait if sem value == 0
    }

    // waitset.retire_wait :
    atomic_add( & sem->nwaiters , -1);
}

It's exactly the same, but with one key difference at *3 - the wait does not happen if count is not zero, which means we never consume a wakeup from futex_wake that we don't need. This removes the need for the re-pass that we had in the waitset semaphore.

This futex semaphore is fine, but you could reduce the number of atomic ops by storing count & waiters in one word.
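A rough sketch of the one-word idea, assuming a 64-bit atomic with count in the low 32 bits and waiters in the high 32 bits. The struct and its names are hypothetical, and the futex calls are left out; post() just reports whether a wake would be needed. A real futex version would also have to wait on the 32-bit half that holds the count, which this sketch ignores :

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical packed state : low 32 bits = count, high 32 bits = waiters.
// Assumes count never overflows 32 bits (that would carry into waiters).
struct packed_sem_state
{
    std::atomic<uint64_t> m_state{0};

    static uint32_t count_of(uint64_t s)   { return (uint32_t)(s & 0xFFFFFFFFu); }
    static uint32_t waiters_of(uint64_t s) { return (uint32_t)(s >> 32); }

    // one RMW does the inc and reads the waiter count;
    // returns true if the caller should wake someone (futex_wake in the real thing)
    bool post()
    {
        uint64_t prev = m_state.fetch_add(1, std::memory_order_acq_rel);
        return waiters_of(prev) > 0;
    }

    bool try_wait()
    {
        uint64_t s = m_state.load(std::memory_order_acquire);
        while (count_of(s) > 0)
        {
            if (m_state.compare_exchange_weak(s, s - 1,
                    std::memory_order_acq_rel, std::memory_order_acquire))
                return true;
            // s was reloaded
        }
        return false;
    }

    // prepare_wait / retire_wait equivalents
    void add_waiter()    { m_state.fetch_add(uint64_t(1) << 32, std::memory_order_acq_rel); }
    void remove_waiter() { m_state.fetch_sub(uint64_t(1) << 32, std::memory_order_acq_rel); }
};
```

With this layout the post side is a single atomic op instead of an add, a barrier and a load.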


12-05-11 | Surprising Producer-Consumer Failures

I run into these a lot, so let's have a quick glance at why they happen.

You're trying to do something like :


Thread1 :

Produce 1
sem.post

Thread2 :

Produce 2
sem.post

Thread 3 :

sem.wait
Consume 1

Thread 4 :

sem.wait
Consume 2

and we assert that the Consume succeeds in both cases. Produce/Consume use a queue or some other kind of lock-free communication structure.

Why can this fail ?

1. A too-weak semaphore . Assuming our Produce and Consume are lock-free and not necessarily synchronized on a single variable with something strong like an acq_rel RMW op, we are relying on the semaphore to synchronize publication.

That is, in this model we assume that the semaphore has something like an "m_count" internal variable, and that both post and wait do an acq_rel RMW on that single variable. You could certainly make a correct counting semaphore which does not have this behavior - it would be correct in the sense of controlling thread flow, but it would not provide the additional behavior of providing a memory ordering sync point.

You usually have something like :


Produce :
store X = A
sem.post // sync point B

Consume:
sem.wait // sync point B
load X  // <- expect to see A

you expect the consume to get what was made in the produce, but that is only guaranteed if the sem post/wait acts as a memory sync point.

There are two reasons I say sem should act like it has an internal "m_count" which is acq_rel , not just release at post and acquire at wait as you might think. One is you want sem.wait to act like a #StoreLoad, so that the loads which occur after it in the Consume will see preceding stores in the Produce. An RMW acq_rel is one way to get a #StoreLoad. The other is that by using an RMW acq_rel on a single variable (or behaving as if you do), it creates a total order on modifications to that variable. For example if T3 sees T1.post and T2.post and then does its T3.wait , T4 cannot see T1.post T3.wait T4.wait or any other funny order.

Obviously if you're using an OS semaphore you aren't worrying about this, but there are lots of cases where you use this pattern with something "semaphore-like" , such as maybe "eventcount".

2. You're on POSIX and forget that sem.wait has spurious wakeups on POSIX (sem_wait can fail with EINTR when a signal handler interrupts it). Oops.

3. Your queue can temporarily appear smaller than it really is.

Say, as a toy example, adding a node is done something like this :


new_node->next = NULL;

old_head = queue->head($).exchange( new_node );
// (*)
new_node->next = old_head;

There is a moment at (*) where you have truncated the queue down to 1 element. Until you fix the next pointer, the queue has been made to appear smaller than it should be. So pop might not get the items it expects to get.

This looks like a bad way to do a queue, but actually lots of lock free queues have this property in more or less obvious ways. Either the Push or the Pop can temporarily make the queue appear to be smaller than it really is. (for example a common pattern is to have a dummy node, and if Pop takes off the dummy node, it pushes it back on and tries again, but this causes the queue to appear one item smaller than it really is for a while).

If you loop, you should find the item that you expected in the queue. However, this is a nasty form of looping because it's not just due to contention on a variable; if in the example above the thread is swapped out while it sits at point (*), then nobody can make progress on this queue until that thread gets time.

The result I find is that ensuring that waking from sem.wait always implies there is an item ready to pop is not worth the trouble. You can do it in isolated cases but you have to be very careful. A much easier solution is to loop on the pop.
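To be concrete about "loop on the pop", here is a minimal sketch. `toy_queue` and `pop_after_wait` are made-up names, and the queue is a mutex-protected stand-in for whatever lock-free queue you actually have; the only thing assumed about the real queue is a try_pop that can transiently fail :

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>

// Stand-in queue (hypothetical); a real lock-free queue would go here.
template <typename T>
class toy_queue
{
    std::mutex m_mutex;
    std::deque<T> m_items;

public:
    void push(const T & v)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push_back(v);
    }

    bool try_pop(T * out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.empty()) return false;
        *out = m_items.front();
        m_items.pop_front();
        return true;
    }
};

// Call this after sem.wait() has returned : the sem count says an item
// exists, but the queue may not show it yet, so keep trying.
// Returns the number of failed attempts, for debugging.
template <typename t_queue, typename T>
int pop_after_wait(t_queue & q, T * out)
{
    int retries = 0;
    while (!q.try_pop(out))
    {
        ++retries;
        std::this_thread::yield(); // give the stalled pusher a chance to run
    }
    return retries;
}
```

The loop terminates only because the semaphore count promises that an item is coming; without that it would spin forever on an empty queue.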


12-03-11 | RAD - Hawaii Branch

It's a pretty nice place to work. The ergonomics of the picnic table are not half bad actually. Very glad I brought my keyboard; wish the laptop screen was bigger.


12-03-11 | Worker Thread system with reverse dependencies

In the previous episode we looked at a system for doing work with dependencies.

That system is okay; I believe it works, but it has two disadvantages : 1. It requires some non-standard synchronization primitives such as OR waits, and 2. There is a way that it can fail to do work as soon as possible; that is, there is the possibility for moments when work could be done but the worker that could do it is asleep. It's one of our design goals to not let that happen so let's see why it happens :

The problem basically is the NR (not ready) queue. When we have no RTR (ready to run) work, we popped one item from the NR queue and waited on its dependencies. But there could be other items later in the NR queue which become ready sooner. If the items in the NR queue become ready to run in order, this doesn't occur, but if they can become ready in different orders, we could miss out on chances to do work.

Anyhoo, both of these problems go away and everything becomes much simpler if we reformulate our system in terms of "forward dependencies" instead of "backward dependencies".

Normal "dependencies" are backwards; that is, A depends on B and C, which were created earlier in time. The opposite direction link I will call "permits" (is there a standard term for this?). That is, B and C permit A. A needs 2 permissions before it can run.

I propose that it is conceptually easier to set up work in terms of "dependencies", so the client still formulates work items with dependencies, but when they are submitted to be run, they are converted into "permissions". That is, A --> {B,C} is changed into B --> {A} and C --> {A}.

The main difference is that there is no longer any "not ready" queue at all. NR items are not held in any global list, they are only pointed to by their dependencies. Some dependency back in the tree should be ready to run, and it will then be the root that points through various NR items via permission links.

With no further ado, let's look at the implementation.

The worker thread becomes much simpler :


worker :

wait( RTR_sem );

pop RTR_queue and do work

that's it! Massively simpler. All the work is now in the permissions maintenance, so let's look at that :

How do we maintain permissions? Each item which is NR (not ready) has a (negative) count of the # of permissions needed before it can run. Whenever an item finishes, it walks its permission list and incs the permit count on the target item. When the count reaches zero, all permissions are done and the item can now run.

A work item now has to have a list of permissions. In my old system I had just a fixed size array for dependencies; I found that [3] was always enough; it's simply the nature of work that you rarely need lots of dependencies (and in the very rare cases that you do need more than 3, you can create a dummy item which only marks itself complete when many others are done). But this is not true for permissions, there can be many on one item.

For example, a common case is you do a big IO, and then spawn lots of work on that buffer. You might have 32 work items which depend on the IO. This only needs [1] when expressed as dependencies, but [32] when expressed as permissions. So a fixed size array is out and we will use a linked list.

The maintenance looks like this :


submit item for work :

void submit( work_item * wi , work_item * deps[] , int num_deps )
{

    wi->permits = - num_deps;

    if ( num_deps == 0 )
    {
        RTR_queue.push( wi );
        RTR_sem.post();
        return;
    }

    for(int i=0;i<num_deps;i++)
    {
        deps[i]->lock();

        if ( ! deps[i]->is_done )
        {
            deps[i]->permits_list.push( wi );
        }
        else
        {
            int prev = wi->permits.fetch_add(1); // needs to be atomic
            if ( prev == -1 ) // permitted (do this also if num_deps == 0)
            {
                RTR_queue.push( wi );
                RTR_sem.post();
            }
        }

        deps[i]->unlock();
    }

}


when an item is completed :

void complete( work_item * wi )
{
    wi->lock();

    set wi->is_done

    swap wi->permits_list to local permits_list

    wi->unlock();

    for each p in permits_list
    {
        int prev = p->permits.fetch_add(1);

        if ( prev == -1 )
        {
            // p is now permitted

            RTR_queue.push( p );
            RTR_sem.post();
        }
    }
}

the result is that when you submit not-ready items, they go into the permits list somewhere, then as their dependencies get done their permits count inc up towards zero, when it hits zero they go into the RTR queue and get picked up by a worker.
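Here is a compilable sketch of the same maintenance in standard C++. It's an illustration under some assumptions : each item carries its own std::mutex, the permits list is a std::vector rather than a linked list, and `rtr_queue_t` is a stand-in that just records ready items (a real system would also post RTR_sem in push) :

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>
#include <vector>

struct work_item
{
    std::mutex lock;                       // guards is_done and permits_list
    bool is_done = false;
    std::vector<work_item *> permits_list; // items that this one permits
    std::atomic<int> permits{0};           // negative : # of permissions still needed
};

// Stand-in for RTR_queue + RTR_sem (hypothetical).
struct rtr_queue_t
{
    std::mutex lock;
    std::deque<work_item *> items;

    void push(work_item * wi)
    {
        std::lock_guard<std::mutex> g(lock);
        items.push_back(wi); // real code would also do RTR_sem.post()
    }
};

static rtr_queue_t RTR_queue;

// inc the permit count; the thread that takes it from -1 to 0 makes the item RTR
static void add_permit(work_item * wi)
{
    int prev = wi->permits.fetch_add(1, std::memory_order_acq_rel);
    if (prev == -1)
        RTR_queue.push(wi);
}

void submit(work_item * wi, work_item * const deps[], int num_deps)
{
    wi->permits.store(-num_deps, std::memory_order_relaxed);

    if (num_deps == 0)
    {
        RTR_queue.push(wi);
        return;
    }

    for (int i = 0; i < num_deps; i++)
    {
        std::lock_guard<std::mutex> g(deps[i]->lock);
        if (!deps[i]->is_done)
            deps[i]->permits_list.push_back(wi); // dep will permit us when it completes
        else
            add_permit(wi);                      // dep already done, take the permit now
    }
}

void complete(work_item * wi)
{
    std::vector<work_item *> local;
    {
        std::lock_guard<std::mutex> g(wi->lock);
        wi->is_done = true;
        local.swap(wi->permits_list);
    }
    for (work_item * p : local)
        add_permit(p);
}
```

For example, submitting A with deps {B,C} leaves A out of the RTR queue until both complete(B) and complete(C) have run; whichever one does the final inc is the one that pushes A.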

The behavior is entirely the same as the previous system except that workers who are asleep because they have no RTR work can wake up when any NR item becomes RTR, not just when the single one they popped becomes RTR.

One annoyance with this scheme is you need to lock the item to maintain the permits_list ; that's not really a big deal (I use an indexed lock system similar to Nt Keyed Events, I don't actually put a lock object on each item), but I think it's possible to maintain that list correctly and simply lock free, so maybe we'll revisit that.

ADDENDUM : hmm , not easy to do lock free. Actually maintaining the list is not hard, and even doing it and avoiding races against the permitted count is not hard, the problem is that the list is in the work item and items can be deleted at any time, so you either need to hold a lock on the item to prevent deletion, or you need something like RCU or SMR.


12-02-11 | Natural Expression

It's so nice when you find the "natural" way to express a coding problem. All of a sudden everything becomes so much simpler and the answers just start popping out at you. Like oh, and I can do this here, and this automatically happens just the way I wanted. Tons of old code just disappears that was trying to solve the problem in the "un-natural" way.

It doesn't change the code; in the end it all becomes assembly language and it can do the same thing, but changing the way you write it can change the way you think about it. Also when you find a simple, elegant way to express things, it sort of makes it feel "right", whereas if you are getting the same thing done through a series of kludges and mess, it feels horrible, even though they are accomplishing the same thing.

It reminds me of physics. I think some of the greatest discoveries of the past century in physics were not actually discoveries of any phenomenon, but just ways to write the physics down. In particular I cite Dirac's Bra-Ket notation and Feynman's path integrals.

Neither one added any new physics. If you look at it in a "positivist" view point, they did nothing - the actual observable predictions were the same. The physics all existed in the equations which were already known. But they opened up a new understanding, and just made it so much more natural and easier to work with the equations, and that can actually have huge consequences.

Dirac's bra ket for example made it clear that quantum mechanics was about Hilbert spaces and Operators. Transformation between different basis spaces became a powerful tool, and very useful and elegant things like raising and lowering operators popped out. Quantum mechanics at the time was sort of controversial (morons like Einstein were still questioning it), and finding a clear elegant solid way to write it down made it seem more reasonable. (physicists have a semi-irrational distrust of any physical laws that are very complicated or vague or difficult to compute with; they also have a superstition that if a physical law can be written in a very concise way, it must be true; eg. when you write Maxwell's equations as d*F = J).

Feynman's path integrals came along just at a time when Quantum Field Theory was in crisis; there were all these infinities which made the theory impossible to calculate with. There were some successful computations, and it just seemed like the right way to extend QM to fields, so people were forging ahead, but these infinities made it an incomplete (and possibly wrong) theory. The path integral didn't solve this, but it made it much easier to see what was actually being computed in the QFT equations - rather than just a big opaque integral that becomes infinity and you don't know why, the path integral lets you separate out the terms and pretend that they correspond to physical particles flying around in many different ways. It made it more obvious that QFT was correct, and what renormalization was doing, and the fact that renormalization was a physically okay way to fix the infinities.

(while I say this is an irrational superstition, it has been the fact that the laws of physics which are true wind up being expressible in a concise, elegant way (though that way is sometimes not found for a long time after the law's discovery); most programmers have the same superstition, when we see very complex solutions to problems we tend to turn up our noses with distaste; we imagine that if we just found the right way to think about the problem, a simple solution would be clear)

(I know this history is somewhat revisionist, but a good story is more important than accuracy, in all things but science)

Anyhoo, it's nice when you get it.


11-30-11 | Some more Waitset notes

The classic waitset pattern :

check condition

waiter w;
waitset.prepare_wait(&w);

double check condition

w.wait();

waitset.retire_wait(&w);

lends itself very easily to setting a waiter flag. All you do is change the double check into a CAS that sets that flag. For example say your condition is count > 0 , you do :

if ( (count&0x7FFFFFFF) == 0 )
{
    waiter w;
    waitset.prepare_wait(&w);

    // double check condition :
    int c = count.fetch_or( 0x80000000 ); // set waiter flag and double check
    if ( (c&0x7FFFFFFF) == 0 )
        w.wait();

    waitset.retire_wait(&w);
}

then in notify, you can avoid signalling when the waiter flag is not set :

// publish :
int c = count.atomic_inc_and_mask(1,0x7FFFFFFF);
// notify about my publication if there were waiters :
if ( c & 0x80000000 )
  waitset.notify();

(note : don't be misled by using count here; this is still not a good way to build a semaphore; I'm just using an int count as a simple way of modeling a publish/consume.)
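The atomic_inc_and_mask above is not a standard op, so here's a sketch of how it could be done with a CAS loop on std::atomic. I'm assuming its meaning is "new = (old + inc) & mask, return old"; the helper names `publish` and `set_waiter_flag_and_check` are made up, and the actual waitset calls are replaced by bool returns so the flag logic can be seen on its own :

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

const uint32_t WAITER_FLAG = 0x80000000u;
const uint32_t COUNT_MASK  = 0x7FFFFFFFu;

// Assumed meaning : new = (old + inc) & mask ; returns old.
uint32_t atomic_inc_and_mask(std::atomic<uint32_t> & v, uint32_t inc, uint32_t mask)
{
    uint32_t old_val = v.load(std::memory_order_relaxed);
    while (!v.compare_exchange_weak(old_val, (old_val + inc) & mask,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed))
    {
        // old_val was reloaded; retry
    }
    return old_val;
}

// publish one item; returns true if the caller should call waitset.notify()
bool publish(std::atomic<uint32_t> & count)
{
    uint32_t c = atomic_inc_and_mask(count, 1, COUNT_MASK);
    return (c & WAITER_FLAG) != 0;
}

// waiter side double check (after prepare_wait);
// returns true if the caller should really go into w.wait()
bool set_waiter_flag_and_check(std::atomic<uint32_t> & count)
{
    uint32_t c = count.fetch_or(WAITER_FLAG, std::memory_order_acq_rel);
    return (c & COUNT_MASK) == 0;
}
```

One consequence of this form : a waiter that sets the flag but finds count > 0 leaves the flag set, so the next publish does one unneeded notify. That's harmless, just a wasted call.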


I was being obtuse before when I wrote about the problems with waitset OR. It is important to be aware of those issues when working with waitsets, because they are inherent to how waitsets work and you will encounter them in some form or other, but of course you can do an OR if you extend the basic waitset a little.

What you do is give waiter an atomic bool to know if it's been signalled, something like :


struct waiter
{
  atomic<bool> signalled;
  os_handle  waitable_handle;
};

(a "waiter" is a helper which is how you add your "self" to the waitset; depending on the waitset implementation, waitable_handle might be your thread ID for example).

Then in the waitset notify you just do :


if ( w->signalled.exchange(true) == false )
{
   Signal( w->waitable_handle );
}
else
    step to next waiter in waitset and try him again.

That is, you try to only send the signal to handles that need it.

If we use this in the simple OR example from a few days ago, then both waiting threads will wake up - two notify_ones will wake two waiters.
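A toy model of that notify, just to show the skipping behavior. This is single-threaded illustration code with hypothetical names : the waiter list is a plain vector with no lock, and Signal() is replaced by bumping a counter. (It uses atomic bool for brevity.)

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct waiter
{
    std::atomic<bool> signalled{false};
    int wake_count = 0; // stand-in for Signal( waitable_handle )
};

// Hypothetical model of a waitset's list; a real one needs a lock or
// a lock-free list around m_waiters.
struct toy_waitset
{
    std::vector<waiter *> m_waiters;

    // returns true if some waiter was newly signalled
    bool notify_one()
    {
        for (waiter * w : m_waiters)
        {
            if (w->signalled.exchange(true, std::memory_order_acq_rel) == false)
            {
                w->wake_count++; // Signal( w->waitable_handle )
                return true;
            }
            // already signalled (perhaps via another waitset); step to the next waiter
        }
        return false;
    }
};
```

So if waiter w1 is in waitsets A and B (an OR wait) and w2 is only in A, then B.notify_one followed by A.notify_one signals w1 and then w2, instead of wasting the second signal on w1.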

While you're at it, your waiter struct may as well also contain the origin of the signal, like :


if ( w->signalled.exchange(true) == false )
{
    // non-atomic assignment :
    w->signal_origin = this; // this is a waitset
    Signal( w->waitable_handle );
}

That way when you wake from an OR wait you know why.

(note that I'm assuming your os_handle only ever does one state transition - it goes from unsignalled to signalled. This is the correct way to use waitset; each waiter() gets a new waitable handle for its lifetime, and it only lives for the length of one wait. In practice you actually recycle the waiters to avoid creating new ones all the time, but you recycle them safely in a way that you know they cannot be still in use by any thread (alternatively you could just have a waiter per thread in its TLS and reset them between uses))

(BTW of course you don't actually use atomic bool in real code because bool is too badly defined)


11-30-11 | Basic sketch of Worker Thread system with dependencies

You have a bunch of worker threads and work items. Work items can be dependent on other work items, or on external timed events (such as IO).

I've had some trouble with this for a while; I think I finally have a scheme that really works.

There are two queues :


RTR = ready to run : no dependencies, or dependencies are done

NR = not ready ; dependencies still pending

Each queue has an associated semaphore to count the number of items in it.

The basic work popping that each worker does is something like :


// get all available work without considering sleeping -
while( try_wait( RTR_sem ) )
{
    pop RTR_queue and do work
}

// (optionally spin a few times here and check RTR_sem)

// I may have to sleep -

wait( RTR_sem OR NR_sem ); // (*1)

if ( wakeup was from RTR_sem )
{
    pop RTR_queue and do work
}
else
{
    NRI (not ready item) = pop NR_queue
    deps = get dependencies that NRI needs to wait on

    wait( deps OR RTR_sem ); // (*3)

    if ( wakeup was from RTR_sem )
    {
        push NRI back on NR_queue and post NR_sem  // (*4)
        pop RTR_queue and do work
    }
    else
    {
        wakeup was because deps are now done
        NRI should be able to run now, so do it
        (*2)
    }  
}

*1 : the key primitive here is the ability to do a WFMO OR wait, and to know which one of the items signalled you. On Windows this is very easy, it's just WaitForMultipleObjects, which returns the guy who woke you. On other platforms it's trickier and probably involves rolling some of your own mechanisms.

Note that I'm assuming the semaphore Wait() will dec the semaphore at the time you get to run, and the OR wait on multiple semaphores will only dec one of them.

*2 : in practice you may get spurious wakeups or it may be hard to wait on all the dependencies, so you would loop and recheck the deps and possibly wait on them again.

How this differs from my previous system :

My previous system was more of a traditional "work stealing" scheme where each worker had its own queue and would try to just push & pop works from its own queue. This was lower overhead in the fast path (it avoids having a single shared semaphore that they have to contend on, for example), but it had a lot of problems.

Getting workers to go to sleep & wake up correctly in a work stealing scheme is a real mess. It's very hard to tell when you have no work to do, or when you have enough work that you need to wake a new worker, because you don't atomically maintain a work count (eg. a semaphore). You could fix this by making an atomic pair { work items, workers awake } and CAS that pair to maintain it, but that's just a messy way of being a semaphore.

The other problem was what happens when you have dependent work. You want a worker to go to sleep on the dependency, so that it yields CPU time, but wakes up when it can run. I had that, but then you have the problem that if somebody else pushes work that can immediately run, you want to interrupt that wait on the dependency and let the worker do the ready work. The semaphore OR wait fixes this nicely.

If you're writing a fractal renderer or some such nonsense then maybe you want to make lots of little work items and have minimal overhead. But that's a very special purpose rare case. Most of the time it's much more important that you do the right work when possible. My guiding principles are :

If there is no work that can be done now, workers should go to sleep (yield CPU)
If there is work that can be done now, workers should wake up
You should not wake up a worker and have it go back to sleep immediately
You should not have work available to do but the workers sleeping

Even in the "fractal renderer" sort of case, where you have tons of non-dependent work items, the only penalty here is one extra semaphore dec per item, and that's just a CAS (or a fetch_add) assuming you use something like "fastsemaphore" to fast-path the case of being not near zero count.

There is one remaining issue, which is when there is no ready-to-run work, and the workers are asleep on the first semaphore (they have no work items). Then you push a work item with dependencies. What will happen in the code sketch above is that the worker will wake up, pop the not ready item, then go back to sleep on the dependency. This violates article 3 of the resolution ("You should not wake up a worker and have it go back to sleep immediately").

Basically from *1 to *3 in the code is a very short path that wakes from one wait and goes into another wait; that's always bad.

But this can be fixed. What you need is "wait morphing". When you push a not-ready work item and you go into the semaphore code that is incrementing the NR_sem , and you see that you will be waking a thread - before you wake it up, you take it out of the NR_sem wait list, and put it into the NRI's dependency wait list. (you leave it waiting on RTR_sem).

That is, you just leave the thread asleep, you don't signal it to wake it up, it stays waiting on the same handle, but you move the handle from NR_sem to the dependency. You can implement this a few ways. I believe it could be done with Linux's newer versions of futex which provide wait morphing. You would have to build your semaphore and your dependency waiting on futex, which is easy to do, then wait morph to transfer the wait. Alternatively if you build them on "waitset" you simply need to move an item from one waitset to the other. This can be done easily if your waitset uses a mutex to protect its internals, you simply lock both mutexes and move the waitable handle with both held.
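The mutex-protected waitset version of the move might look something like this. It's a sketch with hypothetical types (`locked_waitset`, `waitable_handle`); the point is only that both locks are held while the handle changes lists, and std::lock is used so two threads morphing in opposite directions can't deadlock on lock order :

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct waitable_handle { int id; };

// Hypothetical waitset whose internals are protected by a mutex.
struct locked_waitset
{
    std::mutex m_mutex;
    std::vector<waitable_handle *> m_waiters;
};

// Move one sleeping waiter from 'from' to 'to' without waking it.
// Returns the moved handle, or nullptr if 'from' had no waiters.
waitable_handle * morph_one_waiter(locked_waitset & from, locked_waitset & to)
{
    assert(&from != &to);

    // take both locks with deadlock avoidance, then adopt them into guards
    std::lock(from.m_mutex, to.m_mutex);
    std::lock_guard<std::mutex> g1(from.m_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m_mutex, std::adopt_lock);

    if (from.m_waiters.empty())
        return nullptr;

    waitable_handle * h = from.m_waiters.front();
    from.m_waiters.erase(from.m_waiters.begin());
    to.m_waiters.push_back(h);
    return h;
}
```

In the worker system, 'from' would be the NR_sem waiters and 'to' the wait list of the not-ready item's dependency; if the move returns nullptr nobody was asleep on NR_sem and you just post it normally.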

The net result with wait morphing is very nice. Say for example all your workers are asleep. You create a work item that is dependent on an IO and push it. None of the workers get woken up, but one of them is changed from waiting on work available to waiting on the dependency. When the IO completes it wakes that worker and he runs. If somebody pushed a ton of work in the mean time, all the workers would be woken and they would do that work, and the dependent work would be pushed back on the NR queue and set aside while they did RTR work.

ADDENDUM : at the spot marked (*4) :


push NRI back on NR_queue and post NR_sem // (*4)
pop RTR_queue and do work

In real code you need to do something a bit more complex here. What you do is something like :

if ( NRI is ready ) // double check
{
  RTR_sem.post() // we woke from RTR_sem , put it back
  do NRI work
}
else
{
  push NRI onto NR_lifo and post NR_sem
  pop RTR_queue and do work
}

we've introduced a new queue , the NR_lifo which is a LIFO (eg. stack). Now whenever you get an NR_sem post, you do :

// NR_sem just returned from wait so I know an NR item is available :

NRI = NR_lifo.pop()
if ( NRI == NULL )
  NRI = NR_queue.pop()

the item must be in one or the other and we prefer to take from the LIFO first. Basically the LIFO is a holding area for items that were popped off the FIFO and were not yet ready, so we want to keep trying to run those before we go back to the FIFO. You can use a single semaphore to indicate that there is an item in either queue.
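The LIFO-first pop order can be sketched as below. This is a single-threaded illustration only; `work_item` and `not_ready_queues_sketch` are made-up names, and the real NR_lifo and NR_queue would need to be thread-safe containers paired with the semaphore.

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct work_item { int id; };   // hypothetical work item

// Hypothetical pairing of the two not-ready containers (not thread-safe).
struct not_ready_queues_sketch
{
    std::vector<work_item *> lifo;   // NR_lifo : items already popped once and found not ready
    std::deque<work_item *>  fifo;   // NR_queue : items not yet looked at

    // Call only after the semaphore wait returned, so an item is known to exist.
    work_item * pop()
    {
        if ( ! lifo.empty() )
        {
            work_item * item = lifo.back();   // prefer the holding area
            lifo.pop_back();
            return item;
        }
        if ( ! fifo.empty() )
        {
            work_item * item = fifo.front();
            fifo.pop_front();
            return item;
        }
        return nullptr;   // should not happen if the semaphore count is right
    }
};
```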


11-28-11 | Some lock-free rambling

It helps me a lot to write this stuff down, so here we go.

I continually find that #StoreLoad scenarios are confusing and catch me out. Acquire (#LoadLoad) and Release (#StoreStore) are very intuitive, but #StoreLoad is not. I think I've covered almost this exact situation before, but this stuff is difficult so it's worth revisiting many times. (I find low level threading to be cognitively a lot like quantum mechanics, in that if you do it a lot you become totally comfortable with it, but if you stop doing it even for a month it is super confusing and bizarre when you come back to it, and you have to re-work through all the basics to convince yourself they are true).

(Aside : fucking Google verbatim won't even search for "#StoreLoad" right. Anybody know a web search that is actually verbatim? A whole-word-only option would be nice too, and also a match case option. You know, like basic text search options from like 1970 or so).

The classic case for needing #StoreLoad is WFMO. The very simple scenario goes like this :


bool done1 = false;
bool done2 = false;

// I want to do X() when done1 & done2 are both set.

Thread1:

done1 = true;
if ( done1 && done2 )
    X();

Thread2:

done2 = true;
if ( done1 && done2 )
    X();

This doesn't work.

Obviously Thread1 and Thread2 can run in different orders so done1 and done2 become set in random order. But one thread or the other should see them both set. But they don't; the reason is that the memory visibility can be reordered. This is a pretty clear illustration of the thing that trips up many people - threads can interleave both in execution order and in memory visibility order.

In particular the bad execution case goes like this :


done1 = false, done2 = false

T1 sets done1 = true
  T1 sees done1 = true (of course)
  T2 still sees done1 = false (store is not yet visible to him)

T2 sets done2 = true
  T2 sees done2 = true
  T1 still sees done2 = false

T1 checks done2 for (done1 && done2)
  still sees done2 = false
  doesn't call X()

T2 checks done1
  still sees done1 = false
  doesn't call X()

later
T1 sees done2=true
T2 sees done1=true

when you write it out it's obvious that the issue is the store visibility is not forced to occur before the load. So you can fix it with :

Thread1:

done1 = true;
#StoreLoad
if ( done1 && done2 )
    X();

As noted previously there is no nice way to make a StoreLoad barrier in C++0x. The best method I've found is to make the loads into fetch_add(0,acq_rel) ; that works by making the loads also be stores and using a #StoreStore barrier to get store ordering.
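For comparison, here is a sketch of the done1/done2 case written with sequentially consistent atomics in C++11 as finalized. Note this swaps in seq_cst stores and loads rather than the fetch_add(0,acq_rel) trick; the function name `run_done_flags_once` and the call counter are made up for illustration. With seq_cst all four operations sit in one total order, so whichever store comes second in that order is followed by a load that must see the other flag set.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the two-thread scenario once and returns how many times X() ran.
int run_done_flags_once()
{
    std::atomic<bool> done1(false);
    std::atomic<bool> done2(false);
    std::atomic<int>  x_calls(0);

    auto X = [&]{ x_calls.fetch_add(1); };

    std::thread t1([&]{
        done1.store(true, std::memory_order_seq_cst);
        // the seq_cst load can't be ordered before the seq_cst store above
        if ( done2.load(std::memory_order_seq_cst) )
            X();
    });

    std::thread t2([&]{
        done2.store(true, std::memory_order_seq_cst);
        if ( done1.load(std::memory_order_seq_cst) )
            X();
    });

    t1.join();
    t2.join();
    return x_calls.load();
}
```

At least one thread always calls X(), but both may; if X() must run exactly once you need something more, such as a shared counter where the thread that takes it to 2 does the call.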


The classic simple waitset that we have discussed previously is a bit difficult to use in more complex ways.

Refresher : A waitset works with a double-check pattern, like :


signalling thread :

set condition
waitset.notify();

waiting thread :

if ( ! condition )
{
    waitset.prepare_wait()

    // double check :
    if ( condition )
    {
        waitset.cancel_wait();
    }
    else
    {
        waitset.wait();
    }
}

we've seen in the past how you can easily build a condition var or an eventcount from waitset. In some sense waitset is a very low level primitive and handy for building higher level primitives from. Now on to new material.
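To make the refresher concrete, here is the double-check pattern written out as compilable code. This is a sketch under loud assumptions: `ticket_waitset_sketch` is not the handle-based waitset discussed in these posts, it is a hypothetical stand-in built from a mutex, a condition variable and a generation counter (so it is closer to an eventcount), and it only supports broadcast.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class ticket_waitset_sketch
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    unsigned m_generation = 0;

public:
    // remember which generation we saw before the double check
    unsigned prepare_wait()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_generation;
    }

    void cancel_wait(unsigned /*ticket*/) { }   // nothing to undo in this sketch

    // sleep until some notify has happened since prepare_wait
    void wait(unsigned ticket)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [&]{ return m_generation != ticket; });
    }

    void notify_all()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_generation;
        }
        m_cv.notify_all();
    }
};

// waiting thread side of the pattern
void wait_for_condition(std::atomic<bool> & condition, ticket_waitset_sketch & ws)
{
    while ( ! condition.load() )
    {
        unsigned ticket = ws.prepare_wait();

        // double check :
        if ( condition.load() )
        {
            ws.cancel_wait(ticket);
            return;
        }
        ws.wait(ticket);
    }
}
```

The signalling thread sets the condition and then calls notify_all(); if the notify lands between prepare_wait and wait, the generation has already changed and wait returns immediately.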

You can easily use waitset to perform an "OR" WFMO. You simply add yourself to multiple waitsets. (you need a certain type of waitset for this which lets you pass in the primitive that you want to use for waiting). To do this we slightly extend the waitset API. The cleanest way is something like this :


instead of prepare_wait :

waiter create_waiter();
void add_waiter( waiter & w );

instead of wait/cancel_wait :

~waiter() does cancel/retire wait 
waiter.wait() does wait :

Then an OR wait is something like this :

signal thread 1 :

set condition1
waitset1.notify();

signal thread 2 :

set condition2
waitset2.notify();


waiting thread :

if ( condition1 ) // don't wait

waiter w = waitset1.create_waiter();

// double check condition1 and first check condition2 :

if ( condition1 || condition2 ) // don't wait
  // ~w will take you out of waitset1

waitset2.add_waiter(w);

// double check :

if ( condition2 ) // don't wait

// I'm now in both waitset1 and waitset2
w.wait();

Okay. This works fine. But there is a limitation which might not be entirely obvious.

I have intentionally not made it clear if the notify() in the signalling threads is a notify_one (signal) or notify_all (broadcast). Say you want it to be just notify_one , because you don't want to wake more threads than you need to. Say you have this scenario :


X = false;
Y = false;

Thread1:
X = true;
waitsetX.notify_one();

Thread2:
Y = true;
waitsetY.notify_one();

Thread3:
wait for X || Y

Thread4:
wait for X || Y

this is a deadlock. The problem is that both of the waiter threads can go to sleep, but the two notifies might both go to the same thread.

This is a general difficult problem with waitset and is why you generally have to use broadcast (for example eventcount is built on waitset broadcasting).

You may think this is an anomaly of trying to abuse waitset to do an OR, but it's quite common. For example you might try to do something seemingly simple like build semaphore from waitset.


class semaphore_from_waitset
{
    waitset_simple m_waitset;
    std::atomic<int> m_count;

public:
    semaphore_from_waitset(int count = 0)
    :   m_count(count), m_waitset()
    {
        RL_ASSERT(count >= 0);
    }

    ~semaphore_from_waitset()
    {
    }

public:
    void post()
    {
        m_count($).fetch_add(1,mo_acq_rel);
        // broadcast or signal :
        // (*1)
        //m_waitset.notify_all();
        m_waitset.notify_one();
    }

    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        int c = m_count($).load(mo_acquire);
        while ( c > 0 )
        {
            if ( m_count($).compare_exchange_weak(c,c-1,mo_acq_rel) )
                return true;
            // c was reloaded
        }
        return false;
    }

    void wait(HANDLE h)
    {
        for(;;)
        {
            if ( try_wait() )
                return;
    
            // no count available, get ready to wait
            ResetEvent(h);
            m_waitset.prepare_wait(h);
            
            // double check :
            if ( try_wait() )
            {
                m_waitset.retire_wait(h);
                // (*2)
                // pass on the notify :
                m_waitset.notify_one();
                return;
            }
            
            m_waitset.wait(h);
            m_waitset.retire_wait(h);
            // loop and try again
        }
    }
};

it's totally straightforward in the waitset pattern, except for the broadcast issue. If *1 is just a notify_one, then at *2 you must pass on the notify. Alternatively if you don't have the re-signal at *2 then the notify at *1 must be a broadcast (notify_all).

Now obviously if you have 10 threads waiting on a semaphore and you inc the count by 1, you don't want all 10 threads to wake up so that just 1 of them can dec the count and get to execute. The re-signal method will wake 2 threads, so it's better than broadcast, but still not awesome.

(note that this is easy to fix if you just put a mutex around the whole thing; or you can implement semaphore without waitset; the point is not to reimplement semaphore in a bone-headed way, the point is just that even very simple uses of waitset can break if you notify_one instead of notify_all).

BTW the failure case for semaphore_from_waitset with only a notify_one and no resignal (eg. if you get the (*1) and (*2) points wrong) goes like this :


the problem case goes like this :

    T1 : sem.post , sem.post
    T2&T3 : sem.wait

    execution like this :

    T2&3 both check count and see zero
    T1 now does one inc and notify, no one to notify yet
    T2&3 do prepare_wait
    T2 does its double-check, sees a count and takes it (does not retire yet)
    T3 does its double-check, sees zero, and goes to sleep
    T1 now does the next inc and notify
    -> this is the key problem
    T2 can get the notify because it is still in the waiter list
        (not retired yet)
    but T3 needs the notify

The key point is this spot :

            // double check :
            if ( try_wait() )
            {
                // !! key !!
                m_waitset.retire_wait(h);

you have passed the double check and are not going to wait, but you are still in the waiter list. This means you can be the one thread chosen to receive the signal, but you don't need it. This is why resignal works.


11-25-11 | Sustainability

I've always had a sour feeling about the "sustainability" movement, but I haven't been quite sure why exactly. As a knee-jerk reaction, I feel uneasy about any of the cultish movements where people get overly devoted to a narrow worldview, and tend to get into a dogma where adherence to the movement is more important than logically pursuing the original goals. So for example there are lots of current movements which I basically agree with, like "nose to tail" and "locavorism" and "minimalism" and so on, I think the basic ideas are great, but the movements themselves tend to be weird and actually kind of ruin the idea that I like so much by making it dogmatic.

(eg. if you eat pig's ears because you like pig's ears, that's cool. If you eat pig's ears because you got the whole pig and don't want to throw parts away, that's cool. If you eat pig's ears because you are trying to be a good "nose to tail"'er , that's fucking stupid.)

Anyhoo, I had a few realizations about what it is that bothers me so much about "sustainability". First the obvious ones that I've known for a while :

"sustainability" is so expensive that it's only accessible to 10% of the population. When the vast majority of the population can't afford those products, they are inherently unsustainable, as in they do not support human life and they do not significantly reduce the amount of factory farming , clear cutting , etc. A lifestyle which is only accessible to the rich cannot transform the earth.

The majority of "sustainable" products are unproven and may in fact not be sustainable, it's just a marketing word that doesn't correspond to any fact of actual low long-term impact on the earth. The fact is that the central valley of california and the fields of iowa have sustained "unsustainable" factory farming for the past 100 years or so, and despite predictions of imminent collapse, they are still feeding hundreds of millions of people for very low prices. On the other hand we get new coconut charcoal and bamboo or hemp or whatever which we don't really know how it will affect the earth in mass production on the long term.

Buying a bunch of new products because they are "sustainable" is of course highly ironic. The most destructive thing that modern society does is buy new junk every time there's a new trend, and this appears to be just another new trend, people throw out their unfashionable "unsustainable" stuff to buy new approved stuff, and will throw that out for the next trend. (also ironically, "minimalism" generally tends to involve buying new stuff).

High-paid low-yield gentleman farmers are inherently unsustainable. You cannot support a 7+ billion person planet with a good quality of life if the cost of a piece of lumber or a piece of fruit is so high and takes so much labor. Our quality of life (per capita) is entirely based on the fact that those things are so cheap and easy, so that we can spend more time producing TV shows and iPods. Now the more extreme hippie-ish end of the sustainability movement might espouse a true back-to-the-land lifestyle change where in fact people do spend more time laboring and don't get TV shows and iPods, but that is a small fringe, the main thrust wants the spoils of civilization.

Now a little equivocation. Buying "sustainable" junk is obviously a form of charity. When you spend much more on a "sustainable" version of a product, you are essentially donating the difference. Where does that donation go? Some (I suspect small) portion of it actually goes to benefiting the earth. Most of the rest goes to profit of the product maker. On that level, it is a very bad form of charity; your charity dollars would have a much greater direct benefit on the earth if you just bought normal products and donated the difference to direct action.

But it is a bit more complicated than that. For the most part I'm not delighted by exercising political expression through purchasing (it's far too easy to manipulate and take advantage of, and in the end the only thing that They care about is that you keep buying things like a good little consumer, so you really aren't winning) - however I can't deny that it does sometimes work. When industry sees that lots of consumers are willing to waste their dollars on "green" products, they do sometimes change their practices for the better, and the net result can be a greater impact than the amount of charity dollars suggest. That is, there is a sort of leverage if businesses think that the "political buyers" will continue spending lots of money far into the future, the businesses will make a change based on lots of *future* dollars. Thus something like only a few tens of millions of dollars in charity spending can actually create a hundred-million dollar product line transformation.

As an aside, I should note that there are lots of small scale "sustainable" endeavors that are basically irrelevant because they are inherently small scale and cannot ever have a significant effect on the planet. For example the reclaimed lumber movement, it's okay, I have no major objection to it (though it's not entirely clear that it's the best value for your charity lumber eco dollars), but it's just irrelevant to any large scale analysis because it can't significantly reduce commercial lumber use. The only sustainable businesses that matter are the ones that have the possibility to go large scale.

Anyhoo, the thing that occurred to me last night was that the large scale sustainable industry is basically built on the back of unsustainable industry.

What I mean is, the large-scale mass produced "sustainable" industry (eg. bamboo flooring, "sustainable" chocolate, etc) is largely about making products in the 3rd world and exporting them to the 1st world. First of all this is sort of inherently unsustainable and hypocritical because it relies on a massive income gap for affordability, essentially you have to have people in subsistence living conditions to subsidize this product, and a good liberal who is spending their charity dollars to direct the world towards a better future should not include that in their better future. But more directly, the workers in those sustainable factories could not live a decent life on their low wages without unsustainable industry. The only way they can be paid so low is because they can get cheap factory farm corn to eat, and cheap sneakers and clothes and everything they need to live. If they had to buy the expensive sustainable junk, they would have to have huge wages, which would make the product even more expensive, which would make it impossible.


11-23-11 | This is not okay

Fuck this shit. I'm going to Hawaii.


11-22-11 | The Mature Programmer

1. The Mature Programmer

The mature programmer manages their own time and productivity well. The MP knows that maintenance is as much work as the initial writing and code always takes longer than you think. The MP knows that any changes to code can introduce bugs, no matter how seemingly trivial. The MP knows that premature optimization is foolish and dangerous. The MP knows that sexy coding like writing big complex systems from scratch is rarely the best way to go. The MP does not get into ego competitions about who has the prettiest code. The MP achieves the best final result in the minimum amount of time.

When I started at Oddworld, I was watching lots of game companies get into dick-waving contests about who had the greatest home-rolled graphics engine, and I was also watching lots of indie developers spend massive amounts of time on their "one true" product and never actually ship it. I resolved that we would not fall into those traps - we would be humble and not reach too far, we would not let our egos stop us from licensing code or using old fashioned solutions to problems, we would stay focused on the end product - any sexy code that didn't produce a visible benefit in the actual shipping game was nixed. For the most part I think we succeeded in that (there were a few digressions that were mainly due to me).

But the way of the Mature Programmer can be a trap which comes back to bite you.

The problem is that writing code in this way is not very much fun. Sure there's the fun of making the product - and if you're working on a game and believe in the game and the team, then just seeing the good product come out can give you motivation. But if you don't have that, it can be a real slog.

Most of us got into programming not for the end products that we create, but because the programming itself is a joy. Code can be beautiful. Code can be a clever, artistic, exciting creation, like a good mathematical proof. The Mature Programmer would say that "clever code is almost always dangerous code". But fuck him. The problem is that when you get carried away with being "mature" you suck the joy right out of coding.

You need to allow yourself a certain amount of indiscretions to keep yourself happy with your code. Sure those templates might not actually be a good idea, but you enjoy writing the code that way - fine, do it. Yes, you are optimizing early and it just makes the code harder to maintain and harder to read and more buggy - but you love to do that, fine, do it.

Obviously you can't go overboard with this, but I think that I (and many others) have gone overboard with being mature. Basically in the last ten years of my evolution as a coder I have become less of a wild card "hot shot" and more of a productivity manager, an efficient task analyzer and proactive coordinator of code-actualizing solutions. It's like a management bureaucracy of one inside my head. It's horrible.

I think there are two factors to consider : first is that being "mature" and productive can cause burnout which winds up hurting your productivity, or it can just make coding unpleasant so you spend fewer hours at it. Most "mature" coders brag about the fact that they can get as much done in 6 hours as they used to do in 14. But those 14 hours were FUN, you coded that long because you loved it, you couldn't get to sleep at night because you wanted to code more; now the 6 hours is all sort of unpleasant because instead of rolling your own solution you're just tying together some java and perl packages. Second is that being productive is not the only goal. We are coding to get some task done and to make money, but we're also coding because we enjoy it, and actually being less productive but enjoying your coding more may be a net +EV.

2. The healthy opposition of a producer

Many programmers in normal coding jobs hate having the interference of a producer (or corporate management, or the publisher, or whatever). This person enforces stupid schedules and won't let us do the features we want, and urrgh we hate them! These coders long to be able to make their own schedules and choose their own tasks and be free.

It's actually very healthy and much more relaxing in many ways to have that opposition. When you have to schedule yourself or make your own decisions about tasks, you become responsible for both the creative "reach for the sky" side and the responsible "stay in budget" side. It's almost impossible to do a good job of both sides. This can happen if you are an indie or even if you are a powerful lead with a weak producer.

Most creative industries know that there is a healthy opposition in having the unconstrained creative dreamer vs. the budget-enforcing producer. You don't want the dreamer to be too worried about thinking about schedules or what's possible - you just want them to make ideas and push hard to get more.

When you have to cut features or crunch or whatever, it's nice to have that come from outside - some other force makes you do it and you can hate them and get on with it. It's nice to have that external force to blame that's not on your team; it gives you a target of your frustration, helps bonding, and also gives you an excuse to get the job done (because they told you to).

When you have to balance dreams vs schedules on your own, it adds an intellectual burden to every task - as you do each task you have to consider "is this worth the time? is this the right task to do now? should I do a simpler version of this?" which greatly reduces your ability to focus just on the task itself.

3. Coding standards

It's kind of amazing to me how many experienced programmers still just don't understand programming. The big difficulty in programming is that the space of the ways to write something is too large. We can get lost in that space.

One of the problems is simply the intellectual overload. Smart coders can mistakenly believe that they can handle it, but it is a burden on everyone. Every time you write a line of code, if you have to think "should I use lower case or mixed caps?" , or "should I make this a function or just write it in line?" , your brain is spending masses of energy on syntactic decisions and doesn't have its full power for the functionality. Strict coding standards are actually an intellectual relief because they remove all those decisions and give you a specific way to do the syntax. (The same of course goes for reading other people's code - your eyes can immediately start looking at the functionality, not try to figure out the current syntax)

The other big benefit of coding standards is creating a "meta language" which is smaller than the parent language and enforces certain invariants. By doing that you again reduce the space that the brain has to consider. For example you might require that all C macros behave like functions (eg. don't eat scopes and don't declare variables). Now when I see one I know I don't have to worry about those things. Or you might require that globals are never externed and only get accessed through functions called "GetGlobal_blah". It doesn't really matter what they are as long as they are simple, clear, uniform, and strictly enforced, because only if they are totally reliable can you stop thinking about them.
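As a concrete illustration of that kind of "meta language", here is a small sketch of the two example rules. The macro names, the struct and the `frame_count` global are all invented for this example; only the "GetGlobal_" naming comes from the text above.

```cpp
#include <cassert>

// Rule : macros behave like functions.
// Expression-like macro : fully parenthesized, declares nothing.
// (it still evaluates its arguments more than once, so pass only simple values)
#define MY_CLAMP(x, lo, hi)  ( (x) < (lo) ? (lo) : ( (x) > (hi) ? (hi) : (x) ) )

struct my_counters { int count; int flags; };

// Statement-like macro : wrapped in do/while(0) so it acts as one statement
// and doesn't eat the scope of an if/else it is used in.
#define MY_RESET_COUNTERS(p)  do { (p)->count = 0; (p)->flags = 0; } while(0)

// Rule : globals are never externed, only reached through accessor functions.
static int s_frame_count = 0;   // visible in this file only

int  GetGlobal_frame_count()           { return s_frame_count; }
void SetGlobal_frame_count(int value)  { s_frame_count = value; }

// Because MY_RESET_COUNTERS is a single statement, this if/else parses as written.
void reset_if_needed(my_counters * c, bool needed)
{
    if ( needed )
        MY_RESET_COUNTERS(c);
    else
        c->flags = 1;
}
```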

4. The trap of "post-Mature Programmer" ism.

Many great coders of my generation have gone through the strict clean rules-following coder phase and have moved onto the "post" phase. The "post-mature programmer" knows the importance of following strict coding style rules or not indulging themselves too much, but also sees the benefit of bending those rules and believes that they can be a bit more free about deciding on what to do for each situation.

I believe that they/we mostly get this wrong.

The best analogy I can think of is poker. Most successful poker players go through several phases. First you think you're ever so clever and you can bluff and trap people and play all sorts of weird lines. Once you move up levels and start playing serious poker this delusion is quickly wiped out and you realize you need to go back to fundamentals. So then most people will go through the TAG "standard line" phase where they learn the right thing to do in each situation and the standard way to analyze hands, and they will be quite successful with this. (note that "standard line" doesn't mean nitty, it involves things like squeeze plays and even check-shove river bluffs, but it's based on playing a balanced range and studying EV). But then they are so successful with their solid play that they start to think they can get away with "mixing it up", playing hands that are almost certainly not profitable because they think they are good enough post-flop to make up for it (eg. Durrr style), or imagining that by playing some minus EV hands it helps their image and pays off later.

This is almost always wrong. Limping AA is almost always wrong, opening 72o UTG is almost always wrong - maybe you've done some analysis and you've decided it's the right thing at this table at this moment (for example limping AA because the people behind you attack limpers way too much and they think you would never limp AA so they will get stuck easily). It's wrong.

(telling yourself that your current bad play is made up for with later "image value" is one of the great rationalizations that poker players use as an excuse to justify their bad play. programmers do the same with a set of excuses like "performance" that are really just rationalizing justifications for their bad practices; with poker, EV in the hand is worth more than EV in the bush; that is, the later image value you might win is so small and dubious and depends on various things working out just right that it's almost never correct to give up known current value for possible future value. (a particularly simple case of this is "implied odds" which bad players use as an excuse to chase hands they shouldn't))

The problem is that when you open yourself up to making any possible move at any moment, there is simply too much to consider. You can't possibly go through all those decisions from first principles and make the right choice. Even if you could, there's no way you can sustain it for thousands of hands. You're going to make mistakes.

The same is true in coding; the post-MP knows the value of encapsulating a bit of functionality into a struct + helpers (or a class), but they think "I'm smart enough that I can decide not to do that in this particular case". No! You are wrong. I mean, maybe you are in fact right in this particular case, but it's not a good use of your brain energy to make that decision, and you will make it wrong some times.

There is a great value in having simple rules. Like "any time I enter a pot preflop, I come in for a raise". It may not always be the best thing to do, but it's not bad, and it saves you from making possibly big mistakes, and most importantly it frees up your brain for other things.

The same thing happens with life decision making. There's a standard set of cliches :

Don't marry the first person you sleep with
Don't get in a serious relationship off a rebound
Don't buy anything if the salesman is pushing it really hard
Take a day to sleep on any big decision
Don't lend money to poor friends
etc.
you may think "I'm smart, I'm mature, I don't need these rules, I can make my own decision correctly based on the specifics of the current situation". But you are wrong. Sure, following the rules you might miss out on the truly optimum decision once in a while. But it's foolish arrogance to think that your mind is so strong that you don't need the protection and simplicity that the rules provide.

In poker the correct post-solid-player adjustment is very very small. You don't go off making wild plays all the time, that's over-confidence in your abilities and just "spew". A correctly evolved player basically sticks to the solid line and the standard way of evaluating, but knows how to identify situations where a very small correction is correct. Maybe the table is playing too tight preflop, so in the hijack position you start opening the top 35% of hands instead of the top 25% of hands. You don't just start opening every hand. You stay within the scope of the good play that you understand and can do without rethinking your whole approach.

The same is true in programming I believe; the correct adjustment for post-mature coding is very small; you don't have to be totally dogmatic about making every member variable private, but you also don't just stop encapsulating classes at all.


11-09-11 | Weird shite about Exceptions in Windows

What happens when an exception is thrown in Windows ? (please fill in any gaps, I haven't researched this in great detail).

1. The VectoredExceptionHandlers are called. One of these you may not be aware of is the "first chance" exception handler that the MSVC debugger installs. If you have the flags set in a certain way, this will cause you to breakpoint at the spot of the throw without passing the exception on to the SEH chain.

2. The list of __except() handlers is walked and those filters are invoked; if the filter takes the exception then they handle it.

* of note here is the change from x86 to x64. Under x86 SEH handlers were made on the stack and then tacked onto the list as you descended (basically the __try corresponds to tacking on the handler); under x64 that is all removed and the SEH filter walk relies on being able to trace back up the function call stack. Normally there's no difference, however under x64 if your function call stack can't be walked for some reason, then your SEH filters won't get called! This can happen for a few reasons; it can happen due to the 32-64 thunk layer, it can happen if you manually create some ASM or "naked" functions and don't maintain the stack trace info correctly, and it can happen of course if you stomp the return addresses in the stack. See for example : The case of the disappearing OnLoad exception – user-mode callback exceptions in x64 at Thursday Night . (stomping the stack can of course ruin SEH on x86 as well since the exception registration structures are on the stack).

More info on the x64 SEH trace : at osronline and at nynaeve .

3. If no filter wanted the exception, it goes up to the UnhandledExceptionFilter. In MSVC's CRT this is normally set to __CxxUnhandledExceptionFilter, that function itself will check if a debugger is present and do different things (eg. breakpoint).

4. If UnhandledExceptionFilter still didn't handle the exception and passes it on, the OS gets it and you get the application critical error popup box. Depending on your registry settings this may invoke automatic debugging. However as noted here : SetUnhandledExceptionFilter and VC8 - Jochen Kalmbach's WebLog there is a funny bypass where the OS will pass the exception directly to the OS handler and not call your filter.

Automatic debugging is controlled by

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug].  
When it was first introduced it defaulted to Dr Watson. At some point (Vista?) that was changed to point it to the Windows Troubleshooting engine instead. I believe that when you install DevStudio this registry key is changed to point to vsjitdebugger. The value "Auto" is set to 0 by default, which means ask before popping into the debugger.

To clarify a bit about what happens with unhandled exceptions : your unhandled exception callback is not called first, and is not necessarily called at all. After all the SEH filters get their chance, the OS calls its own internal "UnhandledExceptionFilter" - not your callback. This OS function checks if you are in a debugger and might just pass off the exception to the debugger (this is *not* the "first chance" check which is done based on your MSVC check boxes). This function also might just decide that the exception is a security risk and pass it straight to the AeDebug. If none of those things happen, then your filter may get called. (this is where the CRT CxxUnhandledExceptionFilter would get called if you didn't install anything).

Another note : the standard application error popup box just comes from UnhandledExceptionFilter. One of the ways you can get a silent application exit with no popup is if the OS detects that your SEH chain is corrupted, it will just TerminateProcess on your ass and drop you out. Similarly if you do something bad from inside one of your exception handlers. (another way you can get a silent TerminateProcess is if you touch things during thread or process destruction; eg. from a DLL_THREAD_DETACH or something like that, if you try to enter crit secs that are being destroyed you can get a sudden silent process exit).


Some links :

DebugInfo.com - Unexpected user breakpoint in NTDLL.DLL
Under the Hood New Vectored Exception Handling in Windows XP
SetUnhandledExceptionFilter Anti Debug Trick « Evilcodecave’s Weblog
SetUnhandledExceptionFilter and VC8 - Jochen Kalmbach's WebLog
SetErrorMode function
C++ tips AddVectoredExceptionHandler, AddVectoredContinueHandler and SetUnhandledExceptionFilter - Zhanli's tech notes - Sit
A Crash Course on the Depths of Win32 Structured Exception Handling, MSJ January 1997

A bit of interesting stuff about how the /RTC run time checks are implemented :

Visual C++ Debug Builds–”Fast Checks” Cause 5x Slowdowns Random ASCII

A bit about stack sizes on windows, in particular there are *two* thread stack sizes (the reserved and initial commit) and people don't usually think about that carefully when they pass a StackSize to CreateThread :

Thread Stack Size

Not directly related but interesting :

Pushing the Limits of Windows Processes and Threads - Mark's Blog - Site Home - TechNet Blogs
Postmortem Debugging Dr Dobb's
John Robbins' Blog How to Capture a Minidump Let Me Count the Ways
Collecting User-Mode Dumps
Automatically Capturing a Dump When a Process Crashes - .NET Blog - Site Home - MSDN Blogs


11-08-11 | Differences Running in Debugger

Bugs that won't repro under debugging are the worst. I'm not talking about "debug" vs "release" builds; I mean the exact same exe, run in the debugger vs. not in the debugger.

What I'd like is to assemble a list of the differences between running under the debugger and not under the debugger. I don't really know the answers to this so this is a community participation post. eg. you fill in the blanks.

Differences in running under the debugger :

1. Timing. A common problem now with heavily threaded apps is bugs due to timing variation. But where do the timing differences come from exactly?

1.a. OutputDebugString. Duh, affects timing massively. Similarly anything you do dependent on IsDebuggerPresent().

1.b. VC-generated messages about thread creation etc. These obviously affect timing. You can disable them being shown by right-clicking in the output window of the debugger, but the notification is still being sent so you can never completely eliminate the timing difference for creating/destroying threads. (and the debugger does a lot more work for thread accounting anyway, so create/destroy will always have significant timing variation).

2. Exceptions. (not C++ exceptions, which are handled pretty uniformly, but more the low level SEH exceptions like access violations and such). Obviously in the debugger you can toggle the handling of various exceptions and that can change behavior. One thing I'm not sure of is if there are any registry settings or other variables that control exception behavior in NON-debugged runs? (* more on this in another post)

3. Stack. Is there a difference here? Not that I know of.

4. Debug Heap. This is probably the biggest one. Processes run in the debugger on windows *always* get the debug heap, even if you didn't ask for it. You can turn this off by setting _NO_DEBUG_HEAP as an environment variable or starting MSVC with -hd. See Behavior of Spawned Processes .

Note that this isn't coming from MSVC, it's actually in ntdll. When you create your process heap, ntdll does a "QueryInformationProcess" and sees if it's being debugged, and if so it stuffs in the debug heap. The important thing is that this is at heap creation time, which leads to a solution.

5. Child Process issues. Because the debugged process is a child process of the debugger, it inherits its process properties. (the same issue can occur for running under "cmd" vs. spawning from explorer). Two specifics are "permissions" and environment variables. Another inherited value is the "ErrorMode" as in "GetErrorMode/SetErrorMode".

There's a solution to #4 and #5 which is this :

Start your app outside of the debugger. Make it do an int 3 so it pauses. Then attach the debugger. You can now debug but you don't get some of the ran-from-debugger differences.

(note to self about attaching : for some reason the MSVC "attach to running process" seems to fail a lot; there are other ways to do it though, when you get an int 3 message box popup you can click "debug" there, or from task manager or procexp you can find the task and click "debug" there).
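
A minimal sketch of a variation on that idea, assuming you would rather have the app wait for the attach than pop the int 3 dialog. WaitForDebuggerAttach is a hypothetical helper name (nothing from MSVC); IsDebuggerPresent, GetTickCount, Sleep and DebugBreak are the real Win32 calls. The non-Windows branch is only a stub so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not from the post) : wait up to timeoutMs for a
// debugger to attach, then break into it.
// Returns true if a debugger showed up, false on timeout.
bool WaitForDebuggerAttach(unsigned timeoutMs)
{
#ifdef _WIN32
    DWORD start = GetTickCount();
    while ( ! IsDebuggerPresent() )
    {
        if ( GetTickCount() - start >= timeoutMs )
            return false;
        Sleep(50);
    }
    DebugBreak();   // breakpoint exception, same idea as the int 3
    return true;
#else
    (void) timeoutMs;   // stub : there is nothing to wait for here
    return false;
#endif
}
```

Call it as early in main as you can. The process heap has already been created by then, outside the debugger, so the attach should not bring the debug heap back.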


11-03-11 | BoolYouMustCheck

I've got a lot of functions that return error codes (rather than throw or something). The problem with that is that it's very easy to just not check the error code and then you have incorrect code that can possibly break in a nasty way if the error case is hit and not detected.

One way to test this is like this :


class BoolYouMustCheck
{
private:
    bool m_b;
    mutable bool m_checked;

public :

    //BoolYouMustCheck() : m_b(false), m_checked(false) { }
    BoolYouMustCheck(bool b) : m_b(b), m_checked(false) { }
    
    ~BoolYouMustCheck()
    {
        ASSERT( m_checked );
    }
    
    operator bool () const
    {
        m_checked = true;
        return m_b;
    }

};

it's just a proxy for bool which will assert if it is assigned and never read.

So now you can take a function that returns an error condition, for example :


bool func1(int x)
{
    return ( x > 7 );
}

normally you could easily just call func1() and not check the value. But you change it to :

BoolYouMustCheck func2(int x)
{
    return ( x > 7 );
}

(in practice you probably just want to do #define bool BoolYouMustCheck)

Now you get :


{

    int y = clock();

    // asserts :
    func2(y);

    // asserts :
    // (b1 has to be a BoolYouMustCheck ; assigning to a plain bool counts as a read)
    BoolYouMustCheck b1 = func2(y);

    // okay :
    BoolYouMustCheck b2 = func2(y);
    if ( b2 )
        y++;

    // okay :
    if ( func2(y) )
        y++;

    return y;
}

which is kind of nice.

The only ugly thing is that the assert can be rather far removed from the line of code that caused the problem. In the first case (just calling func2 and doing nothing with the return value), you get an assert right away, because the returned class is destructed right away. But in the second case where you assign to b1, you don't get the assert until the end of function scope. I guess you could fix that by taking a stack trace in the constructor.

(note : if you want to intentionally ignore the return value b1 you can just add a line like (int) b1; to suppress the assert.)
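
Here is a rough self-contained variant to play with, not the class above. The assumptions are mine: CheckedBool, CHECKED_BOOL and func3 are hypothetical names, it counts and prints unchecked results instead of calling ASSERT, and it records __FILE__/__LINE__ as a cheap stand-in for the stack trace idea.

```cpp
#include <cstdio>

// Hypothetical variant of BoolYouMustCheck : counts unchecked results
// (instead of ASSERT) and remembers where the value was created.
class CheckedBool
{
public:
    CheckedBool(bool b, const char * file, int line)
        : m_b(b), m_checked(false), m_file(file), m_line(line) { }

    // copying hands the "must check" obligation over to the copy
    CheckedBool(const CheckedBool & rhs)
        : m_b(rhs.m_b), m_checked(false), m_file(rhs.m_file), m_line(rhs.m_line)
    {
        rhs.m_checked = true;
    }

    ~CheckedBool()
    {
        if ( ! m_checked )
        {
            s_numUnchecked++;
            fprintf(stderr,"%s(%d) : result was never checked\n",m_file,m_line);
        }
    }

    operator bool () const
    {
        m_checked = true;
        return m_b;
    }

    static int s_numUnchecked;

private:
    CheckedBool & operator = (const CheckedBool &); // not needed for the sketch

    bool m_b;
    mutable bool m_checked;
    const char * m_file;
    int m_line;
};

int CheckedBool::s_numUnchecked = 0;

#define CHECKED_BOOL(expr)  CheckedBool( (expr), __FILE__, __LINE__ )

CheckedBool func3(int x)
{
    return CHECKED_BOOL( x > 7 );
}
```

The recorded line is the return statement inside func3, so it tells you which function's result got dropped but not which caller dropped it; for that you would still want a real stack trace.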


11-03-11 | The difficulty of school reform

I'm so opposed to top-down metric based "reform" that I figured I should talk about what I think is a better alternative.

First of all there is no doubt that American public schools are sick. There are lots of good teachers and good classes, but also lots of bad. In my opinion, we don't need massive structural reform, we need a way to get rid of the bad teachers.

(almost always a cluster of bad teachers goes with a bad principal and often a bad superintendent too; they tend to be teachers with seniority who just don't care much anymore, and they all just want to maintain the status quo)

I do believe that charters are not the answer. There's nothing wrong with private schools, but they should be private. I don't believe that federal money should go to private institutions, almost ever, because it leads to corruption, and it also just sucks funding out of the public school system. The charters almost always wind up being a way to discriminate about entrants in some way (even just by desire to go to them), and are also often just a way to sneak around the teacher's union. Anyhoo.

I think the answer is motivating teachers and rewarding good teachers, and also being able to fire bad teachers. If teachers are motivated to succeed, and principals are motivated to hire good teachers and fire bad ones, you have a more free labor market and things will improve.

But how do you do that? This is where the trouble comes in.

I believe standardized test performance is a terrible way to measure success. Most simple metrics like this would be similarly bad.

Judgement by a panel of peers doesn't work, because the teachers get into collusion and just say everyone is great. Perhaps this could be improved by making teachers grade each other, and forcing the grade to be on a curve so there are guaranteed to be winners and losers. But this would just degenerate into a game of "Survivor" where the old guard makes alliances to vote for each other and so on.

I believe the best answer is to let parents grade the teachers. Schools are one of the few areas where I think local government is actually better than top-down federal government, because it's one of the few areas where the local people actually pay attention to what's happening and get involved. (on the other hand, I think local school funding is probably unconstitutional and needs to be abolished; it creates great inequality to this day, despite many court rulings trying to redistribute funding (such as the "robin hood" ruling in Texas))

One idea is to let parents apply for what school they want their kids in and what specific teacher they want. Kids are then assigned by lottery, but you count the number of applications each teacher gets and that's their score. It's basically measuring demand as if teaching was a good. Teachers with high scores get raises and teachers with low scores get fired.

Now you obviously have to control for things like teachers just giving all A's, so people apply because it's the "easy" teacher. One solution might be to force all classes to be graded on a bell curve. That would actually balance out the social stratification of classes because the grade-grubber kids might want to avoid the most prestigious classes (since they would be full of smart kids and very hard to do well in with a bell curve).

That's all sort of okay I think, but there's a big problem, which is that it biases strongly against areas where the parents don't give a shit. And those are the most problematic areas.


11-02-11 | StringMatchTest Release

Code for my string match testbed discussed previously. I'm not gonna do the work to turn this into a clean standalone, so it's a big mess and you can take what you like out of it.

stringmatchtest.zip (45k)

Note : the stringmatchtest.vcproj project refers to some files that are not included in this distribution. Just delete them from the project.

Requires cblib.zip (633k)

You may also need STLPort (I haven't tried building with the VC STL , I use STLPort 5.1.5 or 5.2.1). (BTW I had to modify the STLPort headers to make it build on VS 2008 ; the mods should be obvious).

Tested with VC 2005 and 2008. Does not build with VC 2010 currently.

The most interesting bit is probably in test_suffixarray, which implements the three suffix-array based string searchers previously described on this blog. See previous posts :

cbloom rants 06-17-10 - Suffix Array Neighboring Pair Match Lens
cbloom rants 09-23-11 - Morphing Matching Chain
cbloom rants 09-25-11 - More on LZ String Matching
cbloom rants 09-27-11 - String Match Stress Test
cbloom rants 09-28-11 - Algorithm - Next Index with Lower Value
cbloom rants 09-28-11 - String Matching with Suffix Arrays
cbloom rants 10-02-11 - How to walk binary interval tree
cbloom rants 09-24-11 - Suffix Tries 1
cbloom rants 09-24-11 - Suffix Tries 2
cbloom rants 09-26-11 - Tiny Suffix Note
cbloom rants 09-29-11 - Suffix Tries 3 - On Follows with Path Compression

cbloom rants 09-30-11 - String Match Results Part 1
cbloom rants 09-30-11 - String Match Results Part 2
cbloom rants 09-30-11 - String Match Results Part 2b
cbloom rants 09-30-11 - String Match Results Part 3
cbloom rants 09-30-11 - String Match Results Part 4
cbloom rants 09-30-11 - String Match Results Part 5 + Conclusion
cbloom rants 10-01-11 - String Match Results Part 6

StringMatchTest includes :


/*
 * divsufsort.c for libdivsufsort-lite
 * Copyright (c) 2003-2008 Yuta Mori All Rights Reserved.
 *

/* LzFind.c -- Match finder for LZ algorithms
2009-04-22 : Igor Pavlov : Public domain */

/*
    MMC (Morphing Match Chain)
    Match Finder
    Copyright (C) Yann Collet 2010-2011

StringMatchTest like all cbloom.com software is released under zlib license (basically free for all uses).


11-02-11 | I need

I need some light entertainment that won't actively damage my brain.

Ideally like a web feed or something so I get a few minutes of mild diversion every day.

Nothing serious or political or overly technical that will take real concentration or make me angry.

But also nothing that will insult my intelligence or subtly put filth into my brain.

As an example of bad ones : I rather enjoy architecture and design, but I find almost all the design blogs are way too consumerist, pushing the constant purchasing of new crap just because it's the new thing, and that makes me ill ; another example is my current addiction, which is car news sites, which subconsciously fills my brain with all kinds of horrible ideas, like drifting is cool, I should make my exhaust louder, and so on.


10-31-11 | Photos , Mostly Enchantments

Colchuck Lake is a beautiful turquoise :

I'm in love with this rock face. It towers over Colchuck and feels like a real living being, it has such presence, and you're not sure if it's protective or menacing; the super-difficult barely-a-trail up to the enchantments is on the left :

Sun shining through larches :

The actual enchantments area is a weird top-of-the-world wasteland :

This is from a hike to Snow Lake earlier in the year; I noticed these vortices sheeting off a rock in the river; the river had perfect laminar flow and the rock edge disturbance was shedding this regular "street" of round vortices that then acted as lenses. The lenses were so perfect they were creating caustic rings of light focused and defocused on the river bed. You can see some of the lenses in the lower left of the photo and the light through them hits the creek bed near the top of the photo.

This guy visited our back yard a few days ago. Pearlescent feathers. No idea what he is, but he seemed real tame like he was probably a pet.

This is the view from my new home office (it's downtown Seattle in the distance). We're in a slight microclimate where we get fractionally less wetness (it's not like a real San Francisco microclimate that's dramatically different; I think we get 90-95% of the wetness); the storms hit us later and leave us sooner, so I get to watch them roll in to Seattle, and I get to watch them clear up. The result is a lot of rainbows. I took a photo of the first one I saw. (BTW whoever has that house with the red roof, I thank you, it really adds some spice to my view).


10-31-11 | Small poker note

This year's WSOP final table has perhaps the best tournament poker players ever at a WSOP final table. Maybe at any major live tournament (?). I don't really follow tournaments much, but there's only one fish at the table, and he's not even a huge fish, he's just a "solid" old player, and everyone else is an internet kid, which means that they actually know what things like "fold equity" and "tournament chip EV vs. real dollar EV" are.

Martin Staszko (40.1 million in chips)
Eoghan O'Dea (33.9 million)
Matt Giannetti (24.7 million)
Phil Collins (23.8 million)
Ben Lamb (20.8 million)
Badih Bounahra (19.7 million)
Pius Heinz (16.4 million)
Anton Makiievskyi (13.8 million)
Samuel Holden (12.3 million)

I'm sure it will be horrible TV ; for one thing, ESPN will just show the all-ins which is horrible boring poker broadcasting. But beyond that, the young internet players are just SOoooo boring to watch. God, cash in some of your poker winnings and buy a personality, please. You may as well point a camera at me while I'm coding.

It's too bad that Daniel Negreanu doesn't have enough humility to buy some lessons from JMan or someone good, because it's so much more entertaining to see someone who actually interacts at the table. And his live metagame skills are very good, so if he would just play better technically he could do well.


10-31-11 | What are these pipes ?

So there's this low area along the path by my house that gathers water. I went out to fix it and digging around I found this colony of pipes :

The two small ones are two inch diameter, the big one is four inch; they're black PVC with hammered in caps (that I can easily pull out by hand). I dug down another foot or so and from what I could see they just continue straight down.

Pretty much all my utilities flow past that spot so they could be related to almost anything. In particular the sewer goes past there so I think they might be an outside sewer access. Kind of weird that there's three of them though instead of just one.

I don't think that we have any kind of french drain system, though it would make sense to have a low point there with a french drain drawing the water off.

Anyhoo, kind of curious what they are before I cover up that spot with a bunch of rock and dirt to raise it up. Photos backing up :


10-31-11 | I hate the web

Google search no longer supports +. The Google response does a great job of making it worse by responding in super-douche corporate bullshit speak, like "actually you didn't want that feature, we know better than you, it's better without it" and also the power-douche "we hear your concerns and are glad for your feedback but fuck you we're going to ignore you".

Google has done some similarly epically retarded shit with Google+ . Like, hey dumb asses, if businesses want to be on Google+ and you don't have the business accounts features done yet, why don't you just let them keep using the normal Google+ and migrate them over when you have the business features done? Oh no, let's kick everyone off and cause a big shit storm because we know the "right way" and you will thank us because it's "worth the wait". Epically retarded.

Anyway, Google Reader got a new look and it seems to be neither good nor bad, but it PISSES ME OFF.

I fucking hate it when shit that I use as a basic part of my day changes under me randomly. It's like if somebody semi-randomly periodically came into your house and swapped out your clothes or your appliances. Sometimes they fuck up some feature that you really liked. But most of all it's just distracting and annoying and ruins your familiarity with a tool.

Web software in general sucks because of this. Even ignoring all the bullshit about how slow it is, or the fact that you need a live connection, or the fact that you can't download web pages properly, etc. it sucks because people are constantly changing it out under your feet.


10-27-11 | Tiny LZ Decoder

I get 17 bytes for the core loop (not including putting the array pointers in registers because presumably they already are there if you care about size) .

My x86 is rusty but certainly the trick to being small is to use the ancient 1 byte instructions, which conveniently the string instructions are. For example you might be tempted to read out length & offset like this :


        mov cl,[esi]    // len
        mov al,[esi+1]  // offset
        add esi,2

but it's smaller to do

        lodsb  // len
        mov cl,al
        lodsb  // offset

because it keeps you in 1 byte instructions. (and of course any cleverness with lea is right out). (naively just using lodsw and then you have len and offset in al and ah is even better, but in practice I can't make that smaller)

Anyhoo, here it is. I'm sure someone cleverer with free time could do better.


__declspec(noinline) void TinyLZDecoder(char * to,char * fm,char * to_end)
{
    __asm
    {
        mov esi,fm
        mov edi,to
        mov edx,to_end
        xor eax,eax
        xor ecx,ecx
    more:
        movsb   // literal
        lodsb   // len
        mov cl,al
        lodsb   // offset
        push esi
        mov esi,edi
        sub esi,eax
        rep movsb   // match
        pop esi
        cmp edi,edx
        jne more
    }

}

------------------------------------------------------

    more:
        movsb   // literal
00401012  movs        byte ptr es:[edi],byte ptr [esi] 
        lodsb   // len
00401013  lods        byte ptr [esi] 
        mov cl,al
00401014  mov         cl,al 
        lodsb   // offset
00401016  lods        byte ptr [esi] 
        push esi
00401017  push        esi  
        mov esi,edi
00401018  mov         esi,edi 
        sub esi,eax
0040101A  sub         esi,eax 
        rep movsb   // match
0040101C  rep movs    byte ptr es:[edi],byte ptr [esi] 
        pop esi
0040101E  pop         esi  
        cmp edi,edx
0040101F  cmp         edi,edx 
        jne more
00401021  jne         more (401012h) 
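
For reference, a portable sketch of the stream that loop consumes: repeated { literal byte, length byte, offset byte } triples. This is my reading of the asm, not code from the original; TinyLZDecode is a hypothetical name, and unlike the asm it stops on "size reached or passed" rather than exact pointer equality.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Portable sketch of the format read by the 17 byte loop :
//   repeat { 1 literal byte, 1 length byte, 1 offset byte }
// until to_size bytes have been produced. Input is trusted, like in the asm.
std::vector<unsigned char> TinyLZDecode(const unsigned char * fm, size_t to_size)
{
    std::vector<unsigned char> to;
    to.reserve(to_size);
    while ( to.size() < to_size )
    {
        to.push_back(*fm++);                // movsb : literal
        size_t len    = *fm++;              // lodsb : len
        size_t offset = *fm++;              // lodsb : offset
        // a match has to point back into bytes already written
        assert( len == 0 || ( offset >= 1 && offset <= to.size() ) );
        size_t from = to.size() - offset;
        for (size_t i=0;i<len;i++)          // rep movsb : one byte at a time,
        {                                   //  so overlapping matches work
            unsigned char c = to[from + i];
            to.push_back(c);
        }
    }
    return to;
}
```

For example the nine bytes { 'a',3,1, 'b',0,0, 'c',2,2 } decode to "aaaabcbc" : the first match is an overlapping run, the second triple carries a zero length match just to get a literal through.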

Also obviously you would get much better compression with a literal run length instead of a single literal every time, and it only costs a few more bytes of instructions. You would get even better compression if the run len could be either a match len or a literal run len and that's just another few bytes. (ADDENDUM : see end)


A small "Predictor/Finnish" is something like this :


    __asm
    {
ByteLoop:   lodsb   // al = *esi++ // control byte
            mov edx, 0x100
            mov dl, al
BitLoop:
            shr edx, 1
            jz  ByteLoop
            jnc zerobit
            lodsb
            mov [ebx], al
zerobit:    mov al, [ebx]
            mov bh, bl
            mov bl, al
            stosb  // *edi++ = al
            jmp BitLoop
    }

the fast version of Finnish of course copies the bit loop 8 times to avoid looping but you can't do that if you want to be small.

I'm quite sure this could be smaller using some clever {adc esi} or such. Also the sentry bit looping is taking a lot of instructions and so on.

Note the crucial trick of "Finnish" is that the hash table must be 64k in size and 64k aligned, so you can do the hash update and the table address computation just by cycling the bottom two bytes of ebx. (I use the name "Predictor" for the generic idea of the single bit prediction/literal style compressor; "Finnish" is the specific variant of Predictor that uses this bx trick).
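
To make the bit layout concrete, here is a portable sketch of the same scheme. The decoder follows the asm step by step; the encoder, the zero-initialized table, the zero starting context and the explicit raw length are my assumptions (the asm shows none of them), and PredictorEncode / PredictorDecode are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Encoder (assumed, not in the post) : one control byte per 8 output bytes,
// bit set = prediction missed, literal follows ; low bit first.
Bytes PredictorEncode(const Bytes & raw)
{
    Bytes table(65536, 0);      // assumption : table starts zeroed
    unsigned ctx = 0;           // plays the role of bx : last two bytes
    Bytes comp;
    for (size_t i=0;i<raw.size();i+=8)
    {
        size_t controlPos = comp.size();
        comp.push_back(0);
        for (int bit=0; bit<8 && i+bit<raw.size(); bit++)
        {
            unsigned char c = raw[i+bit];
            if ( table[ctx] != c )
            {
                comp[controlPos] |= (unsigned char)(1u << bit);
                comp.push_back(c);
                table[ctx] = c;
            }
            ctx = ((ctx << 8) | c) & 0xFFFF;
        }
    }
    return comp;
}

// Decoder : mirrors the asm loop. Input is trusted.
Bytes PredictorDecode(const Bytes & comp, size_t rawLen)
{
    Bytes table(65536, 0);
    unsigned ctx = 0;
    Bytes raw;
    size_t pos = 0;
    while ( raw.size() < rawLen )
    {
        unsigned control = comp[pos++];             // lodsb : control byte
        for (int bit=0; bit<8 && raw.size()<rawLen; bit++)
        {
            if ( control & (1u << bit) )            // shr edx,1 ; jnc zerobit
                table[ctx] = comp[pos++];           // lodsb ; mov [ebx],al
            unsigned char c = table[ctx];           // mov al,[ebx]
            ctx = ((ctx << 8) | c) & 0xFFFF;        // mov bh,bl ; mov bl,al
            raw.push_back(c);                       // stosb
        }
    }
    return raw;
}
```

The ctx update is the whole point of the bx trick: in the asm it costs two register moves and needs no address arithmetic, which is what the mask and shift stand in for here.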

(note that this is not remotely the fast way to write these on modern CPU's)


ADDENDUM : Better (in the sense of more compression) LZ decoder in 22 bytes (core loop only) :


__declspec(noinline) void TinyLZDecoder(char * to,char * fm,char * to_end)
{
    __asm
    {
        mov esi,fm
        mov edi,to
        mov edx,to_end
        xor eax,eax
        xor ecx,ecx
    more:
        mov cl,[esi]    // len
        inc esi
        shr cl,1        // bottom bit is flag
        jc literals
        
    //match:
        lodsb   // offset -> al
        push esi
        mov esi,edi
        sub esi,eax
        rep movsb   // copy match
        pop esi
    
        // ecx is zero, just drop through
    literals:
        rep movsb  // copy literal run
    
        cmp edi,edx
        jne more
    }

}

Note that obviously if your data is bigger than 256 bytes you can use a two byte match offset by doing lodsw instead of lodsb.
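
The same kind of portable sketch for the 22 byte version, again just my reading of the asm with a hypothetical name (TinyLZDecode2). Each packet starts with a control byte: the bottom bit is the flag and the top 7 bits are the length; a literal packet is followed by that many raw bytes, a match packet by a one byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Portable sketch of the format read by the 22 byte loop :
//   control byte = (len << 1) | isLiteralRun
//   literal run  : len raw bytes follow
//   match        : 1 offset byte follows, copy len bytes from (out - offset)
// Input is trusted, like in the asm.
std::vector<unsigned char> TinyLZDecode2(const unsigned char * fm, size_t to_size)
{
    std::vector<unsigned char> to;
    to.reserve(to_size);
    while ( to.size() < to_size )
    {
        size_t control = *fm++;             // mov cl,[esi] ; inc esi
        size_t len = control >> 1;          // shr cl,1
        if ( control & 1 )                  // jc literals
        {
            for (size_t i=0;i<len;i++)      // rep movsb : literal run
                to.push_back(*fm++);
        }
        else
        {
            size_t offset = *fm++;          // lodsb : offset
            assert( len == 0 || ( offset >= 1 && offset <= to.size() ) );
            size_t from = to.size() - offset;
            for (size_t i=0;i<len;i++)      // rep movsb : match, byte by byte
            {
                unsigned char c = to[from + i];
                to.push_back(c);
            }
        }
    }
    return to;
}
```

For example { 7,'a','b','c', 10,3 } is a 3 byte literal run followed by a length 5 match at offset 3, which decodes to "abcabcab". With 7 bits of length a packet tops out at 127 bytes.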

x86 of course is a special case that's particularly amenable to small LZ decoders; it's not really fair, all other instruction sets will be much larger. (which is sort of ironic because RISC is supposed to make instruction encoding smaller)


10-27-11 | Metrics

The best thing you can ever have in software development is a good metric that you are trying to optimize for. A repeatable test case that produces a score and all you have to do is maximize that score.

However, having a performance metric that doesn't exactly match what you want to optimize can be very harmful. You have to be very careful about how you set your metric and about over-training for it.

If you make a speed test where you run the same bit of code over and over a thousand times, you wind up creating code that overlaps well with itself and runs fast when it's hot in cache, and maybe code that factors out to a precompute then repeat - not necessarily things that you wanted.

If you set a compression challenge based on the Calgary Corpus, you wanted to get just great compressors, but instead you get compressors specifically tuned for those files (mainly english text).

An example that has misled many people is automated financial trading software. It might seem that that is the ideal case - you get a score, how much money it makes - and you just optimize it to make more money and let it run. But that's not right, because there are other factors, such as risk. If you just train the software to maximize EV it can wind up learning to do very strange things (like massive leverage circular arbitrage trades that require huge leverage to squeeze tiny margins; this is part of what killed LTCM for example).

The only time you can really go nuts optimizing for a metric is when the metric is the real final target of your application. Otherwise, you have to be really careful and take the metric with a grain of salt; you optimize for the metric, but also keep an eye on your code and use your human judgement to decide if the changes you're making are good general changes or are just over-specific to the metric.

Anyone in software should know this.

Which is what makes it particularly disturbing that the Gates Foundation supports moronic metric-based education.

When you set simple performance metrics for big bureaucracies, you don't make things better. You make the bureaucracies better at optimizing those metrics. And since they have limited resources and limited amounts of time and energy, that typically makes everything else worse.

Granted, Gates is not so moronic as to advocate "teaching to the test", but even a more complicated cocktail of metrics (which they have yet to define, instead pouring money into metrics research) will not be any different. If you're going to pay and hire and fire people based on metrics you create a horrible situation where any creative thought is punished.

(I think Gates' opposition to small class sizes reflects an over-attention to test results (which have been shown to not correlate strongly to class size) and a lack of common sense about what actually happens in a class room)

The irony is that it's just like the way that horrible teachers grade their students. It's like those essay questions on AP exams where they don't actually read your essay and appreciate what you're saying at all, the grader just looks for the key words that you're supposed to have used if your answer is correct, so you could actually write something that doesn't make sense at all, as long as it has the correct "thesis/evidence/summary" structure you get full points.

It gets me personally hot and bothered. My experience of American public schools was that they were generally absurdly soul-crushing in a bureaucratic Kafka-esque way; like you would be tested for your creativity and independent thought process, and the way that was done was you had to recite back the specific problem solving steps that you had been told was the method. In that general depressing stupidity, I was lucky enough to have a few teachers that really touched me and helped me through life because they just engaged me as a human being and were flexible about how I was allowed to go about things. In terms of objective evaluations I'm sure many of those special teachers would have done very poorly. They often spent entire class sessions rambling on about their personal lives or about politics - and that was great! That was so much more valuable to a child than taking turns reading out of the textbook or following the official lesson plan.


10-26-11 | Some Things I Find Appalling

The US continues to be the largest provider of arms around the world, including to questionable third world countries and private militias.

The FBI continues to use entrapment techniques on suspected possible terrorists in which they provide a more radical undercover agent who provides the arms and encouragement. (just like the 60's, man)

The FBI continues to spy on non-criminals inside the US.

The US government creates semi-hidden propaganda to sell its policies to US citizens.

US journalists are not allowed to cover our own wars any more. Don't be misled by "embeds" or other government-provided "news" footage.

The executive continues to hide its actions under the cloak of "privilege" or "national security" , way beyond what is remotely reasonable.

Private companies are paid to imprison our citizens. Privatization of prisons is just insane, but of course it's only natural when you have private military forces, which are not only illegal but paid for by our own government. WTF.

We continue to use terrorism as a thin excuse for deporting or imprisoning ("detention") thousands of immigrants.

etc.


10-26-11 | The Eight Month Cruise

If you're buying a home in Seattle, you should always try to do so in early spring. This will give you a few months of wetness to see any problems, and then you'll have the whole summer to deal with them before the wet sets in again. (it also lets you see the homes during rains, which lets you look for water incursions while you shop). I tried to time it that way, but the home shopping took too long and I didn't wind up buying until late summer.

The problem with Seattle is that once it starts raining (around Oct 1 pretty reliably) it literally does not stop for the next 8 months. Sure maybe it stops for a day or two, but never long enough for the whole house to dry out and then give you a big chunk of dry days to do something like replace the roof or paint the exterior.

Fortunately I don't have any problem so large as that, but even for minor things it's damn annoying. For example some time around September I realized that I really need to get a coat of waterproofer on all my decking. Oh well, it's gonna have to wait 8 months. There's a couple of spots I need to touch up exterior paint, but you can't really do a good job of painting without a solid 5 days of dry and decent warmth.

It's almost like our houses are boats, and we go on an 8 month aquatic voyage every year. You really need to use those 4 months as a chance to get your boat up on dry dock, scrub off the moss, dig out the dry rot, apply epoxy, sand and paint, etc.


10-26-11 | Tons

In the US, a "ton" = 2000 pounds.

In the UK a "ton" is 2240 pounds (which comes from twenty "hundredweights" where a "hundredweight" is eight stone, and a stone is 14 pounds, WTF Britain).

A "metric ton" is obviously 1000 kg. In the UK this is officially called a "tonne" which you will see in technical documents, but I don't see that used much in casual writing, and it's certainly confusing when spoken since it sounds the same. (but a UK ton is very close to a metric ton (2204.6 pounds) so the mixup here surely happens all the time and is not a huge problem).

(when you hear someone in the UK phonetically say "ton" do they mean "tonne" or imperial ton?)

To differentiate the US ton vs UK ton they can be called "short ton" or "long ton".


On a related note, a pint is not a pound *anywhere* in the world.

In the UK, 1 oz by volume of water = 1 oz of weight. But a "pint" in the UK is 20 oz. So a pint is 1.25 pounds (a gallon is exactly 10 pounds)

In the US, 1 oz by volume of water = 1.041 oz of weight, so a pint = 1.041 pounds. (and a gallon = 8.33 pounds).

(neither liquid ounce is anything neat in terms of volume; the only nice whole number unit is the US gallon which is 231 cubic inches)

(the weight measures are the same in the US and UK, it's the US volume measure which went weird (1.041), and I believe it was done in order to make the gallon an integer number of cubic inches)

If you want to get technical, a (US) "pint's a pound" at some high temperature. (...some digging...) actually it's very close just before boiling. It looks like 98 C water is almost exactly a pound per pint.

Actually there is a sort of cute book-end of the ranges of water density there :

Very close to freezing (4 C) water is 1 g/ml , and very close to boiling (98 C) it's a pound per (US) pint. The difference is a factor of about 0.96.


10-24-11 | We Cut Costs ...

"We cut costs, and pass the savings on to .... us (!?) "

That's the motto of the modern era. Your Jeff Bezoses and Steve Jobs of the world are like a late night "mattress king" but with much less integrity, since the mattress king also slashes prices.

Music and books are digital. The producers save 90% of their costs. No manufacturing. No distribution. No retail space. Where are the savings? In their pocket.

Nobody has a physical location anymore where you can go and get help. Even the horrible call centers are often just automated, or moved to the web. Savings for consumers? Nope.

It struck me when I got an "Orca card" , the local transit pass. I have to load up an e-wallet so they get to hold onto a bunch of my money all the time. All my fares are prepaid and processed in one big transaction. They save massive amounts (vs. collecting physical money to pay for rides). Where are the savings? Not for me of course.

It's gotten so bad that when a new technology comes around and the savings are not passed on to the consumer we don't even blink. They never are.


10-24-11 | LZ Optimal Parse with A Star Part 1

First two notes that aren't about the A-Star parse :

1. All good LZ encoders these days that aren't optimal parsers use complicated heuristics to try to bias the parse towards a cheaper one. The LZ parse is massively redundant (many parses can produce the same original data) and the cost is not the same. But in the forward normal parse you can't say for sure which decision is cheapest, so they just make some guesses.

For example the two key heuristics in LZMA are :


// LZMA lastOffset heuristic :
  if (repLen >= 2 && (
        (repLen + 1 >= mainLen) ||
        (repLen + 2 >= mainLen && mainDist >= (1 << 9)) ||
        (repLen + 3 >= mainLen && mainDist >= (1 << 15))))
  {
    // take repLen instead of mainLen
  }

// LZMA lazy heuristic :

        // Truncate current match if match at next position will be better (LZMA's algorithm)
        if (nextlen >= prevlen && nextdist < prevdist/4 ||
            nextlen == prevlen + 1 && !ChangePair(prevdist, nextdist) ||
            nextlen > prevlen + 1 ||
            nextlen + 1 >= prevlen && prevlen >= MINLEN && ChangePair(nextdist, prevdist))
        {
             return MINLEN-1;
        } else {
             return prevlen;
        }

One is choosing a "repeat match" over a normal match, and the next is choosing a lazy match (literal then match) over greedy (immediate match).

My non-optimal LZ parser has to make similar decisions; what I did was set up a matrix and train it. I made four categories for a match based on what the offset is : {repeat, low, medium, high } , so to decide between two matches I do :


  cat1 = category of match 1 offset
  cat2 = category of match 2 offset
  diff = len1 - len2

  return diff >= c_threshold[cat1][cat2]
  
so c_threshold is a 4x4 matrix of integers. The values of threshold are in the range [0,3] so it's not too many to just enumerate all possibilities of threshold and see what's best.
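
To make the matrix idea concrete, here's a minimal sketch of that comparison. The category boundaries (I just borrowed the 1<<9 and 1<<15 from the LZMA heuristic above) and the threshold values are placeholders I made up for illustration, not the trained values; the names Match, categorize and preferFirst are hypothetical too.

```cpp
#include <cassert>

// Offset categories from the text : {repeat, low, medium, high}
enum OffsetCategory { CAT_REPEAT = 0, CAT_LOW = 1, CAT_MEDIUM = 2, CAT_HIGH = 3 };

struct Match
{
    int  length;
    int  offset;    // distance back to the match source
    bool isRepeat;  // true if offset equals the previous match offset
};

// ASSUMPTION : these boundaries are illustrative only
inline OffsetCategory categorize(const Match & m)
{
    if ( m.isRepeat )            return CAT_REPEAT;
    if ( m.offset < (1 << 9) )   return CAT_LOW;
    if ( m.offset < (1 << 15) )  return CAT_MEDIUM;
    return CAT_HIGH;
}

// ASSUMPTION : placeholder thresholds in [0,3] ; real values would come from
// enumerating / training on a test set.  Row = category of match 1, column = match 2.
static const int c_threshold[4][4] =
{
    { 0, 0, 0, 0 },  // match 1 is a repeat
    { 1, 0, 0, 0 },  // match 1 has a low offset
    { 2, 1, 0, 0 },  // match 1 has a medium offset
    { 3, 2, 1, 0 },  // match 1 has a high offset
};

// returns true if match 1 should be taken over match 2
inline bool preferFirst(const Match & m1, const Match & m2)
{
    const int diff = m1.length - m2.length;
    return diff >= c_threshold[ categorize(m1) ][ categorize(m2) ];
}
```

With these placeholder values a high-offset match has to be at least 3 longer than a repeat match to win, while two matches of equal length in the same category just keep the first one.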

Anyway, the thing about these heuristics is that they bias the parse in a certain way. They assume a certain cost tradeoff between literals vs. repeat matches vs. normal matches, or whatever. When the heuristic doesn't match the file they do badly. Otherwise they do amazingly well. One solution might be having several heuristics trained on different files and choose the one that creates the smallest output.

Also I should note - it's not trivial to tell when you have the heuristic wrong for the file. The problem is that there's a feedback loop between the parse heuristic and the statistical coder. That causes you to get into a local minimum, and you might not see that there's a better global minimum which is only available if you make a big jump in parsing style.

Repeating that more explicitly : the statistical coder will adapt to your parse heuristic; say you have a heuristic that prefers low offsets (like the LZMA heuristic above), that will cause your parse to select more low offsets, that will cause your statistical backend to code low offsets smaller. That's good, that's what you want, that's the whole point of the heuristic, it skews the parse intentionally to get the statistics going in the right direction. The issue then is that if you try to evaluate an alternative parse using the statistics that you have, it will look bad, because your statistics are trained for the parse you have.

2. It's crazy that LZ compression can do so well with so much redundancy in the code stream.

Think about it this way. Enumerate all the compressed bit streams that are created by encoding all raw bit streams up to length N.

Start with the shortest compressed bit stream. See what raw bits that decodes to. Now there may be several more (longer) compressed bit streams that decode to that same raw bit stream. Mark those all as unused.

Move to the next shortest compressed bit stream. First if there is a shorter unused one, move it into that slot. Then mark all other output streams that make the same raw bits as unused, and repeat.

For example a file like "aaaaaaa" has one encoding that's {a, offset -1 length 6} ; but there are *tons* of other encodings, such as {a,a,[-1,5]} or {a,[-1,3],[-1,3]} or {a,[-1,2],[-2,4]} etc. etc.

All the encodings except the shortest are useless. We don't want them. But even the ones that are not the best are quite short - they are small encodings, and small encodings take up lots of code space (see Kraft Inequality for example - one bit shorter means you take up twice the code space). So these useless but short encodings are real waste. In particular, there are other raw strings that don't have such a short encoding that would like to have that output length.
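
If you want to convince yourself that the "aaaaaaa" parses really are all the same file, here's a toy decoder sketch. The Token struct and the Lit/Mat helpers are hypothetical names for illustration, and I write offsets as positive distances back (so the post's "offset -1" is offset 1 here).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One LZ77 token : either a literal byte or an (offset,length) match
struct Token
{
    bool isLiteral;
    char literal;  // valid if isLiteral
    int  offset;   // distance back, > 0 ; valid if !isLiteral
    int  length;   // valid if !isLiteral
};

inline Token Lit(char c)           { return Token{ true, c, 0, 0 }; }
inline Token Mat(int off, int len) { return Token{ false, 0, off, len }; }

// Decode byte by byte so that overlapping matches (offset < length) work
inline std::string DecodeLZ(const std::vector<Token> & tokens)
{
    std::string out;
    for ( const Token & t : tokens )
    {
        if ( t.isLiteral )
        {
            out.push_back(t.literal);
            continue;
        }
        assert( t.offset > 0 && (std::size_t)t.offset <= out.size() );
        for ( int i = 0; i < t.length; i++ )
            out.push_back( out[ out.size() - t.offset ] );
    }
    return out;
}
```

Feeding it { Lit('a'), Mat(1,6) } , { Lit('a'), Lit('a'), Mat(1,5) } , { Lit('a'), Mat(1,3), Mat(1,3) } or { Lit('a'), Mat(1,2), Mat(2,4) } gives back the same seven a's every time; four distinct code streams, one raw string.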

Anyhoo, surprisingly (just like with video coding) it seems to be good to add even *more* code stream redundancy by using things like the "repeat matches". I don't think I've ever seen an analysis of just how much wasted code space there is in the LZ output, I'm curious how much there is.

Hmm we didn't actually get to the A Star. Next time.


10-18-11 | StringMatchTest : Hash 1b

For reference :

The good way to do the standard Zip-style Hash -> Linked List for LZ77 parsing.

There are two tables : the hash entry point table, which gives you the head of the linked list, and the link table, which is a circular buffer of ints which contain the next position where that hash occurred.

That is :


  hashTable[ hash ]  contains the last (most recent preceding) position that hash occurred
  chainTable[ pos & (window_size-1) ]  contains the last position of the hash at pos before pos

To walk the table you do :

  i = hashTable[ hash ];
  while ( i in window )
    i = chainTable[ i & (window_size-1) ]

To update the table you do :

  head = hashTable[ hash ];
  hashTable[hash] = pos;
  chainTable[ pos & (window_size-1) ] = head;

And now for some minor details that are a bit subtle. We're going to go through "Hash1" from StringMatchTest which I know I still haven't posted.

int64 Test_Hash1(MATCH_TEST_ARGS)
{
    uint8 * buf = (uint8 *)charbuf;
    const int window_mask = window_size-1;
        
    vector<int> chain; // circular buffer on window_size
    chain.resize(window_size);
    int * pchain = chain.data();
    
    const int hash_size = MIN(window_size,1<<20);
    const int hash_mask = hash_size-1;
    
for small files or small windows, you can only get good per-byte speed if you make hash size proportional with that MIN. (what you can't see is outside the Test_ I also made window_size be no bigger than the smallest power of 2 that encloses file size).


    vector<int> hashTable; // points to pos of most recent occurrence of this hash
    hashTable.resize(hash_size);
    int * phashTable = hashTable.data();
    
    memset(phashTable,0,hash_size*sizeof(int));

As noted previously, for large hashes you can get a big win by using a malloc that gives you zeros. I don't do it here for fairness to the other tests. I do make sure that my initialization value is zero so you can switch to VirtualAlloc/calloc.

    int64 total_ml = 0;
    
    // fiddle the pointers so that position 0 counts as being out of the window
    int pos = window_size+1;
    buf -= pos;
    ASSERT( (char *)&buf[pos] == charbuf );
    size += pos;

I don't want to do two checks in the inner loop for whether a position is a null link vs. out of the window. So I make the "null" value of the linked list (zero) be out of the window.

    for(;pos<(size-TAIL_SPACE_NEEDED);)
    {
        MATCH_CHECK_TIME_LIMIT();

        // grab 4 bytes (@ could slide here)
        uint32 cur4 = *((uint32 *)&buf[pos]);
        //ASSERT( cur4 == *((uint32 *)&buf[pos]) );

On PC's it's fastest just to grab the unaligned dword. On PowerPC it's faster to slide bytes through the dword. Note that endian-ness changes the value, so you may need to tweak the hash function differently for the two endian cases.

        // hash them 
        uint32 hash = hashfour(cur4) & hash_mask;
        int hashHead =  phashTable[hash];
        int nextHashIndex = hashHead;
        int bestml = 0;
        int windowStart = pos-window_size;
        ASSERT( windowStart >= 0 );
        
        #ifdef MAX_STEPS
        int steps = 0;
        #endif

Start the walk. Not interesting.

        while( nextHashIndex >= windowStart )
        {
            uint32 vs4 = *((uint32 *)&buf[nextHashIndex]);
            int hi = nextHashIndex&window_mask;
            if ( vs4 == cur4 )
            {
                int ml = matchlenbetter(&buf[pos],&buf[nextHashIndex],bestml,&buf[size]);
                    
                if ( ml != 0 )
                {
                    ASSERT( ml > bestml );
                    bestml = ml;
                    
                    // meh bleh extra check cuz matchlenbetter can actually go past end
                    if ( pos+bestml >= size )
                    {
                        bestml = size - pos;
                        break;
                    }
                }
            }

            #ifdef MAX_STEPS
            if ( ++steps > MAX_STEPS )
                break;
            #endif
                                
            nextHashIndex = pchain[hi];
        }

This is the simple walk of the chain of links. Min match len is 4 here which is particularly fast, but other lens can be done similarly.

"MAX_STEPS" is the optional "amortize" (walk limit) which hurts compression in an unbounded way but is necessary for speed.

"matchlenbetter" is a little trick ; it's best to check the character at "bestml" first because it is the most likely to differ. If that char doesn't match, we know we can't beat the previous match, so we can stop immediately. After that I check the chars in [4,bestml) to ensure that we really do match >= bestml (the first 4 are already checked) and lastly the characters after, to extend the match.

The remainder just updates the hash and is not interesting :


        ASSERT( bestml == 0 || bestml >= 4 );
        total_ml += bestml;
        
        // add self :
        //  (this also implicitly removes the node at the end of the sliding window)
        phashTable[hash] = pos;
        int ci = pos & window_mask;
        pchain[ci] = hashHead;
                
        if ( greedy && bestml > 0 )
        {
            int end = pos+bestml;
            pos++;
            ASSERT( end <= size );
            for(;pos<end;pos++)
            {               
                uint32 cur4 = *((uint32 *)&buf[pos]);
                
                // hash them 
                uint32 hash = hashfour(cur4) & hash_mask;
                int hashHead =  phashTable[hash];
                phashTable[hash] = pos;
                int ci = pos & window_mask;
                pchain[ci] = hashHead;      
            }
            pos = end;
        }
        else
        {
            pos++;
        }
    }
    
    return total_ml;
}

Note that for non-greedy parsing you can help the O(N^2) in some cases by setting bestml to lastml-1. This helps enormously in practice because of the heuristic of matchlenbetter but does not eliminate the O(N^2) in theory. (the reason it helps in practice is because typically when you find a very long match, then the next byte will not have any match longer than lastml-1).

(but hash1 is really just not the right choice for non-greedy parsing; SuffixArray or SuffixTrie are vastly superior)
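
For reference, since "matchlenbetter" is described above but not shown, here's a minimal sketch of the check order it describes. This is my reconstruction from the description, not the actual StringMatchTest code; in particular this version clamps at the end pointer, whereas the comment in the walk above says the real one can run past it. The name matchlenbetter_sketch is made up to keep the two apart.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t uint8;

// Returns the match length if it is strictly greater than bestml, else 0.
// ASSUMPTION : the caller has already verified that the first 4 bytes match,
//  and that vs is earlier in the same buffer than cur.
inline int matchlenbetter_sketch(const uint8 * cur, const uint8 * vs, int bestml, const uint8 * end)
{
    const int maxlen = (int)(end - cur);
    if ( maxlen < 4 || bestml >= maxlen )
        return 0; // no room to beat bestml

    // 1. check the byte at bestml first ; it's the most likely to differ,
    //    and if it differs we can't beat the previous match
    if ( cur[bestml] != vs[bestml] )
        return 0;

    // 2. verify [4,bestml) so we really do match >= bestml
    for ( int i = 4; i < bestml; i++ )
        if ( cur[i] != vs[i] )
            return 0;

    // 3. extend past bestml as far as the match goes
    int ml = ( bestml < 4 ) ? 4 : bestml + 1;
    while ( ml < maxlen && cur[ml] == vs[ml] )
        ml++;

    return ml;
}
```

The point of the ordering is that most candidates on the chain get rejected by the single byte compare in step 1, so you rarely pay for the full compare.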


10-18-11 | To wrap it up or move on

One of the things I've really struggled with in the last few years at RAD is the trade off between wasting time on a side shoot of the main code line vs. really wrapping something up while your focus is on it.

Generally after I've spent a week or two on a topic I start feeling like "okay that's enough time on this I need to move on" ; I guess that's my internal project manager watching my schedule. But on the other hand, there's such a huge advantage to staying on a topic while it fills your mind. You just lose so much momentum if you have to come back to it later.

For example I wish I had finished my JPEG decoder back when I was working on it, that was an important piece of software I believe, but I felt like I needed to move on to more practical tasks, and now it's all out of my head and would take me several weeks to figure out all the nuances again.

Currently I'm working on a new way to do LZ optimal parsing, and I feel like I need to move on because it's not that important to my product and I need to get onto more practical tasks, but at the same time I hate to leave it now because I feel close to a breakthrough and I know that if I stop now I may never come back to it, and if I do it will take forever to get back into the flow.


10-18-11 | Occupy Personal Responsibility

I am happy to see the "occupy" movements ; it's nice just to see people trying to do something about our fucked up politics.

Using "the top 1%" as your scapegoat is very clever, because it's such a narrowly defined group that it can actually get some majority support behind it. Past democratic/populist movements have tried to blame "the rich" or the top 10% , but that never works since 20-30% of people think they are in the top 10% (or will be soon), and then another 20% will oppose you just because you're a democrat, and another 20% will oppose any kind of redistribution, and the result is that you can't get a majority. The "top 1%" is a mysterious group that nobody personally knows, they live in crystal castles and somehow screw us all over.

But that's where I find the whole movement to be rather depressing. It's not actually a new movement towards more realistic, humble governance. It's yet another call for a free lunch.

The real problem with American politics goes back to the voters. Nobody is ever willing to see the big picture and do what's good for the country. Everyone wants their taxes lower. They want their services increased. It's somebody else who can get higher taxes and get a service cut.

And the whole Occupy/1%er movement is the same thing. It's not our fault that the country is so fucked and we don't have jobs. It couldn't be because we act like a herd of buffalo bidding up houses and jumping on stocks just before they tank. It couldn't be because we ran up massive personal debts to buy imported crap. It couldn't be because we chose to get useless educations. It couldn't be because we refuse to raise taxes. It couldn't be because we slash education and infrastructure spending that would help our country develop. No, it's those top 1% ! They're somehow manipulating markets and controlling government and screwing us all over!

(not related to my main point, but another funny bit is that the *actual* poor are completely missing from this movement; it's always the middle class or maybe lower-middle class who are in a bit of a hard spot; if you watch the news about foreclosures, it's always some white people in a suburban 3000 square foot tract home whining about their foreclosure. We have 15-20% of our population in serious fucking big time poverty (the official poverty line is crazy low; $10k a year for one person, $20k for a family of four, a more reasonable definition would make the number even higher). This is not a small group, but they are completely invisible from modern politics, the news, and all these populist movements. That's very intentional, I believe, because the democrats/populists know that the truly poor are political poison. When you get a bunch of blacks, homeless, immigrants, etc. in your rallies, that's when the Republican opposition calcifies against you. Plus it's just not news; we know we have massive embarrassing ghettos where people are barely scraping by, and we don't care and we don't want to hear about it).


10-17-11 | Crap Products

Am I the only person in the world that's just disgusted by how the moron marketing/design people are ruining all modern cars? This is not the correct amount of glass in a car :

Oh sure, I can sacrifice something minor like seeing *out* of the car for something more important like how "aggressive" it looks (that seems to be the key word for car marketing morons at the moment). I guess with high end cars like the LFA it actually is more important to the average buyer how it looks from the outside than how it is to drive, but sadly this stupidity has infected all levels of car design.


Modern bed designers either are intentionally out to bruise a lot of shins or they're just morons. I like to think it's the former. They love the fact that all the pretentious douches who just buy things because they're "stylish" without thinking about whether they're actually good are walking around with constantly bruised shins.

If you google "modern platform bed" almost every single one is a shin minefield.

I picked the particular one because it exhibits another piece of retarded design - the integrated nightstand. It seems sort of like a nice idea at first; some of them have little cubbies or nooks in the headboard where you can stash your book, a lamp, a glass of water. But then I started thinking...

Do you people never have sex? When you do, is it lying almost perfectly still with a sheet between you?

Also on the bed topic - "Eco Leather" ? Seriously? Hmm, let's see... 70% polyurethane. That's plastic my friend. I believe that's "pleather", or "leatherette". Fucking eco leather. May as well call it "iLeather".

(the smugstainability junta want "eco leather" to mean recycled leather or some such shit, the ridiculously-obsessive-mom junta want "eco leather" to mean leather processed without chemicals so it's safe for their ever-so-fragile babies, but in fact the furniture manufacturers of the world have just said "nope, fuck you, it means plastic").


10-17-11 | Sensor Dry

Do the laundry manufacturers of the world really believe that "slightly damp" is the correct amount of dryness?

So far as I can tell all the fancy "Energy Star" low-energy dryers just work by NOT FUCKING DRYING. Oh, big woop you saved energy because you just fucking didn't heat up the clothes. I can make a dryer from 1960 use zero watts if I'm allowed to not actually dry the clothes.

It's so disrespectful. It treats me, the user of the device, like a fucking piece of shit. I told you to do something, but who the fuck am I? Just some moron consumer. I'm sure the engineers know what I want better than I do, so they'll detect that the clothes are dry and stop the machine. NO! I didn't fucking tell you to stop it, so don't stop it.

My range hood has a similar thing; a "heat sensor" feature which automatically detects a certain temperature and kicks it on high. FUCK YOU. You will run when I tell you to run, you bitch. If I want to boil a pot of water and leave it on the stove and not run the hood then I fucking get to, it's my god damn house. What a fucking cock sucking feature. And of course none of these things can be turned off.

All modern cars are essentially the same way. You wanted to do an abrupt engine brake to get some off-throttle oversteer? Nope, sorry, that throttle release is low-passed. You want to abruptly jump on the throttle to speed away? Nope, sorry, the ignition timing was advanced to save fuel. You wanted to get the weight unbalanced side to side to initiate a flick? Nope, sorry, the magnetic suspension detected the sway and adjusted to stop it. We (the engineers, the ECU, the manufacturer) know better than you. You don't get the freedom to do what you want with your own tools.

Of course they're right most of the time. People are morons. But we should be allowed to be morons. I hate this shit.

It's an unfortunate truth that any time you get software involved, things become shit.

Part of the problem, I believe, is that software allows products to be designed by the marketers.

When products were designed by engineers and scientists, and had to be pressed out of metal, and some big custom machine had to be made to do the pressing, there was a long turn around time, and they just tried to design it to work as well as it could. Some marketer could come in and say "can we make it automatically turn off after 10 minutes?" and the engineer would say "well, that would require this extra part, and it would take 6 months to change the machines, ..." so it wouldn't get done. You couldn't risk chasing the latest trend because by the time your products got through the cycle the trend would be different.

Software is just too easy to change. And programmers are cheap and easy to replace, and have no personal ethics about the code they're writing anyway.

So some moron Producer/Marketer can call a meeting at the last minute and say "what if we add vending machines that sell Sobe drink powerups?" or "what if we sell hats?" , the engineer says "yeah, umm, I guess we could do that, I don't really think it's a good idea..." but it's not your job to have ideas, just go write the code.


10-12-11 | Post-backpacking Recalibration

1. 65 degree house is fucking blazing hot at night.

2. Of course I'll walk a mile in the rain to get lunch. No big deal.

3. Everything is fucking DELICIOUS! Jar of spaghetti sauce, om nom nom. Beef stew yum.

4. None of the usual hesitation in sitting on the semi-shit-covered office toilet after having had to sit on the very-shit-covered wilderness toilet.


10-12-11 | Some random politico-economics

I believe it's possible to slant recent events such that Greenspan is the central villain. He presided over both the .com bubble and the real estate bubble, fueling both with super low interest rates and his approval (eg. the "new economy" that can grow forever!). He presided over a Fed that did nothing to control the banks it was supposed to be monitoring. Perhaps most damning though is how he steered the Clinton presidency, which led to the creation of this entire modern financial era.

One version of the myth is that Clinton arrived at the white house all liberal and bright eyed, only to find it infested by a den of wolves who forced him into realpolitik compromises. One of those wolves was Greenspan, who met with Clinton and said something along the lines of "if you don't cut federal spending, I will raise interest rates". Which is basically a threat - if you don't do what I like (small government) then I will destroy your presidency by constricting the economy in an already recessed time. I believe the fact of this threat is public record though you may believe Greenspan's version that the reason behind it was "advice" that runaway government debt would force his hand to raise interest rates. In any case, what followed was deregulation, low interest rates, and the inevitable disaster.

Anyhoo, I don't really believe that reading of history, but it makes a good story.

One thing that strikes me is all the articles these days that describe the "Great Recession" as a "unique time in American history" or "unprecedented income inequality". Uhh, hello? Did you go to high school? Maybe if you qualified that with the addendum "since WW2". Basically it was completely standard pre-1910 American economics, with the only exception being that our new modern government caught the crash and smoothed it into a recession rather than just letting the bubble pop. If they hadn't caught the crash it would have been a generic "Panic of XXXX" (1857, 1873, 1884, 1890, 1893, 1896, 1907, 1910, 1914, 1920, 1929) many of which were unsurprisingly similar (massive over-speculation in an over-valued asset leading to a crash in the value of that asset which leads to a liquidity crisis and bank failures); we could have dropped our monocle in shock as the robber-baron speculators caused a run on the banks.

Ever since the 80's we've been talking about how we need to transform America into a new "information economy" or "service economy". With NAFTA, etc. whenever we lost jobs, the response would be we need new types of jobs, new education etc. I don't recall much effort by anyone to stop and ask - do we want to live in a service economy? I mean, at the moment we are even failing to have a healthy service economy, in the sense that there's lots of unemployment, so all the cries are just to *fix* the service economy. But really, even if it was fixed, if there were call center jobs and retail jobs for everyone, that's a pretty fucking bad world to live in. An "information economy" is inherently bad for almost everyone in it, because "intellectual property" is owned by the very few and is where all the money is.

I believe that raising capital gains taxes to match income taxes would help. Not only is it just inherently fair to tax all income the same way and would eliminate the tax rate dip of the super rich vs the merely rich (a low capital gains tax is a regressive benefit to the super rich, since the poor don't make any income from capital gains), it would greatly reduce capital movement.

I also think that "retirement accounts" and all that shit should go away. It's an unnecessary complexity. Instead just make the capital gains tax go down based on the length held. Maybe reduce by 1% per year held, so after 30 years the tax goes to zero. This further encourages "buy and hold" type investing.

I'm more and more convinced that massive rapid capital movement isn't actually good for anyone but the very top echelon of financiers. It creates panics and crashes in small emerging markets. It creates bubbles, and it sucks profit out of markets. Obviously capital markets are beneficial to get money to companies that need it. All structures that are permitted in society need to justify themselves as being beneficial to the greater good.

I believe that allowing bankruptcy to wipe out student loans would provide a valuable incentive to colleges to keep tuition low, and to keep education useful. Right now it's far too easy to go and get a $200k education in the cultural impact of deconstruction in gay/lesbian underwater basket weaving, and wind up in a position with no job and no hope to ever pay it back. Obviously it's the fault of the student for doing that, but they're children, they shouldn't be expected to make such a huge financial life decision in the summer after high school when someone is handing them a huge blank check. If the colleges had to underwrite the student loans themselves, they would have an inherent interest to provide good useful educations, and to simply refuse to admit students that shouldn't be going to college.

I believe the most basic fundamental thing that we should do to fix the US economy (aside from the obvious things like eliminating off-book derivatives and restoring bank/investment separation and so on) is to make it cheaper to employ Americans.

It's simply too expensive to employ Americans. You can try to prop up our manufacturing sector in various ways, but you have to get to the root cause. It's insane how expensive labor is here, even only semi-skilled labor, vs. buying manufactured products.

I believe the best way to reduce the cost of employees is to get rid of all the additional costs to the employer. Health care, workers' comp, pensions, 401k's, all of this nonsense that employers in America have to administer and pay for. If these were provided by the government, the total cost for them would be the same, but it would be paid by corporate profits whether you hired those workers or not, so you may as well hire them here rather than outsourcing. You would greatly reduce the incremental cost of adding a new employee, which would encourage business to hire. Furthermore, without having to worry about so much hiring and firing paperwork, businesses would be more willing to hire in times of uncertain growth.

Ideally you would get to a scenario of no hiring/firing paperwork at all. This would make it far easier for businesses to find the best employees, and easier for employees to move on to better jobs without fear of being without health care for a while or whatever.

You might note that the heavy social welfare countries don't exactly have lively agile businesses. I believe that is not the fault of the government providing social care, but rather because those countries tend to also have very heavy regulatory structures that make doing business difficult and expensive. My proposal is to make small businesses much much easier to start and run, and much cheaper.

There's this whole modern movement to "save money" (for the government) by pushing the maintenance of social welfare programs onto the businesses (health care, retirement, etc.) But that doesn't actually save money for society, it just makes someone else pay it. The proponents will say they "reduced taxes" but they also reduced your pay check because the business now has to cover those costs. And it's worse than that, because it increases the cost per employee to the business, it makes them prefer to hire fewer people, or outsource some work, or just buy pre-made pieces from overseas.


10-12-11 | Subaru VIP program

FYI, just found out about this. Buy a Subaru for 2% under invoice with no haggling. WRX for $24,300 for example. Easy to get "VIP" status by joining some charity or other but requires six months of lead time.

Info at legacygt.com and cars101.com


10-12-11 | Bad profiling

I've seen quite a few posts recently that purport to do some profiling and show you which option is faster. The worst is probably "spin locks are faster than mutexes", but this kind of cache line study is not awesome either.

(Bouliii's test is not actually a demonstration of cache line sharing at all; it's a demonstration that threading speedup depends on data independence; to demonstrate cache line effects you should have lots of threads atomically incrementing *different* variables that are within the same cache line, not *one* variable; if you did that you would see that cache line sharing causes almost as much slow down as variable sharing)
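A minimal sketch of the test described in that parenthetical, assuming a 64-byte cache line and using made-up names (`PackedCounter`, `PaddedCounter`, `TimeIndependentIncrements`); it is not Bouliii's code. Every thread increments its *own* counter, so the only thing that differs between the two runs is whether the counters share a cache line:

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Counters stored back to back: several of them land in one cache line.
struct PackedCounter { std::atomic<uint64_t> value{0}; };

// Counters padded so each one sits in its own line (64 bytes is an assumption).
struct alignas(64) PaddedCounter { std::atomic<uint64_t> value{0}; };

// Each thread atomically increments a different counter 'iters' times.
// Returns elapsed seconds; the sum of all counters goes to *total_out.
template <typename Counter>
double TimeIndependentIncrements(int num_threads, uint64_t iters, uint64_t* total_out)
{
    std::vector<Counter> counters(num_threads);
    std::vector<std::thread> threads;

    auto t0 = std::chrono::steady_clock::now();
    for (int t = 0; t < num_threads; t++)
    {
        threads.emplace_back([&counters, t, iters] {
            for (uint64_t k = 0; k < iters; k++)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : threads) th.join();
    auto t1 = std::chrono::steady_clock::now();

    uint64_t total = 0;
    for (auto& c : counters) total += c.value.load();
    *total_out = total;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Comparing `TimeIndependentIncrements<PackedCounter>` against `TimeIndependentIncrements<PaddedCounter>` with the same thread count isolates the line-sharing cost. Like any synthetic timing loop, that is all it does; it says nothing about how a primitive behaves under a real workload.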

This kind of profiling is deeply flawed and I believe is in fact worse than nothing.

Timing threading primitives in isolation or on synthetic task sets is not useful.

Obviously if all you measure is the number of clocks to take a lock, then the simplest primitives (like spinlock) will appear the fastest. Many blogs in the last few years have posted ignorant things like "critical section takes 200 clocks and spin lock is only 50 clocks so spin lock is faster". Doof! (bang head). What you care about in real usage is things like fairness, starvation, whether the threads that have CPU time are the ones that are able to get real work done, etc.

So say you're not completely naive and instead you actually cook up some synthetic work for some worker threads to do and you test your primitives that way. Well, that's better. It does tell you what the best primitive is for *that particular work load*. But that might not reflect real work at all. In particular homogeneous vs. heterogeneous threads (worker threads that are interchangeable and can do any work vs. threads that are dedicated to different things) will behave very differently. What other thread loads are on the system? Are you measuring how well your system releases CPU time to other threads when yours can't do any more? (a fair threading benchmark should always have a low priority thread that's counting the number of excess cycles that are released). (spinning will always seem faster than giving up execution if you are ignoring the ability of the CPU to do other work)

Furthermore the exact sharing structure and how the various workers share cache lines is massively important to the results.

In general, it is very easy for bad threading primitives to look very good in synthetic benchmarks, but to be disastrous in the real world, because they suffer from things like thundering herd or priority inversion or what I call "thread thrashing" (wasted or unnecessary thread context switches).

You may have noticed that when I posted lots of threading primitives a month or two ago there was not one benchmark. But what I did talk about was - can this threading primitive spin the CPU for a long time waiting on a data item? (eg. for an entire time slice potentially) ; can this threading primitive suffer from priority inversion or even "priority deadlock" (when the process can stall until the OS level set manager saves you) ; how many context switches are needed to transfer control, is it just one or are there extra? etc. these are the questions you should ask.

This has been thoroughly discussed over the years w.r.t. memory allocators, so we should all know better by now.

Also, as has been discussed thoroughly w.r.t. allocators, it is almost impossible to cook up a thorough benchmark which will tell you the one true answer. The correct way to decide these things is :

1. Understand the issues and what the trade-offs are.

2. Make sure you pick an implementation that does not have some kind of disastrous corner case. For example make sure your allocator has decent O() behavior with no blow-up scenario (or understand the blow-up scenario and know that it can't happen for your application). Particularly with threading I believe that for 99.99% of us (eg. everyone except LRB) it's more important to have primitives that work correctly than to save 10 clocks of micro efficiency.

3. Try to abstract them cleanly so that they are easily swappable and test them in your real final application with your real use case. The only legitimate test is to test in the final use case.


10-11-11 | Comcast

Comcast keeps calling me with some horrible auto-dialer trying to tell me something. Unfortunately, if I just ignore it or screen it, it leaves me an automated message on my voicemail.

I try to ignore it for a while, but constantly having voicemails to go through is damn annoying so I finally call them up.

Enter your damn phone number. Bleep bloop. Urg I hate you. Mash buttons through the menu. Urg.

Me : "Why the fuck are you calling me over and over?"

Comcast : "Sir that's because you have a late balance due ... "

Me (interrupting) , "Umm, my account is on credit card pay, why the fuck do I have a late balance?"

Comcast : "Sir that's because credit card payments aren't processed for 2-3 months after setting up the account"

Me : "Umm, okay so can I just pay the balance now so they stop pestering me?"

Comcast : "No it will be charged automatically in the next billing cycle"

Me : "So can you stop calling me?"

Comcast : "Unfortunately I can't do that, but I can give you a number to call ..."

Me : (sighs and rolls eyes)

Comcast : "..it's 877-824-2288"

Me : "Umm.. that's the generic comcast customer support number"

Blah. Boring, I know. So frustrating. I have absolutely no recourse, I can't pick a different internet provider, and I can't fight with these people because they just send you around in circles and you never get to talk to anyone with power. You always get some damn call center person who says "I'm sorry I can't do that".


10-06-11 | Fiberglass

Don't put this shit in your house. It's toxic, it's poison, it's the modern day asbestos. There are plenty of alternatives.

Even in the attic or walls, sure it's sealed up most of the time, but any time you have to go in there to work you stir it up, then you get glass shards in the air which get in your eyes and lungs, so you have to wear safety suits and respirators and so on just to do basic maintenance work.

If anything ever goes wrong with it, it's a nightmare to dispose of.

Worst of all is using it to wrap ducts. The problem is that all ducts will leak eventually. Maybe not right away, but in 10 years cracks will form. Then the air return ducts will start sucking in at the seams, and eventually that will be sucking glass fibers in, then blowing them out all over the house.

Just say no to toxic shit in your home.


10-03-11 | Amortized Hashing

The "hash1" simple Zip-style matcher was very fast. So why don't we love it?

The problem is amortized hashing. "Amortized" = we just stop walking after A steps. This limits the worst case. Technically it makes hash1 O(N) (or O(A*N)).

Without it, hash1 has disastrous worst cases, which makes it just not viable. The problem is that the "amortize" can hurt compression quite a bit and unpredictably so.

One perhaps surprising thing is that the value of A (the walk limit) doesn't actually matter that much. It can be 32 or 256 and either way you will save your speed from the cliff. Surprisingly even an A of 1024 on a 128k window helps immensely.
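To make the walk limit concrete, here is a minimal sketch of a Zip-style hash chain matcher with an "amortize" cap. It is not the actual hash1 code: the names (`HashChainMatcher`, `FindMatch`, `max_steps`), the 16-bit hash and the multiplicative hash constant are all assumptions, and window handling is left out.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

struct HashChainMatcher
{
    static const int kHashBits = 16;   // table size is an arbitrary choice
    const uint8_t* buf;
    int size;
    std::vector<int> head;   // hash -> most recent position with that hash (-1 = none)
    std::vector<int> next;   // position -> previous position with the same hash

    HashChainMatcher(const uint8_t* b, int n)
        : buf(b), size(n), head(1 << kHashBits, -1), next(n, -1) {}

    // Hash of the 4 bytes at pos; caller guarantees pos + 4 <= size.
    uint32_t Hash(int pos) const
    {
        uint32_t x;
        std::memcpy(&x, buf + pos, 4);
        return (x * 2654435761u) >> (32 - kHashBits);
    }

    // Add pos to the front of its chain.
    void Insert(int pos)
    {
        if (pos + 4 > size) return;
        uint32_t h = Hash(pos);
        next[pos] = head[h];
        head[h] = pos;
    }

    // Longest match for pos among inserted positions, looking at no more than
    // max_steps chain entries (the "amortize" limit A). Call before Insert(pos).
    int FindMatch(int pos, int max_steps, int* match_pos) const
    {
        int best_len = 0;
        *match_pos = -1;
        if (pos + 4 > size) return 0;
        int cand = head[Hash(pos)];
        for (int steps = 0; cand >= 0 && steps < max_steps; steps++, cand = next[cand])
        {
            int len = 0;
            while (pos + len < size && buf[cand + len] == buf[pos + len]) len++;
            if (len > best_len) { best_len = len; *match_pos = cand; }
        }
        return best_len;
    }
};
```

The chain is ordered most-recent-first, so a small `max_steps` only ever drops the older (longer offset) candidates. That is why the limit is harmless on most files and costly on the few that need the long offsets.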


not "amortized" :

 file,   encode,   decode, out size
lzt00,  225.556,   35.223,     4980
lzt01,  143.144,    1.925,   198288
lzt02,  202.040,    3.067,   227799
lzt03,  272.027,    2.164,  1764774
lzt04,  177.060,    5.454,    11218
lzt05,  756.387,    3.940,   383035
lzt06,  227.348,    6.078,   423591
lzt07,  624.699,    4.179,   219707
lzt08,  311.441,    7.329,   261242
lzt09,  302.741,    3.746,   299418
lzt10,  777.609,    1.647,    10981
lzt11, 1073.232,    4.999,    19962
lzt12, 3250.114,    3.134,    25997
lzt13,  101.577,    5.644,   978493
lzt14,  278.619,    6.026,    47540
lzt15, 1379.194,    9.396,    10911
lzt16,  148.908,   12.558,    10053
lzt17,  135.530,    5.693,    18517
lzt18,  171.413,    6.003,    68028
lzt19,  540.656,    3.433,   300354
lzt20,  109.678,    5.488,   900001
lzt21,  155.648,    3.605,   147000
lzt22,  118.776,    6.671,   290907
lzt23,  103.056,    6.350,   822619
lzt24,  218.596,    4.439,  2110882
lzt25,  266.006,    2.498,   123068
lzt26,  118.093,    7.062,   209321
lzt27,  913.469,    4.340,   250911
lzt28,  627.070,    2.576,   322822
lzt29, 1237.463,    4.090,  1757619
lzt30,   75.217,    0.646,   100001

"amortized" to 128 steps :

 file,   encode,   decode, out size
lzt00,  216.417,   30.567,     4978
lzt01,   99.315,    1.620,   198288
lzt02,   85.209,    3.556,   227816
lzt03,   79.299,    2.189,  1767316
lzt04,   90.983,    7.073,    12071
lzt05,   86.225,    4.715,   382841
lzt06,   91.544,    6.930,   423629
lzt07,  127.232,    4.502,   220087
lzt08,  161.590,    7.725,   261366
lzt09,  119.749,    4.696,   301442
lzt10,   55.662,    1.980,    11165
lzt11,  108.619,    6.072,    19978
lzt12,  112.264,    3.119,    26977
lzt13,  103.460,    6.215,   978493
lzt14,   87.520,    5.529,    47558
lzt15,   98.902,    7.568,    10934
lzt16,   90.138,   12.503,    10061
lzt17,  115.166,    6.016,    18550
lzt18,  176.121,    5.402,    68035
lzt19,  272.349,    3.310,   304212
lzt20,  107.739,    5.589,   900016
lzt21,   68.255,    3.568,   147058
lzt22,  108.045,    5.867,   290954
lzt23,  108.023,    6.701,   822619
lzt24,   78.380,    4.631,  2112700
lzt25,   93.013,    2.554,   123219
lzt26,  108.348,    6.143,   209321
lzt27,  103.226,    3.468,   249081
lzt28,  145.280,    2.658,   324569
lzt29,  199.174,    4.063,  1751916
lzt30,   75.093,    1.019,   100001

The times are in clocks per byte. In particular let's look at some files that are really slow without amortize :

no amortize :

 file,   encode,   decode, out size
lzt11, 1073.232,    4.999,    19962
lzt12, 3250.114,    3.134,    25997
lzt15, 1379.194,    9.396,    10911
lzt27,  913.469,    4.340,   250911
lzt29, 1237.463,    4.090,  1757619

amortize 128 :

 file,   encode,   decode, out size
lzt11,  108.619,    6.072,    19978
lzt12,  112.264,    3.119,    26977
lzt15,   98.902,    7.568,    10934
lzt27,  103.226,    3.468,   249081
lzt29,  199.174,    4.063,  1751916

Massive speed differences. The funny thing is the only file where the compression ratio changes drastically is lzt12. It's changed by around 4%. (lzt29 is a bigger absolute difference but only about 0.3%)

So amortized hashing saved us massive time, and only cost us 4% on one file in the test case. Let me summarize the cases. There are three main classes of file :

1. 80% : Not really affected at all by amortize or not. These files don't have lots of degeneracy so there aren't a ton of links in the same hash bucket.

2. 15% : Very slow without amortize, but compression not really affected. These files have some kind of degeneracy (such as long runs of one character) but the most recent occurrences of that hash are the good ones, so the amortize doesn't hurt compression.

3. 5% : Has lots of hash collisions, and we do need the long offsets to make good matches. This case is rare.

So obviously it should occur to us that some kind of "introspective" algorithm could be used. Somehow monitor what's happening and adjust your amortize limit (it doesn't need to be a constant) or switch to another algorithm for part of the hash table.

The problem is that we can't tell between classes 2 and 3 without running the slow compressor. That is, it's easy to tell if you have a lot of long hash chains, but you can't tell if they are needed for good compression or not without running the slow mode of the compressor.


10-02-11 | How to walk binary interval tree

I noted previously that SA3 uses a binary interval tree. It's obvious how that works but it always takes me a second to figure it out so let's write it down.

This is going to be all very similar to previous notes on cumulative probability trees (and followup ).

Say you have an array of 256 entries. You build a min tree. Each entry in the tree stores the minimum value over an interval. The top entry covers the range [0,255] , then you have [0,127][128,255] , etc.

Say you're at index i and you want to find the next entry which has a value lower than yours. That is,


find j such that A[j] < A[i] and j > i

you just have to find an interval in the min tree whose min is < A[i]. To make the walk fast you want to step by the largest intervals possible. (once you find the interval that must contain your value you do a normal binary search within that interval)

You can't use the interval that overlaps you, because you only want to look at j > i.

It's easy to do this walk elegantly using the fact that the binary representation of integers is a sort of binary interval tree.

Say we start at i = 37 (00100101). We need to walk from 37 to 256. Obviously we want to use the [128,256) range to make a big step. And the [64,128). We can't use the [32,64) because we're inside that range - this corresponds to the top on bit of 37. We can use [48,64) and [40,48) because those bits are off. We can't use [36,40) but we can use [38,40) (and the bottom on bit corresponds to [37,38) which we are in).

Doing it backwards, you start from whatever index (such as i=37). Find the lowest on bit. That is the interval you can step by (in this case 1). So step by 1 to i=38. Stepping by the lowest bit always acts to clear that bit and push the lowest on bit higher up (sometimes more than 1 level). Now find the next lowest on bit. 38 = 00100110 , so step by 2 to 40 = 00101000 , now step by 8 to 48 = 00110000 , now step by 16 to 64 = 01000000. etc.

In pseudo code :


Start at index i

while ( i < end )
{
  int step = i & (-i); // isolate bottom bit
  // I'm looking at the range [i,i+step)
  int level = BitPos(step);
  check tree[level][i>>level];
  i += step;
}
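
A runnable sketch of that walk, under some assumptions: the array size is a power of two, the names (`MinTree`, `NextLower`, `BitPos`) are made up here, and the walk starts at i+1 since only j > i is wanted. Once an interval with a lower min is found, it descends to the first qualifying index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bit index of a power-of-two step (1 -> 0, 2 -> 1, 4 -> 2, ...).
static int BitPos(int step)
{
    int L = 0;
    while ((step >> L) > 1) L++;
    return L;
}

struct MinTree
{
    // level[L][k] = min of the array over [k << L, (k + 1) << L)
    std::vector<std::vector<int>> level;

    explicit MinTree(const std::vector<int>& a)
    {
        assert(!a.empty() && (a.size() & (a.size() - 1)) == 0); // power of two assumed
        level.push_back(a);
        while (level.back().size() > 1)
        {
            const std::vector<int>& prev = level.back();
            std::vector<int> up(prev.size() / 2);
            for (std::size_t k = 0; k < up.size(); k++)
                up[k] = std::min(prev[2 * k], prev[2 * k + 1]);
            level.push_back(std::move(up));
        }
    }

    // Smallest j > i with A[j] < A[i], or -1 if there is none.
    int NextLower(int i) const
    {
        const int n = (int)level[0].size();
        const int v = level[0][i];
        int j = i + 1;
        while (j < n)
        {
            int step = j & (-j);          // isolate bottom bit
            int L = BitPos(step);
            if (level[L][j >> L] < v)
            {
                // [j, j+step) contains a lower value; binary search down to it
                int k = j >> L;
                while (L > 0)
                {
                    L--;
                    k *= 2;                         // left child
                    if (level[L][k] >= v) k++;      // nothing lower on the left, go right
                }
                return k;
            }
            j += step;
        }
        return -1;
    }
};
```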
 
Now this should all be pretty obvious, but here comes the juju.

I've written tree[][] as if it is laid out in the intuitive way, that is :


tree[0] has 256 entries for the 1-step ranges
tree[1] has 128 entries for 2-step ranges
...

The total is 512 entries which is O(N). But notice that tree[0] is only actually ever used for indexes that have the bottom bit on. So the half of them that have the bottom bit off are not needed. Then tree[1] is only used for entries that have the second bit on (but bottom bit off). So the tree[1] entries can actually slot right into the blanks of tree[0], and half the blanks are left over. And so on...

It's a Fenwick tree!

So our revised iteration is :


// searching :
Start at index i
(i starts at 1)

while ( i < end )
{
  int step = i & (-i); // isolate bottom bit
  // I'm looking at the range [i,i+step)
  check fenwick_tree[i];
  i += step;
}

// building :

for all i
{
  int step = i & (-i); // isolate bottom bit
  fenwick_tree[i] = min on range [i,i+step)
}

(in practice you need to build with a binary recursion; eg. level L is built from two entries of level L-1).
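
One way the compacted forward tree could be built and searched, as a sketch rather than the SA3 code. It assumes a power-of-two array size and uses made-up names (`ForwardFenwickMin`, `NextLower`). The build runs from high index to low, so each entry is assembled from entries that already exist: [i,i+step) is A[i] plus the intervals stored at i+1, i+2, i+4, ... i+step/2.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ForwardFenwickMin
{
    std::vector<int> a;    // the original values
    std::vector<int> fw;   // fw[i] = min of a over [i, i + (i & -i)), for i >= 1

    explicit ForwardFenwickMin(const std::vector<int>& values)
        : a(values), fw(values.size())
    {
        const int n = (int)a.size();
        assert(n > 0 && (n & (n - 1)) == 0);   // power-of-two size assumed
        for (int i = n - 1; i >= 1; --i)
        {
            const int step = i & (-i);          // isolate bottom bit
            int m = a[i];
            // fw[i + s] covers [i + s, i + 2s) for each power of two s < step
            for (int s = 1; s < step; s <<= 1)
                m = std::min(m, fw[i + s]);
            fw[i] = m;
        }
    }

    // Smallest j > i with a[j] < a[i], or -1 if there is none.
    int NextLower(int i) const
    {
        const int n = (int)a.size();
        const int v = a[i];
        for (int j = i + 1; j < n; )
        {
            const int step = j & (-j);
            if (fw[j] < v)
            {
                // something in [j, j+step) is lower; narrow down to the first one
                while (a[j] >= v)
                {
                    int s = 1;
                    while (fw[j + s] >= v) s <<= 1;   // stops before s reaches the step of j
                    j += s;
                }
                return j;
            }
            j += step;
        }
        return -1;
    }
};
```

fw[0] is never used, which matches "i starts at 1". The narrowing loop here is a simple choice, not necessarily the fastest way to search inside the interval.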

Note that to walk backwards you need the opposite entries. That is, at level 7 (steps of 128) you only need [128,256) to walk forward, never [0,128) because a value in that range can't take that step. To walk backwards, you only need [0,128) , never [128,256). So in fact to walk forward or backward you need all the entries. When we made the "Fenwick compaction" for the forward walk, we threw away half the values - those are exactly the values that need to be in the backward tree.

For the three bit case , range [0,8) , the trees are :


Forward :

+-------------------------------+
|              0-8              |
+-------------------------------+
|       ^       |      4-8      |
+---------------+---------------+
|   ^   |  2-4  |   ^   |  6-8  |
+---------------+---------------+
| ^ |1-2| ^ |3-4| ^ |5-6| ^ |7-8| 
+-------------------------------+

where ^ means go up a level to find your step
the bottom row is indexed [0,7] and 8 is off the end on the right
so eg if you start at 5 you do 5->6 then 6->8 and done

Backward :

+-------------------------------+
|              8-0              |
+-------------------------------+
|      4-0      |       ^       |
+---------------+---------------+
|  2-0  |   ^   |  6-4  |   ^   |
+---------------+---------------+
|1-0| ^ |3-2| ^ |5-4| ^ |7-6| ^ | 
+-------------------------------+

the bottom row is indexed [1,8] and 0 is off the end of the left
so eg if you start at 5 you go 5->4 then 4->0 and done

You collapse the indexing Fenwick style on both by pushing the values down the ^ arrows. It should be clear you can take the two tables and lay them on top of each other to get a full set.


10-01-11 | Used Prices

I really like the idea of buying used. Save some money, maybe prevent some goods from going to the land fill, fuck the retailer, etc. But used prices are just really fucking out of whack.

Used goods need to have a *huge* discount. If you look at the utility, when I see something for sale on craigslist,

I can't just click "buy it" and it shows up at my door. So I need a big discount for that.

It's usually badly described and photographed, so I have to do a bunch of research. Discount.

I can't pay with CC at all so I have to go fetch some cash. Discount.

The seller has a 50% chance of being a flake in some way, like telling me it's been sold after I show up for my appointment, or putting up the wrong item. Discount.

If it's broken or wrong or whatever, I can't return it. Big discount for this.

There's always an information gap problem when buying used - the seller knows more about the item than you. This fucks you with new items too, of course, but it's worse with used. eg. there's a chance that it's in worse condition than you know. Discount for this.

Even if some of the bad eventualities don't happen on a particular purchase, the price needs to be discounted by the probability that they happen times the cost to you when they do. eg. if your $100 purchase has a 10% chance of being fucked and worthless, you need a $10 discount on all purchases PLUS a discount for the trouble in that event, which is probably another $100 times 10%, so $20 total discount.

The result when you add up all the factors is that something that sells for $1000 new needs to be under $500 or so for it to be +EV to buy used. And it just isn't. In fact it's usually over $800. Used prices are uniformly too high.

Part of the problem is that the price of goods has gotten so out of whack with the price of labor.


10-01-11 | Seattle Stop Shitting on my Face

I've been thinking about upgrading my neoprene gear so that I can swim in the lake through the fall/spring, not just the summer.

I fucking hate swimming laps in a pool during official lap swim hours, with all your "rules" and your system keeping me down. And it just doesn't make sense to go into some nasty indoor crowded box when I'm literally surrounded by miles of beautiful open water.

But there's a big problem with this idea. Seattle is shitting on my face.

The fall/spring, when a wet suit would help, is when it rains. When it rains, the sewers overflow and drain into the lake. Then you get itchy bumps and vomiting and so on.

Seattle is basically doing nothing about it. There are some little programs to do "rain gardens", but those are sort of like using a tampon to stop elephant piss. What we need is serious fucking civil engineering. ("rain gardens" are also better known as "mosquito breeders"; we Houstonians are always amazed and delighted about the lack of mosquitos here; it's because Seattle is hilly and surrounded by lakes so the water doesn't pool, but the city is doing their best to ruin that). (a much bigger impact than piddly residential rain gardens would be to outlaw concrete parking lots; grass/gravel/lattice parking lots work perfectly fine for holding cars).

Of course this is a problem that is occurring all over the US. I hear NY has a major sewer problem as well. The population and development of most US cities has outstripped their infrastructure, and in our shitty faux-libertarian plutocracy of course there's no money for basic civil engineering. Only the heavy hand of EPA orders is forcing these dumb ass local governments to do anything.

The real solution is something like :

1. Separate the sewage and rain runoff. Run sewage to treatment plants in a closed system so it can never get out. (probably not realistic; alternatively, add a new clean storm water only system)

2. Since you will allow non-sewer storm water to drain to the lake, make the pollutants that run off to the lakes illegal. Fertilizers, pesticides, etc. everything that's water soluble and washes into the lakes is illegal right fucking now.


10-01-11 | More Reliable Timing on Windows

When profiling little code chunks on Windows, one of the constant annoyances is the unreliability of times due to multithreading.

Historically the way you address this is run lots of trials (like 100) and take the MIN time of any trial.

(* important note : if you aren't trying to time "hot cache" performance, you need to wipe the cache between each run. I dunno if there's an easy instruction or system call that would invalidate all cache pages; what I usually do is have a routine that goes and munges over some big array).
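
A portable sketch of the min-of-N-trials idea with an optional cache wipe between runs. It swaps `std::chrono::steady_clock` in for the cycle counter, and the names (`MungeCache`, `MinSecondsOverTrials`) and the idea of passing in a caller-sized scratch buffer are assumptions for illustration. The scratch buffer has to be bigger than your last-level cache to actually evict anything.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Read-modify-write every byte of a big buffer so earlier data gets pushed out of cache.
// The sum is returned so the compiler can't throw the loop away.
inline uint64_t MungeCache(std::vector<uint8_t>& scratch)
{
    uint64_t sum = 0;
    for (std::size_t i = 0; i < scratch.size(); i++)
    {
        scratch[i] = (uint8_t)(scratch[i] + 1);
        sum += scratch[i];
    }
    return sum;
}

// Run func 'trials' times and return the MIN time of any trial, in seconds.
// If scratch is non-null it is munged before each trial ("cold cache" timing).
template <typename Func>
double MinSecondsOverTrials(Func&& func, int trials, std::vector<uint8_t>* scratch = nullptr)
{
    double best = std::numeric_limits<double>::infinity();
    volatile uint64_t sink = 0;
    for (int t = 0; t < trials; t++)
    {
        if (scratch) sink = sink + MungeCache(*scratch);

        auto t0 = std::chrono::steady_clock::now();
        func();
        auto t1 = std::chrono::steady_clock::now();

        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    (void)sink;
    return best;
}
```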

It's a bit better these days because of many cores. Now you can quite often find a core which is unmolested by annoying services popping up and stealing CPU time and messing up your profile. But sometimes you get unlucky, and your process runs on an IdealProc that has some other shite.

So a simple loop helps :


template <typename t_func>
uint64 GetFuncTime( t_func * pfunc )
{
    HANDLE proc = GetCurrentProcess();
    HANDLE thread = GetCurrentThread();
    
    DWORD_PTR affProc,affSys;
    GetProcessAffinityMask(proc,&affProc,&affSys);
    
    uint64 tick_range = 1ULL << 62;
    
    for(int rep=0;rep<24;rep++)
    {
        DWORD_PTR mask = ((DWORD_PTR)1)<<rep; // affinity masks are DWORD_PTR
        if ( mask & affProc )
            SetThreadAffinityMask(thread,mask);
        else
            continue;   

        uint64 t1 = __rdtsc();
        (*pfunc)();
        uint64 t2 = __rdtsc();

        uint64 cur_tick_range = t2 - t1;
        tick_range = MIN(tick_range,cur_tick_range);

    }

    // restore to the process mask; a mask naming cores outside it is rejected
    SetThreadAffinityMask(thread,affProc);

    return tick_range;
}

which makes it reasonably probable that you get a clean run on some core. For published results you will still want to repeat the whole thing N times.


10-01-11 | String Match Results Part 6

You knew that couldn't be the end.

SuffixArray3 : suffix array string matcher which uses a min/max tree to find allowed offsets.

The min/max tree is a binary hierarchy ; at level L there are (size>>L) entries, and each entry covers a range of size (1 << L). Construction is O(N) because N/2+N/4+N/8 ... = N

The min/max tree method is generally slightly slower than the elegant "chain of fences" approach used for SuffixArray2, but it's close. The big advantage is the min/max tree can also be used for windowed matching, which is not easy to integrate in SA2.

First check that it satisfies the O(N) goal on the stress tests :

0 = stress_all_as
1 = stress_many_matches
2 = stress_search_limit
3 = stress_sliding_follow
4 = stress_suffix_forward
5 = twobooks

Yep. Then check optimal parse, large window vs. the good options :

0 = ares_merged.gr2_sec_5.dat
1 = bigship1.x
2 = BOOK1
3 = combined.gr2_sec_3.dat
4 = GrannyRocks_wot.gr2
5 = Gryphon_stripped.gr2
6 = hetero50k
7 = lzt20
8 = lzt21
9 = lzt22
10 = lzt23
11 = lzt24
12 = lzt25
13 = lzt27
14 = lzt28
15 = lzt29
16 = movie_headers.bin
17 = orange_purple.BMP
18 = predsave.bin

The good Suffix Trie is clearly the best, but we're in the ballpark.

Now optimal parse, 17 bit (128k) window :


totals:
Test_MMC2 : DNF 
Test_LzFind2 : 506.954418 , 1355.992865 
Test_SuffixArray3 : 506.954418 , 514.931740 
Test_MMC1 : 13.095260 , 1298.507490 
Test_LzFind1 : 12.674177 , 226.796123 
Test_Hash1 : 503.319301 , 1094.570022 

Finally greedy parse, 17 bit window :


totals:
Test_MMC2 : 0.663028 , 110.373098 
Test_LzFind2 : DNF 
Test_SuffixArray3 : 0.663036 , 236.896551 
Test_MMC1 : 0.663028 , 222.626069 
Test_LzFind1 : 0.662929 , 216.912409 
Test_Hash1 : 0.662718 , 62.385071 

average match length :

And once more for clarity :

Greedy parse, 16 bit window , just the good candidates :

totals:
Test_SuffixArray3 : 0.630772 , 239.280605 
Test_LzFind1 : 0.630688 , 181.430093 
Test_MMC2 : 0.630765 , 88.413339 
Test_Hash1 : 0.630246 , 51.980073 

It should be noted that LzFind1 is approximate, and Hash1 is even more approximate. Though looking at the match length chart you certainly can't see it.


09-30-11 | String Match Results Part 5 + Conclusion

Finally for completeness, some of the matchers from Tornado in FreeArc. These are basically all standard "cache table" style matchers, originally due to LZRW, made popular by LZP and LZO. The various Tornado settings select different amounts of hash rows and ways.

As they should, they have very constant time operation that goes up pretty steadily from Tornado -3 to -7, because there's a constant number of hash probes per match attempt.
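
For reference, a generic sketch of a "cache table" matcher with rows and ways. This is not Tornado's code; the names (`CacheTableMatcher`, `FindAndInsert`), the hash and the replacement policy (shift the row down, newest in way 0) are assumptions. The point is just that every lookup costs at most `ways` probes no matter what the data looks like.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct CacheTableMatcher
{
    const uint8_t* buf;
    int size;
    int row_bits;             // log2 of the number of rows (assumed 1..24)
    int ways;                 // candidates kept per row
    std::vector<int> table;   // rows * ways positions, -1 = empty, way 0 = newest

    CacheTableMatcher(const uint8_t* b, int n, int row_bits_, int ways_)
        : buf(b), size(n), row_bits(row_bits_), ways(ways_),
          table((std::size_t(1) << row_bits_) * ways_, -1) {}

    // Row for the 4 bytes at pos; caller guarantees pos + 4 <= size.
    int* Row(int pos)
    {
        uint32_t x;
        std::memcpy(&x, buf + pos, 4);
        uint32_t h = (x * 2654435761u) >> (32 - row_bits);
        return &table[std::size_t(h) * ways];
    }

    // Probe every way of the row for the longest match, then insert pos,
    // dropping the oldest entry of the row. Returns the match length.
    int FindAndInsert(int pos, int* match_pos)
    {
        *match_pos = -1;
        if (pos + 4 > size) return 0;
        int* row = Row(pos);
        int best = 0;
        for (int w = 0; w < ways; w++)
        {
            int cand = row[w];
            if (cand < 0) break;   // rest of the row is empty
            int len = 0;
            while (pos + len < size && buf[cand + len] == buf[pos + len]) len++;
            if (len > best) { best = len; *match_pos = cand; }
        }
        for (int w = ways - 1; w > 0; w--) row[w] = row[w - 1];
        row[0] = pos;
        return best;
    }
};
```

More ways means more probes per position and more candidates remembered, which is the speed vs. match quality trade that the different settings walk along.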


totals : match len : clocks
Test_MMC1 : 0.663028 , 231.254818 
Test_Hash1 : 0.662718 , 64.888003 
Test_Tornado_3 : 0.630377 , 19.658834 
Test_Tornado_4 : 0.593174 , 28.456055 
Test_Tornado_5 : 0.586540 , 40.546146 
Test_Tornado_6 : 0.580042 , 56.841156 
Test_Tornado_7 : 0.596584 , 141.432393 

There may be something wrong with my Tornado wrapper as the -3 matcher actually finds the longest total length. I dunno. The speeds look reasonable. I don't really care much about these approximate matchers because the loss is hard to quantify, so there you go (normally when I see an anomaly like that I would investigate it to make sure I understand why it's happening).

0 = ares_merged.gr2_sec_5.dat
1 = bigship1.x
2 = BOOK1
3 = combined.gr2_sec_3.dat
4 = GrannyRocks_wot.gr2
5 = Gryphon_stripped.gr2
6 = hetero50k
7 = lzt20
8 = lzt21
9 = lzt22
10 = lzt23
11 = lzt24
12 = lzt25
13 = lzt27
14 = lzt28
15 = lzt29
16 = movie_headers.bin
17 = orange_purple.BMP
18 = predsave.bin


Conclusion : I've got to get off string matching so this is probably the end of posts on this topic.

MMC looks promising but has some flaws. There are some cases that trigger a slowness spike in it. Also it has some bad O(N^2) behavior with unbounded match length ("MMC2") so I have to run it with a limit ("MMC1") which removes some of its advantage over LzFind and Hash1 and other approximate matchers. (without the limit it has the advantage of being exact). It's also GPL at the moment which is a killer.

LzFind doesn't have anything going for it really.

For approximate/small-window matching I don't see any reason to not use the classic Zip hash chain method. I tried a few variants of this, like doing a hash chain to match the first 4 bytes and then link listing off that, and all the variants were worse than the classic way.

For large window / exact matching / optimal parsing, a correct O(N) matcher is the way to go. The suffix-array based matcher is by far the easiest for your kids to implement at home.


09-30-11 | String Match Results Part 4

Okay, finally on to greedy parsing. Note with greedy parsing the average match length per byte is always <= 1.0 (it's actually the % of bytes matched in this case).
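
To pin down what that metric means, here is a small sketch of a greedy parse that counts matched bytes. It uses a brute-force matcher (nothing like the matchers being tested), and the names (`GreedyMatchedBytes`, `GreedyMatchFraction`) and the `min_match` threshold are assumptions. Each byte is covered by at most one match, so the total match length divided by the file size can't exceed 1.0.

```cpp
#include <algorithm>
#include <string>

// Greedy parse: at each position take the longest match in the window if it is
// at least min_match long and skip past it, otherwise emit one literal.
// Returns the number of bytes covered by matches.
inline int GreedyMatchedBytes(const std::string& data, int window, int min_match)
{
    const int n = (int)data.size();
    int matched = 0;
    int pos = 0;
    while (pos < n)
    {
        int best = 0;
        for (int cand = std::max(0, pos - window); cand < pos; cand++)
        {
            int len = 0;   // overlapping matches are allowed, as in normal LZ77
            while (pos + len < n && data[cand + len] == data[pos + len]) len++;
            best = std::max(best, len);
        }
        if (best >= min_match) { matched += best; pos += best; }
        else pos++;   // literal
    }
    return matched;
}

// "Average match length per byte" for a greedy parse = fraction of bytes matched.
inline double GreedyMatchFraction(const std::string& data, int window, int min_match)
{
    if (data.empty()) return 0.0;
    return double(GreedyMatchedBytes(data, window, min_match)) / double(data.size());
}
```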

Two charts for each , the first is clocks per byte, the second is average match length. Note that Suffix5 is just for reference and is neither windowed nor greedy.

got arg : window_bits = 16

got arg : window_bits = 17

got arg : window_bits = 18

Commentary :

Okay, finally MMC beats Suffix Trie and LzFind, this is what it's good at. Both MMC and LzFind get slower as the window gets larger. Surprisingly, the good old Zip-style Hash1 is significantly faster and finds almost all the matches on these files. (note that LzFind1 and Hash1 both have search limits but MMC does not)


test set :

0 = ares_merged.gr2_sec_5.dat
1 = bigship1.x
2 = BOOK1
3 = combined.gr2_sec_3.dat
4 = GrannyRocks_wot.gr2
5 = Gryphon_stripped.gr2
6 = hetero50k
7 = lzt20
8 = lzt21
9 = lzt22
10 = lzt23
11 = lzt24
12 = lzt25
13 = lzt27
14 = lzt28
15 = lzt29
16 = movie_headers.bin
17 = orange_purple.BMP
18 = predsave.bin


The same matchers ; greedy, 16 bit window, on the stress tests :

LzFind does not do well at all on the stress tests. (note that LzFind1 and MMC1 are length-limited; LzFind1 and Hash1 are "amortized" (step limited)).

0 = stress_all_as 
1 = stress_many_matches 
2 = stress_search_limit 
3 = stress_sliding_follow 
4 = stress_suffix_forward 
5 = twobooks


09-30-11 | String Match Results Part 3

Still doing "optimal" (non-greedy) parsing but now let's move on to windowed & non-exact matching.

Windowed, possibly approximate matching.

Note : I will include the Suffix matchers for reference, but they are not windowed.

16 bit window :

Clocks per byte :

Average Match len :

This is what LzFind is designed for and it's okay at it. It does crap out pretty badly on the rather degenerate "particles.max" file, and it also fails to find a lot of matches. (LZFind1 has a maximum match length of 256 and a maximum of 32 search steps, which are the defaults in the LZMA code; LzFind2 which we saw before has those limits removed (and would DNF on many of these files)).

lztest is :

0 = almost_incompressable
1 = bigship1.x
2 = Dolphin1.x
3 = GrannyRocks_wot.gr2
4 = Gryphon_stripped.gr2
5 = hetero50k
6 = movie_headers.bin
7 = orange_purple.BMP
8 = particles.max
9 = pixo_run_animonly_stripped.gr2
10 = poker.bin
11 = predsave.bin
12 = quick.exe
13 = RemoteControl_stripped.gr2
14 = ScriptVolumeMgr.cpp


09-30-11 | String Match Results Part 2b

Still on optimal parsing, exact matching, large window :

Chart of clocks per byte, on each file of a test set :

On my "lztest" data set :


0 = almost_incompressable
1 = bigship1.x
2 = Dolphin1.x
3 = GrannyRocks_wot.gr2
4 = Gryphon_stripped.gr2
5 = hetero50k
6 = movie_headers.bin
7 = orange_purple.BMP
8 = particles.max
9 = pixo_run_animonly_stripped.gr2
10 = poker.bin
11 = predsave.bin
12 = quick.exe
13 = RemoteControl_stripped.gr2
14 = ScriptVolumeMgr.cpp

"lztest" is not a stress test set, it's stuff I've gathered that I think is roughly reflective of what games actually compress. It's interesting that this data set causes lots of DNF's (did not finish) for MMC and LzFind.

Suffix5 (the real suffix trie) is generally slightly faster than the suffix array. It should be, of course, if I didn't do a bonehead trie implementation, since the suffix array method basically builds a trie in the sort, then reads it out to sorted indexes, and then I convert the sorted indexes back to match lengths.

Good old CCC (Calgary Compression Corpus) :


0 = BIB
1 = BOOK1
2 = BOOK2
3 = GEO
4 = NEWS
5 = OBJ1
6 = OBJ2
7 = PAPER1
8 = PAPER2
9 = PAPER3
10 = PAPER4
11 = PAPER5
12 = PAPER6
13 = PIC
14 = PROGC
15 = PROGL
16 = PROGP
17 = TRANS

I won't be showing results on CCC for the most part because it's not very reflective of real world modern data, but I wanted to run on a set where MMC and LzFind don't DNF too much to compare their speed when they do succeed. Suffix Trie is almost always very close to the fastest except on paper4 & paper5 which are very small files.


0 = BIB
1 = BOOK1
2 = BOOK2
3 = GEO
4 = NEWS
5 = OBJ1
6 = OBJ2
7 = PAPER1
8 = PAPER2
9 = PROGC
10 = PROGL
11 = PROGP
12 = TRANS

Two new tests in the mix.

Test_Hash1 : traditional "Zip" style fixed size hash -> linked list. In this run there's no chain limit so matching is exact.

Test_Hash3 : uses cblib::hash_table (a reprobing hash table; a.k.a. "open addressing" or "closed hashing", I prefer "reprobing") to hash the first 4 bytes then a linked list. I was surprised to find that this is almost the same speed as Hash1 and sometimes faster, even though it's a totally generic template hash table (that is not particularly well suited to this usage).


09-30-11 | BMWs

I think the E46 M3 is almost the most perfect car ever made. Great engine, plenty of room, great handling (after you dial out the factory understeer); comfortable enough to drive anywhere, but tight enough to toss. I wish it wasn't quite so ugly, and I wish it weighed about 300 pounds less, and I wish it didn't have a leather interior which is invariably gross by now (seriously? can we fucking get technical fabric in cars already? it's breathable, near waterproof, doesn't get hot or cold, doesn't get ruined by water or sun, it's been the obvious correct material for car interiors for like 10 years now, stop being such ass hats).

E46 M3 prices from cars.com :

Progression of M3 power-to-weights over the years : (with the 1M thrown in since it's the real small M sedan of the present)


E30 2865 / 250 (*) = 11.5 (pounds per hp)
E36 3220 / 316 = 10.2
E46 3415 / 338  3443? = 10.1
1M  3461 / 335  3329? 3296? = 10.1 (**)
E92 3704-3900 / 414 = 9.2

Correct/comparable weights are hard to find. Both the manufacturer weights and magazine weights are not reliable. You need the regulatory DIN weight or something but I can't find a good source for that (Wikipedia is no good). Anyway, the details may be a bit off, but the power-to-weights of M cars have changed surprisingly little over the years. The powers have gone up, but so have the weights, only slightly slower.

(* = the E30 S14 engine made more like 200 hp stock, but can be reasonably easily brought up to "race spec" 250 hp using modern electronic fuel injection and a few other cheap mods; unlike the other engines, the S14 was actually raced and is reliable even at higher output).
(** = the 1M can easily make 380 hp or so with standard turbo mods).

Before about 2001 the US versions of most BMW's were crippled. Either worse versions of the engines or completely different engines.

The Z3 "M coupe" shooting-brake is supposedly one of the best handling cars ever made (up there with the NSX and I'm not sure what else). The good one is the late model 2001-2002 which got the E46 M3 engine. Unfortunately the public has figured this out and they're now a bit of a collector's item for enthusiasts; prices have stabilized in the $30k-40k range, around double the prices of the earlier small engine Z3 M Coupe. I'm a big fan of how ugly it is, and I love that it has the practicality of a wagon, but I hear the cockpit is a bit small.

The later M coupe is extremely comparable to a Cayman :


E86 Z4 M Coupe (2006-2008) : 330 hp, 3200 pounds = 9.7 (pounds per hp)
997.2 Cayman   (2009-2011) : 320 hp, 3000 pounds = 9.4

The M Coupe makes better noises and the engine is easier to tune up, it's also more analog, more raw. The Cayman actually has a lot more cabin room and luggage room ; the M is rather uncomfortably cramped, and I didn't love the feel of sitting over the rear axle in a car with a huge hood. The M will certainly depreciate less, and is marginally less douchey.

There's a large spec E30 (E30 325i) amateur race class. It's a very cheap race class to get into with a very strict spec, it looks like a lot of fun. Maybe I'll do something like that in my retirement. Cars like that are called "momentum cars" by racers because they have very little acceleration; not much fun on the street, but they can still be great on a track because it takes a lot of skill to get the right line to keep speed through corners in traffic.


09-30-11 | Don't use memset to zero

A very common missed optimization is letting the OS zero large chunks of memory for you.

Everybody just writes code like this :


U32 * bigTable = malloc(20<<20);
memset(bigTable,0,20<<20);

but that's a huge waste. (eg. for a large hash table on small files the memset can dominate your time).

Behind your back, the operating system is actually running a thread all the time as part of the System Idle Process which grabs free pages and writes them with zero bytes and puts them on the zero'ed page list.

When you call VirtualAlloc, it just grabs a page from the zeroed page list and hands it to you. (if there are none available it zeroes it immediately).

!!! Memory you get back from VirtualAlloc is always already zeroed ; you don't need to memset it !!!

The OS does this for security, so you can never see some other app's bytes, but you can also use it to get zero'ed tables quickly.

(I'm not sure if any stdlib has a fast path to this for "calloc" ; if so that might be a reason to prefer that to malloc/memset; in any case it's safer just to talk to the OS directly).
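A minimal sketch of talking to the OS directly, for illustration only. The function names AllocZeroedPages / FreeZeroedPages are made up here. The Windows path is the VirtualAlloc call discussed above; the non-Windows path is an assumption added for portability, using an anonymous private mmap, which also hands back zero-filled pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Returns `bytes` of already-zeroed memory straight from the OS, or NULL on failure.
// No memset needed : fresh pages from the OS arrive zero-filled.
void * AllocZeroedPages(size_t bytes)
{
    assert(bytes > 0);
#if defined(_WIN32)
    return VirtualAlloc(NULL, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    // assumption : POSIX-style system with anonymous mappings
    void * p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
#endif
}

void FreeZeroedPages(void * p, size_t bytes)
{
#if defined(_WIN32)
    (void)bytes;                      // MEM_RELEASE frees the whole reservation
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, bytes);
#endif
}
```

Usage would be something like U32 * bigTable = (U32 *) AllocZeroedPages(20<<20); in place of the malloc + memset pair. Note this only pays off for big tables; the granularity is whole pages.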

ADDENDUM : BTW to be fair none of my string matchers do this, because other people's don't and I don't want to win from cheap technicalities like that. But all string match hash tables should use this.


09-30-11 | String Match Results Part 2

The first and simplest set of results are the ones where non-O(N) algorithms make themselves known.

Optimal parsing, large window.

The candidates are :

SuffixArray1 : naive matcher built on divsufsort
SuffixArray2 : using the forward-offset elimination from this post
Suffix2 : Suffix Trie with follows and path compression but missing the bits in this post
Suffix3 : Suffix Trie without follow
Suffix5 : fully working Suffix Trie
MMC1 : MMC with max match length of 256
MMC2 : MMC with no limit
LzFind1 : LzFind (LZMA HC4 - binary tree) with max ML of 256 and step limit of 32
LzFind2 : LzFind with no max ML or step limit

Note : LzFind was modified from LZMA to not record all matches, just the longest, to make it more like the competitors. MMC was modified to make window size a variable.

In all cases I show :


Test_Matcher : average match length per byte , average clocks per byte

And with no further ado :

got arg : window_bits = 24
working on : m:\test_data\lz_stress_tests

loading file : m:\test_data\lz_stress_tests\stress_many_matches
Test_SuffixArray1 : 32.757760 , 164.087948 
Test_SuffixArray2 : 32.756953 , 199.878476 
Test_Suffix2 : 32.757760 , 115.846130 
Test_Suffix3 : 31.628279 , 499.722569 
Test_Suffix5 : 32.757760 , 184.172167 
Test_MMC2 : 32.757760 , 1507.818166 
Test_LzFind2 : 32.757760 , 576.154370 

loading file : m:\test_data\lz_stress_tests\stress_search_limit
Test_SuffixArray1 : 823.341331 , 182.789064 
Test_SuffixArray2 : 823.341331 , 243.492241 
Test_Suffix2 : 823.341331 , 393.930504 
Test_Suffix3 : 807.648294 , 2082.447274 
Test_Suffix5 : 823.341331 , 91.699276 
Test_MMC2 : 823.341331 , 6346.400206 
Test_LzFind2 : 823.341331 , 1807.516994 

loading file : m:\test_data\lz_stress_tests\stress_sliding_follow
Test_SuffixArray1 : 199.576550 , 189.029462 
Test_SuffixArray2 : 199.573198 , 220.316868 
Test_Suffix2 : 199.576550 , 95.225780 
Test_Suffix3 : 198.967622 , 2110.521111 
Test_Suffix5 : 199.576550 , 106.019526 
Test_MMC2 : 199.576550 , 36571.382020 
Test_LzFind2 : 199.576550 , 1249.184412 

loading file : m:\test_data\lz_stress_tests\stress_suffix_forward
Test_SuffixArray1 : 5199.164464 , 6138.802402 
Test_SuffixArray2 : 5199.164401 , 213.675569 
Test_Suffix2 : 5199.164464 , 12901.429712 
Test_Suffix3 : 5199.075953 , 32152.812339 
Test_Suffix5 : 5199.164464 , 145.684678 
Test_MMC2 : 5199.016562 , 6652.666440 
Test_LzFind2 : 5199.164464 , 11739.369336 

loading file : m:\test_data\lz_stress_tests\stress_all_as
Test_SuffixArray1 : 21119.499148 , 40938.612689 
Test_SuffixArray2 : 21119.499148 , 127.520147 
Test_Suffix2 : 21119.499148 , 88178.094886 
Test_Suffix3 : 21119.499148 , 104833.677202 
Test_Suffix5 : 21119.499148 , 119.676823 
Test_MMC2 : 21119.499148 , 25951.480871 
Test_LzFind2 : 21119.499148 , 38581.431558 

loading file : m:\test_data\lz_stress_tests\twobooks
Test_SuffixArray1 : 192196.571348 , 412.356092 
Test_SuffixArray2 : 192196.571348 , 420.437773 
Test_Suffix2 : 192196.571348 , 268.524287 
Test_Suffix3 : DNF
Test_Suffix5 : 192196.571348 , 292.777726 
Test_MMC2 : DNF 
Test_LzFind2 : DNF 

(DNF = Did Not Finish = over 100k clocks per byte).

Conclusion : SuffixArray2 and Suffix5 both actually work and are correct with no blowup cases.

SuffixArray1 looks good on the majority of files (and is slightly faster than SuffixArray2 on those files), but "stress_suffix_forward" clearly calls it out and shows the break down case.

Suffix2 almost works except on the degenerate tests due to failure to get some details of follows quite right ( see here ).

Suffix3 just shows that a Suffix Trie without follows is some foolishness.

We won't show SuffixArray1 or Suffix2 or Suffix3 again.

MMC2 and LZFind2 both have bad failure cases. Both are simply not usable if you want to find the longest match at every byte. We will revisit them later in other usages though and see that they are good for what they're designed for.

I've not included any of the hash chain type matchers in this test because they all obviously crap their pants in this scenario.


09-30-11 | String Match Results Part 1

I was hoping to make some charts and graphs, but it's just not that interesting. Anyhoo, let's get into it.

What am I testing? String matching for an LZ-type compressor. Matches must start before current pos but can run past current pos. I'm string matching only, not compressing. I'm counting the total time and total length of matches found.

I'm testing match length >= 4. Matches of length 2 & 3 can be found trivially by table lookup (though on small files this is not a good way to do it). Most of the matchers can handle arbitrary min lengths, but this is just easier/fairer for comparison.

I'm testing both "greedy" (when you find a match step ahead its length) and "optimal" (find matches at every position). Some matchers like the suffix tree ones don't really support greedy parsing, since they have to do all the work at every position even if you don't want the match there.
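To pin down what is being measured, here is a brute force reference sketch (hopelessly slow, only useful for checking a real matcher on small inputs). It is not one of the matchers tested; the names Match, BruteForceLongestMatch, TotalMatchLenEveryPos and TotalMatchLenGreedy are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Match { size_t pos; size_t len; };   // len == 0 means no match found

// Longest match for the string at cur against every earlier start position.
// The match must start before cur but may run past cur (overlap is allowed).
Match BruteForceLongestMatch(const std::vector<uint8_t> & buf, size_t cur, size_t minLen = 4)
{
    assert(cur <= buf.size());
    Match best = { 0, 0 };
    for (size_t start = 0; start < cur; start++)
    {
        size_t len = 0;
        while (cur + len < buf.size() && buf[start + len] == buf[cur + len])
            len++;
        // >= keeps the latest start (lowest offset) among equal lengths
        if (len >= minLen && len >= best.len)
            best = { start, len };
    }
    return best;
}

// "optimal" style : find a match at every position
size_t TotalMatchLenEveryPos(const std::vector<uint8_t> & buf)
{
    size_t total = 0;
    for (size_t cur = 0; cur < buf.size(); cur++)
        total += BruteForceLongestMatch(buf, cur).len;
    return total;
}

// "greedy" style : when you find a match, step ahead its length
size_t TotalMatchLenGreedy(const std::vector<uint8_t> & buf)
{
    size_t total = 0;
    size_t cur = 0;
    while (cur < buf.size())
    {
        const Match m = BruteForceLongestMatch(buf, cur);
        total += m.len;
        cur += (m.len > 0) ? m.len : 1;
    }
    return total;
}
```

On "abcdabcdabcd" the every-position total is 8+7+6+5+4 = 30 while greedy finds a single match of 8 and is done, which is why the two modes stress matchers so differently.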

I'm testing windowed and non-windowed matchers.

I'm testing approximate and non-approximate (exact) matchers. Exact matchers find all matches possible, approximate matchers find some amount less. I'm not sure the best way to show the approximation vs. speed trade off. I guess you want a "pareto frontier" type of graph, but what should the axes be?

Also, while I'm at it, god damn it!

MAKE YOUR CODE FREE PEOPLE !!

(and GPL is not free). And some complicated personal license is a pain in the ass. I used to do this myself, I know it's tempting. Don't fucking do it. If you post code just make it 100% free for all uses. BSD license is an okay choice.

Matchers I'm having trouble with :


Tornado matchers from FreeArc - seem to be GPL (?)

MMC - GPL

LzFind from 7zip appears to be public domain. divsufsort is free. Larsson's slide is free.


09-29-11 | Suffix Tries 3 : On Follows with Path Compression

Some subtle things that it took me a few days to track down. Writing for my reference.

1. Follows updates should be a bit "lazy". With path compression you aren't making all the nodes on a suffix. So when you match at length 5, the follow at length 4 might not exist. (I made a small note on the consequences of this previously.) Even if the correct follow node doesn't exist, you should still link in to the next longest follow node possible (eg. length 3 if a 4 doesn't exist). Later on the correct follow might get made, and then if possible you want to update it. So you should consider the follows links to be constantly under lazy update; just because a follow link exists it might not be the right one, so you may want to update it.

eg. say you match 4 bytes of suffix [abcd](ef..) at the current spot. You want to follow to [bcd] but there is no length 3 node of that suffix currently. Instead you follow to [bc] (the next best follow available) , one of whose children is [dxy], you now split the [dxy] to [d][xy] and add [ef] under [d]. You can then update the follow from the previous node ([abcd]) to point at the new [bc][d] node.

2. It appears that you only need to update one follow per byte to get O(N). I don't see that this is obvious from a theoretical standpoint, but all my tests pass. Say you trace down a long suffix. You may encounter several nodes that don't have fully up to date follow pointers. You do not have to track them all and update them all at the next byte. It seems you can just update the deepest one (not the deepest node, but the deepest node that needs an update). (*)

3. Even if your follow is not up to date, you can still use the guaranteed (lastml-1) match len to good advantage. This was a big one that I missed. Say you match 4096 bytes and you take the follow pointer, and it takes you to a node of depth 10. You've lost a lot of depth - you know you must match at least 4095 bytes and you only have 10 of them. But you still have an advantage. You can descend the tree and skip all string compares up to 4095 bytes. In particular, when you get to a leaf you can immediately jump to matching 4095 of the leaf pointer.

4. Handling of EOF in suffix algorithms is annoying; it needs to act like a value outside the [0,255] range. The most annoying case is when you have a degenerate suffix like aaaa...aaaEOF , because the "follow" for that suffix might be itself (eg. what follows aaa... is aa..) depending on how you handle EOF. This can only happen with the degenerate RLE case so just special casing the RLE-to-EOF case avoids some pain.

(* = #2 is the thing I have the least confidence in; I wonder if there could be a case where the single node update doesn't work, or if maybe you could get non-O(N) behavior unless you have a more clever/careful update node selection algorithm)


09-28-11 | String Matching with Suffix Arrays

A suffix sorter (such as the excellent divsufsort by Yuta Mori) provides a list of the suffix positions in an array in sorted order. Eg. sortedSuffixes[i] is the ith suffix in order.

You can easily invert this table to make sortLookup such that sortLookup[ sortedSuffixes[i] ] == i . eg. sortLookup[i] is the sort order for position i.
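A small sketch of the inversion. NaiveSuffixSort here is just a slow stand-in (std::sort on raw suffix compares) so the example is self-contained; a real build would call a proper suffix sorter such as divsufsort instead. Both function names are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Slow stand-in suffix sorter, only for illustration on tiny inputs.
// sortedSuffixes[i] is the start position of the i'th suffix in sorted order.
std::vector<int> NaiveSuffixSort(const std::vector<uint8_t> & buf)
{
    std::vector<int> sortedSuffixes(buf.size());
    std::iota(sortedSuffixes.begin(), sortedSuffixes.end(), 0);
    std::sort(sortedSuffixes.begin(), sortedSuffixes.end(),
        [&buf](int a, int b)
        {
            return std::lexicographical_compare(buf.begin() + a, buf.end(),
                                                buf.begin() + b, buf.end());
        });
    return sortedSuffixes;
}

// Invert the table : sortLookup[pos] is the sort order of the suffix starting at pos.
std::vector<int> BuildSortLookup(const std::vector<int> & sortedSuffixes)
{
    const int n = (int) sortedSuffixes.size();
    std::vector<int> sortLookup(n);
    for (int i = 0; i < n; i++)
    {
        assert(sortedSuffixes[i] >= 0 && sortedSuffixes[i] < n);
        sortLookup[ sortedSuffixes[i] ] = i;
    }
    return sortLookup;
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so sortedSuffixes is {5,3,1,0,4,2} and sortLookup is {3,2,5,1,4,0}.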

Now at this point, for each suffix sort position i, you know that the longest match with another suffix is either at i-1 or i+1.

Next we need the neighboring pair match lengths for the suffix sort. This can be done in O(N) as previously described here . So we now have a sortSameLen[] array such that sortSameLen[i] tells you the match length between (sorted order) elements i and i+1.
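One well known O(N) way to get the neighbor pair lengths is a Kasai-style pass in file order; that this is the same method as the linked note is an assumption on my part. The sketch below uses the convention from the text (sortSameLen[i] is the length between sorted elements i and i+1). The function name is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// sortSameLen[i] = match length between sorted suffixes i and i+1 (last entry is 0).
// Walks positions in file order; the length can drop by at most 1 from one
// position to the next, which is what makes the total work O(N).
std::vector<int> BuildSortSameLen(const std::vector<uint8_t> & buf,
                                  const std::vector<int> & sortedSuffixes,
                                  const std::vector<int> & sortLookup)
{
    const int n = (int) buf.size();
    assert((int) sortedSuffixes.size() == n && (int) sortLookup.size() == n);
    std::vector<int> sortSameLen(n, 0);
    int len = 0;
    for (int pos = 0; pos < n; pos++)
    {
        const int si = sortLookup[pos];
        if (si + 1 < n)
        {
            const int other = sortedSuffixes[si + 1];
            // the first len bytes are already known to match
            while (pos + len < n && other + len < n && buf[pos + len] == buf[other + len])
                len++;
            sortSameLen[si] = len;
            if (len > 0) len--;   // the next position keeps at least len-1
        }
        else
        {
            len = 0;              // last in sort order, no following neighbor
        }
    }
    return sortSameLen;
}
```

For "banana" (sorted order a, ana, anana, banana, na, nana) this gives {1,3,0,0,2,0}.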

Using just these you can find all the match lengths for any suffix in the array thusly :


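For a suffix start at index pos
Find its sort order : sortIndex = sortLookup[pos]
In each direction (+1 and -1) :
  current_match_len = infinite
  repeat until you run off the array (or current_match_len hits zero) :
    step to next sort index
    current_match_len = MIN(current_match_len,sortSameLen[pair just stepped over])
    current_match_len is now the match length with the suffix at that sort index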

Okay. This is all old news. But it has a problem that has been discussed previously .

When matching strings for LZ and such, we don't want the longest match in the array, we want the longest match that occurs earlier. Handled naively this ruins the great O() performance of suffix array string matching. But you can do better.

Run Algorithm Next Index with Lower Value on the sortedSuffixes[] array. This provides an array nextSuffixPreceding[]. This is exactly what you need - it provides the next closest suffix with a preceding index.

Now instead of the longest match being at +1 and -1, the longest match is at nextSuffixPreceding[i] and priorSuffixPreceding[i].

There's one last problem - if my current suffix is at position pos, and I look up si = sortLookup[pos] and from that nextSuffixPreceding[si] - I need to walk up to that position one by one doing MIN() on the adjacent pair match lengths (sortSameLen). That ruins my O() win.

But there's a solution - simply build the match length as well when you run "next index with lower value". This can be done easily by tracking the match length back to the preceding "fence". This adds no complexity to the algorithm.

The total sequence of operations is :


sort suffixes : O(NlogN) to O(N)

build sort lookup : O(N)

build sort pair same len : O(N)

build next/prior pos preceding with match lengths : O(N)

now to find a match length :

at position "pos"
si = sortLookup[pos]
for each direction (following and preceding)
  matchpos = nextSuffixPreceding[si]
  matchlen = nextSuffixPreceding_Len[si]

that is, the match length lookup is a very simple O(1) per position (or O(N) for all positions).
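Here is a sketch of the build and lookup, under some assumptions : it uses an explicit stack rather than storing the fence stack in place, the tables hold file positions indexed by sort index, and all the names (PrecedingTables, BuildOneDirection, BuildPrecedingTables, LongestPrecedingMatch) are made up. Each fence carries the MIN of pair lengths up to the entry above it, and a popped fence hands its length down to the fence below.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

struct PrecedingTables
{
    std::vector<int> nextPos, nextLen;    // nearest higher sort index with a lower file position
    std::vector<int> priorPos, priorLen;  // nearest lower sort index with a lower file position
};

// step = +1 builds the "next" tables, step = -1 builds the "prior" tables.
static void BuildOneDirection(const std::vector<int> & sortedSuffixes,
                              const std::vector<int> & sortSameLen,
                              int step, std::vector<int> & outPos, std::vector<int> & outLen)
{
    const int n = (int) sortedSuffixes.size();
    assert((int) sortSameLen.size() == n);
    outPos.assign(n, -1);   // -1 = no preceding suffix in this direction
    outLen.assign(n, 0);
    std::vector<int> fenceIdx, fenceLen;  // fences still seeking a lower position
    for (int k = 0; k < n; k++)
    {
        const int j = (step > 0) ? k : n - 1 - k;
        if (!fenceIdx.empty())
        {
            // the top fence is always the sort neighbor of j
            const int pairLen = sortSameLen[(step > 0) ? j - 1 : j];
            fenceLen.back() = std::min(fenceLen.back(), pairLen);
        }
        while (!fenceIdx.empty() && sortedSuffixes[fenceIdx.back()] > sortedSuffixes[j])
        {
            const int f = fenceIdx.back();
            const int len = fenceLen.back();
            fenceIdx.pop_back();
            fenceLen.pop_back();
            outPos[f] = sortedSuffixes[j];
            outLen[f] = len;
            // hand the length down so the fence below covers the same span
            if (!fenceLen.empty())
                fenceLen.back() = std::min(fenceLen.back(), len);
        }
        fenceIdx.push_back(j);
        fenceLen.push_back(INT_MAX);
    }
}

PrecedingTables BuildPrecedingTables(const std::vector<int> & sortedSuffixes,
                                     const std::vector<int> & sortSameLen)
{
    PrecedingTables t;
    BuildOneDirection(sortedSuffixes, sortSameLen, +1, t.nextPos, t.nextLen);
    BuildOneDirection(sortedSuffixes, sortSameLen, -1, t.priorPos, t.priorLen);
    return t;
}

// O(1) lookup : returns the match length, writes the match position (-1 if none).
int LongestPrecedingMatch(const PrecedingTables & t, const std::vector<int> & sortLookup,
                          int pos, int * matchPos)
{
    const int si = sortLookup[pos];
    const bool useNext = t.nextLen[si] >= t.priorLen[si];
    *matchPos = useNext ? t.nextPos[si] : t.priorPos[si];
    return useNext ? t.nextLen[si] : t.priorLen[si];
}
```

Every sort index is pushed once and popped at most once per direction, so the build stays O(N).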

One minor annoyance remains, which is that the suffix array string searcher does not provide the lowest offset for a given length of match. It gives you the closest in suffix order, which is not what you want.


09-28-11 | Algorithm : Next Index with Lower Value

You are given an array of integers A[]

For each i, find the next entry j (j > i) such that the value is lower (A[j] < A[i]).

Fill out B[i] = j for all i.

For array size N this can be done in O(N).

Here's how :

I'll call this algorithm "stack of fences". Walk the array A[] from start to finish in one pass.

At i, if the next entry (A[i+1]) is lower than the current (A[i]) then you have the ordering you want immediately and you just assign B[i] = i+1.

If not, then you have a "fence", a value A[i] which is seeking a lower value. You don't go looking for it immediately, instead you just set the current fence_value to A[i] and move on via i++.

At each position you visit when you have a fence, you check if the current A[i] < fence_value ? If so, you set B[fence_pos] = i ; you have found the successor to that fence.

If you have a fence and find another value which needs to be a fence (because it's lower than its successor) you push the previous fence on a stack, and set the current one as the active fence. Then when you find a value that satisfies the new fence, you pop off the fence stack and also check that fence to see if it was satisfied as well. This stack can be stored in place in the B[] array, because the B[] is not yet filled out for positions that are fences.

The pseudocode is :


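// assumes all values in A[] are >= 0 ; -1 means "none"
int fence_pos = -1;
int fence_val = -1;

for(int i=1;i<size;i++)
{
    int i_prev = i-1;
    int prev = A[i_prev];
    int cur = A[i];

    if ( cur >= prev )
    {
        // prev has no lower successor yet :
        // make new fence and push the old one on the stack (stored in B)
        B[i_prev] = fence_pos;
        fence_pos = i_prev;
        fence_val = prev;
    }
    else
    {
        // descending, cur is good :
        B[i_prev] = i;

        // pop every fence that cur satisfies
        while( cur < fence_val )
        {
            int prev_fence = B[fence_pos];
            B[fence_pos] = i;
            fence_pos = prev_fence;
            if ( fence_pos == -1 )
            {
                fence_val = -1;
                break;
            }
            fence_val = A[fence_pos];
        }
    }
}

// the last entry and any fences still on the stack have no lower successor
if ( size > 0 ) B[size-1] = -1;
while( fence_pos != -1 )
{
    int prev_fence = B[fence_pos];
    B[fence_pos] = -1;
    fence_pos = prev_fence;
}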

This is useful in string matching, as we will see forthwith.


09-28-11 | Rugby

God damn rugby is a fucking joyous sport to watch.

1. No commercials. I just can't watch sports with commercials any more. Right when you start to get into the action you have to see some shit about Geico or Aflac or some shit. OMG what could be more of a scam than supplemental insurance. Maybe I should insure my insurance rate. And insure against my insurance company going bankrupt. And some supplemental umbrella insurance. Now I'm upset, no more watching commercials.

2. No breaks between plays. No instant replay. No timeouts. Just action action all the time.

3. Advantage. I think I did a post about this long ago, but the "advantage" rule for penalties is just so much fucking win. It means that you don't have to stop play for every piddly penalty, you let play keep going as long as the penalizing team has not gained an advantage from the penalty (more precisely, play continues as long as the team that was infringed upon is at an advantage compared to the position they were in at the time of the penalty). This sounds complex but is not and is just 100% win.

4. The refs. The refs in rugby are just uniformly superb. I'm not quite sure why they're so much better than any other sport. They have more autonomy and more freedom to make judgement calls, and they seem to do so well. One aspect perhaps is that most rugby refs have played a bit at the professional level, which I think is rare in other sports.

5. The game is (usually) not decided by penalties. I just can't watch basketball or soccer because of this. Penalties should encourage players to stick to the spirit of the game, the game shouldn't become all about the points you can get off penalties. It ruins the game.

6. The players aren't divers or whiners (mostly). In other sports you see the players taking dives, trying to draw fouls, or going and begging to the ref after plays. WTF, do you have no self respect? You're a grown man and you're diving and whining? WTF. I wonder if they secretly practice flopping in basketball and soccer training camps, or is that something (like taking steroids) that you're supposed to figure out on your own in a kind of nudge-nudge-wink-wink way. Maybe a veteran player takes the rookie under his wings and does some supplement flop and beg practice. I don't know how you can be a fan of a player like Robert Horry or Zidane; oh yeah, I really admire the way they fake being fouled, it's so graceful.

Anyway, I feel like most ruggers just want to get back to play. They want to win the game by smashing through their opponents with ball in hand, not by begging to the ref. And that I respect.

(I must say, watching the recent NZ-France World Cup match I was absolutely disgusted by the sleazy play of the French. Several soccer-style dives trying to draw penalty, one attempt at drawing "obstruction", and the very sleazy try by kicking off while the ref was talking, just absolutely scummy soccer-style tactics, I hope they get their eyes poked in every ruck).

7. Toughness. It's great to watch some big men just brutalize each other. This used to be part of the appeal of American Football but they've all become such delicate flowers now ; oo I have a stubbed toe better get off the field. Back in the day you had guys like Ronnie Lott who gave up a chunk of their thumb to stay on the field. That sort of macho insanity still exists in rugby ; if you get a sprain or even a broken bone or something of course you fucking stay on the field, what are you a pussy? You get off the field when your team doesn't need you any more.

Most of the WC games I've seen so far have been very pretty, good games of rugby, with good discipline and a few flashes of beautiful ball movement and big runs. That's not always the case, though, it can degenerate into a very ugly game. With unskilled teams you get scrums that don't hold together and constantly have to restart, you get lots of bobbled and dropped balls, they can't put together phases, and those games are no fun to watch.


09-27-11 | S2000 Prices

No surprise - S2000's have held their value very very well. For one thing, they're pretty rare, for another, they're great, and most importantly, they're a Honda, from the tail end of the glory days, when Honda was just giving away massive amounts of value. Honda was making $50k cars and selling them for $25k.

(aside : there's some sort of vague sense in which I believe Honda from something like the mid-80's to the mid 90's made the greatest cars of all time; obviously not in terms of actually comparing them to modern cars, so that forces me to qualify it as "compared to their era" and then you get some moron who says the dupendinger hummdorf from 1910 was a bigger improvement over its era, which, okay, I have no fucking idea if that's true or not, and I don't care (and I doubt it), so I hate all that "good for its time" kind of rating. But anyway, when you look at the line from the CRX, the NSX, all the Type-R cars, just staggeringly good, so well made, reliability, perfectly tuned to give the driver the feedback and control he needs, and way way underpriced, so much more car for the money than the competition; it's a damn shame that that time is passed)

On the other hand, RX8 prices are falling fast, and something like a 2007 RX8 for $15k is looking pretty attractive -

Sometimes I go off on these flights of fancy about getting a 240Z or an E30 M3 or some cool old car like that, for the wonderful analog-ness of it, the lack of driver aids, light weight, direct steering. But when you can get an RX8 for the same price, come on, it's just better, much better. And then you don't have to deal with it being in the shop all the time. I like cars and all, but my tolerance for dealing with mechanics is very close to zero.


09-27-11 | God Damn It

When you are drilling a hole in a wood cabinet, first of all, just stop, don't fucking do it. But if you insist on continuing, god damn it make it centered and level and straight and all that.

If you're nailing or screwing into the exterior wood of a house, again, just stop. Do you really need to put screws in to hang your fucking garden party lights? No, I don't think you do. But if you insist, fucking caulk them or something so you aren't just piercing the waterproof skin and providing no protection.

If you're cutting a hole all the way through a roof or a wall to put a vent, fucking stop for a second and make sure you're doing it in the right place, make it level and centered.

You can't fucking undo these things.


09-27-11 | String Match Stress Test

Take a decent size file like "book1" , do :

copy /b book1 + book1 twobooks

then test on "twobooks".

There are three general classes of how string matchers respond to a case like "twobooks" :

1. No problemo. Time per byte is roughly constant no matter what you throw at it (for both greedy and non-greedy parsing). This class is basically only made up of matchers that have a correct "follows" implementation.

2. Okay with greedy parsing. This class craps out in some kind of O(N^2) way if you ask them to match at every position, but if you let them do greedy matching they are okay. This class does not have a correct "follows" implementation, but does otherwise avoid O(N^2) behavior. For example MMC seems to fall into this class, as does a suffix tree without "follows".

Any matcher with a small constant number of maximum compares can fall into this performance class, but at the cost of an unknown amount of match quality.

3. Craps out even with greedy parsing. This class fails to avoid the O(N^2) trap that happens when you have a long match and also many ways to make it. For example simple hash chains without an "amortize" limit fall in this class. (with non-greedy parsing they are O(N^3) on degenerate cases like a file that's all the same char).


Two other interesting stress tests I'm using are :

Inspired by ryg, "stress_suffix_forward" :

4k of aaaaa...
then paper1
then 64k of aaaa...
obviously when you first reach the second part of "aaaa..." you need to find the beginning of the file, but a naive suffix sort will have to look through 64k of following a's before it finds it.

Another useful one to check on the "amortize" behavior is "stress_search_limit" :

book1
then, 128 times :
  128 random bytes
  the first 128 bytes of book1
book1 again
obviously when you encounter all of book1 for the second time, you should match the head of the file, but matchers which use some kind of search limit will see the 128 byte matches first and may never get back to the really long one.
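A sketch of how these two files could be generated from buffers already loaded in memory. The function names, the fixed random seed, and reading "4k" / "64k" as 4096 / 65536 bytes are all assumptions for illustration; the actual test files may have been built differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// 4k of 'a', then paper1, then 64k of 'a'
std::vector<uint8_t> MakeStressSuffixForward(const std::vector<uint8_t> & paper1)
{
    std::vector<uint8_t> out((size_t) 4 * 1024, (uint8_t) 'a');
    out.insert(out.end(), paper1.begin(), paper1.end());
    out.insert(out.end(), (size_t) 64 * 1024, (uint8_t) 'a');
    return out;
}

// book1, then 128 times { 128 random bytes, first 128 bytes of book1 }, then book1 again
std::vector<uint8_t> MakeStressSearchLimit(const std::vector<uint8_t> & book1,
                                           uint32_t seed = 12345)
{
    const size_t kChunk = 128;
    const size_t kRepeats = 128;
    assert(!book1.empty());
    const size_t headLen = std::min(kChunk, book1.size());
    std::mt19937 rng(seed);   // fixed seed so the file is reproducible
    std::vector<uint8_t> out(book1);
    for (size_t r = 0; r < kRepeats; r++)
    {
        for (size_t i = 0; i < kChunk; i++)
            out.push_back((uint8_t) (rng() & 0xFF));
        out.insert(out.end(), book1.begin(), book1.begin() + headLen);
    }
    out.insert(out.end(), book1.begin(), book1.end());
    return out;
}
```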


09-26-11 | Tiny Suffix Note

Obviously there are lots of analogies between suffix tries and suffix arrays.

This old note about suffix arrays which provides O(N) neighbor pair match lengths is exactly analogous to using "follow pointers" for O(N) string matching in suffix tries.

(their paper also contains a proof of O(N)'ness , though it is obvious if you think about it a bit; see comments on previous post about this).

Doing Judy-ish stuff for a suffix tree is exactly analogous to the "introspective" stuff that's done in good suffix array sorters like divsufsort.

By Judy-ish I mean using a variety of tree structures and selecting one for the local area based on its properties. (eg. nodes with > 100 children switch to just using a radix array of 256 direct links to kids).

Suffix tries are annoying because it's easy to slide the head (adding nodes) but hard to slide the tail (removing nodes). Suffix arrays are even worse in that they don't slide at all.

The normal way to adapt suffix arrays to LZ string matching is just to use chunks of arrays (possibly a power-of-2 cascade). There are two problems I haven't found a good solution to. One is how to look up a string in the chunk that it is not a member of (eg. a chunk that's behind you). The other is how to deal with offsets that are in front of you.

If you just put your whole file in one suffix array, I believe that is unboundedly bad. If you were allowed to match forwards, then finding the best match would be O(1) - you only have to look at the two slots before you and after you in the sort order. But since we can't match forward, you have to scan. The pseudocode is like this :


do both forward and backward :
start at the sort position of the string I want to match
walk to the next closest in sort order (this is an O(1) table lookup)
if it's a legal match (eg. behind me) - I'm done, it's the best
if not, keep walking

the problem is the walk is unbounded. When you are somewhere early in the array, there can be an arbitrary number (by which I mean O(N)) of invalid matches between you and your best match in the sort order.
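A sketch of that naive walk, to make the unbounded part concrete. It assumes three tables exist already : sortedSuffixes (positions in sorted order), sortLookup (its inverse) and sortSameLen (match length between sorted neighbors i and i+1). Those table names and the Match / WalkForPrecedingMatch names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

struct Match { int pos; int len; };   // pos == -1 means no earlier suffix was found

Match WalkForPrecedingMatch(const std::vector<int> & sortedSuffixes,
                            const std::vector<int> & sortLookup,
                            const std::vector<int> & sortSameLen,
                            int pos)
{
    const int n = (int) sortedSuffixes.size();
    assert(pos >= 0 && pos < n);
    const int si = sortLookup[pos];
    Match best = { -1, 0 };

    // walk toward higher sort indexes ; this loop is the unbounded part
    int len = INT_MAX;
    for (int j = si + 1; j < n; j++)
    {
        len = std::min(len, sortSameLen[j - 1]);   // pair (j-1, j)
        if (sortedSuffixes[j] < pos)               // legal : starts behind me
        {
            best = { sortedSuffixes[j], len };
            break;
        }
    }

    // walk toward lower sort indexes
    len = INT_MAX;
    for (int j = si - 1; j >= 0; j--)
    {
        len = std::min(len, sortSameLen[j]);       // pair (j, j+1)
        if (sortedSuffixes[j] < pos)
        {
            if (best.pos < 0 || len > best.len)
                best = { sortedSuffixes[j], len };
            break;
        }
    }
    return best;
}
```

On a file like "stress_suffix_forward" each of those loops can step over tens of thousands of later-position suffixes before hitting a legal one, for every position in the run.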

Other than these difficulties, suffix arrays provide a much simpler way of getting the advantages of suffix tries.

Suffix arrays also have implementation advantages. Because you separate the suffix string work from the rest of your coder it makes it easier to optimize each one in isolation, you get better cache use and better register allocation. Also, the suffix array can use more memory during the sort, or use scratch space, while a trie has to hold its structure around all the time. For example some suffix sorts will do things like use a 2-byte radix in parts of the sort where that makes sense (and then they can get rid of it and use it on another part of the sort), and that's usually impossible for a tree that you're holding in memory as you scan.


09-25-11 | More on LZ String Matching

This might be a series until I get angry at myself and move on to more important todos.

Some notes :

1. All LZ string matchers have to deal with this annoying problem of small files vs. large ones (and small windows vs large windows). You really want very different solutions, or at least different tweaks. For example, the size of the accelerating hash needs to be tuned for the size of data or you can spend all your time initializing a 24 bit hash to find matches in a 10 byte file.

2. A common trivial case degeneracy is runs of the same character. You can of course add special case handling of this to any string matcher. It does help a lot on benchmarks of course, because this case is common, but it doesn't help your worst case in theory because there are still bad degenerate cases. It's just very rare to have long degenerate matches that aren't simple runs.

One easy way to do this is to special case just matches that start with a degenerate char. Have a special index of [256] slots which correspond to starting with >= 4 of that char.

3. A general topic that I've never seen explored well is the idea of approximate string matching.

Almost every LZ string matcher is approximate, they consider less than the full set of matches. Long ago someone referred to this as "amortized hashing" , which refers to the specific implementation of a hash chain (hash -> linked list) in which you simply stop searching after visiting some # of links. (amortize = minimize the damage from the worst case).

Another common form of approximate string searching is to use "cache tables" (that is, hash tables with overwrites). Many people use cache tables with a few "ways".
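A minimal sketch of a hash chain with an "amortize" step limit, to show where the approximation comes from. This is not any particular library's matcher; the class name, the 16 bit table, and the multiplicative hash constant are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

class HashChainMatcher
{
public:
    explicit HashChainMatcher(const std::vector<uint8_t> & buf)
        : m_buf(buf), m_head((size_t) 1 << kHashBits, -1), m_next(buf.size(), -1) { }

    // Find the longest match (len >= 4) for pos among earlier inserted positions,
    // visiting at most maxSteps links of the chain, then insert pos.
    // Returns the match length (0 if none) and writes the match position.
    int FindAndInsert(int pos, int maxSteps, int * matchPos)
    {
        const int n = (int) m_buf.size();
        assert(pos >= 0 && pos < n);
        *matchPos = -1;
        if (pos + 4 > n) return 0;   // not enough bytes left to hash
        const uint32_t h = Hash4(pos);
        int bestLen = 0;
        int steps = 0;
        // most recent position first ; stop after maxSteps links no matter what
        for (int cand = m_head[h]; cand >= 0 && steps < maxSteps; cand = m_next[cand], steps++)
        {
            int len = 0;
            while (pos + len < n && m_buf[cand + len] == m_buf[pos + len])
                len++;
            if (len >= 4 && len > bestLen) { bestLen = len; *matchPos = cand; }
        }
        m_next[pos] = m_head[h];
        m_head[h] = pos;
        return bestLen;
    }

private:
    static const int kHashBits = 16;

    uint32_t Hash4(int pos) const
    {
        uint32_t x;
        std::memcpy(&x, &m_buf[pos], 4);
        return (x * 2654435761u) >> (32 - kHashBits);
    }

    const std::vector<uint8_t> & m_buf;   // caller keeps the buffer alive
    std::vector<int> m_head;              // hash -> most recent position
    std::vector<int> m_next;              // position -> previous position with the same hash
};
```

With "abcdWXYZ", three decoys like "abcd1111", then "abcdWXYZ" again, a huge maxSteps finds the length 8 match at the head of the buffer, while maxSteps = 2 only ever sees the recent decoys and never gets back to it.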

The problem with both these approaches is that the penalty is *unbounded*. The approximate match can be arbitrarily worse than the best match. That sucks.

What would be ideal is some kind of tuneable and boundable approximate string match. You want to set some amount of loss you can tolerate, and get more speedup for more loss.

(there are such data structures for spatial search, for example; there are nice approximate-nearest-neighbors and high-dimensional-kd-trees and things like that which let you set the amount of slop you tolerate, and you get more speedup for more slop. So far as I know there is nothing comparable for strings).

Anyhoo, the result is that algorithms with approximations can look very good in some tests, because they find 99% of the match length but do so much faster. But then on another test they suddenly fail to find even 50% of the match length.


09-24-11 | Suffix Tries 2

Say you have a suffix trie with path compression.

So, for example if you had "abxyz" , "abymn" and "abxyq" then you would have :


[ab]   (vertical link is a child)
|
[xy]-[ymn]  (horizontal link is a sibling)
|
z-q

only the first character is used for selecting between siblings, but then you may need to step multiple characters to get to the next branch point.

(BTW I just thought of an interesting alternative way to do suffix tries in a b-tree/judy kind of way. Make your node always have 256 slots. Instead of always matching the first character to find your child, match N. That way for sparse parts of the tree N will be large and you will have many levels of the tree in one 256-slot chunk. In dense parts of the tree N becomes small, down to 1, in which case you get a radix array). Anyhoo..

So there are substrings that don't correspond to any specific node. For example "abx" is between "ab" and "abxy" which have definite spots in the tree. If you want to add "abxr" you have to first break the "xy" and then add the new node.

Okay, this is all trivial and just tree management, but there's something interesting about it :

If you have a "follow" pointer and the length you want does not correspond to a specific node (ie it's one of those between lengths), then there can be no longer match possible.

So, you had a previous match of length "lastml". You step to the next position, you know the best match is at least >= lastml-1. You use a follow pointer to jump into the tree and find the node for the following suffix. You see that the node does not have length "lastml-1", but some other length. You are done! No more tree walking is needed, you know the best match length is simply lastml-1.

Why is this? Consider if there was a longer match possible. Let's say our string was "sabcdt..." at the last position we matched 5 ("sabcd"). So we now have "abcdt..." and know match is >= 4. We look up the follow node for "abcd" and find there is no length=4 node in the tree. That means that the only path in the tree had "dt" in it - there has been no character other than "t" after "d" or there would be a branching node there. But I know that I cannot match "t" because if I did then the previous match would have been longer. Therefore there is no longer match possible.

This turns out to be very common. I'm sure if I actually spent a month or so on suffix tries I would learn lots of useful properties (there are lots of papers on this topic).


09-24-11 | Suffix Tries 1

To make terminology clear I'm going to use "trie" to mean a tree in which as you descend the length of character match always gets longer, and "suffix trie" to indicate the special case where a trie is made from all suffixes *and* there are "follow" pointers (more on this later).

Just building a trie for LZ string searching is pretty easy. Using the linked-list method (which certainly has disadvantages), internal nodes only need a child & sibling pointer, and some bit of data. If you always descend one char at a time that data is just one char. If you want to do "path compression" (multi-char steps in a single link) you need some kind of pointer + length.

(it's actually much easier to write the code with path compression, since when you add a new string you only have to find the deepest match in the tree then add one node; with single char steps you may have to add many nodes).

So for a file of length N, internal nodes are something like 10 bytes, and you need at most N nodes. Leaves can be smaller or even implicit.

With just a normal trie, you have a nice advantage for optimal parsing, which is that when you find the longest match, you also automatically walk past all shorter matches. At each node you could store the most recent position that that substring was seen, so you can find the lowest offset for each length of match for free. (this requires more storage in the nodes plus a lot more memory writes, but I think those memory writes are basically free since they are to nodes in cache anyway).

The Find and Insert operations are nearly identical so they of course should be done together.
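A minimal sketch of the combined Find + Insert on a linked-list trie, to make the "lowest offset for each length for free" point concrete. This is not my real code; it makes some simplifying assumptions : single-char steps instead of path compression (so it uses way more than N nodes), a hypothetical max_depth cap to bound the work per position, and no sliding window. All the names (TrieNode, Match, find_and_insert) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct TrieNode
{
    char ch;        // the single character on the link into this node
    int  child;     // first node one level deeper, or -1
    int  sibling;   // next node at the same level, or -1
    int  last_pos;  // most recent position whose suffix passed through here
};

struct Match { int length; int pos; };

struct Trie
{
    std::vector<TrieNode> nodes;
    int max_depth;  // assumed cap on match length, keeps the sketch bounded

    explicit Trie(int depth) : max_depth(depth)
    {
        nodes.push_back(TrieNode{ 0, -1, -1, -1 }); // root
    }

    // Combined Find + Insert for the suffix starting at pos.
    // Returns, for each match length 1..L, the most recent earlier position.
    std::vector<Match> find_and_insert(const std::string& buf, int pos)
    {
        std::vector<Match> found;
        int cur = 0;
        for (int d = 0; d < max_depth && pos + d < (int)buf.size(); ++d)
        {
            char c = buf[pos + d];
            // only the first character selects between siblings
            int n = nodes[cur].child;
            while (n != -1 && nodes[n].ch != c)
                n = nodes[n].sibling;
            if (n == -1)
            {
                // fell off the tree : add a new node at the head of the sibling list
                n = (int)nodes.size();
                nodes.push_back(TrieNode{ c, -1, nodes[cur].child, pos });
                nodes[cur].child = n;
            }
            else
            {
                // walked past a shorter match on the way to the longest one
                found.push_back(Match{ d + 1, nodes[n].last_pos });
                nodes[n].last_pos = pos; // the extra memory write mentioned above
            }
            cur = n;
        }
        return found;
    }
};
```

Calling find_and_insert at every position in order gives you, at each position, one (length, most recent position) pair per match length, which is the set of candidates an optimal parser wants.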

A trie could be given a "lazy update". What you do is on Insert you just tack the nodes on somewhere low down in the tree. Then on Find, when you encounter nodes that have not been fully inserted you pick them up and carry them with you as you descend. Whenever you take a path that your baggage can't take, you leave that baggage behind. This could have advantages under certain usage patterns, but I haven't actually tried it.

But it's only when you get the "follow" pointers that a suffix trie really makes a huge difference.

A follow pointer is a pointer in the tree from any node (substring) to the location in the tree of the substring without the first character. That is, if you are at "banana" in the tree, the follow pointer should point at the location of "anana" in the tree.

When you're doing LZ compression and you find a match at pos P of length L, you know that at pos P+1 there must be a match of at least length L-1 , simply by using the same offset and matching one less character. (there could be a longer match, though). So, if you know the suffix node that was used to find the match of length L at pos P, then you can jump in directly to match of length L-1 at the next position.
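The L-1 bound is easy to check by brute force. The sketch below is only a sanity check of that property, not a matcher you would ever use (it's the naive O(N^3) thing); longest_match and follow_bound_holds are hypothetical names, and it assumes an unbounded window with overlapping matches allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Brute-force longest match for the suffix at pos against all earlier positions.
// Matches may overlap the current position, as in LZ77.
static size_t longest_match(const std::string& buf, size_t pos)
{
    size_t best = 0;
    for (size_t cand = 0; cand < pos; ++cand)
    {
        size_t len = 0;
        while (pos + len < buf.size() && buf[cand + len] == buf[pos + len])
            ++len;
        if (len > best) best = len;
    }
    return best;
}

// Checks the claim : if pos P has a match of length L,
// then pos P+1 has a match of length >= L-1 (same offset, one less character).
static bool follow_bound_holds(const std::string& buf)
{
    for (size_t p = 0; p + 1 < buf.size(); ++p)
    {
        size_t len = longest_match(buf, p);
        if (len > 0 && longest_match(buf, p + 1) < len - 1)
            return false;
    }
    return true;
}
```

The follow pointer is what lets you start the search at P+1 already knowing that L-1 characters match, instead of re-walking them from the root.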

This is huge. Consider for example the fully degenerate case, a file of length N of all the same character. (yes obviously there are special case solutions to the fully degenerate case, but that doesn't fix the problem, it just makes it more complex to create the problem). A naive string matcher is actually O(N^3) !!

For each position in the file (*N)
Consider all potential matches (*N)
Compare all the characters in that potential match (*N)

A normal trie makes this O(N^2) , because the comparing characters in the string is combined with finding all potential matches, so the tree descent + string compares combined are just O(N).

But a true suffix trie with follow pointers is only O(N) for the whole parse. Somewhere early on would find a match of length O(N) and then each subsequent one just finds a match of L-1 in O(1) time using the follow pointer. (the O(N) whole parse only works if you are just finding the longest length at each position; if you are doing the optimal parse where you find the lowest offset for each length it's O(N^2))

Unfortunately, it seems that when you introduce the follow pointer this is when the code for the suffix trie gets rather tricky. It goes from 50 lines of code to 500 lines of code, and it's hard to do without introducing parent pointers and lots more tree maintenance. It also makes it way harder to do a sliding window.


09-23-11 | Morphing Matching Chain

"MMC" is a lazy-update suffix tree.

mmc - Morphing Match Chain - Google Project Hosting
Fast Data Compression MMC - Morphing Match Chain
Fast Data Compression BST Binary Search Tree
encode.ru : A new match searching structure
Ultra-fast LZ

(I'm playing a bit loose with the term "suffix tree" as most people do; in fact a suffix tree is a very special construction that uses the all-suffixes property and internal pointers to have O(N) construction time; really what I'm talking about is a radix string tree or patricia type tree). (also I guess these trees are tries)

Some background first. You want to match strings for LZ compression. Say you decide to use a suffix tree. At each level of the tree, you have already matched L characters of the search string; you just look up your next character and descend that part of the tree that has that character as a prefix. eg. to look up string str, if you've already descended to level L, you find the child for character str[L] (if it exists) and descend into that part of the tree. One way to implement this is to use a linked list for all the characters that have been seen at a given level (and thus point to children at level +1).

So your nodes have two links :


child = subtree that matches at least L+1 characters
sibling = more nodes at current level (match L characters)

the tree for "bar","band","bang" looks like :

b
|  (child links are vertical)
a
|
r-n  (sibling links are horizontal)
| |
* d-g
  | |
  * *

where * means leaf or end of string (and is omitted in practice).

Okay, pretty simple. This structure is not used much in data compression because we generally want sliding windows, and removal of strings as they fall out of the sliding window is difficult.

(Larsson and others have shown that it is possible to do a true sliding suffix tree, but the complexity has prevented use in practice; this would be a nice project if someone wants to make an actual fast implementation of the sliding suffix trie)

Now let's look at the standard way you do a hash table for string matching in the LZ sliding window case.

The standard thing is to use a fixed size hash to a linked list of all strings that share that hash. The linked list can just be an array of positions where that hash value last occurred. So :


pos = hashTable[h] contains the position where h last occurred
chain[pos] contains the last position before pos where that same hash h occurred

the nice thing about this is that chain[] can just be an array of the size of the sliding window, and you modulo the lookup into it. In particular :

//search :
h = hash desired string
next = hashTable[h];
while ( next > cur - window_size )
{
  // check match len of next vs cur
  next = chain[next & (window_size-1) ];
}

note that the links can point outside the sliding window (eg. either hashTable[] or chain[] may contain values that go outside the window), but we detect those and know our walk is done. (the key aspect here is that the links are sorted by position, so that when a link goes out of the window we are done with the walk; this means that you can't do anything like MTF on the list because it ruins the position sort order). Also note that there's no check for null needed because we can just initialize the hash table with a negative value so that null is just a position outside the window.

To add to the hash table when we slide the window we just tack onto the list :


// add string :
chain[ cur & (window_size-1) ] = hashTable[h];
hashTable[h] = cur;

and there's the sort of magic bit - we also removed a node right there. We actually popped the node off the back of the sliding window. That was okay because it must have been the last node on its list, so we didn't corrupt any of our lists.
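Putting the search and add snippets together, here is a minimal compilable sketch of the hash chain. The struct name, the 4-byte multiplicative hash, kMinMatch and the max_steps cutoff are all assumptions made for illustration, not anything canonical; window_size is assumed to be a power of two, and Find is assumed to be called before Insert at each position (otherwise the head of the list is the current position itself).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct HashChain
{
    static const int kMinMatch = 4; // assumed : hash the first 4 bytes
    int window_size;                // assumed power of two
    int hash_bits;
    std::vector<int> hash_table;    // most recent position for each hash
    std::vector<int> chain;         // previous position with the same hash

    HashChain(int window, int bits)
        : window_size(window), hash_bits(bits),
          // -window acts as "null" : it is never inside the window for any cur >= 0
          hash_table(size_t(1) << bits, -window),
          chain(window, -window) {}

    uint32_t hash(const uint8_t* p) const
    {
        uint32_t v;
        std::memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - hash_bits);
    }

    // Longest match (>= kMinMatch) for buf[cur..] among earlier positions
    // still in the window, visiting at most max_steps links.
    int find(const uint8_t* buf, int size, int cur, int max_steps, int* match_pos) const
    {
        int best = 0;
        if (cur + kMinMatch > size) return 0;
        int next = hash_table[hash(buf + cur)];
        while (next > cur - window_size && max_steps-- > 0)
        {
            int len = 0;
            while (cur + len < size && buf[next + len] == buf[cur + len]) ++len;
            if (len >= kMinMatch && len > best) { best = len; *match_pos = next; }
            next = chain[next & (window_size - 1)];
        }
        return best;
    }

    void insert(const uint8_t* buf, int size, int cur)
    {
        if (cur + kMinMatch > size) return;
        uint32_t h = hash(buf + cur);
        // this write also drops position cur - window_size, which shared the slot
        chain[cur & (window_size - 1)] = hash_table[h];
        hash_table[h] = cur;
    }
};
```

Note that remove never appears as a separate operation; the old node just gets overwritten, and any link still pointing at it fails the window test in find.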

That's it for hash-chain review. It's really nice how simple the add/remove is, particularly for "Greedy" type LZ parsers where you do Insert much more often than you do Find. (there are two general classes of LZ parsers - "Optimal" which generally do a Find & Insert at every position, and "Greedy" which when they find a match, step ahead by the match len and only do Inserts).

So, can we get the advantages of hash chains and suffix trees?

Well, we need another idea, and that is "lazy updates". The idea is that we let our tree get out of sorts a bit, and then fix it the next time we visit it. This is a very general idea and can be applied to almost any tree type. I think the first time I encountered it was in the very cool old SurRender Umbra product, where they used lazy updates of their spatial tree structures. When objects moved or spawned they got put on a list on a node. When you descend the tree later on looking for things, if a node has child nodes you would take the list of objects on the node and push them to the children - but then you only descend to the child that you care about. This can save a lot of work under certain usage patterns; for example if objects are spawning off in some part of the tree that you don't visit, they just get put in a high up node and never pushed down to the leaves.

Anyhoo, so our suffix tree requires a node with two links. Like the hash table we will implement our links just as positions :

struct SuffixNode { int sibling; int child; };

like the hash table, our siblings will be in order of occurrence, so when we see a position that's out of the window we know we are done walking.

Now, instead of maintaining the suffix tree when we add a node, we're just going to tack the new node on the front of the list. We will then percolate in an update the next time we visit that part of the tree. So when you search the tree, you can first encounter some unmaintained nodes before you get to the maintained section.

For example, say we had "bar" and "band" in our tree, and we add "bang" at level 2 , we just stick it on the head and don't descend the tree to put it in the right place :


b
|  (child links are vertical)
a
|
NG-r-n  (sibling links are horizontal)
     |
     d

(caps indicates unmaintained portion)

now the next time we visit the "ba" part of the tree in a retrieval, we also do some maintenance. We remember the first time we see each character (using a [256] array), and if we see that same character again we know that it's because part of the tree was not maintained.

Say we come in looking for "bank". If we see a node with an "n" (that's a maintained n) we know we are done and we go to the child link - there can't be any more n's behind that node. If we see an "N" (no child link), we remember it but we have to keep walking siblings. We might see more "N"s and we are done if we see an "n". Then we update the links. We remove the "n" (of band) from the sibling link and connect it to the "N" instead :


b
|  (child links are vertical)
a
|
n-r
|   
g---d

And this is the essence of MMC (lazy update suffix trie = LUST).

A few more details are significant. Like the simple hash chain, we always add nodes to the front of the list. The lazy update also always adds nodes to the head - that is, the branch that points to more children is always at the most recent occurrence of that substring. eg. if you see "danger" then "dank" then "danish" you know that the "dan" node is either unmaintained, or points at the most recent occurrence of "dan" (the one in "danish"). What this means is that the simple node removal method of the hash chain works - when the window slides, we just let nodes fall out of the range that we consider valid and they drop off the end of the tree. We don't have to worry about those nodes being an internal node to the tree that we are removing, because they are always the last one on a list.

In practice the MMC incremental update becomes complex because you may be updating multiple levels of the tree at once as you scan. When you first see the "NG" you haven't seen the "n" yet and you don't want to scan ahead the list right away, you want to process it when you see it; so you initially promote NG to a maintained node, but link it to a temporary invalid link that points back to the previous level. Then you keep walking the list and when you see the "n" you fix up that link to complete the maintenance.

It does appear that MMC is a novel and interesting way of doing a suffix trie for a sliding window.


09-22-11 | Roku / Amazon Streaming

Not quite ready for prime time. Setup was real easy, that's good. But some things aren't quite right.

(tangential rant : why in the fuck do you bastards still make those fucking plugs with the DC box built onto the plug so that it blocks other outlets !? god dammit, you must know that it fucking sucks and you just don't give a shit. I have to run extension cords just to get the big plugs spaced out away from the power strip or UPS because they would all run into each other). (oh, and you fuckers can't decide if you want to put the protuberance on the side of the plugs or below the plugs, so no matter which way the power strip orients the plugs, there will be some fucking device that doesn't work with it)

(oh, and I decided not to get a Roku a few days ago because I don't want to buy more electronic shit for no good reason, but then I went on the PS3 to watch some Netflix and got the fucking "a system software update is needed" AGAIN; of course the fucking morons can't check that before you get into the Netflix app, so then you have to reboot back out to the dashboard, and they can't just fucking start the update for you then, you have to manually go fucking dig all over the massive convoluted menu to find it; the week before they logged me out of PSN to force me to agree to some new license agreement; WTF WTF have you got no concept of user experience? why are you morons all so bad at taking my money? the fucking plumber won't return my calls, fucking PS3 has chased me right off their console, wtf wtf).

1. The remote sucks balls. It's like a Fisher-Price My First Remote. The buttons feel horrible, really stiff and clunky, and they're way too far apart from each other, so you can't just use one thumb and move it around without straining. The whole ergonomics of it are just awful. It has too much weight in the bottom which makes it take extra muscle to balance. The ok button should be in the middle of the arrows. It should be hourglass shaped. Fucking copy the Tivo remote god dammit.

2. Navigation on the Roku is super super slow. Hit "home" and wait, and wait, and wait, and wait. Okay, there it went. Of course it's better than going from PS3 Netflix to System Settings or something like that (which requires a reboot), and all these devices seem to be unreasonably slow (the god damn TiVo was always frustratingly slow, especially because it was only slow because it was wasting time loading animations, fucking give me a "plain text" option so I can get god damn fast menus!). (* addendum : it seems to have sped itself up; maybe it was doing some background task? dunno; it's still not super fast but it's tolerable; some players seem to be faster than others, Amazon seems to be a particularly slow one).

3. Amazon streaming is just unusable. There's no "queue" type of thing where I can select a subset of stuff I want to watch from my computer, and then choose one of that subset from the Roku. This just makes the whole thing complete shit, because I'm not going to browse through a thousand titles with the fucking arrow buttons on the roku. Mouse and keyboard are good tools for browsing, fucking arrow buttons are not acceptable. (there is a sort of "queue" for things you purchase, just not for the free streaming stuff).

Oh well. Maybe if Amazon really does buy Netflix it will all be fixed.

What I really want is a premium subscription service to torrents. I'd like to pay $5 a month or something, and for that I get to choose what movies and TV shows I'd like, and some kid in Russia finds the best torrent for each movie and TV show and feeds them out to the subscribers. Obviously I can use EZRSS or something right now, but it's just flakey enough that I have to manage it by hand a lot, and I would pay to not have to do that.


09-22-11 | Sports Car Tires

I fucking went through my rear tires already, in just about 1 year or about 8000 miles. It's somewhat common for modern 911's to wear out the inner edge of the rear tires very fast, because you run a lot of rear negative camber, they're heavy in the rear, and of course you tend to drive around like a maniac.

I of course knew that tires for this car would be a lot more expensive, but it's a bit more subtle than that. If you just look at tire prices you might think they are 2-4X more expensive. They aren't just 2X or 4X more expensive, they're actually something like 10X more expensive. Here's why :

1. Just the basic tire is something like 3X more expensive due to being a large/rare size. ($300 a tire instead of $100 a tire)

2. But you don't want to buy el cheapo tires like you did for your commuter car, you want some nice performance tires, right? So now we're talking 4-5X more expensive.

3. And you can't get those tires at Big O or Walmart or whatever, so you have to go to a specialty shop, so the install is more expensive.

4. But the biggest factor is that you go through them much much faster. For one thing, they're a poor-treadwear soft compound, but it's also just the driving style. You're literally "burning rubber" all the time, and if you like to go fool around and slide some drifts or donuts, that can

5. Driving street tires on the track can also wreck them in one session, because they can't handle the heat cycles; you'll literally get melted rubber, usually on the outside edges if you're cornering hard, and it can just come off in chunks. (obviously if you're serious you have special track tires and you expect to go through them fast, but some people are under the misconception that they can take their street car to the track once in a while and it will be okay; well, yeah, it will probably be okay, but it will cost a lot more than you think)

The result is that tires are costing me almost $2000/year, which is rather more than I expected.

(basically all the same things could be said for brakes, though they don't wear quite so fast, and track days and donuts don't destroy them as instantly as they destroy tires (on some cars track days can destroy brakes because they get too hot and you can crack pads or even wreck calipers, but Porsches have pretty good brake cooling))


Anyhoo. I'm getting mildly annoyed with the car. My tires are shot and I can't get replacements in for a week cuz they're rare and have to be ordered (I'm sure I could find them at some shop around town if I wanted to make a million phone calls).

It would be nice to have a car that you could just find parts for anywhere. That you could break down in the middle of the hick middle of the country and find a mechanic who could fix it. I like having a car that's fun to drive but I don't like having a car that's a prima donna.

One of the advantages of the Lotus line is that you can take them to a Toyota mechanic. I wrote a post once about how small-car-maker engine production is super retarded, but I think I didn't post it.


09-21-11 | Four Myths

I believe that the modern white middle-upper class male is highly susceptible to these four myths.

1. "Stay out of it". Politics is a mess. It's frustrating. It's largely controlled by corporate spending and the outrageous emotional ideology of fringe crackpots who scream on talk radio. What can you do about it? It's better just to stay out of it. Sure, Fox News is telling insane lies all the time and shouldn't be allowed to masquerade as news, but what can you do about it? Sure, Corporations should not be counted as people under the 1st amendment. It's just too frustrating and gives us a headache and distracts from our work. So let's be good consumers and go back to arguing about universal remotes and make some more products that people don't need.

2. "Happiness". One of the great mythical movements of the last twenty years or so has been a sort of spiritual glorification of the pursuit of happiness. Atheists who are increasingly disillusioned with the idea of doing something "significant" with their lives are grasping for a central concept to build their lives around, and what they have found is "just be happy".

I believe this is worth restating. Human beings need to believe that their life is for something; that there is a purpose, something to base your actions on day to day, that you're not just ticking off time until you die. Modern ultra-rational man finds it hard to believe in the purposes of older days; obviously religion is out, but even things like "write the great American novel" or "make a difference in the world" are hard for cynical modern man to build his life around, because he starts thinking "what's the point of that really?", all it does is make other people like you, maybe it helps other people but what's the point of helping other people really? This reductivist reasoning can destroy any "life purpose". So after several crises, modern man finds "happiness". I should just do what makes me happy.

Now, the "happiness" pursuers don't just go and do drugs and party or whatever; if you are aware of the happiness movement at all, it's somewhat sophisticated in the sense that it is looking for longer term deeper happiness, which might come from connection to your community, or building something with your hands, or traveling somewhere you are a bit afraid to go, etc. But the reason for it at the core is not doing something for the world, it is entirely selfish.

Because this modern happiness movement is somewhat more sophisticated than plain old gluttony and self-indulgence, it can be a bit hard to see, but in fact it is still exactly what the ruling elite want you to do. They want you to focus on your self, not on the world around you. They want you to avoid difficulty that might make you unhappy. They certainly don't want people dedicating their lives to changing the world or making it a better place.

3. "Identity Liberty". There have been substantial political gains in individual liberties in the last forty years; and there is more freedom and acceptance of "alternative lifestyles" and identities. While this is good, and I don't want to diminish the importance of greater rights for The Gays or whatever, it is really a side issue that has taken center stage. When you ask a liberal how you think the world has done in the last 40 years, they will inevitably bring this up as the major positive.

The thing is, the ruling elite really don't give a rat's ass about "identity liberty". It's a distraction. It's a gladiator fight in the coliseum to keep the rabble occupied while they keep raping you. They don't give a shit about gays or abortion or any of that shit. They care about the structure of power. And while we are fighting about whether there should be a Native American monument at Little Big Horn they have been putting wall street bankers in power at Treasury, the SEC, Fannie, the Fed, etc. They have been giving corporations greater power than humans or countries via NAFTA, Citizens United, etc. There is no more check on executive power or journalistic oversight. The entire congressional law-making process has become a joke.

It's like we're squabbling over the 2 men from Australia and they've just locked up the 7 bonus armies from the Americas. The most important thing is the structure of power, because the capacity for liberty flows from that.

4. "Meritocracy".

bonus : 5. "Anti-unionism".

... I got bored of this post. I'm gonna go watch TV and drink beer. Rugby world cup is on, woot!


09-20-11 | Sensory Pollution

My TV has a red light to indicate that it's OFF. Everything has lights to indicate they're on, including the power strips and UPS and such. The alarm sensors have a mess of lights.

Everything beeps when you press buttons. Gas air blower and lawn mower. Truck backing up, beep beep beep. Car alarms going off. Fucking beeps and honks when cars lock and unlock.

Fucking car headlights are way too bright. Some annoying cars now have this sparkle flashy thing when they brake.

It's an assault. Literally, it beats on your brain with a cudgel of "look at me!". No, god dammit, I want to look at what I choose to pay attention to.


09-19-11 | Game Theory

When your opponent in poker is playing like an absolute moron, you don't think "god damn this guy", you think "how do I take advantage of it". If you don't want to be quite so callous, another way to say it is : you must accept the reality of the situation you are given, and then think how can you best act in that reality. You shouldn't stick to a plan (like playing straightforward tight poker) and think that the world should go along with your plan, just because you are doing things "right" (in the naive sense) doesn't mean the world is obligated to go along with it and let you win.

WA landlord-tenant law is absurdly pro-landlord. Response : don't rent, be a landlord. (CA and NY law is very pro-tenant, response is the opposite of course).

Landlords don't actually charge you enough for move-out deposit subtractions. I'm constantly pissed off by the fact that they charge me for bullshit that is totally inappropriate (like charging me for cleaning even after I've hired professional cleaners). The thing is, they might charge you the $150 cleaning fee, but they don't charge you $100 for the pain in the butt of hiring the cleaners and letting them in and out of the house. Response : don't clean your rentals, just pay the charge. (further response : don't agree to more than $500 or so security deposit)

Service men who work on plaster, fiberglass, or any of those other nasty toxic substances don't charge nearly enough for the trouble of it. They basically just charge normal low-skill labor rates, no extra fees for the life-shortening or discomfort. Response : never do this work yourself, never work with toxic substances, chemicals or fine particles, always hire someone else to do it.


Home maintenance is one of those unstable equilibria of implicit contract (like the "golden rule"). What I mean is, in home maintenance you have two options :

1. Fix things properly so that they last. or 2. Fix things just well enough so that they will probably be okay for 5-10 years.

I'm really talking about things that are hard to go back and change later, that are much cheaper to do well when you have the chance. Like you have a wall open, do you use high quality studs and put in extra wiring so that you won't need to open it again later, or do you just do the minimum for the moment? Or you have the foundation exposed, do you just fill a crack with vinyl crack filler or really properly fix the foundation for the future? Or you're doing framing, do you use dense high-quality treated wood that will resist rot for a long time, or the cheapest wood that passes code?

Let us assume that over 50+ years, the more robust choice (#1) will be much better, but over 1-20 years, the cheap out choice will be better.

For you personally, chances are that the cheap-out way (#2) will be +EV , because chances are you won't live in the same house super long. But for society as a whole, if everyone did the #1 choice and fixed things properly, we would all be better off. You wouldn't come in to a home and find deferred maintenance and crappy short-term patches. Your good quality work might not pay off for you (because you move out), but the next person would inherit it, and you would inherit the good work they had done.

The problem is that cheating on the social contract is always +EV for you personally, though it may be -EV for the group.

A good example in poker arises when many pros are at a table with a whale. The most +EV way for the pros to play is to all mostly avoid each other and go after the whale, but don't make it super obvious, and don't do things that annoy him and might chase him away. But the problem is that for any one pro, you can in the short term (local maxima) increase your EV by also going after the other pros and by really baiting the whale, for example isolation raising big any time the whale enters the pot, and re-raising other people's light isolations. The problem is once all the pros start doing this, they wind up shutting the whale out of a lot of pots and playing too many pots just with each other, and the net EV of the pros goes way down.


09-19-11 | Netflix Super-Self-Crapulation

Wow, great example of an "apology" that makes things so much worse. Paraphrase :

"We've listened to your complaints and decided that we don't give a shit, so we're going to continue in that vein even more! We will be going ahead with our corporate strategy to fuck you over; our long term plan to gradually phase out physical DVD's isn't going fast enough for our quarter-by-quarter stock growth expectations, oh and I'm going to do the massive-douche thing of pretending that fucking you over is somehow good for you".

Original :

I messed up. I owe you an explanation.

It is clear from the feedback over the past two months that many members felt we lacked respect and humility in the way we announced the separation of DVD and streaming and the price changes. That was certainly not our intent, and I offer my sincere apology. Let me explain what we are doing.

For the past five years, my greatest fear at Netflix has been that we wouldn't make the leap from success in DVDs to success in streaming. Most companies that are great at something – like AOL dialup or Borders bookstores – do not become great at new things people want (streaming for us). So we moved quickly into streaming, but I should have personally given you a full explanation of why we are splitting the services and thereby increasing prices. It wouldn’t have changed the price increase, but it would have been the right thing to do.

So here is what we are doing and why.

Many members love our DVD service, as I do, because nearly every movie ever made is published on DVD. DVD is a great option for those who want the huge and comprehensive selection of movies.

I also love our streaming service because it is integrated into my TV, and I can watch anytime I want. The benefits of our streaming service are really quite different from the benefits of DVD by mail. We need to focus on rapid improvement as streaming technology and the market evolves, without maintaining compatibility with our DVD by mail service.

So we realized that streaming and DVD by mail are really becoming two different businesses, with very different cost structures, that need to be marketed differently, and we need to let each grow and operate independently.

It’s hard to write this after over 10 years of mailing DVDs with pride, but we think it is necessary: In a few weeks, we will rename our DVD by mail service to “Qwikster”. We chose the name Qwikster because it refers to quick delivery. We will keep the name “Netflix” for streaming.

Qwikster will be the same website and DVD service that everyone is used to. It is just a new name, and DVD members will go to qwikster.com to access their DVD queues and choose movies. One improvement we will make at launch is to add a video games upgrade option, similar to our upgrade option for Blu-ray, for those who want to rent Wii, PS3 and Xbox 360 games. Members have been asking for video games for many years, but now that DVD by mail has its own team, we are finally getting it done. Other improvements will follow. A negative of the renaming and separation is that the Qwikster.com and Netflix.com websites will not be integrated.

There are no pricing changes (we’re done with that!). If you subscribe to both services you will have two entries on your credit card statement, one for Qwikster and one for Netflix. The total will be the same as your current charges. We will let you know in a few weeks when the Qwikster.com website is up and ready.

For me the Netflix red envelope has always been a source of joy. The new envelope is still that lovely red, but now it will have a Qwikster logo. I know that logo will grow on me over time, but still, it is hard. I imagine it will be similar for many of you.

I want to acknowledge and thank you for sticking with us, and to apologize again to those members, both current and former, who felt we treated them thoughtlessly.

Both the Qwikster and Netflix teams will work hard to regain your trust. We know it will not be overnight. Actions speak louder than words. But words help people to understand actions.

Respectfully yours,

-Reed Hastings, Co-Founder and CEO, Netflix

p.s. I have a slightly longer explanation along with a video posted on our blog, where you can also post comments.

Lesson for myself :

Never ever NEVER put any work into a web site. Do not post to forums. Do not write reviews. Do not keep lists of movies. If you do not own the content, they will fuck you. Be it censorship (Yelp, Amazon, CNET), using your work for advertising profit (everyone), deleting your work or shutting down the site, introducing new "features" or revising the site in a way that breaks it, or just otherwise fucking you.

I know this, but I get sucked into thinking "oh it'll be fine, just this one time". It's also one of those things where everyone else is doing it, so I start to think "hmm maybe it's fine, maybe I'm being unreasonable". Nope. Everyone else is wrong. Make your own correct decision, and that is do not give control of your personal content to anyone else.

ADDENDUM :

Somehow I completely missed the fact that Amazon Prime subscribers get free streaming, just found out today (that's why it's occasionally useful to talk to other human beings). WTF Amazon, way to go informing me. You do a great job of letting me know about the fucking Amazon Visa that I don't want, but not this. Wow. And Roku streams Amazon Prime. Goodbye Netflix!

ADDENDUM 2 :

I'm surprised that a lot of people don't get why this is such a massive fuckup. The greatest asset that Netflix has is a large user base that has invested personal time into the site, writing reviews, tracking what movies they've seen and what they want to see. It's a "Yelp" or "Myspace" for movies. (they've already massively fucked up on this by failing to develop the community features and such, but whatever).

When they split the pricing earlier this year, it caused a lot of us to switch to streaming only, at which time we discovered that with streaming only you couldn't even *see* the movies that weren't available for streaming. That's such a massive fuckup right there. I should still be able to mark what movies I want to watch and which I have seen already. Having users on your site storing their movie-watching preferences is what gives you value. It's what makes them committed to your site.

Now they're completely splitting the streaming and non-streaming into two sites, so I no longer have one place to go and store my movie watching desires (and hopes and dreams).

ADDENDUM 3 : Netflix also silently deleted your "Saved" section of the Instant Queue recently. Hope you didn't have any data in there you wanted.

this comic is alright.

They're being so massively retarded that it has to be intentional. It makes me think this speculation might be true.


09-15-11 | DIY

I am so fucking sick of doing this shit, it's such a waste of time. I wish there were people you could hire that would do this shit for you, but it just doesn't seem to exist. I thought I found the ideal thing - Seward Park Repairs , right in my neighborhood, just call them up they subcontract out and take care of whatever. (the biggest problem with hiring people to do shit is the amount of time it takes to find them and call them up and then vet them and so on). But then I found a forum post by the owner of Seward Park Repairs where he says it's okay to vent a bathroom exhaust fan into the attic. WTF David, if that's the kind of shoddy ass work you do, I'm glad I found out first. Wow, how are you people all so epically incompetent.

I really want a grounds manager to just take care of this shit for me. Higgins, this door is sticking, have someone take care of it, of course sir. I guess I need to make more money, but that's not possible as long as I'm wasting all my time on fucking DIY bullshit!

There is this stupid dangerous machismo of "I could do that". Harrumph, I won't hire someone, I'm a man, I can do that myself. Of course other people will judge you and pressure you in this way as well; harrumph why'd you hire someone to put in that vent fan, can't you do that yourself? don't you have a penis?

Well, that's fucking stupid. Just because you *can* do something doesn't mean you should. Just because lots of stupid people think that it's admirable to do things yourself doesn't mean that it actually is.

What's admirable is making the right decision for yourself, regardless of what others think is right. This is obvious, but it's a very difficult way to actually live.

It would be so nice to be able to hire someone to take care of things and be able to trust that they will do a decent job. I don't even mean a builder, it could just be anyone off the street and they could subcontract everything. All they have to be able to do is basic research and decision making and phone calls. Unfortunately that takes a ton of intelligence and is very hard to find. It's hard to find even among highly skilled programmers.

A crucial aspect is the "what needs approval" question. You need an employee who has the sense to know what they should ask you about, and what they should just decide themselves and not bug you. Both ways of getting it wrong are bad; you don't want someone who just goes off and makes a bunch of decisions and you find out too late that you got the gold-plated Versace bathroom set ; but you also don't want someone who comes to you every five minutes with every question. The vast majority of programmers I've worked with fall towards one or the other side; it's very rare when you find someone who has that sense of judgement to know hey this is important I better ask about it, or hey I should just keep on trucking and make my own call on this.

I have a whole rant percolating about how shockingly hard it is to find what I consider "basic professionalism". You want to know why you're all out of work? When you get a business call, return it right away. When you have an appointment, make it, and if you can't then you call and tell them you'll be late *before* the appointment time. When you say that you will do something, you fucking get it done or you let them know very early on that you can't do it. If you are given a task that you don't know how to do right, you say so, you don't just try to fake it and fuck it up.


09-15-11 | Spray Park

Spray Park is one of those easily accessible and outrageously beautiful places that are a real bonus to living here. You hike through a bunch of typical northwest forest, up a decently hard hill, and then suddenly emerge into this sub-alpine meadow wonderland of little wild flowers and new growth, backed by the giant mountain. Further up you can get into the true alpine barren lands, which are sort of calming in their emptiness. Eventually you get up into big snow fields, where I've seen the crazy outdoors people skiing in the middle of summer.


09-11-11 | Shitty Product Design

I feel like a lot of "design" is making products worse. With lots of products there is a well known good way to make them, just fucking leave it alone, stop changing things for no good reason, you're fucking them up.

A good example is "stemless wine glasses". Uh, WTF, you moron, the stem is there for a reason. You just made it look trashy and made it much worse. Oh yes, I want finger prints on my wine glass. Yes, I love to warm up my wine when I hold it. Oh yes, I don't want to be able to hold it up to the light properly.

(Classic designs are not always right though. I really don't see the point to double hung sash windows. Why in the fuck do I need to be able to open the top part? When do you want the top sash open and the bottom closed that you couldn't just have the bottom open instead? In my experience with these fuckers, the only thing the top sash is good for is letting in massive air leaks, or falling down slightly and getting stuck and being incredibly hard to slide back up because it has no handles or anything.)

And now for a photo tour of horrible designs around my home :

Pushing a window open with my hand was much too difficult. It only took a second and was easily adjustable, but I had to lean over. I'd much rather crank on this fucking floppy handle for five minutes. The result is that sometimes I want the window open but I just can't be bothered to take the time to crank it a million times, so I just open another window that isn't on a "convenient" crank.

These kinds of faucets are horrible in many ways. The mode clicker at the end is always flakey (this one happens to work mostly), the fucking retractable head is totally unnecessary and just makes it wobbly and lower flow, but worst of all is the fucking joystick water control. It makes precisely setting the flow volume and temperature almost impossible. The variant of the joystick which pervades cheap hotel showers is the real pinnacle of shitty water control design. Fucking hot/cold knobs worked great. They're easy to adjust precisely, they hold their position and can't be easily bumped into scalding or freezing, they aren't fighting gravity so they don't slip. If you really want to change it you could do flow & temperature knobs, but don't fucking abandon the two-knob design! Knobs are perfect! I think water control peaked when the two faucets for hot and cold got combined into one faucet, but you still had the two knobs, it's been down-hill ever since.

I've done this one before, but it was in my sight so let's do it again. The Melitta single cup dripper is such a clear case of taking a near-perfect product that does the task it is designed for, and just fucking it up for no reason. (the only thing I would change about the original single cup dripper is I'd make it a bit bigger, because I like to use an obscene unreasonable amount of grounds for one cup of coffee).

Product designer : Why would anyone want to grab a drawer handle from above? I know, let's seal off the top and make a big surface to get dirty!

Quick - find the Stop button. Too late, your burrito exploded. Thank god for the "vegetable" feature. I'm sure glad there's a "hold warm" and "light timer" feature. And WTF is "cook" ? I bet they could cram some more buttons on there and it would be even more deluxe.

You could do the stop button test again. Think about how much your hand has to move around the panel just to set it to bake at 350. There's just no fucking thought about usability. Touch pads like this in general are just horrible interface devices, and sadly are getting more and more common (see for example washer/dryer rant). Two knobs is the perfect interface for an oven. One for temperature and one for function (off/bake/broil) (physical radio buttons would also be okay for function).

WTF are you product designers thinking? Do you actually think you're doing good work? You're not. You're making shit worse. You should be ashamed. You should feel humiliated and miserable every day at work as you take good products and make them trendy or "modernize" them or make them slightly cheaper to mass produce.

There needs to be something like a hippocratic oath for product designers ; "First, don't make it worse".


09-11-11 | Walk to the Lake

In pictures :

Discovered this hole in my bathroom ceiling. The gray at the top is the bottom of the upstairs bathtub, so I assume this was a water overflow that soaked through, so they just cut out the rotten bits all the way through three layers of floor. So that was most of a day wasted fixing this, and then repainting the ceiling. The worst part was that the hole they cut was a ragged mess so I had to square it up, and the ceiling is old plaster and lath which is a pain in the ass to cut cleanly.

Leaving home now. Back yard is a nice place to sit. It's great to be able to walk down to the lake at the end of a long hot day of breathing plaster dust and paint fumes.

Fucking neighbor is running a drain pipe from his gutter onto my property. More annoying shit to deal with.

I feel like the blackberries have been especially good this year. Maybe it's just because I've moved to an area that has a lot more wild land than Cap Hill. In Seattle if you ignore a patch of land for a few minutes it becomes instantly covered in black berries. There's a pretty good patch of blackberries on almost every block around here, so you can take a stroll and snack as you go. I love just the smell of them, they make the air sweet and rich. I love how they come ripe at different times based on sun exposure and microclimate, so that the ripe season lasts over a month, you just pick from different sides of the block.

One of the trees on my parking strip is growing into the power lines. In Seattle it's the home owner's responsibility to keep their trees out of the lines. Texas and CA are not like that, the city or power company does it. Seattle also has no street sweeping (except right down town). And there are very few street lights (home owners are suggested to have a bright porch light).

On the way down to the lake now; there's been this congregation of sail boats next to I-90 almost every day this summer.

My local swim spot. Nice and un-crowded. Unfortunately there's also a sewage pipe near here (there seems to be one at almost every swim spot, including the official ones with life guards, I'm not quite sure why that is, I guess there's a mutual correlation that swim spots and sewer lines both tend to be on large patches of public land). ( see here for map )


09-11-11 | Consumer Choice

Capitalism just doesn't work.

When I went in to buy a washer+dryer a while ago, I told the guy exactly what I wanted : modern efficiency and quiet and good function, but no fancy features, no computers, just a physical knob and hard switches. Nope, sorry, that doesn't exist. Well, fuck, okay, can I try out the ones you have to see which has the least annoying fucking stupid computer? Nope, they're not plugged in. Well, fuck.

We went to buy a little inflatable boat the other day to paddle around the lake. Going in I thought - the main thing I'd like is for it to have a normal Schrader valve (like a car tire or old American bicycle) so that I can use the nice pumps I have instead of the shitty flimsy plastic pumps that they give you. Nope, not one.

How am I supposed to pick the product I want and steer the market when there's not one good choice?

I decided to just bite the bullet and pay Netflix $8 a month just to let me record movies that I'm interested in watching on instant some day.

Of course much worse is that Comcast is fucking me over and by government-regulated monopoly I have no recourse at all. They get to punch me in the face and I have to say "thank you sir can I have another? please don't take my internets away".

In other stupid product news, I've been constantly annoyed that my fucking TV insists on showing "Air/Cable" as an input option when I have nothing plugged in there. I have to toggle inputs between my computer & PS3, and sometimes I accidentally stop on "Air/Cable" , at which point I'm beaten about the ears with brutal static. Fucking god damn it. First of all, you know there's no signal, it fucking says "no signal" right there, maybe you could show me a silent black screen instead of audible static, hmm? okay? Second of all, when there's no fucking signal, maybe you could just disable that input option the same way you disable the other inputs when nothing is plugged in them hmm? Actually the worst case is when the PC and PS3 are both off and I turn on the TV, then it insists on showing me static and I can't change the input source at all. Anyway, the conclusion is that I'm buying an antenna just so I get a signal instead of static. Fucking hell.


09-10-11 | You Fuckers

If you have set your blog to not deliver a full copy of the post by RSS, I'm unsubscribing. You can't bully me into clicking through. Provide it in a way that's nice to your reader.

Fucking cars that beep their horn when they lock & unlock is my latest nemesis. A quiet little chirp from a special-purpose speaker is sort of okay, though really it should come from the *key* not the *car* and it should be much quieter, the purpose is for the owner with the key to hear it, not the whole world (and you know, really if the key just had a separate lock & unlock button (not a state toggle) and flashed the lights, then it could avoid making any sound at all, which would be preferable). But many car makers have fucking cheaped out in the shittiest pettiest way. They thought hey, we already have a noise maker for the horn, we don't need to add another one to do the lock/unlock chirp, we'll just beep the horn and save a dollar per car. WTF, not okay. Try sitting outside a coffee shop in a strip mall, it's a fucking chorus of honking horns as people get in and out of their horrible cars, BEEP, BEEP BEEP, BEEP, ack, WTF. What if you drove in near where someone was sitting and just honked your horn for no reason? Do you think that would be okay? No, of course not, it would be a huge dick move, fucking honking your horn in a parking lot for no reason, but that is exactly what you're doing. Fucking hell.

Car alarms are fucking infuriating because I shouldn't have to be writing this. We all heard the comedians in the 80's (the guys who did "toilet seat up" and "what's the deal with airline food" and "I can't program my VCR") who did the jokes about "what's the point of a car alarm? it's just to annoy your neighbors; you're not actually running outside to see a crook when your alarm goes off? am I right people?" and we all laughed and thought "ha ha he's right car alarms are pointless" - BUT THEN YOU KEPT BUYING THEM! Why !? You just laughed and recognized they serve no purpose but to annoy, so why do you keep buying them? Jesus christ.

(The worst case actually is riding the fucking ferries up here, where the vibration sets off lots of people's alarms, so you get a massive racket in the ferry, and then you get the loudspeaker doing "will the owner of a blue BMW please turn off their alarm", several times per trip)

Jet skis on lake washington are getting annoying. It's not that jet skis are inherently evil, it's that they attract massive douche bags. Jet ski owners (like Harley douchebags) seem to think it's cool to run them un-muffled making way too much noise. And of course they won't just go out to the middle of the lake, they have to come buzz the shore to show off, oo look at you on your douche-mobile, speeding around way too close to kayaks and sail boats, buy a fucking muffler and get away from non-motorized vehicles you fucking dick.

Sometimes when I'm sitting by the lake I imagine what it would be like if there were no motor boats at all on the lake, only sail boats. Delightfully peaceful and picturesque. What if there was a green parkway all around the lake. What if there were cafes with outdoor tables and chairs. What if Seattle didn't dump its sewage in the lake? I feel like Seattle is one of the more naturally beautiful urban settings in the world, but we sort of waste it.


09-06-11 | Connecting Wires

This info was a bit hard to find, so summary :

Basic connection of "solid" (typically 14 gauge) to solid wire (by solid I mean single core) : strip a good inch of each. Lay metal bits side by side, grab the tips with pliers and twist clockwise (same direction you will screw on the wire nut). The twisted wires should appear to be like a threaded screw in the direction the nut will tighten on. If after twisting the ends are not neat, don't try to crimp with pliers, instead snip off the forked bit down to a neat nub with wire cutter. Screw on wire nut to hand tight. Wrap electrical tape clockwise (tightening the wire nut, not loosening) around the nut and then around the wires.

Connecting "stranded" to stranded (stranded = many small cores making up the wire). Strip about an inch or slightly less. Lay both wires side by side and fan them out flat (undo any twists). Mush the fanned out wires together to make like one big stranded wire. It may help to tape the insulated part together just to hold the wires in place as you do this. With your fingers gently twist the big stranded wire clockwise. This is just so it forms a neat tip, not to create threads. Screw on wire nut & tape.

Connecting stranded to solid : this is by far the weakest of the three connections, and ideally you would use solder and/or crimp connection or perhaps a screw terminal block. Another option is to solder the end of the stranded wire to make it effectively a solid end, then wire nut it. But assuming you don't want to do any of that, you do this : strip stranded wire to 1.25 inches, solid to 1 inch. Twist stranded wire by itself to give it a neat solid end. Lay the two wires side by side with the stranded extended slightly past the solid, 1/8" or so. Tape the insulation of the two wires together just to hold them together. Do not twist the wires with each other. Screw on wire nut, then wrap in tape.

In all cases you can test the connection by giving a little tug, there should be no feeling of looseness (and if the little tug was enough to wreck it, it was no good). Wire nut connections only work great between wires of roughly the same gauge.


09-03-11 | Bullshit

WTF is up with "flat feet" ? It's very strange that back in the early 20th century, when half the world went to war, and every able bodied young man was sent out to die, you could avoid all that just by having a low arch. It seems like a scam, like maybe Bush the First had flat feet and it was a way to get out of WWI, or perhaps it was some voodoo animist belief that the flat-footed were bad luck, I dunno, it's very strange. Could I get out of war service because of a deviated septum? it does make it hard to breathe. What about bunions? WTF, why flat feet?

Nails are fucking bullshit. When you build something with nails you're basically saying "I expect this to pull apart in 1-5 years". Screws are slightly better, but still just a friction bond that can easily wiggle free. If you actually are building shit to stay together, you use bolts or proper wood fittings (dovetails or dowels or such like).

Of course nails do have their place - as temporary non-load-bearing tacks to hold together a structure so that the major pieces can bear the load. When a house is framed, for example, nails are used to hold the boards together, but the nails are not expected to bear the loads, they just put the boards in the right place and then the loads are tensile and compressive through the boards. It's similar to the role of rivets in an iron bridge - they should not be load bearing, they just ensure that two I-beams meet up correctly, and the load is all in the beams.

Anyway, the problem with nails is that people don't understand them (and/or get lazy) and use them incorrectly. This goes not just for DIY'ers but also contractors and expensive home builders, who wind up doing things like building railings and fences that are held together by nails such that when you lean on them the force acts to push the nail out.

Shitty nail-built bullshit.

All the shopping at Lowes / Home Depot / etc. really has illuminated my understanding of the depressing shitty quality of modern American construction. All your contractors shop at these places, when you go to your average shitty tract home, most of the home is this stuff. There just isn't a single good quality piece in the place. It's all asian-made super cheapo crap. The light switch covers are bendy plastic. The yard tables are wobbly thin metal. All the pieces are just shitty. Even if you're a contractor who wants to do better work - where do you get your supplies? There is no better choice. It's not like there's the cheap shit and then some more expensive actually well made choices, no, it's just not there at all, there is no well-made choice, you only get shit.

Drills are fucking bullshit. You take this long proboscis and of course you can't possibly line it up straight. WTF , there should be a flat metal plate around the bit which is attached by pistons to the drill body, so that you can set the plate flush and get a good perpendicular hole.

But maybe it doesn't matter because wood is fucking bullshit. You might think if you buy a 2x4 you get a piece of wood which is 2x4. Nope. It's actually 1.5 x 3.5 , because the nominal size is measured before drying and planing. But it's worse than that, if you buy a bunch of 2x4's, they will all be slightly different sizes, so if you try to make a flat tray from them or something it will be all uneven. But even worse than the variation in sizes, they will be warped and twisted and all out of whack in ways that make clean building seem absolutely impossible to me. I suppose higher grades of wood are probably milled more uniformly.


09-02-11 | Old Wiring

I'm replacing a bunch of our 2 prong outlets with 3 prongs. For my computer, I'm going to try to actually ground it properly, but for the rest I'm just leaving the ground attached to nothing.

All the electrician manuals say if the "receptacle" (that's the fancy name for "holes") is not actually grounded then you must use a two prong outlet, so that the user doesn't think it's grounded when it's not. Oh noes! You lied to me about grounding! What the hell am I going to do if I have a 3 prong dealy to plug in and only 2 prong outlets, I'm not going to say "oh well, I guess I can't use this because I don't actually have a ground", I'm just going to use one of those little adapter dealies. (*)

Oh, and that little tab on the adapter where you can run your own ground is a huge lol. Yeah right, adapter, like I'm going to plug you into the wall and then run a wire outside the wall out my window and hammer in a 6 foot iron spike so I can be properly grounded.

I mean, when the fuck do you need grounding anyway? Our electric service is not sending massive lightning surges into the house at random intervals.

The worst thing about the two prong holes is just that plugs don't stay in them. The grounding prong is really just to hold your fucking plug in the holes. The only time I've ever seen scary sparks from receptacles is because of the two prong dealies not making good contact and pulling halfway out and bending the prongs and so on.

Our house has a mix of old knob & tube wiring and newer stuff. Any electrician who works to code is not allowed to touch the old stuff, their only allowed action is to replace it. So I either have to do the work myself, or hire someone who will work under the table.

It turns out that hiring people to work under the table is not actually hard at all. So far I have yet to encounter a single contractor who insists on working to code; in fact they all say something like "I could do this to code but it will cost you 25% more". Okie doke. (I imagine some national chain guys would insist on doing it by the book).

* = you see the same sort of daft behavior from library writers all the time. They think they are being "rigorous" and "safe" by not providing a function to the user which is maybe a bit dangerous to use, or maybe doesn't do exactly what you would hope it does, but it's retarded. What do you think the user is going to do? Not write code in a way that needs that function? Pfft, of course not, they will just write their own version of the facility you failed to provide, but their version will be *much worse*, so by being "safe" you have actually made the final product worse.

On a semi-related note, I just got a washer & dryer, and of course the delivery guys won't hook up the gas. It's funny/ironic that because of liability fear, they won't help you with the bit that's actually dangerous and could use the touch of someone with experience (the main issue is knowing how tight to tighten the fittings).

ADDENDUM : well I tried to ground the computer outlets, but it was way too much trouble. I would need a flex drill extension, which is not that hard, but then I would have to worry about what exactly it is that I'm drilling through inside the wall that I can't see (maybe some dental mirrors would make that safer), then after that there's a hole you have to try to thread a wire through it from outside the wall. No thank you. Grounding is over-rated.

Some A-hole at some point ran new romex into the walls, but didn't hook up the ground line of the romex to anything, and then installed two-prong receptacles. That's legal code and all, but it's fucking shitty. It means I can put in a three-prong receptacle and hook up to the romex but I have no ground. And I don't know where the other end of that line is to get to it and try to attach it to something. All the lines go through the basement so it would have been easy to just run it over to a water pipe. They put in a new duplex box so they had the wall open and could have done it but didn't. Fuckers.


08-30-11 | cb's guide to using the Comcast self-install

Comcast now offers self-install for cable modem, which is awesome. Unfortunately, their suggested process is not awesome. My guide :

1. Call in (or use the online live chat) to sign up for your new service. They will tell you to go to your local comcast store and buy a self-install kit. Don't. They will tell you to call in to tech support to speak to someone when you do the self-install. Don't.

2. Just plug in your cable modem, router, and computer.

3. Open a web browser. Any connection you try to make should automatically be showing the Comcast self-install page.

4. Click through various obvious prompts until it says "Activating... (please wait 10 minutes)". Wait for a long time. For me, this page never moved on, so if that happens to you proceed to the next step :

5. Close the browser and re-open it. You should get a prompt about "resume activation in progress". Select "create a new login" and fill in the blanks.

6. You will now get a page that says "continue to download the Comcast desktop package" ; there's no option to not do it. Go ahead and click continue and you will get a download popup window with the "Run , Download , Cancel" options. Hit Download or Cancel (not Run!!), and then just don't install it. This software is nasty malware that tries to install a toolbar and change your home page and all that shit.

7. Close the browser and open it again - you should now be live on the net.


08-30-11 | Big Pile of Junk

I dunno why I thought a home you buy would be clean; of course not, it's not required in the contract so only a sucker like me would do it (cleaning).

This is what we (or rather, our employees) pulled out of the house :

There was some crazy junk up in the attic and garage from the 40's / 50's. Lots of old metal bits that looked like turbine parts or something. I should've taken more photos but I was in a crazed rush with all the shit I was trying to get done before move-in.

Another thing that's obvious that I didn't realize is that between the inspection and key exchange, many sellers badly neglect the house. No yard maintenance, it might just sit vacant and dirty; sometimes they even actually trash it. It seems to me that a last minute walk-through before signing the closing papers would be a good idea, but nobody does that.

Anyway, paint is up, house is clean, lots of little shit left to do still, but I think the crazed part is over. I intentionally bought a house in very good shape to avoid living in a "fixer upper", but there's still just so much little shit to do.

Of course a lot of the problem is that I find myself succumbing to "home improver's disease". I find my eyes just scanning the room, and when they alight on something that isn't right the thought "I should fix that" pops into my head. All that little shit that you would just ignore in a rental like "those cabinet pulls are really ugly" or "that toilet paper roll holder is kind of broken"; my eyes won't just scan over it and keep moving, they get stuck, like they have friction and just catch up on it and my brain mentally sticks it on the todo list. This is a real disease that I have to resist.

A home is a bit like game development in that the todo list is essentially infinite. For games I've always liked to categorize todos into three groups : 1. must do , 2. would really like, and 3. wish list. You work on tasks in strict priority order generally, first all #1's , then only do #2's when there are no #1's. When new tasks come up you do an initial categorization, but you have to be flexible and move them up and down the list over time as things change (mainly you move them down the list as the #1's get too numerous).

The reality of game development is that you basically never get to any of the #3's, so in fact #3 is just a way of writing down something that you won't do (though it could move up the list under later scrutiny). And in fact you won't do most of the #2's either.


08-23-11 | The Locker Room

The locker room is the most terrifying place for the awkward male. I'm not talking about people who are scared of showing their penis, that's a very ordinary and easy and boring fear. I'm talking about the social awkwardness of it.

The big problem is the two archetypes of locker room monster :

There's the "never nudes". These guys shower in their bathing suit, then when they have to change, they huddle into a little ball in a corner and quickly slip one thing off and another on (or use a towel to cover up even then). They're tense and nerdy and you don't want to be like them.

Then there's the way-too-comfortable guys that walk around totally naked for way longer than is necessary to do their business of changing. The worst of these is the old closet gay guys. They're usually 60+ , have a giant white bush that thank god has overgrown whatever cock they once had, so that they now just have an androgynous mound of hair (and yes, I have looked at old man bush, you can't really avoid it when they come up to you and talk to you when you're sitting down and they're standing completely naked; okay old man, you got me to look at your bush, now go away).

So you have to try to straddle the line between too much showing and too little showing, which creates the painful awkwardness. The ideal man should not be afraid of anyone seeing his naked body, but he also shouldn't inflict it on those who don't want to see it, nor should he make himself the eye candy of perverted old locker room prowlers.

Shower? Definitely bathing suit off. On is way too anal. But mostly facing the shower head. Occasionally turn around to face the room just to show that it's no big deal. Only the absolute minimum of penis cleaning. But don't avoid touching it altogether either.

The walk to the locker is a difficult moment. To wrap the towel or not? Less than twenty feet away - no wrap. Twenty feet or more - wrap.

Dressing - reasonably quick to get the underwear on, but not a frantic dash. Don't stand in a corner, but also don't face the room.


08-23-11 | Painting

People who paint their house themselves are making a mistake. When you count the materials, the tools/equipment, all the time to buy things and learn how to do it (*), and then do it, it's clearly a huge net loss even if you make almost $0 per hour (by almost $0 I mean less than $50). But most of all, you do a shitty job. (* = most people don't actually spend much time learning how to do it, they think they can just slap some paint up however). When shopping for houses, I would say a good 90% had some DIY painting and it was almost always epically shitty. Streaky with obvious brush marks, too thin with the previous color showing through, not primed right and peeling, or perhaps most commonly, not prepped right, so they just painted right over nails and tape and bad splotchy patch jobs without prepping the walls at all. When you depreciate the value for how shitty the job is, the DIY paint job is worth about as much as a kick in the nuts.

In general, the whole Home Depot / DIY movement is a real fucking tragedy. Not just for the peace and quiet of neighborhoods, nor just for the quality of dinner party conversation, but most of all for the innocent houses which have your shitty amateur work inflicted on them.

I've been using day labor painters, but I don't particularly recommend it. I think it was good for me, because I got a lot of really nasty clean-up work done as part of the paint prep (which normal pro painters would refuse to do), but if it was just for the painting, not so great. The main problem is that you have to be there to supervise all the time, which is a massive time cost.

In general, I hate dealing with anyone who is paid hourly. I like to pay by the job, and if you take longer, or if you fuck it up and have to redo part of it, then you should get paid *less* for inconveniencing me, not more!

Anyway, some random painting tips to self even though I swear not to do this ever again :

1. Buy way more of everything than you think you need. Way more. I mean *way* more. You think you need 2 rolls of blue tape, buy 20. If you don't use them, you can just return them. If you don't you will wind up having to run to the store to buy more of something, which is a huge time waster. Buy lots of brushes, tools, equipment, all kinds of shit that you think you probably won't need, just to have it on hand in case you do.

2. Cover all the floors. I know you think you are saving time by only covering the floors near the walls you are painting, but you will get paint in places you don't expect, and then spend way more time cleaning it up than if you just covered all the floors.

3. When buying more of a certain paint, make sure you check the code # on the can. Don't just buy more "Brand X latex white" , because there might actually be 4 variants of that which are not clearly labelled as being different. Every paint can has a code to uniquely identify it.

4. IMO avoid oil-based paints and primers for interiors. The thickness/durability benefits are not really worth the cleanup pain vs. modern good water/latex based paints. (painting a boat or some such shit might be a different story)

5. Foam brushes are good for tiny touch-up jobs. For painting edges of walls, or anything where you are applying a decent amount of paint, a good quality bristle is the way to go.

In general the mistakes are so predictable and obvious and yet you will almost certainly make them : eg. time saved by doing less prep costs you more time in the end, money spent on cheap equipment is lost back in wasted time, etc.

Also in general, I think it's very fucked up that so many home owners take all the time to learn to do this shit and DIY it; WTF; isn't this what civilization and capitalism is for? specialization for increased efficiency? where did it go wrong?


08-22-11 | Dicks are Rewarded

Sitting out in the lovely warm evening at a restaurant the other day, I beheld this scene :

The outdoor tables are somewhat limited and obviously desirable. A few people are sort of milling around the hostess in a disorganized line when a table becomes available. One of the guys just walks out of the line and sits down at the table (you're supposed to wait to be seated). His girlfriend doesn't follow because she knows it's wrong and he says "come on, just sit down, we'll tell the hostess we took it".

The point of this story is not what happens next (spoiler : they get the table and the worst consequence to them is some lifted eyebrows). It's that the *worst* possible outcome would be if someone said something and they had to go back in line like everyone else. That makes it only +EV, there's no -EV line. The waiters and hostess are generally too accommodating to say anything, everyone else in line is too much of a pussy, so even the zero-EV line rarely occurs for the violator.

Once in a rare while in these kind of situations I will say something; I was in sort of a vague line at a grocery store the other day and someone cut in front of me and I was like "umm, there is a line" and they got in - but that's just the outcome that they would have gotten if they were not a dick!! it's no penalty, there is only reward for being a dick.

I guess taking 20 items to the 8 item or less line is the simplest example; I have witnessed it many times and never once have I seen any checker or other patron say anything, and even if they did it would just mean waiting in line like a normal person. To really make things right you need to take his groceries and smash them in his face. People would call you "psycho" or something but in reality you're just trying to ensure the probability-weighted cost-benefit of being a dick is negative.


Not exactly the same, but related, I'm very annoyed/jealous by/of people who can be dicks without even being aware of it. They take so much benefit from the world because dicks always win (or at least break even).

The other day I was sitting in the park enjoying another lovely warm evening when some guys started playing football. They were outrageously bad at it, like just throwing long bombs with absolutely no control, running all over the place right through other people. Everyone was giving them the stink eye, but they were just laughing, loving it.

You bastards. When I play sports in the park I am hyper aware of all the people who are annoyed by it, and I'm super careful to carve out a little patch where it's highly unlikely that an errant throw will impinge on someone's peaceful park sitting. And even though I am super careful and considerate about that, I still get stink-eyes from dumb fuckers who decide to sit and read right in the middle of the play field, curse all of you people and your indiscriminate stink-eyes.


08-22-11 | Rambling

Usually the most annoying thing about dogs at the park is actually their owners. The dog is behaving just fine, but the owner is going on and on in a string of blather like "come here choo choo , sit choo choo, good boy, isn't he so cute? get your stick choo choo" , I want to walk up to the owner and go "sit human, quiet human, good human".

I'm generally anti-"experientialist" (that is, the school of thought that you can't have a valid opinion on something without experiencing it first hand) but I'm pretty sure that all the anti-immigration nut-jobs have never actually talked to an immigrant in a serious way.

I fucking hate call centers that automatically route me to a regional call center based on my phone's area code. They never warn you that they are doing so, or give you a chance to opt out of it. Or god forbid, look up my phone number's account info and fucking see my address. Oh no, I have to be like 10 minutes into describing my problem to the fucking wrong state guy and he's like "oh, you're in WA, let me transfer you..." no!!

Also, fuck you call center people who can see my phone number but ask me for it anyway. I've only discovered this because I like to give people my Google Voice number, but I'm calling from my cell phone, so the caller Id shows something different. They say "what's your phone number?" and I say "XXX" and they say "I see you're calling from YYY" and I'm like WTF, first of all, yeah, so what, I told you XXX just fucking write it down you monkey, don't question me, and second of all, you could've just started with, "can I use the number you're calling from?" that would save the average person a lot of time.

Also, if you are fucking confused by the fact that my phone area code is not a Seattle area code, then I have some "it's so hard to program a VCR" jokes that you might like.

I enjoyed the Stieg Larsson books, but I was disappointed by the cliff they went off after the first one. I thought the first was a good beach page-turner that was sort of remarkable for how simple it was, most of the book is about researching in files of paper, and it's very sort of compact and old fashioned, despite how he constantly says it's "not a locked room mystery" , of course it is sort of. But then after the first book it becomes ridiculous super heroes and super villains and stupid action, still a good page turner, but without the quiet charm.

I enjoyed "A small death in Lisbon", but I was a bit disturbed by just how much relish the author seems to take in his characters' more prurient behavior. It felt a bit like watching someone masturbate to scat porn. (hmm, not that ever watching someone masturbate is a good thing... my analogy is a bit off...)

"The Shadow Line" is like so fucking great right up until the big reveal at the end. I was so excited to see a cop show that's actually about a big conspiracy, some intrigue. God I am sick of the minute fucking clinical analysis of these boring ass petty crimes that are on all cop shows these days. Give me big government schemes, underworlds, shadowy figures, and skip the fucking CSI pseudo-scientific bullshit. I think the casting is also superb, particularly Rafe Spall as Jay Wratten just absolutely gives me the creeps; actually everyone in it is awesome except for the females who are pretty uniformly terrible. Anyhoo, it's all going great and then ... WTF ? This is what it's all about? Are you fucking kidding me? So disappointing.

BTW quick modern BBC cliche cheat sheet :

Black guy with white female.

Females who act super cold/professional/masculine/emotionless (but then occasionally turn to mush).

Random/inappropriate yelling in bureaucratic meetings.

I'm not sure if the yelling started with Waking the Dead, or maybe Gordon Ramsay? I dunno, but Brits really like to watch a man lose his temper. Grow a fucking sack, Britain. Trevor Eve (or Chiwetel Ejiofor, or Idris Elba, or whoever you have yelling at you on this hour's show) is not your daddy and he's not going to tell you to straighten up and fly right. Is this your idea of admirable behavior? Who thinks this is a good leader? And it's so outrageously unrealistic. The person receiving the yelling always just sort of sits and takes it with a look on their face that's either like "mmm, saucy" or "well I've really been straightened out".


08-21-11 | Offroading

As usual the mainstream press is completely moronic about actual offroading. They usually test offroaders on slippery grass or some nonsense and talk about the 4wd traction. Let me tell you what actually matters :

(* I should note up front that I am not talking about "severe" off-roading, like boulder-climbing or some such, which no production car is suited for, I'm talking about un-graded dirt/rock roads and such; I'm also assuming you're not doing something retarded like going off-roading when it's muddy, which is not so much foolish because you might get stuck, but ass-holish because it destroys roads and erodes hillsides)

1. Reliability. By far the most important thing is to have a vehicle that won't break down in the middle of nowhere, where cell phones won't work and AAA won't come. Above all else this is why the Toyota Truck is the greatest off-roader of all time (and the Honda Civic is the #2 off-roader (not really, but the point is that a reliable car with worse capabilities is much better than a very capable car that could crap out at any time)). Shit like old Land Rovers or International Scouts or what have you are actually terrible off roaders due to high probability of crapping out.

2. Comfort. This is one that I never see mentioned, but in fact the limiting factor on most bad roads is just your comfort. Most bad mountain roads are not impassable - in fact you could probably make it in a Honda Civic or some such, but when they are rutted, wash-board, rocky, pot-holed, it will be back-breaking hell and you will have to go 5 mph in a Civic. To realistically be able to go 50 miles on a bad back road, the most important thing is comfort, you need long travel suspension and a very soft ride. (BTW longer travel is almost always better with suspension). This is actually a bit hard to find these days as everything has gotten "sporty". I'm not sure what the great choices are for this, since the retarded car journalist corps doesn't correctly review offroaders for comfort.

3. Clearance. Far more important than 4wd or any such nonsense is clearance. For fording streams, getting over logs, big rocks, the things that will actually stop you - clearance is king. Cars with good 4wd but terrible clearance are not really great offroaders (yes, that means you, subaru).

Traction and power almost never come in to it for real-world offroading. When it does, it's better to have a Honda Civic with a good winch than a 4x4 without one.


08-18-11 | Rainier Watch

"Sunrise" is peaking right now; it looks like this from afar :

It's sunny, the snow is in patches now, the wild flowers are out in force, carpeting the hill sides in blankets of colors, the meadows are covered by the explosion of spring growth that pops out of the frozen earth.

Go right now, go Friday (the weekend up there is a nightmare), by Monday it may be too late. There are some really great scrambles in the Palisades area, best scrambles of my life, technical, fun, great views for rewards, email me if you want details.

(and no, the real mountain doesn't have "autopano" written on it)

A couple of "fuck yous" to the world :

Fuck you park service, who after hiking 4 miles out away from everyone, sticks all the camp sites right on top of each other. Yeah yeah I get why they do it so save your smart-ass-but-not-actually-smart comments.

Fuck you lazy fat tourists who can't be bothered to go more than half a mile from their cars, but do just love to go off trail and stomp around in the delicate alpine meadow with their fat ugly asses in their unnecessary trekking suits.

Fuck you REI for sticking your giant logo on everything so that I stare right at it as I go to sleep and it subliminally burns into my fucking brain. God dammit I want to see nature, not advertising, it's like fucking Times Square out in the woods these days with the brands on everything. And ..

Fuck you REI for making all your shit bright orange or whatever other heinous color. I guess it's intentional so it's easy to see, but it makes a couple of tents by a lake so much more of an eyesore than it needs to be. If your shit was just green and brown the woods would look so much better.

.. oh I can't really do the "fuck yous" justice right now. The world is a beautiful place out there in the wild.


08-15-11 | More finance

I've gotta get off this topic soon because it's a huge distraction and also just very depressing. The sad thing is that for most of us, the most +EV life move is just to close our eyes and say "la la la" and pretend we don't know about any of this.

It's funny that Geithner was sold to us as a government technocrat who wasn't a "financial insider" when nothing could be further from the truth. While it's true he has worked in government, he is deeply connected to Citi and the whole modern bubble. He rose under Summers and Rubin, who are responsible for Gramm–Leach–Bliley that deregulated finance (as well as lots of other deregulations during that time, such as keeping derivatives off book, removing SEC oversight of many institutions and lowering reserve requirements). Geithner was offered the CEO job of Citi but turned it down. Geithner's appointment to the NY Fed was backed by Citi (the NY Fed was amusingly appointed by the banks it was supposed to regulate, it's long been corrupt by design).

Woodward -- Behind the Boom
What’s behind the ICE arrests of 30 after an immigration raid in Ellensburg, WA (Courtesy of NNIR) « El Comite Pro-Reforma M
What's Obama Doing to Your Taxes - Political Hotsheet - CBS News
What Barack Obama Needs to Know About Tim Geithner, the AIG Fiasco and Citigroup The Big Picture
U.S. Says Rendition to Continue, but With More Oversight - NYTimes.com
The Worden Report Pay No Attention to the Banker behind the Curtain
The Worden Report An Institutional Conflict of Interest at the New York Federal Reserve
The Washington Monthly
The Fed And The Treasury Had A Funny Way Of Guilt-Tripping Sheila Bair
The Big Picture 2
The Big Picture 1
TaxVox » Blog Archive » Why Nobody Noticed Obama’s Tax Cuts
Steamy History of the American Economy during the Clinton Administration
Office of U.S. Trustee vs. Harmon - Witness Timothy Geither
Long-Term Capital Management - Wikipedia, the free encyclopedia
How Citigroup Unraveled Under Geithner’s Watch - ProPublica
Goldman Connection at NY Fed Major Conflict of Interest - Seeking Alpha
F.B.I. Giving Agents New Powers in Revised Manual - NYTimes.com
Deportation of illegal immigrants increases under Obama administration
Deficit and Spending Increase Under Obama - WSJ.com
Daily Kos Taxes lowest in 60 years, thanks to Democrats and Obama


08-14-11 | A note on convex hull simplification

I wrote this in email and thought it worth recording.

A while ago I wrote a post that was mainly about OBB algorithms, with a little note about convex hull simplification.

It's a little unclear, so I clarified :

My algorithm is very simple and by no means optimal.

I construct a standard (exact) convex hull, then make a mesh from it. I then run a normal mesh simplifier (see for example Garland Heckbert Quadric Error Metrics) to simplify the CH as if it was a mesh. This can ruin inclusion. I then fix it by taking all the face planes of the simplified mesh and pushing them out past any vert in the original mesh.
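The "fix it up after" step is the only part that isn't off-the-shelf, so here's a rough sketch of it. This is not the Galaxy4 code; Vec3, Plane, PushPlanesOut and Contains are names made up for illustration, and I'm assuming each face plane is stored as an outward normal n and offset d with "inside" meaning dot(n,p) <= d.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal types, just for this sketch.
struct Vec3 { double x, y, z; };

// Assumed convention : n points out of the hull, a point p is inside when Dot(n,p) <= d.
struct Plane { Vec3 n; double d; };

inline double Dot(const Vec3 & a, const Vec3 & b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// Restore inclusion : slide each face plane of the simplified hull outward
// along its normal until no vert of the original mesh is in front of it.
// Planes only ever move out, never in.
inline void PushPlanesOut(std::vector<Plane> & planes, const std::vector<Vec3> & origVerts)
{
    for (Plane & pl : planes)
        for (const Vec3 & v : origVerts)
            pl.d = std::max(pl.d, Dot(pl.n, v));
}

// True if p is on the inside (or within eps) of every plane.
inline bool Contains(const std::vector<Plane> & planes, const Vec3 & p, double eps = 1e-9)
{
    for (const Plane & pl : planes)
        if (Dot(pl.n, p) > pl.d + eps)
            return false;
    return true;
}
```

After PushPlanesOut, Contains should be true for every vert of the original mesh. It's O(planes * verts), which is fine for a sketch; you only really need to test the verts of the exact hull.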

Stan's (Melax - Convex Hull Simplification With Containment By Successive Plane Removal) way is similar but better. He uses a BSP engine to create the hull. First he finds a normal convex hull. Then he considers only the planes that make up that hull. The working hull is the volume that is on the "front" side of all planes. He then considers removing planes one by one. When you remove a plane, the cost to remove it is the volume that is added to the hull, which is the volume of the space that is on the back side of that plane but is on the front side of all other planes. You create a heap to do this so that the total cost to simplify is only N log N. This requires good BSP code which I don't have, which is why I used the mesh-simplifier approach.

An alternative in the literature is the "progressive hull" technique. This is basically using PM methods but directly considering the mesh as a hull during simplification instead of fixing it after the fact as I do. Probably a better way is to use a real epsilon-hull finder from the beginning rather than finding the exact hull and then simplifying.

My code is in Galaxy4 / gApp_HullTest which is available here ; You should be able to run "Galaxy4.exe hull" ; Hit the "m" key to see various visualizations ; give it a mesh argument if you have one (takes .x, .m , .smf etc.)

BTW to summarize : I don't really recommend my method. It happens to be easy to implement if you have a mesh simplifier lying around. Stan's method is also certainly not optimal but is easy to implement if you have good BSP code lying around (and is better than mine (I suspect)).

The technique I actually prefer is to just use k-dops. k-dops are the convex hull made from the touching planes in a fixed set of k directions. Maybe find the optimal OBB and use that as the axis frame for the k directions. Increase k until you are within the desired error tolerance (or k exceeds the number of faces in the exact hull).
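A minimal sketch of building a k-dop from a point set, under the same assumed plane convention (outward direction, inside means dot(dir,p) <= dist). KDop and BuildKDop are hypothetical names, and choosing the k directions (eg. from the OBB frame) and the "increase k until the error is okay" loop are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical minimal vector type, just for this sketch.
struct Vec3 { double x, y, z; };

inline double Dot(const Vec3 & a, const Vec3 & b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// One touching plane per direction : inside means Dot(dirs[i],p) <= dist[i].
struct KDop
{
    std::vector<Vec3>   dirs;
    std::vector<double> dist;
};

// For each of the k fixed directions, find the furthest vert along it;
// that support distance defines the touching plane.
inline KDop BuildKDop(const std::vector<Vec3> & dirs, const std::vector<Vec3> & verts)
{
    assert(!verts.empty());
    KDop kdop;
    kdop.dirs = dirs;
    kdop.dist.assign(dirs.size(), -std::numeric_limits<double>::infinity());
    for (std::size_t i = 0; i < dirs.size(); i++)
        for (const Vec3 & v : verts)
            kdop.dist[i] = std::max(kdop.dist[i], Dot(dirs[i], v));
    return kdop;
}
```

With the six +/- axis directions this just gives you the AABB; adding more directions clips off more of the empty corners. Containment is automatic since every plane touches the point set from outside.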

ASIDE : I have some BSP code but I hate it; I hate all floating point geometry code. I love integer geometry code. The problem with integers in BSP's is that clipping creates rational points. Maybe I'll write some BSP routines based on rational Vec3's. The only big problem is that the precision requirement goes up with each clip. So you either need arbitrary precision rationals or you have to truncate the precision at some maximum, and then handle the errors created by that (like the truncated point could move onto the back side of a plane that you said you were in front of). (this is better than the errors in floating points, because at least the truncated point is at a definite well defined location, floating points move around depending on how you look at them, those wiggly bastards) (I'm tempted to say that they're like quantum mechanics in that they change when you measure them, except that they really aren't at all, and that's the type of pseudo-scientific-mumbo-jumbo that pseudo-intellectual fucktards love and I so despise, so no, I won't say it).


08-12-11 | A review of Obama so far

It's amazing to think back to the Obama election and how ecstatic so many people were, and how sad the reality has turned out to be. I think Obama has got to be the most disappointing president of all time. Let's try to run through the score card and remind ourselves of what's happened.

1. Military Policy. No significant de-escalation of American involvement abroad. No significant cuts of spending on programs which are irrelevant to the modern world (such as large weapons programs). No merging of services to reduce waste. Continued extra-legal prosecution of war, such as assassinations of people in non combat zones.

2. Immigration. One of those sad cynical things that came out of the GWB white house was using 9/11 as an excuse to crank up enforcement of illegal immigration under DHS ; obviously the real motive was always getting them dirty Mexicans out of our country, but it was politically difficult before 9/11 ; so the anti-Mexicans saw a great opportunity to sneak it in as "protecting our borders from terrorists". Unfortunately this has only gotten worse under Obama, including breaking the veil between INS (ICE) and law enforcement; historically law enforcement had an agreement to not enforce immigration because they wanted immigrants to be able to interact with the police on normal legal issues, but that is now gone.

3. Domestic spying. Only continues to get worse. The FBI was recently given new powers to snoop without warrants; ever since 9/11 the FBI has amped up its monitoring of clearly non-terrorist organizations, claiming that all sorts of groups like environmental activists are "domestic terrorists" (hint : corporate property damage is not "terrorism"), and under Obama that just got massively worse, we're basically back to Hoover levels of sneaky FBI that can now monitor anyone they want without probable cause. Obama continues going around FISA. Obama continues to use "state secrets" and national security defences to make unquestionable arrests. Abroad, "Extraordinary rendition" (aka passing the torture buck) continues.

4. Taxes. No indication that Obama will get realistic on this or close any loop holes or the sweetheart deals for various special interests (like financiers). The GWB tax cuts seem to be near permanent at this point, except maybe on the very richest bracket. Amusingly, the majority of Republicans believe that Obama has raised taxes (he hasn't - in fact he cut them for the vast majority of Americans).

5. Health Care. This was a small step in the right direction, but completely lacking the teeth to make it actually work (which requires government control of the outrageous costs of the private medical industry). Sadly "Obamacare" is really just another case of what Dems and Reps have been doing for many years now - creating government laws that guarantee massive profits to private industries, force you to use private services without controlling their costs. (and it looks like this trend will only get worse in the next N years as there is talk of creating publicly-guaranteed private-profit-takers out of the Mortgage Macs, for social security, etc.; this is just the most beloved form of modern "reform")

6. Accountability. Fools thought some of the lawsuits and investigations into the questionable acts of the GWB administration may have gotten some help or at least been allowed to proceed, but they were not. The opacity of the executive branch started by GWB continues and perhaps has only gotten worse.

7. Funding for basic services. This is one of the saddest things to me and something that often gets lost in the shuffle of all the other disasters. The sort of very basic things that I think everyone except real nut jobs agrees that government should be doing are getting cut because they're the easiest things to cut. Obama is doing nothing about the big costs (Medicare & Defense), but is supporting cuts to "Non-defense discretionary spending". That sounds like just bureaucratic waste, but in fact "non-defense discretionary spending" is the meat of what makes government good and useful - it's education, housing, social programs, etc. And it's even worse at the state level. Here in WA we are cutting ferries, buses, homeless programs, mental illness programs, health care for the poor, animal shelters, etc. you name it, it's being slashed severely.


08-12-11 | The standard cinit trick

Sometimes I like to write down standard tricks that I believe are common knowledge but are rarely written down.

Say you have some file that does some "cinit" (C++ class constructors called before main) time work. A common example is like a factory that registers itself at cinit time.

The problem is if nobody directly calls anything in that file, it will get dropped by the linker. That is, if all uses are through the factory or function pointers or something like that, the linker doesn't know it gets called that way and so drops the whole thing out.

The standard solution is to put a reference to the file in its header. Something like this :


Example.cpp :

int example_cpp_force = 0;

AT_STARTUP( work I wanted to do );


Example.h :

extern int example_cpp_force;

AT_STARTUP( example_cpp_force = 1 );

where AT_STARTUP is just a helper that puts the code into a class so that it runs at cinit, it looks like this :

#define AT_STARTUP(some_code)   \
namespace { static struct STRING_JOIN(AtStartup_,__LINE__) { \
STRING_JOIN(AtStartup_,__LINE__)() { some_code; } } STRING_JOIN( NUMBERNAME(AtStartup_) , Instance ); };

Now Example.obj will be kept in the link if any file that includes Example.h is kept in the link.

This works so far as I know, but it's not really ideal (for one thing, if Example.h is included a lot, you get a whole mess of little functions doing example_cpp_force = 1 in your cinit). This is one of those dumb little problems that I wish the C standards people would pay more attention to. What we really want is a way within the code file to say "hey never drop this file from link, it has side effects", which you can do in certain compilers but not portably.
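
For reference, here is a minimal single-file sketch of the trick, assuming hypothetical helper names (SKETCH_JOIN, SKETCH_AT_STARTUP, sketch_registry) that are not from the original code. It collapses Example.cpp and Example.h into one translation unit just to show the mechanics; in a real build the two halves live in separate files and the point is the cross-file reference to example_cpp_force.

```cpp
#include <string>
#include <vector>

// Hypothetical token-pasting helpers. Two levels are needed so that
// __LINE__ is expanded to its number before being pasted.
#define SKETCH_JOIN_INNER(a, b) a##b
#define SKETCH_JOIN(a, b) SKETCH_JOIN_INNER(a, b)

// Defines a uniquely named struct in an anonymous namespace whose
// constructor runs some_code, plus one instance of it, so the code
// runs during static initialization (before main).
#define SKETCH_AT_STARTUP(some_code)                         \
    namespace {                                              \
    struct SKETCH_JOIN(AtStartup_, __LINE__) {               \
        SKETCH_JOIN(AtStartup_, __LINE__)() { some_code; }   \
    } SKETCH_JOIN(AtStartupInstance_, __LINE__);             \
    }

// Hypothetical factory registry. The function-local static avoids
// depending on initialization order between translation units.
inline std::vector<std::string>& sketch_registry() {
    static std::vector<std::string> names;
    return names;
}

// --- what Example.cpp would contain ---
int example_cpp_force = 0;
SKETCH_AT_STARTUP(sketch_registry().push_back("Example"));

// --- what Example.h would contain (seen by every includer) ---
extern int example_cpp_force;
SKETCH_AT_STARTUP(example_cpp_force = 1);
```

By the time main runs, the registry holds "Example" and example_cpp_force is 1. The write to example_cpp_force is what gives every includer a reference to a symbol defined in Example.cpp, which is what keeps the object file in the link.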


08-12-11 | The IMF in America

Quick recap of what the IMF does :

When a third world country gets into bad financial trouble, the IMF jumps in. The purpose is not to help them, it's to force them to accept terms that they never would agree to if they weren't desperate. Once the IMF money arrives, it's not allowed to go to social programs or price supports that would actually help the people in the ailing country, it goes to stabilizing the financial market of that country, and the rules they force you to accept ensure that capital can freely flow out.

The end result is that what the IMF does is ensure that western finance companies are able to get their money out of the ailing countries and recoup at least some of their losses. This is what happened with the Asian collapse, this is what happened with Ireland, etc. And of course the conditions ensure that the finance giants get to play freely in that country in the future.

It occurs to me that this is largely what the US Government has done with the Mortgage Crisis. Basically they are treating the average American as a separate 3rd world country that they don't really care about, and rather than doing something that would directly help the people who have been affected by the crisis, they are spending massive amounts of money on propping up the financial markets so that the large financiers can get their money out and not take too much of a loss.

Imagine you had $3 trillion to help America out of the housing crisis. You could directly subsidize home owners' losses - buy their house from them at book value and sell it back to them at market value. You could do something in between, take the difference between book and market value and split it into a few pieces, subsidize part, make the home owner eat part, and make the mortgage holder eat part. Or you could directly subsidize the finance companies that hold the bad mortgages - buy the mortgage at book and some day sell it back at market.

Our government chose the last of these options. (there are some programs to do the intermediate option, but they haven't ever gotten off the ground, and even if they did their scale is so minuscule as to be irrelevant, something like $50 billion vs. the $2-3 trillion for the third option)

Obviously the claimed intention of the IMF is to prop up the failing economy to keep it on its feet. But they don't put in any restrictions against capital flight - quite the opposite, they *forbid* the government from stopping capital flight - so the result is highly predictable - the western investors save their own bacon.

The exact same thing has happened with mortgages. The claimed reason for propping up the mortgage market was that it would create confidence and keep the mortgage market working so that people would still be able to buy and sell homes, etc. Of course that hasn't happened, what has happened is that private MBS trading has trickled to near zero, and the banks have used the subsidized market to unwind their holdings and preserve their profits.


08-11-11 | Free Internet

I mean "free" in a liberty sense, not a monetary sense.

Recent Seattle Weekly article got me thinking about trying to encrypt and anonymize all my internet access. The whole torrent model is just like fish in a barrel for copyright trolls. You can just hop on the net and get a list of infringers any time you want.

So for whatever reason, say you want to be able to work on the net and do as you please without your actions being monitored.

Apparently the major US-based services like FindNot and Anonymizer are not to be trusted (they provide logs to the US government and to subpoenas by the RIAA etc).

Really what you want is something like Tor that takes all your traffic and bounces it around a bunch of other machines and then puts out portions of requests from all over. Currently none of those services seem to be quite ready for prime time; Tor for example kicks you out if you try to do high-bandwidth things like torrents.

Some links :

Web-based DNS Randomness Test DNS-OARC
Tor Project Anonymity Online
SwissVPN - Surf the safer way!
Public IP Swiss VPN - Page 2 - Wilders Security Forums
OneSwarm - Private P2P Data Sharing
I2P - Wikipedia, the free encyclopedia
How To Not Get Sued for File Sharing Electronic Frontier Foundation
Free Anonymous BitTorrent Becomes Reality With BitBlinder TorrentFreak
Chilling Effects Clearinghouse
Anonymous P2P - Wikipedia, the free encyclopedia

In general I'm not sure if dark-nets like Tor can survive. I don't trust the internet providers or the US government to allow you to have that freedom. I suspect that if they ever caught on en masse they would be blocked by the standard extra-judicial mechanisms that were used to shut down online poker and the funding of WikiLeaks (where the government nicely asks the service provider to block that traffic and the provider complies, even though it's not clear the law is on their side).

The only way to get past that (and into places like China) is to hide encrypted packets inside benign packets. That may be fine for little text messages, but you can never get high bandwidth that way.


08-11-11 | Inflation - 4

More links :

Recent Decisions of the Federal Open Market Committee A Bridge to Fiscal Sanity (Acknowledging Henry B. Gonzalez and Winston
Quantitative easing - Wikipedia, the free encyclopedia
Fed Quantitative Easing Personal Finance Mastery
US Daily Index » The Billion Prices Project @ MIT
The June GDP deflator in the US Conspiracy theory edition « The visible hand in economics
Speculative-Investor.com
Sizing Up Sarah - Up and Down Wall Street - Alan Abelson - Barrons.com
Quantitative Easing A Beginner's Guide InvestingAnswers
PIMCO Investment Outlook - Skunked
Money, Credit, Inflation and Deflation
Mish's Global Economic Trend Analysis True Money Supply (TMS) vs. Austrian Money Supply (AMS or M Prime) Update
Michael Pollaro - The Contrarian Take - Forbes 1
Michael Pollaro - The Contrarian Take - Austrian Money Supply
Jesse's Café Américain Austrian Economics True Money Supply, Deflation and Inflation
Inflation Update TMS (True Money) Status
Inflation Or Deflation Follow the Money Supply (Guest Post) EconMatters
Inflation Charting the Economy, Part 4 Robert Kientz FINANCIAL SENSE
Guest Post U.S. Dollar Money Supply Is Underreported ZeroHedge
Exploring Inflation Over the Past 10 Years Through Charts - Seeking Alpha
Charting the Course to $7 Gas - J. Kevin Meaders - Mises Daily

Now some random hand waving and incoherent thoughts.

The alternative money supply metrics like TMS or "M1 + deposit currency" show that the money supply has massively increased since the great recession (roughly the same as the fed "BASE" ). Historically TMS tracks inflation very closely. So far this time it appears not to be. One possible explanation is that we basically continue to be in a recession, which is driving prices down, while the currency is also being inflated, and the two roughly cancel to keep prices stable.

Of course M2 and M3 are still way down due to lack of money multiplier in our recessed economy. In theory if credit picks up again and M3 takes off, the Fed will crank back on the money supply and keep things under control. But I believe there are reasons to be skeptical that the Fed will ever really crack down on the loose monetary policy.

The PIMCO newsletter is surprisingly bleak -

the only way out of the dilemma, absent very large entitlement cuts, is to default in one (or a combination) of four ways: 1) outright via contractual abrogation – surely unthinkable, 2) surreptitiously via accelerating and unexpectedly higher inflation – likely but not significant in its impact, 3) deceptively via a declining dollar– currently taking place right in front of our noses, and 4) stealthily via policy rates and Treasury yields far below historical levels – paying savers less on their money and hoping they won’t complain.

Basically the US has got itself into massive debt; I don't really agree that entitlements are the big problem, certainly not in the short term, but anyhoo. When you're a debtor, inflation is great for you - it makes your debt smaller. We fund the debt by selling treasuries. Our debt is much cheaper to fund if we can offer a very low return on treasuries. So the best option for the US government is clearly to inflate the currency and devalue the dollar (reduces our debt) while claiming that inflation is low (keeps treasury yields low).

In other crackpottery, QE is very strange. My very crude cliff notes :


To inject cash into the economy, the Fed bought treasuries from private holders
this takes out non-cash "paper" and adds to the money supply

The government is in debt; to finance that debt it sells treasuries
this gives the government cash in exchange for paper

So during QE, banks bought bonds from Treasury, then sold them to the Fed

Basically the Fed was just giving cash to Treasury, but passing it through banks so they could take a piece of profit

Oddly the US apparently has a law that prevents the Fed from directly buying Treasuries and thus supporting the government debt; also Bernanke and such claim they are not "printing money to monetize the debt" but really passing the money through the open market doesn't change much other than giving a slice of profit to the banks. Says the Dallas Fed : "For the next eight months, the nation’s central bank will be monetizing the federal debt."

Furthermore there is good evidence that QE largely backfired at injecting money into the economy. The problem is that with the fed funds rate at 0% and Treasuries at 2-2.5% , banks can take out fed funds, buy treasuries, then sell them to the fed under QE. Free money for the banks and the Fed gets to pay for government debt, yay.

This is certainly part of why bond yields are so low; they don't need to return much when cash is free. Apparently Japan has been through all this before. There's evidence that QE schemes basically never work; when the fed funds rate is near zero, all possible profitable investments that can be made already have been made, so why in the world would injecting more liquidity into the market help? The only explanation I can see is to intentionally devalue the currency or cause inflation. (of course as we'll note later, QE was not only a monetary policy, it was also a direct and corrupt subsidy for toxic asset holders)

Side note : the Dallas Fed uses Trimmed Mean PCE instead of Core PCE. If you read the press releases from the Dallas Fed or St. Louis Fed it is encouraging that there are still technocrats in government that are trying to be reasonably honest and do their jobs well. Of course they tend to get squashed by the people in power, but still ...

What is QE really doing? Propping up stock values and other investments (particularly MBS'es). Giving banks free profits. Creating an outflow of money from the US to emerging markets. Sending up commodity prices.

"QE effects on commodity markets have been significant. Between August 2010 and January 2011, commodity prices (as measured by CRB Index) rose by 14%. Oil prices have increased by around 20% and average gasoline prices have increased around 15%. Food prices (as measured by the CRB Food Index) have increased 12%, with some individual foodstuffs rising more sharply."

It's unclear to me how you can defend dumping liquidity into the system when banks were already sitting on massive amounts of liquidity with nothing to do with it other than hold it in the Fed (*) or treasuries. It's like watering a sick plant that's already soaked in water.

* = this is one of the funnier quirks; banks actually hold their large cash balance at the federal reserve, so when they got massive cash injections from QE the main thing that happened was that their balance of cash sitting in the fed went sky high, and the fed pays interest to the banks on that deposit - banks now have excess reserves (cash beyond their reserve requirement) around $1.5 Trillion sitting in the Fed, up from only $2 billion in 2007. If your goal is to get banks to lend more, why do you want to make it more attractive for them to leave cash sitting in reserves? This was yet another change in 2008 as part of the massive experiment of letting the Fed tinker with the economy in unprecedented ways. Some links :

Why pay interest on excess reserves - William J. Polley
macroblog “Why is the Fed Paying Interest on Excess Reserves”
FRB Press Release--Board announces that it will begin to pay interest on depository institutions required and excess reserve
Fed Paying Interest on Reserves A Primer - Real Time Economics - WSJ
Dudley Seeing Interest on Reserves as Tool of Choice Sparks New Fed Debate - Bloomberg

One common thread that I don't understand is why even a tiny bit of deflation is considered such a huge disaster that almost anything will be done to prevent it.

More links :

Yellen Defends QE Is She Right - Seeking Alpha
Satyajit Das Economic Uppers & Downers « naked capitalism
Mish's Global Economic Trend Analysis US Treasury Bull Market Not Over; Record Low Yields; Shades of Japan; Why QE3 Totally
How To Make $4 Trillion Vanish In A Flash Why Another Financial Crash Is Certain « The Oldspeak Journal
FOMC Meeting Participants See QE2 Devaluing the Dollar Global Economic Intersection
fed_all_short_stacked.png (PNG Image, 722x519 pixels)
Fed QE and SPX
Bernanke's Dilemma Hyperinflation and the U.S. Dollar - Seeking Alpha
Flim-Flam Economics » Monty Pelerin's World
BERNANKE IS SATAN « The Burning Platform

I can't imagine any way to justify the Fed purchasing MBS's and other such assets. The role of the Fed is supposed to be monetary policy, not subsidizing bad investments or otherwise interfering in markets. The Fed is not the SEC or Treasury. They just slipped it in as part of "QE", which is supposed to just be increasing the money supply, and bought $1.25 trillion of MBS's without even attempting a fair valuation of them. That's basically an illegal expansion of the TARP program. The amount of toxic MBS that the Fed owns now dwarfs what Treasury holds.

Another odd footnote is the fact that banks are no longer required to mark-to-market MBS's and similar products, so we have no idea what's really on their balance sheets.

Another funny one is that Fannie/Freddie have gone from about 20% of the mortgage market to about 80% since the collapse.

If it is the Fed's role to moderate volatility, then it must also slow growth in over-exuberant times. It has shown no willingness to do so in recent years and I can't imagine that it ever will, given the political corruption and collusion that drives decision making. It's like having a train conductor that stokes the fire when you slow down but refuses to use the brakes when you get going too fast.

More links :

The Mess That Greenspan Made Is all this exit strategy talk warranted
Market Talk » Mark-to-Market
Mark-to-market accounting - Wikipedia, the free encyclopedia
Initial Fed Audit Shows Web of Conflict of Interest FDL News Desk
Hussman Funds - Weekly Market Comment Things I Believe - December 20, 2010
Hussman Funds - Weekly Market Comment The Recklessness of Quantitative Easing - October 18, 2010
AEI Relief from Mark-to-Market Accounting


08-09-11 | Threading Links

For reference, some of the links I consulted for the recent postings :

[concurrency-interest] fast semaphore
[C++] Chris M Thomasson - Pastebin.com
[C++] Chris M Thomasson - Pastebin.com -rwmutex eventcount
[C++] Chris M Thomasson - Pastebin.com - wsdequeue
[C++] Chris M Thomasson - Pastebin.com - semaphore and mpmc
[C++] Chris M Thomasson - Pastebin.com - mpsc in relacy
[C++] Chris M Thomasson - Pastebin.com - eventcount from cond_Var
[C++] Chris M Thomasson - Pastebin.com - cond_Var from waitset
[C#] Chris M Thomasson - Pastebin.com - eventcount in C#
yet another win32 condvar implementation - comp.programming.threads Computer Group
yet another (tiny) implementation of condvars - comp.programming.threads Google Groups
Would this work on even one platform - about mutex reordering
Windows NT Keyed Events
Win32 Kernel Experimental WaitLock-Free Fast-Path Event-Count for Windows... anZ2dnUVZ, InterlockedLoadFence, and aPOdnXp1l6
win32 condvar futex - NOT! - 29464
Win32 condition variables redux - Thomasson thread list version
Win32 condition variables redux - comp.programming.threads Google Groups
Usenet - Lock-free queue SPMC + MPMC
Usenet - Condition variables signal with or without mutex locked
Time-Published Queue-Based Spin Locks
Ticket spinlocks [LWN.net]
ThreadSanitizer - data-race-test - ThreadSanitizer is a Valgrind-based detector of data races - Race detection tools and mor
Thin Lock vs. Futex «   Bartosz Milewski's Programming Cafe
The Inventor of Portable DCI-aka-DCL (using TSD) is... ;-) - comp.programming.threads Google Groups
TEREKHOV - Re win32 conditions sem+counter+event = broadcast_deadlock + spur.wake
TBB Thomasson's MPMC
TBB Thomasson - rwmutex
TBB Thomason aba race
TBB Raf on spinning
TBB eventcount posting Dmitry's code
TBB Download Versions
TBB Dmitry on memory model
Task Scheduling Strategies - Scalable Synchronization Algorithms Google Groups
Subtle difference between C++0x MM and other MMs - seq_cst fence weird
Strong Compare and Exchange
Strategies for Implementing POSIX Condition Variables on Win32
Starvation-free, bounded- ... - Intel® Software Network
spinlocks XXXKSE What to do
Spinlocks and Read-Write Locks
SourceForge.net Repository - [relacy] Index of relacy_1_0rrdinttbb_eventcount
Some notes on lock-free and wait-free algorithms Ross Bencina
So what is a memory model And how to cook it - 1024cores
Sleeping Read-Write Locks
Simple condvar implementation for Win32 - comp.programming.threads Google Groups
Simple condvar implementation for Win32 (second attempt)
SignalObjectAndWait Function (Windows)
sequential consistency « Corensic
search for Thomasson - Pastebin.com
search for Relacy - Pastebin.com
sched_setscheduler
Scalable Synchronization
Scalable Synchronization MCS lock
Scalable Synchronization Algorithms Google Groups
Scalable Queue-Based Spin Locks with Timeout
Relacy Race Detector - 1024cores
really simple portable eventcount... - comp.programming.threads Google Groups
really simple portable eventcount... - 2
really simple portable eventcount... - 1
re WaitForMultipleObjects emulation with pthreads
Re sem_post() and signals
Re Portable eventcount (try 2)
Re Intel x86 memory model question
Re C++ multithreading yet another Win32 condvar implementation
race-condition and sub-optimal performance in lock-free queue ddj code...
Race in TBB - comp.programming.threads Google Groups
QPI Quiescence (David Dice's Weblog)
pthread_yield() vs. pthread_yield_np()
pthread_cond_ implementation questions - comp.programming.threads Google Groups
POSIX Threads (pthreads) for Win32
Porting of Win32 API WaitFor to Solaris Platform
Portable eventcount
Portable eventcount - Scalable Synchronization Algorithms Google Groups
Portable eventcount - comp.programming.threads Google Groups
Portable eventcount (try 2) - comp.programming.threads Google Groups
Parallel Disk IO - 1024cores
Obscure Synchronization Primitives
New implementation of condition variables on win32
my rwmutex algorithm for Linux... - this is good
Mutexes and Condition Variables using Futexes
Multithreading in C++0x part 1 Starting Threads Just Software Solutions - Custom Software Development and Website Developmen
Multithreaded File IO Dr Dobb's Journal
Multi-producermulti-consumer SEH-based queue – Intel Software Network Blogs - Intel® Software Network
MSDN Compound Synchronization Objects
MPMC Unbounded FIFO Queue w 1 CASOperation. No jokes. - comp.programming.threads Computer Group
Memory Consistency Models
low-overhead mpsc queue - Scalable Synchronization Algorithms
Lockless Inc Articles on computer science and optimization.
Lockingunlocking SysV semaphores - comp.unix.programmer Google Groups
Lockfree Algorithms - 1024cores
lock-free read-write locks - comp.programming.threads Google Groups
Lock-free bounded fifo-queue on top of vector - comp.programming.threads Google Groups
Linux x86 ticket spinlock
JSS Petersons
JSS Dekker
Joe Seighs awesome rw-spinlock with a twist; the beauty of eventcounts... - comp.programming.threads Google Groups
joe seigh on eventcount fences
Joe Seigh Fast Semaphore
Joe Duffy's Weblog - keyed events
Implementing a Thread-Safe Queue using Condition Variables (Updated) Just Software Solutions - Custom Software Development a
How to use priority inheritance
High-Performance Synchronization for Shared-Memory Parallel Programs University of Rochester Computer Science
good discussion of work stealing
good discussion of a broken condvar implementation
git.kernel.org - linuxkernelgittorvaldslinux-2.6.gitcommit
GCC-Inline-Assembly-HOWTO
futex(2) - Linux manual page
FlushProcessWriteBuffers Function (Windows)
First Things First - 1024cores
Fine-grained condvareventcount
fast-pathed mutex with eventcount for the slow-path... - comp.programming.threads Google Groups
experimental fast-pathed rw-mutex algorithm... - comp.programming.threads Google Groups
eventcount needs storeload
eventcount example of seq_cst fence problem
Effective Go - The Go Programming Language
duffy page that's down meh
Dr. Dobb's Journal Go Parallel QuickPath Interconnect Rules of the Revolution Dr. Dobb's and Intel Go Parallel Programming
Don’t rely on memory barriers for synchronization… Only if you don’t aware of Relacy Race Detector! – Intel Software Network
dmitry's eventcount for TBB
Distributed Reader-Writer Mutex - 1024cores
Discussion of Culler Singh sections 5.1 - 5.3
Developing Lightweight, Statically Initializable C++ Mutexes Dr Dobb's Journal
Derevyago derslib mt_threadimpl.cpp Source File
Derevyago - C++ multithreading yet another Win32 condvar implementation - comp.programming.threads Google Groups
Dekker's algorithm - Wikipedia, the free encyclopedia
David's Wikiblog
data-race-test - Race detection tools and more - Google Project Hosting
condvars signal with mutex locked or not Loïc OnStage
Concurrent programming on Windows - Google Books
concurrency-induced memory-access anomalies - comp.std.c Google Groups
CONCURRENCY Synchronization Primitives New To Windows Vista
comp.programming.threads Google Groups
comp.lang.c++ Google Groups - thomasson event uses
Common threads POSIX threads explained, Part 3
Chris M. Thomasson - Pastebin.com
Chris M. Thomasson - Pastebin.com - win_condvar
Chapter 22. Thread - Boost 1.46.1
cbloom rants 07-18-10 - Mystery - Does the Cell PPU need Memory Control -
cbloom rants 07-18-10 - Mystery - Do Mutexes need More than Acquire-Release -
Causal consistency - Wikipedia, the free encyclopedia
C++1x lock-free algos and blocking - comp.lang.c++ Google Groups
C++0x sequentially consistent atomic operations - comp.programming.threads Google Groups
C++0x memory_order_acq_rel vs memory_order_seq_cst
C++ native-win32 waitset class for eventcount... - comp.programming.threads Google Groups
C++ native-win32 waitset class for eventcount... - broken for condvar
C++ N1525 Memory-Order Rationale - nice
C++ multithreading yet another Win32 condvar implementation
Bug-Free Mutexs and CondVars w EventCounts... - comp.programming.threads Google Groups
Break Free of Code Deadlocks in Critical Sections Under Windows
Boost rwmutex 2
Boost rwmutex 1
boost atomics Usage examples - nice
Blog Archive Just Software Solutions - Custom Software Development and Website Development in West Cornwall, UK
Atomic Ptr Plus Project
Asymmetric Dekker
appcoreac_queue_spsc - why eventcount needs fence
AppCore A Portable High-Performance Thread Synchronization Library
Advanced Cell Programming
A word of caution when juggling pthread_cond_signalpthread_mutex_unlock - comp.programming.threads Google Groups
A theoretical question on synchronization - comp.programming.threads Google Groups
A race in LockSupport park() arising from weak memory models (David Dice's Weblog)
A garbage collector for C and C++
A futex overview and update [LWN.net]
A Fair Monitor (Condition Variables) Implementation for Win32


08-09-11 | Inflation - 3

Some references :

Response to BLS Article on CPI Misconceptions
Consumer price index - Wikipedia, the free encyclopedia
WSJ piece on hedonic price adjustments
Consumer Price Index
Chapter 11—Money and Its Purchasing Power (continued) - - Mises Institute
Bill Gross Claims the CPI is Understated, But Is He Right - Seeking Alpha
An inflation debate brews over intangibles at the mall
United States Consumer Price Index - Wikipedia, the free encyclopedia
Trudy Lieberman Entitlement Reform Archive CJR
Shadow Government Statistics Home Page
Meet the Fed's Elusive New Inflation Target - TheStreet
Inflation The Concise Encyclopedia of Economics Library of Economics and Liberty
How BLS Measures Price Change for Medical Care Services in the Consumer Price Index
Higher Education Price Indices
God Punishes Us When We (Collectively) Vote Republican, Part 5 Angry Bear - Financial and Economic Commentary
Consumer Price Index, a rant
Charts College Tuition vs. Housing Bubble » My Money Blog
Chained Cpi Social Security, CPI, Michael Hiltzik Using 'chained CPI' to determine Social Security payments would rip off ne
cbloom rants 5-14-05 - 1
cbloom rants 12-27-08 - Financial Quackery
cbloom rants 09-17-07 - 2

Some of these guys have the whiff of crackpottery which should give us a bit of pause. Nevertheless...

We can track down a few of the strange problems that I identified last time.

Education basically is miscounted : "The inclusion of financial aid has added to the complexity of pricing college tuition. Many selected students may have full scholarships (such as athletic), and therefore their tuition and fixed fees are fully covered by scholarships. Since these students pay no tuition and fees, they are not eligible for pricing." Discounting financial aid makes some sense if you are trying to measure the consumer's expense, but not if you are trying to measure the cost of the good; just because someone else paid for part of it doesn't make it cheaper. But really I imagine the biggest problem with education cost is that they effectively count college as being free for people who can't afford college. That is, people who can't afford it don't buy it, so it's not in the basket (doesn't contribute to the "quantity" in the CPI metric). A better way to measure inflation would be to assume that everyone would go to college if they could afford it. (also it seems that non-accredited technical school type places are not counted at all)

Health care is simply not counted at all, by design : "The weights in the CPI do not include employer-paid health insurance premiums or tax-funded health care such as Medicare Part A and Medicaid" The only thing they count is out-of-pocket / discretionary health care expenses, which are obviously just a tiny fraction of the total.

Real estate has the funny owner's cost to rent thing which makes it very hard to tell if that is being gamed or not.

Obviously anything based on "core" inflation (without food or energy) is ridiculous. The standard argument that those fluctuate too much seasonally is absurd, you could just use a seasonally-adjusted moving average, you don't need to remove them completely.

The other really obviously fishy parts are :

"Substitution". A while ago the CPI was changed to use geometric averages of prices within a category. This seems pretty innocuous, but it basically causes a down-weighting of higher priced items. And in fact the geometric mean is never higher than the arithmetic mean (and is strictly lower whenever the prices differ), so this change can only make inflation seem lower, which is a dirty trick. For example :


(1+1+8)/3 = 3.333

(1*1*8)^(1/3) = 2.0

pretty big difference even though they are both "means". Now, they hand wave away and say that this reflects consumers' ability to choose and substitute cheaper products. But it is totally unscientific.
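
The two means above can be checked with a small sketch. The function names arithmetic_mean and geometric_mean are just illustrative (they are not from any CPI methodology), and the geometric mean is computed through logarithms, which assumes all prices are positive.

```cpp
#include <cmath>
#include <vector>

// Arithmetic mean: sum of the values divided by their count.
inline double arithmetic_mean(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Geometric mean: n-th root of the product of the values.
// Computed as exp(mean of logs) so a long list of prices does not
// overflow the product. Assumes every value is strictly positive.
inline double geometric_mean(const std::vector<double>& v) {
    double log_sum = 0.0;
    for (double x : v) log_sum += std::log(x);
    return std::exp(log_sum / static_cast<double>(v.size()));
}
```

For the prices {1, 1, 8} this gives roughly 3.333 and 2.0, matching the hand calculation. For a basket where every price is equal the two means coincide, so the gap only shows up when prices within a category diverge.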

Furthermore, newer measures like the chained CPI (C-CPI-U) or the Fed's PCE also explicitly include substitution. This just seems like it obviously does not reflect inflation. When a product gets expensive and the consumer substitutes something cheaper, they are by definition getting something of lower utility (because it wasn't their first choice), so you can't say that no inflation happened, they are getting less for their money.

"Hedonics". These are poorly documented pure bullshit ways of pretending inflation is lower by claiming that we got more for our money. This is just pure nonsense for various reasons :

1. The whole definition of "better" is so vague and open to interpretation that it has no business in a metric. For example they consider air travel to be massively improved since the 70's. Sure it's safer, more efficient, but also much much less pleasant. Personally I think that the same trip is actually worth much less now than it was in the past, but they say it's worth much more. Similarly for the quality of buildings and clothing and cars and so on; yes, they're safer, faster, more durable, whatever, but they aren't hand crafted, they aren't made of hard wood and steel and chrome; I think most of those things are actually much crappier now than ever, made more cleverly but also more cheaply. Anyway, it just has no business in there. The idea that you can measure the hedonic quality of some product and say it improved by 0.1% from April to May in 2010 is just absurd.

2. Using quality of goods at all just isn't right to begin with. Inflation should be a measure of the cost to buy a standard set of goods at the expected quality level of the era. Just because technology gets better over time doesn't mean you can discount the inflation! For example if computers get 50% better every year and our money is inflating by 50% would you say the cost of computers is not changing? Of course not, the cost is going up 50% , yes they are also getting better but that is not part of the discussion.

It's just wrong on the face of it. If a median decent car was $5k in the 70's and now is $25k , then the price of cars has gone up by 5X. But oh no they say, you have air bags and more power and fuel economy and so on, the modern car is 5X better, so in fact there has been no inflation at all. Well, wait a minute. I *expect* the quality of life and technology to go up over time. Are you telling me that in a world with 0% inflation that technology does not get better over time? That's a strange way to measure things. And it's not really what you want to know when you ask about inflation. You want to know how much money do I need to afford a decent house, car, food, etc. at the expected standard of the time that I buy it.

Obviously we wish there was some item that had absolute constant value that we could measure against. Also obviously measuring inflation is very complicated and we are only scratching the surface. But it's very fishy. Rotten fishy.


08-09-11 | Inflation - 2

The government has made several significant changes to how it counts inflation. A big one occurred in 1995 (Boskin commission), another happened just in the last two years (chained CPI for COLA), and at some point the Fed changed its core measure (PCE).

In all cases they claimed to be making the inflation measure "more accurate" , and in all cases, the inflation rate was revised downward.

Now, ignoring the details of the changes for now, it should be clear the government has a very strong interest in reporting a low inflation number.

1. It makes their administration look better. If inflation is low, then inflation-adjusted GDP looks better. There are lots of horrifying statistics about inflation-adjusted median income that are very embarrassing to the US government, and you can make that go away by having a lower inflation number.

2. Lots of federal costs have automatic COLA (cost of living adjustments) like Social Security, Medicare, federal employee pensions and wages, etc. A lower inflation number directly decreases the amount they have to pay out.

3. They pay less out for TIPS

Whenever governments can lie for their own benefit, they tend to, so it would be *extremely* surprising if the reported inflation was actually correct.


08-09-11 | Randomness and Fault

Recent comment ranting has made me think of something that frequently annoys me.

I get quite aggravated when people invite a certain negative outcome on themselves and then act like it's random or unpredictable or "shit happens" or "just roll with it" or whatever.

There are three separate but similar categories of this : 1. Risky Behavior, 2. Intentional Ignorance, and 3. Futility of Fighting the System.

1. Risky Behavior : these people act like because something is probabilistic, their behavior has no effect on the outcome.

A classic example is risky drivers; someone might be speeding, talking on the phone, not paying attention to the road. They have an accident, and act like "accidents happen, it's random". No, it's not. You chose your behavior, and your behavior increased the probability of an accident. You just (probabilistically) crashed your car on purpose. It was a willful intentional choice to be risky.

More benign cases happen all the time; maybe you have a friend over and they're clearing plates from the table and are carrying way too many at once. Of course they drop one and break it. You are supposed to act like "ha ha, no big deal, accidents happen". But it wasn't an accident. They just (probabilistically) threw your plate into the ground.

Now I don't actually mind if someone comes over and breaks my plate, no big deal it's a fucking plate (it's a whole 'nother rant about how stupid it is to buy expensive plates and get upset when they break), but don't act like it was random, sure there was an element of chance, but it was your actions that (probabilistically) caused it.

It's particularly annoying when the person who has the "accident" told me to "chill out, it'll be fine" or whatever when I warned them to be aware of the risk.

Of course this happens in coding all the time too. I tend to be very cautious in my coding; I'd rather spend time testing and asserting now than have problems later. Inevitably I get into situations where someone on the team is having a nasty hard to reproduce bug. They act like "bugs happen" and it's sort of a random act of god. Did you robustly assert your code? Do you have unit tests? Did you separate out classes that have strict invariants? No? Then you just (probabilistically) chose to have bugs in your code, don't act like they're random.

(there's a separate issue of whether the precautions are actually worth it or not; there's a spectrum of behavior from having to be super careful in advance so that you never have problems in the future (eg. NASA) vs. just being sloppy and fast and accepting a high probability of risk (eg. Game Jam)).

Just because something has a probabilistic element doesn't mean there's no correlation to your actions, or that you're not to blame when things go bad.

2. Intentional Ignorance : this is choosing not to do the research that you easily could have done and thus getting into a bad situation. Now, there's nothing wrong with that per se, that's a life choice and has different trade offs. The thing that annoys me is when people act like they "couldn't have known" or it's perfectly normal not to have known. Not true, you could have easily known.

Say you're visiting a strange town and you go out to eat somewhere and it sucks. It's not random that it sucked - it's because you didn't do any research (probabilistically). Okay, that's fine if that's the choice you want to make, but don't act like it's not your fault - it is a direct result of your choice to not do research that it sucked.

3. Futility of Fighting the System : this is perhaps the most naive and self-defeating variant, and mainly affects the young or the poor (except when it comes to voting, in which case it surprisingly runs across all demographics).

These people act like it doesn't matter what they do, that somehow their bank or cell phone carrier or the cops or whatever will find a way to screw them. Basically they refuse to recognize the cause/effect connection between their own actions and the outcomes.

A lot of this is because of the same failure to connect cause/effect in probabilistic situations. Maybe this person tried to be really careful one month and do everything right, and they still got some absurd bank fee or roaming charge or whatever, they conclude that "you can't win" and "what I do doesn't matter". They don't see that their actions might reduce the probability of fuckage even if it doesn't eliminate it.

(of course to some extent this is just an excuse; they really know the truth, but they pretend not to because they don't want to be accountable for their own actions, they want to be able to fuck up and act like they're not to blame, that "it doesn't matter what I do, the system fucks me anyway").

Amazingly even smart people will talk this way about voting, that it "doesn't matter who I vote for the politicians always fuck us" ; well yes, there will be fuckage no matter what, but don't be retarded, of course you can affect the probability of fuckage through your actions. Just because it's not deterministic doesn't mean you are divorced from responsibility.

A lack of deterministic feedback is of course what makes poker so hard for many people. Almost everyone learns well when there is immediate deterministic feedback on whether their action is right or not. (this isn't saying much, dogs and monkeys also learn well under those conditions). Many people struggle when the feedback is randomized or unclear or very delayed. For example when you try a new line in poker, like maybe you try three-betting from the blinds with medium range hands, if it goes badly a few times most people will conclude "that was a bad idea" and won't try it any more. It's very hard for these people to learn and get better because they're just looking at what they did in the instant and whether it paid off.


08-09-11 | The Lobster

(this coinage is so obvious I must have stolen it from somewhere, anyway...)

I've been thinking a lot recently about "the lobster".

I've always thought it was bizarre how you can pull into any podunk town in America and go to the scary local diner / steak house, and there will be the regular items - burger, chicken fried steak, what have you, all under $10, and then there's the lobster, for $30, ridiculously overpriced, tucked in the corner of the menu with decorative squiggles around it (as if it needs velvet ropes to separate the VIP section of the menu from the plebeian fare).

The thing is, the lobster is not actually good. They probably can't remember the last time anybody actually ordered the lobster. No local would; if the waitress likes you she would warn you not to get it, the chefs roll their eyes when the order comes in. Why is it on the menu at all?

I guess it's just there as a trap, for some sucker who doesn't know better, for someone wanting to show off the money they just won, or someone on an expense account to waste money on. You're really just humiliating yourself when you order it, and the restaurant is laughing at you.

I think most people know that you don't actually ever order the lobster in restaurants (other than lobster-specializing places in like Maine or something). But "the lobster" can pop up in many other guises. Expensive watches are obvious lobsters, expensive cars can be less obvious lobsters (is a Maserati a lobster? an Alfa? an Aston? a Porsche?), certainly some of the options and special editions are obvious lobsters, for example the recent Porsche "Speedster" special edition that cost $250k and was just a regular Carrera other than a few colored bits, that's clearly a lobster and Porsche laughs and rolls their eyes at the Seinfelds of the world who are stupid enough to buy the Porsche lobster just because it was on the menu with squiggly lines around it.

I feel like a lot of salesmen try to slip the lobster on you when you're not paying attention. Like when the contractor asks if you want your counters in wood or stone or italian marble - hey wait, contractor, that's the lobster! okay, yeah, you got me, I don't even know where to get italian marble but I thought I'd try to slip it in there. Home improvement in general is full of lobsters. Home theatre stores usually carry a lobster; car wheels ("rims") are rife with lobsters.

The thing that makes the nouveau riche so hilarious is they are constantly getting suckered into buying the lobster and then have the stupidity to brag about it. Ooo look at my gold plated boat ; you fool, you bought the lobster, hide your shame!


One of the things that's so satisfying about video games is that you get a clear reward for more work. You kill some monsters, you get experience, you go up a level; you collect 200 gems, now you can buy the red shield, and it is objectively better than the blue shield you had before. It's very simple and satisfying.

Life is not so clear. More expensive things are not always better. Doing more work doesn't necessarily improve your life. This can be frustrating and confusing.

One of the things that makes me lose it is video game designers who think it's a good idea to make games more realistic in this sense, like providing items in the stores that are expensive but not actually very good. No! I don't want to have to try to suss out "the lobster" in the video game blacksmith, you want video game worlds to be an escapist utopia in which it's always clear that spending more money gets you better stuff. (the other thing I can't stand is games that take away your items; god dammit, don't encourage me to do the work for that if you're going to take it away, don't inject the pains of real life into games, it does not make them better!)


08-08-11 | Some Video Watching

"Thick as Thieves" is the most surprising movie I've seen in a long time. It's fun, smart, funny, it's what movies should be and are so rarely. It's directed by the director of Black Dynamite, and stars Alec Baldwin, but despite those two things it's quite subtle, so much more subtle than modern crap. It lets you find the joke rather than cramming it down your throat, and that makes it much funnier.

"Vengo" is not really a movie with a plot, so much as a snapshot of a (stereotypical/artificial) world of flamenco. I enjoyed it. It reminded me of the music scenes in Kusturica movies (but with less humor), in the sense that seeing the music in a stereotypical fictional setting somehow makes it better, makes you feel you appreciate what it's like for the people who live with that music as the backbone of their lives.

Louie has completely gone off the rails. E1 and E2 of Season 2 were shit, then E3 (buying a house) was back to funny, and then he completely fucking lost the point again in E4+. Uh, hello, fucking Louie CK, people are watching you because they want to laugh at the absurdity and miseries of life, you're supposed to point them out and then make them funny, that provides relief. You are not just supposed to document your miserable little (fake) life in a pretentious attempt at "verite". It's sad because it really could be a good show if he would just get his head out of his ass and tell more jokes.

Game of Thrones is great, like maybe the best fantasy TV series ever (not a lot of competition there, though (actually I can't think of a single one)). By far the best thing about it is the costumes. The costumes are stellar. The sets are great (except for the rare CG set; and for some reason the matte paintings also suck pretty bad, they have poor matte artists and poor matte-foreground integration). Most of the casting is very good. The real letdown is the pathetic George RR Martin fantasy world. WTF. Cold north with frozen ancient evil. Across the narrow sea is the desert full of arab/mongols. Scottish defenders of Hadrian's wall. WTF it's Europe / Tolkien / Anne McCaffrey ; it's like the most generic uncreative fantasy world I think I have ever seen. I find fantasy proxies for Europe to be really boring. (so much worse than fiction in which you take the real Europe and imagine there might have been hidden magic in it). Anyway, I enjoyed the TV show quite a bit.

Justified Season 2 is better than Season 1. There's one or two really bad one-off "episodic" episodes (wow that's a horrible sentence) that are like "random crime happens and good ole Raylan Givens saves the day, and everything is back to normal by the end" but fortunately those are few. Of course the way they make excuses for the Marshals to be involved in everything is retarded, but if you just ignore all that it's not bad. Actually, objectively it has a lot of flaws but I guess I like the actors enough to forgive them.

Zen is pretty retarded. Why is Italy full of Brittish people? (and why can I not spell British?) The cases are all like super typical horrible mystery writing, which goes like this :

crazy crime happens
intrigue and politics get in the way
it gets more and more complicated
hero gets attacked or in trouble
sexy females randomly introduced to the plot
solution looks hopeless
...
random coincidental shit happens and presto case is solved

And you've got the typical stupid shit like bad guys who can't hit you with a gun from ten feet, but people on the hero's side are always perfect sharp-shooters, etc. However, it's almost worth watching just for the gooey cheesiness of it. The soundtrack is straight out of the 80's smooth jazz collection, it's what a "playa" would put on to seduce the ladies. Then there's all these ridiculous self-conscious shots that focus on the cool clothes or the cool cars while we listen to the funky bass line, omg.


08-08-11 | Inflation - a sanity check


Fuel prices have risen faster than (nominal) inflation.

Education - much faster than inflation.

Housing faster than inflation.

Food faster than inflation.

Health care faster than inflation.

Gold, copper, etc. - much faster than inflation.

Foreign currency - faster than inflation.

Ummm....

Something is wrong with this picture.


08-08-11 | Vets are Assholes

Some people seem to have a failure to grasp what is reasonable behavior and what is reasonable to nag about.

I've complained about my current landlord before, who just has no concept about what normal wear and tear by a renter is like. I fucking mow and weed and water and repair shit and oil the counters and all this shit, and yet they send me monthly pesters like "I drove by and the grass was looking a little tall, better get out there and trim it" ; or "it might freeze tonight, better wrap all the outside faucets" ; WTF , do you have any clue what bad renters do to houses? I'm not lighting the house on fire. I'm not turning on the water and letting it run (when the owner pays the utilities), I'm not selling crack out of the house, you should be fucking happy.

One of the more unfortunate bosses I ever had was on the IEEE board for code style guidelines (that's got to be a major red flag right there). I'm all for some level of code uniformity and cleanliness, but he would literally review every checkin I made each night and send me a mail with things like "variables at line 97 aren't lined up in the same column; there's a blank line on 416 that should be deleted" ... are you kidding me? The code is basically so clean compared to the spaghetti mess it could be; there's this failure to grasp minor transgressions.

Vets seem to consistently suffer from this problem.

Every time I have to take my cats in to the vet, I get some kind of condescending lecture from an asshole vet.

Back in CA I got a long lecture about how I shouldn't let my cats run around outside because they get diseases and injuries and so on. Okay, mister vet, I'm sure that's true for humans too, so why don't you just stay in your house and never leave and then we won't have to deal with your uptight ass.

I also got a lecture about putting the food too close to the litter. Which, hey actually is a bad thing to do, and I didn't know it, so it's good that I learned, but the vet never tells you things in the style of "oh, yeah, you might not know this and here's a tip" , it's always like "you are a rotten human being who is intentionally abusing your animals".

With Chi Chi here we always get a lecture about how she was declawed. Well fuck you vet, we didn't do it, we adopted an adult cat from a shelter that was previously declawed, so nyah.

This last time we got a lecture about how fat she is. Yeah she's slightly over weight, but she's not one of those ultra-obese cats that owners actually should be lectured about.


I wrote this a month ago but didn't post it, because it's just whiney and boring and I'm trying to avoid posting things like that (I write a lot of shit that I don't post, despite how it may seem, this blog is not a direct stream of defecation from the anus of my mind; when I'm being wise I don't hit "publish" right after I write something, and 99% of the time when I revisit it a few hours later I decide it should go in the bin).

But recent events have reminded me of it so I dug it back up.

We went backpacking recently, so I was reading lots of hiking books, and I have to say - "The Mountaineers" are assholes. Every single hike description is a fucking diatribe against trail users that they don't approve of. It's not once in a while, it's every single fucking description, they just can't resist being snarky and nasty on every page. It's so bad that I can barely stand to read the books despite them being clearly the best reference material.

And it's the same kind of out of touch ranting. There are plenty of very obvious bad trail users that deserve to be complained about - people who leave trash in the wilderness is probably the worst one, people who bring up boom boxes or dogs in no-dogs-allowed areas, people who trample the flowers off trail, etc. But that's not what The Mountaineers complain about. The things they complain about are :


People who camp near lakes (because it's too high use)

People who have camp fires (even in allowed fire places) ; (they mock us as "kumbayah-ers")

People who only backpack one or two nights (you're scum if you don't get into deep wilderness)

People who want a road that takes them to easy access

etc.

it's like so fucking out of touch with what is actually a sin on the trail.


08-06-11 | A case for OnLive ?

Reading Sanders' last post he brings up an interesting point that might actually be a case for OnLive.

I've long been a sceptic/critic of OnLive. Basically I think that taking standard games and running them over a network where you add 200 ms of latency and get no benefit is totally fucking retarded.

But a game that is custom-made for an OnLive / cloud is kind of interesting.

Particularly an MMO, because 1. latency isn't that big of a deal ; 2. they already have horrible latency so players are used to it, and 3. lots of players are in the same room at the same time, so you can share computer power on the server. Non-networked games where each player is in an independent world are much less compelling.

Also, if you're thinking of running a Rage-like texture cache, doing local loads and recompresses is sort of like running an OnLive server on your local machine and serving yourself up compressed data - it adds latency, adds compression artifacts, and generally is very undesirable if there was another choice.

In particular I imagine a use case like this which makes some sense to me :


MMO game
WoW style gameplay that's not super latency critical
cloud-style computing that dynamically puts more servers where needed
Huge number of players can be in the same room and there's no slow down
  (more servers just contribute to processing that area)

Non-GPU renderer; like maybe REYES or a ray tracer
Super high source content sizes stored on shared servers
  (how to generate massive amounts of source content is unknown)
Since you're just sending frames back to clients, render quality is unlimited
  just requires more servers

Could do real-time GI since you can put lots of servers on it
  (and the result is shared for lots of players so the cost is not prohibitive)
or just have massive pre-baked lightmaps with time-of-day variation
  (something like a spherical harmonic per texel, and store 24 of them, one for each hour)
  since back-end storage size is unlimited

the OnLive-style serving the images actually is an advantage in that scenario. In practice, no game company has the know-how to manage such complex servers. And the cost per player is too high. And being able to deliver massively more content just creates a big problem of how to create that content. etc.


08-02-11 | Coder Dictionary : "to checker"

To checker : verb, intransitive : to descend into the technical minutia of a subject which is only tangentially related to your project ; to work very hard and yet make very little progress toward shipping ; to stretch a few day task into many months.

I've definitely been checkering a bit for the last month on all this lockfree shit. However, I contend it was only a "half checker" because what I've been doing is actually useful to posterity (I think), whereas to complete a "full checker" you have to go off into technical minutia about topics that help *nobody* and make your friends completely exasperated.

(I kid, I kid)


08-01-11 | Double checked wait

Something that we have touched on a few times is the "double checked wait" pattern. It goes like this :
consumer :

if ( not available )
{
    prepare_wait();

    if ( not available )
    {
        wait();
    }
    else
    {
        cancel_wait();
    }
}

producer :

make available
signal_waiters();

now, why do we do this? Well, if you did just a naive check like this :

consumer :

if ( not available )
{
    // (*1)
    wait();
}

producer :

make available
signal_waiters();

you have a race. What happens is you check available and see none, so you step in to *1 ; then the producer runs, publishes whatever and signals - but there are no waiters yet so the signal is lost. Then you go into the wait() and deadlock. This is the "lost wakeup" problem.

So, the double check avoids this race. What must the semantics of prepare_wait & wait be for it to work? It's something like this :

Any signal that happens between "prepare_wait" and "wait" must cause "wait" to not block (either because the waitable handle is signalled, or through some other mechanism).

Some implementations of a prepare_wait/wait mechanism may have spurious signals; eg. wait might not block even though you shouldn't really have gotten a signal; because of that you will usually loop in the consumer.

Now let's look at a few specific solutions to this problem :

condition variables

This is the locking solution to the race. It doesn't use double-checked wait, instead it uses a mutex to protect the race; the naive producer/consumer is replaced with :


consumer :

mutex.lock();
if ( not available )
{
    unlock_wait_lock();
}

producer :

mutex.lock();
make available
signal_waiters();
mutex.unlock();

which prevents the race because you hold the mutex in the consumer across the condition check and the decision to go into the wait.
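For reference, here is roughly what that locking version looks like with the standard C++ primitives. This is a minimal sketch, not the code from the post: `Mailbox`, `consume`, `produce` and `run_condvar_demo` are hypothetical names made up for illustration, and `std::condition_variable::wait` plays the role of `unlock_wait_lock`. The `while` loop is there because a condition variable wait is allowed to wake spuriously.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state for illustration; these names are not from the post.
struct Mailbox
{
    std::mutex mutex;
    std::condition_variable cv;
    bool available = false;
    int payload = 0;
};

// consumer : the check and the decision to wait both happen under the mutex
int consume(Mailbox & mb)
{
    std::unique_lock<std::mutex> lock(mb.mutex);
    while ( ! mb.available )   // loop, because cv.wait may wake spuriously
    {
        mb.cv.wait(lock);      // atomically unlocks, waits, then relocks
    }
    mb.available = false;
    return mb.payload;
}

// producer : publish and signal while holding the same mutex
void produce(Mailbox & mb, int value)
{
    std::lock_guard<std::mutex> lock(mb.mutex);
    mb.payload = value;
    mb.available = true;
    mb.cv.notify_all();
}

// Runs one producer thread against a consumer on the calling thread.
int run_condvar_demo()
{
    Mailbox mb;
    std::thread producer([&mb]{ produce(mb, 42); });
    int got = consume(mb);
    producer.join();
    assert(got == 42);
    return got;
}
```

Whichever thread gets the mutex first, the consumer either sees `available` already set or is parked in the wait before the producer can signal, so the wakeup cannot be lost.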

waitset

Any simple waitset can be used in this scenario with a double-checked wait. For example a trivial waitset based on Event is like this :


waitset.prepare_wait :
    add current thread's Event to list of waiters

waitset.wait :
    WaitForSingleObject(my Event)

waitset.signal_waiters :
    signal all events in list of waiters

for instance, "waitset" could be a vector of handles with a mutex protecting access to that vector. This would be a race without the prepare_wait and double checking.

In this case we ensure the double-checked semantics works because the current thread is actually added to the waitset in prepare_wait. So any signal that happens before we get into wait() will set our Event, and our wait() will not actually block us, because the event is already set.
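A portable sketch of that trivial waitset, under some loudly stated assumptions: the `Event` class below is a stand-in for a Win32 auto-reset Event built from a mutex and condition variable (so `WaitForSingleObject` becomes `Event::wait`), the waitset is a mutex-protected vector, and all the names are hypothetical. It is meant to show the double-checked wait working, not to be a fast implementation.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a Win32 auto-reset Event (assumption : the post uses real Events).
class Event
{
public:
    void set()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_set = true;
        m_cv.notify_one();
    }
    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this]{ return m_set; });
        m_set = false; // auto-reset
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_set = false;
};

class Waitset
{
public:
    // add the caller's Event to the list of waiters
    void prepare_wait(Event * e)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_waiters.push_back(e);
    }
    // remove it again if the second check found the condition satisfied
    void cancel_wait(Event * e)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_waiters.erase(std::remove(m_waiters.begin(), m_waiters.end(), e), m_waiters.end());
    }
    void wait(Event * e) { e->wait(); }
    // signal all events in the list; done under the lock so that after
    // cancel_wait returns nobody touches that Event any more
    void signal_waiters()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        for (Event * e : m_waiters) e->set();
        m_waiters.clear();
    }
private:
    std::mutex m_mutex;
    std::vector<Event *> m_waiters;
};

int run_waitset_demo()
{
    Waitset waitset;
    std::atomic<int> available(0);
    Event my_event; // one Event per waiting thread

    std::thread producer([&]{
        available.store(7);       // make available
        waitset.signal_waiters();
    });

    // consumer : double-checked wait, looped to tolerate stale or spurious signals
    while ( available.load() == 0 )
    {
        waitset.prepare_wait(&my_event);
        if ( available.load() == 0 )
            waitset.wait(&my_event);
        else
            waitset.cancel_wait(&my_event);
    }
    producer.join();
    return available.load();
}
```

A cancelled wait can leave the Event set from a signal that raced with the cancel; that is one source of the spurious wakeups mentioned earlier, and the consumer loop absorbs it.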

eventcount

Thomasson's eventcount accomplishes the same thing but in a different way. A simplified version of it works like this :


eventcount.prepare_wait :
    return key = m_count

eventcount.wait :
    if ( key == m_count )
        Wait(event)

eventcount.signal_waiters :
    m_count++;
    signal event;

(note : event is a single shared broadcast event here)

in this case, prepare_wait doesn't actually add you to the waitset, so signals don't go to you, but it still works, because if signal was called in the gap, the count will increase and no longer match your key, so you will not do the wait.

That is, it specifically detects the race - it sees "was there a signal between when I did prepare_wait and wait?" , and if so, it doesn't go into the wait. The consumer should loop, so you keep trying to enter the wait until you get to check your condition without a signal firing.
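Here is a small sketch of the same idea. To be clear about what is swapped in: this is a mutex and condition variable eventcount, not Thomasson's lock-free one, and the class and function names are hypothetical. Doing the `key == m_count` check and the block under one mutex is my assumption for keeping the sketch obviously correct; a real eventcount gets the same guarantee with atomics.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class EventCount
{
public:
    // returns the key to hand to wait(); registers nothing
    unsigned prepare_wait()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_count;
    }
    // blocks only while no signal has arrived since the matching prepare_wait
    void wait(unsigned key)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this, key]{ return m_count != key; });
    }
    void signal_waiters()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_count;
        }
        m_cv.notify_all(); // the single shared broadcast event
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    unsigned m_count = 0;
};

int run_eventcount_demo()
{
    EventCount ec;
    std::atomic<int> available(0);

    std::thread producer([&]{
        available.store(9);   // make available
        ec.signal_waiters();
    });

    // consumer : loop until the condition is seen without a signal in the gap
    while ( available.load() == 0 )
    {
        unsigned key = ec.prepare_wait();
        if ( available.load() == 0 )
            ec.wait(key);
        // no cancel_wait needed : prepare_wait did not add us to anything
    }
    producer.join();
    return available.load();
}
```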

futex

It just occurred to me yesterday that futex is actually another solution to this exact same problem. You may recall - futex does an internal check of your pointer against a value, and only goes into the wait if the value matches.

producer/consumer with futex is like this :


consumer :

if ( value == not_available )
{
    futex_wait(&value,not_available);
}

producer :

value = available
futex_signal(&value);

this may look like just a single wait at a glance, but if we blow out what futex_wait is doing :

consumer :

if ( value == not_available )
{
    //futex_wait(&value,not_available);

    futex_prepare_wait(&value);
    if ( value == not_available )
        futex_commit_wait(&value);
    else
        futex_cancel_wait(&value);
}

producer :

value = available
futex_signal(&value);

we can clearly see that futex is just double-checked-wait in disguise.
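For the curious, a Linux-only sketch of that producer/consumer using the raw futex syscall. The wrappers `futex_wait` and `futex_signal` are hypothetical helpers named to match the pseudocode above (glibc does not export a futex function, so they go through `syscall`), and treating a `std::atomic<int>` as a plain int for the kernel is an assumption that holds on the usual Linux targets.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <thread>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(std::atomic<int>) == sizeof(int), "sketch assumes atomic<int> is a bare int");

// Hypothetical wrapper : blocks only if *addr still equals expected when the
// kernel checks it; otherwise it returns right away (errno EAGAIN).
static long futex_wait(std::atomic<int> * addr, int expected)
{
    return syscall(SYS_futex, reinterpret_cast<int *>(addr), FUTEX_WAIT_PRIVATE,
                   expected, nullptr, nullptr, 0);
}

// Hypothetical wrapper : wakes every thread currently waiting on addr.
static long futex_signal(std::atomic<int> * addr)
{
    return syscall(SYS_futex, reinterpret_cast<int *>(addr), FUTEX_WAKE_PRIVATE,
                   INT_MAX, nullptr, nullptr, 0);
}

int run_futex_demo()
{
    const int not_available = 0;
    std::atomic<int> value(not_available);

    std::thread producer([&]{
        value.store(5);        // value = available
        futex_signal(&value);
    });

    // consumer : looped, since futex_wait can also return spuriously
    while ( value.load() == not_available )
    {
        futex_wait(&value, not_available);
    }
    producer.join();
    return value.load();
}
```

The second check of the double-checked wait is the compare the kernel does inside `FUTEX_WAIT`, which is why the consumer only appears to check once.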


Do we like the futex API? Not really. I mean it's nice that the OS provides it, but if you are designing your own waitset you would never make the API like that. It confines you to only working on single ints, and your condition has to be int == value. A two-call API like "prepare_wait / wait" is much more flexible, it lets you check conditions like "is this lockfree queue empty" which are impossible to do with futex (what you wind up doing is just doing the double-check yourself and use futex just as an "Event", either that or duplicating the condition into an int for futex's benefit (but that is risky, it can race if not done right, so not recommended)).

BTW some of the later extensions of futex are very cool, like bitset waiting and requeue.


08-01-11 | Non-mutex priority inversion

An issue I don't see discussed much is non-mutex priority inversion.

First a review of mutex priority inversion. A low priority thread locks a mutex, then loses execution. A high priority thread then tries to lock that mutex and blocks. It gives up its time slice, but a bunch of medium priority threads are available to run, so they take all the time and the low priority thread doesn't get to run. We call it "priority inversion" because the high priority thread is getting CPU time as if it was the same as the low priority thread.

Almost all operating systems have some kind of priority-inversion-protection built into their mutex. The usual mechanism goes something like this : when you block on a mutex, find the thread that currently owns it and either force execution to go to that thread immediately, or boost its priority up to the same priority as the thread trying to get the lock. (for example, Linux has "priority inheritance").

The thing is, there are plenty of other ways to get priority inversion that don't involve a mutex.

The more general scenario is : a high priority thread is waiting on some shared object to be signalled ; a low priority thread will eventually signal that object ; medium priority threads take all the time so the low priority thread can't run, and the high priority thread stays blocked.

For example, this can happen with Semaphores, Events, etc. etc.

The difficulty is that in these cases, unlike with mutexes, the OS doesn't know which thread will eventually signal the shared object to let the high priority thread go, so it doesn't know who to boost.

Windows has panic mechanisms like the "balance set manager" which look for any thread which is not waiting on a waitable handle, but is getting no CPU time, then they force it to get some CPU time. This will save you if you are in one of these non-mutex priority-inversions, but it takes quite a long time for that to kick in, so it's really a last ditch panic save, if it happens you regret it.

Sometimes I see people talking about mutex priority inversion as if that's a scary issue; it's really not on any modern OS. But non-mutex priority inversion *is*.

Conclusion : beware using non-mutex thread flow control primitives on threads that are not of equal priority !


08-01-11 | A game threading model

Some random ideas.

There is no "main" thread at all, just a lot of jobs. (there is a "main job" in a sense, that runs once a frame and kicks off the other jobs needed to complete that frame)

Run 1 worker thread per core; all workers just run "jobs", they are all interchangeable. This is a big advantage for many reasons; for example if one worker gets swapped out (or some outside process takes over that CPU), the other workers just take over for it, there is never a stall on a specific thread that is swapped out. You don't have to switch threads just to run some job, you can run it directly on yourself. (caveat : one issue is the lost worker problem which we have mentioned before and needs more attention).
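A bare-bones sketch of the "interchangeable workers" idea, assuming a single shared queue guarded by a mutex. `JobSystem` and `run_job_demo` are hypothetical names, and a real system would want per-worker queues, job dependencies, and an answer to the lost worker problem, none of which is attempted here.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class JobSystem
{
public:
    explicit JobSystem(unsigned num_workers)
    {
        assert(num_workers > 0);
        for (unsigned i = 0; i < num_workers; ++i)
            m_workers.emplace_back([this]{ worker_loop(); });
    }

    // Lets the workers drain any queued jobs, then stops them.
    ~JobSystem()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_quit = true;
        }
        m_cv.notify_all();
        for (std::thread & t : m_workers) t.join();
    }

    void push(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push_back(std::move(job));
        }
        m_cv.notify_one();
    }

private:
    // Every worker runs the same loop, so any of them can pick up any job.
    void worker_loop()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this]{ return m_quit || ! m_jobs.empty(); });
                if ( m_jobs.empty() ) return; // quit requested and queue drained
                job = std::move(m_jobs.front());
                m_jobs.pop_front();
            }
            job(); // run outside the lock so other workers can pull jobs
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_jobs;
    bool m_quit = false;
    std::vector<std::thread> m_workers; // declared last : threads start after the rest is built
};

int run_job_demo()
{
    std::atomic<int> sum(0);
    {
        // hardware_concurrency may report 0 when it cannot tell
        unsigned cores = std::max(1u, std::thread::hardware_concurrency());
        JobSystem jobs(cores);
        for (int i = 0; i < 100; ++i)
            jobs.push([&sum, i]{ sum += i; });
    } // destructor joins the workers after all jobs have run
    return sum.load();
}
```

If one worker gets swapped out by the OS, the others keep pulling from the same queue, which is the point of making them interchangeable.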

You also need 1 thread per external device that can stall (eg. disk IO, GPU IO). If the API's to these calls were really designed well for threading this would not be necessary - we need a thread per device simply to wrap the bad API's and provide a clean one out to the workers. What makes a clean API? All device IO needs to just be enqueue'd immediately and then provide a handle that you can query for results or completion. Unfortunately real world device IO calls can stall the calling thread for a long time in unpredictable ways, so they are not truly async on almost any platform. These threads should be high priority, do almost no CPU work, and basically just act like interrupts.

A big issue is how you manage locking game objects. I think the simplest thing conceptually is to do the locking at "game object" granularity, that may not be ideal for performance but it's the easiest way for people to get it right.

You clearly want some kind of reader/writer lock because most objects are read many more times than they are written. In the ideal situation, each object only updates itself (it may read other objects but only writes itself), and you have full parallelism. That's not always possible, you have to handle cross-object updates and loops; eg. A writes A and also writes B , B writes B and also writes A ; the case that can cause deadlock in a naive system.

So, all game objects are referenced through a weak-reference opaque handle. To read one you do something like :

    const Object * rdlock(ObjectHandle h)
and then rely on C's const system to try to ensure that people aren't writing to objects they only have read-locked (yes, I know const is not ideal, but if you make it a part of your system and enforce it through coding convention I think this is probably okay).

In implementation rdlock internally increments a ref on that copy of the object so that the version I'm reading sticks around even if a new version is swapped in by wrlock.

There are various ways to implement write-lock. In all cases I make wrlock take a local copy of the object and return you the pointer to that. That way rdlocks can continue without blocking, they just get the old state. (I assume it's okay for reads to get one-frame-old data) (see note *). wrunlock always just exchanges in the local object copy into the table. rdlocks that were already in progress still hold a ref to the old data, but subsequent rdlocks and wrlocks will get the new data.
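To make the rdlock / wrlock / wrunlock behavior concrete, here is a minimal sketch for a single table slot, assuming std::shared_ptr provides the ref count and a small mutex protects the pointer swap. Object, ObjectSlot and the smart-pointer return types are assumptions for illustration; the text above only specifies a const pointer from rdlock and a local copy from wrlock.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

struct Object { int hp; }; // hypothetical game object state

// One slot of the handle table. Readers share an immutable version;
// a writer edits a private copy that wrunlock swaps in.
class ObjectSlot
{
public:
    explicit ObjectSlot(const Object & init)
        : m_current(std::make_shared<const Object>(init)) {}

    // Takes a ref on the current version so it stays alive while being read.
    std::shared_ptr<const Object> rdlock() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_current;
    }

    // Returns a local copy to mutate; readers are not blocked meanwhile.
    std::unique_ptr<Object> wrlock() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return std::unique_ptr<Object>(new Object(*m_current));
    }

    // Exchanges the modified copy into the table.
    void wrunlock(std::unique_ptr<Object> modified)
    {
        assert(modified);
        std::shared_ptr<const Object> next(std::move(modified));
        std::lock_guard<std::mutex> lock(m_mutex);
        m_current = std::move(next); // old version dies when its last reader lets go
    }

private:
    mutable std::mutex            m_mutex;
    std::shared_ptr<const Object> m_current;
};
```

This sketch does not arbitrate between writers at all; that is what the two variants discussed further down are for.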

One idea is like this : Basically semi-transactional. You want to build up a transaction then commit it. Game object update looks something like this :

    Transaction t;
    vector<ObjectHandle> objects_needed;
    objects_needed = self; 
    for(;;)
    {
        wrlock on all objects_needed;

        .. do your update code ..
        .. update code might find it needs to write another object, then do :

        add new_object to objects_needed
        if ( ! try_wrlock( new_object ) )
            continue; // aborts the current update and will restart with new_object in the objects_needed set

        wrunlock all objects locked
        if ( unlocks committed )
            break; // update done
    }

(in actual C++ implementation the "continue" should be a "throw" , and the for(;;) should be try/catch , because the failed lock could happen down inside some other function; also the throw could tell you what lock caused the exception).
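A sketch of that throw / catch shape, under heavy assumptions: LockTable is a single-threaded stand-in for the real lock system, UpdateContext::require is a hypothetical way for update code to ask for another object, and on_abort is a hypothetical back-off hook. The initial locks use try_wrlock here only so the example can run on one thread; the loop described above takes them with a plain wrlock.

```cpp
#include <cassert>
#include <set>
#include <vector>

typedef int ObjectHandle; // hypothetical handle type

// Thrown from anywhere inside an update when a lock cannot be taken.
struct LockFailed { ObjectHandle handle; };

// Hypothetical single-threaded stand-in for the real lock table.
struct LockTable
{
    std::set<ObjectHandle> held;
    bool try_wrlock(ObjectHandle h) { return held.insert(h).second; }
    void wrunlock(ObjectHandle h) { assert(held.count(h) == 1); held.erase(h); }
};

// Handed to the update code so it can request more objects mid-update.
class UpdateContext
{
public:
    UpdateContext(LockTable & table, std::set<ObjectHandle> & needed,
                  std::vector<ObjectHandle> & locked)
        : m_table(table), m_needed(needed), m_locked(locked) {}

    void require(ObjectHandle h)
    {
        if (!m_needed.insert(h).second)
            return; // already in the set, so already locked this attempt
        if (!m_table.try_wrlock(h))
            throw LockFailed{h}; // abort; h stays in the set for the retry
        m_locked.push_back(h);
    }

private:
    LockTable &                 m_table;
    std::set<ObjectHandle> &    m_needed;
    std::vector<ObjectHandle> & m_locked;
};

// Runs update until it completes; returns the number of attempts.
template <typename Update, typename OnAbort>
int run_update(LockTable & table, ObjectHandle self, Update update, OnAbort on_abort)
{
    std::set<ObjectHandle> needed; // std::set iterates in sorted (standard) order
    needed.insert(self);
    for (int attempts = 1;; ++attempts)
    {
        std::vector<ObjectHandle> locked;
        try
        {
            for (ObjectHandle h : needed)
            {
                if (!table.try_wrlock(h))
                    throw LockFailed{h};
                locked.push_back(h);
            }
            UpdateContext ctx(table, needed, locked);
            update(ctx);
            for (ObjectHandle h : locked)
                table.wrunlock(h); // commit
            return attempts;
        }
        catch (const LockFailed & failure)
        {
            for (ObjectHandle h : locked)
                table.wrunlock(h); // drop everything and restart
            on_abort(failure.handle); // says which lock caused the abort
        }
    }
}
```

Because the needed set persists across attempts, the retry takes the newly discovered object along with the others in sorted order.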

There are two variants here that I believe both work; I'm not sure what the tradeoffs are :

1. More mutex like. wrlock is exclusive, only one thread can lock an object at a time. wrunlock at the end of the update always proceeds unconditionally - if you got the locks you know you can just unlock them all, no problem. The issue is deadlock for different lock orders; we handle that with the try_lock : we abort all the locks, go back to the start of the update, and retake the locks in a standardized order.

2. More transaction like. wrlock always proceeds without blocking, multiple threads can hold wrlock at the same time. When you wrunlock you check to see that all the objects have the same revision number as when you did the wrlock, and if not then it means some other commit has come in while you were running, so you abort the unlock and retry. So there's no abort/retry at lock time, it's now at unlock time.

In this simplistic approach I believe that #1 is always better. However, #2 could be better if it checked to see if the object was not actually changed (if it's a common case to take a wrlock because you thought you needed it, but then not actually modify the object).
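Variant #2 for a single object might be sketched like this, assuming a per-object revision counter and a small mutex held only during the snapshot and the commit. VersionedSlot, WriteTicket and try_wrunlock are hypothetical names. A real multi-object commit would have to validate every revision before publishing any of the copies, which this sketch does not attempt.

```cpp
#include <cassert>
#include <mutex>

struct Object { int hp; }; // hypothetical game object state

struct VersionedSlot
{
    explicit VersionedSlot(const Object & init) : value(init), revision(0) {}
    Object     value;
    unsigned   revision; // bumped on every successful commit
    std::mutex mutex;    // held only for the brief snapshot / commit
};

struct WriteTicket
{
    Object   copy;          // private working copy
    unsigned revision_seen; // slot revision when the copy was taken
};

// Variant 2 wrlock: never waits for other writers, it just snapshots.
WriteTicket wrlock(VersionedSlot & slot)
{
    std::lock_guard<std::mutex> lock(slot.mutex);
    WriteTicket ticket = { slot.value, slot.revision };
    return ticket;
}

// Commits only if nobody else committed since the snapshot.
// Returns false when the caller must retry its update.
bool try_wrunlock(VersionedSlot & slot, const WriteTicket & ticket)
{
    std::lock_guard<std::mutex> lock(slot.mutex);
    assert(ticket.revision_seen <= slot.revision); // a ticket cannot come from the future
    if (ticket.revision_seen != slot.revision)
        return false; // another commit came in while we were running
    slot.value = ticket.copy;
    ++slot.revision;
    return true;
}
```

The "not actually changed" optimization mentioned above would amount to comparing ticket.copy against slot.value and skipping the revision bump when they are equal.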

Note that in both cases it helps to separate a game object's mutable portion from its "definition". eg. the things about it that will never change (maybe its mesh, some AI attributes, etc.) should be held to the side somehow and not participate in the wrlock mechanism. This is easy to do if you're willing to accept another pointer chase, harder to do if you want it to just be different portions of the same continuous memory block.

Another issue with this is if the game object update needs to fire off things that are not strictly in the game object transaction system. For example, say it wants to start a Job to do some path finding or something. You can't fire that right away because the transaction might get aborted. So instead you put it in the "Transaction t" thing to delay it until the end of your update, and only if your unlocks succeed do the jobs and such get run.
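The delayed part of the Transaction could be as simple as a list of callables, sketched below. The method names defer, abort and commit are assumptions; the text only names the Transaction object itself.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical Transaction: side effects are recorded during the update
// and only run once the unlocks have succeeded.
class Transaction
{
public:
    // Record something to do after a successful commit (eg. start a Job).
    void defer(std::function<void()> action)
    {
        assert(action);
        m_deferred.push_back(std::move(action));
    }

    // The update was aborted: forget everything that was recorded.
    void abort() { m_deferred.clear(); }

    // The unlocks went through: run the recorded actions in order.
    void commit()
    {
        std::vector<std::function<void()> > actions;
        actions.swap(m_deferred); // leaves the transaction empty for reuse
        for (std::function<void()> & action : actions)
            action();
    }

private:
    std::vector<std::function<void()> > m_deferred;
};
```

On a retry you would call abort() before running the update code again, so that actions recorded by the failed attempt never fire.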

(* = I believe it's okay to read one frame old data. Note that in a normal sequential game object update loop, where you just do :


for each object
    object->update();

each object is reading a mix of old and new data; if it reads an item in the list before itself, it reads new data, if it reads an item after itself, it reads old data; thus whether it gets old or new data is a "race" anyway, and your game must be okay with that. Any time you absolutely must read the most recent data you can always do a wrlock instead of a rdlock.

You can also address this in the normal way we do in games, which is to separate objects into a few groups and update them in chunks like "phase 1", then "phase 2", etc. ; objects that are all within the same phase can't rely on their temporal order, but objects in a later phase do know that they see the latest version of the earlier phase. This is the standard way to make sure you don't have one-frame-latency issues.

*).

The big issue with all this is how to ensure that you are writing correct code. The rules are :

1. rdlock returns a const * ; never cast away const

2. game object updates must only mutate data in game objects - they must not mutate global state or anything outside of the limited transaction system. This is hard to enforce; one way might be to make it absolutely clear with a function name convention which functions are okay to call from inside object updates and which are not.

For checking this, you could set a TLS flag like "in_go_update" when you are in the for {} loop, then functions that you know are not safe in the GO loop can just do ASSERT( ! in_go_update ); which provides a nice bit of safety.

3. anything you want to do in game object update which is not just mutating some GO variables needs to be put into the Transaction buffer so it can be delayed until the commit goes through. Delayed transaction stuff cannot fail; eg. it doesn't get to participate in the retry/abort, so it must not require multiple mutexes that could deadlock. eg. they should pretty much always just be Job creations or destructions that are just pushes/pops from queues.
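The TLS flag check from rule 2 might look like the sketch below, assuming C++11 thread_local and the standard assert macro in place of ASSERT. GoUpdateScope, g_frame_counter and advance_frame_counter are hypothetical names made up for the example.

```cpp
#include <cassert>

// Per-thread flag: true while this thread is inside a game object update.
thread_local bool in_go_update = false;

// RAII guard placed around the update loop body, so the flag is
// cleared even when an aborted update unwinds with an exception.
struct GoUpdateScope
{
    GoUpdateScope()  { assert(!in_go_update); in_go_update = true; }
    ~GoUpdateScope() { in_go_update = false; }
};

// Example of global state that lives outside the transaction system.
int g_frame_counter = 0;

// Not safe to call from inside a GO update, so it checks the flag.
void advance_frame_counter()
{
    assert(!in_go_update);
    ++g_frame_counter;
}
```

The check costs nothing in release builds where assert compiles out, so it is purely a debugging aid.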

Another issue that I haven't touched on is the issue of dependencies. A GO update could be dependent on another GO or on a Job completion. You could use the freedom of scheduling order to reschedule GOs whose dependencies aren't done for later in the tick, rather than stalling.


07-31-11 | An example that needs seq_cst ?

No, not really. I thought I had found the great white whale, an algorithm that actually needs sequential consistency, but it turned out to be our old friend the StoreLoad problem.

It's worth having a quick look at because some of the issues are ones that pop up often.

I was rolling a user-space futex emulator. To test it I wrote a little mutex. A very simple mutex based on a very simplified futex might look like this :


struct futex_mutex2
{
    std::atomic<int> m_state;

    futex_mutex2() : m_state(0)
    {
    }
    ~futex_mutex2()
    {
    }

    void lock(futex_system * system)
    {
        if ( m_state($).exchange(1,rl::mo_acq_rel) )
        {
            void * h = system->prepare_wait();
            while ( m_state($).exchange(1,rl::mo_acq_rel) )
            {
                system->wait(h);
            }
            system->retire_wait();
        }
    }

    void unlock(futex_system * system)
    {
        m_state($).store(0,rl::mo_release);

        system->notify_one();
    }
};

(note that the actual "futexiness" of it is removed now for simplicity of this test ; also of course you should exchange state to a contended flag and all that, but that hides the problem, so that's removed here).

Then the super-simplified futex system (with all the actual futexiness removed, so that it's just a very simple waitset) is :


//#define MO    mo_seq_cst
#define MO  mo_acq_rel

struct futex_system
{
    HANDLE          m_handle;
    atomic<int>     m_count;

    /*************/
    
    futex_system()
    {
        m_handle = CreateEvent(NULL,0,0,NULL);
    
        m_count($).store(0);
    }
    ~futex_system()
    {
        CloseHandle(m_handle);  
    }
        
    void * prepare_wait( )
    {
        m_count($).fetch_add(1,MO);
        
        return (void *) m_handle;
    }

    void wait(void * h)
    {
        WaitForSingleObject((HANDLE)h, INFINITE);
    }

    void retire_wait( )
    {
        m_count($).fetch_add(-1,MO);
    }
    
    void notify_one( )
    {
        if ( m_count($).load(mo_acquire) == 0 ) // mo_seq_cst
            return;
        
        SetEvent(m_handle);
    }
};

So I was finding that it didn't work unless MO was seq_cst (and the load too).

The first point of note is that when I had the full futex system in there which had some internal std::mutexes - there was no bug, the ops on count($) didn't need to be seq cst. That's a common and nasty problem - if you have some ops internally that are seq_cst (such as mutex lock unlock), it can hide the fact that your other atomics are not memory ordered correctly. It was only when I removed the mutexes that the problem revealed itself, but it was actually there all along.

We've discussed this before when we asked "do mutexes need to be seq cst" ; the answer is NO if you just want them to provide mutual exclusion. But if you want them to act like an OS mutex, then the answer is YES. And the issue is that people can write code that is basically relying on the OS mutex being a barrier that provides more than just mutual exclusion.

The next point is that when I reduced the test down to just 2 threads, I still found that I needed seq_cst. That should be a tipoff that the problem does not actually arise from a need for total order. A true seq_cst problem should only show up when you go over 2 threads.

The real problem of course was here :


    void unlock(futex_system * system)
    {
        m_state($).store(0,rl::mo_release);

        //system->notify_one();

        #StoreLoad

        if ( system->m_count($).load(mo_acquire) == 0 ) // mo_seq_cst
            return;
        
        SetEvent(system->m_handle);
    }

we just need a StoreLoad barrier there. It should be obvious why we need a StoreLoad there but I'll be very explicit :

same as :

    void unlock(futex_system * system)
    {
        m_state($).store(0,rl::mo_release);

        int count = system->m_count($).load(mo_acquire);

        if ( count == 0 )
            return;
        
        SetEvent(system->m_handle);
    }

same as :

    void unlock(futex_system * system)
    {
        int count = system->m_count($).load(mo_acquire);

        // (*1)

        m_state($).store(0,rl::mo_release);

        if ( count == 0 )
            return;
        
        SetEvent(system->m_handle);
    }

so now at (*1) we have already loaded count and got a 0 (no waiters); then the other thread trying to lock the mutex sees state == 1, locked, so it incs count and goes to sleep, and we return, and we have a deadlock.

As noted in the first post on this topic, there's no #StoreLoad in C++0x , so you wind up needing seq cst. Note that the case we cooked up here is almost identical to Thomasson's problem with "event count" so you can read about that :

Synchronization Algorithm Verificator for C++0x - Page 2
really simple portable eventcount... - comp.programming.threads Google Groups
C++0x sequentially consistent atomic operations - comp.programming.threads Google Groups
C++0x memory_order_acq_rel vs memory_order_seq_cst
appcoreac_queue_spsc - comp.programming.threads Google Groups
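One way to spell the barrier in standard C++11 is std::atomic_thread_fence(std::memory_order_seq_cst), which is still a seq_cst operation, just confined to the one spot that needs it. The sketch below folds the mutex and the waitset together and swaps the Win32 event for a portable stand-in (AutoResetEvent, a hypothetical helper). The fence on the waiter side, between the count increment and the state exchange, is an assumption added here to pair with the one in unlock. hammer is just a small stress driver.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for the Win32 auto-reset event.
class AutoResetEvent
{
public:
    AutoResetEvent() : m_signaled(false) {}
    void set()
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_signaled = true; }
        m_cv.notify_one();
    }
    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_signaled; });
        m_signaled = false; // auto-reset: one wait consumes one signal
    }
private:
    std::mutex              m_mutex;
    std::condition_variable m_cv;
    bool                    m_signaled;
};

struct fenced_mutex
{
    std::atomic<int> m_state; // 0 = unlocked, 1 = locked
    std::atomic<int> m_count; // number of threads in the wait path
    AutoResetEvent   m_event;

    fenced_mutex() : m_state(0), m_count(0) {}

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acq_rel) == 0)
            return;
        m_count.fetch_add(1, std::memory_order_acq_rel); // prepare_wait
        // Waiter side: the count increment must be ordered before re-reading state.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        while (m_state.exchange(1, std::memory_order_acq_rel) != 0)
            m_event.wait();
        m_count.fetch_sub(1, std::memory_order_acq_rel); // retire_wait
    }

    void unlock()
    {
        assert(m_state.load(std::memory_order_relaxed) == 1);
        m_state.store(0, std::memory_order_release);
        // The "#StoreLoad": the count load may not move above the state store.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (m_count.load(std::memory_order_acquire) == 0)
            return;
        m_event.set();
    }
};

// Stress driver: returns the final value of a counter guarded by the mutex.
int hammer(int num_threads, int iterations)
{
    fenced_mutex mutex;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back([&mutex, &counter, iterations] {
            for (int i = 0; i < iterations; ++i)
            {
                mutex.lock();
                ++counter;
                mutex.unlock();
            }
        });
    for (std::thread & th : threads)
        th.join();
    return counter;
}
```

With the two fences in place, either the unlocker sees the waiter's count or the waiter sees state == 0, so the sleep-forever interleaving at (*1) is ruled out. A stress run passing is of course not a proof; a checker like Relacy is the better tool for that.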


07-30-11 | A look at some bounded queues - part 2

Okay, let's look into making an MPMC bounded FIFO queue.

We can use basically the same two ideas that we worked up last time.

First let's try to do one based on the read and write indexes being atomic. Consider the consumer; the check for empty is now much more race prone, because there may be another consumer simultaneously reading, which could turn the queue into the empty state while you are reading. Thus we need a single atomic moment in which to detect "empty" and reserve our read slot.

The most brute-force way to do this kind of thing is always to munge the two variables together. In this case we stick the read & write index into one int together. Now we can atomically check "empty" in one go. We're going to put rdwr in a 32-bit int and use the top and bottom 16 bits for the read index and write index.
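The packing itself can be written out as a few helpers, sketched here with hypothetical names. The read index lives in the top 16 bits, so bumping it is a plain add that wraps off the top of the word. The masking in advance_wr is an assumption added here: a plain +1 on the bottom half would carry into the read index when the write index wraps.

```cpp
#include <cassert>
#include <cstdint>

// rdwr layout: [ read index : 16 bits | write index : 16 bits ]

inline std::uint32_t pack_rdwr(std::uint32_t rd, std::uint32_t wr)
{
    assert(rd <= 0xFFFFu && wr <= 0xFFFFu);
    return (rd << 16) | wr;
}

inline std::uint32_t rd_index(std::uint32_t rdwr) { return (rdwr >> 16) & 0xFFFFu; }
inline std::uint32_t wr_index(std::uint32_t rdwr) { return rdwr & 0xFFFFu; }

// Empty is now a test on one atomically loaded word.
inline bool is_empty(std::uint32_t rdwr) { return rd_index(rdwr) == wr_index(rdwr); }

// Read index is in the top half, so the carry just falls off the word.
inline std::uint32_t advance_rd(std::uint32_t rdwr) { return rdwr + (1u << 16); }

// Write index is in the bottom half; mask so a wrap cannot carry into rd.
inline std::uint32_t advance_wr(std::uint32_t rdwr)
{
    return (rdwr & 0xFFFF0000u) | ((rdwr + 1u) & 0xFFFFu);
}
```

Since the indexes wrap at 65536, taking them modulo the array size only stays consistent across the wrap if the array size divides 65536, eg. any power of two up to that.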

So you can reserve a read slot something like this :


    nonatomic<t_element> * read_fetch()
    {
        unsigned int rdwr = m_rdwr($).load(mo_acquire);
        unsigned int rd;
        for(;;)
        {
            rd = (rdwr>>16) & 0xFFFF;
            unsigned int wr = rdwr & 0xFFFF;
            
            if ( wr == rd ) // empty
                return NULL; // nothing to read ; the return type is a pointer, not a bool
                
            if ( m_rdwr($).compare_exchange_weak(rdwr,rdwr+(1<<16),mo_acq_rel) )
                break;
        }
                
        nonatomic<t_element> * p = & ( m_array[ rd % t_size ] );

        return p;
    }

but this doesn't work by itself. We have succeeded in atomically checking "empty" and reserving our read slot, but now the read index no longer indicates that the read has completed, it only indicates that a reader reserved that slot. For the writer to be able to write to that slot it needs to know the read has completed, so we need to publish the read through a separate read counter.

The end result is this :


template <