I'm pretty sure cloth diapers are bullshit. I'm about to cancel my diaper service. In this first week I've been using a semi-alternating mix of cloth and disposable. I assumed that I would start out with disposables just for ease in the first few days and then switch to cloth because it's "better", but I don't think I will.
(I make all my decisions now based only on 1. personal observations and 2. serious scientific studies where I can read the original papers. I try to avoid and discount 3. journalism 4. hearsay 5. the internet 6. mass-market nonfiction. I think they are garbage and mental poison.)
What I'm seeing is :
Disposable diapers actually work the way they claim to. The seal around the borders is good. The entire diaper itself has a nice low profile so is not too bulky or uncomfortable. But most importantly, they actually do trap and absorb moisture. When baby has a heavy pee in a disposable diaper, the moisture stays right in one little spot and doesn't spread all over. When I remove the diaper I can feel her skin all over the nether regions is pretty dry.
Cloth diapers don't. The worst aspect is that when baby has a heavy pee, the cloth soaks it up, and because it's cloth and wicks moisture, the pee is spread all over her entire lower parts. When I get the diaper off, she's soaking wet all over. (and yes of course I'm changing her almost instantly after peeing because at this point we're watching her constantly). That alone is enough to turn me off cloth diapers, but there's lots more that sucks about them. It's really hard to get the diaper cover on such that it actually makes a water-tight seal, so leakage is much more likely (and if you do try to make it water tight, it's easy to make it too tight and cut off circulation, which I accidentally did once). The cloth diaper alone looks pretty comfortable on her, but the diaper cover is much rougher and more bulky than a disposable; the result is that she has this huge awkward thing on.
When you add the inconvenience of cloth diapers (longer changing times, having to store poop in your house, taking the pail in and out for pickup), it just seems like a massive lose.
The only possible argument pro-cloth that makes sense to me is the reduction of the landfill load. Now, environmental arguments are always complicated; there are arguments for the other side based on the environmental cost of washing (though I think they're bogus). But even assuming that the environmental case is clear, being a hypocritical liberal I wouldn't actually inconvenience myself and discomfort my baby for the benefit of the landfill.
Eh, actually I take back that false self-accusation. That's a retarded Fox News style "gotcha" that's based on misrepresentation and not understanding. I've never advocated the standard liberal martyrdom (and if I once did, I certainly don't now). I don't believe in choosing to undermine yourself because you believe the world would be better if everyone did it. I believe in changing the laws such that they encourage you to make the choice that is better for the world. eg. people who don't drive because they believe it's evil, even if it would be much to their benefit, are just being dumb martyrs. The US government massively subsidizes driving, so if you don't take advantage of that you are essentially paying for other people to drive. I would love it if the government would subsidize *not driving* rather than the other way around, but until they do I'm driving up a storm. (tangent : the massive subsidy for Teslas is a great example of the way that Dems and Reps are in fact both really working for the same cause : creating loopholes and kickbacks so that they can give money to rich people).
I'm a big tangent wanderer. My political philosophy in a nutshell :
Government's role is to create a market structure (through laws, regulation, the Fed, direct market action, etc) such that when each actor maximizes their own personal utility, the net result is as good for the entire world (nation) as possible.
(if you're out of high school (or the 18th century) you should know that a free market does not do that on its own)
(And crucially, "good for" must be defined on something like a sum-of-logs scale, or perhaps just maximize the median, or minimize the number in poverty; if you maximize the sum (basically GDP) then giving huge profits to Larry Ellison and fucking everyone else looks like it's "good for the world")
And, uh, oh yeah, cloth diapers suck.
I want to write about the wonderful experience of having a home birth (see *2), but don't want to intrude on Tasha's privacy. Suffice it to say it was really good, so good to be home and have everything at hand to make Tasha comfortable, and then be able to take baby in our arms and settle into bed right away. We spent the first 36 hours after birth all in bed together and I think that time was really important.
I've always wanted to have kids, but I'm (mostly) glad that I waited this long. For one thing Tasha is a wonderful mom and I'm glad I found her. But also, I realize now that I wasn't ready in my twenties. I've changed a lot in the last five years and I'm a much better person now. I've learned important lessons that are helping me a lot in this challenging time, like to do hard work correctly you have to not only complete the task but also keep a good attitude and be nice to the people around you while you do it. And that when you are tired and hungry is when you can really show your character; anyone can have a good attitude when they're fresh, but if you get nasty when the going gets tough then you are nasty. etc. standard cbloom slowly figures out things that most people learned in their teens.
Now for some old-style ranting.
1. "We had a baby". No you fucking did not. Your wife had a baby. If you were a really good husband, you held her hand and got her snacks. She squeezed a watermelon out of her vagina. You do not get to take any credit for that act, it was all her. It's a bit like Steve Jobs saying "we invented" anything; no you did not you fucking credit-stealing douchebag, your company didn't even invent it, much less you.
(tangent : I can't stand the political correctness in sport post-game interviews these days; they're all so sanitized and formulaic. They must go to interview coaching classes or something because everyone says exactly the same things. Of course it's not the athlete's fault, they would love to have emotional honest outbursts, it's the god damn stupid public who throw a conniption if anybody says anything remotely true. In particular this post reminds me of how athletes always immediately go "it wasn't just me, it was the team"; no it was not, Kobe, you just had an 80 point game, it was all fucking you, don't give me this bullshit credit to the team stuff. Be a man and say "*I* won this game".)
2. People are busy-body dicks. When we would tell acquaintances about our plans to have a home birth, a good 25% would feel like they had to tell us what a bad idea that was and nag us about the dangers of childbirth. Shut the fuck up you stupid asshole. First of all, don't you think that maybe we've researched that more than you before making our decision, so you don't know WTF you're talking about? Second of all, we're not going to change our mind because of your nagging, so all you're doing is being nasty about something you're not going to change. We didn't ask for your opinion, you can just stay the hell out of it. (The doctors that we would occasionally see for tests were often negative and naggy as well, which only made us more confident in our choice).
It's a bit like if a friend tells you they're marrying someone and you go "her?". Even if the marriage is a questionable choice, they're not going to stop it due to your misgivings, so all you're doing is adding some unpleasantness to their experience.
You always run into these idiots when you do software reviews or brainstorming sessions. You'll call a meeting to discuss revisions to the boss fight sequence, and some asshole will always chime in with "I really think the whole idea of boss fights sucks and we should start over". Umm, great, thanks, very helpful. We're not going to tear up the whole design of the game a few months from shipping, so maybe you could stick to the topic at hand and get some kind of clue about what things are reasonable to change and which need to be taken as a given and worked within as constraints.
Like when I'd ask for reviews of Oodle, a few of the respondents would give me something awesomely unhelpful like "I don't like the entire style of the API, and I'd throw it out and do a new one" , or "actually I think a paging + data compression library is a bad idea and I'd just start over on something else". Great, thanks; I might agree with you but obviously you must know that that is not going to happen and it's not what I was asking for, so if you don't want to say anything helpful then just say "no".
(* = obviously not trivial if you're trying to minimize the memory ordering constraints, as evidenced by the revisions to this post that were required; it is trivial if you just make everything seq_cst)
Previous writings on this topic :
Smart & Weak Pointers - valuable tools for games - 03-27-04
cbloom rants 03-22-08 - 6
cbloom rants 07-05-10 - Counterpoint 2
cbloom rants 08-01-11 - A game threading model
cbloom rants 03-05-12 - Oodle Handle Table
The primary ops conceptually are :
Add object to table; gives it a WeakRef id
WeakRef -> OwningRef (might be null)
OwningRef -> naked pointer
OwningRef construct/destruct = ref count inc/dec
The full code is in here : cbliblf.zip , but you can get a
taste for how it works from the ref count maintenance code :
// IncRef looks up the weak reference; returns null if lost
// (this is the only way to resolve a weak reference)
Referable * IncRef( handle_type h )
{
handle_type index = handle_get_index(h);
LF_OS_ASSERT( index >= 0 && index < c_num_slots );
Slot * s = &s_slots[index];
handle_type guid = handle_get_guid(h);
// this is just an atomic inc of state
// but checking guid each time to check that we haven't lost this slot
handle_type state = s->m_state.load(mo_acquire);
for(;;)
{
if ( state_get_guid(state) != guid )
return NULL;
// assert refcount isn't hitting max
LF_OS_ASSERT( state_get_refcount(state) < state_max_refcount );
handle_type incstate = state+1;
if ( s->m_state.compare_exchange_weak(state,incstate,mo_acq_rel,mo_acquire) )
{
// did the ref inc
return s->m_ptr;
}
// state was reloaded, loop
}
}
// IncRefRelaxed can be used when you know a ref is held
// so there's no chance of the object being gone
void IncRefRelaxed( handle_type h )
{
handle_type index = handle_get_index(h);
LF_OS_ASSERT( index >= 0 && index < c_num_slots );
Slot * s = &s_slots[index];
handle_type state_prev = s->m_state.fetch_add(1,mo_relaxed);
state_prev;
// make sure we were used correctly :
LF_OS_ASSERT( handle_get_guid(h) == state_get_guid(state_prev) );
LF_OS_ASSERT( state_get_refcount(state_prev) >= 0 );
LF_OS_ASSERT( state_get_refcount(state_prev) < state_max_refcount );
}
// DecRef
void DecRef( handle_type h )
{
handle_type index = handle_get_index(h);
LF_OS_ASSERT( index >= 0 && index < c_num_slots );
Slot * s = &s_slots[index];
// no need to check guid because I must own a ref
handle_type state_prev = s->m_state.fetch_add((handle_type)-1,mo_release);
LF_OS_ASSERT( handle_get_guid(h) == state_get_guid(state_prev) );
LF_OS_ASSERT( state_get_refcount(state_prev) >= 1 );
if ( state_get_refcount(state_prev) == 1 )
{
// I took refcount to 0
// slot is not actually freed yet; someone else could IncRef right now
// the slot becomes inaccessible to weak refs when I inc guid :
// try to inc guid with refcount at 0 :
handle_type old_guid = handle_get_guid(h);
handle_type old_state = make_state(old_guid,0); // == state_prev-1
handle_type new_state = make_state(old_guid+1,0); // == old_state + (1<<handle_guid_shift);
if ( s->m_state.compare_exchange_strong(old_state,new_state,mo_acq_rel,mo_relaxed) )
{
// I released the slot
// cmpx provides the acquire barrier for the free :
FreeSlot(s);
return;
}
// somebody else mucked with me
}
}
The maintenance of ref counts only requires relaxed atomic increment & release atomic decrement (except when the pointed-at object is
initially made and finally destroyed, then some more work is required). Even just the relaxed atomic incs
could get expensive if you did a ton of them, but my philosophy for how to use this kind of system is that you inc & dec refs
as rarely as possible. The key thing is that you don't write functions that take owning refs as arguments, like :
void bad_function( OwningRefT<Thingy> sptr )
{
more_bad_funcs(sptr);
}
void Stuff::bad_caller()
{
OwningRefT<Thingy> sptr( m_weakRef );
if ( sptr != NULL )
{
bad_function(sptr);
}
}
hence doing lots of inc & decs on refs all over the code. Instead you write all your
code with naked pointers, and only use the smart pointers where they are needed to ensure ownership for the lifetime
of usage. eg. :
void good_function( Thingy * ptr )
{
more_good_funcs(ptr);
}
void Stuff::good_caller()
{
OwningRefT<Thingy> sptr( m_weakRef );
Thingy * ptr = sptr.GetPtr();
if ( ptr != NULL )
{
good_function(ptr);
}
}
If you like formal rules, they're something like this :
1. All stored variables are either OwningRef or WeakRef , depending on whether it's
an "I own this" or "I see this" relationship. Never store a naked pointer.
2. All variables in function call args are naked pointers, as are variables on the
stack and temp work variables, when possible.
3. WeakRef to pointer resolution is only provided as WeakRef -> OwningRef. Naked pointers
are only retrieved from OwningRefs.
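To make the three rules concrete, here is a minimal sketch on plain std::atomic. It is not the cbliblf code: the single Slot, the state layout (32-bit guid in the high half, 32-bit refcount in the low half), and the names make_state, inc_ref, dec_ref, add_object and OwningRef are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: [ guid:32 | refcount:32 ] packed into one 64-bit state word.
using state_t = std::uint64_t;
constexpr int kGuidShift = 32;

constexpr state_t make_state(std::uint32_t guid, std::uint32_t refcount) {
    return (state_t(guid) << kGuidShift) | refcount;
}
constexpr std::uint32_t state_guid(state_t s) { return std::uint32_t(s >> kGuidShift); }
constexpr std::uint32_t state_refcount(state_t s) { return std::uint32_t(s); }

struct Thingy { int value; };

struct Slot {
    std::atomic<state_t> state{make_state(0, 0)};
    Thingy* ptr = nullptr;
};

// Rule 1: this is what gets stored for an "I see this" relationship.
struct WeakRef { Slot* slot; std::uint32_t guid; };

// Hypothetical "add to table": publish the object with one ref held by the creator.
WeakRef add_object(Slot& slot, Thingy* obj) {
    std::uint32_t guid = state_guid(slot.state.load(std::memory_order_relaxed));
    slot.ptr = obj;
    slot.state.store(make_state(guid, 1), std::memory_order_release);
    return WeakRef{&slot, guid};
}

// Resolve a weak ref: bump the refcount only while the guid still matches.
Thingy* inc_ref(const WeakRef& w) {
    state_t s = w.slot->state.load(std::memory_order_acquire);
    for (;;) {
        if (state_guid(s) != w.guid) return nullptr;  // object is gone
        if (w.slot->state.compare_exchange_weak(s, s + 1,
                std::memory_order_acq_rel, std::memory_order_acquire))
            return w.slot->ptr;
        // s was reloaded by the failed CAS; loop
    }
}

// Drop a ref; whoever takes the count to 0 tries to bump the guid and free.
void dec_ref(const WeakRef& w) {
    state_t prev = w.slot->state.fetch_sub(1, std::memory_order_release);
    if (state_refcount(prev) == 1) {
        state_t expected = make_state(w.guid, 0);
        if (w.slot->state.compare_exchange_strong(expected, make_state(w.guid + 1, 0),
                std::memory_order_acq_rel, std::memory_order_relaxed)) {
            delete w.slot->ptr;
            w.slot->ptr = nullptr;
        }
    }
}

// Rule 3: the only way from WeakRef to a pointer is through an OwningRef.
class OwningRef {
public:
    explicit OwningRef(const WeakRef& w) : w_(w), ptr_(inc_ref(w)) {}
    ~OwningRef() { if (ptr_) dec_ref(w_); }
    OwningRef(const OwningRef&) = delete;
    OwningRef& operator=(const OwningRef&) = delete;
    Thingy* GetPtr() const { return ptr_; }
private:
    WeakRef w_;
    Thingy* ptr_;
};

// Rule 2: work functions take naked pointers and do no ref counting.
int good_function(Thingy* ptr) { return ptr->value * 2; }

// One inc/dec pair per call chain; returns -1 if the object was destroyed.
int good_caller(const WeakRef& w) {
    OwningRef owner(w);
    Thingy* ptr = owner.GetPtr();
    return ptr ? good_function(ptr) : -1;
}
```

The point of the shape is that good_caller pays for exactly one atomic inc and one atomic dec no matter how deep the call chain under good_function goes.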
And obviously there are lots of enhancements to the system that are possible. A major one that I recommend is to put more information in the reference table state word. If you use a 32-bit weak reference handle, and a 64-bit state word, then you have 32-bits of extra space that you can check for free with the weak reference resolution. You could put some mutex bits in there (or an rwlock) so that the state contains the lock for the object, but I'm not sure that is a big win (the only advantage of having the lock built into the state is that you could atomically get a lock and inc refcount in a single op). A better usage is to put some object information in there that can be retrieved without chasing the pointer and inc'ing the ref and so on.
For example in Oodle I store the status of the object in the state table. (Oodle status is a progression through Invalid->Pending->Done/Error). That way I can take a weak ref and query status in one atomic load. I also store some lock bits, and you aren't allowed to get back naked pointers unless you have a lock on them.
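A rough sketch of the status-in-state idea, assuming a made-up 64-bit layout of guid, status and refcount fields. The field widths and the names pack_state and query_status are hypothetical and are not Oodle's actual layout.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum class Status : std::uint8_t { Invalid = 0, Pending = 1, Done = 2, Error = 3 };

// Assumed layout of the 64-bit state word: [ guid:32 | status:8 | refcount:24 ]
constexpr std::uint64_t pack_state(std::uint32_t guid, Status st, std::uint32_t refcount) {
    return (std::uint64_t(guid) << 32) | (std::uint64_t(st) << 24) | (refcount & 0xFFFFFFu);
}
constexpr std::uint32_t state_guid(std::uint64_t s) { return std::uint32_t(s >> 32); }
constexpr Status state_status(std::uint64_t s) { return Status((s >> 24) & 0xFFu); }

struct Slot { std::atomic<std::uint64_t> state{0}; };

// One atomic load: no refcount traffic and no pointer chase.
// A stale weak ref (guid mismatch) reads as Invalid.
Status query_status(const Slot& slot, std::uint32_t weak_guid) {
    std::uint64_t s = slot.state.load(std::memory_order_acquire);
    if (state_guid(s) != weak_guid) return Status::Invalid;
    return state_status(s);
}
```

Because the guid check and the status read come from the same load, the answer is consistent even if the slot is being recycled concurrently.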
The code for the weak ref table is now in the cbliblf.zip that I made for the last post. Download : cbliblf.zip
( The old cblib has a non-LF weak reference table that's similar for comparison. It's also more developed with helpers and fancier templates and such that could be ported to this version. Download : cblib.zip )
ADDENDUM : alternative DecRef that uses CAS instead of atomic decrement. Removes the two-atomic free path.
Platforms that implement atomic add as a CAS loop should probably just use this form. Platforms that have
true atomic add should use the previously posted version.
// DecRef
void DecRef( handle_type h )
{
handle_type index = handle_get_index(h);
LF_OS_ASSERT( index >= 0 && index < c_num_slots );
Slot * s = &s_slots[index];
// no need to check guid because I must own a ref
handle_type state_prev = s->m_state($).load(mo_relaxed);
handle_type old_guid = handle_get_guid(h);
for(;;)
{
// I haven't done my dec yet, guid must still match :
LF_OS_ASSERT( state_get_guid(state_prev) == old_guid );
// check current refcount :
handle_type state_prev_rc = state_get_refcount(state_prev);
LF_OS_ASSERT( state_prev_rc >= 1 );
if ( state_prev_rc == 1 )
{
// I'm taking refcount to 0
// also inc guid, which releases the slot :
handle_type new_state = make_state(old_guid+1,0);
if ( s->m_state($).compare_exchange_weak(state_prev,new_state,mo_acq_rel,mo_relaxed) )
{
// I released the slot
// cmpx provides the acquire barrier for the free :
FreeSlot(s);
return;
}
}
else
{
// this is just a decrement
// but have to do it as a CAS to ensure state_prev_rc doesn't change on us
handle_type new_state = state_prev-1;
LF_OS_ASSERT( new_state == make_state(old_guid, state_prev_rc-1) );
if ( s->m_state($).compare_exchange_weak(state_prev,new_state,mo_release,mo_relaxed) )
{
// I dec'ed a ref
return;
}
}
}
}
I thought I'd make a super simple one in the correct modern style. Download : cbliblf.zip
(If you want a big fully functional much-more-complete library, Intel TBB is the best I've seen. The problem with TBB is that it's huge and entangled, and the license is not clearly free for all use).
There are two pieces here :
"cblibCpp0x.h" provides atomic and such in C++0x style for MSVC/Windows/x86 compilers that don't have real C++0x yet. I have made zero attempt to make this header syntactically identical to C++0x, there are various intentional and unintentional differences.
"cblibLF.h" provides some simple lockfree utilities (mostly queues) built on C++0x atomics.
"cblibCpp0x.h" is kind of by design not at all portable. "cblibLF.h" should be portable to any C++0x platform.
WARNING : this stuff is not super well tested because it's not what I use in Oodle. I've mostly copy-pasted this from my Relacy test code, so it should be pretty strong but there may have been some copy-paste errors.
ADDENDUM : In case it's not clear, you do not *want* to use "cblibCpp0x.h". You want to use real Cpp0x atomics provided by your compiler. This is a temporary band-aid so that people like me who use old compilers can get a cpp0x stand-in, so that they can do work using the modern syntax. If you're on a gcc platform that has the __atomic extensions but not C1X, use that.
You should be able to take any of the C++0x-style lockfree code I've posted over the years and use it with "cblibCpp0x.h" , perhaps with some minor syntactic fixes. eg. you could take the fastsemaphore wrapper and put the "semaphore" from "cblibCpp0x.h" in there as the base semaphore.
Here's an example of what the objects in "cblibLF.h" look like :
//=================================================================
// spsc fifo
// lock free for single producer, single consumer
// requires an allocator
// and a dummy node so the fifo is never empty
template <typename t_data>
struct lf_spsc_fifo_t
{
public:
lf_spsc_fifo_t()
{
// initialize with one dummy node :
node * dummy = new node;
m_head = dummy;
m_tail = dummy;
}
~lf_spsc_fifo_t()
{
// should be one node left :
LF_OS_ASSERT( m_head == m_tail );
delete m_head;
}
void push(const t_data & data)
{
node * n = new node(data);
// n->next == NULL from constructor
m_head->next.store(n, memory_order_release);
m_head = n;
}
// returns true if a node was popped
// fills *pdata only if the return value is true
bool pop(t_data * pdata)
{
// we're going to take the data from m_tail->next
// and free m_tail
node* t = m_tail;
node* n = t->next.load(memory_order_acquire);
if ( n == NULL )
return false;
*pdata = n->data; // could be a swap
m_tail = n;
delete t;
return true;
}
private:
struct node
{
atomic<node *> next;
nonatomic<t_data> data;
node() : next(NULL) { }
node(const t_data & d) : next(NULL), data(d) { }
};
// head and tail are owned by separate threads,
// make sure there's no false sharing :
nonatomic<node *> m_head;
char m_pad[LF_OS_CACHE_LINE_SIZE];
nonatomic<node *> m_tail;
};
Download : cbliblf.zip
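For readers with a compiler that already ships real atomics, roughly the same fifo can be written against std::atomic directly. This is a sketch and not the cbliblf code: the name SpscFifo, the 64-byte cache line, the destructor that frees leftover nodes, and the single-threaded helper fifo_order_ok are my assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

// Single producer calls push(), single consumer calls pop().
template <typename T>
class SpscFifo {
    struct Node {
        std::atomic<Node*> next{nullptr};
        T data{};
    };
    // head and tail are owned by different threads; keep them on separate lines.
    alignas(kCacheLine) Node* head_;  // producer side
    alignas(kCacheLine) Node* tail_;  // consumer side
public:
    SpscFifo() { head_ = tail_ = new Node; }  // one dummy node
    SpscFifo(const SpscFifo&) = delete;
    SpscFifo& operator=(const SpscFifo&) = delete;
    ~SpscFifo() {
        // free whatever is left, including the dummy node
        while (tail_) {
            Node* n = tail_->next.load(std::memory_order_relaxed);
            delete tail_;
            tail_ = n;
        }
    }
    void push(const T& v) {
        Node* n = new Node;
        n->data = v;
        // release publishes n->data to the consumer's acquire load
        head_->next.store(n, std::memory_order_release);
        head_ = n;
    }
    // returns true and fills *out if a value was available
    bool pop(T* out) {
        Node* t = tail_;
        Node* n = t->next.load(std::memory_order_acquire);
        if (n == nullptr) return false;
        *out = n->data;
        tail_ = n;  // n becomes the new dummy
        delete t;
        return true;
    }
};

// Pushes 0..count-1, pops them all, and reports whether FIFO order held.
bool fifo_order_ok(int count) {
    SpscFifo<int> fifo;
    for (int i = 0; i < count; ++i) fifo.push(i);
    int v = -1;
    for (int i = 0; i < count; ++i)
        if (!fifo.pop(&v) || v != i) return false;
    return !fifo.pop(&v);  // must be empty again
}
```

The helper only checks ordering on one thread; in real use one thread owns push and a different thread owns pop, and the release/acquire pair on next is what makes the node's data visible.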
You have a value that's in [0,N). Ideally all code lengths would be the same ( log2(N) ) which is fractional for N not a power of 2. With just bit output, we can't write fractional bits, so we will lose some efficiency. But how much exactly?
You can of course trivially write a symbol in [0,N) by using log2ceil(N) bits. That's just going up to the next
integer bit count. But you're wasting values in there, so you can take each wasted value and use it to reduce
the length of a code that you need. eg. for N = 5 , start with log2ceil(N) bits :
0 : 000
1 : 001
2 : 010
3 : 011
4 : 100
x : 101
x : 110
x : 111
The first five codes are used for our values, and the last three are wasted.
Rearrange to interleave the wasted codewords :
0 : 000
x : 001
1 : 010
x : 011
2 : 100
x : 101
3 : 110
4 : 111
now since we have adjacent codes where one is used and one is not used, we can reduce the length of
those codes and still have a prefix code. That is, if we see the two bits "00" we know that it must
always be a value of 0, because "001" is wasted. So simply don't send the third bit in that case :
0 : 00
1 : 01
2 : 10
3 : 110
4 : 111
(this is a general way of constructing shorter prefix codes when you have wasted values). You can see that the number of wasted values we had at the top is the number of codes that can be shortened by one bit.
A flat code is written thusly :
void OutputFlat(int sym, int N)
{
ASSERT( N >= 2 && sym >= 0 && sym < N );
int B = intlog2ceil(N);
int T = (1<<B) - N;
// T is the number of "wasted values"
if ( sym < T )
{
// write in B-1 bits
PutBits(sym, B-1);
}
else
{
// write in B bits
// push value up by T
PutBits(sym+T, B);
}
}
int InputFlat(int N)
{
ASSERT( N >= 2 );
int B = intlog2ceil(N);
int T = (1<<B) - N;
int sym = GetBits(B-1);
if ( sym < T )
{
return sym;
}
else
{
// need one more bit :
int ret = (sym<<1) - T + GetBits(1);
return ret;
}
}
That is, we write (T) values in (B-1) bits, and (N-T) in (B) bits.
The intlog2ceil can be slow, so in practice you would want to precompute that
or pass it in as a parameter.
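To check that the encoder and decoder really are inverses, here is a self-contained round-trip sketch. The BitBuffer type (an MSB-first vector of bits standing in for PutBits/GetBits), the loop-based intlog2ceil, and the helper flat_roundtrip are assumptions for illustration; the two coding functions follow the scheme above, with the decoder taking only N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical MSB-first bit buffer standing in for PutBits / GetBits.
struct BitBuffer {
    std::vector<bool> bits;
    std::size_t read_pos = 0;
    void PutBits(int value, int count) {
        for (int i = count - 1; i >= 0; --i) bits.push_back(((value >> i) & 1) != 0);
    }
    int GetBits(int count) {  // count == 0 reads nothing and returns 0
        int value = 0;
        for (int i = 0; i < count; ++i) value = (value << 1) | (bits[read_pos++] ? 1 : 0);
        return value;
    }
};

// Smallest B such that (1<<B) >= n.
int intlog2ceil(int n) {
    int b = 0;
    while ((1 << b) < n) ++b;
    return b;
}

void OutputFlat(BitBuffer& bb, int sym, int N) {
    assert(N >= 2 && sym >= 0 && sym < N);
    int B = intlog2ceil(N);
    int T = (1 << B) - N;  // number of "wasted values"
    if (sym < T) bb.PutBits(sym, B - 1);  // short code
    else bb.PutBits(sym + T, B);          // long code, pushed up by T
}

int InputFlat(BitBuffer& bb, int N) {
    assert(N >= 2);
    int B = intlog2ceil(N);
    int T = (1 << B) - N;
    int sym = bb.GetBits(B - 1);
    if (sym < T) return sym;
    return (sym << 1) - T + bb.GetBits(1);  // need one more bit
}

// Encodes every symbol of [0,N), decodes them, and checks all bits were consumed.
bool flat_roundtrip(int N) {
    BitBuffer bb;
    for (int s = 0; s < N; ++s) OutputFlat(bb, s, N);
    for (int s = 0; s < N; ++s)
        if (InputFlat(bb, N) != s) return false;
    return bb.read_pos == bb.bits.size();
}
```

For N = 5 this writes symbols 0..2 in two bits and symbols 3..4 in three bits, matching the shortened code table.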
So, what is the loss vs. ideal, and where does it occur? Let's work it out :
H = log2(N) is the ideal (fractional) entropy
N is in (2^(B-1),2^B]
so H is in (B-1,B]
The number of bits written by the flat code is :
L = ( T * (B-1) + (N-T) * B ) / N
with T = 2^B - N
Let's set
N = f * 2^B
with f in (0.5,1] our fractional position in the range.
so T = 2^B * (1 - f)
At f = 0.5 and 1.0 there's no loss, so there must be a maximum in that interval.
Doing some simplifying :
L = (T * (B-1) + (N-T) * B)/N
L = (T * B - T + N*B - T * B)/N
L = ( N*B - T)/N = B - T/N
T/N = (1-f)/f = (1/f) - 1
L = B - (1/f) + 1
The excess bits is :
E = L - H
H = log2(N) = log2( f * 2^B ) = B + log2(f)
E = (B - (1/f) + 1) - (B + log2(f))
E = 1 - (1/f) - log2(f)
so find the maximum of E by taking a derivative :
d/df(E) = 0
d/df(E) = 1/f^2 - (1/f)/ln2
1/f^2 = (1/f)/ln2
1/f = 1/ln(2)
f = ln(2)
f = 0.6931472...
and at that spot the excess is :
E = 1 - (1/ln2) - ln(ln2)/ln2
E = 0.08607133...
The worst case is 8.6% of a bit per symbol excess. The worst case
appears periodically, once for each power of two.
The actual excess bits output for some low N's :
The worst case actually occurs as N->large, because at higher N you can get f closer to that worst case fraction (ln(2)). At lower N, the integer steps mean you miss the worst case and so waste less. This is perhaps a bit surprising, you might think that the worst case would be at something like N = 3.
In fact for N = 3 :
H = log2(3) = 1.584962 ...
L = average length written by OutputFlat
L = (1+2+2)/3 = 1.66666...
E = L - H = 0.08170417...
(obviously if you measure the loss as a percentage of the output length, the worst case
is at N=3, and there it's 5.155% of the entropy).
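The numbers above are easy to check numerically. The sketch below evaluates L = B - T/N and E = L - log2(N) for a given N and compares against the analytic worst case at f = ln(2); the function names are mine, not from any library.

```cpp
#include <cassert>
#include <cmath>

// Smallest B such that (1<<B) >= n.
int intlog2ceil(int n) {
    int b = 0;
    while ((1 << b) < n) ++b;
    return b;
}

// Average flat code length in bits per symbol : L = B - T/N.
double flat_average_length(int N) {
    int B = intlog2ceil(N);
    int T = (1 << B) - N;
    return B - double(T) / N;
}

// Excess over the ideal : E = L - log2(N).
double flat_excess(int N) {
    return flat_average_length(N) - std::log2(double(N));
}

// Analytic worst case at f = ln(2) : E = 1 - 1/ln2 - log2(ln2).
double flat_excess_bound() {
    const double ln2 = std::log(2.0);
    return 1.0 - 1.0 / ln2 - std::log2(ln2);
}
```

Scanning N upward with flat_excess shows the excess falling to zero at every power of two and creeping toward the bound in between, without ever exceeding it.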
(there is some LVM stuff that lets you make multiple partitions and then treat them as a single one, but for a Unix newb like myself that looks too scary).
Also make sure "Connect at Power On" is checked.
bios.forceSetupOnce = "TRUE"
My VM was set up with a swap partition, so I had to move that to the end before I could grow the primary partition. I hear that you can set up Linux with a swap file instead of a swap partition; that would be preferable. A swap partition makes zero sense in a VM where the disks are virtualized anyway (so the advantage of keeping the swap thrashing off your main disk doesn't exist). Not something I want to change though.
More generally, what have I learned about multi-platform development from working at RAD ?
That it's horrible, really horrible, and I pray that I never have to do it again in my life. Ugh.
Just writing cross-platform code is not the issue (though that's horrible enough, solely due to stupid non-fundamental issues like the fact that struct packing isn't standardized, adding signed ints isn't standardized, restrict/noalias isn't standardized, inline linkage varies greatly, etc. urg wtf etc etc). If you're just releasing some code on the net and offering it for many platforms (leaving it up to the downloaders to actually build it and test it), your life is easy. The horrible part is if you actually have to maintain machines and build systems for all those platforms, test them, be able to debug on them, keep all the sdk's up to date, etc. etc.
(in general coding is easy when you don't actually test your code and make sure it works well, which a surprising number of people think is "done"; hey it compiles, I'm done! umm, no...)
(I guess that's a more general life thing; I observe a lot of people who just do things and don't actually measure whether the "doing" was successful or done well, but they just move on and are generally happy. People who stress over whether what they're doing is actually a good job or not are massively less happy but also actually do good work.)
I feel like I spend 90% of my time on stupid fucking non-algorithmic issues like this Linux partition resizing shit (probably more like 20%, but that's still frustratingly high). The regression tests are failing on Linux, okay have to figure out why, oh it's because the VM disk is too small, okay how do I fix that; or the PS4 compiler has a bug I have to work around, or the system software on this system has a bug, or the Mac clang wants to spew pointless warnings about anonymous namespaces, or my tests aren't working on Xenon .. spend some time digging .. oh the console is just turned off, or the IP changed or it got reflashed and my SDK doesn't work anymore, and blah blah fucking blah. God dammit I just want to be able to write algorithms. I miss coding, I miss thinking about hard problems. Le sigh.
I've written before about how in my imagination I could hire some kid for $50k to do all this shit work for me and it would be a huge win for me overall. But I'm afraid it's not that easy in reality.
What really should exist is a "coder cloud" service. There should be a bunch of VMs of different OS'es with various compilers and SDKs installed, so I can just say "build my shit for X with Y". Of course you need to be able to run tests on that system as well, and if something goes wrong you need remote desktop for interactive debugging. It's got to have every platform, including things like game consoles where you need license agreements, which is probably a no-go in reality because corporations are jerks. There's got to be superb customer service, because if I can't rely on it for builds at every moment of every day then it's a no-go. Unfortunately, programmers are almost uniformly moronic about this kind of thing (in that they massively overestimate their own ability to manage these things quickly) so wouldn't want to pay what it costs to run that service.
cbloom rants 11-28-11 - Some lock-free rambling
cbloom rants 11-30-11 - Some more Waitset notes
cbloom rants 12-08-11 - Some Semaphores
In particular, two big things occurred to me :
1. I talked before about the "passing on the signal" issue. See the above posts for more in depth details,
but in brief the issue is if you are trying to do NotifyOne (instead of NotifyAll), and you have a double-check
waitset like this :
{
waiter = waitset.prepare_wait(condition);
if ( double check )
{
waiter.cancel();
}
else
{
waiter.wait();
// possibly loop and re-check condition
}
}
then if you get a signal between prepare_wait and cancel, you didn't need that signal, so a wakeup of
another thread that did need that signal can be missed.
Now, I talked about this before as an "ugly hack", but over time thinking about it, it doesn't seem so bad. In particular, if you put the resignal inside the cancel() , so that the client code looks just like the above, it doesn't need to know about the fact that the resignal mechanism is happening at all.
So, the new concept is that cancel atomically removes the waiter from the waitset and sees if it got a signal
that it didn't consume. If so, it just passes on that signal. The fact that this is okay and not a hack
came to me when I thought about under what conditions this actually happens. If you recall from the earlier
posts, the need for resignal comes from situations like :
T0 posts sem , and signals no one
T1 posts sem , and signals T3
T2 tries to dec count and sees none, goes into wait()
T3 tries to dec count and gets one, goes into cancel(), but also got the signal - must resignal T2
the thing is this can only happen if all the threads are awake and racing against each other (it requires
a very specific interleaving); that is,
the T3 in particular that decs count and does the resignal had to be awake anyway (because its first check
saw no count, but its double check did dec count, so it must have raced with the sem post). It's not like you
wake up a thread you shouldn't have and then pass it on. The thread wakeup scheme is just changed
from :
T0 sem.post --wakes--> T2 sem.wait
T1 sem.post --wakes--> T3 sem.wait
to :
T0 sem.post
T1 sem.post --wakes--> T3 sem.wait --wakes--> T2 sem.wait
that is, one of the consumer threads wakes its peer. This is a tiny performance loss, but it's a pretty
rare race, so really not a bad thing.
The whole "double check" pathway in waitset only happens in a race case. It occurs when one thread sets the condition you want right at the same time that you check it, so your first check fails and after you prepare_wait, your second check passes. The resignal only occurs if you are in that race path, and also the setting thread sent you a signal between your prepare_wait and cancel, *and* there's another thread waiting on that same signal that should have gotten it. Basically this case is quite rare, we don't care too much about it being fast or elegant (as long as it's not disastrously slow), we just need behavior to be correct when it does happen - and the "pass on the signal" mechanism gives you that.
The advantage of being able to do just a NotifyOne instead of a NotifyAll is so huge that it's worth adopting this as standard practice in waitset.
2. It then occurred to me that the waitset PrepareWait and Cancel could be made lock-free pretty trivially.
Conceptually, they are made lock free by turning them into messages. "Notify" is now the receiver of messages
and the scheme is now :
{
waiter w;
waitset : send message { prepare_wait , &w, condition };
if ( double check )
{
waitset : send message { cancel , &w };
return;
}
w.wait();
}
-------
{
waitset Notify(condition) :
first consume all messages
do prepare_wait and cancel actions
then do the normal notify
eg. see if there are any waiters that want to know about "condition"
}
The result is that the entire wait-side operation is lock free. The notify-side still uses a lock to
ensure the consistency of the wait list.
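A rough sketch of how that message scheme could look, under heavy assumptions: the names Waitset, Message, post and notify_one are invented here, the Waiter is just an atomic flag standing in for a per-thread OS handle, and the waiter recycling and pass-on-the-signal logic are left out entirely. It only shows the shape: the wait side pushes onto a lock-free stack, and the notify side drains it under the lock.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

struct Waiter { std::atomic<bool> signaled{false}; };  // stand-in for an OS handle

enum class MsgKind { PrepareWait, Cancel };

struct Message {
    MsgKind kind;
    Waiter* waiter;
    int condition;
    Message* next;
};

class Waitset {
public:
    ~Waitset() { drain(); }

    // Wait side: lock-free push onto an intrusive stack of pending messages.
    void post(MsgKind kind, Waiter* w, int condition) {
        Message* m = new Message{kind, w, condition, nullptr};
        m->next = inbox_.load(std::memory_order_relaxed);
        while (!inbox_.compare_exchange_weak(m->next, m,
                   std::memory_order_release, std::memory_order_relaxed)) {
            // m->next was refreshed by the failed CAS; retry
        }
    }

    // Notify side: take the lock, apply pending messages, then wake one
    // waiter registered for `condition`. Returns true if one was signaled.
    bool notify_one(int condition) {
        std::lock_guard<std::mutex> lock(mutex_);
        drain();
        for (auto it = waiting_.begin(); it != waiting_.end(); ++it) {
            if (it->condition == condition) {
                it->waiter->signaled.store(true, std::memory_order_release);
                waiting_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    struct Entry { Waiter* waiter; int condition; };

    void drain() {
        Message* m = inbox_.exchange(nullptr, std::memory_order_acquire);
        // The stack is newest-first; reverse it to recover post order.
        Message* ordered = nullptr;
        while (m) { Message* next = m->next; m->next = ordered; ordered = m; m = next; }
        while (ordered) {
            if (ordered->kind == MsgKind::PrepareWait) {
                waiting_.push_back(Entry{ordered->waiter, ordered->condition});
            } else {
                Waiter* w = ordered->waiter;
                waiting_.erase(std::remove_if(waiting_.begin(), waiting_.end(),
                                   [w](const Entry& e) { return e.waiter == w; }),
                               waiting_.end());
            }
            Message* done = ordered;
            ordered = ordered->next;
            delete done;
        }
    }

    std::atomic<Message*> inbox_{nullptr};
    std::mutex mutex_;
    std::vector<Entry> waiting_;
};
```

All wait-list maintenance happens inside notify_one, so prepare_wait and cancel never touch the mutex; a prepare followed by a cancel from the same thread simply nets out to nothing when the notifier drains.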
This greatly reduces contention in the most common usage patterns :
Mutex :
only the mutex owner does Notify
- so contention of the waitset lock is nonexistent
many threads may try to lock a mutex
- they do not have any waitset-lock contention
Semaphore :
the common case of one producer and many consumers (lots of threads do wait() )
- zero contention of the waitset lock
the less common case of many producers and few consumers is slow
Another way to look at it is instead of doing little bits of waitlist maintenance in three
different places (prepare_wait,notify,cancel) which each have to take a lock on the list,
all the maintenance is moved to one spot.
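Here's a rough sketch of the message scheme in C++, to make the shape concrete. It is not the actual waitset code: `MessageWaitset`, `Waiter`, `Message` and `send` are made-up names, the waiter only counts signals instead of owning an OS handle, and messages are heap-allocated for simplicity. The wait side pushes onto a lock-free stack; the notify side takes the lock, applies the pending messages in send order, then wakes one matching waiter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the per-thread OS handle; it only counts signals.
struct Waiter { int signals = 0; };

enum class MsgKind { PrepareWait, Cancel };

struct Message {
    MsgKind kind;
    Waiter* waiter;
    const void* condition;
    Message* next;
};

class MessageWaitset {
public:
    ~MessageWaitset() { drain(); }  // frees any messages never consumed

    // Wait side: lock-free push onto an intrusive stack of messages.
    void send(MsgKind kind, Waiter* w, const void* condition) {
        assert(w != nullptr);
        Message* m = new Message{kind, w, condition,
                                 inbox_.load(std::memory_order_relaxed)};
        while (!inbox_.compare_exchange_weak(m->next, m,
                                             std::memory_order_release,
                                             std::memory_order_relaxed)) {
            // m->next was refreshed with the current head; retry.
        }
    }

    // Notify side: take the lock, apply pending messages, then wake one waiter.
    bool notify_one(const void* condition) {
        std::lock_guard<std::mutex> guard(lock_);
        drain();
        for (std::size_t i = 0; i < waiting_.size(); ++i) {
            if (waiting_[i].condition == condition) {
                waiting_[i].waiter->signals += 1;
                waiting_.erase(waiting_.begin() + i);
                return true;
            }
        }
        return false;
    }

private:
    struct Entry { Waiter* waiter; const void* condition; };

    void drain() {
        Message* m = inbox_.exchange(nullptr, std::memory_order_acquire);
        Message* in_order = nullptr;
        while (m) {  // the stack is newest-first; reverse it into send order
            Message* next = m->next;
            m->next = in_order;
            in_order = m;
            m = next;
        }
        while (in_order) {
            Message* next = in_order->next;
            Waiter* w = in_order->waiter;
            if (in_order->kind == MsgKind::PrepareWait) {
                waiting_.push_back(Entry{w, in_order->condition});
            } else {  // Cancel: drop every entry registered for this waiter
                waiting_.erase(std::remove_if(waiting_.begin(), waiting_.end(),
                                   [w](const Entry& e) { return e.waiter == w; }),
                               waiting_.end());
            }
            delete in_order;
            in_order = next;
        }
    }

    std::atomic<Message*> inbox_{nullptr};
    std::mutex lock_;
    std::vector<Entry> waiting_;
};
```

Reversing the drained stack keeps each thread's own messages in the order it sent them, so a prepare followed by a cancel from the same thread is always applied in that order.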
Now there are some subtleties.
If you used a fresh "waiter" every time, things would be simple. But for efficiency you don't want to do that. In fact I use one unique waiter per thread. There's only one OS waitable handle needed per thread and you can use that to implement every threading primitive. But now you have to be able to recycle the waiter. Note that you don't have to worry about other threads using your waiter; the waiter is per-thread so you just have to worry about when you come around and use it again yourself.
If you didn't try to do the lock-free wait-side, recycling would be easy. But with the lock-free wait side there are some issues.
First is that when you do a prepare-then-cancel , your cancel might not actually be done for a long time (it was just a request). So if you come back around on the same thread and call prepare() again, prepare has to check if that earlier cancel has been processed or not. If it has not, then you just have to force the Notify-side list maintenance to be done immediately.
The second related issue is that the lock-free wait-side can give you spurious signals to your waiter. Normally prepare_wait could clear the OS handle, so that when you wait on it you know that you got the signal you wanted. But because prepare_wait is just a message and doesn't take the lock on the waitlist, you might actually still be in the waitlist from the previous time you used your waiter. Thus you can get a signal that you didn't want. There are a few solutions to this; one is to allow spurious signals (I don't love that); another is to detect that the signal is spurious and wait again (I do this). Another is to always just grab the waitlist lock (and do nothing) in either cancel or prepare_wait.
Ok, so we now have a clean waitset that can do NotifyOne and guarantee no spurious signals. Let's use it.
You may recall we've looked at a simple waitset-based mutex before :
U32 thinlock;
Lock :
{
// first check :
while( Exchange(&thinlock,1) != 0 )
{
waiter w; // get from TLS
waitset.PrepareWait( &w, &thinlock );
// double check and put in waiter flag :
if ( Exchange(&thinlock,2) == 0 )
{
// got it
w.Cancel();
return;
}
w.Wait();
}
}
Unlock :
{
if ( Exchange(&thinlock,0) == 2 )
{
waitset.NotifyAll( &thinlock );
}
}
This mutex is non-recursive, and of course you should spin doing some TryLocks before going into the wait loop
for efficiency.
This was an okay way to build a mutex on waitset when all you had was NotifyAll. It only does the notify if there are waiters, but the big problem with it is if you have multiple waiters, it wakes them all and then they all run in to try to grab the mutex, and all but one fail and go back to sleep. This is a common type of unnecessary-wakeup thread-thrashing pattern that sucks really bad.
(any time you write threading code where the wakeup means "hey wakeup and see if you can grab an atomic" (as opposed to "wakeup you got it"), you should be concerned (particularly when the wake is a broadcast))
Now that we have NotifyOne we can fix that mutex :
U32 thinlock;
Lock :
{
// first check :
while( Exchange(&thinlock,2) != 0 ) // (*1)
{
waiter w; // get from TLS
waitset.PrepareWait( &w, &thinlock );
// double check and put in waiter flag :
if ( Exchange(&thinlock,2) == 0 )
{
// got it
w.Cancel(waitset_resignal_no); // (*2)
return;
}
w.Wait();
}
}
Unlock :
{
if ( Exchange(&thinlock,0) == 2 ) // (*3)
{
waitset.NotifyOne( &thinlock );
}
}
We changed the NotifyAll to NotifyOne , but three funny bits are worth noting :
(*1) - we must now immediately exchange in the waiter-flag here; in the NotifyAll case it worked to put a 1 in there for funny reasons
(see cbloom rants 07-15-11 - Review of many Mutex implementations ,
where this type of mutex is discussed as "Three-state mutex using Event" ), but it doesn't work with the NotifyOne.
(*2) - with a mutex you do not need to pass on the signal when you stole it and cancelled. The reason is just that
there can't possibly be any more mutex for another thread to consume. A mutex is a lot like a semaphore with a maximum count of 1
(actually it's exactly like it for non-recursive mutexes);
you only need to pass on the signal when it's possible that some other thread needs to know about it.
(*3) - you might think the check for == 2 here is dumb because we always put in a 2, but there's code you're not seeing.
TryLock should still put in a 1, so in the uncontended cases the thinlock will have a value of 1 and no Notify is done. The thinlock
only goes to a 2 if there is some contention, and then the value stays at 2 until the last unlock of that contended sequence.
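To make the three states concrete, here is a small sketch of the atomic side of that mutex, including a TryLock that puts in a 1. The waitset calls are left out; `ThinLock` and its method names are hypothetical, and `unlock_needs_notify` just reports whether the caller would go on to call NotifyOne.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// States as described above: 0 = unlocked, 1 = locked with no waiters seen,
// 2 = locked and there may be waiters.
struct ThinLock {
    std::atomic<uint32_t> state{0};

    // Uncontended path: succeeds only from 0 and leaves a 1,
    // so the matching unlock can skip the notify.
    bool try_lock() {
        uint32_t expected = 0;
        return state.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire);
    }

    // Contended path, the check at (*1): always writes the waiter flag.
    bool lock_attempt_with_waiter_flag() {
        return state.exchange(2, std::memory_order_acquire) == 0;
    }

    // The check at (*3): true means the caller should follow with NotifyOne.
    bool unlock_needs_notify() {
        const uint32_t prev = state.exchange(0, std::memory_order_release);
        assert(prev != 0);  // unlocking a lock that was not held
        return prev == 2;
    }
};
```

Note that a thread that wins the lock through the contended path leaves a 2 behind even if nobody else is waiting, so its unlock still reports that a notify is needed.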
Okay, so that works, but it's kind of silly. With the mechanism we have now we can do a much neater mutex :
U32 thinlock; // = 0 initializes thinlock
Lock :
{
waiter w; // get from TLS
waitset.PrepareWait( &w, &thinlock );
if ( Fetch_Add(&thinlock,1) == 0 )
{
// got the lock (no need to resignal)
w.Cancel(waitset_resignal_no);
return;
}
w.Wait();
// woke up - I have the lock !
}
Unlock :
{
if ( Fetch_Add(&thinlock,-1) > 1 )
{
// there were waiters
waitset.NotifyOne( &thinlock );
}
}
The mutex is just a wait-count now. (as usual you should TryLock a few times before jumping in to the PrepareWait).
This mutex is more elegant; it also has a small performance advantage in that it only calls NotifyOne when it really needs to;
because its gate is also a wait-count it knows if it needs to Notify or not. The previous Mutex posted will always Notify on
the last unlock whether or not it needs to (eg. it will always do one Notify too many).
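The counting logic of this mutex can be sketched on its own, without the waitset. `WaitCountGate`, `enter` and `leave` are hypothetical names; `enter` returning false stands for "go wait", and `leave` returning true stands for "call NotifyOne".

```cpp
#include <atomic>
#include <cassert>

// The mutex word is a count of threads that hold or want the lock.
struct WaitCountGate {
    std::atomic<int> count{0};

    // True if the caller got the lock immediately; false means it must wait
    // until an unlocker hands the lock over.
    bool enter() {
        return count.fetch_add(1, std::memory_order_acq_rel) == 0;
    }

    // True if some other thread is waiting and must be woken.
    bool leave() {
        const int prev = count.fetch_sub(1, std::memory_order_acq_rel);
        assert(prev >= 1);  // leave without a matching enter
        return prev > 1;
    }
};
```

With one owner and one waiter the sequence is: `enter()` gives true, a second `enter()` gives false, the owner's `leave()` gives true (wake the waiter), and the woken thread's `leave()` gives false, so no extra notify is issued at the end.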
This last mutex is also really just a semaphore. We can see it by writing a semaphore with our waitset :
U32 thinsem; // = 0 initializes thinsem
Wait :
{
waiter w; // get from TLS
waitset.PrepareWait( &w, &thinsem );
if ( Fetch_Add(&thinsem,-1) > 0 )
{
// got a dec on count
w.Cancel(waitset_resignal_yes); // (*1)
return;
}
w.Wait();
// woke up - I got the sem !
}
Post :
{
if ( Fetch_Add(&thinsem,1) < 0 )
{
waitset.NotifyOne( &thinsem );
}
}
which is obviously the same. The only subtle change is at (*1) - with a semaphore we must do the resignal,
because there may have been several posts to the sem (contrast with mutex where there can only be one Unlock at a time;
and the mutex itself serializes the unlocks).
Oh, one very subtle issue that I only discovered due to Relacy :
waitset.Notify requires a #StoreLoad between the condition check and the notify call. That is, the standard
pattern for any kind of "Publish" is something like :
Publish
{
change shared variable
if ( any waiters )
{
#StoreLoad
waitset.Notify()
}
}
Now, in most cases, such as the Sem and Mutex posted above, the Publish uses an atomic RMW op. If that
is the case, then you don't need to add any more barriers - the RMW synchronizes for you. But if you do
some kind of more weakly ordered primitive, then you must force a barrier there.
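In C++11 terms, a #StoreLoad can be written as a seq_cst fence. The sketch below shows a publish that uses a plain release store (not an RMW) and therefore adds the fence by hand. `Published`, `CountingWaitset` and the `waiters` counter are hypothetical stand-ins. One assumption to flag: this sketch puts the fence right after the store, which orders the store ahead of both the waiter check and anything Notify reads; the pseudocode above places it inside the branch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the waitset; it only counts Notify calls.
struct CountingWaitset {
    int notify_calls = 0;
    void notify() { ++notify_calls; }
};

struct Published {
    std::atomic<int> value{0};
    std::atomic<int> waiters{0};  // hypothetical "any waiters" indicator

    void publish(int v, CountingWaitset& ws) {
        // Weakly ordered publish: a release store gives no StoreLoad ordering.
        value.store(v, std::memory_order_release);
        // #StoreLoad: the store above may not be reordered past the loads below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        const int w = waiters.load(std::memory_order_relaxed);
        assert(w >= 0);
        if (w > 0) {
            ws.notify();
        }
    }
};
```

The waiting side needs a matching barrier between registering itself and re-checking the value; that half is not shown here.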
This is the exact same issue that I've run into before and forgot about again :
cbloom rants 07-31-11 - An example that needs seq_cst
cbloom rants 08-09-11 - Threading Links (see threads on eventcount)
cbloom rants 06-01-12 - On C++ Atomic Fences Part 3
Download : tabdir 320k zip
tabdir -?
usage : tabdir [opts] [dir]
options:
-v : view output after writing
-p : show progress of dir enumeration (with interactive keys)
-w# : set # of worker threads
-oN : output to name N [r:\tabdir.tab]
This new tabdir is built on Oodle so it has a multi-threaded dir lister for much greater speed. (*)
Also note to self : I fixed tabview so it works as a shell file association. I hit this all the time and always forget it : if something works on the command line but not as a shell association, it's probably because the shell passes you quotes around file names, so you need a little code to strip quotes from args.
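The "little code to strip quotes" can be as small as this. It is a sketch, not the tabview code; `strip_quotes` is a made-up name, and it only removes one matching pair of surrounding double quotes.

```cpp
#include <cassert>
#include <string>

// Remove one pair of surrounding double quotes, if present.
// Anything else (no quotes, a lone quote, quotes in the middle) is returned unchanged.
std::string strip_quotes(const std::string& arg) {
    if (arg.size() >= 2 && arg.front() == '"' && arg.back() == '"') {
        return arg.substr(1, arg.size() - 2);
    }
    return arg;
}
```

For example, `assert(strip_quotes("\"c:\\my dir\\x.tab\"") == "c:\\my dir\\x.tab");` holds, and an unquoted name passes through untouched.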
Someday I'd like to write an even faster tabdir that reads the NTFS volume directory information directly, but chances are that will never happen.
One odd thing I've spotted with this tabdir is that the Windows SxS Assembly dirs take a ton of time to enumerate on my machine. I dunno if they're compressed or WTF the deal is with them (I pushed it on the todo stack to investigate), but they're like 10X slower than any other dir. (could just be the larger number of files in there)
I never did this before because I didn't expect multi-threaded dir enumeration to be a big win; I thought it
would just cause seek thrashing, and if you're IO bound anyway then multi-threading can't help, can it? Well,
it turns out the Win32 dir enum functions have quite a lot of CPU overhead, so multi-threading does in fact help
a bit :
# workers | elapsed time
1 | 12.327
2 | 10.450
3 | 9.710
4 | 9.130
(* = actually the big speed win was not multi-threading, it's that the old tabdir did something rather dumb in the file enum. It would enum all files, and then do GetInfo() on each one to get the file sizes. The new one just uses the file infos that are returned as part of the Win32 enumeration, which is massively faster).
You may recall, I use a worker thread system with forward "permits" (reversed dependencies) . When any handle completes it sees if that completion should trigger any followup handles, and if so those are then launched. Handles may be SPU jobs or IOs or CPU jobs or whatever. The problem I will talk about occurred when the predecessor and the followup were both CPU jobs.
I'll talk about a specific case to be concrete : decoding compressed data while reading it from disk.
To decode each chunk of LZ data, a chunk-decompress job is made. That job depends on the IO(s) that read in the compressed data for that chunk. It also depends on the previous chunk if the chunk is not a seek-reset point. So in the case of a non-reset chunk, you have a dependency on an IO and a previous CPU job. Your job will be started by one or the other, whichever finishes last.
Now, when decompression was IO bound, then the IO completions were kicking off the decompress jobs, and everything was fine.
In these timelines, the second line is IO and the bottom four are workers.
LZ Read and Decompress without seek-resets, IO bound :
You can see the funny fans of lines that show the dependency on the previous decompress job and also the IO. Yellow is a thread that's sleeping.
You may notice that the worker threads are cycling around. That's not really ideal, but it's not related to the problem I'm talking about today. (that cycling is caused by the fact that the OS semaphore is FIFO. For something like worker threads, we'd actually rather have a LIFO semaphore, because it makes it more likely that you get a thread with something useful still hot in cache. Someday I'll replace my OS semaphore with my own LIFO one, but for now this is a minor performance bug). (Win32 docs say that they don't guarantee any particular order, but in my experience threads of equal priority are always FIFO in Win32 semaphores)
Okay, now for the problem. When the IO was going fast, so we were CPU bound, it's the prior decompress job that triggers the followup work.
But something bad happened due to the forward permit system. The control flow was something like this :
On worker thread 0
wake from semaphore
do an LZ decompress job
mark job done
completion change causes a permits check
permits check sees that there is a pending job triggered by this completion
-> fire off that handle
handle is pushed to worker thread system
no worker is available to do it, so wake a new worker and give him the job
finalize (usually delete) job I just finished
look for more work to do
there is none because it was already handed to a new worker
And it looked like this :
LZ Read and Decompress without seek-resets, CPU bound, naive permits :
You can see each subsequent decompress job is moving to another worker thread. Yuck, bad.
So the fix in Oodle is to use the "delay-kick" mechanism, which I'd already been using for coroutine refires (which had a similar problem; the problem occurred when you yielded a coroutine on something like an IO, and the IO was done almost immediately; the coroutine would get moved to another worker thread instead of just staying on the same one and continuing from the yield as if it wasn't there).
The scheme is something like this :
On each worker thread :
Try to pop a work item from the "delay kick queue"
if there is more than one item in the DKQ,
take one for myself and "kick" the remainder
(kick means wake worker threads to do the jobs)
If nothing on DKQ, pop from the main queue
if nothing on main queue, wait on work semaphore
Do your job
Set "delay kick" = true
("delay kick" has to be in TLS of course)
Mark job as done
Permit system checks for successor handles that can now run
if they exist, they are put in the DKQ instead of immediately firing
Set "delay kick" = false
Repeat
In brief : work that is made runnable by the completion of work is not fired until the worker
thread that did the completion gets its own shot at grabbing that new work. If the completion made 4 jobs
runnable, the worker will grab 1 for itself and kick the other 3. The kick is no longer in the completion
phase, it's in the pop phase.
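The pop phase can be sketched in a few lines. This is a single-threaded model of the queue logic only, not the Oodle code: `Job`, `PopResult`, `on_job_complete` and `pop_work` are hypothetical names, the delay-kick queue is passed in explicitly instead of living in TLS, and "kicking" is represented by returning the list of jobs that would be handed to other workers.

```cpp
#include <cassert>
#include <deque>
#include <vector>

using Job = int;  // hypothetical job handle

struct PopResult {
    bool got_job = false;       // false means: go wait on the work semaphore
    Job job = 0;
    std::vector<Job> kicked;    // jobs to hand to other workers (one wake each)
};

// Completion phase with "delay kick" set: successors made runnable by the
// finished job are parked in this worker's delay-kick queue instead of fired.
void on_job_complete(std::deque<Job>& delay_kick_queue,
                     const std::vector<Job>& runnable_successors) {
    for (Job j : runnable_successors) {
        delay_kick_queue.push_back(j);
    }
}

// Pop phase: keep one delayed job for myself, kick the rest,
// and only fall back to the main queue if nothing was delayed.
PopResult pop_work(std::deque<Job>& delay_kick_queue, std::deque<Job>& main_queue) {
    PopResult r;
    if (!delay_kick_queue.empty()) {
        r.got_job = true;
        r.job = delay_kick_queue.front();
        delay_kick_queue.pop_front();
        r.kicked.assign(delay_kick_queue.begin(), delay_kick_queue.end());
        delay_kick_queue.clear();
    } else if (!main_queue.empty()) {
        r.got_job = true;
        r.job = main_queue.front();
        main_queue.pop_front();
    }
    return r;
}
```

If a completion makes 4 jobs runnable, the next `pop_work` on that worker returns the first one and lists the other 3 in `kicked`.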
And the result is :
LZ Read and Decompress without seek-resets, CPU bound, delay-kick permits :
Molto superiore.
These last two timelines are on the same time scale, so you can see just from the visual that eliminating the unnecessary thread switching is about a 10% speedup.
Anyway, this particular issue may not apply to your worker thread system, or you may have other solutions. I think the main take-away is that while worker thread systems seem very simple to write at first, there's actually a huge amount of careful fiddling required to make them really run well. You have to be constantly vigilant about doing test runs and checking threadprofile views like this to ensure that what's actually happening matches what you think is happening. Err, oops, I think I just accidentally wrote an advertisement for Telemetry .
BC1 (DXT1/S3TC) DDS textures :
All compressors run in max-compress mode. Note that it's not entirely fair because Oodle has the BC1 swizzle and the others don't.
Some day I'd like to do a BC1-specific encoder. Various ideas and possibilities there. Also RD-DXTC.
I also did a WAV filter. This one is particularly ridiculous because nobody uses WAV, and if you want to compress audio you should use a domain-specific compressor, not just OodleLZ with a simple delta filter. I did it because I was annoyed that RAR beat me on WAVs (due to its having a multimedia filter), and RAR should never beat me.
WAV compression :
See also : same chart with 7z (not really fair cuz 7z doesn't have a WAV filter)
Happy to see that Oodle-filt handily beats RAR-filt. I'm using just a trivial linear gradient predictor :
out[i] = in[i] - 2*in[i-1] + in[i-2]
this could surely be better, but whatever, WAV filtering is not important.
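For reference, a sketch of that predictor with its inverse, for a single channel of 16-bit samples. The function names are made up, samples before the start are assumed to be zero, and residuals are kept modulo 2^16 so the filter is exactly reversible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Forward filter: residual[i] = in[i] - 2*in[i-1] + in[i-2], kept modulo 2^16.
std::vector<uint16_t> wav_filter(const std::vector<int16_t>& in) {
    std::vector<uint16_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const int32_t p1 = (i >= 1) ? in[i - 1] : 0;
        const int32_t p2 = (i >= 2) ? in[i - 2] : 0;
        out[i] = static_cast<uint16_t>(in[i] - 2 * p1 + p2);
    }
    return out;
}

// Inverse filter: rebuild each sample from its residual and the two
// previously decoded samples.
std::vector<int16_t> wav_unfilter(const std::vector<uint16_t>& residual) {
    std::vector<int16_t> out(residual.size());
    for (std::size_t i = 0; i < residual.size(); ++i) {
        const int32_t p1 = (i >= 1) ? out[i - 1] : 0;
        const int32_t p2 = (i >= 2) ? out[i - 2] : 0;
        const uint16_t bits = static_cast<uint16_t>(residual[i] + 2 * p1 - p2);
        // Reinterpret the 16 bits as a signed sample (modular conversion;
        // guaranteed from C++20, and what two's complement compilers do before that).
        out[i] = static_cast<int16_t>(bits);
    }
    return out;
}
```

On a straight ramp like 0, 100, 200, 300 the residuals after the first two samples are all zero, and `wav_unfilter(wav_filter(x)) == x` for any input.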
I also did a simple BMP delta filter and EXE (BCJ/relative-call) transform. I don't really want to get into the business of offering all kinds of special case filters the way some of the more insane modern archivers do (like undoing ZLIB compression so you can recompress it, or WRT), but anyhoo there's a few.
ADDED : I will say something perhaps useful about the WAV filter.
There's a bit of a funny issue because the WAV data is 16 bit (or 24 or 32), and the back-end entropy coder in a simple LZ is 8 bit.
If you just take a 16-bit delta and put it into bytes, then most of your values will be around zero,
and you'll make a stream like :
[00 00] [00 01] [FF FF] [FF F8] [00 04] ...
The bad thing you should notice here is that the high bytes are switching between 00 and FF even though the
values have quite a small range. (Note that the common thing of centering the values with +32768 doesn't
change this at all).
You can make this much better just by doing a bias of +128. That makes it so the most important range of values (around zero (specifically [-128,127])) all have the same top byte.
I think it might be even slightly better to do a "folded" signed->unsigned map, like
{ 0,-1,1,-2,2,-3,...,32767,-32768 }
The main difference being that values like -129 and +128 get the same high byte in this mapping, rather
than two different high bytes in the simple +128 bias scheme.
Of course you really want a separate 8-bit huffman for alternating pairs of bytes. One way to get that is to use a few bottom bits of position as part of the literal context. Also, the high byte should really be used as context for the low byte. But both of those are beyond the capabilities of my simple LZ-huffs so I just deinterleave the high and low bytes to two streams.
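A sketch of the two mappings and the byte split, so the high-byte behavior can be checked directly. All names here are hypothetical; `fold` implements the { 0,-1,1,-2,2,... } ordering and `split_bytes` does the deinterleave into separate high and low streams.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simple +128 bias: [-128,127] all land on high byte 00.
inline uint16_t bias128(int16_t v) {
    return static_cast<uint16_t>(v + 128);
}

// Folded signed->unsigned map: 0,-1,1,-2,2,... -> 0,1,2,3,4,...
inline uint16_t fold(int16_t v) {
    const int32_t x = v;
    return static_cast<uint16_t>(x >= 0 ? 2 * x : -2 * x - 1);
}

// Inverse of fold.
inline int16_t unfold(uint16_t u) {
    const int32_t x = u;
    return static_cast<int16_t>((x & 1) ? -((x + 1) / 2) : x / 2);
}

inline uint8_t high_byte(uint16_t u) {
    return static_cast<uint8_t>(u >> 8);
}

// Deinterleave 16-bit values into a high-byte stream and a low-byte stream.
void split_bytes(const std::vector<uint16_t>& words,
                 std::vector<uint8_t>& high, std::vector<uint8_t>& low) {
    high.clear();
    low.clear();
    for (uint16_t w : words) {
        high.push_back(static_cast<uint8_t>(w >> 8));
        low.push_back(static_cast<uint8_t>(w & 0xFF));
    }
}
```

With the bias, -129 and +128 get high bytes FF and 01; with the fold, both get 01, e.g. `assert(high_byte(fold(-129)) == high_byte(fold(128)));`.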
You have some testset {T} of many items, and you wish to fit some heuristic model M over T which has some parameters. There may be multiple forms of the model and you aren't sure which is best, so you wish to compare models against each other.
For concreteness, you might imagine that T is a bunch of images, and you are trying to make a perceptual DXTC coder; you measure block error in the encoder as something like (SSD + a * SATD ^ b + c * SSIM_8x8 ) , and the goal is to minimize the total image error in the outer loop, measured using something complex like IW-MS-SSIM or "MyDCTDelta" or whatever. So you are trying to fit the parameters {a,b,c} to minimize an error.
For reference, the naive training method is : run the model on all data in {T}, optimize parameters to minimize error over {T}.
The method of random holdouts goes like this :
Run many trials
On each trial, take the testset T and randomly separate it into a training set and a verification set.
Typically training set is something like 75% of the data and verification is 25%.
Optimize the model parameters on the {training set} to minimize the error measure over {training set}.
Now run the optimized model on the {verification set} and measure the error there.
This is the error that will be used to rate the model.
When you make the average error, compensate for the size of the model thusly :
average_error = sum_error / ( [num in {verification set}] - [dof in model] )
Record the optimal parameters and the error for that trial
Now you have optimized parameters for each trial, and an error for each trial. You can take the average over
all trials, but you can also take the sdev. The sdev tells you how well your model is really working - if it's
not close to zero then you are missing something important in your model. A term with a large sdev might just be
a random factor that's not useful in the model, and you should try again without it.
The method of random holdouts reduces over-training risk, because in each run you are measuring error only on data samples that were not used in training.
The method of random holdouts gives you a decent way to compare models which may have different numbers of DOF. If you just use the naive method of training, then models with more DOF will always appear better, because they are just fitting your data set.
That is, in our example say that (SSD + a * SATD ^ b) is actually the ideal model and the extra term ( + c * SSIM_8x8 ) is not useful. As long as it's not just a linear combo of other terms, then naive training will find a "c" such that that term is used to compensate for variations in your particular testset. And in fact that incorrect "c" can be quite a large value (along with a negative "a").
This kind of method can also be used for fancier stuff like building complex models from ensembles of simple models, "boosting" models, etc. But it's useful even in this case where we wind up just using a simple linear model, because you can see how it varies over the random holdouts.
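The random holdout loop described above can be sketched as follows. To keep it short it uses a toy one-parameter model (y = a*x, fit by closed-form least squares) rather than the block-error model from the example, and squared error as the error measure; `Sample`, `HoldoutResult` and `random_holdouts` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct Sample { double x, y; };

struct HoldoutResult {
    double mean_error, sdev_error;  // over trials, measured on verification sets
    double mean_param, sdev_param;  // fitted "a" over trials
};

// Population mean and standard deviation of a list of values.
static void mean_and_sdev(const std::vector<double>& v, double& mean, double& sdev) {
    mean = std::accumulate(v.begin(), v.end(), 0.0) / double(v.size());
    double var = 0.0;
    for (double x : v) var += (x - mean) * (x - mean);
    sdev = std::sqrt(var / double(v.size()));
}

HoldoutResult random_holdouts(const std::vector<Sample>& data, int num_trials, unsigned seed) {
    const std::size_t dof = 1;                          // the toy model has one parameter
    const std::size_t num_train = data.size() * 3 / 4;  // 75% training, 25% verification
    const std::size_t num_verify = data.size() - num_train;
    assert(num_trials > 0 && num_train > 0 && num_verify > dof);

    std::mt19937 rng(seed);
    std::vector<std::size_t> order(data.size());
    std::iota(order.begin(), order.end(), std::size_t(0));

    std::vector<double> errors, params;
    for (int t = 0; t < num_trials; ++t) {
        std::shuffle(order.begin(), order.end(), rng);

        // Optimize the parameter on the training set only.
        double sxy = 0.0, sxx = 0.0;
        for (std::size_t i = 0; i < num_train; ++i) {
            const Sample& s = data[order[i]];
            sxy += s.x * s.y;
            sxx += s.x * s.x;
        }
        assert(sxx > 0.0);
        const double a = sxy / sxx;

        // Measure error only on the held-out verification set.
        double sum_error = 0.0;
        for (std::size_t i = num_train; i < data.size(); ++i) {
            const Sample& s = data[order[i]];
            const double d = s.y - a * s.x;
            sum_error += d * d;
        }
        // Compensate for model size: divide by (num verification - dof).
        errors.push_back(sum_error / double(num_verify - dof));
        params.push_back(a);
    }

    HoldoutResult r;
    mean_and_sdev(errors, r.mean_error, r.sdev_error);
    mean_and_sdev(params, r.mean_param, r.sdev_param);
    return r;
}
```

On data that the model fits exactly (say y = 2x) every trial recovers a = 2 with zero error, so both sdevs are zero; a parameter whose sdev stays large across trials is the signal to look at that term again.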
There's a fundamental principle of any healthy (*) market that the reward for some labor is equal across all fields - proportional only to standard factors like the risk factor, the scarcity of labor, the capital required for entry, etc. (* = more on "healthy" later). The point is that those factors have *nothing* to do with the details of the field.
The basic factor at play is that if some field changes and suddenly becomes much more profitable, then people will flood into that field, and the risk-equal-capital-return will keep going down until it becomes equal to other fields. Water flows downhill, you know.
When people like Alan Greenspan try to tell you that oh this new field is completely unlike anything we've seen in the past because of blah blah - it doesn't matter, they may have lots of great points that seem reasonable in isolation, but the equilibrium still applies. The pay of a computer programmer is set by the pay of a farmer, because if the difference were out of whack, the farmer would quit farming and start programming; the pay of programmers will go down and the wages of farmers will go up, then the price of lettuce will go up, and in the end a programmer won't be able to buy any more lettuce than anyone else in a similar job. ("similar" only in terms of risk, ease of entry, rarity of talent, etc.)
We went through a drive-through car wash yesterday and Tasha idly wondered how much the car wash operator makes from an operation like that. Well, I bet it's about the same as a quick-lube place makes, and that's about the same as a dry cleaner, and it's about the same as a pizza place (which has less capital outlay but more risk), because if one of them was much more profitable, there would be more competition until equilibrium was reached.
Specifically I've been thinking about this because of the current indie game boom on the PC, which seems to be a bit of a magic gold rush at the moment. That almost inevitably has to die out, it's just a question of when. (so hurry up and get your game out before it does!).
But of course that leads us into the issue of broken markets, since all current game sales avenues are deeply broken markets.
Equilibrium (like most naive economic theory) only applies to markets where there's fluidity, robust competition, no monopolistic control, free information, etc. And of course those don't happen in the real world.
Whenever a market is not healthy, it provides an opportunity for unbalanced reward, well out of equilibrium.
Lack of information can particularly be a factor in small niches. There can be a company that does something random like make height-adjustable massage tables. If they're a private operation and nobody really pays attention to them, they can have super high profit levels for something that's not particularly difficult - way out of equilibrium. If other people knew how easy that business was, lots of others would enter, but due to lack of information they don't.
Patents and other such mechanisms create legally enforced distortions of the market. Of course things like the cable and utility systems are even worse.
On a large scale, government distortion means that huge fields like health care, finance, insurance, oil, farming, etc. are all more profitable than they should be.
Perhaps the biggest issue in downloadable games is the oligopoly of App Store and Steam. This creates an unhealthy market distortion and it's hard to say exactly what the long term effect of that will be. (of course you don't see it as "unhealthy" if you are the one benefiting from the favor of the great powers; it's unhealthy in a market fluidity and fair competition sense, and may slow or prevent equilibrium)
Of course new fields are not yet in equilibrium, and one of the best ways to "get rich quick" is to chase new fields. Software has been out of equilibrium for the past 50 years, and is only recently settling down. Eventually software will be a very poorly paid field, because it requires very little capital to become a programmer, it's low risk, and there are lots of people who can do it.
Note that in *every* field the best will always rise to the top and be paid accordingly.
Games used to be a great field to work in because it was a new field. New fields are exciting, they offer great opportunities for innovation, and they attract the best people. Mature industries are well into equilibrium and the only chances for great success are through either big risk, big capital investment, or crookedness.
1. Programming is dead. There were basically zero programming talks at GDC this year. That's sad, but also perfectly reasonable since programming is not the problem any more (*). (* = assuming that you just want to make the same old shit with different graphics)
2. Piece of shit mobile games that people have thrown together in a month look better than AAA games 10 years ago. It's not just that GPU's are so much better, but the free engines are really amazing these days, and the content pipes are so much better, and there are so many more decent 3d artists that can just make tons of content.
3. Game developers look like human beings now. If you looked at a GDC when I first started going, we were all classic troglodyte nerds; unwashed sweatshirts and open backpacks with slide-rules falling out. We were all vampirically pale from being locked in a dark box surrounded by our giant CRTs. (more generally I'm noticing that the average fitness level (on the west coast anyway) is way up in the past 5 years or so).
4. Mobile is dead, downloadable is king. I do an unscientific random sampling every year just by asking the people who stop by the RAD booth what they're working on. For the past few years it has been mobile mobile "we're making a game for ios and android", tons of kids and startups and indies trying to get into mobile. That seems to be gone, and the new gold rush is "downloadable" (PC, XBLA, etc).
5. Games are tacky and tasteless. One of the worst things for me standing at the booth is just hearing and seeing games all day. I don't play games much, I never watch TV with commercials, and I never watch things like cable news with all the excessive HUD and overstimulation, I find all that stuff abusive of my senses. Games are stuck in this awful "bling bling whoosh blammo" flashing and fast-cuts and just really tacky aesthetic. It's just like TV ads, or a bit like standing in the slot machine section of a casino (which is surely some level of hell).
6. I saw one really amazing game at GDC that stood out from the rest. It had all the players instantly smiling and laughing. It was fun for kids and adults. It created a feeling of group affinity. Everyone around wanted to join in. It was even beneficial to the body. It was an inflatable ball. Personally I had the "holy shit what we make is total crap" (actually worse than crap, because it's actively harmful to the body and mind) epiphany some 10+ years ago, but it just struck me so hard standing there with all these shit games around and people having so much more fun in the most basic game in the non-electronic world.
cbloom rants 08-01-11 - A game threading model
cbloom rants 12-03-11 - Worker Thread system with reverse dependencies
cbloom rants 03-05-12 - Oodle Handle Table
cbloom rants 03-08-12 - Oodle Coroutines
cbloom rants 06-21-12 - Two Alternative Oodles
cbloom rants 07-19-12 - Experimental Futures in Oodle
cbloom rants 10-26-12 - Oodle Rewrite Thoughts
cbloom rants 12-18-12 - Async-Await ; Microsoft's Coroutines
cbloom rants 12-21-12 - Coroutine-centric Architecture
cbloom rants 12-21-12 - Coroutines From Lambdas
cbloom rants 12-06-12 - Theoretical Oodle Rewrite Continued
cbloom rants 02-23-13 - Threading - Reasoning Behind Coroutine Centric Design
My contention is not that this is "the ultimate solution". I believe it's a good architecture using the techniques that we currently have available, without doing anything that I consider bananas like writing your own programming language (*). Of course if you are platform-specific or know you can use C++11 there are small ways to make things more convenient, but the fundamental architecture would be about the same (and assuming that you will never need to port to a broken platform is a mistake I know well).
(* = a lot of people that I consider usually smart seem to think that writing a custom language is a great solution for lots of problems. Whenever we're talking about "oh reflection in C is a mess" or "dependency analysis should be automatic", they'll throw out "well if you had the time you would just write a custom language that does all this better". Would you? I certainly wouldn't. I like using tools that actually work, that new hires are familiar with, etc. etc. I don't have to list the pros of sticking with standard languages. In my experience every clever custom language for games is a huge fucking disaster and I would never advocate that as a good solution for any problem)
The easy way to load many file formats (I'll use a BMP here to be concrete) is just to point a
struct at it :
struct BITMAPFILEHEADER
{
U16 bfType;
U32 bfSize;
U16 bfReserved1;
U16 bfReserved2;
U32 bfOffBits;
} __attribute__ ((__packed__));
BITMAPFILEHEADER * bmfh = (BITMAPFILEHEADER *)data;
if ( bmfh->bfType != 0x4D42 )
ERROR_RETURN("not a BM",0);
etc..
but of course this doesn't work cross platform.
So people do all kinds of convoluted things (which I have usually done), like changing to a
method like :
U16 bfType = Get16LE(&ptr);
U32 bfSize = Get32LE(&ptr);
or they'll do some crazy struct-parse fixup thing which I've always found to be bananas.
But there's a super trivial and convenient solution :
struct BITMAPFILEHEADER
{
U16LE bfType;
U32LE bfSize;
U16LE bfReserved1;
U16LE bfReserved2;
U32LE bfOffBits;
} __attribute__ ((__packed__));
where U16LE is just U16 on little-endian platforms and is a class that does bswap on itself on big-endian
platforms.
Then you can still just use the old struct-pointing method and everything just works. Duh, I can't believe I didn't think of this earlier.
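Here is one way such types could look. This is a sketch of a variation, not the types described above: instead of being a plain U16 on little-endian and a bswapping class on big-endian, these hypothetical `U16LE` / `U32LE` classes store raw bytes and assemble the value on read, which behaves the same on either endianness and also gives the struct an alignment of 1, so no packing attribute is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A 16-bit little-endian value stored as raw bytes; reads as a native uint16_t.
class U16LE {
    unsigned char b_[2];
public:
    operator uint16_t() const {
        return static_cast<uint16_t>(b_[0] | (b_[1] << 8));
    }
};

// A 32-bit little-endian value stored as raw bytes; reads as a native uint32_t.
class U32LE {
    unsigned char b_[4];
public:
    operator uint32_t() const {
        return  static_cast<uint32_t>(b_[0])
             | (static_cast<uint32_t>(b_[1]) << 8)
             | (static_cast<uint32_t>(b_[2]) << 16)
             | (static_cast<uint32_t>(b_[3]) << 24);
    }
};

struct BitmapFileHeader {
    U16LE bfType;
    U32LE bfSize;
    U16LE bfReserved1;
    U16LE bfReserved2;
    U32LE bfOffBits;
};

// Every member is made of bytes, so the layout matches the file exactly.
static_assert(sizeof(BitmapFileHeader) == 14, "header must match the on-disk size");
```

Usage is the same as before: copy (or point) the struct over the first 14 bytes of the file and compare fields directly, e.g. `std::memcpy(&h, data, sizeof h); assert(h.bfType == 0x4D42);`.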
Similarly, here's a WAV header :
struct WAV_header_LE
{
U32LE FOURCC_RIFF; // RIFF Header
U32LE riffChunkSize; // RIFF Chunk Size
U32LE FOURCC_WAVE; // WAVE Header
U32LE FOURCC_FMT; // FMT header
U32LE fmtChunkSize; // Size of the fmt chunk
U16LE audioFormat; // Audio format 1=PCM,6=alaw,7=mulaw, 257=IBM Mu-Law, 258=IBM A-Law, 259=ADPCM
U16LE numChan; // Number of channels 1=Mono 2=Stereo
U32LE samplesPerSec; // Sampling Frequency in Hz
U32LE bytesPerSec; // bytes per second
U16LE blockAlign; // normally NumChan * bytes per sample
U16LE bitsPerSample; // Number of bits per sample
} __attribute__ ((__packed__));
easy.
For file-input type structs, you just do this and there's no penalty. For structs you keep in memory you wouldn't want to eat the bswap all the time, but even in that case this provides a simple way to get the swizzle into native structs by just copying all the members over.
Of course if you have the Reflection-Visitor system that I'm fond of, that's also a good way to go. (cursed C, give me a "do this macro on all members").
Recently I've been fixing up a bunch of code that does things like
void MutexLock( Mutex * m )
{
if ( ! m ) return;
...
yikes. Invalid argument and you just silently do nothing. No thank you.
We should all know that silently nopping in failure cases is pretty horrible. But I'm also dealing with a lot of error code returns, and it occurs to me that returning an error code in that situation is not much better.
Personally I want unexpected or unhandleable errors to just blow up my app. In my own code I would just assert; unfortunately that's not viable in OS code or perhaps even in a library.
The classic example is malloc. I hate mallocs that return null. If I run out of memory, there's no way I'm handling it cleanly and reducing my footprint and carrying on. Just blow up my app. Personally whenever I implement an allocator if it can't get memory from the OS it just prints a message and exits (*).
(* = aside : even better is "functions that don't fail" which I might write more about later; basically the idea is the function tries to handle the failure case itself and never returns it out to the larger app. So in the case of malloc it might print a message like "tried to alloc N bytes; (a)bort/(r)etry/return (n)ull?". Another common case is when you try to open a file for write and it fails for whatever reason, it should just handle that at the low level and say "couldn't open X for write; (a)bort/(r)etry/change (n)ame?" )
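The simple print-and-exit flavor looks something like this (the interactive abort/retry prompt from the aside is not attempted here). `xmalloc` is a hypothetical name; the only calls used are standard `malloc`, `fprintf` and `abort`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Allocate or die: callers never see a null return.
void* xmalloc(std::size_t bytes) {
    // malloc(0) is allowed to return null, so ask for at least one byte.
    void* p = std::malloc(bytes ? bytes : 1);
    if (p == nullptr) {
        std::fprintf(stderr, "xmalloc: failed to allocate %zu bytes\n", bytes);
        std::abort();  // blow up right at the call site
    }
    return p;
}
```

Callers then write `void* p = xmalloc(n);` and use the pointer without a null check, releasing it with `free` as usual.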
I think error code returns are okay for *expected* and *recoverable* errors.
On functions that you realistically expect to always succeed and will not check error codes for, they shouldn't return error codes at all. I wrote recently about wrapping system APIs for portable code ; an example of the style of level 2 wrapping that I like is to "fix" the error returns.
(obviously this is not something the OS should do, they just have to return every error; it requires app-specific knowledge about what kind of errors your app can encounter and successfully recover from and continue, vs. ones that just mean you have a catastrophic unexpected bug)
For example, functions like lock & unlock a mutex shouldn't fail (in my code). 99% of the user code in the world that locks and
unlocks mutexes doesn't check the return value, they just call lock and then proceed assuming the lock succeeded - so don't return it :
void mypthread_mutex_lock(mypthread_mutex_t *mutex)
{
    int ret = pthread_mutex_lock(mutex);
    if ( ret != 0 )
        CB_FAIL("pthread_mutex_lock",ret);
}
When you get a crazy unexpected error like that, the app should just blow up right at the call site (rather
than silently failing and then blowing up somewhere weird later on because the mutex wasn't actually locked).
In other cases there are a mix of expected failures and unexpected ones, and the level-2 wrapper should differentiate
between them :
bool mysem_trywait(mysem * sem)
{
    for(;;)
    {
        int res = sem_trywait( sem );
        if ( res == 0 ) return true; // got it
        int err = errno;
        if ( err == EINTR )
        {
            // UNIX is such balls
            continue;
        }
        else if ( err == EAGAIN )
        {
            // expected failure, no count in sem to dec :
            return false;
        }
        else
        {
            // crazy failure; blow up :
            CB_FAIL("sem_trywait",err);
        }
    }
}
(BTW best practice these days is always to copy "errno" out to an int, because errno may actually be
#defined to a function call in the multithreaded world)
And since I just stumbled into it by accident, I may as well talk about EINTR. Now I understand that there may be legitimate reasons why you *want* an OS API that's interrupted by signals - we're going to ignore that, because that's not what the EINTR debate is about. So for purposes of discussion pretend that you never have a use case where you want EINTR and it's just a question of whether the API should put that trouble on the user or not.
I ranted about EINTR at RAD a while ago and was informed (reminded) this was an ancient argument that I was on the wrong side of.
Mmm. One thing certainly is true : if you want to write an operating system (or any piece of software) such that it is easy to port to lots of platforms and maintain for a long time, then it should be absolutely as simple as possible (meaning simple to implement, not simple in the API or simple to use), even at the cost of "rightness" and pain to the user. That I certainly agree with; UNIX has succeeded at being easy to port (and also succeeded at being a pain to the user).
But most people who argue on the pro-EINTR side of the argument are just wrong; they are confused about what the advantage of the pro-EINTR argument is (for example Jeff Atwood takes off on a general rant against complexity ; I think we all should know by now that huge complex APIs are bad; that's not interesting, and that's not what "Worse is Better" is about; or Jeff's example of INI files vs the registry - INI files are just massively better in every way, it's not related at all, there's no pro-con there).
(to be clear and simple : the pro-EINTR argument is entirely about simplicity of implementation and porting of the API; it's about requiring the minimum from the system)
The EINTR-returning API is not simpler (than one that doesn't force you to loop). Consider an API like this :
U64 system( U64 code );
doc :
if the top 32 bits of code are 77 this is a file open and the bottom 32 bits specify a device; the
return values then are 0 = call the same function again with the first 8 chars of the file name ...
if it returns 7 then you must sleep at least 1 milli and then call again with code = 44 ...
etc.. docs for 100 pages ...
what you should now realize is that *the docs are part of the API*. (that is not a "simple" API)
An API that requires you to carefully read about the weird special cases and understand what is going on inside the system is NOT a simple API. It might look simple, but it's in disguise. A simple API does what you expect it to. You should be able to just look at the function signature and guess what it does and be right 99% of the time.
Aside from the issue of simplicity, any API that requires you to write the exact same boiler-plate every time you use it is just a broken fucking API.
Also, I strongly believe that any API which returns error codes should be usable if you don't check the error code
at all. Yeah yeah in real production code of course you check the error code, but for little test apps you
should be able to do :
int fd = open("blah");
read(fd,buf);
close(fd);
and that should work okay in my hack test app. Nope, not in UNIX it doesn't. Thanks to its wonderful "simplicity"
you have to call "read" in a loop because it might decide to return before the whole read is done.
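For reference, this is roughly the boiler-plate in question: a sketch of a wrapper that hides both the partial reads and EINTR. The name read_full and its return convention are my own assumptions here; only read() and errno are the real POSIX pieces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `size` bytes unless EOF or a real error
   intervenes. Returns the number of bytes read (short only at EOF), or -1
   on an error other than EINTR, with errno describing the failure. */
ssize_t read_full(int fd, void *buf, size_t size)
{
    unsigned char *out = buf;
    size_t total = 0;
    assert(buf != NULL || size == 0);
    while (total < size) {
        ssize_t got = read(fd, out + total, size - total);
        if (got > 0) {
            total += (size_t)got;        /* partial read: keep going */
        } else if (got == 0) {
            break;                       /* end of file */
        } else {
            int err = errno;             /* copy errno out right away */
            if (err == EINTR)
                continue;                /* interrupted by a signal: retry */
            errno = err;
            return -1;                   /* a real error */
        }
    }
    return (ssize_t)total;
}
```

With something like this the hack test app really is open / read_full / close.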
Another example that occurs to me is the reuse of keywords and syntax in C. Things like making "static" mean something completely different depending on how you use it makes the number of special keywords smaller. But I believe it actually makes the "API" of the language much *more* complex. Instead of having intuitive and obvious separate clear keywords for each meaning that you could perhaps figure out just by looking at them, you instead have to read a bunch of docs and have very technical knowledge of the internals of what the keywords mean in each usage. (there are legitimate advantages to minimizing the number of keywords, of course, like leaving as many names available to users as possible). Knowledge required to use an API is part of the API. Simplicity is determined by the amount of knowledge required to do things correctly.
The Oodle web site just went live a few days ago.
Sometimes I feel embarrassed (ashamed? humiliated?) that it's taken me five years to write a file IO and data compression library. Other times I think I've basically written an entire OS by myself (and all the docs, and marketing materials, and a video compressor, and an aborted paging engine, and a bunch of other crap) and that doesn't sound so bad. I suppose the truth is somewhere in the middle. (perhaps with Oodle finally being officially released and selling, I might write a little post-mortem about how it's gone, try to honestly look back at it a bit. (because lord knows what I need is more introspection in my life)).
Oodle 1.1 will be out any day now. Main new features :
Lots more platforms. Almost everything except mobile platforms now.
LZNIB! I think LZNIB is pretty great. 8X faster to decode than ZLIB and usually
makes smaller files.
Other junk :
All the compressors can run parallel encode & decode now.
Long-range-matcher for LZ matching on huge files (still only in-memory though).
Incremental compressors for online transmission, and faster resets.
Personally I'm excited the core architecture is finally settling down, and we have a more focused
direction to go forward, which is mainly the compressors. I hope to be able to work on some new
compressors for 1.2 (like a very-high-compression option, which I currently don't have), and then
eventually move on to some image compression stuff.
Any time you are in a work item, if you decide that you can get some more parallelism by doing a branch-merge inside that item, you need deep yield.
Remember you should never ever do an OS wait on a coroutine thread (with normal threads anyway; on a WinRT threadpool thread you can). The reason is the OS wait disables that worker thread, so you have one less. In the worst case, it leads to deadlock, because all your worker threads can be asleep waiting on work items, and with no worker threads they will never get done.
Anyway, I've cooked up a temporary work-around, it looks like this :
I'm in some function and I want to branch-merge
If I'm not on a worker thread :
    -> just do a normal branch-merge, send the work off and use a Wait for completion
If I am on a worker thread :
    inc target worker thread count
    if # currently live worker threads is < target count
        start a new worker thread (either create or wake from pool)
    now do the branch-merge and use OS Wait
    dec the target worker thread count
on each worker thread, after completing a work item and before popping more work :
    if target worker thread count < currently live count
        stop self (go back into a sleeping state in the pool)
this is basically using OS threads to implement stack-saving deep yield. It's not awesome,
but it is okay if deep yield is rare.
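The counting part of that work-around can be sketched on its own. This is bookkeeping only, under loud assumptions: the struct and function names are hypothetical, nothing here is thread-safe (a real pool would guard the counts with a lock or atomics), and actually creating, waking and parking OS threads is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Bookkeeping only; not thread-safe as written. */
typedef struct {
    int target_count;  /* how many workers should be runnable */
    int live_count;    /* how many workers are currently runnable */
} WorkerCounts;

/* Call on a worker thread just before it blocks in an OS wait.
   Returns 1 if the caller must start (or wake) one more worker. */
int deep_wait_begin(WorkerCounts *wc)
{
    assert(wc != NULL);
    wc->target_count++;
    if (wc->live_count < wc->target_count) {
        wc->live_count++;
        return 1;
    }
    return 0;
}

/* Call right after the OS wait returns. */
void deep_wait_end(WorkerCounts *wc)
{
    assert(wc != NULL);
    wc->target_count--;
}

/* Call on each worker after finishing a work item, before popping more.
   Returns 1 if this worker should park itself back in the pool. */
int worker_should_park(WorkerCounts *wc)
{
    assert(wc != NULL);
    if (wc->target_count < wc->live_count) {
        wc->live_count--;
        return 1;
    }
    return 0;
}
```

The effect is that the number of runnable workers is temporarily bumped by one for each worker stuck in an OS wait, and drifts back down as workers finish items.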
I am so fucking bored of graphics. Graphics are not the damn problem. I'm completely appalled by the derivative repetitive boring games you all keep making. I don't want to play "Shoot People in the Face 227" or "Space Marines 154" or "Slide Blocks to Make them Go Bling N" or "Cute Creatures Jump Around on Blocks N". Barf, boring. And making them all shiny with new graphics is just gilding the turd. Stop working on graphics.
Games have huge tech problems that nobody seems to want to work on. One that I have wanted to work on for a long time is animation. And by "animation" I don't really mean playing back clips, which fundamentally looks like garbage, but making characters move naturally, able to transition movements the way their body should, respond to surface variations and so on. Game animation just looks so awful, and it's becoming more uncanny as the graphics get better.
(in fact if we were smart we would have done it the other way around. Every cartoonist for a hundred years has known that it's actually ok for the visuals to look unrealistic if the animation and sound are really good. Human perception cares more about motion than the static appearance of things.)
Anyhoo, the other big one is AI. And by "AI" I don't mean playing scripts, or moving to designer-placed cover spots. Even some of the more sophisticated game AI systems are really just fancy whack-a-mole. You can see the AI's run to one spot, do a pre-programmed routine, run to another spot, pop out of cover so the player can shoot them, pop back in cover. Now, certainly there are merits to whack-a-mole AI. If you're making a platformer you don't want the enemy to do surprising things, you just want them to walk back and forth on a set pattern that the player can pick up easily. They're not really AI at all, they're rigid bodies with an animal painted on them.
These AI's never surprise you, they never make you laugh, they never make you want to play again because they might do something new. They feed off your energy and don't give anything back, like a bad conversation partner.
So it made me realize that game AIs are actually more interesting when the game is very simple. It might naively seem like a big complex sandbox 3d world has got a more complex AI, but really that complex world means that the AI no longer understands what it's doing. Your only hope is to give it simple rules to follow about what it can do in that world.
In contrast, AI for simple game systems (chess, checkers, backgammon, poker) can do amazing things that the human programmer never anticipated. There's a funny thing that happens with computer algorithms where a cold rational scientific brute-force search of a mathematical problem space actually leads to behavior that's more human than the shitty heuristic decision-tree type programming that's explicitly trying to simulate human behavior.
For example, when I was writing poker AI, I was really amazed at the "creative" plays that a simulation-based bot makes. (for review : a standard UAlberta-style poker bot works by building a model of the opponent based on observation of previous action; it then simulates all the possibilities for future cards and imagines what the opponent will do in each situation; it sums the EV over all paths for each of its own actions, and chooses the action that maximizes EV).
At the simplest level, it figures out things like check-raising when you tend to bet checked flops too much. But it did even weirder things. For example the bot would very quickly become hyper-aggressive against an opponent that folds even slightly too much; it adjusted faster and way more severely than any human. I would play against it sometimes with our cards face up so that I could make sure it was doing sane things, and I would see it make a huge check-raise bluff on the river with junk. My first thought was "I have a bug", so I'd go looking into the stats of the model, and I'd find that there was no bug; it was just that the AI had learned that I thought a big river raise meant strength, so I was folding to them a lot, and therefore the simulation would jam almost every hand.
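To see why a slightly-too-high fold frequency flips the bot into bluffing, here's a toy version of the EV comparison collapsed to a single river decision. It's a plain bet bluff with a hand that never wins at showdown, not the check-raise from the anecdote, and not the real bot, which sums over future cards and opponent responses. All the names are hypothetical.

```c
#include <assert.h>

typedef enum { ACTION_CHECK, ACTION_BLUFF } Action;

/* EV of betting `bet` into `pot` with a hand that never wins at showdown,
   against an opponent modeled as folding with probability `fold_prob`. */
double ev_bluff(double pot, double bet, double fold_prob)
{
    assert(fold_prob >= 0.0 && fold_prob <= 1.0);
    return fold_prob * pot - (1.0 - fold_prob) * bet;
}

/* Checking junk and giving up is worth 0 in this toy model, so bluff
   whenever the modeled EV of the bet is positive. */
Action choose_river_action(double pot, double bet, double fold_prob)
{
    return (ev_bluff(pot, bet, fold_prob) > 0.0) ? ACTION_BLUFF : ACTION_CHECK;
}
```

The break-even fold frequency is bet / (pot + bet), so for a pot-sized bet anything over 50% folds makes the bluff profitable with any two cards, which is how a model that says "he folds 60% to big river bets" turns into jamming almost every hand.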
This type of poker AI is not the game theoretic equilibrium solution. It's assuming that the opponent plays by some scheme which may not be optimal, and that its own strategy is not face up. That can lead it to make mistakes. One I've long been aware of is that it doesn't hedge correctly. Normal humans hedge all the time in their poker play, perhaps too much; you will often suspect that someone is bluffing a huge percent of the time, but you aren't sure. A non-hedging AI would immediately start making very light call-downs, but a cautious human will weight in some factor for the model being wrong and play with a blended strategy that's not disastrous if the model is wrong (like only doing the light call-down in small pots, or waiting for a call-down with a hand that has some chance of being best even if the model is wrong).
Continuing the random rambling train of thought, I just realized (re-realized?) that one of the flaws with this style of poker AI is that it doesn't anticipate the reaction to its moves. Of course it does anticipate the reaction just in terms of "if I bet, what hands will he call with or raise with", but it is evaluating based on the *past* model of the opponent. After you make your bet, the opponent sees it and adjusts their view of you, so you need to be anticipating how their play style changes. For example in the case I mentioned above - when someone is playing pretty weak/tight the bot rapidly becomes hyper-aggressive, which is mostly good, but the bot never gets the idea that "hey he can see I'm raising every single street of every hand, he's going to adjust and call me down more".
Anyway, bringing it back to games, it occurred to me that it would be interesting to try some really simple 2d games, and give them a mathematical solving AI, instead of the usual heuristic crap we do. Like, let's face facts - we can't actually make games in these big free form 3d worlds, it's too complex. Our ability to do the graphics has gotten way beyond every other aspect. We need to back up and go to like Ultima-style 2d tile-based games. Now you have a space where the AI can just explore future actions, and things like advancing on the player by moving from cover to cover just pops out of the behavior automatically because it maximizes EV, not because it was explicitly coded.
(I'm not contending that this is the "right way" to make games or that it will necessarily make good games, I just thought it was interesting)
(Also remember there are a lot of other issues with Sleep(n) ; the times are only reliable here because this is in a no-op test app)
This actually started because I was looking into Linux thread sleep timing, so I wrote a little test to just Sleep(n) a bunch of times and measure the observed duration of the sleep.
(Of course on Windows I do timeBeginPeriod(1) and bump my thread to very high priority (and timeGetDevCaps says the minp is 1)).
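For the POSIX side, a test of this kind can be sketched roughly as below. This is my reconstruction under assumptions, not the actual test code: it uses nanosleep and CLOCK_MONOTONIC, the names are made up, and it reports only average/min/max (sdev is left out to keep the sketch free of libm).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef struct { double average, min, max; } SleepStats;   /* all in millis */

/* Monotonic clock reading in milliseconds. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Summarize `count` observed durations (count must be >= 1). */
SleepStats summarize(const double *samples, int count)
{
    SleepStats s;
    double sum = 0.0;
    int i;
    assert(samples != NULL && count >= 1);
    s.min = s.max = samples[0];
    for (i = 0; i < count; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        sum += samples[i];
    }
    s.average = sum / count;
    return s;
}

/* Sleep `millis` ms `count` times (count <= 1024) and time each sleep. */
SleepStats measure_sleep(int millis, int count)
{
    double samples[1024];
    struct timespec req;
    int i;
    assert(millis >= 0 && count >= 1 && count <= 1024);
    req.tv_sec  = millis / 1000;
    req.tv_nsec = (long)(millis % 1000) * 1000000L;
    for (i = 0; i < count; i++) {
        double t0 = now_ms();
        nanosleep(&req, NULL);   /* may return early if a signal arrives */
        samples[i] = now_ms() - t0;
    }
    return summarize(samples, count);
}
```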
Anyway, what I'm seeing is this :
Win7 :
sleep(1) : average = 0.999 , sdev = 0.035 ,min = 0.175 , max = 1.568
sleep(2) : average = 2.000 , sdev = 0.041 ,min = 1.344 , max = 2.660
sleep(3) : average = 3.000 , sdev = 0.040 ,min = 2.200 , max = 3.774
Sleep(n) averages n
duration in [n-1,n+1]
WinXP :
sleep(1) : average = 1.952 , sdev = 0.001 ,min = 1.902 , max = 1.966
sleep(2) : average = 2.929 , sdev = 0.004 ,min = 2.665 , max = 2.961
sleep(3) : average = 3.905 , sdev = 0.004 ,min = 3.640 , max = 3.927
Sleep(n) averages (n+1)
duration very close to (n+1) every time (tiny sdev)
Win8 :
sleep(1) : average = 2.002 , sdev = 0.111 ,min = 1.015 , max = 2.101
sleep(2) : average = 2.703 , sdev = 0.439 ,min = 2.017 , max = 3.085
sleep(3) : average = 3.630 , sdev = 0.452 ,min = 3.003 , max = 4.130
average no good
Sleep(n) minimum very precisely n
duration in [n,n+1] (+ a little error)
rather larger sdev
it's like completely different logic on each of my 3 machines. XP is the most precise,
but it's sleeping for (n+1) millis instead of (n) ! Win8 has a very precise min of n, but
the average and max is quite sloppy (sdev of almost half a milli, very high variation even
with nothing happening on the system). Win7 hits the average really nicely but has a large
range, and is the only one that will go well below the requested duration.
As noted before, I had a look at this because I'm running Linux in a VM and seeing very poor
performance from my threading code under Linux-VM. So I ran this experiment :
Sleep(1) on Linux :
native : average = 1.094 , sdev = 0.015 , min = 1.054 , max = 1.224
in VM : average = 3.270 , sdev =14.748 , min = 1.058 , max = 656.297
(added)
in VM2 : average = 1.308 , sdev = 2.757 , min = 1.052 , max = 154.025
obviously being inside a VM on Windows is not being very kind to Linux's threading system.
On the native box, Linux's sleep time is way more reliable than Windows (small min-max range)
(and this is just with default priority threads and SCHED_OTHER, not even using a high priority
trick like with the Windows tests above).
added "in VM2". So the VM threading seems to be much better if you let it see many fewer cores than you have. I'm running on a 4 core (8 hypercore) machine; the base "in VM" numbers are with the VM set to see 4 cores. "in VM2" is with the VM set to 2 cores. Still a really bad max in there, but much better overall.
1. Obviously you all know the best practice of using your own data types (S32 or whatever) and making macros for any kind of common operation that the standards don't handle well (like use a SEL macro instead of ?: , make a macro for ROT, etc). Never use bit-fields, make your own macros for manipulating bits within words. You also have to make your own whole macro meta-language for things not quite in the language, like data alignment, restrict/alias, etc. etc. (god damn C standard people, spend some time on the actual problems that real coders face every day. Thanks mkay). That's background and it's the way to go.
Make your own defines for SIZEOF_POINTER since stupid C doesn't give you any way to use sizeof() in a preprocessor #if. You probably also want SIZEOF_REGISTER. You need your own equivalent of ptrdiff_t and intptr_t. Best practice is to use pointer-sized ints for all indexing of arrays and buffer sizes.
(one annoying complication is that there are platforms with 64 bit pointers on which 64-bit int math is very slow; for example they might not have a 64-bit multiply at all and have to emulate it. In that case you will want to use 32-bit ints for array access when possible; bleh)
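One way to get a SIZEOF_POINTER you can test in #if, sketched under the assumption that you have a C99 stdint.h and that uintptr_t is the same size as void* (true on mainstream platforms, not guaranteed by the standard). The typedef names SINTP and UINTP are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* sizeof() is not usable inside #if, so derive the size from a limit macro. */
#if UINTPTR_MAX == 0xFFFFFFFFu
#define SIZEOF_POINTER 4
#elif UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
#define SIZEOF_POINTER 8
#else
#error unknown pointer size : classify this platform
#endif

/* Compile-time cross-check: the array size goes negative (a compile error)
   if the macro and the real pointer size ever disagree. */
typedef char sizeof_pointer_check[(sizeof(void *) == SIZEOF_POINTER) ? 1 : -1];

/* Hypothetical pointer-sized integer names for indexing and buffer sizes. */
typedef intptr_t  SINTP;
typedef uintptr_t UINTP;
```

On platforms without stdint.h you would set SIZEOF_POINTER by hand from the platform Id defines instead.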
Avoid using "wchar_t" because it is not always the same size. Try to explicitly use UTF16 or UTF32 in your code. You could make your own SIZEOF_WCHAR and select one or the other on the appropriate platform. (really try to avoid using wchar at all; just use U16 or U32 and do your own UTF encoding).
One thing I would add to the macro meta-language next time is to wrap every single function (and class) in my
code. That is, instead of :
int myfunc( int args );
do
FUNC1 int FUNC2 myfunc(int args );
or even better :
FUNC( int , myfunc , (int args) );
this gives you lots of power to add attributes and other munging as may be needed later on some platforms.
If I was doing this again I would use the last style, and I would have two of them, a FUNC_PUBLIC and FUNC_PRIVATE
to control linkage. Probably should have separate wrapper macros for the proto and the body.
While you're at it you may as well have a preamble in every func too :
FUNC_PUBLIC_BODY( int , myfunc , (int args) )
{
FUNC_PUBLIC_PRE
...
}
which lets you add automatic func tracing, profiling, logging, and so on.
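A small sketch of what those wrappers might expand to. Everything beyond the macro names used above is an assumption for illustration: the expansions, the FUNC_PRIVATE_BODY / FUNC_PRIVATE_PRE variants, and the toy tracing globals.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tracing state that the preamble updates. */
const char *g_last_func = "";
int g_func_calls = 0;

/* Prototype wrappers: one central place to hang linkage or attributes. */
#define FUNC_PUBLIC(ret, name, args)        extern ret name args
#define FUNC_PRIVATE(ret, name, args)       static ret name args

/* Body wrappers, kept separate from the prototype wrappers. */
#define FUNC_PUBLIC_BODY(ret, name, args)   ret name args
#define FUNC_PRIVATE_BODY(ret, name, args)  static ret name args

/* Preamble: here it only records the call; it could also time or log it. */
#define FUNC_PUBLIC_PRE   do { g_last_func = __func__; g_func_calls++; } while (0);
#define FUNC_PRIVATE_PRE  FUNC_PUBLIC_PRE

FUNC_PUBLIC( int , myfunc , (int args) );
FUNC_PRIVATE( int , myhelper , (int x) );

FUNC_PRIVATE_BODY( int , myhelper , (int x) )
{
    FUNC_PRIVATE_PRE
    return x * 2;
}

FUNC_PUBLIC_BODY( int , myfunc , (int args) )
{
    FUNC_PUBLIC_PRE
    return myhelper(args) + 1;
}
```

On a platform that needs, say, an export attribute on public functions, you change FUNC_PUBLIC in one place and never touch the call sites.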
I wish I had made several different layers of platform Id #defines. The first one you want is the lowest level, which explicitly Id's the current platform. These should be exclusive (no overlaps), something like OODLE_PLATFORM_X86X64_WIN32 or OODLE_PLATFORM_PS3_PPU.
Then I'd like another layer that's platform *groups*. For me the groups would probably be OODLE_PLATFORM_GROUP_PC , GROUP_CONSOLE,
and GROUP_EMBEDDED. Those let you make gross characterizations like on "GROUP_PC" you use more memory and have more debug systems
and such. With these mutually exclusive platform checks, you should never use an #else. That is, don't do :
#if OODLE_PLATFORM_X86X64_WIN32
.. some code ..
#else
.. fallback ..
#endif
it's much better to explicitly enumerate which platforms you want to go to which code block, and then have an
#else
#error new platform
#endif
at the end of every check. That way when you try building on new platforms that you haven't thought carefully about yet, you get
nice compiler notification about all the places where you need to think "should it use this code path or should I write a new one".
Fallbacks are evil! I hate fallbacks, give me errors.
Aside from the explicit platforms and groups I would have platform flags or caps which are non-mutually exclusive. Things like PLATFORM_FLAG_STDIN_CONSOLE.
While you want the raw platform checks, in end code I wish I had avoided using them explicitly, and instead
converted them into logical queries about the platform. What I mean is, when you just have an "#if some platform"
in the code, it doesn't make it clear why you care that's the platform, and it doesn't make it reusable.
For example I have things like :
#if PLATFORM_X86X64
// .. do string matching by U64 and xor/cntlz
#else
// unaligned U64 read may be slow
// do string match byte by byte
#endif
what I should have done is to introduce an abstraction layer in the #if that makes it clear what I am checking
for, like :
#if PLATFORM_X86X64
#define PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS 1
#elif PLATFORM_PS3
#define PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS 0
#else
#error classify me
#endif
#if PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS
// .. do string matching by U64 and xor/cntlz
#else
// unaligned U64 read may be slow
// do string match byte by byte
#endif
then it's really clear what you want to know and how to classify new platforms. It also lets you reuse
that toggle in lots of places without code duping the fiddly bit, which is the platform classification.
Note that when doing this, it's best to make high level usage-specific switches. You might be tempted to try to use platform attributes there. Like instead of "PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS" you might want to use "PLATFORM_SWITCH_UNALIGNED_READ_PENALTY" . But that's not actually what you want to know, you want to know if on my particular application (LZ string match) it's better to use big words or not, and that might not match the low level attribute of the CPU.
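For the curious, the "string matching by U64 and xor" path looks something like the sketch below. Assumptions: GCC/Clang's __builtin_ctzll is available, and the host is little-endian, so the first differing byte is found by counting trailing zeros (on a big-endian machine you'd count leading zeros instead, hence "cntlz"). The classification here cheats and uses an #else fallback only so the sketch compiles anywhere; in real code I'd want the #error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS 1
#else
#define PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS 0
#endif

/* Reference path: count matching bytes one at a time. */
size_t match_len_bytes(const unsigned char *a, const unsigned char *b,
                       size_t max_len)
{
    size_t n = 0;
    while (n < max_len && a[n] == b[n]) n++;
    return n;
}

/* Number of leading bytes (at most max_len) on which a and b agree. */
size_t match_len(const unsigned char *a, const unsigned char *b, size_t max_len)
{
#if PLATFORM_SWITCH_DO_STRING_MATCH_BIGWORDS
    size_t n = 0;
    while (n + 8 <= max_len) {
        uint64_t wa, wb, diff;
        memcpy(&wa, a + n, 8);          /* memcpy = safe unaligned read */
        memcpy(&wb, b + n, 8);
        diff = wa ^ wb;
        if (diff != 0)                  /* lowest set bit marks first mismatch */
            return n + (size_t)(__builtin_ctzll(diff) >> 3);
        n += 8;
    }
    return n + match_len_bytes(a + n, b + n, max_len - n);   /* tail bytes */
#else
    return match_len_bytes(a, b, max_len);
#endif
}
```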
It's really tempting to skip all this and abuse the switches you can see (lord knows I do it); I see (and write) lots of code that does evil things like using "#ifdef _MSC_VER" to mean something totally different like "is this x86 or x64" ? Of course that screws you when you move to another x86 platform and you aren't detecting it correctly (or when you use MSVC to make PPC or ARM compiles).
Okay, that's all pretty standard, now for the new bit :
2. I would opaque out the system APIs in two levels. I haven't actually ever done this, so grains of salt, but I'm pretty convinced it's the right way to go after working with a more standard system.
(for the record : the standard way is to make a set of wrappers that tries to behave the same on all systems, eg. that tries to hide what system you are on as much as possible. Then if you need to do platform-specific stuff you would just include the platform system headers and talk to them directly. That's what I'm saying is not good.)
In the proposed alternative, the first level would just be a wrapper on the system APIs with minimal or no behavior change. That is, it's just passing them through and standardizing naming and behavior.
At this level you are doing a few things :
2.A. Hiding the system includes from the rest of your app. System includes are often in different places, and often turn on compiler flags in nasty ways. You want to remove that variation from the rest of your code so that your main codebase only sees your own wrapper header.
2.B. Standardizing naming. For example the MSVC POSIX funcs are all named wrong; at this level you can patch that all up.
2.C. Fixing things that are slightly different or don't work on various platforms where they really should be the same. For example things like pthreads are not actually all the same on all the pthreads platforms, and that can catch you out in nasty ways. (eg. things like sem_init always failing on Mac).
Note this is *not* trying to make non-POSIX platforms look like POSIX. It's not hiding the system you're on, just wrapping it in a standard way.
2.D. I would also go ahead and add my own asserts for args and returns in this layer, because I hate functions that just return error codes when there's a catastrophic failure like a null arg or an EHEAPCORRUPT or whatever.
So once you have this wrapper you no longer call any system funcs directly from your main codebase, but you still would
be doing things like :
#if PLATFORM_WIN32
HANDLE h = platform_CreateFile( ... )
#elif PLATFORM_POSIX
int fd = platform_open( name , flags )
#else
#error unknown platform
#endif
that is, you're not hiding what platform you're on, you're still letting the larger codebase get to the low level calls,
it's just the mess of how fucked they are that's hidden a bit.
3. You then have a second level of wrapping which tries to make same-action interfaces that don't require you to know what platform you're on. The second level is written on top of the first level.
The second level wrappers should be as high level as necessary to opaque out the operation. For example rather than having "make temp file name" and "open file" you might have "open file with temp name", because on some platforms that can be more efficient when you know it is a high-level combined op. You don't just have "GetTime" you have "GetTimeMonotonic" , because on some platforms they have an efficient monotonic clock for you, and on other platforms/hardwares you may have to do a lot of complicated work to ensure a reliable clock (that you don't want to do in the low level timer).
When a platform can't provide a high-level function efficiently, rather than emulate it in a complex way I'd rather just not have it - not a stub that fails, but no definition at all. That way I get a compile error and in those spots I can do something different, using the level 1 APIs.
The first level wrappers are very independent of the large code base's usage, but the second level wrappers are very much specifically designed for their usage.
To be clear about the problem of making platform-hiding second layer wrappers, consider something like OpenFile(). What are the args to that? What can it do? It's hopeless to make something that works on all platforms without greatly reducing the capabilities of some platforms. And the meaning of various options (like async, temporary, buffered, etc.) all changes with platform.
If you wanted to really make a general purpose multi-platform OpenFile you would have to use some kind of "caps" query system, where you first do something like OpenFile_QueryCaps( OF_DOES_UNBUFFERED_MEAN_ALIGNMENT_IS_REQUIRED ) and it would be an ugly disaster. (and it's retarded on the face of it, because really what you're doing there is saying "is this win32" ?). The alternative to the crazy caps system is to just make the high level wrappers very limited and specific to your usage. So you could make a platform-agnostic wrapper like OpenFile_ForReadShared_StandardFlagsAndPermissions(). Then the platforms can all do slightly different things and satisfy the high level goal of the imperative in the best way for that platform.
A good second level has as few functions as possible, and they are as high level as possible. Making them very high level allows you to do different compound ops on the platform in a way that's hidden from the larger codebase.
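Since I haven't actually built this, take the following as a sketch of the shape only, with just a POSIX branch filled in. platform_open and the long OpenFile name come from the text above; platform_close, PlatformFile, the return convention and the platform detection are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>

#if defined(__unix__) || defined(__APPLE__)
#define PLATFORM_POSIX 1
#else
#error new platform : classify me and write the level 1 wrappers
#endif

#if PLATFORM_POSIX
#include <fcntl.h>
#include <unistd.h>

/* Level 1: thin pass-throughs, plus asserts on catastrophic misuse. */
int platform_open(const char *name, int flags)
{
    assert(name != NULL);
    return open(name, flags);
}

void platform_close(int fd)
{
    int ret;
    assert(fd >= 0);
    ret = close(fd);
    assert(ret == 0);   /* not an error this app expects to recover from */
    (void)ret;
}

/* What a level 2 file handle holds on this platform. */
typedef struct { int fd; } PlatformFile;
#endif

/* Level 2: one narrow, usage-specific operation, built on level 1.
   Returns 1 and fills *out on success, 0 if the file could not be opened
   (treated here as an expected, recoverable failure). */
int OpenFile_ForReadShared_StandardFlagsAndPermissions(const char *name,
                                                       PlatformFile *out)
{
    assert(name != NULL && out != NULL);
#if PLATFORM_POSIX
    out->fd = platform_open(name, O_RDONLY);
    return out->fd >= 0;
#endif
}
```

The larger codebase calls only the level 2 function in the common case, and drops down to platform_open under an explicit platform #if when it really needs something platform-specific.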
"Rep matches" are a little weird. They help a lot, but the reason why they help depends on the file you are compressing. (rep match = repeat match, gap match, aka "last offset")
On text files, they work as interrupted matches, or "gap matches". They let you generate something like :
stand on the floor
stand in the door
stand in the door
[stand ][i][n the ][d][oor]
[off 19, len 6][1 lit][rep len 6][1 lit][off 18, len 3]
that is, you have a long match of [stand on the ] but with a gap at the 'o'.
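To make the token stream concrete, here's a toy decoder with a single last offset. The Token layout and names are made up for this sketch and have nothing to do with any real compressor's format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_LIT, TOK_MATCH, TOK_REP } TokKind;

typedef struct {
    TokKind kind;
    char    lit;      /* TOK_LIT   : the literal byte */
    size_t  offset;   /* TOK_MATCH : distance back from the write position */
    size_t  len;      /* TOK_MATCH / TOK_REP : number of bytes to copy */
} Token;

/* Append decoded bytes to buf, which already holds `pos` bytes of history.
   Returns the new length. The caller guarantees buf is large enough. */
size_t lz_decode(char *buf, size_t pos, const Token *toks, size_t ntoks)
{
    size_t last_offset = 0;
    size_t t, i;
    assert(buf != NULL && toks != NULL);
    for (t = 0; t < ntoks; t++) {
        const Token *tok = &toks[t];
        if (tok->kind == TOK_LIT) {
            buf[pos++] = tok->lit;
            continue;
        }
        if (tok->kind == TOK_MATCH)
            last_offset = tok->offset;   /* a normal match sets the rep offset */
        /* TOK_REP falls through and reuses last_offset unchanged */
        assert(last_offset >= 1 && last_offset <= pos);
        for (i = 0; i < tok->len; i++, pos++)
            buf[pos] = buf[pos - last_offset];
    }
    return pos;
}
```

Feeding it the five tokens from the example, with "stand on the floor" plus its newline (19 bytes) as history, reproduces "stand in the door": the rep token copies "n the " from offset 19 without having to send the offset again.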
Now, something I observed was that more than one last offset continues to help. On text the main benefit from having two last offsets is that it lets
you use a match for the gap. When the gap is not just one character but a word, you might want to use a match to put that word in, in which case
the continuation after the gap is no longer the first last-offset, it's the second one. eg.
cope
how to work with animals
how to cope with animals
[how to ][cope][ with animals]
[off 25 ][off 32][off 25 (rep2)]
You could imagine alternative coding structures that don't require keeping some number of "last offsets". (oddly, the last offset maintenance can be a large part of decode time, because maintaining an MTF list is something that CPUs do incredibly poorly). For example you could code with a scheme where you just send the entire long match, and then any time you send a long match you send a flag for "are there any gaps", and if so you then code some gaps inside the match.
The funny thing is, on binary files "last offsets" do something else which can be more important. They become the most common offsets. In particular, on highly structured binary data, they will generally be some factor of the structure size. eg. on a file that has a struct size of 36, and that struct has dwords and such in it, the last offsets will generally be things like 4,8,16,36, or 72. They provide a sort of dictionary of the most common offsets so that you can code those smaller. You are still getting the gap-match effect, but the common-offset benefit is much bigger on these files.
(aside : word-replacing transform on text really helps LZ (and everything) by removing the length variance of tokens. In particular for LZ77, word length variance breaks rep matches. There are lots of common occurrences of a single replaced word in a phrase, like : "I want some stuff" -> "I want the stuff". You can't get a rep match here of [ stuff] because the offset changed because the substituted word was a different length. If you do WRT first, then gap matches get these.)
Note 2 : on offset structure.
I've had it in the back of my head for quite some time now to do an LZ compressor specifically designed for structured data.
One idea I had was to use "2d" match offsets. That is, send a {dx,dy} where dx is within the record and dy is different records. Like imagine the data is in a table, dy is going back rows, dx is an offset on the row. You probably want to mod dx around the row so its range is always the same, and special case dy=0 (matches within your own record).
It occurred to me that the standard way of sending LZ offsets these days actually already does this. The normal way that good LZ's send offsets these days is to break it into low and high parts :
low = offset & 0x7F;
high = offset >> 7;
or similar, then you send "high" using some kind of "NoSB" scheme (Number of Significant Bits is entropy coded, and the bits themselves are sent raw), and you send "low" with an order-0 entropy coder.
But this is just a 2d structured record offset for a particular power-of-2 record size. It's why when I've experimented with 2d offsets I haven't seen huge wins - because I'm already doing it.
There is some win to be had from custom 2d-offsets (vs. the standard low/high bits scheme) when the record size is not a power of two.
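A small sketch of the two splits side by side. The names OffsetParts, SplitPow2, SplitRecord and JoinRecord are made up for this example, and the record size is assumed to be known to both encoder and decoder.

```cpp
#include <cassert>
#include <cstdint>

struct OffsetParts
{
    uint32_t low;  // "dx" : position within the record, sent with an order-0 coder
    uint32_t high; // "dy" : how many records back, sent NoSB style
};

// The standard split : low 7 bits and the rest.
// This is the same thing as a 2d offset with a record size of 128.
inline OffsetParts SplitPow2(uint32_t offset)
{
    return OffsetParts{ offset & 0x7F, offset >> 7 };
}

// Custom 2d split for an arbitrary record size.
inline OffsetParts SplitRecord(uint32_t offset, uint32_t recordSize)
{
    assert(recordSize > 0);
    return OffsetParts{ offset % recordSize, offset / recordSize };
}

// What the decoder does to get the offset back.
inline uint32_t JoinRecord(OffsetParts p, uint32_t recordSize)
{
    assert(p.low < recordSize);
    return p.high * recordSize + p.low;
}
```

With a record size of 36, offsets 36 and 72 both produce low == 0, so the order-0 coder sees one very hot symbol; with the power-of-2 split they land on low values 36 and 72 and share nothing.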
The big problem with libraries is that you don't control how they're used. This is in contrast to game engines. Game engines are not libraries. I've worked on many game engines over the years, including ones that went out to large free user bases (Genesis 3d and Wild Tangent), and they are much much easier than libraries.
The difference is that game engines generally impose an architecture on the user. They force you to use it in a certain way. (this is of course why more advanced developers despise them so much; it sucks to have some 3rd party telling you your code architecture). It's totally acceptable if a game engine only works well when you use it in the approved way, and is really slow if you abuse it, or it could even crash if you use it oddly.
A library has to be flexible about how it's used; it can't impose a system on the user, like a certain threading model, or a certain memory management model, or even an error-handling style.
Personally when I do IO for games, I make a "tool path" that just uses stdio and is very simple and flexible, does streaming IO and text parsing and so on, but isn't shipped with the game, and I make a "game path" that only does large-block async IO that's pre-baked so you can just point at it. I find that system is powerful enough for my use, it's easy to write and use. It means that the "tool path" doesn't have to be particularly fast, and the fast game path doesn't need to support buffered character IO or anything other than big block reads.
But I can't force that model on clients, so I have to support all the permutations and I have to make them all decently fast.
A lot of times in the past I've complained about over-complicated APIs that have tons of crazy options that nobody ever needs (look at the IJG jpeg code for example). Well, now I see that often those complicated APIs were made because somebody (probably somebody important) needed those options. Of course as the library provider you can offer the complex interface and also simpler alternatives, but that has its own pitfalls of making the API bigger and more redundant (like if you offer OpenFileSimple and OpenFileComplex); in some ways it's better to only offer the complex API and make the user wrap it and reduce the parameter set to what they actually use.
There's also a sort of "liability" issue in libraries. Not exactly legal liability, but program bad behavior liability. Lots of things that would make the library easier to use and faster are naughty to do automatically. For example Oodle under Vista+ can run faster with elevated privilege, to get access to some of the insecure file APIs (like extending without zeroing), but it would be naughty for me to do that automatically, so instead I have to add an extra step to make the client specifically ask for that.
Optimization for me has really become a nightmare. At first I was trying to make every function fast, but it's impossible, there are just too many entry points and too many usage patterns. Now my philosophy is to make certain core functions fast, and then address problems in the bigger high level API as customers see issues. I remember as a game developer always being so pissed that all the GL drivers were specially optimized for Id. I would want to use the API in a slightly different style, and my way would be super slow, not for any good reason but just because it hadn't gotten the optimization loving of the important customer's use case.
I used to also rail about the "unnecessary" argument checking that all the 3d APIs do. It massively slows them down, and I would complain that I had ensured the arguments were good so just fucking pass them through, stop slowing me down with all your validation! But now I see that if you really do that, you will just constantly be crashing people as they pass in broken args. In fact arg validation is often the way that people figure out the API, either because they don't read the docs or because the docs are no good.
(this is not even getting into the issue of API design which is another area where I have been suitably humbled)
ADDENDUM : I guess I should mention the really obvious points that I didn't make.
1. One of the things that makes a public library so hard after release is that you can't refactor. The normal way I make APIs for myself (and for internal teams) is to sort of make an effort at a good API the first time, but it usually sucks, and you rip it out and go through big scourges of find-rep. That only works when you control all the code, the library and the consumer. It's only after several iterations that the API becomes really nice (and even then it's only nice for that specific use case, it might still suck in the wild).
2. APIs without users almost always suck. When someone goes away in a cave and works on a big new fancy library and then shows it to the world, it's probably terrible. This is a problem that I think everyone at RAD faces. The code of mine that I really like is stuff that I use over and over, so that I see the flaws and when I want it to be easier to use I just go fix it.
3. There are two separate issues about what makes an API "good". One is "is it good for the user?" and one is "is it good for the library maintainer?". Often they are the same but not always.
Anyway, the main point of this post is supposed to be : the next time you complain about a bad library design, there may well be valid reasons why it is the way it is; they have to balance a lot of competing goals. And even if they got it wrong, hey it's hard.
I think maybe Obama just doesn't understand politics. Perhaps because of his youth and lack of experience in serious elected office, he seems to think he can just make a good speech to the public and the legislature will somehow see the light and bow to his finely reasoned and rationally based argument. LOL, silly Obama. The only way to actually accomplish anything progressive is through strong-arm backroom deals and dirty tricks (see eg. LBJ and FDR). You can't just beat crooks like the NRA and AMA by being *right* ; the moral high ground or rational correctness never got anyone anywhere.
Either that or he's super clever and knows that none of his stuff will ever pass, and he's just trying to make a show to look a bit progressive while intentionally only succeeding on the very pro-big-business measures.
Sometimes when I see a bit of Fox News or some Tea Party demonstration or whatever, I imagine Mr. Burns is standing just off screen whispering "and what about the taxes" or "it's big government that's doing this to you!". Can't you see that these talking points have been written by think tanks and your angry mob is just a puppet in their game?
1. Open up immigration for anyone with a PhD , MD , etc.. Not just visas - give them citizenships. You want them to stay and make their business here.
(not really on topic, but if the AMA wasn't such a bunch of fuckers we would have a super-fast-track for MD's in other countries to get a US MD)
2. Instant citizenship for any immigrant who goes to a US PhD program and graduates.
(* obviously some difficulty here because colleges would pop up just to take money and make citizens, so there has to be some control on this)
Anybody who's gone to an American science PhD program knows just how completely insane our policy is and how much amazing human talent we are letting slip away. It's fucking retarded that so many Indian and Chinese and Russian (and others) scientists come to America, get PhDs, and leave because they can't stay. Now, as I said it's already too late, and our small-mindedness and intransigence has already fucked us, because they now have decent tech economies to go home to. If we'd done this 10 years ago we could have been the tech leader of the world for a long time.
3. No limit on H1B visas. Fast track to citizenship for H1B workers. Certainly anyone who works in software knows how stupid this is. Don't let American tech companies hire the best workers in the world, and then even when we do get to hire them, don't let them stay and become assimilated US citizens. Good system guys.
Now some ranting.
The financial system of the world is in a very sick state. Since the Great Recession, things have actually gotten worse. Zero reforms have been passed to prevent instability and counter-humanist actions by the big banks. What's worse is that there has been even *more* consolidation, so the reins of power are now in the hands of the very few, and government is almost helpless against them. Furthermore, Europe's troubles have provided a great opportunity for the big banks to redesign the finances of many European countries in their favor.
There *will* be another major collapse similar to the housing crash. I have no idea how long it will take or how it will happen exactly, but with the current economic climate it's inevitable. When government wants nothing but growth and has no stomach for regulation, the inevitable result is bubble and crash.
When there are crises, there are great opportunities for change. How have they been used?
LTCM -> should have been a wake up call about the danger of huge leverage and computer trading. Nothing done.
Dot com bubble -> chance to break investment banks from financial advisors, make televised stock pumping illegal, etc. Nothing done.
9/11 -> used to strip Americans of civil liberties and provide justification for the Cheney/Rumsfeld project in Iraq.
Obama election -> great opportunity to restore some transparency, rule of law, and civil rights. Opportunity not taken.
Great Recession -> chance to restore bank separation and generally improve regulation of the financial system. Opportunity not taken.
Collapse of Iceland, Ireland, Italy, Spain, etc. -> Goldman is there literally writing the conditions of the bailouts.
Reduced revenue for all levels of US government -> small government schemers use it to tighten the one way ratchet.
The forces of evil are far more clever about using crises. Of course they are, they have huge advantages when it comes to acting decisively in a moment of crisis (lack of morals, political power, unified organization, etc.).
The entire productive world economy is now functioning to subsidize the financial sector (and brand owners & patent holders). We let this happen partly because we are all fools, and partly because they control the government, so the system is designed to make it that way.
It's absurd to think of capitalism as any kind of fair rational playing field; if capitalism was left on its own unfettered, it would very quickly become a world oligopoly, in which a few players controlled everything. (in a game design sense, capitalism is a game which is badly afflicted by "runaways" ; once one player starts winning, they become even more powerful, and soon the other players have no hope other than a massive blunder). The only way to make capitalism work decently is with a robust regulatory structure which crafts the system so that it is okay for workers and consumers. A capitalist economy is a lot like a game system, the way it plays is a direct result of the rules. We are sculptors of our own capitalist environment; it is a human political creation.
Anyway, ranting about it is pointless of course and I gave up on all that long ago, because it won't get better. Money wins. A rational realist only has one option : if the system is going to stay this way you have to work in finance (or try to get some bullshit patent so you can sell your tech company). If you don't choose to work in finance, you are choosing to subsidize people who work in finance, which is a silly thing to do.
Long ago I decided it was dumb to be angry about the way the world works, and it was pointless to try to change it. So when you identify something, the only rational thing to do is to use it for your benefit. But I can't bring myself to actually do that for some reason.
Sometimes I wonder if the whole idea of being "moral" is a trick that was used to brainwash us into being good obedient and controllable pawns. It's largely the churches that created this idea of it being so great to live a quiet life of moral goodness, when at the same time the churches themselves were in immoral ruthless power grabs. It's like one of the monkeys in the tribe convinces everyone that you shouldn't hit other monkeys when they steal your food, and then he proceeds to steal all your food. Very clever, monkey, very clever.
(Obviously those in power use the "moral" argument in a transparent and disgusting way to suppress opposition; like claiming that government workers who speak out about the evils of government are "traitors" or that questioning the merits of going to war is cowardly or unpatriotic, or that regulating the financial sector in a recession is "irresponsible". Of course the way kids are taught to "respect authority" and such is just to keep them in line. Of course the entire public school system is a way of converting individuals into mindless worker drones. But those specific things are not really what I'm talking about in the above paragraph).
(update : I've had a little look, and it seems to be pretty straightforward, it's an optimal parser + huff reset searcher. There are various other prior tools to do this (kzip,deflopt,defluff,etc). It's missing some of the things that I've written about before here, such as methods of dealing with the huff-parse feedback; the code looks pretty clean, so if you want a good zip-encoder code it looks like a good place to start.)
I've written these things before, but I will summarize here how to make small zips :
1. Use an exact (windowed) string matcher.
cbloom rants 09-24-12 - LZ String Matcher Decision Tree
2. Optimal parser. Optimal parsing zip is super easy because it has no "repeat match", so you can use plain old backwards scan. You do have the huffman code costs, so you have to consider at least one match candidate for each codeword length.
cbloom rants 10-10-08 - 7 - On LZ Optimal Parsing
cbloom rants 09-04-12 - LZ4 Optimal Parse
3. Deep huffman reset search. You can do this pretty easily by using some heuristics to set candidate points and then building a bottom-up tree. Zopfli seems to use a top-down greedy search. More huffman resets make decode slower, so a good encoder should expose some kind of space-speed tradeoff parameter (and/or a maximum number of resets).
cbloom rants 06-15-10 - Top down - Bottom up
cbloom rants 10-02-12 - Small note on Adaptive vs Static Modeling
4. Multi-parse. The optimal parser needs to be seeded in some way, with either initial code costs or some kind of heuristic parse. There may be multiple local minima, so the right way to do it is to run 4 seeds (or so) simultaneously with different strategies.
cbloom rants 09-11-12 - LZ MinMatchLen and Parse Strategies
5. The only unsolved bit : huffman - parse feedback. The only solution I know to this is iteration. You should use some tricks like smoothing and better handling of the zero-frequency symbols, but it's just heuristics and iteration.
One cool thing to have would be a cheap way to compute incremental huffman cost.
That is, say you have some array of symbols. The symbols have a corresponding histogram and huffman code. The full huffman cost is :
fullcost(symbol set) = cost{ transmit code lengths } + sum[n] { codelen[n] * count[n] }
that is, the cost to send the code lengths + the cost of sending all the symbols with those code lengths.
You'd like to be able to do an incremental update of fullcost. That is, if you add one more symbol to the set, what is the delta of fullcost ?
*if* the huffman code lengths don't change, then the delta is just +codelen[symbol].
But, the addition of the symbol might change the code lengths, which causes fullcost to change in several ways.
I'm not sure if there's some clever fast way to do incremental updates; like when adding the symbol pushes you over the threshold to change the huffman tree, it often only changes some small local part of the tree, so you don't have to re-sum your whole histogram, just the changed part. Then you could slide your partition point across an array and find the optimal point quite quickly.
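For reference, the non-incremental version of the cost is trivial to write down. This is only a sketch of the formula above; the cost of transmitting the code lengths depends on the container format, so here it is just passed in as headerBits (an assumption of this example, as are the function names).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// sum[n] { codelen[n] * count[n] } , in bits
inline uint64_t SymbolCost(const std::vector<uint8_t>& codelen,
                           const std::vector<uint32_t>& count)
{
    assert(codelen.size() == count.size());
    uint64_t bits = 0;
    for (std::size_t n = 0; n < count.size(); n++)
        bits += uint64_t(codelen[n]) * count[n];
    return bits;
}

// fullcost = cost{ transmit code lengths } + sum[n] { codelen[n] * count[n] }
inline uint64_t FullCost(uint64_t headerBits,
                         const std::vector<uint8_t>& codelen,
                         const std::vector<uint32_t>& count)
{
    return headerBits + SymbolCost(codelen, count);
}

// Delta of fullcost for adding one symbol.
// ONLY valid in the easy case where the code lengths do not change.
inline uint64_t DeltaIfLengthsUnchanged(const std::vector<uint8_t>& codelen,
                                        std::size_t symbol)
{
    assert(symbol < codelen.size());
    return codelen[symbol];
}
```

The hard case, where adding the symbol changes the code lengths, is exactly the part this sketch does not cover; there you are back to rebuilding the code and re-summing.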
Now some ranting.
How sad is it that we're still using zip?
I've been thinking about writing my own super-zipper for many years, but I always stop myself because WTF is the point? I don't mean for the world, I guess I see that it is useful for some people, but it does nothing for *me*. Hey I could write some thing and probably no one would use it and I wouldn't get any reward from it and it would just be another depressing waste of some great effort like so many other things in my life.
It's weird to me that the best code in the world tends to be the type of code that's given away for free. The little nuggets of pure genius, the code that really has something special in it - that tends to be the free code. I'm thinking of compressors, hashers, data structures, the real gems. Now, I'm all for free code and sharing knowledge and so on, but it's not equal. We (the producers of those gems) are getting fucked on the deal. Apple and the financial service industry are gouging me in every possible immoral way, and I'm giving away the best work of my life for nothing. It's a sucker move, but it's too late. The only sensible play in a realpolitik sense of your own life optimization is to not work in algorithms.
Obviously anyone who claims that patents provide money to inventors is either a liar (Myhrvold etc.) or just has no familiarity with actual engineering. I often think about LZ77 as a case in point. The people who made money off LZ77 patents were PK and Stac, both of whom contributed *absolutely nothing*. Their variants were completely trivial obvious extensions of the original idea. Of course the real inventors (L&Z, and the modern variant is really due to S&S) didn't patent and got nothing. Same thing with GIF and LZW, etc. etc. perhaps v42 goes in there somewhere too; not a single one of the compression-patent money makers was an innovator. (and this is even ignoring the larger anti-patent argument, which is that things like LZ77 would have been obvious to any number of researchers in the field at the time; it's almost always impossible to attribute scientific invention/discovery to an individual)
1. A low-level keyed event with double-checked wait.
Futex and NT's keyed event are both pretty great, but the ideal low level wait should be double-checked. I believe it should be something like :
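typedef HANDLE Waitset;
typedef HANDLE wait_handle;
// return types are not spelled out in this proposal; void is assumed where nothing is returned
Waitset CreateWaitset();
void DestroyWaitset( Waitset ws );
wait_handle Waitset_PrepareWait( Waitset ws , U64 key );
void Waitset_CancelWait( Waitset ws , wait_handle h );
void Waitset_Wait( Waitset ws , wait_handle h );
void Waitset_Signal( Waitset ws , U64 key );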
Now, key of course could be a pointer, but there's no reason for it to be particularly. This is easily a superset of futex; if you want you could just have one global Waitset object, and key could be an int pointer, and you could check *ptr in between PrepareWait and Wait, that would give you futex. But you can do much more with this.
I prefer having a "waitset" object to put the waits on (like KeyedEvent), not just making it global/static (like futex). The advantage is 1. efficiency and 2. multiple meanings for a single "key". It's more efficient because you can have different waitsets for different uses, which makes each one cover fewer waits, which makes all the lookups faster. (that is, rather than 100 global waits pending, maybe you have 10 on 10 different waitsets). The other advantage is that you can reuse the same value for key without it confusing the system. You could have one Waitset where key is a pointer, and another where key is an internal handle number, etc.
2. A proper cond_var with waker-side condition checking.
First of all, a decent cond_var API combines a lot of the disjoint junk in the posix API. It should
include the mutex, because that allows for vastly more efficient implementation :
class condition_var
{
public:
    void lock();
    void unlock();
    // the below are always called with lock held :
    void unlock_wait_lock();
    void signal_unlock();
    void broadcast_unlock();
private:
    ...
};
The basic usage of this cv is like :
cv.lock();
while( ! condition )
{
    cv.unlock_wait_lock();
}
.. do stuff with condition true ..
cv.unlock();
A good implementation should do the compound ops (signal_unlock, etc) atomically. But I wouldn't require that
because it's too hard.
But that's just background. What you really want is to put the condition check in the API. It should be :
void wait_lock( [] { wake condition } );
The spec of the API is that "wake condition" is some code that will be run with the mutex locked, and when the function exits
you will own the mutex and the condition is true. Then client usage is like :
cv.wait_lock( condition );
.. do stuff with condition true ..
cv.unlock();
which allows for much more efficient implementation. The wake condition of the waiter list can be evaluated
easily inside signal_unlock(), because that's always called with the mutex held.
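Here is one way the waker-side check could look, as a rough sketch on top of std::mutex and std::condition_variable. The class name checked_condition_var, the per-waiter condition variable, and the demo function are my assumptions for illustration; this is not a tuned implementation, and a real one would hand the mutex directly to the woken thread instead of re-checking.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <list>
#include <mutex>
#include <thread>

class checked_condition_var
{
public:
    void lock()   { m_mutex.lock(); }
    void unlock() { m_mutex.unlock(); }

    // Call WITHOUT the lock held. Returns with the lock held and the wake
    // condition true. The condition is only ever run with the mutex locked.
    void wait_lock(const std::function<bool()>& wake_condition)
    {
        assert(wake_condition); // must be callable
        std::unique_lock<std::mutex> guard(m_mutex);
        // The loop only repeats if some other thread takes the mutex and
        // changes the state between our wakeup and our reacquiring the mutex.
        while (!wake_condition())
        {
            waiter w(&wake_condition);
            m_waiters.push_back(&w);
            w.cv.wait(guard, [&w] { return w.woken; });
        }
        guard.release(); // leave the mutex locked for the caller
    }

    // Call with the lock held, after changing shared state.
    // Wakes at most one waiter, and only one whose condition is now true.
    void signal_unlock()
    {
        for (auto it = m_waiters.begin(); it != m_waiters.end(); ++it)
        {
            waiter* w = *it;
            if ((*w->condition)())
            {
                m_waiters.erase(it);
                w->woken = true;
                w->cv.notify_one();
                break;
            }
        }
        m_mutex.unlock();
    }

private:
    struct waiter
    {
        explicit waiter(const std::function<bool()>* c) : condition(c) {}
        const std::function<bool()>* condition;
        std::condition_variable cv;
        bool woken = false;
    };
    std::mutex m_mutex;
    std::list<waiter*> m_waiters; // guarded by m_mutex
};

// Demo : the waiter can only be woken by the signal that makes value == 2.
inline int demo_checked_wait()
{
    checked_condition_var cv;
    int value = 0, seen = -1;
    std::thread t([&] {
        cv.wait_lock([&] { return value == 2; });
        seen = value;
        cv.unlock();
    });
    cv.lock(); value = 1; cv.signal_unlock(); // condition false : nobody is woken
    cv.lock(); value = 2; cv.signal_unlock(); // condition true : waiter may run
    t.join();
    return seen;
}
```

The point to notice is that the first signal (value = 1) costs no thread wakeup at all, because the check ran on the signalling thread, which was already running.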
When a thread is waiting on some condition, your goal should be to only wake it up if that condition is actually true - that is, the thread really gets to run. In order from worst to best :
1. Wakeup condition polling. This is the worst and is very common. You're essentially just using
the thread wakeup to say "hey your condition *might* be set, check it yourself".
The suspect code looks something like :
while( ! condition )
{
    Wait(event);
}
these threads can waste a ton of cycles just waking up, checking their condition, then going back to sleep.
One of the common ways to get nasty wake-polling is when you are trying to just wake one thread, but you have to do a broadcast due to the possibility of a missed wakeup (as in the naive semaphore from waitset ).
Of course any usage of cond_var is a wake-poll loop. I really don't like cond_var as an API or a programming pattern. It encourages you to write wakee-side condition checks. Whenever possible, waker-side condition checks are better. (See previous notes on cond vars such as : In general, you should prefer to use the CV to mean "this condition is set" , not "hey wakeup and check your condition").
(ADDENDUM : in fact I dislike cond_var so much I wrote a proposal on an alternative cond_var api ).
Now it's worth breaking this into two sub-categories :
1.A. Wake-polling when it is extremely likely that you get to run immediately.
This is super standard and is not that bad. At root, what's happening here is that under normal conditions, the wakeup means the condition is true and you get to run. The loop is only needed to catch the race where someone stole your wakeup.
For example, the way Linux implements semaphore on futex is a classic wake-poll. The core loop is :
for(;;)
{
    if ( try_wait() ) break;
    futex_wait( & sem->value, 0 ); // wait if sem value == 0
}
If there's no contention, you wake from the wait and get to try_wait (dec the count) and proceed. The only
time you have to loop is if someone else raced in and dec'ed the count before you. (see also in that same
post a discussion of why you actually *want* that race to happen, for performance reasons).
The reason this is okay is because the futex semaphore only has to do a wake 1 when it signals. If it had to do a broadcast, this would be a bad loop. (and note that the reason it can get away without a broadcast is due to the special nature of the futex wait, which ensures that the single thread signal actually goes to someone who needs it!) (see : cbloom rants 08-01-11 - Double checked wait ).
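For the try_wait half of that loop, a minimal portable sketch with std::atomic looks like the following. This is not the Linux code; the futex calls are left as comments, and the free-function signatures taking the counter by reference are my own choice for the example.

```cpp
#include <atomic>
#include <cassert>

// Try to take one count without blocking. Never takes the count below zero.
inline bool try_wait(std::atomic<int>& value)
{
    int v = value.load();
    while (v > 0)
    {
        // on failure v is refreshed with the current value and we retry;
        // this is where a racing thread can take the count before us
        if (value.compare_exchange_weak(v, v - 1))
            return true;
    }
    return false; // count is zero : caller would futex_wait( &value, 0 )
}

// Post side : add one count. A futex version would then wake ONE waiter.
inline void post(std::atomic<int>& value)
{
    int prev = value.fetch_add(1);
    assert(prev >= 0);
    (void)prev;
    // futex wake 1 on &value would go here
}
```

A woken thread that loses the race simply sees try_wait fail and goes back to waiting, which is the rare extra trip around the loop described above.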
1.B. Wake-polling when it is unlikely that you get to run.
This is the really bad one.
As I've noted previously ( cbloom rants 07-26-11 - Implementing Event WFMO ) this is a common way for people to implement WFMO. The crap implementation basically looks like this :
while ( any events in array[] not set )
{
    wait on an unset event in array[]
}
What this does is any time one of the events in the set triggers, it wakes up all the waiters that are waiting
on it in an array, checks the array, and they go back to sleep.
Obviously this is terrible, it causes bad "thread thrashing" - tons of wakeups and immediate sleeps just to get one thread to eventually run.
2. "Direct Handoff" - minimal wakes. This is the ideal; you only wake a thread when you absolutely know it gets to run.
When only a single thread is waiting on the condition, this is pretty easy, because there's no issue of "stolen wakeups". With multiple threads waiting, this can be tricky.
The only way to really robustly ensure that you have direct handoff is by making the wakeup ensure the condition.
At the low level, you want threading primitives that don't give you unnecessary wakeups. eg. we don't like the pthreads cond_var
that has you call :
condvar.wait();
mutex.lock();
as two separate calls, which means you can wake from the condvar and immediately fail to get the mutex and go back to sleep. Prefer a
single call :
condvar.wait_then_lock(mutex);
which only wakes you when you get a cv signal *and* can acquire the mutex.
At the high level, the main thing you should be doing is *waker* side checks.
eg. to do a good WFMO you should be checking for all-events-set on the *waker* side. To do this you must create a proxy event for the set when you enter the wait, register all the events on the proxy, and then you only signal the proxy when they are all set. When one of them is set, it does the checking. That is, the checking is moved to the signaller. The advantage is that the signalling thread is already running.
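A bare-bones sketch of that proxy scheme, showing only the bookkeeping. Event, Proxy, RegisterWaitAll and SignalEvent are hypothetical names; the events are assumed to be manual-reset and never reset during the wait, one global lock is used to keep it short, and the actual thread sleep/wake is reduced to a flag.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct Proxy
{
    int remaining = 0;      // events in the set that are not yet set
    bool signalled = false; // a real version would wake the sleeping thread here
};

struct Event
{
    bool set = false;
    std::vector<Proxy*> proxies; // wait-all proxies registered on this event
};

inline std::mutex& WfmoLock()
{
    static std::mutex lock; // one global lock, just for the sketch
    return lock;
}

// Waiter side : called once when entering the wait-for-all.
inline void RegisterWaitAll(Proxy& proxy, const std::vector<Event*>& events)
{
    std::lock_guard<std::mutex> guard(WfmoLock());
    for (Event* e : events)
    {
        if (!e->set)
        {
            e->proxies.push_back(&proxy);
            proxy.remaining++;
        }
    }
    if (proxy.remaining == 0)
        proxy.signalled = true; // everything was already set, no sleep needed
}

// Waker side : the all-events-set check runs here, on the thread that is
// already running, so the waiter is woken at most once.
inline void SignalEvent(Event& e)
{
    std::lock_guard<std::mutex> guard(WfmoLock());
    if (e.set)
        return;
    e.set = true;
    for (Proxy* p : e.proxies)
    {
        assert(p->remaining > 0);
        if (--p->remaining == 0)
            p->signalled = true;
    }
    e.proxies.clear();
}
```

Setting the first of two events touches only a counter; the waiting thread is not woken until the last event in its set fires.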
Why do I think it should be that way? Let's revisit some points.
1. Main thread should be a worker and all workers should be symmetric. That is, there's only one type of thread - worker threads, and all functions are work items. There are no special-purpose threads.
The purpose of this is to minimize thread switches, and to make waits be immediate runs whenever possible.
Consider the alternative. Say you have a classic "main" thread and a worker thread. Your main thread is running along and then decides it has to Wait() on a work item. It has to sleep the thread pending a notification from the worker thread. The OS has to switch to the worker, run the job, notify, then switch you back.
With fully symmetric threads, there is no actual thread wait there. If the work item is not started, or is in a yield point of a coroutine, you simply pop it and run it immediately. (of course your main() also has to be a coroutine, so that it can be yielded out at that spot to run the work item). Symmetric threads = less thread switching.
There are other advantages. One is that you're less affected by the OS starving one of your threads. When your threads are not symmetric, if one is starved (and is the bottleneck) it can ruin your throughput; one crucial job or IO can't run and then all the other threads back up. With symmetric threads, someone else grabs that job and off you go.
Symmetric threads are self-balancing. Any time you decide "we have 2 threads for graphics and 1 for compute" you are assuming you know your load exactly, and you can only be wrong. Symmetric threads maximize the utilization of the cpu. (Note that for cache coherence you might want to have a system that *prefers* to keep the same type of job on the same thread, but that's only a soft preference and it will run other jobs if there are none of the right type).
Symmetric threads scale cleanly down to 1. This is a big one that I think is important. Even just for debugging purposes, you want to be able to run your system non-threaded. With asymmetric threads you have to have a custom "non-threaded" pathway, which leads to bugs and means you aren't testing the same threaded pathway. The symmetric thread system scales down to 1 thread using the same code as always - when you wait on a job, if it hasn't been started it's just run immediately.
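The "wait runs the job if nobody has started it" rule is small enough to sketch. Job, TryRunJob and WaitJob are hypothetical names, and the spin with yield stands in for what a real system would do there (run other work items or yield the coroutine).

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

constexpr int kJobPending = 0;
constexpr int kJobRunning = 1;
constexpr int kJobDone    = 2;

struct Job
{
    std::function<void()> work;
    std::atomic<int> state{ kJobPending };
};

// Any thread (worker or "main") claims the job with a CAS and runs it.
// Returns false if someone else already claimed it.
inline bool TryRunJob(Job& job)
{
    int expected = kJobPending;
    if (!job.state.compare_exchange_strong(expected, kJobRunning))
        return false;
    assert(job.work); // must have something to run
    job.work();
    job.state.store(kJobDone);
    return true;
}

// Wait on a job. If it has not been started, run it right here instead of
// sleeping. Returns true if the calling thread ran the job itself.
inline bool WaitJob(Job& job)
{
    if (TryRunJob(job))
        return true;
    while (job.state.load() != kJobDone)
        std::this_thread::yield(); // placeholder for "go run other work"
    return false;
}
```

With a single thread this is the whole system: WaitJob always hits the first branch, so the non-threaded path is the same code as the threaded one.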
It's also much easier to have deadlocks in asymmetric thread systems. If an IO job waits on a graphics job, and a graphics job waits on an IO job, you're in a tricky situation; of course you shouldn't deadlock as long as there are no circular dependencies, but if one of those threads is processing in FIFO order you can get in trouble. It's just better to have a system where that issue doesn't even arise.
2. Deep yield.
Obviously if you want to write real software, you can't be returning out to the root level of the coroutine every time you want to yield.
In the full coroutine-centric architecture, all the OS waits (mutex locks, etc) should be coroutine yields. The only way to do that is if they can call yield() internally and it's a full stack-saving deep yield.
Of course you should be able to spawn more coroutines from inside your coroutine, and wait on them (with that wait being a yield). That is, aside from the outer branch-merge, each internal operation should be able to do its own branch-merge, and yield its thread to its sub-items.
3. Everything GC.
This is just the only reasonable way to write code in this system. It gives you a race-free way to ensure that object lifetimes exceed their usage.
The last post I did about the simple string crash is just so easy to do. The problem is that without GC you
inevitably try to be "clever" and "efficient" (really "dumb" and "pointless") about your lifetime management.
That is, you'll write things like :
void func1()
{
    char name[256];
    .. file name ..
    Handle h = StartJob( LoadAndDecompress, name );
    ...
    Wait(h);
}
which is okay, because it waits on the async op inside the lifetime of "name". But of course a week later you
change this function to :
Handle func1()
{
    char name[256];
    .. file name ..
    Handle h = StartJob( LoadAndDecompress, name );
    ...
    return h;
}
with the wait done externally, and now it's a crash. Manual lifetime management in heavily-threaded code is just not reasonable.
The other compelling reason is that you want to be able to have "dangling" coroutines, that is you don't want to have to wait on them and clean them up on the outside, just fire them off and they clean themselves up when they finish. This requires some kind of ref-counted or GC'ed ownership of all objects.
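Short of full GC, the func1 crash above goes away as soon as the job owns its arguments. A minimal sketch using std::async and a by-value std::string; note this is plain value ownership, not the GC argued for here, and StartJobOwned plus the stand-in LoadAndDecompress body and file name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>

// Stand-in for the real job body; it returns the name length only so the
// result can be checked.
inline std::size_t LoadAndDecompress(const std::string& name)
{
    return name.size();
}

// The job takes the name BY VALUE and moves it into the task, so the task
// owns its own copy no matter when the caller's buffer goes away.
inline std::future<std::size_t> StartJobOwned(std::string name)
{
    assert(!name.empty());
    return std::async(std::launch::async,
        [owned = std::move(name)] { return LoadAndDecompress(owned); });
}

inline std::future<std::size_t> func1()
{
    char name[256] = "level1.lz"; // hypothetical file name
    return StartJobOwned(name);   // copied into a std::string before we return
}
```

Returning the handle with the wait done externally is now fine, because nothing in the job points back into func1's stack.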
4. A thread per core.
With all your "threading" as coroutines and all your waits as "yields", you no longer need threads to share the cpu time, so you just make one thread per core and leave it there.
I wanted to note an exception to this - OS signals that cannot be converted to yields, such as IO. In this case you still need to do a true OS Wait that would block a thread. This would stop your entire worker from running, so that's not nice.
The solution is to have a separate pool of threads for running the small set of OS functions that do internal thread waits.
That is, you convert :
ReadFile( args )
->
yield RunOnThreadPool( ReadFile, args );
this separate pool of threads is in addition to the one per core (or it could just be all one pool, and you
make new threads as needed to ensure that #cores of them are running).
void DoLZDecompress(const char *filename,...)
{
    struct CommandInfo i;
    i.data = (void *)filename;
    // warning : passing string pointer (not copying) to another thread, make sure it's const / sticks around!
    StartJob( &i );
}
Yup, that's a crash.
void OodleIOQ_SetKickImmediate( bool kick );
/* kick state is global ; hmm should really be per-thread ; makes it a race
*/
Yup, that's a problem, which leads to the later deadlock :
void Oodle_Wait( Handle h )
{
// @@ ? can this handle depend on un-kicked items, and hence never complete ?
// I used to check for that with normal deps but it's hard now with the "permits"
...
}
Coding crime doesn't pay. Spaghetti always gets you in the end, with its buggy staining sauce.
Whenever I have one of those "hmm this smells funny, I'm worried about the robustness of this" , yep, it's a problem.
One of my mortal enemies are the "don't worry about it, it'll be fine" people. No it will fucking not be fine. You know what will happen? It'll be a nasty god damn race bug, which I will wind up fixing while the "don't worry about it, it'll be fine" guy is watching lolcatz or browsing facebook.
I've been thinking about this a lot over the last few days, and have come to it simultaneously from several different angles.
For the past month or so I've been going over my finances, reviewing my spending, because I'm not happy with the amount I'm saving, and I'm trying to figure out where the money is leaking. Obviously there are big expenses like cars and vacations, but those I've budgeted for, they're not the leak (*) (**), but there's still a large general money leakage that I want to track down. It turns out a lot of it is buying stuff for the house or for productivity, stuff that on its own I can justify, but overall adds up to a big waste.
A lot of that waste are things that I tell myself will "pay off someday". Like I need some rope for around the house; hey look it's a much better deal if I buy it in a 500 foot spool. I'll use it eventually so that's the better buy. Or, I need to set a bolt in concrete; sure a hammer drill is expensive, but I'll use it the rest of my life, so it will be a good value some day (better than renting one for this one job). etc. Lots of stuff where the idea is that in the long term it will be a good value.
Now I certainly haven't hit the "long term" yet, but I can already see the flaw in that logic. There are lots of reasons why that imagined long term value might never come. I might never wind up using the stuff. It might get damaged over time from sitting, or flood or who knows what. I also essentially pay a tax to store it, having stuff is not free. I pay a tax on it any time I move. Maybe I won't want to do DIY in the future and will just hire out the jobs and so won't use it. There are a lot of costs and uncertainty about this future value which make it much less valuable than it naively appears.
Perhaps computer stuff is an even easier example; like I sort of need a USB hub; I could live without it and just unplug stuff to make room depending on what device I want to use at the moment. You could easily convince yourself that the value is high because "even if I don't really need it now I'll use it someday". But of course there's any number of reasons why you might not use it some day.
(* = aside : expensive cars actually aren't that expensive; if you're careful about how you buy and sell, and look for cars that are on a pretty flat part of the depreciation curve, you can get a "$100k car" that actually only costs $5k a year. That's not really a big expense in the scheme of things. However that also doesn't mean it's free; the big cost is the time spent buying and selling; if you actually want it to be low cost you have to spend a lot of time on the transaction to get good value, and for people like me that's excruciatingly painful; for people who like wheeling-and-dealing, they can do pretty well, getting almost free stuff that they are just holding temporarily between sales)
(** = more aside, and actually there is a spending leak that I have that's associated with cars and vacations; I, like most, and perhaps less than average, fall prey to the sunk cost fallacy. The sunk cost fallacy is the idea that because you've spent a bunch on something, you should stick with it and spend some more. Like I've spent this much to go on vacation, I shouldn't cheap out on the dining or whatever. Or I've got an expensive car, I should buy the expensive tires. But that of course is not true. Each decision should be evaluated independently for its value; the fact that you have a large sunk cost only matters in that it changes your current situation. You don't keep chasing your flush just because you've already called some big bets (though obviously your past calls do affect the pot size which affects your current decision)).
Of course home improvements are a classic of false future value. I'm not foolish enough to think I'll get any resale value benefit, but I do fall prey to thinking "I'll enjoy this for many years" when in fact I might not.
I was thinking about buying a really good mattress that's supposed to last 30 years vs one that will only last 5. In theory the long-life one is a much better deal, but there are any number of reasons why that might not be the case. It might not last like it's supposed to; you might pee and poo on it; you might want a different size mattress. By making an "investment" what you've done is commit yourself to something, you've removed flexibility, which is a cost.
Of course if you ever decide you want to travel the world and live in apartments again, all the buying of stuff is a big liability.
Getting away from just "accumulating stuff" now :
I've been thinking lately about my career arc. All through my young-adulthood I was carefully building up my value as a software development employee. I was improving my skills, improving my profile, networking, all that stuff, going up the career ladder. During that time I was not getting paid particularly well. I took jobs based on them being good opportunities for my larger career, not for their immediate financial reward.
The problem is that the big payoff never came. When Oddworld went under I was at the point where I could have moved on to CTO-level jobs at major studios, but I decided I didn't want to do that. The stress was ruining my life (and various other things that I've blogged about back then). The point is that this "future value" I had been building suddenly became zero. If you actually want to redeem that future value, you are locking yourself in to a path, which is a major cost you are paying, giving up flexibility in your life. And in careers there are so many factors outside your control; perhaps your specialty will become less prominent in a few years. Lots of people have done things like getting a JD only to find the law job market has dried up by the time they graduate.
Saving money in general is questionable now. The governments of the world have demonstrated that they don't care about the integrity of the world financial systems, so socking money away for the future has immense risk associated with it. (I don't put much credence in the complete currency collapse alarmists, but I do believe that a long period of negative inflation-adjusted returns is very likely). In the old days we glorified the good salaryman, who worked hard and saved some money, putting the joy of today aside to build a life for themselves and their family tomorrow.
Of course we can relate this all to poker, in old skool cbloom-rants style.
One of the first big realizations I had on my own as I was getting better and moving beyond book TAG play is that implied odds are massively overvalued by most players. "implied odds" is the term used for the imagined future value that you will get if you hit a big hand. Like if I call with a flush draw, it might be a bad value based on the immediate odds, but if I hit I'll make some more money, which makes it worth the call. The problem is that there are a wide variety of reasons why you might not get paid off even if your flush comes (scare cards, or your opponent never had a strong calling hand to begin with). Or your flush might come and he might have a better flush (negative implied odds). If you realistically weight all those undesirable outcomes, the result is that the true effect of implied odds is very small. eg. on the turn you have a 16% chance to improve, you can call a bet for zero EV if the bet is about 20% of pot size. The action of implied odds is very small; you can only call a bet that's maybe 25% of pot size; really not much more. Certainly not the 30-35% that people talk themselves into believing is correct. (and of course in no limit holdem you have to adjust for position; out of position you should consider your implied odds to be zero, chasing a draw out of position is so very bad). What I'm getting at is the imagined future value of your current investment is far lower than you imagine.
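For reference, the zero-EV call size with no implied odds can be worked out directly. This is a small sketch; the function names are made up, and reading "pot size" as the pot including the bet you are facing is my assumption about which convention the 20% figure uses :

```cpp
#include <cassert>
#include <cmath>

// Break-even call size as a fraction of the pot *before* the bet.
// Calling B to win P + B with probability p :
//   EV = p*(P + B) - (1 - p)*B = 0   ->   B/P = p / (1 - 2p)
// (only meaningful for p < 0.5)
static double BreakEvenBetOverPot( double p )
{
    return p / ( 1.0 - 2.0 * p );
}

// The same break-even bet, as a fraction of the pot *including* the bet.
//   B / (P + B) = p / (1 - p)
static double BreakEvenBetOverPotAfterBet( double p )
{
    return p / ( 1.0 - p );
}
```

With p = 0.16 the second form gives about 0.19, which lines up with the "about 20%" figure; measured against the pot before the bet it is about 0.235. Either way the gap between the immediate-odds break-even and a realistic implied-odds call is only a few percent of the pot.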
(sort of not related but "implied odds" is also a good example of the "rationalization trap". Whenever a complicated logical exercise justifies behavior that your naughty irresponsible side secretly wants to do (like chasing draws), you should be very skeptical of that logic. Whenever you read that "a little red wine is healthy" you should be very skeptical. Whenever the result of a "study" is exactly what people want to hear, beware).
This isn't really related to the "future value" mistake, but I've been mulling another spending fallacy, which is the "value of an hour" fallacy. Sometimes I'll do something like buy a tool or hire a helper because it only costs $50 and saves me an hour of work; my hour is worth more than $50, so that's a good deal, right? I'm not so sure. I feel like that line of reasoning is just a way of rationalizing more spending, but I haven't quite found the flaw in it yet.
One of the better stores around here is "Bedrooms and More", which sounds just like a national chain sleaze-o-rama mattress trash peddler, but is actually not. The owner does some funny rants online and he suggests that the real shittening of the S-brands is due to private equity. Interesting idea; certainly there's no doubt that the S-brands have gone to total shit.
Of course we should be mad at the corporate overlords for sending product quality to shit, and generally using dishonest schemes to maximize short term profit. But I'm also angry at consumers for letting it happen. The only way to direct good behavior is to punish people who behave badly. And that just doesn't happen, neither in social life, nor capitalist life. People are amazingly (foolishly?) forgiving. Your only weapon as a consumer is your money.
Hanging out on Porsche forums a few years ago (zomg what was I thinking), I kept having my mind blown by how short-memoried everyone was. Even people who were pretty realistic about what fuckers Porsche had been in the past were all ready to buy the new model. (background : back in the early 90's, Porsche almost went bankrupt; they were completely restructured to focus more on marketing and profit, and less on quality. They intentionally drove the quality of their products down to the absolute minimum (actually, below minimum). This was the era of the Boxster and then the 996, and the early cars that were made were complete junk, some of the worst made cars for any money (worse than a Tata or god knows, it's hard to even think of an example of a horribly made car any more), the engine blocks were porous, the cylinders were out of round, there was cheap plastic inside the engine, and of course terrible cheap plastic everywhere in the interior, it was just a total clusterfuck). The rational consumer response should have been : whoah you guys are lying fuckers, we are never going to buy anything from you again. Instead most of the people were just incredibly forgiving and short-memoried, like yeah that was bad, but they'll fix it in the next model!
Wouldn't it be nice if products that cost more were actually better? Then you could just look at the range of products, pick your price-quality tradeoff point, and buy one. It would still be a tough decision, you'd have to weigh how much you want to spend on this thing, but you would at least know that spending more got you something better. In the real world, that's not remotely the case. It's so nice when you go shopping in a video game RPG and you can just buy the more expensive sword and know it's better (and it's so fucking retarded when video games designers throw in wild-cards of expensive items that suck or really cheap items that are great, you dumb assholes, you don't get it, the game world should not make me do all the stressful shit I have to do in the real world).
I've always wanted a grocery store that actually selected its products for good cost/quality tradeoff. That is, a good store should only sell things on the Pareto Frontier. Why the fuck do you have 50 different olive oils? I have no fucking clue what all these olive oils are, don't offer them to me. You (the retailer) should be an expert in this product (and also act as an agglomerator of customer feedback). There should only be like 4 olive oils to choose from, at various cost/quality tradeoffs (and also some for finishing vs. cooking oils, but let's pretend right now that there's only one axis of "quality" for olive oil), so I can just choose how much I want to spend and get the best oil at that price.
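To be concrete about what "on the Pareto Frontier" means with a single quality axis : keep a product only if nothing else is both no more expensive and at least as good. A toy sketch; the Product struct and the idea that quality is one number are assumptions for illustration :

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical product record with a single scalar "quality".
struct Product
{
    std::string name;
    double cost;
    double quality;
};

// Keep only products that no other product matches or beats on both axes.
static std::vector<Product> ParetoFrontier( std::vector<Product> items )
{
    // sort by cost ascending; at equal cost put the best quality first
    std::sort( items.begin(), items.end(),
        []( const Product & a, const Product & b )
        {
            if ( a.cost != b.cost ) return a.cost < b.cost;
            return a.quality > b.quality;
        } );

    std::vector<Product> frontier;
    for ( const Product & p : items )
    {
        // everything already kept costs no more than p, so p only earns
        // shelf space if it is strictly better than the best kept so far
        if ( frontier.empty() || p.quality > frontier.back().quality )
            frontier.push_back( p );
    }
    return frontier;
}
```

Fifty olive oils go in, and what comes out is the short list where paying more always buys you something better.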
I had a funny self-realization moment at Soaring Heart when the salesman was saying how everything was made locally and they pay health care and benefits for their workers, I instinctively/subconsciously thought "yeuck, that means bad value". Apparently my subconscious wants to buy products made in sweatshops. More generally I've got a major bias against ever giving money to someone who seems to be living well. When I see a realtor in a fancy car or a contractor who gets a swim and massage daily, fuck you I'm not giving you money, I want my pay to you to be barely enough to support human life, you should be in miserable subsistence conditions, not living it up! I guess I'm also biased against anything made in America; my mental image of Seattle mattress builders is not great skilled workers (like New Yankee Workshop), but something like failed philosophy PhDs who smoke weed while they work and don't know WTF they're doing (like Workaholics).
"Reflect" is in my opinion clearly the best way to do member-enumeration in C++. And yet almost nobody uses it. A quick reminder :
the reflection visitor pattern is that every class provides a member function named Reflect which takes a templated functor
visitor and applies that visitor to all its members; something like :
class Thingy
{
type1 m_x;
type2 m_y;
template <typename functor>
void Reflect( functor visit )
{
// (for all members)
visit(m_x);
visit(m_y);
}
};
with Reflect you can efficiently generate text IO, binary IO, tweak variable GUIs, etc.
(actually instead of directly calling "visit" you probably want to use a macro like #define VISIT(x) visit(x,#x))
A typical visitor is something like a "ReadFromText" functor. You specialize ReadFromText for the basic types (int, float), and for
any type that doesn't have a specialization, you assume it's a class and call Reflect on it. That is, the fallback specialization for
every visitor should be :
struct ReadFromText
{
template <typename visiting>
void operator () ( visiting & v )
{
v.Reflect( *this );
}
};
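Putting the pieces together, here is a small self-contained sketch of the pattern. It uses a write-side visitor (WriteToText, a name I made up as the mirror of ReadFromText) because that is easier to check, and Vec2 plus the member types are invented for the example :

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical write-side visitor. It holds a pointer so that the copies
// made by the pass-by-value Reflect( functor visit ) all share one stream.
struct WriteToText
{
    std::ostream * out;

    // handlers for the basic types
    void operator () ( int & v )   { *out << v << ' '; }
    void operator () ( float & v ) { *out << v << ' '; }

    // fallback : assume it's a class and call Reflect on it
    template <typename visiting>
    void operator () ( visiting & v )
    {
        v.Reflect( *this );
    }
};

struct Vec2
{
    float m_x, m_y;

    template <typename functor>
    void Reflect( functor visit )
    {
        visit(m_x);
        visit(m_y);
    }
};

struct Thingy
{
    int  m_count;
    Vec2 m_pos;

    template <typename functor>
    void Reflect( functor visit )
    {
        visit(m_count);
        visit(m_pos);   // no handler for Vec2, so this recurses via the fallback
    }
};

// Serialize a Thingy to text by visiting it with WriteToText.
static std::string ToText( Thingy & t )
{
    std::ostringstream stream;
    WriteToText writer = { &stream };
    writer(t);
    return stream.str();
}
```

Note that nothing here is virtual and there is no type ID table; a type with neither a handler nor a Reflect member fails at compile time inside the fallback.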
The standard alternative is to use some macros to mark up your variables and create a walkable set of extra data on the side. That is much worse in many ways, I contend. You have to maintain a whole type ID system, you have to have virtuals for each type of class IO (note that the Reflect pattern uses no virtuals). The Reflect method lets you use the compiler to create specializations, and get decent error messages when you try to use new visitors or new types that don't have the correct handlers.
Perhaps the best thing about the Reflect system is that it's code, not data. That means you can add arbitrary special case code directly where it's needed, rather than trying to make the macro-cvar system handle everything.
Of course you can go farther and auto-generate your Reflect function, but in my experience manual maintenance is really not a bad problem. See previous notes :
cbloom rants 04-11-07 - 1 - Reflection
cbloom rants 03-13-09 - Automatic Prefs
cbloom rants 05-05-09 - AutoReflect
Now, despite being pro-Reflect I thought I would look at some of the drawbacks.
1. Everything in headers. This is the standard C++ problem. If you truly want to be able to Reflect any class with any visitor, everything has to be in headers. That's annoying enough that in practice in a large code base you probably want to restrict to just a few types of visitor (perhaps just BinIO,TextIO), and provide non-template accessors for those.
This is a transformation that the compiler could do for you if C++ was actually well designed and friendly to programmers (grumble grumble).
That is, we have something like
template <typename functor>
void Reflect( functor visit );
but we don't want to eat all that pain, so we tell the compiler which types can actually ever visit us :
void Reflect( TextIO & visit );
void Reflect( BinIO & visit );
and then you can put all the details in the body. Since C++ won't do it for you, you have to do this by hand, and it's annoying boiler-plate,
but could be made easier with macros or autogen.
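One way to do that by hand is to keep a single private template body in the .cpp and forward the few public overloads to it. A sketch, with toy TextIO / BinIO visitors and the ReflectT name made up for the example; the header / source split is marked with comments since this is shown as one file :

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy visitors, just enough to show the forwarding.
struct TextIO
{
    std::ostringstream text;
    void operator () ( int & v ) { text << v << ' '; }
};

struct BinIO
{
    std::string bytes;
    void operator () ( int & v ) { bytes.append( (const char *) &v, sizeof(v) ); }
};

// --- Thingy.h : declarations only, no template body exposed ---
class Thingy
{
public:
    void Reflect( TextIO & visit );
    void Reflect( BinIO & visit );

private:
    template <typename functor> void ReflectT( functor & visit );

    int m_x = 1;
    int m_y = 2;
};

// --- Thingy.cpp : the one real body, plus the forwarding boiler-plate ---
template <typename functor>
void Thingy::ReflectT( functor & visit )
{
    // (for all members)
    visit(m_x);
    visit(m_y);
}

void Thingy::Reflect( TextIO & visit ) { ReflectT( visit ); }
void Thingy::Reflect( BinIO & visit )  { ReflectT( visit ); }
```

The member list still lives in exactly one place; the cost is one forwarding line per allowed visitor per class, which is the part a macro or autogen would write for you.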
2. No virtual templates in C++. To call the derived-class implementation of Reflect you need to get down there in some ugly way. If you are specializing to just a few possible visitors (as above), then you can just make those virtual functions and it's no problem. Otherwise you need a derived-class dispatcher (see cblib and previous discussions).
3. Versioning. First of all, versioning in this system is not really any better or worse than versioning in any other system. I've always found automatic versioning systems to be more trouble than they're worth. The fundamental issue is that you want to be able to incrementally do the version transition (you should still be able to load old versions during development), so the objects need to know how to load old versions and convert them to new versions. So you wind up having to write custom code to adapt the old variables to the new, stuff like :
if ( version == 1 )
{
// used to have member m_angle
double m_angle;
visitor(m_angle);
m_angleCos = cos(m_angle);
}
else
{
visitor(m_angleCos);
}
now, you can of course do this without explicit version numbers, which is my preference for simple changes. eg. when I have some text prefs
and decide I want to remove some values and add new ones, you can just leave code in to handle both ways for a while :
{
#ifndef FINAL
if ( visitor.IsRead() )
{
double m_angle = 0;
visitor(m_angle);
m_angleCos = cos(m_angle);
}
#endif
visitor(m_angleCos);
}
where I'm using the assumption that my IO visitor is a NOP on variables that aren't in the stream. (eg. when loading an old stream,
m_angleCos won't be found and so the value from m_angle will be loaded, and when loading a new stream the initial filling from m_angle
will be replaced by the later load from m_angleCos).
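Here is a runnable sketch of that "leave both ways in for a while" approach. PrefsReader, Spinner and the key names are hypothetical; the one property that matters is that the visitor does nothing when a name is not in the stream :

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Hypothetical text-prefs reader : values keyed by name; a missing key is a NOP.
struct PrefsReader
{
    std::map<std::string,double> values;

    bool IsRead() const { return true; }

    void operator () ( const char * name, double & v ) const
    {
        std::map<std::string,double>::const_iterator it = values.find( name );
        if ( it != values.end() ) v = it->second; // else leave v untouched
    }
};

struct Spinner
{
    double m_angleCos = 1.0;

    template <typename functor>
    void Reflect( functor & visitor )
    {
        if ( visitor.IsRead() )
        {
            // old streams only have "angle"; convert it to the new member
            double angle = 0;
            visitor( "angle", angle );
            m_angleCos = std::cos( angle );
        }
        // new streams have "angleCos", which replaces the conversion above
        visitor( "angleCos", m_angleCos );
    }
};
```

An old stream with only "angle" goes through the conversion, a new stream with only "angleCos" overwrites it, and no version number is needed for either.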
Anyway, the need for conversions like this has always put me off automatic versioning. But that also means that you can't use the auto-gen'ed reflection. I suspect that in large real-world code, you would wind up doing lots of little special case hacks which would prevent use of the simple auto-gen'ed reflection.
4. Note that macro-markup and Reflect() both could provide extra data, such as min & max value ranges, version numbers, etc. So that's not a reason to prefer one or the other.
5. Reflect() can be abused to call the visitor on values that are on the stack or otherwise not actually data members. Mostly that's a big advantage, it lets you do conversions, and also serialize in a more human-friendly format (for text or tweakers) (eg. you might store a quaternion, but expose it to tweak/text prefs as euler angles) (*).
But, in theory with a macro-markup cvar method, you could use that for layout info of your objects, which would allow you to do more efficient binary IO (eg. by identifying big blocks of data that can be read in binary without any conversions).
(* = whenever you expose a converted version, you should also store the original form in binary so that write-then-read is a guaranteed nop ; this is of course true even for just floating point numbers that aren't printed to all their places, which is something I've talked about before).
I think this potential advantage of the cvar method is not a real advantage. Doing super-efficient binary IO should be as close to this :
void * data = Load( one file );
GameWorld * world = (GameWorld *) data;
as possible. That's going to require a whole different pathway for IO that's separate from the cvar/reflect pathway, so there's no need
to consider that as part of the pro/con.
6. The End. I've never used the Reflect() pattern in the real world of a large production codebase, so I don't know how it would really fare. I'd like to try it.
(followed up by reading "The Emigrants" which is good, but much more of a normal book, it's terrestrial, not so oddly magical and other-worldly as "The Rings of Saturn").
I've just discovered "The Sky at Night" (ex of Sir Patrick Moore) on BBC. What a marvelous show. I don't really even care much about astronomy, and yet here is a show with real scientists, talking to each other about things they actually understand, and talking at a very high level and not really dumbing it down much for the audience. I've never seen anything like it on television before, actual intelligent people talking to each other, it's amazing. I love Patrick's interviewing style, the way he just blurts things out; he reminds me so much of the real scientists I've known in my life who are super direct and straight to the facts without dancing around the point. It's best to watch early episodes before he got too old/ill.
They did a demo of the Higgs spontaneous symmetry breaking on The Sky at Night which is the best I've seen. Take a wine bottle (with a good hump at the bottom; those familiar will recognize that the hump is the key to symmetry breaking) and put a piece of cork or a ball or something inside. Now shake the bottle vigorously. At high energy like that (bottle shaking), the location of the cork is random, so the whole assemblage still has rotational symmetry. Now stop shaking (low energy) and the cork will settle somewhere - not in the middle of the wine bottle where the hump is, but off to one side in the trough. Symmetry broken.
"The Loneliest Planet" is a really terrible movie and I don't recommend it (jesus christ the scenes where they sit around the camp fire and say the same words over and over are excruciating torture), but it has these few scenes that are some of the most beautiful I've ever seen in a movie - the scenes with the wide static shots that the characters slowly walk across, they're staggering, breath-taking.
"Hello Lonesome" was good. The director obviously knows loneliness; it reminded me a lot of various times in my life; the weirdness of being alone for a long time, the sad joy - you do whatever you want, but it all kind of sucks. The long idle times, so much free time, rambling around your property, hitting fruit with a baseball bat (me, not the movie).
also watchable : Summer Hours (quiet, nothing ever happens, and yet very adult interactions, somehow compelling), Bernie (irresistible, charming), Bonsai (simple little movie that reminds me of life at that age; tasteful), Youth in Revolt (much better teeny rom-com than the more well known teeny rom-coms), Breaking and Entering (some great characters; in the end it's a movie about sad and lonely people)
GAYLE is crazy funny. Weird as hell, wtf is going on, and yet it's the most biting mockery of normal suburban life.
"Utopia" is pretty retarded; the plot is standard unrealistic conspiracy crap, straight out of that awful graphic-novel type of writing; there's no part of it that's clever or insightful or well thought out (so far), and the characters are pretty awful to boot. I don't really care for the torture-porn stuff either (just skip it). All that said, the look of it is just super beautiful, amazing art direction, subtle and realistic but strikingly odd; every shot has these dramatic geometric forms and colors in it. And there's an eerie stillness to it, lots of long holds. It's so good to look at, and good sounds too.
(a lot of recent British stuff is just staggeringly good looking. See also "The Tower", "Red Riding", "Wallander".)
The new season of Top Gear is finally here, and good god is it painfully bad. I guess I should face the fact that it's been awful for many years now, but I was clinging on, hoping it would perk back up (as it occasionally has done, eg. the Bolivia Special). You develop this almost pavlovian response to things that have given you pleasure in the past; the sound of a beer bottle top popping off, the smell of coffee, that Top Gear opening theme song, it starts the pleasure molecules bursting in my brain, even if I don't actually want a beer, and I know that Top Gear is going to suck, there's still this vestigial fondness I have for it. The best part of the series so far has been the meta-comedy moment of James May falling asleep on the show because he too was so bored of it.
ADDENDUM : "Beasts of the Southern Wild" is amazing! joyous, sad, hard to watch, thrilling, it's a rich emotional feast, but it's also an incredible work of art. There's obvious allegory, but its characters aren't unrealistic victims without faults. More than anything I think it's a modern fairy tale (of the old style); not "fairy tale" like in the shit Disney sense meaning "princess, happy ending, dreams come true" but in the original sense, like Grimm's and all the old stories that were terrifying ; they were fantasies, but grounded in real world horrors, and often were obvious warning messages. Real fairy tales are magical and beautiful but also scary and sad.
One of the consequences of this is that the layout of "cbloom rants" can no longer be achieved or maintained with the new blogger layout, which means I can't edit it without losing it completely. (the existing layout does seem to keep working as long as I don't touch it, because they keep the raw HTML of the layout).
Another nasty one I just discovered is that a key setting that I rely upon is no longer there. Under "Settings->Formatting" there used to be a setting for "Convert Line Breaks" which defaults to Yes and causes any LF to be turned into an HTML BR code. I set that to "No" for cbloom rants so that it doesn't crud up my html when I send it over the Blogger API. (god dammit just let me put up HTML and stop fucking with it).
The odd thing is that the "No" setting (of "Convert Line Breaks") for cbloom rants appears to have stuck even though that setting has disappeared. That's fine with me I guess, though I wouldn't be surprised if it just stops working at some point when they revise the service again. The problem is I'm trying to set up a new blog and I can't get that setting any more.
(I of course have a workaround, which is removing LF's before I upload posts. The workaround sucks a little because I like to be able to download my posts back down and have them match the way I wrote them, which of course was with LF's in it for my readability during composition. The point is not the specific issue, it's god damn it don't push updates on me ever never ever unless I ask for them.)
Software updates are incredibly harmful. The benefit from changing *anything* has to be really massive for it to be a win. I'm so sick of getting new versions of crap pushed on me. At least with non-web software you can try to hold onto old versions as long as possible so that you can keep your valuable knowledge and its connections to your automation suite.
(my current mail solution is to use Eudora POP on my local machine, but forward all my mail through gmail for spam filtering and archiving and searchability; it's working pretty well finally).
Gmail does not offer any "import from local disk" options. Sigh. There appear to be a few ways to do this :
1. Change my gmail temporarily to IMAP. Get all my Eudora MBX's into an IMAP client (something like Outlook; perhaps requiring an MBX to PST conversion step or something). Open the IMAP client and connect to gmail; drag the mail from the Eudora boxes to the gmail boxes.
Should work in theory, but a bit scary, and extremely slow (moving mail on IMAP is ungodly slow).
(Also, when I switch back to POP, is it going to redeliver me all that mail that I just uploaded? That would double-suck.)
2. Make a POP server somewhere. Convert the mbx's to mbox's to maildirs and dump them on the POP server for it to serve up. Tell gmail to grab mail from that POP server.
One issue is where I could get a POP server with a public IP and admin access. The other is that any time I try to do networking stuff it's a massive fail of mysterious problems and no error messages.
3. Get a Google Apps gmail account (different from regular gmail account for unknowable reasons). Import MBX's to Outlook. Use "Google Apps Migration for Microsoft Outlook" to import mail to Apps mail account. Use gmail fetcher to bring mail from apps-gmail to my normal gmail.
(similar alternative : get apps gmail. Convert mbx to mbox. Find a Mac. Use "Google Email Uploader for Mac" to upload the mbox. Transfer mail from apps-mail to normal mail).
(I could also use gmail API to write my own importer, but that also requires an Apps gmail, so may as well just use the existing importers in method 3)
It's all such a hassle that I'm once again tempted to just write my own damn email client. Sigh I wish I'd done that long ago, but it's always the local optimization to not do it. I'm so fucking sick of getting penis emails. Hello spam filterers, *penis* -> spam. You're welcome. And of course if I used my own email client, my private property (words) wouldn't be data-mined to serve me ads (you bastards).
(oddly gmail does remarkably well at spam detection on the cases that would be hard for me to do with simple filters; things like bank phishing mails that are designed to look exactly like legitimate mails from my bank; I don't think I could give that up, so I'd still be stuck with routing mail through gmail even if I had my own client).
1. Yes it's beautiful. It looks more like Vietnam or Thailand and their limestone karst stuff, all old and weathered and crumbly with these random protrusions and such. (it's cool how you can travel the Hawaiian islands from south to north and visually see geographic time passing at a rate of 100,000 years per island hop). It actually wasn't as lush as I expected given all the hype about how wet it was and the incredible lushness. It's no more jungley beautiful than the Hamakua Coast near Hilo is really. My favorite parts of Kauai were the northern coast, and also just south of Lihue around the Hulemalu Road area (which would be a pretty sweet bike ride; good pavement, no cars).
2. There're sweet beaches all over the island. Like you almost don't have to seek them out there's another one around every corner, and most with very few people on them. None of them looked really perfect the way Mauna Kea and Hapuna are just ridiculously perfect in every way (clear water, no rocks, bottom drops neither too fast nor too slow, no rips, etc), but they were uncrowded and more sort of charming in a rustic way and often have cool surrounding cliffs and pretty settings.
3. The traffic sucks. The island is small, which is cool for a vacation (actually I love staying on tiny islands, like Ko Hai, Caye Caulker, or the Isla Mujeres of my childhood; islands where you can walk from one side to the other in half an hour or so). However, despite the smallness it takes forever to get anywhere because it's constant gridlock. Sitting in traffic fucking blows, and this alone is almost enough to put me off Kauai.
4. The human development on Kauai is repellent. The cities are all really ugly (though that seems to be standard all over Hawaii); most of the island is strip malls and run down shopping centers and fast food and such. Then the alternative are these fancy manicured suburban/golf developments like Poipu and Princeville which are disgusting in a different way. Between the two, the human hand on Kauai has scarred it with an ugliness that is quite tragic.
5. It's extremely tourist-oriented. Every restaurant is for tourists (which means rotten food and weird phoney-nice service), the place is covered in tourist crap shops (t-shirts, mac nuts, koa, etc). It has no feeling of being its own place independent from the tourists. It also has a big port where cruise ships drop off hordes. Part of the problem with that is that Kauai is so small it can't really handle the appearance of 5000 people in one day.
6. The Na Pali Coast trail (Kalalau) is pretty cool. We made it 6 miles in before turning around (just into Hanakoa Valley, which was the best part of the trail that we saw) (pretty impressive for a pregnant lady). It's definitely not the most beautiful hike ever (as some say); there are lots of hikes in WA that are better scenery and not so jam packed with ding-dongs. It is sweet to be able to take a dip in the rivers along the way and swim at the beach afterward. Much like the Big Island, there's too much private property and not very much development of good trails, so you see all this beautiful stuff around but you can't really get to it (unless you want to trespass and bushwhack, which you certainly can do).
7. I think it would be a pretty great place for a surf vacation. One of the good things about it from that standpoint is there are decent beaches facing every cardinal direction, so you can pick your spot to match the swell, and because it's small it doesn't take forever to get there. I could see maybe going back to Kauai some day for an intensive "finally learn to really surf" vacation, based around Hanalei or something (and never leaving that area).
8. For anyone considering going to Kauai - don't go in winter. We got super lucky with no big storms during our short trip, but generally Kauai is pounded in winter with big waves and lots of rain. You can always go hide in the dry south, but since the north is the best part of the island it's just better to go when it's not storm season.
Overall it made me miss our Big Island home, and I'm happy to be back.
I guess I'm a little negative about Kauai because I was super tired the whole time from not sleeping well. I also realized that I kind of hate vacation these days. I like workcation where I rent a house for a while and settle in and can cook my own food and bring my bike and get to do what I like (bike, swim, work). I don't really like sightseeing, just going from place to place and going yep I saw that; it feels so pointless, and it's kind of all the same experience no matter what sight it is you're seeing. I hate hotels, the invariably awful beds and pillows, the ice makers and elevators and other guests, the nasty decor and bad air, the attendants angling for tips. I hate restaurants, I'm so sick of restaurants. I wish I could just buy some proper ingredients that are actually fresh and okay quality, and have them cooked simply at the time that I order. Instead you get frozen super-low-grade Sysco garbage that's been pre-cooked and then warmed to order and covered in some nasty "sauce"; it's just revolting the filth that they pass off as food all over America. (and the fancy expensive restaurants are not much better). And you have to sit around forever while the waiter does god knows what and try to act nice and make the most of it while poisonous filth is flopped down in front of you.
I like the idea of vacations that are for a certain activity that you like. Not going to see sights or relax, but to go hiking in some place that's really great for hiking, or to go biking, or surfing, or whatever you like. I sort of did this with the CA work/bike-cation, and it was rocking good. I'd like to do it more, but it's hard to find good information. A lot of the "epic hikes" or "great bike rides" are actually total shit; the rating is done by people who don't know WTF they're talking about. (same is true of "great beaches", which are often total crap beaches except for their white sand or something stupid like that). For example, I know that Hwy 1 in CA is on many a list of epic rides, and having lived there for a long time I know that's totally retarded; not only is it not epic, it's barely even tolerable, like I would never ride it by choice (I only rode it when necessary to connect a loop between other roads), and in the same area where they recommend Hwy 1 there are probably 30 rides that are much better. So anyway, actually finding solid information on places that are good "destination biking" is very difficult.
I'm also getting more sensitive about travelling places where the tourism is sort of a form of exploitation. In Hawaii the bad vibes are mild, but they're definitely there. We stole these islands from the Hawaiians, and now they are mostly pretty poor and get to watch rich tourists come in and buy up their best land and crowd up their favorite local spots. But despite that Hawaii is immensely better than other beach destinations I've gone to. In Mexico & Central America you get to see the abject poverty of people whose lives have been destroyed by government corruption and "free trade" (which is a transparent absurdity when we own all the patents and subsidize our exports and fuel costs); most of the beach developments were the result of the government evicting the people who rightfully lived there with minimal compensation; you used to be able to get away from the Zona Hotelera areas and find sweet little towns that were still pretty untouched, but that's increasingly hard. In Thailand you're surrounded by the sex tourists and the cheap-booze backpacking set, who generally sleaze the place up (but it's better when you get away from the tourist-heavy areas).
Anyhoo, some photos from Kauai :
(including "tree canopy" and "how to look at tree canopy")
The information on the internet is now almost entirely one of :
1. Advertising. Sometimes even subtly hidden advertising (there are now tons of "blogs" that are actually advertisements, and a lot of the posters on web forums are actually advertisers who are more or less clever about it).
2. Ignorant. Stuff like answers.yahoo and eHow and Yelp and so on are once in a while written by someone who knows their topic, but usually not. Reading these sites is often more harmful than helpful.
Oddly, the vast majority of blogs about things like cooking, cars, home improvement, or any DIY hobbies are not written by people who actually do those things and know anything about them. They're usually written by housewives or techie nerds who just want some attention or love blogging or god knows why they do it. It should sort of be harmless for ignorant people to write about their adventures building a shack for the first time (lord knows I do it), but it's actually not harmless. For one thing, they tend to become popular and so become the leading search results, ahead of much better information which is drier or not so cutesy. For another, the writers often present themselves as more well informed than they actually are, and they often misrepresent the success of their endeavour.
3. Self-vertising. Even some of the better blogs are just ways to self-promote or otherwise make money. This can be okay and there can still be good information from the self-vertisers, but they also do a lot of padding, a lot of repetition, and heavily distort the truth to make themselves seem more important. The tech self-vertisers tend to be annoyingly pedantic and act like experts when they are not. They almost never do the helpful thing and link to their (better) original sources. They often use the same style as pundits or paid "experts" in that they present their solution as The One True Way to give it extra legitimacy, when in fact the truth is more nuanced (maybe there are disadvantages that they don't talk about, or equally valid other solutions that already exist, or uncertainties in the parameters). Part of the problem with the self-vertisers is that they all mutually promote and are very active about SEO, so they become the primary visible voices. Also to pad their posting they tend to grab "facts" from other sources and repeat them, which creates a bad false sense of confidence in those nonsensities because they are being repeated all over.
Somewhat related to this are the lunatics with some kind of agenda. They aren't exactly advertising, but they are rabid about some point and so spam the web with their "facts" which are just creations designed to prove their point. It makes it almost impossible to find information about controversial topics, because these people are so active that they dominate search results.
4. Communities. I used to get some of my best information from web communities/forums. The great thing about them is that you can find these individual posters that hang out on them who are actually true experts in the field; like if you're searching for home improvement stuff you can find guys in web communities who are actual long term builders and provide solid facts; or for car info you can find people who actually build or race cars and know WTF they're talking about. However, it takes a lot of work to find those guys; they generally are not the most frequent posters, they tend to pop in and snipe some amazing wisdom once in a while and then disappear. You have to do a lot of scrounging around, and read multiple posts from each poster to try to assess the credibility of the individual user.
But I've been noticing something really nasty about web communities recently. They tend to get into this kind of rigid group-think which can lead them to constantly repeat certain "facts" despite there being no substance to them. What happens is some strong personality on the forum promotes some fact and everyone gives it a "thumbs up", they start repeating it every time someone asks that question, and it winds up in the FAQ. Posters on web communities are highly motivated by the approval of their peers; they act like a pack of high schoolers who are constantly looking around to make sure everyone else thinks they're cool. There's very little independent thinking and willingness to challenge the group-think. There's lots of high-fiving.
The truly wise tend to be humble and a bit soft-spoken. That's all well and good, but in the juvenile shouting match which is the modern internet, it's the people who are unashamed to loudly pontificate and bully about things they know not much of who are heard.
Try searching for something like "Calphalon" or "Big Island Waterfall" and see how many results you can find that aren't one of those 4 groups. Sure there's still signal out there but it's getting drowned in the noise.
Anyhoo. One of the symptoms of the dying internet that I've noticed is that there are basically no tech blogs for me to read any more. Maybe I'm just out of the loop? Are you all blogging on facebook now, or some other closed system that I refuse to join?
A few years ago, I felt like I was getting really superb quality tech blogs in my RSS on an almost daily basis, and now that has slowed to a trickle of maybe one a week or one a month. The vast majority of people that I liked and followed are not posting any more. What gives?
I understand that a lot of people who blog do it for a while, but lose steam and their blog goes silent. But there should be new people picking up the mantle; maybe I just haven't been active enough about figuring out who the good new bloggers are.
For reference, my tech blog subscriptions :
<opml version="1.0">
<head>
<title>cbloom subscriptions in Google Reader</title>
</head>
<body>
<outline text=".mischief.mayhem.soap."
title=".mischief.mayhem.soap." type="rss"
xmlUrl="http://msinilo.pl/blog/?feed=rss2" htmlUrl="http://msinilo.pl/blog"/>
<outline text="1024cores" title="1024cores" type="rss"
xmlUrl="http://feeds.feedburner.com/1024cores" htmlUrl="http://blog.1024cores.net/"/>
<outline text="A random walk through geek-space"
title="A random walk through geek-space" type="rss"
xmlUrl="http://api.live.net/Users(4929737823860505484)/Main?$format=rss20" htmlUrl="http://sebastiansylvan.wordpress.com"/>
<outline text="Amit's Thoughts" title="Amit's Thoughts"
type="rss"
xmlUrl="http://amitp.blogspot.com/feeds/posts/default" htmlUrl="http://amitp.blogspot.com/"/>
<outline text="Aras' website" title="Aras' website" type="rss"
xmlUrl="http://aras-p.info/atom.xml" htmlUrl="http://aras-p.info/"/>
<outline text="Atom" title="Atom" type="rss"
xmlUrl="http://farrarfocus.blogspot.com/feeds/posts/default" htmlUrl="http://farrarfocus.blogspot.com/"/>
<outline text="Attractive Chaos" title="Attractive Chaos"
type="rss"
xmlUrl="http://attractivechaos.wordpress.com/feed/" htmlUrl="http://attractivechaos.wordpress.com"/>
<outline text="Beautiful Pixels" title="Beautiful Pixels"
type="rss"
xmlUrl="http://feeds.feedburner.com/BeautifulPixels" htmlUrl="http://beautifulpixels.blogspot.com/"/>
<outline text="Birth of a Game" title="Birth of a Game"
type="rss"
xmlUrl="http://uber.typepad.com/birthofagame/atom.xml" htmlUrl="http://uber.typepad.com/birthofagame/"/>
<outline text="bitsquid: development blog"
title="bitsquid: development blog" type="rss"
xmlUrl="http://bitsquid.blogspot.com/feeds/posts/default" htmlUrl="http://bitsquid.blogspot.com/"/>
<outline text="bouliiii's blog" title="bouliiii's blog"
type="rss"
xmlUrl="http://bouliiii.blogspot.com/feeds/posts/default" htmlUrl="http://bouliiii.blogspot.com/"/>
<outline text="Braid" title="Braid" type="rss"
xmlUrl="http://braid-game.com/news/?feed=rss2" htmlUrl="http://braid-game.com/news"/>
<outline text="Breaking Eggs And Making Omelettes"
title="Breaking Eggs And Making Omelettes" type="rss"
xmlUrl="http://multimedia.cx/eggs/feed/" htmlUrl="http://multimedia.cx/eggs"/>
<outline text="C++Next" title="C++Next" type="rss"
xmlUrl="http://cpp-next.com/feed/" htmlUrl="http://cpp-next.com"/>
<outline text="c0de517e" title="c0de517e" type="rss"
xmlUrl="http://c0de517e.blogspot.com/feeds/posts/default" htmlUrl="http://c0de517e.blogspot.com/"/>
<outline text="Canned Platypus" title="Canned Platypus"
type="rss" xmlUrl="http://pl.atyp.us/wordpress/?feed=rss2" htmlUrl="http://pl.atyp.us/wordpress"/>
<outline text="cbloom rants" title="cbloom rants" type="rss"
xmlUrl="http://feeds.feedburner.com/CbloomRants" htmlUrl="http://cbloomrants.blogspot.com/"/>
<outline text="Cessu's blog" title="Cessu's blog" type="rss"
xmlUrl="http://cessu.blogspot.com/feeds/posts/default" htmlUrl="http://cessu.blogspot.com/"/>
<outline text="CodeItNow" title="CodeItNow" type="rss"
xmlUrl="http://www.rorydriscoll.com/feed/" htmlUrl="http://www.rorydriscoll.com"/>
<outline text="Coder Corner" title="Coder Corner" type="rss"
xmlUrl="http://www.codercorner.com/blog/?feed=rss2" htmlUrl="http://www.codercorner.com/blog"/>
<outline text="copypastepixel" title="copypastepixel" type="rss"
xmlUrl="http://copypastepixel.blogspot.com/feeds/posts/default" htmlUrl="http://copypastepixel.blogspot.com/"/>
<outline text="Corensic" title="Corensic" type="rss"
xmlUrl="http://corensic.wordpress.com/feed/" htmlUrl="http://corensic.wordpress.com"/>
<outline text="Diary of a Graphics Programmer"
title="Diary of a Graphics Programmer" type="rss"
xmlUrl="http://diaryofagraphicsprogrammer.blogspot.com/feeds/posts/default" htmlUrl="http://diaryofagraphicsprogrammer.blogspot.com/"/>
<outline text="Diary Of An x264 Developer"
title="Diary Of An x264 Developer" type="rss"
xmlUrl="http://x264dev.multimedia.cx/?feed=atom" htmlUrl="http://x264dev.multimedia.cx/"/>
<outline text="direct to video" title="direct to video"
type="rss" xmlUrl="http://directtovideo.wordpress.com/feed/" htmlUrl="http://directtovideo.wordpress.com"/>
<outline text="el trastero" title="el trastero" type="rss"
xmlUrl="http://www.iquilezles.org/blog/?feed=rss2" htmlUrl="http://www.iquilezles.org/blog"/>
<outline text="EntBlog" title="EntBlog" type="rss"
xmlUrl="http://feeds2.feedburner.com/EntBlog" htmlUrl="http://entland.homelinux.com/blog"/>
<outline text="EnterTheSingularity" title="EnterTheSingularity"
type="rss"
xmlUrl="http://enterthesingularity.blogspot.com/feeds/posts/default?alt=rss" htmlUrl="http://enterthesingularity.blogspot.com/"/>
<outline text="Fast Data Compression"
title="Fast Data Compression" type="rss"
xmlUrl="http://fastcompression.blogspot.com/feeds/posts/default" htmlUrl="http://fastcompression.blogspot.com/"/>
<outline text="fixored?" title="fixored?" type="rss"
xmlUrl="http://www.sjbrown.co.uk/feed/" htmlUrl="http://www.sjbrown.co.uk"/>
<outline text="Game Angst" title="Game Angst" type="rss"
xmlUrl="http://gameangst.com/?feed=rss2" htmlUrl="http://gameangst.com"/>
<outline text="Game Rendering" title="Game Rendering" type="rss"
xmlUrl="http://www.gamerendering.com/feed/atom/" htmlUrl="http://www.gamerendering.com/"/>
<outline text="Game Rendering" title="Game Rendering" type="rss"
xmlUrl="http://www.gamerendering.com/feed/" htmlUrl="http://www.gamerendering.com"/>
<outline text="GameArchitect" title="GameArchitect" type="rss"
xmlUrl="http://gamearchitect.net/feed/" htmlUrl="http://gamearchitect.net"/>
<outline text="Gamedev Coder Diary" title="Gamedev Coder Diary"
type="rss" xmlUrl="http://gamedevcoder.wordpress.com/feed/" htmlUrl="http://gamedevcoder.wordpress.com"/>
<outline text="Graphic Rants" title="Graphic Rants" type="rss"
xmlUrl="http://graphicrants.blogspot.com/feeds/posts/default" htmlUrl="http://graphicrants.blogspot.com/"/>
<outline text="Graphics Runner" title="Graphics Runner"
type="rss"
xmlUrl="http://graphicsrunner.blogspot.com/feeds/posts/default" htmlUrl="http://graphicsrunner.blogspot.com/"/>
<outline text="Graphics Size Coding"
title="Graphics Size Coding" type="rss"
xmlUrl="http://sizecoding.blogspot.com/feeds/posts/default" htmlUrl="http://sizecoding.blogspot.com/"/>
<outline text="Gustavo Duarte" title="Gustavo Duarte" type="rss"
xmlUrl="http://feeds2.feedburner.com/GustavoDuarte" htmlUrl="http://duartes.org/gustavo/blog"/>
<outline text="Hardwarebug" title="Hardwarebug" type="rss"
xmlUrl="http://hardwarebug.org/feed/" htmlUrl="http://hardwarebug.org"/>
<outline text="hbr" title="hbr" type="rss"
xmlUrl="http://brnz.org/hbr/?feed=rss2" htmlUrl="http://brnz.org/hbr"/>
<outline text="Humus" title="Humus" type="rss"
xmlUrl="http://www.humus.name/rss.xml" htmlUrl="http://www.humus.name"/>
<outline text="I am an extreme moderate"
title="I am an extreme moderate" type="rss"
xmlUrl="https://extrememoderate.wordpress.com/feed/" htmlUrl="https://extrememoderate.wordpress.com"/>
<outline text="I Get Your Fail" title="I Get Your Fail"
type="rss" xmlUrl="http://feeds.feedburner.com/IGetYourFail" htmlUrl="http://igetyourfail.blogspot.com/"/>
<outline text="Ignacio Castaño" title="Ignacio Castaño"
type="rss" xmlUrl="http://castano.ludicon.com/blog/feed/" htmlUrl="http://www.ludicon.com/castano/blog"/>
<outline text="Industrial Arithmetic"
title="Industrial Arithmetic" type="rss"
xmlUrl="http://industrialarithmetic.blogspot.com/feeds/posts/default" htmlUrl="http://industrialarithmetic.blogspot.com/"/>
<outline text="John Ratcliff's Code Suppository"
title="John Ratcliff's Code Suppository" type="rss"
xmlUrl="http://codesuppository.blogspot.com/feeds/posts/default" htmlUrl="http://codesuppository.blogspot.com/"/>
<outline text="Just Software Solutions Blog"
title="Just Software Solutions Blog" type="rss"
xmlUrl="http://www.justsoftwaresolutions.co.uk/index.rss" htmlUrl="http://www.justsoftwaresolutions.co.uk/blog/"/>
<outline text="Lair Of The Multimedia Guru"
title="Lair Of The Multimedia Guru" type="rss"
xmlUrl="http://guru.multimedia.cx/feed/" htmlUrl="http://guru.multimedia.cx"/>
<outline text="Larry Osterman's WebLog"
title="Larry Osterman's WebLog" type="rss"
xmlUrl="http://blogs.msdn.com/larryosterman/rss.xml" htmlUrl="http://blogs.msdn.com/b/larryosterman/"/>
<outline text="Level of Detail" title="Level of Detail"
type="rss" xmlUrl="http://levelofdetail.wordpress.com/feed/" htmlUrl="http://levelofdetail.wordpress.com"/>
<outline text="level of detail" title="level of detail"
type="rss" xmlUrl="http://www.jshopf.com/blog/?feed=rss2" htmlUrl="http://jshopf.com/blog"/>
<outline text="Light is beautiful" title="Light is beautiful"
type="rss"
xmlUrl="http://feeds.feedburner.com/LightIsBeautiful?format=xml" htmlUrl="http://lousodrome.net/blog/light"/>
<outline text="Lightning Engine" title="Lightning Engine"
type="rss"
xmlUrl="http://feeds2.feedburner.com/LightningEngine" htmlUrl="http://blog.makingartstudios.com"/>
<outline text="Lost in the Triangles"
title="Lost in the Triangles" type="rss"
xmlUrl="http://feeds.feedburner.com/LostInTheTriangles" htmlUrl="http://aras-p.info/"/>
<outline text="Mark's Blog" title="Mark's Blog" type="rss"
xmlUrl="http://blogs.technet.com/markrussinovich/rss.xml" htmlUrl="http://blogs.technet.com/b/markrussinovich/"/>
<outline text="meshula.net" title="meshula.net" type="rss"
xmlUrl="http://meshula.net/wordpress/?feed=rss2" htmlUrl="http://meshula.net/wordpress"/>
<outline text="Miles Macklin's blog"
title="Miles Macklin's blog" type="rss"
xmlUrl="http://blog.mmacklin.com/feed/" htmlUrl="http://blog.mmacklin.com"/>
<outline text="Mod Blog" title="Mod Blog" type="rss"
xmlUrl="http://www.modularpeople.com/blog/?feed=rss2" htmlUrl="http://www.modularpeople.com/blog"/>
<outline text="Molecular Musings" title="Molecular Musings"
type="rss"
xmlUrl="http://molecularmusings.wordpress.com/feed/" htmlUrl="http://molecularmusings.wordpress.com"/>
<outline text="Monty" title="Monty" type="rss"
xmlUrl="http://xiphmont.livejournal.com/data/rss" htmlUrl="http://xiphmont.livejournal.com/"/>
<outline text="My Green Paste, Inc."
title="My Green Paste, Inc." type="rss"
xmlUrl="http://mygreenpaste.blogspot.com/feeds/posts/default" htmlUrl="http://mygreenpaste.blogspot.com/"/>
<outline text="Nerdblog.com" title="Nerdblog.com" type="rss"
xmlUrl="http://www.nerdblog.com/feeds/posts/default" htmlUrl="http://www.nerdblog.com/"/>
<outline text="nothings' projects" title="nothings' projects"
type="rss" xmlUrl="http://nothings.org/projects/?feed=rss2" htmlUrl="http://nothings.org/projects"/>
<outline text="Nynaeve" title="Nynaeve" type="rss"
xmlUrl="http://www.nynaeve.net/?feed=rss2" htmlUrl="http://www.nynaeve.net"/>
<outline text="onepartcode.com" title="onepartcode.com"
type="rss" xmlUrl="http://onepartcode.com/main/index.rss" htmlUrl="http://onepartcode.com/main"/>
<outline text="Online Game Techniques"
title="Online Game Techniques" type="rss"
xmlUrl="http://onlinegametechniques.blogspot.com/feeds/posts/default" htmlUrl="http://onlinegametechniques.blogspot.com/"/>
<outline text="Pete Shirley's Graphics Blog"
title="Pete Shirley's Graphics Blog" type="rss"
xmlUrl="http://psgraphics.blogspot.com/feeds/posts/default" htmlUrl="http://psgraphics.blogspot.com/"/>
<outline text="Pixels, Too Many.." title="Pixels, Too Many.."
type="rss" xmlUrl="http://pixelstoomany.wordpress.com/feed/" htmlUrl="http://pixelstoomany.wordpress.com"/>
<outline text="Preshing on Programming"
title="Preshing on Programming" type="rss"
xmlUrl="http://preshing.com/feed" htmlUrl="http://preshing.com"/>
<outline text="Ray Tracey's blog" title="Ray Tracey's blog"
type="rss"
xmlUrl="http://raytracey.blogspot.com/feeds/posts/default" htmlUrl="http://raytracey.blogspot.com/"/>
<outline text="Real-Time Rendering" title="Real-Time Rendering"
type="rss"
xmlUrl="http://www.realtimerendering.com/blog/feed/" htmlUrl="http://www.realtimerendering.com/blog"/>
<outline text="realtimecollisiondetection.net - the blog"
title="realtimecollisiondetection.net - the blog" type="rss"
xmlUrl="http://realtimecollisiondetection.net/blog/?feed=atom" htmlUrl="http://realtimecollisiondetection.net/blog"/>
<outline text="Reenigne blog" title="Reenigne blog" type="rss"
xmlUrl="http://www.reenigne.org/blog/feed/" htmlUrl="http://www.reenigne.org/blog"/>
<outline text="RenderWonk" title="RenderWonk" type="rss"
xmlUrl="http://renderwonk.com/blog/index.php/feed/" htmlUrl="http://renderwonk.com/blog"/>
<outline text="ridiculous_fish" title="ridiculous_fish"
type="rss" xmlUrl="http://ridiculousfish.com/blog/feed/" htmlUrl="http://ridiculousfish.com/blog/"/>
<outline text="Sanders' blog" title="Sanders' blog" type="rss"
xmlUrl="http://sandervanrossen.blogspot.com/feeds/posts/default?alt=rss" htmlUrl="http://sandervanrossen.blogspot.com/"/>
<outline text="Self Shadow" title="Self Shadow" type="rss"
xmlUrl="http://blog.selfshadow.com/feed/" htmlUrl="http://blog.selfshadow.com/"/>
<outline text="Some Assembly Required"
title="Some Assembly Required" type="rss"
xmlUrl="http://assemblyrequired.crashworks.org/feed/" htmlUrl="http://assemblyrequired.crashworks.org"/>
<outline text="stinkin' thinkin'" title="stinkin' thinkin'"
type="rss"
xmlUrl="http://stinkygoat.livejournal.com/data/rss" htmlUrl="http://stinkygoat.livejournal.com/"/>
<outline text="Stuart Denman" title="Stuart Denman" type="rss"
xmlUrl="http://www.stuartdenman.com/feed/" htmlUrl="http://www.stuartdenman.com"/>
<outline text="Stumbling Toward 'Awesomeness'"
title="Stumbling Toward 'Awesomeness'" type="rss"
xmlUrl="http://www.chrisevans3d.com/pub_blog/?feed=atom" htmlUrl="http://www.chrisevans3d.com/pub_blog"/>
<outline text="Syntopia" title="Syntopia" type="rss"
xmlUrl="http://blog.hvidtfeldts.net/index.php/feed/" htmlUrl="http://blog.hvidtfeldts.net"/>
<outline text="Sébastien Lagarde" title="Sébastien Lagarde"
type="rss" xmlUrl="https://seblagarde.wordpress.com/feed/" htmlUrl="https://seblagarde.wordpress.com"/>
<outline text="The Atom Project" title="The Atom Project"
type="rss"
xmlUrl="http://www.farrarfocus.com/atom/index.atom" htmlUrl="http://www.farrarfocus.com/atom/"/>
<outline text="The Danger Zone" title="The Danger Zone"
type="rss" xmlUrl="http://mynameismjp.wordpress.com/feed/" htmlUrl="http://mynameismjp.wordpress.com"/>
<outline text="The Data Compression News Blog"
title="The Data Compression News Blog" type="rss"
xmlUrl="http://www.c10n.info/feed" htmlUrl="http://www.c10n.info"/>
<outline text="The Fifth Column" title="The Fifth Column"
type="rss"
xmlUrl="http://thefifthcolumn.com/blog/?feed=rss2" htmlUrl="http://thefifthcolumn.com/blog"/>
<outline text="The Ladybug Letter" title="The Ladybug Letter"
type="rss" xmlUrl="http://www.ladybugletter.com/?feed=atom" htmlUrl="http://www.ladybugletter.com/"/>
<outline text="The ryg blog" title="The ryg blog" type="rss"
xmlUrl="http://fgiesen.wordpress.com/feed/" htmlUrl="http://fgiesen.wordpress.com"/>
<outline text="The software rendering world"
title="The software rendering world" type="rss"
xmlUrl="http://winden.wordpress.com/feed/" htmlUrl="http://winden.wordpress.com"/>
<outline text="The Witness" title="The Witness" type="rss"
xmlUrl="http://the-witness.net/news/feed/" htmlUrl="http://the-witness.net/news"/>
<outline text="Transcendental Technical Travails"
title="Transcendental Technical Travails" type="rss"
xmlUrl="http://t-t-travails.blogspot.com/feeds/posts/default" htmlUrl="http://t-t-travails.blogspot.com/"/>
<outline text="Treatise on Graphics Programming"
title="Treatise on Graphics Programming" type="rss"
xmlUrl="http://www.wolfgang-engel.info/blogs/?feed=rss2" htmlUrl="http://www.wolfgang-engel.info/blogs"/>
<outline text="UMBC Games, Animation and Interactive Media"
title="UMBC Games, Animation and Interactive Media"
type="rss" xmlUrl="http://gaim.umbc.edu/feed/" htmlUrl="http://gaim.umbc.edu"/>
<outline text="View" title="View" type="rss"
xmlUrl="http://view.eecs.berkeley.edu/blog/rss.php?ver=2" htmlUrl="http://view.eecs.berkeley.edu/blog/"/>
<outline text="VirtualBlog" title="VirtualBlog" type="rss"
xmlUrl="http://www.virtualdub.org/blog/rss.xml" htmlUrl="http://virtualdub.org/blog/index.php"/>
<outline text="Voxelium" title="Voxelium" type="rss"
xmlUrl="http://voxelium.wordpress.com/feed/" htmlUrl="http://voxelium.wordpress.com"/>
<outline
text="What your mother never told you about graphics development"
title="What your mother never told you about graphics development"
type="rss"
xmlUrl="http://zeuxcg.blogspot.com/feeds/posts/default" htmlUrl="http://zeuxcg.blogspot.com/"/>
<outline
text="What your mother never told you about graphics development"
title="What your mother never told you about graphics development"
type="rss" xmlUrl="http://zeuxcg.org/feed/" htmlUrl="http://zeuxcg.org"/>
<outline text="Work Without Dread" title="Work Without Dread"
type="rss"
xmlUrl="http://workwithoutdread.blogspot.com/feeds/posts/default" htmlUrl="http://workwithoutdread.blogspot.com/"/>
<outline text="Zack Rusin" title="Zack Rusin" type="rss"
xmlUrl="http://zrusin.blogspot.com/feeds/posts/default" htmlUrl="http://zrusin.blogspot.com/"/>
<outline text="ZigguratVertigo's Hideout"
title="ZigguratVertigo's Hideout" type="rss"
xmlUrl="http://zigguratvertigo.com/feed/" htmlUrl="http://colinbarrebrisebois.com"/>
<outline text="Bartosz Milewski's Programming Cafe"
title="Bartosz Milewski's Programming Cafe" type="rss"
xmlUrl="http://bartoszmilewski.wordpress.com/feed/" htmlUrl="http://bartoszmilewski.com"/>
</body>
</opml>
(note for dumb people : we're not here to talk about boring obvious shit like "kids make you sleep less" or "many parents live out their frustrated life goals through their kids". That's an obvious given as a baseline that should not need to be said; on this blog we try to talk about the things that are past the baseline, though many readers seem to not get that and want to chime in with the material that was a prerequisite for this course; get out of here and go back to reading "Excessive DOF Photos of Crappy Food" or "The New Old Coding Bore" or "Precious Twee Artisanal All-Organic Parenting" or whatever banal blog you usually read)
1. Kids automatically make you cooler. They're like a +1 modifier on anything you do. Like if you're just some single guy and you're in good shape and do triathlons or whatever, who cares, you're kind of an obsessive dweeb. But if you're a good family-man dad and you do the same, then you're amazing cool fit dad. (of course there's valid reason for this +1, because it's so much harder to do anything once you have kids, they're such a huge energy-suck)
(I've long been aware that I have some sort of bad jealousy tic where I really hate awesome dads; whenever I meet a dad who's super-fit and has great kids and also has a great job and builds robots or writes books on the side, I'm just filled with loathing; I'm not entirely sure but I assume that instant gut loathing comes from jealousy; I also think those guys are liars/phonies. Like, I think they must actually be terrible dads, it's just not possible to do all those things and spend enough time with your kids; why aren't you exhausted and frazzled? perhaps they have very self-sacrificing wives who are actually doing all the work at home, and/or they aren't actually putting in the work at their job; something is amiss, my spidey senses tingle)
2. Kids let you do things you suck at without feeling awkward. Say you suck at skiing; if you just go as a single man and take beginner's lessons and have to ski the tow-rope bunny slope, you feel embarrassed and most people can't get over it (of course if you do it anyway, you actually are super cool, and it's the people who look down on you that are fucking retard losers, but I digress). With kids you can go and ski the bunny slope with them and nobody looks at you funny. If you go ice skate for the first time as a single adult and are falling all over and wobbly you're a weirdo, but if you do it with your kids, you're a cool dad.
(one of the great tragedies of life is that people stop doing new things around 20 because they don't want to look like a beginner; they also lose all humility and never want to admit that they are a beginner at something. It's super dumb and I've been trying to get past it for the last 20 years or so. It's so funny seeing men at track days or at home improvement stores; they obviously don't know a thing about cars or construction (like I don't), but they can't just admit it and go "yeah I'm a newbie, can you help me?" they have to act all macho-man and pretend to be in their element like "I need a ball-peen wrench to adjust the timing on my carburetor." Um, let's back up and try again.)
3. Kids let you do things that are dweeby to do as single people, like go to the zoo or ride in a carriage. Part of the issue is that those activities are just not quite interesting enough on their own, but when you have the +1 enjoyment modifier of seeing it through your kids' eyes that pushes them over the threshold of worth-doingness. I've always loved factory tours and those living-history museums where you can see how stained glass is made or whatever, also science museums (particularly interactive ones), but they just aren't quite worth doing as an adult. Kids remove the embarrassment of everyone around you thinking "why are you here? it's only really old people and families, childless adults are not allowed".
4. Kids give you an excuse to be a selfish inconsiderate asshole. This is not a good thing and lots of parents over-do it. (it starts with pregnant moms who use the pregnancy as an excuse to be selfish bitches way beyond what's necessary or appropriate). Things like we can be loud at the symphony because we have kids, or we can cut in line because we're pregnant, or let's take the best seats and spread out all over, or let's take all the chairs at the hotel pool and then leave a giant mess behind us, etc. People know that kids make it much harder for others to go "hey fucker, you're out of line" and they abuse that advantage.
5. Kids let you play. I'm super excited about this. For a long time I've known that what I really need in my life is *play* , not sports, not games, but just joyous pointless movement. Adults are so fucking uptight and trying to act cool and impress each other all the time that they can never just play (actually I had a pretty sweet thing going for a while with Ryan where we could play a bit, but that was rare). Of course there's a whole industry of "ecstatic dance" and shit like that which is basically adults paying someone to let them play, which is so sad and bizarre; you have these uptight type-A business assholes who are total fuckers to everyone all day long, and then they go in a room and listen to a teacher tell them to run around in circles and stick their tongue out; super bizarre disconnect there. Anyhoo, kids let you go to the park and run around and roll in the grass and jump on things and nobody thinks you're a weirdo. (alternatively : move to San Francisco; fucking wonderful place SF, but all the gentry and computerists are ruining it)
(I guess those funny-dress-up runs are also societal concoctions to let adults play; but they ruin it by being a regimented precisely specified play; you're still just trying to fit in and do what you're supposed to. Oh crap, I wore a tutu and everyone else is wearing a cape! And it's still competitive and judgemental - ooh look, that guy is really relaxing well. Adulthood is so bizarre.)
6. Kids let you not have friends. They let you turn inward and just hang out with family. And of course you get some socializing through your kids doing things and hanging out with other parents. You don't have to make any effort to make adult relationships work, which is a pain in the ass. Kids let you just stay home with your family without being a weird lonely hermit. Of course this is also a danger if you take it too far; you see these families that are so drawn in and almost afraid of other adults that when they're out in public they hardly even look up at the world around them.
7. Kids let you feel okay about sucking. If you're not really doing anything with your life and you're just kind of a rotten human being, but you have kids - then you can think "I devote my life to my kids, they are my pride and joy, at least I've made them, they are my life's work". They provide a +1 smugness bonus.
8. Kids give you a new thing to be ridiculously analytical and obsessive and introspective about. Most type-A nerds have kind of gotten bored of thinking about life by the time they hit 25 or so. We've already thought and over-thought everything that we do in life ("what is actually going on in the little social interaction with the grocery store checkout person? should I make minimal polite smalltalk, or should I try to say something unexpected to cheer them up? do I feel bad for this person whose life has obviously taken such a wrong turn? am I trying to make them feel like my equal and not my servant?" etc). We've made charts and graphs about how various influences in our life affect our productivity, and it's just all old hat. ("should I turn the other cheek, or should I get aggressive back at this asshole? Turning the other cheek is a local optimization of my own happiness, but that does not create a social game-theory structure which directs overall behavior in a good way. Oh wait I've had this same thought like 100 times before"). Kids give you a whole new set of things to be anal and nerdy about, read books and think about cause-effect and blah blah blah. Of course this blog post is a sign that I've already begun.
Man it feels great to be here. The house is incredible, just as we hoped, tons of windows and a big view of Mauna Kea with not a single neighbor around. I feel alive, young, virile, lithe. I love the sun and the sweat. I love the trees and the good vibes.
I packed my bike this year (mild hassle (and the damn TSA opened my box and disturbed my careful packing)), looking forward to getting some good rides.
BTW you may notice that the correct ergonomic position for a "laptop" is about three feet above the human lap.
I can't wait for Tasha to pop the kids out so we can travel with them and play on the beach and run around in the trees.
1. Chickens don't need a big coop. They don't like to be inside, they like to be outside (as noted above, I'm assuming a decently warm climate). The coop is just for sleeping and laying. Almost all the coop designs you'll see on the internet, and all the fancy ones you can buy, are much too big. Not only is it a waste of time and materials to build a big coop, it's a huge disadvantage because it takes up more space and is more work to clean and is harder to move.
2. Don't build a coop you can walk inside. As per #1, the coop should be small, and it should be high (chickens like to be up high to sleep). All you need is a small raised box. You do not need a door for humans or a floor at human height. Do, however, put an entire wall or roof on hinges so that you can open up the whole thing and easily reach every corner.
3. Don't over-engineer. Because the coop only needs to bear chicken-weight not human-weight, there's no need to use 2x4's or half inch plywood, you can use much lighter and smaller construction materials. Again most of the internet designs and coops you can buy are just way off here, way over-engineered. (it does need to be strong enough to be wind-proof and dog-proof; dogs are by far the biggest hazard to urban chickens).
Even if you want a movable coop, you don't really need wheels if you use suitably light building materials and are moderately athletic. It's very easy to just pick up a small coop and move it around the yard as needed.
4. Paint. I painted the inside of the coop, and some sites & people consider this silly and froo-froo, but I think it was a good call and would do it again. A thick coat of high-gloss provides great water proofing and provides a smooth surface, which makes for much easier cleanup and longer life.
5. Rain/Snow. In contrast to #3, you should *not* cut corners in following good practices for weather-proofing. In particular, don't leave exposed edges of plywood or sheathing (they delaminate very easily), do use good shingle-principle for roofing (overlap and cover holes), use a proper drip-edge to prevent water wrapping around, etc.
6. Doors. I put a bunch of doors in the coop and one thing I didn't really consider was that all the poop and shavings and such will constantly be getting in the door jamb, which will prevent closure if it's a tight fit. One option is just to intentionally make a sloppy door that's a loose fit; another is to put some kind of trough near the door so that closing it pushes out the crud into the gap. Many designs, including mine, feature a door hinged at the bottom, so that when it opens it becomes a ramp. This seems clever but is not a very functional door because of the poop-in-the-hinges problem, it just becomes a static ramp. Probably the best type of door is top-hinged, with a raised bottom sill to prevent crud building up there. There's just not a lot of need for doors though; if you make the whole coop open for cleanability (such as via a hinged or removable roof), you can just use that to get the eggs as well; there's no need for the cute little nesting boxes with individual doors that people do.
7. The roost is the backbone of the coop. The chickens will spend 90% of their indoor time on the roosts, so locating the roost is the most important aspect of the design. The coop is really just the roost and the nesting boxes, the chickens want to spend their time outside in the run or free ranging, not on the floor of the coop.
8. The Poop Trough. Because of #7, I've found that almost all the chicken poop that's inside the coop is in a perfect straight line under the roost. I think you could take advantage of that and put an angled trough under the roost so that the poop was super easy to clean out. Another option would be a line of wire mesh instead of solid floor under the roost, perhaps with a removable trough under the wire mesh.
9. Rats. You have to decide from the beginning if you want to try to make a rat-proof coop. Doing so is a major undertaking and requires careful design. For example, chicken wire is not rat-proof. To make a rat-proof coop, first you need a solid stone foundation (for a small coop the easiest way to rat-proof the floor is just to cover the whole floor with pavers or bricks; for a larger coop you wouldn't want to do that, so you have to dig down at least 1 foot underground and surround the perimeter with rat-proof wire mesh or concrete blocks; rats are excellent diggers). Then the entire coop must be surrounded with hardware cloth (wire mesh) or similar. Rats are also superb climbers and jumpers, so vertical barriers will not stop them (you need a closed roof).
Some people try to rat-proof by putting wire on the floor (rather than a solid paver floor or burying a barrier around the perimeter). This is not a great idea. What will happen is the rats will still dig under the coop and create a network of tunnels under the wire floor. The chickens knock their feed all around, so lots (most) of it will fall through the wire mesh into the gap below it, and the rats will have a party living in the dirt under the wire floor. This might be okay with you (at least the rats are not actually in the chicken's space) but I think that overall wire on the floor is actually worse than nothing.
10. Feeders. Lots of people advocate these big automatic hanging feeders that you can fill with feed and it will drop down to let out more. Unless you have made a seriously rat-proof coop, these things are a terrible idea. Rats with an unlimited supply of food like that will multiply incredibly rapidly. You're going to want to visit the chickens every day anyway, so I see no advantage to these gravity feeders, just give them their ration each day so that there aren't a lot of left-overs for the vermin.
11. The Run. You have to decide up front whether you are going to free-range the chickens or not. If you are going to free-range them, then you don't need any run at all, just let them out in the yard. If not, then you need a big run. A tiny run (like under the popular commercial A-frame "chicken tractor") is pointless and cruel. If I had a decent amount of land I would build a simple run by just putting in some posts and wrapping it in chicken wire. (obviously this run is not rat proof). There's no need to cover the top of a large run (assuming as above you do not use a big feeder which would attract other birds).
12. Free ranging in your yard kind of sucks. Chickens love to dig in soft soil, so will go after your new plantings and vegetable beds and dig up your seedlings. They like to sit on railings and handles and poop. You will have poop all over everything. It's not awesome. On the other hand, it is very easy. They will eat a better diet without you having to carefully manage the supplements in their feed. They also naturally return to their coop at night so you don't really have to do any work to get them in and out, they do it themselves.
13. The poop pile. If you are going to try to reuse the poop and shavings you get when you clean out the coop as manure, you need to locate a spot for the poop to rest. You will get a *lot* of waste out of the coop, so you need a big spot, and you need at least two piles so you can cycle the new into the old (like compost; poop needs 2-3 months rest before use). The poop pile should not be near the coop (or run) and should also not be near your planting beds to avoid pest and pathogen transfer. It can be hard to find a good location for the poop pile in an urban yard, so you may want to abandon this idea and just throw out the poop. The poop pile will also attract rats and flies (but of course so will composting); it may also attract justifiably irate neighbors.
I've been thinking about this for a while, but this washingtonpost blog about the correlation of video games and gun violence recently popped into my blog feed, so I'll use it as an example.
The Washington Post blog leads you to believe that the data shows an unequivocal lack of correlation between videogames and gun violence. That's retarded. It only takes one glance at the chart to see that the data is completely dominated by other factors, like probably most strongly the gun ownership rate. You can't possibly try to find the effect of a minor contributing factor without normalizing for other factors, which most of these "analyses" fail to do, which makes them totally bogus. Furthermore, as usual, you would need a much larger sample size to have any confidence in the data, and you'd have to question the selection of data that was done. Also the entire thing being charted is wrong; it shouldn't be video game spending per capita, it should be video games played per capita (especially with China on there), and it shouldn't be gun-related murders, it should be all murders (because the fraction of murders that is gun related varies strongly by gun control laws, while the all murders rate varies more directly with the level of economic and social development in a country).
(Using data and charts and graphs has been a very popular way to respond to the recent shootings. Every single one that I've seen is complete nonsense. People just want to make a point that they've previously decided, so they trot out some data to "prove it" or make it "non-partisan" as if their bogus charts somehow make it "factual". It's pathetic. Here's a good example of using tons of data to show absolutely nothing . If you want to make an editorial point, just write your opinion, don't trot out bogus charts to "back it up". )
It's extremely popular these days to "prove" that some intuition is wrong by finding some data that shows a reverse correlation. (blame Freakonomics, among other things). You get lots of this in the smarmy TED talks - "you may expect that stabbing yourself in the eye with a pencil is harmful, but in fact these studies show that stabbing yourself in the eye is correlated to longer life expectancy!" (and then everyone claps). The problem with all this cute semi-intellectualism is that it's very often just wrong.
Aside from just poor data analysis, one of the major flaws with this kind of reasoning is the assumption that you are measuring all the inputs and all the outputs.
An obvious case is education, where you get all kinds of bogus studies that show such-and-such program "improves learning". Well, how did you actually measure learning? Obviously something like cutting music programs out of schools "improves learning" if you measure "learning" in a myopic way that doesn't include the benefits of music. And of course you must also ask what else was changed between the measured kids and the control (selection bias, novelty effect, etc; essentially all the studies on charter schools are total nonsense since any selection of students and new environment will produce a short term improvement).
I believe that choosing the wrong inputs and outputs is even worse than the poor data analysis, because it can be so hidden. Quite often there are some huge (bogus) logical leaps where the article will measure some narrow output and then proceed to talk about it as if it was just "better". Even when your data analysis was correct, you did not show it was better, you showed that one specific narrow output that you chose to measure improved, and you have to be very careful to not start using more general words.
(one of the great classic "wrong output" mistakes is measuring GDP to decide if a government financial policy was successful; this is one of those cases where economists have in fact done very sophisticated data analysis, but with a misleadingly narrow output)
Being repetitive : it's okay if you are actually very specific and careful not to extrapolate. eg. if you say "lowering interest rates increased GDP" and you are careful not to ever imply that "increased GDP" necessarily means "was good for the economy" (or that "was good for the economy" meant "was good for the population"); the problem is that people are sloppy, in their analysis and their implications and their reading, so it becomes "lowering interest rates improved the well-being of the population" and that becomes accepted wisdom.
Of course you can transparently see the vapidity of most of these analyses because they don't propagate error bars. If they actually took the errors of the measurement, corrected for the error of the sample size, propagated it through the correlation calculation and gave a confidence at the end, you would see things like "we measured a 5% improvement (+- 50%)" , which is no data at all.
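To make the error-bar point concrete, here is a minimal sketch of putting a confidence interval on a measured correlation. It uses the Fisher z-transform, which is my choice for illustration (it assumes roughly bivariate-normal data and more than 3 samples), not something any of the articles above actually did. The function name CorrelationConfidence95 is made up for this example.

```cpp
#include <cmath>
#include <utility>

// Approximate 95% confidence interval for a Pearson correlation r
// measured on n samples, via the Fisher z-transform.
// Assumption: roughly bivariate-normal data and n > 3.
std::pair<double,double> CorrelationConfidence95(double r, int n)
{
    const double z  = std::atanh(r);                     // map r to z-space
    const double se = 1.0 / std::sqrt( double(n - 3) );  // standard error of z
    const double zc = 1.959964;                          // two-sided 95% normal quantile
    // build the interval in z-space, then map both ends back to correlations
    return std::make_pair( std::tanh(z - zc*se), std::tanh(z + zc*se) );
}
```

With r = 0.3 measured on 10 samples the interval comes out at about -0.4 to 0.8, which includes zero and nearly everything else; you need something like a thousand samples before it tightens to a few hundredths either side of 0.3.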
I saw Brian Cox on QI recently, and there was some point about the US government testing whether heavy doses of LSD helped schizophrenics or not. Everyone was aghast but Brian popped up with "actually I support data-based medicine; if it had been shown to help then I would be for that therapy". Now obviously this was a jokey context so I'll cut Cox some slack, but it does in fact reflect a very commonly held belief these days (that we should trust the data more than our common sense that it's a terrible idea). And it's just obviously retarded on the face of it. If the study had shown it to help, then obviously something was wrong with the study. Medical studies are almost always so flawed that it's hard to believe any of them. Selection bias is huge, novelty and placebo effect are huge; but even if you really have controlled for all that, the other big failure is that they are too short term, and the "output" is much too narrow. You may have improved the thing you were measuring for, but done lots of other harm that you didn't see. Perhaps they did measure a decrease in certain schizophrenia symptoms (but psychotic lapses and suicides were way up; oops that wasn't part of the output we measured).
Exercise/dieting and child-rearing are two major topics where you are just bombarded with nonsense pseudo-science "correlations" all the time.
Of course political/economic charts are useless and misleading. A classic falsehood that gets trotted out regularly is the charts showing "the economy does better under democrats" ; for one thing the sample size is just so small that it could be totally random ; for another the economy is more affected by the previous president than the current ; and in almost every case huge external factors are massively more important (what's the Fed rate, did Al Gore recently invent the internet, are we in a war or an oil crisis, etc.). People love to show that chart but it is *pure garbage* , it contains zero information. Similarly the charts about how the economy does right after a tax raise or decrease; again there are so many confounding factors and the sample size is so tiny, but more importantly tax raises tend to happen when government receipts are low (eg. economic growth is already slowing), while tax cuts tend to happen in flush times, so saying "tax cuts lead to growth" is really saying "growth leads to growth".
What I'm trying to get at in this post is not the ridiculous lack of science in all these studies and "facts", but the way that the popular press (and the semi-intellectual world of blogs and talks and magazines) use charts and graphs to present "data" to legitimize the bogus point.
I believe that any time you see a chart or graph in the popular press you should look away.
I know they are seductive and fun, and they give you a vapid conversation piece ("did you know that christmas lights are correlated with impotence?") but they in fact poison the brain with falsehoods.
Finally, back to the issue of video games and violence. I believe it is obvious on the face of it that video games contribute to violence. Of course they do. Especially at a young age, if a kid grows up shooting virtual men in the face it has to have some effect (especially on people who are already mentally unstable). Is it a big factor? Probably not; by far the biggest factor in violence is poverty, then government instability and human rights, then the gun ownership rate, the ease of gun purchasing, etc. I suspect that the general gun glorification in America is a much bigger effect, as is growing up in a home where your parents had guns, going to the shooting range as a child, rappers glorifying violence, movies and TV. Somewhere after all that, I'm sure video games contribute. The only thing we can actually say scientifically is that the effect is very small and almost impossible to measure due to the presence of much larger and highly uncertain factors.
(of course we should also recognize that these kinds of crazy school shooting events are completely different than ordinary violence, and statistically are a drop in the bucket. I suspect the rare mass-murder psycho killer things are more related to a country's mental health system than anything else. Pulling out the total murder numbers as a response to these rare psychotic events is another example of using the wrong data and then glossing over the illogical jump.)
I think in almost all cases if you don't play pretend with data and just go and sit quietly and think about the problem and tap into your own brain, you will come to better conclusions.
This post will be code-heavy and the code will be ugly. This code is all sloppy about buffer sizes and string over-runs and such, so DO NOT copy-paste it into production unless you want security holes. (a particularly nasty point to be wary of is that many of the APIs differ in whether they take a buffer size in bytes or chars, which are not the same thing with unicode)
We're gonna use these helpers to call into windows dlls :
template <typename t_func_type>
t_func_type GetWindowsImport( t_func_type * pFunc , const char * funcName, const char * libName , bool dothrow)
{
    if ( *pFunc == 0 )
    {
        // libName is a narrow string, so call the A variants explicitly;
        // the plain macros resolve to the W variants in UNICODE builds
        HMODULE m = GetModuleHandleA(libName);
        if ( m == 0 ) m = LoadLibraryA(libName); // adds extension for you
        ASSERT_RELEASE( m != 0 );
        t_func_type f = (t_func_type) GetProcAddress( m, funcName );
        if ( f == 0 && dothrow )
        {
            throw funcName;
        }
        *pFunc = f;
    }
    return (*pFunc);
}
// GET_IMPORT can return NULL
#define GET_IMPORT(lib,name) (GetWindowsImport(&STRING_JOIN(fp_,name),STRINGIZE(name),lib,false))
// CALL_IMPORT throws if not found
#define CALL_IMPORT(lib,name) (*GetWindowsImport(&STRING_JOIN(fp_,name),STRINGIZE(name),lib,true))
#define CALL_KERNEL32(name) CALL_IMPORT("kernel32",name)
#define CALL_NT(name) CALL_IMPORT("ntdll",name)
I also make use of the cblib strlen, strcpy, etc. on wchars. Their implementation is obvious.
Also, for reference, to open a file handle just to read its attributes (to map its name) you use :
HANDLE f = CreateFile(from,
FILE_READ_ATTRIBUTES |
STANDARD_RIGHTS_READ
,FILE_SHARE_READ,0,OPEN_EXISTING,FILE_FLAG_BACKUP_SEMANTICS,0);
(also works on directories).
Okay now : How to get a final path name from a file handle :
1. On Vista+ , just use GetFinalPathNameByHandle.
GetFinalPathNameByHandle gives you back a "\\?\" prefixed path, or "\\?\UNC\" for network shares.
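If you want an ordinary Win32 path back out of that, you have to strip the prefix yourself. A minimal sketch, written as plain string handling so it compiles anywhere; StripExtendedPrefix is a hypothetical helper name, not a Windows API, and note that a very long path may stop working with some APIs once the prefix is gone:

```cpp
#include <string>

// Hypothetical helper (not part of Windows) : convert the prefixed form
// returned by GetFinalPathNameByHandle into an ordinary Win32 path.
std::wstring StripExtendedPrefix(const std::wstring & path)
{
    const std::wstring uncPrefix = L"\\\\?\\UNC\\";
    const std::wstring prefix    = L"\\\\?\\";
    // network share : drop the prefix and put back the leading double backslash
    if ( path.compare(0, uncPrefix.size(), uncPrefix) == 0 )
        return L"\\\\" + path.substr(uncPrefix.size());
    // local drive : just drop the prefix
    if ( path.compare(0, prefix.size(), prefix) == 0 )
        return path.substr(prefix.size());
    // no prefix : leave the path alone
    return path;
}
```

The UNC test has to come first, because the UNC prefix also starts with the plain prefix.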
2. Pre-Vista, lots of people recommend mem-mapping the file and then using GetMappedFileName.
This is a bad suggestion. It doesn't work on directories. It requires that you actually have the file open for read, which is of course impossible in some scenarios. It's just generally a non-robust way to get a file name from a handle.
For the record, here is the code from MSDN to get a file name from handle using GetMappedFileName.
Note that GetMappedFileName gives you back an NT-namespace name, and I have factored out the
bit to convert that to Win32 into MapNtDriveName, which we'll come back to later.
BOOL GetFileNameFromHandleW_Map(HANDLE hFile,wchar_t * pszFilename,int pszFilenameSize)
{
BOOL bSuccess = FALSE;
HANDLE hFileMap;
pszFilename[0] = 0;
// Get the file size.
DWORD dwFileSizeHi = 0;
DWORD dwFileSizeLo = GetFileSize(hFile, &dwFileSizeHi);
if( dwFileSizeLo == 0 && dwFileSizeHi == 0 )
{
lprintf(("Cannot map a file with a length of zero.\n"));
return FALSE;
}
// Create a file mapping object.
hFileMap = CreateFileMapping(hFile,
NULL,
PAGE_READONLY,
0,
1,
NULL);
if (hFileMap)
{
// Create a file mapping to get the file name.
void* pMem = MapViewOfFile(hFileMap, FILE_MAP_READ, 0, 0, 1);
if (pMem)
{
if (GetMappedFileNameW(GetCurrentProcess(),
pMem,
pszFilename,
pszFilenameSize)) // size in wchars ; use the caller's buffer size, not MAX_PATH
{
//pszFilename is an NT-space name :
//pszFilename = "\Device\HarddiskVolume4\devel\projects\oodle\z.bat"
wchar_t temp[2048];
strcpy(temp,pszFilename);
MapNtDriveName(temp,pszFilename);
bSuccess = TRUE; // only report success if we actually got a name
}
UnmapViewOfFile(pMem);
}
CloseHandle(hFileMap);
}
else
{
return FALSE;
}
return(bSuccess);
}
3. There's a more direct way to get the name from file handle : NtQueryObject.
NtQueryObject gives you the name of any handle. If it's a file handle, you get the file name. This name is an NT namespace name, so you have to map it down of course.
The core code is :
typedef enum _OBJECT_INFORMATION_CLASS {
ObjectBasicInformation, ObjectNameInformation, ObjectTypeInformation, ObjectAllInformation, ObjectDataInformation
} OBJECT_INFORMATION_CLASS, *POBJECT_INFORMATION_CLASS;
typedef struct _UNICODE_STRING {
USHORT Length;
USHORT MaximumLength;
PWSTR Buffer;
} UNICODE_STRING, *PUNICODE_STRING;
typedef struct _OBJECT_NAME_INFORMATION {
UNICODE_STRING Name;
WCHAR NameBuffer[1];
} OBJECT_NAME_INFORMATION, *POBJECT_NAME_INFORMATION;
NTSTATUS
(NTAPI *
fp_NtQueryObject)(
IN HANDLE ObjectHandle, IN OBJECT_INFORMATION_CLASS ObjectInformationClass, OUT PVOID ObjectInformation, IN ULONG Length, OUT PULONG ResultLength )
= 0;
{
    char infobuf[4096];
    ULONG ResultLength = 0;
    NTSTATUS status = CALL_NT(NtQueryObject)(f,
        ObjectNameInformation,
        infobuf,
        sizeof(infobuf),
        &ResultLength);
    if ( status >= 0 ) // a non-negative NTSTATUS means success
    {
        OBJECT_NAME_INFORMATION * pinfo = (OBJECT_NAME_INFORMATION *) infobuf;
        wchar_t * ps = pinfo->NameBuffer;
        // pinfo->Name.Length is in BYTES , not wchars
        ps[ pinfo->Name.Length / 2 ] = 0;
        lprintf("OBJECT_NAME_INFORMATION: (%S)\n",ps);
    }
}
which will give you a name like :
OBJECT_NAME_INFORMATION: (\Device\HarddiskVolume1\devel\projects\oodle\examples\oodle_future.h)
and then you just have to pull off the drive part and call MapNtDriveName (mentioned previously but not
yet detailed).
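The bytes-vs-chars thing in UNICODE_STRING is easy to get wrong, so here is a small sketch of copying a counted string out without poking a null terminator into the source buffer. CountedString16 is a stand-in I made up that mirrors the UNICODE_STRING layout so the example compiles without the Windows headers; CountedToString is likewise a hypothetical name.

```cpp
#include <string>

// Stand-in for UNICODE_STRING (hypothetical, for illustration only)
struct CountedString16
{
    unsigned short   Length;        // length of the string in BYTES, not characters
    unsigned short   MaximumLength; // size of the buffer in BYTES
    const char16_t * Buffer;        // not necessarily null-terminated
};

// Copy exactly Length bytes worth of characters into an owned string.
std::u16string CountedToString(const CountedString16 & s)
{
    if ( s.Buffer == nullptr ) return std::u16string();
    return std::u16string( s.Buffer, s.Length / sizeof(char16_t) );
}
```

Copying by count means whatever garbage follows the name in the buffer never matters, and you can't write one past the end when the name exactly fills the buffer.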
Note that there's another call that looks appealing :
CALL_NT(NtQueryInformationFile)(f,
&block,
infobuf,
sizeof(infobuf),
FileNameInformation);
but NtQueryInformationFile seems to always give you just the file name without the drive. In fact
it seems possible to use NtQueryInformationFile and NtQueryObject to separate the drive part and path part.
That is, you get something like :
t: is substed to c:\trans
LogDosDrives prints :
T: : \??\C:\trans
we ask about :
fmName : t:\prefs.js
we get :
NtQueryInformationFile: "\trans\prefs.js"
NtQueryObject: "\Device\HarddiskVolume4\trans\prefs.js"
If there was a way to get the drive letter, then you could just use NtQueryInformationFile , but so far as I know there is no simple way, so we have to go through all this mess.
On network shares, it's similar but a little different :
y: is net used to \\charlesbpc\C$
LogDosDrives prints :
Y: : \Device\LanmanRedirector\;Y:0000000000034569\charlesbpc\C$
we ask about :
fmName : y:\xfer\path.txt
we get :
NtQueryInformationFile: "\charlesbpc\C$\xfer\path.txt"
NtQueryObject: "\Device\Mup\charlesbpc\C$\xfer\path.txt"
so in that case you could just prepend a "\" to NtQueryInformationFile , but again I'm not sure how
to know that what you got was a network share and not just a directory, so we'll go through all the
mess here to figure it out.
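The "separate the drive part and path part" idea from the two listings above can be sketched as plain string work: if the NtQueryInformationFile result is a suffix of the NtQueryObject result, whatever is left in front is the device. SplitDevicePrefix is a hypothetical helper name, and the exact (case-sensitive) compare is an assumption on my part; I have only seen the two calls agree in case.

```cpp
#include <string>

// ntFullName : name from NtQueryObject, for example "\Device\HarddiskVolume4\trans\prefs.js"
// pathOnly   : name from NtQueryInformationFile, for example "\trans\prefs.js"
// Returns the device part ("\Device\HarddiskVolume4"), or an empty string
// if pathOnly is not a suffix of ntFullName.
std::wstring SplitDevicePrefix(const std::wstring & ntFullName, const std::wstring & pathOnly)
{
    if ( pathOnly.size() > ntFullName.size() ) return std::wstring();
    const size_t prefixLen = ntFullName.size() - pathOnly.size();
    // the tail of the full name must match the path-only name exactly
    if ( ntFullName.compare(prefixLen, pathOnly.size(), pathOnly) != 0 ) return std::wstring();
    return ntFullName.substr(0, prefixLen);
}
```

That gives you the NT device name, not the drive letter, so you still need MapNtDriveName (or some other way to tell a share from a directory) afterwards.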
4. MapNtDriveName is needed to map an NT-namespace drive name to a Win32/DOS-namespace name.
I've found two different ways of doing this, and they seem to produce the same results in all the tests I've run, so it's unclear if one is better than the other.
4.A. MapNtDriveName by QueryDosDevice
QueryDosDevice gives you the NT name of a dos drive. This is the opposite of what we want, so we have to reverse the mapping. The way is to use GetLogicalDriveStrings which gives you all the dos drive letters, then you can look them up to get all the NT names, and thus create the reverse mapping.
Here's LogDosDrives :
void LogDosDrives()
{
#define BUFSIZE 2048
// Translate path with device name to drive letters.
wchar_t szTemp[BUFSIZE];
szTemp[0] = '\0';
// GetLogicalDriveStrings
// gives you the DOS drives on the system
// including substs and network drives
if (GetLogicalDriveStringsW(BUFSIZE-1, szTemp))
{
wchar_t szName[MAX_PATH];
wchar_t szDrive[3] = L" :"; // no parens : a parenthesized string literal is not a standard array initializer
wchar_t * p = szTemp;
do
{
// Copy the drive letter to the template string
*szDrive = *p;
// Look up each device name
if (QueryDosDeviceW(szDrive, szName, MAX_PATH))
{
lprintf("%S : %S\n",szDrive,szName);
}
// Go to the next NULL character.
while (*p++);
} while ( *p); // double-null is end of drives list
}
return;
}
/**
LogDosDrives prints stuff like :
A: : \Device\Floppy0
C: : \Device\HarddiskVolume1
D: : \Device\HarddiskVolume2
E: : \Device\CdRom0
H: : \Device\CdRom1
I: : \Device\CdRom2
M: : \??\D:\misc
R: : \??\D:\ramdisk
S: : \??\D:\ramdisk
T: : \??\D:\trans
V: : \??\C:
W: : \Device\LanmanRedirector\;W:0000000000024326\radnet\raddevel
Y: : \Device\LanmanRedirector\;Y:0000000000024326\radnet\radmedia
Z: : \Device\LanmanRedirector\;Z:0000000000024326\charlesb-pc\c
**/
Recall from the last post that "\??\" is the NT-namespace way of mapping back to the win32 namespace.
Those are substed drives. The "net use" drives get the "Lanman" prefix.
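The slightly odd loop in LogDosDrives is just walking a double-null-terminated string list, which is the format GetLogicalDriveStrings hands back. Here is the same walk as a standalone sketch with no Win32 calls; SplitMultiSz is a hypothetical helper name, and it trusts the caller that the list really does end in a double null.

```cpp
#include <string>
#include <vector>

// Split a double-null-terminated list ("A:\" 0 "C:\" 0 0) into separate strings.
// Assumption : the buffer really is terminated by an empty string.
std::vector<std::wstring> SplitMultiSz(const wchar_t * multiSz)
{
    std::vector<std::wstring> out;
    const wchar_t * p = multiSz;
    while ( *p ) // an empty string marks the end of the list
    {
        std::wstring s(p);   // copies up to the next null
        out.push_back(s);
        p += s.size() + 1;   // step over the string and its terminator
    }
    return out;
}
```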
MapNtDriveName using QueryDosDevice is :
bool MapNtDriveName_QueryDosDevice(const wchar_t * from,wchar_t * to)
{
#define BUFSIZE 2048
// Translate path with device name to drive letters.
wchar_t allDosDrives[BUFSIZE];
allDosDrives[0] = '\0';
// GetLogicalDriveStrings
// gives you the DOS drives on the system
// including substs and network drives
if (GetLogicalDriveStringsW(BUFSIZE-1, allDosDrives))
{
wchar_t * pDosDrives = allDosDrives;
do
{
// Copy the drive letter to the template string
wchar_t dosDrive[3] = L" :"; // no parens : a parenthesized string literal is not a standard array initializer
*dosDrive = *pDosDrives;
// Look up each device name
wchar_t ntDriveName[BUFSIZE];
if ( QueryDosDeviceW(dosDrive, ntDriveName, ARRAY_SIZE(ntDriveName)) )
{
size_t ntDriveNameLen = strlen(ntDriveName);
if ( strnicmp(from, ntDriveName, ntDriveNameLen) == 0
&& ( from[ntDriveNameLen] == '\\' || from[ntDriveNameLen] == 0 ) )
{
strcpy(to,dosDrive);
strcat(to,from+ntDriveNameLen);
return true;
}
}
// Go to the next NULL character.
while (*pDosDrives++);
} while ( *pDosDrives); // double-null is end of drives list
}
return false;
}
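The matching step in the middle of that function (case-insensitive prefix compare, then check that the device name ends at a separator) can be tried out without any Windows calls. This is a minimal sketch under the assumption that the drive table has already been gathered; DriveTable, StartsWithNoCase and MapNtToDos are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// (dosDrive, ntDevice) pairs, e.g. { L"C:", L"\\Device\\HarddiskVolume1" }.
using DriveTable = std::vector<std::pair<std::wstring, std::wstring>>;

// Case-insensitive "does s start with prefix" for wide strings.
static bool StartsWithNoCase(const std::wstring & s, const std::wstring & prefix)
{
    if (s.size() < prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
        if (std::towlower(s[i]) != std::towlower(prefix[i])) return false;
    return true;
}

// Replace a leading NT device name with its DOS drive; nullopt if none matches.
std::optional<std::wstring> MapNtToDos(const DriveTable & table, const std::wstring & ntPath)
{
    for (const auto & [dos, nt] : table)
    {
        assert(!nt.empty());
        if (!StartsWithNoCase(ntPath, nt)) continue;
        // The device name must end at a separator or at the end of the string,
        // otherwise "HarddiskVolume1" would also match "HarddiskVolume10".
        if (ntPath.size() > nt.size() && ntPath[nt.size()] != L'\\') continue;
        return dos + ntPath.substr(nt.size());
    }
    return std::nullopt;
}
```

The boundary check is the part that is easy to forget; without it the first volume in the table swallows every volume whose number starts with the same digit.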
4.B. MapNtDriveName by IOControl :
There's a more direct way using DeviceIoControl. You just send a message to the "MountPointManager"
which is the guy who controls these mappings. (this is from "Mehrdad" on Stackoverflow) :
struct MOUNTMGR_TARGET_NAME { USHORT DeviceNameLength; WCHAR DeviceName[1]; };
struct MOUNTMGR_VOLUME_PATHS { ULONG MultiSzLength; WCHAR MultiSz[1]; };
#define MOUNTMGRCONTROLTYPE ((ULONG) 'm')
#define IOCTL_MOUNTMGR_QUERY_DOS_VOLUME_PATH \
CTL_CODE(MOUNTMGRCONTROLTYPE, 12, METHOD_BUFFERED, FILE_ANY_ACCESS)
union ANY_BUFFER {
MOUNTMGR_TARGET_NAME TargetName;
MOUNTMGR_VOLUME_PATHS TargetPaths;
char Buffer[4096];
};
bool MapNtDriveName_IoControl(const wchar_t * from,wchar_t * to)
{
ANY_BUFFER nameMnt;
int fromLen = strlen(from);
// DeviceNameLength is in *bytes*
nameMnt.TargetName.DeviceNameLength = (USHORT) ( 2 * fromLen );
strcpy(nameMnt.TargetName.DeviceName, from );
HANDLE hMountPointMgr = CreateFile( ("\\\\.\\MountPointManager"),
0, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
NULL, OPEN_EXISTING, 0, NULL);
// CreateFile reports failure with INVALID_HANDLE_VALUE, not 0
ASSERT_RELEASE( hMountPointMgr != INVALID_HANDLE_VALUE );
DWORD bytesReturned;
BOOL success = DeviceIoControl(hMountPointMgr,
IOCTL_MOUNTMGR_QUERY_DOS_VOLUME_PATH, &nameMnt,
sizeof(nameMnt), &nameMnt, sizeof(nameMnt),
&bytesReturned, NULL);
CloseHandle(hMountPointMgr);
if ( success && nameMnt.TargetPaths.MultiSzLength > 0 )
{
strcpy(to,nameMnt.TargetPaths.MultiSz);
return true;
}
else
{
return false;
}
}
5. Fix MapNtDriveName for network names.
I said that MapNtDriveName_IoControl and MapNtDriveName_QueryDosDevice produced the same results and both worked. Well, that's only true for local drives. For network drives they both fail, but in different ways. MapNtDriveName_QueryDosDevice just won't find network drives, while MapNtDriveName_IoControl will hang for a long time and eventually time out with a failure.
We can fix it easily though because the NT path for a network share contains the valid win32 path
as a suffix, so all we have to do is grab that suffix.
bool MapNtDriveName(const wchar_t * from,wchar_t * to)
{
// hard-code network drives :
if ( strisame(from,L"\\Device\\Mup") || strisame(from,L"\\Device\\LanmanRedirector") )
{
strcpy(to,L"\\");
return true;
}
// either one :
//return MapNtDriveName_IoControl(from,to);
return MapNtDriveName_QueryDosDevice(from,to);
}
This just takes the NT-namespace network paths, like :
"\Device\Mup\charlesbpc\C$\xfer\path.txt"
->
"\\charlesbpc\C$\xfer\path.txt"
And we're done.
Windows has various "namespaces" or classes of file names :
1. DOS Names :
"c:\blah" and such.
Max path of 260 including drive and trailing null. Different cases refer to the same file, *however* different unicode encodings of the same character do *NOT* refer to the same file (eg. things like "accented e" and "e + accent previous char" are different files). See previous posts about code pages and general unicode disaster on Windows.
I'm going to ignore the 8.3 legacy junk, though it still has some funny lingering effects on even "long" DOS names. (for example, the longest directory path allowed is 248 characters, MAX_PATH minus 12, because they require room for an 8.3 name after the longest path).
2. Win32 Names :
This includes all DOS names plus all network paths like "\\server\blah".
The Win32 APIs can also take the "\\?\" names, which are sort of a way of peeking into the lower-level NT names.
Many people incorrectly think the big difference with the "\\?\" names is that the length can be much longer
(32768 instead of 260), but IMO the bigger difference is that the name that follows is treated as raw
characters. That is, you can have "/" or "." or ".." or whatever in the name - they do not get any processing.
Very scary. I've seen lots of code that blindly assumes it can add or remove "\\?\" with impunity - that is
not true!
"\\?\c:\" is a local path
"\\?\UNC\server\blah" is a network name like "\\server\blah"
Assuming you have your drives shared, you can get to yourself as "\\localhost\c$\"
I think the "\\?\" namespace is totally insane and using it is a Very Bad Idea. The vast majority of apps will do the wrong thing when given it, and many will crash.
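To make the "you can't blindly remove the prefix" point concrete, here is a hedged sketch of a prefix-stripper that refuses whenever the remainder contains something the normal Win32 path processing would reinterpret. StripExtendedPrefix and NeedsWin32Normalization are hypothetical names, and the check is deliberately not exhaustive (it ignores trailing dots and spaces, reserved device names, and the 260 limit), so treat it as an illustration of the problem rather than a safe converter.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// True if the path contains '/' or a "." / ".." component, which Win32 would
// rewrite but which are taken literally after the "\\?\" prefix.
static bool NeedsWin32Normalization(const std::wstring & path)
{
    if (path.find(L'/') != std::wstring::npos) return true;
    std::size_t start = 0;
    while (start <= path.size())
    {
        std::size_t end = path.find(L'\\', start);
        if (end == std::wstring::npos) end = path.size();
        const std::wstring part = path.substr(start, end - start);
        if (part == L"." || part == L"..") return true;
        start = end + 1;
    }
    return false;
}

// Returns the plain Win32 form, the input unchanged if it has no prefix,
// or nullopt when stripping would change which file the name refers to.
std::optional<std::wstring> StripExtendedPrefix(const std::wstring & path)
{
    const std::wstring kExt = L"\\\\?\\";
    const std::wstring kUnc = L"\\\\?\\UNC\\";
    if (path.compare(0, kExt.size(), kExt) != 0) return path;
    std::wstring rest;
    if (path.compare(0, kUnc.size(), kUnc) == 0)
        rest = L"\\\\" + path.substr(kUnc.size());   // "\\?\UNC\server\x" -> "\\server\x"
    else
        rest = path.substr(kExt.size());             // "\\?\c:\x" -> "c:\x"
    if (NeedsWin32Normalization(rest)) return std::nullopt;
    return rest;
}
```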
3. NT names :
Win32 is built on "ntdll" which internally uses another style of name. They start with "\" and then refer to the drivers used to access them, like :
"\Device\HarddiskVolume1\devel\projects\oodle"
In the NT namespace network shares are named :
Pre-Vista :
\Device\LanmanRedirector\<some per-user stuff>\server\share
Vista+ : Lanman way and also :
\Device\Mup\Server\share
And the NT namespace has a symbolic link to the entire Win32 namespace under "\Global??\" , so
"\Global??\c:\whatever"
is also a valid NT name, (and "\??\" is sometimes valid as a short version of "\Global??\").
What fun.
1. Run one thread locked to each core.
(NOTE : this is only appropriate on something like a game console where you are in control of all the threads! Do not do this on an OS like Windows where other apps may also be locking to cores, and you have the thread affinity scheduler problems, and so on).
The one-thread-per-core set of threads is your thread pool. All code runs as "tasks" (or jobs or whatever) on the thread pool.
The threads never actually do ANY OS Waits. They never switch. They're not really threads, you're not using any of the OS threading any more. (I suppose you still are using the OS to handle signals and such, and there are probably some OS threads that are running which will grab some of your time, and you want that; but you are not using the OS threading in your code).
2. All functions are coroutines. A function with no yields in it is just a very simple coroutine. There's no special syntax to be a coroutine or call a coroutine.
All functions can take futures or return futures. (a future is just a value that's not yet ready). Whether you want this to be totally implicit or not is up to your taste about how much of the operations behind the scenes are visible in the code.
For example if you have a function like :
int func(int x);
and you call it with a future<int> :
future<int> y;
func(y);
it is promoted automatically to :
future<int> func( future<int> x )
{
yield x;
return func( x.value );
}
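In a language without that automatic promotion you can approximate it with a helper. The following is a toy, single-threaded sketch in C++17: Future, Set, OnReady and Promote are all hypothetical names invented for illustration, the "yield" is replaced by a callback registered on the argument, and nothing here is thread safe.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Toy future: a shared slot plus callbacks to run once the slot is filled.
template <typename T>
class Future
{
    struct State
    {
        std::optional<T> value;
        std::vector<std::function<void(const T &)>> waiters;
    };
    std::shared_ptr<State> state_ = std::make_shared<State>();

public:
    bool Ready() const { return state_->value.has_value(); }
    const T & Value() const { assert(Ready()); return *state_->value; }

    // Fill the slot and run everything that was waiting on it.
    void Set(T v)
    {
        state_->value = std::move(v);
        auto waiters = std::move(state_->waiters);
        state_->waiters.clear();
        for (auto & w : waiters) w(*state_->value);
    }

    // Run fn now if the value is ready, otherwise when Set() is called.
    void OnReady(std::function<void(const T &)> fn)
    {
        if (Ready()) fn(*state_->value);
        else state_->waiters.push_back(std::move(fn));
    }
};

// Promote a plain function R f(A) so it can be called on a future argument:
// "yield x; return func( x.value );" becomes a callback on x.
template <typename R, typename A>
Future<R> Promote(R (*func)(A), Future<A> x)
{
    Future<R> result;
    x.OnReady([func, result](const A & a) mutable { result.Set(func(a)); });
    return result;
}
```

The point of the language feature is that you would not have to write Promote at each call site; the sketch only shows that the transformation itself is mechanical.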
When you call a function, it is not a "branch", it's just a normal function call. If that function yields, it yields the whole current coroutine. That is, it's just like threading and waits, but rather with coroutines and yields.
To branch I would use a new keyword, like "start" :
future<int> some_async_func(int x);
int current_func(int y)
{
// execution will step directly into this function;
// when it yields, current_func will yield
future<int> f1 = some_async_func(y);
// with "start" a new coroutine is made and enqueued to the thread pool
// my coroutine immediately continues to the f1.wait
future<int> f2 = start some_async_func(y);
return f1.wait();
}
"start" should really be an abbreviation for a two-phase launch, which allows a lot more flexibility.
That is, "start" should be a shorthand for something like :
start some_async_func(y);
is
coro * c = new coro( some_async_func(y); );
c->start();
because that allows batch-starting, and things like setting dependencies after creating the coro, which
I have found to be very useful in practice. eg :
coro * c[32];
for(i in 32)
{
c[i] = new coro( );
if ( i > 0 )
c[i-1]->depends( c[i] );
}
start_all( c, 32 );
Batch starting is one of those things that people often leave out. Starting tasks one by one is just like waiting for them one by one (instead of using a wait_all), it causes bad thread-thrashing (waking up and going back to sleep over and over, or switching back and forth).
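Here is a minimal sketch of the two-phase "create, wire up dependencies, then start the whole batch" idea. Task, Depends and StartAll are hypothetical names, and to keep it runnable anywhere the "thread pool" is just a single queue drained on the calling thread; a real pool would hand the initial ready set to its workers in one go rather than waking them one task at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

struct Task
{
    std::function<void()> work;
    std::vector<Task *> dependents;   // tasks that must wait for this one
    int pending = 0;                  // number of unfinished prerequisites

    // "this depends on other": this cannot run until other has finished.
    void Depends(Task & other) { other.dependents.push_back(this); ++pending; }
};

// Start a batch at once: seed the run queue with every task that has no
// prerequisites, then drain it, releasing dependents as prerequisites finish.
// Returns the number of tasks that ran.
std::size_t StartAll(std::vector<Task> & tasks)
{
    std::deque<Task *> ready;
    for (Task & t : tasks)
        if (t.pending == 0) ready.push_back(&t);

    std::size_t ran = 0;
    while (!ready.empty())
    {
        Task * t = ready.front();
        ready.pop_front();
        assert(t->pending == 0);
        if (t->work) t->work();
        ++ran;
        for (Task * d : t->dependents)
            if (--d->pending == 0) ready.push_back(d);
    }
    return ran;
}
```

Because dependencies are set before anything starts, there is no window where a task can finish before its dependent has been hooked up, which is the main thing one-by-one starting makes awkward.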
3. Full stack-saving is crucial.
For this to be efficient you need a very small minimum stack size (4k is probably good) and you need stack-extension on demand.
You may have lots of pending coroutines sitting around and you don't want them gobbling all your memory with 64k stacks.
Full stack saving means you can do full variable capture for free, even in a language like C where tracking references is hard.
4. You stop using the OS mutex, semaphore, event, etc. and instead use coroutine variants.
Instead of a thread owning a lock, a coroutine owns a lock. When you block on a lock it's a yield of the coroutine instead of a full OS wait.
Getting access to a mutex or semaphore is an event that can trigger coroutines being run or resumed. eg. it's a future just like the
return from an async procedural call. So you can do things like :
future<int> y = some_async_func();
yield( y , my_mutex.when_lock() );
which yields your coroutine until the joint condition is met that the async func is done AND you can get the lock on "my_mutex".
Joint yields are very important because they prevent unnecessary coroutine wakeup. Coroutine thrashing is not nearly as bad as thread thrashing (that is one of the big advantages of coroutine-centric architecture, perhaps the biggest), but it is still wasted work that a joint yield avoids.
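A joint yield boils down to a countdown: the coroutine is resumed once, when the last condition is satisfied, instead of once per condition. The sketch below shows just that countdown. Event and WhenAll are hypothetical names, the events are one-shot and single-threaded, and it glosses over the hard part of a real "when_lock" (the lock has to actually be taken atomically with the wakeup, not merely observed to be free).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// A one-shot event: callbacks registered before Signal() run when it fires.
class Event
{
    bool fired_ = false;
    std::vector<std::function<void()>> waiters_;

public:
    void Signal()
    {
        fired_ = true;
        auto waiters = std::move(waiters_);
        waiters_.clear();
        for (auto & w : waiters) w();
    }
    void OnFired(std::function<void()> fn)
    {
        if (fired_) fn();
        else waiters_.push_back(std::move(fn));
    }
};

// Resume 'continuation' exactly once, after every event in 'events' has fired.
void WhenAll(const std::vector<Event *> & events, std::function<void()> continuation)
{
    if (events.empty()) { continuation(); return; }
    auto remaining = std::make_shared<std::size_t>(events.size());
    auto resume = std::make_shared<std::function<void()>>(std::move(continuation));
    for (Event * e : events)
    {
        assert(e != nullptr);
        e->OnFired([remaining, resume] { if (--*remaining == 0) (*resume)(); });
    }
}
```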
You must have coroutine versions of all the ops that have delays (file IO, networking, GPU, etc) so that you can yield on them instead of doing thread-waits.
5. You must have some kind of GC.
Because coroutines will constantly be capturing values, you must ensure their lifetime is >= the life of the coroutine. GC is the only reasonable way to do this.
I would also go ahead and put an RW-lock in every object as well since that will be necessary.
6. Dependencies and side effects should be expressed through args and return values.
You really need to get away from funcs like
void DoSomeStuff(void);
that have various un-knowable inputs and outputs. All inputs & outputs need to be values so that they
can be used to create dependency chains.
When that's not directly possible, you must use a convention to express it. eg. for file manipulation I recommend using a string containing the file name to express the side effects that go through the file system (eg. for Rename, Delete, Copy, etc.).
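One way to read the file-name convention: if every file op declares the names it touches, a scheduler can derive the ordering constraints mechanically. The following is only a sketch of that reading; FileOp and BuildFileDependencies are hypothetical names, names are compared as exact strings (no case folding or path aliasing), and reads are treated as conservatively as writes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct FileOp
{
    std::string what;                 // label only, e.g. "Delete" or "Copy"
    std::vector<std::string> files;   // every file name the op reads or writes
};

// For each op (in program order), the indices of earlier ops it must wait for:
// the most recent earlier op that touched any of the same file names.
std::vector<std::set<std::size_t>> BuildFileDependencies(const std::vector<FileOp> & ops)
{
    std::vector<std::set<std::size_t>> deps(ops.size());
    std::map<std::string, std::size_t> lastTouch;   // file name -> latest op using it
    for (std::size_t i = 0; i < ops.size(); ++i)
    {
        for (const std::string & f : ops[i].files)
        {
            auto it = lastTouch.find(f);
            if (it != lastTouch.end()) deps[i].insert(it->second);
        }
        // Record this op only after the lookups, so an op never depends on itself.
        for (const std::string & f : ops[i].files) lastTouch[f] = i;
    }
    return deps;
}
```

With that, a Delete of "b" followed by a Copy of "a" to "b" is ordered, while an unrelated Rename is free to run in parallel.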
7. Note that coroutines do not fundamentally alter the difficulties of threading.
You still have races, deadlocks, etc. Basic async ops are much easier to write with coroutines, but they are no panacea and do not try to be anything other than a nicer way of writing threading. (eg. they are not transactional memory or any other auto-magic).
to be continued (perhaps) ....
Add 3/15/13 : 8. No static size anything. No resources you can run out of. This is another "best practice" that goes with modern thread design that I forgot to list.
Don't use fixed-size queues for thread communication; they seem like an optimization or simplification at first, but if you can ever hit the limit (and you will) they cause big problems. Don't assume a fixed number of workers or a maximum number of async ops in flight, this can cause deadlocks and be a big problem.
The thing is that a "coroutine centric" program is no longer so much like a normal imperative C program. It's moving towards a functional program where the path of execution is all nonlinear. You're setting a big graph to evaluate, and then you just need to be able to hit "go" and wait for the graph to close. If you run into some limit at some point during the graph evaluation, it's a big mess figuring out how to deal with that.
Of course the OS can impose limits on you (eg. running out of memory) and that is a hassle you have to deal with.
Any language with lambdas (that can be fired when an async completes) can simulate coroutines.
Assume we have some async function call :
future<int> AsyncFunc( int x );
which sends the integer off over the net (or whatever) and eventually gets a result back. Assume
that future<> has an "AndThen" which schedules a function to run when it's done.
Then you can write a sequence of operations like :
future<int> MySequenceOfOps( int x1 )
{
x1++;
future<int> f1 = AsyncFunc(x1);
return f1.AndThen( [](int x2){
x2 *= 2;
future<int> f2 = AsyncFunc(x2);
return f2.AndThen( [](int x3){
x3 --;
return x3;
} );
} );
}
with a little munging we can make it look more like a standard coroutine :
#define YIELD(future,args) return future.AndThen( [](args){
future<int> MySequenceOfOps( int x1 )
{
x1++;
future<int> f1 = AsyncFunc(x1);
YIELD(f1,int x2)
x2 *= 2;
future<int> f2 = AsyncFunc(x2);
YIELD(f2,int x3)
x3 --;
return x3;
} );
} );
}
the only really ugly bit is that you have to put a bunch of scope-closers at the end to match the number of
yields.
This is really what any coroutine is doing under the hood. When you hit a "yield", what it does is take the remainder of the function and package that up as a functor to get called after the async op that you're yielding on is done.
Coroutines from lambdas have a few disadvantages, aside from the scope-closers annoyance. It's ugly to do
anything but simple linear control flow. The above example is the very simple case of "imperative, yield, imperative,
yield" , but in real code you want to have things like :
if ( bool )
{
YIELD
}
or
while ( some condition )
{
YIELD
}
which while probably possible with lambda-coroutines, gets ugly.
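For what it's worth, here is roughly what the "while" case looks like when done by hand: the loop body becomes a function that re-invokes itself from the completion callback. This sketch uses plain completion callbacks instead of the future/AndThen interface assumed above, and AsyncFunc, LoopUntil, RunPending and g_pending are hypothetical stand-ins; the async op just adds one and completes on a toy event queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy "event loop": async ops post their completions here instead of running inline.
std::deque<std::function<void()>> g_pending;

// Stand-in for an async call: eventually delivers x+1 to the continuation.
void AsyncFunc(int x, std::function<void(int)> done)
{
    g_pending.push_back([x, done = std::move(done)] { done(x + 1); });
}

// Equivalent of:  while ( x < limit ) { x = yield AsyncFunc(x); }  return x;
// Each "iteration" is one call; the rest of the loop is the callback.
void LoopUntil(int x, int limit, std::function<void(int)> onReturn)
{
    assert(onReturn != nullptr);
    if (x >= limit) { onReturn(x); return; }
    AsyncFunc(x, [limit, onReturn = std::move(onReturn)](int next) {
        LoopUntil(next, limit, onReturn);
    });
}

// Drain the queue, running completions until nothing is left.
void RunPending()
{
    while (!g_pending.empty())
    {
        auto fn = std::move(g_pending.front());
        g_pending.pop_front();
        fn();
    }
}
```

It works, but the control flow is inside-out: the loop condition, the loop variable and the "return" all have to be threaded through by hand, which is the ugliness referred to above.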
An advantage of lambda-coroutines is if you're in a language where you have lambdas with variable-capture, then you get that in your coroutines.
I'm just catching up on this, so I'm going to make some notes about things that took a minute to figure out. Correct me where I'm wrong.
For the most part I'll be talking in C# lingo, because this stuff comes from C# and is much more natural there. There are C++/CX versions of all this, but they're rather more ugly. Occasionally I'll dip into what it looks like in CX, which is where we start :
1. "hat" (eg. String^)
Hat is a pointer to a ref-counted object. The ^ means inc and dec ref in scope. In cbloom code String^ is StringPtr.
The main note : "hat" is a thread-safe ref count, *however* it implies no other thread safety. That is,
the ref-counting and object destruction is thread safe / atomic , but derefs are not :
Thingy^ t = Get(); // thread safe ref increment here
t->var1 = t->var2; // non-thread safe var accesses!
There is no built-in mutex or anything like that for hat-objects.
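The closest standard C++ analogy I know of is std::shared_ptr: its reference count is updated atomically, but it gives no protection to the object it points at. A small sketch of what you end up writing, with the lock added by hand (Thingy and CopyVar2ToVar1 are illustrative names, and shared_ptr is a stand-in for hat, not the same mechanism):

```cpp
#include <cassert>
#include <memory>
#include <mutex>

struct Thingy
{
    std::mutex lock;   // guards var1 and var2; the smart pointer does not provide this
    int var1 = 0;
    int var2 = 0;
};

// Passing the shared_ptr by value bumps the ref count atomically, like taking a hat.
// The member accesses still need the explicit lock to be safe across threads.
void CopyVar2ToVar1(std::shared_ptr<Thingy> t)
{
    assert(t != nullptr);
    std::lock_guard<std::mutex> guard(t->lock);
    t->var1 = t->var2;
}
```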
2. "async" func keyword
Async is a new keyword that indicates a function might be a coroutine. It does not make the function
into an asynchronous call. What it really is is a "structify" or "functor" keyword (plus a "switch").
Like a C++ lambda, the main thing the language does for you is package
up all the local variables and function arguments and put them all in a struct. That is (playing
rather loosely with the translation for brevity) :
async void MyFunc( int x )
{
string y;
stuff();
}
[ is transformed to : ]
struct MyFunc_functor
{
int x;
string y;
void Do() { stuff(); }
};
void MyFunc( int x )
{
// allocate functor object :
MyFunc_functor * f = new MyFunc_functor();
// copy in args :
f->x = x;
// run it :
f->Do();
}
So obviously this functor that captures the function's state is the key to making this into an async
coroutine.
It is *not* stack saving. However for simple usages it is the same. Obviously crucial to this is using a language like C# which has GC so all the references can be traced, and everything is on the heap (perhaps lazily). That is, in C++ you could have pointers and references that refer to things on the stack, so just packaging up the args like this doesn't work.
Note that in the above you didn't see any task creation or asynchronous func launching, because it's not. The "async" keyword does not make a function async, all it does is "functorify" it so that it *could* become async. (this is in contrast to C++11 where "async" is an imperative to "run this asynchronously").
3. No more threads.
WinRT is pushing very hard to remove manual control of threads from the developer. Instead you have an OS thread pool that can run your tasks.
Now, I actually am a fan of this model in a limited way. It's the model I've been advocating for games for a while. To be clear, what I think is good for games is : run 1 thread per core. All game code consists of tasks for the thread pool. There are no special purpose threads, any thread can run any type of task. All the threads are equal priority (there's only 1 per core so this is irrelevant as long as you don't add extra threads).
So, when a coroutine becomes async, it just enqueues to a thread pool.
There is this funny stuff about execution "context", because they couldn't make it actually clean (so that any task can run on any thread in the pool); a "context" is a set of one or more threads with certain properties; the main one is the special UI context, which only gets one thread, which therefore can deadlock. This looks like a big mess to me, but as long as you aren't actually doing C# UI stuff you can ignore it.
See ConfigureAwait etc. There seems to be lots of control you might want that's intentionally missing. Things like how many real threads are in your thread pool; also things like "run this task on this particular thread" is forbidden (or even just "stay on the same thread"; you can only stay on the same context, which may be several threads).
4. "await" is a coroutine yield.
You can only use "await" inside an "async" func because it relies on the structification.
It's very much like the old C-coroutines using switch trick. await is given an Awaitable (an interface to an async op). At that point your struct is enqueued on the thread pool to run again when the Awaitable is ready.
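To make the "struct plus switch" picture concrete, here is a hand-written C++ version of what a function with two awaits turns into under that reading. SumTwoCoroutine and Resume are hypothetical names, and this is a sketch of the idea only, not what the C# compiler actually emits.

```cpp
#include <cassert>

// Hand-rolled equivalent of something like:
//   async int SumTwo() { int a = await NextInput(); int b = await NextInput(); return a + b; }
struct SumTwoCoroutine
{
    int state = 0;       // which "case" to resume at
    int a = 0;           // locals live in the struct, not on the stack
    int result = 0;
    bool done = false;

    // Call once to start (the input is ignored), then once per awaited value.
    void Resume(int input)
    {
        switch (state)
        {
        case 0:                  // run up to the first await, then yield
            state = 1;
            return;
        case 1:                  // first awaited value arrived
            a = input;
            state = 2;
            return;              // yield at the second await
        case 2:                  // second awaited value arrived; finish
            result = a + input;
            done = true;
            state = 3;
            return;
        default:
            assert(!"Resume called after completion");
        }
    }
};
```

Each "return" is a yield back to whoever called Resume, and the state field is what lets the next call pick up where the last one left off.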
"await" is a yield, so you may return to your caller immediately at the point that you await.
Note that because of this, "async/await" functions cannot have return values (* except for Task which we'll see next).
Note that "await" is the point at which an "async" function actually becomes async. That is, when you call an async function, it is *not* initially launched to the thread pool, instead it initially runs synchronously on the calling thread. (this is part of a general effort in the WinRT design to make the async functions not actually async whenever possible, minimizing thread switches and heap allocations). It only actually becomes an APC when you await something.
(aside : there is a hacky "await Task.Yield()" mechanism which kicks off your synchronous invocation of a coroutine to the thread pool without anything explicit to await)
I really don't like the name "await" because it's not a "wait" , it's a "yield". The current thread does not stop running, but the current function might be enqueued to continue later. If it is enqueued, then the current thread returns out of the function and continues in the calling context.
One major flaw I see is that you can only await one async; there's no yield_all or yield_any. Because of this
you see people writing atrocious code like :
await x;
await y;
await z;
stuff(x,y,z);
Now they do provide a Task.WhenAll and Task.WhenAny , which create proxy tasks that complete when the desired
condition is met, so it is possible to do it right (but much easier not to).
Of course "await" might not actually yield the coroutine; if the thing you are awaiting is already done, your coroutine may continue immediately. If you await a task that's not done (and also not already running), it might be run immediately on your thread. They intentionally don't want you to rely on any certain flow control, they leave it up to the "scheduler".
5. "Task" is a future.
The Task< > template is a future (or "promise" if you like) that provides a handle to get the result of a coroutine when it eventually completes. Because of the previously noted problem that "await" returns to the caller immediately, before your final return, you need a way to give the caller a handle to that result.
IAsyncOperation< > is the lower level C++/COM version of Task< > ; it's the same thing without the helper methods of Task.
IAsyncOperation.Status can be polled for completion. IAsyncOperation.GetResults can only be called after completed. IAsyncOperation.Completed is a callback function you can set to be run on completion. (*)
So far as I can tell there is no simple way to just Wait on an IAsyncOperation. (you can "await"). Obviously they are trying hard to prevent you from blocking threads in the pool. The method I've seen is to wrap it in a Task and then use Task.Wait()
(* = the .Completed member is a good example of a big annoyance : they play very fast-and-loose with documenting the thread safety semantics of the whole API. Now, I presume that for .Completed to make any sense it must be a thread-safe accessor, and it must be atomic with Status. Otherwise there would be a race where my completion handler would not get called. Presumably your completion handler is called once and only once. None of this is documented, and the same goes across the whole API. They just expect it all to magically work without you knowing how or why.)
(it seems that .NET used to have a Future< > as well, but that's gone since Task< > is just a future and having both is pointless (?))
So, in general if I read it as :
"async" = "coroutine" (hacky C switch + functor encapsulation)
"await" = yield
"Task" = future
then it's pretty intuitive.
What's missing?
Well there are some places that are syntactically very ugly, but possible. (eg. working with IAsyncOperation/IAsyncInfo in general is super ugly; also the lack of simple "await x,y,z" is a mistake IMO).
There seems to be no way to easily automatically promote a synchronous function to async. That is,
if you have something like :
int func1(int x) { return x+1; }
and you want to run it on a future of an int (Task< int >) , what you really want is just a simple syntax like :
future<int> x = some async func that returns an int
future<int> y = start func1( x );
which makes a coroutine that waits for its args to be ready and then runs the synchronous function.
(maybe it's possible to write a helper template that does this?)
Now it's tempting to do something like :
future<int> x = some async func that returns an int
int y = func1( await x );
and you see that all the time in example code,
but of course that is not the same thing at all and has many drawbacks (it waits immediately even though "y"
might not be needed for a while, it doesn't allow you to create async dependency chains, it requires you are
already running as a coroutine, etc).
The bigger issue is that it's not a real stackful coroutine system, which means it's not "composable",
something I've written about before :
cbloom rants 06-21-12 - Two Alternative Oodles
cbloom rants 10-26-12 - Oodle Rewrite Thoughts
Specifically, a coroutine cannot call another function that does the await. This makes sense if you think of the "await" as being the hacky C-switch-#define thing, not a real language construct. The "async" on the func is the "switch {" and the "await" is a "case ". You cannot write utility functions that are usable in coroutines and may await.
To call functions that might await, they must be run as their own separate coroutine. When they await,
they block their own coroutine, not your calling function. That is :
int helper( bool b , AsyncStream s )
{
if ( b )
{
return 0;
}
else
{
int x = await s.Get<int>();
return x + 10;
}
}
async Task<int> myfunc1()
{
AsyncStream s = open it;
int x = helper( true, s );
return x;
}
The idea here is that "myfunc1" is a coroutine, it calls a function ("helper") which does a yield; that
yields out of the parent coroutine (myfunc1).
That does not work and is not allowed. It is what I would like to see in a good coroutine-centric language.
Instead you have to do something like :
async Task<int> helper( bool b , AsyncStream s )
{
if ( b )
{
return 0;
}
else
{
int x = await s.Get<int>();
return x + 10;
}
}
async Task<int> myfunc1()
{
AsyncStream s = open it;
int x = await helper( true, s );
return x;
}
Here "helper" is its own coroutine, and we have to block on it. Now it is worth noting that because WinRT
is aggressive about delaying heap-allocation of coroutines and is aggressive about running coroutines
immediately, the actual flow of the two cases is not very different.
To be extra clear : lack of composability means you can't just have something like "cofread" which acts like synchronous fread , but instead of blocking the thread when it doesn't have enough data, it yields the coroutine.
You also can't write your own "cosemaphore" or "comutex" that yield instead of waiting the thread. (does WinRT provide cosemaphore and comutex? To have a fully functional coroutine-centric language you need all that kind of stuff. What does the normal C# Mutex do when used in a coroutine? Block the whole thread?)
There are a few places in the syntax that I find very dangerous due to their (false) apparent simplicity.
1. Args to coroutines are often references. When the coroutine is packaged into a struct and delayed
execution, what you get is a non-thread-safe pointer to some shared object. It's incredibly easy to
write code like :
async void func1( SomeStruct^ s )
{
s->DoStuff();
MoreStuff( s );
}
where in fact every touch of 's' is potentially a race and bug.
2. There is no syntax required to start a coroutine. This means you have no idea if functions are
async or not at the call site!
void func2()
{
DeleteFile("b");
CopyFile("a","b");
}
Does this code work? No idea! They might be coroutines, in which case DeleteFile might return before it's
done, and then I would be calling CopyFile before the delete. (if it is a coroutine, the fix is to call
"await", assuming it returned a Task).
Obviously the problem arises from side effects. In this case the file system is the medium for communicating side effects. To use coroutine/future code cleanly, you need to try to make all functions take all their inputs as arguments, and to return all their effects as return values. Even if the return is not necessary, you must return some kind of proxy to the change as a way of expressing the dependency.
"async void" functions are probably bad practice in general; you should at least return a Task with no data (future< void >) so that the caller has something to wait on if they want to. async functions with side effects are very dangerous but also very common. The fantasy that we'll all write pure functions that only read their args (by value) and put all output in their return values is absurd.
It's pretty bold of them to make this the official way to write new code for Windows. As an experimental C# language feature, I think it's pretty decent. But good lord man. Race city, here we come. The days of software having repeatable outcomes are over!
As a software design point, the whole idea that "async improves responsiveness" is rather disturbing. We're gonna get a lot more trickle-in GUIs, which is fucking awful. Yes, async is great for making tasks that the user *expects* to be slow to run in the background. What it should not be used for is hiding the slowness of tasks that should in fact be instant. Like when you open a new window, it should immediately appear fully populated with all its buttons and graphics - if there are widgets in the window that take a long time to appear, they should be fixed or deleted, not made to appear asynchronously.
The way web pages give you an initial view and then gradually trickle in updates? That is fucking awful and should be used as a last resort. It does not belong in applications where you have control over your content. But that is exactly what is being heavily pushed by MS for all WinRT apps.
Having buttons move around after they first appeared, or having buttons appear after the window first opened - that is *terrible* software.
(Windows 8 is of course itself an example; part of their trick for speeding up startup is to put more things delayed until after startup. You now have to boot up, and then sit there and twiddle your thumbs for a few minutes while it actually finishes starting up. (there are some tricks to reduce this, such as using Task Scheduler to force things to run immediately at the user login event))
Some links :
Jerry Nixon @work Windows 8 The right way to Read & Write Files in WinRT
Task.Wait and “Inlining” - .NET Parallel Programming - Site Home - MSDN Blogs
CreateThread for Windows 8 Metro - Shawn Hargreaves Blog - Site Home - MSDN Blogs
Diving deep with WinRT and await - Windows 8 app developer blog - Site Home - MSDN Blogs
Exposing .NET tasks as WinRT asynchronous operations - Windows 8 app developer blog - Site Home - MSDN Blogs
Windows 8 File access sample in C#, VB.NET, C++, JavaScript for Visual Studio 2012
Futures and promises - Wikipedia, the free encyclopedia
Effective Go - The Go Programming Language
Deceptive simplicity of async and await
async (C# Reference)
Asynchronous Programming with Async and Await (C# and Visual Basic)
Creating Asynchronous Operations in C++ for Windows Store Apps
Asynchronous Programming - Easier Asynchronous Programming with the New Visual Studio Async CTP
Asynchronous Programming - Async Performance Understanding the Costs of Async and Await
Asynchronous Programming - Pause and Play with Await
Asynchronous programming in C++ (Windows Store apps) (Windows)
AsyncAwait Could Be Better - CodeProject
File Manipulation in Windows 8 Store Apps
SharpGIS Reading and Writing text files in Windows 8 Metro
MS has gone from by far the most beloved sweet simple console API to develop for to this :
[System::PleaseLetMyAppRun::PrettyPlease]
Universe::MilkyWay::SolarSystem::Earth::DataTypes::Void
main( some complicated args that don't matter because they don't work ^ hat )
{
IPrintf^ p = System::GoodLord::Deprecated::stdio::COM::AreYouKiddingMe( IPrintfToken );
p->OnReady( [this]{ return CharStreamer( StreamBufferBuilder( StringStreamer( StringBuffer( CharConcatenator('h') +
CharConcatenator('e') + IQuitSoftware("llo world\n") )))) } );
}
(this example fails because it didn't request privilege elevation with the security token to access the console)
(and then still fails because it didn't list its imports correctly in its manifest xml)
1. The Win-X key. Win-X is mine; I've been using it to mean "maximize window" for the last 10 years. You can't have it. I've figured out how to disable their Win-X menu, but they still seem to be eating that key press somewhere very early, before I can see it. (they also stole my Win-1 and various other keys, but those went away with the NoWinKeys registry setting; Win-X seems unaffected by that setting).
2. Win 8 seems to have even more UAC than previous versions. As usual you can kill most of it by turning UAC down to min, setting Everyone to Full Control, and Taking Ownership of c:\ recursively. But apparently in Win 8 when you turn the UAC slider down to min, it no longer actually goes to off. Before Win 8, with UAC set to min all processes were "high integrity", now processes have to request elevation from code. One annoyance I haven't figured out how to fix is that net-used and subst'ed drives are now per-user. eg. if you open an admin cmd and do a subst, the drive doesn't show up in your normal explorer (and vice-versa).
3. There seems to be no way to tweak the colors, and the default colors are really annoying. Somebody thought it was a good idea to make every color white or light gray so all the windows and frames just run together and you can't easily spot the edges. You *can* tweak individual colors if you choose a "high contrast" theme (it's pretty standard on modern Windows that you only get the options you deserve by pretending to be disabled (reasonable things like "no animations" are all hidden in "accessibility")) - however, the "high contrast" theme seems to confuse every app (devenv, firefox) such that they use white text on white backgrounds. Doh.
Once you get Win 8 set up, it's basically Win 7. I don't understand what they were thinking putting the tablet UI as the default on the desktop. Mouse/keyboard user interface is so completely different from jamming your big fat clumsy fingers into a screen that it makes no sense to try to use one on the other. You wouldn't put tiny little buttons on a tablet, so why are you putting giant ham-finger tablet buttons on my desktop? Oh well, easy to disable.
So far the only improvement I've noticed (over Win 7) is that Windows Networking seems massively improved (finally, thank god). It might actually be faster to copy files across a local network than to re-download them from the internet now.
Some general OS ranting :
An OS should be a blank piece of paper. It is a tool for *me* to create what *I* want. It should not have a visual style. It should not be "designed" any more than a good quality blank piece of paper is designed.
(that said I prefer the Win 8 look to anything MS has done since Win 2k (which was the pinnacle of Windows, good lord how sweet it would be if I could still use Win 2k); Aero was an abortion, you don't base your OS GUI design on WinAmp for fuck's sake, though at least with the Aero-OS'es you could easily get a "classic" look, which is not so easy any more)
It's almost impossible to find an OS that actually respects its users any more. I want control of *everything*. If you add some new feature, fine, let me turn it off. If you change my key mappings, fine, let me put them back the way I'm used to.
I despise multi-user OS'es. In my experience they never actually work for security, and they are a constant huge pain in the ass. If you all want to make multi-user OS'es, please just give me a way to get a no-users install with a flat file system and just one set of configs. Nobody but me will ever touch my computer, I don't need this extra layer of shit that adds friction every single day that I use a computer (is that config under "cbloom" or is it under "all users"? Fuck, why do I have to think about this, give me a god damn flat OS. Oh wait the config was under "administrators" or "local machine". ARG). I know this is not gonna happen. Urg.
While we're at it can we talk about how ridiculously broken Windows is in general now?
One of the most basic things you might want to do with an OS is to take your apps and data and config from one machine and put it on another. LOL, good luck. I know, let's just take all the system hardware config and the user settings and the app installs and let's just shuffle them all up in a big haystack.
Any serious developer needs to be able to clone their dev software from one machine to another, or completely back up their code + dev tools (but without backing up the whole machine, and be able to restore to a different machine).
Obviously the whole DLL mess is a disaster (and now manifests, packages, SxS assemblies, .net frameworks, WTF WTF). It's particularly insane to me that MS can't even get it right with their own damn runtimes. How in hell is it that I can run an exe on Windows and get an "msvcrxx not found" error? WTF, just ship all your damn runtimes with Windows, it can't be that big. And even if you don't want to ship them all, how can you not just have a server that gives me the missing runtimes? It's so insane.
God help you if you are trying to write software that can build on various installs of windows. Oh you have Win Vista SP1 with Platform SDK N, that's a totally different header which is missing some functions and causes some other weird warning, and you need .net framework X on this machine and blah blah it's such a total mess.
Ok, so XML suxors and all, but if you're going to use XML then *use XML*. When you rev the god damn devstudio you don't break the old file format, you just add new tags for whatever new crap you feel you need to add. You don't put the devstudio version in the header of the file, you put it on the individual tags that are specific to that version.
If you need to do per-version settings files, put them in a different file than my basic list of what my source code is and how to build it. And of course don't mix up your GUI cache with my project data.
The thing that really boggles my mind is how they can make such a huge mistake, and then stick with it year after year. It's sort of understandable to make a mistake once (though I think this one was entirely avoidable), but then you go "whoah what a fuckup, let's change that". Nope.
(of course they've done the same thing with their flagship (Office). It's crazy broken that I can't at least load the text and basic formatting from any type of document into any version)
Something I've noticed is that in the last month or so a lot of cars with histories like this are popping up :
09/06/2012 70,668 Inspection Co. New Jersey Inspection performed
11/20/2012 74,471 Covert Ford Austin, TX Car offered for sale
You're not fooling me, bub. I know what happened in NJ between those two dates! Beware!
cbloom rants 07-19-12 - Experimental Futures in Oodle
cbloom rants 10-26-12 - Oodle Rewrite Thoughts
It occurs to me that this could massively simplify the giant API.
What you do is treat "array data" as a special type of object that can be linearly broken up. (I noted previously about having RW locks in every object and special-casing arrays by letting them be RW-locked in portions instead of always locking the whole buffer).
Then arrays could have two special ways of running async :
1. Stream. A straightforward futures sequence to do something like read-compress-write would wait for the whole file read to be done before starting the compress. What you could do instead is have the read op immediately return a "stream future" which would be able to dole out portions of the read as it completed. Any call that processes data linearly can be a streamer, so "compress" could also return a stream future, and "write" would then be able to write out compressed bits as they are made, rather than waiting on the whole op.
2. Branch-merge. This is less of an architectural thing than just a helper (you can easily write it client-side with normal futures); it takes an array and runs the future on portions of it, rather than running on the whole thing. But having this helper in from the beginning means you don't have to write lots of special case branch-merges to do things like compress large files in several pieces.
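A minimal sketch of the branch-merge helper from item 2, written client-side with plain std::async futures. BranchMerge and SumBytes are hypothetical names made up for illustration (not Oodle functions), and SumBytes is just a stand-in for a per-piece op like compress :

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper (not an Oodle API): split [data, data+size) into
// pieces of at most chunk_size elements, run fn(ptr, count) on each piece
// as its own future, then wait on every piece and collect results in order.
template <typename T, typename Fn>
auto BranchMerge(const T* data, std::size_t size, std::size_t chunk_size, Fn fn)
    -> std::vector<decltype(fn(data, size))>
{
    assert(chunk_size > 0);
    using Result = decltype(fn(data, size));

    // Branch: one future per piece.
    std::vector<std::future<Result>> pieces;
    for (std::size_t pos = 0; pos < size; pos += chunk_size) {
        const std::size_t count = std::min(chunk_size, size - pos);
        pieces.push_back(std::async(std::launch::async, fn, data + pos, count));
    }

    // Merge: gather the per-piece results in the original order.
    std::vector<Result> results;
    results.reserve(pieces.size());
    for (auto& piece : pieces)
        results.push_back(piece.get());
    return results;
}

// Stand-in for "compress one piece"; here it just sums the bytes.
inline unsigned long SumBytes(const unsigned char* p, std::size_t n)
{
    unsigned long sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += p[i];
    return sum;
}
```

Calling BranchMerge(buf.data(), buf.size(), 4, SumBytes) on a 10-byte buffer runs three pieces (4, 4 and 2 bytes). A real version would run the pieces on the worker pool rather than spawning a thread per piece.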
So you basically just have a bunch of simple APIs that don't look particularly Async. Read just returns a buffer (future). ReadStream returns a buffer stream future. They look like simple buffer->buffer APIs and you don't have to write special cases for all the various async chains, because it's easy for the client to chain things together as they please.
To be redundant, the win is that you can write a function like Compress() and you write it just like a synchronous buffer-to-buffer function, but its arguments can be futures and its return value can be a future.
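Roughly what that looks like, sketched with std::future standing in for the real future type. Everything here is an assumption for illustration : Buffer, CompressSync, Compress and Read are hypothetical names, the "compressor" is a toy run-length encoder, and Read does no actual IO.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Toy synchronous buffer-to-buffer "compressor": run-length encodes the
// input as (count, byte) pairs. Only a placeholder for real work.
inline Buffer CompressSync(const Buffer& in)
{
    Buffer out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<unsigned char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Hypothetical wrapper: takes a future, returns a future.
// The wait on the input happens inside the async task, not in the caller.
inline std::future<Buffer> Compress(std::future<Buffer> in)
{
    assert(in.valid());
    return std::async(std::launch::async,
        [](std::future<Buffer> f) { return CompressSync(f.get()); },
        std::move(in));
}

// Stand-in for an async file read (hypothetical; no real IO here).
inline std::future<Buffer> Read(Buffer contents)
{
    return std::async(std::launch::async,
        [](Buffer b) { return b; }, std::move(contents));
}
```

The client chains them as Compress(Read(data)) and only blocks when it finally calls get() on the result. Note that this sketch parks a whole thread in f.get(); it does not do the coroutine Yield described next.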
Compress() should actually be a stackful coroutine, so that if the input buffer is a Stream buffer, then when you try to access bytes that aren't yet available in that buffer, you Yield the coroutine (pending on the stream filling).
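A sketch of the access pattern, with one big caveat : standard C++ has no stackful coroutines, so this version blocks the calling thread on a condition variable at the point where the real thing would Yield. StreamBuffer, SumStream and Demo are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical "stream buffer": a fixed-size buffer filled front-to-back
// by a producer while a consumer reads behind it.
class StreamBuffer {
public:
    explicit StreamBuffer(std::size_t size) : data_(size), filled_(0) {}

    std::size_t size() const { return data_.size(); }

    // Producer side: copy up to n bytes in at the current fill position.
    void Append(const unsigned char* src, std::size_t n)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (std::size_t i = 0; i < n && filled_ < data_.size(); ++i)
                data_[filled_++] = src[i];
        }
        more_.notify_all();
    }

    // Consumer side: read byte i, pending until the stream has filled past it.
    // ASSUMPTION: a real version would Yield the coroutine here; this sketch
    // does a thread Wait instead.
    unsigned char At(std::size_t i)
    {
        assert(i < data_.size());
        std::unique_lock<std::mutex> lock(mutex_);
        more_.wait(lock, [&] { return i < filled_; });
        return data_[i];
    }

private:
    std::vector<unsigned char> data_;
    std::size_t filled_;
    std::mutex mutex_;
    std::condition_variable more_;
};

// Written like plain synchronous code; it never asks "is the data here yet".
inline unsigned long SumStream(StreamBuffer& s)
{
    unsigned long sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i) sum += s.At(i);
    return sum;
}

// Usage: a producer thread fills the stream in two pieces while the
// caller consumes behind it.
inline unsigned long Demo()
{
    StreamBuffer s(6);
    std::thread producer([&s] {
        const unsigned char a[3] = {1, 2, 3};
        const unsigned char b[3] = {4, 5, 6};
        s.Append(a, 3);
        s.Append(b, 3);
    });
    const unsigned long total = SumStream(s);
    producer.join();
    return total;
}
```

The point is that SumStream has no async plumbing in it at all; the pending happens inside the buffer access.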
Functions take futures as arguments and return futures.
Every function is actually run as a stackful coroutine on the worker threadpool.
Functions just look like synchronous code, but things like file IO cause a coroutine Yield rather than a thread Wait.
All objects are ref-counted and create automatic dependency chains.
All objects have built-in RW locks, arrays have RW locks on regions.
Parallelism is achieved through generic Stream and Branch/Merge facilities.
While this all sounds very nice in theory, I'm sure in practice it wouldn't work. What I've found is that every parallel routine I write requires new hacky special-casing to make it really run at full efficiency.
1. The API is too big, it's too complicated. There are too many ways of doing basically the same thing, and too many arguments and options on the functions.
2. I really want to be able to do X but I can't do it exactly the way I want with the API, can you add another interface?
Neither one is wrong.
The structure of all these laws is basically the same : 1. allow tax cuts to pass by simple majority. 2. make tax raises difficult to pass (many states now require 2/3 super-majority for tax increases (and they were already politically nearly impossible)). 3. set a debt limit and force mandatory cuts to balance the budget (many states actually have a debt limit of 0, every year's budget must be balanced, which is manifestly absurd given the variance of economies and the resulting receipts and costs).
They claim that this produces "fiscal responsibility" but of course they know that's a lie; the goal is small government and that's all it produces. If you wanted actual fiscal responsibility, you wouldn't cut taxes in flush times, instead you would require that governments save during surplus times to provide a cushion for recessionary times; you would also require that any tax *cut* is matched by spending cuts in order to pass. If you were actually fiscally responsible, you would allow deficits during recessions, but require them to be matched by tax raises in boom periods.
The result of these laws is obvious : a simple Republican majority can pass tax cuts when they are in power (and the dumb voters will love it), then it's almost impossible to put the taxes back where they were, and then you inevitably run out of money and have to cut spending. Particularly if you hit a recession and have to keep the budget balanced, you will have to slash government drastically.
(whether or not government should be minimal is open for debate, but the duplicitous method of achieving it is incontrovertibly scummy)
So first of all, let's recall the source of the fiscal cliff. It is not the growth of entitlements, which is sort of an unrelated long term issue that people love to mix in to any financial discussion. The primary causes of the short term deficit are the Bush tax cuts and the recession. (the other major factors are the war spending and TARP spending (etc)). This is not a fundamental problem of the way the US government is run, it's the combination of cutting taxes and increasing spending that happened under GWB.
The other major issue that we must keep in mind is that we are still currently deep in a recession. Tiny amounts of GDP growth may hide this, and the unemployment numbers look better, but I believe the reality is that the American economy is still deeply sick, with no real growth of industry and no prospects. Essentially we are propping it up with the free money from the Fed and the super-low taxes. Any attempt to return to a sustainable Fed interest rate and tax rate would show the economy for what it really is. Trying to tighten the belt now would certainly look bad; I don't say that it would "lead to a recession", I believe we are in a recession and are just hiding it with a candy coating.
Now, briefly about entitlements. The Republicans love to make entitlement growth seem really scary, but it's not true. Social Security can be made solvent very easily : simply make the SS tax non-regressive. The current SS tax is regressive because it's a flat percentage but has a maximum. If you simply remove the maximum, Social Security is solvent for the next 100 years (CBO numbers).
Medicare is a bigger problem, but not because of the increase of the number of elderly - rather due to the corruption of doctors and the medical establishment. With increased productivity and technology, the cost of health care should be going *down*, instead it rises at an obscene rate, because the insurance complex has cooked up a system where we have no control over the cost of our care. Unfortunately Obamacare has perhaps made this worse than ever, locking the corrupt health insurance system into law without taking any steps to limit private profits.
How do you actually fix the American economy and get some real growth that's not just an illusion propped up by free Fed money?
1. Legally require open systems. Make net neutrality law. Open up the cable-TV lines. Perhaps the best option is a national open broadband system on some new super-fast fiber (unrealistic). Make the Apple Store type of computer lockdown illegal. Openness and free competition for small business is what will really save this country.
2. Make it easier to start small businesses. Remove the favoritism for big business. Tax loopholes and breaks massively favor big business - eliminate them all. Eliminate all development and "green" subsidies, which again massively favor big business. Simplify the tax code (see below) and then perhaps even simplify it more for small businesses, like provide a super-basic flat tax option for businesses that make less than $1M a year.
3. Make it cheaper to hire Americans in America. Eliminate payroll taxes. Eliminate employer-run health care (or provide a national group option for small businesses). Increase taxes on corporate profits and aggressively go after offshoring of money.
4. Long term we're fucked no matter what. What would you say are the prospects for a country where the education system sucks (the cost of education continues to rise way faster than inflation, and most of the "educated" can't actually do anything useful), the IT infrastructure sucks, and the cost of labor is sky-high? You would say that country is doomed to poverty, and that it is.
A few proposals for real government taxing & spending reform :
O. Get corporate money out of politics or everything else is hopeless.
O. Stop the revolving door between government and private industry. eg. if you work on the Texas Railroad Commission, then you are not allowed to go work for the oil/gas industry (and vice-versa). Treasury secretaries shouldn't be allowed to rotate in and out of wall street. It's totally absurd and like corporate speech, everything else is hopeless until it's stopped.
O. Return defense spending to 1999 levels ($300 billion from the current $700+ billion). And then cut it even further. Never going to happen since defense is the biggest pork item in government (by far).
O. Stop all farm subsidies and tax breaks. They're a sick farce. The small family farmers that are trotted out for political purposes don't exist in the real world; farm subsidies go to large agribusiness and to rich people with vineyards. They're actually very bad for small farmers that are trying to legitimately compete because they massively favor big business. Not only does it make the American farm economy sick, we're destroying the entire world food economy with our export subsidies.
O. Stop all direct aid for ethanol, electric cars, etc. They're a sick distortion of the market that isn't helping anything except corrupt profit. Let the market find solutions to problems.
O. Stop sending federal money to leech states. (need to get rid of the ridiculous over-powering of small states caused by the Senate)
O. Eliminate all payroll taxes. Fund medicare, SS, unemployment, etc. from the general tax revenue. This massively simplifies the tax code, removes the regressive SS tax, and reduces the cost of employment.
O. Don't cut Medicare spending or fake out the inflation rate for COLA. Instead go after the reason why medical costs are rising out of control. Don't reimburse doctors for unnecessary procedures or scans. Don't reimburse for unnecessary MRI's. Don't allow any medical practitioner to pass on the fee in excess of the negotiated rates to the client. Require up-front pricing for all medical treatment. Force the AMA to stop its corrupt limiting of the number of doctors. etc. etc.
O. Eliminate capital gains tax. I don't mean reduce it, I mean treat all profit as profit - tax it as normal income. Eliminate the dividend loophole. Stop letting the super rich pay 10-15% tax rates.
O. Eliminate all tax deductions. Nothing is deductible (but raise the standard deduction so that a majority of Americans actually get to deduct more). Alternatively : raise the AMT and remove exceptions from the AMT.
O. Remove foreign residency as a way to avoid US taxes. If you do business in America, you pay US taxes. Same for corporate taxes. Remove non-income benefits as a way to avoid taxes; eg. company cars, apartments, dinners, etc. all count as income. Pass new laws so we can be more aggressive about going after holding companies or "consulting firms" as ways to hide personal income.
O. Make US companies pay for our foreign spending on their behalf. eg. if you're Chevron and want to run a pipeline through Afghanistan, fine go ahead, but then you pay for the Afghanistan war. etc. Almost all of our defense and foreign aid spending should be paid by the companies that do business in unstable countries.
(I'm lifting the roof on their house) (there are five there, they're all sitting on top of the one broody hen that never leaves that box)
We've got hawthorn trees at the house which are unusual for Seattle (they don't belong here). In the fall/winter the leaves drop and they are covered in berries which are inedible to humans but are apparently like ambrosia to birds and squirrels. We get incursions from neighboring squirrels that the resident ones have to fend off with much shrieking, and of course lots of little birds come through in packs, which seems to be attracting the predator.
I was a bit worried that our cats would take advantage of this bountiful hunting ground (it's really perverse when people with cats set up bird feeders, and having super-delicious trees is not much removed from that), but so far that hasn't really happened.
I believe that human beings are only comfortable living with people they are intimate with. In ancient days this was your whole tribe, now it's usually just family. You essentially have no privacy from these people, and not even separate property. Trying to keep your own stuff is an exercise in frustration. You must trust these people and work together and open up to them to be happy. Certainly there is always friction in this, but it's a natural human existence, and even though it may give you lots to complain about, there will also be joy. (foolishly moving away from this way of life is the root of much unhappiness)
Everyone else is an enemy. If you aren't in my intimate tribal group, WTF are you doing near my home? This is my land where I have my family, I will fucking jab you in the eye with a pointy stick.
I'm not really comfortable with "friends". So-called "friends" are not your friends; they will make fun of you behind your back, they will let you down when you need help. You can't ever really open up and admit things to them, you can't show your weaknesses, they will mock you for it or use your weaknesses against you. It's so awful to me the way normal people talk to each other; everyone is pretending to be strong and happy all the time, nobody ever talks about anything serious, some people put on a big show to be entertaining, it's all just so tense and shallow and unpleasant. The reason is that these people are not in my tribe, hence they are my enemies, and this whole "friends" thing is a very modern societal invention that doesn't really work.
I realized a while ago that this is one of the things I hate about going into the office. The best office experiences I've had have been the ones where it was a small team, we were all young and spent lots of time at work, and we actually bonded and had fun together and were almost like a family after several years of crunching (going through tough shit together is a classic way to brainwash people into acting like a tribe); at that point, it feels comfortable to go in to work, you can rip a fart and people laugh instead of tensely pretending that nothing happened. But most of the time an office never reaches that level of intimacy, the people in the office are just acquaintances, so you're in this space where you're supposed to be relaxed, and there are people walking around all the time looking over your shoulder, but they are enemies! Of course I can't relax, they're not my tribe, why are they in my space? It's terrible.
Going away from home to work is really unnatural.
At first when people start working from home it feels weird because they're so used to leaving, but really this whole going to a factory/office thing is a super-modern invention (last few hundred years).
Of course you should stay home and work your farm. You should build your house, till your field, and be there to protect your women and children (eg. in the modern world : open jars for them). Of course you should have your children beside you so that you can talk to them and teach them your trade as you work.
Of course when you're hungry you should go in to your own kitchen and eat some braised pork shoulder that's real simple hearty food cooked by your own hands, not the poisonous filth that restaurants purvey.
You shouldn't leave your family for 8 hours every day, that's bizarre and horrible. You should see your livestock running around, be there to shoo away the neighbors' cats, see the trees changing color, and put your time and your love into what is really yours.
cbloom's definitive SSD buying guide :
recommended :
Intel's whatever (currently the 520, but actually the old X25-M is still just fine; the S3700 stuff looks promising for the future)
not recommended :
Everything else.
The whole issue of flash degradation and moving blocks and such is a total red herring. SSD's are not failing because of the theoretical lifetime of flash memory, they are failing because the non-Intel drives are just broken. It's pretty simple, don't buy them.
The other issue I really don't care about is speed. They're all super fast. If they all actually worked then maybe I would care which was fastest, but since the non-Intel ones are just broken, the question of speed is irrelevant. The hardware review sites are all pretty awful with their endless benchmarking and complete missing of the actual issues. And even my ancient X25-M is plenty fast enough.
I think it's tempting to just go for the enterprise-grade stuff (Intel 710 at the moment). Saving money on storage doesn't make any sense to me, and all the speed measurement stuff just makes me yawn. (Intel 720 looks promising for the future). It's not quite as clear cut as ECC RAM (which is obviously worth it), but I suspect that spending another few $hundred to not worry about drive failure is worth it.
Oh, also brief googling indicates various versions of Mac OS don't support various SSD's correctly. I would just avoid SSD's on Mac unless you are very confident about getting this right. (best practice is probably just avoiding Mac altogether, but YMMV and various other acronyms)
And in context :
The chickens were out free ranging at the time; they all ran inside the coop and climbed back into the farthest corner nesting box and sat on top of each other in a writhing pile of terrified chickens.
Watching animals is pretty entertaining. I remember when I was younger, I used to think it was a pathetic waste of time. Old people would sit around and watch the cats play, or get all excited about going on safari or whatever, and I would think "ppfft, boring, whatever, I've seen it on TV, what a sad vapid way to get entertainment, you oldsters are all so brain-dead, doing nothing with your time, you could be learning quantum field theory, but you've all just given up on life and want to sit around smiling at animals". Well, that's me now.
Going through old notes I found this (originally from Road and Track) :
"For instance, just about every Audi, Porsche and Volkswagen model that I've driven in the U.S. doesn't allow throttle/brake overlap. Our long-term Nissan 370Z doesn't, either, which is a big reason why I'm not particularly keen on taking it out for a good flog; overlap its throttle and brake just a little bit and the Z cuts power for what seems an eternity (probably about two seconds)."
VAG makes fucked up cars. I certainly won't ever buy a modern one again. They have extremely intrusive computers that take the power for LOLs out of the driver's hands. (apparently the 370Z has some stupidity as well; this shit does not belong in cars that are sold as "driver's cars").
(in case it wasn't clear from the above : you cannot left-foot-brake a modern Porsche with throttle overlap. Furthermore, you also can't trail-brake oversteer a modern Porsche because ESC is always on under braking. You have to be careful going fully off throttle and then back on due to the off-throttle-timing-advance. etc. etc. probably more stupid shit I'm not aware of. This stuff may be okay for most drivers in their comfort saloons, but is inexcusable in a sports car)
Anyway, I'm posting cuz this reminded me that I found another good little mod for the 997 :
Stupid VAG computer has clutch-release-assist. What this does is change the engine map in the first few seconds after you let the clutch out. The reason they do this is so that incompetent old fart owners don't stall the car when pulling away from a light, and also to help you not burn the clutch so much. (the change to the engine map increases the minimum throttle and also reduces the max).
If you actually want to drive your car and do hard launches and clutch-kicks and generally have fun, it sucks. (the worst part is when you do a hard launch and turn, like when you're trying to join fast traffic, and you get into a slight slide, which is fine and fun, but then in the middle of your maneuver the throttle map suddenly changes back as the clutch-assist phase ends, and the car sort of lurches and surges weirdly, it's horrible). Fortunately disabling it is very easy :
There's a sensor that detects clutch depression. It's directly above the clutch in the underside of the dash. You should be able to see the plastic piston for the sensor near the hinge of the clutch pedal. All you have to do is unplug the sensor (it's a plastic clip fitting)
With the sensor unplugged you get no more clutch-release-assist and the car feels much better. You will probably stall it a few times as you get used to the different throttle map, but once you're used to it smooth fast starts are actually easier. (oh, and pressing the clutch will no longer disable cruise control, so be aware of that). I like it.
(aside : it's a shame that all the car magazines are such total garbage. If they weren't, I would be able to find out if any modern cars are not so fucked. And you also want to know if they're easy to fix; problems that are easy to fix are not problems)
(other aside : the new 991-gen Cayman looks really sweet to me, but there are some problems. I was hoping they would use the longer wheelbase to make the cabin a bit bigger, which apparently they didn't really do. They also lowered the seat and raised the door sills which ruin one of the great advantages of the 997-gen Porsches (that they had not adopted that horrible trend of excessively high doors and poor visibility). But the really big drawback is that I'm sure it's all VAG-ed up in stupid ways that make it annoying for a driver. And of course all the standard Cayman problems remain, like the fact that they down-grade all the parts from the 911 in shitty ways (put the damn multi-link rear suspension on the Cayman you assholes, put an adjustable sway on it and adjustable front control arms))
(final aside : car user interface design is generally getting worse in the last 10-20 years. Something that user interface designers used to understand but seem to have forgotten is that the most potent man-machine bond develops when you can build muscle memory for the device, so that you can use it effectively with your twitch reflexes that don't involve rational thought. In order for that to work, the device must behave exactly the same way at all times. You can't have context-sensitive knobs. You can't have the map of the throttle or brake pedal changing based on something the car computer detected. You must have the same outcome from the same body motion every time. This must be an inviolable principle of good user interfaces.)
Seattle is a somewhat beautiful place (I'm not more enthusiastic because it is depressing to me how easily it could have been much better (and it continues to get worse as the modern development of Cap Hill and South Lake Union turn the city into a generic condo/mall dystopia)) but I just don't see it any more. When we got back from CA I realized that I just don't see the lake and the trees anymore, all I see is "home".
There are some aspects that still move me, like clear views of the Olympics, because they are a rare treat. But after 4 years, the beauty all around is just background.
We have pretty great views from our house, and I sort of notice them, but really the effect on happiness of the view is minimal.
(* = there are benefits to houses with a view other than the beauty of the view. Usually a good view is associated with being on a hill top, or above other people, or up high in a condo tower, and those have the advantages of being quieter, better air, more privacy, etc. Also having a view of nature is an advantage just in the fact that it is *not* a view of other people, which is generally stressful to look at because they are doing fucked up things that you can't control. I certainly appreciate the fact that our house is above everyone else; it's nice to look down on the world and be separate from it).
I was driving along Lake Wash with my brother this summer and he made some comment about how beautiful it was, and for a second there I just couldn't figure out what he was talking about. I was looking around to see if there was some new art installation, or if Mount Rainier was showing itself that day, and then I realized that he just meant the tree lined avenue on the lake and the islands and all that which I just didn't see at all any more.
Of course marrying for beauty is a similar mistake. Even ignoring the fact that beauty fades, if we imagine that it lasted forever it would still be a mistake because you would stop seeing it.
I've always thought that couples could keep the aesthetic interest in each other alive by completely changing their style every few years. Like, dress as a hipster for a while, dress as a punk rocker or a goth, dress as a preppy business person. Or get drastically different hair cuts, like for men grow out your hair like an 80's rocker, or get a big Morrissey pompadour, something different. Most people over 30 tend to settle into one boring low-maintenance style for the rest of their lives, and it becomes invisible to the adapted eyes in their lives.
I suppose there are various tricks you can use; like rather than have your favorite paintings on the wall all the time, rotate them like a museum, put some in storage for a while and hang up some others. It might even help to roll some dice to forcibly randomize your selection.
I guess the standard married custom of wearing sweats around the house and generally looking like hell is actually a smart way of providing intermittent reward. It's the standard sitcom-man refrain to complain that your wife doesn't fancy herself up any more, but that's dumb; if she did dress up every day, then that would just become the norm and you would stop seeing it. Better to set the baseline low so that you can occasionally have something exceptional.
(add : hmm the generalized point that you should save your best for just a few moments and be shitty other times is questionable. Think about behavior. Should you intentionally be kind of dicky most of the time and occasionally really nice? If you're just nice all the time, that becomes the baseline and people take it for granted. I'm not sure about that. But certainly morons do love the "dicky dad" character in TV's and movies; your typical fictional football coach is a great example; dicky dad is stern and tough, scowly and hard on you, but then takes you aside and is somewhat kind and generous, and all the morons in the audience melt and just eat that shit up.)
One of the traps of life is optimizing things. You paint your walls your favorite color for walls, you think you're making things better, but that gets you stuck in a local maximum, which you then stop seeing, and you don't feel motivated to change it because any change is "worse".
I realized the other day that quite a few ancient societies actually have pretty clever customs to provide randomized rewards. For example lots of societies have something like "numbers" , which ignoring the vig, is just a way of taking a steady small income and turning it into randomized big rewards.
Say you got a raise and make $1 more a day. At first you're happy because your life got better, but soon that happiness is gone because you just get used to the new slightly better life and don't perceive it any more. If instead of getting that $1 a day, you instead get $365 randomly on average once a year, your happiness baseline is the same, but once in a while you get a really happy day. This is probably actually better for happiness.
I think the big expensive parties that lots of ancient societies throw for special events might be a similar thing. Growing up in LA we would see our poor Latino neighbors spend ridiculous amounts on a quinceañera or a wedding and think how foolish it was, surely it's more rational to save that money and use it for health care or education or a nicer house. But maybe they had it right? Human happiness is highly resistant to rational optimization.
Log_SetOutputFile( FILE * f );
then
Log_Printf( const char * fmt .... );
or :
malloc_setminimumalignment( 16 );
then
malloc( size_t size );
The goal of this kind of design is to make the common use API minimal, and have a place to store the settings (in the singleton) so they don't have to be passed in all the time. So, eg. Log_Printf() doesn't have to pass in all the options associated with logging, they are stored in global state.
I propose that global state like this is the classic mistake of improving the easy case. For small code bases with only one programmer, they are mostly okay. But in large code bases, with multi-threading, with chunks of code written independently and then combined, they are a disaster.
Let's look at the problems :
1. Multi-threading.
This is an obvious disaster and pretty much a nail in the coffin for global state. Say you have some code
like :
pcb * previous_callback = malloc_setfailcallback( my_malloc_fail_callback );
void * ptr = malloc( big_size );
malloc_setfailcallback( previous_callback );
this is okay single threaded, but if other threads are using malloc, you just set the "failcallback" for them
as well during that span. You've created a nasty race. And of course you have no idea whether the failcallback
that you wanted is actually set when you call malloc because someone else might change it on another thread.
Now, an obvious solution is to make the state thread-local. That fixes the above snippet, but sometimes you want to change the state so that other threads are affected. So now you have to have thread-local versions and global versions of everything. This is a viable, but messy, solution. The full solution is :
There's a global version of all state variables. There are also thread-local copies of all the global state. The thread-local copies have a special value that means "inherit from global state". The initial value of all the thread-local state should be "inherit". All state-setting APIs must have a flag for whether they should set the global state or the thread-local state. Scoped thread-local state changes (such as the above example) need to restore the thread-local state to "inherit".
This can be made to work (I'm using it for the Log system in Oodle at the moment) but it really is a very large conceptual burden on the client code and I don't recommend it.
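To make the moving parts concrete, here is a minimal sketch of the thread-local-inherits-global scheme for a single setting. Everything in it is hypothetical (the names `Log_SetVerbosity`, `StateScope`, `kInherit` and so on are made up for illustration, this is not the Oodle code), and it assumes the setting is an int where -1 can be reserved as the "inherit" value.

```cpp
#include <atomic>

// Hypothetical setting : a log verbosity level. Not an Oodle API.
enum class StateScope { Global, ThreadLocal };

// Special value for the thread-local copy meaning "inherit from global".
constexpr int kInherit = -1;

static std::atomic<int> g_verbosity{1};          // the global value
static thread_local int t_verbosity = kInherit;  // per-thread copy, starts as "inherit"

// Every state-setting call has to say which copy it changes.
inline void Log_SetVerbosity(int v, StateScope scope)
{
    if (scope == StateScope::Global) g_verbosity.store(v);
    else                             t_verbosity = v;
}

// Readers use the thread-local override if there is one, else the global.
inline int Log_GetVerbosity()
{
    return (t_verbosity != kInherit) ? t_verbosity : g_verbosity.load();
}

// Scoped thread-local change. It restores the previous thread-local value on
// exit, which is "inherit" unless scopes are nested.
class ScopedVerbosity
{
public:
    explicit ScopedVerbosity(int v) : m_saved(t_verbosity) { t_verbosity = v; }
    ~ScopedVerbosity() { t_verbosity = m_saved; }
    ScopedVerbosity(const ScopedVerbosity &) = delete;
    ScopedVerbosity & operator=(const ScopedVerbosity &) = delete;
private:
    int m_saved;
};
```

Even for one int that's a flag on the setter, a sentinel value, and a scope guard, and the client has to understand all three. Multiply by every setting in the library and you can see the burden.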
There's another way that these global-state singletons are horrible for multi-threading, and that's that they create dependencies between threads that are not obvious or intentional. A little utility function that just calls some simple functions picks up these ties to shared variables and needs synchronization protection with the global state. This is related to :
2. Non-local effects.
The global state makes the functions that use it non-"pure" in a very hidden way. It means that innocuous functions can break code that's very far away from it in hidden ways.
One of the classic disasters of global state is the x87 (FPU) control word. Say you have a function like :
void func1()
{
set x87 CW
do a bunch of math that relies on that CW
func2();
do more math that relies on CW
restore CW
}
Even without threading problems (the x87 CW is thread-local under any normal OS), this code has nasty non-local
effects.
Some branch of code way out in func2() might rely on the CW being in a certain state, or it might change the CW and that breaks func1().
You don't want to be able to break code very far away from you in a hidden way, which is what all global state does. Particularly in the multi-threaded world, you want to be able to detect pure functions at a glance, or if a function is not pure, you need to be able to see what it depends on.
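When you're stuck with state like this, about the best you can do is save/restore it in a scope and check it where you depend on it. A rough sketch follows, using the standard `<cfenv>` rounding mode as a stand-in for the x87 CW (it is not the control word itself, just the portable piece of similar state). `ScopedRoundingMode` and `RoundingModeIs` are made-up names, and the sketch assumes the platform defines `FE_UPWARD`. It only shows the save/restore; math that actually depends on the mode may also need compiler-specific floating-point environment settings.

```cpp
#include <cfenv>

// Saves the current rounding mode, sets a new one, restores it on scope exit.
class ScopedRoundingMode
{
public:
    explicit ScopedRoundingMode(int mode) : m_saved(std::fegetround())
    {
        std::fesetround(mode);
    }
    ~ScopedRoundingMode() { std::fesetround(m_saved); }
    ScopedRoundingMode(const ScopedRoundingMode &) = delete;
    ScopedRoundingMode & operator=(const ScopedRoundingMode &) = delete;
private:
    int m_saved;
};

// A function that depends on the mode can at least check its requirement
// instead of silently assuming it.
inline bool RoundingModeIs(int mode)
{
    return std::fegetround() == mode;
}

// Shape of func1 from above : set, do math, call out, re-check, restore.
inline bool Func1Sketch(void (*func2)())
{
    ScopedRoundingMode guard(FE_UPWARD);
    // ... math that relies on FE_UPWARD ...
    func2();
    // func2 might have changed the mode behind our back; detect it.
    return RoundingModeIs(FE_UPWARD);
}
```

Note this only protects func1 from func2; it does nothing for a func2 that silently relied on the mode the caller had before.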
3. Undocumented and un-asserted requirements.
Any code base with global state is just full of bugs waiting to happen.
Any 3d graphics programmer knows about the nightmare of the GPU state machine. To actually write robust GPU code, you have to check every single render state at the start of the function to ensure that it is set up the way you expect. Good code always expresses (and checks) its requirements, and global state makes that very hard.
This is a big problem even in a single-source code base, but even worse with multiple programmers, and a total disaster when trying to copy-paste code between different products.
Even something like taking a function that's called in one spot in the code and calling it in another spot can be a hidden bug if it relied on some global state that was set up in just the right way in that original spot. That's terrible, as much as possible functions should be self-contained and work the same no matter where they are called. It's sort of like "movement of call site invariance symmetry" ; the action of a function should be determined only by its arguments (as much as possible) and any memory locations that it reads should be as clearly documented as possible.
4. Code sharing.
I believe that global state is part of what makes C code so hard to share.
If you take a code snippet that relies on some specific global state out of its context and paste it somewhere else, it no longer works. Part of the problem is that nobody documents or checks that the global state they need is set. But a bigger issue is :
If you take two chunks of code that work independently and just link them together, they might no longer work. If they share some global state, either intentionally or accidentally, and set it up differently, suddenly they are stomping on each other and breaking each other.
Obviously this occurs with anything in stdlib, or on the processor, or in the OS (for example there are lots of per-Process settings in Windows; eg. if you take some libraries that want a different time period, or process priority class, or privilege level, etc. etc. you can break them just by putting them together).
Ideally this really should not be so. You should be able to link together separate libs and they should not break each other. Global state is very bad.
Okay, so we hate global state and want to avoid it. What can we do? I don't really have the answer to this because I've only recently come to this conclusion and don't have years of experience, which is what it takes to really make a good decision.
One option is the thread-local global state with inheritance and overrides as sketched above. There are some nice things about the thread-local-inherits-global method. One is that you do still have global state, so you can change the options somewhere and it affects all users. (eg. if you hit 'L' to toggle logging that can change the global state, and any thread or scope that hasn't explicitly set it picks up the global option immediately).
Other solutions :
1. Pass in everything :
When it's reasonable to do so, try to pass in the options rather than setting them on a singleton. This may make the client code uglier and longer to type at first, but is better down the road.
eg. rather than
malloc_set_alignment( 16 );
malloc( size );
you would do :
malloc_aligned( size , 16 );
One change I've made to Oodle is taking state out of the async systems and putting in the args for each
launch. It used to be like :
OodleWork_SetKickImmediate( OodleKickImmediate_No );
OodleWork_SetPriority( OodlePriority_High );
OodleWork_Run( job );
and now it's :
OodleWork_Run( job , OodleKickImmediate_No, OodlePriority_High );
2. An options struct rather than lots of args.
I distinguish this from #3 because it's sort of a bridge between the two. In particular I think of an "options struct" as just plain values - it doesn't have to be cleaned up, it could be const or made with an initializer list. You just use this when the number of options is too large and if you frequently set up the options once and then use it many times.
So eg. the above would be :
OodleWorkOptions wopts = { OodleKickImmediate_No, OodlePriority_High };
OodleWork_Run( job , &wopts );
Now I should emphasize that we already have given ourselves great power and clarity. The options struct
could just be global, and then you have the standard mess with that. You could have it in the TLS so you
have per-thread options. And then you could locally override even the thread-local options in some scope.
Subroutines should take OodleWorkOptions as a parameter so the caller can control how things inside are run,
otherwise you lose the ability to affect child code which a global state system has.
Note also that options structs are dangerous for maintenance because of the C default initializer value of 0 and the fact that there's no warning for partially assigned structs. You can fix this by either making 0 mean "default" for every value, or making 0 mean "invalid" (and assert) - do not have 0 be a valid value which is anything but default. Another option is to require a magic number in the last value of the struct; unfortunately this is only caught at runtime, not compile time, which makes it ugly for a library. Because of that it may be best to only expose Set() functions for the struct and make the initializer list inaccessible.
The options struct can inherit values when it's created; eg. it might fill any non-explicitly given values (eg. the 0 default) by inheriting from global options. As long as you never store options (you just make them on the stack), and each frame tick you get back to a root for all threads that has no options on the stack, then global options percolate out at least once a frame. (so for example the 'L' key to toggle logging will affect all threads on the next frame).
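A small sketch of the "0 means default" convention combined with inheriting from global options. The types and names here (`WorkOptions`, `WorkOptions_Resolve`, etc.) are hypothetical stand-ins, not the real Oodle ones; the only assumption that matters is that every field reserves 0 for "not set".

```cpp
// Hypothetical option enums; 0 is reserved to mean "default / inherit" in each.
enum WorkKick     { WorkKick_Default = 0, WorkKick_No, WorkKick_Yes };
enum WorkPriority { WorkPriority_Default = 0, WorkPriority_Low,
                    WorkPriority_Normal, WorkPriority_High };

// Plain values, nothing to clean up.
struct WorkOptions
{
    WorkKick     kick;
    WorkPriority priority;
};

// The global options. These hold real values, never the 0 "default".
static WorkOptions g_workOptions = { WorkKick_Yes, WorkPriority_Normal };

// Fill any field left at 0 (including fields the caller forgot in a partial
// initializer list) from the global options.
inline WorkOptions WorkOptions_Resolve(WorkOptions opts)
{
    if (opts.kick == WorkKick_Default)         opts.kick = g_workOptions.kick;
    if (opts.priority == WorkPriority_Default) opts.priority = g_workOptions.priority;
    return opts;
}
```

With this, a partially assigned struct like `{ WorkKick_No }` is harmless: the zero-filled priority just picks up whatever the global option is at the time it's resolved.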
3. An initialized state object that you pass around.
Rather than a global singleton for things like The Log or The Allocator, this idea is to completely remove the concept that there is only one of those.
Instead, Log or Allocator is a struct that is passed in, and must be used to do those operations. eg. like :
void FunctionThatMightLogOrAllocate( Log * l, Allocator * a , int x , int y )
{
if ( x )
{
Log_Printf( l , "some stuff" );
}
if ( y )
{
void * p = malloc( a , 32 );
free( a , p );
}
}
now you can set options on your object, which may be a per-thread object or it might be global, or it might even be unique to
the scope.
This is very powerful, it lets you do things like make an "arena" allocator in a scope ; the arena is allocated from the parent
allocator and passed to the child functions. eg :
void MakeSuffixTrie( Allocator * a , U8 * buf, int bufSize )
{
Allocator_Arena arena( a , bufSize * 4 );
MakeSuffixTrie_Sub( &arena, buf, bufSize );
}
The idea is there's no global state, everything is passed down.
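For reference, here's roughly what the pieces behind that snippet could look like. This is a guess at a minimal shape, not the real thing: it assumes `Allocator` is an abstract interface, that the arena is a simple bump allocator, and `Allocator_Heap` is a made-up root allocator that forwards to the C heap.

```cpp
#include <cstddef>
#include <cstdlib>

// Assumed interface : anything that can allocate and free.
struct Allocator
{
    virtual ~Allocator() {}
    virtual void * Alloc(std::size_t size) = 0;
    virtual void   Free(void * ptr) = 0;
};

// Hypothetical root allocator that forwards to the C heap.
struct Allocator_Heap : Allocator
{
    void * Alloc(std::size_t size) override { return std::malloc(size); }
    void   Free(void * ptr) override { std::free(ptr); }
};

// Bump allocator : takes one block from its parent, hands out pieces of it,
// and gives the whole block back to the parent when it goes out of scope.
class Allocator_Arena : public Allocator
{
public:
    Allocator_Arena(Allocator * parent, std::size_t capacity)
        : m_parent(parent),
          m_base(static_cast<char *>(parent->Alloc(capacity))),
          m_capacity(m_base ? capacity : 0),
          m_used(0) {}
    ~Allocator_Arena() override { m_parent->Free(m_base); }
    Allocator_Arena(const Allocator_Arena &) = delete;
    Allocator_Arena & operator=(const Allocator_Arena &) = delete;

    void * Alloc(std::size_t size) override
    {
        // Round up so every returned pointer keeps maximum alignment.
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (size + align - 1) & ~(align - 1);
        if (rounded < size || rounded > m_capacity - m_used)
            return nullptr; // size overflowed, or the arena is full
        void * p = m_base + m_used;
        m_used += rounded;
        return p;
    }
    void Free(void *) override {} // individual frees are no-ops in an arena

    std::size_t Used() const { return m_used; }

private:
    Allocator * m_parent;
    char *      m_base;
    std::size_t m_capacity;
    std::size_t m_used;
};
```

The child function only ever sees an `Allocator *`, so it has no idea whether it's talking to the heap or to an arena that will be thrown away when the caller returns.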
At first the fact that you have to pass down a state pointer to use malloc seems like an excessive pain in the ass, but it has advantages. It makes it super clear in the signature of a function which subsystems it might use. You get no more surprises because you forgot that your Mat3::Invert function logs about degeneracy.
It's unclear to me whether this would be too much of a burden in real world large code bases like games.
I've got a million (actually several hundred) APIs that start an Async op. All of those APIs take a
bunch of standard arguments that they all share, so they all look like :
OodleHandle Oodle_Read_Async(
// function-specific args :
OodleIOQFile file,void * memory,SINTa size,S64 position,
// standard args on every _Async function :
OodleHandleAutoDelete autoDelete,OodlePriority priority,const OodleHandle * dependencies,S32 numDependencies);
The idea was that you pass in everything needed to start the op, and when it's returned you get a fully valid Handle which is
enqueued to run.
What I should have done was make all the little _Async functions create an incomplete handle, and then have a standard function
to start it. Something like :
// prep an async handle but don't start it :
OodleHandleStaging Oodle_Make_Read(
OodleIOQFile file,void * memory,SINTa size,S64 position
);
// standard function to run any op :
OodleHandle Oodle_Start( OodleHandleStaging handle,
OodleHandleAutoDelete autoDelete,OodlePriority priority,const OodleHandle * dependencies,S32 numDependencies);
it would remove a ton of boiler-plate out of all my functions, and make it a lot easier to add more standard args, or have
different ways of firing off handles. It would also allow things like creating a bunch of "Staging" handles that aren't
enqueued yet, and then firing them off all at once, or even just holding them un-fired for a while, etc.
It's sort of ugly to make clients call two functions to run an async op, but
you can always get client code that looks just like the old way by doing :
OodleHandle h = Oodle_Start( Oodle_Make_Read( file, memory, size, position ),
autoDelete, priority, dependencies, numDependencies );
and I could easily make macros that make that look like one function call.
Having that interval of a partially-constructed op would also let me add more attributes that you could set on the Staging handle before firing it off.
(for example : when I was testing compresses on enwik, some of the tasks could allocate something like 256MB each; it occurred to me that a robust task system should understand limiting the number of tasks that run at the same time if their usage of some resource exceeds the max. eg. for memory usage, if you know you have 2 GB free, don't run more than 8 of those 256 MB tasks at once, but you could run other non-memory-hungry tasks during that time. (I guess the general way to do that would be to make task groups and assign tasks to groups and then limit the number from a certain group that can run simultaneously))
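A sketch of what that limit might look like as an attribute checked before a task starts. This is a budget-based variant of the task-group idea (each task declares a cost and the group has a total), all names are hypothetical, and it is deliberately single-threaded; a real scheduler would need a lock or atomics around the counter.

```cpp
#include <cstdint>

// Hypothetical : a group of tasks that share a budget of some resource
// (bytes of memory in the example above).
struct TaskGroup
{
    std::uint64_t budget; // total the group may have reserved at once
    std::uint64_t inUse;  // currently reserved by running tasks
};

// Called before starting a task. Returns false if the task has to stay
// queued because running it now would exceed the group's budget.
inline bool TaskGroup_TryReserve(TaskGroup * group, std::uint64_t cost)
{
    if (cost > group->budget - group->inUse)
        return false;
    group->inUse += cost;
    return true;
}

// Called when a task finishes. Must be passed the same cost it reserved.
inline void TaskGroup_Release(TaskGroup * group, std::uint64_t cost)
{
    group->inUse -= cost;
}
```

With a 2 GB budget and 256 MB tasks, the first 8 reservations succeed and the 9th waits until one of the running tasks releases. Tasks that aren't in the group are never blocked by it.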
American vernacular garden style is disgusting. It consists of a patch of lawn that's generally very dense and tightly mowed, then a hard edge at the border of the lawn, often even with something utterly unforgivable like a black plastic strip, then planted beds. The planted beds typically are a big expanse of mulch with scattered little dots of isolated plants. It's totally unnatural looking; it doesn't look right in its place, it doesn't look like it belongs there.
Just like with food, you should ask yourself, is all this stuff I'm doing actually making it better? If you go out to the woods around here and then walk around a neighborhood, how could you possibly think that the concrete pavers and decks and inappropriate warm-weather plants are an improvement?
Anyhoo, here's my manifesto about natural garden design :
1. Plants should look like themselves. Ferns look good green, not yellow or variegated. Daffodils look good yellow, not red or white. These days with advanced hybridization techniques you can get all kinds of crazy stuff, but DON'T, they are tacky as hell. They're like heavy makeup or plastic surgery, they sort of optimize a beauty goal but wind up being worse.
2. Your garden should match your area. Again with careful tending you can grow things from all over the world, but you shouldn't. The plants will look the most natural if they suit your area. Here in the PNW that means evergreens, ferns, rhododendra, etc. You can add some Japanese plants and such that have similar native climates, but things like tropical plants or hot/dry mediterranean plants do not belong here.
3. Mulch is fucking disgusting looking. One of the worst possible things you can do to your garden is to spread a huge field of mulch and then dot it with a sparse scattering of shrubs. Mulch is a necessary evil (actually some modern thought believes that mulch is overrated, but that's a digression) and should be invisible as much as possible. Mulches should, like the plants, match the area, not be some weird imported thing. So the modern cocoa and coco (coconut) mulches are both inappropriate everywhere. Here in the Northwest, pine bark is a semi-natural forest floor mulch and so looks okay in moderation.
4. Fertilizers, weed killers, grass treatments, etc. are all massive poisons, they flow into our natural water ways and fuck up the environment. You are a huge fucking selfish asshole if you use them beyond the bare minimum that is absolutely necessary. If your yard or plants require large amounts of chemicals to be happy, then change your fucking yard you asshole, plant something that works better in your environment; poisoning the damn lake is not a good tradeoff for you having a nice lawn.
5. Moss is a very natural and beautiful part of a Northwest garden and should not be removed. (BTW on a semi-related topic, the idea that moss can damage your roof is basically a myth here in the northwest (we very rarely get a major freeze, which is what makes moss harmful (because freezing makes it expand which rips up the shingles)) - and certainly pressure washing is guaranteed to do more harm than the moss ever could). Removing moss in the northwest is like polishing the patina off antique metal ware, it shows a complete lack of taste.
6. Concrete, manufactured stone, plastic, cast pavers, etc. have no place in a garden. Landscape fabric, plastic path/yard edging, etc. can be used but only if they will stay invisible, which they won't (they always work themselves out into sight), so probably just shouldn't be used.
7. A sort of general issue I've been thinking about is that there is a conflict between what looks good in a photo, or in a first impression, vs. what looks good to live with. This is true of gardens, houses, lots of things. Basically for a photo or a first impression (like a realtor visit) what looks good is simple, clean, and above all coherent; for example flowers should be all of one color or two colors. Many people now design gardens with this in mind, optimizing for a single view. However, that is quite different from what is enjoyable to live with on a regular basis. The scene that looks great in a photo will get boring if it's all you have to look at every day. To live with it's nicer to have lots of variety, unusual specimens, lots of little bits of interest you can walk around and look at. It's sort of like the overall impression vs. the density of interest; it's the "macro" vs "micro" optimization if you like. (the same is true of house decor; magazines and realtors favor a very clean, unified, almost Japanese simple interior, but living in that is quite boring; it's more interesting to live in the very cluttered house full of curios and covered with paintings and photos that give you lots of little things to look at).
The ideal Seattle house should have big Doug Fir beams, a cedar shake roof, and a big fireplace made of natural boulders. There should be a stream on the property and french drains that route groundwater to the stream.
The ideal Seattle garden should be like a woodland meadow. Obviously you don't actually want a "house in the woods" because it's too dark; what you want is that feeling when you're walking through the dense woods and you get to a big meadow and suddenly the sun appears and there's this lovely clearing of grasses and flowers with trees all around.
An ideal Seattle garden should always include some big evergreen trees, since they are the true masters of this landscape. A forest garden around the evergreens could include rhododendra, ferns, blueberries, etc. A good Seattle garden should always include moss and boulders; a truly lucky site would have one of our magnificent ice-age glacially deposited boulders.
(Seattle used to have lots of amazing giant boulders in the city. They were deposited by the glaciers that cut the sound, and were usually granite, giant 40 foot diameter things that just plopped there randomly. The vast majority of them have been destroyed, clearing space for houses and roads and such, and also to create smaller rocks. If you drive around Seattle you may observe all the rockeries used as retaining walls, made of large boulders (2-3 foot diameter typically); those boulders were usually made by dynamiting the original huge glacial boulders. There was one of the giant glacial boulders right on my street up until quite recently (the 1950's or something like that; the old neighbor was alive when it was still there); it's a shame that more weren't left and used as interesting city landscape features; of course it's an even bigger shame that more stands of old growth forest weren't left, we could have had stretches like the Golden Gate Park Panhandle running all over the city, and Seattle would then have been a unique and gorgeous city instead of the architectural garbage heap that it is today).
I've realized after buying this house we're in now that when shopping for a house, if you want to be a gardener, it's actually a liability to buy a house with a nice existing garden. The problem is you will want to work with what's already there, to respect those plants, and to save yourself work. But most likely the previous owner did some dumb things, planting big trees in bad places, or picking bad species or whatever. It's hard for me to just pull the trigger and rip out a garden that's already pretty nice, but if I had a crap garden I could plan it from the beginning and get more of what I want.
Typical "American Vernacular" style, taken from a real estate listing in my neighborhood :
Contrast with a Seattle park (Llandover Woods) that has very minimal sculpting, but is sticking to what this area should look like :
1. Corporate Speech / Unlimited political spending.
Duh. Can we impeach Thomas and Scalia already? Every lawyer and judge in this country knows that they have no business being on the bench; they are literally the punch-line of law school jokes. It's a farce that we have such incompetent, corrupt, lazy, biased buffoons making some of the most important decisions in the country. (in case you aren't aware : in Citizens United (as with countless other cases), Thomas and Scalia were known to be meeting with the supporters of the conservative side of the case).
Without campaign finance reform, there is no democracy. Both parties are just the parties of big corporations now. Both parties are the puppets of wall street, the military, the health-care complex, the cable companies, etc, and none of those interests will ever be harmed, no matter how much they fuck over the populace. Politicians are the puppets of money, and money wins elections. With the current court, the only way we'll get serious campaign finance reform would be with an amendment, which is pretty unlikely in this day.
(in amusing absurdism, lots of states are going after political speech by labor unions, and the courts have so far been upholding it. While I basically agree that Unions should not be making political speech, or at least their members should be allowed to opt out of funding it, it's in odd opposition to allowing unlimited corporate political speech)
2. Electoral College.
I think everyone with a brain realizes now that the electoral college is a huge disaster that's ruining national politics. National elections hinge entirely on the results in a few swing states, thus national party platforms and attention are directed at the interests of those states. The majority of the country has almost no say in national elections. It's completely retarded.
The electoral college also means that the national elections are disproportionately controlled by the state governments of a few swing states, which gives those state governments massive power over the nation that we all must be concerned about.
3. Voter roll tampering.
Voter roll tampering should be really shocking to anyone of either party that respects the right to vote. At the moment it appears to be the Republicans who are mainly using this tool (certainly in the olden corrupt days, the dems were masters of it).
For a while the main tool was expunging criminals from the rolls (with collateral damage to non-criminals). The new tool of choice is "voter id" laws. Voter id sounds okay in theory, but in practice the point of the voter id laws is to remove some poor people and some old people from the rolls, because they tend to vote more Dem. I've heard some Dems say it's not a big deal, it's only 20,000 people or so that lose their right to vote, but of course that number is *massive* in the swing states.
In a wider view, the problem is that the party which gets into majority in the state government has the power to change the rules to affect future elections. That should raise some serious eyebrows, and brings us to the next point :
4. Gerrymandering.
It's completely ridiculous that the party in charge (in many states) gets to draw the voting districts.
I think a lot of people don't realize how powerful this is, or how widespread, or how ridiculous many of the districts are. (see for example : amusing maps and disturbing control ). We're being robbed right in front of our eyes and we're not doing anything about it, and they're laughing all the way to the bank; it's sickening.
If you win control of a state by even a tiny margin like 51-49 , you can rig the districts to go for your party by a huge margin, like 12-4. The way you do it is you put all the opposition support into a few districts that they will win in landslides, like 95-5, and then you spread out your support just thin enough to guarantee lots of wins, like 55-45 wins. (if you have a 51% majority of the state's population, you can split your state into just 2 districts where the opposition wins 95-5 , and 23 districts where you win 55-45, giving you a 23-2 majority from 51-49).
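If you want to check the arithmetic in that last example, it's just a weighted average over equal-sized districts. A throwaway sketch (the function name is made up, and it assumes every district has the same number of voters):

```cpp
// Your statewide vote share (in percent), given how many districts are
// "packed" with the opposition and how many are "spread" in your favor.
// Assumes all districts have equal population.
inline double StatewideShare(int packedDistricts, double yourShareInPacked,
                             int spreadDistricts, double yourShareInSpread)
{
    const double total = packedDistricts * yourShareInPacked
                       + spreadDistricts * yourShareInSpread;
    return total / (packedDistricts + spreadDistricts);
}
```

`StatewideShare( 2, 5.0, 23, 55.0 )` is (2*5 + 23*55) / 25 = 1275 / 25 = 51, so a 51% statewide share really can be cut into 23 wins out of 25 districts.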
I've seen proposals that there should be better non-political committees to draw the voting districts (something like direct-elected long term seats), but I think they're all doomed to be corrupt eventually. I would much rather see the elimination of voting districts entirely, and instead use direct state-wide election (something like : you vote for your top 5 people and the people with the most votes get the seats). (multi-vote systems are also a big win for other reasons; they give non-mainstream candidates a better chance of winning, and allow viable 3rd parties to form)
(I understand that the idea of local districts is that you have a rep in your area to help address your local issues, but I think that's basically a myth; the only thing local representation does is encourage corruption, as the rep tries to get tax cuts and earmarks for business interests in their district)
amusing disclosure : my first ever software job was working on gerrymandering software ("redistricting software" but of course we all knew what it was for). It was a CAD package that had all the census data, and you could move the borders around and see the political balance in each district so that you could easily adjust the lines to get the voter ratios you wanted. We sold it to Cook County, which is one of the classic Dem-side gerrymanders (what they do is take little slices of Democratic Chicago and put them into the suburban districts that might otherwise go Republican).
Anytime you do anything in America these days you are handed a fifty page contract (*) with all kinds of crazy clauses that nobody reads. And even if you do read it and object, what are you supposed to do? Not sign it? The competition has the exact same kind of abusive contract. As a consumer you can't choose to avoid them.
(* = if you are actually handed the real contract, that's rare and you should consider that company to be upstanding. In reality there is usually a clause somewhere that says "the full contract is available by request" and there's another 200 pages you don't know about that have even more exclusions. And even if you do request the full contract and actually get it, by the time you get it they've changed it and you no longer have the latest. And even if you like the terms you see and sign it, they'll change it the next day, and one of the clauses was "the agreement is superseded by changes to the contract" or some such. You may as well just sign a blank page that says "you may rape me however you choose" because they can always change the rules)
Tricky contracts are one of the forgotten evils behind the whole mortgage crisis. Asshole type-A republican types will say "it's your own fault if you don't read the fine print; you were given a contract and agreed to it, you have to live with it". Bullshit. We are all presented with reams of intentionally-obfuscated lawyer-speak, it is totally unreasonable to allow it.
There need to be laws about limiting the complexity of contracts. There need to be laws about limiting the amount of fine print in fee structures; pricing needs to be up front and standardized and clearly advertised.
Things like the classic "It's only $9.95 (with ten more monthly payments of 99 thousand)" should just be illegal, obviously. There's no social benefit in allowing that kind of advertising. The point of laws is to make a capitalist/social structure which is good for the people, and that kind of shit is not good; it's particularly bad for capitalists who want to play fair by selling a good product with an honest price.
The new evil in abusive contracts is of course software license agreements. These should just be completely illegal. You shouldn't be allowed to compel me to sign anything in order to use the product that I already bought. The purchase implies a standard obligation of functionality and indemnity, that should be the only agreement allowed. The standard indemnity laws should be updated a bit to reflect the modern age of software of course.
2. America desperately needs a real right-to-privacy law.
2.A. Government agencies should not be allowed to sell your personal information to corporations! (among others, the USPS does this, as do most state DMV's). This one is a super no-brainer.
2.B. No corporation should be allowed to sell your personal information without your explicit permission. But what's really needed is :
2.C. Corporations should not be allowed to require any personal information beyond some bare minimum that can identify you. eg. they can ask for SSN and a password of your choosing, but they cannot ask for previous addresses or bank account numbers or etc.
2.D. It should be illegal to tie incentives to privacy violations. eg. club cards that give you discounts in exchange for giving up your privacy. Similarly lots of bills now give you a discount if you allow direct withdrawal from your bank account. Utilities often will allow you to not give your SSN, but only with a fee.
2.E. All privacy options must be set to max-privacy by default. Allowing increased privacy but requiring you to go through forms is no good.
2.F. You should be able to request deletion of all your records. eg. when you close a bank account, or from a doctor, or whatever, you should be able to say "please delete all your info on me" and they should be required by law to comply (if your accounts are in good standing blah blah blah).
3. I feel like the government-corporate complex is intentionally building this world structure in which you are locked into a variety of fixed fees which suck up all your income. (okay this part of the post is going off the deep end a bit)
You don't go work and get your money and then choose how to spend it. The corporate masters have auto-debit on your account and just suck it right out. I know this is retarded hyperbole, but it sort of feels like mining towns where you get paid scrip and then just have to give it right back at the company store, but in this case the company store is apple and the chinese-crap importers (target, walmart, gap, c&b, etc etc).
3.1. Health insurance is perhaps the most obvious; the cost of health care is ridiculously inflated, but that cost is hidden from you a bit (intentionally), so we're all just locked into paying out a massive amount monthly to the health care complex.
3.2. Car insurance is of course the same story. The car insurance companies very much want you to feel like "accidents are no big deal" ; hey let's all jump in our tanks with no visibility and smash into each other, no biggie, the car insurance pays for it. And in the meantime a huge government-required deduction slips out of your account every month.
3.3. Cell phones obviously; and Cable companies; these ones are semi-government-enforced monopolies, and basically not optional in modern life, you are just required by law to give them $200 every month. More and more software wants to move to subscription plans. Everyone is getting very clever about making the easy way out just being "give us lots of money every month automatically" , and you have to work harder and harder to actually be proactive about spending your money.
3.4. I actually think online purchasing is part of this and is changing the whole relationship people have with money. You very often no longer get to see the thing you are buying before you buy it. Then when it sucks, it's usually too much trouble to return it. The result is that the whole purchasing is more like "send money out into the ether and then products show up which I have no control over". It's almost like a constant tax, and then they give you some shitty products once in a while.
It's useful to have at least 3 classes of task in your task system :
1. "Normal" tasks that want to run on one of the worker threads. These take some amount of time, you don't want them to block other threads, they might have dependencies and create other jobs.
2. "IO" tasks. This is CPU work, but the main thing it does is wait on an IO and perhaps spawn another one. For example something like copying a file through a double-buffer is basically just wait on an IO op then do a tiny bit of math, then start another IO op. This should be run on the IO thread, because it avoids all thread switching and thread communication as it mediates the IO jobs.
(of course the same applies to other subsystems if you have them)
3. "Tiny" / "Run anywhere" tasks. These are tasks that take very little CPU time and should not be enqueued onto worker threads, because the time to wake up a thread for them dwarfs the time to just run them.
The only reason you would run these as async tasks at all is because they depend on something. Generally this is something trivial like "set a bool after this other job is done".
These tasks should be run immediately when they become ready-to-run on whatever thread made them rtr. So they might run on the IO thread, or a worker thread, or the main thread (any client thread). eg. if a tiny task is enqueued on the main thread and its dependencies are already done, it should just be run immediately.
It's possible that class #2 could be merged into class #3. That is, eliminate the IO-tasks (or GPU-tasks or whatever) and just call them all "tiny tasks". You might lose a tiny bit of efficiency from that, but the simplicity of having only two classes of tasks is probably preferable. If the IO tasks are made into just generic tiny tasks, then it's important that the IO thread be able to execute tiny tasks from the generic job system itself, otherwise it might go to sleep thinking there is no IO to be done, when a pending tiny IO task could create new IO work for it.
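A minimal sketch of the three-class dispatch described above. The names (`TaskClass`, `Task`, `Scheduler`, `OnReady`) are hypothetical and not from any real job system, and the queues are plain deques with no locking, standing in for real thread-safe queues.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical task classes, following the three categories in the text.
enum class TaskClass { Normal, IO, Tiny };

struct Task
{
    TaskClass cls;
    std::function<void()> run;
};

// Illustrative only : no locking, no real threads.
struct Scheduler
{
    std::deque<Task> worker_queue; // would be drained by worker threads
    std::deque<Task> io_queue;     // would be drained by the IO thread

    // Called on whatever thread made the task ready-to-run.
    void OnReady(Task t)
    {
        assert(t.run != nullptr); // a task must have a body
        switch (t.cls)
        {
        case TaskClass::Tiny:
            t.run(); // cheaper to run right here than to wake a thread for it
            break;
        case TaskClass::IO:
            io_queue.push_back(std::move(t));
            break;
        case TaskClass::Normal:
            worker_queue.push_back(std::move(t));
            break;
        }
    }
};
```

Merging class #2 into class #3 would amount to deleting the `IO` case and letting the IO thread call `OnReady` itself.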
Okay.
Beyond that, for "normal" tasks there's the question of typical duration, which tells you whether it's worth it to fire up more threads.
eg. say you shoot 10 tasks at your thread-pool worker system. Should you wake up 1 thread and let it do all 10 ? Or wake up 10 threads and give each one a task? Or maybe wake 2?
One issue that still is bugging me is when you have a worker thread, and in doing some work it makes some more tasks ready-to-run. Should it fire up new worker threads to take those tasks, or should it just finish its task and then do them itself? You need two pieces of information : 1. are the new tasks significant enough to warrant a new thread? and 2. how close to the end of my current task am I? (eg. if I'm in the middle of some big work I might want to fire up a new thread even though the new RTR tasks are tiny).
When you have "tiny" and "normal" tasks at the same priority level, it's probably worth running all the tinies before any normals.
Good lord.
"ad-hoc" multi-threading refers to sharing of data across threads without an explicit sharing mechanism (such as a queue or a mutex). There's nothing wrong per-se with ad-hoc multi-threading, but too often people use it as an excuse for "comment moderated handoff" which is no good.
The point of this post is : protect your threading! Use name-changes and protection classes to make access lifetimes very explicit and compiler (or at least assert) moderated rather than comment-moderated.
Let's look at some examples to be super clear. Ad-Hoc multi-threading is something like this :
int shared;
thread0 :
{
    shared = 7; // no atomics or protection or anything
    // shared is now set up
    start thread1;
    // .. do other stuff ..
    kill thread1;
    wait thread1;
    print shared;
}
thread1 :
{
    shared ++;
}
this code works (assuming that thread creation and waiting has some kind of memory barrier in it, which it usually does),
but the hand-offs and synchronization are all ad-hoc and "comment moderated". This is terrible code.
I believe that even with something like a mutex, you should make the protection compiler-enforced, not comment-enforced.
Comment-enforced mutex protection is something like :
struct MyStruct s_data;
Mutex s_data_mutex;
// lock s_data_mutex before touching s_data
That's okay, but comment-enforced code is always brittle and bug-prone. Better is something like :
struct MyStruct s_data_needs_mutex;
Mutex s_data_mutex;
#define MYSTRUCT_SCOPE(name) MUTEX_IN_SCOPE(s_data_mutex); MyStruct & name = s_data_needs_mutex;
assuming you have some kind of mutex-scoper class and macro. This makes it impossible to accidentally
touch the protected stuff outside of a lock.
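A minimal sketch of what such a mutex-scoper might look like, using `std::mutex` and `std::lock_guard` in place of whatever `Mutex` type is in use. `MUTEX_IN_SCOPE` and `MYSTRUCT_SCOPE` follow the names above; the contents of `MyStruct` and the `Bump` function are made up for illustration.

```cpp
#include <cassert>
#include <mutex>

struct MyStruct { int count = 0; };

// The "_needs_mutex" suffix makes any direct use stand out in review and in searches.
static MyStruct   s_data_needs_mutex;
static std::mutex s_data_mutex;

// Hypothetical scoper : holds the mutex until the end of the enclosing scope.
#define MUTEX_IN_SCOPE(m) std::lock_guard<std::mutex> mutex_in_scope_guard(m)

// Take the lock, then expose the data under a short local name.
#define MYSTRUCT_SCOPE(name) MUTEX_IN_SCOPE(s_data_mutex); MyStruct & name = s_data_needs_mutex

int Bump()
{
    MYSTRUCT_SCOPE(data); // lock is held from here to the closing brace
    assert(data.count >= 0);
    data.count++;
    return data.count;
}
```

Because the guard has a fixed name, this version allows only one such scope per block, which is probably what you want anyway.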
Even cleaner is to make a lock-scoper class that un-hides the data for you. Something like :
//-----------------------------------
template <typename t_data> class ThinLockProtectedHolder;
template <typename t_data>
class ThinLockProtected
{
public:
    ThinLockProtected() : m_lock(0), m_data() { }
    ~ThinLockProtected() { }
protected:
    friend class ThinLockProtectedHolder<t_data>;
    OodleThinLock m_lock;
    t_data m_data;
};
template <typename t_data>
class ThinLockProtectedHolder
{
public:
    typedef ThinLockProtected<t_data> t_protected;
    ThinLockProtectedHolder(t_protected * ptr) : m_protected(ptr) { OodleThinLock_Lock(&(m_protected->m_lock)); }
    ~ThinLockProtectedHolder() { OodleThinLock_Unlock(&(m_protected->m_lock)); }
    t_data & Data() { return m_protected->m_data; }
protected:
    t_protected * m_protected;
};
#define TLP_SCOPE(t_data,ptr,data) ThinLockProtectedHolder<t_data> RR_STRING_JOIN(tlph,data) (ptr); t_data & data = RR_STRING_JOIN(tlph,data).Data();
//--------
/*
// use like :
ThinLockProtected<int> tlpi;
{
    TLP_SCOPE(int,&tlpi,shared_int);
    shared_int = 7;
}
*/
//-----------------------------------
Errkay.
So the point of this whole post is that even when you are just doing ad-hoc thread ownership, you should
still use a robustness mechanism like this. For example by direct analogy you could use something like :
//=========================================================================
template <typename t_data> class AdHocProtectedHolder;
template <typename t_data>
class AdHocProtected
{
public:
    AdHocProtected() :
    #ifdef RR_DO_ASSERTS
        m_lock(0),
    #endif
        m_data() { }
    ~AdHocProtected() { }
protected:
    friend class AdHocProtectedHolder<t_data>;
    #ifdef RR_DO_ASSERTS
    U32 m_lock;
    #endif
    t_data m_data;
};
#ifdef RR_DO_ASSERTS
void AdHoc_Lock( U32 * pb) { U32 old = rrAtomicAddExchange32(pb,1); RR_ASSERT( old == 0 ); }
void AdHoc_Unlock(U32 * pb) { U32 old = rrAtomicAddExchange32(pb,-1); RR_ASSERT( old == 1 ); }
#else
#define AdHoc_Lock(xx)
#define AdHoc_Unlock(xx)
#endif
template <typename t_data>
class AdHocProtectedHolder
{
public:
    typedef AdHocProtected<t_data> t_protected;
    AdHocProtectedHolder(t_protected * ptr) : m_protected(ptr) { AdHoc_Lock(&(m_protected->m_lock)); }
    ~AdHocProtectedHolder() { AdHoc_Unlock(&(m_protected->m_lock)); }
    t_data & Data() { return m_protected->m_data; }
protected:
    t_protected * m_protected;
};
#define ADHOC_SCOPE(t_data,ptr,data) AdHocProtectedHolder<t_data> RR_STRING_JOIN(tlph,data) (ptr); t_data & data = RR_STRING_JOIN(tlph,data).Data();
//==================================================================
which provides scoped checked ownership of variable hand-offs without any explicit mutex.
We can now revisit our original example :
AdHocProtected<int> ahp_shared;
thread0 :
{
    {
        ADHOC_SCOPE(int,&ahp_shared,shared);
        shared = 7; // no atomics or protection or anything
        // shared is now set up
    }
    start thread1;
    // .. do other stuff ..
    kill thread1;
    wait thread1;
    {
        ADHOC_SCOPE(int,&ahp_shared,shared);
        print shared;
    }
}
thread1 :
{
    ADHOC_SCOPE(int,&ahp_shared,shared);
    shared ++;
}
And now we have code which is efficient, robust, and safe from accidents.
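For reference, a runnable sketch of the same hand-off in standard C++11, with `std::thread` and `std::atomic` standing in for the rr* primitives. `AdHocHolder` and `RunHandoff` are hypothetical names, and this is a simplified stand-in (the check goes through plain `assert`), not the code above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

template <typename T> class AdHocHolder;

template <typename T>
class AdHocProtected
{
public:
    AdHocProtected() : m_owners(0), m_data() { }
private:
    friend class AdHocHolder<T>;
    std::atomic<unsigned> m_owners; // how many scopes currently claim the data
    T m_data;
};

// Claims ownership for the lifetime of the holder; asserts if anyone else holds it.
template <typename T>
class AdHocHolder
{
public:
    explicit AdHocHolder(AdHocProtected<T> * p) : m_p(p)
    {
        unsigned old = m_p->m_owners.fetch_add(1);
        assert(old == 0); (void)old;
    }
    ~AdHocHolder()
    {
        unsigned old = m_p->m_owners.fetch_sub(1);
        assert(old == 1); (void)old;
    }
    AdHocHolder(const AdHocHolder &) = delete;
    AdHocHolder & operator=(const AdHocHolder &) = delete;
    T & Data() { return m_p->m_data; }
private:
    AdHocProtected<T> * m_p;
};

int RunHandoff()
{
    AdHocProtected<int> shared;
    {
        AdHocHolder<int> h(&shared);
        h.Data() = 7;             // set up before the other thread exists
    }
    std::thread t1([&shared] {
        AdHocHolder<int> h(&shared);
        h.Data()++;
    });
    t1.join();                    // join is the hand-off back to this thread
    AdHocHolder<int> h(&shared);
    return h.Data();
}
```

If two scopes ever overlap, one of the asserts fires instead of the bug staying silent, which is the whole point.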
The traditional way to do this is to write your own "operator new" implementation which will link in place of the library implementation. This way sucks for various reasons. The important one to me is that it changes all the news of any other statically-linked code, which is just not an okay thing for a library to do. You may want to have different mallocs for different purposes; the whole idea of a single global allocator is kind of broken in the modern world.
(the presence of global state in the C standard lib is part of what makes C code so hard to share. The entire C stdlib should be a passed-in vtable argument. Perhaps more on this in a later post.)
Anyway, what I want is a way to do a "new" without interfering with client code or other libraries. It's relatively straightforward (*), but there are a few little details that took me a while to get right, so here they are.
(* = ADDENDUM = not straightforward at all if multiple-inheritance is used and deletion can be done on arbitrary parts of the MI class)
//==================================================================
/*
subtlety : just calling placement new can be problematic; it's safer to make an explicit selection
of placement new. This is how we call the constructor.
*/
enum EnumForPlacementNew { ePlacementNew };
// explicit access to placement new when there's ambiguity :
// if there are *any* custom overrides to new() then placement new becomes ambiguous
inline void* operator new (size_t, EnumForPlacementNew, void* pReturn) { return pReturn; }
inline void operator delete(void*, EnumForPlacementNew, void*) { }
#ifdef __STRICT_ALIGNED
// subtlety : some stdlibs have a non-standard operator new with alignment (second arg is alignment)
// note that the alignment is not getting passed to our malloc here, so you must ensure you are
// getting it in some other way
inline void* operator new (size_t , size_t, EnumForPlacementNew, void* pReturn) { return pReturn; }
#endif
// "MyNew" macro is how you do construction
/*
subtlety : trailing the arg list off the macro means we don't need to do this kind of nonsense :
template <typename Entry,typename t_arg1,typename t_arg2,typename t_arg3,typename t_arg4,typename t_arg5,typename t_arg6,typename t_arg7,typename t_arg8,typename t_arg9>
static inline Entry * construct(Entry * pEntry, t_arg1 arg1, t_arg2 arg2, t_arg3 arg3, t_arg4 arg4, t_arg5 arg5, t_arg6 arg6, t_arg7 arg7, t_arg8 arg8, t_arg9 arg9)
*/
// Stuff * ptr = MyNew(Stuff) (constructor args);
// eg. for void args :
// Stuff * ptr = MyNew(Stuff) ();
#define MyNew(t_type) new (ePlacementNew, (t_type *) MyMalloc(sizeof(t_type)) ) t_type
// call the destructor :
template <typename t_type>
static inline t_type * destruct(t_type * ptr)
{
    RR_ASSERT( ptr != NULL );
    ptr->~t_type();
    return ptr;
}
// MyDelete is how you kill a class
/*
subtlety : I like to use a Free() which takes the size of the object. This is a big optimization
for the allocator in some cases (or lets you not store the memory size in a header of the allocation).
*But* if you do this, you must ensure that you don't use sizeof() if the object is polymorphic.
Here I use MSVC's nice __has_virtual_destructor() extension to detect if a type is polymorphic.
*/
template <typename t_type>
void MyDeleteNonVirtual(t_type * ptr)
{
    RR_ASSERT( ptr != NULL );
    #ifdef _MSC_VER
    RR_COMPILER_ASSERT( ! __has_virtual_destructor(t_type) );
    #endif
    destruct(ptr);
    MyFree_Sized((void *)ptr,sizeof(t_type));
}
template <typename t_type>
void MyDeleteVirtual(t_type * ptr)
{
    RR_ASSERT( ptr != NULL );
    destruct(ptr);
    // can't use size :
    MyFree_NoSize((void *)ptr);
}
#ifdef _MSC_VER
// on MSVC , MyDelete can select the right call at compile time
template <typename t_type>
void MyDelete(t_type * ptr)
{
    if ( __has_virtual_destructor(t_type) )
    {
        MyDeleteVirtual(ptr);
    }
    else
    {
        MyDeleteNonVirtual(ptr);
    }
}
#else
// must be safe and use the polymorphic call :
#define MyDelete MyDeleteVirtual
#endif
and the end result is that you can do :
foo * f = MyNew(foo) ();
MyDelete(f);
and you get normal construction and destruction but with your own allocator, and without polluting
(or depending on) the global linker space. Yay.
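On compilers without the MSVC extension, a rough equivalent can be sketched with the standard `std::has_virtual_destructor` trait and tag dispatch, so that only the matching free path is ever instantiated for a given type. `MyMalloc`, `MyFree_Sized` and `MyFree_NoSize` here are trivial stand-ins over malloc/free, and `MyNewT` / `MyDeleteT` / `FreeFor` are hypothetical names for a variadic-template variant of the macro; none of this is the original library code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Stand-in allocator entry points (assumptions for this sketch).
inline void * MyMalloc(std::size_t size)          { return std::malloc(size); }
inline void   MyFree_Sized(void * p, std::size_t) { std::free(p); }
inline void   MyFree_NoSize(void * p)             { std::free(p); }

// Variadic construct : allocate from MyMalloc, then run the constructor in place.
// "::new" explicitly selects the global placement form.
template <typename T, typename... Args>
T * MyNewT(Args &&... args)
{
    void * mem = MyMalloc(sizeof(T));
    assert(mem != nullptr);
    return ::new (mem) T(std::forward<Args>(args)...);
}

// Polymorphic : the dynamic type may be larger than T, so no size is passed.
template <typename T>
void FreeFor(T * ptr, std::true_type)  { MyFree_NoSize(static_cast<void *>(ptr)); }

// Non-polymorphic : sizeof(T) is the real allocation size.
template <typename T>
void FreeFor(T * ptr, std::false_type) { MyFree_Sized(static_cast<void *>(ptr), sizeof(T)); }

template <typename T>
void MyDeleteT(T * ptr)
{
    assert(ptr != nullptr);
    ptr->~T();
    FreeFor(ptr, std::has_virtual_destructor<T>());
}
```

The multiple-inheritance caveat from the addendum still applies: the pointer handed to the free function has to be the one that came back from the allocator.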
1. Global variables should not be allowed without a prefix "global".
(optionally). eg.
int x; // compile failure!
static int y;
global int x; // ok
(this should be turned on as an option). I should be able to search for "global" and find them all
(and remove them, because globals are pointless and horrible, especially in the modern age of multi-threading).
2. Name-hiding should require an explicit "hide" prefix, eg.
int x;
{
    int x; // compile failure !
    hide int x; // ok
}
I hate name-hiding and think it never should have been in the language. It does nothing
good and creates lots of bugs. Similarly :
3. Overloads and virtual overrides should have an explicit prefix.
This ensures that you are adding an overload or virtual override intentionally, not by accident just because the names happen to line up. The entire C++ overload/override method only works by coincidence; like it's a coincidence that these names are the same so they are an overload; there's no way to say "I intend this to be an overload, please be an error if it's not" (and the opposite; I intend this to be a unique function, please error if it is an overload).
Similarly, it catches the very common error that you wrote a virtual override and it all worked and then somebody changes the signature in the base class and suddenly you have code that still compiles but the virtuals no longer override.
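For the virtual-override half of this, C++11 added the `override` specifier: a function marked `override` that doesn't actually override a base virtual is a compile error. It is opt-in rather than required, so it doesn't catch the accidental-override direction, and it does nothing for plain overloads. A small example (the class and function names are made up):

```cpp
#include <cassert>

struct Base
{
    virtual ~Base() { }
    virtual int Cost(int units) const { return units; }
};

struct Derived : Base
{
    // If Base::Cost's signature changes (say, to take a long), this line
    // becomes a compile error instead of silently turning into a new function.
    int Cost(int units) const override { return units * 2; }
};

// Calls through the base interface, so the override is what gets exercised.
inline int CostViaBase(const Base & b, int units) { return b.Cost(units); }
```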
4. C-style cast should be (optionally) an error.
I should be able to flip a switch and make C-style casts illegal. They are the devil, too easy to abuse, and impossible to search for. There should be a c_cast that provides the same action in a more verbose way.
5. Uninitialized variables should be an error. (optionally).
int x; // compile failure!
int x(0); // ok
Duh. Of course. Same thing with member variables in class constructors. It's ridiculous that it's so easy to use uninitialized memory.
6. Less undefined behavior.
Everything that's currently "undefined" should be a compile error by default. Then let me make it
allowed by setting pragmas, like :
#require signed_shift
#require flat_memory_model
which then changes those usages from compile errors to clearly defined operations.
7. Standardize everything that's sort of outside the standard, such as common pragmas, warning disablement, etc.
The way to do this is to have a chunk of standard that's like "you don't have to implement this, but if you do the syntax must be like so" and then provide a way to check if it's implemented.
Just a standard syntax for warning disablement would be great (and of course the ability to do it in ranges or even C scopes).
Things like SIMD could also be added to the language in this way; just simple things like "if you have a simd type it is named this" would massively help with standardizing shit which is different on every platform/compiler for no good reason.
8. (optionally) disable generation of all automatic functions, eg. assignment, copy construct, default construct. The slight short-term convenience of these being auto-generated is vastly outweighed by the bugs they create when you use them accidentally on classes that should not be copied. Of course I know how to put non-copyable junk in classes but I shouldn't have to do that; that should be the default, and they should only be copyable when I explicitly say it's okay. And you shouldn't have to write that out either, there should be a "use default copy" directive that you put in the class when you know it's okay.
9. Trivial reflection. Just give me a way to say "on all member variables". There's no reason not to have this,
it's so easy for the compiler to add, just give me a way to do :
template <typename t_op>
void Reflect( t_op & op )
{
    op(member1);
    op(member2);
    ...
}
In which the compiler generates the list of members for me, I don't have to manually do it.
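To show why this is useful even in hand-written form, here is a small sketch of one `Reflect` driving two different ops. `Particle`, `SumOp` and `CountOp` are made-up names; the member list inside `Reflect` is exactly the part the compiler would ideally generate.

```cpp
#include <cassert>

struct Particle
{
    float x, y, mass;

    // Hand-written member list; this is the boilerplate to be automated.
    template <typename t_op>
    void Reflect( t_op & op )
    {
        op(x);
        op(y);
        op(mass);
    }
};

// Op that adds up every member it is shown.
struct SumOp
{
    float total = 0;
    void operator()(float v) { total += v; }
};

// Op that just counts members, whatever their type.
struct CountOp
{
    int count = 0;
    template <typename T> void operator()(const T &) { ++count; }
};
```

The same pattern covers serialization, printing, byte-swapping and so on: one op per operation, no per-class code beyond the member list.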
Even something as simple as "write part of this buffer to a file" constantly causes me pain, because implied in that operation is "the buffer must not be freed until the write is done" , "the buffer should not be changed in the area being written until the write is done" , and "the file should not be closed until the write is done".
When you first start out and aren't doing a lot of complicated ops, it doesn't seem too bad, you can keep those things in your head; they become "comment-enforced" rules; that is, the code doesn't make itself correct, you have to write comments like "// write is pending, don't free buffer yet" (often you don't actually write the comments, but they're still "comment-enforced" as opposed to "code-enforced").
I think the better way is the very-C++-y Oodle futures.
Oodle futures rely on every object they take as inputs having refcounts, so there is no issue of free before exit. Some key points about the Oodle futures that I think are good :
A. Dependencies are automatic based on your arguments. You depend on anything you take as arguments. If the arguments themselves depend on async ops, then you depend on the chain of ops automatically. This is super-sweet and just removes a ton of bugs. You are then required to write code such that all your dependencies are in the form of function arguments, which at first is a pain in the ass, but actually results in much cleaner code overall because it makes the expression of dependencies really clear (as opposed to just touching some global deep inside your function, which creates a dependency in a really nasty way).
B. Futures create implicit async handles; the async handles in Oodle future are all ref-counted so they clean themselves automatically when you no longer care about them. This is way better than the manual lifetime management in Oodle right now, in which you have to hold a bunch of handles.
C. It's an easy way to plug in the result of one async op into the input of the next one. It's like an imperative way of using code to do that graph drawing thing ; "this op has an output which goes into this input slot". Without an automated system for this, what I'm doing at the moment is writing lots of little stub functions that just wait on one op, gather up its results and start the next op. There's no inefficiency in this, it's the same thing the future system does, but it's a pain in the ass.
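Oodle futures aren't shown here, but the "your dependencies are your arguments" idea can be sketched with the standard library: each stage takes the previous stage's `std::shared_future` as an argument and so implicitly waits on it. The function names are hypothetical, and `std::async` is only a stand-in for a real job system.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Stage 1 : stand-in for an async read.
inline std::vector<int> LoadValues()
{
    return std::vector<int>{1, 2, 3, 4};
}

// Stage 2 : depends on stage 1 only through its argument.
inline int SumValues(std::shared_future<std::vector<int>> input)
{
    const std::vector<int> & v = input.get(); // waits if stage 1 is still running
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0);
}

inline int RunChain()
{
    std::shared_future<std::vector<int>> loaded =
        std::async(std::launch::async, LoadValues).share();
    // The shared state is reference counted, so the buffer lives as long as any stage holds it.
    std::future<int> sum = std::async(std::launch::async, SumValues, loaded);
    return sum.get();
}
```

Unlike a real future system, stage 2 here occupies a thread while it waits; a job system would instead hold the task back until its inputs are ready.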
If I was restarting from scratch I would go even further. Something like :
1. Every object has a refcount AND a read-write lock built in. Maybe the refcount and RW lock count go together in one U32 or U64 which is maintained by lockfree ops.
Refcounting is obvious. Lifetimes of async ops are way too complicated without it.
The RW lock in every object is something that sophomoric programmers don't see the need for. They think "hey it's a simple struct, I fill it on one thread, then pass it to another thread, and he touches it". No no no, you're a horrible programmer and I don't want to work with you. It seems simple at first, but it's just so fragile and prone to bugs any time you change anything, it's not worth it. If every object doesn't just come with an RW lock it's too easy to be lazy and skip adding one, which is very bad. If the lock is uncontended, as in the simple struct handoff case above, then it's very cheap, so just use it anyway.
2. Whenever you start an async op on an object, it takes a ref and also takes either a read lock or write lock.
3. Buffers are special in that you RW lock them in ranges. Same thing with textures and such. So you can write non-overlapping ranges simultaneously.
4. Every object has a list of the ops that are pending on that object. Any time you start a new op on an object, it is delayed until those pending ops are done. Similarly, every op has a list of objects that it takes as input, and won't run until those objects are ready.
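A minimal sketch of point 1, a refcount and an RW lock packed into one U32 and maintained with atomic ops. The bit layout and the class name are assumptions made for illustration; it only offers try-lock calls (no waiting, no pending-op list) and does not guard against counter overflow.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

typedef std::uint32_t U32;

// Hypothetical layout of the one 32-bit word :
//   bits  0-15 : reference count
//   bits 16-30 : number of active readers
//   bit  31    : writer flag
class RefAndRWLock
{
public:
    RefAndRWLock() : m_state(0) { }

    void AddRef() { m_state.fetch_add(kRefOne); }

    // Returns true when the last reference was dropped.
    bool Release()
    {
        U32 old = m_state.fetch_sub(kRefOne);
        assert((old & kRefMask) != 0);
        return (old & kRefMask) == 1;
    }

    bool TryReadLock()
    {
        U32 old = m_state.load();
        while ((old & kWriterBit) == 0)
        {
            // on failure compare_exchange_weak reloads 'old' and the loop re-checks it
            if (m_state.compare_exchange_weak(old, old + kReaderOne))
                return true;
        }
        return false; // a writer holds the lock
    }
    void ReadUnlock() { m_state.fetch_sub(kReaderOne); }

    bool TryWriteLock()
    {
        U32 old = m_state.load();
        while ((old & (kWriterBit | kReaderMask)) == 0)
        {
            if (m_state.compare_exchange_weak(old, old | kWriterBit))
                return true;
        }
        return false; // readers or another writer are active
    }
    void WriteUnlock() { m_state.fetch_and(~kWriterBit); }

    U32 RefCount() const { return m_state.load() & kRefMask; }

private:
    static const U32 kRefOne     = 1u;
    static const U32 kRefMask    = 0x0000FFFFu;
    static const U32 kReaderOne  = 1u << 16;
    static const U32 kReaderMask = 0x7FFF0000u;
    static const U32 kWriterBit  = 1u << 31;
    std::atomic<U32> m_state;
};
```

In the uncontended hand-off case each lock or unlock is a single atomic op on a word the object already has, which is the "very cheap, so just use it anyway" argument.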
The other big thing I would do in a rewrite from scratch is the basic architecture :
1. Write all my own threading primitives (semaphore, mutex, etc) and base them on a single waitset. (I basically have this already).
2. Write stack-ful coroutines.
3. When the low level Wait() is called on a stackful coroutine, instead yield the coroutine.
That way the coroutine code can just use Semaphore or whatever, and when it goes to wait on the semaphore, it will yield instead. It makes the coroutine code exactly the same as non-coroutine code and makes it "composable" (eg. you can call functions and they actually work), which I believe is crucial to real programming. This lets you write stackful coroutine code that does file IO or waits on async ops or whatever, and when you hit some blocking code it just automatically yields the coroutine (instead of blocking the whole worker thread).
This would mean that you could write coroutine code without any special syntax; so eg. you can call the same functions from coroutines as you do from non-coroutines and it Just Works the way you want. Hmm I think I wrote the same sentence like 3 times, but it's significant enough to bear repetition.
It's an input race, as previously discussed here
What happens is, you hit Start, and you get your focus in the type-in-a-program edit box. That part is fine. You type in a program name. At that point it does the search in the start menu thing in the background (it doesn't stall after each key press). In many cases there will be a bit of a delay before it updates the list of matching programs found.
If you hit Enter before it finds the program and highlights it, it just closes the dialog and doesn't run anything. If you wait a beat before hitting enter, the background program-finder will highlight the thing and hitting enter will work.
Very shitty. The start menu should not have keyboard input races. In this case the solution is obvious and trivial - when you hit enter it should wait on the background search task before acting on that key (but if you hit escape it should immediately close the window and abort the task without waiting).
I've long been an advocate of video game programmers doing "flakiness" testing by playing the game at 1 fps, or capturing recordings of the game at the normal 30 fps and then watching them play back at 1 fps. When you do that you see all sorts of janky shit that should be eliminated, like single frame horrible animation pops, or in normal GUIs you'll see things like the whole thing redraw twice in a row, or single frames where GUI elements flash in for 1 frame in the wrong place, etc.
Things like input races can be very easily found if you artificially slow down the program by 100X or so, so that you can see what it's actually doing step by step.
I'm a big believer in eliminating this kind of flakiness. Almost nobody that I've ever met in development puts it as a high priority, and it does take a lot of work for apparently little reward, and if you ask consumers they will never rate it highly on their wish list. But I think it's more important than people realize; I think it creates a feeling of solidness and trust in the application. It makes you feel like the app is doing what you tell it to, and if your avatar dies in the game it's because of your own actions, not because the stupid game didn't jump even though you hit the jump button because there was one frame where it wasn't responding to input.
cbloom rants 09-02-12 - Encoding Values in Bytes Part 1
cbloom rants 09-02-12 - Encoding Values in Bytes Part 2
cbloom rants 09-02-12 - Encoding Values in Bytes Part 3
cbloom rants 09-04-12 - Encoding Values in Bytes Part 4
cbloom rants 09-04-12 - LZ4 Optimal Parse
cbloom rants 09-10-12 - LZ4 - Large Window
cbloom rants 09-11-12 - LZ MinMatchLen and Parse Strategies
cbloom rants 09-13-12 - LZNib
cbloom rants 09-14-12 - Things Most Compressors Leave On the Table
cbloom rants 09-15-12 - Some compression comparison charts
cbloom rants 09-23-12 - Patches and Deltas
cbloom rants 09-24-12 - LZ String Matcher Decision Tree
cbloom rants 09-28-12 - LZNib on enwik8 with Long Range Matcher
cbloom rants 09-30-12 - Long Range Matcher Notes
cbloom rants 10-02-12 - Small note on LZHAM
cbloom rants 10-04-12 - Hash-Link match finder tricks
cbloom rants 10-05-12 - OodleLZ Encoder Speed Variation with Worker Count
cbloom rants 10-07-12 - Small Notes on LZNib
cbloom rants 10-16-12 - Two more small notes on LZNib
And some little additions :
First a correction/addendum on cbloom rants 09-04-12 - LZ4 Optimal Parse :
I wrote before that going beyond the 15 states needed to capture the LRL overflowing the control byte doesn't help much (or at all). That's true if you only go up to 20 or 30 or 200 states, but if you go all the way to 270 states, so that you capture the transition to needing another byte, there is some win to be had (LZ4P-LO-332 got lztestset to 12714031 with small optimal state set, 12492631 with large state set).
If you just do it naively, it greatly increases memory use and run time. However, I realized that there is a better way. The key is to use the fact that there are so many code-cost ties. In LZ-Bytewise with the large state set, often the coding decision in a large number of states will have the same cost, and furthermore often the end point states will all have the same cost. When this happens, you don't need to make the decision independently for each state, instead you make one decision for the entire block, and you store a decision for a range of states, instead of one for each state.
eg. to be explicit, instead of doing :
in state 20 at pos P
    consider coding a literal (takes me to state 21 at pos P+1)
    consider various matches (takes me to state 0 at pos P+L)
    store best choice in table[P][20]
in state 21 ...
do :
in states 16-260 at pos P
    consider coding a literal (takes me to states 17-261 at pos P+1 which I saw all have the same cost)
    consider various matches (takes me to state 0 at pos P+L)
    store in table[P] : range {16-260} makes decision X
in states 261-263 ...
so you actually can do the very large optimal parse state set with not much increase in run time or memory use.
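A minimal sketch of the storage side of this: decisions kept per position as a short list of state ranges rather than one entry per state. All names and the decision encoding are hypothetical. It only illustrates the memory saving; the run-time saving described above comes from evaluating a whole range once, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One stored decision covering a contiguous run of parse states.
struct RangeDecision
{
    int first_state; // inclusive
    int last_state;  // inclusive
    int decision;    // hypothetical encoding : 0 = literal, otherwise a match id
};

// All decisions for one position, in increasing state order.
typedef std::vector<RangeDecision> PosDecisions;

// Record a decision for a state; extends the previous range when the state
// is adjacent and the decision is identical (the "tie" case).
inline void StoreDecision(PosDecisions & pos, int state, int decision)
{
    if (!pos.empty())
    {
        RangeDecision & back = pos.back();
        assert(state > back.last_state); // states must be added in increasing order
        if (back.decision == decision && back.last_state + 1 == state)
        {
            back.last_state = state;
            return;
        }
    }
    RangeDecision r = { state, state, decision };
    pos.push_back(r);
}

// Look up the decision for a state; returns -1 if none was stored.
inline int FindDecision(const PosDecisions & pos, int state)
{
    for (std::size_t i = 0; i < pos.size(); ++i)
        if (state >= pos[i].first_state && state <= pos[i].last_state)
            return pos[i].decision;
    return -1;
}
```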
Second : I did a more complex variant of LZ4P (large window). LZ4P-LO includes "last offset". LZ4P-LO-332 uses a 3-bit-3-bit-2-bit control word (as described previously here : cbloom rants 09-10-12 - LZ4 - Large Window ) ; the 2 bit offset reserves one value for LO and 3 values for normal offsets.
(I consider this an "LZ4" variant because (unlike LZNib) it sends LZ codes as a strictly alternating LRL-ML pairs (LRL can be zero) and the control word of LRL and ML is in one byte)
Slightly better than LZ4P-LO-332 is LZ4P-LO-695 , where the numbering has switched from bits to number of values (so 332 should be 884 for consistency). You may have noticed that 6*9*5 = 270 does not fit in a byte, but that's fixed easily by forbidding some of the possibilities. 6-9-5 = 6 values for literals, 9 for match lengths, and 5 for offsets. The 5 offsets are LO + 2 bits of normal offset. So for example one of the ways that the 270 values are reduced is that an LO match can never occur after an LRL of 0 (the previous match would have just been longer), so those combinations are removed from the control byte.
LZ4P-LO-695 is not competitive with LZNib unless you spill the excess LRL and ML (the amount that is too large to fit in the control word) to nibbles, instead of spilling to bytes as in the original LZ4 and LZ4P. Even with spilling to nibbles, it's no better than LZNib. Doing LZ4P-LO-695, I found a few bugs in LZNib, so its results also got better.
Thirdly, current numbers :
| file | raw | lz4 | lz4p332 | lz4plo695 | lznib d8 | zlib | OodleLZHLW |
| lzt00 | 16914 | 6473 | 6068 | 6012 | 5749 | 4896 | 4909 |
| lzt01 | 200000 | 198900 | 198880 | 198107 | 198107 | 198199 | 198271 |
| lzt02 | 755121 | 410695 | 292427 | 265490 | 253935 | 386203 | 174946 |
| lzt03 | 3471552 | 1820761 | 1795951 | 1745594 | 1732491 | 1789728 | 1698003 |
| lzt04 | 48649 | 16709 | 15584 | 15230 | 14352 | 11903 | 10679 |
| lzt05 | 927796 | 460889 | 440742 | 420541 | 413894 | 422484 | 357308 |
| lzt06 | 563160 | 493055 | 419768 | 407437 | 398780 | 446533 | 347495 |
| lzt07 | 500000 | 265688 | 248500 | 240004 | 237120 | 229426 | 210182 |
| lzt08 | 355400 | 331454 | 322959 | 297694 | 302303 | 277666 | 232863 |
| lzt09 | 786488 | 344792 | 325124 | 313076 | 298340 | 325921 | 268715 |
| lzt10 | 154624 | 15139 | 13299 | 11774 | 11995 | 12577 | 10274 |
| lzt11 | 58524 | 25832 | 23870 | 22381 | 22219 | 21637 | 19132 |
| lzt12 | 164423 | 33666 | 30864 | 29023 | 29214 | 27583 | 24101 |
| lzt13 | 1041576 | 1042749 | 1040033 | 1039169 | 1009055 | 969636 | 923798 |
| lzt14 | 102400 | 56525 | 53395 | 51328 | 51522 | 48155 | 46422 |
| lzt15 | 34664 | 14062 | 12723 | 11610 | 11696 | 11464 | 10349 |
| lzt16 | 21504 | 12349 | 11392 | 10881 | 10889 | 10311 | 9936 |
| lzt17 | 53161 | 23141 | 22028 | 21877 | 20857 | 18518 | 17931 |
| lzt18 | 102400 | 85659 | 79138 | 74459 | 76335 | 68392 | 59919 |
| lzt19 | 768771 | 363217 | 335912 | 323886 | 299498 | 312257 | 268329 |
| lzt20 | 1179702 | 1045179 | 993442 | 973791 | 955546 | 952365 | 855231 |
| lzt21 | 679936 | 194075 | 113461 | 107860 | 102857 | 148267 | 83825 |
| lzt22 | 400000 | 361733 | 348347 | 336715 | 331960 | 309569 | 279646 |
| lzt23 | 1048576 | 1040701 | 1035197 | 1008638 | 989387 | 777633 | 798045 |
| lzt24 | 3471552 | 2369885 | 1934129 | 1757927 | 1649592 | 2289316 | 1398291 |
| lzt25 | 1029744 | 324190 | 332747 | 269047 | 230931 | 210363 | 96745 |
| lzt26 | 262144 | 246465 | 244990 | 239816 | 239509 | 222808 | 207600 |
| lzt27 | 857241 | 430350 | 353497 | 315394 | 328666 | 333120 | 223125 |
| lzt28 | 1591760 | 445806 | 388712 | 376137 | 345343 | 335243 | 259488 |
| lzt29 | 3953035 | 2235299 | 1519904 | 1451801 | 1424026 | 1805289 | 1132368 |
| lzt30 | 100000 | 100394 | 100393 | 100010 | 100013 | 100020 | 100001 |
| total | 24700817 | 14815832 | 13053476 | 12442709 | 12096181 | 13077482 | 10327927 |
And comparison charts on the aggregated single file lzt99 :
Speeds are the best of 20 trials on each core; speed is the best of either x86 or x64 (usually x64 is faster). The decode times
measured are slightly lower for everybody in this post (vs the last post of this type) because of the slightly more rigorous timing runs.
For reference the decode speeds I measured are (mb/s) :
LZ4 : 1715.10235
LZNib : 869.1924302
OodleLZHLW: 287.2821629
zlib : 226.9286645
LZMA : 31.41397495
Also LZNib current enwik8 size :
(parallel chunking (8 MB chunks) and LRM 12/12 with bubble)
LZNib enwik8 mml3 : 30719351
LZNib enwik8 stepml : 30548818
(all other LZNib results are for mml3)
To be clear, there are two compression steps :
{ Raw structs } --[ad hoc]--> { Bit Packed } --[compressor]--> { Transmitted Data }
What you actually want to minimize is the size of the final transmitted data, which is not necessarily achieved with the
smallest bit packed data.
The ideal scenario is if you know your back-end compressor, simply try a variety of ways of packing and measure the final size. You should always start with completely un-packed data, which often is a reasonable way to go. It's also important to keep in mind the speed hit of bit packing. Compressors (in particular, decompressors) are very fast, so even though your bit-packing may just consist of some simple math, it actually can very easily be much slower than the back-end decompressor. Many people incorrectly spend CPU time doing pre-compression bit-packing, when they would be better off spending that same CPU time by just running a stronger compressor and not doing any twiddling themselves.
The goal of bit-packing should really be to put the data in a form that the compressor can model efficiently. Almost all compressors assume an 8-bit alphabet, so you want your data to stay in 8-bit form (eg. use bit-aligned packing, don't use non-power-of-2 multiplies to tightly pack values if they will cross a byte boundary). Also almost all compressors, even the best in the world (PAQ, etc) primarily achieve compression by modeling correlation between neighboring bytes. That means if you have data that does not have the property of maximum correlation to its immediate neighbor (and steady falloff) then some swizzling may help, just rearranging bytes to put the correlated bytes near each other and the uncorrelated bytes far away.
Some issues to consider :
1. Lossy bit packing.
Any time you can throw away bits completely, you have a big opportunity that you should exploit (which no back end compressor can ever do, because it sends data exactly). The most common case of this is if you have floats in your struct. Almost always there are several bits in a float which are pure garbage, just random noise which is way below the error tolerance of your app. Those bits are impossible to compress and if you can throw them away, that's pure win. Most floats are better transmitted as something like a 16 bit fixed point, but this requires application-specific knowledge about how much precision is really needed.
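As a rough illustration of the "16 bit fixed point" idea, here is a minimal sketch in Python. The helper names and the assumption that you know a valid range [lo, hi] for the value are hypothetical; how much precision you can really throw away is app-specific.

```python
def quantize_fixed16(x, lo, hi):
    """Map a float in [lo, hi] to an unsigned 16-bit integer (0..65535).

    lo/hi are an assumed, application-supplied range; values outside
    the range are clamped.
    """
    clamped = min(max(x, lo), hi)
    t = (clamped - lo) / (hi - lo)
    return int(round(t * 65535))


def dequantize_fixed16(q, lo, hi):
    """Inverse of quantize_fixed16, up to the quantization error."""
    return lo + (q / 65535) * (hi - lo)
```

The round-trip error is at most half a quantization step, ie. (hi - lo) / 65535 / 2, so whether 16 bits is enough comes down to comparing that against your error tolerance.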
Even if you decide you can't throw away those bits, something that can help is just to get them out of the main stream. Having some random bytes mixed in to an otherwise nicely compressible stream really mucks up the order-0 statistics, so just putting them on the side is a nice way to go. eg. you might take the bottom 4 or 8 bits out of each float and just pass them uncompressed.
(in practical bone-head tips, it's pretty common for un-initialized memory to be passed to compressors; eg. if your structs are padded by C so there are gaps between values, put something highly compressible in the gap, like zero or a duplicate of the neighboring byte)
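A minimal sketch of getting the noisy low bits out of the main stream, assuming 32-bit IEEE floats and hypothetical helper names. The high parts go to the compressor; the low parts could be passed through uncompressed.

```python
import struct


def split_float_streams(values, low_bits=8):
    """Split each float32 bit pattern into (high part, low mantissa bits).

    low_bits is an assumed tuning knob (the text suggests 4 or 8).
    """
    mask = (1 << low_bits) - 1
    high, low = [], []
    for v in values:
        # Reinterpret the float32 as an unsigned 32-bit integer.
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        high.append(bits >> low_bits)
        low.append(bits & mask)
    return high, low


def join_float_streams(high, low, low_bits=8):
    """Rebuild the float32 values from the two streams (lossless)."""
    out = []
    for h, l in zip(high, low):
        bits = (h << low_bits) | l
        out.append(struct.unpack("<f", struct.pack("<I", bits))[0])
    return out
```

This split is lossless; it only moves the hard-to-compress bits to the side so they don't pollute the statistics of the rest.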
2. Relationships between values.
Any time you have a struct where the values are not completely independent, you have a good opportunity for packing. Obviously there are cases where one value in a struct can be computed from another and should just not be sent.
There are more subtle cases, like if A = 1 then B has certain statistics (perhaps it's usually high), while if A = 0 then B has other statistics (perhaps it's usually low). In these cases there are a few options. One is just to rearrange the transmission order so that A and B are adjacent. Most back end compressors model correlation between values that are adjacent, so putting the most-related values in a struct next to each other will let the back end find that correlation.
There are also often complicated mathematical relationships. A common case is a normalized vector; the 3 values are constrained in a way that the compressor will never be able to figure out (proof that current compressors are still very far away from the ideal of perfect compression). When possible you want to reduce these related values to their minimal set; another common case is rotation matrices, where 9 floats (36 bytes) can be reduced to 3 fixed points (6-9 bytes).
This is really exactly the same as the kinds of variable changes that you want to do in physics; when you have a lot of values in a struct that are constrained together in some way, you want to identify the true number of degrees of freedom, and try to convert your values into independent unconstrained variables.
When numerical values are correlated to their neighbors, delta transformation may help. (this particularly helps with larger-than-byte values where a compressor will have a harder time figuring it out)
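A minimal sketch of a delta transform on larger-than-byte values, assuming unsigned 16-bit values and wrap-around arithmetic (the function names are hypothetical):

```python
def delta_encode(values, bits=16):
    """Replace each value with its difference from the previous one (mod 2^bits)."""
    mask = (1 << bits) - 1
    prev = 0
    out = []
    for v in values:
        out.append((v - prev) & mask)
        prev = v
    return out


def delta_decode(deltas, bits=16):
    """Invert delta_encode by accumulating the differences (mod 2^bits)."""
    mask = (1 << bits) - 1
    prev = 0
    out = []
    for d in deltas:
        prev = (prev + d) & mask
        out.append(prev)
    return out
```

For example, delta_encode([1000, 1002, 1001, 1005]) gives [1000, 2, 65535, 4]; the small negative step wraps to 65535. Whether that wrapped form or a sign-folded form is better for your back end is something to measure.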
3. Don't mash together statistics.
A common mistake is to get too aggressive with mashing together values into bits in a way that wrecks the back-end statistical model. Most back end compressors work best if the bytes in the file all have the same probability histogram; that is, they are drawn from the same "source". (as noted in some of the other points, if there are multiple unrelated "sources" in your one data stream, the best thing to do is to separate them from each other in the buffer)
Let me give a really concrete example of this. Say you have some data which has lots of unused space in its bytes, something like :
bytes in the original have values :
0000 + 4 bits from source "A"
0001 + 4 bits from source "B"
(when I say "from source" I mean a random value drawn under a certain probability distribution)
You might be tempted to bit-pack these to compact them before the back end compressor. You might do something like this :
Take the top 4 bits to make a flag bit
Take 8 flag bits and put them in a byte
Then take the 4 bits of either A or B and put them together in the high and low nibble of a byte
eg, in nibbles :
0A 1B 1B 0A 0A 0A 1B 0A
--[bit packed]-->
01100010 (binary) + ABBAAABA (nibbles)
(and A and B are not the hex numbers but mean 4 bits drawn from that source)
It looks like you have done a nice job of packing, but in fact you've really wrecked the data. The sources A and B had different statistics,
and in the original form the compressor would have been able to learn that, because the flag bit was right there in the byte with the payload.
But by packing it up tightly what you have done is made a bunch of bytes whose probability model is a mix of {bit flags},{source A},{source B},
which is a big mess.
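For concreteness, the tempting packing described above might look like this sketch (Python, hypothetical function name; it assumes the input length is a multiple of 8 and the top nibble is always 0 or 1):

```python
def pack_flags_and_nibbles(data):
    """Pack groups of 8 bytes into 1 flag byte + 4 bytes of payload nibbles.

    Each input byte is (flag << 4) | payload, with flag in {0, 1}.
    """
    assert len(data) % 8 == 0
    out = bytearray()
    for i in range(0, len(data), 8):
        group = data[i:i + 8]
        flags = 0
        for b in group:
            # Collect the 8 flag bits, first byte in the top bit.
            flags = (flags << 1) | ((b >> 4) & 1)
        out.append(flags)
        for j in range(0, 8, 2):
            # Two payload nibbles per output byte; A and B payloads get mixed.
            out.append(((group[j] & 0xF) << 4) | (group[j + 1] & 0xF))
    return bytes(out)
```

The thing to do is run your real data both ways, eg. compare len(zlib.compress(original)) against len(zlib.compress(pack_flags_and_nibbles(original))), rather than assume the packed form wins.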
I guess a related point is :
4. Even straightforward bit packing doesn't work for the reasons you think it does.
Say for example you have a bunch of bytes which only take on the values 0-3 (eg. use 2 bits). You might think that it would be a big win to do your own bit packing before the compressor and cram 4 bytes together into one. Well, maybe.
The issue is that the back end compressor will be able to do that exact same thing just as well. It can see that the bytes only take values 0-3 and thus will send them as 2 bits. It doesn't really need your help to see that. (you could help it if you had say some values that you knew were in 0-3 and some other values you knew were in 0-7, you might de-interleave those values so they are separated in the file, or somehow include their identity in the value so that their statistics don't get mixed up; see #5)
However, packing the bytes down can help in some cases. One is if the values are correlated to their neighbors; by packing them you get more of them near each other, so the correlation is modeled at an effective higher order. (eg. if the back end only used order-0 literals, then by packing you get order-3 (for one of the values anyway)). If the values are not neighbor-correlated, then packing will actually hurt.
(with a Huffman back end packing can also help because it allows you to get fractional bits per original value)
Also for small window LZ, packing down effectively increases the window size. Many people see advantages to packing data down before feeding it to Zip, but largely that is just reflective of the tiny 32k window in Zip (left over from the DOS days and totally insane that we're still using it).
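The straightforward packing in question, as a minimal sketch (hypothetical helper names; values are assumed to be in 0-3, and the caller keeps the original count around for unpacking):

```python
def pack_2bit(values):
    """Cram four 2-bit values into each byte, first value in the low bits."""
    out = bytearray()
    for i in range(0, len(values), 4):
        b = 0
        for k, v in enumerate(values[i:i + 4]):
            b |= (v & 3) << (2 * k)
        out.append(b)
    return bytes(out)


def unpack_2bit(packed, count):
    """Recover `count` 2-bit values from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 3 for i in range(count)]
```

Again this is only a candidate transform; per the points above, it can help or hurt depending on neighbor correlation and the back end, so measure the final compressed size both ways.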
5. Separating values that are independent :
I guess I've covered this in other points but it's significant enough to be redundant about. If you have two different sources (A and B); and there's not much correlation between the two, eg. A's and B's are unrelated, but the A's are correlated to other A's - you should try to deinterleave them.
A common simple case is AOS vs SOA. When you have a bunch of structs, often each value in the struct is more related to the same value in its neighbor struct than to other values within its own struct (eg. struct0.x is related to struct1.x more than to struct0.y). In this case, you should transform from array-of-structs to struct-of-arrays ; that is, put all the .x's together.
For example, it's well known that DXT1 compresses better if you de-interleave the end point colors from the palette interpolation indices. Note that AOS-SOA transformation is very slow if done naively so this has to be considered as a tradeoff in the larger picture.
More generally when given a struct you want to use app-specific knowledge to pack together values that are strongly correlated and de-interleave values that are not.
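A minimal sketch of the AOS to SOA transform at byte granularity, assuming fixed-size records (the function names are hypothetical; a real implementation would de-interleave by field, not necessarily by byte):

```python
def aos_to_soa(data, record_size):
    """De-interleave: gather byte k of every record together, for each k."""
    assert len(data) % record_size == 0
    return b"".join(data[k::record_size] for k in range(record_size))


def soa_to_aos(data, record_size):
    """Re-interleave the output of aos_to_soa back into records."""
    assert len(data) % record_size == 0
    n = len(data) // record_size
    out = bytearray(len(data))
    for k in range(record_size):
        out[k::record_size] = data[k * n:(k + 1) * n]
    return bytes(out)
```

For example, aos_to_soa(b"aAbBcC", 2) gives b"abcABC". The strided slices here are the naive approach, which is exactly the kind of thing that can cost more time than the back-end decompressor.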
1. Because cost ties are common, and ties are not actually ties (due to "last offset"), just changing the order that you visit matches can change your compression. eg. if you walk matches from long to short or short to long or low offset to high offset, etc.
Another important way to break ties is for speed. Basically prefer long matches and long literal runs vs. a series of shorter ones that make the same output length. Because the code cost is integer bytes, you can do this pretty easily by just adding a small bias to the cost (one thousandth of a byte or whatever) each time you start a new match or LRL.
(more generally in an ideal world every compressor should have a lagrange parameter for space-speed tradeoff, but that's the kind of thing nobody ever gets around to)
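The speed tie-break can be sketched like this (Python; the token representation, the names, and the exact bias value are assumptions for illustration, not the actual parser):

```python
# Assumed bias: one thousandth of a byte per token (match or literal run).
TOKEN_BIAS = 0.001


def biased_cost(parse):
    """Cost of a parse given as a list of (kind, byte_cost) tokens.

    Byte costs are integers, so the bias only matters between parses
    with equal byte totals (as long as a parse has under 1000 tokens).
    """
    total_bytes = sum(cost for _, cost in parse)
    return total_bytes + TOKEN_BIAS * len(parse)


def pick_parse(candidates):
    """Choose the cheapest parse; among ties in bytes, the one with fewer tokens."""
    return min(candidates, key=biased_cost)
```

So between a parse of three 2-byte tokens and a parse of two 3-byte tokens (both 6 bytes of output), the two-token parse is chosen, which means fewer tokens for the decoder to process.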
2. Traditional LZ coders did not output matches unless they were cheaper than literals. That is, say you send a match len in 4 bits and an offset in 12 bits, so a match is 2 bytes - you would think that the minimum match length should be 3 - not 2 - because sending a 2 byte match is pointless (it's cheaper or the same cost to send those 2 bytes as literals (cheaper as literals if you are in a literal run-len already)). By using a larger MML, you can send higher match lengths in your 4 bits, so it should be a win.
This is not true if you have "last offset". With LO in your coder, it is often beneficial to send matches which are not a win (vs literals) on their own. eg. in the above example, minimum match length should be 2 in an LO coder.
This is one of those cases where text and binary data differ drastically. If you never tested on structured data you would not see this. Really the nature of LZ compression on text and binary is so different that it's worth considering two totally independent compressors (or at least some different tweaked config vals). Text match offsets fall off very steadily in a perfect curve, and "last offsets" are only used for interrupted matches, not for re-using an offset (and generally don't help that much). Binary match offsets have very sparse histograms with lots of strong peaks at the record sizes in the file, and "last offset" is used often just as a way of cheaply encoding the common record distance.
On text, it is in fact best to use an MML which makes matches strictly smaller than literals.
If I keep at this work in the future I'm sure I'll get around to doing an LZ specifically designed for structured data; it's sort of hopeless trying to find a compromise that works great on both; I see a lot more win possible.
The sun, the heat. The big open spaces and shade trees. It makes me want to get naked and feel the hand of the sun on my skin, run around in the field, it's the way human beings should be. The vast rolling hills just beg you to get on a horse and ride (though our shitty world of fences and private property make that only a fantasy).
The biking is just fantastic. Pretty deserted roads (though there's more traffic than I remember being here 10 years ago), decent pavement. Grape vines and oak trees all around, and lovely rolling terrain. I hate flats, and I kind of hate endless climbs; here I have the sweet mix, a hard sprint climb, and then a fun windy descent, then a bit of a gradual climb, up and down, lots of variety, never boring. Really great riding, and so many different routes with varying difficulty levels, all out in the country but close by.
The smell; maybe above all the thing that hits me any time I come back to California are the sweet smells; sage and grass up on the dry hillsides, and bay laurel down in the river hollows, the gentle breezes just rich with the wild smells.
October might be the best time here; the grapes are ripe and just about to be picked (in fact there are pickers working right now); you can smell them as you ride around, or stop and have a snack. Wine grapes are super delicious; they have much more interesting flavors than the garbage you get in grocery stores, tons of weird musky notes and caramel and just lots of complexity, not so sweet and boring.
I love that everyone drives fast in California. It just makes life so much easier for me, because I'm not constantly fighting the general flow around me. (in fact being used to Northwest driving, I'm often the slowest person here). I know that it doesn't mean that people here are actually more intelligent or better drivers, they're just following the regional habit the same way Northwest people are (the way people so uniformly just go with the habit of their area is a great demonstration of how little actual individuality anyone has; 99% of your "personality" is just where you live and the time and place you were raised), but man it is a fucking bummer driving in the Northwest with all the up-tight busy-body dumbass passive-aggressive speed-limit followers (who are actually very dangerous drivers, because they don't adapt their behavior to the situation at hand).
Seattle Joy :
This is a memory of July/August, trying to remind myself of the good things.
Being able to walk down and swim in the lake is incredibly nice. I love to just swim out a hundred feet or so and float; getting away from shore you get a view of Mt Rainier and the city skyline. Incredibly it really never gets too crowded in the lake, and even on busy boating days it clears out around twilight, which is one of the best times to be in the lake.
It's really magical when the blackberries get ripe all over the city. The sweet rich smell fills the air and you get it just everywhere as you walk around town. You can ride around Mercer Island and stop and snack as you go.
I've found some pretty decent river swims; they aren't the river swims of my dreams (too cold, and not private enough to get naked), but they are a joy on those rare hot summer days, when you get a bracing dip that shocks you and makes you feel alive.
One of the things that I totally take for granted is that we have no bugs. I completely forget that it's true until I go visit some place like PA or The South where you just can't even sit outside at all without being attacked. It absolutely sucks to live in places with bugs and it's some kind of bizarre miracle that we don't have them (it makes no sense to me that we don't, there's lots of water, and it doesn't get that cold, it should be ideal mosquito land, wtf).
Of course the high mountains are really incredible. Once again I only got backpacking twice; every year I tell myself that I need to go more next year, but it doesn't happen. One problem is that I feel like I can't take that much time off work; another problem is that just staying in the city and swimming and biking near home is so sweet in the summer that the motivation to go way out to the woods is reduced. Anyway, once again I swear I'll try to get out more often next year. It would be easier if I could work in the mountains.
Comparing the Northwest with California, I've had some revelations about what makes for really great driving/riding roads. The driving & riding around Seattle just sucks, and surprisingly in CA which is a much much more populous state, that doesn't really seem to be that much older, it's way better.
The key thing for great roads is that they are somewhat old and now disused. That is, there had to be some reason to put in good country roads long ago (mining, farming) but now there is not much reason for people to be on those roads, so they are low traffic. They have to be old enough to be made before earth-moving equipment, so they are nice and windy and narrow.
The problem with the Northwest is it's just too young. Habitation in the area is only 100 years old; there aren't farm roads from 150 years ago. The only old roads are logging roads and those are/were dirt and temporary. There's only a handful of nice windy old mountain pass roads, and they all are popular tourist attractions which makes them no good for me.
Of course one of the things that makes the Central Coast area so great is the strict development controls that keep the towns from creeping into the countryside and devouring it with endless suburbs. With no housing subdivisions on these old farm roads there's not much use for them, and that makes them heaven for a windy road lover.
Being back in SLO gives me some perspective on how badly I've lived my life. Walking around downtown Tasha asked me if I did this or that, did I go to college parties? did I surf? did I make wine? No, not really. What did I do all the time that I lived here? Pretty much just worked. What a retard.
I feel like I accomplish a lot more than the average programmer, and I like to think that it's because I'm smart and more efficient; I think I have good methodology and solve problems more directly, but maybe I don't, maybe I just work more. When I'm in the moment I can't see it, but any time I look back at my life with 10 years or more of distance I go "wtf I was just doing nothing but working the whole time".
Maybe that's just the way it is for everyone; you work and buy groceries and sleep and go through life without ever doing much.
In related news, I think going out to dinner and going to movies and such is a really horrible way to spend your time. It doesn't really impact your life, you don't remember it down the road, it's just a way of killing time, it's not much better than watching TV or drinking booze (which is the ultimate in "please just make this lifetime go away with as little involvement from me as possible").
What I mean is, when you treat someone "like an adult", you let them be responsible for their own decisions, you let them suffer the ill consequences of their own mistakes, and you listen to their words as if they mean what they say literally. When you treat someone "like a child" , you clean up after them, you fix their mistakes for them, you assume that when they say something wrong they didn't mean it, etc.
I think some examples may clarify what I mean.
Say you're going hiking in the mountains with a friend. You notice that they have not brought a jacket and you know it will be cold up there. You say "hey do you want to borrow a jacket?" and they say "nah, I'll be fine". You know fully well they will not be fine. If you "treat them like an adult", you would just let them suffer the ill consequences of their bad decision, but the result will be unpleasant for you as well, they will complain, they'll be all pouty and in a bad mood, they'll want to leave quickly, it will suck. Either you can say "fuck you, I told you to bring a jacket, I want to stay, suck it up!" or you can accommodate them and leave early, and either way sucks. So the better thing is to "treat them like a child" and just say at the start "well I'll bring an extra one anyway in case you want it". (with particularly childish friends you shouldn't even say anything and just silently bring an extra one).
(The same goes with snacks and water and such obviously; basically you're better off being like a mom and carrying a pouch of supplies to keep all the "children" (ie. all humans) from getting cranky).
Say you're driving with your dad and you're lost and he doesn't want to stop for directions. If you treat him "like an adult" you would either just speak to him rationally and say "hey this is silly you need to stop and ask someone, don't be so childish" or you would just let him suffer the ill consequences of being lost. But of course neither of those would actually work (almost nobody responds well to having their bad behavior pointed out to them). What you need to do is treat him like a pouty child and fix the situation yourself; eg. say you really need to pee, can we stop for that please, and then ask for directions yourself.
A very common one is just when someone is really pouty or starts acting like a jerk to you. You could "treat them like an adult" and assume they are aware of what they are saying and actually mean to be a jerk to you. But in reality they probably don't, they are just hungry or cranky or need a poop (they are a child) and you shouldn't take it personally. If you need to interact with them, you should get them some food and water and try to fix their crankiness before proceeding.
I find in general that interactions with people work much better if I treat them like a child. (and the same goes in reverse - I get along much better with people who treat me like a child). (basically the idea of the rational self-responsible adult is an invention that does not correspond to reality)
(I guess a related thing that everyone in "communication" knows is you can't just criticize someone and expect them to rationally accept the information and decide if it is useful or not; you have to butter them up first and do it really gently and all that, just like you were trying to critique your child's drawing ("that tree is awful, trees don't look like lollipops, you moron"))
(I guess 99% of modern publicity is just treating people like children. It doesn't matter how good your product is if it has enough stars on the box; your store can sell garbage if it smells like cookies; it doesn't matter what the president actually says as long as he has good hair. I feel like in the 50's before PR was figured out, that media actually treated adults like adults a bit more, and the cleverness of the modern age is realizing that everyone is an easily manipulated pouty child (suck on your iNipple))
Related : thoughts on using money.
I have enough money now to live comfortably, much more so than when I was a child. The little differences are really what strike me. When I was a child of course you would buy store brand aluminum foil, of course you would use coupons, those dollars all mattered. Buying food at Disneyland was a huge luxury (to be avoided if possible, you can wait till we get out of the park, right?) because it was marked up so much. So the first good use of money is just hey you can buy whatever basic necessities you want and not waste your time worrying about the price.
I've tried various ways of spending money now and think that I've made some discoveries. Fancy cars and fancy houses are not good ways to spend money. They are not any better and don't improve your life. In general buying stuff/goods/toys is not helpful (except when it allows you to do an activity that you could not have otherwise done, and you actually do that activity and enjoy it; eg. buying fancy road bikes has zero value if you already had a bike that was good enough and you enjoyed riding; if it doesn't change your ability to do an activity, just making it faster or easier or whatever has zero actual value; but if you had no bike and buy one and then actually ride it, okay that's a good use of money).
Anyway, one of the best uses of money is just to fix all those little moments of crankiness. Like you're in a museum and you're kind of tired or hungry or thirsty; you start to get cranky and not enjoy it. My depression-era upbringing tells me to just gut it out; stay the hell away from the museum cafe, because it's crap food and it's way overpriced. But with money you can just buy the ten dollar tuna sandwich and it will fix your bad mood; that's a good use of money. (in my youth we would have brought homemade sandwiches).
It sort of reminds me of all the terrible park designers who make these circuitous awkward paths such that the direct route to get through the park is straight through the greenery, and then put up signs saying "stay off the grass". If you wanted people to stay off the grass you should have put the path in the natural place to walk. Of course people are going to cut off your dumb artsy loop. It's not their fault for walking on the grass, they're not doing anything wrong, you did something wrong by building your paths dumbly.
Driving down I-5 to get here, you're on this seemingly endless straight flat stretch of highway; I spent the whole time thinking about what incredible morons everyone around me was. First of all, hardly anybody seems to use cruise control, so I'm going along at 70 and people keep passing me and then slowing down in front of me and such annoying shit. Dumbasses. Even the people who are reasonably adept at controlling a car are just such inconsiderate assholes. For example people would constantly pull out to pass me at like 71 mph when I'm going 70, which causes them to box me in on the left for like half an hour because it takes them so long to pass; inevitably some truck is in the right lane and I get trapped. It was so consistent that I started to just hold the left lane (which I hate to do and felt like an asshole) until another car proved to me that they were a decent human being (eg. would pass me quickly). I'm trying to be less courteous to strangers; my new rule is that you have to give me some kind of sign that you aren't a waste of oxygen before I get upset with myself for inconveniencing you. (it's not really working yet, I still instinctively get out of the way of assholes who are rudely barging through a crowd, etc)
Tasha and I have both gotten speeding tickets recently while passing, right in that brief moment when we sped up to get the pass over with quickly. Speeding tickets in general are obviously a farce, so this is no surprise, but it's completely absurd to suggest that you should pass without speeding. If someone is going 3 mph under the limit and you want to pass (in a 65 mph zone, and assuming you need 200 feet of clearance on each side to pass safely) it would take 1.64 miles to make the pass. Of course the safest way to make a pass is to get it over with as quickly as possible, eg. pop up to 90 mph briefly; in a reasonable world you should get a ticket for passing without speeding up enough (which I occasionally see and is very dangerous) (or of course for blocking up traffic and not pulling out).
Because I have a ticket I'm trying to be really careful and not speed at all, and it *sucks* god it sucks. It's not the speed I miss, I'm actually totally fine with just driving slowly for a while, it's the ability to get away from all the dumb fuckers out there. If you actually drive the speed limit everywhere, you are constantly surrounded by other cars and they are just constantly doing cock-ass-motherfucker things like changing lanes right into me without signalling so that I have to take evasive action, or just hanging out right in my blind spot and matching my speed, or speeding up to pull in front of me and then slowing down again, etc. etc. It's just awful driving in a pack. I actually think it was much safer when I was speeding, because I would use it to find empty spots on the freeway and just get alone. I also think it's a lot safer to always be slightly faster than the average traffic, because then it's all laid out in front of you for you to see, rather than buzzing around and coming up from behind. (obviously this is a local optimization not a global one) (I think that most drivers are not actually watching for other cars the way I do; that's why nobody but me seems to care about the fact that almost every modern car has absolutely horrid visibility. When I drive I know exactly where every car around me is, so that I can always make an evasive move without looking, because I know there's a space on my left or right.)
In other news of "my god everyone is so incredibly dumb" I had three retail experiences in a row with the exact same bizarre dumb interaction. I went to this gross mongolian wok place and asked the cashier for a "grilled pork bowl" and she looked at me like I had just said "blerkabootyppsh" , she was like "err, what is that?"; eventually I re-checked the menu and said "barbecue pork bowl" and she was like "oh, okay". Huh? What? You can't figure out that maybe that's what I meant? It's not a very hard puzzle, the only things you sell are "chicken bowl" , "pork bowl", and "beef bowl", so just because I put the wrong adjective in front shouldn't have blown your mind (and FYI it's actually grilled not barbecued); it's like there's just empty space behind those eyes.
The one that really boggles my mind is the constant level of stupidity in coffee shops; I ordered a doppio somewhere and the girl was like "err.. uhh.. do you mean a double espresso?" , uh yeah, you work in a fucking espresso shop and you've never heard of a doppio before? Of course it always kind of blows my mind the way people can do a job day after day and not be at all interested in learning about it or doing it well.
1. Of course Lance Armstrong was on drugs; if you didn't know that, you're a moron. He completely dominated a field which was full of dopers, winning in the mountains and on the flats; if he could do that naturally he would be some kind of super-human abnormality, which of course he wasn't. It doesn't diminish his amazing achievements at all. Everyone he was competing against was on drugs too, so it was a totally level playing field. Everyone in cycling has been on drugs since maybe the 30's or so when they took straight amphetamines. (so did most athletes in those days). You do know that Eddy Merckx was on drugs, right? And pretty much every TdF winner ever. Everyone in every sport in the world has been on drugs for roughly the past century, it's bizarre to act like it's a scandal. It's sort of like a man admitting that he thinks about women other than his wife and everyone gets all upset about it (harrumph and drops their monocle); it's fucking retarded to have these societal faux-pas that publicly we decry and nobody can admit to, but anybody with a brain knows that everybody does it.
Even if you have perfect drug testing of pro athletes, it wouldn't diminish the importance or usage of drugs in sport. eg. say you tested every single day and the tests could detect everything so no doping was possible. That would just make it even more important for the kids to dope in high school before going pro and getting into the testing regime (which is what happens in NFL football these days; to be a football player you must use steroids in high school).
(The French obsession with taking down Lance for doping is particularly ridiculous; they're upset that Lance beat all their French stars so badly, and just generally upset that French cyclists all suck so bad these days, but of course the only French cyclists who have had any success at all recently (eg. Richard Virenque) were huge huge dopers (which is inevitable when you are carrying the expectations of a nation))
2. Sebastien Loeb was probably the greatest racing driver of all time. Unfortunately for him, the WRC format has just not been very interesting during his reign, and he didn't have the fortune of a good foil - he needed a rival to seriously challenge him and make it interesting for the fans, but nobody ever could. (it also didn't help that the cars are so boring now; historic rallies are probably the best rallies to watch now; I love to watch the old Ford Escorts in the 2wd historic rallies hanging the tail around every corner).
Obviously 9 straight championships speaks of his dominance, but if you actually watched some of the races you would appreciate that his supremacy was at a level even beyond what the numbers show. You could tell that he was playing it safe most of the time, that he always had a little more speed in the bank. Of course that's smart, and part of what made Loeb the greatest of all time, that he was not only skilled but crafty and good at managing the risk and percentages. He would drive just fast enough to win and no faster; sometimes he would fall behind in an early stage of a race, and then he would push a little harder and just rip time out of the competition, showing how much speed he really had.
3. The current Spanish national soccer team is a real joy to watch; one of the best international teams I've ever seen (but I don't watch a lot of soccer). The thing that makes them so great is they play a dynamic style with lots of movement (their movement off the ball is particularly good; they run great "give and go's" in basketball lingo), and they make goals from the natural flow of play. Way too many international teams rely on a very boring defensive style, where they just randomly launch the ball forward or rely on set pieces (corner kicks, penalties) for goals. It's so much nicer to see goals come out of the flow of play rather than set pieces. The German teams in particular are always very effective but just agony to watch. Even the great Brazilian teams have fun individual flair but actually play a pretty defensive configuration most of the time and rely on just a few forwards to make something happen.
Soccer, like most sports, is clearly broken. By "broken" I mean that the rules of the game do not encourage beautiful play; in fact they punish it. A well designed game system makes it so that playing smartly (eg. to maximize the chance of winning) also causes you to play in a way that is elegant and nice to watch and in the spirit of the game. I don't think that playing 8 defenders and winning on penalty kicks is in the spirit of the game and the rules should not let that be such a profitable strategy.
4. I've been enjoying watching F1 recently, mainly as a nice way to zone out (it's very boring, a bit like watching golf or something, just a nice bit of mindless background).
One of the very annoying things about it is that all the video feeds are provided by the F1 organization (not each TV channel) and they are just terrible. The director seems to have very little clue about what is interesting to watch in racing. They're constantly showing cars in 20th place (HRT or whatever) going into the pits; oo 20th place pitted, better cut away to that, fascinating. And of course they cut away from the leaders right when they are setting up for a pass.
F1, like almost all sports, also has just terrible announcers. There are lots of interesting things happening all the time that you would not have any clue about unless you really know racing, because the commentators don't tell you. For example smart drivers like Alonso are very clever about how they interact with other cars; if he's trying to make a pass, he will pester the car in front in areas of the track where he is not planning to pass; this makes the leading car use up its KERS unwisely, meanwhile Alonso is saving all his KERS for the spot where he is planning to make the pass. Sometimes a tough pass is set up with fakes (bluffs) over the course of several laps; you show that you want to pass in one part of the track, so that the leading driver starts going defensive in that spot, then you actually make the pass in a totally other section where you have previously bluffed that you don't have pace. Good commentators should be telling you about these dynamics, as well as just constantly telling stories about the drivers to give you some background on how they interact with each other. If you actually stop and think about how good commentating could be, and how shitty it actually is, the gulf is massive. We've been kind of inured to just atrocious sports commentary, so much so that it is the expected norm (it doesn't feel right to watch football without some commentator saying "they need to get more physical").
I feel like Red Bull is actually way way better than all the other teams, but it doesn't seem that way on the surface because they keep getting torpedoed by the FIA. Every time they make another advancement that lets them run away with races, the rules get changed to make their innovation illegal. Certainly the racing is more interesting if the teams are close by, but constantly changing the rules to hinder the leader is not a good way to make a sport competitive.
I do a lot of weird threading stuff so my first fear was that I had some kind of race. So I turned off all my threading, but it kept happening.
My next thought was some kind of uninitialized memory problem or out-of-bounds problem. The circumstances of failure jibe with the bug only happening after I have touched a lot of memory and maybe moved into a weird part of address space, or maybe I'm writing past the end of a buffer somewhere and it doesn't show up and hurt me until much later.
So I turned on my various debug allocator features and tried a bunch of things to stress that, but still couldn't get it to fail in any kind of repeatable way.
Yesterday I saw the exact same kind of bug happen in a few of my different compressors and the lightbulb finally came on in my head : maybe I have bad RAM. I ran Memtest86 and, just a few seconds in, yep, bad RAM.
Phew. As pissed as I am to have to deal with this (getting into the RAM on my lappy is a serious pain in the ass) it's nice to not actually have a bizarro bug.
The failure rate of RAM in desktop-replacement lappies is around 100% in my experience. I've had two different desktop replacement lappies in the past 8 years and I have burned out 3 RAM chips; I've blown the OEM RAM on both of them and on this one I also toasted the replacement RAM. Presumably the problem is that it just gets too hot in there and they don't have sufficient cooling. (and yes I keep them on a screen for air flow and all that, and never actually use them on a lap or pillow or anything bad like that). (perhaps I should get one of those laptop stands that has active cooling fans).
Also, shouldn't we have better debugging features by now?
I should be able to take any range of memory, not just page boundaries, and mark it as "no access". So for example I could take compression buffers and put little no access regions at the head and tail.
For uninitialized memory you want to be able to mark every allocation as "fault if it's read before it's written". (this requires a bit per byte which is cleared on write).
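Until hardware or a VM does this for us, the "bit per byte, cleared on write" idea can at least be emulated in software for a single buffer. This is only a toy sketch; `ShadowedBuffer` is a made-up name, and the exception stands in for the fault that a real MMU or VM would raise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an existing library class): a byte buffer with one
// shadow flag per byte. Every flag starts set ("uninitialized") and is cleared
// by the first write to that byte.
class ShadowedBuffer {
public:
    explicit ShadowedBuffer(std::size_t size)
        : data_(size, 0), uninit_(size, true) {}

    void write(std::size_t i, std::uint8_t value) {
        assert(i < data_.size());
        data_[i] = value;
        uninit_[i] = false;  // the shadow bit is cleared on write
    }

    // Throws instead of faulting if the byte is read before it was written.
    std::uint8_t read(std::size_t i) const {
        assert(i < data_.size());
        if (uninit_[i]) throw std::logic_error("read before write");
        return data_[i];
    }

private:
    std::vector<std::uint8_t> data_;
    std::vector<bool> uninit_;  // one flag per byte of data_
};
```

Of course the whole point of wanting this in the MMU or a VM is that existing code would get the check without being rewritten to go through accessors like these.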
You could enforce true const in C by making a true_const template that marks its memory as read-only.
I've ranted before about how thread debugging would be much better if we could mark memory as "fault unless you are thread X", eg. give exclusive access of a memory region to a thread.
I see two good solutions for this : 1. a VM that could run your exe and add these features, or 2. special memory chips and MMU's for programmers. I certainly would pay extra for RAM that had an extra 2 bits per byte with access flags. Hell with how cheap RAM is these days I would pay extra for more error-correction bits too; maybe even completely duplicate bytes. And self-healing RAM wouldn't be bad either (just mark a page as unusable if it sees failures in that page).
(for thread debugging we should also have a VM that can record exact execution traces and replay them, of course).
1. It's kind of amazing to me how well LZNib does. (currently 30,986,634 on enwik8 with parallel chunked compress and LRM). I guess it's just the "asymptotic optimality" of LZ77; as the dictionary gets bigger, LZ77 approaches perfect compression (assuming the data source is static, which of course it never is, which is why LZ77 does not in fact approach the best compressor). But anyway, the point is with basic LZ the way matches are encoded becomes less and less important as the window gets bigger (and the average match length thus gets longer).
2. With byte-wise coders you have something funny in the optimal parser that you don't run into much with huffman or arithmetic coders : *ties*. That is, there are frequently many ways to code that have exactly the same code length. (in fact it's not uncommon for *all* the coding choices at a given position to produce the same total length).
You might think ties don't matter but in fact they do. One way you can break a tie is to favor speed; eg. break the tie by picking the encoding that decodes the fastest. But beyond that if your format has some feedback, the tie is important. For example in LZNib the "divider" value could be dynamic and set by feedback from the previous encoding.
In my LZNib I have "last offset" (repeat match), which is affected by ties.
3. My current decoder is around 800 MB/s on my machine. That's roughly half the speed of LZ4 (around 1500 MB/s). I think there are a few things I could do to get a little more speed, but it's never going to get all the way. Presumably the main factor is the large window - LZ4 matches mostly come from L1 and if not then they are in L2. LZNib gets a lot of large offsets, thus more cache misses. It might help to do a lagrangian space-speed thing that picks smaller offsets when they don't hurt too much (certainly for breaking ties). (LZNib is also somewhat more branchy than LZ4 which is the other major source of speed loss)
4. One of the nice things about optimal parsing LZNib is that you can strictly pick the set of matches you need to consider. (and there are also enough choices for the optimal parser to make interesting decisions). Offsets can be sent in 12 bits, 20 bits, 28 bits, etc. so for each offset size you just pick the longest match in that window. (this is in contrast to any entropy-coded scheme where reducing to only a few matches is an approximation that hurts compression, or a fixed-size scheme like LZ4 that doesn't give the optimal parser any choices to make)
5. As usual I'm giving up some compression in the optimal parser by not considering all possible lengths for each match. eg. if you find a match of length 10 you should consider only using 3,4,5... ; I don't do that, I only consider lengths that result in a shorter match length code word. That is a small approximation but helps encoder speed a lot.
6. Since LZNib uses "last offset", the optimal parse is only approximate and that is an unsolved problem. Because big groups of offsets code to the same output size, the choice between those offsets should be made by how useful they are in the future as repeat matches, which is something I'm not doing yet.
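To make point 4 concrete, here is a rough sketch of reducing a candidate list to one match per offset size. The 12/20/28-bit classes are from the text above; the `Match` struct, the function names, and the rule that a larger class is only kept if it beats the smaller ones in length are assumptions for illustration, not the actual LZNib code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Match {
    std::uint32_t offset;
    std::uint32_t length;
};

const int kNumOffsetClasses = 3;  // 12-bit, 20-bit and 28-bit offsets

// Returns the smallest offset class that can hold this offset, or -1 if none.
int offsetClass(std::uint32_t offset) {
    if (offset < (1u << 12)) return 0;
    if (offset < (1u << 20)) return 1;
    if (offset < (1u << 28)) return 2;
    return -1;
}

// Keeps the longest match in each offset class. A class is dropped when it is
// not longer than a cheaper (smaller-offset) class, since it could never win.
std::vector<Match> pruneByOffsetClass(const std::vector<Match>& candidates) {
    Match best[kNumOffsetClasses] = {};  // length 0 means "no match yet"
    for (const Match& m : candidates) {
        int c = offsetClass(m.offset);
        if (c < 0) continue;
        if (m.length > best[c].length) best[c] = m;
    }
    std::vector<Match> out;
    std::uint32_t longestSoFar = 0;
    for (int c = 0; c < kNumOffsetClasses; ++c) {
        if (best[c].length > longestSoFar) {
            out.push_back(best[c]);
            longestSoFar = best[c].length;
        }
    }
    return out;
}
```

The optimal parser then only has to evaluate at most three matches per position. Note this ignores the "last offset" problem from point 6; which offset to keep within a class is exactly the part that is not handled here.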
(Animated gifs are so annoying.)
Measured speed on enwik8. This is the slow optimal encoder to give it something to do. enwik8 is encoded by breaking into 4 MB chunks (24 of them). Each chunk gets 4 MB of dictionary overlap precondition. Matches before the overlap are found using the LRM (Long Range Matcher). The LRM is created for the whole file and shared between all chunks.
What we see :
The speed dip from 0 to 1 workers is expected, it's the cost of firing up threads and communication and chunking and such. (0 = synchronous, just encode on the main thread).
My machine has 4 real cores and 8 hyper-cores. From 1-4 workers we see not-quite-linear speedup, but big steps. Once we get into the hyperthreads, the benefit is smaller but I'm still seeing steady speedup, which surprises me a bit, I thought it would flatten out more after 4 workers.
(the wiggle at 7 is probably just a random fluctuation in Windows (some service doing something I didn't ask it to do, you bastards); I only ran this test once so the numbers are not very solid; normally I run 40 trials or so when measuring speeds on Windows).
And here's the Oodle ThreadProfile of the encode showing what's happening on all the threads :
Of course part of the reason for the not-quite-linear speedup is the gap at the end when not all the workers are busy. You can fix that by using smaller chunks, but it's really not anything to worry too much about. While it does affect the latency of this single "encode enwik8" operation, it doesn't affect throughput of the overall system under multiple workloads.
OodleLZHLW enwik8 compressed size variation with different chunkings :
28,326,489 4 MB chunks - no LRM
27,559,112 4 MB chunks with LRM
27,098,361 8 MB chunks with LRM , 4 matches
26,976,079 16 MB chunks , 4 matches
26,939,463 16 MB chunks , 8 matches
26,939,812 16 MB chunks , 8 matches, with thresholds
In each case the amount of overlap is = the chunk size (it's really overlap that affects the amount of
compression). After the first one, all others are with LRM. Note that the effective local dictionary size
varies as you parse through a chunk; eg. with 4 MB chunks, you start with 4 MB of overlap, so you have an
effective 4 MB local window, as you parse your window effectively grows up to a max of 8 MB, so the end of
each chunk is better compressed than the beginning.
My LZHLW optimal parse only considers 4 matches normally; as the overlap gets bigger, that becomes a worse compromise. Part of the problem is how those matches are chosen - I just take the 4 longest matches (and the lowest offset at each unique length). Normally this compromise is okay, you get a decent sampling of matches to choose from; on moderate file sizes the cost from going from infinite to 16 to 4 matches is not that great, but as the dictionary gets bigger, you will sometimes fill all 4 matches with high offsets (because they provide the longest match lengths) and not any low offsets to try.
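A sketch of that selection rule, to show how the low offsets get squeezed out. The `Match` struct and `selectLongestMatches` are illustrative names, not the actual LZHLW code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Match {
    std::uint32_t offset;
    std::uint32_t length;
};

// Keeps up to maxMatches candidates: the longest lengths first, and for each
// unique length only the lowest offset.
std::vector<Match> selectLongestMatches(std::vector<Match> candidates,
                                        std::size_t maxMatches) {
    // Longest first; among equal lengths, lowest offset first.
    std::sort(candidates.begin(), candidates.end(),
              [](const Match& a, const Match& b) {
                  if (a.length != b.length) return a.length > b.length;
                  return a.offset < b.offset;
              });
    std::vector<Match> out;
    for (const Match& m : candidates) {
        if (out.size() == maxMatches) break;
        // Same length as the previous pick means a higher offset: skip it.
        if (!out.empty() && out.back().length == m.length) continue;
        out.push_back(m);
    }
    return out;
}
```

With candidates like (offset 1000000, len 40), (900000, 38), (800000, 35), (700000, 33) and (50, 6), asking for 4 matches returns only the four far-away ones; the cheap offset-50 match never reaches the optimal parser.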
At 16 MB chunks (+16 overlap = 32 MB total window) it becomes necessary to consider more matches. (in fact there's almost no benefit in going from 8 MB to 16 MB chunks without increasing the number of matches).
I tried adding "thresholds"; requiring that some of the matches found be in certain windows, but it didn't help; that merits more investigation. Intuitively it seems to me that the optimal parser wants to be able to choose between some long high-offset matches and some shorter low-offset matches, so the question is how to provide it a few good selections to consider. I think there's definitely some more win possible in my optimal parser by considering more matches, or by having a better heuristic to choose which matches to consider.
Some additional tricks which are becoming more or less standard these days :
1. Carry-forward "follows" matches. Previously discussed, see Hash1b post. (also in the Hash1b post : checking for improvement first).
2. "Good enough length". Once you find a match of length >= GOOD_ENOUGH (256 or 1024 or so), you stop the search. This helps in super-degenerate areas; eg. you are at a big run of zeros and that has occurred many times before in your file, you can get into a very bad O(N^2) thing if you aren't careful, so once you find a long match, just take it. Hurts compression very little. (note this is not just a max match length; that does hurt compression a bit more (on super-compressible files))
3. Extra steps when not finding matches. The first place I saw this was in LZ4 and Snappy, dunno where it
was done first. The idea is when you fail to find a match, instead of stepping ahead by 1 you step ahead by
some variable amount. As you continue to fail to find matches, that variable amount increases. Something like :
ptr += 1 + (numSearchesWithNoMatchFound>>5);
instead of just ptr++. The idea is that on incompressible files (or incompressible portions of files) you
stop bothering with all the work to find matches that you won't find anyway. Once you get back to a compressible part,
the step resets.
4. Variable "amortize" (truncated hash search). A variant of #3 is to use a variable limit for the amortized hash search. Instead of just stepping over literals and doing no match search at all, you could do a match search but with a very short truncated limit. Alternatively, if you are spending too much time in the match finder, you could reduce the limit (eg. in degenerate cases not helped by the "good enough len"). The amortize limit might vary between 64 and 4096.
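To show the effect of trick 3, here is a stripped-down greedy loop that only counts how many match searches get done. The match finder is passed in as a callable returning the match length at a position (0 = no match); that interface, the function name, and the choice to bump the miss counter after taking the step are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Counts match searches in a greedy parse over numBytes bytes.
// findMatchLen(pos) returns the match length found at pos, or 0 for none.
template <typename FindMatch>
std::size_t countMatchSearches(std::size_t numBytes, FindMatch findMatchLen) {
    std::size_t pos = 0;
    std::size_t numSearchesWithNoMatchFound = 0;
    std::size_t searches = 0;
    while (pos < numBytes) {
        ++searches;
        std::size_t len = findMatchLen(pos);
        if (len > 0) {
            pos += len;                       // step over the match
            numSearchesWithNoMatchFound = 0;  // compressible again: reset
        } else {
            // The step grows by 1 for every 32 consecutive failed searches.
            pos += 1 + (numSearchesWithNoMatchFound >> 5);
            ++numSearchesWithNoMatchFound;
        }
    }
    return searches;
}
```

On 64 bytes where nothing ever matches, this does 48 searches instead of 64 (32 single steps, then 16 double steps), and the saving keeps growing the longer the incompressible run goes on.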
The goal of all this is to even out the speed of the LZ encoder.
The ideal scenario for an LZ encoder (greedy parsing) is that it finds a very long match (and thus can step over many bytes without doing any lookup at all), and it finds it in a hash bucket which has very few other entries, or if there are other entries they are very easily rejected (eg. they mismatch on the first byte).
The worst scenario for an LZ encoder (without our tricks) is either : 1. there are tons of long matches, so we go and visit tons of bytes before picking one, or 2. there are no matches (or only a very short match) but we had to look at tons of pointers in our hash bucket to find it, and we will have to do hash lookups many times in the file because we are not finding long matches.
1. I'm pretty sure that people who have "work-life balance" are not actually working. Not by my standard of "work". I see these people sometimes who manage to exercise every day, take a nice relaxing lunch break, stop working to be sweet to their wife or play with their kids. No fucking way those guys are working, you can't put in a solid self-destructing day when you're doing that.
It seems like you should be able to stop and take 30 minutes off for stretching or whatever, but in fact you can't. For one thing, if you are really in deep "work mode", it takes at least an hour to get out of it. Then if you really were working hard, your body and mind are exhausted when you get out so you don't want to do anything. Then when you do go back to work, it takes hours to really get your mind back up to full speed.
The worst are the motivational speaker douchebags who will tell you that you can get more done if you only work 1 hour a day, or the dot-com-luckboxes who made millions over some trivial bullshit and now think they are business geniuses. I get more done in 5 minutes than you fuckers have done in your entire lives. I don't think you have any concept of what people do when they're actually working.
2. I've been in crazy crunch all summer long, and only in the last few weeks have kind of "hit the wall" where it's become a bit unpleasant. Not just in terms of job work, but also exercising, house work, etc. it's been a summer of work work work, take a break from one kind of work to do another kind of work.
(aside : actually taking a break from work to do other work is wonderful; I find that almost any day on which I do a variety of jobs I'm quite happy; like 6 hours of job work, then a few hours of wood working in the garage, then a few hours of gardening; that's a nice day. Any day that I spend doing all the same work all day is a sucky horrible day; obviously job work all day long is miserable, but so is home improving all day long. I've never been much for socializing or relaxing or whatever you're supposed to do when you're not working, so a lifestyle of hobbies and chores is okay with me.
Sometimes I see these old guys, generally 50-60 or so, wiry leather-hard old guys, who are just always doing something, they built their own house, they're overhauling an old engine, carrying bags of feed; you know they're really miserable inside their own brains which is why they never stop working to just sit and think or talk with the family, but they've found a way to live by just keeping themselves busy all the time. I look at those old guys and think yeah I could see myself getting through life that way.)
Anyhoo, now that fall is rolling in my body & mind want to quit. It occurred to me that this is the ancient Northern European life cycle; when spring rolls around you kick into high gear and take advantage of the long days and work your ass off for a while, then fall rolls around and you retreat into your dens. In the long ago, Northern Europeans actually almost hibernated in the winter; they would sleep for 16+ hours a day, and their heart rates and calorie consumption would drastically lower.
One of the problems with the modern world is that Northern Europeans won. With the advent of artificial light and heat, they can keep that Northern European summer work ethic going all year round. Back in the ancient days if you lived somewhere where you could work year round (eg. the tropics) then of course you took a slower pace. It's a real un-human situation we've gotten ourselves into. The Northern Europeans had to work their asses off in the summer because they didn't have much opportunity; and they had to be really careful uptight jerks, cache their food carefully and repair their shelters and such, because if they didn't they would die in the winter.
To be repetitive : in the ancient days you had the tropical peoples that lived a slower pace year round, and the northern peoples who lived very intensely, but only for the brief summer. What we've got now is basically that intense summer pace of life, but year round.
(as usual I have no idea if this is actually true; but a good parable is much better than factual accuracy).
3. Work life quality is obviously a race for the bottom. Basically capitalism is a pressure against life quality. I suppose the fundamental reason is that productivity is ever increasing (as knowledge increases, the value of each laborer goes down), and population is also increasing. But anyway, it's clear that the fundamental direction of capitalism is towards worse life quality (*). There are two factors I see that resist it : 1. unions , and 2. new fields of work. (or 3. get to the top of the hierarchy)
(* = this is clearly a half baked thought, as there are various ways in which capitalism is a pressure towards better life quality overall. I guess I'm specifically talking about the pressure of competition in a field where the number of people that want to be in it is greater than what's really needed. All fields go through a progression where at first the number of people trying to do it is very small, there are great opportunities in that phase, but at some point it becomes a well known thing to do and then the pressure is towards worse and worse life quality. I'm also normalizing out the general improvement in life quality for all, since human perception also normalizes that out and it doesn't affect our perception of our life quality)
The "race for the bottom" basically works like this : say you have some job that pays decently and gives you decent life quality; someone else sees that and says "hey I'll do the same job but for 90% of the pay" or more often "I'll take the same pay but work 120% of the hours". Because there is excess labor, the life quality for the worker goes down and down.
New areas of work, where there is a relatively small pool of competent labor, are one of the few ways to avoid this. Software has been new enough to be quite nice for some time, but for your standard low-level computer programmer it is already no longer so, and it will only get worse as it becomes more mature.
The "race for the bottom" also occurs due to competition. Say you're an independent, maybe you make an archiver like WinPackStuffSmall, if your competition starts working crazy hours adding more and more features, suddenly you have to do the same to compete; then anybody else who wants to get into that business has to work even harder; over time the profit gets smaller and the work conditions get worse. This has certainly happened in games; it's almost impossible to make a competitive game with a small budget in a small amount of time without just killing the employees.
Anyway, I certainly feel it in data compression; there are so many smart guys putting in tons of work on compression for free because it's fun work, that you can't compete unless you really push hard. If you're going for the maximum compression prize and somebody else is putting in just killer work to do all the little things that get you a little more win, you can't compete unless you do it too. Being more efficient or having better ideas wins you a little bit of relaxation, but in the end some grindstone time is inevitable.
4. I really want my cabin in the woods to go off and work. It's too hard for me to try to work and live a normal life at the same time; I'd like to be able to just go out and be alone and eat gruel and code for a week straight.
For a while I was thinking about buying my own piece of land and building a little basic shack. But now that I own a house I'm not so sure. Owning property fucking sucks, it's a constant pain in the ass. (the only thing worse is renting in America, where the landlords have all the rights and are even worse pains in the ass). Sometimes I think that it would be nice to own a piece of mountain land, maybe an orchard, a beach house in the tropics, that that would be a legacy to pass on to my children, to stay in the family, but god damn it's a pain in the ass maintaining properties.
I wish I could find a rental, but I just cannot find anything decent, which is very odd, I know it must be out there. If I went out to my coding shack and I owned it, I would spend much of the time stressed out about the fixes I should be doing to it, at least with a rental I can go "yeah this place sucks but it's not my problem".
I sort of vaguely considered going backpacking-working, but I can't stand working on laptops, and carrying out the standing desk seems a bit difficult. (I said to James when we were backpacking that if I was rich it would be sweet to go backpacking and have a mule team or something carry in a nice bit of kit for you, way back into the inaccessible wilderness, so you could be out there all alone but with a nice supply of non-freeze-dried food (and a keyboard and standing desk) (like an old timey British explorer; have a coolie carry my oak desk into the woods on his back)).
I do think the best way for a programmer to work (well, me anyway) is not the steady put in 8 hours every day and plod along. It's take a few weeks off and basically don't work at all, then go heavy crunch for a few months where you just dive in and every thought is about work. It's so much better if you can stay focused on the problem and not have to pop the stack and try to be relaxed and social and such. I'm not fucking relaxed, I can't chit chat with you, I have shit to get done! Unfortunately the periodic lifestyle doesn't work very well with other people in your life. (and mainstream employers expect you to do the crunch part but not the take a break part).
5. I've always thought that the ideal job would be a seasonal/periodic one. Something like being an F1 engineer, or an NFL coach. (NFL coach was my dream job in college; now I think F1 engineer looks mighty attractive; you get the fun of competition, and then you get to go back to your shed and study film and develop strategies and run computer models). There's some phase when you're "on" where you just work like crazy, and then you get a little bit of a break in the off season. (unfortunately, due to the "race to the bottom", the break in these kinds of jobs is disappearing; back in the ancient days they really were seasonal, in the off season everyone would just go relax, but then uptight assholes started taking the jobs and working year round, and now that's more the norm).
The other awesome thing about F1 engineer or NFL coach is that you get a big feedback moment (eg. "I won" or "I lost") which is very cathartic either way and gives you nice resolution. For me the absolute worst kind of work is the never-ending maintenance; you do some work, and then you do some more; guess what, next year you do some more; there's no real end point. Working on games at least does have that end point (whew, we shipped!) but they're way too far apart to be a nice cyclical lifestyle; you want it once a year, not once every 3-4 years.
I also like the overt competition in those kinds of jobs. Real intellectual competition is one of the most fun things in the world; it's what I loved about poker, about going after the most compression, the Netflix prize, etc. It's so cool to see someone else beat you, and you get motivated and try to figure out how they did it, or take the next step yourself and come back and top them. Love that. And you don't have to listen to any dumb fucker's opinion about what the best way is, you go out and prove it in the field of combat; if your ideas are right, you win.
(for quite a while I've been thinking about making my own indie game, solely for the competitive aspect of it; I want to prove I can make a game faster and better than my peers. I really have very little interest in games, for me the "game" is the programming; I want to win at programming. Good lord that is a childish bad reason to make a game. Anyway that part of me is slowly dying as I get older so the chance of me actually making an indie game declines by the day.)
6. I can be quite happy with a simple lifestyle : work really hard, then exercise hard to release the stress and relax the body, then sleep a lot. It actually feels pretty great. The problem is it's an unstable equilibrium, like a pendulum on its tip. The slightest disturbance sends it toppling down.
Any day you don't get enough sleep, suddenly the work is agony and you don't feel like exercising, and then you carry the stress and it's a disaster. In this lifestyle I feel very productive and healthy, but I'm also very prickly; you have to be quite self-defensive to make it work, you can't let people sap your energy because you are so close to using all the energy you have. You will seem quite unreasonable to others; if someone asks you for a little favor or even just wants to socialize or something; no, sorry I can't do it; I have to work and then I have to go swim or everything is going to come crashing down.
I see a lot of the type A successful douchebag types living this lifestyle, and I've never quite put my finger on what makes it so douchey. I suppose part of it is jealousy; actually managing to put in a hard day of work and then exercise off the stress and have a good evening is something that I am rarely able to do, and I'm jealous of people who pull it off. But part of it is that it is a very self-centered lifestyle; you have to be very selfish to make it work.
7. I certainly am aware that I am using work to avoid life at the moment. I've got a bunch of home improving I need to do and other such shite that I really don't want to deal with, so every morning I wake up and just get straight on the computer to do RAD work so that I don't have to think about any of the other things I should be doing.
Of course that's not unusual. I have quite a few friends/acquaintances around here who very reliably use work to avoid life; they can't do this or that "because they have to work". Bullshit, of course you don't have to work right at that moment, you almost never do, you're just avoiding life. It's not really even a lie; if you think it's a lie it's just because you're listening too literally; they're really just saying "no I don't want to" or "my head is all fucked up right now and it's better if I don't spend time in the normal world".
A few months ago I had a fence put in, and on the day that the guys were doing the layout, I felt like I had to be at the office. Of course they did some kind of fucked up things because I wasn't there to supervise, and of course looking back now I can't even remember why it was I felt like I really had to go to work that day, of course I didn't.
8. The times that I really kill myself working are 1. when a team depends on me; like if I made a bug that's blocking the artists, of course I'll kill myself until it's fixed (and you're an asshole if you don't), 2. when I'm working on something that I kind of am not supposed to be; eg. back when I did VIPM at WildTangent or cube map lights at Oddworld or many things at RAD (LZNib the latest); even if it's sort of within the scope of what I should be doing, if it's not what I would have told myself to do if I was the producer, then I feel guilty about it and try to get it over with as quickly as possible and feel bad about it the whole time. 3. when I'm embarrassed; that's maybe the biggest motivator. If I release a product that has bugs, that's embarrassing, or if I claim something is the best and then find out it's not, I have to go and kill myself to make it right.
Right now I'm embarrassed about how long Oodle has taken to get out, so I'm trying to fix that.
9. There's a kind of mania you can get into when you're working a lot where you stop seeing the forest for the trees. You can dive down a hole, and you just keep doing stuff, you're knocking items off the todo list, but you aren't seeing the big picture. It's like the more you work the more you only see the foreground, the details. You have to stop and take a break to take stock and realize you should move onto a different topic.
Sometimes when you are faced with a mountain of tasks and are kind of overwhelmed about where to start, the best thing is to just pick one and do it, then the next, and eventually you will be done. But that rarely works with code, because there are really an infinite number of tasks, doing each one creates two new ones, so "putting your head down" (as producers love to say) can be non-productive.
(see previous notes on how Windows buffering works and why this is fastest :
cbloom rants 10-06-08 - 2
cbloom rants 10-07-08 - 2
cbloom rants 10-09-08 - 2
)
Not buffering writes also has other advantages besides raw speed, such as not polluting the file cache; if you buffer writes, then first some existing cache page is evicted, then the page is zero'ed, then your bytes are copied in, and finally it goes out to disk. Particularly if you are streaming out large amounts of data, there's no need to dump out a bunch of read-cached data for your write pages (which is what Windows will do because its page allocation strategy is very greedy).
(the major exception to unbuffered writes being best is if you will read the data soon after writing; eg. if you're writing out a file so that some other component can read it in again immediately; that usage is relatively rare, but important to keep in mind)
Anyhoo, this post is a small note to remind myself of a caveat :
If you are benchmarking apps by their time to run (eg. as an exe on a command line), buffered writes can appear to be much much faster. The reason is that the writes are not actually done when the app exits. When you do a WriteFile to a buffered file, it synchronously reserves the page and zeroes it and copies your data in. But the actual writing out to disk is deferred and is done by the Windows cache maintenance thread at some later time. Your app is even allowed to exit completely with those pages unwritten, and they will trickle out to disk eventually.
For a little command line app, this is a better experience for the user - the app runs much faster as far as they are concerned. So you should probably use buffered writes in this case.
For a long-running app (more than a few seconds) that doesn't care much about the edge conditions around shutdown, you care more about speed while your app is running (and also CPU consumption) - you should probably use unbuffered writes.
(the benefit for write throughput is not the only compelling factor, unbuffered writes also consume less CPU due to avoiding a memset and memcpy).
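A minimal sketch of the benchmarking caveat, in Python rather than Win32. It does not do true unbuffered IO (that needs FILE_FLAG_NO_BUFFERING and aligned buffers); it only shows that the time you measure depends on whether you wait for the OS cache to reach the disk. The names `time_write` and `compare` are made up for illustration.

```python
import os
import tempfile
import time

def time_write(path, payload, flush_to_disk):
    """Return the seconds taken to write payload, optionally waiting for the disk."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        if flush_to_disk:
            f.flush()             # push Python's own buffer down to the OS
            os.fsync(f.fileno())  # ask the OS to push its cache out to disk
    return time.perf_counter() - start

def compare(size=4 * 1024 * 1024):
    """Time the same write twice: as the app sees it, and including the flush."""
    payload = b"\xA5" * size
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bench.bin")
        apparent = time_write(path, payload, flush_to_disk=False)
        honest = time_write(path, payload, flush_to_disk=True)
        written = os.path.getsize(path)
    return apparent, honest, written
```

If you benchmark a buffered writer by wall-clock time of the exe, you are measuring something like `apparent`; to compare fairly against an unbuffered writer you want something like `honest`.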
Let me now try to make that note more precise :
With an adaptive model to really do things right you must :
1. Initialize to a smart/tuned initial condition (not just 50/50 probabilities or an empty model)
2. Train the model with carefully tuned rates; perhaps faster learning at first then slowing down;
perhaps different rates in different parts of the model
3. Reset the model smartly at data-change boundaries, or perhaps have multiple learning scales
4. Be careful of making the adaptive model too big for your data; eg. don't use a huge model space
that will be overly sparse on small files, but also don't use a tiny model that can't learn about
big files
With a static model to do things right you must :
1. Transmit the model very compactly, using assumptions about what the model is like typically;
transmit model refreshes as deltas
2. Send model refreshes in the appropriate places; the encoder must optimally choose model refresh points
3. Be able to send variable amounts of model; eg. with order-1 huffman decide which contexts get their
own statistics and which go into a shared group
4. Be able to send the model with varying degrees of precision; eg. be able to approximate when that's better
for the overall size(model) + size(coded bits)
We've seen over and over in compression that these can be the same. For example with linear-prediction lossless image compression, assuming you are doing LSQR fits to make predictors, you can either use the local neighborhood and generate an LSQR in the decoder each time, or you can transmit the LSQR fits at the start of the file. It turns out that either way compression is about the same (!! BUT only if the encoder in the latter case is quite smart about deciding how many fits to send and how precise they are and what pixels they apply to).
Same thing with coding residuals of predictions in images. You can either do an adaptive coder (which needs to be pretty sophisticated these days; it should have variable learning rates and tiers, ala the Fenwick symbol-rank work; most people do this without realizing it just by having independent statistics for the low values and the high values) or you can create static shaped laplacian models and select a model for each coefficient. It turns out they are about the same.
The trade off is that the static model way needs a very sophisticated encoder which can optimize the total size (sizeof(transmitted model) + sizeof(coded bits)) , but then it gets a simpler decoder.
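To make the sizeof(transmitted model) + sizeof(coded bits) accounting concrete, here is a rough order-0 sketch. It assumes an ideal entropy coder (so code lengths are just -log2 p), a deliberately naive static model cost of 256 counts at a fixed number of bits each, and an adaptive model that starts with every count at 1. None of that is tuned; the whole point above is that a real encoder has to do much better than this on both sides.

```python
import math
from collections import Counter

def static_cost_bits(data, bits_per_count=8):
    """Ideal coded size under the file's own order-0 stats, plus a naive model cost."""
    counts = Counter(data)
    n = len(data)
    coded = -sum(c * math.log2(c / n) for c in counts.values())
    model = 256 * bits_per_count  # assumption: send all 256 counts, flat cost
    return model + coded

def adaptive_cost_bits(data):
    """Ideal coded size for a simple adaptive order-0 model; nothing is transmitted."""
    counts = [1] * 256  # assumption: flat initial condition (not a tuned one)
    total = 256
    bits = 0.0
    for b in data:
        bits -= math.log2(counts[b] / total)
        counts[b] += 1
        total += 1
    return bits
```

On tiny buffers the flat model cost dominates the static side, and on big stationary buffers the learning cost of the adaptive side fades away, which is the trade-off the lists above are about.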
(caveat : this is not applicable to compressors where the model is huge, like PPM/CM/etc.)
A lot of people incorrectly think that adaptive models offer better compression. That's not really true, but it is *much* easier to write a compressor that achieves good compression with an adaptive model. With static models, there is a huge over-complete set of ways to encode the data, and you need a very complex optimizing encoder to find the smallest rep. (see, eg. video coders).
Even something as simple as doing order-0 Huffman and choosing the optimal points to retransmit the model is a very hard unsolved problem. And that's just the very tip of the iceberg for static models; even just staying with order-0 Huffman you could do much more; eg. instead of retransmitting a whole model, send a delta. Instead of sending the delta to the ideal code lens, send a smaller delta to non-ideal code lens (that makes a smaller total len); instead of sending new code lens, select one of your previous huffmans. Perhaps have 16 known huffmans that you can select from and not transmit anything (would help a lot for small buffers). etc. etc. It's very complex.
Another issue with static models is that you really need to boil the data down to its simplest form for static models to work well. For example with images you want to be in post-predictor space with bias adjusted and all that gubbins before using a static model; on text you want to be in post-BWT space or something like that; eg. you want to get as close to decorrelated as possible. With adaptive models it's much easier to just toss in some extra context bits and let the model do the decorrelation for you. Put another way, static models need much more human guidance in their creation and study about how to be minimal, whereas adaptive models work much better when you treat them badly.
So I finally had a deeper look to sort it out. The short answer is that LZHAM has some sort of very long initialization (even for just the decoder) which makes its speed extremely poor on small buffers. I was seeing speeds like 2 MB/sec , much worse than LZMA (which generally gets 10-25 MB/sec on my machine). (this is just from calling lzham_lib_decompress_memory)
On large buffers, LZHAM is in fact pretty fast (some numbers below). The space-speed is very good (on large buffers); it gets almost LZMA compression with much faster decodes. Unfortunately the breakdown on small buffers makes it not a general solution at the moment IMO (it's also very slow on incompressible and nearly-incompressible data). I imagine it's something like the huffman table construction is very slow, which gets hidden on large files but dominates small ones.
Anyhoo, here are some numbers. Decode shows mb/s.
BTW BEWARE : don't pay too much attention to enwik8 results; compressing huge amounts of text is irrelevant to almost all users. The results on lzt99 are more reflective of typical use.
| name | lzt99 | decode |
| raw | 24700820 | inf |
| lz4 | 14814442 | 1718.72 |
| zlib | 13115250 | 213.99 |
| oodlelzhlw | 10164511 | 287.54 |
| lzham | 10066153 | 61.24 |
| lzma | 9344463 | 29.77 |
| name | enwik8 | decode |
| raw | 100000000 | inf |
| lz4 | 42210253 | 1032.34 |
| zlib | 36445770 | 186.96 |
| oodlelzhlw | 27729121 | 258.46 |
| lzham | 24769055 | 103.01 |
| lzma | 24772996 | 54.59 |
(lzma should beat lzham on enwik8 but I can't be bothered to fiddle with all the compress options to find the ones that make it win; this is just setting both to "uber" (and -9) parse level and setting dict size = 2^29 for both)
And some charts for lzt99. See the previous post on how to read the charts.
Any time you do a string search based on hashes you will have a degeneracy problem. We saw this with the standard "Hash1b" (Hash->links) string matcher. In short, the problem is if you have many occurrences of the same hash, then exact string matching becomes very slow. The standard solution is to truncate your search at some number of maximum steps (aka "amortized hashing"), but that has potentially unbounded cost (though typically low).
We have this problem with LRM and I brushed it under the rug last time. When you are doing "separate scan" (eg. not incrementally adding to the hash table), then there's no need to have a truncated search, instead you can just have a truncated insert. That is, if you're limiting your search to 10, then don't add 1000 of the same hash and only ever search 10, just add 10. In fact on my test files it's not terrible to limit the LRM search to just 1 (!).
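A small sketch of "truncated insert" for the separate-scan case. This is an illustration, not the actual LRM code: it uses `zlib.crc32` as a stand-in hash, hashes at every position, and keeps the first `max_per_hash` positions it sees (which positions to keep is a policy choice I'm just assuming here).

```python
import zlib
from collections import defaultdict

def build_truncated_table(data, hash_len, max_per_hash):
    """Map hash -> at most max_per_hash positions, so no search can exceed that."""
    table = defaultdict(list)
    for pos in range(len(data) - hash_len + 1):
        key = zlib.crc32(data[pos:pos + hash_len])
        bucket = table[key]
        if len(bucket) < max_per_hash:  # truncate at insert time, not search time
            bucket.append(pos)
    return table
```

The search side then never needs a step limit, because every bucket is already bounded.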
But I'm not happy with that as a general solution because there is a potential for huge inefficiency. The really bad degenerate case
looks something like this :
LRM hash length is 32 or whatever
Lots of strings in the file of length 32 have the same hash value
You only add 100 or so to the hash
One of the ones you didn't add would have provided a really good match
Typically, missing that match is not a disaster, because at the next byte you will roll to a new hash and look that up, and so on,
so if you miss a 128k long match, you will usually find a (128k - 256) long match 256 bytes later. But it is possible to miss it for
a long time if you are unlucky, and I like my inefficiency to be bounded. The more common bad case is that you get matches just a bit
shorter than possible, and that happens many times, and it adds up to compression lost. eg. say hash length is 16 and there are 24 byte matches
possible, but due to the reduced search you only find 16 or 18-length matches.
But most importantly, I don't like to throw away compression for no good reason, I want to know that the speed of doing it this approximate way is worth it vs. a more exact matcher.
There are a few obvious solutions with LRM :
1. Push matches backwards :
If you find a match at pos P of length L, that match might also have worked at pos (P-1) for length (L+1), but a match wasn't found there, either because of the approximate search or because hashes are only put in the dictionary every N bytes.
In practice you want to be scanning matches forward (so that you can roll the hash forward, and also so you can carry forward "last match follows" in degenerate cases), so to implement this you probably want to have a circular window of the next 256 positions or whatever with matches in them.
This is almost free (in terms of speed and memory use) so should really be in any LRM.
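The core of pushing a match backwards is tiny. This sketch only does the extension step; the circular window of pending positions described above is left out, and `min_pos` is a hypothetical parameter standing for "don't push back over bytes the parse has already committed".

```python
def push_match_back(buf, pos, src, length, min_pos=0):
    """Extend a match backwards.

    On entry buf[src:src+length] == buf[pos:pos+length] with src < pos.
    Returns the (pos, src, length) of the extended match.
    """
    while pos > min_pos and src > 0 and buf[pos - 1] == buf[src - 1]:
        pos -= 1
        src -= 1
        length += 1
    return pos, src, length
```

For example, if the hash was only in the dictionary for the tail of a repeated string, the match found there gets grown back to where the repeat really started.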
2. Multiple Hashes :
The simplest form of this is to do two hashes; like one of length 16 and one of length 64 (or whatever). The shorter hash is the main one you use to find most matches, the longer hash is there to make sure you can find the big matches.
That is, this is trying to reduce the chance that you miss out on a very long match due to truncating the search on the short hash. More generally, to really be scale-invariant, you should have a bunch of levels; length 16,64,256,1024,etc. Unfortunately implementing this the naive way (by simply having several independent hashes and tables) hurts speed by a linear factor.
3. Multiple Non-Redundant Hashes :
The previous scheme has some obvious inefficiencies; why are we doing completely independent hash lookups when in fact you can't match a 64-long hash if you don't match a 16-long hash.
So you can imagine that we would first do a 16-long hash , in a lookup where the hashes have been unique'd (each hash value only occurs once), then for each 16-long hash there is another table of the 64-long hashes that occurred for that 16-long hash. So then we look up in the next. If one is found there, we look in the 256-long etc.
An alternative way to imagine this is as a sorted array. For each entry you store a hash of 16,64,256,etc. You compare first on the 16-long hash, then for entries where that is equal you compare on the 64-long hash, etc. So to lookup you first use the 16-long hash and do a sorted array lookup; then in each range of equal hashes you do another sorted array lookup on the 64-long hash, etc.
These methods are okay, but the storage requirements are too high in the naive rep. You can in fact store them compactly but it all gets a bit complicated.
4. Hash-suffix sort :
Of course it should occur to us that what we're doing in #3 is really just a form of coarse suffix sort! Why not just actually use a suffix sort?
One way is like this : for each 16-byte sequence of the file, replace it with a 4-byte U32 hash value, so the array shrinks by 4X. Now suffix-sort this array of hash values, but use a U32 alphabet instead of a U8 alphabet; that is, suffix strings only start on every 4th byte.
To lookup you can use normal sorted-array lookup strategies (binary search, interpolation search, jump-ins + binary or jump-ins + interpolation, etc). So you start with a 16-byte hash to get into the suffix sort, then if you match you use the next 16-byte hash to step further, etc.
I'm using a rolling hash, and a sorted-array lookup. Building the lookup is insanely fast. One problem with it in an Optimal Parsing scenario is that when you get a very long LRM match, you will see it over and over (hence N^2 kind of badness), so I use a heuristic that if I get a match over some threshold (256 or 1024) I don't look for any more in that interval.
For a rolling hash I'm currently using just multiplicative hash with modulus of 2^32 and no additive constant. I have no idea if this is good, I've not had much luck finding good reference material on rolling hashes. (and yes of course I've read the Wikipedia and such easily Googleable stuff; I've not tested buzhash yet; I don't like table lookups for hashing, I needs all my L1 for myself)
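For reference, one way to read "multiplicative hash with modulus of 2^32 and no additive constant" is a plain polynomial rolling hash. This is a sketch under that assumption; the multiplier below is an arbitrary odd constant chosen for illustration, not the one actually used.

```python
MASK = 0xFFFFFFFF
MULT = 0x9E3779B1  # assumption: an arbitrary odd 32-bit multiplier

def direct_hash(buf, pos, length):
    """Hash buf[pos:pos+length] from scratch: sum of b[i] * MULT^(length-1-i) mod 2^32."""
    h = 0
    for b in buf[pos:pos + length]:
        h = (h * MULT + b) & MASK
    return h

def rolling_hashes(buf, length):
    """Hash of every length-byte window, updating in O(1) per step."""
    if len(buf) < length:
        return []
    top = pow(MULT, length - 1, 1 << 32)  # weight of the oldest byte in the window
    h = direct_hash(buf, 0, length)
    out = [h]
    for i in range(length, len(buf)):
        h = (h - buf[i - length] * top) & MASK  # remove the byte leaving the window
        h = (h * MULT + buf[i]) & MASK          # shift the rest and add the new byte
        out.append(h)
    return out
```

The rolled value always equals the from-scratch value, which is an easy thing to assert in a test before trusting the fast path.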
LRM builds its search structure by hashing L bytes (LRM hash length is a parameter (default is 12)) every S bytes (S step is a parameter (default is 10)). Memory use is 8 bytes per LRM entry, so a step of 8 would mean the LRM uses memory equal to the size of the file. For large files you have to increase the step. Hash length does not affect memory use.
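Roughly, the build and lookup look like the following sketch. It is not the real LRM: `zlib.crc32` stands in for the rolling hash, entries are Python tuples rather than packed 8-byte records, and the function names are made up. The dictionary is only sampled every `step` bytes; the parser side would look up at every position.

```python
import bisect
import zlib

def build_lrm(data, hash_len=12, step=10):
    """Hash hash_len bytes every step bytes; return (hash, pos) entries sorted by hash."""
    entries = []
    for pos in range(0, len(data) - hash_len + 1, step):
        entries.append((zlib.crc32(data[pos:pos + hash_len]), pos))
    entries.sort()
    return entries

def lrm_lookup(entries, data, query, hash_len=12):
    """Return dictionary positions whose hash_len bytes equal query[:hash_len]."""
    key = zlib.crc32(query[:hash_len])
    i = bisect.bisect_left(entries, (key, -1))  # first entry with this hash
    hits = []
    while i < len(entries) and entries[i][0] == key:
        pos = entries[i][1]
        if data[pos:pos + hash_len] == query[:hash_len]:  # reject hash collisions
            hits.append(pos)
        i += 1
    return hits

def lrm_memory_bytes(file_size, step, bytes_per_entry=8):
    """Memory estimate: one entry per step bytes of file."""
    return (file_size // step) * bytes_per_entry
```

With 8 bytes per entry, `lrm_memory_bytes(100_000_000, 8)` comes out equal to the file size, matching the note above; the default step of 10 uses 80% of it.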
So anyhoo, I tested on enwik8. This is a test of different dictionary overlaps and LRM settings.
Compression works like this :
I split the file into 8 MB chunks (12 of them).
Chunks are compressed independently (in parallel).
Each chunk preconditions its dictionary with "overlap" bytes preceding the chunk.
Each chunk can also use the LRM to match the entire file preceding its overlap range.
So for each chunk the view of the file is like :
[... LRM matches here ...][ overlap precondition ][ my chunk ][ not my business ]
(note : enwik8 is 100mB (millions of bytes) not 100 MB, which means that 12*8 MB chunks = 96 MB actually
covers the 100 mB).
(BTW of course compression is maximized if you don't do any chunking, or set the overlap to infinity; we want chunking for parallelism, and we want overlap to be finite to limit memory use; enwik8 is actually small enough to do infinite overlap, but we want a solution that has bounded memory use for arbitrarily large files)
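The per-chunk view of the file is just a bit of range arithmetic. A sketch, with hypothetical names, using half-open (start, end) byte ranges:

```python
def chunk_views(file_size, chunk_size, overlap):
    """For each chunk, return the LRM range, the overlap range, and the chunk range."""
    views = []
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size)
        overlap_start = max(0, start - overlap)
        views.append({
            "lrm": (0, overlap_start),          # everything before the overlap
            "overlap": (overlap_start, start),  # preconditions the dictionary
            "chunk": (start, end),              # what this worker actually codes
        })
    return views
```

For enwik8, `chunk_views(100_000_000, 8 << 20, 4 << 20)` gives 12 chunks, and the first chunk has an empty LRM range and an empty overlap.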
With no further ado, some data. Varying the amount of chunk dictionary overlap :
| overlap MB | no LRM | LRM |
| 0 | 32709771 | 31842119 |
| 1 | 32355536 | 31797627 |
| 2 | 32203046 | 31692184 |
| 3 | 32105834 | 31628054 |
| 4 | 32020438 | 31568893 |
| 5 | 31947086 | 31518298 |
| 6 | 31870320 | 31463842 |
| 7 | 31797504 | 31409024 |
| 8 | 31731210 | 31361250 |
| 9 | 31673081 | 31397825 |
| 10 | 31619846 | 31355133 |
| 11 | 31571057 | 31316477 |
| 12 | 31527702 | 31281434 |
| 13 | 31492445 | 31253955 |
| 14 | 31462962 | 31231454 |
| 15 | 31431215 | 31206202 |
| 16 | 31408009 | 31189477 |
| 17 | 31391335 | 31215474 |
| 18 | 31374592 | 31202448 |
| 19 | 31361233 | 31192874 |
0 overlap means the chunks are totally independent. My LRM has a minimum match length of 8 and also must match a hash equal to the rolling hash length. The "with LRM" in the above test used a step of 10 and hash length of 12.
LRM helps less as the overlap gets bigger, because you find the most important matches in the LRM region. Also enwik8 being text doesn't really have that huge repeated block that lots of binary data has. (on many of my binary test files, turning on LRM gives a huge jump because some chunk is completely repeated from the beginning of the file to the end). On text it's more incremental.
We can also look at how compression varies with the LRM step and hash length :
| lrm_step | lrm_length | compressed |
| 0 | 0 | 32020443 |
| 32 | 32 | 31846039 |
| 16 | 32 | 31801822 |
| 16 | 16 | 31669798 |
| 12 | 16 | 31629439 |
| 8 | 16 | 31566822 |
| 12 | 12 | 31599906 |
| 10 | 12 | 31568893 |
| 8 | 12 | 31529746 |
| 6 | 12 | 31478409 |
| 8 | 8 | 31511345 |
| 6 | 8 | 31457094 |
(this test was run with 4 MB overlap). On text you really want the shortest hash you can get. That's not true for binary though, 12 or 16 is usually best. Longer than that hurts compression a little but may help speed.
For reference, some other compressors on enwik8 (from Matt's LTCB page) :
enwik8
lzham alpha 3 x64 -m4 -d29 24,954,329
cabarc 1.00.0601 -m lzx:21 28,465,607
bzip2 1.0.2 -9 29,008,736
crush 0.01 cx 32,577,338
gzip 1.3.5 -9 36,445,248
ADDENDUM : newer LRM, with bubble-back and other improvements :
LZNib enwik8
lots of win from bubble back (LRM Scanner Windowed) :
32/32 no bubble : 31,522,926
32/32 wi bubble : 31,445,380
12/12 wi bubble : 30,983,058
10/12 no bubble : 31,268,849
10/12 wi bubble : 30,958,529
6/10 wi bubble : 30,886,133
This may seem really bone-headed but it's very very common. You may recall in some of my past posts I looked into some basic video stuff in great detail and found that almost every major video encoder/decoder is broken on the simple stuff :
the DCT (or whatever transform)
quantization/dequantization
color space conversion
upsample/downsample
translating float math into ints
These things are all relatively trivial, but it's just SO common to get them wrong, and you throw away a bottom bit (or half a bottom bit) when you do so. Any time you are writing a compressor, you need to write reference implementations of all these basic things that you know are right - and check them! And then a crucial thing is : keep the reference implementation around! Ideally you would be able to switch it on from the command line, or failing that with a build toggle, so at any time you can go back and enable the slow mode and make sure everything works as expected.
(of course a frequent cause of trouble is that people go and grab an optimized integer implementation that they found somewhere, and it either is bad or they use it incorrectly (eg. maybe it assumes data that's biased in a certain way, or centered at 0, or scaled up by *8, or etc))
A lot of this basic stuff in video is very easy to do regression tests on (eg. take some random 8x8 data, dct, quantize, dequantize, idct, measure the error, it should be very low) so there's no excuse to get it wrong. But even very experienced programmers do get it wrong, because they get lazy. They might even start with a reference implementation they know is right, and then they start optimizing and translating stuff into ints or SIMD, and they don't maintain the slow path, and somewhere along the line a mistake slips in and they don't even know it.
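The kind of regression test meant here can be sketched in a few lines with a slow float reference. This assumes an orthonormal 8x8 DCT-II and a single uniform quantizer step `q` (real codecs use per-coefficient steps); with an orthonormal transform the round-trip RMSE can't exceed q/2, so anything above that bound means something in the chain is broken.

```python
import math
import random

N = 8

def _basis(k, n):
    """Entry (k, n) of the orthonormal DCT-II matrix."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))

C = [[_basis(k, n) for n in range(N)] for k in range(N)]

def dct2d(block):
    """Forward transform: C * block * C^T."""
    tmp = [[sum(C[k][n] * block[n][j] for n in range(N)) for j in range(N)] for k in range(N)]
    return [[sum(tmp[k][n] * C[l][n] for n in range(N)) for l in range(N)] for k in range(N)]

def idct2d(coef):
    """Inverse transform: C^T * coef * C."""
    tmp = [[sum(C[k][i] * coef[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
    return [[sum(tmp[i][l] * C[l][j] for l in range(N)) for j in range(N)] for i in range(N)]

def roundtrip_rmse(block, q):
    """dct -> quantize -> dequantize -> idct, then measure the error."""
    coef = dct2d(block)
    deq = [[round(c / q) * q for c in row] for row in coef]
    rec = idct2d(deq)
    err = sum((rec[i][j] - block[i][j]) ** 2 for i in range(N) for j in range(N))
    return math.sqrt(err / (N * N))

def random_block(seed):
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(N)] for _ in range(N)]
```

The same harness can then be pointed at the optimized integer or SIMD path, with the float version kept around as the thing you compare against.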
I've been thinking about a more difficult problem, which is : how do you deal with bugs in compression algorithms?
I don't mean bugs like "it crashes" or "the decompressor doesn't reproduce the original data" - those are the easy kind of bugs and you just go fix them. I mean bugs that cause the compressor to not work the way you intended, and thus not compress as much as it should.
The very hard thing about these bugs is that you can have them and not even know it; I'm sure I have a hundred of them right now. Frequently they are tiny things like you have a less-than where you should have a less-or-equal.
To avoid them really requires a level of care that most programmers never use. You have to be super vigilant. Any time something surprises you or is a bit fishy, you can't just go "hmm that's weird, oh well, move on to the next task". You have to stop and think and look into it. You have to gather data obsessively.
Any time you implement some new idea and it doesn't give you the compression win you expected, you can't just say "oh well guess that didn't work", you have to treat it like a crash bug, and go set breakpoints and watch your variables and make sure it really is doing what you think; and if it is, then you have to gather stats about how often that code is hit and what the values are, and see where your expectations didn't match reality.
I've really been enjoying working on compression again. It's one of the most fun areas of programming that exists. What makes it great :
1. Clear objective measure of success. You can measure size and speed (or whatever other criteria) and see if you are doing well or not. (lossy compression is harder for this).
2. Decompressors are one extreme of fun "hacker" programming; they have to be very lean; great decompressors are like haikus, they're little pearls that you feel could not get any simpler without ruining them.
3. Compressors, on the other hand, can be big and slow, and you get to pull out all the big guns of algorithms for optimization and searching and so on.
I want to be able to use CPP to make lists of N args (N a compile-time variable) like :
(int stuff0, int stuff1, int stuff2)
In MSVC (any old compiler where CPP is just text sub) you can do this quite easily.
#define LIST1(prefix,between) RR_STRING_JOIN(prefix,1)
#define LIST2(prefix,between) LIST1(prefix,between) between RR_STRING_JOIN(prefix,2)
#define LIST3(prefix,between) LIST2(prefix,between) between RR_STRING_JOIN(prefix,3)
#define LIST4(prefix,between) LIST3(prefix,between) between RR_STRING_JOIN(prefix,4)
#define LIST5(prefix,between) LIST4(prefix,between) between RR_STRING_JOIN(prefix,5)
#define LIST6(prefix,between) LIST5(prefix,between) between RR_STRING_JOIN(prefix,6)
#define LIST7(prefix,between) LIST6(prefix,between) between RR_STRING_JOIN(prefix,7)
#define LIST8(prefix,between) LIST7(prefix,between) between RR_STRING_JOIN(prefix,8)
#define LIST9(prefix,between) LIST8(prefix,between) between RR_STRING_JOIN(prefix,9)
#define LISTN(N,prefix,between) RR_STRING_JOIN(LIST,N)(prefix,between)
#define LISTCOMMAS(N,prefix) LISTN(N,prefix,COMMA)
#define COMMA ,
and then you can use it like :
#define TestFuncN(N) void RR_STRING_JOIN(TestFunc,N) ( LISTCOMMAS(N,int arg) );
Similarly for other variants of LIST, and then you can quite neatly construct structs/templates that take N args of N types.
But this doesn't work in compilers (GCC) with the newer standard that says preprocessor tokens have to be C identifiers (or whatever pedantic thing it says). IMO it's another one of those GCC/C99 (C89?) things that breaks old code and takes power away from the programmer and has very little to no benefit. (I guess I just really don't like the strict C standard).
Urg. GCC is like the nit-picky guy on the team who wants to endlessly debate some pointless crap that doesn't actually help anyone.
Is there any way to do this type of thing correctly? I'm so sick of manually making variants for N args every time I want this, the freaking preprocessor is the perfect tool to make all the N-arg variants for me if only they would let me use it.
(see for example : cblib/autoprintf.inl or cblib/callback.h)
To be concrete, a common usage case is something like cblib/callback where you want a struct to encapsulate
a member function call. You have to do something like :
explicit CallbackM3(T_ClassPtr c, T_fun_type f, Arg1 a1, Arg2 a2, Arg3 a3, double when) : Callback(when)
{
    ASSERT( c != NULL && f != NULL );
    __p = c;
    _mem_fun = f;
    _arg1 = a1;
    _arg2 = a2;
    _arg3 = a3;
}
and write variants for every number of args. With the LIST macros I could very easily make variants for N args
automatically.
... later ...
Oh well, I just bit the bullet and made a bunch of these :
#if NUM >= 1
prefix1
#if NUM >= 2
prefix2
#if NUM >= 3
prefix3
#if NUM >= 4
prefix4
#if NUM >= 5
prefix5
#if NUM >= 6
prefix6
#if NUM >= 7
prefix7
#if NUM >= 8
prefix8
#if NUM >= 9
prefix9
#endif
#endif
#endif
#endif
#endif
#endif
#endif
#endif
#endif
which is much much worse than the LIST solution but works in GCC. And of course that could be much
nicer if you could put preprocessor directives inside macros, cuz then I could just have a macro that
does that, instead I have to copy-paste the whole thing all over.
Bleh. How lame is it that C++98 doesn't have a way to encapsulate a function call in a struct. (yes yes I know we finally have lambdas now, well not now but maybe in 10 years or so).
Another thing that would have made this all easier would be if C had a "null" type. Then I could just
make templates that always take 10 arguments, and for the fewer-argument variants I could make the later
ones be null types. eg. for cases like :
Hmm.. so one option for this is I could just run a different CPP before compiling; it used to be that CPP was
completely independent from the compiler, and you could do whatever you want there, but I'm not sure that's the
case any more. (eg. assuming I want things like debugging in the original pre-CPP code to work). Or I could
just eat the pain once again and work around GCC yet again.
Starting from the clearest cases to the least clear :
There are two types of parsing, I will call them Optimal and Greedy. What I really mean is Optimal = find matches at every position in the file,
and Greedy = find matches only at some positions, and then take big steps over the match. (Lazy parses and such are in the Greedy category).
There are three types of windowing : 1. not windowed (eg. infinite window or we don't really care about windowing), 2. fixed window; eg. you
send offsets in 16 bits so you have a 64k window for matches, 3. multi-window or "cascade" ; eg. you send offsets up to 128 in 1 byte , up to
16384 in 2 bytes, etc. and you want to find the best match in each window.
There are two scan cases : 1. Incremental scan; that is, we're matching against the same buffer we are parsing; matches cannot come from in
front of our current parse position, 2. Separate scan - the match source buffer is independent from the current parse point, eg. this is
the case for precondition dictionaries, or just part of the file that's well before the current parse point.
1. Optimal Parsing , No window, Incremental scan : Suffix Trie is the clear winner here. Suffix Trie is only a super clear winner when you
are parsing and matching at the same time, since they are exactly the same work you double your time taken if they are separate. That is,
you must be scanning forward, adding strings and getting matches. Suffix Trie can be extended to Cascaded Windowing in an approximate way,
by walking to parents in the tree, but doing it exactly breaks the O(N) of the Suffix Trie.
2. Optimal Parsing, No window or single window, Separate Scan : Suffix Array is pretty great here. Separate scan means you can just take the whole buffer you
want to match against and suffix sort it.
(BTW this is a general point that I don't think most people get - any time you are not doing incremental update, a sort is a superb
search structure. For example it's very rarely best to use a hash table when you are doing separate scan, you should just have a sorted list,
possibly with jump-ins)
3. Optimal Parsing, Windowed or Cascaded, Incremental or Separate Scan : there's not an awesome solution for this. One method I use is
cascades of suffix arrays. I wrote in the past about how to use Suffix Array Searcher with Incremental Scan (you have to exclude positions
ahead of you), and also how to extend it to Windowing. But those methods get slow if the percentage of matches allowed gets low; eg. if you
have a 1 GB buffer and a 64k window, you get a slowdown proportional to (1GB/64k). To address this I use chunks of suffix array; eg. for a
64k window you might cut the file into 256k chunks and sort each one, then you only have to search in a chunk that's reasonably close to
the size of your window. For cascaded windows, you might need multiple levels of chunk size. This is all okay and it has good O(N)
performance (eg. no degeneracy disasters), but it's rather complex and not awesome.
Another option for this case is just to use something like Hash->Links and accept its drawbacks. A more complex option is to use a hybrid;
eg. for cascaded windows you might use Hash->Links for the small windows, and then Suffix Array for medium size windows, and Suffix Trie
for infinite window. For very small windows (4k or less) hash->links (or even just a "cache table") is very good, so it can be
a nice supplement to a matcher like suffix trie that is not great at cascaded windows.
Addendum : "Suffix Array Sets" is definitely the best solution for this.
4. Greedy Parsing : Here SuffixArray and SuffixTrie both are not awesome, because they are essentially doing all the work of an optimal-style
parse (eg. string matching at every position), which is a big waste of time if you only need the greedy matches.
Hash-Link is comparable to the best matcher that I know of for greedy parsing. Yann's MMC is generally a bit
faster (or finds better matches at the same speed) but is basically in the same class. The pseudo-binary-tree
thing used in LZMA (and I believe it's the same thing that was used in the original PkZip that was patented) is not awesome; sometimes it's
slightly faster than hash-link, sometimes slightly slower. All of them can be windowed relatively easily.
Hash-Link extends very naturally to cascaded windows, because you are always visiting links in order from lowest offset to highest, you can
easily find exact best matches in each window of the cascade as you go.
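A minimal sketch of that cascade walk, under my own assumptions: the "hash" is just the exact first 4 bytes (so there are no collisions to handle), and the names MIN_MATCH, match_len and cascade_matches are made up for illustration. The chain is walked from lowest offset to highest, and each time the offset crosses a window boundary the running best match is recorded for that window.

```python
MIN_MATCH = 4  # assumed minimum match length, also the number of bytes keyed on

def match_len(buf, a, b):
    """Length of the common prefix of buf[a:] and buf[b:], with a < b."""
    n = 0
    while b + n < len(buf) and buf[a + n] == buf[b + n]:
        n += 1
    return n

def cascade_matches(buf, windows):
    """For every position, return the best (length, offset) per window size.

    windows must be sorted ascending, eg. [4096, 65536].
    """
    head = {}               # key -> most recent position with that key
    link = [-1] * len(buf)  # position -> previous position with the same key
    results = []
    for pos in range(len(buf) - MIN_MATCH + 1):
        key = bytes(buf[pos:pos + MIN_MATCH])  # a real matcher would hash this
        best = [(0, 0)] * len(windows)
        running = (0, 0)    # best match seen so far on this chain walk
        w = 0               # index of the smallest window not yet finished
        cand = head.get(key, -1)
        while cand >= 0 and w < len(windows):
            offset = pos - cand
            # Offsets only grow along the chain, so once we pass a window
            # boundary the running best is final for that window.
            while w < len(windows) and offset > windows[w]:
                best[w] = running
                w += 1
            if w == len(windows):
                break
            n = match_len(buf, cand, pos)
            if n > running[0]:
                running = (n, offset)
            cand = link[cand]
        for i in range(w, len(windows)):
            best[i] = running
        results.append(best)
        link[pos] = head.get(key, -1)
        head[key] = pos
    return results

buf = b"abcdefgh" + b"__" + b"abcdXX" + b"abcdefgh"
print(cascade_matches(buf, [8, 32])[16])  # [(4, 6), (8, 16)]
```

At position 16 the 8-byte window only sees the short "abcd" match at offset 6, while the 32-byte window also sees the full 8-byte match at offset 16, all from a single walk of the chain.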
With Greedy Parsing you don't have to worry about degeneracies quite so much, because when you find a very
long match you are just going to take it and step over it. (that is, with optimal parse if you find a 32k long
match, then at the next step you will find a 32k-1 match, etc. which is a bad N^2 (or N^3) thing if you aren't
super careful (eg. use a SuffixTrie with correct "follows" implementation)). However, with lazy parsing you
can still hit a mild form of this degeneracy, but you can avoid that pretty easily by just not doing the lazy
eval if your first match length is long enough (over 1024 or whatever).
(BTW I'm pretty sure it's possible to do a Suffix Trie with lazy/incremental update for Greedy Parsing; the
result should be similar to MMC but provide exact best matches without any degenerate bad cases; it's rather
complex and I figure that if I want perfect matching I generally also want Optimal Parsing, so the space of
perfect matching + greedy parsing is not that important)
Previous posts on string matching :
cbloom rants 06-17-10 - Suffix Array Neighboring Pair Match Lens
To be clear, the idea is computer 1 (receiver) has a previous version of a file (or set of files, but for now we'll just
assume it's one file; if not, make a tar), computer 2 (transmitter) has a newer version and wishes to send a minimum patch, which computer 1
can apply to create the newer version.
First of all, you need to generate patches from uncompressed data (or the patch generator needs to be able to
do decompression). Once the patch is generated, it should generally be compressed. If you're trying to patch a zip to a zip, there will be lots of different bits even
if the contained files are the same, so decompress first before patching.
Second, there are really two classes of this problem, and they're quite different. One class is where the transmitter
cannot see the old version that the receiver has; this is the case where there is no authoritative source of data;
eg. in rsync. Another class is where the transmitter has all previous versions and can use them to create the diff;
this is the case eg. for game developers creating patches to update installed games.
Let's look at each class.
Class 1 : Transmitter can't see previous version
This is the case for rsync and Windows RDC (Remote Differential Compression).
The basic way all these methods work is by sending only hashes of chunks of data to each other, and hoping that when the
hashes for chunks of the files match, then the bytes actually were the same.
These methods are fallible - it is possible to get corrupt data if you have an unlucky hash match.
In more detail about how they work :
The file is divided into chunks. It's important that these chunks are chosen based on the *contents* of the file, not just every 256
bytes or whatever, some fixed size chunking. If you did fixed size chunking, then just adding 1 byte at the head of a file would
make every chunk different. You want to use some kind of natural signature to choose the chunk boundaries. (this reminds me rather of SURF type stuff in image feature
detection).
I've seen two approaches to finding chunking boundaries :
1. Pick a desired average chunk size of L. Start from the previous chunk end, and look ahead 2*L and compute a hash at each position.
The next chunk boundary is set to the local min (or max) of the hash value in that range.
2. Pick a desired average chunk size of 2^B. Make a mask M with B random bits set. Compute a hash at each position in the file;
any position with (hash & M) == (M) is a boundary; this should occur once in 2^B bytes, giving you an average chunk len of 2^B.
Both methods can fall apart in degenerate areas, so you could either enforce a maximum chunk size, or you could specifically detect
degenerate areas (areas with the same hash at many positions) and handle them as a special case.
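Here is a rough sketch of method 2, with the maximum chunk size as an optional guard. Everything named here (make_mask, split_chunks, the 16-byte hash window, using zlib.crc32 as the per-position hash) is my own choice for illustration; a real implementation would use a rolling hash rather than recomputing a CRC at every byte.

```python
import random
import zlib

def make_mask(bits, seed=0):
    """Mask with `bits` randomly chosen bits set out of 32."""
    rng = random.Random(seed)
    return sum(1 << b for b in rng.sample(range(32), bits))

def split_chunks(data, mask, window=16, max_chunk=None):
    """Cut data wherever the hash of the preceding `window` bytes has all mask bits set."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        h = zlib.crc32(data[i - window:i])  # stand-in for a rolling hash
        at_content_boundary = (h & mask) == mask
        too_long = max_chunk is not None and i - start >= max_chunk
        if at_content_boundary or too_long:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

# Demo on made-up data: insert one byte at the head and re-chunk.
rng = random.Random(1)
data = bytes(rng.randrange(256) for _ in range(4000))
mask = make_mask(6)                      # average chunk length around 2^6 bytes
old = split_chunks(data, mask)
new = split_chunks(b"X" + data, mask)
print(len(old), len(new), old[1:] == new[len(new) - len(old) + 1:])
```

Because the boundaries depend only on the bytes just before them, every chunk after the first one comes out identical in the shifted file, which is the whole point of content-based chunking. (With max_chunk set, the forced cuts are position-based, so it can take until the next content boundary to resynchronize.)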
So once you have these chunk boundaries, you compute a strong hash for each chunk (MD5 or SHA or whatever; actually any many-bit hash is fine, the
cryptographic hashes are widely over-used for this, they are generally slower to compute than an equally strong non-cryptographic hash). Then the transmitter and receiver send these hashes between each other; if the hashes match they assume the bytes match
and don't send the bytes. If the hashes differ, they send the bytes for that chunk.
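The exchange itself is simple; a sketch, assuming the chunks have already been cut and using a 128-bit BLAKE2 digest as the "strong hash" (my pick, any many-bit hash would do). The function names and the ("have", hash) / ("data", bytes) patch format are invented for this example.

```python
import hashlib

def chunk_hash(chunk):
    """128-bit strong hash of one chunk."""
    return hashlib.blake2b(chunk, digest_size=16).digest()

def make_patch(new_chunks, receiver_hashes):
    """Transmitter side: send bytes only for chunks the receiver does not claim to have."""
    patch = []
    for chunk in new_chunks:
        h = chunk_hash(chunk)
        if h in receiver_hashes:
            patch.append(("have", h))      # hash matched, assume the bytes match
        else:
            patch.append(("data", chunk))  # hash differs, send the bytes
    return patch

def apply_patch(patch, old_chunks):
    """Receiver side: rebuild the new file from old chunks plus the sent bytes."""
    by_hash = {chunk_hash(c): c for c in old_chunks}
    return b"".join(by_hash[p] if kind == "have" else p for kind, p in patch)

old_chunks = [b"aaa", b"bbb", b"ccc"]
new_chunks = [b"aaa", b"XYZ", b"ccc"]
patch = make_patch(new_chunks, {chunk_hash(c) for c in old_chunks})
print([kind for kind, _ in patch])  # ['have', 'data', 'have']
```

Note the transmitter never looks at the old bytes, only at the hash set, which is why a hash collision would silently produce a corrupt file.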
When sending the bytes for a chunk that needs updating, you can use all the chunks that were the same as context for a compressor.
If the file is large, you may wish to use multi-scale chunking. That is, first do a coarse level of chunking at large scale to find
big regions that need transmission, then for each of those regions do finer scale chunking. One way to do this is to just use a constant
size chunk (1024 bytes or whatever), and to apply the same algorithm to your chunk-signature set; eg. recurse (RDC does this).
Class 2 : Transmitter can see previous version
This case is simple and allows for smaller patches (as well as guaranteed, non-probabilistic patches).
(you probably want to do some simple hash check to ensure that the previous versions do in fact match).
The simplest way to do this is just to take an LZ77 compressor, take your previous version of the file and put it in your LZ dictionary,
then compress the new version of the file. This will do byte-exact string matching and find any parts of the file that are duplicated
in the previous version.
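As a toy illustration of the idea, Python's zlib binding does expose deflate's preset dictionary through the zdict argument of compressobj / decompressobj. Deflate's window is only 32k, so this is only useful on tiny files and is not the large-window matcher you actually want; make_patch and apply_patch are names I made up.

```python
import random
import zlib

def make_patch(old, new):
    """Compress `new` with `old` preloaded as the LZ77 dictionary."""
    comp = zlib.compressobj(level=9, zdict=old)
    return comp.compress(new) + comp.flush()

def apply_patch(old, patch):
    """Receiver must preload exactly the same old bytes to decode."""
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(patch) + decomp.flush()

# Demo on made-up incompressible data with one small insertion.
rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(2000))
new = old[:1000] + b"a small edit" + old[1000:]
patch = make_patch(old, new)
print(len(new), len(zlib.compress(new, 9)), len(patch))
```

Without the preload the random bytes don't compress at all; with it the patch is basically two long matches plus the inserted literals.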
(aside : I went and looked for "preload dictionary" options in a bunch of mainstream compressors and couldn't find it in any of them.
This is something that every major compressor should have, so if you are the author of FreeArc or 7zip or anything like that, go add a
preload dictionary option)
(aside : you could use other compressors than LZ77 of course; for example you could use PPM (or CM) and use the previous
version to precondition the model. For large preconditions, the PPM would have to be very high order, probably
unbounded order. An unbounded order PPM would be just as good (actually, better) at differential compression than LZ77.
The reason why we like LZ77 for this application is that the memory use is very low, and we want to use very large
preconditions. In particular, the memory use (in excess of the window itself) for LZ77 compression can be very low
without losing the ability to deduplicate large blocks; it's very easy to control, and when you hit memory limits
you simply increase the block size that you can deduplicate; eg. up to 1 GB you can find all dupes of 64 bytes or more;
from 1-2 GB you can find dupes of 128 bytes or more; etc. this kind of scaling is very hard to do with other compression algorithms)
But for large distributions, you will quickly run into the limits of how many byte-exact matches an LZ77 matcher can handle.
Even a 32 MB preload is going to stress most matchers, so you need some kind of special very-large-window matcher to find the
large repeated blocks.
Now at this point the approaches for very-large-window matching look an awful lot like what was done in class 1 for differential
transmission, but it is really a different problem and not to be confused.
The standard approach is to pick a large minimum match length L (L = 32 or more) for the long-range matches, and to only put them in
the dictionary once every N bytes (N = 16 or more, scaling based on available memory and the size of the files). So basically every N
bytes you compute a hash for the next L bytes and add that to the dictionary. Now when scanning over the new version to look for
matches, you compute an L-byte hash at every position (this is fast if you use a rolling hash) and look that up.
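A sketch of that scheme, assuming a plain polynomial rolling hash (the MOD and BASE constants, and the function names, are my own choices). Old-file blocks are indexed every N bytes, the new file is scanned at every byte, each candidate is verified against the actual bytes, then extended forward.

```python
import random

MOD = (1 << 61) - 1  # assumed hash modulus
BASE = 257           # assumed hash multiplier

def poly_hash(block):
    h = 0
    for b in block:
        h = (h * BASE + b) % MOD
    return h

def find_long_matches(old, new, L=32, N=16):
    """Return (new_pos, old_pos, length) matches of length >= L."""
    table = {}
    for p in range(0, len(old) - L + 1, N):   # one entry every N bytes
        table.setdefault(poly_hash(old[p:p + L]), p)
    matches = []
    if len(new) < L:
        return matches
    top = pow(BASE, L - 1, MOD)               # weight of the byte leaving the window
    q, h = 0, poly_hash(new[:L])
    while True:
        p = table.get(h)
        if p is not None and old[p:p + L] == new[q:q + L]:
            n = L                             # confirmed; extend byte by byte
            while p + n < len(old) and q + n < len(new) and old[p + n] == new[q + n]:
                n += 1
            matches.append((q, p, n))
            q += n                            # step over the match
            if q + L > len(new):
                break
            h = poly_hash(new[q:q + L])
            continue
        if q + L >= len(new):
            break
        h = ((h - new[q] * top) * BASE + new[q + L]) % MOD  # roll one byte
        q += 1
    return matches

rng = random.Random(2)
old = bytes(rng.randrange(256) for _ in range(1000))
new = b"HEADER" + old[100:]
print(find_long_matches(old, new))  # [(18, 112, 888)]
```

The match starts 12 bytes late in the new file (at the first old position that is a multiple of N); this sketch only extends forward, a real one would also extend backward to pick those bytes up.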
One interesting variant of this is out-of-core matching; that is if the previous version is bigger than memory. What you can do is
find the longest match using only the hashes, and then confirm it by pulling the previous file bytes back into memory only when you
think you have the longest one. (SREP does some things like this; oddly SREP also doesn't include a "preload dictionary" option, or
it could be used for making patches)
In the end you're just generating LZ matches though. Note that even though you only make dictionary entries every N bytes for L byte chunks,
you can generate matches of arbitrary length by doing byte-by-byte matching off the end of the chunk, and you can even adjust to other
offsets by sliding matches to their neighboring bytes. But you might not want to do that; instead for very large offsets and match lengths
you could just not send some bottom bits; eg. only send a max of 24 bits of offset, but you allow infinite window matches, so over 24 bits
of offset you don't send some of the bottom bits.
Special Cases
So far we've only looked at pretty straightforward repeated sequence finding (deduplication). In some cases, tiny changes to original files
can make lots of derived bytes differ.
A common case is executables; a tiny change to source code can cause the compiled exe to differ all over. Ideally you would
back up to the source data and transmit that diff and regenerate the derived data, but that's not practical.
Some of the diff programs have special case handling for exes that backs out one of the major problems : jump address changes. Basically the
problem is if something like the address of memcpy changes (or the location of a vtable, or address of some global variable, etc..), then you'll have diffs all over your
file and generating a small patch will be hard.
I speculate that what these diffs do basically is first do the local-jump to absolute-jump transform, and then they create a mapping of the absolute addresses
to find the same routine in the new & old files. They send the changed address, like "hey replace all occurrences of address 0x104AC with
0x10FEE", so that chunks that only differ by some addresses moving can be counted as unchanged.
(bsdiff does some fancy stuff like this for executables) (ADDENDUM : not really; what bsdiff does is much more
general and not as good on exes; see comments)
If you're trying to send small patches of something like lightmaps, you might have areas where you just increased the
brightness of a light; that might change every pixel and create a huge diff. It might be possible to express deltas of image (and sound)
data as linear transforms (add & scale). An alternative would be finding the original piece of content and just using it as a mocomp
source (dictionary precondition) for an image compressor. But at some point the complexity of the diff becomes not worth it.
Links
In no particular order :
-ck hacking lrzip-0.613
BTW some of you have horrible page titles. "binary diff" is not a useful page title, nor is "data deduplication". It's like all
the freaking music out there named "track1.mp3".
I have not done an extensive review of the existing solutions. I think bsdiff is very strong, but is limited to relatively small files,
since it builds an entire suffix sort. I'm not sure what the best solution is for large file diffs; perhaps xdelta (?). The algorithms in rsync
look good but I don't see any variant that makes "class 2" (transmitter has previous version) diffs. It seems neither lrzip nor srep have a
"precondition dictionary" option (wtf?). So there you go.
I wanted to write about what's in Oodle now and what's coming, as a sort of roadmap for myself and others. This is not a detailed feature
list; contact RAD for documents with more details about Oodle features.
Oodle Beta : (now)
What's in Oodle at the moment :
Oodle RC / 1.0 : (2012-2013)
Oodle 1.1 : (around GDC 2013)
Oodle 1.2/2.0 :
This thread should be able to display the current state of the "world" (whatever the app is managing)
and let you do simple things like move/resize windows, scroll, etc. without blocking on complex processing.
Almost every app gets this wrong; even the ones that try (like some web browsers) just don't actually do
it; eg. you should never ever get into a situation where you browse to a slow page that has some broken
script or something, and that causes your other tabs to become unresponsive. (part of the problem with
web browsers these days of course is that scripts are allowed to do input processing, which never should
have been allowed, but anyhoo).
Anyway, that's just very basic and obvious. A slightly more advanced topic is how to respond to input
when the slow processing causes a change in state which affects input processing.
That is, we see a series of input commands { A B C D ... } and we start doing them, but A is some big
slow operation. As long as the commands are completely independent (like "pure" functions) then we can
just fire off A, then while it's still running we go ahead and execute B, C, D ...
But if A is something like "open a new window and take focus" , then it's completely ambiguous about
whether we should go ahead and execute B,C,D now or not.
I can certainly make arguments for either side.
Argument for "go ahead and process B C D immediately" :
Say for example you're in a web browser and you click on a link as action A. The link is very slow to
load so you decide you'll do something else and you center-click some other links on the original page
to open them in new tabs. Clearly these inputs should be acted on immediately.
Argument for "delay processing B C D until A is done" :
For similarity we'll assume a web browser again. Say you are trying to log into your bank, which you
have done many times. You type in your user name and hit enter. You know that this will load the next
page which will put you at a password prompt, so you go ahead and start typing your password. Of
course those key presses should be enqueued until the focus change is done.
A proponent of this argument could outline two clear principles :
1. User input should be race free. That is, the final result should not depend on a race between my fingers
and the computer. I should get a consistent result even if the processing of commands is subject to random
delays. One way to do this is :
2. For keyboard input, any keyboard command which changes key focus should cause all future keyboard input
to be enqueued until that focus change is done.
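To make principle 2 concrete, here's a toy sketch of a key dispatcher that holds keystrokes while a focus change is in flight. The class and method names are invented; this is not how any real window system exposes it.

```python
from collections import deque

class KeyDispatcher:
    def __init__(self, focus):
        self.focus = focus          # who currently receives keys
        self.pending_focus = None   # target of an unfinished focus change
        self.queue = deque()        # keys typed while the change is pending
        self.delivered = []         # (target, key) pairs, in delivery order

    def begin_focus_change(self, target):
        """A command like ctrl-F was issued; the dialog may take a while to open."""
        self.pending_focus = target

    def key(self, k):
        if self.pending_focus is not None:
            self.queue.append(k)    # don't race the user's fingers against the app
        else:
            self.delivered.append((self.focus, k))

    def focus_change_done(self):
        """The slow operation finished; flush the held keys to the new focus."""
        self.focus, self.pending_focus = self.pending_focus, None
        while self.queue:
            self.delivered.append((self.focus, self.queue.popleft()))

d = KeyDispatcher("editor")
d.begin_focus_change("find_dialog")   # ctrl-F
for ch in "stuff":
    d.key(ch)                         # typed before the dialog appears
d.focus_change_done()
print(d.delivered[0])                 # ('find_dialog', 's')
```

However long the dialog takes to open, "stuff" ends up in the find box and never in the source code.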
This certainly bugs me on a nearly daily basis. The most common place I hit it is in MSVC because that's
where I spend most of my life, and I've developed muscle-memory for common things. So I'll frequently
do something like hit "ctrl-F stuff enter" , expecting to do a search for "stuff" , only to be dismayed
to see that for some inscrutable reason the find dialog box took longer than usual to open, and instead
of searching for "stuff" I instead typed it into my source code and am left with an empty find box.
I think in the case of pure keyboard input in a source code editor, the argument for race-freeness of
user input is the right one. I should be able to develop keyboard finger instinctive actions which
have consistent results.
However, the counter-example of the slow web browser means that this is not an obvious general rule for
user inputs.
The thing I ask myself in these scenarios is "if there was a tiny human inside my computer that was
making this decision, could they do it?". If the answer to that question is yes, then it means that there
is a solution in theory, it just may not be easy to express as a computer algorithm.
I believe that in this case, 99% of the time a human would be able to tell you if the input should be
enqueued or not. For example in the source code "ctrl-F stuff" case - duh, of course he wants stuff
to be in the find dialog, not typed into the source code; the human computer would get that right (by
saying "enqueue the input, don't process immediately"). Also
in the web browser case where I click a slow link and then click other stuff on the original page - again
a human would get that right (by saying "don't enqueue the input, do process it immediately").
Obviously there are ambiguous cases, but this is an interesting point that I figured out while playing poker
that I think most people don't get : the hard decisions don't matter !
Quickly repeating the point for the case of poker (I've written this before) : in poker you are constantly faced
with decisions, some easy (in that the right answer is relatively obvious) and some very hard, where the right answer
is quite unclear, maybe the right answer is not what the standard wisdom thinks it is, or maybe it requires deep
thought. The thing is, the hard decisions don't matter. The reason they are hard is because the EV (expected value)
of either line is very close; eg. maybe the EV of raise is 1.1 BB and the EV of call is 1.05 BB ; obviously in analysis
you aren't actually figuring out the EV, but just the fact that it's not clear tells you that either line is okay.
The way that people lose value in poker is by flubbing the *easy* decisions. If you fold a full house on the river
because you were afraid your opponent had quads, that is a huge error and gives up tons of value. When you fail
to do something that is obviously right (like three-betting often enough from the blinds against aggressive late
position openers) that is a big error. When you are faced with tricky situations that poker experts would have to
debate for some time and still might not agree what the best line is - those are not important situations.
You can of course apply the same situation to politics, and here to algorithms. People love to debate the tricky
situations, or to say that "that solution is not the answer because it doesn't work 100% of the time". That's stupid
non-productive nit picking.
A common debate game is to make up extreme examples that prove someone's solution is not universal or not completely
logical or self-consistent. That's retarded. Similarly, if you have a good solution for case A, and a good (different)
solution for case B, a common debate game is to interpolate the cases and find something in the middle where it's
ambiguous or neither solution works, and the sophomoric debater contends that this invalidates the solutions. Of course
it doesn't, it's still a good solution for case A and case B and if those are the common cases then who cares.
What actually matters is to get the answer right *when there is obviously a right answer*.
In particular with user input response, the user expects the app to respond in the way that it obviously *should* respond when
there is an obvious response. If you do something that would be very easy for the app to get right, and it gets it wrong,
that is very frustrating. However if you give input that you know is ambiguous, then it's not a bad thing if the app gets
it wrong.
(the compressors are the ones I can compile into my testbed and run from code, eg. not command line apps; they are all run
memory to memory; I tried to run all compressors in max-compression-max-decode-speed mode , eg. turning on heavy options on
the encode side. Decode times are generated by running each decompressor 10 times locked to each core of my box (80 total runs)
and taking the min time; the cache is wiped before each decode. Load times are simulated by dividing the compressed file size
by the disk speed parameter. All decoders were run single threaded.)
They are sorted by fastest decompressor. (the "raw" uncompressed file takes zero time to decompress).
"sum" = the sum of decomp + load times. This is the latency if you load the entire compressed file and then decompress in series.
"max" = the max of decomp & load times. This is the latency if the decomp and load were perfectly parallelizable,
and neither ever stalled on the other.
The criterion you actually want to use is something between "sum" and "max", so the idea is you look at them both
and kind of interpolate in your mind. (obviously you can replace "disk" with ram or network or "channel")
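The two numbers are trivial to compute; a sketch, with a made-up example (the 50 MB compressed size and 0.25 s decode time are not from my measurements):

```python
def load_latency(comp_bytes, decode_seconds, disk_mbps):
    """Simulated load time plus the 'sum' and 'max' latencies, in seconds.

    disk_mbps is in millions of bytes per second.
    """
    load = comp_bytes / (disk_mbps * 1e6)
    return {
        "load": load,
        "sum": load + decode_seconds,      # load everything, then decode
        "max": max(load, decode_seconds),  # perfectly overlapped IO and decode
    }

print(load_latency(50_000_000, 0.25, 100))
# {'load': 0.5, 'sum': 0.75, 'max': 0.5}
```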
Discussion :
The compressors are ordered from left to right by speed. If you look at the chart of compressed file sizes,
they should be going monotonically downward from left to right. Any time it pops up as you go to the right
(eg. at snappy, minilzo, miniz, zlib) that is just a bad compressor; it has no hope of being a good choice in
terms of space/speed tradeoff. The only ones that are on the "Pareto frontier" are raw, LZ4, OodleLZH, and LZMA.
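Picking out the frontier mechanically: sort by decode time, and keep a compressor only if it is smaller than everything faster than it. The entries below are placeholder names and numbers, not the ones from my chart.

```python
def pareto_frontier(entries):
    """entries: (name, decode_seconds, compressed_size). Returns names on the frontier."""
    frontier = []
    best_size = float("inf")
    for name, _decode_time, size in sorted(entries, key=lambda e: e[1]):
        if size < best_size:   # smaller than every faster compressor
            frontier.append(name)
            best_size = size
    return frontier

entries = [
    ("raw",   0.0, 100),
    ("fastA", 1.0, 60),
    ("badB",  2.0, 65),   # slower than fastA and bigger: never a good choice
    ("slowC", 5.0, 30),
]
print(pareto_frontier(entries))  # ['raw', 'fastA', 'slowC']
```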
Basically what you should see is that on a fast disk (100 mbps (and mb = millions of bytes, not 1024*1024)),
a very slow decoder like LZMA does not make a lot of sense, you spend way too much time in decomp.
On very slow data channels (like perhaps over the net) it starts to make sense, but you have to get to 5 mbps or
slower before it becomes a clear win. (of course there may be other reasons that you want your data very small
other than minimizing time to load; eg. if you are exceeding DVD capacity).
On a fast disk, the fast decompressors like LZ4 are appealing. (though even at 100 mbps, OodleLZH has a lower "max";
LZ4 has the best "sum").
Of the fast decoders, LZ4 is just the best. (in fact LZ4-largewindow would be even stronger). Zip is pretty poor;
the small window is surely hurting it, it doesn't find enough matches which not only hurts compression, it hurts decode
speed. Part of the problem is neither miniz nor zlib have super great decoders with all the tricks.
It's kind of ridiculous that we don't have a single decent mainstream free compression library.
Even just zip-largewindow would be at least decent.
(miniz could easily be extended to large windows; that would make it a much more competitive compressor for people
who don't care about zlib compatibility)
If you are fully utilizing your CPU's, you may need a low-CPU decoder even if it's not the best choice in a vacuum.
In fact because of that you should avoid CPU-hungry decoders even if they are the best by some simple measure like
time to load. eg. even in cases where LZMA
does seem like the right choice, if it's close you should avoid it, because you could use that CPU time for something
else. You could say that any compressor that can decode faster than it can load compressed data is "free"; that is,
the decode time can be totally hidden by parallelizing with the IO and you can saturate the disk loading compressed data. While that is true it
assumes no other CPU work is being done, so does not apply to games. (it does sort of apply to archivers or installers,
assuming you aren't using all the cores).
As a rough rule of thumb, compressors that are in the "sweet spot" take time that is roughly on par
with the disk time to load their compressed data. That is, maybe half the time, or double the time, but not 1/10th the time of the disk
(then they are going too fast, compressing too little, leaving too much on the table), and also not 10X the time of the disk (then they are just
going way too slow and you'd be better off with less compression and a faster compressor).
The other thing we can do is draw the curves and see who's on the pareto frontier.
Here I make the Y axis the "effective mbps" to load and then decompress (sequentially). Note that "raw" is an
x=y line, because the effective speed equals the disk speed.
Let me emphasize that these charts should be evaluated as information that goes into a decision. You do not
just go "hey my disk is 80 mbps let me see which compressor is on top at that speed" and go use that. That's very wrong.
and the log-log (log base 2) :
You can see way down there at the bottom of the log-log, where the disk speed is 1.0 mbps, LZMA finally becomes best.
Also note that a log2 disk speed of 10 is 1024 mbps, about a gigabyte per second, almost the speed of memory.
Some intuition about log-log compressor plots :
Over on the right hand side, all the curves will flatten out and become horizontal. This is the region where
the decompress time dominates and the disk speed becomes almost irrelevant (load time is very tiny compared
to decompress time). You see LZMA flattens out at relatively low disk speed (at 16 mbps (log2 = 4) it's already
pretty flat). The speed over at the far right approaches the speed of just the decompressor running memory to memory.
On the left all the curves become straight lines with a slope of 1 (Y = X + B). In this area their total time is dominated by their loading
time, so their effective speed is just a constant times the disk speed. In a log-log plot this constant multiple becomes a constant
addition - the Y intercept of each curve is equal to log2(rawLen/compLen) ; eg someone with 2:1 compression will
hit the Y axis at log2(2) = 1.0 . You can see them stacked up hitting the Y axis in order of who gets the most compression.
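You can check both limits numerically. A sketch of the "effective mbps" for sequential load-then-decompress, with a hypothetical compressor that gets 2:1 and decodes 100 MB in 0.5 s (made-up numbers):

```python
import math

def effective_mbps(raw_bytes, comp_bytes, decode_seconds, disk_mbps):
    """Raw bytes delivered per second (in millions) when you load, then decompress."""
    load_seconds = comp_bytes / (disk_mbps * 1e6)
    return raw_bytes / 1e6 / (load_seconds + decode_seconds)

RAW, COMP, DECODE = 100e6, 50e6, 0.5

# "raw" is the x=y line: no decode time, compressed size == raw size.
print(effective_mbps(RAW, RAW, 0.0, 80.0))

# Far left (very slow disk): Y - X approaches log2(rawLen/compLen) = 1.0
slow = 0.001
print(math.log2(effective_mbps(RAW, COMP, DECODE, slow)) - math.log2(slow))

# Far right (very fast disk): approaches the memory-to-memory decode speed, 200 mbps
print(effective_mbps(RAW, COMP, DECODE, 1e6))
```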
Another plot we can do is the L2 mean of load time and decode time (sqrt of squares). What the L2 mean does is penalize compressors
where the load time and decode time are very different (it favors ones where they are nearly equal). That is,
it sort of weights the average towards the max. I think this is actually a better way to rate a compressor for
most usages, but it's a little hand-wavey so take it with some skepticism.
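In code it's one line. I'm taking "sqrt of squares" literally as sqrt(load^2 + decode^2); dividing by 2 inside the sqrt would rescale every score equally and not change the ranking. The example times are made up.

```python
import math

def l2_time(load_seconds, decode_seconds):
    """L2 combination of load and decode time; always between max and sum."""
    return math.hypot(load_seconds, decode_seconds)

# Same total time (2.0 s), but the balanced compressor scores better:
print(l2_time(1.0, 1.0))   # about 1.414
print(l2_time(0.1, 1.9))   # about 1.903, close to the max of the two
```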
There are lots of "extra" things you can do on top of the base pure compressor. It makes it very hard to
compare compressors when one of them is doing some of the extra things and another isn't.
I used to only write pure compressors and considered the extra things "cheating", but of course in practical
reality they can sometimes provide very good bang-for-the-buck, so you have to do them. (and archivers these
days are doing more and more of these things, so you will look bad in comparison if you don't do them).
Trying to dump out a list of things :
Parameter Optimization . Most compressors have some hard-coded parameters; sometimes it's an obvious
one, like in LZMA you can set the # of position bits used in the context. Getting that number right for the
particular file can be a big win. Other compressors have hard-coded tweaks that are not so obvious; for example
almost all modern PPM or CM compressors use some kind of secondary-statistics table; the hash index made for
that table is usually some heuristic, and tweaking it per file can be a big win.
Model Preconditioning . Any time you have a compressor that learns (eg. adaptive statistical coders,
or the LZP cache table, or the LZ77 dictionary) - a "pure" compressor starts from an empty memory and then
learns the file as it goes. But that's rarely optimal. You can usually get some win by starting from some
pre-loaded state; rather than starting from empty and learning the file, you start from "default file" and
learn towards the current file. (eg. every binary arithmetic probability should not start at 50% but rather
at the expected final probability). And of course you can take this a step further by having a few different
preconditions for different file types and selecting one.
Prefilters . BCJ (exe jump transform), deltas, image deltas, table record deltas (Bulat's DELTA), record
transposes, various deswizzles, etc. etc. There are lots of prefilters possible, and they can provide very big
wins for the amount of time they take. If you don't implement all the prefilters you are at a disadvantage to
compressors that do. (for example, RAR has a pretty strong set of prefilters that are enabled by default, which
means that RAR actually beats 7zip on lots of files, even though as a pure compressor it's much worse).
Header compression . Anything you send like buffer sizes or compressor parameters can generally be smaller
by more advanced modeling. Typically this is just a few bytes total so not important, but it can become important
if you transmit incremental headers, or something like static huffman codes. eg. something like Zip that can adapt
by resending Huffmans, it's actually important to get that as small as possible, and it's usually something that's
neglected because it's outside of the pure compressor.
Data Size Specialization . Most compressors either work well on large buffers or small buffers, not both;
eg. if you do an LZSS , you might pick 3 byte offsets for large buffers, but on tiny buffers that's a huge waste;
in fact you should use 1 byte offsets at first, and then switch to 2, and then 3. People rarely go to the trouble
to have separately tuned algorithms for various buffer sizes.
Data specialization . Compressing text, structured records, images, etc. is actually all very different.
You can get a major win by special-casing for the major types of data (eg. text has weird features like the top bits
tell you the type of character; word-replacing transforms are a big win, as are de-punctuators, etc. etc.).
Decompression . One of the new favorite tricks is decompressing data to compress it. If someone
hands you a JPEG or a Zip or whatever and tells you to compress it as well as possible, of course the first
thing you have to do is decompress it to undo the bad compressor so you can get back to the original bits.
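Going back to Model Preconditioning in the list above, here's a tiny sketch of why the starting state matters for an adaptive binary probability. The update rule (move 1/16 of the way toward the observed bit) and the ideal -log2(p) cost are generic stand-ins, not the model from any particular coder.

```python
import math

def adaptive_cost(bits, p1_start, rate=1.0 / 16):
    """Ideal coded size in bits for a sequence, using one adaptive probability of a 1."""
    p1 = p1_start
    cost = 0.0
    for b in bits:
        cost += -math.log2(p1 if b else 1.0 - p1)
        p1 += ((1.0 if b else 0.0) - p1) * rate   # learn toward the observed bit
    return cost

bits = [0] * 100                      # a flag that is almost never set
print(adaptive_cost(bits, 0.5))       # "pure" start at 50%
print(adaptive_cost(bits, 0.05))      # preconditioned to the expected probability
```

The 50% start pays for the learning period on every file; the preconditioned start doesn't, and on small files that difference is a large fraction of the total.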
This is almost all stuff I haven't done yet, so I have some big win in my back pocket if I ever get around to it.
In the compression community, I'm happy to see packages like FreeArc that are gathering together the prefilters so
that they can be used with a variety of back-end "pure" compressors.
eg. if the nibble is < config_divider_lrl , it's a literal run len; if nibble is >= config_divider_lrl, it's a match.
The point of LZNib is to see how much compression is possible while keeping the decode speed close to the fastest
of any reasonable compressor (LZ4,snappy,etc).
Testing different values of config_divider_lrl :
The divider at 8 is the same as using a flag bit + 3 bit payload in the nibble.
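A sketch of the nibble split, just to show the equivalence at divider 8. The function name is made up, and returning (nibble - divider) as the match payload is my assumption about how the remaining values would be used, not a description of the real LZNib format.

```python
def split_nibble(nibble, divider_lrl):
    """Classify a 4-bit control nibble as a literal run length or a match."""
    if not 0 <= nibble <= 15:
        raise ValueError("nibble must be in 0..15")
    if nibble < divider_lrl:
        return ("literal_run", nibble)
    return ("match", nibble - divider_lrl)   # assumed payload layout

# With the divider at 8, this is exactly a flag bit plus a 3-bit payload:
for n in range(16):
    kind, payload = split_nibble(n, 8)
    assert (kind == "match") == bool(n >> 3)
    assert payload == (n & 7)

# Other dividers trade literal-run values for match values, eg. 12 vs 4:
print(sum(split_nibble(n, 12)[0] == "literal_run" for n in range(16)))  # 12
```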
There does seem to be some win to be had by transmitting divider in the file (but it's not huge). Adaptive divider seems like an
interesting thing to explore also, but it will make the decoder slower.
Obviously it would be nice to be able to find the optimal divider for each file without just running the compressor N times.
Some comparison to other compressors. I tried to run all compressors with settings for
max compression and max decode speed.
I'm having a bit of trouble finding anything to compare LZNib to. Obviously the
LZ4+Large Window (LZ4P)
that I talked about before is a fair challenger. I thought CRUSH would be the one, but it does
very poorly on the big files, indicating that it has a small window (?). If anyone knows of a strong byte-aligned LZ
that has large/unbounded window, let me know so I have something to compare against.
(ADDENDUM : tor -c1 can do byte-aligned IO and large windows, but it's a really bad byte-aligned coder)
(obviously things like LZ4 and snappy are not fair challengers because their small window makes them much worse;
also note that zlib of course has a small window but is the only one with entropy coding; also several of these are not
truly "byte aligned"; they have byte-aligned literals but do matches and control words bit by bit, which is "cheating"
in this test (if you're allowed to do variable bit IO you can do much better; hell you may as well do huffman if you're
doing variable bit IO) (though I'm also cheating because I believe I'm the only LZ here with a proper optimal parser)
(I'm also cheating in a more subtle way, because I have this test set to tweak on and others don't)).
(aside : I don't understand why so many people are still writing small-window LZ's. Yes there is a certain
use case for small-window LZ (eg. I did one for use on the SPU), but the most common use case for LZ these days
is just "decompress the whole thing into a linear memory buffer", in which case you always have the entire buffer available
for matches, so why not use it all?)
BTW I finally got most of these compressors linked into my own test bed for this post. Review of their API's (and some ranting) :
minilzo , quicklz, LZ4 : yes, good, easy. API is basically compress(char *,int len,char *), as it should be.
One minor niggle with these guys : I'm actually leaning towards compress API's taking "void *" now for buffers.
It's annoying dealing with API's where somebody thinks "char *" is the way to take an array and someone
else wants "const unsigned char *", etc. If it's just memory, take void *, I think. I'm not completely sure
about that.
Oh, also "uncompress" ? Where did that come from? It's "decompress". Come on!
Oh, and about LZO : so minilzo is nice and clean, but it only gives you access to the most
crappy compressor. To get the good LZO variants you have to use the full LZO sdk which is a nightmare. So I
didn't bother and just ran lzopack -9.
miniz : I don't love the single file miniz.c thing; I'd like to have an implementation .c and a header with the externs.
It makes it way harder to read the externs and see what is supposed to be public. I also don't like the custom
types (mz_uLong and such), I like "int" (I guess "size_t" is okay). But not bad, way better than :
zlib : WTF WTF ridiculous clusterfuck. There's an MSVC make file that seems to make a lib that
is not what you want. Then the contrib has some projects for two MSVC versions and not others; then you have to make
sure that you get the DLL import and WINAPI defines in your import the same as the built lib. Much hair pulling due to
"_deflate not found" link errors.
(BTW I don't entirely blame zlib for that; how many man hours have been wasted due to the linker
being so damn unhelpful. Hey jerks who work on the linker, if I'm trying to import "__stdcall deflate" and you only have
"__cdecl deflate" then perhaps you could figure out that just maybe I wanted those to hook up. We've only
been having this damn problem for the last 20 years. It's the kind of thing where one decent human being
on that team would go "hmm I bet we could do something better here". The linker is like the
overly precise nerd; if you ask "can you pass the salt" he's like "no, I don't have any salt" , and you're
like "fuck you, I can see the salt right next to you" and he's like "nope, no salt", and you're like wtf wtf
so you walk around and grab it and you say "okay, what's this?" and he says "that's kosher salt").
Compressed sizes from zlib were slightly smaller than miniz, so I posted them.
lzma : this is a mixed bag; the C++ interface is a total disaster of COM insanity, but you can avoid that
and the C interface (LzmaEnc/LzmaDec) is okay. A bit of a mess and you oddly have to handle the props
transmission yourself, but so much better than the C++ interface that it looks great in comparison. Very
bizarre to expose so many technical encoder settings in the basic API though.
snappy : WTF ? No MSVC build, it uses some crazy STL crap for no reason (iostreams? are you kidding me?
using std::string for buffers? WTF),
it's all overly complex and bad-C++y.
Really gross. There's a mess of files and I can't tell which ones I actually need.
Oh, and it generates warnings like crazy;
so it's all bloated "clean style" but not actually clean, which is the worst way to be.
And then after futzing around with it, I'm rewarded with a really crappy compressor.
I guess I see the point of snappy; it's LZRW1 for modern CPUs. Okay, that's fine. And it is reasonably
fast even on incompressible data. So it has a certain use case, perhaps for disk compression, or I
guess Google uses it for network data compression or some such. But people have been taking snappy and
using it as just a utility data compressor and it is totally terrible for that.
We've seen in the past that changing the LZ "min match len" can give big compression wins on some files.
eg. say you have some LZ coder which normally has a minmatchlen of 3, you instead set minmatchlen to 4
so that no length 3 matches are emitted, and you find that the compressed size is smaller.
( see, for example, post on png-alike )
(for concreteness, assume a Huffman coded LZ, so that if you don't emit any matches of length 3, then that
symbol takes up no code space; no prefix code is reserved for it if it doesn't occur)
Now, perhaps this is not a surprise in a normal parse, because a normal parse has some heuristic about
whether a match is better than a literal (or a lazy match is better, etc.), and if that heuristic doesn't
match the file then the normal parse will not find the optimum.
But this is also true for (typical) optimal parsers. You would think that if the matches of length 3 hurt
compression, the optimal parser would not choose them. So what's going on?
Well, first of all, the problem is that the matches of length 3 do *help* compression, in a greedy local optimization
sense. That is, if you have some assumption about the Huffman code lengths so that you can measure the cost of
each choice, and you just ask "what's smaller to code, these 3 literals or this length-3 match?" the answer is the
length-3 match. That's what makes this a tricky case; if it was more expensive it wouldn't get chosen.
But you can see what caused the problem - in order to do the optimal parse we had to make an assumption about the initial
Huffman code lengths to compute code costs. This biases us towards a certain strategy of parse. The "optimal parse" is
then just a local optimization relative to that seed; it can't find radically different parses.
In particular, what's happening with these files is that *if* you generate length-3 matches, they are slightly cheaper
than 3 literals, but only barely so. However, if you don't generate any length-3 matches, that makes your other prefix
codes shorter; in particular the literals get shorter and the length-4 matches get shorter. The result is a smaller file
overall.
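A toy calculation makes the effect concrete. This is only a sketch under made-up assumptions: the symbol counts, the 6-bit literal payload and the 15-bit offset cost are invented for illustration, and ideal entropy code lengths stand in for real Huffman lengths.

```python
import math

# Assumed (invented) side costs: bits for a literal's value, bits for a match offset.
LITERAL_PAYLOAD_BITS = 6.0
OFFSET_BITS = 15.0

def ideal_bits(counts):
    """Ideal (entropy) code length in bits for each symbol class that occurs."""
    total = sum(counts.values())
    return {s: math.log2(total / n) for s, n in counts.items() if n > 0}

def total_bits(counts):
    """Size of the whole stream when coded with lengths fitted to its own counts."""
    bits = ideal_bits(counts)
    total = 0.0
    for sym, n in counts.items():
        if n == 0:
            continue
        side = LITERAL_PAYLOAD_BITS if sym == "lit" else OFFSET_BITS
        total += n * (bits[sym] + side)
    return total

# Parse A emits 100 length-3 matches; parse B sends those bytes as 300 literals.
with_len3 = {"lit": 1000, "len3": 100, "len4": 400}
without_len3 = {"lit": 1000 + 3 * 100, "len3": 0, "len4": 400}

# Local decision, priced with the code lengths seeded from parse A.
seed = ideal_bits(with_len3)
match_cost = seed["len3"] + OFFSET_BITS
three_literals_cost = 3 * (seed["lit"] + LITERAL_PAYLOAD_BITS)

print("len-3 match:", match_cost, "vs 3 literals:", three_literals_cost)
print("total with len-3:", total_bits(with_len3))
print("total without len-3:", total_bits(without_len3))
```

With these numbers the length-3 match wins the local comparison, yet the stream with no length-3 matches is smaller overall, because the literal and length-4 classes both get shorter codes.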
With an optimal parser, you could find the better parse by using a different seed Huffman cost, rather than using
the hard rule of changing the minMatchLen. What you'd have to do is parse a bunch of files, pick a set of representative
Huffman code lengths that are quite different from each other, then you can run your optimal parse seeded from each one and take
the best. This is just giving you a bunch of different initial positions for your local search in an attempt to find
the global minimum.
In heuristic lazy parsers (like LZMA
( see this blog post )) there are some tweaks about "should I prefer this match or that one" or "should I
prefer a literal or match". You have the same problem there, that the parse strategy affects the statistical coder,
and the statistical coder affects the parse strategy. The heuristics are tweaked to guide the parse in a way that we
expect to be good, but it's not the best way on all files.
For a while I've had this idea that I call "multi parse". The short description is to run many parses at once and take
the best one. This is a bit different from a normal "optimal parse" in the sense that our specific goal is to avoid
doing a single local minimization.
For example with an LZMA style heuristic parser, you could run several parses with different constants in the
normal match vs repeat match and match vs lazy match decisions, as well as min match len.
The key thing is that you can run a "multi parse" as an inner loop, rather than an outer loop. That is, you run all the parses at
once as you step over the file, rather than tweaking parameters on the outside and running the entire compression several times :
But the real big win comes once you realize that parse strategy doesn't have to be selected for the whole file. It can vary throughout the
file, and different strategies may be best in different regions. If you have the timeline of all the strategies laid out, then you can
try to find a shortest-path serpentine walk across the strategies.
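One way to picture the serpentine walk is as a small dynamic program. The sketch below is not the multi parse itself; it assumes you already have a per-block cost for each strategy and that changing strategy costs a fixed number of bits (`switch_cost`), both of which are simplifications I'm making up for illustration. In a real LZ coder the parse state carries across block boundaries, so the costs would not be independent like this.

```python
def best_strategy_walk(block_costs, switch_cost):
    """Cheapest walk across strategies.

    block_costs[t][s] = bits to code block t using strategy s (assumed given).
    switch_cost       = bits charged whenever the strategy changes (assumed).
    Returns (total_bits, path) where path[t] is the strategy used for block t.
    """
    n_strat = len(block_costs[0])
    best = list(block_costs[0])   # best[s] = cheapest cost ending in strategy s
    back = []                     # back[t][s] = strategy used on the previous block
    for costs in block_costs[1:]:
        cheapest_prev = min(range(n_strat), key=best.__getitem__)
        new_best, choice = [], []
        for s in range(n_strat):
            stay = best[s]
            jump = best[cheapest_prev] + switch_cost
            if stay <= jump:
                new_best.append(stay + costs[s])
                choice.append(s)
            else:
                new_best.append(jump + costs[s])
                choice.append(cheapest_prev)
        best = new_best
        back.append(choice)

    # Trace the winning path backwards.
    s = min(range(n_strat), key=best.__getitem__)
    total = best[s]
    path = [s]
    for choice in reversed(back):
        s = choice[s]
        path.append(s)
    path.reverse()
    return total, path
```

For example, `best_strategy_walk([[10, 12], [10, 12], [20, 11], [20, 11]], 3)` starts on strategy 0 and switches to strategy 1 for the last two blocks; with a very large switch cost it stays on one strategy for the whole file.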
Trying the obvious ways to extend LZ4 to large windows.
In my view, the fundamental thing about LZ4 is the strictly alternating sequence of literal-run-len, match-len.
That is, in LZ4 to send two matches in a row you must send a literal-run-len of 0 between them. This helps decoder
speed a bit because it removes the "is this token a match or literal?" branch and makes you just always go
literals-match-literals-match. (though you don't really get to avoid the branch, it's just moved into the looping
on the run len).
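For reference, here is a minimal sketch of a decoder for the standard LZ4 block layout, to show the alternation in code. It is written for clarity, not speed, it does no hardening against corrupt input beyond one offset check, and the function name is mine.

```python
def lz4_decode_block(src):
    """Decode one raw LZ4 block (no frame header) into bytes."""
    out = bytearray()
    pos = 0
    while True:
        token = src[pos]
        pos += 1

        # Literal run length: high 4 bits, extended by extra bytes when it is 15.
        lrl = token >> 4
        if lrl == 15:
            while True:
                b = src[pos]
                pos += 1
                lrl += b
                if b != 255:
                    break
        out += src[pos:pos + lrl]
        pos += lrl

        # The final sequence of a block carries literals only.
        if pos >= len(src):
            break

        # A match always follows: 2-byte little-endian offset, then the length.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        ml = token & 15
        if ml == 15:
            while True:
                b = src[pos]
                pos += 1
                ml += b
                if b != 255:
                    break
        ml += 4  # minimum match length is 4
        if offset == 0 or offset > len(out):
            raise ValueError("bad match offset")
        # Copy byte by byte so overlapping matches (offset < length) work.
        for _ in range(ml):
            out.append(out[-offset])
    return bytes(out)
```

Note there is no "match or literal?" test per token; the only branches are the run-length loops and the end-of-block check. Two matches in a row show up as a token whose literal nibble is 0.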
I relax the control word structure to not just be 4 bits of LRL and 4 bits of ML. But I do keep strict bit-wise division
of the control word, and strictly byte-by-byte literals and offsets. The obvious contenders are :
Some thoughts : going to large window helps a lot, but the exact scheme doesn't matter much.
Obviously you could do better by going to more complexity; arbitrary dividers instead of integer bit counts;
different min match lens for each # of offset bytes (eg. 3 for 1 byte, 4 for 2 bytes, 5 for 3 bytes, etc.).
Probably the biggest next step (which isn't a big speed hit) would be to add a "last offset" (aka "repeat match").
Last offset is a big LHS penalty on the shit platforms, but here our decoder is so simple that we have the potential
to write the whole thing in assembly and just keep the last offset in a register.
BTW I don't think I said this explicitly in the last post, but the great thing about having an optimal parser is
that it's much *easier* to write an optimal parser for an arbitrary coding scheme than it is to write a well tweaked
heuristic. Each time you change the coding, you would have to re-tune the heuristic, whereas with an optimal parser
you just plug in the code costs and let it run. Then once you have decided on a format, you can go back and try to
find a heuristic that produces a good parse without the slowness of the optimal parse.
In the same family as this post :
cbloom rants 06-14-11 - A simple allocator
On newer chips with POPCNT this should be reasonably fast (assuming query is common and insert is rare,
since insert requires a realloc/memmove or something similar). (and of course everything is a disaster on
platforms where variable shift is slow).
(BTW the GCC builtins are annoying. What they should have done is provide a way to test if the builtin
actually corresponds to a machine instruction, because if it doesn't I generally don't want their fallback
implementation, I'll do my own thank you.)
It does two major things differently that are important for me :
1. It fixes file names for you. The primary two fixes it does are :
1.A. De-mapping of substed or net used drives. eg. if you have c:\src as your client root, and you map s:\ to c:\src , perforce can't
handle any paths in s:\. p4util will take s:\ paths and convert them to c:\src.
1.B. Case fixing (p4 is so fucked up in this regard; the behavior of SVN software should not be affected by what my client and server
are running on; there should just be a setting for case sensitive or not, and both ways should work on all OS'es). Anyhoo,
p4util checks if the depot has a file name that is the same except for case and fixes it for you (so that the broken situation of
Windows client and UNIX server works passably).
eg. if your depot has "Stuff" in it and you p4 edit "stuff" it will fail, but p4util will succeed.
2. It can pick up its working dir from file names.
P4 has this nice "P4CONFIG" feature in which you can set up config files to have different servers for different dirs. I use it to run
a home p4 server on one source tree and use the RAD p4 server on another source tree.
That's all well and good, but p4.exe picks up the P4CONFIG from the working dir it is run in. That's dumb. When I do a
Here it is, at cbloom.com :
p4util.exe (zip)
I also just added a special command to create a summary of how your client differs from the depot. The main part of snapshot is to
do "p4 changes -s pending" and then describe all those changelists. (but just doing p4 opened and p4 diff works pretty well too).
TODO : make a full snapshot/recover that saves copies of files that differ so you can fully save state and restore to it, even with
files at weird sync points or different from depot.
ADDENDUM :
BTW before I wrote p4util I was using another program I wrote called "makedesubst".
makedesubst writes a temp bat called desubst.bat which you can use to cd to the non-substed version of
your path.
makedesubst is a very trivial program which just runs GetFinalPathNameByHandle on the current directory :
(it is possible to change the CWD of your parent process, but it's ugly, so I took the easy way out of
writing a bat)
0. Coding is pixel-by-pixel not byte-by-byte. (?)
This is a big change that is not at all clear in their explanation (I didn't pick up on it until I
wrote this whole thing and had to come back and add it as item 0). It appears to me that coding is
always pixel-by-pixel ; eg. they only send LZ matches for *whole pixels*. If this doesn't hurt compression,
it should help decode speed, since coding branches are taken less often.
(ADDENDUM : a related major but not-emphasized change is that they have 3 separate huffmans for R,G,B literals
(vs. PNG that just has a single huffman for all literals); note that on RGBA data, you could use LZMA as a back-end for a
PNG-alike and the mod-4 literal stats of LZMA would correspond to RGBA; on RGB data, you need a mod-3 which
no standard LZ does out of the box).
1. A large set of fixed spatial predictors.
They're basically the PNG predictors. They only allow the 1-ring causal neighbors. They do not include
the truly great 1-ring predictor "ClampedGrad". They do not include arbitrary linear-combo predictors
(which is very odd since they do include linear-combo color transforms). There's no ability to
do more complex predictors using the 2-ring like CALIC (hello 1996 is calling and has some better
tech for you). They do include the over-complex Select predictor (similar to PNG Paeth).
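For concreteness, here are two 1-ring predictors side by side. The Paeth function follows the PNG definition. The clamped gradient is my assumption of what "ClampedGrad" means here (the gradient N + W - NW clamped to the range of the three neighbors); the exact definition is not given in this text, so treat it as a sketch.

```python
def paeth(w, n, nw):
    """PNG Paeth predictor: pick the neighbor closest to w + n - nw."""
    p = w + n - nw
    dw, dn, dnw = abs(p - w), abs(p - n), abs(p - nw)
    if dw <= dn and dw <= dnw:
        return w
    if dn <= dnw:
        return n
    return nw

def clamped_grad(w, n, nw):
    """Gradient predictor clamped to the neighbor range (assumed definition)."""
    grad = w + n - nw
    lo = min(w, n, nw)
    hi = max(w, n, nw)
    return max(lo, min(hi, grad))
```

The difference is that Paeth always returns one of the three neighbors, while the clamped gradient can return an in-between value on smooth ramps.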
2. Division of image for coding purposes is in rectangular blocks instead of rows (like PNG).
This is surely a big win and allows the possibility of a very aggressive optimizing encoder. Obviously
2d images are better correlated in rectangular blocks than in long rows.
This is particularly a win in the modern world of very large images; if you have 16k pixels on a row
it's silly to divide your image into rows for compression purposes; much better to put 16k pixels in a
square block together.
(BTW webp retardation : 14 bit (16k) limit for width and height? Oh yeah, 640k of ram is all anyone needs. File sizes
will never be bigger than 32 bits. WTF WTF.
How many standards do you have to fuck up before you learn not to put in absolute limits that will be blown away in 5-10 years).
3. Arbitrary color transform.
They allow you to transmit the color transform as a series of lifting steps. I did a bunch of experiments
on this in the past (and believe I wrote some rants about it)
and found it to be not of much use on average, however in weird/rare cases it can
help enormously.
However, this feature, like many others in WebP-LL, makes a super efficient decoder implementation almost
impossible. eg. if you know the color transform is LOCO you can write a very nice ASM decoder for that.
BTW WebP-LL looks unfairly better than PNG here because LOCO-PNG is not in the normal PNG standard. (it's in MNG)
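A fixed transform is trivial to invert, which is the point about decoder speed. The sketch below assumes "LOCO" means the subtract-green transform {R-G, G, B-G} on 8-bit channels with wraparound; that reading and the function names are mine.

```python
def loco_forward(r, g, b):
    """Subtract green from red and blue, wrapping mod 256 (assumed 8-bit channels)."""
    return ((r - g) & 0xFF, g, (b - g) & 0xFF)

def loco_inverse(dr, g, db):
    """Exact inverse of loco_forward: add green back, wrapping mod 256."""
    return ((dr + g) & 0xFF, g, (db + g) & 0xFF)
```

With an arbitrary transmitted series of lifting steps, the decoder has to interpret the steps per image instead of running two hard-coded adds per pixel like this.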
4. 2d LZ77
The LZ77 seems to be just Zip for the most part (except with a 1M window - which makes low memory
decoding impossible). Instead of LZ77 offsets being linear in row-scanline order, they reserve some
special low offsets to be the local 2d neighborhood (eg. your upward neighbor). Again this looks like
it should be a big win for the modern world of long-row images (because in linear order, a long row causes
the upward location to be a large offset).
Certainly this should be a win for things like text where the same pattern is repeated vertically. (though
for text an even bigger win would be LZ-modulo-offsets; eg. figure out that the natural repeat is an 8x16 block
and send offsets that are multiples of that). (and hell for text you should just use JBIG or something from
20 years ago that's designed for text).
(aside : there was work ages ago about doing actual rectangular LZ for images. eg. instead of linear byte matches,
you transmit a width & height and paste in a rectangle, and instead of just linear offsets you code
a {dx,dy} offset. I don't believe it ever caught on due to the complexity, but it is theoretically appealing.)
Sadly they have not taken the opportunity to make an
LZ that's actually competitive with 1995 (LZX). At the very least any modern LZ should have "last offset".
Also for a codec that is pretty much PC-only there's very little reason to not use arithmetic coding.
(I say it's PC-only because of the 1MB window and other forms of non-locality; that will preclude use in embedded devices).
Variable min-match-len also seems to be a big win for image data; transmitting MML in the header would have been
nice.
5. Color Cache thingy
It's basically a simple cache table and you transmit
the index in the cache table (very old compression technique). Meh, curious how much of a win this is.
Very bad for load-hit-store platforms.
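Roughly, the idea looks like the sketch below. This is a generic hashed cache table, not WebP-LL's actual scheme: the table size, the hash multiplier and the symbol representation are all arbitrary choices for illustration.

```python
CACHE_BITS = 4                # assumed table size: 16 entries
HASH_MUL = 0x9E3779B1         # arbitrary odd multiplier picked for this sketch

def cache_index(pixel):
    """Multiplicative hash of a 32-bit pixel down to CACHE_BITS bits."""
    return ((pixel * HASH_MUL) & 0xFFFFFFFF) >> (32 - CACHE_BITS)

def color_cache_encode(pixels):
    """Emit ("cache", index) when the table already holds the pixel, else a literal."""
    table = [None] * (1 << CACHE_BITS)
    out = []
    for p in pixels:
        i = cache_index(p)
        if table[i] == p:
            out.append(("cache", i))
        else:
            out.append(("literal", p))
        table[i] = p
    return out

def color_cache_decode(symbols):
    """Mirror of the encoder: every decoded pixel is stored back into the table."""
    table = [None] * (1 << CACHE_BITS)
    out = []
    for kind, value in symbols:
        p = table[value] if kind == "cache" else value
        table[cache_index(p)] = p
        out.append(p)
    return out
```

The decoder stores into the table on every pixel and may read the same slot back on the very next symbol, which is where the load-hit-store complaint comes from.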
6. Multiple palettes
This is a trivial thing, but the ability to define local palettes on 2d regions is surely a big win
for certain types of graphic art web images. (eg. in mixed text/color you may have regions of the image
that are basically 2 bit) (allowing lossy detection of black&white could make this even better; lossy YCoCg
color transform is something I would have liked to see as well).
7. Pixel formats?
The docs are just terrible on this, I can't tell what they support at all (they say "see the VP8 docs" which
are even worse). Any modern format should support N channels and N bits per channel (at least N=8,16,32), as well as
floating point channels with encodings like LogInt.
Conclusion :
It seems reasonable and there's nothing massively bone-headed in it. (it's much easier working on lossless
(than lossy) because you can just measure the size).
For open formats, it's a very good idea to make them flexible (but not complex - there's a difference),
because it lets the kiddies go off and write optimizers and find clever tricks.
WebP-LL is pretty good in this regard but could have
been better (transmit linear combo spatial predictor for example).
The compression ratio is decently close to state of the art (for fast decoders).
If you're going to make yet another image format, it should be more forward-thinking, since formats tend
to either be ignored completely or stick around for 20+ years.
Further thoughts :
Image data can be roughly categorized into 3 types :
1. "Natural images" (continuous tone). This type of image is handled very well by linear predictors
(BCIF, TMW, GLICBAWLS, CALIC, MRP, etc. etc.).
2. "Text". What I mean by text is few-colors, and with exactly repeating shapes. eg. some graphical symbol
may appear several times in the image, just moved, not rotated or fuzzy. This type of image is handled well
by large shape context predictors (JBIG) or 2d-LZ.
3. "Graphic art". This type of image has some repeating patterns, maybe gradients or solid color shapes.
PNG and WebP-LL both handle this kind of thing reasonably well. The "natural image" compressors generally do
poorly on them.
A real modern network image codec should really have 3 modes for these types of data, selected in rectangular regions.
(or, like PAQ, simultaneously modeled and mixed).
ADDENDUM : it's important to distinguish between the compressor and the file format. I think webp-ll as a
compressor is actually pretty impressive; they have achieved very good space-speed for the complexity level.
As a file format it's not so great.
US Eastern King dimensions : 76 x 80 inches
76 x 80 !? You fuckers. That's like giving me a square hole and a peg that just barely doesn't fit and laughing
while I try to cram it in the hole.
Make it square god damn it. I suspect they do it on purpose so you can't rotate the bed 90 degrees, since that
would massively prolong the life of a mattress. (same reason why all modern mattresses are non-flippable).
If you could rotate AND flip a mattress, it could last a lifetime.
Anyhoo, this thought reminded me of an old rant I wrote and never posted (I almost never post the non-technical rants that I
write anymore), so here it is :
Modern Capitalism is primarily about how to maximize profit by manipulating consumers,
making shit products and somehow selling them for huge sums. Why in the world would a corporation ever
spend the masses of money to make actual good products and sell them honestly based on their merits, when
they can make much more profit by convincing you to buy some piece of crap that you don't actually need?
Why would I spend money improving my factory when I can spend much less on some clever lawyers and financiers
who can offshore my profits and deprive my workers of their contractually guaranteed pensions?
This post is a few little stories related to that.
It's almost impossible to find good tools any more. All of the major reputable "quality" brands have
switched their stage of life from "build up my brand name" to "milk my brand name for all it's worth".
DeWalt, Makita, etc. are all basically the same made in China crap. (Craftsman is of course the tool
brand that has fallen the farthest; in my grandfather's time it meant really solid quality good stuff,
and now it is just the bottom of the barrel crap; in fact what Craftsman does is take the generic no-brand
stuff from China and just stick a label on it, you can literally find the exact same products in different
colors on AliBaba). This is the inevitable cycle of life of brand names, they get built up for a while,
and at some point they become too valuable and the small profit that you get from actually making good
products and continuing that brand is not enough, a venture cap company will buy you up because there is
a much bigger profit opportunity by going into the "brand milking" phase.
When you're looking to buy something these days, having a well known reputable brand name is almost a
curse for the product, that is, it makes it *less* likely to be good. It's kind of amazing to me how
many semi-intelligent people will say things like "oh yeah, German cars are great quality". No they aren't,
they haven't been since the 80's. BMW, Merc, and Porsche are all well into the "brand milking" phase.
Sadly Honda seems like it may have entered that phase of life as well.
I recently bought a Simmons Beautyrest mattress for the new house, and made the mistake of not redoing
my research. 10 years or so ago the last time I bought a mattress I did a ton of research and found
that the Beautyrest was a perfectly acceptable quality-pricepoint compromise (though at that point it
was already not the bed that it used to be). In the time since then, Simmons has gone into total brand
milking mode.
(aside : of course mattresses have been an area of super-shysterism for a long time; there are lots of appalling
warning signs. Any product that doesn't have a standardized price is a major warning sign. Any product that's only
sold through dealers and not direct is a major warning sign. They have long touted their "warranties" which are completely
worthless because they always find some loop hole to deny it. But perhaps the biggest warning sign is the way they
are constantly changing the names so that you cannot cross-shop or find consistent information about "this model is good"
or "this model is bad").
Amusingly I even found this piece of evidence :
This is what modern companies really do. Maximize dealer profitability. Develop new product levels for up-selling opportunities. Improve our
products? Fuck that.
If you still don't believe me, yes it is in fact true that Simmons makes absolute garbage now :
(aside 1 : one thing that always strikes me as super bizarre is that the business pages will be full of things like
"Porsche turns record $25,000 average profit per car" or "Monster Cable launches new Beats headphone line to tap dumb sucker profit opportunity"
and in the business pages the article is very positive, it's like great play by them, we think they're cool for being really good at
ripping people off. Okay fine, I'm sure the industrialists congratulate each other on their schemes. But it's right out in the
open. How can you see these articles and ever buy a single product from those companies again? It's sort of like two pro poker
players talking to each other at the table saying out loud "yeah this moron fish across from us is about to spew off his whole stack so I'm trying
to play every hand with him"; how can you admit that you are a robber right to their faces and
get away with it?)
(aside 2 : it's puzzling/funny to me that many people are so brainwashed by the hype, they think they have options. They'll say : okay, I know Sealy and
Simmons are lying scammers that make absolute garbage, but the top end Beautyrest Black line is still good. Or, I know Sealy and Simmons
are crap, that's why I buy Stearns & Foster (FYI - they also make garbage, which you can tell if you do the research into the grade and
gauge of steel they use in their coils); that's exactly what they want you to do, by stepping up you have exactly fallen into their upsell trap,
my god).
(aside 3 : a similar line of bad thought is the people who know that these businesses are all scammers, but think that they can somehow beat the scammers.
No you can't, you can never beat them. They have loads of money, they control distribution, and they have advertising that has already
corrupted your brain and you don't even know it. I'll see people who are like "oh, it's over-priced and not that great quality, but I got it
for 50% off so I win!" ; no you didn't win you fucking moron, all mattresses are always 50% off; if they sell it to you, you lose. I see countless people
on car forums who think they really got one over on a dealer, or who think they got an extended warranty for a great price. Yeah, I'm sure your math is
better than the professional actuaries at the warranty company.)
I had another little experience recently that made me think of this. I finally got new glasses after many years of having the same ones
(I think I got them when I was 21, so it's been 13-14 years !?). I tried a few different shops, and wound up picking some at the Japanese
frame shop in Uwajimaya. When I was picking the type of glass I asked about getting the upgrades for scratch resistance and anti-glare
coating, which at any other glasses shop would be extras; the lady was like "all our lenses are scratch resistant and anti-glare,
it's not an option, of course your lenses should have that!". It blew me away, it was like total non-ripoff-capitalism (which of course
means they are missing a profit opportunity and some VC should buy them out and make them more cut-throat).
The thing I realized is that a standard fucking retarded economic theory gets this all wrong. An economist would say that it's actually
better for the consumer to have the options. By having various price points, the consumer can choose the level that maximizes utility
for them. But that is just not true.
In reality what making those things optional really does is to raise the price of glasses.
When the glasses shop switches from selling {lenses+antiglare+antiscratch} as one unit into selling them as separate units, the total
price {lenses}+{antiglare}+{antiscratch} does not stay the same, it rises, a lot. In theory a consumer should be able to see that and
go wait a sec, I'll buy from a shop that still sells it as a unit, but of course you can't actually do that in the real world.
One of the real world problems is that when business sectors figure out these new profit tricks, they very quickly all adopt them,
so you don't have a choice to keep buying the old way. Another problem is of course advertising; standard economic theory is basically
totally broken in the real world because it assumes the consumer demand is based on some rational concrete thing, when in fact demand is
usually manufactured by advertising; in this case, splitting the prices allows them to advertise the price of just the lenses, and they
hide the information from you that the lens price they are advertising is for shit lenses. There's another more subtle piece of
unequal information, which is that the glasses shop knows that you should never buy those crap lenses without the options, it just
isn't worth the savings, but consumers are not educated enough to know that and so can make that mistake.
(perhaps an even better example is cars - every time they take out a standard feature and make it optional, it's worse for consumers;
the base price never goes down by the amount they saved in removing that feature)
A final thing I've been thinking about for a while is the fact that sadly the video game industry has finally entered into the
"clever capitalism" phase. If you're a fucking asshole you might say that our industry has gotten "more mature" or we are
better at offering different purchasing methods to consumers. That's not true, in fact what we are doing is learning how to
cleverly rob people.
I remember a few years ago people in games all started talking about randomized rewards and incremental payments and all this
shit about the psychology of how to tickle a user's brain and get them to keep putting money in the slot, and I just felt sick
and knew I had to get out of this industry.
Subscription payment methods are almost always just methods of robbery. Whether it's a phone, a gym, or an MMO, the reason
why businessmen love subscription models is because it provides a great way to take much more money from the consumer than they
think they will spend on it.
Downloadable content sounds okay in theory, but in fact it is just a form of robbery based on unequal information (one of the
capitalist's favorite ripoffs). Jeff Minter has ranted about this far better than me, but the principal scheme is to not make
it clear up front to the player how much of the game they actually get for each payment.
Of course the most scummy current trend in games is free-to-play and in-game-purchases and the whole "milk the whale"
method.
cbloom rants 09-02-12 - Encoding Values in Bytes Part 1
1. I mentioned prefix byte length codes in the first post. This is a code in which you first send the # of bytes using a bit prefix code,
then the remainder. A small note on that :
Obviously you have unary :
and obviously you have the fixed two bit code :
2. The unary prefix code is always nice, because it can be decoded branchlessly with countlz (BSF/BSR). (see for example Elias Gamma or Exp-Golomb coding).
The decoder for mod-128 (with the flag bits at the head, and little endian) is :
The code above provides the maximum code space; it adds the base of each range and it doesn't waste a bit for the
top range. Obviously it can be faster (no MIN's) if you do waste the bit.
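As a rough illustration of a unary length prefix decoded with a leading-zero count, here is a sketch. It is not the decoder referred to above: this one is big-endian, does not add the base of each range (so it wastes code space), and uses Python's `int.bit_length` as a stand-in for a countlz instruction. The function names are hypothetical.

```python
def encode_prefixed(value):
    """n-1 zero bits, a 1 bit, then 7*n payload bits, packed into n bytes (n <= 8)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    n = 1
    while value >= 1 << (7 * n):
        n += 1
    if n > 8:
        raise ValueError("value needs more than 56 bits")
    raw = value | (1 << (7 * n))      # marker bit sits just above the payload
    return raw.to_bytes(n, "big")

def decode_prefixed(buf, pos=0):
    """Return (value, next_pos); the length comes from one leading-zero count."""
    first = buf[pos]
    if first == 0:
        raise ValueError("lengths above 8 bytes are not supported in this sketch")
    n = 9 - first.bit_length()        # leading zeros in the first byte, plus one
    raw = int.from_bytes(buf[pos:pos + n], "big")
    return raw & ((1 << (7 * n)) - 1), pos + n
```

The length is known after looking at the first byte only, so the payload can be fetched with one wide load and a mask instead of a byte-at-a-time loop.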
3. I was playing with LZ4 offsets and it's an obvious place to use EncodeMod.
Say you have an existing LZ codec that writes offsets in 2 bytes. You want to leave it alone except for
the offset IO.
You can add > 65536 offset support by changing the offset to EncodeMod.
One obvious scheme is : 0 + 15 bit offsets, 1 + 23 bit offsets. This is EncodeMod( 15 bits ) on the first word
followed by a raw byte (EncodeMod(0)).
But that is not ideal; you want more of the offsets in the 32768 to 65536 range to get into the first 2 bytes. Well,
just lower your mod. EncodeMod( 14 bits ) gives you up to (32768+16384) in the first 2 bytes, and 22 bits for longer offsets.
EncodeMod( 13 bits ) gives you up to 57344 (65536-8192) in the first 2 bytes, and 21 bits for long offsets.
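The arithmetic behind those numbers, as a quick sketch (the helper name is hypothetical): with a 16-bit first word and a mod of 2^k, values below 65536 - 2^k finish in the word, and longer offsets get k bits from the word plus 8 bits from the raw byte that follows.

```python
def word_mod_layout(mod_bits, word_bits=16, extra_bits=8):
    """Return (first_word_limit, long_offset_bits) for a pow2 mod on the first word."""
    mod = 1 << mod_bits
    first_word_limit = (1 << word_bits) - mod   # values below this fit in the word
    long_offset_bits = mod_bits + extra_bits    # payload bits once the raw byte is added
    return first_word_limit, long_offset_bits

for k in (15, 14, 13):
    print(k, word_mod_layout(k))
```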
The next thing you could do is be able to send a few low offsets in just 1 byte. The most important one is actually just the RLE
offset (offset=1). We do not want to use something like 1 bit + 7 bit payload, because that puts too many offsets in 1
byte and it takes away too much of our 2 byte range (we still want as many offsets as possible to be sent in 2 bytes).
We can get that in using EncodeMod. Instead of outputting 2 bytes at first, we do 1 byte with a
high mod - something like mod 254. This saves just a few values to fit in 1 byte and most of that byte range is
left over for the second byte.
So to send an LZ offset you might do : mod 255 , mod 64, mod 0 ; then the thresholds are
(I don't suggest you actually do this for LZ offsets; it's intended as an example of the kind of thinking that
goes into good code design).
4. File sizes are a common case for this, so I did a brute force optimize.
I gathered all the file sizes on my computer. (you may question whether my computer is a representative sample).
(BTW reminder to self : beware circular symbolic links in dir traversals these days; lots of old software that does dir
traversals can infinite loop these days)
Anyhoo I tried mod byte-by-byte encoding and optimized the parameters for a 3-step series. The optimal was :
More efficient is an encodemod which goes word-then-bytes , and uses only pow2 mods. The optimal for that is :
Compare to a simple scheme I've used in the past :
It's a very easy format to optimal parse, because the state space is small enough that you can walk the entire
dynamic programming table. That is, you just make a table which is :
(* see addendum at end)
For LZ4 the state is just the literal run len (there is no entropy coder; there is no "last offset"; and
there are no carried bits between coding events - the way a match is coded is totally independent of what precedes it).
I use 16 states. Whenever you code a literal, the state transition is just state++ , when you code a match the transition is always to state = 0.
There is a small approximation in my optimal parse; I don't keep individual states for literal run lens > 15.
That means I do measure the cost jump when you go from 14 to 15 literals (and have to output an extra byte),
but I don't measure the cost jump when you go from 15+254 to 15+255.
The optimal parse can be made very very slightly better by using 20 states or so (instead of 16). Then from state 15-20 you count the
cost of sending a literal to be exactly 1 byte (no extra cost in control words or literal run len). At state 20 you count the cost to be 1 byte + (1/234) , that is
you add in the amortized cost of the 1-1-1-1 code that will be used to send large literal run lengths. While this is better in theory,
on my test set I don't get any win from going to more than 16 states.
Without further ado, the numbers :
greedy, lazy, optimal are mine. They all use a suffix array for string searching, and thus always
find the longest possible match. Greedy just takes the longest match. Lazy considers a match at the next
position also and has a very simple heuristic for preferring it or not. Optimal is the big state table described
above.
Yann's lz4 -c2 is a lazy parse that seems to go 3 steps ahead with some funny heuristics that I can't quite follow; I see it definitely
considers the transition threshold of matchlen from 18 to 19, and also some other stuff. It uses MMC for string matching.
His heuristic parse is quite good; I actually suspect that most of the win of "optimal" over "lz4 -c2" is due to finding
better matches, not from making better parse decisions.
(Yann's lz4.exe seems to also add a 16 byte header to every file)
See also previous posts on LZ and optimal parsing :
cbloom rants 10-10-08 - 7 - On LZ Optimal Parsing
(*) ADDENDUM :
It turns out you can optimal parse LZ4 without keeping all the states, that is with just a single LZSS style backwards walk
and only a 1-wide dynamic programming table. There are several subtle things that make
this possible.
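For reference, a generic LZSS-style backwards walk with a 1-wide table looks roughly like the sketch below. This is only an illustration under a toy cost model (every literal costs 1 byte, every match costs 3 bytes); it ignores the literal run length and match length extension bytes of the real LZ4 format, which is exactly where the subtle things come in. The function name, the `max_len` input array and the cost constants are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

enum { MIN_MATCH = 4, LITERAL_COST = 1, MATCH_COST = 3 };

/* max_len[i] is the longest match available at position i (0 = none).
   On return choice[i] is 1 for "code a literal" or the match length to take
   at i, and the return value is the total cost in bytes under the toy model.
   cost[] must have room for n + 1 entries, choice[] for n. */
static uint32_t backward_parse(const uint32_t *max_len, size_t n,
                               uint32_t *cost, uint32_t *choice)
{
    cost[n] = 0;
    for (size_t i = n; i-- > 0; ) {
        /* default: code a literal here, then follow the best parse from i+1 */
        cost[i] = LITERAL_COST + cost[i + 1];
        choice[i] = 1;
        /* try every usable match length, not just the longest one */
        for (uint32_t len = MIN_MATCH; len <= max_len[i] && i + len <= n; len++) {
            uint32_t c = MATCH_COST + cost[i + len];
            if (c < cost[i]) {
                cost[i] = c;
                choice[i] = len;
            }
        }
    }
    return cost[0];
}
```

After the walk you read the parse forwards: start at position 0 and repeatedly advance by choice[i].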
See the comments on this post :
LZ-Bytewise conclusions
Key :
My focus this quarter was mainly just trying to be more careful with the little details that have made my
pots not that great in the past. Two things in particular I focused on were : 1. making better feet - you want
the feet to be well positioned for stability and aesthetics, and you also want them to be a bit rounded so they
are pleasant to touch and don't scratch tables (also tried to do better trimming around the feet and bottom
of the pot to make equal thickness all over); 2. making really smooth bowl interiors.
Smooth bowl interiors need to be thrown with a rib; they also need to be glazed with a glossy glaze that doesn't
do anything too weird. I also worked on eliminating the "potter's hump" so the interior curve of the bowl is
perfectly round. I developed a technique for this : first throw the bowl as usual; at this point you will typically
have a hump; now take a rib, support the rim of the bowl on the outside with your hand and gently push the rib
into the hump, work it up and down and gradually smoosh out the hump; this will basically move the hump from the
inside to the outside. Now if there's too much material on the outside you can throw it up from there; otherwise
just wait and you can trim it off.
To get cleaner glazing, I used more wax and took my time. The bottom right bowl for example was oxblood inside;
wait for it to dry; then wax the inside around the edges to protect it, then pour the outside upside down; avoids
the messy-looking overlap on the inside (but allows some nice overlap on the outside).
Throwing low, wide bowls is almost like throwing plates. There's a continuum from bowls to platters with lips to plates. The main
thing with throwing low bowls is to throw the inside to the shape you want and don't really worry about the outside; in fact, leave a
pretty big supporting lump around the outside, because you can't get the clay too horizontal and unsupported in throwing; fix the
outside in trimming.
Some issues I need to keep working on :
Don't be afraid of trimming too thin; take time, check the pot and if it's not done, keep going. There's no need
to make more mediocre pots.
Be aware of glaze thickness. The prepared glazes in the studio are often not quite right; too thin and too thick
can both fuck up your pot.
Take the time to get the pits out of shino; don't use it for bowl insides.
What we want to do should be obvious. We want to use some amount of range, like [0 , (T-1)] to send a value <= (T-1) immediately,
and use the rest of the range [T, base-1] to send some part of a value that's >= T. (eg. base = 256 for bytewise encoding)
With our background this should be pretty obvious so I'll jump straight to the code. The trick is in the case that it doesn't all fit in the current byte,
we'll use modulo to output part of the value in the space we're given.
First a recursive implementation which is more obviously symmetric between the encoder and decoder :
(In this implementation I put the "can send immediately" in the top part of the word, and I put the "here's part
of the value but there's more to follow" in the bottom part of the word, which is the opposite of the description above).
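A minimal sketch of such a recursive pair, assuming a single fixed mod in [1,255] for every byte and following the convention just described (top part of the byte = final value, bottom part = partial value with more to follow). The names `encode_mod` and `decode_mod` and the exact signatures are illustrative assumptions.

```c
#include <stdint.h>

/* Bytes in [mod, 255] carry a final value directly; bytes in [0, mod-1]
   carry the low part of a larger value and mean "more follows".
   Assumes 1 <= mod <= 255 and a large enough output buffer. */
static void encode_mod(uint8_t **to, uint32_t val, uint32_t mod)
{
    const uint32_t upper = 256u - mod;   /* how many values fit right now */
    if (val < upper) {
        *(*to)++ = (uint8_t)(mod + val);
    } else {
        val -= upper;                    /* skip the range already covered */
        *(*to)++ = (uint8_t)(val % mod); /* low part goes in this byte */
        encode_mod(to, val / mod, mod);  /* the rest goes in later bytes */
    }
}

/* Mirror image of encode_mod; advances *from past the bytes it reads. */
static uint32_t decode_mod(const uint8_t **from, uint32_t mod)
{
    const uint32_t upper = 256u - mod;
    const uint32_t byte = *(*from)++;
    if (byte >= mod)
        return byte - mod;
    return byte + upper + mod * decode_mod(from, mod);
}
```

The decoder undoes the encoder step by step: whatever the encoder subtracted (upper) or divided by (mod) is added or multiplied back in the same order.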
Now in all this code we're outputting a byte at a time, but of course you could start with 2 bytes,
or go 1-1-2 or whatever.
EncodeMod is a superset of some of the previous encodings we've looked at.
For example, mod = 1 is a 1-1-1-1 flag value encoding. That is, we reserve only one value in the range to indicate
more bytes follow.
mod = 128 is a unary flag-bit encoding (the 1 bit flag + 7 bit payload scheme).
(mod over 128 is valid but a weird thing to do (*!*). mod = 0 is send a byte. mod = 256 is send one byte from a multi-byte word;
eg. to send a uint16 in 16 bits is encodemod 256 then encodemod 0 ).
In between EncodeMod can generate intermediate encodings. Here is a sampling of the values at which
EncodeMod steps up to using the next # of bytes :
EncodeMod is flexible but it cannot generate all possible variable length integer encodings in bytes. As a counterexample,
if we try to reproduce the 2-flag-bits thresholds (0,0x40,0x4040,0x404040) the closest I can get with EncodeMod is to do
mod 192, then 170, then 127, which gives thresholds of (0,0x40,0x40C0,0x408040). One of the differences here is that EncodeMod
has no upper limit, it can encode arbitrarily large values (just like flag-value and unary-flag-bits encodings), but the fixed-number-of-flag bit
(or finite length prefix code) type encodings do have an upper limit, so are not reserving any space for "there are still bigger values".
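The step-up thresholds for a series of mods can be computed directly. The sketch below assumes bytewise output (base 256) and that each step subtracts the range already covered before dividing by the mod; the function name is hypothetical. Fed the mods 192, 170, 127 it reproduces the (0x40, 0x40C0, 0x408040) thresholds quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* thresholds[k] = smallest value that needs more than k+1 bytes when byte k
   is coded with mods[k] (0 <= mods[k] <= 256). */
static void mod_thresholds(const uint32_t *mods, size_t count,
                           uint64_t *thresholds)
{
    uint64_t base = 0;    /* values already covered by shorter codes */
    uint64_t scale = 1;   /* product of the mods of the earlier bytes */
    for (size_t k = 0; k < count; k++) {
        base += scale * (256u - mods[k]);
        thresholds[k] = base;
        scale *= mods[k];
    }
}
```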
Finally, some charts showing the number of bytes required to send a value with various mods.
The charts all show the same data, just at different zooms of the X axis. You should be able to see that
low mod is best for small values, high mod is best for large values, and they cross somewhere in between.
(the rightmost point is not the next byte step, it's a generated clip with the chart boundary; all the other
points indicate a step up in the number of bytes output).
ADDENDUM : *!* : actually encodemod with a mod over the midpoint is totally useful and reasonable.
If you're trying to put two different things in a byte (eg. in Part 1 we were putting either "the value fits in
this byte and here it is" or "the value does not fit in this byte and here's a portion of it"), you can
obviously use a flag bit.
eg. say you're trying to send either some number of apples (A's) or some number of bananas (B's) in a byte.
You could send :
But writing it in a value-checking manner makes it obvious that we don't need to put the divider in the middle.
We could just as easily do :
To first approximation, the division point should be the probability of A's vs. B's ; eg. midpoint = P(A)*256.
(this is not exactly right; it depends on the distribution of magnitudes of A's and B's ; plus there are some ugly
quantization boundary effects, but it is roughly right).
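In value-checking form the apples and bananas byte might look like the sketch below. The function names are hypothetical, and it assumes the count always fits on its side of the divider; a divider of 128 is the same thing as a flag bit plus 7 bits of payload.

```c
#include <stdint.h>

/* Bytes [0, divider) carry an A count, bytes [divider, 256) carry a B count.
   Assumes 0 < divider < 256 and that count fits in its side of the range. */
static uint8_t pack_ab(int is_b, uint32_t count, uint32_t divider)
{
    return (uint8_t)(is_b ? divider + count : count);
}

/* Returns the count and sets *is_b to tell which kind of thing it was. */
static uint32_t unpack_ab(uint8_t byte, uint32_t divider, int *is_b)
{
    *is_b = (byte >= divider);
    return *is_b ? byte - divider : byte;
}
```

Moving the divider to 200 gives A's 200 values and B's only 56, which is what you want if A's are much more common.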
Of course flag-value encodings that we saw last time are a special case of this with the divider shoved all
the way to one end.
Now this is obviously a kind of proto-arithmetic-coder. It doesn't carry left over range from one byte to the next,
but it's trying to divide the output range of the current byte based on the probability of the symbol.
In particular for one simple case, it does exactly what you expect :
Say you have a random source of 0's and 1's (eg. bits) with some probability P0 of 0 (and 1-P0 of 1).
You decide to encode them using RLE, that is you'll send a run length of either 0's or 1's. You want to put
the run length in a byte (or nibble, or whatever). You can do it by sending a word which is broken into
two ranges; one range for a run of 0's and another range for a run of 1's. Thusly :
Anyway, this encoder has just been a very concrete demonstration to get us to this statement :
For a random source with probability P0 of a 0, the optimal value of "divider" is divider = P0 * base.
That is, you split the range proportional to the probability of each type of value.
(it may be either the floor or ceil of that float). (and it may not be true very close to the edges,
eg. at divider=1 or base-2).
This was just an exercise to prove that our intuition is correct - the divider corresponds to the probability
of each type of thing.
I'm sure that this technique has been well known to practitioners of the art forever. Back when LZRW was written, the
tiniest last ounce of efficiency was needed, so the bit-packing approaches were preferred. With better CPU's now we
can afford to do a branch or table lookup, which lets us put the divider away from the middle. So far as I know this
technique was first used in a mainstream product in LZO (*). Recently Sean helped clarify my thinking about it.
In the next part we will bring together parts 1 and 2.
* = ADDENDUM : LZO uses this scheme to send a match or literal(s) in a byte. I don't recall the exact scheme in LZO,
but say for example you wanted to send either a literal run len or a match length in an LZ encoder in a 4 bit nibble.
You could send 1 bit for a match/literal flag, and then 3 bits of match len or literal run len. Instead, we now
see that's equal to a threshold of 8, and we can put that threshold somewhere else to minimize our output. eg. it
could be 10, then you have [0,9] for matches and [10,15] for literals.
In LZO the threshold is a constant, presumably tweaked over some test set, but other ideas are obvious. The threshold
could be transmitted in the header of the file and the encoder could optimize it for the file in question. The threshold
could also be dynamic, then you have an adaptive bytewise entropy coder of sorts. The adaptation schemes are obvious
because as we have shown the threshold acts very much like a probability, so you can use the standard schemes that
are known for binary arithmetic coders (eg. threshold += (threshold>>5) type of stuff, or rung-ladder, etc).
(rung-ladder is a table-driven state machine, eg. threshold = table[state].threshold ; state = table[state].after_match and
then optimize the tables somehow).
If I recall correctly, LZO also does some semi-arithmetic-codery stuff, in the sense that if not all of the range
is needed to flag what you are currently sending, that value range can be carried over into a later encoding.
Again, this is not the exact LZO scheme but you could imagine a decoder that works like this : grab a nibble;
a value in [0-11] is a match (with perhaps 1 bit of offset and the rest is match lengths), a value in [12-15]
flags a literal; now obviously you don't need 4 values to send a single bit flag, so we keep (nibble-12) and add
it to the next match length we decode.
ADDENDUM 2 : just to emphasize what I'm saying in the last paragraph because I think it's important.
When designing heuristic / bit-packing compression codes, you should *not* take the view point of "hey a literal flag takes 1 bit so I will allocate
one bit of space for it".
What you should do is find the probability of events; (eg. literal vs. match); and allocate space in the code based on probability. Then,
once you have done that, figure out a way to use the values in the code productively.
A common case is you want to send some value which has a large max (or perhaps infinite or unknown max) but has a
much lower mean. Obviously you don't want to just output all the bits/bytes needed for the max value all the time.
(to be quite concrete : one standard situation is transmitting a file size. A file size could take up to 64 bits but
most file sizes fit in 16 bits. Sending 64 bits all the time is silly, you want some kind of variable length encoding
that uses 2 bytes when possible, then more if necessary. Of course because of Kraft inequality you will no longer
be able to send a value of (2^64 -1) in 8 bytes, because you are compressing some of the values you must expand some)
Let's run through the standard ways to do this :
These reserve a flag value in the range to mean "more bytes follow". The most basic form is like :
There are various other options for flag-value encodings. I like to name them by the number of bytes sent in each step.
So the example above is "1-1-1-1-..." encoding. (it's the "unary" of byte encoding; that is, 1-1-1-1 encoding sends small values
in the fewest possible bytes, but it sends large values in the most possible bytes (of any prefix code)).
But there are other options : 1-2-4-8 (try to send just 1 byte, then try to send 2 bytes (flag value 65535), then try in 4),
1-1-2-3-4-5 , etc. (and you don't have to start with 1; for file sizes you might use 2-3-5-8 (though really for file sizes a
flag bit encoding is probably better)).
In fact while the simplicity of 1-1-1-1 encoding is appealing, it's almost never the best way to go, because in the rare chance
that you have some degenerate data with a very large value, it can expand a lot. (eg. to send 100,000 in 1-1-1-... encoding takes 393 bytes).
Even if 1-1-1- is theoretically optimal on your data, you should use 1-1-1-2-4 or something like that to limit the maximum output in bad
cases.
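As a concrete reference, a minimal sketch of the basic 1-1-1-1 flag-value form, assuming 255 is the reserved flag and that each flag byte means "add 255 and keep going" (that reading is consistent with the 393 bytes quoted for 100,000). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 1-1-1-1 flag-value encoding: a byte of 255 means "add 255, more follows".
   Returns the number of bytes written. */
static size_t put_flag_value(uint8_t *out, uint32_t val)
{
    size_t n = 0;
    while (val >= 255u) {
        out[n++] = 255u;
        val -= 255u;
    }
    out[n++] = (uint8_t)val;
    return n;
}

/* Reads a value written by put_flag_value; *n receives the bytes consumed. */
static uint32_t get_flag_value(const uint8_t *in, size_t *n)
{
    uint32_t val = 0;
    size_t i = 0;
    while (in[i] == 255u) {
        val += 255u;
        i++;
    }
    val += in[i++];
    *n = i;
    return val;
}
```

Values up to 254 cost one byte, and the cost then grows linearly with the value, which is why the capped variants like 1-1-1-2-4 are safer.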
Here's the number of bytes sent for some flag value encodings :
The next very common method is a flag bit (or bits) to indicate more bytes follow. This lets you send large values in fewer bytes
at the cost of sending small values in more bytes.
The flag bits can be sent with any prefix code (eg. Huffman, unary, etc) of course. Some common simple cases :
(aka unary flag bits, or 7-bit words). You do this encoding thusly :
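A minimal sketch of the 7-bit word scheme, assuming the flag is the top bit of each byte, the low 7 bits go out first, and no base is added per range (so it is not a maximum-code-space variant). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 7 payload bits per byte, low bits first; the top bit set means
   "more bytes follow". Returns the number of bytes written (at most 10). */
static size_t put_7bit(uint8_t *out, uint64_t val)
{
    size_t n = 0;
    while (val >= 128u) {
        out[n++] = (uint8_t)(0x80u | (val & 0x7Fu));
        val >>= 7;
    }
    out[n++] = (uint8_t)val;
    return n;
}

/* Reads a value written by put_7bit; *n receives the bytes consumed. */
static uint64_t get_7bit(const uint8_t *in, size_t *n)
{
    uint64_t val = 0;
    unsigned shift = 0;
    size_t i = 0;
    for (;;) {
        const uint8_t b = in[i++];
        val |= (uint64_t)(b & 0x7Fu) << shift;
        if (!(b & 0x80u))
            break;
        shift += 7;
    }
    *n = i;
    return val;
}
```

Note the Kraft trade-off in action: 100,000 goes in 3 bytes, but a full 64-bit value now takes 10 bytes instead of 8.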
Another common case is :
Now of course you can combine the two, and you can use the same kind of nomenclature as before, but
this time referring to the number of flag bits ; eg. 1 flag bit, then 1 more, then 2, then 3. But as noted
previously this is really just a prefix code for the number of bits.
Flag bit and flag value are really just two extremes of a related continuum, which we shall see in a later post.
It was really nice of them to include "/loc" (perhaps by accident?).
Man I miss the days when I was excited to explore a virtual world, and felt like nobody but me and the developer had seen it before,
and I drew my own maps and wrote down notes of stuff I wanted to come back to later.
I feel like the internet has ruined that. Sure you could choose not to look on the net, but that's like choosing to enter a boxing ring
with a blindfold on; everyone else has a massive massive advantage over you. Even in single player games it feels dumb to me; it's like
people who hike up mountains when there's a tram or road up it; for me it totally ruins the hike if I get to the top and find a parking lot
full of people who got there the easy way.
One of my favorites was the Amiga game "Dungeon Master" where you made spells by putting together these elemental symbols. You would find
scrolls and clues in the game with combos that worked, but you could also sort of deduce them (they were semi-logical), and when you figured one out it was like
an awesome eureka moment, and you wrote it down in your little scratch pad. Nowadays that kind of system can't even be in games at all
because everyone would just look up all the combos right away (some dumb devs do still try to use this
kind of system, but don't let you use a combo until you have unlocked it (by purchase or level up or whatever),
which ruins all the joy from it and makes it quite pointless).
We had amazing lettuces for many months. It made me feel great, very worth it. Broccoli was great (*), asian greens, arugula, herbs,
peas and beans are easy of course. We need taller trellis for beans next year. Alpine strawberries were good but got buggy. Regular
strawberries got mealy, not sure why (too much water?). It almost
works to plant peas and beans in the same bed, and as the peas are finishing, the beans grow up their trellis.
Corn worked surprisingly well, we should do it again but find a better variety to plant.
Haven't really figured out what to grow in late summer here, we've got a near-dormant veggie garden now, only
harvesting tomatoes at the moment.
* = Broccoli is one of my all time favorite things to grow myself, because I can pick it when it's still a baby. Baby broccoli is
sweet and tender; it's totally unrelated to the "baby broccoli" sold in grocery stores, which is a full grown different plant.
The leaves of baby broccoli are the best part. I also liked radish micro-greens; maybe I'll do a whole micro-green garden
some day.
Sophomoric people
try to do cost/benefit analyses about their home farm. If your time is worth $100/hour or more that
is just massively retarded (*). (I ranted before about
some moron doing
chicken coop cost-benefit analysis all wrong ). For people with jobs, the only relevant question is do
you enjoy the time you spend on it. If you enjoy it, it's worth it, even if you throw out all the produce.
If you don't enjoy it, then no value of produce can make up for the time cost. Yes there are borderline
cases but you all do cost-benefit analysis so massively wrong that you just shouldn't even try. (for example
Steve Solomon's (very good) book has a section on the best dollar-value crops to grow; you should just tear those pages
out. Maybe if you're quite poor and trying to make a subsistence living with gardening/farming as part
of that you might want to consider it).
(* = and FYI hopefully your time is worth a lot more than your salary; you can see that must be true because
your job selection is partially based on pleasure)
Next year we'll focus more on just the crops we liked that did well, and try to be a bit more systematic
about it to make pest control easier and ensure the long term health of the beds. We need more space between plants,
some fallow rotations, etc.
Solo backpacking snow lake. Snow Lake is actually a really nice overnight because the camp sites are superb.
The quality of camp site is so important to a backpack that I'd rather go some place kind of mediocre that has
nice sites (nice = great views right from the site, lots of separation and privacy, at least the feeling that you're
alone even if you aren't; no bugs or water running into the site) than go someplace amazing that has shitty sites.
(of course, going to camp-anywhere National Forest is a safer bet than going to stupid designated-spots-only
National Parks or no-fires Wildernesses ; boo). This was my first solo backpack and I enjoyed it a lot, very peaceful,
lots of quiet time for thinking about algorithms;
I may do that more in the future. I like the feeling of setting up camp and taking it down by myself, and if I want
to just eat cookies for dinner and skip the cooking I can. I looked into getting a PLB or sat-phone for safety
but they're crazy expensive so I don't think I will. The danger of hiking around here is grossly overstated; pretty much any
hike around here you will see 50 other people every day; and you're never more than 10 miles from a road.
I'm pretty sure I could break my leg and still drag myself 10 miles.
A few photos of home made fruit in home made bowls. I rarely take stupid food photos, but it just pleases me
so greatly to make my own things and consume them for myself. I like to write my own software for myself.
Interacting with other human beings is just a constant disappointment and frustration. I can see the appeal
of becoming a hermit and going off to live in the country and make my own house and my own furniture and grow
my own food, what sweet peace and relaxation that would be; no neighbors, no commuting, no making anything to
anyone else's specs or dealing with the disgusting retailers who want to sell me flat-pack Chinese garbage
and pretend it's "designer"; I hate purchasing experiences, they make me feel sick and dirty; I like making
things very much.
(* = I got fed up with riding around cars; you fucking assholes suck the fun out of what is one of the most
joyful things in my life; I'll be out riding Mercer Island and an hour into it I get into that real deep bliss
from the scenery and endorphins and low blood sugar and all that, and then some fucking selfish stupid asshole
will do some dumb fucking move like try to pass me while we go around a blind hairpin and it takes me right
back into that "I hate you all" negativity (**); I don't want to be in that mindset, so I figured I'd try some dirt
track riding to get away from the cars. (in my adulthood I have realized that there's no point in railing
against the awfulness of the world; now I just try to figure out ways to live a life that is parallel but separate
from the horribleness). I tried a few forest roads, but they're just too rutted and rocky
and pot-holed to be a pleasant bike ride. You can ride the logging roads that are closed to traffic, which is nice,
but those tend to be super vertical (I rode one that must have been 20% grade the whole way for about a quarter mile
before I threw in the towel; part of the problem is that on dirt your rear wheel slips a lot with that kind of grade,
so even if you get super low gears it's unpleasant). So far all I've found that's really great is this Iron Horse rails to trails
conversion over the 90). (** = for all the self-righteous asshole anti-bike readers I can add that riding Mercer Island on
the weekend when it's full of Freds (that's wannabe race cyclists) is at least as annoying as the cars; they
draft my ass without saying anything (you do not have permission to draft me you fucker! it's stupid and dangerous!
and if you're gonna do it at least give me a chance to draft you for a while, you inconsiderate selfish asshole,
you are stealing effort from me, you thief!),
ride two abreast on the narrow bridge path (***), and all manner of stupid shit).
(*** = actually the most annoying thing on the bike bridge are the fucking pedestrians who are just out for a stroll;
WTF is wrong with you people? Oh I know, I'll go for a walk on the shoulder of a fucking freeway, that will be delightful. I really
enjoy having gravel shot at my head at high speed. Hmm,
it's a narrow bike lane with fast two way traffic, let me just stand around in the middle of it. God it's a nightmare
on that damn bridge, anyone who goes on it other than out of absolute necessity has some serious brain damage) (****)
(**** = wow I didn't realize I had so much ranting pent up; I guess it particularly upsets me because the area I live
in with the Lake Washington Blvd lined with trees and the wonderful loop around Mercer Island has got to be one of the
best urban rides in all of America, and there's absolutely no reason that it couldn't be delightful for everyone on it,
but the sheer stupidity and dickishness of some people makes it so much worse than it needs to be. It's very sad.)
Backpacking is the only way I can stop
working these days and get some sleep (*), otherwise my mind keeps saying "wake up and get back on the computer".
(* = though not much).
It's here : lzp1.h
As noted previously, this is no longer a good way to do fast data compression; it's too branchy, it
doesn't take advantage of the large amount of RAM available, etc. This is mainly a historical
curiosity.
This header was just ripped out of the LZP code that has long been available
here, at cbloom.com .
One thing that has really struck me, being away from compression for a while and then coming back to it,
is that so many of the new ideas are just old ideas that weren't practical at the time. When you're
working on something, you have tons of ideas that you throw out because they take too much memory,
or they're too slow, or whatever. But in 5-10 years those ideas will be good, and 5-10 years is not
really that far away. If we were dick-heads we could have patented every idea that we ruled out because it
was "not practical" at the moment.
(that's not to disparage the new ideas; what most people who are not practitioners of the dark arts don't realize
is that the valuable contribution is almost always the tiny details that make something work; the general
idea was probably obvious to researchers 20 years ago, they just didn't pursue it for whatever reason)
1. It's tiny. I had no idea from the internet pictures that it was so small; it's the size of a Miata/MX-5.
2. Driver space is okay. I could sit upright and still had head clearance. Better than a 370Z (or Miata) for example,
which is too small for me. There are some weird lumps on the ceiling that might annoy me. Right leg space is
a little cramped; probably no problem for short drives but would get annoying on long drives.
3. The seats are a disaster. They have the extremely narrowly set bolsters just like the Speed 3. WTF are you
thinking, car makers of the world? Humans come in different sizes, you can't just put tight bolsters on a seat
and offer no other options.
4. The visibility is just okay. It's not ridiculously bad like the 370Z or many other modern cars, but it's also
not good like the 997 or WRX. The doors come up a lot higher than I like in a car, almost to my shoulders, but it's
becoming increasingly hard to buy any car with decent visibility. The pillars are rather thicker than I like and
the rear view, while again not totally ridiculous like some modern cars, is not great. (I'm pretty sure the future of cars is that they
have no windows at all and the only way to avoid crashing is with all the automatic radar systems).
5. It actually sounds pretty good; they've done a nice job of tweaking it so the little wheezy 4 banger has a
growl.
6. The power is pathetic. It feels like even less than 200 hp. I think the problem is there's just no torque
kick anywhere in the rev range. You floor it at 2000 rpm and the revs just slowly climb without any drama.
Even an old Honda Civic feels like it has more power because you get that nice VTEC lump; a jump in torque makes
a car feel like it's doing more than it really is, whereas a car with low torque and high revs might actually be
decently fast (it has decent hp) but feels like it's not doing much. Very disappointing.
7. The interior is serviceable. It felt a bit cheaper than even a WRX interior. One particular annoyance is
the gate you pull up to put it in reverse is plastic and felt really shitty, and that's something you have to
touch on a semi-regular basis. Most car companies have gotten pretty smart about making the touch surfaces
high quality and putting the cheapo shit where you don't touch it, so that's a silly fail.
As for the handling, I can't say anything useful because I can't push it enough on a test drive. The steering
felt nice, communicative, it certainly felt like there might be something special in the chassis (as all the
professional reviewers have been saying).
I am very excited about it still and I want it to be a great car (I love the idea of a small, light car that's
easy to toss and easy to catch, with narrow tires so you can have fun at reasonable speeds; I hate modern super
sports cars with stupid wide tires that don't start to feel exciting until over 100 mph), but I don't think I'll buy one with the current engine.
Last week at RAD we discovered that some of our Xenon libs were several megabytes bigger than they need to be
simply because we included "xtl.h" in a few files. What was happening was that xtl.h has a ton of inline
functions in it, and the compiler goes ahead and compiles all of them and sticks them in your OBJ even if
you don't use them (of course this is another problem with C that I'd like to see fixed - there's no need
to waste all that time compiling functions that I don't use - but that's another rant).
Of course when you make a lib, all it does is cram together your obj's. It doesn't strip the uncalled functions
(that's left to the linker, later on).
So DLL's are fucked and we wanted them to be "packages" ; and libs are also fucked and we want them to be packages too!
What I think a "package" in C should be :
This also lets you strip all un-referenced and un-exported symbols.
So that's all background material. What this post is about is this : it occurs to me that you can get most of this in
standard C by making your own "libpackager" tool.
libpackager should take a lib and output a lib. You have to also provide it a list of exports (or you could use some decorator
that it can parse to mark the exports). It can parse the obj's in the lib and find all the symbols and do its own "link" step
to eliminate unreferenced symbols, then remake the obj's without those symbols. So this gives us #2, which is a pretty big win.
You could also do #4 by having libpackager decorate all the internal symbol names that are neither import nor export. This is
roughly equivalent to if you had put all your internal symbols in some namespace.
You could even do #5 ; make libpackager go and grab the libs you reference, stuff their obj's into your lib also. Then your copy
of the lib and your references get name-decorated so they don't conflict with someone else. eg. say you want to make "oodle.lib"
as a package and you use "radmemset" from "radutil.lib" , packager could grab radutil.lib and stuff it in; then since radmemset is
now an internal reference, it gets changed to "oodlelib_radmemset". Now when you put "oodle.lib" and "bink.lib" both into your app,
if they used different versions of radmemset, they will not cross-link because the libs have been made into fake "packages".
(this step should be optional because sometimes you do want cross-links).
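The renaming rule itself is tiny. Here is a sketch, assuming the tool already has the export and import sets in hand; `decorate_symbol` and the `package + "_" + name` scheme are illustrative choices that happen to reproduce the "oodlelib_radmemset" example above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical decoration rule: exports and imports keep their names so the
// outside world can still link against them; every other symbol is internal
// and gets the package prefix.
std::string decorate_symbol(const std::string& name,
                            const std::string& package,
                            const std::set<std::string>& exports,
                            const std::set<std::string>& imports)
{
    assert(!package.empty());
    if (exports.count(name) || imports.count(name))
        return name;
    return package + "_" + name;
}
```

Once radutil.lib has been stuffed into the package, radmemset is no longer in the import set, so it falls through to the prefixed case.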
One annoying complication is that this doesn't work with the stdlib in a straightforward way. I would very much like to be able to
"package" all references to stdlib in this way, but stdlib is not just a normal lib, it also has some special cheating connections to
the crt0 startup code, so you can't just go and rename all its symbols to oodlelib_memset and such. Perhaps this could be resolved,
which would be nice to avoid all those garbage problems that arise because some lib was built for libc and some other lib was built for
libcmtd , etc.
I think this all is pretty straightforward (other than the stdlib issues). The only hard part is parsing the lib and obj formats on
every platform and build variant that you need to support.
(BTW a bit of web searching indicates that the gcc tools on some platforms (Mac) provide some of this; there seems to
be some special attributes for exports from libs and perhaps a lib tool that does dead strips; it's hard to follow
gcc docs)
The "ifdef way" is like :
1. You can tell if the user set STUFF or not. In the ifdef way, not setting it is one of the boolean values, so you can't
tell if the user made any intentional selection or not. Sometimes you want to ensure that something was selected explicitly because it's too dangerous to
fall back to a default automatically.
2. You can easily change the default value when STUFF is not set. You can just do #ifndef STUFF #define STUFF 0 or #ifndef STUFF #define STUFF 1.
To change the default with the ifdef way, you have to change the sense of the boolean (eg. instead of STUFF use NOTSTUFF) and then all your
builds break because they are setting STUFF instead of NOTSTUFF (and that breakage is totally fragile and non-detectable because of point #1).
3. There's no way to positively say "not STUFF" in the ifdef way. The way not stuff is set is by not passing anything to the command line,
but frequently it's hard to track down exactly how the command line is being set through the convoluted machinations of the IDE or make system.
If some other bad part of the build script has put a -DSTUFF on your command line, you can't easily undo that by just tacking something else
on the end of the command line.
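A small sketch of the "if way" covering points 1 and 2. STUFF is the toggle from the text; `DANGEROUS_STUFF` and `stuff_enabled` are made-up names for illustration.

```cpp
#include <cassert>

// "if way" toggle: STUFF is always defined, to either 0 or 1.
#ifndef STUFF
#define STUFF 0   // the default lives in exactly one place; change it to 1 to flip it
#endif

// Catch typos and junk values at compile time.
#if STUFF != 0 && STUFF != 1
#error "STUFF must be 0 or 1"
#endif

// For a (hypothetical) toggle with no safe default, refuse to build unless
// the user made an explicit choice:
//
//   #ifndef DANGEROUS_STUFF
//   #error "pass -DDANGEROUS_STUFF=0 or -DDANGEROUS_STUFF=1 explicitly"
//   #endif

inline bool stuff_enabled()
{
#if STUFF
    return true;
#else
    return false;
#endif
}
```

Point 3 then becomes a matter of saying STUFF=0 positively on the command line, assuming your compiler lets a later definition win over an earlier -DSTUFF.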
I think it's incontrovertible that the "if way" is just massively better, and everyone should use it all the time, and never use ifdef.
And yet I myself still use ifdef frequently. I'm not really sure why, I think it's just because I grew up using ifdef for toggles, and I'm
so used to seeing it in other people's code that it just comes out of my fingers naturally.
Anyway, I was thinking about this because I had some problems with some #defines at RAD, and I chased down the problem and cleaned it up,
and it seemed to me that it was a pretty good example of "cbloom style robustination". I've never met anyone who writes code quite like me
(some are thankful for that, I know); I try to write code that is hard to use wrong (but without adding crazy complexity or overhead the
way Herb Sutter style code does).
(disclaimer : this is not intended as a passive aggressive back-handed way of calling out some RAD coder; the RAD code in question is totally
standard style that you would see anywhere, and it wasn't broken, just hard for me to use)
Anyhoo, the code in question set up the function exporting for Oodle.h ; it was controlled by two #defines :
(* = actually it also works if you use -DMAKEORIMPORTLIB in case 4; specifying "dllimport" for functions is actually optional and only used
by the compiler as an optimization)
So anyway here's the robustinated version :
(and of course I instinctively used #ifdef for toggles when I wrote this instead of using #if)
I used to think that "robustinated" code was the One True Way to write code, and I wrote advocacy articles about it
and tried to educate others and so on. I basically have given up on that because it's too frustrating and tiring trying
to convince people about coding practices. And in my old age I'm more humble and no longer so sure that it is better (because
the code becomes longer, and short to-the-point code has inherent advantages; also robustination takes coder time which could
be spent on other things; lastly robustination also tends to make compiles slower which hurts rapid iteration).
But I do know it's the right way for *me* to write code. When I first came to RAD I tried very hard to write code the "RAD way"
so that the style would be consistent and so on. That was a huge mistake, it was very painful for me and made me write very bad
code and take much longer than I should have. Only after a few years in did I realize that to be productive I have to write code
my way. In particular I need the code to be very strongly self-checking.
Posts about the disaster of Unicode on Windows : (mainly with respect to old apps and/or console apps)
cbloom rants 06-14-08 - 3
Brief summary : correctly handling unicode (*) file names in a console app on windows is almost impossible. cblib
has some functions to do the best I believe you can do (MakeUnicodeNameFullMatch), but it's so complicated and
error prone that I suggest you should not try it. Also never use printf with wchars, it's badly broken; do your own conversion.
(* = actually the problem occurs even for non-unicode 8-bit character names (eg. any time the "A" "OEM" and "ConsoleCP"
encodings could be different); Windows console apps only work reliably on file names that are 7-bit ascii).
Fie! Fie I say to you!
One of the great tragedies of modern technical writing is that it has gotten so fucking standard and boring.
There is absolutely no reason for it. It does not make it clearer or easier to read, in fact it makes it
worse in every way - less clear, less fun, less human.
If you read actual great technical writing, it has humanity and humor. For me the absolute giants of technical
writing are Feynman and Einstein. There's lots of cleverness and little winks for the advanced reader and lots
of non-standard ways of writing things. If they followed Boring Technical Style Guide it would suck all
the personality and beauty from their writing. (I also like Isaac Asimov's technical writing and John Baez's).
I think computer writing has become particularly bad in the last 10 years or so. The books are all Microsoft-press-style
bullet point garbage. Blogs (eg. finger files) started out in the early days as sort of wonderful ramshackle things where each one
was different and reflected the writer's personality, but recently there has developed this standard "technical blog style"
that everyone follows.
Standard Technical Blog Style is very pedantic and condescending; the author acts like some expert from on high
(regardless of their actual expertise level). There are as many self-plugs as possible. I find it vomitacious.
A while ago someone wrote a blog series about floating point stuff; it really bothered me for various reasons. One was
that the topic has been covered many times in the past (by
Chris Lomont for example, also FS Acton, Kahan, Hecker, etc)
(if you actually want to learn about floating points,
Kahan's web page is a good place to start).
Another is that it just rolled
out the same old crap without actually talking about solutions (like "use epsilons for floating point compares" ; wow
that is super non-useful advice; tell me something real like how to make a robust BSP engine with floating point code).
But maybe the most bothersome thing about it all was that it was written in Standard Boring Dicky Technical Blog Style
when you can go out right now and buy a wonderful book by Forman S. Acton on floating point which is not only much much
more useful, but it's also written with cleverness and humanity. (Kahan's writing is also delightfully quirky).
It's kind of like taking a beautifully funky indie movie and remaking it as mainstream shlock; it's not only a waste
of time, but offensive to those of us who appreciate the aesthetic pleasure that is possible in technical writing.
Anyway, if you are considering doing some blogging or technical writing, here is my advice to you :
1. Make it informal. Use I. Use incomplete sentences. Tell stories about your personal experience with the
topic. When you put in some really complicated code or equations or whatever, explain what it means with
colloquial, conversational english.
2. Don't look at any reference material for a writing style to copy. Their style fucking sucks. Don't copy it.
If you listen to people telling you the "right way" to do things, you will be aspiring to mediocrity.
(err, ahem, but do listen to me).
3. Do not use an artificial impersonal voice to add "gravity" or a false air of expertise, it doesn't work.
Be humble; admit it
when you aren't sure about something. Also don't pad small ideas with more text to make them seem bigger.
There's nothing wrong with a one sentence idea.
90% of AltDev blogs should be one paragraph or less.
4. Do not waste time editing that could be spent making the content better. I bet you didn't actually run
fair comparison tests against competing methods. Go do that instead. I will not judge you by the purpleness
of your prose but rather by the content of your creation.
5. Stop writing blogs about shit that is already very well covered in books. Your writing should always be
from the perspective of your domain-specific experience on a topic. Don't write yet another introduction to
Quaternions, write about how you've used them differently or some application you've found that you think is
worth writing about. Real domain-specific experience is what makes your writing valuable.
6. Habeas Corpus. Show me the money. If you're writing about some new technique, provide code, provide an
exe, prove it. If I can't repro your results, then I don't believe you.
Document the tiny details and embarrassing hacks. The vast majority of technical writers don't write up
what they *actually* use. Instead they write up the idealized clean version of the algorithm that they
think is more elegant and more scientific. Often the most useful thing in your work are the hacks for
weird cases that didn't work right. People are usually too proud of the main idea; hey guess what, thousands
of people have had that idea before, but didn't think it was worth pursuing or didn't get the details quite
right; the value is usually in the tweak constants or the little fudgey bits that you figured out.
Download : cbhashtable.h at cbloom.com
cbhashtable was ripped out of cblib.
I recently improved the cblib version so that the hash table entries can be {hash,key,data} or {hash,data} (key==data) or {key,data} (no stored hash)
or just {data} (key==data and no stored hash). (or whatever you want I guess, though those are the only 4 that make sense I think).
cbhashtable is built on a vector to store its entries; you can use std::vector, or your own, or use cbvector.
See previous posts on hash tables :
cbloom rants 10-17-08 - 1
Commentary :
I'm pretty happy with the implementation of cbhashtable now, but setting it up is still a bit awkward. (using it
once it's set up is fine). You have to create an "ops" functor which knows how to make & detect the special empty &
deleted keys. I may try to improve this some day.
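To give a feel for what such an "ops" functor does, here is a sketch for plain int keys on top of a toy linear-probing set that stores just {data}. None of these names (`IntKeyOps`, `TinyHashSet`, `make_empty`, etc.) are cbhashtable's actual interface; they are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ops functor: reserves two key values as sentinels and knows
// how to make and detect them.
struct IntKeyOps {
    static int  make_empty()       { return -1; }
    static int  make_deleted()     { return -2; }
    static bool is_empty(int k)    { return k == -1; }
    static bool is_deleted(int k)  { return k == -2; }
    static std::size_t hash(int k) { return static_cast<std::size_t>(k) * 2654435761u; }
};

// Toy open-addressing set with linear probing; fixed capacity, no growth.
template <typename Ops>
class TinyHashSet {
public:
    explicit TinyHashSet(std::size_t capacity) : m_slots(capacity, Ops::make_empty())
    {
        assert(capacity > 0);
    }

    bool contains(int key) const { return find_slot(key) != m_slots.size(); }

    void insert(int key)
    {
        if (contains(key)) return;
        const std::size_t n = m_slots.size();
        std::size_t i = Ops::hash(key) % n;
        for (std::size_t probes = 0; probes < n; ++probes, i = (i + 1) % n) {
            if (Ops::is_empty(m_slots[i]) || Ops::is_deleted(m_slots[i])) {
                m_slots[i] = key;               // reuse the first free slot
                return;
            }
        }
        assert(!"table full; a real table would grow here");
    }

    void erase(int key)
    {
        const std::size_t i = find_slot(key);
        // Mark as deleted rather than empty so probe chains stay intact.
        if (i != m_slots.size()) m_slots[i] = Ops::make_deleted();
    }

private:
    // Returns the slot holding key, or m_slots.size() if it is absent.
    std::size_t find_slot(int key) const
    {
        assert(!Ops::is_empty(key) && !Ops::is_deleted(key));
        const std::size_t n = m_slots.size();
        std::size_t i = Ops::hash(key) % n;
        for (std::size_t probes = 0; probes < n; ++probes, i = (i + 1) % n) {
            if (Ops::is_empty(m_slots[i])) break;   // a never-used slot ends the chain
            if (m_slots[i] == key) return i;        // deleted slots are stepped over
        }
        return n;
    }

    std::vector<int> m_slots;
};
```

The awkward part is visible even in the toy: the client has to give up two key values and promise never to insert them.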
To make file names that are (almost) guaranteed to not cause any problems, you can use
Once you do all that, hey you actually get a fucking file name that you can use in the real world. While
*some* apps work with some of those characters and *some* command shells work with some of those characters,
if you have any of them, it's quite possible you will run somebody else's wild card traverse and it will do
unexpected things, which is very very very (very very very) bad. Particularly when deleting or renaming.
usage :
What I really crave now is a big Hollywood spectacle, but one that doesn't suck. This seems to
be the rarest type of movie of all. I want beautiful people, lavish production, big real sets,
lots of music, fast pace, feel good, but without being annoyingly stupid or cliche.
Bonus points are awarded for :
B1. Long tracking shots. Oh yeah, delicious. But not gratuitous "I'm showing off" or "look how clever I am referencing other movies".
B2. Montages, with music. Usually the best part of a movie. Pure fun without any of that retarded dialogue messing it up.
B3. Lots of color, rich saturation. ala Eyes Wide Shut or Almodovar.
B4. Steadicam, with movement. Man what a lovely feeling, the camera gliding slowly as it tracks the action.
B5. Sexiness. Not necessarily sex, but arousal. Love the standard scene where a girl changes her clothes while
carrying on a conversation.
B6. Sparkly lights (particularly red and blue). A nightclub scene is always welcome. Wet city streets at night is another great standard.
B7. Good music that's fun and funky , not big dramatic orchestral score (yawn) ; maybe the overall theme is just fun and playfulness.
Minus points for :
M1. Violence. A tiny bit is okay.
M2. Computer graphics. Jesus CG sucks so bad, it ruins movies. None of that garbage like "The Fall" that people
keep putting on lists of good looking movies. And heavy use of color filters is getting a bit old too (mainly on
high-end TV shows, it's becoming way too common and heavy handed; oo everything is slightly blue oo how cool, no it's not,
it's a fucking hack move to use that trick without intention or subtlety).
M3. Black and White. (see B3) Also a minus is being made before 1985 or so when film looked much worse.
M4. Horrible acting.
M5. Taking itself seriously. It should feel like the director is having a laugh.
M6. Superheroes or the military.
M7. Foreign language. Sometimes I don't mind subtitles, but when the whole point is the visual beauty,
having your eyes on the bottom of the screen ruins it.
Anyway, the obvious ones I can think of are :
1. Goodfellas
The whole Scorsese / De Palma period in the 80's-90's is obviously the model for this type of movie that
various modern movies copy (eg. "Boogie Nights" is a very intentional recreation of one of those movies
in a different profession; it works well at times, but is obviously way too long, and even in the best
tracking shots in Boogie Nights it feels very forced; you can see the actors are concentrating on hitting
their timing perfectly and aren't natural). (all the PT and Wes Anderson movies sort of fail in that they
just feel too uptight, too forced; it's like how Martha Stewart is actually a horrible hostess
because everything is just too careful, there's no "beautiful mistakes").
(unrelated to the first part)
I updated my Netflix -> CSV extractor so I can do this. Movies I have enjoyed in the last 12 months or so :
DVD :
It occurred to me that the whole C "struct" thing is really not what we want most of the time. 99% of the time I
want to say "put all these values in this thing, and I don't care what order". In particular, Mr. Compiler, you can
optimize the order to minimize size, or for speed, or whatever.
Now that's not a big deal with just a simple struct, but more generally it's massive.
What if I could say "bag" is some values with certain types and names. Then when I combine bags, you don't duplicate
things that are in both. And you can reinterpret bags to other bags as long as they have the needed values.
Obviously this doesn't work with C functions which are compiled once and get to know the offset to the data in that one compile.
You want a compiler that can compile all the logic but leave the variable references unspecified until it is bound to a
concrete type.
Now obviously templates sort of address this, but in a much uglier way.
Templates are worse in 2 ways. 1. they don't do any shared compilation, they leave all compilation to the last
minute when they know the concrete type they get to work on; conversely bags can do 99% of the compilation up front
and just need to specialize the variable addressing per usage. 2. they don't do any static type checking; that is,
when you write the template you can't specify in any kind of clean way "you can use this template on anything that
provides m_pos and m_color" (C++ concepts were supposed to sort of address this) ; bags provide this very nicely and let you write a template and error-check it before you
even apply it to anything.
Obviously templates are more powerful, but not useable in reality because of these problems; they delay compilation
too much for widespread use, it makes compilation too slow, and puts the errors in the wrong place (at the use site, not
the template itself). Compilation up front is great (I think weakly typed languages are ridiculous disasters).
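For comparison, here is roughly what the template approximation of a bag function looks like today. `Vec3`, `m_vel`, `move_by_velocity` and the two structs are made up for illustration.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Works on any type that happens to provide m_pos and m_vel. Nothing in the
// signature says so: the requirement is only discovered (and error-checked)
// when the template is instantiated on a concrete type.
template <typename T>
void move_by_velocity(T& thing, float dt)
{
    thing.m_pos.x += thing.m_vel.x * dt;
    thing.m_pos.y += thing.m_vel.y * dt;
    thing.m_pos.z += thing.m_vel.z * dt;
}

// Two unrelated types with different layouts but the same member names.
struct Particle { Vec3 m_pos; Vec3 m_vel; unsigned m_color; };
struct Missile  { float m_fuel; Vec3 m_vel; Vec3 m_pos; };
```

Each distinct type gets its own full compile of the function body, and passing a type without m_vel produces the error at the call site, which are exactly the two complaints above.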
But in theory bags can do even more. One annoying problem we have at RAD is that factored out functions can be
massively slower than macros. There are various reasons for this, but one is that if you have some variables
in registers and want to call a function on them, the compiler will almost always do a ton of work to move
those variables around that it doesn't need to do (either writing them back to memory or pushing them on the
stack, or just moving them to other registers).
With bags in theory you could do something like :
The win is that Particle_Move basically acts like a macro, but I get the type-safety and separate compilation
error check benefits of an inline function.
Similarly, I shouldn't have to write new code every time I have a bunch of data that I want to use as
SOA (structure of arrays) instead of AOS (array of structures). eg. if I have
Being a bit redundant : the classic way that simple C inheritance fails happens a lot with vertex types in
graphics. It goes something like this :
Now obviously inheritance and virtual functions solve a slightly different problem than
bags. They let you act on part of a type without knowing the concrete type. Bags have to be
used on the concrete type, just like templates.
I guess the fundamental problem with structs (and why they're wrong for some uses) is that
they are actually solving this slightly different problem.
Anyhoo.
(aside : it's very hard for me to tell if this is actually a rational decision on my part; I've been
working a lot trying to finally finish Oodle, and my brain gets into this weird state when I've been
crunching a lot where I decide that I "need to do" something, and then only much later do I realize
that it was only the mania of Work Mode that made me think I had to do that. Usually at some point
during a big Work Mode marathon I'll decide that I need to buy a bunch of random things to "improve"
some situation, like I'll go buy a bunch of AFCI breakers and replace the breakers in my panel, and
at the time I "had to do that", and weeks later I'm like "WTF did I do that for?". In Work Mode everything
becomes either a hassle that I don't want to deal with right now (such as "relaxing" or "socializing";
that shit is definitely priority level 3) or stuff that gets put on the todo list and just knocked out,
even if it's actually super unimportant (eg. writing this blog post). Anyhoo.)
The problem is that E46 M3's around here seem to be cursed.
(That's aside from the very big curse that BMW themselves put
on the car by making a rear subframe that literally rips right out of its attachment points
in the folded sheet metal body) (and ignoring the major curse they put on the engine but fixed by recall).
(this is way off topic, but folded sheet metal is one of the great fucking shitty garbage innovations
of the last 20-30 years; it used to be that something like a heater grate or a metal toolbox was *cast*, now they're
almost always folded, and folded metal fucking sucks balls, it's so much worse, it has weak points at the folds,
it has sharp edges, it's ugly, it just reeks of cheapness, we've really gone way backwards
on the quality of basic goods, but anyhoo).
So first of all, something like 75% or more of the M3's I see for sale are optioned in some retarded way that rules them
out for me. Lots of them are fucking automatics; no thank you. Then a whole god damn mess of them
are convertibles.
WTF is wrong with people who buy convertible sedans. If you want a fucking convertible, BMW makes a great
one, and you can even get it with the same exact damn engine as the E46 M3, it's called an M Roadster and
it's a perfectly good car. Don't take the roof off a sedan. A convertible M3 is in the same family as this :
(in the sense that they contribute to the ugliness of the modern world; concrete block walls, plant beds
that are just fields of bark mulch, and convertibles that should never have been convertibles, all vomit-inducing).
Then you get all the bizarro interior colors; white leather, red leather, blue leather !? WTF are you thinking?
You're not Liberace, no your "cinnamon" interior does not look cool, you are not remotely pulling it off.
Leather can only be black or brown unless you are a glam rocker or a stripper.
Okay, so now that we're reduced to the set of cars I could possibly buy - then the E46 M3 Curse takes over.
The Curse says that every E46 M3 around here is either wrecked, or has been "improved" by some suburban Wigger
who has "dropped" or "blakt out" or "stanced" or "hella-flushed" the car, thereby ruining it.
It's kind of funny, so many have been wrecked; clearly RWD + Seattle rain is a great recipe for smashing up
M3's. The ads just try to slip it in like it's no big deal. Most of them start with "great condition!"
and end with "oh, BTW, it was totalled". For example (I haven't even been saving the most hilarious examples
of these ads, these are just the ones that happen to be up today) :
So a totally insane percentage of the E46 M3's around here have been totalled.
And then the remainder
are like this :
Which aside from meaning that I have to spend a bunch of money undoing their stupidity (putting the suspension
and wheels and windows all back to stock), the aftermarket
springs make the rear-subframe-rips-out-of-the-car problem much more likely, so you just can't buy these cars.
One of the things that has always slightly turned me off to E46 M3's has been the fact that they are so
popular with the retarded douchebag crowd. Sadly the douchebags don't only ruin the image and mental
association with a car, they also ruin the cars. So sad.
At least this d-bag (also posted recently to Seattle Craigslist) had the
decency to go all the way :
I applaud you sir.
"futures" implemented in C++98 with Oodle :
This creates an async chain to read a file, compress it, write it, then read it back in, decompress it, and write out the decompressed bits.
Futures can take either immediates as arguments or other futures. If they take futures as arguments,
they enqueue themselves to run when their argument is ready (using the forward dependency system). Dependencies are all automatic based on
function arguments; it occurs to me that this is rather like the way CPU's do scheduling for out-of-order-processing.
(in contrast to idea #1 in Two Alternative Oodles ,
here we do not get the full async graph in advance, it's given to us as we get commands, that is, we're expected to start running things immediately
when we get the command, and we don't get to know what comes next; but, just like in CPU's, the command submission normally runs slightly
ahead of execution (unless our pipeline is totally empty), in which case we have a little bit of time gap)
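To make the "dependencies come from the arguments" idea concrete, here is a single-threaded toy in C++11. It is not the Oodle implementation: `Future`, `when_ready` and `run_after` are hypothetical names, and a real system would hand the ready function to a worker thread instead of running it inline.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy future: a value slot plus the list of things waiting on it
// (the forward dependencies).
template <typename T>
struct Future {
    bool ready;
    T value;
    std::vector<std::function<void(const T&)> > waiters;

    Future() : ready(false), value() {}

    void set(const T& v)
    {
        assert(!ready);
        value = v;
        ready = true;
        std::vector<std::function<void(const T&)> > run;
        run.swap(waiters);                       // fire each waiter exactly once
        for (std::size_t i = 0; i < run.size(); ++i) run[i](value);
    }

    void when_ready(std::function<void(const T&)> fn)
    {
        if (ready) fn(value);
        else waiters.push_back(std::move(fn));
    }
};

// Run fn on the value of arg once it exists, and return a future for the
// result. The return type R has to be spelled out by the caller.
template <typename R, typename A, typename F>
std::shared_ptr<Future<R> > run_after(F fn, std::shared_ptr<Future<A> > arg)
{
    std::shared_ptr<Future<R> > out = std::make_shared<Future<R> >();
    arg->when_ready([out, fn](const A& a) { out->set(fn(a)); });
    return out;
}
```

Chaining `run_after` calls builds the dependency chain as a side effect of passing one future into the next; nothing runs until the first future in the chain gets its value.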
Functions called by the future can either return values or return futures. (eg, in the code above, "oodle_readfile" could just return an
OodleBufferRC directly, or it could return a future to one). If a function in a future returns a future, then the returned future replaces
the original, and doesn't return to the outer scope until the chain of futures returns a non-future value. That is, this is a way of doing
coroutine yields basically; when you want to yield, you instead return a future to the remaining work. (this is like the lambda-style coroutine
yield that we touched on earlier). (* - see example at end)
future of course has a wait() method that blocks and returns its value. As long as you are passing futures to other futures, you never have
to wait.
You can implement your own wait_all thusly :
A few annoying niggles due to use of old C++ :
1. I don't have lambdas so you actually have to define a function body every time you want to run something as a future.
2. I can't deduce the return type of a function, so you have to explicitly specify it when you call start_future.
3. I don't have variadic templates so I have to specifically make versions of start_future<> for 0 args, 1 arg, etc. bleh.
(though variadic templates are so ugly that I might choose to do it this way anyway).
Otherwise not bad. (well, that is, the client usage is not bad; like most C++ the implementation is scary as shit; also doing
this type of stuff in C++ is very heavy on the mallocs (because you have to convert things into different types, and the way you
do that is by new'ing something of the desired type), if you are a sane and reasonable person that should not bother you, but
I know a lot of people are in fact still bothered by mallocs).
In order to automatically depend on a previous future, you need to take its return value as one of your input arguments.
There's also a method to
manually add dependencies on things that aren't input args. Another option is to carry over a dependency through a binding function
which depends on one type and returns another, but that kind of C++ is not to my liking. (**)
To really use this kind of system nicely, you should make functions whose return value is a compound type (eg. a struct) that contains all its
effects. So, for example oodle_writefile returns the name of the file written, because it modifies that object; if you had a function that
modified a game object, like say an Actor *, then its return value should include that Actor *, so that you can use that to set up
dependency chains. (in real code, oodle_writefile should really return a struct containing the file name and also an error code).
* : example of returning futures to continue the async job :
This is a necessary building block, it lets you compose operations, but it's an ugly way to write coroutine-style code.
What it is good for is creating more complex functions from simpler functions, like :
I believe this "future" is much better than the real C++0x std::future, which seems to be missing a lot of features.
** : example of using a binding function to carry over dependencies :
In particular, you can give it a list of several videos on the command line, and it plays all of them. It plays
them frame-exact by stepping through their frames one by one. Then you can pause and single step forward and back,
and you can do split-screen or frame-toggles.
(I've always found that comparing image or video compressors is very misleading if you just look at the output; you
have to look at them side-by-side with the originals, or even better do full-frame ping-ponging. For example, x264
generally looks great, but when you do full-frame ping-pongs you can see that they really fuck up the overall luma
level and completely change the colors quite a bit (this is largely due to the very shitty YUV space that is standard
for video, not anything in x264)).
Anyhoo, videoplayer does some nice things like prefetch and cache lots of frames (it's retarded that the mainstream video players
don't do this; I have gigs of ram you fuckers, prefetch some god damn video so that I can play over a network without
hitching; also keep some previous frames around so I can pause and step backwards a little without seeking!).
download : videoplayer.zip (463k at cbloom.com)
(videoplayer needs radutil.dll which is in RAD Video Tools ;
I put it on delay-load so you only need it if you want to load a real video (not just images))
Anyhoo, I haven't touched it in a while cuz I'm off video for now. But "videoplayer" can also be pointed at a dir
and it treats the images in the dir as frames of a video. I did this originally to load frame-dumps of videos that I couldn't
play. For example I would use MPlayer to spit out frames with :
But I realized the other day that I could use dir-loading to just show image slideshows too, if I use the option to manually set the frame rate
to something really low like 0.1 fps. Which brings us around to the title of this post :
I discovered COMMONOPOLY a while ago; I think it's super beautiful ; the guy
chooses images well and there's something about that horizontal reflection that really does some magic on the brain. If you full-screen it
and stare at the middle and let your eyes defocus a bit, you can get the same image going in each eye and it's like mmm yeah good.
But obviously viewing it on the web sucks balls. So what you do is download all the images with DownloadThemAll, put them in a
dir and then point videoplayer at the dir thusly :
MS could have very easily put DLL load-on-first use into the thunk libs they make.
That would have been nice. (ADDENDUM : I guess they did!)
Anyhoo, you can do it yourself, and it looks like this :
We need CALL_IMPORT from before :
Lastly we need to define all the fp_ func pointers with the right signature. This is particularly
easy because png.h wraps its function externs with macros (nice, thanks guys),
so we can just abuse those macros. png.h contains things like :
So anyhoo there you go, libpng use with delay-load.
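Stripped of libpng, the load-on-first-use idea is just a function pointer that starts out aimed at a thunk which patches it. A portable sketch follows; `resolve_symbol` is a hypothetical stand-in for the LoadLibrary + GetProcAddress pair, and `real_add` stands in for a function that would really live in the DLL.

```cpp
#include <cassert>
#include <cstring>

typedef void (*generic_fn)();          // untyped function pointer, in the spirit of FARPROC
typedef int  (*add_fn)(int, int);

// Stand-in for the real export that would live in the DLL.
static int real_add(int a, int b) { return a + b; }

static int g_resolve_count = 0;        // how many times we "loaded" the symbol

// Hypothetical stand-in for LoadLibrary + GetProcAddress.
static generic_fn resolve_symbol(const char* name)
{
    ++g_resolve_count;
    if (std::strcmp(name, "add") == 0)
        return reinterpret_cast<generic_fn>(&real_add);
    return 0;
}

static int add_thunk(int a, int b);    // first-call stub, defined below

// All calls go through this pointer; it starts out aimed at the thunk.
static add_fn fp_add = add_thunk;

static int add_thunk(int a, int b)
{
    generic_fn p = resolve_symbol("add");
    assert(p != 0);
    fp_add = reinterpret_cast<add_fn>(p);   // patch the pointer so later calls skip the thunk
    return fp_add(a, b);
}

inline int call_add(int a, int b) { return fp_add(a, b); }
```

The first call pays for the lookup; every call after that is a plain indirect call. If nobody ever calls it, the DLL is never touched.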
BTW this also suggests a way that you can make your DLL very easy to use with delay-load and
manual imports. What you
should do is provide a header with only your function protos in it, and wrap them with a macro like :
Unfortunately you can't do a #define inside a #define or this could be used to alias the names transparently
with something like
(ADDENDUM : or, you know, just use the /DELAYLOAD option)
Caveats : this is ripped out of cblib (with dependencies removed), so it's rather ugly. This is not totally
STL compatible. It works well enough for my purposes. Feel free to improve it.
Download :
cbvector.h at cbloom.com
cbvector.h intentionally doesn't include anything. You may need to include stddef.h and/or new.h before cbvector.h.
WARNING : cbvector.h uses #defines to configure various aspects of the implementation. If you set those #defines
to different things in different code files, then you must ensure that one of the following is true :
1. the CB_VECTOR class name is different in each case, or 2. the linkage of all funcs affected is "static", or
3. the entire cbvector is in an anonymous namespace (which is the default). If you're not careful about this you can
get super bizarro bugs when the linker merges functions that aren't actually the same.
I was using ipconfig /release and /renew to do it, but that has a few problems : 1. it's very slow
(not a big deal since the toggle is rare), and 2. it also kills my VPN to work.
Jeff suggested a better way is to turn DNS off and on. I found netsh can do that pretty easily.
I'd never used netsh before, it's pretty nice. The commands are :
For machines you want to still access in "internet off" mode, you can of course just use their explicit IP,
or slightly nicer you can add them to your "hosts" file.
(aside : another cool feature of netsh is that you can save & restore your entire network config if you
want to temporarily mess things up; just use "netsh dump" to save it and "netsh exec" to restore it).
1. The graph-forwarding automated parallelism system.
The idea here is that you make all your async operations be like flow chart widgets, they have "in" and "out" channels, and then you can
draw links and hook up ins to outs.
This creates dependencies automatically, each op depends on everything that feeds its input channels.
So for example you might do :
There are various ways you could set up the async chains, obviously GUI tools where you drag lines around are popular for this sort of
thing, but I think that's a terrible way to go. You could just have a text markup, or some creation functions that you call to build
up the graph.
Another interesting option is to use the "just run the code" method. That is, you make proxy classes for all the variable types and
do-nothing functions with the names of the ops (ReadFile, etc.); then you just run this fake imperative code, and all it does is record the calls
and the arguments, and uses that to build the graph. That's easy enough to do for code without branches, but that's sort of a trivial
case and I'm not sure how to make it work with branches.
In fact in general this type of thing sucks bad for code with loops or branches.
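To make the "just run the code" idea concrete, here is a minimal sketch of the branch-free case. Everything in it is hypothetical (the `Proxy` and `OpNode` types, the global `g_graph`, the file names); it is not Oodle code. The op functions do no work at all; they only record a node and which earlier nodes feed it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One recorded async op; 'inputs' are indices of the ops it depends on.
struct OpNode {
    std::string name;
    std::vector<int> inputs;
};

// The graph being recorded (hypothetical; a real system would scope this).
static std::vector<OpNode> g_graph;

// Proxy value: carries no data, only the index of the op that produces it.
struct Proxy {
    int producer;
};

static Proxy RecordOp(const std::string& name, const std::vector<int>& inputs) {
    g_graph.push_back(OpNode{name, inputs});
    return Proxy{static_cast<int>(g_graph.size()) - 1};
}

// Do-nothing stand-ins that carry the names of the real ops.
Proxy ReadFile(const char* /*fileName*/) {
    return RecordOp("ReadFile", {});
}
Proxy LZCompress(Proxy src) {
    return RecordOp("LZCompress", {src.producer});
}
Proxy WriteFile(const char* /*fileName*/, Proxy src) {
    return RecordOp("WriteFile", {src.producer});
}

// The fake imperative code: running it only records the graph.
void BuildExampleGraph() {
    g_graph.clear();
    Proxy raw  = ReadFile("in.dat");
    Proxy comp = LZCompress(raw);
    WriteFile("out.lz", comp);
}
```

After `BuildExampleGraph()` the scheduler has three nodes and knows that the compress depends on the read and the write depends on the compress, before any of them has started.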
Anyway, I think that this method is basically a terrible idea, except for one thing : creating the graph of the entire async operation
before doing it can be a huge performance win. It allows you to "see the future" in terms of what the client wants to do, and thus
make better scheduling decisions to maximize utilization of your available computation resources and disk at all times.
In the simplest case, if the client calls a huge Read and then a huge LZCompress after that, that's a dumb non-parallel way to do things,
but in the normal imperative Oodle I can't do anything about it, because at the time I get the Read I don't know what's coming after it.
If you gave me the graph, I could go, oh hey during the Read I'm not using the CPU at all, and during the big LZCompress I'm not using
the disk, so let me break those into smaller pieces and overlap them. Obviously you can schedule IO's around the timeline so that they
try to be issued early enough so their results are back when needed. (though even with the full async graph you can't really schedule right
unless you know how long the cpu operations are going to take).
There are even more subtle low level ways that not knowing the future gets me. In the whole worker thread system, there are crucial
decisions like "should I wake up a new worker thread to take this work item, or wait for one of the existing worker threads to take it?"
or "should I signal the worker threads as I make each item, or wait until I make them all?" ; or even just deciding which worker thread
should take a task to maximize cache coherence. You cannot possibly get these decisions right without knowing the future.
Anyhoo, I don't think the advantage outweighs the horribleness of writing code this way, so on to the next one :
2. The coroutine auto-yielding system.
What if your file IO was always done from a coroutine, and instead of blocking when it didn't have the bytes needed, it
just yielded the coroutine? That would give you fully async file IO (in the sense that you never block a thread just waiting on IO),
and you could write code just like plain sync IO.
eg. you would just write :
The real practical problem is that you just can't make this nice in the hacky switch-based C coroutine method that I use.
You really need a language that supports coroutines natively.
You can cook up a clunky way of doing this with the C coroutines, something like :
But, because of the fact that we use switches and returns, we can't use any stack variables in the arguments to cofread ; eg :
Basically with the hacky C coroutine method you just can't do funny business like this where you hide the control flow;
you have to make the yield points very explicit because they are points where you lose all your stack variables and must recreate them.
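As a rough illustration of what "explicit yield points" means with the switch-and-return style, here is a small sketch. It is not the cofread code from this post; `FakeAsyncFile`, `ReadTwoCo` and the 4-byte reads are made up for the example. The point is that anything which must survive a yield lives in the coroutine struct, because every yield is a plain `return`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for async IO: bytes "arrive" as 'available' grows.
struct FakeAsyncFile {
    std::string data;
    size_t available = 0;  // bytes that have arrived so far
    size_t pos = 0;        // read cursor

    // Non-blocking read: fails without consuming if the bytes are not in yet.
    bool TryRead(char* out, size_t n) {
        if (pos + n > available) return false;
        std::memcpy(out, data.data() + pos, n);
        pos += n;
        return true;
    }
};

enum CoStatus { CO_YIELDED, CO_DONE };

// Everything that must survive a yield lives here, not on the stack.
struct ReadTwoCo {
    int state = 0;
    FakeAsyncFile* file = nullptr;
    char header[4] = {};
    char body[4] = {};
};

// Each call resumes at the saved state; a yield is just a return.
CoStatus ReadTwoCo_Run(ReadTwoCo& co) {
    for (;;) {
        switch (co.state) {
        case 0:
            if (!co.file->TryRead(co.header, 4)) return CO_YIELDED;
            co.state = 1;
            break;
        case 1:
            if (!co.file->TryRead(co.body, 4)) return CO_YIELDED;
            co.state = 2;
            break;
        default:
            return CO_DONE;
        }
    }
}
```

The caller just keeps calling `ReadTwoCo_Run` whenever more bytes might be in, until it returns `CO_DONE`.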
Perhaps a larger issue is that if you really were going to go with the full coroutine auto-yielding system, you'd want to be able
to yield from inside function calls, not just at the root level of the coroutine. eg. you'd like to call functions that might do file IO
or fire off worker tasks, and you want them to be able to yield too. That's not possible unless you have full stack-saving coroutines.
ADDENDUM for clarity :
It's totally trivial to fix the lack of stack saving in a limited way. All I have to do is reserve a few slots in the coroutine struct
that cofread can use to store its variables. So cofread becomes :
But it's *not* what I really want for this proposal, which is a full transparent system that a client can build their IO on. The problem is that
cofread can only be called at the "root" level of a coroutine. That is, because the "yield" is not a true language yield that preserves
function call stack, it must be in the base coroutine function.
eg. you can do :
To be super clear and redundant again - Oodle of course does support and extensively uses coroutine IO, but it is for small
tasks that I want to have maximum performance (like, eg. read and decompress a Package), where the limitation of
having to write all your yielding code within one function is okay. The idea of proposal #2 is to make
a system that is visible to the client and totally transparent, which they could use to write all their game IO.
(ASIDE : there is a way to do this in C++ in theory (but not in practice). What you do is do all your yielding at the
coroutine root level still, either using the switch method or the lambda method (doesn't really matter). To do yields
inside function calls, what you do is have your IO routines throw a DataNotReady exception. Then at the coroutine root
level you catch that exception and yield. When you resume from the yield, you retry the function call and should make
it further this time (but might throw again). To do this, all your functions must be fully rewindable, that is they
should be exception safe, and should use classes that back out any uncommitted changes on destruction. I believe this
makes the idea technically possible, but unusable in reality).
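A toy version of the throw-and-retry scheme, just to show the shape of it. The names (`DataNotReady` aside, which is from the text) are invented for this sketch: `PendingStream`, `ReadByte`, `ReadCountedString`, `TryStep`. The nested reader works on a local cursor and commits nothing until it has everything, which is what makes it safe to redo from the top.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct DataNotReady {};  // thrown by IO routines when bytes have not arrived

// Hypothetical stream: only the first 'available' bytes can be read so far.
struct PendingStream {
    std::string data;
    size_t available = 0;
};

// Nested helper: may throw from arbitrarily deep in the call stack.
static char ReadByte(const PendingStream& s, size_t& cursor) {
    if (cursor >= s.available) throw DataNotReady();
    return s.data[cursor++];
}

// "Rewindable" function: uses only local state, so a failed call leaves
// nothing half-done. Format (made up here): one digit N, then N characters.
std::string ReadCountedString(const PendingStream& s) {
    size_t cursor = 0;
    size_t len = static_cast<size_t>(ReadByte(s, cursor) - '0');
    std::string out;
    for (size_t i = 0; i < len; ++i) out += ReadByte(s, cursor);
    return out;
}

// Coroutine root: catch the exception and report a yield; the caller
// retries the whole call when it resumes.
bool TryStep(const PendingStream& s, std::string* result) {
    try {
        *result = ReadCountedString(s);
        return true;
    } catch (const DataNotReady&) {
        return false;  // yield point
    }
}
```

Note the cost that makes this unusable in practice: every retry redoes all the work from the start of the call.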
1. Use pointer-sized ints for memory buffer sizes.
When I made the transition to building 32-64-bit cross platform I was really annoyed with the fact that I couldn't just use "int"
everywhere any more. To make it easier for myself, I mostly just used int64 for memory buffers, and made a bunch of helpers for myself
that returned int instead of size_t and intptr_t , eg. for things like vector.size() and strlen() and so on I used int-returning variants.
The nice thing about using int64 for memory buffer sizes is that your type doesn't change when you build different targets; that makes
coding a lot simpler (and removes the possibility of bugs due to struct sizes changing and such).
Only very recently have I come around, and it's probably not for the reason you think.
It's because using a separate type for "pointer sized thing" is a good way of documenting functions with types.
In the previous API post I talked about how I like the function arg types to be as self-documenting as possible. It's even better if you
can make it a compile error or warning when you mis-use it. So I was looking at API's like :
If you have an old code base, it's very annoying at first to do this, because you'll have a lot of conversions to do. It only becomes clean
once you have changed your whole code base to follow the rules :
eg. when you try to read a whole file into a memory buffer. You get the file size as an S64 and then you need to pass something to malloc,
which takes an SINTa. That makes a clear single point where you are doing a questionable cast. Once you've done that one cast, the rest
of the code is cast-free which is nice.
Furthermore, I think it's augmented by having helpers for the cast-downs :
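Something along these lines, as a sketch. `S64` and `SINTa` are the type names used above; the typedefs and the helper names (`FitsIn`, `S64_to_SINTa_Checked`, `S64_to_S32_Checked`) are my assumptions for illustration, not the real Oodle helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

typedef std::int64_t  S64;    // file sizes and offsets
typedef std::intptr_t SINTa;  // pointer-sized: memory buffer sizes

// True if 'x' is representable in To.
// Only meant for signed integer types no wider than 64 bits.
template <typename To>
bool FitsIn(S64 x) {
    return x >= static_cast<S64>(std::numeric_limits<To>::min()) &&
           x <= static_cast<S64>(std::numeric_limits<To>::max());
}

// The one questionable cast, made in one named place and checked in debug builds.
inline SINTa S64_to_SINTa_Checked(S64 x) {
    assert(FitsIn<SINTa>(x));
    return static_cast<SINTa>(x);
}

inline std::int32_t S64_to_S32_Checked(S64 x) {
    assert(FitsIn<std::int32_t>(x));
    return static_cast<std::int32_t>(x);
}
```

So reading a whole file becomes: get the size as an S64, call `S64_to_SINTa_Checked` once, and pass the result to the allocator. On a 32-bit target the assert fires for files that cannot fit in memory anyway.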
2. Avoid recursive mutexes.
Like many people, I read the wisdom of the ancients (old school threading programmers) who said that "recursive mutexes are of questionable
value and lead to dangerous code; prefer non-recursive mutexes" ; I read that and I went "psshaw" and thought they were crotchety dinosaurs.
Well, I've come around.
The thing about recursive mutexes is that like much code which is attractive to the novice, they make the trivial case simpler and look
cleaner, but they make the hard case much worse, and the hard case is what actually matters.
The trivial case is : object only locks itself, never locks other objects, never can be freed while locked, etc.
But inevitably in real world threaded code you have to deal with the harder case for mutexes, which is : object might have to lock other
objects (and they might have to lock it); lock order may be hard to establish; object may want to free itself while locked, etc.
The way that the novice normally makes a thread-safe object with a recursive mutex is to put a lock scoper in every function on that
object :
What's better is to separate the mutex-taking from the actions, so instead you do :
So now you need to do something like call A->Func1 and B->Func2 , and it has to be done "atomically" , eg.
with both locked. Then you run into the issue of mutex order of acquisition and possible deadlocks.
If you have used the first style where objects lock themselves, then it is impossible for objects to call
each other safely. That is, object A can never call object B, because object B might call object A and
there's a deadlock. But with the _Locked functions, any _Locked function can call to another _Locked function,
and they don't have to worry about deadlock; instead the lock taking is all pushed up and can be done in
a standardized order (or even atomically if you have multi-mutex lock acquisition).
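A small sketch of the _Locked style with the lock-taking pushed up to the caller. The `Account` type and `Transfer` function are hypothetical; `std::lock` is used here as the standard library's multi-mutex acquisition, which avoids deadlock regardless of argument order.

```cpp
#include <cassert>
#include <mutex>

struct Account {
    std::mutex mutex;  // non-recursive
    int balance = 0;

    // _Locked functions assume the caller already holds 'mutex'.
    // They never lock, so they can freely call each other.
    void Deposit_Locked(int amount)  { balance += amount; }
    void Withdraw_Locked(int amount) { balance -= amount; }
};

// The caller takes every lock it needs up front, then does the work.
// 'from' and 'to' must be different objects (locking one mutex twice is an error).
void Transfer(Account& from, Account& to, int amount) {
    std::lock(from.mutex, to.mutex);  // acquires both without deadlock
    std::lock_guard<std::mutex> lockFrom(from.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> lockTo(to.mutex, std::adopt_lock);
    from.Withdraw_Locked(amount);
    to.Deposit_Locked(amount);
}
```

With self-locking member functions and a recursive mutex, the same transfer would take the two locks at different depths of the call stack, and the order would depend on who called whom.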
The other thing that non-recursive locks clean up, is that you know whenever you see a call to Unlock(),
that's the *last* call to unlock and it definitely makes the object publicly visible. That is, there is
a crucial transition point from "I own this object" to "others can have it", and with the recursive lock
method, that transition point is muddied.
For example, consider this simple and common case :
something in the object's internal state tells you it should be deleted. You must do that atomically,
because if you unlock then delete, the internal state might change, eg. :
Basically the recursive mutex is sort of like optimizing the code that's already fast; it makes the
simple case a bit simpler, but it makes the real world hard cases much worse. Better to just start from
the beginning with the more robust long term solution.
Once you realize that the advantage of a recursive mutex is an illusion, then the advantages of the non-
recursive mutex become appealing. You can implement a non-recursive mutex far more efficiently. It can
take only 1 bit of memory. Every object can easily have its own non-recursive mutex and they don't even
need to be allocated or initialized at all.
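For illustration, a one-bit spin lock might look like the sketch below. This is a minimal version under my own assumptions (the `BitLock_*` names, the choice of the low bit, pure spinning with no wait), not a production mutex. The lock is one bit of a word the object already has, and a zeroed word is simply "unlocked".

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// The low bit is the mutex; the other 31 bits stay free for the object's own use.
static const std::uint32_t kLockBit = 1u;

// Returns true if we took the lock (the bit was clear before our fetch_or).
inline bool BitLock_TryLock(std::atomic<std::uint32_t>& word) {
    std::uint32_t prev = word.fetch_or(kLockBit, std::memory_order_acquire);
    return (prev & kLockBit) == 0;
}

inline void BitLock_Lock(std::atomic<std::uint32_t>& word) {
    while (!BitLock_TryLock(word)) {
        // spin; a real implementation would back off or wait on an OS primitive
    }
}

// Clears only the lock bit, leaving the other bits untouched.
inline void BitLock_Unlock(std::atomic<std::uint32_t>& word) {
    word.fetch_and(~kLockBit, std::memory_order_release);
}
```

None of this is possible with a recursive mutex, which has to store an owner id and a recursion count somewhere.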
runlogged runs a command line and saves an archive of all runs, with the arg list and the date/time in the file name
so they never collide.
Thanks to these two :
vanderwoude Batch files
If the internet wasn't such a useless pile of turd, I would completely download both of their sites for my
reference library.
I've written before, but repeating : what I really want is a fully journalling OS; every thing I ever run should be
fully tracked and completely reproducible; all the input & output files should be saved. Oh well, this is
better than nothing.
One is that my personal preference for API's has been heavily influenced by my reliance on Browse Info tips and
auto complete. Some examples :
1. I like really long function names. I want the function name to tell me as much as possible about the function.
If the function has some weird side effect that is not right there in the name, I consider that a failure of the API. I like
function names like "OodleLZ_OpenFileReadHeaderDecompressAsync" ; it tells me exactly what it does. I don't
care how long it is because after I type a few characters I control-space and the rest comes out.
2. Related to that, I like file names and function names to be alphabetically front-loaded. That is, I want the first
few characters to distinguish the function as much as possible, because I really want to do a few characters and then
be able to ctrl-space it. I hate the fact that I have to start all my function names with Oodle_ because it makes for
a lot of typing before I can ctrl-space it. In my ideal world, the first few letters are a summary of the action;
like in the previous example I'd rather see "LZDecIO_OpenFileReadHeaderDecompressAsync".
3. For function arguments, I of course rely on the browse info to tell me what the args are. Because I can see the
types of the args and their names, I want the types and names to be super descriptive. A well designed function
does not need any docs at all, the names of the args and the name of the function tell you everything about it.
3.A. It's very easy to distinguish "in" and "out" args by use of const, when the auto-complete tells you the type of
the args. Furthermore in C++ I would always make "in"
args be references and "out" args be pointers. Basically anything you can put in the types of the variables
is documentation that is visible to me right there when I use the function.
3.B. Because of this I sort of prefer a mess of bool args to enums (*) or flags. (* = an enum just to
give the bool names is nice).
If I start typing a function and the browse info pops up and tells me the args are
Okay, now we'll get into some issues that aren't so browse-info related.
4. I've always believed that API's should be obvious about correct or incorrect usage. They should be "wysiwyg" in the sense
of, if you read the code written to the API, it should do what intuitive English reading suggests it does. In particular,
if it looks right, it should *be* right. There are some ways you can fuck this up in API design :
4.A. Near-synonyms that aren't obvious. Somebody writes code like :
4.B. One of the most insidious forms of this is API's that will still work if you use them wrong, but fall into super-low-performance mode.
Of course most of us are familiar with the terrible 3d graphics API's where just setting some seemingly innocuous flag suddenly makes all
your speed go away (D3D has perhaps gotten a bit better about this in recent years), but it's fucking ridiculous, I shouldn't have to
profile my code every time I change a flag on CreateVertexBuffer or spend years learning about the hacks. The fast paths and slow paths
should be clearly separated and if I do something slow when using the fast functions, it should be a failure. It's much worse to give it
a slow fallback than to just make it fail.
A similar case is when API's have a lot of unclear dependencies to get on the fast path. Trying to think of an example, one that comes
to mind is stuff like the Win32 "overlapped" async file stuff. ReadFile can be async or not async depending on whether you pass an
overlapped structure (bad - I hate API's that massively change their action in an unclear way), if it is async then the requirements on the buffer and size and such are totally different (must be sector
aligned), and both the sync and async act on the exact same object (HANDLE from CreateFile) so that when you are given a file handle you
have no idea whether it is a valid async io handle or not. Similarly when you CreateFile it's totally not obvious how to make sure you
are making one that's valid for async io.
Any time you are thinking about writing docs like "if you call this function and then this one, and if you pass just the right flags
and only use them in this specific way, then the entire behavior of the thing changes and does this other thing". WTF? No. That's bad.
Make that functionality a separate set of functions.
(Aside whiney rant : sometimes I feel like I'm the only person in the world who has this philosophy : When somebody uses my API wrong, I
consider that not their failing, but my failing. What did I do wrong in the API that made them think that was the correct usage?
How could I add more compile-time clarity to make incorrect usage be a compile failure? I consider it the API's responsibility to make
it as easy as possible to use correctly, and as hard as possible to use incorrectly. Whenever I use someone's API wrong and I ask "why
is this not doing what I expect?", I generally get the response of "duh you're using it wrong". When someone hands you a knife
with the blade pointed towards you, it's kind of their fault when you cut your hand.)
((though I guess that's just a sub-point of an even bigger whiney rant, which is that I often feel like I'm the only person in
the world who actually cares about their own code or takes pride in it. If somebody finds a bug in my code, or points out a flaw, or
even tells me that something could be done better, then I want to know, I want to fix it. I believe my code is great and if there is
any reason to suspect it is not, I want to remedy it. When I point out bugs or bad algorithms in others' code the response I usually
get is "meh". And on a broader level, when people ask about difficult algorithmic questions I'll usually point them at some
reference pages or some academic papers, and I sort of suspect that nobody that I've referred things to has actually ever read them;
I guess as any professor knows, you have to be satisfied with something like a 1% hit rate. When a smart kid comes to you after class
and says "I'd like to learn more about interpretations of quantum mechanics" and you are excited that a kid is actually
interested in learning, and you say "sure, I'd love to talk to you about that, here's a good introductory paper to get you started, then come and
see me at office hours", chances are you'll never see that kid again)).
When someone uses my API badly it reflects badly on my code. A lot of API's have the very bad design property that using them wrong
is much easier than using them right. If people use your API and never check error codes, maybe it's your fault because you made it
too hard to check error codes. Correct usage should be as automatic as possible.
Designing API's for yourself, or even your own company, is very different than tossing API's out into the void.
When it's for a small group, a certain amount of quirkiness is okay, because you learn the quirks and then they
aren't a problem. In fact, as many game companies sadly know, it's much more valuable to leave the quirky code
alone, because it's better to have something familiar than to "fix it" and make something that is cleaner but
unfamiliar.
A certain amount of fragility is also okay in internal API's, because if you use it wrong then you hit an assert or
a crash, and you just fix your usage. In a public API that's not so okay because the crash is down in the library
and it's not their code.
The biggest difference is that internal API's can force a use pattern. You can make an API that assumes a certain
model for things like memory allocation and lifetime management and such; you can say "use it this way and if you
try to use it other ways it won't work". With a public API you can't do that, it has to work decently any way it's
used, and different people may have very different ideas.
At RAD an issue I've never really dealt with before is that I have to be careful to design the API so it's good for *me*
as well as good for the customers. Normally I would just try to make the API as easy and good to use as possible.
In particular, something that I often do in my own code like cblib is to wrap up lots of helper functions that do all
the common stuff for you. Personally I really hate "minimal" API's where you have to write the same bunch of code
over and over to use them. If I have to use it in a certain way, then just fucking wrap that up for me and put that
in a helper.
But there are disadvantages to that for the API maintainer. For one thing, just the bigger the API is, the more work it is,
you have to document every entry point, you have to stress test each entry point, and then once it's added you can't remove
it without causing chaos.
The other related issue is API's that are dangerous or hard to use. There are some things where bugs in
usage are so likely that it's not worth exposing them.
The most obvious example for me in Oodle is all the low level lock free stuff. I personally think there's
immense value in that stuff, but it's highly unlikely that we make sales based on it, and it's very hard to
use and very easy to get weird crashes with incorrect use. Because of that, it won't be exposed (in the
first release anyway), and we're trying to make the visibility to it only through safe API's.
Another example is exposing access to internal structures. It's slightly more efficient to be able to access the
Oodle system structs directly, to read or change values, but that's also much more risky. No operating
system ever allows that, for example. The safer way is to provide accessors that change the values safely.
For example on Win32, any time you get an OS struct, you actually have a copy of the OS structure, not
a direct pointer; so first you have to copy the struct out to your own, fiddle with it, then copy it back.
This is basically the way I'm trying to go with Oodle in the first rev.
In particular I wanted to change :
1. The replace string is *not* a regexp ; in particular notice there's no \ for the parens; I had \ on the parens in the output and the damn
thing just silently refused to do the replace. So that's another hint - if you click "Find" and it works, and you click "Replace" and it just
silently does nothing, that might mean that it doesn't like your output string.
2. There's a ":i" special tag that matches a C identifier. (:i is equal to ([a-zA-Z_$][a-zA-Z0-9_$]*) ) You might think that :i is a nice
way to match a function argument, but it's not. It only works if the function argument is a simple identifier, it won't match "array[3]" or
"obj.member" or anything like that. It would have been nice if they provided a :I or something that matched a complex identifier.
In cblib/chsh, I could have just done
(in cblib a * in the search string always matches the minimum number of characters, and a * in the replace string means put the
chars matched in the search string at the same slot)
MSVC supports a similar kind of simple wild match for searching, but it doesn't seem to support replacing
in the simple wild mode, which is too bad.
I'm doing a ton of Find-Replacing trying to clean up the Oodle public API, and it has made it
clear to me how fucking awful the find-replace in most of our code editors is.
I wrote before about how "keep case" is an obvious feature that you should have for code find-replace.
But there's so much more that you should expect from your find-rep. For example :
1. I frequently want to do things like rename "VFS_" to "OodleVFS_" , but only if it occurs at the beginning
of a word (and of course with keep-case as well). So "only at head of word" or "only at tail of word" would
be nice.
2. All modern code editors have syntax parsing so they know if words are types, variable names, comments,
etc. I should be able to say do this find-replace but only apply it to function names.
An extremely simple "duh" check-box on any find-replace should be "search code" and "search comments". A
lot of the time I want to do a find-rep only on code and not comments.
An even more sophisticated type-aware find-rep would let you do things like :
That sounds like rather a lot to expect of your find-rep but by god no it is not. The computer knows how
to do it; if it can compile the code it can do that find-rep very easily. What's outrageous is that a
human being has to do it.
3. A very common annoyance for me is accidental repeated find-reps. That is, I'll do something like
find-rep "eLZH_" to "OodleLZH_" , but if I accidentally do it twice I get "OodlOodleLZH_" which is
something I didn't expect. Almost always when doing these kind of big find-reps, once I fix a word it's
done, so these problems could be avoided by having an option to exclude any match which has already been modified in the
current find-rep session.
4. Obviously it should have a check box for "ignore whitespace that doesn't affect C". I shouldn't have to
use regexp to mark up every spot where there could be benign whitespace in an expression. eg. if I search
for "dumb(world)" and ignore C whitespace it should find "dumb ( world )" but not "du mb(world)".
etc. I'm sure if we could wipe out our preconceptions about how fucking lame the find-rep is, lots of ideas
would come to mind about what it obviously should be able to do.
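Just to show how little code the "only at head of word" option from point 1 takes, here is a sketch. `ReplaceAtWordHead` is a made-up helper, and "word" here means a C identifier (letters, digits, underscore). It also happens to avoid the accidental-repeat problem from point 3 for renames like eLZH_ to OodleLZH_, because the second pass no longer finds the pattern at the head of a word.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

static bool IsIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Replace 'find' with 'rep', but only where 'find' starts a C identifier.
std::string ReplaceAtWordHead(const std::string& text,
                              const std::string& find,
                              const std::string& rep) {
    std::string out;
    size_t i = 0;
    while (i < text.size()) {
        // At the head of a word if there is no identifier char just before us.
        bool atHead = (i == 0) || !IsIdentChar(text[i - 1]);
        if (atHead && !find.empty() && text.compare(i, find.size(), find) == 0) {
            out += rep;
            i += find.size();
        } else {
            out += text[i++];
        }
    }
    return out;
}
```

Running it twice on the same text gives the same result as running it once, which is what I expect from a rename.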
I see there are a bunch of commercial "Refactoring" (what a retarded buzz word that is; it's cleaning up code) tools
that might do these type of things for you. In my experience those tools tend to be ungodly slow and flakey; part
of the problem is they try to incrementally maintain a browse info database, and they always fuck it up. The compiler
is plenty fast and I know it gets it right.
(chronological order, so the best stuff is at the end, and early stuff may
be corrected in later posts)
2009-02-26 - Low Level Threading - Table of Contents
I will use an example that I used before, a very simple "futex" (not really) waitset based exchange-mutex.
I'm gonna use the exact same code here as I did there, but I'm putting the futex_system ops into
the mutex to make it all a bit leaner :
(mutual exclusion is guaranteed by this code by our actions on m_state , which also provides the necessary acquire & release
to make the mutex valid. So when I say it "doesn't work" it means the waitset interaction with the mutex doesn't work, eg.
we deadlock by failing to wake a waiter, it's a "missed wakeup" problem).
(irrelevant aside : if you want this to be a real mutex implementation, then the lock() operations on m_state
should probably be acq_rel to prevent mutex overlap deadlocks; see this blog post among others;
but the purpose of this post is not to make a real mutex, it's to demonstrate an issue, so let's
not get bogged down)
In brief, the problem is that the unlocker at *2 can load a waiter count of 0 (and thus not signal), even though the waiter
has passed point *1 (and thus count should not be zero).
The bad execution goes like this :
So clearly we want a #StoreLoad between 5 and 6 to prevent that load from backing up. You cannot express that in C++0x and that's
what I meant in this original blog post when I said that the C++0x seq_cst fence is not really a
StoreLoad and there's no way to really express this kind of StoreLoad in C++0x.
Specifically, just adding a seq_cst fence here where you want a StoreLoad does not work :
Note : I believe that on every real CPU, putting a MemoryBarrier() there where you want the #StoreLoad would make this code work.
This example is actually very similar to the Peterson lock we saw in part 2.
Boiled down, the problem is like this :
To be very clear, this code works :
But what if we remove the need for one of the threads to have a seq_cst fence :
Take away : atomic_thread_fence(mo_seq_cst) does NOT really act as a #StoreLoad. It can be used in spots where
you need a #StoreLoad, but only in the right situation, eg. if it has the right other stuff to synchronize with.
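For reference, the "right situation" is the standard store-buffering pattern with a seq_cst fence in both threads. The sketch below is my own minimal rendering of it (the globals and function names are hypothetical), with each function meant to run on its own thread. With both fences in place, 29.3.6 rules out the outcome where both loads return 0; take the fence out of either function and, per the argument above, the standard no longer gives you that.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> g_x(0);
std::atomic<int> g_y(0);

// Thread 1: store x, seq_cst fence, load y.
int Thread1_StoreX_LoadY() {
    g_x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return g_y.load(std::memory_order_relaxed);
}

// Thread 2: store y, seq_cst fence, load x.
int Thread2_StoreY_LoadX() {
    g_y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return g_x.load(std::memory_order_relaxed);
}
```

Whichever fence comes first in the total order S, the other thread's load after its own fence must see that thread's store, so at least one of the two functions returns 1.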
So, getting back to our futex_mutex example, you can use a seq_cst fence at (*2) to act as your storeload, but
only if you also have one at (*1) for it to synchronize with :
Or, alternatively, if you leave the fence out of lock(), you can fix it just in unlock by changing either the store
or the load into an RMW :
(variant "banana" appears to work if *3 is only mo_relaxed, which is a bit of a mystery! We'll leave that as
an exercise for the reader). (update : it doesn't actually work, see comments).
(NOTE : I am talking about what a C++0x fence is guaranteed to be by the standard; at the moment we
will not concern ourselves with the fact that, for now, a C++0x fence always actually issues a CPU
memory barrier which is somewhat stronger than what C++0x promises; I have no doubt that CPUs and compilers
will change in the future to be more aggressive about allowing relaxed memory ordering).
The C++0x fence is not visible to other threads unless they specifically schedule against the fence.
That maybe sounds obvious if you are thinking in terms of the C++0x definitions, but it's not true of
a real CPU fence.
A very common example in old-school code is to use a CPU barrier for publication. Something like this :
(aside : this is of course a terrible way to do publication in the modern world, you should just use a store-release and
load-acquire pair to do publish and consume; alternatively if you publish a pointer, then you don't need any synchronization
at all, because causality and load-consume takes care of it for you).
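The store-release / load-acquire version of publication, as a minimal sketch (the variable and function names are made up for the example):

```cpp
#include <atomic>
#include <cassert>

int g_payload = 0;                 // plain data being published
std::atomic<bool> g_ready(false);  // the publication flag

// Publisher: write the data, then store-release the flag.
void Publish(int value) {
    g_payload = value;
    g_ready.store(true, std::memory_order_release);
}

// Consumer: load-acquire the flag; if it is set, the payload write is visible.
bool TryConsume(int* out) {
    if (!g_ready.load(std::memory_order_acquire)) return false;
    *out = g_payload;
    return true;
}
```

No standalone fence is needed; the ordering rides on the flag operations themselves.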
Okay. We can see the exact same thing in a more complex example.
This is Dmitriy's example Peterson Mutex. Here's my test harness :
The failure is the same kind of thing as the first trivial example; all current CPU's have ordering relations that
are stronger than C++0x. In particular the bad execution case that's possible in C++0x (when barrier is a
seq_cst fence) goes like this :
So part of the problem is C++0x doesn't count stores in the linear "modification order" of an atomic object.
So the easy fix to ensure the "is after" relationship above actually happens is to change the store to turn
into an RMW :
Some links on the topic :
Subtle difference between C++0x MM and other MMs
(caveat : I'm no standards-reading aficionado, and I find the C++0x rules very odd, so this is as much me learning out loud as anything).
I'm going to do this post a bit backwards; first some random notes.
1. I never use C++0x fences. In all the lockfree atomic code I've written I've never used them
or wanted them. I put the memory ordering on the operations; it's easier to understand and usually more efficient
(because doing something like making an exchange be acq_rel is a no-op on most platforms, exchange is already acq_rel, whereas adding
another fence requires another hard sync to be issued because compilers are not yet sophisticated enough to merge atomic ordering
operations).
The only case I have ever seen where a fence is better than putting the constraint on the operation is for the optimization of doing
a relaxed op to detect that you need the ordering. Something like :
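A sketch of that pattern, with invented names (`g_message`, `g_messageReady`, `PostMessageValue`, `PollMessage`). The poll is a relaxed load, so the common "nothing there yet" path pays for no ordering; only when the flag is seen do we issue the acquire fence, which synchronizes with the poster's store-release (the rule in 29.8.4).

```cpp
#include <atomic>
#include <cassert>

int g_message = 0;                   // plain data
std::atomic<int> g_messageReady(0);  // set to 1 once g_message is valid

// Producer: ordinary store-release publication.
void PostMessageValue(int value) {
    g_message = value;
    g_messageReady.store(1, std::memory_order_release);
}

// Consumer: cheap relaxed check first, acquire fence only when it is needed.
bool PollMessage(int* out) {
    if (g_messageReady.load(std::memory_order_relaxed) == 0) {
        return false;  // common case: no ordering cost
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    *out = g_message;
    return true;
}
```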
2. The way to understand C++0x fences is to forget everything you know about CPU's, caches, what the CPU
memory fence instructions do, any connotation in your head about what "barrier" or "fence" means. The people who are most confused about it are the
ones who had some background writing lockfree assembly in the pre-C++0x days, because C++0x fences are really not what you are used to.
What C++0x fences really do is provide more sync points through which other C++0x atomic ops can create "happens before" relationships.
You can heuristically think of them as modifying the neighboring ops, not as being an independent operation themselves.
In particular :
Section 29.8 of C++ doc N3337 :
29.8.2 :
A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both
operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before
B, and Y reads the value written by X or a value written by any side effect in the hypothetical release
sequence X would head if it were a release operation.
(just using a store-release and a load-acquire is the normal way of doing a publish/consume pattern like this)
29.8.3 :
A release fence A synchronizes with an atomic operation B that performs an acquire operation on an atomic
object M if there exists an atomic operation X such that A is sequenced before X, X modifies M, and B
reads the value written by X or a value written by any side effect in the hypothetical release sequence X
would head if it were a release operation.
29.8.4 :
An atomic operation A that is a release operation on an atomic object M synchronizes with an acquire fence
B if there exists some atomic operation X on M such that X is sequenced before B and reads the value
written by A or a value written by any side effect in the release sequence headed by A.
That's about it for interesting stuff in 29.8 , there's a bit more in 29.3 on fences :
29.3.5 : For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there is
a memory_order_seq_cst fence X such that A is sequenced before X and B follows X in S, then B observes
either the effects of A or a later modification of M in its modification order.
This sort of says that a seq_cst fence acts as a #StoreLoad. Note however that both A and B must have "happens before/after"
relationships with the seq_cst fence. If only one of them has that relation then it doesn't work. We'll revisit this in
the next post when we talk about how seq_cst fence doesn't behave exactly as you think.
29.3.6 For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there
are memory_order_seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B,
and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification
order.
Well, duh.
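As a rough sketch of the two-fence case in 29.3.6, here's the classic Dekker-style store-then-load test. All the names are made up for illustration. Whichever fence comes first in S, the load on the other thread has to see the store that preceded that fence, so both threads reading 0 should be impossible :

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
static std::atomic<int> g_x{0};
static std::atomic<int> g_y{0};

// Runs the test once; returns true if both threads read 0
// (the outcome the seq_cst fences are supposed to rule out).
bool BothSawZeroOnce()
{
    g_x.store(0, std::memory_order_relaxed);
    g_y.store(0, std::memory_order_relaxed);
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        g_x.store(1, std::memory_order_relaxed);               // "A" for x
        std::atomic_thread_fence(std::memory_order_seq_cst);   // fence in S
        r1 = g_y.load(std::memory_order_relaxed);              // "B" for y
    });
    std::thread t2([&] {
        g_y.store(1, std::memory_order_relaxed);               // "A" for y
        std::atomic_thread_fence(std::memory_order_seq_cst);   // fence in S
        r2 = g_x.load(std::memory_order_relaxed);              // "B" for x
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how many times both threads read 0.
int CountStoreLoadFailures(int iterations)
{
    int failures = 0;
    for (int i = 0; i < iterations; ++i)
    {
        if (BothSawZeroOnce()) ++failures;
    }
    return failures;
}
```

Take the fences out and the relaxed ops alone make no such promise.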
29.3.7 For atomic operations A and B on an atomic object M, if there are memory_order_seq_cst fences X and Y
such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B occurs later
than A in the modification order of M.
There are two forms of very strong ordering guaranteed in C++0x. One is the total order (called "S") of seq_cst ops
(aside : I believe that seq_cst fences count as independent entries in the list S, but I don't see anywhere that the standard
actually says that). The other is the modification order of an atomic object. There is guaranteed to be a single
consistent modification order to every atomic, though not every thread necessarily sees every value in it; a thread may only observe a subset.
Furthermore, any Store without a read (not an RMW) clobbers the modification order, because it wipes out the history and makes
it impossible to know what the order before the store was. But you can say something strong : as long as you only use RMW ops
on a variable, then it has a single global order of changes, and every thread can only view those changes in order.
eg. say a bunch of threads are doing RMW Increment ops on a shared variable. Then the variable must go {1,2,3,...} in its timeline,
and the threads that do it can only see an ordered subset of those values, eg. {1,3,4,8,...} but never {3,1,7,4,....}.
Anyhoo, 29.3.7 is just saying that the total order of seq_cst ops (S) and the total order of an RMW sequence can order against each other.
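The RMW increment example above can be sketched like this (function and variable names are mine, just for illustration). Each thread records the values produced by its own increments; even with memory_order_relaxed, each thread should only ever see an increasing subset of {1,2,3,...} :

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper : returns true if every thread saw a strictly increasing
// sequence of counter values and no increment was lost.
bool IncrementsSeenInOrder(int numThreads, int incrementsPerThread)
{
    std::atomic<int> counter{0};
    std::vector<std::vector<int>> seen(numThreads);   // one private log per thread
    std::vector<std::thread> threads;

    for (int t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&, t] {
            for (int i = 0; i < incrementsPerThread; ++i)
            {
                // relaxed is enough here : the ordering comes from the
                // modification order of the atomic, not from the memory_order
                int value = counter.fetch_add(1, std::memory_order_relaxed) + 1;
                seen[t].push_back(value);
            }
        });
    }
    for (auto & th : threads) th.join();

    for (const auto & values : seen)
    {
        for (std::size_t i = 1; i < values.size(); ++i)
        {
            if (values[i] <= values[i - 1]) return false;   // out of order
        }
    }
    return counter.load() == numThreads * incrementsPerThread;
}
```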
Some of the standard document points are so redundant, I'm surprised they don't have an entry for the converse :
Okay, so next post we'll look at what the fence is *not*.
Some common examples : profilers that push/pop scopes, logging scopes such as setting tab depth or subsystem, mutex lock scopers,
exception handler stacks, etc.
Basically the model is at the start of some function you push something on your parallel stack, and pop on exit.
I've only recently realized that this is bad programming. The problem is it's redundant with the execution stack,
and any time you have two separate subsystems which must be exactly in sync to work correctly, you have bad code.
In particular, it's (unnecessarily) fragile.
To be concrete let's consider a "stack" that's just an integer counting the depth for something. You might
naively maintain it like :
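A minimal sketch of what that naive version, and the usual scoper fix for early returns, might look like. Only s_depth and Func2 come from the text; Func1_Naive, Func1_Scoped and DepthScoper are names I made up for illustration :

```cpp
#include <cassert>

static int s_depth = 0;

void Func2()
{
    // child work goes here; imagine it can also push/pop s_depth
}

// Naive : manual increment/decrement that must be matched on every exit path.
void Func1_Naive(bool bail)
{
    ++s_depth;
    if (bail) return;   // bug : the early return skips the decrement
    Func2();
    --s_depth;
}

// Scoper : the destructor does the pop, so early returns are covered.
struct DepthScoper
{
    DepthScoper()  { ++s_depth; }
    ~DepthScoper() { --s_depth; }
    DepthScoper(const DepthScoper &) = delete;
    DepthScoper & operator=(const DepthScoper &) = delete;
};

void Func1_Scoped(bool bail)
{
    DepthScoper scope;
    if (bail) return;   // fine : the destructor still runs
    Func2();
}
```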
But really the scoper is just putting a band-aid on the problem. There is a whole other class of bugs that you cannot fix
that way - what if Func2() is written wrong and has a mismatched push/pop ? It will fuck up the stack for you. The entire
mechanism is fragile because it is vulnerable to inheriting mistakes in any child code.
Now, sophomoric programmers are thinking "of course it inherits mistakes from calling broken code", they think that's just
the way code is. They think the way you make working programs is by finding every single bug in every possible execution
path and fixing them all. They think fragile code that has to be used "just so" is fine, because they will ensure that the
code is used in exactly the right way.
This is very wrong, and it's why so many programs suck so bad. It comes from a great arrogance that is very common in
programmers from beginners to experts. It's the (untrue) belief that you can write bug free code. It's the belief that
fragile systems without fail-safes and inherent self-protection are okay because you are so great you will use them
correctly.
It becomes obvious if we add the invariant checks that
test program correctness :
But if that's the requirement, why not just do that directly? It's much more robust, it's not sensitive to unmatched push/pops in the children.
So instead of Push/Pop we should just do Push/Restore :
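A sketch of Push/Restore for the s_depth case, assuming a small scoper class that remembers the value on entry (DepthRestorer, BrokenChild and Parent are hypothetical names) :

```cpp
#include <cassert>

static int s_depth = 0;

// Remembers the depth seen on entry and puts exactly that value back on exit,
// rather than trusting that every push below was matched by a pop.
struct DepthRestorer
{
    int m_saved;
    DepthRestorer() : m_saved(s_depth) { s_depth = m_saved + 1; }
    ~DepthRestorer() { s_depth = m_saved; }   // restore, don't decrement
    DepthRestorer(const DepthRestorer &) = delete;
    DepthRestorer & operator=(const DepthRestorer &) = delete;
};

// A child with a mismatched push that is never popped.
void BrokenChild()
{
    ++s_depth;
}

void Parent()
{
    DepthRestorer scope;
    BrokenChild();
    // s_depth is wrong in here after the broken child,
    // but the damage does not outlive this scope
}
```

The saved value lives on the execution stack, so the child's mistake can't leak out to the caller.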
Anyhoo, our s_depth example is so trivial let's do a slightly more complex one, a profiler :
Anyhoo, this is very standard, and it's sort of okay, but it's fragile. The issue is in Profile_Pop() - you are making a
pure leap of faith that back() is actually the element you should be popping.
(in particular, a very common source of bugs or fragility in this type of code is if the Profiler can be enabled & disabled;
even if you always use a Push/Pop scoper class to avoid unmatched pairs, people can enable/disable anywhere and that creates
a mismatch (assuming you don't continue to do the push/pops when disabled)).
A better way is Push/Restore :
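A minimal sketch of how that could look for the profiler, assuming the stack is a std::vector as the back() above suggests. Profile_Restore, ProfileEntry, ProfileScope and the rest are made-up names, and a real profiler would record times rather than just a label :

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ProfileEntry
{
    const char * label;   // start time etc. would go here in a real profiler
};

static std::vector<ProfileEntry> s_profileStack;

// Push returns the depth before the push; that is the state to restore to.
std::size_t Profile_Push(const char * label)
{
    std::size_t restoreDepth = s_profileStack.size();
    s_profileStack.push_back(ProfileEntry{label});
    return restoreDepth;
}

// Restore truncates back to the saved depth, discarding anything a child
// leaked. It makes no assumption about what back() currently is.
void Profile_Restore(std::size_t restoreDepth)
{
    if (s_profileStack.size() > restoreDepth)
    {
        s_profileStack.erase(s_profileStack.begin() + restoreDepth,
                             s_profileStack.end());
    }
}

// The saved depth lives in this object, ie. on the execution stack.
class ProfileScope
{
public:
    explicit ProfileScope(const char * label) : m_restoreDepth(Profile_Push(label)) { }
    ~ProfileScope() { Profile_Restore(m_restoreDepth); }
    ProfileScope(const ProfileScope &) = delete;
    ProfileScope & operator=(const ProfileScope &) = delete;
private:
    std::size_t m_restoreDepth;
};

// A child with a mismatched push.
void LeakyChild()
{
    Profile_Push("leaked");
}

std::size_t DepthAfterLeakyScope()
{
    {
        ProfileScope scope("parent");
        LeakyChild();
    }
    return s_profileStack.size();
}
```

If a child pops too much instead of too little, this sketch just leaves the stack alone on restore; what to do about the lost entry in that case is a design choice I haven't tried to settle here.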
What is the fundamental principle that makes this so much more robust than doing push/pop? It's elimination of
parallel states that must be kept in sync. Parallel state is one of the greatest causes of bugs, and eliminating it should be done
whenever possible; I don't do it enough, because keeping the parallel state is so tempting sometimes and you don't really see the necessity of removing it
(because we all have Programmer Arrogance that makes us think we can keep it in sync correctly).
eg. you constantly see code like :
That's obvious, but the stack is the exact same situation. The thing is, the program is maintaining its own stack, the execution
stack, so any other stack that mirrors the execution stack is redundant state. You should just use the execution stack,
which is what Push/Restore does for you.
1. Make all mime types always just download. Forever. MP3, PDF, whatever the fuck, I never want any fucking
handler to do anything with it. Just download it. I can't figure out a way to make this happen; I feel like
the mime type database must be a text file I could just edit and set everything to =download but I can't find
any info about that on the net.
2. Never ever do anything to my windows without asking me. Don't close them, resize them, move them, and
certainly don't pop anything up.
3. No mouse-over popups. This is some new shit that is causing me constant frustration. Jesus christ putting
mouse-overs on web pages is such a stupid broken idea. There's a bunch of web sites that I just can't use
any more because of mouse-overs.
I'm currently using Firefox 2 and will be very sad when I have to switch to Firefox 3 because of HTML5 or
whatever fucking new thing that makes the internet worse but which I cannot realistically opt out of.
In other "god the web is stupid" ranting, fucking forms that change as you enter values should be illegal.
No, you are not fucking helping me. Like credit card forms that detect your type of card and then add the box
for the CVV number with 3 or 4 blanks. No! You are not helping! Just make a static form.
Don't you
realize that the web just doesn't work? Keep everything as simple as possible, because making it complex
just makes it broken. Also, your UI design skills are rudimentary at best, so don't try anything ambitious.
How about you just make a static form, and if I fail to fill in a required field you don't blank the whole
thing out, mmkay?
Anyhoo, the latest trend that is fucking boiling my brain almost daily is this fucking "make your password
strong" bullshit. It seems like every web site I go to these days wants me to change my password to be
stronger, with some cock-ass idea of how to do that.
Of course every site has different ideas about what that should be, such as "must contain a space or !"
while another is "must not contain a space or !". So I can't just use the same password in lots of places,
I have to generate one per site and write them down, which of course makes them extremely non-secure.
The worst of course are the places that don't tell you
exactly how to generate a strong password, so you try some shit and they say "too weak" and you try some
other shit and they say "funny character not allowed!" so you smash your keyboard and then try backing out
the weird characters and wind up binary searching a few times through annoying forms until you get it.
Look you fuck-heads, why don't you just auto-generate
a random password for me that conforms with your rules, and then I can write it down on my desk and we can all pretend things are more
secure even though the hackers just steal your whole user database on a near-daily basis and it doesn't matter how many numbers are
in my password anyway.
In other computer broken news :
Somehow in DevStudio I accidentally hit "Alt-P-O" and didn't realize it. (I think all complex programs should
have a log console of all your key and button presses and what they correspond to; often in Photoshop or whatever
behemoth program I'll be pootling along and all of a sudden it does something wacky and I'm like "wtf what did I hit";
sometimes it's even a good result and I have no idea how to make it happen again). Anyway, apparently Alt-P-O
turns on "Show All Files" in the solution explorer which is a feature I didn't even know existed. So all of a
sudden a bunch of new crap appears to be in my project, and I spent a few hours deleting my NCB's and doing P4
reverts and trying to figure out WTF happened before I discovered this feature.
I've had a problem for quite a while where "Open With" didn't work; if I browsed for a program to do the open-with,
it would let me pick it, but then just not open it. I finally got annoyed enough that I went and fixed it :
OpenWithAdd : Registering programs with the Open With dialog
Probably just using the "OpenWithAdd" utility will work if you have this problem. But fundamentally the problem
arises because the registry gets itself fucked somehow. There are two different ends where it can get fucked and
both seem to occur and both can cause this problem. Say you are trying to register .FLV to open with VLC. The
registry for .FLV can get fucked and cause this, and there can also be a registry for VLC that gets fucked and
causes this. Best bet to clean it up is just to delete everything in the registry for the file type .FLV and for
the program VLC.
(the OpenWith command doesn't actually go directly to the program path that you give it, that would be too easy
and robust; instead when you say "OpenWith" "VLC" what it does is go look up a registry entry for "VLC" which tells
it where to get the exe and how to pass the args; that entry is what gets fucked somehow)
I still use Win XP at work (and Win 7 at home), and literally the only thing from Win 7 that I miss on the XP machine
is the text-search in the start menu. I'm sure that could be added to XP and it would fully satisfy me. God damn
the fucking control panel on Win 7 got so much worse, it's so fucking hard to find any advanced settings any more,
and everything is split into a bajillion different pages. The typical thing I want to do when I get a new machine is
to turn off all skins, turn off all graphics and backgrounds, turn off all cleartype, animations, special effects, etc;
on Win 7 it takes me an hour of fumbling around in a ton of different menus to find all that stuff. There should be
one single "display settings" page which has *everything* you can change about display settings all together on one page.
So far the main thing I've learned is that prior to this I knew nothing about carpentry. I've made some
little things in the past, but it was always just the totally uneducated "hey I'll stick some wood together
and nail and screw it semi-randomly".
In reality the way modern American carpentry works is very systematic; the 2x4 stud and the 1x2 furring strip
are not like multi-purpose legos that you can just fit together however you want; you use them in specific ways
(and in fact the "stud" is not just a piece of wood, it is specifically engineered to be very strong in
compression to bear the loads of stick framing, and not necessarily strong in other ways). You use specific
sizes of nails for each specific task.
Some notes :
Garden beds are four 4x4 cedar with through bolts. I'm pretty happy with them in general but if I were to do
it again I would not do the "high raised bed" style and instead just use two 4x4 posts (which gives you about a 7"
raised bed, since a 4x4 is actually 3.5" , and you could use standard 8" carriage bolts in that case, I had to
mail-order 15" bolts (actually better than carriage bolts are "timber bolts" which are just carriage bolts with
special cutting flanges that hold the bolt better in softwoods like cedar)). Be careful when tightening bolts on
wood, you do not want them to be really tight, just sort of lightly hand tight. To make the nut stick you may need
two nuts that you tighten against each other.
Kitten loves the cat tower, and the fibrousy wallpaper stuff on it was pretty successful (she loves to climb
straight up it, rather than jumping between the platforms usually), but it is breaking down; I'm not sure if
there is any material that would be more resilient, perhaps some kind of woven bamboo mat? She can knock it
over with her crazy stripper pole moves, we currently have it stabilized by leaning in a corner of the room;
to really make it self-stable would require a really wide base or a lot of weight at the base, I think.
Potting table is standard apron-style table; it's 8 feet long so I used 2x6 apron rails. I pocket-holed the
table top boards together, which I would not do again. If I made another table top I would probably just
try to glue it (I don't have the clamps for that kind of glue-up at the moment);
with an apron-style table the top is not really load-bearing anyway, the apron is the crucial
structure, so the strength of the top joinery is not important. One thing you might not think of is the
attachment of the top to the apron needs the ability to move side to side (but not up and down), so drill the
screw shank holes oversize. (apron-style tables are super duper easy to make)
I only made the vertical trellis, the top part was already there, but with no way for a vine to reach it.
One of my house pet peeves is when people have a trellis with nothing on it, so I had to do something about
that. (one of the more retarded fads in house style was the "trellis top" fences that everyone was getting
5 years ago; hardly any of them actually have anything growing on them so there're all these stupid looking
empty trelli in the fancy neighborhoods of Seattle). The only tricky thing about that trellis was trying to
do the anchors to the house correctly. Piercing the skin of a house is almost always a bad idea and has to
be done with care to avoid creating a water-incursion point. I used through-bolts to a block inside the
sheathing. The bolts are angled down to reduce capillary action, and hopefully the caulk does the rest; the
holes for the bolts must be over-size to avoid cracking the siding with seasonal flex. One mistake I made
was running the bolts from the outside to the inside; I should have done it the opposite way, which would
have let me install the block and bolts, and caulk and paint that assembly and let it fully dry before
attaching the trellis.
Chicken coop - urr various things I would probably do differently here. It's over-engineered in some ways
(more structurally sound than necessary for the weight of chickens), but also missing some important features,
like a heat lamp (addendum : heat lamp was easily added via extension cord) and
some easier access points. It currently has an egg retrieval door, a chicken ramp door,
and a removable wall and roof for cleaning access, but I think I have to add another door for human play time
access. The run is also small compared to the coop, I may have to make a bigger run some day. Maybe the main
annoyance at the moment is you have to get inside to open and close the chicken ramp door; when I made it I
thought we would basically just leave that door open all the time, but we've been closing it to help them stay
warm at night, and it's a pain to get in to open and close it.
Most of my funky glazing experiments are failures, but notes on a few that I think worked okay :
Wacky layering of shino is great. You can also put it on super thick then scrape away some to create variation.
An even more subtle way to get variation is to paint the bisque with irregular strokes of water before dipping
in the glaze, because water changes the glaze adherence rate.
Rutile or iron-ox over glaze makes subtle/natural looking variations.
"Peacock eyes" on the outside of the red bowl was pretty successful. This was done by first doing the blue
glaze, waxing over the eye shapes, then dipping all in red. You could use the same technique for things like
clean-edge bowl rim bands.
Carved ridges on outside of small bowl does nice stuff for tenmoku.
(aside : I believe the proper way to build a Hawaii house (especially a vacation house that you aren't living in
all year) is pretty much a roof with no walls. The weather is so good year round that you basically never need
protection at all. I think the correct construction method is Japanese style, that is posts & beams that support
the roof, and then any walls you may have (perhaps wood louvers) are not structural. The vast majority of houses
in Hawaii are American style "stick framing" where the walls carry the roof, and they have windows that seal and
air conditioning and all that vulgar tacky suburban shit that you don't need).
The pictures :
I wrote a little program to make randomized two-column image layouts. Spring is just starting; we've got lots
of bulbs coming up, a few blooming, and the flowering trees are just starting. I think in a few weeks the real
spring bloom will kick off and Seattle will be at its most beautiful.
Anyhoo, here's a tour of early spring around our house (featuring special guest star : the crazed kitten) :
Are you just sitting there trying to let gravity pull the poop out? You know you have to squeeze, right?
There are these very simple acts of daily life that we never get taught, and sometimes people somehow
fail to grasp how to get them right. For example, up until quite late in my life, maybe high school or
so, I didn't know how to drink liquid correctly. I would just pour the liquid into my mouth as if my mouth was
a glass, and then I would close my lips and swallow; the result was that I frequently swallowed air with
liquid and drinking anything would give me burps. Some time in high school it finally clicked that you
have to actually suck as you drink and form a vacuum so that a stream runs down your throat, you don't
just swallow a parcel of it. So maybe you long-sitters just don't know how to poop.
Whenever I go to someone's house and see a bunch of magazines and books in the bathroom I'm a bit freaked out
and then take extra care not to touch anything in their bathroom. There should not be paper or any other
non-waterproof product in a bathroom; ideally a toilet room should have a drain on the floor and tile walls
so you can just hose the whole thing down.
The only time I've felt the desire to hang out in the bathroom for longer than the bare minimum of time was
once when I had house guests staying over, and I was getting quite fed up with them (not that there was anything
wrong with them per se, but after a few hours of entertaining people I'm ready to lobotomize myself). I randomly
needed a poo and had a sit down and was going to do my usual one minute express, but I suddenly realized, whoah
this is nice, I'm alone, nobody is trying to talk to me, I can just chill, and I sat there a few minutes and
relaxed. So maybe you toilet campers are doing this all the time, hiding out?
I do think I have seen that behavior in married men. You can see in certain married men (especially with kids)
a desperation to get some alone time. If you mention some group activity that their wife wouldn't want to do
they instantly are like "oh yes please I'd like to do that". I see them at our local neighborhood pizza place
waiting for take out; the desperate married men always show up early and just hang out in the bar for a while.
In other random bathroom ranting, I think that America has the worst designed bathrooms, perhaps in the whole
world, certainly in the civilized world. The bathroom is not an appropriate place for carpet. The toilet
should not be in the same room as your shower/bathtub and makeup/medicine cabinet, the toilet should be in a
little closet all by itself. Almost every other country in the world gets this, but for some reason Americans
just love to hang out in a room full of shit and piss.
I think the 80's were probably the pinnacle of terrible bathroom ideas in America, when people were actually
putting plush covers on toilet seats (not even just the top part, but the actual horseshoe sitting part).
Here's an example of the retarded type of bathroom you see in America :
Actually that one is not even close to the worst because it actually has tile on the walls, a lot (probably a majority)
of American baths have drywall walls which is super fucking retarded. I know, let's put some porous, absorbent, non-waterproof material in the
bathroom! good idea!
Even aside from the specific horrible design choices, the whole trend of spending a bunch of money to make your
bathroom fancy is so fucking retarded. Of course most home improvement that the suburban wives do is not
actually connected to functionality or even their own taste in any way, it's entirely based on what is
popular at the time, and a kind of competition. Everyone is just so fucking disgustingly mindless, nobody has their
own ideas, they don't think for themselves and do what they actually want; you can watch the builders around town and they all wind up doing
the same thing, and then the fad changes and they all do the new thing; like for a while recently the big fad was
chef's kitchens; it doesn't matter that the family basically never cooks and has no idea how to use a knife, yeah we
need the fanciest chef kitchen; right now around Seattle I'm watching horizontal board fences go up at a blistering
pace, it's suddenly become a mass fad and everybody needs a fucking horizontal plank fence, which will look tacky and
dated in a few years (and actually usually already looks bad, because it looks fucking awful to put one modern-style
feature on an old house that doesn't match it at all).
This hard work for someone else principle is huge in domestic bliss, from the smallest level to the overall.
On the small scale, even tiny things like doing dishes are so much better if you feel like the family
appreciates it, if they acknowledge and are kind about your contribution to the group. Of course children
and husbands are consistently dickish about this, they don't want to give the respect and status elevation
associated with gratitude.
Of course if you are a good person you should gladly let the other person have the pleasure of doing something
for you. The tricky thing is knowing when they want to do it, and when they are just offering because they
think you want it. When someone offers to do something for you, and you don't really particularly care,
it's super dicky to just say so. One of the great social dysfunctions of the nerdy computer guy is being too
literal in conversation. You may think you are being clear by listening and responding to the content of the
words, but in fact you are being a huge dick by not picking up what they really mean.
(in fact, often people
use this as a way of intentionally being a dick; for example when someone is clearly trying to tell you
something but they just can't say it directly, and you respond with "what are you saying" or "if you want
something just tell me" or whatever, it's not their fault for failing to spit it out, it's your fault for failing
to pick up the very obvious message that they aren't saying.)
Back to the case in point, when someone asks you "hey I was thinking of doing this thing for you which is quite
a lot of work for me, would you like me to do it?" you are now faced with a puzzle (assuming you don't really care
about the actual work product very much). Are they just offering for your sake (and would be happy if you said
no, don't do it), or are they offering because they want to do it, and they want to do it for someone. eg. when
your auntie offers to bake you cookies it's actually because she wants to bake cookies for you, it's not about
what you want. In some cases it's tricky because it's hard to tell what's actually going on with their intention.
You also can't just ask, of course, like "are you offering for my benefit, or do you just want to do that?"
because people are not so self-aware, nor are they so open in their admissions.
Even once you are sophisticated enough to navigate that minefield, the next step is equally treacherous, when
you get to the details of the act. Depending on whether you think the person was offering for your benefit or
their benefit, you have to tailor your requests to what *they* want to do, even though they will ask you what
*you* want to do. eg. when auntie offers to make you cookies, she will ask what kind; this case is easy because
you know it's for her benefit, so you should choose a type of cookie that she likes to make; in some cases she
has a "famous" variety (it's pretty fucking hilarious how many American secret family recipes are actually
just the Betty Crocker recipe or some shit like that).
As another example, if I offer to throw a pot for someone, it's not because I think they really want me to, it's
because I want to throw a pot, and it's nice to do it with someone in mind. Your only task in the job is to
act grateful. You have about 30 seconds of work to do, if you aren't actually grateful you have to put on a smile
and say thanks; it's just outrageously dickish if you fail to do that 30 seconds of work for me. (the same is
true of most present receiving of course). Furthermore, if I offer to throw you a pot and you know I make sort of
rustic ceramics don't say "what I really like are the cast art deco ceramics" ; you fucking asshole, if the
conversation was entirely literal that would be a fine response, but of course it's not; the subtextual conversation
went something like "I want to make a pot, can I pretend it's for you?" and you responded with "you can make it,
but I won't let you pretend that I'm happy about it, and I'm going to belittle your work first". That example
may be overly obvious, but a lot of the time people tell themselves they are being constructive or helpful by
trying to push you with weird requests, or by pointing out ways your stuff could be better; that in fact is not
helpful and is probably even more dicky (and in fact I believe they subconsciously know they are being dicky and
it's sort of intentional as some sort of dominance bullshit).
Okay. So in real life you can never be completely sure what the subtext is. You can only make a probability
estimate; you aren't entirely sure what the other person's intentions were. Then you have to do an analysis
like a poker player. Human conversation is a game of imperfect information. Let's outline the game in this case :
Anyhoo, aside from the sort of micro, conversational cases like this, this principle of "doing it for myself but
pretending to do it for you" is one of the primary macro forces in relationships.
The most obvious example is the stereotypical male/female family roles. The man goes off to work and puts in
lots of hours and grinds himself down "for her", and the woman stays home and cleans and raises the kids and
dreams up home improvements "for him". Of course in reality, the other person typically doesn't actually
want that. eg. she would be happier if he didn't work so hard and come home so tired and grumpy, and she would
be just fine if they made a bit less money. He in fact is not doing it for her, but because he wants the
challenge, or wants the good job, or thinks he's supposed to for some reason or whatever. But he can be
much happier if he is allowed to pretend that it is "for her". In a healthy relationship, she would allow him
to have that illusion and support him to some extent (though in a truly healthy relationship you should
also be able to have the conversation about "hey you're working too hard and it's not actually for the
family's benefit" if it gets out of hand).
(there's an obvious sort of "Gift of the Magi" in the old 50's stereotype family roles; the man works his ass
off "for the family" (not really), which just alienates him from his wife, she wishes he would come home with
more energy for her, while the wife cooks and cleans
and pretties up the house, which just annoys the man, he wishes she would just relax and leave well enough alone).
This macro feeling that what you're doing is "for the family" shouldn't really be in your mind every day
(if it is, you might be one of those narcissistic psychopaths who constantly talks about how they would do
anything for their family, including fucking over anyone outside their family, etc. (I recall reading about
the Costa Concordia accident where some douchebag was proud to say that he had punched his way to a lifeboat
to secure a seat for his child; it's sort of weird that these dangerous psychopaths are admired by a decent
chunk of society (sort of like the surreal psychological dissonance I feel when I see admiration for
someone who is just so obviously overtly evil like Dick Cheney or Rupert Murdoch))) - but it is a nice
undertone to your whole life. Once a week you have a moment where you think "fuck, why am I doing all this?
I'm so tired, maybe I should just run away to Asia" and in that moment you can think "it's for my family"
and it makes you feel better. It's incredibly dickish (and a failed relationship) if the other person
doesn't allow you to feel like you are making a contribution to the unit.
(it would be nice if there was syntax coloring for deeply parenthesized writing, I might do it more)
When one party complains about their workload in the partnership, often what it really means is that they
aren't getting enough gratitude. In our stereotypical sexist couple, when the man comes home and complains
about how hard work is, the correct response is not "well maybe you should work less, I don't want you to
do this, don't pretend it's for me". When the stereotypical woman says "I'm sick of doing the dishes all
the time" or whatever, there is some aspect of literalness in that, but the more important subtext is
"I'm not receiving
sufficient gratitude (or sufficient cohesion) to have the pleasure of feeling like I'm doing this work for our
family unit".
Semi-related topic : a lot of the work that people consider "hard" is not actually. eg. things like
being in the Army, or raising a child. They may be difficult, they may be grueling, they may be exhausting;
but if you feel like
your work is for a reason, if it's very clear to you what you are supposed to do, and you just go do it,
and then you feel like it was for good - that's easy. That's the easiest fucking thing in the world.
It doesn't matter how tiring the actual work activity is. When someone hands you a todo list and says do
this then this then this, that's so peaceful and relaxing and easy. When you wake up in the morning and
you know your todo list for the day (feed the kids, take em to school), that's easy.
The truly difficult thing is doing work that is only for yourself, that you're not really sure if it's
a good idea or not, and you're not really sure if you're doing it the right way. That is fucking horrible
and exhausting. Every few minutes you have to stop and think "wait, why am I doing this? maybe I should
stop".
A lot of people get themselves through life by making up goals that they "have to" do. I'm not really
sure how self-aware the average person is (for example, do bird watchers know that bird watching is fucking
retarded? it is nice to get outside, and it's nice to have a reason to get outside, and it's nice to make up
an excuse about why you "have to" go to the jungles of New Guinea, but doing it to see some bird is fucking
ridiculous, right? do bird watchers actually realize why they are doing it? and how arrogant am I that I think
I know more about other people's real motivations than they do?) - but clearly people do this to themselves.
Like I "have to" lose weight for my bikini vacation, or I have to do this kitchen remodel; it's like they
create this task for themselves, and then for a while they act like it was an order given from on high that
they cannot question, and that makes life easier because you just do that task and stop facing the
crippling void of self-determination.
(another aside on bird-watching : in general I think the people who get obsessed with something minor
which obviously is not deserving of that obsession are very silly; like triathletes, or hikers around
here who try to "bag all the peaks", or people who travel around the world for one specific weird thing
like birding or whatever. But in the end it seems to work for them in terms of quality of life. That is,
I think it would be more reasonable if instead of training for your triathlon (which will often be
miserable exercise and is not particularly good for your body), you instead just did some pleasant
exercise that was actually more useful. The problem is that if you are not a wacko over-committed person
who has this invented "has to", you just won't do it. Like I'm sure I would be happier if I went on a hike
every week, but since it's just up to me, and I'm doing it only because I choose to, I wind up not doing it
very often; on the other hand the goof-ball who decided he "has to" bag every peak in one year does do the
hikes, and while that specific goal is retarded, it does force him to get out there. Obviously the
standard "everything in moderation" is sound in theory, but in practice it just doesn't work, because if
you are reasonable and moderate, it's hard to get out there).
(another example to beat this point to a pulp : it's easy to make fun of the "freds" on bicycles who
track their heart rate and buy the fanciest bike shit, and generally over-obsess and make up goals for
no reason; it would be nice if you could be more reasonable, you get just as much exercise on a slower
bike, you don't need to fucking test your blood sugar if you are a recreational rider. It's very easy
and trite for people like Bike Snob or Grant Petersen or Me to make fun of the freds and say that you
should just "go ride". (as another example, obviously doing something like "bike to work month" is silly;
if you want to bike to work, you can make that decision for yourself each day of the year, there's no
reason to follow some organization's choice of month). However, if you are just reasonable and moderate
and make your own decisions, you simply won't do it. It's too hard to make the right decision all the time,
you won't have the motivation, you'll get lazy. In the end when you see the people who are out there
riding at 6 AM in the rain, it's the nut-jobs, it's the people who invented some silly "have to", and in
the end they are happier for it)
(of course the best form of made up "have to" is one that's at least semi-useful; like I "have to study hard
and get into a good grad school"; having that goal can make the work fun and rewarding for a while; of course
you eventually get disillusioned with that goal and realize it was perhaps all pointless; but the correct
response is not to just give up on goals, but rather to invent a new one. (sometimes I try to get myself
excited about some kind of made up goal, like I "have to" learn to fly an airplane, or I have to take a lot
of artistic photographs, or I have to write some music, but it's hard for me to sustain the mental illusion)).
Anyhoo, the relation is that some people use "do it for the family" as one of these made up goals to
make their life easier. Like I "have to" get an SUV so I can drive all these kids around to their soccer
and piano lessons and so on. Well, no you actually don't have to, but if it makes your life easier to
pretend that you have to, then okay. This form of "do it for the family" is NOT what I was talking about
in the main part of the post, but it is semi-related.
01-18-11 - Hadamard
06/2011 to 04/2012
I could just have stuff
template <typename t1,typename t2,typename t3,typename t4>
struct stuff
{
t1 m1;
t2 m2;
t3 m3;
t4 m4;
};
09-24-12 | LZ String Matcher Decision Tree
Revisiting this to clarify a bit on the question of "I want to do X , which string matcher should I use?"
cbloom rants 09-23-11 - Morphing Matching Chain
cbloom rants 09-24-11 - Suffix Tries 1
cbloom rants 09-24-11 - Suffix Tries 2
cbloom rants 09-25-11 - More on LZ String Matching
cbloom rants 09-26-11 - Tiny Suffix Note
cbloom rants 09-27-11 - String Match Stress Test
cbloom rants 09-28-11 - Algorithm - Next Index with Lower Value
cbloom rants 09-28-11 - String Matching with Suffix Arrays
cbloom rants 09-29-11 - Suffix Tries 3 - On Follows with Path Compression
cbloom rants 09-30-11 - Don't use memset to zero
cbloom rants 09-30-11 - String Match Results Part 1
cbloom rants 09-30-11 - String Match Results Part 2
cbloom rants 09-30-11 - String Match Results Part 2b
cbloom rants 09-30-11 - String Match Results Part 3
cbloom rants 09-30-11 - String Match Results Part 4
cbloom rants 09-30-11 - String Match Results Part 5 + Conclusion
cbloom rants 10-01-11 - More Reliable Timing on Windows
cbloom rants 10-01-11 - String Match Results Part 6
cbloom rants 10-02-11 - How to walk binary interval tree
cbloom rants 10-03-11 - Amortized Hashing
cbloom rants 10-18-11 - StringMatchTest - Hash 1b
cbloom rants 11-02-11 - StringMatchTest Release
09-23-12 | Patches and Deltas
A while ago Jon posted a lament about how bad Steam's patches are. Making small patches seems like something
nice for Oodle to do, so I had a look into what the state of the art is for patches/deltas.
ngramhashing - Rolling Hash C++ Library - Google Project Hosting
A new 900GB compression target
Patchdelta compression
Remote diff utility
SREP huge-dictionary LZ77 preprocessor
Long Range ZIP – Freecode
About Remote Differential Compression (Windows)
There is a Better Way. Instead of using fixed sized blocks, use variable sized b... Hacker News
bsdiff windows
ZIDRAV Free Communications software downloads at SourceForge.net
Binary diff (bsdiff)
Data deduplication (exdupe)
xdelta
Tridge (rsync)
09-22-12 | Oodle Beta and Roadmap
Oodle went Beta a few weeks ago (yay). If you're a game developer interested in Oodle you can mail oodle at rad.
09-22-12 | Input Streams and Focus Changes
Clearly apps should have an input/render thread which takes input and immediately responds to simple
actions even when the app is busy doing processing.
09-15-12 | Some compression comparison charts
These charts show the time to load + decompress a compressed file using various compressors.
09-14-12 | Things Most Compressors Leave On the Table
It's very appealing to write a "pure" algorithmic compressor which just implements PPM or LZP or whatever
in a very data agnostic way and make it quite minimal. But if you do that, you are generally leaving a lot
on the table.
09-13-12 | LZNib
LZNib is the straightforward/trivial way to do an LZ77 coder using
EncodeMod for variable length numbers and 4-bit nibble aligned IO.
That is, literals are always 8 bit; the control word is 4 bits and signals either a literal run len or a match length,
using a
range division value instead of
a flag bit.
name raw config_divider_lrl=4 config_divider_lrl=5 config_divider_lrl=6 config_divider_lrl=7 config_divider_lrl=8 config_divider_lrl=9 config_divider_lrl=10 config_divider_lrl=11 config_divider_lrl=12 best
lzt00 16914 5639 5638 5632 5636 5654 5671 5696 5728 5771 5632
lzt01 200000 199360 199354 199348 199345 199339 199333 199324 199319 199314 199314
lzt02 755121 244328 243844 250836 255146 255177 255257 257754 260107 260597 243844
lzt03 3471552 1746220 1744630 1743728 1743043 1742718 1742728 1743191 1744496 1746915 1742718
lzt04 48649 13932 13939 13968 14015 14120 14184 14268 14319 14507 13932
lzt05 927796 422058 421115 420746 420592 418289 418200 418639 418854 418082 418082
lzt06 563160 414925 414080 412748 412223 409673 408884 408361 408435 407393 407393
lzt07 500000 237756 237318 237004 236910 236771 236949 237381 238091 239176 236771
lzt08 355400 309397 309490 308579 307706 307263 306418 305689 305495 305793 305495
lzt09 786488 302834 303018 303773 304350 305222 307405 308888 310649 314647 302834
lzt10 154624 11799 11785 11792 11800 11821 11843 11866 11885 11923 11785
lzt11 58524 22420 22341 22249 22288 22276 22322 22370 22561 22581 22249
lzt12 164423 28901 28974 28900 28957 29053 29122 29296 29381 29545 28900
lzt13 1041576 1072275 1068614 1058545 1047273 1025641 1025616 1025520 1025404 1024891 1024891
lzt14 102400 52010 51755 51595 51462 51379 51314 51298 51302 51341 51298
lzt15 34664 11846 11795 11767 11760 11740 11740 11756 11831 11837 11740
lzt16 21504 11056 11000 10961 10934 10911 10904 10893 10883 10892 10883
lzt17 53161 20122 20119 20152 20210 20288 20424 20601 20834 21091 20119
lzt18 102400 77317 77307 77274 77045 77037 77020 77006 76964 76976 76964
lzt19 768771 306499 307120 308138 309635 311801 314857 318983 323981 329683 306499
lzt20 1179702 975546 974447 973507 972326 971521 972060 971614 971569 985009 971521
lzt21 679936 99059 99182 99385 99673 100013 100492 101018 101652 102387 99059
lzt22 400000 334796 334533 334357 334027 333860 333733 333543 333501 337864 333501
lzt23 1048576 1029556 1026539 1023978 1021833 1019900 1018124 1016552 1015139 1013815 1013815
lzt24 3471552 1711694 1710524 1708577 1706969 1696663 1695663 1694205 1692996 1688324 1688324
lzt25 1029744 224428 224423 224306 229365 229362 229368 229603 227083 227546 224306
lzt26 262144 240106 239633 239200 238864 238538 238232 237960 237738 237571 237571
lzt27 857241 323147 323098 323274 323133 322050 322068 322799 322182 322573 322050
lzt28 1591760 343555 345586 348549 350601 352455 354077 356025 360583 364438 343555
lzt29 3953035 1445657 1442589 1440996 1440794 1437132 1437593 1440565 1442614 1442914 1437132
lzt30 100000 100668 100660 100656 100655 100653 100651 100651 100643 100643 100643
total 24700817 12338906 12324450 12314520 12308570 12268320 12272252 12283315 12296219 12326039 12212820
raw minilzo snappy quicklz 3 lz4 hc lzopack -9 ULZ c6 crush cx zlib lz4p 332 lznib div8
lzt00 16914 7195 7254 6082 6473 5805 6472 5487 4896 6068 5654
lzt01 200000 198906 198222 200009 198900 198934 199680 222477 198199 198880 199339
lzt02 755121 567421 552599 334625 410695 426187 312889 258902 386203 292427 255177
lzt03 3471552 2456002 2399663 2036985 1820761 1854718 1835797 1938138 1789728 1795951 1742718
lzt04 48649 20602 21170 16894 16709 14858 17359 13869 11903 15584 14120
lzt05 927796 590072 554964 480848 460889 453608 464602 444945 422484 440742 418289
lzt06 563160 536084 516818 563169 493055 490308 428137 432989 446533 419768 409673
lzt07 500000 297306 298207 268255 265688 242029 268271 245662 229426 248500 236771
lzt08 355400 356248 351850 317102 331454 314918 337423 315688 277666 322959 307263
lzt09 786488 460896 472498 372682 344792 345561 329608 304048 325921 325124 305222
lzt10 154624 21355 25960 17249 15139 14013 16238 12540 12577 13299 11821
lzt11 58524 28153 29121 25626 25832 23717 26720 23279 21637 23870 22276
lzt12 164423 52725 53515 38745 33666 31574 36077 29601 27583 30864 29053
lzt13 1041576 1045665 1041684 1041585 1042749 1041633 1048598 1061423 969636 1040033 1025641
lzt14 102400 61394 63638 55124 56525 52102 58509 51491 48155 53395 51379
lzt15 34664 16026 15417 13626 14062 12663 14016 12470 11464 12723 11740
lzt16 21504 12858 13165 11646 12349 11203 12554 11119 10311 11392 10911
lzt17 53161 28075 29415 23478 23141 20979 23829 19877 18518 22028 20288
lzt18 102400 100499 97931 81100 85659 74268 89973 76858 68392 79138 77037
lzt19 768771 495312 524686 411916 363217 360558 333732 302006 312257 335912 311801
lzt20 1179702 1181855 1161896 1037098 1045179 1042190 1013392 952329 952365 993442 971521
lzt21 679936 240528 240244 188446 194075 174892 125322 103608 148267 113461 100013
lzt22 400000 401446 397959 338837 361733 354449 355978 321598 309569 348347 333860
lzt23 1048576 1052692 1048694 1048585 1040701 1030737 1047609 985814 777633 1035197 1019900
lzt24 3471552 2668424 2613405 2425865 2369885 2521469 2040395 2080506 2289316 1934129 1696663
lzt25 1029744 361885 425735 351577 324190 326180 477377 297974 210363 332747 229362
lzt26 262144 261152 259327 244555 246465 242343 252297 237162 222808 244990 238538
lzt27 857241 460703 466747 435522 430350 395926 392139 335655 333120 353497 322050
lzt28 1591760 578289 617498 453170 445806 421804 401166 349753 335243 388712 352455
lzt29 3953035 2570903 2535625 2259281 2235299 2052597 2227835 2013763 1805289 1519904 1437132
lzt30 100000 100397 100015 100009 100394 100033 100782 112373 100020 100393 100653
total 24700817 17231068 17134922 15199691 14815832 14652256 14294776 13573404 13077482 13053476 12268320
09-11-12 | LZ MinMatchLen and Parse Strategies
I've been thinking about this and want to get my thoughts down while they're (somewhat) clear.
tweak outer params :
for each parse strategy S
for each byte B
use S on B
"multi-parse" :
for each byte B
for each parse strategy S
use S on B
this is a big difference, because a lot of work can be shared. In particular, the string matcher only needs to be run once for all
the parse strategies. Also in many cases their work is redundant; eg. if you don't find any matches then all strategies must output a literal.
09-10-12 | LZ4 - Large Window
Continuing from :
LZ4 Optimal Parse
(also see the
Encoding Values in Bytes series).
LZ4 variants are named like :
# bits of LRL - # bits of ML - # bits of offset : how offset is encoded
we have :
4-4-0 : 16 = "LZ4 classic" , always a 16 bit offset
4-4-0 : 15/23 = one bit of 16 bit offset is reserved to indicate a 3 byte offset (so windows are 1<<15 , 1<<23)
4-4-0 : encodemodWB = use encodemod for offset; send word first then bytes
3-4-1 : 16/24 = 3 bits of LRL, 1 bit in control flags 2 or 3 byte offset
(4-3-1 is strictly worse than 3-4-1)
3-3-2 : encodemodB = 2 bottom bits of offset go in the control ; remainder sent with encodemod bytes
this is the first variant that can send an offset in only 1 byte
3-3-2-B MML3-4 : same as above, but 1 byte offsets get a min match len of 3 (instead of 4)
And the optimal parse compressed sizes are :
raw 4-4-0 : 16 4-4-0 : 15/23 4-4-0 : encodemodWB 3-4-1 : 16/24 3-3-2 : encodemodB 3-3-2-B MML3-4
lzt00 16914 6444 6444 6444 6551 6186 6068
lzt01 200000 198900 198900 198900 198905 198893 198880
lzt02 755121 410549 321448 314152 307669 315101 292427
lzt03 3471552 1815464 1804312 1804086 1807200 1799418 1795951
lzt04 48649 16461 16463 16461 16564 15938 15584
lzt05 927796 459700 457261 454931 460191 445986 440742
lzt06 563160 492938 429336 428734 431119 429374 419768
lzt07 500000 264112 264594 263954 266882 257550 248500
lzt08 355400 330793 330524 328740 336329 334284 322959
lzt09 786488 340317 327145 323145 321273 326352 325124
lzt10 154624 14845 14706 14627 14687 13714 13299
lzt11 58524 25749 25755 25750 25943 24555 23870
lzt12 164423 32485 32770 32470 32542 31272 30864
lzt13 1041576 1042749 1043586 1042984 1042836 1042747 1040033
lzt14 102400 56478 56734 56535 57516 55182 53395
lzt15 34664 13995 13996 13995 14102 13050 12723
lzt16 21504 12340 12340 12340 12517 11847 11392
lzt17 53161 23025 23167 23025 23163 22374 22028
lzt18 102400 85614 87190 85929 86374 86197 79138
lzt19 768771 359276 345974 339273 334512 337204 335912
lzt20 1179702 1043192 1011629 1004412 1004435 1002099 993442
lzt21 679936 192411 120808 120704 121908 115289 113461
lzt22 400000 361524 356885 353333 353676 353120 348347
lzt23 1048576 1040623 1038648 1034493 1039073 1038802 1035197
lzt24 3471552 2369040 1911004 1907645 1929223 1931560 1934129
lzt25 1029744 324107 324281 323513 323032 332437 332747
lzt26 262144 246334 248360 246587 247177 246667 244990
lzt27 857241 425694 386493 386056 387358 358184 353497
lzt28 1591760 437666 399105 393814 390517 390421 388712
lzt29 3953035 2230095 1563410 1554583 1553331 1537093 1519904
lzt30 100000 100394 100394 100394 100394 100394 100393
total 24700817 14773314 13273662 13212009 13246999 13173290 13053476
raw 4-4-0 : 16 4-4-0 : 15/23 4-4-0 : encodemodWB 3-4-1 : 16/24 3-3-2 : encodemodB 3-3-2-B MML3-4
09-09-12 | A Simple Tight-Packed Array
Trivial snippet for a tight-packed array with bit mask indicating which elements exist.
struct PackedArray
{
uint32 mask; // numItems = num_bits_set(mask)
int32 items[1]; // variable allocation size
};
static inline uint32 num_bits_set( uint32 v )
{
//return _mm_popcnt_u32(v);
// from "Bit Twiddling Hacks" :
v = v - ((v >> 1) & 0x55555555); // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
uint32 c = (((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24; // count
return c;
}
bool PackedArray_Get(const PackedArray * a,int index,int32 * pgot)
{
ASSERT( index >= 0 && index < 32 );
uint32 mask = 1UL << index;
if ( a->mask & mask )
{
// it exists, find it
uint32 numPreceding = num_bits_set( a->mask & (mask-1) );
*pgot = a->items[numPreceding];
return true;
}
else
{
return false;
}
}
bool PackedArray_Put(PackedArray * a,int index,int32 item)
{
ASSERT( index >= 0 && index < 32 );
uint32 mask = 1UL << index;
uint32 numPreceding = num_bits_set( a->mask & (mask-1) );
if ( a->mask & mask )
{
// it exists, replace
a->items[numPreceding] = item;
return true;
}
else
{
// have to add it
// realloc items here or whatever your scheme is
// make room :
uint32 numFollowing = num_bits_set(a->mask) - numPreceding;
// slide up followers :
int32 * pitem = a->items + numPreceding;
memmove(pitem+1,pitem,numFollowing*sizeof(int32));
// put me in
*pitem = item;
a->mask |= mask;
return false;
}
}
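For reference, here is a minimal sketch of the same idea where std::vector does the "realloc items here" step. PackedVec, PopCount, PackedVec_Get and PackedVec_Put are made-up names for this example, not part of the snippet above, and the simple loop popcount is just a portable stand-in.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical variant of PackedArray that lets std::vector handle the
// "realloc items here" step; the index logic is the same as above.
struct PackedVec
{
    uint32_t mask = 0;          // bit i set => logical slot i exists
    std::vector<int32_t> items; // items.size() == number of bits set in mask
};

// Portable popcount (clears the lowest set bit each iteration).
static inline uint32_t PopCount(uint32_t v)
{
    uint32_t c = 0;
    for (; v; v &= v - 1) c++;
    return c;
}

// Returns true and fills *pgot if slot 'index' (0..31) exists.
bool PackedVec_Get(const PackedVec & a, int index, int32_t * pgot)
{
    const uint32_t bit = UINT32_C(1) << index;
    if (!(a.mask & bit)) return false;
    *pgot = a.items[PopCount(a.mask & (bit - 1))];
    return true;
}

// Returns true if an existing item was replaced, false if it was added.
bool PackedVec_Put(PackedVec & a, int index, int32_t item)
{
    const uint32_t bit = UINT32_C(1) << index;
    const uint32_t numPreceding = PopCount(a.mask & (bit - 1));
    if (a.mask & bit)
    {
        a.items[numPreceding] = item;
        return true;
    }
    a.items.insert(a.items.begin() + numPreceding, item); // slides followers up
    a.mask |= bit;
    return false;
}
```

The vector costs an extra pointer and capacity field per array, so this is only for illustration; the point of the original layout is that the header is a single 32-bit mask.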
09-08-12 | p4util
p4util is a replacement for p4.exe.
p4 edit c:\rad\stuff.c
clearly you should use the P4CONFIG from c:\rad to do that, not the one from whatever cwd I happen to be in. So p4util can (optionally)
do that.
cddesubst.bat :
@makedesubst
@s:\desubst.bat
p4.bat :
@echo off
pushd
call cddesubst
"C:\Program Files\Perforce\p4.exe" %*
popd
So that worked for the problem of your cwd being on a substed drive, but didn't fix any other problems,
so I went for the full p4util solution.
{
HANDLE f = CreateFile(GetCurDir().CStr(),
FILE_READ_ATTRIBUTES | STANDARD_RIGHTS_READ
,FILE_SHARE_READ,0,OPEN_EXISTING,FILE_FLAG_BACKUP_SEMANTICS,0);
if ( f == INVALID_HANDLE_VALUE )
{
lprintf("ERROR : CreateFile\n");
return 10;
}
char buffer[1024];
DWORD bufSize = ARRAY_SIZE(buffer);
DWORD ret = CALL_KERNEL32(GetFinalPathNameByHandleA)(f,buffer,bufSize,0);
CloseHandle(f);
if ( ret == 0 || ret >= bufSize ) // 0 means the call failed; >= bufSize means the buffer was too small
{
lprintf("ERROR : GetFinalPathNameByHandleA\n");
return 10;
}
char * pnorm = buffer;
stripresameadvance(&pnorm,"\\\\?\\");
String out = StringPrintf("cdd %s\n",pnorm);
bool ok2 = WriteWholeFile("s:\\desubst.bat",out.CStr(),out.Length());
if ( ! ok2 )
{
lprintf("ERROR : WriteWholeFile\n");
return 10;
}
return 0;
}
which uses some cblib stuff, but replacing it with non-cb stuff should be obvious.
09-06-12 | Quick WebP lossless examination
Just had a quick run through the WebP-lossless spec. What I see :
09-05-12 | Make it square!
God damn mattress fitted sheet is on sideways again.
Simmons Beautyrest- The Great Depression - Simmons Beautyrest Mattress - Epinions.com
Consumer complaints about Simmons Mattresses
Simmons Beautyrest Mattresses - All Types Reviews – Viewpoints.com
09-04-12 | Encoding Values in Bytes Part 4
Just some random addenda. The main post series is here :
cbloom rants 09-02-12 - Encoding Values in Bytes Part 2
cbloom rants 09-02-12 - Encoding Values in Bytes Part 3
1 + 7 bits
10 + 14 bits
110 + 21 bits
1110 + 28 bits
If the max you send is 4 bytes then we're actually wasting a bit on the last code; it should just be 111 + 29 bits :
1 + 7 bits
10 + 14 bits
110 + 21 bits
111 + 29 bits
00 + 6 bits
01 + 14 bits
10 + 22 bits
11 + 30 bits
what I didn't mention before is that for a max of 4 bytes these are the *only* prefix codes. All other prefix codes are just the xor of
these or are strictly worse.
const uint32 mod128_mask[] = { (1UL<<7)-1, (1UL<<14)-1, (1UL<<21)-1, (1UL<<29)-1 };
const uint32 mod128_base[] = { 0, (1UL<<7)-1, ((1UL<<14)-1) + ((1UL<<7)-1),
((1UL<<21)-1) + ((1UL<<14)-1) + ((1UL<<7)-1) };
int DecodeMod128LEBranchless(U8PC & from)
{
uint32 dw = *( (uint32 *) from );
unsigned long index;
_BitScanForward(&index,dw | 8); // index is undefined if the scanned value is 0, so force bit 3 on
index = MIN(index,3);
int count = index+1;
int shift = MIN(count,3);
from += count;
dw >>= shift;
dw &= mod128_mask[index];
dw += mod128_base[index];
return (int) dw;
}
and the encoder is left as an exercise for the reader. (as is an endian-independent version)
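One possible answer to the exercise, as a sketch rather than the definitive version: an encoder that matches the mask/base tables above, plus a byte-at-a-time decoder that doesn't depend on endianness or on reading 4 bytes past the value. EncodeMod128LE, DecodeMod128LE and the k-prefixed table names are made up for this example.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Tables copied from the decoder above.
static const uint32_t kMod128Mask[4] = { (1u<<7)-1, (1u<<14)-1, (1u<<21)-1, (1u<<29)-1 };
static const uint32_t kMod128Base[4] = { 0, (1u<<7)-1, ((1u<<14)-1) + ((1u<<7)-1),
    ((1u<<21)-1) + ((1u<<14)-1) + ((1u<<7)-1) };

// Hypothetical encoder: picks the shortest length that fits.
// Returns false if value does not fit in 4 bytes.
bool EncodeMod128LE(std::vector<uint8_t> & out, uint32_t value)
{
    for (int index = 0; index < 4; index++)
    {
        if (value < kMod128Base[index] || value - kMod128Base[index] > kMod128Mask[index])
            continue;
        const int count = index + 1;
        const int shift = count < 3 ? count : 3;
        uint32_t dw = (value - kMod128Base[index]) << shift;
        if (index < 3) dw |= 1u << index;            // lowest set bit marks the length
        for (int i = 0; i < count; i++)
            out.push_back((uint8_t)(dw >> (8 * i))); // little endian byte order
        return true;
    }
    return false;
}

// Endian-independent decoder; reads only the bytes that belong to the value.
uint32_t DecodeMod128LE(const std::vector<uint8_t> & in, size_t & pos)
{
    // lowest set bit among bits 0..2 of the first byte selects the length
    const uint8_t first = in[pos];
    const int index = (first & 1) ? 0 : (first & 2) ? 1 : (first & 4) ? 2 : 3;
    const int count = index + 1;
    const int shift = count < 3 ? count : 3;
    uint32_t dw = 0;
    for (int i = 0; i < count; i++)
        dw |= (uint32_t)in[pos + i] << (8 * i);
    pos += count;
    return ((dw >> shift) & kMod128Mask[index]) + kMod128Base[index];
}
```

This version is branchy, so it gives up the whole point of the branchless decoder; it's only meant to pin down the bit layout.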
1,48961,4226881
for 1-3 byte offset encoding.
B-B-B-mod :
best : 251, 27, 15 : 1228288
cost per : 2.129383
you can see it has reserved a little space for tiny file sizes to go in 1 byte, then most go in 2 bytes,
and not much space is reserved for the very large rare files.
W-B-pow2
best : 13 , 4 : 1232113
cost per : 2.136096
only barely higher average cost. (13,4 are bit counts; eg. mods of 1<<13 and 1<<4)
write 16 bits if size < (1<<15) , else 32 bits, else 64 bits :
16-32-64 : cost per : 2.33855
this is not a great scheme and yet it's only slightly worse on average; clearly not a lot of win to be had
on this front.
09-04-12 | LZ4 Optimal Parse
I wrote an optimal parser for LZ4 to have as a reference point.
<------ all file positions ------>
^ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
| XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
states XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
| XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
V XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
and then you just go fill all the slots. Like LZSS (Storer-Szymanski) it's simplest to walk backwards, that way
any value in the table that you need from later positions is already computed.
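To make the backward walk concrete, here is a toy sketch. It is not the LZ4 parser: it assumes a single state per position, a literal that always costs 1 byte, any match costing a flat kMatchCost bytes, and a brute-force match finder. All names and costs here are made up for illustration.

```cpp
#include <cstdint>
#include <cstddef>
#include <string>
#include <vector>
#include <algorithm>

// Toy cost model (assumptions, not LZ4's real format).
const int kMinMatch = 4;
const int kMatchCost = 3;

// Longest match at 'pos' against any earlier position (overlap allowed).
int LongestMatch(const std::string & s, size_t pos)
{
    int best = 0;
    for (size_t from = 0; from < pos; from++)
    {
        int len = 0;
        while (pos + len < s.size() && s[from + len] == s[pos + len]) len++;
        best = std::max(best, len);
    }
    return best;
}

// cost[p] = cheapest encoding of s[p..end). Filled from the end backwards so
// that cost[p+1] and cost[p+len] are already known when p is visited.
int OptimalParseCost(const std::string & s)
{
    const size_t n = s.size();
    std::vector<int> cost(n + 1, 0);
    for (size_t p = n; p-- > 0; )
    {
        cost[p] = 1 + cost[p + 1];                      // send a literal
        const int maxLen = LongestMatch(s, p);
        for (int len = kMinMatch; len <= maxLen; len++) // shorter lengths are legal too
            cost[p] = std::min(cost[p], kMatchCost + cost[p + len]);
    }
    return cost[0];
}
```

The real table has more than one row because the cost of a literal or match depends on the state you arrive in; this sketch collapses that to one row.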
raw greedy lazy optimal lz4 -c0 lz4 -c1 lz4 -c2
lzt00 16914 6646 6529 6444 7557 6666 6489
lzt01 200000 198906 198901 198900 198922 198918 198916
lzt02 755121 412064 411107 410549 447200 412085 410705
lzt03 3471552 1827696 1821562 1815464 2024928 1827577 1820776
lzt04 48649 17521 16985 16461 21803 17540 16725
lzt05 927796 484749 462204 459700 518392 484764 460905
lzt06 563160 493815 493056 492938 508281 493838 493071
lzt07 500000 269945 266191 264112 300505 269945 265704
lzt08 355400 332459 332201 330793 335890 332476 331470
lzt09 786488 352802 346700 340317 416708 352821 344807
lzt10 154624 16517 15173 14845 21334 16537 15155
lzt11 58524 26719 26060 25749 30148 26737 25848
lzt12 164423 35168 33504 32485 60450 35187 33682
lzt13 1041576 1042758 1042750 1042749 1045665 1042774 1042765
lzt14 102400 57421 56834 56478 59832 57432 56541
lzt15 34664 14755 14055 13995 15702 14776 14078
lzt16 21504 12503 12396 12340 12908 12528 12365
lzt17 53161 24023 23450 23025 28657 24040 23157
lzt18 102400 86880 86109 85614 88745 86899 85675
lzt19 768771 381018 369110 359276 478748 381033 363233
lzt20 1179702 1051478 1047742 1043192 1073769 1051500 1045195
lzt21 679936 203405 196764 192411 244363 203410 194091
lzt22 400000 363832 362390 361524 371121 363845 361748
lzt23 1048576 1040762 1040758 1040623 1049408 1040778 1040717
lzt24 3471552 2391132 2372991 2369040 2426582 2391145 2369896
lzt25 1029744 328271 326682 324107 370278 328295 324206
lzt26 262144 246951 246876 246334 250159 246972 246481
lzt27 857241 478524 429287 425694 531932 478537 430366
lzt28 1591760 468568 455644 437666 580253 468589 445814
lzt29 3953035 2342525 2259916 2230095 2536827 2342537 2235202
lzt30 100000 100394 100394 100394 100410 100410 100410
total 24700817 15110207 14874321 14773314 16157477 15110591 14816193
raw greedy lazy optimal lz4 -c0 lz4 -c1 lz4 -c2
cbloom rants 10-24-11 - LZ Optimal Parse with A Star Part 1
cbloom rants 12-17-11 - LZ Optimal Parse with A Star Part 2
cbloom rants 12-17-11 - LZ Optimal Parse with A Star Part 3
cbloom rants 12-17-11 - LZ Optimal Parse with A Star Part 4
cbloom rants 01-09-12 - LZ Optimal Parse with A Star Part 5
cbloom rants 11-02-11 - StringMatchTest Release + String Match Post Index
cbloom rants 09-27-08 - 2 - LZ and ACB
cbloom rants 08-20-10 - Deobfuscating LZMA
cbloom rants 09-03-10 - LZ and Exclusions
cbloom rants 09-14-10 - A small note on structured data
cbloom rants 06-08-11 - Tech Todos
09-03-12 | Photos - Pots from Q2 2012
Fourth quarter of classes ended back in June. Work from that quarter :
Open egg; solid Kaki ; kaki was too thin
Left : vic's dip + kaki rim dip and sponges inside
Right : celadon + kaki rim and sponges outside
Same two as above
Shino bowl; chrome under outside and chrome over inside
chrome over is not nice; glaze was too thick for chrome under to come through
Open egg; rhodes outside and BBB inside
BBB has a nice glow and is a good smooth liner
Flower pots; wilcox with rhodes over ; shino with ben's stain
ben's stain caused some weird flaking
Flower pots; maria's gold and tenmoku
Celadon with something poured in the middle, not sure what
Left : celadon with cobalt painted under
Middle : solid tenmoku (I think; too thin)
Right : Ox inside, tenmoku out
09-02-12 | Encoding Values in Bytes Part 3
Okay, now that we have some background we can bring together the two ends of the story : encoding variable length numbers,
and encodings in bytes using variable ranges.
typedef uint8 U8;
typedef uint8 * U8P;
typedef const uint8 * U8PC;
U8 * EncodeMod(U8 * to, int val, int mod)
{
ASSERT( mod >= 1 && mod <= 255 );
const int upper = 256 - mod;
if ( val < upper )
{
*to++ = (U8)(mod + val);
return to;
}
else
{
// val >= upper
val -= upper;
int top = val / mod;
int bottom = val % mod;
*to++ = (U8)bottom;
return EncodeMod(to,top,mod);
}
}
int DecodeMod(U8PC & from, int mod)
{
const int upper = 256 - mod;
int byte = *from++;
if ( byte >= mod )
{
return (byte - mod);
}
else
{
int val = DecodeMod(from,mod);
val *= mod;
val += byte + upper;
return val;
}
}
but of course you don't actually want to use recursive functions; the less obviously symmetric non-recursive
implementation is :
U8 * EncodeMod(U8 * to, int val, int mod)
{
ASSERT( mod >= 1 && mod <= 255 );
const int upper = 256 - mod;
for(;;)
{
if ( val < upper )
{
*to++ = (U8)(mod + val);
return to;
}
else
{
// val >= upper
val -= upper;
int lower = val % mod;
*to++ = (U8)lower;
val /= mod;
}
}
}
int DecodeMod(U8PC & from, int mod)
{
const int upper = 256 - mod;
int mul = 1;
int val = 0;
for(;;)
{
int byte = *from++;
if ( byte >= mod )
{
val += (byte - mod)*mul;
break;
}
else
{
val += (byte+upper)*mul;
mul *= mod;
}
}
return val;
}
and furthermore in practice you probably don't want to eat the cost of the divide and modulo in the encoder
(it's only a mul in the decoder, but still). So you can require that mod is a power of 2. The implementation
is obvious :
U8 * EncodeModPow2(U8 * to, int val, int bits)
{
ASSERT( bits >= 0 && bits < 8 );
const int mod = (1<<bits);
const int upper = 256 - mod;
for(;;)
{
if ( val < upper )
{
*to++ = (U8)(mod + val);
return to;
}
else
{
// val >= upper
val -= upper;
int lower = val & (mod-1);
*to++ = (U8)lower;
val >>= bits;
}
}
}
int DecodeModPow2(U8PC & from, int bits)
{
const int mod = (1<<bits);
const int upper = 256 - mod;
int shift = 0;
int val = 0;
for(;;)
{
int byte = *from++;
if ( byte >= mod )
{
val += (byte - mod)<<shift;
return val;
}
else
{
val += (byte+upper)<<shift;
shift += bits;
}
}
}
mod 1 : 255,510,765,1020,1275,1530,1785,2040,2295,
mod 2 : 254,762,1778,3810,7874,16002,32258,64770,129794,
mod 3 : 253,1012,3289,10120,30613,92092,276529,
mod 5 : 251,1506,7781,39156,196031,
mod 8 : 248,2232,18104,145080,
mod 13 : 243,3402,44469,578340,
mod 21 : 235,5170,108805,
mod 34 : 222,7770,264402,
mod 55 : 201,11256,619281,
mod 89 : 167,15030,1337837,
mod 144 : 112,16240,2338672,
mod 233 : 23,5382,1254029,
bits 0 : 255,510,765,1020,1275,1530,1785,2040,2295,
bits 1 : 254,762,1778,3810,7874,16002,32258,64770,129794,
bits 2 : 252,1260,5292,21420,85932,343980,
bits 3 : 248,2232,18104,145080,
bits 4 : 240,4080,65520,1048560,
bits 5 : 224,7392,236768,
bits 6 : 192,12480,798912,
bits 7 : 128,16512,2113664,
eg. mod = 1 (the old 1-1-1-1 encoding) sends values < 255 in 1 byte, < 510 in 2 bytes, etc. So for example
at mod 13 we can only get up to 242 in 1 byte, but we can get a lot more in 2 bytes (< 3402).
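The thresholds in these tables follow a simple recurrence, T(1) = upper and T(n) = upper + mod * T(n-1). A small sketch that computes them and round-trips the non-recursive coder through a std::vector; ModThreshold, EncodeModVec and DecodeModVec are names made up for this example.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Values < ModThreshold(mod,n) fit in n bytes or fewer.
int64_t ModThreshold(int mod, int numBytes)
{
    const int64_t upper = 256 - mod;
    int64_t t = 0;
    for (int i = 0; i < numBytes; i++)
        t = upper + mod * t;
    return t;
}

// Same logic as the non-recursive EncodeMod above, writing into a vector.
void EncodeModVec(std::vector<uint8_t> & out, int val, int mod)
{
    const int upper = 256 - mod;
    while (val >= upper)
    {
        val -= upper;
        out.push_back((uint8_t)(val % mod)); // continuation byte is in [0,mod-1]
        val /= mod;
    }
    out.push_back((uint8_t)(mod + val));     // final byte is in [mod,255]
}

// Same logic as the non-recursive DecodeMod above; advances pos.
int DecodeModVec(const std::vector<uint8_t> & in, size_t & pos, int mod)
{
    const int upper = 256 - mod;
    int mul = 1, val = 0;
    for (;;)
    {
        const int byte = in[pos++];
        if (byte >= mod) return val + (byte - mod) * mul;
        val += (byte + upper) * mul;
        mul *= mod;
    }
}
```

For example ModThreshold(13,1) is 243 and ModThreshold(13,2) is 3402, matching the "mod 13" row.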
09-02-12 | Encoding Values in Bytes Part 2
A short post for a small concept : encoding using division of words into ranges.
0 bit + 7 bits more = 7 bits of A's
1 bit + 7 bits more = 7 bits of B's
and you can write the decoder in a bit-twiddling manner :
{
int byte = *ptr++;
int isB = byte & 0x80;
int count = byte & 0x7F;
}
but you can also write the same decoder in a value-checking manner :
{
int byte = *ptr++;
if ( byte >= 128 )
{
isB = true;
count = byte - 128;
}
else
{
isB = false;
count = byte;
}
}
(of course you should follow with a count++ in all cases unless you want to be able to send a count of 0)
{
int byte = *ptr++;
if ( byte >= 90 )
{
isB = true;
count = byte - 90;
}
else
{
isB = false;
count = byte;
}
}
and now we have [0-89] for A's and [90-255] for B's ; this distribution may be better if your data is not
symmetric.
{
int byte = *ptr++;
if ( byte >= 255 )
{
isB = true;
count = byte - 255;
}
else
{
isB = false;
count = byte;
}
}
and a "B" is something that doesn't fit in one byte.
EncodeRun( bool isRun0 , int runLen , int divider )
{
ASSERT( runLen > 0 );
if ( isRun0 )
{
// I get [0,divider-1] in the output
for(;;)
{
if ( runLen <= divider )
{
*output++ = runLen-1;
return;
}
else
{
*output++ = divider-1;
runLen -= divider; // the byte just written stands for a full run of 'divider'
ASSERT( runLen >= 1 );
}
}
}
else
{
// I get [divider,base-1] in the output
// (eg. base = 256 for bytes, 16 for nibbles, etc.)
int range = base-divider;
for(;;)
{
if ( runLen <= range )
{
*output++ = divider + runLen-1;
return;
}
else
{
*output++ = base-1; // divider + (range-1)
runLen -= range; // the byte just written stands for a full run of 'range'
ASSERT( runLen >= 1 );
}
}
}
}
Note there's no flag for "doesn't fit in one byte" here; if we have more 0's than fit in one run, we just send
another run of 0's following the first. eg. runs of 0's and 1's don't strictly alternate in this encoding.
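A self-contained sketch of this run coder with a matching decoder, to show how split runs get stitched back together. It assumes each byte carries (run length - 1), so a saturated byte accounts for a whole `range` of the run. EncodeRunVec and DecodeRuns are made-up names, and the output and base are passed explicitly instead of being globals.

```cpp
#include <cstdint>
#include <vector>
#include <utility>

// Bytes in [0,divider-1] are runs of 0's, bytes in [divider,base-1] are runs of 1's.
// A run too long for one byte is split into several same-symbol runs.
// Requires 1 <= divider <= base-1 and runLen >= 1.
void EncodeRunVec(std::vector<uint8_t> & out, bool isRun0, int runLen, int divider, int base = 256)
{
    const int lo    = isRun0 ? 0 : divider;                // first code for this symbol
    const int range = isRun0 ? divider : (base - divider); // number of codes for this symbol
    while (runLen > range)
    {
        out.push_back((uint8_t)(lo + range - 1)); // a full run of 'range'
        runLen -= range;
    }
    out.push_back((uint8_t)(lo + runLen - 1));
}

// Decodes every byte into (isRun0, runLen), merging adjacent runs of the same symbol.
std::vector<std::pair<bool,int>> DecodeRuns(const std::vector<uint8_t> & in, int divider)
{
    std::vector<std::pair<bool,int>> runs;
    for (uint8_t byte : in)
    {
        const bool isRun0 = byte < divider;
        const int len = (isRun0 ? byte : byte - divider) + 1;
        if (!runs.empty() && runs.back().first == isRun0)
            runs.back().second += len;
        else
            runs.push_back({isRun0, len});
    }
    return runs;
}
```

Note the decoder never needs to know a run was split; it just sees two runs of the same symbol back to back.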
09-02-12 | Encoding Values in Bytes Part 1
A very basic level of data compression is just sending values in bytes. I'm gonna do a few posts on this topic,
first some review.
Flag Value Encodings
{
U8 *ptr; int value; // given
while( value >= 255 )
{
value -= 255;
*ptr++ = 255;
}
*ptr++ = value;
}
That is, we reserve one value in the byte (255) to be a flag meaning "more bytes follow". Values lower than 255 can be sent
right away. If more bytes follow, we try again to just send it in one byte.
value : 1-1-1-1 1-1-2-3 1-2-3-4
200 : 1 1 1
400 : 2 2 3
600 : 3 4 3
1000 : 4 4 3
1600 : 7 4 3
2600 : 11 4 3
4200 : 17 4 3
6800 : 27 4 3
11000 : 44 4 3
17800 : 70 4 3
28800 : 113 4 3
46600 : 183 4 3
75400 : 296 7 6
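For the plain 1-1-1-1 case, the decoder is just "keep adding bytes until you see one that isn't 255". A minimal round-trip sketch writing into a std::vector; EncodeFlagValue and DecodeFlagValue are names made up for this example.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Encoder from above, writing into a vector: 255 means "add 255 and keep reading".
void EncodeFlagValue(std::vector<uint8_t> & out, int value)
{
    while (value >= 255)
    {
        value -= 255;
        out.push_back(255);
    }
    out.push_back((uint8_t)value);
}

// Matching decoder; advances pos past the bytes it consumed.
int DecodeFlagValue(const std::vector<uint8_t> & in, size_t & pos)
{
    int value = 0;
    for (;;)
    {
        const int byte = in[pos++];
        value += byte;
        if (byte != 255) return value;
    }
}
```

The byte count is value/255 + 1, which is where the 1-1-1-1 column of the table comes from.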
Flag Bit Encodings
One flag bit at a time
{
U8 *ptr; int value; // given
while( value >= 128 )
{
value -= 128;
*ptr++ = 0x80 | (value & 0x7F); // top bit on : more bytes follow
value >>= 7;
}
*ptr++ = value;
}
To decode it, you check if the top bit of each byte is on. The top bit being on means "more bytes follow". Whether or not the top bit
is on, you always have 7 bits of payload to add to your value. That is, the encoding is like :
0 - 127 : 0 + 7 bits
128 - 16511 : 1 + 7 bits ; 0 + 7 bits [or 10 + 14 bits]
16512 - 0x20407F : 1 + 7, 1 + 7, 0 + 7 [or 110 + 21 bits]
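A round-trip sketch of the scheme as described (top bit on means "more bytes follow", and 128 is subtracted at each step so the ranges don't overlap). EncodeFlagBit and DecodeFlagBit are made-up names, and writing into a std::vector is just for the example.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

void EncodeFlagBit(std::vector<uint8_t> & out, uint32_t value)
{
    while (value >= 128)
    {
        value -= 128;
        out.push_back((uint8_t)(0x80 | (value & 0x7F))); // top bit on : more bytes follow
        value >>= 7;
    }
    out.push_back((uint8_t)value);                       // top bit off : last byte
}

uint32_t DecodeFlagBit(const std::vector<uint8_t> & in, size_t & pos)
{
    uint32_t value = 0;
    int shift = 0;
    for (;;)
    {
        const uint32_t byte = in[pos++];
        value += (byte & 0x7F) << shift;  // 7 bits of payload in every byte
        if (!(byte & 0x80)) return value;
        value += 128u << shift;           // undo the "value -= 128" of this step
        shift += 7;
    }
}
```

The range boundaries in the table above (127/128, 16511/16512, 0x20407F) are exactly where the output of this encoder grows by a byte.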
N flag bits
In this case you just send a fixed number of flag bits; eg. 2 bits to select 1-4 bytes. The encoding is :
0 - 63 : 00 + 6 bits
0x40 - 16447 : 01 + 6+8 bits
0x4040 - 4210751 : 10 + 6+8+8 bits
0x404040 - 1.078e9 : 11 + 6+8+8+8 bits
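A sketch of the 2-flag-bit case using the range starts from the table. The byte layout is an assumption made for this example (flag bits in the top 2 bits of the first byte, payload most significant byte first), as are the names Encode2FlagBits, Decode2FlagBits and kFlag2Base.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// First value of each range, from the table above.
static const uint32_t kFlag2Base[4] = { 0, 0x40, 0x4040, 0x404040 };

// Hypothetical layout: top 2 bits of the first byte hold (numBytes-1), the
// remaining 6 bits plus any following bytes hold the payload, big end first.
// Returns false if value is too large for 4 bytes.
bool Encode2FlagBits(std::vector<uint8_t> & out, uint32_t value)
{
    int n = 3;
    while (n > 0 && value < kFlag2Base[n]) n--; // n = numBytes-1
    const uint32_t payload = value - kFlag2Base[n];
    if (n == 3 && payload >= (1u << 30)) return false;
    out.push_back((uint8_t)((n << 6) | (payload >> (8 * n))));
    for (int i = n - 1; i >= 0; i--)
        out.push_back((uint8_t)(payload >> (8 * i)));
    return true;
}

uint32_t Decode2FlagBits(const std::vector<uint8_t> & in, size_t & pos)
{
    const uint32_t first = in[pos++];
    const int n = (int)(first >> 6);    // number of bytes that follow
    uint32_t payload = first & 0x3F;
    for (int i = 0; i < n; i++)
        payload = (payload << 8) | in[pos++];
    return payload + kFlag2Base[n];
}
```

Unlike the one-flag-bit-at-a-time scheme, the decoder knows the full length after looking at the first byte.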
09-01-12 | Good Computer Days
I was throwing out some old papers and found this :
09-01-12 | Photos - June/July
Contemplating the scramble at the top of Mount Si
I am pleased by Si, N is hiding new braces
The Ghost Dog of Si
Home produce
The homestead
The Si scramble from below; super fun and easy.
Newly planted flower and herb bed
The future egg makers
Canoeing at sunset when the water is like mercury
Home produce
09-01-12 | Photos - August
Biking the Iron Horse / John Wayne trail to the Snoqualmie Tunnel. It was pretty awesome going up, just a steady
climb and nice scenery. Not so much fun going down, just a really long coast and unpleasant vibration. (*)
09-01-12 | Photos - Backpacking with James
Cady Ridge - Meander Meadows area, on and off the PCT near Glacier Peak. Amazing; lupine and other summer
wildflowers still in bloom, and berries already getting ripe (it's been a warm year this year); bugs not bad,
pretty much everything you could ask for in a backpacking trip. (that's my happy face, I know it's hard to tell).
The PCT was ridiculously crowded, full of big groups of tourists crammed into the horrible PCT camp sites; luckily
if you're not a total moron it's very easy to get off the main track and avoid them.
09-01-12 | LZP1.h
In my continuing STB-ification of old code, I did a single file header version of the original "LZP1"
(this is "LZP1b" in the nomenclature of this recent LZP1 post).
08-19-12 | Subaru BRZ Test Drive
Just took a short drive in a BRZ. This won't be a full "test drive" post, just some quick thoughts.
08-19-12 | Packages in Standard C
So we've talked about DLL's a few times and how fucked up they are. What you really want is something like a DLL
that you can statically put in your app so it's not a separate file. We'll call this a "package".
But I was reminded the other day that C libs are also fucked up.
08-17-12 | Defines
In the category of "stuff everyone should know by now" : doing "#if" is much better than "#ifdef"
for boolean toggles that you want to be able to set from the compiler command line.
code :
#ifdef STUFF
.. stuff a ..
#else
.. stuff b ..
#endif
command line :
compiler -DSTUFF %*
or
compiler %*
Whereas the "if way" is :
code :
#ifndef STUFF
// stuff not set
// could #error here
// or #define STUFF to 0 or 1
#endif
#if STUFF
.. stuff a ..
#else
.. stuff b ..
#endif
command line :
compiler -DSTUFF=1 %*
or
compiler -DSTUFF=0 %*
Why is the "if way" so much better ?
#ifdef MAKEDLL
#define expfunc __declspec(dllexport)
#else
#ifdef MAKEORIMPORTLIB
#define expfunc extern
#else
#define expfunc __declspec(dllimport)
#endif
#endif
Okay, so there are four usage cases :
1. building Oodle as a LIB - use -DMAKEORIMPORTLIB
2. building Oodle as a DLL - use -DMAKEDLL
3. building an app that uses Oodle as a LIB - use -DMAKEORIMPORTLIB
4. building an app that uses Oodle as a DLL - use no define
and that all works fine (*). But I found it hard to use; for example if I try to stick a -DMAKEXE on the command line and somebody already
set -DMAKEDLL, it doesn't do what I expected; and there's no way to definitively say "I want dllimport".
#ifdef MAKEDLL
#define expfunc __declspec(dllexport)
#if defined(MAKELIB) || defined(IMPORTLIB) || defined(IMPORTDLL)
#error multiple MAKE or IMPORT defines
#endif
#elif defined(IMPORTDLL)
#define expfunc __declspec(dllimport)
#if defined(MAKELIB) || defined(MAKEDLL) || defined(IMPORTLIB)
#error multiple MAKE or IMPORT defines
#endif
#elif defined(MAKELIB)
#define expfunc extern
#if defined(MAKEDLL) || defined(IMPORTLIB) || defined(IMPORTDLL)
#error multiple MAKE or IMPORT defines
#endif
#elif defined(IMPORTLIB)
#define expfunc extern
#if defined(MAKELIB) || defined(MAKEDLL) || defined(IMPORTDLL)
#error multiple MAKE or IMPORT defines
#endif
#else
#error no Oodle usage define set
#endif
and usage is obvious because there's a specific define for each case :
1. building Oodle as a LIB - use -DMAKELIB
2. building Oodle as a DLL - use -DMAKEDLL
3. building an app that uses Oodle as a LIB - use -DIMPORTLIB
4. building an app that uses Oodle as a DLL - use -DIMPORTDLL
and it's much harder to use incorrectly, because you have to set one and only one. Also it's a little bit less
implementation tied, in the sense that the fact that MAKELIB and IMPORTLIB are actually the same thing is hidden
from the user in case that ever changes.
08-12-12 | Unicode on Windows Summary Page
Making another summary page for myself to link to.
cbloom rants 06-15-08 - 2
cbloom rants 06-21-08 - 3
cbloom rants 11-06-09 - IsSameFile
cbloom rants 06-07-10 - Unicode CMD Code Page Checkup
cbloom rants 10-11-10 - DeUnicode v1.0
cbloom rants 10-11-10 - Windows 1252 to ASCII best fit
cbloom rants 07-28-12 - DeUnicode 1.1
08-11-12 | Technical Writing
Whenever I give people my technical writing to review, one of the first comments out of most people's
mouths is "you need to remove the use of 'I' , and the asides, and the run-on sentences, and this bit
where you say 'fuck' is unprofessional, and blah blah".
08-10-12 | cbhashtable
cbhashtable is a single file standalone hash table. It is a power-of-two-size reprobing hash table
(aka "open addressing" or "closed hashing") which uses special
values for empty & deleted slots (not separate flags). It optionally stores the hash value in the table to accelerate finding when
the key comparison is slow.
cbloom rants 10-19-08 - 1
cbloom rants 10-21-08 - 4
cbloom rants 10-21-08 - 5
cbloom rants 11-23-08 - Hashing & Cache Line Size
cbloom rants 11-19-10 - Hashes and Cache Tables
cbloom rants 11-29-10 - Useless hash test
07-28-12 | DeUnicode 1.1
Made a few revisions to DeUnicode. Mainly because I got sick of continuing to run into stupid file name
problems, you now have the option to even further reduce the set of allowable characters.
DeUnicode -a -c -s *
where -a makes the output 7-bit ascii (instead of 8-bit windows "A" code page (not ANSI), which is the
default), -c removes all special characters that are used in my console ("*^;[]()" etc., as well as anything under 32), and -s removes spaces.
c:\src\deunicode>deunicode -?
DeUnicode v1.1 by cbloom
usage:
DeUnicode [-opts] <dir>
-r : recurse
-t : special the ("the blah" -> "blah, the")
-c : remove chsh special characters
-s : remove spaces
-a : ascii output [else 'A' code page]
-d : display only (don't do)
-q : quiet
-v : verbose
DeUnicode.zip at cbloom.com
07-26-12 | Movies
My taste in movies has changed a lot over the last few years; I used to be really into the
realistic, depressing crap, but now I just find that tedious and boring. It's sort of too easy to
make those movies, and often they're just terrible and rote, but the terribleness is hidden in a cloak
of seriousness.
2. The Big Lebowski
3. ?
Jiro Dreams of Sushi - he's just so adorable
U Turn - the first 10 minutes are amazing, just the weird atmosphere of it; it goes downhill from there
(Red Rock West is similar and better)
Powaqqatsi - I've discovered a love of Philip Glass in my old age
The Guard
Tristram Shandy: A Cock and Bull Story - did not expect to like this at all, but loved it
(The Trip is okay too but not as good)
Bridesmaids - best hollywood comedy in a while
Submarine
Instant :
Mancora - highly enjoyable fluff
Brick - there are moments of sheer genius in this movie; really amazing writing
Circo
Alamar - wow, beautiful
Absurdistan
Starstruck - wonderfully weird
Bad Day to Go Fishing
The Grocer's Son
Deep Water
Registered Sex Offender
Cocaine Cowboys
Berkeley in the Sixties
El Bulli: Cooking in Progress
07-23-12 | Structs are not what you want
I was looking at some stupid struct in my code which was something like
struct Stuff
{
U16 m_val1;
U32 m_val2;
U16 m_val3;
};
because of the order of variables and the standard packing, it wound up taking 12 bytes instead of 8. In
this particular case, the types were actually templates or #defined types so I don't know statically that the ordering
is fucked up and causing that problem.
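A small sketch of the size difference, assuming a mainstream ABI where a 32-bit int is 4-byte aligned. The type names are just for the example. When the member types come from templates or #defines, a static_assert like the one at the bottom at least catches the padding at compile time.

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>

// The U32 in the middle forces 2 bytes of padding after m_val1
// and 2 more at the end, so this is 12 bytes on typical ABIs.
struct Stuff
{
    uint16_t m_val1;
    uint32_t m_val2;
    uint16_t m_val3;
};

// Largest member first : the two U16s share one 4-byte slot, 8 bytes total.
struct StuffReordered
{
    uint32_t m_val2;
    uint16_t m_val1;
    uint16_t m_val3;
};

// Fails to compile if the compiler had to insert padding.
static_assert( sizeof(StuffReordered) == sizeof(uint32_t) + 2*sizeof(uint16_t),
    "StuffReordered contains padding" );
```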
bag Particle
{
Vec3 m_pos;
ColorDW m_color;
};
bag GameObject
{
Vec3 m_pos;
String m_name;
};
bag GameObject_AndColor : GameObject
{
ColorDW m_color;
};
now a GameObject_AndColor can be passed to any function that wants a Particle because it provides all the needed fields.
alternatively :
bag GameObjectParticle : GameObject, Particle
{
};
only has one m_pos.
Vec3 pos1;
int i;
ColorDW c;
... some code that sets up pos1 and c ...
bag Particle p = { pos &= pos1, color &= c } ;
// this is not a copy
// it says that the variables I already have can be treated as a Particle bag
Particle_Move(p);
// Particle_Move generates code that acts directly on my local variables "pos1" and "c" , no copying
Vec3 positions[100];
ColorDW colors[100];
I should be able to use Particle_ functions on those, because they provide the values needed to make a valid
bag.
struct Vertex_Pos { Vec3 pos; }
// functs that only need a "pos" member act on a Vertex_Pos
struct Vertex_PosNormal : Vertex_Pos { Vec3 normal; }
// functs that need a pos and normal act on Vertex_PosNormal
struct Vertex_PosColor : Vertex_Pos { Color color; }
// functs that need a pos and color act on Vertex_PosColor
struct Vertex_PosNormalColor : ??
// oh crap, which do I do ?
struct Vertex_PosNormalColor : Vertex_PosNormal { Color color; }
struct Vertex_PosNormalColor : Vertex_PosColor { Vec3 normal; }
// either way is busted
// with bags I should be able to just do :
bag Vertex_PosNormalColor { Vec3 pos; Vec3 normal; Color color; }
// (or either of the above inheritance ways)
and then the bag Vertex_PosNormalColor can be used as a Vertex_PosNormal or Vertex_PosColor without even
explicitly specifying any inheritance. (an ideal compiler would also let you specify that as a compile-time
constraint on the type).
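The closest thing in current C++ is probably templates, which give you the "anything with the right fields" part but not the layout freedom. A minimal sketch, with made-up Vec3/ColorDW definitions and a hypothetical ParticleRef view that binds to existing variables instead of copying them:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
typedef unsigned int ColorDW;

struct Particle { Vec3 m_pos; ColorDW m_color; };

// Has every field a Particle has, plus a name, with no inheritance relation.
struct GameObject_AndColor { Vec3 m_pos; const char * m_name; ColorDW m_color; };

// A view made of references : treats variables you already have as a Particle.
struct ParticleRef { Vec3 & m_pos; ColorDW & m_color; };

// Compiles for any type that provides m_pos and m_color.
template <typename t_bag>
void Particle_Move(t_bag & p, float dx)
{
    p.m_pos.x += dx;
    p.m_color = 0xFFFFFFFFu;
}
```

Usage is something like `ParticleRef p = { pos1, c }; Particle_Move(p, 1.f);`, which acts directly on the locals. The same view works on `positions[i]` and `colors[i]` from two separate arrays. The downside is that every such function has to be a template, which is exactly the kind of thing a real bag type would avoid.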
07-22-12 | The E46 M3 Curse
I've been sort of half-assedly trying to buy an E46 M3 for the past few months (planning to then sell the Porsche),
and it has become ridiculous how hard it is to find a decent one.
2005 BMW M3 6 speed Coupe, Black with black leather interior. 62kmiles.
Heated drives seats, CD, AM/FM, Sun roof/moon roof, Air conditioning,
Automatic climate control, Alarm system, Memory drivers seat, Adjustble
power driver/passenger seats, Steering wheel mounted controls, Leather
steering wheel, Tilt steering wheel, Power mirrors, Traction control,
Cruise control, Rear defroster, Power windows, Keyless entry, clean
interior. I have more pics just text and ask for some, Salvaged title
Asking ...
Great car! Look at these options! (oh, BTW, it was totalled). At first when I saw an ad like this,
I thought it was a rare weirdo who was trying to pass off a salvaged car at full price as if it was no big
deal, but it's just every single one :
I am selling my 2006 BMW M3 E46 with competition package. It is
Interlagos Blue. 6 speed manual transmission. Navigation. 2 door coupe
hard top with sunroof. It is stock with 333 hp 3.2L inline 6. Leather
power seats and ipod hook up. The car has 80,000 miles on it. Just put
new brake pads on the rear. Just had the car fully detailed. Non smoker,
Premium Harman Kardon stereo. DVD based navigation. I'm selling the car
because I am getting married and need the money for the wedding. The car
has a salvaged title. It was rebuilt before I bought it in 2010. The
front right side of the car was where the damage was. Accident happened
before I bought it. Everything is fixed and works properly. I am asking
...
Great car! Oh by the way, totalled.
Up for sale is my m3. this car is in PERFECT condition. 6-speed manual.
there is NOTHING wrong with this car. TOO MUCH OPTIONS TO LIST. FULLY
LOADED. leather, power doorlocks, power windows, iphone/ipod USB cable..
there are NO dents or dings. there are NO tears in the seats. car has
72k. car does have a reconstructed title.
Perfect condition! Oh yeah, it was totalled.
07-19-12 | Experimental Futures in Oodle
I don't know if this will ever see the light of day, but it's fucking sexy as hell so here's a sneak peek.
void example_future_comp_decomp()
{
future<OodleBufferRC> rawBuf = oodle_readfile("r:\\oodle_example_future_input");
// call :
// oodle_compress_sync( rawBuf, OodleLZ_Compressor_LZH, OodleLZ_CompressSelect_Fast );
// but not until rawBuf is done :
future<OodleBufferRC> compBuf = start_future<OodleBufferRC>( oodle_compress_sync, rawBuf, OodleLZ_Compressor_LZH, OodleLZ_CompressSelect_Fast );
future<const char *> write = start_future<const char*>(oodle_writefile,"r:\\oodle_example_future_comp",compBuf);
future<OodleBufferRC> read_compBuf = start_future<OodleBufferRC>( oodle_readfile, write );
future<OodleBufferRC> read_decompBuf = start_future<OodleBufferRC>( oodle_decompress_sync, read_compBuf );
start_future<const char *>( oodle_writefile, "r:\\oodle_example_future_decomp",read_decompBuf);
}
int nop(...)
{
return 0;
}
template<typename t_arg1,typename t_arg2,typename t_arg3,typename t_arg4>
future<int> done_when_all( t_arg1 a, t_arg2 b, t_arg3 c, t_arg4 d )
{
return start_future<int>( nop, a,b,c,d );
}
then call
done_when_all( various futures )->wait();
float test_func_5_1(int x)
{
Sleep(1);
return x * (2.0/3.0);
}
future<float> test_func_5_2(int x)
{
Sleep(1);
if ( x == 1 )
{
// hey in this case I can return my value immediately
return make_future(0.6f);
}
else
{
// I need to run another async job to compute my value
x *= 3;
return start_future<float>(test_func_5_1,x);
}
}
then use as :
future<float> f = start_future<float>(test_func_5_2,7);
... do other work ...
float x = f.wait();
does what it does.
future<const char *> oodle_compress_then_writefile(const char *filename, OodleBufferRC rawBuf, OodleLZ_Compressor compressor, OodleLZ_CompressSelect select )
{
OodleBufferRC compBuf = oodle_compress_sync( rawBuf, compressor, select );
return start_future<const char*>( oodle_writefile, filename, compBuf );
}
// say I want to run two asyncs :
future<int> f1 = start_future<int>( func1 );
future<float> f2 = start_future<float>( func2 , 7.5f );
// but I want to run func2 after func1
// due to some dependency that isn't through the return value
// what I can use is a return-to-arg adapter like :
template<typename t1,typename t2>
t1 return1(t1 a,t2 b)
{
b;
return a;
}
template<typename t1,typename t2>
future<t1> return1_after2(t1 a,future<t2> b)
{
return start_future<t1>( return1<t1,t2>, a, b );
}
// then run :
future<int> f1 = start_future<int>( func1 );
future<float> f2 = start_future<float>( func2 , return1_after2(7.5f,f1) );
but like I said previously I hate that kind of crap for the most part.
Much better is to use the explicit dependency mechanism, like :
future<int> f1 = start_future<int>( func1 );
future<float> f2 = make_future<float>( func2 , 7.5f );
f2->add_dep(f1);
f2->start();
There is one case where the funny binding mechanism can be used elegantly; that's when you can
associate the binding with the actual reason for the dependency. That is, if we require func2 to
run after func1, there must be some shared variable that is causing that ordering requirement.
Using a binder to associate func1 with that shared variable is a clean way of saying "you can read this var
after func1 is done".
07-15-12 | VideoPlayer and Commonopoly
I wrote this little videoplayer for RAD a while ago. The main purpose of it is not for consumer-style use but
for developer diagnosing of video compressors.
mplayer -benchmark -nosound -vo png:z=6 %1
md %1_frames
call mov *.png %1_frames\
and then point my videoplayer at the dir, and since it can prefetch and all that it can play a video of pngs (which load
too slow to play without preloading a bunch).
videoplayerR.exe -f0.125 -w1 -q -0 -2 -4 -s1 -r t:\commonopoly
and then enjoy.
07-15-12 | libpng DLL Delay Load
cblib for a long time has used libpng , which creates a damn DLL dependency. Annoyingly, that happens
at app startup, which means even if you are trying to load a .BMP it will fail to start the app if it
can't find the PNG DLL (or, you know, using cblib on totally non-image related stuff).
What I've wanted for a long time is to put the PNG DLL load into the PNG code
so that it's only done if you actually try to load a PNG.
template<typename t_func_type>
t_func_type GetImport( t_func_type * pFunc , const char * funcName, HMODULE hm )
{
if ( *pFunc == 0 )
{
ASSERT( hm != 0 );
t_func_type fp = (t_func_type) GetProcAddress( hm, funcName );
// not optional :
ASSERT_RELEASE_THROW( fp != 0 );
*pFunc = fp;
}
return (*pFunc);
}
#define CALL_IMPORT(name,hm) (*GetImport(&STRING_JOIN(fp_,name),STRINGIZE(name),hm))
and I'm going to just explicitly load the PNG DLL when I get to LoadPNG :
static HMODULE hm_png = 0;
#define HMODULE_FAILED (HMODULE) 1
static bool my_png_init()
{
if ( hm_png )
{
return ( hm_png == HMODULE_FAILED ) ? false : true;
}
HMODULE hm_zl = LoadLibrary(ZLIB_NAME);
if ( hm_zl == 0 )
{
lprintf("Couldn't load Zlib (%s)\n",ZLIB_NAME);
hm_png = HMODULE_FAILED;
return false;
}
hm_png = LoadLibrary(PNG_NAME);
if ( hm_png == 0 )
{
lprintf("Couldn't load PNG lib (%s)\n",PNG_NAME);
hm_png = HMODULE_FAILED;
return false;
}
lprintf_v2("Using libpng.\n");
return true;
}
#define CALL_PNG(name) CALL_IMPORT(name,hm_png)
so now we just have to replace all our png calls with CALL_PNG() calls, like :
png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING, NULL,NULL,NULL);
->
png_ptr = CALL_PNG(png_create_read_struct)(PNG_LIBPNG_VER_STRING, NULL,NULL,NULL);
(obviously you could #define png_create_read_struct CALL_PNG(png_create_read_struct) to make it
totally transparent)
PNG_EXPORT(png_voidp,png_get_io_ptr) PNGARG((png_structp png_ptr));
So we can just do :
#define PNG_EXPORT(ret,func) static ret (PNGAPI * STRING_JOIN(fp_,func))
#define PNGARG(args) args = 0
and then the png proto is defining our fp for us. (unfortunately png.h sticks "extern" in front of all
the protos, so you have to copy out the protos and take off the extern).
DECLFUNC( ret, name , args );
eg.
DECLFUNC( png_voidp, png_get_io_ptr, (png_structp png_ptr) );
Then a client could just include that header multiple times and change DECLFUNC to various things.
For example if you had a header like that, you can look up all the func names at LoadLibrary time, instead of
doing each one on first use (this removes a branch from each function call site). eg :
#define DECLFUNC( ret, name , args ) ret (CALLBACK * STRING_JOIN(fp_,name)) args = 0
#include "allfuncs.inc"
#undef DECLFUNC
void ImportAllFuncs()
{
HMODULE hm = LoadLibrary;
#define DECLFUNC( ret, name , args ) STRING_JOIN(fp_,name) = ImportFunc(name,hm)
#include "allfuncs.inc"
#undef DECLFUNC
}
#define DECLFUNC(ret,name,args) #define name (* STRING_JOIN(fp_,name) )
#include "allfuncs.inc"
#undef DECLFUNC
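Here's a self-contained sketch of the same multiple-expansion trick, using a list macro in place of the "allfuncs.inc" include so it fits in one file. Everything here is a stand-in: dll_add and dll_negate play the part of the DLL's exports, and fake_get_proc plays the part of GetProcAddress.

```cpp
#include <cstring>
#include <cassert>

// Stand-ins for functions that would live in the DLL (hypothetical).
static int dll_add(int a, int b) { return a + b; }
static int dll_negate(int a) { return -a; }

// Stand-in for GetProcAddress : look a name up by string.
typedef void (*generic_fn)();
static generic_fn fake_get_proc(const char * name)
{
    if ( strcmp(name, "add") == 0 )    return reinterpret_cast<generic_fn>(dll_add);
    if ( strcmp(name, "negate") == 0 ) return reinterpret_cast<generic_fn>(dll_negate);
    return 0;
}

// The one list of prototypes; plays the role of the include file.
#define ALL_FUNCS(DECLFUNC) \
    DECLFUNC( int, add,    (int a, int b) ) \
    DECLFUNC( int, negate, (int a) )

// Expansion 1 : declare a function pointer fp_<name> for each entry.
#define DECL_FP(ret, name, args) static ret (*fp_##name) args = 0;
ALL_FUNCS(DECL_FP)
#undef DECL_FP

// Expansion 2 : look them all up at once; returns false if any is missing.
static bool ImportAllFuncs()
{
    bool ok = true;
    #define IMPORT_FP(ret, name, args) \
        fp_##name = reinterpret_cast<ret (*) args>( fake_get_proc(#name) ); \
        ok = ok && ( fp_##name != 0 );
    ALL_FUNCS(IMPORT_FP)
    #undef IMPORT_FP
    return ok;
}
```

After ImportAllFuncs() succeeds, call sites just do fp_add(2,3) with no lazy-lookup branch.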
07-15-12 | cbvector
I tried to use std::vector in one of the Oodle examples, and the freaking MSVC STL generates link dependencies
by default. WTF. So I got annoyed and wrote my own single file standalone vector.
07-15-12 | Internet Toggle
For the most part I'm not that prone to the time-wasting allure of the internet when I'm supposed to
be working. But recently I've been
writing the docs for Oodle and it's just excruciatingly boring work. Once in a while I have something
important to say in the docs, but the vast majority is stuff like :
DOCGEN int Oodle_ComputeAPlusB(int A, int B);
/* Computes A + B
$:A the value of A in A+B
$:B the value of B in A+B
$:return the sum of and A and B
Oh god please kill me now.
*/
Every time I wrote one of those docs I would involuntarily pop up my web browser and see
if Kids on Crack had any updates.
So I started turning off my internet access to break that habit.
dns_on.bat :
netsh interface ip set dns name="Wireless Network Connection" dhcp
dns_off.bat :
netsh interface ip set dns name="Wireless Network Connection" static 0.0.0.0
ipconfig /flushdns
(or, you know, slightly different as is appropriate for your net config).
(note the flushdns also, otherwise you'll have things like Google in cache).
06-21-12 | Two Alternative Oodles
I want to write about two interesting ideas for async IO systems that I am *not* doing in Oodle.
"c:\junk" :-> OpenFile :-> fh
(that is, OpenFile takes some string as input and then puts out a file handle named "fh")
fh :-> ReadFile :-> buf[0-32768]
buf :->
0-32768 :->
(that is, make a ReadFile op that takes the output of the Openfile, and outputs a valid buffer range)
buf[0-32768] :-> SPU_LZCompress :-> lzSize , compbuf[0-lzSize]
compbuf
(do an LZ compress on the valid buf range and output to compressed buf)
lzSize , compbuf[0-lzSize] :-> WriteFile
etc..
This sets up a chain of operations with dependencies, you can fire it off and then wait on it all to complete.
So, in a sense it's nice because you don't have to write code that waits on the completion of each op and fires the next op and so on.
// in a coroutine :
FILE * fp = fopen("c:\\junk","rb"); // yields the coroutine for the async open
int c = fgetc(fp); // might yield for a buffer fill
etc..
This is sort of appealing, it certainly makes it easy to write async IO code. I personally don't really love the fact that
the thread yielding is totally hidden, I like major functional operation to be clear in the imperative code.
#define cofread(fp,buf,size) \
for(;;) \
{ \
{ \
OodleAsyncHandle h = Oodle_ReadOrReturnAsyncHandle(fp,buf,size); \
/* Oodle_ReadOrReturnAsyncHandle returns 0 if the Read could complete without waiting */ \
if ( ! h ) \
break; \
Coroutine_AddDependency(h); \
} \
COROUTINE_YIELD(); \
Coroutine_FlushDoneDependencies(); \
}
where COROUTINE_YIELD is my macro that does something like :
self->state = 7;
return;
case 7:
So now you can call cofread() from an Oodle coroutine and it kind of does what we want.
{
int size = 16384;
cofread(fp,buf,size);
}
is no good, if you resume at the YIELD point inside cofread, "size" is gone. (you'd get a case statement skips variable initialization
error or something like that).
#define cofread(fp,buf,size) \
co->m_fp = fp; co->m_buf = buf; co->m_size = size; \
for(;;) \
{ \
{ \
OodleAsyncHandle h = Oodle_ReadOrReturnAsyncHandle(co->m_fp,co->m_buf,co->m_size); \
/* Oodle_ReadOrReturnAsyncHandle returns 0 if the Read could complete without waiting */ \
if ( ! h ) \
break; \
Coroutine_AddDependency(h); \
} \
COROUTINE_YIELD(); \
Coroutine_FlushDoneDependencies(); \
}
and now you really can use cofread within a coroutine, and you can use local variables as arguments to it, and it yields if it can't
complete immediately, and that's all nice.
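For anyone who hasn't seen the "state = N; return; case N:" trick, here is a tiny standalone sketch of it. These are not the Oodle macros; the coroutine struct, the macro signatures and CountToThree are all invented for the example, and __LINE__ stands in for the hand-picked state number.

```cpp
#include <cassert>

// Anything that must survive a yield lives here, not in a local.
struct coroutine
{
    int state;
    int i;
    coroutine() : state(0), i(0) { }
};

#define COROUTINE_START(co)  switch ( (co)->state ) { case 0:
#define COROUTINE_YIELD(co)  do { (co)->state = __LINE__; return true; case __LINE__: ; } while(0)
#define COROUTINE_DONE(co)   } (co)->state = -1; return false;

// Writes 0,1,2 to *out, yielding after each one.
// Returns true while there is more to do, false once finished.
static bool CountToThree(coroutine * co, int * out)
{
    COROUTINE_START(co)
    for ( co->i = 0; co->i < 3; co->i++ )
    {
        *out = co->i;
        COROUTINE_YIELD(co); // resumes right here on the next call
    }
    COROUTINE_DONE(co)
}
```

The loop counter has to be co->i and not a local int, for the same reason "size" had to be copied into co->m_size above: re-entering the function jumps straight to the case label, so locals are never initialized on the way in.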
MyCoroutine1( coroutine * co )
{
COROUTINE_START()
g_fp = cofopen("blah");
cofread(g_fp,g_buf,1024);
COROUTINE_DONE()
}
That's easy. But you cannot do :
void MyHelper()
{
g_fp = cofopen("blah");
cofread(g_fp,g_buf,1024);
}
MyCoroutine2( coroutine * co )
{
COROUTINE_START()
MyHelper();
COROUTINE_DONE()
}
and that lack of composability makes it unusable as a general purpose way to do IO.
06-19-12 | Two Learnings
Two things that I got wrong in the past and have only recently come around to what I believe is the right way.
Oodle_Read( OodleFile * f, void * buffer, S64 size, U64 filePos );
in particular at the call site where you have something like :
S64 i1,i2;
Oodle_Read(f,ptr,i1,i2);
you can't tell what the last two args are just by looking at the call site, and what's worse, you can switch them in the call order and get
no warning. That sucks. It's much better to do :
Oodle_Read( OodleFile * f, void * buffer, SINTa size, U64 filePos );
(SINTa is the RAD pointer-sized signed int). It's now clearly documented in the variable types - this is the size of a memory buffer.
SINTa/UINTa - all memory buffer sizes and array sizes
S64/U64 - file positions and sizes
S32/U32 - really not used for much, just enums and modes and flags, not array sizes
Not only does this put more documentation on the variable, it also makes it more clear when you are doing dangerous cast-downs.
U32 oo64to32(U64 x); // used super rarely, very weird thing to do
U32 ooAto32(UINTa x); // used super rarely, very weird thing to do
UINTa oo64toA(U64 x); // used for file sizes -> memory buffers, somewhat common
There are only three cast-downs needed, and really the first two should almost never be used; if you are using them a lot it's a sign of
bad code. The last one is common, and it should do a check to ensure the cast is okay (eg. if using a > 2GB file size on a 32 bit system).
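A sketch of what checked cast-downs along those lines might look like. The typedefs are stand-ins for the RAD types, oo64fitsA is a helper invented here, and a plain assert stands in for whatever failure handling you actually want.

```cpp
#include <cstdint>
#include <cassert>
#include <limits>

typedef std::uint64_t  U64;   // stand-in for the RAD 64-bit type
typedef std::uint32_t  U32;   // stand-in for the RAD 32-bit type
typedef std::uintptr_t UINTa; // stand-in for the pointer-sized unsigned int

// Hypothetical helper : does a file size fit in a memory buffer size ?
static inline bool oo64fitsA(U64 x)
{
    return x <= std::numeric_limits<UINTa>::max();
}

// file size -> memory size; the common one, so it checks.
// (this is what catches a > 4GB file on a 32 bit build)
static inline UINTa oo64toA(U64 x)
{
    assert( oo64fitsA(x) );
    return (UINTa) x;
}

// the two that should almost never be needed
static inline U32 oo64to32(U64 x)
{
    assert( x <= 0xFFFFFFFFull );
    return (U32) x;
}

static inline U32 ooAto32(UINTa x)
{
    assert( x <= 0xFFFFFFFFull );
    return (U32) x;
}
```

Because every narrowing goes through one of three named functions, grepping for them lists every place the code throws away bits.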
Object::Func(...)
{
m_mutex.Lock();
do stuff;
m_mutex.Unlock();
}
or really :
Object::Func(...)
{
MUTEX_SCOPER(m_mutex);
do stuff;
}
and crucially, when you call other member functions on yourself, that takes the mutex again recursively. (we're going to ignore the
efficiency cost of taking the mutex many times). At first that way seems nice, it's
easy, but then you encounter one of the real world problems and it falls apart.
Object::Func_Locked(...)
{
do stuff;
}
Object::Func_Unlocked(...)
{
MUTEX_SCOPER(m_mutex);
Func_Locked();
}
Then all the functionality is in the _Locked functions and they can call each other.
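A minimal concrete version of that split, using std::mutex (which is non-recursive) in place of MUTEX_SCOPER. The Counter class and its members are made up for the example.

```cpp
#include <mutex>
#include <cassert>

class Counter
{
public:
    Counter() : m_value(0) { }

    // _Unlocked : call these when you do NOT hold the mutex.
    // They take it exactly once and forward to the real work.
    void AddTwo_Unlocked()
    {
        std::lock_guard<std::mutex> scoper(m_mutex);
        AddTwo_Locked();
    }

    int Get_Unlocked()
    {
        std::lock_guard<std::mutex> scoper(m_mutex);
        return Get_Locked();
    }

private:
    // _Locked : the caller already holds m_mutex.
    // These can call each other freely without touching the mutex.
    void AddOne_Locked() { m_value++; }
    void AddTwo_Locked() { AddOne_Locked(); AddOne_Locked(); }
    int  Get_Locked() const { return m_value; }

    std::mutex m_mutex;
    int        m_value;
};
```

Since the mutex is only ever taken at the outermost call, the unlock in an _Unlocked function is always the last unlock, which the recursive style can't promise.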
Object::Func()
{
m_mutex.lock();
if ( m_x > m_y )
deleteMe = true;
m_mutex.unlock();
// *!*
if ( deleteMe )
delete this;
}
That code is no good; at the *!* point, some other thread might touch the object and make the check m_x > m_y
no longer true, and then we would be deleting this object incorrectly, when the invariant no longer holds.
In order to make this work you need a combined "unlock_and_delete" operation. But to do that you need
to know that your unlock is the *last* unlock - you can't do that in the recursive mutex style.
06-18-12 | Run Logged
A handy batch file :
runlogged.bat :
set args=%*
if "%args%"=="" end.bat
set Now=%TIME%
REM ECHO It's %Now% now
REM remove decimal part of time :
FOR /F "tokens=1 delims=." %%A IN ("%Now%") DO SET Now=%%A
REM ECHO It's %Now% now
REM alternate way :
REM FOR /F %%A IN ('TIME/T') DO SET Now=%%A
REM ECHO It's %Now% now
set str=%DATE%_%Now%_%args%
call make_clean_str %str%
set str=%str:.=_%
if not exist r:\runlogs call md r:\runlogs
set logname=r:\runlogs\%str%.log
call %args% > %logname% 2>&1
call d %logname%
type %logname%
make_clean_str.bat :
set str=%*
set str=%str: =_%
set str=%str::=_%
set str=%str:\=_%
set str=%str:/=_%
set str=%str:;=_%
make_clean_str is handy any time you want to make a string into a file name, it gets rid of illegal characters.
ss64 Windows CMD Command Syntax
06-14-12 | API Design with Auto-Complete
I'm working on tweaking the Oodle API and a few things strike me.
Package_WriteFile_ThenDeleteSource();
that's okay
Package_WriteFile(); // also deletes source
WTF crazy unexpected side effect that's not in the name. Not okay.
Basically any time I have to go to the header or the docs to figure out what a function does, I consider the API a failure.
(of course the biggest failure is when I'm using my own code and can't figure out exactly
what a function does without going and reading the implementation)
EnumDir(const char * dirPath,bool recurseDirs,bool caseSensitive,bool filesOnlyNoDirs)
then I can see that as I type and I know exactly what it does. But if I start typing and the browse info pops up
EnumDir(const char * dirPath,U32 flags)
that sucks because now I have to go to the header and try to figure out what the flags are. Basically any API where
you have to go read the docs or scan the header, I don't like. Unfortunately the bool way results in really ugly
code after you've written it :
EnumDir(dirPath,true,false,false)
you have no idea WTF that does, so that sucks. There is a special type of enum which I believe gives the best
of both worlds. First of all, the bad type of enum is one that you can't tell what values it has unless you go
look at the docs or the header, so like, this is bad :
EnumDir(const char * dirPath,enum EnumDirOptions options)
that sucks; but if you just use the enums as a way of naming bools, like :
EnumDir(const char * dirPath,enum RecurseDirs recurseDirs,enum CaseSensitive caseSensitive,enum FilesOnly filesOnlyNoDirs);
EnumDir(dirPath,RecurseDirs_Yes,CaseSensitive_No,FilesOnly_No);
then I think you have the best of both worlds, in the sense that reading the function after it's written is totally clear,
and you can write the function (with browse info) without looking at docs or headers. This only works if all your enums are
reliably just _Yes _No simple enums, if you try to be fancier and make the names like "RecurseDirs_Recursive" or whatever custom
non-standard names, it makes it unpredictable.
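A compilable sketch of the _Yes/_No enum style. This EnumDir is a stub that just records what it was asked to do (EnumDirRequest is invented for the example); the point is the signature and the call site.

```cpp
#include <cassert>

enum RecurseDirs   { RecurseDirs_No = 0,   RecurseDirs_Yes = 1 };
enum CaseSensitive { CaseSensitive_No = 0, CaseSensitive_Yes = 1 };
enum FilesOnly     { FilesOnly_No = 0,     FilesOnly_Yes = 1 };

// Hypothetical result type so the stub has something to return.
struct EnumDirRequest
{
    const char * dirPath;
    bool recurse;
    bool caseSensitive;
    bool filesOnly;
};

static EnumDirRequest EnumDir(const char * dirPath,
                              RecurseDirs recurseDirs,
                              CaseSensitive caseSensitive,
                              FilesOnly filesOnlyNoDirs)
{
    EnumDirRequest req;
    req.dirPath       = dirPath;
    req.recurse       = ( recurseDirs == RecurseDirs_Yes );
    req.caseSensitive = ( caseSensitive == CaseSensitive_Yes );
    req.filesOnly     = ( filesOnlyNoDirs == FilesOnly_Yes );
    return req;
}
```

In C++ you get a bonus over plain bools: passing the enums in the wrong order, or passing a bare true, is a compile error, because the enum types don't convert to each other and bool doesn't convert to an enum.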
Widget * w = Widget_Create();
...
Widget_Delete(w);
it's fine, right? Nope. They should have used Widget_Destroy(w) there, Widget_Delete is for something else. That's very bad API,
you have near-synonyms that seem to be interchangeable, but aren't, and it leads to code that reads like it should be fine but
isn't.
06-13-12 | MSVC RegExp Find-Replace
This took me a while to figure out so I thought I'd write it down.
You can use the MSVC regexp find-rep to match function args and pass them through.
This is in VC2005 so YMMV with other versions (yay for fucking retards who randomly
change interfaces for very little benefit and throw away huge amounts of value in learned
knowledge and familiarity! yay!)
Oodle_Wait(blah,true);
->
Oodle_Wait(blah,OodleAsyncHandle_DeleteIfDone)
for any "blah". The way to do this is :
Oodle_Wait\(:b*{.@}:b*,:b*true:b*\)
->
Oodle_Wait(\1,OodleAsyncHandle_DeleteIfDone)
What it means :
\( is escaped ( character
:b* matches any amount of white space
{} wrap an expression which we can later refer to with \1,\2,etc.
. means match any character, and @ means match as few as possible (don't use .*)
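For comparison, roughly the same find-replace in standard ECMAScript regex syntax via std::regex. This is a translation, not something MSVC's find dialog accepts: [ \t]* stands in for :b*, (.*?) for {.@}, and $1 for \1. FixOodleWait is just a name for the example.

```cpp
#include <regex>
#include <string>
#include <cassert>

static std::string FixOodleWait(const std::string & src)
{
    // Oodle_Wait( <anything, as few chars as possible> , true )
    static const std::regex pattern(
        R"(Oodle_Wait\([ \t]*(.*?)[ \t]*,[ \t]*true[ \t]*\))" );
    return std::regex_replace( src, pattern,
        "Oodle_Wait($1,OodleAsyncHandle_DeleteIfDone)" );
}
```

Calls that don't end in ",true)" are left alone.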
A few things that tripped me up :
Oodle_Wait`(*,true`)
->
Oodle_Wait`(*,OodleAsyncHandle_DeleteIfDone`)
which, while not remotely as powerful as a full regexp match, I find much more intuitive and easy to use,
and it works for 99% of the find-reps that I want.
enum MyEnum
{
red,black
};
I want to find-rep "red" and make it "Oodle_MyEnum_Red" , but only where the word "red" is being used in a
variable of type MyEnum.
06-12-12 | Another Threading Post Index
Maybe if I used post category tags I wouldn't have to do these by hand. Oh well. Starting from the last index :
2009-04-06 - Multi-threaded Allocators
2009-04-06 - The Work Dispatcher
2010-05-29 - Lock Free in x64
2010-07-18 - Mystery - Do Mutexes need More than Acquire-Release -
2010-07-18 - Mystery - Does the Cell PPU need Memory Control -
2010-07-18 - Mystery - Why no isync for Acquire on Xenon -
2010-07-31 - GCC Scheduling Barrier
2010-09-12 - The defficiency of Windows' multi-processor scheduler
2010-09-21 - Waiting on Thread Events Part 2
2010-09-21 - Waiting on Thread Events
2011-03-11 - Worklets , IO , and Coroutines
2011-05-13 - Avoiding Thread Switches
2011-07-06 - Who ordered Condition Variables -
2011-07-08 - Event Count and Condition Variable
2011-07-08 - Who ordered Event Count -
2011-07-09 - LockFree - Thomasson's simple MPMC
2011-07-09 - TLS for Win32
2011-07-10 - Mystery - Do you ever need Total Order (seq_cst) -
2011-07-13 - Good threading design for games
2011-07-14 - ARM Atomics
2011-07-14 - compare_exchange_strong vs compare_exchange_weak
2011-07-14 - Some obscure threading APIs
2011-07-15 - Review of many Mutex implementations
2011-07-16 - Ticket FIFO Mutex
2011-07-17 - Atman's Multi-way Ticket Lock
2011-07-17 - CLH list-based lock
2011-07-17 - Per-thread event mutexes
2011-07-18 - cblib Relacy
2011-07-18 - MCS list-based lock
2011-07-20 - A cond_var that's actually atomic
2011-07-20 - Some condition var implementations
2011-07-20 - Some notes on condition vars
2011-07-24 - A cond_var that's actually atomic - part 2
2011-07-25 - Semaphore from CondVar
2011-07-26 - Implementing Event WFMO
2011-07-29 - A look at some bounded queues
2011-07-29 - Semaphore Work Counting issues
2011-07-29 - Spinning
2011-07-30 - A look at some bounded queues - part 2
2011-07-31 - An example that needs seq_cst -
2011-08-01 - Double checked wait
2011-08-01 - A game threading model
2011-08-01 - Non-mutex priority inversion
2011-08-09 - Threading Links
2011-11-28 - Some lock-free rambling
2011-11-30 - Basic sketch of Worker Thread system with dependencies
2011-11-30 - Some more Waitset notes
2011-12-03 - Worker Thread system with reverse dependencies
2011-12-05 - Surprising Producer-Consumer Failures
2011-12-08 - Some Semaphores
2012-03-06 - The Worker Wake and Semaphore Delay Issue
2012-05-30 - On C++ Atomic Fences
2012-05-31 - On C++ Atomic Fences Part 2
2012-06-01 - On C++ Atomic Fences Part 3
06-01-12 | On C++ Atomic Fences Part 3
Finally a small note of confusion. There are cases where I think "what I need is a #StoreLoad",
but a seq_cst fence doesn't work, and changing the load to an RMW does work. Let's try to look into
one of those cases a bit.
The following code, as written, does not work. The problem is the interaction of spots (*1) and (*2).
struct futex_mutex3
{
std::atomic<int> m_state; // mutex locked flag
// waitset :
HANDLE m_handle; // actual wait event
atomic<int> m_count; // waiter count
/*************/
futex_mutex3() : m_state(0), m_count(0)
{
m_handle = CreateEvent(NULL,0,0,NULL);
}
~futex_mutex3()
{
CloseHandle(m_handle);
}
void lock()
{
// try to grab the mutex by exchanging in a 1
// if it returns 0, we got the lock
if ( m_state($).exchange(1,rl::mo_acquire) )
{
// prepare_wait :
// add one to waiter count
m_count($).fetch_add(1,mo_acquire);
// (*1)
// double check :
while ( m_state($).exchange(1,rl::mo_acquire) )
{
// wait :
WaitForSingleObject(m_handle, INFINITE);
}
// retire_wait :
m_count($).fetch_add(-1,mo_relaxed);
}
}
void unlock()
{
m_state($).store(0,rl::mo_release);
//notify_one :
// need #StoreLoad before loading m_count
// (*2)
int c = m_count($).load(mo_acquire);
if ( c > 0 )
{
SetEvent(m_handle);
}
}
};
1. thread 0 holds the lock on the mutex
thread 1 calls lock()
2. thread 1 tries to lock the mutex, sees m_state=1 and goes into prepare_wait
3. thread 1 does m_count ++
4. thread 1 tries the exchange again, sees m_state=1 and goes into the wait
thread 0 calls unlock()
5. thread 0 stores a 0 to m_state
6. thread 0 loads m_count and gets 0 (out of date value)
Now, you might think the problem is that the load can act like it hoists above the store.
That is, we know the store happens after the exchange (#4), because the exchange didn't see a zero. Therefore #3 (the inc to count)
must already have happened. But the load at #6 is seeing the value before #3; sure, that's allowed, the load has no ordering
constraint that stops it from moving back in time.
void unlock()
{
m_state($).store(0,rl::mo_release);
//notify_one :
// need #StoreLoad before loading m_count
std::atomic_thread_fence(mo_seq_cst);
int c = m_count($).load(mo_acquire);
if ( c > 0 )
{
SetEvent(m_handle);
}
}
Putting a seq_cst fence between the store and the load, as above, is not enough. The problem, as I understand it, is that a single fence in C++0x doesn't really do what you want. I kind of got at this in Part 2
as well, like you can't just use a fence as a way to publish and have relaxed loads receive it. You need another fence in the
receiving thread, so that the fences can "synchronize with" each other. Also if you go back to Part 1 and look at most of the rules
about fences, they only provide ordering if they have the right kind of object to connect through; you need something to carry a transitive
relationship.
A & B initially zero
thread 0 :
A = 1
#StoreLoad
load B
thread 1 :
B = 1
#StoreLoad
load A
It should not be possible for both threads to load 0. Either one or both threads should see a 1. Now if you
make both #StoreLoads into atomic_thread_fence(seq_cst) then it works - but not because the fence is a #StoreLoad.
It works because the two seq_cst fences must have a definite order against each other in the total order S, and then
that provides reference for all the other ops to be "happens before/after" each other.
thread 0:
A($).store(1,mo_relaxed);
std::atomic_thread_fence(mo_seq_cst,$);
r1($) = B($).load(mo_relaxed);
thread 1:
B($).store(1,mo_relaxed);
std::atomic_thread_fence(mo_seq_cst,$);
r2($) = A($).load(mo_relaxed);
after :
r1+r2 == 1 or 2 (never 0)
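For anyone who wants to poke at this outside Relacy, here is a minimal sketch of the same two-fence test in plain C++11 (std::thread and std::atomic). The function names are hypothetical, not from the original test. A stress loop like this can only fail to find a violation; it proves nothing about the memory model, which is what Relacy is for.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the two-thread test with a seq_cst fence in each thread.
// Returns r1 + r2 ; with both fences in place a result of 0 is not allowed.
int store_buffer_once()
{
    std::atomic<int> A(0), B(0);
    int r1 = -1, r2 = -1;

    std::thread t0([&] {
        A.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = B.load(std::memory_order_relaxed);
    });
    std::thread t1([&] {
        B.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = A.load(std::memory_order_relaxed);
    });
    t0.join();
    t1.join();

    // both loads ran, so each result is 0 or 1
    assert(r1 >= 0 && r2 >= 0);
    return r1 + r2;
}

// Repeats the test and counts how many runs saw both loads return 0.
int count_both_zero(int iterations)
{
    int bad = 0;
    for (int i = 0; i < iterations; ++i)
    {
        if (store_buffer_once() == 0)
            ++bad;
    }
    return bad;
}
```

Calling count_both_zero with a few hundred iterations should always come back with zero bad runs as long as both fences are present.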
(BTW this is actually the classic WFMO case in semi-disguise; you're waiting on the two conditions A and B to
become true; if two separate threads set the conditions, then at least one should see the joint A && B be true,
but that only works with the appropriate #StoreLoad;
see this blog post )
thread 0:
A($).store(1,mo_relaxed);
std::atomic_thread_fence(mo_seq_cst,$); // #StoreLoad ?
r1($) = B($).load(mo_relaxed);
thread 1:
B($).store(1,mo_relaxed);
// load A will be after store B via StoreStore release ordering
r2($) = A($).fetch_add(0,mo_acq_rel);
after :
r1+r2 == 1 or 2 (never 0)
Here thread 1 uses an RMW to try to ensure StoreLoad ordering, so we get rid of the fence.
This code no longer works (r1+r2 can be 0). Now the B.load in thread 0 can hoist above the B.store in thread 1. The reason
is that the seq_cst fence in thread 0 is not acting as a StoreLoad any more because it has nothing to synchronize against
in the other thread.
With a seq_cst fence in both lock() and unlock(), as in the version below, the two fences allow you to set up a transitive relationship - like the m_count.load in unlock() is definitely
after the fence in unlock, the fence in unlock is after the fence in lock, and the fence in lock is after the m_count.fetch_add ;
therefore the m_count.load must see count > 0.
struct futex_mutex3
{
std::atomic<int> m_state; // mutex locked flag
// waitset :
HANDLE m_handle; // actual wait event
atomic<int> m_count; // waiter count
/*************/
futex_mutex3() : m_state(0), m_count(0)
{
m_handle = CreateEvent(NULL,0,0,NULL);
}
~futex_mutex3()
{
CloseHandle(m_handle);
}
void lock()
{
// try to grab the mutex by exchanging in a 1
// if it returns 0, we got the lock
if ( m_state($).exchange(1,rl::mo_acquire) )
{
// prepare_wait :
// add one to waiter count
m_count($).fetch_add(1,mo_relaxed);
// (*1)
std::atomic_thread_fence(mo_seq_cst,$);
// double check :
while ( m_state($).exchange(1,rl::mo_acquire) )
{
// wait :
WaitForSingleObject(m_handle, INFINITE);
}
// retire_wait :
m_count($).fetch_add(-1,mo_relaxed);
}
}
void unlock()
{
m_state($).store(0,rl::mo_release);
//notify_one :
// (*2)
// need #StoreLoad before loading m_count
std::atomic_thread_fence(mo_seq_cst,$);
int c = m_count($).load(mo_acquire);
if ( c > 0 )
{
SetEvent(m_handle);
}
}
};
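The two-fence version above is written against Relacy and Win32. As a rough portable sketch of the same waitset pattern, assuming the auto-reset event can be faked with a std::mutex and std::condition_variable (AutoResetEvent, futex_mutex3_portable and run_mutex_demo are hypothetical names), something like this should behave the same way. Building a mutex on top of a std::mutex-backed event is obviously silly in practice; the point is only to show where the two fences sit.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a Win32 auto-reset event (assumption : not the real thing).
class AutoResetEvent
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_signaled = false;
public:
    void set()
    {
        { std::lock_guard<std::mutex> g(m_mutex); m_signaled = true; }
        m_cv.notify_one();
    }
    void wait()
    {
        std::unique_lock<std::mutex> g(m_mutex);
        m_cv.wait(g, [this] { return m_signaled; });
        m_signaled = false; // auto-reset : one waiter consumes the signal
    }
};

struct futex_mutex3_portable
{
    std::atomic<int> m_state{0}; // mutex locked flag
    AutoResetEvent   m_event;    // actual wait event
    std::atomic<int> m_count{0}; // waiter count

    void lock()
    {
        if (m_state.exchange(1, std::memory_order_acquire))
        {
            // prepare_wait :
            m_count.fetch_add(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst); // (*1)
            // double check :
            while (m_state.exchange(1, std::memory_order_acquire))
                m_event.wait();
            // retire_wait :
            m_count.fetch_add(-1, std::memory_order_relaxed);
        }
    }
    void unlock()
    {
        m_state.store(0, std::memory_order_release);
        std::atomic_thread_fence(std::memory_order_seq_cst); // (*2)
        if (m_count.load(std::memory_order_acquire) > 0)
            m_event.set();
    }
};

// Hammers the mutex from several threads and returns the protected counter.
int run_mutex_demo(int num_threads, int iterations)
{
    futex_mutex3_portable mtx;
    int counter = 0; // only touched while holding mtx
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i)
            {
                mtx.lock();
                ++counter;
                mtx.unlock();
            }
        });
    }
    for (std::thread& th : threads)
        th.join();
    return counter;
}
```

If a wakeup were missed the demo would hang rather than return a wrong count, so finishing at all is the interesting part of the check.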
variant "apple" :
void unlock()
{
// release is for the mutex innards
m_state($).exchange(0,rl::mo_acq_rel);
// #StoreLoad is achieved as a #LoadLoad on m_state
int c = m_count($).load(mo_relaxed);
if ( c > 0 )
{
SetEvent(m_handle);
}
}
variant "banana" :
void unlock()
{
// release is for the mutex innards
m_state($).store(0,rl::mo_release);
// #StoreLoad is achieved as a #StoreStore on m_count :
int c = m_count($).fetch_add(0,mo_release); //(*3)
if ( c > 0 )
{
SetEvent(m_handle);
}
}
(variant "apple" only works if the double check on m_state in lock is acq_rel).
05-31-12 | On C++ Atomic Fences Part 2
Last post we talked about what C++0x fences are. Now we'll look at what they are not.
Set up stuff
Barrier
Publish stuff
other threads :
if stuff is published
then reading stuff should see set up
this does *not* work if "Barrier" is a C++0x fence. In particular we can construct this simple example :
atomic<int> x; // stuff
atomic<int> y; // publication flag
x & y initially 0
thread 0 :
// set up stuff :
x($).store(1,mo_relaxed);
// barrier :
std::atomic_thread_fence(mo_seq_cst,$);
// publish :
y($).store(1,mo_relaxed);
thread 1 :
// wait for publication :
rl::backoff bo;
while ( y($).load(mo_relaxed) == 0 )
bo.yield($);
// read it :
int xx = x($).load(mo_relaxed);
RL_ASSERT( xx == 1 );
this does not work (xx can be zero). To make it work in C++0x you need something that can synchronize with the
fence, for example this would work :
thread 1 :
// wait for publication :
rl::backoff bo;
while ( y($).load(mo_relaxed) == 0 )
bo.yield($);
// added -
std::atomic_thread_fence(mo_acquire,$);
// read it :
int xx = x($).load(mo_relaxed);
RL_ASSERT( xx == 1 );
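The working version above, with the added acquire fence in thread 1, can be sketched in plain C++11 roughly like this. publish_with_fences is a hypothetical name, and std::this_thread::yield stands in for the Relacy backoff.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes x through y using relaxed atomics plus a fence on each side.
// Returns the value of x that the reader saw ; 1 is the only allowed result.
int publish_with_fences()
{
    std::atomic<int> x(0); // stuff
    std::atomic<int> y(0); // publication flag
    int seen = -1;

    std::thread writer([&] {
        x.store(1, std::memory_order_relaxed);               // set up stuff
        std::atomic_thread_fence(std::memory_order_seq_cst); // barrier (a seq_cst fence is also a release fence)
        y.store(1, std::memory_order_relaxed);               // publish
    });
    std::thread reader([&] {
        while (y.load(std::memory_order_relaxed) == 0)       // wait for publication
            std::this_thread::yield();
        std::atomic_thread_fence(std::memory_order_acquire); // added - synchronizes with the writer's fence
        seen = x.load(std::memory_order_relaxed);            // read it
    });
    writer.join();
    reader.join();

    assert(seen != -1); // the reader always runs to completion
    return seen;
}
```

Take the acquire fence out of the reader and the standard no longer promises the 1, even though real hardware will usually hand it to you anyway.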
Why does it work on real CPU's then? (assuming the "relaxed" loads are at least C++ volatile so they go to memory
and all that, but not otherwise ordered). On all current real CPU's a memory barrier sends a message to all other CPU's
which creates a sync point for all of them (not exactly true, but effectively true).
When thread 1 sees that y is non-zero, then the store to y on thread 0
must have happened, which means the barrier must have happened, so x must be set up, and our load of x occurs after the
barrier, so it must see the set up value. That is, the barrier forms a sync point on *all* threads, not just the
originator, and you don't necessarily need your own fence to tack into that sync, all you need to do is have a way
of connecting a "happens before/after" to it. In this case we can say :
thread 1 reads x after
thread 1 loads y with value == 1 after
thread 0 stores y with value == 1 after
thread 0 does a barrier after
thread 0 stores x
The following is an example where if "Barrier" is an actual CPU barrier, the code works. But if "Barrier" acts like
a C++0x seq_cst fence (and no more), then it doesn't work. (it doesn't work in the sense that it doesn't
actually provide mutual exclusion, eg. the code doesn't do what you expect it to).
struct test7 : rl::test_suite<test7, 2>
{
// the mutex :
std::atomic<int> m_flag[2];
std::atomic<int> m_turn;
// something protected by the mutex :
std::atomic<int> m_data;
void before()
{
m_flag[0]($).store(0);
m_flag[1]($).store(0);
m_turn($).store(0);
m_data($).store(0);
}
void after()
{
int d = m_data($).load();
RL_ASSERT( d == 42 );
}
void lock(int tidx)
{
int other = tidx^1;
m_flag[tidx]($).store(1,std::mo_relaxed);
m_turn($).store(other,std::mo_release);
// ** Barrier here **
rl::backoff bo;
for(;;)
{
int f = m_flag[other]($).load(std::mo_acquire);
int t = m_turn($).load(std::mo_relaxed);
if ( f && t == other )
{
bo.yield($);
continue;
}
break;
}
}
void unlock(unsigned tidx)
{
m_flag[tidx]($).store(0,std::mo_release);
}
void thread(unsigned tidx)
{
lock(tidx);
// do m_data += 7; m_data *= 2;
int d = m_data($).load(std::mo_relaxed);
m_data($).store(d+7,std::mo_relaxed);
d = m_data($).load(std::mo_relaxed);
m_data($).store(d*2,std::mo_relaxed);
unlock(tidx);
}
};
thread 0 : starts lock
thread 0 : flag[0] = 1
thread 0 : turn = 1
thread 1 : starts lock
thread 1 : flag[1] = 1
thread 1 : turn = 0 ; (overwrites the previous turn=1)
thread 1 : barrier
thread 1 : load flag[0] ; sees 0 - old value (*!)
thread 0 : barrier
thread 0 : load flag[1] ; sees 1
thread 0 : load turn ; sees 0
(*!) is the problem point. For the code to work we must load a 1 there (which thread 0 set already). On a
normal CPU you could say :
load flag[0] is after
barrier, which is after
store to turn = 0, which is after
store to turn = 1, which is after
store flag[0] = 1
therefore load flag[0] must see a 1. (I know the store to turn of 0 is after the store of 1 because the
later load of turn on thread 0 sees a 0).
void lock(int tidx)
{
int other = tidx^1;
m_flag[tidx]($).store(1,std::mo_relaxed);
m_turn($).exchange(other,std::mo_acq_rel); // changed to RMW
rl::backoff bo;
for(;;)
{
int f = m_flag[other]($).load(std::mo_acquire);
int t = m_turn($).load(std::mo_relaxed);
if ( f && t == other )
{
bo.yield($);
continue;
}
break;
}
}
and that works (it's the same thing described here :
Peterson's lock with C++0x atomics - Just Software Solutions
among other places).
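A minimal sketch of the RMW version in plain C++11, outside Relacy, might look like the following. PetersonLock and run_peterson_demo are hypothetical names; the memory orders are copied from the lock() above, and the protected data does the same "+= 7 then *= 2" as the test.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two-thread Peterson lock where the store to turn is an RMW (exchange).
struct PetersonLock
{
    std::atomic<int> m_flag[2];
    std::atomic<int> m_turn;

    PetersonLock() : m_turn(0)
    {
        m_flag[0].store(0);
        m_flag[1].store(0);
    }

    void lock(int tidx)
    {
        assert(tidx == 0 || tidx == 1);
        int other = tidx ^ 1;
        m_flag[tidx].store(1, std::memory_order_relaxed);
        m_turn.exchange(other, std::memory_order_acq_rel); // RMW instead of store + barrier
        while (m_flag[other].load(std::memory_order_acquire) &&
               m_turn.load(std::memory_order_relaxed) == other)
        {
            std::this_thread::yield();
        }
    }

    void unlock(int tidx)
    {
        m_flag[tidx].store(0, std::memory_order_release);
    }
};

// Each thread does data += 7 ; data *= 2 under the lock.
// With mutual exclusion the result is ((0+7)*2 + 7)*2.
int run_peterson_demo()
{
    PetersonLock lock;
    std::atomic<int> data(0);

    auto body = [&](int tidx) {
        lock.lock(tidx);
        int d = data.load(std::memory_order_relaxed);
        data.store(d + 7, std::memory_order_relaxed);
        d = data.load(std::memory_order_relaxed);
        data.store(d * 2, std::memory_order_relaxed);
        lock.unlock(tidx);
    };

    std::thread t0(body, 0);
    std::thread t1(body, 1);
    t0.join();
    t1.join();
    return data.load();
}
```

A single run like this will basically never catch an ordering bug on x86, where the exchange is a full barrier anyway; it just shows the shape of the code.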
Subtle difference between C++0x MM and other MMs - Page 3
std::atomic_thread_fence - cppreference.com
Relacy finds fence bugs in spsc
Re questions about memory_order_seq_cst fence 2
Re questions about memory_order_seq_cst fence 1
Implementing Dekker's algorithm with Fences Just Software Solutions - Custom Software Development and Website Development in
C++0x sequentially consistent atomic operations - comp.programming.threads Google Groups 2
C++0x sequentially consistent atomic operations - comp.programming.threads Google Groups 1
05-30-12 | On C++ Atomic Fences
C++0x's atomic_thread_fence is weird. Preshing asked some questions and pointed out some errors in
this blog post which
has got me to look into it again.
if ( atomic_ticket.load( mo_relaxed ) == me )
{
std::atomic_thread_fence( mo_acquire ); // make previous load act like acquire
... do stuff on my ticket ...
}
is faster than :
if ( atomic_ticket.load( mo_acquire ) == me )
{
... do stuff on my ticket ...
}
this can be used for example with a ref counting destructor; you don't actually need the "acquire" until the refs go to zero. Which
brings us to the next note :
An "acquire" fence can make a preceding load act like a load_acquire
A "release" fence can make a following store act like a store_release
(an acq_rel fence obviously does both)
A "seq_cst" fence provides an entry in the program total order ("S")
then preceding loads & following stores can be located relative to that point in the order S
(eg. either "happens before" or "happens after")
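The ref counting case mentioned above is the usual place the first rule gets used. A minimal sketch, assuming a hand-rolled intrusive count (RefCounted, release_ref and g_destroyed are hypothetical names, and g_destroyed exists only so the demo can be checked):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> g_destroyed(0); // counts destructor calls, for the demo only

struct RefCounted
{
    std::atomic<int> refs;

    explicit RefCounted(int initial_refs) : refs(initial_refs) {}
    ~RefCounted() { g_destroyed.fetch_add(1, std::memory_order_relaxed); }
};

// Drops one reference. Only the thread that takes the count to zero
// pays for the acquire.
void release_ref(RefCounted* obj)
{
    // release : whatever this thread wrote to *obj is published before the count drops
    if (obj->refs.fetch_sub(1, std::memory_order_release) == 1)
    {
        // makes the preceding RMW act like an acquire, so the deleting thread
        // sees the writes of every other owner
        std::atomic_thread_fence(std::memory_order_acquire);
        delete obj;
    }
}

// Shares one object between several threads ; returns how many times it was destroyed.
int run_refcount_demo(int num_threads)
{
    assert(num_threads > 0);
    g_destroyed.store(0);
    RefCounted* obj = new RefCounted(num_threads);

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back(release_ref, obj);
    for (std::thread& t : threads)
        t.join();

    return g_destroyed.load();
}
```

All the decrements that don't hit zero get away with release only, which is the whole point of splitting the acquire out into a fence.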
Actually the fences are rather like the old-school way of marking up memory ordering constraints. eg. instead of :
x.load( mo_acquire ); // #LoadLoad follows acquire load
you used to write :
x.load();
#LoadLoad
which is more like the fence method in C++0x :
x.load( mo_relaxed );
fence( mo_acquire ); // #LoadLoad
Errkay so let's get into more specifics.
A is a release fence
X is an atomic op on M, X modifies M, after A
Y is an atomic op on M, Y reads the value set at X, before B
B is an acquire fence
is this :
m_x and m_y initially zero
void thread(unsigned tidx)
{
if ( tidx == 0 )
{
m_x($).store(1,std::mo_relaxed);
std::atomic_thread_fence(std::mo_release,$); // "A"
m_y($).store(1,std::mo_relaxed); // "X" , m_y is "M"
}
else
{
while ( m_y($).load(std::mo_relaxed) == 0 )
{
}
// we just read a 1 from m_y // "Y"
std::atomic_thread_fence(std::mo_acquire,$); // "B"
int x = m_x($).load(std::mo_relaxed);
RL_ASSERT( x == 1 );
}
}
Roughly what this says is the "release" ordering of a fence synchronizes with the "acquire" ordering of another fence if there is a
shared variable ("M") that connects the two threads, after the release and before the acquire. Or, if you like, the release fence
before the store to M makes the store act like a store-release, and the acquire fence after the load of M makes the load of M act like
a load-acquire.
A is a release fence
X is an atomic op on M, X modifies M, after A
A synchronizes with B :
B is an atomic op on M, reads the value written by X
void thread(unsigned tidx)
{
if ( tidx == 0 )
{
m_x($).store(1,std::mo_relaxed);
std::atomic_thread_fence(std::mo_release,$); // "A"
m_y($).store(1,std::mo_relaxed); // "X" , m_y is "M"
}
else
{
while ( m_y($).load(std::mo_relaxed) == 0 )
{
}
int y = m_y($).load(std::mo_acquire); // "B" synchronizes with "A"
RL_ASSERT( y == 1 );
int x = m_x($).load(std::mo_relaxed);
RL_ASSERT( x == 1 );
}
}
This just says the same thing but that you can substitute a load-acquire for the acquire fence. That is, a release fence
can synchronize with a load-acquire just as if it was a store-release.
A is an atomic release op on object M
X is an atomic op on M, before B, reads from A
B is an acquire fence
void thread(unsigned tidx)
{
if ( tidx == 0 )
{
m_x($).store(1,std::mo_relaxed);
m_y($).store(1,std::mo_release); // "A"
}
else
{
while ( m_y($).load(std::mo_relaxed) == 0 )
{
}
// we just read a 1 from m_y // "X"
std::atomic_thread_fence(std::mo_acquire,$); // "B"
int x = m_x($).load(std::mo_relaxed);
RL_ASSERT( x == 1 );
}
}
Again the same thing, just saying an acquire fence can synchronize with a store-release.
A modifies M , before X
X is a seq_cst fence
B reads M , after X
Then B cannot see a value of M from before A.
Well, duh.
Note that of course this is only non-trivial when A and B are done on different threads. And B being "after X" usually
means B is after something else in the total order S which you know to be after X.
A modifies M , before X
X is a seq_cst fence
Y is a seq_cst fence, Y is after X
B reads M , after Y
A is an RMW on M , before X
X is a seq_cst fence
Y is a seq_cst fence, Y is after X
B is an RMW on M , after Y
then B is after A on the RMW order of M.
X is a seq_cst fence, before A
A is an RMW on M
B is an RMW on M, after A
Y is a seq_cst fence, after B
then Y is after X in the total order S
which presumably is also true.
05-27-12 | Prefer Push-Restore to Push-Pop
It's a standard paradigm to use a Stack for a subsystem which parallels the execution stack.
Func1()
{
s_depth ++; // push
... other stuff ..
Func2()
s_depth --; // pop
}
Now, sophomoric programmers may already be thinking you could use a "scoper" class to do the push/pop for you;
as long as you use the scoper that ensures that you do push/pops in pairs, and it eliminates one type of bug, which is
accidentally returning without popping.
Func1()
{
int depthBefore = s_depth;
s_depth ++; // push
... other stuff ..
Func2()
s_depth --; // pop
ASSERT( s_depth == depthBefore );
}
The assert checks that after the pop, the stack should be back to where it was before the push.
Func1()
{
int depthBefore = s_depth++; // push
... other stuff ..
Func2()
s_depth = depthBefore; // restore
}
A few quick notes on this : 1. Obviously for super-robust code you should run both methods simultaneously and check that they match; in case of
mismatch, perhaps prompt the user (footnote *1*), or just prefer the more robust, and 2. it's always a good idea to write out the asserts that
check the invariants you believe to be true in the code. Often you will find that the assert is simpler than the code it was checking. 3. the
code is more robust if you just go ahead and make the invariant true. eg. you often see something like :
int Func3(int x)
{
ASSERT( x == 1 || x == 3 );
int y = x * 4 + 12;
ASSERT( y == 16 || y == 24 );
return y;
}
well if you know the parameters and you know what the answer should be, then just fucking return that. If you require a simple invariant, then just
fucking make it true. Don't assume it's true without testing it. Asserting is slightly better, but it's still fragile. Just make it true.
In the code below, as noted in the comment, I assume you actually do something with the block once it's recorded, but the details
of that are not necessary here.
struct ProfBlock { tick_t start; tick_t end; };
vector<ProfBlock> s_profile_vec;
void Profile_Push()
{
s_profile_vec.push_back(ProfBlock());
s_profile_vec.back().start = tick();
}
void Profile_Pop()
{
s_profile_vec.back().end = tick();
// [start,end] for this block is now set, as is stack depth, store it somewhere permanent
s_profile_vec.pop_back();
}
Func1()
{
Profile_Push();
... other stuff ..
Func2()
Profile_Pop();
}
int Profile_Push()
{
int i = s_profile_vec.size();
s_profile_vec.push_back(ProfBlock());
s_profile_vec.back().start = tick();
return i;
}
void Profile_Restore(int i)
{
s_profile_vec[i].end = tick();
// [start,end] for this block is now set, as is stack depth, store it somewhere permanent
s_profile_vec.resize(i);
}
Func1()
{
int p = Profile_Push();
... other stuff ..
Func2()
Profile_Restore(p);
}
If you like you can assert in Restore that it's equivalent to a Pop. But crucially, if you are doing the redundant method and
asserting, the fallback behavior should be Restore. (in practice for me, once I realized that Push-Restore is what I really want,
that opened the door for me to allow Pushes without Pops, so I don't check that Restore is equal to a Pop, I let them be unequal).
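A minimal sketch of what that buys you, written against std::vector with a fake tick() so it is self-contained (s_fake_tick, ProfileScope and depth_after_unbalanced_scope are hypothetical names, not part of the profiler above). The scoper does Push-Restore, so a stray push inside the scope with no matching pop gets cleaned up when the outer scope restores:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ProfBlock { long start; long end; };

static std::vector<ProfBlock> s_profile_vec;
static long s_fake_tick = 0; // stand-in clock, just for the sketch

static long tick() { return ++s_fake_tick; }

std::size_t Profile_Push()
{
    std::size_t i = s_profile_vec.size();
    ProfBlock block = { tick(), 0 };
    s_profile_vec.push_back(block);
    return i;
}

void Profile_Restore(std::size_t i)
{
    assert(i < s_profile_vec.size());
    s_profile_vec[i].end = tick();
    // [start,end] for this block is now set ; store it somewhere permanent here
    s_profile_vec.resize(i); // drops this block and anything pushed above it
}

// Scoper that restores to the depth it saw at construction.
struct ProfileScope
{
    std::size_t m_index;
    ProfileScope() : m_index(Profile_Push()) {}
    ~ProfileScope() { Profile_Restore(m_index); }
};

// Returns the stack depth after a scope in which an inner push is never popped.
std::size_t depth_after_unbalanced_scope()
{
    {
        ProfileScope outer;
        Profile_Push(); // push with no matching pop or restore
    }
    return s_profile_vec.size();
}
```

With Push-Pop the leaked inner push would leave the stack one deep forever; with Restore the depth comes back to where the outer scope started.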
struct Blah
{
int m_x;
int m_y;
// NOTE : m_y must = m_x * 2;
};
At least this example code has a note about the redundant relationship that must be kept in sync, which is better than lots
of real production code, but it's still just a bug waiting to happen. The redundant state should be eliminated and any
queries for m_y should just go directly to the single variable that is authoritative.
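As a small sketch of the non-redundant version (the accessor y() and the helper doubled_after_update are hypothetical names), m_y simply stops being stored:

```cpp
#include <cassert>

struct Blah
{
    int m_x; // the single authoritative variable

    // m_y is derived on demand, so it cannot get out of sync with m_x
    int y() const { return m_x * 2; }
};

// Changes m_x and returns the derived value ; nothing else needs updating.
int doubled_after_update(int new_x)
{
    Blah b = { 1 };
    assert(b.y() == 2);
    b.m_x = new_x;
    return b.y();
}
```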
05-10-12 | Fix My Browser
I cannot for the life of me figure out how to do this. I want two (or three) things :
Problems with File Association in Windows 7 64-bit - Microsoft Answers
OpenWithView - DisableEnable items in the 'Open With' dialog-box of Windows
Index of filesdips64
FileTypesMan - Alternative to 'File Types' manager of Windows
Can't get program into OPEN WITH list
05-06-12 | Photos : Some Carpentry
I'm a total beginner, but I'm getting better. I find this to be pretty fun and rewarding; as long as it's
not something that's on my life critical path, or that I have to do in a hurry, I enjoy it. It's fun to go
out to the garage by myself and play some music and figure out how to do things. I'm sort of vaguely thinking about taking some classes, but then it
starts to get too serious and I stop enjoying it.
05-06-12 | Photos : Pots from Q1 2012
My third quarter of classes ; Q1 2012 which ended in early April. It was a good quarter, I made a lot of progress
throwing bigger and thinner. All the work from that quarter :
05-06-12 | Photos : Hawaii 2011
I keep sort of day dreaming about getting a house in Hawaii for frequent getaways. I'm not sure it's a good idea.
Owning a house is a fucking pain in the ass; owning multiple is just extra pain. Maybe it could be made super low
maintenance with a steel roof and only native/natural/xeriscape plantings. I dunno, whenever I rent a Hawaii house
I always think about the shit I don't like about it (I think the vast majority of houses in Hawaii are fucking gross
and people just don't "get it" ; see below), but man not having to deal with the house is such a huge advantage.
05-06-12 | Photos : Almost spring
This post is from back in March.
05-04-12 | Toilet
WTF is wrong with you people who go and sit on the toilet for half an hour? If nothing is coming out,
get the fuck up and get out of there. The only reason to park yourself on the toilet is if you have
diarrhea (and I don't hear any pllfft coming from the stall so I know that's not it).
If you're constipated there's no need to sit there, get the fuck up and get back to work;
and hell if you're constipated on a daily basis, maybe you should think about eating a fucking vegetable
once in a while.
05-03-12 | Doing Work For Another
It's very pleasurable to do some hard work for someone else, when you can do something for them, push yourself,
work at it for a while, and you feel like they really appreciate it. It's one of the principal pleasures of
the human pack, it's part of what makes being single so horrible. (when you're single all the hard work you're
doing is only for yourself, and you inevitably have to ask yourself "why am I doing this? I don't really care"
and you wind up putting on a bath robe and becoming The Dude).
Player 1 has a hand which contains either the "I do it for you" card or the "I do it for myself" card
Player 1 makes the offer which reveals some information in subtext
Player 2 has to make an estimate of 1's holdings; eg. based on the limited information I got, I
estimated they have "for you" 20% of the time and "for me" 80% of the time
Player 2 says yes or no
Player 1 asks about the details ; this reveals a little bit more information
and player 2 revises their estimates.
Player 2 must now consider their various possible responses;
fold = "forget it"
call = "do it your way"
raise = "actually please do it this specific way"
What you have to do is a weighted EV analysis; that is,
for each possible action I can take, what is the outcome in each of the hidden cases ("for you" or "for myself")
then weight the value by my current estimate of what the underlying truth is
People with very sophisticated social intelligence are actually doing this kind of analysis all the time, though I
think they don't realize it. (I've written a rant very similar to this before).
04-09-12 | Old Image Comparison Post Gathering
Perceptual Metrics, imdiff, and such. Don't think I ever did an "index post" so here it is :
01-17-11 - ImDiff Release
01-12-11 - ImDiff Sample Run and JXR test
01-10-11 - Perceptual Results - PDI
01-10-11 - Perceptual Results - mysoup
01-10-11 - Perceptual Results - Moses
01-10-11 - Perceptual Metrics
01-10-11 - Perceptual Metrics Warmup - x264 Settin...
01-10-11 - Perceptual Metrics Warmup - JPEG Settin...
12-11-10 - Perceptual Notes of the Day
12-09-10 - Rank Lookup Error
12-09-10 - Perceptual vs TID
12-06-10 - More Perceptual Notes
12-02-10 - Perceptual Metric Rambles of the Day
11-18-10 - Bleh and TID2008
11-16-10 - A review of some perceptual metrics
11-08-10 - 709 vs 601
11-05-10 - Brief note on Perceptual Metric Mistakes
10-30-10 - Detail Preservation in Images
10-27-10 - Image Comparison - JPEG-XR
10-26-10 - Image Comparison - Hipix vs PDI
10-22-10 - Some notes on Chroma Sampling
10-18-10 - How to make a Perceptual Database
10-16-10 - Image Comparison Part 9 - Kakadu JPEG2000
10-16-10 - Image Comparison Part 11 - Some Notes on the Tests
10-16-10 - Image Comparison Part 10 - x264 Retry
10-15-10 - Image Comparison Part 8 - Hipix
10-15-10 - Image Comparison Part 7 - WebP
10-15-10 - Image Comparison Part 6 - cbwave
10-14-10 - Image Comparison Part 5 - RAD VideoTest
10-14-10 - Image Comparison Part 4 - JPEG vs NewDCT
10-14-10 - Image Comparison Part 3 - JPEG vs AIC
10-14-10 - Image Comparison Part 2
10-12-10 - Image Comparison Part 1