Thread Synchronization

I came across a good tutorial post of thread synchronization tonight, http://www.tutorialsdaddy.com/2015/10/understanding-kernel-synchronization/. Reading this covered the standard problems and measures to prevent the problems of race conditions, simultaneous access, and hard-to-diagnose timing errors. It was good for what it is. My issue is that one of the best and perhaps least used, most universally applicable, simplest to understand, and most rarely taught in college.

How do I know it isn’t taught? By interviewing graduates for positions at my company. It doesn’t matter if they were from a prestigious university or community college, Computer Science majors or Electrical Engineers, male or female, I didn’t find interviewees who understood the problem, or knew the simplest solution.

If you aren’t familiar with the problem, read the link above. It describes enough aspects of the problem to be useful.

To prevent data-induced timing problems without an additional locking mechanism, it is critical that only one thread can write to each variable. If only one thread will ever write, then there can be no conflict over invalid state or lost transitions. Many may read, but of writers there can be only one.

The best way to do this is with a circular buffer.

A circular buffer is a region of memory, usually contiguous, with two associated pointers. One pointer, the “out” pointer, points to the oldest entry containing data. The other, the “in” pointer, indicates the first empty space ready to receive data. If the pointers are equal, there is no data. To add data to the buffer, simply write the data to the location indicated by the “in” pointer, and then increment it. To remove data from the buffer, copy the data from the “out” pointer, then increment it. In both cases. the increment wraps to the from of the buffer it would go past the end.

This works on any processor architecture that provides coherent write — where the whole pointer is written in one cycle. This is true on most 32 bit processors1. This scheme does not require an atomic read-modify-write cycle. The pointers must be accessible to both the writer and the reader. The writer and reader may be threads, interrupt handlers, or even two separate processors — even asymmetric processors sharing a memory system.

What if one processor is an 8-bit processor, and can not atomically write pointers? Change the pointers to 8-bit indices, which can be atomically written.

What if the data is larger than a byte? Increment by the data size.

What if the data is variable size? Write what is needed, and increment to just after the data.

An interesting variation is a case where there is no data at all, other than the “pointers”. An example would be an event sender and an event receiver. Sending an event turns into an increment of the “in” counter. Receiving the event is an increment of the “out” counter. To know how many events are outstanding, subtract the “out” from the “in”. Notice that a counter violates the one-reader/one-writer criteria, but two counters does not.

Circular buffers are worth considering as a first solution. If a circular buffer isn’t powerful enough, then consider locks, mutexes, spinlocks, or other more exotic solutions. Or, re-frame the problem so that a circular buffer is exactly powerful enough.

This could be a fourth rule of programming.


Footnotes:

1) It may be worth checking the machine code generated by the compiler. A 32-bit Pentium-Pro compiler (Metaware HI-C), optimized some 32-bit writes to be performed as two 16-bit writes. This gave better concurrent use of the two integer units and memory path, but may have been applicable to a limited set of x86 processor implementation. Bottom line: it is good to assure that writes are indeed atomic.