Multithreaded Programming – Mutexes

Multithreaded programming is difficult.  The problem is that as soon as you introduce multiple threads, the flow of your program is no longer linear.  Race conditions arising from multithreaded programming are also some of the toughest bugs to find, reproduce, and fix.  No wonder most people avoid multithreaded programming!  Some people even suggest avoiding this practice all together [reference].  Don’t worry, I’m not here to talk about problems, I’m here to talk about solutions :)

The most common reason I’ve had for doing multithreaded programming is to deal with truly parallel data.  In my case, I have dealt with processing multiple live MPEG-2 Transport streams and providing access to the resulting data over a webservice.  This problem was encountered when developing the Terrier product at LucidHelix Solutions.

Synchronize with Mutexes?
In order to syncrhonize all of these threads we started by using Mutexes to protect shared resources.  This worked well until the volume of data increased and the number of mutex operations started to get into the 1000′s of locking and unlocking operations per second.  At that point everything ground to a halt because everytime you use a mutex a context switch is required to access the underlying kernel synchronization construct!

We quickly discovered that we could not use mutex lock acquisition to protect the resources, but we needed some way to synchronize the data.

FIFOs and Spin locks to the Rescue!
At this point we decided that it was necessary to take a more intelligent look at how the data was being processed and see if we could avoid the need for synchronization.  Having worked with FPGAs in the past, we realized that the multithreading problem is similar to the problem of crossing clock-domains in FPGAs.  The standard solution to the clock-domain crossing problem is a FIFO.  The FIFO is divided into 2 pieces, allowing one thread or clock-domain to feed the input and read the free space available and one thread tor clock-domain to pull the output and read the amount of data in the FIFO.  The key to this working is that no locking is required to write into the FIFO when there is free space, and no locking is requried to read data from the FIFO when there is data in it.

This turned out to be a great way to pass data from the multiple threads reading data to the threads processing that data.  Because the data was continuous the threads processing data use a spinlock to wait for data to come available, which turned out to be significantly less overhead than any kernel-level synchronization

What about the Webservice?
The other synchronization issue came with the need to access the data being processed from the multiple webservice threads.  This problem was different because data being synchronized was not a continuous stream of data, but rather a set of changing data structures.  The webservice would only ever have 10′s of requests per second so the overhead of synchronization on those threads was much lower.  The solution involves a few instance variables on the data processing thread:

  • shouldCopy – indicates whether or not the live data should be copied to a buffer for the webservice thread if an update happens, or if we should wait
  • updateAvailable – indicates whether the latest data available is newer than the copy made available to the webservice
  • currentlyCopying – set (if shouldCopy is 1) before copying data to the webservice buffer and cleared afterward

The webservice threads then simply do the following to ensure the copy available to the webservice threads is never overwritten while being read:

  1. Wait on a mutex shared by all webservice threads so only one thread at a time can be accessing the mutex.
  2. Set shouldCopy to 0
  3. SpinLock until currentlyCopying is 0
  4. Use the data
  5. Set shouldCopy to 1
  6. Release the mutex

This means that the data processing thread, which accesses the data 1000′s of times per second, never uses any kernel-level synchronization constructs.  Meanwhile the webservice thread, which accesses the data at most 10′s of times per second, uses a single Mutex to synchronize access between itself and other webservice threads.

Conclusion
The lesson learned from all of this for us was that multithreading is a problem that needs to be carefully approached and that with some careful planning one can minimize the amount of synchronization overhead.  By minimizing synchronization you maximize the speed of your software as we see the number of cores and processors in machines increasing, while clock speeds remain stationary.

This entry was posted in Algorithms. Bookmark the permalink.

2 Responses to Multithreaded Programming – Mutexes

  1. Pingback: Alexander1

  2. Pingback: Alexander7

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>