Mutex and Semaphores

So , what are mutex and semaphores ?What is the difference between mutex and semaphores ? And,what is the difference between a mutex and a binary semaphore.

These are the questions we are going to explore in this article.

To start with , let me quote the famous Bathroom example which is a simple way to remember the difference between mutex and semaphore :

Mutex vs. Semaphore, what is the difference?

The Toilet Example  (c) Copyright 2005, Niclas Winquist 😉

Mutex:

Is a key to a toilet. One person can have the key – occupy the toilet – at the time. When finished, the person gives (frees) the key to the next person in the queue.

Officially: “Mutexes are typically used to serialise access to a section of  re-entrant code that cannot be executed concurrently by more than one thread. A mutex object only allows one thread into a controlled section, forcing other threads which attempt to gain access to that section to wait until the first thread has exited from that section.”
Ref: Symbian Developer Library

(A mutex is really a semaphore with value 1.)

Semaphore:

Is the number of free identical toilet keys. Example, say we have four toilets with identical locks and keys. The semaphore count – the count of keys – is set to 4 at beginning (all four toilets are free), then the count value is decremented as people are coming in. If all toilets are full, ie. there are no free keys left, the semaphore count is 0. Now, when eq. one person leaves the toilet, semaphore is increased to 1 (one free key), and given to the next person in the queue.

Officially: “A semaphore restricts the number of simultaneous users of a shared resource up to a maximum number. Threads can request access to the resource (decrementing the semaphore), and can signal that they have finished using the resource (incrementing the semaphore).”
Ref: Symbian Developer Library

The author thus concludes that the correct use of a semaphore is for signaling from one task to another. A mutex is meant to be taken and released, always in that order, by each task that uses the shared resource it protects.

However , there is a controversial statement here “(A mutex is really a semaphore with value 1.)” .

As I see, even author has updated this with info “(* – Please note, that some web posts indicate, that this statement is not quite accurate”

And that is what , we are going to discuss here further in detail.

This blog post on semaphores on Feabhas  take this statement heads on .It also contains the history information of semaphores which I also intended to visit here , to clarify the difference.

The blog post says :

Back in 1965, Edsger Dijkstra, a Dutch computer scientist, introduced the concept of a binary semaphore into modern programming to address possible race conditions in concurrent programs. His very simple idea was to use a pair of function calls to the operating system to indicate entering and leaving a critical region. This was achieved through the acquisition and release of an operating system resource called a semaphore. In his original work, Dijkstra used the notation of P & V, from the Dutch words Prolagen (P), a neologism coming from To try and lower, and Verhogen (V) To raise, To increase.

With this model the first task arriving at the P(S) [where S is the semaphore] call gains access to the critical region. If a context switch happens while that task is in the critical region, and another task also calls on P(S), then that second task (and any subsequent tasks) will be blocked from entering the critical region by being put in a waiting state by the operating system. At a later point the first task is rescheduled and calls V(S) to indicate it has left the critical region.The second task will now be allowed access to the critical region.

A variant of Dijkstra’s semaphore was put forward by another Dutchman, Dr. Carel S. Scholten. In his proposal the semaphore can have an initial value (or count) greater than one.This enables building programs where more than one resource is being managed in a given critical region. For example, a counting semaphore could be used to manage the parking spaces in a robotic parking system. The initial count would be set to the initial free parking places. Each time a place is used the count is decremented. If the count reaches zero then the next task trying to acquire the semaphore would be blocked (i.e. it must wait until a parking space is available). Upon releasing the semaphore (A car leaving the parking system) the count is incremented by one.

Scholten’s semaphore is referred to as the General or Counting Semaphore, Dijkstra’s being known as the Binary Semaphore.

The article goes on to discuss some dangers of using semaphores which we’d discuss in a later post.

Also , if you go on to read the comments of this blog post , you’d notice that Michael Barr has pointed out that mutex is what is required in the parking space example. And that’s what we’d be discussing shortly when we reach to the part where we delve into Michael Barr’s article.

Meanwhile , this second post of Feabhas on mutex defines the mutex as :

The mutex is similar to the principles of the binary semaphore with one significant difference: the principle of ownership. Ownership is the simple concept that when a task locks (acquires) a mutex only it can unlock (release) it. If a task tries to unlock a mutex it hasn’t locked (thus doesn’t own) then an error condition is encountered and, most importantly, the mutex is not unlocked. If the mutual exclusion object doesn’t have ownership then, irrelevant of what it is called, it is not a mutex.

Further , article states how mutex overcomes the dangers of semaphores.As I mentioned,we’d discuss that in a separate article as this one deals with the difference between mutex and semaphore.

The author has also addressed the different points raised in the comments on his part 1 Semaphores , by giving examples on how different OSes have realized the mutex and semaphore. After all, it all comes down to what OS designer thinks what’d work best for him.

Part 3 of the series covers the problems which even a mutex does not address.

Going further , this article of Michael Barr on  mutexes and semaphores shuns up the whole bathroom example.

It is easiest to explain why the “multiple resource” scenario is flawed by way of an analogy. If you think of a mutex as a key owned by the operating system, it is easy to see that an individual mutex is analogous to the bathroom key owned by an urban coffee shop. At the coffee shop, there is one bathroom and one bathroom key. If you ask to use the bathroom when the key is not available, you are asked to wait in a queue for the key. By a very similar protocol, a mutex helps multiple tasks serialize their accesses to shared global resources and gives waiting tasks a place to wait for their turn.

This simple resource protection protocol does not scale to the case of two equivalent bathrooms. If a semaphore were a generalization of a mutex able to protect two or more identical shared resources, then in our analogy, it would be a basket containing two identical keys (i.e., each of the keys would work in either bathroom door).

A semaphore cannot solve a multiple identical resource problem on its own. The visitor only knows that he has a key, not yet which bathroom is free.  If you try to use a semaphore like this, you’ll find you always need other state information—itself typically a shared resource protected via a separate mutex. 2 It turns out the best way to design a two-bathroom coffee shop is to offer distinct keys to distinct bathrooms (e.g., men’s vs. women’s), which is analogous to using two distinct mutexes.

It then elaborates what is the correct usage scenario of a semaphore which is critical to understand the difference between the two

The correct use of a semaphore is for signaling from one task to another. A mutex is meant to be taken and released, always in that order, by each task that uses the shared resource it protects. By contrast, tasks that use semaphores either signal or wait—not both. For example, Task 1 may contain code to post (i.e., signal or increment) a particular semaphore when the “power” button is pressed and Task 2, which wakes the display, pends on that same semaphore. In this scenario, one task is the producer of the event signal; the other the consumer.

To summarize with an example, here’s how to use a mutex:

/* Task 1 */
   mutexWait(mutex_mens_room);
      // Safely use shared resource
   mutexRelease(mutex_mens_room);

/* Task 2 */
   mutexWait(mutex_mens_room);
      // Safely use shared resource
   mutexRelease(mutex_mens_room);

By contrast, you should always use a semaphore like this:

/* Task 1 - Producer */
    semPost(sem_power_btn);   // Send the signal

/* Task 2 - Consumer */
    semPend(sem_power_btn);  // Wait for signal

Importantly, semaphores can also be used to signal from an interrupt service routine (ISR) to a task. Signaling a semaphore is a non-blocking RTOS behavior and thus ISR safe.  Because this technique eliminates the error-prone need to disable interrupts at the task level, signaling from within an ISR is an excellent way to make embedded software more reliable by design.

And that is why you’d understand why Michael Barr did not agree with Feabhas’s blog’s example of multiple parking space problem solution through a counting semaphore ( or for that matter any example that suggests that semaphores can be used for multiple resources problem by just decrementing the count of semaphore)

What Michael Barr’s article does not cover in this article is the usage scenario and mechanism of counting semaphore .

It seems like someone else did follow up on this on stack overflow in this thread.

You need to go through both of the answers given on that link to understand how counting semaphores are used in producer-consumer problem.To cut the long story short , you’d require couple of semaphores together to make sure the producer consumer problem works error free.

Meanwhile,a paragraph from “Operating Systems” by William Stallings

To begin, the semaphore has a zero or positive value.If the value is positive,that value equals the number of processes that can issue a wait and immediately continue to execute.If the  value is zero, either by initialization or because a number of processes equal to the initial semaphore value have issued a wait.the next process to issue a wait is blocked,and the semaphore value goes negative.Each subsequent wait drives the semaphore value further into minus territory.The negative value equals the number of processes waiting to be unblocked.Each signal unblocks one of the waiting processes when the semaphore value is negative.
For both counting semaphores and binary semaphores, a queue is used to hold processes waiting on the semaphore.The question arises of the order in which processes are removed form such a queue.The fairest removal policy is  first-in-first-out (FIFO).The process that has been blocked the longest is released from the queue first;a semaphore whose definition includes this policy is called a “Strong Semaphore”.A semaphore that does not specify the order in which procceses are removed from the queue is a “weak semaphore”.
“A concept related to binary semaphore is the mutex.A key difference between the two is that the process that locks the mutex (Sets the value to zero) must be the one to unlock it.In contrast, it is possible for one process to lock a binary semaphore and for another to unlock it”
In some of the literature and in some textbooks, no distinction is made between a mutex and a binary semaphore.However, in practice, a number of operating systems such as Linux , Windows and Solaris offer a mutex facility that conforms to the definition in this book”
And the bonus for you 🙂
So , our take-aways here are difference between mutex and binary semaphore and usage scenarios of mutex, binary semaphore and counting semaphore.
Links for references :