Kernel Threads, and "Thread Safety"
CS 301 Lecture, Dr. Lawlor
Simple Kernel Threads
So in the previous lecture, we built threads and switched between them on our own. But you can actually get the OS
kernel to do the hard work of creating threads, which are then called
"kernel threads". On UNIX systems, you create a new kernel thread
by calling "pthread_create", which is in "#include <pthread.h>"
and the "-lpthread" library. You just pass pthread_create the
function to run in the thread, and the kernel will then run that
function, switching back and forth between threads at
basically random times (100 or 1000 times per second, usually).
#include <pthread.h>
void doWork(const char *caller) {
std::cout<<caller<<"\n";
}
void *fnA(void *arg) {
for (int i=0;i<3;i++) doWork("A");
return 0;
}
void *fnB(void *arg) {
for (int i=0;i<3;i++) doWork("B");
return 0;
}
int foo(void) {
pthread_t B;
pthread_create(&B,0,fnB,0); /* make a thread B, to run fnB */
fnA(0);
void *Bret;
pthread_join(B,&Bret); /* wait until B finishes */
return 0;
}
(executable NetRun link)
To illustrate the way the kernel randomly switches between these functions, run
this thing several times--here's what I got on several different runs:
A A A B B B
|
AB
AB
AB
|
B B B A A A
|
AB A A
B B
|
AB
B B A A
|
That is, the kernel runs A for a while, then B for a while, then back
to A. You can't tell when a switch is going to happen. Note that the kernel sometimes switches to B between when A
prints the letter 'A' and when it prints the newline immediately after
it in the string "A\n"!
The danger of threads
So basically a kernel thread is just a fancy way to run one of your subroutines
simultaniously with all the other subroutines. A kernel thread can
still call other functions, access your global variables, or use anything it can find that belongs
to other threads.
This shared access to common variables
immediately introduces the many problems of "thread
safety". For example, consider a piece of code like this:
int shared=0;
void inc(void) {
int i=shared;
i++;
shared=i;
}
If two threads try to call "inc" repeatedly, the two executions might interleave like this:
Thread A
|
Thread B
|
int i=shared; // i==0
i++; // i==1
// hit interrupt. switch to B
shared=i; // i is still 1, so shared==1!
|
int i=shared; // i==0 (again)
i++; //i==1
shared=i; // shared==1
int i=shared; // i==1
i++; //i==2
shared=i; // shared==2
int i=shared; // i==2
i++; //i==3
shared=i; // shared==3
// hit interrupt, switch back to A
|
Uh oh! When we switch back to thread A, the value stored in "i"
isn't right anymore. It's an older copy of a shared global variable,
stored in thread A's stack or registers. So thread A will happily
overwrite the shared variable with its old version, causing all of B's
work to be lost!
Here's an executable example of this problem. Both threads are
trying to increment "sum". They do this 6,000 times, so "sum"
should be 6000 at the end. But with optimization on, they both
store a copy of "sum" in a register, so one guy overwrites the other
guy's work when they write the modified value back to "sum", and you
(usually!) end up with the totally-wrong value 3000:
#include <pthread.h>
int sum=0; /*<- Careful! This variable is shared between threads! */
void doWork(void) {
for (int i=0;i<1000;i++)
sum++;
}
void *fnA(void *arg) {
for (int i=0;i<3;i++) doWork();
return 0;
}
void *fnB(void *arg) {
for (int i=0;i<3;i++) doWork();
return 0;
}
int foo(void) {
sum=0;
pthread_t B;
pthread_create(&B,0,fnB,0);
fnA(0);
void *Bret;
pthread_join(B,&Bret);
return sum;
}
(executable NetRun link)
Thread Safety with "volatile"
Sometimes, the only problem is the compiler's over-optimization of
access to global variables. You can scare the compiler away from
a variable by putting the keyword "volatile" in front of it. This
makes the compiler fear the variable, and do exactly what you ask with
it:
volatile int sum=0; /*<- Careful! This variable is shared between threads! */
void doWork(void) {
for (int i=0;i<1000;i++)
sum++;
}
(executable NetRun link)
Adding "volatile" does indeed improve the thread safety of this
program--in fact, on a uniprocessor machine, it now gets the right
answer! This is because "sum++" becomes a single instruction
("inc dword [sum]"), and an instruction executes as one piece--if the
kernel switches to another thread, it will be either before or after
this instruction, and never "in the middle of an instruction".
However, on a multiprocessor machine (dual core, dual CPU, or
hyperthreaded like the NetRun machine), this program STILL GIVES THE
WRONG ANSWER.
The reason is curious. On a real multiprocessor, two processors might simultaneously
execute the crucial "inc" instruction. Since the instruction
involves reading "sum" from memory, incrementing it, and writing it
back out, it can still happen that one processor overwrites the result
of another processor.
One way to *really* fix this code on a multiprocessor would be to use
the assembly language "lock" prefix, which causes an instruction to
execute "atomically". This changes the memory access mode so that
no other processor can modify that instruction's data until the
instruction completes. Here's what our loop looks like using the
lock prefix. "incl (sum)" is the g++ inline assembly version of
"inc dword [sum]".
volatile int sum=0; /*<- Careful! This variable is shared between threads! */
void doWork(void) {
for (int i=0;i<1000;i++)
__asm__ ( "lock incl (sum)" );
}
(executable NetRun link)
Now, finally, we get the right answer! However, our program is now 10x slower! Ah well--there are lots of other ways to fix this problem. We'll talk about these non-assembly thread-safety approaches in CS 321!