Contact: Asaf Shelly

TLS

Thread Local Storage (TLS) is one of the better ways to work in a parallelized system: give each thread its own data, so that the code touching that data is effectively non-parallel.

TLS is a private copy of some data that every thread has. A useful way to picture it is as data saved in the thread's Stack, just like the local variables of a function: since it is on the thread's private Stack, other threads do not use it. All threads share the address space of the parent process, so the Stacks of all threads are technically visible to every other thread; the privacy is a matter of convention, not protection. The Stack is a buffer in memory. What makes it special is that it is managed automatically by the CPU and the compiled code.

Calling a function pushes the return address onto the Stack. Local variables and some function parameters are also pushed onto the Stack. When the code no longer needs the data it is released from the Stack, an action called pop.

See this possible Stack frame:

Address   Meaning
1024      Start of Stack
1020      Second data
1016      Third data
.......   ........
512       Return Address Func1
508       Parameter 1 (int)
500       Parameter 2 (double)
496       Local Variable (int)
492       Return Address Func2
488       Return Address Func3
484       End of Stack

The Stack has a pointer that starts at the highest address and counts down. When it goes past the end of the buffer (beyond offset zero) we get a Stack Overflow. The table above shows the state of a thread that is currently running the code in function Func3, which was called from Func2, which was called from Func1. Here is the equivalent code:

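// Forward declarations so that each function can call the next one.
int Func2 ( );
int Func3 ( );

int Func1 ( int param1, double param2 )
{
  int localVar = 5;    // lives in the Stack until Func1 returns
  Func2( );
  return 0;
}

int Func2 ( )
{
  Func3();
  return 0;
}

int Func3 ( )
{
  return 0;
}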

The thread is now executing the code in function Func3, yet the local variables of Func1 and the parameters sent to Func1 are still in the Stack. In other words, the thread can access data that is buried deep down in the Stack, no matter which function is currently running, as long as it knows where the data is.
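As a small illustration of that point, the sketch below hands the address of a local variable down the call chain, so that the innermost function reads and changes data living in an outer function's Stack frame. The names Outer, Middle and Inner are hypothetical stand-ins for Func1, Func2 and Func3. Standard C++ has no way to index the Stack by a raw offset, so a pointer is used here instead.

```cpp
// Innermost function: reaches data that lives in Outer's Stack frame.
int Inner(int* pOuterLocal)
{
  *pOuterLocal += 1;        // modify a local variable two frames up
  return *pOuterLocal;
}

int Middle(int* pOuterLocal)
{
  return Inner(pOuterLocal);
}

int Outer()
{
  int localVar = 5;         // stays in the Stack while Middle and Inner run
  Middle(&localVar);
  return localVar;          // Inner changed it through the pointer
}
```

Calling Outer() returns 6, because Inner incremented the variable that Outer still holds in its frame.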

Now think about the following function as the main function of all the threads in my application (this code is C++):

void* ThreadMainProc ( void* pThParam )
{
  MyStruct* pMyStruct = new MyStruct;    // using malloc() in C
  Func1( 0, 0.0 );                       // placeholder arguments, the values do not matter here
  delete pMyStruct; pMyStruct = NULL;    // using free() in C
  return NULL;
}

The code above is the main thread function. It allocates an object in memory and stores the pointer to this object in a local variable. At this point we start the execution flow by calling the function Func1. The local variable is popped off the Stack only when ThreadMainProc returns; until then it remains in the Stack. This means that every function called by the thread can, in principle, reach this pointer and access the object in memory.

If several threads start on the main function above, each of them allocates its own object, so every thread has its own private copy.

The offset of pMyStruct in the Stack is the same for all these threads, so if we access this data by its offset in the Stack we know that we are accessing the copy that is private to the thread that is currently running. See the following explanation.
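To make the "one copy per thread" claim concrete, here is a minimal sketch, assuming standard C++11 threads instead of whatever threading API the application really uses. MyStruct and CountPrivateCopies are hypothetical names. Each thread allocates its own object, records its address, and the function reports how many distinct objects existed at the same time.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

struct MyStruct { int value = 0; };      // hypothetical payload

// Starts threadCount threads that each allocate a private MyStruct and
// returns the number of distinct objects that were alive together.
std::size_t CountPrivateCopies(int threadCount)
{
  std::mutex guard;
  std::set<MyStruct*> seen;              // addresses reported by the threads
  std::atomic<int> registered{0};

  auto threadMain = [&]() {
    MyStruct* pMyStruct = new MyStruct;  // the pointer is local to this thread
    {
      std::lock_guard<std::mutex> lock(guard);
      seen.insert(pMyStruct);
    }
    ++registered;
    // Keep the object alive until every thread has registered, so that an
    // address cannot be freed and then reused by another thread.
    while (registered.load() < threadCount)
      std::this_thread::yield();
    delete pMyStruct;
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < threadCount; ++i)
    threads.emplace_back(threadMain);
  for (std::thread& t : threads)
    t.join();
  return seen.size();
}
```

With four threads the function returns 4: the same line of code produced four separate objects, one per thread.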

Here is the Stack image:

Address   Meaning
1024      Start of Stack
1020      Return address from ThreadMainProc
1016      Parameter pThParam
1012      pMyStruct local variable
1008      Return Address Func1
1004      Parameter 1 (int)
996       Parameter 2 (double)
992       Local Variable (int)
988       Return Address Func2
984       Return Address Func3
980       End of Stack

Suppose that my code could do this:

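int Func3 ( )
{
  // Pseudocode: STACK[] is not a real language feature. It stands for
  // "read the current thread's own Stack at this offset".
  MyStruct* pMyStruct = (MyStruct*) STACK[ 1012 ];
  .... do things with pMyStruct ...
  return 0;
}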

Now every thread has its own copy of the object, and we don't need to look for that copy: we just use it.

This is what the operating system (or library) is doing for us.

We have an API to ask the system to reserve a position in the Stack, and we receive the offset of the new data.

We have an API to write data to this Stack location and to read it back. We cannot store a pointer to the Stack location, because every thread has its own Stack and every Stack starts at a different address. We do, however, save the offset in the Stack and use it from every thread.

The Stack location is allocated once, and then every thread calls the API to work with its own copy of the object.

There is no need to deallocate the Stack location, but we should deallocate the object that the TLS pointer is pointing to.

The TLS API usually stores only a value the size of a pointer, because TLS storage is a very limited resource that is shared between all threads in the application and all libraries, DLLs, ActiveX controls, modules, system hooks, etc.
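The calls described above can be sketched with the POSIX thread API, which is an assumption on my part since the text does not name a platform (Win32 offers the equivalent TlsAlloc, TlsSetValue, TlsGetValue and TlsFree). Here the key returned by pthread_key_create plays the role of the saved offset. MyStruct, GetMyStruct and RunTwoThreads are hypothetical names, and error checks are omitted for brevity.

```cpp
#include <pthread.h>

struct MyStruct { int counter = 0; };    // hypothetical per-thread object

static pthread_key_t  g_key;             // plays the role of the saved "offset"
static pthread_once_t g_keyOnce = PTHREAD_ONCE_INIT;

// Called by the library at thread exit for every thread whose slot is set.
static void DestroyMyStruct(void* p)
{
  delete static_cast<MyStruct*>(p);
}

static void CreateKey()
{
  pthread_key_create(&g_key, DestroyMyStruct);
}

// Returns the calling thread's private object, creating it on first use.
MyStruct* GetMyStruct()
{
  pthread_once(&g_keyOnce, CreateKey);   // the slot is allocated only once
  void* p = pthread_getspecific(g_key);  // read this thread's slot
  if (p == nullptr) {
    p = new MyStruct;
    pthread_setspecific(g_key, p);       // the slot holds just a pointer
  }
  return static_cast<MyStruct*>(p);
}

// Thread entry point: works on its own copy and reports the final count.
static void* ThreadMainProc(void* pThParam)
{
  for (int i = 0; i < 1000; ++i)
    GetMyStruct()->counter++;            // no lock needed, the copy is private
  *static_cast<int*>(pThParam) = GetMyStruct()->counter;
  return nullptr;
}

// Runs two threads; true if each one counted to 1000 independently.
bool RunTwoThreads()
{
  int results[2] = {0, 0};
  pthread_t threads[2];
  for (int i = 0; i < 2; ++i)
    pthread_create(&threads[i], nullptr, ThreadMainProc, &results[i]);
  for (int i = 0; i < 2; ++i)
    pthread_join(threads[i], nullptr);
  return results[0] == 1000 && results[1] == 1000;
}
```

Registering a destructor with the key is one way to make sure that the object behind the TLS pointer is released when its thread ends. The key itself is shared by all threads, while the value stored under it is not.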

A good example for using TLS is the Standard Output (puts(), printf(), cout, etc.). When several threads each print a sequence of lines, the outputs overlap. Try this:

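#include <cstdio>
#include <iostream>
using namespace std;

const char* str_whatever = "whatever";   // placeholder data to print

int Func ( )
{
  // C (puts() appends the new line by itself)
  puts("Here is the output:");
  printf("Data: %s\n", str_whatever);

  // C++
  cout << "Here is the output:" << endl
       << "Data: " << str_whatever << endl;
  return 0;
}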

Every function call can be raced by another thread (in C++ every operator is a function call too). Depending on the library, the code above can even be interrupted between individual characters by output from a different thread using the same mechanism.

The Standard Output is buffered I/O, which means that there is one output stream for the application and one output buffer to write to. This buffer is shared by all threads. If every thread had its own copy of the stream and buffer, then each thread could work on its private copy and lock access to the screen only when actually printing to it, for example when there is a new line or when a Flush function is called explicitly.
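The idea of a private buffer per thread can be sketched as follows, using the C++11 thread_local keyword. This is an illustration under stated assumptions, not how any particular runtime library implements its output streams. PrivateBuffer, FlushPrivateBuffer and PrintFromThreads are hypothetical names, and a plain std::string stands in for the screen so that the result can be inspected.

```cpp
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Each thread gets its own stream, so writing to it needs no lock.
std::ostringstream& PrivateBuffer()
{
  thread_local std::ostringstream buffer;
  return buffer;
}

// Moves the calling thread's buffered text to the shared sink in one step.
void FlushPrivateBuffer(std::string& sink, std::mutex& sinkLock)
{
  std::ostringstream& buffer = PrivateBuffer();
  {
    std::lock_guard<std::mutex> lock(sinkLock);  // lock only while "printing"
    sink += buffer.str();
  }
  buffer.str("");                                // empty the private buffer
}

// Lets threadCount threads print two lines each and returns the combined text.
std::string PrintFromThreads(int threadCount)
{
  std::string sink;                              // stands in for the screen
  std::mutex sinkLock;

  auto worker = [&](int id) {
    std::ostringstream& out = PrivateBuffer();
    out << "Here is the output:\n";
    out << "Data: thread " << id << "\n";
    FlushPrivateBuffer(sink, sinkLock);          // both lines go out together
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < threadCount; ++i)
    threads.emplace_back(worker, i);
  for (std::thread& t : threads)
    t.join();
  return sink;
}
```

The order in which the threads reach the sink is still arbitrary, but the two lines written by one thread always stay together, because the shared resource is touched only once per flush.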

TLS is a preferred mechanism for both parallelism and clean code design.