Thursday, October 30, 2008

The cost of [ThreadStatic] attribute

First of all, raw results:

Instance field: Operations: 2,650 B/s.
Static field: Operations: 2,630 B/s.
Volatile instance field: Operations: 2,649 B/s.
[ThreadStatic] field: Operations: 43,611 M/s.
Thread data slot: Operations: 4,068 M/s.

The test is actually quite simple: we read specified field in a loop. As before, on Core 2 Duo @2.66GHz. The code can be found in DataObjects.Net 4.0 test suite, see Xtensive.Core\Xtensive.Core.Tests\DotNetFramework\FieldTypeTest.cs.

Conclusions:
- Reading regular, static or volatile field is quite cheap: ~ 0.2x in previously introduced metrics, or 20% of virtual method call time
- [ThreadStatic] fields are actually quite costly in comparison to others: ~ 14x.

Now the main question: why? It isn't so obvious [ThreadStatic] is ~ 60 times slower than static.

JITted [ThreadStatic] access code actually always consists of two parts:
- Call to a system routine returning address of [ThreadStatic] field by its token
- Regular field access instruction.

Obviously, the first part (call) "eats" almost the whole execution time: there is no more efficient way to get the address of such a field by its token rather than using hash table. As I've mentioned before, reading from a system hash table takes ~ 10x. So that's nearly what we have in this case.

Why they're implemented this way in .NET? I can't imagine why they didn't use some faster approach. E.g. I suspect calculating the lowest stack boundary (as well as the upper one) from the current stack pointer value is quite simple operation - something like bitwise and. Why we can't store the address of the first [ThreadStatic] location as fixed address nearby it, and use constant offset for each [ThreadStatic] field relatively to the address of the first one? In this case it would take ~ 1x to access it...

Ok, this is what could be, but in reality we have 14x.

Finally, there are thread data slots as well. But they're 10 times slower than [ThreadStatic] fields, so it's always better to simulate their behavior with e.g. a Dictionary stored in [ThreadStatic] field.

8 comments: