Wednesday, October 29, 2008

The cost of method calls

Here I'll talk about the cost (or performance) of various ways of method invocation in .NET.

First of all, let's assume:
- Virtual method call time is 1x. This is about ~ 600M calls / sec. on Core 2 Duo @ 2.66 GHz. To be precise, we're talking about instance method getting no arguments and returning a single Int32 value (i.e. it is an average property getter).

So:
- Delegate method call time is 1.5x (the same method invoked by delegate pointing to it)
- Interface method call time is 2x (the same method invoked on reference of interface type)

A bit surprising, yes? The explanation is here. Briefly, interface method dispatch is more complex than delegate dispatch, since we must locate appropriate interface method table for the instance we have first. If count of implemented interfaces on a particular type is quite large (that's actually almost impossible), the only good way to do this is to use hash table. But if it is small (that's the most frequent case), there must be just a search in small [possibly - ordered] list. But this is anyway more complex than in case with delegate, since delegate already contains the exact method address (this isn't correct for open delegates).

Now one more fact to think about: all the tests I'm talking about are loop-based, performed in Release configuration, and their code is nearly the following:

for (int i = 0; i<...; i+=10) {
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
o = c.GetValue();
}


.NET is able to cache resolved interface method tables (and possibly - even method addresses), so calling interface method in a loop must be a bit faster, than calling it once. So in general the cost of interface method call in comparison to delegate call is even bigger.

This explains why we have such types as Hasher(of T) (Comparer, Arithmetic, etc.). They cache the delegates performing a set of operations on T type faster than with use of similar interface in their fields. See e.g. Hasher(of T).GetHash field. Certainly, such an approach is used only when performance is essential - i.e. when it's well known these operations will be invoked many times e.g. on any index seek.

Let's look of few more metrics:
- Creating a delegate pointing to non-virtual method time is 7.5x
- Creating a delegate pointing to virtual method time is 50x
- Creating a delegate pointing to interface method time is 150x

So delegate construction isn't cheap at all.

Now we're turning to virtual generic methods:
- Virtual generic method call time is 10x - independently on of it is called on interface or not. Dependency on count of generic arguments almost absents as well - adding one more argument makes the call longer by ~ 0.5%.
- Creating a delegate pointing to generic virtual method time is 1000x. Not sure, why - it looks like because of some bug in .NET. Since delegates in .NET may point to fully parameterized methods only (they store method address), so the time of calling such a delegate is 1.5x, as before.

Why it's so costly to call virtual generic method? Because there can be generally any number of its implementations, dependently on the argument substitution, so .NET framework resolves its address using internal hash table, that must be bound to corresponding virtual (or interface) method table.

So we may also take that:
- Internal hash table seek time is ~ 10x - I'll use this time in my future posts to show how to estimate the cost of generally any basic operation in .NET.

And few conclusions related to generic virtual methods:
- The smallest heap allocation time is 3.5x; Int32 boxing time is 4x (with its heap allocation). So it's almost always cheaper to have non-generic virtual method with object type argument, rather than generic virtual method.
- If generic virtual method seems anyway preferable, you might think about implementing its "call caching" with use of delegates. E.g. we use DelegateHelper.CreateDelegates and DelegateHelper.ExecuteDelegates methods to perform the operations on Tuples of the same structure faster.

6 comments:

  1. Forgot to add: the code for all these tests can be found in DataObjects.Net 4.0 test suite, see "Xtensive.Core\Xtensive.Core.Tests\DotNetFramework\".

    Any comments are welcome.

    ReplyDelete
  2. Good stuff, Alex.

    What do you think of functional-style programming in C# 3.0 and anonymous delegates? It seems that it will generate slower programs without the developer's knowledge... It is maybe the natural path from a low-level and efficient programming language to a high-level and low one :(.

    -gael

    ReplyDelete
  3. I agree with this. Anonymous delegates are frequently not cached, and in such cases construction of their instances is rather costly. Let's estimate this:

    - .NET creates an anonymous type to handle such a delegate. Its construction cost is ~ 3.5 in comparison to VM call.
    - Then it creates a delegate pointing to its method. The cost is ~ 7.5.
    - Finally, it invokes it. The cost is 1.5, assuming invocation happens just once (i.e. the worst, but most likely the most common case).

    So if anonymous delegate is called just once, you're getting ~ 8.5 times slower call in case it isn't cached.

    In general this isn't "to bad" - i.e. you should have about 50M invocations \ second in such a case. This is anyway much more then such languages as Ruby or Python may provide. But less then 400M with a regular delegate, of course ;)

    Btw, this is enough for the most of RDBMS \ BLL related tasks. It was shown that peak speed of data extraction with ORM is about 400K instances per second, i.e. 100 times slower. So using such anonymous delegates won't seriously affect on performance in such cases.

    But if I develop a game on C#, or something else requiring real-time computing, I would seriously care about this.

    ReplyDelete
  4. Copying recent discussion of this post by e-mail:

    Q: Hello Alex,

    I have read your blog post "The cost of method calls". It's very interesting. I was very surprised on the cost of virtual generic method call. Isn't there any difference between using reference vs value type parameters on the type or method?


    A:

    I suspect there can be noticeable difference: since actually there is just one JITted version of method emitted for any reference type, theoretically it can be found much faster (i.e. without a hash table lookup).

    On the other hand, generic methods are practically much more interesting for value types... For reference types they bring only better safety & readability. That's why we measured this for value types only.

    So just measured:

    Virtual generic call test (1 generic argument: Int32):
    Type: Int32
    Measurement: Generic method 1 (class) 65,500 M/s.
    Measurement: Generic method 1 (class, by delegate) 403,042 M/s.
    Measurement: Generic method 1 cast (class) 42,202 M/s.
    Measurement: Generic method 1 (interface) 65,523 M/s.
    Measurement: Generic method 1 (interface, by delegate) 405,369 M/s.
    Measurement: Generic method 1 cast (interface) 609,802 K/s.
    Virtual generic call test (1 generic argument: String):
    Type: String
    Measurement: Generic method 1 (class) 51,888 M/s.
    Measurement: Generic method 1 (class, by delegate) 308,143 M/s.
    Measurement: Generic method 1 cast (class) 444,508 K/s.
    Measurement: Generic method 1 (interface) 51,326 M/s.
    Measurement: Generic method 1 (interface, by delegate) 309,349 M/s.
    Measurement: Generic method 1 cast (interface) 443,909 K/s.

    Conclusions:
    - The difference is generally neglectable. May be we should try putting "where T: class" constraint on such methods to make JITter to think about this a bit more? ;)
    - Again, it seems there is a serious bug in delegate creation (see bold italicized results in "... cast" lines).

    ReplyDelete
  5. An overview of implementation of virtual generic method call in Mono: http://schani.wordpress.com/2008/10/12/fast-virtual-generic-calls/

    ReplyDelete
  6. A report about mentioned issue with performance of VGM calls on interface: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=91710

    ReplyDelete