Thursday, May 14, 2009

Fast expression compilation

Recently we've faced a well-known problem: LambdaExpression.Compile is quite slow, mainly because of:
- Huge amount of reflection
- Excessive security checks
- Absence of any caching!

We faced this issue while implementing compilation to executable providers in our RecordSet engine for DO4. We must be able to compile essentially any expression, such as a predicate passed to FilterProvider. Even quite simple LINQ queries using just a single .Where invocation normally require compilation of two expressions: one for the original .Where criteria, and one for the index range definition for RangeSetProvider (we build it as an expression).

Obviously, we recommend that everyone use our cached queries in DO4 - they eliminate the problem. But what if you can't cache the query - e.g. if it is built dynamically and the actual instances almost always differ?

So we've implemented a LambdaExpression compilation cache. Implemented well, it solves all of the above problems.


Raw results

Simple expression compilation test:
- Expression: (int a, int b) => a + b;
- Without caching: 15.884 K compilations/s.
- With caching: 45.158 K compilations/s.

Complex expression compilation test:
- Expression: (int a, int b) => new {Result = a + b * 2 / a}.Result + DateTime.Now.Day * a * b - a + b
- Without caching: 0.945 K compilations/s.
- With caching: 25.747 K compilations/s.

As you can see, the acceleration factor varies from ~3x to ~26x! Even this result looks very good, but we decided to investigate why the results of standard .NET expression compilation differ so much here, and implemented one more test:

Always new expression compilation test:
- Expression: (int a, int b) => i;
- Without caching: 1.707 K compilations/s.
- With caching: 27.960 K compilations/s.

Here i is an integer that increases by one on each compilation attempt. So it looks like there is already some kind of caching in the .NET expression compilation logic. But IT FAILS even in such a simple case (a difference in a constant value)! Probably they simply cache the compilation result using the expression instance as the key.

So the "true" acceleration factor is at least 16x.
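
For reference, figures of this kind can be obtained with a simple measurement loop like the following sketch (illustrative only, not the actual test code):

using System;
using System.Diagnostics;
using System.Linq.Expressions;

class CompilationBenchmark
{
    static void Main()
    {
        const int iterations = 100000;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // The expression tree is rebuilt on each iteration, as in the tests above:
            Expression<Func<int, int, int>> lambda = (a, b) => a + b;
            var compiled = lambda.Compile(); // or lambda.CachingCompile() for the cached variant
            compiled(1, 2);                  // make sure the delegate is really produced and used
        }
        watch.Stop();
        Console.WriteLine("{0:F3} K compilations/s",
            iterations / watch.Elapsed.TotalSeconds / 1000);
    }
}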


Implementation


There is the Xtensive.Core.Linq.ExpressionCompileExtensions class, providing a set of additional .CachingCompile() extension methods for the Expression type. Its usage is almost the same as that of the original .Compile method:
- Original code: var compiledLambda = lambda.Compile();
- New code: var compiledLambda = lambda.CachingCompile();
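
For example, switching to the cached compilation looks like this (a minimal sketch; it assumes Xtensive.Core.dll is referenced and the Xtensive.Core.Linq namespace is imported, as described in the Usage section below):

using System;
using System.Linq.Expressions;
using Xtensive.Core.Linq; // provides the .CachingCompile() extension methods

class CachingCompileUsage
{
    static void Main()
    {
        Expression<Func<int, int, int>> lambda = (a, b) => a + b;

        // Standard compilation - full code generation on every call:
        Func<int, int, int> compiled = lambda.Compile();

        // Caching compilation - structurally equal expressions reuse the cached result:
        Func<int, int, int> cachingCompiled = lambda.CachingCompile();

        Console.WriteLine(compiled(2, 3));        // 5
        Console.WriteLine(cachingCompiled(2, 3)); // 5
    }
}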

How it could work (naively - a sketch follows this list):
- Find the cached version of the provided expression tree in some dictionary
- Return its compiled version if it exists; otherwise, compile & cache it.
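
Such a naive version could look roughly like the sketch below; note that keying the cache by the expression instance itself is exactly the part that doesn't hold up:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class NaiveCachingCompiler
{
    // Keyed by the expression instance itself - this is what breaks in practice,
    // since Expression uses reference equality and trees are rebuilt every time.
    private static readonly Dictionary<LambdaExpression, Delegate> Cache =
        new Dictionary<LambdaExpression, Delegate>();

    public static TDelegate NaiveCachingCompile<TDelegate>(this Expression<TDelegate> lambda)
    {
        lock (Cache)
        {
            Delegate cached;
            if (!Cache.TryGetValue(lambda, out cached))
            {
                cached = (Delegate) (object) lambda.Compile();
                Cache.Add(lambda, cached);
            }
            return (TDelegate) (object) cached;
        }
    }
}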

Actually everything is much more complex (a short snippet demonstrating both problems follows this list):
- Expressions aren't comparable for equality: they override neither Equals nor GetHashCode, but equality comparison is required to use them as dictionary keys. We can't rely on the default (reference) implementation either - Expression instances are almost always built anew rather than cached.
- Even if they were comparable for equality, they wouldn't really be equal because of closures: a new closure instance is referenced by the corresponding ConstantExpression on each subsequent creation of the expression tree. Even if we were able to compare such constants for equality, they would almost always differ.
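
Both problems are easy to demonstrate in a few lines (an illustrative snippet only):

using System;
using System.Linq.Expressions;

class ExpressionEqualityDemo
{
    static Expression<Func<int, bool>> BuildPredicate(int threshold)
    {
        // 'threshold' is captured into a compiler-generated closure object,
        // which appears inside the tree as a ConstantExpression.
        return x => x > threshold;
    }

    static void Main()
    {
        Expression<Func<int, int>> e1 = x => x + 1;
        Expression<Func<int, int>> e2 = x => x + 1;
        // Expression overrides neither Equals nor GetHashCode, so two
        // structurally identical trees compare as unequal:
        Console.WriteLine(e1.Equals(e2)); // False

        // Each call builds a new closure instance, so even a structural
        // comparison would see a different constant value every time:
        var p1 = BuildPredicate(10);
        var p2 = BuildPredicate(10);
        Console.WriteLine(ReferenceEquals(p1, p2)); // False
    }
}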

So we've implemented:
- An ExpressionTree class implementing IEquatable<ExpressionTree> and properly overriding GetHashCode and Equals. This allows us to compare expressions.
- A ConstantExtractor - a visitor rewriting the original expression to remove any dependencies on constants from it. In fact, we rewrite the original expression, replacing each constant of type T there with ((T) additionalLambdaParameter[constNumber]). E.g. () => 1 would be rewritten to (consts) => ((int) consts[0]), and we would cache this expression. ConstantExtractor builds the array of constant values (the actual consts value) while processing the original expression. Since this happens on every attempt to compile an expression with our caching compiler (because first of all we must build the caching key - an ExpressionTree containing no constants), we always have this array of constants; a simplified sketch of this rewriting is shown further below.
- So in the end we always have both the compiled expression (processed by ConstantExtractor) and the array of constants, and we just need to "bind" this array to the compiled expression. "Bind" means converting a pair of f(consts, a, b, c, ...) and extractedConsts into g(a, b, c, ...) = f(extractedConsts, a, b, c, ...). This is actually done by the corresponding overload of the .CachingCompile method. Its typical code is:

public static Func<T1, T2, ..., TResult> CachingCompile<T1, T2, ..., TResult>(
  this Expression<Func<T1, T2, ..., TResult>> lambda)
{
  var result = CachingExpressionCompiler.Instance.Compile(lambda);
  return ((Func<object[], T1, T2, ..., TResult>) result.First).Bind(result.Second);
}
Here .Bind is one more extension method, provided by the Xtensive.Core.Helpers.DelegateBindExtensions class.
All the .Bind and .CachingCompile overloads are generated by T4 templates - finally we've found a place where we could use them ;)
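
To make the constant extraction and binding steps more concrete, here is a heavily simplified sketch of both. It only illustrates the approach: it uses the ExpressionVisitor base class from later .NET versions, handles just the trivial cases, and expresses Bind as a plain closure instead of the generated DelegateBindExtensions overloads.

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Rewrites each constant of type T to ((T) consts[index]) and collects the values.
sealed class ConstantExtractorSketch : ExpressionVisitor
{
    private readonly List<object> constants = new List<object>();
    private readonly ParameterExpression constsParameter =
        Expression.Parameter(typeof(object[]), "consts");

    protected override Expression VisitConstant(ConstantExpression node)
    {
        int index = constants.Count;
        constants.Add(node.Value);
        // ((T) consts[index]) instead of the original constant:
        return Expression.Convert(
            Expression.ArrayIndex(constsParameter, Expression.Constant(index)),
            node.Type);
    }

    public LambdaExpression Extract(LambdaExpression lambda, out object[] extractedConstants)
    {
        constants.Clear();
        // "consts" becomes an additional first parameter of the rewritten lambda:
        var parameters = new List<ParameterExpression> { constsParameter };
        parameters.AddRange(lambda.Parameters);
        var body = Visit(lambda.Body);
        extractedConstants = constants.ToArray();
        return Expression.Lambda(body, parameters.ToArray());
    }
}

class ConstantExtractionDemo
{
    // "Bind": convert f(consts, x) plus a fixed consts array into g(x) = f(consts, x).
    static Func<T, TResult> Bind<T, TResult>(Func<object[], T, TResult> f, object[] consts)
    {
        return x => f(consts, x);
    }

    static void Main()
    {
        Expression<Func<int, int>> lambda = x => x * 2;

        object[] consts;
        var rewritten = new ConstantExtractorSketch().Extract(lambda, out consts);
        // rewritten is (consts, x) => x * ((int) consts[0]), and consts is { 2 }.
        // The constant-free form is the one worth caching and compiling:
        var compiled = (Func<object[], int, int>) rewritten.Compile();

        // Binding the extracted constants restores the original signature:
        Func<int, int> bound = Bind(compiled, consts);
        Console.WriteLine(bound(21)); // 42
    }
}

The key property here is that the rewritten, constant-free lambda is the same for structurally equal expressions, so it can be compiled once and reused, while the cheaply produced constants array varies from call to call.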

And finally, we use our ThreadSafeDictionary as the actual cache, so a cached compilation result is never purged. Initially this may look like a serious drawback, but actually it isn't: currently .NET is incapable of unloading any generated IL code. Even if we don't use the cache, expression compilation eats some RAM, and this RAM is never released. So our "caching compiler" just eats a bit more - and only when we don't hit the cache.


Conclusions

Pros and cons:
- We've got much faster expression compilation
- It's quite likely this solution significantly decreases the amount of RAM consumed by compiled expressions during the application lifetime, since it significantly decreases the number of actual compilations.
- The compiled delegate we return is a bit slower, because we replace constants with array accesses and add one more delegate call (the .Bind method does this). But I feel this will be acceptable in 99.9% of cases.

Possible improvements:
- Use lightweight expression adapters instead of Expression descendants as the result of ConstantExtractor, such as the ones from MetaLinq (a rough sketch follows below). .NET expressions perform lots of checks during their construction, which aren't necessary here. We just need to be able to compare such a tree for equality with another one, compute its hash code, and - much more rarely - convert it back to .NET expressions to get it compiled. I feel this should improve the performance at least twofold.
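
For illustration, such a lightweight node might look roughly like this - a hypothetical type, not part of DO4 or MetaLinq, carrying just enough data to implement Equals and GetHashCode cheaply:

using System;
using System.Linq.Expressions;

// Minimal immutable node used only for equality comparison and hashing.
sealed class LightNode
{
    public readonly ExpressionType NodeType;
    public readonly Type Type;
    public readonly LightNode[] Children;

    public LightNode(ExpressionType nodeType, Type type, params LightNode[] children)
    {
        NodeType = nodeType;
        Type = type;
        Children = children;
    }

    public override bool Equals(object obj)
    {
        var other = obj as LightNode;
        if (other == null || NodeType != other.NodeType || Type != other.Type
            || Children.Length != other.Children.Length)
            return false;
        for (int i = 0; i < Children.Length; i++)
            if (!Children[i].Equals(other.Children[i]))
                return false;
        return true;
    }

    public override int GetHashCode()
    {
        int hash = (int) NodeType ^ Type.GetHashCode();
        foreach (var child in Children)
            hash = hash * 31 + child.GetHashCode();
        return hash;
    }
}

Such nodes could be produced by the same visitor pass that strips constants, avoiding the validation work done by the Expression factory methods.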


Usage

- Download DO4 - it contains a compiled version of Xtensive.Core.dll, as well as its source code
- Add reference to Xtensive.Core.dll to your project
- Add "using Xtensive.Core.Linq;" to C# file containing lambda.Compile()
- Replace lambda.Compile() with lambda.CachingCompile().

2 comments:

  1. RE:
    Simple expression compilation test:
    - Expression: (int a, int b) => a + b;
    - Without caching: 15.884 K compilations/s.
    - With caching: 45.158 K compilations/s.

    Of course caching would help if we compile the same expression millions of times, but what about one-time compilation? I guess it is even slower...

  2. Yes, it is, although not dramatically: as you may find, all the "additional job" we do is the expression comparison. As shown, it takes at most ~30% of the pure compilation time, and it happens anyway. So really unique expressions will be compiled about 30% slower.

    What's more important: how frequently do you produce really unique expressions that are compiled just once, taking into account the way we compare them? Our comparer is constant-insensitive.

    Moreover, each such unique expression will eat some RAM after compilation (.NET doesn't unload the code), so such a process can't live indefinitely anyway. If you care about its performance, I assume such compilations must happen e.g. 1000 times per second. So it will die in ~1 day (8 GB ~= 100 bytes * 1000 * 60 * 60 * 24).
