Thursday, May 14, 2009

Fast expression compilation

Recently we've faced a well-known problem: LambdaExpression.Compile is quite slow. Mainly, because of:
- Huge amount of reflection
- Excessive security checks
- Absence of any caching!

We've faced the issue while implementing compilation to executable providers for our RecordSet engine for DO4. We should compile generally any expression, such as predicate passed to FilterProvider here. Event quite simple LINQ queries using just a single .Where invocation normally require compilation of 2 expressions: one for original .Where criteria, and one - for index range definition for RangeSetProvider (we build it as expression).

Obviously, we recommend everyone to use our cached queries in DO4 - they eliminate the problem. But what if you can't cache the query - e.g. if it is built dynamically, and actual instances almost always differ?

So we've implemented LambdaExpression compilation cache. Being implemented well, it solves all the above problems.


Raw results

Simple expression compilation test:
- Expression: (int a, int b) => a + b;
- Without caching: 15,884 K compilations/s.
- With caching: 45,158 K compilations/s.

Complex expression compilation test:
- Expression: (int a, int b) => new {Result = a + b * 2 / a}.Result + DateTime.Now.Day * a * b - a + b
- Without caching: 0,945 K compilations/s.
- With caching: 25,747 K compilations/s.

As you see, the acceleration factor varies from ~ 3x to 26x! Even this result seems very good. But we decided to investigate why results of standard .NET expression compilation differs so much here and implemented one more test:

Always new expression compilation test:
- Expression: (int a, int b) => i;
- Without caching: 1,707 K compilations/s.
- With caching: 27,960 K compilations/s.

i
is an integer increasing its value by one on each compilation attempt. So it looks like there is already some kind of caching in .NET expression compilation logic. But IT FAILS even on such a simple case (difference in constant value)! Probably, they simply cache the compilation result using expression instance as key.

So "true" acceleration factor is at least 16x.


Implementation

Note: Further I'll use GenericType(of T) instead of standard C# notation with square brackets, because Blogger hates them - it simply cuts them out.

There is Xtensive.Core.Linq.ExpressionCompileExtensions class providing a set of additional .CachingCompile() methods to Expression type. Its usage is almost the same as of original .Compile method:
- Original code: var compiledLambda = lambda.Compile();
- New code: var compiledLambda = lambda.CachingCompile();

How it could work:
- Find cached version of provided expression tree in some dictionary
- Return its compiled version, if it exists; otherwise, compile & cache it.

Actually everything is much more complex:
- Expressions aren't comparable for equality. They neither override Equals, nor GetHashCode. But comparison for equalitry is required to use them as key in dictionary. We can't use default implmentation as well - Expression instances are almost always built anew rather than cached.
- Even if they would be comparable for equality, they won't really be equal because of closures: a new instance of closure is referenced by corresponding ConstantExpression on the subsequent creation of expression tree. Even if we could be able to compare such constants for equality, they would almost always differ.

So we've implemented:
- ExpressionTree class implementing IEquatable(of ExpressionTree), and properly overriding default GetHashCode & Equals. This allows us to compare expressions.
- ConstantExtractor - a visitor rewriting the original expression and removing any dependencies on constants from it. In fact, we rewrite original expression replacing each const of type T there to ((T) additionalLambdaParameter[constNumber]) there. E.g. () => 1 would be rewritten to (consts) => ((int) consts[0]), and we would cache this expression. ConstantExtractor builds the array of constant values (actual consts value) during processing of the original expression. Since this happens on any attempt to compile the expression by our caching compiler (because first of all we must build caching key - an ExpressionTree containing no constants), we always have this array of constants.
- So in the end we always have both compiled expression (processed by ConstantExtractor) and the array of constants. So we should just "bind" this array to the comiled expression. "Bind" means converting a pair of f(consts,a,b,c, ...) and extractedConsts to g(a,b,c, ...) = f(extractedConsts,a,b,c, ...). This actually done by corresponding overload of .CachingCompile method. Its typical code is:

public static Func(of T1, T2, ..., TResult) CachingCompile(of T1, T2, ..., TResult)
(this Expression(of Func(of T1, T2, TResult)) lambda)
{
var result = CachingExpressionCompiler.Instance.Compile(lambda);
return ((Func(of object[], T1, T2, ...)) result.First).Bind(result.Second);
}
.Bind is one more extension method provided by Xtensive.Core.Helpers.DelegateBindExtensions class.
All .Bind and .CachingCompile versions are generated by T4 templates - finally we've found the place where we could use them ;)

And finally, we use our ThreadSafeDictionary as actual cache, so any cached compiled result is never purged. Initially this may look as a serious lack, but actually it isn't: currently .NET is incapable of unloading any IL code. Even if we don't use cache, expression compilation eats some RAM, and this RAM is never released. So our "caching compiler" just eats a bit more - certainly, only if we don't hit the cache.


Conclusions

Pros and cons:
- We've got much faster expression compilation
- It's quite likely this solution significantly decreases the amount of RAM consumed by complied expressions during the application lifetime, since it significantly decrease the number of actual compilations.
- The compiled expression we return is a bit slower, because we replace constants to array accessors there and add one more delegate call (.Bind method does this). But I feel this will be acceptable in 99.9% of cases.

Possible improvements:
- Use lightweight expression adapters instead of Expression descendants as result of ConstantExtractor, such as the ones from MetaLinq. .NET expressions perform lots of checks during their construction, which aren't necessary here. We must just be able to compare such tree for equality with another one, compute its hash code, and much rarely - convert it to .NET expressions to get it compiled. I feel this should improve the performance at least twice.


Usage

- Download DO4 - it contains compiled version of Xtensive.Core.dll, as well as its source code
- Add reference to Xtensive.Core.dll to your project
- Add "using Xtensive.Core.Linq;" to C# file containing lambda.Compile()
- Replace lambda.Compile() to lambda.CachingCompile().

Wednesday, May 6, 2009

Built-in Visual Studio 2008 Code Generator

Have you ever heard about code generator built-in VS 2008 by default? Microsoft named it "T4: Text Template Transformation Toolkit". It is integrated into Visual Studio and very easy to use. Let me demonstrate it on example.

Let us suppose we need a class with methods like those for such primitive types as Int32, Double, Decimal, DateTime, TimeSpan and other.

  public static Double ParseDouble(string value)
  {
    if (value == "Default")
      return default(Double);
    if (value == "MinValue")
      return Double.MinValue;
    if (value == "MaxValue")
      return Double.MaxValue;
    return Double.Parse(value);
  }

  public static Double? ParseNullableDouble(string value)
  {
    if (value == "Default")
      return default(Double);
    if (value == "MinValue")
      return Double.MinValue;
    if (value == "MaxValue")
      return Double.MaxValue;
    if (value == "Null")
      return null;
    return Double.Parse(value);
  }


Because it isn't very interesting to implement it manually I suggest to use built-in code generator. How we can do this:

1. Create class SmartParser.cs and implement those methods for one type, e.g. for Int32
2. Rename SmartParser.cs to SmartParser.tt
3. Now it is possible to use ASP.NET-like tags <# #> and <#= #> to generate this code for all types we need and Visual Studio will automatically create file SmartParser.cs

In our case SmartParser.tt will look like this:

using System;

public static class SmartParser
{
<#
Type[] types = new Type[] {
  typeof(Int32),
  typeof(Double),
  typeof(Decimal),
  typeof(DateTime),
  typeof(TimeSpan)
};
foreach (Type type in types) {
#>

  public static <#=type.Name#> Parse<#=type.Name#>(string value)
  {
    if (value == "Default")
      return default(<#=type.Name#>);
    if (value == "MinValue")
      return <#=type.Name#>.MinValue;
    if (value == "MaxValue")
      return <#=type.Name#>.MaxValue;
    return <#=type.Name#>.Parse(value);
  }

  public static <#=type.Name#>? ParseNullable<#=type.Name#>(string value)
  {
    if (value == "Default")
      return default(<#=type.Name#>);
    if (value == "MinValue")
      return <#=type.Name#>.MinValue;
    if (value == "MaxValue")
      return <#=type.Name#>.MaxValue;
    if (value == "Null")
      return null;
    return <#=type.Name#>.Parse(value);
  }
<#
}
#>

}


Now we can open automatically generated SmartParser.cs just to ensure, that everything is OK and our "huge" class is successfully generated (-;

You can generate not only *.cs files, but *.xml, *.txt, *.html and generally text files with any extension. There are also several useful directives, for example:

<#@ output extension="txt" #>
<#@ template language="C#v3.5" #>
<#@ import namespace="System.Linq" #>
<#@ assembly name="System.Core" #>


Any additional information on T4 can be easily found оn the web.