Friday, October 17, 2008

The cost of having a finalizer

It's well known that having a finalizer is costly. But how much does it actually cost?

Allocation to nothing:
- SlimObject: Operations: 77.880 M/s.
- FinalizableSlimObject: Operations: 1.628 M/s.

Allocation to an array:
- SlimObject: Operations: 6.996 M/s.
- FinalizableSlimObject: Operations: 3.256 M/s.

"Allocation to nothing" here means allocation without storing the reference anywhere. This must be the most frequent scenario, since most allocated objects don't survive the next generation 0 GC.

There is no additional GC cost for SlimObject in this case: there are no references to any of these objects, so the garbage collector won't mark any of them and, consequently, won't spend any time on them at all.
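
The test code itself isn't shown in this post, so the following is only a minimal sketch of what an "allocation to nothing" measurement could look like. The class bodies, the Measure helper and the iteration count are assumptions made for illustration, and the delegate call adds overhead of its own, so absolute numbers will differ from the ones above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-ins for the two tested types; the original
// definitions are not shown in the post.
public class SlimObject
{
    public int Value;
}

public class FinalizableSlimObject
{
    public int Value;

    // Even an empty finalizer makes the runtime register every instance
    // for finalization at allocation time.
    ~FinalizableSlimObject() { }
}

public static class AllocationToNothing
{
    // Allocates `count` objects via `create`, keeps none of them,
    // and returns the rate in millions of allocations per second.
    public static double Measure(Func<object> create, int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            // KeepAlive only stops the allocation from being optimized away;
            // the object is unreachable right after this call.
            GC.KeepAlive(create());
        }
        watch.Stop();
        return count / watch.Elapsed.TotalSeconds / 1e6;
    }

    public static void Main()
    {
        const int count = 5000000; // assumed iteration count
        Console.WriteLine("SlimObject: {0:F3} M/s",
            Measure(() => new SlimObject(), count));
        Console.WriteLine("FinalizableSlimObject: {0:F3} M/s",
            Measure(() => new FinalizableSlimObject(), count));
    }
}
```

To get meaningful numbers from a sketch like this, run it as a release build without a debugger attached.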

But what about FinalizableSlimObject?
- Its allocation is costly, since the object is registered for later finalization (added to the finalization queue). The registration cost should be similar to taking a lock and adding an object to a Hashtable.
- All such objects are processed (i.e. marked) during the GC.
- Then they're removed from the finalization queue (~ finding & removing an entry from a Hashtable).
- And added to the FReachable queue (cheap, ~ adding to a list).
- They're also processed during the heap compaction phase. Here we get an additional cost proportional to the size of the object.
- A separate finalization thread spends some time finalizing them. This should not be very noticeable in this test, since it was run on a multicore CPU.
- And finally, if some of them survive until the next generation 1 GC, that adds further overhead similar to what is described above.
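
Part of this lifecycle can be observed from code. The sketch below is an illustration only: TracedObject and FinalizationTrace are hypothetical names, not the types used in the test. It counts how many finalizers ran and records which thread ran them; the thread id should normally differ from the main thread's, since finalizers run on the dedicated finalization thread.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical type that records when and where it is finalized.
public class TracedObject
{
    public static int FinalizedCount;
    public static int FinalizerThreadId;

    ~TracedObject()
    {
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class FinalizationTrace
{
    // Kept in a separate, non-inlined method so that no local variable
    // in Main can keep the objects reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void AllocateGarbage(int count)
    {
        for (int i = 0; i < count; i++)
        {
            GC.KeepAlive(new TracedObject());
        }
    }

    public static void Main()
    {
        AllocateGarbage(1000);

        // Unreachable finalizable objects are not reclaimed here;
        // they are moved to the FReachable queue instead.
        GC.Collect();

        // Blocks until the finalization thread has emptied that queue.
        GC.WaitForPendingFinalizers();

        Console.WriteLine("Finalized: {0}", TracedObject.FinalizedCount);
        Console.WriteLine("Main thread: {0}, finalizer thread: {1}",
            Thread.CurrentThread.ManagedThreadId,
            TracedObject.FinalizerThreadId);
    }
}
```

The memory of these objects is only reclaimed by a later collection, after their finalizers have run, which is why they cost more than one GC pass.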

So as you see, allocating a short-lived finalizable object is about 50 times more expensive than allocating a regular one (77.880 / 1.628 ≈ 47.8)!

Now let's turn to long-lived objects: "Allocation to an array" here means allocation with the reference to the allocated object stored in an array. The test ran for about a second, so there must have been several generation 1 GCs. SlimObject certainly wins here as well (by about 2 times: 6.996 / 3.256 ≈ 2.1), but not by as much as in the case above. Why?
- Because all the objects survive many GCs, and GC overhead is the most significant factor for both types.
- One more factor slowing down the SlimObject test is less efficient use of the L1/L2 caches: in the first case all the data stayed mostly in cache, but here pages must "flow" through the caches. But I suspect this factor is much less important than the first one.
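
For comparison, an "allocation to an array" measurement could be sketched like this. As before, the class bodies, the Measure helper and the object count are assumptions made for illustration, not the original test code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-ins for the two tested types.
public class SlimObject
{
    public int Value;
}

public class FinalizableSlimObject
{
    public int Value;

    ~FinalizableSlimObject() { }
}

public static class AllocationToArray
{
    // Allocates `count` objects via `create`, stores every reference in an
    // array so that all of them survive the GCs triggered during the loop,
    // and returns the rate in millions of allocations per second.
    public static double Measure(Func<object> create, int count)
    {
        object[] survivors = new object[count];
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            survivors[i] = create();
        }
        watch.Stop();
        GC.KeepAlive(survivors); // the array stays reachable for the whole loop
        return count / watch.Elapsed.TotalSeconds / 1e6;
    }

    public static void Main()
    {
        const int count = 2000000; // assumed object count
        Console.WriteLine("SlimObject: {0:F3} M/s",
            Measure(() => new SlimObject(), count));
        Console.WriteLine("FinalizableSlimObject: {0:F3} M/s",
            Measure(() => new FinalizableSlimObject(), count));
    }
}
```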

In any case, probably it's time to finalize some finalizers ;) At least the ones that aren't really necessary.