Tuesday, October 04, 2011

Delphi XE2 compiler performance

Delphi XE2 introduced namespaces across the runtime library. This stressed unit name lookup inside the compiler, and led to some severe performance regressions in certain cases. So during the run-up to the XE2 release, I fired up a profiler and started investigating. It turned out there were numerous situations where lookups were being performed repeatedly with the same arguments, and logically the results should have been consistent across these repeated calls. A relatively cheap and easy fix seemed to be memoization. So I added a caching infrastructure to the compiler and used the profiler to guide me on where to inject it.
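As a rough illustration of the memoization idea (not the compiler's actual code, which is not shown in this post), here is a minimal Python sketch. The names `memoize`, `find_unit`, and `FAKE_FILES` are hypothetical stand-ins; the point is only that repeated lookups with the same arguments are served from a dictionary instead of repeating the search.

```python
def memoize(func):
    """Cache results of func keyed by its (hashable) positional arguments."""
    cache = {}

    def wrapper(*args):
        if args not in cache:          # first call with these arguments
            cache[args] = func(*args)  # run the real lookup once
        return cache[args]             # later calls reuse the stored result

    wrapper.cache = cache              # exposed so callers can inspect or clear it
    return wrapper


# Hypothetical stand-in for the file system the lookup would search.
FAKE_FILES = {"rtl/System.SysUtils.pas", "rtl/System.Classes.pas"}
calls = []  # records every time the underlying search actually runs


@memoize
def find_unit(name, search_path):
    """Search each directory in search_path (a tuple) for the unit's source file."""
    calls.append(name)
    for directory in search_path:
        candidate = f"{directory}/{name}.pas"
        if candidate in FAKE_FILES:
            return candidate
    return None


path = ("src", "rtl")
find_unit("System.SysUtils", path)
find_unit("System.SysUtils", path)  # second call is answered from the cache
print(len(calls))
```

Note that a negative result (`None`) is cached as well, which matters when most lookups are misses, as with probing several directories for a unit.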

For the most part - particularly for full builds - the cache works really well. But I've had some reports of bugs that I suspected were caused by the cache hanging on to function results slightly too long, and upon investigation, this turned out to be true. The problem with caches is usually in invalidation; if you don't invalidate the cache soon enough and in all required situations, you end up serving stale results. So there are a few bugs lurking in the Delphi compiler here, which I'm seeking out and squashing.

Some good news, however; I had anticipated that this might be the case, so I added a super secret switch that enables diagnosing a probable cache failure: caches can be selectively disabled, and if a problem goes away with the cache disabled, it's probably because of stale results.

Caches can be disabled by setting an environment variable:

DCC_CACHE_DISABLE='SearchUnitNameInNS,FileSystem,UnitFindByAlias,GetUnitOf'

The above environment variable setting disables all the compiler's caches. By including fewer than the four separate cache names, the problem can iteratively be narrowed down to a specific cache.
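The narrowing-down process can be sketched as a small helper that produces one `DCC_CACHE_DISABLE` value per cache. This is only a sketch under assumptions: the cache names and the variable name come from the setting above, while the helper functions (`disable_value`, `build_env`, `one_at_a_time`) are hypothetical and nothing here invokes the compiler.

```python
import os

# Cache names exactly as listed in the DCC_CACHE_DISABLE setting above.
CACHE_NAMES = ["SearchUnitNameInNS", "FileSystem", "UnitFindByAlias", "GetUnitOf"]


def disable_value(caches):
    """Format a list of cache names as a comma-separated DCC_CACHE_DISABLE value."""
    return ",".join(caches)


def build_env(caches, base_env=None):
    """Return a copy of the environment with the given caches disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["DCC_CACHE_DISABLE"] = disable_value(caches)
    return env


def one_at_a_time(names=CACHE_NAMES):
    """Yield (cache name, value) pairs, each disabling a single cache."""
    for name in names:
        yield name, disable_value([name])


for name, value in one_at_a_time():
    print(f"trial {name}: DCC_CACHE_DISABLE={value}")
```

Each value would then be set in the environment before running whatever build you normally use. If the problem disappears with only one cache disabled, that cache is the likely source of the stale results.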

I've just been fixing one bug caused by the cache that brought home how necessary the cache is. The project is large; almost 2 million lines. An initial build with the cache enabled takes about a minute on my machine; the bug shows up in later incremental compiles, when modifying the source code and pressing F9 produces spurious errors. However, once I disabled the cache (or rather, I recompiled the compiler with cache sanity checking enabled, which still filled the cache, but also invoked the underlying logic and compared the results to verify the cache), the build took nearly 3 hours!
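The sanity-checking mode described above can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not the compiler's implementation: `checked_memoize`, `StaleCacheError`, `locate`, and `unit_locations` are all hypothetical names. The wrapper still fills the cache, but it also runs the underlying logic on every call and compares the two results.

```python
class StaleCacheError(Exception):
    """Raised when a cached result disagrees with a freshly computed one."""


def checked_memoize(func):
    """Memoize func, but recompute on every call and verify the cached value."""
    cache = {}

    def wrapper(*args):
        fresh = func(*args)  # always invoke the underlying logic
        if args in cache and cache[args] != fresh:
            raise StaleCacheError(
                f"{func.__name__}{args}: cached {cache[args]!r}, fresh {fresh!r}"
            )
        cache[args] = fresh  # the cache is still filled as usual
        return fresh

    return wrapper


# Hypothetical mutable state that the lookup depends on.
unit_locations = {"Foo": "src/Foo.pas"}


@checked_memoize
def locate(name):
    return unit_locations.get(name)


locate("Foo")                          # fills the cache
unit_locations["Foo"] = "lib/Foo.pas"  # state changes without invalidation

stale_detected = False
try:
    locate("Foo")                      # fresh result no longer matches the cache
except StaleCacheError:
    stale_detected = True
print(stale_detected)
```

Because the underlying logic runs on every call, a wrapper like this gives none of the speed benefit of the cache, which is consistent with the build time reported above.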

Note: invalidation is most likely to be a problem on incremental compiles, rather than full rebuilds, especially from within the IDE. The reason is that the compiler may have lots of stale data for one half of an incremental compile that it later decides is out of date (e.g. a dependency changed); this can leave a bunch of stale entries in the cache for all the memoized function calls that occurred in the first half of the compile. The cache is strictly per-compile; it keeps no data across multiple compilations, even partial compilations.
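To make the stale-entry problem concrete, here is a toy Python sketch. It assumes a hypothetical `LookupCache` class and a made-up `unit_versions` table; it does not reflect how the compiler tracks dependencies. It only shows that a memoized answer outlives a change in the underlying data unless something invalidates the cache.

```python
class LookupCache:
    """A memo cache with an explicit invalidation hook."""

    def __init__(self, compute):
        self._compute = compute
        self._entries = {}

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._compute(key)
        return self._entries[key]

    def invalidate(self):
        """Drop every entry; the next get() recomputes from current data."""
        self._entries.clear()


# Hypothetical data the lookups depend on.
unit_versions = {"Utils": 1}
cache = LookupCache(lambda name: unit_versions[name])

before = cache.get("Utils")  # computed and memoized
unit_versions["Utils"] = 2   # dependency is found to be out of date and reloaded
stale = cache.get("Utils")   # still the old answer: the cache was not told
cache.invalidate()           # invalidation at the point the data changed
fresh = cache.get("Utils")   # recomputed from the current data

print(before, stale, fresh)
```

The hard part in practice is not the `invalidate` call itself but finding every place where the underlying data can change.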

11 comments:

Jolyon Smith said...

I am speechless.

You seem to be saying that this caching behaviour is entirely new in the XE2 compiler, is that so?

If so, and that build time of 3 hours on a 2 million line project is indicative of "normal" compiler performance without these new, fancy but broken, caches, then this represents a HUGE and worrying retrograde step in terms of compiler performance over previous versions.

Jolyon Smith said...

Actually, I should clarify that my reaction is based on having worked on projects in excess of 2 million LOC where full build times of more than a few minutes were considered unusual.

Of course, if that particular 2m LOC project you reference always used to build in 3 hours, even in older compilers, then this isn't the retrograde step it might appear to be.

Some clarification on that point would be appreciated.

Barry Kelly said...

Jolyon, you are clearly not speechless. I am saying that the namespace lookup situation is entirely new in the XE2 compiler because the runtime library has changed, but the compiler was not heretofore designed for it. It's in the context of that, that compile times were reaching over two hours in certain projects (specifically, certain patterns of unit uses).

That is to say, it is not the compiler which has regressed (actually, in many if not most cases, it is faster than before); the code it is compiling stressed different parts of it, owing to pervasive use of namespaces where there were none before. Specifically, default namespace lookups (to maintain backward compatibility with code that doesn't explicitly use namespaces) turned unit uses from O(1) into O(n). This in turn turned overall project unit uses into O(m*n) instead of O(m). The explosion in unit name lookups became an explosion in file system lookups. The only proper fix for this (the file system lookups in particular) was a cache. But in building the cache, other opportunities could be taken.

Arnaud said...

Is it not exactly the same kind of optimization that Andreas introduced in his http://andy.jgknet.de/ patches?

Barry Kelly said...

Arnaud - it's not exactly the same, no. Some of it is similar, but other bits are different.

Jolyon Smith said...

Ok, so "flabbergasted" would have been a better description than "speechless". :)

I specifically didn't use the term "regression", because clearly a new problem introduced by a new feature is not a regression.

I used the term "retrograde step", to characterise what you appear to be saying: that the new compiler - as a result of these new features - is now significantly slower in large/complex projects (if build times of the 2m LOC project were previously comparable to those I have experience of).

i.e. a backward step in terms of productivity/performance

This I am sure will be a matter of some concern for people with such large projects who in many cases will have no plans to migrate those projects to support anything other than the VCL yet - from what you are saying - might still be impacted by this.


Bottom line:

Assuming for a moment that an existing 2m LOC project could be successfully compiled "as-is" as a VCL application in XE2, would build times for that project be expected to be slower in XE2 vs XE or other earlier versions?

Barry Kelly said...

"I specifically didn't use the term 'regression', because clearly a new problem introduced by a new feature is not a regression."

I disagree. Customers whose code, unchanged, compiles slower in version n+1 than version n, will say that there has been a performance regression in version n+1.

The problem being solved is performance. The regression is in the performance of compiling code - unchanged. The new feature is namespaces throughout the RTL to help in scoping code targeting multiple platforms (another new feature). But I don't think it's wrong to call the lack of performance in compiling unchanged code a regression, so it needs a fix.

"I used the term 'retrograde step', to characterise what you appear to be saying: that the new compiler - as a result of these new features - is now significantly slower in large/complex projects (if build times of the 2m LOC project were previously comparable to those I have experience of)."

During development, yes, the compiler was slower on unchanged code. That was the whole point of fixing it.

"i.e. a backward step in terms of productivity/performance"

During development, yes. All sorts of products have ups and downs in performance while they are in development. That's why we try to fix these problems by release. And that's why I'm happy to say that the Delphi compiler is as fast as, or faster than, XE in almost all use cases.

"This I am sure will be a matter of some concern for people with such large projects"

Jolyon, I don't know how to make it any clearer to you that the performance regression is in the compiler that was not shipped. You are inventing problems based on a misunderstanding of the situation. Please clarify your understanding.

"Assuming for a moment that an existing 2m LOC project could be successfully compiled 'as-is' as a VCL application in XE2, would build times for that project be expected to be slower in XE2 vs XE or other earlier versions?"

I would expect 32-bit build times in XE2 to be faster than XE for many, if not most, large complex projects, and that has indeed been the feedback from customers.

The 64-bit back end is new, so hasn't had as much time to bed in. But then, XE didn't have a 64-bit compiler.

Lachlan Gemmell said...

Are these caches and the environment variable applicable to all platforms in XE2?

Barry Kelly said...

All platforms the compiler targets, yes. The caches only affect the front end, for things like symbol lookup and (particularly) unit name resolution.

LaurentE said...

Does the same optimization not exist for the command-line compiler (dcc32.exe)?
Currently, the build time is greater from the command line!
Can you explain this? Is it a peculiarity of my project? Is an improved command-line compiler planned?

thank you

Unknown said...

Think I just realized why a project of mine takes so long to compile. I'm making heavy use of namespaces in a D2009 project. I used them to avoid some unit name clashes with some existing code I was using.

Think I'll do a comparison and see how much difference it makes.