Tuesday, October 04, 2011

Delphi XE2 compiler performance

Delphi XE2 introduced namespaces across the runtime library. This stressed unit name lookup inside the compiler, and led to some severe performance regressions in certain cases. So during the run-up to the XE2 release, I fired up a profiler and started investigating. It turned out there were numerous situations where lookups were being performed repeatedly with the same arguments, and logically the results should have been consistent across these repeated calls. A relatively cheap and easy fix seemed to be memoization. So I added a caching infrastructure to the compiler and used the profiler to guide me where to inject it.

For the most part - particularly for full builds - the cache works really well. But I've had some reports of bugs that I suspected were caused by the cache hanging on to function results slightly too long, and upon investigation, this turned out to be true. The problem with caches is usually in invalidation; if you don't invalidate the cache soon enough and in all required situations, you end up serving stale results. So there are a few bugs lurking in the Delphi compiler here, which I'm seeking out and squashing.

Some good news, however; I had anticipated that this might be the case, so I added a super secret switch that enables diagnosing a probable cache failure: caches can be selectively disabled, and if a problem goes away with the cache disabled, it's probably because of stale results.

Caches can be disabled by setting an environment variable:

DCC_CACHE_DISABLE='SearchUnitNameInNS,FileSystem,UnitFindByAlias,GetUnitOf'

The above environment variable setting disables all the compiler's caches. By including fewer than the four separate cache names, the problem can iteratively be narrowed down to a specific cache.
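
The narrowing-down can be scripted. What follows is only a sketch under assumptions: it is written in POSIX shell for illustration (from a Windows command prompt the equivalent is a plain set DCC_CACHE_DISABLE=...), and the build command is whatever you pass in; nothing here is part of the compiler itself.

```shell
# The four cache names from the setting above.
CACHES="SearchUnitNameInNS FileSystem UnitFindByAlias GetUnitOf"

# Print the comma-separated value that disables every cache at once.
all_caches_setting() {
    printf '%s\n' "$CACHES" | tr ' ' ','
}

# Run the given build command once per cache, with only that cache disabled.
# If the build succeeds with exactly one cache off, that cache is the
# likely source of the stale results.
# Usage: try_each_cache BUILD_COMMAND [ARGS...]
try_each_cache() {
    for name in $CACHES; do
        if DCC_CACHE_DISABLE="$name" "$@"; then
            echo "build OK with $name disabled"
        else
            echo "build still fails with $name disabled"
        fi
    done
}
```

For example, try_each_cache ./build.sh, where build.sh is a hypothetical script that runs the failing compile and exits non-zero on error.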

I've just been fixing one bug caused by the cache that brought home how necessary the cache is. The project is large; almost 2 million lines. An initial build with the cache enabled takes about a minute on my machine; the bug exhibits itself in later incremental compiles when modifying the source code and pressing F9, producing spurious errors. However, once I disabled the cache (or rather, I recompiled the compiler with cache sanity checking enabled, which still filled out the cache, but also invoked the underlying logic, and simply compared the results to verify the cache), the build took nearly 3 hours!

Note: invalidation is most likely to be a problem on incremental compiles, rather than full rebuilds, especially from within the IDE. The reason is that the compiler may have lots of stale data for one half of an incremental compile that it later decides is out of date (e.g. a dependency changed); this can leave a bunch of stale entries in the cache for all the memoized function calls that occurred in the first half of the compile. The cache is strictly per-compile; it keeps no data across multiple compilations, even partial compilations.

Tuesday, March 01, 2011

An ugly alternative to interface to object casting

I was answering a question on Stack Overflow, but the user didn't have the latest version of Delphi. My answer included converting an interface to an object instance, which is made possible with the as cast on interfaces in recent Delphi versions. But there is another way of doing it, exploiting the regularity of Delphi interface vtable implementations:

{$apptype console}

uses
  SysUtils; // needed for Exception

function Intf2Obj(x: IInterface): TObject;
type
  TStub = array[0..3] of Byte;
const
  // ADD [ESP+$04], imm8; [ESP+$04] in stdcall is Self argument, after return address
  add_esp_04_imm8: TStub = ($83, $44, $24, $04);
  // ADD [ESP+$04], imm32
  add_esp_04_imm32: TStub = ($81, $44, $24, $04);
  
  function Match(L, R: PByte): Boolean;
  var
    i: Integer;
  begin
    for i := 0 to SizeOf(TStub) - 1 do
      if L[i] <> R[i] then
        Exit(False);
    Result := True;
  end;
  
var
  p: PByte;
begin
  p := PPointer(x)^; // get to vtable
  p := PPointer(p)^; // load stub address from vtable
  
  if Match(p, @add_esp_04_imm8) then 
  begin
    Inc(p, SizeOf(TStub));
    Result := TObject(PByte(Pointer(x)) + PShortint(p)^);
  end
  else if Match(p, @add_esp_04_imm32) then
  begin
    Inc(p, SizeOf(TStub));
    Result := TObject(PByte(Pointer(x)) + PLongint(p)^);
  end
  else
    raise Exception.Create('Not a Delphi interface implementation?');
end;

type
  ITest = interface
    procedure P;
  end;
  TTest = class(TInterfacedObject, ITest)
    F: array[0..200] of Byte;
    procedure P;
  end;

procedure TTest.P;
begin
  Writeln('Hello');
end;
  
procedure Go;
var
  orig: TTest;
  i: ITest;
  o: TObject;
begin
  orig := TTest.Create;
  i := orig;
  i.P;
  o := Intf2Obj(i);
  Writeln(o = orig);
end;

begin
  Go;
end.

This approach is predicated on the idea that the stub code that Delphi produces for turning the implicit interface argument into an instance argument is predictable. It generally only has two forms, depending on how much of an adjustment it needs to make (which itself depends on how much instance data there is). It ought to work for almost all 32-bit Delphi interfaces that have been implemented by instances, where the vtable was created by the compiler. If not, other stub variations can be analyzed (in the IDE CPU view) and handled too. It ought to be pretty safe, as only this specific code is permitted. It could be made even safer by ensuring that the stub ends with a JMP and that the instance returned has a ClassType descending from TClass.

Update: After a Google search I note that Hallvard also wrote about this some time ago. His code is a little tighter than mine (using Integer constants rather than byte-by-byte comparison); in my defense, I only spent a few minutes on this...

Monday, December 20, 2010

Scrolling: Chrome vs Firefox

I was using the Chrome browser on my laptop the other day (some browser compat problem). I don't normally use it; Firefox is my preferred browser. I couldn't help but notice how peculiarly laggy it felt to use. It came down to vertical scrolling: this highly common task - for any long article you'll be doing a lot of it - felt jerky and unpleasant. So I did a quick ad-hoc experiment: open up the same web page in both browsers with viewports of the same size, grab the vertical scroll bar, and wiggle it up and down continuously so the browser is constantly redrawing text.

This is a crop from Process monitor on my desktop machine, which has 8 logical cores, i.e. 4 physical cores with hyperthreading. That means that 100% CPU usage on one core shows up as 12.5% CPU usage. The big wide bump on the left is the CPU usage while I was fiddling with Chrome; the one on the right is Firefox. (The bump in the middle is sqlservr.exe, which wakes up and does meaningless busywork every 60 seconds.) You can see that Chrome uses perhaps 50% more CPU on this task than Firefox, with more time spent in the user process and (proportionately) less in the kernel (presumably shifting bits around). Eyeballing an average of CPU usage, Firefox ranged from 6% to 7.5%, while Chrome was pegged at 12.5%. Chrome is simply less CPU-efficient at redrawing while scrolling, and it's very obvious to my eyes. (I did the same experiment with IE, and it was in the middle, with about 9% CPU.)
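
To make the arithmetic explicit: a whole-machine percentage turns into a share of a single logical core when multiplied by the number of logical cores. A tiny sketch (the function name is just for illustration):

```shell
# Convert a whole-machine CPU percentage into a percentage of one logical core.
# Usage: per_core_pct TOTAL_PCT LOGICAL_CORES
per_core_pct() {
    awk -v total="$1" -v cores="$2" 'BEGIN { printf "%g\n", total * cores }'
}

per_core_pct 12.5 8   # Chrome: prints 100
per_core_pct 6 8      # Firefox, low end: prints 48
per_core_pct 7.5 8    # Firefox, high end: prints 60
```

In other words, Chrome was saturating one logical core while scrolling, and Firefox was using roughly half of one.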

I might hazard a guess that Firefox spends more memory on caching bitmaps of the web page, or some similar trick trading off space for time. In any case, it's one of the reasons Firefox is still my primary browser, and also why I'm completely unconcerned about its memory usage. I have yet to encounter significant paging because of Firefox, not least because it's a 32-bit process, limiting its maximum usage, and I have alternately 12G and 4G of memory in my primary desktop and laptop respectively.

Mind you, when I add up the private working set for all 8 chrome.exe processes apparently needed to display this web page, they add up to 86M, only 1M less than my recently restarted firefox.exe session at 87M. (Comparing other memory usage numbers is awkward, as non-working-set memory isn't relevant, while non-private memory would be double-counted with Chrome.)

Monday, November 15, 2010

CrashPlan manual installation approach on Nexenta

I wrote an earlier post about installing CrashPlan on Nexenta. However, it seems that CrashPlan have changed their Linux installer's modus operandi, and it tries to install a (Linux) JRE of its own. Of course, that won't work; it needs the Nexenta JRE (but I get the whole JDK, as you never know when you'll need to brew up some Java):

$ sudo apt-get install sun-java6-jdk

Anyhow, I received some comments and emails about setting all this up, and figured I'd break it down a little more for people who still want to get it all working. It's a process of hacking things together, though, not a blind recipe; that's why I'm proceeding almost as if it's a debugging session.

Anyhow, once we have a JRE installed (from apt-get above), we can try and extract out the guts of the Linux installer, so rather than running its install.sh, we can set it up manually.

I went to the CrashPlan Linux download web page, and started downloading the installer in a scratch folder (this was the URL at the time of writing):

$ wget http://download.crashplan.com/installs/linux/install/CrashPlan/CrashPlan_2010-03-08_Linux.tgz

That's a gzipped tar archive, so I extracted it:

$ tar xzf CrashPlan_2010-03-08_Linux.tgz
$ cd CrashPlan-install # the dir it extracted
$ ls
CrashPlan_2010-03-08.cpi  EULA.txt  INSTALL  README  install.sh  scripts  uninstall.sh

I didn't know what the big CrashPlan*.cpi file was, so I checked:

$ file CrashPlan_2010-03-08.cpi
CrashPlan_2010-03-08.cpi: gzip compressed data, was "CrashPlan_2010-03-08.cpi", from Unix, max compression

So it's a gzipped file! I decompressed it, and tested the result:

$ gunzip < CrashPlan_2010-03-08.cpi > test
$ file test
test: ASCII cpio archive (pre-SVR4 or odc)

A cpio archive! I extracted that too, but in a separate directory:

$ mkdir t; cd t; cpio -i < ../test
$ ls
bin  conf  doc  jniwrap.lic  lang  lib  libjniwrap.so  libjniwrap64.so  libjtux.so  libjtux64.so  log  skin  upgrade
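
The decompress and extract steps can also be combined into one pipeline, skipping the intermediate file. This is a sketch, not something from the installer; unpack_cpi is a name I made up, and it assumes gunzip and cpio are on the path:

```shell
# Unpack a gzip-compressed cpio archive (such as the .cpi file above) into a
# target directory, creating the directory if necessary.
# Usage: unpack_cpi ARCHIVE DESTDIR
unpack_cpi() {
    # Resolve the archive to an absolute path before changing directory.
    archive=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    mkdir -p "$2" || return 1
    # -i extracts from stdin; -d creates leading directories as needed.
    (cd "$2" && gunzip -c "$archive" | cpio -id)
}
```

For example: unpack_cpi CrashPlan_2010-03-08.cpi t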

Now this looks very close to the root listing of my actual current CrashPlan install on Nexenta, and by and large it is. Here's a rough difference between the two (listing.curr is my actual install, listing is extracted from the installer):

--- listing.curr        2010-11-15 06:32:12.548701734 +0000
+++ listing     2010-11-15 06:31:40.726884007 +0000
-CrashPlanEngine.pid
-bin/CrashPlanDesktop
-bin/CrashPlanEngine
-bin/CrashPlanEngine.lsb
-bin/run.conf
-bin/vars.sh
-conf/my.service.xml
-conf/service.login
-conf/service.model
-install.vars
-libjtux.so.lsb
-libjtux.so.sol

Let's go through those. This install tree is almost complete; it can be put in /usr/local/crashplan (that's where I have mine) or wherever you like, so long as the configuration bits are also hooked up appropriately.

CrashPlanEngine.pid is just the pidfile, a text file containing the process id of the currently running instance. I'm not sure why it's in the crashplan directory rather than somewhere like /var/run, but it is. I believe the crashplan service will create it.
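
If you want to check whether the process named in a pidfile is still around, something like this works. It's a generic sketch rather than anything CrashPlan ships, and pid_alive is a made-up name:

```shell
# Succeed if PIDFILE exists and names a process that we can signal.
# Usage: pid_alive PIDFILE
pid_alive() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    # Reject empty or non-numeric content before handing it to kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only tests that the process exists and
    # that we have permission to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

For example: pid_alive /usr/local/crashplan/CrashPlanEngine.pid && echo running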

CrashPlan* are all shell wrappers for Java applications; *.lsb (LSB for Linux Standard Base) are the original versions (I think I renamed them to this). I don't have an X server or libraries on my Nexenta install, so I'm uninterested in CrashPlanDesktop. CrashPlanEngine however is important; it's the main daemon file. Here's mine, which seems to work; I think I may have edited it to work with some GNU utils rather than Solaris utils, as I believe this came from the Solaris installer (or vice versa; actually that's probably more likely):

#!/bin/bash

TARGETDIR="`dirname ${0}`/.."

. ${TARGETDIR}/install.vars
. ${TARGETDIR}/bin/run.conf

cd ${TARGETDIR}

case $1 in
        start)
                PID=`/usr/bin/ps -Af -o pid,ppid,args | grep 'app=CrashPlanService' | grep -v grep | cut -f2 -d' '`
                if [ -n "$PID" ]; then
                  echo CrashPlan is already running with pid $PID
                  exit 1;
                fi
                echo "Starting CrashPlan Engine ... "
                nice -n 19 ${JAVACOMMON} ${SRV_JAVA_OPTS} -classpath "./lib/com.backup42.desktop.jar:./lang" com.backup42.service.CPService > ${TARGETDIR}/log/engine_output.log 2> ${TARGETDIR}/log/engine_error.log & 
                if [ $! -gt 0 ]; then
                        echo $! > ${TARGETDIR}/CrashPlanEngine.pid
                        echo "OK"
                else
                        echo "FAIL" 
                        exit 1
                fi
                ;;
        stop)
                echo "Stopping CrashPlan Engine ... "
                if [ -f ${TARGETDIR}/CrashPlanEngine.pid ] ; then
                  kill `cat ${TARGETDIR}/CrashPlanEngine.pid`
                  sleep 5
                fi
                PID=`/usr/bin/ps -Af -o pid,ppid,args | grep 'app=CrashPlanService' | grep -v grep | cut -f2 -d' '`
                if [ -n "$PID" ]; then
                  echo Still running, killing PID=$PID
                  kill -9 $PID
                fi
                rm -f ${TARGETDIR}/CrashPlanEngine.pid
                echo "OK"
                ;;
        *)      
                echo "$0 {start|stop}"
                exit 1
                ;;
esac

As you can see, that script sources (includes) a couple of other guys, bin/run.conf and install.vars. bin/run.conf looks like this:

SRV_JAVA_OPTS="-Dfile.encoding=UTF-8 -Dapp=CrashPlanService -DappBaseName=CrashPlan -Xms20m -Xmx512m -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0"
GUI_JAVA_OPTS="-Dfile.encoding=UTF-8 -Dapp=CrashPlanDesktop -DappBaseName=CrashPlan -Xms20m -Xmx512m -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0"

install.vars looks like this (for my current install):

TARGETDIR=/usr/local/crashplan
BINSDIR=/usr/local/bin
MANIFESTDIR=/tank/share/backup/crashplan
INITDIR=/etc/init.d
RUNLVLDIR=/etc/rc3.d
INSTALLDATE=20100310
APP_BASENAME=CrashPlan
DIR_BASENAME=crashplan
DOWNLOAD_HOST=download.crashplan.com
JAVACOMMON=/usr/bin/java

The MANIFESTDIR variable points to a directory I expect I provided when I ran the original install.sh, but there's nothing currently in it. The other paths are as you'd expect.
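
Since CrashPlanEngine sources this file blindly, it's worth a quick sanity check after editing it by hand. A minimal sketch, assuming only the variable names shown above; check_install_vars is a hypothetical helper, not part of CrashPlan:

```shell
# Verify that an install.vars file points at a real install directory and an
# executable java. The function body is a subshell, so the sourced variables
# don't leak into the caller.
# Usage: check_install_vars FILE
check_install_vars() (
    . "$1" || exit 1
    [ -d "$TARGETDIR" ] || { echo "TARGETDIR missing: $TARGETDIR" >&2; exit 1; }
    [ -x "$JAVACOMMON" ] || { echo "JAVACOMMON not executable: $JAVACOMMON" >&2; exit 1; }
)
```

For example: check_install_vars /usr/local/crashplan/install.vars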

As far as I can make out, the conf/* files are created by CrashPlan itself in coordination with the configuration utility, which, as I linked in my previous post, can be run remotely from a Windows (or other OS) install of CrashPlan. There's a default.service.xml as a prototypical my.service.xml, while service.{login,model} appear to be login and encryption keys.

That leaves libjtux.so.*. These are renamed versions of the file libjtux.so which differs between Linux and Solaris. Primarily they differ in which C library they link against (note: running ldd on untrusted binaries is unsafe, but I trust these binaries):

$ ldd libjtux.so.*
warning: ldd: libjtux.so.lsb: is not executable
libjtux.so.lsb:
        librt.so.1 =>    /lib/librt.so.1
        libnsl.so.1 =>   /lib/libnsl.so.1
        libc.so.6 =>     (file not found)
        libmp.so.2 =>    /lib/libmp.so.2
        libmd.so.1 =>    /lib/libmd.so.1
        libscf.so.1 =>   /lib/libscf.so.1
        libc.so.1 =>     /lib/libc.so.1
        libuutil.so.1 =>         /lib/libuutil.so.1
        libgen.so.1 =>   /lib/libgen.so.1
        libm.so.2 =>     /lib/libm.so.2
warning: ldd: libjtux.so.sol: is not executable
libjtux.so.sol:
        librt.so.1 =>    /lib/librt.so.1
        libsocket.so.1 =>        /lib/libsocket.so.1
        libnsl.so.1 =>   /lib/libnsl.so.1
        libc.so.1 =>     /lib/libc.so.1
        libmp.so.2 =>    /lib/libmp.so.2
        libmd.so.1 =>    /lib/libmd.so.1
        libscf.so.1 =>   /lib/libscf.so.1
        libuutil.so.1 =>         /lib/libuutil.so.1
        libgen.so.1 =>   /lib/libgen.so.1
        libm.so.2 =>     /lib/libm.so.2

Anyhow, we need to get the right version of libjtux.so. It seems to be an open-source project and getting a version wouldn't be difficult, but the one in the CrashPlan Solaris installer works fine; let's get that one:

# current version when I ran this
$ wget http://download.crashplan.com/installs/solaris/install/CrashPlan/CrashPlan_2010-03-08_Solaris.tar.gz
$ tar xzf CrashPlan_2010-03-08_Solaris.tar.gz
$ cd CrashPlan/root/opt/sfw/crashplan
$ ls
bin  client-build.properties  conf  doc  installer  jniwrap.lic  lang  lib  libjtux.so  skin  upgrade

So here we have the entrails of a CrashPlan for Solaris install, and we can pick and choose which organs we need to transplant; and the thing that's particularly needed is libjtux.so; it needs to replace the Linux version.
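
The transplant itself is just a copy. Here's a sketch that keeps the Linux original around under the .lsb name used in my listing above; swap_libjtux is a made-up helper and the two directories are whatever paths you extracted to:

```shell
# Replace the Linux libjtux.so in INSTALL_DIR with the one from the extracted
# Solaris tree, keeping the Linux original as libjtux.so.lsb.
# Usage: swap_libjtux SOLARIS_DIR INSTALL_DIR
swap_libjtux() {
    [ -f "$1/libjtux.so" ] || { echo "no libjtux.so in $1" >&2; return 1; }
    # Only back up once, so a second run doesn't overwrite the saved original.
    if [ -f "$2/libjtux.so" ] && [ ! -e "$2/libjtux.so.lsb" ]; then
        mv "$2/libjtux.so" "$2/libjtux.so.lsb" || return 1
    fi
    cp "$1/libjtux.so" "$2/libjtux.so"
}
```

For example: swap_libjtux CrashPlan/root/opt/sfw/crashplan /usr/local/crashplan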

All that should be left is getting CrashPlan to start at boot-up, which is most easily done by hand in Nexenta with the init.d system. Here's the executable crashplan script I have in /etc/init.d:

#!/bin/bash
SCRIPTNAME=/usr/local/crashplan/bin/CrashPlanEngine

case "$1" in
start)
        $SCRIPTNAME start       
        ;;
stop)
        $SCRIPTNAME stop
        ;;
restart)
        $SCRIPTNAME restart
        ;;
force-reload)
        $SCRIPTNAME force-reload
        ;;
status)
        $SCRIPTNAME status
        ;;
*)      
        echo "Usage: $0 {start|stop|restart|force-reload|status}" >&2
        exit 3
        ;;
esac
exit 0

And the symlink in rc3.d:

$ ls -l /etc/rc3.d/S99crashplan 
lrwxrwxrwx 1 root root 21 Mar 10  2010 rc3.d/S99crashplan -> ../init.d/crashplan
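
Setting that up amounts to a chmod and an ln. A sketch, with the directory prefix as a parameter so it can be tried somewhere harmless first; link_crashplan_init is a hypothetical name, and on the real system it needs to run as root:

```shell
# Make the init script executable and link it into the runlevel directory.
# ETC_DIR defaults to /etc; pass another prefix to experiment safely.
# Usage: link_crashplan_init [ETC_DIR]
link_crashplan_init() {
    etc=${1:-/etc}
    chmod +x "$etc/init.d/crashplan" || return 1
    # Relative link target, matching the listing above.
    ln -s ../init.d/crashplan "$etc/rc3.d/S99crashplan"
}
```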

That's it!

Wednesday, September 15, 2010

Non-Delphi: Postmodernism, Transformers: ROTF and Baudrillard

The simulacrum is never that which conceals the truth--it is the truth which conceals that there is none. The simulacrum is true. (Baudrillard, Simulacra and Simulation)

I've just watched Transformers: Revenge of the Fallen, and it got me thinking. It's perhaps the most post-modern movie I've seen to date.

T:ROTF isn't so much a movie as a string of soundbites, stereotypes and cliches arranged into a 2.5 hour trailer for a movie you'll never get to see, because it was never made. You turn up for the product the advertisement is selling, but it turns out the whole product was ad. There's no "there" there. I've never seen a better embodiment of Baudrillard's conception of simulacrum than this.

The meaningless symbols are so densely packed in this movie I need a convention. Everything in the movie is a stereotype or cliche: a stand-in, an intellectually lazy shorthand reference. I'll be lazy too, and mark stereotypes in my commentary with [brackets].

The beginning is indistinguishable from a trailer. A brief [dawn of man] scene, you know the kind, [stone age people silhouetted against a dawn sky somewhere in Africa], doing [caveman things with the spears and the facepaint], with a [rumbling voiceover] helpfully telling you that it's ["Earth, birthplace of the human race"]. If it were storytelling, it would be rushed, heavy-handed, and contemptuous of the viewer - both showing and telling. But I don't think it is storytelling. It's arranging some symbols (humans, decepticons) into a particular aspect required for later symbolic purposes. The decepticons portrayed in this ancient time are [evil] (with [King Kong-like grabbing] of a feeble human, albeit male), but there is no motive, no narrative. Why would such powerful machines pay any more attention to stone age humans than they would apes, or insects, which they can swat away with similar ease?

Next up: Shanghai, [disaster scene], with [disaster radio news chatter]. Cue [Pentagon command centre], explaining that some black hawks are moving in, while showing some black hawks moving in: American aircraft and troops entering Chinese territory, in complete suspension of geopolitical disbelief, no explanation considered necessary. Expository trailer voiceover says "new autobots", while expository camera shot shows new autobots, including [hot girl on bike], [fast car], and [military transport]. "Together, we form an alliance", explains voiceover, while showing human troops in [military transport] (which subsequently transforms). No attempt to explain why squishy soldiers with small arms are going up against fast-moving heavy machinery. What do they hope to achieve with their flying pieces of lead? Would they go up against even a human-engineered tank with such miserable munitions? Nor an explanation for the gunships flying with mere tens of feet clearance from the ground and the surrounding buildings that tower over them, completely negating the tactical advantages of a mobile, hovering cannon and missile platform.

But all is soon revealed. The squishy humans aren't going in to fight, they are going in to be squished, to symbolize human weakness against the machines. After a decepticon slams its fists into some concrete pipe sections, somehow creating a fiery explosion, gunships capable of engaging the enemy with missiles and cannons from considerable distance approach low and close enough to be clobbered with a mere wave of mechanical arms. As an alleged depiction of a military engagement, it's beyond ludicrous, laughable on its face. Suspension of disbelief isn't possible: this isn't a battle; it isn't even a simulation of a battle. It's a simulation of a battle simulation, an arrangement of symbols of battles. Here are our valiant heroes going into battle; here's our shockingly powerful foe, see how he easily puts our heroes on the back foot; but wait (!) here come our heroes again with reinforcements, to win the day with a bunch of soundbites: ["damn, I'm good!"], ["punk-ass decepticon"], ["any last words?"], "the fallen shall rise again", ["that doesn't sound good"], ["not today!", reload-click, bullet to the head].

That's just the first 8 minutes or so; it goes on for hours (!), with no variation in pacing that you wouldn't also expect in a 30-second movie trailer. Some other commentary roughly concurs with mine, though I didn't enjoy the spectacle or visual feast aspects, primarily because those spectacles are filmed too close to the action, and the subjects, transformed machines, have so many bits and bobs hanging out of them it's hard to tell where one begins and another ends, much like how camouflage breaks up outlines. Trying to figure out what's actually going on within the pace of the editing cuts would give me a headache. Besides, marvelling at the sheer density of signifiers and its generally jaw-dropping empty awfulness is more fun, in a perverse way.