Thursday, September 25, 2008

Anonymous methods as an alternative to 'out' parameters

Out parameters are a useful language feature for returning multiple results, particularly when the language in question doesn't have tuples as a first-class feature. Even though Java omitted pass-by-reference (meaning both var and out in Delphi parlance), C# did not follow its lead and includes both parameter-passing semantics. And so it follows that the .NET method System.Int32.TryParse(String,Int32&) is possible in C#, but not in Java.

The API works well enough when you want to put the extra return value(s) into local variables or visible fields. What about times when you want to put the return value into a property, though? Since you can't (generally) take a reference to a property, you have to manually create a local variable, pass the local variable by reference, and then, in a separate statement, assign the variable to the property.

Anonymous methods can provide an "out" here, if you'll pardon the pun. By passing a setter instead of a mutable reference, you can escape from this constraint. Here are two overloaded ReadLine functions, one using the out mechanism, the other using the setter pattern:

function ReadLine(const Prompt: string; out Dest: string): Boolean; overload;
var
  line: string;
begin
  Write(Prompt, '> ');
  Flush(Output);
  Readln(line);
  Result := line <> '';
  if Result then
    Dest := line;
end;

function ReadLine(const Prompt: string; const Setter: TProc<string>): Boolean; overload;
var
  line: string;
begin
  Write(Prompt, '> ');
  Flush(Output);
  Readln(line);
  Result := line <> '';
  if Result then
    Setter(line);
end;

As you can see, the setter version differs very little from the out-parameter version. (The gap would be wider in a language with definite assignment rules, like C#, where Dest would have to be assigned some value even when nothing was entered.)

The real difference is at the usage site of the idiom. (And don't forget, this is an idiom: it might look odd the first time you see it, but once you understand anonymous methods, or even just the idiom itself, it's a trivial pattern and easy to read.) Here's the definition of a class with a couple of properties, followed by a couple of functions; the first uses the out idiom, and the second uses the setter idiom:

type
  TFrob = class
  private
    FFoo: string;
    FBar: string;
  public
    property Foo: string read FFoo write FFoo;
    property Bar: string read FBar write FBar;
  end;

// Using 'out' idiom - requires separate assignment statement
function WorkWithFrob: Boolean;
var
  frob: TFrob;
  temp: string;
begin
  frob := TFrob.Create;
  try
    if not ReadLine('Give me Foo', temp) then
      Exit(False);
    frob.Foo := temp;
    if not ReadLine('Give me Bar', temp) then
      Exit(False);
    frob.Bar := temp;
    Writeln(Format('Your foo is %s and bar is %s', [frob.Foo, frob.Bar]));
    Result := True;
  finally
    frob.Free;
  end;
end;

// Using 'setter' idiom - can be done inline
function WorkWithFrob2: Boolean;
var
  frob: TFrob;
begin
  frob := TFrob.Create;
  try
    if not ReadLine('Give me Foo', procedure(x: string) begin frob.Foo := x; end)
        or not ReadLine('Give me Bar', procedure(x: string) begin frob.Bar := x; end) then
      Exit(False);
    Writeln(Format('Your foo is %s and bar is %s', [frob.Foo, frob.Bar]));
    Result := True;
  finally
    frob.Free;
  end;
end;

This idiom isn't useful very often, but when the right situation comes up, it's really quite nice to have, especially when you need persistent references to properties - e.g. when you need to store the reference in some structure somewhere, and update it with different values over time.

Viewed more generally, anonymous methods / method references can act as a kind of pipe joining up producers and consumers of data, with getters (TFunc<TResult>) for consumer-pull and setters (TProc<TArg>) for producer-push.
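As a contrived but concrete sketch of that last point (Pump and the names in it are mine, not anything from the RTL; TFunc<> and TProc<> come from SysUtils), here's a little routine that repeatedly pulls items from a getter and pushes them into a setter until the getter runs dry:

procedure Pump(const Getter: TFunc<string>; const Setter: TProc<string>);
var
  item: string;
begin
  // Consumer-pull on one side, producer-push on the other; an empty
  // string is treated as "no more data" for the purposes of this sketch.
  item := Getter();
  while item <> '' do
  begin
    Setter(item);
    item := Getter();
  end;
end;

procedure EchoDemo;
begin
  // Pipe console input straight through to console output.
  Pump(
    function: string
    begin
      Readln(Result);
    end,
    procedure(s: string)
    begin
      Writeln('got: ', s);
    end);
end;

Because Pump only sees the two method references, it neither knows nor cares whether the values are coming from the console, a file, or a property on some object.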

Wednesday, September 17, 2008

Google Developer Day 2008, London

I was at Google Developer Day here in London yesterday. Most of it was pretty light on technical details, but there was some good info on V8, Google's new JavaScript engine which is used in Google Chrome.

The specific sessions I went to were Intro to Android, Intro to Android SDK, Google Data API mashups, and V8 - "the Chrome Engine" (sic). The heavy tilt towards Android was chiefly driven by curiosity after Mike Jennings took out a pre-production OHA phone and demoed it in the keynote. It looks reasonably neat, borrowing some nifty effects from Apple's iPhone; no pinch-zooming though.

Intro to Android was a mix of business / very lightweight technical details about the phone. Apparently, Google has twisted some arms in back-rooms to get buy-in from operators & OEMs, so anything with the OHA / Android branding should have some minimum level of openness. Should be a welcome relief from the vice-grips of Apple. There wasn't much description of the Dalvik VM running behind the scenes, other than that you take your .class files from javac and run them through a processor to get them running on the device, and apparently the guy Mike talks to about the VM is "very bright". That was mentioned a number of times, so it must be important.

The Android SDK intro was given by a guy (Carl-Gustaf Harroch) from a local startup, not an actual Google guy. He very roughly described content providers, and also briefly outlined some entries in the application manifests, which are the mechanism by which the Android OS figures out what events your application is interested in (phone call arrived, moved certain distance according to GPS, that kind of thing). There was a laboured comparison of content providers with REST, in that there are methods that correspond to CRUD operations, but apparently there are other concerns such as observability etc. which make them not as simple as REST (and thus an invalid comparison, in my view). Apparently content stored on the phone and exposed to other applications is heavily skewed towards assuming that the content is living in an SQLite DB.

The GData mashup session wasn't interesting to anyone who has interacted with Google REST / AtomPub APIs even trivially. Once upon a time I wrote a blogger post app, so I didn't learn much.

Oh, and if you are writing a client to GData etc., I recommend that you don't start by trying to grok any of the Google API libraries unless you need deep integration. I didn't like the look of them last time (Java-itis, factories etc. everywhere), and I'm fairly sure they haven't improved.

Finally, I went to the V8 talk by Kevin Millikin. This was the best and most technical by a long, long way; to be honest, if it hadn't been for this talk, the day would have been a waste, on net. He described some V8 implementation details.

Roughly speaking, the main approach V8 takes is to try and shoe-horn classes into JavaScript's dynamic type system, so that other, more classical dynamic object-oriented optimizations can be applied in the future. The way it works is with hidden classes. Every freshly new'd up object obj gets the same class, call it class_0. The first time you add a property, call it 'x', to obj, a new class class_1 is created which has a single property called 'x'. Additionally, a class transition edge labeled 'x' is added to class_0 pointing at class_1. This means that all objects which are freshly new'd up and then have a single property called 'x' added to them will share the same class. So, in theory, every object in the system which has the same properties added in the same order will share the same class. The class transitions described above form a tree (I asked), and can't converge to form a DAG, because if they did, enumeration order could change. (I believe enumeration order is an implementation detail rather than something the ECMAScript spec mandates, so people shouldn't be relying on it; however, it's something the V8 folks found they couldn't change without breaking extant practice.) This in turn means that if you do have conditional branches in your JavaScript constructors, you should assign to all properties in the same order, if that is feasible.

Anyhow, the use of classes as indicated above means that object use sites can now be optimized based on the runtime class of the object. Here's a specific example: whenever you access a property in JS running on V8, the access site will be a little stub function using one of four templates: uninitialized, pre-monomorphic, fast-monomorphic, and megamorphic. The first time the access site is invoked, the runtime class is inspected and noted, and the stub moves to the pre-monomorphic state. The second time it's accessed, the runtime class is checked against the previous class and, if it's the same, the stub moves to the fast-monomorphic state and is rewritten to be very simple: compare the object's class; if it's as expected, dereference to the object's storage and load the property at a specific offset (the offset is stored with the property in the class, but inlined into the machine code at the access site). If the class wasn't as expected, the stub changes to the megamorphic state instead. Finally, the megamorphic state is the slow path that falls back to hash-based lookup, just like most other JS implementations.

Since arrays in JS are really just objects with particular key patterns, the same approach could be taken, but it wouldn't necessarily be fast. Apparently the V8 folks discovered that a lot of artificial JS benchmarks revolve around array manipulation, so they put a little work into this, but they're not finished in this area (as far as I could make out). In any case, array access strategy is governed by a heuristic: for small, packed arrays accessed with integers, a direct lookup can be made; for larger, sparse arrays, the property access mechanism is used.

Variable capture in closures (a topic dear to my heart at this point) also works along similar lines to property creation, from what Kevin described. There is a caveat: if you are using the "with" statement in JavaScript, or there is an eval() between the variable definition and the access site in your closure, then the access can't be optimized; these constructs fall back to the hash-based lookup, because they effectively need to use dynamic scope. So, don't create closures in that way unless you're prepared for the speed bump.

The majority of the JS standard library, things like array.join etc., are implemented in JavaScript itself in V8. They're using a special %AddProperty function to make them non-enumerable. V8 also uses a freeze-dried heap mechanism to reduce the cost of initializing the standard library. It can essentially save and restore the heap, with appropriate relocations etc. as required, so that it doesn't have to reparse everything on startup.

The garbage collector in V8 looks less interesting as a source of performance wins. It's almost certainly an improvement on what other JS implementations are using for GC, but I think it's some way from the last word on the topic. It has only 2 generations, so intermediate live objects that get promoted when collecting new-space will eventually force a costly old-space collection. Kevin didn't say whether they were using write barriers to reduce the need to scan old-space looking for pointers into new-space, but grepping the V8 sources turns up some write-barrier hits, so maybe they are. V8's GC is definitely better than reference counting as implemented in IE of older days, of course.

Since the main approach taken thus far has been to give values a class and use that to optimize property access, there is still a lot of scope for optimization in V8. I wouldn't be surprised to see significant (2-5x) performance improvements in V8 in the not too distant future, as more techniques are integrated. They're currently going straight from the JS AST to machine code, with no inlining of the aforementioned property access stubs (AFAICT - there was a 'ret' at the end of the demo access site). The main thing is that (a) they have objects pinned down to types now, and (b) hopefully, as JS developers learn how to make code run fast under this paradigm, objects will look even more type-ful and thus increase the scope for other optimizations.

Thursday, September 11, 2008

Exogenous Exceptions (oh and another Vista rant :)

Eric Lippert has just posted an entry about "vexing exceptions", talking about the various buckets he classifies exceptions into and different approaches for handling them:

Vexing exceptions are the result of unfortunate design decisions. Vexing exceptions are thrown in a completely non-exceptional circumstance, and therefore must be caught and handled all the time.

[...]

The classic example of a vexing exception is Int32.Parse, which throws if you give it a string that cannot be parsed as an integer. But the 99% use case for this method is transforming strings input by the user, which could be any old thing, and therefore it is in no way exceptional for the parse to fail.

[...]

And finally, exogenous exceptions appear to be somewhat like vexing exceptions except that they are not the result of unfortunate design choices. Rather, they are the result of untidy external realities impinging upon your beautiful, crisp program logic. [Example of file-not-found follows, and points out that File.Exists check would only be a race.]

However, I strongly disagree with his suggestion that you try to catch exogenous exceptions, and somewhat disagree about the "vexing" exceptions (there, it's probably just the specific example he chose that I don't really agree with).

In my opinion, if an exception represents an error by the user, it should in general be propagated all the way to a user-visible dialog box or notification message. So, for example, if a user enters a floating-point value or a string in a box that should semantically be an integer, it is fine for an exception from Int32.Parse to bubble its way back into the user's view - so long as the message in the exception is meaningful to the user, and is written in the correct language / jargon etc. If it isn't, then certainly the exception should be wrapped in another exception and re-propagated, but it shouldn't simply be caught and swallowed.

Of course, if there is valid logic for a failure case that doesn't simply mean telling the user about a problem in their input / what they're trying to do, then that's a situation where TryParse etc. makes sense.
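In Delphi terms - StrToInt raises EConvertError, TryStrToInt is the non-throwing variant, and both live in SysUtils - the distinction looks something like this sketch (GetQuantity and GetQuantityOrDefault are made-up names):

// Case 1: the failure is a user error, so let an exception carry a
// user-meaningful message all the way up to whatever displays it.
function GetQuantity(const UserInput: string): Integer;
begin
  try
    Result := StrToInt(UserInput);
  except
    on E: EConvertError do
      // Wrap and re-raise with a message in the user's language / jargon;
      // don't just swallow it.
      raise EConvertError.CreateFmt(
        '"%s" is not a whole number; please enter a quantity like 3 or 12.',
        [UserInput]);
  end;
end;

// Case 2: there is genuine logic for the failure case (fall back to a
// default), so the non-throwing TryStrToInt is the right tool.
function GetQuantityOrDefault(const UserInput: string; Default: Integer): Integer;
begin
  if not TryStrToInt(UserInput, Result) then
    Result := Default;
end;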

On the exogenous question, catching these is extremely problematic, because when the user eventually sees the message, they'll need to figure out what went wrong. So, if you have a file access problem, the very first exception raised - the exogenous one when the CreateFile call at the heart of things failed - should have a message associated with it indicating that such-and-such file couldn't be accessed. This is the same message that should eventually be propagated to the user, either in an event log, a dialog box, or other notification mechanism, particularly if the high-level attempted action was user-initiated.

To do otherwise leaves the user stranded with a generic, non-specific (and thus meaningless) "I couldn't do something" message. I've seen far too much MS software that gives non-specific, non-actionable error messages in failure situations, so this kind of advice really annoys me unless it is very carefully qualified.

If anything, exceptions should be wrapped rather than caught, with higher-level semantic information wrapping the underlying reason. So, if you're trying to e.g. modify a record in a data entry application, the chain of failure might be "couldn't modify record because" -> "couldn't contact database for locking because" -> "couldn't connect to database server because" -> "remote server name FooBar could not be found". This kind of message has information about every level of the stack, and should a user have to e.g. contact IT, the full message (the technical details can be hidden in a pop-out dialog etc.) is 100% actionable, and even regular users, let alone power users, may find it actionable.
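Here's a hedged sketch of what that wrapping might look like in Delphi. The database routine is purely hypothetical, and I'm keeping the chain in the message text; if your RTL version has Exception.InnerException / RaiseOuterException, you can preserve the original exception object as well rather than just its message:

// Hypothetical lower-level routine; imagine it talking to the database.
procedure LockRecordInDatabase(const RecordId: string);
begin
  raise Exception.Create('remote server name FooBar could not be found');
end;

// Each layer catches, prepends its own context, and re-raises, so whoever
// finally displays the exception has the whole "couldn't X because" chain.
procedure ModifyRecord(const RecordId: string);
begin
  try
    LockRecordInDatabase(RecordId);
  except
    on E: Exception do
      raise Exception.CreateFmt('Couldn''t modify record %s because: %s',
        [RecordId, E.Message]);
  end;
end;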

Software does not have AI-level capabilities, and is very far from it. Describing what went wrong is 100x more useful than presenting something vaguely actionable. Unless the error case is very common - and thus you are very certain what the fix is - you should not present "actionable" advice instead of describing what went wrong. To give good actionable advice in general, you need to embed an expert system; and without data to populate it, such an expert system amounts to IT-support-level AI, which, as I said, isn't happening any time soon.

Here's a specific example that really burned me just the other day. Vista Ultimate has a full PC backup capability. I found out that my main HD is failing (SMART alert). I wanted to restore that backup onto a different disk of the same make and size (actually, right down to consecutive serial numbers). However, my machine is rather complex - there are 7 SATA disk devices in the machine. The Vista OS DVD failed to restore the backup to my perfectly-matched disk, but I have no idea why. All I do know is that the error message was "vaguely actionable":

Error details: There are too few disks on this computer or one or more of the disks is too small. Add or change disks so they match the disks in the backup and try the restore again. (0x80042401)

This message is completely and utterly useless, because it does not describe what went wrong, only "how" to fix it - but because the software isn't AI-level and doesn't include an expert system, it itself can't produce a specific set of instructions.

In this machine, I had 4x1TB disks, 1x400GB disk, and 1x200GB disk; the backup was on the 400GB disk and the target of the restore was the 200GB disk. 2 of the 1TB disks were blank and formatted. Thus, there was no lack of disks or space. Similarly, the target of the restore was at Disk 0, achieved through careful selection of the SATA connection on the motherboard. Still didn't work though, and I can't fix it because the error message is following the wrong philosophy for our current knowledge of AI.

FWIW, here's another user's experience with this problem. Notice the procedure required just to surface the actual error, rather than the useless message:

  • Boot from the Vista DVD
  • Go to Repair Computer -> Command Prompt
  • Go into Regedit
  • Under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\
  • Create the key: Asr
  • Under Asr create the key LogFileSetting
  • Under LogFileSetting create the dword EnableLogging with the value 1
  • Under LogFileSetting create the string LogPathName with a value such as D:\Asr.log - specify a physical drive (e.g. the drive you are restoring from), not the ramdrive (X:), so that the log survives the reboot
  • Exit Regedit
  • From the Repair menu launch Complete PC Restore
  • Attempt the Complete PC Restore
  • When you get the error, check the logging path to be sure the Asr.log file exists. I did that by going back to the Command Prompt and getting a directory listing before rebooting.

This is disgraceful, and frankly, unforgivable. Don't do this with your exogenous exceptions.

Tuesday, September 09, 2008

Vista LanmanServer falling apart on you?

I've been having a lot of trouble over the past day or so with my Vista machine. Since I can't tolerate it for long as a main desktop, it's primarily acting as a file and print server.

Over the past day, however, the Vista network sharing service, LanmanServer, has been falling over on me. All attempts to connect to Vista shares fail with error 1130, aka ERROR_NOT_ENOUGH_SERVER_MEMORY, and the event log on Vista is filled with 2017 (primarily) and 2021 Event IDs:

2017: The server was unable to allocate from the system nonpaged pool because the server reached the configured limit for nonpaged pool allocations.

2021: The server was unable to allocate a work item %d times in the last 60 seconds.

Fixing it required stopping Computer Browser (net stop browser) and Server (net stop server) and restarting them (net start etc.) on the Vista machine.

Other people have been having this problem too; here's a relevant thread where you can read up about these frustrations:

[...] Vista Ultimate file/print server that is accessed by various Vista and XP client workstations. Recently - and increasingly - the XP machines have been spontaneously losing their connections to the Vista file server. This started out as a minor annoyance about a month ago, but has escalated to the point where the XP clients cannot stay connected for more than 5-30 minutes[...]

Anyhow, I've found a solution that appears to work, linked to at the end of the previous thread (which makes me suspect it fixed the problem for those folks too).

Thursday, September 04, 2008

Smart pointers in Delphi

Strongly-typed smart pointers are now possible in Delphi, leveraging the new support for generics and anonymous methods. Here's a simple smart pointer type:

  TSmartPointer<T: class> = record
  strict private
    FValue: T;
    FLifetime: IInterface;
  public
    constructor Create(const AValue: T); overload;
    class operator Implicit(const AValue: T): TSmartPointer<T>;
    property Value: T read FValue;
  end;

Here it is in action, where TLifetimeWatcher is a little class that executes some code when it dies:

procedure UseIt;
var
  x: TSmartPointer<TLifetimeWatcher>;
begin
  x := TLifetimeWatcher.Create(procedure
  begin
    Writeln('I died.');
  end);
end;

Here's the full project code that defines TSmartPointer<>, TLifetimeWatcher, and the above test routine:

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TLifetimeWatcher = class(TInterfacedObject)
  private
    FWhenDone: TProc;
  public
    constructor Create(const AWhenDone: TProc);
    destructor Destroy; override;
  end;

{ TLifetimeWatcher }

constructor TLifetimeWatcher.Create(const AWhenDone: TProc);
begin
  FWhenDone := AWhenDone;
end;

destructor TLifetimeWatcher.Destroy;
begin
  if Assigned(FWhenDone) then
    FWhenDone;
  inherited;
end;

type
  TSmartPointer<T: class> = record
  strict private
    FValue: T;
    FLifetime: IInterface;
  public
    constructor Create(const AValue: T); overload;
    class operator Implicit(const AValue: T): TSmartPointer<T>;
    property Value: T read FValue;
  end;

{ TSmartPointer<T> }

constructor TSmartPointer<T>.Create(const AValue: T);
begin
  FValue := AValue;
  // The watcher is held through an interface reference, so it is
  // reference-counted along with every copy of this record; when the last
  // copy goes out of scope, the watcher dies and frees the wrapped object.
  FLifetime := TLifetimeWatcher.Create(procedure
  begin
    AValue.Free;
  end);
end;

class operator TSmartPointer<T>.Implicit(const AValue: T): TSmartPointer<T>;
begin
  Result := TSmartPointer<T>.Create(AValue);
end;

procedure UseIt;
var
  x: TSmartPointer<TLifetimeWatcher>;
begin
  x := TLifetimeWatcher.Create(procedure
  begin
    Writeln('I died.');
  end);
end;

begin
  try
    UseIt;
    Readln;
  except
    on E:Exception do
      Writeln(E.Classname, ': ', E.Message);
  end;
end.

Update: Fixed capture - was capturing a location in the structure, which of course will be freely copied etc.
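For a more everyday flavour, here's a sketch of using the same TSmartPointer<> with an arbitrary class (TStringList, picked purely as an example; it needs Classes in the uses clause):

procedure SmartListDemo;
var
  sl: TSmartPointer<TStringList>;
begin
  // The Implicit operator wraps the freshly created list in a smart pointer.
  sl := TStringList.Create;
  sl.Value.Add('Hello');
  sl.Value.Add('World');
  Writeln(sl.Value.CommaText);
  // When sl goes out of scope, FLifetime's reference count drops to zero,
  // its TLifetimeWatcher is destroyed, and the watcher frees the TStringList.
end;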