Tuesday, January 13, 2009

Implementing user-defined copy-on-write data structures in Delphi

There was some discussion about user-defined class operators in the Delphi non-technical forum. Some folks seem to be asking for operators for heap-allocated classes, but in the absence of either GC or reference counting, these are problematic.

I suggested that records combined with interfaces can get much of the benefit of both worlds: user-defined operators on the record, life-time management via interface reference counting, and arbitrary data sizing through the interface being a heap reference.

To demonstrate the idea through example, I have written up a quick copy-on-write array type that has a user-defined '+' operator that concatenates two lists. Let me build it up from the bottom. First, I've defined a generic dynamic array type to avoid the pitfalls of nominal typing that Delphi has inherited from Pascal (a common newbie mistake, particularly with the syntactic ambiguity with open arrays in parameter lists):

type
  TArray<T> = array of T;

Next up is the interface. Using an interface for communicating between the record and the heap value means that I get reference counting, and thus lifetime management, for free.

  ICowArrayData<T> = interface
    function GetLength: Integer;
    function MutableClone: ICowArrayData<T>;
    function GetItem(Index: Integer): T;
    procedure SetItem(Index: Integer; const Value: T);
    function ToArray: TArray<T>;
  end;

  TCowArrayData<T> = class(TInterfacedObject, ICowArrayData<T>)
  private
    FData: TArray<T>;
  public
    constructor Create(const Data: TArray<T>);
    function GetLength: Integer;
    function MutableClone: ICowArrayData<T>;
    function GetItem(Index: Integer): T;
    procedure SetItem(Index: Integer; const Value: T);
    function ToArray: TArray<T>;
  end;

The only "interesting" method really is MutableClone, which I'll be calling to verify that I have a unique instance before I perform any mutating operations. This is necessary to get copy-on-write behaviour. The implementation class doesn't add anything beyond a constructor, as all communication will be through the interface.

Here's the actual record, and the only type that the user of this bundle should concern themselves with:

  TCowArray<T> = record
  private
    FData: ICowArrayData<T>;
    function GetItems(Index: Integer): T;
    procedure SetItems(Index: Integer; const Value: T);
    function GetLength: Integer;
  public
    constructor Create(const Data: TArray<T>); overload;
    constructor Create(const Data: array of T); overload;
    property Items[Index: Integer]: T read GetItems write SetItems; default;
    property Length: Integer read GetLength;
    function ToArray: TArray<T>;

    class operator Add(const Left, Right: TCowArray<T>): TCowArray<T>;
  end;

It adds a nice default Items property so that values can be indexed like arrays, exposes a Length property much like .NET, and adds in the Add operator to prove the point.

The implementation of most the methods are pretty routine, but a few are important. First up, because interfaces are managed types, they get zero-initialized by default. Thus, we can't expect the record's FData field to be initialized. We can take a leaf out of Delphi's built-in array (and string) types, and let zero-length arrays be represented by a nil FData. That affects the implementation of the record methods, e.g.:

function TCowArray<T>.GetLength: Integer;
begin
  if FData = nil then
    Exit(0); // nil => zero-length
  Result := FData.GetLength;
end;

Another important method is the implementation of the copy-on-write itself. The only mutating method on this type is SetItems, and here it is:

procedure TCowArray<T>.SetItems(Index: Integer; const Value: T);
begin
  FData := FData.MutableClone;
  FData.SetItem(Index, Value);
end;

This isn't particularly efficient, but it does make sure that we have a unique reference when required. The implementation of MutableClone is simple enough too:

function TCowArrayData<T>.MutableClone: ICowArrayData<T>;
begin
  if RefCount = 1 then
    Exit(Self);
  Result := TCowArrayData<T>.Create(ToArray);
end;

Using the inherited RefCount from TInterfacedObject, we can determine safely if we are only referred to by our caller. In a race scenario, where RefCount might e.g. spuriously be 2, there won't be any correctness problem if we duplicate the array anyway.

The implementation of the '+' operator is pretty trivial too:

class operator TCowArray<T>.Add(const Left, Right: TCowArray<T>): TCowArray<T>;
var
  resultArr: TArray<T>;
  i: Integer;
begin
  SetLength(resultArr, Left.Length + Right.Length);
  for i := 0 to Left.Length - 1 do
    resultArr[i] := Left[i];
  for i := 0 to Right.Length - 1 do
    resultArr[Left.Length + i] := Right[i];
  Result := TCowArray<T>.Create(resultArr);
end;

Finally, let's look at the program body itself, testing these types:

procedure WriteArray(const Msg: string; Arr: TCowArray<Integer>);
var
  i: Integer;
begin
  Write(Msg, ':');
  for i := 0 to Arr.Length - 1 do
    Write(' ', Arr[i]);
  Writeln;
end;

var
  x, y: TCowArray<Integer>;
begin
  try
    x := TCowArray<Integer>.Create([1, 2, 3]);
    y := x;

    Writeln('Starting out, both x and y refer to same instance data');
    WriteArray('x', x);
    WriteArray('y', y);

    Writeln('Modifying x; note that y doesn''t change:');
    x[1] := 42;
    WriteArray('x', x);
    WriteArray('y', y);

    // Add operator as concatenation
    Writeln('Concatenation:');
    y := x + y;
    WriteArray('x', x);
    WriteArray('y', y);
    
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

The output does indeed show that the array is copied on writes, and that concatenation works as expected:

Starting out, both x and y refer to same instance data
x: 1 2 3
y: 1 2 3
Modifying x; note that y doesn't change:
x: 1 42 3
y: 1 2 3
Concatenation:
x: 1 42 3
y: 1 42 3 1 2 3

As an addendum, let me add the full source for reference. Note that due to some bugs in the current compiler's generics implementation for dynamic arrays, the code won't work if it's part of a unit - it needs to be in a single whole for now, unfortunately.

{$apptype console}
uses SysUtils;

type
  TArray<T> = array of T;
  
  ICowArrayData<T> = interface
    function GetLength: Integer;
    function MutableClone: ICowArrayData<T>;
    function GetItem(Index: Integer): T;
    procedure SetItem(Index: Integer; const Value: T);
    function ToArray: TArray<T>;
  end;
  
  TCowArrayData<T> = class(TInterfacedObject, ICowArrayData<T>)
  private
    FData: TArray<T>;
  public
    constructor Create(const Data: TArray<T>);
    function GetLength: Integer;
    function MutableClone: ICowArrayData<T>;
    function GetItem(Index: Integer): T;
    procedure SetItem(Index: Integer; const Value: T);
    function ToArray: TArray<T>;
  end;
  
  TCowArray<T> = record
  private
    FData: ICowArrayData<T>;
    function GetItems(Index: Integer): T;
    procedure SetItems(Index: Integer; const Value: T);
    function GetLength: Integer;
  public
    constructor Create(const Data: TArray<T>); overload;
    constructor Create(const Data: array of T); overload;
    property Items[Index: Integer]: T read GetItems write SetItems; default;
    property Length: Integer read GetLength;
    function ToArray: TArray<T>;

    class operator Add(const Left, Right: TCowArray<T>): TCowArray<T>;
  end;

{ TCowArray<T> }

class operator TCowArray<T>.Add(const Left, Right: TCowArray<T>): TCowArray<T>;
var
  resultArr: TArray<T>;
  i: Integer;
begin
  SetLength(resultArr, Left.Length + Right.Length);
  for i := 0 to Left.Length - 1 do
    resultArr[i] := Left[i];
  for i := 0 to Right.Length - 1 do
    resultArr[Left.Length + i] := Right[i];
  Result := TCowArray<T>.Create(resultArr);
end;

constructor TCowArray<T>.Create(const Data: TArray<T>);
begin
  if Data = nil then
    FData := nil
  else
    FData := TCowArrayData<T>.Create(Data);
end;

constructor TCowArray<T>.Create(const Data: array of T);
var
  arr: TArray<T>;
  i: Integer;
begin
  if System.Length(Data) = 0 then
    FData := nil
  else
  begin
    SetLength(arr, System.Length(Data));
    for i := 0 to System.Length(Data) - 1 do
      arr[i] := Data[i];
    FData := TCowArrayData<T>.Create(arr);
  end;
end;

function TCowArray<T>.GetItems(Index: Integer): T;
begin
  Result := FData.GetItem(Index);
end;

function TCowArray<T>.GetLength: Integer;
begin
  if FData = nil then
    Exit(0);
  Result := FData.GetLength;
end;

procedure TCowArray<T>.SetItems(Index: Integer; const Value: T);
begin
  FData := FData.MutableClone;
  FData.SetItem(Index, Value);
end;

function TCowArray<T>.ToArray: TArray<T>;
begin
  if FData = nil then
    Exit(nil);
  Result := FData.ToArray;
end;

{ TCowArrayData<T> }

constructor TCowArrayData<T>.Create(const Data: TArray<T>);
begin
  FData := Data;
end;

function TCowArrayData<T>.GetItem(Index: Integer): T;
begin
  Result := FData[Index];
end;

function TCowArrayData<T>.GetLength: Integer;
begin
  Result := Length(FData);
end;

function TCowArrayData<T>.MutableClone: ICowArrayData<T>;
begin
  if RefCount = 1 then
    Exit(Self);
  Result := TCowArrayData<T>.Create(ToArray);
end;

procedure TCowArrayData<T>.SetItem(Index: Integer; const Value: T);
begin
  FData[Index] := Value;
end;

function TCowArrayData<T>.ToArray: TArray<T>;
var
  i: Integer;
begin
  SetLength(Result, Length(FData));
  for i := 0 to Length(FData) - 1 do
    Result[i] := FData[i];
end;

procedure WriteArray(const Msg: string; Arr: TCowArray<Integer>);
var
  i: Integer;
begin
  Write(Msg, ':');
  for i := 0 to Arr.Length - 1 do
    Write(' ', Arr[i]);
  Writeln;
end;

var
  x, y: TCowArray<Integer>;
begin
  try
    x := TCowArray<Integer>.Create([1, 2, 3]);
    y := x;

    Writeln('Starting out, both x and y refer to same instance data');
    WriteArray('x', x);
    WriteArray('y', y);

    Writeln('Modifying x; note that y doesn''t change:');
    x[1] := 42;
    WriteArray('x', x);
    WriteArray('y', y);

    // Add operator as concatenation
    Writeln('Concatenation:');
    y := x + y;
    WriteArray('x', x);
    WriteArray('y', y);
    
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

4 comments:

Craig said...

I think this:

if RefCount = 1 then
Result := Self;
Result :=

...is a bug. Should be:


if RefCount = 1 then
Exit(Self);
Exit(

Barry Kelly said...

Yes, well caught. This is what you get when you keep context-switching between return in C and Result in Pascal...

Unknown said...

"...Note that due to some bugs in the current compiler's generics implementation for dynamic arrays, the code won't work if it's part of a unit - it needs to be in a single whole for now ..."

I faced this very same problem when trying every serious usage of generics with Delphi 2009...
Are those bugs already fixed internally ? If so will the bugfix be delivered soon (either separately of in the next service pack for D2009) ?

Barry Kelly said...

Vincent - some bugs are fixed, but there are many still remaining, unfortunately.