Saturday, November 01, 2008

Reference-counted pointers, revisited

Some time ago, I blogged about writing smart pointers (i.e. reference-counted auto-destruction) in Delphi. While having dinner with some of the speakers at the EKON 12 conference I attended last week, a more fluent interface for using smart pointers in Delphi occurred to me.

I'm using the same TSmartPointer<T> class that I started out with in the previous article, though I've renamed it TObjectHandle<T>. The main tricks I'm pointing out here are (1) to use method references to avoid having to use the Value property all the time, and (2) to use aliases at the point of class definition to make construction slightly more palatable.

So, here's my new TObjectHandle<T> class; the main change, apart from the name, is a new method called Wrap:

type
  TObjectHandle<T: class> = record
  private
    FValue: T;
    FLifetimeWatcher: IInterface;
  public
    constructor Create(const AValue: T);
    property Value: T read FValue;
    class operator Implicit(const AValue: T): TObjectHandle<T>;
    class function Wrap(const AValue: T): TFunc<T>; static;
  end;

The implementation of the new method is pretty simple too:

class function TObjectHandle<T>.Wrap(const AValue: T): TFunc<T>;
var
  h: TObjectHandle<T>;
begin
  h := AValue;
  Result := function: T
  begin
    Result := h.Value;
  end;
end;

The capture of the h local variable will mean that the handle will be kept alive as long as the method reference constructed from the anonymous method is kept alive.
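
As a minimal sketch of that lifetime rule, assuming the ObjHandle unit listed in full later in this post: the TNoisy class and the MakeNoisy function below are hypothetical names introduced only for illustration. The object stays alive while at least one copy of the method reference exists, and becomes eligible for destruction once every copy has been released.

```delphi
program CaptureLifetimeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, ObjHandle; // ObjHandle: the unit shown later in this post

type
  // Hypothetical helper class; it only makes destruction visible.
  TNoisy = class
  public
    destructor Destroy; override;
  end;

destructor TNoisy.Destroy;
begin
  Writeln('TNoisy destroyed');
  inherited;
end;

// Hypothetical factory: the handle captured inside Wrap travels
// with the method reference that is returned to the caller.
function MakeNoisy: TFunc<TNoisy>;
begin
  Result := TObjectHandle<TNoisy>.Wrap(TNoisy.Create);
end;

procedure Run;
var
  first, second: TFunc<TNoisy>;
begin
  first := MakeNoisy();  // explicit call, so the intent is unambiguous
  second := first;       // a second reference to the same closure
  first := nil;          // the closure is still referenced by "second"
  Writeln('first reference released');
  second := nil;         // no references remain, so the object can be freed
  Writeln('second reference released');
end;

begin
  Run;
end.
```

The exact point at which the destructor message appears depends on when the compiler releases the last reference (including any hidden temporaries), so the sketch makes no claim about the precise order of the output lines.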

Here it is in use, as two versions, so that the usage difference can be seen. This is also where the additional lubrication of declaring aliases comes in. I start out with a little TCanary class which can keep track of destruction, and has a Name property to demo the fluency of the technique:

type
  TCanary = class
  private
    FName: string;
  public
    destructor Destroy; override;
    property Name: string read FName write FName;
  end;
  
  OHCanary = TObjectHandle<TCanary>;
  HCanary = TFunc<TCanary>;

The destructor prints out the name of the canary when it is destroyed. The two aliases represent an Object Handle for TCanary and a Handle for TCanary respectively. The fluent technique relies on both; the second is used for smart pointer locations and the first for smart pointer construction. There is a tradeoff involved in the technique, between construction fluency and usage fluency:

procedure Test1;
var
  canary: OHCanary;
begin
  // easy construction (implicit operator)
  canary := TCanary.Create;
  // but cumbersome access - always need Value accessor
  canary.Value.Name := 'Test1 canary';
end;

The new style has slightly worse construction, but better actual use:

procedure Test2;
var
  canary: HCanary;
begin
  // cumbersome constructor
  canary := OHCanary.Wrap(TCanary.Create);
  // but much nicer access
  canary.Name := 'Test2 canary';
end;

Without having to access everything by prefixing every access with .Value, a lot of fluency is gained, IMHO.
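
The same handle type can also be stored in fields or passed to other routines. The following is a rough sketch, again assuming the ObjHandle unit from this post; TCage is a hypothetical class that is not part of the original code. It keeps a copy of the method reference in a field, so the canary remains alive after the local variable is cleared.

```delphi
program SharedCanaryDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, ObjHandle;

type
  TCanary = class
  private
    FName: string;
  public
    destructor Destroy; override;
    property Name: string read FName write FName;
  end;

  OHCanary = TObjectHandle<TCanary>;
  HCanary = TFunc<TCanary>;

  // Hypothetical holder class (an assumption for this sketch).
  TCage = class
  private
    FOccupant: HCanary;
  public
    constructor Create(const AOccupant: HCanary);
    procedure Announce;
  end;

destructor TCanary.Destroy;
begin
  Writeln(FName, ' died.');
  inherited;
end;

constructor TCage.Create(const AOccupant: HCanary);
begin
  inherited Create;
  FOccupant := AOccupant; // copies the method reference, not the object
end;

procedure TCage.Announce;
begin
  // Same fluent access as with a local variable: no .Value needed.
  Writeln('Cage holds ', FOccupant.Name);
end;

procedure Run;
var
  canary: HCanary;
  cage: TCage;
begin
  canary := OHCanary.Wrap(TCanary.Create);
  canary.Name := 'Shared canary';
  cage := TCage.Create(canary);
  try
    canary := nil;  // the cage's copy still keeps the canary alive
    cage.Announce;
  finally
    cage.Free;      // destroying the cage releases its copy of the reference
  end;
end;

begin
  Run;
end.
```

Note that TCage itself is still an ordinary object freed by hand; only the TCanary instance is managed by the handle.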

To summarize, here's the entire ObjHandle.pas unit:

unit ObjHandle;

interface

uses SysUtils;

type
  TObjectHandle<T: class> = record
  private
    FValue: T;
    FLifetimeWatcher: IInterface;
  public
    constructor Create(const AValue: T);
    property Value: T read FValue;
    class operator Implicit(const AValue: T): TObjectHandle<T>;
    class function Wrap(const AValue: T): TFunc<T>; static;
  end;
  
  TObjectHandleArray<T: class> = array of TObjectHandle<T>;

procedure MakeDestroyer(Obj: TObject; out Result: IInterface);

implementation

{ TLifetimeWatcher }

type
  TLifetimeWatcher = class(TInterfacedObject)
  private
    FProc: TProc;
  public
    constructor Create(const AProc: TProc);
    destructor Destroy; override;
  end;

constructor TLifetimeWatcher.Create(const AProc: TProc);
begin
  FProc := AProc;
end;

destructor TLifetimeWatcher.Destroy;
begin
  if Assigned(FProc) then
    FProc;
  inherited;
end;

procedure MakeLifetimeWatcher(out Result: IInterface; const AProc: TProc);
begin
  Result := TLifetimeWatcher.Create(AProc);
end;
  
procedure MakeDestroyer(Obj: TObject; out Result: IInterface);
begin
  Result := TLifetimeWatcher.Create(procedure
    begin
      Obj.Free;
    end);
end;

{ TObjectHandle<T> }

constructor TObjectHandle<T>.Create(const AValue: T);
begin
  FValue := AValue;
  MakeDestroyer(FValue, FLifetimeWatcher);
end;

class operator TObjectHandle<T>.Implicit(const AValue: T): TObjectHandle<T>;
begin
  Result := TObjectHandle<T>.Create(AValue);
end;

class function TObjectHandle<T>.Wrap(const AValue: T): TFunc<T>;
var
  h: TObjectHandle<T>;
begin
  h := AValue;
  Result := function: T
  begin
    Result := h.Value;
  end;
end;

end.

And here's the entire demo program:

{$apptype console}

uses SysUtils, ObjHandle;

type
  TCanary = class
  private
    FName: string;
  public
    destructor Destroy; override;
    property Name: string read FName write FName;
  end;
  
  OHCanary = TObjectHandle<TCanary>;
  HCanary = TFunc<TCanary>;

destructor TCanary.Destroy;
begin
  Writeln(FName, ' died.');
  inherited;
end;

procedure Test1;
var
  canary: OHCanary;
begin
  // easy construction (implicit operator)
  canary := TCanary.Create;
  // but cumbersome access - always need Value accessor
  canary.Value.Name := 'Test1 canary';
end;

procedure Test2;
var
  canary: HCanary;
begin
  // cumbersome constructor
  canary := OHCanary.Wrap(TCanary.Create);
  // but much nicer access
  canary.Name := 'Test2 canary';
end;

begin
  Test1;
  Test2;
end.

15 comments:

Anonymous said...

The usage of the "variable" is now easier, but at what cost? Anonymous methods are implemented with interfaces, so a simple field-property access is now an interface method call. This makes it impossible for the compiler to optimize the code, and it needs a CALL and a JMP (not to mention the extra memory accesses).

Barry Kelly said...

Andreas,

Yes, the access is slightly slower and less cache friendly, but these points are pretty much irrelevant to a smart pointer.

The architectural benefits of being able to pass around references to objects without having to negotiate, document and maintain an ownership contract are very large.

Without garbage collection in Delphi, smart pointers are the easiest way to selectively implement this architectural approach without the inflexibility of e.g. a zoned heap solution.

Hadi Hariri said...

"The architectural benefits of being able to pass around references to objects without having to negotiate, document and maintain an ownership contract are very large."

Which makes dependency injection so much more attractive and doable now.

Sergey Antonov aka oxffff said...

Interesting, Barry.
A big chain of responsibilities.
I have only just learned that the whole record is captured, not only the guard field Value.

Sergey Antonov aka oxffff said...

'The architectural benefits of being able to pass around references to objects without having to negotiate, document and maintain an ownership contract are very large'

Barry,
maybe, or maybe not.

What about the following sample, with correct syntax?

procedure Test2;
var
  canary: HCanary;
  WeakRef: TCanary;
begin
  // cumbersome constructor
  canary := OHCanary.Wrap(TCanary.Create);
  // but much nicer access
  canary.Name := 'Test2 canary';
  WeakRef := canary;
  canary := nil;
  // The next line may result in an AV under some
  // circumstances. Or may not.
  WeakRef.DoSomething;
end;

This sample is of course very far-fetched.
But from the user's point of view it is still valid.

So even the 3 extra unmanaged heap objects cannot guard against valid user actions.

The only way is to control all of the copy operations on heap object references. So no GC, no guarantee.

Sergey Antonov aka oxffff said...

Barry,
I found a more aggressive sample that is still valid from the user's point of view.

canary: HCanary;
canary2: HCanary;
begin
canary := OHCanary.Wrap(TCanary.Create);
canary2:=canary;
canary2:=OHCanary.Wrap(canary);
canary:=nil;

canary2.DoSomething; // Op on invalid object
canary2:=nil; // Tries to free an already freed object.

Barry Kelly said...

@sergey
- The local variable is the only thing that is captured. The field access is just applied to the captured local; field accesses themselves are not captured, only the root symbol of the expression (which may be an implicit "Self").

- On your second point, yes, Delphi is not a memory-safe language, and I've never claimed it to be; it cannot be without full GC and abandoning raw pointers (and much more). Some programmer intelligence required; maybe we can throw SysUtils.EProgrammerNotFound for this case.

- On your third point, yes, if you wrap the same object twice with different handles, you'll free it twice, with concomitant bad results. Don't use any of this if you don't understand what's going on.

Sergey Antonov aka oxffff said...

@Barry

Perfect. Since the anonymous method has reference-counting semantics and is used by the outer consumer, there is no need for the inner shared TSmartPointerController.
So we can remove 2 extra objects.

So from Russia with love

TInjectType "T"=record
VMT:pointer;
unknown:Iunknown;
AValue:T;
end;

TSmartPointerFromRussia "T: class"=class
class function Wrap(const AValue: T): TFunc"T"; static;
end;

function Trick_Release(var obj): Integer; stdcall;

const PSEUDO_VMT:array [0..2] of pointer=(nil,nil,@trick_Release);

function Trick_Release(var obj): Integer; stdcall;
type
TypeA=record
VMT:pointer;
unknown:Iunknown;
AValue:Tobject;
end;
begin
TypeA(obj).AValue.Free;
end;

{ TSmartPointerFromRussia"T" }

class function TSmartPointerFromRussia"T".Wrap(const AValue: T): TFunc"T";
var
h: TInjectType"T";
begin
pointer(h.unknown):=@h;
h.VMT:=@PSEUDO_VMT;
h.AValue:=AValue;
Result := function: T
begin
Result := h.AValue;
end;
end;

:)

Barry Kelly said...

Sergey
- Indeed, it can be optimized even further. Since method references are just interfaces with an Invoke method, we could actually get rid of the anonymous method altogether.

However, the point of the post is to demo the concept, not to create the ultimate implementation.

For example, consider this:

---8<---
unit ObjHandle2;

interface

uses SysUtils;

type
  TObjectHandle<T: class> = class(TInterfacedObject, TFunc<T>)
  private
    FValue: T;
  public
    constructor Create(AValue: T);
    destructor Destroy; override;
    function Invoke: T;
  end;

implementation

constructor TObjectHandle<T>.Create(AValue: T);
begin
  FValue := AValue;
end;

destructor TObjectHandle<T>.Destroy;
begin
  FValue.Free;
  inherited;
end;

function TObjectHandle<T>.Invoke: T;
begin
  Result := FValue;
end;

end.
--->8---

etc.

Sergey Antonov aka oxffff said...

@Barry

This idea came to me just before I fell asleep after my comments to you, though with a slightly different implementation. Because I don't know all of the available syntax as well as you do, in my implementation I wanted to patch the VMT-4 slot (i.e. Destroy) of the anonymous method object at run time, redirecting it to an external procedure that calls the original one internally.
So we are thinking in the same direction.

Barry Kelly said...

Sergey,

I'm not sure what you mean when you refer to VMT -4 slot (vmtDestroy). I'm not sure patching the class VMT (as distinct from the COM interface vtable) would be a good way to go. Life would be a lot easier if one were to simply hand-encode the vtable directly and abandon an object altogether, since the TFunc<T>.Invoke function will always be returning a simple 32-bit pointer.

Patching compile-time tables at runtime is a bad idea in general anyway, as you will normally have to change page protection with VirtualProtect etc., and you will give up benefits of shared memory for that page for multiple executables / packages / DLLs etc.

Sergey Antonov aka oxffff said...

@barry

procedure MyDestroy(obj:Tobject);
asm
mov eax,[eax+12];
call Tobject.Free;
end;

const DestroySlot:pointer=@MyDestroy;

procedure PatchVMT(obj:Tobject);
asm
mov [eax],offset DestroySlot+4;
end;

class function TSmartPointerFromRussia'T'.Wrap2(const AValue: T): TFunc'T';
begin
PatchVMT(Tobject(dword(@AValue)-12));
Result := function: T
begin
Result := AValue;
end;
end;

Barry Kelly said...

Sergey,

Have you tested your code? Can you provide a complete unit that works as you expect? Note that you can use &lt; to represent < in comments etc. &nbsp; is useful too, for indentation.

The reason I ask is that your second piece of code contains no mechanism that I can see that will free the hidden object on the heap that implements the anonymous method. It appears to be trying to simply directly free the AValue. This will cause a leak, no? You need to free two objects.

I still think it is easier to implement the technique shown in the subsequent post to this one, but using vtables by hand, if that's the kind of efficiency that is required.

Sergey Antonov aka oxffff said...

@barry

Fair criticism.
So, this is a correction.


procedure MyDestroy(obj:Tobject);
asm
push eax;
mov eax,[eax+12];
call Tobject.Free;
//No necessity for finalization, just Freemem
//And there is no valid RTTI for finalization
pop eax;
call System.@Freemem;
end;


I think your solution is more expressive and easier to understand. But!!!


TObjectHandle "T: class" = class(TInterfacedObject, TFunc"T")
private
FValue: T;
public
constructor Create(AValue: T);


The advantage of the anonymous method goes away. The contract for analyzing captured state is delegated to the user, not to the compiler. So all of the deductions are done by the programmer.

Sorry for my text layout. I really don't know the control codes for text formatting.
Today, I am deep in LL and LR analysis.

Please answer a question.
Why was the decision made for
parameterized types in Delphi
to be like .NET generics, not C++ templates?

Is the generated code shared between different instantiations of a generic class with a class-constrained type parameter, as in .NET?

If not, why restrict the use of operators on type parameters in generics, if everything can be deduced at compile time by the compiler?

Generics.Defaults can of course help a little, but
the operands have to be of the same type. Suppose
we need to do something like this:

A+B+C*D/E+1/A+C

where A, B, C, D, E are typed parameters.

So we have to supply all of the operations for every combination of typed parameters that is used. We have to work that out ourselves at compile time.
No help from the compiler.
And what if the expression is more complex?

Barry Kelly said...

Sergey,

There are two answers to your question about generics versus templates:

1) .NET compatibility - it was more important at start of development to keep compatibility with dccil.

2) We don't want C++ issues with obscure errors at instantiation time. C++ 0x is already having to correct this flaw in its design with Concepts. We may use the same thing, perhaps more designed after Haskell type classes, and also use them for type inference.

On your point about "all deductions done by programmer", i.e. compiler not doing enough, well one can't have it both ways. Either there can be a clear explication broken down into simple components (as in this post) or it can be implemented with maximum efficiency by taking advantage of "seeing through the abstractions", with the loss of clarity that causes (as in the subsequent post - as well as your own technique, I might add).