Tuesday, August 08, 2006

Programming language design philosophy: C++'s value orientation

Whenever I think of the C++ programming language, my heart sinks a little. I think the language has a set of features that, taken together, add up to less than the sum of their parts. I had occasion recently to think about what exactly it was about C++ that I didn't like, apart from the cryptic nature of template errors (mainly down to the lack of concepts), the lack of features that a GC environment permits (such as closures), and the ill-defined integral types and strange pointer compatibility rules inherited from historical machine architectures, and the #include / C linker model of modularity, and iostreams, and lack of a consistent, cohesive style for libraries, and lack of agreement on a string type. Come to think of it, there are quite a few things I don't like about C++. Anyway, I only want to deal with one issue here - C++'s value-oriented programming.

To me, there's really one thing about C++ that makes it stand out against all the other popular industrial programming languages, when it comes to basic language features: it's taken the idea of 'user-defined types' about as far as it can go. This goes beyond simple object orientation, or even operator overloading. In particular, to permit library definitions of things like smart, reference-counted pointers (like Boost's shared_ptr) and dynamically sizable arrays and strings (such as std::basic_string and std::vector), there are some key requirements: copy constructors, assignment operator overloading, and destructors, with strong guarantees about when they are automatically called. They're required so that classes which aren't simply blittable can do the necessary bookkeeping. For example, a class might use dynamic allocation in a non-GC environment, so it needs to call new / delete etc. as necessary to maintain correct ownership of the data that its member pointers refer to.
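To make the bookkeeping concrete, here is a minimal sketch of a non-blittable class. IntBuffer and copies_are_independent are names I've made up for illustration; the point is only that the copy constructor, assignment operator and destructor together keep each object the sole owner of its array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical IntBuffer: owns a heap-allocated array, so a bitwise copy
// would leave two objects trying to delete the same memory.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Copy constructor: allocates a fresh array and deep-copies the elements.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Assignment operator (copy-and-swap): the copy is made first, so if the
    // allocation throws, *this is left untouched.
    IntBuffer& operator=(const IntBuffer& other) {
        IntBuffer tmp(other);
        swap(tmp);
        return *this;
    }

    // Destructor: releases the array this object owns.
    ~IntBuffer() { delete[] data_; }

    void swap(IntBuffer& other) {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};

// Returns true if copies made by construction and by assignment do not
// share storage with their source.
bool copies_are_independent() {
    IntBuffer a(3);
    a[0] = 1;
    IntBuffer b(a);  // copy constructor
    b[0] = 99;
    IntBuffer c(1);
    c = b;           // assignment operator
    c[0] = 7;
    return a[0] == 1 && b[0] == 99 && c[0] == 7 && c.size() == 3;
}
```

Leave out any one of the three and the class still compiles, which is a good part of what makes value types hard to get right.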

Now, those features, in isolation, aren't too bad. If they didn't exist, the user would have to do all the tedious, repetitious bookkeeping themselves, and that would be error prone. Throwing exceptions into the mix is what really causes problems. It was Herb Sutter's Exceptional C++, Item 10 which opened my eyes to the big problem - referring to writing a Pop() method on a Stack. When writing a function that returns a value of a user-defined type (such as a member function of a template class, or a template function), that function can't protect against an exception being thrown in the copy constructor. If it can't protect against an exception, then it can't be strongly exception-safe if it also modifies data, because it's not possible to roll back when an exception is thrown in the copy constructor. All this is one of the reasons that C++'s std::stack::pop() doesn't return the popped value.
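Here is a rough sketch of the problem, not Sutter's own code. Fragile, Stack, pop_value and the two helper functions are hypothetical names of mine. To make the failure deterministic regardless of whether the compiler elides the return-value copy, I trigger the exception in the caller's assignment rather than in the copy constructor; the failure mode is the same either way, because the element has already left the stack by then.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical element type whose copy operations can be made to fail.
struct Fragile {
    static bool fail_next_copy;    // arms the copy constructor
    static bool fail_next_assign;  // arms the assignment operator
    int value;

    explicit Fragile(int v) : value(v) {}
    Fragile(const Fragile& other) : value(other.value) {
        if (fail_next_copy) {
            fail_next_copy = false;
            throw std::runtime_error("copy failed");
        }
    }
    Fragile& operator=(const Fragile& other) {
        if (fail_next_assign) {
            fail_next_assign = false;
            throw std::runtime_error("assignment failed");
        }
        value = other.value;
        return *this;
    }
};
bool Fragile::fail_next_copy = false;
bool Fragile::fail_next_assign = false;

template <typename T>
class Stack {
public:
    void push(const T& item) { items_.push_back(item); }
    std::size_t size() const { return items_.size(); }

    // Value-returning pop: by the time the caller copies or assigns the
    // result, the element has already been removed.
    T pop_value() {
        T result(items_.back());
        items_.pop_back();
        return result;
    }

    // std::stack-style split: observe first, then remove.
    const T& top() const { return items_.back(); }
    void pop() { items_.pop_back(); }

private:
    std::vector<T> items_;
};

// Stack size after the caller's assignment from pop_value() throws.
std::size_t size_after_failed_pop_value() {
    Stack<Fragile> s;
    s.push(Fragile(1));
    s.push(Fragile(2));
    Fragile out(0);
    Fragile::fail_next_assign = true;
    try { out = s.pop_value(); } catch (const std::runtime_error&) {}
    return s.size();  // the element is gone and its value was never received
}

// Stack size after copying from top() throws, before pop() is reached.
std::size_t size_after_failed_top_then_pop() {
    Stack<Fragile> s;
    s.push(Fragile(1));
    s.push(Fragile(2));
    Fragile::fail_next_copy = true;
    try {
        Fragile out(s.top());  // throws here, so pop() never runs
        s.pop();
    } catch (const std::runtime_error&) {}
    return s.size();  // the stack is unchanged
}
```

With the value-returning version the stack ends up one element short and the caller has nothing to show for it. With the top() / pop() split, the copy happens while the element is still safely on the stack.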

Unfortunately, it's not very easy to guarantee that a useful copy constructor won't throw an exception. One of the handiest uses for a copy constructor is to make a deep copy of data that the class has dynamically allocated. That means that it's probably going to call operator new, which can throw if there's not enough memory. Even if the nothrow variant of new is used, a constructor has no easy way to return an error code instead, so an exception is pretty much required.

How much of a limitation is that, really? I actually don't think it's a big limitation. Not many systems deal very gracefully with extremely memory-constrained situations. Probably the best way to deal with it is not to run into the exception at all, by calculating how much memory is going to be needed early on in a call stack, and throwing an exception up-front if the required amount + slack isn't available. Of course, that tactic isn't rock-solid for several reasons, including race conditions with allocation on other threads, and the difficulty and flattening of abstraction in making the required calculation.

There's more to this value orientation. The construction of new value types that are supposed to work just as well as builtin types uses the same keywords (struct and class) as the keywords to create more traditional object-oriented types (i.e. Java-style reference types), focused on things like polymorphism and dynamic dispatch. Thing is, writing correct value types is far harder than writing correct Java-style types, and C++ has some features that seem to make value types the default.

It's only with value-oriented types that you need to deal with copy constructors, assignment operators and implicit and explicit type conversions. All you need to do in C++ to create an implicit type conversion is give your type a constructor that can be called with a single argument, and C++ will automagically call that constructor, sometimes when you least expect it (e.g. calling a function with the wrong type, which happens to match a constructor). Hence, the requirement for an 'explicit' keyword which prevents this behaviour when it's undesirable. Similarly, C++ automatically creates a copy constructor and an assignment operator that copy the class member by member, and it permits object slicing - copying a subtype into a supertype location and lopping off everything that made the subtype a subtype, almost as if you could turn a mammal into an earlier, sea-based life form by chopping off its legs.
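A small sketch of both surprises, using types I've invented for the purpose (Meters, Seconds, Animal and Mammal are all hypothetical): a one-argument constructor that quietly converts a double, the same constructor tamed with explicit, and a Mammal sliced down to an Animal when passed by value.

```cpp
#include <cassert>
#include <string>

// Hypothetical type with a converting constructor: a bare double silently
// becomes a Meters wherever a Meters is expected.
struct Meters {
    double value;
    Meters(double v) : value(v) {}
};

// Same shape, but 'explicit' forces callers to write Seconds(2.0).
struct Seconds {
    double value;
    explicit Seconds(double v) : value(v) {}
};

double length_of(Meters m) { return m.value; }
double duration_of(Seconds s) { return s.value; }

struct Animal {
    virtual ~Animal() {}
    virtual std::string describe() const { return "animal"; }
};

struct Mammal : Animal {
    int legs;
    Mammal() : legs(4) {}
    virtual std::string describe() const { return "mammal"; }
};

// By value: only the Animal part of the argument is copied (slicing), so
// the Mammal override and the legs member are gone.
std::string describe_by_value(Animal a) { return a.describe(); }

// By reference: no copy is made, so dynamic dispatch sees the real Mammal.
std::string describe_by_reference(const Animal& a) { return a.describe(); }

double implicit_length() { return length_of(3.5); }  // compiles: double -> Meters
double explicit_duration() { return duration_of(Seconds(2.0)); }
// duration_of(2.0) would not compile, because Seconds' constructor is explicit.

std::string sliced_description() { return describe_by_value(Mammal()); }
std::string unsliced_description() { return describe_by_reference(Mammal()); }
```

Neither the implicit conversion nor the slicing is required to produce a diagnostic, which is rather the point: the value-oriented behaviour is what you get by default, and the safer behaviour has to be asked for.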

On the other hand, this capability permits some pretty neat functionality - the fact that std::basic_string or CComPtr in the ATL can be implemented without extending the language or modifying the compiler is admirable. I just wonder if it cost too much. One thing I'm certain of, from my prejudiced OO and functional (where appropriate) perspective: the strong lean towards value orientation over and above Java-style reference-based object orientation is a flaw in C++. Programming languages, above all things, should keep the principle behind the "Pit of Success" foremost, and always make the natural way of phrasing the solution to a problem the correct way.

Of course, the best practices that determine the natural solution adjust slowly over the years. But when did common domain objects like Person, Car and Account ever need to act like integers, freely copyable and duplicated hither and thither? I'm not sure they ever did, so I'm not inclined to let C++ off the hook just for historical reasons.
