Two Wrongs

Extendable Data in Haskell (part 2)

Extendable Data in Haskell (part 2)

I wrote earlier about the problem of “dynamic dispatch”, or allowing users to extend your data type with types of different shape, and then having your code call out to the right operation of that type.

A reader emailed me in response to that article about a problem they were facing. They were trying to apply these techniques to an entity–component–system architecture they were creating.

Fast recap for unaware readers: entity–component–system architectures are common in game programming, among other things, as a replacement for regular OOP techniques. They are useful because they’re extremely composable and extendable. In essence,

  • A component is a dumb, plain collection of data. (Something like a struct in C.)
  • An entity is a collection of components. (Something like an array in C.)
  • A system is a function that modifies some components in an entity.

So, for example, a Rocket can be an entity, and it may have components Position, Velocity and Weight. Components are just dumb, transparent collections of data. Then there may be a system called Newton which updates the Position component in Rocket based on its Velocity component.

The main loop in the game then just runs all entities through all systems. If an entity doesn’t have all components that a system needs, the system simply disregards that entity entirely.

As it turns out, this kind of system is hard to make in an extensible manner in standard Haskell. It feels like there should be a very neat solution that makes you go “oh, of course” once it’s explained to you, but I still haven’t been able to find anything.

I think I may be getting closer to the essence of the problem though, but I’m not yet sure whether that is a good or a bad thing.

The complication

You’ll notice in my previous article on extendability in Haskell that there’s still a fixed set of something: the operations. So the situation is this:

As long as your data set is fixed (non-extendable), you write code like pretty much any beginner intuitively would: your data is stored in data records, and those records are inspected by separate functions you write. Users can extend the set of functions by just writing new functions over the same data.

However, if your data is extendable (so your users should be able to add more types of data), things are a bit trickier. This is a bit muddy, so I’ll attempt to illustrate in code. There are essentially four cases:

       Fixed    Extendable
Operations    Fixed    1    3
     Extendable    2    4

These cases are explained with a short example below.

1. Fixed Data and Fixed Operations

Imagine this datatype with the two companion operations as shown.

data Instrument
    = AcousticGuitar
        { tuning :: [Frequency]
        , material :: WoodType
        , age :: Integer
    | Piano
        { make :: Manufacturer
        , weight :: Integer

sell :: Instrument -> Price
sell instrument =
     case instrument of
          AcousticGuitar _ wood ->
              120 + age * (0.7 * woodValue wood - 30)
          Piano brand weight ->
              brandValue brand * 1000 - weight

play :: Instrument -> Note -> Sound
play = {- ... -}

As long as you don’t export the Instrument constructors and destructuring functions, it shouldn’t be possible to write new operations without modifying existing code.

2. Fixed Data and Extendable Operations

However, if you export the Instrument constructors and destructuring functions, a user of your code can easily extend the set of operations with new ones. For example, they may want to add a function

impressPeople :: Instrument -> Attractiveness
impressPeople instrument =
     case instrument of
         AcousticGuitar tuning _ ->
             if correct tuning then
         Piano brand _ ->
             if brandValue brand > 9 then

which calculates how attractive they appear when playing that instrument.

3. Extendable Data and Fixed Operations

This is where things get tricky and less intuitive. Again, I refer to the first example in case 1 – musical instruments and two operations on them. However, now the user wants to be able to add their own musical instruments without modifying existing code. Therefore, we can no longer have a sum type listing all musical instruments.

What we have to do is essentially “find the right level of abstraction”. If we want users to be able to add their own musical instruments, we can no longer care about the minutiae of which instruments are specially tuned, or for which instruments have a weight. What is it that we care about when it comes to musical instruments?

In this case, we only care about two operations: sell and play.

So we “invert” our code and define

data Instrument = Instrument
    { sell :: Price
    , play :: Note -> Sound

acousticGuitar :: [Frequency] -> WoodType -> Integer -> Instrument
acousticGuitar tuning material age =
        { sell = 120 + age * (0.7 * woodValue wood - 30)
        , play = {- ... -}

piano :: Manufacturer -> Integer -> Instrument
piano make weight =
        { sell = brandValue brand * 1000 - weight
        , play = {- ... -}

This still supports both operations we need, but now it is trivial for the user to add another musical instrument without modifying any existing code. By simply creating a constructor function

harmonica :: [Note] -> Boolean -> Instrument
harmonica scale isMetal =
        { sell =
            if isMetal then
                12 * length scale
                7 * length scale

        , play = {- ... -}

They have defined a new type of musical instrument that works perfectly fine with whatever functions you have already written to deal with music instruments!

4. Extendable Data and Extendable Operations

This camp is where Entity–Component–System architectures fall. I have no idea how to deal with this. The very thing that allowed us to handle case #3 was that while the kinds of data were very different, all different kinds of data had something in common: the operations we want to use on them were always the same.

In an ecs architecture, we want the users to be able to define their own components (i.e. the data must be extendable), but we also want the users to be able to define their own systems (i.e. the operations must be extendable.)

I’d love to learn a solution to this.

Update on : A Possible Solution, Using Typeclasses?

Awesome reader Ben pointed me to More Thoughts on the Expression Problem in Haskell by Eli Bendersky11 Who is a great tech writer and certainly an inspiration of mine., where one solution to this problem is described very well. The paper Data Types à la Carte has been around for a while and is about exactly this problem, so the solution isn’t new, but Eli does a very good job of summarising it in a way that clicks with my intuition. Essentially,

  • The data is extendable because the user constructs their own collection22 And this is also why it’s so verbose: in pure Haskell, there is no syntax to talk about collections on the type level. Type system extensions can probably help a lot here. of valid things (as opposed to having it hard-coded in the type system machinery.)
  • The operations are extendable because as long as the operation is supported by all forms of data in the collection, some plumbing typeclasses take care of selecting the right implementation and ensures operations work on all collections of data!