Two Wrongs

Purely Functional AVL Trees in Common Lisp

There are some algorithms and data structures that I personally think every programmer should know by heart. They are relatively simple but incredibly useful. I used to have a list of them somewhere, which I seem to have misplaced. It includes, among other things, insertion sort (so simple a child could have invented it, yet it performs surprisingly well on modern computers), the Fisher–Yates shuffle (frequently missing from standard libraries, but easy to write yourself), and reservoir sampling (for when you don’t need a full shuffle, or need to sample from an infinite stream). The list does not include linked lists.

Why Binary Trees

Many people are surprised when they hear that linked lists are not on this list. I have deliberately left them out because, in my experience, linked lists very rarely make sense (intrusive linked lists are one example of where they do, but this is far from everything people seem to want to use them for). Almost all of their use cases are better covered by the following (keep in mind, too, that data structures can be combined: a tree-of-arrays is called a rope, for example, and a skiplist is a sequence backed by several layers of linked lists for quicker search):

Plain arrays
when what you need is a plain sequence, for iteration, storage, or just about anything, a dynamic array is more versatile and performant than most people think. When in doubt, remember that nobody ever got fired for choosing arrays.
Array-backed heaps
when what you need is a priority queue, an array-backed heap gives you some benefits over plain arrays, while still remaining simple to implement.
Binary search trees
Binary search trees are practically linked lists without many of the things that suck about linked lists. The difference in performance between the linear traversal of linked lists and the logarithmic performance of binary trees is massive. There is some truth to the joke that goes “\(O(\log{n})\) is \(O(1)\) for any realistic \(n\)”.
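The joke can be made concrete with a small sketch — in Python rather than the article's Lisp, purely so the illustration is easy to run — counting how many comparisons a linear scan needs versus a binary search over the same sorted data:

```python
# Illustrative comparison counts (not part of the article's code):
# a linked-list-style scan versus a balanced binary search.

def linear_steps(n, target):
    """Comparisons a linear scan makes to reach index `target`."""
    steps = 0
    for i in range(n):
        steps += 1
        if i == target:
            break
    return steps

def binary_steps(n, target):
    """Comparisons a binary search makes to reach index `target`."""
    lo, hi, steps = 0, n - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if mid == target:
            break
        elif mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

n = 1_000_000
print(linear_steps(n, n - 1))   # 1000000
print(binary_steps(n, n - 1))   # 20
```

A million comparisons against twenty: for any realistic \(n\), the logarithm might as well be a constant.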

However, binary search trees are marginally trickier to implement than linked lists. In order to be efficient, a binary search tree must be balanced (note, however, that even a degenerate binary search tree is no less efficient than a linked list: the worst case, when the tree is maximally unbalanced, gives you exactly the characteristics of a linked list, so by switching from linked lists to binary trees you cannot get worse results, only better). There are a few popular approaches to maintaining balance in binary search trees, and I picked two and implemented them in Common Lisp over the last couple of days: left-leaning red-black trees, and avl trees. In my opinion, avl trees are simpler – and from what I can tell, they perform better too.

avl Trees

I won’t go into detail on the specific maths behind avl trees because there are many other resources covering that much better than I could dream of. An avl tree is a binary tree, which means it consists of nodes, each of which has a key and a value associated with that key (the value can be skipped if all you want is a flat sequence of elements), as well as a left and a right branch. The branches may be nil, indicating a leaf node.

(defclass avl-tree ()
  ((key
    :initarg :node-key
    :reader node-key)
   (value
    :initarg :node-value
    :reader node-value)
   (left
    :initarg :left-child
    :reader left-child)
   (right
    :initarg :right-child
    :reader right-child))
  (:documentation "A node in an AVL tree."))

(defun make-node (key value left right)
  "Binary tree node with association and branches."
  (make-instance
   'avl-tree :node-key key :node-value value
   :left-child left :right-child right))

The height of a node in a binary tree is defined as the length of the longest path down to a leaf node. (A very sensible optimisation can be made here: store the height of each node in the node itself. All operations we will perform on the tree have known effects on node heights, so by storing the height of each node we never have to calculate it from scratch again.)

(defmethod tree-height ((tree null))
  "Get the height of an empty TREE."
  0)

(defmethod tree-height ((tree avl-tree))
  "Get height of TREE."
  (1+ (max (tree-height (left-child tree))
           (tree-height (right-child tree)))))

The central principle in keeping avl trees balanced is ensuring that the difference between the heights of the right and left branch of a node never exceeds one. This difference is called the balance factor of the node.

(defun balance-factor (node)
  "Get balance factor of subtree rooted at NODE."
  (ecase (- (tree-height (right-child node))
            (tree-height (left-child node)))
    (-2 :imbalanced-left)
    (-1 :left-heavy)
    ( 0 :balanced)
    (+1 :right-heavy)
    (+2 :imbalanced-right)))

The balance factor can become \(\pm 2\) if the insertion we just did caused the tree to become imbalanced. We can restore it to the \(\pm 1\) range by rotating the tree around the node. A binary search tree is fundamentally unchanged by the rotation operation: it simply shifts some of the nodes over to the side we’re rotating towards. (We have not yet seen the definition of the avl-node function, but we will shortly.)

(defmethod rotate-left ((node avl-tree))
  "Return TREE rotated left."
  (with-slots (key value left right) node
    (avl-node
     (node-key right)
     (node-value right)
     (avl-node key value
               left (left-child right))
     (right-child right))))

(defmethod rotate-right ((node avl-tree))
  "Return TREE rotated right."
  (with-slots (key value left right) node
    (avl-node
     (node-key left)
     (node-value left)
     (left-child left)
     (avl-node key value
               (right-child left) right))))

There are five simple rules that specify how we must rotate each node to maintain its balance:

  • If the current node is fully balanced, or only slightly lopsided (balance factor \(\pm 1\)), we do nothing.
  • If the current node is imbalanced to the left (balance factor \(-2\)), and the newly inserted element was inserted into the left branch of the left child of the current node (this can be detected by checking whether the left child has balance factor \(-1\)), we rotate the tree right around the current node.
  • If the current node is imbalanced to the left (balance factor \(-2\)), but the newly inserted element was inserted into the right branch of the left child of the current node, we first rotate the left child left. This puts the tree in a state where it is still imbalanced to the left, but the left child is now left-heavy instead, so a right rotation of the current node balances the tree out. (The literature describes this as a double rotation: first rotate the child one way, then the parent the other. The implementation is simplified if we recognise that we don’t explicitly have to do two rotations – we can let the second one happen through regular processing.)
  • The remaining two cases are mirror images of the previous two cases.

(defun avl-node (key value &optional left right)
  "Balanced AVL tree node."
  (let ((node (make-node key value left right)))
    (ecase (balance-factor node)
      ((:left-heavy :balanced :right-heavy)
       node)

      (:imbalanced-left
       (ecase (balance-factor left)
         (:left-heavy
          (rotate-right node))
         (:right-heavy
          (avl-node key value
                    (rotate-left left) right))))

      (:imbalanced-right
       (ecase (balance-factor right)
         (:left-heavy
          (avl-node key value
                    left (rotate-right right)))
         (:right-heavy
          (rotate-left node)))))))

In order to know where in the tree to insert a new value, we’d like a generic way to compare values. Curiously, this is not part of the Common Lisp standard, but on the other hand, it is oh so simple to implement. For the sake of this article, we only care about numbers, but in reality we’d probably define this for all built-in types – and maybe some user-defined types too.

(defmethod lessp ((a number) &rest rest)
  (apply #'< a rest))

Finally, we can define insertion recursively. If we hit a leaf node, we just return a new node with no branches to replace this leaf.

Otherwise, we construct a new node derived from the node we’re currently visiting, but where the new value is inserted into one of its branches (depending on whether it’s less than or greater than the node value). When constructing a node with avl-node, balancing is taken care of.

Note that this recursion is also what makes the tree purely functional: first we descend down the tree until we hit a leaf where the new node can be inserted. Then, as the stack unwinds on the way back, it reconstructs all the nodes that were visited on the way down. So in the end, the tree returned to the caller will have one path from the root to the newly inserted element consisting of brand new node objects, but all nodes not on this path will be shared with the previous version of the tree.
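This path-copying idea is easy to observe directly. Here is a stripped-down sketch — in Python, with a plain unbalanced insert, since balancing is irrelevant to the sharing argument — where we can check object identity to see which subtrees the old and new trees share:

```python
# Sketch of path copying (not the article's Lisp, and without
# balancing): insertion copies only the nodes on the insertion path,
# so old and new trees share every untouched subtree.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(key, tree):
    """Persistent insert: returns a new tree, never mutates the old."""
    if tree is None:
        return Node(key)
    if key < tree.key:
        return Node(tree.key, insert(key, tree.left), tree.right)
    else:
        return Node(tree.key, tree.left, insert(key, tree.right))

old = insert(3, insert(7, insert(5, None)))  # root 5, children 3 and 7
new = insert(4, old)                         # 4 goes under 3

print(new is old)              # False: the root was copied
print(new.right is old.right)  # True: the 7-subtree is shared
```

Only the root and the 3-node were copied; the entire right subtree is the very same object in both versions, which is what makes persistent trees cheap.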

(defmethod insert (key value (tree null))
  "Insert pair of KEY and VALUE in an empty TREE."
  (avl-node key value nil nil))

(defmethod insert (key value (tree avl-tree))
  "Add an association from KEY to VALUE in TREE."
  (avl-node
   (node-key tree)
   (node-value tree)
   (if (lessp key (node-key tree))
       (insert key value
               (left-child tree))
       (left-child tree))
   (if (lessp key (node-key tree))
       (right-child tree)
       (insert key value
               (right-child tree)))))

In order to be able to do something useful with this, we also need a way to look up associations we have inserted.

(defmethod lookup (key (tree null))
  "Lookup KEY in the empty TREE."
  nil)

(defmethod lookup (key (tree avl-tree))
  "Return all values associated with KEY in TREE."
  (with-slots ((node-key key) value left right)
      tree
    (cond
      ((lessp key node-key) (lookup key left))
      ((lessp node-key key) (lookup key right))
      (t (cons value
               (append (lookup key left)
                       (lookup key right)))))))

For a final test run, we insert associations from numbers to words of that length.

(defvar *word-map*)
(setq *word-map* nil)

(mapc (lambda (word)
        (setq *word-map*
              (insert (length word)
                      word
                      *word-map*)))
      '("hey" "goodbye" "hello"
        "hi" "world" "greetings"))

Now we can check which words we have registered with length 5.

(lookup 5 *word-map*)
("hello" "world")

Neat! We have implemented our own functional multimap, dict, table, or whatever you want to call it. That’s something you want to have in your toolbox.