Binary tree

This page is about the data structure. For the Binary tree form of poetry, see Binary tree (poetry).


In computer science, a binary tree is an ordered tree data structure in which each node has at most two children. Typically the child nodes are called left and right. One common use of binary trees is as binary search trees; another is as binary heaps.

A simple example binary tree

Definition in graph theory

Graph theorists use the following definition. A binary tree is a connected acyclic graph in which the degree of each vertex is no more than 3. A rooted binary tree is such a graph with one of its vertices of degree no more than 2 singled out as the root. With the root thus chosen, each vertex has a uniquely defined parent and up to two children; however, there is not yet enough information to distinguish a left child from a right child. If we drop the connectedness requirement, allowing multiple connected components in the graph, we call the structure a forest.

Types of binary tree

A binary tree is a rooted tree in which every node has at most two children.

A full binary tree is a tree in which every node has either zero or two children.

A perfect binary tree is a full binary tree in which all leaves (vertices with zero children) are at the same depth (distance from the root).

Sometimes a perfect binary tree is called a complete binary tree. Others define a complete binary tree to be a full binary tree in which all leaves are at depth n or n-1 for some n.

Methods for storing binary trees

Binary trees can be constructed from programming language primitives in several ways. In a language with records and references, binary trees are typically constructed by having a tree node structure which contains some data and references to its left child and its right child. Sometimes it also contains a reference to its unique parent. If a node has fewer than two children, some of the child pointers may be set to a special null value, or to a special sentinel node.
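
As an illustration, a minimal node record might look like the following C sketch; the type and field names (node, value, left, right, parent) are illustrative rather than standard, and parent is the optional back-reference mentioned above.

#include <stdlib.h>

/* A binary tree node with an integer payload.  Missing children are
   represented by NULL pointers; the parent field is optional. */
struct node {
    int value;
    struct node *left;
    struct node *right;
    struct node *parent;
};

/* Allocate a node with no children and no parent. */
struct node *make_node(int value)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = value;
        n->left = n->right = n->parent = NULL;
    }
    return n;
}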

Binary trees can also be stored in arrays, and if the tree is a complete binary tree, this method wastes no space. In this compact arrangement, if a node has an index i, its children are found at indices 2i+1 and 2i+2, while its parent (if any) is found at index floor((i-1)/2) (assuming the root has index zero). This method benefits from more compact storage and better locality of reference, particularly during a preorder traversal. However, it requires contiguous memory, is expensive to grow, and wastes space proportional to 2^h - n for a tree of height h with n nodes.

A small complete binary tree stored in an array
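
A sketch of the index arithmetic in C (0-based indices as described above; checking the computed index against the array length, and i > 0 before taking a parent, is left to the caller):

/* Navigation in an array-stored binary tree with the root at index 0. */
static int left_child(int i)   { return 2 * i + 1; }
static int right_child(int i)  { return 2 * i + 2; }
static int parent_index(int i) { return (i - 1) / 2; }  /* floor((i-1)/2), valid for i > 0 */

For example, the root's children sit at indices 1 and 2, and the node at index 5 has its parent at index 2.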

In languages with tagged unions such as ML, a tree node is often a tagged union of two types of nodes, one of which is a 3-tuple of data, left child, and right child, and the other of which is a "leaf" node, which contains no data and functions much like the null value in a language with pointers.

Methods of iterating over binary trees

Often, one wishes to visit each of the nodes in a tree and examine the value there. There are several common orders in which the nodes can be visited, and each has useful properties that are exploited in algorithms based on binary trees.

Pre-order, in-order, and post-order traversal

Say we have a tree node structure which contains a field value holding the node's data, and references left and right to its two children. Then we can write the following function:

visit(node)
    print node.value
    if node.left  != null then visit(node.left)
    if node.right != null then visit(node.right)

This prints the values in the tree in pre-order. In pre-order, each node is visited before any of its children. Similarly, if the print statement were last, each node would be visited after all of its children, and the values would be printed in post-order. In both cases, values in the left subtree are printed before values in the right subtree.

visit(node)
    if node.left  != null then visit(node.left)
    print node.value
    if node.right != null then visit(node.right)

Finally, an in-order traversal, as above, visits each node between the nodes in its left subtree and the nodes in its right subtree. This is a particularly common way of traversing a binary search tree, because it gives the values in increasing order.

To see why this is the case, note that if n is a node in a binary search tree, then everything in n's left subtree is less than n, and everything in n's right subtree is greater than or equal to n. Thus, if we visit the left subtree in order, using a recursive call, and then visit n, and then visit the right subtree in order, we have visited the entire subtree rooted at n in order.
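
To make this concrete, here is a small self-contained C sketch: a binary search tree built by hand, whose recursive in-order walk prints the keys in increasing order (the struct and function names are illustrative).

#include <stdio.h>

struct node { int value; struct node *left, *right; };

/* In-order: left subtree, then the node itself, then the right subtree. */
static void inorder(const struct node *n)
{
    if (n == NULL)
        return;
    inorder(n->left);
    printf("%d ", n->value);
    inorder(n->right);
}

int main(void)
{
    /* The binary search tree    4
                                / \
                               2   5
                              / \
                             1   3        */
    struct node n1 = {1, NULL, NULL}, n3 = {3, NULL, NULL}, n5 = {5, NULL, NULL};
    struct node n2 = {2, &n1, &n3};
    struct node n4 = {4, &n2, &n5};

    inorder(&n4);   /* prints: 1 2 3 4 5 */
    printf("\n");
    return 0;
}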

  In this binary tree,
  • Preorder traversal yields: 2, 7, 2, 6, 5, 11, 5, 9, 4
  • Postorder traversal yields: 2, 5, 11, 6, 7, 4, 9, 5, 2
  • In-order traversal yields: 2, 7, 5, 6, 11, 2, 5, 4, 9

We could perform the same traversals in a functional language like Haskell using code like this:

data Tree a = Leaf | Node a (Tree a) (Tree a)

preorder Leaf = []
preorder (Node x left right) = [x] ++ preorder left ++ preorder right

postorder Leaf = []
postorder (Node x left right) = postorder left ++ postorder right ++ [x]

inorder Leaf = []
inorder (Node x left right) = inorder left ++ [x] ++ inorder right

All of the above recursive algorithms use stack space proportional to the depth of the tree. If each node stores a reference to its parent, then all of these traversals can be implemented with a simple iterative algorithm that needs only constant extra space. The parent references themselves take up space, however, so this approach is mainly worthwhile when the parent references are needed anyway, or when stack space is particularly limited. For example, here is an iterative algorithm for in-order traversal:

visit(root)
    prev    := null
    current := root
    next    := null
    
    while current != null
        if prev == current.parent
            prev := current
            next := current.left
        if next == null or prev == current.left
            print current.value
            prev := current
            next := current.right
        if next == null or prev == current.right
            prev := current
            next := current.parent
        current := next
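
The same algorithm, as a rough C sketch: it assumes each node carries value, left, right, and parent fields, and that the root's parent is null, exactly as the pseudocode above does.

#include <stdio.h>

struct node {
    int value;
    struct node *left, *right, *parent;
};

/* Iterative in-order traversal in O(1) extra space using parent
   pointers.  prev remembers where we came from; next is where we go.
   Assumes root->parent == NULL. */
void inorder_iterative(struct node *root)
{
    struct node *prev = NULL, *current = root, *next = NULL;

    while (current != NULL) {
        if (prev == current->parent) {                 /* came down from the parent: try the left child */
            prev = current;
            next = current->left;
        }
        if (next == NULL || prev == current->left) {   /* left subtree finished: visit, then try the right child */
            printf("%d ", current->value);
            prev = current;
            next = current->right;
        }
        if (next == NULL || prev == current->right) {  /* right subtree finished: climb back to the parent */
            prev = current;
            next = current->parent;
        }
        current = next;
    }
}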

Depth-first order

In depth-first order, we always attempt to visit the node farthest from the root that we can, but with the caveat that it must be a child of a node we have already visited. Unlike a depth-first search on graphs, there is no need to remember all the nodes we have visited, because a tree cannot contain cycles. Preorder, in-order, and postorder traversal are all special cases of this. See depth-first search for more information.

Breadth-first order

Contrasting with depth-first order is breadth-first order, which always attempts to visit the node closest to the root that it has not already visited. See Breadth-first search for more information.
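
A breadth-first (level-order) traversal is usually written with a first-in, first-out queue rather than recursion. A minimal C sketch, using a fixed-size array as the queue (MAX_NODES is an arbitrary bound chosen for this example):

#include <stdio.h>

#define MAX_NODES 1024

struct node {
    int value;
    struct node *left, *right;
};

/* Visit nodes level by level, closest to the root first, using an
   array-backed FIFO queue of at most MAX_NODES pending nodes. */
void breadth_first(struct node *root)
{
    struct node *queue[MAX_NODES];
    int head = 0, tail = 0;

    if (root == NULL)
        return;
    queue[tail++] = root;

    while (head < tail) {
        struct node *n = queue[head++];
        printf("%d ", n->value);
        if (n->left != NULL && tail < MAX_NODES)
            queue[tail++] = n->left;
        if (n->right != NULL && tail < MAX_NODES)
            queue[tail++] = n->right;
    }
}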

Encoding n-ary trees as binary trees

There is a one-to-one mapping between general ordered trees and binary trees, which in particular is used by Lisp to represent general ordered trees as binary trees. Each node N in the ordered tree corresponds to a node N' in the binary tree; the left child of N' is the node corresponding to the first child of N, and the right child of N' is the node corresponding to N's next sibling, that is, the next node in order among the children of the parent of N.

One way of thinking about this is that each node's children are in a linked list, chained together with their right fields, and the node only has a pointer to the beginning or head of this list, through its left field.
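
In a language with records and references, this encoding can be expressed directly. Here is a C sketch, where the field names first_child and next_sibling are illustrative and correspond to the left and right fields of the equivalent binary tree.

/* Left-child, right-sibling representation of a general ordered tree.
   first_child plays the role of the binary tree's left pointer and
   next_sibling the role of its right pointer. */
struct nary_node {
    int value;
    struct nary_node *first_child;
    struct nary_node *next_sibling;
};

/* Visit the children of n in order by walking the sibling chain. */
void for_each_child(struct nary_node *n, void (*visit)(struct nary_node *))
{
    struct nary_node *c;
    for (c = n->first_child; c != NULL; c = c->next_sibling)
        visit(c);
}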

For example, in the tree on the left, A has the six children {B, C, D, E, F, G}. It can be converted into the binary tree on the right.

 

The binary tree can be thought of as the original tree tilted sideways, with the black left edges representing first child and the blue right edges representing next sibling. The leaves of the tree on the left would be written in Lisp as:

(((M N) H I) C D ((O) (P)) F (L))

which would be implemented in memory as the binary tree on the right, without any letters on those nodes that have a left child.

See also: AVL tree, B-tree, binary space partitioning, red-black tree