AVL tree

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Gaius Cornelius (talk | contribs) at 23:52, 20 October 2007 (Clean up using AWB).
[Image: An example of an unbalanced non-AVL tree]

In computer science, an AVL tree is a self-balancing binary search tree, and the first such data structure to be invented[citation needed]. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; for this reason it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

The AVL tree is named after its two inventors, G.M. Adelson-Velsky and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."

The balance factor of a node is the height of its right subtree minus the height of its left subtree. A node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.
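As a sketch, the balance factor can be computed directly from the subtree heights. The following Python is illustrative only; the Node class and the function names are not from the original text:

```python
class Node:
    """A minimal binary-tree node; the field names are illustrative."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    # Convention: an empty subtree has height -1, a single node height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # Height of the right subtree minus the height of the left subtree.
    return height(node.right) - height(node.left)
```

In a practical implementation the height (or the balance factor itself) would be stored on each node and updated incrementally, rather than recomputed recursively as here.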

AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications.[1] The AVL tree balancing algorithm appears in many computer science curricula.

[Image: The same tree after being height-balanced]

Explanation

First, see binary tree for an explanation of a normal binary tree. An AVL tree operates exactly like a normal binary tree, except that additional work is done after an insertion or a deletion.

The problem with normal binary trees is that if data is added in sorted order then the tree isn't really a tree at all - it's a list. For example, if we had five items to add to a tree, each an integer, and we added the integers 1, 2, 3, 4 and 5 in that order, the tree would not branch at all - it would be a single long line.
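A quick sketch in Python (the names are illustrative, not from the original text) shows that inserting sorted keys into a plain, non-balancing binary search tree produces exactly such a chain:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(node, key):
    # A plain (non-balancing) binary-search-tree insert.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = bst_insert(node.left, key)
    else:
        node.right = bst_insert(node.right, key)
    return node

def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

root = None
for k in [1, 2, 3, 4, 5]:
    root = bst_insert(root, k)
# Every node's left child is empty: the "tree" is a right-leaning chain
# of height 4, the worst possible for five elements.
```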

This is a problem because the whole point of a binary tree is fast searching: when the tree is balanced, each step of the search eliminates roughly half of the remaining tree. (Imagine a very large, fully balanced tree; you start at the root node and go left or right. In doing so, you've just eliminated half the elements from your search!)

An AVL tree solves this problem: as mentioned, it acts like a normal binary tree but does additional processing after each addition or deletion, so that the tree remains balanced no matter in what order data is added to (or removed from) the tree.

Now, to perform balancing work on the tree, we first have to know when the tree is unbalanced - because we have to know when we need to fix things. One possible approach is to store in each node a count of how many nodes exist in its left and right subtrees. When we add or delete a node, we travel up the tree from the point of the change, updating these counts. If the two counts at a node differ, the subtrees contain different numbers of elements.

However, this doesn't quite work: the number of elements in a subtree tells you nothing about how they are arranged. They could be properly balanced, or they could form a long list!

In fact what you need to keep track of is the depth of each subtree. A full subtree with a depth of three would have seven elements - and it would be perfectly balanced. An unbalanced subtree with a depth of three might contain only three elements (a list) - and so be in need of rebalancing.

Each time an element is added, we update the recorded depth of the left and right subtrees of each node on the path from the point of addition to the root. Along that same path we also rearrange the tree so that the subtree depths on each side remain equal.

(In fact, subtree depths must be equal or differ only by one - after all, we could have three elements on the left and two on the right, which is still as balanced as possible. If we rebalanced that tree, all we'd do is end up with two elements on the left and three on the right; we wouldn't gain anything.)

Rebalancing the tree involves performing left and right rotation operations on unbalanced nodes, which adjust the physical shape of the tree while keeping it logically valid (e.g. binary search down the tree remains correct).
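The two rotations can be sketched as follows, assuming a simple Node with key, left, and right fields (all names are illustrative, not from the original text):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(x):
    # Promote x's right child y; x becomes y's left child.
    # y's old left subtree (keys between x and y) moves under x,
    # so the binary-search ordering is preserved.
    y = x.right
    x.right = y.left
    y.left = x
    return y  # y is the new root of this subtree

def rotate_right(x):
    # Mirror image of rotate_left.
    y = x.left
    x.left = y.right
    y.right = x
    return y
```

For example, applying rotate_left to the chain 1 → 2 → 3 yields the balanced subtree rooted at 2, with 1 and 3 as its children.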

In fact, the tree is rebalanced every time a node is inserted or deleted. This limits the work that can ever be needed to rebalance the tree, because the tree can never be more than one subtree depth out of balance.

Operations

The basic operations of an AVL tree generally involve carrying out the same algorithms as would be carried out on an unbalanced binary search tree, but preceded or followed by one or more of the so-called "AVL rotations."

Insertion

Insertion into an AVL tree may be carried out by inserting the given value into the tree as if it were an unbalanced binary search tree, and then retracing one's steps toward the root updating the balance factor of the nodes.[citation needed]

If the balance factor becomes -1, 0, or 1 then the tree is still in AVL form, and no rotations are necessary.

If the balance factor becomes 2 or -2 then the tree rooted at this node is unbalanced, and a tree rotation is needed. At most one single or double rotation is needed to balance the tree.[citation needed]

Only the nodes on the path from the insertion point to the root of the tree need be checked. Because each rotation is a constant-time operation and the height of the tree is O(log n), an insertion takes O(log n) time.
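The insertion procedure above can be sketched in Python as follows. This version stores the height on each node and rebalances on the way back up the recursion; all names are illustrative, not from the original text:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0  # stored height; a leaf has height 0

def height(n):
    return n.height if n else -1

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def balance(n):
    # Right height minus left height, as in the text above.
    return height(n.right) - height(n.left)

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

def rotate_right(x):
    y = x.left
    x.left = y.right
    y.right = x
    update(x)
    update(y)
    return y

def insert(node, key):
    # Standard BST insert (duplicates go right), then rebalance
    # on the way back up toward the root.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    bf = balance(node)
    if bf == 2:                       # right-heavy
        if balance(node.right) < 0:   # right-left case: double rotation
            node.right = rotate_right(node.right)
        node = rotate_left(node)
    elif bf == -2:                    # left-heavy
        if balance(node.left) > 0:    # left-right case: double rotation
            node.left = rotate_left(node.left)
        node = rotate_right(node)
    return node
```

Inserting the keys 1 through 7 in sorted order - the worst case for a plain binary search tree - yields a perfectly balanced tree of height 2 rooted at 4.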

Deletion

If the node is a leaf, remove it. If the node is not a leaf, replace it with either the largest node in its left subtree (the in-order predecessor) or the smallest node in its right subtree (the in-order successor), and remove that node. The node that is actually removed therefore has at most one child. After the deletion, retrace the path back up the tree to the root, adjusting the balance factors as needed.

The retracing can stop if the balance factor becomes -1 or 1, indicating that the height of that subtree has remained unchanged. If the balance factor becomes 0, then the height of the subtree has decreased by one and the retracing needs to continue. If the balance factor becomes -2 or 2, then the subtree is unbalanced and needs to be rotated to fix it. If the rotation leaves the subtree's balance factor at 0, then the retracing towards the root must continue, since the height of this subtree has decreased by one. This is in contrast to an insertion, where a rotation resulting in a balance factor of 0 indicates that the subtree's height has remained unchanged.
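Deletion with retracing can be sketched similarly. This version stores the height on each node and, for simplicity, rebalances at every node on the way back up rather than stopping early; all names are illustrative, not from the original text:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0  # stored height; a leaf has height 0

def height(n):
    return n.height if n else -1

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def balance(n):
    return height(n.right) - height(n.left)

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

def rotate_right(x):
    y = x.left
    x.left = y.right
    y.right = x
    update(x)
    update(y)
    return y

def rebalance(node):
    # Restore the AVL invariant at this node with at most a double rotation.
    update(node)
    bf = balance(node)
    if bf == 2:
        if balance(node.right) < 0:   # right-left case
            node.right = rotate_right(node.right)
        node = rotate_left(node)
    elif bf == -2:
        if balance(node.left) > 0:    # left-right case
            node.left = rotate_left(node.left)
        node = rotate_right(node)
    return node

def delete(node, key):
    if node is None:
        return None                   # key not present
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        if node.left is None:
            return node.right         # zero or one child: splice out
        if node.right is None:
            return node.left
        # Two children: copy in the smallest key of the right subtree
        # (the in-order successor), then delete that node instead.
        succ = node.right
        while succ.left:
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    return rebalance(node)            # retrace: fix balance on the way up
```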

The time required is O(h) for lookup plus O(h) rotations on the way back to the root; so the operation can be completed in O(log n) time.

Lookup

Lookup in an AVL tree is performed exactly as in an unbalanced binary search tree, except because of the height-balancing of the tree, a lookup takes O(log n) time. No special provisions need to be taken, and the tree's structure is not modified by lookups. (This is in contrast to splay tree lookups, which do modify their tree's structure.)

If each node additionally records the size of its subtree (including itself and its descendants), then the nodes can be retrieved by index in O(log n) time as well.
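As a sketch (the names are illustrative, not from the original text), with a stored size field the i-th smallest key can be found by comparing the index against the left subtree's size at each step:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # size = number of nodes in this subtree, including this one
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(node, i):
    # Return the node holding the i-th smallest key (0-based).
    # Each step descends one level, so this is O(log n) in a balanced tree.
    left_size = node.left.size if node.left else 0
    if i < left_size:
        return select(node.left, i)
    if i == left_size:
        return node
    return select(node.right, i - left_size - 1)
```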

Once a node has been found in a balanced tree, the next or previous node can be obtained in amortized constant time. (In a few cases, about 2*log(n) links will need to be traversed. In most cases, only a single link need be traversed. On the average, about two links need to be traversed.)[citation needed]

Comparison to other structures

Both AVL trees and red-black trees are self-balancing binary search trees, so they are very similar mathematically. The operations to balance the trees are different, but both occur in constant time. The real difference between the two is the limit on height. For a tree with n nodes:

  • an AVL tree's height is at most about 1.44·log2(n)
  • a red-black tree's height is at most 2·log2(n+1)

So while insertions, deletions, and lookups in both types of tree take O(log n) time, an AVL tree is shorter in the worst case, and so will generally require slightly less time for each of those operations.

References

  • G. Adelson-Velskii and E.M. Landis, "An algorithm for the organization of information." Doklady Akademii Nauk SSSR, 146:263–266, 1962 (Russian). English translation by Myron J. Ricci in Soviet Math. Doklady, 3:1259–1263, 1962.
  • Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Pages 458–475 of section 6.2.3: Balanced Trees. Note that Knuth calls AVL trees simply "balanced trees".
  1. ^ Pfaff, Ben (2004). "Performance Analysis of BSTs in System Software" (PDF). Stanford University.