About Me

My photo
I'm a colonist who has declared war on machines and intend to conquer them some day. You'll often find me deep in the trenches fighting off bugs and ugly defects in code. When I'm not tappity-tapping at my WMD (also, known as keyboard), you'll find me chatting with friends, reading comics or playing a PC game.

Tuesday, June 10, 2008

Sorting Stuff with Heap Sort - Part II

Why do we call it ‘Heap’ sort?
Heap sort begins by building a heap of the data. A heap can be visualized as a binary tree where each parent node is greater than its successors. In practice, the heap can easily be represented by a simple array where the left and right children of the ith node in the array will be at positions 2*i+1 and 2*i+2 respectively. For example, the following is a heap:

This heap above can be represented using an array as follows:

Note that the ith element has the (2i+1)th and (2i+2)th elements as its children. The fact that the heap can be represented like this means that we can do an in-place sort without having to construct an entirely new data structure for storage. There are a couple of interesting features of the heap which are to be noted:

1. Each node is greater than its successors (for sorting in ascending order).
2. By virtue of (1), the largest node will always be on top of the heap.

Now that we know what the heap is, let’s see how we can generate such a heap out of unsorted data.

Building the Heap.
Before we talk about building a heap, we must first discuss a sifting procedure. The sifting procedure basically sifts a given node downward through the tree until it reaches its rightful place in the tree. I may keep using the term ‘node’ simply because its easier to visualize things that way but remember that in practice, a node is nothing but an element in a list. The sifting algorithm is pretty straight-forward.

Procedure SIFT(int siftNode, int heapSize)
1. Start with the position of the node to be sifted.
2. If either of the sift node’s children are larger than it, then swap the sift node with the larger of its children.
3. If neither of the sift node’s children are larger than it or the end of the heap has been reached then exit.
4. Otherwise, go to step 2.

Using the sift procedure, we can easily build an initial heap out of seemingly unsorted data. We start with the last node in the tree or in other words, the last element in the input data list. We then invoke our earlier sift procedure on this element. Then we move onto the previous element in the list and the cycle continues until we have sifted all of the elements in the list. The result will be a nice heap. For grasping the idea better, the following algorithm should suffice. Note that the algorithm assumes a zero based input data list.


Procedure Heapify()
1. siftNode = inputDataList.length()-1
2. while siftNode >= 0
{
3. Invoke SIFT( siftNode, inputDataList.length() )
4. siftNode = siftNode – 1
}


Sometimes, building an initial heap out of the input data is known as “Heapifying” the data. We are not done yet. The heap has to now be sorted and we'll see how that can be done soon. Until then, have a nice week!

No comments: