Heap Sort: Introduction to Algorithms

Source: Internet
Author: User

I used heap sort to rank the query results of a search engine, so I would like to review it here.
Before going into heap sort, let's list the common sorting methods:
Insertion Sort: the simplest and most intuitive sorting method. Its worst-case time complexity is O(n^2), and it sorts in place (recall that a sorting algorithm sorts in place if only a constant number of elements of the input array are ever stored outside the array). In other words, apart from the input array it uses only a constant amount of extra space; insertion sort needs just one temporary slot to hold the element being moved. It is well suited to small inputs.
Merge Sort: a sorting algorithm based on divide and conquer. Its time complexity is O(n lg n), but it is not in place, because the merge step needs O(n) extra space.
Heap Sort: the subject of this article. Its time complexity is O(n lg n) and it sorts in place.
Quick Sort: worst-case time complexity O(n^2) and average time complexity O(n lg n), yet it is reported to be faster than heap sort in practice.

The following describes Heap Sort.
The core of heap sort is the heap data structure. A heap is a complete binary tree in which every node is greater than or equal to its left and right children (a max-heap) or less than or equal to them (a min-heap).
A complete binary tree can be stored very efficiently in an array, whereas general trees are usually stored as linked nodes.
Take the input array A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]. Before it can be sorted, a heap must be built from it, which takes two steps:
View the input array as a complete binary tree
Build the heap with BUILD-MAX-HEAP
The input array can be viewed as the following binary tree:
             16
       4            10
   14     7      9      3
  2  8   1
Normally the tree structure would have to be recorded explicitly: each node would be a linked record holding pointers to its left and right children, which takes several times the space of the input array and rules out sorting in place.
The nice part is that this complete binary tree needs no extra space at all. Because a complete binary tree can be stored directly in an array, the input array itself is the tree. This makes the data structure very efficient.
For any node at index i in the array (indices start at 1), finding its parent, left child, and right child in the complete binary tree is easy:
PARENT(i)
    return ⌊i/2⌋

LEFT(i)
    return 2i

RIGHT(i)
    return 2i + 1
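The formulas above use 1-based indices. Python lists start at 0, so the arithmetic shifts by one. A minimal sketch, assuming 0-based indices; the helper names `parent`, `left`, and `right` are illustrative and not part of the original pseudocode:

```python
def parent(i):
    """Index of the parent of node i (0-based); only meaningful for i > 0."""
    return (i - 1) // 2

def left(i):
    """Index of the left child of node i (0-based)."""
    return 2 * i + 1

def right(i):
    """Index of the right child of node i (0-based)."""
    return 2 * i + 2

a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
i = 3  # the node holding 14
# Keep only child indices that actually fall inside the array.
children = [a[j] for j in (left(i), right(i)) if j < len(a)]
print(a[parent(i)], children)  # 4 [2, 8]
```

This matches the tree drawn earlier: 14 hangs under 4 and has the children 2 and 8.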
Now that the input array is viewed as a complete binary tree, we can start building the heap.
First, an important heap operation, MAX-HEAPIFY:
MAX-HEAPIFY(A, i)
 1 l ← LEFT(i)
 2 r ← RIGHT(i)
 3 if l ≤ heap-size[A] and A[l] > A[i]
 4    then largest ← l
 5    else largest ← i
 6 if r ≤ heap-size[A] and A[r] > A[largest]
 7    then largest ← r
 8 if largest ≠ i
 9    then exchange A[i] ↔ A[largest]
10         MAX-HEAPIFY(A, largest)
This function performs the heapify operation on node i of the array. It assumes that the subtrees rooted at LEFT(i) and RIGHT(i) are already max-heaps.
It is fairly simple. Lines 1 to 7 compare node i with its left and right children to find which of the three is largest.
Lines 8 to 10: if the largest node is not i, exchange it with node i, then recursively heapify from the position of the largest node.
A complete binary tree of n nodes has height lg n, and the work done at each node is constant, so the time complexity of this operation is O(lg n).
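To make the operation concrete, here is a sketch of an iterative variant traced on the example array. It assumes 0-based indices and takes the heap size as an explicit parameter (the name `heap_size` is an assumption of this sketch); replacing the recursion with a loop is a design choice here, not something the pseudocode prescribes.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap (0-based)."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return  # node i already dominates both children
        a[i], a[largest] = a[largest], a[i]
        i = largest  # continue from the child that received the old value

a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
max_heapify(a, 1, len(a))  # node 2 in 1-based terms, the one holding 4
print(a)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The value 4 is exchanged first with 14 and then with 8, one step per tree level, which is where the O(lg n) bound comes from.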
With the heapify operation in hand, the heap construction algorithm is very simple:
BUILD-MAX-HEAP(A)
    heap-size[A] ← length[A]
    for i ← ⌊length[A]/2⌋ downto 1
        do MAX-HEAPIFY(A, i)

Put plainly, heapify is performed on every node from ⌊length[A]/2⌋ down to 1. A simple upper bound on the running time is therefore O(n lg n), but the actual cost is much smaller, about 2n, because most nodes sit near the bottom of the tree and have small heights. In other words, building a heap takes O(n) time: it completes in linear time.
The algorithm relies on the fact that the elements in the subarray A[(⌊n/2⌋ + 1) .. n] are all leaves of the tree, so heapify only needs to run on the non-leaf nodes.
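The linear bound can be checked empirically. The sketch below, assuming 0-based indices and a hypothetical helper name `build_max_heap_counting`, builds the heap bottom-up and counts the exchanges. Each node is exchanged at most as many times as its height, and the heights of all nodes in a complete binary tree sum to less than n, so the count stays below n.

```python
def build_max_heap_counting(a):
    """Build a max-heap in place (0-based) and return the number of exchanges."""
    n = len(a)
    swaps = 0
    for i in range(n // 2 - 1, -1, -1):  # non-leaf nodes only, bottom up
        j = i
        while True:
            l, r = 2 * j + 1, 2 * j + 2
            largest = j
            if l < n and a[l] > a[largest]:
                largest = l
            if r < n and a[r] > a[largest]:
                largest = r
            if largest == j:
                break
            a[j], a[largest] = a[largest], a[j]
            swaps += 1
            j = largest
    return swaps

for n in (10, 1000, 100000):
    data = list(range(n))  # ascending input, so many nodes must move
    swaps = build_max_heap_counting(data)
    print(n, swaps, swaps / n)  # the ratio stays below 1
```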
After all that heap building, how do we actually sort? A heap on its own is not a sorted sequence.
HEAPSORT(A)
    BUILD-MAX-HEAP(A)
    for i ← length[A] downto 2
        do exchange A[1] ↔ A[i]
           heap-size[A] ← heap-size[A] - 1
           MAX-HEAPIFY(A, 1)
The principle is simple. A heap only tells us which element is largest, so we remove that element and heapify again to bring the second largest to the top, and so on.
The implementation is also clever and uses no extra storage: exchange the top of the heap with the last element of the heap, then decrease the heap size by 1.
The time complexity of this algorithm is O(n lg n): n - 1 rounds, each with one O(lg n) heapify.

Python version

def heapSort(arr):
    """Return the elements of arr in descending order; arr is emptied."""
    output = []
    buildHeap(arr)
    print(arr)  # show the max-heap that was built
    while arr:
        i = len(arr) - 1
        # Move the current maximum to the end, then pop it off.
        arr[0], arr[i] = arr[i], arr[0]
        output.append(arr.pop())
        if arr:
            maxHeapify(arr, 0)
    return output

def maxHeapify(arr, i):
    if i < 0:
        return
    left = 2 * i + 1  # indices start at 0 here, not 1
    right = 2 * i + 2
    largest = i
    length = len(arr)
    if left < length and arr[i] < arr[left]:
        largest = left
    if right < length and arr[largest] < arr[right]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        maxHeapify(arr, largest)

def buildHeap(arr):
    length = len(arr)
    if length < 2:
        return
    nonLeaf = length // 2 - 1  # last non-leaf node with 0-based indices
    for i in range(nonLeaf, -1, -1):
        maxHeapify(arr, i)
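The version above collects the result in a separate output list, so it is not in place. A sketch of a variant that follows the HEAPSORT pseudocode more closely is shown below. It assumes 0-based indices and passes the heap size explicitly; the names `heapsort_in_place` and `heap_size` are choices of this sketch, not the author's.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down within a[0:heap_size] (0-based indices)."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heapsort_in_place(a):
    """Sort a in ascending order using only O(1) extra space."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # build the max-heap
        max_heapify(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]      # move the maximum behind the heap
        max_heapify(a, 0, end)           # the heap now covers a[0:end]

a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapsort_in_place(a)
print(a)  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

Shrinking the heap instead of popping keeps every extracted maximum inside the same list, which is what makes the sort in place.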

What are the applications of heap sort?
One I have seen: when a search engine produces query results, it has to rank n candidates and return the top r as the result, where r < n.
Heap sort is economical here. Build the heap first, then run only r rounds of extract-and-heapify during the sorting phase and skip the rest, which saves a lot of time.
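A sketch of this top-r selection using Python's standard `heapq` module is shown below. `heapq` provides a min-heap, so the values are negated to obtain max-heap behavior; the function name `top_r` and the sample scores are illustrative assumptions.

```python
import heapq

def top_r(scores, r):
    """Return the r largest scores in descending order."""
    heap = [-s for s in scores]  # negate: heapq is a min-heap
    heapq.heapify(heap)          # O(n) heap construction
    count = min(r, len(heap))
    # Only `count` extractions are performed, each costing O(lg n).
    return [-heapq.heappop(heap) for _ in range(count)]

scores = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
print(top_r(scores, 3))  # [16, 14, 10]
```

The total cost is O(n + r lg n) instead of O(n lg n) for a full sort. The standard library also offers `heapq.nlargest(r, scores)` for the same purpose.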

The typical application of the heap is the priority queue.
Heap sort is a good sorting algorithm, but in practice it usually loses out (to quick sort, as noted above), so people tend to use other algorithms. The heap data structure itself, however, is widely used.
The priority queue is a typical example.
An ordinary queue is first in, first out. A priority queue adds priorities, which makes it slightly more complex: the element with the highest priority comes out first. It can be used for job scheduling, such as CPU tasks.
A priority queue is well suited to a heap implementation. The operations it needs are described below:
HEAP-MAXIMUM(A)
    return A[1]

HEAP-EXTRACT-MAX(A)
    if heap-size[A] < 1
        then error "heap underflow"
    max ← A[1]
    A[1] ← A[heap-size[A]]
    heap-size[A] ← heap-size[A] - 1
    MAX-HEAPIFY(A, 1)
    return max

HEAP-INCREASE-KEY(A, i, key)
    if key < A[i]
        then error "new key is smaller than current key"
    A[i] ← key
    while i > 1 and A[PARENT(i)] < A[i]
        do exchange A[i] ↔ A[PARENT(i)]
           i ← PARENT(i)

MAX-HEAP-INSERT(A, key)
    heap-size[A] ← heap-size[A] + 1
    A[heap-size[A]] ← -∞
    HEAP-INCREASE-KEY(A, heap-size[A], key)
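The four operations can be sketched in Python as a small class. This is a minimal illustration under stated assumptions: indices are 0-based, the class name `MaxPriorityQueue` and its method names are hypothetical, and, unlike HEAP-MAXIMUM in the pseudocode, `maximum` here also raises on an empty queue.

```python
class MaxPriorityQueue:
    """Max-priority queue stored in a 0-based list."""

    def __init__(self):
        self._a = []

    def maximum(self):
        if not self._a:
            raise IndexError("heap underflow")
        return self._a[0]

    def extract_max(self):
        if not self._a:
            raise IndexError("heap underflow")
        top = self._a[0]
        last = self._a.pop()       # shrink the heap by one
        if self._a:
            self._a[0] = last      # move the last element to the root
            self._max_heapify(0)
        return top

    def increase_key(self, i, key):
        if key < self._a[i]:
            raise ValueError("new key is smaller than current key")
        self._a[i] = key
        # Float the key up while it is larger than its parent.
        while i > 0 and self._a[(i - 1) // 2] < self._a[i]:
            p = (i - 1) // 2
            self._a[i], self._a[p] = self._a[p], self._a[i]
            i = p

    def insert(self, key):
        self._a.append(float("-inf"))  # sentinel, then raise it to key
        self.increase_key(len(self._a) - 1, key)

    def _max_heapify(self, i):
        n = len(self._a)
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            largest = i
            if l < n and self._a[l] > self._a[largest]:
                largest = l
            if r < n and self._a[r] > self._a[largest]:
                largest = r
            if largest == i:
                return
            self._a[i], self._a[largest] = self._a[largest], self._a[i]
            i = largest

pq = MaxPriorityQueue()
for priority in [4, 16, 10, 14]:
    pq.insert(priority)
print(pq.extract_max(), pq.extract_max(), pq.maximum())  # 16 14 10
```

Both `insert` and `extract_max` walk a single root-to-leaf path, so each costs O(lg n), while `maximum` is O(1).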
