## Sorting

### Introduction

One of the most common applications in computer science is sorting,
the process through which data are arranged according to their values.

### Sort Classifications

Sorts are generally classified as either internal or external sorts.
An internal sort is a sort in which all of the data are held in
primary memory during the sorting process. An external sort uses
primary memory for the data currently being sorted and secondary storage
for any data that will not fit in primary memory.

### Internal Sorts

#### Insertion Sorts

Insertion sorting is one of the most common sorting techniques used
by card players. As they pick up each card, they insert it into the
proper sequence in their hand. The concept extends well into computer
sorting. In each pass of an insertion sort, one or more pieces of data
are inserted into their correct location in an ordered list. In this
section we study two insertion sorts, the straight insertion sort and
the Shell sort.

##### Straight Insertion Sort

In the straight insertion sort, the list is divided into two parts: sorted
and unsorted. In each pass, the first element of the unsorted sublist is
transferred to the sorted sublist by inserting it at the appropriate place.
If we have a list of n elements, it will take at most n-1 passes to sort
the data.

The straight insertion sort efficiency is O(n^2).
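The passes described above can be sketched in Python (an illustrative
sketch, not code from the original notes):

```python
def insertion_sort(a):
    """Straight insertion sort: grow a sorted sublist at the front.

    Pass i transfers the first unsorted element, a[i], into its proper
    place among a[0..i-1]; n-1 passes sort a list of n elements.
    """
    for i in range(1, len(a)):          # a[0..i-1] is already sorted
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:    # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                  # insert key into the sorted sublist
    return a
```

For example, `insertion_sort([77, 62, 14, 9, 30])` returns
`[9, 14, 30, 62, 77]`.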

##### Shell Sort

The Shell sort algorithm, named after its creator, Donald L. Shell, is
an improved version of the straight insertion sort in which diminishing
increments are used to partition the data.

In the Shell sort, given a list of n elements, the list is divided into
k segments, where k is known as the increment. Each segment contains
ceil(n/k) or fewer elements.

After each pass through the data, the increment is reduced until, in the
final pass, it is 1. Typical increment sequences are 5, 2, 1 or 7, 3, 2, 1.

Example (increments k = 5, 3, 1), applied to the list
77 62 14 9 30 21 80 25 70 55:

```
k = 5   (segments: positions {1,6}, {2,7}, {3,8}, {4,9}, {5,10})

  position:  1   2   3   4   5   6   7   8   9  10
  before:   77  62  14   9  30  21  80  25  70  55
  after:    21  62  14   9  30  77  80  25  70  55

  Each pair five positions apart is insertion-sorted:
  77/21 is exchanged; 62/80, 14/25, 9/70, and 30/55 are already in order.

k = 3   (segments: positions {1,4,7,10}, {2,5,8}, {3,6,9})

  before:   21  62  14   9  30  77  80  25  70  55
  after:     9  25  14  21  30  70  55  62  77  80

k = 1   (an ordinary straight insertion sort finishes the nearly sorted list)

  before:    9  25  14  21  30  70  55  62  77  80
  after:     9  14  21  25  30  55  62  70  77  80
```
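The trace above can be reproduced with a short Python sketch of the
Shell sort (illustrative; the increment sequence is a parameter and
must end with 1):

```python
def shell_sort(a, increments=(5, 3, 1)):
    """Shell sort: straight insertion sort applied with diminishing gaps.

    For each increment k, every k-th element forms a segment that is
    insertion-sorted in place; the final pass (k = 1) is an ordinary
    straight insertion sort on an almost-sorted list.
    """
    for k in increments:
        for i in range(k, len(a)):      # gapped insertion sort
            key = a[i]
            j = i - k
            while j >= 0 and a[j] > key:
                a[j + k] = a[j]
                j -= k
            a[j + k] = key
    return a
```

With the list from the example,
`shell_sort([77, 62, 14, 9, 30, 21, 80, 25, 70, 55])` returns
`[9, 14, 21, 25, 30, 55, 62, 70, 77, 80]`.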

##### Selecting the Increment Size

First, let's recognize that no single increment sequence is best for
all situations. Knuth suggests, however, that you should not start with
an increment greater than one-third of the list size. Other computer
scientists have suggested that the increments be powers of two minus one
or a Fibonacci series.

Knuth tells us that the sort effort for the shell sort cannot be determined
mathematically. He estimates from his empirical studies that the average
sort effort is 15 n^1.25. Reducing Knuth's analysis to a Big-O notation, we
see that the shell sort is O(n^1.25).

#### Selection Sorts

Selection sorts are among the most intuitive of all sorts. Given a list of
data to be sorted, we simply select the smallest item and place it in a
sorted list. These steps are then repeated until all of the data have been
sorted. In this section we study two selection sorts, the straight selection
sort and the heap sort.

##### Straight Selection Sort

In the straight selection sort, the list at any moment is divided into two
sublists, sorted and unsorted, which are separated by an imaginary wall.
We select the smallest element from the unsorted sublist and exchange it
with the element at the beginning of the unsorted data. After each selection
and exchange, the wall between the two sublists moves one element,
increasing the number of sorted elements and decreasing the number of
unsorted ones. Each time we move one element from the unsorted sublist to
the sorted sublist, we say that we have completed one sort pass. If we have
a list of n elements, therefore, we need n-1 passes to completely rearrange
the data.

In each pass of the selection sort, the smallest element is selected from
the unsorted sublist and exchanged with the element at the beginning of the
unsorted list.

Note: Alternatively, you may select the largest element and exchange it
with the element at the end of the unsorted sublist; the sorted sublist
then grows from right to left.

The straight selection sort efficiency is O(n^2).
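The wall between the two sublists can be made explicit in a Python
sketch of the straight selection sort (illustrative):

```python
def selection_sort(a):
    """Straight selection sort: n-1 passes over a list of n elements."""
    for wall in range(len(a) - 1):            # wall between sorted / unsorted
        smallest = wall
        for j in range(wall + 1, len(a)):     # scan the unsorted sublist
            if a[j] < a[smallest]:
                smallest = j
        # exchange the smallest with the first unsorted element;
        # the wall then moves one element to the right
        a[wall], a[smallest] = a[smallest], a[wall]
    return a
```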

##### Heap Sort

A heap is a tree structure in which the root contains the largest (or
smallest) element in the tree.

The heap sort algorithm is an improved version of the selection sort in which
the largest element (the root) is selected and exchanged with the last element
in the unsorted list.

Heap sort begins by turning the array to be sorted into a heap. This is
done only once for each sort. We then exchange the root, which is the
largest element in the heap, with the last element in the unsorted list.
This exchange destroys the heap, so the remaining unsorted elements are
reheaped and the exchange is repeated. The reheap-and-exchange process
continues until the entire list is sorted.

Following the branches of a binary tree from the root to a leaf requires
log2 n loops. The sort effort, the outer loop times the inner loop, for
the heap sort is therefore n(log2 n).

When we include the processing to create the original heap, the Big-O
notation is the same. Creating the heap requires n(log2 n) loops through
the data. When factored into the sort effort, it becomes a coefficient,
which is then dropped to determine the final sort effort.

The heap sort efficiency is O(n log2 n).
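The build-once-then-reheap-and-exchange process can be sketched in
Python (illustrative; a 0-based max-heap, so the children of node i are
2i+1 and 2i+2):

```python
def reheap_down(a, root, last):
    """Restore the max-heap property for the subarray a[root..last]."""
    while 2 * root + 1 <= last:
        child = 2 * root + 1
        if child + 1 <= last and a[child + 1] > a[child]:
            child += 1                    # pick the larger child
        if a[root] >= a[child]:
            break                         # heap property holds
        a[root], a[child] = a[child], a[root]
        root = child

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # build the original heap (done once)
        reheap_down(a, i, n - 1)
    for last in range(n - 1, 0, -1):
        a[0], a[last] = a[last], a[0]     # exchange root with last unsorted
        reheap_down(a, 0, last - 1)       # reheap, then exchange again
    return a
```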

#### Exchange Sorts

The third category of sorts, exchange sorting, contains the most common
sort taught in computer science, the bubble sort, and the most efficient
general-purpose sort, the quick sort.

##### Bubble Sort

In the bubble sort, the list at any moment is divided into two sublists:
sorted and unsorted. The smallest (largest) element is bubbled from the
unsorted sublist and moved to the sorted sublist. After moving the smallest
(largest) to the sorted list, the wall moves one element to the right,
increasing the number of sorted elements and decreasing the number of
unsorted ones. Each time we move one element from the unsorted sublist
to the sorted sublist, we say that we have completed one sort pass. If
we have a list of n elements, therefore, we need n-1 passes to completely
rearrange the data.

The total number of comparisons in the bubble sort is n(n-1)/2, so the
bubble sort efficiency is O(n^2).

##### Bubble Sort with a Flag

This variation of the bubble sort stops as soon as a pass through the
list completes without any exchange.

It gives better performance when the original list is sorted or
partially sorted.

Its performance ranges from O(n) to O(n^2).
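Both variants fit in one Python sketch (illustrative); the flag is the
`exchanged` boolean that ends the sort early:

```python
def bubble_sort(a):
    """Bubble sort with a flag: bubble the smallest element of the
    unsorted sublist to the front each pass; stop after a pass that
    makes no exchange (O(n) best case on a sorted list)."""
    n = len(a)
    for wall in range(n - 1):
        exchanged = False
        for j in range(n - 1, wall, -1):      # bubble smallest toward the wall
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
                exchanged = True
        if not exchanged:                     # no exchange: list is sorted
            break
    return a
```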
##### Quick Sort

Quick sort is an exchange sort in which a pivot key is placed in its
correct position in the array while the other elements, widely dispersed
across the list, are rearranged around it. It is a divide-and-conquer
sorting method that runs in average time O(n log n).

The main theme behind Quicksort is as follows: You first choose some key in
the array A as a pivot key. This pivot key is used to separate the keys in A
into two partitions: (1) A left partition containing keys less than or equal
to the pivot key, and (2) a right partition containing keys greater than or
equal to the pivot key. Quicksort is then applied recursively to sort the
left and right partitions.

Strategy for Quicksort:

```pascal
Procedure Quicksort (var A: SortingArray; m, n: Integer);
Begin
  If (there is more than one key to sort in A[m..n]) then begin
    Partition A[m..n] into a LeftPartition and a RightPartition,
      using one of the keys in A[m..n] as a pivot key;
    Quicksort the LeftPartition;
    Quicksort the RightPartition
  end {if}
end {Quicksort};
```

Pascal code for Quicksort:

```pascal
Procedure Quicksort (var A: SortingArray; m, n: Integer);
Var
  i, j: Integer;
Begin
  If m < n then begin
    i := m;
    j := n;
    Partition(A, i, j);   { sets i and j around the pivot }
    Quicksort(A, m, j);   { sort the left partition }
    Quicksort(A, i, n)    { sort the right partition }
  end {if}
end {Quicksort};
```
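The Partition routine is not listed in the notes; the Python sketch
below assumes a Hoare-style scan with the first key as pivot (an
illustrative choice, not necessarily the one intended by the original),
returning the `i` and `j` that play the role of the Pascal version's
`var` parameters:

```python
def partition(a, m, n):
    """Partition a[m..n] around the pivot a[m].

    Returns (i, j) with j < i such that a[m..j] holds keys <= pivot
    and a[i..n] holds keys >= pivot.
    """
    pivot = a[m]
    i, j = m, n
    while i <= j:
        while a[i] < pivot:        # scan right for a key >= pivot
            i += 1
        while a[j] > pivot:        # scan left for a key <= pivot
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    return i, j

def quicksort(a, m=0, n=None):
    if n is None:
        n = len(a) - 1
    if m < n:                      # more than one key to sort
        i, j = partition(a, m, n)
        quicksort(a, m, j)         # sort the left partition
        quicksort(a, i, n)         # sort the right partition
    return a
```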

Methods of selecting the pivot key include taking the first key, a
random key, or the median of the first, middle, and last keys.

The worst case for Quicksort is O(n^2); it occurs when every pivot
produces a maximally unbalanced partition (for example, a first-key
pivot applied to an already sorted list).

The best case for Quicksort is O(n log2 n), when every pivot splits its
partition in half.

The average case for Quicksort is O(n log2 n).

#### Internal Merge Sort

Merge sort divides the list into two sublists, sorts each sublist, and
combines the sorted sublists by merging them together into a single
sorted list.

The number of levels of division is ceil(log2 N).
The minimum number of comparisons to merge two lists of N total
elements is N/2; the maximum is N-1.

The maximum number of comparisons to sort the list is therefore
ceil(log2 N) * (N-1), so merge sort runs in time O(N log N).

- Array implementation: extra space and copying.
- Linked-list implementation: no extra space, but extra time to divide
  the list.
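A minimal Python sketch of the array implementation (illustrative; note
the extra space used by the slices and the merged list):

```python
def merge_sort(a):
    """Recursive merge sort: divide the list in two, sort each half,
    and merge the sorted halves (at most n-1 comparisons per merge)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]    # append the leftover tail
```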

#### Radix Sort

A formal sorting algorithm was first devised for use with punched
cards. The idea is to consider the key one character at a time and
to divide the entries into as many sublists as there are
possibilities for the given character of the key.
If our keys, for example, are words or other alphabetic strings,
then we divide the list into 26 sublists at each stage. That is,
we set up a table of 26 lists and distribute the entries into
the lists according to one of the characters in the key.

Example:

```
Initial list   Sorted by     Sorted by     Sorted by
               letter 3      letter 2      letter 1

rat            mop           map           car
mop            map           rap           cat
cat            top           car           cot
map            rap           tar           map
car            car           rat           mop
top            tar           cat           rap
cot            rat           mop           rat
tar            cat           top           tar
rap            cot           cot           top
```

Radix sort makes exactly k linear passes through the list of n keys
when the keys have k digits (letters), so it is an O(kn) process; for
fixed-length keys this is O(n).
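The distribute-and-recollect passes can be sketched in Python
(illustrative; assumes fixed-length lowercase alphabetic keys):

```python
def radix_sort(words, k=3):
    """Radix sort fixed-length alphabetic keys: distribute the words
    into 26 sublists by one letter, last letter first, then recollect
    the sublists in order; k passes in total."""
    for pos in range(k - 1, -1, -1):            # letter k, then k-1, ..., 1
        table = [[] for _ in range(26)]         # one sublist per letter
        for w in words:
            table[ord(w[pos]) - ord('a')].append(w)
        words = [w for sublist in table for w in sublist]
    return words
```

On the example above,
`radix_sort(['rat', 'mop', 'cat', 'map', 'car', 'top', 'cot', 'tar', 'rap'])`
reproduces the final column: `['car', 'cat', 'cot', 'map', 'mop', 'rap',
'rat', 'tar', 'top']`.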

#### Proxmap Sort

In proxmap sorting, you compute a "proximity map," or proxmap for
short, which indicates, for each key K, the beginning of a reserved
subarray of the array A in which K will reside in final sorted order.
Let's proceed by example in order to help reveal the main ideas.

Example. The initial unsorted list:

```
i    =    1    2    3    4    5    6    7     8    9   10    11    12   13
A[i] = [ 6.7  5.9  8.4  1.2  7.3  3.7  11.5  1.1  4.8  0.4  10.5  6.1  1.8 ]
```

Step 1: Map the keys to an index as follows:

MapKey(K) = ceiling(K). For example, MapKey(3.7) = 4.

If we were to use i := MapKey(K) to send K into a location,
A[i], in array A where we kept a linked list of keys, sorted
in ascending order, we could scan through the original list, A,
and send its keys into a collection of sorted linked lists, as
shown:

```
i =   1     2     3     4     5     6     7     8     9    10    11    12    13

     0.4   1.1         3.7   4.8   5.9   6.1   7.3   8.4        10.5  11.5
           1.2                           6.7
           1.8
```

Step 2: Compute hit counts, H[i], for each position, i, in A
(H is initialized to all zeros):

```pascal
For i := 1 to 13 do begin
  j := MapKey(A[i]);
  H[j] := H[j] + 1
end
```

```
i    =    1    2    3    4    5    6    7     8    9   10    11    12   13
A[i] = [ 6.7  5.9  8.4  1.2  7.3  3.7  11.5  1.1  4.8  0.4  10.5  6.1  1.8 ]
H[i] =    1    3    0    1    1    1    2     1    1    0    1     1    0
```

Note: H[i] gives the number of keys in A that map to index i; compare
it with the linked-list picture above.

Step 3: Computing the proxmap.

From the hit counts, H[i], we compute a proxmap, P[i], where each entry
P[i] gives the location of the beginning of the future reserved subarray
of A that will contain keys, K, mapping to location, i, under the mapping
MapKey(K) = i.

```pascal
{ Convert hit counts to a proxmap }
RunningTotal := 1;
For i := 1 to 13 do begin
  If H[i] > 0 then begin
    P[i] := RunningTotal;
    RunningTotal := RunningTotal + H[i]
  end
end
```

```
i    =    1    2    3    4    5    6    7     8    9   10    11    12   13
A[i] = [ 6.7  5.9  8.4  1.2  7.3  3.7  11.5  1.1  4.8  0.4  10.5  6.1  1.8 ]
H[i] =    1    3    0    1    1    1    2     1    1    0    1     1    0
P[i] =    1    2    0    5    6    7    8     10   11   0    12    13   0
```

Step 4: Computing insertion locations, L[i], for each key, K = A[i], in
array A:

```pascal
For i := 1 to 13 do begin
  L[i] := P[MapKey(A[i])]
end
```

```
i    =    1    2    3    4    5    6    7     8    9   10    11    12   13
A[i] = [ 6.7  5.9  8.4  1.2  7.3  3.7  11.5  1.1  4.8  0.4  10.5  6.1  1.8 ]
P[i] =    1    2    0    5    6    7    8     10   11   0    12    13   0
L[i] =    8    7    11   2    10   5    13    2    6    1    12    8    2
```

The final phase of proxmap sort consists of moving each key, A[i],
in the original unsorted array A into the location L[i] at the
beginning of its reserved future subarray, and inserting it in
ascending order into the sequence of keys already occupying that
subarray. If we had two copies of A, say A1 and A2, where A1 was the
original unsorted array and A2 was an initially empty copy of A
designed to accumulate the keys of A in final sorted order as they
were being inserted, then we could map each key, A1[i], into its
insertion location, L[i], in A2, and insert it in ascending order into
the sequence of keys beginning at L[i] in A2.

```
i     =    1    2    3    4    5    6    7     8    9   10    11    12   13
A1[i] = [ 6.7  5.9  8.4  1.2  7.3  3.7  11.5  1.1  4.8  0.4  10.5  6.1  1.8 ]
L[i]  =    8    7    11   2    10   5    13    2    6    1    12    8    2
A2[i] = [  -   1.2   -    -   3.7   -   5.9   6.7   -   7.3  8.4    -   11.5 ]
```

After moving 7 keys into their reserved subarrays.

```
i     =    1    2    3    4    5    6    7     8    9   10    11    12    13
A1[i] = [ 6.7  5.9  8.4  1.2  7.3  3.7  11.5  1.1  4.8  0.4  10.5  6.1   1.8 ]
L[i]  =    8    7    11   2    10   5    13    2    6    1    12    8     2
A2[i] = [ 0.4  1.1  1.2   -   3.7  4.8  5.9   6.7   -   7.3  8.4   10.5  11.5 ]
```

After moving 11 keys into their reserved subarrays.

If the keys are numbers between 0 and 1, the MapKey function must be
changed so that it maps them onto the index range 1 to N.

Mapping alphanumeric fields: first reduce each key K to a fraction

r(K) = Base26Value(K) / (1 + Base26Value('Z..Z')).

```pascal
Function Base26Value (K: AirportCodeKey): Integer;
Var
  n1, n2, n3: Integer;
Begin
  n1 := (ord(K[1]) - ord('A')) * 26 * 26;
  n2 := (ord(K[2]) - ord('A')) * 26;
  n3 := (ord(K[3]) - ord('A'));
  Base26Value := n1 + n2 + n3
end;
```

Note: We assumed the airport code is three characters only.

##### Analysis of Proxmap Sort

In the worst case, proxmap sort can take time O(n^2), which occurs when
all the keys map to a single location. Its average-case time is O(n).
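The four steps of the example can be collected into one Python sketch
(illustrative; it assumes MapKey(K) = ceiling(K) clamped into 1..n, as
in the example above — other key ranges would need a different MapKey):

```python
import math

def proxmap_sort(a):
    """Proxmap sort sketch: (1) map keys to indices, (2) count hits H,
    (3) convert hit counts to a proxmap P of subarray start positions,
    (4) insert each key, in ascending order, into its reserved subarray."""
    n = len(a)
    if n == 0:
        return []

    def mapkey(k):
        return min(max(math.ceil(k), 1), n)   # clamp into the index range 1..n

    hits = [0] * (n + 1)                      # H[i], 1-based
    for k in a:
        hits[mapkey(k)] += 1

    prox = [0] * (n + 1)                      # P[i]: 0-based subarray start
    running = 0
    for i in range(1, n + 1):
        if hits[i] > 0:
            prox[i] = running
            running += hits[i]

    out = [None] * n
    for k in a:
        j = prox[mapkey(k)]
        while out[j] is not None and out[j] <= k:   # keep ascending order
            j += 1
        carry = k                              # shift the subarray tail right
        while carry is not None:
            out[j], carry = carry, out[j]
            j += 1
    return out
```

Applied to the example list, it reproduces the final contents of A2.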

### External Sorts (Merges)

All of the algorithms we have studied so far have been internal
sorts, that is, sorts that require the data to be held entirely
in primary memory during the sorting process. We now turn our
attention to external sorting, sorts that allow portions of the
data to be stored in secondary memory during the sorting process.

#### Merging Ordered Files

A merge is the process that, given two files ordered on a given key,
combines the files into one file ordered on the same key.

```
File 1: 1, 3, 5        File 2: 2, 4, 6, 8, 10    (input)

File 3: 1, 2, 3, 4, 5, 6, 8, 10                  (output)
```
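The merge itself is a single simultaneous scan of both inputs; a Python
sketch on in-memory lists standing in for the files (illustrative):

```python
def merge_files(f1, f2):
    """Merge two lists already ordered on the same key into one
    ordered list, copying whichever input has records left over."""
    out, i, j = [], 0, 0
    while i < len(f1) and j < len(f2):
        if f1[i] <= f2[j]:
            out.append(f1[i])
            i += 1
        else:
            out.append(f2[j])
            j += 1
    return out + f1[i:] + f2[j:]     # copy the unread tail
```

For the files above, `merge_files([1, 3, 5], [2, 4, 6, 8, 10])` returns
`[1, 2, 3, 4, 5, 6, 8, 10]`.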

#### Merging Unordered Files

In merge sorting, however, we usually have a different situation from
the one shown above: the input files are not completely sorted. The
data will run in sequence, then there will be a sequence break,
followed by another series of data in sequence.

A series of consecutively ordered data in a file is known as a merge
run.

Many different merge concepts have been developed over the years.
Here are three that are representative:

##### Natural Merge

In the natural merge, each phase merges a constant number of input
files into one output file. Between merge phases, a distribution
phase is required to redistribute the merge runs to the input
files for remerging.

Input (2,300 unsorted records), distributed into two merge files:

```
Merge file 1: three merge runs:       Merge file 2: two merge runs:
  1     -   500                         501   - 1,000
  1,001 - 1,500                         1,501 - 2,000
  2,001 - 2,300

Merge (files 1 & 2 -> file 3):

Merge file 3: three merge runs:
  1     - 1,000
  1,001 - 2,000
  2,001 - 2,300

Distribution:

Merge file 1: two merge runs:         Merge file 2: one merge run:
  1     - 1,000                         1,001 - 2,000
  2,001 - 2,300

Merge (files 1 & 2 -> file 3):

Merge file 3: two merge runs:
  1     - 2,000
  2,001 - 2,300

Distribution:

Merge file 1: one merge run:          Merge file 2: one merge run:
  1     - 2,000                         2,001 - 2,300

Merge (files 1 & 2 -> file 3):

Merge file 3: one merge run:
  1     - 2,300
```
##### Balanced Merge

A balanced merge uses a constant number of input merge files and
the same number of output merge files. The balanced merge eliminates
the distribution phase by using the same number of input and output
merge files.

Input file distributed into two merge files:

```
Merge file 1: three merge runs:       Merge file 2: two merge runs:
  1     -   500                         501   - 1,000
  1,001 - 1,500                         1,501 - 2,000
  2,001 - 2,300

Merge (files 1 & 2 into files 3 & 4):

Merge file 3: two merge runs:         Merge file 4: one merge run:
  1     - 1,000                         1,001 - 2,000
  2,001 - 2,300

Merge (files 3 & 4 into files 1 & 2):

Merge file 1: one merge run:          Merge file 2: one merge run:
  1     - 2,000                         2,001 - 2,300

Merge (files 1 & 2 into file 3):

Merge file 3: one merge run:
  1     - 2,300
```

File is sorted.
##### Polyphase Merge

In the polyphase merge, a constant number of input merge files are
merged into one output merge file, and each input merge file is
immediately reused as an output file when its input has been
completely merged.

```
Merge file 1: three merge runs:       Merge file 2: two merge runs:
  1     -   500  ....                   501   - 1,000  ....
  1,001 - 1,500  ....                   1,501 - 2,000  ....
  2,001 - 2,300

Merge file 3 (output): two merge runs:
  1     - 1,000
  1,001 - 2,000

The marked runs of files 1 and 2 are merged into file 3;
file 1 still has one run left.

First merge phase complete
--------------------------------------------------------------
Merge file 1: one merge run:          Merge file 3: two merge runs:
  2,001 - 2,300                         1     - 1,000
                                        1,001 - 2,000

Merge file 2 (output): one merge run:
  1 - 1,000 / 2,001 - 2,300

Second merge phase complete
--------------------------------------------------------------
Merge file 2: one merge run:          Merge file 3: one merge run left:
  1 - 1,000 / 2,001 - 2,300             1,001 - 2,000

Merge file 1 (output): one merge run:
  1 - 2,300

Third merge phase complete
--------------------------------------------------------------
```

### Comparison of Methods

Sorting methods are compared on three criteria:

- Use of storage space
- Use of computer time
- Programming effort

Other factors in selecting a sorting method:

- Contiguous versus linked implementation
- Record size
- Recursive versus non-recursive version
- Programming language used
- Average-case, best-case, and worst-case behavior
- One-time sorting versus frequent use
- List size

Last update: October 3, 1998.
