0% found this document useful (0 votes)
65 views5 pages

Inventing A New Sorting Algorithm: A Case Study Susan M. Merritt

Uploaded by

DrSuresh Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
65 views5 pages

Inventing A New Sorting Algorithm: A Case Study Susan M. Merritt

Uploaded by

DrSuresh Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

INVENTING A NEW SORTING ALGORITHM:

A CASE STUDY

Susan M. Merritt
Pace University, School of Computer Science and Information Systems
1 Martine Avenue, White Plains, NY 10606

Cecilia Y. Nauck
The Masters School
49 Clinton Avenue, Dobbs Ferry, NY 10522

ABSTRACT 2.0 THE LONGEST UPSEQUENCE PROBLEM (LUP)


Dijkstra’s algorithm for finding the length of the longest Given a sequences of n elements, an upsequencein s is a
upsequence within a given sequenceof numbers is used as the subsequenceof s whose elements appear in s in non decreasing
basis for a casestudy in the development of a new sorting order. A subsequenceof s is any subset of s in which the original
algorithm. The case study has pedagogic value for several order is retained (hence there are T possible subsequences).
reasons: the motivation for each step in the development of the The longest upsequenceproblem is:
algorithm is explained; interesting data structures are introduced;
the complexity of the algorithm is analyzed using mathematics fa- Give an algorithm which will return the length of the longest
miliar to first year computer science students; the appendices upsequenceof a sequences. (Note that there may be more than
provide detailed description of the implementation of the one longest upsequenceof the same length. For example, the
algorithm as well as several interesting examples of its use. The sequence[1.3,4,1.3.5] has the following upsequencesof length
proposed algorithm merits attention since it uses O(nlogn) com- 4: [1.3,4,51, [1,1,3,51, and [1.3,3.51.)
pares in the worst case(actually less than 2nlogn) and O(n) for
both ordered or nearly ordered and reverse ordered or nearly A solution with O(nlogn) worst caseperformance has been
reverse ordered sequences. given by Dijkstra [2] and modifications have been given by
Dewar, Merritt and Sharir [l].
1.O INTRODUCTION Dijkstra’s algorithm can be described as follows: Elements in
We provide a comprehensive and pedagogical description of the original sequenceare examined from left to right. As each
an algorithm development process. The algorithm is new. It uses new element is examined it is inserted into the m-sequence, a
a variety of building blocks from computer science including well sequenceconsisting of minimum rightmost elements of upse-
known techniques such as list merging, and lesser known quences of length 1.2,... New sequenceelements are inserted into
techniques such as the efficient solution of the longest upsequence the m-sequenceeither by being added to the right end (if x is
problem, It incorporates a variety of interesting data structures. It larger than the rightmost element of the partial m-sequence) or by
demonstrateshow basic concepts in computer science and “bumping” an element already in the m-sequence(where the
mathematics are used to develop and demonstrate an interesting element being bumped is the smallest element in the m-sequence
and respectable algorithm. which is larger than the new element to be inserted).
We describe the longest upsequenceproblem and the elegant Consider the following example, with input sequences of
solution given by Dijkstra [2]. We then propose an extension to length 8: s: 8, 1, 6,5,4, 7.2,3
Dijkstra’s algorithm to develop a sorting algorithm which uses step1 (insert 8) m:8
O(nlogn) compares in the worst case and O(n) comparesfor an step2 (insert 1) m:l Coump8)
interesting class of data. Substantial analysis of the algorithm is step3 (insert 6) m:l 6 (add 6)
demonstrated with respect to compares. The algorithm develop- step4 (insert 5) m:l5 (bump 6)
ment provides a good case study for a fist year computer science step5 (insert 4) m: 14 (bump 5)
course. step6 (insert 7) m: 14 7 (add 7)
Permission to copy without fee all or part of this material is granted provided step7 (insert 2) m: 1 2 7 (bump 4)
that the copies are not made or distributed for direct commercial advantage, stepS(insert 3) m:l 2 3 (bump 7)
the ACM copyright notice and the title of the publication and its date appear,
and notice is given that copying is by permission of the Association for
Computing Machinery. To copy otherwise, or to republish, requires a fee
The length of the longest upsequence is 3 (and m contains the
and/or specific permission. minimum rightmost elements of upsequencesof length 1.2,3,
0 1990 ACM 08979L346-9/9O/OOCZ/Ol81$1.50 respectively).

181
> 5 -> 6 -> 8 ->nil. This would then be merged with L3. which
3.0 THE COMPLEXITY OF THE LUP
Dijkstra uses a binary search technique to determine which contains 3 -> 7 ->nil to produce a single list of ordered elements.
element of the m-sequence is to be replaced. which makes the
worst case complexity of the algorithm O(nlogn); without the 5.0 THE SORTING PHASE
In discussing the merge operation required to transform the m-
binary search the worst case complexity is O(n*).
If x=s[k] is the next element of the input sequence and mu] is structure into a single sorted array it is interesting to consider the
the rightmost element of the m-sequence, the heart of Dijkstra’s various “shapes” the m-structure can have. In the case of reverse
algorithm is: ordered data it will consist of a single m[i], namely m[l] pointing
to a list of ordered elements. The work involved in forming this
if mu] c= s[k]
then n := n + 1; m[n] := s[k]; array has been of order n and no further compares are necessary to
else “establish c such that” create the final sorted sequence. Every other case will produce an
m-array with r>l and the question of the best way to merge these r
m[c-11 <= s[k] < m[c];
history queues is a central issue to be addressed.
m[c] := s[k]
Consider the simple case of merging two lists Ll and L2 of
end if;
ordered data. In general, given lists of length p and q, the largest
Binary search uses a divide and conquer approach to find the
number of compares requited to merge the two lists is we11known
correct placement of a given item within a given sorted list and is
described in many places, for example [3]. From a pedagogical (see [3], for example). In the worst case neither list would be
exhausted early. Since the maximum number of compares in this
perspective it is interesting to review the mathematics of this
case is always 1 less than the sum of the lengths of the two lists,
procedure. The solution to the equation given below provides the
there are p+q-1 compares in the worst case. Consider the merge
number of times a given set can be so divided:
of Ll and L2 which is begun by comparing 3 and 5,
(l/2)% = 1.
Ll L2
or 2-h = 1. Taking the log of each side using 2 as the base of the
logarithm and using logn to mean the log of n base 2, the
3 5 then5and8.
8 10 then8and10,
following result is obtained:
12 15 then 10 and 12,
log2-‘ + logn = log1
17 then 12 and 15.
or -x + logn = 0
andfinally15and17. Thenumberofcomparesis6=4+3-1.
x = logn.
Furthermore, if it is known that the fit element of Ll is always
f%ce there are n elements that need to be inserted, the order of
smaller than the fist element of L2 (as is the case in the m-
complexity for the entire process in the worst case is O(nlogn).
structure), the first compare can be avoided and thus the total
number of compares can be reduced in the worst case to p+q-2.
4.0 DEVELOPMENT OF THE SORTING ALGORITHM
There is a straightforward approach to merging the lists Ll,
In the solution to the LUP outlined above, bumped elements
L2, W, LA ,..., Lr. Ll and L2 are merged and then the resulting
are discarded. It should be noted that if bumped elements could
queue is merged with L3 and the resulting queue is merged with
be saved, some useful ordering information would be preserved.
LA, continuing in this manner until Lr is merged with the result of
In the proposed sorting algorithm the m-array produced by
all the previous merges. A bit of reflection reveals that Ll is used
Dijkstra’s algorithm together with the saved bumped elements
form something that we could call an m-structure. The building in all r-l merges. If the m-structure is such that most of the n
numbers are in Ll then it is clear that this merge operation uses
of the m-structure is the fist phase of the proposed sort. In the m-
O(n*) compares in the worst case. Since there are many good
structure each m[i] is the head of a list and initially set to point to
nil. If the original m[i] of Dijkstra’s m-array has not been sorting algorithms with O(nlogn) compares [3], a O(n2) technique
for the merge alone is not acceptable.
“bumped”, then it will continue to point to nil. Otherwise, it will
A better way to merge the lists is to merge Ll and L2, L3
be the head of a list of elements that have been “bumped” from
and LA. .... L(r-I) and Lr and then to do pair-wise merges of the
that position. Such a list shall be referred to as a history queue
labelled Li, with length li. Clearly, if Li is a history queue its merged lists. The complexity for this kind of merge is less
dependent on the “shape” of the m-stucture. The simplest case
corresponding length, li, must be at least 1 (no bumped elements)
and at most n (every new element bumps the previous one). For arises when r is a power of two. Consider the case Ll, L2, L3,
LA. I-5, L6, L7, L8. Recalling that Ii represents the length of list
example, the m-structure produced by applying Dijkstra’s
Li, it is clear that the maximum number of compares required to
algorithm to the sequence 8, 1,6.5,4, 7, 2,3 is:
merge Ll and L2 is 11+12-2. Similarly, the subsequent merges
Ll L2 L3
yield the following results for the maximum number of compares
Ill21 31
per merge (“ml=>11+12-2” means that the maximum number of
8 4 7
compares required to merge Ll and L2 is the sum of their
5
respective lengths, 11 and 12, minus 2):
6

For the case above, the length of the longest upsequence is 3.


ml=>ll+l2-2
The m-structure would be initialized as an array of 8 lists of
m2=>13+14-2
integers initialized to point to nil. After completion of the first
m3=>15+16-2
phase of the algorithm, three of the lists would not be empty. The
m4=>17+18-2
number of nonempty lists will be referred to as r. Clearly. r will
m5=>11+12+13+14-2
be at least 1 (the case for reverse order sequences) and at most n
m6=>15+16+17+18-2
(the case for ordered sequences). Here r=3. Note that the m-
m7=>11+12+13+14+15+16+17+18-2
structure not only contains a sorted first row which is the m-array
of Dijkstra’s algorithm but also a series of history queues which
Ll L2 L3 L4 LS L6 L7 L8
are themselves sorted lists.
Jmd Lm2J hJ im4i
The second or sorting phase of the proposed technique will
involve merging Ll. consisting of the list 1 -> 8 ->nil. with L2, I-mu l-m6 i
consisting of 2 -> 4 -> 5 -> 6 ->nil. to produce the list 1 -> 2 -> 4 - L-.----m71

182
The total number of compares required for the merge is 3n- m 1=>l 1+12-2 Ll L2 L3 LA L5
2(7). From the information above it is clear that the number of m2=>13+14-2 [ml-l lmzl
compares in this case is a function of n and not of the “shape” of m3=>11+12+13+14-2 Lm3 J
the m-structure. This information can be used to arrive at a m4=>11+12+13+14+15-2 /m4 1
formula for the number of compares required to merge r lists
when r is a power of two. the maximum total compares = 2n + (11+12+13+14-15)- 2(4)
Since the shape of the m-structure is immaterial in the case that
r is a power of two, suppose the n elements of the original array In this case the shape of the m-structure is relevant, but in any
are evenly divided among r queues where r is a power of two. case it is clear that the maximum number of compares is less than
Then each list contains n/r elements and the maximum number of 3n - 2(4).
compares required to merge any pair is n/r + n/r -2 or 2(n-r)/r. The result can be arrived at analytically by using result (A)
Since the first pass requires r/2 such merges the maximum total from above. If r is not a power of two the number of passes
number of compares for the first pass is (2(n-r)/r)(r/2) = n-r required to accomplish the r-l required merges is log p where p is
compares. defined as the smallest power of two bigger than r. Letting k =
For the second pass, each queue is now of length 2n/r and log p where p is chosen as the smallest power of 2 bigger than r
hence the maximum number of compares required to merge two (recall we are assuming r is not a power of 2), it follows that log p
such lists is 2n/r + 2n/r -2. Since there are r/2 lists after the first = [Ilog rl] + 1. It is also true that r/p is less than 1. Therefore, (A)
pass, r/4 merges are required so the maximum number of can be reduced to:
compares for pass two is(2n/r + 2n/r - 2)(r/4) = (2n - r)/2. c = nlogp - 2r + 2(r/p) < nlogp - 2r + 2( 1) = nlogp - 2(r - 1).
Similarly, for pass three each queue is of length 4n/r. Since r/8 This is already a satisfactory result since it produces the sorted
merges are required, the maximum total number of compares is array in O(nlogn) compares in any case. It does have one
(4n/r + 4n/r -2)(r/8) = (4n - r)/4. problem, however. In the case of sorted data the m-array
For the fourth pass the maximum number of compares is produced would have r equal to n and hence would require nlogn
(8n - r)/8. + n - 2n + 2 or nlogn -n + 2 compares to complete the merges.
For the kth pass the maximum number of compares is Although it is O(nlogn) in the worst case, it is a less than
(2k-‘n - r)/2k-‘. satisfying result for inorder data. To alleviate this problem and
Adding all of the compares for each pass together the follow- improve the performance of the process in general, an intemredi-
ing is obtained: ate step is introduced between the formation of the original m-
If c = the total number of compares up to the kth pass, then array and the final merge. This step can best be described as
c= (n-r)/1 + (2n-r)/2 + (4n-r)/4 + (8n-r)/8+...+(2k-‘n-r)/2k-’ follows:
A) Suppose Dijkstra’s algorithm is applied to the sequence
c= n-r/l + n-r/2 + n-r/4 + n-r/8 + .. . + n-rnk-i 0135471021520

c= (n+n+...+n) - (r/l + r/2 + r/4 + r/8 + ... + #-I) B) The m-structure produced is
IO 1 1(214\ 71101151201
c= kn - r(1 + l/2 + l/4 + l/8 + ... + 1/2k-‘) 35
Since r=8 is a power of two the maximum number of compares
c= kn - r(1 - (l/2) k, (sum of a geometric series) required to produce a single list is nlogr - 2(r -1) = lOlog 8 - 2(8-
W) 1) =10(3)-2(7)=16.
C) Produce a new m-structure by looking at the pointer of each
c= kn - 2r + 2r2-k (A) m[i]. If it points to nil let m[i] point to m[i+l]. This procedure
will yield the following structure which shall be referred to as an
Under the assumption that r is a power of two let k = log r base 2, m*-structure:
denoted as logr. Using this value in formula (A) obtained above it
follows that plq-?y
c = nlogr - 2r + 2r2-t”ar 1 5 10
2 15
c = nlogr - 2r + 2r-r* 3 20

c=nlogr-2r +2 D) The corresponding T value shall be referred to as T*. Since


r* = 3 we can use the formula for the non-power of two case
c=nlogr-2(r- 1) which says the number of compares < nlog p -2(r-1). In this case
p=4, r=3 and n=10 giving the result that the number of compares
Note that in the example above the number of compares < lO(log 4) - 2(3-l) = lO(2) - 2(2) = 16. In fact we can compute
required to merge eight queues was 3n - 2(7). the number of compares to be
To better understand the case when r is not a power of two, an ml=>4+2-2=4
example using 5 queues can be investigated. Consider the m2=>6+4-2=8
following two possible pair-wise mergings of Ll. L2, L3. L4, L5: total compares = 12 c 16.
ml=>11+12-2 Ll L2 L3 L4 L5 The most important reason for this improvement is in the case
m2=>13+14-2 I-mlJ 1~2J of sorted data where an m-array such as 1 2 3 4 5 would yield the
m3=>13+14+15-2
m4=>11+12+13+14+15-2 L
m4J
m3 1 m* -structure
n
1

the maximum total compares = 2n I- (13+14) - 2(4) 3”


4
On the other hand, if the 5 lists are merged in a different 5
manner, the following is obtained: which gives the final result requiring no further merging.
This discussion would seem to imply that the worst case

183
scenario would be an m-structure which is a perfect n/2 by 2 APPENDIX: EMPIRICAL DATA
rectangle since it would have the largest possible r with no hope
of improvement by arrangement into a m*-structure. That Sample output from several runs of the program are given below.
intuitive result can be further explored as follows using the Special test caseshave been used and the trace has been set so that
ordinary methods of the calculus. If we let r be a real number the different steps of the algorithm are shown.
then we can study the behavior of the function f(r) = nlogr - 2(r-1)
realizing that whenever r is a positive integer which is a power of Example: n = 8
two the value of f(r) accurately gives the maximum number of 01354792
comparesrequired for the merge operation. comparesneeded to form history queues = 13
For a given n, the maximum number of compares is a function heads point to 0 1 2 4 7 9
of r given by 1) 0
f(r) = nlogr - 2(r - 1) 2) 1
f(r) = nlogr - 2r + 2 3) 2 3
If r = 1 the number of compares required for the merge 4) 4 5
operation is 0. If r = n the maximum number of comparesis 5) 7
nlogn -2n + 2. To investigate critical points of this function 6) 9
consider taking derivatives. f’(r) = n/r - 2. Setting this equal to 0 after transfotmation:
we obtain r = n/2. f”(r) = -n/3. Since this is always negative the 1) 0123
critical point at r=n/2 is a relative maximum. 2) 4 5
3) 7 9
6.0 CONCLUSION total number of compares= 21
The sorting procedure described above consists of:
1) the creation of an m-structure, an extension of Dijkstra’s
algorithm for finding the LUP; Example: n = 8
2) the transformation of the m-structure into an m*-structure as 12345678
described above: comparesneeded to form history queues = 7
3) the pair-wise merge of the queues of the m*-structure. headspointto 12345678
It is a relatively efficient method of sorting all kinds of data. It 1) 1
has a lovely symmetry in that it treats ordered and reverse ordered 2) 2
sequences(and nearly ordered and reverse ordered sequences) 3) 3
equally well, performing the sort in O(n) compares. Other cases 4) 4
have O(nlogn) compares, with the maximum number of compares 5) 5
less than 2nlog n. We have described the motivation for various 6) 6
parts of this algorithm in some detail, as well as the results of 7) 7
various design decisions. The inventive process uses a creative 8) 8
data structure, motivates a binary search, and compares different after transformation:
merge techniques, Complexity is discussed on an intuitive level 1) 12345678
and is supported with fairly elementary mathematics. The result total number of compares= 7
is a case study which can be used either to demonstratethe
development process of an algorithm or to enrich/reinforce the
basic concepts of a first year course in computer science. Example: n = 8
87654321
comparesneeded to form history queues = 14
heads point to 1
REFERENCES 1) 123456789
after transfotmation:
1) 123456789
[l] Dewar, R.B.K.. Merritt, S. M. and Sharir. Micha. Some total number of compares= 14
modified algorithms for Dijkstra’s longest upsequenceproblem.
Acta Informurica,18. 1-15 (1982). For n=8,64,128.256, and 5 12, four runs of the LUPSORT
program were performed using random numbers generatedby a
[2] Dijkstra, E.W. Some beautiful arguments using mathematical RandomGenerator program. A summary of the results obtained is
induction. Actu Informutica, 13, l-8 (1980). given below:

[3] Knuth, D.E. The Art of Computer Programming, Vol3: n=8 nlogn=24
Sorting and Searching. Addison Wesley, Reading, Mass. 1973 average total comparesneeded to form history queues = 13
average total comparesneeded for sort = 22
[4] L’Ecuyer. P. Efficient and portable combined random number
generators. Communications of the ACM. Vo131. Number 6 n=64 nlogn=384
(1988). averagetotal comparesneeded to form history queues = 258
average total comparesneeded for sort = 509
The algorithm described above has been coded in Pascal. Copies
of the code may be obtained from Cecilia Nauck. n=128 nlogn=896
average total comparesneeded to form history queues = 615
average total comparesneeded for sort = 1276

184
n=256 nlogn=2048
average total compares needed to form history queues = 1399
average total compares needed for sort = 2833

n=5 12 nlogn=4608
average total comapres needed to form history queues = 3 138
average total comparesneeded for sort = 6601

In the last four of the five casesthe total number of compares


required for the sort is between nlong and 2nlogn. In the first case
the total is less than nlogn.

You might also like