## Tuesday Dec 28, 2010

### Finding Kth Minimum (partial ordering) – Using Tournament Algorithm

Ok, so far in my previous blog entries about finding 2nd minimum (and for repeating elements) I wrote about efficient algorithm for finding second minimum element of an array. This optimized algorithm takes O(n + log(n)) time (worst case) to find the second minimum element.
So, the next obvious question is - how can we find Kth minimum element efficiently in an unsorted array where k ranges from 1 – n. How about partial sorting - efficiently returning first K minimum values from unsorted array. As you might have guessed it, we can achieve this by extending the logic used before and tweaking our implementation little bit. We can find Kth min (or return partially sorted list of K min elements) in O(n + klog(n)) time.

### Design Details

For simplicity we will assume that the input unsorted array consists of non-repeating, non-negative integer numbers only. Solution can be extended for arrays with repeating numbers using similar considerations as outlined in my blog for finding second min for repeating values.

As noted in the previous blog, using tournament method we can build output tree by comparing the adjacent elements and moving the lower of the two to next level. Repeating this process yields a tree with log(n) depth and the root element of this tree represent the minimum value. Also, the second minimum can then easily be obtained by finding the minimum of all the values that the root was compared against at each level.  Since the depth of tree is log(n), we have to do at most O(log(n)) comparisons. For example, Figure 1 below uses tournament method for finding the root element. For this particular case, the minimum value is 1 and second minimum 2.

Let’s say we want to find the 3rd minimum value. Extending the logic used to find second minimum, the third minimum value must have met the root (minimum) or the second minimum before ending its progression (just like winning a tournament round) to next level of the tree (as smaller of the two compared values move to next level). Thus 3rd minimum can be obtained by finding the least of all the values compared against the minimum and the second minimum, i.e. backtracking the root (minimum) and the second minimum up to the top of tree (original array) and noting all values each (root and second minimum) was compared against.

Take the sample in Figure 1. Here is the list of values obtained by back-tracking the root (call it Set A).

Set A (for root) - [3, 2, 10, 8]

From this list we obtain the second minimum as 2. Back-tracking from 2, we obtain following list (Set B):

Set B (for Second minimum) [5, 14]

Thus the third minimum is the least value in union of set A and B, which is 3 (ignoring value 2, which was already established as second minimum). Note that the 3rd minimum must be in either Set A or Set B. In above case it was found from Set A.

How about if we have to find the 4th minimum? Well, we have to back track the 3rd least value and collect adjacent values a (one level up). Then add this set to the values for root and second minimum. Then this set will contain the 2nd, 3rd and 4th minimum. Continuing our sample, let’s say Set C consists of such elements obtained by backtracking 3rd min – [4, 11, 16]. Thus the minimum value in union of Set A, B, C is 4 (ignoring already established 2 and 3 as second min and third min resp.), is fourth least value. Note that for calculating kth minimum we don’t backtrack and find the adjacency list for minimum through k -1. We can use already calculated results from any previous such computation. This is the reason why this technique falls under Dynamic Programming – we have optimal substructure and avoid repeated computations.

Now it is time to formalize. To find Kth minimum, back track root, second min, third min up to k-1 min and collect the adjacent values for each - moving from the root (of sub-tree) all the way to the top. The resulting collection will contain the 2nd, 3rd … Kth minimum.

Implementation Details

To obtain Kth minimum, we need collection of all the comparisons from minimum (root) to K-1. Thus we need to make following changes to our previous code for finding second minimum.

1. Return array of values (as indicated in rectangle boxes in Figure 2) while back-tracking root (could also be sub-tree root). Also, in addition to the values, we need the level and also index information for of each of these values. The reason why we need the level and index is because once we determine that a particular value from this list is minimum, we need to back-track the sub-tree from that value (hence we need to know the location of the element at that level) on order to obtain new collection to determine next lower value. Thus we will return a two-dimensional array with first index denoting the value and the other one the index. We can infer the level from the index of the pair (in 2-d array).
For example, for the case above, for the first pass (to find the minimum value), the following will be returned (let’s call it adjacency list):

Thus from above array we can conclude that the second minimum value is 2 (least of first value of all pairs), and this value appears two levels above the root (level 5), at 0th index (at this level).  We will use this info to identify the sub-tree for out next round of back-tracking.

2. Change the api to take back-track a portion of tree. In order to achieve that we need to pass level and index to define the root of sub-tree to back track from.

3. The 2-dimensional list obtained from each run will be added to a 3-dimensional list – full adjacency list for min to K-1 min elements. 3-D list is used rather than merging a bigger 2-D list to preserve and identify the results for a particular run. The reason being once we identify the minimum element for all k-1 runs, the level of that particular element is obtained from particular list where the minimum was found (alternatively our backtracking api could have returned a 3-d array with level info as one more dimension, either way we need this info).

4. Api that take full adjacency list and min value and return the next min element info. The api should identify for which run it found the min and the index of element within that run. Refer Figure 4 below. When this list is passed to this api with value of second min (2), it returns (0,3), which is interpreted as the next min (third min) was found in first array (obtained from first backtrack) and 4th element within that array. Once we have this info, we can look into first array and locate the third min value as 3, at index 1 at level 3 (remember original array is level 1 – Figure 2). Note that for m elements in full adjacency list we are making m comparisons which seem not efficient at first glance, but since the size of the adjacency list is small O(log(n)), hence it is not significant.

5. To find Kth min, we need to find min through k. Hence this algorithm can return partially sorted array up-to k elements.

### Code Can be downloaded from here - KthMinimum.java (remove the .txt extension)

```/\*\*
\*
\* All source code and material on this Blog site is the copyright of Malkit S.
\* Bhasin, 2010 and is protected under copyright laws of the United States. This
\* source code may not be hosted on any other site without my express, prior,
\* written permission. Application to host any of the material elsewhere can be
\* made by contacting me at mbhasin at gmail dot com
\*
\* I have made every effort and taken great care in making sure that the source
\* code and other content included on my web site is technically accurate, but I
\* disclaim any and all responsibility for any loss, damage or destruction of
\* data or any other property which may arise from relying on it. I will in no
\* case be liable for any monetary damages arising from such loss, damage or
\* destruction.
\*
\* I further grant you ("Licensee") a non-exclusive, royalty free, license to
\* use, modify and redistribute this software in source and binary code form,
\* provided that i) this copyright notice and license appear on all copies of
\* the software;
\*
\* As with any code, ensure to test this code in a development environment
\* before attempting to run it in production.
\*
\* @author Malkit S. Bhasin
\*
\*/

public class KThMinimum {

/\*\*
\* @param inputArray
\*            unordered array of non-negative integers
\* @param k
\*            order of minimum value desired
\* @return kth minimum value
\*/
public static int getKthMinimum(int[] inputArray, int k) {
return findKthMinimum(inputArray, k)[k - 1];
}

/\*\*
\* @param inputArray
\*            unordered array of non-negative integers
\* @param k
\*            ordered number of minimum values
\* @return k ordered minimum values
\*/
public static int[] getMinimumKSortedElements(int[] inputArray, int k) {
return findKthMinimum(inputArray, k);
}

/\*\*
\* First output tree will be obtained using tournament method. For k
\* minimum, the output tree will be backtracked k-1 times for each sub tree
\* identified by the minimum value in the aggregate adjacency list obtained
\* from each run. The minimum value after each run will be recorded and
\* successive runs will produce next minimum value.
\*
\* @param inputArray
\* @param k
\* @return ordered array of k minimum elements
\*/
private static int[] findKthMinimum(int[] inputArray, int k) {
int[] partiallySorted = new int[k];
int[][] outputTree = getOutputTree(inputArray);
int root = getRootElement(outputTree);
partiallySorted[0] = root;
int rootIndex = 0;
int level = outputTree.length;
int[][][] fullAdjacencyList = new int[k - 1][][];
int[] kthMinIdx = null;
for (int i = 1; i < k; i++) {
level, rootIndex);
int row = kthMinIdx[0];
int column = kthMinIdx[1];
partiallySorted[i] = root;
level = column + 1;
}

return partiallySorted;
}

/\*\*
\* Takes an input array and generated a two-dimensional array whose rows are
\* generated by comparing adjacent elements and selecting minimum of two
\* elements. Thus the output is inverse triangle (root at bottom)
\*
\* @param values
\* @return
\*/
public static int[][] getOutputTree(int[] values) {
Integer size = new Integer(values.length);
double treeDepth = Math.log(size.doubleValue()) / Math.log(2);
// int intTreeDepth = getIntValue(Math.ceil(treeDepth)) + 1;
int intTreeDepth = (int) (Math.ceil(treeDepth)) + 1;
int[][] outputTree = new int[intTreeDepth][];

// first row is the input
outputTree[0] = values;
printRow(outputTree[0]);

int[] currentRow = values;
int[] nextRow = null;
for (int i = 1; i < intTreeDepth; i++) {
nextRow = getNextRow(currentRow);
outputTree[i] = nextRow;
currentRow = nextRow;
printRow(outputTree[i]);
}
return outputTree;
}

/\*\*
\* Compares adjacent elements (starting from index 0), and construct a new
\* array with elements that are smaller of the adjacent elements.
\*
\* For even sized input, the resulting array is half the size, for odd size
\* array, it is half + 1.
\*
\* @param values
\* @return
\*/
private static int[] getNextRow(int[] values) {
int rowSize = getNextRowSize(values);
int[] row = new int[rowSize];
int i = 0;
for (int j = 0; j < values.length; j++) {
if (j == (values.length - 1)) {
// this is the case where there are odd number of elements
// in the array. Hence the last loop will have only one element.
row[i++] = values[j];
} else {
row[i++] = getMin(values[j], values[++j]);
}
}
return row;
}

/\*\*
\* From the passed full adjacency list and min value scans the list and
\* returns the information about next minimum value. It returns int array
\* with two values:
\* first value: index of the back-track (the min value was found in the
\* Adjacency list for min value, second min etc.)
\* second value: index within the identified run.
\*
\*            Adjacency list obtained after k-1 backtracks
\* @param kth
\*            Order of minimum value desired
\* @param kMinusOneMin
\*            value of k-1 min element
\* @return
\*/
private static int[] getKthMinimum(int[][][] fullAdjacencyList, int kth,
int kMinusOneMin) {
int kThMin = Integer.MAX_VALUE;
int[] minIndex = new int[2];
int j = 0, k = 0;
int temp = -1;

for (int i = 0; i < kth; i++) {
for (j = 0; j < fullAdjacencyList.length; j++) {
if (row != null) {
for (k = 0; k < fullAdjacencyList[j].length; k++) {
if (temp <= kMinusOneMin) {
continue;
}
if ((temp > kMinusOneMin) && (temp < kThMin)) {
kThMin = temp;
minIndex[0] = j;
minIndex[1] = k;
}
}
}
}
}
return minIndex;
}

/\*\*
\* Back-tracks a sub-tree (specified by the level and index) parameter and
\* returns array of elements (during back-track path) along with their index
\* information. The order elements of output array indicate the level at
\* which these elements were found (with elements closest to the root at the
\* end of the list)
\*
\* Starting from root element (which is minimum element), find the lower of
\* two adjacent element one row above. One of the two element must be root
\* element. If the root element is left adjacent, the root index (for one
\* row above) is two times the root index of any row. For right-adjacent, it
\* is two times plus one. Select the other element (of two adjacent
\* elements) as second minimum.
\*
\* Then move to one row further up and find elements adjacent to lowest
\* element, again, one of the element must be root element (again, depending
\* upon the fact that it is left or right adjacent, you can derive the root
\* index for this row). Compare the other element with the second least
\* selected in previous step, select the lower of the two and update the
\* second lowest with this value.
\*
\* Continue this till you exhaust all the rows of the tree.
\*
\* @param tree
\*            output tree
\* @param rootElement
\*            root element (could be of sub-tree or outputtree)
\* @param level
\*            the level to find the root element. For the output tree the
\*            level is depth of the tree.
\* @param rootIndex
\*            index for the root element. For output tree it is 0
\* @return
\*/
public static int[][] getAdjacencyList(int[][] tree, int rootElement,
int level, int rootIndex) {
int[][] adjacencyList = new int[level - 1][2];
int[] rowAbove = null;

// we have to scan in reverse order
for (int i = level - 1; i > 0; i--) {
// one row above
rowAbove = tree[i - 1];

// the root element could be the last element carried from row above
// because of odd number of elements in array, you need to do
// following
// check. if you don't, this case will blow {8, 4, 5, 6, 1, 2}
if (rowAbove.length >= ((adjacentleftIndex + 1) + 1)) {
} else {
}

// if there is no right adjacent value, then adjacent left must be
// root continue the loop.
// just checking for error condition
throw new RuntimeException(
"This is error condition. Since there "
+ " is only one adjacent element (last element), "
+ " it must be root element");
} else {
rootIndex = rootIndex \* 2;
continue;
}
}

// one of the adjacent number must be root (min value).
// Get the other number and compared with second min so far
rootIndex = rootIndex \* 2;
adjacencyList[i - 1][1] = rootIndex + 1;
} else if (adjacentleftElement != rootElement
rootIndex = rootIndex \* 2 + 1;
adjacencyList[i - 1][1] = rootIndex - 1;
} else if (adjacentleftElement == rootElement
// This is case where the root element is repeating, we are not
// handling this case.
throw new RuntimeException(
"Duplicate Elements. This code assumes no repeating elements in the input array");
} else {
throw new RuntimeException(
"This is error condition. One of the adjacent "
+ "elements must be root element");
}
}

}

/\*\*
\* Returns minimum of two passed in values.
\*
\* @param num1
\* @param num2
\* @return
\*/
private static int getMin(int num1, int num2) {
return Math.min(num1, num2);
}

/\*\*
\* following uses Math.ceil(double) to round to upper integer value..since
\* this function takes double value, diving an int by double results in
\* double.
\*
\* Another way of achieving this is for number x divided by n would be -
\* (x+n-1)/n
\*
\* @param values
\* @return
\*/
private static int getNextRowSize(int[] values) {
return (int) Math.ceil(values.length / 2.0);
}

/\*\*
\* Returns the root element of the two-dimensional array.
\*
\* @param tree
\* @return
\*/
public static int getRootElement(int[][] tree) {
int depth = tree.length;
return tree[depth - 1][0];
}

private static void printRow(int[] values) {
for (int i : values) {
// System.out.print(i + " ");
}
// System.out.println(" ");
}

public static void main(String args[]) {
int[] input = { 2, 14, 5, 13, 1, 8, 17, 10, 6, 12, 9, 4, 11, 15, 3, 16 };
System.out.println("Fifth Minimum: " + getKthMinimum(input, 5));

int minimumSortedElementSize = 10;
int[] tenMinimum = getMinimumKSortedElements(input,
minimumSortedElementSize);
System.out.print("Minimum " + minimumSortedElementSize + " Sorted: ");
for (int i = 0; i < minimumSortedElementSize; i++) {
System.out.print(tenMinimum[i] + " ");
}
}
}

```

### Running Time

Avoiding pure mathematics and using approximations, the worst case running time of this algorithm can be calculated as follows:

For finding second min –>  O(n + log(n))
For Kth min (or partial k sort) –>  O(n + klog(n)), ignoring constants (actually, the algorithm should perform much better than this as not all the k elements can be at log(n) depth).

### Final Thoughts

• The above algorithm does provide optimal mechanism to find Kth minimum. This can also be used for partially sorting unsorted array. But, note that this does require additional memory (or the order of original array size). Also, the calculations does not account for processing required for creating tree or the copy operations. What if we are limited by memory and the whole cannot be fit into memory. Obviously this solution will not work.
• Is the implementation too complex? In addition to Tournament method there are other mechanisms to find Kth minimum efficiently. This wiki page enumerates some of those methods - Selection Algorithm. I will try to solve using some of the methods listed on wiki resource. As always, feedback welcome.

## Thursday Dec 16, 2010

### Finding Second Minimum with REPEATING elements in array

I posed few questions at the end of my earlier blog entry about finding 2nd minimum element. Let’s tackle the first question – how to efficiently find second minimum when we have repeating elements in the array.

Let’s say we have following scenario. Notice that the minimum element, 1 appears twice in the list.

As before, we will use tournament method to first find the minimum value. Also note that while progressing through successive levels, the root elements, if repeating, will be compared against each other at some point.

Using tournament method we obtain the minimum as below.

#### Design Details

We can use similar logic as to find the second minimum, but we need to make slight changes to our logic. Our earlier code allowed only one of the adjacent elements of the row above to be root but now it is a valid condition and the of course the result of comparison will be a root element for such cases.

As noted above, if the root element is repeating, it must meet other root element(s) in the back path. So at the point where the two root elements meet, the tree would be divided into two sub-trees and we need to backtrack along the root element in each sub-tree till we reach the top level and select the minimum for each sub-tree. As before, since the second minimum must meet the root element at some point; by comparing the minimum for each path, we can obtain second minimum for repeating root.

#### Implementation Details

We can use similar logic as used for finding second minimum, but we will have to make few changes:

1. The api getSecondMinimum will be extended to take additional parameters.  In addition to the tree, it will also take the level to backtrack from and the root index. Same api can then be recursively used for back traversal of the tree from any point – root or from any level above root.

2. Whenever we hit a level where both the adjacent elements of row above are root elements, obtain the minimum from each sub-tree for both the root elements (left adjacent and right adjacent). Comparing the result of two would give us the second minimum.

3. To back track from root, just pass the total depth of the tree for the level argument and root index as value of 0.

#### Code

Here is updated code for the method getSecondMinimum (no change in rest of methods in previous post). Click here to download complete class - SecondMinRepeatingElements.java (remove the .txt extension).

```	/\*\*
\* The logic for finding second minimum is as follows:
\*
\* Starting from root element (which is minimum element), find the lower of
\* two adjacent element one row above. One of the two element must be root
\* element. If the root element is left adjacent, the root index (for one
\* row above) is two times the root index of any row. For right-adjacent, it
\* is two times plus one. Select the other element (of two adjacent
\* elements) as second minimum.
\*
\* Then move to one row further up and find elements adjacent to lowest
\* element, again, one of the element must be root element (again, depending
\* upon the fact that it is left or right adjacent, you can derive the root
\* index for this row). Compare the other element with the second least
\* selected in previous step, select the lower of the two and update the
\* second lowest with this value.
\*
\* Continue this till you exhaust all the rows of the tree.
\*
\* @param tree
\* @param rootElement
\* @param level
\*            The depth from where to start back-tracking
\* @param rootIndex
\*            Index of the root element at that depth level
\* @return
\*/
public static int getSecondMinimum(int[][] tree, int rootElement,
int level, int rootIndex) {
int secondLeast = Integer.MAX_VALUE;
int[] rowAbove = null;
// we have to scan in reverse order
for (int i = level - 1; i > 0; i--) {
// one row above
rowAbove = tree[i - 1];

// the root element could be the last element carried from row above
// because of odd number of elements in array, you need to do
// following
// check. if you don't, this case will blow {8, 4, 5, 6, 1, 2}
if (rowAbove.length >= ((adjacentleftIndex + 1) + 1)) {
} else {
}

// if there is no right adjacent value, then adjacent left must be
// root continue the loop.
// just checking for error condition
throw new RuntimeException(
"This is error condition. Since there "
+ " is only one adjacent element (last element), "
+ " it must be root element");
} else {
rootIndex = rootIndex \* 2;
continue;
}
}

// one of the adjacent number must be root (min value).
// Get the other number and compared with second min so far
rootIndex = rootIndex \* 2;
} else if (adjacentleftElement != rootElement
rootIndex = rootIndex \* 2 + 1;
} else if (adjacentleftElement == rootElement
// This is case where the root element is repeating
// The tree is now divided into two sub-trees and we need to
// back-track both the sub-trees and select the minimum for
// each. Then lower of the two is second minimum
int minLeftSubTree = getSecondMinimum(tree, rootElement, i,
int minRightSubTree = getSecondMinimum(tree, rootElement, i,
return getMin(minLeftSubTree, minRightSubTree);

} else {
throw new RuntimeException(
"This is error condition. One of the adjacent "
+ "elements must be root element");
}
}

return secondLeast;
}
```

#### Running Time

For single root, the depth of tree will be log(n), for second repeating element (for worst case) will be log(n) – 1 and so on. For m repeating root elements, there will be m backtrackings, one for each repeating root. Thus for m repeating roots, the running time:

n + log(n) + (log(n) -1) +  … + (log(n) – m)

Or,  O(n + mlog(n)), ignoring constants.

Also, note that the for repeating elements other than root element, the running time remains same as previously calculated (for non-repeating elements - O(n + log(n)); can also be obtained by substituting m=1 above) and that the previous code will work fine without any changes.