Question 3

  1. To sort the scores of the participants for each university, we can use mergesort for our comparative sorting algorithm. Our initial array will just be filled with the (unsorted) scores for every student of every university, and since the array is filled with numerical values, a simple mergesort (with no modifications) is needed to sort the scores. Disregarding space efficienecy, mergesort has a worst case of \(nlogn\).

    For our non-comparitive sort, we can use counting sort because we do know that there is a maximum and minimum score, so the possible values for the counting sort are bounded. Also all the values are integers so there will be no issues using this sort which will take n time.

  2. To sort the scores of participants given k files, where each file includes the sorted scores from specific universities, we will make use of a max heap.

    First we will take the first element of each array (university), and insert it into a max heap. This takes \(klogk\) time. Since these arrays are all sorted we will be inserting the largest elements, then we will pop out the largest element. Popping the largest value takes \(logk\) time. This value is the highest score in all the universities.

    Then, we examine the popped element, and if the element is from array \(i\), then the next element we add to the heap will be from array \(i\). This takes \(logk\) time.

    If we continue this process the max heap will give us the correct order of all the scores.

    We will do all of this n times, so in total it will take \(O(2nlogk)\) time. The total time to sort is \(O(klogk + 2nlogk) = O(nlogk)\).

  3. Given k sorted files we need to find the rth largest score and to do this without sorting the scores of the participants we will be using binary search on all the sorted files. First we will start by looking at the r/k position of each of the k arrays. If r/k is not an integer we will round up for some and round down for others to make sure if you add all the indicies of the arrays it will equal k. It is important to note that during any time during this algorithm the summation of the indices will always be k.

    Next we will find the smallest value of each one of these arrays which will only take k time.

    Then for the array that had the smallest value at the index r/k we know that all the values under that one cannot be the rth largest value. So we ignore the bottom half of the array. Then for all the other arrays we know that the values above the r/k index will not have the rth largest value so we can ignore the top half of all of those arrays. This just shows us that at each iteration we are able to cut our problem in half.

    Now we can look at the our new smaller arrays and we continue this same process. We choose indices such that the summation of them are equal to r and then we find the smallest of these values. At each step we are on average spliting up our problem in half. So we continue this process until we are only left with one element in each array. Since the indices of these positions sum up to r the smallest value is the rth largest number.

    At each step we have to do k comparisons to find the smallest number and we are basically doing binary search to each array the running time is \(k*(time of binary searches)\)

    Runtime= \(\theta(k\sum_{i=1}^{k}logm_i)\)

    We are not sure if we can do this in \(O(\log k (\sum_{i=1}^{k}\log m_{i}+k))\)