CS 344 HW#5
A. First, we will put all of the \(S(u_i,u_j)\) values into an array that keeps track of the two people and their similarity level. If there is \(n\) people, the number of connections will be on the order of \(n^2\).
Now, we can use counting sort to sort all of these values from the largest to smallest. To use counting sort, we have to make sure these two assumptions are satisfied first:
1. The values must be an integer (we know this because of how the problem is set up).
2. We must know the range of the numbers.
We can find the range of these numbers in linear time by looking through every single value, and keeping track of the max number (at the time). So, in linear time, we can find the max value. Then, we can use counting sort which also costs linear time, so in total, the time for the sort is \(n^2\) (to find the max)\(+\) \(n^2\)(to do the sort) = \(O(n^2)\). Although it looks like it’s not, the efficiency is still linear.
Now, we take the largest value in our list and place those users into the same group. WE will use a graph data structure ot keep track of which nodes are connected to each other. This data structure will be explained in part b. The runtime for using this data structure is \(O(log^*(n))\).
So when we find the max value from the list, we put these two values into the same group, and update the data structre. THen, we find the next highest value, and check if they are in the same group. If they are, move on to the next one, and if they aren’t, put them in the same group.
We also have a counter that starts at \(n\) (number of users), and decrements each time, the two nodes are connected. WE stop once that counter equals \(k\), because that will mean we have \(k\) groups. So that means this program runs \(n-k\) times, before we get \(k\) groups, and then we can stop.
To sort this last of satisfaction levels: \(O(n^2)\).
Using the data structure to keep track of which nodes are connected: \(O(log^*n)\). We need to do this \(n-k\) times.
Total Runtime: \(O(n^2+(n-k)log^*n) = O(n^2)\)
B. Explanation of the data structure sto keep track of which nodes are connected:
Now to keep track of the users that are in a connected component together, we will use a grpah. INitially, each user will have its own individual node and it will have one edge that points to itself. Then each node will have a rep value. If two nodes ahve the same rep value, it is part of the same connected compoenent. To find a node’s rep value, you have to follow the edges that leave the node, and when you searched all the edges, the node you get to, will be the rep value.
So in the initial case, all the nodes point to themselves, so node A has rep value A. So all the nodes have different rep values, and they all start in their own connected components. Now when you want to join to connected components you take the pointer from one node and point it to the other one.