Max Population Cities

Preprocessing

We store the tuple of coordinates in a range-tree. The auxiliary data structure at each node also keeps track of the maximum population city in the subtree rooted at this node.

Building our auxiliary data-structures for general range-trees take \(O(n)\) work per layer. We can store the maximum-population data in all subtrees with \(O(n)\) additional work, maintaining the overall preprocessing time. We do this with a simple in-order traversal on every auxiliary-tree. When we reach an internal node, we just compare the population sizes of the two children and store the larger of the two. Our traversal is \(O(n)\) and we do \(O(1)\) work at each node so the total run-time is \(O(n)\).

Queries

We query in mostly the same way as general range trees. We first search our \(x\)-tree and then find \(O(\log n)\) auxiliary data-structures. In each of these, we find the \(O(\log n)\) sub-trees determining our valid \(y\)-range (Figure 1).

Instead of traversing this segment of the graph, we simply look at the population sizes stored in the root of these subtrees. We find the maximum such value for each auxiliary tree, and then return the maximum over all auxiliary structures.

  • \(M\leftarrow 0\). This will store our overall maximum value.
  • For each auxiliary tree: \(O(\log n)\)
    • Find the sub-trees determining the valid \(y\)-range: \(O(\log n)\)
    • Let \(m\) be the largest root-value among these sub-trees: \(O(\log n)\).
    • \(M\leftarrow \max(M, m)\): \(O(1)\)
  • Emit \(M\): \(O(1)\)

Complexity

Preprocessing: The process of building the auxiliary tree is \(O(n)\) per layer with \(\log n\) layers. As such, it runs in \(O(n\log n)\) time. This is additive to the \(O(n\log n)\) time required to build the \(x\)-tree, so we have overall \(O(n\log n)\) pre-processing time.

Space: We also maintain the \(O (n\log n)\) space bound of range-trees. This follows from the fact that we store only one additional value per node in the auxiliary data-structure.

Query Time: As demonstrated above we have \(O(\log^2 n)\) queries.

Correctedness

  1. The sub-trees in our auxiliary data structure we consider contain exactly every point that falls into our range-query. This is a consequence of the fact that we have not altered the range-query algorithm up until this point.
  2. By construction, the root of every sub-tree in the auxiliary tree contains the maximum size population among all descendent nodes.

From (2), by comparing the root-values of all sub-trees we find the maximum of all nodes that descend from these roots. From (1) this is the same as finding the maximum of all point that falls in our range query. The proof is complete.