ROUGH DRAFT authorea.com/108627

# Line Segments

## Algorithm and Runtime

• S $$\leftarrow$$ Stack
• For segment i by ascending x-coordinate: $$O(n)$$.
• Pop from S until the top element s can be hit by i or S is empty: RTP amortised $$O(1)$$.
• Emit index of s or 0 in the empty case: $$O(1)$$.
• Push i onto S: $$O(1)$$.

To see that our pop-step is amortised $$O(1)$$ we note that every element is pushed onto our stack at most one time. We can only pop as many elements as we push, so in total we can pop $$O(n)$$ elements. Our loop runs $$O(n)$$ times, so the amortised pop-step per iteration is $$O\left(\frac{n}{n}\right) = O(1)$$. This guarantees overall $$O(n)$$ runtime $$\square$$.

## Correctedness

1. If an element is popped at iteration $$i$$ it was lower than interval $$i$$. If it is lower than interval $$i$$, any subsequent interval would have to pass through $$i$$ to hit it. As such it cannot be the first-hit point for any subsequent interval.
2. The stack is populated in loop-order. This is ascending x-coord, so higher elements always have larger x-coordinates.

When we consider an interval, we keep on throwing away elements until we find one that we can hit. From (2), among all elements in the stack we can hit, this is the one with the largest x-coordinate. From (1), we throw away a value IFF we cannot use it later. As such, there can be no intervals that could possibly hit first that were not present in the stack. The proof is complete.

# Max Population Cities

## Preprocessing

We store the tuple of coordinates in a range-tree. The auxiliary data structure at each node also keeps track of the maximum population city in the subtree rooted at this node.

Building our auxiliary data-structures for general range-trees take $$O(n)$$ work per layer. We can store the maximum-population data in all subtrees with $$O(n)$$ additional work, maintaining the overall preprocessing time. We do this with a simple in-order traversal on every auxiliary-tree. When we reach an internal node, we just compare the population sizes of the two children and store the larger of the two. Our traversal is $$O(n)$$ and we do $$O(1)$$ work at each node so the total run-time is $$O(n)$$.

## Queries

We query in mostly the same way as general range trees. We first search our $$x$$-tree and then find $$O(\log n)$$ auxiliary data-structures. In each of these, we find the $$O(\log n)$$ sub-trees determining our valid $$y$$-range (Figure 1).

Instead of traversing this segment of the graph, we simply look at the population sizes stored in the root of these subtrees. We find the maximum such value for each auxiliary tree, and then return the maximum over all auxiliary structures.

• $$M\leftarrow 0$$. This will store our overall maximum value.
• For each auxiliary tree: $$O(\log n)$$
• Find the sub-trees determining the valid $$y$$-range: $$O(\log n)$$
• Let $$m$$ be the largest root-value among these sub-trees: $$O(\log n)$$.
• $$M\leftarrow \max(M, m)$$: $$O(1)$$
• Emit $$M$$: $$O(1)$$

## Complexity

Preprocessing: The process of building the auxiliary tree is $$O(n)$$ per layer with $$\log n$$ layers. As such, it runs in $$O(n\log n)$$ time. This is additive to the $$O(n\log n)$$ time required to build the $$x$$-tree, so we have overall $$O(n\log n)$$ pre-processing time.

Space: We also maintain the $$O (n\log n)$$ space bound of range-trees. This follows from the fact that we store only one additional value per node in the auxiliary data-structure.

Query Time: As demonstrated above we have $$O(\log^2 n)$$ queries.

## Correctedness

1. The sub-trees in our auxiliary data structure we consider contain exactly every point that falls into our range-query. This is a consequence of the fact that we have not altered the range-query algorithm up until this point.
2. By construction, the root of every sub-tree in the auxiliary tree contains the maximum size population among all descendent nodes.

From (2), by comparing the root-values of all sub-trees we find the maximum of all nodes that descend from these roots. From (1) this is the same as finding the maximum of all point that falls in our range query. The proof is complete.

An example of the search on our auxiliary tree. We represent the maximum population data inside our nodes. The y-coordinates are at the bottom of the tree in the table. Our search sub-trees are indicated by a filled in circle.

# Smallest Triangles

• A triangle's area is determined by perpendicular distance to a point. As such, we can throw away the y-coordinates for both the segments and points.
• When considering a single interval the width is fixed. As such, the minimum area triangle is found by simply finding the closest point in terms of perpendicular distance
• We scan from L to R. For each segment, we determine the points closest on the left and right. We compare the distance from our segment to each, and return the point of minimum distance.

If we wanted to be extra careful we would add safeguards against the case where our point has the same x-coord as a segment. We would want to do this as we can't make a triangle with three colinear points. We can identify this case by noting when our dist(t,.) function is zero. We can resolve it by keeping track of two previous points at any time (that we ensure have different x-values). When we find a distance of zero, we simply compare it 'two-back'. We can perform a similar process in the other direction, storing a point until we have two points with dist!=0 on either side. We have omitted this process from our algorithm for brevity.

## Algorithm

• $$L \leftarrow S \cup P \cup p_{x=\infty}: O(m+n)$$

• N.B: We add a point with x-coordinate $$\infty$$ to our point set to handle the boundary case of segments occurring after our last point. The other boundary is handled by intialising our LeftPoint as below
• T $$\leftarrow$$ Empty List, LeftPoint $$\leftarrow -\infty$$

• Sort $$L$$ by ascending x-coordinate: $$O((m+n) \log (m + n) )$$

• For each element in $$L: O(m+n)$$

• If it's an interval, store it in $$T$$
• If it's a point denote it RightPoint:
• For each interval $$t$$ in T: RTP amortised $$O(1)$$.
• Compare dist(t,LeftPoint) and dist(t, rightPoint). Emit a tuple of $$t$$ and the argmin of these two distances: $$O(1)$$.
• Remove t from the T: $$O(1)$$
• LeftPoint $$\leftarrow$$ RightPoint

## Complexity

Our runtime depends on the inner-loop being amortised $$O(1)$$. This is clear as an interval is included in a list exactly once before being removed. The total number of elements in the list is $$O(n)$$ and so per outer-loop iteration we have amortised $$O(1))$$.

Time: Running time is $$O((m+n) \log (m + n) )$$ for the preprocessing and $$O(m+n)$$ in the loop. Total of $$O((m+n) \log (m + n) )$$.

Space: We require $$O(m+n)$$ space for the sorted elements and an additional $$O(m+n)$$ in total for the list $$T$$. We have overall $$O(m+n)$$ space complexity.

## Correctedness

1. Triangle areas are a product of perpendicular distance and length. When length is fixed, we minimise area by finding the closest point to our intervals.
2. Our algorithm finds the closest point to all intervals on the left- and right-hand side.
3. For an interval, the closest point of these two must be the closest point in the whole plane. This is the value we return $$\square$$.