Prezi Scale challenge

Problem description

Prezi uses virtual machines (VMs) to complete three different kind of jobs:

  • export

  • url

  • default

These jobs have pre-determined processing times, which are distributed according to some general distribution \(G\) (see figures). Each job type has a corresponding type of VM. VM running costs are charged in hourly increments. This can be represented using a cost function \(C(t)\) where \(t\) is the running time of a server in seconds

\[C(t) = \left\lceil \frac{t}{3600}\right\rceil\]

VMs can be turned on and off, but it takes 2 minutes (120 seconds) for a virtual machine to start. This time is charged.

The objective is to minimize the total running cost of the servers, while ensuring that all jobs are handled within 5 seconds. If a job waits for more than 5 seconds, a penalty is incurred. A job is not allowed to wait for more than 2 minutes. The holding cost of a job \(H(t)\) is defined

\[H(t) = \max\left(\left\{ 0, \frac{t - 5}{40} \right\} \right)\]

Two datasets are given, each spanning a period of 48 hours, starting at 10pm GMT (so 2/5pm on the West/East coast). They are of the form timestamp job-duration unique-id queue , for example 1378072800 2.133 9ea24acd-cee3-4248-bf43-fb33e7e9ac4b url with 475858 and 669642 jobs in the respective datasets.


Arrival rate

We can assume our job arrivals to be govern