a simple knapsack problem. The weights \(p_i\) are non-negative, so this is still a relaxation of the original maximisation problem; if that relaxation is infeasible, so is the original problem. Otherwise, we'll find a feasible solution to the surrogate subproblem (i.e., one whose expected payoff is non-positive, as required). That solution is optimal for a relaxation of the original problem, and thus optimal or super-optimal for the original problem: we have found a feasible solution in the implicitly defined set of super-optimal solutions, \(X\)! In the unlikely event that this solution is also feasible for the original problem, we have solved the problem to optimality, and we can stop.
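To make this concrete, here's a minimal sketch of one such surrogate solve, assuming the original problem is \(\max c^\top x\) subject to \(Ax \leq b\) and \(x \in [0, 1]^n\), and that the learner's non-negative multipliers \(p\) aggregate the rows of \(A\) into a single knapsack constraint with non-negative weights. The function name, the NumPy interface, and the greedy fractional solver are illustrative assumptions, not the actual implementation:

```python
import numpy as np

def solve_surrogate(c, A, b, p):
    """Solve the surrogate relaxation of  max c@x  s.t.  Ax <= b, x in [0, 1]^n.

    The non-negative multipliers p collapse the constraints into one
    knapsack row w@x <= cap; any x feasible for the original problem
    also satisfies that row, so the knapsack optimum is an upper bound
    on the original optimum.
    """
    w, cap = p @ A, float(p @ b)  # aggregated weights, assumed non-negative
    if cap < 0:
        return None, None         # surrogate infeasible => original infeasible
    x = np.zeros(len(c))
    # Greedy fractional knapsack: take items in decreasing value density.
    for i in sorted(range(len(c)),
                    key=lambda j: c[j] / max(w[j], 1e-12), reverse=True):
        if c[i] <= 0:
            break                 # remaining items can only hurt the objective
        x[i] = 1.0 if w[i] <= 0 else min(1.0, cap / w[i])
        cap -= x[i] * w[i]
    return x, float(c @ x)        # super-optimal solution and its bound
```

The greedy fill solves the continuous knapsack exactly when the aggregated weights are non-negative, so the value it returns upper-bounds the original optimum even if the original problem restricts \(x\) to be integral.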
Each time we solve a surrogate subproblem, we obtain an upper bound on the optimal value. The online learning algorithms do not guarantee any descent criterion for that bound, so successive bounds may well get worse. However, when an iteration fails to improve the upper bound, we can use the best bound we already have to speed up convergence to a feasible solution.
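As a sketch of that bookkeeping, here's a hypothetical driver loop around `solve_surrogate` above: it remembers the lowest bound seen across iterations and stops as soon as a surrogate solution happens to be feasible for the original problem. The `learner` object, with its `multipliers()` and `update()` methods, is an assumed interface standing in for whichever no-regret algorithm generates the multipliers:

```python
import numpy as np

# Reuses solve_surrogate() from the sketch above.

def surrogate_loop(c, A, b, learner, max_iter=1000):
    """Track the lowest upper bound seen so far: the per-iteration bounds
    are not monotone, so only the running minimum is meaningful."""
    best_bound = np.inf
    for _ in range(max_iter):
        p = learner.multipliers()            # assumed interface: weights >= 0
        x, bound = solve_surrogate(c, A, b, p)
        if x is None:
            return None, None                # original problem is infeasible
        best_bound = min(best_bound, bound)  # bounds may bounce around
        violation = A @ x - b
        if np.all(violation <= 0):
            return x, best_bound             # also feasible: optimal, stop
        learner.update(violation)            # assumed interface: feed violations back
    return None, best_bound                  # best bound so far, no feasible x yet
```

We'll instead solve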