\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage[round]{natbib}
\let\cite\citep
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\begin{document}
\title{No regret surrogate decomposition}
\author[1]{Paul Khuong}%
\affil[1]{Affiliation not available}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\sloppy
Optimal, ``no regret,'' online learning algorithms solve the seemingly
impossible ``experts problem,'' with non-asymptotic worst-case
guarantees. One textbook trick in theoretical computer science casts the
linear programming approximate~\emph{feasibility} problem as an instance
of the experts problem. Theoreticians are then satisfied by the fact that
optimisation reduces to feasibility~\cite{kun}.
\par\null
Recognising that the conversion of linear programming to the experts
problem is a surrogate relaxation (rather than a Lagrangian relaxation)
lets us directly reduce approximate linear optimisation to online
learning algorithms. We can then exploit the structure of the surrogate
relaxation for a decomposition reformulation (with cloned variables) to
derive theoretically stronger bounds than Lagrangian decomposition,
without destroying the subproblems' structure, and while preserving the
non-asymptotic convergence bounds of no-regret learning.
\par\null
\section*{The experts problem}
{\label{706520}}
We are interested in the adversarial (worst-case) variant of the experts
problem. The problem is to come up with strategies in a two-player game
against an omniscient adversary. Each game has a fixed set of
``experts,'' or moves. At each turn, we must define a mixed strategy,
i.e., define the probability that we'll play each expert or move. The
omniscient adversary then reveals an arbitrary payoff function, and our
actual payoff for this turn is the expected payoff for the mixed
strategy we picked. We continue this game for a number of turns,
potentially forever.
\par\null
Our ``regret,'' for a given game, is the difference between our actual
(expected) payoff and the best payoff achieved by any single expert
(pure strategy),~\emph{a posteriori}. Optimal, ``no regret'' online
learning algorithms guarantee that this regret is in~\(\mathcal{O}\left(\sqrt{T\ \ln k}\right)\),
for~\(T\) turns and~\(k\) experts
(once~\(T\) is large enough); corresponding lower bounds
show that the fully adversarial regret must be in~\(\Omega\left(\sqrt{T}\right)\).
\par\null
This sounds like an almost impossibly powerful guarantee, at first. The
key is that the regret is compared to a static~\emph{pure} strategy. In
an iterated rock-paper-scissors game, that's equivalent to coming up
with a strategy that's never much worse than always playing rock, or
always playing paper, or always playing scissors\ldots{} not a hard
feat. An adversarial no-regret strategy can't expect anything about the
future; it must look at the cumulative payoff for turns we have already
played, and simply give more weight to moves or experts that are doing
well so far. If the resulting mixed strategy is heavily penalised, the
current best experts also were penalised, so the incremental regret
shouldn't be too bad.
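\par\null
In symbols: if the adversary reveals the payoff vector~\(m^t\) at
turn~\(t\) and we play the mixed strategy~\(p^t\), the regret
after~\(T\) turns is
\[R_T = \max_{1 \leq i \leq k} \sum_{t=1}^T m_i^t - \sum_{t=1}^T (p^t)^\top m^t,\]
the total payoff of the \emph{a posteriori} best pure strategy, minus
our total expected payoff.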
\par\null
\section*{Algorithms and bounds for the experts
problem}
{\label{852316}}
Classic algorithms for the experts problem look like the multiplicative
weight update algorithm~\cite{kale} or mirror
descent~\cite{orecchia2016}. They must know the number of iterations
ahead of time, as well as an upper bound on a measure of ``hardness''
for the payoffs defined by the adversary, in order to determine a fixed
step size. For the MWU algorithm, hardness is the range of the payoffs.
If we assume the payoffs are in~\([-1, 1]\), letting the step
size~\(\eta = \sqrt{\frac{\ln k}{T}}\) is guaranteed to offer regret at most
\[R_\mathrm{MWU} \leq \eta \sum_{t=1}^T |m_\star^t| + \frac{\ln k}{\eta} \leq 2 \sqrt{T\ln k},\]
where~\(m_\star^t\) is the payoff for the \emph{a posteriori} best
expert, at turn \(t\).
\par\null
That's a strong non-asymptotic bound, which applies for any sequence of
payoff functions, as long as the value for each expert is always in the
range~\([-1, 1]\). It's also pessimistic: the step size is fixed
ahead of time, so the algorithm will hedge as defensively on ``easy''
inputs as it will against a worst-case adversary.
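\par\null
As a concrete illustration, the fixed-step-size MWU strategy fits in a
few lines of Python (a minimal sketch, not a tuned implementation; the
function names are mine, and payoffs are assumed to lie
in~\([-1, 1]\)):
\begin{verbatim}
import math

def mwu_weights(cumulative, eta):
    # Weight each expert by exp(eta * cumulative payoff), normalised
    # into a probability vector (the mixed strategy).
    ws = [math.exp(eta * g) for g in cumulative]
    total = sum(ws)
    return [w / total for w in ws]

def run_mwu(payoff_rounds, k):
    # Fixed step size eta = sqrt(ln k / T), from the regret bound.
    T = len(payoff_rounds)
    eta = math.sqrt(math.log(k) / T)
    cumulative = [0.0] * k
    our_payoff = 0.0
    for payoffs in payoff_rounds:
        p = mwu_weights(cumulative, eta)
        our_payoff += sum(pi * mi for pi, mi in zip(p, payoffs))
        cumulative = [g + m for g, m in zip(cumulative, payoffs)]
    return max(cumulative) - our_payoff  # regret vs best pure strategy
\end{verbatim}
Running this against any adversarial sequence of payoff vectors
in~\([-1, 1]^k\) should keep the returned regret under
the~\(2\sqrt{T \ln k}\) bound.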
\par\null
AdaHedge~\cite{koolen2014} determines its step size adaptively, without
knowing the time horizon (number of turns) or the range of losses ahead
of time. Assuming payoffs are in the same range~\([-1, 1]\), the
regret for AdaHedge after~\(T\)~iterations is at most
\[R_\mathrm{AdaHedge} \leq 2 \sqrt{T \ln k} + \frac{8}{3} \ln k + 4,\]
which is only slightly worse than the MWU bound\ldots{} and could be
much better for easier instances. The fact that AdaHedge adapts to the
range of payoffs isn't that interesting: the actual range will pretty
much always be~\([-1, 1]\) for our decomposition application. Not
having to specify the horizon is more useful: it means that, on easier
instances, we will be able to stop AdaHedge earlier than the worst-case
iteration bound.
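\par\null
AdaHedge's adaptive step size can be sketched as follows (losses are
negated payoffs; this is a minimal transcription of the published
recurrence, not tuned for numerical robustness):
\begin{verbatim}
import math

def adahedge_regret(loss_rounds):
    # Losses in [-1, 1]; eta is tuned online from the cumulative
    # mixability gap, so neither horizon nor range is needed upfront.
    k = len(loss_rounds[0])
    L = [0.0] * k      # cumulative loss per expert
    gap = 0.0          # cumulative mixability gap Delta
    hedge_loss = 0.0
    for losses in loss_rounds:
        m = min(L)
        if gap <= 0.0:
            # Degenerate eta = infinity: follow the leader(s).
            leaders = [i for i in range(k) if L[i] == m]
            w = [1.0 / len(leaders) if i in leaders else 0.0
                 for i in range(k)]
            mix = min(losses[i] for i in leaders)
        else:
            eta = math.log(k) / gap
            ws = [math.exp(-eta * (Li - m)) for Li in L]
            Z = sum(ws)
            w = [wi / Z for wi in ws]
            mix = -math.log(sum(wi * math.exp(-eta * li)
                                for wi, li in zip(w, losses))) / eta
        h = sum(wi * li for wi, li in zip(w, losses))
        gap += h - mix  # the mixability gap h - mix is non-negative
        hedge_loss += h
        L = [Li + li for Li, li in zip(L, losses)]
    return hedge_loss - min(L)  # regret vs best expert
\end{verbatim}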
\par\null
Note 2020-12-20:
\href{https://arxiv.org/abs/1502.05934}{AdaNormalHedge} is
similarly parameter-free, and naturally deals with constraint
generation (and thus, column generation).
\par\null
The Adaptive Exponentiated Gradient - Path (AEG-Path)
algorithm~\cite{liang2014} exploits a finer-grained measure of
hardness: the sum of absolute differences between payoff values at
consecutive iterations, for each expert. However, if the time horizon is
not fixed, AEG-Path's constant multiplicative factor on top of the
usual~\(\sqrt{\sum_{t=1}^T \mathrm{hardness}_t \ln k}\) is 10. The next section will show how this
increase, from 2 to 10, means that AEG-Path would need, in the worst
case, 25 times as many iterations to reach a nearly feasible solution of
the same quality as AdaHedge or MWU.
\par\null
\section*{The classic reduction of LP feasibility to the experts
problem}
{\label{654771}}
Feasibility for a linear program of the form
\begin{align*}
Ax &\leq b,\\
x &\in X,
\end{align*}
where~\(X\) is a closed convex set, may be reduced to the
experts problem, as long as~\(X\) is equipped with an
optimisation oracle for linear
functions~\cite{kale}~\cite{kun}. We map one expert to
each linear inequality~\(a_i x\ \leq b_i\). Given a probability
assignment~\(p_i\) for each expert~\(i\), the
no-regret algorithm's mixed strategy for the current turn, we solve~
\[z(p) = \min_{x\in X} p^\top Ax.\]
If~\(z(p) > p^\top b,\) the instance is infeasible. Otherwise,
let~\(x(p)\) be a minimiser. We assign the
payoff~\((a_i x(p) - b_i)\) to each expert. The expected payoff for the
mixed strategy~\(p\) is non-positive (\(z(p) \leq p^\top b\)),
and each expert's payoff corresponds to its constraint's violation.
After~\(T\) iterations, the regret is at
most~\(\mathcal{O}(\sqrt{T \ln k})\), and we know that our mixed strategies' total
payoff is at most 0. This means that the sum of violations for each
expert is at most~\(\mathcal{O}(\sqrt{T \ln k})\), so we can simply take the average
of the~\(T\) solutions~\(x(p)\) to obtain a
solution where each constraint is violated by at
most~\(\mathcal{O}\left(\frac{\sqrt{T \ln k}}{T}\right) = \mathcal{O}\left(\sqrt{\frac{\ln k}{T}}\right)\). If we use the MWU or AdaHedge algorithms, and
the payoffs (violation / satisfaction) are always in
\([-1, 1]\), we can let~\(T \geq \frac{4 \ln k}{\varepsilon^2}\) and know that we
will always find an~\(\varepsilon\)-feasible solution.
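\par\null
The whole reduction fits in a short loop. Here is a sketch
for~\(X = [0,1]^n\), whose linear optimisation oracle simply rounds
against the sign of the cost vector (the names are mine; the payoff
fed to each expert is its constraint's violation, assumed to lie
in~\([-1, 1]\)):
\begin{verbatim}
import math

def box_oracle(costs):
    # Minimise costs . x over the box [0, 1]^n.
    return [1.0 if c < 0 else 0.0 for c in costs]

def approx_feasible(A, b, T):
    # One expert per row of Ax <= b; MWU upweights violated rows.
    k, n = len(A), len(A[0])
    eta = math.sqrt(math.log(k) / T)
    cumulative = [0.0] * k
    avg = [0.0] * n
    for _ in range(T):
        ws = [math.exp(eta * g) for g in cumulative]
        Z = sum(ws)
        p = [w / Z for w in ws]
        costs = [sum(p[i] * A[i][j] for i in range(k))
                 for j in range(n)]           # p^T A
        x = box_oracle(costs)
        z = sum(cj * xj for cj, xj in zip(costs, x))
        if z > sum(pi * bi for pi, bi in zip(p, b)):
            return None                       # z(p) > p^T b: infeasible
        # Payoff to expert i is its violation a_i . x - b_i.
        for i in range(k):
            cumulative[i] += sum(A[i][j] * x[j]
                                 for j in range(n)) - b[i]
        avg = [a + xj / T for a, xj in zip(avg, x)]
    return avg
\end{verbatim}
The average of the~\(T\) oracle solutions violates each row by at
most~\(2\sqrt{\ln k / T}\), even when no single oracle solution is
feasible: for~\(x_1 \leq 0.5\) together with~\(x_1 \geq 0.5\), the
oracle alternates between 0 and 1 while the average settles on the
feasible midpoint.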
\par\null
This doesn't work for optimisation: we're using Lagrangian optimisation
with weights normalised to 1 (to fit the online learning mould), and we
can only apply this normalisation for pure feasibility problems.
\par\null
\section*{Surrogate relaxation to reduce linear optimisation to the
experts
problem}
{\label{771463}}
The reduction for feasibility works with any closed convex
set~\(X\) in
\begin{align*}
Ax &\leq b,\\
x &\in X.
\end{align*}
We could let~\(X\) be the set of superoptimal solutions,
solutions for which~\(c^\top x \geq z^\star\), where
\[z^\star = \max c^\top x,\]
s.t.
\begin{align*}
Ax &\leq b,\\
x&\in X_0.
\end{align*}
The previous reduction of linear feasibility to online learning would
take the average of super-optimal solutions, so we would end up with a
super-optimal~\(\varepsilon\)-feasible solution, i.e.,
an~\(\varepsilon\)-feasible solution to the optimisation problem!
\par\null
We don't even know what the optimal solution value is, so it's hard to
implement the optimisation oracle for~\(X\). There's
another trick that can help us: we don't need an optimisation oracle to
make the reduction work, we only need a feasibility oracle. As long as
we always find one~\(x \in X\) such that~\( p^\top A x \leq p^\top b\), the
expected payoff will be non-positive, and the reduction works---and, if
there is no such solution, the problem is infeasible.
\par\null
We can find such a solution by solving to optimality the surrogate
relaxation subproblem
\[\quad\max c^\top x\]
s.t.
\begin{align*}
p^\top A x &\leq p^\top b,\\
x &\in X_0,
\end{align*}
a simple knapsack problem. The weights~\(p_i\) are
non-negative, so this is still a relaxation of the original maximisation
problem; if that relaxation is infeasible, so is the original problem.
Otherwise, we'll find a feasible solution to the surrogate subproblem
(i.e., the expected payoff is non-positive as needed), and the solution
will be optimal for a relaxation of the original problem, thus optimal
or super-optimal for the original problem. We have found a feasible
solution in the implicitly defined set of super-optimal
solutions,~\(X\)! In the unlikely event that the solution
is also feasible for the original problem, we have solved the problem to
optimality, and we can stop.
Each time we solve a surrogate subproblem, we will find an upper bound
on the optimal value. The online learning algorithms do not guarantee
any descent criterion for that bound value. However, if we fail to
improve the upper bound, we can use the bound we already have to speed
up convergence to a feasible solution. We'll instead solve
\[\quad\min p^\top Ax\]
s.t.
\begin{align*}
c^\top x &\geq \overline{z} & \overline{z} \textrm{ is the current best upper bound},\\
x &\in X_0,
\end{align*}
and find a solution that maximises feasibility, while keeping the
objective value above our current best upper bound.
When~\(X_0\) is a box and we solve the surrogate subproblem
as a linear knapsack problem, we simply stop accumulating elements
greedily as soon as either the knapsack is full, or the objective value
equals the best upper bound.
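\par\null
For the box case, this early-stopping greedy is easy to state. The
sketch below assumes positive profits and weights, and clips the last
item so that the objective value stops exactly at the incumbent
bound~\(\overline{z}\):
\begin{verbatim}
def bounded_greedy_knapsack(c, w, cap, z_bar):
    # max c.x s.t. w.x <= cap, x in [0, 1]^n, stopping early once
    # the objective reaches the incumbent upper bound z_bar.
    x = [0.0] * len(c)
    obj = 0.0
    for j in sorted(range(len(c)), key=lambda j: c[j] / w[j],
                    reverse=True):
        if cap <= 0.0 or obj >= z_bar:
            break  # knapsack full, or already sufficiently super-optimal
        take = min(1.0, cap / w[j], (z_bar - obj) / c[j])
        x[j] = take
        obj += take * c[j]
        cap -= take * w[j]
    return x, obj
\end{verbatim}
\par\null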
I believe this shows that the reduction from linear
programming to no-regret online learning is, at its heart, a surrogate
relaxation~\cite{glover1965}~\cite{Glover_1975} method, not a
Lagrangian relaxation. The fact that the weights are in the probability
simplex (non-negative and sum to 1) is a nice hint that we're dealing
with surrogate constraints. It also helps us understand why the
reduction gives us stronger worst-case bounds than optimal subgradient
methods: even when everything is nicely normalised, subgradient methods
must search the space of Lagrange multipliers~\([0, 1]^k\), while
the surrogate relaxation's master problem only needs to explore the much
smaller probability simplex~\(\Delta^k\). While the surrogate
relaxation approach scales its iteration count with~\(\ln k\),
the log of the number of relaxed constraints, subgradient methods scale
with~\(\| w_0 - w^\star \|^2\), the square of the Euclidean distance between
the initial solution and the closest optimal vector of multipliers. In
general, that squared distance is~\emph{linear}~in the number of relaxed
constraints.
The surrogate subproblem is slightly more complicated than the
Lagrangian relaxation subproblem, but, as the instance grows, the
pricing problem becomes exponentially easier for surrogate relaxation
than for Lagrangian relaxation! However, there's a hidden problem here:
the number of iterations grows with the maximum violation (or
satisfaction) over all constraints, and that's not polynomial. We can
fix that by making sure the relaxed constraints can only be satisfied or
violated by a bounded amount.
\section*{Surrogate decomposition for
free}
{\label{877692}}
Lagrangian decomposition dominates Lagrangian
relaxation~\cite{kim1987}. Instead of relaxing constraints from the
original formulation, we introduce new auxiliary variables and
constraints, and relax these constraints~\cite{guignard2003}.
For example, in order to relax
\[\quad \max c^\top x\]
s.t.
\begin{align*}
Ax &\leq b,\\
Bx &\leq d,\\
x &\in X,
\end{align*}
a Lagrangian relaxation approach might associate multipliers with rows
of~\(A\), and solve subproblems of the form
\[\max c^\top x - \lambda (A x - b)\]
s.t.
\begin{align*}
Bx &\leq d,\\
x &\in X.
\end{align*}
\par\null
Lagrangian decomposition instead reformulates the initial problem as
\[\max c^\top x\]
s.t.
\begin{align*}
Ax &\leq b,\\
By &\leq d,\\
x &= y,\\
x&\in X,\\
y&\in X,
\end{align*}
and relaxes only the coupling constraints~\(x = y\). The
subproblem becomes
\[\max c^\top x - \lambda (x - y)\]
s.t.
\begin{align*}
Ax &\leq b,\\
By &\leq d,\\
x&\in X,\\
y &\in X,
\end{align*}
which can be maximised independently for~\(x\)
and~\(y\).
If there are more than two components, it becomes natural to have a set
of artificial variables for each block of constraints, with the
reformulation
\[\max c^\top x\]
s.t.
\begin{align*}
x &= y,\\
x &= z,\\
x &= \ldots,\\
Ay &\leq b,\\
Bz &\leq d,\\
\ldots&\leq \ldots,\\
x &\in X,\\
y &\in Y \supseteq X,\\
z &\in Z \supseteq X,\\
\ldots &\in \ldots \supseteq X,
\end{align*}
and relaxed subproblem
\[\max c^\top x - \lambda(x - y) - \mu (x - z) - \ldots\]
s.t.
\begin{align*}
Ay &\leq b,\\
Bz &\leq d,\\
\ldots&\leq \ldots,\\
x &\in X,\\
y &\in Y \supseteq X,\\
z &\in Z \supseteq X,\\
\ldots &\in \ldots \supseteq X.
\end{align*}
Again, the relaxed subproblem may be maximised separately for each block
of variables~\(x,y,z,\ldots\) In practice, this natural decomposition
is usually a bad idea for Lagrangian decomposition: the profit for
decision variables~\(x\) is too decoupled from the impact
of the decision variables~\(y, z, \ldots\) in their respective
constraint block. Eventually, the pricing problem transfers the costs
correctly, but, depending on scaling between the constraints and the
objective function, that can take many iterations.
We shouldn't have any scaling issue with a surrogate relaxation. If we
apply surrogate relaxation to the previous reformulation, we obtain
\[\max c^\top x\]
s.t.
\begin{align*}
\lambda (x - y) + \mu(x - z) + \ldots &= 0,\\
Ay &\leq b,\\
Bz &\leq d,\\
\ldots &\leq \ldots,\\
x &\in X,\\
y &\in Y \supseteq X,\\
z &\in Z \supseteq X,\\
\ldots &\in \ldots \supseteq X.
\end{align*}
That doesn't separate nicely, but, more importantly, our reduction to
the expert problem can't handle the equality constraints. We must turn
the equality linking constraints into pairs of inequalities, and apply
surrogate relaxation to all linking inequalities at once.
The decomposition reformulation becomes
\[\max c^\top x\]
s.t.
\begin{align*}
x &\leq y, & \textrm{multipliers } \lambda\\
y &\leq x, & \textrm{multipliers } \lambda^\prime\\
x &\leq z, & \textrm{multipliers } \mu\\
z &\leq x, & \textrm{multipliers } \mu^\prime\\
&\ldots\\
Ay &\leq b,\\
Bz &\leq d,\\
\ldots&\leq \ldots,\\
x &\in X,\\
y &\in Y \supseteq X,\\
z &\in Z \supseteq X,\\
\ldots &\in \ldots \supseteq X,
\end{align*}
and relaxes into
\[\max c^\top x\]
s.t.
\begin{align*}
(\lambda - \lambda^\prime) (x - y) + (\mu - \mu^\prime)(x - z) + \ldots &\leq 0,\\
Ay &\leq b,\\
Bz &\leq d,\\
\ldots &\leq \ldots,\\
x &\in X,\\
y &\in Y \supseteq X,\\
z &\in Z \supseteq X,\\
\ldots &\in \ldots \supseteq X.
\end{align*}
With this single linking surrogate inequality, we can finally separate
the subproblem. The artificial variables~\(y,z,\ldots\) do not appear
in the objective function. The only thing they can do to
improve the objective function is maximise feasibility in the linking
inequality. For each set of artificial variables, the subproblem
decomposes into
\[w_y = -\min (\lambda^\prime -\lambda) y \]
s.t.
\begin{align*}
Ay&\leq b,\\
y&\in Y \supseteq X,
\end{align*}
similarly for~\(z\),
\[w_z = -\min (\mu^\prime - \mu) z \]
s.t.
\begin{align*}
Bz&\leq d,\\
z&\in Z \supseteq X,
\end{align*}
and so on.
\par\null
Once these subproblems are solved, we'll have the optimal (minimal)
contribution for the artificial variables to the linking inequality~
\[(\lambda - \lambda^\prime) (x - y) + (\mu - \mu^\prime)(x - z) + \ldots \leq 0.\]
We can take that into account and solve the last remaining subproblem
\[\max c^\top x\]
s.t.
\begin{align*}
(\lambda - \lambda^\prime) x + (\mu - \mu^\prime) x + \ldots &\leq w_y + w_z + \ldots,\\
x &\in X,
\end{align*}
a straightforward knapsack problem. We may solve this subproblem
however we want, with an exact method, a continuous
relaxation (e.g., the linear multiple choice knapsack problem,
if~\(X\)~includes GUB constraints), or any other hybrid
that guarantees an optimal or super-optimal solution. Again, we can also
stop filling the knapsack when we reach a super-optimal objective value.
If the knapsack problem is infeasible, the original problem is also
infeasible; if all the clone variables agree with the knapsack
subproblem, we have a feasible optimal solution to the original problem.
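\par\null
The clone subproblems are a bundle of independent oracle calls. A
sketch of that phase (the oracle interface and names are mine: each
block is represented by a function that minimises a linear cost over
its own feasible set):
\begin{verbatim}
def clone_contributions(multiplier_pairs, block_oracles):
    # For each clone y, z, ..., minimise (lam' - lam) . y over the
    # block (one oracle call), and return the capacities w_y, w_z, ...
    # that bound the linking inequality in the final x knapsack.
    ws = []
    for (lam, lam_p), oracle in zip(multiplier_pairs, block_oracles):
        cost = [lp - l for l, lp in zip(lam, lam_p)]
        y = oracle(cost)
        ws.append(-sum(cj * yj for cj, yj in zip(cost, y)))
    return ws
\end{verbatim}
With~\(w_y, w_z, \ldots\) in hand, the remaining knapsack over~\(x\)
can then be solved, or truncated at the incumbent bound, as described
above; the oracle calls, and hence most of the work, are trivially
parallel.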
\par\null
This surrogate decomposition method dominates Lagrangian decomposition
in theory, without requiring anything new from the subproblem: just like
Lagrangian decomposition, each subproblem need only be equipped with an
optimisation oracle. We also need a knapsack solver, but that's a well
understood problem. Crucially, the surrogate decomposition's
weight-setting (pricing) problem can be solved with AdaHedge, or any
other no-regret online learning algorithm, with an iteration count that
scales with~\(\frac{\ln(2k)}{\varepsilon^2}\), where~\(k\) is the number
of cloned variables. The number of clones is in~\(\mathcal{O}(m n),\) so
this compares favourably with subgradient
methods'~\(\frac{k}{\varepsilon^2}\)~(or~\(\frac{k}{\varepsilon}\) if subproblems can be
smoothed).
\par\null
The~\(\varepsilon\) for no-regret surrogate decomposition measures
the worst-case infeasibility of all linking constraints, while
the~\(\varepsilon\) for subgradient methods measures suboptimality in
the bound value (with weak guarantees on the feasibility of the
associated fractional primal solution~\cite{frangioni2005}
\cite{anbil2000}). In practice, I think the no-regret decomposition's
measure is more useful: we have a solution that's feasible for the
convexified subproblem, within some measurement error for each block of
constraints. If the result is wildly super-optimal, maybe the instance is
badly conditioned and should be formulated differently.
\par\null
In the common case where~\(X \subseteq [0,1]^n\) (and similarly for the
clones~\(Y, Z, \ldots\)), the range of losses is~\([-1, 1]\),
which is pretty much ideal for online learning. When relaxing discrete
optimisation problems, the subproblems will usually be solved to
integrality, and, even if it's impractical to solve the knapsack problem
to integrality, its continuous relaxation will have very few (one)
fractional value at optimum. The end result is that we actually
expect the losses to cover~\([-1, 1]\) at each iteration. This
means it probably doesn't make sense to look into complex online
learning algorithms that can adapt to the range of losses; adaptiveness
to path length (variation in each expert's loss, i.e., constraint
violation, across consecutive iterations) might be worthwhile.
\par\null
Another annoying thing in the practice of Lagrangian decomposition is
that we must always solve the subproblems to near-optimality in order to
obtain valid subgradients. However, the pricing problem will often yield
sequences of similar Lagrange multipliers, so, when solving subproblems
as MIPs, we end up exploring very similar branch-and-bound trees. That's
what led to the reformulation in my dissertation. With surrogate
decomposition, we only have to generate a solution that's known to be
optimal or super-optimal, and feasible for the relaxed problem. We can
heuristically assemble solutions by looking back at the history of each
component, solve the resulting knapsack subproblem, and see if the
result is super-optimal; if so, we've generated a valid superoptimal
solution without solving each component to optimality!
\par\null
For large enough instances, surrogate relaxation with no-regret master
problems should eventually outperform Lagrangian decomposition, not only
in terms of theoretical bound value, but also with respect to the number
of iterations. Moreover, the surrogate relaxation will either detect
infeasibility, or yield a super-optimal and approximately feasible
solution in the intersection of the convex hull of all the subproblems'
feasible spaces. Infeasibility detection and fractional solutions are
both useful when the relaxation is embedded in a larger search
algorithm, or guides primal heuristics. Finally, the subproblems are
trivially solvable in parallel (and even the knapsack can be
parallelised), so we should be able to exploit GPUs and distributed
computing to speed up the surrogate relaxation subproblems.
\par\null
\section*{What's different this time?}
{\label{192683}}
Lagrangian decomposition has been around since the 70s, and surrogate
relaxation even earlier than that. Surrogate relaxation, let alone the
decomposition approach in particular, hasn't made it past a niche role,
e.g., in multidimensional knapsack solvers. What makes the approach
described here better?
\par\null
The main difficulty in applying surrogate relaxation is the search for
dual multipliers. The surrogate dual function isn't convex, it's only
quasi-convex. While its level sets do form convex sets, the dual
function has plateaux, and not just around the optimum: it's not hard
to see how a marginal change in the importance of a few constraints
might fail to change the optimal solution. In his doctoral
dissertation~\cite{karwan1976surrogate}, Karwan constructs an example where the
subgradient is not an ascent direction, although there does exist one.
\par\null
In reaction to this lack of subgradient, subgradient-like methods have
to look for strictly infeasible solutions, and slowly turn down the
degree of infeasibility. Not only do the simple algorithms inherit the
unpredictability of subgradient methods, but they now have another
parameter that must be tuned in an outer loop.
\par\null
Alternatively, it is possible to adapt the cutting plane
method~\cite{Kelley_Jr__1960} to this setting. Rather than finding the set
of multipliers that result in the maximin objective value for all
relaxed solutions we have so far, we search for the set of multipliers
that result in the maximin violation for all surrogate relaxed
solutions. Again, we have the same instability issues as the usual
method, which can be solved with various trust region tricks. However,
the cutting plane approach is still slow (requires solving an LP, if not
a QP, at each iteration), and needs to use more and more memory in order
to guarantee convergence.
\par\null
It sounds like, except for the 2-dimensional cases, where specialised
dual ascent methods do well, neither subgradient-style nor cutting plane
methods scale well. That's not surprising: Lagrangian relaxation methods
have the same issue with their pricing problems.
\par\null
The no-regret approach is different. It directly handles the
quasi-convex dual, does not need to store more than~\(\mathcal{O}(1)\)
solution and violation vectors, and offers strict convergence
guarantees. We might have a practical approach here.
\par\null
As for the decomposition approach\ldots{} decomposition introduces a lot
more dual multipliers than relaxation. The pricing problem is already
hard enough for surrogate relaxation, there's no reason to think it'll
do better on decomposition. With no-regret pricing algorithms, we can
scale to a large number of dual multipliers, and it makes sense to
introduce a ton of multipliers to both improve the relaxation's strength
and simplify the (separable!) subproblem.
\selectlanguage{english}
\FloatBarrier
\bibliographystyle{plainnat}
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}