Anytime capacity, entropy power inequality, Markov channels, Markov jump linear systems, quantized control.

Introduction

We consider the problem of moment stabilization of a dynamical system where the estimated state is transmitted for control over a time-varying communication channel. This problem has been studied extensively in the context of networked control systems and is discussed in several journal special issues dedicated to the topic \cite{Baillieul--Antsaklis20071,Baillieul--Antsaklis20072,Franceschetti-Kumar-Mitter-Teneketzis-2008}. Recently, it has gained renewed attention due to its relevance to the design of cyber-physical systems \cite{Kim-Kumar2012}. A tutorial review with extensive references appears in \cite{Franceschetti-Minero-2014}.

The notion of Shannon capacity is in general not sufficient to characterize the trade-off between the entropy rate production of the system, expressed by the growth of the state space spanned in open loop, and the communication rate required for its stabilization. A large Shannon capacity is useless for stabilization if it cannot be used in time for control. For the control signal to be effective, it must be appropriate to the current state of the system. Since decoding a wrong codeword implies applying a wrong control signal and driving the system away from stability, the effectiveness of the control depends on the history of whether previous codewords were decoded correctly or not. In essence, the stabilization problem is an example of interactive communication \cite{shannon1961}, where two-way communication occurs through the feedback loop between the plant and the controller. For this reason, the use of alternative capacity notions with stronger reliability constraints than a vanishing probability of error has been proposed, including the zero-error capacity \cite{matveed-savkin2007b}, originally introduced by Shannon \cite{Shannon1956}, and the anytime capacity of Sahai and Mitter \cite{Sahai2001,Como-Fagnani-Zampieri2010,Sahai-Mitter2006,simsek-jain-varaija2004,Xu2005}.

Within this general framework, we focus on the \(m\)-th moment stabilization of an unstable scalar system whose state is communicated over a rate-limited channel capable of supporting \(R_k\) bits at each time step \(k\) and evolving randomly in a Markovian fashion, see Fig. \ref{fig:scheme}. The rate process is known causally to both encoder and decoder. Many variations of this “bit-pipe” model have been studied in the literature, including the fixed-rate channel; the erasure channel, where the rate process can assume the value zero; and the packet loss channel, where the rate process can oscillate randomly between the two values of zero and infinity, allowing a real number with infinite precision to be transported across the channel in one time step. Connections between the rate-limited and the packet loss channel have been pointed out in \cite{Minero-etal-2009,Coviello-etal2012}, showing that results for the latter model can be recovered by appropriate limiting arguments.

Besides the bit-pipe model, stabilization over the additive white Gaussian channel has also been considered in \cite{Tatikonda-etal2004,Braslavsky-Middleton-Freudenberg2007,Middleton-Roja2009,Freudenberg-Middleton-Solo2010,Goodwin-Quevedo2010}, and in this case the Shannon capacity has been shown to be sufficient to express the rate needed for stabilization. Extensions to the additive colored Gaussian channel \cite{Ardestanizadeh--franceschetti2012} show that the maximum “tolerable instability”, expressed by the sum of the logarithms of the unstable eigenvalues of the system that can be stabilized by a linear controller with a given power constraint over a stationary Gaussian channel, corresponds to the Shannon feedback capacity \cite{Kim2010}, which assumes the presence of a noiseless feedback link between the output and the input of the channel subject to the same power constraint. This suggests a duality between the problems of control and communication in the presence of feedback, and indeed it has been shown that efficient feedback communication schemes can be obtained by solving a corresponding control problem \cite{Elia2004,Ardestanizadeh-etal-2012}.

The major contribution of the present paper is the introduction of a stability threshold function of the channel’s parameters and of the moment stability number \(m\) that converges to the Shannon capacity for \(m \rightarrow 0\), converges to the zero-error capacity for \(m \rightarrow \infty\), and provides a parametric characterization of the anytime capacity for the remaining values of \(m\). This function yields a novel anytime capacity formula in the special case of the \(r\)-bit Markov erasure channel, as well as novel formulas for memoryless rate processes with Binomial, Poisson, and Geometric distributions. It also provides a general strategy to compute the anytime capacity of an arbitrary memoryless channel with given rate distribution. To prove our results, we require some extensions of the theory of Markov Jump Linear Systems (MJLS) that are of independent interest. In passing, although we do not deal with the case of vector systems, we point out that our results can be extended to this case by exploiting the usual bit-allocation techniques outlined in \cite{Minero-etal-2009,Coviello-etal2012}, at the expense of a more technical treatment that does not provide additional engineering insight.

The rest of the paper is organized as follows. Some preliminary results on Markov Jump Linear Systems, necessary for our derivations, are presented in Section \ref{sec:mjls}. Section III describes the system and channel model and introduces the stability threshold function, illustrating some of its properties. Section IV describes relationships with the anytime capacity and provides some representative examples. Section V provides the formula for the anytime capacity of the Markov erasure channel.

Throughout the paper, the following notation is used. Logarithms are in base two; random variables are denoted by uppercase letters and their realizations by lowercase letters; matrices are also denoted by uppercase letters, set in the special typeface \({{\mathrm{A}}}\).

Markov Jump Linear Systems

\label{sec:mjls} Consider the scalar non-homogeneous MJLS \cite{Costa-et-al-2004} with dynamics \[\label{eq:mjls} z_{k+1}=a_k z_k+w_k,\] where \(z_k\) denotes the state, \(w_k\) is an additive disturbance, and \(\{A_k\}_{k\geq 0}\) is a Markov process defined on a finite set \[\begin{aligned} \label{eq:support1} \mathcal{A} = \{\bar{a}_1,\ldots,\bar{a}_n\} \subseteq \mathbb{R},\end{aligned}\] with one-step transition probability matrix \({{\mathrm{P}}}\) having entries \[\begin{aligned} \label{eq:P1} p_{ij} = { {\sf P}\{A_k = \bar{a}_j| A_{k-1} = \bar{a}_i\}}\end{aligned}\] for every \(i,j \in \{1,\dotsc,n\}\).

We assume that the initial condition \(Z_0\) and the disturbance process \(\{W_k\}_k\) are mutually independent random variables with finite \(m\)-th moment. The system \eqref{eq:mjls} is said to be weakly \(m\)-th moment stable if \[\label{eq:weak} \sup_{k} \operatorname{\sf E}( |Z_k|^m )< \infty.\]

Let \({{\mathrm{A}}} \in \mathbb{R}_+^{n\times n}\) be the diagonal matrix with diagonal entries \( |\bar{a}_1|^m , \dots , |\bar{a}_n|^m\), i.e., \[\begin{aligned} \label{eq:Adiag} {{\mathrm{A}}} = \text{diag}(|\bar{a}_1|^m , \dots , |\bar{a}_n|^m).\end{aligned}\] The following lemma states the necessary and sufficient condition for \(m\)-th moment stability of the system \eqref{eq:mjls}.

\label{lemmamjls} For any \(m \in \mathbb{R}^+\), the MJLS \eqref{eq:mjls} is weakly \(m\)-th moment stable if and only if \[\label{eq:pp} \rho({{\mathrm{P}}}^{\mathsf{T}}{{\mathrm{A}}})\lesssim 1.\]

The symbol “\(\lesssim\)” indicates that while the necessary condition holds with a weak inequality, the sufficient condition requires a strict inequality.

The sufficient condition is obtained by subsampling the original MJLS every \(\tau\) time steps and then showing the weak stability of the subsampled system \(Y_k = Z_{k \tau}\), \(k=0,1,\cdots\). The weak stability of \(\{Y_k\}_k\) is sufficient to ensure that \eqref{eq:weak} holds, because the \(m\)-th moment of the system cannot grow unbounded in any finite time horizon, due to the assumptions that \(|\bar{a}_1| , \dots , |\bar{a}_n|\) are finite and that \(\sup_k \mathbb{E}(|W_k|^m) < \infty\).

By iterating \eqref{eq:mjls} \(\tau\) times, it can be seen that the subsampled process is a non-homogeneous MJLS with dynamics \[\begin{aligned} \label{eq:sub} y_{k+1} = b_{k} y_k + c_k,\end{aligned}\] where for every \(k \geq 0\) we define \(B_k = \prod_{j= k \tau}^{(k+1)\tau-1} A_j\) as the random expansion of the open-loop system during the \(k\)-th sampling interval and \(C_k = \sum_{j=0}^{\tau-1} \bigl( \prod_{i=j+1}^{\tau-1} A_{k \tau+i} \bigr) W_{k \tau + j}\) as the total disturbance entering the system in the \(k\)-th sampling interval. Notice that \(\sup_k \mathbb{E}(|C_k|^m) < \infty\), since by assumption the disturbance process \(\{W_k\}_k\) has bounded \(m\)-th moment.

By taking the \(m\)-th power on both sides of \eqref{eq:sub} and by applying the inequality \((x+y)^m \le 2^m ( x^m + y^m )\), which holds for any \(m>0\), it follows that \[\begin{aligned} \label{eq:0} \mathbb{E}(|Y_{k+1}|^m) & \le 2^m \mathbb{E}(| B_k|^m |Y_k |^m) + \text{const},\end{aligned}\] where \(\text{const}\) represents a uniform bound on the \(m\)-th moment of \(\{C_k\}_k\). Let \(v_{k,i} = \mathbb{E}( |Y_{k+1}|^m 1_{\{A_{k} = \bar{a}_i\}} )\), \(i=1,\dotsc,n\). By repeatedly applying the law of total expectation on both sides of \eqref{eq:0}, it can be easily seen that the vector \(v_k=(v_{k,1},\ldots,v_{k,n})^T\) satisfies the following component-wise vector inequality \[v_{k+1} \leq 2^m \bigl( {{\mathrm{P}}}^{\mathsf{T}}{{\mathrm{A}}}\bigr)^\tau v_{k} + c,\] where \(c\) is again a constant that only depends on the statistics of the disturbance process. If \eqref{eq:pp} holds with strict inequality, we can choose the subsampling period \(\tau\) large enough to ensure that \(\rho(2^m \bigl( {{\mathrm{P}}}^{\mathsf{T}}{{\mathrm{A}}}\bigr)^\tau) <1\) and thus that the above recursion remains bounded. Since \(\operatorname{\sf E}(|Y_{k+1}|^{m}) = \sum_{i=1}^{n}v_{k,i} \), it follows that \(\rho({{\mathrm{P}}}^{\mathsf{T}}{{\mathrm{A}}}) < 1\) is sufficient to ensure that \(\sup_{k} \operatorname{\sf E}(|Y_k|^m) < \infty\), as desired.

The proof of the necessary condition relies on the fact that the non-homogeneous MJLS \(\{Z_k\}_k\) is weakly stable only if the homogeneous MJLS obtained by setting \(w_k=0\) in \eqref{eq:mjls} is weakly stable. Therefore, we focus on the homogeneous setting wherein \(w_k=0\) for all \(k\) and let \(v_{k,i} = \mathbb{E}( |Z_{k+1}|^m 1_{\{A_{k} = \bar{a}_i\}} )\) for \(i=1,\dotsc,n\). By \eqref{eq:mjls} and the law of total expectation, the vector \(v_k=(v_{k,1},\ldots,v_{k,n})^T\) evolves over time according to the linear system \[v_{k+1} = {{\mathrm{P}}}^{\mathsf{T}}{{\mathrm{A}}} v_{k}.\] Since \(\operatorname{\sf E}(|Z_{k+1}|^{m}) = \sum_{i=1}^{n}v_{k,i} \), we conclude that \eqref{eq:weak} can hold only if the above linear system is stable, and therefore only if \(\rho({{\mathrm{P}}}^{\mathsf{T}}{{\mathrm{A}}}) \leq 1\), as claimed.

Theorem \ref{lemmamjls} extends the well-known conditions for second moment stability given in \cite{Costa-et-al-2004} to \(m\)-th moment stability. A similar result appears in \cite[Theorem 3.2]{fang--etal1994}, limited to the special case of a homogeneous MJLS driven by an i.i.d. rate process.
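In computational terms, the test in Theorem \ref{lemmamjls} is a single spectral-radius evaluation. The following minimal sketch (ours; the matrix \({{\mathrm{P}}}\) and the mode gains \(\bar{a}_i\) are illustrative values, not taken from the paper) applies it for several values of \(m\):

```python
import numpy as np

# Sketch of the test in Theorem (lemmamjls): the MJLS z_{k+1} = a_k z_k + w_k
# is weakly m-th moment stable iff rho(P^T A) < 1 (strict inequality for
# sufficiency), where A = diag(|a_1|^m, ..., |a_n|^m).
# P and abar below are illustrative values, not from the paper.

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def mjls_test(P, abar, m):
    A = np.diag(np.abs(abar) ** m)
    return spectral_radius(P.T @ A)

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])       # hypothetical transition matrix
abar = np.array([0.5, 1.8])      # hypothetical mode gains

for m in (1.0, 2.0, 4.0):
    rho = mjls_test(P, abar, m)
    print(f"m = {m}: rho(P^T A) = {rho:.4f} ({'stable' if rho < 1 else 'unstable'})")
```

In this illustrative instance the test passes for \(m=1\) but fails for \(m=2\), showing that stability depends on the moment considered.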

Moment Stabilization over Markov Channels

\label{sec:scalar} Results on the stability of MJLS are used to characterize the stability of linear dynamical systems where the estimated state is sent to the controller over a digital communication link whose state is described by a Markov process, as depicted in Fig. \ref{fig:scheme}.

Feedback loop considered in the paper: the plant state is estimated and encoded for transmission over the digital channel, whose state information \(R_k\) is available causally at both the encoder and the decoder; the decoder feeds the controller. (Block diagram: Plant, Estimator, Encoder, Channel, Decoder, Controller.)

\label{fig:scheme}

System Model

Consider the scalar dynamical system \[\begin{aligned} \label{eq:scalsys} x_{k+1} & =\lambda x_k+u_k+v_k, \nonumber \\ y_k &=x_k+w_k,\end{aligned}\] where \(k \in \mathbb{N}\) and \({|\lambda |}\geq1\). The variable \(x_k\) represents the state of the system, \(u_k\) the control input, \(v_k\) an additive stochastic disturbance, \(y_k\) the sensor measurement, and \(w_k\) the measurement noise. The disturbance and the noise are independent of each other and of the initial condition \(x_0\). They are also independent of the channel state, as defined below.

Channel Model

The state observer is connected to the actuator through a noiseless digital communication link that at each time \(k\) allows transmission without errors of \(R_k\) bits. The rate process \(\{R_k\}_{k\geq 0}\) is modeled as a homogeneous positive-recurrent Markov process defined on the finite set \[\begin{aligned} \label{eq:support} \mathcal{R} = \{\bar{r}_1,\ldots,\bar{r}_n\},\end{aligned}\] for some integers \(0 \le \bar{r}_1 < \dots < \bar{r}_n\), and with one-step transition probability matrix \({{\mathrm{P}}}\) having entries \[\begin{aligned} \label{eq:P} p_{ij} = { {\sf P}\{R_k = \bar{r}_j| R_{k-1} = \bar{r}_i\}}\end{aligned}\] for every \(i,j \in \{1,\dotsc,n\}\). In the sequel, we define \({{\mathrm{R}}} \in \mathbb{Z}_+^{n\times n}\) as the diagonal matrix with diagonal entries \( \bar{r}_1 , \dots , \bar{r}_n\), i.e., \[\begin{aligned} \label{eq:Rdiag} {{\mathrm{R}}} = \text{diag}(\bar{r}_1 , \dots , \bar{r}_n).\end{aligned}\] The encoder and the decoder are assumed to have causal knowledge of the rate process.

This noiseless digital link corresponds to a discrete memoryless channel with Markov state available causally at both the encoder and the decoder. A channel with state is defined by a triple \((\mathcal{X}\times \mathcal{S},p(y|x,s),\mathcal{Y})\) consisting of an input set \(\mathcal{X}\), a state set \(\mathcal{S}\), an output set \(\mathcal{Y}\), and a transition probability \(p(y|x,s)\) for every \(x \in \mathcal{X}\), \(s \in \mathcal{S}\), and \(y \in \mathcal{Y}\). The channel is memoryless if the output \(y_k\) at time \(k\) is conditionally independent of everything else given \((x_k,s_k)\). The state sequence is Markov if \(S_0,S_1,\dotsc\) forms a Markov chain. According to these definitions, our channel model is a discrete memoryless channel with Markov state \((\mathcal{X}\times \mathcal{S},p(y|x,s),\mathcal{Y})\) with \(\mathcal{X}=\mathcal{Y}=\{1,\dotsc,2^{\bar{r}_n}\}\), \(\mathcal{S} = \{\bar{r}_1,\cdots,\bar{r}_n\}\), \[\begin{aligned} p(y|x,s) = \begin{cases} 1 & x=y \text{ and } x \le 2^{s} \\ 0 & \text{otherwise,} \end{cases}\end{aligned}\] and state transition probabilities \[\begin{aligned} p(s_{k+1}= \bar{r}_j|s_k = \bar{r}_i) = p_{ij}.\end{aligned}\]

The Shannon capacity of this channel is \cite{GoldsmithVaraiya1996} \[\begin{aligned} C = \sum_{i=1}^n \pi_i \bar{r}_i, \label{eq:shanc}\end{aligned}\] where \((\pi_1,\dotsc,\pi_n)\) denotes the unique stationary distribution of \({{\mathrm{P}}}\).

The zero-error capacity of this channel is \cite{Shannon1956} \[\begin{aligned} C_{0} = \bar{r}_1. \label{eq:shanz}\end{aligned}\]
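Both limiting values can be read off the transition matrix directly. The following minimal sketch (ours, with illustrative values of \({{\mathrm{P}}}\) and \(\mathcal{R}\) not taken from the paper) evaluates \eqref{eq:shanc} and \eqref{eq:shanz}:

```python
import numpy as np

# Sketch: Shannon capacity (eq:shanc) and zero-error capacity (eq:shanz)
# of the Markov bit-pipe. P and rbar are illustrative values.

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
rbar = np.array([1, 3])           # rate levels in bits

pi = stationary(P)
C = float(pi @ rbar)              # C = sum_i pi_i rbar_i
C0 = rbar.min()                   # C0 = rbar_1
print(C, C0)                      # here pi = (0.6, 0.4), so C = 1.8, C0 = 1
```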

Next, we show that the capacities in (\ref{eq:shanc}) and (\ref{eq:shanz}) are the limiting values of a stability threshold function indicating the channel’s rate-reliability constraint required to achieve a given level of stabilization. As \(m \rightarrow \infty\), corresponding to a more stringent stability requirement, the stability threshold function tends to the zero-error capacity, which imposes the hard reliability constraint of no decoding error. Conversely, as \(m \rightarrow 0\), corresponding to a weaker stability requirement, the stability threshold function tends to the Shannon capacity, which imposes only the soft reliability constraint of a vanishing probability of error.

Stability Threshold Function

The system \eqref{eq:scalsys} is \(m\)-th moment stable if \[\label{eq:sta} \sup_k \operatorname{\sf E}[|X_k|^m]<\infty,\] where the expectation is taken with respect to the random initial condition \(x_0\), the additive disturbance \(v_k\), and the rate process \(R_k\). The following theorem establishes the equivalence between the \(m\)-th moment stability of \eqref{eq:scalsys} and the weak \(m\)-th moment stability of a suitably defined MJLS.

\label{thm1} There exists a control scheme that stabilizes the scalar system \eqref{eq:scalsys} in the \(m\)-th moment sense if and only if the scalar non-homogeneous MJLS with dynamics \[\label{eq:mjls2} z_{k+1}=\frac{|\lambda|}{2^{r_k}} z_k+ \text{const},\] for some suitably defined constant \(\text{const}\), is weakly \(m\)-th moment stable, i.e., if and only if \[\log |\lambda | \lesssim -\frac{1}{m}\log \rho({{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}}) \triangleq R(m). \label{eq:Rm}\]

The proof is given in the appendix assuming the disturbance has bounded support. This assumption is made for ease of presentation and to compare our results with the ones on the anytime capacity, which only apply to plants with bounded disturbance \cite{Sahai2001}. The extension to unbounded disturbance can be easily obtained using standard, but more technical, adaptive encoding schemes described in \cite{Nair-Evans-2004,Minero-etal-2009,Coviello-etal2012}. We mention several properties of the threshold function \(R(m)\), whose proofs are given in the appendix; a numerical sketch illustrating the limiting behavior of \(R(m)\) follows the proposition.

\label{prop1} The following holds:

  1. Monotonicity: \(R(m)\) is continuous and strictly decreasing for \(m > 0\).

  2. Convergence to the Shannon capacity: \[\label{eq:fact2} \lim_{m \to 0} R(m) = \sum_{i=1}^n \pi_i \bar{r}_i = C.\]

  3. Convergence to the Zero Error capacity: \[\label{eq:fact3a} R(m) \sim \bar{r}_1 - \frac{1}{m} \log p_{11}, \quad \mbox{as } m \to \infty,\] and hence \[\label{eq:fact3b} \lim_{m\to \infty} R(m) = \bar{r}_1 = C_0.\]

  4. Sensitivity with respect to self-loop probabilities: \[\label{eq:fact4} \frac{d R(m)}{ d p_{ii}} = - \frac{2^{-m \bar{r}_{i}}}{m \ln(2)\,\rho({{\mathrm{P}}}^{\mathsf{T}}2^{-m{{\mathrm{R}}}})} \; \frac{| {{\mathrm{D}}}(i) |}{\sum_{j=1}^n |{{\mathrm{D}}}(j)|} < 0,\] where \({{\mathrm{D}}}:=\rho({{\mathrm{P}}}^{\mathsf{T}}2^{-m{{\mathrm{R}}}}) {{\mathrm{I}}} - {{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}}\), \({{\mathrm{I}}}\) denotes the \(n\times n\) identity matrix, and \(|{{\mathrm{D}}}(j)|\) is the determinant of the matrix obtained by eliminating the \(j\)th row and the \(j\)th column from \({{\mathrm{D}}}\). We also have the asymptotic behavior \[\label{eq:fact4b} \frac{d R(m)}{ d p_{11}} \sim - \frac{1}{m p_{11} \ln(2)} \mbox{ as } m \to \infty.\]

  5. The function \(mR(m)\) is nonnegative, strictly increasing, and strictly concave. If \(\bar{r}_1 \neq 0\), then \(mR(m)\) grows unbounded as \(m\to \infty\). If instead \(\bar{r}_1 =0\), then \[\lim_{m \to \infty}mR(m) = - \log p_{11}.\]
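As anticipated, the following minimal sketch (ours, with illustrative values of \({{\mathrm{P}}}\) and \(\mathcal{R}\)) evaluates \(R(m)\) directly from \eqref{eq:Rm} and exhibits the two limits in properties 2) and 3):

```python
import numpy as np

# Sketch of R(m) = -(1/m) log2 rho(P^T 2^{-mR}) from (eq:Rm), together with
# its limits: R(m) -> C as m -> 0 and R(m) -> rbar_1 as m -> infinity.
# P and rbar are illustrative values, not from the paper.

def R_of_m(P, rbar, m):
    D = np.diag(2.0 ** (-m * np.asarray(rbar, dtype=float)))
    rho = np.max(np.abs(np.linalg.eigvals(P.T @ D)))
    return -np.log2(rho) / m

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
rbar = [1, 3]

print(R_of_m(P, rbar, 1e-6))   # ~ 1.8 = C, the stationary average rate
print(R_of_m(P, rbar, 200.0))  # ~ 1 + (1/m) log2(1/p11) -> rbar_1 = 1
```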

Anytime Capacity of Markov Channels

We relate the stability threshold function \(R(m)\) to the anytime capacity. For the given Markov channel, \(R(m)\) provides a parametric representation of the anytime capacity in terms of the system’s stability level \(m\).

The anytime capacity is defined in the following context \cite{Sahai-Mitter2006}. Consider a system for information transmission that allows the decoding time to be infinite, and that improves the reliability of the estimated message as time progresses. More precisely, at each step \(k\) in the evolution of the plant a new message \(m_k\) of \(r\) bits is generated that must be sent over the channel. The encoder sends a symbol over the channel at each time \(k\), and upon reception of the new symbol the decoder updates the estimates for all messages up to time \(k\). It follows that at time \(k\) messages \[m_0,m_1,\dotsc,m_k \nonumber\] are considered for estimation, while estimates \[\hat{m}_{0|k},\hat{m}_{1|k},\dotsc,\hat{m}_{k|k} \nonumber\] are constructed, given all the symbols received up to time \(k\). Hence, the processing operation for any message \(m_i\) continues indefinitely for all \(k \ge i \). A reliability level \(\alpha\) is achieved in the given transmission system if for all \(k\) the probability that there is at least one incorrect estimate among the messages up to time \(d\) decays \(\alpha\)-exponentially with the decoding delay \(k-d\), namely for all \(d \leq k\) \[\label{eq:relia} { {\sf P}\{ (\hat{M}_{0|k},\dotsc,\hat{M}_{d|k}) \ne (M_0,\dotsc,M_d) \}} = O( 2^{-\alpha (k-d) }) .\] The described communication system is characterized by a rate-reliability pair \((r, \alpha)\). The work in \cite{Sahai-Mitter2006} has shown that for scalar systems the ability to achieve stability depends on the ability to construct such a communication system, in terms of achievable coding and decoding schemes, with a given rate-reliability constraint.

To state this result in the context of our Markov channel, let the \(\alpha\)-anytime capacity \(C_A(\alpha)\) be the supremum of the rate \(r\) that can be achieved with reliability \(\alpha\). The problems of \(\alpha\)-reliable communication and \(m\)-th moment stabilization of a scalar system over a Markov channel are then equivalent in the sense of the following theorem.

\label{thm2} The necessary and sufficient condition for \(m\)-th moment stabilization of a scalar system with bounded disturbances and in the presence of channel output feedback over a Markov channel is \[\log |\lambda| \lesssim C_A(m \log |\lambda|). \label{anytime2a}\]

The anytime capacity is an intermediate notion between the zero-error capacity and the Shannon capacity. The zero-error capacity requires transmission without error. The Shannon capacity requires the decoding error probability to vanish as the length of the code increases. In the presence of disturbances, almost sure stability of the system can be guaranteed only if the zero-error capacity exceeds a critical value \cite{matveed-savkin2007b}. On the other hand, for scalar systems in the presence of bounded disturbances, stabilization in the weaker \(m\)-th moment sense can be guaranteed whenever the anytime capacity exceeds a critical value.

By combining Theorem \ref{thm1} and Theorem \ref{thm2}, we obtain the following result.

The following holds:

  1. Parametric characterization of the anytime capacity: For every \(m >0\), the anytime capacity \(C_A\) satisfies \[C_A\bigl(m R(m) \bigr) = R(m), \label{anytime2}\] i.e., for every \(\alpha \ge 0\), there exists a unique \(m(\alpha)\) such that \[\label{eq:par} m(\alpha) R\bigl( m(\alpha) \bigr) = \alpha\] and \[\label{anytime2b} C_A(\alpha) = R\bigl( m(\alpha) \bigr) = \frac{\alpha}{ m(\alpha)}.\]

  2. \(C_A(\alpha)\) is a strictly decreasing function of \(\alpha > 0\).

  3. Convergence to the Shannon capacity: \[\label{eq:fact2a} \lim_{\alpha \to 0} C_A(\alpha) = \sum_{i=1}^n \pi_i \bar{r}_i = C.\]

  4. Convergence to the Zero Error capacity: If \(\bar{r}_1 = 0\), then for every \(\alpha \ge \log(1/p_{11})\) \[\label{eq:fact3a1} C_A(\alpha) = 0 = C_0.\] Conversely, if \(\bar{r}_1 \neq 0\), then \(C_A(\alpha)\) has unbounded support and \[\label{eq:fact3a2} C_A(\alpha) \sim \bar{r}_1 \; \frac{\alpha}{\alpha- \log(1/p_{11}) } , \quad \text{ as } \alpha \to \infty,\] hence \[\label{eq:fact3b1} \lim_{\alpha \to \infty} C_A( \alpha) = \bar{r}_1 = C_0.\]

By Theorem \ref{thm1} and Theorem \ref{thm2}, at the boundaries of the stability region we must have that \[\label{eq:any1} \log |\lambda| = R(m)\] and \[\label{eq:any2} \log |\lambda| = C_A(m \log |\lambda|).\] Therefore, \eqref{anytime2} follows by combining \eqref{eq:any1} and \eqref{eq:any2}. Next, by Proposition \ref{prop1}, the function \(\phi(m)=mR(m)\) is strictly increasing, thus invertible. It follows that for every \(\alpha \ge 0\), there exists a unique \(m := m(\alpha)\) such that \[m R(m) = \alpha.\] Substituting this equality into \eqref{anytime2}, it follows that \(C_A(\alpha) = R\bigl( m(\alpha) \bigr)\), as claimed. The remaining properties are immediate consequences of Proposition \ref{prop1}. By definition, \(C_A(\alpha)\) is nonincreasing in \(\alpha\). Since \(mR(m)\) and \(R(m)\) are both strictly monotonic functions of \(m\), it follows that \(C_A(\alpha)\) must be strictly decreasing in \(\alpha\), thus establishing 2). Property 3) follows from \eqref{anytime2} combined with the fact that \(mR(m) \to 0 \) and \(R(m) \to C\) as \(m \to 0\). Similarly, property 4) follows directly by combining \eqref{anytime2} with properties 3) and 5) in Proposition \ref{prop1}.
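Numerically, the characterization above amounts to inverting the strictly increasing function \(\phi(m)=mR(m)\). A minimal sketch (ours; bisection on \(\phi\), with the same illustrative channel values used earlier):

```python
import numpy as np

# Sketch of (anytime2b): solve m(alpha) R(m(alpha)) = alpha by bisection
# (phi(m) = m R(m) is strictly increasing), then C_A(alpha) = alpha/m(alpha).
# P and rbar are illustrative values, not from the paper.

def R_of_m(P, rbar, m):
    D = np.diag(2.0 ** (-m * np.asarray(rbar, dtype=float)))
    rho = np.max(np.abs(np.linalg.eigvals(P.T @ D)))
    return -np.log2(rho) / m

def anytime_capacity(P, rbar, alpha, m_max=500.0, iters=200):
    phi = lambda m: m * R_of_m(P, rbar, m)
    if phi(m_max) <= alpha:          # alpha outside the support of C_A
        return 0.0
    lo, hi = 1e-9, m_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < alpha else (lo, mid)
    return alpha / hi                # C_A(alpha) = alpha / m(alpha)

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
rbar = [1, 3]
print([round(anytime_capacity(P, rbar, a), 3) for a in (0.1, 1.0, 5.0)])
```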

We now give some representative examples of the stability threshold function, visually showing its extremal properties and its relationship to the anytime capacity.

\label{ex1} Let \(n=4\), \(\mathcal{R}=\{1,3,4,5\}\), and let \({{\mathrm{P}}}\) be the \(4 \times 4\) doubly stochastic matrix with first row equal to \(\frac{1}{16}\left(1,13,1,1\right)\), namely \[\label{eq:exP} {{\mathrm{P}}}= \frac{1}{16} \begin{bmatrix} 1 & 13 & 1 & 1 \\[0.3em] 13 & 1 & 1 & 1 \\[0.3em] 1 & 1 & 1 & 13 \\[0.3em] 1 & 1 & 13 & 1 \end{bmatrix}.\] In this case it is easy to compute \(C= \frac{1}{4}(1+3+4+5) = \frac{13}{4}\) and \(C_0=1\). Figure \ref{fig:t2} plots the stability threshold function \(R(m)\) (together with its asymptotic approximation) and the anytime capacity \(C_A(\alpha)\). Both curves have the same shape, and they are in fact related by an affine transformation as \(m\) grows. Furthermore, both curves have unbounded support and tend to one at infinity. There is a change of convexity for small values of \(m\) and \(\alpha\), an indication that \(R(m)\) and \(C_A(\alpha)\) are generally not convex functions. In contrast, the function \(\phi(m)=m R(m)\), reported in red in the top plot of Figure \ref{fig:t2}, is strictly concave and increasing.

\label{fig:t2}
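The claims in this example can be checked numerically; a minimal sketch using the matrix in \eqref{eq:exP}:

```python
import numpy as np

# Numerical check of Example (ex1): the matrix of (eq:exP) with rates
# {1,3,4,5}. Expected: R(m) -> C = 13/4 as m -> 0, and R(m) -> C_0 = 1 as
# m -> infinity, following the asymptote 1 + 4/m (since -log2 p11 = 4).

P = np.array([[1, 13, 1, 1],
              [13, 1, 1, 1],
              [1, 1, 1, 13],
              [1, 1, 13, 1]]) / 16.0
rbar = np.array([1, 3, 4, 5], dtype=float)

def R_of_m(m):
    D = np.diag(2.0 ** (-m * rbar))
    return -np.log2(np.max(np.abs(np.linalg.eigvals(P.T @ D)))) / m

print(R_of_m(1e-6))    # ~ 3.25 = C (P is doubly stochastic, pi uniform)
print(R_of_m(100.0))   # ~ 1.04, approaching C_0 = 1
```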

\label{ex2} Let \(\mathcal{R}=\{0,3,4,5\}\) and let \({{\mathrm{P}}}\) be as in \eqref{eq:exP}. The only difference with the previous example is that \(\bar{r}_1\) is now 0 instead of 1. In this case it is easy to compute \(C= \frac{1}{4}(0+3+4+5) = 3\) and \(C_0=0\). Figure \ref{fig:t1} plots the stability threshold function \(R(m)\) (together with its asymptotic approximation) and the anytime capacity \(C_A(\alpha)\). In this case, while \(R(m)\) has unbounded support, \(C_A(\alpha)\) is zero for all \(\alpha \ge - \log p_{11} = \log 16 = 4\). This occurs because the function \(\phi(m)=m R(m)\) saturates as \(m \rightarrow \infty\), tending to the limiting value \(- \log p_{11}=4\).

\label{fig:t1}

When viewed together, the two examples above show that for some channels, like the one in Example \ref{ex2}, a communication system with an arbitrary rate-reliability pair \((r,\alpha)\) cannot be constructed, because the anytime capacity may have bounded support and drop abruptly to zero. However, in order to achieve \(m\)-th moment stabilization it is sufficient to consider the parametric function \(R(m)=C_A(m R(m))\) and construct a communication system whose reliability level depends on the desired stabilization level. It follows that we do not need to compute the whole anytime capacity if we are interested only in moment stabilization, and we may be content with determining the threshold function \(R(m)\) corresponding to its parametric representation. The extremal properties of \(R(m)\) determine the support of the anytime capacity, i.e., the set of achievable reliability levels \(\alpha\). If \(R(m)=O(1/m)\), then the anytime capacity has support bounded by the constant implied by the asymptotic order. On the other hand, if \(R(m)\) decreases to zero at most sub-linearly, or tends to a constant zero-error capacity, then the anytime capacity has unbounded support and any reliability level \(\alpha\) can be achieved.

The Markov Erasure Channel

We use the stability threshold function to characterize the anytime capacity of the Markov erasure channel. The Markov erasure channel corresponds to a two-state Markov rate process where \(n=2\), \(\mathcal{R}=\{0,\bar{r}\}\), \(p_{12} = q\), and \(p_{21}=p\), with \(0<p,q<1\). In this case, \[\begin{aligned} \label{eq:bec} {{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}}= \begin{pmatrix} (1-q) & \frac{1}{2^{m\bar{r}}}p\\ q & \frac{1}{2^{m\bar{r}}}(1-p)\\ \end{pmatrix},\end{aligned}\] and we have the following result.

The anytime capacity of the Markov Erasure Channel is \[\label{eq:anyMarkov} C_{A}(\alpha) = \frac{ \alpha \bar{r}}{ \alpha + \log_2 \left(\frac{ 1-p - 2^{\alpha}(1-p-q)}{1-(1-q)2^{\alpha}}\right) },\] if \(0\le \alpha < -\log_2 (1-q) \), and 0 otherwise.

By \eqref{eq:par} and the definition of \(R(m)\), for every \(\alpha\) there exists an \(m(\alpha)\) such that \[\label{eq:bec2a} \rho({{\mathrm{P}}}^{\mathsf{T}}2^{-m(\alpha) {{\mathrm{R}}}}) = 2^{-\alpha}.\] In the case of a Markov erasure channel, a simple calculation shows that \[\label{eq:bec2} \rho({{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}})=\frac{\text{tr}}{2}+\frac{1}{2}\sqrt{\text{tr}^2-4\text{det}},\] where \(\text{tr}\) and \(\text{det}\) denote the trace and determinant of \eqref{eq:bec}, respectively. By combining \eqref{eq:bec2a} and \eqref{eq:bec2} and squaring both sides of the resulting equation, it follows that \(m(\alpha)\) must satisfy \[\label{eq:bec3} 2^{-\alpha}\, \text{tr} - \text{det} = 2^{-2\alpha},\] where \(\text{tr} = (1-q) +2^{-m(\alpha) \bar{r}}(1-p)\) and \(\text{det} = 2^{-m(\alpha) \bar{r}} (1-p-q)\). Solving \eqref{eq:bec3} yields \[\label{bec4} m(\alpha) = \frac{1}{ \bar{r}} \left( \alpha + \log_2 \left(\frac{ 1-p - 2^{\alpha}(1-p-q)}{1-(1-q)2^{\alpha}}\right) \right)\] for \(0\le \alpha < -\log_2 (1-q) \). Finally, substituting \eqref{bec4} into \eqref{anytime2b} yields \eqref{eq:anyMarkov}.
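As a numerical illustration, the closed form \eqref{eq:anyMarkov} can be evaluated directly; the sketch below (ours, with illustrative values of \(p\), \(q\), and \(\bar{r}\)) also recovers the Shannon capacity \(q\bar{r}/(p+q)\) as \(\alpha \to 0\):

```python
import numpy as np

# Sketch: closed-form anytime capacity (eq:anyMarkov) of the rbar-bit
# Markov erasure channel. p, q, rbar are illustrative values.

def CA_markov_erasure(alpha, p, q, rbar):
    if alpha <= 0 or alpha >= -np.log2(1 - q):
        return 0.0                  # outside the support of C_A
    num = 1 - p - 2.0**alpha * (1 - p - q)
    den = 1 - (1 - q) * 2.0**alpha
    return alpha * rbar / (alpha + np.log2(num / den))

p, q, rbar = 0.3, 0.6, 4
print(q / (p + q) * rbar)                   # Shannon capacity C = 8/3
print(CA_markov_erasure(1e-6, p, q, rbar))  # -> C as alpha -> 0
print(CA_markov_erasure(1.0, p, q, rbar))   # support ends at -log2(1-q)
```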

Special Cases

Several special cases recover previous results in the literature. By (\ref{eq:anyMarkov}) it follows that the anytime capacity of the binary erasure channel (BEC) with Markov erasures and with noiseless channel output feedback is \[C_{A}(\alpha) = \frac{ \alpha}{ \alpha + \log_2 \left(\frac{ 1-p - 2^{\alpha}(1-p-q)}{1-(1-q)2^{\alpha}}\right) }.\]

By letting \(q=1-p\), the erasure process becomes i.i.d. and we recover the anytime capacity of the memoryless BEC with erasure probability \(p\) derived by Sahai \cite[page 129]{Sahai2001} (in parametric form) and by Xu \cite[Theorem 1.3]{Xu2005} (in non-parametric form) \[\label{eq:xu} C_{A}(\alpha) = \frac{ \alpha }{ \alpha + \log_2 \left(\frac{ 1-p }{1-p2^{\alpha}}\right) }.\]

By (\ref{eq:anyMarkov}), letting \(\alpha \rightarrow 0\), we have that \[\lim_{\alpha \to 0} C_{A}(\alpha) = \frac{q}{p+q}\bar{r}= C.\] This recovers the Shannon capacity of an \(\bar{r}\)-bit erasure channel with Markov erasures and with noiseless channel output feedback.

In the case \(n=2\), \(\bar{r}_1=0\), \(\bar{r}_2=r\), and an i.i.d. rate process with \({ {\sf P}\{R_k = 0\}}= p_1\) and \({ {\sf P}\{R_k = r\}}= p_2\) for all \(k\), the stability condition becomes \[\begin{aligned} {|\lambda |}^m \left( p_1 + p_2 2^{-mr } \right)<1,\end{aligned}\] which provides a converse to the achievable scheme of Yüksel and Meyn \cite[Theorem 3.3]{Yuksel--Meyn2013}.

If we further let \(r~\to~\infty\), then the stability condition reduces to \(p_1<1/{|\lambda |}^m\) and only depends on the probability \(p_1\) of erasure. In this case, our condition generalizes the packet loss model result in \cite{Gupta-etal-2007}.
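The two conditions above are straightforward to evaluate numerically; a minimal sketch (ours, with illustrative values of \(m\), \(r\), and \(p_1\)):

```python
import numpy as np

# Sketch: i.i.d. two-state stability condition |lam|^m (p1 + p2 2^{-mr}) < 1,
# i.e., log2|lam| < R(m) = -(1/m) log2(p1 + p2 2^{-mr}).
# The values of m, r, p1 below are illustrative.

m, r = 2.0, 4
p1 = 0.1                       # probability of erasure (rate 0)
p2 = 1 - p1
Rm = -np.log2(p1 + p2 * 2.0 ** (-m * r)) / m
print(2.0 ** Rm)               # threshold on |lam| for m-th moment stability
# As r -> infinity, the condition reduces to p1 < 1/|lam|^m.
```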

Memoryless Channels

We show that the stability threshold function can be used to determine the anytime capacity of an arbitrary memoryless channel of given rate distribution, and we provide three relevant examples of this computation. Besides the memoryless erasure channel and the additive white Gaussian noise channel with input power constraint, these are the only cases in which the anytime capacity has been computed.

Consider the special case of an i.i.d. rate process \(R_k\) where \(R_k \sim R\) has probability mass function \(p_i={ {\sf P}\{R=\bar{r}_i\}}\), \(\bar{r}_i \in \mathcal{R}\). For \(t\) real, let \(M_R(t)= \mathbb{E}( e^{tR})\) denote the moment generating function of \(R\) and let \(M_R^{-1}(y)\) denote the inverse of \(M_R\), when it exists, i.e., \(M_R^{-1}(y) = t \) if and only if \(M_R(t)=y\).

We have the following result.

\label{thm:memo} The anytime capacity of a memoryless channel with rate distribution \(R\) is \[C_A(\alpha) = \frac{\ln 2^{-\alpha} }{M_R^{-1}( 2^{- \alpha})} \label{eq:mmf3}\] for \(\alpha < -\log p_1\) if \(\bar{r}_1 = 0\), where \(p_1 = {\sf P}\{R = 0\}\), or for any \(\alpha > 0 \) if \(\bar{r}_1 \neq 0\).

In the special case of an i.i.d. rate process, \[{{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}} =\bigl(p_1,\ldots,p_n \bigr)^\mathrm{T} \bigl(2^{-m \bar{r}_1},\dotsc,2^{-m \bar{r}_n}\bigr)\] is a rank-one matrix whose only nonzero eigenvalue is \(\sum_{i=1}^n p_i 2^{-m \bar{r}_i} = \mathbb{E}( 2^{-mR})\). Therefore, in this case \[\begin{aligned} R(m) & = -\frac{1}{m}\log \mathbb{E}( 2^{-mR}) \nonumber \\ & = -\frac{1}{m}\log M_R( -m \ln 2 ). \label{eq:mmf}\end{aligned}\] By combining \eqref{eq:par} and \eqref{eq:mmf}, we find that \(\alpha = -\log M_R( -m(\alpha) \ln 2 )\) and so \[\begin{aligned} m(\alpha) = -\frac{1}{\ln 2 } M_R^{-1}( 2^{- \alpha}), \label{eq:mmf2}\end{aligned}\] where \(M_R^{-1}(y)\) exists in light of property 5) in Proposition \ref{prop1}. Thus, by \eqref{anytime2b}, \eqref{eq:mmf3} follows.
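The inversion in \eqref{eq:mmf2} can also be carried out numerically when no closed form is available, which is how the curve for the uniform rate distribution in Fig. \ref{fig:t3} can be reproduced. A minimal sketch (ours; bisection on \(M_R\), with an illustrative uniform pmf):

```python
import numpy as np

# Sketch of Theorem (thm:memo): C_A(alpha) = ln 2^{-alpha} / M_R^{-1}(2^{-alpha})
# for an i.i.d. rate process. M_R is increasing in t, so its inverse can be
# computed by bisection. The uniform pmf below is an illustrative choice.

def mgf(t, rbar, pmf):
    return float(np.sum(pmf * np.exp(t * rbar)))

def CA_memoryless(alpha, rbar, pmf, t_min=-100.0, iters=200):
    y = 2.0 ** (-alpha)
    if mgf(t_min, rbar, pmf) >= y:    # alpha outside the support of C_A
        return 0.0
    lo, hi = t_min, 0.0               # M_R(0) = 1 >= y
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mgf(mid, rbar, pmf) < y else (lo, mid)
    t = 0.5 * (lo + hi)               # t = M_R^{-1}(2^{-alpha}) <= 0
    return -alpha * np.log(2) / t     # = ln 2^{-alpha} / M_R^{-1}(2^{-alpha})

rbar = np.arange(0, 6, dtype=float)   # uniform rates {0, ..., 5}
pmf = np.full(6, 1 / 6)
print([round(CA_memoryless(a, rbar, pmf), 3) for a in (0.5, 1.0, 2.0)])
```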

Theorem \ref{thm:memo} shows that when the channel is memoryless, the anytime capacity can be evaluated by computing the inverse of the moment generating function of \(R\), as illustrated in the next three examples.

\label{ex4a} Suppose that \(R\) is a binomial random variable with parameters \(k\) and \(1-p\). Then, \[M_R(t) = ( p + (1-p) e^{t})^k\] and \[M_R^{-1}(y) = \ln \frac{y^{1/k} - p}{ 1-p }, \quad y > p^k,\] and thus by \eqref{eq:mmf3} \[\begin{aligned} \label{eq:xu2} C_A(\alpha) = \frac{ \alpha }{ \alpha/k + \log_2 \left(\frac{ 1-p }{1-p2^{\alpha/k}}\right) }\end{aligned}\] for \(\alpha < - k \log p\). Notice that  \eqref{eq:xu2} recovers \eqref{eq:xu} in the special case \(k=1\), in which \(R\) is Bernoulli with parameter \(1-p\).

\label{ex4b} Suppose that \(R\) is a Poisson random variable with parameter \(\lambda\). Then, \[M_R(t) = e^{\lambda (e^{t}-1)}\] and \[M_R^{-1}(y) = \ln \left(1 + \tfrac{1}{\lambda} \ln y\right), \quad y > e^{-\lambda},\] and thus by \eqref{eq:mmf3} \[\begin{aligned} C_A(\alpha) = -\frac{ \alpha }{ \log( 1 -\alpha \ln 2/\lambda ) }\end{aligned}\] for \(\alpha < \lambda/\ln 2 \).

Notice that in both examples \(\bar{r}_1=0\), so the anytime capacity has bounded support. In the next example \(\bar{r}_1=1\), and thus the anytime capacity is defined for all \(\alpha>0\).

\label{ex4c} Suppose that \(R\) is a geometric random variable with parameter \(p\). Then, \[M_R(t) = \frac{p e^{t}}{ 1- (1-p)e^{t} }\] for \(t < - \ln(1-p)\), and \[M_R^{-1}(y) = \ln \frac{y}{ p + y(1-p) }, \quad y > 0,\] and thus \[\begin{aligned} C_A(\alpha) = \frac{\alpha }{ \log\bigl( (1-p) + p 2^{\alpha} \bigr) }, \quad \alpha > 0.\end{aligned}\]

Plots of the anytime capacity for different memoryless channels are given in Fig. \ref{fig:t3}. Plots for the Binomial, Poisson, and Geometric distributions correspond to the results in Examples \ref{ex4a}, \ref{ex4b}, and \ref{ex4c}. The plot for the uniform distribution is obtained by numerically inverting the moment generating function of the rate process.

Comparison of the anytime capacity for different memoryless channels. For the uniform distribution the plot is obtained numerically.

\label{fig:t3}

Conclusion

We presented a parametric characterization of the anytime capacity in terms of a threshold function, depending on \(m\) and on the channel’s parameters, for the \(m\)-th moment stabilization of a scalar linear system controlled over a Markov time-varying digital feedback channel. This parametrization allows an explicit computation of the anytime capacity of the \(r\)-bit Markov erasure channel, and of Binomial, Poisson, and Geometric memoryless rate processes. It also provides a general strategy to compute the anytime capacity of arbitrary memoryless channels of given rate distribution, based on the inversion of the moment generating function of the process. The operational interpretation of the stability threshold function is that of an achievable communication rate, subject to a reliability constraint; its extreme values are the Shannon capacity, corresponding to a soft reliability constraint, and the zero-error capacity, corresponding to a hard reliability constraint. It follows that a more stringent stability requirement for the system, expressed in terms of a higher moment, translates into a harder reliability constraint for communication.

Our technical derivation relies on some extensions of the theory of MJLS. We provide the generalization of the necessary and sufficient conditions for second moment stability of MJLS to \(m\)-th moment stability. A commonly encountered notion of stochastic stability in the MJLS literature \cite{Costa-et-al-2004} is the one of strong moment stability, according to which the MJLS \eqref{eq:mjls} is stable if there exists a finite \(\alpha_m \in \mathbb{R}\) such that \( \operatorname{\sf E}[ |Z_k|^m ] \to \alpha_m \text{ as } k \to \infty. \) Clearly, if a MJLS is strongly stable, then it is also weakly stable according to our definition. It can be verified that our proof of the necessary condition in Theorem \ref{lemmamjls} extends to the case of strong moment stability, thereby establishing the necessity of \eqref{eq:pp} for the strong moment stability of \eqref{eq:mjls}. In contrast, the subsampling technique used in the direct part of Theorem \ref{lemmamjls} is not suitable to prove strong moment stability, because it cannot prevent undesirable fluctuations in the system dynamics between any two sampling periods. In \cite{minero-etal-2013} we presented a different proof technique based on the binomial expansion of \((a_k z_k+w_k)^m\) which can be used to prove that \eqref{eq:pp} is a sufficient condition for the strong moment stability of \eqref{eq:mjls}. This approach however requires \(m\) to be an integer, as opposed to the subsampling technique, which can be applied to any \(m>0\).

Finally, our results extend, at the expense of a more technical treatment, to vector systems, using the bit-allocation techniques outlined in \cite{Minero-etal-2009,Coviello-etal2012}, and to disturbances of unbounded support, using standard adaptive schemes \cite{Nair-Evans-2004,Minero-etal-2009,Coviello-etal2012}.

Auxiliary Results

We state a series of auxiliary results that are needed in the rest of the paper. We begin with a maximum entropy result which generalizes the well-known result that the second moment of a continuous random variable \(X\) is lower bounded by its entropy power \(\frac{1}{2\pi e}e^{2h(X)}\) to the case of the \(L^m\)-norm of a continuous random vector.

\label{lem:vme} Let \({X}\) be a continuous \(n\)-dimensional vector-valued random variable with \(\mathbb{E}[||{{X}}||_m^m]<\infty\), where \(m\) is a positive real number. Let \(\Vert { X} \Vert_m = (\sum_i |X_i|^m)^{1/m}\) denote the \(L^m\)-norm of \({ X}\). Then, \[\label{ref:eqqq} \mathbb{E}[||{{X}}||_m^m] \ge \frac{n}{c_m} e^{ \frac{ m }{ n } h( { X} )}\] where \(c_m = 2^m m^{1-m} e \left(\Gamma(\frac{1}{m})\right)^m\) and \(\Gamma(x) = \int_{0}^{\infty} e^{-t}t^{x-1}dt\). Equality holds if and only if \(X_1, \cdots, X_n\) are i.i.d. with common density \[\label{eq:unde} f_{X}(x)=\frac{\exp\left(-\frac{|x|^m}{m \mu^m}\right)}{2 m^{(1-m)/m} \mu \Gamma\left(\frac{1}{m}\right)}, \quad x \in \mathbb{R},\] for some \(\mu > 0\).

First, notice that by the maximum entropy theorem \cite[Theorem 12.1.1]{Cover-Thomas2006}, the unique density \(f_X(x)\) that maximizes the differential entropy \(h(X)\) over all probability densities \(f\) having \(m\)-th moment equal to \(|X|^m\) is given in \eqref{eq:unde}. Notice that if \(X \sim f_X(x)\), then \(h(X) = \frac{1}{m} \log[c_m \mathbb{E}[|X|^m]]\). Next, by using the fact that conditioning reduces the entropy, \[\begin{aligned} h({{X}}) &\leq \sum_{i=1}^{n} h(X_i) \\ &\leq \sum_{i=1}^{n} \frac{1}{m} \log[c_m \mathbb{E}[|X_i|^m]]\\ &\leq \frac{n}{m} \log\left[ { \frac{c_m}{n} \sum_{i=1}^{n} \mathbb{E}[|X_i|^m] } \right] \\ &= \frac{n}{m} \log \left[{ \frac{c_m}{n} \mathbb{E}[||{{X}}||_m^m] }\right]\end{aligned}\] where the second inequality follows from the maximum entropy theorem applied to each random variable \(X_i\), while the last inequality follows from Jensen’s inequality.

It should be remarked that inequality \eqref{ref:eqqq} with \(n=1\) is a special case of a result proved in \cite{lutwav-etal2005} relating the \(m\)-th moment and the Rényi entropy of a random variable.
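As a consistency check, setting \(m=2\) and \(n=1\) in \eqref{ref:eqqq} recovers the classical bound mentioned at the beginning of this appendix: \[ c_2 = 2^2 \cdot 2^{1-2} \cdot e \, \Gamma\bigl(\tfrac{1}{2}\bigr)^2 = 2\pi e, \qquad \text{so that} \qquad \operatorname{\sf E}[|X|^2] \ge \frac{1}{2\pi e}\, e^{2h(X)}, \] with equality if and only if \(X\) is a zero-mean Gaussian random variable.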

Next, we state a known result on the log-convexity of the spectral radius of a nonnegative matrix which is needed to prove some of the properties of \(R(m)\).

Let \(\mathcal{D}_n\) be the set of \(n\times n\) real-valued diagonal matrices. Let \({{\mathrm{A}}}\) be a fixed \(n\times n\) non-negative matrix having a positive spectral radius. Define \(\phi: \mathcal{D}_n \to \mathbb{R}\) by \(\phi({{\mathrm{D}}}) := \log \rho(e^{{{\mathrm{D}}}}{{\mathrm{A}}})\). Then \(\phi({{\mathrm{D}}})\) is a convex functional on \(\mathcal{D}_n\). Specifically: for every \({{\mathrm{D}}}_1, {{\mathrm{D}}}_2 \in \mathcal{D}_n\) and \(\alpha \in (0,1)\), \[\label{eq:Friedland} \phi( \alpha {{\mathrm{D}}}_1 + (1-\alpha){{\mathrm{D}}}_2) \le \alpha \phi({{\mathrm{D}}}_1) + (1-\alpha) \phi({{\mathrm{D}}}_2).\] Moreover, if \({{\mathrm{A}}}\) is irreducible and the diagonal entries of \({{\mathrm{A}}}\) are positive (or \({{\mathrm{A}}}\) is fully indecomposable) then equality holds in \eqref{eq:Friedland} if and only if \[\label{eq:Friedland2} {{\mathrm{D}}}_1 - {{\mathrm{D}}}_2 = c {{\mathrm{I}}}\] for some \(c \in \mathbb{R}\), where \({{\mathrm{I}}}\) is the identity matrix. \label{thm:Friedland}

Next, we state a result on the derivative of the spectral radius as a function of non-negative matrix elements.

\label{lem:cohen} Let \({{\mathrm{A}}}\) be a fixed \(n\times n\) non-negative matrix having a positive spectral radius. Define \({{\mathrm{D}}}:= \rho({{\mathrm{A}}}) {{\mathrm{I}}} - {{\mathrm{A}}}\), where \({{\mathrm{I}}}\) denotes the \(n\times n\) identity matrix, and let \(|{{\mathrm{D}}}(i)|\) be the determinant of the matrix obtained by eliminating the \(i\)th row and the \(i\)th column from \({{\mathrm{D}}}\). Then, \[\label{eq:Cohen} 0< \frac{d \rho({{\mathrm{A}}})}{d a_{11}} = \frac{| {{\mathrm{D}}}(1) |}{\sum_{i=1}^n |{{\mathrm{D}}}(i)|} < 1.\]

Finally, we show a result on the behavior of \({{\mathrm{P}}}^{\mathsf{T}}2^{-m{{\mathrm{R}}}}\) as \(m \to 0\).

\label{eq:lim1} The following equality holds: \[\lim_{m \to 0}({{\mathrm{P}}}^{\mathsf{T}}2^{-m{{\mathrm{R}}}})^{\frac{1}{m}} = \lim_{m \to 0} \left( \begin{bmatrix} \pi_1 2^{-m \bar{r}_1} & \cdots & \pi_1 2^{-m \bar{r}_n} \\ \vdots & \vdots & \vdots \\ \pi_n 2^{-m \bar{r}_1} & \cdots & \pi_n 2^{-m \bar{r}_n} \end{bmatrix} \right)^{\frac{1}{m}}.\]

Let \(1/m = k\). By the monotonicity property of \(R(m)\), it is sufficient to prove the claim for \(k\) integer. Therefore, let \[{{\mathrm{A}}} := \begin{bmatrix} p_{11} 2^{-\bar{r}_1/k} & \cdots & p_{1n} 2^{-\bar{r}_1/k} \\ \vdots & \vdots & \vdots \\ p_{n1} 2^{-\bar{r}_n/k} & \cdots & p_{nn} 2^{-\bar{r}_n/k} \end{bmatrix} = \left({{\mathrm{P}}}^{\mathsf{T}}2^{-\frac{1}{k} {{\mathrm{R}}}} \right)^{\mathsf{T}}\] and \[{{\mathrm{B}}} := \begin{bmatrix} \pi_1 2^{-\bar{r}_1/k} & \cdots & \pi_n 2^{-\bar{r}_1/k} \\ \vdots & \vdots & \vdots \\ \pi_1 2^{-\bar{r}_n/k} & \cdots & \pi_n 2^{-\bar{r}_n/k} \end{bmatrix}. \] To establish the claim it suffices to show that \[\lim_{k \to \infty} {{\mathrm{A}}}^k = \lim_{k \to \infty} {{\mathrm{B}}}^k.\] To do so, we prove that \[\lim_{k \to \infty} [{{\mathrm{A}}}^k]_{i,j} = \lim_{k \to \infty} [{{\mathrm{B}}}^k]_{i,j}.\] Note that \[\begin{aligned} [{{\mathrm{A}}}^k]_{i,j} & = \sum_{l_1, \dots,l_{k-1}} \bigl( p_{i l_1} 2^{-\bar{r}_i/k} \bigr) \cdots \bigl( p_{l_{k-1} j} 2^{-\bar{r}_{l_{k-1}}/k} \bigr) \\ & = \sum_{l_1, \dots,l_{k-1}} \bigl( p_{i l_1} \cdots p_{l_{k-1}j} \bigr) 2^{ -(\bar{r}_i+ \bar{r}_{l_1} + \cdots + \bar{r}_{l_{k-1}})/k} \end{aligned}\] and \[\begin{aligned} [{{\mathrm{B}}}^k]_{i,j} & = \sum_{l_1, \dots,l_{k-1}} \bigl( \pi_{l_1} 2^{-\bar{r}_i/k} \bigr) \cdots \bigl( \pi_{j} 2^{-\bar{r}_{l_{k-1}}/k} \bigr) \\ & = \sum_{l_1, \dots,l_{k-1}} \bigl( \pi_{ l_1} \cdots \pi_{l_{k-1}} \pi_j \bigr) 2^{ -(\bar{r}_i+ \bar{r}_{l_1} + \cdots + \bar{r}_{l_{k-1}})/k}. \end{aligned}\] Then, denoting by \({{\mathrm{\Pi}}} := \lim_{k \to \infty} {{\mathrm{P}}}^k\) the matrix whose rows are all equal to the stationary distribution of \({{\mathrm{P}}}\), \[\begin{aligned} & \lim_{k \to \infty} ( [{{\mathrm{A}}}^k]_{i,j} -[{{\mathrm{B}}}^k]_{i,j} ) \\ & = \lim_{k \to \infty} \sum_{l_1, \dots,l_{k-1}} \bigl( p_{i l_1} \cdots p_{l_{k-1}j} - \pi_{ l_1} \cdots \pi_{l_{k-1}} \pi_j \bigr) 2^{ -(\bar{r}_i+ \cdots + \bar{r}_{l_{k-1}})/k} \\ & \le \lim_{k \to \infty} \sum_{l_1, \dots,l_{k-1}} \bigl( p_{i l_1} \cdots p_{l_{k-1}j} - \pi_{ l_1} \cdots \pi_{l_{k-1}} \pi_j \bigr) \\ & = \lim_{k \to \infty} \sum_{l_1, \dots,l_{k-1}} p_{i l_1} \cdots p_{l_{k-1}j} - \lim_{k \to \infty} \sum_{l_1, \dots,l_{k-1}} \pi_{ l_1} \cdots \pi_{l_{k-1}} \pi_j \\ & = \lim_{k \to \infty} [{{\mathrm{P}}}^k]_{i,j} - [{{\mathrm{\Pi}}}]_{i,j} \\ & = 0,\end{aligned}\] and the same argument with the roles of \({{\mathrm{A}}}\) and \({{\mathrm{B}}}\) exchanged yields the reverse inequality.

Proof of Theorem \ref{thm1}

Necessity. To establish the necessary condition, we prove that for every \(k=0,1,\ldots\), \[\label{eq:lbstate2} c_m \operatorname{\sf E}[\vert X_k\vert^m] \ge \operatorname{\sf E}[\vert Z_k\vert^m],\] where \(c_m\) is a constant defined in Lemma \ref{lem:vme} and \(\{z_k\}\) is a homogeneous MJLS with dynamics \[\begin{aligned} z_{k+1}=\frac{|\lambda|}{2^{r_k}} z_k \label{eq:zzzz}\end{aligned}\] and \(z_0=e^{h(X_0)}\).

Let \(S^k = \{S_0,\ldots,S_k \}\) denote the symbols transmitted over the digital link up to time \(k\). By the law of total expectation and Lemma \ref{lem:vme}, \[\begin{aligned} \operatorname{\sf E}( |X_{k+1}|^m ) & \ge \frac{1}{c_m}~\operatorname{\sf E}_{S^{k}} \bigl( e^{mh(X_{k+1}|S^{k}=s^{k}) }\bigr) \label{eq:in_0}.\end{aligned}\] It follows that the \(m\)-th moment of the state is lower bounded by the average entropy power of \(X_{k+1}\) conditional on \(S^{k}\). From the translation invariance property of the differential entropy, the conditional version of the entropy power inequality \cite{El-Gamal--Kim2011}, and the independence of the disturbance \(V_k\) from the state and the past channel symbols, it follows that \[\begin{aligned} & \operatorname{\sf E}_{S^{k}} \bigl( e^{ m h(X_{k+1} |S^{k}=s^{k}) }\bigr)\nonumber \\ & = \operatorname{\sf E}_{S^{k}} \bigl( e^{m h(\lambda X_{k} + \hat{x}(s^{k}) + V_k |S^{k}=s^{k}) }\bigr) \nonumber \\ & \ge \operatorname{\sf E}_{S^{k}} \left( \bigl( e^{2 h(\lambda X_{k} |S^{k}=s^{k}) } + e^{2 h(V_k)} \; \bigr)^{\frac{m}{2}} \right) \nonumber \\ & \ge \operatorname{\sf E}_{S^{k}} \left( e^{m h(\lambda X_{k} |S^{k}=s^{k}) } \right) \nonumber \\ & = |\lambda|^m \operatorname{\sf E}_{S^{k}} \bigl( e^{mh(X_{k} |S^{k}=s^{k}) } \bigr). \label{eq:ineq1}\end{aligned}\] We can further lower bound \eqref{eq:ineq1} as in Lemma 1 in \cite{Minero-etal-2009}, and obtain that for every \(k \geq 0\) \[\label{lem_min} \operatorname{\sf E}_{S^k|S^{k-1},R_k} \bigl( e^{ m h(X_k|S^{k}=s^{k})} \bigr) \geq \frac{1}{2^{mR_k}}e^{ m h(X_k|S^{k-1}=s^{k-1})},\] with \(S_{-1}=\emptyset\). By the tower rule of conditional expectation, it then follows that \[\begin{aligned} & \operatorname{\sf E}_{S^{k}} \bigl( e^{m h(X_{k} |S^{k}=s^{k}) } \bigr) \geq \nonumber \\ & \qquad \operatorname{\sf E}_{S^{k-1},R_k} \left( \frac{1}{2^{m R_k } } e^{m h(X_k|S^{k-1}=s^{k-1}) } \right). \label{eq:ineq2}\end{aligned}\] Combining \eqref{eq:ineq2} and \eqref{eq:ineq1} gives \[\begin{aligned} & \operatorname{\sf E}_{S^{k}} \left( e^{ m h(X_{k+1} |S^{k}=s^{k}) }\right) \nonumber \\ & \geq \operatorname{\sf E}_{R_k} \left( \frac{ |\lambda|^m }{2^{m R_k} } \operatorname{\sf E}_{S^{k-1}|R_k} \bigl( e^{m h(X_{k} |S^{k-1}=s^{k-1}) } \bigr) \right). \label{eq:in_1}\end{aligned}\] Following similar steps and using the Markov chain \(S^{k-1}\to ( S^{k-2},R_{k-1}) \to R_k\), we obtain \[\begin{aligned} & \operatorname{\sf E}_{S^{k-1}|R_k} \bigl( e^{m h(X_{k} |S^{k-1}=s^{k-1}) } \bigr) \nonumber \\ & \geq |\lambda|^m \operatorname{\sf E}_{S^{k-1}|R_k} \bigl( e^{m h(X_{k-1} |S^{k-1}=s^{k-1}) } \bigr) \nonumber \\ & \ge \operatorname{\sf E}_{S^{k-2},R_{k-1}|R_k} \left( \frac{|\lambda|^m}{2^{m R_{k-1}}} e^{m h(X_{k-1} |S^{k-2}=s^{k-2}) } \right) \nonumber \\ & = \operatorname{\sf E}_{R_{k-1}|R_k} \left( \frac{|\lambda|^m}{2^{m R_{k-1}}} \operatorname{\sf E}_{S^{k-2}|R_{k-1},R_k} \bigl( e^{m h(X_{k-1} |S^{k-2}=s^{k-2}) } \bigr) \right). \label{eq:in_2}\end{aligned}\] Substituting \eqref{eq:in_2} into \eqref{eq:in_1} and re-iterating \(k\) times, it follows that \( \operatorname{\sf E}_{S^{k}} \bigl( e^{ m h(X_{k+1} |S^{k}=s^{k}) }\bigr)\) is lower bounded by \[\begin{aligned} & \operatorname{\sf E}_{R_{k-1},R_k} \Bigl( \frac{|\lambda|^{2m}}{2^{m(R_{k-1} + R_k )}} \nonumber \\ & \qquad \qquad \times \operatorname{\sf E}_{S^{k-2}|R_{k-1},R_k} \Bigl( e^{m h(X_{k-1} |S^{k-2}=s^{k-2}) } \Bigr) \Bigr) \nonumber \\ & \geq \operatorname{\sf E}_{R_{1},\dotsc,R_k} \left( \frac{|\lambda|^{mk} }{2^{m(R_{1} +\cdots + R_k )}} \operatorname{\sf E}_{S_0 |R_{1},\dotsc,R_k} \Bigl( e^{m h(X_{1} |S_0=s_0 ) } \Bigr) \right) \label{eq:in_3} \\ & \geq \operatorname{\sf E}\left( \tfrac{|\lambda|^{m(k+1)} }{2^{m(R_{0} + R_{1} +\cdots + R_k )}} \right) e^{mh(X_{0})} \label{eq:in_4}, \end{aligned}\] where \eqref{eq:in_3} uses the fact that the initial condition of the state \(x_0\) is independent of the rate process \(R_k\). Recall from \eqref{eq:zzzz} that \(\{Z_k\}\) is the homogeneous MJLS with dynamics \[\begin{aligned} z_{k+1}=\frac{|\lambda|}{2^{r_k}} z_k, \label{eq:nec_mjls22}\end{aligned}\] and \(z_0=e^{h(X_0)}\). By raising \eqref{eq:nec_mjls22} to the \(m\)-th power, taking expectations, and iterating \(k\) times, it is easy to see that the right hand side of \eqref{eq:in_4} is equal to the \(m\)-th moment of \(Z_{k+1}\). Hence, combining \eqref{eq:in_0}–\eqref{eq:in_4} we conclude that \(\operatorname{\sf E}(| X_k |^m) \ge \frac{1}{c_m}\operatorname{\sf E}( |Z_k|^m )\), which is the claim.

Sufficiency. Let \(\omega\) denote a uniform bound on the support of the initial condition, the noise, and the disturbance. Then, we claim that the plant dynamics can be bounded as follows \[\label{eq:in_} |x_k| \le z_k, \quad k=0,1,\ldots,\] where \(Z_k\) is a non-homogeneous MJLS with dynamics \[z_{k+1} = \frac{|\lambda|}{2^{R_k}}z_k + \omega, \quad k=0,1,\ldots\] and \(z_0 = \omega\). To see this, consider the following inductive proof. By assumption, \(|x_0|\leq \omega = z_0\). Assume that the claim holds for all times up to \(k\), so \(|x_{k}|\le z_{k}\). Suppose that at time \(k\) the uncertainty set \([-z_k,z_k]\) is quantized using an \(r_k\)-bit uniform quantizer, and that the encoder communicates to the decoder the interval containing the state. Then, the decoder approximates the state by the centroid \(\hat{x}_k\) of this interval and sends to the plant the control input \(u_k = -\lambda \hat{x}_k\). By construction \(|x_k-\hat{x}_k| \le z_k/2^{r_k}\), thus \[\begin{aligned} |x_{k+1}| & = |\lambda (x_k-\hat{x}_k) + v_k | \nonumber \\ & \le |\lambda| |x_k-\hat{x}_k| + \omega \nonumber \\ & \le \frac{|\lambda|}{2^{r_k}}z_k + \omega \nonumber \\ & = z_{k+1},\end{aligned}\] i.e., the claim holds at time \(k+1\) as well. It follows that \(x_k\) is \(m\)-th moment stable if the non-homogeneous MJLS \(Z_k\) is weakly \(m\)-th moment stable, which by Theorem \ref{lemmamjls} is the case whenever \eqref{eq:Rm} holds.
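The inductive bound \(|x_k| \le z_k\) can be verified in simulation. A minimal sketch (ours; the plant gain, support bound \(\omega\), and rate chain are illustrative choices) implements the uniform quantizer and checks the bound at every step:

```python
import numpy as np

# Sketch of the sufficiency scheme: the interval [-z_k, z_k] is split into
# 2^{r_k} uniform bins, the decoder applies u_k = -lam * xhat_k, and the
# recursion z_{k+1} = lam z_k / 2^{r_k} + omega from the proof is verified
# to dominate |x_{k+1}|. All numeric values below are illustrative.

rng = np.random.default_rng(0)
lam, omega = 2.0, 1.0
P = np.array([[0.2, 0.8],       # hypothetical Markov rate chain
              [0.2, 0.8]])
rates = (0, 4)

x = rng.uniform(-omega, omega)  # |x_0| <= omega = z_0
z, s = omega, 0
for k in range(100):
    s = rng.choice(2, p=P[s])
    rk = rates[s]
    if rk > 0:                  # centroid of the quantization bin
        width = 2 * z / 2**rk
        idx = min(int((x + z) // width), 2**rk - 1)
        xhat = -z + (idx + 0.5) * width
    else:
        xhat = 0.0
    x = lam * (x - xhat) + rng.uniform(-omega, omega)
    z = lam * z / 2**rk + omega
    assert abs(x) <= z + 1e-9   # the inductive bound of the proof
```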

Proof of Proposition \ref{prop1}

\label{App1}

The monotonicity property is an immediate consequence of the log-convexity of the spectral radius of a nonnegative matrix stated in Lemma \ref{thm:Friedland}. Let \( {{\mathrm{D}}}_1 = - m {{\mathrm{R}}} \ln 2\), \( {{\mathrm{D}}}_2 = {{\mathrm{0}}}_{n\times n}\), and, for a given \(0 < m' < m\), let \(\alpha = \frac{m'}{m}\). Notice that \( 2^{-m {{\mathrm{R}}}} = e^{ {{\mathrm{D}}}_1}\), \(\log \rho( {{\mathrm{P}}}^{\mathsf{T}}e^{ {{\mathrm{D}}}_2} ) = 0\), and \({{\mathrm{P}}}^{\mathsf{T}}2^{-m' {{\mathrm{R}}}} = {{\mathrm{P}}}^{\mathsf{T}}e^{\alpha {{\mathrm{D}}}_1}\). Then, for every \(m' < m\), by Lemma \ref{thm:Friedland} \[\begin{aligned} -m' R(m) & = \frac{m'}{m} \log \rho( {{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}} ) \nonumber \\ & = \alpha \log \rho( {{\mathrm{P}}}^{\mathsf{T}}e^{{{\mathrm{D}}}_1} ) + (1-\alpha) \log \rho( {{\mathrm{P}}}^{\mathsf{T}}e^{{{\mathrm{D}}}_2} ) \nonumber \\ & > \log \rho( {{\mathrm{P}}}^{\mathsf{T}}e^{\alpha {{\mathrm{D}}}_1 + (1-\alpha) {{\mathrm{D}}}_2} ) \nonumber \\ & = \log \rho( {{\mathrm{P}}}^{\mathsf{T}}2^{-m'{{\mathrm{R}}}} ) \nonumber \\ & = -m' R(m'). \end{aligned}\] By dividing both sides by \(-m'\), it then follows that \(R(m') > R(m)\), establishing that \(R(m)\) is a strictly decreasing function of \(m\).

To establish the convergence of \(R(m)\) to the Shannon capacity as \(m \to 0\), observe that \[\begin{aligned} & \lim_{m \to 0} R(m) \\ & = -\lim_{m \to 0} \log \rho\bigl( ({{\mathrm{P}}}^{\mathsf{T}}2^{-m{{\mathrm{R}}}})^{\frac{1}{m}} \bigr) \\ & = -\log \rho\bigl( \lim_{m \to 0}({{\mathrm{P}}}^{\mathsf{T}}2^{-m{{\mathrm{R}}}})^{\frac{1}{m}} \bigr) \\ & \stackrel{(a)}{=} -\log \rho \left( \lim_{m \to 0} \left( \begin{bmatrix} \pi_1 2^{-m \bar{r}_1} & \cdots & \pi_1 2^{-m \bar{r}_n} \\ \vdots & \vdots & \vdots \\ \pi_n 2^{-m \bar{r}_1} & \cdots & \pi_n 2^{-m \bar{r}_n} \end{bmatrix} \right)^{\frac{1}{m}} \right) \\ & = -\lim_{m \to 0} \frac{1}{m}\log \rho \left( \begin{bmatrix} \pi_1 2^{-m \bar{r}_1} & \cdots & \pi_1 2^{-m \bar{r}_n} \\ \vdots & \vdots & \vdots \\ \pi_n 2^{-m \bar{r}_1} & \cdots & \pi_n 2^{-m \bar{r}_n} \end{bmatrix} \right) \\ & = -\lim_{m \to 0} \frac{1}{m}\log \left( \sum_{i} \pi_i 2^{-m\bar{r}_i} \right) \\ & \stackrel{(b)}{=} \lim_{m \to 0} \frac{ \sum_{i} \pi_i \bar{r} _i 2^{-m\bar{r}_i} }{ \sum_{i} \pi_i 2^{-m\bar{r}_i} } \\ & = \sum_{i} \pi_i \bar{r}_i, \end{aligned}\] where (a) follows from Lemma \ref{eq:lim1} and (b) follows from l’Hôpital’s rule.

The convergence of \(R(m)\) to the zero-error capacity as \(m \to \infty\) can be proved using the fact that, as \(m \to \infty\) \[\begin{aligned} R(m) & = -\frac{1}{m}\log \rho ({{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}}) \\ & = - \log \rho \Bigl( ({{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}})^{1/m} \Bigr)\\ & = \bar{r}_1- \log \rho \left( \begin{bmatrix} p_{11} & \cdots & p_{n1} 2^{-m(\bar{r}_n-\bar{r}_1)} \\ \vdots & \vdots & \vdots \\ p_{1n} & \cdots & p_{nn} 2^{-m(\bar{r}_n-\bar{r}_1)} \end{bmatrix}^{1/m} \right) \\ & \sim \bar{r}_1- \frac{1}{m} \log p_{11},\end{aligned}\] where the last equation follows from the fact that for any column vector \(v=(v_1,\dotsc,v_n)^{\mathsf{T}}\), \(\rho(\begin{bmatrix} v & {{\mathrm{0}}}_{n \times (n-1)} \end{bmatrix}) = v_1\). From the above asymptotic approximation it immediately follows that if \(\bar{r}_1 =0\), then \[\lim_{m \to \infty}mR(m) = - \log p_{11}.\]

Next, the property on the sensitivity with respect to self-loop probabilities is a direct application of a result in \cite{Cohen1978}, which is re-stated here as Lemma \ref{lem:cohen}, on the derivative of the spectral radius as a function of non-negative matrix elements.

Next, the monotonicity property of \(-m R(m) = \log \rho( {{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}} )\) follows from the fact that all the entries of the matrix \({{\mathrm{P}}}^{\mathsf{T}}2^{-m {{\mathrm{R}}}}\) are nonincreasing in \(m\). Finally, the strict concavity of \(m R(m)\) is again a direct consequence of the log-convexity of the spectral radius of a nonnegative matrix stated in Lemma \ref{thm:Friedland}.