Chapter 0: Using Probability to Cover a Geometric Set

Exercise 0.0.3.  (a) \(\mathbf{E}\left\|\sum_{j=1}^kZ_j\right\|_2^2=\sum_{i,j=1,\ i\ne j}^k\mathbf{E}\langle Z_i,Z_j\rangle+\sum_{j=1}^k\mathbf{E}\left\|Z_j\right\|_2^2\).  By independence, for \(i\ne j\), \(\mathbf{E}\langle Z_i,Z_j\rangle=\langle\mathbf{E}Z_i,\mathbf{E}Z_j\rangle\).  Since the \(Z_j\) have zero mean, the first sum vanishes.  The second sum is \(\sum_{j=1}^k\mathbf{E}\left\|Z_j\right\|_2^2\), which proves the identity.
(b)  Expand the LHS using the identity \(\left\|x-y\right\|_2^2=\left\|x\right\|_2^2-2\langle x,y\rangle+\left\|y\right\|_2^2\), and use linearity of expectation.
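As a quick Monte Carlo sanity check of part (a) — a sketch only, with the \(Z_j\) taken i.i.d. uniform on \([-1,1]^n\), an arbitrary zero-mean choice not prescribed by the exercise:

import numpy as np

# Sanity check for Exercise 0.0.3(a): E||sum_j Z_j||_2^2 = sum_j E||Z_j||_2^2
# for independent mean-zero random vectors.  The Z_j here are i.i.d. uniform
# on [-1,1]^n, an arbitrary choice made only for this illustration.
rng = np.random.default_rng(0)
n, k, trials = 5, 7, 200_000
Z = rng.uniform(-1.0, 1.0, size=(trials, k, n))        # `trials` samples of (Z_1, ..., Z_k)

lhs = np.mean(np.sum(Z.sum(axis=1) ** 2, axis=1))      # E || sum_j Z_j ||_2^2
rhs = np.sum(np.mean(np.sum(Z ** 2, axis=2), axis=0))  # sum_j E || Z_j ||_2^2
print(lhs, rhs)                                        # the two numbers agree up to Monte Carlo error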
Exercise 0.0.5.  First inequality:
\({n \choose m}=\frac{n(n-1)\cdots (n-m+1)}{m!}\), so that \({n \choose m}=\left(\frac{n}{m}\right)^m\cdot \frac{m^m}{m!}\cdot\frac{n(n-1)\cdots(n-m+1)}{n^m}\).  The ratio of \({n \choose m}\) to \(\left(\frac{n}{m}\right)^m\) is therefore the product \(\prod_{k=0}^{m-1}\frac{m(n-k)}{(m-k)n}\).  Each factor in the product is greater than or equal to one, since \(m(n-k)\geq(m-k)n\) is equivalent to \(kn\geq km\), which holds because \(n\geq m\).  Therefore, the inequality is as claimed.
The second inequality is immediate because \({n \choose m}\) is one of the summands in the sum on the RHS (and all summands are nonnegative).
Third inequality (upper bound on the sum of binomial coefficients): following the hint, multiply both sides by \((m/n)^m\).  Since \(m/n\leq1\), we have \((m/n)^m\leq(m/n)^k\) for every \(k\leq m\), so
\(\sum_{k=0}^m{n \choose k}\left(\frac{m}{n} \right)^m\leq\sum_{k=0}^m{n \choose k}\left(\frac{m}{n} \right)^k\leq\sum_{k=0}^n{n \choose k}\left(\frac{m}{n} \right)^k\).  Apply the binomial theorem to rewrite the last quantity as \(\left( 1+\frac{m}{n}\right)^n=\left[\left( 1+\frac{m}{n}\right)^{n/m}\right]^m<e^m\), where the last step uses the familiar limit representation of \(e\) together with the fact that \(\left(1+x\right)^{1/x}\) is bounded above by \(e\) for all \(x>0\).
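A quick computational check of the whole chain \(\left(\frac{n}{m}\right)^m\le{n \choose m}\le\sum_{k=0}^m{n \choose k}\le\left(\frac{en}{m}\right)^m\) over small \(n\) and \(m\) (an illustration, not a proof):

from math import comb, e

# Verify (n/m)^m <= C(n,m) <= sum_{k<=m} C(n,k) <= (e*n/m)^m for small n, m.
for n in range(1, 40):
    for m in range(1, n + 1):
        lower = (n / m) ** m
        middle = comb(n, m)
        partial_sum = sum(comb(n, k) for k in range(m + 1))
        upper = (e * n / m) ** m
        assert lower <= middle <= partial_sum <= upper, (n, m)
print("all three inequalities verified for n < 40")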
Exercise 0.0.6.  By Theorem 1.58 in \cite{Loehr_2011}, there are \({N+k-1 \choose k}\) ways of choosing \(k\) out of \(N\) vertices with repetition.  With \(\mathcal{N}\) defined as in the proof of Corollary 0.0.4, Exercise 0.0.5 gives \(|\mathcal{N}|={N+k-1\choose k}\leq \left( \frac{e(N+k-1)}{k} \right)^{k}\).  Since \(k=\lceil 1/\epsilon^2\rceil\geq\epsilon^{-2}\), we have \(\frac{e(N+k-1)}{k}\le e+e\epsilon^2(N-1)\), so the bound is of the form \((C+C\epsilon^2N)^{\lceil 1/\epsilon^2 \rceil}\) for an appropriate absolute constant \(C\).
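A small computational check of the stars-and-bars count used above, with arbitrary illustrative values of \(N\) and \(k\):

from itertools import combinations_with_replacement
from math import comb

# The number of ways to choose k of N vertices with repetition (multisets of
# size k from N labels) equals C(N+k-1, k).
N, k = 6, 4  # small illustrative values
count = sum(1 for _ in combinations_with_replacement(range(N), k))
assert count == comb(N + k - 1, k)
print(count)  # 126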

Chapter 1: Preliminaries

Exercise 1.2.2.  Define, as is standard in many analysis texts,
\(X_+ := \max(0,X)\) and \(X_- := -\min(0,X)\), so that both \(X_+, X_-\) are non-negative and \(X=X_+ - X_-\).  By linearity of expectation, \(\mathbf{E}X=\mathbf{E}X_++\mathbf{E}[-X_-]\), and Lemma 1.2.1 applies to \(X_+\), giving the first term \(\int_0^\infty\mathbf{P}\left\{X_+>t\right\}\mathrm{d}t=\int_0^\infty\mathbf{P}\left\{X>t\right\}\mathrm{d}t\).  The variable \(-X_-\) is a non-positive random variable, so for the second term we need to prove a version of Lemma 1.2.1 which applies to non-positive random variables.  We do this by following the steps of Lemma 1.2.1, first noting that for any non-positive real number \(x\), \(x=-\int_{-\infty}^0 \mathbf{1}_{\{t>x\}}\mathrm{d}t\).  Next assume that \(X\) is non-positive (rather than non-negative) and proceed as in the proof of Lemma 1.2.1: \(\mathbf{E}X=\mathbf{E}\left[-\int_{-\infty}^0\mathbf{1}_{\{t>X\}}\mathrm{d}t\right]=-\int_{-\infty}^0\mathbf{E}\,\mathbf{1}_{\{t>X\}}\mathrm{d}t=-\int_{-\infty}^0\mathbf{P}\left\{X<t\right\}\mathrm{d}t\).  Applying this to \(-X_-=\min(0,X)\) gives \(\mathbf{E}[-X_-]=-\int_{-\infty}^0\mathbf{P}\left\{X<t\right\}\mathrm{d}t\), and adding the two terms yields \(\mathbf{E}X=\int_0^\infty\mathbf{P}\left\{X>t\right\}\mathrm{d}t-\int_{-\infty}^0\mathbf{P}\left\{X<t\right\}\mathrm{d}t\).
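As a numerical sanity check of the resulting identity, with \(X\sim N(0.7,1)\) chosen arbitrarily for illustration:

from scipy.integrate import quad
from scipy.stats import norm

# Check E X = int_0^inf P(X > t) dt - int_{-inf}^0 P(X < t) dt
# for the illustrative choice X ~ N(0.7, 1).
X = norm(loc=0.7, scale=1.0)
pos_part, _ = quad(lambda t: X.sf(t), 0.0, float("inf"))    # int_0^inf P(X > t) dt
neg_part, _ = quad(lambda t: X.cdf(t), -float("inf"), 0.0)  # int_{-inf}^0 P(X < t) dt
print(pos_part - neg_part)                                  # approximately 0.7 = E X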
Exercise 1.2.3.  Apply Lemma 1.2.1 to evaluate the expectation: \(\mathbf{E}|X|^p=\int_0^{\infty}\mathbf{P}\left\{|X|\geq t^{1/p}\right\}\mathrm{d}t\).  Make the change of variables \(s=t^{1/p}\), so that \(t=s^p\) and \(\mathrm{d}t=ps^{p-1}\mathrm{d}s\); this gives \(\mathbf{E}|X|^p=\int_0^{\infty}ps^{p-1}\mathbf{P}\left\{|X|\geq s\right\}\mathrm{d}s\), as required.
Exercise 1.2.6.  The statement of the exercise tells you exactly how to proceed.
Exercise 1.3.3.  We may assume without loss of generality that \(\mu=0\) and \(\sigma=1\).  By the Lindeberg–Lévy central limit theorem, Theorem 1.3.2, \(\frac{1}{\sqrt{N}}\sum_{i=1}^NX_i\rightarrow N\left(0,1\right)\) as \(N\rightarrow\infty\) in distribution.  This implies that \(\mathbf{E}\left|\frac{1}{\sqrt{N}}\sum_{i=1}^NX_i\right|=O\left(1\right)\) as \(N\rightarrow\infty\); indeed, by Cauchy–Schwarz this expectation is at most \(\left(\mathbf{E}\left(\frac{1}{\sqrt{N}}\sum_{i=1}^NX_i\right)^2\right)^{1/2}=1\) for every \(N\), so the family of expectations is uniformly bounded.  The conclusion follows by dividing both sides by \(\sqrt{N}\).
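A small simulation illustrating that \(\mathbf{E}\left|\frac{1}{\sqrt{N}}\sum_{i=1}^NX_i\right|\) stays bounded (here the \(X_i\) are symmetric \(\pm1\) signs, an arbitrary zero-mean, unit-variance example):

import numpy as np

# Estimate E | N^{-1/2} * sum_{i=1}^N X_i | for symmetric +-1 signs X_i and
# several values of N; the estimates stay bounded (near sqrt(2/pi) ~ 0.80).
rng = np.random.default_rng(1)
trials = 200_000
for N in (10, 100, 1_000, 10_000, 100_000):
    S = 2.0 * rng.binomial(N, 0.5, size=trials) - N   # sum of N independent +-1 signs
    print(N, np.mean(np.abs(S) / np.sqrt(N)))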

Chapter 2: Concentration of Sums of Independent Random Variables

Exercise 2.1.4.  The second inequality follows from Proposition 2.1.2.  First inequality: ???
Exercise 2.2.3.  \(\cosh\left(x\right)=\frac{e^x+e^{-x}}{2}=1+\frac{x^2}{2!}+\frac{x^4}{4!}+\cdots\), whereas \(\exp\left(\frac{x^2}{2}\right)=1+\frac{x^2}{2\cdot1!}+\frac{x^4}{2^2\cdot2!}+\frac{x^6}{2^3\cdot3!}+\cdots\).  The term-by-term comparison \(\frac{x^{2k}}{(2k)!}\le\frac{x^{2k}}{2^k\,k!}\) holds because \((2k)!=(2k)(2k-1)\cdots(k+1)\cdot k!\ge2^k\,k!\) (each of the \(k\) factors \(k+1,\dots,2k\) is at least \(2\) when \(k\geq1\)).  Note that the radius of convergence of both power series expansions is infinite because the radius of convergence of the power series for the exponential function is infinite.  Thus the term-by-term comparison between the power series is valid over all reals.
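A quick numerical confirmation of the inequality \(\cosh(x)\le\exp(x^2/2)\) on a grid of points (illustration only):

import numpy as np

# Check cosh(x) <= exp(x^2 / 2) on a grid of real x.
x = np.linspace(-10.0, 10.0, 10_001)
assert np.all(np.cosh(x) <= np.exp(x ** 2 / 2.0))
print("cosh(x) <= exp(x^2/2) holds on the grid")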
Exercise 2.2.7.  Follow the pattern of the proof of Theorem 2.2.2.  In order to prove the bound on the moment generating function of \(X_i\) you need the following result (Hoeffding's lemma): if \(X\) is a centered (mean zero) bounded random variable taking values in \(\left[a,b\right]\), then \(\mathbf{E}\exp\left(\lambda X\right)\le\exp\left(\frac{\lambda^2\left(b-a\right)^2}{8}\right)\) for every \(\lambda\in\mathbb{R}\).
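A numerical spot-check of Hoeffding's lemma for one concrete centered bounded variable (the two-point distribution below is an arbitrary example, not part of the exercise):

import numpy as np

# Spot-check Hoeffding's lemma, E exp(lambda*X) <= exp(lambda^2 (b-a)^2 / 8),
# for X taking the value -1 with probability 2/3 and 2 with probability 1/3
# (so E X = 0, a = -1, b = 2), over a grid of lambda.
a, b = -1.0, 2.0
lam = np.linspace(-5.0, 5.0, 2001)
mgf = (2.0 / 3.0) * np.exp(lam * a) + (1.0 / 3.0) * np.exp(lam * b)
bound = np.exp(lam ** 2 * (b - a) ** 2 / 8.0)
assert np.all(mgf <= bound)
print("Hoeffding's lemma bound verified on the grid")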
Exercise 2.2.8.  Let
\(X_i=+1\) (an incorrect vote) with probability \(\frac{1}{2}-\delta\), and \(X_i=-1\) (a correct vote) with probability \(\frac{1}{2}+\delta\).
Because \(\mathbf{E}X_i=-2\delta\), \(\sum_{i=1}^N\mathbf{E}X_i=-2\delta N\).  Therefore, \(\mathbf{P}\left\{\sum_{i=1}^NX_i>0\right\}=\mathbf{P}\left\{\sum_{i=1}^N\left(X_i-\mathbf{E}X_i\right)>2\delta N\right\}\).  Theorem 2.2.6, Hoeffding's inequality, implies that \(\mathbf{P}\left\{\sum_{i=1}^NX_i>0\right\}\le\exp\left(\frac{-2\cdot4\delta^2N^2}{\sum_{i=1}^N2^2}\right)=\exp\left(-2\delta^2N\right)\).  In order to have \(\exp\left(-2\delta^2N\right)<\epsilon\) it is necessary and sufficient to have \(N>\frac{1}{2}\delta^{-2}\ln\epsilon^{-1}\).
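A small simulation of the majority vote as a sanity check of the Hoeffding bound; \(\delta\), \(N\), and the number of trials below are arbitrary illustrative values:

import numpy as np

# Each of N voters is wrong (+1) with probability 1/2 - delta and right (-1)
# with probability 1/2 + delta; the majority vote fails when sum X_i > 0.
# Compare the empirical failure rate with the Hoeffding bound exp(-2 delta^2 N).
rng = np.random.default_rng(2)
delta, N, trials = 0.05, 500, 200_000
wrong_votes = rng.binomial(N, 0.5 - delta, size=trials)   # number of +1 votes per trial
failures = np.mean(2 * wrong_votes - N > 0)               # empirical P{ sum X_i > 0 }
print(failures, np.exp(-2 * delta ** 2 * N))              # empirical rate vs Hoeffding bound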
Exercise 2.2.9. (a)  \(\mathbf{P}\left\{|\hat{\mu}-\mu|\geq \epsilon\right\}=\mathbf{P}\left\{ \left|\frac{1}{N}\sum_{i=1}^N X_i-\mu\right| \geq \epsilon\right\}=\mathbf{P}\left\{\left|\sum_{i=1}^NX_i-\sum_{i=1}^N\mathbf{E}X_i\right|\geq N\epsilon\right\}\).  The random variable inside the absolute values has mean 0 and variance \(N\sigma^2\) by the comment at the bottom of p. 8.  Therefore, by Chebyshev's inequality, \(\mathbf{P}\left\{\left|\sum_{i=1}^NX_i-\sum_{i=1}^N\mathbf{E}X_i\right|\geq N\epsilon\right\}\le\frac{N\sigma^2}{N^2\epsilon^2}=\frac{\sigma^2}{N\epsilon^2}\).  In order to have \(\frac{\sigma^2}{N\epsilon^2}\le\frac{1}{4}\) it is necessary and sufficient to have \(N\ge\frac{4\sigma^2}{\epsilon^2}=O\left(\frac{\sigma^2}{\epsilon^2}\right)\).
(b)  As suggested by the hint, take the median of \(k=\lceil8\ln\left(\delta^{-1}\right)\rceil\) independent estimates from part (a).  Think of each estimate as casting a vote: a vote is "bad" if that estimate deviates from \(\mu\) by more than \(\epsilon\), which by part (a) happens with probability at most \(\frac{1}{4}\).  The median deviates from \(\mu\) by more than \(\epsilon\) only if at least half of the votes are bad, i.e., only if the fraction of bad votes exceeds its mean by at least \(\frac{1}{4}\).  By Hoeffding's inequality (Theorem 2.2.6), this happens with probability at most \(\exp\left(-2\left(\frac{1}{4}\right)^2k\right)=\exp\left(-k/8\right)\), which is at most \(\delta\) once \(k\ge\left(\frac{1}{2}\right)\left(\frac{1}{4}\right)^{-2}\ln\delta^{-1}=8\ln\delta^{-1}\).
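A minimal simulation sketch of this median-of-means scheme, assuming Exponential(1) samples (mean \(1\), variance \(1\)) purely for illustration; the group count and group size follow the formulas above:

import numpy as np

# Median-of-means sketch: take k = ceil(8*ln(1/delta)) groups of
# n = ceil(4*sigma^2/eps^2) samples each, average each group, and return the
# median of the group means.  Samples are Exponential(1), an arbitrary choice.
rng = np.random.default_rng(3)
mu, sigma2 = 1.0, 1.0
eps, delta = 0.1, 0.01
k = int(np.ceil(8 * np.log(1 / delta)))   # number of groups (votes)
n = int(np.ceil(4 * sigma2 / eps ** 2))   # samples per group, from part (a)

trials = 2000
errors = np.empty(trials)
for t in range(trials):
    groups = rng.exponential(1.0, size=(k, n))
    errors[t] = abs(np.median(groups.mean(axis=1)) - mu)
print(np.mean(errors > eps), delta)       # empirical failure rate vs the target delta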
Exercise 2.2.10. (a)  \(\mathbf{E}\exp\left(-tX_i\right)=\int_0^{\infty}\exp\left(-tx\right)f_{X_i}\left(x\right)\mathrm{d}x\), where \(f_{X_i}\) is the density of \(X_i\), by the "Law of the Unconscious Statistician" and the fact that the \(X_i\) are nonnegative.  The hypothesis on the \(X_i\) implies that \(f_{X_i}\) is a nonnegative function bounded by \(1\), whose integral is \(1\).  Since \(\exp\left(-tx\right)\) is monotonically decreasing in \(x\), the integral cannot be any larger than when the unit of mass is pushed as far left as possible, i.e., when \(f_{X_i}\) is the constant \(1\) on the unit interval \(\left[0,1\right]\).  Therefore, \(\mathbf{E}\exp\left(-tX_i\right)\le\int_0^1\exp\left(-tx\right)\cdot1\,\mathrm{d}x=\frac{1}{t}\left[1-\exp\left(-t\right)\right]\le\frac{1}{t}\).
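A numerical spot-check of the bound in part (a) for one admissible density (X uniform on \([0,2]\), whose density \(\frac{1}{2}\) is bounded by \(1\); an arbitrary example):

import numpy as np
from scipy.integrate import quad

# Check E exp(-t X) <= 1/t for X uniform on [0, 2], whose density 1/2 is <= 1.
for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    mgf, _ = quad(lambda x: np.exp(-t * x) * 0.5, 0.0, 2.0)  # E exp(-t X)
    assert mgf <= 1.0 / t
    print(t, mgf, 1.0 / t)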
(b)  \(\mathbf{P}\left\{\sum_{i=1}^NX_i\le\epsilon N\right\}=\mathbf{P}\left\{\exp\left(-t\sum_{i=1}^NX_i\right)\ge\exp\left(-t\epsilon N\right)\right\}\le\frac{\mathbf{E}\prod_{i=1}^N\exp\left(-tX_i\right)}{\exp\left(-t\epsilon N\right)}\), by Markov's inequality.  By independence and part (a), \(\mathbf{E}\prod_{i=1}^N\exp\left(-tX_i\right)=\prod_{i=1}^N\mathbf{E}\exp\left(-tX_i\right)\le t^{-N}\), so the probability is at most \(e^{t\epsilon N}t^{-N}\) for every \(t>0\).  Choosing \(t=1/\epsilon\) gives the bound \(\left(e\epsilon\right)^N\).
Exercise 2.3.2.  \(t<\mu\Leftrightarrow-t>-\mu\) and \(-\mu=\mathbf{E}\left(-S_N\right)\).  Therefore, for any \(\lambda>0\), \(\mathbf{P}\left\{S_N\le t\right\}=\mathbf{P}\left\{-S_N\ge-t\right\}=\mathbf{P}\left\{e^{-\lambda S_N}\ge e^{-\lambda t}\right\}\le e^{\lambda t}\prod_{i=1}^N\mathbf{E}\exp\left(-\lambda X_i\right)\), by Markov's inequality and independence.  The random variable \(-X_i\) takes value \(-1\) with probability \(p_i\) and value \(0\) with probability \(1-p_i\).  Therefore, \(\mathbf{E}\exp\left(\lambda\left(-X_i\right)\right)=e^{-\lambda}p_i+\left(1-p_i\right)=1+\left(e^{-\lambda}-1\right)p_i\le\exp\left(\left(e^{-\lambda}-1\right)p_i\right)\), using \(1+x\le e^x\).
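From here the argument can be finished as in the proof of the upper-tail bound (Theorem 2.3.1); a sketch of the remaining optimization, writing \(\mu=\sum_{i=1}^Np_i\) and assuming \(t<\mu\):
\[
\mathbf{P}\left\{S_N\le t\right\}\le e^{\lambda t}\prod_{i=1}^N\exp\left(\left(e^{-\lambda}-1\right)p_i\right)=\exp\left(\lambda t+\left(e^{-\lambda}-1\right)\mu\right).
\]
Choosing \(\lambda=\ln\left(\mu/t\right)>0\) gives
\[
\mathbf{P}\left\{S_N\le t\right\}\le\exp\left(t\ln\frac{\mu}{t}+t-\mu\right)=e^{-\mu}\left(\frac{e\mu}{t}\right)^{t},
\]
which has the same form as the bound in Theorem 2.3.1.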