1. Suppose that \(\sum_n \mathbf{P}(|X_n - X| > \varepsilon) < \infty\) for all \(\varepsilon > 0\). Show that \(X_n \rightarrow X\) almost surely; thus, summability of the probabilities \(\mathbf{P}(|X_n - X| > \varepsilon)\) for every threshold \(\varepsilon > 0\) implies almost sure convergence. This is a convenient way to upgrade summable bounds for convergence in probability to almost sure convergence. Give a counterexample showing that the converse fails.
  2. This gives a (strictly) partial converse to 1: show that if \(X_n \rightarrow X\) in probability, then one can find a subsequence \(X_{n_j}\) such that \(X_{n_j} \rightarrow X\) almost surely. (A small numerical illustration of the interplay between 1 and 2 is sketched after this list.)
  3. Show that if \(\mathbf{E}|X_n|, \mathbf{E}|X| < \infty\) and \(\mathbf{E}|X_n - X| \rightarrow 0\), then \(\mathbf{P}(|X_n - X| > \varepsilon) \rightarrow 0\) as \(n \rightarrow \infty\) for every \(\varepsilon > 0\); thus, if absolutely integrable random variables \(X_n\) converge in expectation to an absolutely integrable limit \(X\), then \(X_n\) converges to \(X\) in probability. Give a counterexample showing that the converse fails, so that convergence in expectation is strictly stronger than convergence in probability.
  4. (Urysohn subsequence principle) Show that \(X_n\) converges to \(X\) in probability if and only if every subsequence \(X_{n_j}\) of \(X_n\) has a further subsequence \(X_{n_{j_k}}\) that converges to \(X\) in probability.
  5. Does the Urysohn subsequence principle still hold if one replaces "in probability" throughout with "almost surely"?
  6. Show that a continuous function preserves convergence in probability: if \(F: \mathbf{R} \rightarrow \mathbf{R}\) or \(F: \mathbf{C} \rightarrow \mathbf{C}\) is continuous and \(X_n \rightarrow X\) in probability, then \(F(X_n) \rightarrow F(X)\) in probability also. This holds more generally for functions of several variables \(F: \mathbf{R}^k \rightarrow \mathbf{R}\) or \(F: \mathbf{C}^k \rightarrow \mathbf{C}\); for instance, if \(X_n \rightarrow X\) and \(Y_n \rightarrow Y\) in probability, then \(X_n + Y_n \rightarrow X + Y\) and \(X_n Y_n \rightarrow XY\) in probability also.
  7. (Fatou's lemma for convergence in probability) Show that if the \(X_n\) are unsigned and \(X_n \rightarrow X\) in probability, then \(\mathbf{E}X \leq \liminf_{n \rightarrow \infty} \mathbf{E}X_n\).
  8. (Dominated convergence in probability) Show that if \(X_n \rightarrow X\) in probability, and one almost surely has \(|X_n| \leq Y\) for all \(n\) and some random variable \(Y\) with finite expectation, then \(\mathbf{E}X_n \rightarrow \mathbf{E}X\) as \(n \rightarrow \infty\).
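As a small aside not contained in the exercise, the dichotomy in 1 and 2 can be illustrated numerically. Below is a minimal sketch (the indicator model, sample sizes, and seed are illustrative choices of ours): take independent indicator variables \(X_n\) with \(\mathbf{P}(X_n = 1) = p_n\) and \(X = 0\). When \(p_n = 1/n^2\) the probabilities are summable, and in the simulation exceedances stop occurring early on each sample path, consistent with almost sure convergence via 1; when \(p_n = 1/n\) the probabilities still tend to zero (so \(X_n \rightarrow 0\) in probability), but exceedances keep appearing at arbitrarily late indices, consistent with the failure of almost sure convergence (by the second Borel–Cantelli lemma).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000       # horizon for each simulated sample path
trials = 2_000   # number of independent sample paths
n = np.arange(1, N + 1)

for label, p in [("p_n = 1/n   (not summable)", 1.0 / n),
                 ("p_n = 1/n^2 (summable)    ", 1.0 / n**2)]:
    # X_n = 1 with probability p_n, independently across n and across sample paths
    exceed = rng.random((trials, N)) < p
    # 1-based index of the last n <= N with X_n = 1 on each path (0 if none occurred)
    has_any = exceed.any(axis=1)
    last = np.where(has_any, N - np.argmax(exceed[:, ::-1], axis=1), 0)
    late = np.mean(last > N // 2)   # fraction of paths still exceeding after n = N/2
    print(f"{label}: fraction of paths with an exceedance after n = N/2: {late:.3f}")
```

On a typical run, the first fraction stays near \(1/2\), while the second is essentially zero.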
Exercise 3 
We are now able to state the weak and strong laws of large numbers, in the model case of independent and identically distributed (iid for short) random variables.
Recall that:
Theorem 4 (Law of large numbers, model case) Let \(X_1, X_2, \dots\) be an iid sequence of copies of a random variable \(X\) with finite expectation.
  1. (Weak law of large numbers) Then, \((X_1 + \cdots + X_n)/n \rightarrow \mathbf{E}X\) in probability.
  2. (Strong law of large numbers) Then, \((X_1 + \cdots + X_n)/n \rightarrow \mathbf{E}X\) almost surely.
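Before turning to the proofs, here is a quick numerical illustration of Theorem 4; it is not part of the original notes, and the exponential distribution, seed, and sample sizes are arbitrary choices. Along a single long sample path, the empirical means \(S_n/n\) settle down to \(\mathbf{E}X\).

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 1.0                                   # E X for Exp(1) samples (illustrative choice)
X = rng.exponential(mu, size=1_000_000)    # one long iid sample path X_1, X_2, ...
S = np.cumsum(X)                           # partial sums S_n = X_1 + ... + X_n

for n in [10, 100, 10_000, 1_000_000]:
    print(f"n = {n:>9}: empirical mean S_n/n = {S[n - 1] / n:.4f} (E X = {mu})")
```

On a typical run, the empirical means drift toward \(\mathbf{E}X = 1\), with fluctuations that shrink roughly like \(1/\sqrt{n}\), as quantified by the second moment computations later in this section.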
Remarks:
There are several variants of the law of large numbers, for instance when one drops the hypothesis of identical distribution, or when the random variable \(X\) is not absolutely integrable, or when one seeks more quantitative bounds on the rate of convergence; we will discuss some of these variants later in these notes.
There are several ways to prove the law of large numbers. One main strategy is the moment method, which controls statistics such as the empirical mean \((X_1 + \cdots + X_n)/n\) by computing moments such as the mean \(\mathbf{E}(X_1 + \cdots + X_n)/n\), the variance \(\mathbf{E}|(X_1 + \cdots + X_n)/n - \mathbf{E}(X_1 + \cdots + X_n)/n|^2\), or higher moments such as \(\mathbf{E}|(X_1 + \cdots + X_n)/n - \mathbf{E}(X_1 + \cdots + X_n)/n|^k\) for \(k = 4, 6, \dots\). Fortunately, the joint independence of the \(X_n\) allows us to compute such moments fairly easily, with some elementary combinatorics. A direct application of the moment method typically requires an additional finite moment assumption on \(X\), such as \(\mathbf{E}|X|^k < \infty\), but as we shall see, one can reduce fairly easily to this case by a so-called truncation argument. For the strong law of large numbers, one can also use methods from martingale theory, such as stopping time arguments and maximal inequalities; we will present some classical arguments of Kolmogorov in this context.

1. The moment method

We begin by using the moment method to establish both the strong and weak law of large numbers for sums of iid random variables, under additional moment hypotheses. The complex case follows from the real case by taking real and imaginary parts, so we shall focus our attention on real random variables henceforth.

The first moment

Let \(X_1, X_2, \dots\) be a sequence of iid copies of a scalar random variable \(X\), and write \(S_n := X_1 + \cdots + X_n\) for the partial sums. Suppose that \(X\) is absolutely integrable with mean \(\mathbf{E}X = \mu\). Then one can use linearity of expectation to compute the expectation (the first moment) of the partial sum \(S_n\): \[\mathbf{E}S_n = \mathbf{E}X_1 + \cdots + \mathbf{E}X_n = n\mu.\]
In particular, the empirical mean \((X_1 + \cdots + X_n)/n\) has expectation \(\mu\). This is consistent with the strong and weak laws of large numbers, but does not yet imply them; still, in the case that \(X\) is unsigned and \(\mathbf{E}X = \mu\) is finite and positive, Markov's inequality already gives the crude bound \[\mathbf{P}(X_1 + \cdots + X_n \geq \lambda n \mu) \leq \frac{1}{\lambda}\] for any \(\lambda > 0\), or equivalently \(\mathbf{P}(S_n/n \geq \lambda \mu) \leq 1/\lambda\). Thus, in the unsigned case, the empirical mean \((X_1 + \cdots + X_n)/n\) does not usually get much larger than the mean \(\mu\). We will refer to this inequality as a first moment bound on \(S_n\), as it was obtained primarily through computation of the first moment \(\mathbf{E}S_n\).
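As an aside not in the original argument, one can check this first moment bound by Monte Carlo simulation; in the sketch below, the Exp(1) distribution (so \(\mu = 1\)), the thresholds \(\lambda\), and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, mu = 100, 100_000, 1.0
# empirical means S_n/n of n iid Exp(1) variables, across many independent trials
means = rng.exponential(mu, size=(trials, n)).mean(axis=1)

for lam in [1.5, 2.0, 4.0]:
    est = np.mean(means >= lam * mu)   # Monte Carlo estimate of P(S_n/n >= lambda*mu)
    print(f"lambda = {lam}: P(S_n/n >= lambda*mu) ~ {est:.4f} <= 1/lambda = {1/lam:.4f}")
```

The Markov bound is of course very lossy here; the point is only that it holds uniformly in \(n\), using nothing more than the first moment.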

Second moments

We now turn to second moment bounds, obtained through computations of second moments such as \(\mathbf{E}|(X_1 + \cdots + X_n)/n|^2\) or \(\mathbf{Var}(S_n) = \mathbf{E}|S_n - \mathbf{E}S_n|^2\). When doing so, it will be convenient to normalize the mean \(\mu\) to equal zero, by replacing each \(X_i\) with \(X_i - \mu\) and \(X\) by \(X - \mu\), so that the partial sum \(S_n\) gets replaced by \(S_n - n\mu\) and the empirical mean \(S_n/n\) by \(S_n/n - \mu\). With this normalization, we see that to prove the strong or weak law of large numbers, it suffices to do so in the mean zero case \(\mu = 0\). On the other hand, if \(X\) is unsigned, then normalizing it in this fashion will generally destroy the unsigned property of \(X\), so this normalization is not always desirable.
We now compute the variance of the partial sum \(S_n\). Suppose that \(X\) has finite second moment, which we write as \(\mathbf{E}|X|^2 = \sigma^2\). Since adding or subtracting a constant does not affect the variance (\(\mathbf{Var}(X - \mu) = \mathbf{Var}(X)\)), we may also assume that \(X\) has been normalized to have mean zero. Then the variance of \(S_n\) is simply \(\mathbf{E}|X_1 + \cdots + X_n|^2\), and by linearity of expectation we have \[\mathbf{Var}(X_1 + \cdots + X_n) = \sum_{1 \leq i,j \leq n} \mathbf{E}X_i X_j.\]
All expressions on the right-hand side are absolutely integrable thanks to the Cauchy–Schwarz inequality. If \(i = j\), then \(\mathbf{E}X_i X_j\) is equal to the second moment \(\mathbf{E}|X|^2 = \sigma^2\). If instead \(i \neq j\), then \(X_i, X_j\) are independent and have mean zero by hypothesis, and thus \[\mathbf{E}X_i X_j = (\mathbf{E}X_i)(\mathbf{E}X_j) = 0.\]
Putting these in, we obtain
\[\mathbf{Var}(X_1 + \cdots + X_n) = n \sigma^2\]
or equivalently, for the variance of the empirical mean, \[\mathbf{Var}((X_1 + \cdots + X_n)/n) = \frac{1}{n}\sigma^2.\]
This bound was established in the mean zero case, but it clearly holds in general, since adding or subtracting a constant does not affect the variance: one has \(\mathbf{Var}(S_n/n) = \mathbf{Var}(S_n/n - \mu) = \frac{1}{n}\sigma^2\), where we now write \(\sigma^2 = \mathbf{Var}(X) = \mathbf{E}|X - \mu|^2\).
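The scaling \(\mathbf{Var}(S_n/n) = \sigma^2/n\) is also easy to observe numerically. The following sketch is not part of the original notes; the Exp(1) distribution (which has \(\sigma^2 = 1\)), the number of trials, and the values of \(n\) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
trials = 20_000
sigma2 = 1.0   # variance of Exp(1), an arbitrary illustrative choice

for n in [10, 100, 1000]:
    # empirical means S_n/n across many independent trials
    means = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
    print(f"n = {n:>5}: empirical Var(S_n/n) = {means.var():.5f}, "
          f"predicted sigma^2/n = {sigma2 / n:.5f}")
```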
This variance bound is a first demonstration of the concentration of measure effect that arises from combining many independent random variables \(X_1, \dots, X_n\). At the opposite extreme, we may take \(X_1, \dots, X_n\) to all be exactly the same random variable: \(X_1 = \cdots = X_n = X\). Then the empirical mean \(\frac{X_1 + \cdots + X_n}{n} = X\) has exactly the same mean and variance as \(X\), and no concentration occurs.
Now let us examine the behavior of the empirical mean \(S_n/n\). If we insert the variance bound we have obtained into Chebyshev's inequality, we obtain \[\mathbf{P}\left(\left|\frac{S_n}{n} - \mu\right| \geq \varepsilon\right) \leq \frac{1}{n}\frac{\sigma^2}{\varepsilon^2}\]
for any natural number \(n\) and any threshold \(\varepsilon > 0\), whenever \(X\) has mean \(\mu\) and finite variance \(\sigma^2 < \infty\). Hence \(\frac{X_1 + \cdots + X_n}{n} \rightarrow \mu\) in probability: we have proved the weak law of large numbers in the case that \(X\) has finite variance.
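As another aside not in the original text, one can compare the Chebyshev bound above with Monte Carlo estimates of the left-hand side; the distribution, the threshold \(\varepsilon\), and the sample sizes below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
trials, eps, mu, sigma2 = 20_000, 0.1, 1.0, 1.0   # Exp(1): mean 1, variance 1

for n in [10, 100, 1000]:
    means = rng.exponential(mu, size=(trials, n)).mean(axis=1)
    est = np.mean(np.abs(means - mu) >= eps)   # estimate of P(|S_n/n - mu| >= eps)
    bound = sigma2 / (n * eps**2)              # Chebyshev bound
    print(f"n = {n:>5}: estimated probability {est:.4f} <= Chebyshev bound {bound:.4f}")
```

The estimated probabilities decay much faster than the \(1/n\) Chebyshev bound, foreshadowing the sharper tail bounds provided by the central limit theorem and the Chernoff inequality mentioned below.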
Note that the above argument also shows that \[\frac{\sqrt{n}}{\omega(n)}\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right|\]
converges to zero in probability whenever \(\omega(n)\) is a function of \(n\) that goes to infinity as \(n \rightarrow \infty\). For instance, we have \[\frac{\sqrt{n}}{\log n}\left|\frac{S_n}{n} - \mu\right| \rightarrow 0\]
in probability. Informally, this result asserts that the empirical mean \(S_n/n\) typically strays from \(\mu\) by not much more than about \(1/\sqrt{n}\). This intuition will be reinforced later when we study the central limit theorem and related results such as the Chernoff inequality; it is refined further by the law of the iterated logarithm, which we will not be able to cover in these notes.
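The following short simulation (again not part of the original notes; the distribution, trial count, and values of \(n\) are arbitrary choices) illustrates this \(1/\sqrt{n}\) scale of fluctuation: the rescaled deviation \(\frac{\sqrt{n}}{\log n}|S_n/n - \mu|\) shrinks as \(n\) grows, while \(\sqrt{n}|S_n/n - \mu|\) remains of order one.

```python
import numpy as np

rng = np.random.default_rng(5)
trials, mu = 200, 1.0   # a modest number of trials keeps the run cheap

for n in [100, 10_000, 1_000_000]:
    # |S_n/n - mu| on each of `trials` independent sample paths of Exp(1) variables
    dev = np.array([abs(rng.exponential(mu, size=n).mean() - mu) for _ in range(trials)])
    print(f"n = {n:>8}: median sqrt(n)*|S_n/n - mu| = {np.median(np.sqrt(n) * dev):.3f}, "
          f"median (sqrt(n)/log n)*|S_n/n - mu| = {np.median(np.sqrt(n)/np.log(n) * dev):.3f}")
```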
Recall the Borel–Cantelli lemma: if \(E_1, E_2, \dots\) are events with \(\sum_n \mathbf{P}(E_n) < \infty\), then almost surely only finitely many of the \(E_n\) occur.
One can also hope to use the bound \(\mathbf{P}(|\frac{S_n}{n} - \mu| \geq \varepsilon) \leq \frac{1}{n}\frac{\sigma^2}{\varepsilon^2}\) together with the Borel-Cantelli lemma to obtain the strong law of large numbers in the second moment case,