Proposition 5 (Approximation by Bernstein polynomials) Let \(f: [0, 1] \rightarrow \mathbf{R} \) be a continuous function. Then the Bernstein polynomials \[f_n(t) := \sum_{i=0}^{n} \binom{n}{i}t^i(1-t)^{n-i}f\left(\frac{i}{n}\right)\]converge uniformly to \(f\) as \(n \rightarrow \infty\). This asserts that continuous functions on (say) the unit interval \([0, 1]\) can be uniformly approximated by polynomials.
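For readers who want to see this approximation in action, here is a minimal numerical sketch (not part of the original argument, and with an arbitrarily chosen test function) that evaluates the Bernstein polynomials \(f_n\) directly from the formula and records the worst error on a grid of points in \([0, 1]\):

```python
import numpy as np
from math import comb

def bernstein(f, n, t):
    """Evaluate f_n(t) = sum_{i=0}^n C(n,i) t^i (1-t)^(n-i) f(i/n)."""
    i = np.arange(n + 1)
    weights = np.array([comb(n, k) for k in i], dtype=float) * t**i * (1.0 - t)**(n - i)
    return float(np.dot(weights, f(i / n)))

f = lambda x: np.abs(x - 0.5)            # continuous on [0, 1], not smooth at 1/2
ts = np.linspace(0.0, 1.0, 501)          # grid on which to measure the error

for n in (5, 20, 80, 320):
    sup_err = max(abs(bernstein(f, n, t) - f(t)) for t in ts)
    print(f"n = {n:4d}   sup |f_n - f| on grid ≈ {sup_err:.4f}")
```

The printed sup-norm errors shrink as \(n\) grows, as the proposition predicts.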
Proof: We first establish the pointwise convergence \(f_n(p) \rightarrow f(p)\) for each \(0 \leq p \leq 1\). Fix \(p \in [0, 1]\), let \(X_1, X_2, \dots\) be iid Bernoulli variables with parameter \(p\), and write \(S_n := X_1 + \cdots + X_n\). The empirical mean \(S_n/n\) takes values in the finite set \(\left\{0, 1/n, \dots, n/n \right\}\) (so that \(f(S_n/n)\) is dominated by the constant \(\sup_{[0,1]} |f|\)), attaining the value \(i/n\) with probability \(\binom{n}{i} p^i (1-p)^{n-i}\), and so from the definition of the Bernstein polynomials \(f_n\) we see that
\[\mathbf{E}f\left(\frac{X_1 + \cdots + X_n}{n}\right) = f_n(p).\]The mean of the \(X_i\) is \(p\) and their variance is finite, so by the weak law of large numbers for random variables of finite second moment, the empirical mean \(S_n/n\) converges in probability to \(p\). Since \(f\) is continuous, \(f(S_n/n)\) then converges in probability to \(f(p)\), and so by the dominated convergence theorem for convergence in probability we conclude the pointwise convergence \(\mathbf{E}f(S_n/n) \rightarrow f(p)\), that is, \(f_n(p) \rightarrow f(p)\).
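As a quick sanity check of the identity \(\mathbf{E}f(S_n/n) = f_n(p)\), one can compare a Monte Carlo estimate of the left-hand side with a direct evaluation of the right-hand side; the sketch below (illustrative only, with arbitrary parameter choices) reuses the `bernstein` helper and the test function `f` from the previous sketch.

```python
rng = np.random.default_rng(0)

p, n, trials = 0.3, 50, 200_000
samples = rng.binomial(n, p, size=trials) / n      # simulated values of S_n / n
monte_carlo = f(samples).mean()                    # estimates E f(S_n / n)
exact = bernstein(f, n, p)                         # f_n(p) from the Bernstein formula

print(f"Monte Carlo E f(S_n/n) ≈ {monte_carlo:.4f},   f_n(p) = {exact:.4f}")
```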
To establish uniform convergence, we use the proof of the weak law of large numbers, rather than the statement of that law, in order to get the desired uniformity in the parameter \(p\). For a given \(p\), Chebyshev's inequality (together with \(\mathbf{Var}(S_n/n) = p(1-p)/n\)) gives \[\mathbf{P}\left(\left|\frac{S_n}{n} - p\right| \geq \delta\right) \leq \frac{1}{n}\frac{p(1-p)}{\delta^2}\]for any \(\delta > 0\). Since \(f\) is continuous on the compact interval \([0, 1]\), it is uniformly continuous, and so for any \(\varepsilon > 0\) there exists a \(\delta > 0\) such that \(|f(x) - f(y)| < \varepsilon\) whenever \(x, y \in [0, 1]\) with \(|x - y| < \delta\). For such an \(\varepsilon\) and \(\delta\), the event \(|f(S_n/n) - f(p)| \geq \varepsilon\) forces \(|S_n/n - p| \geq \delta\), so by the monotonicity of probability \[\mathbf{P}\left(\left|f\left(\frac{S_n}{n}\right) - f(p)\right| \geq \varepsilon\right) \leq \frac{1}{n}\frac{p(1-p)}{\delta^2}.\]Also, being continuous on a compact interval, \(f\) is bounded in magnitude by some bound \(M\), so that \(|f(S_n/n) - f(p)| \leq 2M\). Splitting the expectation according to whether \(|f(S_n/n) - f(p)|\) is below \(\varepsilon\) or not, we obtain the upper bound \[\mathbf{E}\left|f\left(\frac{S_n}{n}\right) - f(p)\right| \leq \varepsilon + \frac{2M}{n}\frac{p(1-p)}{\delta^2},\] and thus by the triangle inequality and the identity \(|\mathbf{E}(f(S_n/n) - f(p))| = |f_n(p) - f(p)|\)\[|f_n(p) - f(p)| \leq \varepsilon + \frac{2M}{n}\frac{p(1-p)}{\delta^2}.\]
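To get a feel for how conservative the Chebyshev tail bound used above is, one can compare it with the exact binomial tail probability; the following sketch (with arbitrarily chosen \(p\) and \(\delta\), and not part of the proof) does this for a few values of \(n\).

```python
from math import comb

def tail_prob(n, p, delta):
    """P(|S_n/n - p| >= delta) for S_n ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1)
               if abs(i / n - p) >= delta - 1e-12)   # small slack guards against float fuzz

p, delta = 0.3, 0.1
for n in (10, 100, 1000):
    exact = tail_prob(n, p, delta)
    bound = p * (1 - p) / (n * delta**2)             # can exceed 1 (trivially true) for small n
    print(f"n = {n:5d}   exact tail = {exact:.2e}   Chebyshev bound = {bound:.2e}")
```

The bound is crude, but it decays like \(1/n\) with a constant that is uniform in \(p\) (since \(p(1-p) \leq 1/4\)), which is exactly what the uniformity argument needs.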
Since \(p(1-p)\) is bounded (by \(1/4\)) and \(\varepsilon\) can be made arbitrarily small, we conclude that the \(f_n\) converge uniformly to \(f\), as required.
The first and second moment methods are very general, and apply to sums \(S_n = X_1 + \cdots + X_n\) of random variables \(X_1, \dots, X_n\) that need not be identically distributed, or even independent (although the bounds can become weaker and more complicated if one deviates too far from these hypotheses). For instance, it is clear from linearity of expectation that \(S_n\) has mean \[\mathbf{E}S_n = \mathbf{E}X_1 + \cdots + \mathbf{E}X_n\]
(assuming of course that \(X_1, \dots, X_n\) are absolutely integrable) and variance \[\mathbf{Var}(S_n) = \mathbf{Var}(X_1) + \cdots +\mathbf{Var}(X_n) + \sum_{1\leq i,j \leq n; i\ne j} \mathrm{Cov}(X_i, X_j)\](assuming now that \(X_1, \dots , X_n\) are square-integrable). If \(X_1, \dots, X_n\) are pairwise independent in addition to being square-integrable, then all the covariances vanish and we obtain additivity of the variance:\[\mathbf{Var}(S_n) = \mathbf{Var}(X_1) + \cdots + \mathbf{Var}(X_n).\]
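As a concrete illustration of these two identities, the following sketch (with three arbitrarily chosen independent, non-identically-distributed summands) compares the simulated mean and variance of \(S_n\) with the sums of the individual means and variances.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 500_000

# Three independent (hence pairwise independent) summands with different distributions.
X1 = rng.uniform(0.0, 1.0, trials)       # mean 1/2, variance 1/12
X2 = rng.exponential(2.0, trials)        # mean 2,   variance 4
X3 = rng.binomial(10, 0.3, trials)       # mean 3,   variance 10 * 0.3 * 0.7 = 2.1
S = X1 + X2 + X3

print(f"E S_n  : simulated {S.mean():.3f}   predicted {0.5 + 2 + 3:.3f}")
print(f"Var S_n: simulated {S.var():.3f}   predicted {1/12 + 4 + 2.1:.3f}")
```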
Remark