Statistics of Mapping

One might argue, in some sense, that in an infinite, non-repeating string of digits (i.e. an irrational number) the probability \(P_d\) of a given digit \(d\) occurring in any base \(\beta\) tends to \[P_d=\frac{1}{\beta},\] that is, a uniform distribution over the \(\beta\) possible digits.

However, this is still a naive assumption, since irrationality alone does not guarantee it. If it were not true, the much later, and therefore much less significant, digits would hold some order beyond first assumptions. Is it significant that \(\sqrt{2}\), \(\sqrt{7}\) and \(\sqrt{8}\) each have a double zero at digits \(3\) and \(4\) after their mapping? Probably not; but if it were, it would suggest a correlation in digits \(104\) and \(85\) of the original numbers. The fact that this feature appears only in base \(10\) (I’m guessing) was enough to dismiss it almost immediately.
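To get a feel for the uniform-distribution claim, here is a minimal sketch that extracts the first few thousand base-\(\beta\) digits of \(\sqrt{2}\), \(\sqrt{7}\) and \(\sqrt{8}\) with exact integer square roots and compares each digit's share with \(1/\beta\). It looks at the raw digits only, not at the mapping discussed earlier. The sample size, the bases tried and the helper names `sqrt_digits` and `digit_frequencies` are our own choices, and a finite sample says nothing rigorous about the limit.

```python
from math import isqrt


def sqrt_digits(m: int, base: int, count: int) -> list[int]:
    """First `count` base-`base` digits of sqrt(m) after the radix point."""
    # floor(sqrt(m) * base**count) == isqrt(m * base**(2*count)), computed exactly.
    scaled = isqrt(m * base ** (2 * count))
    frac = scaled - isqrt(m) * base**count  # remove the integer part
    digits = []
    for _ in range(count):
        frac, d = divmod(frac, base)  # peel off the least significant digit
        digits.append(d)
    return digits[::-1]  # most significant digit first


def digit_frequencies(digits: list[int], base: int) -> list[float]:
    """Share of the sample taken by each digit 0 .. base-1."""
    return [digits.count(d) / len(digits) for d in range(base)]


if __name__ == "__main__":
    for m in (2, 7, 8):
        for base in (2, 10, 16):
            freqs = digit_frequencies(sqrt_digits(m, base, 5000), base)
            worst = max(abs(f - 1 / base) for f in freqs)
            print(f"sqrt({m}), base {base}: expected {1 / base:.4f}, "
                  f"largest deviation {worst:.4f}")
```

Working with `isqrt` avoids floating point entirely, so every digit printed is exact however far the expansion is taken.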

Let us conduct some experiments. Analysing the first \(10000000\) digits of \(\pi\) (from http://pi.karmona.com/) gives \[\begin{array}{|c|c|c|} \hline \text{Digit} & \text{Share} & \text{Count} \\ \hline 0 & 9.9944 \% & 999440 \\ 1 & 9.9933 \% & 999333 \\ 2 & 10.0031 \% & 1000306 \\ 3 & 9.9996 \% & 999965 \\ 4 & 10.0109 \% & 1001093 \\ 5 & 10.0047 \% & 1000466 \\ 6 & 9.9934 \% & 999337 \\ 7 & 10.0021 \% & 1000207 \\ 8 & 9.9981 \% & 999814 \\ 9 & 10.0004 \% & 1000040 \\ \hline \end{array}\]

This is a fairly flat distribution: the largest fluctuation, the excess of \(1093\) fours over the expected \(1000000\), amounts to \(1093/10000000\), roughly one ten-thousandth of the total.
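The shares and the largest fluctuation can be recomputed from the counts in the table. A minimal sketch, taking the total as exactly \(10000000\) as stated in the text (the ten counts as listed sum to \(10000001\)); the names `PI_COUNTS`, `percentages` and `largest_fluctuation` are our own:

```python
# Digit counts for the first 10,000,000 digits of pi, copied from the table above.
PI_COUNTS = {
    0: 999440, 1: 999333, 2: 1000306, 3: 999965, 4: 1001093,
    5: 1000466, 6: 999337, 7: 1000207, 8: 999814, 9: 1000040,
}
TOTAL = 10_000_000  # number of digits analysed, as stated in the text


def percentages(counts: dict[int, int], total: int) -> dict[int, float]:
    """Share of each digit, in percent of `total`."""
    return {d: 100 * c / total for d, c in counts.items()}


def largest_fluctuation(counts: dict[int, int], total: int) -> tuple[int, int]:
    """Digit whose count strays furthest from total/len(counts), and its signed excess."""
    expected = total // len(counts)
    digit = max(counts, key=lambda d: abs(counts[d] - expected))
    return digit, counts[digit] - expected


if __name__ == "__main__":
    for d, p in percentages(PI_COUNTS, TOTAL).items():
        print(f"{d}: {p:.4f} %")
    digit, excess = largest_fluctuation(PI_COUNTS, TOTAL)
    print(f"largest fluctuation: digit {digit}, {excess}/{TOTAL} = {excess / TOTAL:.2e}")
```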

However, for the Champernowne constant we have a distinctly different situation! For the first \(5888890\) digits (the leading \(0\) followed by the integers \(1\) to \(999999\)) the non-zero digits are shared exactly equally, but there are distinctly fewer zeros. \[\begin{array}{|c|c|c|} \hline \text{Digit} & \text{Share} & \text{Count} \\ \hline 0 & 8.3019 \% & 488890 \\ 1 & 10.1887 \% & 600000 \\ 2 & 10.1887 \% & 600000 \\ 3 & 10.1887 \% & 600000 \\ 4 & 10.1887 \% & 600000 \\ 5 & 10.1887 \% & 600000 \\ 6 & 10.1887 \% & 600000 \\ 7 & 10.1887 \% & 600000 \\ 8 & 10.1887 \% & 600000 \\ 9 & 10.1887 \% & 600000 \\ \hline \end{array}\]
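A brute-force check of this table is cheap at this size. The sketch below (the function name `champernowne_counts` is our own) writes out the integers \(0, 1, \ldots, 10^k-1\) one after another, including \(0\) as the table does, and counts each digit with `str.count`:

```python
def champernowne_counts(k: int) -> dict[str, int]:
    """Count each decimal digit in the concatenation 0 1 2 ... (10**k - 1).

    Including the integer 0 is what gives a total of 5888890 digits for k = 6.
    """
    s = "".join(map(str, range(10**k)))
    return {d: s.count(d) for d in "0123456789"}


if __name__ == "__main__":
    counts = champernowne_counts(6)
    total = sum(counts.values())
    for d, c in counts.items():
        print(f"{d}: {100 * c / total:.4f} %  {c}")
    print("total digits:", total)
```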

Perhaps the ratio of the number of zeros in the base-\(10\) Champernowne constant to the number of any other digit converges to some constant. The best estimate from this table is \(488890/600000 \approx 0.814816667\). Note that the non-zero digits are only shared exactly when the count stops at the end of a block, i.e. just after \(9\), \(99\), \(999\), etc., the largest \(n\)-digit integer, so only such cut-off points are used. The next approximation is \(5888890/7000000 \approx 0.84127\), from a total of \(68888890\) digits; then comes \(68888890/80000000 = 0.861111125\), from a total of \(788888890\) digits.

This appears to converge very slowly, but there is reasonable scope for a pattern and for limiting behavior.

\[\begin{array}{|c|c|c|} \hline \text{It} & \text{Total digits} & \text{Zeros / each non-zero digit} \\ \hline - & 0 & 0/0 \\ 0 & 10 & 1/1 \\ 1 & 190 & 10/20 \\ 2 & 2890 & 190/300 \\ 3 & 38890 & 2890/4000 \\ 4 & 488890 & 38890/50000 \\ 5 & 5888890 & 488890/600000 \\ \hline \end{array}\]

This gives us OEIS:\(A033714\)
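The rows of this table, and the larger blocks quoted earlier (\(68888890\) and \(788888890\) digits), can be produced by counting rather than by building the string. Here is a minimal sketch, assuming the table's convention: the integers \(0\) to \(10^k-1\) are counted with \(0\) included, so row It corresponds to \(k = \text{It}+1\). The function name `block_counts` is our own:

```python
def block_counts(k: int) -> tuple[int, int, int]:
    """Digit statistics for the integers 0, 1, ..., 10**k - 1 written in base 10.

    Returns (total digits, number of zeros, occurrences of each non-zero digit).
    """
    # One digit for the integer 0, plus 9 * 10**(m-1) integers of m digits each.
    total = 1 + sum(9 * m * 10 ** (m - 1) for m in range(1, k + 1))
    # Left-padding every integer to k digits gives all 10**k strings of length k,
    # in which each digit fills k * 10**(k-1) slots. Padding only adds zeros,
    # so each non-zero digit keeps exactly that count.
    nonzero_each = k * 10 ** (k - 1)
    zeros = total - 9 * nonzero_each
    return total, zeros, nonzero_each


if __name__ == "__main__":
    previous_total = None
    for k in range(1, 9):
        total, zeros, each = block_counts(k)
        print(f"It {k - 1}: {total} digits, {zeros}/{each} = {zeros / each:.9f}")
        if previous_total is not None:
            # Each block's zero count equals the previous block's digit total.
            assert zeros == previous_total
        previous_total = total
```

The \(k\)-digit integers contribute exactly \(9k\,10^{k-1}\) digits. As a result, the zero count for block \(k\) equals the total digit count for block \(k-1\), which is why, from row It \(1\) onwards, each numerator in the last column repeats the previous row's digit total.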

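We may then study the convergence through the ratio of a numerator formed by \(n\), then \(n-1\) eights, then \(90\), to a denominator formed by \(n+2\) followed by \(n+1\) zeros, where the pieces are joined by digit pasting (concatenation). For example, \(n=3\) gives \(38890/50000\). Such a notation is interesting: \[\frac{\langle n\rangle \,|\, (n-1);8 \,|\, \langle 90\rangle}{\langle n+2\rangle \,|\, (n+1);0}\]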

where \(\langle\cdot\rangle\) denotes the literal value of its contents, \(|\) concatenation, and \(f(n);d\) a run of \(f(n)\) copies of the digit \(d\). The notation even has a strange interpretation in the corner cases, e.g. \(n=-1\) and \(n=0\):

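\[\begin{aligned} \frac{\langle -1\rangle \,|\, (-1-1);8 \,|\, \langle 90\rangle}{\langle -1+2\rangle \,|\, (-1+1);0} &= \frac{\langle -1\rangle \,|\, (-2);8 \,|\, \langle 90\rangle}{\langle 1\rangle} = \frac{[90-88-1]}{1} = \frac{1}{1}, \\ \frac{\langle 0\rangle \,|\, (0-1);8 \,|\, \langle 90\rangle}{\langle 0+2\rangle \,|\, (0+1);0} &= \frac{\langle 0\rangle \,|\, (-1);8 \,|\, \langle 90\rangle}{\langle 2\rangle \,|\, (1);0} = \frac{[9-8] \,|\, \langle 0\rangle}{20} = \frac{10}{20}. \end{aligned}\]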
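The corner cases have a simple explanation if, as a sketch of our own, the pasted numbers are rewritten as ordinary integers. For \(n \ge 1\), the \(n-1\) eights followed by \(90\) form an \((n+1)\)-digit block equal to \((8\cdot 10^{n+1}+10)/9\), so \[\langle n\rangle \,|\, (n-1);8 \,|\, \langle 90\rangle = n\,10^{n+1} + \frac{8\cdot 10^{n+1}+10}{9}, \qquad \langle n+2\rangle \,|\, (n+1);0 = (n+2)\,10^{n+1}.\] The right-hand sides make sense for any integer \(n\), and for \(n=-1\) and \(n=0\) they give \[n=-1:\ \frac{-1+2}{1} = \frac{1}{1}, \qquad n=0:\ \frac{0+10}{20} = \frac{10}{20},\] the first two non-empty rows of the table above.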

Using this template we can concoct ratios such as \(20888888888888888888890/22000000000000000000000\) (the case \(n=20\)), which equals \(0.949494949494949494949545454545454545454545454545454545454545\ldots\)
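The pasted ratio and its decimal expansion can be checked with exact rational arithmetic. This is a sketch using Python's `fractions.Fraction`; `template_ratio`, `decimal_digits` and `first_difference` are our own helper names, and the string construction assumes \(n \ge 1\). For \(n=20\) it compares the expansion with the purely repeating \(0.\overline{94} = 94/99\):

```python
from fractions import Fraction


def template_ratio(n: int) -> Fraction:
    """(<n> | (n-1);8 | <90>) / (<n+2> | (n+1);0) built by digit pasting, n >= 1."""
    if n < 1:
        raise ValueError("the string construction needs n >= 1")
    numerator = int(str(n) + "8" * (n - 1) + "90")
    denominator = int(str(n + 2) + "0" * (n + 1))
    return Fraction(numerator, denominator)


def decimal_digits(x: Fraction, places: int) -> str:
    """First `places` decimal digits of 0 <= x < 1, truncated rather than rounded."""
    return str(x.numerator * 10**places // x.denominator).zfill(places)


def first_difference(a: str, b: str) -> int:
    """1-based position of the first differing character, or 0 if none."""
    for i, (p, q) in enumerate(zip(a, b), start=1):
        if p != q:
            return i
    return 0


if __name__ == "__main__":
    exact = decimal_digits(template_ratio(20), 60)
    periodic = decimal_digits(Fraction(94, 99), 60)  # 0.949494... repeating
    print("n=20 :", "0." + exact)
    print("94/99:", "0." + periodic)
    print("first differing place:", first_difference(exact, periodic))
```

The exact ratio follows \(0.\overline{94}\) for \(21\) decimal places and then continues as \(5454\ldots\). Applied to \(n=50\), the same helpers show the ratio agreeing with \(0.9\overline{786324}\) only for its first fifty or so places.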

Taking it to \(n=50\), \[\frac{50|49;8|90}{52|51;0} \approx 0.9\overline{786324}\]

seems to indicate that the gap closes: the ratio creeps towards \(1\), though only slowly.
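The slow approach to \(1\) can be made explicit with a derivation of our own. Write the pasted numerator as the ordinary integer \(n\,10^{n+1} + (8\cdot10^{n+1}+10)/9\) and the denominator as \((n+2)\,10^{n+1}\). The ratio then simplifies exactly to \[R(n) = \frac{(9n+8)\,10^{n+1}+10}{9(n+2)\,10^{n+1}} = 1 - \frac{10}{9(n+2)} + \frac{10}{9(n+2)\,10^{n+1}}.\] So the ratio does tend to \(1\), but the gap shrinks only like \(10/(9(n+2))\), roughly as \(1/n\), which accounts for the slow convergence. The leading part reproduces the examples above: \(n=20\) gives \(1-10/198 = 94/99 = 0.\overline{94}\) and \(n=50\) gives \(1-10/468 = 229/234 = 0.9\overline{786324}\). The last, exponentially small term only shows up far down those expansions.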