\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage{natbib}
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{float}
\iflatexml
\documentclass[%
aip,
%jmp,%
%bmf,%
%sd,%
rsi,%
amsmath,amssymb,
%preprint,%
reprint,%
%author-year,%
%author-numerical,%
nofootinbib
]{revtex4-1}
\fi
% \usepackage{pgfplots} % Not (yet) supported by LaTeXML.
\usepackage{listings}
% was minted
\usepackage{dcolumn}
\usepackage{bm}
\lstdefinestyle{mystyle}{
basicstyle=\ttfamily\footnotesize,
breakatwhitespace=false,
breaklines=true,
captionpos=b,
keepspaces=true,
numbersep=2pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=2
}
\lstset{style=mystyle}
\providecommand{\noopsort}[1][]{}
\providecommand{\singleletter}[1][]{#1}
\begin{document}
\title{A Comparison of Quantum and Traditional Fourier Transform Computations}
\author[1]{Damian Musk}%
\affil[1]{Affiliation not available}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\selectlanguage{english}
\begin{abstract}
The quantum Fourier transform (QFT) can calculate the Fourier transform of a vector of size $N$ with time complexity $\mathcal{O}(\log^2 N)$, as compared to the classical complexity of $\mathcal{O}(N \log N)$. However, if one wishes to measure the full output state, the QFT complexity grows to $\mathcal{O}(N \log^2 N)$ and the apparent advantage is lost. The advantage is therefore fully exploited only by algorithms that require a limited number of samples from the output vector, as is the case in many quantum algorithms. Moreover, the computational complexity worsens further once the cost of constructing the initial state is taken into account. In this paper, we illustrate this issue concretely by implementing these algorithms, discussing their complexities, and analyzing the complexity of simulating the QFT in MATLAB.%
\end{abstract}%
\sloppy
\hypersetup{
colorlinks=true,
linkcolor=blue,
filecolor=magenta,
urlcolor=cyan,
}
\section*{\label{sec:level1}The principles of quantum computing}
Quantum computation rests on a fundamentally different framework from familiar classical computation. Specifically, its fundamental unit is the qubit rather than the bit. Whereas a bit can only hold one of the binary values $0$ or $1$, a qubit can hold a superposition of states. We use Dirac's bra-ket notation, with two kets analogous to the classical states $0$ and $1$ representing an orthogonal basis,
\begin{equation}
|{0}\rangle=
\begin{bmatrix}
1 \\
0
\end{bmatrix}
\ \text{and} \
|{1}\rangle=
\begin{bmatrix}
0 \\
1
\end{bmatrix}.
\end{equation}
A more generic qubit state is a superposition of the previous basis states:
\begin{equation}
|{\psi}\rangle=x_0 |{0}\rangle + x_1 |{1}\rangle,
\end{equation}
where $x_0$ and $x_1$ are complex-valued amplitudes obeying the relation:
\begin{equation}
|x_0|^2 + |x_1|^2 = 1.
\end{equation}
It is important to clarify that such a superposition does not take a value ``in between 0 and 1'': instead, in accordance with the Born rule\hyperref[csl:4]{(Born, 1954)}, a measurement of the state of the qubit yields a binary value. The qubit has \textit{probability} $|x_0|^2$ of being found in state ``0'' and \textit{probability} $|x_1|^2$ of being found in state ``1.''
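For instance, for the balanced superposition
\begin{equation*}
|{\psi}\rangle=\frac{1}{\sqrt{2}}\left(|{0}\rangle+|{1}\rangle\right),
\end{equation*}
a measurement returns ``0'' or ``1'' with equal probability $|1/\sqrt{2}|^2 = 1/2$.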
We can use the Kronecker tensor product to construct the basis of a system composed of multiple qubits. For example, the product basis states of the four-dimensional linear vector space describing $n=2$ qubits are:
\begin{align}
|\mathbf{0}\rangle &= |{0}\rangle \otimes |{0}\rangle=
\begin{bmatrix}
1 \\
0 \\
0 \\
0
\end{bmatrix},
|\mathbf{1}\rangle = |{0}\rangle \otimes |{1}\rangle=
\begin{bmatrix}
0 \\
1 \\
0 \\
0
\end{bmatrix}, \\
|\mathbf{2}\rangle &= |{1}\rangle \otimes |{0}\rangle=
\begin{bmatrix}
0 \\
0 \\
1 \\
0
\end{bmatrix},
|\mathbf{3}\rangle = |{1}\rangle \otimes |{1}\rangle=
\begin{bmatrix}
0 \\
0 \\
0 \\
1
\end{bmatrix}.
\end{align}
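These product states can be generated directly in MATLAB with the built-in \texttt{kron} function (a minimal illustration):
\begin{lstlisting}[language=MATLAB]
b0 = [1; 0];          % |0>
b1 = [0; 1];          % |1>
% two-qubit basis states as Kronecker products
disp(kron(b0, b1).')  % prints 0 1 0 0, i.e. |0> (x) |1>
disp(kron(b1, b1).')  % prints 0 0 0 1, i.e. |1> (x) |1>
\end{lstlisting}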
Given $n$ qubits, the general state is a superposition state vector in a $2^n$-dimensional Hilbert space:
\begin{equation}
|X\rangle = \sum_{j=0}^{2^n-1} x_j |j\rangle
\end{equation}
with $\sum_{j=0}^{2^n-1} |x_j|^2 = 1$.
Given an arbitrary state resulting from a computation, measuring one coefficient $x_k$ amounts to a projection onto the corresponding basis vector:
\begin{equation}
x_k = \langle k|X\rangle = \sum_{j=0}^{2^n-1} x_j \delta_{kj},
\end{equation}
where $\delta_{kj}$ is the Kronecker delta. This measurement destroys the state, and therefore only one coefficient (or one linear combination of coefficients) can be known. To learn more coefficients, one would have to repeat the experiment that produced the original state. In other words, each measurement provides only one $x_j$ of information, and one thus has to perform $\mathcal{O}(N)$ measurements to obtain all $N$ coefficients to a certain precision.
This is a point of major importance that will affect our analysis of the Quantum Fourier Transform.
To compute with qubits, one can form quantum logic gates, which are analogous to classical logic gates but instead operate on quantum states. For example, the following gate $H$ inputs state $|X\rangle$ and outputs $|Y\rangle$:
\begin{equation}
|Y\rangle = H |X\rangle.
\end{equation}
A more specific example would be the mapping of the basis state $|0\rangle$ to $(|0\rangle + |1\rangle)/\sqrt{2}$ and $|1\rangle$ to $(|0\rangle - |1\rangle)/\sqrt{2}$, which is represented by the one-qubit Hadamard matrix:
\begin{equation}
\label{eqn:10}
H=\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
\end{equation}
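As a quick check, the Hadamard gate of Eq.~\ref{eqn:10} can be applied numerically (a minimal sketch in plain MATLAB):
\begin{lstlisting}[language=MATLAB]
H  = (1/sqrt(2))*[1 1; 1 -1];
b0 = [1; 0];               % |0>
disp(H*b0)                 % both entries 1/sqrt(2): the state (|0>+|1>)/sqrt(2)
disp(norm(H'*H - eye(2)))  % ~0: H is unitary (indeed its own inverse)
\end{lstlisting}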
Another common example is the phase shift gate, a single-qubit gate that leaves the basis state $|0\rangle$ unchanged and maps $|1\rangle$ to $\exp(i \phi)|{1} \rangle$ for a phase shift $\phi$ (so that the probability of measuring either $|{0} \rangle$ or $|{1} \rangle$ remains unchanged); its controlled, two-qubit variant applies this shift only when a control qubit is in state $|1\rangle$.
% Suggestion 5: mention quantum gates being unitary operators on the Hilbert space
In general, quantum gates are unitary operators that map the Hilbert space onto itself. They are time evolution operators, and they thus conserve energy and information. Moreover, although free quantum evolution requires no energy, physical quantum gates do require energy to operate, as one has to initiate and stop the interactions. Additionally, because quantum error correction techniques make use of classical hardware, error correction is expected to be a major part of the energy budget of quantum computing.
\section*{Limitations of classical algorithms: the example of integer factorization}
Before we move on to the first quantum algorithms, let us first examine the limitations of classical computation via one particular algorithm: the general number field sieve (GNFS).
The GNFS is currently the most efficient known classical algorithm for large integer factorization (particularly for integers exceeding $10^{100}$)\hyperref[csl:6]{(Pomerance, 1996)}. A generalization of the special number field sieve, this algorithm works in sub-exponential time, specifically of complexity $\mathcal{O}[\exp(1.9(\log N)^{1/3} (\log \log N)^{2/3})]$.
We compare it to one of the first quantum algorithms ever to spark interest in the field of quantum computation: Shor's algorithm\hyperref[csl:3]{(Shor, 1994)}. Via fast multiplication algorithms\hyperref[csl:7]{(Beckman et al., 1996)}, it requires only $\mathcal{O}[(\log N)^2(\log \log N)(\log \log \log N)]$ quantum gates, placing it in the bounded-error quantum polynomial time (BQP) complexity class.
One distinguishing feature of Shor's algorithm is that it requires only a few repeated measurements of the output state to obtain the desired result, so one does not need to know all the coefficients of the output state. This is achieved via the quantum period-finding subroutine, which does not load classical data and evaluate a full vector, but instead finds the period by measuring the result of the QFT a few times; this property is central to quantum speedup. Indeed, reducing factoring to period finding via the QFT is the great innovation of Shor's algorithm.
There is an almost exponential gap between the complexities of the GNFS and Shor's algorithm, showing that quantum computing is, in principle, capable of performing tasks that no classical computer could ever perform: this phenomenon is often referred to as {\it quantum speedup}. To visualize this gap, Fig.~\ref{scaling} graphs the number of operations as a function of the input size.\selectlanguage{english}
\begin{figure}[h]
\centering
\includegraphics[width=0.5\textwidth]{Revised_Factorization_Complexities.png}
\caption{{Scaling, represented by the number of operations as a function of the algorithm's input size. (This graph is normalized to start at the origin; SA denotes Shor's algorithm.)}}
\label{scaling}
\end{figure}
\section*{The definition and use of the classical discrete Fourier transform}
Before introducing the QFT, for clarity, we define its classical counterpart: the discrete Fourier transform (DFT). The DFT maps a sequence of $N$ complex numbers $\boldsymbol{x} = \{x_0, x_1, \dots, x_{N-1}\}$ into another sequence, $\boldsymbol{y} = \{y_0, y_1, \dots, y_{N-1}\}$, such that:
\begin{equation}
y_k
=\frac{1}{\sqrt{N}}\sum_{n=0}^{N-1}{x_n\cdot e^{-\frac{i 2 \pi}{N}kn}}, \ k=0, 1, 2, \dots, N-1.
\end{equation}
Moreover, an $N$-point DFT is often expressed as
\begin{equation}
\mathbf{y} = W \mathbf{x}
\end{equation}
where $W$ is the DFT matrix:
\begin{equation}
W=\frac{1}{\sqrt{N}}
\begin{bmatrix}
1 & 1 & 1 & 1 & \dots & 1 \\
1 & \omega & \omega^2
& \omega^3 & \dots & \omega^{N-1} \\
1 & \omega^2 & \omega^4 & \omega^6 & \dots & \omega^{2(N-1)} \\
1 & \omega^3 & \omega^6 & \omega^9 & \dots & \omega^{3(N-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega^{N-1} & \omega^{2(N-1)} & \omega^{3(N-1)} & \dots & \omega^{(N-1)(N-1)}
\end{bmatrix}
\end{equation}
Here, $\omega=\exp(-{2 \pi i}/{N})$ is a primitive $N$th root of unity.
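As a concrete check, the $N=4$ matrix can be built directly from the definition above and compared against MATLAB's built-in \texttt{fft} (a minimal sketch; note that \texttt{fft} omits the $1/\sqrt{N}$ normalization used here):
\begin{lstlisting}[language=MATLAB]
N = 4;
w = exp(-2*pi*1i/N);
W = (1/sqrt(N)) * w.^((0:N-1).' * (0:N-1));  % entries w^(jk)
x = [1; 2; 3; 4];
disp(norm(W*x - fft(x)/sqrt(N)))             % ~0: agreement up to normalization
\end{lstlisting}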
This mathematical operation is typically implemented via an algorithm that minimizes the number of elementary arithmetic operations, known as the fast Fourier transform (FFT)\hyperref[csl:8]{(Zhou et al., 1992)}. This reduces the complexity of computing the DFT from the $\mathcal{O}(N^2)$ of direct evaluation to $\mathcal{O}(N \log{N})$\hyperref[csl:9]{(Lohne, 2017)}. Here, we use the Cooley-Tukey algorithm\hyperref[csl:10]{(Bekele, 2006)} to implement the FFT.
The simplest implementation of the Cooley-Tukey algorithm is the radix-$2$ decimation-in-time (DIT) case, which divides a DFT of size $N$ into two interleaved DFTs with size $N/2$ for each stage in the algorithm's recursion; the next section examines an implementation of this method in MATLAB.
\section*{The implementation of the radix-2 DIT case in MATLAB}
The radix-2 DIT is a recursive algorithm taking the following inputs: the input data $x$, its size $N=2^n$, and the stride $z$ (the distance in memory between consecutive values of $x$; the initial call uses $z = 1$).
A possible implementation of the algorithm in MATLAB is as follows:
\begin{lstlisting}[language=MATLAB]
function d = DIT(x, N, z)
  if N == 1
    d(1) = x(1);
  else
    % even-indexed and odd-indexed sub-transforms
    d(1:N/2)   = DIT(x, N/2, 2*z);
    d(N/2+1:N) = DIT(x(1+z:end), N/2, 2*z);
    for k = 1:N/2
      t = d(k);
      w = exp(-2*pi*1i*(k-1)/N);   % twiddle factor
      d(k)     = t + w*d(k+N/2);
      d(k+N/2) = t - w*d(k+N/2);
    end
  end
end
\end{lstlisting}
In the function's initial if statement, we handle the base case in which $x$ is a single-element array; we set the output equal to the first element of $x$.
In the else branch, we reach the essence of the algorithm: the first half of the output is set to the recursive call on $x$ with halved $N$ and doubled stride $z$ (the even-indexed samples), and the second half to the recursive call on the input shifted by $z$ (the odd-indexed samples), again with halved $N$ and doubled $z$; the recursion bottoms out at the base case. Afterwards, the butterfly loop combines the two half-transforms into the full DIT. This leads to an interesting complexity analysis, reviewed in the following section.
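Assuming the \texttt{DIT} function above is saved on the MATLAB path (e.g.\ as \texttt{DIT.m}), one can sanity-check it against the built-in \texttt{fft}:
\begin{lstlisting}[language=MATLAB]
x = [1 2 3 4 5 6 7 8];
d = DIT(x, 8, 1);        % initial call uses stride z = 1
disp(norm(d - fft(x)))   % ~0: the radix-2 DIT matches the built-in FFT
\end{lstlisting}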
\section*{Evaluating the complexity of the radix-2 DIT implementation}
The runtime of our implementation depends only on the size of the input, $N$; call it $T(N)$. The two recursive calls contribute $2\,T(N/2)$ (each re-evaluates the function with $N$ halved), and the work in the final for loop is proportional to $N$. Since we are evaluating asymptotic complexity, we may drop constant scaling factors and simply write $N$. Therefore, we have:
\begin{equation}
T(N)=2T(\frac{N}{2})+N.
\end{equation}
Next, we can utilize the master theorem for divide-and-conquer recurrences. For recurrences of the form $T(N)=aT(N/b)+f(N)$, one determines the critical exponent $c_{\text{crit}}=\log_b{a}$; here $a=b=2$, so $c_{\text{crit}}=\log_2{2}=1$. Our $f(N)=N$ satisfies $f(N)=\mathcal{O}(N^{c_{\text{crit}}} \log^k{N})$ with $k=0$, so the corresponding case of the theorem applies and we determine the complexity as:
\begin{equation}
T(N)=\mathcal{O}(N^{c_{\text{crit}}}\log^{k+1}N)=\mathcal{O}(N \log{N}).
\end{equation}
As expected, our implementation makes correct usage of the Cooley-Tukey algorithm, showing the aforementioned complexity.
\section*{Definition and usage of quantum Fourier transform}
Akin to the DFT examined earlier, the quantum Fourier transform (QFT) maps a quantum state $|{x}\rangle=\sum_{i=0}^{N-1}{x_i |{i}\rangle}$ to $|{y}\rangle=\sum_{i=0}^{N-1}{y_i |{i}\rangle}$ according to the formula:
\begin{equation}
y_k=\frac{1}{\sqrt{N}}\sum_{n=0}^{N-1}{x_n\omega^{nk}}, \ k=0, 1, 2, ..., N-1.
\end{equation}
(We keep the notation of $\omega$ as a function of $N$; note, however, that the QFT is conventionally defined with the opposite sign, so here we take $\omega=\exp(2 \pi i/N)$ for $N=2^n$, matching the phases in the product form that follows.) Moreover, we can re-express the QFT in a form that is easier to simulate in our implementation\hyperref[csl:11]{(Tellez et al., 2008)}, representing our input as a tensor product of states like so (here, $s$ is used to better distinguish it from $x$):
\begin{equation}
|{s}\rangle=|{s_1 s_2 \dots s_n}\rangle=|{s_1}\rangle \otimes |{s_2}\rangle \otimes \dots \otimes |{s_n}\rangle.
\end{equation}
We also borrow fractional binary notation such that:
\begin{equation}
[0.s_1 \dots s_m] = \sum_{k=1}^m{s_k2^{-k}}.
\label{bfn}
\end{equation}
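For example, for the three-bit string $s_1 s_2 s_3 = 101$, Eq.~\ref{bfn} gives
\begin{equation*}
[0.101] = \frac{1}{2} + \frac{0}{4} + \frac{1}{8} = \frac{5}{8}.
\end{equation*}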
Therefore, we can express the QFT as:
\begin{equation}
\text{QFT}(|{s}\rangle)=\frac{1}{\sqrt{2^n}} \bigotimes_{j=1}^n{(|{0} \rangle+e^{2 \pi i [0.s_j s_{j+1} \dots s_n]} |{1} \rangle)}.
\label{QFT}
\end{equation}
More precisely, Eq.~\ref{QFT} represents a time evolution operator built from a composition of $\mathcal{O}(n^2)$ gates, namely Hadamard gates and controlled phase shift gates, some of which can be applied in parallel. Here, the larger tensor product symbol denotes an indexed Kronecker tensor product.
Via decomposition, the QFT on $2^n$ amplitudes can be implemented as a quantum circuit consisting of $\mathcal{O}(n^2)$ Hadamard gates and controlled phase shift gates for a number of qubits $n$.
By contrast, the classical DFT requires $\mathcal{O}(n2^n)$ elementary operations in our implementation, exponentially more than the quantum gate count above. This may also be compared with $\mathcal{O}(n \log n)$, the number of gates required by the most efficient known quantum Fourier transform algorithms for an approximate result.%\hyperref[csl:12]{({{S.}~{Hallgren}}, 2000)}
This analysis, however, does not take into account the complexity of preparing the input state and measuring the output states, which both have complexity $\mathcal{O}(N\log 1/\epsilon)$ for a required resolution $\epsilon$\hyperref[csl:13]{(Shende et al., 2006)}. It is therefore evident that in the case of the best-known QFT algorithm, the complexity is completely dominated by these measurements. Furthermore, if one wants to read out the full output vector the complexity becomes $\mathcal{O}(N^2 \log^2 1/\epsilon)$. The classical FFT algorithm also has a dependency on the desired precision, but we treat it as a constant depending on machine precision, not considering arbitrary precision implementations of the classical algorithm.
However, the complexity analysis of our simulation on a classical computer will not fully reflect the innate properties of a quantum device. On a classical computer, we cannot simulate the full QFT algorithm, but we can simulate the application of the unitary operator corresponding to the QFT algorithm to one basis state. We first examine a gate-level accurate implementation that we may write in pseudocode. As the following figure indicates, the QFT may be constructed from Hadamard gates (written $H$) and controlled phase shift gates (written $R_m$). \newline\selectlanguage{english}
\begin{figure}[h]
\centering
\includegraphics[width=0.49\textwidth]{QFT_Circuit_2.png}
\caption{{A quantum circuit drawn for the QFT for $n$ qubits.}}
\end{figure}\newline
In pseudocode, we may write:
\begin{lstlisting}
function QFT(s):
    for i in 1..n:
        H(s_i)
        for m in 2..(n-i+1):
            R_m(s_i)
\end{lstlisting}
where $H(\cdot)$ denotes an application of the Hadamard gate and $R_m(\cdot)$ an application of the controlled phase shift gate,
$$R_m = \begin{bmatrix} 1 & 0 \\ 0 & e^{i \phi_m} \end{bmatrix},$$
with phase shift $\phi_m = 2\pi/2^m$.
Both may be thought of as operators acting in place on the state $s_i$.
The above algorithm consists of two nested loops; it therefore has a complexity of $\mathcal{O}(n^2) = \mathcal{O}(\log^2 N)$.
Since each quantum gate can be simulated in $\mathcal{O}(N)$ operations, and taking advantage of the possibility of parallelizing some of the transformations implemented by the gates, we can simulate the above circuit in $\mathcal{O}(N \log^2 N)$.
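To illustrate the per-gate simulation cost, a single one-qubit gate acting on qubit $j$ of an $n$-qubit register can be simulated as follows (a minimal sketch; the dense operator is built with \texttt{kron} for clarity, whereas an in-place update over the $2^n$ amplitudes achieves the quoted $\mathcal{O}(N)$ cost):
\begin{lstlisting}[language=MATLAB]
n = 3; N = 2^n;
H = (1/sqrt(2))*[1 1; 1 -1];
j = 2;                                          % act on the second qubit
U = kron(kron(eye(2^(j-1)), H), eye(2^(n-j)));  % H on qubit j, identity elsewhere
psi = zeros(N, 1); psi(1) = 1;                  % the basis state |000>
psi = U*psi;                                    % now (|000> + |010>)/sqrt(2)
\end{lstlisting}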
We provide a possible implementation of this algorithm in MATLAB\footnote{Source code of the algorithm in MATLAB may be found at \url{https://github.com/DamianRMusk/MATLAB_FT}.} using the QUBIT4MATLAB\hyperref[csl:14]{(Toth, 2008)} library.
\begin{lstlisting}[language=MATLAB]
function q = QFT(s)
  % s: vector of classical bits s_1 ... s_n labelling a basis state
  b0 = ket([1 0]);     % |0>, from QUBIT4MATLAB
  b1 = ket([0 1]);     % |1>
  n = length(s);
  c = 1/sqrt(2);       % per-factor coefficient; the n factors give 1/sqrt(2^n)
  q = c*(b0 + exp(2*pi*1i*FBN(s(n)))*b1);      % the j = n factor, [0.s_n]
  for j = 1:(n-1)
    tpn = c*(b0 + exp(2*pi*1i*FBN(s(n-j:n)))*b1);
    q = kron(tpn, q);  % prepend the next factor of the tensor product
  end
end

function b = FBN(s)
  % fractional binary notation [0.s_1 s_2 ... s_m]
  b = 0;
  for k = 1:length(s)
    b = b + s(k)/2^k;
  end
end
\end{lstlisting}
This is merely a reformulated rendition of our QFT expression, Eq.~\ref{QFT}.
The code starts by defining the basis states $|{0} \rangle$ and $|{1} \rangle$ using QUBIT4MATLAB's \texttt{ket} function (we abbreviate b for ``basis'' in the variable names), sets $n$ to the length of the input state and \texttt{c} to the per-factor coefficient $1/\sqrt{2}$, and initializes \texttt{q} as the last factor of the tensor product.
The code then iterates over the remaining factors; each new factor is joined to the accumulated state with the MATLAB function \texttt{kron} (the Kronecker tensor product), so that the factors appear in the order of Eq.~\ref{QFT}.
Finally, the fractional binary notation of Eq.~\ref{bfn} is evaluated by the helper function \texttt{FBN}, a simple iterative loop with the variable \texttt{b} serving as accumulator.
\section*{Evaluating the complexity of the QFT}
In the previous section, we discussed the running time of the gate-accurate description of the QFT algorithm and proceeded to multiply by $N$ (the cost of simulating the gate for each possible input) to obtain the gate-accurate simulation time.
We now discuss the complexity of Eq.~\ref{QFT} given one input $s$. One way to implement Eq.~\ref{QFT} is to pre-compute all required values of $[0.s_j \dots s_n]$ (Eq.~\ref{bfn}) for each $j$. Since there are $n$ possible values of $j$ and each sum has complexity proportional to $n$, this step has total complexity $\mathcal{O}(n^2)$.
We now consider the complexity of computing the Kronecker products in Eq.~\ref{QFT}. This tensor product is effectively just a form of multiplication: using the associative property, we see that the first product involves two vectors of size 2, resulting in a vector of size 4. The second product involves a vector of size 4 and a vector of size 2, resulting in a vector of size 8. This continues up to the final result of size $N=2^n$.
Thus, we effectively sum the sizes of the intermediate products over $2^j$:
\begin{equation}
\sum_{j=1}^{j<n+1} 2^j = 2^{n+1} - 2 = \mathcal{O}(2^n) = \mathcal{O}(N).
\end{equation}