\section{Old Introduction (Proposed)}

Humans, to a fault, are so good at special-casing abstractions by context that
it would be easy to conclude that a discussion of how linear algebra fits into
computer languages is hardly necessary. Decades after APL made
multidimensional arrays first-class objects, and MATLAB made matrix laboratory
syntax popular, we recently reached an inescapable conclusion, one we
preferred not to believe: vectors, in the sense of computer languages, and
linear algebra vectors have serious coexistence issues.

To set the stage, in nearly every computer language (other than MATLAB (!!)) a
``vector'' is a one-dimensional container. In concrete linear algebra there
are column vectors and row vectors. In both concrete and abstract linear
algebra, vectors are quantities that are subject to addition, multiplication
by scalars, and linear transformations, and play a role in scalar and outer
products. As we shall see, the seemingly innocuous idea of transposing a
vector turns out to be a highly contentious subject.

Consider the following familiar statements:
\begin{itemize}
\item A matrix times a vector is a vector.
\item Matrix multiplication of $A$ and $B$ is defined by taking the scalar
  product of a row of $A$ with a column of $B$.
\item The scalar product is defined on a pair of vectors. The scalar product
  of $v$ with itself is a non-negative number known as $\|v\|^2$.
\end{itemize}
and consider common notations in math or software (illustrated concretely in
the sketch below):
\begin{itemize}
\item \verb+A[i,:]*B[:,j]+ or \verb+dot(A[i,:],B[:,j])+
\item $v^T\!v=(v,v)=$ \verb+v'*v+ = \verb+dot(v,v)+
\end{itemize}
(more coming)

\section{Introduction (original)}

Matrices and vectors are fundamental concepts for both computer
science~\cite{Knuth1967,Pratt2001} and computational
science~\cite{Strang2003,Trefethen1997}. Nevertheless, the terms ``vector''
and ``matrix'' mean different things in the contexts of data structures and
linear algebra. As data structures, they are simply arrays, whereas as linear
algebraic objects, vectors are elements of a vector space and matrices are
linear transformations. When we represent linear algebra vectors and linear
transformations in a basis, we obtain the familiar containers we know simply
as vectors and matrices.

Computer science focuses primarily on the homogeneous container semantics of
vectors and matrices. Oftentimes they are considered synonymous with arrays of
rank 1 and 2 respectively. The seminal work of \cite{Iliffe1961} says
%
\begin{quote}
``Depending on the organization of the array it may be treated as a
\textit{set}, a \textit{vector}, or a \textit{matrix}.''
\end{quote}
%
The classic \cite{Knuth1967} focuses only on indexing semantics; the index
entry for ``two-dimensional array'' cross-references the entry for
``matrix''. Even today, the conflation persists. A modern textbook on
programming language design writes~\cite[p.~215]{Pratt2001}:
%
\begin{quote}
A vector is a one-dimensional array; a matrix composed of rows and columns of
components is a two-dimensional array[.]
\end{quote}
%PZ p 217 also has a nice description of linear indexing semantics adopted from
%Fortran. Slicing came from PL/I.
In contrast, vectors and matrices in linear algebra are defined by their
algebraic properties, such as inner products and matrix-vector products. The
aim of this paper is to identify if and when the semantics of array indexing
and linear algebra may conflict.
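To ground the notations above, here is a minimal Julia sketch (Julia being one
of the languages we use for examples throughout; the sizes are arbitrary)
checking that the scalar-product notations agree with matrix multiplication:
\begin{lstlisting}
using LinearAlgebra

A = rand(2, 3); B = rand(3, 4)
i, j = 2, 4

# Entry (i,j) of A*B is the scalar product of row i of A and column j of B.
# We compare with isapprox, since BLAS may round differently than dot:
isapprox((A*B)[i,j], dot(A[i,:], B[:,j]))   # true

# The scalar product of v with itself is the squared norm:
v = rand(3)
v'*v == dot(v, v)             # true
isapprox(v'*v, norm(v)^2)     # true
\end{lstlisting}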
\section{Vector indexing and inner products}

\paragraph{Indexing is an inner product operation}
Indexing is a key aspect of array semantics. For array data structures, the
most basic form of indexing maps an index to an element of a homogeneous
collection. For numeric data types, the indexing operation \lstinline|v[i]| on
a one-dimensional array \lstinline|v| of length $n$ can also be expressed in
the language of linear algebra as an inner product of two vectors
\begin{equation}
v_i = e_i \cdot v,\label{eq:idx1}
\end{equation}
where $v$ is the array \lstinline|v| interpreted as a vector $(v_1, v_2,
\dots, v_n)$, and $e_i$ is the $i$th canonical basis vector $(0, 0, \dots, 0,
1, 0, \dots, 0)$ whose $i$th entry is the only nonzero entry.%
\footnote{In this paper, we assume column major order and one-based indexing,
which is prevalent in the numerical computing literature and also in
programming languages which are popular for numerical computing such as
FORTRAN, MATLAB and Julia. The discussion in this paper applies equally to row
major order and/or zero-based indexes with only minor notational changes.
Also, we use the Householder notation convention that is ubiquitous in the
numerical linear algebra literature, where vectors are named with small Latin
letters, matrices are named with capital Latin letters, and scalars not
associated with the entry of any vector or matrix are named with small Greek
letters.~\cite{Householder1964}}
The preceding discussion requires the following to hold:
\begin{enumerate}[label=\bfseries{}A\arabic*]
\item The equivalence between indexing and inner products holds for any data
  type for which the integer 1 is a multiplicative identity.\label{point1}
\end{enumerate}
%
%TODO unital algebra
%
%TODO indexing is more fundamental than linalg. You can index vectors of strings, for example.

\paragraph{Slicing is a vector-matrix product}
Many languages for technical computing like FORTRAN, APL, MATLAB,
Python/NumPy, and Julia also feature more sophisticated indexing semantics,
such as slicing. Slicing takes an array of indexes rather than a single scalar
index, and returns another array containing the respective elements. For
example, a Julia expression like \lstinline|v[1:3]| returns a new vector with
entries \lstinline|v[1], v[2], v[3]|.

Now consider the analogous argument that a slicing operation can also be
expressed in the language of linear algebra: \lstinline|v[1:3]| corresponds to
the matrix-vector product
\begin{equation}
\begin{pmatrix}e_1 & e_2 & e_3\end{pmatrix}^\prime v,\label{eq:idx2}
\end{equation}
where $\begin{pmatrix}e_1 & e_2 & e_3\end{pmatrix}$ is the $n \times 3$ matrix
with columns $e_1$, $e_2$ and $e_3$ respectively and $'$ denotes matrix
transposition.%
\footnote{For simplicity, we consider only real numeric types, although the
discussion holds also for other algebraic fields such as complex numbers when
complex conjugations are combined appropriately with transposition.}
The equivalence follows from working out the product row by row, and noting
that
\begin{equation}
e_i^\prime v = e_i \cdot v = v_i,\label{eq:v'}
\end{equation}
where the last equality follows from \eqref{eq:idx1}.
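As a quick empirical check of \eqref{eq:idx1} and \eqref{eq:idx2}, the
following Julia sketch (a minimal illustration, with an arbitrary length $n$)
compares indexing and slicing against their inner-product and matrix-product
formulations:
\begin{lstlisting}
using LinearAlgebra

n = 5
v = rand(n)
e(i) = [k == i for k in 1:n]   # i-th canonical basis vector (0/1 entries)

# Indexing as an inner product of the basis vector with v:
v[2] == dot(e(2), v)           # true (exact, since e has 0/1 entries)

# Slicing as a vector-matrix product:
E = [e(1) e(2) e(3)]           # n x 3 matrix with columns e_1, e_2, e_3
v[1:3] == E' * v               # true
\end{lstlisting}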
The astute reader would have noted that in going from \eqref{eq:idx2} to
\eqref{eq:v'}, we sneaked in the notion of a \textit{vector transpose}, having
extracted rows of a matrix for subcomputations of the form \eqref{eq:v'}.
Whereas $(e_1)$ is an $n \times 1$ matrix, $e_1$ is an $(n,)$-vector. This
distinction is glossed over by numerical analysts, but the conflation of these
two objects poses a problem for computer scientists: $(e_1)^\prime$ must be a
$1 \times n$ matrix from the rules of matrix transposition, but what then is
$e_1^\prime$? Recall that the transpose of an $m \times n$ matrix $A$ is an
$n \times m$ matrix $A^\prime$ whose element in the $i$th row and $j$th column
is
\begin{equation}
\left(A^\prime\right)_{ij} = A_{ji}.\label{eq:A'}
\end{equation}
Transposition clearly swaps the two indexes. But what should transposition do
for an $(n,)$-vector, which has only one dimension?

The semantic problem posed by vector transposition arose from the assertion
that the columns of a matrix are column vectors. One could argue that an
expression of the form $\begin{pmatrix}v_1 & v_2 & \dots & v_k\end{pmatrix}$
should not be thought of as the concatenation of vectors, but rather of
$n\times1$ matrices. Thus one sees here the intermixing of indexing semantics
and algebraic semantics, since one now has two choices of indexing semantics:
\begin{enumerate}[label=\bfseries{}C\arabic*]
\item A column slice \lstinline|A[:, i]| returns an $(n,)$-array.\label{rule:col1}
\item A column slice \lstinline|A[:, i]| returns an $(n,1)$-array.\label{rule:col2}
\end{enumerate}
Bearing these two choices in mind, we can now list possible choices for the
meaning of the expression \lstinline|v'|. Let \lstinline|v| be an
$(n,)$-array. Then:
\begin{enumerate}[label=\bfseries{}T\arabic*]
\item Transposing a vector, \lstinline|v'|, returns an $(n,)$-array, which is
  just $v$.\label{rule:v'1}
\item Transposing a vector, \lstinline|v'|, returns a
  $(1,n)$-array.\label{rule:v'2}
\item Transposing a vector, \lstinline|v'|, returns an object of shape $(n,)$
  which lives in a dual space of some sort; it is \textit{not} a conventional
  array, and lies outside our current system.\label{rule:v'3}
\end{enumerate}
%
%TODO T2 requires a cast of v vector into (v,) matrix. then you transpose it.
%
\ref{rule:v'2} follows naturally from \ref{rule:col2} and the definition of
matrix transposition in \eqref{eq:A'}, which swaps the indexes. Similarly,
\ref{rule:v'1} follows naturally from \ref{rule:col1} and noting that
``swapping'' the order of indexes is a no-op, since the identity is the only
permutation of length 1. \ref{rule:v'3} is perhaps the least intuitive, but
can be justified by noting that \lstinline|v'| is a row vector, which is not
identical to a column vector and therefore cannot have the same
representation. A less conventional justification of \ref{rule:v'3} is to
regard \lstinline|v'| not as an array, but as a dot product expression which
is curried in the first argument. In other words, \lstinline|v'| is an
incomplete expression waiting to be multiplied with another $(n,)$-vector
\lstinline|w| so that
%
\begin{equation}
\alpha = v^\prime w = v \cdot w.\label{eq:w'v}
\end{equation}
%
%TODO clarify the point is that we want equality between the rules of matrix
%multiply (lhs) of second equality, and the rhs is vector dot product.
%Remind people that (mxn) x n => m.
%Make the point that for T1, T3 it is natural to take this equation as the
%definition of the product, whereas T2 has to already obey matmul.
%
%TODO v*w actually means matmul. But Alan wants to consider the case where we
%overload * to mean scalar*non-scalar also as a special case.
This interpretation of \lstinline|v'| corresponds closely to the notion of a
linear functional, which is used in functional analysis.
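For a concrete data point, modern Julia (1.0 and later) adopts something close
to \ref{rule:v'3}: transposing a vector yields a special wrapper type rather
than a plain array, and multiplying it with a vector produces a true scalar. A
minimal sketch:
\begin{lstlisting}
using LinearAlgebra

v = [1.0, 2.0, 3.0]
w = [4.0, 5.0, 6.0]

typeof(v')    # Adjoint{Float64, Vector{Float64}}: a wrapper, not an Array
v' * w        # 32.0, a genuine scalar, as in the curried dot product above
(v')' === v   # true: transposing twice recovers the very same vector
\end{lstlisting}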
We see immediately that while \ref{rule:v'1} and \ref{rule:v'3} satisfy
\eqref{eq:w'v}, producing a scalar result, \ref{rule:v'2} results in a product
involving quantities of shapes $(1,n)$ and $(n,)$, which by the usual rules of
linear algebra must produce a quantity of shape $(1,)$ and not a scalar. Some
programming languages like MATLAB employ an additional rule to circumvent this
problem, which is to define scalars and $(1,)$-vectors to be equivalent for
the purposes of equality comparison. More generally, we have the rule
\begin{enumerate}[label=\bfseries{}I\arabic*]
\item An array $A$ of shape $(n_1, n_2, \dots, n_k)$ defines an equivalence
  class which also includes reshapings of $A$ into arrays of shape $(n_1, n_2,
  \dots, n_k, 1)$, $(n_1, n_2, \dots, n_k, 1, 1)$, and other shapes with an
  arbitrary number of trailing singleton dimensions, all of which contain the
  same values. For $k = 1$ and $n_1 = 1$, scalar quantities are also included
  in this equivalence class.\label{rule:trail1}
\end{enumerate}
The equivalences defined by \ref{rule:trail1} remove some of the semantic
distinctions made earlier. First, the result of column indexing is no longer
ambiguous, as the combination \ref{rule:trail1}+\ref{rule:col1} is equivalent
to \ref{rule:trail1}+\ref{rule:col2}. Second, the objection to \ref{rule:v'2}
on the grounds of the inner product is mollified, as each of \ref{rule:v'1},
\ref{rule:trail1}+\ref{rule:v'2} and \ref{rule:v'3} defines a correct inner
product \eqref{eq:w'v}.

\section{Outer products}

Inner products are not the only kind of vector-vector product in linear
algebra. There is also the outer product $vu'$, which for an $(n,)$-vector $v$
and an $(m,)$-vector $u$ is the $n \times m$ matrix whose element in the $i$th
row and $j$th column is $v_i u_j$. One can think of the outer product as a
particular kind of matrix-matrix product, generated from the product of an
$n \times 1$ matrix and a $1 \times m$ matrix.

It turns out that the outer product cannot be written as \lstinline|v*u'| with
\ref{rule:v'1} semantics, since $vu^\prime$ is then the product of an $(n,)$
vector with an $(m,)$ vector. This product is not defined except for $n = m$,
whence \lstinline|v*u'| evaluates to the inner product! This result follows
from \ref{rule:v'1} semantics and \eqref{eq:w'v}, since $vu^\prime = vu =
v^\prime u = v \cdot u$.

\ref{rule:v'2} is also problematic, as it yields the product $(n,) \times (1,
m)$, which is incommensurate. Only the combination
\ref{rule:trail1}+\ref{rule:v'2} allows this product to be defined sensibly,
since \ref{rule:trail1} allows $(n,)$ to be reshaped to $(n,1)$, and then the
product can be evaluated using the ordinary rules of matrix-matrix
multiplication.

\ref{rule:v'3} semantics give us the flexibility of defining the outer product
axiomatically as the product \lstinline|v*u'|, since $u^\prime$ is represented
by a different type.
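Again Julia illustrates the \ref{rule:v'3} behavior: because the transposed
vector is a distinct wrapper type, the outer product can be defined directly
on it. A short sketch, with arbitrarily chosen sizes:
\begin{lstlisting}
using LinearAlgebra

v = rand(3)   # (3,)-vector
u = rand(4)   # (4,)-vector

size(v * u')   # (3, 4): the outer product as a 3x4 matrix
# Entries are exactly the products v_i * u_j:
v * u' == [v[i]*u[j] for i in 1:3, j in 1:4]   # true
\end{lstlisting}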
\section{Bilinear forms and the trace operation}

In this section we consider bilinear forms, which are vector-matrix-vector
products of the form $v'Aw$ for an $n$-vector $v$, $n\times m$ matrix $A$, and
$m$-vector $w$, and whose result is a scalar:
%
\begin{equation}
\beta = \sum_{i=1}^n \sum_{j=1}^m v_i A_{ij} w_j.\label{eq:v'Aw}
\end{equation}
By definition, a bilinear form can also be written using the trace operator.
The trace of an $n \times n$ matrix $A$ is defined as the sum of its diagonal
entries, i.e.
%
\begin{equation}
\gamma = \mathrm{tr}\,A = \sum_{i=1}^n A_{ii}.\label{eq:tr}
\end{equation}
The bilinear form therefore must satisfy the identity
%
\begin{equation}
\beta = v^\prime A w = \mathrm{tr}\,(A w v^\prime).\label{eq:trid}
\end{equation}
For this identity to be expressible in code of the form
\begin{lstlisting}
v'*A*w == tr(A*w*v')
\end{lstlisting}
we need semantics for both inner products and outer products. Thus the only
minimal sets of semantics that obey this identity are
\ref{rule:trail1}+\ref{rule:v'2} or \ref{rule:v'3}.
%TODO say that for the special case of A=I then you can satisfy (5) and be happy.

\section{Invalid operations}

Equally important to the story of capturing algebraic semantics is the story
of capturing \textit{invalid} semantics. Examples of invalid products are
\begin{enumerate}
\item Vector-vector products $vw$,
\item Transposed vector-transposed vector products $v^\prime w^\prime$,
\item Vector-matrix products $vA$, and
\item Matrix-transposed vector products $Av^\prime$.
\end{enumerate}
All these expressions are valid with \ref{rule:v'1} semantics, since any
vector-vector product is the inner product regardless of transposition,
$vA = v^\prime A$, and $Av^\prime = Av$.

\ref{rule:v'2} semantics sometimes yield valid expressions for these products:
%
\begin{enumerate}
\item When $v$ and $w$ are each $1$-vectors, $v^\prime w^\prime$ is a matrix
  product yielding a $1\times 1$ matrix.
\item When $A$ is an $n\times 1$ matrix and $v$ is an arbitrary vector (say of
  length $m$), $Av^\prime$ yields an $n\times m$ matrix.
\end{enumerate}
%
In this case, considering also \ref{rule:trail1} semantics exacerbates the
problem by creating \textit{more} valid cases:
%
\begin{enumerate}
\item When $v$ and $w$ are each $1$-vectors, $vw$ can be interpreted as a
  matrix product of two $1\times1$ matrices, yielding a $1\times1$ matrix.
\item When $A$ is a $1\times n$ matrix and $v$ is an arbitrary vector of
  length $m$, then $vA$ produces an $m\times n$ matrix.
\end{enumerate}
%
Thus one could say that \ref{rule:trail1}+\ref{rule:v'2} enables the
expression of inner products, outer products and bilinear forms, but at the
price of falsely allowing some invalid products as valid expressions.

Only \ref{rule:v'3} semantics can be shown to reject all these expressions as
invalid products. With the flexibility afforded by considering a transposed
vector as a different type, we can simply define $v^\prime w^\prime$ and
$Av^\prime$ to be erroneous. The other two products $vw$ and $vA$ are already
errors by default, unless \ref{rule:trail1} is also active, in which case $vw$
and $vA$ can be erroneously valid also.

We therefore conclude that \ref{rule:v'3} is the only semantic rule that
correctly excludes all erroneous products.
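As a sanity check of this conclusion, the following Julia sketch (with
arbitrary sizes, and vectors of generic length so the degenerate length-1
cases do not apply) shows all four products from the list above being
rejected; note that Julia rejects the transposed cases through shape checks on
the wrapper type rather than through missing methods:
\begin{lstlisting}
using LinearAlgebra

A = rand(3, 3); v = rand(3); w = rand(3)

for f in (() -> v * w,     # vector-vector
          () -> v' * w',   # transposed-transposed
          () -> v * A,     # vector-matrix
          () -> A * v')    # matrix-transposed vector
    try
        f()
        println("unexpectedly valid")
    catch err
        println(typeof(err))   # MethodError or DimensionMismatch
    end
end
\end{lstlisting}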
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% BEGIN CRUFT
Vectors are also defined by taking rows and columns of matrices. A typical
linear algebra text introduces notation for columns of a matrix by rewriting a
matrix $A$ as $A = [a_1, a_2, \dots, a_n]$ or even~\cite[for example,
p.~6]{Trefethen1997}
\[
A = \left[\left.\begin{array}{c}
\\ \\ a_1 \\ \\ \\
\end{array}\right|\begin{array}{c}
\\ \\ a_2 \\ \\ \\
\end{array}\left|\begin{array}{c}
\\ \\ \ \cdots\ \\ \\ \\
\end{array}\right|\begin{array}{c}
\\ \\ a_n \\ \\ \\
\end{array}\right].
\]
%
$a_k$ is therefore a vector that is the $k$th column of $A$. Thus we can see
that the indexing semantics are close in spirit to the operation of extracting
submatrices from matrices. However, it is less clear from the definitions
whether the indexing semantics and algebraic semantics are fully compatible.

\section{Matrix slices and column vectors}

Matrices have a natural representation in 2-arrays:
\begin{enumerate}[label=\bfseries{}M\arabic*]
\item An $m \times n$ matrix $A$ with $m$ rows and $n$ columns can be
  represented by an $(m,n)$-array, i.e.\ an array of rank 2, of size $m$ in
  the first dimension and size $n$ in the second dimension.\label{rule:mat}
\end{enumerate}
%Such slicing notation was originally developed for array indexing, but has
%since found its way into the mathematical literature, particularly in applied
%mathematics, where notation like $V(:, 1)$~\cite[p. 2]{Higham2002},\cite[p.
%38]{Strang2003} can be found in mathematical formulae.
%Similarly, slice indexing semantics are convenient for defining subblocks of
%matrices, such as
%\begin{lstlisting}
%B = A[1:p, 1:q]
%\end{lstlisting}
%which extracts the first $p$ rows and first $q$ columns of a matrix $A$ into a
%new matrix $B$.

\section{Generalizations to higher rank}

\ref{rule:col1} is a special case of the indexing rule where the rank of the
result is the sum of the ranks of the indexes. This indexing rule is used by
some languages such as APL. The history of this rule is discussed in
Sec.~\ref{sec:related}.

It is tempting to generalize \ref{rule:v'1} to arrays of general rank, by
considering $A^\prime$ as the array constructed by reversing all the indexes
of $A$. While by itself a valid rule, it is fundamentally incompatible with
\ref{rule:trail1} semantics. Applying the ``reverse all indexes'' rule to the
equivalence class defined in \ref{rule:trail1} produces an equivalence class
comprising arrays of shapes $(n_k, n_{k-1}, \dots, n_1)$, $(1, n_k, n_{k-1},
\dots, n_1)$, $(1, 1, n_k, n_{k-1}, \dots, n_1)$, and so on. In other words,
the resulting equivalence class under transposition defines an arbitrary
number of \textit{leading} singleton dimensions, with the result that the
first dimension of the array is no longer uniquely defined. We therefore
reject the combination of \ref{rule:trail1}+\ref{rule:v'1} semantics as
mutually inconsistent.

\section{Data structures for sparse matrices}

This discussion may seem a little silly, but one should recognize that the
situation for sparse matrices is a confusing hodgepodge of both kinds of
semantics. The so-called compressed sparse column (CSC) or compressed sparse
row (CSR) representations can only support A1 semantics. These data structures
are analogous to representations of $\left(m,n\right)$-tries. In contrast, the
so-called coordinate or IJV format naturally generalizes to any number of
dimensions, as the sketch below shows.
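The contrast is visible directly in Julia's \lstinline|SparseArrays| standard
library; the following sketch builds the same matrix both ways (the sizes and
entries are illustrative):
\begin{lstlisting}
using SparseArrays

# COO ("IJV") form: parallel vectors of row indexes, column indexes, values.
# Nothing here is specific to rank 2; a rank-3 array would simply carry a
# third index vector.
I = [1, 2, 3]; J = [2, 3, 1]; V = [10.0, 20.0, 30.0]

S = sparse(I, J, V, 3, 3)   # convert COO to Julia's CSC representation

# The CSC data structure is intrinsically two-dimensional: a column-pointer
# array plus row indexes and values per column, like an (m,n)-trie:
S.colptr, S.rowval, S.nzval

findnz(S)                   # recover the (I, J, V) triplet
\end{lstlisting}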
\section{Implementations in current languages}

...

Inconsistent design choices can cause user frustration. For example, consider
the Rayleigh quotient, denoted $v^{\prime}Av/v^{\prime}v$. If $v^{\prime}v$
produced a scalar, the quotient can produce a scalar. Similarly, if
$v^{\prime}v$ produced a $\left(1,1\right)$-array, we can interpret the result
as a $1\times1$ matrix and interpret the division $/$ as a matrix-matrix right
division, i.e.\ the operation $A/B=AB^{-1}$. However, if $v^{\prime}v$
produced a $\left(1,\right)$-array, the usual correspondence to a 1-vector
causes this operation to be undefined, since right division by a vector is
undefined. A $(1,)$-array can be formed if $v^{\prime}$ produced a $1\times N$
matrix, which when postmultiplied by an $N$-vector must produce a $1$-vector
by the rules of linear algebra. One could special-case vectors of length 1,
but this feels unnatural. Yet another possibility is to define
\textbf{implicit trailing dimensions}, which is a type conversion rule that
(in this context) says that a $(1,)$-array can behave as if it is a
$(1,1)$-array when convenient. With implicit trailing dimensions, right
division by $v^{\prime}v$ can be defined by converting the $(1,)$-array into a
$(1,1)$-array and using the matrix right-division rule. Note that implicit
trailing dimensions allow the data structures of A2 to adopt the semantics of
A1.
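Under \ref{rule:v'3}-style semantics the frustration disappears; a minimal
Julia sketch (with an arbitrary symmetric matrix):
\begin{lstlisting}
using LinearAlgebra

A = Symmetric(rand(4, 4))
v = rand(4)

# v'v is a genuine scalar, so / is ordinary scalar division and the
# Rayleigh quotient requires no special-casing:
rayleigh = (v' * A * v) / (v' * v)
rayleigh isa Float64   # true
\end{lstlisting}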
%END CRUFT
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Related work}\label{sec:related}

\paragraph{Array theory} The APL literature contains a comprehensive
discussion of multidimensional array indexing semantics~\cite{Brown1982},
which were absent in the classical APL of
\cite{Iverson1962}. \cite{Ruehr1982,Gerth1988} surveyed the development of
array semantics in APL.

What we refer to as APL-style indexing was first written down in the APL/360
manual~\cite{Falkoff1968}, albeit only for rank 2, where indexing by a matrix
returned a matrix. \cite{Haegi1976} noted an inconsistency. The general case
has been termed ``scatter-point indexing'' or ``choose indexing''
\cite{Brown1972,Ruehr1982}, and was perhaps most clearly described
in~\cite{More1979}. \cite{Gull1979} emphasized the importance of preserving
the rank of the index set in the rank of the output for multidimensional
arrays.

\cite{More1973} built a theory of multidimensional arrays using recursive
nesting of one-dimensional arrays, formalized upon axiomatic set
theory. \cite{Ghandour1973} emphasized the ordering of elements as a defining
characteristic of arrays. \cite{Gerth1988} emphasized indexing as a function
on arrays from index sets to value sets. We do not consider functional aspects
of indexing in this paper.

The APL literature makes some mention of linear algebra semantics, but does
not consider the interplay with array indexing semantics.
\begin{enumerate}
\item \cite{More1973} mentions the inner product returning a scalar as an
  important rule for guiding array semantics.
\item \cite{Haegi1976} bemoans the conflation of scalars and $(1,)$-arrays in
  classical APL.
\item \cite[p.~153]{More1973} also suggests the possibility of more
  complicated multidimensional array semantics:
\begin{quote}
Let $V$ be an $n$-dimensional vector space over a field. Disregarding
considerations of contravariance and covariance, a tensor of valence $q$ on
$V$ is a multilinear mapping of the Cartesian product of the list $V V \dots
V$ of length $q$ into a vector space. If $V$ has a base, then a tensor of
valence $q$ on $V$ can be represented by a \textit{component tensor}, which is
an array on $q$ axes, each of length $n$.
\end{quote}
\end{enumerate}

\section{Summary}

The discussion of column and row indexing rules reveals a fundamental tension
between arrays and matrices which we believe has not been given due
consideration in the literature nor in the design of practical programming
languages for scientific applications.
