The thesis of this paper is as follows:
Users want to write, using compositions of the transpose and product operators, linear algebra expressions that involve vectors and matrices, and the results make sense, i.e. the results of u'*v behaves like a scalar. Examples of linear algebraic expressions users expect to write include:
Inner product \(u^{T}v\), a scalar,
Outer product \(uv^{T}\), a matrix,
Quadratic form \(u^{T}Au\), a scalar,
Bilinear form \(u^{T}Av\), a scalar,
The description of these quantities employ Householder notation (Householder 1953, Householder 1955), a convention that is familiar to most, if not all, practitioners of numerical linear algebra today.
Humans, to a fault, are so good at special casing abstractions by context that it would be easy to conclude that a discussion of how linear algebra fits into computer languages would hardly seem necessary. Decades after APL made multidimensional arrays first class objects, and MATLAB made matrix laboratory syntax popular, we recently reached an inescapable conclusion, one we preferred not to believe, that vectors, in the sense of computer languages, and linear algebra vectors have serious coexistence issues.
To set the stage, in nearly every computer language (other than MATLAB (!!)) a “vector” is a one dimensional container. In concrete linear algebra there are column vectors and row vectors. In both concrete and abstract linear algebra, vectors are quantities that are subject to addition, multiplication by scalars, linear transformations, and play a role in scalar and outer products. As we shall see, the seemingly innocuous idea of transposing a vector turns out to be a highly contentious subject.
Consider the following familiar statements
A matrix times a vector is a vector
Matrix Multiplication of \(A\) and \(B\) is defined by taking the scalar product of a row of \(A\) with a column of \(B\).
The scalar product is defined on a pair of vectors. The scalar product of \(v\) with itself is a non-negative number known as \(\|v\|^{2}\).
and consider common notations in math or software:
A[i,:]*B[:,j]
or dot(A[i,:],B[:,j])
\(v^{T}\!v=(v,v)=\) v'*v
=dot(v,v)
(more coming)
Matrices and vectors are fundamental concepts for both computer science (Knuth 1967, Pratt 2001) and computational science (Strang 2003, Trefethen 1997). Nevertheless, the terms “vector” and “matrix” mean different things in the contexts of data structures and linear algebra. As data structures, they are simply arrays, whereas as linear algebraic objects, vectors are elements of a vector space and matrices are linear transformations. When we represent linear algebra vectors and linear algebra transformation in a basis, we obtain the familiar containers we know simply as vectors and matrices.
Computer science focuses primarily on the homogeneous container semantics of vectors and matrices. Oftentimes they are considered synonymous with arrays of rank 1 and 2 respectively. The seminal work of (Iliffe 1961) says
“Depending on the organization of the array it may be treated as a set, a vector, or a matrix.”
The classic (Knuth 1967) focuses only on indexing semantics; the index entry for “two-dimensional array” cross-references the entry for “matrix”. Even today, the conflation persists. A modern textbook on programming language design writes (p. 215 Pratt 2001):
A vector is a one-dimensional array; a matrix composed of rows and columns of components is a two-dimensional array[.]
In contrast, vectors and matrices in linear algebra are defined by their algebraic properties, such as inner products and matrix-vector products.
The aim of this paper is to identify if and when the semantics of array indexing and linear algebra may conflict.
“In linear algebra, better theorems and more insight emerge if complex numbers are investigated along with real numbers.”-(p. 1 Axler 2015)
The goal of this section is to substantiate the “what users want” part of our thesis. While seemingly obvious to most practitioners today, the historical evidence for writing expressions using compositions of transposition and products has been most illuminating and reflects the multifaceted evolution of what we know today as linear algebra.
Programming languages designed for scientific computing have many users which are familiar with linear algebraic constructs expressible using various products and quotients involving matrix and vector quantities. Some of these quantities include:
Inner products, \(u^{\prime}v\)
Outer products, \(uv^{\prime}\)
Bilinear forms, \(u^{\prime}Av\)
Quadratic/Hermitian forms, \(v^{\prime}Av\)
The notation for these quantities were systematized by Householder (Householder 1953, Householder 1955), who used capital Roman letters for matrices, small Roman letters for (column) vectors, and small Greek letters for scalars. Importantly, Householder also uses the vector transpose notation \(u^{\prime}\) to express row vectors, which are then used to construct the above products and forms. (Householder 1955) uses \(u^{T}={(u_{1},\cdots,u_{n})}^{T}\) to express row vectors as tuples, a construct which we shall see later as having ample precedent. (Householder 1955) writes the inner product as \(u^{*}v\), noting that \(*\) “denotes the conjugate transpose”. (Householder 1953) introduces the Hermitian form. (Householder 1953) introduces the row vector \(u^{T}\) corresponding to the column vector \(u\). Householder also acknowledges in (Householder 1953) the analogy between a numerical vector (as a column of a matrix) and a geometric vector in the sense of Gibbs. He writes the vector transpose as if it follows immediately from the matrix transpose:
If each column of a matrix \(M\) is written as a row, the order remaining the same, the resulting matrix is known as the transpose of the original and is designated \(M^{T}\). In particular, if \(x\) is the column vector of the \(\xi_{i}\), \(x^{T}\) is the row vector of the \(\xi_{i}\).
The outer product shows up in (Sec. 2.24 Householder 1953), although Householder does not give the quantity \(uv^{T}\) a special name^{1}^{1}while —(Householder 1953)— introduces “outer products” \([uv]\), these quantities are known today as bivectors and are conventionally denoted \(u\wedge v\)., and the discussion in context clearly implies that Householder considers vector-scalar-vector products \(u\sigma v^{T}\) special cases of matrix products \(USV^{T}\).
Interestingly, (Householder 1953) does not use the terms “bilinear form”, “quadratic form”, or “Hermitian form”. He uses “scalar product”. (Householder 1955) clearly spells out his notational convention. ((Householder 1953) has a missing page which might also spell it out, but it’s not clear.)
These basic expressions can be used to build more complicated expressions that occur in scientific computing.
Examples of more complicated expressions
BFGS Update Step
Alan’s example with least squares? MANOVA/Jacobi distribution formulas?