2025-09-22 Gram-Schmidt

  • Revisit projections, rotations, and reflections

  • Constructing orthogonal bases

  • QR factorization

[1]:
using LinearAlgebra
using Plots
using Polynomials
default(lw=4, ms=5, legendfontsize=12, xtickfontsize=12, ytickfontsize=12)

# Here's our Vandermonde matrix again
function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    V = ones(m, k)
    for j in 2:k
        V[:, j] = V[:, j-1] .* x
    end
    V
end
[1]:
vander (generic function with 2 methods)

Polynomials can be orthogonal too

[2]:
x = LinRange(-1, 1, 50)
A = vander(x, 4)
M = A * [.5 0 0 0;  # 0.5
         0  1 0 0;  # x
         0  0 1 0]' # x^2

scatter(x, M, label=["\$0.5\$" "\$x\$" "\$x^2\$"])
plot!(x, 0*x, label=:none, color=:black)
[2]:

Which inner products will be zero?

Which functions are even and odd?

Pairwise inner product

The pairwise inner products between two sets of vectors can be expressed by collecting the sets as columns in matrices and writing \(A = X^T Y\), where \(A_{i, j} = x_i^T y_j\). It follows from this definition that

\[\left( X^T Y \right)^T = Y^T X\]
[3]:
# Let's verify this
@show (M' * M)' == M' * M
M' * M
(M' * M)' == M' * M = true
[3]:
3×3 Matrix{Float64}:
 12.5          -1.11022e-15   8.67347
 -1.11022e-15  17.3469       -2.22045e-16
  8.67347      -2.22045e-16  10.8272

Normalization and orthogonalization

[4]:
# Let's normalize the first column
q1  = M[:, 1]
q1 /= norm(q1)

# and then plot the new function
Q = [q1 M[:, 2:end]]
scatter(x, Q)
[4]:
[5]:
# This should still be symmetric
Q' * Q
[5]:
3×3 Matrix{Float64}:
  1.0          -1.94289e-16   2.45323
 -1.94289e-16  17.3469       -2.22045e-16
  2.45323      -2.22045e-16  10.8272

Orthogonal matrices

If \(x^T y = 0\), then we say \(x\) and \(y\) are orthogonal (or \(x\) is orthogonal to \(y\)).

A vector \(x\) is said to be normalized if \(\left\lvert \left\lvert x \right\rvert \right\rvert = 1\).

If \(x\) is orthogonal to \(y\) and \(\left\lvert \left\lvert x \right\rvert \right\rvert = \left\lvert \left\lvert y \right\rvert \right\rvert = 1\), then we say \(x\) and \(y\) are orthonormal.

A square matrix with orthonormal columns is said to be an orthogonal matrix.

We typically use \(Q\) (or \(U\) and \(V\)) for matrices that are known or constructed to be orthogonal. Orthogonal matrices are always full rank: the columns are linearly independent. The inverse of an orthogonal matrix is its transpose.

\[Q^T Q = Q Q^T = I\]

Orthogonal matrices are a powerful building block for robust numerical algorithms.
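
As a quick numerical check (a sketch I'm adding, not part of the original notebook), we can build an orthogonal matrix from Julia's built-in qr of a random square matrix and verify these identities:

# Sketch: verify Q' * Q == Q * Q' == I and inv(Q) == Q' for an orthogonal matrix.
Qrand = Matrix(qr(randn(5, 5)).Q)   # 5x5 orthogonal matrix
@show norm(Qrand' * Qrand - I)      # ~ machine epsilon
@show norm(Qrand * Qrand' - I)      # ~ machine epsilon
@show norm(inv(Qrand) - Qrand')     # the inverse is the transpose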

Gram-Schmidt orthogonalization

Suppose we’re given some vectors and want to find an orthogonal basis for their span.

\[A = \Bigg[ a_1 \Bigg| a_2 \Bigg] = \Bigg[ q_1 \Bigg| q_2 \Bigg] \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix} = Q R\]
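
Before writing the general algorithm, here is a minimal two-column sketch (my addition, reusing the first two columns of the Vandermonde matrix \(A\) from above) of exactly this block equation:

# Two-column Gram-Schmidt by hand, matching the 2x2 R above.
a1, a2 = A[:, 1], A[:, 2]
r11 = norm(a1); q1 = a1 / r11      # normalize the first column
r12 = q1' * a2                     # component of a2 along q1
v = a2 - q1 * r12                  # subtract that component
r22 = norm(v); q2 = v / r22        # normalize what remains
@show norm([a1 a2] - [q1 q2] * [r11 r12; 0 r22])  # ~ 0, so A = QR for these two columns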

A naive Gram-Schmidt algorithm

[6]:
# Let's form Q and R by making each column orthonormal to all previous columns
function gram_schmidt_naive(A)
    m, n = size(A)
    Q = zeros(m, n) # Orthonormal basis
    R = zeros(n, n) # Upper triangular
    for j in 1:n
        v = A[:, j] # current column to normalize
        for k in 1:j-1
            # We do a projection against the previous columns of Q. Why?
            r  = Q[:, k]' * v
            v -= Q[:, k] * r
            R[k, j] = r # What is this factor?
        end
        # normalize and store our final v
        R[j, j] = norm(v)
        Q[:, j] = v / R[j, j]
    end
    Q, R
end
[6]:
gram_schmidt_naive (generic function with 1 method)
[7]:
# Let's try it with the Vandermonde matrix
A = vander(x, 4)
Q, R = gram_schmidt_naive(A)

# Check this is valid
@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 1.0245877047652434e-15
norm(Q * R - A) = 4.3681794554507014e-16

What do orthogonal polynomials look like?

[8]:
# Let's look at 6 orthonormal polynomials
x = LinRange(-1, 1, 50)
A = vander(x, 6)
Q, R = gram_schmidt_naive(A)
plot(x, Q, legend=:none)
[8]:

What happens if we use more than \(50\) values of \(x\)? Is there a continuous limit?
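
One way to explore this (a sketch I'm adding, not part of the original notebook): repeat the orthogonalization with more points and rescale by \(\sqrt{m}\). The curves for different \(m\) overlay, suggesting a continuous limit (the Legendre polynomials, up to normalization).

# Sketch: the discrete orthonormal columns scale like 1/sqrt(m), so rescaling
# by sqrt(m) should make the curves for different m approximately coincide.
plt = plot(legend=:none)
for m in (50, 200, 1000)
    xs = LinRange(-1, 1, m)
    Qm, _ = gram_schmidt_naive(vander(xs, 6))
    plot!(plt, xs, sqrt(m) .* Qm)   # plot all 6 rescaled columns
end
plt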

Theorem

Every full-rank \(m \times n\) matrix (\(m \geq n\)) has a unique reduced \(QR\) factorization with \(R_{j, j} > 0\).

The algorithm we’re using produces this unique factorization (with \(R_{j, j} > 0\)) because of the line:

R[j,j] = norm(v)
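
As a sanity check (my addition, not from the original notebook), we can compare against Julia's built-in qr. LAPACK may choose negative diagonal entries in \(R\); after flipping those signs, the reduced factorization should agree with ours up to rounding.

# Sketch: uniqueness of the reduced QR factorization with positive diagonal.
A1 = vander(LinRange(-1, 1, 20), 5)
Q1, R1 = gram_schmidt_naive(A1)
F = qr(A1)
Qb, Rb = Matrix(F.Q), F.R          # reduced ("thin") factors
S = Diagonal(sign.(diag(Rb)))      # sign flips so the diagonal becomes positive
@show norm(S * Rb - R1)            # small
@show norm(Qb * S - Q1)            # small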

Solving equations with \(QR\)

If \(A x = b\) and \(A = Q R\), then multiplying both sides by \(Q^T\) and using \(Q^T Q = I\) gives \(R x = Q^T b\), which we can solve by back substitution.

[9]:
# Let's fit a polynomial to these points
x1 = [-0.9, 0.1, 0.5, 0.8]
y1 = [1, 2.4, -0.2, 1.3]
scatter(x1, y1, label="data")

Q, R = gram_schmidt_naive(vander(x1, 4))
p = R \ (Q' * y1) # Solve R x = Q' b with \; cheap for upper triangular R (back substitution)
@show p
plot!(x, vander(x, 4) * p, label="\$R^{-1}Q^Tb\$")
p = [3.352100840336134, -9.476050420168065, -1.747899159663866, 12.98319327731092]
[9]:

How accurate is the result?

[10]:
m = 20
x = LinRange(-1, 1, m)
A = vander(x, m)
Q, R = gram_schmidt_naive(A)

@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 7.868453407006443e-9
norm(Q * R - A) = 7.6178704465113005e-16

A variant with more parallelism

Because the columns of \(Q\) are already orthogonal, the projections onto them do not interact, so the inner products against all previous columns can be computed at once rather than sequentially:

\[q_2 q_2^T \left( q_1 q_1^T v \right) = q_2 \left( q_2^T q_1 \right) q_1^T v = 0\]
[11]:
# Let's form Q and R by making each column orthonormal to all previous columns
function gram_schmidt_classical(A)
    m, n = size(A)
    # Same setup as our naive version
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:, j]
        # Here we make the current column orthogonal to all previous columns at once
        R[1:j-1, j] = Q[:, 1:j-1]' * v
        v -= Q[:, 1:j-1] * R[1:j-1, j]
        # And normalize the result
        R[j, j] = norm(v)
        Q[:, j] = v / norm(v)
    end
    Q, R
end
[11]:
gram_schmidt_classical (generic function with 1 method)
[12]:
m = 20
x = LinRange(-1, 1, m)
A = vander(x, m)
Q, R = gram_schmidt_classical(A)

@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 1.5098459477938957
norm(Q * R - A) = 8.582718086704977e-16