2025-09-26 Householder QR

  • Right vs left-looking algorithms

  • Elementary reflectors

  • Householder QR

[1]:
using LinearAlgebra
using Plots
using Polynomials
default(lw=4, ms=5, legendfontsize=12, xtickfontsize=12, ytickfontsize=12)

# Here's our Vandermonde matrix again
function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    V = ones(m, k)
    for j in 2:k
        V[:, j] = V[:, j-1] .* x
    end
    V
end
[1]:
vander (generic function with 2 methods)

Gram-Schmidt orthogonalization

Suppose we’re given some vectors and want to find an orthogonal basis for their span.

\[\begin{split}A = \begin{split} \Bigg[ a_1 \Bigg| a_2 \Bigg] = \Bigg[ q_1 \Bigg| q_2 \Bigg] \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix} \end{split} = Q R\end{split}\]
[2]:
# Let's form Q and R by making each column orthonormal to all previous columns
function gram_schmidt_naive(A)
    m, n = size(A)
    Q = zeros(m, n) # Orthonormal basis
    R = zeros(n, n) # Upper triangular
    for j in 1:n
        v = A[:, j] # current column to normalize
        for k in 1:j-1
            # We do a projection against the previous columns of Q. Why?
            r = Q[:, k]' * v
            v -= Q[:, k] * r
            R[k, j] = r # What is this factor?
        end
        # Normalize and store our final v
        R[j, j] = norm(v)
        Q[:, j] = v / R[j, j]
    end
    Q, R
end
[2]:
gram_schmidt_naive (generic function with 1 method)

A variant with more parallelism

\[q_2 q_2^T \left( q_1 q_1^T v \right) = q_2 \left( q_2^T q_1 \right) q_1^T v = 0\]
[3]:
# Let's form Q and R by making each column orthonormal to all previous columns
function gram_schmidt_classical(A)
    m, n = size(A)
    # Same setup as our naive version
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[: ,j]
        # Here we make the current column othogonal to all previous
        R[1:j-1, j] = Q[:, 1:j-1]' * v
        v -= Q[:, 1:j-1] * R[1:j-1, j]
        # And normalize our result
        R[j, j] = norm(v)
        Q[:, j] = v / norm(v)
    end
    Q, R
end
[3]:
gram_schmidt_classical (generic function with 1 method)

Right-looking modified Gram-Schmidt

There are inherent data dependencies in this process.

\[\begin{split}\begin{split} \Bigg[ q_1 \Bigg| q_2 \Bigg| q_3 \Bigg| q_4 \Bigg| q_5 \Bigg] \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} & r_{15} \\ & r_{22} & r_{23} & r_{24} & r_{25} \\ & & r_{33} & r_{34} & r_{35} \\ & & & r_{44} & r_{45} \\ & & & & r_{55} \end{bmatrix} \end{split}\end{split}\]
[4]:
function gram_schmidt_modified(A)
    m, n = size(A)
    Q = copy(A)
    R = zeros(n, n)
    for j in 1:n
        # First we normalize our column
        R[j, j]  = norm(Q[:, j])
        Q[:, j] /= R[j, j]
        # Then we make all following columns orthogonal to the current
        R[j, j+1:end]  = Q[:, j]' * Q[:, j+1:end]
        Q[:, j+1:end] -= Q[:, j]  * R[j, j+1:end]'
    end
    Q, R
end
[4]:
gram_schmidt_modified (generic function with 1 method)

Classical vs modified

Classical

  • Really unstable, orthogonality error size of \(1 \gg \epsilon_\text{machine}\)

  • Don’t need to know all the vectors in advance

Modified

  • Needs to be right-looking for efficiency

  • Less unstable, but orthogonality error \(10^{-9} \gg \epsilon_\text{machine}\)

Householder QR

Gram-Schmidt constructed a triangular matrix \(R\) to orthogonalize \(A\) into \(Q\). Each step was a projector, which is a rank-deficient operation. Householder uses orthogonal transformations (reflectors) to triangularize.

\[\underbrace{Q_{n} \dotsb Q_1}_{Q^T} A = R\]

The structure of the algorithm is

\[\begin{split} \begin{split} \underbrace{\begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \\ * & * & * \\ * & * & * \\ \end{bmatrix}}_{A} \to \underbrace{\begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & * & * \\ 0 & * & * \\ 0 & * & * \\ \end{bmatrix}}_{Q_1 A} \to \underbrace{\begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & * \\ 0 & 0 & * \\ \end{bmatrix}}_{Q_2 Q_1 A} \to \underbrace{\begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix}}_{Q_3 Q_2 Q_1 A} \end{split}\end{split}\]

Constructing the \(Q_j\)

\[\underbrace{Q_n \dotsb Q_1}_{Q^T} A = R\]

Each of our \(Q_j\) will have the form

\[\begin{split}\begin{split}Q_j = \begin{bmatrix} I_j & 0 \\ 0 & F \end{bmatrix}\end{split}\end{split}\]

where \(F\) is a “reflection” that achieves

\[\begin{split}\begin{split} F x = \begin{bmatrix} \lVert x \rVert \\ 0 \\ 0 \\ \vdots \end{bmatrix} \end{split}\end{split}\]

where \(x\) is the column of \(R\) from the diagonal down. This transformation is a reflection across a plane with normal \(v = F x - x = \left\lvert \left\lvert x \right\rvert \right\rvert e_1 - x\).

image.png

The reflection, as depicted above by Trefethen and Bau (1999) can be written as \(F = I - 2 \frac{v v^T}{v^T v}\).

Reflection

[6]:
# Let's make any symmetric matrix
A = rand(4, 4); A += A'

# Now let's build v = |x| e_1 - x
v = copy(A[:,1])
v[1] -= norm(v)
v = normalize(v) # v /= norm(v)

# And build F to clear the first column of A below the diagonal
F = I - 2 * v * v'
B = F * A
display(B)
4×4 Matrix{Float64}:
  2.10041       1.5861     1.92938    1.9544
 -6.71674e-17   0.832466   0.493814  -0.0422542
  3.98231e-17  -0.0690713  0.505746   0.218325
 -3.4793e-17    0.507456   1.58059   -0.303831
[7]:
# Now let's keep going with column 2
v = copy(B[2:end, 2]) # Note - starting at 2, not the full column
v[1] -= norm(v);
v = normalize(v)

# Build and apply F to clear the second column of B
F = I - 2 * v * v'
B[2:end, 2:end] = F * B[2:end, 2:end]
display(B)
4×4 Matrix{Float64}:
  2.10041      1.5861        1.92938    1.9544
 -6.71674e-17  0.977385      1.2055    -0.209166
  3.98231e-17  0.0           0.844947   0.138772
 -3.4793e-17   5.55112e-17  -0.911459   0.280636

Householder algorithm

[8]:
function qr_householder_naive(A)
    m, n = size(A)
    R = copy(A)
    V = [] # list of reflectors
    for j in 1:n
        # Build v for the current column
        v = copy(R[j:end, j])
        v[1] -= norm(v)
        v = normalize(v)
        # Build and apply F
        R[j:end,j:end] -= 2 * v * (v' * R[j:end,j:end])
        # Keep track of the reflectors
        push!(V, v)
    end
    V, R
end
[8]:
qr_householder_naive (generic function with 1 method)
[9]:
# Let's pick on Vandermonde again
m = 4
x = LinRange(-1, 1, m)
A = vander(x, m)

# Factorize and check R
V, R = qr_householder_naive(A)
_, R = qr(A)
display(R)
4×4 Matrix{Float64}:
 -2.0  0.0      -1.11111      0.0
  0.0  1.49071   1.38778e-17  1.3582
  0.0  0.0       0.888889     9.71445e-17
  0.0  0.0       0.0          0.397523

But what about \(Q\)?

[10]:
# Here's a functino to apply the reflectors V to a vector x
function reflectors_mult(V, x)
    y = copy(x)
    for v in reverse(V)
        n = length(v) - 1
        # This is applying F as we did to form R
        y[end-n:end] -= 2 * v * (v' * y[end-n:end])
    end
    y
end

# Create a dense matrix Q from reflectors V
function reflectors_to_dense(V)
    m = length(V[1])
    Q = diagm(ones(m))
    # Apply the reflectors to the identity matrix to build Q
    for j in 1:m
        Q[:, j] = reflectors_mult(V, Q[:,j])
    end
    Q
end
[10]:
reflectors_to_dense (generic function with 1 method)
[11]:
# Again with Vandermonde as our example
m = 20
x = LinRange(-1, 1, m)
A = vander(x, m)

# Factorize
V, R = qr_householder_naive(A)
Q = reflectors_to_dense(V)

# And check
@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 4.043305005028868e-15
norm(Q * R - A) = 7.653110366995408e-15

The columns of Q retain orthogonality! This is better than we saw from our right looking \(QR\) last time.

An edge case

[12]:
# Should be trivial, since A == I
A = [1 0; 0 1.]
# But this result is not good
V, R = qr_householder_naive(A)
[12]:
(Any[[NaN, NaN], [NaN]], [NaN NaN; NaN NaN])

We have the lines

v = copy(R[j:end, j])
v[1] -= norm(v)
v = normalize(v)

What happens when \(R\) is already upper triangular?

image2.png

Improved Householder QR

Let’s fix that problem by selecting the reflector that moves \(x\) the larger distance.

[13]:
function qr_householder(A)
    m, n = size(A)
    R = copy(A)
    V = [] # list of reflectors
    for j in 1:n
        # Build v for the current column
        v = copy(R[j:end, j])
        v[1] += sign(v[1]) * norm(v) # Here's the fix!
        v = normalize(v)
        # Build and apply F
        R[j:end, j:end] -= 2 * v * (v' * R[j:end,j:end])
        # Keep track of the reflectors
        push!(V, v)
    end
    V, R
end
[13]:
qr_householder (generic function with 1 method)
[14]:
# Now this works as expected
A = [1 0; 0 1.]
V, R = qr_householder(A)
[14]:
(Any[[1.0, 0.0], [1.0]], [-1.0 0.0; 0.0 -1.0])
[15]:
# And with a more robust test
A  = [2 -1; -1 2] * 1e-10
A += [1 0; 0 1]

V, R = qr_householder(A)
tau = [2*v[1]^2 for v in V]
@show tau
V1 = [v ./ v[1] for v in V]
@show V1
println("R =")
display(R)
tau = [2.0, 2.0]
V1 = [[1.0, -4.999999999e-11], [1.0]]
R =
2×2 Matrix{Float64}:
 -1.0   2.0e-10
  0.0  -1.0

Householder \(QR\) is backward stable

[16]:
# Let's try to 'break' our QR with Vandermonde as before
m = 40
x = LinRange(-1, 1, m)
A = vander(x, m)

V, R = qr_householder(A)
Q = reflectors_to_dense(V)
@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 5.932687575393109e-15
norm(Q * R - A) = 6.6179593854314975e-15
[19]:
# And let's compare to Julia
A = [1 0; 0 1.]

# Ours
println("Our QR")
V, R = qr_householder(A)
println("R =")
display(R)
println("Q = ")
display(reflectors_to_dense(V))

# And Julia's
println("\nJulia's QR")
Q, R = qr(A)
println("R =")
display(R)
println("Q = ")
display(Matrix(Q))
Our QR
R =
2×2 Matrix{Float64}:
 -1.0   0.0
  0.0  -1.0
Q =
2×2 Matrix{Float64}:
 -1.0   0.0
  0.0  -1.0

Julia's QR
R =
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0
Q =
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

Orthogonality is preserved

[20]:
a = 20
x = LinRange(-1, 1, a)
A = vander(x)
Q, _ = gram_schmidt_classical(A)

v = A[:, end]
@show norm(v)
# Note, if v is actually orthogonal, these should be 0
scatter(abs.(Q[:, 1:end-1]' * v), yscale=:log10, xticks=range(1, a-1; step=2), label="\$v \\cdot Q_{:, j}\$")
norm(v) = 1.4245900685395503
[20]:
[21]:
V, R = qr_householder(A)
Q = reflectors_to_dense(V)

v = A[:, end]
@show norm(v)
# Note, if v is actually orthogonal, these should be 0
scatter(abs.(Q[:, 1:end-1]' * v), yscale=:log10, xticks=range(1, a-1; step=2), label="\$v \\cdot Q_{:, j}\$")
norm(v) = 1.4245900685395503
[21]:
[22]:
scatter(abs.(diag(R)), yscale=:log10, label="\$R_{j, j}\$")
[22]:
[23]:
v = Q[:, end]
@show norm(v)
# Note, if v is actually orthogonal, these should be 0
scatter(abs.(Q[:, 1:end-1]' * v), ylims=[1e-18, 1e-15], yticks=[1e-15, 1e-16, 1e-17, 1e-18], yscale=:log10, xticks=range(1, a-1; step=2), label="\$v \\cdot Q_{:, j}\$")
norm(v) = 1.0
Warning: Invalid negative or zero value 0.0 found at series index 1 for log10 based yscale
@ Plots ~/.julia/packages/Plots/xKhUG/src/utils.jl:106
Warning: Invalid negative or zero value 0.0 found at series index 1 for log10 based yscale
@ Plots ~/.julia/packages/Plots/xKhUG/src/utils.jl:106
[23]:
Warning: Invalid negative or zero value 0.0 found at series index 1 for log10 based yscale
@ Plots ~/.julia/packages/Plots/xKhUG/src/utils.jl:106