2025-10-01 QR Retrospective¶
Householder \(QR\)
Comparison of interfaces
Profiling
Cholesky \(QR\)
[1]:
using LinearAlgebra
using Plots
using Polynomials
default(lw=4, ms=5, legendfontsize=12, xtickfontsize=12, ytickfontsize=12)
# Here's our Vandermonde matrix again
function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    V = ones(m, k)
    for j in 2:k
        V[:, j] = V[:, j-1] .* x
    end
    V
end
[1]:
vander (generic function with 2 methods)
Gram-Schmidt orthogonalization¶
Suppose we’re given some vectors and want to find an orthogonal basis for their span.
[2]:
# Let's form Q and R by making each column orthonormal to all previous columns
function gram_schmidt_naive(A)
    m, n = size(A)
    Q = zeros(m, n) # Orthonormal basis
    R = zeros(n, n) # Upper triangular
    for j in 1:n
        v = A[:, j] # current column to normalize
        for k in 1:j-1
            # We do a projection against the previous columns of Q. Why?
            r = Q[:, k]' * v
            v -= Q[:, k] * r
            R[k, j] = r # What is this factor?
        end
        # Normalize and store our final v
        R[j, j] = norm(v)
        Q[:, j] = v / R[j, j]
    end
    Q, R
end
[2]:
gram_schmidt_naive (generic function with 1 method)
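A quick usage sketch (not one of the original cells): on a small, well-conditioned random matrix, the naive version already produces an orthonormal \(Q\) with \(Q R = A\).
B = randn(6, 4)              # hypothetical small test matrix
Q, R = gram_schmidt_naive(B)
@show norm(Q' * Q - I)       # near machine epsilon for a well-conditioned B
@show norm(Q * R - B)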
A variant with more parallelism¶
[3]:
# Let's form Q and R by making each column orthonormal to all previous columns
function gram_schmidt_classical(A)
    m, n = size(A)
    # Same setup as our naive version
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:, j]
        # Here we make the current column orthogonal to all previous columns
        R[1:j-1, j] = Q[:, 1:j-1]' * v
        v -= Q[:, 1:j-1] * R[1:j-1, j]
        # And normalize our result
        R[j, j] = norm(v)
        Q[:, j] = v / norm(v)
    end
    Q, R
end
[3]:
gram_schmidt_classical (generic function with 1 method)
Right-looking modified Gram-Schmidt¶
There are inherent data dependencies in this process.
[4]:
function gram_schmidt_modified(A)
    m, n = size(A)
    Q = copy(A)
    R = zeros(n, n)
    for j in 1:n
        # First we normalize our column
        R[j, j] = norm(Q[:, j])
        Q[:, j] /= R[j, j]
        # Then we make all following columns orthogonal to the current
        R[j, j+1:end] = Q[:, j]' * Q[:, j+1:end]
        Q[:, j+1:end] -= Q[:, j] * R[j, j+1:end]'
    end
    Q, R
end
[4]:
gram_schmidt_modified (generic function with 1 method)
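A comparison not in the original cells, as a sketch of why the right-looking (modified) variant matters numerically: check the orthogonality of \(Q\) from the classical and modified variants on a Vandermonde matrix.
x = LinRange(-1, 1, 20)
A = vander(x)
Qc, _ = gram_schmidt_classical(A)
Qm, _ = gram_schmidt_modified(A)
@show norm(Qc' * Qc - I)   # classical: large loss of orthogonality
@show norm(Qm' * Qm - I)   # modified: much smaller, though not machine epsilon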
Householder QR¶
Gram-Schmidt constructed a triangular matrix \(R\) to orthogonalize \(A\) into \(Q\). Each step was a projector, which is a rank-deficient operation. Householder uses orthogonal transformations (reflectors) to triangularize.
The structure of the algorithm is
\[
\underbrace{Q_n \dotsb Q_2 Q_1}_{Q^T} A = R ,
\]
where each \(Q_j\) is orthogonal and zeros out the entries below the diagonal in column \(j\).
Constructing the \(Q_j\)¶
Each of our \(Q_j\) will have the form
\[
Q_j = \begin{bmatrix} I_{j-1} & 0 \\ 0 & F \end{bmatrix}
\]
where \(F\) is a “reflection” that achieves
\[
F x = \begin{bmatrix} \lVert x \rVert \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]
where \(x\) is the column of \(R\) from the diagonal down. This transformation is a reflection across a plane with normal \(v = F x - x = \lVert x \rVert e_1 - x\).
The reflection, as depicted by Trefethen and Bau (1999), can be written as \(F = I - 2 \frac{v v^T}{v^T v}\). We want the reflector that moves \(x\) the largest distance; that is why the code below reflects \(x\) onto \(-\operatorname{sign}(x_1) \lVert x \rVert e_1\) rather than \(+\lVert x \rVert e_1\).
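A small check of the reflection property (a sketch, not one of the original cells): build \(F\) for a test vector and verify that it sends \(x\) to a multiple of \(e_1\).
x = [0.9, 0.1, 0.05]                 # hypothetical test vector
v = copy(x)
v[1] += sign(v[1]) * norm(x)         # same sign choice as qr_householder below
v = normalize(v)
F = I - 2 * v * v'
@show F * x                          # ≈ [-norm(x), 0, 0]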
[5]:
function qr_householder(A)
    m, n = size(A)
    R = copy(A)
    V = [] # list of reflectors
    for j in 1:n
        # Build v for the current column
        v = copy(R[j:end, j])
        v[1] += sign(v[1]) * norm(v) # Here's the fix! (the sign choice avoids cancellation)
        v = normalize(v)
        # Build and apply F
        R[j:end, j:end] -= 2 * v * (v' * R[j:end, j:end])
        # Keep track of the reflectors
        push!(V, v)
    end
    V, R
end
[5]:
qr_householder (generic function with 1 method)
Remember these helpers to convert \(V\) to \(Q\)
[6]:
# Here's a function to apply the reflectors V to a vector x
function reflectors_mult(V, x)
    y = copy(x)
    for v in reverse(V)
        n = length(v) - 1
        # This is applying F as we did to form R
        y[end-n:end] -= 2 * v * (v' * y[end-n:end])
    end
    y
end

# Create a dense matrix Q from reflectors V
function reflectors_to_dense(V)
    m = length(V[1])
    Q = diagm(ones(m))
    # Apply the reflectors to the identity matrix to build Q
    for j in 1:m
        Q[:, j] = reflectors_mult(V, Q[:, j])
    end
    Q
end
[6]:
reflectors_to_dense (generic function with 1 method)
[7]:
# Let's try it out
A = [2 -1; -1 2] * 1e-10
V, R = qr_householder(A)
tau = [2*v[1]^2 for v in V]
@show tau
V1 = [v ./ v[1] for v in V]
@show V1
println("R =")
display(R)
tau = [1.894427190999916, 2.0]
V1 = [[1.0, -0.2360679774997897], [1.0]]
R =
2×2 Matrix{Float64}:
-2.23607e-10 1.78885e-10
0.0 -1.34164e-10
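The quantities tau and V1 above follow a common storage convention for reflectors (the one used by LAPACK): rescale each reflector so its first entry is 1 and keep the scalar \(\tau = 2 v_1^2\) separately, so that \(F = I - \tau u u^T\). A quick check of that identity, assuming V, V1, and tau from the cell above are still in scope:
u, v = V1[1], V[1]
@show norm((I - tau[1] * u * u') - (I - 2 * v * v'))   # the two forms of F agree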
Householder is backward stable¶
We have finally fixed our loss of orthogonality problem!
[8]:
# Let's try an example with Vandermonde
m = 40
x = LinRange(-1, 1, m)
A = vander(x, m)
V, R = qr_householder(A)
Q = reflectors_to_dense(V)
@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 5.932687575393109e-15
norm(Q * R - A) = 6.6179593854314975e-15
[9]:
a = 20
x = LinRange(-1, 1, a)
A = vander(x)
Q, _ = gram_schmidt_classical(A)
v = A[:, end]
@show norm(v)
# Note: if v were orthogonal to the previous columns of Q, these would all be 0
scatter(abs.(Q[:, 1:end-1]' * v), yscale=:log10, xticks=range(1, a-1; step=2), label="\$v \\cdot Q_{:, j}\$")
norm(v) = 1.4245900685395503
[9]:
[10]:
V, R = qr_householder(A)
Q = reflectors_to_dense(V)
v = A[:, end]
@show norm(v)
# Note: if v were orthogonal to the previous columns of Q, these would all be 0
scatter(abs.(Q[:, 1:end-1]' * v), yscale=:log10, xticks=range(1, a-1; step=2), label="\$v \\cdot Q_{:, j}\$")
norm(v) = 1.4245900685395503
[10]:
Composition of reflectors¶
A product of reflectors can itself be written in a compact blocked form, e.g. for two reflectors
\[
(I - 2 v v^T)(I - 2 w w^T) = I - \begin{bmatrix} v & w \end{bmatrix} \begin{bmatrix} 2 & -4 v^T w \\ 0 & 2 \end{bmatrix} \begin{bmatrix} v^T \\ w^T \end{bmatrix} .
\]
This turns applying reflectors from a sequence of vector operations into a sequence of (smallish) matrix operations. It’s the key to high performance and the native format (QRCompactWY) returned by Julia’s qr().
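A small numerical check of the composition identity above (a sketch with hypothetical vectors, not one of the original cells):
v = normalize(randn(5)); w = normalize(randn(5))
F1 = I - 2 * v * v'
F2 = I - 2 * w * w'
T = [2 (-4 * (v' * w)); 0 2]                     # small triangular coupling factor
@show norm(F1 * F2 - (I - [v w] * T * [v w]'))   # ≈ machine epsilon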
[11]:
Q, R = qr(A)
[11]:
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor: 20×20 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}
R factor:
20×20 Matrix{Float64}:
-4.47214 0.0 -1.64763 0.0 … -0.514468 2.22045e-16
0.0 2.71448 1.11022e-16 1.79412 -2.22045e-16 0.823354
0.0 0.0 -1.46813 -4.16334e-17 -0.944961 -1.68268e-16
0.0 0.0 0.0 -0.774796 -7.26415e-17 -0.913056
0.0 0.0 0.0 0.0 0.797217 -2.39731e-16
0.0 0.0 0.0 0.0 … -1.30513e-16 0.637796
0.0 0.0 0.0 0.0 -0.455484 -5.47058e-16
0.0 0.0 0.0 0.0 3.85024e-17 -0.313652
0.0 0.0 0.0 0.0 -0.183132 1.28256e-15
0.0 0.0 0.0 0.0 -2.51101e-16 0.109523
0.0 0.0 0.0 0.0 … 0.0510878 1.68268e-15
0.0 0.0 0.0 0.0 -8.39606e-16 0.0264553
0.0 0.0 0.0 0.0 -0.0094344 1.00961e-15
0.0 0.0 0.0 0.0 -4.68375e-16 0.00417208
0.0 0.0 0.0 0.0 0.0010525 -6.32307e-16
0.0 0.0 0.0 0.0 … -1.40892e-15 -0.000385264
0.0 0.0 0.0 0.0 -5.9057e-5 2.88344e-16
0.0 0.0 0.0 0.0 -4.61672e-16 -1.66202e-5
0.0 0.0 0.0 0.0 -1.04299e-6 -9.8606e-17
0.0 0.0 0.0 0.0 0.0 1.71467e-7
Nonsquare matrices¶
\(QR\) factorization also works for very nonsquare matrices.
[12]:
# Much, much 'longer' than 'wide'
A = rand(1000000, 5)
Q, R = qr(A)
@show size(Q)
@show norm(Q*R - A)
R
size(Q) = (1000000, 1000000)
norm(Q * R - A) = 3.538426971490534e-12
[12]:
5×5 Matrix{Float64}:
-577.089 -432.471 -432.415 -433.388 -432.676
0.0 -381.619 -162.868 -163.531 -163.242
0.0 0.0 345.058 103.666 103.216
0.0 0.0 0.0 329.259 76.1959
0.0 0.0 0.0 0.0 -320.434
This is a “full” (or “complete”) \(QR\) factorization, in contrast to the reduced \(QR\) factorization in which \(Q\) has the same shape as \(A\).
How much memory does \(Q\) use?
You can compare with numpy.linalg.qr: https://numpy.org/doc/stable/reference/generated/numpy.linalg.qr.html
You need to decide up-front whether you want the full or reduced \(QR\).
Full \(QR\) is expensive to represent explicitly.
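As a sketch of the distinction (not one of the original cells): in Julia, the reduced \(Q\) can be materialized with Matrix(Q), while the full \(Q\) above is only stored implicitly as reflectors, so it uses far less memory than a dense 1000000×1000000 matrix would.
Qthin = Matrix(Q)            # reduced (thin) Q: a 1000000×5 dense matrix
@show size(Qthin)
@show norm(Qthin * R - A)    # the reduced factorization still reproduces A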
Cholesky \(QR\)¶
The Cholesky decomposition takes a Hermitian, positive-definite matrix and creates the lower triangular matrix \(L\) such that \(A = L L^T\) (more formally \(A = L L^*\) where \(L^*\) is the conjugate transpose, but we will stick to the real case here).
Consider the product \(A^T A\). Using the \(QR\) factorization, we have
\[
A^T A = (Q R)^T (Q R) = R^T Q^T Q R = R^T R ,
\]
so we should be able to use the Cholesky factorization \(A^T A = L L^T = R^T R\) to find \(R\) and then compute \(Q\) as \(Q = A R^{-1} = A L^{-T}\).
[13]:
function qr_chol(A)
    R = cholesky(A' * A).U
    Q = A / R
    Q, R
end
A = rand(10, 4)
Q, R = qr_chol(A)
@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 1.3680071283797385e-15
norm(Q * R - A) = 2.699098171834165e-16
[14]:
# But let's pick on Vandermonde again
x = LinRange(-1, 1, 15)
A = vander(x)
Q, R = qr_chol(A)
@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 3.960393895764547e-5
norm(Q * R - A) = 5.044150263130546e-16
We again see our loss of orthogonality.
Repairing Cholesky \(QR\)¶
Note that the product of two upper triangular matrices is upper triangular.
[15]:
R = triu(rand(5,5))
R * R
[15]:
5×5 Matrix{Float64}:
0.622166 0.21456 0.613734 0.597838 0.964626
0.0 0.0185025 0.343078 0.0464937 0.238227
0.0 0.0 0.487927 0.109346 0.147878
0.0 0.0 0.0 0.376023 0.0595793
0.0 0.0 0.0 0.0 0.0054071
Let’s apply this Cholesky process twice and multiply our two \(R\) matrices together.
[16]:
function qr_chol2(A)
    Q, R = qr_chol(A)  # QR factorization
    Q, R1 = qr_chol(Q) # And again
    Q, R1 * R
end
x = LinRange(-1, 1, 15)
A = vander(x)
Q, R = qr_chol2(A)
@show norm(Q' * Q - I)
@show norm(Q * R - A);
norm(Q' * Q - I) = 1.2262696169886244e-15
norm(Q * R - A) = 9.209900230881433e-16
Profiling investigation¶
How fast are these methods?
[17]:
m, n = 5000, 2000
A = randn(m, n)
@time qr(A);
0.243139 seconds (9 allocations: 77.396 MiB, 1.57% gc time)
[18]:
using Pkg
pkg"add ProfileSVG"
using ProfileSVG
@profview qr(A) # Note that memory operations (allocation and copying) are expensive!
Resolving package versions...
No Changes to `~/.julia/environments/v1.11/Project.toml`
No Changes to `~/.julia/environments/v1.11/Manifest.toml`
[18]:
Profiling exploration¶
Compare the profiles for the different implementations of \(QR\) we have seen in class. What stands out as expensive?
(Note: these profiling runs can take a while, so you may want to comment them out.)
[19]:
# We can adjust profiling settings
using Profile
Profile.init() # returns the current settings
Profile.init(n = 10^5, delay = 0.001)
[20]:
# Let's use a smaller matrix
m, n = 5000, 200
A = randn(m, n);
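Before profiling, a rough timing comparison on this smaller problem can be useful (a sketch; results are machine-dependent, and each function should be run once first so compilation time is excluded):
@time gram_schmidt_modified(A);
@time qr_householder(A);
@time qr_chol2(A);
@time qr(A);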
[21]:
gram_schmidt_modified(A) # Run once to precompile
@profview gram_schmidt_modified(A)
[21]:
[22]:
qr_householder(A)
@profview qr_householder(A)
[22]:
[23]:
qr_chol2(A)
@profview qr_chol2(A)
[23]: