Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).
Make sure you fill in any place that says BEGIN SOLUTION and END SOLUTION, as well as your name and collaborators below:
[ ]:
NAME = ""
COLLABORATORS = ""
2025-10-17 Krylov Subspaces¶
In class, we discussed computing the \(QR\) factorization of a matrix \(A\) by applying an orthogonalization process, such as Gram-Schmidt, to its columns. However, this can become impractical if the matrix \(A\) is quite large, such as in linear systems of equations with billions of unknown values in a large-scale computational simulation.
There exists a family of iterative methods based upon the Krylov subspace. In this assignment, we will look at one of these Krylov subspace methods, the generalized minimal residual method (GMRES).
Suppose the matrix \(A\) is square. The Krylov subspace generated by the \(n \times n\) matrix \(A\) and the vector \(b\) of length \(n\) is defined as the linear subspace spanned by the images of \(b\) under the first \(k + 1\) powers of \(A\). That is to say,
\[
\mathcal{K}_{k + 1} \left( A, b \right) = \operatorname{span} \left\{ b, A b, A^2 b, \dots, A^k b \right\}.
\]
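For intuition, a naive (unorthogonalized) Krylov matrix can be built by repeatedly applying \(A\) to \(b\) and stacking the results as columns. This is only a sketch of the definition, not the method we will use below; the small \(A\) and \(b\) are made-up values for illustration.
[ ]:
# Naive illustration (not the method used below): stack b, A*b, A^2*b, ... as columns.
# The small A and b here are made-up values purely for illustration.
A = [2.0 1.0; 0.0 3.0]
b = [1.0, 1.0]
k = 2
K = zeros(length(b), k + 1)
K[:, 1] = b
for j in 2:k+1
    K[:, j] = A * K[:, j-1]   # image of b under the next power of A
end
K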
Building a Krylov subspace¶
Let’s start by comparing a Krylov subspace and the columns of a \(QR\) factorization for the Vandermonde matrix.
[ ]:
using LinearAlgebra
using Plots
using Test
# Here's our Vandermonde matrix again
function vander(x, k=nothing)
"""Return a Vandermonde matrix of the first k monomials evaluated at the points x.
The returned matrix will be of size m x k, where m is the length of the vector x.
If no k is provided, the returned matrix will be square.
"""
if isnothing(k)
k = length(x)
end
m = length(x)
V = ones(m, k)
for j in 2:k
V[:, j] = V[:, j-1] .* x
end
V
end
We’ll use this convenience function for visualizing the difference between the basis formed from the columns of \(A\) with \(QR\) factorization and the Krylov basis we’ll form below.
[ ]:
# Helper function to plot the columns of a matrix as basis functions
function plot_basis(x, A)
"""Plot a series of basis vectors stored in the columns of the matrix A
at the points given by x. The length of x and the number of rows of A
must be the same.
"""
_, m = size(A)
plt = plot(x, A[:, 1], label="\$q_0\$")
for i in 2:m
plt = plot!(x, A[:, i], label="\$q_$(i-1)\$")
end
display(plt)
end
# Plot the basis vectors from a QR factorization of the Vandermonde matrix
n = 15
x = LinRange(-1, 1, n)
Q, R = qr(vander(x))
plot_basis(x, Q[:, 1:4])
Krylov subspace methods are iterative methods used to solve linear systems of equations \(A x = b\). For now, we will focus on creating the Krylov subspace of degree \(k\) for the matrix \(A\).
The Krylov subspace is formed by the vectors
\[
b, \; A b, \; A^2 b, \; \dots, \; A^k b.
\]
However, we want an orthonormal basis. We will use Gram-Schmidt orthogonalization as part of the Arnoldi iteration to form this orthonormal basis.
That is to say, we will form the matrices \(Q\) and \(H\) such that
\(Q\) is \(n \times (k + 1)\) with the orthonormalized columns given by the Gram-Schmidt process on the vectors \(A^i b\)
\(H\) is the \(k \times k\) upper Hessenberg matrix holding the factors from the Gram-Schmidt process.
This definition of \(H\) may seem strange; however, it has a powerful application. The matrix \(H\) is constructed so that \(H = Q^T A Q\). This means that \(A Q_n = Q_{n + 1} \tilde{H}_n\), where \(Q_n\) is the orthonormal basis of the Krylov subspace with \(n\) vectors (columns), \(Q_{n + 1}\) is the basis of the Krylov subspace with \(n + 1\) vectors, and \(\tilde{H}_n\) is \(H\) augmented with an additional row.
Our function will return the matrix \(\tilde{H}\), which is the \((k + 1) \times k\) matrix holding the Gram-Schmidt factors \(\tilde{H}_{j, i-1} = q_j \cdot v\) on and above the diagonal and the norms \(\tilde{H}_{i, i-1} = \lvert \lvert v \rvert \rvert\) on the first subdiagonal.
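Written out, the Arnoldi relation and the shape of \(\tilde{H}\) look like the following; this is only a sketch of the structure implied by the description above:
\[
A Q_k = Q_{k + 1} \tilde{H}_k, \qquad
\tilde{H}_k =
\begin{bmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,k} \\
h_{2,1} & h_{2,2} & \cdots & h_{2,k} \\
        & h_{3,2} & \ddots & \vdots \\
        &         & \ddots & h_{k,k} \\
        &         &        & h_{k+1,k}
\end{bmatrix},
\]
where \(Q_k\) is \(n \times k\), \(Q_{k + 1}\) is \(n \times (k + 1)\), and \(\tilde{H}_k\) is \((k + 1) \times k\).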
[ ]:
function arnoldi(A, b, k=nothing)
"""Form the Krylov subspace from the initial vector b and the matrix A.
This function returns a pair of matrices Q, H.
The columns of Q are the orthonormalized basis vectors for the Krylov space.
H is upper Hessenberg and holds the normalization factors.
"""
# Get our dimensions for the subspace
n, _ = size(A)
if isnothing(k)
k = n
end
# Build a matrix to hold the subspace and initialize it
Q = zeros(n, k+1)
# Let's put our factors here, similar to QR
H = zeros(k+1, k)
r = norm(b)
Q[:, 1] = b / r
for i in 2:k+1
v = A * Q[:, i-1]
for j in 1:i
### BEGIN SOLUTION
### END SOLUTION
end
r = norm(v)
# We need to stop if the norm of v is small
# This means v ≈ 0
if r < 1e3 * eps()
break
end
Q[:, i] = v / r
H[i, i-1] = r
end
Q, H
end
# And let's look at what happens
n = 4
x = LinRange(-1, 1, n) # Need a square matrix
Q, H = arnoldi(vander(x), ones(n), 4)
@show Q[:, 1]
@show Q[:, 2]
println("H =")
display(H);
[ ]:
A = [2 1 0; 1 3 0; 0 0 4]
b = [1, 1, 1]
Q, H = arnoldi(A, b)
@test Q[:, end] ≈ zeros(3)
@test Q * Q' ≈ I
@test A * Q[:, 1:end-1] ≈ Q * H
Let’s plot the basis vectors from our Arnoldi iterations and see how they compare to the basis vectors from the \(QR\) factorization.
[ ]:
# Plot the basis vectors from the Arnoldi iterations for a Vandermonde matrix
n = 15
x = LinRange(-1, 1, n)
Q, H = arnoldi(vander(x), ones(n), 3)
plot_basis(x, Q)
This plot looks significantly different, telling us that we cannot interpret this basis in the same way as we did before. However, we can use the basis from the Arnoldi iteration to solve a linear system of equations \(A x = b\). We’ll come back to this, but first let’s pick a problem to solve.
We will solve the equation \(A p = y\), where \(A\) is the Vandermonde matrix at some points \(x\) and \(p\) represents the polynomial coefficients for the polynomial of degree \(3\) that fits the points \((x_i, y_i)\) given by the vectors \(x\) and \(y\).
First, let’s solve a system of equations with \(QR\) factorization as we did in class. With \(QR\) factorization, we form the orthonormal matrix \(Q\) and the upper triangular matrix \(R\) such that \(A = QR\). This means that \(Q R p = y\), or \(p = R^{-1} Q^{-1} y\). Since \(Q\) is orthonormal, \(Q^{-1} = Q^T\). Because \(R\) is upper triangular, the remaining system can be solved algorithmically by back substitution.
The matrix equation \(R x = b\) has the structure
\[
\begin{bmatrix}
R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\
        & R_{2,2} & \cdots & R_{2,n} \\
        &         & \ddots & \vdots \\
        &         &        & R_{n,n}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.
\]
We can therefore solve from the ‘bottom up’ to determine the entries of \(x\).
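As a concrete illustration of this bottom-up sweep (a worked form of the reasoning above, which the function below should implement), the last row gives \(x_n = b_n / R_{n, n}\), and each earlier row reuses the entries already computed:
\[
x_i = \frac{1}{R_{i, i}} \left( b_i - \sum_{j = i + 1}^{n} R_{i, j} x_j \right), \qquad i = n, n - 1, \dots, 1.
\]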
Complete the function below to solve a system of equations \(R x = b\), where \(R\) is an upper triangular matrix.
[ ]:
function solve_upper_triangular(R, b)
"""Solve the linear system of equations R x = b.
R is upper triangular.
This function returns the vector x such that R x = b.
"""
# We'll store the solution in x
n, _ = size(R)
x = zeros(n)
# And solve for each x_i starting with the last row
for i in n:-1:1 # Note the syntax here, how does i increment?
x_i = b[i]
# Solve the equation R_(i, :) * x_i = b_i for the current row
### BEGIN SOLUTION
### END SOLUTION
x[i] = x_i / R[i, i]
end
x
end
# Let's do a small test
R = [1. 2 3;
0 4 5;
0 0 6]
b = [1.; 1; 1]
@show x = solve_upper_triangular(R, b)
@show R * x - b;
[ ]:
# 'normal' test case
R = [1. 2 3 4 5;
0 6 7 8 9;
0 0 8 7 6;
0 0 0 5 4;
0 0 0 0 3]
b = [1.; 2; 3; 4; 5]
x = solve_upper_triangular(R, b)
@test R * x - b ≈ zeros(5)
# and a trivial test case
R = [1. 0;
0 1]
b = [1.; 2]
x = solve_upper_triangular(R, b)
@test R * x - b ≈ zeros(2)
Now we can use your function solve_upper_triangular() to solve the linear system of equations \(A p = y_1\) given below.
[ ]:
# We want A p = y1
# Note - using x1 and y1 to distinguish from the plotting points x below
x1 = [-0.9, 0.1, 0.5, 0.8]
y1 = [ 1.0, 2.4, -0.2, 1.3]
A = vander(x1)
# Form the basis of A via QR factorization and solve for p
Q, R = qr(A)
p = solve_upper_triangular(R, Q' * y1)
@show p_qr = p
# And plot
k = 4
n = 50
x = LinRange(-1, 1, n)
scatter(x1, y1, label="data")
plot!(x, vander(x, k) * p, label="\$p = R^{-1} Q^T b\$")
We can also use the Krylov subspace from our Arnoldi iteration to solve a linear system of equations \(A x = b\). Several Krylov subspace methods have been developed; however, we will use the generalized minimal residual method (GMRES).
Given the basis \(Q_n\) from the Arnoldi process, we want to find the coefficients \(c\) to form the nearest vector in the Krylov subspace to the solution, which is the vector \(x_n = x_0 + Q_n c_n\) such that the residual \(\left\lvert \left\lvert r_n \right\rvert \right\rvert = \left\lvert \left\lvert b - A x_n \right\rvert \right\rvert\) is minimized. For simplicity, we will use \(x_0 = 0\).
Simplifying our expression, we have
\[
\left\lvert \left\lvert r_n \right\rvert \right\rvert = \left\lvert \left\lvert b - A Q_n c_n \right\rvert \right\rvert = \left\lvert \left\lvert \beta e_1 - \tilde{H}_n c_n \right\rvert \right\rvert,
\]
where \(\beta = \left\lvert \left\lvert b \right\rvert \right\rvert\) and \(e_1 = \left[ 1, 0, 0, \dots, 0 \right]\).
We therefore want to find the solution to the least squares problem \(\min_{c_n} \left\lvert \left\lvert \beta e_1 - \tilde{H}_n c_n \right\rvert \right\rvert\). We can then use our vector \(c_n\) to form \(x_n = x_0 + Q_n c_n = Q_n c_n\) (because we assumed \(x_0 = 0\)).
Note that we could create a function solve_upper_hessenberg(), but we will forgo that detail and focus on using our Krylov subspace. We can use Julia's backslash operator \ to solve this least squares problem for us.
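For reference, here is a minimal sketch of how the backslash operator solves a small overdetermined system in the least squares sense. The names H_demo, β_demo, and e1_demo and their values are made up for illustration; they are not the variables used in the cells below.
[ ]:
# Hedged sketch: Julia's backslash returns the least squares solution of a tall system.
# H_demo, β_demo, and e1_demo are made-up names and values for illustration only.
using LinearAlgebra
H_demo = [2.0 1.0; 1.0 3.0; 0.0 0.5]   # a small (k+1) × k upper Hessenberg matrix, k = 2
β_demo = 2.0
e1_demo = [1.0, 0.0, 0.0]
c_demo = H_demo \ (β_demo * e1_demo)    # minimizes the norm of β*e1 - H*c
@show norm(β_demo * e1_demo - H_demo * c_demo);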
[ ]:
# We want A p = y1
# Again using x1 and y1 to distinguish x1 from the plotting points x below
x1 = [-0.9, 0.1, 0.5, 0.8]
y1 = [ 1.0, 2.4, -0.2, 1.3]
A = vander(x1)
# Form the Krylov subspace
k = 4
Q, H = arnoldi(A, y1, k)
# Solve the least squares problem
β = norm(y1)
e1 = zeros(k+1)
e1[1] = 1
### BEGIN SOLUTION
### END SOLUTION
p = Q[:, 1:k] * c
@show p_gmres = p
@show norm(p_gmres - p_qr)
# And plot
n = 50
x = LinRange(-1, 1, n)
scatter(x1, y1, label="data")
plot!(x, vander(x, k) * p, label="\$p = Q H^{-1} e_1\$")
It should reassure us greatly that we appear to have reached the same solution.
It may seem like we did more work to set up a problem that is more difficult to solve than using the \(QR\) factorization. While this may be true for small systems of equations, the true power of Krylov subspace methods is the fact that they are iterative methods. This offers two important benefits.
For large systems of equations (millions or billions of values in the solution vector), we do not need to form the entire full-rank Krylov subspace to achieve a solution \(x_n\) with a low residual \(r_n\).
The assembled matrix \(A\) is not required, only the ability to apply the matrix-vector product \(A x\). This enables matrix-free methods and Newton-Krylov methods, which are workhorses in large scale computational simulations, such as on supercomputers.
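To make the second point concrete, here is a minimal sketch of the matrix-free idea: only the action \(x \mapsto A x\) is ever needed. The function apply_A is a hypothetical stand-in, and the arnoldi() function above would need minor changes (e.g. passing \(n\) explicitly) before it could accept a function in place of an assembled matrix.
[ ]:
# Hedged sketch of the matrix-free idea: only the matrix-vector product is provided.
# apply_A is a hypothetical stand-in computing the same action as A = [2 1 0; 1 3 0; 0 0 4].
apply_A(x) = [2x[1] + x[2], x[1] + 3x[2], 4x[3]]
b = [1.0, 1.0, 1.0]
v = apply_A(b)   # one Arnoldi step only ever needs this product, never the assembled matrix
@show v;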
Let’s implement the ability to expand our Krylov subspace so we can see the benefit of the first point.
[ ]:
function arnoldi_update(A, Q, H)
"""Update the Krylov subspace from the initial vector b and the matrix A.
This function returns an updated pair of matrices Q, H.
Q has a new column for the next orthonormal vector and H has the updated factors.
"""
# Get our dimensions for the subspace
n, k = size(Q)
# Build a new Q
# This is inefficient but shows the idea
Q_new = zeros(n, k+1)
Q_new[:, 1:k] = Q
# And a new H
H_new = zeros(k+1, k)
H_new[1:k, 1:k-1] = H
# Compute new Krylov vector
v = A * Q_new[:, k]
for j in 1:k
### BEGIN SOLUTION
### END SOLUTION
end
r = norm(v)
# We need to stop if the norm of v is small
# This means v ≈ 0
if r > 1e3 * eps()
Q_new[:, k+1] = v / r
H_new[k+1, k] = r
end
Q_new, H_new
end
# And let's look at what happens
n = 4
x = LinRange(-1, 1, n) # Need a square matrix
A = vander(x)
b = ones(n)
# Let's compare the old
Q, H = arnoldi(A, b, 4)
println("arnoldi(A, b, 4)")
@show Q[:, 4]
println("H =")
display(H);
# With the new
Q, H = arnoldi(A, ones(n), 3)
Q, H = arnoldi_update(A, Q, H)
println("\narnoldi(A, b, 3) and arnoldi_update(A, Q, H)")
@show Q[:, 4]
println("H =")
display(H);
[ ]:
A = [2 1 0; 1 3 0; 0 0 4]
b = [1, 1, 1]
Q, H = arnoldi(A, b, 2)
Q, H = arnoldi_update(A, Q, H)
@test Q[:, end] ≈ zeros(3)
@test Q * Q' ≈ I
@test A * Q[:, 1:end-1] ≈ Q * H
We can use our iterative updates above to provide refined solutions to our problem \(A x = b\) until we hit a target tolerance. In the code below, fill in the missing pieces to see the convergence of GMRES as we try to fit a polynomial to datapoints from the function \(f \left( x \right) = \sin \left( \pi x \right) e^x\).
[ ]:
# We want A p = y1
# Again using x1 and y1 to distinguish x1 from the plotting points x below
k_max = 65
x1 = [0.0, -0.0243502926634244, 0.0243502926634244, -0.072993121787799, 0.072993121787799, -0.121462819296121, 0.121462819296121, -0.169644420423993, 0.169644420423993, -0.217423643740007, 0.217423643740007, -0.264687162208767, 0.264687162208767, -0.311322871990211, 0.311322871990211, -0.357220158337668, 0.357220158337668, -0.402270157963992, 0.402270157963992, -0.446366017253464, 0.446366017253464, -0.489403145707053, 0.489403145707053, -0.531279464019895, 0.531279464019895, -0.571895646202634, 0.571895646202634, -0.611155355172393, 0.611155355172393, -0.648965471254657, 0.648965471254657, -0.685236313054233, 0.685236313054233, -0.719881850171611, 0.719881850171611, -0.752819907260532, 0.752819907260532, -0.783972358943341, 0.783972358943341, -0.813265315122798, 0.813265315122798, -0.84062929625258, 0.84062929625258, -0.865999398154093, 0.865999398154093, -0.889315445995114, 0.889315445995114, -0.910522137078503, 0.910522137078503, -0.92956917213194, 0.92956917213194, -0.946411374858403, 0.946411374858403, -0.961008799652054, 0.961008799652054, -0.973326827789911, 0.973326827789911, -0.983336253884626, 0.983336253884626, -0.991013371476744, 0.991013371476744, -0.996340116771955, 0.996340116771955, -0.999305041735772, 0.999305041735772]
y1 = sin.(π * x1) .* exp.(x1)
A = vander(x1)
n = 50
x_plt = LinRange(-1, 1, n)
A_plt = vander(x_plt, k_max)
plt = scatter(x1, y1, label="data")
# Form the initial Krylov subspace
k = 1
Q, H = arnoldi(A, y1, k)
# Solve the least squares problem
β = norm(y1)
e1 = zeros(k+1)
e1[1] = 1
### BEGIN SOLUTION
### END SOLUTION
pn = Q[:, 1:k] * c
plt = plot!(x_plt, A_plt * pn, label="\$k = 1\$")
# And let's track the norm of the residuals
r = zeros(k_max)
r[k] = norm(y1 - A * pn)
# And see how well our solution matches as we increase k
for k = 2:k_max
# Update the Krylov subspace
Q, H = arnoldi_update(A, Q, H)
# Solve the least squares problem
β = norm(y1)
e1 = zeros(k+1)
e1[1] = 1
### BEGIN SOLUTION
### END SOLUTION
pn = Q[:, 1:k] * c
if k % 16 == 1
plt = plot!(x_plt, A_plt * pn, label="\$k = $k\$")
end
# And calculate the latest residual
r[k] = norm(y1 - A * pn)
end
display(plt)
Finally, let’s look at the residual \(b - A x_n\) (in this case \(y - A p_n\)) as the number of vectors in our Krylov subspace increases. We are often interested in the normalized residual, so normalize by \(\left\lvert \left\lvert b \right\rvert \right\rvert\) and plot the residuals on a log-linear scale (log scale in \(y\), linear scale in \(x\)).
Discuss the convergence. Is there a value \(n < 65\) for which the solution \(x_n\) might be acceptable?
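As a reminder of the plotting mechanics only (not the answer to the question above), a log-linear plot in Plots.jl can be produced with the yscale=:log10 attribute; the dummy_residuals values below are made up for illustration.
[ ]:
# Hedged sketch: plotting made-up residuals on a log-linear scale with Plots.jl.
using Plots
dummy_residuals = [10.0^(-j) for j in 1:10]   # placeholder data, not the assignment's r
plot(1:10, dummy_residuals, yscale=:log10, xlabel="k", ylabel="normalized residual", label="example")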
[ ]:
### BEGIN SOLUTION
### END SOLUTION
BEGIN SOLUTION¶
END SOLUTION¶
Exploration¶
Select at least one of these exploration problems.
GMRES is not the only Krylov subspace method. Select another Krylov subspace method, such as Conjugate Gradient, and compare it with GMRES. Compare the assumptions required for the matrix \(A\) (symmetric, positive definite, etc.), defining any relevant terms. Discuss the trade-offs between GMRES and the method you picked. Provide links to at least 2 sources you found (yes, one can be Wikipedia!).
Select a software package that uses Krylov methods. Provide a link to this package. Summarize which Krylov methods this package uses. Try to determine why the developers picked the Krylov method they did.