Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).

Make sure you fill in any place that says SOLUTION BEGIN and SOLUTION END, as well as your name and collaborators below:

[ ]:
NAME = ""
COLLABORATORS = ""

2025-10-03 Orthogonal Polynomials

In this activity, we will construct polynomials using orthogonality with respect to an integral inner product. For example, given functions \(u \left( x \right)\) and \(v \left( x \right)\), we define their inner product to be

\[\left\langle u, v \right\rangle = \int_{-1}^{1} u \left( x \right) v \left( x \right) dx\]

where we use the notation \(\left\langle u, v \right\rangle\) to distinguish this continuous inner product from the discrete inner product \(u^T v\) of vectors.

This is known as the \(L^2\) inner product on the domain \(\left( -1, 1 \right)\), which is a Hilbert space and an \(L^p\) space (https://en.wikipedia.org/wiki/Lp_space). The study of such generalizations of linear algebraic concepts is called Functional Analysis, an advanced mathematical topic used throughout numerical analysis. We will only need the above definition in this activity.

The inner product induces a norm,

\[\left\lVert u \right\rVert = \sqrt{\left\langle u, u \right\rangle}\]

The inner product and norm satisfy the usual properties from our Linear Algebra notebook.

Working with polynomials

We’ll develop functions to evaluate, multiply, and integrate polynomials expressed as vectors of coefficients.

[ ]:
display("text/latex", "\$x^2\$")
[ ]:
using LinearAlgebra
using Plots
using Test

function polystring(p)
    """Construct a string representation of the polynomial.

    You don't need to understand this function; it is only
    used to annotate plots and clearly view polynomials.
    """
    join((a == 1 ? "x^$(i-1)" : "$a x^$(i-1)" for (i, a) in enumerate(p) if a ≠ 0), " + ")
end

polystring([1 4 0 6])
[ ]:
function polydisp(name, p)
    """Display the polynomial with friendly math formatting."""

    s = join(('$', "$name(x) = ", polystring(p), '$'))
    display("text/latex", s)
end

polydisp("p", [1 4 0 6])
[ ]:
function polyval(p, x)
    """Evaluate function described by coefficients p at point x.

    This uses Horner's rule, which evaluates a0 + a1 x + a2 x^2 + a3 x^3 as
    a0 + x * (a1 + x * (a2 + x * (a3))).
    """
    f = one(x) * p[end]
    for a in reverse(p[1:end-1])
        f = a .+ x .* f
    end
    f
end

@test polyval([1 4 0 6], π) ≈ 1 + 4π + 6π^3
[ ]:
function polyplot(p; label=nothing)
    """Plot a polynomial p(x) defined by its coefficients
      p[1] + p[2]*x^1 + ... + p[n]*x^(n-1)
    where n = length(p).  We evaluate the polynomial using
    our polyval.
    """
    if isnothing(label)
        label = polystring(p)
    end
    plot!(x -> polyval(p, x), xlims=(-1, 1), label=join(('$', label, '$')))
end

plot()
polyplot([1 4 0 6])

Polynomial manipulation functions such as polyint are \(\mathcal{O} \left( n \right)\) (linear time) operations on lists or arrays. We’ll start by implementing polyint, which computes the indefinite integral

\[P \left( x \right) = \int p \left( x \right) dx\]

i.e., \(P' \left( x \right) = p \left( x \right)\). Note that the antiderivative \(P \left( x \right)\) is a polynomial determined only up to an integration constant, which you may choose arbitrarily.

As an example,

\[\int \left( 1 + 4 x + 6 x^3 \right) dx = C + x + 2x^2 + \frac{6}{4} x^4\]

where \(C\) can be any value, so we should have

polyint([1 4 0 6]) == [0 1 2 0 6/4]
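
If you’re stuck, here is one possible shape of the solution, written as a separate hypothetical helper (polyint_sketch is our name, not part of the assignment): integrate term by term using \(\int a x^{k} dx = \frac{a}{k+1} x^{k+1}\).

[ ]:
function polyint_sketch(p)
    """A sketch of one valid approach; your polyint may differ.
    The integration constant is chosen to be zero.
    """
    p = reshape(p, :)
    pint = zeros(length(p) + 1)
    for (i, a) in enumerate(p)
        pint[i+1] = a / i # p[i] multiplies x^(i-1), and ∫ a x^(i-1) dx = (a/i) x^i
    end
    reshape(pint, 1, :)
end

polyint_sketch([1 4 0 6]) # expect [0 1 2 0 1.5]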

[ ]:
function polyint(p)
    """Without using library functions, compute an indefinite integral
    of p(x) expressed as a polynomial of degree 1 higher than the
    degree of p.  The constant of integration may be chosen arbitrarily.
    """
    p = reshape(p, :) # Convert from row or column vector into list
    pint = zeros(length(p) + 1) # Array to hold the indefinite integral
    ### SOLUTION BEGIN

    ### SOLUTION END
    # return a row vector (for easier viewing)
    reshape(pint, 1, :)
end

p = [1 4 0 6]
polydisp('p', p)
P = polyint(p)
polydisp('P', P)
[ ]:
@test polyint([1 1]) ≈ [0 1 .5]
@test polyint([1 4 0 6]) ≈ [0 1 2 0 6/4]
@test polyint([1 2 3 4]) ≈ [0 1 1 1 1]

Multiplying polynomials

We need to be able to compute the product of polynomials. For example,

\[\left( a_1 + a_2 x + a_3 x^2 \right) \left( b_1 + b_2 x + b_3 x^2 \right) = a_1 b_1 + \left( a_1 b_2 + a_2 b_1 \right) x + \left( a_1 b_3 + a_2 b_2 + a_3 b_1 \right) x^2 + \left( a_2 b_3 + a_3 b_2 \right) x^3 + a_3 b_3 x^4\]

(You may notice that one-based indexing is a bit awkward here compared to zero-based indexing.)

[ ]:
function polymul(p, q)
    """Multiply two polynomials, returning a polynomial."""
    p = reshape(p, :)
    q = reshape(q, :)
    n = length(p) + length(q) - 1
    r = zeros(n)
    for (i, a) in enumerate(p)
        for (j, b) in enumerate(q)
            r[i+j-1] += a * b
        end
    end
    # return a row vector (for easier viewing)
    reshape(r, 1, :)
end

r = polymul([1 2 3], [10, 100])
polydisp('r', r)
@show r;
[ ]:
plot()
p = [1 -2]
q = [.5 1]
polyplot(p)
polyplot(q)
polyplot(polymul(p, q))
plot!(x -> 0, color=:black, label=nothing)

Addition and subtraction

We need to be able to add and subtract polynomials, that is, to compute \(r = p + q\) and \(r = p - q\). This is just entry-wise addition and subtraction, except that the vectors may have different lengths.

[ ]:
function polyadd(p, q)
    p = reshape(p, :)
    q = reshape(q, :)
    n = max(length(p), length(q))
    r = zeros(n)
    r[1:length(p)] = p
    r[1:length(q)] += q
    reshape(r, 1, :)
end

polyadd([1 2 3], [10 20])
[ ]:
function polysub(p, q)
    p = reshape(p, :)
    q = reshape(q, :)
    n = max(length(p), length(q))
    r = zeros(n)
    r[1:length(p)] = p
    r[1:length(q)] -= q
    reshape(r, 1, :)
end

polysub([1 2 3], [10 20])

Inner products of polynomials

We will use the \(L^2\) inner product,

\[\left\langle u, v \right\rangle = \int_{-1}^{1} u \left( x \right) v \left( x \right) dx\]

restricted to the case where \(u \left( x \right)\) and \(v \left( x \right)\) are polynomials.

Recall the Fundamental Theorem of Calculus, which says that

\[\int_a^b f \left( x \right) dx = F \left( b \right) - F \left( a \right)\]

where \(F\) is a function such that \(F' \left( x \right) = f \left( x \right)\).
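
These pieces compose directly; the following is a minimal sketch (the helper name poly_inner_product_sketch is ours, and it assumes your polyint above is complete):

[ ]:
function poly_inner_product_sketch(u, v)
    """A sketch: multiply, antidifferentiate, then apply the
    Fundamental Theorem of Calculus on (-1, 1).
    """
    W = polyint(polymul(u, v)) # W'(x) = u(x) v(x)
    polyval(W, 1) - polyval(W, -1) # ⟨u, v⟩ = W(1) - W(-1)
end

poly_inner_product_sketch([1], [0 1]) # odd integrand, expect 0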

[ ]:
function poly_inner_product(u, v)
    """Use polymul, polyint, and polyval to define the L2
    inner product of u and v on the interval (-1, 1). Both u and v
    are polynomials expressed as arrays in the usual way. Recall that
    the inner product returns a scalar (single number).
    """
    ### SOLUTION BEGIN

    ### SOLUTION END
end

function display_inner_product(u, v)
    display("text/html", "\$ \\langle $(polystring(u)), $(polystring(v)) \\rangle = $(poly_inner_product(u, v)) \$")
end
poly_inner_product([1], [0 1])
display_inner_product([1], [0 1])
display_inner_product([0 0 1], [1])
[ ]:
@test poly_inner_product([-1 0 3], [1]) ≈ 0
@test poly_inner_product([-1 0 3], [0 -3 0 5]) ≈ 0
@test poly_inner_product([0 1], [0 -3 0 5]) ≈ 0
@test poly_inner_product([1 2 3], [1 3 0 4]) ≈ 11.2

Norms of polynomials

Recall that an inner product induces a norm. We’ll define it now and use it later to normalize polynomials.
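
Since the induced norm is just \(\sqrt{\left\langle p, p \right\rangle}\), a one-line sketch suffices (poly_norm_sketch is a hypothetical name; it assumes your poly_inner_product is complete):

[ ]:
function poly_norm_sketch(p)
    """A sketch: the norm induced by poly_inner_product."""
    sqrt(poly_inner_product(p, p))
end

poly_norm_sketch([1]) # expect sqrt(2) ≈ 1.414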

[ ]:
function poly_norm(p)
    """Compute the norm of the polynomial p(x) induced by poly_inner_product().
    """
    ### SOLUTION BEGIN

    ### SOLUTION END
end
[ ]:
@test poly_norm([1]) ≈ sqrt(2)
@test poly_norm([0 1]) ≈ sqrt(2/3)
@test poly_norm([-1 0 3]) ≈ sqrt(8/5)

Orthogonalization of polynomials

We now have the ingredients to use the Gram-Schmidt algorithm to orthogonalize polynomials (starting with the monomials) with respect to the \(L^2\) inner product. We start by defining a “matrix” in which each of the columns is a polynomial. This is conceptually similar to a Vandermonde matrix (see np.vander()) in the limit where we evaluate at infinitely many points; instead of storing infinitely tall columns, we store each as a list defining the polynomial.

[ ]:
function monomials(n)
    """Return a list of the first n monomials in order of increasing degree.
    We will think of this as a "tall" matrix indexed by its n columns.
    Each column contains a polynomial.
    """
    A = []
    for j in 1:n
        p = zeros(j)
        p[end] = 1
        push!(A, p)
    end
    A
end

plot(title="Columns of the monomial matrix")
for (j, p) in enumerate(monomials(6))
    polydisp("p_$j", p)
    polyplot(p)
end
plot!()

Projections

Polynomials can be projected just like vectors. For example, if \(q \left( x \right)\) is normalized,

\[q \left\langle q, p \right\rangle\]

is the orthogonal projection of \(p\) onto the subspace spanned by \(q\), and

\[p - q \left\langle q, p \right\rangle\]

is the part orthogonal to \(q\).

Notice the similarities between the expressions above (involving the \(L^2\) inner product) and those for projections of finite-dimensional vectors, \(q q^T p\) and \(\left( I - q q^T \right) p\).

[ ]:
function poly_project_orthog(p, q)
    """Find the component of p(x) that is orthogonal to the normalized
    polynomial q(x).
    """
    c = poly_inner_product(p, q) # length of p in direction q
    # polysub is just subtraction of arrays, but lines them up correctly
    # in case they are different lengths
    polysub(p, q * c) # p - q c
end

function test_project()
    A = monomials(5)
    plot()
    polyplot(A[2])                      # x
    polyplot(A[4])                      # x^3
    display_inner_product(A[2], A[4])
    p1 = A[2] / poly_norm(A[2])         # Normalize x in L^2 norm
    display("text/latex", "\$ \\lVert p_1 \\rVert = $(poly_norm(p1)) \$")
    polyplot(p1)
    p3 = poly_project_orthog(A[4], p1)  # Find component of x^3 that is orthogonal to x
    polyplot(p3)
    display_inner_product(A[2], p3)
end

test_project()

Polynomial matrix multiplication

We have discussed the matrix product \(A x\) being a linear combination of the columns of \(A\). If \(A\) is a polynomial matrix with \(m\) columns (we’ll say it has shape (*, m)), then

\[A x = \sum_i A_{:, i} x_i\]

Similarly, if \(B\) is an \(m \times n\) matrix (i.e., shape (m, n)), then the \(j\)th column of \(C = A B\) is

\[C_{:, j} = \sum_k A_{:, k} B_{k, j}\]

The matrix \(C\) is also a polynomial matrix, now with shape (*, n).
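
As a sketch of the column formula above (poly_matmul_sketch is a hypothetical name; it accumulates each column with polyadd):

[ ]:
function poly_matmul_sketch(A, B)
    """A sketch: build column j of C as the linear combination
    sum_k A[k] * B[k,j] of the polynomial columns of A.
    """
    C = []
    m, n = size(B)
    for j in 1:n
        col = [0.0] # start from the zero polynomial
        for k in 1:m
            col = polyadd(col, A[k] * B[k, j])
        end
        push!(C, col)
    end
    C
end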

[ ]:
function poly_matmul(A, B)
    """Multiply a polynomial matrix A of shape (*,m) times
    a matrix B of shape (m,n) to yield a polynomial matrix
    C of shape (*,n).
    """
    C = []
    m, n = size(B)
    for j in 1:n
        # Use polyadd to compute A * B[:,j] and
        # append it to C using push!(C, new_col)
        ### SOLUTION BEGIN

        ### SOLUTION END
    end
    C
end

C = poly_matmul(monomials(4), [1 0 0; 0 1 0; 0 0 1; 0 0 0])

polydisp("C_{:,1}", C[1])
polydisp("C_{:,2}", C[2])
polydisp("C_{:,3}", C[3])
@show C
@show C[1]  # The polynomial 1
@show C[2]  # The polynomial x
@show C[3]; # The polynomial x^2
[ ]:
A = monomials(4)
B = reshape(0:7, 2, 4)'
@show C = poly_matmul(A, B)
Cref = [[0 2 4 6], [1 3 5 7]]
@test C[1] ≈ Cref[1]
@test C[2] ≈ Cref[2]

QR factorization of monomials

We can apply the Gram-Schmidt process to compute a QR factorization of the monomials. This will yield an orthogonal basis of polynomials.
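
For orientation, here is a sketch of classical Gram-Schmidt in this polynomial setting (poly_qr_sketch is a hypothetical name; your poly_qr below should follow the same pattern):

[ ]:
function poly_qr_sketch(A)
    """A sketch of classical Gram-Schmidt with polynomial columns."""
    Q = []
    n = length(A)
    R = zeros(n, n)
    for j in 1:n
        v = A[j]
        for k in 1:j-1
            R[k, j] = poly_inner_product(Q[k], v) # component along q_k
            v = polysub(v, Q[k] * R[k, j]) # remove that component
        end
        R[j, j] = poly_norm(v)
        push!(Q, v / R[j, j]) # normalize in the L^2 norm
    end
    Q, R
end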

[ ]:
function poly_qr(A)
    """Compute a reduced QR factorization of the polynomials in the columns of A
    using the Gram-Schmidt algorithm. If A has shape (*,n), the factor Q will have
    shape (*,n) (i.e., a list of polynomials) and R will be a square (n,n)
    triangular matrix.

    This algorithm can look line-for-line like the naive Gram-Schmidt in our class
    notebook (see Feb 11 or Feb 16 lectures), but you'll need to use the appropriate
    functions that act on polynomials, and ensure you are accessing columns. That is,
    when working with polynomial matrices (shape (*,n)), column j is accessed as A[j]
    instead of A[:,j] for traditional matrices.
    """
    Q = []
    n = length(A)
    R = zeros(n, n)
    for j in 1:n # Each column of A and R
        ### SOLUTION BEGIN

        ### SOLUTION END
    end
    Q, R
end

Q, R = poly_qr(monomials(4))
for (j, col) in enumerate(Q)
    polydisp("q_{$j}", col)
end
R
[ ]:
## This cell explores some properties of the factorization, and its output
## may be helpful in debugging your implementation above.

function poly_QtQ(Q)
    """Compute Q.T Q where Q is a matrix of polynomials
    expressed as a list of columns. The result is an ordinary
    square matrix.
    """
    n = length(Q)
    A = zeros(n, n)
    for i in 1:n
        for j in 1:n
            A[i,j] = poly_inner_product(Q[i], Q[j])
        end
    end
    A
end

function poly_matsub(A, B)
    """Compute A - B where A and B are matrices of polynomials
    expressed as lists of columns.
    """
    C = []
    for i in 1:length(A)
        push!(C, polysub(A[i], B[i]))
    end
    C
end

A = monomials(3)
Q, R = poly_qr(A)
QtQ = poly_QtQ(Q)
@show QtQ
QR_minus_A = poly_matsub(poly_matmul(Q, R), A)
@show QR_minus_A
@show norm(QtQ - I)
@show norm(QR_minus_A)
@test norm(QtQ - I) < 1e-12
@test norm(QR_minus_A) < 1e-12
[ ]:
# The Q factor loses orthogonality as the maximum degree is increased.
# This is much the same effect as we've observed with Vandermonde matrices
# in class, and part of the motivation for the Householder algorithm.

A = monomials(15)
Q, R = poly_qr(A)
QtQ = poly_QtQ(Q)
QR_minus_A = poly_matsub(poly_matmul(Q, R), A)
@show norm(QtQ - I)
@show norm(QR_minus_A);

Plot the orthogonal polynomials

[ ]:
function poly_plotmat(A)
    handle = plot() # Create new figure
    for (j, p) in enumerate(A)
        polyplot(p; label="p_{$j}")
    end
    handle
end

Q, _ = poly_qr(monomials(5))
poly_plotmat(Q)

Normalization

Our QR factorization is normalized using the \(L^2\) norm, which is just fine mathematically, but perhaps not visually satisfying.

This set of polynomials is famous (the Legendre polynomials), but with the normalization \(p \left( 1 \right) = 1\) instead of the \(L^2\) normalization \(\left\langle p, p \right\rangle = 1\). Let’s convert our \(QR\) factorization to use this different normalization convention.
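
One way to carry out the rescaling (poly_qr_norm1_sketch is a hypothetical name; it assumes poly_qr is complete):

[ ]:
function poly_qr_norm1_sketch(A)
    """A sketch: scale each column of Q so q(1) = 1, absorbing the
    scale into the corresponding row of R so that Q R = A still holds.
    """
    Q, R = poly_qr(A)
    n, _ = size(R)
    for j in 1:n
        s = polyval(Q[j], 1) # current value at x = 1
        Q[j] = Q[j] / s # now Q[j](1) == 1
        R[j, :] *= s # compensate so Q R is unchanged
    end
    Q, R
end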

[ ]:
function poly_qr_norm1(A)
    """Compute a QR factorization with the normalization convention that
    q(1) = 1 for each column of Q.  This can be constructed by modifying
    the standard QR factorization (scaling the columns of Q and absorbing
    that scaling factor into R).

    The resulting matrix Q should have columns that are orthogonal with
    respect to the L^2 inner product, but they will not be normalized with
    respect to that product.  Consequently, Q^T Q will be diagonal, but
    not the identity.

    The factors should still satisfy Q R = A.
    """
    Q, R = poly_qr(A)
    n, _ = size(R)
    # Loop over columns of Q (and rows of R).
    # Use polyval() to evaluate each column at x=1 and use that
    # normalization to scale the column of Q and corresponding entries
    # of R. The new columns of Q should be orthogonal, but will not, in
    # general, have norm 1.  This means Q^T Q will be diagonal, but not
    # the identity.  Q R = A should still be satisfied.
    ### SOLUTION BEGIN

    ### SOLUTION END
    Q, R
end

Q, R = poly_qr_norm1(monomials(6))
poly_plotmat(Q)
[ ]:
Q, R = poly_qr_norm1(monomials(6))
diag(poly_QtQ(Q))'
[ ]:
# There is an analytic formula for this normalizing factor
[2 / (2*i+1) for i in 0:5]'

Recurrence relation

Although there is also structure in the factor \(R\), we are often interested only in the basis \(Q\). Let us test whether our functions, the columns of \(Q\) (with our new normalization convention), satisfy the recurrence relation

\[\left( n + 1 \right) q_{n + 1} \left( x \right) = \left( 2 n + 1 \right) x q_n \left( x \right) - n q_{n - 1} \left( x \right)\]

The above uses the standard convention that \(q_n\) is a polynomial of degree \(n\), but we’re using Julia’s 1-based indexing so our \(q_n\) is a polynomial of degree \(n - 1\). As such, we’ll rewrite the standard recurrence as

\[\left( n + 1 \right) q_{n + 2} \left( x \right) = \left( 2 n + 1 \right) x q_{n + 1} \left( x \right) - n q_n \left( x \right)\]
[ ]:
for n in 1:length(Q)-2
    term = (2*n+1) * polymul([0 1], Q[n+1])
    rhs = polysub(term, n*Q[n])
    lhs = (n+1)*Q[n+2]
    err = poly_norm(polysub(rhs, lhs))
    @show err
end

This recurrence is faster and more stable to evaluate than applying Gram-Schmidt to the monomials. We can use it to create a generalized Vandermonde matrix in which the columns are Legendre polynomials instead of the monomials. This matrix will look like

\[Q = \left[ q_0 \left( x \right) \vert q_1 \left( x \right) \vert q_2 \left( x \right) \vert \cdots \right]\]

Use the recurrence to compute this Vandermonde matrix.
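
Each step of the recurrence is a single vectorized update. The following standalone cell (the _demo names are ours, not part of the graded function) sketches the update with the shifted indices noted above:

[ ]:
# Sketch: (n+1) Q[:,n+2] = (2n+1) x .* Q[:,n+1] - n Q[:,n]
x_demo = LinRange(-1, 1, 5)
Q_demo = ones(length(x_demo), 4)
Q_demo[:, 2] = x_demo
for n in 1:2
    Q_demo[:, n+2] = ((2n + 1) * (x_demo .* Q_demo[:, n+1]) - n * Q_demo[:, n]) / (n + 1)
end
Q_demo # columns should be 1, x, (3x^2-1)/2, (5x^3-3x)/2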

[ ]:
function vander_legendre(x, k=nothing)
    if isnothing(k)
        k = length(x) # Square by default
    end
    m = length(x)
    Q = ones(m, k)
    Q[:, 2] = x
    for n in 1:k-2
        # Use the recurrence to compute Q[:,n+2] from Q[:,n+1] and Q[:,n].
        ### SOLUTION BEGIN

        ### SOLUTION END
    end
    Q
end

x = LinRange(-1, 1, 50)
Q = vander_legendre(x, 6)
plot(x, Q, label=["\$P_{$j}\$" for j in (0:5)'])

In both the standard Vandermonde matrix (vander) and our Legendre variant, column \(j\) is a polynomial of degree \(j - 1\) evaluated at the points \(x\). The columns span the same space (i.e., the matrices have the same range). Let’s compare the condition numbers of these matrices on a set of 50 evenly-spaced points on the interval \(\left[ -1, 1 \right]\). Recall that small condition numbers are desirable for numerical stability.

[ ]:
function vander(x, k=nothing)
    if isnothing(k)
        k = length(x) # Square by default
    end
    m = length(x)
    P = ones(m, k)
    P[:, 2] = x
    for n in 1:k-1
        P[:, n+1] = x .* P[:, n]
    end
    P
end

@show cond(vander(x, 20))
@show cond(vander_legendre(x, 20));

Outlook

  • This set of orthogonal polynomials is an alternative basis for the space of polynomials (compared to the monomials), and is much more attractive for approximating functions due to better conditioning and the convenience of orthogonality when working with continuous functions.

  • The recurrence provides a computationally efficient and numerically stable procedure for evaluating the polynomials of any degree.

  • The zeros of these orthogonal polynomials are particularly significant for representing functions and for numerical integration.

  • We will compute the zeros efficiently using Newton’s method and a carefully chosen initial guess. (They can also be computed by solving an eigenvalue problem.)