Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).
Make sure you fill in any place that says BEGIN SOLUTION and END SOLUTION, as well as your name and collaborators below:
[ ]:
NAME = ""
COLLABORATORS = ""
2025-10-03 Orthogonal Polynomials¶
In this activity, we will construct polynomials using orthogonality with respect to an integral inner product. For example, given functions \(u \left( x \right)\) and \(v \left( x \right)\), we define their inner product to be

\[ \left\langle u, v \right\rangle = \int_{-1}^{1} u \left( x \right) v \left( x \right) \, dx , \]

where we use the notation \(\left\langle u, v \right\rangle\) to distinguish this continuous inner product from the discrete inner product \(u^T v\) of vectors.
This is known as the \(L^2\) inner product on the domain \(\left( -1, 1 \right)\), which is a Hilbert space and an \(L^p\) space (https://en.wikipedia.org/wiki/Lp_space). The study of such generalizations of linear algebraic concepts is called functional analysis, an advanced mathematical topic that is used throughout numerical analysis. We will only need the above definition in this activity.
The inner product induces a norm,

\[ \lVert u \rVert = \sqrt{\left\langle u, u \right\rangle} . \]
The inner product and norm satisfy the usual properties from our Linear Algebra notebook.
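For example (a quick worked check of these definitions, not part of the original activity):

\[ \left\langle 1, x \right\rangle = \int_{-1}^{1} x \, dx = 0 , \qquad \lVert 1 \rVert = \sqrt{\int_{-1}^{1} 1 \, dx} = \sqrt{2} , \]

so the constant polynomial and \(x\) are orthogonal, and the tests for poly_inner_product and poly_norm below check exactly these values.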
Working with polynomials¶
We’ll develop functions to evaluate, multiply, and integrate polynomials expressed as vectors of coefficients.
[ ]:
display("text/latex", "\$x^2\$")
[ ]:
using LinearAlgebra
using Plots
using Test
function polystring(p)
"""Construct a string representation of the polynomial.
You don't need to understand this function, it is only
used to annotate plots and clearly view polynomials.
"""
join((a == 1 ? "x^$(i-1)" : "$a x^$(i-1)" for (i, a) in enumerate(p) if a ≠ 0), " + ")
end
polystring([1 4 0 6])
[ ]:
function polydisp(name, p)
"""Display the polynomial with friendly math formatting."""
s = join(('$', "$name(x) = ", polystring(p), '$'))
display("text/latex", s)
end
polydisp("p", [1 4 0 6])
[ ]:
function polyval(p, x)
"""Evaluate function described by coefficients p at point x.
This uses Horner's rule, which evaluates a0 + a1 x + a2 x^2 + a3 x^3 as
a0 + x * (a1 + x * (a2 + x * (a3))).
"""
f = one(x) * p[end]
for a in reverse(p[1:end-1])
f = a .+ x .* f
end
f
end
@test polyval([1 4 0 6], π) ≈ 1 + 4π + 6π^3
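As a quick sanity check (illustrative only, not part of the graded activity), Horner's rule gives the same value as the naive power sum:

[ ]:
p = [1, 4, 0, 6]
x = 0.5
naive = sum(a * x^(i-1) for (i, a) in enumerate(p))  # a0 + a1 x + a2 x^2 + a3 x^3
@test polyval(p, x) ≈ naive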
[ ]:
function polyplot(p; label=nothing)
"""Plot a polynomial p(x) defined by its coefficients
p[1] + p[2]*x^1 + ... + p[n]*x^(n-1)
where n = length(p). We evaluate the polynomial using
our polyval.
"""
if isnothing(label)
label = polystring(p)
end
plot!(x -> polyval(p, x), xlims=(-1, 1), label=join(('$', label, '$')))
end
plot()
polyplot([1 4 0 6])
The polynomial manipulation functions such as polyint are \(\mathcal{O} \left( n \right)\) (linear time) operations on lists or arrays. We’ll start by implementing polyint, which computes the indefinite integral

\[ P \left( x \right) = \int p \left( x \right) \, dx , \]

i.e., \(P' \left( x \right) = p \left( x \right)\). Note that the antiderivative \(P \left( x \right)\) is a polynomial determined only up to an integration constant, which you may choose arbitrarily.
As an example,

\[ \int \left( 1 + 4 x + 6 x^3 \right) \, dx = C + x + 2 x^2 + \frac{6}{4} x^4 , \]

where \(C\) can be any value, so we should have

polyint([1 4 0 6]) == [0 1 2 0 6/4]
[ ]:
function polyint(p)
"""Without using library functions, compute an indefinite integral
of p(x) expressed as a polynomial of degree 1 higher than the
degree of p. The constant of integration may be chosen arbitrarily.
"""
p = reshape(p, :) # Convert from row or column vector into list
pint = zeros(length(p) + 1) # Array to hold the indefinite integral
### SOLUTION BEGIN
### SOLUTION END
# return a row vector (for easier viewing)
reshape(pint, 1, :)
end
p = [1 4 0 6]
polydisp('p', p)
P = polyint(p)
polydisp('P', P)
[ ]:
@test polyint([1 1]) ≈ [0 1 .5]
@test polyint([1 4 0 6]) ≈ [0 1 2 0 6/4]
@test polyint([1 2 3 4]) ≈ [0 1 1 1 1]
Multiplying polynomials¶
We need to be able to compute the product of polynomials. For example,

\[ \left( \sum_{i=1}^{m} p_i x^{i-1} \right) \left( \sum_{j=1}^{n} q_j x^{j-1} \right) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i q_j \, x^{i+j-2} . \]

(You may notice that one-based indexing is somewhat awkward here compared to zero-based indexing.)
[ ]:
function polymul(p, q)
"""Multiply two polynomials, returning a polynomial."""
p = reshape(p, :)
q = reshape(q, :)
n = length(p) + length(q) - 1
r = zeros(n)
for (i, a) in enumerate(p)
for (j, b) in enumerate(q)
r[i+j-1] += a * b
end
end
# return a row vector (for easier viewing)
reshape(r, 1, :)
end
r = polymul([1 2 3], [10, 100])
polydisp('r', r)
@show r;
[ ]:
plot()
p = [1 -2]
q = [.5 1]
polyplot(p)
polyplot(q)
polyplot(polymul(p, q))
plot!(x -> 0, color=:black, label=nothing)
Addition and subtraction¶
We need to be able to add and subtract polynomials, that is, to compute \(r = p + q\) and \(r = p - q\). This is just entry-wise addition and subtraction, except that the vectors may have different lengths.
[ ]:
function polyadd(p, q)
p = reshape(p, :)
q = reshape(q, :)
n = max(length(p), length(q))
r = zeros(n)
r[1:length(p)] = p
r[1:length(q)] += q
reshape(r, 1, :)
end
polyadd([1 2 3], [10 20])
[ ]:
function polysub(p, q)
p = reshape(p, :)
q = reshape(q, :)
n = max(length(p), length(q))
r = zeros(n)
r[1:length(p)] = p
r[1:length(q)] -= q
reshape(r, 1, :)
end
polysub([1 2 3], [10 20])
Inner products of polynomials¶
We will use the \(L^2\) inner product,

\[ \left\langle u, v \right\rangle = \int_{-1}^{1} u \left( x \right) v \left( x \right) \, dx , \]

restricted to the case where \(u \left( x \right)\) and \(v \left( x \right)\) are polynomials.

Recall the Fundamental Theorem of Calculus, which says that

\[ \int_{a}^{b} f \left( x \right) \, dx = F \left( b \right) - F \left( a \right) , \]

where \(F\) is a function such that \(F' \left( x \right) = f \left( x \right)\).
[ ]:
function poly_inner_product(u, v)
"""Use polymul, polyint, and polyval to define the L2
inner product of u and v on the interval (-1, 1). Both u and v
are polynomials expressed as arrays in the usual way. Recall that
the inner product returns a scalar (single number).
"""
### SOLUTION BEGIN
### SOLUTION END
end
function display_inner_product(u, v)
display("text/html", "\$ \\langle $(polystring(u)), $(polystring(v)) \\rangle = $(poly_inner_product(u, v)) \$")
end
poly_inner_product([1], [0 1])
display_inner_product([1], [0 1])
display_inner_product([0 0 1], [1])
[ ]:
@test poly_inner_product([-1 0 3], [1]) ≈ 0
@test poly_inner_product([-1 0 3], [0 -3 0 5]) ≈ 0
@test poly_inner_product([0 1], [0 -3 0 5]) ≈ 0
@test poly_inner_product([1 2 3], [1 3 0 4]) ≈ 11.2
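For reference, one hedged sketch that satisfies the tests above, combining polymul, polyint, and polyval via the Fundamental Theorem of Calculus (it assumes a working polyint; poly_inner_product_sketch is a hypothetical name):

[ ]:
function poly_inner_product_sketch(u, v)
    # Multiply, antidifferentiate, then apply the Fundamental Theorem
    # of Calculus: the definite integral over (-1, 1) is W(1) - W(-1).
    w = polymul(u, v)   # coefficients of the product u(x) v(x)
    W = polyint(w)      # an antiderivative of that product
    polyval(W, 1) - polyval(W, -1)
end
@test poly_inner_product_sketch([1 2 3], [1 3 0 4]) ≈ 11.2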
Norms of polynomials¶
Recall that an inner product induces a norm. We’ll define it now and use it later to normalize polynomials.
[ ]:
function poly_norm(p)
"""Compute the norm of the polynomial p(x) induced by poly_inner_product().
"""
### SOLUTION BEGIN
### SOLUTION END
end
[ ]:
@test poly_norm([1]) ≈ sqrt(2)
@test poly_norm([0 1]) ≈ sqrt(2/3)
@test poly_norm([-1 0 3]) ≈ sqrt(8/5)
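A hedged one-liner for the induced norm (it assumes your poly_inner_product passes its tests; poly_norm_sketch is a hypothetical name):

[ ]:
poly_norm_sketch(p) = sqrt(poly_inner_product(p, p))  # norm induced by the inner product
@test poly_norm_sketch([0 1]) ≈ sqrt(2/3)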
Orthogonalization of polynomials¶
We now have the ingredients to use the Gram-Schmidt algorithm to orthogonalize polynomials (starting with the monomials) with respect to the \(L^2\) inner product. We start by defining a “matrix” in which each of the columns is a polynomial. This is conceptually similar to a Vandermonde matrix (see np.vander()) in the limit where we evaluate at infinitely many points; instead of storing infinitely tall columns, we store each as a list defining the polynomial.
[ ]:
function monomials(n)
"""Return a list of the first n monomials in order of increasing degree.
We will think of this as a "tall" matrix indexed by its n columns.
Each column contains a polynomial.
"""
A = []
for j in 1:n
p = zeros(j)
p[end] = 1
push!(A, p)
end
A
end
plot(title="Columns of the monomial matrix")
for (j, p) in enumerate(monomials(6))
polydisp("p_$j", p)
polyplot(p)
end
plot!()
Projections¶
Polynomials can be projected just like vectors. For example, if \(q \left( x \right)\) is normalized,

\[ q \left( x \right) \left\langle q, p \right\rangle \]

is the orthogonal projection of \(p\) onto the subspace spanned by \(q\), and

\[ p \left( x \right) - q \left( x \right) \left\langle q, p \right\rangle \]

is the part orthogonal to \(q\).
Notice the similarities between the expressions above (involving the \(L^2\) inner product) and those for projections of finite-dimensional vectors, \(q q^T p\) and \(\left( I - q q^T \right) p\).
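As a quick finite-dimensional illustration of that analogy (illustrative only; the variable names here are ours, not part of the activity):

[ ]:
q = normalize([1.0, 2.0, 2.0])  # a unit vector, so ||q|| = 1
p = [3.0, 0.0, 0.0]
p_par = q * (q' * p)            # orthogonal projection of p onto span(q)
p_perp = p - p_par              # the part of p orthogonal to q
@show q' * p_perp;              # ≈ 0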
[ ]:
function poly_project_orthog(p, q)
"""Find the component of p(x) that is orthogonal to the normalized
polynomial q(x).
"""
c = poly_inner_product(p, q) # length of p in direction q
# polysub is just subtraction of arrays, but lines them up correctly
# in case they are different lengths
polysub(p, q * c) # p - q c
end
function test_project()
A = monomials(5)
plot()
polyplot(A[2]) # x
polyplot(A[4]) # x^3
display_inner_product(A[2], A[4])
p1 = A[2] / poly_norm(A[2]) # Normalize x in L^2 norm
display("text/latex", "\$ \\lVert p_1 \\rVert = $(poly_norm(p1)) \$")
polyplot(p1)
p3 = poly_project_orthog(A[4], p1) # Find component of x^3 that is orthogonal to x
polyplot(p3)
display_inner_product(A[2], p3)
end
test_project()
Polynomial matrix multiplication¶
We have discussed the matrix product \(A x\) being a linear combination of the columns of \(A\). If \(A\) is a polynomial matrix with \(m\) columns (we’ll say it has shape (*, m)), then

\[ A x = \sum_{k=1}^{m} A_k x_k . \]

Similarly, if \(B\) is an \(m \times n\) matrix (i.e., shape (m, n)), then the \(j\)th column of \(C = A B\) is

\[ C_j = \sum_{k=1}^{m} A_k B_{k,j} . \]

The matrix \(C\) is also a polynomial matrix, now with shape (*, n).
[ ]:
function poly_matmul(A, B)
"""Multiply a polynomial matrix A of shape (*,m) times
a matrix B of shape (m,n) to yield a polynomial matrix
C of shape (*,n).
"""
C = []
m, n = size(B)
for j in 1:n
# Use polyadd to compute A * B[:,j] and
# append it to C using push!(C, new_col)
### SOLUTION BEGIN
### SOLUTION END
end
C
end
C = poly_matmul(monomials(4), [1 0 0; 0 1 0; 0 0 1; 0 0 0])
polydisp("C_{:,1}", C[1])
polydisp("C_{:,2}", C[2])
polydisp("C_{:,3}", C[3])
@show C
@show C[1] # The polynomial 1
@show C[2] # The polynomial x
@show C[3]; # The polynomial x^2
[ ]:
A = monomials(4)
B = reshape(0:7, 2, 4)'
@show C = poly_matmul(A, B)
Cref = [[0 2 4 6], [1 3 5 7]]
@test C[1] ≈ Cref[1]
@test C[2] ≈ Cref[2]
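One hedged sketch of the accumulation loop (poly_matmul_sketch is a hypothetical name; the loop body you write inside poly_matmul may differ):

[ ]:
function poly_matmul_sketch(A, B)
    C = []
    m, n = size(B)
    for j in 1:n
        col = [0.0]  # start from the zero polynomial
        for k in 1:m
            # Add the k-th polynomial column scaled by B[k, j];
            # polyadd lines up coefficient arrays of different lengths.
            col = polyadd(col, A[k] * B[k, j])
        end
        push!(C, col)
    end
    C
end
Cs = poly_matmul_sketch(monomials(4), reshape(0:7, 2, 4)')
@test Cs[1] ≈ [0 2 4 6]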
QR factorization of monomials¶
We can apply the Gram-Schmidt process to compute a QR factorization of the monomials. This will yield an orthogonal basis of polynomials.
[ ]:
function poly_qr(A)
"""Compute a reduced QR factorization of the polynomials in the columns of A
using the Gram-Schmidt algorithm. If A has shape (*,n), the factor Q will have
shape (*,n) (i.e., a list of polynomials) and R will be a square (n,n)
triangular matrix.
This algorithm can look line-for-line like the naive Gram-Schmidt in our class
notebook (see Feb 11 or Feb 16 lectures), but you'll need to use the appropriate
functions that act on polynomials, and ensure you are accessing columns. That is,
when working with polynomial matrices (shape (*,n)), column j is accessed as A[j]
instead of A[:,j] for traditional matrices.
"""
Q = []
n = length(A)
R = zeros(n, n)
for j in 1:n # Each column of A and R
### SOLUTION BEGIN
### SOLUTION END
end
Q, R
end
Q, R = poly_qr(monomials(4))
for (j, col) in enumerate(Q)
polydisp("q_{$j}", col)
end
R
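For debugging, here is one hedged sketch of classical (naive) Gram-Schmidt over polynomial columns, assuming your poly_inner_product and poly_norm pass their tests (poly_qr_sketch is a hypothetical name):

[ ]:
function poly_qr_sketch(A)
    Q = []
    n = length(A)
    R = zeros(n, n)
    for j in 1:n
        v = A[j]
        for i in 1:j-1
            # Subtract the component of A[j] along the already-orthonormal Q[i].
            R[i, j] = poly_inner_product(Q[i], A[j])
            v = polysub(v, Q[i] * R[i, j])
        end
        R[j, j] = poly_norm(v)  # length of the remaining component
        push!(Q, v / R[j, j])   # normalize in the L^2 norm
    end
    Q, R
end
Qs, Rs = poly_qr_sketch(monomials(3))
@test poly_norm(Qs[1]) ≈ 1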
[ ]:
## This cell explores some properties of the factorization, and its output
## may be helpful in debugging your implementation above.
function poly_QtQ(Q)
"""Compute Q.T Q where Q is a matrix of polynomials
expressed as a list of columns. The result is an ordinary
square matrix.
"""
n = length(Q)
A = zeros(n, n)
for i in 1:n
for j in 1:n
A[i,j] = poly_inner_product(Q[i], Q[j])
end
end
A
end
function poly_matsub(A, B)
"""Compute A - B where A and B are matrices of polynomials
expressed as lists of columns.
"""
C = []
for i in 1:length(A)
push!(C, polysub(A[i], B[i]))
end
C
end
A = monomials(3)
Q, R = poly_qr(A)
QtQ = poly_QtQ(Q)
@show QtQ
QR_minus_A = poly_matsub(poly_matmul(Q, R), A)
@show QR_minus_A
@show norm(QtQ - I)
@show norm(QR_minus_A)
@test norm(QtQ - I) < 1e-12
@test norm(QR_minus_A) < 1e-12
[ ]:
# The Q factor loses orthogonality as the maximum degree is increased.
# This is much the same effect as we've observed with Vandermonde matrices
# in class, and part of the motivation for the Householder algorithm.
A = monomials(15)
Q, R = poly_qr(A)
QtQ = poly_QtQ(Q)
QR_minus_A = poly_matsub(poly_matmul(Q, R), A)
@show norm(QtQ - I)
@show norm(QR_minus_A);
Plot the orthogonal polynomials¶
[ ]:
function poly_plotmat(A)
handle = plot() # Create new figure
for (j, p) in enumerate(A)
polyplot(p; label="p_{$j}")
end
handle
end
Q, _ = poly_qr(monomials(5))
poly_plotmat(Q)
Normalization¶
Our QR factorization is normalized using the \(L^2\) norm, which is just fine mathematically, but perhaps not visually satisfying.
This set of polynomials is famous (Legendre polynomials), but with the normalization \(p \left( 1 \right) = 1\) instead of the \(L^2\) normalization \(\left\langle p \left( x \right), p \left( x \right) \right\rangle = 1\). Let’s convert our \(QR\) factorization to use this different normalization convention.
[ ]:
function poly_qr_norm1(A)
"""Compute a QR factorization with the normalization convention that
q(1) = 1 for each column of Q. This can be constructed by modifying
the standard QR factorization (scaling the columns of Q and absorbing
that scaling factor into R).
The resulting matrix Q should have columns that are orthogonal with
respect to the L^2 inner product, but they will not be normalized with
respect to that product. Consequently, Q^T Q will be diagonal, but
not the identity.
The factors should still satisfy Q R = A.
"""
Q, R = poly_qr(A)
n, _ = size(R)
# Loop over columns of Q (and rows of R).
# Use polyval() to evaluate each column at x=1 and use that
# normalization to scale the column of Q and corresponding entries
# of R. The new columns of Q should be orthogonal, but will not, in
# general, have norm 1. This means Q^T Q will be diagonal, but not
# the identity. Q R = A should still be satisfied.
### SOLUTION BEGIN
### SOLUTION END
Q, R
end
Q, R = poly_qr_norm1(monomials(6))
poly_plotmat(Q)
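One hedged sketch of the rescaling loop (poly_qr_norm1_sketch is a hypothetical name; it shows one way to scale the columns of Q and absorb the factors into R):

[ ]:
function poly_qr_norm1_sketch(A)
    Q, R = poly_qr(A)
    n, _ = size(R)
    for j in 1:n
        s = polyval(Q[j], 1)  # current value of the j-th column at x = 1
        Q[j] = Q[j] / s       # rescale so Q[j](1) == 1
        R[j, :] *= s          # absorb the factor so Q R = A still holds
    end
    Q, R
end
Qs, _ = poly_qr_norm1_sketch(monomials(4))
@test all(polyval(q, 1) ≈ 1 for q in Qs)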
[ ]:
Q, R = poly_qr_norm1(monomials(6))
diag(poly_QtQ(Q))'
[ ]:
# There is an analytic formula for this normalizing factor
[2 / (2*i+1) for i in 0:5]'
Recurrence relation¶
Although there is also structure in the factor \(R\), we are often interested only in the basis \(Q\). Let us test whether our functions, the columns of \(Q\) (with our new normalization convention), satisfy the recurrence relation

\[ \left( n + 1 \right) q_{n+1} \left( x \right) = \left( 2 n + 1 \right) x \, q_n \left( x \right) - n \, q_{n-1} \left( x \right) . \]

The above uses the standard convention that \(q_n\) is a polynomial of degree \(n\), but we’re using Julia’s 1-based indexing so our \(q_n\) is a polynomial of degree \(n - 1\). As such, we’ll rewrite the standard recurrence as

\[ \left( n + 1 \right) q_{n+2} \left( x \right) = \left( 2 n + 1 \right) x \, q_{n+1} \left( x \right) - n \, q_n \left( x \right) . \]
[ ]:
for n in 1:length(Q)-2
term = (2*n+1) * polymul([0 1], Q[n+1])
rhs = polysub(term, n*Q[n])
lhs = (n+1)*Q[n+2]
err = poly_norm(polysub(rhs, lhs))
@show err
end
This recurrence is faster and more stable to evaluate than applying Gram-Schmidt to the monomials. We can use it to create a generalized Vandermonde matrix in which the columns are Legendre polynomials instead of the monomials. This matrix will look like

\[ V = \begin{bmatrix} P_0 \left( x_1 \right) & P_1 \left( x_1 \right) & \cdots & P_{k-1} \left( x_1 \right) \\ P_0 \left( x_2 \right) & P_1 \left( x_2 \right) & \cdots & P_{k-1} \left( x_2 \right) \\ \vdots & \vdots & & \vdots \\ P_0 \left( x_m \right) & P_1 \left( x_m \right) & \cdots & P_{k-1} \left( x_m \right) \end{bmatrix} . \]

Use the recurrence to compute this Vandermonde matrix.
[ ]:
function vander_legendre(x, k=nothing)
if isnothing(k)
k = length(x) # Square by default
end
m = length(x)
Q = ones(m, k)
Q[:, 2] = x
for n in 1:k-2
# Use the recurrence to compute Q[:,n+1] from Q[:,n] and Q[:,n-1].
### SOLUTION BEGIN
### SOLUTION END
end
Q
end
x = LinRange(-1, 1, 50)
Q = vander_legendre(x, 6)
plot(x, Q, label=["\$P_{$j}\$" for j in (0:5)'])
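For reference, one hedged sketch of the recurrence step, written columnwise at the evaluation points (vander_legendre_sketch is a hypothetical name; your loop body inside vander_legendre may differ):

[ ]:
function vander_legendre_sketch(x, k=nothing)
    if isnothing(k)
        k = length(x)
    end
    m = length(x)
    Q = ones(m, k)
    Q[:, 2] = x  # P_1(x) = x
    for n in 1:k-2
        # (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}, shifted to 1-based columns
        Q[:, n+2] = ((2n + 1) .* x .* Q[:, n+1] .- n .* Q[:, n]) ./ (n + 1)
    end
    Q
end
Vs = vander_legendre_sketch(LinRange(-1, 1, 50), 4)
@test Vs[:, 3] ≈ (3 .* Vs[:, 2].^2 .- 1) ./ 2  # P_2(x) = (3x^2 - 1)/2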
In both the standard Vandermonde matrix (vander) and our Legendre variant, column \(j\) is a polynomial of degree \(j - 1\) evaluated at the points \(x\). The columns span the same space (i.e., the matrices have the same range). Let’s compare the condition numbers of these matrices on a set of 50 evenly spaced points on the interval \(\left[ -1, 1 \right]\). Recall that small condition numbers are desirable for numerical stability.
[ ]:
function vander(x, k=nothing)
if isnothing(k)
k = length(x) # Square by default
end
m = length(x)
P = ones(m, k)
P[:, 2] = x
for n in 1:k-1
P[:, n+1] = x .* P[:, n]
end
P
end
@show cond(vander(x, 20))
@show cond(vander_legendre(x, 20));
Outlook¶
This set of orthogonal polynomials is an alternative basis for polynomials (compared to the monomials), and much more attractive for approximating functions, due to better conditioning and the convenience of orthogonality (when working with continuous functions).
The recurrence provides a computationally efficient and numerically stable procedure for evaluating the polynomials of any degree.
The zeros of these orthogonal polynomials are particularly significant for representing functions and for numerical integration.
We will compute the zeros efficiently using Newton’s method and a carefully chosen initial guess. (They can also be computed by solving an eigenvalue problem.)