2025-09-05 Convergence classes

  • Rootfinding as a modeling tool

  • Limitations of bisection

  • Convergence classes

  • Intro to Newton methods

See also FNC (Driscoll and Braun, Fundamentals of Numerical Computation)

Stability: Volume of a polygon

[1]:
# What is the area of this polygon?
using Plots
default(lw=4, ms=5, legendfontsize=12, xtickfontsize=12, ytickfontsize=12)

# Vertices of the polygon, with the first vertex repeated to close the loop
X = [1 0; 2 1; 1 3; 0 1; -1 1.5; -2 -1; .5 -2; 1 0]

R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)] # rotation by angle θ
Y = X * R(deg2rad(10))' .+ [1e8 1e8]   # rotate by 10° and translate far from the origin

plot(Y[:,1], Y[:,2], seriestype=:shape, aspect_ratio=:equal)
[1]: (plot: the rotated and translated polygon, centered near (1e8, 1e8))
[2]:
using LinearAlgebra

function pvolume(X)
    n = size(X, 1)
    # Shoelace formula: the 2x2 determinant of consecutive vertices is twice
    # the signed area of the triangle they form with the origin
    sum(det(X[i:i+1, :]) / 2 for i in 1:n-1)
end

@show vol_X = pvolume(X)
@show vol_Y = pvolume(Y);
vol_X = pvolume(X) = 9.25
vol_Y = pvolume(Y) = 8.022769168019295

Why is the first digit wrong? The coordinates are \(\sim 10^8\) and \(\epsilon_\text{machine} \approx 10^{-16}\), so shouldn’t we get about 8 correct digits?

Answer: In the determinant, we multiply coordinates of size \(\sim 10^8\) to get products of size \(\sim 10^{16}\), each carrying a rounding error of size \(\sim 10^{16} \epsilon_\text{machine} \approx 1\), and then subtract nearly equal products. The subtraction cancels the true digits but not the \(O(1)\) rounding errors, so an answer of size \(\sim 10\) loses even its leading digit.
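A minimal sketch of the cancellation (with made-up points): shifting two vertices by \(10^8\) leaves the exact value of this particular determinant equal to 1, but destroys it in floating point.

# Both determinants are exactly 1 in real arithmetic, but the shifted version
# multiplies numbers ~1e8 into products ~1e16, where Float64 rounding errors
# are ~1; the subtraction then cancels everything except those errors.
x1, y1 = 1.0, 0.0
x2, y2 = 2.0, 1.0
@show x1 * y2 - x2 * y1                                  # well-conditioned: 1.0
@show (x1 + 1e8) * (y2 + 1e8) - (x2 + 1e8) * (y1 + 1e8)  # catastrophic cancellation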

Rootfinding

Given \(f \left( x \right)\), find \(x_*\) such that \(f \left( x_* \right) = 0\).

We are working with scalars for now and will revisit vector-valued functions later.

  • We don’t have \(f \left( x \right)\) but rather the algorithm f(x) that approximates it.

  • Sometimes we have extra information, such as fp(x) that approximates \(f' \left( x \right)\).

  • If we have the source code for f(x), we might be able to transform it to provide fp(x) automatically; see the sketch below.
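For example, forward-mode automatic differentiation performs this transformation mechanically. A minimal sketch, assuming the ForwardDiff.jl package is available:

# ForwardDiff transforms the code that evaluates f into code that also
# evaluates f', without symbolic manipulation or finite differences.
using ForwardDiff

f(x) = cos(x) - x
fp(x) = ForwardDiff.derivative(f, x)

@show fp(0.5)       # derivative via AD
@show -sin(0.5) - 1 # hand-written derivative, for comparison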

Iterative Bisection

[3]:
# Bisection algorithm from last time
f(x) = cos(x) - x

hasroot(f, a, b) = f(a) * f(b) < 0

function bisect_iter(f, a, b, tol)
    hist = Float64[]
    while abs(b - a) > tol
        mid = (a + b) / 2
        push!(hist, mid)
        if hasroot(f, a, mid)
            b = mid
        else
            a = mid
        end
    end
    hist
end

plot(f, xlims=[-1, 3], label="\$\\cos(x) - x\$")
[3]: (plot: cos(x) − x on [−1, 3])
[4]:
# How many steps does this take?
length(bisect_iter(f, -1, 3, 1e-20))
[4]:
56
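Why 56 iterations rather than \(\log_2(4/10^{-20}) \approx 68\)? Plausibly because the requested tolerance is below the resolution of Float64 near the root, so the loop can only exit once rounding collapses the bracket entirely:

# Float64 numbers near the root are spaced ~1e-16 apart, so |b - a| can never
# reach 1e-20 by honest halving; the loop ends when the midpoint rounds onto
# an endpoint and the bracket width drops to zero.
@show eps(0.739)       # Float64 spacing near the root x_* ≈ 0.739
@show log2(4 / 1e-20)  # iterations the tolerance would naively require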

How fast does the error decrease?

[5]:
a = -1
b = 3
r = bisect_iter(f, a, b, 1e-15)[end] # is this a good 'true' root?

hist = bisect_iter(f, a, b, 1e-10)
# Recall the importance of using log-linear axes
scatter(abs.(hist .- r), yscale=:log10, label="\$x_k\$")

ks = 1:length(hist)
plot!(ks, (b - a) * (0.5 .^ ks), xlabel="\$k\$", ylabel="\$e_k\$", label="~\$2^{-k}\$")
[5]: (plot: bisection errors |x_k − r| vs. k on a log-linear axis, with a (b − a)2^(−k) reference line)

Since each iteration halves the bracketing interval and \(x_*\) remains inside it, the error \(e_k = x_k - x_*\) satisfies the bound

\[\left\lvert e_k \right\rvert \leq c 2^{-k}\]

Convergence classes

A convergent rootfinding algorithm produces a sequence of approximations \(x_k\) such that

\[\lim_{k \rightarrow \infty} x_k = x_*\]

where \(f \left( x_* \right) = 0\). For analysis, it is convenient to define the errors \(e_k = x_k - x_*\). We say that an iterative algorithm is \(q\)-linearly convergent if

\[\lim_{k \rightarrow \infty} \lvert e_{k + 1} \rvert / \lvert e_k \rvert = \rho < 1\]

(\(q\) stands for “quotient”.) A smaller convergence factor \(\rho\) represents faster convergence.

A slightly weaker condition, \(r\)-linear convergence or simply linear convergence, is that

\[\lvert e_k \rvert \leq \epsilon_k\]

for all sufficiently large \(k\), where the sequence \(\left\lbrace \epsilon_k \right\rbrace\) converges \(q\)-linearly to 0.
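For instance, here is a synthetic sequence (made up for illustration) that is \(r\)-linear but not \(q\)-linear: the errors stay under the \(q\)-linearly convergent envelope \(\epsilon_k = 2^{-k}\), yet the quotient \(\lvert e_{k+1} \rvert / \lvert e_k \rvert\) exceeds 1 infinitely often.

# e_k ≤ 2^{-k} for every k, but the ratio e_{k+1}/e_k jumps to 5 whenever the
# 0.1 factor drops out, so no single ρ < 1 bounds the quotients.
e_r = [2.0^-k * (k % 3 == 0 ? 1.0 : 0.1) for k in 1:30]
plot(e_r, yscale=:log10, xlabel="\$k\$", ylabel="\$e_k\$", label="r-linear \$e_k\$")
plot!(k -> 2.0^-k, 1:30, label="envelope \$2^{-k}\$")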

[6]:
# Let's plot the error vs iterate for perfect q-linear convergence
ρ = 0.8
errors = [1.]
for i in 1:30
    next_e = errors[end] * ρ
    push!(errors, next_e)
end

plot(errors, ylims=(1e-10, 1), xlabel="\$k\$", ylabel="\$e_k\$", label="\$\\rho = 0.8\$")
[6]: (plot: e_k vs. k on a linear axis)
[7]:
# Note how the log scale on the plot makes q-linear convergence appear as a straight line
plot(errors, yscale=:log10, ylims=(1e-10, 1), xlabel="\$k\$", ylabel="\$e_k\$", label="\$\\rho = 0.8\$")
[7]: (plot: e_k vs. k on a log-linear axis; q-linear convergence appears as a straight line)
[8]:
# The ratio of successive errors is constant: the hallmark of q-linear convergence
scatter(abs.(errors[2:end] ./ errors[1:end-1]), ylims=(0,1), xlabel="\$k\$", ylabel="\$\\rho\$", label="\$|e_{k+1}|/|e_k|\$")
[8]: (plot: the ratios |e_{k+1}|/|e_k|, constant at ρ = 0.8)

What about bisection?

Is bisection

  1. \(q\)-linearly convergent

  2. \(r\)-linearly convergent

  3. neither
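A quick empirical probe, reusing hist and r from the bisection run above:

# Ratios of successive bisection errors bounce around erratically rather than
# settling at a single ρ < 1, while the envelope (b - a)2^{-k} converges
# q-linearly: bisection is r-linearly, but not q-linearly, convergent.
e_bis = hist .- r
scatter(abs.(e_bis[2:end] ./ e_bis[1:end-1]), xlabel="\$k\$", label="\$|e_{k+1}|/|e_k|\$")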

Note

  • Specifying an interval is often inconvenient

  • An interval in which the function changes sign guarantees convergence (robustness)

  • No derivative information is required

  • If bisection works for \(f \left( x \right)\), then it works and gives the same accuracy for \(f \left( x \right) \sigma \left( x \right)\) where \(\sigma \left( x \right) > 0\)

  • Roots of even multiplicity are problematic, since the function does not change sign there (see the sketch after this list)

  • A bound on the solution error is directly available

  • The convergence rate is modest - one iteration per bit of accuracy
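A quick illustration of the even-multiplicity caveat, using the hasroot helper from above:

# g has a double root at x = 1 but never changes sign, so no interval
# brackets the root and bisection cannot even get started.
g(x) = (x - 1)^2
@show hasroot(g, 0, 2) # false, despite the root at x = 1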

Newton-Raphson method

Much of numerical analysis is based on Taylor series, the approximation

\[f \left( x \right) = f \left( x_0 \right) + f' \left( x_0 \right) \left( x - x_0 \right) + f'' \left( x_0 \right) \left( x - x_0 \right)^2 / 2 + \cdots\]

centered at some reference point \(x_0\).

(Note: how big are the terms hidden in the \(\cdots\) above?)

In numerical computation, it is exceedingly rare to look beyond the first-order approximation

\[\tilde{f}_{x_0} \left( x \right) = f \left( x_0 \right) + f' \left( x_0 \right) \left( x - x_0 \right)\]

Since \(\tilde{f}_{x_0} \left( x \right)\) is a linear function, we can explicitly compute the unique solution of \(\tilde{f}_{x_0} \left( x \right) = 0\) as \(x = x_0 - f \left( x_0 \right) / f' \left( x_0 \right)\).

This is Newton’s Method (aka Newton-Raphson or Newton-Raphson-Simpson) for finding the roots of differentiable functions.

[9]:
using Printf

function newton(f, fp, x0; tol=1e-8, verbose=false)
    x = x0
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        if verbose
            println("k=$k, x_k=$(@sprintf("%.16f", x)), f_k=$(@sprintf("%.16e", fx))")
        end
        if abs(fx) < tol # convergence test on the residual
            return x, fx, k
        end
        x = x - fx / fpx # Newton step: zero of the local linear model
    end
    x, f(x), 100 # no convergence within the iteration budget
end

f(x) = cos(x) - x
fp(x) = -sin(x) - 1
newton(f, fp, 1; tol=1e-15, verbose=true)
k=1, x_k=1.0000000000000000, f_k=-4.5969769413186023e-01
k=2, x_k=0.7503638678402439, f_k=-1.8923073822117442e-02
k=3, x_k=0.7391128909113617, f_k=-4.6455898990771516e-05
k=4, x_k=0.7390851333852840, f_k=-2.8472058044570758e-10
k=5, x_k=0.7390851332151607, f_k=0.0000000000000000e+00
[9]:
(0.7390851332151607, 0.0, 5)
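The printed residuals hint at \(q\)-quadratic convergence: each \(\lvert f_k \rvert\) is roughly the square of its predecessor. A quick check on the values printed above:

# If |f_{k+1}| ≈ C |f_k|^2, these ratios should level off at a constant C
fk = [4.5969769413186023e-1, 1.8923073822117442e-2,
      4.6455898990771516e-5, 2.8472058044570758e-10]
@show fk[2:end] ./ fk[1:end-1] .^ 2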