2025-09-05 Convergence classes

  • Rootfinding as a modeling tool

  • Limitations of bisection

  • Convergence classes

  • Intro to Newton methods

Stability: Volume of a polygon

[1]:
# what is the area of this polygon
using Plots
default(lw=4, ms=5, legendfontsize=12, xtickfontsize=12, ytickfontsize=12)

X = ([1 0; 2 1; 1 3; 0 1; -1 1.5; -2 -1; .5 -2; 1 0])

R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]
Y = X * R(deg2rad(10))' .+ [1e8 1e8]

plot(Y[:,1], Y[:,2], seriestype=:shape, aspect_ratio=:equal)
[1]:
[2]:
using LinearAlgebra

# Shoelace formula: the polygon's signed area is the sum of the signed areas
# of the triangles (origin, x_i, x_{i+1}); the vertex list is closed, so the
# last row repeats the first.
function pvolume(X)
    n = size(X, 1)
    vol = sum(det(X[i:i+1, :]) / 2 for i in 1:n-1)
end

@show vol_X = pvolume(X)
@show vol_Y = pvolume(Y);
vol_X = pvolume(X) = 9.25
vol_Y = pvolume(Y) = 8.022769168019295

Why is the first digit wrong? The coordinates of Y are \(\sim 10^8\) while \(\epsilon_\text{machine} \approx 10^{-16}\), so shouldn’t we get about 8 correct digits?
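
Each term of the sum is a \(2 \times 2\) determinant whose entries are \(\sim 10^8\), so it is computed from products of size \(\sim 10^{16}\) that nearly cancel; the rounding error of each product is then comparable to the final answer. A minimal sketch (my own, not part of the original notebook) of one standard remedy, translating the polygon back toward the origin before summing:

function pvolume_shifted(X)
    X0 = X .- X[1:1, :]   # subtract the first vertex; the area is translation invariant
    n = size(X0, 1)
    sum(det(X0[i:i+1, :]) / 2 for i in 1:n-1)
end

@show pvolume_shifted(Y)   # close to 9.25, limited only by rounding already committed in forming Y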

Rootfinding

Given \(f \left( x \right)\), find \(x_*\) such that \(f \left( x_* \right) = 0\).

We are working with scalar problems for now and will revisit vector-valued functions later.

  • We don’t have \(f \left( x \right)\) but rather the algorithm f(x) that approximates it.

  • Sometimes we have extra information, such as fp(x) that approximates \(f' \left( x \right)\).

  • If we have the source code for f(x), we might be able to transform it to provide fp(x).
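
For example (a sketch of my own, assuming the ForwardDiff package is installed), forward-mode automatic differentiation can produce fp(x) directly from the code for f(x):

using ForwardDiff

g(x) = cos(x) - x
gp(x) = ForwardDiff.derivative(g, x)   # derivative computed from the code for g, not written by hand

@show gp(1.0)   # ≈ -sin(1) - 1 ≈ -1.8415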

Iterative Bisection

[3]:
# Bisection algorithm from last time
f(x) = cos(x) - x

hasroot(f, a, b) = f(a) * f(b) < 0

function bisect_iter(f, a, b, tol)
    hist = Float64[]
    while abs(b - a) > tol
        mid = (a + b) / 2
        push!(hist, mid)
        if hasroot(f, a, mid)
            b = mid
        else
            a = mid
        end
    end
    hist
end
[3]:
bisect_iter (generic function with 1 method)
[4]:
# How many steps does this take?
length(bisect_iter(f, -1, 3, 1e-20))
[4]:
56

How fast does the error decrease?

[5]:
hist = bisect_iter(f, -1, 3, 1e-10)
r = hist[end] # What are we trusting?

hist = hist[1:end-1]
scatter(abs.(hist .- r), yscale=:log10, label="|x_k - r|")

ks = 1:length(hist)
plot!(ks, 4 * (.5 .^ ks), label="4 * 2^{-k}")
[5]:

So the error \(e_k = x_k - x_*\) satisfies the bound

\[\left\lvert e_k \right\rvert \leq c 2^{-k}\]

Convergence classes

A convergent rootfinding algorithm produces a sequence of approximations \(x_k\) such that

\[\lim_{k \rightarrow \infty} x_k = x_*\]

where \(f \left( x_* \right) = 0\). For analysis, it is convenient to define the errors \(e_k = x_k - x_*\). We say that an iterative algorithm is \(q\)-linearly convergent if

\[\lim_{k \rightarrow \infty} \lvert e_{k + 1} \rvert / \lvert e_k \rvert = \rho < 1\]

(\(q\) stands for “quotient”.) A smaller convergence factor \(\rho\) represents faster convergence.

A slightly weaker condition, \(r\)-linear convergence (or simply linear convergence), is that

\[\lvert e_k \rvert \leq \epsilon_k\]

for all sufficiently large \(k\), where the auxiliary sequence \(\left\lbrace \epsilon_k \right\rbrace\) converges \(q\)-linearly to 0.
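
A quick illustration (my own sketch, not a numbered cell from these notes): a sequence whose error shrinks only on every other step is \(r\)-linearly convergent, since it stays under a \(q\)-linearly convergent envelope, but it is not \(q\)-linearly convergent, because the ratio \(\lvert e_{k+1} \rvert / \lvert e_k \rvert\) oscillates between 1 and 1/4 and has no limit below 1.

# r-linear but not q-linear: the error is flat on odd steps and drops by 4 on even steps
r_errors = [1.0]
for k in 1:30
    push!(r_errors, isodd(k) ? r_errors[end] : r_errors[end] / 4)
end

plot(0:30, r_errors, yscale=:log10, label="|e_k|")
plot!(0:30, 2 * 0.5 .^ (0:30), label="2 * 2^{-k} envelope")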

[6]:
ρ = 0.8
errors = [1.]
for i in 1:30
    next_e = errors[end] * ρ
    push!(errors, next_e)
end

plot(errors, yscale=:log10, ylims=(1e-10, 1), label="ρ = 0.8")
[6]:
[7]:
e = hist .- r

scatter(abs.(e[2:end] ./ e[1:end-1]), ylims=(0,1), label="|e_{k+1}| / |e_k|")
[7]:

What about bisection?

Is bisection

  1. \(q\)-linearly convergent

  2. \(r\)-linearly convergent

  3. neither

Note

  • Specifying an interval is often inconvenient

  • An interval in which the function changes sign guarantees convergence (robustness)

  • No derivative information is required

  • If bisection works for \(f \left( x \right)\), then it works and gives the same accuracy for \(f \left( x \right) \sigma \left( x \right)\) where \(\sigma \left( x \right) > 0\)

  • Roots of even degree are problematic

  • A bound on the solution error is directly available

  • The convergence rate is modest – one iteration per bit of accuracy
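
A quick check of the last point (my own arithmetic, not a cell from the notes): to shrink a bracket of initial width \(b - a\) below tol, bisection needs about \(\log_2 \left( \left( b - a \right) / \text{tol} \right)\) iterations. (The earlier run with tol=1e-20 stopped after 56 steps only because the bracket collapsed to the limits of double precision before reaching that tolerance.)

bisect_steps(width, tol) = ceil(Int, log2(width / tol))

@show bisect_steps(4, 1e-10)   # 36; compare with length(bisect_iter(f, -1, 3, 1e-10))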

Newton-Raphson method

Much of numerical analysis is based on Taylor series, the approximation

\[f \left( x \right) = f \left( x_0 \right) + f' \left( x_0 \right) \left( x - x_0 \right) + f'' \left( x_0 \right) \left( x - x_0 \right)^2 / 2 + \cdots\]

centered at some reference point \(x_0\).

(Note - how big are the terms in my \(\cdots\) above?)

In numerical computation, it is exceedingly rare to look beyond the first-order approximation

\[\tilde{f}_{x_0} \left( x \right) = f \left( x_0 \right) + f' \left( x_0 \right) \left( x - x_0 \right)\]

Since \(\tilde{f}_{x_0} \left( x \right)\) is a linear function, we can explicitly compute the unique solution of \(\tilde{f}_{x_0} \left( x \right) = 0\) as \(x = x_0 - f \left( x_0 \right) / f' \left( x_0 \right)\).

This is Newton’s Method (aka Newton-Raphson or Newton-Raphson-Simpson) for finding the roots of differentiable functions.
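
For example, with \(f \left( x \right) = \cos x - x\) and \(x_0 = 1\), one step gives

\[x_1 = 1 - \frac{\cos 1 - 1}{-\sin 1 - 1} \approx 1 - \frac{-0.4597}{-1.8415} \approx 0.7504,\]

which matches the second iterate printed by the code below.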

[8]:
function newton(f, fp, x0; tol=1e-8, verbose=false)
    x = x0
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        if verbose
            println("[$k] x=$x  f(x)=$fx  f'(x)=$fpx")
        end
        if abs(fx) < tol
            return x, fx, k
        end
        x = x - fx / fpx
    end
    return x, f(x), 100   # not converged within 100 iterations; return the last iterate
end

f(x) = cos(x) - x
fp(x) = -sin(x) - 1
newton(f, fp, 1; tol=1e-15, verbose=true)
[1] x=1  f(x)=-0.45969769413186023  f'(x)=-1.8414709848078965
[2] x=0.7503638678402439  f(x)=-0.018923073822117442  f'(x)=-1.6819049529414878
[3] x=0.7391128909113617  f(x)=-4.6455898990771516e-5  f'(x)=-1.6736325442243012
[4] x=0.739085133385284  f(x)=-2.847205804457076e-10  f'(x)=-1.6736120293089505
[5] x=0.7390851332151607  f(x)=0.0  f'(x)=-1.6736120291832148
[8]:
(0.7390851332151607, 0.0, 5)
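
A closer look at the printout (my own check, not part of the original notes): each step roughly doubles the number of correct digits. Taking the errors \(e_k = x_k - x_*\) from the iterates above, the ratio \(\lvert e_{k+1} \rvert / \lvert e_k \rvert^2\) settles near a constant, a preview of the quadratic convergence of Newton's method.

xs = [1.0, 0.7503638678402439, 0.7391128909113617, 0.739085133385284]  # iterates printed above
xstar = 0.7390851332151607
es = xs .- xstar

@show es[2:end] ./ es[1:end-1] .^ 2           # ≈ 0.17, 0.22, 0.22
@show -cos(xstar) / (2 * (-sin(xstar) - 1));  # f''(x_*) / (2 f'(x_*)) ≈ 0.22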