Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).

Make sure you fill in any place that says SOLUTION BEGIN and SOLUTION END, as well as your name and collaborators below:

[ ]:
NAME = ""
COLLABORATORS = ""

2025-09-19 Rootfinding

Globalizing Newton

In this activity, we experiment with helping Newton’s method converge from poor initial guesses. This is known as globalization: strategies that strive to get close enough to a root that Newton’s quadratic local convergence can kick in.

[ ]:
using Plots

# We write down functions and their derivatives
tests = Dict(
    "x^2 - 2" => (x -> x^2 - 2, x -> 2*x),
    "cos(x) - x" => (x -> cos(x) - x, x -> -sin(x) - 1),
)

function plotlist(ts)
    "A utility function to plot and label each function"
    p = plot()
    # Style note: I use _ for unused parts of pattern matches
    for (name, (f, _fp)) in ts
        plot!(f, label=name)
    end
    plot!(x -> 0, color="black", label=false)
    p
end

plotlist(tests)

Convergence rates

We implemented bisection and Newton’s method in class. Our bisection implementation here

  • has better error handling (it raises an AssertionError when the function does not change sign over the interval), and

  • is more efficient, since it does not re-evaluate the function at both endpoints on each iteration.

[ ]:
function bisect((f, _fp), a, b; tol=1e-10)
    """Iterative (rather than recursive) bisection method.

    Notice that this takes as input a pair of functions (f, fp). Bisection
    does not use fp, but we're keeping the same interface here to make it
    easy to compare methods.
    """
    fa, fb = f(a), f(b)
    @assert fa*fb <= 0 "Function does not change sign over the interval; it may not contain a root"
    history = []
    while b-a > tol
        mid = (a + b) / 2
        fmid = f(mid)
        push!(history, mid, fmid)
        if fa*fmid <= 0
            b, fb = mid, fmid
        else
            a, fa = mid, fmid
        end
    end
    # return num_its rows of [x f(x)]
    reshape(vcat(history...), 2, :)'
end

plot(abs.(bisect(tests["cos(x) - x"], 0, 2)[:, 2]), yscale=:log10, marker=:auto, label="\$r_k\$")
[ ]:
function newton((f, fp), a, b; tol=1e-10)
    """Newton method without a line search.

    Note that this function takes an interval [a, b], which is a bit
    more information than is usually provided to Newton methods. We use
    this interface for uniformity with bisection above, and warn if Newton
    sends us outside the interval.
    """
    x = (a + b) / 2
    history = []
    for k in 1:100
        fx = f(x)
        push!(history, x, fx)
        if x < a || b < x
            @warn "Diverged at iteration $k"
            break
        end
        if abs(fx) < tol
            break
        end
        fpx = fp(x)
        x -= fx / fpx
    end
    # return num_its rows of [x f(x)]
    reshape(vcat(history...), 2, :)'
end

plot(abs.(newton(tests["cos(x) - x"], 0, 2)[:, 2]) .+ 1e-20, yscale=:log10, marker=:auto, label="\$r_k\$")

This curve is steeper than a straight line on the semi-log axes, demonstrating the signature superlinear (here quadratic) convergence of Newton’s method.
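One quick numerical check of that convergence order (a sketch using only base Julia; the helper name `conv_order` and the idealized residual sequence below are ours, not part of the class code):

```julia
# Estimate the apparent convergence order p from a residual history:
# if r_{k+1} ≈ C * r_k^p with small residuals, then p ≈ log(r_{k+1}) / log(r_k).
function conv_order(r)
    r = abs.(r) .+ 1e-20   # guard against log(0) once we hit machine precision
    [log(r[k+1]) / log(r[k]) for k in 1:length(r)-1]
end

# An idealized quadratically converging residual sequence; the estimated
# order should be close to 2. Try it on newton(...)[:, 2] as well.
r = [1e-1, 1e-2, 1e-4, 1e-8, 1e-16]
conv_order(r)
```

For bisection residuals, the same estimate hovers around 1, which is one way to distinguish linear from superlinear convergence in experiments.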

[ ]:
function plot_conv(tests, methods)
    """Plot the convergence of a group of methods on a dict of test problems"""
    colors = Dict(zip(keys(tests), [:red, :green, :blue]))
    markers = Dict(zip(methods, [:circle, :square]))
    p = plot(x -> 0.5 ^ x, color=:black, xlims=(0, 30), yscale=:log10, ylims=(1e-9, 10), label=false)
    for (name, fpair) in tests
        for method in methods
            left, right = -1, 3.5
            h = method(fpair, left, right)
            scatter!(abs.(h[:,2]) .+ 1e-20, label="$(string(method)) - $(string(name))", marker=markers[method], color=colors[name])
        end
    end
    p
end

plot_conv(tests, (bisect, newton))

Breaking Newton

Find a function \(f \left( x \right)\) that satisfies all of the following properties:

  • \(f \left( x \right)\) is strictly monotonically increasing

  • \(f \left( x \right)\) has a simple root at \(x = 0\)

  • \(f' \left( x \right)\) exists for all real \(x\)

  • Newton’s method diverges when started with an initial guess \(x_0 = 1\)

Definition

  • A function \(f \left( x \right)\) is strictly monotonically increasing if \(f \left( a \right) < f \left( b \right)\) for all \(a < b\). If \(f\) is differentiable, this implies \(f' \left( x \right) \geq 0\) for all \(x\) (the derivative may still vanish at isolated points, as for \(f \left( x \right) = x^3\) at \(x = 0\)); conversely, \(f' \left( x \right) > 0\) for all \(x\) implies strict monotonicity.

Hint

  • Look up “sigmoid” functions and sketch a Newton iteration.

[ ]:
function monotonic_diverge()
    """A strictly monotonically increasing function with a simple
    root at x=0 for which Newton's method diverges with initial
    guess x=1. Recall that we need to return a pair of functions
    (f, fp). If your function was f(x) = x^2, you could write:

    f(x)  = x^2
    fp(x) = 2*x
    return (f, fp)
    """
    ### SOLUTION BEGIN

    ### SOLUTION END
end

# You may see a "Warning: Diverged". That's expected.
plot(monotonic_diverge()[1], aspect_ratio=:equal, label="\$f\$")
hist = newton(monotonic_diverge(), -4, 6)
plot!(hist[:,1], hist[:,2], markershape=:auto, label="\$r_k\$")
[ ]:
using Test


function isderivative(f, fp)
    x = LinRange(-5, 5, 100)
    eps = 1e-8
    fh = (f.(x .+ eps) - f.(x)) / eps
    all(abs.(fp.(x) - fh) .< 1e-5*maximum(abs.(f.(x))))
end

function ismonotonic(f, _fp)
    x = LinRange(-5, 5, 100)
    fx = f.(x)
    all(fx[2:end] .> fx[1:end-1])
end

function isdivergent(f, fp)
    h = newton((f, fp), -1, 3)
    hx = h[:, 1]
    abs(hx[end]) > maximum(abs.(hx[1:end-1]))
end

@test monotonic_diverge()[1](0) < 1e-10
@test isderivative(monotonic_diverge()...)
@test ismonotonic(monotonic_diverge()...)
@test isdivergent(monotonic_diverge()...)

Fixing Newton - Secant method

The secant method is similar to Newton’s method, but instead of evaluating \(f' \left( x_i \right)\) directly, it uses the approximation

\[f' \left( x_i \right) \approx \frac{f \left( x_i \right) - f \left( x_{i - 1} \right)}{x_i - x_{i - 1}}\]

Implement the secant method by writing code similar to Newton’s method, but using this approximation in place of \(f' \left( x_i \right)\) where it appears in Newton’s method.
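Before wiring this into the loop, it can help to sanity-check the approximation on its own (a standalone sketch; the test function and sample points below are arbitrary choices of ours):

```julia
# The slope of the secant line through two nearby iterates
# approximates the derivative used by Newton's method.
f(x)  = cos(x) - x
fp(x) = -sin(x) - 1   # true derivative, for comparison only

xi, xlast = 0.7, 0.8
slope = (f(xi) - f(xlast)) / (xi - xlast)
slope, fp(xi)   # the two values should agree to a few percent
```

As the iterates cluster near the root, the two points move closer together and this approximation becomes increasingly accurate.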

[ ]:
function secant((f, _fp), a, b; tol=1e-10)
    """Solve f(x) = 0 using the secant method.

    We're keeping the same interface as bisection() and newton(), but
    the secant method should not use the argument _fp above. Instead,
    it should use the approximation based on a prior iterate.
    """
    xlast, fxlast = b, f(b) # Choose an arbitrary "last" x
    x = (a + b) / 2
    history = [xlast, fxlast]
    for k in 1:100
        fx = f(x)
        push!(history, x, fx)
        if x < a || b < x
            @warn "Diverged at iteration $k"
            break
        end
        if abs(fx) < tol
            break
        end
        ### SOLUTION BEGIN

        ### SOLUTION END
        x -= fx / fpx
    end
    # return num_its rows of [x f(x)]
    reshape(vcat(history...), 2, :)'
end

plot(abs.(secant(tests["cos(x) - x"], 0, 2)[:, 2]) .+ 1e-20, yscale=:log10, marker=:auto, label="\$r_k\$")
[ ]:
plot_conv(tests, (newton, secant))
[ ]:
h = secant(tests["x^2 - 2"], -1., 3.)

@test maximum(abs.(h[end, :] .- [sqrt(2), 0])) < 1e-9 # Did not converge to correct root
@test size(h, 1) < 10 # Convergence too slow

Newton Accuracy

Find a function \(f \left( x \right)\) satisfying the following properties:

  • \(f \left( 0 \right) = 0\)

  • bisection and Newton both converge, but Newton has an error of at least \(10^{-2} = 0.01\) even when the residual is smaller than \(10^{-10}\)

As a side effect, you’ll likely see that the Newton error is higher than the bisection error, even though the Newton residuals converge faster.

[ ]:
function inaccurate()
    """A function in which Newton error is large despite residual being small.
    You'll be creating a function and its derivative, as in

    f(x)  = x^2
    fp(x) = 2*x
    return (f, fp)
    """
    ### SOLUTION BEGIN

    ### SOLUTION END
end

plot(inaccurate()[1], label="\$f\$")
[ ]:
# Note that we're plotting residuals |f(x_k)|, not errors |x_k - x_*|
plot_conv(Dict("inaccurate" => inaccurate()), (bisect, newton))
[ ]:
h = newton(inaccurate(), -1, 3)
@show final_error = abs(h[end, 1])
smallresid = abs.(h[:,2]) .< 1e-10

@test any(smallresid)
@test any(h[smallresid,1] .> 1e-2)
@test isderivative(inaccurate()...)

Newton - Impact of initial guess

We consider the function \(f \left( z \right) = z^3 - 1\) and examine when Newton’s method converges, and which root it converges to. (This function has three roots in the complex plane.) We will color each pixel based on how Newton’s method converges with that initial guess. You are asked to implement one step of Newton’s method (without line search or anything fancy) inside the inner loop.

[ ]:
function image(m, maxits)
    x1 = LinRange(-2, 2, m)
    o1 = ones(m)
    X, Y = x1' .* o1, o1' .* x1
    Z = X .+ 1im*Y
    C = fill(RGB(0), m, m) # m×m array of colors
    # The equation has three known roots
    roots = [1, exp(2im*π/3), exp(-2im*π/3)]
    for i in 1:m
        for j in 1:m
            z = Z[i, j] # Initial guess for this pixel
            k = 0
            while k < maxits
                # Update z with one Newton step on the equation f(z) = z^3 - 1.
                # Break out of the loop if |f(z)| < 1e-10.
                ### SOLUTION BEGIN

                ### SOLUTION END
                k += 1
            end
            # Color the output based on how many iterations were needed
            color = [1., 1, 1] # white if not converged
            for r in 1:3
                if abs(z - roots[r]) < 1e-8
                    color[:] .= 0
                    color[r] = log(k) / log(100)
                end
            end
            C[i,j] = RGB(color[1], color[2], color[3])
        end
    end
    C
end

# You may try adjusting the resolution and/or number of iterations
plot(image(200, 50))

You can read more about such diagrams (known as Newton fractals), whose complexity demonstrates that precisely describing the domain of convergence may be far more complicated than the function or the algorithm itself. Since for many problems one specific root is physical (say the real one, colored red above), we usually need good initial guesses for fast, reliable convergence in practical applications. Note that convergence in this case is quite reliable when the initial guess is real.

Newton for rational functions

Newton’s method solves \(f \left( x \right) = 0\) using a first-order Taylor approximation at each iterate. We can use it to compute square roots and related functions. Sample code for Newton’s method is provided below.
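Concretely, linearizing \(f\) at the current iterate \(x_i\) gives

\[0 = f \left( x \right) \approx f \left( x_i \right) + f' \left( x_i \right) \left( x - x_i \right),\]

and solving for \(x\) yields the Newton update \(x_{i + 1} = x_i - f \left( x_i \right) / f' \left( x_i \right)\).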

[ ]:
using Printf

function newton((f, fp), a, b; tol=1e-10, verbose=false)
    """Newton method without a line search.

    Note that this function takes an interval [a, b], which is a bit
    more information than is usually provided to Newton methods. We use
    this interface for uniformity with bisection above, and warn if Newton
    sends us outside the interval.
    """
    x = (a + b) / 2
    for k in 1:100
        fx = f(x)
        if verbose
            println("k=$k, x_k=$(@sprintf("%.16f", x)), f_k=$(@sprintf("%.16e", fx))")
        end
        if abs(fx) < tol
            break
        end
        fpx = fp(x)
        x -= fx / fpx
    end
    x
end

newton(tests["cos(x) - x"], 0, 2; verbose=true)

Square root

We can estimate \(\sqrt{a}\) as the root of \(f \left( x \right) = x^2 - a\).
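Substituting \(f \left( x \right) = x^2 - a\) and \(f' \left( x \right) = 2 x\) into the Newton update and simplifying gives the classical Babylonian (Heron) iteration,

\[x_{i + 1} = x_i - \frac{x_i^2 - a}{2 x_i} = \frac{1}{2} \left( x_i + \frac{a}{x_i} \right),\]

which averages the current guess with \(a / x_i\).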

[ ]:
function my_sqrt(a; guess=1, tol=1e-10, verbose=false)
    """Compute sqrt(a) using Newton's method.
    """
    function funcs()
        """Return a pair of functions, f(x) and f'(x)
        """
        ### SOLUTION BEGIN

        ### SOLUTION END
    end
    newton(funcs(), guess - 0.1, guess + 0.1; tol=tol, verbose=verbose)
end

@show sqrt_4 = my_sqrt(4; guess=1, verbose=true);
[ ]:
@test my_sqrt(pi; guess=1) ≈ sqrt(pi)

Experiment to see whether there are positive values of a and/or guess for which this function does not converge.

Reciprocal square root

Similar to above, express the reciprocal square root \(1 / \sqrt{a}\) as a rootfinding problem, \(f \left( x; a \right) = 0\).

[ ]:
function my_rsqrt1(a; guess=1, tol=1e-10, verbose=false)
    """Compute 1/sqrt(a) using Newton's method.
    """
    function funcs()
        """Return a pair of functions, f(x) and f'(x)
        """
        ### SOLUTION BEGIN

        ### SOLUTION END
    end
    newton(funcs(), guess - 0.1, guess + 0.1; tol=tol, verbose=verbose)
end

@show rsqrt_9 = my_rsqrt1(9., guess=.5, verbose=true);
[ ]:
@test 1/my_rsqrt1(1.05)^2 ≈ 1.05
@test 1/my_rsqrt1(100, guess=.01)^2 ≈ 100

Choose a different way to express this problem as a rootfinding problem. Hint: When writing \(f \left( x; a \right)\), you can place the division on the term with \(x\) or the term with \(a\).

[ ]:
function my_rsqrt2(a; guess=1, tol=1e-10, verbose=false)
    """Compute 1/sqrt(a) using Newton's method.
    """
    function funcs()
        """Return a pair of functions, f(x) and f'(x)
        """
        ### SOLUTION BEGIN

        ### SOLUTION END
    end
    newton(funcs(), guess - 0.1, guess + 0.1; tol=tol, verbose=verbose)
end

@show rsqrt_9 = my_rsqrt2(9., guess=.5, verbose=true);
[ ]:
@test 1/my_rsqrt2(1.05)^2 ≈ 1.05
@test 1/my_rsqrt2(100, guess=.01)^2 ≈ 100

Make and explain a figure

Fix a positive value of \(a\) (your choice) and plot a figure with both functions \(f_1 \left( x \right)\) and \(f_2 \left( x \right)\) that you’ve created above. (They should both have roots at \(x_* = 1 / \sqrt{a}\).) You may want to refer to class notebooks for the plotting syntax, including setting xlims so that you avoid singularities.

[ ]:
### SOLUTION BEGIN

### SOLUTION END

Write a caption for the figure you just made, explaining how Newton could struggle with either of your functions. Which function will be more reliable for finding the root if the initial guess is poor?

SOLUTION BEGIN

SOLUTION END

Avoiding division

This implementation performs division in defining the function \(f \left( x \right)\) and the Newton step,

\[x_{i + 1} = x_i - f \left( x_i \right) / f' \left( x_i \right)\]

(Division is a relatively expensive operation compared to addition and multiplication.)

Substitute the expressions you used for \(f \left( x \right)\) and \(f' \left( x \right)\) and simplify so that there is no division in the Newton step itself.

[ ]:
function my_rsqrt3(a; guess=1, tol=1e-10, verbose=false)
    """Compute 1/sqrt(a) using Newton's method with no division.
    """
    x = guess
    for k in 1:10
        ### SOLUTION BEGIN

        ### SOLUTION END
        if verbose
            fx = x^(-2) - a
            println("k=$k, x_k=$(@sprintf("%.16f", x)), f_k=$(@sprintf("%.16e", fx))")
        end
    end
    x
end

@show rsqrt_9 = my_rsqrt3(9., guess=.5, verbose=true);
[ ]:
@test 1/my_rsqrt3(1.05)^2 ≈ 1.05
@test 1/my_rsqrt3(100, guess=.01)^2 ≈ 100
@test my_rsqrt3(π, guess=.3) ≈ 1/sqrt(π)

Conclusions?

According to the experiments above, should we prefer one formulation over another? Run some additional experiments to confirm. What criteria might we use to evaluate which is better? (Write a couple of sentences. Feel free to discuss on Zulip.)

SOLUTION BEGIN

SOLUTION END

Exploration

When introducing rootfinding, we considered examples such as nonlinear elasticity, and used rootfinding to exchange inputs and outputs in an expression that cannot be inverted analytically. In this exploratory activity, I’d like you to take a similar approach with a function of interest to you.

  • Find a function \(f \left( x \right)\) that models something you’re interested in. You could consider nonlinear physical models (aerodynamic drag, nonlinear elasticity), behavioral models, probability distributions, or anything else that catches your interest. Implement the function in Julia.

  • Consider how you might know the output of such functions, but not an input. Think from the position of different stakeholders: is the equation used differently by an experimentalist collecting data versus by someone making predictions through simulation? How about a company or government reasoning about people versus the people their decisions may impact?

  • Formulate the map from known to desired data as a rootfinding problem and try one or more methods (Newton, bisection, etc., or use a rootfinding library such as Roots, which was used in the introductory class demos).

  • Plot the inverse function (output versus input) from the standpoint of one or more stakeholder. Write captions describing any interesting inflection points or ways in which the methods may or may not be reliable.

  • If there is a hierarchy of models for the application you’re interested in, you could consider using a simpler model to provide an initial guess for a more complicated one.
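As a starting point for the library route mentioned above, here is a sketch of Roots usage (this assumes the Roots package is installed; `find_zero` is its standard entry point, and the test function is just the class example):

```julia
using Roots

# Bracketing call: needs a sign change on the interval, like bisection.
x1 = find_zero(x -> cos(x) - x, (0, 1))

# Iterative call from a single initial guess, like Newton/secant.
x2 = find_zero(x -> cos(x) - x, 1.0)

x1, x2   # both should be near 0.739085
```

For your own model, you would replace the anonymous function with your \(f \left( x \right)\) shifted by the known output value.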

SOLUTION BEGIN

SOLUTION END