Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).
Make sure you fill in any place that says BEGIN SOLUTION and END SOLUTION, as well as your name and collaborators below:
[ ]:
NAME = ""
COLLABORATORS = ""
2025-09-19 Rootfinding¶
Globalizing Newton¶
In this activity, we experiment with helping Newton’s method converge from poor initial guesses. This is known as globalization: the goal is to get close enough to a root that Newton’s quadratic local convergence can kick in.
[ ]:
using Plots
# We write down functions and their derivatives
tests = Dict(
"x^2 - 2" => (x -> x^2 - 2, x -> 2*x),
"cos(x) - x" => (x -> cos(x) - x, x -> -sin(x) - 1),
)
function plotlist(ts)
"A utility function to plot and label each function"
p = plot()
# Style note: I use _ for unused parts of pattern matches
for (name, (f, _fp)) in ts
plot!(f, label=name)
end
plot!(x -> 0, color="black", label=false)
p
end
plotlist(tests)
Convergence rates¶
We implemented bisection and Newton’s method in class. Our bisection implementation here has better error handling (it raises an AssertionError when the function does not change sign on the interval) and is more efficient because it does not re-evaluate the function at both endpoints on each iteration.
[ ]:
function bisect((f, _fp), a, b; tol=1e-10)
"""Iterative (rather than recursive) bisection method.
Notice that this takes as input a pair of functions (f, fp). Bisection
does not use fp, but we're keeping the same interface here to make it
easy to compare methods.
"""
fa, fb = f(a), f(b)
@assert fa*fb <= 0 "Function does not change sign on the interval; it may not contain a root"
history = []
while b-a > tol
mid = (a + b) / 2
fmid = f(mid)
push!(history, mid, fmid)
if fa*fmid <= 0
b, fb = mid, fmid
else
a, fa = mid, fmid
end
end
# return num_its rows of [x f(x)]
reshape(vcat(history...), 2, :)'
end
plot(abs.(bisect(tests["cos(x) - x"], 0, 2)[:, 2]), yscale=:log10, marker=:auto, label="\$r_k\$")
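As a quick sanity check (an illustrative sketch using the Test standard library, not part of the assignment), the assertion described above fires when we pass an invalid bracket:
[ ]:
using Test
# x^2 - 2 is positive at both x = 2 and x = 3, so this interval is not
# guaranteed to contain a root and bisect raises an AssertionError.
@test_throws AssertionError bisect(tests["x^2 - 2"], 2, 3)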
[ ]:
function newton((f, fp), a, b; tol=1e-10)
"""Newton method without a line search.
Note that this function takes an interval [a, b], which is a bit
more information than is usually provided to Newton methods. We use
this interface for uniformity with bisection above, and warn if Newton
sends us outside the interval.
"""
x = (a + b) / 2
history = []
for k in 1:100
fx = f(x)
push!(history, x, fx)
if x < a || b < x
@warn "Diverged at iteration $k"
break
end
if abs(fx) < tol
break
end
fpx = fp(x)
x -= fx / fpx
end
# return num_its rows of [x f(x)]
reshape(vcat(history...), 2, :)'
end
plot(abs.(newton(tests["cos(x) - x"], 0, 2)[:, 2]) .+ 1e-20, yscale=:log10, marker=:auto, label="\$r_k\$")
On this log-linear plot, the curve is steeper than any straight line, demonstrating the signature superlinear convergence of Newton’s method.
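One way to quantify this (a rough empirical sketch, not required by the assignment): under quadratic convergence the number of correct digits roughly doubles each iteration, so the ratio \(\log \left| r_{k+1} \right| / \log \left| r_k \right|\) should approach 2 once the residuals drop below 1.
[ ]:
# Empirical convergence-order estimate from the Newton residual history.
r = abs.(newton(tests["cos(x) - x"], 0, 2)[:, 2]) .+ 1e-20
@show log.(r[2:end]) ./ log.(r[1:end-1])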
[ ]:
function plot_conv(tests, methods)
"""Plot the convergence of a group of methods on a dict of test problems"""
colors = Dict(zip(keys(tests), [:red, :green, :blue]))
markers = Dict(zip(methods, [:circle, :square]))
p = plot(x -> 0.5 ^ x, color=:black, xlims=(0, 30), yscale=:log10, ylims=(1e-9, 10), label=false)
for (name, fpair) in tests
for method in methods
left, right = -1, 3.5
h = method(fpair, left, right)
scatter!(abs.(h[:,2]) .+ 1e-20, label="$(string(method)) - $(string(name))", marker=markers[method], color=colors[name])
end
end
p
end
plot_conv(tests, (bisect, newton))
Breaking Newton¶
Find a function \(f \left( x \right)\) that satisfies all of the following properties:
\(f \left( x \right)\) is strictly monotonically increasing
\(f \left( x \right)\) has a simple root at \(x = 0\)
\(f' \left( x \right)\) exists for all real \(x\)
Newton’s method diverges when started with an initial guess \(x_0 = 1\)
Definition
A function \(f \left( x \right)\) is strictly monotonically increasing if \(f \left( a \right) < f \left( b \right)\) whenever \(a < b\). If \(f\) is differentiable, this implies \(f' \left( x \right) \ge 0\) for all \(x\) (strict positivity of \(f'\) is sufficient but not necessary; consider \(f \left( x \right) = x^3\) at \(x = 0\)).
Hint
Look up “sigmoid” functions and sketch a Newton iteration.
[ ]:
function monotonic_diverge()
"""A strictly monotonically increasing function with a simple
root at x=0 for which Newton's method diverges with initial
guess x=1. Recall that we need to return a pair of functions
(f, fp). If your function was f(x) = x^2, you could write:
f(x) = x^2
fp(x) = 2*x
return (f, fp)
"""
### SOLUTION BEGIN
### SOLUTION END
end
# You may see a "Warning: Diverged". That's expected.
plot(monotonic_diverge()[1], aspect_ratio=:equal, label="\$f\$")
hist = newton(monotonic_diverge(), -4, 6)
plot!(hist[:,1], hist[:,2], markershape=:auto, label="\$r_k\$")
[ ]:
using Test
function isderivative(f, fp)
x = LinRange(-5, 5, 100)
eps = 1e-8
fh = (f.(x .+ eps) - f.(x)) / eps
all(abs.(fp.(x) - fh) .< 1e-5*maximum(abs.(f.(x))))
end
function ismonotonic(f, _fp)
x = LinRange(-5, 5, 100)
fx = f.(x)
all(fx[2:end] .> fx[1:end-1])
end
function isdivergent(f, fp)
h = newton((f, fp), -1, 3)
hx = h[:, 1]
abs(hx[end]) > maximum(abs.(hx[1:end-1]))
end
@test monotonic_diverge()[1](0) < 1e-10
@test isderivative(monotonic_diverge()...)
@test ismonotonic(monotonic_diverge()...)
@test isdivergent(monotonic_diverge()...)
Fixing Newton - Secant method¶
The secant method is similar to Newton’s method, but instead of evaluating \(f' \left( x_i \right)\) directly, it uses the approximation
\[ f' \left( x_i \right) \approx \frac{f \left( x_i \right) - f \left( x_{i-1} \right)}{x_i - x_{i-1}} . \]
Implement the secant method by writing code similar to Newton’s method, but using this approximation in place of \(f' \left( x_i \right)\) where it appears in Newton’s method.
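Before filling in the loop, it may help to verify that this difference quotient really does approximate the derivative when the two iterates are close. The following check is an illustrative sketch (the names fdemo and fpdemo and the sample points are chosen only for this demonstration):
[ ]:
# The secant slope between two nearby points approximates f'(x_i).
fdemo, fpdemo = tests["cos(x) - x"]
xi, xim1 = 0.70, 0.74
secant_slope = (fdemo(xi) - fdemo(xim1)) / (xi - xim1)
@show secant_slope fpdemo(xi)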
[ ]:
function secant((f, _fp), a, b; tol=1e-10)
"""Solve f(x) = 0 using the secant method.
We're keeping the same interface as bisection() and newton(), but
the secant method should not use the argument _fp above. Instead,
it should use the approximation based on a prior iterate.
"""
xlast, fxlast = b, f(b) # Choose an arbitrary "last" x
x = (a + b) / 2
history = [xlast, fxlast]
for k in 1:100
fx = f(x)
push!(history, x, fx)
if x < a || b < x
@warn "Diverged at iteration $k"
break
end
if abs(fx) < tol
break
end
### SOLUTION BEGIN
### SOLUTION END
x -= fx / fpx
end
# return num_its rows of [x f(x)]
reshape(vcat(history...), 2, :)'
end
plot(abs.(secant(tests["cos(x) - x"], 0, 2)[:, 2]) .+ 1e-20, yscale=:log10, marker=:auto, label="\$r_k\$")
[ ]:
plot_conv(tests, (newton, secant))
[ ]:
h = secant(tests["x^2 - 2"], -1., 3.)
@test maximum(abs.(h[end, :] .- [sqrt(2), 0])) < 1e-9 # Did not converge to correct root
@test size(h, 1) < 10 # Convergence too slow
Fixing Newton - Line search¶
Newton’s method is often paired with a line search, which finds the smallest non-negative integer \(n\) such that
\[ \left| f \left( x_k + 2^{-n} s_k \right) \right| < \left| f \left( x_k \right) \right| , \]
where
\[ s_k = -\frac{f \left( x_k \right)}{f' \left( x_k \right)} \]
is the full Newton step. Implement this algorithm by halving the step until the inequality is satisfied. A new line search (i.e., \(n = 0, 1, \dots\)) starts every time you take a step: try the inequality with each step size until you can accept the step, then continue with the next Newton step.
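The sketch below shows one damped Newton step in isolation; it is illustrative only (the name damped_step, the cap of 50 halvings, and the tanh test function are choices made for this example, not part of the assignment interface):
[ ]:
# One Newton step with a backtracking line search: halve the step
# until the new residual is smaller in magnitude than the old one.
function damped_step(f, fp, x)
    fx = f(x)                     # current residual
    s = -fx / fp(x)               # full Newton step
    for n in 0:50
        lam = 2.0^-n              # step fraction 2^-n
        if abs(f(x + lam * s)) < abs(fx)
            return x + lam * s    # accept the first step that reduces |f|
        end
    end
    x + 2.0^-50 * s               # fall back to the shortest trial step
end
@show damped_step(x -> tanh(4x), x -> 4 * sech(4x)^2, 1.0)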
[ ]:
function newtonls((f, fp), a, b; tol=1e-10)
"""Newton method with a line search.
Note that this function takes an interval [a, b], which is a bit
more information than is usually provided to Newton methods. We use
this interface for uniformity with bisection above, and warn if Newton
sends us outside the interval.
"""
x = (a + b) / 2
history = []
for k in 1:100
fx = f(x)
push!(history, x, fx)
if x < a || b < x
@warn "Diverged at iteration $k"
break
end
if abs(fx) < tol
break
end
fpx = fp(x)
### SOLUTION BEGIN
### SOLUTION END
end
reshape(vcat(history...), 2, :)'
end
plot(monotonic_diverge()[1], aspect_ratio=:equal, label="\$f\$")
hist = newtonls(monotonic_diverge(), -4, 6)
plot!(hist[:,1], hist[:,2], markershape=:auto, label="\$r_k\$")
[ ]:
plot_conv(tests, (newtonls, secant))
[ ]:
@test maximum(abs.(newtonls(monotonic_diverge(), -4, 6)[end, :])) < 1e-10
Newton Accuracy¶
Find a function \(f \left( x \right)\) satisfying the following properties:
\(f \left( 0 \right) = 0\)
bisection and Newton both converge, but Newton has an error of at least \(10^{-2} = 0.01\) even when the residual is smaller than \(10^{-10}\)
As a side effect, you’ll likely see that the Newton error is higher than the bisection error, even if the Newton residuals converge faster.
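To see how a small residual can coexist with a large error, expand \(f\) to first order about the root \(x_*\):
\[ \left| f \left( x \right) \right| \approx \left| f' \left( x_* \right) \right| \left| x - x_* \right| , \]
so the error satisfies \(\left| x - x_* \right| \approx \left| f \left( x \right) \right| / \left| f' \left( x_* \right) \right|\), which can be large whenever \(f\) is very flat near its root.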
[ ]:
function inaccurate()
"""A function in which Newton error is large despite residual being small.
You'll be creating a function and its derivative, as in
f(x) = x^2
fp(x) = 2*x
return (f, fp)
"""
### SOLUTION BEGIN
### SOLUTION END
end
plot(inaccurate()[1], label="\$f\$")
[ ]:
# Note that we're plotting residuals |f(x_k)|, not errors |x_k - x_*|
plot_conv(Dict("inaccurate" => inaccurate()), (bisect, newton))
[ ]:
h = newton(inaccurate(), -1, 3)
@show final_error = abs(h[end, 1])
smallresid = abs.(h[:,2]) .< 1e-10
@test any(smallresid)
@test any(abs.(h[smallresid, 1]) .> 1e-2)
@test isderivative(inaccurate()...)
Newton - Impact of initial guess¶
We consider the function \(f \left( z \right) = z^3 - 1\) and examine when Newton’s method converges, and which root it converges to. (This function has three roots in the complex plane.) We will color each pixel based on how Newton’s method converges with that initial guess. You are asked to implement one step of Newton’s method (without line search or anything fancy) inside the inner loop.
[ ]:
function image(m, maxits)
x1 = LinRange(-2, 2, m)
o1 = ones(m)
X, Y = x1' .* o1, o1' .* x1
Z = X .+ 1im*Y
C = fill(RGB(0), m, m) # m×m array of colors
# The equation has three known roots
roots = [1, exp(2im*π/3), exp(-2im*π/3)]
for i in 1:m
for j in 1:m
z = Z[i, j] # Initial guess for this pixel
k = 0
while k < maxits
# Update z with one Newton step on the equation f(z) = z^3 - 1.
# Break if |f(z)| < 1e-10.
### SOLUTION BEGIN
### SOLUTION END
k += 1
end
# Color the output based on how many iterations were needed
color = [1., 1, 1] # white if not converged
for r in 1:3
if abs(z - roots[r]) < 1e-8
color[:] .= 0
color[r] = log(k) / log(100)
end
end
C[i,j] = RGB(color[1], color[2], color[3])
end
end
C
end
# You may try adjusting the resolution and/or number of iterations
plot(image(200, 50))
You can read more about such diagrams, known as Newton fractals, whose complexity demonstrates that precisely describing the domain of convergence may be far more complicated than the function or the algorithm. Considering that for many problems one specific root is physical (say the real one, colored red above), we usually need good initial guesses for fast, reliable convergence in practical applications. Note that convergence in this case is quite reliable when the initial guess is real.
Newton for rational functions¶
Newton’s method solves \(f \left( x \right) = 0\) using a first order Taylor approximation at each point. We can use it to compute square roots and related functions. Sample code for Newton’s method is provided.
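For reference, setting the first-order Taylor model to zero recovers the update used in the code below:
\[ 0 = f \left( x \right) \approx f \left( x_k \right) + f' \left( x_k \right) \left( x - x_k \right) \quad \Longrightarrow \quad x_{k+1} = x_k - \frac{f \left( x_k \right)}{f' \left( x_k \right)} . \]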
[ ]:
using Printf
function newton((f, fp), a, b; tol=1e-10, verbose=false)
"""Newton method without a line search.
Note that this function takes an interval [a, b], which is a bit
more information than is usually provided to Newton methods. We use
this interface for uniformity with bisection above, and warn if Newton
sends us outside the interval.
"""
x = (a + b) / 2
for k in 1:100
fx = f(x)
if verbose
println("k=$k, x_k=$(@sprintf("%.16f", x)), f_k=$(@sprintf("%.16e", fx))")
end
if abs(fx) < tol
break
end
fpx = fp(x)
x -= fx / fpx
end
x
end
newton(tests["cos(x) - x"], 0, 2; verbose=true)
Square root¶
We can estimate \(\sqrt{a}\) as the root of \(f \left( x \right) = x^2 - a\).
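With this choice, \(f' \left( x \right) = 2 x\), and the Newton update simplifies to the classical Babylonian (Heron) iteration:
\[ x_{k+1} = x_k - \frac{x_k^2 - a}{2 x_k} = \frac{1}{2} \left( x_k + \frac{a}{x_k} \right) . \]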
[ ]:
function my_sqrt(a; guess=1, tol=1e-10, verbose=false)
"""Compute sqrt(a) using Newton's method.
"""
function funcs()
"""Return a pair of functions, f(x) and f'(x)
"""
### SOLUTION BEGIN
### SOLUTION END
end
newton(funcs(), guess - 0.1, guess + 0.1; tol=tol, verbose=verbose)
end
@show sqrt_4 = my_sqrt(4; guess=1, verbose=true);
[ ]:
@test my_sqrt(pi; guess=1) ≈ sqrt(pi)
Experiment to see if there are positive values of a and/or guess for which this function does not converge.
Reciprocal square root¶
Similar to above, express the reciprocal square root \(1 / \sqrt{a}\) as a rootfinding problem, \(f \left( x; a \right) = 0\).
[ ]:
function my_rsqrt1(a; guess=1, tol=1e-10, verbose=false)
"""Compute sqrt(a) using Newton's method.
"""
function funcs()
"""Return a pair of functions, f(x) and f'(x)
"""
### SOLUTION BEGIN
### SOLUTION END
end
newton(funcs(), guess - 0.1, guess + 0.1; tol=tol, verbose=verbose)
end
@show rsqrt_9 = my_rsqrt1(9., guess=.5, verbose=true);
[ ]:
@test 1/my_rsqrt1(1.05)^2 ≈ 1.05
@test 1/my_rsqrt1(100, guess=.01)^2 ≈ 100
Choose a different way to express this problem as a rootfinding problem. Hint: When writing \(f \left( x; a \right)\), you can place the division on the term with \(x\) or the term with \(a\).
[ ]:
function my_rsqrt2(a; guess=1, tol=1e-10, verbose=false)
"""Compute sqrt(a) using Newton's method.
"""
function funcs()
"""Return a pair of functions, f(x) and f'(x)
"""
### SOLUTION BEGIN
### SOLUTION END
end
newton(funcs(), guess - 0.1, guess + 0.1; tol=tol, verbose=verbose)
end
@show rsqrt_9 = my_rsqrt2(9., guess=.5, verbose=true);
[ ]:
@test 1/my_rsqrt2(1.05)^2 ≈ 1.05
@test 1/my_rsqrt2(100, guess=.01)^2 ≈ 100
Make and explain a figure¶
Fix a positive value of \(a\) (your choice) and plot a figure with both functions \(f_1 \left( x \right)\) and \(f_2 \left( x \right)\) that you’ve created above. (They should have roots at \(x_* = 1 / \sqrt{a}\).) You may want to refer to class notebooks for the plotting syntax, including setting xlim such that you avoid singularities.
[ ]:
### SOLUTION BEGIN
### SOLUTION END
Write a caption for the figure you just made, explaining how Newton could struggle with either of your functions. Which function will be more reliable for finding the root if the initial guess is poor?
SOLUTION BEGIN¶
SOLUTION END¶
Avoiding division¶
This implementation performs division both in defining the function \(f \left( x \right)\) and in the Newton step,
\[ x_{k+1} = x_k - \frac{f \left( x_k \right)}{f' \left( x_k \right)} . \]
(Division is a relatively expensive operation compared to addition and multiplication.)
Substitute the expressions you used for \(f \left( x \right)\) and \(f' \left( x \right)\) and simplify so that there is no division in the Newton step itself.
[ ]:
function my_rsqrt3(a; guess=1, tol=1e-10, verbose=false)
"""Compute sqrt(a) using Newton's method with no division.
"""
x = guess
for k in 1:10
### SOLUTION BEGIN
### SOLUTION END
if verbose
fx = x^(-2) - a
println("k=$k, x_k=$(@sprintf("%.16f", x)), f_k=$(@sprintf("%.16e", fx))")
end
end
x
end
@show rsqrt_9 = my_rsqrt3(9., guess=.5, verbose=true);
[ ]:
@test 1/my_rsqrt3(1.05)^2 ≈ 1.05
@test 1/my_rsqrt3(100, guess=.01)^2 ≈ 100
@test my_rsqrt3(π, guess=.3) ≈ 1/sqrt(π)
Conclusions?¶
According to the experiments above, should we prefer one formulation over another? Run some additional experiments to confirm. What criteria might we use to evaluate which is better? (Write a couple of sentences. Feel free to discuss on Zulip.)
SOLUTION BEGIN¶
SOLUTION END¶
Exploration¶
When introducing rootfinding, we considered examples such as nonlinear elasticity and used rootfinding to exchange inputs and outputs in an expression that cannot be inverted analytically. In this exploratory activity, I’d like you to take a similar approach with a function of interest to you.
Find a function \(f \left( x \right)\) that models something you’re interested in. You could consider nonlinear physical models (aerodynamic drag, nonlinear elasticity), behavioral models, probability distributions, or anything else that catches your interest. Implement the function in Julia.
Consider how you might know the output of such functions, but not an input. Think from the position of different stakeholders: is the equation used differently by an experimentalist collecting data versus by someone making predictions through simulation? How about a company or government reasoning about people versus the people their decisions may impact?
Formulate the map from known to desired data as a rootfinding problem and try one or more methods (Newton, bisection, etc., or use a rootfinding library such as Roots, which was used in the introductory class demos; see the sketch after this list).
Plot the inverse function (output versus input) from the standpoint of one or more stakeholders. Write captions describing any interesting inflection points or ways in which the methods may or may not be reliable.
If there is a hierarchy of models for the application you’re interested in, you could consider using a simpler model to provide an initial guess for a more complicated one.
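For the library route mentioned in the list above, here is a minimal sketch (it assumes the Roots package is installed; the drag model, its coefficients, and the target value are hypothetical placeholders):
[ ]:
using Roots
# Hypothetical drag model: F(v) = 1/2 * rho * cdA * v^2
drag(v) = 0.5 * 1.2 * 0.04 * v^2
target = 10.0   # suppose we measured the drag force and want the speed
# Invert the model: find v such that drag(v) = target, using a bracketing interval.
v_star = find_zero(v -> drag(v) - target, (0, 100))
@show v_star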