2025-09-10 More Newton

  • Newton methods in computing culture

  • Breaking Newton’s method

  • Exploration

  • Portfolios

See also FNC

[1]:
using Plots
default(lw=4, ms=5, legendfontsize=12, xtickfontsize=12, ytickfontsize=12)

# Newton's method
function newton(f, fp, x0; tol=1e-8, verbose=false)
    x = x0
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        if verbose
            println("[$k] x=$x  f(x)=$fx  f'(x)=$fpx")
        end
        if abs(fx) < tol
            return x, fx, k
        end
        x = x - fx / fpx
    end
    x, f(x), 100 # not converged; return the last iterate anyway
end

# And a target function
f(x)  =  cos(x) - x
fp(x) = -sin(x) - 1
[1]:
fp (generic function with 1 method)

Convergence of fixed-point iteration (Taylor series)

Consider the iteration

\[x_{k + 1} = g \left( x_k \right)\]

where \(g\) is a continuously differentiable function. Suppose there exists a fixed point \(x_* = g \left( x_* \right)\). Then \(g\) has a Taylor expansion about \(x_*\),

\[g \left( x_k \right) = g \left( x_* \right) + g' \left( x_* \right) \left( x_k - x_* \right) + \mathcal{O} \left( \left( x_k - x_* \right)^2 \right)\]

and thus

\[x_{k + 1} - x_* = g \left( x_k \right) - g \left( x_* \right) = g' \left( x_* \right) \left( x_k - x_* \right) + \mathcal{O} \left( \left( x_k - x_* \right)^2 \right)\]

In terms of the error \(e_k = x_k - x_*\),

\[\left\lvert \frac{e_{k + 1}}{e_k} \right\rvert = \left\lvert g' \left( x_* \right) \right\rvert + \mathcal{O} \left( e_k \right)\]

Recall the definition of \(q\)-linear convergence

\[\lim_{k \rightarrow \infty} \left\lvert \frac{e_{k + 1}}{e_k} \right\rvert = \rho < 1\]
[2]:
# Let's find the root
xstar, _ = newton(f, fp, 1.)

# And look at our target function
g(x)     = cos(x)
gp(x)    = -sin(x)

# Here is our fixed point iteration
function fixed_point(g, x, n)
    xk = [x]
    for k in 1:n
        x = g(x)
        append!(xk, x)
    end
    xk
end

# Plot the fixed point and the iterates
xk = fixed_point(g, 0., 15)
plot(g, xlims=[-1, 2], label="\$cos(x)\$")
plot!(xk, abs.(g.(xk)), seriestype=:path, marker=:auto, label="\$g(x_k)\$")
[2]:

Verifying fixed point convergence theory

\[\left\lvert \frac{e_{k+1}}{e_k} \right\rvert \rightarrow \left\lvert g' \left( x_* \right) \right\rvert\]
[3]:
# The ratio of errors should approach g'(x_*)
# First, what is g'(x_*)
@show gp(xstar)

# And then look at the ratio
ek = xk .- xstar
println("\nratios e_k+1 / e_k =")
(ek[2:end] ./ ek[1:end-1])
gp(xstar) = -0.6736120293089505

ratios e_k+1 / e_k =
[3]:
15-element Vector{Float64}:
 -0.3530241034880909
 -0.7618685362635164
 -0.5959673878312852
 -0.7157653025686597
 -0.6414883589709152
 -0.6933762938713267
 -0.6595161800339986
 -0.6827343083372247
 -0.667303950535869
 -0.677785479788835
 -0.670766892391035
 -0.6755130653097281
 -0.6723244355324894
 -0.6744762481989985
 -0.6730283414604459
[4]:
# And let's put it on a log-linear plot to visualize it
scatter(abs.(ek), yscale=:log10, xlabel="\$k\$", ylabel="\$e_k\$", label="\$e_k\$")
plot!(k -> abs(gp(xstar))^k, label="\$|g'|^k\$")
[4]:

Plotting Newton convergence

[5]:
# Ok, this is for fixed points, but what about Newton?!?
# Let's look at the history of our iterates
function newton_hist(f, fp, x0; tol=1e-12)
    x = x0
    hist = []
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        push!(hist, [x fx fpx])
        if abs(fx) < tol
            return vcat(hist...)
        end
        x = x - fx / fpx
    end
    vcat(hist...) # not converged; return the history anyway
end
[5]:
newton_hist (generic function with 1 method)
[6]:
# Let's plot the residual, |f(x_k)|
xk = newton_hist(f, fp, 1.97)

plot(abs.(xk[1:end-1, 2]), yscale=:log10, marker=:auto, xlabel="\$k\$", ylabel="\$r_k\$", label="\$r_k\$")
[6]:
[7]:
# And the error, |x_k - x_*|
xk = newton_hist(f, fp, 1.97)
@show x_star = xk[end,1]

@show ek = xk[1:end-1,1] .- x_star # the errors themselves
plot(ek, yscale=:log10, marker=:auto, xlabel="\$k\$", ylabel="\$e_k\$", label="\$e_k\$")
x_star = xk[end, 1] = 0.7390851332151607
ek = xk[1:end - 1, 1] .- x_star = [1.2309148667848393, 0.003309687313930887, 2.4103207354464473e-6, 1.2827516826519059e-12]
[7]:
[8]:
# And the path taken to the solution
plot(f, xlims=[0.7, 2], xscale=:log10, label="\$cos(x) - x\$")
plot!(xk[:, 1], xk[:, 2], seriestype=:path, marker=:auto, label="\$f(x_k)\$")
# Note how fast the points overlap each other as they converge to the root!
[8]:

Convergence class

Is Newton’s method (a numerical check follows this list)

  1. \(q\)-linearly convergent

  2. \(r\)-linearly convergent

  3. neither
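
One way to investigate empirically: compute the successive error ratios for Newton's iterates. If they approach a constant \(\rho < 1\), the convergence is \(q\)-linear; if they shrink toward zero, it is faster than that. A minimal sketch, reusing newton_hist from above and treating the final iterate as \(x_*\):

# Error ratios for Newton's iterates
xk = newton_hist(f, fp, 1.97)
ek = xk[1:end-1, 1] .- xk[end, 1]   # errors against the converged value
println("ratios e_{k+1} / e_k = ", ek[2:end] ./ ek[1:end-1])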

Formulations are not unique (functions)

If \(x = g \left( x \right)\) then \(g \left( x \right) - x = 0\) and

\[x = x + h \left( x \right) \left( g \left( x \right) - x \right)\]

for any smooth \(h \left( x \right) \neq 0\). Let \(g_3 \left( x \right) = x + h \left( x \right) \left( g \left( x \right) - x \right)\). Can we choose \(h \left( x \right)\) to make \(\left\lvert g'_3 \left( x \right) \right\rvert\) small?

[9]:
# Let's try something that looks more like Newton
h(x) = -1 / (gp(x) - 1)
g3(x) = x + h(x) * (g(x) - x)
plot([x -> x, cos, g3], ylims=(-5, 5), label=["\$x\$" "\$cos(x)\$" "\$g_3(x)\$"])
[9]:
  • We don’t know \(g' \left( x_* \right)\) in advance because we don’t know \(x_*\)

  • This method converges fast

  • We just re-derived Newton’s method

Newton’s method - a new derivation

A rootfinding problem \(f \left( x \right) = 0\) can be converted to a fixed point problem

\[x = x + f \left( x \right) := g \left( x \right)\]

but there is no guarantee that \(g' \left( x_* \right) = 1 + f' \left( x_* \right)\) will have magnitude less than 1.

Problem-specific algebraic manipulation can be used to make \(\left\lvert g' \left( x_* \right) \right\rvert\) small.

\(x = x + h \left( x \right) f \left( x \right)\) is a valid formulation for any \(h \left( x \right)\) bounded away from \(0\).

Can we choose \(h \left( x \right)\) such that

\[g' \left( x \right) = 1 + h' \left( x \right) f \left( x \right) + h \left( x \right) f' \left( x \right) = 0\]

when \(f \left( x \right) = 0\)?

When \(f \left( x \right) = 0\), the \(h' \left( x \right) f \left( x \right)\) term vanishes, leaving

\[g' \left( x \right) = 1 + h \left( x \right) f' \left( x \right) = 0\]

or

\[h \left( x \right) = \frac{-1}{f' \left( x \right)}\]

In other words,

\[x_{k + 1} = x_k + \frac{-1}{f' \left( x_k \right)} f \left( x_k \right) = x_k + h \left( x_k \right) f \left( x_k \right)\]

which is our Newton method.
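
As a sanity check, the fixed-point form with \(h = -1 / f'\) reproduces Newton's iterates. A minimal sketch, reusing fixed_point, f, and fp from above (the name g_newton is our choice):

# Newton as a fixed-point iteration: x_{k+1} = x_k + h(x_k) f(x_k), h = -1/f'
g_newton(x) = x + (-1 / fp(x)) * f(x)
xk = fixed_point(g_newton, 1., 6)
@show xk[end]          # should match the root found earlier
@show abs(f(xk[end]))  # residual at the final iterate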

Quadratic convergence

\[\left\lvert \frac{e_{k + 1}}{e_k} \right\rvert \rightarrow \left\lvert g' \left( x_* \right) \right\rvert\]
  • What does it mean that \(g' \left( x_* \right) = 0\)?

  • It turns out that Newton’s method has locally quadratic convergence to simple roots (checked numerically after this list),

\[\lim_{k \rightarrow \infty} \frac{\left\lvert e_{k + 1} \right\rvert}{\left\lvert e_k \right\rvert^2} < \infty\]
  • “The number of correct digits doubles each iteration”

  • Now that we know how to make a good guess accurate, the hard part is getting a good guess
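
A quick numerical check of the quadratic limit, reusing the iterate history from above (a sketch; the first ratio is pre-asymptotic):

# |e_{k+1}| / |e_k|^2 should level off at a finite constant near a simple root
xk = newton_hist(f, fp, 1.97)
ek = abs.(xk[1:end-1, 1] .- xk[end, 1])
println("|e_{k+1}| / |e_k|^2 = ", ek[2:end] ./ ek[1:end-1] .^ 2)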

Rootfinding outlook

  • Newton is a workhorse of numerical computation

    • Convergence theory is local, need good initial guesses (activity)

    • Computing the derivative \(f' \left( x \right)\) is intrusive

      • Option - secant methods (activity; a minimal sketch follows this list)

      • Option - automatic differentiation (future topic)

    • Line search helps robustness (activity)

    • When does Newton diverge?

  • More topics

    • Find all of the roots

    • Use Newton-type methods with bounds

    • When does Newton converge slowly?
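
To give a flavor of the secant option named above: replace \(f' \left( x_k \right)\) with the slope through the last two iterates. A minimal sketch (the function name and iteration cap are our choices, not a library API):

# Secant method: approximate f'(x_k) by a finite-difference slope
function secant(f, x0, x1; tol=1e-12)
    for k in 1:100 # max number of iterations
        fx0, fx1 = f(x0), f(x1)
        if abs(fx1) < tol
            return x1, fx1, k
        end
        x0, x1 = x1, x1 - fx1 * (x1 - x0) / (fx1 - fx0)
    end
    x1, f(x1), 100 # not converged
end

secant(f, 1., 2.) # same root as Newton, no derivative needed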

Exploration

  • Find a function \(f \left( x \right)\) that models something you’re interested in. You could consider nonlinear physical models (aerodynamic drag, nonlinear elasticity), behavioral models, probability distributions, or anything else that catches your interest. Implement the function in Julia or another language.

  • Consider how you might know the output of such functions, but not an input. Think from the position of different stakeholders: is the equation used differently by an experimentalist collecting data versus by someone making predictions through simulation? How about a company or government reasoning about people versus the people their decisions may impact?

  • Formulate the map from known to desired data as a rootfinding problem and try one or more methods (Newton, bisection, etc., or use a rootfinding library).

  • Plot the inverse function (output versus input) from the standpoint of one or more stakeholders. Are there interesting inflection points? Are the methods reliable?

  • If there are a hierarchy of models for the application you’re interested in, consider using a simpler model to provide an initial guess to a more complicated model.

Example - equation of state

Consider an equation of state for a real gas, which might provide pressure \(p \left( T, \rho \right)\) as a function of temperature \(T\) and density \(\rho\).

  • An experimentalist can measure temperature and pressure, and will need to solve for density (which is difficult to measure directly).

  • A simulation might know (at each cell or mesh point, at each time step) the density and internal energy, and need to compute pressure (and maybe temperature).

  • An analyst might have access to simulation output and wish to compute entropy (a thermodynamic property whose change reflects irreversible processes, and can be used to assess accuracy/stability of a simulation or efficiency of a machine).

The above highlights how many equations are incomplete: they fail to model how related quantities (internal energy and entropy in this case) depend on the others. Standardization bodies (such as NIST, here in Boulder) and practitioners often prefer models that intrinsically provide a complete set of consistent relations. An elegant methodology for equations of state for gases and fluids is by way of the Helmholtz free energy, which is not observable, but whose partial derivatives define a complete set of thermodynamic properties. The CoolProp software has highly accurate models for many gases, and practitioners often build less expensive models for narrower ranges of thermodynamic conditions.
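
To make the experimentalist's task concrete, here is a sketch using the van der Waals model \(p = \frac{R T \rho}{1 - b \rho} - a \rho^2\) (a simple cubic equation of state, not the high-accuracy models discussed above). The CO2 constants and the measured conditions are illustrative assumptions; the ideal-gas law supplies the initial guess, an instance of the simpler-model strategy suggested in the exploration:

# Solve a van der Waals equation of state for molar density ρ given p and T.
# Constants are approximate values for CO2 (illustrative assumption).
R = 8.314     # J / (mol K)
a = 0.364     # Pa m^6 / mol^2
b = 4.27e-5   # m^3 / mol
T = 300.0     # K,  measured
p = 5.0e6     # Pa, measured

f_rho(ρ)  = R * T * ρ / (1 - b * ρ) - a * ρ^2 - p  # residual in ρ [mol/m^3]
fp_rho(ρ) = R * T / (1 - b * ρ)^2 - 2 * a * ρ      # its derivative

ρ0 = p / (R * T)  # ideal-gas guess
ρ, res, its = newton(f_rho, fp_rho, ρ0; tol=1e-6 * p)  # tolerance scaled to p
@show ρ0 ρ its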