Newton’s Method for Systems of Equations
We have covered rootfinding for non-linear equations, including Newton’s method, as well as linear algebra. These two topics connect in Newton’s method for systems of equations.
We want to find the solution \(x_*\) such that

\[f \left( x_* \right) = 0\]

for some non-linear function \(f \left( x \right)\).
For a single equation, Newton’s method is

\[x_n = x_{n - 1} - \frac{f \left( x_{n - 1} \right)}{f' \left( x_{n - 1} \right)}\]
We will generalize this to two equations in two unknown variables. Therefore, we will find the solution \(X_*\) such that

\[F \left( X_* \right) = 0, \quad F \left( X \right) = \begin{bmatrix} f_1 \left( x_1, x_2 \right) \\ f_2 \left( x_1, x_2 \right) \end{bmatrix}\]
[1]:
# Let's start with some system of equations
f1(x1, x2) = (x1 - 2) * (x1 - 3) + (x2 - 4) * (x2 - 3)
f2(x1, x2) = (x1 - 2) * (x1 - 4) + (x2 - 5) * (x2 - 3)
function F(X)
    return [f1(X[1], X[2]);
            f2(X[1], X[2])]
end
[1]:
F (generic function with 1 method)
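Note that, because both residuals are written in factored form, \(X_* = \left( 2, 3 \right)\) makes every product vanish, so we already know one root to aim for.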
With Newton’s method for scalar equations, we start with an initial guess \(x_0\) for \(x_*\) and create iterates \(x_n\) that get increasingly close to \(x_*\). With each iteration, we are solving for the value \(x\) that is the root of the tangent line at the current guess \(x_{n - 1}\):

\[t \left( x \right) = f \left( x_{n - 1} \right) + f' \left( x_{n - 1} \right) \left( x - x_{n - 1} \right)\]
Solving for \(t \left( x \right) = 0\) gives the next \(x_n\) for our Newton iteration:

\[x_n = x_{n - 1} - \frac{f \left( x_{n - 1} \right)}{f' \left( x_{n - 1} \right)}\]
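To make the scalar iteration concrete, here is a minimal sketch (the helper newton_scalar and the target equation \(x^2 - 2 = 0\) are illustrative choices for this example, not part of the system above):

function newton_scalar(g, gprime, x; its=5)
    # Repeatedly replace x with the root of the tangent line at x
    for _ in 1:its
        x -= g(x) / gprime(x)
    end
    return x
end
newton_scalar(x -> x^2 - 2, x -> 2x, 1.0)  # converges to sqrt(2) ≈ 1.4142135623730951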
Similarly, we can form a tangent plane, \(T \left( X \right)\), at the current guess \(X_{n - 1}\) and find the point where it crosses zero:

\[T \left( X \right) = F \left( X_{n - 1} \right) + J \left( X_{n - 1} \right) \left( X - X_{n - 1} \right)\]

Solving for \(T \left( X \right) = 0\) gives the next \(X_n\) for our Newton iteration:

\[X_n = X_{n - 1} - J \left( X_{n - 1} \right)^{-1} F \left( X_{n - 1} \right)\]
If \(J \left( X_{n - 1} \right)\) is not invertible, then we want to find the \(X_n\) such that \(\left\lVert J \left( X_{n - 1} \right) \left( X_n - X_{n - 1} \right) + F \left( X_{n - 1} \right) \right\rVert\) is minimized. This happens frequently in practical problems.
Especially when \(J\) is large, it is common to use a Krylov method to solve for \(X_n\). This leads to the Newton-Krylov methods, which are workhorses of numerical computation.
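To make the non-invertible case concrete, the minimization above is a linear least-squares problem, and the pseudoinverse gives its minimum-norm solution. The sketch below uses a made-up rank-deficient matrix purely for illustration; a true Newton-Krylov method would instead apply an iterative solver such as GMRES to each linear solve and never form an inverse:

using LinearAlgebra
A = [1.0 2.0; 2.0 4.0]  # rank-deficient: the second row is twice the first
b = [1.0; 2.0]
dX = pinv(A) * b        # minimum-norm dX minimizing norm(A * dX - b); here [0.2, 0.4]
# The least-squares Newton update would then be X_n = X_{n-1} - dX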
In order to form this tangent plane, we need the Jacobian matrix, \(J \left( X_{n - 1} \right)\), whose entries are \(J_{i, j} = \frac{\partial f_i}{\partial x_j}\).
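Hand-coding these partial derivatives is easy to get wrong, so it can help to sanity-check them against a finite-difference approximation. The helper fd_jacobian below is hypothetical, a quick check rather than something the rest of the notebook relies on:

# Forward differences: J[i, j] ≈ (F(X + h * e_j) - F(X))[i] / h
function fd_jacobian(F, X; h=1e-8)
    F0 = F(X)
    J = zeros(length(F0), length(X))
    for j in 1:length(X)
        X_h = float.(X)  # work on a floating-point copy
        X_h[j] += h
        J[:, j] = (F(X_h) - F0) / h
    end
    return J
end
# e.g. fd_jacobian(F, [0; 0]) should approximately match J([0; 0]) below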
[2]:
# We also need derivatives
f1dx1(x1, x2) = (x1 - 3) + (x1 - 2)
f1dx2(x1, x2) = (x2 - 3) + (x2 - 4)
f2dx1(x1, x2) = (x1 - 4) + (x1 - 2)
f2dx2(x1, x2) = (x2 - 3) + (x2 - 5)
function J(X)
    return [f1dx1(X[1], X[2]) f1dx2(X[1], X[2]);
            f2dx1(X[1], X[2]) f2dx2(X[1], X[2])]
end
[2]:
J (generic function with 1 method)
Let’s start with an initial guess of \(X_0 = 0\) and do a few iterations “by hand”.
[3]:
# Initial guess
X_0 = [0; 0]
# Solving Ax = b where A = J(X_0) and b = F(X_0)
b = F(X_0)
A = J(X_0)
X_1 = X_0 - inv(A) * b
[3]:
2-element Vector{Float64}:
8.499999999999986
-3.4999999999999973
Is this closer? Let’s check.
[4]:
using LinearAlgebra
@show norm(F(X_0))
@show norm(F(X_1));
norm(F(X_0)) = 29.206163733020468
norm(F(X_1)) = 119.50104602052625
Hmm, that doesn’t look closer, but let’s try another iteration.
[5]:
# Solving Ax = b where A = J(X_1) and b = F(X_1)
b = F(X_1)
A = J(X_1)
X_2 = X_1 - inv(A) * b
[5]:
2-element Vector{Float64}:
5.249999999999987
-0.24999999999999956
[6]:
@show norm(F(X_0))
@show norm(F(X_1))
@show norm(F(X_2));
norm(F(X_0)) = 29.206163733020468
norm(F(X_1)) = 119.50104602052625
norm(F(X_2)) = 29.875261505131533
Now we’re headed in the right direction again. Let’s take another step.
[7]:
# Solving Ax = b where A = J(X_2) and b = F(X_2)
b = F(X_2)
A = J(X_2)
X_3 = X_2 - inv(A) * b
[7]:
2-element Vector{Float64}:
3.624999999999995
1.3750000000000024
[8]:
@show norm(F(X_0))
@show norm(F(X_1))
@show norm(F(X_2))
@show norm(F(X_3));
norm(F(X_0)) = 29.206163733020468
norm(F(X_1)) = 119.50104602052625
norm(F(X_2)) = 29.875261505131533
norm(F(X_3)) = 7.46881537628288
Ok, this is starting to look like an algorithm we can code up:
[9]:
function newton(F, J, X_0; tol=1e-12, max_its=100, verbose=true)
    # Initial setup
    X_n = X_0
    F_n = F(X_n)
    if verbose
        println("Newton residuals:")
        println("r_0 = $(norm(F_n))")
    end
    i = 0
    while norm(F_n) > tol && i < max_its
        # Form Ax = b
        b = F_n
        A = J(X_n)
        # Solve for the update (A \ b would avoid forming the inverse explicitly)
        X_n = X_n - inv(A) * b
        F_n = F(X_n)
        # Increment before printing so the residual labels advance correctly
        i += 1
        if verbose
            println("r_$(i) = $(norm(F_n))")
        end
    end
    if i == max_its
        println("Warning: Max iterations reached")
    end
    X_n
end
[9]:
newton (generic function with 1 method)
[10]:
# And let's try it
X_0 = [0; 0]
X_star = newton(F, J, X_0)
println("\nFinal result:")
@show X_star
@show F_star = F(X_star)
@show norm(F_star);
Newton residuals:
r_0 = 29.206163733020468
r_1 = 119.50104602052625
r_2 = 29.875261505131533
r_3 = 7.46881537628288
r_4 = 1.867203844070719
r_5 = 0.4668009610176802
r_6 = 0.11670024025441997
r_7 = 0.029175060063604052
r_8 = 0.0072937650159012775
r_9 = 0.0018234412539753194
r_10 = 0.00045586031349382984
r_11 = 0.00011396507837345746
r_12 = 2.8491269593364365e-5
r_13 = 7.122817398341091e-6
r_14 = 1.7807043495852728e-6
r_15 = 4.451760873963182e-7
r_16 = 1.1129402184907955e-7
r_17 = 2.7823505462269888e-8
r_18 = 6.955876365567472e-9
r_19 = 1.738969091391868e-9
r_20 = 4.34742272847967e-10
r_21 = 1.0868556821199175e-10
r_22 = 2.7171392052997937e-11
r_23 = 6.792848013249484e-12
r_24 = 1.698212003312371e-12
r_25 = 4.2455300082809277e-13
Final result:
X_star = [2.000000387430191, 2.999999612569809]
F_star = F(X_star) = [3.0020430585864233e-13, 3.0020430585864233e-13]
norm(F_star) = 4.2455300082809277e-13