Newton’s Method for Systems of Equations
We have covered rootfinding for non-linear equations, including Newton’s method, as well as linear algebra. These two topics connect in Newton’s method for systems of equations.
We want to find the solution \(x_*\) such that

\[f \left( x_* \right) = 0\]

for some non-linear function \(f \left( x \right)\).
For a single equation, Newton’s method is

\[x_n = x_{n - 1} - \frac{f \left( x_{n - 1} \right)}{f' \left( x_{n - 1} \right)}\]
We will generalize this to two equations in two unknown variables. Therefore, we will find the solution \(X_*\) such that

\[F \left( X_* \right) = 0, \quad F \left( X \right) = \begin{bmatrix} f_1 \left( x_1, x_2 \right) \\ f_2 \left( x_1, x_2 \right) \end{bmatrix}\]
[1]:
# Let's start with some system of equations
f1(x1, x2) = (x1 - 2) * (x1 - 3) + (x2 - 4) * (x2 - 3)
f2(x1, x2) = (x1 - 2) * (x1 - 4) + (x2 - 5) * (x2 - 3)
function F(X)
    return [f1(X[1], X[2]);
            f2(X[1], X[2])]
end
[1]:
F (generic function with 1 method)
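Note that, because both residuals are written in factored form, \(X_* = \left( 2, 3 \right)\) makes every product vanish, so we already know one root to aim for.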
With Newton’s method for scalar equations, we start with an initial guess \(x_0\) for \(x_*\) and create iterates \(x_n\) that get increasingly close to \(x_*\). With each iteration, we are solving for the value \(x\) that is the root of the tangent line at the current guess \(x_{n - 1}\):

\[t \left( x \right) = f \left( x_{n - 1} \right) + f' \left( x_{n - 1} \right) \left( x - x_{n - 1} \right)\]
Solving for \(t \left( x \right) = 0\) gives the next \(x_n\) for our Newton iteration:

\[x_n = x_{n - 1} - \frac{f \left( x_{n - 1} \right)}{f' \left( x_{n - 1} \right)}\]
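To make the scalar iteration concrete, here is a minimal sketch (the helper newton_scalar and the target equation \(x^2 - 2 = 0\) are illustrative choices for this example, not part of the system above):

function newton_scalar(g, gprime, x; its=5)
    # Repeatedly replace x with the root of the tangent line at x
    for _ in 1:its
        x -= g(x) / gprime(x)
    end
    return x
end
newton_scalar(x -> x^2 - 2, x -> 2x, 1.0)  # converges to sqrt(2) ≈ 1.4142135623730951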
Similarly, we can form a tangent plane, \(T \left( X \right)\), at the current guess \(X_{n - 1}\) and find the point where it crosses zero:

\[T \left( X \right) = F \left( X_{n - 1} \right) + J \left( X_{n - 1} \right) \left( X - X_{n - 1} \right)\]

Solving for \(T \left( X \right) = 0\) gives the next \(X_n\) for our Newton iteration:

\[X_n = X_{n - 1} - J \left( X_{n - 1} \right)^{-1} F \left( X_{n - 1} \right)\]
If \(J \left( X_{n - 1} \right)\) is not invertible, then we want to find the \(X_n\) such that \(\left\lVert J \left( X_{n - 1} \right) \left( X_n - X_{n - 1} \right) + F \left( X_{n - 1} \right) \right\rVert\) is minimized. This happens frequently in practical problems.
Especially when \(J\) is large, it is common to use a Krylov method to solve for \(X_n\). This leads to the Newton-Krylov methods, which are workhorses of numerical computation.
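To make the non-invertible case concrete, the minimization above is a linear least-squares problem, and the pseudoinverse gives its minimum-norm solution. The sketch below uses a made-up rank-deficient matrix purely for illustration; a true Newton-Krylov method would instead apply an iterative solver such as GMRES to each linear solve and never form an inverse:

using LinearAlgebra
A = [1.0 2.0; 2.0 4.0]  # rank-deficient: the second row is twice the first
b = [1.0; 2.0]
dX = pinv(A) * b        # minimum-norm dX minimizing norm(A * dX - b); here [0.2, 0.4]
# The least-squares Newton update would then be X_n = X_{n-1} - dX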
In order to form this tangent plane, we need the Jacobian matrix, \(J \left( X_{n - 1} \right)\), whose entries are \(J_{i, j} = \frac{\partial f_i}{\partial x_j}\).
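Hand-coding these partial derivatives is easy to get wrong, so it can help to sanity-check them against a finite-difference approximation. The helper fd_jacobian below is hypothetical, a quick check rather than something the rest of the notebook relies on:

# Forward differences: J[i, j] ≈ (F(X + h * e_j) - F(X))[i] / h
function fd_jacobian(F, X; h=1e-8)
    F0 = F(X)
    J = zeros(length(F0), length(X))
    for j in 1:length(X)
        X_h = float.(X)  # work on a floating-point copy
        X_h[j] += h
        J[:, j] = (F(X_h) - F0) / h
    end
    return J
end
# e.g. fd_jacobian(F, [0; 0]) should approximately match J([0; 0]) below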
[2]:
# We also need derivatives
f1dx1(x1, x2) = (x1 - 3) + (x1 - 2)
f1dx2(x1, x2) = (x2 - 3) + (x2 - 4)
f2dx1(x1, x2) = (x1 - 4) + (x1 - 2)
f2dx2(x1, x2) = (x2 - 3) + (x2 - 5)
function J(X)
    return [f1dx1(X[1], X[2]) f1dx2(X[1], X[2]);
            f2dx1(X[1], X[2]) f2dx2(X[1], X[2])]
end
[2]:
J (generic function with 1 method)
Let’s start with an initial guess of \(X_0 = 0\) and do a few iterations “by hand”.
[3]:
# Initial guess
X_0 = [0; 0]
# Solving Ax = b where A = J(X_0) and b = F(X_0)
b = F(X_0)
A = J(X_0)
X_1 = X_0 - inv(A) * b
[3]:
2-element Vector{Float64}:
8.499999999999986
-3.4999999999999973
Is this closer? Let’s check.
[4]:
using LinearAlgebra
@show norm(F(X_0))
@show norm(F(X_1));
norm(F(X_0)) = 29.206163733020468
norm(F(X_1)) = 119.50104602052625
Hmm, that doesn’t look closer, but let’s try another iteration.
[5]:
# Solving Ax = b where A = J(X_1) and b = F(X_1)
b = F(X_1)
A = J(X_1)
X_2 = X_1 - inv(A) * b
[5]:
2-element Vector{Float64}:
5.249999999999987
-0.24999999999999956
[6]:
@show norm(F(X_0))
@show norm(F(X_1))
@show norm(F(X_2));
norm(F(X_0)) = 29.206163733020468
norm(F(X_1)) = 119.50104602052625
norm(F(X_2)) = 29.875261505131533
Now we’re headed in the right direction again. Let’s take another step.
[7]:
# Solving Ax = b where A = J(X_2) and b = F(X_2)
b = F(X_2)
A = J(X_2)
X_3 = X_2 - inv(A) * b
[7]:
2-element Vector{Float64}:
3.624999999999995
1.3750000000000024
[8]:
@show norm(F(X_0))
@show norm(F(X_1))
@show norm(F(X_2))
@show norm(F(X_3));
norm(F(X_0)) = 29.206163733020468
norm(F(X_1)) = 119.50104602052625
norm(F(X_2)) = 29.875261505131533
norm(F(X_3)) = 7.46881537628288
Ok, this is starting to look like an algorithm we can code up:
[9]:
function newton(F, J, X_0; tol=1e-12, max_its=100, verbose=true)
    # Initial setup
    X_n = X_0
    F_n = F(X_n)
    if verbose
        println("Newton residuals:")
        println("r_0 = $(norm(F_n))")
    end
    i = 0
    while norm(F_n) > tol && i < max_its
        # Form Ax = b
        b = F_n
        A = J(X_n)
        # Solve for the update (A \ b would avoid forming the inverse explicitly)
        X_n = X_n - inv(A) * b
        F_n = F(X_n)
        # Increment before printing so the residual labels advance correctly
        i += 1
        if verbose
            println("r_$(i) = $(norm(F_n))")
        end
    end
    if i == max_its
        println("Warning: Max iterations reached")
    end
    X_n
end
[9]:
newton (generic function with 1 method)
[10]:
# And let's try it
X_0 = [0; 0]
X_star = newton(F, J, X_0)
println("\nFinal result:")
@show X_star
@show F_star = F(X_star)
@show norm(F_star);
Newton residuals:
r_0 = 29.206163733020468
r_1 = 119.50104602052625
r_2 = 29.875261505131533
r_3 = 7.46881537628288
r_4 = 1.867203844070719
r_5 = 0.4668009610176802
r_6 = 0.11670024025441997
r_7 = 0.029175060063604052
r_8 = 0.0072937650159012775
r_9 = 0.0018234412539753194
r_10 = 0.00045586031349382984
r_11 = 0.00011396507837345746
r_12 = 2.8491269593364365e-5
r_13 = 7.122817398341091e-6
r_14 = 1.7807043495852728e-6
r_15 = 4.451760873963182e-7
r_16 = 1.1129402184907955e-7
r_17 = 2.7823505462269888e-8
r_18 = 6.955876365567472e-9
r_19 = 1.738969091391868e-9
r_20 = 4.34742272847967e-10
r_21 = 1.0868556821199175e-10
r_22 = 2.7171392052997937e-11
r_23 = 6.792848013249484e-12
r_24 = 1.698212003312371e-12
r_25 = 4.2455300082809277e-13
Final result:
X_star = [2.000000387430191, 2.999999612569809]
F_star = F(X_star) = [3.0020430585864233e-13, 3.0020430585864233e-13]
norm(F_star) = 4.2455300082809277e-13