
Approximating solutions of equations
using Newton’s method with a modified Newton’s method iterate as a starting point

Ioannis K. Argyros
(Date: January 14, 2007.)
Abstract.

In this study we are concerned with the problem of approximating a locally unique solution of an equation in a Banach space setting using Newton’s and modified Newton’s methods. We provide weaker convergence conditions for both methods than before [6]–[8]. Then, we combine Newton’s with the modified Newton’s method to approximate locally unique solutions of operator equations in a Banach space setting. Finer error estimates, a larger convergence domain, and a more precise information on the location of the solution are obtained under the same or weaker hypotheses than before [6]–[8]. Numerical examples are also provided.

Key words and phrases:
Banach space, Newton–Kantorovich method, radius of convergence, Fréchet-derivative, Banach lemma on invertible operators.
1991 Mathematics Subject Classification:
65H10, 65G99, 47H10, 49M15.
Cameron University, Department of Mathematical Sciences, Lawton, OK 73505, USA, e-mail: iargyros@cameron.edu

1. Introduction

In this study we are concerned with the problem of approximating a locally unique solution $x^\ast$ of the nonlinear equation

(1) F(x)=0,

where F is a Fréchet-differentiable operator defined on an open convex subset D of a Banach space X with values in a Banach space Y.

A large number of problems in applied mathematics and also in engineering are solved by finding the solutions of certain equations. For example, dynamic systems are mathematically modeled by difference or differential equations, and their solutions usually represent the states of the systems. For the sake of simplicity, assume that a time-invariant system is driven by the equation $\dot{x}=Q(x)$ for some suitable operator $Q$, where $x$ is the state. Then the equilibrium states are determined by solving equation (1). Similar equations are used in the case of discrete systems. The unknowns of engineering equations can be functions (difference, differential, and integral equations), vectors (systems of linear or nonlinear algebraic equations), or real or complex numbers (single algebraic equations with single unknowns). Except in special cases, the most commonly used solution methods are iterative: starting from one or several initial approximations, a sequence is constructed that converges to a solution of the equation. Iteration methods are also applied to optimization problems, where the iteration sequences converge to an optimal solution of the problem at hand. Since all of these methods have the same recursive structure, they can be introduced and discussed in a general framework.

The most popular methods for approximating $x^\ast$ are undoubtedly Newton’s method

(2) $x_{n+1}=x_n-F'(x_n)^{-1}F(x_n)\quad(n\ge 0),\quad(x_0\in D),$

and the modified Newton’s method

(3) $y_{n+1}=y_n-F'(y_0)^{-1}F(y_n)\quad(n\ge 0),\quad(y_0=x_0).$

There is an extensive literature on the semilocal as well as the local convergence results for both methods under various hypotheses. Such results can be found in [4], [6], [7], and the references there.
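To fix notation, here is a minimal scalar sketch of methods (2) and (3); the test equation $x^3-0.49=0$, the starting point, and the tolerances are illustrative choices (they match Example 9 below), not part of the methods themselves.

```python
def newton(F, dF, x0, tol=1e-12, max_iter=100):
    """Method (2): x_{n+1} = x_n - F'(x_n)^{-1} F(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = F(x) / dF(x)      # derivative re-evaluated at every iterate
        x -= step
        if abs(step) < tol:
            break
    return x

def modified_newton(F, dF, y0, tol=1e-12, max_iter=500):
    """Method (3): y_{n+1} = y_n - F'(y_0)^{-1} F(y_n)."""
    d0 = dF(y0)                  # derivative frozen at the starting point y_0
    y = y0
    for _ in range(max_iter):
        step = F(y) / d0
        y -= step
        if abs(step) < tol:
            break
    return y

F  = lambda x: x**3 - 0.49       # illustrative scalar equation
dF = lambda x: 3 * x**2
```

Method (2) pays one derivative evaluation per step for quadratic convergence; method (3) reuses $F'(y_0)$ and converges only linearly.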

The most popular hypotheses are of Newton–Kantorovich type [4], [6], [7]. Indeed, let $x_0\in D$. Assume there exist constants $\eta>0$, $\ell>0$ such that

(4) $F'(x_0)^{-1}\in L(Y,X),$
(5) $\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
(6) $\|F'(x_0)^{-1}[F'(x)-F'(y)]\|\le\ell\|x-y\|\quad\text{for all }x,y\in D,$
(7) $h=\ell\eta\le\tfrac{1}{2},$

and

(8) $\bar U(x_0,s^\ast)=\{x\in X:\|x-x_0\|\le s^\ast\}\subseteq D,$

where

(9) $s^\ast=\dfrac{1-\sqrt{1-2h}}{\ell}.$

Condition (7) is the crucial, though non-optimal, sufficient condition for the semilocal convergence of both methods [4], [6], [7] (see also Theorem 1).

Under condition (7), method (2) converges quadratically to $x^\ast$ (if (7) holds as a strict inequality), whereas method (3) converges only linearly to $x^\ast$. There are examples in the literature where both methods converge to $x^\ast$ although condition (7) is violated. Therefore one would expect that there may be conditions weaker than (7). This is the motivation for our study. Note that, in view of the Lipschitz condition (6), there exists $\ell_0>0$ such that the center-Lipschitz condition

(10) $\|F'(x_0)^{-1}[F'(x)-F'(x_0)]\|\le\ell_0\|x-x_0\|\quad\text{for all }x\in D$

holds.

In general,

(11) $\ell_0\le\ell$

holds, and $\ell/\ell_0$ can be arbitrarily large [2]–[4].

Recently, in [3, p. 387, Case 3, for $\delta_0=\delta$], we showed that condition (7) can always be replaced by the weaker

(12) $h_1=\ell_1\eta\le\tfrac{1}{2},\qquad \ell_1=\tfrac{1}{8}\left(\ell+4\ell_0+\sqrt{\ell^2+8\ell_0\ell}\right),$

in the case of Newton’s method (2) (see also, Examples 9–11).

Here, we show that in the case of the modified method (3), for convergence, condition (7) can be replaced by

(13) $h_0=\ell_0\eta\le\tfrac{1}{2},$

and (6) by the weaker condition (10). Finer error estimates on the distances involved, a larger convergence domain, and more precise information on the location of the solution than in earlier results [6] are also obtained this way (see Theorem 3 for method (2), and Theorem 5 for method (3)).

Whenever (7) (or (12)) does not hold but (13) does, we can employ method (3) for a finite number of steps, say $N$, until condition (7) (or (12)) is satisfied for $x_0=y_N$. Then the faster method (2) takes over from method (3).

The above advantages extend to the local convergence of methods (2) and (3) (see Theorems 12 and 13). Numerical examples are also provided. The technique introduced here can be extended to other Newton-type iterative methods [1], [3], [4], [5].

2. Semilocal Convergence Analysis

The following semilocal convergence result for methods (2) and (3) can be found in [7]:

Theorem 1.

Let F:DXY be a differentiable operator.

Assume there exist $x_0\in D$ and constants $\ell>0$, $\eta>0$ such that

$F'(x_0)^{-1}\in L(Y,X),$
$\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
$\|F'(x_0)^{-1}[F'(x)-F'(y)]\|\le\ell\|x-y\|\quad\text{for all }x,y\in D,$
$h=\ell\eta\le\tfrac{1}{2},$

and

$\bar U(x_0,s^\ast)\subseteq D,$

where

$s^\ast=\dfrac{1-\sqrt{1-2\ell\eta}}{\ell}.$

Then the sequences $\{y_n\}$, $\{x_n\}$ are well defined, remain in $\bar U(x_0,s^\ast)$ for all $n\ge 0$, and converge to a unique solution $x^\ast$ of equation $F(x)=0$ in $\bar U(x_0,s^\ast)$. Moreover, the following estimates hold:

$\|y_{n+1}-y_n\|\le q^n\|y_1-y_0\|\le q^n\eta,$
$\|y_n-x^\ast\|\le\dfrac{q^n}{1-q}\,\eta,$
$\|x_{n+2}-x_{n+1}\|\le\dfrac{\ell(s_{n+1}-s_n)^2}{2(1-\ell s_{n+1})},$

and

$\|x_n-x^\ast\|\le s^\ast-s_n,\qquad s^\ast=\lim_{n\to\infty}s_n,$

where

$s_0=0,\quad s_1=\eta,\quad s_{n+2}=s_{n+1}+\dfrac{\ell(s_{n+1}-s_n)^2}{2(1-\ell s_{n+1})}\quad(n\ge 0),$

and

(14) $q=1-\sqrt{1-2h}.$
Remark 2.

There is a plethora of estimates on the distances $\|x_{n+1}-x_n\|$, $\|x_n-x^\ast\|$, $\|y_{n+1}-y_n\|$, $\|y_n-x^\ast\|$ $(n\ge 0)$ [4], [6], [7]. However, we list only the estimates related to what we need in this study. In the case of Newton’s method (2), we showed in [2], [3] the following improvement of Theorem 1.∎

Theorem 3.

[2], [3]. Let F:DXY be a differentiable operator.

Assume there exist $x_0\in D$ and constants $\ell_0>0$, $\ell>0$, $\eta\ge 0$ such that

$F'(x_0)^{-1}\in L(Y,X),$
$\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
$\|F'(x_0)^{-1}[F'(x)-F'(x_0)]\|\le\ell_0\|x-x_0\|\quad\text{for all }x\in D,$
$\|F'(x_0)^{-1}[F'(x)-F'(y)]\|\le\ell\|x-y\|\quad\text{for all }x,y\in D,$
$h_1=\ell_1\eta\le\tfrac{1}{2},$
$\bar U(x_0,t^\ast)\subseteq D,$

where

$t_0=0,\quad t_1=\eta,\quad t_{n+2}=t_{n+1}+\dfrac{\ell(t_{n+1}-t_n)^2}{2(1-\ell_0 t_{n+1})}\quad(n\ge 0),$

and

$t^\ast=\lim_{n\to\infty}t_n\le\dfrac{2\eta}{2-\ell_2}=t^{\ast\ast},\qquad \ell_2=\dfrac{\ell-\ell_0+\sqrt{(\ell-\ell_0)^2+8\ell_0\ell}}{2\ell}.$

Then the sequence $\{x_n\}$ $(n\ge 0)$ generated by Newton’s method (2) is well defined, remains in $\bar U(x_0,t^\ast)$ for all $n\ge 0$, and converges to a unique solution $x^\ast$ of equation $F(x)=0$ in $\bar U(x_0,t^\ast)$.

Moreover, the following estimates hold for all $n\ge 0$:

$\|x_{n+1}-x_n\|\le t_{n+1}-t_n,$
$\|x_n-x^\ast\|\le t^\ast-t_n,$
(15) $t_n\le s_n,$
(16) $t_{n+1}-t_n\le s_{n+1}-s_n,$

and

(17) $t^\ast-t_n\le s^\ast-s_n.$
Remark 4.

Note that (15) and (16) hold as strict inequalities if $\ell_0<\ell$ [2]–[4]. Moreover, we have

(18) $h\le\tfrac{1}{2}\ \Longrightarrow\ h_1\le\tfrac{1}{2},$

but not vice versa unless $\ell_0=\ell$. That is, at the same computational cost we have weakened (7), since in practice the computation of $\ell$ also requires the computation of $\ell_0$. Furthermore, in Example 9 we show that (12) holds while condition (7) is violated.∎
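The improvement expressed by (15)–(17) can be checked numerically. The sketch below uses sample constants $\ell_0<\ell$ and $\eta$ chosen purely for illustration; it computes the majorizing sequence $\{s_n\}$ of Theorem 1 and $\{t_n\}$ of Theorem 3 with the same recursion, differing only in the constant appearing in the denominator.

```python
def majorizing(ell, ell_center, eta, n_terms=15):
    """t_0 = 0, t_1 = eta,
    t_{n+2} = t_{n+1} + ell*(t_{n+1}-t_n)**2 / (2*(1 - ell_center*t_{n+1}))."""
    t = [0.0, eta]
    for _ in range(n_terms - 2):
        t.append(t[-1] + ell * (t[-1] - t[-2]) ** 2 / (2 * (1 - ell_center * t[-1])))
    return t

ell, ell0, eta = 1.0, 0.5, 0.4          # sample data with ell0 < ell, h = 0.4
s = majorizing(ell, ell,  eta)          # Theorem 1: Lipschitz constant used twice
t = majorizing(ell, ell0, eta)          # Theorem 3: center constant ell0 in the denominator
```

With these data one observes $t_n\le s_n$ and $t_{n+1}-t_n\le s_{n+1}-s_n$ term by term, and $\{s_n\}$ tends to the Kantorovich limit $s^\ast=(1-\sqrt{1-2\ell\eta})/\ell$.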

Concerning the semilocal convergence of the modified Newton’s method, we show that (13) can replace condition (7).

Theorem 5.

Let F:DXY be a differentiable operator.

Assume there exist $x_0\in D$ and constants $\ell_0>0$, $\eta>0$ such that

$F'(x_0)^{-1}\in L(Y,X),$
$\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
$\|F'(x_0)^{-1}[F'(x)-F'(x_0)]\|\le\ell_0\|x-x_0\|\quad\text{for all }x\in D,$
$h_0=\ell_0\eta<\tfrac{1}{2},$

and

(19) $\bar U(x_0,s_0^\ast)\subseteq D,$

where

$s_0^\ast=\dfrac{1-\sqrt{1-2\ell_0\eta}}{\ell_0}.$

Then the sequence $\{y_n\}$ $(n\ge 0)$ generated by the modified Newton’s method (3) is well defined, remains in $\bar U(x_0,s_0^\ast)$ for all $n\ge 0$, and converges to a unique solution $x^\ast$ of equation $F(x)=0$ in $\bar U(x_0,s_0^\ast)$.

Moreover, the following estimates hold for all $n\ge 0$:

$\|y_{n+1}-y_n\|\le q_0^n\|y_1-y_0\|$

and

$\|y_n-x^\ast\|\le\dfrac{q_0^n}{1-q_0}\,\eta,$

where

$q_0=1-\sqrt{1-2\ell_0\eta}.$
Proof.

We shall show that the assumptions of the contraction mapping principle are satisfied for the operator

(20) $P(x)=x-F'(x_0)^{-1}F(x)\quad\text{on }\bar U(x_0,s_0^\ast).$

Let $x,y\in\bar U(x_0,s_0^\ast)$. Then we have the identity

$P(x)-P(y)=x-y-F'(x_0)^{-1}(F(x)-F(y))=F'(x_0)^{-1}\displaystyle\int_0^1\{F'(x_0)-F'[y+t(x-y)]\}(x-y)\,dt.$

This identity together with (10) implies the estimate

(21) $\|P(x)-P(y)\|\le\ell_0\displaystyle\int_0^1\big[(1-t)\|y-x_0\|+t\|x-x_0\|\big]dt\,\|x-y\|\le\ell_0 s_0^\ast\|x-y\|=q_0\|x-y\|.$

Consequently, $P$ is a contraction operator on the ball $\bar U(x_0,s_0^\ast)$. To complete the proof, it remains to show that

$P\bar U(x_0,s_0^\ast)\subseteq\bar U(x_0,s_0^\ast).$

Let $x\in\bar U(x_0,s_0^\ast)$. Then by (20) we obtain in turn

$\|P(x)-x_0\|\le\|P(x)-P(x_0)\|+\|P(x_0)-x_0\|\le\left\|F'(x_0)^{-1}\displaystyle\int_0^1\{F'(x_0)-F'[x_0+t(x-x_0)]\}(x-x_0)\,dt\right\|+\eta\le\ell_0\displaystyle\int_0^1 t\,dt\,\|x-x_0\|^2+\eta\le\dfrac{\ell_0}{2}(s_0^\ast)^2+\eta=s_0^\ast,$

by the choice of $s_0^\ast$. That completes the proof of Theorem 5. ∎

Remark 6.

Note that by (21) the operator P satisfies a Lipschitz condition with constant q0 in the ball U¯(x0,s0). The modified Newton’s method thus converges at the rate of a geometric progression with quotient q0.∎

The above analysis of method (3) concerns the simplest case. More subtle arguments (see, e.g., Kantorovich and Akilov [6]) show that Theorem 5 remains valid if the strict inequality $h_0<\tfrac12$ is replaced by $h_0\le\tfrac12$. Therefore, from now on, we can use (13) in Theorem 5.

Remark 7.

If $\ell_0=\ell$, Theorems 3 and 5 reduce to Theorem 1. Otherwise these theorems constitute improvements of it. Indeed, see (15)–(18), and notice that

$q_0<q$

and

$s_0^\ast<s^\ast.$

Notice also that (7) (or (12)) implies (13), and if $t^\ast\le s_0^\ast$ the quadratic convergence of method (2) is guaranteed. Moreover, $s_0^\ast$, given in closed form, can then in practice replace $t^\ast$. Furthermore, if $s_0^\ast<t^\ast$, then there exists $N>1$ such that $x_n\in\bar U(x_0,s_0^\ast)$ for $n\ge N$, and then again $s_0^\ast$ can replace $t^\ast$.∎

Next we show that we can start with method (3) and, after a finite number of steps, continue with the faster method (2):

Proposition 8.

Under hypotheses (4)–(6), (10), (13), and (19), for $x_0=y_0$, define

$\alpha=\dfrac{1}{1-\ell_0 s_0^\ast},\qquad L=\alpha\ell,$

and, for a fixed integer $N$,

$\ell_2=\sup_{x\in\bar U(y_N,r_N),\,x\ne y_N}\dfrac{\|F'(y_0)^{-1}[F'(x)-F'(y_N)]\|}{\|x-y_N\|},\qquad L_0=\alpha\ell_2\le L,$

$\bar L=\begin{cases}L&\text{if }L_0=L,\\[0.5ex]\tfrac{1}{8}\left(L+4L_0+\sqrt{L^2+8L_0L}\right)&\text{if }L_0<L,\end{cases}$

$N=\left[\dfrac{\ln\dfrac{1}{2\bar L\alpha\eta}}{\ln q_0}\right]+1,$

$r_N=\begin{cases}\dfrac{1-\sqrt{1-2\bar L\eta_N}}{\bar L}&\text{if }L_0=L,\\[1ex]\dfrac{2\eta_N}{2-L_{2N}}&\text{if }L_0<L,\end{cases}$

$\eta_N=\alpha q_0^N\eta,$

and

$L_{2N}=\dfrac{L-L_0+\sqrt{(L-L_0)^2+8LL_0}}{2L}\quad\text{for }L_0\ne 0,$

where $[r]$ denotes the integer part of the real number $r$.

Set

$\bar x_0=y_N.$

Moreover, assume:

(22) $\bar U(y_N,r_N)\subseteq D.$

Then the following hold:

(23) $\|F'(y_N)^{-1}F(y_N)\|\le\eta_N,$
(24) $\|F'(y_N)^{-1}[F'(x)-F'(y)]\|\le L\|x-y\|,$
(25) $\|F'(y_N)^{-1}[F'(x)-F'(y_N)]\|\le L_0\|x-y_N\|,$
(26) $H_N=\bar L\eta_N\le\tfrac{1}{2};$

Newton’s method (2), starting at $x_0=\bar x_0$, converges to a unique solution $x^{\ast\ast}$ of equation $F(x)=0$ in $\bar U(y_N,r_N)$,

and

$N_1\le N_0,$

where

$N_0=N\quad\text{for }L_0=L,$

and

$N_1=N\quad\text{for }L_0<L.$

Moreover, if the inclusion

(27) $\bar U(y_N,r_N)\subseteq\bar U(y_0,s_0^\ast)$

holds, then

$x^{\ast\ast}=x^\ast.$

Note that the parameter $L_0$ is independent of $N$ if $L_0=L$, and that the inclusion (27) holds if and only if

$\|y_N-y_0\|+r_N\le s_0^\ast.$

If $\|y_N-y_0\|\le s_0^\ast$, then by the definition of $r_N$ there exists an integer $N_0$ such that (27) holds. In this case $\tilde N=\max\{N,N_0\}$ should replace $N$ in the Proposition. Hypothesis (22) can then be dropped, since it follows from (19) and (27).

Proof.

Using Theorem 5 and the estimates

$\|F'(y_N)^{-1}F(y_N)\|\le\|F'(y_N)^{-1}F'(y_0)\|\,\|F'(y_0)^{-1}F(y_N)\|\le\dfrac{\|F'(y_0)^{-1}F(y_N)\|}{1-\ell_0\|y_N-y_0\|}\le\dfrac{\|F'(y_0)^{-1}F(y_N)\|}{1-\ell_0 s_0^\ast}\le\alpha q_0^N\eta=\eta_N,$

$\|F'(y_N)^{-1}[F'(x)-F'(y)]\|\le\alpha\|F'(y_0)^{-1}[F'(x)-F'(y)]\|\le\alpha\ell\|x-y\|=L\|x-y\|,$

$\|F'(y_N)^{-1}[F'(x)-F'(y_N)]\|\le\alpha\|F'(y_0)^{-1}[F'(x)-F'(y_N)]\|\le\alpha\ell_2\|x-y_N\|=L_0\|x-y_N\|,$

we obtain

$H_N=\bar L\eta_N=\bar L\alpha q_0^N\eta\le\tfrac{1}{2}$

and

$N_1\le N_0,$

by the choice of $N$.

It follows from Theorems 1 and 3, with $L_0$, $L$, $\eta_N$, $r_N$, $r_N$ replacing $\ell_0$, $\ell$, $\eta$, $s^\ast$, $t^\ast$, respectively, that there exists a unique solution $x^{\ast\ast}$ of equation $F(x)=0$ in $\bar U(y_N,r_N)$.

Moreover, if inclusion (27) holds, then by the uniqueness of the solution $x^\ast$ in $\bar U(y_0,s_0^\ast)$ we deduce $x^{\ast\ast}=x^\ast$.

That completes the proof of Proposition 8. ∎
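The switching strategy of Proposition 8 can be sketched as follows in the scalar case. The function, starting point, and $N=5$ below are the data of Example 9; the stopping tolerance is an illustrative choice.

```python
def hybrid(F, dF, x0, N, tol=1e-12, max_newton=50):
    """Run the modified method (3) for N steps, then switch to Newton's method (2)."""
    d0 = dF(x0)
    y = x0
    for _ in range(N):               # stage 1: frozen-derivative iteration (3)
        y -= F(y) / d0
    x = y                            # x0_bar = y_N becomes Newton's starting point
    for _ in range(max_newton):      # stage 2: full Newton iteration (2)
        step = F(x) / dF(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example 9 data: F(x) = x^3 - 0.49, x0 = y0 = 1, N = 5
root = hybrid(lambda x: x**3 - 0.49, lambda x: 3 * x**2, 1.0, 5)
```

The first stage needs no derivative re-evaluations; it merely carries the iterate into a region where the Kantorovich-type condition for the second stage holds.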

Let us provide an example.

Example 9.

Let $X=Y=\mathbf{R}$, $D=[a,2-a]$, $a\in[0,\tfrac12)$, and define the scalar function $F$ on $D$ by

(28) $F(x)=x^3-a.$

Choose $x_0=y_0=1$. Using (5), (6), (10) and (28), we obtain

(29) $\eta=\tfrac13(1-a),\qquad \ell_0=3-a\qquad\text{and}\qquad \ell=2(2-a).$

The Newton–Kantorovich hypothesis (7) becomes

(30) $h=\tfrac23(1-a)(2-a)>\tfrac12$

for all $a\in[0,\tfrac12)$. That is, according to Theorem 1, there is no guarantee that either method (2) or (3), starting at $x_0=y_0=1$, converges to $x^\ast$.

However, according to Theorem 3, condition (12) becomes

(31) $h_1\le\tfrac12,$

provided that

(32) $a\in[.450339002,\tfrac12).$

Using condition (13) we can do even better, since

(33) $h_0=\tfrac13(1-a)(3-a)\le\tfrac12$

provided that

(34) $a\in\Big[\dfrac{4-\sqrt{10}}{2},\tfrac12\Big),$

which improves the choice for $a$ given by (32). However, only linear, and not quadratic, convergence is guaranteed.

Let us now use $a=.49$. In particular, (7) does not hold, since for $\eta=.17$, $\ell=3.02$:

$h=\ell\eta=.5134>\tfrac12.$

However, (13) holds, since for $\ell_0=2.51$:

$h_0=.4267\le\tfrac12.$

We get

$q_0=.617116205,\qquad \alpha=2.61175848,\qquad s_0^\ast=.24586303,$

(35) $N=[4.0325]+1=5.$

Moreover, we obtain

$\bar x_0=y_4=.78911736.$

Finally, note that $x^\ast=.788373516$.
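The constants reported in Example 9 are easy to reproduce; the following quick check evaluates (29), (7), (13), $q_0$, $s_0^\ast$, and $\alpha$ for $a=0.49$.

```python
from math import sqrt

a = 0.49
eta  = (1 - a) / 3          # (29): eta = (1 - a)/3 = 0.17
ell  = 2 * (2 - a)          # (29): Lipschitz constant on D = [a, 2 - a]
ell0 = 3 - a                # (29): center-Lipschitz constant
h, h0 = ell * eta, ell0 * eta        # quantities (7) and (13)
q0 = 1 - sqrt(1 - 2 * h0)            # geometric quotient of Theorem 5
s0 = q0 / ell0                       # s0* = (1 - sqrt(1 - 2*h0)) / ell0
alpha = 1 / (1 - ell0 * s0)          # the constant of Proposition 8
```

The computed values agree with the figures quoted in the example: $h>\tfrac12$ while $h_0\le\tfrac12$.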

Our motivation for introducing condition (10) instead of (6) for the convergence of the modified Newton’s method (3) can also be seen in the following examples.

Example 10.

Let $X=Y=\mathbf{R}$, $D=[0,\infty)$, $x_0=1$, and define the function $F$ on $D$ by

(36) $F(x)=\dfrac{x^{1+\frac1i}}{1+\frac1i}+c_1x+c_2,$

where $c_1,c_2$ are real parameters and $i>2$ is an integer. Then $F'(x)=x^{\frac1i}+c_1$ is not Lipschitz on $D$. However, the center-Lipschitz condition (10) holds for $\ell_0=(1+c_1)^{-1}$ $(c_1>-1)$.

Indeed, we have

$|F'(x_0)^{-1}[F'(x)-F'(x_0)]|=(1+c_1)^{-1}\big|x^{\frac1i}-x_0^{\frac1i}\big|=\dfrac{(1+c_1)^{-1}|x-x_0|}{x_0^{\frac{i-1}{i}}+\cdots+x^{\frac{i-1}{i}}}\le\ell_0|x-x_0|.$
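A quick numerical sanity check of this bound, with the illustrative parameters $i=3$ and $c_1=1$ (any $c_1>-1$ would do), sweeps a sample of $D=[0,\infty)$:

```python
i, c1 = 3, 1.0                    # illustrative parameters with c1 > -1
ell0 = 1 / (1 + c1)               # center-Lipschitz constant of Example 10
for k in range(2000):
    x = k / 100.0                                 # samples of D = [0, infinity)
    lhs = abs(x ** (1.0 / i) - 1.0) / (1 + c1)    # |F'(x0)^{-1}[F'(x) - F'(x0)]|
    assert lhs <= ell0 * abs(x - 1.0) + 1e-12     # center-Lipschitz bound with ell0
```

The bound is tight at $x=0$, where both sides equal $\ell_0$.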
Example 11.

We consider the integral equation

(37) $u(s)=f(s)+\lambda\displaystyle\int_a^b G(s,t)\,u(t)^{1+\frac1n}\,dt,\qquad n\in\mathbf{N}.$

Here, $f$ is a given continuous function satisfying $f(s)>0$, $s\in[a,b]$, $\lambda$ is a real number, and the kernel $G$ is continuous and positive on $[a,b]\times[a,b]$.

For example, when $G(s,t)$ is the Green kernel, the corresponding integral equation is equivalent to the boundary value problem

$u''=\lambda u^{1+\frac1n},\qquad u(a)=f(a),\quad u(b)=f(b).$

Problems of this type have been considered in [2]–[7].

Equations of the form (37) generalize equations of the type

(38) $u(s)=\displaystyle\int_a^b G(s,t)\,u(t)^n\,dt,$

studied in [4], [6], [7].

Instead of (37) we can try to solve the equation $F(u)=0$, where

$F:\Omega\subseteq C[a,b]\to C[a,b],\qquad \Omega=\{u\in C[a,b]: u(s)\ge 0,\ s\in[a,b]\},$

and

$F(u)(s)=u(s)-f(s)-\lambda\displaystyle\int_a^b G(s,t)\,u(t)^{1+\frac1n}\,dt.$

The norm we consider is the max-norm.

The derivative $F'$ is given by

$F'(u)v(s)=v(s)-\lambda\Big(1+\tfrac1n\Big)\displaystyle\int_a^b G(s,t)\,u(t)^{\frac1n}v(t)\,dt,\qquad v\in\Omega.$

First of all, we notice that $F'$ does not satisfy a Lipschitz-type condition in $\Omega$. Let us consider, for instance, $[a,b]=[0,1]$, $G(s,t)=1$, and $y(t)=0$. Then $F'(y)v(s)=v(s)$ and

$\|F'(x)-F'(y)\|=|\lambda|\Big(1+\tfrac1n\Big)\displaystyle\int_0^1 x(t)^{\frac1n}\,dt.$

If $F'$ were Lipschitz, then

$\|F'(x)-F'(y)\|\le L_1\|x-y\|,$

or, equivalently, the inequality

(39) $\displaystyle\int_0^1 x(t)^{\frac1n}\,dt\le L_2\max_{s\in[0,1]}x(s)$

would hold for all $x\in\Omega$ and for some constant $L_2$. But this is not true. Consider, for example, the functions

$x_j(t)=\dfrac{t}{j},\qquad j\ge 1,\ t\in[0,1].$

If these are substituted into (39), we get

$\dfrac{1}{j^{1/n}\big(1+\frac1n\big)}\le\dfrac{L_2}{j}\ \Longleftrightarrow\ j^{1-\frac1n}\le L_2\Big(1+\tfrac1n\Big),\qquad j\ge 1.$

This inequality cannot hold as $j\to\infty$.

Therefore, condition (6) fails in this case. However, condition (10) holds. To show this, let $x_0(t)=f(t)$ and $\alpha=\min_{s\in[a,b]}f(s)$, $\alpha>0$. Then, for $v\in\Omega$,

$\|[F'(x)-F'(x_0)]v\|=|\lambda|\Big(1+\tfrac1n\Big)\max_{s\in[a,b]}\Big|\displaystyle\int_a^b G(s,t)\big(x(t)^{\frac1n}-f(t)^{\frac1n}\big)v(t)\,dt\Big|\le|\lambda|\Big(1+\tfrac1n\Big)\max_{s\in[a,b]}\displaystyle\int_a^b G(s,t)\,\dfrac{|x(t)-f(t)|}{x(t)^{(n-1)/n}+x(t)^{(n-2)/n}f(t)^{1/n}+\cdots+f(t)^{(n-1)/n}}\,dt\,\|v\|.$

Hence,

$\|F'(x)-F'(x_0)\|\le\dfrac{|\lambda|\big(1+\frac1n\big)}{\alpha^{(n-1)/n}}\max_{s\in[a,b]}\displaystyle\int_a^b G(s,t)\,dt\,\|x-x_0\|\le K\|x-x_0\|,$

where $K=|\lambda|\big(1+\tfrac1n\big)\alpha^{-(n-1)/n}N$, $N=\max_{s\in[a,b]}\displaystyle\int_a^b G(s,t)\,dt$, and $\ell_0=\|F'(x_0)^{-1}\|K$. Finally, note that condition (13) is satisfied for sufficiently small $\lambda$.
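The failure of (39) can also be seen numerically. The sketch below (with the illustrative choice $n=2$) evaluates the ratio $\int_0^1 x_j(t)^{1/n}\,dt\,/\max_t x_j(t)=j^{1-1/n}/(1+\tfrac1n)$ for $x_j(t)=t/j$ and confirms that it is unbounded in $j$, so no Lipschitz constant $L_2$ can exist.

```python
n = 2                                  # illustrative exponent, n > 1

def ratio(j):
    # integral_0^1 (t/j)^(1/n) dt = j^(-1/n) / (1 + 1/n);  max_t x_j(t) = 1/j
    integral = j ** (-1.0 / n) / (1.0 + 1.0 / n)
    return integral / (1.0 / j)        # = j^(1 - 1/n) / (1 + 1/n)

# A Lipschitz constant L2 would have to dominate ratio(j) for every j,
# but here the ratio grows like sqrt(j).
```
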

3. Local Convergence Analysis

In order to cover the local convergence of methods (2) and (3), we start with the following theorem from [7]:

Theorem 12.

Let $F:D\subseteq X\to Y$ be a differentiable operator. Assume there exist $x^\ast\in D$ and a constant $K>0$ such that:

(40) $F'(x^\ast)^{-1}\in L(Y,X),\qquad F(x^\ast)=0,$
(41) $\|F'(x^\ast)^{-1}[F'(x)-F'(y)]\|\le K\|x-y\|\quad\text{for all }x,y\in D,$

and

(42) $\bar U(x^\ast,r_{RN})\subseteq D,$

where

(43) $r_{RN}=\dfrac{2}{3K}.$

Then

  1. (a)

    the sequence $\{x_n\}$ generated by Newton’s method (2) is well defined, remains in $\bar U(x^\ast,r_{RN})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $x_0\in U(x^\ast,r_{RN})$, and

    (44) $\|x_{n+1}-x^\ast\|\le\dfrac{K\|x_n-x^\ast\|^2}{2(1-K\|x_n-x^\ast\|)}\quad(n\ge 0).$

    If (42) is replaced by

    (45) $\bar U(x^\ast,r_{RM})\subseteq D,$

    where

    (46) $r_{RM}=\dfrac{2}{5K},$

    then

  2. (b)

    the sequence $\{y_n\}$ generated by the modified Newton’s method (3) is well defined, remains in $\bar U(x^\ast,r_{RM})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $y_0\in U(x^\ast,r_{RM})$, and

    (47) $\|y_{n+1}-x^\ast\|\le\dfrac{K\big[\|x^\ast-y_0\|+\tfrac12\|y_n-x^\ast\|\big]}{1-K\|x^\ast-y_0\|}\,\|y_n-x^\ast\|\quad(n\ge 0).$
Proof.

The proof of (a) can be found in [8], whereas the proof of (b) is a special case of part (b) of our Theorem 13 below. ∎

It follows from condition (41) that there exists $K_0\in(0,K]$ such that:

(48) $\|F'(x^\ast)^{-1}[F'(x)-F'(x^\ast)]\|\le K_0\|x-x^\ast\|\quad\text{for all }x\in D.$

Then, using a combination of conditions (41) and (48) for method (2), and only condition (48) for method (3), we can show:

Theorem 13.

Let $F:D\subseteq X\to Y$ be a differentiable operator.

Assume there exist $x^\ast\in D$ and constants $K_0>0$, $K>0$ such that:

$F'(x^\ast)^{-1}\in L(Y,X),\qquad F(x^\ast)=0,$
$\|F'(x^\ast)^{-1}[F'(x)-F'(x^\ast)]\|\le K_0\|x-x^\ast\|\quad\text{for all }x\in D,$
$\|F'(x^\ast)^{-1}[F'(x)-F'(y)]\|\le K\|x-y\|\quad\text{for all }x,y\in D,$

and

(49) $\bar U(x^\ast,r_{AN})\subseteq D,$

where

(50) $r_{AN}=\dfrac{2}{2K_0+K}.$

  1. (a)

    Then the sequence $\{x_n\}$ generated by Newton’s method (2) is well defined, remains in $\bar U(x^\ast,r_{AN})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $x_0\in U(x^\ast,r_{AN})$, and

    (51) $\|x_{n+1}-x^\ast\|\le\dfrac{K\|x_n-x^\ast\|^2}{2(1-K_0\|x_n-x^\ast\|)}\quad(n\ge 0).$

    Using only the center-Lipschitz condition (48), and with (49) replaced by

    (52) $\bar U(x^\ast,r_{AM})\subseteq D,$

    where

    (53) $r_{AM}=\dfrac{2}{5K_0},$

  2. (b)

    the sequence $\{y_n\}$ generated by the modified Newton’s method (3) is well defined, remains in $\bar U(x^\ast,r_{AM})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $y_0\in U(x^\ast,r_{AM})$,

    and

    (54) $\|y_{n+1}-x^\ast\|\le\dfrac{K_0\big[\|x^\ast-y_0\|+\tfrac12\|y_n-x^\ast\|\big]\,\|y_n-x^\ast\|}{1-K_0\|x^\ast-y_0\|}\quad(n\ge 0).$
Proof.

(a) The proof can be found in [2].

(b) Let $x\in\bar U(x^\ast,r_{AM})$. Then, using (48), we get

(55) $\|F'(x^\ast)^{-1}[F'(x)-F'(x^\ast)]\|\le K_0\|x-x^\ast\|\le K_0 r_{AM}<1$

by the choice of $r_{AM}$. It follows from (55) and the Banach lemma on invertible operators [4], [6] that $F'(x)^{-1}$ exists, and

(56) $\|F'(x)^{-1}F'(x^\ast)\|\le\dfrac{1}{1-K_0\|x-x^\ast\|}\le\dfrac{1}{1-K_0 r_{AM}}.$

In particular, by hypothesis, $y_0\in U(x^\ast,r_{AM})\subseteq\bar U(x^\ast,r_{AM})$.

Let us assume $y_k\in\bar U(x^\ast,r_{AM})$ for $k=0,1,\ldots,n$. Then, using (3), (48) and (53), we obtain in turn

(57) $y_{n+1}-x^\ast=y_n-x^\ast-F'(y_0)^{-1}F(y_n)=F'(y_0)^{-1}\big[F'(y_0)(y_n-x^\ast)-\big(F(y_n)-F(x^\ast)\big)\big]=F'(y_0)^{-1}F'(x^\ast)\,F'(x^\ast)^{-1}\displaystyle\int_0^1\big\{[F'(x^\ast)-F'(x^\ast+t(y_n-x^\ast))]+[F'(y_0)-F'(x^\ast)]\big\}(y_n-x^\ast)\,dt,$

and

(58) $\|y_{n+1}-x^\ast\|\le\dfrac{K_0\displaystyle\int_0^1\big[\|x^\ast-y_0\|+t\|y_n-x^\ast\|\big]dt}{1-K_0\|x^\ast-y_0\|}\,\|y_n-x^\ast\|=\dfrac{K_0\big[\|x^\ast-y_0\|+\tfrac12\|y_n-x^\ast\|\big]}{1-K_0\|x^\ast-y_0\|}\,\|y_n-x^\ast\|<\|y_n-x^\ast\|\le r_{AM},$

which shows (54) and $\lim_{n\to\infty}y_n=x^\ast$.

That completes the proof of Theorem 13. ∎

Remark 14.

In general,

(59) $K_0\le K$

holds, and $K/K_0$ can be arbitrarily large. If $K_0=K$, Theorem 13 reduces to Theorem 12. Otherwise, Theorem 13 improves Theorem 12 under the same hypotheses for method (2), and under the same or less computational cost for method (3): finer estimates on the distances $\|x_n-x^\ast\|$ $(n\ge 0)$ are obtained, and the radius of convergence is enlarged. In particular, we have

(60) $r_{RN}<r_{AN}$
(61) $r_{RM}<r_{AM}.$

Moreover, since

(62) $r_{RM}<r_{RN},$

iterates from method (3) cannot, in the setting of Theorem 12, be used to find the initial guess $x_0$ for the faster method (2).

Examples where estimate (59) holds as a strict inequality can be found in [2]–[4]. Using Theorem 13, however, method (3) can produce a starting point for method (2), as follows. Let $p\in(0,1)$ be the desired rate of convergence for method (3). Then by (54), for $\|y_0-x^\ast\|\le\dfrac{2p}{(3+2p)K_0}$, we have:

$\|y_{n+1}-x^\ast\|\le p\|y_n-x^\ast\|.$

Choose

(63) $M=\left[\dfrac{\ln\dfrac{5K_0}{2K_0+K}}{\ln p}\right]+1;$

then it can easily be seen that

$p^M r_{AM}\le r_{AN},$

and consequently, according to (a) of Theorem 13, we can set

$x_0=y_M.$

In the case $K_0=K$, according to (a) of Theorem 12, we can set

$x_0=y_{M_1},$

where

$M_1=\left[\dfrac{\ln\tfrac{5}{3}}{\ln p}\right]+1$

(simply set $K_0=K$ in (63)). Note that

$M\ge M_1.$

Finally, we observe that

$r_{RN}<r_{AM}$

if

$\dfrac{K}{K_0}>\dfrac{5}{3},$

which can certainly happen, since $K/K_0$ can be arbitrarily large [2]. ∎
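The relations between the four radii and the switch index $M$ of (63) can be illustrated with sample constants; here $K=4$, $K_0=1$, $p=\tfrac12$ are arbitrary choices with $K/K_0>\tfrac53$, not values from the paper.

```python
from math import log

K, K0, p = 4.0, 1.0, 0.5      # sample constants with K/K0 > 5/3
r_RN = 2 / (3 * K)            # (43): Rheinboldt radius for Newton's method (2)
r_RM = 2 / (5 * K)            # (46): radius for the modified method (3), K only
r_AN = 2 / (2 * K0 + K)       # (50): enlarged radius for method (2)
r_AM = 2 / (5 * K0)           # (53): enlarged radius for method (3), K0 only

# (63): number of modified-Newton steps needed before switching to (2);
# int() matches the integer-part bracket [.] for the positive quotient here
M = int(log(5 * K0 / (2 * K0 + K)) / log(p)) + 1
```

After $M$ steps at rate $p$, the iterate of method (3) lies within the larger Newton radius $r_{AN}$, so method (2) can take over.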

The ideas presented here can be extended to other Newton-type iterative methods [1], [3], [4], [5] along the same lines.

4. Conclusion

The Newton–Kantorovich hypothesis (7), famous for its simplicity and clarity, is the crucial sufficient semilocal convergence condition for both the quadratically convergent Newton’s method (2) and the linearly convergent modified Newton’s method (3) [6]. There exist simple numerical examples showing that both methods may converge even when condition (7) is violated [4], [6]. In fact, when condition (7) is violated for a certain initial guess, it is common practice to still use Newton’s method (2) for a few iterates until condition (7) is satisfied. However, this approach is a shot in the dark, and it does not work in general [4], [7]. Here we have introduced an approach that works when condition (7) is violated. First, we showed that condition (13) is a weaker sufficient convergence hypothesis for the modified Newton’s method (3) than (7); that is, we extended the convergence region of method (3). Then, if (7) is violated but (13) holds, we start with the slower method (3) until we reach (after a finite number of steps) an iterate at which condition (7) also holds. We then continue with the faster method (2) from this iterate.

References

  • [1] Amat, S., Busquier, S. and Candela, V., A class of quasi-Newton generalized Steffensen methods on Banach spaces, J. Comput. Appl. Math., 149, pp. 397–408, 2002.
  • [2] Argyros, I.K., On the Newton–Kantorovich hypothesis for solving equations, J. Comput. Appl. Math., 169, pp. 315–332, 2004.
  • [3] Argyros, I.K., A unifying local-semilocal convergence analysis and applications for two-point Newton-like methods in Banach space, J. Math. Anal. Applic., 298, 2, pp. 374–397, 2004.
  • [4] Argyros, I.K., Computational theory of iterative methods, Studies in Computational Mathematics, 15 (Editors: C.K. Chui and L. Wuytack), Elsevier Publ. Co., New York, 2007.
  • [5] Gutierrez, J.M., Hernandez, M.A. and Salanova, M.A., Accessibility of solutions by Newton’s method, Intern. J. Comput. Math., 57, pp. 239–247, 1995.
  • [6] Kantorovich, L.V. and Akilov, G.P., Functional Analysis in Normed Spaces, Pergamon Press, Oxford, 1982.
  • [7] Krasnoselskii, M.A., Vainikko, G.M., Zabreiko, P.P., Rutitskii, Ya.B. and Stetsenko, V.Ya., Approximate Solution of Operator Equations, Wolters-Noordhoff Publ., Groningen, 1972.
  • [8] Rheinboldt, W.C., An adaptive continuation process for solving systems of nonlinear equations, Polish Academy of Science, Banach Ctr. Publ., 3, pp. 129–142, 1977.