
Approximating solutions of equations
using Newton’s method with a modified Newton’s method iterate as a starting point

Ioannis K. Argyros
(Date: January 14, 2007.)
Abstract.

In this study we are concerned with the problem of approximating a locally unique solution of an equation in a Banach space setting using Newton’s and modified Newton’s methods. We provide weaker convergence conditions for both methods than before [6]–[8]. Then, we combine Newton’s with the modified Newton’s method to approximate locally unique solutions of operator equations in a Banach space setting. Finer error estimates, a larger convergence domain, and a more precise information on the location of the solution are obtained under the same or weaker hypotheses than before [6]–[8]. Numerical examples are also provided.

Key words and phrases:
Banach space, Newton–Kantorovich method, radius of convergence, Fréchet-derivative, Banach lemma on invertible operators.
1991 Mathematics Subject Classification:
65H10, 65G99, 47H10, 49M15.
Cameron University, Department of Mathematical Sciences, Lawton, OK 73505, USA, e-mail: iargyros@cameron.edu

1. Introduction

In this study we are concerned with the problem of approximating a locally unique solution $x^\ast$ of the nonlinear equation

(1) F(x)=0,

where F is a Fréchet-differentiable operator defined on an open convex subset D of a Banach space X with values in a Banach space Y.

A large number of problems in applied mathematics and also in engineering are solved by finding the solutions of certain equations. For example, dynamic systems are mathematically modeled by difference or differential equations, and their solutions usually represent the states of the systems. For the sake of simplicity, assume that a time-invariant system is driven by the equation $\dot{x}=Q(x)$ for some suitable operator $Q$, where $x$ is the state. Then the equilibrium states are determined by solving equation (1). Similar equations are used in the case of discrete systems. The unknowns of engineering equations can be functions (difference, differential, and integral equations), vectors (systems of linear or nonlinear algebraic equations), or real or complex numbers (single algebraic equations with single unknowns). Except in special cases, the most commonly used solution methods are iterative: starting from one or several initial approximations, a sequence is constructed that converges to a solution of the equation. Iteration methods are also applied to optimization problems, where the iteration sequences converge to an optimal solution of the problem at hand. Since all of these methods have the same recursive structure, they can be introduced and discussed in a general framework.

The most popular methods for approximating $x^\ast$ are undoubtedly Newton’s method

(2) $x_{n+1}=x_n-F'(x_n)^{-1}F(x_n)\quad(n\ge 0),\quad(x_0\in D),$

and the modified Newton’s method

(3) $y_{n+1}=y_n-F'(y_0)^{-1}F(y_n)\quad(n\ge 0),\quad(y_0=x_0).$

There is an extensive literature on the semilocal as well as the local convergence results for both methods under various hypotheses. Such results can be found in [4], [6], [7], and the references there.
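To fix notation, here is a minimal scalar sketch of methods (2) and (3); the test equation $x^3-0.49=0$, the starting point, and the tolerances are illustrative choices (they match Example 9 below), not part of the methods themselves.

```python
def newton(F, dF, x0, tol=1e-12, max_iter=100):
    """Method (2): x_{n+1} = x_n - F'(x_n)^{-1} F(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = F(x) / dF(x)      # derivative re-evaluated at every iterate
        x -= step
        if abs(step) < tol:
            break
    return x

def modified_newton(F, dF, y0, tol=1e-12, max_iter=500):
    """Method (3): y_{n+1} = y_n - F'(y_0)^{-1} F(y_n)."""
    d0 = dF(y0)                  # derivative frozen at the starting point y_0
    y = y0
    for _ in range(max_iter):
        step = F(y) / d0
        y -= step
        if abs(step) < tol:
            break
    return y

F  = lambda x: x**3 - 0.49       # illustrative scalar equation
dF = lambda x: 3 * x**2
```

Method (2) pays one derivative evaluation per step for quadratic convergence; method (3) reuses $F'(y_0)$ and converges only linearly.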

The most popular hypotheses are of Newton–Kantorovich type [4], [6], [7]. Indeed, let $x_0\in D$. Assume there exist constants $\eta>0$, $\ell>0$ such that

(4) $F'(x_0)^{-1}\in L(Y,X),$
(5) $\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
(6) $\|F'(x_0)^{-1}[F'(x)-F'(y)]\|\le\ell\|x-y\|\quad\text{for all }x,y\in D,$
(7) $h=\ell\eta\le\tfrac{1}{2},$

and

(8) $\bar U(x_0,s^\ast)=\{x\in X:\|x-x_0\|\le s^\ast\}\subseteq D,$

where

(9) $s^\ast=\dfrac{1-\sqrt{1-2h}}{\ell}.$

Condition (7) is the crucial, though non-optimal, sufficient condition for the semilocal convergence of both methods [4], [6], [7] (see also Theorem 1).

Under condition (7), method (2) converges quadratically to $x^\ast$ (if (7) holds as a strict inequality), whereas method (3) converges only linearly to $x^\ast$. There are examples in the literature where both methods converge to $x^\ast$ although condition (7) is violated. Therefore one would expect that there may be conditions weaker than (7). This is the motivation for our study. Note that, in view of the Lipschitz condition (6), there exists $\ell_0>0$ such that the center-Lipschitz condition

(10) $\|F'(x_0)^{-1}[F'(x)-F'(x_0)]\|\le\ell_0\|x-x_0\|\quad\text{for all }x\in D$

holds.

In general,

(11) $\ell_0\le\ell$

holds, and $\ell/\ell_0$ can be arbitrarily large [2]–[4].

Recently, in [3, p. 387, Case 3, for $\delta_0=\delta$], we showed that condition (7) can always be replaced by the weaker

(12) $h_1=\ell_1\eta\le\tfrac{1}{2},\qquad \ell_1=\tfrac{1}{8}\left(\ell+4\ell_0+\sqrt{\ell^2+8\ell_0\ell}\right),$

in the case of Newton’s method (2) (see also, Examples 9–11).

Here, we show that in the case of the modified method (3), for convergence, condition (7) can be replaced by

(13) $h_0=\ell_0\eta\le\tfrac{1}{2},$

and (6) by the weaker condition (10). Finer error estimates on the distances involved, a larger convergence domain, and more precise information on the location of the solution than in earlier results [6] are also obtained this way (see Theorem 3 for method (2), and Theorem 5 for method (3)).

Whenever (7) (or (12)) does not hold but (13) does, we can employ method (3) for a finite number of steps, say $N$, until condition (7) (or (12)) is satisfied for $x_0=y_N$. Then the faster method (2) takes over from method (3).

The above advantages extend to the local convergence of methods (2) and (3) (see Theorems 12 and 13). Numerical examples are also provided. The technique introduced here can be extended to other Newton-type iterative methods [1], [3], [4], [5].

2. Semilocal Convergence Analysis

The following semilocal convergence result for methods (2) and (3) can be found in [7]:

Theorem 1.

Let F:DXY be a differentiable operator.

Assume there exist $x_0\in D$ and constants $\ell>0$, $\eta>0$ such that

$F'(x_0)^{-1}\in L(Y,X),$
$\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
$\|F'(x_0)^{-1}[F'(x)-F'(y)]\|\le\ell\|x-y\|\quad\text{for all }x,y\in D,$
$h=\ell\eta\le\tfrac{1}{2},$

and

$\bar U(x_0,s^\ast)\subseteq D,$

where

$s^\ast=\dfrac{1-\sqrt{1-2\ell\eta}}{\ell}.$

Then the sequences $\{y_n\}$, $\{x_n\}$ are well defined, remain in $\bar U(x_0,s^\ast)$ for all $n\ge 0$, and converge to a unique solution $x^\ast$ of equation $F(x)=0$ in $\bar U(x_0,s^\ast)$. Moreover, the following estimates hold:

$\|y_{n+1}-y_n\|\le q^n\|y_1-y_0\|\le q^n\eta,$
$\|y_n-x^\ast\|\le\dfrac{q^n}{1-q}\,\eta,$
$\|x_{n+2}-x_{n+1}\|\le\dfrac{\ell(s_{n+1}-s_n)^2}{2(1-\ell s_{n+1})},$

and

$\|x_n-x^\ast\|\le s^\ast-s_n,\qquad s^\ast=\lim_{n\to\infty}s_n,$

where

$s_0=0,\quad s_1=\eta,\quad s_{n+2}=s_{n+1}+\dfrac{\ell(s_{n+1}-s_n)^2}{2(1-\ell s_{n+1})}\quad(n\ge 0),$

and

(14) $q=1-\sqrt{1-2h}.$
Remark 2.

There is a plethora of estimates on the distances $\|x_{n+1}-x_n\|$, $\|x_n-x^\ast\|$, $\|y_{n+1}-y_n\|$, $\|y_n-x^\ast\|$ $(n\ge 0)$ [4], [6], [7]. However, we list only the estimates related to what we need in this study. In the case of Newton’s method (2), we showed in [2], [3] the following improvement of Theorem 1.∎

Theorem 3.

[2], [3]. Let F:DXY be a differentiable operator.

Assume there exist $x_0\in D$ and constants $\ell_0>0$, $\ell>0$, $\eta\ge 0$ such that

$F'(x_0)^{-1}\in L(Y,X),$
$\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
$\|F'(x_0)^{-1}[F'(x)-F'(x_0)]\|\le\ell_0\|x-x_0\|\quad\text{for all }x\in D,$
$\|F'(x_0)^{-1}[F'(x)-F'(y)]\|\le\ell\|x-y\|\quad\text{for all }x,y\in D,$
$h_1=\ell_1\eta\le\tfrac{1}{2},$
$\bar U(x_0,t^\ast)\subseteq D,$

where

$t_0=0,\quad t_1=\eta,\quad t_{n+2}=t_{n+1}+\dfrac{\ell(t_{n+1}-t_n)^2}{2(1-\ell_0 t_{n+1})}\quad(n\ge 0),$

and

$t^\ast=\lim_{n\to\infty}t_n\le\dfrac{2\eta}{2-\ell_2}=t^{\ast\ast},\qquad \ell_2=\dfrac{\ell-\ell_0+\sqrt{(\ell-\ell_0)^2+8\ell_0\ell}}{2\ell}.$

Then the sequence $\{x_n\}$ $(n\ge 0)$ generated by Newton’s method (2) is well defined, remains in $\bar U(x_0,t^\ast)$ for all $n\ge 0$, and converges to a unique solution $x^\ast$ of equation $F(x)=0$ in $\bar U(x_0,t^\ast)$.

Moreover, the following estimates hold for all $n\ge 0$:

$\|x_{n+1}-x_n\|\le t_{n+1}-t_n,$
$\|x_n-x^\ast\|\le t^\ast-t_n,$
(15) $t_n\le s_n,$
(16) $t_{n+1}-t_n\le s_{n+1}-s_n,$

and

(17) $t^\ast-t_n\le s^\ast-s_n.$
Remark 4.

Note that (15) and (16) hold as strict inequalities if $\ell_0<\ell$ [2]–[4]. Moreover, we have

(18) $h\le\tfrac{1}{2}\ \Longrightarrow\ h_1\le\tfrac{1}{2},$

but not vice versa unless $\ell_0=\ell$. That is, at the same computational cost we have weakened (7), since in practice the computation of $\ell$ also requires the computation of $\ell_0$. Furthermore, in Example 9 we show that (12) holds while condition (7) is violated.∎
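The improvement expressed by (15)–(17) can be checked numerically. The sketch below uses sample constants $\ell_0<\ell$ and $\eta$ chosen purely for illustration; it computes the majorizing sequence $\{s_n\}$ of Theorem 1 and $\{t_n\}$ of Theorem 3 with the same recursion, differing only in the constant appearing in the denominator.

```python
def majorizing(ell, ell_center, eta, n_terms=15):
    """t_0 = 0, t_1 = eta,
    t_{n+2} = t_{n+1} + ell*(t_{n+1}-t_n)**2 / (2*(1 - ell_center*t_{n+1}))."""
    t = [0.0, eta]
    for _ in range(n_terms - 2):
        t.append(t[-1] + ell * (t[-1] - t[-2]) ** 2 / (2 * (1 - ell_center * t[-1])))
    return t

ell, ell0, eta = 1.0, 0.5, 0.4          # sample data with ell0 < ell, h = 0.4
s = majorizing(ell, ell,  eta)          # Theorem 1: Lipschitz constant used twice
t = majorizing(ell, ell0, eta)          # Theorem 3: center constant ell0 in the denominator
```

With these data one observes $t_n\le s_n$ and $t_{n+1}-t_n\le s_{n+1}-s_n$ term by term, and $\{s_n\}$ tends to the Kantorovich limit $s^\ast=(1-\sqrt{1-2\ell\eta})/\ell$.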

Concerning the semilocal convergence of the modified Newton’s method, we show that (13) can replace condition (7).

Theorem 5.

Let F:DXY be a differentiable operator.

Assume there exist $x_0\in D$ and constants $\ell_0>0$, $\eta>0$ such that

$F'(x_0)^{-1}\in L(Y,X),$
$\|F'(x_0)^{-1}F(x_0)\|\le\eta,$
$\|F'(x_0)^{-1}[F'(x)-F'(x_0)]\|\le\ell_0\|x-x_0\|\quad\text{for all }x\in D,$
$h_0=\ell_0\eta<\tfrac{1}{2},$

and

(19) $\bar U(x_0,s_0^\ast)\subseteq D,$

where

$s_0^\ast=\dfrac{1-\sqrt{1-2\ell_0\eta}}{\ell_0}.$

Then the sequence $\{y_n\}$ $(n\ge 0)$ generated by the modified Newton’s method (3) is well defined, remains in $\bar U(x_0,s_0^\ast)$ for all $n\ge 0$, and converges to a unique solution $x^\ast$ of equation $F(x)=0$ in $\bar U(x_0,s_0^\ast)$.

Moreover, the following estimates hold for all $n\ge 0$:

$\|y_{n+1}-y_n\|\le q_0^n\|y_1-y_0\|$

and

$\|y_n-x^\ast\|\le\dfrac{q_0^n}{1-q_0}\,\eta,$

where

$q_0=1-\sqrt{1-2\ell_0\eta}.$
Proof.

We shall show that the assumptions of the contraction mapping principle are satisfied for the operator

(20) $P(x)=x-F'(x_0)^{-1}F(x)\quad\text{on }\bar U(x_0,s_0^\ast).$

Let $x,y\in\bar U(x_0,s_0^\ast)$. Then we have the identity

$P(x)-P(y)=x-y-F'(x_0)^{-1}(F(x)-F(y))=F'(x_0)^{-1}\displaystyle\int_0^1\{F'(x_0)-F'[y+t(x-y)]\}(x-y)\,dt.$

This identity together with (10) implies the estimate

(21) $\|P(x)-P(y)\|\le\ell_0\displaystyle\int_0^1\big[(1-t)\|y-x_0\|+t\|x-x_0\|\big]dt\,\|x-y\|\le\ell_0 s_0^\ast\|x-y\|=q_0\|x-y\|.$

Consequently, $P$ is a contraction operator on the ball $\bar U(x_0,s_0^\ast)$. To complete the proof, it remains to show that

$P\bar U(x_0,s_0^\ast)\subseteq\bar U(x_0,s_0^\ast).$

Let $x\in\bar U(x_0,s_0^\ast)$. Then by (20) we obtain in turn

$\|P(x)-x_0\|\le\|P(x)-P(x_0)\|+\|P(x_0)-x_0\|\le\left\|F'(x_0)^{-1}\displaystyle\int_0^1\{F'(x_0)-F'[x_0+t(x-x_0)]\}(x-x_0)\,dt\right\|+\eta\le\ell_0\displaystyle\int_0^1 t\,dt\,\|x-x_0\|^2+\eta\le\dfrac{\ell_0}{2}(s_0^\ast)^2+\eta=s_0^\ast,$

by the choice of $s_0^\ast$. That completes the proof of Theorem 5. ∎

Remark 6.

Note that by (21) the operator P satisfies a Lipschitz condition with constant q0 in the ball U¯(x0,s0). The modified Newton’s method thus converges at the rate of a geometric progression with quotient q0.∎

The above analysis of method (3) concerns the simplest case. More subtle arguments (see, e.g., Kantorovich and Akilov [6]) show that Theorem 5 remains valid if the strict inequality $h_0<\tfrac12$ is replaced by $h_0\le\tfrac12$. Therefore, from now on, we can use (13) in Theorem 5.

Remark 7.

If $\ell_0=\ell$, Theorems 3 and 5 reduce to Theorem 1. Otherwise these theorems constitute improvements of it. Indeed, see (15)–(18), and notice that

$q_0<q$

and

$s_0^\ast<s^\ast.$

Notice also that (7) (or (12)) implies (13), and if $t^\ast\le s_0^\ast$ the quadratic convergence of method (2) is guaranteed. Moreover, $s_0^\ast$, given in closed form, can then in practice replace $t^\ast$. Furthermore, if $s_0^\ast<t^\ast$, then there exists $N>1$ such that $x_n\in\bar U(x_0,s_0^\ast)$ for $n\ge N$, and then again $s_0^\ast$ can replace $t^\ast$.∎

Next we show that we can start with method (3) and, after a finite number of steps, continue with the faster method (2):

Proposition 8.

Under hypotheses (4)–(6), (10), (13), and (19), for $x_0=y_0$, define

$\alpha=\dfrac{1}{1-\ell_0 s_0^\ast},\qquad L=\alpha\ell,$

and, for a fixed integer $N$,

$\ell_2=\sup_{x\in\bar U(y_N,r_N),\,x\ne y_N}\dfrac{\|F'(y_0)^{-1}[F'(x)-F'(y_N)]\|}{\|x-y_N\|},\qquad L_0=\alpha\ell_2\le L,$

$\bar L=\begin{cases}L&\text{if }L_0=L,\\[0.5ex]\tfrac{1}{8}\left(L+4L_0+\sqrt{L^2+8L_0L}\right)&\text{if }L_0<L,\end{cases}$

$N=\left[\dfrac{\ln\dfrac{1}{2\bar L\alpha\eta}}{\ln q_0}\right]+1,$

$r_N=\begin{cases}\dfrac{1-\sqrt{1-2\bar L\eta_N}}{\bar L}&\text{if }L_0=L,\\[1ex]\dfrac{2\eta_N}{2-L_{2N}}&\text{if }L_0<L,\end{cases}$

$\eta_N=\alpha q_0^N\eta,$

and

$L_{2N}=\dfrac{L-L_0+\sqrt{(L-L_0)^2+8LL_0}}{2L}\quad\text{for }L_0\ne 0,$

where $[r]$ denotes the integer part of the real number $r$.

Set

$\bar x_0=y_N.$

Moreover, assume:

(22) $\bar U(y_N,r_N)\subseteq D.$

Then the following hold:

(23) $\|F'(y_N)^{-1}F(y_N)\|\le\eta_N,$
(24) $\|F'(y_N)^{-1}[F'(x)-F'(y)]\|\le L\|x-y\|,$
(25) $\|F'(y_N)^{-1}[F'(x)-F'(y_N)]\|\le L_0\|x-y_N\|,$
(26) $H_N=\bar L\eta_N\le\tfrac{1}{2};$

Newton’s method (2), starting at $x_0=\bar x_0$, converges to a unique solution $x^{\ast\ast}$ of equation $F(x)=0$ in $\bar U(y_N,r_N)$,

and

$N_1\le N_0,$

where

$N_0=N\quad\text{for }L_0=L,$

and

$N_1=N\quad\text{for }L_0<L.$

Moreover, if the inclusion

(27) $\bar U(y_N,r_N)\subseteq\bar U(y_0,s_0^\ast)$

holds, then

$x^{\ast\ast}=x^\ast.$

Note that the parameter $L_0$ is independent of $N$ if $L_0=L$, and that the inclusion (27) holds if and only if

$\|y_N-y_0\|+r_N\le s_0^\ast.$

If $\|y_N-y_0\|\le s_0^\ast$, then by the definition of $r_N$ there exists an integer $N_0$ such that (27) holds. In this case $\tilde N=\max\{N,N_0\}$ should replace $N$ in the Proposition. Hypothesis (22) can then be dropped, since it follows from (19) and (27).

Proof.

Using Theorem 5 and the estimates

$\|F'(y_N)^{-1}F(y_N)\|\le\|F'(y_N)^{-1}F'(y_0)\|\,\|F'(y_0)^{-1}F(y_N)\|\le\dfrac{\|F'(y_0)^{-1}F(y_N)\|}{1-\ell_0\|y_N-y_0\|}\le\dfrac{\|F'(y_0)^{-1}F(y_N)\|}{1-\ell_0 s_0^\ast}\le\alpha q_0^N\eta=\eta_N,$

$\|F'(y_N)^{-1}[F'(x)-F'(y)]\|\le\alpha\|F'(y_0)^{-1}[F'(x)-F'(y)]\|\le\alpha\ell\|x-y\|=L\|x-y\|,$

$\|F'(y_N)^{-1}[F'(x)-F'(y_N)]\|\le\alpha\|F'(y_0)^{-1}[F'(x)-F'(y_N)]\|\le\alpha\ell_2\|x-y_N\|=L_0\|x-y_N\|,$

we obtain

$H_N=\bar L\eta_N=\bar L\alpha q_0^N\eta\le\tfrac{1}{2}$

and

$N_1\le N_0,$

by the choice of $N$.

It follows from Theorems 1 and 3, with $L_0$, $L$, $\eta_N$, $r_N$, $r_N$ replacing $\ell_0$, $\ell$, $\eta$, $s^\ast$, $t^\ast$, respectively, that there exists a unique solution $x^{\ast\ast}$ of equation $F(x)=0$ in $\bar U(y_N,r_N)$.

Moreover, if inclusion (27) holds, then by the uniqueness of the solution $x^\ast$ in $\bar U(y_0,s_0^\ast)$ we deduce $x^{\ast\ast}=x^\ast$.

That completes the proof of Proposition 8. ∎
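The switching strategy of Proposition 8 can be sketched as follows in the scalar case. The function, starting point, and $N=5$ below are the data of Example 9; the stopping tolerance is an illustrative choice.

```python
def hybrid(F, dF, x0, N, tol=1e-12, max_newton=50):
    """Run the modified method (3) for N steps, then switch to Newton's method (2)."""
    d0 = dF(x0)
    y = x0
    for _ in range(N):               # stage 1: frozen-derivative iteration (3)
        y -= F(y) / d0
    x = y                            # x0_bar = y_N becomes Newton's starting point
    for _ in range(max_newton):      # stage 2: full Newton iteration (2)
        step = F(x) / dF(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example 9 data: F(x) = x^3 - 0.49, x0 = y0 = 1, N = 5
root = hybrid(lambda x: x**3 - 0.49, lambda x: 3 * x**2, 1.0, 5)
```

The first stage needs no derivative re-evaluations; it merely carries the iterate into a region where the Kantorovich-type condition for the second stage holds.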

Let us provide an example.

Example 9.

Let $X=Y=\mathbf{R}$, $D=[a,2-a]$, $a\in[0,\tfrac12)$, and define the scalar function $F$ on $D$ by

(28) $F(x)=x^3-a.$

Choose $x_0=y_0=1$. Using (5), (6), (10) and (28), we obtain

(29) $\eta=\tfrac13(1-a),\qquad \ell_0=3-a\qquad\text{and}\qquad \ell=2(2-a).$

The Newton–Kantorovich hypothesis (7) becomes

(30) $h=\tfrac23(1-a)(2-a)>\tfrac12$

for all $a\in[0,\tfrac12)$. That is, according to Theorem 1, there is no guarantee that either method (2) or (3), starting at $x_0=y_0=1$, converges to $x^\ast$.

However, according to Theorem 3, condition (12) becomes

(31) $h_1\le\tfrac12,$

provided that

(32) $a\in[.450339002,\tfrac12).$

Using condition (13) we can do even better, since

(33) $h_0=\tfrac13(1-a)(3-a)\le\tfrac12$

provided that

(34) $a\in\Big[\dfrac{4-\sqrt{10}}{2},\tfrac12\Big),$

which improves the choice for $a$ given by (32). However, only linear, and not quadratic, convergence is guaranteed.

Let us now use $a=.49$. In particular, (7) does not hold, since for $\eta=.17$, $\ell=3.02$:

$h=\ell\eta=.5134>\tfrac12.$

However, (13) holds, since for $\ell_0=2.51$:

$h_0=.4267\le\tfrac12.$

We get

$q_0=.617116205,\qquad \alpha=2.61175848,\qquad s_0^\ast=.24586303,$

(35) $N=[4.0325]+1=5.$

Moreover, we obtain

$\bar x_0=y_4=.78911736.$

Finally, note that $x^\ast=.788373516$.
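The constants reported in Example 9 are easy to reproduce; the following quick check evaluates (29), (7), (13), $q_0$, $s_0^\ast$, and $\alpha$ for $a=0.49$.

```python
from math import sqrt

a = 0.49
eta  = (1 - a) / 3          # (29): eta = (1 - a)/3 = 0.17
ell  = 2 * (2 - a)          # (29): Lipschitz constant on D = [a, 2 - a]
ell0 = 3 - a                # (29): center-Lipschitz constant
h, h0 = ell * eta, ell0 * eta        # quantities (7) and (13)
q0 = 1 - sqrt(1 - 2 * h0)            # geometric quotient of Theorem 5
s0 = q0 / ell0                       # s0* = (1 - sqrt(1 - 2*h0)) / ell0
alpha = 1 / (1 - ell0 * s0)          # the constant of Proposition 8
```

The computed values agree with the figures quoted in the example: $h>\tfrac12$ while $h_0\le\tfrac12$.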

Our motivation for introducing condition (10) instead of (6) for the convergence of the modified Newton’s method (3) can also be seen in the following examples.

Example 10.

Let $X=Y=\mathbf{R}$, $D=[0,\infty)$, $x_0=1$, and define the function $F$ on $D$ by

(36) $F(x)=\dfrac{x^{1+\frac1i}}{1+\frac1i}+c_1x+c_2,$

where $c_1,c_2$ are real parameters and $i>2$ is an integer. Then $F'(x)=x^{\frac1i}+c_1$ is not Lipschitz on $D$. However, the center-Lipschitz condition (10) holds for $\ell_0=(1+c_1)^{-1}$ $(c_1>-1)$.

Indeed, we have

$|F'(x_0)^{-1}[F'(x)-F'(x_0)]|=(1+c_1)^{-1}\big|x^{\frac1i}-x_0^{\frac1i}\big|=\dfrac{(1+c_1)^{-1}|x-x_0|}{x_0^{\frac{i-1}{i}}+\cdots+x^{\frac{i-1}{i}}}\le\ell_0|x-x_0|.$
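A quick numerical sanity check of this bound, with the illustrative parameters $i=3$ and $c_1=1$ (any $c_1>-1$ would do), sweeps a sample of $D=[0,\infty)$:

```python
i, c1 = 3, 1.0                    # illustrative parameters with c1 > -1
ell0 = 1 / (1 + c1)               # center-Lipschitz constant of Example 10
for k in range(2000):
    x = k / 100.0                                 # samples of D = [0, infinity)
    lhs = abs(x ** (1.0 / i) - 1.0) / (1 + c1)    # |F'(x0)^{-1}[F'(x) - F'(x0)]|
    assert lhs <= ell0 * abs(x - 1.0) + 1e-12     # center-Lipschitz bound with ell0
```

The bound is tight at $x=0$, where both sides equal $\ell_0$.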
Example 11.

We consider the integral equation

(37) $u(s)=f(s)+\lambda\displaystyle\int_a^b G(s,t)\,u(t)^{1+\frac1n}\,dt,\qquad n\in\mathbf{N}.$

Here, $f$ is a given continuous function satisfying $f(s)>0$, $s\in[a,b]$, $\lambda$ is a real number, and the kernel $G$ is continuous and positive on $[a,b]\times[a,b]$.

For example, when $G(s,t)$ is the Green kernel, the corresponding integral equation is equivalent to the boundary value problem

$u''=\lambda u^{1+\frac1n},\qquad u(a)=f(a),\quad u(b)=f(b).$

Problems of this type have been considered in [2]–[7].

Equations of the form (37) generalize equations of the type

(38) $u(s)=\displaystyle\int_a^b G(s,t)\,u(t)^n\,dt,$

studied in [4], [6], [7].

Instead of (37) we can try to solve the equation $F(u)=0$, where

$F:\Omega\subseteq C[a,b]\to C[a,b],\qquad \Omega=\{u\in C[a,b]: u(s)\ge 0,\ s\in[a,b]\},$

and

$F(u)(s)=u(s)-f(s)-\lambda\displaystyle\int_a^b G(s,t)\,u(t)^{1+\frac1n}\,dt.$

The norm we consider is the max-norm.

The derivative $F'$ is given by

$F'(u)v(s)=v(s)-\lambda\Big(1+\tfrac1n\Big)\displaystyle\int_a^b G(s,t)\,u(t)^{\frac1n}v(t)\,dt,\qquad v\in\Omega.$

First of all, we notice that $F'$ does not satisfy a Lipschitz-type condition in $\Omega$. Let us consider, for instance, $[a,b]=[0,1]$, $G(s,t)=1$, and $y(t)=0$. Then $F'(y)v(s)=v(s)$ and

$\|F'(x)-F'(y)\|=|\lambda|\Big(1+\tfrac1n\Big)\displaystyle\int_0^1 x(t)^{\frac1n}\,dt.$

If $F'$ were Lipschitz, then

$\|F'(x)-F'(y)\|\le L_1\|x-y\|,$

or, equivalently, the inequality

(39) $\displaystyle\int_0^1 x(t)^{\frac1n}\,dt\le L_2\max_{s\in[0,1]}x(s)$

would hold for all $x\in\Omega$ and for some constant $L_2$. But this is not true. Consider, for example, the functions

$x_j(t)=\dfrac{t}{j},\qquad j\ge 1,\ t\in[0,1].$

If these are substituted into (39), we get

$\dfrac{1}{j^{1/n}\big(1+\frac1n\big)}\le\dfrac{L_2}{j}\ \Longleftrightarrow\ j^{1-\frac1n}\le L_2\Big(1+\tfrac1n\Big),\qquad j\ge 1.$

This inequality cannot hold as $j\to\infty$.

Therefore, condition (6) fails in this case. However, condition (10) holds. To show this, let $x_0(t)=f(t)$ and $\alpha=\min_{s\in[a,b]}f(s)$, $\alpha>0$. Then, for $v\in\Omega$,

$\|[F'(x)-F'(x_0)]v\|=|\lambda|\Big(1+\tfrac1n\Big)\max_{s\in[a,b]}\Big|\displaystyle\int_a^b G(s,t)\big(x(t)^{\frac1n}-f(t)^{\frac1n}\big)v(t)\,dt\Big|\le|\lambda|\Big(1+\tfrac1n\Big)\max_{s\in[a,b]}\displaystyle\int_a^b G(s,t)\,\dfrac{|x(t)-f(t)|}{x(t)^{(n-1)/n}+x(t)^{(n-2)/n}f(t)^{1/n}+\cdots+f(t)^{(n-1)/n}}\,dt\,\|v\|.$

Hence,

$\|F'(x)-F'(x_0)\|\le\dfrac{|\lambda|\big(1+\frac1n\big)}{\alpha^{(n-1)/n}}\max_{s\in[a,b]}\displaystyle\int_a^b G(s,t)\,dt\,\|x-x_0\|\le K\|x-x_0\|,$

where $K=|\lambda|\big(1+\tfrac1n\big)\alpha^{-(n-1)/n}N$, $N=\max_{s\in[a,b]}\displaystyle\int_a^b G(s,t)\,dt$, and $\ell_0=\|F'(x_0)^{-1}\|K$. Finally, note that condition (13) is satisfied for sufficiently small $\lambda$.
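The failure of (39) can also be seen numerically. The sketch below (with the illustrative choice $n=2$) evaluates the ratio $\int_0^1 x_j(t)^{1/n}\,dt\,/\max_t x_j(t)=j^{1-1/n}/(1+\tfrac1n)$ for $x_j(t)=t/j$ and confirms that it is unbounded in $j$, so no Lipschitz constant $L_2$ can exist.

```python
n = 2                                  # illustrative exponent, n > 1

def ratio(j):
    # integral_0^1 (t/j)^(1/n) dt = j^(-1/n) / (1 + 1/n);  max_t x_j(t) = 1/j
    integral = j ** (-1.0 / n) / (1.0 + 1.0 / n)
    return integral / (1.0 / j)        # = j^(1 - 1/n) / (1 + 1/n)

# A Lipschitz constant L2 would have to dominate ratio(j) for every j,
# but here the ratio grows like sqrt(j).
```
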

3. Local Convergence Analysis

In order to cover the local convergence of methods (2) and (3), we start with the following theorem from [7]:

Theorem 12.

Let $F:D\subseteq X\to Y$ be a differentiable operator. Assume there exist $x^\ast\in D$ and a constant $K>0$ such that:

(40) $F'(x^\ast)^{-1}\in L(Y,X),\qquad F(x^\ast)=0,$
(41) $\|F'(x^\ast)^{-1}[F'(x)-F'(y)]\|\le K\|x-y\|\quad\text{for all }x,y\in D,$

and

(42) $\bar U(x^\ast,r_{RN})\subseteq D,$

where

(43) $r_{RN}=\dfrac{2}{3K}.$

Then

  1. (a)

    the sequence $\{x_n\}$ generated by Newton’s method (2) is well defined, remains in $\bar U(x^\ast,r_{RN})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $x_0\in U(x^\ast,r_{RN})$, and

    (44) $\|x_{n+1}-x^\ast\|\le\dfrac{K\|x_n-x^\ast\|^2}{2(1-K\|x_n-x^\ast\|)}\quad(n\ge 0).$

    If (42) is replaced by

    (45) $\bar U(x^\ast,r_{RM})\subseteq D,$

    where

    (46) $r_{RM}=\dfrac{2}{5K},$

    then

  2. (b)

    the sequence $\{y_n\}$ generated by the modified Newton’s method (3) is well defined, remains in $\bar U(x^\ast,r_{RM})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $y_0\in U(x^\ast,r_{RM})$, and

    (47) $\|y_{n+1}-x^\ast\|\le\dfrac{K\big[\|x^\ast-y_0\|+\tfrac12\|y_n-x^\ast\|\big]}{1-K\|x^\ast-y_0\|}\,\|y_n-x^\ast\|\quad(n\ge 0).$
Proof.

The proof of (a) can be found in [8], whereas the proof of (b) is a special case of part (b) of our Theorem 13 below. ∎

It follows from condition (41) that there exists $K_0\in(0,K]$ such that:

(48) $\|F'(x^\ast)^{-1}[F'(x)-F'(x^\ast)]\|\le K_0\|x-x^\ast\|\quad\text{for all }x\in D.$

Then, using a combination of conditions (41) and (48) for method (2), and only condition (48) for method (3), we can show:

Theorem 13.

Let $F:D\subseteq X\to Y$ be a differentiable operator.

Assume there exist $x^\ast\in D$ and constants $K_0>0$, $K>0$ such that:

$F'(x^\ast)^{-1}\in L(Y,X),\qquad F(x^\ast)=0,$
$\|F'(x^\ast)^{-1}[F'(x)-F'(x^\ast)]\|\le K_0\|x-x^\ast\|\quad\text{for all }x\in D,$
$\|F'(x^\ast)^{-1}[F'(x)-F'(y)]\|\le K\|x-y\|\quad\text{for all }x,y\in D,$

and

(49) $\bar U(x^\ast,r_{AN})\subseteq D,$

where

(50) $r_{AN}=\dfrac{2}{2K_0+K}.$

  1. (a)

    Then the sequence $\{x_n\}$ generated by Newton’s method (2) is well defined, remains in $\bar U(x^\ast,r_{AN})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $x_0\in U(x^\ast,r_{AN})$, and

    (51) $\|x_{n+1}-x^\ast\|\le\dfrac{K\|x_n-x^\ast\|^2}{2(1-K_0\|x_n-x^\ast\|)}\quad(n\ge 0).$

    Using only the center-Lipschitz condition (48), and with (49) replaced by

    (52) $\bar U(x^\ast,r_{AM})\subseteq D,$

    where

    (53) $r_{AM}=\dfrac{2}{5K_0},$

  2. (b)

    the sequence $\{y_n\}$ generated by the modified Newton’s method (3) is well defined, remains in $\bar U(x^\ast,r_{AM})$ for all $n\ge 0$, and converges to $x^\ast$ provided that $y_0\in U(x^\ast,r_{AM})$,

    and

    (54) $\|y_{n+1}-x^\ast\|\le\dfrac{K_0\big[\|x^\ast-y_0\|+\tfrac12\|y_n-x^\ast\|\big]\,\|y_n-x^\ast\|}{1-K_0\|x^\ast-y_0\|}\quad(n\ge 0).$
Proof.

(a) The proof can be found in [2].

(b) Let $x\in\bar U(x^\ast,r_{AM})$. Then, using (48), we get

(55) $\|F'(x^\ast)^{-1}[F'(x)-F'(x^\ast)]\|\le K_0\|x-x^\ast\|\le K_0 r_{AM}<1$

by the choice of $r_{AM}$. It follows from (55) and the Banach lemma on invertible operators [4], [6] that $F'(x)^{-1}$ exists, and

(56) $\|F'(x)^{-1}F'(x^\ast)\|\le\dfrac{1}{1-K_0\|x-x^\ast\|}\le\dfrac{1}{1-K_0 r_{AM}}.$

In particular, by hypothesis, $y_0\in U(x^\ast,r_{AM})\subseteq\bar U(x^\ast,r_{AM})$.

Let us assume $y_k\in\bar U(x^\ast,r_{AM})$ for $k=0,1,\ldots,n$. Then, using (3), (48) and (53), we obtain in turn

(57) $y_{n+1}-x^\ast=y_n-x^\ast-F'(y_0)^{-1}F(y_n)=F'(y_0)^{-1}\big[F'(y_0)(y_n-x^\ast)-\big(F(y_n)-F(x^\ast)\big)\big]=F'(y_0)^{-1}F'(x^\ast)\,F'(x^\ast)^{-1}\displaystyle\int_0^1\big\{[F'(x^\ast)-F'(x^\ast+t(y_n-x^\ast))]+[F'(y_0)-F'(x^\ast)]\big\}(y_n-x^\ast)\,dt,$

and

(58) $\|y_{n+1}-x^\ast\|\le\dfrac{K_0\displaystyle\int_0^1\big[\|x^\ast-y_0\|+t\|y_n-x^\ast\|\big]dt}{1-K_0\|x^\ast-y_0\|}\,\|y_n-x^\ast\|=\dfrac{K_0\big[\|x^\ast-y_0\|+\tfrac12\|y_n-x^\ast\|\big]}{1-K_0\|x^\ast-y_0\|}\,\|y_n-x^\ast\|<\|y_n-x^\ast\|\le r_{AM},$

which shows (54) and $\lim_{n\to\infty}y_n=x^\ast$.

That completes the proof of Theorem 13. ∎

Remark 14.

In general,

(59) $K_0\le K$

holds, and $K/K_0$ can be arbitrarily large. If $K_0=K$, Theorem 13 reduces to Theorem 12. Otherwise, Theorem 13 improves Theorem 12 under the same hypotheses for method (2), and under the same or less computational cost for method (3): finer estimates on the distances $\|x_n-x^\ast\|$ $(n\ge 0)$ are obtained, and the radius of convergence is enlarged. In particular, we have

(60) $r_{RN}<r_{AN}$
(61) $r_{RM}<r_{AM}.$

Moreover, since

(62) $r_{RM}<r_{RN},$

iterates from method (3) cannot, in the setting of Theorem 12, be used to find the initial guess $x_0$ for the faster method (2).

Examples where estimate (59) holds as a strict inequality can be found in [2]–[4]. Using Theorem 13, however, method (3) can produce a starting point for method (2), as follows. Let $p\in(0,1)$ be the desired rate of convergence for method (3). Then by (54), for $\|y_0-x^\ast\|\le\dfrac{2p}{(3+2p)K_0}$, we have:

$\|y_{n+1}-x^\ast\|\le p\|y_n-x^\ast\|.$

Choose

(63) $M=\left[\dfrac{\ln\dfrac{5K_0}{2K_0+K}}{\ln p}\right]+1;$

then it can easily be seen that

$p^M r_{AM}\le r_{AN},$

and consequently, according to (a) of Theorem 13, we can set

$x_0=y_M.$

In the case $K_0=K$, according to (a) of Theorem 12, we can set

$x_0=y_{M_1},$

where

$M_1=\left[\dfrac{\ln\tfrac{5}{3}}{\ln p}\right]+1$

(simply set $K_0=K$ in (63)). Note that

$M\ge M_1.$

Finally, we observe that

$r_{RN}<r_{AM}$

if

$\dfrac{K}{K_0}>\dfrac{5}{3},$

which can certainly happen, since $K/K_0$ can be arbitrarily large [2]. ∎
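The relations between the four radii and the switch index $M$ of (63) can be illustrated with sample constants; here $K=4$, $K_0=1$, $p=\tfrac12$ are arbitrary choices with $K/K_0>\tfrac53$, not values from the paper.

```python
from math import log

K, K0, p = 4.0, 1.0, 0.5      # sample constants with K/K0 > 5/3
r_RN = 2 / (3 * K)            # (43): Rheinboldt radius for Newton's method (2)
r_RM = 2 / (5 * K)            # (46): radius for the modified method (3), K only
r_AN = 2 / (2 * K0 + K)       # (50): enlarged radius for method (2)
r_AM = 2 / (5 * K0)           # (53): enlarged radius for method (3), K0 only

# (63): number of modified-Newton steps needed before switching to (2);
# int() matches the integer-part bracket [.] for the positive quotient here
M = int(log(5 * K0 / (2 * K0 + K)) / log(p)) + 1
```

After $M$ steps at rate $p$, the iterate of method (3) lies within the larger Newton radius $r_{AN}$, so method (2) can take over.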

The ideas presented here can be extended to other Newton-type iterative methods [1], [3], [4], [5] along the same lines.

4. Conclusion

The Newton–Kantorovich hypothesis (7), famous for its simplicity and clarity, is the crucial sufficient semilocal convergence condition for both the quadratically convergent Newton’s method (2) and the linearly convergent modified Newton’s method (3) [6]. There exist simple numerical examples showing that both methods may converge even when condition (7) is violated [4], [6]. In fact, when condition (7) is violated for a certain initial guess, it is common practice to still use Newton’s method (2) for a few iterates until condition (7) is satisfied. However, this approach is a shot in the dark, and it does not work in general [4], [7]. Here we have introduced an approach that works when condition (7) is violated. First, we showed that condition (13) is a weaker sufficient convergence hypothesis for the modified Newton’s method (3) than (7); that is, we extended the convergence region of method (3). Then, if (7) is violated but (13) holds, we start with the slower method (3) until we reach (after a finite number of steps) an iterate at which condition (7) also holds. We then continue with the faster method (2) from this iterate.

References

  • [1] Amat, S., Busquier, S. and Candela, V., A class of quasi-Newton generalized Steffensen methods on Banach spaces, J. Comput. Appl. Math., 149, pp. 397–408, 2002.
  • [2] Argyros, I.K., On the Newton–Kantorovich hypothesis for solving equations, J. Comput. Appl. Math., 169, pp. 315–332, 2004.
  • [3] Argyros, I.K., A unifying local-semilocal convergence analysis and applications for two-point Newton-like methods in Banach space, J. Math. Anal. Applic., 298, 2, pp. 374–397, 2004.
  • [4] Argyros, I.K., Computational theory of iterative methods, Studies in Computational Mathematics, 15 (Editors: C.K. Chui and L. Wuytack), Elsevier Publ. Co., New York, 2007.
  • [5] Gutierrez, J.M., Hernandez, M.A. and Salanova, M.A., Accessibility of solutions by Newton’s method, Intern. J. Comput. Math., 57, pp. 239–247, 1995.
  • [6] Kantorovich, L.V. and Akilov, G.P., Functional Analysis in Normed Spaces, Pergamon Press, Oxford, 1982.
  • [7] Krasnoselskii, M.A., Vainikko, G.M., Zabreiko, P.P., Rutitskii, Ya.B. and Stetsenko, V.Ya., Approximate Solution of Operator Equations, Wolters-Noordhoff Publ., Groningen, 1972.
  • [8] Rheinboldt, W.C., An adaptive continuation process for solving systems of nonlinear equations, Polish Academy of Science, Banach Ctr. Publ., 3, pp. 129–142, 1977.