<!DOCTYPE html>
<html lang="en">
<head>
<script>
  MathJax = {
    tex: {
      inlineMath: [['\\(','\\)']]
    }
  };
</script>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
</script>
<meta name="generator" content="plasTeX" />
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Global Convergence of the Armijo epsilon steepest descent Algorithm</title>
<link rel="stylesheet" href="/var/www/clients/client1/web1/web/files/jnaat-files/journals/1/articles/styles/theme-white.css" />
</head>

<body>

<div class="wrapper">

<div class="content">
<div class="content-wrapper">


<div class="main-text">


<div class="titlepage">
<h1>Global Convergence of the Armijo epsilon steepest descent Algorithm</h1>
<p class="authors">
<span class="author">Nour Eddine Rahali\(^{\ast }\), Nacera Djeghaba\(^{\S }\), Rachid Benzine\(^{\bullet }\)</span>
</p>
<p class="date">January 23, 2012.</p>
</div>
<p>\(^{\ast }\)Department of Mathematics, Souk Ahras University, Algeria, e-mail: <span class="ttfamily">rahali.noureddine@yahoo.fr</span>. </p>
<p>\(^{\S }\)Department of Mathematics, Badji Mokhtar University, B.P. 12, Annaba, Algeria,<br />e-mail: <span class="ttfamily">djeghaba.nacera@yahoo.fr</span>. </p>
<p>\(^{\bullet }\)Department of Mathematics, Badji Mokhtar University, B.P. 12, Annaba, Algeria,<br />e-mail: <span class="ttfamily">rachid.benzine@univ-annaba.org</span>. </p>

<div class="abstract"><p> In this article, we study the unconstrained minimization problem </p>
<div class="displaymath" id="a0000000002">
  \[  (P)\, \, \, \min \left\{  f(x):x\in \mathbb {R}^{n}\right\}  .  \]
</div>
<p> where \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuously differentiable function. We introduce a new algorithm which accelerates the convergence of the steepest descent method. We further establish the global convergence of this algorithm in the case of Armijo inexact line search. </p>
<p><b class="bf">MSC.</b> 90C30; 65K05; 49M37 </p>
<p><b class="bf">Keywords.</b> Unconstrained optimization, global convergence, steepest descent algorithm, \(\varepsilon \)-algorithm, Armijo inexact line search. </p>
</div>
<h1 id="a0000000003">1 INTRODUCTION</h1>
<p>Consider the following unconstrained minimization problem: </p>
<div class="equation" id="a0000000004">
<p>
  <div class="equation_content">
    \begin{equation}  \min \left\{  f(x):x\in \mathbb {R}^{n}\right\}  \tag {1}\end{equation}
  </div>
  <span class="equation_label">1</span>
</p>
</div>
<p> where \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuously differentiable function. Numerical methods for problem (1) are iterative. An initial point \(x_{1}\) should be given, and at the \(k\)-th iteration a new iterate point \(x_{k+1}\) is to be computed by using the information at the current iterate point \(x_{k}\) and those at the previous points. It is hoped that the sequence \(\left\{  x_{k}\right\}  _{k\in \mathbb {N}}\) generated will converge to the solution of (1). </p>
<p>Most numerical methods for unconstrained optimization can be classified into two groups, namely line search algorithms and trust region algorithms. A line search algorithm chooses or computes a search direction \(d_{k}\) at the \(k\)-th iteration, and it sets the next iterate point by </p>
<div class="displaymath" id="a0000000005">
  \[  x_{k+1}=x_{k}+\alpha _{k}d_{k} \]
</div>
<p> where \(d_{k}\) is a descent direction of \(f(x)\) at \(x_{k}\) and \(\alpha _{k}\) is a step size. The search direction \(d_{k}\) is generally required to satisfy </p>
<div class="displaymath" id="a0000000006">
  \[  \nabla f\left( x_{k}\right) ^{t}d_{k}{\lt}0,  \]
</div>
<p> which guarantees that \(d_{k}\) is a descent direction of \(f(x)\) at \(x_{k}\  \)([5], [28]). In line search methods, if the search direction \(d_{k}\) is given at the \(k\)-th iteration, then the next task is to find a step size \(\alpha _{k}\) along the search direction. The ideal line search rule is the exact one that satisfies</p>
<div class="displaymath" id="a0000000007">
  \[  f(x_{k}+\alpha _{k}d_{k})=\underset {\alpha {\gt}0}{\min }f(x_{k}+\alpha d_{k}).  \]
</div>
<p> In fact, the exact step size is difficult or even impossible to obtain in practical computation. Thus many researchers constructed inexact line search rules, such as the Armijo rule, the Goldstein rule and the Wolfe rule ([4], [26], [34]). In this article, we use Armijo’s line search ([4]), which may be summarized as follows: </p>
<p><i class="itshape">Armijo’s line search </i>([4]) </p>
<p>Armijo’s Rule is driven by two parameters, \(0{\lt}c{\lt}1\) and \(\beta {\gt}1,\) which respectively prevent the accepted step length from being too large or too small. (Typical values are \(c=0.2,\  \beta =2\).) Suppose that \(\nabla f\left( x_{k}\right) ^{t}d_{k}{\lt}0.\) Define the functions \(\varphi (\alpha )\) and \(\widehat{\varphi }(\alpha )\) as follows:</p>
<div class="displaymath" id="a0000000008">
  \[  \varphi (\alpha )=f(x_{k}+\alpha d_{k}),\  \  \alpha \geq 0  \]
</div>
<div class="displaymath" id="a0000000009">
  \[  \widehat{\varphi }(\alpha )=\varphi (0)+\alpha c\varphi ^{\prime }(0)=f(x_{k})+\alpha c\nabla f\left( x_{k}\right) ^{t}d_{k},\  \  \  \alpha \geq 0,\  0{\lt}c{\lt}1.  \]
</div>
<p>A step length \(\overline{\alpha }\) is considered to be acceptable, provided that </p>
<div class="displaymath" id="a0000000010">
  \[  \varphi \left( \overline{\alpha }\right) \leq \widehat{\varphi }(\overline{\alpha }).  \]
</div>
<p> However, in order to prevent \(\overline{\alpha }\) from being too small, Armijo Rule also requires that the following inequality holds </p>
<div class="displaymath" id="a0000000011">
  \[  \varphi \left( \beta \overline{\alpha }\right) {\gt}\widehat{\varphi }(\beta \overline{\alpha }),  \]
</div>
<p> which yields an acceptable range for \(\overline{\alpha }.\) </p>
<p>Frequently, Armijo Rule is adopted in the following manner. A fixed step length parameter \(\overline{\alpha }\) is chosen. If \(\varphi \left( \overline{\alpha }\right) \leq \widehat{\varphi }(\overline{\alpha }),\) then either \(\overline{\alpha }\) is itself selected as the step size, or \(\overline{\alpha }\) is sequentially doubled (assuming \(\beta =2\)) to find the largest integer \(t\geq 0\) for which </p>
<div class="displaymath" id="a0000000012">
  \[  \varphi \left( 2^{t}\overline{\alpha }\right) \leq \widehat{\varphi }(2^{t}\overline{\alpha }).  \]
</div>
<p> On the other hand, if \(\varphi \left( \overline{\alpha }\right) {\gt}\widehat{\varphi }(\overline{\alpha }),\) then \(\overline{\alpha }\) is sequentially halved to find the smallest integer \(t\geq 1\) for which </p>
<div class="displaymath" id="a0000000013">
  \[  \varphi \left( \tfrac {\overline{\alpha }}{2^{t}}\right) \leq \widehat{\varphi }(\tfrac {\overline{\alpha }}{2^{t}}).  \]
</div>
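<p>As an illustration, the doubling/halving procedure just described can be sketched in a few lines of Python. This is our own sketch, not code from the paper; the function and parameter names are ours, and the loops assume \(d_{k}\) is a genuine descent direction so that an acceptable step exists.</p>

```python
import numpy as np

def armijo_step(f, grad_fk, xk, dk, alpha_bar=1.0, c=0.2, beta=2.0):
    """Armijo rule with doubling/halving (illustrative sketch).

    f        -- objective function
    grad_fk  -- gradient of f at xk; dk must satisfy grad_fk @ dk < 0
    alpha_bar, c, beta -- fixed trial step and parameters 0 < c < 1, beta > 1
    """
    phi = lambda a: f(xk + a * dk)                      # phi(alpha)
    phi_hat = lambda a: f(xk) + a * c * (grad_fk @ dk)  # linear upper model
    alpha = alpha_bar
    if phi(alpha) <= phi_hat(alpha):
        # acceptable: keep doubling while the doubled step stays acceptable
        while phi(beta * alpha) <= phi_hat(beta * alpha):
            alpha *= beta
    else:
        # too large: halve until the Armijo inequality holds
        while phi(alpha) > phi_hat(alpha):
            alpha /= beta
    return alpha
```

For the quadratic \(f(x)=\Vert x\Vert ^{2}\) with \(x_{k}=1\) and \(d_{k}=-\nabla f(x_{k})\), both branches terminate at the same step \(\alpha =0.5\), whether the initial trial \(\overline{\alpha }\) is too large or too small.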
<p>The steepest descent method is one of the simplest and the most fundamental minimization methods for unconstrained optimization. Since it uses the negative gradient as its descent direction, it is also called the gradient method. </p>
<p>For many problems, the steepest descent method is very slow. Although the method usually works well in the early steps, as a stationary point is approached, it descends very slowly with zigzagging phenomena. There are some ways to overcome these difficulties of zigzagging by deflecting the gradient. Rather than moving along \(d=-\nabla f(x)\), we can move along \(d=-D\nabla f(x)\  \)([8], [9], [10], [12], [13], [14],&#160;[15], [17], [21], [22], [23], [24], [25], [31], [32], [33]) or along \(d=-\nabla f(x)+g\  \)([18], [19], [20],  [21], [27],  [29], [30]), where \(D\) is an appropriate matrix and \(g\) is an appropriate vector. </p>
<p>In [16] Benzine and Djeghaba provided another solution to this problem by accelerating the convergence of the gradient method. </p>
<p>They achieved this goal by designing a new algorithm, named the epsilon steepest descent algorithm, in which the Wynn epsilon algorithm ([1], [6], [35], [36]) and exact line searches played a prominent role. </p>
<p>In this work we accelerate the convergence of the gradient method by using the Florent Cordellier epsilon algorithm ([11]). </p>
<p>We study the global convergence of the new algorithm, named the Armijo epsilon steepest descent algorithm, by using Armijo inexact line searches ([4]). </p>
<h1 id="a0000000014">2 THE EPSILON ALGORITHM</h1>
<p>The Epsilon Algorithm is due to P. Wynn ([35], [36]). <br />Given a sequence \(\left\{  x_{k}\right\}  _{k\in \mathbb {N}}\) with \(x_{k}\in \mathbb {R}^{n}\), the coordinates of \(x_{k}\) will be denoted as follows: </p>
<div class="displaymath" id="a0000000015">
  \[  x_{k}=(x_{k}^{1},x_{k}^{2},...,x_{k}^{i},...,x_{k}^{n})\in \mathbb {R}^{n} \]
</div>
<p> For \(i\in \left\{  1,2,...,n\right\}  ,\) the Epsilon Algorithm calculates quantities with two indices, \(\varepsilon _{j}^{k,i}\  (j,k=0,1,...),\) as follows:</p>
<div class="displaymath" id="a0000000016">
  \begin{align} & \varepsilon _{-1}^{k,i} =0\, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \varepsilon _{0}^{k,i}\, =x_{k}^{i}\quad \quad k=0,1,...\tag {2}\\ & \varepsilon _{j+1}^{k,i} =\varepsilon _{j-1}^{k+1,i}+\tfrac {1}{\varepsilon _{j}^{k+1,i}-\varepsilon _{j}^{k,i}}\quad \quad j,k=0,1,...\nonumber \end{align}
</div>
<p> For \(i\in \left\{  1,2,...,n\right\}  ,\  \)these numbers can be placed in an array as follows: </p>
<div class="table"  id="a0000000017">
  <table class="tabular">
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{0,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{0}^{0,i}=x_{0}^{i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{1,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{1}^{0,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{0}^{1,i}=x_{1}^{i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{2}^{0,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{2,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{1}^{1,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{3}^{0,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{0}^{2,i}=x_{2}^{i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{2}^{1,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{4}^{0,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{3,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{1}^{2,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{3}^{1,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{5}^{0,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{0}^{3,i}=x_{3}^{i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{2}^{2,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{4}^{1,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{6}^{0,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{4,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{1}^{3,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{3}^{2,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{5}^{1,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{7}^{0,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{0}^{4,i}=x_{4}^{i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{2}^{3,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{4}^{2,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{6}^{1,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(-\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{5,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{1}^{4,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{3}^{3,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{5}^{2,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{7}^{1,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{0}^{5,i}=x_{5}^{i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{2}^{4,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{4}^{3,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{6}^{2,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(-\)</p>

    </td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{6,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{1}^{5,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{3}^{4,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{5}^{3,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{7}^{2,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{0}^{6,i}=x_{6}^{i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{2}^{5,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{4}^{4,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{6}^{3,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(-\)</p>

    </td>
  </tr>
  <tr>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p>\(\varepsilon _{-1}^{7,i}=0\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{1}^{6,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{3}^{5,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{5}^{4,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">
      <p> \(\varepsilon _{7}^{3,i}\) </p>

    </td>
    <td  style="text-align:left" 
        rowspan=""
        colspan="">&nbsp;</td>
  </tr>
</table> <figcaption>
  <span class="caption_title">Table</span> 
  <span class="caption_ref">1</span> 
  <span class="caption_text">Epsilon Algorithm</span> 
</figcaption>
</div>
<p>This array is called the \(\varepsilon \)-array. In this array the lower index denotes a column while the upper index denotes a diagonal. </p>
<p>For \(i\in \left\{  1,2,...,n\right\}  ,\  \)the Epsilon algorithm relates the numbers located at the four vertices of a rhombus: </p>
<div class="displaymath" id="a0000000018">
  \[ \begin{array}[c]{ccc}&  \varepsilon _{j}^{k,i} & \\ \varepsilon _{j-1}^{k+1,i} & &  \varepsilon _{j+1}^{k,i}\\ &  \varepsilon _{j}^{k+1,i} & \end{array}  \]
</div>
<p>To calculate the quantities \(\varepsilon _{j+1}^{k,i}\),  we need to know the numbers \(\varepsilon _{j-1}^{k+1,i},\)<br />\(\  \varepsilon _{j}^{k+1,i}\  \)and  \(\varepsilon _{j}^{k,i}.\) </p>
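<p>As a sketch, recursion (2) for a single coordinate sequence can be coded column by column. This is our own illustrative Python (the names are not from the paper), and it carries no safeguard against vanishing differences:</p>

```python
def epsilon_array(x, max_col):
    """Wynn's recursion (2) applied to a scalar sequence x = [x_0, x_1, ...].

    Returns a dict eps with eps[j][k] = eps_j^k; each new column is one
    entry shorter than the previous one.
    Minimal sketch: no protection against division by zero.
    """
    eps = {-1: [0.0] * (len(x) + 1), 0: list(x)}
    for j in range(max_col):
        prev, cur = eps[j - 1], eps[j]
        # eps_{j+1}^k = eps_{j-1}^{k+1} + 1 / (eps_j^{k+1} - eps_j^k)
        eps[j + 1] = [prev[k + 1] + 1.0 / (cur[k + 1] - cur[k])
                      for k in range(len(cur) - 1)]
    return eps
```

For partial sums of a geometric series, the column \(\varepsilon _{2}\) already recovers the limit exactly; this acceleration effect is what the algorithm of the next section exploits.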
<h1 id="a0000000019">3 THE ARMIJO EPSILON STEEPEST DESCENT ALGORITHM</h1>
<p>To construct our algorithm, we use the column \(\varepsilon _{2}^{k,i}\  (i=1,2,...,n).\) Given a sequence \(\left\{  x_{k}^{i}\right\}  _{k\in \mathbb {N}}\  (i=1,2,...,n),\) F. Cordellier ([11]) proposed another formula to calculate the epsilon algorithm of order 2. The quantities \(\varepsilon _{2}^{k,i}\) can be calculated as follows:</p>
<div class="equation" id="a0000000020">
<p>
  <div class="equation_content">
    \begin{equation}  \varepsilon _{2}^{k,i}=x_{k+1}^{i}+\left[ \tfrac {1}{x_{k+2}^{i}-x_{k+1}^{i}}-\tfrac {1}{x_{k+1}^{i}-x_{k}^{i}}\right] ^{-1},\  \  (i=1,2,...,n)\tag {3}\end{equation}
  </div>
  <span class="equation_label">3</span>
</p>
</div>
<p> To calculate \(\varepsilon _{2}^{k,i},\) we use the elements \(x_{k}^{i},\) \(x_{k+1}^{i}\) and \(x_{k+2}^{i}\  (i=1,2,...,n).\) Numerical calculations ([11]) showed that the epsilon algorithm of order 2 with the Cordellier formula (3) is more stable than the Wynn epsilon algorithm (2). </p>
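<p>Formula (3) translates directly into componentwise vector code. The following is an illustrative sketch in Python (the function name is ours); it assumes all the differences and the bracketed term are nonzero:</p>

```python
import numpy as np

def eps2_cordellier(r, s, t):
    """Cordellier's formula (3): componentwise epsilon_2 built from three
    successive iterates r = x_k, s = x_{k+1}, t = x_{k+2} (numpy arrays).
    Assumes t - s, s - r and the bracketed difference are all nonzero."""
    return s + 1.0 / (1.0 / (t - s) - 1.0 / (s - r))
```

On a componentwise geometric sequence such as \(1,\  1.5,\  1.75,\  ...\) (limit 2), the formula returns the limit exactly, illustrating the acceleration it provides.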
<p>We are now in a position to introduce our new algorithm: the Armijo epsilon steepest descent algorithm. </p>
<p><b class="bfseries">The Armijo epsilon steepest descent algorithm</b> <br /><b class="bfseries">Initialization step: </b>Choose an initial point \(x_{0}\in \mathbb {R}^{n}\). The coordinates of \(x_{0}\) will be denoted as follows: </p>
<div class="displaymath" id="a0000000021">
  \[  x_{0}=(x_{0}^{1},x_{0}^{2},...,x_{0}^{i},...,x_{0}^{n})\in \mathbb {R}^{n} \]
</div>
<p> Let \(k=0\) and go to the Main Step.</p>
<p><br /><b class="bfseries">Main Step:</b> Starting with the vector \(x_{k},\)</p>
<div class="displaymath" id="a0000000022">
  \[  x_{k}=(x_{k}^{1},x_{k}^{2},...,x_{k}^{i},...,x_{k}^{n}).  \]
</div>
<p> If \(\left\Vert \nabla f(x_{k})\right\Vert =0,\)  stop. Otherwise, set \(r_{k}=x_{k}\) and compute the vectors \(s_{k}\) and \(t_{k}\) by applying the steepest descent algorithm twice, with Armijo inexact line search: </p>
<div class="displaymath" id="a0000000023">
  \[  s_{k}=r_{k}-\lambda _{k}\nabla f(r_{k}),  \]
</div>
<p> and</p>
<div class="displaymath" id="a0000000024">
  \[  t_{k}=s_{k}-\beta _{k}\nabla f(s_{k}),  \]
</div>
<p> where \(\lambda _{k}\) and \(\beta _{k}\) are positive scalars obtained by the Armijo inexact line search.<br />If </p>
<div class="displaymath" id="a0000000025">
  \[  s_{k}^{i}-r_{k}^{i}\neq 0,\  \  t_{k}^{i}-s_{k}^{i}\neq 0\  \  \mathrm{and}\  \  \tfrac {1}{t_{k}^{i}-s_{k}^{i}}-\tfrac {1}{s_{k}^{i}-r_{k}^{i}}\neq 0,\  \  i=1,...,n,  \]
</div>
<p> then let </p>
<div class="displaymath" id="a0000000026">
  \[  \varepsilon _{2}^{k,i}=s_{k}^{i}+\left[ \tfrac {1}{t_{k}^{i}-s_{k}^{i}}-\tfrac {1}{s_{k}^{i}-r_{k}^{i}}\right] ^{-1},\  \  \  \  \  \  i=1,...,n,  \]
</div>
<p> and </p>
<div class="displaymath" id="a0000000027">
  \[  \varepsilon _{2}^{k}=\left( \varepsilon _{2}^{k,1},...,\varepsilon _{2}^{k,i},...,\varepsilon _{2}^{k,n}\right) .  \]
</div>
<p> If  \(f\left( \varepsilon _{2}^{k}\right) {\lt}f(t_{k}),\) let \(x_{k+1}=\varepsilon _{2}^{k}.\) Replace \(k\) by \(k+1\) and go to the Main Step.<br />If  \(f\left( \varepsilon _{2}^{k}\right) \geq f(t_{k})\) or if </p>
<div class="displaymath" id="a0000000028">
  \[  s_{k}^{i_{0}}-r_{k}^{i_{0}}=0\  \text{\  or }\  t_{k}^{i_{0}}-s_{k}^{i_{0}}=0\  \  \text{\  or }\  \tfrac {1}{t_{k}^{i_{0}}-s_{k}^{i_{0}}}-\tfrac {1}{s_{k}^{i_{0}}-r_{k}^{i_{0}}}=0\  \text{ for some }i_{0}\in \left\{  1,...,n\right\}  ,  \]
</div>
<p> let \(x_{k+1}=t_{k}.\) Replace \(k\) by \(k+1\) and go to the Main Step. </p>
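<p>A minimal sketch of one pass of the Main Step is given below, under our own naming: <span class="ttfamily">armijo</span> stands for any routine returning an Armijo step size, and the degenerate branches fall back to \(t_{k}\) as in the description above.</p>

```python
import numpy as np

def esd_step(f, grad, x, armijo):
    """One Main Step of the Armijo epsilon steepest descent algorithm
    (illustrative sketch). armijo(f, g, x, d) must return an Armijo
    step size for the descent direction d at x."""
    r = x
    g_r = grad(r)
    s = r - armijo(f, g_r, r, -g_r) * g_r          # first steepest descent step
    g_s = grad(s)
    t = s - armijo(f, g_s, s, -g_s) * g_s          # second steepest descent step
    d1, d2 = s - r, t - s
    # Cordellier formula (3), componentwise, only when every term is defined
    if np.all(d1 != 0) and np.all(d2 != 0):
        denom = 1.0 / d2 - 1.0 / d1
        if np.all(denom != 0):
            eps2 = s + 1.0 / denom
            if f(eps2) < f(t):
                return eps2            # accelerated iterate x_{k+1}
    return t                           # fallback iterate x_{k+1}
```

On a one-dimensional quadratic with a fixed step size standing in for the line search, the two gradient steps form a geometric sequence, so the \(\varepsilon _{2}\) correction lands on the minimizer in a single pass.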
<p><div class="remark_thmwrapper " id="a0000000029">
  <div class="remark_thmheading">
    <span class="remark_thmcaption">
    Remark
    </span>
    <span class="remark_thmlabel">1</span>
  </div>
  <div class="remark_thmcontent">
  <p>According to the Algorithm, the vectors \(s_{k}\) and \(t_{k}\) are obtained by applying the steepest descent method twice, with Armijo inexact line search. Then we have</p>
<div class="displaymath" id="a0000000030">
  \[  f(s_{k}){\lt}f(r_{k})=f(x_{k})  \]
</div>
<p> and</p>
<div class="displaymath" id="a0000000031">
  \[  f(t_{k}){\lt}f(s_{k})  \]
</div>
<p> Now, by considering the Algorithm, if the calculation of \(\varepsilon _{2}^{k}\) is possible, two cases arise:<br /><b class="bfseries">a) </b> \(f\left( \varepsilon _{2}^{k}\right) {\lt}f\left( t_{k}\right) .\) Then we have </p>
<div class="displaymath" id="a0000000032">
  \[  f\left( x_{k+1}\right) =f\left( \varepsilon _{2}^{k}\right) {\lt}f\left( x_{k}\right)  \]
</div>
<p> <b class="bfseries">b) </b>\(f\left( \varepsilon _{2}^{k}\right) \geq f\left( t_{k}\right) ,\) or the calculation of \(\varepsilon _{2}^{k}\) is not possible. In this case, according to the algorithm, we have</p>
<div class="displaymath" id="a0000000033">
  \[  f\left( x_{k+1}\right) =f(t_{k}){\lt}f\left( x_{k}\right)  \]
</div>
<p> In conclusion, the Armijo epsilon steepest descent Algorithm guarantees </p>
<div class="equation" id="a0000000034">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( x_{k+1}\right) <f\left( x_{k}\right) ,\  \  k=0,1,2,...\tag {4}\end{equation}
  </div>
  <span class="equation_label">4</span>
</p>
</div>
<p> <span class="qed">□</span></p>

  </div>
</div> </p>
<h1 id="a0000000035">4 Global Convergence <br />of the Armijo epsilon steepest descent algorithm</h1>
<p>The foregoing preparatory results enable us to establish the following theorem. </p>
<p><div class="theorem_thmwrapper " id="a0000000036">
  <div class="theorem_thmheading">
    <span class="theorem_thmcaption">
    Theorem
    </span>
    <span class="theorem_thmlabel">2</span>
  </div>
  <div class="theorem_thmcontent">
  <p>For the unconstrained minimization problem <span class="rm">(1)</span>, let \(x_{0}\) be a starting point of the Armijo epsilon steepest descent Algorithm, and assume that the following assumptions hold: </p>
<ul class="itemize">
  <li><p>The function \(f\) is continuously differentiable in a neighborhood \(\mathcal{L}\) of the level set \(\delta (x_{0})=\left\{  x\in \mathbb {R} ^{n}:\  f(x)\leq f(x_{0})\right\}  .\) </p>
</li>
  <li><p>The gradient of \(f\) is Lipschitzian in \(\mathcal{L}\), i.e., there exists \(K{\gt}0\) such that </p>
<div class="displaymath" id="a0000000037">
  \[  \left\Vert \nabla f(x)-\nabla f(y)\right\Vert \leq K\left\Vert x-y\right\Vert ,\  \  \  \  \forall (x,y)\in \mathcal{L}\times \mathcal{L} \]
</div>
</li>
</ul>
<p>Then the sequence \(\left\{  x_{k}\right\}  _{k\in \mathbb {N}}\) generated by the Armijo epsilon steepest descent algorithm satisfies one of the following properties: either \(\nabla f(x_{k_{0}})=0\) for some \(k_{0}\in \mathbb {N}\), or \(\left\Vert \nabla f(x_{k})\right\Vert \underset {k\rightarrow \infty }{\longrightarrow }0.\) </p>

  </div>
</div> </p>
<p><div class="proof_wrapper" id="a0000000038">
  <div class="proof_heading">
    <span class="proof_caption">
    Proof
    </span>
    <span class="expand-proof">▼</span>
  </div>
  <div class="proof_content">
  
  </div>
</div>Suppose that an infinite sequence \(\left\{  x_{k}\right\}  _{k\in \mathbb {N}}\) is generated by the Armijo epsilon steepest descent Algorithm. In the main step of the Algorithm, the vectors \(s_{k}\) and \(t_{k}\) are obtained by applying the steepest descent method twice, with Armijo inexact line search. The vectors \(s_{k}\) and \(t_{k}\) are the successors of \(x_{k}\), and are used to calculate \(x_{k+1}\) (see the main step of the algorithm). Note that </p>
<div class="displaymath" id="a0000000039">
  \[  s_{k}=x_{k}-\lambda _{k}\nabla f(x_{k}),  \]
</div>
<p> where \(\lambda _{k}=\tfrac {\overline{\alpha }}{2^{t}}\) (\(\overline{\alpha }{\gt}0\) is a constant defined in the Armijo inexact line search) satisfies the following Armijo criterion</p>
<div class="equation" id="a0000000040">
<p>
  <div class="equation_content">
    \begin{equation}  \varphi \left( \tfrac {\overline{\alpha }}{2^{t}}\right) \leq \widehat{\varphi }\left( \tfrac {\overline{\alpha }}{2^{t}}\right) \tag {5}\end{equation}
  </div>
  <span class="equation_label">5</span>
</p>
</div>
<p> with</p>
<div class="equation" id="a0000000041">
<p>
  <div class="equation_content">
    \begin{equation}  \varphi (\tfrac {\overline{\alpha }}{2^{t}})=f\left( x_{k}-\tfrac {\overline{\alpha }}{2^{t}}\nabla f\left( x_{k}\right) \right) ,\tag {6}\end{equation}
  </div>
  <span class="equation_label">6</span>
</p>
</div>
<p> and </p>
<div class="equation" id="a0000000042">
<p>
  <div class="equation_content">
    \begin{equation}  \widehat{\varphi }\left( \tfrac {\overline{\alpha }}{2^{t}}\right) =f(x_{k})-\tfrac {\overline{\alpha }}{2^{t}}c\nabla f\left( x_{k}\right) ^{T}\nabla f\left( x_{k}\right) ,\tag {7}\end{equation}
  </div>
  <span class="equation_label">7</span>
</p>
</div>
<p> where \(c\in \left] 0,1\right[\), \(\overline{\alpha }{\gt}0\), and \(t\in \mathbb {N}\) is the smallest integer such that the inequality (5) is satisfied. Taking into account the relations (5), (6) and (7), we obtain </p>
<div class="displaymath" id="a0000000043">
  \begin{align}  f\left( s_{k}\right) &  =f\left[ x_{k}-\tfrac {\overline{\alpha }}{2^{t}}\nabla f\left( x_{k}\right) \right] \tag {8}\\ &  \leq f(x_{k})-c\tfrac {\overline{\alpha }}{2^{t}}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}.\nonumber \end{align}
</div>
<p> (8) implies </p>
<div class="equation" id="a0000000044">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( s_{k}\right) -f(x_{k})\leq -c\tfrac {\overline{\alpha }}{2^{t}}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}.\tag {9}\end{equation}
  </div>
  <span class="equation_label">9</span>
</p>
</div>
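<p>In practice, the smallest exponent \(t\) in (5) is found by successive halving of the trial step. A minimal Python sketch, with illustrative names (\(\varphi \) and \(\widehat{\varphi }\) are evaluated inline):</p>

```python
def armijo_exponent(f, x, g, alpha_bar=1.0, c=0.5, t_max=60):
    """Smallest integer t >= 0 such that
    f(x - (alpha_bar/2^t) g) <= f(x) - c (alpha_bar/2^t) ||g||^2, cf. (5)-(7)."""
    f_x = f(x)
    gg = sum(gi * gi for gi in g)             # ||grad f(x)||^2
    for t in range(t_max + 1):
        step = alpha_bar / 2 ** t             # trial step alpha_bar / 2^t
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        if f(trial) <= f_x - c * step * gg:   # Armijo criterion (5)
            return t
    raise RuntimeError("no Armijo exponent found up to t_max")
```

<p>For example, for \(f(x)=\Vert x\Vert ^{2}\) at \(x=(1,1)\) with \(\overline{\alpha }=1\) and \(c=\tfrac {1}{2}\), the exponent \(t=0\) is rejected and \(t=1\) is accepted, i.e. \(\lambda _{k}=\tfrac {1}{2}\).</p>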
<p> On the other hand, by using the mean value theorem, we have </p>
<div class="equation" id="a0000000045">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( s_{k}\right) -f(x_{k})=\left( s_{k}-x_{k}\right) ^{T}\nabla f\left( \widetilde{x}\right) ;\  \  \  \widetilde{x}=\lambda s_{k}+(1-\lambda )x_{k},\  \  \  \lambda \in ]0,1[.\tag {10}\end{equation}
  </div>
  <span class="equation_label">10</span>
</p>
</div>
<p> Inasmuch as</p>
<div class="equation" id="a0000000046">
<p>
  <div class="equation_content">
    \begin{equation}  s_{k}=x_{k}-\alpha _{k}\nabla f\left( x_{k}\right) ,\  \  \alpha _{k}=\tfrac {\overline{\alpha }}{2^{t}},\tag {11}\end{equation}
  </div>
  <span class="equation_label">11</span>
</p>
</div>
<p> then</p>
<div class="displaymath" id="a0000000047">
  \begin{align}  f\left( s_{k}\right) -f(x_{k}) &  =-\alpha _{k}\nabla f\left( x_{k}\right) ^{T}\nabla f\left( \widetilde{x}\right) \tag {12}\\ &  =-\alpha _{k}\nabla f\left( x_{k}\right) ^{T}\left[ \nabla f\left( x_{k}\right) -\nabla f\left( x_{k}\right) +\nabla f\left( \widetilde{x}\right) \right] \nonumber \\ &  =-\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\alpha _{k}\nabla f\left( x_{k}\right) ^{T}\left[ \nabla f\left( x_{k}\right) -\nabla f\left( \widetilde{x}\right) \right] .\nonumber \end{align}
</div>
<p> where \(\widetilde{x}=\lambda s_{k}+(1-\lambda )x_{k},\  \lambda \in ]0,1[\). Since the sequence \(\left\{  f(x_{k})\right\}  _{k\in \mathbb {N}}\) is decreasing, we have </p>
<div class="displaymath" id="a0000000048">
  \[  x_{k}\in \delta (x_{0}),\  \  k=0,1,...  \]
</div>
<p>Now, by using the Cauchy–Schwarz inequality and the fact that the gradient of \(f\) is Lipschitzian with constant \(K\), we have</p>
<div class="displaymath" id="a0000000049">
  \begin{align}  f\left( s_{k}\right) -f(x_{k}) &  \leq -\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert \left\Vert \nabla f\left( x_{k}\right) -\nabla f\left( \widetilde{x}\right) \right\Vert \tag {13}\\ &  \leq -\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert K\left\Vert x_{k}-\widetilde{x}\right\Vert \nonumber \end{align}
</div>
<p> Since \(\widetilde{x}=\lambda s_{k}+(1-\lambda )x_{k},\  \  \  \lambda \in ]0,1[,\) we have </p>
<div class="equation" id="a0000000050">
<p>
  <div class="equation_content">
    \begin{equation}  \left\Vert x_{k}-\widetilde{x}\right\Vert =\left\Vert \lambda (x_{k}-s_{k})\right\Vert =\left\vert \lambda \right\vert \left\Vert x_{k}-s_{k}\right\Vert <\left\Vert x_{k}-s_{k}\right\Vert .\tag {14}\end{equation}
  </div>
  <span class="equation_label">14</span>
</p>
</div>
<p> (13) and (14) imply</p>
<div class="displaymath" id="a0000000051">
  \[  f\left( s_{k}\right) -f(x_{k})\leq -\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert K\left\Vert s_{k}-x_{k}\right\Vert  \]
</div>
<p> Note that \(s_{k}-x_{k}=x_{k}-\alpha _{k}\nabla f\left( x_{k}\right) -x_{k}= \) \(-\alpha _{k}\nabla f\left( x_{k}\right) ,\) \(\alpha _{k}=\tfrac {\overline{\alpha }}{2^{t}},\) so </p>
<div class="equation" id="a0000000052">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( s_{k}\right) -f(x_{k})\leq -\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert K\alpha _{k}\left\Vert \nabla f\left( x_{k}\right) \right\Vert .\tag {15}\end{equation}
  </div>
  <span class="equation_label">15</span>
</p>
</div>
<p> Hence, we obtain </p>
<div class="equation" id="a0000000053">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( s_{k}\right) -f(x_{k})\leq -\tfrac {\overline{\alpha }}{2^{t}}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}\left[ 1-K\tfrac {\overline{\alpha }}{2^{t}}\right] .\tag {16}\end{equation}
  </div>
  <span class="equation_label">16</span>
</p>
</div>
<p> Now choose \(t{\gt}0\) to be the smallest integer such that the following relation holds</p>
<div class="equation" id="a0000000054">
<p>
  <div class="equation_content">
    \begin{equation}  2^{t}\geq \tfrac {K\overline{\alpha }}{1-c}\tag {17}\end{equation}
  </div>
  <span class="equation_label">17</span>
</p>
</div>
<p> (17) implies</p>
<div class="equation" id="a0000000055">
<p>
  <div class="equation_content">
    \begin{equation}  1-K\tfrac {\overline{\alpha }}{2^{t}}\geq c.\tag {18}\end{equation}
  </div>
  <span class="equation_label">18</span>
</p>
</div>
<p> Since \(t\) is the smallest such integer, we also have</p>
<div class="equation" id="a0000000056">
<p>
  <div class="equation_content">
    \begin{equation}  1-K\tfrac {\overline{\alpha }}{2^{t-1}} < c. \tag {19}\end{equation}
  </div>
  <span class="equation_label">19</span>
</p>
</div>
<p> (18) implies</p>
<div class="equation" id="a0000000057">
<p>
  <div class="equation_content">
    \begin{equation}  -\tfrac {\overline{\alpha }}{2^{t}}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}\left[ 1-K\tfrac {\overline{\alpha }}{2^{t}}\right] \leq -\tfrac {\overline{\alpha }}{2^{t}}c\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2},\  \  \  \  c\in ]0,1[,\tag {20}\end{equation}
  </div>
  <span class="equation_label">20</span>
</p>
</div>
<p> and (19) gives</p>
<div class="displaymath" id="a0000000058">
  \[  -K\tfrac {\overline{\alpha }}{2^{t-1}}{\lt}c-1\Rightarrow K\tfrac {\overline{\alpha }}{2^{t-1}}{\gt}1-c.  \]
</div>
<p> Hence</p>
<div class="equation" id="a0000000059">
<p>
  <div class="equation_content">
    \begin{equation}  -\tfrac {\overline{\alpha }}{2^{t}}c<-\tfrac {c(1-c)}{2K}.\tag {21}\end{equation}
  </div>
  <span class="equation_label">21</span>
</p>
</div>
<p> Note that the choice of \(t{\gt}0\) satisfying (17) implies that inequality (9) holds. Therefore, taking into account the relations (16), (18), (19), (20) and (21), we obtain </p>
<div class="equation" id="a0000000060">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( s_{k}\right) -f(x_{k})\leq -\tfrac {c(1-c)}{2K}\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}.\tag {22}\end{equation}
  </div>
  <span class="equation_label">22</span>
</p>
</div>
<p> Denote \(G=-\tfrac {c(1-c)}{2K}.\) Clearly \(G{\lt}0\), and (22) gives </p>
<div class="equation" id="a0000000061">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( s_{k}\right) -f(x_{k})\leq G\left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}.\tag {23}\end{equation}
  </div>
  <span class="equation_label">23</span>
</p>
</div>
<p> Consider now \(t_{k}\), the successor of \(s_{k}\), \(t_{k}=s_{k}-\beta _{k}\nabla f(s_{k})\). Proceeding in the same way with \(t_{k}\), we obtain</p>
<div class="equation" id="a0000000062">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( t_{k}\right) -f(s_{k})\leq G\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}.\tag {24}\end{equation}
  </div>
  <span class="equation_label">24</span>
</p>
</div>
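<p>The guaranteed decrease (22) can be checked numerically. A minimal Python sketch, assuming a quadratic test function \(f(x)=\tfrac {1}{2}x^{T}Ax\), whose gradient \(Ax\) is Lipschitzian with constant \(K=\lambda _{\max }(A)\); here \(\overline{\alpha }=1\) is large enough that the accepted Armijo step is at least \((1-c)/(2K)\), as required by the derivation of (22):</p>

```python
import numpy as np

def armijo_sd_step(f, grad, x, alpha_bar=1.0, c=0.25):
    """One steepest descent step with Armijo backtracking; returns the new point."""
    g = grad(x)
    alpha = alpha_bar
    while f(x - alpha * g) > f(x) - c * alpha * g.dot(g):
        alpha *= 0.5
    return x - alpha * g

# Quadratic f(x) = (1/2) x^T A x; its gradient A x is Lipschitz with K = lambda_max(A).
A = np.diag([1.0, 4.0, 9.0])
K = 9.0
f = lambda x: 0.5 * x.dot(A.dot(x))
grad = lambda x: A.dot(x)

c = 0.25
x = np.array([1.0, 1.0, 1.0])
s = armijo_sd_step(f, grad, x, c=c)
# decrease at least as strong as the bound (22), with G = -c(1-c)/(2K)
G = -c * (1 - c) / (2 * K)
assert f(s) - f(x) <= G * grad(x).dot(grad(x)) + 1e-12
```

<p>The same check applied at \(s_{k}\) verifies (24).</p>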
<p> We now prove that</p>
<div class="displaymath" id="a0000000063">
  \[  \underset {k\rightarrow \infty }{\lim }\left\Vert \nabla f\left( x_{k}\right) \right\Vert =0.  \]
</div>
<p> To this end, consider</p>
<div class="displaymath" id="a0000000064">
  \[  f\left( x_{k+1}\right) -f(x_{k})=f\left( \varepsilon _{2}^{k}\right) -f\left( t_{k}\right) +f\left( t_{k}\right) -f(s_{k})+f(s_{k})-f(x_{k})  \]
</div>
<p> Note that in case a) (see (4))</p>
<div class="displaymath" id="a0000000065">
  \[  f\left( \varepsilon _{2}^{k}\right) -f\left( t_{k}\right) {\lt}0,  \]
</div>
<p> while in case b) \(x_{k+1}=t_{k}\) and this difference vanishes; in both cases</p>
<div class="displaymath" id="a0000000066">
  \[  f\left( x_{k+1}\right) -f(x_{k})\leq f\left( t_{k}\right) -f(s_{k})+f(s_{k})-f(x_{k})  \]
</div>
<p> The relations (23) and (24) imply</p>
<div class="equation" id="a0000000067">
<p>
  <div class="equation_content">
    \begin{equation}  f\left( x_{k+1}\right) -f(x_{k}) \leq G\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) .\tag {25}\end{equation}
  </div>
  <span class="equation_label">25</span>
</p>
</div>
<p> Note that \(\left\{  f(x_{k})\right\}  _{k\in \mathbb {N}}\) is a monotone decreasing sequence, and so it has a limit (otherwise \(\inf f(x)=-\infty \)). Hence, the relation (25) implies</p>
<div class="equation" id="a0000000068">
<p>
  <div class="equation_content">
    \begin{equation}  \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\leq \tfrac {1}{G}\left[ f\left( x_{k+1}\right) -f(x_{k})\right] .\tag {26}\end{equation}
  </div>
  <span class="equation_label">26</span>
</p>
</div>
<p> Taking \(\overline{\lim }\) as \(k\rightarrow \infty ,\) we get</p>
<div class="equation" id="a0000000069">
<p>
  <div class="equation_content">
    \begin{equation}  \underset {k\rightarrow \infty }{\overline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) \leq 0\tag {27}\end{equation}
  </div>
  <span class="equation_label">27</span>
</p>
</div>
<p> On the other hand we have</p>
<div class="equation" id="a0000000070">
<p>
  <div class="equation_content">
    \begin{equation}  \underset {k\rightarrow \infty }{\underline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) \leq \underset {k\rightarrow \infty }{\overline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) \tag {28}\end{equation}
  </div>
  <span class="equation_label">28</span>
</p>
</div>
<p> and </p>
<div class="equation" id="a0000000071">
<p>
  <div class="equation_content">
    \begin{equation}  0\leq \underset {k\rightarrow \infty }{\underline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) \tag {29}\end{equation}
  </div>
  <span class="equation_label">29</span>
</p>
</div>
<p> The inequalities (27), (28) and (29) imply</p>
<div class="displaymath" id="a0000000072">
  \begin{align}  0 &  \leq \underset {k\rightarrow \infty }{\underline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) \tag {30}\\ &  \leq \underset {k\rightarrow \infty }{\overline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) \leq 0\nonumber \end{align}
</div>
<p> which implies</p>
<div class="displaymath" id="a0000000073">
  \begin{align} &  \underset {k\rightarrow \infty }{\underline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}\! +\! \left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) =\underset {k\rightarrow \infty }{\overline{\lim }}\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}\! +\! \left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) \nonumber \\ &  =\underset {k\rightarrow \infty }{\lim }\left( \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\right) =0\nonumber \end{align}
</div>
<p> Notice that we have</p>
<div class="equation" id="a0000000074">
<p>
  <div class="equation_content">
    \begin{equation}  0\leq \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}\leq \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\tag {32}\end{equation}
  </div>
  <span class="equation_label">32</span>
</p>
</div>
<p> and </p>
<div class="equation" id="a0000000075">
<p>
  <div class="equation_content">
    \begin{equation}  0\leq \left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\leq \left\Vert \nabla f\left( x_{k}\right) \right\Vert ^{2}+\left\Vert \nabla f\left( s_{k}\right) \right\Vert ^{2}\tag {33}\end{equation}
  </div>
  <span class="equation_label">33</span>
</p>
</div>
<p> Finally, the inequalities (31), (32) and (33) imply</p>
<div class="displaymath" id="a0000000076">
  \[  \underset {k\rightarrow \infty }{\lim }\left\Vert \nabla f\left( x_{k}\right) \right\Vert =\underset {k\rightarrow \infty }{\lim }\left\Vert \nabla f\left( s_{k}\right) \right\Vert =0.  \]
</div>

<h1 id="a0000000077">5 Numerical results and comparisons</h1>
<p>In this section we report some numerical results obtained with an implementation of the Armijo Epsilon Steepest Descent algorithm. For our numerical tests, we used test functions and Fortran programs from ([2],[7]). Considering the same criteria as in ([3]), the code is written in Fortran and compiled with f90 on an Intel Pentium 4 workstation at 2 GHz. We selected 52 unconstrained optimization test functions in generalized or extended form [2] (some from the CUTE library [7]). For each test function we performed twenty (20) numerical experiments, with the number of variables increasing as \(n=2,10,30,50,70,100,\) \(300,500,700,900,1000,2000,3000,4000,5000,\) \(6000,7000,8000,9000,10000.\) The algorithm implements the Armijo line search conditions ([4]), with the stopping criterion \(\left\Vert \nabla f\left( x_{k}\right) \right\Vert {\lt}10^{-6}.\) For all the algorithms considered in this numerical study, the maximum number of iterations is limited to 100,000. </p>
<p>The comparison of the algorithms is carried out in the following context. Let \(f_{i}^{ALG1}\) and \(f_{i}^{ALG2}\) be the optimal values found by ALG1 and ALG2, respectively, for problem \(i=1,...,962.\) We say that, on the particular problem \(i,\) the performance of ALG1 was better than the performance of ALG2 if: </p>
<div class="displaymath" id="a0000000078">
  \[  \left\vert f_{i}^{ALG1}-f_{i}^{ALG2}\right\vert {\lt}10^{-3} \]
</div>
<p>and the number of iterations, the number of function-gradient evaluations, or the CPU time of ALG1 was less than the corresponding quantity for ALG2. </p>
<p>In this set of numerical experiments we compare the Armijo Epsilon Steepest Descent algorithm with the Steepest Descent algorithm. Figure 1 shows the Dolan and Moré CPU performance profile of the Armijo Epsilon Steepest Descent algorithm versus the Steepest Descent algorithm. </p>
<p>In a performance profile plot, the top curve corresponds to the method that solved the most problems in a time within a factor \(\tau \) of the best time. The percentage of the test problems for which a method is the fastest is given on the left axis of the plot. The right side of the plot gives the percentage of the test problems that were successfully solved by each algorithm; it is mainly a measure of the robustness of an algorithm. Comparing the Armijo Epsilon Steepest Descent algorithm with the Steepest Descent algorithm subject to the CPU time metric, we see that the Armijo Epsilon Steepest Descent algorithm is the top performer: it is more successful than the Steepest Descent algorithm. </p>
<figure id="a0000000079">
  <div class="centered"><img src="img-0001.png" alt="\includegraphics[ height=3.98in, width=4.47in]{figura.png}" style="height:3.98in; width:4.47in" />
<figcaption>
  <span class="caption_title">Figure</span> 
  <span class="caption_ref">1</span> 
  <span class="caption_text">The Dolan and Moré CPU performance profile of Armijo Epsilon Steepest Descent algorithm <i class="it">versus</i> Steepest descent algorithm.</span> 
</figcaption> </div>

</figure>
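<p>For readers implementing such a comparison, the Dolan and Moré profile curves can be computed directly from the recorded per-problem metrics. A minimal Python sketch; the timing data below are hypothetical illustrations, not the paper's measurements:</p>

```python
import numpy as np

def performance_profile(times, taus):
    """Dolan-More performance profile: times[i, s] is the CPU time of solver s
    on problem i (np.inf for a failure); returns rho_s(tau) for each tau."""
    best = times.min(axis=1)               # best time per problem
    ratios = times / best[:, None]         # performance ratios r_{i,s}
    # rho_s(tau) = fraction of problems solver s solves within a factor tau of the best
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(times.shape[1])]
                     for tau in taus])

# Hypothetical timings: column 0 = Armijo Epsilon Steepest Descent, column 1 = Steepest Descent.
times = np.array([[1.0, 2.5],
                  [0.8, 0.9],
                  [2.0, np.inf],   # the second solver failed on this problem
                  [1.5, 1.2]])
rho = performance_profile(times, taus=[1.0, 2.0, 4.0])
```

<p>The curve height at \(\tau =1\) gives the percentage of problems on which a method is fastest (the left side of the plot), while the height for large \(\tau \) measures robustness (the right side).</p>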
<p><div class="acknowledgement_thmwrapper " id="a0000000080">
  <div class="acknowledgement_thmheading">
    <span class="acknowledgement_thmcaption">
    Acknowledgement
    </span>
  </div>
  <div class="acknowledgement_thmcontent">
  <p>The authors are very grateful to Professor Neculai Andrei for his help and his valuable comments and suggestions on the early version of the paper. </p>

  </div>
</div> </p>
<div class="bibliography">
<h1>Bibliography</h1>
<dl class="bibliography">
  <dt><a name="1">1</a></dt>
  <dd><p><span class="scshape">A.C. Aitken</span>, <i class="itshape">On Bernoulli’s numerical solution of algebraic equations</i>, Proc. Roy. Soc. Edinburgh, <b class="bfseries">46</b> (1926), pp.&#160;289–305. </p>
</dd>
  <dt><a name="2">2</a></dt>
  <dd><p><span class="scshape">N. Andrei</span>, <i class="itshape">An unconstrained optimization test functions collection</i>, Advanced Modeling and Optimization, <b class="bfseries">10</b> (2008), pp.&#160;147–161. </p>
</dd>
  <dt><a name="3">3</a></dt>
  <dd><p><span class="scshape">N. Andrei</span>, <i class="itshape">Another conjugate gradient algorithm for unconstrained optimization</i>, Annals of Academy of Romanian Scientists, Series on Science and Technology of Information, <b class="bfseries">1</b> (2008) no. 1, pp.&#160;7–20. </p>
</dd>
  <dt><a name="4">4</a></dt>
  <dd><p><span class="scshape">L. Armijo</span>, <i class="itshape">Minimization of functions having Lipschitz continuous first partial derivatives</i>, Pacific J. Mathematics, <b class="bfseries">16</b> (1966) 1, pp.&#160;1–3. </p>
</dd>
  <dt><a name="5">5</a></dt>
  <dd><p><span class="scshape">M.S. Bazaraa, H.D. Sherali</span> and <span class="scshape">C.M. Shetty</span>, <i class="itshape">Nonlinear Programming</i>, John Wiley &amp; Sons, New York, 1993. </p>
</dd>
  <dt><a name="6">6</a></dt>
  <dd><p><span class="scshape">C. Brezinski</span>, <i class="itshape">Accélération de la convergence en analyse numérique</i>, Lecture Notes in Mathematics, <b class="bfseries">584</b> (1977), Springer Verlag. </p>
</dd>
  <dt><a name="7">7</a></dt>
  <dd><p><span class="scshape">I. Bongartz, A.R. Conn, N.I.M. Gould</span> and <span class="scshape">P.L. Toint</span>, <i class="itshape">CUTE: constrained and unconstrained testing environments</i>, ACM Trans. Math. Software, <b class="bfseries">21</b> (1995), pp.&#160;123–160. </p>
</dd>
  <dt><a name="8">8</a></dt>
  <dd><p><span class="scshape">C.G. Broyden</span>, <i class="itshape">Quasi-Newton Methods and their application to Function Minimization</i>, Mathematics of Computation, <b class="bfseries">21</b> (1967), pp.&#160;368–381. </p>
</dd>
  <dt><a name="9">9</a></dt>
  <dd><p><span class="scshape">C.G. Broyden</span>, <i class="itshape">The convergence of a class of double rank minimization algorithms 2. The new algorithm</i>, J. Institute of Mathematics and its Applications, <b class="bfseries">6</b> (1970), pp.&#160;222–231. </p>
</dd>
  <dt><a name="10">10</a></dt>
  <dd><p><span class="scshape">C.G. Broyden, J.E. Jr. Dennis</span> and <span class="scshape">J.J. Moré</span>, <i class="itshape">On the local and superlinear convergence of quasi-Newton methods</i>, J. Inst. Math. Appl., <b class="bfseries">12</b> (1973), pp.&#160;223–246. </p>
</dd>
  <dt><a name="11">11</a></dt>
  <dd><p><span class="scshape">F. Cordellier</span>, <i class="itshape">Transformations de suites scalaires et vectorielles</i>, Thèse de doctorat d’état soutenue à l’université de Lille I, 1981. </p>
</dd>
  <dt><a name="12">12</a></dt>
  <dd><p><span class="scshape">W.C. Davidon</span>, <i class="itshape">Variable Metric Method for Minimization</i>, AEC research Development, Report ANL-5990, 1959. </p>
</dd>
  <dt><a name="13">13</a></dt>
  <dd><p><span class="scshape">J.E.Jr. Dennis</span> and <span class="scshape">J.J. Moré</span>, <i class="itshape">A characterization of superlinear convergence and its application to quasi-Newton methods</i>, Math. Comp., <b class="bfseries">28</b> (1974), pp.&#160;549–560. </p>
</dd>
  <dt><a name="14">14</a></dt>
  <dd><p><span class="scshape">J.E. Dennis</span> and <span class="scshape">J.J. Moré</span>, <i class="itshape">Quasi-Newton methods, motivation and theory</i>, SIAM. Rev., <b class="bfseries">19</b> (1977), pp.&#160;46–89. </p>
</dd>
  <dt><a name="15">15</a></dt>
  <dd><p><span class="scshape">L.C.W. Dixon</span>, <i class="itshape">Variable metric algorithms: necessary and sufficient conditions for identical behavior on nonquadratic functions</i>, J. Opt. Theory Appl., <b class="bfseries">10</b> (1972), pp.&#160;34–40. </p>
</dd>
  <dt><a name="16">16</a></dt>
  <dd><p><span class="scshape">N. Djeghaba</span> and <span class="scshape">R. Benzine</span>, <i class="itshape">Accélération de la convergence de la méthode de la plus forte pente</i>, Demonstratio Mathematica., <b class="bfseries">39</b> (2006) No. 1, pp.&#160;169–181. </p>
</dd>
  <dt><a name="17">17</a></dt>
  <dd><p><span class="scshape">R. Fletcher</span>, <i class="itshape">A new approach to Variable Metric Algorithms</i>, Computer Journal, <b class="bfseries">13</b> (1970), pp.&#160;317–322. </p>
</dd>
  <dt><a name="18">18</a></dt>
  <dd><p><span class="scshape">R. Fletcher</span>, <i class="itshape">Practical methods of Optimization</i>, Second Edition, John Wiley &amp; Sons, Chichester, 1987. </p>
</dd>
  <dt><a name="19">19</a></dt>
  <dd><p><span class="scshape">R. Fletcher</span>, <i class="itshape">An overview of unconstrained optimization</i>, Algorithms for Continuous Optimization: the State of Art, E. Spedicato, ed., Kluwer Academic Publishers, 1994. </p>
</dd>
  <dt><a name="20">20</a></dt>
  <dd><p><span class="scshape">R. Fletcher</span> and <span class="scshape">C.M. Reeves</span>, <i class="itshape">Function minimization by conjugate gradients</i>, Computer J., <b class="bfseries">7</b> (1964), pp.&#160;149–154. </p>
</dd>
  <dt><a name="21">21</a></dt>
  <dd><p><span class="scshape">R. Fletcher</span> and <span class="scshape">M.J.D. Powell</span>, <i class="itshape">A Rapidly Convergent Descent Method for Minimization</i>, Computer Journal, <b class="bfseries">6</b> (1963), pp.&#160;163–168. </p>
</dd>
  <dt><a name="22">22</a></dt>
  <dd><p><span class="scshape">G.E. Forsythe</span>, <i class="itshape">On the asymptotic directions of the s-dimensional optimum gradient method</i>, Numerische Mathematik, <b class="bfseries">11</b> (1968), pp.&#160;57–76. </p>
</dd>
  <dt><a name="23">23</a></dt>
  <dd><p><span class="scshape">P.E. Gill</span> and <span class="scshape">W. Murray</span>, <i class="itshape">Quasi-Newton Methods for unconstrained optimization</i>, J. Inst. Math. Appl., <b class="bfseries">9</b> (1972), pp.&#160;91–108. </p>
</dd>
  <dt><a name="24">24</a></dt>
  <dd><p><span class="scshape">A. Griewank</span>, <i class="itshape">The global convergence of partitioned BFGS on problems with convex decompositions and Lipschitz gradients</i>, Math. Prog., <b class="bfseries">50</b> (1991), pp.&#160;141–175. </p>
</dd>
  <dt><a name="25">25</a></dt>
  <dd><p><span class="scshape">D. Goldfarb</span>, <i class="itshape">A Family of Variable Metric Methods Derived by Variational Means</i>, Mathematics of Computation, <b class="bfseries">24</b> (1970), pp.&#160;23–26. </p>
</dd>
  <dt><a name="26">26</a></dt>
  <dd><p><span class="scshape">A.A. Goldstein</span> and <span class="scshape">J.F. Price</span>, <i class="itshape">An effective Algorithm for Minimization</i>, Numerische Mathematik, <b class="bfseries">10</b> (1967), pp.&#160;184–189. </p>
</dd>
  <dt><a name="27">27</a></dt>
  <dd><p><span class="scshape">M.R. Hestenes</span> and <span class="scshape">E. Stiefel</span>, <i class="itshape">Methods of conjugate gradients for solving linear systems</i>, J. Res. Nat. Bur. Standards, <b class="bfseries">49</b> (1952) No. 6, pp.&#160;409–436. </p>
</dd>
  <dt><a name="28">28</a></dt>
  <dd><p><span class="scshape">J. Nocedal</span> and <span class="scshape">S.J. Wright</span>, <i class="itshape">Numerical Optimization</i>, Springer, Second edition, 2006. </p>
</dd>
  <dt><a name="29">29</a></dt>
  <dd><p><span class="scshape">E. Polak</span> and <span class="scshape">G. Ribiere</span>, <i class="itshape">Note sur la convergence de méthodes de directions conjuguées</i>, Revue Française Informatique et Recherche opérationnelle, <b class="bfseries">16</b> (1969), pp.&#160;35–43. </p>
</dd>
  <dt><a name="30">30</a></dt>
  <dd><p><span class="scshape">B.T. Polyak</span>, <i class="itshape">The Method of Conjugate Gradient in Extremum Problems</i>, USSR Computational Mathematics and Mathematical Physics, (English Translation), <b class="bfseries">9</b> (1969), pp.&#160;94–112. </p>
</dd>
  <dt><a name="31">31</a></dt>
  <dd><p><span class="scshape">M.J.D. Powell</span>, <i class="itshape">On the convergence of the variable metric algorithms</i>, J. Inst. Math. Appl., <b class="bfseries">7</b> (1971), pp.&#160;21–36. </p>
</dd>
  <dt><a name="32">32</a></dt>
  <dd><p><span class="scshape">M.J.D. Powell</span>, <i class="itshape">Some global convergence properties of variable metric algorithms for minimization without exact line searches</i>, Nonlinear Programming, SIAM-AMS Proceedings, <b class="bfseries">IX</b> (1976), R.W. Cottle and C.E. Lemke eds., SIAM. </p>
</dd>
  <dt><a name="33">33</a></dt>
  <dd><p><span class="scshape">D.F. Shanno</span>, <i class="itshape">Conditioning of quasi-Newton Methods for function minimization</i>, Mathematics of Computation, <b class="bfseries">24</b> (1970), pp.&#160;641–656. </p>
</dd>
  <dt><a name="34">34</a></dt>
  <dd><p><span class="scshape">P. Wolfe</span>, <i class="itshape">Convergence conditions for ascent methods</i>, SIAM Review, <b class="bfseries">11</b> (1969), pp.&#160;226–235. </p>
</dd>
  <dt><a name="35">35</a></dt>
  <dd><p><span class="scshape">P. Wynn</span>, <i class="itshape">On a device for computing the \(e_{m}(S_{n})\) transformation</i>, M.T.A.C., <b class="bfseries">10</b> (1956), pp.&#160;91–96. </p>
</dd>
  <dt><a name="36">36</a></dt>
  <dd><p><span class="scshape">P. Wynn</span>, <i class="itshape">Upon systems of recursions which obtain among quotients of the Padé table</i>, Numer. Math., <b class="bfseries">8</b> (1966), pp.&#160;264–269. </p>
</dd>
</dl>


</div>
</div> <!--main-text -->
</div> <!-- content-wrapper -->
</div> <!-- content -->
</div> <!-- wrapper -->

<nav class="prev_up_next">
</nav>

<script type="text/javascript" src="/var/www/clients/client1/web1/web/files/jnaat-files/journals/1/articles/js/jquery.min.js"></script>
<script type="text/javascript" src="/var/www/clients/client1/web1/web/files/jnaat-files/journals/1/articles/js/plastex.js"></script>
<script type="text/javascript" src="/var/www/clients/client1/web1/web/files/jnaat-files/journals/1/articles/js/svgxuse.js"></script>
</body>
</html>