Collinear gradients method

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

The collinear gradients method (ColGM) is an iterative method of directional search for a local extremum of a smooth multivariate function {\displaystyle J(u)\colon \mathbb {R} ^{n}\to \mathbb {R} }. It moves towards the extremum along a vector {\displaystyle d\in \mathbb {R} ^{n}} such that the gradients {\displaystyle \nabla J(u)\parallel \nabla J(u+d)}, i.e. they are collinear. ColGM is a first-order method (it uses only the first derivatives {\displaystyle \nabla J}) with a quadratic convergence rate. It can be applied to functions of high dimension {\displaystyle n} with several local extrema. ColGM can be attributed to the truncated Newton methods family.

Collinear vectors {\displaystyle \nabla J(u^{k})} and {\displaystyle \nabla J(u^{k_{\ast }})} with the direction of minimization {\displaystyle d^{k}} for a convex function, {\displaystyle n=2}

The concept of the method

For a smooth function {\displaystyle J(u)}, in a relatively large vicinity of a point {\displaystyle u^{k}} there is a point {\displaystyle u^{k_{\ast }}} where the gradients {\displaystyle \nabla J^{k}\,{\overset {\textrm {def}}{=}}\,\nabla J(u^{k})} and {\displaystyle \nabla J^{k_{\ast }}\,{\overset {\textrm {def}}{=}}\,\nabla J(u^{k_{\ast }})} are collinear vectors. The direction from the point {\displaystyle u^{k}} towards the extremum {\displaystyle u_{\ast }} is {\displaystyle d^{k}=u^{k_{\ast }}-u^{k}}. The vector {\displaystyle d^{k}} points to the maximum or the minimum depending on the position of the point {\displaystyle u^{k_{\ast }}}: it can lie in front of or behind {\displaystyle u^{k}} relative to the direction to {\displaystyle u_{\ast }} (see the figure). Minimization is considered below.

The next iteration of ColGM:

(1) {\displaystyle \quad u^{k+1}=u^{k}+b^{k}d^{k},\quad k\in \left\{0,1\dots \right\},}

where the optimal {\displaystyle b^{k}\in \mathbb {R} } is found analytically from the assumption that the one-dimensional function {\displaystyle J(u^{k}+bd^{k})} is quadratic:

(2) {\displaystyle \quad b^{k}=\left(1-{\frac {\langle \nabla J(u^{k_{\ast }}),d^{k}\rangle }{\langle \nabla J(u^{k}),d^{k}\rangle }}\right)^{-1},\quad \forall u^{k_{\ast }}.}

Angle brackets denote the inner product in the Euclidean space {\displaystyle \mathbb {R} ^{n}}. If {\displaystyle J(u)} is a convex function in the vicinity of {\displaystyle u^{k}}, then for a front point {\displaystyle u^{k_{\ast }}} we get {\displaystyle b^{k}>0}, and for a back point {\displaystyle b^{k}<0}. In either case, the step (1) is taken.
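As a minimal NumPy sketch of the update (1) with the analytic step length (2), assuming the collinear point {\displaystyle u^{k_{\ast }}} and both gradients are already available (the function name colgm_step is illustrative, not from the source):

import numpy as np

def colgm_step(u_k, grad_k, u_kstar, grad_kstar):
    """One ColGM update (1) with the analytic step length (2)."""
    d_k = u_kstar - u_k                    # direction d^k = u^{k_*} - u^k
    num = np.dot(grad_kstar, d_k)          # <grad J(u^{k_*}), d^k>
    den = np.dot(grad_k, d_k)              # <grad J(u^k), d^k>
    b_k = 1.0 / (1.0 - num / den)          # formula (2)
    return u_k + b_k * d_k, b_k, d_k       # u^{k+1}, b^k, d^k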

For a strictly convex quadratic function {\displaystyle J(u)} the ColGM step is

{\displaystyle b^{k}d^{k}=-H^{-1}\nabla J^{k},}

i.e. it is a Newton step (a second-order method with a quadratic convergence rate), where {\displaystyle H} is the Hessian matrix. Such steps ensure the quadratic convergence rate of ColGM.

In general, if {\displaystyle J(u)} has variable convexity and saddle points are possible, then the minimization direction should be checked via the angle {\displaystyle \gamma } between the vectors {\displaystyle \nabla J^{k}} and {\displaystyle d^{k}}. If {\displaystyle \cos(\gamma )={\frac {\langle \nabla J^{k},d^{k}\rangle }{||\nabla J(u^{k})||\;||d^{k}||}}\geq 0}, then {\displaystyle d^{k}} is a direction of maximization, and in (1) {\displaystyle b^{k}} should be taken with the opposite sign.
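A small sketch of this check for minimization, continuing the NumPy sketch above (fix_direction is an illustrative helper name):

import numpy as np

def fix_direction(b_k, grad_k, d_k):
    """Flip b^k when d^k points towards a maximum (cos(gamma) >= 0)."""
    cos_gamma = np.dot(grad_k, d_k) / (np.linalg.norm(grad_k) * np.linalg.norm(d_k))
    return -b_k if cos_gamma >= 0.0 else b_k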

Search for collinear gradients

Collinearity of the gradients is estimated by the residual of their directions, which has the form of a system of {\displaystyle n} equations for finding a root {\displaystyle u=u^{k_{\ast }}}:

(3) {\displaystyle \quad r^{k}(u)={\frac {\nabla J(u)}{||\nabla J(u)||}}s-{\frac {\nabla J(u^{k})}{||\nabla J(u^{k})||}}=0\in \mathbb {R} ^{n},}

where the sign {\displaystyle s=\operatorname {sgn} \langle \nabla J(u),\nabla J(u^{k})\rangle }. This allows collinearity to be evaluated in the same way for co-directed and oppositely directed gradients; in all cases {\displaystyle ||r^{k}(u)||\leq {\sqrt {2}}}.
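A sketch of the residual (3) in NumPy (residual is an illustrative helper name; the two arguments are the gradients at {\displaystyle u} and at {\displaystyle u^{k}}):

import numpy as np

def residual(grad_u, grad_k):
    """Collinearity residual r^k(u) from (3)."""
    s = np.sign(np.dot(grad_u, grad_k))    # +1 for co-directed, -1 for opposite gradients
    return s * grad_u / np.linalg.norm(grad_u) - grad_k / np.linalg.norm(grad_k)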

System (3) is solved iteratively (sub-iterations l {\displaystyle l\,} ) by the conjugate gradient method, assuming that the system is linear in the u k {\displaystyle u^{k}} -vicinity:

(4) {\displaystyle \quad u^{k_{l+1}}=u^{k_{l}}+\tau ^{l}p^{l},\quad l=1,2\ldots ,}

where the vector {\displaystyle \;p^{l}\;{\overset {\textrm {def}}{=}}\,p(u^{k_{l}})=-r^{l}+{\beta ^{l}p}^{l-1}}, {\displaystyle \;r^{l}\,{\overset {\textrm {def}}{=}}\,r(u^{k_{l}})}, {\displaystyle \;\beta ^{l}=||r^{l}||^{2}/||r^{l-1}||^{2},\ \beta ^{1,n,2n...}=0}, {\displaystyle \;\tau ^{l}=||r^{l}||^{2}/\langle p^{l},H^{l}p^{l}\rangle }, and the product of the Hessian matrix {\displaystyle H^{l}} by {\displaystyle p^{l}} is found by numerical differentiation:

(5) {\displaystyle \quad H^{l}p^{l}\approx {\frac {r(u^{k_{h}})-r(u^{k_{l}})}{h/||p^{l}||}},}

where {\displaystyle u^{k_{h}}=u^{k_{l}}+hp^{l}/||p^{l}||} and {\displaystyle h} is a small positive number such that {\displaystyle \langle p^{l},H^{l}p^{l}\rangle \neq 0}.
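The finite-difference product (5) might be sketched as follows (grad_J is an assumed callable returning {\displaystyle \nabla J(u)}, and residual is the helper sketched above):

import numpy as np

def hess_vec(grad_J, grad_k, u_kl, r_kl, p_l, h=1e-5):
    """Finite-difference product H^l p^l from (5)."""
    step = h / np.linalg.norm(p_l)              # so that u^{k_h} = u^{k_l} + h p^l / ||p^l||
    r_kh = residual(grad_J(u_kl + step * p_l), grad_k)
    return (r_kh - r_kl) / step                 # (r(u^{k_h}) - r(u^{k_l})) / (h / ||p^l||)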

The initial approximation is set at 45° to all coordinate axes, at distance {\displaystyle \delta ^{k}} from {\displaystyle u^{k}}:

(6) {\displaystyle \quad u_{i}^{k_{1}}=u_{i}^{k}+{\frac {\delta ^{k}}{\sqrt {n}}}\operatorname {sgn} {\ \nabla _{i}J}^{k},\quad i=1\ldots n.}

The radius {\displaystyle \delta ^{k}} of the vicinity of the point {\displaystyle u^{k}} is modified at each iteration:

(7) {\displaystyle \quad \delta ^{k}=\max \left[\min \left(\delta ^{k-1}{\frac {||\nabla J(u^{k})||}{||\nabla J(u^{k-1})||}},\;\delta ^{0}\right),\;\delta _{m}\right],\quad k>0.}

It is necessary that {\displaystyle ||u^{k_{l}}-u^{k}||\geq \delta _{m}} for {\displaystyle l\geq 1}. Here the small positive number {\displaystyle \delta _{m}} is noticeably larger than the machine epsilon.
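A sketch of the initialization (6) and the radius update (7), under the same assumptions as the sketches above (helper names are illustrative):

import numpy as np

def initial_point(u_k, grad_k, delta_k):
    """Initial sub-iteration point (6): offset at 45 degrees to all axes with length delta^k."""
    return u_k + delta_k / np.sqrt(u_k.size) * np.sign(grad_k)

def update_radius(delta_prev, grad_norm_k, grad_norm_prev, delta_0, delta_m):
    """Radius update (7), kept between delta_m and delta^0."""
    return max(min(delta_prev * grad_norm_k / grad_norm_prev, delta_0), delta_m)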

Sub-iterations l {\displaystyle l} terminate when at least one of the conditions is met:

  1. {\displaystyle ||r^{l}||\leq c_{1}{\sqrt {2}},\quad 0\leq c_{1}<1} (accuracy achieved);
  2. {\displaystyle \left|{\frac {||r^{l}||-||r^{l-1}||}{||r^{l}||}}\right|\leq c_{1},\quad l>1} (convergence has stopped);
  3. {\displaystyle l\geq l_{max}=\operatorname {integer} \left|c_{2}\ln c_{1}\ln n\right|,\quad c_{2}\geq 1} (redundancy of sub-iterations).

Algorithm for choosing the minimization direction

  • Parameters: {\displaystyle c_{1},c_{2},\delta ^{0},\delta _{m}=10^{-15}\delta ^{0},h=10^{-5}}.
  • Input data: {\displaystyle n,k=0,u^{k},J(u^{k}),\nabla J(u^{k}),\nabla J^{k}}.
  1. {\displaystyle l=1}. If {\displaystyle k>0} then set {\displaystyle \delta ^{k}} from (7).
  2. Find {\displaystyle u^{k_{l}}} from (6).
  3. Calculate {\displaystyle \nabla J^{l},||\nabla J^{l}||}. Find {\displaystyle r^{l}} from (3) with {\displaystyle u=u^{k_{l}}}.
  4. If {\displaystyle ||r^{l}||\leq c_{1}{\sqrt {2}}\,} or {\displaystyle l\geq l_{max}} or {\displaystyle ||u^{k_{l}}-u^{k}||<\delta _{m}} or {{\displaystyle \,l>1} and {\displaystyle \left|{\frac {||r^{l}||-||r^{l-1}||}{||r^{l}||}}\right|\leq c_{1}}} then set {\displaystyle u^{k_{\ast }}=u^{k_{l}}}, return {\displaystyle \nabla J\left(u^{k_{\ast }}\right)} and {\displaystyle d^{k}=u^{k_{\ast }}-u^{k}}, stop.
  5. If {\displaystyle l\neq 1,n,2n,3n\ldots } then set {\displaystyle \beta ^{l}=||r^{l}||^{2}/||r^{l-1}||^{2}} else {\displaystyle \beta ^{l}=0}.
  6. Find {\displaystyle p^{l}=-r^{l}+\beta ^{l}p^{l-1}}.
  7. Search for {\displaystyle \tau ^{l}}:
    1. Memorize {\displaystyle u^{k_{l}}}, {\displaystyle \nabla J^{l}}, {\displaystyle ||\nabla J^{l}||}, {\displaystyle r^{l}}, {\displaystyle ||r^{l}||};
    2. Find {\displaystyle u^{k_{h}}=u^{k_{l}}+hp^{l}/||p^{l}||}. Calculate {\displaystyle \nabla J(u^{k_{h}})} and {\displaystyle r\left(u^{k_{h}}\right)}. Find {\displaystyle H^{l}p^{l}} from (5) and assign {\displaystyle w\leftarrow \langle p^{l},H^{l}p^{l}\rangle };
    3. If {\displaystyle w=0} then {\displaystyle h\leftarrow 10h} and return to step 7.2;
    4. Restore {\displaystyle u^{k_{l}}}, {\displaystyle \nabla J^{l}}, {\displaystyle ||\nabla J^{l}||}, {\displaystyle r^{l}}, {\displaystyle ||r^{l}||};
    5. Set {\displaystyle \tau ^{l}=||r^{l}||^{2}/w}.
  8. Perform sub-iteration {\displaystyle u^{k_{l+1}}} from (4).
  9. {\displaystyle l\leftarrow l+1}, go to step 3.

The parameter {\displaystyle c_{2}} is typically taken between 3 and 5. For functions without saddle points, {\displaystyle c_{1}\approx 10^{-8}} and {\displaystyle \delta ^{0}\approx 10^{-5}} are recommended. To "bypass" saddle points, {\displaystyle c_{1}\approx 0.1} and {\displaystyle \delta ^{0}\approx 0.1} are recommended.
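Putting the pieces together, the sub-iteration loop might be sketched as below. This is a simplified reading of steps 1-9, reusing the residual, hess_vec and initial_point helpers sketched earlier; grad_J is an assumed callable returning the gradient, and the memorize/restore bookkeeping of step 7 is kept implicit.

import numpy as np

def find_collinear_point(grad_J, u_k, grad_k, delta_k,
                         c1=1e-8, c2=4.0, h=1e-5, delta_m=1e-20):
    """Look for u^{k_*} with grad J(u^{k_*}) parallel to grad J(u^k) (steps 1-9, simplified)."""
    n = u_k.size
    l_max = int(abs(c2 * np.log(c1) * np.log(n)))
    u = initial_point(u_k, grad_k, delta_k)               # (6)
    grad_u = grad_J(u)
    r = residual(grad_u, grad_k)                          # (3)
    r_norm_prev, p = None, None
    for l in range(1, max(l_max, 1) + 1):
        r_norm = np.linalg.norm(r)
        if (r_norm <= c1 * np.sqrt(2.0)                   # accuracy reached
                or np.linalg.norm(u - u_k) < delta_m      # too close to u^k
                or (r_norm_prev is not None
                    and abs(r_norm - r_norm_prev) / r_norm <= c1)):  # convergence stalled
            break
        # conjugate-gradient direction, restarted (beta = 0) at l = 1, n, 2n, ...
        beta = 0.0 if (l == 1 or l % n == 0) else (r_norm / r_norm_prev) ** 2
        p = -r if beta == 0.0 else -r + beta * p
        # step length tau^l via the finite-difference product (5); enlarge h while <p, Hp> = 0
        w = np.dot(p, hess_vec(grad_J, grad_k, u, r, p, h))
        while w == 0.0:
            h *= 10.0
            w = np.dot(p, hess_vec(grad_J, grad_k, u, r, p, h))
        tau = r_norm ** 2 / w
        u = u + tau * p                                   # sub-iteration (4)
        grad_u = grad_J(u)
        r = residual(grad_u, grad_k)
        r_norm_prev = r_norm
    return u, grad_u, u - u_k                             # u^{k_*}, grad J(u^{k_*}), d^k

A full minimizer would then alternate this search with the step (1)-(2) and the direction check sketched earlier, updating the radius by (7) between iterations.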

The described algorithm finds collinear gradients from the system of equations (3) only approximately. The resulting direction {\displaystyle b^{k}d^{k}} in the ColGM iteration (1) is therefore an approximate Newton direction (a truncated Newton method).

Demonstrations

In all the demonstrations, ColGM shows convergence no worse than, and for functions of variable convexity sometimes better than, Newton's method.

The "rotated ellipsoid" test function

A strictly convex quadratic function:

{\displaystyle J(u)=\sum _{i=1}^{n}\left(\sum _{j=1}^{i}u_{j}\right)^{2},\quad u_{\ast }=(0...0).}

ColGM minimization, {\displaystyle n=2}

In the figure, three black starting points {\displaystyle u^{0}} are set for {\displaystyle n=2}. The gray dots are the sub-iterations {\displaystyle u^{0_{l}}} with {\displaystyle \delta ^{0}=0.5} (shown as a dotted line, inflated for demonstration). Parameters: {\displaystyle c_{1}=10^{-8}}, {\displaystyle c_{2}=4}. One iteration was needed for every {\displaystyle u^{0}}, with no more than two sub-iterations {\displaystyle l}.

For {\displaystyle n=1000} (parameter {\displaystyle \delta ^{0}={10}^{-5}}) with the starting point {\displaystyle u^{0}=(-1...1)}, ColGM reached {\displaystyle u_{\ast }} with an accuracy of 1% in 3 iterations and 754 calculations of {\displaystyle J} and {\displaystyle \nabla J}. Among other first-order methods, quasi-Newton BFGS (which works with matrices) required 66 iterations and 788 calculations; conjugate gradients (Fletcher-Reeves) required 274 iterations and 2236 calculations; Newton's finite-difference method required 1 iteration and 1001 calculations. The second-order Newton's method required 1 iteration.

As the dimension {\displaystyle n} increases, computational errors in the implementation of the collinearity condition (3) may grow noticeably. For this reason ColGM, unlike Newton's method, required more than one iteration in this example.
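For reference, this test function and its gradient might be coded as follows (a sketch; the reversed cumulative sum expresses {\displaystyle \partial J/\partial u_{j}=2\sum _{i\geq j}\sum _{m\leq i}u_{m}}):

import numpy as np

def rotated_ellipsoid(u):
    """J(u) = sum_i (sum_{j<=i} u_j)^2 and its gradient."""
    s = np.cumsum(u)                          # partial sums s_i = u_1 + ... + u_i
    J = np.sum(s ** 2)
    grad = 2.0 * np.cumsum(s[::-1])[::-1]     # dJ/du_j = 2 * sum_{i>=j} s_i
    return J, grad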

ColGM minimization: 3 iterations and 16 calculations J {\displaystyle J} and J {\displaystyle \nabla J}

Rosenbrock test function

{\displaystyle J(u)=100(u_{1}^{2}-u_{2})^{2}+(u_{1}-1)^{2},\quad u_{\ast }=(1,1).}

Parameters: {\displaystyle c_{1}=10^{-8}}, {\displaystyle c_{2}=4}, {\displaystyle \delta ^{0}={10}^{-5}}. The descent trajectory of ColGM coincides exactly with that of Newton's method. In the figure, the blue starting point is {\displaystyle u^{0}=\left(-0.8;-1.2\right)} and the red point is {\displaystyle u_{\ast }}. Unit gradient vectors are drawn at each point {\displaystyle u^{k}}.
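A sketch of this test function and its gradient:

import numpy as np

def rosenbrock(u):
    """J(u) = 100 (u1^2 - u2)^2 + (u1 - 1)^2 and its gradient."""
    u1, u2 = u
    J = 100.0 * (u1 ** 2 - u2) ** 2 + (u1 - 1.0) ** 2
    grad = np.array([400.0 * u1 * (u1 ** 2 - u2) + 2.0 * (u1 - 1.0),
                     -200.0 * (u1 ** 2 - u2)])
    return J, grad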

Himmelblau test function

{\displaystyle J(u)=(u_{1}^{2}+u_{2}-11)^{2}+(u_{1}+u_{2}^{2}-7)^{2}.}
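A sketch of this function (which has several local minima) and its gradient:

import numpy as np

def himmelblau(u):
    """J(u) = (u1^2 + u2 - 11)^2 + (u1 + u2^2 - 7)^2 and its gradient."""
    u1, u2 = u
    a = u1 ** 2 + u2 - 11.0
    b = u1 + u2 ** 2 - 7.0
    grad = np.array([4.0 * u1 * a + 2.0 * b,
                     2.0 * a + 4.0 * u2 * b])
    return a ** 2 + b ** 2, grad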

Parameters: {\displaystyle c_{1}=0.1}, {\displaystyle c_{2}=4}, {\displaystyle \delta ^{0}=0.05}.

ColGM minimization: 7 iterations and 22 calculations of {\displaystyle J} and {\displaystyle \nabla J}. The red lines mark {\displaystyle \cos {\gamma }\geq 0}.
Minimization by Newton's method: 9 iterations ({\displaystyle b^{k}=1}).
Minimization by the conjugate gradient method (Fletcher-Reeves): 9 iterations and 62 calculations of {\displaystyle J} and {\displaystyle \nabla J}.
Minimization by quasi-Newton BFGS: 6 iterations and 55 calculations of {\displaystyle J} and {\displaystyle \nabla J}. The red line (violation of the curvature condition) corresponds to the steepest descent method.

ColGM is very economical in the number of calculations of {\displaystyle J} and {\displaystyle \nabla J}. Thanks to formula (2), it does not require an expensive line search (for example, golden-section search) for the step multiplier {\displaystyle b^{k}}.


References

  1. Tolstykh, V.K. (2023). "Collinear Gradients Method for Minimizing Smooth Functions". Operations Research Forum. Vol. 4, No. 20. doi:10.1007/s43069-023-00193-9
  2. Tolstykh, V.K. Optimization, a free demo Windows application.