Stefano Martiniani edited minimisation.tex  over 9 years ago

Commit id: 26800373a82f5eb9842334fcc847fc3cd5709077

deletions | additions      

       

Just so that we are all on the same page I am going to outline below how we will be doing the minimisations. Currently  -------------------------------------------------------------------------------  run FIRE(d_{tol}, g_{tol})  1. if d \leq d_{tol} converged to the same minimum  2. if d>d_{tol} but g \leq g_{tol} converged to another minimum  3. if n > n_{max}^{FIRE} quench failed  -------------------------------------------------------------------------------  In order to set the parameters d_{tol} and g_{tol} we make use of some simple arguments. We set  d_{tol} = d_{k_{max}} \times s  where d_{k_{max}} is the radius we obtained from the findk routine, which should correspond approximately to the radius of the harmonic hypersphere and s \in (0.1, 0.01)  is some safety factor, because the radius obtained by findk is larger than the actual harmonic basin. From some tests run by hand it turns out that this should be more about 1/100 to get the right results.  Then from the observation that the basin is harmonic at the bottom of the well we estimate an upper bound for the gradient at d_{tol}, we write  g_{tol} = \nabla \left ( \frac{1}{2} \lambda_{min} x^2 \right)_{x=d_{tol}} \times s' = \lambda_{min} \times d_{tol} \times s'   where \lambda_{min} is the lowest eigenvalue for the full hessian and the s' \in (0,1,0.01) is just another safety factor, I found that 1/10 yields the same results as 1/100, it's more important to choose well d_{tol} to obtain consistent results, at the least on the basin I looked at.  To obtain robust results, this choice of g_{tol} can only be an upper bound and the minimisation must have a more stringent convergence condition to avoid that the minimisation gets stuck on a plateau and get the wrong acceptance, besides the fact that we would identify an unstable structure as a new minimum.   At the same time we want the minimisation to be as fast as possible and FIRE spends an awful lot of time stepping through those regions of low gradient towards the very end of each minimisation. To make this more efficient we note that L-BFGS is a lot faster than FIRE near the minimum, when g is small and by that point we should also be far away from the boundaries of the basin, hence L-BFGS should not suffer from numerical instabilities.  A FIRE+L-BFGS hybrid minimiser would work as follows:  -------------------------------------------------------------------------------------------  run FIRE(d_{tol}, g_{tol})  1. if d \leq d_{tol} converged to the same minimum  2. if d>d_{tol} but g \leq g_{tol} :  run LBFGS(d_{tol}, g_{tol} \times s'')    2.1 if d \leq d_{tol} converged to the same minimum    2.2 if d>d_{tol} but g \leq g_{tol} \times s'' converged to another minimum  2.3 TEST that the derivative of g during the descent was not large and negative  2.3.1 if TEST fails back-trace to 2. and run FIRE(d_{tol},g_{tol} \times s')  2.4 if m > m_{max}^{LBFGS} quench failed  3. if n > n_{max}^{FIRE} quench failed  -------------------------------------------------------------------------------------------  where s'' \in (10^{-5},10^{-7}) is another arbitrary safety factor. Point 2.3.1 can be though in a couple of ways:   we back-trace and make the convergence criterion for fire more stringent. Then, once the gradient becomes smaller than g_{tol} \times s', try again with LBFGS to get to the very low gradients, unless the tolerance is not satisfied. This goes on in cricles until we run out of FIRE steps, then we stop trying. This means that at the third attempt we would have g_{tol} \times s' \times s' and so on, this should be very rare however.  we just run fire all the way to the bottom with the very high precision, but this is likely to not converge (because we run out of steps) and it is expensive. We should not encounter many plateaus though.  I prefer the first option, although it is slightly more complicated to write. Initially we are going to compute the second derivative using 5 points equally spaced, Jake is thinking of a way to use the lbfgs estimate for the hessian. Please let me know whether you have any comments, I will start the implementation now.