# [Computer-go] Parameter Optimization with Local Quadratic Regression

Jonas Kahn jonas.kahn at math.u-psud.fr
Wed Mar 24 03:21:08 PDT 2010

```On Fri, 19 Mar 2010, Rémi Coulom wrote:

> Jean-loup Gailly wrote:
>> Thanks Rémi for these slides, looking forward to seeing the paper.
>>
>>> For this application it is safe to assume that the function to be
>>> optimized (the winning rate) is a smooth function of parameters,
>>> and it has no tricky local optima
>>
>> Why is it safe to assume no local optimum? This is only a guess,
>> but with high dimension (many parameters to tune) I would expect
>> local optima. Does your algorithm scale to many dimensions?
>>
>
> I never observed any local optimum in my long experience of tuning
> parameters. My feeling is that, the higher the dimension, the lower the risk
> of local optimum. Each additional dimension is an additional option to escape
> from a local optimum. The Rosenbrock function is a typical example:
> http://en.wikipedia.org/wiki/Rosenbrock_function
> It has local optima in the x direction, but the whole function has no
> local optima.

As a real-life counter-example, complex systems in statistical physics,
such as spin glasses, have many local minima. Correction: they have
many, *many*, MANY local minima.

On the other hand, I would tend to guess that the kind of random local
interactions that create them do not exist for usual evaluation
functions. Moreover, since the aim is prediction, it does not matter if
two features are highly correlated and only one of them is really
relevant: even if we pick the other one, the evaluation function will be
almost as good.

Just to develop:
The dangerous cases would be when there are, say, two groups of features
{f} and {g} with much interaction inside each group (that is, the
coefficients of f_1 f_2 and of g_5 g_8 are typically big), almost no
interaction between features that belong to only one of the two groups
(the coefficient of f_2 g_7 is zero), except that a few features are
common to the two groups, so that there is some interaction between the
groups. If the common features do not have the same optimal values
within group {f} and within group {g}, you get hard-to-escape local
minima.

Anyhow, I'll be sure to look closely at your work, Rémi.

Jonas

```
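Rémi's Rosenbrock point can be checked numerically. The sketch below (plain Python, with arbitrary grid bounds and step size) finds the local minima of the 1-D slice at fixed y, then evaluates the y-gradient at the spurious one to show it is not a minimum of the full 2-D function: the extra dimension is the escape route.

```python
def rosenbrock(x, y):
    # Standard 2-D Rosenbrock function: global minimum at (1, 1).
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def slice_local_minima(y, lo=-2.0, hi=2.0, n=4001):
    """Grid search for local minima of x -> rosenbrock(x, y) at fixed y."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    fs = [rosenbrock(x, y) for x in xs]
    return [xs[i] for i in range(1, n - 1)
            if fs[i] < fs[i - 1] and fs[i] < fs[i + 1]]

# The slice at y = 1 has two local minima in x: a spurious one near
# x = -0.995 and the true one at x = 1.
minima = slice_local_minima(y=1.0)
print(minima)

# At the spurious slice minimum, df/dy = 200 * (y - x^2) is far from
# zero, so moving in y escapes it.
x_bad = min(minima, key=lambda x: abs(x + 1))
df_dy = 200 * (1.0 - x_bad ** 2)
print(df_dy)
```

This is the "each additional dimension is an additional option to escape" claim in miniature: the trap exists only in the x-slice, not in the full function.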
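The "two groups sharing a feature" trap has a concrete discrete analogue in the spin-glass style mentioned above. The toy model below uses made-up coupling values: features 0, 1, 2 form one tightly coupled group, features 2, 3, 4 the other, and the two groups disagree about the sign of the shared feature 2. Enumerating all states shows multiple strict local minima under single-feature flips.

```python
from itertools import product

# Hypothetical couplings: group {f} = {0, 1, 2} wants its members
# aligned; group {g} = {2, 3, 4} wants 3 and 4 aligned with each other
# but anti-aligned with the shared feature 2.
J = {
    (0, 1): 2, (0, 2): 2, (1, 2): 2,   # strong interaction inside {f}
    (3, 4): 2,                          # strong interaction inside {g}
    (2, 3): -1, (2, 4): -1,            # conflicting pull on feature 2
}

def energy(s):
    """E(s) = -sum_ij J_ij * s_i * s_j; lower is better."""
    return -sum(c * s[i] * s[j] for (i, j), c in J.items())

def is_local_min(s):
    """True if no single coordinate flip strictly lowers the energy."""
    e = energy(s)
    for k in range(len(s)):
        t = list(s)
        t[k] = -t[k]
        if energy(t) < e:
            return False
    return True

# Enumerate all 2^5 sign assignments and keep the local minima.
states = list(product((-1, 1), repeat=5))
minima = [s for s in states if is_local_min(s)]
for s in sorted(minima, key=energy):
    print(s, energy(s))
```

A coordinate-wise (one-feature-at-a-time) optimizer started at all-(+1), energy -6, is stuck: every single flip raises the energy, so it never reaches the global minimum (+1, +1, +1, -1, -1) at energy -10. This is exactly the hard-to-escape situation the paragraph above describes, and also why it takes deliberately conflicting group-level couplings to produce it.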