[Computer-go] win rate bias and CLOP

Rémi Coulom Remi.Coulom at free.fr
Wed Jan 4 13:56:24 PST 2012


On 4 Jan 2012, at 16:22, Olivier Teytaud wrote:

> 
> BTW, I imagine that "CLOP" could be any fully automated parameter tuning
> solution. That is, nothing here is really specific to CLOP. It just happens
> that CLOP is the first fully automated parameter tuning system that I have
> managed to make work.
> 
> 
> Hi Brian and others,
> I'm not sure that CLOP could be any fully automated parameter tuning solution.
> The point is that there are roughly two families of algorithms:
> 1) those in which exploration is roughly equal to recommendation; we test parameters close to our current approximation
>     of the optimum;
> 2) those in which exploration is really different from recommendation: we test parameters possibly far from the optimum,
>     because the variance is higher and we get more information there.
> I assume that CLOP is in category 2. 

I don't think CLOP fits either of these categories. CLOP tries to balance bias and variance optimally: it samples over an area that is as small as possible (to limit regression bias, since the function to be optimized may be neither quadratic nor symmetric), but big enough to be confident that the values on the border are inferior to the optimal win rate (otherwise we would have no idea where the maximum is).
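
To make that concrete, here is a minimal 1-D sketch of the idea. This is not the actual CLOP code: the true_win_rate function, the quadratic logistic fit by plain gradient ascent, and the flat margin constant standing in for a real statistical confidence bound are all my own simplifications.

import numpy as np

rng = np.random.default_rng(0)

def true_win_rate(x):
    # Hidden function we want to maximize (illustrative assumption).
    return 0.45 + 0.15 * np.exp(-(x - 0.3) ** 2 / 0.1)

def fit_quadratic_logistic(xs, wins, iters=200, lr=0.5):
    # Fit p(x) = sigmoid(a + b*x + c*x^2) to win/loss outcomes by
    # gradient ascent on the log-likelihood (good enough for a sketch).
    X = np.stack([np.ones_like(xs), xs, xs ** 2], axis=1)
    w = np.zeros(3)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (wins - p) / len(xs)
    return w

def confident_region(w, grid, margin=0.05):
    # Keep the parameters whose fitted win rate is within 'margin' of the
    # fitted maximum; a real implementation would use a statistical
    # confidence bound instead of this flat constant.
    X = np.stack([np.ones_like(grid), grid, grid ** 2], axis=1)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return grid[p >= p.max() - margin]

grid = np.linspace(-1.0, 1.0, 201)
xs, wins = [], []
for n in range(500):
    if n < 10:
        x = rng.uniform(-1.0, 1.0)                  # a few wide initial samples
    else:
        w = fit_quadratic_logistic(np.array(xs), np.array(wins))
        x = rng.choice(confident_region(w, grid))   # sample inside the confident region
    xs.append(x)
    wins.append(float(rng.random() < true_win_rate(x)))

print("recommended parameter:", np.mean(xs[-100:]))  # mean of recent samples

The point of the sketch is only the trade-off: shrinking the margin reduces regression bias, while keeping it wide enough preserves confidence that the maximum lies inside the sampled area.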

The big deal with CLOP, compared to methods such as UCT or CEM, is not about sampling far or close. The sample distribution of CLOP is very similar to the sample distribution of Gaussian CEM. The big deal with CLOP is that it is an order of magnitude faster when the function to be optimized is smooth (bounded second-order derivative, say). That is because, unlike population-based methods, it is not a comparison-based algorithm. It seems that comparison-based approaches cannot do better than N^{-1/2} asymptotic expected simple regret, whereas CLOP can achieve N^{-2/3}.
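
Just to give a feel for the gap between those two rates (orders of magnitude only, all constants ignored):

# Compare the asymptotic simple-regret orders N^(-1/2) (comparison-based)
# and N^(-2/3) (smooth case); constants are ignored, so only the gap matters.
for N in (10**4, 10**5, 10**6):
    print(f"N={N:>8}:  N^(-1/2) = {N**-0.5:.1e}   N^(-2/3) = {N**(-2.0/3.0):.1e}")

At a million games the smooth-case rate is already an order of magnitude below the comparison-based one.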

The function to be optimized does not have to be symmetric. I know optimizing symmetric functions is very popular in some of the theoretical literature on stochastic optimization, but I don't expect those results to have much practical value, because the functions to be optimized in practice are never symmetric around the maximum. Lack of symmetry really changes the nature of the problem.

The really fundamental result in terms of asymptotic speed of convergence is this paper:
@article{Chen-1988a,
  author  = "Hung Chen",
  title   = "Lower Rate of Convergence for Locating a Maximum of a Function",
  journal = "The Annals of Statistics",
  volume  = 16,
  number  = 3,
  pages   = "1330--1334",
  year    = 1988
}
in short: a bounded d-th derivative implies a lower bound of order N^{-d/(d+1)} on the asymptotic expected simple regret, so no algorithm can converge faster than that. This is a universal result: it applies to _any_ algorithm. I must admit I don't really understand the theorem, but I intuitively agree with it.
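
Stated schematically in simple-regret form (my loose reading of it, with x^* the true optimum, \hat{x}_N the recommendation after N games, and the bound understood in the worst case over the class of functions with a bounded d-th derivative):

% Schematic minimax lower bound; the notation is introduced here, not taken from the paper.
\sup_f \, \mathbb{E}\bigl[ f(x^\ast) - f(\hat{x}_N) \bigr] \;\ge\; c \, N^{-d/(d+1)}

For d = 2 this is the N^{-2/3} rate mentioned above for CLOP.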

Rémi

