[Computer-go] Computer-go Digest, Vol 12, Issue 89
Aja
ajahuang at gmail.com
Sun Jan 30 07:13:24 PST 2011
> 57.5% is still a 50 Elo improvement, so I'm not unhappy to hear this. Did
> you take the reduced playouts per second into account in your experiments?
> How many games did you play?
I was planning to test fixed time per game after getting a 100 Elo
improvement with fixed 20k playouts per game. Then I was a bit disappointed
and stopped the testing at 450 games with 57.5%, the same as the result of
LGR-1.
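(For reference, the "50 Elo" figure follows from the usual logistic Elo
model, in which a win rate p maps to an Elo difference of
400 * log10(p / (1 - p)). A quick check, assuming that model:

import math

p = 0.575                             # observed win rate over 450 games
print(400 * math.log10(p / (1 - p)))  # ~52.5, i.e. roughly +50 Elo
)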
> As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns. What
> are your results for plain LGRF-1 without patterns, and did you try LGRF-2
> at all?
Yes, this testing was LGRF-2 with 3x3 patterns checked for the reply, the
last move, and the second reply. A reply is played only if all three
patterns match. I haven't tested LGRF-2 without 3x3 patterns, because I
thought patterns should help a lot. But it looks like I had too much
confidence. :(
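For concreteness, here is a minimal sketch of how such a pattern-gated
LGRF-2 lookup might be structured. This is not Erica's actual code: the
table layout, names, and pattern hashing are my own illustration, I read
the three checked locations as the stored reply, the last move, and the
move before it, and the store-on-win / forget-on-loss update is the
standard published LGRF rule.

class LGRF2:
    """Last Good Reply with Forgetting, depth 2, gated on 3x3 patterns.

    `pattern(board, point)` is assumed to return a hash of the 3x3
    neighbourhood around `point`; any concrete encoding will do.
    """

    def __init__(self, pattern):
        self.pattern = pattern
        # (second-to-last move, last move) -> (reply, pat2, pat1, pat_reply)
        self.table = {}

    def suggest(self, board, prev2, prev1):
        entry = self.table.get((prev2, prev1))
        if entry is None:
            return None
        reply, pat2, pat1, pat_reply = entry
        # Play the stored reply only if all three 3x3 patterns still match.
        if (self.pattern(board, prev2) == pat2 and
                self.pattern(board, prev1) == pat1 and
                self.pattern(board, reply) == pat_reply):
            return reply
        return None

    def update(self, board, prev2, prev1, reply, won):
        if won:
            # Remember the reply together with the patterns seen when it was played.
            self.table[(prev2, prev1)] = (reply,
                                          self.pattern(board, prev2),
                                          self.pattern(board, prev1),
                                          self.pattern(board, reply))
        elif self.table.get((prev2, prev1), (None,))[0] == reply:
            # Forgetting: drop a stored reply that just lost a simulation.
            del self.table[(prev2, prev1)]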
> It is possible that canned responses in LGRF fashion have a certain
> expected quality that does not change much with the quality of the
> underlying policy. In that case, they could lead to big improvements, to
> no effect or even to degradation of playing strength depending on how
> strong your program already is pre-LGRF. Of course this also depends on
> how you prioritize LGRF and how successful you are in replacing
> low-quality moves, but not high-quality moves of your default policy with
> its suggestions. We would need to study this systematically.
My target was set at 100 Elo, or at least 70 Elo. 50 Elo with a 15% speed
cost is not very satisfying (for me). I agree that we would need to do
systematic experiments to adapt LGRF-2 to a softmax policy. And I believe
developing an algorithm to automatically learn the feature weights combined
with LGRF is a very good research direction.
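One way that direction might look, purely as a sketch of the idea: expose
the stored LGRF reply to the playout policy as one more binary feature
whose weight is learned like the others. The feature names and weight
values below are made up for illustration only.

import math
import random

def softmax_choice(moves, features, weights):
    # Linear feature scores -> softmax distribution -> sampled move.
    scores = [sum(weights.get(k, 0.0) * v for k, v in features(m).items())
              for m in moves]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    r = random.uniform(0.0, sum(exps))
    acc = 0.0
    for m, e in zip(moves, exps):
        acc += e
        if acc >= r:
            return m
    return moves[-1]

# Toy usage: 'b' is the move the LGRF table suggests in this position.
weights = {'matches_3x3_pattern': 1.2, 'is_lgr_reply': 2.0}  # would be learned
def features(move):
    return {'matches_3x3_pattern': 1.0 if move in ('a', 'b') else 0.0,
            'is_lgr_reply': 1.0 if move == 'b' else 0.0}
print(softmax_choice(['a', 'b', 'c'], features, weights))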
Aja