[Computer-go] Computer-go Digest, Vol 12, Issue 89

Hendrik Baier hendrik.baier at googlemail.com
Sun Jan 30 05:44:07 PST 2011


Hi Aja,

57.5% is still roughly a 50 Elo improvement, so I'm not unhappy to hear this. 
Did you take the reduced playouts per second into account in your 
experiments? How many games did you play?
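
In case it's useful, here is the back-of-the-envelope conversion I had in
mind; it is just the standard logistic Elo model, and the function name is
purely illustrative:

import math

def winrate_to_elo(p):
    # Standard logistic Elo model: an Elo difference D gives an expected
    # score of 1 / (1 + 10 ** (-D / 400)); solving for D gives:
    return 400.0 * math.log10(p / (1.0 - p))

print(winrate_to_elo(0.575))  # ~52.5, i.e. roughly 50 Elo
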
As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns. 
What are your results for plain LGRF-1 without patterns, and did you try 
LGRF-2 at all?
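
Just to make sure we mean the same thing by those terms, here is a minimal
sketch of the LGRF-1/LGRF-2 bookkeeping as I intend it; the names (reply1,
reply2, default_policy) are placeholders, not Orego's actual code, and it
assumes the playout's first move is Black's:

reply1 = {}  # LGRF-1: last opponent move -> stored reply
reply2 = {}  # LGRF-2: (second-to-last move, last move) -> stored reply

def choose_move(prev2, prev1, board, default_policy):
    # Prefer the LGRF-2 reply, then the LGRF-1 reply, then the default policy.
    for move in (reply2.get((prev2, prev1)), reply1.get(prev1)):
        if move is not None and board.is_legal(move):
            return move
    return default_policy(board)

def update_tables(moves, winner):
    # After a playout: store the replies the winner played, and forget
    # stored replies that the loser actually played (the "F" in LGRF).
    winner_parity = 0 if winner == "black" else 1  # Black moves at even indices
    for i in range(2, len(moves)):
        prev2, prev1, move = moves[i - 2], moves[i - 1], moves[i]
        if i % 2 == winner_parity:
            reply1[prev1] = move
            reply2[(prev2, prev1)] = move
        else:
            if reply1.get(prev1) == move:
                del reply1[prev1]
            if reply2.get((prev2, prev1)) == move:
                del reply2[(prev2, prev1)]
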
As for your question, the behavior of playout policies in an MCTS 
context is of course always difficult to interpret. In addition, you use 
a softmax framework while Orego plays deterministically (with the 
exception of a random fallback policy if no capture, escape or matching 
pattern is found).
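
To make the contrast concrete, here is a rough sketch of the two selection
schemes as I understand them; the scoring function, the heuristic list and
the ordering are illustrative, not Erica's or Orego's actual policies:

import math, random

def softmax_select(candidates, score):
    # Softmax-style selection: each candidate move is chosen with
    # probability proportional to exp(score(move)).
    weights = [math.exp(score(m)) for m in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def deterministic_select(board, heuristics, legal_moves):
    # Priority-style selection: the first heuristic (capture, escape,
    # matching pattern, ...) that suggests a move wins; otherwise fall
    # back to a uniformly random legal move.
    for h in heuristics:
        move = h(board)
        if move is not None:
            return move
    return random.choice(legal_moves)
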
It is possible that canned responses in LGRF fashion have a certain 
expected quality that does not change much with the quality of the 
underlying policy. In that case, they could lead to big improvements, to 
no effect, or even to a degradation of playing strength, depending on how 
strong your program already is pre-LGRF. Of course this also depends on 
how you prioritize LGRF and on how well you manage to replace the 
low-quality moves of your default policy with its suggestions while 
leaving the high-quality moves alone. We would need to study this 
systematically.

Hendrik

> Hi Hendrik,
>
> I expect to see your contributions on adaptive playout in the future.
>
> I have tested 20k playouts for LGRF with 3x3 patterns. It's a bit strange
> that the best performance is only around 57.5%, even though I tuned the
> probability offset very hard. LGRF with 3x3 patterns slows Erica down by
> almost 15%. Overall, this improvement is much weaker than yours in the
> paper. Incrementally updating larger patterns is too costly, based on my
> past experiments, so I don't plan to try it. But I am wondering why Orego
> can get such a BIG improvement while Erica can't.
>
> Aja



