[Computer-go] Computer-go Digest, Vol 12, Issue 89
Hendrik Baier
hendrik.baier at googlemail.com
Sun Jan 30 05:44:07 PST 2011
Hi Aja,
57.5% is still a 50 Elo improvement, so I'm not unhappy to hear this.
Did you take the reduced playouts per second into account in your
experiments? How many games did you play?
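As an aside, the 50 Elo figure above is just the usual logistic conversion
of a 57.5% winning rate:

    import math
    # Expected score against a D Elo weaker opponent: p = 1 / (1 + 10**(-D / 400)).
    # Solving for D at p = 0.575:
    D = 400 * math.log10(0.575 / 0.425)   # roughly 52.5 Elo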
As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns.
What are your results for plain LGRF-1 without patterns, and did you try
LGRF-2 at all?
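In case it helps to compare implementations, here is a minimal Python
sketch of the two reply tables as I think of them; this is not Orego's
actual code, and the per-colour tables and all other details are omitted:

    reply1 = {}   # previous move -> last good reply          (LGRF-1)
    reply2 = {}   # (two moves ago, previous move) -> reply   (LGRF-2)

    def update_tables(moves, winner_move_indices):
        # moves: the playout's move sequence; winner_move_indices: indices
        # in 'moves' played by the playout's winner (how you track that is
        # up to your own bookkeeping).
        for i in range(2, len(moves)):
            key1, key2, reply = moves[i-1], (moves[i-2], moves[i-1]), moves[i]
            if i in winner_move_indices:
                reply1[key1] = reply             # store the good reply
                reply2[key2] = reply
            else:
                if reply1.get(key1) == reply:    # forget replies that just lost
                    del reply1[key1]
                if reply2.get(key2) == reply:
                    del reply2[key2]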
As for your question, the behavior of playout policies in an MCTS
context is of course always difficult to interpret. In addition, you use
a softmax framework while Orego plays deterministically (with the
exception of a random fallback policy if no capture, escape or matching
pattern is found).
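Roughly, the contrast I mean is the following; this is a sketch with
illustrative names, not Erica's or Orego's actual code, and the additive
"offset" is only my guess at what your probability offset does:

    import math, random

    def pick_deterministic(rule_moves, legal_moves):
        # Orego-style: the first applicable rule wins, random fallback otherwise.
        for rule in ("capture", "escape", "pattern"):   # hypothetical order
            if rule in rule_moves:
                return rule_moves[rule]
        return random.choice(legal_moves)

    def pick_softmax(scores, lgrf_reply=None, offset=0.0):
        # Erica-style: sample proportionally to exp(score), here with an
        # additive bonus on the LGRF-suggested reply.
        w = {m: math.exp(s + (offset if m == lgrf_reply else 0.0))
             for m, s in scores.items()}
        r = random.uniform(0, sum(w.values()))
        for m, wm in w.items():
            r -= wm
            if r <= 0:
                return m
        return m   # guard against floating-point leftovers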
It is possible that canned responses in LGRF fashion have a certain
expected quality that does not change much with the quality of the
underlying policy. In that case, they could lead to a big improvement, to
no effect, or even to a degradation of playing strength, depending on how
strong your program already is pre-LGRF. Of course this also depends on
how you prioritize the LGRF suggestions, and on how successful you are at
replacing the low-quality moves of your default policy with them while
leaving its high-quality moves untouched. We would need to study this
systematically.
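To make the prioritization point concrete: in a deterministic policy, a
stored reply can either override the whole default policy or only its
random fallback. A rough Python sketch, with purely hypothetical helper
names:

    def move_lgrf_first(board, lgrf_reply, policy_move, random_move):
        # The stored reply can displace even high-quality policy moves.
        if lgrf_reply is not None and board.is_legal(lgrf_reply):
            return lgrf_reply
        m = policy_move(board)                  # capture / escape / pattern
        return m if m is not None else random_move(board)

    def move_lgrf_as_fallback(board, lgrf_reply, policy_move, random_move):
        # The stored reply only replaces the low-quality random fallback.
        m = policy_move(board)
        if m is not None:
            return m
        if lgrf_reply is not None and board.is_legal(lgrf_reply):
            return lgrf_reply
        return random_move(board)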
Hendrik
> Hi Hendrik,
>
> I look forward to seeing your contributions on adaptive playouts in the future.
>
> I have tested LGRF with 3x3 patterns at 20k playouts. It's a bit strange
> that the best performance is only around 57.5%, even though I tuned the
> probability offset very hard. LGRF with 3x3 patterns slows Erica down by
> almost 15%. Overall, this improvement is much weaker than the one in your
> paper.
> From my past experiments, incrementally updating larger patterns is too
> costly, so I don't plan to try it. But I am wondering why Orego can get
> such a BIG improvement while Erica can't.
>
> Aja