[Computer-go] Computer-go Digest, Vol 12, Issue 89
Aja
ajahuang at gmail.com
Sun Jan 30 07:13:24 PST 2011
> 57.5% is still a 50 Elo improvement, so I'm not unhappy to hear this. Did
> you take the reduced playouts per second into account in your experiments?
> How many games did you play?
I was planning to test fixed time per game after getting a 100 Elo
improvement with fixed 20k playouts per game. Then I was a bit disappointed
and stopped the testing at 450 games with 57.5%, the same as the result of
LGR-1.
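(For reference, the "50 Elo" figure follows from the usual logistic Elo
model, in which a win rate p maps to an Elo difference of
400 * log10(p / (1 - p)). A quick check, assuming that model:

import math

p = 0.575                             # observed win rate over 450 games
print(400 * math.log10(p / (1 - p)))  # ~52.5, i.e. roughly +50 Elo
)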
> As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns. What
> are your results for plain LGRF-1 without patterns, and did you try LGRF-2
> at all?
Yes, this testing was LGRF-2 with 3x3 patterns checked for the reply, the
last move, and the second reply. A reply is played only if all three
patterns match. I haven't tested LGRF-2 without 3x3 patterns, because I
thought patterns should help a lot. But it looks like I had too much
confidence. :(
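For concreteness, here is a minimal sketch of how such a pattern-gated
LGRF-2 lookup might be structured. This is not Erica's actual code: the
table layout, names, and pattern hashing are my own illustration, I read
the three checked locations as the stored reply, the last move, and the
move before it, and the store-on-win / forget-on-loss update is the
standard published LGRF rule.

class LGRF2:
    """Last Good Reply with Forgetting, depth 2, gated on 3x3 patterns.

    `pattern(board, point)` is assumed to return a hash of the 3x3
    neighbourhood around `point`; any concrete encoding will do.
    """

    def __init__(self, pattern):
        self.pattern = pattern
        # (second-to-last move, last move) -> (reply, pat2, pat1, pat_reply)
        self.table = {}

    def suggest(self, board, prev2, prev1):
        entry = self.table.get((prev2, prev1))
        if entry is None:
            return None
        reply, pat2, pat1, pat_reply = entry
        # Play the stored reply only if all three 3x3 patterns still match.
        if (self.pattern(board, prev2) == pat2 and
                self.pattern(board, prev1) == pat1 and
                self.pattern(board, reply) == pat_reply):
            return reply
        return None

    def update(self, board, prev2, prev1, reply, won):
        if won:
            # Remember the reply together with the patterns seen when it was played.
            self.table[(prev2, prev1)] = (reply,
                                          self.pattern(board, prev2),
                                          self.pattern(board, prev1),
                                          self.pattern(board, reply))
        elif self.table.get((prev2, prev1), (None,))[0] == reply:
            # Forgetting: drop a stored reply that just lost a simulation.
            del self.table[(prev2, prev1)]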
> It is possible that canned responses in LGRF fashion have a certain
> expected quality that does not change much with the quality of the
> underlying policy. In that case, they could lead to big improvements, to
> no effect or even to degradation of playing strength depending on how
> strong your program already is pre-LGRF. Of course this also depends on
> how you prioritize LGRF and how successful you are in replacing
> low-quality moves, but not high-quality moves of your default policy with
> its suggestions. We would need to study this systematically.
My target was set at 100 Elo, or at least 70 Elo. 50 Elo with a 15% speed
cost is not very satisfying (for me). I agree that we would need to do
systematic experiments to adapt LGRF-2 to a softmax policy. And I believe
developing an algorithm to automatically learn the feature weights combined
with LGRF is a very good research direction.
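One way that direction might look, purely as a sketch of the idea: expose
the stored LGRF reply to the playout policy as one more binary feature
whose weight is learned like the others. The feature names and weight
values below are made up for illustration only.

import math
import random

def softmax_choice(moves, features, weights):
    # Linear feature scores -> softmax distribution -> sampled move.
    scores = [sum(weights.get(k, 0.0) * v for k, v in features(m).items())
              for m in moves]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    r = random.uniform(0.0, sum(exps))
    acc = 0.0
    for m, e in zip(moves, exps):
        acc += e
        if acc >= r:
            return m
    return moves[-1]

# Toy usage: 'b' is the move the LGRF table suggests in this position.
weights = {'matches_3x3_pattern': 1.2, 'is_lgr_reply': 2.0}  # would be learned
def features(move):
    return {'matches_3x3_pattern': 1.0 if move in ('a', 'b') else 0.0,
            'is_lgr_reply': 1.0 if move == 'b' else 0.0}
print(softmax_choice(['a', 'b', 'c'], features, weights))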
Aja