[Computer-go] Accelerating Self-Play Learning in Go
Darren Cook
darren at dcook.org
Fri Mar 8 04:26:58 PST 2019
> Blog post:
> https://blog.janestreet.com/accelerating-self-play-learning-in-go/
> Paper: https://arxiv.org/abs/1902.10565
I read the paper and really enjoyed it: lots of different ideas being
tried. I was especially pleased to see figure 12, and how big a
difference adding some Go-specific features made.
It would be good, though, to see figure 8 plotted against wall clock
time, on equivalent hardware. How much extra computation do all the
extra ideas add? (Maybe it is in the paper, and I missed it?)
> I found some other interesting results, too - for example contrary to
> intuition built up from earlier-generation MCTS programs in Go,
> putting significant weight on score maximization rather than only
> win/loss seems to help.
Score maximization in self-play means the program is encouraged to play
more aggressively/dangerously, by creating life/death problems on the
board. An opponent of similar strength doesn't know how to exploit the
weaknesses left behind. (One of the asymmetries of Go?)
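
A minimal sketch of what "putting weight on score" could look like as a
utility function, next to plain win/loss. The function name, the weight
and the tanh scaling are all assumptions made up for illustration; this
is not the formula from the paper.

```python
import math

def blended_utility(win_prob, score_lead, score_weight=0.5, score_scale=20.0):
    """Hypothetical utility: win/loss plus a saturating score bonus.

    win_prob     -- estimated probability of winning, in [0, 1]
    score_lead   -- estimated final margin in points (positive = ahead)
    score_weight -- how much the score term counts (assumed value)
    score_scale  -- margin at which the bonus starts to flatten (assumed)
    """
    win_loss = 2.0 * win_prob - 1.0                   # map [0, 1] to [-1, 1]
    score_term = math.tanh(score_lead / score_scale)  # bounded in (-1, 1)
    return win_loss + score_weight * score_term

# Two lines of play with the same win probability but different margins.
safe = blended_utility(win_prob=0.9, score_lead=1.5)
greedy = blended_utility(win_prob=0.9, score_lead=30.0)
```

With score_weight=0 this reduces to pure win/loss and the two lines of
play are indistinguishable; with any positive weight the larger margin
is preferred, which is where the pull towards riskier play would come
from.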
I hope you are able to continue the experiment, with more training time,
to see if it flattens out or keeps improving.
Darren