[Computer-go] Accelerating Self-Play Learning in Go

Fri Mar 8 04:26:58 PST 2019

> Blog post:
> https://blog.janestreet.com/accelerating-self-play-learning-in-go/ 
> Paper: https://arxiv.org/abs/1902.10565

I read the paper and really enjoyed it: lots of different ideas being
tried. I was especially pleased to see figure 12, and the big difference
that adding some go-specific features made.

It would be good, though, to see figure 8 shown in terms of wall-clock
time on equivalent hardware. How much extra computation do all the
extra ideas add? (Maybe it is in the paper and I missed it?)

> I found some other interesting results, too - for example contrary to
> intuition built up from earlier-generation MCTS programs in Go,
> putting significant weight on score maximization rather than only
> win/loss seems to help.

Score maximization in self-play means the program is encouraged to play
more aggressively/dangerously, creating life-and-death problems on the
board. A player of similar strength doesn't know how to exploit the
weaknesses left behind. (One of the asymmetries of go?)
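
To make the trade-off concrete, here is a rough sketch of how a score
term can tip move selection. This is my own illustration, not the
formula from the paper: the tanh squash, the function name and the
parameters score_weight and score_scale are all assumptions I made up
for the example.

```python
import math


def utility(win_prob, expected_score, score_weight=0.5, score_scale=20.0):
    """Blend win/loss with a saturating score term.

    Hypothetical illustration only: score_weight and score_scale are
    made-up parameters, and tanh is just one possible squashing function.
    """
    win_loss = 2.0 * win_prob - 1.0  # map [0, 1] to [-1, 1]
    # tanh saturates, so a huge margin is worth little more than a big one
    score_term = math.tanh(expected_score / score_scale)
    return win_loss + score_weight * score_term


# Two candidate lines: a safe one, and a greedy one that gives up a
# little win probability in exchange for a larger expected margin.
safe = utility(win_prob=0.90, expected_score=2.5)
greedy = utility(win_prob=0.85, expected_score=15.0)
```

With score_weight=0.0 the safe line has the higher utility; with the
(arbitrary) default of 0.5 the greedy line comes out ahead, which is the
kind of pull towards riskier play I mean.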

I hope you are able to continue the experiment, with more training time,
to see if it flattens out or keeps improving.

Darren