[Computer-go] Accelerating Self-Play Learning in Go
Álvaro Begué
alvaro.begue at gmail.com
Sun Mar 3 18:40:28 PST 2019
From before AlphaGo was announced, I thought the way forward was to
generate games played out to the bitter end, maximizing score, and
then to use the final ownership as a prediction target. I am very glad
that someone has had the time to put this idea (and many others!) into
practice. Congratulations on a very compelling paper.
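
In rough Python, the kind of auxiliary target I have in mind might look
like this (a minimal sketch; the names, shapes, and the weight w_own are
mine for illustration, not KataGo's actual training code):

    import numpy as np

    def ownership_loss(pred_own, final_own):
        # pred_own, final_own: 19x19 arrays in [-1, 1], where +1 means
        # black owns the point at game end and -1 means white does.
        # MSE against the ownership actually observed when the
        # self-play game is played out to the bitter end.
        return np.mean((pred_own - final_own) ** 2)

    def total_loss(policy_loss, value_loss, pred_own, final_own,
                   w_own=0.1):
        # The ownership term is auxiliary: weighted down so it shapes
        # the shared trunk without dominating policy/value training.
        return (policy_loss + value_loss
                + w_own * ownership_loss(pred_own, final_own))
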
Álvaro.
On Sun, Mar 3, 2019 at 9:21 PM David Wu <lightvector at gmail.com> wrote:
>
> For any interested people on this list who don't follow Leela Zero discussion or reddit threads:
>
> I recently released a paper on ways to improve the efficiency of AlphaZero-like learning in Go. A number of the ideas tried deviate a little from "pure zero" (e.g. ladder detection, predicting board ownership), but training still uses only self-play, starting from random play, with no outside human data.
>
> Although longer training runs have NOT yet been tested, up to about LZ130 strength (strong human pro, or just beyond it, depending on hardware) the learning is faster than Leela Zero's by at least roughly a factor of 5, and closer to a factor of 30 for merely reaching the earlier level of very strong amateur play rather than pro or superhuman strength.
>
> I found some other interesting results, too. For example, contrary to intuition built up from earlier-generation MCTS programs in Go, putting significant weight on score maximization rather than only on win/loss seems to help.
>
> Blog post: https://blog.janestreet.com/accelerating-self-play-learning-in-go/
> Paper: https://arxiv.org/abs/1902.10565
> Code: https://github.com/lightvector/KataGo
>
>
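
On the score-maximization point: a minimal sketch of what a combined
search utility could look like (the weights and the tanh squashing here
are illustrative assumptions, not the paper's exact formulation):

    import numpy as np

    def node_utility(win_prob, expected_score, w_score=0.2, scale=20.0):
        # Mostly win/loss, plus a bounded score term: tanh saturates,
        # so a large lead is not chased at the expense of winning
        # probability. w_score and scale are illustrative values only.
        win_loss = 2.0 * win_prob - 1.0               # in [-1, 1]
        score_term = np.tanh(expected_score / scale)  # in (-1, 1)
        return win_loss + w_score * score_term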