[Computer-go] Value network that doesn't want to learn.

Vincent Richard vincent.francois.richard at gmail.com
Mon Jun 19 08:38:23 PDT 2017


Hello everyone,

For my master's thesis, I have built an AI that takes a strategic approach
to the game. It doesn't play; it simply describes the strategy behind
every possible move in a given position ("enclosing this group", "making
life for this group", "saving these stones", etc.). My main idea is that,
once it is combined with a playing AI, I will be able to generate comments
on a position (and eventually teach people). So for my final experiment,
I'm trying to build a playing AI. I don't want it to be highly
competitive; I just need it to be decent (1d or so), so I thought about
using a policy network, a value network and a simple MCTS. The MCTS works
fine, and the policy network learns quickly and is accurate, but the value
network never seems to learn, not even slightly.

During my research I've trained a lot of different networks, first on 9x9
and then on 19x19, and as far as I remember every net I've worked with
learned quickly (especially during the first batches), except the value
net, which has always been problematic (it diverges easily, learns
slowly, ...). I have been stuck on the 19x19 value network for a couple of
months now. I've tried countless input representations (feature planes)
and lots of different models, even using the exact same code as others.
Yet, whatever I try, the loss doesn't move an inch and accuracy stays at
50% (even after days of training). I've tried increasing and decreasing
the learning rate; it makes no difference. However, if I feed a trivial
value as the target output (for example, "black always wins"), the
network has no trouble learning it.
It is all the more frustrating that training any other kind of network
(next-move prediction, territory, ...) goes smoothly and quickly.
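
For concreteness, the kind of value network and training setup I mean
looks roughly like the Keras-style sketch below. The feature planes, layer
sizes and optimizer here are placeholders, not my exact code: a stack of
convolutions over the board, a small fully connected part, a sigmoid
output for the probability that black wins, and binary cross-entropy as
the loss.

    from keras.models import Sequential
    from keras.layers import Conv2D, Flatten, Dense

    # Placeholder sizes: 8 input feature planes on a 19x19 board.
    model = Sequential([
        Conv2D(64, (3, 3), padding='same', activation='relu',
               input_shape=(19, 19, 8)),
        Conv2D(64, (3, 3), padding='same', activation='relu'),
        Conv2D(1, (1, 1), activation='relu'),
        Flatten(),
        Dense(256, activation='relu'),
        Dense(1, activation='sigmoid'),  # P(black wins)
    ])

    model.compile(optimizer='sgd', loss='binary_crossentropy',
                  metrics=['accuracy'])

    # x: positions, shape (N, 19, 19, 8); y: game results in {0, 1}
    # model.fit(x, y, batch_size=128, epochs=1)

With this kind of setup I would expect the cross-entropy to drop well
below log(2) (about 0.69, i.e. chance level) fairly quickly, but it
doesn't.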

Has anyone experienced a similar problem with value networks, or does 
anyone have an idea of what the cause might be?

Thank you
