[Computer-go] Value network that doesn't want to learn.

Vincent Richard vincent.francois.richard at gmail.com
Mon Jun 19 12:31:26 PDT 2017


This is what I have been thinking about, yet I have been unable to find an error.

Currently, I'm working with:

- SGF Database: fuseki info Tygem -> http://tygem.fuseki.info/index.php 
(until recently I was working with games of all levels from KGS)

- The data is then analyzed by a script which extracts all kinds of 
features from the games. When I'm training a network, I load the 
features I want from this analysis to build the batch. I have two 
possible methods for batch construction: I can either add moves one 
after the other (the fast mode) or pick random moves from different 
games (slower, but it reduces the variance). I set the batch size 
according to my GPU memory (200 moves in the case of a full-sized 
value/policy network). I don't think the problem comes from here, since 
the data is the same for all the networks.
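For reference, the two batch-construction modes described above could be sketched roughly like this (the function names and the flat list-of-positions representation are my own illustration, not taken from the actual script):

```python
import random

def sequential_batch(games, batch_size=200):
    """Fast mode: take consecutive positions, game after game.
    Positions within one game are highly correlated, and for a value
    network they also share the same game-outcome label."""
    batch = []
    for game in games:
        for position in game:
            batch.append(position)
            if len(batch) == batch_size:
                yield batch
                batch = []

def shuffled_batch(positions, batch_size=200):
    """Slower mode: sample positions across many games, which reduces
    within-batch correlation between examples."""
    pool = list(positions)
    random.shuffle(pool)
    for i in range(0, len(pool) - batch_size + 1, batch_size):
        yield pool[i:i + batch_size]
```

One pitfall that hits value networks specifically: in the sequential mode, every position from the same game carries the identical win/loss target, so batches are nearly constant in their labels. This is why the AlphaGo paper sampled only one position per game when training its value network.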

- For the input, I'm using the same architecture as 
https://github.com/TheDuck314/go-NN (I have tried many kinds of shapes, 
from minimalist to AlphaGo)

- For the network, I'm once again using TheDuck314's network 
(EvalModels.Conv11PosDepFC1ELU) with the same layers 
(https://github.com/TheDuck314/go-NN/blob/master/engine/Layers.py) and 
the learning rate he recommends

During some of the tests, all the networks I was training had the same 
layers except for the last one. So, as you suggested, I was also 
wondering whether this last layer was the problem. Yet, I haven't found 
any error.
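As a framework-independent sanity check: the symptom described in the thread (loss flat, accuracy pinned at 50%, yet a constant "black always wins" target learned without trouble) is exactly what a *correct* network shows when the labels are statistically independent of the inputs, e.g. when the winner labels are misaligned with the positions they belong to. A toy logistic regression on synthetic data reproduces all three behaviours (this is an illustration, not the actual training code):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, steps=500, lr=0.5):
    """Minimal logistic regression, full-batch gradient descent on
    cross-entropy loss; returns the final training accuracy."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output in (0, 1)
        grad = p - y                             # dL/dz for cross-entropy
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return ((p > 0.5) == (y > 0.5)).mean()

X = rng.normal(size=(1000, 20))

# Labels independent of the inputs: accuracy stays near chance (~50%).
acc_random = train_logreg(X, rng.integers(0, 2, size=1000).astype(float))

# Constant target ("black always wins"): learned trivially via the bias.
acc_const = train_logreg(X, np.ones(1000))

# Labels that actually depend on the input: learned quickly.
acc_signal = train_logreg(X, (X[:, 0] > 0).astype(float))
```

If the full value network behaves like the first case, it is worth verifying, one step before the network, that each batch really pairs every position with the outcome of *its own* game and in the correct orientation (e.g. winner relative to the player to move).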



On 20-Jun-17 at 3:19 AM, Gian-Carlo Pascutto wrote:
> On 19-06-17 17:38, Vincent Richard wrote:
>
>> During my research, I’ve trained a lot of different networks, first on
>> 9x9 then on 19x19, and as far as I remember all the nets I’ve worked
>> with learned quickly (especially during the first batches), except the
>> value net, which has always been problematic (diverges easily, doesn't
>> learn quickly, ...). I have been stuck on the 19x19 value network for a
>> couple of months now. I've tried countless inputs (feature planes) and
>> lots of different models, even using the exact same code as others. Yet,
>> whatever I try, the loss value doesn’t move an inch and accuracy stays
>> at 50% (even after days of training). I've tried to change the learning
>> rate (increase/decrease), it doesn't change. However, if I feed a stupid
>> value as target output (for example black always win) it has no trouble
>> learning.
>> It is even more frustrating that training any other kind of network
>> (predicting next move, territory,...) goes smoothly and fast.
>>
>> Has anyone experienced a similar problem with value networks or has an
>> idea of the cause?
> 1) What is the training data for the value network? How big is it, how
> is it presented/shuffled/prepared?
>
> 2) What is the *exact* structure of the network and training setup?
>
> My best guess would be an error in the construction of the final layers.
>
