[Computer-go] Exploration formulas for UCT

Sat Jan 1 19:41:11 PST 2011

Hi Hiroshi,

> (1 - beta) * (win_rate + 0.31 * sqrt( ln(parent_visits) / child_visits)) + 
> beta (rave_win_rate *  0.31 * sqrt( ln(rave_parent_visits) / 
> rave_child_visits))

   I suggest dropping the exploration term from the RAVE part, as Silver 
suggested in his PhD thesis. An exploration term for RAVE is not very 
meaningful, since normally all moves in a node have their RAVE statistics 
updated at the same time.
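
   As a rough sketch of what I mean (Python only for illustration; the 
function name and argument names are mine, not taken from Aya or Erica, and 
I assume child_visits > 0 and parent_visits >= 1), the RAVE part contributes 
only its win rate and the exploration bonus stays on the UCT part:

```python
import math

def uct_rave_value(win_rate, child_visits, parent_visits,
                   rave_win_rate, beta, c=0.31):
    """Blend UCT and RAVE values; only the UCT part gets an exploration bonus.

    Hypothetical helper for illustration. c=0.31 is the constant from the
    quoted formula. Assumes child_visits > 0 and parent_visits >= 1.
    """
    # Exploration bonus computed from the real (UCT) visit counts only.
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    # The RAVE term carries no exploration bonus of its own.
    return (1.0 - beta) * (win_rate + exploration) + beta * rave_win_rate
```

With beta = 1 the value is just rave_win_rate, and with beta = 0 it reduces 
to the plain UCT formula.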

> UCT searches B(E5),W(D3),B(C5),W(F7), and in this position, playout 
> searches
> B(E7),W(E8),B(D8),W(F8),B(D7)...Black win.
>
> In W(D3) positions, Aya updates RAVE and UCT,
> Updates  C5(UCT)
> Updates  C5(RAVE)
> Updates  E7(RAVE)
> Updates  D8(RAVE)
> Updates  D7(RAVE)
>
> I think "Updates C5(RAVE)" is strange, but I could not get good result 
> without this.

   I don't see why it is strange, and I wonder why you think so. In Erica, I 
update C5(RAVE) as well.
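
   To make the update concrete, here is a minimal sketch (Python for 
illustration only; the Node class and update_node are hypothetical names, 
not the actual code of Erica or Aya). At the node after W(D3), the move 
played from the node gets the UCT update, and every move the side to move 
plays from there to the end of the playout, C5 included, gets a RAVE update:

```python
from collections import defaultdict

class Node:
    """Hypothetical tree node holding per-move [wins, visits] counters."""
    def __init__(self):
        self.uct = defaultdict(lambda: [0, 0])
        self.rave = defaultdict(lambda: [0, 0])

def update_node(node, moves_after, black_won):
    """Update one node from the moves that followed it.

    moves_after: list of (color, move) pairs from this node to the end of
    the simulation, tree moves first, then playout moves.
    """
    to_move, first_move = moves_after[0]
    # The result is counted from the point of view of the side to move here.
    win = 1 if black_won == (to_move == "B") else 0

    # UCT: only the move actually played from this node.
    node.uct[first_move][0] += win
    node.uct[first_move][1] += 1

    # RAVE: every move later played by the same side, including the first
    # one; each move is counted once per simulation.
    seen = set()
    for color, move in moves_after:
        if color == to_move and move not in seen:
            seen.add(move)
            node.rave[move][0] += win
            node.rave[move][1] += 1

# The example from the quoted message, seen from the node after W(D3).
node = Node()
sequence = [("B", "C5"), ("W", "F7"), ("B", "E7"), ("W", "E8"),
            ("B", "D8"), ("W", "F8"), ("B", "D7")]
update_node(node, sequence, black_won=True)
```

After this call the node has a UCT entry for C5 only, and RAVE entries for 
C5, E7, D8 and D7, which matches the list of updates you gave.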

  Aja