[Computer-go] Exploration formulas for UCT
Aja
ajahuang at gmail.com
Sat Jan 1 19:41:11 PST 2011
Hi Hiroshi,
> (1 - beta) * (win_rate + 0.31 * sqrt(ln(parent_visits) / child_visits)) +
> beta * (rave_win_rate + 0.31 * sqrt(ln(rave_parent_visits) / rave_child_visits))
I suggest removing the exploration term from the RAVE part, as Silver
suggested in his PhD thesis. Exploration is not very meaningful for RAVE,
since at a node all moves are normally updated at the same time.
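Here is a minimal sketch of the selection value with the RAVE exploration term dropped: the UCT part keeps its bonus and the RAVE part contributes only its win rate. The function name `selection_value` is hypothetical. The caller supplies `beta`, since the formula above does not say how it is scheduled. Returning infinity for unvisited children is also an assumption, not something the formula specifies.

```python
import math


def selection_value(win_rate: float, child_visits: int, parent_visits: int,
                    rave_win_rate: float, beta: float, c: float = 0.31) -> float:
    """Blend UCT and RAVE values; only the UCT side gets an exploration bonus.

    Hypothetical helper: beta is supplied by the caller (its schedule is not
    specified here), and c = 0.31 matches the constant in the quoted formula.
    """
    if child_visits == 0:
        # Assumption: unvisited children are tried before any visited one.
        return math.inf
    uct = win_rate + c * math.sqrt(math.log(parent_visits) / child_visits)
    # No sqrt(ln(rave_parent_visits) / rave_child_visits) term on the RAVE side.
    return (1.0 - beta) * uct + beta * rave_win_rate


if __name__ == "__main__":
    # win rate 0.5 over 10 visits, 100 parent visits, RAVE win rate 0.6, beta 0.5
    print(round(selection_value(0.5, 10, 100, 0.6, 0.5), 4))
```

With beta = 1 the value reduces to the plain RAVE win rate, and with beta = 0 it is ordinary UCT. Because all sibling moves receive RAVE updates from the same playouts, their RAVE counts grow together, so a per-move RAVE bonus would barely change their relative order.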
> UCT searches B(E5), W(D3), B(C5), W(F7), and in this position the playout
> plays B(E7), W(E8), B(D8), W(F8), B(D7)... Black wins.
>
> In W(D3) positions, Aya updates RAVE and UCT,
> Updates C5(UCT)
> Updates C5(RAVE)
> Updates E7(RAVE)
> Updates D8(RAVE)
> Updates D7(RAVE)
>
> I think "Updates C5(RAVE)" is strange, but I could not get good results
> without it.
I don't see why it is strange, and I wonder why you think so. In Erica, I
update C5(RAVE) as well.
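Here is a minimal sketch of that update at the W(D3) node, where Black is to move. The tree move C5 gets a UCT update. Then every later move by the side to move, including C5 itself, gets a RAVE (all-moves-as-first) update. The names `Stats`, `Node` and `update_node` are hypothetical. Crediting only the first occurrence of each point is an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class Stats:
    wins: float = 0.0
    visits: int = 0

    def add(self, result: float) -> None:
        self.wins += result
        self.visits += 1


@dataclass
class Node:
    to_move: str  # "B" or "W": the side to move at this node
    uct: dict[str, Stats] = field(default_factory=dict)   # tree move -> stats
    rave: dict[str, Stats] = field(default_factory=dict)  # AMAF move -> stats


def update_node(node: Node, later_moves: list[tuple[str, str]],
                black_wins: bool) -> list[str]:
    """Update one node after a simulation (hypothetical helper).

    later_moves lists (color, point) from this node to the end of the playout;
    its first entry is the tree move chosen here. Returns the points that
    received a RAVE update, in order.
    """
    # Result from the point of view of the side to move at this node.
    result = 1.0 if black_wins == (node.to_move == "B") else 0.0
    _, tree_move = later_moves[0]
    node.uct.setdefault(tree_move, Stats()).add(result)

    updated: list[str] = []
    for color, point in later_moves:
        # Assumption: only the first play of each point by the side to move counts.
        if color != node.to_move or point in updated:
            continue
        node.rave.setdefault(point, Stats()).add(result)
        updated.append(point)
    return updated


if __name__ == "__main__":
    node = Node(to_move="B")  # position after B(E5), W(D3)
    seq = [("B", "C5"), ("W", "F7"), ("B", "E7"), ("W", "E8"),
           ("B", "D8"), ("W", "F8"), ("B", "D7")]
    print(update_node(node, seq, black_wins=True))  # ['C5', 'E7', 'D8', 'D7']
```

The printed list matches the RAVE updates in the example above. C5 appears in both tables simply because it was played by the side to move after this node, which is exactly the condition for an AMAF update.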
Aja