[Computer-go] Exploration formulas for UCT
Hiroshi Yamashita
yss at bd.mbn.or.jp
Sat Jan 1 19:16:12 PST 2011
For Aya,
(1 - beta) * (win_rate + 0.31 * sqrt( ln(parent_visits) / child_visits)) +
beta * (rave_win_rate * 0.31 * sqrt( ln(rave_parent_visits) / rave_child_visits))
beta = sqrt(100 / (3 * child_visits + 100));
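As a quick check of how this beta behaves, here is a minimal Python sketch of the schedule above. The function name aya_beta is only for illustration and is not taken from Aya's source.

```python
import math

def aya_beta(child_visits):
    """RAVE weight from the formula above: sqrt(100 / (3 * child_visits + 100))."""
    return math.sqrt(100.0 / (3.0 * child_visits + 100.0))

# beta starts at 1 (RAVE only) and decays toward 0 as the child gets real visits.
for n in (0, 10, 100, 1000):
    print(n, round(aya_beta(n), 3))
```

With no visits the weight is entirely on the RAVE term, and at 100 child visits the two terms are weighted equally.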
Aya uses Progressive Widening. Only the top N moves are considered.
PW_sort_N = ln(parent_visits/ 40.0) / ln(1.4) +2;
Moves are re-sorted from time to time by RAVE value, criticality, and MC owner.
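The PW_sort_N formula can be tried out with the small Python sketch below. Two things are my assumptions and not stated in the post: the result is truncated to an integer, and it is clamped to a hypothetical minimum (min_moves) when parent_visits is small, where the raw formula drops below 2 or goes negative.

```python
import math

def pw_sort_n(parent_visits, min_moves=1):
    """Number of top-ranked moves to consider: ln(parent_visits / 40) / ln(1.4) + 2."""
    if parent_visits <= 0:
        return min_moves  # assumption: the post does not say what happens here
    n = math.log(parent_visits / 40.0) / math.log(1.4) + 2
    return max(min_moves, int(n))  # assumption: truncate, then clamp

def widened_moves(sorted_moves, parent_visits):
    """Keep only the first N moves of an already sorted candidate list."""
    return sorted_moves[:pw_sort_n(parent_visits)]

for visits in (40, 100, 1000, 10000):
    print(visits, pw_sort_n(visits))
```

Under these assumptions the window grows by one move each time parent_visits grows by a factor of 1.4.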
I would also like to know how others count RAVE.
Suppose UCT descends B(E5), W(D3), B(C5), W(F7), and from that position the playout
plays B(E7), W(E8), B(D8), W(F8), B(D7)... and Black wins.
At the position after W(D3), Aya updates UCT and RAVE as follows:
Updates C5(UCT)
Updates C5(RAVE)
Updates E7(RAVE)
Updates D8(RAVE)
Updates D7(RAVE)
I think "Updates C5(RAVE)" is strange, but I could not get good results without it.
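The update list above can be reproduced with a short Python sketch. This is only an illustration of the counting rule as described in this example (every later move by the side to move at the node, including the tree move itself, gets a RAVE update); the function name and data layout are hypothetical, and details such as repeated moves on the same point are ignored.

```python
def rave_updates(tree_moves, playout_moves, depth):
    """Return (uct_move, rave_moves) for the node reached after `depth` tree moves.

    Moves are (color, point) pairs. The UCT update goes to the move chosen at
    this node; RAVE updates go to every move of the same color from this node
    onward, in both the tree part and the playout.
    """
    sequence = tree_moves + playout_moves
    color_to_move, uct_move = tree_moves[depth]
    rave_moves = [point for color, point in sequence[depth:] if color == color_to_move]
    return uct_move, rave_moves

tree = [("B", "E5"), ("W", "D3"), ("B", "C5"), ("W", "F7")]
playout = [("B", "E7"), ("W", "E8"), ("B", "D8"), ("W", "F8"), ("B", "D7")]

# Node after B(E5), W(D3): Black to move, and UCT chose C5 here.
uct_move, rave_moves = rave_updates(tree, playout, depth=2)
print("UCT: ", uct_move)
print("RAVE:", rave_moves)
```

Starting the RAVE slice at depth + 1 instead of depth would drop the "Updates C5(RAVE)" entry, which is the variant that did not work well for Aya.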
Hiroshi Yamashita
----- Original Message -----
From: "David Fotland" <fotland at smart-games.com>
To: <computer-go at dvandva.org>
Sent: Sunday, January 02, 2011 5:18 AM
Subject: [Computer-go] Exploration formulas for UCT
>
> It would be interesting to see the actual formulas used for choosing the move to try in the tree part of the search.
>
> For Many Faces, it is:
>
> (1 - beta) * (win_rate + 0.45 * sqrt( ln(parent_visits) / child_visits)) +
> beta * rave_win_rate + mfgo_bias
>
> beta is the old Mogo formula of sqrt(500/(500 + 3 * parent_visits))
>
> A child with no visits has a win_rate of 1.1. Otherwise there is no win_rate bias.
>
> rave wins and visits are strongly biased when moves are generated using various rules and information from the mfgo
> move generator (in a range of 10% to 90% win rate, with hundreds to thousands of visits).
>
> mfgo_bias is unchanging, per move, within a range of about +-2%, based on mfgo’s move generator’s estimate of the
> quality of the move.
>
> Does anyone else want to share?
>
> David