[Computer-go] Exploration formulas for UCT

Hiroshi Yamashita yss at bd.mbn.or.jp
Sat Jan 1 19:16:12 PST 2011


For Aya,

(1 - beta) * (win_rate + 0.31 * sqrt(ln(parent_visits) / child_visits))
  + beta * (rave_win_rate + 0.31 * sqrt(ln(rave_parent_visits) / rave_child_visits))

beta = sqrt(100 / (3 * child_visits + 100));
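As a rough sketch, the selection value above could be computed like this in Python. It reads the RAVE term as beta * (rave_win_rate + exploration bonus), by symmetry with the UCT term; that reading is an assumption. The function names are made up for illustration and all visit counts are assumed to be at least 1, since the handling of unvisited children is not described here.

```python
import math

EXPLORATION = 0.31  # exploration constant from the formula above


def aya_beta(child_visits):
    """beta = sqrt(100 / (3 * child_visits + 100))."""
    return math.sqrt(100.0 / (3.0 * child_visits + 100.0))


def with_bonus(win_rate, parent_visits, child_visits, c=EXPLORATION):
    """Win rate plus exploration bonus; assumes both counts are >= 1."""
    return win_rate + c * math.sqrt(math.log(parent_visits) / child_visits)


def aya_value(win_rate, parent_visits, child_visits,
              rave_win_rate, rave_parent_visits, rave_child_visits):
    """Blend of the UCT term and the RAVE term, weighted by beta."""
    beta = aya_beta(child_visits)
    uct_term = with_bonus(win_rate, parent_visits, child_visits)
    # Assumption: the RAVE term adds its bonus, like the UCT term.
    rave_term = with_bonus(rave_win_rate, rave_parent_visits, rave_child_visits)
    return (1.0 - beta) * uct_term + beta * rave_term
```

With this beta, the RAVE term has weight 1.0 at 0 child visits and weight 0.5 at 100 child visits, and keeps shrinking as the child is visited more.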

Aya uses Progressive Widening. Only the top N moves in the sorted order are considered.

PW_sort_N = ln(parent_visits / 40.0) / ln(1.4) + 2;

Moves are re-sorted from time to time by RAVE value, criticality, and MC ownership.
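A small Python sketch of the widening rule, under a few assumptions: the value is truncated to an integer as a C int assignment would do, and it is clamped to a hypothetical minimum of 1 move when parent_visits is small (below 40 the raw formula drops under 2, and soon below zero). The clamp and the helper names are illustrative only.

```python
import math


def pw_sort_n(parent_visits, min_moves=1):
    """Number of top-ranked moves to consider at a node.

    min_moves is a hypothetical lower bound, not part of the formula.
    """
    if parent_visits <= 0:
        return min_moves
    n = int(math.log(parent_visits / 40.0) / math.log(1.4) + 2)
    return max(n, min_moves)


def candidate_moves(sorted_moves, parent_visits):
    """Keep only the first PW_sort_N moves of an already sorted list."""
    return sorted_moves[:pw_sort_n(parent_visits)]
```

With these assumptions, 40 parent visits give 2 candidate moves, and each further factor of 1.4 in parent visits adds one more.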


I would also like to know how others count RAVE.

Suppose UCT searches B(E5), W(D3), B(C5), W(F7), and from that position the playout plays
 B(E7), W(E8), B(D8), W(F8), B(D7)... and Black wins.

At the position after W(D3), Aya updates both UCT and RAVE:
Updates  C5(UCT)
Updates  C5(RAVE)
Updates  E7(RAVE)
Updates  D8(RAVE)
Updates  D7(RAVE)

I think "Updates C5(RAVE)" is strange, but I could not get good results without it.
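The bookkeeping in this example can be sketched in Python as follows. The function and its include_tree_move flag are hypothetical, made up to show the two variants (with and without the RAVE update for the tree move itself). Counting each point only once is an extra assumption; it makes no difference for this sequence.

```python
def updates_at_node(moves_after_node, side_to_move, include_tree_move=True):
    """Return (uct_move, rave_moves) for one node on the search path.

    moves_after_node: (color, point) pairs played after this node,
    tree moves first, then playout moves.
    """
    uct_move = moves_after_node[0][1]  # the move actually chosen at this node
    rave_moves = []
    for index, (color, point) in enumerate(moves_after_node):
        if index == 0 and not include_tree_move:
            continue  # variant that skips the RAVE update for the tree move
        # Assumption: only moves by the side to move count, each point once.
        if color == side_to_move and point not in rave_moves:
            rave_moves.append(point)
    return uct_move, rave_moves


# Moves played after W(D3): two tree moves, then the listed playout moves.
sequence = [("B", "C5"), ("W", "F7"), ("B", "E7"), ("W", "E8"),
            ("B", "D8"), ("W", "F8"), ("B", "D7")]
uct_move, rave_moves = updates_at_node(sequence, "B")
```

Here uct_move is C5 and rave_moves is C5, E7, D8, D7, matching the list above. Passing include_tree_move=False gives the variant without "Updates C5(RAVE)".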

Hiroshi Yamashita


----- Original Message ----- 
From: "David Fotland" <fotland at smart-games.com>
To: <computer-go at dvandva.org>
Sent: Sunday, January 02, 2011 5:18 AM
Subject: [Computer-go] Exploration formulas for UCT
>
> It would be interesting to see the actual formulas used for choosing the move to try in the tree part of the search.
>
> For Many Faces, it is:
>
> (1 - beta) * (win_rate + 0.45 * sqrt(ln(parent_visits) / child_visits)) +
>
> beta * rave_win_rate + mfgo_bias
>
> beta is the old Mogo formula of sqrt(500/(500 + 3 * parent_visits))
>
> A child with no visits has a win_rate of 1.1.  Otherwise there is no win_rate bias.
>
> rave wins and visits are strongly biased when moves are generated using various rules and information from the mfgo 
> move generator (in a range of 10% to 90% win rate, with hundreds to thousands of visits).
>
> mfgo_bias is unchanging, per move, within a range of about +-2%, based on mfgo’s move generator’s estimate of the 
> quality of the move.
>
> Does anyone else want to share?
>
> David



