[Computer-go] Exploration formulas for UCT

Sat Jan 1 12:18:46 PST 2011

It would be interesting to see the actual formulas used for choosing the move to try in the tree part of the search.

For Many Faces, it is:

(1 - beta) * (win_rate + 0.45 * sqrt(ln(parent_visits) / child_visits))
    + beta * rave_win_rate + mfgo_bias

beta is the old Mogo formula of sqrt(500/(500 + 3 * parent_visits))
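To give a feel for how quickly the RAVE weight decays, here is a small Python sketch of the beta schedule quoted above. The function name and the parameter names k and scale are only illustrative; the constants 500 and 3 are the ones from the formula.

```python
import math


def beta(parent_visits, k=500.0, scale=3.0):
    """RAVE weight: sqrt(k / (k + scale * parent_visits))."""
    return math.sqrt(k / (k + scale * parent_visits))


if __name__ == "__main__":
    # Print the weight at a few illustrative visit counts.
    for n in (0, 100, 500, 5000):
        print(n, round(beta(n), 3))
```

With these constants beta starts at 1 when the parent has no visits, is exactly 0.5 at 500 parent visits, and keeps shrinking after that, so the RAVE term dominates early and the UCT term takes over as the node is visited more.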

A child with no visits has a win_rate of 1.1.  Otherwise there is no win_rate bias.

RAVE wins and visits are strongly biased when moves are generated, using various rules and information from the mfgo move generator (the bias ranges from a 10% to a 90% win rate, with hundreds to thousands of visits).

mfgo_bias is fixed per move, within a range of about +/-2%, based on the mfgo move generator's estimate of the quality of the move.
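Putting the pieces above together, a minimal Python sketch of the selection value might look like the following. This is not the Many Faces source. The function and argument names are made up for illustration, and two details are assumptions the description does not spell out: an unvisited child uses 1.1 in place of the whole UCT term (the exploration term would otherwise divide by zero), and a child with no RAVE visits falls back to a 0.5 RAVE win rate.

```python
import math

FIRST_VISIT_WIN_RATE = 1.1  # win_rate given to a child with no visits
EXPLORATION = 0.45          # coefficient on the UCT exploration term


def beta(parent_visits):
    """RAVE weight: sqrt(500 / (500 + 3 * parent_visits))."""
    return math.sqrt(500.0 / (500.0 + 3.0 * parent_visits))


def child_value(parent_visits, child_visits, child_wins,
                rave_visits, rave_wins, mfgo_bias):
    """Value used to rank one child; the child with the highest value is tried."""
    b = beta(parent_visits)
    if child_visits == 0:
        # Assumption: 1.1 stands in for the whole UCT term here, because
        # the exploration term would divide by zero.
        uct = FIRST_VISIT_WIN_RATE
    else:
        win_rate = child_wins / child_visits
        uct = win_rate + EXPLORATION * math.sqrt(
            math.log(parent_visits) / child_visits)
    # Assumption: neutral 0.5 if there are no RAVE visits at all. With the
    # prior visits described above this branch should rarely be taken.
    rave_win_rate = rave_wins / rave_visits if rave_visits > 0 else 0.5
    return (1.0 - b) * uct + b * rave_win_rate + mfgo_bias


def select_child(parent_visits, children):
    """children: list of (child_visits, child_wins, rave_visits, rave_wins, mfgo_bias)."""
    return max(range(len(children)),
               key=lambda i: child_value(parent_visits, *children[i]))
```

For example, select_child(500, [(100, 60, 1000, 500, 0.02), (0, 0, 100, 50, 0.0)]) returns the index of the child with the larger combined value. The prior RAVE wins and visits would be passed in already folded into rave_wins and rave_visits.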

Does anyone else want to share?

David

From: computer-go-bounces at dvandva.org [mailto:computer-go-bounces at dvandva.org] On Behalf Of Fuming Wang
Sent: Saturday, January 01, 2011 9:00 AM
To: Aja; computer-go at dvandva.org
Subject: Re: [Computer-go] Fwd: News on Tromp-Cook ?

Hi Aja,

On Sun, Jan 2, 2011 at 12:16 AM, Aja <ajahuang at gmail.com> wrote:

Hi Fuming,

Most of the current strong programs are using UCT combined with RAVE (a kind of AMAF). The formula is like this (there are many variants),

C*RAVE+(1-C)*UCT

This has been my understanding. However, I am surprised to find out that people have been setting C close to one, according to Petr and Oliver's postings, which is essentially AMAF. MF apparently is doing something different.

Fuming
