[Computer-go] progressive strategies

Sat Jan 15 06:05:13 PST 2011

I recently tried to get ELO features, as describe in the first paper
you link, to work in my playouts to no avail (see: "Oakfoam and ELO
Features"). This has not changed since, however, I then decided to try
"progressive widening" as described in Coulom's paper. It uses the ELO
feature strengths to order the moves and then unprunes moves according
to the number of playouts.

I managed to get a large strength increase by doing progressive
widening/unpruning. In the order of 300 ELO on 9x9 against GnuGo. I
then retrained the features for 19x19, which gave me a hit on 9x9, but
should do much better on 19x19.

Some of the parameters seem to be quite delicate and a small change
can give quiet a change in strength. It seems that my program usually
finds a good line of thinking, but sometimes the best move is pruned
and doesn't not get looked at sufficiently. It's difficult to say if
this will hamper my program in the future or not, but for now it works
well.
--
Francois van Niekerk
Email: flash.slash at gmail.com | Twitter: @francoisvn
Cell: +2784 0350 214 | Website: http://leafcloud.com

On Thu, Jan 13, 2011 at 8:59 PM, Olivier Teytaud <olivier.teytaud at lri.fr> wrote:
> Dear all,
>
>    there are papers using a bandit formula only for the n^alpha "first"
> moves, for a given order of moves (Coulom, Chaslot et al, Couetoux et al):
> Coulom: http://hal.inria.fr/inria-00149859_v1/
> Chaslot et al:
> http://www.personeel.unimaas.nl/g-chaslot/papers/progressivemcts.pdf
> Couetoux et al: http://hal.archives-ouvertes.fr/hal-00542673/en/
>
> This is known efficient in some cases whenever we use a random ordering of
> unvisited moves (no RAVE values, no heuristic value), in particular with
> infinitely many legal moves (roughly speaking, it's better to study 100
> moves with 100 simulations each than to sample 10000 moves with only 1
> simulation each).
> (the paper by Wang, Audibert & Munos is good around that
> http://certis.enpc.fr/~audibert/Mes%20articles/NIPS08.pdf).
>
> In Go we don't use this, but for some other problems we do, in particular
> with no ranking of unvisited moves. Our constant
> "alpha" is usually 1/2, in this context.
>
> Anyone wanting to share his experience around that (with or without
> heuristic ranking of legal moves) ?
>
> Best regards,
> Olivier
>
>
>
>
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>