[Computer-go] AMAF/RAVE + heavy playouts - is it safe?

Tue Nov 3 12:39:20 PST 2015

This helps very much, thank you for taking the time to answer!

You might be looking for "Combining Online and Offline Knowledge in
UCT" [1] by Gelly and Silver. Silver and Tesauro reference it in
"Monte-Carlo Simulation Balancing" [2] with "Unfortunately, a stronger
simulation policy can actually lead to a weaker Monte-Carlo search
(Gelly & Silver, 2007), a paradox that we explore further in this paper."
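
To make the precision-versus-accuracy point from the quoted message below concrete, here is a toy sketch in Python. It is not taken from either paper. Everything in it is an assumption for illustration: the two moves, their "true" values, the Gaussian noise model, and the size of the bias. It only shows that an unbiased but noisy estimate converges on the better move, while a precise but biased one converges on the wrong move.

```python
import random
import statistics

# Hypothetical true values of two candidate moves; A is the better move.
TRUE_VALUE = {"A": 0.55, "B": 0.50}

def playout_estimate(move, bias, sigma, n, rng):
    """Mean of n simulated playout results for `move`.

    Each result is modelled (as an assumption) as the true value plus a
    move-specific bias plus Gaussian noise with standard deviation sigma.
    """
    mean = TRUE_VALUE[move] + bias.get(move, 0.0)
    return statistics.fmean(rng.gauss(mean, sigma) for _ in range(n))

def pick_rate(bias, sigma, n, trials=200, seed=0):
    """Fraction of trials in which move A gets the higher estimate."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        a = playout_estimate("A", bias, sigma, n, rng)
        b = playout_estimate("B", bias, sigma, n, rng)
        correct += a > b
    return correct / trials

# "Light" policy: unbiased but noisy (high standard deviation).
light = dict(bias={}, sigma=0.5)
# "Heavy" policy: precise (low standard deviation) but it undervalues A.
heavy = dict(bias={"A": -0.10}, sigma=0.1)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, pick_rate(n=n, **light), pick_rate(n=n, **heavy))
```

With more playouts the light policy picks A more and more often, while the heavy policy becomes more and more certain about B. A real playout policy is of course not a Gaussian, so this says nothing about which concrete heuristics are harmful.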

I'll make it a priority to read both papers in detail, thank you! If you
meant another paper, or someone else knows one, I'm happy to see more
references.

Thanks!
Tobi

[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
[2] http://www.machinelearning.org/archive/icml2009/papers/500.pdf

On 03.11.2015 21:03, robertfinkng555 at o2.co.uk wrote:
> You have to be careful what heuristics you apply. This was a
> surprising result: using a playout policy which in itself is a
> stronger go player can actually make MCTS/AMAF weaker. The reason is
> that MCTS depends entirely on accurate estimations of the value of
> each position in the tree. Any playout policy which introduces a bias
> therefore weakens MCTS. It may increase precision (lower standard
> deviation) but gives a less accurate assessment of the value (an
> incorrect mean). Most playouts at the moment (at least published ones)
> are based on Remi's Mogo playout policy, which increases precision
> without sacrificing accuracy.
>
> There's a really nice diagram in one of David Silver's papers
> illustrating the effect that bias can have on playouts. As soon as you
> see it you understand the problem. Unfortunately I don't have it to
> hand and have run out of time looking for it, otherwise I'd reference
> it. Hopefully somebody else can give the reference. I suspect David
> probably co-authored the paper, in which case apologies to the other
> author for not crediting them here!
>
> I hope this helps
>
> Regards
>
> Raffles
>
> On 03-Nov-15 19:38, Tobias Pfeiffer wrote:
>> Hi everyone,
>>
>> I haven't yet caught up on most recent go papers. If what I ask is
>> answered in one of these, please point there.
>>
>> It seems everyone is using quite heavy playouts these days (nxn
>> patterns, atari escapes, opening libraries, lots of stuff that I don't
>> know yet, ...) - my question is how does that mix with AMAF/RAVE? I
>> remember from the early papers that they said it'd be dangerous to do
>> it with non-random playouts and that they shouldn't have too much logic.
>>
>> Which, well, makes sense (to me) because the argument is that we play
>> random moves so they are order independent. With patterns that doesn't
>> hold true anymore.
>>
>> What's the experience out there? Does it just still work? Does it not
>> matter because you just "warm up" the tree? Or do you need to be careful
>> with what heuristics you apply so as not to break RAVE/AMAF?
>>
>> Thank you!
>> Tobi
>>
>>
>>
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go at computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>

-- 
www.pragtob.info
