[Computer-go] Computer-go Digest, Vol 12, Issue 79

Wed Jan 26 05:44:11 PST 2011

Hi Aja,

that's a good question. At least for the LGR policy without forgetting
(https://webdisk.lclark.edu/drake/publications/drake-icga-2009.pdf),
keeping only the first reply to a given move within a playout, instead
of letting later replies overwrite it, made no significant difference
in performance. One possible explanation: when the same move by the
same player appears twice in a playout, the first stone must have been
captured in between, so the reply to the second play is the one that
actually influences the final position and result. I'm not sure
whether I repeated this experiment with LGRF, but I did try discarding
the tails of playouts (on the rationale that they might contain too
much noise) and ignoring stones that would later be captured (on the
rationale that those moves might be bad on average). Both variants
were significantly weaker than plain LGRF.
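
A minimal Python sketch of an LGRF-1 table update, to make the overwrite question concrete. It assumes a playout is a list of (player, point) pairs with points as hypothetical integer board indices; `update_lgrf1`, `suggest_reply` and the `keep_first` flag are illustrative names, not Orego's actual code. Walking the playout forward means a later reply to the same previous move overwrites an earlier one; `keep_first=True` is the "first appearance only" variant.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a move is (player, point), with player "B" or "W"
# and point an integer board index (a pass could be a reserved index).
Move = Tuple[str, int]
# (replying player, previous move's point) -> stored reply point
ReplyTable = Dict[Tuple[str, int], int]


def update_lgrf1(table: ReplyTable, playout: List[Move], winner: str,
                 keep_first: bool = False) -> None:
    """LGRF-1-style update from one finished playout.

    The winner's replies are stored; a loser's reply is deleted if it is the
    reply currently stored for that key. Because the playout is walked
    forward, a later reply to the same previous point overwrites an earlier
    one. With keep_first=True (hypothetical variant), only the first
    appearance of each key in this playout is used, for storing and forgetting.
    """
    seen = set()  # keys already handled in this playout (used only with keep_first)
    for (_, prev_point), (player, point) in zip(playout, playout[1:]):
        key = (player, prev_point)
        if keep_first:
            if key in seen:
                continue
            seen.add(key)
        if player == winner:
            table[key] = point          # remember (or overwrite) the good reply
        elif table.get(key) == point:
            del table[key]              # forget a stored reply that just lost


def suggest_reply(table: ReplyTable, player: str, prev_point: int,
                  is_legal: Callable[[int], bool]) -> Optional[int]:
    """Return the stored reply if it is legal now; None means use the default policy."""
    reply = table.get((player, prev_point))
    if reply is not None and is_legal(reply):
        return reply
    return None
```

For a playout B10, W20, B30, W40, B10, W50 won by White, the default update leaves White's stored reply to point 10 as 50, while `keep_first=True` leaves it as 20.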
It's only a few lines of code, so test it and see whether it makes a
difference for your playout policy and program architecture. Playout
policies stronger than Orego's will interact differently with LGRF. You
could even try storing several sets of replies per intersection, one
each for the first, second, and third appearance of the previous move
in a playout, in the hope of capturing certain tactical situations
involving sacrifices. But I don't expect much.
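
The per-appearance idea could be sketched in Python as below, assuming the same hypothetical (player, point) move representation and an LGRF-1-style store/forget rule. The reply table gains an occurrence index (how often the previous move has appeared so far in the playout, capped at `max_occ`). All names are illustrative, not from any existing program.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Move = Tuple[str, int]                      # (player, point); point is a hypothetical board index
OccTable = Dict[Tuple[str, int, int], int]  # (replier, previous point, occurrence) -> reply point


def occurrence_indices(playout: List[Move], max_occ: int = 3) -> List[int]:
    """For each move, how many times that same (player, point) has appeared so far, capped."""
    counts: Counter = Counter()
    occ = []
    for move in playout:
        counts[move] += 1
        occ.append(min(counts[move], max_occ))
    return occ


def update_by_occurrence(table: OccTable, playout: List[Move], winner: str,
                         max_occ: int = 3) -> None:
    """LGRF-1-style update with a separate reply slot per appearance of the previous move."""
    occ = occurrence_indices(playout, max_occ)
    for i in range(1, len(playout)):
        _, prev_point = playout[i - 1]
        player, point = playout[i]
        key = (player, prev_point, occ[i - 1])
        if player == winner:
            table[key] = point       # store the winner's reply for this appearance
        elif table.get(key) == point:
            del table[key]           # forget a stored reply the loser just played


def lookup(table: OccTable, player: str, prev_move: Move,
           counts_so_far: Counter, max_occ: int = 3) -> Optional[int]:
    """Stored reply for prev_move, given move counts so far (including prev_move), or None."""
    occ = min(counts_so_far[prev_move], max_occ)
    return table.get((player, prev_move[1], occ))
```

During a playout, the policy would keep a `Counter` of moves played so far and pass it to `lookup`, so a second B10 consults a different slot than the first one.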

Hendrik

On 26.01.2011 14:13, Aja wrote:
> Hi Hendrik,
>
> Thanks.
>
> Congratulations, you have done really nice work. I checked your
> thesis. My result for LBR-2 is consistent with yours: no benefit at
> all, so I took it out. I adapted LGR-1 to Erica's softmax policy.
> Basically, I am tuning the probability offset by checking some
> artificial test positions. With 3000 playouts, it now scores around
> 57% after 500 games, close to my target of 60% (my intuition is that
> LGR-1 should already help a lot). :)
>
> Actually, I have one question, and I still can't figure out your
> reasoning. Within a playout, why do you overwrite earlier replies
> with later ones? Using the earliest one seems more reasonable to me.
>
> Aja
>
> ----- Original Message ----- From: "Hendrik Baier" <>
> To: <computer-go at dvandva.org>
> Sent: Wednesday, January 26, 2011 5:00 PM
> Subject: Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79
>
>
>> Hi Aja,
>>
>> I would be interested in your results. I think the LGRF policy is
>> only a small first step toward more adaptive playouts (and,
>> hopefully, toward overcoming the horizon effect).
>> As for the Last-Bad-Reply idea, you can read about my experiences
>> with it and related policies in my Master's thesis, if you're
>> interested. It also contains the idea that led to the "Power of
>> Forgetting" paper.
>> http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf 
>>
>>
>> regards,
>> Hendrik
>>
>>> I admit that it's difficult for me to include such a deterministic
>>> default policy. :-)
>>> With a softmax policy, using "last-LOST-reply" information may be a
>>> good direction.
>>>
>>> Aja
>>
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go at dvandva.org
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go 
>
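
Aja's adaptation of LGR-1 to a softmax playout policy, and his suggestion of using last-lost-reply information, might look roughly like the Python sketch below. It is a hypothetical interpretation: the "probability offset" is modelled as an additive bonus on the stored reply's softmax score, and a last-lost reply gets a matching penalty. Erica's real code is not shown in the thread, and the function names are illustrative.

```python
import math
import random
from typing import Dict, Optional


def softmax_with_reply_offset(scores: Dict[int, float], good_reply: Optional[int],
                              bonus: float, lost_reply: Optional[int] = None,
                              penalty: float = 0.0) -> Dict[int, float]:
    """Turn per-move scores (log-weights) into probabilities after adjusting reply moves.

    scores must be non-empty. good_reply gets +bonus, lost_reply gets -penalty;
    either is ignored if it is None or not among the candidates.
    """
    adjusted = dict(scores)
    if good_reply is not None and good_reply in adjusted:
        adjusted[good_reply] += bonus
    if lost_reply is not None and lost_reply in adjusted:
        adjusted[lost_reply] -= penalty
    top = max(adjusted.values())  # subtract the max for numerical stability
    weights = {mv: math.exp(s - top) for mv, s in adjusted.items()}
    total = sum(weights.values())
    return {mv: w / total for mv, w in weights.items()}


def sample_move(probs: Dict[int, float], rng: random.Random) -> int:
    """Draw one move according to probs."""
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]
```

With two equally scored candidates, a bonus of ln 3 on the stored reply gives it probability 0.75; a bonus of 0 leaves the plain softmax distribution unchanged, so tuning `bonus` plays the role of the offset being tuned on test positions.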