[Computer-go] Computer-go Digest, Vol 12, Issue 79

Wed Jan 26 05:13:53 PST 2011

Hi Hendrik,

Thanks.

Congratulations, you have done really nice work. I checked your thesis. My 
results with LBR-2 are consistent with yours: no benefit at all, so I took it 
out. I adapted LGR-1 to Erica's softmax policy. Basically, I am tuning the 
probability offset by checking some artificial test positions. With 3000 
playouts, it now scores around 57% after 500 games, close to the 60% I am 
aiming for (my intuition is that LGR-1 alone should already help a lot). :)
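A minimal sketch of how a last-good-reply can be folded into a softmax playout policy through a tunable offset. This is not Erica's code: it assumes each candidate move has a log-scale score, that the caller has already looked up the stored reply to the opponent's last move, and that `offset` is simply added to that reply's score. All names here are hypothetical.

```python
import math
import random


def lgr1_probabilities(candidates, scores, stored_reply, offset):
    """Softmax over per-move scores, boosting the stored LGR-1 reply.

    candidates:   moves considered by the playout policy
    scores:       log-scale policy score per candidate (assumption)
    stored_reply: last good reply to the opponent's move, or None
    offset:       hypothetical tuning parameter added to the reply's score
    """
    adjusted = [s + offset if mv == stored_reply else s
                for mv, s in zip(candidates, scores)]
    top = max(adjusted)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in adjusted]
    total = sum(exps)
    return {mv: e / total for mv, e in zip(candidates, exps)}


def sample_move(probs, rng):
    """Draw one move from a {move: probability} dict."""
    r = rng.random()
    acc = 0.0
    for mv, p in probs.items():
        acc += p
        if r < acc:
            return mv
    return list(probs)[-1]  # guard against floating-point round-off


# Usage: the stored reply is C3; an offset of log(3) triples its weight.
probs = lgr1_probabilities(["D4", "Q16", "C3"], [0.0, 0.0, 0.0],
                           "C3", math.log(3))
move = sample_move(probs, random.Random(0))
```

Because the offset is added in log space, it multiplies the reply's weight by exp(offset): the reply is favoured but never forced, which is the main difference from a deterministic LGR default policy.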

Actually, I have one question; I still can't figure out your reasoning. In a 
playout, why do you overwrite earlier replies with later ones? Keeping the 
earliest one seems more reasonable to me.
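To make the question concrete, here is a sketch of one way a reply table could be updated after a playout, with a switch between overwriting and keeping the earliest reply. It is not Hendrik's implementation. It assumes moves strictly alternate starting with Black (no special pass handling) and keys replies by (color, opponent move). The optional forgetting step, which deletes the loser's stored replies, follows the LGRF-1 idea. Function and parameter names are hypothetical.

```python
def update_replies(moves, winner_is_black, reply_table,
                   keep_earliest=False, forget=True):
    """Update an LGR-1 reply table after one playout.

    moves:         playout moves in order; moves[0] is Black's (assumption)
    reply_table:   maps (color, opponent_move) -> reply, updated in place
    keep_earliest: if False, a later reply to the same move in this
                   playout overwrites an earlier one; if True, the first
                   reply seen in this playout is kept instead
    forget:        if True, a loser's reply is deleted when it matches the
                   stored one (LGRF-1 style forgetting)
    """
    updated = set()  # keys already written during this playout
    for i in range(1, len(moves)):
        color = "B" if i % 2 == 0 else "W"
        key = (color, moves[i - 1])
        if (color == "B") == winner_is_black:
            if keep_earliest and key in updated:
                continue
            reply_table[key] = moves[i]
            updated.add(key)
        elif forget and reply_table.get(key) == moves[i]:
            del reply_table[key]


# Usage: White wins, and Black plays A1 twice in the same playout.
moves = ["A1", "B1", "A2", "B2", "A1", "B3"]
latest, earliest = {}, {}
update_replies(moves, False, latest)
update_replies(moves, False, earliest, keep_earliest=True)
```

Here `latest[("W", "A1")]` ends up as B3 while `earliest[("W", "A1")]` stays B1. Which one is better is exactly the open question: the later reply reflects the position closer to the end of the game, the earlier one the position closer to the tree.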

Aja

----- Original Message ----- 
From: "Hendrik Baier" <>
To: <computer-go at dvandva.org>
Sent: Wednesday, January 26, 2011 5:00 PM
Subject: Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79

> Hi Aja,
>
> I would be interested in your results. I think the LGRF policy is only a 
> small first step in the direction of more adaptive playouts (and, 
> hopefully, of overcoming the horizon effect).
> As for the Last-Bad-Reply idea, you can read about my experiences with 
> this and related policies in my Master's thesis, if you're interested. It 
> contains the idea that resulted in the "Power of Forgetting" paper as 
> well.
> http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
>
> regards,
> Hendrik
>
>> I admit that it's difficult for me to include such a deterministic default 
>> policy. :-)
>> With a softmax policy, using "last-LOST-reply" information may be 
>> a good direction.
>>
>> Aja
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go