[Computer-go] Computer-go Digest, Vol 12, Issue 79
Aja
ajahuang at gmail.com
Wed Jan 26 05:13:53 PST 2011
Hi Hendrik,
Thanks.
Congratulations, you have done really nice work. I checked your thesis. My
result for LBR-2 is consistent with yours: no benefit at all, so I took it
out. I have adapted LGR-1 to Erica's softmax policy. Basically, I am tuning
the probability offset by checking some artificial test positions. With 3000
playouts, it now scores around 57% after 500 games, close to 60%, which is
my target (my intuition is that LGR-1 should already help a lot). :)
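To make the idea concrete, here is a minimal sketch of how a stored LGR-1
reply could be folded into a softmax playout policy as an additive offset.
This is not Erica's actual code: the function name, the dict of move scores,
and the default offset value are all assumptions for illustration.

```python
import math
import random

def softmax_pick(scores, reply=None, offset=2.0, rng=random):
    """Sample a move from softmax(scores), boosting the stored reply.

    scores: dict mapping legal move -> raw policy score (hypothetical).
    reply:  stored last-good-reply to the opponent's previous move, or None.
    offset: additive bonus for the reply; a tunable assumption.
    """
    adjusted = dict(scores)
    if reply in adjusted:            # ignore the reply if it is not legal now
        adjusted[reply] += offset
    top = max(adjusted.values())     # subtract the max for numerical stability
    weights = {m: math.exp(s - top) for m, s in adjusted.items()}
    total = sum(weights.values())
    probs = {m: w / total for m, w in weights.items()}
    moves = list(probs)
    move = rng.choices(moves, weights=[probs[m] for m in moves])[0]
    return move, probs
```

Unlike the deterministic LGR-1 policy, the reply is only made more likely
here, so the offset decides how strongly the playout trusts it.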
Actually, I have one question, as I still can't figure out your reasoning.
In a playout, why do you overwrite the earlier replies with the later ones?
Using the earliest one looks more reasonable to me.
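To be clear about the two alternatives I mean, here is a rough sketch. It is
only my reading of the update, with hypothetical names and a hypothetical
table layout; it stores the winner's replies from a single playout.

```python
def update_replies(table, moves, winner, keep="latest"):
    """Store the winner's replies from one playout in an LGR-1 style table.

    table:  dict mapping (player, previous move) -> reply (assumed layout).
    moves:  list of (player, move) pairs in playout order.
    winner: the player who won the playout.
    keep:   "latest" lets later replies overwrite earlier ones;
            "earliest" keeps the first reply seen in this playout.
    """
    seen = set()
    for (_, prev), (player, move) in zip(moves, moves[1:]):
        if player != winner:
            continue
        key = (player, prev)
        if keep == "earliest" and key in seen:
            continue                 # an earlier reply in this playout wins
        table[key] = move
        seen.add(key)
    return table
```

If the same previous move occurs twice in a playout (for example after a
capture), "latest" ends up storing the second reply and "earliest" the first.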
Aja
----- Original Message -----
From: "Hendrik Baier" <>
To: <computer-go at dvandva.org>
Sent: Wednesday, January 26, 2011 5:00 PM
Subject: Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79
> Hi Aja,
>
> I would be interested in your results. I think the LGRF policy is only a
> small first step in the direction of more adaptive playouts (and
> hopefully of overcoming the horizon effect).
> As for the Last-Bad-Reply idea, you can read about my experiences with
> this and related policies in my Master's thesis, if you're interested. It
> contains the idea that resulted in the "Power of Forgetting" paper as
> well.
> http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
>
> regards,
> Hendrik
>
>> I admit that it's difficult for me to include such a deterministic
>> default policy. :-)
>> With softmax policy, using the information of "last-LOST-reply" is maybe
>> a good direction.
>>
>> Aja