[Computer-go] The heuristic "last good reply"
Aja
ajahuang at gmail.com
Tue Jan 25 10:19:21 PST 2011
Dear all,
Today I tried Professor Drake's "last good reply" heuristic in Erica. So far, I have gained at most 20-30 Elo from it.
I tested by self-play, with 3000 playouts/move on 19x19. The number of playouts may be too low, but I will test with more playouts only IF the playing strength is not weaker with 3000 playouts.
At first I tried the original scheme: play the "last good reply" deterministically. It did not work at all. Then I tried increasing the probability of the "last good reply" instead (since I use probabilistic simulation in Erica), and the winning rate became almost 50% after 250 games.
Finally I added "forgetting", and the winning rate increased to around 55% after 500 games. I also tried decreasing the probability of the "last-LOST-reply", but the winning rate was still 50% after 200 games.
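To make the scheme above concrete, here is a minimal sketch of a reply table with forgetting, plus a probabilistic move choice that raises the weight of the stored reply rather than playing it outright. This is not Erica's code. The names ReplyTable and choose_move, the one-move context, and the boost factor of 10 are all assumptions made for illustration.

```python
import random


class ReplyTable:
    """Sketch of "last good reply" with forgetting (one-move context)."""

    def __init__(self):
        # (color, previous opponent move) -> reply that last won
        self.reply = {}

    def update(self, moves, winner):
        """moves is the playout as a list of (color, move) pairs."""
        for (_, prev_move), (color, move) in zip(moves, moves[1:]):
            key = (color, prev_move)
            if color == winner:
                # Remember the reply played by the winner.
                self.reply[key] = move
            elif self.reply.get(key) == move:
                # Forgetting: drop a stored reply that has just lost.
                del self.reply[key]


def choose_move(weights, stored_reply, boost=10.0, rng=random):
    """Sample a move, multiplying the stored reply's weight by boost.

    weights maps each legal move to its default-policy weight.
    boost=10.0 is an arbitrary illustrative value, not a tuned one.
    """
    adjusted = dict(weights)
    if stored_reply in adjusted:
        adjusted[stored_reply] *= boost
    candidates = list(adjusted)
    return rng.choices(candidates, weights=[adjusted[m] for m in candidates], k=1)[0]
```

With boost set very high this approaches the deterministic scheme. With boost equal to 1 the heuristic is switched off.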
From these preliminary experiments with 3000 playouts, I have some observations:
1. In Erica, it is better to apply this heuristic probabilistically than deterministically.
2. In Prof. Drake's implementation, there is a weakness in learning. I think the main problem is that for a reply which is played deterministically by the default policy, there is no room to learn a new reply. For example, if "save by capture" leads to a lost game, then in the next simulation the default policy will still play "save by capture". If I am wrong on this point, I would be glad to be corrected by anyone.
3. This heuristic has the potential to perform better in Erica. I hope this brief result will encourage other authors to try it.
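Observation 2 can be illustrated with a toy sketch. It reflects only my reading of the point, under strong assumptions: the stored reply is consulted first, the default policy has a single deterministic rule for the situation, and every playout in which that rule fires is lost. The function names and the "atari"/"capture" labels are hypothetical.

```python
def reply_for(prev_move, table, default_rule):
    """Use the stored reply if there is one, else the deterministic default rule."""
    stored = table.get(prev_move)
    return stored if stored is not None else default_rule(prev_move)


def update(table, prev_move, reply, won):
    """Store a winning reply; forget a stored reply that has just lost."""
    if won:
        table[prev_move] = reply
    elif table.get(prev_move) == reply:
        del table[prev_move]


def simulate(n):
    """Run n toy playouts in which the default reply always loses."""
    table = {}

    def default_rule(prev_move):
        # Hypothetical deterministic rule: always "save by capture".
        return "capture"

    tried = []
    for _ in range(n):
        reply = reply_for("atari", table, default_rule)
        tried.append(reply)
        update(table, "atari", reply, won=False)  # assumed: the playout is lost
    return tried
```

Under these assumptions simulate(n) only ever tries "capture": nothing is stored after a loss, so the next playout falls back to the same deterministic rule and no alternative reply is ever explored.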
Aja