[Computer-go] The heuristic "last good reply"

Tue Jan 25 10:19:21 PST 2011

Dear all,

Today I tried Professor Drake's "last good reply" heuristic in Erica. So far, I have gained at most 20-30 Elo from it.

I tested it in self-play with 3000 playouts per move on 19x19. That number of playouts may be too small, but I would only like to test with more playouts IF the heuristic does not make Erica weaker at 3000 playouts.
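The results that follow are given as self-play winning rates, so it may help to convert a winning rate into an Elo difference and to see how wide the error bars are at these game counts. The sketch below assumes the standard logistic Elo model, D = 400 log10(p / (1 - p)), and a plain binomial standard error. Neither formula comes from the original post, and the function names are illustrative.

```python
import math


def elo_from_winrate(p: float) -> float:
    """Elo difference implied by a winning rate p under the logistic Elo model."""
    if not 0.0 < p < 1.0:
        raise ValueError("winning rate must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def winrate_std_error(p: float, games: int) -> float:
    """Binomial standard error of an observed winning rate over `games` games."""
    return math.sqrt(p * (1.0 - p) / games)


if __name__ == "__main__":
    for p, n in [(0.50, 250), (0.55, 500)]:
        se = winrate_std_error(p, n)
        print(f"{p:.0%} over {n} games: {elo_from_winrate(p):+.0f} Elo, "
              f"+/- {2 * se:.1%} (about two standard errors)")
```

At a few hundred games, two standard errors still amount to several percentage points. Small differences between the variants below should therefore be read as tentative.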

At first I tried the original scheme, playing the "last good reply" deterministically, but it did not work at all. Then, since Erica uses probabilistic simulations, I instead increased the probability of the "last good reply". With that change, the winning rate reached almost 50% after 250 games.
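A minimal sketch of the two variants, under stated assumptions. Each player has a reply table indexed by the opponent's previous move, as in Drake's last-good-reply policy, and the table is filled from the moves of the playout's winner. The original scheme plays the stored reply outright whenever it is legal. The probabilistic variant here instead multiplies that reply's weight in the default policy by a factor `boost`. The names `LastGoodReply`, `sample_move` and `boost`, the integer move encoding, and the weight-multiplying rule are all hypothetical; the post does not say how Erica raises the probability.

```python
import random
from typing import Dict, List, Optional, Tuple

BLACK, WHITE = 0, 1
Move = int  # board point index; this encoding is an assumption of the sketch


class LastGoodReply:
    """Hypothetical reply table in the style of last good reply."""

    def __init__(self) -> None:
        # reply[player][opponent's previous move] -> player's reply
        self.reply: List[Dict[Move, Move]] = [{}, {}]

    def record(self, playout: List[Tuple[int, Move]], winner: int) -> None:
        """Store every reply the winner made in a finished playout."""
        for (_, prev_move), (player, move) in zip(playout, playout[1:]):
            if player == winner:
                self.reply[player][prev_move] = move

    def lookup(self, player: int, last_move: Move) -> Optional[Move]:
        return self.reply[player].get(last_move)


def sample_move(weights: Dict[Move, float], lgr: LastGoodReply, player: int,
                last_move: Move, boost: float, rng: random.Random) -> Move:
    """Sample from the default policy after multiplying the stored reply's weight by `boost`."""
    adjusted = dict(weights)
    reply = lgr.lookup(player, last_move)
    # `weights` is assumed to contain only legal candidate moves.
    if reply is not None and reply in adjusted:
        adjusted[reply] *= boost
    moves = list(adjusted)
    return rng.choices(moves, weights=[adjusted[m] for m in moves], k=1)[0]
```

Letting `boost` grow without bound approaches the deterministic original scheme, while `boost = 1.0` ignores the table entirely. The post reports better results from raising the probability than from deterministic play, but it does not give the factor used.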

Finally, I added "forgetting", and the winning rate increased to around 55% after 500 games. I also tried decreasing the probability of the "last-LOST-reply"; the winning rate was still 50% after 200 games.
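In the usual formulation of "forgetting" (Baier and Drake's LGRF variant), a stored reply is deleted when the player who owns it loses a playout in which that reply was actually played. The sketch below adds that rule, together with a separate table of last-LOST-replies whose weights are multiplied by a `penalty` factor. The class name, the lost-reply table layout, and the `boost` and `penalty` factors are hypothetical; the post does not describe Erica's implementation.

```python
from typing import Dict, List, Tuple

Move = int                          # board point index (an assumed encoding)
Playout = List[Tuple[int, Move]]    # (player, move) pairs in playing order


class LastGoodReplyWithForgetting:
    """Hypothetical good-reply and lost-reply tables with forgetting."""

    def __init__(self) -> None:
        # good[player][opponent's previous move] -> reply seen in a won playout
        self.good: List[Dict[Move, Move]] = [{}, {}]
        # lost[player][opponent's previous move] -> reply seen in a lost playout
        self.lost: List[Dict[Move, Move]] = [{}, {}]

    def update(self, playout: Playout, winner: int) -> None:
        """Update both tables from one finished playout."""
        for (_, prev_move), (player, move) in zip(playout, playout[1:]):
            if player == winner:
                self.good[player][prev_move] = move
                # A reply that just won is no longer treated as a losing reply.
                if self.lost[player].get(prev_move) == move:
                    del self.lost[player][prev_move]
            else:
                # Forgetting: drop the stored good reply if it was played and lost.
                if self.good[player].get(prev_move) == move:
                    del self.good[player][prev_move]
                self.lost[player][prev_move] = move

    def adjust(self, weights: Dict[Move, float], player: int, last_move: Move,
               boost: float, penalty: float) -> Dict[Move, float]:
        """Return default-policy weights with the good reply boosted and the lost reply penalized."""
        adjusted = dict(weights)
        good = self.good[player].get(last_move)
        if good is not None and good in adjusted:
            adjusted[good] *= boost
        bad = self.lost[player].get(last_move)
        if bad is not None and bad in adjusted:
            adjusted[bad] *= penalty
        return adjusted
```

Setting `penalty` below 1.0 corresponds to the "last-LOST-reply" experiment. In the tests reported here, that experiment stayed at 50% after 200 games.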

From these preliminary experiments with 3000 playouts, I have some observations:

1. In Erica, it is better to apply this heuristic probabilistically rather than deterministically.

2. In Prof. Drake's implementation, there is a weakness in learning. I think the main problem is that when a reply is played deterministically by the default policy, there is no room to learn a new reply. For example, if "save by capture" produces a lost game, the default policy will still play "save by capture" in the next simulation. If I am wrong on this point, I would be glad to be corrected.

3. This heuristic has the potential to perform better in Erica. I hope this brief result will encourage other authors to try it.
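The concern in observation 2 can be illustrated with a toy model. This is only an illustration of the argument and not a claim about Drake's actual code. The model has a single situation in which exactly one reply wins, a reply table that is consulted first, and a default policy that decides whenever the table has no entry. If the default policy always returns the same move (standing in for "save by capture"), a losing reply stores nothing and no other reply is ever tried. If the default policy samples, the winning reply is eventually found and stored. The function names and the deterministic win/loss rule are hypothetical simplifications; real playouts are noisy.

```python
import random
from typing import Callable, Dict, Set

Move = int
DefaultPolicy = Callable[[Move, random.Random], Move]


def next_move(last_move: Move, table: Dict[Move, Move],
              default_policy: DefaultPolicy, rng: random.Random) -> Move:
    """Stored reply first; otherwise fall back to the default policy."""
    reply = table.get(last_move)
    if reply is not None:
        return reply
    return default_policy(last_move, rng)


def replies_tried(default_policy: DefaultPolicy, winning_reply: Move,
                  rounds: int, seed: int = 0) -> Set[Move]:
    """Toy learning loop for one situation; returns every reply that was played."""
    rng = random.Random(seed)
    table: Dict[Move, Move] = {}
    tried: Set[Move] = set()
    last_move = 0
    for _ in range(rounds):
        move = next_move(last_move, table, default_policy, rng)
        tried.add(move)
        if move == winning_reply:
            table[last_move] = move  # won: remember the reply
        # lost: nothing is stored, so the default policy decides again next time


    return tried


SAVE_BY_CAPTURE = 1  # hypothetical move chosen by a fixed default-policy rule


def deterministic(last_move: Move, rng: random.Random) -> Move:
    return SAVE_BY_CAPTURE


def stochastic(last_move: Move, rng: random.Random) -> Move:
    return rng.choice([1, 2, 3])


print(replies_tried(deterministic, winning_reply=2, rounds=50))  # {1}
print(2 in replies_tried(stochastic, winning_reply=2, rounds=50))
```

With the deterministic rule the table never receives an entry, so learning cannot start. This matches the point made in observation 2 that a probabilistic default policy leaves room to discover a new reply.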

Aja
