[Computer-go] The heuristic "last good reply"

Tue Jan 25 10:27:24 PST 2011

On Jan 25, 2011, at 10:19 AM, Aja wrote:

> Dear all,
>
> Today I tried Professor Drake's "last good reply" in Erica. So
> far, I have gained at most 20-30 Elo from it.
>
> I tested by self-play, with 3000 playouts/move on 19x19. The number
> of playouts might be too low, but I would like to test with more
> playouts IF the playing strength is not weaker with 3000 playouts.

Yes -- the smallest experiments in the paper were with 8k playouts per  
move. There may not be time to fill up the reply tables with only 3k.

> From these preliminary experiments with 3000 playouts, I have some
> observations:
>
> 1. In Erica, it's better to consider probability for this heuristic.
>
> 2. In Prof. Drake's implementation, there is a weakness in learning.
> I think the main problem is that when a reply is played
> deterministically by the default policy, there is no room to
> learn a new reply. For example, if "save by capture" produces a lost
> game, then in the next simulation the default policy will still play
> "save by capture". If I am wrong on this point, I would be glad
> to be corrected by anyone.

This is true, but only if the previous move (or previous two moves)  
come up again in exactly the same board configuration. When the  
configuration is exactly the same, we are probably still in the search  
tree, which overrides the policy. If we are beyond the tree, the  
configuration is almost always different.
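
For readers following along, the exchange above can be made concrete with a minimal sketch of a reply table keyed on the opponent's previous move. This is an illustration under stated assumptions, not the code from the paper or from Erica: the class name ReplyTable, its methods, the "black"/"white" labels, and the move strings are all hypothetical, and only the one-previous-move case is shown.

```python
class ReplyTable:
    """Hypothetical last-good-reply table keyed on the opponent's previous move."""

    def __init__(self):
        # replies[player][previous_move] -> reply most recently seen in a won playout
        self.replies = {"black": {}, "white": {}}

    def choose(self, player, prev_move, legal_moves, default_policy):
        """Play the stored reply if it is legal, else fall back to the default policy."""
        reply = self.replies[player].get(prev_move)
        if reply is not None and reply in legal_moves:
            return reply
        return default_policy(legal_moves)

    def update(self, moves, winner):
        """Store the winner's replies from one finished playout.

        moves is a list of (player, move) pairs in the order they were played.
        Nothing is recorded for the loser, so a loss leaves the table as it was.
        """
        for (_, prev_move), (player, move) in zip(moves, moves[1:]):
            if player == winner:
                self.replies[player][prev_move] = move
```

In this sketch a lost playout changes nothing, so a default policy that answers a given situation deterministically will give the same answer the next time that situation arises beyond the tree, which is the case raised in observation 2.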

> 3. This heuristic has the potential to perform better in Erica. I
> hope this brief result will encourage other authors to try it.

It's reassuring to see that you got some strength improvement out of it!

Thanks,

Peter Drake
http://www.lclark.edu/~drake/
