[Computer-go] replacing dynamic komi with a scoring function

Mon Jan 9 14:39:49 PST 2012

I think the idea could be extended in lots of different ways too. For
example, you could keep territory maps for each final score (quantized
into some buckets as with the histogram, 2*board_size+1 is way too
much information...). That gives you the histogram for each
intersection, so it would allow you to have some higher level logic,
perhaps enough to detect semeais? I'm quite ignorant on go topics, as
I don't really play it myself, but it seems like you could detect
critical intersections much better.

One downside is that it makes variations in backup functions
harder/slower, and also things like virtual loss. Backup functions are
another area that I think has a lot left to be explored, and I am
quite interested in theoretically. Yeah, yeah, I need to start a go
program...

Zach

On Mon, Jan 9, 2012 at 1:45 PM, Aja Huang <ajahuang at gmail.com> wrote:
> Lets say there is a semeai and Black wins 6.5 points if it wins the semeai.
> Otherwise White wins 3.5 points.
>
> Playout 1    B+6.5
> Playout 2    B+6.5
> Playout 3    B+6.5
> Playout 4    B+6.5
> Playout 5    B+6.5
> Playout 6    B+6.5
> Playout 7    B+6.5
> Playout 8    B+6.5
> Playout 9    B+6.5
> Playout 10 W+3.5
>
> Average score = (6.5 * 9 + (- 3.5)) / 10 = 5.5
>
> In this case, shifting komi to something like “Black 0.5 point win” (namely
> increasing komi by 5) does not work. Black simply has to win this semeai to
> win. So I think Zach’s idea quite makes sense. Actually I had a similar idea
> in the past but still have no time to try it.
>
> Aja
>
>
> From: Stefan Kaitschick
> Sent: Monday, January 09, 2012 2:28 PM
> To: Aja Huang ; computer-go at dvandva.org
> Subject: Re: [Computer-go] replacing dynamic komi with a scoring function
>
> The correct answer is 0, because shifting by just 1 point drops the rate to
> 10%
> But is this really a dynamic komi problem?
> I mean, is there really a graceful way to misevaluate semeais?
> apropos semeais: would it be feasible to expand RAVE to include the
> information of the previous move?
> ( like the "killer heuristic" in chess)
> It could be limited to the 5 by 5 surroundings, so that the previous move is
> one of 24 points or "other".
> But that's still not cheap computationally. I just have no idea how much
> effort it takes to maintain the current RAVE.
>
> Stefan
>
> On Mon, Jan 9, 2012 at 8:14 PM, Aja Huang <ajahuang at gmail.com> wrote:
>>
>> Interesting. I have observed a problem in the current dynamic komi scheme
>> especially when there is a big semeai/life-and-death:
>>
>> Playout 1   B+0.5
>> Playout 2   B+0.5
>> Playout 3   B+0.5
>> Playout 4   B+0.5
>> Playout 5   B+0.5
>> Playout 6   B+0.5
>> Playout 7   B+0.5
>> Playout 8   B+0.5
>> Playout 9   B+0.5
>> Playout 10 W+8.5
>>
>> B has 90% winning rate. How many points should we shift for dynamic komi?
>>
>> Aja
>>
>> -----原始郵件----- From: Zach Wegner
>> Sent: Monday, January 09, 2012 11:53 AM
>>
>> To: computer-go at dvandva.org
>> Subject: Re: [Computer-go] replacing dynamic komi with a scoring function
>>
>> On Mon, Jan 9, 2012 at 7:26 AM, Vlad Dumitrescu <vladdu55 at gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> On Mon, Jan 9, 2012 at 13:17, Don Dailey <dailey.don at gmail.com> wrote:
>>>>
>>>> Summary:
>>>>
>>>> I believe a more correct scoring function won't be based on how much you
>>>> win
>>>> by OR how often you win but will incorporate some other more relevant
>>>> concept and it will be dynamic.    And it will not matter if the game is
>>>> a handicap game or otherwise because the scoring function will always be
>>>> relevant.   The goal will be to maximize your winning chances but it
>>>> will incorporate something more sophisticated that just counting how
>>>> often
>>>> you win or how much you win by.
>>>
>>>
>>> I hope I may interfere with something that Don's nice description
>>> revealed to me. It feels rather obvious, but since nobody stated it
>>> explicitly, maybe it's news for at least some people here.
>>>
>>> MCTS is maximizing the chances of winning. These chances are largest
>>> for a minimal score difference because this allows for making some
>>> errors. Winning by the largest possible score has rather small chances
>>> to happen because every move has to be perfect.
>>>
>>> The curve describing the probability of ending the game with a certain
>>> score is bell-shaped and MCTS explores the area beneath it, looking
>>> for winning moves. With handicap, the disadvantaged side is getting
>>> less samples explored, making it less likely to discover the really
>>> good moves. Dynamic komi shifts the bell left or right in order to
>>> equalize the sampling on both sides, but as mentioned it isn't dynamic
>>> enough (the curve changes after each move) and also is actually using
>>> a different shape for the curve than the real "handicap curve".
>>>
>>> In theory, I think that the solution for keeping the same level of
>>> play with handicap as without would be to make sure that the the
>>> disadvantaged side gets just as many samples with or without handicap.
>>> That is, use more playouts when playing with handicap. In practice,
>>> this is probably prohibitive...
>>>
>>> I wonder if it might be possible to estimate the shape of this curve
>>> after each move and use that estimate to dynamically adjust the number
>>> of playouts. One might have to use higher precision calculations, too,
>>> so that the noise doesn't get too loud.
>>>
>>> Does this make any sense? Has anyone tried something like this?
>>>
>>> best regards,
>>> Vlad
>>> _______________________________________________
>>> Computer-go mailing list
>>> Computer-go at dvandva.org
>>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>>
>>
>> This is similar to ideas I've had to solve this. Of course, I'm not a
>> go programmer, so it's really just idle curiosity for me when reading
>> this list. Perhaps an ideal scoring function is "highest median
>> score". You can do this in a slow/memory intensive way by storing a
>> histogram of final score in each node instead of wins/losses.
>> Wins/losses can then be computed by summing the histogram buckets
>> above/below the komi level. Computing the median is simply finding the
>> bucket where the sum crosses 50%. Other percentages could be used here
>> as well, though I'm not quite sure what effect this might have.
>>
>> This solves the problem of getting far less than one bit of
>> information per playout when the win rate is very high or low, and
>> also supersedes dynamic komi--if you have the full histogram at any
>> node, you can compute the exact win rate for any arbitrary komi.
>>
>> Using this in practice might be fairly difficult, and would probably
>> involve making the histogram "lossy" in some sense. I would be
>> interested if somebody here could run tests with this as a limit study
>> though--keep the number of nodes allocated and playouts per move
>> constant, and see the effect on strength.
>>
>> Zach
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go at dvandva.org
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go at dvandva.org
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go