[Computer-go] Understanding statistics for benchmarking

Urban Hafner contact at urbanhafner.com
Tue Nov 3 05:50:23 PST 2015


Yes, I noticed that too. But luckily that's the one thing I didn't even consider doing. Running the same number of games feels like the most natural thing to do anyway. 

Sent from my iPhone

> On 03.11.2015 at 14:22, Petr Baudis <pasky at ucw.cz> wrote:
> 
>> On Tue, Nov 03, 2015 at 09:46:00AM +0100, Rémi Coulom wrote:
>> The intervals given by gogui are the standard deviation, not the usual 95%
>> confidence intervals.
>> 
>> For 95% confidence intervals, you have to multiply the standard deviation by
>> two.
>> 
>> And you still have the 5% chance of not being inside the interval, so you
>> can still get the occasional non-overlapping intervals.
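
To make the arithmetic above concrete, here is a minimal sketch. It assumes the reported interval is the binomial standard error of the win rate under a normal approximation, and that draws are ignored. The function name `win_rate_interval` and the 540/1000 example are made up for illustration; this is not gogui's code or output. The factor 1.96 is the exact normal quantile that "multiply by two" rounds up.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    """Return (rate, standard_error, low, high) for a binomial win rate.

    z=1.96 gives an approximate 95% interval; z=1.0 gives the plain
    one-standard-deviation interval.
    """
    rate = wins / games
    # Standard error of a proportion under the normal approximation.
    se = math.sqrt(rate * (1.0 - rate) / games)
    return rate, se, rate - z * se, rate + z * se


# Hypothetical result: 540 wins out of 1000 games.
rate, se, low, high = win_rate_interval(wins=540, games=1000)
print(f"win rate {rate:.3f} +/- {se:.3f} (1 SD), "
      f"95% interval [{low:.3f}, {high:.3f}]")
```

The approximation gets poor for very few games or win rates close to 0 or 1.
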
>> 
>> Likelihood of superiority is an interesting statistical tool:
>> https://chessprogramming.wikispaces.com/LOS+Table
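
For reference, the likelihood of superiority is usually computed from wins and losses alone with the error function. The sketch below uses the commonly quoted approximation LOS = 0.5 * (1 + erf((wins - losses) / sqrt(2 * (wins + losses)))). The function name is made up here, and draws are assumed to be dropped before counting.

```python
import math


def likelihood_of_superiority(wins, losses):
    """Approximate probability that the first engine is the stronger one.

    Uses the normal approximation to the difference wins - losses;
    draws are assumed to have been discarded beforehand.
    """
    decided = wins + losses
    if decided == 0:
        return 0.5  # no information either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decided)))


# Hypothetical score: 540 wins, 460 losses.
los = likelihood_of_superiority(540, 460)
print(f"LOS = {los:.4f}")
```
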
>> 
>> For more advanced tools for deciding when to stop testing, there is SPRT:
>> http://www.open-chess.org/viewtopic.php?f=5&t=2477
>> https://en.wikipedia.org/wiki/Sequential_probability_ratio_test
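
A bare-bones sketch of Wald's SPRT for win/loss results, to show what "deciding when to stop" looks like in practice. Everything named here (`sprt`, `p0`, `p1`, `alpha`, `beta`) is an illustrative choice, not taken from any particular testing framework. It assumes independent games without draws and tests H0 "win rate is p0" against H1 "win rate is p1"; real frameworks usually state the hypotheses in Elo and handle draws.

```python
import math


def sprt(results, p0=0.5, p1=0.55, alpha=0.05, beta=0.05):
    """Run Wald's sequential probability ratio test on a win/loss sequence.

    results: iterable of booleans, True for a win of the candidate.
    Returns (decision, games_used) where decision is "H1", "H0" or None
    (None means the sequence ended before either bound was crossed).
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing it accepts H1
    lower = math.log(beta / (1.0 - alpha))   # crossing it accepts H0
    win_step = math.log(p1 / p0)
    loss_step = math.log((1.0 - p1) / (1.0 - p0))

    llr = 0.0  # accumulated log-likelihood ratio
    games = 0
    for won in results:
        games += 1
        llr += win_step if won else loss_step
        if llr >= upper:
            return "H1", games
        if llr <= lower:
            return "H0", games
    return None, games


# Extreme hypothetical inputs: an unbroken run of wins, then of losses.
print(sprt([True] * 100))
print(sprt([False] * 100))
```

Unlike the fixed-N test, the stopping rule is part of the test itself, so checking after every game does not inflate the error rates beyond roughly alpha and beta.
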
> 
> An important corollary to this (noted on this list every few years)
> is that in the most naive scenario where your statistical test is just
> SD-based overlap after N games, you should fix your N number of games
> in advance and not rig it by terminating out of schedule.  If you look
> at the progress of your playtesting often, you could spot a few moments
> where the intervals do not overlap, even if in the long run they
> typically would.
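
The effect is easy to reproduce in simulation. The sketch below is a simplification of the test described above: instead of comparing two intervals, it plays two equally strong engines (true win rate 0.5) and asks whether a 2-SD interval around the observed win rate excludes 0.5. It compares checking once after a fixed number of games with checking every 10 games and stopping at the first "significant" result. All names and parameters (`simulate`, 1000 trials, 500 games, peeking every 10 games from game 20 on) are made up for illustration.

```python
import math
import random


def interval_excludes_half(wins, games, z=2.0):
    """True if the z-SD interval around the win rate does not contain 0.5."""
    rate = wins / games
    se = math.sqrt(rate * (1.0 - rate) / games)
    return abs(rate - 0.5) > z * se


def simulate(trials=1000, max_games=500, check_every=10, seed=1):
    """Return (fixed_rate, peek_rate): false-positive rates for equal engines.

    fixed_rate: test applied once, after max_games.
    peek_rate:  test applied every check_every games (from game 20 on),
                counting a trial as soon as any check looks significant.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    peek_hits = 0
    for _ in range(trials):
        wins = 0
        peeked = False
        for game in range(1, max_games + 1):
            wins += rng.random() < 0.5  # equal engines: 50% win chance
            if game >= 20 and game % check_every == 0:
                if interval_excludes_half(wins, game):
                    peeked = True
        fixed_hits += interval_excludes_half(wins, max_games)
        peek_hits += peeked
    return fixed_hits / trials, peek_hits / trials


fixed_rate, peek_rate = simulate()
print(f"false positives with fixed N: {fixed_rate:.3f}")
print(f"false positives with peeking: {peek_rate:.3f}")
```

With a fixed N the false-positive rate should stay near the nominal 5%, while the peeking rate comes out noticeably higher, because every extra look is another chance to catch a random excursion.
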
> 
> (The situation is a bit dire if you have limited computing resources.
> I admit that sometimes I didn't follow the above myself in less formal
> exploratory experiments, but at least I tried to look only
> "infrequently", e.g. single check every few hours, only at "round"
> numbers of playouts, etc.  I hope it's not a grave sin.)
> 
> -- 
>                Petr Baudis
>    If you have good ideas, good data and fast computers,
>    you can do almost anything. -- Geoffrey Hinton
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go


