[Computer-go] Understanding statistics for benchmarking

Rémi Coulom remi.coulom at free.fr
Tue Nov 3 00:46:00 PST 2015


The intervals given by gogui are one standard deviation (the standard 
error of the win rate), not the usual 95% confidence intervals.

For 95% confidence intervals, you have to multiply the standard 
deviation by two (1.96, to be more precise). For 99%, multiply by 
about 2.58.
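
As a rough illustration of this rule, the sketch below recomputes the
standard error of a win rate with the binomial formula sqrt(p*(1-p)/n)
and scales it to a confidence interval. The function name
win_rate_interval and the normal approximation are assumptions made for
this example; gogui's own code was not checked, although the one-sigma
values do match the +/- 2.5 and +/- 2.9 quoted below.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Return (win rate, standard error, half-width of the z-sigma interval).

    Uses the normal approximation to the binomial distribution,
    which is an assumption of this sketch.
    """
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)
    return p, se, z * se

# The two runs from the quoted message: 171/200 and 158/200.
for wins in (171, 158):
    p, se, half = win_rate_interval(wins, 200)
    print(f"{100 * p:.1f}% +/- {100 * se:.1f} (1 sigma), "
          f"+/- {100 * half:.1f} (95%)")
```

Passing z=2.576 instead of the default gives the half-width of a 99%
interval.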

And you still have a 5% chance of the true value not being inside the 
interval, so you can still get the occasional non-overlapping intervals.
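
Instead of checking whether two intervals overlap, one can look at the
difference between the two win rates directly. Below is a minimal
sketch, assuming a pooled two-proportion z-test with a normal
approximation; that test is a choice made for this illustration, not
something gogui reports, and the function name two_proportion_z is
hypothetical.

```python
import math

def two_proportion_z(wins_a, games_a, wins_b, games_b):
    """Return (z, two-sided p-value) for the difference of two win rates.

    Pooled two-proportion z-test under the normal approximation.
    """
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    # Pooled win rate under the hypothesis that both runs are equally strong.
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games_a + 1.0 / games_b))
    z = (p_a - p_b) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

z, p_value = two_proportion_z(171, 200, 158, 200)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

For the two runs quoted below, z comes out at about 1.7, which is under
the 1.96 needed for significance at the 5% level.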

Likelihood of superiority is an interesting statistical tool:
https://chessprogramming.wikispaces.com/LOS+Table
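
A minimal sketch of the likelihood of superiority, assuming the usual
normal-approximation formula LOS = 0.5 * (1 + erf((wins - losses) /
sqrt(2 * (wins + losses)))), with draws ignored. The function name los
is made up for this example, and the formula applies to a direct match
between the two programs being compared.

```python
import math

def los(wins, losses):
    """Likelihood of superiority from a head-to-head match (draws ignored).

    Normal approximation; returns a probability between 0 and 1.
    """
    if wins + losses == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Hypothetical match result, used only to show the call.
print(f"LOS for 110 wins, 90 losses: {los(110, 90):.3f}")
```

It does not apply as-is to two separate runs against a third opponent
such as GnuGo.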

For more advanced tools for deciding when to stop testing, there is SPRT:
http://www.open-chess.org/viewtopic.php?f=5&t=2477
https://en.wikipedia.org/wiki/Sequential_probability_ratio_test
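
To give an idea of how SPRT works, here is a sketch of Wald's basic
test for a win probability, deciding between H0 (win rate p0) and H1
(win rate p1) one game at a time. The function name sprt, the
parameters, and the default error rates of 5% are assumptions for this
example; the variants used in chess engine testing usually state the
hypotheses in Elo and may differ in detail.

```python
import math

def sprt(results, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test on win/loss results.

    results: iterable of booleans (True = win), in the order played.
    Returns (decision, number of games used), where decision is
    "accept H0", "accept H1" or "continue".
    """
    lower = math.log(beta / (1.0 - alpha))   # stop and accept H0 at or below
    upper = math.log((1.0 - beta) / alpha)   # stop and accept H1 at or above
    llr = 0.0                                # running log-likelihood ratio
    n = 0
    for won in results:
        n += 1
        if won:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n

# Artificial input: an unbroken run of wins, with p0 = 0.79 and p1 = 0.855.
decision, games = sprt([True] * 100, p0=0.79, p1=0.855)
print(decision, "after", games, "games")
```

The point of the method is that testing stops as soon as one of the two
bounds is crossed, instead of after a number of games fixed in advance.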

Rémi

On 11/03/2015 09:38 AM, Urban Hafner wrote:
> So,
>
> I’m currently running 200 games against GnuGo to see if a change to my 
> program made a difference. But I now wonder if that’s enough games as 
> I ran the same benchmark with the same code (but a different compiler 
> version) and received different results:
>
> 85.5% wins (171 games of 200) the first time (+/- 2.5 according to 
> gogui-twogtp)
> 79.0% wins (158 games of 200) the second time (+/- 2.9 according to 
> gogui-twogtp)
>
> Looking at these results would make me believe that the difference is 
> significant (the intervals don’t overlap) but then the real difference 
> is only 13 wins …
>
> My statistics knowledge is sketchy at best but assuming that what 
> gogui-twogtp calculates is the 95% confidence interval (I’m pretty 
> sure I’m mixing terms here) it could well be that the difference 
> between the two runs above is just random.
>
> So, this leads me to two questions:
>
> 1. How many games do you normally run to test if a change is 
> significant “enough”?
> 2. Any good resources on how to calculate these statistics (i.e. if I 
> wanted to find the error margin for a 99% confidence interval)?
>
> Urban
> -- 
> Blog: http://bettong.net/
> Twitter: https://twitter.com/ujh
> Homepage: http://www.urbanhafner.com/



