[Computer-go] Understanding statistics for benchmarking
Rémi Coulom
remi.coulom at free.fr
Tue Nov 3 00:46:00 PST 2015
The intervals given by gogui are one standard error (the standard
deviation of the estimated win rate), not the usual 95% confidence
intervals.
For 95% confidence intervals, you have to multiply that value by about
two (1.96 under the normal approximation).
And even then the true value has a 5% chance of not being inside the
interval, so two runs of identical code can still occasionally give
non-overlapping intervals.
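
As a rough illustration, here is a minimal Python sketch that reproduces the +/- figures from the two runs quoted below and widens them to 95% intervals. It assumes the normal approximation to the binomial; the function name and the z parameter are my own, not anything from gogui. For a 99% interval, z would be about 2.576 instead of 1.96.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Return (win rate, standard error, lower, upper).

    Uses the normal approximation to the binomial distribution.
    z=1.96 gives a 95% interval; z=2.576 gives about 99%.
    """
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)
    return p, se, p - z * se, p + z * se

if __name__ == "__main__":
    # The two 200-game runs from the quoted message.
    for wins in (171, 158):
        p, se, low, high = win_rate_interval(wins, 200)
        print(f"{wins}/200: {p:.1%} +/- {se:.1%} (1 s.e.), "
              f"95% interval [{low:.1%}, {high:.1%}]")
```

With these inputs the standard errors come out at about 2.5% and 2.9%, matching what gogui-twogtp printed, and the 95% intervals are roughly 80.6% to 90.4% and 73.4% to 84.6%, which do overlap.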
Likelihood of superiority is an interesting statistical tool:
https://chessprogramming.wikispaces.com/LOS+Table
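
A sketch of how such a number can be computed, again under the normal approximation. The first function is the usual head-to-head formula (draws ignored) for a direct match between two versions. The second is my own adaptation for the situation in the quoted message, where two runs are played against the same third opponent; both function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def los_head_to_head(wins, losses):
    """LOS for a direct match between two programs, draws ignored."""
    return normal_cdf((wins - losses) / math.sqrt(wins + losses))

def los_two_runs(wins_a, games_a, wins_b, games_b):
    """Approximate probability that A's true win rate exceeds B's,
    when A and B were each tested against the same opponent.
    Assumes neither run is 0% or 100% wins (variance would be zero)."""
    pa = wins_a / games_a
    pb = wins_b / games_b
    variance = pa * (1.0 - pa) / games_a + pb * (1.0 - pb) / games_b
    return normal_cdf((pa - pb) / math.sqrt(variance))

if __name__ == "__main__":
    print(f"{los_two_runs(171, 200, 158, 200):.3f}")
```

For the 171/200 and 158/200 runs this gives a value a little under 0.96, so a gap of 13 wins between two 200-game runs is suggestive but not conclusive.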
For more advanced tools for deciding when to stop testing, there is SPRT:
http://www.open-chess.org/viewtopic.php?f=5&t=2477
https://en.wikipedia.org/wiki/Sequential_probability_ratio_test
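
To give an idea of what SPRT does, here is a minimal sketch of Wald's test on a stream of win/loss results. It is a simplification: draws are not handled, and the hypotheses are stated directly as win rates p0 and p1 rather than Elo differences as chess testing frameworks usually do. The function name, the parameter names and the example values are assumptions for illustration.

```python
import math

def sprt(results, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test on win/loss results.

    H0: true win rate is p0.  H1: true win rate is p1 (with p1 > p0).
    alpha and beta are the target false-positive and false-negative rates.
    results is an iterable of booleans (True for a win).
    Returns (decision, games_used).
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    win_step = math.log(p1 / p0)
    loss_step = math.log((1.0 - p1) / (1.0 - p0))
    llr = 0.0  # accumulated log-likelihood ratio
    games = 0
    for won in results:
        games += 1
        llr += win_step if won else loss_step
        if llr >= upper:
            return "accept H1", games
        if llr <= lower:
            return "accept H0", games
    return "continue", games

if __name__ == "__main__":
    # Hypothetical example: is the win rate 50% or 60%?
    print(sprt([True] * 100, p0=0.5, p1=0.6))
```

The point is that the number of games is not fixed in advance: testing stops as soon as the accumulated evidence crosses one of the two bounds, and "continue" means more games are needed.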
Rémi
On 11/03/2015 09:38 AM, Urban Hafner wrote:
> So,
>
> I’m currently running 200 games against GnuGo to see if a change to my
> program made a difference. But I now wonder if that’s enough games as
> I ran the same benchmark with the same code (but a different compiler
> version) and received different results:
>
> 85.5% wins (171 games of 200) the first time (+/- 2.5 according to
> gogui-twogtp)
> 79.0% wins (158 games of 200) the second time (+/- 2.9 according to
> gogui-twogtp)
>
> Looking at these results would make me believe that the difference is
> significant (the intervals don’t overlap) but then the real difference
> is only 13 wins …
>
> My statistics knowledge is sketchy at best but assuming that what
> gogui-twogtp calculates is the 95% confidence interval (I’m pretty
> sure I’m mixing terms here) it could well be that the difference
> between the two runs above is just random.
>
> So, this leads me to two questions:
>
> 1. How many games do you normally run to test if a change is
> significant “enough”?
> 2. Any good resources on how to calculate these statistics (i.e. if I
> wanted to find the error margin for a 99% confidence interval)?
>
> Urban
> --
> Blog: http://bettong.net/
> Twitter: https://twitter.com/ujh
> Homepage: http://www.urbanhafner.com/