[Computer-go] dealing with multiple local optima
Darren Cook
darren at dcook.org
Mon Feb 27 07:30:50 PST 2017
> But those video games have a very simple optimal policy. Consider Super Mario:
> if you see an enemy, step on it; if you see a hole, jump over it; if you see a
> pipe sticking up, also jump over it; etc.
A bit like go? If you see an unsettled group, make it live. If you have
a ko, play a ko threat. If you have two one-eyed groups near each
other, join them together. :-)
Okay, those could be considered higher-level concepts, but I still
thought it was impressive to learn to play arcade games with no hints at
all.
Darren
>
> On Sat, Feb 25, 2017 at 12:36 AM, Darren Cook <darren at dcook.org> wrote:
>
> > ...if it is hard to have "the good starting point" such as a trained
> > policy from human expert game records, what is a way to devise one.
>
> My first thought was to look at the DeepMind research on learning to
> play video games (which I think either pre-dates the AlphaGo research,
> or was done in parallel with it): https://deepmind.com/research/dqn/
>
> It just learns from trial and error, no expert game records:
>
> http://www.theverge.com/2016/6/9/11893002/google-ai-deepmind-atari-montezumas-revenge
>