[Computer-go] Deep Blue the end, AlphaGo the beginning?

Gian-Carlo Pascutto gcp at sjeng.org
Fri Aug 18 14:12:10 PDT 2017


On 18/08/2017 20:34, Petr Baudis wrote:
>   You may be completely right!  And yes, I was thinking about Deep Blue
> in isolation, not that aware about general computer chess history.  Do
> you have some suggested reading regarding Deep Blue and its lineage and
> their contributions to the field of AI at large?

I sure do. I *love* Hsu's papers, although it's hard now to imagine the
context some of his statements were made in, 30 years ago, because we
know the outcome. Back then, I'm sure many people thought he was a
raving lunatic.

His key idea was that, given ample evidence that program strength scaled
fairly well with computer speed, we should make the programs faster. He
did so by converting chess search to a literal hardware circuit, in an
implementation that was both algorithmically more efficient, and faster,
achieving about *3 orders of magnitude* improvement over what was then
the state of the art. And then designing a parallel search that worked
with it and scaled well enough.

Saying that these "implemented existing methods" is factually wrong, and
betrays a deep misunderstanding of the importance of computing power in
AI research. But I'll get back to this, later.

The original paper, "A Two-Million Moves/s CMOS Single-Chip Chess Move
Generator" was published in 1987. In the conclusion, it states "The best
chess machine now in existence is still about 400 to 500 rating points
below the Human World Chess Champion. Earlier experimental evidence
shows that each doubling in machine speed roughly corresponds to a 100
rating points increase...It is questionable that this remains true at
high level play. But nonetheless with a potential 100-1000 fold speed-up
at the door, *something interesting is probably about to happen*."
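As a back-of-the-envelope check (my arithmetic, not Hsu's), the quoted rule of roughly 100 rating points per doubling of speed can be turned into a quick calculation:

```python
import math

def elo_gain(speedup, elo_per_doubling=100):
    """Projected rating gain from a hardware speedup, assuming the
    (empirical, and as Hsu notes, debatable) rule of ~100 Elo per
    doubling of machine speed."""
    return elo_per_doubling * math.log2(speedup)

# Closing a 400-500 point gap would need a 16-32x speedup under this rule;
# the 100-1000x Hsu had in sight projects to far more than that.
print(elo_gain(100))   # ~664 Elo
print(elo_gain(1000))  # ~997 Elo
```

Which is exactly why, if you believed the scaling evidence at all, "something interesting" really was about to happen.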

In his PhD thesis, he goes further, and draws the scaling graph with
Kasparov on it, at the end, and says in the introduction: "This
dissertation is mainly a collection of exploratory work on what I call
the "ultimate" chess machine - a chess machine that is capable of
searching at least 100 million nodes per second and possibly beyond 1
billion nodes per second. Current evidence seems to indicate that such a
machine will have an overwhelming chance of defeating the human World
Chess Champion."

He wrote that in 1989!

Kasparov, the same expert whose claims about Go and chess started this
very thread, had said the year before that no Grandmaster would be
defeated in tournament play before the year 2000. That gives you some
idea how outlandish Hsu's ideas seemed at the time. Or, for that matter,
how reliable Kasparov's opinion is in these matters.

Hsu achieved his goal in 1997, with 3 years to spare. Kasparov's
response was to call him a cheater.

Now, now, you might be thinking, was it all about speed? It was not -
the above was just Hsu's shtick, who was one member of the team. But, do
not for a moment make the mistake of underestimating just how important
speed is.

Do you know why, decades after having discarded them, we suddenly
started using neural networks again, and why they turn out to work so
well for Go now?

It's because we have several orders of magnitude more computing power.
Made possible by dedicated chips for neural network computations (OK, so
maybe they were intended for computer games - turns out the functions
are pretty similar, not to speak of TPUs).

And Hsu? He's working on FPGAs at Microsoft, who're mainly using them
to accelerate AI research and applications. In one of his last
interviews, in 2007, he predicted that "world-champion-level Go machine
can be built within 10 years." He got the details of the approach wrong,
though.

Other members of the team also published several papers, e.g. Murray
Campbell, Thomas Anantharaman and Andreas Nowatzyk.

Nowatzyk has published the original automated evaluation tuning code
used by Deep Thought. It's available, together with an explanation, at
http://www.tim-mann.org/deepthought.html

This was significant, because software-based programs at the time had to
trade off evaluation terms for speed, so they mostly had very few terms
and relied on manual tuning. Existing methods, you say?

Anantharaman's best-known work is the publication of Singular
Extensions. The contribution of this method is somewhat hazy - with Hsu
admitting that they overestimated the initial gain from them due to
measurement errors - but improved methods are in fact in use in current
top of the line chess engines.
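The core idea is simple to sketch (illustrative only, not the Deep Thought implementation, and the margin value is made up): if one move's score dominates every alternative by a margin, the whole line hinges on that move, so it deserves to be searched a ply deeper.

```python
# Sketch of the singular-extension test (illustrative, not the Deep
# Thought implementation): a move is "singular" when it beats every
# alternative by a margin, and singular moves get an extra ply of search.

SINGULAR_MARGIN = 50  # in centipawns; an arbitrary illustrative value

def is_singular(scores, margin=SINGULAR_MARGIN):
    """True if the best move's score exceeds every other move's score
    by at least `margin` (or there is only one legal move)."""
    ranked = sorted(scores, reverse=True)
    return len(ranked) < 2 or ranked[0] - ranked[1] >= margin

# A forced recapture dominating the alternatives -> extend it.
print(is_singular([120, 30, -10]))  # True
# Several moves of similar strength -> no extension.
print(is_singular([120, 110, 90]))  # False
```

Modern engines test singularity with a reduced-depth search excluding the candidate move, but the underlying intuition is the same.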

Campbell has published on a bunch of subjects. A ton on parallel game
tree search, and a method for biasing the tree search based on human
games. We call that a policy net, nowadays. Ok, maybe I'm stretching a
bit here.
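The analogy does hold up in sketch form, at least. Here's a hedged, simplified example of biasing tree search with a learned move predictor, using the PUCT-style selection rule that AlphaGo's MCTS popularized (my choice of formula, not Campbell's scheme):

```python
# Sketch of biasing tree search with a move predictor trained on human
# games (my analogy to a policy net, not Campbell's actual method):
# likely moves get a selection bonus that decays as they accumulate visits.

import math

def selection_score(value, prior, visits, parent_visits, c=1.0):
    """PUCT-style selection: exploitation term plus a prior-weighted
    exploration bonus, as used in AlphaGo's tree search."""
    return value + c * prior * math.sqrt(parent_visits) / (1 + visits)

# An unvisited move with a strong prior outranks a mediocre, already
# well-visited one, so the predictor steers where the search looks first.
strong_prior = selection_score(value=0.0, prior=0.6, visits=0, parent_visits=100)
weak_prior = selection_score(value=0.2, prior=0.05, visits=10, parent_visits=100)
print(strong_prior > weak_prior)  # True
```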

Now, as to how these methods "contributed to the AI field at large",
which I interpret as asking how well they generalize, that's an
interesting question. But it's also an interesting question that you can
ask of AlphaGo's contributions. Doing move prediction with a DCNN was
first done by Clark and Storkey of the University of Edinburgh, though
the timing (IIRC) indicated DeepMind was investigating the idea
independently, and with a better technique (and more computing power, to
repeat that pet peeve). Aside from that, the contribution (from my
perspective) is the value net and how to combine it with MCTS. Value
networks might have been described on computer-go before, but it's not
fair to equate voicing an idea with demonstrating a high-performance
implementation of it, so as far as I can tell that was a novel, major
breakthrough from Aja's team [1]. The combo of SL and RL for generating
the value net data is interesting and AFAIK also novel, but it doesn't
seem as essential to the result.
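The combination itself is stated as a one-liner in the paper: the leaf evaluation mixes the value network's estimate with the rollout outcome, weighted by a constant lambda (0.5 in the paper). A minimal sketch:

```python
# AlphaGo's leaf evaluation (as described in the paper) mixes the value
# network's estimate with the Monte Carlo rollout outcome using a
# constant mixing weight lambda, set to 0.5 in the published version.

def leaf_value(value_net_estimate, rollout_outcome, lam=0.5):
    """Mixed leaf evaluation: (1 - lam) * value net + lam * rollout."""
    return (1 - lam) * value_net_estimate + lam * rollout_outcome

# Value net thinks the position is slightly favourable, but the rollout
# ended in a loss; the mixed estimate splits the difference.
print(leaf_value(0.3, -1.0))  # -0.35
```

Trivial to write down; the breakthrough was training a value network good enough that including it in this average helps.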

So I think that, if you consider only the *new* techniques AlphaGo
introduced, you're not going to find *that* much wide applicability
either. It was more an effective demonstration that the RL and DCNN
toolbox has become powerful enough to overcome problems once thought
intractable.

In other words, they showed off just how sharp their tools are.

[1] While writing this, I noticed something interesting I hadn't seen
before. Their paper concludes the value network is 200 Elo weaker than
doing Monte Carlo rollouts when using the policy network, but it's 200
Elo stronger than Monte Carlo rollouts when using the fast tree policy.
That almost looks like a mistake in the table.

-- 
GCP

