[computer-go] Fast Board implementation
Łukasz Lew
lukasz.lew at gmail.com
Fri Dec 15 16:47:55 PST 2006
On 12/16/06, Don Dailey <drd at mit.edu> wrote:
> Another report on D programming. It appears that D programs are about
> 1.5X slower in general - at least for what I'm doing.
>
> At one point I reported about 7% slower, but that was a mistake on my
> part.
>
> The difference in programming effort and just plain clean code is quite
> large - D was well designed and after doing a lot of programming in D,
> it's a real drag going back to C!
Is it?
I mean what features of D You've used that make it inconvenient to go back to C?
>
> But it appears that you can have your cake and eat it too by just making
> a low level library in C, and doing most of the other stuff in D. D
> and C are link compatible and there is no effort in making this work -
> it just works.
>
> I have indeed been able to experiment more quickly and easily doing it
> in D, so I have come up with what seems like a nice formula:
>
> 1. Start by doing everything in D.
> 2. Optimize and experiment.
> 3. When you are happy, redo the low level stuff in C. It will
> link
> right in.
> 4. You can still continue to experiment.
>
> Porting to C from D is quite easy because D is pretty much a cleaned up
> version of C. It's easy to make this port when you know exactly what
> you want.
>
> I've been able to do a substantial amount of experimentation using D and
> the D program is now just slightly faster than the old C program due to
> the improved
> data structures. I should be able to get close to 25,000 games per
> second if
> I get the 1.5 X improvement going to C. This is only about 3K of code
> that will
> have to be written in C, where much of the code is identical to C
> anyway.
>
> - Don
>
>
> On Thu, 2006-12-14 at 22:51 -0500, Luke Gustafson wrote:
> > Are you using the union-find algorithm with path compression for joining
> > strings? It's very simple and very fast.
> > http://en.wikipedia.org/wiki/Disjoint-set_data_structure#Disjoint-set_forests
> >
> > The other main thing to consider is reducing branch mispredictions, which
> > (I'm guessing) could very well use more clock cycles on a P4 than performing
> > the actual calculations themselves. I think this is why the next_stone list
> > is faster than using e.g. a recursive algorithm, which involves a lot of
> > if's that cannot be predicted.
> >
> > Compact data I think is unnecessary, although perhaps eliminating the String
> > struct would speed things up a tad. In fact, on a P4, shift operations are
> > slower than L1 cache memory retrievals, so fancy packing of data is not
> > worthwhile. As long as all data fits in the L1 cache it should be max
> > speed.
> >
> > I am working on a program myself, and I will be porting it to assembly at
> > some point; right now I am just experimenting with the algorithm design in
> > C. (Although this happens pretty slowly since I am very busy with work.) I
> > think that this sort of algorithm is perfect for assembly language, where
> > hand-optimization can be much better than compiler optimization. I will let
> > you know how it turns out compared to Lukasz :)
> >
> > >> At the time of Your post I've had it already implemented and regarded
> > >> it like "my sweet secret" :)
> > >
> > > I don't know how sweet this secret is, but it does help. I just
> > > implemented it in Lazarus and I get about a 20% speedup. That's not
> > > bad, but nothing to write home about. To a chess programmer this is
> > > rather enormous! I am guessing that the win would be much greater on
> > > bigger boards.
> > >
> > > In principle it would probably be FAR faster, but it adds extra
> > > complexity,
> > > operations and overhead to executing a move which subtracts some from
> > > the
> > > savings.
> > >
> > > The question is: "what is the best way to implement it?" The problem
> > > is that you now must track pseudo liberty counts for every chain on
> > > the board. This is more complicated obviously.
> > >
> > > Every time a stone is placed or removed, 4 additional entries must be
> > > accessed in order to update the pseudo liberty counts.
> > >
> > > It's still a good trade-off because the testing for whether a move
> > > is a capture or suicide is cheap.
> > >
> > > I'm reviewing my design now and I think I see some ways to improve.
> > >
> > > Here is the current data structure (assuming the 9x9 board):
> > >
> > > 1. A 10x11 board (actually 10x11 plus 1 for diagonal checking.)
> > > 0 = empty
> > > 1 = white
> > > 2 = black
> > > 4 = off-board points.
> > >
> > > 2. Another 10x11 board-like array - if there is a occupied point on
> > > the board this array contains the index (hash key) of a "string"
> > > data type.
> > >
> > > 3. A list of string data types (a struct in C):
> > > 1. stone count
> > > 2. pseudo liberties for the whole string
> > > 3. a reference stone.
> > >
> > > To save time, the list of strings is really more like a hash table,
> > > not a proper list. It's a perfect hash table because the keys will
> > > never collide. I use the reference stone as the key for each string
> > > which is generally the first stone placed in the string. So the hash
> > > key computation is just a table lookup and is the only thing the
> > > second table is used for.
> > >
> > > So to summarize, I really have 3 10x11 arrays, one of them is an array
> > > of structs in C (actually in D) and is viewed as a hash table.
> > >
> > > I could pack all of the information into a single array. Here is one
> > > possible scheme which fits in 32 bits and accommodates all board sizes:
> > >
> > > 3 bits to identify white/black/empty/off-board
> > > 9 bits for the hash key (the reference stone)
> > > 9 bits for pseudo liberty count
> > > 9 bits for stone count
> > >
> > > It's kind of interesting when you think about it. A single array does
> > > triple duty as a:
> > >
> > > 1. Go board.
> > > 2. Hash table.
> > > 3. lookup table - to get the "hash key" (table index) of the data
> > > for
> > > the string that occupies this point.
> > >
> > > The hash table bits in the table are usually irrelevant, it is only
> > > valid for the single point that represent the hash table entry for a
> > > given string.
> > >
> > > So this scheme does no list manipulations and has a fixed size in
> > > memory. It's compact and I think fast. Is there anything obvious I'm
> > > missing that simplifies things?
> > >
> > > By the way, stone count field is helpful but not necessary. But it's
> > > trivial to keep updated and it's convenient.
> > >
> > > Using the compact representation would slow down some of the
> > > operations which are in fast inner loops now, but would probably be a
> > > win overall, perhaps a big win because I'm not manipulating several
> > > different arrays.
> > >
> > > So I will also try this compact implementation unless someone has a
> > > superior scheme to suggest.
> > >
> > >
> > > - Don
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, 2006-12-11 at 20:00 +0100, Łukasz Lew wrote:
> > >> At the time of Your post I've had it already implemented and regarded
> > >> it like "my sweet secret" :) , so I was expecting that when You
> > >> published it everybody will appreciate, and start using. But I guess
> > >> it wasn't easy to see benefits.
> > >> I wonder how many really good ideas came through this list unnoticed.
> > >>
> > >> Best Regards,
> > >> Lukasz
> > >>
> > >>
> > >> On 12/11/06, House, Jason J. <jhouse at mitre.org> wrote:
> > >> > Wow. Did some of my early posts on liberties/chains actually get used
> > >> > by someone? Or did the idea of pseudo-liberties and union sets exist
> > >> > before my post(s) on the topic? I remember a lot of negative feedback
> > >> > on pseudo-liberties at the time of my post.
> > >> >
> > >> > -----Original Message-----
> > >> > From: computer-go-bounces at computer-go.org
> > >> > [mailto:computer-go-bounces at computer-go.org] On Behalf Of Lukasz Lew
> > >> > Sent: Monday, December 11, 2006 1:31 PM
> > >> > To: drd at mit.edu
> > >> > Cc: computer-go
> > >> > Subject: Re: [computer-go] Fast Board implementation
> > >> >
> > >> > Pseudo liberties of a group is a sum of liberties of each stone,
> > >> > example:
> > >> >
> > >> > OOOOO
> > >> > OXXXO
> > >> > OX.XO
> > >> > OXXXO
> > >> > OOOOO
> > >> >
> > >> > "X" group have 4 pseudo liberties.
> > >> >
> > >> > If You merge two groups just add pseudo liberties.
> > >> > If PL = 0 then group should be removed.
> > >> >
> > >> > This is simple and sufficient :)
> > >> >
> > >> > Lukasz
> > >> > On 12/11/06, Don Dailey <drd at mit.edu> wrote:
> > >> > >
> > >> > >
> > >> > > On Mon, 2006-12-11 at 18:22 +0100, Łukasz Lew wrote:
> > >> > > > - pseudo liberties at top of union-set tree
> > >> > >
> > >> > >
> > >> > > Refresh my memory on this. I remember talking about this a long
> > >> > time
> > >> > > ago. A psuedo liberty is an upper bound on how many liberties there
> > >> > are
> > >> > > for a given string, correct? Sometimes a liberty gets counted twice
> > >> > or
> > >> > > more?
> > >> > >
> > >> > > But if it goes to zero or one it's correct for any given string?
> > >> > >
> > >> > > Is the idea that it's lightning fast to update incrementally?
> > >> > >
> > >> > > - Don
> > >> > >
> > >> > >
> > >> > >
> > >> > _______________________________________________
> > >> > computer-go mailing list
> > >> > computer-go at computer-go.org
> > >> > http://www.computer-go.org/mailman/listinfo/computer-go/
> > >> >
> > >> _______________________________________________
> > >> computer-go mailing list
> > >> computer-go at computer-go.org
> > >> http://www.computer-go.org/mailman/listinfo/computer-go/
> > >
> > > _______________________________________________
> > > computer-go mailing list
> > > computer-go at computer-go.org
> > > http://www.computer-go.org/mailman/listinfo/computer-go/
> >
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
More information about the computer-go
mailing list