[Computer-go] Fuego 04 native Windows

Fuming Wang fumingw85 at gmail.com
Thu Jun 17 07:13:35 PDT 2010


Hi Rene,

Your guesses about our FPGA implementation are quite to the point. The 167
games are moving through the 167 pipelined stages of one module, rather than
through 167 separate modules.
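
As a rough illustration of the idea (not our actual design), here is a toy
Python model of a single module with 167 pipeline stages. Each tick, every
game in flight advances one stage and a new game enters. The function name
run_pipeline, the tick count, and the assumption that a game makes exactly
one pass through the stages are all made up for this sketch; whether a game
re-circulates through the module is not described here.

```python
from collections import deque


def run_pipeline(num_stages, num_ticks):
    """Toy model of one module with num_stages pipeline stages.

    Each tick, the game in the last stage leaves the pipeline, every
    other game advances one stage, and a fresh game enters stage 0.
    Returns (finished_game_ids, max_games_in_flight).
    """
    stages = deque([None] * num_stages)  # stages[i] holds the game id at stage i
    finished = []
    max_in_flight = 0
    next_id = 0
    for _ in range(num_ticks):
        leaving = stages.pop()           # game at the last stage exits
        if leaving is not None:
            finished.append(leaving)
        stages.appendleft(next_id)       # a fresh game enters stage 0
        next_id += 1
        in_flight = sum(game is not None for game in stages)
        max_in_flight = max(max_in_flight, in_flight)
    return finished, max_in_flight


finished, in_flight = run_pipeline(num_stages=167, num_ticks=500)
print(len(finished), in_flight)
```

Once the pipeline has filled, one game leaves per tick while 167 games are
in flight, which is why throughput and single-game latency are such
different numbers for this kind of design.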

As this material is a cross between digital circuit design and computer
gaming, I am not quite sure which refereed journal is most suitable for it.
Do you or any readers of this list have any suggestions?

Thanks,
Fuming


On Thu, Jun 17, 2010 at 8:09 AM, René van de Veerdonk <
rene.vandeveerdonk at gmail.com> wrote:

> Hi Fuming,
>
> Thanks for your answer, it makes much more sense to me now.
>
> We are using pipelining in different ways. When I referred to it for a
> CPU-based single-threaded application, I was thinking about speculative
> execution. If I understand it correctly, that does not exist in FPGA's, as
> these are advertised as deterministic in their execution and process flow.
> In the FPGA case, I imagine that pipelining refers to "unrolling the
> program", and having different boards physically move across the chip from
> module to module, as if they are on a production line, all in various states
> of simulation (board #11 at module #101: black to move; board #12 at module
> #100: white to move; etc.).
>
> How you have designed your program in detail would be an interesting read,
> there are a lot of high-level design trade-offs that you must have dealt
> with. These will be very different from how you would do it for a CPU-based
> program. One difference that I imagine, for instance, is the length of the
> simulation. A CPU-based program stops when the game ends (or you exceed some
> limit, or you force an early decision, or ...), whereas for FPGA you may end
> up with a fixed game-length (ready or not, i.e., no early out option) and
> you may have to simulate pass moves until you reach the end of the
> "production line" in case the game ended early (is this what you do?). In
> any case, your impressive numbers suggest that this can be done very
> efficiently. How you harness all this raw simulation power in a tree-search
> is yet another research topic that is very interesting and almost
> orthogonal. Do you think your approach could be mapped to a GPU as well? In
> any case, I hope you will make a pre-print available to this list when the
> time comes.
>
> In another response in this thread, you mention that you are simulating 167
> boards in parallel. Does that mean that you unrolled your program for 167
> moves, moving a board between 167 separate modules every "cycle" and
> seed/harvest one complete board per "cycle"? Or do you have multiple
> (shorter) production lines in parallel? Or something else entirely?
>
> As you may have noticed, I am looking forward to your paper,
>
> René
>
> On Tue, Jun 15, 2010 at 7:03 PM, Fuming Wang <fumingw85 at gmail.com> wrote:
>
>> Hi Rene,
>>
>> Our design is fully pipelined, so we are able to simulate multiple games
>> simultaneously. The way in which simulations are run on an FPGA and on a CPU
>> is quite different, so a direct comparison is not easy. If we want to simulate
>> just one game, the FPGA implementation is not 10x faster. However, if we want
>> thousands of games simulated for a single board position, then the FPGA is 10x
>> faster. So, we are getting 1500k GAMES/sec, but only in the second sense.
>> The clock rate of our FPGA board is only 125 MHz, so with a better board/chip,
>> we will still have 10-100 times improvement over the current result.
>>
>> best,
>> Fuming
>>
>>
>> On Wed, Jun 16, 2010 at 1:28 AM, René van de Veerdonk <
>> rene.vandeveerdonk at gmail.com> wrote:
>>
>>> Fuming,
>>>
>>> Could you please explain your approach a little bit? From the numbers you
>>> quote, this sounds extremely positive, but I have a hard time understanding
>>> how you achieve them. Taking 100k playouts/sec for 9x9 on my 2.4 GHz laptop
>>> for my single-threaded bitmap-based light-playout implementation as an
>>> example, with 110 moves/playout, this results in a little less than 240
>>> clock-cycles/move. When I quickly looked up the Cyclone III specification, I
>>> saw that the clock-speed for this FPGA tops out around 240 MHz, yet you
>>> achieve 15x the throughput, i.e., you are 150x more efficient. This means
>>> 1.8 clock-cycles/move. Without being able to make use of pipe-lining inside
>>> the CPU (someone measured ~2 assembly instructions/clock-cycle for my bitmap
>>> approach), this leads me to questions. First, are you running a single
>>> threaded application, or playing on multiple boards at once? Second, are you
>>> just replaying moves, or also generating them on the fly (about half of the
>>> time is spent there in my implementation, more if you include updating the
>>> data-structure to make that possible)? Third, are we using the same
>>> definitions?
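
The back-of-the-envelope figures in the paragraph above can be re-derived
with a few lines of Python. This is only a sketch using the numbers quoted
there (2.4 GHz and 100k playouts/sec for the CPU, 240 MHz and 1500k
playouts/sec for the FPGA, 110 moves/playout); the helper name
cycles_per_move is made up for the illustration.

```python
def cycles_per_move(clock_hz, playouts_per_sec, moves_per_playout):
    """Clock cycles available per simulated move at a given throughput."""
    return clock_hz / (playouts_per_sec * moves_per_playout)


MOVES_PER_PLAYOUT = 110  # figure quoted for 9x9 light playouts

# CPU: 2.4 GHz laptop at 100k playouts/sec.
cpu = cycles_per_move(2.4e9, 100_000, MOVES_PER_PLAYOUT)

# FPGA: assumed 240 MHz ceiling at 1500k playouts/sec.
fpga = cycles_per_move(240e6, 1_500_000, MOVES_PER_PLAYOUT)

# How many times fewer cycles per move the FPGA would need.
ratio = cpu / fpga

print(f"CPU:   {cpu:.1f} cycles/move")
print(f"FPGA:  {fpga:.2f} cycles/move")
print(f"ratio: {ratio:.0f}x")
```

By this arithmetic the CPU figure is about 218 cycles/move and the ratio is
150x, matching the quoted text. The FPGA figure comes out nearer 1.5 than
1.8 cycles/move, which does not change the argument: a number that small
only makes sense if many games are in flight at once.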
>>>
>>> For instance, I would find it much easier to believe that
>>> you achieved 1500k moves/second instead of 1500k playouts/sec (with
>>> each playout being ~110 moves). 200 clock-cycles/move sounds do-able if you
>>> can avoid branching, memory lookups, or miscellaneous calculations by
>>> creating fine-level parallelism in your FPGA-code and specializing functions
>>> on a per grid-point basis. In a CPU-based application, this results in
>>> code-bloat that will become counter-productive at some stage, may not be
>>> feasible in all instances, and is more difficult to maintain. For an
>>> FPGA-based application, however, this sounds entirely possible (not knowing
>>> anything about FPGA's).
>>>
>>> Thanks,
>>>
>>> René van de Veerdonk
>>>
>>>
>>> On Sat, Jun 12, 2010 at 10:37 AM, Fuming Wang <fumingw85 at gmail.com> wrote:
>>>
>>>>
>>>> Cyclone III
>>>>  120,000 logic elements
>>>> Cycle time is linear in the number of moves needed to finish a game, which
>>>> is approximately linear in the square of the board size.
>>>>
>>>> Fuming
>>>>
>>>>
>>>>> - What FPGA? Virtex-6? Spartan-6?
>>>>> - What size is the core in LUT's?
>>>>> - Is your cycle time linear in the board size or in the number of
>>>>> squares (i.e. quadratic in board size)? Or something else?
>>>>>
>>>>> --
>>>>> GCP
>>>>> _______________________________________________
>>>>> Computer-go mailing list
>>>>> Computer-go at dvandva.org
>>>>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>>>>>