[computer-go] More UCT / Monte-Carlo questions

Mark Boon tesujisoftware at gmail.com
Tue Feb 5 12:12:49 PST 2008


On 5-feb-08, at 17:08, David Fotland wrote:

> Can you elaborate on what is in a node, and what you mean by  
> expand?   I assume you have simple node, where each node represents  
> a position and the single move to get there.  Then when you find a  
> node with no children and expand it, you allocate up to 81 new  
> nodes, but you only make one random playout.  If uct picks a  
> different leaf next time, you end up with most of the leaf nodes  
> never visited.  In this case you run out of memory quickly.  If  
> there are a few hundred K playouts to pick a move, and 90% of the  
> leaves have no visits, then you need over a million nodes of memory.
>
>
>
OK, I implemented something that seemed to make sense to me, but is  
maybe different from others. Each node keeps what I call a state.  
This basically consists of a list of empty points in the current  
position (a bit more but not relevant now, used for clever tricks  
later). Expanding first checks if the current node has unplayed empty  
points. If it does, it selects the next one, randomly or by tactics,  
removes it from the list and plays it and then does a playout. After  
that it selects the best one of all the expanded nodes, based on the  
UCT value, and the process repeats.

The actual code is below.

Do most people do something different? I see no reason to create 81  
nodes at once. Or do you consider the list of empty points as the  
same thing? At least a list of empty points is a lot smaller than a  
list of nodes. But also note I have far fewer lists of empty points  
than nodes because it only creates it if the level below is complete.  
I allocate 512Mb to my program and have yet to run out of memory.  
Maybe it would if I let it run much longer than 5 seconds, but I  
expect playout only to slow down from here. Or maybe I have  
implemented something entirely different from UCT? :-)

Mark



public class UCTSearch<MoveType extends Move>
	implements Search<MoveType,UCTSearchResult<MoveType>>
{
	private static Logger _logger = Logger.getLogger(UCTSearch.class);
	
	private MonteCarloAdministration<MoveType> _mcCurrentAdministration;
	private MonteCarloAdministration<MoveType> _mcAdministration;
	private int _secondsPerMove;
	private TreeNode<UCTSearchResult<MoveType>> _rootNode;
	private int _nrPlayouts;
	
	public UCTSearch(MonteCarloAdministration<MoveType> administration)
	{
		_mcCurrentAdministration = administration;
		initRoot();
	}
	
	/* (non-Javadoc)
	 * @see tesuji.games.general.search.Search#doSearch()
	 */
	public UCTSearchResult<MoveType> doSearch(byte startColor)
	{
		assert _mcCurrentAdministration.isConsistent();

		_mcCurrentAdministration.setStartingColor(startColor);
		_mcAdministration = _mcCurrentAdministration.createClone();

		_nrPlayouts = 0;

		long t1 = System.currentTimeMillis();
		long t2;
		long timeLimit = _secondsPerMove*1000;
		
		do
		{
			_mcAdministration.copyDataFrom(_mcCurrentAdministration);
			expandBestNode(_rootNode, startColor);
			t2 = System.currentTimeMillis();
		}
		while ((t2-t1)<timeLimit);
				
		_logger.info("Nr playouts "+_nrPlayouts);
		
		assert _mcAdministration.isConsistent();
		
		TreeNode<UCTSearchResult<MoveType>> bestNode = getBestChildNode 
(_rootNode);
		if (bestNode==null)
			return null;
		return bestNode.getContent();
	}
	
	private void expandBestNode(TreeNode<UCTSearchResult<MoveType>>  
node, byte moveColor)
	{
		if (!node.getContent().isCompleteNode())
		{
			MCState mcState = node.getContent().getMCState();
			if (mcState==null)
			{
				mcState = MCStateFactory.createMCState();
				mcState.getEmptyPoints().copyFrom(_mcAdministration.getEmptyPoints 
());
				node.getContent().setMCState(mcState);
			}
			
			MoveType nextMove = _mcAdministration.selectMoveAndRemove 
(moveColor,mcState);
			
			if (nextMove.isInitialised())
				if (nextMove.isPass())
					node.getContent().setCompleteNode(true);
			
			TreeNode<UCTSearchResult<MoveType>> nextNode =
				(TreeNode<UCTSearchResult<MoveType>>)  
TreeNodeFactory.createTreeNode();
			UCTSearchResult<MoveType> result =
				(UCTSearchResult<MoveType>)  
SearchResultFactory.createUCTSearchResult(node.getContent 
().getLogNrPlayouts());

			nextNode.setContent(result);
			result.setMove(nextMove);
			
			_mcAdministration.play(nextMove);
			
			boolean win;
			if (_mcAdministration.getNrPasses()==1)
			{
				// Only allow pass when winning.
				byte winner = _mcAdministration.getWinner();
				if (winner!=nextMove.getColor())
				{
					nextNode.recycle();
					return;
				}
				win = (winner==BLACK);				
			}
			else if (_mcAdministration.getNrPasses()==2)
				win = (_mcAdministration.getWinner()==BLACK);
			else
				win = _mcAdministration.playout();

			_nrPlayouts++;
			result.increasePlayouts(win);

			node.add(nextNode);
			adjustTreeValue(node, nextNode,win);
		}
		else
		{
			TreeNode<UCTSearchResult<MoveType>> bestNode = getBestUCTChildNode 
(node);

			if (bestNode==null)
			{
				boolean win = (_mcAdministration.getWinner()==BLACK);
				node.getContent().increasePlayouts(win);
				if (node.getParent()!=null)
					adjustTreeValue(node.getParent(), node, win);
				return;
			}
			
			_mcAdministration.play(bestNode.getContent().getMove());
			
			if (_mcAdministration.getNrPasses()<2)
				expandBestNode(bestNode, opposite(moveColor));
			else
			{
				boolean win = (_mcAdministration.getWinner()==BLACK);
				bestNode.getContent().increasePlayouts(win);
				assert (bestNode.getParent()==node); // node should be parent of  
bestNode
				adjustTreeValue(node, bestNode, win);
			}
		}
	}

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://computer-go.org/pipermail/computer-go/attachments/20080205/b6bb9bc4/attachment.htm


More information about the computer-go mailing list