Computer Gaming

Ranking and Matchmaking

The TrueSkill ranking system is a skill-based ranking system for Xbox Live developed at Microsoft Research. The purpose of a ranking system is to both identify and track the skills of gamers in a game (mode) in order to match them into competitive matches. The TrueSkill ranking system uses only the final standings of all teams in a game to update the skill estimates (ranks) of all gamers playing in that game. Ranking systems have been proposed for many sports, but possibly the most prominent ranking system in use today is Elo. What makes TrueSkill unique is that it takes a fully Bayesian treatment of the problem, using approximate message passing (namely, Expectation Propagation) as the inference scheme. With this approach it is not only possible to fully quantify the uncertainty in every gamer's skill at each point in time, but also to link skills together through time and compare the relative strength of players across centuries.
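To make the Bayesian update concrete, here is a minimal sketch of the simplest case treated in the TrueSkill paper: two players, no draw. Each skill is a Gaussian N(mu, sigma²); after a game the winner's mean moves up and the loser's down by an amount governed by the correction functions v and w, and both uncertainties shrink. The defaults mu = 25, sigma = 25/3, beta = 25/6 follow the paper's conventions; this is an illustrative sketch, not the production Xbox Live system.

```python
import math

def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trueskill_update(winner, loser, beta=25.0 / 6.0):
    """One TrueSkill update for a two-player game without draws.

    `winner` and `loser` are (mu, sigma) pairs; `beta` is the
    performance noise. Returns the updated (mu, sigma) pairs.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Total variance of the performance difference.
    c = math.sqrt(2.0 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # additive mean-correction factor
    w = v * (v + t)         # multiplicative variance-correction factor
    new_mu_w = mu_w + sigma_w**2 / c * v
    new_mu_l = mu_l - sigma_l**2 / c * v
    new_sigma_w = sigma_w * math.sqrt(max(1.0 - sigma_w**2 / c**2 * w, 0.0))
    new_sigma_l = sigma_l * math.sqrt(max(1.0 - sigma_l**2 / c**2 * w, 0.0))
    return (new_mu_w, new_sigma_w), (new_mu_l, new_sigma_l)
```

For two previously unseen players (both at mu = 25, sigma = 25/3), a single game moves the winner's mean above 25 and the loser's below it by the same amount, while both sigmas decrease; an upset win against a stronger opponent produces a larger shift, since v grows as t becomes more negative.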

  • P. Dangauthier, R. Herbrich, T. Minka, and T. Graepel, "TrueSkill Through Time: Revisiting the History of Chess," in Advances in Neural Information Processing Systems 20, 2007, p. 931–938.
    [BibTeX] [Abstract] [Download PDF]

    We extend the Bayesian skill rating system TrueSkill to infer entire time series of skills of players by smoothing through time instead of filtering. The skill of each participating player, say, every year is represented by a latent skill variable which is affected by the relevant game outcomes that year, and coupled with the skill variables of the previous and subsequent year. Inference in the resulting factor graph is carried out by approximate message passing (EP) along the time series of skills. As before the system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. We extend the system to estimate player-specific draw margins. Based on these models we present an analysis of the skill curves of important players in the history of chess over the past 150 years. Results include plots of players' lifetime skill development as well as the ability to compare the skills of different players across time. Our results indicate that a) the overall playing strength has increased over the past 150 years, and b) that modelling a player's ability to force a draw provides significantly better predictive power.

    @inproceedings{dangauthier2007ttt,
    abstract = {We extend the Bayesian skill rating system TrueSkill to infer entire time series of skills of players by smoothing through time instead of filtering. The skill of each participating player, say, every year is represented by a latent skill variable which is affected by the relevant game outcomes that year, and coupled with the skill variables of the previous and subsequent year. Inference in the resulting factor graph is carried out by approximate message passing (EP) along the time series of skills. As before the system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. We extend the system to estimate player-specific draw margins. Based on these models we present an analysis of the skill curves of important players in the history of chess over the past 150 years. Results include plots of players' lifetime skill development as well as the ability to compare the skills of different players across time. Our results indicate that a) the overall playing strength has increased over the past 150 years, and b) that modelling a player's ability to force a draw provides significantly better predictive power.},
    author = {Dangauthier, Pierre and Herbrich, Ralf and Minka, Tom and Graepel, Thore},
    booktitle = {Advances in Neural Information Processing Systems 20},
    pages = {931--938},
    publisher = {The MIT Press},
    title = {TrueSkill Through Time: Revisiting the History of Chess},
    url = {https://www.herbrich.me/papers/ttt.pdf},
    year = {2007}
    }

  • R. Herbrich, T. Minka, and T. Graepel, "TrueSkill(TM): A Bayesian Skill Rating System," in Advances in Neural Information Processing Systems 19, 2006, p. 569–576.
    [BibTeX] [Abstract] [Download PDF]

    We present a new Bayesian skill rating system which can be viewed as a generalisation of the Elo system used in Chess. The new system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. Inference is performed by approximate message passing on a factor graph representation of the model. We present experimental evidence on the increased accuracy and convergence speed of the system compared to Elo and report on our experience with the new rating system running in a large-scale commercial online gaming service under the name of TrueSkill.

    @inproceedings{herbrich2006trueskill,
    abstract = {We present a new Bayesian skill rating system which can be viewed as a generalisation of the Elo system used in Chess. The new system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. Inference is performed by approximate message passing on a factor graph representation of the model. We present experimental evidence on the increased accuracy and convergence speed of the system compared to Elo and report on our experience with the new rating system running in a large-scale commercial online gaming service under the name of TrueSkill.},
    author = {Herbrich, Ralf and Minka, Tom and Graepel, Thore},
    booktitle = {Advances in Neural Information Processing Systems 19},
    pages = {569--576},
    publisher = {The MIT Press},
    title = {{TrueSkill(TM)}: A {Bayesian} Skill Rating System},
    url = {https://www.herbrich.me/papers/trueskill.pdf},
    year = {2006}
    }

  • T. Graepel and R. Herbrich, "Ranking and Matchmaking," Game Developer Magazine, iss. 10, 2006.
    [BibTeX] [Abstract] [Download PDF]

    Online game modes are gradually becoming the standard for console games, rather than an added bonus. But how do you get the right players playing against the right people, close enough in skill to be challenging, but not so difficult as to be frustrating?

    @article{graepel2006ranking,
    abstract = {Online game modes are gradually becoming the standard for console games, rather than an added bonus. But how do you get the right players playing against the right people, close enough in skill to be challenging, but not so difficult as to be frustrating?},
    author = {Graepel, Thore and Herbrich, Ralf},
    journal = {Game Developer Magazine},
    number = {10},
    title = {Ranking and Matchmaking},
    url = {https://www.herbrich.me/papers/Game-Developer-Feature-Article-Graepel-Herbrich.pdf},
    year = {2006}
    }

Computer Go

The game of Go is an ancient Chinese game of strategy for two players. By most measures of complexity it is more complex than Chess. While Deep Blue (and more recently Deep Fritz) played Chess at the world champion's level, only recently has DeepMind demonstrated, with a system called AlphaGo, that a Go-playing program can beat the best human players. The reason it took over 20 years to replicate the success of Deep Blue for the game of Go appears to lie in Go's greater complexity, both in terms of the number of different positions and in the difficulty of defining an appropriate evaluation function for Go positions.

We have worked on automated ways of acquiring Go knowledge using machine learning to make radical progress on these two problems. Numerous Go servers on the internet offer thousands of game records of Go played by players who are very competent compared to today's computer Go programs. The great challenge is to build machine learning algorithms that extract knowledge from these databases so that it can be used to play Go well.

  • D. Stern, R. Herbrich, and T. Graepel, "Learning to Solve Game Trees," in Proceedings of the 24th International Conference on Machine Learning, 2007, p. 839–846.
    [BibTeX] [Abstract] [Download PDF]

    We apply probability theory to the task of proving whether a goal can be achieved by a player in an adversarial game. Such problems are solved by searching the game tree. We view this tree as a graphical model which yields a distribution over the (Boolean) outcome of the search before it terminates. Experiments show that a best-first search algorithm guided by this distribution explores a similar number of nodes as Proof-Number Search to solve Go problems. Knowledge is incorporated into search by using domain-specific models to provide prior distributions over the values of leaf nodes of the game tree. These are surrogate for the unexplored parts of the tree. The parameters of these models can be learned from previous search trees. Experiments on Go show that the speed of problem solving can be increased by orders of magnitude by this technique but care must be taken to avoid over-fitting.

    @inproceedings{stern2007gametrees,
    abstract = {We apply probability theory to the task of proving whether a goal can be achieved by a player in an adversarial game. Such problems are solved by searching the game tree. We view this tree as a graphical model which yields a distribution over the (Boolean) outcome of the search before it terminates. Experiments show that a best-first search algorithm guided by this distribution explores a similar number of nodes as Proof-Number Search to solve Go problems. Knowledge is incorporated into search by using domain-specific models to provide prior distributions over the values of leaf nodes of the game tree. These are surrogate for the unexplored parts of the tree. The parameters of these models can be learned from previous search trees. Experiments on Go show that the speed of problem solving can be increased by orders of magnitude by this technique but care must be taken to avoid over-fitting.},
    author = {Stern, David and Herbrich, Ralf and Graepel, Thore},
    booktitle = {Proceedings of the 24th International Conference on Machine Learning},
    pages = {839--846},
    title = {Learning to Solve Game Trees},
    url = {https://www.herbrich.me/papers/p839-stern.pdf},
    year = {2007}
    }

  • D. Stern, R. Herbrich, and T. Graepel, "Bayesian Pattern Ranking for Move Prediction in the Game of Go," in Proceedings of the 23rd International Conference on Machine Learning, 2006, p. 873–880.
    [BibTeX] [Abstract] [Download PDF]

    We investigate the problem of learning to predict moves in the board game of Go from game records of expert players. In particular, we obtain a probability distribution over legal moves for professional play in a given position. This distribution has numerous applications in computer Go, including serving as an efficient stand-alone Go player. It would also be effective as a move selector and move sorter for game tree search and as a training tool for Go players. Our method has two major components: a) a pattern extraction scheme for efficiently harvesting patterns of given size and shape from expert game records and b) a Bayesian learning algorithm (in two variants) that learns a distribution over the values of a move given a board position based on the local pattern context. The system is trained on 181,000 expert games and shows excellent prediction performance as indicated by its ability to perfectly predict the moves made by professional Go players in 34\% of test positions.

    @inproceedings{stern2006bayesianpatternranking,
    abstract = {We investigate the problem of learning to predict moves in the board game of Go from game records of expert players. In particular, we obtain a probability distribution over legal moves for professional play in a given position. This distribution has numerous applications in computer Go, including serving as an efficient stand-alone Go player. It would also be effective as a move selector and move sorter for game tree search and as a training tool for Go players. Our method has two major components: a) a pattern extraction scheme for efficiently harvesting patterns of given size and shape from expert game records and b) a Bayesian learning algorithm (in two variants) that learns a distribution over the values of a move given a board position based on the local pattern context. The system is trained on 181,000 expert games and shows excellent prediction performance as indicated by its ability to perfectly predict the moves made by professional Go players in 34\% of test positions.},
    author = {Stern, David and Herbrich, Ralf and Graepel, Thore},
    booktitle = {Proceedings of the 23rd International Conference on Machine Learning},
    pages = {873--880},
    title = {{Bayesian} Pattern Ranking for Move Prediction in the Game of Go},
    url = {https://www.herbrich.me/papers/p873-stern.pdf},
    year = {2006}
    }

  • T. Graepel, M. Goutrié, M. Krüger, and R. Herbrich, "Learning on Graphs in the Game of Go," in Proceedings of the International Conference on Artificial Neural Networks, 2001, p. 347–352.
    [BibTeX] [Abstract] [Download PDF]

    We consider the game of Go from the point of view of machine learning and as a well-defined domain for learning on graph representations. We discuss the representation of both board positions and candidate moves and introduce the common fate graph (CFG) as an adequate representation of board positions for learning. Single candidate moves are represented as feature vectors with features given by subgraphs relative to the given move in the CFG. Using this representation we train a support vector machine (SVM) and a kernel perceptron to discriminate good moves from bad moves on a collection of life-and-death problems and on 9×9 game records. We thus obtain kernel machines that solve Go problems and play 9×9 Go.

    @inproceedings{graepel2001go,
    abstract = {We consider the game of Go from the point of view of machine learning and as a well-defined domain for learning on graph representations. We discuss the representation of both board positions and candidate moves and introduce the common fate graph (CFG) as an adequate representation of board positions for learning. Single candidate moves are represented as feature vectors with features given by subgraphs relative to the given move in the CFG. Using this representation we train a support vector machine (SVM) and a kernel perceptron to discriminate good moves from bad moves on a collection of life-and-death problems and on 9x9 game records. We thus obtain kernel machines that solve Go problems and play 9x9 Go.},
    author = {Graepel, Thore and Goutri\'{e}, Mike and Kr\"{u}ger, Marco and Herbrich, Ralf},
    booktitle = {Proceedings of the International Conference on Artificial Neural Networks},
    pages = {347--352},
    title = {Learning on Graphs in the Game of Go},
    url = {https://www.herbrich.me/papers/go.pdf},
    year = {2001}
    }

Learning to Fight

We apply reinforcement learning to the problem of finding good policies for a fighting agent in a commercial computer game. The learning agent is trained using the SARSA algorithm for on-policy learning of an action-value function represented by linear and neural network function approximators. Important aspects include the selection and construction of features, actions, and rewards, as well as other design choices necessary to integrate the learning process into the game. The learning agent is trained against the built-in AI of the game with different rewards encouraging aggressive or defensive behaviour.
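The core of the approach, SARSA with a linear action-value function, can be sketched as follows. This is a generic illustration rather than the game integration described in the paper: `step`, `reset` and the feature map `phi` stand in for the game environment and the hand-constructed features, and all names here are our own.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def sarsa_episode(step, reset, phi, weights, n_actions,
                  alpha=0.1, gamma=0.95, epsilon=0.1, rng=random):
    """Run one episode of on-policy SARSA with Q(s, a) = weights[a] . phi(s).

    `step(s, a)` returns (next_state, reward, done); `reset()` returns the
    start state. `weights` (one vector per action) is updated in place.
    """
    s = reset()
    x = phi(s)
    q = [sum(w * f for w, f in zip(weights[a], x)) for a in range(n_actions)]
    a = epsilon_greedy(q, epsilon, rng)
    done = False
    while not done:
        s2, r, done = step(s, a)
        x2 = phi(s2)
        q2 = [sum(w * f for w, f in zip(weights[b], x2)) for b in range(n_actions)]
        a2 = epsilon_greedy(q2, epsilon, rng)
        # Temporal-difference error: delta = r + gamma * Q(s', a') - Q(s, a).
        target = r if done else r + gamma * q2[a2]
        q_sa = sum(w * f for w, f in zip(weights[a], x))
        delta = target - q_sa
        # Gradient step on the linear approximator for the taken action.
        for i in range(len(x)):
            weights[a][i] += alpha * delta * x[i]
        s, x, a = s2, x2, a2
```

Because the action chosen by the epsilon-greedy behaviour policy is the one used in the update target, this is on-policy learning; the reward function passed through `step` is where aggressive or defensive behaviour would be encouraged.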

  • T. Graepel, R. Herbrich, and J. Gold, "Learning to Fight," in Proceedings of the International Conference on Computer Games: Artificial Intelligence, Design and Education, 2004, p. 193–200.
    [BibTeX] [Abstract] [Download PDF]

    We apply reinforcement learning to the problem of finding good policies for a fighting agent in a commercial computer game. The learning agent is trained using the SARSA algorithm for on-policy learning of an action-value function represented by linear and neural network function approximators. We discuss the selection and construction of features, actions, and rewards as well as other design choices necessary to integrate the learning process into the game. The learning agent is trained against the built-in AI of the game with different rewards encouraging aggressive or defensive behaviour. We show that the learning agent finds interesting (and partly near optimal) policies in accordance with the reward functions provided. We also discuss the particular challenges arising in the application of reinforcement learning to the domain of computer games.

    @inproceedings{graepel2004learningtofight,
    abstract = {We apply reinforcement learning to the problem of finding good policies for a fighting agent in a commercial computer game. The learning agent is trained using the SARSA algorithm for on-policy learning of an action-value function represented by linear and neural network function approximators. We discuss the selection and construction of features, actions, and rewards as well as other design choices necessary to integrate the learning process into the game. The learning agent is trained against the built-in AI of the game with different rewards encouraging aggressive or defensive behaviour. We show that the learning agent finds interesting (and partly near optimal) policies in accordance with the reward functions provided. We also discuss the particular challenges arising in the application of reinforcement learning to the domain of computer games.},
    author = {Graepel, Thore and Herbrich, Ralf and Gold, Julian},
    booktitle = {Proceedings of the International Conference on Computer Games: Artificial Intelligence, Design and Education},
    pages = {193--200},
    title = {Learning to Fight},
    url = {https://www.herbrich.me/papers/graehergol04.pdf},
    year = {2004}
    }

Learning to Race

Drivatars are a novel form of learning artificial intelligence (AI) developed for Forza Motorsport. The technology behind the Drivatar concept is exploited within the game in two ways:

  • as an innovative new learning game feature: create your own AI driver!
  • as the underlying model for all the AI competitors in the Arcade and Career modes.

Our original goal in developing Drivatars was to create human-like computer opponents to race against. For more details please visit the Drivatar project page.

Social Relationships in First-Person Shooter Games

Online video games can be seen as a medium for the formation and maintenance of social relationships. We explore what social relationships mean in the context of online First-Person Shooter (FPS) games, how these relationships influence game experience, and how players manage them. We combine qualitative interview and quantitative game log data, and find that despite the gap between the non-persistent game world and potentially persistent social relationships, a diversity of social relationships emerges and plays a central role in the enjoyment of online FPS games. We report the forms, development, and impact of such relationships, and discuss our findings in light of design implications and comparison with other game genres.

  • Y. Xu, X. Cao, A. Sellen, R. Herbrich, and T. Graepel, "Sociable Killers: Understanding Social Relationships in an Online First-Person Shooter Game," in Proceedings of the 2011 ACM Conference on Computer Supported Cooperative Work, 2011, p. 197–206.
    [BibTeX] [Abstract] [Download PDF]

    Online video games can be seen as medium for the formation and maintenance of social relationships. In this paper, we explore what social relationships mean under the context of online First-Person Shooter (FPS) games, how these relationships influence game experience, and how players manage them. We combine qualitative interview and quantitative game log data, and find that despite the gap between the non-persistent game world and potentially persistent social relationships, a diversity of social relationships emerge and play a central role in the enjoyment of online FPS games. We report the forms, development, and impact of such relationships, and discuss our findings in light of design implications and comparison with other game genres.

    @inproceedings{xu2011sociablekillers,
    abstract = {Online video games can be seen as medium for the formation and maintenance of social relationships. In this paper, we explore what social relationships mean under the context of online First-Person Shooter (FPS) games, how these relationships influence game experience, and how players manage them. We combine qualitative interview and quantitative game log data, and find that despite the gap between the non-persistent game world and potentially persistent social relationships, a diversity of social relationships emerge and play a central role in the enjoyment of online FPS games. We report the forms, development, and impact of such relationships, and discuss our findings in light of design implications and comparison with other game genres.},
    author = {Xu, Yan and Cao, Xiang and Sellen, Abigail and Herbrich, Ralf and Graepel, Thore},
    booktitle = {Proceedings of the 2011 ACM Conference on Computer Supported Cooperative Work},
    pages = {197--206},
    title = {Sociable Killers: Understanding Social Relationships in an Online First-Person Shooter Game},
    url = {https://www.herbrich.me/papers/CSCW11.pdf},
    year = {2011}
    }