The 2015 data from the Warmachine World Team Championships was recently made available on the WTC blog. I've done a little cleanup of the raw file and added the resulting dataset to WTCTools.
> library(WTCTools)
> head(wtc, n = 3)
  game_id round TP year victory_condition        scenario
1       1     1  1 2013       Caster Kill Into the Breach
2       2     1  0 2013       Caster Kill Into the Breach
3       3     1  1 2013       Caster Kill Into the Breach
             player1            team1  list1
1    Pär-Ola Nilsson Team Epic Sweden  Vlad2
2 Robert Willemstein Team Netherlands Feora2
3      Christian Aas Team Epic Sweden  Vayl2
                faction1 CP1 AP1     player2
1                 Khador   0  35  Aat Niehot
2 Protectorate of Menoth   0   5 Joakim Rapp
3   Legion of Everblight   0  56 Tom Starren
             team2    list2    faction2 CP2 AP2
1 Team Netherlands    Borka  Trollblood   2   2
2 Team Epic Sweden  Bartolo Mercenaries   4   4
3 Team Netherlands Butcher2      Khador   1   1
Since I want to provide the best possible description of game outcome, let's first baseline predictability against the original statistical measure of competitive game rankings: the Elo rating. The elo function from the PlayerRatings package calculates Elo ratings in R.
> library(PlayerRatings)
> wtc$allrounds <- wtc$round + (wtc$year - 2013) * 6
> ratingElo <- elo(
+     x = wtc[, c("allrounds", "player1", "player2", "TP")])
> players2015 <- unique(
+     unlist(
+         wtc[wtc$year == 2015, c("player1", "player2")]))
> head(ratingElo$ratings[
+     ratingElo$ratings$Player %in% players2015,
+     c("Player", "Rating", "Games", "Win", "Loss")], n = 16)
The default initial rating for this function is 2200. The Elo calculation has a tuning parameter, K, which determines how quickly players' ratings change in response to results; elo uses a default K of 27. Increasing K could allow players who had not played in previous years to move up the table faster.
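As a minimal sketch of that tuning, K is set through the kfac argument of elo (the value of 40 below is an arbitrary illustration, not a tuned choice):

> # re-fit with a larger K factor so ratings respond faster to results;
> # kfac = 40 is an arbitrary illustrative value
> ratingEloFast <- elo(
+     x = wtc[, c("allrounds", "player1", "player2", "TP")],
+     kfac = 40)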
The difference in rating between the two players is used to estimate the probability of player 1 winning. For example, Jaakko Uusitupa is estimated to have a 55% chance of beating Brian White, and an 84% chance of beating the lowest-rated player.
> predict(object = ratingElo,
+     newdata = data.frame(19, "Jaakko Uusitupa", "Brian White"),
+     gamma = 0)
[1] 0.5454877
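Behind this prediction is the standard Elo expected-score formula, 1 / (1 + 10^(-(r1 - r2) / 400)), where gamma is PlayerRatings' player-advantage adjustment (zero for us). A quick sketch with placeholder ratings; the 32-point gap below is invented to land near the 55% figure above, not taken from the actual table:

> # standard Elo win probability from a rating difference
> eloWinProb <- function(r1, r2) 1 / (1 + 10^(-(r1 - r2) / 400))
> eloWinProb(2232, 2200)  # placeholder ratings with a 32-point gap
[1] 0.545922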
We can test how predictive the Elo ratings are by excluding the final round (allrounds 18) when fitting, then checking the predictions against that round's results.
> # exclude final round
> ratingEloTest <- elo(
+     x = wtc[wtc$allrounds != 18,
+         c("allrounds", "player1", "player2", "TP")])
> # predict probability of player 1 winning
> wtc$pplayer1 <- predict(object = ratingEloTest,
+     newdata = wtc[, c("allrounds", "player1", "player2")],
+     tng = 0, gamma = 0)
> # plot calibration
> lastRound <- wtc[wtc$allrounds == 18, ]
> # split into even sized groups
> pBins <- quantile(
+     x = lastRound[["pplayer1"]],
+     probs = seq(from = 0, to = 1, by = 0.1),
+     na.rm = TRUE)
> lastRound$bins <- cut(lastRound$pplayer1, breaks = pBins)
> # summarize actual results
> rQuantiles <- tapply(
+     X = lastRound[["TP"]],
+     INDEX = lastRound$bins,
+     FUN = mean)
> # compare with predictions
> pQuantiles <- tapply(
+     X = lastRound[["pplayer1"]],
+     INDEX = lastRound$bins,
+     FUN = mean)
If we are predicting perfectly, the mean result in each bin should fall on the identity line against the mean prediction (i.e. of all games where player 1 is given a 40% chance of victory, we expect 40% to be a victory for player 1).
> library(ggplot2)
> library(ggthemes)
> ggplot() +
+     geom_point(data = data.frame(
+         Predicted_Quantile = pQuantiles,
+         Result_Quantile = rQuantiles),
+         aes(x = Result_Quantile, y = Predicted_Quantile)) +
+     geom_abline() +
+     theme_economist()
The Elo rating is not giving us a very wide range of probabilities: it overestimates player 1's chances in the low-probability games and underestimates them in the high-probability games. Nevertheless, the Elo rating does provide some estimate of the match outcome. This is the benchmark we need to beat to improve our predictions of match outcomes.
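One way to put a single number on that benchmark (my own addition; one of several reasonable summaries) is a Brier score on the held-out round: the mean squared gap between the predicted probability and the 0/1 result, where lower is better and always predicting 50% scores 0.25.

> # Brier score for the final-round predictions;
> # TP is the 0/1 win indicator for player 1
> mean((lastRound$pplayer1 - lastRound$TP)^2, na.rm = TRUE)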