World Team Championships 2014, Part 1

Everyone who has seen the Facebook movie The Social Network has been exposed to the Elo chess rating system. The algorithm was devised to estimate player skill from the results of games. The great thing about the rating system is that it can estimate the likelihood of either player winning even when the two players have never met before.

There are many implementations and extensions of Elo's system. Happily, the R package PlayerRatings implements Elo along with a number of more modern rating systems, including the Stephenson method, which weights more recent results more heavily than older ones.
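
To give a flavour of the idea, the heart of Elo is an expected-score formula: on the conventional chess scale, a player rated 400 points above their opponent is expected to score about 91%. Here is a minimal sketch in base R (the function name eloExpected is just mine for illustration, not part of any package):

> # expected score for a player rated rA against an opponent rated rB,
> # using the conventional 400-point Elo scaling
> eloExpected <- function(rA, rB) { 1 / (1 + 10^((rB - rA) / 400)) }
> eloExpected(2000, 1600)
[1] 0.9090909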

The World Team Championship was held October 4-5, 2014 in Szczyrk, Poland. I grabbed the table called wtc_results from their SQL interface and saved it as a CSV file. This is a cool dataset for a number of reasons. It is clean, ordered and ready to go. It has a partner dataset with detailed list breakdowns. It is based on the game of Warmachine, which means that each game carries an effective handicap depending on how the two players' army list selections match up. And it has a slightly unusual pairing system, whereby the teams can manipulate their matchups.

> wtc <- read.csv("wmh-wtc_results2014.csv",
+     stringsAsFactors = FALSE)
> dim(wtc)
[1] 1560   11
> head(wtc)
game_id round           player         player_team     list
1       1     1       Petr Cermak Team Czech Republic   Saeryn
2       1     1 Anthony Ferraiolo     Team USA Stars Deneghra2
3       2     1   Josef Skládanka Team Czech Republic     Siege
4       2     1       Brian White     Team USA Stars Reclaimer
5       3     1 Daniel Bogdanoski Team Czech Republic     Irusk
6       3     1       Will Pagani     Team USA Stars   Kromac
              faction     opponent_name TP CP AP victory_condition
1   Legion of Everblight Anthony Ferraiolo 0 0 0         Scenario
2                   Cryx       Petr Cermak 1 5 9         Scenario
3                 Cygnar       Brian White 0 2 16         Scenario
4 Protectorate of Menoth   Josef Skládanka 1 5 40         Scenario
5                 Khador       Will Pagani 0 0 20       Caster Kill
6         Circle Orboros Daniel Bogdanoski 1 2 50       Caster Kill

The first thing I like to do with a data file like this is just check for any miscoding. Some simple summaries can quickly give us an idea about how the data are structured.

> table(wtc$round)
 
  1   2   3   4   5   6
260 260 260 260 260 260
> table(wtc$faction) / 6
 
        Circle Orboros  Convergence of Cyriss                   Cryx
              34.00000                6.00000               44.33333
                Cygnar                 Khador   Legion of Everblight
              29.00000               30.00000               28.83333
           Mercenaries                Minions Protectorate of Menoth
               6.00000                5.00000               28.83333
 Retribution of Scyrah                 Skorne             Trollblood
              14.00000               18.00000               16.00000

Okay. Each player should appear six times under a single faction, so dividing the counts by six should give whole numbers, and we can instantly see that there are mistakes in the table. We can track down the incorrect records by looking for where this pattern is not followed.

> err <- table(paste(wtc$player, wtc$faction, sep = "="))
> err[which(err != 6)]
 
Florian Hartmann=Cryx
1
Florian Hartmann=Legion of Everblight
5
János Kendi=Cryx
1
János Kendi=Protectorate of Menoth
5

This can then easily be fixed.

> wtc[wtc$game_id == 777 & wtc$player == "Florian Hartmann",
+     c("faction", "list")] <- c("Legion of Everblight", NA)
>
> wtc[wtc$game_id == 131 & wtc$player == "János Kendi",
+     c("faction", "list")] <- c("Protectorate of Menoth", NA)
>
> table(wtc$faction) / 6

        Circle Orboros  Convergence of Cyriss
                    34                      6
                  Cryx                 Cygnar
                    44                     29
                Khador   Legion of Everblight
                    30                     29
           Mercenaries                Minions
                     6                      5
Protectorate of Menoth  Retribution of Scyrah
                    29                     14
                Skorne             Trollblood
                    18                     16

If we want to make any predictions about game outcomes, we can use the rating tools provided in PlayerRatings. When performing any kind of predictive analytics we need to separate our data into a training set and a test set. If you don't do this, you just have a description of what happened, with nothing that generalizes. For example, if I have the sequence 1, 2, 4 and build an algorithm that can tell me the third value is 4, you might not be impressed. But if I only see 1, 2 and guess that the third value is 3, then actually I'm not too far off. So let's try to predict the final round of the WTC.

> wtctrain <- wtc[wtc$round < 6, ]
> wtctest <- wtc[wtc$round == 6, ]

So let’s use the steph function to create some player ratings!

> require(PlayerRatings)
> # ratings based on pairings only
> rating <- steph(wtctrain[,
+     c("round", "player", "opponent_name", "TP")])

Once we have a rating object, we can use it to make some predictions about our test data.

> wtctest$pstephplayer <- predict(rating,
+    wtctest[, c("round", "player", "opponent_name")],
+    tng = 0, gamma = 0)
> with(wtctest, wtctest[player_team %in%
+     c("Team USA Stars", "Team Poland Reckless"),
+     c("player", "player_team", "pstephplayer", "TP")])
player         player_team pstephplayer TP
1471 Michal Nakonieczny Team Poland Reckless   0.28622347 0
1472 Anthony Ferraiolo       Team USA Stars   0.71377653 1
1473       Tomek Tutaj Team Poland Reckless   0.51302043 0
1474     Jake VanMeter       Team USA Stars   0.48697957 1
1475   Michal Konieczny Team Poland Reckless   0.06414879 0
1476       Brian White       Team USA Stars   0.93585121 1
1477 Andrzej Kasiewicz Team Poland Reckless   0.66126905 0
1478       Will Pagani       Team USA Stars   0.33873095 1
1479       Marcin Mycek Team Poland Reckless   0.63254214 0
1480     Ryan Chiriboga       Team USA Stars   0.36745786 1

At the event, Team USA Stars won the final round with a clean sweep of 5 games out of 5, although to take the round they only needed to win 3 of the 5. Now there is surely an analytically exact solution to this, but we can estimate the probability of a USA round win from the player ratings by simulation.

> probs <- wtctest$pstephplayer[
+    wtctest$player_team == "Team USA Stars"]
> sim <- matrix(NA, nrow = 1e6, ncol = 5)
> for (i in seq_along(probs)) {
+     sim[, i] <- rbinom(1e6, size = 1, prob = probs[i]) }
> head(sim)
[,1] [,2] [,3] [,4] [,5]
[1,]   1   0   1   1   0
[2,]   1   0   1   1   0
[3,]   0   0   1   1   0
[4,]   1   1   1   0   0
[5,]   0   1   1   0   1
[6,]   1   0   1   0   0
> pUSA <- sum(apply(sim, MARGIN = 1,
+     FUN = function(x) { sum(x) > 2 })) / 1e6
> pUSA
[1] 0.634555

So, based on the ratings from the first five rounds, and given that we know which players meet in the final round, we predict that Team USA Stars had about a 63% chance of winning the round.
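
As an aside, the exact figure does not actually need simulation: with only five games there are 2^5 = 32 possible win/loss patterns, so we can enumerate them and sum the probabilities of the patterns with at least three USA wins. A quick sketch reusing the probs vector from above (output not shown):

> # enumerate all 2^5 win/loss patterns for the five USA players
> outcomes <- expand.grid(rep(list(0:1), length(probs)))
> # probability of each pattern, treating the games as independent
> pattern_prob <- apply(outcomes, 1, function(x) {
+     prod(ifelse(x == 1, probs, 1 - probs)) })
> # exact probability of at least three wins
> pUSAexact <- sum(pattern_prob[rowSums(outcomes) > 2])

This should agree with the simulated pUSA to within Monte Carlo error.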

We can also assess the ratings to see how predictive they actually are.

> # the rating object has an element ratings
> rat <- rating$ratings
> # we can merge the table of ratings with the training data
> names(rat) <- casefold(names(rat), upper = FALSE)
> rat <- rat[, c("player", "rating", "deviation")]
> colnames(rat) <- c("player", "steph_player_rating",
+    "steph_player_deviation")
> # ratings also for opponent
> wtctrain <- merge(wtctrain, rat)
> colnames(rat) <- c("opponent_name", "steph_opponent_rating",
+  "steph_opponent_deviation")
> wtctrain <- merge(wtctrain, rat)
> # predict estimates the probability of the player winning
> # for each game these probabilities sum to 1
> # tng sets a minimum-games threshold below which a player's own
> # rating is not trusted (0 here, so every rating is used)
> # gamma allows for a first-player (home) advantage; 0 means none
> wtctrain$pstephplayer <- predict(rating,
+    wtctrain[, c("round", "player", "opponent_name")],
+    tng = 0, gamma = 0)

How good is this rating at guessing outcomes? Of course a single game could go either way, but when we look at lots of games at once, we can see whether the predicted probabilities are a good match for the observed outcomes. This is called model calibration (props to Tim Paulden, who gave a great explanation of this at the EARL conference).

> # group records by equally sized bins of probability of
> # player winning
> brk <- with(wtctrain, c(0, quantile(pstephplayer,
+             probs = seq_len(10) / 10), 1))
> grps <- with(wtctrain, cut(pstephplayer, breaks = brk))
> # predicted success by Stephenson method
> success <- with(wtctrain, tapply(TP, grps,
+           function(x) { sum(x) / length(x) }))
> success
    (0,0.106] (0.106,0.197] (0.197,0.283] (0.283,0.378]
   0.01538462    0.03846154    0.10769231    0.10769231
  (0.378,0.5]   (0.5,0.622] (0.622,0.717] (0.717,0.803]
   0.39230769    0.60769231    0.89230769    0.89230769
(0.803,0.894] (0.894,0.995]     (0.995,1]
   0.96153846    0.98461538            NA
> plot(diff(brk) + brk[-length(brk)], success,
+     xlab = "predicted success", asp = 1)
> abline(coef = c(0, 1), col = 2)

[Plot: observed success rate against predicted success for the Stephenson player ratings (success_player_steph_calibration)]

If this rating were good at predicting success, then all of these points would sit on or near the red line… so clearly we need to do some work here.
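
The calibration plot is a visual check; if we also want a single number to keep an eye on while we do that work, a handy summary is the Brier score: the mean squared difference between the predicted probability and the 0/1 result, where lower is better and 0.25 is what you would get by always predicting 0.5. A one-line sketch against the training data:

> # Brier score for the training-set predictions: mean squared
> # difference between the predicted win probability and the actual TP
> brier <- with(wtctrain, mean((pstephplayer - TP)^2))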
