World Team Championships 2015 Part 1

We are now just a few days away from the World Team Championships 2015. After a small hiatus, I finally completed an analysis of the 2014 results that attempted to account for list pairings.

Since the lists were posted a few weeks ago, we can run some simulations to guess which teams will do well. This will, of course, require assumptions.

Player ratings are close to steady state. There were only 6 rounds of games at last year's WTC. Player ratings are affected by the result of each match-up, so after so few rounds they are likely not close to their "true" values. Not all of this year's players played in 2014: of 250 players, 117 have a rating score (assuming I registered player names between the two datasets correctly!). All new players have been given a starting rating of 2200. Player skill can also wax and wane with cumulative experience, experience in the match-ups that actually occur, and recent practice. For this analysis we will assume that all players have been accurately rated at the start of the tournament.

Caster lists are identical. Generating the pairing lookup data from the training dataset is already scraping the bottom of the barrel and introduces biases; these values could probably be tweaked downwards to reduce the bias introduced. Attempting to subcategorize lists based on the units they contain would exacerbate this problem. A possible approach might be to group lists by their components (for example, armies with many heavies, or many units), but that is well beyond the scope of this article series. For this analysis we will assume that all lists led by the same caster are identical.

Last year's caster pairings are applicable to this tournament. Some players are taking casters that were not taken last year, and some are taking casters that had not even been released last year. Many match-ups between casters will occur that did not occur last year; all of these match-up penalties will be set to 0. A significant errata was released just after the lists were posted, meaning that the power level of certain key casters may have changed. The release of other new models, and the shift between the styles of list taken last year and those taken this year, also mean that last year's match-up penalties may be wholly inappropriate. Nevertheless, we will assume that the caster pairings are informative.

I grabbed the players and caster lists from www.discountgamesinc.com, cleaned up the results, and reshaped them into a table that I could register with the 2014 results. On the Warmachine podcast Chain Attack I heard Norbert describe how much cleaning goes into getting the lists into a usable format to be published in the first place, and I am extremely grateful for their efforts. I have added the results to WTCTools.

> library(WTCTools)
> data(wtc2015_players)
> head(wtc2015_players)
                Team               Faction        Player
1 Australia Platypus                  Cryx    Aaron Wale
2 Australia Platypus                  Cryx    Aaron Wale
3 Australia Platypus                Khador  Sheldon Pace
4 Australia Platypus                Khador  Sheldon Pace
5 Australia Platypus Retribution of Scyrah Dyland Simmer
6 Australia Platypus Retribution of Scyrah Dyland Simmer
         List       Objective NModels
1 Asphyxious2          Bunker      16
2   Deneghra2      Fuel Cache      16
3    Butcher3          Bunker      15
4   Vladimir2   Arcane Wonder      12
5    Kaelyssa      Fuel Cache      21
6        Rahn Effigy of Valor      16
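As a rough check of the name matching mentioned above (and assuming the wtc2014 results table from the previous analysis is already loaded), we can count how many of this year's players also appear in the 2014 results:

> # 2015 players who also appear in the 2014 results;
> # per the matching described above this should be 117
> sum(unique(wtc2015_players$Player) %in%
+     c(wtc2014$player1, wtc2014$player2))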

I calculated scorefrac as previously and created a caster pair lookup table for the 2014 and 2015 lists. The pair lookup table has 1086 pair ratings, which covers only 0.8% of the possible match-ups. However, this might represent a larger proportion of the match-ups that will actually occur, since popular casters will be in more games.
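As a quick sanity check (assuming pairLookup is a square matrix of pairing adjustments, with zeros for match-ups that were never observed), the coverage can be read straight off the matrix:

> # non-zero entries in the pairing matrix (rated match-ups);
> # note each pairing may be stored in both orientations
> sum(pairLookup != 0)
> # rough percentage of all possible caster pairings assessed
> round(100 * mean(pairLookup != 0), 1)

The popularity counts below give a sense of which casters those rated pairings will actually matter for.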

> castersPop <- sort(
+     table(wtc2015_players$List),
+     decreasing = TRUE)
> castersPop[1:6]
     Haley2    Deneghra      Skarre 
         26          21          21 
   Krueger2 Asphyxious2   Deneghra2 
         19          17          17 

Many of the common match-ups are represented. Without multiple games from which to get a good estimate of a pairing, scores are restricted based on the number of games played. The long flat sections in the plot below show where several match-ups were played only once; in these cases the result may be due to chance rather than a reflection of the pairing's strength.

> h2 <- pairLookup["Haley2", ]
> barplot(sort(h2[h2 != 0], decreasing = TRUE),
+     col = "lightblue")

[Figure: pairing scores for Haley2 (wtc_haley2)]

We can see that Haley2 has good match-ups against Feora2, Durgen and Kreoss3, but bad match-ups against Krueger2, Rasheth and Lylyth2.

Fewer players took Deneghra, so her best scores are restricted to slightly smaller values.

> d1 <- pairLookup["Deneghra", ]
> barplot(sort(d1[d1 != 0], decreasing = TRUE),
+     col = "lightgreen")

[Figure: pairing scores for Deneghra (wtc_deneghra)]

Deneghra has good match-ups against Asphyxious2 and Irusk, but poor match-ups against Haley, Haley2, Lucant and Old Witch.

As before, we calculated the player ratings using the steph function from the PlayerRatings package. The gamma 'home advantage' parameter is supplied from the 2014 pair ratings. The results are biased in favour of match-ups that the best players won, because the match-up pairings are correlated with the player ratings. But I believe that this effect is sufficiently important in Warmachine that I want to persevere with this approach for now.

> # ratings based on pairings with selected caster pairings
> rating2014 <- steph(x = wtc2014[, 
+      c("round", "player1", "player2", "TP")], 
+     gamma = getMatrixVal(
+      list1 = wtc2014[, "list1"], 
+      list2 = wtc2014[, "list2"], 
+      x = pairLookup))

Player ratings are adjusted each round based on wins or losses. The amount by which a rating is adjusted depends on the player's likelihood of winning that match. Highly rated players that beat lower rated players gain a small increment to their rating; highly rated players that beat other highly rated players gain a larger increment. Jake VanMeter is rated highest in this dataset since he won all of his games against tougher opponents. Since the ratings are not at steady state after just 6 rounds, the deviation is very high. This reflects the fact that we don't really know whether Jake is a better player than Brian or Colin, but he is likely a better player than those at the bottom of the table.

> head(rating2014$ratings)
             Player   Rating Deviation Games Win Draw Loss Lag
1     Jake VanMeter 2705.855  156.8029     6   6    0    0   0
2       Brian White 2668.346  159.2237     6   6    0    0   0
3        Colin Hill 2606.952  168.2910     6   6    0    0   0
4 Anthony Ferraiolo 2591.929  157.3978     6   5    0    1   0
5       Tomek Tutaj 2588.126  158.2652     6   5    0    1   0
6        Ben Leeper 2574.153  161.9183     6   6    0    0   0
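To get a feel for what these numbers imply, here is a purely illustrative calculation using the classic Elo expected-score formula, with the caster pairing adjustment plugged in as the gamma 'home advantage' term. The Stephenson system used by steph performs a more elaborate update that also involves the deviations, so treat this only as a rough approximation.

> # illustrative only: Elo-style expected score for player 1,
> # shifted by a caster pairing adjustment (gamma)
> pWin <- function(r1, r2, gamma = 0) {
+     1 / (1 + 10^(-(r1 + gamma - r2) / 400))
+ }
> # a 2700-rated player into a 2200-rated player, with a -50
> # pairing penalty for the caster match-up: roughly 0.93
> pWin(2700, 2200, gamma = -50)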

To allow this ratings object to be used with the 2015 players data, the missing players must be added. The rbind function appends rows to a table that has the same columns.

> players2014 <- unique(c(wtc2014$player1, wtc2014$player2))
> players2015 <- unique(wtc2015_players$Player)
> # update ratings
> rating2014$ratings <- rbind(rating2014$ratings, 
+     data.frame(Player = players2015[
+      !players2015 %in% players2014], 
+      Rating = 2200,
+      Deviation = 300, 
+      Games = 0, Win = 0, 
+      Draw = 0, Loss = 0, 
+      Lag = 0))

Once we have the ratings for the players, we can simulate some possible tournaments and summarize them to get an idea of how the match-ups might play out this year. The function doRound performs a simulation for one round of a fictional tournament: it generates pairings between teams, pairs players against each other, and generates an outcome based on the probability of each player winning as predicted by the ratings object. Until I can think of a better way to represent the team environment, this necessitates additional assumptions.

Player match-up by teams is indistinguishable from randomness. Each team will attempt to maximize their chance of winning a round using strategies of their own devising: leading or finishing the pairing process with stronger players, selecting players with particular answers in response to opponent lists, or attempting to get the best match-ups for a number of players. Certain teams may have a match-up skill that allows them to win the pairing process, but if teams are equally skilled at player match-up, the pairing process will be indistinguishable from random.

Players select their caster lists randomly. Each player will have an opinion about which of their lists they would like to play into their opponent's, but whether or not they get their preferred match-up is effectively random.
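Purely as a sketch of what doRound has to do at each table (the actual pairing and scoring logic lives inside the package), a single game between two rated players could be simulated by comparing a uniform random draw to an Elo-style win probability:

> # hypothetical sketch: simulate one game from two ratings and a
> # caster pairing adjustment; returns 1 if player 1 wins, 0 otherwise
> simGame <- function(r1, r2, gamma = 0) {
+     pwin1 <- 1 / (1 + 10^(-(r1 + gamma - r2) / 400))
+     as.integer(runif(1) < pwin1)
+ }
> simGame(2700, 2200, gamma = -50)

Repeating draws like this for every table in every round, and tallying matches and games won per team, is essentially what the loop below does ten thousand times.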

R Code for Simulating WTC Tournament
> set.seed(252634)
> nn <- 10000
> out <- matrix(NA_character_, nrow = 50, ncol = nn)
> for (ii in seq_len(nn)) {
+   thisRound <- doRound(data = wtc2015_players, 
+     results = NULL, 
+     ratings = rating2014, 
+     pairlookup = pairLookup)
+ 
+     for (jj in 8:12) { 
+       thisRound <- doRound(data = wtc2015_players, 
+         round = jj, 
+         results = thisRound, 
+         ratings = rating2014, 
+         pairlookup = pairLookup)
+     }
+     tmp <- rownames(thisRound)
+     tmp <- tmp[
+       order(thisRound$Matches.Won, 
+         thisRound$Total.Games.Won, 
+         decreasing = TRUE)]
+     out[, ii] <- tmp
+ }

These simulations can be summarized to give an estimate of the tournament results. With 10000 simulated tournaments, dividing the counts by 100 gives a rough measure of each team's percentage likelihood of winning the event.

> tab <- apply(X = out, MARGIN = 1L, FUN = function(x) {
+       sort(table(x), decreasing = TRUE) })
> (tab[[1]] / (nn/100))[1:6]
x
         USA Stars      England Lions Italy Michelangelo 
             17.20              15.47              14.74 
  Australia Wombat       Sweden Nobel      Poland Grunts 
             13.31               8.88               7.38

We can view these distributions by plotting the percentage chance of each team placing at each rank. From this we can see that we expect Teams Poland Leaders, Poland Grunts, Italy Michelangelo and USA Stars to perform well, and Teams Belgium Blonde, England Roses, Sweden Nobel, Australia Wombat and Australia Platypus are also in the running.

[Figure: distribution of simulated finishing ranks by team (team_profiles)]
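As a rough sketch of how one team's profile in that plot can be pulled out of the simulation matrix (USA Stars is just used as an example here), we can find the rank at which the team finished in each simulated tournament and tabulate those ranks as percentages:

> # rank at which one team finished in each simulated tournament
> usaRank <- apply(out, 2, function(x) which(x == "USA Stars"))
> # percentage of simulations ending at each of the top 6 ranks
> usaPct <- table(factor(usaRank,
+     levels = seq_len(nrow(out)))) / (nn/100)
> round(usaPct[1:6], 2)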

Of course these results are all contingent on the assumptions described above. The most important component of this simulation is the player ratings themselves, and these can be improved with more data, which will soon be available!
