World Team Championships 2014: Part 2

Perhaps with data from more games, we might see an improvement to these predictions. But there could also be some other effect that we are not incorporating. The outcome of a game of Warmachine is influenced by the army lists used by each player. For now, we can simplify by treating each army list as just its warcaster. Let’s calculate some ratings for lists and see how well they predict outcomes.
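The opponent-list step in the code that follows reorders the rows by game_id with the winner first. That only yields the opponent's list if the data are already sorted by game_id with the losing player first within each game, which is an assumption about wtctrain that the post does not state (the test-data printout further down is at least consistent with it). A minimal sketch on a made-up data frame (the games object and its contents are hypothetical) shows the trick, alongside an alternative that does not depend on the order within each pair:

```r
# Hypothetical toy data: two games, two rows per game, loser (TP = 0) listed first
games <- data.frame(
  game_id = c(1, 1, 2, 2),
  list    = c("Haley2", "Skarre", "Vayl2", "Harbinger"),
  TP      = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Same idea as the post: put the winner first within each game, which swaps
# the two rows of every pair when the loser was originally first
games$opponent_list <- games[with(games, order(game_id, -TP)), "list"]

# Alternative: reverse the lists within each game, whatever the row order
# (still assumes exactly two rows per game_id)
games$opponent_list2 <- ave(games$list, games$game_id, FUN = rev)

games
```

If the two new columns ever disagree on real data, the row-order assumption does not hold and the second form is the safer one.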

> # add opponent's list to each record
> wtctrain$opponent_list <- wtctrain[with(wtctrain,
+     order(game_id, -TP)), "list"]
> # ratings based on casters only
> rating_list <- steph(na.omit(wtctrain[,
+     c("round", "list", "opponent_list", "TP")]))
> # ratings for casters
> rating_list
 
Stephenson Ratings For 99 Players Playing 1298 Games
 
        Player Rating Deviation Games Win Draw Loss Lag
1    Vladimir3   2589    178.23     4   4   0   0   2
2      Makeda2   2555    155.22     8   8   0   0   0
3   Goreshade3   2544     78.59    30  26   0   4   0
4     Grissel2   2518     91.68    32  28   0   4   0
5      Damiano   2493    199.90     2   2   0   0   0
6     Vladimir   2492    108.67    12  10   0   2   2
7       Gorten   2482    209.97     2   2   0   0   2
8        Irusk   2424     82.84    28  18   0  10   0
9    Reclaimer   2378     77.63    40  30   0  10   0
10     Makeda3   2373    199.69     4   2   0   2   2
11     Skarre2   2368    130.25     8   6   0   2   1
12   Severius2   2362    134.09    10   8   0   2   0
13     Grissel   2360    101.95    16  12   0   4   0
14   Vladimir2   2333     80.73    30  20   0  10   0
15     Krueger   2332    129.63    10   6   0   4   1
16    Morvahna   2322    108.80    14   8   0   6   0
17    Deneghra   2286     61.06    62  36   0  26   0
18       Ravyn   2284    158.32     6   4   0   2   1
19   Harbinger   2282     56.54   110  62   0  48   0
20   Thagrosh2   2277    110.81    16  10   0   6   2
21    Kaelyssa   2273    115.46    12   6   0   6   0
22      Saeryn   2269     65.98    54  30   0  24   0
23   Vindictus   2268    142.83     6   4   0   2   0
24 Asphyxious2   2258     62.77    74  42   0  32   0
25      Caine2   2257     67.23    48  24   0  24   0
26 Asphyxious3   2256    133.66    10   6   0   4   1
27     Issyria   2247     66.33    58  36   0  22   0
28   Deneghra2   2247     60.01    72  38   0  34   0
29  Absylonia2   2247    256.04     2   2   0   0   2
30   Morvahna2   2245     58.90    90  46   0  44   0
31     Sorscha   2241     74.26    36  20   0  16   0
32    Terminus   2240    102.64    16   8   0   8   0
33       Haley   2233     69.23    40  20   0  20   0
34      Maelok   2231    163.50     6   4   0   2   0
35    Calandra   2224    102.31    18   8   0  10   0
36   Mordikaar   2223     86.03    20  12   0   8   0
37  Doomshaper   2221     75.19    36  20   0  16   0
38     Kreoss2   2218    113.86    14   8   0   6   0
39        Nemo   2217    174.56     4   2   0   2   2
40       Grim2   2210    125.57     8   4   0   4   1
41      Irusk2   2206     94.79    20  10   0  10   0
42    Severius   2205    131.91    10   4   0   6   2
43    OldWitch   2200     97.17    16   8   0   8   0
44      Xerxis   2189     67.18    60  32   0  28   0
45     Madrak2   2186     92.38    24  12   0  12   1
46    Krueger2   2179     58.39    88  44   0  44   0
47      Vyros2   2176     67.29    48  24   0  24   0
48      Ossrum   2174    156.10     8   2   0   6   1
49      Haley2   2173     53.93   152  78   0  74   0
50      Skarre   2173     50.78   150  74   0  76   0
51     Lylyth2   2173     66.45    48  22   0  26   0
52    Butcher3   2170     62.04    56  28   0  28   0
53    Hexeris2   2169     84.97    22  10   0  12   0
54        Rahn   2166    101.81    14   6   0   8   0
55        Vayl   2160    104.08    14   6   0   8   0
56  Goreshade2   2158    139.16     8   4   0   4   0
57      Darius   2157    122.94    10   4   0   6   1
58     Butcher   2152     71.37    36  16   0  20   0
59     Bartolo   2150    164.87     8   2   0   6   1
60        Zaal   2146     94.27    20   8   0  12   0
61      Kromac   2142     65.90    64  24   0  40   0
62        Rask   2140     75.40    36  14   0  22   0
63  IronMother   2137    168.66     4   2   0   2   1
64       Kaya2   2123    141.37    10   4   0   6   0
65    Butcher2   2119     82.75    46  30   0  16   0
66     Rasheth   2117     81.42    30  12   0  18   0
67     Karchev   2115    182.25     4   2   0   2   0
68       Coven   2114    147.18     8   2   0   6   1
69     Kreoss3   2110     90.53    30   8   0  22   0
70    Thagrosh   2104    138.32    10   4   0   6   0
71       Vayl2   2102     51.81   144  78   0  66   0
72  Syntherion   2097     90.05    26   8   0  18   0
73      Kreoss   2090     80.85    30  10   0  20   0
74     Hexeris   2082    161.65     4   2   0   2   0
75    Sorscha2   2081    106.91    12   4   0   8   0
76 Doomshaper2   2078    158.37     6   2   0   4   0
77      Durgen   2078     77.67    34  12   0  22   0
78      Ossyan   2076    225.03     2   0   0   2   3
79      Makeda   2074    155.36     8   4   0   4   1
80      Grayle   2074    188.28     4   2   0   2   2
81        Jarl   2067    201.22     4   0   0   4   2
82      Lucant   2041     79.66    30  10   0  20   0
83       Siege   2026    103.83    20   6   0  14   0
84      Baldur   2017    103.67    16   6   0  10   0
85     Ashlynn   2011    192.33     2   0   0   2   0
86     Baldur2   1984     93.41    30   6   0  24   0
87        Grim   1980    112.35    14   4   0  10   0
88      Blaize   1965    154.85     6   0   0   6   0
89       Sloan   1942    194.34     4   0   0   4   2
90      Feora2   1884     84.10    34   6   0  28   0
91      Kallus   1881    203.70     2   0   0   2   3
92      McBain   1864    169.72     4   0   0   4   2
93    Stryker3   1856    158.41     6   0   0   6   0
94  Asphyxious   1850    225.03     2   0   0   2   3
95    Morghoul   1835    168.79     4   0   0   4   0
96        Kaya   1830    156.66     4   0   0   4   0
97     Cassius   1806    141.47    10   0   0  10   0
98       Feora   1792    171.83     4   0   0   4   0
99    Barnabas   1784    141.68     8   0   0   8   0

These results suggest that there is a power level difference between lists (assuming, of course, that all lists for a given caster are equivalent). The Deviation gives us a measure of how much we trust each rating: the smaller the Deviation, the more likely the true rating is to be close to the estimate. Goreshade3 and Grissel2 each played around 30 games, so their Deviations are small, and they still rated highly, so we can be fairly sure that they are good lists. Vladimir3 and Makeda2 rated even higher, winning all of their games, but we are less certain about their power, since they played far fewer games (4 and 8 respectively).
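To make the role of the Deviation concrete, here is a rough sketch that turns the top four rows of the table into intervals. It assumes the Deviation can be read as an approximate standard deviation of the rating estimate, so that rating plus or minus two Deviations is a rough 95% range; that reading is borrowed from Glicko-style systems and is an assumption here, not something computed by the code above.

```r
# Top four rows copied from the ratings table above
top <- data.frame(
  Player    = c("Vladimir3", "Makeda2", "Goreshade3", "Grissel2"),
  Rating    = c(2589, 2555, 2544, 2518),
  Deviation = c(178.23, 155.22, 78.59, 91.68),
  Games     = c(4, 8, 30, 32),
  stringsAsFactors = FALSE
)

# Rough interval: rating +/- 2 deviations (assumed, see text)
top$lower <- top$Rating - 2 * top$Deviation
top$upper <- top$Rating + 2 * top$Deviation

# Sort by the lower bound: lists we are confident about rise to the top
top[order(-top$lower), ]
```

Sorted by the lower bound, Goreshade3 and Grissel2 move ahead of Vladimir3 and Makeda2, which is the same point made in the paragraph above.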

[Image: Goreshade, Lord of Ruin]

The most surprising result near the bottom of the table is Feora2. She is widely considered to be an excellent caster with a strong game into lots of lists. However, she lost most of her games (28 of 34) and was rated very poorly. Perhaps her poor performance is a feature of the team format, whereby skilful teams can engineer favourable match-ups and all-rounders end up having a tough time playing into skews.

Note that the list ratings are their average performance in all match-ups observed, and do not take into account ratings versus specific lists. For example, by this method, Rock, Paper and Scissors would all have a rating of approximately 2200 (the starting rating for an average player).
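The Rock, Paper, Scissors point can be illustrated without any rating system at all. The sketch below uses plain average scores as a stand-in for ratings (an assumption made for simplicity, not the Stephenson calculation): every option wins exactly half of its match-ups on average, even though each individual match-up is completely one-sided.

```r
opts <- c("Rock", "Paper", "Scissors")

# score[i, j] is the result for option i (row) against option j (column):
# 1 = win, 0 = loss, NA = mirror match (not played)
score <- matrix(c(NA,  0,  1,
                   1, NA,  0,
                   0,  1, NA),
                nrow = 3, byrow = TRUE, dimnames = list(opts, opts))

# Average performance over all match-ups: identical for all three options
avg <- rowMeans(score, na.rm = TRUE)
avg

# The match-up table itself shows what the averages hide
score
```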

As described in part 1 we can use the ratings to make some predictions about our test data.
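For intuition about where such predictions come from, the sketch below uses a Glicko-style expected score, in which the rating difference is shrunk according to the combined Deviation of the two lists. That this is what the predict step does is an assumption on my part (expected_score is a hypothetical helper, not a PlayerRatings function), although with the rounded ratings from the table it lands close to the probabilities printed for Goreshade3 against Lylyth2 and Saeryn against Kreoss3.

```r
# Hypothetical helper: Glicko-style expected score for list 1 against list 2
expected_score <- function(r1, r2, rd1, rd2) {
  q <- log(10) / 400
  # Attenuation factor: the larger the combined deviation, the closer to 0.5
  g <- 1 / sqrt(1 + 3 * q^2 * (rd1^2 + rd2^2) / pi^2)
  1 / (1 + 10^(-g * (r1 - r2) / 400))
}

# Goreshade3 (2544, 78.59) against Lylyth2 (2173, 66.45)
p_goreshade3 <- expected_score(2544, 2173, 78.59, 66.45)

# Saeryn (2269, 65.98) against Kreoss3 (2110, 90.53)
p_saeryn <- expected_score(2269, 2110, 65.98, 90.53)

round(c(Goreshade3 = p_goreshade3, Saeryn = p_saeryn), 3)
```

Small differences from the printed values are expected, because the table shows ratings rounded to whole numbers.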

> wtctest$pstephlist <- predict(rating_list,
+     wtctest[, c("round", "list", "opponent_list")],
+     tng = 0, gamma = 0)
> with(wtctest, wtctest[player_team %in%
+     c("Team USA Stars", "Team Poland Reckless"),
+     c("player", "player_team", "list", "pstephlist", "TP")])
                 player          player_team       list pstephlist TP
1471 Michal Nakonieczny Team Poland Reckless    Lylyth2  0.1160688  0
1472  Anthony Ferraiolo       Team USA Stars Goreshade3  0.8839312  1
1473        Tomek Tutaj Team Poland Reckless    Kreoss3  0.2960863  0
1474      Jake VanMeter       Team USA Stars     Saeryn  0.7039137  1
1475   Michal Konieczny Team Poland Reckless    Baldur2  0.1645806  0
1476        Brian White       Team USA Stars  Harbinger  0.8354194  1
1477  Andrzej Kasiewicz Team Poland Reckless  Deneghra2  0.5939719  0
1478        Will Pagani       Team USA Stars   Krueger2  0.4060281  1
1479       Marcin Mycek Team Poland Reckless   Butcher2  0.4505320  0
1480     Ryan Chiriboga       Team USA Stars Goreshade2  0.5494680  1
> probsl <- wtctest$pstephlist[
+     wtctest$player_team == "Team USA Stars"]
> siml <- matrix(NA, nrow = 1e6, ncol = 5)
> for (i in seq_along(probsl)) {
+     siml[, i] <- rbinom(1e6, size = 1, prob = probsl[i]) }
> pUSAl <- sum(apply(siml, MARGIN = 1,
+     FUN = function(x) { sum(x) > 2 })) / 1e6
> pUSAl
[1] 0.823273

So, based on the list ratings from the first five rounds, and provided we know which teams will play in the final, who will play whom and which list each player will select, we predict that the USA have an 82% chance of winning the round (that is, of winning at least three of the five games), based on list strength alone.
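With only five games, the same probability can also be computed exactly rather than by simulation. The sketch below builds up the distribution of the number of USA wins one game at a time, under the same assumption as the simulation that the five games are independent. The probabilities are copied from the table above; the names probs, dist and p_exact are mine.

```r
# Win probabilities for the five Team USA Stars players, from the table above
probs <- c(0.8839312, 0.7039137, 0.8354194, 0.4060281, 0.5494680)

# dist[k + 1] holds P(exactly k wins) over the games processed so far
dist <- 1
for (p in probs) {
  # Either this game is lost (count unchanged) or won (count shifts up by one)
  dist <- c(dist * (1 - p), 0) + c(0, dist * p)
}

# Probability of winning at least 3 of the 5 games
p_exact <- sum(dist[4:6])
p_exact
```

The result should agree with the simulated value to two or three decimal places, with the remaining difference being simulation noise.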

If we summarize the list ratings as in part 1 we can see how good we are at rating casters.
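The objects brk and success used in the plotting call come from part 1 and are not recomputed in this post. As a reminder of the general idea, here is a minimal sketch of a binned calibration summary on synthetic data. Everything in it (pred, outcome, brk_demo, success_demo, the ten equal-width bins) is a hypothetical stand-in, not the part 1 code.

```r
set.seed(1)

# Synthetic stand-in data: predicted win probabilities and 0/1 outcomes
# drawn so that the predictions are perfectly calibrated by construction
pred    <- runif(2000)
outcome <- rbinom(2000, size = 1, prob = pred)

# Ten equal-width bins on the predicted probability
brk_demo <- seq(0, 1, by = 0.1)
bin <- cut(pred, breaks = brk_demo, include.lowest = TRUE)

# Observed proportion of wins within each bin
success_demo <- tapply(outcome, bin, mean)

# Bin midpoints, for comparison with the observed proportions
mid <- brk_demo[-length(brk_demo)] + diff(brk_demo) / 2
round(cbind(predicted = mid, observed = success_demo), 2)
```

For well-calibrated predictions the observed proportions sit close to the bin midpoints, which is what the line y = x in the plot checks by eye.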

> plot(diff(brk) + brk[-length(brk)], success,
+     xlab = "predicted success (list)", asp = 1)
> abline(coef = c(0, 1), col = 2)

[Figure: calibration plot of observed success against predicted success (list), with the line y = x in red]

This plot suggests that list ratings are actually a little better at predicting game outcomes than player ratings. Since we are generating ratings of lists and players from the same dataset, we should check that they are not too correlated. If the best-performing players are simply winning with different casters than the worst-performing players, then list performance and player performance carry the same information, and there is nothing to gain from analysing both. But if the better-performing players are winning despite less powerful lists, then we may be able to improve our predictions.

> with(wtctrain, cor(pstephlist, pstephplayer))
[1] 0.2704031
> plot(pstephlist ~ pstephplayer, data = wtctrain,
+     pch = 16, col = "#33333333", asp = 1)

[Figure: scatter plot of pstephlist against pstephplayer for the training data]

Okay, great: a correlation of 0.27 is weak, so it seems that our list ratings are not being unduly influenced by the ratings of the players using them. So the question now is: how can we combine these player and list ratings to improve our prediction of outcome?
