I was excited to discover Battle Reporter while listening to This Deathclock has 60 Minutes. The site is hosted by the Trollblood Scrum and the Deff Head Dice blog. It is a database interface that allows anyone to record their Warmachine games. Best of all, the recorded data can be downloaded as a live-updated CSV text file. The results so far are interesting, with entries from more than 450 unique names, including Warmachine celebrity MenothJohn and two WTC players.
This file currently contains records of over 1300 games. If you would like to use the website to track your own games, get in touch with the administrator. I will certainly be using it from now on, and I urge you to do the same. Currently 164 casters have been recorded, so there is a little way to go. There have been zero games recorded with Bradigus, for example. The casters observed so far can form 13366 unique pairings (164 × 163 / 2), so many more games are needed to get a complete representation of caster rating.
As Deff Head Dice already has descriptive statistics, I thought I’d use the dataset to initialise some caster ratings for WTC. Players can self-assess their ability. However, as players are not invited to assess their opponents’ ability, I will not take this into account. On average, players probably face opponents of similar skill, and if not, with enough records the unequal games should average out.
Players can also state what kind of game it was: casual, local tournament or national tournament. I chose to give a slightly higher weighting to the small number of tournament games, as these may more closely align with how the casters will be played at the WTC.
About a dozen records had no opponent caster. Interestingly, all of these games had been entered by Retribution players (multiple users!). Presumably these haughty elves care not for their human prey. I discarded these results along with draws.
I added all games for each caster pairing to a new blank matrix for Mark 3, adding wins and subtracting losses, and recorded the number of games observed for each pairing. At this point I wanted to scale this table of wins and losses so that it corresponded to player ratings.
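The tallying step can be sketched in a few lines of Python. This is only an illustration: the record layout, the `build_pairing_table` name and the weight values are assumptions, since all that is stated here is that tournament games count "slightly" more. The sketch also mirrors each result to the opposing caster, which is a choice of this sketch rather than something described above.

```python
from collections import defaultdict

# Hypothetical weights: tournament games count slightly more than casual ones.
GAME_WEIGHTS = {
    "casual": 1.0,
    "local tournament": 1.25,
    "national tournament": 1.25,
}

def build_pairing_table(games):
    """Return (net, counts) keyed by (caster, opposing caster).

    net    : weighted wins minus weighted losses for the first caster
    counts : number of games observed for the pairing
    """
    net = defaultdict(float)
    counts = defaultdict(int)
    for caster, opponent, result, game_type in games:
        # Discard records with no opposing caster, and draws.
        if not opponent or result not in ("win", "loss"):
            continue
        sign = 1.0 if result == "win" else -1.0
        weight = GAME_WEIGHTS.get(game_type, 1.0)
        net[(caster, opponent)] += sign * weight
        net[(opponent, caster)] -= sign * weight  # mirror for the opponent
        counts[(caster, opponent)] += 1
        counts[(opponent, caster)] += 1
    return dict(net), dict(counts)

# Made-up records, for illustration only.
games = [
    ("Helynna1", "Agathia1", "win", "casual"),
    ("Helynna1", "Agathia1", "win", "local tournament"),
    ("Helynna1", "", "win", "casual"),           # no opposing caster: dropped
    ("Kozlov1", "Helynna1", "draw", "casual"),   # draw: dropped
    ("Kozlov1", "Helynna1", "loss", "casual"),
]
net, counts = build_pairing_table(games)
print(net[("Helynna1", "Agathia1")], counts[("Helynna1", "Agathia1")])
```

Keeping the game counts alongside the net score is what later allows the plots to show how much evidence sits behind each pairing.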
The Wisconsin Team Tournament is a WTC-style team tournament hosted by Privateer Press Judges Nathan Hoffman (from the Crippled System podcast) and Travis Marg. The organizers have made a dataset very similar to the WTC one available as an HTML table. I was able to scrape the data from this using the R package rvest. Twelve teams entered and played four rounds. Eight of the 60 players were American WTC players for whom I had ratings. For the others I had no information, other than that they had not previously played in the WTC. I arbitrarily assigned these players a rating of 1400, as they are likely of lower skill than players who had passed through the selection process to join a team (initialised at 2200 in my initial analysis).
To allow the caster ratings to be scaled, I needed to add some information to these naive player ratings. I ran the first two rounds of the team tournament with the unscaled caster ratings to start to split up the players. I then optimised over the results of the third and fourth rounds, scaling the caster ratings to maximise r-squared (a measure of correlation) and minimise the difference of the calibration gradient from 1 when comparing predicted wins with the proportion of wins observed (Gist). The resulting scaling factor was 9.3, so most games were worth 9.3 points for each win.
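As a rough illustration of this kind of scaling search, here is a minimal Python sketch. It is not the code from the Gist: the logistic win curve and its scale of 540, the five probability bins, the way the two criteria are added together, and all function names are assumptions made for the example.

```python
def win_probability(rating_diff, scale=540.0):
    """Assumed logistic win curve; the scale of 540 is a guess."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))

def calibration(predicted, outcomes, n_bins=5):
    """Bin games by predicted win chance, then regress the observed win
    proportion in each bin on the mean prediction. Returns (r2, slope)."""
    bins = {}
    for p, won in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(idx, []).append((p, won))
    xs = [sum(p for p, _ in b) / len(b) for b in bins.values()]
    ys = [sum(w for _, w in b) / len(b) for b in bins.values()]
    if len(xs) < 2:
        return 0.0, 0.0
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy ** 2 / (sxx * syy), sxy / sxx

def objective(k, games):
    """Lower is better: high r-squared and a calibration slope close to 1."""
    predicted = [win_probability(diff + k * net) for diff, net, _ in games]
    outcomes = [won for _, _, won in games]
    r2, slope = calibration(predicted, outcomes)
    return (1.0 - r2) + abs(slope - 1.0)

def best_scale(games, candidates):
    """Grid search over candidate scaling factors."""
    return min(candidates, key=lambda k: objective(k, games))

# Made-up games: (player rating difference, net caster wins, 1 if player 1 won).
games = [
    (0, 3, 1), (0, -4, 0), (800, 1, 1), (-800, -1, 0),
    (0, 2, 1), (0, -2, 1), (800, -1, 1), (-800, 2, 0),
]
candidates = [k / 2 for k in range(1, 41)]  # 0.5, 1.0, ..., 20.0
print(best_scale(games, candidates))
```

A grid search is used here only because it is easy to read; any one-dimensional optimiser would serve the same purpose.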
This is a very… hmm… let’s kindly say heuristic… approach, which has given me some approximate caster ratings. I will test these caster ratings on the WTC data. Still, this approach is much more direct than my previous method, and can be performed in the absence of CP and AP information. If these penalties tally with broader player experience, that would be reassuring. The following penalty plots show penalties for the caster named in the title against each of a range of casters. The line shows the position of the rating estimate, and the coloured bands reflect the number of records observed: narrower bands mean more games have been recorded for that pairing. Plots are only shown here for casters with more than three games against more than three casters.
For reference, if two equally skilled players played a game using list1 against list2 with a penalty of -50, player 1 would win 44.7% of such games. If they played with penalties of 0, 10, 20 and 50, player 1 would win 50.0%, 51.1%, 52.1% and 55.3% of their games respectively. This means that the process described here has observed 3 wins for Helynna1 against Agathia1, and proposed that this corresponds to a ~5 percentage point advantage. The 4 observed defeats at the twisted hands of Mordikaar1 correspond to a ~6 percentage point disadvantage. Kozlov1 was observed defeating Syntherion1, but was defeated by Helynna1. Madrak2 had a good record against Tanith1, but lost several games against Butcher3.
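The conversion from penalty to win percentage can be illustrated with a short Python sketch. The formula is an assumption: a logistic curve of the Elo family with a scale of 540 happens to reproduce the percentages quoted above to one decimal place, but that scale was back-fitted to those figures and is not taken from the rating system itself.

```python
def win_probability(penalty, scale=540.0):
    """Chance that player 1 wins when both players are equally skilled.

    `scale` is an assumed value, chosen by hand so that the curve matches
    the percentages quoted in the text.
    """
    return 1.0 / (1.0 + 10.0 ** (-penalty / scale))

for penalty in (-50, 0, 10, 20, 50):
    print(f"{penalty:+4d} -> {100 * win_probability(penalty):.1f}%")
```

Under that assumption the loop prints 44.7%, 50.0%, 51.1%, 52.1% and 55.3%, matching the figures above.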
Of course, these values are driven by relatively small numbers within a caster pair, but this is a relatively objective approach to allow me to initialise the caster penalties. In the meantime, keep reporting your games!