Balancing Factions

I previously presented calibration plots for the 16 most popular casters taken to the WTC. I cut the list to the top 16 to make sure that there was enough data in each bin. Privateer Press forum user Fluffiest convinced me to try running the same analysis on the factions instead. I was not expecting much to leap out from the aggregated data.

[Figure: factionwins_vs_predicted3]

Proportion of observed wins for each Warmachine and Hordes faction in WTC 2016, plotted against the expected outcome given relative player skill as estimated by Elo rating (number of observations in brackets)
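The "expected outcome" on the horizontal axis is the win probability implied by the two players' ratings. The post does not state the exact scale used, but a minimal sketch of the standard Elo expected-score formula (assuming the conventional 400-point logistic scale) looks like this:

```python
def elo_expected(rating_a, rating_b):
    """Standard Elo expected score for player A against player B,
    using the conventional 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Equal ratings give an even game; a 200-point favourite
# is expected to win roughly 76% of the time.
print(elo_expected(1500, 1500))  # → 0.5
print(round(elo_expected(1800, 1600), 2))  # → 0.76
```

A calibration plot then bins games by this expected value and compares each bin's average prediction with the proportion of games actually won.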

Due to the larger number of games observed in each group, I split the results into 10 bins per faction. I wanted to summarize these plots in a single, easily digestible metric. To rank the factions I used the linear trendline to calculate the area under the curve within the one-by-one box that makes up each plot: for a line of best fit y = mx + b over the unit interval, that area is simply m/2 + b. If the number is greater than 0.5, players in that faction are winning more games than expected; if it is less than 0.5, they are winning fewer games than expected. I would consider a faction balanced if the area under the Wins versus Predicted plot is exactly 0.5. The results of this metric are somewhat unexpected.
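The scoring step above can be sketched in a few lines. This is a minimal illustration, not the original analysis code, and the bin values below are hypothetical, chosen only to show the calculation:

```python
import numpy as np

def faction_score(predicted, observed):
    """Area under the best-fit line on the unit square.

    Fits observed win proportion against predicted win probability,
    then integrates y = m*x + b over x in [0, 1], giving m/2 + b.
    A score above 0.5 means the faction wins more often than its
    players' Elo ratings predict.
    """
    m, b = np.polyfit(predicted, observed, 1)
    return m / 2 + b

# Hypothetical bin midpoints and observed win proportions
predicted = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95])
observed  = np.array([0.10, 0.18, 0.30, 0.42, 0.50, 0.61, 0.70, 0.78, 0.88, 0.97])
print(round(faction_score(predicted, observed), 2))  # → 0.54
```

Because the bin midpoints are symmetric about 0.5, this score equals the value of the fitted line at x = 0.5, i.e. the win rate the trendline attributes to an evenly matched game.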


    Faction        Score
 1  Retribution    0.56
 2  Circle         0.55
 3  Legion         0.55
 4  Cygnar         0.54
 5  Convergence    0.51
 6  Mercenaries    0.50
 7  Protectorate   0.49
 8  Minions        0.48
 9  Khador         0.48
10  Cryx           0.45
11  Skorne         0.45
12  Trollbloods    0.41

While the widely acclaimed Retribution sits right at the top, Legion performed well above what would be expected from player skill alone; and while the reviled Skorne is near the bottom, the favoured Trollbloods finish dead last.

Of course, I cannot claim that these results definitively show that the balance is off for Trollbloods; still, it is a startling result, and one that may warrant further investigation.

Edit: The calibration plot for the entire WTC dataset explains the misspecification shown by many of the caster and faction calibration plots. There is an overall discrepancy between the predicted and observed win rate for the largest ratings differences. This is likely due to the small training dataset available (6-12 games for most players), as well as needing to impute around 30% of the field.

[Figure: wins_vs_predicted1]

Proportion of observed wins for all games in WTC 2016, plotted against the expected outcome given relative player skill as estimated by Elo rating (number of observations in brackets)

While this shows that the ratings still need more training data, I believe that this approach will be useful for considering the effectiveness of casters relative to player skill.
