Thursday, January 9, 2014

Wisdom and Dogma in "Lineups Don't Matter"

For a community that coined the phrase "lineups don't matter", baseball statheads certainly seem to talk about lineups a lot. Part of this is because the subject is low-hanging fruit for numbers-based arguments: it's easy to apply simple baseball statistics to the batting order. Part of it is simple baiting by the baseball establishment, which still does things like slotting Alcides Escobar #2 155 times (2012 and 2013), Jason Kendall 70 times (2010), and Willie Bloomquist 76 times (2009).

And it's not just the Kansas City Royals, my hometown team. Elvis Andrus and his .677 OPS logged 468 ABs in the two-hole for the Rangers last season. JJ Hardy cobbled together a slash line of .236/.281/.380 in 631 ABs for Baltimore in 2012. Chone Figgins limped to a .640 OPS in 590 ABs for the Mariners in 2011. All of these players have low strikeout rates and high contact, and all of them hurt their teams by coming to the plate more often than seven of their teammates. Older-school baseball managers cannot help themselves when given the opportunity to give away plate appearances in the two-hole. Perhaps they have read this WikiHow article on constructing a lineup.

In fact, this entire blog was born of the free-agent contract Omar Infante signed to play second base for the Royals. After more years than I can count of sub-replacement flotsam soiling the batter's box, this was a legitimate upgrade. Infante, however, has a subcutaneous RFID tag that transmits "bat control guy" to the cell phones of old-school managers and GMs[1]. So, immediately after the signing, the common assumption was that Infante would bat second, behind Norichika Aoki. I took it upon myself to determine exactly how much this adherence to conventional wisdom would cost the team.

[1] Assuming they have cell phones.

Naturally, I then spent my entire Christmas break writing a software package to construct Monte Carlo simulations of the forthcoming Royals season.



I will leave the details of these simulations (many of which matter much, much more than I ever would have expected) to another post. In short, the simulations take as input a set of stats for nine players. Most of these can be found on FanGraphs, but for some I had to go to Baseball-Reference or even Retrosheet. The code assumes that the per-plate-appearance rates in those files represent each player's intrinsic ability. From one season to the next, the actual numbers of singles, doubles, home runs, etc. will vary according to Poisson statistics, given the sample size of each season. Whether this is a bad assumption is hard to test, but I can restrict the analysis to only those simulated seasons that produce the same aggregate team wOBA.[2]
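Here is a minimal sketch of the season-to-season sampling described above. It assumes one player's per-PA rates and a fixed 600 PA per season. The rates in `EXAMPLE_RATES` are made-up placeholders rather than anyone's real stats, and `poisson` is a small hand-rolled sampler (Knuth's multiplication method) so that only the standard library is needed.

```python
import math
import random

# Made-up per-plate-appearance rates for one hypothetical hitter.
EXAMPLE_RATES = {"1B": 0.16, "2B": 0.05, "3B": 0.005, "HR": 0.03, "BB": 0.08}


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the modest means used here (up to ~100 events per season)."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def simulate_season_counts(rates: dict, pa: int, rng: random.Random) -> dict:
    """Treat each event count as Poisson with mean rate * PA."""
    return {event: poisson(rate * pa, rng) for event, rate in rates.items()}


if __name__ == "__main__":
    rng = random.Random(2014)
    for season in range(3):
        print(season, simulate_season_counts(EXAMPLE_RATES, 600, rng))
```

A multinomial draw over the 600 PA would be the stricter choice, because the events compete for the same plate appearances. The Poisson version simply mirrors the assumption as stated above.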

[2] I'll note that, before beginning this exercise, I was partial to OPS for assessing player ability. After scratching the surface of advanced statistics, however, it is clear that wOBA is the best single statistic for quantifying a player's offensive impact. Quite simply, the value of wOBA is proportional to the number of runs created per plate appearance. If they didn't apply that silly "wOBA scale" to make the numbers look more like on-base percentages, it would be exactly the number of runs created per plate appearance.
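For reference, this is the standard form of wOBA as FanGraphs publishes it. The weights w and the wOBA scale are recalculated every season, so no numbers are given here. The second line shows where the scale does its work: dividing by it turns a wOBA gap back into runs per PA, and multiplying by PA gives runs above average.

$$\mathrm{wOBA} = \frac{w_{BB}\,\mathrm{uBB} + w_{HBP}\,\mathrm{HBP} + w_{1B}\,\mathrm{1B} + w_{2B}\,\mathrm{2B} + w_{3B}\,\mathrm{3B} + w_{HR}\,\mathrm{HR}}{\mathrm{AB} + \mathrm{BB} - \mathrm{IBB} + \mathrm{SF} + \mathrm{HBP}}$$

$$\mathrm{wRAA} = \frac{\mathrm{wOBA} - \mathrm{lgwOBA}}{\text{wOBA scale}} \times \mathrm{PA}$$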

To experienced stats people, much of what I ended up doing was textbook "reinventing the wheel". But the general approach laid out in The Book, where each event has a run value that varies with position in the lineup, is complementary to the approach taken here. When restricted to the same set of considerations, the Monte Carlo approach and the run expectancy calculations (which I'll call RE) should get the same answer. The Monte Carlo approach, however, is easier to tailor to a specific set of conditions, and it is more flexible for different types of conditional events. I will admit that the RE approach takes significantly less CPU time.
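For comparison, here is the usual RE bookkeeping as I understand it. This is the standard base-out formulation and not necessarily the exact method in The Book. Let RE(b, o) be the average number of runs scored from base state b with o outs through the end of the inning, with RE set to 0 once the third out is made. Let R be the number of runs that score on the play. An individual event is then worth

$$\mathrm{RV} = RE(b_{\text{after}}, o_{\text{after}}) - RE(b_{\text{before}}, o_{\text{before}}) + R$$

Averaging RV over every occurrence of an event type (all singles, all walks, and so on) gives that event's run value.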

In one sense, I wasted a few weeks of CPU time to reach the same conclusions that a SABR-savvy reader would already know. Slotting your best players at the top gets them 5-10% more PAs per season, and bunching your best hitters together maximizes the runs they can produce. There are additional subtleties, of course, but these are probably the two prime directives of lineup construction. In the case of the 2014 Royals, these subtleties end up making all the difference.
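The plate-appearance edge comes from simple arithmetic: the order cycles, so each slot down the lineup loses about 1/9 of a PA per game, or roughly 18 PA over 162 games. The sketch below counts PAs per slot for a given number of team plate appearances in a game. The 34-42 range of game totals in the usage example is a hypothetical spread, not a measured distribution.

```python
def pa_by_slot(team_pa: int) -> list:
    """Plate appearances for lineup slots 1..9 in a game in which the team
    sends `team_pa` batters to the plate (the order simply cycles)."""
    return [(team_pa - slot) // 9 + 1 if team_pa >= slot else 0
            for slot in range(1, 10)]


def average_pa_by_slot(team_pa_values) -> list:
    """Average per-slot PAs over a collection of game totals."""
    totals = [0] * 9
    games = 0
    for team_pa in team_pa_values:
        for i, pa in enumerate(pa_by_slot(team_pa)):
            totals[i] += pa
        games += 1
    return [t / games for t in totals]


if __name__ == "__main__":
    print(pa_by_slot(38))  # [5, 5, 4, 4, 4, 4, 4, 4, 4]
    avg = average_pa_by_slot(range(34, 43))  # hypothetical spread of game totals
    print([round(a, 3) for a in avg])
```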


One of the things that's difficult to capture in the RE approach is baserunning (outside of double plays). The Monte Carlo code accounts for baserunning in four ways: (1) Taking the extra base: some players are better at it than others, and the difference can be large enough to matter in the bottom line. (2) Relatedly, double plays are important. Player-to-player variation here can be extreme, as any Royals fan has come to grips with over and over and over again. (3) Base stealing. Not that this really matters much, but players above a given threshold of stolen base attempts are flagged as "base stealers", and they attempt steals and succeed at their previous rates. (4) Reaching base on error. This is a deceptively important aspect of the game, and it is most certainly not a random occurrence.
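Item (3) is the easiest to show in isolation. The sketch below flags base stealers by total attempts and replays their historical attempt and success rates. `STEALER_THRESHOLD`, the `opportunities` field, and the attempt-rate denominator are all hypothetical choices, because the post doesn't say how these were defined.

```python
import random
from dataclasses import dataclass

# Hypothetical cutoff: total SB + CS needed to be flagged as a "base stealer".
STEALER_THRESHOLD = 10


@dataclass
class Runner:
    name: str
    sb: int              # stolen bases in the input season(s)
    cs: int              # times caught stealing
    opportunities: int   # times on base with the next base open (hypothetical input)

    @property
    def attempts(self) -> int:
        return self.sb + self.cs

    def is_base_stealer(self) -> bool:
        return self.attempts >= STEALER_THRESHOLD

    def attempt_rate(self) -> float:
        # Non-stealers never run; stealers attempt at their historical rate.
        if not self.is_base_stealer() or self.opportunities == 0:
            return 0.0
        return min(1.0, self.attempts / self.opportunities)

    def success_rate(self) -> float:
        return self.sb / self.attempts if self.attempts else 0.0


def steal_outcome(runner: Runner, rng: random.Random) -> str:
    """Resolve one steal opportunity: 'hold', 'sb' (safe) or 'cs' (out)."""
    if rng.random() >= runner.attempt_rate():
        return "hold"
    return "sb" if rng.random() < runner.success_rate() else "cs"
```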


For 9 players, there are 362,880 different batting orders ("9!" for the undergraduate statistics students). And I simulated them all. Here's what I found out:
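Enumerating every batting order is a one-liner with the standard library. The roster names below are placeholders.

```python
import itertools
import math

# Placeholder names; any nine distinct hitters work.
ROSTER = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9"]


def all_lineups(roster):
    """Yield every batting order (slot 1 first) for the given hitters."""
    yield from itertools.permutations(roster)


if __name__ == "__main__":
    n = sum(1 for _ in all_lineups(ROSTER))
    print(n, math.factorial(len(ROSTER)))  # 362880 362880
```

At 50 million simulated games per lineup, the full sweep comes to roughly 362,880 × 5 × 10^7 ≈ 1.8 × 10^13 games.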

Result #1: There is no such thing as the "optimal lineup". Well, strictly speaking, that statement is not true: after determining the runs/game for each of the possible lineups, one lineup will be on top. The difference between it and the second-best lineup, however, is likely to be small enough to be inconsequential. For a typical sampling of today's players, about 20-40 different lineups yield the same production to within 1 run over a 162-game season. The difference between the best and the 100th-best lineup is about 3 runs. The difference with respect to the 700th-best lineup is about 9 runs, which is where we hit the value of one win according to FanGraphs. Now, one win is nothing to sneeze at; just ask Cleveland and Detroit fans last year. And if you're an offense-challenged team like the Royals, every run counts.
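Given a table of simulated runs/game for each lineup, the rank comparisons above reduce to sorting and scaling by 162. In this sketch, `runs_per_win=9.0` is an assumption taken from the post's "about 9 runs" figure, since FanGraphs recomputes runs per win each season. The lineup keys in any input are placeholders.

```python
def gaps_from_best(runs_per_game: dict, ranks, games: int = 162,
                   runs_per_win: float = 9.0) -> dict:
    """Runs per season (and approximate wins) separating the best lineup
    from the lineup at each requested rank (1 = best)."""
    ordered = sorted(runs_per_game.values(), reverse=True)
    best = ordered[0]
    out = {}
    for rank in ranks:
        runs = (best - ordered[rank - 1]) * games
        out[rank] = (runs, runs / runs_per_win)
    return out


def count_within(runs_per_game: dict, runs: float = 1.0, games: int = 162) -> int:
    """Number of lineups within `runs` per season of the best one."""
    best = max(runs_per_game.values())
    return sum(1 for rpg in runs_per_game.values() if (best - rpg) * games <= runs)
```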

Result #2: The season-to-season variation of the same lineup is large. Take those same nine guys, play a thousand seasons with them in exactly the same order, and the season-to-season dispersion is 38 runs. Part of this comes from statistical fluctuations in each hitter's performance: in any given season, players will land slightly above or below their true averages. In some seasons, more than half of your hitters will hit above their true levels, so the team will do better. However, when culling the subset of seasons that produce the same aggregate team wOBA (to within +/- 0.001), the dispersion is 19 runs. Understanding what's happening in those seasons is a subject for further study (although the quick-and-dirty analysis is "luck").
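The culling step can be sketched as follows: keep only simulated seasons whose team wOBA falls inside the window, then take the standard deviation of their run totals. The `(runs, team_woba)` pair format is an assumption made for illustration, and "dispersion" is read here as the sample standard deviation.

```python
import statistics


def season_dispersion(seasons, target_woba=None, tol=0.001):
    """Sample standard deviation of season run totals.

    `seasons` is a list of (runs, team_woba) pairs from repeated simulations
    of one lineup. If `target_woba` is given, keep only seasons whose
    aggregate team wOBA lies within +/- `tol` of it."""
    runs = [r for r, woba in seasons
            if target_woba is None or abs(woba - target_woba) <= tol]
    if len(runs) < 2:
        raise ValueError("need at least two seasons to measure dispersion")
    return statistics.stdev(runs)
```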

For a given lineup, you must make sure that you have a robust answer for the number of runs per game it will score. The number of possibilities for a given game is staggering. To converge to the "right" answer, you have to run enough simulations to span the entire space of possibilities, and do so such that their relative occurrences are correct. Since I don't really care about anything below the level of 1 run over 162 games, a nominal "accuracy" would be 1/162 ≈ 0.006 runs/game. To be rigorous, I set a convergence threshold of 0.002 runs/game. Achieving that took 50 million game simulations. For each lineup. In this context, doing analysis in Bill James' day is even more impressive.
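The post doesn't spell out the convergence criterion. One common choice, assumed here, is to keep simulating until the standard error of the mean runs/game falls below the threshold, using Welford's running mean and variance. `toy_game` is a hypothetical placeholder for a real game simulation. The test uses a loose 0.05 threshold so that it runs quickly.

```python
import math
import random


def toy_game(rng: random.Random) -> int:
    """Crude stand-in for a full game simulation (hypothetical): each of 38
    plate appearances independently produces a run with probability 0.115."""
    return sum(1 for _ in range(38) if rng.random() < 0.115)


def run_until_converged(simulate_game, threshold: float, rng: random.Random,
                        min_games: int = 1000, max_games: int = 50_000_000):
    """Simulate games until the standard error of the mean runs/game drops
    below `threshold`, using Welford's online mean/variance update."""
    n, mean, m2 = 0, 0.0, 0.0
    while n < max_games:
        x = simulate_game(rng)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_games:
            std_err = math.sqrt(m2 / (n - 1) / n)
            if std_err < threshold:
                return mean, std_err, n
    raise RuntimeError("did not converge within max_games")
```

Because the standard error shrinks as 1/√N, halving the tolerance roughly quadruples the number of games needed. That is why a 0.002 runs/game target gets expensive.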

Result #3: If you try hard enough, you can do some really dumb things with a lineup, and slotting a high-contact, low-OBP hitter second in the order is one of them. Taking things to the extreme, the difference between the overall best and worst lineups is about 20 runs. As for the exact cost of batting Omar Infante second, I'll get to that soon.
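The cost of a convention like "X bats second" can be stated as the gap between the best lineup overall and the best lineup that honors the constraint. Below is a sketch over a table of simulated runs/game. The player names and numbers in any input are hypothetical, and this does not reproduce the Infante result.

```python
def cost_of_slotting(results: dict, player: str, slot: int, games: int = 162) -> float:
    """Runs per season lost by requiring `player` to bat in `slot` (1-based),
    relative to the best lineup overall. `results` maps lineup tuples
    (slot 1 first) to simulated runs per game."""
    best = max(results.values())
    constrained = [rpg for lineup, rpg in results.items() if lineup[slot - 1] == player]
    if not constrained:
        raise ValueError(f"no lineup puts {player!r} in slot {slot}")
    return (best - max(constrained)) * games
```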

At the end of the chapter on lineups in The Book, there is a statement that arranging your lineup properly can save you 50 runs over the course of a season. I think this is a significant overstatement, given both the Monte Carlo results and my own attempts to implement the method outlined in that chapter. But lineups do matter, and some of the considerations are counter-intuitive. Hopefully the analytics guys at One Royal Way can continue their good work (Gordon as leadoff!) and convince Ned Yost to think outside the box score.



