Thursday, January 30, 2014

The Immovable Object: 100 Years of MLB lineups

All this talk about lineups has gotten me interested in how thinking about lineups has changed over the course of baseball's history. It turns out that Darwinian evolution does not seem to apply in the insular world of professional baseball. Just looking at wOBA as a function of lineup position in various years shows how little lineup construction has changed in the last 100 years.



In the above plot, the horizontal dashed line is the league-average wOBA, so this is showing player performance relative to the league's overall performance. Each year is shifted up by 20 points to keep the plots from lying on top of each other. Leadoff and 2nd-hole hitters track the league average tightly. Third and cleanup are the big bats, and the rest of the lineup degrades monotonically. You can see the advent of the DH in the interval between 1964 and 1976.

To see just how much these numbers have (or haven't) changed since 1913, take a look at wOBA and OBP as a function of year for the first four lineup positions:

OBP as a function of year for different slots in the lineup. The league OBP is shown in the bottom panel. The thick solid lines, showing OBP relative to league OBP for slots 1-4, are boxcar smoothed over a 5-year window. The individual yearly results are shown with the thin solid curves of the same color.
Same as OBP above, but now for weighted on-base average (wOBA).



If you read this blog (both of you), you know that the leadoff spot is the most important spot in the lineup. The third spot, surprisingly, is the least important of the top 5. But for 100 years the leadoff batter has stayed--on average--within a narrow range of the league average. In fact, the era when the leadoff hitter had his highest wOBA was the early 1910's.

The idea that any batting event (walk, double, triple, etc.) is more valuable in the leadoff spot than in any other spot (once adjusted for plate appearances) is a fairly new concept. Getting on base, one would think, was an antecedent to that notion, something that came to the fore in the early years of the Bill James revolution.

But getting on base--in fact, taking pitches and working the walk--was a very early requirement for a good leadoff hitter. Searching back issues of the New York Times starting in 1900, the earliest mention I can find of "leadoff man" is from July 16, 1919, after the Yankees manager decided to shake up his lineup with his team mired in a slump:
Roger Peckinpaugh was put at the head of the list in place of Sam Vick. The latter has not been getting on the bases often enough for a leadoff man…. His trouble was inability to wait out the pitchers. Sam wanted to smack the ball every time he had a chance.
And later that same year, the Times had an article about the Washington Senators manager predicting that the Chicago White Sox would beat the Reds to win the "world's series":
"When I pick the White Sox to win the world's series [sic], don't think I have just had a guess… Just take the batting order of the Chicago outfit. Nemo Leibold is a leadoff man of great ability. He is hard to pitch to and has a good eye. If the balls are bad he won't take a swing at them."
And indeed, Nemo Leibold had a .404 on-base percentage in 1919, roughly 80 points higher than the league average, with a 14% walk rate. 

But after 1920, both the wOBA and OBP of leadoff hitters declined and, other than a brief period around 1990, never reached the same levels again. The grumbling about Joey Votto not realizing that a sacrifice fly is better than a walk just shows that baseball thinking hasn't just not-evolved, it's devolved. Votto may not be a leadoff hitter, but the argument still applies.

My purpose in looking into these results is to find how many runs baseball teams have wasted by putting middling players at the top of the order. More on this later.

Monday, January 27, 2014

What is the most important slot in the lineup?

What was your first thought? Mine was that it had to be either the second or third spot, and I think most analytically-inclined (but maybe not analytically experienced) observers would say the same thing. Like most of these ideas, how to frame the question is as important as how to reach the answer. To me, the most objective phrasing of this question is the following:

If you had a lineup of average players---which, as described in the link, produces an average number of runs---and you could make one of those players 10% better, which one should you pick?

By 10% better, I mean that the rate at which they reach base in all forms except error (walks, hits, hit-by-pitch) is increased by 10%. Strikeouts are the same. Ground ball to fly ball ratio is the same, as is baserunning. This is actually a substantial increase in player ability: an increase from a wOBA of 0.330 to 0.363 is the transition from a slightly-above-average player to a near All-Star caliber hitter. But it's easier to see the difference when making the shift larger. Over the course of a 162-game season, how many extra runs (over the fully-average team) would this new lineup score? How many would they lose if the player were made 10% worse instead?
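To make "10% better" concrete, here is a minimal sketch of the adjustment. The event names and per-PA rates are illustrative assumptions on my part (they roughly match a league-average hitter), not the simulator's actual inputs:

```python
# Sketch: boost all reach-base events except errors by 10%, keeping
# strikeouts fixed and absorbing the change into in-play outs.
def boost_player(rates, factor=1.10):
    on_base = ("single", "double", "triple", "home_run", "walk", "hbp")
    boosted = dict(rates)
    delta = 0.0
    for event in on_base:
        old = boosted[event]
        boosted[event] = old * factor
        delta += boosted[event] - old
    # Strikeouts stay the same; the extra on-base probability comes
    # out of outs on balls in play.
    boosted["in_play_out"] -= delta
    assert abs(sum(boosted.values()) - 1.0) < 1e-9
    return boosted

avg = {"single": 0.156, "double": 0.044, "triple": 0.004,
       "home_run": 0.026, "walk": 0.077, "hbp": 0.009,
       "strikeout": 0.201, "in_play_out": 0.483}
better = boost_player(avg)
```

The renormalization step matters: if you inflate the on-base events without deflating something else, the probabilities no longer sum to one.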

I put this scenario through the lineup simulator and reached the following results:

Runs gained (or lost), when making one batter in a lineup of equal batters 10% better (or worse) than the other 8. Numbers are aggregates over a 162-game season.


Two things jump out at you: (1) putting the boost into the leadoff spot creates the most runs relative to all other lineup spots, and (2) the 3rd spot in the order is actually the least important spot of the top 5. Now, the differences between these spots aren't especially significant, and this is mostly an academic exercise anyway since the "average lineup" doesn't exist in reality. But even with those caveats, it is an intriguing result.

These results are essentially echoed in the chapter on lineup construction in The Book[1]. The value of the third spot in the lineup is attenuated by the fact that this spot comes up with none on and two outs more than any other spot, where the run expectancy of any event is at a minimum. One might argue that my "objective approach" to defining this problem over-inflates the value of the leadoff spot, since the guy in the 9th slot will reach base more often than a typical bottom-feeder in MLB (especially in the NL), creating more RBI opportunities than would normally exist. However, the leadoff hitter actually leads off an inning in ~41% of all his plate appearances[2] (which translates into 19% of all innings--nearly twice as much as the next-highest batter, the 4th slot), so what happens in the ninth slot isn't critical. Additionally, the run expectancy + Markov chain approach in The Book comes to pretty much the same conclusion: once the number of PAs is factored in, run values for each type of batting event are maximal in the leadoff spot for almost all events save home runs, where the number of men on base is minimal.

[1] If I do anymore reinventing of the wheel on this blog, I'll be driving a semi-trailer truck this time next week.
[2] h/t SportsAnalyticsBlog for tweeting this number.
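The claim that the leadoff slot leads off far more innings than any other is easy to sanity-check with a toy simulation (assuming, crudely, a flat 32% chance that any batter reaches base--a simplification, not the real simulator):

```python
import random

random.seed(1)
ON_BASE = 0.32
counts = [0] * 9              # counts[i]: innings led off by slot i+1
for game in range(20000):
    batter = 0                # slot 1 always starts the game
    for inning in range(9):
        counts[batter] += 1
        outs = 0
        while outs < 3:
            if random.random() >= ON_BASE:
                outs += 1
            batter = (batter + 1) % 9

shares = [c / sum(counts) for c in counts]
# shares[0] comes out around 0.2, well above any other slot.
```

The guaranteed inning-one plate appearance is what gives slot 1 its edge; the remaining innings spread their leadoffs across the whole order.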

It's ironic that the sabermetric revolution seems to be responsible for the increasingly common idea that your biggest bat needs to hit third, rather than fourth[3]. Remember all the arguments about where Barry Bonds should bat? But now, third seems to be the slot of choice for the game's top hitters: Cabrera, Votto, Cano, McCutchen, Goldschmidt, Braun... the list goes on. Teams would be better served moving them one slot up or down. And that's not even getting into the double play opportunities afforded to the guy in the 3rd spot. I think it's worth pointing out that the Detroit Tigers have underperformed relative to their weighted runs created (wRC) every single year since Cabrera came over from the Marlins (with the exception of 2009, when they were +1 runs). And not by small margins: usually 20 to 40 runs each year.

[3] That's at least how I remember it. However, looking back at lineups of yesteryear kind of proves me wrong-- historically, the wOBA of the 3rd and 4th spots appears to be about the same through most eras.

Of course, not all of this difference can be attributed to Cabrera batting third--they're a slow team overall, with a negative team "BsR" every year except 2010 (a year they were -19 with respect to wRC)--but as a KC fan, I'll be glad to see him come up in the third spot in 2014.


Is a Team of Average Players an Average Team?

Depending on who you are, you'll have one of three responses to this question:
  1. That's so obvious I don't even know why you're asking it.
  2. That's so obvious I-- well, now that I think about it a little more...
  3. Click on nearest link to Kate Upton bikini pictures[1].
Because I fell into category #2, I felt the need to test this with the simulator. And really what I mean by this question is "Does a lineup of average players produce an average number of runs?" The argument for "yes" has the backing of the stats community---the statistic wRC (weighted runs created) works. It predicts with high accuracy the number of runs a team will produce. And all you need to calculate it is the number of times each player reached base and how: a walk, a double, etc. The statistic doesn't care which player got those hits.

The argument for "no" is that statistics like wRC aren't perfect; there's a dispersion[2] of 25 actual runs for teams that have the same wRC value. And different slots in the lineup matter more than others--- a better hitter in the 2-spot will help the team more than a better hitter in the 8-spot. So if you swap in an above-average player in the two spot and a below average player in the 8-spot, don't you end up with a better team?

Really, this all depends on how you define "average." If you take all qualified players, sort by wOBA, and take whoever is smack in the middle... that's not actually an average player. An average player would have the statistics of the average plate appearance. That means you take every single, double, triple, walk, etc., that happens during an MLB season and divide by the total number of plate appearances. Thus, the extra plate appearances that high-slot players get are folded into the definition of "average":
  • 700 plate appearances
  • 141 strikeouts
  • 109 singles
  • 31 doubles
  • 3 triples
  • 18 home runs
  • 54 walks
  • 6 times hit by a pitch
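As a quick check, the stat line above implies this average player's on-base rate directly:

```python
# On-base rate implied by the average-player stat line above.
pa = 700
times_on_base = 109 + 31 + 3 + 18 + 54 + 6   # hits + walks + HBP
obp = times_on_base / pa
print(round(obp, 3))   # 0.316 -- a league-average-ish OBP
```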
The total runs scored by a lineup of nine clones of this player is identical to the average runs from all lineups created from actual players. So, chalk one up for the statheads.

This implies that most of the variance in team performance at fixed wRC can be attributed to "luck": even though the number of hits-plus-walks is the same, their distribution within innings differs. If an average MLB team reaches base 8 times per game, the fewer innings those baserunners are bunched into, the more runs the team will score. More on that in a later post.
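A toy model makes the bunching effect concrete. Assume (unrealistically) that every baserunner reaches on a single or walk and only scores when forced home, so an inning with k baserunners scores max(0, k - 3) runs:

```python
def inning_runs(k):
    # The bases hold three runners; with only singles/walks, each
    # additional baserunner forces one run home.
    return max(0, k - 3)

spread  = [1, 1, 1, 1, 1, 1, 1, 1, 0]   # 8 baserunners, one per inning
bunched = [4, 4, 0, 0, 0, 0, 0, 0, 0]   # same 8, packed into 2 innings
print(sum(inning_runs(k) for k in spread))    # 0
print(sum(inning_runs(k) for k in bunched))   # 2
```

Same baserunner total, very different run totals--which is exactly the variance wRC can't see.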

This is not related to the shenanigans that managers engage in when constructing lineups. Although lineups shuffle around the same players, the shift in runs created is (mostly) due to the fact that a team will produce fewer baserunners when putting crappy players in the first two spots, and this will be reflected in their wRC.

[1] But really... are there any other kind of Kate Upton pictures?

[2] For those of you not familiar, "dispersion" here is the same as "standard deviation" (which is the square root of variance). It means that about 34% of teams will be up to 25 runs above what wRC predicts, and 34% will be up to 25 runs below what wRC predicts (assuming Gaussian statistics). Converting that into wins, there's over 5 wins separating teams at the high and low of one standard deviation. The remaining 32% of teams will be even farther apart. Just ask Cardinals fans this year.
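The wins conversion in that footnote follows the standard sabermetric rule of thumb of roughly 10 runs per win (an assumption, not derived here):

```python
# Back-of-envelope: convert the wRC dispersion into wins, assuming
# the usual ~10 runs per win.
runs_per_win = 10
sigma = 25                       # run dispersion at fixed wRC
gap_in_wins = (2 * sigma) / runs_per_win
print(gap_in_wins)               # 5.0 wins between +1-sigma and -1-sigma teams
```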

Monday, January 13, 2014

Digging Deeper with the Double Play

This post is essentially a repository for results about the impact of double plays on an offense. These results are helpful for understanding why some lineups are better than others. I refer to some of these results at RoyalsReview, but my technical skills aren't sharp enough to figure out how to get the figures onto the post I wrote there.

The first plot shows the number of double play opportunities (DPO) per slot in the lineup, relative to the average number of DPOs each player sees (thus +0.1 is 10% more opportunities, -0.1 is 10% fewer). Some of the data come from The Book (Tango et al), based on MLB data from 1999-2002; the other results come from my lineup simulator. Slots 2 and 3 face the maximum number of opportunities; slots 1 and 9 face the minimum. Don't pay attention to the "wOBA" lines--they're for another post here.



There's also the matter of how much each double play costs the team. Double plays in the heart of the order matter more than double plays in the 6/7 spots, because there's not a lot of offense coming out of the 8/9 spots. The run value of a double play in the 9 spot is so high because the lineup is about to turn over and get to the (supposedly) good batters at the top of the order.


Combining these two results together, the 3-spot is still the most important slot for double plays, but the 4-spot isn't far behind. My statement that Butler faces too many DPOs in the 4 spot was reductive. In the four-spot, there are slightly more DPOs, but the cost per double play is maximal. The cost per double play in the 3-spot seems strangely low: this is because the third spot is the least important spot of the first five. I know this sounds counterintuitive, but I'm preparing a post with supporting evidence soon (or you can read the chapter on lineups in The Book, which comes to the same conclusion).

Saturday, January 11, 2014

Freakonomics and the Double Play

Double plays are good for a baseball team.

Okay, that's a loaded statement. Double plays are not the cause of a team being better, but they are a bellwether of team performance. You don't have to scratch the surface of baseball statistics very deeply to get to this level of understanding: teams that get more runners on base score more runs. Teams that get more runners on base encounter more double play opportunities, and the GIDP rate doesn't vary all that much from team to team. Thus, teams that score more runs hit into more double plays. This is the same sort of cocktail-party sabermetrics as "teams that leave more men on base win more games" that Malcolm Gladwell would really appreciate. But it's interesting to see it borne out in the data, and with a much higher correlation than I would have predicted.

The figure below shows the number of double play opportunities (DPO) each player sees relative to the average number of DPOs per player per game. A value of 0.1 means 10% more DPOs per game than the average player; -0.1 means 10% fewer. The green squares are taken from The Book, which are based on MLB data (American League only) from 1999-2002. The circles are from 1,000 simulated seasons for different lineups[1]. Each simulated lineup samples from all 2013 players with >90 PA. The error bars indicate the season-to-season dispersion for a fixed lineup.

[1] How you order the players does make some difference in these results. For instance, if I order the players 1-9 from highest to lowest wOBA, the DPOs in the 2-slot increase significantly and the DPOs in the 7/8/9 slots go down a little to make up the difference. Here I've implemented an "MLB-like" lineup where the leadoff hitter is the 5th-best player by wOBA, the 2-5 slots are the best four players in random order, the 6th slot is the 6th-best player, and the bottom of the order are the lowest three wOBAs, also in random order.

Using this lineup, the results are quite consistent with The Book's values. There is significant team-to-team spread in this curve, however. The green dashed line is the mean DPOs for teams in the top 10% of simulated wOBA, and the red dotted curve is for the lowest 10% of teams.

Double play opportunities (DPO) relative to the mean for each lineup slot. The bracket notation "<>" indicates the mean. The error bars on the simulations represent the season-to-season dispersion for the same lineup of players. MLB data are taken from Tango et al.'s The Book.



We can look for this effect in the actual MLB data as well. The top panel compares my simulated results to MLB data from 2005-2013, where the run-scoring environment is essentially the same as what I've set up in my simulations. The correlation coefficient is r=0.11 for the MLB data and r=0.07 for my simulated results. This isn't an overwhelming correlation, but it is there and it is positive.

Extending these data all the way back to 1968 yields the middle panel, where the correlation becomes apparent even to the naked eye. Results for strike-shortened seasons have been rescaled to 162 games. The sidebar shows the r-value for each 9-year chunk I looked at, hitting a maximum of r=0.26 at the height of the PED era. I'll leave a more rigorous investigation of why these dependencies change so significantly with baseball era to a future post (perhaps it's just sample variance), but the bottom panel shows definitively that this correlation exists.[2]

[2] The results there are 'jumpy' because I've binned by sets of 60 teams, sorted by Team Runs, across the x-axis, instead of binning in fixed-width bins of runs. This is a superior way to bin data when there is uneven coverage across the x-axis. Note that the range on the y-axis of this plot is smaller than in the upper two panels.
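A sketch of that equal-count binning scheme (the data here are synthetic stand-ins I made up for illustration, not the actual MLB numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(700, 70, 1200)                 # stand-in for team runs
y = 100 + 0.05 * x + rng.normal(0, 8, 1200)   # stand-in for team GIDP

# Sort by x, then split into bins holding equal counts (60 teams each)
# instead of fixed-width bins of runs.
order = np.argsort(x)
bins = np.array_split(order, len(order) // 60)
bin_x = [float(x[idx].mean()) for idx in bins]
bin_y = [float(y[idx].mean()) for idx in bins]
```

Each bin then carries the same statistical weight, so sparse regions of the x-axis don't produce wildly noisy bin means.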




Please, Ned Yost, don't bat Omar Infante Second


Bat him cleanup.

Hear me out. I'm not going to claim that Infante has what it takes to be a "run producer," as opposed to the "bat control guy" that Ned Yost wants. I'm saying he's neither of these things, and given what the Kansas City Royals have to work with, that's why he's a reasonable choice for the 4-spot.

In a previous post, I described some software tools that I created for simulating baseball games given a lineup of players with known statistics. These stats include batting, baserunning, and base stealing. The code runs through all the various lineup combinations and determines the average runs per game for each. There are 362,880 different orderings of a set of nine players, and running through all of them takes significant computing time. After a few trial runs, my simulations yielded the common-sense outcome that Moustakas/Cain/Escobar should always be relegated to the 7/8/9 slots, preferably in that order[1]. So to make subsequent calculations run faster, I fixed those three players' slots in the lineup. This reduces the number to the 720 possible lineups of the remaining six players.

[1] If you don't agree that this is a logical and obvious solution, I would tell you to stop reading... but your name is probably "Ned" and you're the whole reason I'm writing this post to begin with. So please keep reading, Ned. Humor me.
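The counting above is easy to verify (player names per this post; the enumeration itself is just a sketch of the search space, not the simulator):

```python
import itertools
import math

players = ["Aoki", "Infante", "Hosmer", "Butler", "Gordon", "Perez",
           "Moustakas", "Cain", "Escobar"]

# Every ordering of nine players:
assert math.factorial(9) == 362880

# Fixing Moustakas/Cain/Escobar into the 7/8/9 slots leaves only
# the top six slots to permute:
fixed_tail = players[6:]
lineups = [list(p) + fixed_tail
           for p in itertools.permutations(players[:6])]
print(len(lineups))   # 720
```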

Remember from my previous post that there is no one optimal lineup---my break point for "statistically indistinguishable" in the baseball sense is one run over a 162-game season. For each set of simulations, up to several tens of lineups will make that cut. Below I'll show results for several criteria on player performance. Each figure shows one panel per player, with a histogram indicating the number of times that player hits in each slot across the set of optimal lineups. The order of the panels is the lineup order of the top-ranked lineup.

As inputs I take the statistics for each player summed over the last three years, 2011-2013 (two years for Aoki). This assumption implies mild bounce-back years from both Butler and Gordon, and a slight up-tick from Escobar's dismal 2013 (essentially he'd reproduce 2011). It also assumes that Moustakas and Cain are what they are after 1,500+ plate appearances. Aoki may only be a third-year MLB player, but he's also 32 and isn't likely to surpass what he's done previously. The only exception I make for significant change is Eric Hosmer. Perhaps it betrays optimistic homerism on my part, but it did seem that he turned the corner after the first two months of the season, and his monthly splits thereafter showed consistent above-average production. I assume that 2014 Eric Hosmer is second-half 2013 Hosmer, which would make him the best hitter on the team. In KC there's optimism that Perez is another candidate for improvement--his early numbers were off the charts--so a 3-year average also represents a step up from his 2013. Each player's wOBA is shown in each figure.

It should be noted that this optimization doesn't take into account the lefty/righty platoon splits of the hitters, so these results might be better thought of as optimization against right-handed pitchers, against whom most stats are collected. I also don't implement any hit-and-runs---it is, I think, impossible to collect the proper data for this: what fraction of caught-stealings are blown hit-and-runs? What fraction of hit-and-runs are actually just straight steals? What about hit-and-runs that just didn't work? Also, the code doesn't care about putting multiple left-handers in a row, something I know Yost didn't like about the in-house lineup analysis done last season. I think a strong argument can be made that this shouldn't be a consideration. Yes, a LOOGY can come in and get three straight outs, but why would you implement a sub-optimal lineup for the first 6-7 innings just to get a slight advantage in the 8th? (Most closers are right-handed, so it's just one or two innings.)

It is interesting to look at the results when adding each aspect of offensive play to the simulator. For simplicity, start by considering only what each player does at the plate. Everything after ball-makes-contact is considered to be league average for each player in the lineup. In that case, we have the straightforward task of trying to maximize the number of times your best players bat, as well as the number of times they bat in the same inning. Results are shown in the first figure below.

Results when considering only plate performance. All baserunning is MLB average. The number below each player's name is their weighted on-base average (wOBA), a mainstay in FanGraphs WAR calculations. Although each player can fit into many slots in the lineup, the ordering of the panels is the order of the best overall lineup.
Now, before you start laughing, this isn't the last time you're going to see Butler in the leadoff spot in this post. But taking into consideration only plate performance, sticking Butler/Gordon/Hosmer at the top of the order will produce a lot of runs. Infante, having the lowest wOBA, gets relegated to the six-spot.

Now let's add baserunning to the equation. Here, baserunning means taking the extra base (including scoring from second on a single, which happens ~60% of the time) and base stealing, but no double plays.


Lineup results when including baserunning, with the exception of grounding into double plays.
Hosmer is an above-average baserunner, and combined with his hitting he jumps to the top of the order. Gordon, as before, can fit in almost any slot. But adding baserunning pushes Butler down in many of the best lineups. You can still get good production with him at #1 or #3, if you order the other guys properly. Now let's put double plays into the equation and see what happens.
 

Lineup results when including baserunning and double plays.

Gordon/Hosmer/Aoki/Infante is not a surprising top four, but their order is the opposite of conventional lineup construction. Perez and Butler are the worst baserunners on the team, with Perez having a GIDP rate nearly as high as Butler's (but he's better at taking the extra base). The results for Butler are quite intriguing: either bat him leadoff or bat him sixth. Why? The leadoff spot encounters the fewest double play opportunities of any slot in the lineup, usually by a wide margin. He also has the highest on-base percentage on the team. It'll take two hits or a home run to push him across the plate once he reaches first, but he'll be on first (or second) more often than any other player. If you don't put him there, you don't have much choice but to put him in the sixth spot, where double play opportunities are around league average and his baserunning won't matter because no one behind him is going to drive him in anyway.


Rotochamp has offered its best guess at the opening-day lineup for the Royals (see chart above), putting Aoki/Infante one-two. The general consensus in the media seems to be that the Royals are thinking the same thing. There are three problems with this lineup: (1) the Royals' best hitters aren't reached until the 3rd spot; (2) batting Hosmer fifth reduces both his plate appearances and his RBI opportunities--this slotting alone is a major knock on the batting order; and (3) Butler in the four spot yields double plays and exposes his deficiencies on the base paths. In the table I've also listed the top lineup as well as the best Butler-leadoff lineup for comparison. Taking all this into consideration, the consensus lineup costs the Royals 11 runs over the course of the season, according to the simulations I ran with it.

Eleven runs. More than one win. One win the Royals can't afford to lose.

Billy Butler's Iron Bat and Lead Feet

I'm not here to tell you that Billy Butler is slow. You know that already. You might have seen Joe Posnanski's post about him being the "slowest player in MLB". You might have listed all players on FanGraphs and reverse-sorted them by "Spd" just to see him at the top.[1] You know this already.

[1] As an aside, he's tied for this honor with Kendrys Morales, the player that, by many reports, Dayton Moore is interested in signing to replace him.

I'm here to try to quantify, precisely and accurately, the impact this has on his team's offensive production.

First, a sidebar to discuss Billy Butler's GIDP numbers. In the last three years, Butler has hit into 72 double plays. Although everyone knows how slow Butler is, Ned Yost continues to put him in a position to fail: over that timespan, Butler has faced 400 double play opportunities. Gordon, on the other hand, has faced only 271. Gordon also strikes out more and hits more fly balls than Butler, helping to reduce the number of double plays he hits into. Neither of these aspects of double plays implies that Butler is hurting his team.[2] These are the reasons why I advocate batting Butler either leadoff (no, seriously) or in the 6-spot.

[2] Look here to read about the counterintuitive nature of double plays and how they correlate with overall team performance.

But make no mistake: even without Yost's help, Billy is hurting his team. As the numbers above imply, Butler's GIDP rate is 18%, significantly higher than the MLB average of 11%. Every GIDP costs the team about 0.6 runs[3]. His baserunning is also abysmal. Here's how his baserunning breaks down with respect to each type of situation:

[3] This is the number I get when running identical simulations with and without double plays. This number is significantly higher than what you get from run expectancy (RE) calculations, such as that in Tango et al's The Book, which uses the 24 base/out states to get a value of -0.35 runs per DP. Perhaps this is just a limitation of the RE method, but The Book also lists the cost of a caught-stealing as -0.47 runs (Table 7). A double play, practically by definition, must cost a team more than a caught stealing.

Fraction of the time a player takes the extra base. The MLB averages will vary a few percent year-to-year, and these numbers are 2013.

Essentially, he's 35% of the average baserunner. Putting him anywhere in the 2-3-4 region of the lineup kills the team. To determine the impact this has, I reran my lineup simulator, setting all of Butler's baserunning parameters to league average. The optimal lineups then all put Butler in the 1-2-3 region of the lineup, and those lineups score an extra 17.2 runs per season relative to the optimal lineups that account for his true baserunning capabilities (which, it should be noted, assume he's not going to get any slower as he passes his age-28 season).
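As a back-of-envelope cross-check using only the numbers quoted in this post, Butler's excess double plays alone account for roughly a third of that gap; the rest is station-to-station baserunning:

```python
# Runs per season lost to Butler's *excess* GIDPs, using the
# figures quoted above.
dpo_per_year = 400 / 3         # DP opportunities per year, 2011-2013
excess_rate = 0.18 - 0.11      # Butler's GIDP rate minus MLB average
run_cost = 0.6                 # runs per GIDP, from the simulations
excess_runs = dpo_per_year * excess_rate * run_cost
print(round(excess_runs, 1))   # 5.6
```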

I love Country Breakfast. I love the barbeque sauce. I love that he's the best pure hitter on the team. I hate the Trolls who think he needs to "produce more runs". But -17.2 runs, it should be noted, is a reduction of two wins in the standings. I don't love that.



Friday, January 10, 2014

Playing a Baseball Season on my Laptop

Even though this post will be of limited interest (at best), it is necessary for me both to document what I've done in my simulations and to prove that the calculations are robust and pass various tests. I'm going to document one straightforward test here.

The code works just like an actual game (only without pitching): each game has nine innings, each inning has three outs, and each batter can either reach base or create an out. Outs are either strikeouts or in-play outs. Reaching base can happen by walk, hit, or error. The rate at which a player does all these things depends on stats that are read in from an input file. Usually I use real players with stats taken from FanGraphs, but sometimes it's interesting to play around with things. Players on base advance on balls in play, depending on whether the out is a ground ball or a fly ball (which are also part of the input statistics).
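A minimal sketch of that per-plate-appearance logic. The rate names and the bare-bones baserunner handling are illustrative assumptions; the real simulator tracks individual bases, advancement, and the rest:

```python
import random

def plate_appearance(rates, rng):
    """Draw one PA outcome from a dict of per-PA probabilities."""
    r, cum = rng.random(), 0.0
    for outcome, p in rates.items():
        cum += p
        if r < cum:
            return outcome
    return "in_play_out"

def simulate_inning(lineup, start_batter, rng):
    """Crude inning loop: runners are only counted, not placed on bases."""
    outs, runs, batter, runners = 0, 0, start_batter, 0
    while outs < 3:
        outcome = plate_appearance(lineup[batter % 9], rng)
        if outcome in ("strikeout", "in_play_out"):
            outs += 1
        elif outcome == "home_run":
            runs += 1 + runners
            runners = 0
        elif runners == 3:
            runs += 1          # bases full: lead runner forced home
        else:
            runners += 1
        batter += 1
    return runs, batter
```

Even this stripped-down version produces a plausible runs-per-inning figure when fed league-average-ish rates, which is the kind of sanity check the rest of this post is about.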

The code takes baserunning into account in four ways:
  1. Taking the extra base. There isn't a whole lot of player-to-player variance in the number of times a player takes the extra base relative to the league average, but it matters enough that it needs to be taken into account. There is a tail of bad baserunners out there who can have a significant impact on their teams.
  2. Relatedly, double plays. The league-average rate of double plays per DP opportunity was 11% in 2013, but this depends a lot on ground ball rate and strikeouts, as well as the number of opportunities a player is given to hit into one. Mike Moustakas actually leads the Royals in staying out of the DP given the number of chances he's had, but that's because it's hard to double-up the guy on first when fielding a pop-up. So the code uses the fraction of times a ground-ball out in a DP situation yields a GIDP (as well as the player's GB rate and the average BABIP for grounders). This is where I had to go to Retrosheet.
  3. Base stealing. Not that this really matters much, but players above a given threshold of stolen base attempts are flagged as "base stealers," and they attempt and succeed at their previous rate.
  4. Reaching base on error. (Not really baserunning per se, but I'll just stick it here anyway.) This is a statistically significant source of baserunners, and it's frustrating that FanGraphs does not track it nor include it in its wOBA calculation, even though Tango et al's The Book puts it in theirs. This is another thing that depends on ground ball rate (since this is where the errors happen) and the speed of the batter. Right now I only have this depending on GB%, but I should add considerations for player handedness and overall speed.

The test I'm presenting here is that I reproduce the actual run expectancy (RE) of each of the 24 possible base/out states. This is presented in the very first chapter of The Book, and it forms the basis of much of the analysis presented thereafter and of many advanced statistics. The bases can be empty or occupied in various ways:
  1. Bases empty.
  2. Runner on first.
  3. Runner on second.
  4. Runner on third.
  5. First and second.
  6. First and third.
  7. Second and third.
  8. Bases loaded.
And those 8 states can happen with 0, 1, or 2 outs, giving 24 total states. The RE is defined as the average number of runs scored from that point in the inning until the end. The RE value for (0 outs, bases empty) is simply the average number of runs scored per inning. The Book was written using statistics from the height of the steroid era (1999-2002), so its numbers are a bit higher than we see in MLB today. But at that time the average team scored just under 5 runs a game, or about 0.55 runs per inning. So that's the first RE value. If there are 0 outs and a man on first, the RE goes up. Man on second, it goes up again. I think you get the idea, and you can see all the values from The Book (table 1) in the figure below.
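For concreteness, enumerating the 24 states and tabulating RE from a pile of observations takes only a few lines of Python. The numbers fed in at the bottom are toy values for a quick sanity check, not The Book's:

```python
from collections import defaultdict
from itertools import product

# The 24 base/out states: each of first/second/third occupied or empty,
# crossed with 0, 1, or 2 outs.
STATES = [(first, second, third, outs)
          for first, second, third in product((0, 1), repeat=3)
          for outs in (0, 1, 2)]

def run_expectancy(observations):
    """observations: (state, runs_scored_from_here_to_inning_end) pairs,
    one per plate appearance, taken from game logs or a simulator.
    Returns the average runs-to-end-of-inning for each observed state."""
    totals, counts = defaultdict(float), defaultdict(int)
    for state, runs in observations:
        totals[state] += runs
        counts[state] += 1
    return {s: totals[s] / counts[s] for s in counts}

# Toy check: two 0-out, bases-empty plate appearances, one from an inning
# that eventually scored a run and one that didn't, give RE = 0.5.
table = run_expectancy([((0, 0, 0, 0), 1.0), ((0, 0, 0, 0), 0.0)])
```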

To simulate this, I took the MLB numbers of all players from 1999-2002 who had more than 200 plate appearances over that span and created 10,000 random lineups (batting order sorted by wOBA---I have also tried to make "MLB-like" lineups, but it doesn't change the results enough to matter for this test), then ran them through the simulator. Essentially, there is one free parameter in the code: getting the first RE number---the mean runs scored per inning---right. This is where the cutoff in plate appearances comes in: the lowest-PA players are all replacement level (or worse), and bringing more of them into the simulated lineups lowers the overall offense of my virtual league. So once I figure out which players to use to get the right mean number of runs, I can look at the rest of the states.
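The lineup generation itself is simple. A minimal sketch of the wOBA-sorted random lineups, with a made-up player pool standing in for the real 1999-2002 player files, might look like:

```python
import random

def random_lineup(player_pool, rng):
    """Draw nine distinct players and bat them in descending wOBA order."""
    nine = rng.sample(player_pool, 9)
    return sorted(nine, key=lambda p: p["woba"], reverse=True)

# Hypothetical pool; real entries would carry full per-PA rates, not just wOBA.
pool = [{"name": "player%d" % i, "woba": 0.250 + 0.005 * i} for i in range(30)]
lineup = random_lineup(pool, random.Random(0))
```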


The results are quite encouraging. Additionally, I should point out that the code produces the right number of runs with the right number of hits, walks, errors, double plays, sacrifice flies, etc. This is an important check as well: if my virtual league scored the same runs as MLB did in this era but had different batting statistics, something would be wrong. I'll give more rigorous tests in the future, once I am infused with the desire to write more boring posts.

Thursday, January 9, 2014

Wisdom and Dogma in "Lineups Don't Matter"

For a community that originated the phrase "lineups don't matter", the baseball stats community certainly seems to talk about them a lot. Part of this is because this subject is low-hanging fruit for making numbers-based arguments; it's easy to apply simple baseball statistics to the batting order. Part of this is simple baiting by the baseball establishment, who still do things like slotting Alcides Escobar #2 155 times (2012 & 2013), Jason Kendall 70 times (2010), and Willie Bloomquist 76 times (2009).

And it's not just the Kansas City Royals, my hometown team. Elvis Andrus and his .677 OPS logged 468 ABs in the two-hole for the Rangers this year. JJ Hardy cobbled together a slash line of .236/.281/.380 in 631 ABs for Baltimore in 2012. Chone Figgins limped to a .640 OPS in 590 ABs for the Mariners in 2011. All of these players have low strikeout rates and high contact rates, and all of them hurt their teams by coming to the plate more often than seven other players. Older-school baseball managers cannot help themselves when given the opportunity to give away plate appearances in the two-hole. Perhaps they have read this WikiHow article on constructing a lineup.

In fact, this entire blog is the brainchild of the free-agent contract signed by Omar Infante to play second base for the Royals. After sub-replacement flotsam had soiled the batter's box for too many years to count, this was a legitimate upgrade. Infante, however, has a subcutaneous RFID tag that transmits "bat control guy" to the cell phones of old-school managers and GMs[1]. Thus, immediately after the signing, the common assumption was that Infante would bat second, after Norichika Aoki. So I took it upon myself to determine exactly how much this adherence to conventional wisdom would cost the team.

[1] Assuming they have cell phones.

Naturally, I then spent my entire Christmas break writing a software package to construct Monte Carlo simulations of the forthcoming Royals season.



The details of these simulations---many of which matter much, much more than I ever would have expected---I will leave to another post. But the simulations take as input a set of stats for nine players, most of which can be found on FanGraphs, though for some I had to go to Baseball-Reference or even Retrosheet. The code assumes that the per-plate-appearance rates in those files represent the intrinsic ability of each player. From one season to the next, the actual numbers of singles, doubles, home runs, etc., will vary according to Poisson statistics given the sample size of each season. Whether this is a bad assumption is hard to test, but I can restrict things to only seasons where the aggregate team wOBA is the same year-to-year.[2]
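A minimal sketch of that assumption---treating each plate appearance as an independent draw from the player's intrinsic rates, so that counting stats fluctuate naturally from season to season---could look like this (the outcome names and rates below are made up for illustration):

```python
import random

def simulate_season(rates, pa, rng):
    """rates: per-PA probabilities for each non-out outcome; whatever
    probability is left over is an out. Returns event counts for one
    simulated season of `pa` plate appearances."""
    outcomes = list(rates)
    counts = {o: 0 for o in outcomes}
    counts["out"] = 0
    for _ in range(pa):
        r = rng.random()
        acc = 0.0
        for o in outcomes:
            acc += rates[o]
            if r < acc:
                counts[o] += 1
                break
        else:
            counts["out"] += 1
    return counts

# Hypothetical hitter; rerunning with different seeds shows the
# season-to-season scatter in the counting stats.
season = simulate_season({"single": 0.15, "double": 0.05,
                          "home_run": 0.03, "walk": 0.08},
                         600, random.Random(1))
```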

[2] I'll note that, before beginning this exercise, I was partial to OPS for assessing player ability, but after scratching the surface of advanced statistics it is clear that wOBA is the optimal single statistic for quantifying a player's offensive impact. Quite simply, the value of wOBA is proportional to the number of runs created per plate appearance. If they didn't apply that silly "wOBA scale" to make the numbers look more like on-base percentage values, it would be exactly the number of runs created per plate appearance.
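For reference, the wOBA computation is just a weighted sum of on-base events divided by plate appearances. The weights below are approximately the 2013 FanGraphs values---they drift a little year to year, so check the current "guts" page before trusting them:

```python
def woba(ubb, hbp, singles, doubles, triples, hr, ab, sf):
    """uBB = unintentional walks. Denominator is AB + uBB + SF + HBP."""
    num = (0.69 * ubb + 0.72 * hbp + 0.89 * singles
           + 1.27 * doubles + 1.62 * triples + 2.10 * hr)
    return num / (ab + ubb + sf + hbp)

# A hypothetical 600-PA season:
x = woba(ubb=50, hbp=5, singles=100, doubles=30, triples=3, hr=20,
         ab=540, sf=5)
```

Divide out the wOBA scale (roughly 1.25 in recent seasons) and you recover runs created per plate appearance, which is the footnote's point.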

To experienced stats people, much of what I ended up doing was classifiably "reinventing the wheel". But the general approach laid out in The Book, where each event has a given run value that varies with position in the lineup, is complementary to the approach taken here. When restricted to the same set of considerations, the Monte Carlo approach and the Run Expectancy calculations (which I'll call RE) should get the same answer. But the Monte Carlo approach is easier to tailor to a specific set of conditions, as well as being more flexible for different types of conditional events. I will admit that the RE approach takes significantly less CPU time.

In one sense, I wasted a few weeks of CPU time to come to the same conclusions that a SABR-savvy reader would already know: slotting your best players on top gets them 5-10% more PAs per season, and bunching your best hitters together optimizes the runs they can produce. There are additional subtleties, of course, but these are probably the two prime directives of lineup construction. In the case of the 2014 Royals, those subtleties end up making all the difference.


One of the things that's difficult to encapsulate in the RE approach is baserunning (outside of double plays). The Monte Carlo code takes into account baserunning in four ways: (1) taking the extra base: some players are better than others, and it can vary enough to matter in the bottom line. (2) Relatedly, double plays are important. Player-to-player variation here can be extreme, as any Royals fan has come to grips with over and over and over again. (3) Base stealing. Not that this really matters much, but players above a given threshold of stolen base attempts are flagged as "base stealers" and they attempt and succeed at their previous rate. (4) Reaching base on error. This is a deceptively important aspect of the game, and is most certainly not a random occurrence. 


For 9 players, there are 362,880 different orderings ("9!" for the undergraduate statistics students). And I ran them all. Here's what I found out:
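Enumerating every order is a one-liner with itertools (permutations, strictly speaking, since batting order matters):

```python
from itertools import permutations
from math import factorial

players = ["player%d" % i for i in range(1, 10)]  # any nine names
orders = list(permutations(players))
assert len(orders) == factorial(9) == 362880
```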

Result #1: There is no such thing as the "optimal lineup". Well, strictly speaking that statement is not true. After determining the runs/game for each of the possible lineups, one lineup will be on top. The difference with respect to the second-best lineup, however, is likely to be small enough to be inconsequential. For a typical sampling of today's players, there are about 20-40 different lineups that will yield the same production to within 1 run over a 162-game season. The difference in runs from the best to the 100th-best lineup is about 3 runs. The difference with respect to the 700th-best lineup is about 9 runs, and this is where we hit the value of one win according to FanGraphs. Now, one win is nothing to sneeze at---just ask Cleveland and Detroit fans about last year. And if you're an offense-challenged team like the Royals, every run counts.

Result #2: The season-to-season variance of the same lineup is large. Take those same 9 guys and play a thousand seasons with them in the same exact order, and the season-to-season dispersion is 38 runs. Part of this is statistical fluctuation in the performance of each hitter. In each season, players will be slightly above or below their natural averages; in some seasons, more than half of your hitters will hit above their true levels, and thus the team will do better. However, when culling the subset of seasons that produce the same aggregate team wOBA (to within +/- 0.001), the dispersion is still 19 runs. Understanding what's happening in those seasons is a subject for further study (although the quick-and-dirty analysis is "luck").

For a given lineup, you must make sure that you have a robust answer for the number of runs per game it will score. The number of possibilities in a given game is staggering, and in order to converge to the "right" answer you have to run enough simulations to span the entire space of possibilities (and do so such that their relative occurrences are correct). Since I don't really care about what happens below the level of 1 run over 162 games, a nominal "accuracy" would be 1/162 ~ 0.006 runs/game. To be rigorous, I set a convergence threshold of 0.002 runs/game. Achieving that took 50 million game simulations. For each lineup. In this context, the analysis done in Bill James' day is even more impressive.
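That convergence requirement can be sanity-checked with the usual standard-error-of-the-mean argument. Assuming something like 3 runs/game of per-game scatter (an assumption for illustration, not a measured value), the naive game count comes out as:

```python
def games_needed(sigma_runs_per_game, tol):
    """Independent-games estimate: the error on mean runs/game shrinks as
    sigma / sqrt(N), so N must exceed (sigma / tol)**2."""
    return int((sigma_runs_per_game / tol) ** 2)

print(games_needed(3.0, 0.002))  # 2250000
```

The naive estimate is a couple million games, well under the 50 million quoted above; the gap presumably reflects a larger true per-game scatter or a stricter internal criterion.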

Result #3: If you try hard enough, you can do some really dumb things with a lineup. And slotting a high-contact hitter who rarely reaches base second in the order is one of those things. Taking things to the extreme, the difference between the overall best and worst lineups is about 20 runs. As for the exact cost of batting Omar Infante second, I'll get to that soon.

At the end of the chapter on lineups in The Book there is a statement that arranging your lineup properly can save you 50 runs over the course of a season. I think this is a significant overstatement, given both the Monte Carlo results and my own attempts to implement the method outlined in that chapter. But lineups do matter, some of the considerations are counter-intuitive, and hopefully the analytics guys at One Royal Way can continue their good work---Gordon as leadoff!---and convince Ned Yost to think outside the box score.