CurveBall

Thursday, May 1, 2014

Omar Infante is the Royals MVP so far

This entire website was based on the idea of proving that Omar Infante, a perfectly respectable MLB player, should not be featured in the second slot in the lineup, which is the second most important spot. But after only 26 games, it is clear that Omar brings something to the Royals that they sorely lack, and he has been the MVP of the team so far. I can only hope that the other players try to emulate him in so many ways:

He hits the ball in the air. Although his HR/FB ratio is only ~6%, he has a stellar line drive percentage at 24.4%, which leads the team by a wide margin. Plus, he actually does have a home run, which is more than Eric Hosmer, Billy Butler, and Lorenzo Cain can say. His ISO of 0.128 isn't going to win any awards, but it's more than respectable. And for a second baseman on a team with an overall ISO barely north of 0.100, he's a power hitter.
He has some semblance of plate discipline. His walk rate is 7.5%, slightly below league average but better than the team average of 7.1%. But more importantly, he only swings at pitches outside the strike zone 23% of the time. Only Jarrod Dyson has better self-control (14.3% ... !!!). League average is 29% and KC is sitting, surprisingly, only slightly higher than that at 30%.
His approach maximizes the value of a low-strikeout hitter. I've been less than impressed with KC's penchant for having the lowest K-rate in the league (since it has never translated into any run production), but Infante's 8% K-rate is once again the best on the team by a wide margin. The reason the Royals' low K-rate doesn't translate into offensive production (much discussed on FanGraphs) is that they swing at lots of junk and make contact with ALL of it. This is the major reason the Royals have the 8th-highest GB/FB ratio in MLB at 1.52. Infante has a high contact rate when he swings outside the zone, but since he does it far less often than his teammates, he's able to post a wRC+ of 105 that isn't BABIP-fueled (his BABIP is 0.284).
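For anyone who hasn't run into these stats before, here is a minimal sketch of how two of them are computed from raw counts, using the standard definitions: ISO is SLG minus AVG, and the chase rate is the share of pitches outside the zone that a batter swings at (FanGraphs' O-Swing%). The function names are my own, and the counts in the example are made-up placeholders, not Infante's real 2014 totals.

```python
def isolated_power(at_bats: int, doubles: int, triples: int, home_runs: int) -> float:
    """ISO = SLG - AVG, which simplifies to extra bases per at-bat."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


def chase_rate(swings_outside_zone: int, pitches_outside_zone: int) -> float:
    """O-Swing%: share of pitches outside the strike zone that the batter swings at."""
    return swings_outside_zone / pitches_outside_zone


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(round(isolated_power(at_bats=100, doubles=6, triples=1, home_runs=1), 3))  # 0.11
    print(round(chase_rate(swings_outside_zone=46, pitches_outside_zone=200), 3))  # 0.23
```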

Last, and I'm sure we all feel this way, it is so refreshing to see actual major-league play from second base in Kansas City. Infante doesn't lead the team in WAR, but given that our second basemen hadn't posted a positive WAR in years before his arrival, he gets an extra boost from just being competent in all areas of his game. His fielding, baserunning, and hitting have all been above average.

It's been reported that Escobar has tried to emulate Infante's approach to baseball this year. That doesn't mean it's true, or that there's actual cause and effect, but Esky has recaptured some of his 2012 magic (BABIP-inflated though it may be, he does have a 7.5% walk rate and the second-highest LD% among team regulars). All I can hope for is that Perez, and even Hosmer, sees his teammates' success and wants to tag along as well.

Did KC management seek him out because of his approach at the plate? It's so opposite to the approach they seem to have fostered in all their home-grown talent (and many other FA signings) that I have to conclude they simply looked at a ranked list of free-agent second basemen, ordered by reputation, and thought, "Well, we can't sign Cano, so who's next after him?" Hey, even a stopped clock is right twice a day....

Thursday, April 24, 2014

I can't take it anymore

I'm watching Corey Kluber (that's Perennial Cy Young Candidate Corey Kluber to you, friend) go through 6 innings of baseball with the Royals, expending only 62 pitches. Almost all of the outs have been on the ground. The one baserunner we got in scoring position was immediately thrown out in TOOTBLAN fashion.

Let's count the ways that this team is constructed against the current of modern baseball thinking.

Their batters do not draw walks. The negative value of this is self-explanatory.
Their rotation (3/5 of it, at least) does not get strikeouts. In fact, there was a multi-game stretch last year where Jeremy Guthrie had not gotten a batter to swing and miss on any pitch thrown.
Their batters hit the ball on the ground. There is a known split such that fly-ball batters have a small but definite advantage.
Their pitchers tend to get fly balls. Kauffman does suppress home runs, but not extra-base hits. Plus, I do believe that 81 games are usually played on the road.
Their batters do not hit for power.
Their pitchers are often selected from a list of players entitled "homer-prone". And that's even with Kauffman's help.
Their batters do not take pitches, and they swing at things outside the strike zone. (This is related to, but slightly different from, the first point: their contact rate on balls outside the strike zone is the highest in all of baseball by far, which leads to weakly hit grounders.)

Honestly, if one were going to construct a roster from scratch, trying to adhere to some very general and basic principles, one would make a lineup that takes pitches, walks, and hits the ball in the air. One would construct a rotation that induces ground balls and has a high strikeout rate. The roster that Dayton Moore has put together has not just ignored these well-known principles; it actively cultivates the opposite of them.

No wonder the Royals are the worst offensive club in baseball that is actually trying to win.

I fail to understand why the characteristics they value in their fantastic bullpen (a FIP-blessed bunch if ever there was one) are not valued in their rotation.

Tuesday, February 4, 2014

Regression and the Royals: More Overdue than the Heyward Fault

As part of a fanpost that I wrote for Royals Review (that no one read, apparently), I crunched some numbers about regression for defense and bullpens that had very good showings in the previous year. The results were surprising (and depressing) on both fronts. I'm moving this here just to make sure it's put somewhere I have easy access to. The post went like this:

Bullpens do not post 7 WAR seasons 3 years in a row[*]. Defenses vary year to year[**].

[*] That's a statistically factual statement, not an assertion. Since 1990, no team's bullpen has posted even two consecutive +7 WAR seasons, except the Royals of the last two years. We could all smell a little stink coming off Crow and Collins towards the end of the season, and Yost sensed it too, given their usage down the stretch. So the health of our pen will depend on keeping new blood flowing through the system. Donnie Joseph. Louis Coleman. Please step forward. John Rauch, Guillermo Mota, and Brad Penny: Do not pass Go.

[**] Since DRS was created, 29 out of 30 teams that posted a +50 DRS regressed the next year. And I mean regressed A LOT. The average regression was -58 runs. Perhaps this implies that DRS is garbage that doesn't reflect the reality on the diamond. And to reinforce that notion, the 2013 Royals and their +93 DRS registered as totally average in defensive efficiency rating, as per Baseball Prospectus, turning 71% of balls in play into outs. I loved reading about how great our defense was last year, but I'm hoping that it was actually overrated.
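Neither check in the footnotes above is complicated once the data are in hand. Here's a minimal sketch of both, assuming bullpen WAR keyed by year and DRS keyed by (team, year); the function names and the toy numbers in the usage lines are hypothetical, and I'm treating "+50 DRS" as 50 or more.

```python
from statistics import mean


def longest_streak_at_or_above(war_by_year: dict[int, float], threshold: float = 7.0) -> int:
    """Longest run of back-to-back seasons with bullpen WAR >= threshold."""
    longest = current = 0
    last_qualifying_year = None
    for year in sorted(war_by_year):
        if war_by_year[year] >= threshold:
            # Extend the streak only if the previous season also qualified.
            current = current + 1 if last_qualifying_year == year - 1 else 1
            last_qualifying_year = year
            longest = max(longest, current)
        else:
            current = 0
            last_qualifying_year = None
    return longest


def drs_follow_up(drs: dict[tuple[str, int], float], cutoff: float = 50.0) -> tuple[float, float]:
    """For team-seasons with DRS >= cutoff that have a next season on record,
    return (share that declined the next year, mean year-over-year change)."""
    changes = [
        drs[(team, year + 1)] - value
        for (team, year), value in drs.items()
        if value >= cutoff and (team, year + 1) in drs
    ]
    if not changes:
        return 0.0, 0.0
    declined = sum(1 for change in changes if change < 0)
    return declined / len(changes), mean(changes)


if __name__ == "__main__":
    # Toy data only.
    print(longest_streak_at_or_above({2011: 5.2, 2012: 7.3, 2013: 7.8}))  # 2
    print(drs_follow_up({("A", 2010): 60, ("A", 2011): 10,
                         ("B", 2010): 55, ("B", 2011): 70}))  # (0.5, -17.5)
```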

Thursday, January 30, 2014

The Immovable Object: 100 Years of MLB lineups

All this talk about lineups has gotten me interested in how thinking about lineups has changed over the course of baseball's history. Turns out, Darwinian evolution does not seem to apply in the insular world of professional baseball. Just looking at wOBA as a function of lineup position in various years shows how little lineup construction has changed in the last 100 years.

In the above plot, the horizontal dashed line is the league-average wOBA, so this is showing player performance relative to the league's overall performance. Each year is shifted up by 20 points to keep all the plots from lying on top of each other. Leadoff and 2nd-hole players are tightly correlated with league average. Third and cleanup are the big bats, and the rest of the lineup degrades monotonically. You can see the advent of the DH in the interval between 1964 and 1976.

To see just how much these numbers have (or haven't) changed since 1913, take a look at wOBA and OBP as a function of year for the first four lineup positions:

OBP as a function of year for different slots in the lineup. The league OBP is shown in the bottom panel. The thick solid lines, showing OBP relative to league OBP for slots 1-4, are boxcar smoothed over a 5-year window. The individual yearly results are shown with the thin solid curves of the same color.

Same as the OBP figure above, but now for weighted on-base average (wOBA).
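The "boxcar smoothing" in these captions is just an unweighted moving average. A minimal sketch of a centered 5-year version follows; how the plots handled the first and last couple of years isn't stated, so truncating the window at the ends is my assumption, as are the helper names.

```python
def boxcar_smooth(values: list[float], window: int = 5) -> list[float]:
    """Centered, unweighted moving average over `window` points.

    Near either end the window is simply truncated (an assumption; padding
    or dropping the edge years would be equally reasonable choices).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def relative_to_league(slot_values: list[float], league_values: list[float]) -> list[float]:
    """Slot OBP (or wOBA) minus the league figure for the same year."""
    return [slot - league for slot, league in zip(slot_values, league_values)]


if __name__ == "__main__":
    print(boxcar_smooth([1.0, 2.0, 3.0, 4.0, 5.0]))  # [2.0, 2.5, 3.0, 3.5, 4.0]
```

In practice you would compute `relative_to_league` for each slot first and then smooth that yearly series, which matches the thick-line/thin-line split described in the caption.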

If you read this blog (both of you), you know that the leadoff spot is the most important spot in the lineup. The third spot, surprisingly, is the least important of the top 5. But for 100 years the leadoff batter has stayed, on average, within a narrow range of the league average. In fact, the era when the leadoff hitter had his highest wOBA was the early 1910s.

The idea that any batting event (walk, double, triple, etc.) is more valuable in the leadoff spot than in any other spot (once adjusted for plate appearances) is a fairly new concept. Getting on base, one would think, was an antecedent to that notion, something that came to the fore in the early years of the Bill James revolution.

But getting on base (in fact, taking pitches and working the walk) was a very early requirement for a good leadoff hitter. Searching back issues of the New York Times starting in 1900, the earliest mention I find of "leadoff man" is from July 16, 1919, after the Yankees manager decided to shake up his lineup with his team mired in a slump:

Roger Peckinpah was put at the head of the list in place of Sam Vick. The latter has not been getting on the bases often enough for a leadoff man…. His trouble was inability to wait out the pitchers. Sam wanted to smack the ball every time he had a chance.

And later that same year, the Times had an article about the Washington Senators manager predicting that the Chicago White Sox would beat the Reds to win the "world's series":

"When I pick the White Sox to win the world's series [sic], don't think I have just had a guess… Just take the batting order of the Chicago outfit. Nemo Leibold is a leadoff man of great ability. He is hard to pitch to and has a good eye. If the balls are bad he won't take a swing at them."

And indeed, Nemo Leibold had a .404 on-base percentage in 1919, roughly 80 points higher than the league average, with a 14% walk rate.

But after 1920, both wOBA and OBP of leadoff hitters declined and, other than a brief period around 1990, never again reached the same levels. The grumbling about Joey Votto not realizing that a sacrifice fly is better than a walk shows that baseball thinking hasn't merely failed to evolve; it has devolved. Votto may not be a leadoff hitter, but the argument still applies.

My purpose in looking into these results is to find how many runs baseball teams have wasted by putting middling players at the top of the order. More on this later.

Monday, January 27, 2014

What is the most important slot in the lineup?

What was your first thought? Mine was that it had to be either the second or third spot, and I think most analytically inclined (but maybe not analytically experienced) observers would say the same thing. As with most of these ideas, how you frame the question is as important as how you reach the answer. To me, the most objective phrasing of this question is the following:

If you had a lineup of average players (which, as I showed in an earlier post, produces an average number of runs) and you could make one of those players 10% better, which one should you pick?

By 10% better, I mean that the rate at which they reach base in all forms except error (walks, hits, hit-by-pitch) is increased by 10%. Strikeouts stay the same. Ground ball to fly ball ratio is the same, as is baserunning. This is actually a substantial increase in player ability: an increase from a wOBA of 0.330 to 0.363 is the transition from a slightly-above-average player to a near All-Star caliber hitter. But it's easier to see the difference when making the shift larger. Over the course of a 162-game season, how many extra runs (over the fully-average team) would this new lineup score? How many would they lose if the player were made 10% worse instead?
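Here's a minimal sketch of what "10% better" means under the definition above: scale every way of reaching base except errors by 1.10, hold plate appearances and strikeouts fixed, and take the extra times on base out of the non-strikeout outs. The baseline counts are the 700-PA average-player line from the "Is a Team of Average Players an Average Team?" post, and the wOBA weights are placeholders roughly in line with published 2013 linear weights; both, and the helper names, are assumptions for illustration. Because wOBA is linear in these counts, the boosted wOBA comes out exactly 10% higher whatever weights you pick, which is where 0.330 to 0.363 comes from.

```python
ON_BASE_EVENTS = ("BB", "HBP", "1B", "2B", "3B", "HR")

# Placeholder wOBA weights, roughly 2013-era; the 10% result doesn't depend on them.
WEIGHTS = {"BB": 0.690, "HBP": 0.722, "1B": 0.888, "2B": 1.271, "3B": 1.616, "HR": 2.101}

# Baseline: the 700-PA "average player"; other_outs = PA - SO - times on base.
BASELINE = {"PA": 700, "SO": 141, "BB": 54, "HBP": 6, "1B": 109, "2B": 31,
            "3B": 3, "HR": 18, "other_outs": 338}


def boost_batter(counts: dict[str, float], factor: float = 1.10) -> dict[str, float]:
    """Scale every on-base event by `factor`; PA and SO stay fixed, and the
    extra times on base are removed from the non-strikeout outs."""
    boosted = dict(counts)
    for event in ON_BASE_EVENTS:
        boosted[event] = counts[event] * factor
    extra = sum(boosted[e] - counts[e] for e in ON_BASE_EVENTS)
    boosted["other_outs"] = counts["other_outs"] - extra
    return boosted


def woba(counts: dict[str, float]) -> float:
    """Simplified wOBA: weighted on-base events per plate appearance."""
    return sum(WEIGHTS[e] * counts[e] for e in ON_BASE_EVENTS) / counts["PA"]


if __name__ == "__main__":
    better = boost_batter(BASELINE)
    print(round(woba(better) / woba(BASELINE), 6))  # 1.1
    print(round(0.330 * 1.10, 3))  # 0.363
```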

I put this scenario through the lineup simulator and reached the following results:

Runs gained (or lost), when making one batter in a lineup of equal batters 10% better (or worse) than the other 8. Numbers are aggregates over a 162-game season.

Two things jump out at you: (1) putting the boost into the leadoff spot creates the most runs relative to all other lineup spots, and (2) the 3rd spot in the order is actually the least important spot of the top 5. Now, the differences between these spots aren't especially significant, but this is mostly an academic exercise anyway, since the "average lineup" doesn't exist in reality. But even with those caveats, it is an intriguing result.

These results are essentially echoed in the chapter on lineup construction in The Book[1]. The value of the third spot in the lineup is attenuated by the fact that this spot comes up with nobody on and 2 outs more than any other spot, when the run expectancy of any event is at a minimum. One might argue that my "objective approach" to defining this problem inflates the value of the leadoff spot, since the guy in the 9th slot will reach base more than a typical bottom-feeder in MLB (especially in the NL), leading to more RBI opportunities than would normally exist. However, the leadoff hitter actually leads off an inning in ~41% of his plate appearances[2] (which translates into 19% of all innings, nearly twice as many as the next-most-frequent batter, the 4th slot), so what happens in the ninth slot isn't critical. Additionally, the run expectancy + Markov chain approach in The Book comes to pretty much the same conclusion: once the number of PAs is factored in, run values for almost every type of batting event are maximal in the leadoff spot, the exception being home runs, since the leadoff hitter bats with the fewest men on base.

[1] If I do any more reinventing of the wheel on this blog, I'll be driving a semi-trailer truck this time next week.
[2] h/t SportsAnalyticsBlog for tweeting this number.
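The 41%/19% numbers are easy to get a feel for with a toy simulation. This is a minimal sketch, not my lineup simulator: it assumes every plate appearance is either an out or a time on base with a fixed probability (0.32 here, an assumption), with no double plays, no outs on the bases, and exactly nine innings per game, and it counts how often each slot leads off an inning. Under those assumptions slot 1 comes out well ahead of everyone else because it is guaranteed the first inning, but don't expect the exact percentages to match the real-world figures.

```python
import random


def leadoff_shares(on_base_prob: float = 0.32, games: int = 20_000,
                   innings: int = 9, seed: int = 1) -> tuple[list[float], list[float]]:
    """Return (share of all innings led off by each slot,
    share of each slot's plate appearances that led off an inning)."""
    rng = random.Random(seed)
    led_off = [0] * 9
    plate_appearances = [0] * 9
    for _ in range(games):
        batter = 0  # index 0 is the leadoff slot
        for _ in range(innings):
            led_off[batter] += 1
            outs = 0
            while outs < 3:
                plate_appearances[batter] += 1
                if rng.random() >= on_base_prob:  # this PA is an out
                    outs += 1
                batter = (batter + 1) % 9
    total_innings = games * innings
    inning_share = [n / total_innings for n in led_off]
    pa_share = [led_off[i] / plate_appearances[i] for i in range(9)]
    return inning_share, pa_share


if __name__ == "__main__":
    innings_led, pa_led = leadoff_shares()
    for slot in range(9):
        print(f"slot {slot + 1}: leads off {innings_led[slot]:.1%} of innings, "
              f"{pa_led[slot]:.1%} of its PAs")
```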

It's ironic that the sabermetric revolution seems to be responsible for popularizing the idea that your biggest bat needs to hit third, rather than fourth[3]. Remember all the arguments about where Barry Bonds should bat? But now, third seems to be the slot of choice for the game's top hitters: Cabrera, Votto, Cano, McCutchen, Goldschmidt, Braun... the list goes on. Teams would be better served moving them one slot up or down. And that's not even getting into the double play opportunities afforded to the guy in the 3rd spot. I think it's worth pointing out that the Detroit Tigers have underperformed relative to their weighted runs created (wRC) every single year since Cabrera came over from the Marlins (with the exception of 2009, when they were +1 run). And not by small margins: usually minus 20 to 40 runs each year.

[3] That's at least how I remember it. However, looking back at lineups of yesteryear kind of proves me wrong-- historically, the wOBA of the 3rd and 4th spots appears to be about the same through most eras.

Of course not all of this difference can be attributed to Cabrera batting third (they're a slow team overall, with negative team "BsR" every year except 2010, a year they were -19 with respect to wRC), but as a KC fan, I'll be glad to see him come up in the third spot in 2014.

Is a Team of Average Players an Average Team?

Depending on who you are, you'll have one of three responses to this question:

That's so obvious I don't even know why you're asking it.
That's so obvious I-- well, now that I think about it a little more...
Click on the nearest link to Kate Upton bikini pictures[1].

Because I fell into category #2, I felt the need to test this with the simulator. And really what I mean with this question is "Does a lineup of average players produce an average number of runs?" The argument for "yes" has the backing of the stats community: the statistic wRC (weighted runs created) works. It predicts with high accuracy the number of runs a team will produce. And all you need to calculate it is the number of times each player reached base and how: a walk, a double, etc. And that statistic doesn't care whether one player got those hits more than another.
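To make the "doesn't care who got the hits" point concrete, here's a minimal sketch of wRC as FanGraphs publishes it, wRC = ((wOBA - lgwOBA) / wOBA scale + lgR/PA) * PA. The league constants are approximate 2013-era values and should be treated as assumptions, as should the helper names and the two hypothetical lineups. Because the formula is linear, summing it over players gives the same answer as plugging in the team's PA-weighted wOBA, so the way production is split across the lineup never enters.

```python
LG_WOBA = 0.314         # assumed league wOBA
WOBA_SCALE = 1.277      # assumed wOBA scale
LG_RUNS_PER_PA = 0.110  # assumed league runs per plate appearance


def wrc(woba: float, pa: float) -> float:
    """Weighted runs created from wOBA and plate appearances."""
    return ((woba - LG_WOBA) / WOBA_SCALE + LG_RUNS_PER_PA) * pa


def team_wrc_by_player(players: list[tuple[float, float]]) -> float:
    """Sum of individual wRC over (wOBA, PA) pairs."""
    return sum(wrc(w, pa) for w, pa in players)


def team_wrc_pooled(players: list[tuple[float, float]]) -> float:
    """wRC from the team's PA-weighted wOBA and total PA."""
    total_pa = sum(pa for _, pa in players)
    team_woba = sum(w * pa for w, pa in players) / total_pa
    return wrc(team_woba, total_pa)


if __name__ == "__main__":
    # Hypothetical lineups: same total production, split differently.
    balanced = [(0.320, 650)] * 9
    lopsided = [(0.380, 650)] * 3 + [(0.290, 650)] * 6
    # The two totals match.
    print(round(team_wrc_by_player(balanced), 1), round(team_wrc_by_player(lopsided), 1))
```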

The argument for "no" is that statistics like wRC aren't perfect; there's a dispersion[2] of 25 actual runs for teams that have the same wRC value. And different slots in the lineup matter more than others--- a better hitter in the 2-spot will help the team more than a better hitter in the 8-spot. So if you swap in an above-average player in the two spot and a below average player in the 8-spot, don't you end up with a better team?

Really, this all depends on how you define average. If you take all qualified players, sort by wOBA, and take whoever is smack in the middle... that's not actually an average player. An average player would have the statistics of what happens during an average plate appearance. That means that you take every single, double, triple, walk, etc., that happens during an MLB season, and divide by the total number of plate appearances that occurred. Thus the extra plate appearances that high-slot players get are built into the definition of "average":

700 plate appearances
141 strikeouts
109 singles
31 doubles
3 triples
18 home runs
54 walks
6 times hit by a pitch
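Turning that line into per-PA rates (what each of the nine clones does on any given trip to the plate) is simple division. A minimal sketch, which also computes a simplified OBP and wOBA with PA as the denominator; that's an approximation, since sacrifice bunts aren't broken out above, and the wOBA weights are placeholders roughly matching published 2013 values (an assumption, not part of the post).

```python
AVERAGE_PLAYER = {"PA": 700, "SO": 141, "1B": 109, "2B": 31, "3B": 3,
                  "HR": 18, "BB": 54, "HBP": 6}

# Placeholder weights (roughly 2013-era); an assumption, not part of the post.
WOBA_WEIGHTS = {"BB": 0.690, "HBP": 0.722, "1B": 0.888, "2B": 1.271,
                "3B": 1.616, "HR": 2.101}


def per_pa_rates(line: dict[str, int]) -> dict[str, float]:
    """Probability of each event on a single plate appearance."""
    return {event: count / line["PA"] for event, count in line.items() if event != "PA"}


def approx_obp(line: dict[str, int]) -> float:
    """Times on base (hits + walks + HBP) per plate appearance."""
    times_on_base = sum(line[e] for e in ("1B", "2B", "3B", "HR", "BB", "HBP"))
    return times_on_base / line["PA"]


def approx_woba(line: dict[str, int]) -> float:
    """Weighted on-base events per plate appearance."""
    return sum(w * line[e] for e, w in WOBA_WEIGHTS.items()) / line["PA"]


if __name__ == "__main__":
    rates = per_pa_rates(AVERAGE_PLAYER)
    print({event: round(rate, 3) for event, rate in rates.items()})
    print(round(approx_obp(AVERAGE_PLAYER), 3))   # 0.316
    print(round(approx_woba(AVERAGE_PLAYER), 3))  # 0.315
```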

The run total of a lineup of nine clones of this player is identical to the average run total from all lineups created from actual players. So, chalk one up for the statheads.

This implies that most of the variance of team performance at fixed values of wRC can be attributed to "luck": even though the number of hits+walks is the same, their distribution within each inning is different. If an average MLB team reached base 8 times per game, the fewer innings those baserunners were bunched into, the more runs it would score. More on that in a later post.
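The bunching effect shows up even in the crudest model I can think of. In this sketch (a toy, not how the simulator works), every baserunner is a walk, runners only advance when forced, and nobody is ever thrown out on the bases, so an inning with n baserunners scores max(0, n - 3) runs. The same 8 baserunners produce very different run totals depending on how they're spread across nine innings.

```python
def runs_walks_only(runners_per_inning: list[int]) -> int:
    """Toy model: each baserunner is a walk with force advances only.
    The first three fill the bases; each one after that forces in a run."""
    return sum(max(0, n - 3) for n in runners_per_inning)


if __name__ == "__main__":
    # Eight baserunners spread three different ways over nine innings.
    spread_out = [1, 1, 1, 1, 1, 1, 1, 1, 0]
    two_rallies = [4, 4, 0, 0, 0, 0, 0, 0, 0]
    one_big_inning = [8, 0, 0, 0, 0, 0, 0, 0, 0]
    for layout in (spread_out, two_rallies, one_big_inning):
        print(sum(layout), runs_walks_only(layout))  # 8 0, then 8 2, then 8 5
```

Real hits move runners more than one base at a time, so this understates the absolute run totals; the direction of the effect is the point.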

This is not related to the shenanigans that managers engage in when constructing lineups. Although lineups shuffle around the same players, the shift in runs created is (mostly) due to the fact that a team will produce fewer baserunners when putting crappy players in the first two spots, and this will be reflected in their wRC.

[1] But really... are there any other kind of Kate Upton pictures?

[2] For those of you not familiar, "dispersion" here is the same as "standard deviation" (which is the square root of the variance). It means that about 34% of teams will be up to 25 runs above what wRC predicts, and 34% will be up to 25 runs below what wRC predicts (assuming Gaussian statistics). Converting that into wins, there are over 5 wins separating teams at the high and low ends of one standard deviation. The remaining 32% of teams will be even farther apart. Just ask Cardinals fans this year.
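The footnote's percentages come straight from the normal distribution, and the wins figure from the usual rule of thumb of roughly 10 runs per win (that conversion is my assumption here; estimates run from about 9 to 10 depending on the run environment). A minimal sketch:

```python
from math import erf, sqrt

SIGMA_RUNS = 25.0    # dispersion of actual runs at fixed wRC, from the footnote
RUNS_PER_WIN = 10.0  # rule-of-thumb conversion (assumption)


def fraction_within(k_sigma: float) -> float:
    """Fraction of a Gaussian population within +/- k_sigma of the mean."""
    return erf(k_sigma / sqrt(2.0))


within_one_sigma = fraction_within(1.0)       # about 0.683
each_side = within_one_sigma / 2.0            # about 0.341 above, 0.341 below
outside_one_sigma = 1.0 - within_one_sigma    # about 0.317
win_spread = 2.0 * SIGMA_RUNS / RUNS_PER_WIN  # +1 sigma vs -1 sigma, in wins

print(f"{each_side:.1%} on each side, {outside_one_sigma:.1%} beyond, "
      f"{win_spread:.1f} wins apart")
```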

Monday, January 13, 2014

Digging Deeper with the Double Play

This post is essentially a repository for results about the impact of double plays on an offense. These results are helpful for understanding why some lineups are better than others. I refer to some of these results at Royals Review, but my technical skills aren't sharp enough to figure out how to get them onto the post I wrote there.

The first plot shows the number of double play opportunities (DPO) per slot in the lineup, relative to the average number of DPOs each player sees (thus +0.1 is 10% more opportunities, -0.1 is 10% fewer). Some of the data come from The Book (Tango et al.), based on MLB data from 1999-2002; the rest come from my lineup simulator. Slots 2 and 3 face the maximum number of opportunities. Slots 1 and 9 face the minimum. Don't pay attention to the "wOBA" lines; they're for another post here.

There's also the matter of how much each double play costs the team. Double plays in the heart of the order matter more than double plays in the 6/7 spots, because there's not a lot of offense that comes out of the 8/9 spots. The run value of a double play in the 9 spot is so high because we're about to turn over in the lineup and get to (supposedly) good batters at the top of the order.

Combining these two results, the 3-spot is still the most important slot for double plays, but the 4-spot isn't far behind. My statement that Butler faces too many DPOs in the 4 spot was reductive. In the four-spot, there are slightly more DPOs, but the cost per double play is maximal. The cost per double play in the 3-spot seems strangely low: this is because the third spot is the least important spot of the first five. I know this sounds counterintuitive, but I'm preparing a post with supporting evidence soon (or you can read the chapter on lineups in The Book, which comes to the same conclusion).