Saturday, January 11, 2014

Freakonomics and the Double Play

Double plays are good for a baseball team.

Okay, that's a loaded statement. Double plays are not the cause of a team being better, but they are a bellwether of team performance. You don't have to dig very deep into baseball statistics to reach this level of understanding: teams that get more runners on base score more runs. Teams that get more runners on base also encounter more double play opportunities, and the GIDP rate doesn't vary all that much from team to team. Thus, teams that score more runs hit into more double plays. It's the same sort of cocktail-party sabermetrics as "teams that leave more men on base win more games" that Malcolm Gladwell would really appreciate. But it's interesting to see it borne out in the data, and with a much higher correlation than I would have predicted.
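To make that chain of reasoning concrete, here's a back-of-envelope sketch. The opportunity counts and the GIDP rate below are made-up but plausible round numbers for illustration, not measured values:

```python
# If the GIDP rate per opportunity is roughly constant across teams
# (~11% is an assumed ballpark figure here), then a team's GIDP total
# scales almost linearly with its double play opportunities.
gidp_rate = 0.11            # assumed league-average GIDP per opportunity

avg_team_dpo = 1200         # hypothetical DPOs for an average team
high_obp_dpo = 1320         # a high-OBP team sees ~10% more opportunities

print(round(avg_team_dpo * gidp_rate))   # ~132 GIDP
print(round(high_obp_dpo * gidp_rate))   # ~145 GIDP: more runs AND more DPs
```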

The figure below shows the number of double play opportunities (DPO) each player sees relative to the average number of DPOs per player per game. A value of 0.1 means 10% more DPOs per game than the average player; -0.1 means 10% fewer. The green squares are taken from The Book and are based on MLB data (American League only) from 1999-2002. The circles are from 1,000 simulated seasons for different lineups[1]. Each simulated lineup samples from all 2013 players with >90 PA. The error bars indicate the season-to-season dispersion for a fixed lineup.

[1] How you order the players does make some difference in these results. For instance, if I order the players 1-9 from highest to lowest wOBA, the DPOs in the 2-slot increase significantly and the DPOs in the 7/8/9 slots go down a little to make up the difference. Here I've implemented a "MLB-like" lineup where the leadoff hitter is the 5th best player by wOBA, the 2-5 slots are the best 4 players in random order, the 6th slot is the 6th-best player, and the bottom of the order are the lowest three wOBAs, also in random order.
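For the curious, here's a minimal sketch of that "MLB-like" lineup construction; the player names and wOBA values are made up for illustration and this isn't the actual simulation code:

```python
import random

def mlb_like_lineup(players):
    """Order 9 (name, wOBA) tuples into the 'MLB-like' lineup above:
    slot 1 gets the 5th-best wOBA, slots 2-5 the best four in random
    order, slot 6 the 6th-best, and slots 7-9 the worst three in
    random order."""
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    top4, fifth, sixth, bottom3 = ranked[:4], ranked[4], ranked[5], ranked[6:]
    random.shuffle(top4)
    random.shuffle(bottom3)
    return [fifth] + top4 + [sixth] + bottom3

# Example with hypothetical wOBAs:
nine = [(f"player{i}", w) for i, w in enumerate(
    [0.400, 0.380, 0.360, 0.350, 0.340, 0.330, 0.310, 0.300, 0.290])]
for slot, (name, woba) in enumerate(mlb_like_lineup(nine), start=1):
    print(slot, name, woba)
```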

Using this lineup, the results are quite consistent with The Book's values. There is significant team-to-team spread in this curve, however. The green dashed line is the mean DPOs for teams in the top 10% of simulated wOBA, and the red dotted curve is for the lowest 10% of teams.

Double play opportunities (DPO) relative to the mean for each lineup slot. The bracket notation "<>" indicates the mean. The error bars on the simulations represent the season-to-season dispersion for the same lineup of players. MLB data are taken from Tango et al.'s The Book.



We can look for this effect in the actual MLB data as well. The top panel compares my simulated results to MLB data from 2005-2013, where the run-scoring environment is essentially the same as what I've set up in my simulations. The correlation coefficient is r=0.11 for the MLB data and 0.07 for my simulated results. This isn't an overwhelming correlation, but it is there and it is positive.

Extending these data all the way back to 1968 yields the middle panel, where the correlation becomes apparent even to the naked eye. Results for strike-shortened seasons have been rescaled to 162 games. The sidebar shows the r-value for each 9-year chunk I looked at, peaking at r=0.26 at the height of the PED era. I'll leave a more rigorous investigation of why these dependencies change so significantly with baseball era (perhaps it's just sample variance) to a future post, but the bottom panel shows definitively that this correlation exists.[2]
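If you want to reproduce the correlations, the calculation is straightforward; a minimal sketch follows, where the array names and per-team-season layout are my assumptions about the data, not code from this post:

```python
import numpy as np

def runs_gidp_r(runs, gidp, years, start, end):
    """Pearson r between team runs and team GIDP over seasons start..end.

    `runs`, `gidp`, and `years` are aligned per-team-season arrays.
    Strike-shortened seasons should be rescaled to 162 games first
    (e.g. runs * 162 / games_played).
    """
    mask = (years >= start) & (years <= end)
    return np.corrcoef(runs[mask], gidp[mask])[0, 1]

# One r per 9-year chunk, 1968 onward (illustrative chunking):
# for start in range(1968, 2013, 9):
#     print(start, runs_gidp_r(runs, gidp, years, start, start + 8))
```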

[2] The results there are 'jumpy' because I've binned by sets of 60 teams, sorted by Team Runs, across the x-axis, instead of using fixed-width bins in runs. Equal-count binning is a superior way to bin data when coverage across the x-axis is uneven. Note that the range on the y-axis of this plot is smaller than in the upper two panels.
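Here's a minimal sketch of that equal-count binning, assuming plain NumPy arrays of team runs and GIDP totals:

```python
import numpy as np

def equal_count_bins(x, y, per_bin=60):
    """Sort points by x, group consecutive runs of `per_bin` points,
    and return the mean (x, y) of each group -- the equal-population
    alternative to fixed-width bins."""
    order = np.argsort(x)
    xs, ys = np.asarray(x)[order], np.asarray(y)[order]
    n = (len(xs) // per_bin) * per_bin      # drop the ragged remainder
    xb = xs[:n].reshape(-1, per_bin).mean(axis=1)
    yb = ys[:n].reshape(-1, per_bin).mean(axis=1)
    return xb, yb
```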



