Along with watching Clemson beat FSU in Death Valley and jump to 13th in the AP poll, I saw Moneyball in theaters this weekend. I had already read the book and written two posts: one about the book itself, and another about its theory and style of play as a guest blogger on LenNY’s Yankees. Here are my thoughts on the movie and some mathematical insights into the theory of Moneyball.
Moneyball the Movie
Brad Pitt, despite not being a baseball fan, plays Oakland A’s GM Billy Beane, who finds a young Yale economics graduate to help him build a team around hard numbers instead of flimsy intuition. The movie opens with the Yankees beating the A’s in the 2001 ALDS, which was great for a Yankee fan like me. Beane’s conflicts with the people who didn’t want to see the organization change (namely managers and scouts) are very entertaining, but Jonah Hill, who plays the nerdy Yale grad Peter Brand, steals the show and provides comic relief in what is a fairly serious movie. The movie focuses much more on the relationship between Beane and his daughter than the book does; the book largely leaves that topic out. In fact, I’d say the movie is about Billy Beane, while the book is about the players Beane picked and the theory itself. Overall, the movie is worth seeing, even though it is more of a character study than an in-depth look at Moneyball as a baseball theory.
The Statistics Behind Moneyball
On to the numbers! As my subscribers know (and thank you to everyone who has already subscribed; you can subscribe by clicking the “follow” button at the top of the right sidebar), I’m currently in a master’s program that spends quite a bit of time on the kind of regression and model building that Moneyball is based on. For a class project, I’m building a model that predicts the number of runs a team will score from various batting statistics.
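To make “model building” a bit more concrete, here is a minimal sketch of that kind of regression in Python with NumPy, standing in for SAS. It is not the project model itself: the data are simulated rather than real MLB stats, and the stat distributions, the intercept of 20, the noise level, and the helper names `simulate_team_seasons` and `fit_ols` are all hypothetical. The simulation plugs in the per-event values from the next paragraph purely as example inputs, then fits ordinary least squares of team runs on singles, doubles, triples, home runs, stolen bases, and an AL indicator. Each fitted coefficient reads as “runs added per event, everything else held constant.”

```python
import numpy as np

# Column order of the hypothetical design matrix (intercept added in fit_ols).
FEATURES = ["1B", "2B", "3B", "HR", "SB", "AL"]


def simulate_team_seasons(n, true_weights, intercept, noise_sd, seed=0):
    """Generate synthetic team-season stat lines and run totals.

    Every distribution here is an illustrative assumption, not real MLB data.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.poisson(950, n),    # singles
        rng.poisson(290, n),    # doubles
        rng.poisson(30, n),     # triples
        rng.poisson(160, n),    # home runs
        rng.poisson(100, n),    # stolen bases
        rng.integers(0, 2, n),  # 1 = American League team, 0 = National League
    ]).astype(float)
    w = np.array([true_weights[f] for f in FEATURES])
    runs = intercept + X @ w + rng.normal(0.0, noise_sd, n)
    return X, runs


def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], dict(zip(FEATURES, beta[1:]))


# Example inputs only: "true" weights used to generate the fake data.
true_w = {"1B": 0.22, "2B": 0.61, "3B": 1.39, "HR": 1.62, "SB": 0.18, "AL": 12.0}

# Roughly five seasons of 30 teams.
X, runs = simulate_team_seasons(150, true_w, intercept=20.0, noise_sd=20.0)
b0, coefs = fit_ols(X, runs)
for name, value in coefs.items():
    print(f"{name}: {value:.2f}")
```

With only 150 simulated team-seasons, rarer events such as triples come back noticeably noisier than singles or home runs, which is one reason estimates like these can shift as more data are added.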
I’m just getting started and there is a lot of work left to do, but so far all of my findings back up the theory behind Moneyball. My math shows that batting average is statistically insignificant in predicting runs, while OBP is highly significant. A home run is worth 1.62 runs, a single 0.22 runs, a double 0.61 runs, and a triple 1.39 runs. A stolen base is worth 0.18 runs, but a caught stealing costs more than a stolen base gains. Holding everything else constant, AL teams score about 12 more runs per season than NL teams. I’m sure these numbers will shift slightly as I tweak the regression and update the 2011 data with the stats from the last few weeks of the season. Right now they are based on the previous four seasons plus this year’s stats as of mid-September.
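Because the model is linear, each coefficient works as a linear weight: multiply it by how often the event happened and add up the results. Here is a small sketch of that arithmetic using the run values above. The function name `event_runs` and the example stat line are hypothetical, and the sum leaves out the intercept, walks, caught stealing, and the team-level AL/NL term, so it is a partial contribution rather than a full runs prediction. Applying team-level coefficients to one player’s line is also only an approximation.

```python
# Per-event run values from the regression described above.
RUN_VALUES = {"1B": 0.22, "2B": 0.61, "3B": 1.39, "HR": 1.62, "SB": 0.18}


def event_runs(counts):
    """Return the sum of (run value * count) over the events in RUN_VALUES.

    Events missing from `counts` are treated as zero. The intercept, walks,
    caught stealing, and the AL/NL term are not included.
    """
    return sum(value * counts.get(event, 0) for event, value in RUN_VALUES.items())


# Hypothetical season line, for illustration only.
line = {"1B": 120, "2B": 30, "3B": 3, "HR": 25, "SB": 10}
print(f"{event_runs(line):.2f} runs from these events")  # 91.17
print(f"1 HR = {RUN_VALUES['HR'] / RUN_VALUES['1B']:.1f} singles")  # 7.4
```

By these numbers a home run is worth roughly seven singles, which is the kind of trade-off batting average hides by counting every hit the same.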
I thought the baseball crowd and the marketing research crowd would both find that bit interesting. I was certainly entertained when my SAS output told me that batting average wasn’t even significant enough to include in the model. Now the question is, do I consider adding OBP to my fantasy leagues?
That’s a really cool idea, Ryan. I would be really interested in seeing how the research turns out in predicting scoring next year. I actually use AVG and OPS in my fantasy baseball league. For the research, you could maybe try using ISO in place of OBP and AVG. You could also check out wOBA here: http://www.insidethebook.com/woba.shtml
Thanks for the comment. You use OPS in your fantasy league? Do you go 6 x 5 or do you have an additional pitching stat to balance it out?
I’ve been considering adding OBP and Quality Start % for a while. The concept of wOBA is based on the same type of regression I’m doing. They found a HR to be worth 1.7, which is really cool because I got 1.62, very close. Amazing stuff with regression analysis.