Review: Baseball Hacks · 725 words posted 05/16/2006 10:52 AM

According to the NY Times, the internet arm of Major League Baseball has sued a St. Louis company operating commercial fantasy sports leagues:

[The] relationship between players and numbers, so often romanticized, is now being stripped to its skeleton in a lawsuit with considerably wider ramifications. While the dispute focuses on fantasy baseball—in which millions of fans compete against one another by assembling rosters of real-life major leaguers with the best statistics—a real legal question has arisen: Who owns that connection of name and number when it is used for such a commercial purpose?

Many onlookers have cast this issue as a tiff over batting averages—as if children were squabbling over the backs of baseball cards—but legal experts are saying it could affect the wider arena of celebrity rights, freedom of the press and even how the press is defined as the Internet age unfolds.

The dispute is between a company in St. Louis that operates fantasy sports leagues over the Internet and the Internet arm of Major League Baseball, which says that anyone using players’ names and performance statistics to operate a fantasy league commercially must purchase a license. The St. Louis company counters that it does not need a license because the players are public figures whose statistics are in the public domain.

However the case is settled, the outcome will affect only commercial use of sports statistics; NBA v. Motorola found that NBA statistics are facts and thus not subject to copyright law, so crunching the numbers for personal use is not at issue.

That’s a relief: more than with most sports, appreciating baseball requires understanding the numbers behind each play, and Joseph Adler has written Baseball Hacks, an exceptional guide to finding, graphing, and analyzing the stats at the heart of the game.

The very best hacks start with an intellectually curious author asking two questions: How does this black box work? And what technologies can I use to pry it open? Mr. Adler wanders across the various black boxes of baseball statistics and introduces the reader to an array of tools: Perl, MySQL, and, best of all, R, an open source language and environment for statistical computing. Most importantly, this book is fun, even for the casual fan (I am not a seamhead). Behind every hack one can clearly see Mr. Adler sharing the pleasure of discovery.

One of the simpler hacks is #35: Comparing Teams and Players with Lattices (available as a free PDF on the book’s samples page), which generates the density plot pictured at the top of this article, showing team batting averages from 2003. Here’s the code, in its entirety.

One of my favorite hacks is #51: Measure Pitching with DIPS:

In December 1999, baseball fan Voros McCracken came up with a new method of measuring pitching. McCracken started to wonder whether a pitcher could really do anything about balls in play; were outs from balls in play a function of a pitcher’s skill, the defense’s skill, or dumb luck? He set out to test this hypothesis and discovered (much to his surprise) that it wasn’t pitcher skill. He concluded that what happens after a ball is put in play depends on the defense. Only on walks, strikeouts, and home runs is the defense not involved.

Thus was DIPS (Defense Independent Pitching Stats) born. It’s amazing to consider that even though baseball has been around for more than 100 years, a student can still come up with a new way to crunch the numbers, and Mr. Adler shows you how to calculate DIPS for yourself using R. (For more on DIPS, see McCracken’s article in Baseball Prospectus or Moneyball by Michael Lewis.)
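To give a feel for the idea, here is a minimal sketch in Python. It is not the R code from the book and it is not McCracken's full DIPS calculation. Instead it computes two simpler, widely used numbers built on the same insight: FIP (Fielding Independent Pitching), which uses only home runs, walks, hit batters, and strikeouts, and BABIP (batting average on balls in play), the quantity McCracken found pitchers had little control over. The `PitchingLine` class, its field names, the default scaling constant of 3.10, and the sample season line are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """One pitcher's season totals. Field names are illustrative."""
    ip: float  # innings pitched as a decimal (200 1/3 innings -> 200.333)
    ab: int    # at-bats against
    h: int     # hits allowed
    hr: int    # home runs allowed
    bb: int    # walks
    hbp: int   # hit batters
    so: int    # strikeouts
    sf: int    # sacrifice flies allowed


def fip(p: PitchingLine, constant: float = 3.10) -> float:
    """Fielding Independent Pitching: only HR, BB, HBP and SO count.

    `constant` puts the result on an ERA-like scale. It changes by league
    and season; 3.10 is only a placeholder assumption.
    """
    return (13 * p.hr + 3 * (p.bb + p.hbp) - 2 * p.so) / p.ip + constant


def babip(p: PitchingLine) -> float:
    """Batting average on balls in play: hits that were not home runs,
    divided by at-bats that did not end in a strikeout or home run,
    plus sacrifice flies."""
    balls_in_play = p.ab - p.so - p.hr + p.sf
    return (p.h - p.hr) / balls_in_play


# Hypothetical season line, invented for illustration only.
example = PitchingLine(ip=200.0, ab=750, h=180, hr=20, bb=50, hbp=5, so=180, sf=5)

if __name__ == "__main__":
    print(f"FIP:   {fip(example):.3f}")
    print(f"BABIP: {babip(example):.3f}")
```

Run both numbers over several seasons for the same pitcher and, if McCracken is right, the FIP-style figure should hold steadier from year to year than BABIP does.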

I have only two complaints about “Baseball Hacks”: first, the code has more unforced errors than I’d like to see and could have benefited from tighter tech editing. (Disclosure: I have tech edited numerous titles for O’Reilly and other publishers.)

Second, and this is absolutely no fault of Mr. Adler’s, if you work and play on a Mac, as I do, you’ll need to fire up a Windows machine to run all of the hacks. It’s hard, if not impossible, to use RODBC on a Mac, and while the book includes instructions for configuring the RMySQL package, I couldn’t get them to work.

But the minor Mac annoyances in no way diminish the fun of working through this book.

Highly recommended.
