Quick and Dirty MySQL Diagrams on OSX with EOModeler · 510 words posted 06/08/2007 05:45 PM

Whenever I take on a new project with an existing database, the first thing I like to do is generate an entity diagram of all the tables. On OSX you can buy or download any number of third-party packages to model tables, but here’s how to do it with software you already own: WebObjects and EOModeler.

The executive version:

Create a MySQL database with Foreign Key relationships; then install the proper JDBC driver for MySQL; finally point EOModeler to your database to generate the diagram.

Although these steps are covered in various places on the web, here’s the whole thing step by step in detail. I assume you have MySQL 5 and OSX 10.4.9.

cd ~/Desktop/mysql-connector-java-5.0.6
sudo cp mysql-connector-java-5.0.6-bin.jar /Library/Java/Extensions
[Enter your password when prompted]

jdbc:mysql://localhost/yourdatabasenamehere

Schweet! You can save the generated diagram and reopen it in Xcode. You should end up with a diagram that looks something like the image below, foreign keys and all. Tip: if you end up with an archipelago of unconnected tables, you’re probably working with MyISAM tables instead of InnoDB.

Questions? Post them in the comments here. Just don’t expect any replies Sunday night. I’ll be respecting the Bing.

* * *

Parsel: a Mint Plug-in for detecting language, now updated · 29 words posted 01/31/2007 04:08 PM

Parsel is a plug-in for Mint, designed to detect your visitors’ browser settings. It has been updated to work with Mint 1.29 and higher.

You can download it on Google Code.

* * *

Mini Review: SQL Hacks · 70 words posted 01/09/2007 12:42 PM

While writing a detailed review of SQL Hacks I noticed Slashdot posted its review this morning, so I’ll keep my notes brief.

SQL Hacks is one of the most useful guides to SQL I’ve read in years. My favorite hack is #80, “Play Six Degrees of Kevin Bacon.” If you already have a basic understanding of SQL and want to explore new ways to slice your data, check out this book.

* * *

Mapping Your Library with Amazon Web Services: Similar Items · 450 words posted 12/22/2006 01:27 PM

I’ve always enjoyed The Social Life of Books: Visualizing Communities of Interest via Purchase Patterns on the WWW, an article by the network analyst Valdis Krebs. Using Amazon’s publicly available data on book purchases Krebs sifts through purchasing patterns to construct a network of books based on people who purchased The New Pioneers.

Fortunately, it’s easy to use the Amazon Web Services API (AWS) and its “similar products” list to map your own collection of books. After reading Krebs’ work, I wondered: what could I learn from mapping my own library? As it turns out, not so much.

The image above shows a small section of my similar items map of 400 books. Each circle represents one book. (The number inside each circle is simply the book’s unique identifier.) Arrows connect similar books.

I expected to see many interconnected clusters like the one to the left of the image. But here I ran into my first problem: Amazon narrowly defines “similarity.” For each multi-book cluster I found multiple orphans (books with no similar titles in my library) or duplets (a set of two similar books, with no similarities to any other title in my library).

For example, the duplet on the far right of the graph shows that The Future of Ideas is similar to Free Culture. Apparently, Lawrence Lessig’s books just aren’t like anything else in my library.

A second quirk: similarity often runs only one way. “The Future of Ideas” is similar to “Free Culture,” but “Free Culture” is not similar to “The Future of Ideas.”

Finally, similar books are often clustered so tightly that they don’t reach out to anything else in the library. The graph on the right is a Harry Potter cluster: books 1 to 6 in the series, with a Tolkien book thrown in at random. Strangely, Return of the King is similar to Harry Potter and the Sorcerer’s Stone, but isn’t similar to any other Harry Potter title. Likewise, “Harry Potter and the Sorcerer’s Stone” isn’t similar to “Return of the King.”
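All three quirks (orphans, duplets, one-way similarity) are easy to check for once the similar-items lists are in hand. Here's a minimal sketch in Python, assuming the lists have already been pulled from AWS into a dictionary. The dictionary is a trimmed-down, hypothetical stand-in for my data (the orphan title is made up), the function names are mine, and nothing here calls the Amazon API.

```python
# Hypothetical data: each book in the library maps to the set of library
# titles listed as similar to it. The arrows are directed.
similar = {
    "The Future of Ideas": {"Free Culture"},
    "Free Culture": set(),
    "Return of the King": {"Harry Potter and the Sorcerer's Stone"},
    "Harry Potter and the Sorcerer's Stone": set(),
    "An Orphan Title": set(),  # made-up book with no similar titles
}


def one_way_pairs(graph):
    """Return (a, b) pairs where a lists b as similar but b does not list a."""
    return sorted(
        (a, b)
        for a, targets in graph.items()
        for b in targets
        if a not in graph.get(b, set())
    )


def clusters(graph):
    """Group books into clusters, ignoring the direction of the arrows."""
    neighbors = {book: set() for book in graph}
    for a, targets in graph.items():
        for b in targets:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from this book.
        group, stack = set(), [start]
        while stack:
            book = stack.pop()
            if book in group:
                continue
            group.add(book)
            stack.extend(neighbors[book] - group)
        seen |= group
        groups.append(group)
    return groups


groups = clusters(similar)
orphans = sorted(next(iter(g)) for g in groups if len(g) == 1)
duplets = sorted(sorted(g) for g in groups if len(g) == 2)
```

Treating the arrows as undirected when clustering, but keeping direction for the one-way check, is just one reasonable choice. A stricter definition of "cluster" would give different groups.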

It’s easy to build a map of your own library. The key ingredients:

And that’s it. Click here to view the full size graph (PNG format, 120K).

After the holidays, I’ll post a tutorial for hooking up your book data to Mark Shepherd’s amazing SpringGraph component and show you how to sift by subject instead of similar items for a better view of your library.

* * *

HowTo: Assign a Class to a Rails Form · 101 words posted 07/15/2006 09:37 AM

This isn’t rocket science, but the syntax confused me so I’m getting this into Google. To assign a CSS class to a form using Rails ActionView::Helpers::FormTagHelper and start_form_tag, use the following code, where f-wrap-1 is your form’s CSS class:

start_form_tag({:action => 'create'},{:class => 'f-wrap-1'})

If you need to assign the class to an update form and pass your object’s ID via the URL, do it this way:

start_form_tag({:action => 'update', :id => @post}, {:class => 'f-wrap-1'})

Thanks to the Textdrive forum members for helping me sort this out.

* * *

Review: Baseball Hacks · 725 words posted 05/16/2006 10:52 AM

According to the NY Times, the internet arm of Major League Baseball has sued a St. Louis company operating commercial fantasy sports leagues:

[The] relationship between players and numbers, so often romanticized, is now being stripped to its skeleton in a lawsuit with considerably wider ramifications. While the dispute focuses on fantasy baseball—in which millions of fans compete against one another by assembling rosters of real-life major leaguers with the best statistics—a real legal question has arisen: Who owns that connection of name and number when it is used for such a commercial purpose?

Many onlookers have cast this issue as a tiff over batting averages—as if children were squabbling over the backs of baseball cards—but legal experts are saying it could affect the wider arena of celebrity rights, freedom of the press and even how the press is defined as the Internet age unfolds.

The dispute is between a company in St. Louis that operates fantasy sports leagues over the Internet and the Internet arm of Major League Baseball, which says that anyone using players’ names and performance statistics to operate a fantasy league commercially must purchase a license. The St. Louis company counters that it does not need a license because the players are public figures whose statistics are in the public domain.

However the case is settled, the outcome will affect only commercial use of sport statistics; NBA v. Motorola found that NBA statistics are facts and thus not subject to copyright law in the context of personal use.

That’s a relief: more than most sports, an appreciation of baseball requires understanding the numbers behind each play, and Joseph Adler has written Baseball Hacks, an exceptional guide for finding, graphing, and analyzing the stats at the heart of the game.

The very best hacks start with an intellectually curious author asking two questions: how does this black box work? And what technologies can I use to pry it open? Mr. Adler wanders across the various black boxes of baseball statistics and introduces the reader to an array of tools: Perl, MySQL, and best of all R, an open source language and environment for statistical computing. Most importantly, this book is fun, even for the casual fan (I am not a seamhead). Behind every hack one can clearly see Mr. Adler sharing the pleasure of discovery.

One of the simpler hacks is #35: Comparing Teams and Players with Lattices (available as a free PDF on the book’s samples page), which generates the density plot pictured at the top of this article, showing team batting averages from 2003. Here’s the code, in its entirety.

One of my favorite hacks is #51: Measure Pitching with DIPS:

In December 1999, baseball fan Voros McCracken came up with a new method of measuring pitching. McCracken started to wonder whether a pitcher could really do anything about balls in play; were outs from balls in play a function of a pitcher’s skill, the defense’s skill, or dumb luck? He set out to test this hypothesis and discovered (much to his surprise) that it wasn’t pitcher skill. He concluded that what happens after a ball is put in play depends on the defense. Only on walks, strikeouts, and home runs is the defense not involved.

Thus was DIPS (Defense Independent Pitching Stats) born. It’s amazing to consider that even though baseball has been around for more than 100 years, a student can still come up with a new way to crunch the numbers, and Mr. Adler shows you how to calculate DIPS for yourself using R. (For more on DIPS, see McCracken’s article in Baseball Prospectus or Moneyball by Michael Lewis.)

I have only two complaints about “Baseball Hacks”: first, the code has more unforced errors than I’d like to see and could have benefited from tighter tech editing. (Disclosure: I have tech edited numerous titles for O’Reilly and other publishers.)

Second, and this is absolutely no fault of Mr. Adler’s, if you work and play on a Mac—as I do—you’ll need to fire up your Windows machine to run all of the hacks. It’s hard, if not impossible, to use RODBC on Mac, and while the book includes instructions for configuring the RMySQL package I couldn’t get them to work.

But the minor Mac annoyances in no way diminish the fun of working through this book.

Highly recommended.

~~~
Buy Baseball Hacks from Amazon.

* * *

Ad: SubEthaEdit on macZOT · 198 words posted 04/25/2006 07:05 PM

macZOT.com is currently promoting SubEthaEdit from TheCodingMonkeys. According to the site, macZOT and TheCodingMonkeys will award $105,000 in Mac software. It’s a clever promotion: for each blogger who posts a link back to the promotion, the price of SubEthaEdit drops by 5¢. I feel a little tool-ish for participating, but only mildly so: I actually use and like the software.

SubEthaEdit’s collaborative text editing gets all the attention, but even without the Rendezvous-based features it would be a light, tight, most-of-what-you-need-but-not-a-bit-more scripting tool. My favorite features:

Some day, I’d like to see the best features of TextMate and SubEthaEdit melded into one light and perfect Mac-centric code editor, but until that day SubEthaEdit is definitely worth a look. As I post this entry, it looks like the price is down to $7.60, from $35.

See BLOGZOT 2.0 on MacZOT.com for details on the promotion.

* * *

Kunal Anand's XML Exam, Question 4 · 899 words posted 04/21/2006 06:40 PM

Earlier this month Kunal Anand posted Some XML Exam Questions designed as “fun and practical bedtime exercises.” Hey buddy, bedtime’s all about The Daily Show with Jon Stewart, but afternoons, ah… afternoons were made for XML.

Here’s question #4:

Scrape a dynamic list from a web site (i.e. the Google Zeitgeist) and serialize a well-formed Atom feed.

The only other requirement: you can only implement the solution using Perl, Python, or Ruby. While I’m learning Ruby as part of picking up Rails, I have to admit I’ve never coded in Perl other than tweaking the occasional MovableType script, so a new (to me) language seemed like a fun way to solve the problem.

A lot of you out there will roll your eyes because this is so obvious, but Perl rocks! Trust me: if you earn your living coding PHP, or ColdFusion, or ActionScript, or C#, spend an afternoon with Perl. All of the problems are already solved. There’s a Perl library for everything under the sun.

The short answer to Kunal’s question: Simon Cozens has already solved the problem for us. See Painless RSS with Template::Extract, Hack #24 in O’Reilly’s excellent Spidering Hacks. But the HTML Simon extracts is simpler than the one I wanted to extract, and he generates RSS instead of Atom.

So here, step by step, is one way to scrape content from a page and atomize it. My solution is based on Simon’s code, tweaking and building on it when necessary. If you’re new to Perl, download the code and follow along. If you’re an old Perl hand, this might be boring. Skip it and read about converting illuminated Persian manuscripts into Flash applications instead.

See lines 34-46 for the template I came up with to match the popular searches widget. The key: once we’ve found the first ordered list beginning inside the mostSearched div (lines 36 and 37), we loop through the contents of the list and populate an array called records. The contents of the list are simple: we only need to extract a url, which will allow us to click a link and conduct the chosen search on the NYT site, and the query, or search string.

my $data = $x->extract($template, $page);

The hard work is done; all that’s left is to loop through the $data and output it as an Atom feed (Simon’s original tutorial used RSS but the logic is the same).
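For anyone who doesn't read Perl, here's the shape of that final loop sketched in Python. This is not the script from this post. It assumes the extracted records are a list of dictionaries with `url` and `query` keys, and the feed id, title, author, and example URLs are all placeholders. It uses only the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def tag(name):
    """Build a namespace-qualified Atom tag name for ElementTree."""
    return "{%s}%s" % (ATOM_NS, name)


def build_atom_feed(records, feed_id, feed_title, author):
    """Serialize a list of {'url': ..., 'query': ...} dicts as Atom XML."""
    # Emit Atom as the default namespace rather than an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    feed = ET.Element(tag("feed"))
    ET.SubElement(feed, tag("title")).text = feed_title
    ET.SubElement(feed, tag("id")).text = feed_id
    ET.SubElement(feed, tag("updated")).text = now
    author_el = ET.SubElement(feed, tag("author"))
    ET.SubElement(author_el, tag("name")).text = author

    # One entry per scraped search: the query is the title, the URL the link.
    for record in records:
        entry = ET.SubElement(feed, tag("entry"))
        ET.SubElement(entry, tag("title")).text = record["query"]
        ET.SubElement(entry, tag("id")).text = record["url"]
        ET.SubElement(entry, tag("updated")).text = now
        ET.SubElement(entry, tag("link"), href=record["url"])

    return ET.tostring(feed, encoding="unicode")


# Placeholder records standing in for the scraped list.
sample = [
    {"url": "http://example.com/search?q=first&page=1", "query": "first query"},
    {"url": "http://example.com/search?q=second", "query": "second query"},
]
xml_text = build_atom_feed(
    sample, "http://example.com/feed", "Popular searches", "Example Author"
)
```

Building the tree with a library instead of concatenating strings means characters like `&` in the URLs get escaped for you, which takes care of the well-formed part.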

To run the script yourself, install the required modules, open your terminal and type:

perl atomize_nyt.pl

And that’s it.

Footnote: I’ll parse Kunal’s requirements like any good J.D. should; he said the feed had to be well-formed but he didn’t say it had to be valid. I typically use RSS and not Atom, but after I wrote my script I discovered that the library I used, XML::Atom::SimpleFeed, produces an older flavor of Atom which, while well-formed XML, is no longer favored. For a valid feed, use the XML::Atom module instead.

* * *

Kunal Anand Visualizes del.icio.us Tags · 97 words posted 04/06/2006 12:24 PM

Kunal Anand, an engineer at JPL, has written a visualization of del.icio.us tags using Python. I sent my tags to him in XML format and this is what he came up with (cropped):

According to Kunal:

  1. Each dot represents a tag (aka a node)
  2. Each line represents an intersection between tags
  3. The brighter colors indicate heavy intersections

The algorithms appear similar to other social network maps such as Markos Weskamp’s Social Circles or Carey Priebe’s email usage patterns. You can view the full size, high resolution visualization here (464KB jpg).

Thanks Kunal! Via plasticbag.org.

* * *

Adobe: The Phone Number field does not match the pattern ^[0-9 \.,/\-\(\)\+]*$. · 203 words posted 04/05/2006 07:13 AM

While trying to update my customer account at Adobe I got the following error message:

The Phone Number field does not match the pattern ^[0-9 \.,/\-\(\)\+]*$.

I’m pretty handy with Regular Expressions and I still couldn’t get Adobe to accept my phone number. It’s a textbook example of bad error handling: written by programmers for programmers.
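For the curious, here's what that pattern actually allows, as a minimal Python sketch. The regular expression is copied straight from the error message. The `phone_error` function, its wording, and the sample numbers are my own invention and have nothing to do with Adobe's code.

```python
import re

# The pattern from the error message, copied as-is: digits, spaces,
# and the punctuation . , / - ( ) + in any order, any number of times.
PHONE_PATTERN = re.compile(r"^[0-9 \.,/\-\(\)\+]*$")

# The same character class, negated, to find what the pattern rejects.
DISALLOWED = re.compile(r"[^0-9 \.,/\-\(\)\+]")


def phone_error(value):
    """Return None if the value passes, else a plain-language message."""
    if PHONE_PATTERN.match(value):
        return None
    bad = sorted(set(DISALLOWED.findall(value)))
    listed = ", ".join(repr(ch) for ch in bad)
    return (
        "Please use only digits, spaces, and . , / - ( ) + "
        "in the phone number. We couldn't accept: " + listed
    )


accepted = phone_error("(555) 123-4567")
rejected = phone_error("555-123-4567 x12")
```

Two things stand out. The pattern happily accepts an empty string, and anything with a letter in it, such as an extension written "x12", fails. Whether that was my problem I can't say, but a message that names the offending characters would have told me.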

What’s more, there are several form fields for phone numbers and the error message is nowhere close to the offending field. Good error handling on the web follows two guidelines:

Error handling runs with a company’s culture. Some companies get it, others don’t. Macromedia got it: ColdFusion makes it easy to write form validation routines that help people recover from mistakes. Validation in Flash forms couldn’t be better: here’s an example from the Macromedia account login page. The field in question is highlighted, the error message is close to the error, and the language in the message isn’t technical.

Here’s hoping some of Macromedia’s best practices rub off on Adobe.

For more on error handling on the web, see Defensive Design for the Web from 37signals.

* * *