Mapping Your Library with Amazon Web Services: Similar Items · 450 words posted 12/22/2006 01:27 PM

I’ve always enjoyed The Social Life of Books: Visualizing Communities of Interest via Purchase Patterns on the WWW, an article by the network analyst Valdis Krebs. Using Amazon’s publicly available data on book purchases, Krebs sifts through purchasing patterns to construct a network of books based on people who purchased The New Pioneers.

Fortunately, it’s easy to use the Amazon Web Services API (AWS) and its “similar products” list to map your own collection of books. After reading Krebs’ work, I wondered: what could I learn from mapping my own library? As it turns out, not so much.

The image above shows a small section of my similar items map of 400 books. Each circle represents one book. (The number inside each circle is simply the book’s unique identifier.) Arrows connect similar books.

I expected to see many interconnected clusters like the one to the left of the image. But here I ran into my first problem: Amazon narrowly defines “similarity.” For each multi-book cluster I found multiple orphans (books with no similar titles in my library) or duplets (a set of two similar books, with no similarities to any other title in my library).
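To put numbers on the orphans, duplets, and clusters described above, here is a minimal sketch that groups a library by connectivity. It assumes the similar-items data has already been fetched and flattened into (source, target) pairs of book identifiers; the function name classify_components and the sample data are made up for illustration and are not part of my actual map. Arrow direction is ignored here, so two books land in the same group if either one points to the other.

```python
from collections import defaultdict

def classify_components(books, similar):
    """Group owned books into orphans, duplets, and larger clusters.

    books:   iterable of book identifiers in the library
    similar: iterable of (source, target) similarity pairs
    """
    owned = set(books)
    neighbors = defaultdict(set)
    for src, dst in similar:
        # Ignore similar titles that are not in the library.
        if src in owned and dst in owned and src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    seen = set()
    groups = {"orphans": [], "duplets": [], "clusters": []}
    for book in sorted(owned):
        if book in seen:
            continue
        # Depth-first walk to collect everything reachable from this book.
        component, stack = [], [book]
        seen.add(book)
        while stack:
            current = stack.pop()
            component.append(current)
            for other in neighbors[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        component.sort()
        if len(component) == 1:
            groups["orphans"].append(component[0])
        elif len(component) == 2:
            groups["duplets"].append(component)
        else:
            groups["clusters"].append(component)
    return groups

# Hypothetical data: seven owned books; book 99 is not in the library.
books = [1, 2, 3, 4, 5, 6, 7]
similar = [(1, 2), (2, 3), (3, 1), (4, 5), (6, 99)]
result = classify_components(books, similar)
print(result)
```

Book 6 counts as an orphan in this sample because its only similar title is not in the library, which is exactly the situation that produces so many isolated circles on the map.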

For example, the duplet on the far right of the graph shows that The Future of Ideas is similar to Free Culture. Apparently, Lawrence Lessig’s books just aren’t like anything else in my library.

A second quirk: similarity often runs only one way. “The Future of Ideas” is similar to “Free Culture,” but “Free Culture” is not similar to “The Future of Ideas.”
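One-way similarity is easy to check for once the arrows are in a list. The sketch below assumes each pair (a, b) means "a is similar to b" in the sense used in this post; the helper name split_by_direction and the "Book A" / "Book B" entries are hypothetical, added only so the sample has one mutual pair.

```python
def split_by_direction(similar):
    """Separate one-way similarity arrows from mutual ones."""
    edges = set(similar)
    # An arrow is one-way when the reverse arrow is missing.
    one_way = sorted((a, b) for a, b in edges if (b, a) not in edges)
    # Report each mutual pair once, in alphabetical order.
    mutual = sorted((a, b) for a, b in edges if (b, a) in edges and a < b)
    return one_way, mutual

similar = [
    ("The Future of Ideas", "Free Culture"),
    ("Return of the King", "Harry Potter and the Sorcerer's Stone"),
    ("Book A", "Book B"),  # hypothetical mutual pair
    ("Book B", "Book A"),
]
one_way, mutual = split_by_direction(similar)
print("one-way:", one_way)
print("mutual:", mutual)
```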

Finally, similar books are often clustered so tightly that they don’t reach out to anything else in the library. The graph on the right is a Harry Potter cluster: books 1 to 6 in the series, with a Tolkien book thrown in at random. Strangely, Return of the King is similar to Harry Potter and the Sorcerer’s Stone, but isn’t similar to any other Harry Potter title. Likewise, “Harry Potter and the Sorcerer’s Stone” isn’t similar to “Return of the King.”

It’s easy to build a map of your own library. The key ingredients:

And that’s it. Click here to view the full size graph (PNG format, 120K).

After the holidays, I’ll post a tutorial for hooking up your book data to Mark Shepherd’s amazing SpringGraph component and show you how to sift by subject instead of similar items for a better view of your library.

* * *


1. On Dec 22, 08:21 PM Valdis Krebs said:

First of all, I am glad you enjoyed my article.

You said... "After reading Krebs' work, I wondered: what I could learn from mapping my own library? As it turns out, not so much."

LOL! Although your further description points to a possible technical glitch, most people will NOT learn much from peering inside their own cluster [echo chamber?]. You already know all that is local to you... no surprises... just "Yeah, so what". What is interesting is how your cluster links to/intersects others.

On Amazon, the fact that one book points to another, and not vice versa, has to do with the sales volume of each and the fact that we only see the top X books for each. If we had complete lists to compare, we would see the A->B AND B->A relationships pop out from both lists.

Good luck on your further explorations!

BTW, I used this type of network analysis to figure how to position my book on connectivity.
