Data ghettos

* Note: This post came from a version of this blog that got lost in a server failure. It's been restored from old RSS feeds, Google caches and other sources. As such, the comments, links and associated media have been lost.

One resolution for this year: Post more often. Starting now.

I’m not sold on the whole Data Desk/Data Center idea that a lot of newspaper websites are trying out. I hate to say all this because at a lot of places, the people responsible for them are my friends. But for all the love I have for putting data online, there’s something that has bothered me about the way they’re going about it. A friend summed it up for me recently: The Data Ghetto.

The Data Ghetto is that one mishmash page where all of that site’s databases are lumped together.

I won’t take the time to criticize how these pages are constructed; the criticisms are obvious, and even the people who made them don’t like them. But take a step into one of the databases and you get to my second problem with them: a couple of search boxes and a button.

Is that really it? Is that the big push into data? Sprawling, barely organized pages to get to a couple of search boxes and a button? This fails on a number of levels:

  • Creativity: Can we offer no more creative way into the data than to make a user put stuff here and hit search? Search is fine in context, but it’s also limiting. What if someone doesn’t know how to spell something, or doesn’t know what they want, or all they want to do is explore the data their own way? You’ve cut that off. In my opinion, browsing is much better. If your data is normalized — i.e. all the cities are spelled the same way, etc. — then you can let people click on the things they’re interested in and get those things. And in the process, they may see other things they’re interested in. To see it in action, look at how PolitiFact is browsable (example here). Or better yet, if you don’t believe me, look at There are 10 different ways to browse that data. There’s one search box. Search is a part of it — it has a value in context — but it shouldn’t be your whole app.

  • Repeat customers: A lot has been written about the traffic these database sites get. But I want to see what the traffic is like months after a database first goes up. What’s the traffic like after the third or fourth update? I ask because some of these search apps seem to me like a pure voyeur play. What I mean is that the user sees a salary database, goes and looks up their neighbor and … what? They’re done. They’ve answered the one question they wanted to ask. How are you bringing people back to your data?

  • Shaky business model: Are we really building a business model, or even a component of a business model, around making public data searchable? Because guess what? Google is too. That’s right. The search giant is dealing directly with government agencies to help them make their own data searchable. Sound familiar? Think your data ghetto can compete with Google? Do you think people are going to remember your URL over Google? Really?
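
A rough sketch of the browsing idea from the first bullet, in Python. Everything in it is made up for illustration: the records, the field names, the alias table and both helper functions are assumptions, not anything from a real newsroom app. The point is only that once every city is spelled one way, each distinct value can become a link a reader clicks instead of a term they have to type.

```python
from collections import defaultdict

# Hypothetical records; names, fields and values are invented for this sketch.
records = [
    {"name": "A. Smith", "city": "St. Petersburg", "salary": 52000},
    {"name": "B. Jones", "city": "saint petersburg", "salary": 48000},
    {"name": "C. Lee", "city": "Tampa ", "salary": 61000},
    {"name": "D. Cruz", "city": "TAMPA", "salary": 57000},
]

# Assumed lookup of known variant spellings to one canonical spelling.
CITY_ALIASES = {
    "saint petersburg": "St. Petersburg",
    "st. petersburg": "St. Petersburg",
}


def normalize_city(raw):
    """Collapse whitespace and case, then map known variants to one spelling."""
    key = " ".join(raw.split()).lower()
    return CITY_ALIASES.get(key, key.title())


def build_browse_index(rows, field, normalizer):
    """Group rows under each normalized value so every value can become a link."""
    index = defaultdict(list)
    for row in rows:
        index[normalizer(row[field])].append(row)
    return dict(index)


by_city = build_browse_index(records, "city", normalize_city)

# Each key is one clickable entry on a "browse by city" page.
for city in sorted(by_city):
    print(city, len(by_city[city]))
```

Without the normalizing step, the four records above would scatter across four different "cities" and the browse page would be useless. The same grouping could be repeated for any other field a reader might want to wander through.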

Here’s the to-be-fair portion of the post: I have exactly one data-driven app under my belt (PolitiFact). I have a half dozen more in the works, so I’m thinking about this stuff constantly. But for now, I can only talk. I can’t show, at least not yet.

That said, here’s how we can get out of the data ghetto: add some journalism to it.

Back in November, Will Sullivan tried to coin a term where multimedia and data collided into something he, jokingly, called multimedata journalism. Of note was a New York Times effort where they did a story about people freed from prison by DNA evidence. They interviewed 137 of 200 people released. They then put an app online that allows you to click on each name and see details about each case (data) and hear their story (audio).

I’d argue there doesn’t need to be a new term: This is what it’s supposed to be. Journalists are supposed to add context and value to information. Heaving databases online should be no different. Does each app require the type of effort that the NYT put in? No. But flatly serving up data with no context or analysis or value outside the record itself is hardly journalism. A public service maybe, but not journalism.

Next post: Instead of bringing journalism to your app, why not bring your app to your journalism?

By: Matt Waite | Posted: Jan. 3, 2008 | Tags: Journalism, Databases