Thursday, December 29, 2005

Who can read this blog?

Ok, so the previous post was all about who's coming. This post is about who (in theory) can actually read this blog.

Following a tip from the Download Squad blog, I plugged this blog into the Juicy Studio Readability Test. The test digested all the text on my blog and figured out how much education (on average) you'd need to read those words and sentences. After cranking through 535 sentences and 4,472 words (in about half a second), it concluded this blog has:
  • A Gunning Fog index of 7.64 -- in other words, you'd need roughly 7 years of schooling to understand this blog.
    • I think that's high, but there are probably a few entries here and there with some long words or difficult concepts.
    • An index of 7.64 puts us right between True Confessions and Ladies' Home Journal in difficulty. Reader's Digest is a 9, Newsweek is a 10, TIME an 11, and Atlantic Monthly a 12, in case you're curious.
  • A Flesch Reading Ease of 74.5 out of 100, with higher being easier to read.
    • According to the Juicy Studio site, writers are encouraged to aim for an index of 60-70.
  • A Flesch-Kincaid grade level of 4.94, another way to calculate the years of schooling you'd need. I think this result is far closer to the truth.
    • If anyone wants to volunteer their 4th or 5th grade student to do a sanity test, let us know.
It's illuminating what computer algorithms can deduce about readability from just looking at the words on a page. Try it yourself on your blog.
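If you're curious how scores like these get computed, here is a rough sketch in Python using the commonly published formulas for the three measures. Treat it as an approximation, not a copy of what Juicy Studio runs: the function names (`count_syllables`, `readability`) are made up for this example, and the syllable counter is a crude vowel-group heuristic, so the numbers will likely differ a bit from the site's.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return the three scores for a block of plain text."""
    # Naive splitting: sentences end at . ! or ?, words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("need at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # Gunning Fog treats words of three or more syllables as "complex".
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)

    return {
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_share),
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    sample = "The test digested all the text on my blog. It was fast."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.2f}")
```

All three scores depend on the same two ingredients, average sentence length and how long the words are, which is why short sentences and short words push every one of them toward "easier."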

Side note: when I was at Oracle (shortly after the Iron Age ended), one of the coolest products I saw was an add-on to the Oracle database called Oracle Context (today it's called Oracle Text, and it's a standard feature of the Oracle database). It analyzed words in a document much like the Juicy Studio site does, but it went one step further and tried to understand the text rather than just count syllables and sentence lengths.

One thing you could do with the product was move a slider to control how much it tried to condense the text while preserving its meaning. You could literally create summaries of long articles in real time. It worked surprisingly well given that computers can't really understand text yet. I imagine it's gotten better, but I still haven't run into anyone in the real world who uses it. Sure would have been useful for 10th grade US History classes.
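To give a feel for the slider idea, here is a toy sketch in Python. To be clear, this is not how the Oracle product worked (I have no idea what it did internally). It's a simple word-frequency trick: score each sentence by how often its words appear in the whole text, then keep the top-scoring fraction. The `summarize` function and its `ratio` parameter are made up for this example, with `ratio` standing in for the slider.

```python
import re
from collections import Counter


def summarize(text, ratio=0.5):
    """Keep roughly `ratio` of the sentences, picked by a word-frequency score."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""

    # Count only words longer than three letters, a crude stand-in for a stop-word list.
    freq = Counter(w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3)

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    # The "slider": how many sentences survive, never fewer than one.
    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)

    # Put the chosen sentences back in their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))


if __name__ == "__main__":
    article = (
        "Oracle makes databases. Databases store text. "
        "I like pie. Text databases index text."
    )
    print(summarize(article, ratio=0.5))
```

Sliding `ratio` toward 0 leaves a single sentence, and sliding it to 1.0 returns everything. Note that this only picks existing sentences; it makes no attempt to understand or rewrite them.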

1 comment:

ivan said...

Very cool, Frank. I had some fun on the site typing in different URLs. I read in the NY Times once that USA Today's success was due to its strategy of catering to a 4th grade reading level. Rather condescending of the NY Times (big surprise). However, according to Juicy Studio, you've got to be more educated to read the USA Today than the NY Times!