Rob St. Amant

Birthday
December 31
Bio
My roots are in San Francisco and later Baltimore, where I went to high school and college. I stayed on the move, living for a while in Texas, several years in a small town in Germany, and then several more in Massachusetts, working on a Ph.D. in computer science. I'm now a professor at North Carolina State University, in Raleigh. My book, Computing for Ordinary Mortals, will appear this fall from Oxford University Press. http://goo.gl/hQBHy

JUNE 15, 2012 7:27AM

Justin Bieber is a Literary Giant

End here. Us then. Finn, again! Take. Bussoftlhee, mememormee! Till thousandsthee. Lps. The keys to. Given! A way a lone a last a loved a long the riverrun, past Eve and Adam's, from swerve of shore to bend of bay, brings us by a commodius vicus of recirculation back to Howth Castle and Environs.

That's James Joyce, of course, in Finnegans Wake. Compare with a set of tweets by Justin Bieber:

what the heck is barley water? #london Everyone back home... push #ALLAROUNDTHEWORLD to #1 on ITUNES and #BOYFRIEND back up the chart! WE CAN DO IT!! #AlwaysUnderdogs #muchlove @heidiklum see u and Germany soon LONDON IS CRAZY!! LOVING IT!! Off to Germany tonight. Berlin get ready didnt understand anything you guys were saying but always fun. thanks for having me @chattyman #BIEBERBLAST !!!! 

 

We see the same grand range of time and space, the playful neologisms, the eccentric orthography; it's a celebratory passage. Bieber is darker than Joyce, here. Note the faux naivete, which subtly allows for mockery of English tradition (in the form of a soft drink) and more broadly of English agrarian culture (cf. John Barleycorn). Is Justin Bieber the heir of James Joyce?

The words of noted literary critic ProudSwifty are compelling: “#ALLAROUNDTHEWORLD PEOPLE WANT @justinbieber 's LOVE!! :P He may never notice me but Imma keep #BIEBERBLASTING the song to #1 on iTunes!!!”

Indeed.

Okay, I'm kidding. I've made all this up. The I Write Like Web site explicitly warns us that an analysis of tweets is not reliable. (And it actually says that Justin Bieber's tweets are closest to Kurt Vonnegut; I fiddled the output.) That's not to say that the site's analysis of ordinary text is any more reliable. My Alice pastiche is judged to be like Lewis Carroll, and a short science-fiction-ish fragment in my book is supposedly like Ursula K. Le Guin. Should I be pleased? Another passage I wrote is said to be like Cory Doctorow. But when I pasted a Guardian article actually written by Doctorow into the site, it told me that Cory Doctorow writes like H. P. Lovecraft.

But this is just for fun. Serious work can be done in analyzing style in text, and computers can help us do better than airy speculation. Frederick Mosteller and David Wallace produced some of the earliest results that came to public attention, in their statistical analysis of the Federalist Papers. Eighty-five articles were published under the name Publius in the late 1700s, by Alexander Hamilton (51 of the articles), James Madison (14), and John Jay (5), with three articles produced by a collaboration between Hamilton and Madison. That leaves 12 articles without a clear attribution. In the 1960s, Mosteller and Wallace carried out several detailed statistical analyses of the unattributed articles, looking for similarities to the attributed articles--were they written by Hamilton or Madison? (Word frequency analysis is still popular today; for example, Hamilton used the word "upon" much more often than Madison.) In the end, all 12 articles were attributed to Madison.
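To make the word-frequency idea concrete, here's a minimal sketch in Python. It only counts how often a marker word such as "upon" shows up per thousand words of a text. Mosteller and Wallace's actual analysis was far more elaborate than this, and the function name `rate_per_thousand` and the two sample sentences are made up for illustration; they are not Federalist text.

```python
import re


def rate_per_thousand(text: str, word: str) -> float:
    """Occurrences of `word` per 1,000 words of `text` (case-insensitive)."""
    # Crude tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000.0 * tokens.count(word.lower()) / len(tokens)


# Made-up sample sentences (an assumption for illustration only).
sample_a = "Much depends upon the judgment of the people, and upon their patience."
sample_b = "Much depends on the judgment of the people, and on their patience."

print(rate_per_thousand(sample_a, "upon"))
print(rate_per_thousand(sample_b, "upon"))
```

With real essays you'd compute this rate for a whole list of marker words in each author's known writing, and then see which author's rates a disputed essay most resembles.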

Work of this kind has drawn special interest in recent years, because Google has released a tool for analyzing the millions of books it has digitized, books going back to the year 1800. The Ngram Viewer lets you see how the popularity of specific words and phrases has changed over the past couple of centuries. Some scientists have begun to explore the evolution of styles and other information that can be extracted from the historical data; we're seeing the emergence of a new area of scholarly work: the Digital Humanities.


Comments

Frankly, Bieber makes more sense than Joyce -- which may explain why I've never been able to get thru Finnegans Wake. Who would I compare thee to? I haven't the slightest, but let me say you're one of the more readable tech guys I've come across.

As for me, I've been compared to Michener and Kipling; I'll take that with a beaming smile and a grain of salt. I'm glad to be compared to almost anybody but Ayn Rand.
Ok. Now I'm officially afraid to write...
Seems to me, in pre-computer days, that someone did such an analysis of Shakespeare's work to "prove" he didn't write all the plays. I'm always a little suspicious of such analyses, but that doesn't mean they're not of some use.
Wow!
All that analysis stuff must be why so many writers say that writing is such hard work!

I just let whatever is boiling around inside me bubble up, out, and onto the page. Then I might re-arrange it a bit, call it "art" and post it.

The first three paragraphs of anything by Joyce leave me pooped out and needing a nap. If there is more than page 1 to any of his novels, I've never seen it.......;-)
Fascinating! Thanks for this, and for the links.
Hmmmm... I plugged in three samples and got HP Lovecraft three times. I guess I should go read the guy. (Not wild about his book covers, but the analyzer doesn't take that into account.)
Hey, Tom, I actually haven't read Joyce's two masterpieces; at least, I've never finished either. Portrait was fine, and I love Dubliners--the ending to "The Dead" is one of the best passages in the Western Canon.

And thanks for the compliment! I think highly of your writing style, though it's very different from mine. (Also not Rand-like).

Go for it, KC. Maybe some day someone will say, "Hey, I write like KC Redding-Gonzalez!"

Lee, I checked on Wikipedia to refresh my memory, and there have been disputes about the authorship of Shakespeare since the mid-1800s. I don't think very many people take the question seriously, though. These arguments remind me of the old joke about the Odyssey and the Iliad. These epics weren't actually written by Homer--rather, they were written by a different guy who happened to have the same name.

Hey, skypixie. Analysis happens after the fact, after all the hard work of writing is done. I don't think anyone thinks about replicating an author's style by looking at the statistics. (Aside from artificial intelligence researchers, maybe--I have a vague memory of someone trying to do this. I know it's been done for musical styles, but I'm not sure about prose.)

Oh, no, Karen! I think of Lovecraft as being great with ideas but terrible at writing. We're definitely seeing a failing of the "I write like" analysis. Unless you've started to include phrases like "nameless dread" and "eldritch horror" in your work. :-)
Tom, I'm so glad I'm not the only one - I've always felt embarrassed but defiant about not finishing Finnegans Wake. Thanks for evoking the comparison of Rob to a summer's day. My mind twirled on that one for a while. :-D
Don't know if you're aware but The Dead was made into a movie. In fact, it was the last film John Huston directed. It's unrelentingly sad, but a magnificent period piece. It's worth watching just for the exquisitely rendered passage you mentioned. The words -- well, there are no words to describe what Joyce could do at his best.

Hi Sandra, good to "see" you, hope all is well.
Happy Bloomsday, Sandra and Tom!

(I have The Dead DVR'ed; maybe today would be a good day to watch it.)
Cory Doctorow
J.D. Salinger
George Orwell
it's official, I write like a man. Evidently any man.
Very funny, Julie. I wouldn't have thought so--were you trying your poetry? I think there's a reasonable chance that a system could be developed to identify similarities between poets, because we have a lot of ways of looking at the structure of poems.

Now I'm thinking, "What woman writers would I be happy to write like?" Margaret Atwood, Isabel Allende, A. S. Byatt... That's just two letters into the alphabet.
I'll give I Write Like credit -- I popped in this text from my homage to Swift's "A Modest Proposal" ...

"I think it is agreed by all parties that this prodigious number of children in the arms, or on the backs, or at the heels of their mothers, and frequently absent their fathers, is in the present deplorable state of the country a very great additional grievance; and, therefore, whoever could find out a fair, cheap, and easy method of making these children sound, useful members of the nation, would deserve so well of the public as to have his statue in Washington.

But my intention is very far from being confined to provide only for the children of welfare mothers; it is of a much greater extent, and shall take in the whole number of infants who are born of parents in effect as little able to support them as those who demand our charity. "

... and five seconds later I Write Like informed me that I write like Jonathan Swift.
However, three excerpts from my current post came back as Dan Brown, Edgar Allan Poe and HP Lovecraft. What those three authors have in common escapes me. Maybe IWL is trying to tell me I lack a consistent style. That could well be.

One thing certain -- IWL is fond of HP Lovecraft. To test further, I inserted the quote from Machiavelli that appears under my bio on OS and it came back as HP Lovecraft. A cynic might be tempted to suggest somebody set up the site to sell books by HP Lovecraft.

Out of curiosity, I tried several different excerpts from comments left by others on my latest post. A high percentage of those returned Dan Brown. What I suspect may be going on here isn't so much a matter of style as it is content. A few words or phrases are leading the bots to pick authors who've written on the subject matter. That's what we call SWAG -- strictly a wild-assed guess.
One more if you can stand it. As a final test, I tried an excerpt from a white paper by George F Kennan, a brilliant man but hardly known for his literary talents. The result? You might have guessed -- HP Lovecraft.
Okay -- I promise this is the last -- at least I'm keeping you in the feed. I tried the passage from The Dead ...

"A few light taps upon the pane made him turn to the window. It had begun to snow again. He watched sleepily the flakes, silver and dark, falling obliquely against the lamplight. The time had come for him to set out on his journey westward. Yes, the newspapers were right: snow was general all over Ireland. It was falling on every part of the dark central plain, on the treeless hills, on the Bog of Allen and, farther westward, softly falling into the dark mutinous Shannon waves. It was falling, too, upon every part of the lonely churchyard on the hill where Michael Furey lay buried. It lay thickly drifted on the crooked crosses and headstones, on the spears of the little gate, on the barren thorns. His soul swooned slowly as he heard the snow falling faintly through the universe and faintly falling, like the descent of their last end, upon all the living and the dead."

The result? James Joyce. Don't know whether that supports my theory or not, but at least IWL identified the author correctly.
Hey, Tom,

Thanks for sharing your experimentation! If I were putting together a Web site to analyze similarity in writing, I'd do it in pretty much the way you describe. Here are a few of the basic concepts, from an area of computer science called information retrieval.

We start out by collecting a set of documents that cover the writers we're interested in. All of the short stories by Joyce and Lovecraft, for example, passages from novels by Jane Austen and so forth, and maybe newspaper articles from other writers. Each piece of writing is a different document, and we can break down the entire set of documents into subsets, one subset for each writer. We'll call each subset a cluster.

Now we get a new unknown document, whatever text someone has pasted into our Web form. The question we ask is this: Which cluster should the new document be put in? To answer this question we need a similarity metric, something that lets us measure how similar documents are to each other. I won't go into the mathematical details, but one useful concept is term frequency, or tf: how often does a term occur in a given document? If we computed tf for all terms in two documents, and the frequencies were similar, we'd think that the documents were similar, too. Another useful concept is inverse document frequency, or idf: how rare is a given term across all the documents in the original set? A term that turns up in nearly every document, like "the," tells us little about who wrote it, so it gets a low weight. A combination of these two factors, tf and idf, can be used to define the similarity between two documents and also the similarity of an unknown document to a cluster of documents; we can assign the unknown document to the cluster it's most similar to.

Voila--you write like, um, Edgar Allan Poe.
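For anyone who wants to see the moving parts, here's a rough sketch of that tf and idf recipe in Python. It rests on several assumptions: the smoothed idf formula is just one common variant, cosine similarity is my pick for the similarity metric (the description above doesn't name one), a cluster's score is simply the average similarity to its documents, and the cluster labels and toy sentences are invented. It makes no claim about how I Write Like actually works.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into crude word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_table(documents):
    """Smoothed idf per term: log((1 + N) / (1 + df)) + 1 (one common variant)."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Map each known term to tf * idf; terms never seen in the corpus are ignored."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_cluster(unknown, clusters):
    """Return the label of the cluster with the highest average similarity."""
    all_docs = [doc for docs in clusters.values() for doc in docs]
    idf = idf_table(all_docs)
    query = tfidf_vector(unknown, idf)
    scores = {}
    for label, docs in clusters.items():
        sims = [cosine(query, tfidf_vector(doc, idf)) for doc in docs]
        scores[label] = sum(sims) / len(sims)
    return max(scores, key=scores.get), scores


# Toy clusters with invented text (hypothetical, for illustration only).
clusters = {
    "gothic": ["the nameless dread crept from the eldritch crypt",
               "an eldritch horror stirred beneath the ancient crypt"],
    "seaside": ["the gulls wheeled over the bright harbor at noon",
                "we walked along the harbor and watched the bright sails"],
}
label, scores = closest_cluster("a nameless horror waited in the crypt", clusters)
print(label, scores)
```

Notice that "the" appears in every toy document, so its idf is as low as it can go and it barely affects the result; the rarer words do most of the work. That's also a hint at why a few content words might be enough to push a passage toward one author.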
Oh, and I forgot to mention this hilarious analysis of Dan Brown's writing style, by Geoffrey K. Pullum.
Ya know I love reading ya Rob but I'm sorry, Justin Bieber?!? I had to come racing over here because I simply could not get my head wrapped around Rob St Amant and The Bieb in the same blog..

..and KC's comment has me on the floor in a fit of insane giggles :D.

But..

Rated for commentary hilarity and TC to the Nth degree.
Hey, Seer. Sorry to have let you down--but I think sometimes I write posts just to read the commentary, myself. :-)