The Amazing, Astonishing Google Check: How I Used Google to Spell-Check Every Word in My Book.

Anyone who uses technical terms, scientific terms, foreign words, proper nouns, or brand names in their writing knows the limitations of the spell-checker built into a word processing program.  After all, it’s just a static list of words that got loaded onto your computer and never gets updated or expanded unless you do it yourself.  (And if you’ve ever tried one of the professional spell check add-ons, like Spellex, you may have noticed that they don’t always include every possible term in your field–I found the botanical checker particularly lacking–in which case, you still can’t be confident that you’ve caught everything.)

My latest book, The Drunken Botanist, is packed with weird, tricky words.  On a single page, I might mention the name of a flavor molecule, the Latin name of a plant, the surname of the French botanist who discovered it, and the brand name of a liqueur that is flavored by that plant.

That goes on for 400 pages.  You cannot imagine what a chore it was to proofread this book, and the level of sobriety required for the task.

After the completed, polished, edited, spell-checked manuscript had been proofread at least three times by me, my editor, a professional copy editor, a professional proofreader, and a few other people I probably don’t even know about, and had been read closely by a few smart friends and relatives, I got the pages back one last time for a final check.  It had already been typeset by then, so I got it as one long PDF.

Every time I saw a tricky term that didn’t look right, I double-clicked the word, copied it, and pasted it into Google to check.  Google, as you may know, is a surprisingly useful spell-checker:  if you get a word wrong, you’ll probably get “Did you mean…” right under the search term.  Even if that doesn’t happen, Google will generally take you to a variety of well-respected sources (or, in the case of a brand name, the company’s website) to help you check the spelling. It even catches pop culture terms, and it snags some context-specific stuff (for instance, if you wrote “hear” instead of “here”). And–bonus–Google is poly-lingual.

So as I was doing that, I was thinking, “I wish I could just Google the whole book.  Why can’t I do that?”

Then I realized that I could.  Google Docs (now part of Google Drive) relies on Google’s search engine technology for its spell check function.

Why had I never thought of this before?  Here’s how I did it:

First, since I was working with a PDF, I copied the text and pasted it into a plain-text editor (Notepad, in my case).

Once I had the whole document in Notepad, I copied chunks of it into a blank Google Docs document.  I found that there was an upper limit to how much text Google Docs could handle at once.  What worked for me was to put my cursor at a starting point in the Notepad text, then hit Page Down about 15 to 20 times, and copy that much text at a time.  In my case, that worked out to about 35,000 words at a time.

Once you paste it into Google Docs, it takes a little while to process and save it–roughly 20 seconds.  At some point beyond that 20-second mark, with a larger chunk of text, it just gives up and won’t process it at all–at least, that was my experience.  So the sweet spot seems to be right about 35,000 words. (Update: in 2020, you can pretty much always just paste the entire text in at once. But try breaking it into chunks if you find it’s too slow.)
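If counting Page Down presses sounds tedious, a small script can do the splitting instead. What follows is only a minimal sketch in Python, not something I used on the book. It assumes the manuscript is already plain text with paragraphs separated by blank lines. The function name chunk_by_words and the 35,000-word default are illustrative choices based on the number above, not a limit that Google documents anywhere.

```python
def chunk_by_words(text, max_words=35000):
    """Split text into chunks of roughly max_words words each.

    Chunks break only at paragraph boundaries (blank lines), so a single
    paragraph longer than max_words ends up as its own oversized chunk.
    """
    chunks = []
    current = []   # paragraphs collected for the chunk being built
    count = 0      # running word count for the current chunk

    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n

    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Tiny demonstration: five 10-word paragraphs, with a limit of 25 words.
sample = "\n\n".join(["word " * 10] * 5)
print([len(c.split()) for c in chunk_by_words(sample, max_words=25)])
# prints [20, 20, 10]
```

To use it on a real manuscript, read the text file into a string, call chunk_by_words on it, and paste each chunk into Google Docs in turn. Breaking at paragraph boundaries means no sentence gets cut in half, which keeps the context-sensitive suggestions useful.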

Then all you have to do is go through and right-click on any word underlined in red.  It’ll give you a “Did you mean…” suggestion for anything that looks weird to Google–including people’s names, names of foreign cities, obscure scientific terms, all of it.

And guess what?  I found an astonishing thirty-eight errors with this method.

This is after it had been through a very rigorous and professional editing process that took months and passed through many very competent hands.  A process in which we’d all discussed how important it would be to check and double-check those tricky, difficult-to-check words. We weren’t even really proofreading anymore–this was just a final, quick look-see before it went to the printer.

And yet the silliest mistakes had escaped the notice of all of us.  Most of the mistakes I found had been in the original manuscript all along. We’d all missed them.

I can tell you that I will never again publish a book without running it through Google. (And I am fighting the temptation to Google my previous books–it is only the fact that I don’t have a PDF of the final version of each previous book that is holding me back.)

It’s time-consuming–the whole process took me 12 to 14 hours, in part because Google flagged a lot of words that were actually correct and I still had to slow down and double-check them–but entirely worthwhile.  I think that if I had it to do over again, I’d run the Google check twice during the editing process.

The first time would be right before I transmit the final version of the manuscript. This is the version that my editor and I have already been over at least three times and that I have spell-checked (both with the computer and with my eyeballs) many times.  Once we transmit it, I never get it back as a Word document again.  From that point on, someone else inputs the changes. And new errors can get introduced as those changes are made.

So I’d Google-check it once right before transmittal just to eliminate obvious errors and make the professional copy editor and proofreader’s jobs easier.  The fewer mistakes they have to contend with, the more likely they will be to catch all the stuff that computers don’t catch.

Then, when I got final, typeset pages, maybe at the second pass stage, I’d take the PDF and copy/paste it and do the Google check one more time.  It probably wouldn’t turn up much, but then again, I wasn’t expecting to find 38 errors this time.

The genius behind this technology appears to be a guy named Yew Jin Lim.  Dude, you are invited to Thanksgiving at my house every year, from now on. Do not be surprised if I dedicate my next book to you. Srsly.  (Google got that word right, btw. And that one.)