Monday, November 23, 2009

Spelling correction

We are terrible at spelling, and in the age of the internet it is definitely getting worse. When we chat online or text, we use all kinds of shortcuts to save a couple of keystrokes. When was the last time I actually typed "see you later" instead of "c u"? It seems like a century ago. A lot of the time I also find myself typing "thx" instead of "thanks", or sometimes "thaks".

A conversation agent based on pattern matching can't handle these misspellings directly. Basically, there are three ways to tackle the problem. The first is to use a sophisticated matching algorithm that can match words against likely misspellings, such as "restaraunt" to "restaurant" or "thaks" to "thanks". This can produce some amazing results, but it tends to be quite slow and usually does not work well with internet slang. The second is to program every possible misspelling combination into the pre-stored conversation data. This works well for a small set, and it can catch special phrases like "lol" and "c u". However, as the conversation set gets bigger, the number of pre-stored misspellings grows exponentially. The third way (which I consider the best) is to keep a word-by-word misspelling dictionary (with some phrases as well) plus a special algorithm that matches misspelled words to their correct spellings. The dictionary can work at either of two ends. One end is the chat client, where it acts as a prompt or hint. Pidgin has a small dictionary that works this way, and everybody is familiar with Google's search prompt. The other end is the backend, where the misspellings are fixed automatically.
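
To make the third approach a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the entries in `PHRASES` and `WORDS`, the tiny `VOCABULARY` list, the 0.8 similarity cutoff, and the choice of the standard library's `difflib.get_close_matches` as the "special algorithm". It splits on whitespace only and ignores punctuation, so it is a starting point rather than a real implementation.

```python
import difflib

# Hypothetical dictionary entries; a real dictionary would be far larger.
PHRASES = {"c u": "see you"}           # two-word shortcuts
WORDS = {"thx": "thanks", "u": "you"}  # single-word shortcuts
VOCABULARY = ["thanks", "restaurant", "later", "see", "you"]


def correct_word(word, cutoff=0.8):
    """Return the dictionary entry or closest vocabulary word, else the word itself."""
    lower = word.lower()
    if lower in WORDS:
        return WORDS[lower]
    if lower in VOCABULARY:
        return lower
    # Fuzzy fallback: pick the most similar known word above the cutoff.
    matches = difflib.get_close_matches(lower, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_message(message):
    """Correct a message, trying two-word phrases before single words."""
    words = message.split()
    corrected = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2]).lower()
        if pair in PHRASES:
            corrected.append(PHRASES[pair])
            i += 2
        else:
            corrected.append(correct_word(words[i]))
            i += 1
    return " ".join(corrected)


print(correct_message("c u at the restaraunt"))
print(correct_message("thaks"))
```

The dictionary lookup handles slang that no similarity measure would catch ("thx", "c u"), while the fuzzy fallback handles ordinary typos, so the dictionary itself can stay small.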

Personally, I prefer spelling correction to happen on the backend rather than on the front end. Typing a few keywords and having a hint or prompt at my fingertips is one thing. Having it pop up for everything I type during a chat session is just annoying. And I don't want to be corrected when I type "thx" in the middle of a message.

There is one problem with spelling correction on the backend: English is ambiguous. A misspelled word can be corrected in multiple ways, and not every "u" should be corrected to "you". However, a conversation agent does have a distinct edge: the context. With the help of context, we can usually correct a misspelling properly. In fact, this is exactly how humans do it. Based on the context, we can usually recognize other people's typos and correct them in our minds with very little ambiguity.
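
One way to picture the role of context is to let the neighboring words vote on each candidate correction. The sketch below is only an illustration under stated assumptions: the `CANDIDATES` table, the made-up counts in `BIGRAMS`, and the function names are all hypothetical, and a real agent would draw on far richer context than the adjacent words.

```python
# Hypothetical candidate corrections; the first entry is the default on a tie.
CANDIDATES = {"u": ["you", "u"]}

# Made-up counts of how often two words appear next to each other.
BIGRAMS = {
    ("see", "you"): 50,
    ("you", "later"): 30,
    ("a", "u"): 5,
    ("u", "turn"): 40,
}


def score(previous, candidate, following):
    """Score a candidate by how well it fits between its two neighbors."""
    return BIGRAMS.get((previous, candidate), 0) + BIGRAMS.get((candidate, following), 0)


def correct_with_context(words):
    """Replace ambiguous words with the candidate that best fits its neighbors."""
    result = []
    for i, word in enumerate(words):
        options = CANDIDATES.get(word.lower())
        if not options:
            result.append(word)
            continue
        previous = result[-1].lower() if result else None
        following = words[i + 1].lower() if i + 1 < len(words) else None
        result.append(max(options, key=lambda c: score(previous, c, following)))
    return result


print(correct_with_context("see u later".split()))
print(correct_with_context("make a u turn".split()))
```

With these counts, the "u" in "see u later" becomes "you", while the "u" in "make a u turn" is left alone.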
