Monday, November 23, 2009

Spelling correction

We are terrible at spelling, and in the age of the internet it is definitely getting worse. When we chat online or text, we use all kinds of shortcuts to save a couple of keystrokes. When was the last time I actually typed "see you later" instead of "c u"? It seems like a century ago. Also, a lot of the time I type "thx" instead of "thanks", or sometimes "thaks".

A pattern-matching conversation agent can't handle those misspellings directly. Basically, there are three ways to tackle this problem. Method one is to use a sophisticated matching algorithm that can match words against likely misspellings, for example matching "restaraunt" to "restaurant", or "thaks" to "thanks". This can produce some amazing results, but it tends to be quite slow, and it usually doesn't work well with internet slang. The second way is to program all the possible misspelling combinations into the pre-stored conversation data. This works well for a small set, and it can catch special phrases like "lol" and "c u". However, as the conversation set gets bigger, the number of pre-stored misspellings grows exponentially. The third way (which I consider best) is to have a word-by-word (and some phrases) misspelling dictionary plus a special algorithm that matches misspelled words to correctly spelled words. The dictionary can work at either end. At one end is the chat client, where it acts like a prompt/hint; Pidgin has a small dictionary that works this way, and everybody is familiar with Google's search suggestions. At the other end is the back end, where it automatically fixes the misspellings.
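As an illustration of the third approach, here is a minimal Python sketch that looks words and short phrases up in a slang/misspelling dictionary first and falls back to edit-distance matching against a known vocabulary. The post does not say which "special algorithm" CML uses; Levenshtein distance is swapped in here as a common stand-in, and the names `SLANG`, `VOCABULARY`, `correct_word` and `correct_sentence` are hypothetical.

```python
# Hypothetical word/phrase-level slang and misspelling dictionary.
SLANG = {
    "thx": "thanks",
    "c u": "see you",
    "u": "you",
    "lol": "laughing out loud",
}

# Hypothetical vocabulary of correctly spelled words the agent knows.
VOCABULARY = {"thanks", "restaurant", "see", "you", "later", "weather"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_word(word: str, max_distance: int = 2) -> str:
    """Dictionary lookup first; otherwise the closest vocabulary word, if close enough."""
    w = word.lower()
    if w in SLANG:
        return SLANG[w]
    if w in VOCABULARY:
        return w
    # Ties are broken alphabetically so the result is deterministic.
    best = min(VOCABULARY, key=lambda v: (edit_distance(w, v), v))
    return best if edit_distance(w, best) <= max_distance else word


def correct_sentence(text: str) -> str:
    """Correct a sentence, checking two-word phrases such as "c u" before single words."""
    words = text.lower().split()
    out = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in SLANG:
            out.append(SLANG[pair])
            i += 2
        else:
            out.append(correct_word(words[i]))
            i += 1
    return " ".join(out)
```

For example, `correct_sentence("thx c u later")` gives `"thanks see you later"`, and `correct_word("restaraunt")` gives `"restaurant"`. Note that this sketch always turns "u" into "you", which is exactly the kind of blind correction that needs the conversation context discussed below.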

Personally, I prefer spelling correction to happen on the back end rather than on the front end. Typing a few keywords and having a hint/prompt at my fingertips is one thing. Having that for everything I type during a chat session is just annoying. And I don't want to be corrected for typing "thx" while I'm still typing.

There is one problem with spelling correction on the back end: English is ambiguous. A misspelled word can be corrected in multiple ways, and not every "u" should be corrected to "you". However, a conversation agent does have a distinct edge: the Context. With the help of context, we can usually correct misspellings properly. In fact, this is exactly how humans do it. Based on the context, we can usually recognize other people's typos and correct them in our minds with very little ambiguity.
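A minimal sketch of context-aware correction, assuming the agent tracks a current conversation topic as a plain string. Each ambiguous entry lists candidate corrections tagged with the topics in which they are likely; the table `AMBIGUOUS`, the topic names and `correct_in_context` are all hypothetical, not part of CML.

```python
# Hypothetical ambiguity table: misspelling -> [(correction, topics where it fits)].
# An empty topic set marks a general-purpose default.
AMBIGUOUS = {
    "wether": [("weather", {"weather"}), ("whether", set())],
    # "u" is usually "you", but not when the user is e.g. reciting the alphabet.
    "u": [("you", {"chat", "weather", "shopping"}), ("u", {"alphabet"})],
}


def correct_in_context(word: str, topic: str) -> str:
    """Pick the correction that fits the current topic, else a general default."""
    candidates = AMBIGUOUS.get(word.lower())
    if not candidates:
        return word  # not ambiguous: leave it to the normal corrector
    for correction, topics in candidates:
        if topic in topics:
            return correction
    for correction, topics in candidates:
        if not topics:
            return correction  # general default with no topic restriction
    return word  # no safe choice: leave the word alone
```

With the topic `"weather"`, "wether" becomes "weather"; in a shopping conversation it falls back to "whether". The design choice is to leave a word untouched when no candidate fits, rather than guess.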

Sunday, November 22, 2009

It's the Context, Stupid!

What ultimately produces a logical and proper response to a question? What edge does a conversation agent have over a search engine? What makes a chat bot / chatter bot smart? It is the recognition of CONTEXT!

Here is an example conversation between a real human and CindyBot:

Human : How's the weather like ?
Cindy : Where do you live ?
Human : Kirkland, WA
Cindy : 42.6F / 5.9C, Scattered Clouds - 8:45 PM PST Nov. 22
Human : tomorrow?
Cindy : Cloudy with a chance of rain in the morning...then rain likely in the afternoon. Highs in the 40s to lower 50s. Southeast wind 10 to 15 mph.

Look at the human's inputs: without the context established by the first question, "tomorrow?" makes no sense. Imagine what Google's results for the keyword "tomorrow" would be. Cindy was able to respond to "tomorrow?" because it recognized the context. The context here is that this is a follow-up question about the weather in Kirkland, WA. With this implicit information, Cindy can retrieve tomorrow's weather for Kirkland and generate the proper response.
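The follow-up in the example above can be sketched as slot filling: the agent remembers the topic and the slots (location, day) from earlier turns, so an elliptical input like "tomorrow?" only has to update one slot. This is a minimal illustration, not CindyBot's actual logic; `Context`, `respond` and `fake_weather` are hypothetical, and `fake_weather` just returns a placeholder string instead of calling a real weather service.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Context:
    """What the agent remembers from earlier turns (hypothetical structure)."""
    topic: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def fake_weather(location: str, day: str) -> str:
    # Placeholder for a real weather lookup.
    return f"weather for {location} ({day})"


def respond(text: str, ctx: Context) -> str:
    t = text.strip().lower().rstrip("?").strip()
    if "weather" in t:
        # A new weather question: set the topic and default to today.
        ctx.topic = "weather"
        ctx.slots["day"] = "today"
        if "location" not in ctx.slots:
            return "Where do you live?"
        return fake_weather(ctx.slots["location"], ctx.slots["day"])
    if ctx.topic == "weather" and "location" not in ctx.slots:
        # We just asked "Where do you live?", so this input is the location.
        ctx.slots["location"] = text.strip()
        return fake_weather(ctx.slots["location"], ctx.slots["day"])
    if ctx.topic == "weather" and t in ("today", "tomorrow"):
        # Elliptical follow-up: reuse the remembered location, change the day.
        ctx.slots["day"] = t
        return fake_weather(ctx.slots["location"], t)
    return "Sorry, I don't understand."
```

Replaying the conversation with one shared `Context` yields "Where do you live?", then today's and tomorrow's weather for "Kirkland, WA"; sending "tomorrow?" to a fresh `Context` gets "Sorry, I don't understand.", which is roughly the position a keyword search engine is in.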

Context is not necessarily the previous conversation. Context can be many things: the web page a customer is currently browsing, the items in the customer's cart, the current time, the location, or even the browser version the user is running. CML provides strong context support based on this information. Basically, the idea is that the next question is likely to be a follow-up to the previous conversation, or on the same topic. And a web page should be able to send "hints", like the current page and the items currently in the shopping cart, to the conversation agent to augment its response generation.
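One simple way to use both kinds of context is to score candidate answers by keyword overlap and add a bonus when an answer's topic matches the previous topic or a hint sent by the page. The sketch below assumes that scheme; CML's real weighting is not described here, and `ENTRIES`, `pick_answer`, the `"page_topic"` hint key and the placeholder answers are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical knowledge base: the same question means different things per topic.
ENTRIES = [
    {"keywords": {"how", "long", "take"}, "topic": "shipping",
     "answer": "Delivery-time answer"},
    {"keywords": {"how", "long", "take"}, "topic": "returns",
     "answer": "Refund-time answer"},
]


def pick_answer(text: str, previous_topic: Optional[str] = None,
                hints: Optional[Dict[str, str]] = None) -> str:
    words = set(text.lower().replace("?", " ").split())
    hints = hints or {}

    def score(entry) -> int:
        s = len(words & entry["keywords"])             # plain keyword match
        if entry["topic"] == previous_topic:           # follow-up on same topic
            s += 1
        if entry["topic"] == hints.get("page_topic"):  # hint from the web page
            s += 1
        return s

    best = max(ENTRIES, key=score)  # first entry wins a tie
    return best["answer"] if score(best) > 0 else "Sorry, I don't understand."
```

Without context, "How long does it take?" is a tie and simply gets the first entry; if the previous topic was returns, or the page reports `{"page_topic": "returns"}`, the refund answer wins instead.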

Two basic approaches to making a chat bot

Chat bots are usually developed using one of two approaches. The first approach uses Natural Language Processing (NLP) algorithms. This involves both linguistic analysis and "understanding". "Understanding" is not a well-defined task. Most of the time it means some form of matching against pre-stored information and using some logical reasoning to produce a response. The term "understand" is inherently meaningless to me, because the only available criterion for demonstrating "understanding" is the ability to produce valid responses. NLP is a very hard problem. Though some NLP systems perform better than others, it is fair to say that, to this day, there is no general-purpose NLP system that can provide conversational artificial intelligence.

The second approach is pattern matching. This is the method used by most chat bots since ELIZA (1966). It is basically the same approach used by search engines. In a chat bot system, pattern matching is usually augmented by other techniques to produce better results. For example, Kyle combines real-time learning with evolutionary algorithms to optimise its ability to communicate based on each conversation held.
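To make "pattern matching" concrete, here is a heavily simplified ELIZA-style responder in Python: each rule is a regular expression plus a response template, and the captured text is "reflected" (I becomes you, my becomes your) before being echoed back. The real ELIZA script used ranked keywords with decomposition and reassembly rules; the rules and names below (`RULES`, `REFLECTIONS`, `eliza_reply`) are illustrative assumptions only.

```python
import re

# Illustrative rules: regex pattern -> response template ({0} = first capture).
RULES = [
    (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.I), "Is that the real reason?"),
]

# Swap first and second person so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())


def eliza_reply(text: str) -> str:
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        m = pattern.search(cleaned)
        if m:
            # Fill the template with the reflected captured groups (if any).
            return template.format(*(reflect(g) for g in m.groups()))
    return "Please tell me more."  # fallback when no rule matches
```

For example, "I need my coffee." is answered with "Why do you need your coffee?". Note that nothing here "understands" the input; the quality of the bot depends entirely on how good and how numerous the patterns are.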

The latter approach also works well for information retrieval (just as in a search engine), which is one of the most practical uses of chat bots. Some specialized software and programming languages have been created specifically for this narrow function. For example, A.L.I.C.E. uses a programming language called AIML, which is specific to its function as a conversational agent. My CML and Novabot form such a system too. In fact, it was inspired by AIML. However, I did add my own "augments" to the general pattern matching and focused on the conversation CONTEXT.

Why use a chat bot?

Chat bots / chatter bots have been around for a long time. The first computer program that acted like a chat bot dates back to 1966. Sci-fi movies from the last century often feature scenes in which someone types in a question and the computer responds with a direct answer, just like a chat bot.

In recent years, with the huge success of search engines, people have become used to typing in some keywords, starting a search and navigating through result pages, instead of asking a direct question and expecting an answer. This works fine on a lot of occasions, especially when you have a good combination of keywords to search for. However, if you only have some general words, expect to get hundreds of pages of totally unintended results. The latest "Bing" commercial certainly made fun of that.

The key here is context. Search engines don't have context to rely on; they act on keywords. Some search engines try to be smart by using search statistics as "context" and use that information to rank the search results. However, this degree of context is nowhere close to the contextual understanding of an intelligent conversation, even if it is just a simple question-and-answer session.

For example, when you ask a shopping assistant "how much is shipping?", the implied context includes your address and the items currently in your cart. Based on this information, a shopping assistant can give you a direct answer. However, if you type those words into a search engine, the best it can do is point you to the shipping calculation page. And most of the time, it will also give you links to the shipping policy, results related to the words "much" and "how", and even ship models.
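A shopping assistant can answer "how much is shipping?" directly because the cart and the address arrive as context. The sketch below shows that idea with a made-up rate table (a flat fee plus a per-kilogram fee, split into domestic and international); the rates, `CartItem`, `RATES`, `shipping_answer` and the default `home_country="US"` are hypothetical and do not reflect any real store's pricing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CartItem:
    name: str
    weight_kg: float
    quantity: int


# Hypothetical rates per region: (flat fee, fee per kg), in dollars.
RATES = {"domestic": (5.00, 1.00), "international": (15.00, 4.00)}


def shipping_answer(cart: List[CartItem], country: str,
                    home_country: str = "US") -> str:
    """Answer "how much is shipping?" from the cart and address in the context."""
    if not cart:
        return "Your cart is empty, so there is nothing to ship yet."
    region = "domestic" if country == home_country else "international"
    flat, per_kg = RATES[region]
    weight = sum(item.weight_kg * item.quantity for item in cart)
    cost = flat + per_kg * weight
    return f"Shipping {weight:.1f} kg to {country} costs ${cost:.2f}."
```

With a 2 kg lamp and two 0.5 kg mugs in the cart, a US address gets "Shipping 3.0 kg to US costs $8.00." The question text itself never mentions the cart or the address; both come from the context.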

On many occasions, people just want simple, straight answers to simple questions instead of doing a search. Many websites create FAQ pages to cover those simple questions. Unfortunately, not many people have the patience to read those pages.

Chat bots can solve this problem very well. They can provide simple, straight answers to simple questions. Statistics have shown that most of the questions handled by customer support are very simple questions with straight answers. This is probably why chat bots have gained some popularity recently. PayPal, IKEA and HP all have customer support chat bots on their web pages. www.chatbots.org has a list of over 500 publicly available chatbots.