Conversational Artificial Intelligence is a set of technologies that enable machines to simulate conversation and respond with the most appropriate answers. Gartner estimated that by 2020, 25% of all customer service and support channels would be augmented with interactive virtual assistants. The major difference between a decent chatbot and a great one is a feedback control mechanism. Without a feedback loop, you limit the capability and intelligence you can gather from your users. Feedback loops improve the experience for users in the following ways:
- Understanding users: Conversations are essentially open-ended, so feedback helps us understand what users actually want. By analyzing and incorporating this feedback, the chatbot makes better decisions when answering queries.
- Continuous learning: Even prominent chatbots see their response quality degrade over time, so it is important to continuously improve their performance through feedback.
Artificial Intelligence (AI) and Machine Learning (ML) have a huge impact on many aspects of society. With these technologies, we can solve problems that humans can't. But however powerful AI is, its foundation is data, and data remains one of the most under-utilized assets. While techniques like predictive modeling have been around for a long time, some of the most promising AI work of the last few decades comes from deep learning. Deep learning is a subfield of machine learning built on Artificial Neural Networks (ANNs), which are inspired by the structure of the brain. Although neural networks were invented long ago, they became widely used only recently, thanks to the growth in data and computing power. Some of you might be familiar with supervised and unsupervised learning. As the name suggests, supervised learning is learning under supervision; more specifically, learning from labeled data. A label is simply the answer the algorithm should come up with on its own, given enough training examples. Deep learning algorithms generally use a supervised approach and therefore require a lot of labeled training data. High-quality labeled data is scarce, and labeling all the data is infeasible in terms of both time and money. This is the new bottleneck for deep learning. Next, we will talk about active learning, which aims to alleviate this problem.
Active learning: Active learning systems try to overcome the labeling bottleneck by asking an oracle (a human annotator) to label a subset of the unlabeled data. This way, the model aims to achieve high accuracy using as little labeled data as possible. For example, consider a large pool of query–answer pairs with no feedback (label). As stated earlier, it is not feasible to manually provide feedback for every query. Instead, a few query–answer pairs are selected from the pool, based on criteria discussed later, and presented to the oracle for labeling. In one simple scheme, the label is binary: ‘1’ if the answer is satisfactory for the query and ‘0’ if it is not good enough.
The major challenge is identifying the subset of data that must be labeled. Can we just select data points at random? Honestly speaking, labeling a random sample is not as bad as you might think; it is still better than having no labeled data at all. However, there is a lot of information available that can help us select these data points more wisely. Consider a model that looks at images and classifies each one as either a dog or a cat. When the model is already highly confident that an image shows a cat, there is little point in asking the oracle to classify it for us. When the model is not confident enough, the oracle's label helps it learn about similar data points. The following criteria can help in selecting data points from the pool:
- Least Confidence: The model selects the data points for which it has the least confidence in its most likely label. For example, suppose our model predicts the label of one data point with 90% confidence and of another with only 50% confidence. The second data point should be presented to the oracle for labeling; once trained on that label, the model will be able to predict labels for similar points with higher confidence.
- Margin Sampling: The least-confidence strategy works well with two labels. With multiple labels, however, it is important to look at the difference between the confidences of the top two labels. For example, suppose data points A and B have the following confidence distributions over the labels cat, dog, and none: A – 40%, 30%, 30% and B – 49%, 48%, 3%. The least-confidence strategy would select A for the oracle to label. But look closely and you will see that the model is more confused between cat and dog for point B than for point A, and that is exactly what our model needs to learn. Margin sampling instead computes the difference between the first and second most confident labels, which here is 10% for A and 1% for B. The point with the smallest margin is selected, which is B in this example.
- Entropy Sampling: Entropy quantifies the uncertainty in the model's predicted label distribution. The most informative points have the highest entropy, so entropy is computed for each data point and the points with the maximum entropy value are selected for labeling.
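As a concrete illustration, here is a short Python sketch of the three scoring strategies, applied to the confidence distributions for points A and B from the margin-sampling example (the function names and layout are illustrative, not from any particular library):

```python
import math

# Three query-selection scores for pool-based active learning.
# Each takes the model's predicted class probabilities for one data point.

def least_confidence(probs):
    # Higher score = less confident top label = more informative
    return 1.0 - max(probs)

def margin(probs):
    # Smaller gap between the two most likely labels = more informative
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

def entropy(probs):
    # Higher entropy = more overall uncertainty = more informative
    return -sum(p * math.log(p) for p in probs if p > 0)

# The two points from the example: confidences for cat, dog, none
A = [0.40, 0.30, 0.30]
B = [0.49, 0.48, 0.03]

# Least confidence prefers A (top label at 40% vs 49%) ...
assert least_confidence(A) > least_confidence(B)
# ... but margin sampling prefers B (a gap of ~1% vs ~10%),
# which is where the model is most confused between cat and dog.
assert margin(B) < margin(A)
```

Note that the strategies can disagree: on these two points, entropy, like least confidence, ranks A above B, while margin sampling singles out B.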
The same method can be applied to the feedback system in a chatbot: the model selects a subset of data from the pool and sends it to the oracle to get the correct feedback. The model then learns from the feedback received and improves. In the next cycle, a new subset is selected from the unlabeled pool. This completes the active learning loop.
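The loop just described can be sketched in a few lines of Python. Everything here is a toy stand-in: a one-dimensional "model", a scripted oracle, and made-up numbers, intended only to show the shape of the select → label → retrain cycle, not a real chatbot:

```python
import math

class NearestMeanModel:
    """Toy classifier: softmax over distances to per-class means (illustrative only)."""
    def fit(self, labeled):
        groups = {}
        for x, y in labeled:
            groups.setdefault(y, []).append(x)
        self.means = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    def predict_proba(self, x):
        scores = {y: math.exp(-abs(x - m)) for y, m in self.means.items()}
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()}

def active_learning_loop(model, labeled, pool, oracle, rounds=3, batch=2):
    for _ in range(rounds):
        if not pool:
            break
        # Rank the unlabeled pool by the model's confidence in its top label,
        # and query the oracle on the least-confident batch.
        pool.sort(key=lambda x: max(model.predict_proba(x).values()))
        queries, pool = pool[:batch], pool[batch:]
        labeled += [(x, oracle(x)) for x in queries]
        model.fit(labeled)  # retrain on the enlarged labeled set
    return model, labeled

oracle = lambda x: int(x > 5)    # stand-in for the human annotator
model = NearestMeanModel()
labeled = [(1.0, 0), (9.0, 1)]   # tiny seed set
model.fit(labeled)
model, labeled = active_learning_loop(model, labeled, [4.0, 4.9, 5.1, 8.0], oracle)
```

In the first round the loop queries 4.9 and 5.1, the points nearest the decision boundary where the model is least confident, exactly the behavior the strategies above are designed to produce.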
When there is not enough labeled data to train a model from feedback, reinforcement learning can be used. Reinforcement learning is a goal-oriented learning approach, much like humans learning from their mistakes. There is no separate training phase; instead, it is a trial-and-error approach. A feedback signal tells the chatbot whether its actions or responses are right or wrong, and this feedback is then incorporated to train the model.
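As a minimal, hedged illustration of learning from trial and error, here is an epsilon-greedy bandit sketch in Python. The candidate responses and the feedback function are made up, and a real chatbot would use a much richer reinforcement learning setup; the point is only how +1/−1 feedback steers future choices:

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

# The "chatbot" chooses among canned candidate responses and keeps a
# running estimate of the feedback each one earns. (All names are made up.)
responses = ["answer_a", "answer_b", "answer_c"]
value = {r: 0.0 for r in responses}   # running mean of feedback per response
count = {r: 0 for r in responses}

def feedback(response):
    # Stand-in for real user feedback: pretend users prefer answer_b
    return 1.0 if response == "answer_b" else -1.0

for step in range(500):
    # Explore occasionally; otherwise exploit the best-known response
    if random.random() < 0.1:
        choice = random.choice(responses)
    else:
        choice = max(responses, key=value.get)
    reward = feedback(choice)
    count[choice] += 1
    value[choice] += (reward - value[choice]) / count[choice]  # incremental mean

best = max(responses, key=value.get)
```

After a few hundred interactions the value estimates converge and the chatbot settles on the response users reward, without ever seeing a labeled training set.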
Business Value and Conclusion
The demand for smart interactive virtual assistants is growing rapidly. They are used not only in customer service and support but also as personal assistants. Who wouldn't want an assistant that genuinely mimics a human? A great chatbot can significantly improve the customer experience, and a good customer experience not only lowers customer churn but also turns customers into advocates for the brand. That personal touch comes only when we allow our models to learn from us and our environment. Both active learning and reinforcement learning have the capability to take Conversational AI to the next level. We primarily covered active learning in this post; stay tuned for content on reinforcement learning!