1 Introduction
Topic terms | Search keywords |
---|---|
Chatbot | chatbot* OR chatterbot* OR “dialog* system*” OR “conversation* system*” OR “conversation* agent*” OR “intelligent agent*” OR “virtual agent*” OR “intelligent assistant*” OR “virtual assistant*” |
Business | business OR commerce OR customer OR consumer OR industry OR trade OR transaction |
Deep learning | “deep learning” OR “neural network*” |
2 Background
3 Mainstream deep learning methods for business chatbot development
Technology | Uniqueness | Classic Literature |
---|---|---|
Fully connected feed-forward neural network | Each neuron in one layer connects to every neuron in the adjacent layer; information flows only in the forward direction, without cycles or loops | |
Recurrent neural network | Directed cycles act as model memory, allowing temporal sequences as input; it is specialized in textual data processing to learn the semantic relationships between words | |
Convolutional neural network | It replaces general matrix multiplication with the convolution operation in at least one network layer; the sequential operations in the convolutional and pooling layers enable the network to cope with two- and three-dimensional data | |
Capsule neural network | It utilizes a vector as the model output to represent the spatial information and probability of a detected pattern; it overcomes the shortcoming of max-pooling operations, which cause valuable spatial features to be lost | |
Graph neural network | It has a “graph-in, graph-out” architecture and transforms the embeddings of nodes and edges without changing the connectivity of the input graph | |
Generative adversarial network | It consists of two sub-models: a generative model that approximates the data distribution and a discriminative model that estimates the probability that a sample came from the real data rather than the generative model | |
Deep reinforcement learning | It combines reinforcement learning with deep learning to optimize the objective function and make better decisions in sequential decision problems | |
Transformer | It fully exploits the self-attention mechanism in an encoder-decoder structure, allowing an entire sentence to be read simultaneously | |
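As a concrete illustration of the self-attention mechanism named in the Transformer row, the following is a minimal single-head scaled dot-product attention sketch in NumPy. The matrices, dimensions, and random data are invented for illustration; this is not code from any surveyed system.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence.

    X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_k)
    projection matrices. Every token attends to every other token, which is
    why the whole sentence is read simultaneously rather than step by step
    as in an RNN.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) similarities
    weights = softmax(scores)                # each row sums to 1
    return weights @ V, weights

# Toy example: 5 tokens, 8-dim embeddings, 4-dim attention head.
rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4): one contextualized vector per token
```

Because the score matrix covers all token pairs at once, the computation parallelizes across the sequence — the "strong parallelism" and "global receptive field" noted for the Transformer later in this survey.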
3.1 Artificial neural network
3.2 Recurrent neural network
3.3 Convolutional neural network
3.4 Capsule neural network
3.5 Graph neural network
3.6 Generative adversarial network
3.7 Deep reinforcement learning
3.8 Transformer
4 Summary of deep learning applications in business chatbots
Usage technique | Natural language pre-processing | NLU: intent recognition and slot filling | NLU: topic and question identification | NLG: response scoring model | NLG: response selection model | NLG: response generation | External knowledge enhancement |
---|---|---|---|---|---|---|---|
Standard DNN | – | 2018: Oh et al. (2018); 2019: Paul et al. (2019) | 2021: Canas et al. (2021) | – | – | – | – |
RNN | 2017: Bartl and Spanakis (2017) | 2018: Liao et al. (2018) | 2019: Zhao et al. (2019) | 2018: Yang et al. (2018) | 2020: Damani et al. (2020); 2021: Li et al. (2021) | 2018: Singh et al. (2018); 2020: Franco et al. (2020) | 2018: Specialized knowledge detection (Liao et al. 2018); 2019: Specialized knowledge detection (Olabiyi et al. 2019) |
CNN | 2017: Li et al. (2017) | 2019: Kulkarni et al. (2019); 2021: He and Tang (2021) | – | 2019: Paul et al. (2019) | 2018: Qiu et al. (2018); 2020: Song et al. (2020); 2021: Li et al. (2021) | – | – |
CapsNet | – | 2021: Tiwari et al. (2021) | – | – | – | – | – |
GNN | – | – | – | – | – | – | 2021: Knowledge graph (Lin et al. 2021a) |
GAN | – | – | – | – | – | 2019: Olabiyi et al. (2019); 2020: Ren et al. (2020) | – |
DRL | – | – | – | – | 2017: Williams et al. (2017); 2019: Hatua et al. (2019); 2020: Zhao et al. (2020); 2021: Zhang et al. (2021) | 2017: Kandasamy et al. (2017); 2018: Liao et al. (2018); 2020: Ren et al. (2020) | – |
Transformer | – | – | – | 2020: Tahami et al. (2020) | 2021: Li et al. (2021) | 2020: Shalyminov et al. (2020) | 2021: Service mode classification (Lin et al. 2021b) |
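The "intent recognition and slot filling" column above covers models that map a user utterance to a predefined intent label. As a minimal, self-contained illustration of the intent-classification half of that task — not the method of any cited paper — a bag-of-words classifier with a single softmax layer (the simplest feed-forward model in the tables) can be sketched as follows; the utterances and intent labels are invented toy data:

```python
import numpy as np

# Toy training data: utterance -> intent label. Purely illustrative;
# the surveyed systems train RNN or Transformer encoders on large corpora.
data = [
    ("what is my account balance", "balance"),
    ("show balance please", "balance"),
    ("i want to book a ticket", "booking"),
    ("book two tickets for tonight", "booking"),
    ("cancel my order", "cancel"),
    ("please cancel the booking", "cancel"),
]
vocab = sorted({w for text, _ in data for w in text.split()})
intents = sorted({label for _, label in data})

def bow(text):
    # Bag-of-words feature vector: word counts over the toy vocabulary.
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab.index(w)] += 1
    return v

X = np.stack([bow(t) for t, _ in data])
Y = np.eye(len(intents))[[intents.index(l) for _, l in data]]

# One softmax layer trained by gradient descent (multinomial logistic
# regression) -- a deliberately minimal stand-in for a "standard DNN".
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(len(vocab), len(intents)))
for _ in range(500):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * X.T @ (p - Y) / len(X)  # cross-entropy gradient step

def predict(text):
    return intents[int(np.argmax(bow(text) @ W))]

# A training utterance; the fitted model should recover its label.
print(predict("what is my account balance"))
```

Slot filling, the other half of the task, additionally labels each token (e.g. "tonight" as a date slot) and therefore needs a sequence model rather than a bag-of-words classifier — which is why the RNN and Transformer rows dominate that column.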
Technique | NLP | NLU: intent & slot (①) | NLU: topic & question (②) | NLG: scoring (③) | NLG: selection (④) | NLG: generation (⑤) | EKE | Highlight | Limitation |
---|---|---|---|---|---|---|---|---|---|
Standard DNN | ✓ | ✓ | ✓ | ✓ | | | | Good at processing static data; capable of handling large-scale network models and integrating all information to fit various data types and tasks | Poor performance on sequential and spatial data; prone to overfitting; difficulty extracting more abstract features |
RNN | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Good at processing sequential data and extracting temporal features | High computational complexity from modeling dependencies between time steps; low training efficiency; prone to vanishing and exploding gradients |
CNN | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Good at extracting spatial features; strong parallelism; high training efficiency | Constraints on the length and width of input data; loses position information when processing sequences; difficulty handling long-range dependencies |
CapsNet | | ✓ | | | | | | Good at handling hierarchical and spatial relationships via capsules that represent a feature as a vector | High computational complexity from the capsule structure and dynamic routing algorithm; few practical attempts to examine its performance |
GNN | | | | | | | ✓ | Good at processing graph data; captures structural information between nodes and generates informative features | High computational complexity for large graphs |
GAN | | | | | | ✓ | | Aimed at generating realistic samples in semi-supervised learning | Hard to train stably; difficulty handling discrete textual data |
DRL | | | | | ✓ | ✓ | | Good at sequential decision problems, optimizing long-term reward through a trial-and-error mechanism | High computational complexity, requiring large amounts of data, time, and computing resources; low training and sampling efficiency due to frequent trials and feedback |
Transformer | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | Good at extracting context-aware feature representations and adapting to input sequences of different lengths; strong parallelism; global receptive field | Limited local-information capture due to its position encoding method |
Paper | NLP | NLU: intent & slot (①) | NLU: topic & question (②) | NLG: scoring (③) | NLG: selection (④) | NLG: generation (⑤) | EKE | Artifact design |
---|---|---|---|---|---|---|---|---|
Bartl and Spanakis (2017) | RNN | RNN | A retrieval-based dialogue system utilizing utterance and context embeddings for customer services | |||||
Kandasamy et al. (2017) | DRL | A study of chatbots with RNN and DRL architectures when the rewards are noisy and expensive to obtain in the context of restaurant recommendations | ||||||
Li et al. (2017) | CNN | RNN | A chatbot designed for creating an innovative online shopping experience in e-commerce | |||||
Pradana et al. (2017) | RNN | A chatbot developed to improve the interactivity and effectiveness of a corporate website ||||||
Williams et al. (2017) | DRL | A hybrid code network optimized with supervised learning or reinforcement learning to reduce the amount of training data in a customer-facing dialog system | ||||||
Xu et al. (2017) | RNN | A chatbot for customer services showing empathy to help users on social media cope with emotional situations | ||||||
Aalipour et al. (2018) | RNN | A Bi-RNN architecture customized to fit the domain-specific nature of enterprise customer support | ||||||
Liao et al. (2018) | RNN | RNN, DRL | RNN | A multimodal fashion chatbot optimized with DRL to capture fine-grained semantics and generate responses | ||||
Ma et al. (2018) | RNN | A Tensor Encoder Generative Model that pools data from many shops in customer service dialogue systems to alleviate data insufficiency ||||||
Moirangthem et al. (2018) | RNN, CNN | A classification model to discriminate user utterances between task-oriented and chit-chat conversations | ||||||
Oh et al. (2018) | Standard DNN | An out-of-domain detection method based on sentence distance for banking dialogue systems | ||||||
Qiu et al. (2018) | CNN | A multi-turn conversation model based on CNN for context-aware question matching in e-commerce ||||||
Quan et al. (2018) | RNN, CNN | A real estate chatbot with daily updated data of real estate information in Hanoi and Ho Chi Minh cities | ||||||
Singh et al. (2018) | RNN | A chatbot using TensorFlow for small industries or business | ||||||
Yang et al. (2018) | RNN | A learning framework that leverages external knowledge for response ranking in the context of technical support | ||||||
Aleedy et al. (2019) | Standard DNN | RNN, CNN | A chatbot predicting a suitable and automatic response to customers’ queries | |||||
Chen et al. (2019) | CNN | RNN, CNN | A review-driven framework of answer generation for product-related questions in e-commerce | |||||
Hardalov et al. (2019) | Standard DNN | Standard DNN | A deep neural architecture from the domain of machine reading comprehension to re-rank the suggested answers from different models using the question as a context | |||||
Hatua et al. (2019) | DRL | A goal-oriented chatbot using transfer learning and attention mechanism for movie ticket booking | ||||||
He et al. (2019) | CNN | A model based on recurrent pointer networks aligning question and answer utterances in customer services | ||||||
Kang and Lee (2019) | RNN | A context-aware dialog generation system through external memory for chit-chat conversations | ||||||
Kulkarni et al. (2019) | RNN | CNN | Standard DNN | A question–answer matching framework to answer both factoid and non-factoid user questions on product pages | ||||
Olabiyi et al. (2019) | RNN, GAN | RNN | A persona-based multi-turn conversation model in an adversarial learning framework | |||||
Paul et al. (2019) | Standard DNN | CNN | A focused domain contextual chatbot framework for customer services in resource-poor languages | |||||
Prajwal et al. (2019) | RNN | A universal semantic web assistant based on a sequence-to-sequence model | ||||||
Prasomphan (2019a) | Standard DNN | RNN, CNN | A retrieval-based method for chatbot improvement in trading systems for SMEs | |||||
Prasomphan (2019b) | RNN, CNN | RNN, CNN | A prototype combining retrieval and generative methods in trading systems for SMEs | |||||
Sheikh et al. (2019) | RNN | A generative model for human resources ||||||
Xue et al. (2019) | Standard DNN | RNN | An agent-assist chatbot boosting the effectiveness of customer support agents | |||||
Zhao et al. (2019) | RNN | RNN | RNN, CNN | A chatbot providing instructional answers from a knowledge base in mobile customer services | ||||
Bhathiya and Thayasivam (2020) | RNN | A meta-learning method for few-shot joint intent detection and slot-filling | ||||||
Damani et al. (2020) | Transformer | RNN | Optimized Transformer models for FAQ answering | |||||
Haihong et al. (2020) | RNN | RNN | A multi-domain chatbot delivering or requesting information according to specific user requests | |||||
Franco et al. (2020) | RNN | A business-driven chatbot for cybersecurity planning and management | ||||||
Jiao (2020) | Standard DNN | A financial chatbot based on entity extraction using RASA NLU and neural network | ||||||
Kushwaha and Kar (2020) | RNN | RNN | A language model-driven chatbot for businesses to address marketing and selection of products |||||
Liu et al. (2020) | CNN | Gated attentive CNN dialogue state tracker utilizing the gated attentive convolutional encoder and introducing historical information | ||||||
Nuruzzaman and Hussain (2020) | RNN | A chatbot for the insurance industry that uses multiple strategies to generate a response | ||||||
Ren et al. (2020) | RNN, CNN | RNN, GAN, DRL | An end-to-end system to tackle the task of conversational recommendation with an adversarial reinforcement learning approach to refine the quality of generated system actions adaptively | |||||
Shalyminov et al. (2020) | Transformer | Transformer | A hybrid generative-retrieval model able to perform both response generation and ranking | |||||
Song et al. (2020) | CNN | A triple CNN model for retrieval-based question answering system in e-commerce | ||||||
Tahami et al. (2020) | Transformer | A cross-encoder architecture that transfers knowledge from one model to a bi-encoder model using distillation | ||||||
Yu et al. (2020) | RNN, CNN | A joint model based on intent information enhancement for multi-domain language understanding | ||||||
Zhao et al. (2020) | DRL | A Dynamic Reward-based Dueling Deep Dyna-Q model that can learn policies robustly under noise ||||||
Brahma et al. (2021) | Standard DNN | A named entity recognition approach to identify the food quality descriptors from a given message | ||||||
Canas et al. (2021) | Standard DNN | A data-driven dialog management technique providing flexibility to develop, deploy, and maintain the dialog module in commercial platforms | ||||||
Chang and Hsing (2021) | Transformer | RNN | RNN, CNN | An emotion-infused deep neural network for emotionally resonant conversation | ||||
Ferrod et al. (2021) | Standard DNN | RNN | A classification model to identify users’ domain expertise from dialogues in Telco commerce | |||||
He and Tang (2021) | CNN | A method of context representation learning on sequential data for dialogue state tracking | ||||||
Kushwaha and Kar (2021) | RNN | A language model-driven chatbot for interactive marketing in the post-modern world | ||||||
Li et al. (2021) | Transformer | RNN, CNN, Transformer | A deep context modeling method for multi-turn response selection in dialogue systems | |||||
Lin et al. (2021a) | GNN | A personalized entity resolution method with dynamic heterogeneous knowledge graph representations | ||||||
Lin et al. (2021b) | RNN | RNN, CNN, Transformer | A predictive approach for Wait-or-answer tasks in e-commerce dialogue systems | |||||
Lothritz et al. (2021) | Transformer | A comparative study exploring multilingual and multiple monolingual models for intent classification and slot filling in banking | ||||||
Majid and Santoso (2021) | RNN | RNN | A conversation sentiment and intent categorization method using context RNN for emotion recognition | |||||
Tiwari et al. (2021) | CapsNet, Transformer | RNN | A dynamic goal-adapted task-oriented dialogue agent in mobile selling-buying scenarios | |||||
Wu et al. (2021) | RNN | A joint model of intent classification and slot filling for online customer services | ||||||
Yang et al. (2021a) | Transformer | An intelligent cloud customer service system based on tag recommendation | ||||||
Yang et al. (2021b) | Transformer | RNN | A model predicting users’ abandonment of a task-oriented chatbot service using explainable deep learning | |||||
Yu et al. (2021) | Transformer | A financial service chatbot based on deep bidirectional Transformers | ||||||
Zhang et al. (2021) | DRL | RNN | An Emotion-Sensitive Deep Dyna-Q model for task-completion dialogue policy learning |
4.1 Pre-processing of natural languages
4.2 Natural language understanding
4.2.1 Intent recognition and slot filling
4.2.2 Topic domain and question type classification
4.3 Natural language generation
4.3.1 Response scoring model
4.3.2 Response selection model
4.3.3 Response generation
4.4 External knowledge enhancement
5 Critical analysis of the characteristics of business dialogue systems
Characteristic | Pipeline + retrieval method | Pipeline + generative method | End-to-end + retrieval method | End-to-end + generative method |
---|---|---|---|---|
Response source | Preset corpus (template, dataset, database) | Generative model trained with corpora | Preset corpus (template, dataset, database) | Generative model trained with corpora |
Pre-processing | Statistical learning methods, Neural network embedding | Neural network embedding | Neural network embedding | Neural network embedding |
NLU extracts | Intent, slot, topic domain, question type | Intent, slot | – | – |
NLG | Scoring model, response classification model | Response generation | Scoring model, response classification model | Response generation |
Context-sensitive enhancement | Context embedding, sentiment, personality, topic-related knowledge | Context embedding, emotion, topic-related knowledge | Context embedding | Context embedding |
Extensibility of additional functionality | High for framework-based; medium for self-building | Medium | Low | Low |
Data volume demand | Small, medium, and large scale | Medium and large scale | Small and medium scale | Medium and large scale |
Dialogue scalability | Limited: added responses must be compatible with other components | Scalable: the NLU component can be slightly adjusted to match added responses | Scalable for scoring models; limited for response classification models | Scalable: responses can be added to the training corpus |
Responding stability | High | Low | Medium | Low |
Difficulty of development | Low for framework-based; medium for self-building | Medium | Low | Low |
Corpus update and maintenance | Components need to be coordinated | The generative model needs to be retrained | The whole model needs to be retrained and adjusted | The whole model needs to be retrained |
Manual intervention level | High for framework-based; medium for self-building | Medium | Low for scoring models; medium for response classification models | Low |
Expected practicability | High | Low | Medium | Low |
Advantage/potential | Each component can be independently adjusted and evaluated | New responses can be generated | Simple to build without a large dataset | New responses can be generated |
Disadvantage/limitation | Error propagation, Manual intervention | Difficult to generate exact responses | Difficult to deal with multi-topic demands | Very difficult to generate exact responses |
Application scenario | E-commerce, Social media, Banking, Technical support, etc. | E-commerce | E-commerce, Technical support | E-commerce, Social media, Technical support, Chitchat, etc. |
Research gap | Unexplored application scenarios and commercial values; unclear user behaviors and psychological activities; unarticulated applicability and usability of diverse architectures; a unified chatbot ecosystem |
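The retrieval columns above all rest on the same pattern: score the user query against a preset corpus and return the response paired with the best match. A deliberately simple TF-IDF/cosine sketch of that scoring step follows; the corpus is invented toy data, and the surveyed systems replace the TF-IDF vectors with learned neural encodings (e.g. CNN or Transformer encoders), not this bag-of-words scheme:

```python
import math
from collections import Counter

# Toy preset corpus of (trigger, response) pairs -- the "response source"
# of the retrieval columns. All pairs here are invented for illustration.
corpus = [
    ("how do i reset my password", "You can reset it from Settings > Account."),
    ("what are your opening hours", "We are open 9am-6pm, Monday to Friday."),
    ("where is my order", "Please share your order number and I will check."),
]

docs = [q.split() for q, _ in corpus]
df = Counter(w for d in docs for w in set(d))  # document frequency per term
N = len(docs)

def tfidf(tokens):
    # Sparse TF-IDF vector as a dict; idf smoothed so unseen words stay finite.
    tf = Counter(tokens)
    return {w: tf[w] * math.log((N + 1) / (df.get(w, 0) + 1)) for w in tf}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc_vecs = [tfidf(d) for d in docs]

def respond(query):
    # Score every preset trigger against the query and return the response
    # paired with the best match -- the scoring-model retrieval step.
    qv = tfidf(query.lower().split())
    scores = [cosine(qv, dv) for dv in doc_vecs]
    return corpus[scores.index(max(scores))][1]

print(respond("how can i reset my password"))
```

The table's trade-offs fall out of this structure directly: responding stability is high because every answer comes from the vetted corpus, while dialogue scalability is limited because each added pair must be scored consistently against the existing ones.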
5.1 Chatbot in pipeline architecture
5.1.1 Pipeline architecture with retrieval methods
5.1.2 Pipeline architecture with generative methods
5.2 Chatbot in end-to-end architecture
5.2.1 End-to-end architecture with retrieval methods
5.2.2 End-to-end architecture with generative methods
6 Future research directions
Research direction | Possible research topic |
---|---|
New scenarios and emerging technologies | Applying chatbots to new areas, such as individualized customer care, live streaming, online collaboration, short video marketing, and farmers’ market digitization; integrating emerging technologies into chatbot design, such as burgeoning deep learning branches and efficient quantum computing |
Human–computer interaction and usability analysis | Assessing the effects of adopting the technology on human experience and activities; examining practical usability to improve chatbot design |
Meta-theory and design principles | A framework or guideline for designing a practical chatbot systematically and reasoning about its value; meta-theories, evaluation indicators, or design principles to guide and regulate artifact construction |