Deep Reinforcement Learning for Dialogue Generation
AI-generated Key Points
- Recent neural models of dialogue generation are shortsighted: they predict utterances one at a time and ignore each utterance's influence on future outcomes
- Traditional NLP models of dialogue drew on reinforcement learning to model the future direction of a conversation
- The authors propose a deep reinforcement learning approach to model future reward in chatbot dialogue
- The proposed RL model rewards sequences with three important conversational properties: informativity, coherence, and ease of answering
- Evaluation is done using human judgments and automatic metrics such as conversation length and diversity
- The RL model with dialogue simulation achieves the best evaluation score in terms of sustained conversations between virtual agents
- The algorithm generates more interactive responses and fosters longer conversations compared to other models
- Diversity is assessed by calculating the number of distinct unigrams and bigrams in generated responses
- The authors describe this work as a first step towards learning a neural conversational model based on the long-term success of dialogues
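The diversity measure mentioned in the key points can be illustrated with a short sketch. It counts distinct unigrams and bigrams across a set of generated responses. Two details are assumptions not stated on this page: tokens are split on whitespace, and the count is divided by the total number of generated tokens so that longer outputs are not favored. The function name `distinct_n` and the sample responses are hypothetical.

```python
from typing import Sequence


def distinct_n(responses: Sequence[str], n: int) -> float:
    """Return the number of distinct n-grams divided by the total token count.

    Assumptions (not from the source page): whitespace tokenization,
    lowercasing, and normalization by total generated tokens.
    """
    ngrams = set()
    total_tokens = 0
    for response in responses:
        tokens = response.lower().split()
        total_tokens += len(tokens)
        # Slide a window of size n over the tokens of this response only,
        # so n-grams never span two different responses.
        for i in range(len(tokens) - n + 1):
            ngrams.add(tuple(tokens[i:i + n]))
    return len(ngrams) / total_tokens if total_tokens else 0.0


# Hypothetical generated responses, for illustration only.
responses = [
    "i don't know",
    "i don't know what you mean",
    "where are you going",
]
distinct_1 = distinct_n(responses, 1)  # distinct unigrams / total tokens
distinct_2 = distinct_n(responses, 2)  # distinct bigrams / total tokens
```

A model that keeps repeating a generic reply such as "i don't know" adds tokens without adding new n-grams, so both ratios fall.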
Authors: Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky
Abstract: Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity, length as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
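To make the abstract's training idea concrete, here is a minimal sketch of a policy gradient (REINFORCE-style) update driven by a reward that combines the three conversational properties. It is a toy under stated assumptions, not the authors' implementation: the policy is a categorical distribution over three candidate responses rather than a neural sequence model, the per-property scores are supplied as plain numbers, and the weights, learning rate, and function names (`combined_reward`, `reinforce_step`) are all hypothetical.

```python
import math
from typing import Dict, List


def combined_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-property scores (weights are placeholder values)."""
    return sum(weights[name] * scores[name] for name in weights)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(
    logits: List[float],
    action: int,
    reward: float,
    baseline: float = 0.0,
    lr: float = 0.1,
) -> List[float]:
    """One REINFORCE update for a categorical policy parameterized by logits.

    The gradient of log pi(action) with respect to the logits is
    one_hot(action) - softmax(logits); it is scaled by (reward - baseline).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


# Hypothetical scores for one sampled response, for illustration only.
scores = {"informativity": 0.8, "coherence": 0.6, "ease_of_answering": 0.4}
weights = {"informativity": 0.25, "coherence": 0.5, "ease_of_answering": 0.25}
reward = combined_reward(scores, weights)

logits = [0.0, 0.0, 0.0]  # uniform policy over three candidate responses
new_logits = reinforce_step(logits, action=1, reward=reward)
new_probs = softmax(new_logits)
```

With a positive reward, the update raises the probability of the sampled response and lowers the others. In the setting the abstract describes, the reward would instead be accumulated over turns of a simulated dialogue between two agents.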