End-to-End Reinforcement Learning of Dialogue Agents for Information Access

Source: Internet
Author: User
Tags: generator, knowledge base

This paper proposes KB-InfoBot, a dialogue agent that provides users with an entity from a knowledge base (KB) by interactively asking for its attributes. All components of KB-InfoBot are trained in an end-to-end fashion using reinforcement learning. Goal-oriented dialogue systems typically need to interact with an external database to access real-world knowledge (e.g., movies playing in a city). Previous systems achieved this by issuing a symbolic query to the database and adding the retrieved results to the dialogue state. However, such symbolic operations break the differentiability of the system and prevent end-to-end training of neural dialogue agents. In this paper, we address this limitation by replacing symbolic queries with an induced "soft" posterior distribution over the KB that indicates which entities the user is interested in. We also provide a modified version of the episodic REINFORCE algorithm, which allows KB-InfoBot to explore and learn both the policy for selecting dialogue acts and the posterior over the KB for retrieving the correct entities. Experimental results show that the end-to-end trained KB-InfoBot outperforms competitive rule-based baselines, as well as agents which are not end-to-end trainable.

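To make the abstract's mention of episodic REINFORCE concrete, here is a minimal sketch of the standard (unmodified) algorithm, on a toy stateless two-action bandit with a softmax policy. Everything here (the environment, learning rate, and action space) is a hypothetical illustration, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy stateless policy over two actions; theta holds the logits.
theta = np.zeros(2)
lr = 0.1

for episode in range(200):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    # Toy environment: action 1 always yields reward 1, action 0 yields 0.
    reward = float(action == 1)
    # REINFORCE: grad of log pi(a) is one_hot(a) - probs for a softmax policy;
    # scale it by the episode return and take a gradient ascent step.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += lr * reward * grad_log_pi

final_probs = softmax(theta)  # policy should now strongly prefer action 1
```

The paper's modification extends this idea so that the gradient also flows into the posterior over KB entities, but the core update, log-probability of sampled actions weighted by return, is the one sketched above.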

Goal-oriented dialogue systems help users complete specific tasks, such as booking a flight or searching a database, by interacting with them through natural language. In this work, we present KB-InfoBot, a dialogue agent that identifies entities of interest to the user from a knowledge base (KB) by interactively asking for attributes of that entity, which helps constrain the search. Such an agent finds application in interactive search settings. Figure 1 shows an example dialogue between a user searching for a movie and the proposed KB-InfoBot.


A typical goal-oriented dialogue system consists of four basic components: a language understanding (LU) module for identifying user intents and extracting associated slots (Yao et al., 2014; Hakkani-Tür et al., 2016; Chen et al., 2016); a dialogue state tracker which tracks the user goal and dialogue history (Henderson et al., 2014; Henderson, 2015); a dialogue policy which selects the next system action based on the current state (Young et al., 2013); and a natural language generator (NLG) for converting dialogue acts into natural language (Wen et al., 2015; Wen et al., 2016a). For successful completion of user goals, it is also necessary to equip the dialogue policy with real-world knowledge from a database. Previous end-to-end systems achieved this by constructing a symbolic query from the current belief states of the agent and retrieving results from the database which match the query (Wen et al., 2016b; Williams and Zweig, 2016; Zhao and Eskenazi, 2016). Unfortunately, such operations make the model non-differentiable, and the various components in a dialogue system are usually trained separately.

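As a rough sketch of how these four components fit together in a single turn, the following toy pipeline wires up an LU module, a state tracker, a policy, and an NLG. The interfaces and keyword rules here are hypothetical simplifications for illustration, not the cited systems:

```python
# Hypothetical, heavily simplified goal-oriented dialogue pipeline.

def language_understanding(utterance):
    """Toy LU: map keywords in the utterance to an intent and slot values."""
    slots = {}
    if "comedy" in utterance:
        slots["genre"] = "comedy"
    intent = "inform" if slots else "other"
    return intent, slots

def update_state(state, intent, slots):
    """Toy state tracker: accumulate slot values across turns."""
    state = dict(state)
    state.update(slots)
    return state

def dialogue_policy(state):
    """Toy policy: request a missing slot, otherwise inform the user."""
    for slot in ("genre", "year"):
        if slot not in state:
            return ("request", slot)
    return ("inform", state)

def generate_nl(action):
    """Toy NLG: turn a dialogue act into a natural-language reply."""
    act, arg = action
    if act == "request":
        return f"Which {arg} are you looking for?"
    return "Here is a matching movie."

# One dialogue turn through all four components.
state = {}
intent, slots = language_understanding("I want a comedy")
state = update_state(state, intent, slots)
reply = generate_nl(dialogue_policy(state))
```

In real systems each of these stages is a trained model; the point of the paper is that when the database lookup between the tracker and the policy is symbolic, gradients cannot flow through it, so the stages must be trained separately.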

In this work, we replace SQL-like queries with a probabilistic framework for inducing a posterior distribution of the user's target over KB entities. We build this distribution from the belief tracker's multinomials over attribute values and binomial probabilities of the user not knowing the value of an attribute. The policy network receives this full distribution as input to select its next action. In addition to making the model end-to-end trainable, this operation also provides a principled framework for propagating the uncertainty inherent in language understanding to the dialogue policy, making the agent robust to LU errors. Our entire model is differentiable, which means that in theory our system can be trained using only a reinforcement signal indicating whether a dialogue is successful or not. However, in practice, we find that with random initialization the agent is unable to see any rewards if the database is large; even if it does, credit assignment is tough. Hence, at the beginning of training, we have an imitation-learning phase (Argall et al., 2009) where both the belief tracker and the policy network are trained to mimic a rule-based agent. Then, on switching to reinforcement learning, the agent is able to improve further and increase its average reward. Such a bootstrapping approach has been shown to be effective when applying reinforcement learning to solve hard problems, especially those with long decision horizons (Silver et al., 2016).
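The idea of combining per-attribute multinomials with a "user doesn't know" probability can be sketched as follows. This is an illustrative simplification under assumed independence across attributes, with a made-up toy KB, not the paper's exact formulation:

```python
import numpy as np

# Toy KB: each entity (a movie) has a value for each attribute (slot).
kb = [
    {"genre": "comedy", "year": "2015"},
    {"genre": "comedy", "year": "2017"},
    {"genre": "drama",  "year": "2015"},
]

def soft_posterior(kb, beliefs, p_unknown):
    """Soft posterior over KB entities from per-slot belief distributions.

    beliefs[slot] maps value -> probability (the tracker's multinomial);
    p_unknown[slot] is the probability the user does not know that slot,
    in which case every entity is considered equally compatible.
    """
    scores = np.ones(len(kb))
    for slot, dist in beliefs.items():
        for i, entity in enumerate(kb):
            # Mixture: either the user doesn't know the slot (uniform
            # compatibility) or knows it (mass on the entity's value).
            p_match = dist.get(entity[slot], 0.0)
            scores[i] *= p_unknown[slot] + (1 - p_unknown[slot]) * p_match
    return scores / scores.sum()

beliefs = {
    "genre": {"comedy": 0.8, "drama": 0.2},
    "year":  {"2015": 0.6, "2017": 0.4},
}
p_unknown = {"genre": 0.1, "year": 0.5}
post = soft_posterior(kb, beliefs, p_unknown)
# post is a normalized distribution; the 2015 comedy gets the most mass.
```

Because each entry of `post` is a differentiable function of the tracker's belief probabilities, gradients from the final reward can flow back through this "soft lookup" into the belief tracker, which is exactly what a symbolic database query prevents.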


