Term alignment
TaskBot engine: The core processing object is the "skill". We define a skill as a structured (query + content), vertical, scenario-based task, such as real-time information queries, tools, and device control.
QABot engine: Includes the KG-QA engine, the QAPair engine, and the DeepQA engine. KG-QA provides precise, encyclopedic question answering built on the web-wide knowledge graph; the QAPair engine is built on the production and consumption of question-answer pairs; the DeepQA engine is a multi-level system built on URL indexing, classification and clustering, focus words, and summarization.
ChatBot engine: Includes chat engines based on retrieval and on generation.
Content system
Web search and intelligent dialogue are different ways of delivering information services, and they share the same lineage in data, algorithms, and architecture. As a result, Google and other search engine companies have been able to quickly launch AI platforms and products for information services, aimed at both businesses (To B) and consumers (To C).
Industry skill library
The first stage: the team spent half a year upgrading vertical search for 100+ vertical industries, ranging from large ones (entertainment, travel, news), through medium ones (cars, sports, tourism), to small ones (stocks, translation, ancient poetry).
The second stage: further structural upgrade of the skills, with fine-grained query structuring and multi-turn dialogue construction, and output to the Tmall Genie speaker.
Web-wide knowledge graph
Alibaba's only web-wide knowledge graph, with product outputs such as knowledge cards, entity recommendation, and precise question answering.
Question and answer library
Community Q&A library: A question-and-answer library built from UGC Q&A communities, on the order of 1B documents.
UPGC production: A campus-based production system established by Shenma under the project code name "Knights". It makes full use of campus resources to sort, process, and review existing knowledge, improving the productivity and quality of Q&A. Production is at the ten-thousand level.
High-quality library: The community Q&A library has high coverage but uneven quality, while socialized production yields high quality but relatively small volume. By cleaning the community Q&A library and using machines to expand the socially produced library, the results are consolidated into high-quality libraries.
Egg white library: "Egg white" is a product strategy. When users talk to a bot, what they most want is the direct answer, the "egg yolk". Sometimes, however, the machine understands (or partly understands) the user's question but cannot give a perfect answer. In that case, giving the user the "egg white" is an elegant way of saying "I understand you". The first version of the egg white library has been completed, mainly covering "description/method" questions.
Core library
To purify the Internet environment and improve content quality, we run a set of core library processes that combine operations and mining.
The skill library, knowledge base, Q&A library, and chat library together constitute the infrastructure of intelligent dialogue in the information service scenario. A few examples illustrate how different libraries satisfy different queries. Suppose a user called Pony is watching an NBA game and says:
"How many points are the Rockets leading by now?" -> Skill library
"Who invented basketball?" -> Knowledge base
"Can Harden enter the Hall of Fame?" -> Q&A library
"Let's talk about the NBA?" -> Chat library
General information services always pursue both coverage and quality in question answering. This is also a difficult point for the industry, and it includes the processing of semi-structured and unstructured data, content production models, content sensitivity issues, and user satisfaction. The multi-level QA system accumulated during this exploration, the diversified MOPU (Machine/OGC/PGC/UGC) production model, and the process-driven, scalable, and sustainable production system are at the forefront of the industry. In the latest evaluation on the Tmall Genie ideal query set, the trigger rate is 73% and the accuracy rate is 91%. To put these numbers in context, consider the indicators of representative products in the industry:
According to a recent survey by Stone Temple, Google Assistant can answer 68% of user questions, 90.6% of them correctly; Microsoft Cortana can answer 56.5% of user questions with an accuracy rate of 81.9%; Apple Siri answers 21.7% of user questions with an accuracy rate of 62.2%; and Amazon Alexa answers 20.7% of user questions with an accuracy rate of 87%.
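One way to read these figures side by side is to multiply the share of questions answered by the accuracy on the answered questions, which gives the share of all questions that receive a correct answer. The short sketch below does only that arithmetic on the numbers quoted above. It assumes that the Tmall Genie accuracy is measured over triggered queries, and since the two evaluations use different query sets, the output is a rough orientation rather than a like-for-like ranking.

```python
# Figures quoted in the text: (share of questions answered, accuracy on answered).
# Assumption: the Tmall Genie accuracy is measured over triggered queries only.
reported = {
    "Tmall Genie (Shenma)": (0.73, 0.91),
    "Google Assistant": (0.68, 0.906),
    "Microsoft Cortana": (0.565, 0.819),
    "Apple Siri": (0.217, 0.622),
    "Amazon Alexa": (0.207, 0.87),
}


def correct_share(answered: float, accuracy: float) -> float:
    """Share of all questions that receive a correct answer."""
    return answered * accuracy


effective = {name: correct_share(*pair) for name, pair in reported.items()}

# Print the products from highest to lowest share of correct answers.
for name, share in sorted(effective.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {share:.1%}")
```

Under these assumptions the product of the two rates is about 66% for the Tmall Genie evaluation and about 62% for the best of the surveyed assistants.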
Architecture
The above picture shows the overall architecture. The "engine" is responsible for building data and carrying the computation. The "platform" is responsible for the closed-loop solution (production, multi-tenant consumption, operation, demand management, etc.) built around the engine. The system builds on many years of accumulated search experience. It is completely decoupled from the search business and carries the traffic of business parties such as the Tmall Genie (as well as the live broadcast of the Double Eleven gala). The following sections describe the God's coming platform, the TaskBot engine, and the QABot engine.
God's coming platform
The God's coming platform is a platform-level extension of the TaskBot engine that solves problems such as production, consumption, and operation. For external developers it is a BotFramework; for external callers it is the gateway to the entire intelligent dialogue system; for internal RD it is the production and operation platform. Currently, the platform mainly serves the internal business of the group. It is composed of the skill open platform, the skill production platform, the statistical analysis platform, and the operation management platform.
Skill open platform
There are two levels of openness: open content and open capability. Correspondingly, the skill open platform assumes two roles:
1. Capability opening (BotFramework): a skill-building platform benchmarked against api.ai, on which external developers build their own skills;
2. Content consumption (OpenAPI): by creating an application and selecting skills/Q&A, callers use intelligent dialogue directly through the API.
At present, we have not yet promoted BotFramework. Although there are many open platform products, the current model struggles to meet the needs of developers. A single skill requires a lot of work and a long pipeline from product planning to launch. It cannot be done just by submitting some corpus and configuring a few contexts and outputs (simple control skills can barely manage). The 20+ skills completed in the first phase contain about 300+ different intents, and we have established a complete process for corpus collection, labeling, review, modeling, and testing. Our energy is therefore mainly focused on polishing truly usable built-in skills that produce real value.
Skill production platform
The skill production platform is used to produce built-in skills. Its role is the same as that of the skill open platform, in that it ultimately delivers materials to the TaskBot engine, but its users are internal RD. It covers the whole pipeline from product PRD to skill launch, moving online the writing of structured PRDs, demand management, corpus management, entity management, skill building, skill training, skill verification, and skill release.
For the universality of skills, we support multiple scenes within the skill group for each skill: standard screenless, mobile screen, and large screen. Standard screenless corresponds to scenarios like the Tmall Genie speaker, and mobile screen to Shenma's personal assistant scenario. They differ in their requirements for multi-turn dialogue, structured display, and ranking strategies. In addition to entities, corpora, and scripts, the materials for built-in skills can include C++ dynamic libraries, so that different ranking strategies and NLG strategies can be supported.
Through this platform, skill construction is moved online, and PD, RD, QA, and operations have a clear division of labor in a production pipeline.
Statistical analysis platform
Multi-dimensional management of statistics, reports, and indicators. It addresses production and consumption efficiency (using statistics to direct content production), feedback for content control, and evaluation of skills both as a whole and individually.
Operation management platform
The operation management platform is divided into two parts: content operations and application operations.
Content operations: real-time intervention on key domains and modules;
Application operations: adding, deleting, and training applications/skills.
TaskBot engine
The TaskBot engine is the core of skill building and consumption. It involves offline computing, content management, scheduling, and online services.
Offline computing: Builds the materials from the external platforms into the corresponding internal data, including entity dictionaries, classification models, intent recognition and slot filling plugins/patterns/models, NLG strategies and templates, DM script plugins, US ranking plugins, webHook logic plugins, and more.
Content management: Manages the above data by application/skill version. Content management needs to be stateless so that data can be quickly ported, rolled back, and distributed.
Scheduling: Divided into data scheduling, environment management, and service management. Data scheduling is responsible for distributing data from offline to online. An SDS engine deployment contains multiple Roles, each of which loads its corresponding data. Environment management is responsible for the automated management of the iteration, verification, pre-release, and production environments. Service management is responsible for operation and maintenance, including splitting deployments (by application traffic and by skill consumption) and scaling out and in.
Online engine: The SDS engine is the core of task-based dialogue. It accepts the user's query, uses the DM as the control center and the NLU as the understanding center, performs recall and ranking through the US, and packages the output with the NLG. At present, skills such as news broadcast, time zone, traffic restrictions, today in history, unit conversion, oil price, calendar, NBA, and LBS are online on the Tmall Genie, with a skill trigger rate of 97-98% and an accuracy rate of 95%+.
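The flow described above (NLU for understanding, DM for control, US for recall and ranking, NLG for output) can be illustrated with a toy pipeline. This is a minimal sketch and not the SDS engine itself: the function names, the keyword rules, the `oil_price` intent, and the price table are all hypothetical, and the prices are made-up values.

```python
from dataclasses import dataclass

CITIES = ("beijing", "hangzhou")
# Hypothetical content table with made-up prices: city -> [(fuel grade, price)].
PRICES = {"beijing": [("95", 7.9), ("92", 7.4)], "hangzhou": [("92", 7.3)]}


@dataclass
class NluResult:
    intent: str
    slots: dict
    confidence: float


def nlu(query: str) -> NluResult:
    """Understanding: keyword rules stand in for real intent and slot models."""
    text = query.lower()
    slots = {"city": city for city in CITIES if city in text}
    if "oil price" in text:
        return NluResult("oil_price", slots, 0.9)
    return NluResult("unknown", slots, 0.3 if slots else 0.0)


def dm(result: NluResult, context: dict):
    """Control: decide the next action from the NLU result and the context."""
    intent = result.intent
    # A bare slot value continues the pending intent from the previous turn.
    if intent == "unknown" and result.slots and context.get("pending_intent"):
        intent = context["pending_intent"]
    if intent == "unknown":
        return "fallback", {}
    slots = {**context.get("slots", {}), **result.slots}
    if "city" not in slots:
        context["pending_intent"] = intent
        context["slots"] = slots
        return "ask_city", {}
    context.pop("pending_intent", None)
    context["slots"] = {}
    return "answer", {"intent": intent, "city": slots["city"]}


def us(payload: dict) -> list:
    """Recall candidates for the city, then rank them by fuel grade."""
    return sorted(PRICES.get(payload["city"], []), key=lambda item: item[0])


def nlg(action: str, payload: dict, ranked: list) -> str:
    """Output: render the chosen action with simple templates."""
    if action == "ask_city":
        return "Which city do you mean?"
    if action == "fallback" or not ranked:
        return "Sorry, I cannot help with that yet."
    grade, price = ranked[0]
    return f"In {payload['city'].title()}, grade {grade} costs {price} yuan per litre."


def handle(query: str, context: dict) -> str:
    """One turn: NLU -> DM -> US -> NLG."""
    action, payload = dm(nlu(query), context)
    ranked = us(payload) if action == "answer" else []
    return nlg(action, payload, ranked)
```

Calling `handle("What is the oil price?", ctx)` and then `handle("Beijing", ctx)` with the same `ctx` dictionary shows the DM carrying the pending intent across turns, which is the simplest form of a multi-turn dialogue.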
DM (Dialog Manager): Dialogue management is a key part of the dialogue system. It is responsible for maintaining the dialogue context, managing the dialogue flow, and keeping the conversation smooth. The user's input is processed by the NLU to produce information such as intent and slots, and the DM then makes decisions and takes actions according to this data and the context of the current conversation. These actions include calling the NLG module to generate natural language and obtaining additional information needed during the dialogue through external service interfaces. The DM manages the conversation in the form of a task tree, where each node of the tree is an Agent (inquiry, execution, response). For the sake of generality and extensibility, the dialog management module separates the domain-independent dialogue engine from the domain-specific parts. The engine provides reusable dialog Agent components, editable dialog control options, and a general external call mechanism, which makes it convenient to customize Agents with different functions for different dialog scenarios.
The dialog engine has two important components in process control:
Dialogue execution stack: Maintains the execution state of the Agents in the form of a stack and controls the dialog flow according to the context. Agents are pushed onto the dialog stack, and the Agent at the top of the stack executes and selects the appropriate sub-Agent to push and execute next. The dialog stack stores the context information of the conversation, corresponding to a specific conversation scenario. The Agent at the top of the stack can be thought of as the focus of the dialogue. Combined with the Agent relationship tree and the topic agenda, the dialog stack makes it possible to track and manage the dialogue focus, and to flexibly maintain, switch, and backtrack conversation topics.
Topic agenda: Maintains and manages the parameter information of the conversation and is used to collect the user input that the system expects. The agenda is divided into multiple levels, each corresponding to an Agent in the dialog stack, so for a given state of the stack the agenda represents the input expected in that dialog scenario. When the user stays on or shifts the topic, the corresponding expected parameters can be found and updated.
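The interplay of the two components can be sketched in a few lines. The following is a minimal illustration under assumptions of our own: the `Agent` and `DialogStack` classes, the `bind` method, and the travel example are hypothetical names chosen for this sketch, not the engine's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    expects: tuple = ()                          # slots this agent wants from the user
    filled: dict = field(default_factory=dict)   # slots already collected


class DialogStack:
    """Execution stack of agents; the agent on top is the dialogue focus."""

    def __init__(self):
        self._stack = []

    def push(self, agent: Agent) -> None:
        self._stack.append(agent)

    @property
    def focus(self):
        return self._stack[-1] if self._stack else None

    def agenda(self) -> list:
        """One agenda level per agent, focus first, listing still-missing slots."""
        return [
            (agent.name, [s for s in agent.expects if s not in agent.filled])
            for agent in reversed(self._stack)
        ]

    def bind(self, slot: str, value: str):
        """Give the input to the closest agent that expects it.

        If that agent is below the focus, the agents above it are popped,
        which models the user shifting back to an earlier topic.
        """
        for depth, agent in enumerate(reversed(self._stack)):
            if slot in agent.expects:
                for _ in range(depth):
                    self._stack.pop()
                agent.filled[slot] = value
                return agent.name
        return None  # no level of the agenda expects this input


stack = DialogStack()
stack.push(Agent("travel", expects=("destination",)))
stack.push(Agent("book_flight", expects=("date", "seat")))
```

With this setup, `stack.bind("date", "tomorrow")` is absorbed by the focus agent, while `stack.bind("destination", "Hangzhou")` matches a lower agenda level, so the stack unwinds and `travel` becomes the focus again.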
The execution unit of the DM is a "script". The script tree that a user builds by drag and drop on the open platform or the production platform is ultimately built into a C++ shared object (.so), which is then loaded and executed. At present, through the combination of DM and NLU, multi-turn dialogue capabilities such as ellipsis completion, coreference resolution, topic shifting, and error handling have been implemented on multiple skills.
NLU: There are two different design concepts for NLU:
1. NLU around BotFramework: The user query is structured as Domain/Intent/Slot and returned to the developer (with a confidence score). Some BotFramework products require the developer to judge whether to accept the result, which becomes troublesome when there are many skills, because the core of this design is only to help the developer solve the problem of semantic understanding.
2. NLU around the dialogue product: The NLU classification results are combined with the recall results in a multi-dimensional NBest strategy. This is particularly important in the information service scenario. For example, when a user says "Li Bai", it may mean the poet Li Bai, Sa Beining's wife Li Bai, or Li Ronghao's song "Li Bai". There are different ways to handle this, such as using click data from general search users, using the user's historical behavior, or even having the DM ask directly which Li Bai is meant.
The second design naturally covers the first, and Shenma's NLU follows the second. This year the NLU system went through two major upgrades: one is the NBest upgrade of the entire SDS, and the other is the sub-NLU. The sub-NLU allows different domains to customize their intent recognition and slot extraction strategies according to their own characteristics, which improves the parallelism of RD work.
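To make the NBest idea concrete, the sketch below scores each candidate reading of an ambiguous query by combining the NLU confidence, the quality of the recalled content, and a click prior, boosts readings that match the user's history, and asks for clarification when the top two are too close. The weights, the boost, the margin, and all the scores in the example are hypothetical values picked for illustration, not figures from the system.

```python
from dataclasses import dataclass

WEIGHTS = (0.4, 0.3, 0.3)   # hypothetical weights: NLU, recall, click prior
HISTORY_BOOST = 0.15        # hypothetical boost for domains the user used before


@dataclass
class Candidate:
    domain: str
    nlu_confidence: float   # from domain/intent classification
    recall_score: float     # quality of the best content recalled for this reading
    click_prior: float      # share of general search clicks going to this reading


def score(c: Candidate) -> float:
    w_nlu, w_recall, w_click = WEIGHTS
    return w_nlu * c.nlu_confidence + w_recall * c.recall_score + w_click * c.click_prior


def nbest(candidates, user_history=(), margin=0.05):
    """Rank the readings; ask the user when the best two are within `margin`."""
    scored = []
    for c in candidates:
        total = score(c) + (HISTORY_BOOST if c.domain in user_history else 0.0)
        scored.append((round(total, 4), c.domain))
    if not scored:
        return "fallback", []
    scored.sort(reverse=True)
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return "clarify", [domain for _, domain in scored[:2]]
    return "answer", [scored[0][1]]


# Three readings of the query "Li Bai", with made-up scores.
li_bai = [
    Candidate("poetry", 0.50, 0.9, 0.6),
    Candidate("music", 0.45, 0.8, 0.3),
    Candidate("celebrity", 0.40, 0.6, 0.1),
]
```

With no history, `nbest(li_bai)` settles on the poetry reading. For a user who often plays music, `nbest(li_bai, user_history=("music",))` puts the two leading readings so close together that the sketch returns a clarification request, which corresponds to the DM asking which Li Bai is meant.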
NLG, US, and Skill-Gateway are not elaborated here.
QABot engine
The industry classifies question answering along different dimensions. By content, it can be divided into question answering over structured data, over unstructured data, and over question-answer pairs. From a technical point of view, the industry generally distinguishes retrieval-based question answering systems from generative ones. The former build an information retrieval system on a large-scale dialogue dataset and produce a reasonable response to the user's question through effective question matching and question-answer relevance models. The latter try to build an end-to-end deep learning model that automatically learns the semantic association between query and response from massive dialog data, so that a reply can be generated automatically for any user question.
We currently focus on retrieval-based QA systems built on massive data, divided at the system level into KG-QA, Baike-QA, DeepQA, and PairQA. All of them answer from existing knowledge, but they differ in data sources and requirements, processing methods, matching methods, and the scenarios they cover. The author believes the ideal end state is for the world's knowledge to be fully structured (a knowledge base), but this can never be fully realized because information is continuously generated and updated and natural language semantics are difficult to process. It is therefore necessary to advance in both directions, structured and unstructured, at the same time.
KG-QA and Baike-QA are accurate but limited in coverage. The unstructured DeepQA has high coverage but is heavily polluted. The socialized production of PairQA greatly improves productivity but requires well-chosen scenarios and questions. These many challenges determine the difficulty of question answering and the barriers to entry.
Question understanding
Question understanding is the key step in which the Q&A system works out user intent, especially for DeepQA. Here we reuse the NLP capabilities of general search (semantic expansion, term weight analysis, entity recognition, rewriting and error correction, etc.). Question classification combines machine learning classifiers with manual methods to assign questions to categories such as meaningless, chat, person, organization, and time. Focus word recognition aims to locate the information need precisely: a focus word is the main background or object of the question, or content that describes its topic, such as an entity, attribute, action, or instance.
Information retrieval
Information retrieval is responsible for retrieving relevant candidate information from the global corpus and passing it to the final answer generation module. Retrieval takes many forms, depending on the corpus and the business scenario. At present, we mainly use inverted-index text retrieval and vector-based semantic retrieval. The former is the traditional full-text search approach: it is simple to implement and highly accurate, but it depends heavily on how the corpus is built. The latter is a better fit for semantic search: it generalizes well, but it has a certain false trigger rate. Each indexing mechanism has its own advantages and disadvantages. We choose between them according to the corpus and the business scenario, and also use them in combination so that each can play to its strengths.
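A combination of the two mechanisms can be sketched as follows. This is an illustration only: the three documents are invented, the lexical score is a simple fraction of matched query tokens, and character-trigram counts are used as a crude stand-in for the learned sentence embeddings a real semantic index would use. The `alpha` mixing weight is likewise a hypothetical parameter.

```python
import math
from collections import Counter, defaultdict

# Invented toy corpus: doc id -> text.
DOCS = {
    1: "how to cook braised pork",
    2: "braised pork recipe steps",
    3: "who invented basketball",
}


def tokenize(text: str) -> list:
    return text.lower().split()


def build_inverted_index(docs: dict) -> dict:
    """Map each token to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def lexical_scores(index: dict, query: str) -> dict:
    """Fraction of distinct query tokens found in each document."""
    tokens = set(tokenize(query))
    hits = Counter()
    for token in tokens:
        for doc_id in index.get(token, ()):
            hits[doc_id] += 1
    return {doc_id: n / len(tokens) for doc_id, n in hits.items()}


def embed(text: str) -> Counter:
    """Character-trigram counts, standing in for a learned embedding."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(docs: dict, index: dict, query: str, alpha: float = 0.5, top_k: int = 2) -> list:
    """Mix lexical and vector scores; alpha=1 is purely lexical, alpha=0 purely vector."""
    lex = lexical_scores(index, query)
    query_vec = embed(query)
    scored = [
        (alpha * lex.get(doc_id, 0.0) + (1 - alpha) * cosine(query_vec, embed(text)), doc_id)
        for doc_id, text in docs.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


index = build_inverted_index(DOCS)
results = hybrid_search(DOCS, index, "braised pork recipes")
```

In this example the inverted index cannot match "recipes" to "recipe", so documents 1 and 2 tie on the lexical score, and the vector side breaks the tie in favor of document 2. That is the kind of complementarity the paragraph describes, on a very small scale.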
Answer generation
Starting from the candidate answers returned by retrieval, further fine ranking, answer extraction, and confidence calculation are needed to finally obtain accurate and concise answers. PairQA relies mostly on strict ranking and confidence calculation using machine learning models and methods such as CNN, DSSM, and GBDT. DeepQA, which targets unstructured document and community corpora, requires deeper processing, including concise summary extraction with a Bi-LSTM RNN model, cross-validation among the answers to synonymous questions, and answer relevance verification.
Corpus construction
Corpus construction is the foundation of QABot. Whether it is question answering for a specific domain (such as maternity and infant care, the Three Kingdoms, or street dance) or open-domain question answering (such as chat), it depends on the support of a high-quality corpus. For the Tmall Genie scenario, we implemented a set of data mining and operational production processes for colloquial question answering, including open question mining, scenario question mining, socialized answer production, and high-quality answer extraction.
Graph engine
The knowledge graph is the core infrastructure of Shenma search. It is also a long-standing data product, built up with the help of search big data, natural language processing, and deep learning, and it plays a key role in making search more knowledge-driven and intelligent. Based on the knowledge graph and natural language understanding, we have built three main products: knowledge cards, entity recommendation, and precise question answering. In the intelligent dialogue business, for the speaker scenario, the team has also focused on building skills such as recipes, ancient poetry, the Three Kingdoms, and world records, and has delivered them to the Tmall Genie. On the production side, we continuously introduce cutting-edge techniques for knowledge extraction and knowledge reasoning, and we have also established a socialized production model for the graph to keep building and supplementing knowledge in professional fields, so that the knowledge graph can better empower the business.
Summary
Last year, the intelligent dialogue team completed the initial technical upgrade from search to intelligent dialogue, and in the course of real-world practice built up the architecture, algorithms, operations, and content system for AI + information services. We are grateful for this era. The road to AI dialogue is long, and we will walk it together.