27. A General Method for Automatic Question Answering with Deep Learning (DIY Chat Robot)

Last Update:2018-08-21 Source: Internet

Author: User

A chat robot is essentially a general-purpose question answering system. Because it is a question answering system, it cannot avoid choosing among candidate answers, and deep learning methods can help us find the best one. This section describes the general method for automatic question answering with deep learning.

Please respect the original work: when reprinting, credit the source website www.shareditor.com and include the original link.

How to obtain the corpus

For a question answering system, we generally collect corpus material from the Internet, for example from search engines such as Baidu and Google, and use the results to build a question-answer corpus. The corpus is then split into three parts: a training set, a development set, and a test set.
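The three-way split can be sketched as follows. This is a minimal illustration, not the article's own code: the function name `split_corpus`, the 80/10/10 ratios, and the toy question-answer pairs are all assumptions made for the example.

```python
import random

def split_corpus(pairs, dev_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle (question, answer) pairs and split them into train/dev/test.

    The ratios are illustrative assumptions; the article does not give any.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_ratio)
    n_test = int(len(shuffled) * test_ratio)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test

# Hypothetical toy corpus standing in for crawled question-answer pairs.
pairs = [(f"question {i}", f"answer {i}") for i in range(100)]
train, dev, test = split_corpus(pairs)
print(len(train), len(dev), len(test))
```

Fixing the random seed keeps the split reproducible, so the development and test sets stay the same between training runs.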

Training a question answering system really means training a model to find the right answer among a pile of answers. To make the samples more effective, we do not put all the answers into one vector space during training; instead we group them. First we sample from the corpus and build a pool of 500 candidate answers for each question. The pool contains the positive samples, and randomly selected negative samples fill the rest, so that the effect of the positive samples stands out.
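A candidate pool of this kind could be assembled roughly as below. This is a sketch under assumptions: `build_candidate_pool` and its arguments are hypothetical names, and the answers are placeholder strings; only the pool size of 500 comes from the text.

```python
import random

def build_candidate_pool(positives, all_answers, pool_size=500, seed=0):
    """Return a shuffled pool holding every positive answer plus random negatives."""
    rng = random.Random(seed)
    positives = list(positives)
    positive_set = set(positives)
    # Any answer that is not a known positive for this question can be a negative.
    negatives_source = [a for a in all_answers if a not in positive_set]
    n_neg = min(pool_size - len(positives), len(negatives_source))
    pool = positives + rng.sample(negatives_source, max(n_neg, 0))
    rng.shuffle(pool)
    return pool

# Hypothetical data: 2000 answers in the corpus, two of them correct for the question.
all_answers = [f"answer {i}" for i in range(2000)]
positives = ["answer 3", "answer 7"]
pool = build_candidate_pool(positives, all_answers)
print(len(pool), all(p in pool for p in positives))
```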

CNN-based system design

A CNN has three advantages: sparse interactions, parameter sharing, and equivariant representations. Because of these three advantages, it is well suited to training the answer selection model in an automatic question answering system.

We design the convolution formula as follows (if convolution is unfamiliar, see "Machine Learning Tutorial 15: A Detailed Look at Convolutional Neural Networks"):

Assume each word is represented by a three-dimensional vector. With the 4 words on the left and the convolution matrix on the right, the output is:

Applying 1-max pooling to this result simply takes the maximum value in O.
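The convolution and 1-max pooling steps can be illustrated with a small worked example. The numbers below are made up, and the filter width of 2 words is an assumption (the article only says that there are 4 words with 3-dimensional vectors); the helper name `convolve` is hypothetical.

```python
def convolve(words, filt):
    """Slide a filter spanning len(filt) consecutive words over the sentence.

    Each output value is the sum of element-wise products between the
    filter rows and the word vectors inside the current window.
    """
    k = len(filt)
    out = []
    for i in range(len(words) - k + 1):
        total = 0.0
        for j in range(k):
            total += sum(w * f for w, f in zip(words[i + j], filt[j]))
        out.append(total)
    return out

# 4 words, each a 3-dimensional vector (illustrative values).
words = [[1, 0, 2],
         [0, 1, 1],
         [2, 1, 0],
         [1, 1, 1]]
# One convolution filter covering 2 consecutive words (assumed width).
filt = [[1, 0, -1],
        [0, 1, 1]]

o = convolve(words, filt)   # one value per window position
pooled = max(o)             # 1-max pooling keeps only the largest value
print(o, pooled)            # [1.0, 0.0, 4.0] 4.0
```

With 4 words and a filter 2 words wide there are 3 window positions, so the output has 3 values and pooling reduces them to a single number.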

General training methods

Training yields a vector representation V_Q for the question (the word vectors can be trained with Google's word2vec; for more on word2vec see "DIY Chat Robot 25: Implementation Principles of word2vec, Google's Deep Learning Tool for Text Mining"), a vector V_A+ for a positive answer, and a vector V_A- for a negative answer. We then compare the question's similarity to each of the two answers. If the positive answer's similarity fails to exceed the negative answer's by at least a margin m, the pair is used to update the model parameters; otherwise the model is not updated and we continue selecting answers from the candidate pool. That is, the optimization function is:

$$L = \max\{0,\; m - \cos(V_Q, V_{A^+}) + \cos(V_Q, V_{A^-})\}$$
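The margin-based objective can be sketched in a few lines. This assumes the hinge form max(0, m - cos(V_Q, V_A+) + cos(V_Q, V_A-)) used in the paper cited later in this article; the margin value 0.1 and the two-dimensional vectors are illustrative assumptions, and `hinge_loss` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def hinge_loss(v_q, v_pos, v_neg, margin=0.1):
    """Zero when the positive answer beats the negative one by at least `margin`."""
    return max(0.0, margin - cosine(v_q, v_pos) + cosine(v_q, v_neg))

v_q = [1.0, 0.0]

# Positive answer clearly closer than the negative: loss is 0, no update needed.
easy = hinge_loss(v_q, v_pos=[1.0, 0.0], v_neg=[0.0, 1.0])

# Negative answer scores higher than the positive: loss is positive, so update.
hard = hinge_loss(v_q, v_pos=[1.0, 1.0], v_neg=[1.0, 0.9])

print(easy, hard > 0)
```

A loss of zero contributes no gradient, which is why such pairs leave the model unchanged and training moves on to another candidate from the pool.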

The parameters are updated in the same way as in other convolutional neural networks: gradient descent, with derivatives computed by the chain rule.

For test data, compute the cosine similarity between the question and each candidate answer; the candidate with the greatest similarity is predicted to be the correct answer.
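Prediction at test time then reduces to an argmax over the candidate pool. The sketch below assumes the question and candidate answers have already been encoded as vectors; the function name `predict` and the three-dimensional vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def predict(question_vec, candidates):
    """Return the candidate whose vector is most similar to the question.

    `candidates` maps an answer's text to its (already encoded) vector.
    """
    return max(candidates, key=lambda a: cosine(question_vec, candidates[a]))

question_vec = [1.0, 0.0, 1.0]
candidates = {
    "answer A": [1.0, 0.0, 0.9],    # nearly parallel to the question
    "answer B": [0.0, 1.0, 0.0],    # orthogonal
    "answer C": [-1.0, 0.0, -1.0],  # opposite direction
}
best = predict(question_vec, candidates)
print(best)  # answer A
```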

Design of neural network structure

Here are six structural designs. In them, HL denotes a hidden layer whose activation function is z = tanh(Wx + b), CNN is a convolutional layer, P is a pooling layer with a pooling step of 1, and T is a tanh layer. The output of P+T is the vector representation, and the final output is the cosine similarity of the two vectors.

Where the HL or CNN boxes are joined in the diagram, the question side and the answer side share the same weights. The dimension of the CNN output depends on how many convolution filters are used: with 4 filters, the result is a 4*3 matrix (the 3 becomes 1 after pooling in the next step).
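Weight sharing and the 4*3 output shape can be seen in a small forward pass. This is a simplified sketch, not any one of the six designs: it omits the HL layer, uses random untrained filters, assumes a filter width of 2 words, and the names `conv_features` and `encode` are hypothetical.

```python
import math
import random

def conv_features(words, filters):
    """Apply every filter to the sentence: one row per filter, one column per window."""
    rows = []
    for filt in filters:
        k = len(filt)
        row = [sum(w * f
                   for j in range(k)
                   for w, f in zip(words[i + j], filt[j]))
               for i in range(len(words) - k + 1)]
        rows.append(row)
    return rows

def encode(words, filters):
    """CNN, then 1-max pooling (P), then tanh (T): a fixed-length sentence vector."""
    return [math.tanh(max(row)) for row in conv_features(words, filters)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

rng = random.Random(0)
# 4 filters, each spanning 2 words of 3 dimensions (untrained, random values).
filters = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
           for _ in range(4)]

question = [[1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 1, 1]]            # 4 words
answer = [[0, 1, 0], [1, 1, 0], [2, 0, 1], [0, 0, 1], [1, 2, 1]]   # 5 words

feature_map = conv_features(question, filters)  # 4 filters x 3 windows
v_q = encode(question, filters)                 # same filters for both sides,
v_a = encode(answer, filters)                   # i.e. shared weights
print(len(feature_map), len(feature_map[0]), len(v_q), cosine(v_q, v_a))
```

Because pooling collapses each filter's row to one value, the question and the answer end up as vectors of the same length (4 here) even though they contain different numbers of words.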

The performance of the structures above is detailed in the paper "Applying Deep Learning to Answer Selection: A Study and An Open Task".

Summary

The keys to applying deep learning to a chat robot are the following points:

1. Select, combine, and optimize the neural network structures.

2. Because this is natural language processing, the text must be turned into word vectors that the machine can process.

3. Where similarity or matching is involved, choose a similarity measure; the typical one is cosine similarity.

4. If the task needs global information about the text sequence, use a CNN or an LSTM.

5. When accuracy is not high enough, layers can be added.

6. When the computation becomes too expensive, do not forget parameter sharing and pooling.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
