Building an Automatic Question Answering System Based on Machine Learning

Source: Internet
Author: User

Automatic question answering is a very active direction in natural language processing. It combines techniques from knowledge representation, information retrieval, and natural language processing. An automatic question answering system lets users state their information needs as natural-language questions rather than as combinations of keywords; the system analyzes the question and automatically finds an accurate answer in a variety of data resources. By function, such systems are divided into open-domain and restricted-domain question answering. In an open-domain system the subject of the question is not limited: users may ask anything, and the system looks for answers in massive amounts of data. A restricted-domain system is declared in advance to answer only questions from a particular field and cannot answer questions from other fields.

To test the feasibility of this approach, I recently ran a small experiment using a question-and-answer corpus from Baidu Knows (Baidu Zhidao).


Specific steps:

(1) Data preprocessing: The raw Baidu Knows data is preprocessed into a normalized format and stored in a database for later use, forming the raw dataset needed for training.

(2) Build the classifier: Train a text classification model on this data. When a user submits a test question, the classifier assigns it a category label, which narrows down the range of knowledge in which the answer is sought.

(3) Similarity search: Compute the text similarity between the test question and the other questions of the same category in the training corpus; the questions with higher similarity form the set of similar questions.

(4) Answer extraction: Rank all the answers in the set of similar questions and return the best ones to the user.
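
Steps (3) and (4) can be sketched roughly as follows. This is a minimal illustration, not the code used in the test: the class SimilarQuestionSearch, the QaPair type, the minScore threshold, and the choice of cosine similarity over term-frequency vectors are all assumptions, as is ranking answers by the similarity of their questions. Tokens are split on whitespace for simplicity; a Chinese corpus such as Baidu Knows would need a word segmenter first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps (3) and (4); every name here is illustrative.
final class SimilarQuestionSearch {

    // One preprocessed corpus row: category label, question, and its answer.
    static final class QaPair {
        final String category;
        final String question;
        final String answer;

        QaPair(String category, String question, String answer) {
            this.category = category;
            this.question = question;
            this.answer = answer;
        }
    }

    // Term-frequency vector over lowercase whitespace tokens (an assumption;
    // Chinese text would be segmented into words instead).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity; returns 0 when either text has no tokens.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double denominator = norm(a) * norm(b);
        return denominator == 0.0 ? 0.0 : dot / denominator;
    }

    // Step (3): same-category questions scoring at least minScore, best first.
    static List<QaPair> findSimilar(List<QaPair> corpus, String category,
                                    String query, double minScore) {
        Map<String, Integer> queryTf = termFrequencies(query);
        Map<QaPair, Double> scores = new HashMap<>();
        List<QaPair> similar = new ArrayList<>();
        for (QaPair pair : corpus) {
            if (!pair.category.equals(category)) {
                continue; // the classifier has already locked the category
            }
            double score = cosine(queryTf, termFrequencies(pair.question));
            if (score >= minScore) {
                scores.put(pair, score);
                similar.add(pair);
            }
        }
        similar.sort((p, q) -> Double.compare(scores.get(q), scores.get(p)));
        return similar;
    }

    // Step (4): answer of the most similar question, or null if none qualifies.
    static String bestAnswer(List<QaPair> corpus, String category,
                             String query, double minScore) {
        List<QaPair> similar = findSimilar(corpus, category, query, minScore);
        return similar.isEmpty() ? null : similar.get(0).answer;
    }

    public static void main(String[] args) {
        List<QaPair> corpus = new ArrayList<>();
        corpus.add(new QaPair("computers", "how do I reset my router password",
                "Hold the reset button for ten seconds."));
        corpus.add(new QaPair("computers", "how do I install a printer driver",
                "Download the driver from the vendor site."));
        System.out.println(bestAnswer(corpus, "computers", "reset router password", 0.1));
    }
}
```

The threshold keeps unrelated questions out of the similar set; a real system would tune it on held-out questions and might weight terms with TF-IDF instead of raw counts.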


The core technology here is building the classifier. Since no deep learning method was used, only an SVM classifier has been tested so far, and it turned out to be feasible. As for computing the similarity between questions, there are plenty of ready-made tools available.
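
To make the classifier step more concrete, here is a minimal sketch of a linear SVM text classifier. It is not the implementation used in the test, whose SVM library and features are not stated: the class LinearSvmTextClassifier, the one-vs-rest scheme, the Pegasos-style sub-gradient training, the whitespace tokenizer, and the epochs and lambda parameters are all assumptions chosen to keep the example self-contained. A real system would more likely rely on an existing SVM library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a one-vs-rest linear SVM trained with Pegasos-style
// sub-gradient steps on the hinge loss. All names are hypothetical.
final class LinearSvmTextClassifier {
    private final Map<String, Integer> vocabulary = new HashMap<>();
    private final List<String> labels = new ArrayList<>();
    private double[][] weights; // weights[label][feature]; last column is the bias

    // Assumption: whitespace tokens; Chinese text would need a word segmenter.
    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // Dense term-frequency vector with a constant bias feature at the end.
    private double[] featurize(String text) {
        double[] x = new double[vocabulary.size() + 1];
        for (String token : tokenize(text)) {
            Integer index = vocabulary.get(token);
            if (index != null) {
                x[index] += 1.0; // tokens unseen during training are ignored
            }
        }
        x[x.length - 1] = 1.0;
        return x;
    }

    private static double dot(double[] w, double[] x) {
        double sum = 0.0;
        for (int j = 0; j < w.length; j++) {
            sum += w[j] * x[j];
        }
        return sum;
    }

    void train(List<String> questions, List<String> categories, int epochs, double lambda) {
        // Build the vocabulary and the label set from the training data.
        for (int i = 0; i < questions.size(); i++) {
            for (String token : tokenize(questions.get(i))) {
                vocabulary.putIfAbsent(token, vocabulary.size());
            }
            if (!labels.contains(categories.get(i))) {
                labels.add(categories.get(i));
            }
        }
        double[][] x = new double[questions.size()][];
        for (int i = 0; i < x.length; i++) {
            x[i] = featurize(questions.get(i));
        }
        weights = new double[labels.size()][vocabulary.size() + 1];

        long t = 0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            // Simplification: samples are visited in order rather than at random.
            for (int i = 0; i < x.length; i++) {
                t++;
                double eta = 1.0 / (lambda * t); // decaying step size
                double shrink = 1.0 - 1.0 / t;   // effect of L2 regularization
                for (int k = 0; k < labels.size(); k++) {
                    double y = labels.get(k).equals(categories.get(i)) ? 1.0 : -1.0;
                    boolean violated = y * dot(weights[k], x[i]) < 1.0;
                    for (int j = 0; j < weights[k].length; j++) {
                        weights[k][j] *= shrink;
                        if (violated) {
                            weights[k][j] += eta * y * x[i][j];
                        }
                    }
                }
            }
        }
    }

    // Returns the label whose one-vs-rest model gives the highest score.
    String predict(String question) {
        double[] x = featurize(question);
        int best = 0;
        for (int k = 1; k < labels.size(); k++) {
            if (dot(weights[k], x) > dot(weights[best], x)) {
                best = k;
            }
        }
        return labels.get(best);
    }

    public static void main(String[] args) {
        LinearSvmTextClassifier classifier = new LinearSvmTextClassifier();
        classifier.train(
                List.of("reset router password", "install printer driver",
                        "treat sore throat", "relieve back pain"),
                List.of("computers", "computers", "health", "health"),
                20, 0.01);
        System.out.println(classifier.predict("router driver install"));
    }
}
```

The predicted label would then be used to restrict the similarity search to questions of the same category.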


The system was implemented in Java; the test results are as follows:


  
