Enterprise Search Solution Process

Last Update: 2015-07-14 Source: Internet

Author: User

Tags: knowledge base


Project notes: preparing and indexing data for enterprise-level search

1. First, identify the range of data that needs to be searchable

2. Create a corresponding index library for that base data to store the content that needs to be indexed

As a concrete example from my own hands-on experience: I previously worked on a similar enterprise document search solution, in which various kinds of technical documents were indexed into Autonomy and the front-end platform services then called the RESTful interface provided by Autonomy to retrieve data.

Data preparation steps:

1. Prepare the database tables required to record the data that needs to be indexed

Business table:

Node: Document business table

Delta tables:

Index_cm_inc: General document delta table. Description: Stores the records that need to be added, deleted, or modified. A trigger monitors the main business table Node and records each corresponding operation in this delta table.

Index_kn_inc: Knowledge base document delta table. Description: Same as above; the categories are kept separate only because they are stored in different index libraries.

Index Table:

INDEX_CM: General document index table. Description: The Autonomy crawler indexes documents into the Autonomy document library according to the records in this table.

INDEX_KN: Knowledge base document index table. Description: Same as above.

Index Status Table:

Index_cm_status: General document index status table. Description: After the Autonomy crawler finishes, each document must be checked to see whether it was crawled and indexed successfully; the result is recorded here.

Index_kn_status: Knowledge base document index status table. Description: Same as above.
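The table layout above can be sketched as follows. The article names the tables but not their columns, so every column here (doc_id, op, title, path, indexed, and so on) is an assumption made for illustration, and SQLite is used only so that the sketch runs without a database server.

```python
import sqlite3

# Hypothetical business table; the article only gives its name (Node).
NODE_DDL = """
CREATE TABLE node (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,   -- assumed values: 'cm' (general) or 'kn' (knowledge base)
    title    TEXT,
    path     TEXT             -- location of the document file
);
"""

# One delta / index / status triple per document category.
# All column names are assumptions, not taken from the article.
CATEGORY_DDL = """
CREATE TABLE index_{cat}_inc (
    doc_id INTEGER NOT NULL,
    op     TEXT NOT NULL CHECK (op IN ('add', 'delete', 'modify'))
);
CREATE TABLE index_{cat} (
    doc_id INTEGER PRIMARY KEY,
    op     TEXT NOT NULL,
    title  TEXT,
    path   TEXT
);
CREATE TABLE index_{cat}_status (
    doc_id  INTEGER PRIMARY KEY,
    indexed INTEGER NOT NULL  -- 1 = found in the index library, 0 = failed
);
"""


def create_schema(conn: sqlite3.Connection, categories=("cm", "kn")) -> None:
    """Create the business table plus one table triple per category."""
    conn.executescript(NODE_DDL)
    for cat in categories:
        conn.executescript(CATEGORY_DDL.format(cat=cat))
```

Keeping one table triple per category mirrors the article's point that each category is stored in a different index library.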

2. Data preparation and processing flow

2.1 First, prepare the data for the delta tables

Create triggers on the business table Node for inserts, deletes, and updates, and store the monitored changes in the delta table of the corresponding type. (Alternatively, you can skip the triggers and have the application code insert into the corresponding delta table; each approach has its own advantages and disadvantages.)

2.2 Synchronize the delta table data to the index table (this table stores all field data that needs to be indexed, as well as the file location and so on; that is, every content field you need to index into the index library)
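A minimal sketch of the trigger approach, assuming a cut-down Node table and a delta table with hypothetical columns doc_id and op. The trigger names are invented for the example, and SQLite syntax is used only to keep it runnable; the original project's database and trigger definitions are not given in the article.

```python
import sqlite3

# Hypothetical, cut-down tables plus one trigger per kind of change.
DDL = """
CREATE TABLE node (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE index_cm_inc (doc_id INTEGER, op TEXT);

CREATE TRIGGER node_after_insert AFTER INSERT ON node BEGIN
    INSERT INTO index_cm_inc (doc_id, op) VALUES (NEW.id, 'add');
END;
CREATE TRIGGER node_after_update AFTER UPDATE ON node BEGIN
    INSERT INTO index_cm_inc (doc_id, op) VALUES (NEW.id, 'modify');
END;
CREATE TRIGGER node_after_delete AFTER DELETE ON node BEGIN
    INSERT INTO index_cm_inc (doc_id, op) VALUES (OLD.id, 'delete');
END;
"""


def demo() -> list:
    """Change one document three times and return what the delta table recorded."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO node (id, title) VALUES (1, 'Install guide')")
    conn.execute("UPDATE node SET title = 'Install guide v2' WHERE id = 1")
    conn.execute("DELETE FROM node WHERE id = 1")
    return conn.execute(
        "SELECT doc_id, op FROM index_cm_inc ORDER BY rowid"
    ).fetchall()
```

A real trigger would also need to route each change to the delta table for its document category. The application-level alternative would perform the same INSERT into the delta table from the code that modifies Node.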

A scheduled task calls the corresponding stored procedure, which computes the required fields and synchronizes them to the corresponding index table. In our previous project the synchronization was usually scheduled for midnight (12 a.m.). During synchronization, the previous day's data is cleaned up first to prevent re-crawling; the failed records also need to be copied from the index status table back into the index table so that they are indexed again. (Disadvantage: this policy only processes data once a day, so a newly published document does not become searchable right away.)
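The synchronization step might look roughly like this. It is a sketch, not the project's stored procedure: the column names are hypothetical, failed documents are re-queued with an assumed 'add' operation, and clearing the delta table after the copy is also an assumption, since the article does not say when delta rows are removed.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create minimal, hypothetical versions of the three tables involved."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE index_cm_inc    (doc_id INTEGER, op TEXT);
        CREATE TABLE index_cm        (doc_id INTEGER PRIMARY KEY, op TEXT);
        CREATE TABLE index_cm_status (doc_id INTEGER PRIMARY KEY, indexed INTEGER);
    """)
    return conn


def sync_delta_to_index(conn: sqlite3.Connection) -> None:
    """Nightly sync: clear old rows, load the delta, re-queue failed documents."""
    with conn:  # commit everything together, or roll back on error
        # 1. Remove the previous day's rows so they are not crawled again.
        conn.execute("DELETE FROM index_cm")

        # 2. Copy the delta in arrival order; the last operation per document wins.
        changes = conn.execute(
            "SELECT doc_id, op FROM index_cm_inc ORDER BY rowid"
        ).fetchall()
        conn.executemany(
            "INSERT OR REPLACE INTO index_cm (doc_id, op) VALUES (?, ?)", changes
        )

        # 3. Re-queue documents whose previous indexing attempt failed,
        #    unless today's delta already covers them.
        conn.execute("""
            INSERT OR IGNORE INTO index_cm (doc_id, op)
            SELECT doc_id, 'add' FROM index_cm_status WHERE indexed = 0
        """)

        # 4. Assumption: the delta is emptied once it has been copied.
        conn.execute("DELETE FROM index_cm_inc")
```

Running the function once per night from a scheduler reproduces the once-a-day behavior described above, including its drawback that changes wait until the next run.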

2.3 Start the crawler (also started by a scheduled task)

Configure the crawler for each document category so that it processes the corresponding index table and crawls the fields, the entity files, and so on.

2.4 After the crawler finishes, start the check task

For each kind of index table, check whether that day's data was indexed successfully, and mark each record as succeeded or failed, because the failed data needs to be re-indexed during the next day's run. How is this checked? Query the Autonomy index library by document ID: if the document can be found, the index succeeded; otherwise it failed.
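The check task can be sketched as below. The lookup against the index library is passed in as a plain function (exists_in_index, a hypothetical name) because the article does not show the Autonomy query it used; the column names are assumptions as well.

```python
import sqlite3
from typing import Callable


def run_check(conn: sqlite3.Connection,
              exists_in_index: Callable[[int], bool]) -> dict:
    """Record, for every document queued today, whether the index library has it.

    Covers added and modified documents only; for a deleted document,
    success would instead mean the ID can no longer be found.
    """
    doc_ids = [row[0] for row in conn.execute("SELECT doc_id FROM index_cm")]
    summary = {"succeeded": 0, "failed": 0}
    with conn:  # write all status rows in one transaction
        for doc_id in doc_ids:
            ok = exists_in_index(doc_id)
            conn.execute(
                "INSERT OR REPLACE INTO index_cm_status (doc_id, indexed) "
                "VALUES (?, ?)",
                (doc_id, 1 if ok else 0),
            )
            summary["succeeded" if ok else "failed"] += 1
    return summary
```

Rows written with indexed = 0 are the ones the next night's synchronization picks up for re-indexing.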

The above is the general process of basic data preparation and indexing. Each step can be broken down further, and different strategies can be customized to ensure that data is indexed successfully, for example re-crawling the data recorded in the status table after the check.

This approach is not specific to Autonomy; it still applies if the search engine is switched, for example to ES (Elasticsearch) or another Lucene-based search framework.

The disadvantage is the lack of real-time updates. The strategy can be changed on the basis of this idea to make indexing real-time; I have not yet worked out how to do this and hope to follow up on it.
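One way to keep the pipeline independent of the engine is to hide it behind a small interface. The sketch below is illustrative only: SearchBackend, InMemoryBackend, and apply_index_rows are hypothetical names, and no real Autonomy or Elasticsearch client calls are shown.

```python
from typing import Iterable, Protocol, Tuple


class SearchBackend(Protocol):
    """The only operations the pipeline needs from a search engine."""

    def index(self, doc_id: int, fields: dict) -> None: ...
    def delete(self, doc_id: int) -> None: ...
    def exists(self, doc_id: int) -> bool: ...


class InMemoryBackend:
    """Stand-in engine for the example; a real adapter would wrap an engine client."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def index(self, doc_id: int, fields: dict) -> None:
        self._docs[doc_id] = fields

    def delete(self, doc_id: int) -> None:
        self._docs.pop(doc_id, None)

    def exists(self, doc_id: int) -> bool:
        return doc_id in self._docs


def apply_index_rows(rows: Iterable[Tuple[int, str, dict]],
                     backend: SearchBackend) -> None:
    """Push index-table rows of the form (doc_id, op, fields) to any backend."""
    for doc_id, op, fields in rows:
        if op == "delete":
            backend.delete(doc_id)
        else:  # 'add' and 'modify' both (re)index the document
            backend.index(doc_id, fields)
```

With this shape, switching engines means writing one new adapter class, while the delta, index, and status tables stay unchanged.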


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
