Hubble.net search engine Analysis 2

Last Update: 2018-12-05 Source: Internet

Author: User


1. Hubble.Net Operating Mechanism

As described in chapter 1, the Hubble.Net system stores only the index information, while the indexed documents themselves are stored in the database. Hubble.Net uses an inverted index to associate index words with document information, so index words are central to the system: index information is created for each index word, and queries look up that information by index word. This section focuses on the .hdx and .idx files, how they are used in the system, and how the system as a whole operates.

1. Indexes are created per indexed column. When a Table is created, the system creates index files for the indexed columns the user specifies. Suppose a Table contains three columns: {title (index type Tokenized), url, content (index type Tokenized)}. The system automatically adds two columns, so the final table is {id, title, url, content, Score}, where id is the document ID, assigned when the record is inserted into the database and used to identify the document throughout the system, and Score is the document score. The indexed columns {title, content} each get their own inverted index object, distinguished by column name. Each inverted index object also creates its own index files, whose names consist of a seven-digit number followed by the column name ({d: 7} + fieldname).

2. Each inverted index file contains its own list of index words. When a new document (record) is indexed, its column value is first segmented into words, and each resulting word is checked against the index word list. If the word is already in the list, its index information is updated; if not, the word is added to the list. In this way each inverted index maintains its own index word list for use by queries.

3. When the system runs a search, the query has the form fieldname command Words, with the parts separated by spaces. Fieldname is the column name and selects which column's index to query. Command is the query method; the system currently provides three: FullTextQuery, MatchQuery, and MutiStringQuery. Words are the query words specified by the user, and several words can be given. Using fieldname, the system obtains the corresponding inverted index object and looks up the Words in that object's index word list. The results are combined according to the command type and returned to the user.

4. At startup, the system first initializes from the configuration file: the index directory, IQuerys, IAnalyzers, IDBAdapters, and the data tables (Tables; each table corresponds to one DBProvider object). It then initializes the list of inverted index objects from the table configuration. For each inverted index file (IndexFile), the .hdx files are loaded: from them, the index file name, position, length, and other information for each index word are read into memory, which completes initialization. When a document needs to be indexed, the .hdx and .idx files are updated. When a query runs, the system looks up the index word in the in-memory word list to obtain its index file information, then uses the position and length to read the corresponding document IDs, Count (the number of times the word appears in the document), Data (the positions at which the word appears in the document), and Rank (the weight value). Finally, the results are combined according to the Command and returned to the user.

5. Merging (optimizing) indexes. There are four merge types (OptimizationOption): Idle, Minimum, Middle, and Speedy. Merging only combines index files that belong to the same column. An Optimize directory is created under the current index directory, and the merge is performed there. The list of index files to merge is determined by the merge type. The index information for each word in those files is then read and written to a new index file (.idx), and the new location of each word's index information is recorded (.hdx). Finally, the in-memory location information is updated to point at the merged index, the pre-merge index files are deleted, and the merged index files are copied into the current index directory.
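The schema expansion and file naming in step 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hubble.Net's code: the names Column, expand_schema, indexed_columns, and index_file_name are hypothetical, and attaching the .hdx/.idx extension to the seven-digit-plus-column-name stem, as well as the serial number used, are assumptions.

```python
# Hypothetical sketch of step 1: expanding a user-defined table schema and
# naming per-column index files. None of these names come from Hubble.Net.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str
    index_type: Optional[str] = None  # "Tokenized", or None if not indexed


def expand_schema(user_columns: List[Column]) -> List[Column]:
    """Prepend the document ID column and append the Score column."""
    return [Column("id")] + list(user_columns) + [Column("Score")]


def indexed_columns(columns: List[Column]) -> List[str]:
    """Columns that each get their own inverted index object."""
    return [c.name for c in columns if c.index_type == "Tokenized"]


def index_file_name(serial: int, field_name: str, ext: str) -> str:
    """Seven-digit zero-padded number + column name (extension is assumed)."""
    return f"{serial:07d}{field_name}.{ext}"


schema = expand_schema([
    Column("title", "Tokenized"),
    Column("url"),
    Column("content", "Tokenized"),
])
print([c.name for c in schema])            # ['id', 'title', 'url', 'content', 'Score']
print(indexed_columns(schema))             # ['title', 'content']
print(index_file_name(1, "title", "hdx"))  # 0000001title.hdx
```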
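Step 2's update-or-add logic for the index word list is the classic inverted index insert. The sketch below is a minimal in-memory version, assuming a plain lowercase whitespace split in place of the configured IAnalyzers and storing only word positions per document; the InvertedIndex class and its methods are hypothetical.

```python
# Minimal in-memory sketch of step 2: maintaining an index word list.
# The whitespace tokenizer is an assumption standing in for an IAnalyzer.
from typing import Dict, List


class InvertedIndex:
    def __init__(self) -> None:
        # index word -> {doc_id: [positions of the word in that document]}
        self.words: Dict[str, Dict[int, List[int]]] = {}

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return text.lower().split()

    def add_document(self, doc_id: int, text: str) -> None:
        for pos, word in enumerate(self.tokenize(text)):
            postings = self.words.get(word)
            if postings is None:
                # Word not in the list yet: add it to the index word list.
                postings = self.words[word] = {}
            # Word present (or just added): update its index information.
            postings.setdefault(doc_id, []).append(pos)

    def lookup(self, word: str) -> Dict[int, List[int]]:
        return self.words.get(word.lower(), {})


idx = InvertedIndex()
idx.add_document(1, "fast search engine")
idx.add_document(2, "search index")
print(idx.lookup("search"))  # {1: [1], 2: [0]}
```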
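The query flow in step 3 (parse the query, pick the column's inverted index, look up each word, combine the results) can be sketched as follows. The source does not say how FullTextQuery, MatchQuery, and MutiStringQuery each combine their per-word results, so the sketch uses two hypothetical commands, all (intersection) and any (union), purely to show the dispatch; run_query and the command names are illustrative assumptions.

```python
# Hypothetical sketch of step 3: "fieldname command word1 word2 ..." queries.
from typing import Callable, Dict, List, Set

# Per-column inverted indexes: fieldname -> index word -> set of document IDs.
Indexes = Dict[str, Dict[str, Set[int]]]


def match_all(hits: List[Set[int]]) -> Set[int]:
    """Documents containing every query word (assumed combination rule)."""
    return set.intersection(*hits) if hits else set()


def match_any(hits: List[Set[int]]) -> Set[int]:
    """Documents containing at least one query word (assumed rule)."""
    return set().union(*hits)


# Hypothetical command names; not Hubble.Net's real query classes.
COMMANDS: Dict[str, Callable[[List[Set[int]]], Set[int]]] = {
    "all": match_all,
    "any": match_any,
}


def run_query(indexes: Indexes, query: str) -> Set[int]:
    field, command, *words = query.split()  # parts are space-separated
    index = indexes[field]                  # the column's inverted index
    hits = [index.get(w, set()) for w in words]
    return COMMANDS[command](hits)


indexes = {"title": {"search": {1, 2}, "engine": {1}}}
print(run_query(indexes, "title all search engine"))  # {1}
print(run_query(indexes, "title any search engine"))  # {1, 2}
```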
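Step 4 separates loading word locations from the .hdx file at startup from reading postings out of the .idx file by position and length at query time. The sketch below illustrates that split with an invented binary layout, documented in the code comments. The source does not describe Hubble.Net's real file formats, so the layout, the helper names write_index, load_hdx, and read_postings, and the simplification of one .idx file per column are all assumptions; io.BytesIO buffers stand in for the files.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Tuple

# Invented layouts (assumptions, not Hubble.Net's format), little-endian:
#   .hdx entry : uint16 word_len | word (UTF-8) | uint64 offset | uint32 length
#   .idx record: int32 doc_id | int32 count | int32 rank | count * int32 position


def write_index(postings: Dict[str, List[Tuple[int, List[int], int]]],
                idx: BinaryIO, hdx: BinaryIO) -> None:
    """Write each word's (doc_id, positions, rank) records and its location."""
    for word, docs in postings.items():
        start = idx.tell()
        for doc_id, positions, rank in docs:
            idx.write(struct.pack("<iii", doc_id, len(positions), rank))
            idx.write(struct.pack(f"<{len(positions)}i", *positions))
        raw = word.encode("utf-8")
        hdx.write(struct.pack("<H", len(raw)) + raw)
        hdx.write(struct.pack("<QI", start, idx.tell() - start))


def load_hdx(hdx: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Initialization: load index word -> (offset, length) into memory."""
    table = {}
    while header := hdx.read(2):
        (n,) = struct.unpack("<H", header)
        word = hdx.read(n).decode("utf-8")
        table[word] = struct.unpack("<QI", hdx.read(12))
    return table


def read_postings(idx: BinaryIO, offset: int,
                  length: int) -> List[Tuple[int, int, List[int], int]]:
    """Query time: read (doc_id, count, positions, rank) for one word."""
    idx.seek(offset)
    buf = io.BytesIO(idx.read(length))
    result = []
    while header := buf.read(12):
        doc_id, count, rank = struct.unpack("<iii", header)
        positions = list(struct.unpack(f"<{count}i", buf.read(4 * count)))
        result.append((doc_id, count, positions, rank))
    return result


idx, hdx = io.BytesIO(), io.BytesIO()
write_index({"search": [(1, [1], 5), (2, [0, 4], 3)],
             "engine": [(1, [2], 4)]}, idx, hdx)
hdx.seek(0)
table = load_hdx(hdx)
print(table)  # {'search': (0, 36), 'engine': (36, 16)}
print(read_postings(idx, *table["search"]))  # [(1, 1, [1], 5), (2, 2, [0, 4], 3)]
```

Keeping only the small word-to-location table in memory and seeking into the .idx file on demand is what lets the data itself stay on disk.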
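The core of the merge in step 5 is reading each word's postings from the old index files, writing them contiguously into a new file, and recording each word's new location. The sketch below models that with in-memory lists in place of the .idx and .hdx files. merge_segments is hypothetical, the rules by which each OptimizationOption selects files are not modeled, and letting a later segment's entry win for a repeated document ID is an assumption.

```python
# Hypothetical sketch of step 5: merging one column's index segments.
from typing import Dict, List, Tuple

Posting = Tuple[int, List[int]]     # (doc_id, positions)
Segment = Dict[str, List[Posting]]  # index word -> postings in one old file


def merge_segments(
    segments: List[Segment],
) -> Tuple[List[Posting], Dict[str, Tuple[int, int]]]:
    """Return merged postings (the new .idx stand-in) and
    word -> (start, length) locations (the new .hdx stand-in)."""
    merged: List[Posting] = []
    locations: Dict[str, Tuple[int, int]] = {}
    for word in sorted({w for seg in segments for w in seg}):
        combined: Dict[int, List[int]] = {}
        for seg in segments:
            for doc_id, positions in seg.get(word, []):
                # Assumption: a later segment overrides an earlier one.
                combined[doc_id] = positions
        start = len(merged)
        merged.extend(sorted(combined.items()))  # write contiguously by doc ID
        locations[word] = (start, len(merged) - start)
    return merged, locations


old1 = {"search": [(1, [1])], "engine": [(1, [2])]}
old2 = {"search": [(2, [0])]}
merged, locations = merge_segments([old1, old2])
print(locations)  # {'engine': (0, 1), 'search': (1, 2)}
print(merged)     # [(1, [2]), (1, [1]), (2, [0])]
```

Updating the in-memory location table, deleting the pre-merge files, and copying the result out of the Optimize directory would follow this step; they are left out of the sketch.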

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
