Hubble.net Search Engine Analysis 2

Source: Internet
Author: User

1. Hubble.net Operating Mechanism

As described in chapter 1, Hubble.net stores only the index information itself, while the indexed documents are stored in the database. Hubble.net uses an inverted index to associate each word with the documents that contain it, so index words are the key to the system: index information is created per index word and is queried by index word. This part focuses on the .hdx and .idx files, how they are represented in the system, and how the system as a whole operates.

1. Indexes are created per indexed data column. When a Table is created, the system creates index files for the index columns specified by the user. Suppose a Table contains three columns: {title (index type Tokenized), url, content (index type Tokenized)}. The system automatically adds two columns to the table, so the final column set is {id (the document ID, assigned when the record is inserted into the database), title, url, content, Score (the document score)}. Of these, the columns {title, content} each need their own inverted index object, and the objects are distinguished by column name. Each inverted index object also creates its own index files. A file name consists of seven digits followed by the column name ({d:7} + fieldname).
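As a rough illustration of the column handling and file naming described above, the following Python sketch derives the final column list and the index file names for the example table. The Column class, the function names, the serial number, and the zero padding are assumptions made for illustration; only the column names and the {d:7} + fieldname pattern come from the text.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    tokenized: bool = False  # True when the index type is Tokenized


def final_columns(user_columns):
    """Return the column names after the system adds id and Score."""
    return ["id"] + [c.name for c in user_columns] + ["Score"]


def index_file_name(serial, field_name, extension):
    """Seven digits followed by the column name.

    Zero padding and the way the extension is appended are assumptions.
    """
    return f"{serial:07d}{field_name}{extension}"


table = [
    Column("title", tokenized=True),
    Column("url"),
    Column("content", tokenized=True),
]

# Only Tokenized columns get their own inverted index object.
indexed_fields = [c.name for c in table if c.tokenized]

print(final_columns(table))                  # ['id', 'title', 'url', 'content', 'Score']
print(indexed_fields)                        # ['title', 'content']
print(index_file_name(1, "title", ".hdx"))   # 0000001title.hdx
```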

2. Each inverted index file contains its own list of index words. When a new document (record) is indexed, the column value is first segmented into words, and each resulting word is checked against the index word list. If the word is already in the list, its index information is updated; if not, the word is added to the list. In this way each inverted index maintains its own index word list for use by queries.
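The update rule above can be sketched in a few lines of Python. This is a minimal in-memory model, not Hubble.net's actual implementation: the InvertedIndex class is hypothetical, and whitespace splitting stands in for the real word segmentation (analyzer).

```python
from collections import defaultdict


class InvertedIndex:
    """Toy per-column inverted index: word -> {doc_id: [positions]}."""

    def __init__(self, field_name):
        self.field_name = field_name
        # The index word list. A missing word is added on first use.
        self.words = defaultdict(dict)

    def add_document(self, doc_id, text):
        # Assumption: whitespace splitting replaces the real analyzer.
        for position, word in enumerate(text.lower().split()):
            # Existing word: its entry is updated. New word: entry is created.
            self.words[word].setdefault(doc_id, []).append(position)


index = InvertedIndex("title")
index.add_document(1, "Hubble search engine")
index.add_document(2, "search engine analysis")

print(sorted(index.words))     # ['analysis', 'engine', 'hubble', 'search']
print(index.words["search"])   # {1: [1], 2: [0]}
```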

3. When the system searches, the query command format is fieldname command words, with the parts separated by spaces. Fieldname is the column name and selects the column index to query. Command is the query method; the system currently has three: FullTextQuery, MatchQuery, and MutiStringQuery. Words are the query words specified by the user, and more than one word can be given. Based on fieldname, the system obtains the corresponding inverted index object and looks up the words in that object's index word list. The results are combined according to the command type and returned to the user.
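A small Python sketch of the parsing and lookup steps follows. The Query class, parse_query, and lookup are hypothetical names, and the indexes are plain dictionaries of word to {doc_id: count}. The text does not say how each command combines its results, so that step is deliberately left out.

```python
from dataclasses import dataclass

# Command names as given in the text.
KNOWN_COMMANDS = {"FullTextQuery", "MatchQuery", "MutiStringQuery"}


@dataclass
class Query:
    field_name: str
    command: str
    words: list


def parse_query(text):
    """Split 'fieldname command word1 word2 ...' on spaces."""
    parts = text.split()
    if len(parts) < 3:
        raise ValueError("expected: fieldname command words")
    field_name, command, *words = parts
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return Query(field_name, command, words)


def lookup(indexes, query):
    """Fetch the postings of each query word from the queried column's index."""
    index = indexes[query.field_name]
    return {word: index.get(word, {}) for word in query.words}


# Hypothetical data: one index per indexed column, word -> {doc_id: count}.
indexes = {"title": {"search": {1: 1, 2: 1}, "hubble": {1: 1}}}

query = parse_query("title FullTextQuery hubble search")
print(lookup(indexes, query))   # {'hubble': {1: 1}, 'search': {1: 1, 2: 1}}
```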

4. At startup, the system first initializes itself from the configuration file: the index directory, IQuerys, IAnalyzers, IDBAdapters, and the data tables (Tables; each table corresponds to one DBProvider object). It then initializes the list of inverted index objects from the table configuration. Each inverted index file object (IndexFile) loads its .hdx files and, from them, loads into memory the index file name, position, length, and other information for each index word. At that point initialization is complete. When a document is indexed, the .hdx and .idx files are updated. When a query runs, the system consults the index word list, obtains the index file information, and uses the position and length to read the corresponding document ID, Count (the number of times the word appears in the document), DATA (the positions where the word appears in the document), and Rank (the weight). The results are combined according to the Command and returned to the user.
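The split between an in-memory head (.hdx) and on-disk postings (.idx) can be sketched as below. The record layout is an assumption, not the real Hubble.net file format: each posting is packed as three 32-bit integers (document ID, Count, Rank), DATA (word positions) is omitted for brevity, and an io.BytesIO stream stands in for the .idx file.

```python
import io
import struct

# Assumed record layout: doc_id, count, rank as little-endian 32-bit ints.
RECORD = struct.Struct("<iii")


def write_postings(idx_file, head, word, postings):
    """Append postings to the .idx stream and record (position, length)."""
    position = idx_file.seek(0, io.SEEK_END)
    data = b"".join(RECORD.pack(*p) for p in postings)
    idx_file.write(data)
    head[word] = (position, len(data))  # what the .hdx file would hold


def read_postings(idx_file, head, word):
    """Use the in-memory head entry to read only this word's bytes."""
    if word not in head:
        return []
    position, length = head[word]
    idx_file.seek(position)
    data = idx_file.read(length)
    return [RECORD.unpack_from(data, off) for off in range(0, length, RECORD.size)]


idx = io.BytesIO()
head = {}
write_postings(idx, head, "search", [(1, 2, 10), (2, 1, 5)])
write_postings(idx, head, "hubble", [(1, 1, 7)])

print(head)                                # {'search': (0, 24), 'hubble': (24, 12)}
print(read_postings(idx, head, "hubble"))  # [(1, 1, 7)]
```

Keeping only the word, position, and length in memory means a query touches the larger .idx data only for the words it actually asks about.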

5. Merging (optimizing) indexes. There are four merge types (OptimizationOption): Idle, Minimum, Middle, and Speedy. A merge only combines the index files that belong to the same column index. The system creates an Optimize directory under the current index directory and carries out the merge there. It first obtains the list of index files to merge based on the merge type. It then reads the index information of each word from the files in that list, writes it to the new index file (.idx), and records the new index position information (.hdx). Finally, it updates the merged index position information in memory, deletes the pre-merge index files, and copies the merged index file to the current index directory.
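The core of the merge step, reading each word's postings from several old files and writing them contiguously while recording new positions, can be sketched as follows. This is an in-memory model under stated assumptions: each old index file is a dictionary of word to postings, the merged .idx is a flat list, and the function name is hypothetical. Directory handling and the four OptimizationOption types are not modeled.

```python
def merge_index_files(files):
    """Merge several {word: [postings]} mappings into one.

    Returns (merged_idx, merged_hdx): merged_idx is a flat list of
    postings, and merged_hdx maps each word to its (position, length)
    inside that list.
    """
    merged_idx = []
    merged_hdx = {}
    words = sorted({word for f in files for word in f})
    for word in words:
        position = len(merged_idx)
        # Gather this word's postings from every old file, in file order.
        for f in files:
            merged_idx.extend(f.get(word, []))
        merged_hdx[word] = (position, len(merged_idx) - position)
    return merged_idx, merged_hdx


# Hypothetical postings as (doc_id, count) pairs.
old_files = [
    {"search": [(1, 2)], "hubble": [(1, 1)]},
    {"search": [(2, 1)], "engine": [(2, 1)]},
]

idx, hdx = merge_index_files(old_files)
print(idx)   # [(2, 1), (1, 1), (1, 2), (2, 1)]
print(hdx)   # {'engine': (0, 1), 'hubble': (1, 1), 'search': (2, 2)}
```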
