HubbleDotNet makes it easy to create full-text indexes for existing tables or views in a relational database; the manual work takes no more than 5 minutes. I will explain how to create full-text indexes for existing tables over several articles. This article describes how to create a full-text index in append only mode.
Before creating a full-text index for an existing table or view, we first need to create a database in HubbleDotNet. For how to create a database, see "HubbleDotNet open-source full-text search database project: create and delete a database".
After creating the HubbleDotNet database, you can create a full-text index for existing tables or views in the relational database.
The following uses the news library as an example to create a full-text index.
Open the query analyzer, right-click the News database node, and select Create Table, as shown in the figure.
Create a full-text index for Chinese news
Configure basic information for the HubbleDotNet table
As shown in the figure, I will first demonstrate how to create a full-text index for Chinese news. Following the prompts on the page, enter the HubbleDotNet table name (cnews here), enter the directory where the full-text index will be stored, and select the database adapter. Because my relational database is SQL Server 2005, I select SQL Server 2005 as the adapter. This adapter applies to SQL Server 2005 and later versions.
Then configure the connection string of the relational database. Click the button below it to test whether the connection string is correct, then click Next to go to the next step.
Select Index Mode
As shown in the figure, in this step we select the index mode.
Because the full-text index is built from an existing table, select "Build index from exist table".
In the text box below it, enter the name of the actual table or view in the relational database. Enter news here.
There are two options for the incremental mode.
Append only mode. This mode suits data that only grows and is not modified. It can be used as long as the full-text indexed fields are never modified. It uses less memory than append, delete, and update mode, and it is faster. To use this mode, the corresponding table or view in the relational database must have a docid field with a unique index (preferably a clustered index). The field should be auto-incrementing, or you must at least make sure that values inserted later are larger than those inserted earlier.
Append, delete, and update mode. This mode supports adding, deleting, and modifying data. It uses more memory than append only mode (4 more bytes per record). In this mode, the corresponding table or view in the relational database must not have a field named docid, but it must have an int type ID field, whose name can be anything except "docid". How to create an index for a table whose primary key is not an int type will be explained later.
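The docid requirement of append only mode can be checked before indexing. The following is a minimal Python sketch and not part of HubbleDotNet: the function names `docids_fit_append_only` and `extra_id_bytes` are hypothetical, and the docid values are assumed to have been read from the table in insertion order. The second function only restates the "4 bytes more per record" figure quoted above as arithmetic.

```python
from typing import Iterable


def docids_fit_append_only(docids: Iterable[int]) -> bool:
    """Return True if every docid is larger than the one inserted before it.

    Strictly increasing values are also unique, so this covers both
    conditions described for append only mode.
    """
    previous = None
    for docid in docids:
        if previous is not None and docid <= previous:
            return False
        previous = docid
    return True


def extra_id_bytes(record_count: int, bytes_per_record: int = 4) -> int:
    """Extra memory used by append, delete, and update mode, based on the
    4 bytes per record figure quoted in the text."""
    return record_count * bytes_per_record


print(docids_fit_append_only([1, 2, 5, 9]))   # increasing, so usable
print(docids_fit_append_only([1, 3, 3, 7]))   # repeated value, not usable
print(extra_id_bytes(1_000_000))              # 4000000 bytes
```

If the check fails, append, delete, and update mode is the option to consider instead.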
The rest of this article uses append only mode, as shown in the figure. This is the structure of the corresponding table in the relational database:
After configuring this step, click Next to go to the field setting step.
Note that in version 8.3.0 and earlier, a "TCP closed" error may occur if the table contains certain special data types. This is a bug; please upgrade to version 8.3.0.1 or later. For how to upgrade, see "HubbleDotNet open-source full-text search database project: how to upgrade". After the upgrade, a correct prompt is displayed. I will cover the handling of special types in a later article.
Configure index fields
As shown in the figure, HubbleDotNet automatically lists all the fields.
Here we set the title and content fields as full-text index fields, that is, tokenized fields. Because this is Chinese news, we choose PanGuSegment as the word segmenter for both fields.
For the time field, we select a single-value, untokenized index.
The URL field does not need to be indexed, so select None.
For the data types of HubbleDotNet, see "HubbleDotNet open-source full-text search database project: data types and index types of tables".
In the figure, the checkbox to the left of each field is used only for deleting fields: select the fields and click Delete to remove them. If you are not deleting any field, the checkbox has no effect.
After completing this step, click Next to enter the last step.
Complete Index
This step lists the creation statements. You can perform the final check. If you are sure there is no problem, click Finish.
A prompt is then displayed.
If you plan to start indexing immediately, select Yes.
The rebuild table dialog is displayed.
Click Rebuild to create the full-text index.
After the full-text index is created, we can optimize it, as shown in the figure.
After optimization, you can search. (You can also search without optimizing, but performance will be slower.)
Next, let's see how to search.
Search Chinese news
Example 1
Search for all records whose title matches the two keywords "Beijing" and "university", and sort them in descending order by score.
The parameters that follow each word component have the following meanings:
The first parameter is the weight of the word component, which is 5000 here.
The second parameter is the position of the word component in the input search sentence. For example, "Beijing" starts at position 0 and "university" starts at position 2.
Top 10: output the first 10 matching records.
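The two parameters can be illustrated with a small Python sketch. It assumes that each word component is written as word^weight^position. The query text of this example is not reproduced in this article, so that notation and the helper name `build_word_components` are illustrative assumptions, not confirmed HubbleDotNet syntax. The words are supplied already segmented; no real word segmenter is involved.

```python
from typing import List


def build_word_components(sentence: str, words: List[str], weight: int = 5000) -> str:
    """Join words as word^weight^position (an assumed notation), where
    position is the character offset of the word in the input sentence."""
    parts = []
    search_from = 0
    for word in words:
        position = sentence.find(word, search_from)
        if position < 0:
            raise ValueError(f"{word!r} does not occur in {sentence!r}")
        parts.append(f"{word}^{weight}^{position}")
        # Continue after this word so repeated words get distinct positions.
        search_from = position + len(word)
    return " ".join(parts)


# "北京" (Beijing) starts at offset 0 and "大学" (university) at offset 2.
print(build_word_components("北京大学", ["北京", "大学"]))
```

The offsets produced for this sentence, 0 and 2, agree with the positions described above.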
Example 2
Search for all records whose title or content matches the keywords "Beijing" and "university", and sort them in descending order by score.
Here the title field is followed by the parameter 2, which gives the title field a weight of 2. This is how a field weight is set.
Between 0 to 9 outputs records 0 through 9. This can be used for paging.
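For paging, the two numbers can be computed from a page index and a page size. This is a minimal Python sketch under the assumption that both bounds are inclusive and zero-based, as the "records from 0 to 9" description suggests; `page_range` is a hypothetical helper name, not a HubbleDotNet function.

```python
from typing import Tuple


def page_range(page_index: int, page_size: int) -> Tuple[int, int]:
    """Return the inclusive (first, last) record positions of a zero-based page."""
    if page_index < 0 or page_size <= 0:
        raise ValueError("page_index must be >= 0 and page_size must be > 0")
    first = page_index * page_size
    last = first + page_size - 1
    return first, last


first, last = page_range(0, 10)
print(f"between {first} to {last}")   # between 0 to 9
first, last = page_range(2, 10)
print(f"between {first} to {last}")   # between 20 to 29
```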
Example 3
Search for all records whose title contains both keywords "Beijing" and "university", and sort them in descending order by score.
The contains search can be used for exact matching. Here contains returns far fewer records than match, because only records that contain both words "Beijing" and "university" are output.
Example 4
Search for all records whose title matches the keywords "Beijing" and "university" and whose time is later than January 1, 2007 and earlier than August 16, 2007, sorted in reverse chronological order.
Example 5
Search for all records whose title matches the two keywords "Beijing" and "university" and whose time is later than January 1, 2007 and earlier than August 16, 2007, sorted in descending order by time and then by score.
That is, records are sorted by time first, and among records with the same time, those with higher scores are ranked first.
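The two-level ordering can be reproduced outside the search engine with a short Python sketch. The records below are made-up illustration data, not results from the news table, and the tuple layout (title, time, score) is an assumption.

```python
from datetime import date

# Hypothetical search results: (title, time, score). All values are made up.
records = [
    ("a", date(2007, 3, 1), 120),
    ("b", date(2007, 8, 1), 80),
    ("c", date(2007, 3, 1), 300),
    ("d", date(2007, 8, 1), 95),
]

# Newest time first; for records with the same time, higher score first.
ordered = sorted(records, key=lambda r: (r[1], r[2]), reverse=True)

for title, time, score in ordered:
    print(title, time.isoformat(), score)
```

Sorting on the pair (time, score) in reverse makes time the primary key and score the tie-breaker, which is the behavior described above.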