Hubbledotnet open-source full-text search database project-create full-text index for existing database tables (2) updatable Mode

Source: Internet
Author: User
ArticleDirectory
    • Complete Index

Hubbledotnet allows you to easily create full-text indexes for existing tables or views in the database. manual intervention takes no more than 5 minutes. I will explain how to create a full text for an existing data table in several sections

Index. This article describes how to create a full-text index in the updatable mode.

Updatable and append only share the same thing in passive mode for index building. The difference is that the updatable mode can be used to add, modify, and delete full-text indexes created for existing databases.

The append only mode can only be used to add or delete (the append only mode cannot be used to delete automatically.ProgramAnd cannot be modified. Updatable mode index synchronization is much more complex than append only,

For more information, see automatic synchronization with existing tables. If your table is only added or deleted without modification, we recommend that you use the append only mode. Otherwise, use the updatable mode.

Before creating a full-text index for an existing table, we still need to create a database in hubbledotnet,

For details about how to create a database, see hubbledotnet open-source full-text search for database projects-create and delete databases.

After creating the hubbledotnet database, you can create a full-text index for existing tables or views in the relational database.

The following uses the englishnews table in the test database as an example to explain how to create a full-text index for an existing table.

 

This is the englishnews table structure in the test database of SQL Server.

Create an index table in hubbledotnet to open the query analyzer. Right-click the News Database and select create table, as shown in.

 

 

 

As shown in, after you click Create Table, this interface will appear. Follow the prompts on the interface to enter the name of the hubbledotnet table. Enter englishnews here. This name does not have to be the same as the database name.

Enter the full-text index directory and select the database Adapter. Here, because my corresponding relational database is sqlserver 2005, select sqlserver 2005. This adapter applies

SQL Server 2005 and later versions. Then configure the connection string of the relational database. This connection string connects the user to the corresponding relational database. If you need to connect to the remote database,

You only need to specify data source = remote IP address here. Click the Test Db connection string button below to test the connection string. Click Next to go to the next step.

Select Index Mode

 

 

 

As shown in, because the index mode is passive, we select build index from exist table

Exist table name or view name: Enter the table name of the corresponding table in the database, and enter englishnews here. Because hubbledotnet can not only index existing tables,

You can also index the view. If the view is not indexed, you only need to enter the corresponding view name in the database.

Because this table needs to be added, deleted, and modified, the incremental mode is set to updatable.

Click Next to go to the next step.

 

Configure index fields

 

 

 

As shown in, hubbledotnet automatically reads the table structure from the database to help you generate the index table structure.

Updatable index. An existing database table must have an int or bigint type Id field. The field name can be any name except docid.

This field must be set to the untokenized index type. This field must be self-incrementing.

If the primary key of the table to be indexed is not of the Int or bigint type, for example, if the vchar type field is used as the primary key, we must insert an auto-increment field of the Int or bigint type in the indexed table,

Create a unique index for this field, and then point the ID field to this field.

After setting all fields, click Next.

 

Complete Index

 

This step lists the creation statements. You can perform the final check. If you are sure there is no problem, click Finish. The table creation statement is as follows. You can create an index table without writing the table creation Statement on the table creation page.

 

 
[Indexonly]
 
[Docid ('Id')]
 
[Directory ('D: \ test11 \ englishnews \')]
 
[Dbtablename ('Engishnews')]
 
[Dbadapter ('Sqlserver200')]
 
[Dbconnect ('Data source = (local); initial catalog = test; Integrated Security = true')]
 
Create TableEnglishnews
 
(
 
Id bigint untokenizedNull Default0,
GroupidIntUntokenizedNull Default0,
 
SiteidIntUntokenizedNull Default0,
 
Title nvarchar (Max) Tokenized Analyzer'Engishanalyzer' Null Default '',
 
Content nvarchar (Max) Tokenized Analyzer'Engishanalyzer' Null Default '',
 
TimeDatetime untokenizedNull Default '2014-1-1',
URL nvarchar (Max)Null,
 
Imageurl nvarchar (Max)Null 
 
);

 

The prompt is displayed.

 

If you plan to start indexing immediately, select Yes

The rebuild table interface is displayed.

Click rebuild to create a full-text index for the current englishnews table.

Note the following:

In the updatable mode, only empty indexes can be rebuilt. That is to say, if there is only one index record in hubbledotnet, rebuild cannot continue,

In this case, you can only click rebuildwholetable to clear the full table index and then rebuild again. If you want to increment or modify an index based on the original index, you must use the automatic

To complete the synchronization. For details about how to use the automatic synchronization function, see automatic synchronization with existing tables.

After the full-text index is created, we can optimize it, as shown in

 

After optimization, you can search. (You can also search without optimization, and the performance will be slower)

Next let's see how to search

Search for news Example 1

Search for all records with any of the keywords ABC News to cut in the title and sort them by matching degree.

 
SQL statement:
Select Top10 ID, title, scoreFromEnglishnewsWhereTitle
 
Match 'Abc ^ 5000 ^ 0 News ^ 5000 ^ 3 to ^ 5000 ^ 7 Cut ^ 5000 ^ 9' Order ByScore DESC
 
Result:

 

The parameter meanings following the word component are as follows:

The first parameter indicates the weight of the word component, which is 5000.

The second parameter indicates the actual position of the word component in the input searched sentence. For example, the position of "ABC" is 0 and the start position of news is 3.

Top 10: output the first 10 matching records

Here we can see that the exact match record has the highest score.

 

Example 2

Search for records with all keywords in the ABC News to cut keyword in the title, and sort by matching degree

SQL statement:

 
Select Top10 ID, title, scoreFromEnglishnewsWhereTitleContains 'Abc ^ 5000 ^ 0 News ^ 5000 ^ 3 to ^ 5000 ^ 7 Cut ^ 5000 ^ 9' 
 
Order ByScore DESC

 

Result:

The contains search can be used for exact matching. Here we find that the data searched by contains is much less than the match. Because only

Only records containing the four words ABC News to cut are output.

 

Furthermore, because the word "to" is too common, we want to match records that contain both the three words "ABC News cut", and if the record contains ",

The score is higher than the score that does not contain. This search method has already solved Google or Baidu search methods. If a keyword in Google contains a deprecated word,

Non-deprecated words are matched in the same way as the deprecated words, and the deprecated words are matched in the same way. However, if the record contains the deprecated words to be matched, the score is higher than the non-deprecated words.

SQL statement:

 
Select Top10 ID, title, scoreFromEnglishnewsWhereTitleContains 'Abc ^ 5000 ^ 0 News ^ 5000 ^ 3 to ^ 5000 ^ 7 ^ 1 cut ^ 5000 ^ 9' 
 
Order ByScore DESC
 
Result:

Here we can see that there is an additional record. This record is in the third place. Compared with the first two records, this record does not belong to this word.

The parameter description following the word component:

To ^ 5000 ^ 7 ^ 1

The first two parameters have the same meaning as other words. One is the weight and the other is the position.

The third parameter is the flag field, and 1 indicates that it can be or.

Example 3

Search for records with any keyword in the ABC News to cut keyword in the title, sort by matching degree, and make statistics based on the groupid field,

10 groupid records at most

SQL statement:

 
 
 
[Groupby ('Count','*','Groupid', 10)]Select Top10 ID, title, scoreFromEnglishnewsWhereTitleMatch 'Abc ^ 5000 ^ 0 News ^ 5000 ^ 3 to ^ 5000 ^ 7 Cut ^ 5000 ^ 9' 
 
Order ByScore DESC
 
 

Result:

 

 

The last parameter 10 is an optional parameter, indicating that the top 10 group statistical results are returned. If this parameter is not specified, all group statistical results are returned.
The group by field must be an untokenized index field and cannot be a string.
After this statement is executed, two able statements are output. The first is the result set of the SELECT statement, and the second is the result set of the Group by statement. That is to say, grouping statistics are performed simultaneously during full-text search,

This makes it easier to use and faster.

Further

 
[Groupby ('Count','*','Groupid', 10)] [groupby ('Count','*','Siteid', 10)]Select Top10 ID, groupid, siteid, title, scoreFromEnglishnewsWhereTitleMatch 'Abc ^ 5000 ^ 0 News ^ 5000 ^ 3 to ^ 5000 ^ 7 Cut ^ 5000 ^ 9' 
 
Order ByScore DESC
 
 
 
 
This statement outputs the statistical results of both the groupid and siteid fields. Many e-commerce websites need to display the statistical results of multiple groups while searching, hubbledotnet
 
You can use the above syntax to output
 
. In addition, the groupby function of hubbledotnet performs statistics at the underlying layer, which is much simpler and faster than Lucene statistics that require filtering statistics through programs after the results are output.
 
 

The pseudo SQL statement in the groupby section is

 
Select Top10 groupid,Count(*)As Count FromEnglishnewsWhereTitleMatch 'Abc ^ 5000 ^ 0 News ^ 5000 ^ 3 to ^ 5000 ^ 7 Cut ^ 5000 ^ 9' 
Group ByGroupidOrder By CountDesc
 
 

Further

[Groupby ('Count','*','Groupid, siteid', 10)]Select Top10 ID, groupid, siteid, title, scoreFromEnglishnewsWhereTitleMatch 'Abc ^ 5000 ^ 0 News ^ 5000 ^ 3 to ^ 5000 ^ 7 Cut ^ 5000 ^ 9' 
 
Order ByScore DESC

The preceding statement can be used for grouping statistics on multiple fields. Note that when multiple fields are groupby, the total number of bytes occupied by multiple fields cannot exceed 8 bytes. And up to 2 int types,

Or eight tinyint types.

For more examples, see hubbledotnet open-source full-text search database project-create full-text indexes for existing tables or views of the database (I) append only mode

return hubble.net technical details

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.