HubbleDotNet open-source full-text search database project: automatic synchronization with existing tables

Document directory
  • Table structure to be indexed
  • Secondary table (TriggerTable)
  • Set TableInfo
  • Create an update trigger
  • Create a delete trigger
  • Incremental Synchronization
  • Execute synchronization operations
  • Synchronization Process
  • Impact of triggers on Performance
  • Instantiation of TableSynchronization
  • Trigger synchronization
  • Get synchronization progress
  • Stop Synchronization

Author: eaglet

Please credit the source when reprinting.

For most applications, full-text search is only one feature among many. Many systems are designed without full-text search in mind, and most of their search features are implemented with the database's LIKE statement. As data volume and user numbers grow, LIKE can no longer meet the performance and functional requirements of site search. HubbleDotNet offers these users a loosely coupled integration solution: in less than an hour, you can implement the back-end part of full-text search for an existing system without changing the existing table structure and without writing much code. Automatic synchronization with existing table data is an important part of this solution. This document describes how to configure a full-text index on an existing table and keep it synchronized with that table's data.

Applicable version: HubbleDotNet 0.9.6.0

Home: http://www.hubbledotnet.com/

Project site: http://hubbledotnet.codeplex.com/

Applicability: this article applies to both passive-mode index types, Append Only and Updatable.

Function introduction:

To index an existing data table, you can only create a passive-mode index.

For details, see "Create full-text indexes for existing tables or views of the database (1): Append Only mode" and "Create full-text indexes for existing tables of the database (2): Updatable mode".

Before version 0.9, passive-mode indexes (IndexOnly = true) could not be synchronized automatically. Because there was no trigger mechanism between the HubbleDotNet full-text index and the database, HubbleDotNet had no way of knowing about inserts, updates, and deletes in the database tables, so you had to synchronize with the database manually. For details, see "Synchronize data with existing tables or views through programs".

Starting with version 0.9.6.0, HubbleDotNet provides a mechanism for automatic synchronization with existing database tables. You only need to call a class provided by Hubble.SQLClient to make HubbleDotNet synchronize with the database, and you decide when to trigger synchronization and how often, based on your own situation. HubbleDotNet does not yet provide scheduled synchronization tasks, because the requirements are not yet clear: different projects may need different trigger times and synchronization intervals. For example, some users want to synchronize at night, others every five minutes. Because the requirements differ so much, the current version leaves it to the user to trigger synchronization. The operation itself is very simple and takes only three lines of code.

 

The following describes how to set automatic synchronization.

Append only mode

In Append only mode, automatic synchronization is easy: you only need to set TableSynchronization to true in Table Info.

Synchronization Process

 

When the user calls the Synchronize method of the TableSynchronization class provided by Hubble.SQLClient, the method first checks whether a synchronization is already in progress. If one is, it returns False; you can wait until that synchronization has finished and then call the method again. Once synchronization has been triggered, the HubbleDotNet server scans the database for new records that have not been indexed yet and indexes them. After indexing, the index is optimized according to the optimization option specified by the user.
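The flow described above can be modelled in a few lines. The following is only a toy sketch in Python, not HubbleDotNet code: the class ToySyncServer and its methods are hypothetical names invented for illustration, and the real server works asynchronously and also optimizes the index afterwards.

```python
class ToySyncServer:
    """Toy model of the synchronization flow (hypothetical, for illustration only)."""

    def __init__(self):
        self.in_progress = False
        self.indexed_ids = set()

    def synchronize(self):
        """Return False if a run is already active; otherwise start a run and return True."""
        if self.in_progress:
            return False
        self.in_progress = True
        return True

    def run_to_completion(self, table_ids):
        """Index every id that is not indexed yet, end the run, and return the new ids."""
        new_ids = sorted(set(table_ids) - self.indexed_ids)
        self.indexed_ids.update(new_ids)
        # A real server would optimize the index at this point.
        self.in_progress = False
        return new_ids


server = ToySyncServer()
first = server.synchronize()    # starts a run
second = server.synchronize()   # refused: a run is already in progress
added = server.run_to_completion([1, 2, 3])
print(first, second, added)
```

The second call is refused only while the first run is active; once the run has finished, a new call is accepted and picks up only records added since.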

The HubbleDotNet query analyzer (QueryAnalyzer) provides an interface for triggering data synchronization, and its source code also serves as sample code for users.

The following is a simple example. The table structure is as follows:

The synchronization settings are as follows:

Execute synchronization operations

Start Synchronization

Parameter description:

Step: the number of records read from the database in each batch during synchronization. If Step is 5000 and 12000 records need to be synchronized, the server indexes them in three batches: the first two contain 5000 records each and the last one contains 2000. After the batches have been indexed, all newly synchronized records are optimized. If you are not sure what this means, leave the value unchanged; 5000 is a value that worked well in my tests.
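The batch arithmetic in the Step description can be checked with a small calculation. This is only an illustrative sketch; the function name batch_sizes is made up here and is not part of HubbleDotNet.

```python
from typing import List


def batch_sizes(total_records: int, step: int) -> List[int]:
    """Split total_records into consecutive batches of at most `step` records."""
    if step <= 0:
        raise ValueError("step must be positive")
    full_batches, remainder = divmod(total_records, step)
    return [step] * full_batches + ([remainder] if remainder else [])


print(batch_sizes(12000, 5000))  # [5000, 5000, 2000]
```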

Optimize option: specifies the optimization policy. By default, the Minimum mode is used. If the index files are large, optimizing in Minimum mode takes a long time; in that case you can choose the compromise option, Middle.

Click Start to start synchronization. You can click Stop to stop it.

Synchronization completed

Updatable Mode

Updatable mode supports inserts, updates, and deletes on the original table. Because of the update and delete operations, this mode is more complicated, and we need to create a secondary table to implement it.

The following is an example:

Table structure to be indexed

 

Secondary table (TriggerTable)
create table HBTrigger_EnglishNews
(
    Serial bigint identity(1,1) not null primary key,
    Id int not null,
    Opr char(16),
    Fields nvarchar(4000) null
)

Go

Create index ITriggerOprSerial on HBTrigger_EnglishNews (Opr, Serial)

The secondary table must be created with the structure shown above; the table name can be chosen freely.

The Id field corresponds to the DocId field of the indexed table. If the DocId field of the indexed table is bigint, Id must also be declared as bigint.

The Opr field tells HubbleDotNet whether the change is an update or a delete.

The Fields field is used only for updates. It tells HubbleDotNet which fields were modified by the update operation.

If your database is not SQL Server, create the secondary trigger table for your database according to the same structure.

Note: the secondary trigger table must be in the same database as the main table. Because Opr and Serial are queried together, we recommend creating the composite index shown in the SQL statement above.

Set TableInfo

Set TableSynchronization to true as shown in the following figure and specify the table name of the secondary table.

 

Create an update trigger

If your existing table is subject to update operations, you must create an update trigger. When data is updated, the update trigger writes the Id of the updated row and the names of the changed fields to the secondary trigger table.

The update trigger must cover all fields of the Tokenized and Untokenized types. Sample code for the update trigger:

Create Trigger HBTrigger_EnglishNews_Update
On EnglishNews
for Update
As
    DECLARE @updateFields nvarchar(4000)
    set @updateFields = ''

    if Update(GroupId)
    begin
        set @updateFields = @updateFields + 'GroupId,'
    end

    if Update(SiteId)
    begin
        set @updateFields = @updateFields + 'SiteId,'
    end

    if Update(Time)
    begin
        set @updateFields = @updateFields + 'Time,'
    end

    if Update(Title)
    begin
        set @updateFields = @updateFields + 'Title,'
    end

    if Update(Content)
    begin
        set @updateFields = @updateFields + 'Content,'
    end

    if @updateFields <> ''
    begin
        insert into HBTrigger_EnglishNews select id, 'Update', @updateFields from Inserted
    end

 

Here, EnglishNews is the name of the main table, and HBTrigger_EnglishNews is the name of the secondary trigger table.

GroupId, SiteId, and so on are the Untokenized and Tokenized fields of the main table. The trigger records which of these fields changed when a row is updated.

Do not copy this code verbatim when implementing the trigger for your own table. Replace Time, Title, Content, and so on with the indexed fields of your table.
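The trigger writes the changed field names as a comma-terminated list such as 'Title,Content,'. How HubbleDotNet itself parses that column is not shown in this article; the following Python sketch only illustrates how such a value can be split into field names, and the function name parse_fields is hypothetical.

```python
from typing import List


def parse_fields(fields: str) -> List[str]:
    """Split a comma-terminated field list, ignoring empty entries."""
    return [name.strip() for name in fields.split(",") if name.strip()]


print(parse_fields("Title,Content,"))  # ['Title', 'Content']
```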

Similarly, if your database is not SQL Server, create the trigger using the trigger syntax of your database.

 

Create a delete trigger

If your existing table is subject to delete operations, you must create a delete trigger. When data is deleted, the delete trigger writes the Id of the deleted row to the secondary trigger table.

 

Create Trigger HBTrigger_EnglishNews_Delete
On EnglishNews
for Delete
As
    insert into HBTrigger_EnglishNews select id, 'Delete', '' from Deleted

Incremental Synchronization

For incremental (insert) data, Updatable mode synchronizes in the same way as Append only mode and does not rely on triggers. So if your table mostly receives inserts, you need not worry about triggers slowing down inserts, because no trigger fires on an insert.
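The behaviour of the secondary trigger table can be tried out without SQL Server. The sketch below uses Python's built-in sqlite3 module purely as a stand-in to illustrate the mechanism; it is not a statement about which databases HubbleDotNet supports. It is simplified on purpose: the main table has only two indexed columns, and because SQLite has no Update(column) function, the update trigger is declared with AFTER UPDATE OF for a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EnglishNews (Id INTEGER PRIMARY KEY, Title TEXT, Content TEXT);

-- Secondary trigger table, mirroring the structure used in this article
CREATE TABLE HBTrigger_EnglishNews (
    Serial INTEGER PRIMARY KEY AUTOINCREMENT,
    Id INTEGER NOT NULL,
    Opr TEXT,
    Fields TEXT
);

-- Fires only when Title appears in the UPDATE statement
CREATE TRIGGER EnglishNews_UpdateTitle AFTER UPDATE OF Title ON EnglishNews
BEGIN
    INSERT INTO HBTrigger_EnglishNews (Id, Opr, Fields)
    VALUES (NEW.Id, 'Update', 'Title,');
END;

CREATE TRIGGER EnglishNews_Delete AFTER DELETE ON EnglishNews
BEGIN
    INSERT INTO HBTrigger_EnglishNews (Id, Opr, Fields)
    VALUES (OLD.Id, 'Delete', '');
END;
""")

# An insert fires no trigger, so the secondary table stays empty.
conn.execute("INSERT INTO EnglishNews VALUES (1, 'old title', 'body')")
rows_after_insert = conn.execute(
    "SELECT COUNT(*) FROM HBTrigger_EnglishNews").fetchone()[0]

# An update and a delete each leave one row in the secondary table.
conn.execute("UPDATE EnglishNews SET Title = 'new title' WHERE Id = 1")
conn.execute("DELETE FROM EnglishNews WHERE Id = 1")
log = conn.execute(
    "SELECT Id, Opr, Fields FROM HBTrigger_EnglishNews ORDER BY Serial").fetchall()

print(rows_after_insert)  # 0
print(log)                # [(1, 'Update', 'Title,'), (1, 'Delete', '')]
```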

 

Execute synchronization operations

Synchronization is started in the same way as in Append only mode, with the same Step and Optimize option parameters described in that section.

Synchronization Process

 

Impact of triggers on Performance

The triggers affect only the performance of update and delete operations. Because a trigger records only the Id of the changed row and the names of the changed fields, not the changed content itself, any impact is limited. If even this impact is unacceptable for your application, the only alternative is the approach described in "Synchronize data with existing tables or views through programs".

 

Automatic and timed synchronization through background tasks

There are two ways to trigger synchronization. The first is to use a background task; see "Automatically synchronize or optimize indexes using background tasks". The second is to write a program that triggers the synchronization. The following describes how to do this.

 

Program call

See FormTableSynchronization.cs in QueryAnalyzer.

Add a reference to Hubble.SQLClient.

Instantiation of TableSynchronization
TableSynchronization _TableSync;

// Optimization policy applied after indexing
TableSynchronization.OptimizeOption option = TableSynchronization.OptimizeOption.Minimum;
// Number of records read per batch, here taken from a numeric input on the form
int step = (int)numericUpDownStep.Value;

HubbleConnection conn = new HubbleConnection(connectString);
conn.Open();

// Use the connection opened above
_TableSync = new TableSynchronization(conn, TableName, step, option);

Trigger synchronization:
_TableSync.Synchronize();

This method returns True if synchronization was triggered successfully.

 

Get synchronization progress:
double progress = _TableSync.GetProgress();  

This method returns the synchronization progress as a percentage. A value of 100 means that synchronization is complete.

Stop Synchronization
_TableSync.Stop();

If the current table is being synchronized, the synchronization operation is terminated.
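The three calls above are naturally combined into a trigger-then-poll loop. The following Python sketch shows that pattern only; it does not use Hubble.SQLClient. FakeTableSync is a hypothetical stand-in that mimics the three methods (its progress simply advances on every poll), and sync_and_wait, poll_seconds, and timeout_seconds are names assumed for illustration.

```python
import time


class FakeTableSync:
    """Hypothetical stand-in that mimics Synchronize, GetProgress and Stop."""

    def __init__(self, ticks_to_finish=4):
        self._ticks = ticks_to_finish
        self._done = 0
        self._running = False

    def synchronize(self):
        if self._running:
            return False          # a synchronization is already in progress
        self._running = True
        self._done = 0
        return True

    def get_progress(self):
        if self._running:
            self._done += 1       # pretend some work finished since the last poll
            if self._done >= self._ticks:
                self._running = False
        return 100.0 * self._done / self._ticks

    def stop(self):
        self._running = False


def sync_and_wait(sync, poll_seconds=0.0, timeout_seconds=5.0):
    """Trigger a synchronization and poll until progress reaches 100 or the timeout expires."""
    if not sync.synchronize():
        return False              # could not trigger: another run is active
    deadline = time.monotonic() + timeout_seconds
    while sync.get_progress() < 100:
        if time.monotonic() > deadline:
            sync.stop()           # give up and terminate the run
            return False
        time.sleep(poll_seconds)
    return True


print(sync_and_wait(FakeTableSync()))
```

In a real program the polling interval would be longer than zero, and the timeout would depend on how much data a run usually has to index.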

 
