Full-Text Indexing----Solr Server Incremental Index Update

Source: Internet
Author: User
Tags: solr

In the previous article we introduced full index updates in Solr. When the data volume is large, updating the index frequently consumes system performance, while updating it infrequently hurts short-term data accuracy, so the right update interval is hard to choose. Incremental indexing solves this problem: we update only the records that changed within a short period, avoiding large-scale data updates. Because the amount of changed data is small, we can set a shorter interval and greatly improve the user experience. This article describes incremental indexing.
1. Configuring the Data Source
1.1 Database

To facilitate comparison with the full index, we use the same database and table. The key to an incremental index is finding the modified data, so we need to add an identifier column: a timestamp field named updatetime. The table then has four fields: id, title, content, and updatetime, where updatetime has data type TIMESTAMP and default value CURRENT_TIMESTAMP. The structure is as follows:
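A minimal MySQL DDL sketch matching this description (the column sizes and the ON UPDATE clause are assumptions):

CREATE TABLE blog (
    id INT NOT NULL AUTO_INCREMENT,            -- primary key, used by deltaImportQuery
    title VARCHAR(255),                        -- blog title
    content TEXT,                              -- blog content
    updatetime TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP
        ON UPDATE CURRENT_TIMESTAMP,           -- assumption: refresh automatically on modification
    PRIMARY KEY (id)
);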


Solr itself maintains a last_index_time that records when the last import ran (whether incremental or full); we only need to compare updatetime with last_index_time to find the records changed since the last index update.
1.2 Configuring data-config.xml
The full index continues to work, so the original configuration does not need to change; we only add the configuration for the incremental index. First, since we use the updatetime field in the index, we need to add an index field for updatetime. Second, the key to the incremental index is finding the updated data; following the analysis above, we use the last_index_time value to find the updated records. The code is as follows:
deltaquery= "SELECT ID from blog where updatetime > ' ${dataimporter.last_index_time} '"

Finally, we update the index based on the obtained ID, and the code is as follows:
deltaimportquery= "SELECT * from blog where id= ' ${dih.delta.id} '"

The final configuration is as follows:
<dataConfig>
    <dataSource name="jfinal_demo" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://192.168.21.20:3306/jfinal_demo" user="root" password="123456" batchSize="-1"/>
    <document name="testDoc">
        <entity name="blog" dataSource="jfinal_demo" pk="id"
                query="SELECT * FROM blog"
                deltaImportQuery="SELECT * FROM blog WHERE id='${dih.delta.id}'"
                deltaQuery="SELECT id FROM blog WHERE updatetime > '${dataimporter.last_index_time}'">
            <field column="id" name="id"/>
            <field column="title" name="title"/>
            <field column="content" name="content"/>
            <field column="updatetime" name="updatetime"/>
        </entity>
    </document>
</dataConfig>

Some attribute descriptions in data-config.xml:
        transformer: format conversion; for example, HTMLStripTransformer strips HTML tags before indexing
        query: queries the database table for the record data to match (full import)
        deltaQuery: incremental-index query for primary key IDs; note that it may only return the id field
        deltaImportQuery: incremental-index query that imports the changed data
        deletedPkQuery: incremental-index query for the primary key IDs of deleted records; note that it may only return the id field

1.3 Configuring schema.xml

On the basis of the full index, we only need to add an index field for updatetime; the code is as follows:

<field name= "id" type= "text_general" indexed= "true" stored= "true"/>  <field name= "title" Type= "Text_ General "indexed=" true "stored=" true "/>  <field name=" content "type=" Text_general "indexed=" true "stored=" True "/><field name=" UpdateTime "type=" Text_general "indexed=" true "stored=" true "/>

1.4 Modifying the Database to Produce Incremental Data
We directly modify a record in the database to provide data for the incremental index; the modification is shown below:
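A sketch of such a modification in SQL (the row id and new value are illustrative):

UPDATE blog
SET title = 'Modified title',             -- illustrative new value
    updatetime = CURRENT_TIMESTAMP        -- refresh the timestamp explicitly in case the column has no ON UPDATE clause
WHERE id = 1;                             -- illustrative record id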


2. Updating the Index via the Solr Admin Client

2.1 The update operation is as follows:
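The admin client triggers the same delta-import request shown in section 3 below; assuming the server address used throughout this article, the command is:

http://192.168.22.216:8983/solr/dataimport?command=delta-import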


2.2 Results Test


Description: the Solr Admin client approach is simple, fast, and intuitive, and is well suited for testing data.
3. Updating the Index with HTTP Requests
3.1 Principle
We already introduced the principle in the previous article, so it is not repeated here.
3.2 Implementation

We continue to use the HttpURLConnection object to complete the HTTP request with the following code:

import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Accesses the URL to trigger an incremental (delta) import.
 */
public static boolean runHttpGet() {
    boolean flag = false;
    // Path of the request: the dataimport handler with the delta-import command
    String strUrl = "http://192.168.22.216:8983/solr/dataimport?command=delta-import";
    try {
        // Create a URL object and open an HttpURLConnection
        URL url = new URL(strUrl);
        HttpURLConnection urlConn = (HttpURLConnection) url.openConnection();
        // Do not use the cache, and send a GET request
        urlConn.setUseCaches(false);
        urlConn.setRequestMethod("GET");
        urlConn.setInstanceFollowRedirects(true);
        // Configure the request Content-Type
        urlConn.setRequestProperty("Content-Type", "application/json, text/javascript");
        // Perform the connection operation
        urlConn.connect();
        if (urlConn.getResponseCode() == 200) {
            flag = true;
            // Read the response as UTF-8
            InputStreamReader isr = new InputStreamReader(urlConn.getInputStream(), "utf-8");
            StringBuilder strResult = new StringBuilder();
            int i;
            while ((i = isr.read()) != -1) {
                strResult.append((char) i);
            }
            // System.out.println(strResult.toString());
            isr.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return flag;
}

As in the previous article, you can also use Quartz to schedule this method as a timed task; a minimal sketch follows.
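A minimal Quartz 2.x sketch, assuming the runHttpGet() method above is placed in the same class (the class, job, and trigger names are illustrative):

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

// Job that triggers the delta-import through the runHttpGet() method shown above
public class DeltaImportJob implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        runHttpGet(); // assumes the static method above is defined in this class
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(DeltaImportJob.class)
                .withIdentity("deltaImportJob", "solr")
                .build();
        // Fire every minute; the interval is an illustrative choice
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("deltaImportTrigger", "solr")
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInMinutes(1)
                        .repeatForever())
                .startNow()
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}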

3.3 Results Test


4. Using the Official Scheduler to Implement Index Updates
4.1 Jar Package Configuration
Download apache-solr-dataimportscheduler-1.0.jar into the WEB-INF/lib directory of the solr application under Tomcat's webapps directory.
4.2 Modify the web.xml file under Solr's WEB-INF directory and add the following child element to the <web-app> element:

<listener>
    <listener-class>
        org.apache.solr.handler.dataimport.scheduler.ApplicationListener
    </listener-class>
</listener>
4.3 Modifying the configuration file dataimport.properties:

Create a new conf directory under SOLR_HOME\solr (note: not the conf under SOLR_HOME\solr\collection1). Then open apache-solr-dataimportscheduler-1.0.jar with an archive tool, copy the dataimport.properties file inside it into the new directory, and modify it. The final content of my automatic scheduling configuration file is as follows:

#################################################
#                                               #
#        dataimport scheduler properties        #
#                                               #
#################################################

# to sync or not to sync
# 1 - active; anything else - inactive
syncEnabled=1

# which cores to schedule
# in a multi-core environment you can decide which cores you want syncronized
# leave empty or comment it out if using single-core deployment
syncCores=core1,core2

# solr server name or IP address
# [defaults to localhost if empty]
server=localhost

# solr server port
# [defaults to 80 if empty]
port=8080

# application name/context
# [defaults to current ServletContextListener's context (app) name]
webapp=solr

# URL params [mandatory]
# remainder of URL
params=/dataimport?command=delta-import&clean=false&commit=true

# schedule interval
# number of minutes between two runs
# [defaults to 30 if empty]
interval=1

# interval between index rebuilds, in minutes; default 7200, i.e. 5 days
# empty, 0, or commented out: never rebuild the index
reBuildIndexInterval=7200

# parameters for rebuilding the index
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

# start time for the rebuild interval timer; first actual execution time
# = reBuildIndexBeginTime + reBuildIndexInterval*60*1000
# two formats: 2012-04-11 03:10:00 or 03:10:00; with the latter, the date part
# is filled in automatically with the date the service was started
reBuildIndexBeginTime=03:10:00

5. Index Deletion
5.1 Overview
In the process of using or testing Solr, some dirty data will appear, and we need to delete it promptly. This section describes how to delete or empty the index via the Solr Admin client.

5.2 Operation


5.3 Description
We choose the update operation and select XML as the document format; a delete statement can be filled in as the update statement. To delete a single index entry, you can fill in either of the following:
Method One
<delete><id>1</id></delete><commit/>
Method Two
<delete><query>id:1</query></delete><commit/>

If you want to clear all indexes, you can fill in the following code:

<delete><query>*:*</query></delete><commit/>
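The same statements can also be sent without the admin client by posting the XML to the update handler over HTTP; a sketch with curl, assuming the server address used earlier in this article:

curl "http://192.168.22.216:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"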

6. Summary
Incremental indexing makes small-batch data updates possible. In practice, a full index can be combined with an incremental index to strike a short-interval balance between data synchronization and performance consumption.

