Solr incremental indexing and the dataimport.properties configuration for scheduled full indexing

Source: Internet
Author: User
Solr Incremental Index Configuration
1. Before building an incremental index, you first need to understand several required attributes, some database schema considerations, and the dataimporter.properties file.

The relevant attributes inside data-config.xml:

<!-- transformer: format conversion; HTMLStripTransformer makes the index ignore HTML tags -->
<!-- query: selects the database records for the full import -->
<!-- deltaQuery: selects the primary-key IDs for the incremental import (note: returns only the ID field) -->
<!-- deltaImportQuery: fetches the data to import for the incremental index -->
<!-- deletedPkQuery: selects the primary-key IDs of deleted records (note: returns only the ID field) -->


Database schema considerations
1. If the business only involves adds and updates, the table just needs one extra timestamp field whose default value is the current system time, CURRENT_TIMESTAMP (the author's database is MySQL).
2. If deletes are also involved, the table additionally needs an isdelete field of type int, where 0 and 1 mark whether the record has been (pseudo-)deleted; of course another field, e.g. a true/false flag, would also work.
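
As a minimal sketch of both rules, a MySQL table definition could look like the following (the column names follow the entity fields used later in this article, but the exact types are the author's assumption, not stated in the original):

```sql
-- Sketch of a table supporting delta import and pseudo-deletion.
CREATE TABLE myinfo (
    ID       INT PRIMARY KEY AUTO_INCREMENT,
    name     VARCHAR(100),
    address  VARCHAR(200),
    age      INT,
    -- updated automatically on every change, so deltaQuery can find modified rows
    my_date  TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    -- 0 = live, 1 = pseudo-deleted (picked up by deletedPkQuery)
    isdelete INT NOT NULL DEFAULT 0
);
```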

dataimporter.properties

This file is important: Solr records the last index time in it after each import, and compares that timestamp against the table's timestamp field to find which records have been added, modified, or deleted since the previous run.
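
After an import has run, the file typically contains content like the following (the timestamps are illustrative; the per-entity key depends on the entity name in data-config.xml):

```
#Wed Sep 25 03:10:00 CST 2013
last_index_time=2013-09-25 03\:05\:00
myinfo.last_index_time=2013-09-25 03\:05\:00
```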


The following is the author's demo from that time; it covers adds, modifications, and deletes.

    <dataConfig>
        <!-- The MySQL data source (data sources can also be configured in solrconfig.xml) -->
        <dataSource name="myDB"
                    type="JdbcDataSource"
                    driver="com.mysql.jdbc.Driver"
                    url="jdbc:mysql://localhost/test"
                    user="root"
                    password="ninemax"/>
        <document>
            <!-- Notes on the attributes below (corrections welcome):
                 pk="ID" is required, because the incremental-index queries need the primary key.
                 dataSource="myDB" references the data source defined above.
                 name="myinfo" must be unique when multiple entities exist.
                 query selects all eligible rows; isdelete=0 keeps only rows that have not
                   been pseudo-deleted (this query is used only by the first full import
                   and is ignored by delta imports).
                 deltaQuery returns the IDs of all records changed since the last index time,
                   whether by an add, a modify, or a delete (delta import only; returns only
                   the ID value).
                 deletedPkQuery returns the IDs of rows pseudo-deleted in the database (i.e.
                   rows where isdelete is 1); Solr uses them to delete the corresponding
                   documents from the index (delta import only; returns only the ID value).
                 deltaImportQuery takes each ID produced by the two queries above and fetches
                   the full row, so the index can be updated with a delete, add, or modify
                   (delta import only; may return multiple fields, normally all columns). -->
            <entity pk="ID"
                    dataSource="myDB"
                    name="myinfo"
                    query="select * from myinfo where isdelete=0"
                    deltaQuery="select ID from myinfo where my_date > '${dataimporter.last_index_time}'"
                    deletedPkQuery="select ID from myinfo where isdelete=1"
                    deltaImportQuery="select * from myinfo where ID='${dataimporter.delta.ID}'">
                <!-- Required: the uppercase ID here must match the case used in the statements above -->
                <field column="ID" name="id"/>
                <field column="name" name="name"/>
                <field column="address" name="address"/>
                <field column="age" name="age"/>
                <field column="my_date" name="my_date"/>
                <field column="isdelete" name="isdelete"/>
            </entity>
        </document>
    </dataConfig>
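
With the entity configured, imports are triggered through DataImportHandler request URLs, for example (the host, port, and core name here match the scheduler settings shown later; adjust them for your deployment):

```
# First full build (uses query / isdelete=0)
http://localhost:8080/solr/collection1/dataimport?command=full-import&clean=true&commit=true

# Incremental update (uses deltaQuery / deltaImportQuery / deletedPkQuery)
http://localhost:8080/solr/collection1/dataimport?command=delta-import&clean=false&commit=true
```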

dataimport.properties configuration (scheduled imports)

Reference: official document: http://wiki.apache.org/solr/DataImportHandler#Scheduling


Google Code project: https://code.google.com/p/solr-dataimport-scheduler/


1. Copy solr-dataimporthandler-4.2.1.jar and solr-dataimporthandler-extras-4.2.1.jar from the solr-4.2.1\dist directory to the D:\program\tomcat6\webapps\solr\WEB-INF\lib directory.


2. Download apache-solr-dataimportscheduler-1.0-with-source.jar from https://code.google.com/p/solr-dataimport-scheduler/downloads/list to the D:\program\tomcat6\webapps\solr\WEB-INF\lib directory.


3. Extract dataimport.properties from apache-solr-dataimportscheduler-1.0-with-source.jar and place it in D:\program\tomcat6\solrapp\solr\conf (if the conf folder does not exist, create it).


4. Modify D:\program\tomcat6\webapps\solr\WEB-INF\web.xml and add:

<listener>
    <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>

5. Modify the dataimport.properties content:

#################################################
#                                               #
#       dataimport scheduler properties         #
#                                               #
#################################################

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want synchronized
#  leave empty or comment it out if using single-core deployment
#syncCores=game,resource
syncCores=collection1

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to 80 if empty]
port=8080

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solr

#  URL params [mandatory]
#  remainder of URL
params=/dataimport?command=delta-import&clean=false&commit=true

#  schedule interval
#  number of minutes between two runs
#  [defaults to 30 if empty]
interval=1

#  interval between index rebuilds, in minutes; the default 7200 is 5 days;
#  empty, 0, or commented out means the index is never rebuilt
reBuildIndexInterval=2

#  rebuild-index params
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

#  start time of the rebuild interval; the first actual run happens at
#  reBuildIndexBeginTime + reBuildIndexInterval*60*1000.
#  Two formats: 2012-04-11 03:10:00 or 03:10:00; the latter automatically fills
#  in the date on which the service was started
reBuildIndexBeginTime=03:10:00

