Solr Incremental Index Configuration
1. Before you perform an incremental import, you should first understand a few required attributes in data-config.xml, some database design considerations, and the dataimporter.properties file.
data-config.xml
transformer="HtmlStripTransformer": strips HTML tags from the field content before indexing.
query: the query that selects matching records from the database table for a full import.
deltaQuery: the incremental query for primary-key IDs. Note that it returns only the ID field.
deltaImportQuery: the query that imports the actual data during an incremental import.
deletedPkQuery: the query for the primary-key IDs of deleted records. Note that it also returns only the ID field.
Database configuration considerations
1. If the business only involves adds and updates, one extra timestamp field in the database is enough; give it a default value of the current system time, CURRENT_TIMESTAMP (the author's database is MySQL).
2. If deletes are also part of the business, the table needs one more field, e.g. isdelete of type int defaulting to 0, to mark whether a record has been deleted; of course another field type can also serve as the flag, e.g. true/false.
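Putting those two points together, a minimal MySQL table for this setup might look as follows. This is only a sketch: the table and column names match the example entity below, and the row ID in the UPDATE is hypothetical.

```sql
-- Sketch of a MySQL table supporting incremental indexing.
-- my_date tracks the last modification time; isdelete marks pseudo-deleted rows.
CREATE TABLE myinfo (
  id       INT PRIMARY KEY AUTO_INCREMENT,
  name     VARCHAR(100),
  address  VARCHAR(200),
  age      INT,
  my_date  TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  isdelete INT DEFAULT 0   -- 0 = live, 1 = deleted
);

-- A "delete" then becomes an update, so the change also bumps my_date
-- and stays visible to the delta queries:
UPDATE myinfo SET isdelete = 1 WHERE id = 42;
```

The ON UPDATE CURRENT_TIMESTAMP clause is what keeps my_date current on every modification, so the deltaQuery can find changed rows by timestamp alone.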
dataimporter.properties
This configuration file is important: it records the time of the last import. By comparing that time against the record timestamps, Solr can determine which records have been added, modified, or deleted since then.
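For example, after an import the file written by the DataImportHandler looks roughly like this (the timestamps are illustrative, and the exact keys can vary by version; colons are escaped because the file is stored via java.util.Properties):

```properties
#Wed Apr 11 03:10:00 CST 2012
last_index_time=2012-04-11 03\:10\:00
myinfo.last_index_time=2012-04-11 03\:10\:00
```

The last_index_time value is what ${dataimporter.last_index_time} resolves to in the delta queries.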
Below is a demo the author used while testing; it covers adds, updates, and deletes.
<dataConfig>
  <!-- This section configures a MySQL data source (the data source can also be
       configured in solrconfig.xml) -->
  <dataSource name="myDB" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/test"
              user="root" password="ninemax"/>
  <document>
    <!-- The attributes are described below (corrections welcome): -->
    <!-- pk="ID" is required, because the incremental index queries by
         primary-key ID -->
    <!-- dataSource="myDB" refers to the data source defined above -->
    <!-- name="myinfo" must be unique when there are multiple entities -->
    <!-- query selects all eligible rows in the table. Because the author is
         testing the delete case, the condition isdelete=0 restricts it to rows
         that have not been deleted. Note that this query is used only by the
         first full import; it plays no role in delta imports. -->
    <!-- deltaQuery returns the IDs of all records modified since the last
         import; a modification may be an update, an add, or a delete. This
         query is used only by delta imports and may return only the ID value. -->
    <!-- deletedPkQuery returns the IDs of rows pseudo-deleted in the database
         (i.e. rows with isdelete set to 1); Solr removes the corresponding
         documents from the index. This query is used only by delta imports and
         returns only the ID value. -->
    <!-- deltaImportQuery takes each ID obtained by the two queries above,
         fetches the full row, and updates the index from it; the change may be
         a delete, an add, or an update. This query is used only by delta
         imports and can return multiple field values, generally all columns. -->
    <entity pk="ID" dataSource="myDB" name="myinfo"
            query="SELECT * FROM myinfo WHERE isdelete=0"
            deltaQuery="SELECT id FROM myinfo WHERE my_date > '${dataimporter.last_index_time}'"
            deletedPkQuery="SELECT id FROM myinfo WHERE isdelete=1"
            deltaImportQuery="SELECT * FROM myinfo WHERE id='${dataimporter.delta.id}'">
      <!-- Note: the case of the ID column here must match the statements above -->
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="address" name="address"/>
      <field column="age" name="age"/>
      <field column="my_date" name="my_date"/>
      <field column="isdelete" name="isdelete"/>
    </entity>
  </document>
</dataConfig>
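To make the interplay of the three delta queries concrete, a delta import with the entity above effectively runs SQL along these lines (the substituted timestamp and ID values are hypothetical):

```sql
-- 1. deltaQuery: find IDs changed since the last import
--    (DIH substitutes ${dataimporter.last_index_time}):
SELECT id FROM myinfo WHERE my_date > '2012-04-11 03:10:00';

-- 2. deltaImportQuery: for each returned id, fetch the full row
--    (DIH substitutes ${dataimporter.delta.id}) and update the index:
SELECT * FROM myinfo WHERE id = '42';

-- 3. deletedPkQuery: find pseudo-deleted IDs; the matching documents
--    are removed from the index:
SELECT id FROM myinfo WHERE isdelete = 1;
```

Note that a pseudo-deleted row matches both step 1 (its my_date changed) and step 3; the deletedPkQuery is what turns it into an index deletion rather than an update.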
dataimport.properties (scheduler) configuration
References:
Official documentation: http://wiki.apache.org/solr/DataImportHandler#Scheduling
Google Code project: https://code.google.com/p/solr-dataimport-scheduler/
1. Copy solr-dataimporthandler-4.2.1.jar and solr-dataimporthandler-extras-4.2.1.jar from the solr-4.2.1\dist directory
into the D:\program\tomcat6\webapps\solr\WEB-INF\lib directory.
2. Download apache-solr-dataimportscheduler-1.0-with-source.jar from https://code.google.com/p/solr-dataimport-scheduler/downloads/list
into the D:\program\tomcat6\webapps\solr\WEB-INF\lib directory.
3. Extract dataimport.properties from apache-solr-dataimportscheduler-1.0-with-source.jar into D:\program\tomcat6\solrapp\solr\conf
(if the conf folder does not exist, create it).
4. Modify D:\program\tomcat6\webapps\solr\WEB-INF\web.xml and add:
<listener>
<listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>
5. Modify the contents of dataimport.properties:
#################################################
#                                               #
#        dataimport scheduler properties        #
#                                               #
#################################################

# to sync or not to sync
# 1 - active; anything else - inactive
syncEnabled=1

# which cores to schedule
# in a multi-core environment you can decide which cores you want synchronized
# leave empty or comment it out if using single-core deployment
#syncCores=game,resource
syncCores=collection1

# solr server name or IP address
# [defaults to localhost if empty]
server=localhost

# solr server port
# [defaults to 80 if empty]
port=8080

# application name/context
# [defaults to current ServletContextListener's context (app) name]
webapp=solr

# URL params [mandatory]
# remainder of URL
params=/dataimport?command=delta-import&clean=false&commit=true

# schedule interval
# number of minutes between two runs
# [defaults to 30 if empty]
interval=1

# interval between full index rebuilds, in minutes; default 7200, i.e. 5 days
# if empty, 0, or commented out, the index is never rebuilt
reBuildIndexInterval=2

# parameters for the full rebuild
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

# start time of the rebuild schedule; first actual run time =
# reBuildIndexBeginTime + reBuildIndexInterval*60*1000
# two formats: 2012-04-11 03:10:00 or 03:10:00; with the latter, the date part
# is automatically filled in with the date the service was started
reBuildIndexBeginTime=03:10:00