SOLR incremental index: deleting indexed data and scheduling incremental index updates
OK, I am writing about SOLR again. Yesterday I covered adding and modifying data in the incremental index; today I will talk about deleting data through the incremental index and running the incremental index on a schedule. On to the text.
I. Deleting data with the incremental index
As I said yesterday, an incremental index is an index SOLR builds over only the data that changed in the database between the last index run (incremental or full) and the current one. Yesterday we covered adding or modifying a record and then building an incremental index; today we want the incremental index to delete data.
In fact, the deletion here is a soft deletion. What does that mean? It does not mean the data is actually removed from the database. Instead, you put a marker on the data you no longer want indexed and tell SOLR that records carrying this marker should not be indexed. When building the index, SOLR then ignores records with this special marker. That is the general principle; how is it actually done? Let's take a look:
1.1 Add a field in the database
Just as adding to the incremental index yesterday required a new field in the database, deleting from the index requires one too: a field that marks whether a row still needs to be indexed.
You can name the field whatever you like; an int type is fine, and the length does not matter. In my setup, 0 marks data that should be indexed and 1 marks data that should not. This is what I mean by soft-deleted data.
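For reference, a minimal sketch of adding such a column, assuming a MySQL table named person (the table name and default value are assumptions for illustration):

-- 0 = index this row, 1 = soft-deleted, do not index
alter table person add column isdelete int not null default 0;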
1.2 Modify the configuration files
As before, this field has to be configured in the data-config.xml and schema.xml files, as shown below:
data-config.xml
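A minimal sketch of what this configuration might look like; the table name person, the columns id, name, and updateTime, and the connection settings are assumptions for illustration:

<dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/solrdemo"
                user="root" password="root"/>
    <document>
        <!-- query: full import, only rows that are not soft-deleted -->
        <!-- deltaQuery: ids of rows changed since the last index run -->
        <!-- deletedPkQuery: ids of soft-deleted rows, removed from the index during delta-import -->
        <entity name="person" pk="id"
                query="select * from person where isdelete = 0"
                deltaQuery="select id from person where updateTime &gt; '${dataimporter.last_index_time}' and isdelete = 0"
                deltaImportQuery="select * from person where id = '${dataimporter.delta.id}'"
                deletedPkQuery="select id from person where isdelete = 1">
            <field column="id" name="id"/>
            <field column="name" name="name"/>
            <field column="isdelete" name="isdelete"/>
        </entity>
    </document>
</dataConfig>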
Note three things: the isdelete column must first be added as a field tag; the main query statement needs a where condition so that it only selects the rows in the database that should be indexed; and a deletedPkQuery statement must be added. Like the deltaQuery and deltaImportQuery statements, deletedPkQuery only takes effect during incremental indexing: it selects the IDs of all the soft-deleted rows, and during the incremental run the index entries corresponding to those IDs are deleted from the existing index.
schema.xml
There is no major change to schema.xml; you only need to add the isdelete field to it:
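A sketch of the field declaration, assuming an int field type is already defined in your schema:

<field name="isdelete" type="int" indexed="true" stored="true"/>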
1.3 View the results
With the configuration above in place, let's take a look at the effect. First, the database:
Still the same 17 records from yesterday.
Index created by SOLR:
There are likewise 17 documents in the index, matching the database rows one for one. Next I will set the isdelete field of two rows in the database to 1 and then build an incremental index. The method is the same as yesterday, so I won't repeat it. First, modify the database:
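A sketch of that change and the subsequent delta-import request, under the same assumed person table (the IDs below are placeholders):

-- soft-delete two rows; 1 means "do not index"
update person set isdelete = 1 where id in (2, 5);

Then trigger the incremental index the same way as yesterday, for example:

http://localhost:8080/solr/collection1/dataimport?command=delta-import&clean=false&commit=true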
I changed the isdelete of Zhou Runfa (Brother Fa) and Liang Chaowei to 1. The incremental index then looks like this:
We can see that two index documents are now missing. Are they exactly the two records I set to 1? Let's search for them; if they cannot be found, everything is working:
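For example, a query of roughly this shape (the core name and the name field are assumptions; URL-encode the search term as needed):

http://localhost:8080/solr/collection1/select?q=name:"Zhou Runfa"&wt=json&indent=true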
We can see that searching for Zhou Runfa now finds only Zhou Xingchi (Xingye); Zhou Runfa himself is gone. This is again the tokenizer problem we mentioned yesterday. Now let's look at Liang Chaowei:
The search result is empty.
These two queries show that our incremental index with soft deletion works.
II. Scheduled incremental indexing
If we had to go to http://localhost:8080/solr and run an incremental index by hand every time a couple of rows changed in the database, it would be very tedious. SOLR therefore provides a scheduled-task facility. Of course, you could also build one yourself, for example with Spring's scheduled tasks or by integrating Quartz, and hit the incremental-index URL on a schedule; that would achieve the same goal, but it is not what we are covering today. Let's get into the details.
First, we need to bring in a JAR package. It is included in the DEMO attached at the end of the first article; you will see it after unpacking. But note: the JAR I included is one whose source code has been modified. The apache-solr-dataimportscheduler-1.0.jar offered in many places has a problem:
Its HTTP requests always come back with 415, "Unsupported Media Type". This bit me for a whole day last Saturday and nearly drove me to despair. I eventually found someone else's article online that solved it, and I'll post the address below. The cause: one of the classes in this JAR sends its HTTP request with the POST method, while the request we are making here should be a GET, so it always fails with 415. Painful, right? So you have to go into the JAR and modify the source before it works properly. The address of the article I read is:
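For reference, here is a small stand-alone Java sketch of what the corrected request amounts to; it is an illustration of the fix, not the JAR's actual source, and the URL is an assumption matching the setup above:

import java.net.HttpURLConnection;
import java.net.URL;

public class TriggerDeltaImport {
    public static void main(String[] args) throws Exception {
        // the scheduled delta-import call must go out as GET;
        // the unmodified jar sent it as POST, which Solr answered with 415
        URL url = new URL("http://localhost:8080/solr/collection1/dataimport"
                + "?command=delta-import&clean=false&commit=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        System.out.println("Response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}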
The second step is to add the following listener to SOLR's web.xml file:
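The registration typically looks like the snippet below; the listener class name comes from the apache-solr-dataimportscheduler JAR, and the entry goes inside the <web-app> element:

<listener>
    <listener-class>
        org.apache.solr.handler.dataimport.scheduler.ApplicationListener
    </listener-class>
</listener>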
Step 3: extract the apache-solr-dataimportscheduler-1.0.jar file, find the dataimport.properties file in the extracted folder, and copy it into a conf folder under your SOLR_HOME directory.
Note that this conf folder is not the one under SOLR_HOME\collection1 but SOLR_HOME\conf, which does not exist by default and has to be created by hand.
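On Windows, for example, that might look like this (the paths are illustrative assumptions):

mkdir %SOLR_HOME%\conf
copy apache-solr-dataimportscheduler-1.0\dataimport.properties %SOLR_HOME%\conf\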
Step 4: open the dataimport.properties file and edit its contents. After editing, mine looks like this:
#################################################
#                                               #
#       dataimport scheduler properties         #
#                                               #
#################################################

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want syncronized
#  leave empty or comment it out if using single-core deployment
syncCores=collection1

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to 80 if empty]
port=8080

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solr

#  URL params [mandatory]
#  remainder of URL
params=/dataimport?command=delta-import&clean=false&commit=true&wt=json&optimize=false

#  schedule interval
#  number of minutes between two runs
#  [defaults to 30 if empty]
interval=1

#  interval for rebuilding the index, in minutes
#  default is 7200, i.e. 5 days
#  if empty, 0, or commented out, the index is never rebuilt
reBuildIndexInterval=7200

#  parameters for the rebuild run
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

#  start time of the rebuild interval; first actual rebuild =
#  reBuildIndexBeginTime + reBuildIndexInterval*60*1000
#  two formats: 2012-04-11 03:10:00 or 03:10:00,
#  the latter automatically fills in the date part
reBuildIndexBeginTime=03:10:00
Note:
1. syncCores=collection1 means the scheduled job builds the index for the collection1 core. If this parameter is not set, collection1 is indexed by default; if you use multicore, separate the core names with commas.
2. Change server=localhost and port=8080 to your own container's address and port number;
3. interval=1 sets the interval of the scheduled incremental index, in minutes;
4. The other options can be configured by following the comments above; there is nothing hard to understand;
OK, start Tomcat with the above configuration in place. After a minute you should see log output like the following, which means the scheduled incremental index is working:
Writing these posts really is time-consuming; I started at half past eight and it has again taken more than an hour. That's it for today. Tomorrow we'll deal with the tokenizer, and multicore may be added too, but no promises; we'll see tomorrow.