Solr 4.8.0 Source Code Analysis (24): SolrCloud Recovery Strategy (Part 5)
Preface: Four articles on the SolrCloud recovery strategy have already been written, and this should be the last article in this systematic introduction. This article mainly covers Solr's master-slave replication. It is slightly different from the earlier <Solr 4.8.0 Source Code Analysis (22): SolrCloud Recovery Strategy (Part 3)>: that article described synchronization between a SolrCloud leader and its replicas, which requires no configuration in solrconfig.xml. This article covers standalone mode, where master-slave synchronization between servers is configured through solrconfig.xml.
In a distributed environment, high concurrency usually forces us to deploy multiple servers for load balancing, avoiding both the hot-spot problem of a single access point and server breakdown under excessive load. Solr 4.x introduced the SolrCloud distributed cluster scheme, a significant architectural change from the master/slave cluster mode used before 4.x. SolrCloud not only solves the load-balancing problem under high concurrency, it also addresses retrieval performance over massive data by splitting a large index across multiple independent nodes in divide-and-conquer fashion. This does not mean that the pre-4.x master/slave architecture has been eliminated; in fact, SolrCloud itself still makes extensive use of the master-slave approach internally to handle concurrency. If the data volume is not very large, SolrCloud is unnecessary, but you may still need to solve the high-concurrency problem or the single-point-of-failure problem. The answer is to adopt the classic master-slave architecture.
1. Configuration
- Master mode

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <!-- When to trigger replication; replicateAfter can be startup, commit, or optimize -->
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>

        <!-- When to take a backup; backupAfter can likewise be startup, commit, or optimize -->
        <!-- <str name="backupAfter">optimize</str> -->

        <!-- Configuration files to replicate along with the index -->
        <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>

        <!-- How long a commit point is reserved for slaves to download; default is 10 seconds and rarely needs changing -->
        <str name="commitReserveDuration">00:00:10</str>

        <!-- Number of backups to keep -->
        <str name="maxNumberOfBackups">1</str>
      </lst>
    </requestHandler>
Parameter description:
- If commits are frequent or the network is slow, you may need to tune <str name="commitReserveDuration">00:00:10</str>. Roughly, this should be the time it takes to transfer 5 MB of data from master to slave; the default is 10 seconds.
- If replicateAfter is set only to startup, replication is triggered only when the master starts up and will not be triggered by later commits or optimizes. If you also want those events to trigger replication, add commit/optimize entries as well, as in the master example above.
- Slave mode
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <!-- Address of the master's replication handler -->
        <str name="masterUrl">http://master_host:port/corename/replication</str>

        <!-- Interval at which the slave polls the master for a new index version -->
        <str name="pollInterval">00:00:20</str>

        <!-- Compression of transferred files: internal or external -->
        <str name="compression">internal</str>

        <!-- HTTP-related parameters -->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>

        <!-- If the master requires authentication, set the login username and password here -->
        <str name="httpBasicAuthUser">username</str>
        <str name="httpBasicAuthPassword">password</str>
      </lst>
    </requestHandler>
Parameter description:
- The compression setting can be internal or external and enables compression of files during transfer (with external, compression is handled by the HTTP layer; with internal, Solr handles it itself). Only enable compression when network bandwidth is the bottleneck, as it can otherwise slow replication down.
- Repeater mode
If one master serves many slaves, the load on the master becomes very heavy when all the slaves download files from it. To avoid this, you can use repeater mode, in which a node acts as both master and slave:
- A repeater is also configured in solrconfig.xml, with both a master and a slave section.
- Even if replicateAfter on the upstream master is set to optimize, the repeater also needs to set commit, because an optimize is never triggered on a slave.
- The compression setting remains valid on a repeater.
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
      </lst>
      <lst name="slave">
        <str name="masterUrl">http://master.solr.company.com:8080/solr</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>
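In this setup the downstream slaves would point their masterUrl at the repeater rather than at the original master, which is what spreads the download load; the host name above is just the example value from the configuration.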
- You can use the following configuration to enable the master or slave role on a node. The switch values are kept in the solrcore.properties file of the corresponding core.
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <str name="enable">${enable.slave:false}</str>
        <str name="masterUrl">http://master_host:8983/solr</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>
    # solrcore.properties on the master
    enable.master=true
    enable.slave=false

    # solrcore.properties on the slave
    enable.master=false
    enable.slave=true
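Since Solr resolves ${...} placeholders from JVM system properties as well as from solrcore.properties, the same switches can also be passed on the command line, for example -Denable.master=true when starting the master node.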
2. Working principle
- When the master performs a commit or optimize, ReplicationHandler reads the file names belonging to that commit point and, according to the replicateAfter parameter, decides whether that commit point is published for slaves to replicate.
- The master does not know how many slaves there are. Each slave polls the master periodically (pollInterval) to check the master's index version; if the master has a newer version, synchronous replication begins. The steps are as follows (a minimal client sketch is given at the end of this section):
- The slave issues a filelist command to obtain the list of files. The command returns a series of metadata (size, lastmodified, alias, and so on).
- The slave checks which of these files it already has locally, then starts downloading the missing files (using the filecontent command). If a connection fails, the download resumes from the point of failure; it retries up to 5 times and gives up if it still fails.
- Files are downloaded into a temporary directory, so an error in the middle of a download does not affect the slave.
- When the download is complete, all new files are moved into the working index directory, and each file's timestamp is set to match the master's.
- ReplicationHandler then executes a commit command, and the new index is loaded.
- The steps for replicating configuration files are as follows:
- Files to be replicated must be listed explicitly in confFiles.
- Only files under the conf directory of the Solr instance can be replicated.
- Configuration files are replicated together with a new index. Even if a configuration file is modified on the master, it is not copied immediately; it is copied only when the master's next commit or optimize triggers replication.
- Unlike index files, which are compared by timestamp, configuration files are compared by checksum; files with the same checksum are not copied again.
- Downloaded configuration files are first placed in a temporary directory. The old configuration files are renamed but left in their original directory; ReplicationHandler does not delete them. The new files are then moved into the working directory.
- If any configuration file was downloaded, the slave core is reloaded.
- What happens if data is added directly to a slave, or a slave's index is corrupted?
- As long as the master has no new index data, the slave does not synchronize with it. Once the master performs a commit, the master's index version no longer matches the slave's. The slave fetches the file list from the master, finds that its local files differ from the master's, copies all of the master's index files into a new directory, and then asks Solr to switch to that new index directory.
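To make the polling flow above more concrete, here is a minimal, hypothetical sketch of one poll cycle driven purely over HTTP, using the indexversion, filelist, and filecontent commands listed in the next section. The host name, core name, and the omitted version parsing are placeholders, and the real logic inside Solr's ReplicationHandler is considerably more involved (temporary directory, retries, timestamp fix-up, commit).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/** Rough sketch of one slave poll cycle against the master's /replication handler. */
public class ReplicationPollSketch {

    // Hypothetical master URL, in the same form as masterUrl in solrconfig.xml
    static final String MASTER = "http://master_host:8983/solr/corename/replication";

    public static void main(String[] args) throws Exception {
        // 1. Ask the master for the latest replicable index version.
        String versionResponse = get(MASTER + "?command=indexversion&wt=json");
        System.out.println(versionResponse);
        long masterVersion = 0L; // parse "indexversion" out of versionResponse (omitted here)

        // 2. If it differs from the slave's local version, list the files of that version...
        String fileList = get(MASTER + "?command=filelist&indexversion=" + masterVersion + "&wt=json");
        System.out.println(fileList);

        // 3. ...then download each missing file with
        //    command=filecontent&wt=filestream&file=<name> into a temporary directory,
        //    move the files into the index directory, and commit.
    }

    static String get(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);   // same spirit as httpConnTimeout in the slave config
        conn.setReadTimeout(10000);     // same spirit as httpReadTimeout in the slave config
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```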
3. HTTP API
| Description | Command |
| --- | --- |
| Get the latest index version that the master can replicate | http://master_host:port/solr/replication?command=indexversion |
| Abort copying the master's index to a slave | http://slave_host:port/solr/replication?command=abortfetch |
| Create a backup on the master if there is committed index data | http://master_host:port/solr/replication?command=backup |
| A directory path for the backup can be supplied at the same time | &location=/foo/bar |
| The number of backups to keep can also be set with numberToKeep; older backups are deleted automatically. numberToKeep is ignored if maxNumberOfBackups is set in the configuration file | &numberToKeep=2 |
| Force a slave to fetch the index from the master | http://slave_host:port/solr/replication?command=fetchindex |
| Extra properties such as masterUrl or compression (anything configurable under <lst name="slave">) can be passed along for a one-time replication | |
| Stop a slave from polling the master | http://slave_host:port/solr/replication?command=disablepoll |
| Resume a slave's polling of the master | http://slave_host:port/solr/replication?command=enablepoll |
| Get all current status and configuration information | http://slave_host:port/solr/replication?command=details |
| Get the version information of the index | http://host:port/solr/replication?command=indexversion |
| Get the Lucene index file information for a given version | http://host:port/solr/replication?command=filelist&indexversion=<index-version-number> |
| Disable replication of the master's index to all slaves | http://master_host:port/solr/replication?command=disablereplication |
| Enable replication of the master's index to all slaves | http://master_host:port/solr/replication?command=enablereplication |
| Download a file from the master. Add file=<index-filename> for an index file, or cf=<config-filename> for a configuration file | http://master_host:port/solr/replication?command=filecontent&wt=filestream&indexversion=<index-version-number> |
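As an example of the fetchindex row above, a one-time pull from a specific master (hypothetical host and core names) could be triggered with http://slave_host:port/solr/replication?command=fetchindex&masterUrl=http://other_master:port/solr/corename/replication.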
Summary:
This article has introduced the configuration, working principle, and HTTP API of Solr's master-slave replication. Its implementation is consistent with what the earlier articles in this series described. This concludes the SolrCloud recovery strategy series. Next I plan to study the SolrCloud Overseer; there does not seem to be much material about the Overseer online, so it is worth learning about.