The company's online Solr server is split into separate cores for different departments with different business needs. Since I took over, there have been plenty of problems and headaches, several of them urgent. The first is search accuracy, along with data synchronization. Search accuracy has been addressed by switching to the ansj tokenizer and by continuously tuning the custom dictionary and the stop-word dictionary; this is an ongoing optimization process that needs long-term follow-up before the results become obvious. The second problem, data synchronization, actually bundles together several issues: quickly adding new search cores, search performance and load, data synchronization itself, and recovery from downtime. In the past there were various ad-hoc distributed solutions: for data synchronization I wrote a separate REST web service; downtime recovery could only be handled manually; performance and load were handled by the IT Ops department. Most of that is now deprecated. Go straight to SolrCloud!
It took about a week to get SolrCloud done. Actually, building the system itself only took an hour or two; understanding the principles and the advanced usage is what took so long, with all kinds of problems and headaches along the way. I worked a lot of overtime this week, which was really tough! Sometimes I think it's a classic case of "no zuo no die": the old solution was perfectly usable, and I wrote it myself, yet now I have to overturn it and dig deep into something new where any problem can be overwhelming. I've worked on Nutch and Hadoop before, but SolrCloud has given me an unprecedented headache. There is plenty of material online about SolrCloud + ZooKeeper; unfortunately, ten thousand articles could be merged into one. They basically describe how to build it and then stop, with copying everywhere and everything only scratching the surface. This is a bad habit in the Chinese open source scene: the easy, well-known techniques get shared over and over, while the difficult ones people are reluctant to share. The result is dozens of pages of Baidu results with nearly identical content. During this study there were two extra headaches: Google being inaccessible, and the wiki documentation being mostly in English with few translations. My thanks to the friends who have translated parts of the SolrCloud wiki. I've rambled enough; what I really want is to urge developers to do more original work, read more documentation, and understand the principles instead of staying in a fog. Now, let me write down my experiences and lessons from this process!
1. ZooKeeper can be deployed either distributed or on a single machine. There is plenty of information online, and it is easier to set up than Hadoop. If there's interest, I may prepare a one-click install script for the ZooKeeper service in the future.
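For the record, a minimal sketch of a standalone (single-machine) ZooKeeper setup; the version number and install path here are assumptions for illustration, not necessarily what you have:

cd /usr/local/soft
tar -xzf zookeeper-3.4.6.tar.gz && cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg    # the sample config is enough for standalone mode (clientPort=2181)
bin/zkServer.sh start                  # start the service
bin/zkServer.sh status                 # should report standalone mode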
2. Install Solr. Note: when installing on Linux, do NOT edit solr.xml on Windows and upload it back!!! This is a lesson paid for in blood. ZooKeeper rewrites solr.xml to synchronize data, and because Windows and Linux use different line endings, a file edited on Windows cannot be rewritten correctly on Linux! After the Solr service restarts, the Solr core cannot be loaded and stays down. If a Solr physical node in your SolrCloud shows as down, open solr.xml with vi and check whether it is full of ^M characters (Linux users will know these well). I also recommend removing all comments from solr.xml. And there is no need to declare cores in solr.xml by hand; we have the mighty ZooKeeper (^_^).
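A quick way to check for and strip the Windows line endings, as a sketch; dos2unix may not be installed on your machine, so a sed fallback is included:

grep -c $'\r' solr.xml       # count lines containing a carriage return (the ^M you see in vi)
dos2unix solr.xml            # convert CRLF to LF, if dos2unix is available
sed -i 's/\r$//' solr.xml    # equivalent fallback with sed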
3. Upload the core configuration files. First, a bit of background; once you understand it, everything else is easy. Run:
./zkCli.sh -server localhost:2181
to browse the file structure that ZooKeeper maintains for the cluster. All core configuration files live under /configs, and all cores (SolrCloud uses the concept of a "collection") live under /collections. The command to upload a configuration file is as follows:
java -classpath .:/usr/local/tomcat7/webapplications/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost hadoop34:2181,hadoop36:2181 -confdir /usr/local/soft/solr-space/alpha_wenuser/conf -confname alpha_wenuser
This uploads the configuration files of the alpha_wenuser collection (core) to the two ZooKeeper machines. If your configuration involves dataimport, an error will be reported here, or later when you create the collection: the dataimport-related jars must go into Solr's lib directory under Tomcat, not into the dist directory of solr_home as on a single-node Solr. See the sketches below.
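Two quick sketches here. First, verifying the upload from inside zkCli.sh (the node names follow Solr's ZooKeeper layout); second, moving the dataimport jars, where the exact paths are assumptions pieced together from the directories above:

# inside ./zkCli.sh -server localhost:2181
ls /configs                  # should now list alpha_wenuser
ls /configs/alpha_wenuser    # should list schema.xml, solrconfig.xml, etc.

# copy the dataimport handler jars into the Solr webapp under Tomcat
cp /usr/local/soft/solr_home/dist/solr-dataimporthandler-*.jar /usr/local/tomcat7/webapplications/solr/WEB-INF/lib/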
4. Create a collection. Once the configuration files are OK, you can create collections as you like; a single command is enough, with none of the single-machine headaches. The command is as follows:
curl 'http://hadoop36/solr/admin/collections?action=CREATE&name=alpha_wenuser&numShards=1&replicationFactor=1&collection.configName=alpha_wenuser'
For more information, see Baidu.
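To confirm that the collection really exists, a sketch of inspecting the cluster state through zkCli.sh (this relies on the clusterstate.json node used by Solr 4.x):

./zkCli.sh -server localhost:2181
get /clusterstate.json    # inside zkCli: shows the shards, replicas, and leader of every collection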
5. Downtime test. At this point SolrCloud is OK. What, how can it be this simple? Yes, it really is this simple; the hard part is the bugs, one after another, and if you follow the steps above carefully, basically all of them can be avoided. Shut down one Solr machine and check whether a new leader is elected successfully. Restart that machine and check whether the node comes back as active or stays down. If it stays down, check its solr.xml; it almost certainly has the line-ending problem I just mentioned, and the fix is to delete it and replace it with the solr.xml from the example directory. To delete a collection, the command is as follows (a sketch of the whole test cycle follows it):
curl 'http://hadoop36/solr/admin/collections?action=DELETE&name=alpha_wenuser'
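And a sketch of the whole downtime test cycle, assuming Solr runs inside the Tomcat instance at the path used earlier:

/usr/local/tomcat7/bin/shutdown.sh    # stop one Solr node; in the admin UI's Cloud graph the leader should move to a replica
/usr/local/tomcat7/bin/startup.sh     # restart it; the node should come back as active
# if the node stays down, check its solr.xml for ^M characters as described in step 2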
Repeat Step 4 and Step 5.
The above is a bit of a ramble. This article does not give you a detailed step-by-step method for setting up SolrCloud; rather, it covers the problems that may occur while building and using it, and what you should pay attention to.
References:
http://shiyanjun.cn/archives/100.html
This walkthrough is very reliable and trustworthy; of course, you need to understand it first, and it does leave a few things out.
http://blog.csdn.net/natureice/article/details/9109351
This is also worth your reference.
http://www.cnblogs.com/guozk/p/3498844.html
There are various operation commands, which are quite detailed.
Thanks to everyone who shared their experiences; the Internet really is a wonderful thing!