Solr 4.9 has been released; as one commenter joked, Solr's version number is catching up to Firefox's. Our company runs Solr 4.0, and we recently decided to upgrade to 4.8 (resources for 4.9, for example on Maven, are still scarce). It took some time to sort out our distributed Solr setup.
Today I'll write down the installation process for Solr 4.8, which is not much different from 4.0.
1. The environment must be Tomcat 7.0 or above and JDK/JRE 7.0 or above, with environment variables configured;
2. Copy solr.war from example\webapps into tomcat_home\webapps;
3. Create the Solr home directory and configure web.xml to point to it;
4. Copy all jars under example\lib\ext to tomcat_home\webapps\solr\WEB-INF\lib;
5. Create a new classes folder under tomcat_home\webapps\solr\WEB-INF and copy log4j.properties from example\resources into it;
6. Start Tomcat. Bingo!
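Step 3 (pointing the webapp at the Solr home) is the part most often gotten wrong. A minimal sketch of the relevant web.xml fragment, assuming the Solr home was created at /opt/solr_home (the path is an example; adjust it to your own layout):

```xml
<!-- tomcat_home\webapps\solr\WEB-INF\web.xml -->
<!-- uncomment/add this env-entry and point it at your Solr home directory -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/opt/solr_home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```

Alternatively, the solr.solr.home JVM system property can be set when starting Tomcat; either way, Solr must be able to find solr.xml in that directory.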
---------------------- Programmer's split line -------------------------
The above process gives you a single-core Solr with basic indexing. Of course, that is not all we want; the first extension is multicore. Copy everything from example\multicore into solr_home and configure solr.xml to define multiple cores. Pay close attention to each core's schema.xml and solrconfig.xml: we recommend trimming schema.xml down and cross-checking it against solrconfig.xml, so that nothing references a field that does not exist.
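For reference, the legacy-style solr.xml that ships in example\multicore looks roughly like this; each `<core>` names a subdirectory of solr_home holding that core's own conf/ with schema.xml and solrconfig.xml (the core names here are the example's defaults, rename as needed):

```xml
<!-- solr_home\solr.xml: legacy multicore layout used by the 4.x example -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```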
With multicore working, the next step is database integration, to make importing from a database convenient. Version 4.8 currently has a bug here: after a dataimport, the admin page keeps showing "indexing" forever, even though the import from the database has in fact finished; you just cannot see it, and the functionality is not affected. To import from a database, copy contrib and dist into solr_home (you can delete the war files and unneeded folders under dist), modify solrconfig.xml, and create the dataimport XML file it points to; none of this differs from earlier versions. Do not forget to put the database driver jar into Solr's WEB-INF/lib.
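A sketch of the pieces involved, assuming MySQL and a dataimport.xml sitting next to solrconfig.xml (the lib paths, table, columns, and connection details are placeholder assumptions, not taken from the post). First, in solrconfig.xml:

```xml
<!-- solrconfig.xml: load the DIH jars copied into solr_home (paths are relative
     to the core's instanceDir; adjust to where you placed contrib and dist) -->
<lib dir="../contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-dataimporthandler-.*\.jar" />

<!-- solrconfig.xml: register the import handler -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dataimport.xml</str>
  </lst>
</requestHandler>
```

Then the dataimport file itself, a separate file in the same conf/ directory:

```xml
<!-- dataimport.xml: driver, url, credentials, and the query are placeholders -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb" user="root" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, title FROM item">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

The driver jar referenced here (e.g. the MySQL connector) is what must go into WEB-INF/lib, as noted above.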
Solr 4.8 also changed the word-segmentation (analyzer) interface. Analyzers written against the 4.0 interface no longer work, so we had to re-implement the Solr interface for Chinese word segmentation. This is why people pick a stable Solr version and avoid changing it casually for a while: the migration workload is not small. Still, 4.8 is worth the upgrade, and by the look of it Solr 5.0 will be released soon, perhaps within a quarter. For Chinese word segmentation I use the ansj tokenizer, which is open source and continuously updated; I'll describe the integration in another post.
Updated July 9, 2014
Added the ansj Chinese tokenizer.
I use the latest version of ansj, 2.0 or above; it is personally my favorite tokenizer. The GitHub address is https://github.com/ansjsun/ansj_seg/ for details. The author provides several good segmentation modes and a large toolbox of algorithms, enough to make your head spin. To use it with Solr, you need to implement Solr's (or Lucene's) analyzer interface yourself. I rewrote the Analyzer and Tokenizer classes, following an approach similar to the latest IK version, to implement Chinese segmentation for both indexing and search. There are many solutions online, each implemented differently from mine, but Solr updates and iterates so quickly that I preferred to read the tokenizer's source code and rewrite the glue myself. Of ansj's three Chinese segmentation modes: if your index is small and you have few users, NlpAnalysis is recommended because it can discover new words; otherwise ToAnalysis is more reliable (and fast, faster than IK).
A few more words on tokenizers. Our company previously used IK, which I have long used and upgraded. IK is a compact and capable tokenizer, but it has not been updated in the last two years. From what I have observed it is the most widely used Chinese tokenizer in China, so there is plenty of material; if your requirements are not high, IK's problems are easy to solve, which keeps development effort down. Because it is open source (with the code hosted on Google Code, which is painful to reach) and the code is easy to understand, it is also easy to extend; I once rewrote it to filter sensitive words, with good results. We currently run ansj + IK, and I recommend ansj: its segmentation speed and extensibility are excellent. The drawback is that the author provides no Lucene or Solr interface (understandably, since natural-language processing is not only for search), and the interfaces written by others are rarely kept up to date, so you need to adapt them yourself. ansj also has many features; if you only do search, some of its content will feel redundant. But since we are programmers, we should not think of ourselves as mere coders, and should lean toward the geek side. Once you understand ansj, things open up.
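Wiring a hand-written tokenizer into Solr comes down to declaring a field type in schema.xml that references your tokenizer factory. A hedged sketch of what that declaration could look like; the class `org.ansj.solr.AnsjTokenizerFactory` is a placeholder name for whatever factory you write yourself, not an actual ansj API (ansj ships no Solr interface, as noted above):

```xml
<!-- schema.xml: field type backed by a hand-written ansj tokenizer factory;
     the factory class name is a placeholder for your own implementation -->
<fieldType name="text_ansj" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.ansj.solr.AnsjTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.ansj.solr.AnsjTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Fields then declare `type="text_ansj"`, as the title field in the highlighting example later in this post does.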
Updated July 10, 2014
Upgraded to Solr 4.8.1, which fixes around 10 bugs.
FastVectorHighlighter enables fast and efficient highlighting, at the cost of more I/O (it requires term vectors to be stored).
solrconfig.xml configuration:
<bool name="f.title.hl.useFastVectorHighlighter">true</bool>
schema.xml configuration:
<field name="title" type="text_ansj" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
Bug-fix corner!
"WARN No appenders could be found for logger": re-check step 5. I once typed classes as "classses" and hunted that problem for a long time!
Keep your schema header consistent with the example, i.e. <schema name="example core zero" version="1.1">; otherwise Chinese search performance can be very poor. The examples Solr ships are proven, so check schema.xml carefully if your search results are unsatisfactory.
Tips!
If you use Chinese word segmentation, the schema.xml configuration is critical. Rather than just pasting from the internet, you had better understand what the configuration file really means!
solrconfig.xml and schema.xml are the two key configuration files; we recommend reading through both, entry by entry.
I will keep updating this post. Friends interested in search are welcome to follow each other; my iteye address is http://lies-joker.iteye.com/