Using SOLR to build an enterprise search platform

Source: Internet
Author: User
Tags commit lowercase pack relative solr ssh tomcat

There is a lot of information about Solr on the web, but I find most of it incomplete, even the official wiki.

Based on the Solr application I am researching at this stage, I will share some of my experience.

Today's topic: how to get Solr running.

1. First download Solr. I use Solr 1.3; the download addresses are:

Windows version

http://labs.xiaonei.com/apache-mirror/lucene/solr/1.3.0/apache-solr-1.3.0.zip

Linux version

http://labs.xiaonei.com/apache-mirror/lucene/solr/1.3.0/apache-solr-1.3.0.tgz

2. Prepare a container to run it in. I am using Tomcat 6.0.20. If you are only experimenting, you do not need to prepare one: unzip the Solr download package, locate the example folder, and run start.jar with the command java -jar start.jar. This approach runs Solr inside the bundled Jetty and is not recommended for a real application.

3. Using Tomcat is not the focus of this article; if you have problems, see the Tomcat documentation. Unzip Tomcat, copy apache-solr-1.3.0.war from the dist folder of the Solr package into Tomcat's webapps folder, and rename it to solr.war.

4. Create a new solr-tomcat folder. I put it in the root of the C: drive; you can do the same or choose another location. Inside it, create a solr folder and copy everything under example/solr in the Solr package into that folder.

5. The last step is to configure solr.home. There are three ways.

1) Based on the current path

Tomcat must be started from the c:/solr-tomcat/ directory, because Solr looks for ./solr. So switch to c:/solr-tomcat/ before starting Tomcat.

2) Based on environment variables

On Windows, create solr.home in the environment variables with a value of c:/solr-tomcat.

On Linux, add the following to the current user's environment (.bash_profile) or to catalina.sh:

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"

3) Based on JNDI

In Tomcat's conf folder create a Catalina folder, inside it create a localhost folder, and in that folder create solr.xml with the following contents:

XML code

<Context docBase="C:/tomcat/webapps/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="C:/solr-tomcat/solr" override="true"/>
</Context>

Problem description:

One problem I ran into: if you configure JNDI and then start Tomcat from Tomcat's bin folder, a solr folder is created under bin, and it holds the index files. These should have gone into c:/solr-tomcat/solr. If you do not want this to happen, use the method based on the current path.

6. Open a browser and check whether you can access the service. If you can, congratulations: Solr is running.
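The fallback between methods 2) and 1) in step 5 can be sketched in a few lines of plain Java. This is only an illustration of the lookup idea, not Solr's own code: the class and method names are made up for this example, the JNDI lookup is left out, and the assumption is that a non-empty solr.solr.home system property wins over the ./solr default.

```java
import java.io.File;

/** Illustrative sketch (hypothetical class): where would Solr home be? */
final class SolrHomeSketch {

    /** The system property set by -Dsolr.solr.home=... in JAVA_OPTS. */
    static final String PROPERTY = "solr.solr.home";

    /**
     * Returns the configured home if the property value is non-empty,
     * otherwise "solr" under the directory Tomcat was started from.
     */
    static String resolve(String propertyValue, File workingDir) {
        if (propertyValue != null && !propertyValue.trim().isEmpty()) {
            return propertyValue.trim();
        }
        // Fallback: ./solr relative to the current working directory.
        return new File(workingDir, "solr").getPath();
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        System.out.println("Solr home would be: "
                + resolve(System.getProperty(PROPERTY), cwd));
    }
}
```

Running it from different directories, with and without -Dsolr.solr.home, shows why the directory from which Tomcat is started matters when nothing else is configured.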

How to add Chinese word segmentation to Solr: I consulted several articles, but it still took me a long time to get it working. Perhaps the experts find it too obvious, so many details are left out. It also has to be said that many articles are simply copied from one another.

Here it goes:

The previous article got Solr running; on that basis we now add Chinese word segmentation. I am using the Paoding analyzer.

1. Download the Paoding analyzer from:

http://code.google.com/p/paoding/downloads/list (many thanks to the Paoding author, qieqie).

When using Paoding, pay attention to the location of its dictionary (dic). There are two ways to specify it:

1) Add a PAODING_DIC_HOME variable to the system environment variables, set to the location where the dic folder from the Paoding archive was unpacked.

2) paoding-analysis.jar contains a paoding-dic-home.properties file in which the dic location can also be specified, but the jar then has to be repackaged. I use the latter method: as long as the dic location is fixed, deployment is not inconvenient, and I dislike setting environment variables.

2. Create the class file

Java code

package com.yeedoo.slor.tokenizer;

import java.io.Reader;
import java.util.Map;

import net.paoding.analysis.analyzer.PaodingTokenizer;
import net.paoding.analysis.analyzer.TokenCollector;
import net.paoding.analysis.analyzer.impl.MaxWordLengthTokenCollector;
import net.paoding.analysis.analyzer.impl.MostWordsTokenCollector;
import net.paoding.analysis.knife.PaodingMaker;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenizerFactory;

public class ChineseTokenizerFactory extends BaseTokenizerFactory {

    /**
     * Most-words segmentation, the default mode
     */
    public static final String MOST_WORDS_MODE = "most-words";

    /**
     * Segmentation by maximum word length
     */
    public static final String MAX_WORD_LENGTH_MODE = "max-word-length";

    private String mode = null;

    public void setMode(String mode) {
        if (mode == null || MOST_WORDS_MODE.equalsIgnoreCase(mode) || "default".equalsIgnoreCase(mode)) {
            this.mode = MOST_WORDS_MODE;
        } else if (MAX_WORD_LENGTH_MODE.equalsIgnoreCase(mode)) {
            this.mode = MAX_WORD_LENGTH_MODE;
        } else {
            throw new IllegalArgumentException("Illegal analyzer mode parameter: " + mode);
        }
    }

    @Override
    public void init(Map<String, String> args) {
        super.init(args);
        // args.get returns null when no mode attribute is given;
        // setMode treats null as the default, so do not call toString() on it
        setMode(args.get("mode"));
    }

    public TokenStream create(Reader input) {
        return new PaodingTokenizer(input, PaodingMaker.make(), createTokenCollector());
    }

    private TokenCollector createTokenCollector() {
        if (MOST_WORDS_MODE.equals(mode))
            return new MostWordsTokenCollector();
        if (MAX_WORD_LENGTH_MODE.equals(mode))
            return new MaxWordLengthTokenCollector();
        throw new Error("never happened");
    }

}

Package the class into a jar; if you do not want to package it yourself, use the attachment.

Put two jars, this one and paoding-analysis.jar, into Tomcat's webapps/solr/WEB-INF/lib/.

3. Change schema.xml so that the tokenizer takes effect. If you have read the previous article, you know that schema.xml is located under c:/solr-tomcat/solr/conf/.

Change the content to:

XML code

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <!--<tokenizer class="solr.WhitespaceTokenizerFactory"/>-->
        <tokenizer class="com.yeedoo.slor.tokenizer.ChineseTokenizerFactory" mode="most-words"/>
        ··· ···
    </analyzer>
    <analyzer type="query">
        <!--<tokenizer class="solr.WhitespaceTokenizerFactory"/>-->
        <tokenizer class="com.yeedoo.slor.tokenizer.ChineseTokenizerFactory" mode="most-words"/>
        ··· ···
    </analyzer>
</fieldType>

The lines inside <!-- --> are the ones that need to be changed.

4. Restart Tomcat, and that's it.

Appendix:

[Solr analysis order]

Both indexing and querying in Solr work on analyzed strings. When a full-text field is added to the index, Solr first tokenizes on whitespace, then passes the tokens through the configured filters, and whatever remains is added to the index for querying. The analysis order is as follows:

Index

1: Whitespace tokenization, WhitespaceTokenizer

2: Stop-word filtering, StopFilter

3: Word splitting, WordDelimiterFilter

4: Lowercasing, LowerCaseFilter

5: English stemming, EnglishPorterFilter

6: Duplicate removal, RemoveDuplicatesTokenFilter

Query

1: Synonym lookup

2: Stop-word filtering

3: Word splitting

4: Lowercasing

5: English stemming

6: Duplicate removal

The above applies to English. For Chinese everything is similar except the whitespace tokenization.
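To make the order concrete, here is a toy version of the first four index-time steps in plain Java. It does not use Lucene or Solr classes; the class name, the tiny stop-word list, and the splitting rule are all assumptions chosen for illustration, and stemming and duplicate removal (steps 5 and 6) are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analysis chain (hypothetical class, not the Lucene implementation). */
final class AnalysisChainSketch {

    // A made-up stop-word list; the real one comes from stopwords.txt.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "to"));

    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {            // 1: split on whitespace
            if (token.isEmpty()
                    || STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                continue;                                            // 2: drop stop words
            }
            for (String part : token.split("[^\\p{L}\\p{N}]+")) {    // 3: split on delimiters
                if (!part.isEmpty()) {
                    result.add(part.toLowerCase(Locale.ROOT));       // 4: lowercase
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Wi-Fi Router of SOLR"));
    }
}
```

The point of the sketch is only the ordering: each step sees the output of the previous one, which is why replacing the first step with a Chinese tokenizer leaves the rest of the chain usable.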

Getting Solr to run is simple; getting it to serve your project efficiently is not. There are many factors to consider, and understanding Solr's configuration is essential. Once you know what each item in the configuration files means, the work becomes much easier.

In Solr the main configuration files are schema.xml and solrconfig.xml, in the conf directory under Solr's home. If you have read the first two articles, you know where that is (c:/solr-tomcat/solr/conf/).

In this article we start with schema.xml.

schema.xml is comparable to a table definition file: it defines the data types of the data being indexed. It mainly contains types, fields, and some other default settings.

1. First, define a fieldType child node inside the types node, with attributes such as name, class, and positionIncrementGap. name is the name of the fieldType, and class is the class that defines the behavior of this type (solr.TextField, for example, is a class in the org.apache.solr.schema package). The most important part of a fieldType is the analyzer that data of this type uses at index time and at query time, consisting of a tokenizer and filters. In the example, the text fieldType uses solr.WhitespaceTokenizerFactory (whitespace tokenization) in its index analyzer, followed by the filters solr.StopFilterFactory, solr.WordDelimiterFilterFactory, solr.LowerCaseFilterFactory, solr.EnglishPorterFilterFactory, and solr.RemoveDuplicatesTokenFilterFactory. When a text field is added to the index, Solr first tokenizes on whitespace, then passes the tokens through these filters, and the remaining results are added to the index for querying. Solr's analysis package has no Chinese support; the second article describes in detail how to add the Paoding Chinese analyzer, see http://lianj-lee.javaeye.com/blog/424474

2. Next, define the specific fields inside the fields node (similar to columns in a database table). A field definition includes name, type (one of the fieldTypes defined earlier), indexed (whether it is indexed), stored (whether it is stored), multiValued (whether it can hold multiple values), and so on.

Example:

XML code

<fields>
    <field name="id" type="integer" indexed="true" stored="true" required="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="summary" type="text" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>
    <field name="date" type="date" indexed="false" stored="true"/>
    <field name="content" type="text" indexed="true" stored="false"/>
    <field name="keywords" type="keyword_text" indexed="true" stored="false" multiValued="true"/>
    <field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

Field definitions matter, and there are a few tips: set multiValued to true on any field that may hold multiple values, to avoid errors being thrown while building the index; and if you do not need to store a field's value, set stored to false.

3. It is recommended to create a copy field that gathers all full-text fields into a single field for unified searching:

XML code

<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>

and complete the copy settings in the copyField nodes:

XML code

<copyField source="name" dest="all"/>
<copyField source="summary" dest="all"/>

4. In addition, you can define dynamic fields. A dynamic field does not specify a concrete name, only a naming rule. For example, define a dynamicField whose name is *_i and whose type is text; then any field whose name ends in _i is treated as matching this definition, for example name_i, gender_i, school_i.
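The naming rule for dynamic fields can be pictured as a simple wildcard match. The following is a minimal sketch in plain Java, assuming the pattern has a single * at the start or at the end; the class and method names are invented for the example and this is not Solr's own matching code.

```java
/** Sketch of dynamic-field name matching (hypothetical helper class). */
final class DynamicFieldSketch {

    /** True if fieldName fits a pattern such as "*_i" or "attr_*". */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            // "*_i" matches every name that ends with "_i"
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            // "attr_*" matches every name that starts with "attr_"
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        // No wildcard: only an exact name matches.
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"name_i", "gender_i", "school_i", "name"}) {
            System.out.println(name + " matches *_i: " + matches("*_i", name));
        }
    }
}
```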

Multicore is a new feature of Solr 1.3. Its purpose is to let one Solr instance host multiple search applications.

Let's look at the example that ships with Solr. The article "Using Solr to build an enterprise search platform, part one (running Solr)" explained how to run Solr, and this article builds on it. If anything is unclear, see http://lianj-lee.javaeye.com/blog/424383.

1. Find the example folder in the Solr download package; under it there is a multicore folder. Copy everything in that folder into c:/solr-tomcat/solr.

Note: it contains a solr.xml (this is only the default file; you can of course specify another file), which looks like this:

XML code

<?xml version="1.0" encoding="UTF-8"?>
<solr persistent="false">
    <cores adminPath="/admin/cores">
        <core name="core0" instanceDir="core0"/>
        <core name="core1" instanceDir="core1"/>
    </cores>
</solr>

This file tells Solr which cores to load; here the cores element contains core0 and core1. The core0/conf directory (core0 is analogous to the earlier solr.home) holds schema.xml and solrconfig.xml, and you can copy in the ones from your actual application. For now we use the official example.

2. Start Tomcat and open the application; you will see Admin core0 and Admin core1.

3. With the default solr.xml above, the index files are stored under the same directory, here c:/solr-tomcat/solr/data. To change the directory, or to keep the two applications in different directories, see the following XML.

XML code

<core name="core0" instanceDir="core0">
    <property name="dataDir" value="/data/core0"/>
</core>

A property child element is added to core. Its two attributes need no explanation; they are clear at a glance.

solr.core.name - the core's name as defined in solr.xml

solr.core.instanceDir - the core's instance directory (i.e. the directory under which that core's conf/ and data/ directories are located)

solr.core.dataDir - the core's data directory (i.e. the directory under which that core's index directory is located)

solr.core.configName - the name of the core's config file (solrconfig.xml by default)

solr.core.schemaName - the name of the core's schema file (schema.xml by default)

4. The meaning of each part of solr.xml:

1) solr

The <solr> tag accepts two attributes:

persistent - by default, should runtime core manipulation be saved in solr.xml so that it is available after a restart.

sharedLib - path to a directory containing .jar files that are added to the classpath of every core. The path is relative to solr.home (where solr.xml sits).

2) cores

The <cores> tag accepts one attribute:

adminPath - relative path to access the CoreAdminHandler for dynamic core manipulation. For example, adminPath="/admin/cores" configures access via http://localhost:8983/solr/admin/cores. If this attribute is not specified, dynamic manipulation is unavailable.

3) core

The <core> tag accepts these attributes:

name - the registered core name. This is how the core is accessed.

instanceDir - the solr.home directory for a given core.

dataDir - the data directory for a given core. The default is <instanceDir>/data. It can take an absolute path or a path relative to instanceDir. (Solr 1.4)

4) property

The <property> tag accepts two attributes:

name - the name of the property

value - the value of the property

Because the English is simple enough, it is not translated.
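To check what a given solr.xml declares before starting Tomcat, the file can be read with the XML parser that ships with the JDK. The sketch below is my own helper, not part of Solr; it assumes the attribute names name and instanceDir as in Solr's multicore example and simply lists each core with its instance directory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Lists the cores declared in a solr.xml string (hypothetical helper). */
final class SolrXmlSketch {

    /** Returns core name -> instanceDir, in document order. */
    static Map<String, String> readCores(String solrXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            solrXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> cores = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("core");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element core = (Element) nodes.item(i);
                cores.put(core.getAttribute("name"),
                          core.getAttribute("instanceDir"));
            }
            return cores;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Cannot parse solr.xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<solr persistent=\"false\">"
                + "<cores adminPath=\"/admin/cores\">"
                + "<core name=\"core0\" instanceDir=\"core0\"/>"
                + "<core name=\"core1\" instanceDir=\"core1\"/>"
                + "</cores></solr>";
        System.out.println(readCores(xml));
    }
}
```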

I believe many people are puzzled when they first prepare to submit data for Solr to index; despite reading many online articles, a lot still remains unclear.

One way is to submit XML with an HTTP POST, and some articles say you can use HttpClient for this. I did not quite understand it at the time, although in hindsight there is not much to it. For a beginner who is just getting started with Solr, I would rather talk about SolrJ, the Java client of Solr 1.3. (SolrJ currently uses the binary format as the default format. For Solr 1.2 users, the XML format can be selected by setting it explicitly.)
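For comparison, the XML that such a POST carries is small. The sketch below only builds the message as a string; the class and method names are made up for this example, and it assumes the usual update message shape of an add element wrapping a doc with one field element per value. Sending it (an HTTP POST to the update URL of the core, followed by a commit) is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds a Solr XML add message as a string (hypothetical helper). */
final class UpdateXmlSketch {

    /** Escapes the characters that are special in XML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** One document with the given fields, wrapped in add/doc. */
    static String addMessage(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "1");
        fields.put("name", "Struts & Spring");
        System.out.println(addMessage(fields));
    }
}
```

SolrJ hides this step: it builds and sends the request for you.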

Let's start with an example:

Java code

public static final String SOLR_URL = "http://localhost/solr/core0";

public static void commit() {
    Date date = new Date();
    SolrServer solr = null;
    try {
        solr = new CommonsHttpSolrServer(SOLR_URL);
    } catch (MalformedURLException e1) {
        e1.printStackTrace();
        return; // without a server there is nothing to submit
    }
    for (int i = 0; i < 10000; i++) {
        SolrInputDocument sid = new SolrInputDocument();
        sid.addField("id", i);
        sid.addField("name", "Struts+hibernate+spring development Daquan" + i);
        sid.addField("summary", "Comprehensive application of three kinds of frames" + i);
        sid.addField("author", "Li Liange" + i);
        sid.addField("date", new Date());
        sid.addField("content", "Advanced Application Book" + i);
        sid.addField("keywords", "SSH" + i);
        try {
            solr.add(sid);
        } catch (SolrServerException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(i);
        // after the last document, print the elapsed time in minutes
        if (i == 9999)
            System.out.println((new Date().getTime() - date.getTime()) / 60000 + " minutes");
    }
    try {
        solr.commit();
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The code above uses a for loop to submit 10,000 documents and prints the time it took to submit them.

1. CommonsHttpSolrServer uses HttpClient to communicate with the Solr server.

2. CommonsHttpSolrServer allows connection properties to be set.

Java code

server.setSoTimeout(1000); // socket read timeout
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false); // defaults to false
// allowCompression defaults to false.
// Server side must support gzip or deflate for this to have any effect.
server.setAllowCompression(true);
server.setMaxRetries(1); // defaults to 0. > 1 not recommended.

3. Another class that implements the SolrServer interface is EmbeddedSolrServer, which does not require an HTTP connection.

4. When constructing documents, you can add them to the SolrServer one at a time, or build a collection of documents, add the collection to the SolrServer, and then commit.

5. You can also construct a JavaBean that matches the document and submit that.

Use Java annotations to create the bean. The @Field annotation can be placed on a field or on a setter method. If the name of the Solr field differs from the name of the bean property, pass the Solr field name as the annotation's value, as in the following class:

Java code

import java.util.List;

import org.apache.solr.client.solrj.beans.Field;

public class Item {
    @Field
    String id;

    @Field("cat")
    String[] categories;

    @Field
    List<String> features;
}

The annotation can also be used on setter methods, as in the following example:

Java code

@Field("cat")
public void setCategory(String[] c) {
    this.categories = c;
}

There should be a corresponding getter method (without the annotation) to read the property.

Java code

Item item = new Item();
item.id = "one";
item.categories = new String[] {"AAA", "BBB", "CCC"};

Add to Solr

Java code

server.addBean(item);

Submit multiple beans to Solr

Java code

List<Item> beans;
// add Item objects to the list
server.addBeans(beans);

Note: you can reuse the same SolrServer instance to improve performance.
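If the annotation mapping feels like magic, the idea behind it can be imitated with plain reflection. The sketch below is not SolrJ's binder: it defines its own stand-in annotation (IndexField, a made-up name) and copies annotated fields into a map, using the annotation value as the field name when one is given and the Java field name otherwise.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

/** Imitation of annotation-based bean binding (hypothetical, not SolrJ). */
final class BeanBindingSketch {

    /** Stand-in for SolrJ's @Field; the name is invented for this sketch. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface IndexField {
        String value() default "";
    }

    static final class Item {
        @IndexField String id;
        @IndexField("cat") String[] categories;
        String notIndexed; // no annotation, so it is skipped
    }

    /** Copies every annotated, non-null field into a name -> value map. */
    static Map<String, Object> toDocument(Object bean) {
        Map<String, Object> doc = new TreeMap<>();
        for (Field f : bean.getClass().getDeclaredFields()) {
            IndexField ann = f.getAnnotation(IndexField.class);
            if (ann == null) {
                continue;
            }
            f.setAccessible(true);
            // An empty annotation value means "use the Java field name".
            String name = ann.value().isEmpty() ? f.getName() : ann.value();
            try {
                Object value = f.get(bean);
                if (value != null) {
                    doc.put(name, value);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Item item = new Item();
        item.id = "one";
        item.categories = new String[] {"AAA", "BBB", "CCC"};
        System.out.println(toDocument(item).keySet());
    }
}
```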

6. Submitting a document with an UpdateRequest

Java code

public static void update() {
    SolrServer solrServer = null;
    try {
        solrServer = new CommonsHttpSolrServer(SOLR_URL);
    } catch (MalformedURLException e) {
        e.printStackTrace();
        return; // without a server the request cannot be processed
    }
    UpdateRequest updateRequest = new UpdateRequest();
    SolrInputDocument sid = new SolrInputDocument();
    sid.addField("id", 100000);
    sid.addField("name", "Struts+hibernate+spring development Daquan");
    sid.addField("summary", "Integrated application of three kinds of frameworks");
    sid.addField("author", "Li Liange");
    sid.addField("date", new Date());
    sid.addField("content", "Advanced Application Class Books");
    sid.addField("keywords", "SSH");
    // commit as part of this request; waitFlush = false, waitSearcher = false
    updateRequest.setAction(UpdateRequest.ACTION.COMMIT, false, false);
    updateRequest.add(sid);
    try {
        UpdateResponse updateResponse = updateRequest.process(solrServer);
        System.out.println(updateResponse.getStatus());
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

This submits a single document through an update request. Note this call, which makes the request commit as well:

Java code

updateRequest.setAction(UpdateRequest.ACTION.COMMIT, false, false);

7. Querying

Java code

public static void query() {
    SolrServer solr = null;
    try {
        solr = new CommonsHttpSolrServer(SOLR_URL);
    } catch (MalformedURLException e) {
        e.printStackTrace();
        return;
    }
    // http://localhost:8983/solr/spellCheckCompRH?q=epod&spellcheck=on&spellcheck.build=true
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("qt", "/spellCheckCompRH");
    params.set("q", "Programming");
    params.set("spellcheck", "on");
    params.set("spellcheck.build", "true");
    QueryResponse response = null;
    try {
        response = solr.query(params);
    } catch (SolrServerException e) {
        e.printStackTrace();
        return;
    }
    System.out.println("response = " + response);
}

This is a query method; the keyword is "Programming". For the query parameters, see the Solr wiki at http://wiki.apache.org/solr/QueryParametersIndex, or wait for an update to my blog; a later article will cover this in detail.
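The parameters set in the query method correspond to an ordinary HTTP query string, like the URL in the code comment. Here is a small sketch in plain Java that assembles such a URL from a parameter map; the class name and method are my own, and the handler path is passed in separately instead of through qt.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Assembles a query URL from parameters (hypothetical helper). */
final class QueryUrlSketch {

    static String build(String baseUrl, String handler, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl).append(handler);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(encode(p.getKey()))
               .append('=')
               .append(encode(p.getValue()));
            separator = '&'; // every parameter after the first uses '&'
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "Programming");
        params.put("spellcheck", "on");
        params.put("spellcheck.build", "true");
        System.out.println(
                build("http://localhost:8983/solr", "/spellCheckCompRH", params));
    }
}
```

Pasting the resulting URL into a browser is a quick way to compare the raw response with what SolrJ returns.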

8. Manually optimize the index file for SOLR,
