Apache Solr Initial Experience 4


We covered Solr's basic usage and configuration files in the previous posts; now we begin the real code journey.

1) Let's start with a simple program:

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.FacetParams;
import org.apache.solr.common.params.MapSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.core.CoreContainer;
import org.xml.sax.SAXException;

public class QueryDemo {

    public static void main(String[] args) throws SolrServerException, IOException,
            ParserConfigurationException, SAXException {

        // Set solr home. Note that the system property is named solr.solr.home.
        System.setProperty("solr.solr.home", "E:\\solr");
        // Initialize the core container so it loads the configuration files under solr home.
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = initializer.initialize();

        EmbeddedSolrServer solrServer = new EmbeddedSolrServer(coreContainer, "");

        // Build the parameter list.
        SolrQuery solrQuery = new SolrQuery();
        Map<String, String> map = new HashMap<String, String>();
        map.put(FacetParams.FACET_DATE, "manufacturedate_dt");
        map.put(FacetParams.FACET_DATE_START, "2004-01-01T00:00:00Z");
        map.put(FacetParams.FACET_DATE_END, "2010-01-01T00:00:00Z");
        map.put(FacetParams.FACET_DATE_GAP, "+1YEAR");
        map.put("indent", "on");
        map.put("wt", "xml");
        map.put("hl.fl", "name");
        SolrParams params = new MapSolrParams(map);
        solrQuery.add(params);
        solrQuery.setFacet(true);
        solrQuery.setFields("name,price,score");
        solrQuery.setQuery("solr");
        solrQuery.setSortField("price", SolrQuery.ORDER.asc);
        solrQuery.setHighlight(true);

        System.out.println(solrQuery.toString());

        QueryResponse queryResponse = solrServer.query(solrQuery);
        System.out.println(queryResponse.toString());
        System.out.println("Found in total: " + queryResponse.getResults().getNumFound() + " results");

        // Parse the returned response.
        SolrDocumentList sdl = (SolrDocumentList) queryResponse.getResponse().get("response");
        for (int i = 0; i < sdl.size(); i++) {
            Object obj = sdl.get(i).get("manufacturedate_dt");
            String date = "";
            if (obj != null) {
                // The field comes back as a java.util.Date.
                date = new SimpleDateFormat("yyyy-MM-dd").format((Date) obj);
            }
            System.out.println(((SolrDocument) sdl.get(i)).get("name") + ":" + date + ":" + sdl.get(i).get("price"));
        }
    }
}


In this example we use EmbeddedSolrServer, which runs Solr embedded in the same JVM. We do not need to expose an external service here, so it fits our purpose. There is also a CommonsHttpSolrServer class, which talks to a running Solr instance over HTTP; you can use it, for example, to send queries to a remote server.
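
For comparison, here is a minimal sketch of the HTTP variant; the URL is an assumption on my part and must point at a Solr instance you actually have running:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HttpQueryDemo {
    public static void main(String[] args) throws Exception {
        // Assumes a Solr server is already listening at this address.
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // The same query as above, sent as an HTTP request instead of in-process.
        QueryResponse response = server.query(new SolrQuery("solr"));
        System.out.println("Found: " + response.getResults().getNumFound());
    }
}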

Now let's walk through the code. First we set a system property named solr.solr.home; yes, you read that right, solr appears twice. Next we initialize the core container so it loads the configuration files under solr home. The code that follows builds the parameter list.

The constructed parameter list looks like this: facet.date.start=2004-01-01T00%3A00%3A00Z&indent=on&facet.date=manufacturedate_dt&hl.fl=name&facet.date.gap=%2B1YEAR&wt=xml&facet.date.end=2010-01-01T00%3A00%3A00Z&facet=true&fl=name%2Cprice%2Cscore&q=solr&sort=price+asc&hl=true

It is not exactly what we would type into the browser address bar, because it is URL-encoded. Once the query is built, we can run it through solrServer.

Note that what the program receives is not the XML we see in the browser: SolrJ has already parsed the response into Java objects, whose printed form looks JSON-like. That is actually convenient, because it makes the next step, pulling fields out, much easier.

The loop at the end of the program parses the returned documents and should be easy to follow.
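
As a side note, SolrJ also exposes typed accessors, so the manual cast of the "response" entry is not strictly required; a small sketch that could replace the parsing loop (it assumes the same queryResponse as above and additionally imports java.util.List):

// Iterate the result documents through the typed SolrDocumentList.
for (SolrDocument doc : queryResponse.getResults()) {
    System.out.println(doc.getFieldValue("name") + ":" + doc.getFieldValue("price"));
}
// Highlighting fragments, keyed by document id and then by field name.
Map<String, Map<String, List<String>>> highlighting = queryResponse.getHighlighting();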

2) Next, let's write a program that builds the index ourselves instead of using post.jar.

The program code is as follows:

import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.StandardDirectoryFactory;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.SolrIndexWriter;
import org.xml.sax.SAXException;

public class IndexDemo {

    public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException {

        System.setProperty("solr.solr.home", "E:\\solrindex");

        // The following three lines load the configuration files.
        SolrConfig solrConfig = new SolrConfig("E:\\solrindex\\conf\\solrconfig.xml");
        FileInputStream fis = new FileInputStream("E:\\solrindex\\conf\\schema.xml");
        IndexSchema indexSchema = new IndexSchema(solrConfig, "solrconfig", fis);

        SolrIndexWriter siw = new SolrIndexWriter("solrindex", "E:\\solrindex", new StandardDirectoryFactory(),
                true, indexSchema);
        Document document = new Document();
        document.add(new Field("text", "test", Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        document.add(new Field("test_t", "test again", Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        siw.addDocument(document);

        siw.commit();
        siw.close();

        SolrCore solrCore = new SolrCore("E:\\solrindex", indexSchema);

        SolrIndexSearcher sis = new SolrIndexSearcher(solrCore, indexSchema, "solrindex",
                new StandardDirectoryFactory().open("E:\\solrindex"), true);
        // The search term here was a Chinese word in the original post and was lost in
        // translation; substitute a term that actually occurs in the indexed text.
        TopDocs docs = sis.search(new TermQuery(new Term("test_t", "")), 1);

        System.out.println("Found " + docs.totalHits + " results");

        for (int i = 0; i < docs.scoreDocs.length; i++) {
            System.out.println(sis.doc(docs.scoreDocs[i].doc).get("test_t"));
        }
    }
}


The code should not be hard to follow, so only light comments are given. The key part is loading the configuration files; after that we add a document to the index and then search it. Deletion is comparatively simple, so the code is given directly.

solrServer.deleteById("SOLR1000");

Or

solrServer.deleteByQuery(queryString);  // deletes every document matching queryString

Both are quite simple; just remember to call solrServer.commit() afterwards so the deletion becomes visible to searchers.
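
Putting it together, a minimal deletion round-trip against the embedded server from the first example might look like this (the deleteByQuery string is only a hypothetical example, and the id SOLR1000 comes from Solr's bundled sample data):

solrServer.deleteById("SOLR1000");      // delete by the schema's unique key
solrServer.deleteByQuery("name:DDR");   // or delete everything matching a query (hypothetical query)
solrServer.commit();                    // deletions take effect only after a commit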

3) Next let's talk about Chinese word segmentation, which you may well need in a project. There are many Chinese analyzers, including IK, Paoding, and mmseg4j, as well as the one from the Chinese Academy of Sciences. Personally, though, I suggest IK or mmseg4j, since both support Solr directly; for Paoding you have to write a class that inherits from BaseTokenizerFactory yourself and then configure it, as sketched below.
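
As a rough skeleton of what such a factory could look like, here is a sketch assuming the Solr 1.4-era org.apache.solr.analysis.BaseTokenizerFactory API; Lucene's WhitespaceTokenizer stands in where the real Paoding tokenizer would be returned:

import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;

public class PaodingTokenizerFactory extends BaseTokenizerFactory {
    public Tokenizer create(Reader input) {
        // Return the Paoding-backed tokenizer here; WhitespaceTokenizer is only
        // a placeholder so that this sketch compiles on its own.
        return new WhitespaceTokenizer(input);
    }
}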

The indexing example above used Chinese text for the search term. If you cannot find any results, that is normal, because no Chinese analyzer has been configured yet. Change the Chinese text to English, run it again, and the document will be found.

To add Chinese word segmentation, we need to work on schema.xml. Find the types section, locate the field type you want segmented (for example the text type), and configure its analyzers to use the Chinese tokenizer:

<analyzer type="index">
  <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
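
For orientation, these two analyzer blocks sit inside a field type definition in schema.xml; a minimal sketch, assuming the stock text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- index-time tokenizer and filters, as above -->
  </analyzer>
  <analyzer type="query">
    <!-- query-time tokenizer and filters, as above -->
  </analyzer>
</fieldType>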


You may not understand the filter entries at first, but you must understand the tokenizer line: it configures the tokenizer factory you want to apply, and that class must inherit from BaseTokenizerFactory. Notice that each analyzer element has a type attribute, which tells Solr the phase in which to use it. If you want the same analysis at both index time and query time, leave type unspecified and Solr will use the one analyzer for both, as the snippet below shows. Once the configuration is done, you can test the Chinese word segmentation: re-index the earlier example with Chinese text and query it again.
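
For example, a single analyzer element without a type attribute is applied at both index time and query time; a minimal sketch:

<analyzer>
  <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>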

The query found the document, which proves that our Chinese word segmentation is working.
