Apache Solr Initial Experience 4


We covered Solr's basic usage and configuration files in the previous posts; now we begin the real code journey.

1) Let's start with a simple program:

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.FacetParams;
import org.apache.solr.common.params.MapSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.core.CoreContainer;
import org.xml.sax.SAXException;

public class QueryDemo {

    public static void main(String[] args) throws SolrServerException, IOException,
            ParserConfigurationException, SAXException {

        // Set solr home. Note that the system property is named solr.solr.home.
        System.setProperty("solr.solr.home", "E:\\solr");
        // Initialize the core container so it loads the configuration files under solr home.
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = initializer.initialize();

        EmbeddedSolrServer solrServer = new EmbeddedSolrServer(coreContainer, "");

        // Build the parameter list.
        SolrQuery solrQuery = new SolrQuery();
        Map<String, String> map = new HashMap<String, String>();
        map.put(FacetParams.FACET_DATE, "manufacturedate_dt");
        map.put(FacetParams.FACET_DATE_START, "2004-01-01T00:00:00Z");
        map.put(FacetParams.FACET_DATE_END, "2010-01-01T00:00:00Z");
        map.put(FacetParams.FACET_DATE_GAP, "+1YEAR");
        map.put("indent", "on");
        map.put("wt", "xml");
        map.put("hl.fl", "name");
        SolrParams params = new MapSolrParams(map);
        solrQuery.add(params);
        solrQuery.setFacet(true);
        solrQuery.setFields("name,price,score");
        solrQuery.setQuery("solr");
        solrQuery.setSortField("price", SolrQuery.ORDER.asc);
        solrQuery.setHighlight(true);

        System.out.println(solrQuery.toString());

        QueryResponse queryResponse = solrServer.query(solrQuery);
        System.out.println(queryResponse.toString());
        System.out.println("Found in total: " + queryResponse.getResults().getNumFound() + " results");

        // Parse the returned response.
        SolrDocumentList sdl = (SolrDocumentList) queryResponse.getResponse().get("response");
        for (int i = 0; i < sdl.size(); i++) {
            Object obj = sdl.get(i).get("manufacturedate_dt");
            String date = "";
            if (obj != null) {
                // The field comes back as a java.util.Date.
                date = new SimpleDateFormat("yyyy-MM-dd").format((Date) obj);
            }
            System.out.println(((SolrDocument) sdl.get(i)).get("name") + ":" + date + ":" + sdl.get(i).get("price"));
        }
    }
}


In this example we use EmbeddedSolrServer, which runs Solr embedded in the same JVM. We do not need to expose an external service here, so it fits our purpose. There is also a CommonsHttpSolrServer class, which talks to a running Solr instance over HTTP; you can use it, for example, to send queries to a remote server.
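
For comparison, here is a minimal sketch of the HTTP variant; the URL is an assumption on my part and must point at a Solr instance you actually have running:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HttpQueryDemo {
    public static void main(String[] args) throws Exception {
        // Assumes a Solr server is already listening at this address.
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // The same query as above, sent as an HTTP request instead of in-process.
        QueryResponse response = server.query(new SolrQuery("solr"));
        System.out.println("Found: " + response.getResults().getNumFound());
    }
}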

Now let's walk through the code. First we set a system property named solr.solr.home; yes, you read that right, solr appears twice. Next we initialize the core container so it loads the configuration files under solr home. The code that follows builds the parameter list.

The constructed parameter list looks like this: facet.date.start=2004-01-01T00%3A00%3A00Z&indent=on&facet.date=manufacturedate_dt&hl.fl=name&facet.date.gap=%2B1YEAR&wt=xml&facet.date.end=2010-01-01T00%3A00%3A00Z&facet=true&fl=name%2Cprice%2Cscore&q=solr&sort=price+asc&hl=true

It is not exactly what we would type into the browser address bar, because it is URL-encoded. Once the query is built, we can run it through solrServer.

Note that what the program receives is not the XML we see in the browser: SolrJ has already parsed the response into Java objects, whose printed form looks JSON-like. That is actually convenient, because it makes the next step, pulling fields out, much easier.

The loop at the end of the program parses the returned documents and should be easy to follow.
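
As a side note, SolrJ also exposes typed accessors, so the manual cast of the "response" entry is not strictly required; a small sketch that could replace the parsing loop (it assumes the same queryResponse as above and additionally imports java.util.List):

// Iterate the result documents through the typed SolrDocumentList.
for (SolrDocument doc : queryResponse.getResults()) {
    System.out.println(doc.getFieldValue("name") + ":" + doc.getFieldValue("price"));
}
// Highlighting fragments, keyed by document id and then by field name.
Map<String, Map<String, List<String>>> highlighting = queryResponse.getHighlighting();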

2) Next, let's write a program that builds the index ourselves instead of using post.jar.

The program code is as follows:

import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.StandardDirectoryFactory;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.SolrIndexWriter;
import org.xml.sax.SAXException;

public class IndexDemo {

    public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException {

        System.setProperty("solr.solr.home", "E:\\solrindex");

        // The following three lines load the configuration files.
        SolrConfig solrConfig = new SolrConfig("E:\\solrindex\\conf\\solrconfig.xml");
        FileInputStream fis = new FileInputStream("E:\\solrindex\\conf\\schema.xml");
        IndexSchema indexSchema = new IndexSchema(solrConfig, "solrconfig", fis);

        SolrIndexWriter siw = new SolrIndexWriter("solrindex", "E:\\solrindex", new StandardDirectoryFactory(),
                true, indexSchema);
        Document document = new Document();
        document.add(new Field("text", "test", Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        document.add(new Field("test_t", "test again", Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        siw.addDocument(document);

        siw.commit();
        siw.close();

        SolrCore solrCore = new SolrCore("E:\\solrindex", indexSchema);

        SolrIndexSearcher sis = new SolrIndexSearcher(solrCore, indexSchema, "solrindex",
                new StandardDirectoryFactory().open("E:\\solrindex"), true);
        // The search term here was a Chinese word in the original post and was lost in
        // translation; substitute a term that actually occurs in the indexed text.
        TopDocs docs = sis.search(new TermQuery(new Term("test_t", "")), 1);

        System.out.println("Found " + docs.totalHits + " results");

        for (int i = 0; i < docs.scoreDocs.length; i++) {
            System.out.println(sis.doc(docs.scoreDocs[i].doc).get("test_t"));
        }
    }
}


The code should not be hard to follow, so only light comments are given. The key part is loading the configuration files; after that we add a document to the index and then search it. Deletion is comparatively simple, so the code is given directly.

solrServer.deleteById("SOLR1000");

Or

solrServer.deleteByQuery(queryString);  // deletes every document matching queryString

Both are quite simple; just remember to call solrServer.commit() afterwards so the deletion becomes visible to searchers.
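
Putting it together, a minimal deletion round-trip against the embedded server from the first example might look like this (the deleteByQuery string is only a hypothetical example, and the id SOLR1000 comes from Solr's bundled sample data):

solrServer.deleteById("SOLR1000");      // delete by the schema's unique key
solrServer.deleteByQuery("name:DDR");   // or delete everything matching a query (hypothetical query)
solrServer.commit();                    // deletions take effect only after a commit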

3) Next let's talk about Chinese word segmentation, which you may well need in a project. There are many Chinese analyzers, including IK, Paoding, and mmseg4j, as well as the one from the Chinese Academy of Sciences. Personally, though, I suggest IK or mmseg4j, since both support Solr directly; for Paoding you have to write a class that inherits from BaseTokenizerFactory yourself and then configure it, as sketched below.
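
As a rough skeleton of what such a factory could look like, here is a sketch assuming the Solr 1.4-era org.apache.solr.analysis.BaseTokenizerFactory API; Lucene's WhitespaceTokenizer stands in where the real Paoding tokenizer would be returned:

import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;

public class PaodingTokenizerFactory extends BaseTokenizerFactory {
    public Tokenizer create(Reader input) {
        // Return the Paoding-backed tokenizer here; WhitespaceTokenizer is only
        // a placeholder so that this sketch compiles on its own.
        return new WhitespaceTokenizer(input);
    }
}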

The indexing example above used Chinese text for the search term. If you cannot find any results, that is normal, because no Chinese analyzer has been configured yet. Change the Chinese text to English, run it again, and the document will be found.

To add Chinese word segmentation, we need to work on schema.xml. Find the types section, locate the field type you want segmented (for example the text type), and configure its analyzers to use the Chinese tokenizer:

<analyzer type="index">
  <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
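
For orientation, these two analyzer blocks sit inside a field type definition in schema.xml; a minimal sketch, assuming the stock text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- index-time tokenizer and filters, as above -->
  </analyzer>
  <analyzer type="query">
    <!-- query-time tokenizer and filters, as above -->
  </analyzer>
</fieldType>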


You may not understand the filter entries at first, but you must understand the tokenizer line: it configures the tokenizer factory you want to apply, and that class must inherit from BaseTokenizerFactory. Notice that each analyzer element has a type attribute, which tells Solr the phase in which to use it. If you want the same analysis at both index time and query time, leave type unspecified and Solr will use the one analyzer for both, as the snippet below shows. Once the configuration is done, you can test the Chinese word segmentation: re-index the earlier example with Chinese text and query it again.
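
For example, a single analyzer element without a type attribute is applied at both index time and query time; a minimal sketch:

<analyzer>
  <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>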

The query found the document, which proves that our Chinese word segmentation is working.
