schema.xml configuration and SolrJ usage
This article describes how to build a Solr runtime environment and perform word segmentation on Chinese query strings. It focuses mainly on the configuration of schema.xml and on how to use SolrJ.
For a search application, the most important thing is to understand its overall architecture. Solr is a Lucene-based full-text search server. It extends Lucene with a richer query language, is configurable and scalable, optimizes query performance, and provides a complete administrative interface. Its execution process, however, is essentially the same as Lucene's.
(Figure: the typical components of a search application; the shaded parts are handled by Lucene.)
Let's start with schema.xml. It is roughly equivalent to a database table configuration file: it defines the data types of the data to be indexed, and mainly contains type definitions, field definitions, and other default settings.
1) First, define a fieldType child node under the types node, with attributes such as name, class, and positionIncrementGap. name is the name of the fieldType, and class points to a class in the org.apache.solr.analysis package that defines the behavior of this type. The most important part of a fieldType definition is the analyzer used to index and query data of this type, including its tokenizer and filters. An earlier article details how to add a Chinese tokenizer; see http://3961409.blog.51cto.com/3951409/833417.
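As an illustration, a fieldType for Chinese text might be declared as follows. This is only a sketch: the tokenizer class shown (mmseg4j's MMSegTokenizerFactory, whose "complex" mode matches the textComplex type used later in this article) is an assumption and depends on which Chinese tokenizer you installed.

```xml
<types>
  <!-- Sketch of a fieldType for Chinese full text; the tokenizer class is an assumption -->
  <fieldType name="textComplex" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- The tokenizer performs the Chinese word segmentation -->
      <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex"/>
      <!-- Filters post-process the token stream -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
```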
2) Next, define concrete fields (similar to columns in a database table) under the fields node. A field definition includes name, type (one of the fieldTypes defined earlier), indexed, stored, multiValued, and so on.
Example:
- <field name="id" type="string" indexed="true" stored="true" required="true" />
- <field name="ant_title" type="textComplex" indexed="true" stored="true" />
- <field name="ant_content" type="textComplex" indexed="true" stored="true" />
- <field name="all" type="textComplex" indexed="true" stored="false" multiValued="true"/>
Field definitions matter a great deal, and there are a few tips worth noting: if a field may hold multiple values, set its multiValued attribute to true to avoid errors during index creation; and if you do not need to store a field's value, set its stored attribute to false whenever possible.
3) We recommend creating a copy field that gathers all full-text fields into a single field for unified search. (With this setup, querying all:Jason is equivalent to querying ant_title:Jason or ant_content:Jason.)
- <field name="all" type="textComplex" indexed="true" stored="false" multiValued="true"/>
Then complete the copy settings with copyField nodes:
- <copyField source="ant_title" dest="all"/>
- <copyField source="ant_content" dest="all"/>
4) You can also define dynamic fields. A dynamic field does not require a specific name; instead you define a naming rule. For example, if you declare a dynamicField named *_i with type text, then any field whose name ends in _i, such as name_i, gender_i, or school_i, is treated as matching this definition.
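The rule described above might be declared like this (the indexed and stored values here are illustrative):

```xml
<!-- Any field whose name ends in _i, e.g. name_i or school_i, matches this rule -->
<dynamicField name="*_i" type="text" indexed="true" stored="true"/>
```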
That is basically the schema.xml configuration file. For more details, see the Solr wiki: http://wiki.apache.org/solr/schemaxml.
The following steps operate on the index using SolrJ:
1) Create a project and add the following jar files (see http://wiki.apache.org/solr/Solrj):
From solr/dist:
- apache-solr-solrj-x.x.x.jar
- apache-solr-core-x.x.x.jar
From solr/dist/solrj-lib:
- commons-codec-1.3.jar
- commons-httpclient-3.1.jar
- commons-io-1.4.jar
- jcl-over-slf4j-1.5.5.jar
- slf4j-api-1.5.5.jar
2) Create a test class:
- package cn.edu.ccut.blackant;
-
- import java.io.IOException;
- import java.net.MalformedURLException;
-
- import org.apache.solr.client.solrj.SolrServerException;
- import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
- import org.apache.solr.common.SolrInputDocument;
- import org.junit.Test;
-
- public class SolrTest {
-     @Test
-     public void test() {
-         final String url = "http://localhost:8080/solr";
-         try {
-             // Create a SolrServer object (CommonsHttpSolrServer)
-             CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
-             SolrInputDocument doc = new SolrInputDocument();
-             // The id field is required; its value type depends on the id type declared in schema.xml
-             doc.addField("id", "2");
-             doc.addField("ant_title", "atitle");
-             doc.addField("ant_content", "Jason");
-             server.add(doc);
-             server.commit();
-         } catch (MalformedURLException e) {
-             e.printStackTrace();
-         } catch (SolrServerException e) {
-             e.printStackTrace();
-         } catch (IOException e) {
-             e.printStackTrace();
-         }
-     }
- }
Add JUnit to the project: right-click the project and choose Add Library > JUnit > JUnit 4 > Finish.
3) Run the test class (check the console or the Tomcat log files for runtime information).
You can use Luke to inspect the result. Choose a Luke version that matches your Solr version; Solr 3.5 is used here, so Luke must also be version 3.5:
http://code.google.com/p/luke/downloads/detail?name=lukeall-3.5.0.jar
Usage:
3.1) Change to the directory containing the jar file.
3.2) Launch the tool from the command line: java -jar ./lukeall-3.5.0.jar
(Screenshot: the Luke main window.)
Note that you must specify the path to the Solr index files, here /home/Jason/SOLR-Tomcat/SOLR/data/index. After specifying the path, if the run succeeded, you will see the newly generated index (shown in the lower right corner). If the id value in the program stays the same, the document with id 2 is overwritten on every run, which is how you update an index entry.
4) Open http://127.0.0.1:8080/solr/admin/ and run the query *:* (match all documents). If the results contain the document added by the program, congratulations!
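The same check can be done programmatically through SolrJ. Below is a minimal sketch, assuming the same server URL as the indexing example and SolrJ 3.x on the classpath; it needs a running Solr instance, so it is not runnable standalone. The all:Jason query relies on the copy field configured earlier.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrQueryExample {
    public static void main(String[] args) throws Exception {
        // Same server URL as in the indexing test class (an assumption about your setup)
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
        // Search the "all" copy field, so ant_title and ant_content are both covered
        SolrQuery query = new SolrQuery("all:Jason");
        QueryResponse response = server.query(query);
        // Print the id and title of each matching document
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("ant_title"));
        }
    }
}
```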