Apache SOLR Configuration


SOLR Configuration

Solr's main function is full-text search, which divides into two processes: creating the index and searching the index. Before creating an index, two configuration files deserve attention: solr_home/collection1/conf/schema.xml, which defines the structure of a document (much as a table definition describes a database table), and solrconfig.xml, which holds Solr's runtime configuration, such as how requests are handled.

While Solr creates the index, each piece of data is abstracted into a document, and each attribute of the data is abstracted into a field. Solr natively supports adding and deleting documents from files in XML, JSON, and CSV format. In reality, however, much application data is stored in a relational database or in XML files; indexing this data has to go through the Data Import Request Handler (a Solr extension), which provides a full import (index all data) and a delta import (index only data changed after a point in time). Below, the whole process is illustrated through an example that indexes data in a MySQL database.

1. Create an index on the data in the table topic. The topic table structure is as follows:

CREATE TABLE `topic` (
  `id` INT(8) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment id',
  `title` VARCHAR(255) DEFAULT NULL COMMENT 'title',  -- length assumed
  `content` TEXT COMMENT 'content',
  `create_date` BIGINT DEFAULT NULL COMMENT 'creation time',
  `update_date` BIGINT DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARSET=utf8;

Full-text search is performed only on the title and content fields; the other fields are only displayed.

2. Define the document structure. Make the following modifications to solr_home/collection1/conf/schema.xml, adding these field definitions inside <fields>:

<!-- document fields corresponding to the columns of the topic table -->
<field name="test_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="test_title" type="text_chinese_ik" indexed="true" stored="true"/>
<field name="test_content" type="text_chinese_ik" indexed="true" stored="true"/>
<field name="test_create_date" type="long" indexed="false" stored="true"/>
<field name="test_update_date" type="long" indexed="false" stored="true"/>

Field attribute description:

name: required; the field name.
type: required; the name of a field type defined by a fieldType in <types>.
indexed: true means the field is indexed (searchable and sortable).
stored: true means the field is stored in the index and can be read back later.
multiValued: true means the field may hold more than one value within a document.
required: whether the field must have a value; if the field is empty during indexing, an error is reported.
default: the default value.
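To make the document/field abstraction concrete, here is a minimal SolrJ sketch, not from the original article, that adds one document using the fields defined above; the server URL and the field values are illustrative assumptions:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddTopicDocument {
    public static void main(String[] args) throws Exception {
        // Illustrative URL; adjust host/port to your deployment.
        SolrServer server = new HttpSolrServer("http://127.0.0.1:8080/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("test_id", "1");                    // the uniqueKey field (see below)
        doc.addField("test_title", "sample title");      // analyzed by text_chinese_ik, defined next
        doc.addField("test_content", "sample content");
        doc.addField("test_create_date", System.currentTimeMillis());
        doc.addField("test_update_date", System.currentTimeMillis());
        server.add(doc);   // queue the document for indexing
        server.commit();   // make it visible to searches
    }
}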
Add a fieldType definition. Because the index must support Chinese search, a Chinese analyzer is used at indexing time. I use IK Analyzer; the IK Analyzer 2012FF_HF1 release supports Solr 4. The field definitions above use the type text_chinese_ik, which is not one of Solr's predefined types, so a definition of that type, with Chinese analysis, must be added to <types>:

<fieldType name="text_chinese_ik" class="solr.TextField">
  <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

org.wltea.analyzer.lucene.IKAnalyzer performs the Chinese analysis for both indexing and querying. For it to work, IKAnalyzer2012FF_u1.jar from IK Analyzer, together with the stopword.txt and IKAnalyzer.cfg.xml files, must be copied to tomcat_home/webapps/solr/WEB-INF/lib.

Set a uniqueKey. Every document can be located by its uniqueKey, and Solr guarantees that a given uniqueKey value maps to exactly one document:

<uniqueKey>test_id</uniqueKey>

(The field used as the uniqueKey must be required.)
3. Add the DataImport request handler. Solr's REST-style API guarantees that every function can be invoked via an HTTP request; handlers such as query (/select) and index update (/update) are predefined in Solr, but the DataImporter is an extension, so its request handler must be added to solr_home/collection1/conf/solrconfig.xml. In this example:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">./data-config.xml</str>
  </lst>
</requestHandler>

org.apache.solr.handler.dataimport.DataImportHandler is the DataImporter processor (extension module). Copy solr-4.2.0/dist/solr-dataimporthandler-4.2.0.jar and solr-dataimporthandler-extras-4.2.0.jar to tomcat_home/webapps/solr/WEB-INF/lib. data-config.xml is the data source configuration file that the DataImporter uses to read data from the data source.
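Once registered, the handler can be driven from code like any other Solr endpoint. As a quick sanity check, here is a hedged SolrJ sketch, with an assumed server URL, that asks the handler for its state using DIH's status command:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DataImportStatus {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://127.0.0.1:8080/solr"); // illustrative URL
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("command", "status");   // DIH command: report importer state
        QueryRequest req = new QueryRequest(params);
        req.setPath("/dataimport");        // route to the handler registered above
        QueryResponse rsp = req.process(server);
        System.out.println(rsp.getResponse().get("status")); // e.g. "idle" or "busy"
    }
}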
4. Configure data-config.xml. This example imports data from the MySQL table topic:

<dataConfig>
  <dataSource type="JdbcDataSource"
              batchSize="-1"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/test"
              user="reader"
              password="reader"/>
  <document>
    <entity name="topic" pk="id" onError="skip"
            transformer="com.zj.transformer.MySolrTransformer"
            query="SELECT id,title,content,create_date,update_date FROM topic"
            deltaImportQuery="SELECT id,title,content,create_date,update_date FROM topic WHERE id=${dataimporter.delta.id}"
            deltaQuery="SELECT id FROM topic WHERE update_date > '${dataimporter.last_index_time}'">
      <field column="id" name="test_id"/>
      <field column="title" name="test_title"/>
      <field column="content" name="test_content"/>
      <field column="create_date" name="test_create_date"/>
      <field column="update_date" name="test_update_date"/>
    </entity>
  </document>
</dataConfig>
<dataSource> defines a data source; this example defines a JdbcDataSource. <entity> defines how data is extracted, transformed, and added to the index: name names the entity, pk is the primary key, onError defines error handling (abort | skip | continue), transformer performs data conversion (after the query executes and before the row is added to the index), query is the SQL used for a full import, deltaImportQuery is the SQL that fetches rows during a delta import, and deltaQuery is the SQL that determines which rows need a delta import. <field> maps a database column to a Solr index field: column is the database column name, name is the Solr index field name defined in schema.xml.

In this example, query="SELECT id,title,content,create_date,update_date FROM topic", so a full import adds all of the data in the topic table to the Solr index. After a full import completes, Solr automatically generates dataimport.properties to hold the most recent index start timestamp, last_index_time. With deltaImportQuery="SELECT id,title,content,create_date,update_date FROM topic WHERE id=${dataimporter.delta.id}" and deltaQuery="SELECT id FROM topic WHERE update_date > '${dataimporter.last_index_time}'", a delta import adds the topic rows whose update_date is greater than last_index_time to the index as incremental updates. (Note: ${dataimporter.delta.id} and ${dataimporter.last_index_time} are fixed notation; apart from "id", which must correspond to the column selected in deltaQuery="SELECT id ...", nothing may be changed, or the DataImporter will not resolve the values.)

The com.zj.transformer.MySolrTransformer in this example exists mainly to introduce transformers and serves no special purpose:

package com.zj.transformer;

import java.util.Map;

public class MySolrTransformer {

    public Object transformRow(Map<String, Object> row) {
        // row holds one record from the database query as <column_name, value> pairs;
        // the row may be modified here in any way before it is indexed.
        return row;
    }
}

Writing a custom transformer is very simple and completely non-intrusive: implementing the public Object transformRow(Map<String, Object> row) method is all it takes.

5. Start the indexing process. To build a full index, enter in the browser:

http://ip:port/solr/dataimport?command=full-import&commit=true

To build a delta index:

http://ip:port/solr/dataimport?command=delta-import&clean=false&commit=true

(The HTTP request that builds the delta index can also be sent by a timer; a sketch follows below.)
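Since the delta-import request is just an HTTP GET, a scheduler can issue it periodically. A minimal sketch using only the JDK; the URL and the 10-minute interval are illustrative assumptions:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DeltaImportTimer {
    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Fire a delta import every 10 minutes (interval is an arbitrary choice).
        timer.scheduleAtFixedRate(DeltaImportTimer::triggerDeltaImport, 10, 10, TimeUnit.MINUTES);
    }

    static void triggerDeltaImport() {
        try {
            // Illustrative host/port; same URL pattern as in step 5 above.
            URL url = new URL("http://127.0.0.1:8080/solr/dataimport?command=delta-import&clean=false&commit=true");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            System.out.println("delta-import triggered, HTTP " + conn.getResponseCode());
            conn.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}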
6. Query the index.

(1) Query directly through the Solr query page: http://ip:port/solr/#/collection1/query

(2) Query via the SolrJ API. Copy solr-4.2.0/dist/solr-solrj-4.2.0.jar and, from solr-4.2.0/dist/solrj-lib/, httpclient-4.2.3.jar, httpcore-4.2.2.jar, and httpmime-4.2.3.jar into the project classpath, then create the following code:

package com.mobcent.searcher.solr.searcher;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class CopyOfSolrSearcher {

    public static void main(String[] args) {
        SolrServer server = new HttpSolrServer("http://127.0.0.1:8080/solr");
        ((HttpSolrServer) server).setSoTimeout(3000);
        ((HttpSolrServer) server).setConnectionTimeout(3000);
        ((HttpSolrServer) server).setMaxTotalConnections(100);
        ((HttpSolrServer) server).setDefaultMaxConnectionsPerHost(100);

        SolrQuery query = new SolrQuery();
        // Set the search keyword.
        query.setQuery("keyword");
        // Set a filter query.
        query.addFilterQuery("field:value");
        // Set paging: start offset and rows per page.
        query.setStart(0);
        query.setRows(10);

        QueryResponse queryResponse;
        try {
            queryResponse = server.query(query);
            SolrDocumentList docList = queryResponse.getResults();
            if (null != docList)
                System.out.println("Find total: " + docList.getNumFound());
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
    }
}

The above is a concrete example of configuring Solr, covering the whole process of using Solr, from creating an index to searching it. By the way, Solr's wiki is a good place to learn Solr.
