Index MySQL data using Solr


Environment setup

1. Download Solr from Apache: http://mirrors.hust.edu.cn/apache/lucene/solr/

2. Unzip to a directory

3. cd into D:\Solr\solr-4.10.3\example

4. Start the server with "java -jar start.jar"; Solr will run on its bundled Jetty

5. Visit http://localhost:8983/solr/#/

Note: from Solr 5.0 on, schema management defaults to a managed-schema that cannot be edited by hand; changes must go through the Schema REST API. If you want to maintain the schema manually, copy managed-schema to schema.xml and make the following change in solrconfig.xml:

<!--
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
-->
<!--
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.util.Date</str>
    <str name="fieldType">tdates</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">tlongs</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdoubles</str>
  </lst>
</processor>
-->
<schemaFactory class="ClassicIndexSchemaFactory"/>
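For reference, if you keep the managed schema instead, fields are changed through the Schema API. An illustrative call for Solr 5.x (the field definition here is just an example):

curl -X POST -H 'Content-type:application/json' \
  --data-binary '{"add-field":{"name":"userName","type":"text_general","indexed":true,"stored":true}}' \
  http://localhost:8983/solr/collection1/schema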

Create MySQL data

Database name: mybatis

Table name: user

db.sql

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for `user`
-- ----------------------------
DROP TABLE IF EXISTS `user`;

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userName` varchar(50) DEFAULT NULL,
  `userAge` int(11) DEFAULT NULL,
  `userAddress` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of user
-- (the age values below are illustrative; they were garbled in the source)
-- ----------------------------
INSERT INTO `user` VALUES ('1', 'Summer', '25', 'Shanghai');
INSERT INTO `user` VALUES ('2', 'test1', '26', 'Suzhou');
INSERT INTO `user` VALUES ('3', 'test1', '27', 'some place');
INSERT INTO `user` VALUES ('4', 'Lu', '28', 'some place');
INSERT INTO `user` VALUES ('5', 'Xiaoxun', '29', 'Nanjing');

Import and index data using DataImportHandler

1) Configure D:\Solr\solr-4.10.3\example\solr\collection1\conf\solrconfig.xml

Add a handler for dataimport in front of the <requestHandler name="/select" class="solr.SearchHandler"> element:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

2) Add data-config.xml in the same directory:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/mybatis"
              user="root" password="luxx" batchSize="-1"/>
  <document name="testDoc">
    <entity name="user" pk="id" query="select * from user">
      <field column="id" name="id"/>
      <field column="userName" name="userName"/>
      <field column="userAge" name="userAge"/>
      <field column="userAddress" name="userAddress"/>
    </entity>
  </document>
</dataConfig>

Description

dataSource is the database data source.

entity corresponds to a table; pk is its primary key, and query is the SQL statement that fetches the data.

Each field maps one column: column is the column name in the database, and name must match the name of the corresponding Solr field.

3) Modify schema.xml in the same directory; this is the schema Solr uses when indexing the database data.

(1) Keep the _version_ field.

(2) Add the indexed fields. Each field name here is the same as the corresponding field name of the entity in data-config.xml:

<field name= "id" type= "int" indexed= "true" stored= "true" required= "true" multivalued= "false"/><!--<field Name= "id" type= "int" indexed= "true" stored= "true" required= "true" multivalued= "false"/>--><field name= " UserName "type=" Text_general "indexed=" true "stored=" true "/><field name=" userage "type=" int "indexed=" true " Stored= "true"/><field name= "useraddress" type= "Text_general" indexed= "true" stored= "true"/>

(3) Delete the extra fields and the copyField settings; they are not used here. Note: the text field cannot be deleted, or Solr will fail to start.

<field name= "text" type= "Text_general" indexed= "true" stored= "false" multivalued= "true"/>

(4) Set the unique primary key: <uniqueKey>id</uniqueKey>. Note: Solr only supports a string type for the index's unique key, while the id in my database is an int, which causes a problem. Workaround: edit elevate.xml in the same directory and comment out the following two lines. This seems to be a Solr bug; the reason is unclear.

<doc id="MA147LL/A"/>
<doc id="IW-02" exclude="true"/>

4) Copy mysql-connector-java-5.1.22-bin.jar and solr-dataimporthandler-4.10.3.jar into D:\Solr\solr-4.10.3\example\solr-webapp\webapp\WEB-INF\lib. The first is the MySQL JDBC driver; the second, found in the D:\Solr\solr-4.10.3\dist directory, is the jar containing org.apache.solr.handler.dataimport.DataImportHandler.

Restart Solr.

If everything is configured correctly, it starts successfully.

solrconfig.xml is Solr's basic configuration file; it sets up the various request handlers, response writers, logging, caches, and so on.

schema.xml configures the index schema: the mapping of data types to index fields, the analyzer configuration, and the fields contained in the indexed documents.

Index test

Go to the Solr admin page and select collection1 in the core selector: http://localhost:8983/solr/#/collection1

Click Dataimport, choose full-import as the command (the default), click "Execute", and refresh the status to see the result:

Indexing completed. Added/updated: 7 documents. Deleted 0 documents.

Requests: 1, Fetched: 7, Skipped: 0, Processed: 7

Started: 8 minutes ago

Query test: enter userName:test1 in the q field and you will see the result.
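The same search can also be issued directly against the /select handler; an illustrative URL (assuming the default collection1 core):

http://localhost:8983/solr/collection1/select?q=userName:test1&wt=json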

This full-import indexes all the data in the configured database, after which Solr can answer queries against it.

Indexing and retrieving data with SolrJ

The previous section tested indexing and retrieval through the Solr admin page; you can also drive Solr from code. The code below adds a User entity to the Solr index and then returns all documents by querying the whole index.

User entity class:

package com.mybatis.test.model;

import org.apache.solr.client.solrj.beans.Field;

public class User {

    @Field
    private int id;

    @Field
    private String userName;

    @Field
    private int userAge;

    @Field
    private String userAddress;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }

    public int getUserAge() {
        return userAge;
    }

    public void setUserAge(int userAge) {
        this.userAge = userAge;
    }

    public String getUserAddress() {
        return userAddress;
    }

    public void setUserAddress(String userAddress) {
        this.userAddress = userAddress;
    }

    @Override
    public String toString() {
        return this.userName + " " + this.userAge + " " + this.userAddress;
    }
}

Properties annotated with @Field correspond to the fields configured in Solr.

Test code:

package com.solr.test;

import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrDocumentList;

import com.mybatis.test.model.User;

public class SolrTest {

    private static SolrServer server;
    private static final String DEFAULT_URL = "http://localhost:8983/solr/collection1";

    public static void init() {
        server = new HttpSolrServer(DEFAULT_URL);
    }

    // Add a User bean to the index library
    public static void indexUser(User user) {
        try {
            UpdateResponse response = server.addBean(user);
            server.commit();
            System.out.println(response.getStatus());
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
    }

    // Test adding a new bean instance to the index
    public static void testIndexUser() {
        User user = new User();
        user.setId(8);
        user.setUserAddress("place");
        user.setUserName("CDEBCDCCGA");
        user.setUserAge(83);
        indexUser(user);
    }

    public static void testQueryAll() {
        SolrQuery params = new SolrQuery();
        // Query keywords; *:* means all fields, all values, i.e. the whole index
        params.set("q", "*:*");
        // Paging: start=0 begins at the first record; rows is the page size
        // (for the second page of 5 records, change start to 5)
        params.set("start", 0);
        params.set("rows", Integer.MAX_VALUE);
        // Sorting: to sort by id, change "score desc" to "id desc" (or "id asc")
        // params.set("sort", "score desc");
        params.set("sort", "id asc");
        // Fields to return: * means all stored fields; score must be added
        // explicitly, otherwise it cannot be used below
        params.set("fl", "*,score");

        QueryResponse response = null;
        try {
            response = server.query(params);
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
        if (response != null) {
            System.out.println("Search results:");
            SolrDocumentList list = response.getResults();
            for (int i = 0; i < list.size(); i++) {
                System.out.println(list.get(i));
            }
        }
    }

    public static void main(String[] args) {
        init();
        testIndexUser();
        testQueryAll();
    }
}

If you add a row to the database but Solr has not indexed it, the row cannot be found. So when using Solr to search database content, insert into the database first and then index the data in Solr; after that you can use Solr's fuzzy queries and analyzers to search the database content.
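As a concrete illustration of that order of operations, here is a minimal sketch that inserts a row over JDBC and then indexes the same record through SolrJ. It reuses the User bean and connection details from the examples above; the class name InsertThenIndex and the inserted values are mine.

package com.solr.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.solr.client.solrj.impl.HttpSolrServer;

import com.mybatis.test.model.User;

public class InsertThenIndex {

    public static void main(String[] args) throws Exception {
        // 1) Insert the record into MySQL first.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1:3306/mybatis", "root", "luxx");
        PreparedStatement ps = conn.prepareStatement(
                "insert into user (userName, userAge, userAddress) values (?, ?, ?)",
                Statement.RETURN_GENERATED_KEYS);
        ps.setString(1, "newUser");   // example values
        ps.setInt(2, 30);
        ps.setString(3, "Beijing");
        ps.executeUpdate();

        // Fetch the auto-generated id so the Solr document gets the same key.
        ResultSet keys = ps.getGeneratedKeys();
        keys.next();
        int id = keys.getInt(1);
        ps.close();
        conn.close();

        // 2) Then index the same record in Solr so it becomes searchable.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        User user = new User();
        user.setId(id);
        user.setUserName("newUser");
        user.setUserAge(30);
        user.setUserAddress("Beijing");
        server.addBean(user);
        server.commit();
        server.shutdown();
    }
}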

DIH incremental import from a MySQL database

The previous sections showed how to import the full MySQL data set. A full import is very expensive when the data volume is large, so importing incrementally is generally more suitable. The following describes how to import MySQL data incrementally, and how to schedule the import.

1) Changes to the database table

The user table created in the previous section needs a new field, updateTime, of type timestamp, with CURRENT_TIMESTAMP as its default value, so that incremental import is possible.
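A minimal sketch of that change against the user table from db.sql; the ON UPDATE clause is an addition of mine so that updated rows also get a fresh timestamp and are picked up by the delta import:

ALTER TABLE `user`
  ADD COLUMN `updateTime` timestamp NOT NULL
    DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;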

With such a field, Solr can determine which data is new when it runs an incremental import.

Solr itself keeps a value named last_index_time, the time of the last full-import or delta-import; it is stored in the dataimport.properties file under the core's conf directory.
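For reference, that file is a plain Java properties file written by Solr; an illustrative example of its contents (the timestamps will differ):

#Tue Jul 21 10:15:00 CST 2015
last_index_time=2015-07-21 10\:15\:00
user.last_index_time=2015-07-21 10\:15\:00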

2) Required attributes in data-config.xml

transformer: format conversion; for example, HTMLStripTransformer ignores HTML tags when indexing.

query: queries the database table for the records to index.

deltaQuery: the incremental-index query for primary key ids. Note: it should return only the id field.

deltaImportQuery: the incremental-index query that fetches the data to import.

deletedPkQuery: the incremental-index query for the primary key ids of deleted records. Note: it can only return the id field.

For an explanation of query, deltaImportQuery, and deltaQuery, the official site describes them as follows:
The query gives the data needed to populate fields of the Solr document in full-import
The deltaImportQuery gives the data needed to populate fields when running a delta-import
The deltaQuery gives the primary keys of the current entity which have changes since the last index time

If you need to join against a child table, you may also need to use parentDeltaQuery:

The parentDeltaQuery uses the changed rows of the current table (fetched with deltaQuery) to give the changed rows in the parent table. This is necessary because whenever a row in the child table changes, we need to re-generate the document which has that field.

For more instructions, see the DataImportHandler documentation.

For the user table, data-config.xml is configured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/mybatis"
              user="root" password="luxx" batchSize="-1"/>
  <document name="testDoc">
    <entity name="user" pk="id"
            query="select * from user"
            deltaImportQuery="select * from user where id='${dih.delta.id}'"
            deltaQuery="select id from user where updateTime > '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="userName" name="userName"/>
      <field column="userAge" name="userAge"/>
      <field column="userAddress" name="userAddress"/>
      <field column="updateTime" name="updateTime"/>
    </entity>
  </document>
</dataConfig>

The principle of incremental indexing: the SQL specified by deltaQuery selects from the database the primary key ids of all records that need to be imported incrementally.

Then deltaImportQuery returns the full rows for all of those ids; that is the data this increment processes.

The core idea is to use the two built-in variables: ${dih.delta.id}, the id of the row currently being imported, and ${dataimporter.last_index_time}, the time of the last import.
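To make this concrete, here is an illustrative expansion of the two statements at import time (the timestamp and id are example values):

-- deltaQuery, as executed: collect the ids changed since the last import
select id from user where updateTime > '2015-07-21 10:15:00';

-- deltaImportQuery, then executed once per returned id
select * from user where id = '3';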

Note: the newly added updateTime field is configured as a field in data-config.xml above, and must also be added to schema.xml:

<field name= "UpdateTime" type= "date" indexed= "true" stored= "true"/>

If the business includes delete operations, you can add an isdeleted field to the database to indicate whether a record has been deleted; when Solr updates the index, it can then remove the indexes of deleted records based on this field.

In that case, add the following to data-config.xml:

query= "SELECT * from user where isdeleted=0" deltaimportquery= "select * from user where id= ' ${dih.delta.id} '" deltaquery= " Select ID from user where updatetime> ' ${dataimporter.last_index_time} ' and Isdeleted=0 ' deletedpkquery= ' select ID From user where isdeleted=1 "

When Solr runs an incremental import, it deletes from the index the records with isdeleted=1 in the database.
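For example, logically deleting a row in MySQL is then enough for the next delta-import to drop it from the index (a sketch, assuming the isdeleted column described above):

UPDATE `user` SET `isdeleted` = 1 WHERE `id` = 3;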

Test Incremental Import

If there is already data in the user table, you can empty the earlier test data first (those rows have no updateTime value), then add a user with my MyBatis tester; the database fills in the current time for that field. In Solr, use Query to confirm the new value is not yet searchable, run dataimport?command=delta-import, then query all documents again: the value just inserted into MySQL is now found.

Set up the incremental import task to run on a schedule

You could use Windows Scheduled Tasks or Linux cron to periodically request the incremental import URL; that works and should not cause problems.
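For instance, a cron entry along these lines would trigger a delta-import every 10 minutes (a sketch; it assumes curl is installed and the core name matches your deployment):

*/10 * * * * curl -s "http://localhost:8983/solr/collection1/dataimport?command=delta-import&clean=false&commit=true" >/dev/null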

But it is more convenient, and more tightly integrated with Solr, to use the dataimport scheduler extension described below.

1. Download apache-solr-dataimportscheduler-1.0.jar and put it in the \solr-webapp\webapp\WEB-INF\lib directory.
Download address: http://code.google.com/p/solr-dataimport-scheduler/downloads/list
It can also be downloaded from Baidu Cloud: http://pan.baidu.com/s/1dDw0MRn

Note: apache-solr-dataimportscheduler-1.0.jar has a bug; see http://www.denghuafeng.com/post-242.html

2. Modify the web.xml file under Solr's WEB-INF directory:
Add the following child element to the <web-app> element:

<listener>
  <listener-class>
    org.apache.solr.handler.dataimport.scheduler.ApplicationListener
  </listener-class>
</listener>

3. Create the configuration file dataimport.properties:

Create a new conf directory under SOLR_HOME\solr (note: not the conf under SOLR_HOME\solr\collection1). Then open apache-solr-dataimportscheduler-1.0.jar with an archive tool, copy out the dataimport.properties file inside it, and modify it. The final content of my automatic-scheduling configuration file is as follows:

#################################################
#                                               #
#       DataImport Scheduler Properties         #
#                                               #
#################################################

# to sync or not to sync
# 1 - active; anything else - inactive
syncEnabled=1

# which cores to schedule
# in a multi-core environment you can decide which cores you want synchronized
# leave empty or comment it out if using single-core deployment
# syncCores=game,resource
syncCores=collection1

# solr server name or IP address
# [defaults to localhost if empty]
server=localhost

# solr server port
# [defaults to 80 if empty]
port=8983

# application name/context
# [defaults to current ServletContextListener's context (app) name]
webapp=solr

# URL params [mandatory]
# remainder of URL
# http://localhost:8983/solr/collection1/dataimport?command=delta-import&clean=false&commit=true
params=/dataimport?command=delta-import&clean=false&commit=true

# schedule interval
# number of minutes between two runs
# [defaults to 30 if empty]
interval=1

# the interval for rebuilding the index, in minutes; default 7200, i.e. 5 days
# if empty, 0, or commented out: the index will never be rebuilt
# reBuildIndexInterval=2

# the parameters for rebuilding the index
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

# the start time of the index-rebuild interval;
# first actual run time = reBuildIndexBeginTime + reBuildIndexInterval*60*1000
# two formats: 2012-04-11 03:10:00 or 03:10:00;
# the latter automatically fills in the date part with the date the service started
reBuildIndexBeginTime=03:10:00

Here, for testing, an incremental import runs every minute, and the full-import rebuild is disabled.

4. Testing

Insert a row into the database and query for it in Solr: at first it cannot be found, but once the incremental import has indexed it, Solr finds it.

In general, there are a few things to consider when introducing Solr into a project:
1. Data update frequency: how much data is added each day; whether updates must be immediate or can be periodic
2. Total data volume: how long the data will be kept
3. Consistency requirements: how soon updated data is expected to be visible, and the maximum tolerable delay
4. Data characteristics: what the data sources include, and the average size of a single record
5. Business characteristics: what sorting and search conditions are required
6. Resource reuse: the existing hardware configuration, and whether there is an upgrade plan

Reference:

http://wiki.apache.org/solr/DataImportHandler

http://wiki.apache.org/solr/Solrj

http://www.denghuafeng.com/post-242.html


I have tested this myself without problems, using Solr 4.9.
