Indexing MySQL Data with Solr

Last updated: 2017-08-24 Source: Internet

Uploaded by: User


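Environment setup

1. Download Solr from an Apache mirror: http://mirrors.hust.edu.cn/apache/lucene/solr/

2. Extract the archive to a directory of your choice.

3. cd into D:\Solr\solr-4.10.3\example

4. Start the server with "java -jar start.jar". Solr then runs on the bundled Jetty.

5. Open http://localhost:8983/solr/#/

PS: From Solr 5.0 onward, the schema is by default managed through the managed-schema file, which should not be edited by hand; changes go through the Schema REST API. If you would rather edit the configuration manually, copy managed-schema to schema.xml and change solrconfig.xml as follows (comment out the managed schema factory and the AddSchemaFieldsUpdateProcessorFactory processor, then add ClassicIndexSchemaFactory):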

<!--
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
-->

<!--
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.util.Date</str>
    <str name="fieldType">tdates</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">tlongs</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdoubles</str>
  </lst>
</processor>
-->

<schemaFactory class="ClassicIndexSchemaFactory"/>

Creating the MySQL data

DataBase Name: mybatis

Table Name: user

Db.sql

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for `user`
-- ----------------------------
DROP TABLE IF EXISTS `user`;

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userName` varchar(50) DEFAULT NULL,
  `userAge` int(11) DEFAULT NULL,
  `userAddress` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of user
-- ----------------------------
INSERT INTO `user` VALUES ('1', 'summer', '30', 'shanghai');
INSERT INTO `user` VALUES ('2', 'test1', '22', 'suzhou');
INSERT INTO `user` VALUES ('3', 'test1', '29', 'some place');
INSERT INTO `user` VALUES ('4', 'lu', '28', 'some place');
INSERT INTO `user` VALUES ('5', 'xiaoxun', '27', 'nanjing');

Importing and indexing data with DataImportHandler

1) Configure D:\Solr\solr-4.10.3\example\solr\collection1\conf\solrconfig.xml

Just before <requestHandler name="/select" class="solr.SearchHandler">, add a request handler for dataimport:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

2) Add data-config.xml in the same directory

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/mybatis"
              user="root"
              password="luxx"
              batchSize="-1" />
  <document name="testDoc">
    <entity name="user" pk="id"
            query="select * from user">
      <field column="id" name="id"/>
      <field column="userName" name="userName"/>
      <field column="userAge" name="userAge"/>
      <field column="userAddress" name="userAddress"/>
    </entity>
  </document>
</dataConfig>

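Notes:

dataSource defines the database connection.

entity corresponds to one table: pk is the primary key, and query is the SQL statement used for a full import.

field maps one column: column is the column name in the database, and name is the name of the corresponding Solr field.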
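To make the column-to-field mapping concrete, here is a minimal sketch that parses a data-config.xml string with the JDK's DOM parser and lists which Solr field each database column feeds. It is only an illustration: the class DataConfigFields and its method are hypothetical helpers, not part of Solr or DIH, and the XML in main is a shortened copy of the configuration above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Solr or DIH.
final class DataConfigFields {

    // Returns database column -> Solr field name, in document order.
    static Map<String, String> columnToField(String dataConfigXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            dataConfigXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> mapping = new LinkedHashMap<>();
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                mapping.put(field.getAttribute("column"), field.getAttribute("name"));
            }
            return mapping;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse data-config.xml", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<dataConfig><document name=\"testDoc\">"
            + "<entity name=\"user\" pk=\"id\" query=\"select * from user\">"
            + "<field column=\"id\" name=\"id\"/>"
            + "<field column=\"userName\" name=\"userName\"/>"
            + "</entity></document></dataConfig>";
        // Prints the mapping; every name on the right must also exist in schema.xml.
        System.out.println(columnToField(xml));
    }
}
```

A check like this can be handy before restarting Solr, since a name on the right-hand side that is missing from schema.xml is a common reason for an import that runs but indexes nothing useful.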

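3) Edit schema.xml in the same directory. It defines the schema Solr uses to index the database data.

(1) Keep the _version_ field.

(2) Add the index fields. The name of each field here must match, one to one, the name of a field in the entity in data-config.xml.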

<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<!--<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false"/> -->
<field name="userName" type="text_general" indexed="true" stored="true" />
<field name="userAge" type="int" indexed="true" stored="true" />
<field name="userAddress" type="text_general" indexed="true" stored="true" />

(3) Delete the unneeded fields and the copyField settings, which are not used here. Note: do not delete the text field, otherwise Solr fails to start.

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

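(4) Set the unique key: <uniqueKey>id</uniqueKey>. Note: by default the unique key in Solr is expected to be of type="string", while the id in my database is an int, and this causes a problem. The workaround is to edit elevate.xml in the same directory and comment out the two lines below. This looked like a Solr bug to me and I did not track down the reason; a likely cause is that these sample IDs are not numeric and so cannot be converted to an int unique key.

<doc id="MA147LL/A" />
<doc id="IW-02" exclude="true" />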

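4) Copy mysql-connector-java-5.1.22-bin.jar and solr-dataimporthandler-4.10.3.jar to D:\Solr\solr-4.10.3\example\solr-webapp\webapp\WEB-INF\lib. The first is the MySQL JDBC driver. The second is found in D:\Solr\solr-4.10.3\dist and contains org.apache.solr.handler.dataimport.DataImportHandler.

Restart Solr.

If the configuration is correct, Solr starts successfully.

solrconfig.xml is Solr's base configuration file. It configures the request handlers, response writers, logging, caches, and so on.

schema.xml defines how each data type is indexed. Tokenizer settings and the fields contained in indexed documents are also configured there.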

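Testing the index

Open the Solr home page and pick collection1 in the Core Selector: http://localhost:8983/solr/#/collection1

Click Dataimport, leave Command set to full-import (the default), click "Execute", then click Refresh Status to see the result:

Indexing completed. Added/Updated: 7 documents. Deleted 0 documents.

Requests: 1, Fetched: 7, Skipped: 0, Processed: 7

Started: 8 minutes ago

Query test: enter userName:test1 in the q box and run the search to see the matching documents.

Here full-import indexed all of the data in the configured database, and the corresponding data can now be queried through Solr.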
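The Admin UI actions above are plain HTTP requests against the core, so they can also be issued from a script or a browser. The sketch below only builds the URLs and does not contact Solr; the class SolrUrls is a hypothetical helper, and the core URL assumes the default port and the collection1 core used in this article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the URLs behind the Dataimport and Query pages.
final class SolrUrls {

    // URL for a DataImportHandler command such as full-import, delta-import or status.
    static String dataImport(String coreUrl, String command) {
        return coreUrl + "/dataimport?command=" + encode(command);
    }

    // URL for a search on the /select handler, with a JSON response.
    static String select(String coreUrl, String query) {
        return coreUrl + "/select?q=" + encode(query) + "&wt=json";
    }

    // Percent-encodes a query parameter value (the ':' in a field query, for example).
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/collection1"; // assumed core URL
        System.out.println(dataImport(core, "full-import"));
        System.out.println(dataImport(core, "status"));
        System.out.println(select(core, "userName:test1"));
    }
}
```

Requesting the first URL starts the import, the second reports the same status text shown by Refresh Status, and the third runs the userName:test1 query.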

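Indexing and searching with SolrJ

The steps above used the Solr Admin pages to test indexing and searching. Solr can also be driven from code. The code below adds a User bean to the Solr index and then returns results by querying all indexed documents.

User entity class: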

package com.mybatis.test.model;

import org.apache.solr.client.solrj.beans.Field;

public class User {

    @Field
    private int id;

    @Field
    private String userName;

    @Field
    private int userAge;

    @Field
    private String userAddress;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }

    public int getUserAge() {
        return userAge;
    }

    public void setUserAge(int userAge) {
        this.userAge = userAge;
    }

    public String getUserAddress() {
        return userAddress;
    }

    public void setUserAddress(String userAddress) {
        this.userAddress = userAddress;
    }

    @Override
    public String toString() {
        return this.userName + " " + this.userAge + " " + this.userAddress;
    }
}

Each property annotated with @Field must correspond to a field configured in Solr.

Test code:

package com.solr.test;

import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrDocumentList;

import com.mybatis.test.model.User;

public class SolrTest {

    private static SolrServer server;
    private static final String DEFAULT_URL = "http://localhost:8983/solr/collection1";

    public static void init() {
        server = new HttpSolrServer(DEFAULT_URL);
    }

    public static void indexUser(User user) {
        try {
            // Add the user bean to the index, then commit so it becomes searchable
            UpdateResponse response = server.addBean(user);
            server.commit();
            System.out.println(response.getStatus());
        } catch (IOException | SolrServerException e) {
            e.printStackTrace();
        }
    }

    // Test adding a new bean instance to the index
    public static void testIndexUser() {
        User user = new User();
        user.setId(8);
        user.setUserAddress("place");
        user.setUserName("cdebcdccga");
        user.setUserAge(83);

        indexUser(user);
    }

    public static void testQueryAll() {
        SolrQuery params = new SolrQuery();

        // Query string: *:* matches every field and every value, i.e. all indexed documents
        params.set("q", "*:*");

        // Paging: start is the offset of the first result and rows is the page size.
        // Integer.MAX_VALUE returns everything; with rows=5, the second page would use start=5.
        params.set("start", 0);
        params.set("rows", Integer.MAX_VALUE);

        // Sorting: use "score desc" to sort by relevance, or "id desc" / "id asc" to sort by id
        // params.set("sort", "score desc");
        params.set("sort", "id asc");

        // Returned fields: * is all stored fields; score must be listed explicitly to be returned
        params.set("fl", "*,score");

        QueryResponse response = null;
        try {
            response = server.query(params);
        } catch (SolrServerException e) {
            e.printStackTrace();
        }

        if (response != null) {
            System.out.println("Search Results: ");
            SolrDocumentList list = response.getResults();
            for (int i = 0; i < list.size(); i++) {
                System.out.println(list.get(i));
            }
        }
    }

    public static void main(String[] args) {
        init();
        // testIndexUser();
        testQueryAll();
    }
}

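A row that is added to the database but has not been indexed by Solr cannot be found through Solr. So when Solr is used to search database content, the usual pattern is to insert the row into the database first and then index it in Solr; Solr's fuzzy matching and tokenization can then be used to search that content.

Incremental (delta) import from MySQL with DIH
The full import of MySQL data is covered above. A full import becomes very expensive when the data set is large, so data is usually imported incrementally. This section describes how to run a delta import from MySQL and how to schedule it.

1) Changing the database table

The user table created earlier needs one more column to support delta imports: updateTime, of type timestamp, with a default value of CURRENT_TIMESTAMP. With only this default the column is set when a row is inserted; to pick up changes to existing rows as well, the column would also need ON UPDATE CURRENT_TIMESTAMP.

With this column, Solr can tell which rows are new when it runs a delta import.

Solr keeps a value called last_index_time that records the time of the last full import or delta import. It is stored in the dataimport.properties file in the conf directory.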
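To see what that stored value looks like to a program, the following sketch reads last_index_time from the text of a dataimport.properties file using java.util.Properties. The class LastIndexTime is a hypothetical helper, and the sample file content in main is made up for illustration; it assumes the "yyyy-MM-dd HH:mm:ss" timestamp format, with colons escaped the way java.util.Properties writes them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

// Hypothetical helper: extracts last_index_time from dataimport.properties text.
final class LastIndexTime {

    // Assumed timestamp format of the stored value.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static LocalDateTime parse(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load also removes the backslashes that escape ':' in values.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable properties text", e);
        }
        String value = props.getProperty("last_index_time");
        if (value == null) {
            throw new IllegalArgumentException("last_index_time not found");
        }
        return LocalDateTime.parse(value.trim(), FORMAT);
    }

    public static void main(String[] args) {
        // Made-up sample content, for illustration only.
        String sample = "#Thu Aug 24 10:15:30 CST 2017\n"
                      + "last_index_time=2017-08-24 10\\:15\\:30\n"
                      + "user.last_index_time=2017-08-24 10\\:15\\:30\n";
        System.out.println("Last import ran at " + parse(sample));
    }
}
```

Rows whose updateTime is later than this timestamp are the ones the next delta import is expected to pick up.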

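2) Required attributes in data-config.xml

transformer: format conversion; for example, HTMLStripTransformer strips HTML tags before indexing

query: selects the rows to index during a full import

deltaQuery: selects the primary keys of changed rows for a delta import; note that it should return only the ID column

deltaImportQuery: selects the data to import during a delta import

deletedPkQuery: selects the primary keys of rows to remove from the index during a delta import; note that it should return only the ID column

The official documentation describes "query", "deltaImportQuery" and "deltaQuery" as follows: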
The query gives the data needed to populate fields of the Solr document in full-import
The deltaImportQuery gives the data needed to populate fields when running a delta-import
The deltaQuery gives the primary keys of the current entity which have changes since the last index time

If the entity joins a child table, parentDeltaQuery may also be needed:

The parentDeltaQuery uses the changed rows of the current table (fetched with deltaQuery) to give the changed rows in the parent table. This is necessary because whenever a row in the child table changes, we need to re-generate the document which has that field.

See the DataImportHandler documentation for more details.

For the user table, data-config.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/mybatis"
              user="root"
              password="luxx"
              batchSize="-1" />
  <document name="testDoc">
    <entity name="user" pk="id"
            query="select * from user"
            deltaImportQuery="select * from user where id='${dih.delta.id}'"
            deltaQuery="select id from user where updateTime> '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="userName" name="userName"/>
      <field column="userAge" name="userAge"/>
      <field column="userAddress" name="userAddress"/>
      <field column="updateTime" name="updateTime"/>
    </entity>
  </document>
</dataConfig>

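A delta import works in two steps. First, the SQL statement in deltaQuery selects the IDs of all rows that need to be imported.

Then the SQL statement in deltaImportQuery returns the data for each of those IDs; this is the data the delta import processes.

The key is the two built-in variables: "${dih.delta.id}" holds the id currently being indexed, and "${dataimporter.last_index_time}" holds the time of the most recent import.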
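The two steps can be mimicked in memory to show how the queries cooperate. The sketch below is an illustration only and touches neither MySQL nor Solr: the class DeltaImportSketch, its Row type and the sample rows are all made up, with deltaQuery standing in for the "updateTime > last_index_time" selection and deltaImportQuery for the per-id lookup.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the two-step delta import; no database or Solr involved.
final class DeltaImportSketch {

    // One row of a simplified user table.
    static final class Row {
        final int id;
        final String userName;
        final LocalDateTime updateTime;

        Row(int id, String userName, LocalDateTime updateTime) {
            this.id = id;
            this.userName = userName;
            this.updateTime = updateTime;
        }
    }

    // Made-up sample data, keyed by primary key.
    static Map<Integer, Row> sampleTable() {
        Map<Integer, Row> table = new LinkedHashMap<>();
        table.put(1, new Row(1, "summer", LocalDateTime.of(2017, 8, 23, 9, 0)));
        table.put(2, new Row(2, "test1", LocalDateTime.of(2017, 8, 24, 8, 0)));
        table.put(3, new Row(3, "lu", LocalDateTime.of(2017, 8, 25, 12, 0)));
        return table;
    }

    // Step 1, like deltaQuery: ids of rows with updateTime later than lastIndexTime.
    static List<Integer> deltaQuery(Map<Integer, Row> table, LocalDateTime lastIndexTime) {
        List<Integer> ids = new ArrayList<>();
        for (Row row : table.values()) {
            if (row.updateTime.isAfter(lastIndexTime)) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    // Step 2, like deltaImportQuery: one lookup per id, as with ${dih.delta.id}.
    static List<Row> deltaImportQuery(Map<Integer, Row> table, List<Integer> ids) {
        List<Row> rows = new ArrayList<>();
        for (Integer id : ids) {
            rows.add(table.get(id));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, Row> table = sampleTable();
        LocalDateTime lastIndexTime = LocalDateTime.of(2017, 8, 24, 0, 0);
        for (Row row : deltaImportQuery(table, deltaQuery(table, lastIndexTime))) {
            System.out.println("re-index id=" + row.id + " userName=" + row.userName);
        }
    }
}
```

With the sample data, only the rows changed after the last import reach step 2, which is why a delta import stays cheap on a large table.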

Note: the newly added updateTime column must also be declared as a field in data-config.xml, and in schema.xml as well:

<field name="updateTime" type="date" indexed="true" stored="true" />

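If the application also deletes rows, add an isDeleted column to the table to mark a row as deleted. When Solr updates the index, it can use this column to remove the index entries of deleted rows.

In that case, set the following attributes in data-config.xml: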

query="select * from user where isDeleted=0"
deltaImportQuery="select * from user where id='${dih.delta.id}'"
deltaQuery="select id from user where updateTime> '${dataimporter.last_index_time}' and isDeleted=0"
deletedPkQuery="select id from user where isDeleted=1"

With this in place, a delta import removes from the index every row whose isDeleted is 1.
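The effect of the soft-delete flag on the index can be sketched in memory as well. This is a simplified model and not DIH's actual code: the class SoftDeleteSketch and its Row type are hypothetical, the index is reduced to an id-to-userName map, and each branch is labeled with the query it stands in for.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a delta import with an isDeleted flag; not DIH's real code.
final class SoftDeleteSketch {

    // One row of a simplified user table with a soft-delete flag.
    static final class Row {
        final int id;
        final String userName;
        final LocalDateTime updateTime;
        final boolean isDeleted;

        Row(int id, String userName, LocalDateTime updateTime, boolean isDeleted) {
            this.id = id;
            this.userName = userName;
            this.updateTime = updateTime;
            this.isDeleted = isDeleted;
        }
    }

    // Applies one delta import to a copy of the index (id -> userName) and returns the copy.
    static Map<Integer, String> deltaImport(Map<Integer, String> index, List<Row> table,
                                            LocalDateTime lastIndexTime) {
        Map<Integer, String> updated = new TreeMap<>(index);
        for (Row row : table) {
            if (row.isDeleted) {
                // Stands in for deletedPkQuery: every row with isDeleted=1 leaves the index.
                updated.remove(row.id);
            } else if (row.updateTime.isAfter(lastIndexTime)) {
                // Stands in for deltaQuery plus deltaImportQuery: changed live rows are re-indexed.
                updated.put(row.id, row.userName);
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        LocalDateTime lastIndexTime = LocalDateTime.of(2017, 8, 24, 0, 0);
        Map<Integer, String> index = new TreeMap<>();
        index.put(1, "summer");
        index.put(2, "test1");
        List<Row> table = Arrays.asList(
                new Row(1, "summer", lastIndexTime.minusDays(1), false),
                new Row(2, "test1", lastIndexTime.plusHours(1), true),
                new Row(3, "lu", lastIndexTime.plusHours(2), false));
        System.out.println(deltaImport(index, table, lastIndexTime));
    }
}
```

In the sample run, the row flagged as deleted drops out of the index while the newly inserted row is added and the unchanged row is left alone.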

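Testing the delta import

If the user table already holds data, clear the old test rows first (their updateTime has no value). Then add a user with my MyBatis test program; the database fills the column with the current time. A query for all documents in Solr does not return the new row yet. Run dataimport?command=delta-import, query again, and the row just inserted into MySQL is found.

Scheduling the delta import

The Windows Task Scheduler or cron on Linux can request the delta-import URL at regular intervals. This works and should cause no problems.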
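The same idea as a cron job can be sketched in a few lines of Java with a ScheduledExecutorService that requests the delta-import URL at a fixed rate. This is an illustration under assumptions, not a replacement for a real scheduler: the class DeltaImportTimer is hypothetical, and the URL assumes the default port and the collection1 core used in this article.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a cron job that requests the delta-import URL.
final class DeltaImportTimer {

    // Assumed URL: default port and the collection1 core.
    static final String DELTA_IMPORT_URL =
            "http://localhost:8983/solr/collection1/dataimport?command=delta-import";

    // Requests the URL once and returns the HTTP status code.
    static int trigger(String url) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    // Runs the task immediately, then once per period, on a single background thread.
    static ScheduledExecutorService schedule(Runnable task, long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(task, 0, period, unit);
        return executor;
    }

    public static void main(String[] args) {
        // Runs until the process is stopped.
        schedule(() -> {
            try {
                System.out.println("delta-import returned HTTP " + trigger(DELTA_IMPORT_URL));
            } catch (IOException e) {
                // Caught here because an uncaught exception would cancel all later runs.
                System.err.println("delta-import failed: " + e.getMessage());
            }
        }, 1, TimeUnit.MINUTES);
    }
}
```

Keeping the trigger outside Solr like this means a failed request only affects that one run, at the cost of one more process to deploy and monitor.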

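A more convenient option, and one better integrated with Solr itself, is to use the scheduled delta import feature of the DataImportHandler scheduler.

1. Download apache-solr-dataimportscheduler-1.0.jar and put it in the \solr-webapp\webapp\WEB-INF\lib directory:
http://code.google.com/p/solr-dataimport-scheduler/downloads/list
It is also available from Baidu cloud storage: http://pan.baidu.com/s/1dDw0MRn

Note: apache-solr-dataimportscheduler-1.0.jar has a bug; see http://www.denghuafeng.com/post-242.html

2. Edit the web.xml file in Solr's WEB-INF directory:
Add a child element to the <web-app> element: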

<listener>
  <listener-class>
    org.apache.solr.handler.dataimport.scheduler.ApplicationListener
  </listener-class>
</listener>

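3. Create the dataimport.properties configuration file:

Under SOLR_HOME\solr, create a directory named conf (note: not the conf under SOLR_HOME\solr\collection1). Open apache-solr-dataimportscheduler-1.0.jar with an archive tool, copy the dataimport.properties file out of it into the new directory, and edit it. My final scheduler configuration is: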

#################################################
#                                               #
#       dataimport scheduler properties         #
#                                               #
#################################################

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want synchronized
#  leave empty or comment it out if using single-core deployment
#  syncCores=game,resource
syncCores=collection1

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to 80 if empty]
port=8983

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solr

#  URL params [mandatory]
#  remainder of URL
#http://localhost:8983/solr/collection1/dataimport?command=delta-import&clean=false&commit=true
params=/dataimport?command=delta-import&clean=false&commit=true

#  schedule interval
#  number of minutes between two runs
#  [defaults to 30 if empty]
interval=1

#  interval between full index rebuilds, in minutes; default 7200 (5 days)
#  empty, 0, or commented out: never rebuild the index
# reBuildIndexInterval=2

#  parameters for the index rebuild
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

#  start time for counting the rebuild interval;
#  first actual run = reBuildIndexBeginTime + reBuildIndexInterval*60*1000
#  two formats: 2012-04-11 03:10:00 or 03:10:00; the latter is completed with the date on which the service started
reBuildIndexBeginTime=03:10:00

For testing, the delta import runs every minute here, and the periodic full-import rebuild is disabled.
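The commented-out URL in the file shows how the settings fit together: server, port, webapp, core and params are concatenated into the address that is requested on every run. The sketch below rebuilds that address from the properties text. It mirrors the commented URL only and is not the scheduler's actual code; the class SchedulerUrls is hypothetical, and falling back to "solr" for an empty webapp is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: rebuilds the delta-import URLs implied by dataimport.properties.
final class SchedulerUrls {

    // One URL per core listed in syncCores: http://server:port/webapp/core + params.
    static List<String> deltaImportUrls(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable properties text", e);
        }
        String server = valueOrDefault(props, "server", "localhost");
        String port = valueOrDefault(props, "port", "80");
        String webapp = valueOrDefault(props, "webapp", "solr"); // assumed fallback
        String params = valueOrDefault(props, "params", "");

        List<String> urls = new ArrayList<>();
        for (String core : valueOrDefault(props, "syncCores", "").split(",")) {
            // An empty syncCores means a single-core deployment: no core segment in the path.
            String corePath = core.trim().isEmpty() ? "" : "/" + core.trim();
            urls.add("http://" + server + ":" + port + "/" + webapp + corePath + params);
        }
        return urls;
    }

    // Treats a missing or blank property as unset.
    private static String valueOrDefault(Properties props, String key, String fallback) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? fallback : value.trim();
    }

    public static void main(String[] args) {
        String config = "syncCores=collection1\n"
                      + "server=localhost\n"
                      + "port=8983\n"
                      + "webapp=solr\n"
                      + "params=/dataimport?command=delta-import&clean=false&commit=true\n";
        for (String url : deltaImportUrls(config)) {
            System.out.println(url);
        }
    }
}
```

Opening the printed URL in a browser is a quick way to confirm the settings before waiting for the scheduler to fire.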

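4. Test

Insert a row into the database and search for it on the Solr Query page. It is not found at first; after Solr runs the next delta import, it can be found.

In general, consider the following before introducing Solr into a project:
1. Update frequency: how much data is added per day, and whether updates are immediate or scheduled
2. Total data volume: how long data must be kept
3. Consistency requirements: how soon updated data must be visible, and the maximum acceptable delay
4. Data characteristics: what the data sources are and the average record size
5. Business characteristics: which sort orders and search criteria are needed
6. Resource reuse: what hardware is already available and whether upgrades are planned

References: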

http://wiki.apache.org/solr/DataImportHandler

http://wiki.apache.org/solr/Solrj

http://www.denghuafeng.com/post-242.html


I have tested this and it works without problems; I used Solr 4.9.
