This article is reposted from: http://blog.csdn.net/xiaoyu714543065/article/details/11849115
I. Data import (DataImportHandler, DIH)
DIH is a toolkit provided by Solr for importing data from databases, XML/HTTP sources, and rich-text documents into the Solr index. Only database import is covered here.
A. Prepare the following JAR packages
apache-solr-dataimporthandler-4.0.0.jar
apache-solr-dataimporthandler-extras-4.0.0.jar
apache-solr-dataimportscheduler-1.1.jar (used for incremental import)
The JDBC driver JAR for your database (this article uses Oracle: oracle10g.jar)
Copy these JARs into Tomcat6.0.36/webapps/solr/WEB-INF/lib
B. Configure solrconfig.xml
Add the following configuration to solrconfig.xml:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">xx-data-config.xml</str>
  </lst>
</requestHandler>
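C. Configure the data source
Create the xx-data-config.xml file named in the configuration above, in the same directory as solrconfig.xml, with the content shown below.
The query attribute is used for a full import. The deltaQuery and deltaImportQuery attributes are used for incremental (delta) imports: deltaQuery finds the rows changed since the last import, and deltaImportQuery fetches each of those rows by its id.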
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@192.168.0.129:1521:ORCL"
              user="username"
              password="password"/>
  <document>
    <entity name="business_info" pk="ID"
            query="select t.id id, business_name, bussiness_type from business t"
            deltaImportQuery="select t.id id, business_name, bussiness_type from business t where id='${dataimporter.delta.id}'"
            deltaQuery="select t.id id, business_name, bussiness_type from business t where to_char(updatetime, 'yyyy-mm-dd hh24:mi:ss') > '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
    </entity>
  </document>
</dataConfig>
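The two-step logic behind a delta import can be illustrated with a small simulation. This is only a sketch of the idea, not DIH code: it assumes an in-memory SQLite table standing in for the Oracle business table, and the sample rows, the delta_import function, and the timestamps are all hypothetical.

```python
import sqlite3

# In-memory stand-in for the Oracle "business" table (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table business ("
    "id integer primary key, business_name text, "
    "bussiness_type text, updatetime text)"
)
conn.executemany(
    "insert into business values (?, ?, ?, ?)",
    [
        (1, "alpha", "retail", "2013-09-01 08:00:00"),
        (2, "beta", "retail", "2013-09-20 09:30:00"),
        (3, "gamma", "wholesale", "2013-09-21 17:45:00"),
    ],
)


def delta_import(conn, last_index_time):
    """Mimic a delta import: find changed ids, then fetch each changed row."""
    # Step 1 (role of deltaQuery): ids of rows modified after the last import.
    # Timestamps in 'yyyy-mm-dd hh:mm:ss' form compare correctly as strings.
    changed_ids = [
        row[0]
        for row in conn.execute(
            "select id from business where updatetime > ? order by id",
            (last_index_time,),
        )
    ]
    # Step 2 (role of deltaImportQuery): load the full row for each changed id.
    docs = []
    for changed_id in changed_ids:
        row = conn.execute(
            "select id, business_name, bussiness_type from business where id = ?",
            (changed_id,),
        ).fetchone()
        docs.append(
            {"id": row[0], "business_name": row[1], "bussiness_type": row[2]}
        )
    return docs


docs = delta_import(conn, "2013-09-15 00:00:00")
print([doc["id"] for doc in docs])
```

Only the rows updated after the given timestamp are re-read, which is why the update time column must be maintained reliably by the application for delta imports to pick up changes.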
The DIH configuration is now complete. Enter the following URLs in a browser to run an import:
Full import:
http://localhost:8085/solr/core0/dataimport?command=full-import&commit=true
Incremental import:
http://localhost:8085/solr/core0/dataimport?command=delta-import&clean=false&commit=true
View import status:
http://localhost:8085/solr/core0/dataimport?command=status
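The same requests can be issued from a script instead of a browser. The following is a minimal sketch using only the Python standard library; the helper names dih_url and run_dih are hypothetical, and the base URL and core name simply repeat the values used in this article.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed to match the article's setup; adjust for your own server.
SOLR_BASE = "http://localhost:8085/solr"
CORE = "core0"


def dih_url(base, core, command, **params):
    """Build a DataImportHandler URL such as .../core0/dataimport?command=status."""
    query = {"command": command}
    query.update(params)
    return "{}/{}/dataimport?{}".format(base, core, urlencode(query))


def run_dih(command, **params):
    """Send the request to Solr and return the raw response body as text."""
    with urlopen(dih_url(SOLR_BASE, CORE, command, **params)) as response:
        return response.read().decode("utf-8")


# Printing the URLs needs no running server; run_dih() does.
print(dih_url(SOLR_BASE, CORE, "full-import", commit="true"))
print(dih_url(SOLR_BASE, CORE, "delta-import", clean="false", commit="true"))
print(dih_url(SOLR_BASE, CORE, "status"))
```

With Solr running, run_dih("delta-import", clean="false", commit="true") followed by run_dih("status") would start an incremental import and then fetch its progress.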
D. Handling CLOB fields
<entity name="meta" query="select id,filename,content,bytes from documents" transformer="ClobTransformer">
  <field column="id" name="id"/>
  <field column="CONTENT" name="content" clob="true"/>
</entity>
The column value of a CLOB field must be written in uppercase.
E. DIH out-of-memory errors
DIH can easily run out of memory during an import. This can be resolved by increasing the JVM heap size, as follows:
In Tomcat\bin\startup.bat, add: set JAVA_OPTS=-Xms128m -Xmx1024m
The maximum heap here is set to 1024 MB; increase it if your data volume requires.
F. Automatic full import and automatic incremental import
You can implement this feature yourself, or use the apache-solr-dataimportscheduler-1.0.jar package. The configuration is as follows:
Modify WEB-INF/web.xml in solr.war, adding the following before the servlet node:
<listener>
  <listener-class>
    org.apache.solr.handler.dataimport.scheduler.ApplicationListener
  </listener-class>
</listener>
Extract dataimport.properties from apache-solr-dataimportscheduler-.jar, modify it to suit your environment, and place it in the solr.home/conf directory (not solr.home/core/conf).
For the specific configuration options, see: http://code.google.com/p/solr-dataimport-scheduler/