SolrJ vs. Solr DIH: indexing efficiency comparison and analysis
Test server environment:
1. Windows 7 x64, 16 GB RAM, 32-core CPU.
2. JDK 1.7, Tomcat 6.x, Solr 4.8.
Database server environment:
1. Windows 7 x64, 16 GB RAM, 32-core CPU.
2. Oracle 11g.
I. Solr's default import tool, DIH.
Using Solr DIH to index 19 million records takes about 45 minutes: roughly 6,500 records per second, or about 390,000 records per minute.
The maximum JVM heap is 4 GB, and the Solr indexing configuration uses the default parameters.
Solr DIH import (screenshot of the import status page omitted): the run completed in roughly 45 minutes to an hour.
The schema contains about 15 indexed fields (screenshot omitted).
(Note: the fewer the fields and the smaller the field values, the faster the indexing. Keeping the schema lean is therefore particularly important for both Solr query and indexing efficiency.)
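DIH is driven through the /dataimport request handler; its SQL-to-field mapping lives in data-config.xml, which is not reproduced in this article. As a minimal sketch, and not the author's actual setup, a full import can be kicked off from SolrJ 4.x like this; the URL and core name are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DihFullImport {
    public static void main(String[] args) throws SolrServerException {
        // Assumes a core named "POI" whose solrconfig.xml registers a
        // /dataimport handler; both names are placeholders.
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr/POI");
        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/dataimport"); // route the request to DIH
        q.set("command", "full-import");    // start a full import (DIH runs it asynchronously)
        q.set("clean", "true");             // clear the old index first
        q.set("commit", "true");            // commit when the import finishes
        System.out.println(server.query(q)); // the response reports the import status
    }
}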
II. Indexing data with the SolrJ API.
Indexing through the SolrJ API is somewhat less efficient, at roughly 300,000 records per minute; the same data set takes a little over an hour.
The Solr server configuration is the same as above. A client machine reads the database and indexes through the SolrJ API. The code is as follows:
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.UUID;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.util.StringUtils;

import com.tianditu.search.v2.POI;

// IJobDef, JobDoneCallBack, JobTimer, JobStatus, DatasourceConfig, SolrConfig
// and POIImportConfig are the author's own helper classes (not shown).
public class ImportPOI implements IJobDef {

    private SolrServer server;
    private DatasourceConfig jdbcConfig;
    private SolrConfig solrConfig;
    private POIImportConfig poiConfig;

    public DatasourceConfig getJdbcConfig() { return jdbcConfig; }
    public void setJdbcConfig(DatasourceConfig jdbcConfig) { this.jdbcConfig = jdbcConfig; }
    public SolrConfig getSolrConfig() { return solrConfig; }
    public void setSolrConfig(SolrConfig solrConfig) { this.solrConfig = solrConfig; }
    public POIImportConfig getPoiConfig() { return poiConfig; }
    public void setPoiConfig(POIImportConfig poiConfig) { this.poiConfig = poiConfig; }

    public static void main(String[] args) {
        ApplicationContext context = new ClassPathXmlApplicationContext("app-spring.xml");
        ImportPOI importTool = (ImportPOI) context.getBean("importPOITool");
        importTool.submit(new JobDoneCallBack() {
            public void onCallback(JobStatus status) {
                System.out.println(status.getStatus());
                System.out.println(status.getMessage());
            }
        }, new JobTimer() {
            public void onTimeUpdate(long timeCost) {
                System.out.println("Solr committed a batch; minutes since the task started: "
                        + timeCost / (1000 * 60));
            }
        });
    }

    public SolrServer getServer() { return server; }
    public void setServer(SolrServer server) { this.server = server; }

    public boolean importPOI(HashMap<String, Object> params) {
        return false;
    }

    // Convert one JDBC row into the POI bean that SolrJ will index.
    private POI getPOI(ResultSet rs) throws SQLException {
        POI poi = new POI();
        poi.setId(UUID.randomUUID().toString());
        poi.setName(rs.getString("nameforStore"));
        poi.setAddress(rs.getString("addressforStore"));
        String lat = rs.getString("lat");
        if (lat != null && !lat.equalsIgnoreCase("null") && lat.length() > 0) {
            poi.setLat(Double.valueOf(lat));
        }
        String lon = rs.getString("lon");
        if (lon != null && !lon.equalsIgnoreCase("null") && lon.length() > 0) {
            poi.setLon(Double.valueOf(lon));
        }
        poi.setNid(rs.getString("DOCID"));
        String totalCity = rs.getString("totalcity");
        if (!StringUtils.isEmpty(totalCity)) {
            // space-separated city codes
            String[] cities = totalCity.split(" ");
            List<String> cs = new ArrayList<String>();
            for (String c : cities) {
                cs.add(c);
            }
            poi.setCities(cs);
        }
        String types = rs.getString("type");
        if (!StringUtils.isEmpty(types)) {
            // space-separated type codes
            String[] typea = types.split(" ");
            List<String> t = new ArrayList<String>();
            for (String c : typea) {
                t.add(c);
            }
            poi.setTypes(t);
        }
        return poi;
    }

    public void submit(JobDoneCallBack callback, JobTimer timer) {
        if (solrConfig == null) {
            throw new IllegalArgumentException("SolrJ is not correctly configured.");
        }
        if (jdbcConfig == null) {
            throw new IllegalArgumentException("JDBC is not correctly configured.");
        }
        if (poiConfig == null) {
            throw new IllegalArgumentException("The POI import is not correctly configured.");
        }
        Connection con = null;
        Statement pst = null;
        ResultSet rs = null;
        SolrServer ss = null;
        JobStatus status = new JobStatus();
        status.setName("ImportPOI");
        status.setStatus("failure");
        int i = 0;  // rows buffered since the last commit
        int c = 0;  // number of batches committed
        long start = System.currentTimeMillis();
        try {
            Class.forName(jdbcConfig.getDriverClass()).newInstance();
            con = DriverManager.getConnection(jdbcConfig.getUrl(),
                    jdbcConfig.getUserName(), jdbcConfig.getPassWord());
            int batchSize = Integer.valueOf(poiConfig.getImportRecordSize());
            ss = new HttpSolrServer(solrConfig.getSolrUrl());
            if (poiConfig.isDeleteOnstartup()) {
                ss.deleteByQuery("*:*"); // wipe the existing index
                ss.commit();
            }
            if (jdbcConfig.getDriverClass().toString().contains("mysql")) {
                // MySQL: stream the result set row by row instead of
                // buffering the whole table in client memory.
                pst = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                        ResultSet.CONCUR_READ_ONLY);
                pst.setFetchSize(1);
                ((com.mysql.jdbc.Statement) pst).enableStreamingResults();
            } else {
                pst = con.createStatement();
            }
            rs = pst.executeQuery(poiConfig.getImportSQL());
            POI p = null;
            List<POI> pois = new ArrayList<POI>();
            while (rs.next()) {
                p = getPOI(rs);
                pois.add(p);
                if (i >= batchSize) {
                    long commitT = System.currentTimeMillis();
                    timer.onTimeUpdate(commitT - start);
                    ss.addBeans(pois); // one HTTP request per batch
                    ss.commit();
                    pois.clear();
                    c++;
                    i = 0;
                } else {
                    i++;
                }
            }
            ss.addBeans(pois); // flush the final partial batch
            ss.commit();
            long end = System.currentTimeMillis();
            status.setStatus("success");
            status.setMessage("Processed successfully, total time in minutes: "
                    + (end - start) / (1000 * 60));
            status.setTimeCost((end - start) / (1000 * 60));
        } catch (SQLException e) {
            status.setMessage(e.toString());
        } catch (ClassNotFoundException e) {
            status.setMessage(e.toString());
        } catch (InstantiationException e) {
            status.setMessage(e.toString());
        } catch (IllegalAccessException e) {
            status.setMessage(e.toString());
        } catch (SolrServerException e) {
            status.setMessage(e.toString());
        } catch (IOException e) {
            status.setMessage(e.toString());
        } finally {
            try {
                if (rs != null) rs.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            try {
                if (pst != null) pst.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            try {
                if (con != null) con.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            if (callback != null) {
                callback.onCallback(status);
            }
        }
    }
}
The overall flow is: read rows from the database, convert each row into a DTO, send batches to the Solr server with SolrServer.addBeans, and call SolrServer.commit to commit the index (after which the documents become searchable).
The code that reads a row from the database and converts it into the DTO is as follows:
private POI getPOI(ResultSet rs) throws SQLException {
    POI poi = new POI();
    poi.setId(UUID.randomUUID().toString());
    poi.setName(rs.getString("nameforStore"));
    poi.setAddress(rs.getString("addressforStore"));
    String lat = rs.getString("lat");
    if (lat != null && !lat.equalsIgnoreCase("null") && lat.length() > 0) {
        poi.setLat(Double.valueOf(lat));
    }
    String lon = rs.getString("lon");
    if (lon != null && !lon.equalsIgnoreCase("null") && lon.length() > 0) {
        poi.setLon(Double.valueOf(lon));
    }
    poi.setNid(rs.getString("DOCID"));
    String totalCity = rs.getString("totalcity");
    if (!StringUtils.isEmpty(totalCity)) {
        // space-separated city codes
        String[] cities = totalCity.split(" ");
        List<String> cs = new ArrayList<String>();
        for (String c : cities) {
            cs.add(c);
        }
        poi.setCities(cs);
    }
    String types = rs.getString("type");
    if (!StringUtils.isEmpty(types)) {
        // space-separated type codes
        String[] typea = types.split(" ");
        List<String> t = new ArrayList<String>();
        for (String c : typea) {
            t.add(c);
        }
        poi.setTypes(t);
    }
    return poi;
}
The SolrJ indexing loop:
List<POI> pois = new ArrayList<POI>();
while (rs.next()) {               // iterate over the JDBC ResultSet
    p = getPOI(rs);
    pois.add(p);
    if (i >= batchSize) {         // batch-index once batchSize rows are buffered
        long commitT = System.currentTimeMillis();
        timer.onTimeUpdate(commitT - start);
        ss.addBeans(pois);        // send the batch to the Solr server
        ss.commit();
        pois.clear();
        c++;
        i = 0;
    } else {
        i++;
    }
}
ss.addBeans(pois);                // send the final partial batch
ss.commit();
Analysis:
1. What are the major performance differences?
A: Solution 1 (DIH) runs inside Solr: after fetching the data it hands it straight to Solr's internal UpdateHandler, so documents go directly into the index. Solution 2 (SolrJ) pays for a network round trip, and it also performs several conversions before the data reaches Solr: the database row is converted to a DTO, SolrJ converts the DTO to a SolrInputDocument, and the SolrInputDocument is serialized into the payload for Solr's REST interface. Each conversion costs performance. (Note: calling addBeans to import documents in batches is itself an optimization; committing document by document would be far worse, since every commit is an extra HTTP request.)
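An addition beyond the original article: SolrJ can also attach a commitWithin deadline to each batch, so the client never pays for an explicit commit round trip and Solr commits on its own within the given window. A sketch, reusing the ss and pois variables from the loop above:

import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;

class BatchFlush {
    // Send one batch and let Solr commit it within 60 s on its own schedule,
    // instead of issuing an explicit commit per batch.
    static void flushBatch(SolrServer ss, List<POI> pois)
            throws SolrServerException, IOException {
        ss.addBeans(pois, 60000); // commitWithin = 60,000 ms
        pois.clear();
    }
}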
2. How to optimize it?
A: The answer to question 1 points to the answer here; most of the cost is in data conversion. 1. Keep the call path as simple as possible: convert the ResultSet directly into SolrInputDocument objects, with fewer intermediate conversions. 2. Replace the current List&lt;Bean&gt; with leaner data structures such as arrays. A sketch of the first suggestion follows.
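A minimal sketch of suggestion 1, building SolrInputDocument objects straight from the ResultSet and skipping the POI bean. The column names come from the code above; the Solr field names are assumptions based on the bean's properties:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DirectDocIndexer {

    // One conversion instead of three: JDBC row -> SolrInputDocument.
    static SolrInputDocument toDoc(ResultSet rs) throws SQLException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("name", rs.getString("nameforStore"));
        doc.addField("address", rs.getString("addressforStore"));
        doc.addField("nid", rs.getString("DOCID"));
        // lat/lon/cities/types would be handled the same way as in getPOI().
        return doc;
    }

    static void index(SolrServer ss, ResultSet rs, int batchSize) throws Exception {
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        while (rs.next()) {
            docs.add(toDoc(rs));
            if (docs.size() >= batchSize) {
                ss.add(docs);   // one HTTP request per batch
                docs.clear();
            }
        }
        if (!docs.isEmpty()) {
            ss.add(docs);       // flush the final partial batch
        }
        ss.commit();            // a single commit at the end
    }
}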
3. Can indexing directly with Solr's EmbeddedSolrServer improve efficiency?
A: Yes. In testing, EmbeddedSolrServer improves indexing efficiency to more than twice that of DIH. It is obtained with the following code:
// Requires org.apache.solr.core.CoreContainer and
// org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.
private SolrServer getSolrServer() {
    // System.setProperty("solr.solr.home", "R:\\solrhome1\\solr\\POI\\");
    CoreContainer coreContainer = new CoreContainer("R:\\solrhome1\\solr\\");
    coreContainer.load(); // initialize and load the cores under solr home
    // while (!coreContainer.isLoaded("POI")) {
    //     System.out.println("loading...");
    // }
    System.out.println(coreContainer.getAllCoreNames());
    server = new EmbeddedSolrServer(coreContainer, "POI");
    return server;
}
(Note: EmbeddedSolrServer requires the program to run on the Solr server machine itself, bypassing HTTP entirely. The usual deployment uses two cores: the index is rebuilt into one core, and when the rebuild completes that core is swapped with the other one, which keeps serving external queries in the meantime.)
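A sketch of that swap step through the CoreAdmin API; the core names POI (serving) and POI_build (freshly built) are hypothetical:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

public class CoreSwap {
    public static void main(String[] args) throws Exception {
        // Point at the Solr container itself, not at an individual core.
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8080/solr");
        CoreAdminRequest req = new CoreAdminRequest();
        req.setAction(CoreAdminAction.SWAP); // exchange the two cores
        req.setCoreName("POI_build");        // core that was just rebuilt offline
        req.setOtherCoreName("POI");         // core currently serving queries
        System.out.println(req.process(admin));
        admin.shutdown();
    }
}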
This article may be reprinted; please credit the source: http://www.cnblogs.com/likehua/p/4465514.html