Using SOLR's dihandler to build MySQL large table full-scale index, the solution of memory overflow problem

Source: Internet
Author: User
Tags solr

The solution that SOLR officially gives is:

Dataimporthandler is designed to stream row One-by-one. It passes a Fetch size value (DEFAULT:500) to statement#setfetchsize which some drivers does not honor. For MySQL, add batchsize-dataSource configuration with value-1. This would pass integer.min_value to the driver as the fetch size and keep it from going out of memory for large tables. Should look Like:<datasource type= "Jdbcdatasource" name= "ds-2" driver= "Com.mysql.jdbc.Driver" url= "JDBC: Mysql://localhost:8889/mysqldatabase "batchsize="-1 "user=" root "password=" root "/>

Description: The Dataimporthandler design is supported on a per-row basis. It sets the number of fetches per statement#setfetchsize by default of 500. However some drivers do not support setting fetchsize. For MySQL, pass the Fetchsize property value-1 to the DataSource configuration. It passes Integer.min_value ( -231,-2147483648 [0x80000000]) to the driver as fetchsize, ensuring that large tables do not cause large table removal.

The official explanation given by MySQL is:

Resultsetby Default, ResultSets is completely retrieved and stored in memory.  The most cases the efficient-operate and, due to the design of the MySQL network protocol, are easier to Implement.  If you is working with resultsets that has a large number of rows or large values and cannot allocate heap space in your JVM for the memory required,You can tell the driver to stream the results back one row at a time. To enable this functionality,Create a Statement instance in the following manner:stmt = Conn.createstatement (Java.sql.ResultSet.TYPE_FORWARD_ONLY, Java.sql.ResultSet.CONCUR_READ_ONLY); stmt.se Tfetchsize (integer.min_value);The combination of a forward-only, read-only result set, with a fetch size of integer.min_value serves as a signal to The driver to stream result sets Row-by-row. after this,Any  result sets created with the statement would be retrieved Row-by-row. There is some caveats with this approach. You must read all of the rows in the result set (or close it) before you can issue any other queries on the CONNECTION,or an exception would be thrown. The earliest the locks these statements hold can is released (whether they be MyISAM table-level locks or row-level locks In some other storage engine such as InnoDB) are when the statement completes. If The statement is within scope of a transaction, then locks is released when the transaction completes (which implies so the statement needs to complete first). As with the other databases,Statements is not complete until all the results pending on the statement is read or the active result set for the Statement is closed. Therefore, if using streaming results, process them as quickly as possible if you want to maintain concurrent access to the tables Referenced by the statement producing the result set.

By combining the forward-only,read-only resultset and fetchsize values for Integer.min_value as a driver, a row is used to get the result stream signal. Once set, all statement created resultset will fetch the result set in rows.

Reference documents:

"1" Https://wiki.apache.org/solr/DataImportHandlerFaq

"2" http://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html

Using SOLR's dihandler to build MySQL large table full-scale index, the solution of memory overflow problem

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.