Using Solr's DataImportHandler to Build a Full Index of a Large MySQL Table: Fixing Out-of-Memory Errors


The solution given in the official Solr documentation is:

DataImportHandler is designed to stream row one-by-one. It passes a fetch size value (default: 500) to Statement#setFetchSize which some drivers do not honor. For MySQL, add batchSize property to dataSource configuration with value -1. This will pass Integer.MIN_VALUE to the driver as the fetch size and keep it from going out of memory for large tables.

Should look like:

    <dataSource type="JdbcDataSource" name="ds-2" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:8889/mysqldatabase" batchSize="-1" user="root" password="root"/>

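Explanation: DataImportHandler is designed to stream rows one at a time. It passes a fetch size (default: 500) to Statement#setFetchSize, but some drivers do not honor that value. For MySQL, set the batchSize property to -1 in the dataSource configuration. Solr then passes Integer.MIN_VALUE (-2^31, that is -2147483648 or 0x80000000) to the driver as the fetch size, which keeps large tables from causing an out-of-memory error.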
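The batchSize rule described above can be sketched in a few lines of Java. This is a minimal illustration, not Solr's actual source: the class name FetchSizeSketch, the helper resolveFetchSize, and the fallback to the default on an unparsable value are all assumptions made for the example. Only the default of 500 and the mapping of -1 to Integer.MIN_VALUE come from the quoted documentation.

```java
// Hypothetical sketch of how a batchSize attribute could be turned into a JDBC fetch size.
class FetchSizeSketch {
    // Default fetch size stated in the Solr FAQ quoted above.
    static final int DEFAULT_FETCH_SIZE = 500;

    // Returns the value that would be handed to Statement#setFetchSize.
    static int resolveFetchSize(String batchSizeAttr) {
        if (batchSizeAttr == null) {
            return DEFAULT_FETCH_SIZE; // attribute not configured
        }
        try {
            int batchSize = Integer.parseInt(batchSizeAttr.trim());
            // -1 is the marker value that becomes Integer.MIN_VALUE for MySQL streaming.
            return batchSize == -1 ? Integer.MIN_VALUE : batchSize;
        } catch (NumberFormatException e) {
            // Assumption for this sketch: ignore an invalid value and keep the default.
            return DEFAULT_FETCH_SIZE;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveFetchSize(null)); // no batchSize configured
        System.out.println(resolveFetchSize("-1")); // streaming marker for MySQL
    }
}
```

The point of the indirection is that -1 is easier to write in a configuration file than -2147483648, while the driver still receives the exact value it treats as the streaming signal.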

The explanation given in the official MySQL documentation is:

ResultSet

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement. If you are working with ResultSets that have a large number of rows or large values and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, create a Statement instance in the following manner:

    stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                                java.sql.ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.

There are some caveats with this approach. You must read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.

The earliest the locks these statements hold can be released (whether they be MyISAM table-level locks or row-level locks in some other storage engine such as InnoDB) is when the statement completes.

If the statement is within scope of a transaction, then locks are released when the transaction completes (which implies that the statement needs to complete first). As with most other databases, statements are not complete until all the results pending on the statement are read or the active result set for the statement is closed.

Therefore, if using streaming results, process them as quickly as possible if you want to maintain concurrent access to the tables referenced by the statement producing the result set.

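In other words, the combination of a forward-only, read-only ResultSet and a fetch size of Integer.MIN_VALUE signals the driver to stream results row by row. Once this is set, every ResultSet created by that statement is retrieved one row at a time.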
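For readers calling JDBC directly rather than through DataImportHandler, the streaming setup from the MySQL notes can be wrapped as in the sketch below. The class StreamingQuerySketch, the RowHandler callback, and the method names are hypothetical and exist only for illustration. The code uses only standard java.sql calls, and it assumes the caller supplies an open MySQL Connector/J connection.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper illustrating row-by-row streaming with MySQL Connector/J.
class StreamingQuerySketch {

    // Callback invoked once per row (hypothetical type for this sketch).
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    // Creates a forward-only, read-only statement whose fetch size is
    // Integer.MIN_VALUE, the combination the MySQL driver treats as "stream".
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    // Runs the query and hands each row to the handler; returns the row count.
    // try-with-resources closes the result set and statement even on failure,
    // which matters because no other query can run on this connection until
    // the streaming result set has been fully read or closed.
    static long streamRows(Connection conn, String sql, RowHandler handler) throws SQLException {
        long count = 0;
        try (Statement stmt = createStreamingStatement(conn);
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                handler.handle(rs);
                count++;
            }
        }
        return count;
    }
}
```

Because locks are held until the statement completes, the handler passed to streamRows should do as little work per row as possible, for example by queueing the row for later processing.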

References:

[1] https://wiki.apache.org/solr/DataImportHandlerFaq

[2] http://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html
