http://boylook.itpub.net/post/43144/531416
A few days ago, a production Sqoop job that extracts data from MySQL into Hadoop suddenly failed with an OutOfMemoryError (OOME). I re-ran it and took a Java trace, which showed that the memory was being used by byte[]; at the same time, one of the top three methods by CPU time was com.mysql.jdbc.ByteArrayBuffer.getBytes. In other words, the memory was being consumed entirely by row data. This was strange: why would the job hit an OOME when fetch-size=100 was specified in the options (and the average record length is less than 1 KB)?
Looking at the records of the run that had succeeded the day before, the map task had already used about 862 MB of memory (see the table below), so fetch-size was clearly not taking effect.
+---------+---------+------+----------+-------------+--------------+
| type    | status  | host | cpusec   | mrinput_rec | memory_mb    |
+---------+---------+------+----------+-------------+--------------+
| CLEANUP | SUCCESS | A    |   0.3400 |        NULL | 191.84765625 |
| MAP     | SUCCESS | A    | 335.6400 |     1006942 | 862.39843750 |
| SETUP   | SUCCESS | B    |   0.2000 |        NULL | 179.34765625 |
+---------+---------+------+----------+-------------+--------------+
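To make the mismatch concrete, here is a rough back-of-the-envelope estimate in Java. It is only a sketch: the class and method names are made up for illustration, the 1 KB per row figure is the upper bound quoted above, and real heap usage includes JVM and framework overhead (the SETUP and CLEANUP tasks alone sit at roughly 180 to 190 MB), so only the order of magnitude matters.

```java
// Hypothetical helper, not part of Sqoop: compares how much row data would
// sit in memory if fetch-size were honored versus if the whole result set
// were buffered.
final class FetchSizeEstimate {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Raw row data in MB for `rows` rows of at most `bytesPerRow` bytes each. */
    static double rowDataMb(long rows, long bytesPerRow) {
        return (double) (rows * bytesPerRow) / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long maxRowBytes = 1024; // "less than 1 KB" per record, used as an upper bound

        // If fetch-size=100 were honored, only about 100 rows would be buffered at a time.
        System.out.printf("fetch-size honored: ~%.2f MB%n", rowDataMb(100, maxRowBytes));

        // If the driver buffers all 1,006,942 rows read by the map task.
        System.out.printf("whole result set:   up to ~%.0f MB%n",
                rowDataMb(1_006_942, maxRowBytes));
    }
}
```

Under these assumptions, a working fetch-size would keep about 0.1 MB of row data in memory, while buffering the whole result set could take up to roughly 983 MB. The observed 862 MB is far closer to the second figure.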
With no other leads, I dug through the Sqoop source code and finally found the root cause: the fetch size is ignored.
protected void initOptionDefaults() {
  if (options.getFetchSize() == null) {
    LOG.info("Preparing to use a MySQL streaming resultset.");
    options.setFetchSize(Integer.MIN_VALUE);
  } else if (
      !options.getFetchSize().equals(Integer.MIN_VALUE)
      && !options.getFetchSize().equals(0)) {
    LOG.info("Argument '--fetch-size " + options.getFetchSize()
        + "' will probably get ignored by MySQL JDBC driver.");
  }
}
The reason is that the MySQL JDBC driver supports only two modes: streaming the result row by row, or retrieving the entire result set at once:
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-implementation-notes.html
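The two modes can be illustrated with a small JDBC sketch. The class and method names below are hypothetical and are not Sqoop's actual implementation. It assumes the Connector/J behaviour described in the notes linked above, with default connection properties: a forward-only, read-only statement streams only when its fetch size is Integer.MIN_VALUE, and any other value leaves the result set fully buffered.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration, not Sqoop code.
final class MySqlStreamingSketch {

    /** Mirrors the defaulting in the Sqoop snippet above: no fetch size means "stream". */
    static int effectiveFetchSize(Integer requested) {
        return requested == null ? Integer.MIN_VALUE : requested;
    }

    /**
     * Assumption (default connection properties): Connector/J streams row by
     * row only for Integer.MIN_VALUE; any other value is buffered in full.
     */
    static boolean streamsRowByRow(int fetchSize) {
        return fetchSize == Integer.MIN_VALUE;
    }

    /** Creates a forward-only, read-only statement with the effective fetch size applied. */
    static Statement newStatement(Connection conn, Integer requested) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(effectiveFetchSize(requested));
        return stmt;
    }
}
```

With this model, effectiveFetchSize(null) yields Integer.MIN_VALUE and the result set is streamed, whereas an explicit value such as 100 is passed through unchanged and the driver falls back to loading everything into memory, which matches the behaviour observed in the job.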
Finally, I removed the fetch-size option. The job then ran successfully, and the map task that succeeded read 7,195,560 rows while using only about 408 MB of memory:
+---------+---------+------+----------+-------------+--------------+
| type    | status  | host | cpusec   | mrinput_rec | memory_mb    |
+---------+---------+------+----------+-------------+--------------+
| CLEANUP | SUCCESS | A    |   0.4200 |        NULL | 183.49218750 |
| MAP     | FAILED  | B    |     NULL |             |              |
| MAP     | SUCCESS | A    | 377.1200 |     7195560 | 408.08593750 |
| SETUP   | SUCCESS | C    |   0.2900 |        NULL | 188.64843750 |
+---------+---------+------+----------+-------------+--------------+
This article is from the "MIKE's old blog" blog; please be sure to keep this source link: http://boylook.blog.51cto.com/7934327/1298634