Sqoop fetch-size has no effect

Source: Internet
Author: User
Tags: sqoop

http://boylook.itpub.net/post/43144/531416


A few days ago, a production Sqoop job that extracts data from MySQL into Hadoop suddenly failed with an OOME (OutOfMemoryError). I re-ran it and took a Java trace: memory was dominated by byte[] arrays, and one of the top 3 methods by CPU was com.mysql.jdbc.ByteArrayBuffer.getBytes, which means the memory was being consumed entirely by row data. This was strange: why would the job hit an OOME when fetch-size=100 was specified in the options and the average record length is under 1 KB?

Looking at the record of yesterday's successful run, the memory used was far too high (the MAP task shows about 862 MB in the table below), so fetch-size was clearly not taking effect.

+---------+---------+------+----------+-------------+--------------+
| type    | status  | host | cpusec   | mrinput_rec | memory_mb    |
+---------+---------+------+----------+-------------+--------------+
| CLEANUP | SUCCESS | A    | 0.3400   | NULL        | 191.84765625 |
| MAP     | SUCCESS | A    | 335.6400 | 1006942     | 862.39843750 |
| SETUP   | SUCCESS | B    | 0.2000   | NULL        | 179.34765625 |
+---------+---------+------+----------+-------------+--------------+
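As a rough sanity check on that conclusion, here is a back-of-the-envelope estimate. It is only a sketch: it takes the 1,006,942 input records from the table, assumes an upper bound of 1 KB per record, and ignores JVM and driver overhead. The class and method names are made up for illustration.

```java
// Rough estimate only: ignores JVM, driver, and object overhead.
public class FetchSizeEstimate {
    // Upper bound on bytes held in memory if `rows` records of at most
    // `maxRecordBytes` each are buffered at the same time.
    public static long bufferedBytes(long rows, long maxRecordBytes) {
        return rows * maxRecordBytes;
    }

    public static void main(String[] args) {
        long maxRecordBytes = 1024; // assumption: "under 1 KB" per record
        // If fetch-size=100 were honored, only 100 rows would be held at once.
        long batch = bufferedBytes(100, maxRecordBytes);
        // If the whole result set is buffered, every row is held at once.
        long all = bufferedBytes(1_006_942L, maxRecordBytes);
        System.out.println("batch of 100 rows: " + batch / 1024 + " KB");
        System.out.println("entire result set: " + all / (1024 * 1024) + " MB");
    }
}
```

Under these assumptions a 100-row batch needs about 100 KB, while buffering every row needs up to roughly 983 MB. The observed 862 MB is in line with the second figure, not the first.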

With no other leads, I went through the Sqoop source code and finally found the root cause: fetch-size is ignored.

    protected void initOptionDefaults() {
      if (options.getFetchSize() == null) {
        LOG.info("Preparing to use a MySQL streaming resultset.");
        options.setFetchSize(Integer.MIN_VALUE);
      } else if (
          !options.getFetchSize().equals(Integer.MIN_VALUE)
          && !options.getFetchSize().equals(0)) {
        LOG.info("Argument '--fetch-size " + options.getFetchSize()
            + "' will probably get ignored by MySQL JDBC driver.");
      }
    }
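To make the consequence of that branch explicit, here is a small sketch of how the requested fetch size ends up mapping to a retrieval mode. It assumes default MySQL JDBC driver settings; the class, enum, and method names are hypothetical and are not part of Sqoop or the driver.

```java
public class MySqlFetchMode {
    public enum Mode { STREAM_ROW_BY_ROW, BUFFER_ALL }

    // Hypothetical helper, assuming default driver settings.
    public static Mode modeFor(Integer requestedFetchSize) {
        // Sqoop substitutes Integer.MIN_VALUE when no fetch size is given.
        int effective = (requestedFetchSize == null)
                ? Integer.MIN_VALUE
                : requestedFetchSize;
        // Only Integer.MIN_VALUE asks the driver to stream row by row.
        // Any other value is ignored, so the whole result set is buffered.
        return effective == Integer.MIN_VALUE
                ? Mode.STREAM_ROW_BY_ROW
                : Mode.BUFFER_ALL;
    }

    public static void main(String[] args) {
        System.out.println("no --fetch-size  -> " + modeFor(null));
        System.out.println("--fetch-size 100 -> " + modeFor(100));
    }
}
```

In other words, passing --fetch-size 100 does not give 100-row batches; it switches the job from streaming to buffering the whole result set.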

The reason is that the API provided by MySQL supports only two modes: streaming row by row, or retrieving the entire result set at once. From the MySQL documentation:

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-implementation-notes.html
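For reference, the row-by-row mode described in that note is requested through plain JDBC by creating a forward-only, read-only statement and setting its fetch size to Integer.MIN_VALUE. The sketch below shows only that step; the class and method names are hypothetical, and obtaining the Connection is left to the caller.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingStatementSketch {
    // Creates a statement configured for MySQL row-by-row streaming:
    // forward-only, read-only, fetch size Integer.MIN_VALUE.
    public static Statement createStreamingStatement(Connection conn)
            throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        // Integer.MIN_VALUE is the signal to the MySQL driver to stream.
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

This is the same value Sqoop sets on its own when no fetch-size option is passed, which is why removing the option restores streaming.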

Finally, I removed the fetch-size option. The job then ran successfully: in the table below, the successful MAP task read 7,195,560 records and used only about 408 MB of memory.

+---------+---------+------+----------+-------------+--------------+
| type    | status  | host | cpusec   | mrinput_rec | memory_mb    |
+---------+---------+------+----------+-------------+--------------+
| CLEANUP | SUCCESS | A    | 0.4200   | NULL        | 183.49218750 |
| MAP     | FAILED  | B    | NULL     |             |              |
| MAP     | SUCCESS | A    | 377.1200 | 7195560     | 408.08593750 |
| SETUP   | SUCCESS | C    | 0.2900   | NULL        | 188.64843750 |
+---------+---------+------+----------+-------------+--------------+


This article is from the "MIKE's old blog" blog. Please be sure to keep this source attribution: http://boylook.blog.51cto.com/7934327/1298634
