Spark read from MongoDB fails with executor timeout and "GC overhead limit exceeded" exception

Source: Internet
Author: User
Tags: gc overhead limit exceeded

Code:

import com.mongodb.spark.config.ReadConfig
import com.mongodb.spark.sql._

// Build the connector configuration on top of the existing Spark configuration.
val config = sqlContext.sparkContext.getConf
  .set("spark.mongodb.keep_alive_ms", "15000")
  .set("spark.mongodb.input.uri", "mongodb://10.100.12.14:27017")
  .set("spark.mongodb.input.database", "BI")
  .set("spark.mongodb.input.collection", "usergroupmapping")
val readConfig = ReadConfig(config)

// Load the collection as a DataFrame and register it as a temporary table.
val objUserGroupMapping = sqlContext.read.format("com.mongodb.spark.sql").mongo(readConfig)
objUserGroupMapping.printSchema()
val tbUserGroupMapping = objUserGroupMapping.toDF()
tbUserGroupMapping.registerTempTable("usergroupmapping")

// Query run against the temporary table:
select _id, c, g, n, rn, t, ut from usergroupmapping where ut > '2018-05-02' limit 100

Running the code above to fetch 100 records from the usergroupmapping collection produced an executor timeout and a "GC overhead limit exceeded" exception. At first I assumed the task was pulling too much data from MongoDB, leaving the Spark executors short of memory. Further research showed that the Spark MongoDB connector fetches data conditionally: the filter is applied in MongoDB first, and only the matching records are loaded into Spark memory, so the data volume alone should not exhaust memory. Later I found a claim online that running too many tasks at once makes them compete for GC time and memory resources (I do not fully understand this part). Following that suggestion, I reduced the task cores from 16 to 6 and reran the program, and the error no longer occurred. The exact cause is still unclear to me, so I am recording the workaround here for now.
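The change from 16 cores to 6 can be sketched as a configuration fragment. The post does not say which setting was changed, so this example assumes it was spark.executor.cores, which controls how many tasks run concurrently inside one executor JVM and therefore how many tasks share its heap. The object name and application name are hypothetical; the MongoDB settings are copied from the code above.

```scala
import org.apache.spark.SparkConf

object ReducedCoresConf {
  // Assumption: "task cores" in the post means spark.executor.cores.
  // The original post does not name the setting that was changed.
  def build(): SparkConf =
    new SparkConf()
      .setAppName("usergroupmapping-read") // hypothetical application name
      .set("spark.executor.cores", "6")    // the failing run used 16
      .set("spark.mongodb.input.uri", "mongodb://10.100.12.14:27017")
      .set("spark.mongodb.input.database", "BI")
      .set("spark.mongodb.input.collection", "usergroupmapping")

  def main(args: Array[String]): Unit =
    // Print the effective settings so the change can be checked before submitting the job.
    build().getAll.sorted.foreach { case (k, v) => println(s"$k=$v") }
}
```

If the cores were instead set on the command line, the equivalent would be passing --executor-cores 6 to spark-submit. Either way, fewer concurrent tasks per executor means each task has a larger share of the heap, which may be why the GC error disappeared.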
