[Interactive Q&A Sharing] Spark Asia Pacific Research Institute Public Lecture Hall, Session 1: Winning the Cloud Computing and Big Data Era

"Winning the Cloud Computing and Big Data Era"

Spark Asia Pacific Research Institute Public Lecture Hall, Session 1 [Session 1 Interactive Q&A Sharing]

 

Q1: How are enterprises using Job Server (jobserver)?

  • A video website in China has been using Job Server for more than half a year;

  • Job Server was strongly recommended at Spark Summit 2013 and 2014;
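For readers who have not used it: Job Server (the open-source spark-jobserver project) runs Spark jobs on behalf of clients through a REST interface. The sketch below only builds request URLs and sends nothing. The endpoints (`/jars/<appName>`, `/jobs?appName=...&classPath=...`, `/jobs/<jobId>`), the `context` and `sync` parameters, and the default port 8090 follow the project's README from around the time of this talk and may differ in later releases. The helper names and the `shared` context name are hypothetical; the example class name is taken from the project's sample jobs.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: a Job Server instance on its default port 8090; adjust for your deployment.
JOBSERVER_BASE = "http://localhost:8090"


def upload_jar_url(app_name: str, base: str = JOBSERVER_BASE) -> str:
    """POST the application jar's bytes here to register it under `app_name`."""
    return f"{base}/jars/{app_name}"


def submit_job_url(app_name: str, class_path: str, context: Optional[str] = None,
                   sync: bool = False, base: str = JOBSERVER_BASE) -> str:
    """POST here (optionally with a config body) to run `class_path` from the uploaded jar."""
    params = {"appName": app_name, "classPath": class_path}
    if context is not None:
        params["context"] = context  # run inside a named, long-running SparkContext
    if sync:
        params["sync"] = "true"      # wait for the result instead of receiving a job id
    return f"{base}/jobs?{urlencode(params)}"


def job_status_url(job_id: str, base: str = JOBSERVER_BASE) -> str:
    """GET here to poll an asynchronous job's status and result."""
    return f"{base}/jobs/{job_id}"


if __name__ == "__main__":
    print(upload_jar_url("test"))
    print(submit_job_url("test", "spark.jobserver.WordCountExample", context="shared"))
```

Submitting to a named, long-running context avoids starting a new SparkContext for every request, which is much of what makes Job Server attractive as a shared service.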

     


Q2: Is Job Server suited to internal enterprise use or to external customers (which may bring concurrency and security requirements), or does it work for both?

  • The enterprise use cases seen so far are all internal;

  • For external customers, it can be offered as a cloud service or as a big data resource pool;


 

Q3: How much memory does Spark need to process 1 TB of data quickly?

  • First, it depends on how much memory and how many CPU cores each worker uses while the program runs; both can be configured manually when submitting the application;

  • Second, it depends on network bandwidth, so the amount of data shuffled should be kept to a minimum;

  • The configuration of the machine hosting the driver is also extremely important. In general, give the client machine running the driver as much memory and CPU as the actual situation allows. It is equally crucial that the driver and the Spark cluster be in the same network environment, because the driver continuously assigns tasks to the executors on the workers while also receiving data back from them;
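To make the first and third points concrete, the sketch below assembles a `spark-submit` command with explicit executor and driver resources and gives a rough estimate of how much executor memory a given layout provides. The flags are real `spark-submit` options (`--num-executors` applies on YARN, `--driver-cores` in cluster deploy mode); the helper names, the example numbers, the class name `com.example.Main`, and the 0.6 fraction (the default of `spark.memory.fraction` in Spark 2.x and later) are illustrative assumptions, not sizing advice. Note that Spark does not need the whole 1 TB in memory at once: shuffle data that does not fit is spilled to disk, so this estimates how much can be held in memory, not a hard minimum.

```python
from typing import List

# Hypothetical helpers for illustration; the numbers are examples, not recommendations.


def estimated_spark_memory_gb(num_executors: int, executor_memory_gb: float,
                              memory_fraction: float = 0.6) -> float:
    """Rough total memory available to Spark for execution and caching.

    memory_fraction approximates spark.memory.fraction (assumed 0.6).
    Ignores reserved heap, JVM overhead and off-heap memory.
    """
    return num_executors * executor_memory_gb * memory_fraction


def spark_submit_args(app_jar: str, main_class: str,
                      num_executors: int, executor_memory_gb: int, executor_cores: int,
                      driver_memory_gb: int, driver_cores: int) -> List[str]:
    """Build a spark-submit command line with explicit worker and driver resources."""
    return [
        "spark-submit",
        "--class", main_class,
        "--num-executors", str(num_executors),          # YARN: executors to launch
        "--executor-memory", f"{executor_memory_gb}g",  # heap per executor
        "--executor-cores", str(executor_cores),        # concurrent tasks per executor
        "--driver-memory", f"{driver_memory_gb}g",      # the driver needs headroom too
        "--driver-cores", str(driver_cores),            # honored in cluster deploy mode
        app_jar,
    ]


if __name__ == "__main__":
    # Example layout: 50 executors with 32 GB each
    print(estimated_spark_memory_gb(50, 32))
    print(" ".join(spark_submit_args("app.jar", "com.example.Main", 50, 32, 4, 16, 4)))
```

With these example numbers the estimate is 960 GB, just under 1 TB. Deserialized Java objects also tend to take more space than the raw data on disk, so fully caching 1 TB would need noticeably more memory than this, while a single pass over the data can still run by processing partitions in turn.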


 

Q4: I am currently dealing with a StackOverflowError. I use checkpointing to cut the overly long lineage, but this hurts efficiency. How can I balance efficiency against the error?

  • StackOverflowError can be mitigated by configuring the BlockManager's memory management policy;

  • Checkpointing should be tuned to the actual situation. For example, Spark Streaming keeps two copies of received data by default; if processing cannot consume the real-time stream in time, a StackOverflowError becomes very likely, and the time window and checkpoint interval should then be adjusted to match;
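The trade-off in Q4 can be reasoned about with a small model: each iteration of an iterative job adds one step to the RDD lineage, and checkpointing every `interval` iterations truncates it at the cost of one write to reliable storage. The sketch below is a plain-Python model under that assumption (one transformation per iteration); the function and class names are hypothetical. In real Spark code the corresponding calls are `SparkContext.setCheckpointDir(...)` followed by `rdd.checkpoint()` and an action, or `dstream.checkpoint(interval)` in Spark Streaming, whose programming guide suggests 5 to 10 sliding intervals as a starting checkpoint interval.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical model: one transformation per iteration, lineage truncated at each checkpoint.


@dataclass
class CheckpointPlan:
    checkpoints: int        # lineage truncations; each costs a write to reliable storage
    max_lineage_depth: int  # longest chain of transformations tracked at any point


def plan_checkpoints(iterations: int, interval: int) -> CheckpointPlan:
    """Checkpoint every `interval` iterations; interval <= 0 means never checkpoint."""
    depth = 0
    max_depth = 0
    checkpoints = 0
    for i in range(1, iterations + 1):
        depth += 1  # this iteration adds one step to the lineage
        max_depth = max(max_depth, depth)
        if interval > 0 and i % interval == 0:
            checkpoints += 1  # the checkpoint is written...
            depth = 0         # ...and the lineage restarts from it
    return CheckpointPlan(checkpoints, max_depth)


def streaming_checkpoint_range(slide_seconds: float) -> Tuple[float, float]:
    """Starting range of 5 to 10 sliding intervals (Spark Streaming guide rule of thumb)."""
    return (5 * slide_seconds, 10 * slide_seconds)


if __name__ == "__main__":
    for interval in (0, 50, 10, 1):
        print(interval, plan_checkpoints(100, interval))
    print(streaming_checkpoint_range(2.0))
```

Fewer checkpoints leave a deeper lineage (more risk of StackOverflowError), and more checkpoints mean more I/O; the right interval lies between the two and, as the answer says, depends on the actual workload.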



This article is from the Spark Asia Pacific Research Institute blog; please be sure to keep this source: http://rockyspark.blog.51cto.com/2229525/1555110
