"Winning the cloud computing Big Data era"
Spark Asia Pacific Research Institute, Public Welfare Lecture Hall, Stage 1 [Stage 1 interactive Q&A sharing]
Q1: How is jobserver used in enterprises?
Q2: Is jobserver suited to internal enterprise use or to external customers (who may have concurrency and security requirements), or does it work for both?
The enterprise use cases seen so far are all internal to the enterprise.
For use outside the enterprise, it can be offered as a cloud service or as a big data resource pool.
Q3: How much memory does Spark need to process 1 TB of data quickly?
First, it depends on the memory and CPU used on each worker while the program runs. You can configure both manually when submitting the program.
Second, it depends on bandwidth. The amount of data moved during shuffle should be kept to a minimum.
The configuration of the machine that hosts the driver is also extremely important. In general, give the client machine that runs the driver as much memory and CPU as the actual situation allows. It is equally crucial that the driver and the Spark cluster sit in the same network environment, because the driver continuously assigns tasks to the executors on the workers and also receives data back from them.
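As a rough illustration of setting worker and driver resources at submit time, the sketch below assembles a `spark-submit` command line in Python. The helper name `build_spark_submit`, the application file `my_job.py`, the master URL, and every size shown are placeholders chosen for the example, not recommendations for a 1 TB job; which flags apply also depends on the cluster manager and Spark version in use.

```python
import shlex


def build_spark_submit(app, master, executor_memory, executor_cores,
                       driver_memory, extra_conf=None):
    """Return a spark-submit argument list with explicit resource settings.

    Hypothetical helper for illustration; it only builds the command,
    it does not launch anything.
    """
    args = [
        "spark-submit",
        "--master", master,
        "--executor-memory", executor_memory,     # memory per executor on a worker
        "--executor-cores", str(executor_cores),  # CPU cores per executor
        "--driver-memory", driver_memory,         # memory for the driver process
    ]
    # Any further Spark properties are passed as --conf key=value pairs.
    for key, value in sorted((extra_conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Placeholder values; tune them to the actual workload and cluster.
cmd = build_spark_submit(
    app="my_job.py",
    master="spark://master-host:7077",
    executor_memory="8g",
    executor_cores=4,
    driver_memory="4g",
    extra_conf={"spark.default.parallelism": "200"},
)
print(shlex.join(cmd))
```

Running the driver on a machine inside the cluster's network, as the answer suggests, is a deployment decision and is not expressed by these flags.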
Q4: I am currently dealing with a stack overflow error. I use checkpoint to cut a lineage that has grown too long, but this hurts efficiency. How can efficiency be balanced against the error?
A stack overflow can be mitigated by configuring the blockmanager memory management policies.
Checkpointing should be tuned to the actual workload. For example, Spark Streaming keeps two copies of the data by default; if processing cannot keep up with the real-time stream, a stack overflow occurs very easily, so the time window and the checkpoint interval should both be adjusted to the actual situation.
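The trade-off in Q4 can be sketched with a toy model: checkpointing every N iterations caps the lineage depth at N, at the cost of one checkpoint write per N iterations. `ToyRDD` below is a hypothetical stand-in that only counts lineage depth so the example runs without a cluster; it is not a Spark class. The loop in `iterate_with_checkpoint` uses the same method names as a PySpark RDD (`cache`, `checkpoint`, `count`), on the assumption that a real job would call `SparkContext.setCheckpointDir` first.

```python
class ToyRDD:
    """Stand-in for an RDD that tracks only lineage depth (not a Spark class)."""

    def __init__(self, depth=0, stats=None):
        self.depth = depth  # number of parent steps not yet truncated
        self.stats = stats if stats is not None else {"checkpoints": 0, "max_depth": 0}
        self.stats["max_depth"] = max(self.stats["max_depth"], depth)
        self._marked = False

    def map(self, f):
        # Every transformation adds one link to the lineage chain.
        return ToyRDD(self.depth + 1, self.stats)

    def cache(self):
        return self

    def checkpoint(self):
        # Only marks the dataset; the work happens on the next action.
        self._marked = True

    def count(self):
        if self._marked:
            self.depth = 0  # lineage is cut at the checkpoint
            self.stats["checkpoints"] += 1
            self._marked = False
        return 0


def iterate_with_checkpoint(rdd, step, iterations, checkpoint_every):
    """Apply `step` repeatedly, truncating lineage every `checkpoint_every` steps."""
    for i in range(1, iterations + 1):
        rdd = step(rdd)
        if i % checkpoint_every == 0:
            rdd.cache()       # avoid recomputing when the checkpoint is written
            rdd.checkpoint()  # mark for checkpointing
            rdd.count()       # an action forces the checkpoint to happen
    return rdd


def run(checkpoint_every, iterations=1000):
    """Return checkpoint count and deepest lineage seen for one setting."""
    start = ToyRDD()
    iterate_with_checkpoint(start, lambda r: r.map(lambda x: x),
                            iterations, checkpoint_every)
    return start.stats


for every in (10, 100):
    print(every, run(every))
```

A smaller interval keeps the lineage shallow but writes checkpoints more often; a larger interval does the opposite. The right value depends on how deep a lineage the job tolerates and how expensive each checkpoint write is.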