Spark 1.6 Version Issue Summary

Source: Internet
Author: User
Tags: shuffle

This article shares some of the issues we encountered while upgrading to Spark 1.6.

Memory Issues

Spillable Collection Memory Overflow

The specific error message for this issue is: unable to acquire 163 bytes of memory, got 0. The exception is thrown proactively by Spark when the application runs out of memory, and it is easily reproduced in scenarios that process large amounts of data. In Spark 1.6 the default shuffle is still the sort mode, which is actually the result of merging Tungsten-sort and sort. To explain the root cause of the problem, let us first review the Tungsten-sort process (Figure 1).

Figure 1 The Tungsten-sort process

The key and value of each record are written in binary form to a ByteArrayOutputStream; storing them in binary reduces the time spent on serialization and deserialization. The code then checks whether the current page has enough memory and, if it does, writes the record into the current page (note: a page is a contiguous block of memory). Once the record is in the page, its memory address and partition ID are encoded into an 8-byte long and recorded in the InMemorySorter. When the current page or the InMemorySorter runs out of memory, more memory is requested; if the request cannot be satisfied, the current data is spilled to disk, and if no memory can be obtained at all, the exception above is thrown.
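To make the 8-byte encoding more concrete, here is a simplified sketch of packing a partition ID and a record address into a single long; the bit widths are illustrative assumptions and do not claim to match Spark's internal PackedRecordPointer layout exactly:

    // Simplified sketch only: pack a partition ID and a record address into one 8-byte long.
    // The 24 + 40 bit split is an assumption for illustration, not Spark's exact layout.
    object PointerPackingSketch {
      def pack(partitionId: Int, recordAddress: Long): Long =
        (partitionId.toLong << 40) | (recordAddress & 0xFFFFFFFFFFL)

      def partitionId(packed: Long): Int = (packed >>> 40).toInt

      def recordAddress(packed: Long): Long = packed & 0xFFFFFFFFFFL
    }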

Why can't spilling release the memory? By adding extra log output, we found that a Spillable object had requested a lot of memory but had never registered itself under a name, so it was effectively an untracked, unregistered memory consumer.

The issue SPARK-11293 already exists in the community: the problem was clearly identified but never fixed, which is rather frustrating.

The issue was also present in the Spark 1.4 version we used previously, but there it did not throw an error proactively and eventually manifested as ExecutorLost. The concrete implementations of Spillable are ExternalAppendOnlyMap and ExternalSorter, which are mainly responsible for aggregation and sorting. Although they have their own spill mechanism, spilling is not mandatory: there is a high probability that all of the data stays in memory, and the memory is only freed after the whole collection has been consumed. Without modifying the code, there are two ways to reduce the probability of the error:

    1. Increase the number of partitions in the failing stage; when the data is fairly evenly distributed, this reduces the average amount of data processed per task.
    2. Set spark.shuffle.spill.numElementsForceSpillThreshold. This parameter is mainly intended for testing, because in a unit test one knows exactly how many records are written. Its default value is Long.MaxValue; once the number of inserted records exceeds the threshold, the data is forcibly spilled to disk.
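For example, both workarounds can be applied purely through configuration when submitting the job (the values below are illustrative, not recommendations):

    // Illustrative values only; tune them for your own data volume.
    val conf = new org.apache.spark.SparkConf()
      // more partitions in the failing stage (for SQL jobs; RDD jobs can call repartition())
      .set("spark.sql.shuffle.partitions", "2000")
      // force a spill to disk after this many inserted records
      .set("spark.shuffle.spill.numElementsForceSpillThreshold", "5000000")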

Both methods have obvious drawbacks. The first does not apply when the data is skewed. For the second, users cannot easily know what value to set: too large has no effect, too small causes a lot of disk I/O, so it has to be tuned by repeated trial. Controlling the spill by memory size rather than by record count is much more intuitive and can be given a sensible default, so that is how we changed the code; the whole modification is no more than ten lines.

The modifications are as follows:

    1. Modify the Spillable trait so that it extends MemoryConsumer and acquires memory through the legitimate path rather than through the back door.
    2. Add the system parameter spark.shuffle.spill.memoryForceSpillThreshold with a default of 640m, and modify Spillable's maybeSpill method so that when currentMemory exceeds this threshold the data is spilled to disk.
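A minimal sketch of the second change, assuming a field memoryForceSpillThreshold that has been read from the new parameter (only the added check is shown, not the full maybeSpill logic):

    // Sketch of the added check inside Spillable.maybeSpill; simplified, not the full method.
    // memoryForceSpillThreshold is read from spark.shuffle.spill.memoryForceSpillThreshold
    // (default 640m in our patch).
    protected def maybeSpill(collection: C, currentMemory: Long): Boolean = {
      if (currentMemory > memoryForceSpillThreshold) {
        spill(collection)   // write the in-memory data to disk and release the memory
        true
      } else {
        // ... original logic: try to acquire more memory, spill only if that fails ...
        false
      }
    }
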
Storage Memory Occupying Execution Memory

Prior to Spark 1.6, memory was divided into storage memory and execution (running) memory, which could not borrow from each other, so storage memory was wasted when the task cached nothing. Spark 1.6 introduces unified memory management: by default 75% of executor memory is set aside for storage and execution memory (half each), and the two can borrow from each other, which avoids the waste and effectively improves memory utilization.
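As a rough worked example with the default settings (spark.memory.fraction = 0.75 and spark.memory.storageFraction = 0.5; the small fixed reserved memory is ignored here for simplicity):

    // Rough illustration of the default Spark 1.6 unified memory split for an 8 GB executor.
    val executorMemory = 8L * 1024 * 1024 * 1024
    val unifiedRegion  = (executorMemory * 0.75).toLong   // spark.memory.fraction = 0.75 -> ~6 GB shared
    val storageRegion  = (unifiedRegion * 0.5).toLong     // spark.memory.storageFraction = 0.5 -> ~3 GB
    // Storage and execution each start with ~3 GB but may borrow the other's unused share.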

It sounds good, but when we actually ran jobs we found a problem: storage memory took up all of the memory, the shuffle then slowly squeezed it back out, and a lot of time was wasted loading and then evicting cached blocks.

Why does this happen? Because the cache operation takes place before the shuffle operation. At that point the execution memory is idle, so the cache takes all of it; when the shuffle later runs there is no memory available, and it can only ask the storage memory to give it back.

To avoid this, we made a small change to the policy: storage memory may not borrow from execution memory, while execution memory may still borrow idle storage memory.

The specific modifications are in the class UnifiedMemoryManager:

    1. Add the parameter spark.unifiedMemory.useStaticStorageRegion.
    2. Modify the maxStorageMemory function, which affects the maximum amount of storage memory displayed on the job UI.
    def maxStorageMemory: Long = synchronized {
      if (useStaticStorageMemory) {
        storageRegionSize
      } else {
        maxMemory - onHeapExecutionMemoryPool.memoryUsed
      }
    }

    3. Modify the acquireStorageMemory function: add the following code and adjust the maxBorrowMemory value.

    if (useStaticStorageMemory &&
        (storageRegionSize - storageMemoryPool.poolSize) < onHeapExecutionMemoryPool.memoryFree) {
      maxBorrowMemory = storageRegionSize - storageMemoryPool.poolSize
    }
BlockManager Deadlock Problem

This problem mostly occurs in scenarios where data is cached and storage memory is insufficient. Printing the thread information with the jstack command directly shows the deadlock; for details see SPARK-13566.

The root cause is that cached blocks lack a read-write lock. When memory is insufficient, the BlockManager thread that cleans up broadcast variables and the executor task thread that evicts blocks may select the same block, and each then locks an object the other needs: BlockManager first locks the MemoryManager and then requests the BlockInfo, while the executor task first locks the BlockInfo and then needs to lock the MemoryManager in order to remove the block from memory. A textbook deadlock.

After an issue was opened on JIRA, a committer replied that Spark 2.0 will add read-write lock protection for blocks, but that PR touches a great deal of code and is difficult to back-port to Spark 1.6. The problem is serious, though: it is easily reproduced when memory is large and a lot of data is cached. A temporary workaround is as follows:

    1. In BlockManager, add a global ConcurrentHashMap[BlockId, Long] that records which task ID has locked a block: add the entry before locking the BlockInfo and remove it after the MemoryManager has been accessed.
    2. Before locking a BlockInfo, first check whether an entry already exists in the ConcurrentHashMap; if it does, the function returns immediately.
    3. Since a task may fail, all locks associated with that task ID are released when the task ends.
       The detailed code can be found in the PR attached to the issue; a simplified sketch of the idea follows.
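A heavily simplified sketch with illustrative names only (the real change lives inside BlockManager; see the PR for the actual code):

    import java.util.concurrent.ConcurrentHashMap
    import org.apache.spark.storage.BlockId

    // Illustrative sketch only: remember which task holds a block, so the eviction
    // path can back off instead of waiting on a lock held in the opposite order.
    val lockedBlocks = new ConcurrentHashMap[BlockId, java.lang.Long]()

    def withBlockLock(blockId: BlockId, taskId: Long)(body: => Unit): Unit = {
      // If some other task already holds this block, give up instead of deadlocking.
      if (lockedBlocks.putIfAbsent(blockId, taskId) == null) {
        try body
        finally lockedBlocks.remove(blockId)   // also cleared for all of a task's blocks when it ends
      }
    }
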
Killed by YARN for Using Excess Memory

Anyone who runs Spark programs on YARN has probably met this problem: current usage: 12.1 GB of physical memory used; 13.9 GB of GB virtual memory used. Killing container.

With the same parameters, a program that ran normally on Spark 1.4 easily hits this problem on Spark 1.6. When Spark requests memory from YARN, it asks for (executor memory + overhead memory); the overhead defaults to 10% of executor memory, with a minimum of 384 MB. In many cases 10% of overhead is not enough, and the container is easily killed by YARN. We tried setting spark.shuffle.io.preferDirectBufs to false to stop Netty from using off-heap memory, and also tried lowering spark.memory.fraction to 0.6 or even less, but neither had any effect.

The problem occurs very frequently. It can be solved by setting spark.yarn.executor.memoryOverhead, but how large should it be, and should users really have to set it themselves? The fewer parameters users need to know about, the better: they can focus on their business code instead. Setting the overhead as a proportion is more reasonable, since there is no guarantee that everyone will keep the default configuration, so we modified the code to make the parameter configurable as a ratio. With the ratio set to 20% the number of kills drops sharply; containers still occasionally exceed the limit, but that no longer causes the whole job to fail. The problem also appears in the shuffle phase and can be reduced by increasing the number of partitions.
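Even without the code change, the same idea can be approximated per job by computing the overhead from executor memory; 20% is the ratio mentioned above, and the numbers here are illustrative:

    // Illustrative only: derive spark.yarn.executor.memoryOverhead (in MB) from a ratio
    // instead of relying on the default max(384 MB, 10% of executor memory).
    val executorMemoryMb = 12 * 1024          // e.g. a 12 GB executor
    val overheadFraction = 0.20               // the 20% ratio discussed above
    val memoryOverheadMb = (executorMemoryMb * overheadFraction).toInt

    val conf = new org.apache.spark.SparkConf()
      .set("spark.yarn.executor.memoryOverhead", memoryOverheadMb.toString)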

Hopefully the Spark developers will pay attention to this issue and strictly control the use of off-heap memory. Splitting the setting into executor memory plus extra overhead has been Spark's practice from the start, which feels a bit like paying the rent and then being charged a separate garbage-disposal fee. It reflects the frustration of inaccurate memory estimation: a framework that advertises itself as an in-memory computing engine still cannot properly limit its own memory use. Hopefully Spark will one day be like MapReduce or Samza, where the memory you set is the memory it uses.

Other Minor Gripes

Several Issues with the UI
    1. On the Master home page of Spark 1.3 and earlier, the workers list showed machine names; from Spark 1.4 onwards it shows IPs, which is very frustrating. When a machine occasionally goes down, how are you supposed to search a boundless sea of IPs for that "missing MH370"? It would help if, like YARN, it also listed dead nodes, but unfortunately it does not.
    2. On the Master home page the workers list sits at the top; imagine having to scroll past hundreds of worker nodes every time you open the page just to see the list of running applications.
    3. After executing SQL you see a SQL tab in the job UI, but some programs end up with more than one: SQL1, SQL2, ... and none of these tabs show any data. Looking at the source code reveals an incremental naming mechanism. See SPARK-11206; merging that issue's PR into the 1.6 branch fixes the multiple-SQL-tab problem, but for some programs the contents of the SQL tab still cannot be displayed correctly.
Annoying Hint Messages

When spark-sql starts, it always prints a pile of messages like set spark.hive.version=1.2.1. The annoying part is not just the noise: when SQL is executed with spark-sql -e or spark-sql -f, these messages are written out together with the results. Looking at the source code shows that the messages go to the standard output stream. An issue was raised with the Spark community, but nobody has handled it so far. The code is at line 508 of the ClientWrapper file; simply change state.out to state.err.
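The shape of the change is roughly the following (the surrounding code is paraphrased, not the literal ClientWrapper source; swapping the output stream is the only actual fix):

    // Before (paraphrased): the "set spark.hive.version=1.2.1"-style lines go to standard
    // output and get mixed into the results of spark-sql -e / spark-sql -f.
    //   state.out.println(s"set $key=$value")

    // After: route them to the error stream so stdout carries only query results.
    state.err.println(s"set $key=$value")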

Don't Let Executors Go on Strike for No Reason

Occasionally an executor hangs up and causes individual task failures. Although this does not kill the job, someone always asks what went wrong. The Spark code exits unexpectedly with System.exit(1), so all the driver sees is org.apache.hadoop.util.Shell$ExitCodeException. Nobody can make sense of that hint, and the users really have no way to diagnose it, so we logged on to the server to look at the container output and found the error "Received LaunchTask command but executor is null". Searching JIRA turned up a similar issue, SPARK-13112. The error comes from the following location:

    case LaunchTask(data) =>
      if (executor == null) {
        logError("Received LaunchTask command but executor is null")
        System.exit(1)
      }

The reason is that the task sent by the driver was received, but the executor had not yet finished initializing.

When does the executor finish initializing? Only after it receives the driver's message saying that executor registration succeeded. That raises a question: why wait for the registration-success message before initializing? When the executor receives a task-launch message from the driver, that already implies registration succeeded. The driver sends messages asynchronously, so it is entirely possible to receive the task-launch message before the registration-success message.

Having gotten no help from the community, we changed two lines of code to instantiate the Executor first and register with the driver afterwards, and the problem was solved.
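A simplified sketch of the idea, with names and signatures trimmed down (not the literal Spark 1.6 code in CoarseGrainedExecutorBackend):

    // Sketch only: build the Executor first, then register with the driver, so an early
    // LaunchTask message finds executor != null. RegisteredExecutor from the driver then
    // becomes a plain acknowledgement rather than the trigger for creating the executor.
    executor = new Executor(executorId, hostname, env, isLocal = false)   // 1. initialize first
    driverRef.send(RegisterExecutor(executorId, self, cores))             // 2. then register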

The Joy and Worry of Spark SQL

On the bright side, we tested several sets of SQL statements provided by users, and Spark 1.6 showed more than a 20% performance improvement over Spark 1.4!

The worry is that syntax compatibility has not improved much and error messages are still hard to read. We soon hit a type-conversion problem: SPARK-13772. Testing showed that an IF expression cannot mix the two different types double and decimal, otherwise it raises an error, whereas the same statement works fine on Spark 1.4. Looking at the source code, the cause is that the wrong function is used during type matching; the detailed fix can be seen in the PR attached to the issue.
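An illustrative reproduction (not the user's actual query) of the kind of statement that fails, where one branch of the IF is double and the other decimal:

    // Illustrative only: mixing a double branch and a decimal branch in one IF expression
    // raised an analysis error on Spark 1.6 (see SPARK-13772) but worked on Spark 1.4.
    // sqlContext and the "orders" table are assumed to exist.
    sqlContext.sql(
      """SELECT IF(amount > 0, CAST(amount AS DOUBLE), CAST(0 AS DECIMAL(10, 2)))
        |FROM orders""".stripMargin)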

Concluding Remarks

Upgrading from Spark 1.4 to Spark 1.6 brings an obvious performance boost, and users generally report that their jobs run faster. We also hit plenty of problems, especially around memory. As of the latest release, Spark 1.6.1, the issues above remain unresolved; if you hit similar problems, you can apply the methods described here or the PRs attached to the issues yourself.

Spark is moving toward fine-grained memory management, and the code churn in that area is very large. At the moment it is not fully under control, and two outstanding problems stand out:

    1. The memory-acquisition logic is clearly flawed: when memory runs out it cannot be reclaimed through the spill mechanism, and the only recourse is the suicidal one of killing the task. A memory consumer A that cannot get memory cannot force consumer B to release any; it can only free memory by writing its own data to disk, because every memory consumer's spill function carries a guard that prevents other consumers from triggering it. Even if consumer A does free memory, the request may still fail, because it is further limited by the maximum memory a single task may use (maximum available memory = execution memory / number of running tasks; see the sketch after this list).
    2. The use of off-heap memory is essentially unregulated, and at present the only mitigation is to add more memory. Judging from the code, the community plans to manage off-heap memory allocated through ByteBuffer, but that work is not yet complete.
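As a rough illustration of that per-task cap (the numbers are made up; the 1/N upper bound follows from the formula above):

    // Rough illustration of the per-task execution-memory cap mentioned above.
    // With N tasks running, Spark lets each task take at most poolSize / N of execution
    // memory (and, in the unified manager, roughly guarantees poolSize / (2 * N)).
    val poolSize     = 6L * 1024 * 1024 * 1024   // e.g. 6 GB of execution memory
    val runningTasks = 4
    val maxPerTask   = poolSize / runningTasks          // upper bound for any single task
    val minPerTask   = poolSize / (2 * runningTasks)    // a task waits until at least this much is free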

