Spark Shuffle off-heap memory overflow: problem and resolution (Shuffle communication principle)

Problem description
Spark 1.6.0 was released in January. To verify its performance, I ran some large SQL queries against it, and some of them failed during Shuffle with the following stack trace:
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:658)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
    at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
    at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:744)
As the failure message shows, this is an off-heap (direct) memory overflow. Why does it occur?
The Shuffle part of Spark uses the Netty framework for network transmission, and Netty allocates off-heap memory for its buffer caches (PooledByteBufAllocator, AbstractByteBufAllocator). During Shuffle, each reduce task needs to fetch the corresponding output of every map task. When the data a reduce task fetches from one map is relatively large (e.g. 1 GB), Netty allocates 1 GB of off-heap memory for it; since the amount of off-heap memory is limited, an off-heap memory overflow occurs.
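The failure mode can be illustrated with a toy model (a minimal sketch, not Netty's actual allocator): a fixed direct-memory budget, where each in-flight fetch allocates a buffer of the block's full size. Many small fetches succeed, but one very large map output exhausts the budget in a single allocation.

```python
# Toy model of a bounded direct-memory pool (NOT Netty's real allocator).
# Illustrates why fetching one very large map output can exhaust the
# off-heap budget even though many small fetches would succeed.

class DirectMemoryPool:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0

    def allocate(self, nbytes):
        if self.used + nbytes > self.max_bytes:
            # Analogous to java.lang.OutOfMemoryError: Direct buffer memory
            raise MemoryError("Direct buffer memory")
        self.used += nbytes
        return nbytes

    def free(self, nbytes):
        self.used -= nbytes

# e.g. a 512 MB budget, as if set with -XX:MaxDirectMemorySize=512m
pool = DirectMemoryPool(max_bytes=512 * 1024 * 1024)

# Many small shuffle blocks are fine: allocate and release 100 MB at a time.
for _ in range(10):
    b = pool.allocate(100 * 1024 * 1024)
    pool.free(b)

# A single 1 GB map output blows the budget in one allocation.
try:
    pool.allocate(1024 * 1024 * 1024)
    oom = False
except MemoryError:
    oom = True
print(oom)  # True
```

The point of the sketch: total shuffle volume is not the problem; the size of the single largest block fetched at once is.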
Making Shuffle avoid off-heap memory
Adding the configuration -Dio.netty.noUnsafe=true to the executor JVM options keeps Shuffle from using off-heap memory, but the same job still hit an OOM, so this does not solve the problem:
java.lang.OutOfMemoryError: Java heap space
    at io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
    at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
    at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:256)
    at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:136)
    at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:127)
    at io.netty.buffer.CompositeByteBuf.allocBuffer(CompositeByteBuf.java:1347)
    at io.netty.buffer.CompositeByteBuf.consolidateIfNeeded(CompositeByteBuf.java:276)
    at io.netty.buffer.CompositeByteBuf.addComponent(CompositeByteBuf.java:116)
    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
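For reference, the io.netty.noUnsafe flag tried above is typically passed to the executor JVMs via spark-submit; the job class and jar names below are placeholders. As the stack trace shows, the flag only switches Netty to heap ByteBufs, which merely moves the OOM from off-heap to the Java heap.

```shell
# Pass -Dio.netty.noUnsafe=true to every executor JVM so Netty
# allocates heap ByteBufs instead of direct (off-heap) buffers.
# (com.example.MyJob and my-job.jar are placeholders.)
spark-submit \
  --conf "spark.executor.extraJavaOptions=-Dio.netty.noUnsafe=true" \
  --class com.example.MyJob \
  my-job.jar
```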
Writing directly to disk when the data volume is large
In MapReduce, when the amount of shuffle data is large, the shuffle data is written to disk instead of being held in memory.
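The idea can be sketched as a buffer that spills to a temporary file once a size threshold is crossed (a simplified model, not MapReduce's actual shuffle code; the threshold value is illustrative):

```python
import os
import tempfile

# Sketch of "write shuffle data to disk when it is large": keep small
# payloads in memory, spill anything over a threshold to a temp file.
SPILL_THRESHOLD = 64 * 1024 * 1024  # 64 MB, an arbitrary illustrative limit

def store_shuffle_block(data: bytes):
    """Return ("memory", bytes) or ("disk", path) depending on size."""
    if len(data) <= SPILL_THRESHOLD:
        return ("memory", data)
    fd, path = tempfile.mkstemp(prefix="shuffle_")
    with os.fdopen(fd, "wb") as f:
        f.write(data)  # spill: bounded memory use regardless of block size
    return ("disk", path)

kind_small, _ = store_shuffle_block(b"x" * 1024)
print(kind_small)  # memory
```

This is the same principle the solution at the end of this article applies to Spark's shuffle fetch path.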
Spark Shuffle communication mechanism

The flow below illustrates the communication principle of Shuffle. The server side starts the shuffle service (shuffle_service).
(1) Client call stack

BlockStoreShuffleReader.read
ShuffleBlockFetcherIterator.sendRequest
ExternalShuffleClient.fetchBlocks
OneForOneBlockFetcher.start
TransportClient.sendRpc

This sends an RpcRequest(OpenBlocks) message.
(2) Server-side call stack

TransportRequestHandler.processRpcRequest
ExternalShuffleBlockHandler.receive
ExternalShuffleBlockHandler.handleMessage
ExternalShuffleBlockResolver.getBlockData (shuffle_shuffleId_mapId_reduceId)
ExternalShuffleBlockResolver.getSortBasedShuffleBlockData
FileSegmentManagedBuffer
handleMessage packages all the blocks that the requesting appId/executor needs to fetch into a List<ManagedBuffer>, registers the list as a stream, and returns the streamId and the blockIds to the client. The final message returned to the client is RpcResponse(StreamHandle(streamId, msg.blockIds.length)).
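The server-side registration can be modeled as follows (a Python sketch of the OneForOneStreamManager contract, not Spark's actual code): each OpenBlocks request registers its list of block buffers under a fresh streamId, chunks are later addressed by (streamId, chunkIndex), and the reply carries the streamId plus the block count.

```python
import itertools

# Minimal model of OneForOneStreamManager: one stream per OpenBlocks
# request, chunks addressed by (streamId, chunkIndex).
class StreamManager:
    _next_id = itertools.count(1)  # fresh streamId per registration

    def __init__(self):
        self.streams = {}  # streamId -> list of block buffers

    def register_stream(self, buffers):
        stream_id = next(self._next_id)
        self.streams[stream_id] = buffers
        return stream_id

    def get_chunk(self, stream_id, chunk_index):
        return self.streams[stream_id][chunk_index]

def handle_open_blocks(manager, block_ids, block_data):
    # handleMessage analogue: package every requested block into a buffer
    # list, register it, and answer with a StreamHandle(streamId, numChunks).
    buffers = [block_data[b] for b in block_ids]
    stream_id = manager.register_stream(buffers)
    return {"streamId": stream_id, "numChunks": len(block_ids)}

manager = StreamManager()
data = {"shuffle_0_1_2": b"block-a", "shuffle_0_3_2": b"block-b"}
handle = handle_open_blocks(manager, list(data), data)
print(handle["numChunks"])  # 2
```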
(3) Client

After the client receives the RpcResponse, for each blockId it calls:

TransportClient.fetchChunk

which sends ChunkFetchRequest(StreamChunkId(streamId, chunkIndex)).
(4) Server side

TransportRequestHandler.processFetchRequest
OneForOneStreamManager.getChunk

The server returns respond(new ChunkFetchSuccess(req.streamChunkId, buf)) to the client, where buf is the FileSegmentManagedBuffer of one blockId.
(5) Client

OneForOneBlockFetcher.ChunkCallback.onSuccess
listener.onBlockFetchSuccess(blockIds[chunkIndex], buffer)
ShuffleBlockFetcherIterator.sendRequest.BlockFetchingListener.onBlockFetchSuccess
results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf))

In another client thread:

ShuffleBlockFetcherIterator.next
returns (result.blockId, new BufferReleasingInputStream(buf.createInputStream(), this))
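Steps (3) to (5) can be condensed into a sketch: given the StreamHandle from the RPC response, the client issues one chunk fetch per block, and each success callback enqueues the result for the iterator's next(). This is a simplified synchronous Python model; the real flow is asynchronous over Netty.

```python
import queue

# Simplified, synchronous model of the client chunk-fetch loop.
# server_streams plays the role of the server's registered streams;
# the returned queue plays the role of ShuffleBlockFetcherIterator.results.
def fetch_blocks(stream_id, block_ids, server_streams):
    results = queue.Queue()
    for chunk_index, block_id in enumerate(block_ids):
        # TransportClient.fetchChunk(streamId, chunkIndex) analogue:
        buf = server_streams[stream_id][chunk_index]
        # BlockFetchingListener.onBlockFetchSuccess analogue:
        results.put((block_id, buf))
    return results

server_streams = {7: [b"map-0 output", b"map-1 output"]}
results = fetch_blocks(7, ["shuffle_0_0_5", "shuffle_0_1_5"], server_streams)
block_id, buf = results.get()  # what next() would hand to the reduce task
print(block_id)  # shuffle_0_0_5
```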
Communication principle for downloading files

There is also a stream communication protocol, in which the client first constructs a StreamRequest containing the URL of the file to download.
(1) Client call stack

Executor.updateDependencies ...
org.apache.spark.util.Utils.fetchFile
org.apache.spark.util.Utils.doFetchFile
NettyRpcEnv.openChannel
TransportClient.stream

This sends StreamRequest(streamId), where streamId is the path of the file.
(2) Server-side processing flow

TransportRequestHandler.handle
TransportRequestHandler.processStreamRequest
OneForOneStreamManager.openStream

The server returns new StreamResponse(req.streamId, buf.size(), buf).
(3) Client processing flow

TransportResponseHandler.handle
TransportFrameDecoder.channelRead
TransportFrameDecoder.feedInterceptor
StreamInterceptor.handle

callback.onData is NettyRpcEnv.FileDownloadCallback.onData. Control then returns from client.stream(parsedUri.getPath(), callback) to Utils.doFetchFile, and finally to org.apache.spark.util.Utils.downloadFile.
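The download path can likewise be sketched: the client sends the file path as the streamId, the server answers with the file size and its bytes, and the client's onData callback appends each arriving chunk to a local temporary file. This is a simplified synchronous model of the FileDownloadCallback flow; the helper names are illustrative.

```python
import os
import tempfile

# Server side: processStreamRequest analogue - open the file named by the
# streamId and return (size, a readable stream of its bytes).
def process_stream_request(stream_id_path):
    size = os.path.getsize(stream_id_path)
    return size, open(stream_id_path, "rb")

# Client side: FileDownloadCallback.onData analogue - append each chunk
# of the response body to a local temp file.
def download_file(stream_id_path, chunk_size=8192):
    size, body = process_stream_request(stream_id_path)
    fd, dest = tempfile.mkstemp(prefix="fetched_")
    with os.fdopen(fd, "wb") as out, body:
        while chunk := body.read(chunk_size):
            out.write(chunk)  # onData(streamId, chunk)
    assert os.path.getsize(dest) == size  # StreamResponse carried the size
    return dest

# Demo: "download" a small file we create locally.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"jar bytes " * 1000)
src.close()
dest = download_file(src.name)
print(os.path.getsize(dest))  # 10000
```

Note that the body is streamed chunk by chunk to disk, so memory use stays bounded regardless of file size. This is the property the Shuffle fix below borrows.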
Problem analysis

The current Spark shuffle fetch protocol stores fetched data in off-heap memory, so when the data fetched from one map output is especially large, an off-heap OOM occurs easily; the allocation happens inside Netty's own code, which we cannot modify. On the other hand, Stream is the file-download protocol: it requires the URL of the file, whereas Shuffle only reads a segment of a file and does not know its URL, so the Stream interface cannot be used directly.
Solution

Add a FetchStream communication protocol. In OneForOneBlockFetcher, if a block is smaller than 100 MB (spark.shuffle.max.block.size.inmemory), fetch its data with the original method; if it is larger than 100 MB, use the new FetchStream protocol. On the server side, the difference between handling a FetchStreamRequest and a FetchRequest is that a FetchStreamRequest returns a data stream; the client writes the returned data to a local temporary file, then constructs a FileSegmentManagedBuffer over it and hands that to the subsequent processing flow.
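The dispatch described above can be sketched like this (a Python model of the proposed OneForOneBlockFetcher change, not the actual patch; the 100 MB constant stands for spark.shuffle.max.block.size.inmemory, and the helper names are illustrative):

```python
import os
import tempfile

# Stand-in for spark.shuffle.max.block.size.inmemory
MAX_BLOCK_SIZE_IN_MEMORY = 100 * 1024 * 1024

def fetch_block(block_id, block_size, read_stream):
    """Small blocks: original Fetch path into an in-memory buffer.
    Large blocks: new FetchStream path, spooled to a local temp file and
    wrapped like a FileSegmentManagedBuffer for downstream code.
    block_size comes from shuffle metadata; payloads here are tiny stand-ins."""
    if block_size < MAX_BLOCK_SIZE_IN_MEMORY:
        return ("in-memory-buffer", read_stream())   # original Fetch protocol
    fd, path = tempfile.mkstemp(prefix="fetchstream_")
    with os.fdopen(fd, "wb") as f:
        f.write(read_stream())                       # FetchStream protocol
    return ("file-segment", path)

kind_small, _ = fetch_block("shuffle_0_0_0", 4 * 1024, lambda: b"small")
kind_large, path = fetch_block("shuffle_0_1_0", 200 * 1024 * 1024, lambda: b"big")
print(kind_small, kind_large)  # in-memory-buffer file-segment
```

The key design point: the reduce side never holds more than the threshold in memory for any single block, so off-heap (or heap) allocation is bounded no matter how skewed the map outputs are.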