Spark Shuffle off-heap memory overflow: problem and resolution (Shuffle communication principle)

Problem description
Spark 1.6.0 was released in January. To verify its performance, I ran some large SQL queries against it, and some of them failed during Shuffle with the following stack trace:
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:658)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
    at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
    at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:744)
As the failure message shows, this is an off-heap (direct) memory overflow. Why does it occur?
The Shuffle part of Spark uses the Netty framework for network transmission, and Netty allocates off-heap memory for its buffer caches (PooledByteBufAllocator, AbstractByteBufAllocator). During Shuffle, each reduce task needs to fetch the corresponding output of every map task. When the data a reduce task fetches from one map is relatively large (e.g. 1 GB), Netty allocates 1 GB of off-heap memory for it; since the amount of off-heap memory is limited, an off-heap memory overflow occurs.
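The failure mode can be illustrated with a toy model (a minimal sketch, not Netty's actual allocator): a fixed direct-memory budget, where each in-flight fetch allocates a buffer of the block's full size. Many small fetches succeed, but one very large map output exhausts the budget in a single allocation.

```python
# Toy model of a bounded direct-memory pool (NOT Netty's real allocator).
# Illustrates why fetching one very large map output can exhaust the
# off-heap budget even though many small fetches would succeed.

class DirectMemoryPool:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0

    def allocate(self, nbytes):
        if self.used + nbytes > self.max_bytes:
            # Analogous to java.lang.OutOfMemoryError: Direct buffer memory
            raise MemoryError("Direct buffer memory")
        self.used += nbytes
        return nbytes

    def free(self, nbytes):
        self.used -= nbytes

# e.g. a 512 MB budget, as if set with -XX:MaxDirectMemorySize=512m
pool = DirectMemoryPool(max_bytes=512 * 1024 * 1024)

# Many small shuffle blocks are fine: allocate and release 100 MB at a time.
for _ in range(10):
    b = pool.allocate(100 * 1024 * 1024)
    pool.free(b)

# A single 1 GB map output blows the budget in one allocation.
try:
    pool.allocate(1024 * 1024 * 1024)
    oom = False
except MemoryError:
    oom = True
print(oom)  # True
```

The point of the sketch: total shuffle volume is not the problem; the size of the single largest block fetched at once is.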
Making Shuffle avoid off-heap memory
Adding the configuration -Dio.netty.noUnsafe=true to the executor JVM options keeps Shuffle from using off-heap memory, but the same job still hit an OOM, so this does not solve the problem:
java.lang.OutOfMemoryError: Java heap space
    at io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
    at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
    at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:256)
    at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:136)
    at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:127)
    at io.netty.buffer.CompositeByteBuf.allocBuffer(CompositeByteBuf.java:1347)
    at io.netty.buffer.CompositeByteBuf.consolidateIfNeeded(CompositeByteBuf.java:276)
    at io.netty.buffer.CompositeByteBuf.addComponent(CompositeByteBuf.java:116)
    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
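For reference, the io.netty.noUnsafe flag tried above is typically passed to the executor JVMs via spark-submit; the job class and jar names below are placeholders. As the stack trace shows, the flag only switches Netty to heap ByteBufs, which merely moves the OOM from off-heap to the Java heap.

```shell
# Pass -Dio.netty.noUnsafe=true to every executor JVM so Netty
# allocates heap ByteBufs instead of direct (off-heap) buffers.
# (com.example.MyJob and my-job.jar are placeholders.)
spark-submit \
  --conf "spark.executor.extraJavaOptions=-Dio.netty.noUnsafe=true" \
  --class com.example.MyJob \
  my-job.jar
```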
Writing directly to disk when the data volume is large
In MapReduce, when the amount of shuffle data is large, the shuffle data is written to disk instead of being held in memory.
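The idea can be sketched as a buffer that spills to a temporary file once a size threshold is crossed (a simplified model, not MapReduce's actual shuffle code; the threshold value is illustrative):

```python
import os
import tempfile

# Sketch of "write shuffle data to disk when it is large": keep small
# payloads in memory, spill anything over a threshold to a temp file.
SPILL_THRESHOLD = 64 * 1024 * 1024  # 64 MB, an arbitrary illustrative limit

def store_shuffle_block(data: bytes):
    """Return ("memory", bytes) or ("disk", path) depending on size."""
    if len(data) <= SPILL_THRESHOLD:
        return ("memory", data)
    fd, path = tempfile.mkstemp(prefix="shuffle_")
    with os.fdopen(fd, "wb") as f:
        f.write(data)  # spill: bounded memory use regardless of block size
    return ("disk", path)

kind_small, _ = store_shuffle_block(b"x" * 1024)
print(kind_small)  # memory
```

This is the same principle the solution at the end of this article applies to Spark's shuffle fetch path.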
Spark Shuffle communication mechanism

The flow below illustrates the communication principle of Shuffle. The server side starts the shuffle service (shuffle_service).
(1) Client call stack

BlockStoreShuffleReader.read
ShuffleBlockFetcherIterator.sendRequest
ExternalShuffleClient.fetchBlocks
OneForOneBlockFetcher.start
TransportClient.sendRpc

This sends an RpcRequest(OpenBlocks) message.
(2) Server-side call stack

TransportRequestHandler.processRpcRequest
ExternalShuffleBlockHandler.receive
ExternalShuffleBlockHandler.handleMessage
ExternalShuffleBlockResolver.getBlockData (shuffle_shuffleId_mapId_reduceId)
ExternalShuffleBlockResolver.getSortBasedShuffleBlockData
FileSegmentManagedBuffer
handleMessage packages all the blocks that the requesting appId/executor needs to fetch into a List<ManagedBuffer>, registers the list as a stream, and returns the streamId and the blockIds to the client. The final message returned to the client is RpcResponse(StreamHandle(streamId, msg.blockIds.length)).
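The server-side registration can be modeled as follows (a Python sketch of the OneForOneStreamManager contract, not Spark's actual code): each OpenBlocks request registers its list of block buffers under a fresh streamId, chunks are later addressed by (streamId, chunkIndex), and the reply carries the streamId plus the block count.

```python
import itertools

# Minimal model of OneForOneStreamManager: one stream per OpenBlocks
# request, chunks addressed by (streamId, chunkIndex).
class StreamManager:
    _next_id = itertools.count(1)  # fresh streamId per registration

    def __init__(self):
        self.streams = {}  # streamId -> list of block buffers

    def register_stream(self, buffers):
        stream_id = next(self._next_id)
        self.streams[stream_id] = buffers
        return stream_id

    def get_chunk(self, stream_id, chunk_index):
        return self.streams[stream_id][chunk_index]

def handle_open_blocks(manager, block_ids, block_data):
    # handleMessage analogue: package every requested block into a buffer
    # list, register it, and answer with a StreamHandle(streamId, numChunks).
    buffers = [block_data[b] for b in block_ids]
    stream_id = manager.register_stream(buffers)
    return {"streamId": stream_id, "numChunks": len(block_ids)}

manager = StreamManager()
data = {"shuffle_0_1_2": b"block-a", "shuffle_0_3_2": b"block-b"}
handle = handle_open_blocks(manager, list(data), data)
print(handle["numChunks"])  # 2
```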
(3) Client

After the client receives the RpcResponse, for each blockId it calls:

TransportClient.fetchChunk

which sends ChunkFetchRequest(StreamChunkId(streamId, chunkIndex)).
(4) Server side

TransportRequestHandler.processFetchRequest
OneForOneStreamManager.getChunk

The server returns respond(new ChunkFetchSuccess(req.streamChunkId, buf)) to the client, where buf is the FileSegmentManagedBuffer of one blockId.
(5) Client

OneForOneBlockFetcher.ChunkCallback.onSuccess
listener.onBlockFetchSuccess(blockIds[chunkIndex], buffer)
ShuffleBlockFetcherIterator.sendRequest.BlockFetchingListener.onBlockFetchSuccess
results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf))

In another client thread:

ShuffleBlockFetcherIterator.next
returns (result.blockId, new BufferReleasingInputStream(buf.createInputStream(), this))
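Steps (3) to (5) can be condensed into a sketch: given the StreamHandle from the RPC response, the client issues one chunk fetch per block, and each success callback enqueues the result for the iterator's next(). This is a simplified synchronous Python model; the real flow is asynchronous over Netty.

```python
import queue

# Simplified, synchronous model of the client chunk-fetch loop.
# server_streams plays the role of the server's registered streams;
# the returned queue plays the role of ShuffleBlockFetcherIterator.results.
def fetch_blocks(stream_id, block_ids, server_streams):
    results = queue.Queue()
    for chunk_index, block_id in enumerate(block_ids):
        # TransportClient.fetchChunk(streamId, chunkIndex) analogue:
        buf = server_streams[stream_id][chunk_index]
        # BlockFetchingListener.onBlockFetchSuccess analogue:
        results.put((block_id, buf))
    return results

server_streams = {7: [b"map-0 output", b"map-1 output"]}
results = fetch_blocks(7, ["shuffle_0_0_5", "shuffle_0_1_5"], server_streams)
block_id, buf = results.get()  # what next() would hand to the reduce task
print(block_id)  # shuffle_0_0_5
```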
Communication principle for downloading files

There is also a stream communication protocol, in which the client first constructs a StreamRequest containing the URL of the file to download.
(1) Client call stack

Executor.updateDependencies ...
org.apache.spark.util.Utils.fetchFile
org.apache.spark.util.Utils.doFetchFile
NettyRpcEnv.openChannel
TransportClient.stream

This sends StreamRequest(streamId), where streamId is the path of the file.
(2) Server-side processing flow

TransportRequestHandler.handle
TransportRequestHandler.processStreamRequest
OneForOneStreamManager.openStream

The server returns new StreamResponse(req.streamId, buf.size(), buf).
(3) Client processing flow

TransportResponseHandler.handle
TransportFrameDecoder.channelRead
TransportFrameDecoder.feedInterceptor
StreamInterceptor.handle

callback.onData is NettyRpcEnv.FileDownloadCallback.onData. Control then returns from client.stream(parsedUri.getPath(), callback) to Utils.doFetchFile, and finally to org.apache.spark.util.Utils.downloadFile.
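The download path can likewise be sketched: the client sends the file path as the streamId, the server answers with the file size and its bytes, and the client's onData callback appends each arriving chunk to a local temporary file. This is a simplified synchronous model of the FileDownloadCallback flow; the helper names are illustrative.

```python
import os
import tempfile

# Server side: processStreamRequest analogue - open the file named by the
# streamId and return (size, a readable stream of its bytes).
def process_stream_request(stream_id_path):
    size = os.path.getsize(stream_id_path)
    return size, open(stream_id_path, "rb")

# Client side: FileDownloadCallback.onData analogue - append each chunk
# of the response body to a local temp file.
def download_file(stream_id_path, chunk_size=8192):
    size, body = process_stream_request(stream_id_path)
    fd, dest = tempfile.mkstemp(prefix="fetched_")
    with os.fdopen(fd, "wb") as out, body:
        while chunk := body.read(chunk_size):
            out.write(chunk)  # onData(streamId, chunk)
    assert os.path.getsize(dest) == size  # StreamResponse carried the size
    return dest

# Demo: "download" a small file we create locally.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"jar bytes " * 1000)
src.close()
dest = download_file(src.name)
print(os.path.getsize(dest))  # 10000
```

Note that the body is streamed chunk by chunk to disk, so memory use stays bounded regardless of file size. This is the property the Shuffle fix below borrows.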
Problem analysis

The current Spark shuffle fetch protocol stores fetched data in off-heap memory, so when the data fetched from one map output is especially large, an off-heap OOM occurs easily; the allocation happens inside Netty's own code, which we cannot modify. On the other hand, Stream is the file-download protocol: it requires the URL of the file, whereas Shuffle only reads a segment of a file and does not know its URL, so the Stream interface cannot be used directly.
Solution

Add a FetchStream communication protocol. In OneForOneBlockFetcher, if a block is smaller than 100 MB (spark.shuffle.max.block.size.inmemory), fetch its data with the original method; if it is larger than 100 MB, use the new FetchStream protocol. On the server side, the difference between handling a FetchStreamRequest and a FetchRequest is that a FetchStreamRequest returns a data stream; the client writes the returned data to a local temporary file, then constructs a FileSegmentManagedBuffer over it and hands that to the subsequent processing flow.
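The dispatch described above can be sketched like this (a Python model of the proposed OneForOneBlockFetcher change, not the actual patch; the 100 MB constant stands for spark.shuffle.max.block.size.inmemory, and the helper names are illustrative):

```python
import os
import tempfile

# Stand-in for spark.shuffle.max.block.size.inmemory
MAX_BLOCK_SIZE_IN_MEMORY = 100 * 1024 * 1024

def fetch_block(block_id, block_size, read_stream):
    """Small blocks: original Fetch path into an in-memory buffer.
    Large blocks: new FetchStream path, spooled to a local temp file and
    wrapped like a FileSegmentManagedBuffer for downstream code.
    block_size comes from shuffle metadata; payloads here are tiny stand-ins."""
    if block_size < MAX_BLOCK_SIZE_IN_MEMORY:
        return ("in-memory-buffer", read_stream())   # original Fetch protocol
    fd, path = tempfile.mkstemp(prefix="fetchstream_")
    with os.fdopen(fd, "wb") as f:
        f.write(read_stream())                       # FetchStream protocol
    return ("file-segment", path)

kind_small, _ = fetch_block("shuffle_0_0_0", 4 * 1024, lambda: b"small")
kind_large, path = fetch_block("shuffle_0_1_0", 200 * 1024 * 1024, lambda: b"big")
print(kind_small, kind_large)  # in-memory-buffer file-segment
```

The key design point: the reduce side never holds more than the threshold in memory for any single block, so off-heap (or heap) allocation is bounded no matter how skewed the map outputs are.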