Spark Shuffle out-of-heap memory overflow problem and resolution (Shuffle communication principle)

Source: Internet
Author: User
Tags: shuffle

Problem description

Spark 1.6.0 was released in January. To verify its performance, I ran some large SQL queries against it. Some of these queries failed during Shuffle, with the following stack trace:

16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection from /10.196.134.220:7337

java.lang.OutOfMemoryError: Direct buffer memory

    at java.nio.Bits.reserveMemory(Bits.java:658)

    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)

    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)

    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)

    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)

    at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)

    at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)

    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)

    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)

    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)

    at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)

    at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)

    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)

    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)

    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)

    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)

    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)

    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)

    at java.lang.Thread.run(Thread.java:744)

As the failure message shows, this is an off-heap (direct) memory overflow. Why does it happen?

Spark's Shuffle uses the Netty framework for network transport, and Netty buffers data in off-heap memory that it requests itself (PooledByteBufAllocator, AbstractByteBufAllocator). During Shuffle, each Reduce task needs to fetch the corresponding output of every Map task. When the data a Reduce task fetches from a single Map task is relatively large (e.g. 1 GB), Netty requests 1 GB of off-heap memory for it. Because the amount of off-heap memory is limited, this causes an off-heap memory overflow.
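Netty's pooled allocator is ultimately backed by plain NIO direct buffers, so the failure mode can be seen without Spark at all. A minimal sketch (class name is illustrative) of why these allocations fail independently of the heap:

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Direct buffers live outside the Java heap. Their total size is
        // capped by -XX:MaxDirectMemorySize (by default roughly the max
        // heap size), so a very large allocation can fail even when the
        // heap itself has plenty of free room.
        ByteBuffer direct = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB off-heap
        ByteBuffer heap   = ByteBuffer.allocate(16 * 1024 * 1024);      // 16 MB on-heap

        System.out.println(direct.isDirect()); // prints: true
        System.out.println(heap.isDirect());   // prints: false

        // When the cap is exceeded, allocateDirect throws exactly the error
        // in the stack trace above: java.lang.OutOfMemoryError ("Direct
        // buffer memory"), raised from java.nio.Bits.reserveMemory.
    }
}
```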

Make Shuffle not use off-heap memory

Adding the configuration -Dio.netty.noUnsafe=true to the Executor's JVM options makes Shuffle stop using off-heap memory. However, the same job still hit an OOM (this time on the heap), so this approach does not solve the problem:

java.lang.OutOfMemoryError: Java heap space

    at io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)

    at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)

    at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)

    at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)

    at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:256)

    at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:136)

    at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:127)

    at io.netty.buffer.CompositeByteBuf.allocBuffer(CompositeByteBuf.java:1347)

    at io.netty.buffer.CompositeByteBuf.consolidateIfNeeded(CompositeByteBuf.java:276)

    at io.netty.buffer.CompositeByteBuf.addComponent(CompositeByteBuf.java:116)

    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)

    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)

    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)

    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)

    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)

    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)

    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)

    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)

    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)

Write to disk directly when the data volume is large

In MapReduce, when the amount of Shuffle data is large, the fetched Shuffle data is written to disk rather than held in memory.
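The same idea can be sketched with plain Java I/O (the names here are illustrative, not Spark code): copy the incoming stream to a local temporary file, so that memory use stays constant no matter how large the block is:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpillToDisk {
    /**
     * Copies an incoming block to a local temporary file. Files.copy uses a
     * small fixed-size internal buffer, so memory use is independent of the
     * block size. Returns the path of the spill file.
     */
    public static Path spill(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("shuffle-block-", ".data");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```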

Spark Shuffle Communication mechanism

The following call stacks show the communication flow of Shuffle.

The server side runs the shuffle service (shuffle_service).

(1) Client call stack

BlockStoreShuffleReader.read

ShuffleBlockFetcherIterator.sendRequest

ExternalShuffleClient.fetchBlocks

OneForOneBlockFetcher.start

TransportClient.sendRpc

This sends an RpcRequest(OpenBlocks) message.

(2) Server-side call stack

TransportRequestHandler.processRpcRequest

ExternalShuffleBlockHandler.receive

ExternalShuffleBlockHandler.handleMessage

ExternalShuffleBlockResolver.getBlockData (shuffle_shuffleId_mapId_reduceId)

ExternalShuffleBlockResolver.getSortBasedShuffleBlockData

FileSegmentManagedBuffer

handleMessage packages all the blocks that the given appId's Executor needs to fetch into a List&lt;ManagedBuffer&gt;, registers it as a stream, and returns the streamId and the block count to the client. The final message returned to the client is RpcResponse(StreamHandle(streamId, msg.blockIds.length)).

(3) Client

After the client receives the RpcResponse, for each blockId it calls:

TransportClient.fetchChunk

This sends a ChunkFetchRequest(StreamChunkId(streamId, chunkIndex)).

(4) Server side

TransportRequestHandler.processFetchRequest

OneForOneStreamManager.getChunk

This returns respond(new ChunkFetchSuccess(req.streamChunkId, buf)) to the client, where buf is the FileSegmentManagedBuffer of one blockId.

(5) Client

OneForOneBlockFetcher.ChunkCallback.onSuccess

listener.onBlockFetchSuccess(blockIds[chunkIndex], buffer)

ShuffleBlockFetcherIterator.sendRequest.BlockFetchingListener.onBlockFetchSuccess

results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf))

In another client thread:

ShuffleBlockFetcherIterator.next

returns (result.blockId, new BufferReleasingInputStream(buf.createInputStream(), this))
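To make the two-phase exchange concrete, here is a self-contained sketch of the message flow. The classes below are simplified stand-ins, not Spark's real network classes: the client first asks the server to open a set of blocks, receives a stream handle, then fetches each block as a separate chunk:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the shuffle fetch protocol described above.
public class ShuffleFetchSketch {

    // Equivalent of the StreamHandle(streamId, numChunks) in the RpcResponse.
    static class StreamHandle {
        final long streamId;
        final int numChunks;
        StreamHandle(long streamId, int numChunks) {
            this.streamId = streamId;
            this.numChunks = numChunks;
        }
    }

    static class Server {
        private final Map<Long, List<byte[]>> streams = new HashMap<>();
        private long nextStreamId = 0;

        // OpenBlocks: package the requested blocks into one registered stream.
        StreamHandle openBlocks(List<byte[]> blocks) {
            long id = nextStreamId++;
            streams.put(id, blocks);
            return new StreamHandle(id, blocks.size());
        }

        // ChunkFetchRequest(StreamChunkId(streamId, chunkIndex)): one block back.
        byte[] fetchChunk(long streamId, int chunkIndex) {
            return streams.get(streamId).get(chunkIndex);
        }
    }

    // Client side: one OpenBlocks RPC, then one chunk fetch per block.
    static List<byte[]> fetchAll(Server server, List<byte[]> blocks) {
        StreamHandle handle = server.openBlocks(blocks);
        List<byte[]> results = new ArrayList<>();
        for (int i = 0; i < handle.numChunks; i++) {
            results.add(server.fetchChunk(handle.streamId, i));
        }
        return results;
    }
}
```

Note that each chunk still arrives as one whole buffer, which is exactly why a very large block translates into one very large memory allocation.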

Communication principle for downloading files

There is also a stream communication protocol, in which the client first constructs a StreamRequest; the StreamRequest contains the URL of the file to download.

(1) Client call stack

Executor.updateDependencies ...

org.apache.spark.util.Utils.fetchFile

org.apache.spark.util.Utils.doFetchFile

NettyRpcEnv.openChannel

TransportClient.stream

This sends StreamRequest(streamId), where the streamId is the path of the file.

(2) Server-side processing

TransportRequestHandler.handle

TransportRequestHandler.processStreamRequest

OneForOneStreamManager.openStream

This returns new StreamResponse(req.streamId, buf.size(), buf).

(3) Client-side processing

TransportResponseHandler.handle

TransportFrameDecoder.channelRead

TransportFrameDecoder.feedInterceptor

StreamInterceptor.handle

callback.onData, which is NettyRpcEnv.FileDownloadCallback.onData

Control then returns from client.stream(parsedUri.getPath(), callback) to Utils.doFetchFile, and finally to org.apache.spark.util.Utils.downloadFile.

Problem Analysis:

The current Spark Shuffle fetch protocol stores fetched data in off-heap memory, so when the data from a single Map is particularly large, it easily causes an off-heap OOM. The allocation happens inside Netty's own code, which we cannot modify.

On the other hand, Stream is the protocol for downloading files and requires the URL of the file, while Shuffle only fetches a segment of a file and does not know its URL, so the Stream interface cannot be used directly.

Solution:

Add a FetchStream communication protocol. In OneForOneBlockFetcher, if a block is smaller than 100 MB (spark.shuffle.max.block.size.inmemory), fetch the data the original way; if it is larger than 100 MB, use the new FetchStream protocol. On the server side, the difference between handling a FetchStreamRequest and a FetchRequest is that a FetchStreamRequest returns a data stream. The client writes the returned stream to a local temporary file, then constructs a FileSegmentManagedBuffer from it and hands it to the subsequent processing flow.
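A sketch of the proposed dispatch logic. Note that FetchStream, the threshold key, and the helper names below are the author's proposal and stand-ins for illustration, not stock Spark APIs:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Stand-in for the proposed FetchStream dispatch: small blocks are fetched
// into memory as before; large blocks are streamed to a local temporary
// file, which would then back a FileSegmentManagedBuffer.
public class BlockFetchDispatch {

    // Proposed spark.shuffle.max.block.size.inmemory threshold (100 MB).
    static final long MAX_IN_MEMORY = 100L * 1024 * 1024;

    /** Result of a fetch: either an in-memory buffer or a spill file. */
    static class FetchedBlock {
        final byte[] inMemory;  // non-null for small blocks
        final Path spillFile;   // non-null for large blocks
        FetchedBlock(byte[] inMemory, Path spillFile) {
            this.inMemory = inMemory;
            this.spillFile = spillFile;
        }
    }

    static FetchedBlock fetch(InputStream in, long blockSize) throws IOException {
        if (blockSize < MAX_IN_MEMORY) {
            // Original path: buffer the whole block in memory.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            in.transferTo(out);
            return new FetchedBlock(out.toByteArray(), null);
        } else {
            // FetchStream path: write the stream to a local temp file.
            Path tmp = Files.createTempFile("fetch-stream-", ".data");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return new FetchedBlock(null, tmp);
        }
    }
}
```

The key property is that the large-block path never materializes the whole block in memory; the temp file plays the role the off-heap buffer played before.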
