Apache Spark Source Code Reading 6: Storage Subsystem Analysis

Last Update:2014-07-07 Source: Internet

Author: User


Introduction

One of the reasons Spark is much faster than Hadoop is that intermediate results are cached in memory rather than written straight to disk. This article analyzes the composition of the storage subsystem in Spark and, taking data writing and data reading as examples, describes how the components of the storage subsystem interact.

Storage subsystem Overview

The main modules in the Spark storage subsystem, and the relationships between them, are as follows.

CacheManager: an RDD obtains data through CacheManager and stores its computed results through CacheManager.
BlockManager: CacheManager depends on the BlockManager interface to read and write data. BlockManager decides whether data is obtained from MemoryStore or from DiskStore.
MemoryStore: stores data in memory and reads data from memory.
DiskStore: writes data to disk and reads data from disk.
BlockManagerWorker: writing data to the local MemoryStore or DiskStore is a synchronous operation. For fault tolerance, the data also needs to be copied to other computing nodes so that it can be recovered if it is lost. This replication is performed asynchronously, and BlockManagerWorker handles it.
ConnectionManager: establishes connections with other computing nodes and sends and receives data.
BlockManagerMaster: note that this module runs only on the node where the driver application is located. Its job is to record which slave worker stores each block ID. For example, a task on an RDD runs on machine A and needs block ID 3, but machine A holds no data for block ID 3. In that case the slave worker uses BlockManager to ask BlockManagerMaster where the data is stored, then uses ConnectionManager to fetch it. For details, see "Remote reading".
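The bookkeeping role described for BlockManagerMaster can be sketched as a map from block ID to the workers that hold a copy. This is only a minimal illustration under simplifying assumptions: `BlockLocationSketch`, `register`, and `lookup` are hypothetical names, block IDs are plain integers, and no actors or networking are involved.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the driver-side registry; not Spark's actual class.
object BlockLocationSketch {
  // blockId -> names of the workers that hold a copy of the block
  private val locations = mutable.Map.empty[Int, mutable.Set[String]]

  // Called by a worker after it has stored a block locally.
  def register(blockId: Int, worker: String): Unit = {
    locations.getOrElseUpdate(blockId, mutable.Set.empty[String]) += worker
  }

  // Called by a worker that needs a block it does not hold itself.
  def lookup(blockId: Int): Set[String] =
    locations.get(blockId).map(_.toSet).getOrElse(Set.empty[String])

  def main(args: Array[String]): Unit = {
    register(3, "machineB")
    // Machine A does not hold block 3, so it asks the registry where to fetch it.
    println(lookup(3))
    // A block nobody registered yields an empty set of locations.
    println(lookup(4))
  }
}
```

Once the caller knows the locations, fetching the bytes is a separate step, which in the article is the job of ConnectionManager.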

Supported operations

Because BlockManager performs the actual storage control, the supported operations are described here using the public API of BlockManager as an example.

put: data writing.
get: data reading.
removeRdd: data deletion. Once the entire job is completed, all intermediate computing results can be deleted.
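To make the three operations concrete, here is a minimal sketch of a store that supports writing, reading, and deleting every block that belongs to one RDD. The names `RemoveRddSketch` and `RddBlockId` are hypothetical and chosen for illustration; the sketch keeps everything in one in-memory map and does not reflect how BlockManager is actually implemented.

```scala
import scala.collection.mutable

object RemoveRddSketch {
  // Hypothetical block identifier: one block per (RDD, partition) pair.
  final case class RddBlockId(rddId: Int, partition: Int)

  private val blocks = mutable.Map.empty[RddBlockId, Seq[Any]]

  // Data writing.
  def put(id: RddBlockId, values: Seq[Any]): Unit = {
    blocks.put(id, values)
  }

  // Data reading.
  def get(id: RddBlockId): Option[Seq[Any]] = blocks.get(id)

  // Data deletion: remove every block of one RDD and report how many were removed.
  def removeRdd(rddId: Int): Int = {
    val doomed = blocks.keys.filter(_.rddId == rddId).toList
    doomed.foreach(id => blocks.remove(id))
    doomed.size
  }
}
```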

Startup Process Analysis

The modules above are created by SparkEnv. The creation is done in SparkEnv.create.

    val blockManagerMaster = new BlockManagerMaster(registerOrLookup(
      "BlockManagerMaster",
      new BlockManagerMasterActor(isLocal, conf)), conf)
    val blockManager = new BlockManager(executorId, actorSystem, blockManagerMaster, serializer, conf)
    val connectionManager = blockManager.connectionManager
    val broadcastManager = new BroadcastManager(isDriver, conf)
    val cacheManager = new CacheManager(blockManager)

This code is easy to misread. It looks as if a BlockManagerMasterActor is created on every cluster node, but that is not the case. Look carefully at the implementation of registerOrLookup: if the current node is the driver, the actor is created; otherwise, a connection to the driver is established.

    def registerOrLookup(name: String, newActor: => Actor): ActorRef = {
      if (isDriver) {
        logInfo("Registering " + name)
        actorSystem.actorOf(Props(newActor), name = name)
      } else {
        val driverHost: String = conf.get("spark.driver.host", "localhost")
        val driverPort: Int = conf.getInt("spark.driver.port", 7077)
        Utils.checkHost(driverHost, "Expected hostname")
        val url = s"akka.tcp://spark@$driverHost:$driverPort/user/$name"
        val timeout = AkkaUtils.lookupTimeout(conf)
        logInfo(s"Connecting to $name: $url")
        Await.result(actorSystem.actorSelection(url).resolveOne(timeout), timeout)
      }
    }
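The reason the actor is not created on every node is the by-name parameter `newActor: => Actor`: the argument expression is evaluated only when the function body uses it. The following sketch isolates that behaviour. It assumes nothing about Akka; `ByNameSketch`, `newMaster`, and the returned strings are hypothetical placeholders for the actor and the remote reference.

```scala
object ByNameSketch {
  // Counts how many times the "expensive" object is constructed.
  var created = 0

  // Hypothetical stand-in for `new BlockManagerMasterActor(...)`.
  def newMaster(): String = {
    created += 1
    "master-actor"
  }

  // `newActor: => String` is a by-name parameter: the argument expression
  // is evaluated only if and when the body refers to it.
  def registerOrLookup(isDriver: Boolean, newActor: => String): String =
    if (isDriver) newActor      // driver: the argument is evaluated, creating the object
    else "remote-ref-to-driver" // other nodes: the argument is never evaluated

  def main(args: Array[String]): Unit = {
    println(registerOrLookup(false, newMaster()))
    println(s"created after non-driver call: $created")
    println(registerOrLookup(true, newMaster()))
    println(s"created after driver call: $created")
  }
}
```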

One of the main actions during initialization is that BlockManager registers itself with BlockManagerMaster.

Data Writing Process Analysis

The data writing process in brief

RDD.iterator is the entry point for interaction with the storage subsystem.
CacheManager.getOrCompute calls the put interface of BlockManager to write data.
Data is first written to MemoryStore, that is, to memory. If MemoryStore is full, data that is not frequently used is written to disk.
BlockManagerMaster is notified that new data has been written, and the metadata is saved in BlockManagerMaster.
The written data is synchronized to other slave workers. Generally, data written on the local machine is backed up on one other machine, that is, replicanumber = 1.
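The "memory first, spill to disk when full" step above can be sketched as a two-tier store. This is a simplified illustration, not Spark's MemoryStore or DiskStore: `TwoTierStore` is a hypothetical class, capacity is counted in number of blocks rather than bytes, the "disk" is just a second map, and "not frequently used" is approximated by least-recently-used order.

```scala
import scala.collection.mutable

// Hypothetical sketch of a memory tier that spills to a disk tier.
class TwoTierStore(memoryCapacity: Int) {
  // LinkedHashMap keeps insertion order; entries are re-inserted on access,
  // so the head is always the least recently used block.
  private val memory = mutable.LinkedHashMap.empty[String, Array[Byte]]
  // Stands in for files on disk.
  private val disk = mutable.Map.empty[String, Array[Byte]]

  def put(blockId: String, data: Array[Byte]): Unit = {
    memory.remove(blockId)
    disk.remove(blockId)
    memory.put(blockId, data)
    // Spill least recently used blocks until memory is within capacity.
    while (memory.size > memoryCapacity) {
      val (oldId, oldData) = memory.head
      memory.remove(oldId)
      disk.put(oldId, oldData)
    }
  }

  def get(blockId: String): Option[Array[Byte]] =
    memory.remove(blockId) match {
      case Some(data) =>
        memory.put(blockId, data) // refresh recency
        Some(data)
      case None =>
        disk.get(blockId)
    }

  def inMemory(blockId: String): Boolean = memory.contains(blockId)
  def onDisk(blockId: String): Boolean = disk.contains(blockId)
}
```

With a capacity of two blocks, writing a third block moves whichever of the first two was touched least recently to the disk tier, and a later `get` still finds it there.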

Serialization or not

The content that is written can be either serialized bytes or non-serialized values. This is a good place to get familiar with Either, Left, and Right in Scala.
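As a small illustration of how Either can carry "values or bytes" in a single type, consider the sketch below. The alias `BlockContent` and the choice of which side holds values and which holds bytes are assumptions made for this example, not a statement about Spark's own signatures.

```scala
import java.nio.ByteBuffer

object EitherSketch {
  // Hypothetical alias: a block is either deserialized values (Left)
  // or serialized bytes (Right).
  type BlockContent = Either[Iterator[Any], ByteBuffer]

  // Pattern matching on Left and Right recovers which form was stored.
  def describe(content: BlockContent): String = content match {
    case Left(values) => s"values: ${values.size} elements"
    case Right(bytes) => s"bytes: ${bytes.remaining()} bytes"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Left(Iterator(1, 2, 3))))
    println(describe(Right(ByteBuffer.allocate(8))))
  }
}
```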

Data read Process Analysis

    def get(blockId: BlockId): Option[Iterator[Any]] = {
      val local = getLocal(blockId)
      if (local.isDefined) {
        logInfo("Found block %s locally".format(blockId))
        return local
      }
      val remote = getRemote(blockId)
      if (remote.isDefined) {
        logInfo("Found block %s remotely".format(blockId))
        return remote
      }
      None
    }

Local read

First, check whether the required block exists in the MemoryStore or DiskStore of the local machine. If it does not, a remote read is initiated.
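The same "local first, then remote" order can also be written with Option.orElse, which takes its argument by name and so performs the remote lookup only on a local miss. The sketch below is a standalone illustration with hypothetical data: `LookupSketch`, the block names, and the two maps standing in for the local stores and a remote node are all assumptions.

```scala
object LookupSketch {
  // Hypothetical stand-ins for the local stores and for a remote node.
  val localBlocks: Map[String, Seq[Int]] = Map("rdd_0_0" -> Seq(1, 2, 3))
  val remoteBlocks: Map[String, Seq[Int]] = Map("rdd_0_1" -> Seq(4, 5, 6))

  // Counts remote lookups so the laziness of orElse can be observed.
  var remoteCalls = 0

  def getLocal(id: String): Option[Seq[Int]] = localBlocks.get(id)

  def getRemote(id: String): Option[Seq[Int]] = {
    remoteCalls += 1
    remoteBlocks.get(id)
  }

  // getRemote runs only when getLocal returns None.
  def get(id: String): Option[Seq[Int]] = getLocal(id).orElse(getRemote(id))
}
```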

Remote reading

The call path for a remote read is getRemote -> doGetRemote. The most important step in doGetRemote is the call to BlockManagerWorker.syncGetBlock, which fetches the data from the remote node.

    def syncGetBlock(msg: GetBlock, toConnManagerId: ConnectionManagerId): ByteBuffer = {
      val blockManager = blockManagerWorker.blockManager
      val connectionManager = blockManager.connectionManager
      val blockMessage = BlockMessage.fromGetBlock(msg)
      val blockMessageArray = new BlockMessageArray(blockMessage)
      val responseMessage = connectionManager.sendMessageReliablySync(
          toConnManagerId, blockMessageArray.toBufferMessage)
      responseMessage match {
        case Some(message) => {
          val bufferMessage = message.asInstanceOf[BufferMessage]
          logDebug("Response message received " + bufferMessage)
          BlockMessageArray.fromBufferMessage(bufferMessage).foreach(blockMessage => {
              logDebug("Found " + blockMessage)
              return blockMessage.getData
            })
        }
        case None => logDebug("No response message received")
      }
      null
    }

The most interesting part of the code above is sendMessageReliablySync. Reading remote data is undoubtedly an asynchronous I/O operation, so how can the code here be written as if it were a synchronous call? In other words, how does the caller know when the response from the receiver has arrived?

Don't worry. Let's continue with the definition of sendMessageReliably, the asynchronous method behind sendMessageReliablySync.

    def sendMessageReliably(connectionManagerId: ConnectionManagerId, message: Message)
        : Future[Option[Message]] = {
      val promise = Promise[Option[Message]]
      val status = new MessageStatus(
        message, connectionManagerId, s => promise.success(s.ackMessage))
      messageStatuses.synchronized {
        messageStatuses += ((message.id, status))
      }
      sendMessage(connectionManagerId, message)
      promise.future
    }

If I say the secret is right here, you may think I am talking nonsense, but it really is. Notice the Promise and the Future in this code.

When the future completes, s.ackMessage is returned. Let's see where ackMessage is written. Take a look at this code snippet from ConnectionManager.handleMessage:

    case bufferMessage: BufferMessage => {
      if (authEnabled) {
        val res = handleAuthentication(connection, bufferMessage)
        if (res == true) {
          // message was security negotiation so skip the rest
          logDebug("After handleAuth result was true, returning")
          return
        }
      }
      if (bufferMessage.hasAckId) {
        val sentMessageStatus = messageStatuses.synchronized {
          messageStatuses.get(bufferMessage.ackId) match {
            case Some(status) => {
              messageStatuses -= bufferMessage.ackId
              status
            }
            case None => {
              throw new Exception("Could not find reference for received ack message " +
                message.id)
              null
            }
          }
        }
        sentMessageStatus.synchronized {
          sentMessageStatus.ackMessage = Some(message)
          sentMessageStatus.attempted = true
          sentMessageStatus.acked = true
          sentMessageStatus.markDone()
        }
        // ... the excerpt ends here; the rest of handleMessage is omitted

Note that sentMessageStatus.markDone invokes the promise.success callback defined in sendMessageReliably. Take a look at the definition of MessageStatus.

    class MessageStatus(
        val message: Message,
        val connectionManagerId: ConnectionManagerId,
        completionHandler: MessageStatus => Unit) {
      var ackMessage: Option[Message] = None
      var attempted = false
      var acked = false
      def markDone() { completionHandler(this) }
    }

By now the call relationship should be clear. Future and Promise in Scala can still be somewhat difficult to understand.
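The pattern walked through above (record a Promise when sending, complete it when the ack arrives, block on its Future to look synchronous) can be reproduced in a few lines with the standard library alone. The following is a minimal sketch under stated assumptions: `AckSketch` and its methods are hypothetical names, messages are plain strings, and a background thread with a short sleep simulates the network delivering the ack.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object AckSketch {
  // Pending requests: message id -> promise completed when the ack arrives.
  private val pending = mutable.Map.empty[Int, Promise[Option[String]]]

  // Asynchronous send: record a promise, "send" the message, return the future.
  def sendReliably(id: Int, payload: String): Future[Option[String]] = {
    val promise = Promise[Option[String]]()
    pending.synchronized { pending += (id -> promise) }
    // Simulated network: another thread delivers the ack a little later.
    new Thread(new Runnable {
      def run(): Unit = {
        Thread.sleep(50)
        handleAck(id, s"ack for $payload")
      }
    }).start()
    promise.future
  }

  // Receive path: find the promise by id and complete it,
  // which plays the role of markDone() in the article.
  def handleAck(id: Int, ack: String): Unit = {
    val promise = pending.synchronized { pending.remove(id) }
    promise.foreach(_.success(Some(ack)))
  }

  // Synchronous wrapper: block the caller until the future completes.
  def sendReliablySync(id: Int, payload: String): Option[String] =
    Await.result(sendReliably(id, payload), 5.seconds)

  def main(args: Array[String]): Unit = {
    println(sendReliablySync(1, "getBlock"))
  }
}
```

The caller of `sendReliablySync` never sees the receiving thread. It simply waits on the future, and the future becomes ready at the moment the receive path calls `success` on the matching promise.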

TachyonStore

In the latest Spark source code, the storage subsystem introduces TachyonStore. TachyonStore implements the HDFS file system interface in memory. Its main purpose is to use memory as the data persistence layer as much as possible and so avoid excessive disk reads and writes.

For more information about this module, see http://www.meetup.com/spark-users/events/117307472/

Summary

One small concern: in the Spark storage subsystem, the communication module carries heartbeat detection messages, data synchronization messages, and data retrieval traffic together. If possible, the NIC used for heartbeat detection should be separated from the NIC used for data synchronization and data retrieval, to improve reliability.

References

Spark Source Code Analysis: storage module, http://jerryshao.me/architecture/2013/10/08/spark-storage-module-analysis/
Tachyon slides, http://www.slideshare.net/rxin/a-tachyon-2013-0509sparkmeetup?qid=39ee582d-e0bf-41d2-ab01-dc2439abc626&v=default&b=&from_search=2

