Apache Spark Source Code Reading 6: Storage Subsystem Analysis

Preface

One of the reasons Spark is much faster than Hadoop is that intermediate results are cached in memory rather than written directly to disk. This article analyzes how the storage subsystem in Spark is put together and, taking data writing and data reading as examples, describes how the components of the storage subsystem interact.

Storage subsystem Overview

The relationships between the main modules of the Spark storage subsystem are as follows.

  • CacheManager: RDDs obtain data through CacheManager and store their computation results through it.
  • BlockManager: CacheManager relies on the BlockManager interface for reading and writing data; BlockManager decides whether data is served from MemoryStore or DiskStore.
  • MemoryStore: stores data in, and reads data from, memory.
  • DiskStore: writes data to, and reads data from, disk.
  • BlockManagerWorker: writing data to the local MemoryStore or DiskStore is a synchronous operation. For fault tolerance, the data also needs to be replicated to another compute node so that lost data can be recovered; that replication is asynchronous, and BlockManagerWorker handles it.
  • ConnectionManager: establishes connections to other compute nodes and sends and receives data.
  • BlockManagerMaster: note that this module runs only on the node where the driver application is located. Its job is to record which slave worker holds each BlockId. For example, an RDD task running on machine A needs the block with Id 3, but machine A has no such block; the slave worker then asks BlockManagerMaster, through BlockManager, where the data is stored, and fetches it through ConnectionManager. For details, see the section on remote reading below.
Supported operations

Since BlockManager performs the actual storage control, the public API of BlockManager is used as the example when discussing the supported operations (the three operations are sketched together right after the list below).

  • put: data writing
  • get: data reading
  • removeRdd: data deletion. Once the whole job is complete, all intermediate computation results can be deleted.
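
As a rough usage sketch tying the three operations together. This is an illustration only: it assumes `blockManager`, `blockManagerMaster`, `computedValues`, and an RDD id are already in scope, and the parameter lists of put and removeRdd are simplified assumptions rather than the exact signatures of a particular Spark version (get matches the signature shown later in this article).

    // Illustrative sketch only: put/removeRdd argument lists are simplified assumptions.
    val blockId = RDDBlockId(rddId = 0, splitIndex = 3)

    // put: write the computed partition; memory first, spill/replicate per StorageLevel
    blockManager.put(blockId, computedValues.iterator, StorageLevel.MEMORY_AND_DISK,
      tellMaster = true)

    // get: read the block back; BlockManager decides whether it comes from
    // MemoryStore, DiskStore or a remote node
    val values: Option[Iterator[Any]] = blockManager.get(blockId)

    // removeRdd: after the job finishes, drop every block belonging to the RDD
    blockManagerMaster.removeRdd(rddId = 0, blocking = false)
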
Startup Process Analysis

The modules above are all created in SparkEnv.create:

    val blockManagerMaster = new BlockManagerMaster(registerOrLookup(
      "BlockManagerMaster",
      new BlockManagerMasterActor(isLocal, conf)), conf)
    val blockManager = new BlockManager(executorId, actorSystem, blockManagerMaster,
      serializer, conf)
    val connectionManager = blockManager.connectionManager
    val broadcastManager = new BroadcastManager(isDriver, conf)
    val cacheManager = new CacheManager(blockManager)

This code can be confusing: it looks as if a BlockManagerMasterActor is created on every node in the cluster. In fact it is not. Look carefully at the implementation of the registerOrLookup function: if the current node is the driver, the actor is created; otherwise a connection to the driver is established.

    def registerOrLookup(name: String, newActor: => Actor): ActorRef = {
      if (isDriver) {
        logInfo("Registering " + name)
        actorSystem.actorOf(Props(newActor), name = name)
      } else {
        val driverHost: String = conf.get("spark.driver.host", "localhost")
        val driverPort: Int = conf.getInt("spark.driver.port", 7077)
        Utils.checkHost(driverHost, "Expected hostname")
        val url = s"akka.tcp://spark@$driverHost:$driverPort/user/$name"
        val timeout = AkkaUtils.lookupTimeout(conf)
        logInfo(s"Connecting to $name: $url")
        Await.result(actorSystem.actorSelection(url).resolveOne(timeout), timeout)
      }
    }

One of the main actions during initialization is that BlockManager registers itself with BlockManagerMaster.
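
A simplified sketch of the idea behind that registration; the names and parameters below are approximations for illustration, not quoted from the source: the worker-side BlockManager announces its identity and memory capacity to the master, which from then on knows which node can serve which blocks.

    // Sketch only (names/parameters approximate): BlockManager tells the master
    // who it is and how much memory it manages.
    val blockManagerId = BlockManagerId(executorId, host, port, nettyPort)
    master.registerBlockManager(blockManagerId, maxMemory, slaveActor)
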

Data Writing Process Analysis

Brief Data Writing Process

  1. RDD.iterator is the entry point for interaction with the storage subsystem.
  2. CacheManager.getOrCompute calls the put interface of BlockManager to write the data (a condensed sketch follows this list).
  3. Data is written to MemoryStore first, that is, to memory. If MemoryStore fills up, data that has not been used recently is spilled to disk.
  4. BlockManagerMaster is notified that a new block has been written, and the block's metadata is saved in BlockManagerMaster.
  5. The written data is replicated to other slave workers. Normally, data written on the local machine is backed up on one other machine, that is, the replica number is 1.
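
A condensed sketch of steps 1 to 3. This is a simplification for illustration, not the literal CacheManager source: it assumes `blockManager` is in scope and uses a simplified put signature.

    // Condensed sketch of the write path: check the cache first, otherwise compute
    // the partition and hand the result to BlockManager via put.
    def getOrComputeSketch[T](rdd: RDD[T], split: Partition, context: TaskContext,
                              storageLevel: StorageLevel): Iterator[T] = {
      val key = RDDBlockId(rdd.id, split.index)
      blockManager.get(key) match {
        case Some(cached) =>
          cached.asInstanceOf[Iterator[T]]   // already stored by an earlier task
        case None =>
          // compute the partition, then let BlockManager write it: MemoryStore first,
          // spilling to DiskStore and replicating according to the storage level
          val computed = rdd.compute(split, context).toArray
          blockManager.put(key, computed.iterator, storageLevel, tellMaster = true)
          computed.iterator
      }
    }
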
Serialization or not

The content written can be either serialized bytes or deserialized values. This is where Scala's Either type, with its Left and Right cases, comes into play.
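
For readers less familiar with Either, here is a tiny self-contained illustration of the idea (plain Scala, not Spark code): one value type carries either deserialized values or serialized bytes, and pattern matching selects the branch.

    import java.nio.ByteBuffer

    // Either lets one value carry two alternative representations:
    // Left  = deserialized values (an Iterator), Right = serialized bytes (a ByteBuffer)
    type BlockValues = Either[Iterator[Any], ByteBuffer]

    def describe(values: BlockValues): String = values match {
      case Left(iterator) => s"deserialized values: ${iterator.size} elements"
      case Right(bytes)   => s"serialized bytes: ${bytes.remaining()} bytes"
    }

    println(describe(Left(Iterator(1, 2, 3))))                    // deserialized values: 3 elements
    println(describe(Right(ByteBuffer.wrap(Array[Byte](1, 2)))))  // serialized bytes: 2 bytes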

Data Reading Process Analysis

The entry point for reading is BlockManager.get, which first tries the local machine and then falls back to a remote fetch:

    def get(blockId: BlockId): Option[Iterator[Any]] = {
      val local = getLocal(blockId)
      if (local.isDefined) {
        logInfo("Found block %s locally".format(blockId))
        return local
      }
      val remote = getRemote(blockId)
      if (remote.isDefined) {
        logInfo("Found block %s remotely".format(blockId))
        return remote
      }
      None
    }

Local read

First, check whether the required block exists in the MemoryStore or DiskStore of the local machine. If it does not, a remote fetch is initiated.
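
In other words, the local lookup boils down to something like the following simplified sketch (not the literal doGetLocal implementation; it assumes `memoryStore` and `diskStore` are in scope and exposes only the values-based lookup):

    // Simplified local lookup: memory first, then disk; None tells the caller
    // to fall back to a remote fetch via getRemote.
    def getLocalSketch(blockId: BlockId): Option[Iterator[Any]] = {
      memoryStore.getValues(blockId).orElse(diskStore.getValues(blockId))
    }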

Remote reading

The call path for remote fetching is getRemote -> doGetRemote. The most important step in doGetRemote is the call to BlockManagerWorker.syncGetBlock, which fetches the data from the remote node.

    def syncGetBlock(msg: GetBlock, toConnManagerId: ConnectionManagerId): ByteBuffer = {
      val blockManager = blockManagerWorker.blockManager
      val connectionManager = blockManager.connectionManager
      val blockMessage = BlockMessage.fromGetBlock(msg)
      val blockMessageArray = new BlockMessageArray(blockMessage)
      val responseMessage = connectionManager.sendMessageReliablySync(
        toConnManagerId, blockMessageArray.toBufferMessage)
      responseMessage match {
        case Some(message) => {
          val bufferMessage = message.asInstanceOf[BufferMessage]
          logDebug("Response message received " + bufferMessage)
          BlockMessageArray.fromBufferMessage(bufferMessage).foreach(blockMessage => {
            logDebug("Found " + blockMessage)
            return blockMessage.getData
          })
        }
        case None => logDebug("No response message received")
      }
      null
    }

The most interesting part of the code above is sendMessageReliablySync. Remote data reading is clearly an asynchronous I/O operation, so how can the code here read like a synchronous call? Put differently, how does the caller know that the response from the remote side has arrived?

Don't worry; keep digging into the definition. sendMessageReliablySync waits on the Future returned by sendMessageReliably, shown below.

    def sendMessageReliably(connectionManagerId: ConnectionManagerId, message: Message)
        : Future[Option[Message]] = {
      val promise = Promise[Option[Message]]
      val status = new MessageStatus(
        message, connectionManagerId, s => promise.success(s.ackMessage))
      messageStatuses.synchronized {
        messageStatuses += ((message.id, status))
      }
      sendMessage(connectionManagerId, message)
      promise.future
    }

If I say the secret lies right here, you may think I am talking nonsense, but it really does. Did you notice the two words promise and future?

When the promise is completed, it is completed with s.ackMessage. So where does ackMessage get written? Take a look at this snippet from ConnectionManager.handleMessage:

    case bufferMessage: BufferMessage => {
      if (authEnabled) {
        val res = handleAuthentication(connection, bufferMessage)
        if (res == true) {
          // message was security negotiation so skip the rest
          logDebug("After handleAuth result was true, returning")
          return
        }
      }
      if (bufferMessage.hasAckId) {
        val sentMessageStatus = messageStatuses.synchronized {
          messageStatuses.get(bufferMessage.ackId) match {
            case Some(status) => {
              messageStatuses -= bufferMessage.ackId
              status
            }
            case None => {
              throw new Exception("Could not find reference for received ack message " +
                message.id)
              null
            }
          }
        }
        sentMessageStatus.synchronized {
          sentMessageStatus.ackMessage = Some(message)
          sentMessageStatus.attempted = true
          sentMessageStatus.acked = true
          sentMessageStatus.markDone()
        }

Note that sentMessageStatus.markDone() invokes the promise.success handler that was registered in sendMessageReliably. Finally, take a look at the definition of MessageStatus:

    class MessageStatus(
        val message: Message,
        val connectionManagerId: ConnectionManagerId,
        completionHandler: MessageStatus => Unit) {
      var ackMessage: Option[Message] = None
      var attempted = false
      var acked = false
      def markDone() { completionHandler(this) }
    }

Now the call chain should be clear. Scala's Future and Promise can still be a little hard to grasp at first.
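
As a minimal, self-contained illustration of the same pattern (plain Scala, not Spark code): one thread blocks on a Future while another thread completes the underlying Promise, which is exactly how the incoming ack completes the future returned by sendMessageReliably.

    import scala.concurrent.{Await, Future, Promise}
    import scala.concurrent.duration._

    val promise = Promise[String]()            // created by the sender, like MessageStatus
    val future: Future[String] = promise.future

    // another thread completes the promise when the "ack" arrives,
    // just as markDone() triggers promise.success(ackMessage)
    new Thread(new Runnable {
      def run(): Unit = {
        Thread.sleep(100)                      // pretend the remote response takes time
        promise.success("ack received")
      }
    }).start()

    // the caller blocks on the future, which makes the async exchange look synchronous
    println(Await.result(future, 1.second))    // prints: ack received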

TachyonStore

In the latest Spark source code, the storage subsystem introduces TachyonStore. It builds on Tachyon, which implements an HDFS-compatible file system interface in memory; the main purpose is to use memory as the data persistence layer as much as possible and avoid excessive disk reads and writes.
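
As a quick usage sketch: in the Spark versions this article discusses, an RDD can be persisted through TachyonStore via the off-heap storage level. The snippet assumes an existing SparkContext `sc` and a configured Tachyon deployment.

    import org.apache.spark.storage.StorageLevel

    // Usage sketch, assuming SparkContext `sc` and a running Tachyon master.
    // OFF_HEAP asks the storage subsystem to keep the blocks in Tachyon
    // instead of the JVM heap or local disk.
    val cached = sc.parallelize(1 to 1000000).persist(StorageLevel.OFF_HEAP)
    cached.count()   // materializes the RDD; its blocks are written through TachyonStore
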

For more information about this module, see http://www.meetup.com/spark-users/events/117307472/

Summary

One lingering doubt: in the Spark storage subsystem, the traffic flowing through the communication module mixes heartbeat messages, data synchronization messages, and data fetch messages. If possible, the NIC used for heartbeat detection should be separated from the one used for data synchronization and data retrieval, to improve reliability.

References
  1. Spark Source Code Analysis - Storage Module: http://jerryshao.me/architecture/2013/10/08/spark-storage-module-analysis/
  2. Tachyon slides from the Spark meetup: http://www.slideshare.net/rxin/a-tachyon-2013-0509sparkmeetup
