Understanding closures in Spark

Source: Internet
Author: User
Tags: closure

Conceptual understanding: A closure is a function that can access variables defined outside its own body. In Spark, however, changes that the function makes to those variables are not visible outside the function.

RDD operations take a user-defined closure function as an argument. If that function accesses external variables, it must follow certain rules; otherwise a run-time exception is thrown. When a closure function is passed to a node, the following steps take place:

1. The driver uses reflection to find all the variables the closure accesses, packages them into an object, and serializes that object.
2. The serialized object is transferred over the network to the worker node.
3. The worker node deserializes the closure object.
4. The worker node executes the closure function.
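
The round trip described above can be imitated in a single process. The following is a toy sketch using only Python's standard library, not Spark's actual implementation: `make_task`, `ship`, and `run_on_worker` are hypothetical names, `pickle` stands in for Spark's serializer, and the function's code is assumed to be already available on the "worker" so that only the captured variables travel.

```python
import inspect
import pickle
import types


def make_task():
    totals = {"sum": 0}  # external variable captured by the closure

    def task(values):
        for v in values:
            totals["sum"] += v  # mutates the captured variable
        return totals["sum"]

    return task, totals


def ship(func):
    """Driver side: find the captured variables by reflection and serialize them."""
    captured = inspect.getclosurevars(func).nonlocals
    return pickle.dumps(dict(captured))


def run_on_worker(func, payload, partition):
    """Worker side: deserialize the captured variables, rebuild the closure, run it."""
    captured = pickle.loads(payload)
    cells = tuple(types.CellType(captured[name]) for name in func.__code__.co_freevars)
    worker_func = types.FunctionType(
        func.__code__, func.__globals__, func.__name__, func.__defaults__, cells
    )
    return worker_func(partition)


task, totals = make_task()
payload = ship(task)  # the bytes are handed over directly instead of crossing a network
worker_result = run_on_worker(task, payload, [1, 2, 3])

print(worker_result)   # 6, computed on the worker's own copy of totals
print(totals["sum"])   # 0, the driver's variable is untouched
```

Because the worker operates on a deserialized copy, the update it makes never reaches the variable held by the driver.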

Note: Changes made to external variables inside a closure are not propagated back to the driver.
In short, the function is sent over the network to the worker node and executed there. Every variable the closure captures must therefore be serializable; otherwise delivery fails. The four steps above are performed even when the job runs locally.
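
The serializability requirement can be illustrated without a cluster. This is a minimal sketch in which Python's `pickle` stands in for Spark's serializer and `try_ship` is a hypothetical helper; in Spark itself the corresponding failure typically surfaces as a "Task not serializable" exception.

```python
import pickle
import threading


def try_ship(captured):
    """Attempt to serialize the variables a closure captured."""
    try:
        pickle.dumps(captured)
        return "shipped"
    except (TypeError, pickle.PicklingError) as exc:
        return f"delivery failed: {type(exc).__name__}"


# Plain data serializes without trouble.
ok = try_ship({"threshold": 10})

# A lock is tied to the local process and cannot be serialized.
bad = try_ship({"lock": threading.Lock()})

print(ok)
print(bad)
```

The same reasoning applies to anything bound to the local process, such as open sockets or database connections: capturing one in a closure makes the whole closure undeliverable.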

Broadcast variables can also make external data available on worker nodes, but using them frequently makes the code less concise. Broadcast was designed to cache large data on each node, avoiding repeated transfers and improving computational efficiency, not to serve as a general mechanism for accessing external variables.
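
The trade-off can be made concrete with a rough, single-process model. It rests on a simplifying assumption that data captured by a closure travels with every task, while a broadcast value is sent to a node once and tasks carry only a small handle; `lookup`, `num_tasks`, `node_cache`, and `broadcast_id` are hypothetical names, and the byte counts come from `pickle`, not from Spark.

```python
import pickle

lookup = {i: i * i for i in range(10_000)}  # stands in for a large lookup table
num_tasks = 8                               # tasks scheduled on one worker node

# Closure capture: the table is serialized along with each task.
closure_bytes = sum(
    len(pickle.dumps({"lookup": lookup})) for _ in range(num_tasks)
)

# Broadcast style: the table is shipped to the node once and cached there.
node_cache = {}
broadcast_id = 0
table_payload = pickle.dumps(lookup)
node_cache[broadcast_id] = pickle.loads(table_payload)

# Each task now carries only the identifier of the cached value.
handle_bytes = sum(
    len(pickle.dumps({"broadcast_id": broadcast_id})) for _ in range(num_tasks)
)
broadcast_bytes = len(table_payload) + handle_bytes

print(closure_bytes, broadcast_bytes)
```

For small values the saving is negligible, which is consistent with the point above: broadcast pays off for large data and adds ceremony everywhere else.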
