Flink Distributed Cache

Source: Internet
Author: User

Flink provides a distributed cache, similar to Hadoop's, that lets parallel user functions easily read files from the local file system. It can be used to share files containing static external data, such as dictionaries or machine-learned regression models.

The cache works as follows: the program registers a file or directory from a local or remote file system (such as HDFS or S3) with the ExecutionEnvironment under a name. When the program executes, Flink automatically copies the file or directory to the local file system of every worker node. A user function can then look up the file or directory by that name and access it from the worker's local file system.
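To make the register / copy / look-up cycle concrete, here is a toy, single-process model in plain Java: a path is registered under a name, copied into a "worker" directory, and then resolved by that name. ToyDistributedCache and its methods are hypothetical and are not Flink's implementation; real Flink performs the copy through its own runtime and also handles directories, whereas this sketch copies regular files only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Toy model of the distributed-cache cycle (hypothetical, NOT Flink's code).
final class ToyDistributedCache {
    // "Client" side: logical name -> source path, filled in before the job runs.
    private final Map<String, Path> registered = new HashMap<>();

    // Analogous to registering a cached file under a name.
    void register(Path source, String name) {
        registered.put(name, source);
    }

    // "Worker" side: copy every registered regular file into the worker's
    // local directory and return the name -> local path lookup table.
    // Directories are not handled here; they would need a recursive copy.
    Map<String, Path> copyToWorker(Path workerLocalDir) throws IOException {
        Files.createDirectories(workerLocalDir);
        Map<String, Path> local = new HashMap<>();
        for (Map.Entry<String, Path> e : registered.entrySet()) {
            Path target = workerLocalDir.resolve(e.getKey());
            Files.copy(e.getValue(), target, StandardCopyOption.REPLACE_EXISTING);
            local.put(e.getKey(), target);
        }
        return local;
    }
}
```

The point this mirrors is that user code never touches the original HDFS or S3 path: it only knows the registered name and receives a local copy.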

The following example shows how to use the distributed cache.

Java code:

Register the file or directory with the ExecutionEnvironment:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Register a file from HDFS
env.registerCachedFile("hdfs:///path/to/your/file", "hdfsFile");

// Register a local executable script file
env.registerCachedFile("file:///path/to/exec/file", "localExecFile", true);

// Define the program and execute it
...
DataSet<String> input = ...;
DataSet<Integer> result = input.map(new MyMapper());
...
env.execute();
```
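The third argument, true, in the second registration marks the script as executable. A user function would typically obtain its local copy with getDistributedCache().getFile("localExecFile") in open() and then launch it. The sketch below shows only the launching part, using plain java.lang.ProcessBuilder and assuming a Unix-like worker and a script that writes its result to standard output; CachedScriptRunner is a hypothetical helper, not part of Flink.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: runs a local executable (e.g. the cached script)
// and returns its output lines (stdout and stderr merged).
final class CachedScriptRunner {
    static List<String> run(File executable, String... args)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add(executable.getAbsolutePath());
        Collections.addAll(command, args);

        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();

        List<String> output = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.add(line);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("script exited with code " + exitCode);
        }
        return output;
    }
}
```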

To access the cached file or directory from a user function (here, a map function), the function must extend a RichFunction, because it needs the RuntimeContext to reach the distributed cache:

```java
import java.io.File;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Extend a RichFunction to get access to the RuntimeContext
public final class MyMapper extends RichMapFunction<String, Integer> {

    @Override
    public void open(Configuration config) throws Exception {
        // Access the cached file through the RuntimeContext and its DistributedCache
        File myFile = getRuntimeContext().getDistributedCache().getFile("hdfsFile");
        // Read the file (or local directory) ...
    }

    @Override
    public Integer map(String value) throws Exception {
        // Use the contents of the cached file to do some processing
        ...
    }
}
```
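As an illustration of what the elided parts of MyMapper might do, the plain-Java sketch below loads a dictionary once in an open()-like method and then answers per-record lookups in a map()-like method. It does not use Flink; the class name, the "key<TAB>value" file format, and returning -1 for unknown keys are all assumptions. Because getFile() may return a directory, the sketch reads every regular file inside it in that case.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for MyMapper's logic; the file format is an assumption.
final class CachedDictionaryLookup {
    private final Map<String, Integer> dictionary = new HashMap<>();

    // Mirrors open(): runs once per parallel instance, so the file is read once.
    void open(File cached) throws IOException {
        List<File> files = new ArrayList<>();
        if (cached.isDirectory()) {
            File[] children = cached.listFiles();
            if (children != null) {
                Arrays.sort(children); // deterministic read order
                for (File f : children) {
                    if (f.isFile()) {
                        files.add(f);
                    }
                }
            }
        } else {
            files.add(cached);
        }
        for (File f : files) {
            for (String line : Files.readAllLines(f.toPath(), StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 2); // "key<TAB>value"
                if (parts.length == 2) {
                    dictionary.put(parts[0], Integer.parseInt(parts[1].trim()));
                }
            }
        }
    }

    // Mirrors map(): runs once per record and only touches the in-memory table.
    Integer map(String value) {
        return dictionary.getOrDefault(value, -1);
    }
}
```

Loading in open() rather than in map() is the reason the function must be a RichFunction in the first place: the file is parsed once per task instead of once per record.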



Scala code:

Register the file or directory with the ExecutionEnvironment:

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Register a file from HDFS
env.registerCachedFile("hdfs:///path/to/your/file", "hdfsFile")

// Register a local executable script file
env.registerCachedFile("file:///path/to/exec/file", "localExecFile", true)

// Define the program and execute it
...
val input: DataSet[String] = ...
val result: DataSet[Int] = input.map(new MyMapper())
...
env.execute()
```

To access the cached file or directory from a user function (here, a map function), the function must extend a RichFunction, because it needs the RuntimeContext to reach the distributed cache:

```scala
import java.io.File

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Extend a RichFunction to get access to the RuntimeContext
class MyMapper extends RichMapFunction[String, Int] {

  override def open(config: Configuration): Unit = {
    // Access the cached file through the RuntimeContext and its DistributedCache
    val myFile: File = getRuntimeContext.getDistributedCache.getFile("hdfsFile")
    // Read the file (or local directory) ...
  }

  override def map(value: String): Int = {
    // Use the contents of the cached file to do some processing
    ...
  }
}
```
