This article walks through a MapReduce (MR) case: the map-side join (map-join).
Hadoop offers several ways to share global variables or global files across tasks:
- Use the set() method of Configuration. This is suitable only when the shared data is relatively small.
- Keep the shared file on HDFS and read it every time it is needed. This works but is less efficient.
- Put the shared file in DistributedCache. After it is initialized once in setup(), it can be used many times. The drawback is that the cached file is read-only; it cannot be modified.
When you use DistributedCache to share global configuration files or variables, keep the following in mind:
- The shared file must first be uploaded to HDFS. The default access scheme for shared files is hdfs://.
- Register the shared file with job.addCacheFile(new Path(args[0]).toUri());
- Read and process the shared file in the setup() initialization method of the Mapper class. setup() runs only once per task, before the first call to map().
- The map() or reduce() method can then use the data that the setup() method of the same class has prepared.
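The load-once, read-many lifecycle described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration: the class SharedFileLifecycle, the tab-separated "key<TAB>value" file format, and the "UNKNOWN" default are assumptions made for the example. In a real job the Reader would be opened on the file registered through job.addCacheFile(...), and Hadoop, not your own code, would call setup() and map().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Mapper; no Hadoop classes are used here.
class SharedFileLifecycle {
    private Map<String, String> shared = Collections.emptyMap();
    private int loadCount = 0; // how many times the shared file was parsed

    // Plays the role of Mapper.setup(): parse the shared file once.
    // Assumed format: one "key<TAB>value" pair per line.
    void setup(Reader sharedFile) {
        Map<String, String> parsed = new HashMap<>();
        try (BufferedReader in = new BufferedReader(sharedFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) {
                    parsed.put(kv[0], kv[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The cached data is read-only, so expose it as an unmodifiable map.
        shared = Collections.unmodifiableMap(parsed);
        loadCount++;
    }

    // Plays the role of map(): only looks up data that setup() already parsed.
    String map(String key) {
        return shared.getOrDefault(key, "UNKNOWN");
    }

    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        SharedFileLifecycle task = new SharedFileLifecycle();
        task.setup(new StringReader("cn\tChina\nfr\tFrance\n"));
        for (String code : new String[] {"cn", "fr", "de"}) {
            System.out.println(code + " -> " + task.map(code));
        }
    }
}
```

However many records map() handles, the shared file is parsed a single time, which is the main advantage over re-reading the file from HDFS for every record.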
DistributedCache serves two purposes: ① sharing global cache files; ② loading the small table into the cache for certain join operations, which makes the join more efficient.
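To make the second purpose concrete, here is a minimal plain-Java sketch of the idea behind a map-side join: the small table is held in an in-memory hash map, and each record of the large table is joined with a single lookup, so no shuffle or reduce phase is needed for the join. The class MapJoinSketch, the comma-separated record layouts ("deptId,deptName" and "empName,deptId"), and the inner-join behaviour are assumptions chosen for illustration, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of map-side join logic; no Hadoop classes are used.
class MapJoinSketch {
    // Small table held in memory: deptId -> deptName.
    private final Map<String, String> smallTable = new HashMap<>();

    // Each small-table line is assumed to be "deptId,deptName".
    MapJoinSketch(List<String> smallTableLines) {
        for (String line : smallTableLines) {
            String[] f = line.split(",", 2);
            if (f.length == 2) {
                smallTable.put(f[0].trim(), f[1].trim());
            }
        }
    }

    // Joins one big-table record "empName,deptId" against the small table.
    // Returns null when the record is malformed or has no match (inner join).
    String joinRecord(String bigTableLine) {
        String[] f = bigTableLine.split(",", 2);
        if (f.length != 2) {
            return null;
        }
        String deptId = f[1].trim();
        String deptName = smallTable.get(deptId);
        return deptName == null ? null : f[0].trim() + "," + deptId + "," + deptName;
    }

    // Streams the big table record by record, as successive map() calls would.
    List<String> joinAll(List<String> bigTableLines) {
        List<String> out = new ArrayList<>();
        for (String line : bigTableLines) {
            String joined = joinRecord(line);
            if (joined != null) {
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MapJoinSketch join = new MapJoinSketch(Arrays.asList("10,Sales", "20,Engineering"));
        for (String row : join.joinAll(Arrays.asList("alice,10", "bob,20", "carol,30"))) {
            System.out.println(row);
        }
    }
}
```

This approach only pays off when the small table fits comfortably in the memory of every task; otherwise a reduce-side join is the usual fallback.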
Distributed Cache: DistributedCache