Background
Our company's data processing runs on two computing frameworks: a stand-alone framework and the MR (MapReduce) framework. I have abstracted a set of API interfaces for business computing developers to use.
The scheduling of API calls is implemented separately in each of the two frameworks. Application developers sometimes need to upload an override configuration file to adjust business calculation parameters. This is easy to support in the stand-alone framework. In the MR framework, however, the override configuration file has to be distributed to the nodes that run the tasks, and that problem needs to be solved.
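To make the idea of an override configuration file concrete, here is a minimal sketch of layering an override on top of default parameters. The post does not state the file format, so the sketch assumes Java properties syntax. The class name `OverrideConfig` and the keys `window.size` and `threshold` are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: not part of the API described in this post.
final class OverrideConfig {

    /** Loads the defaults, then applies the override on top; override keys win. */
    static Properties merge(String defaultsText, String overrideText) {
        try {
            Properties merged = new Properties();
            merged.load(new StringReader(defaultsText));
            if (overrideText != null) {
                Properties override = new Properties();
                override.load(new StringReader(overrideText));
                merged.putAll(override); // overridden keys replace the defaults
            }
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties p = merge("window.size=10\nthreshold=0.5\n", "threshold=0.8\n");
        // Prints "10 0.8": window.size keeps its default, threshold is overridden.
        System.out.println(p.getProperty("window.size") + " " + p.getProperty("threshold"));
    }
}
```

Keys that the override file does not mention keep their default values, so a developer only needs to list the parameters that change.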
Implementation
1. Pass in the configuration file path on the command line.
2. The MR job client reads the local configuration file and adds it to the DistributedCache. The command-line parameters are appended to the MR child JVM startup parameter array.
3. When the MR child JVM starts, it checks its startup parameters. If it finds the configuration file parameter and the file does not exist at the given path, it replaces that path with the file's corresponding local path in the DistributedCache.
4. The job in the child JVM reads the configuration file from the replaced path and applies it to the MR job, which changes the calculation parameters.
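Steps 2 and 3 can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the author's actual code: the flag name `-Doverride.conf=` and the class `OverridePathResolver` are hypothetical, and files are matched by file name only. The sketch deliberately avoids Hadoop dependencies. In a real job, the options string would typically live in the `mapred.child.java.opts` property, the file would be registered with `DistributedCache.addCacheFile`, and the list of localized files would come from `DistributedCache.getLocalCacheFiles`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating steps 2 and 3; no Hadoop classes are used.
final class OverridePathResolver {
    // Assumed flag that carries the override path to the child JVM.
    static final String FLAG = "-Doverride.conf=";

    /** Client side (step 2): append the flag to the existing child JVM options. */
    static String appendToChildOpts(String childOpts, String confPath) {
        String base = childOpts == null ? "" : childOpts.trim();
        String flag = FLAG + confPath;
        return base.isEmpty() ? flag : base + " " + flag;
    }

    /** Task side (step 3): map a missing path to the localized copy with the same file name. */
    static String resolve(String confPath, List<Path> localCacheFiles) {
        Path requested = Paths.get(confPath);
        if (Files.exists(requested)) {
            return confPath; // the path is valid on this node, nothing to do
        }
        Path name = requested.getFileName();
        if (name == null) {
            return confPath;
        }
        for (Path cached : localCacheFiles) {
            if (name.equals(cached.getFileName())) {
                return cached.toString();
            }
        }
        return confPath; // no match: leave unchanged so the caller can report the error
    }

    /** Rewrites every FLAG argument in a copy of the startup parameter array. */
    static String[] rewriteArgs(String[] args, List<Path> localCacheFiles) {
        String[] out = args.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i].startsWith(FLAG)) {
                out[i] = FLAG + resolve(out[i].substring(FLAG.length()), localCacheFiles);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String opts = appendToChildOpts("-Xmx512m", "/no/such/dir/override.properties");
        List<Path> cache = Arrays.asList(Paths.get("/tmp/cache/123/override.properties"));
        System.out.println(String.join(" ", rewriteArgs(opts.split(" "), cache)));
    }
}
```

Matching by file name keeps the sketch short, but it would be ambiguous if two cached files shared a name. A real implementation might instead pass the cache link name or index along with the flag.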
References
http://dongxicheng.org/mapreduce-nextgen/hadoop-distributedcache-details/
Hadoop DistributedCache use case