Background
Our company's data processing runs on two computing frameworks: a stand-alone framework and the MapReduce (MR) framework. I have abstracted a set of APIs for business computing developers to use, and the scheduling of API execution is implemented separately in each of the two frameworks. Application developers sometimes need to adjust business calculation parameters by uploading an override configuration file. This is easy to support in the stand-alone framework, but in the MR framework the distribution of the override configuration file to the task nodes needs to be addressed.
Implementation
1. Pass the configuration file path in through the command line.
2. The MR job client reads the local configuration file, adds it to the DistributedCache, and appends the command-line argument to the MR child JVM startup parameter array.
3. After the MR child JVM starts, it checks its startup parameters. If it finds the configuration file argument and the file does not exist at that path, it replaces the configuration file path with the file's local path in the DistributedCache.
4. The child JVM job reads the configuration file at the replaced path and applies it to the MR job, which modifies the calculation parameters.
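The path handling in steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration rather than the actual implementation: the class name, the method names, and the `override.conf` property name are all hypothetical, and matching the cached file by file name is an assumption. The Hadoop calls are left out so that the sketch runs with the JDK alone. In a real job the client would presumably register the file with `DistributedCache.addCacheFile` and set the resulting options string on `mapred.child.java.opts`, while the child would obtain the localized paths from `DistributedCache.getLocalCacheFiles`.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class OverrideConfigResolver {

    // Hypothetical system property that carries the config path to the child JVM.
    static final String CONF_PROPERTY = "override.conf";

    /**
     * Client side (step 2): append the config path to the existing child JVM
     * options string. Paths containing spaces are not handled in this sketch.
     */
    static String appendChildJvmOpt(String existingOpts, String confPath) {
        String opt = "-D" + CONF_PROPERTY + "=" + confPath;
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return opt;
        }
        return existingOpts.trim() + " " + opt;
    }

    /**
     * Child side (step 3): keep the path if the file exists locally; otherwise
     * look for a localized cache file with the same file name and use that path.
     */
    static String resolveConfigPath(String confPath, List<String> localCachePaths) {
        if (confPath == null) {
            return null;
        }
        if (new File(confPath).exists()) {
            return confPath;
        }
        String wantedName = new File(confPath).getName();
        for (String local : localCachePaths) {
            if (new File(local).getName().equals(wantedName)) {
                return local;
            }
        }
        // No match: return the original path so the caller can report the error.
        return confPath;
    }

    public static void main(String[] args) {
        // The program arguments stand in for the localized cache file paths.
        String confPath = System.getProperty(CONF_PROPERTY);
        List<String> localCache = Arrays.asList(args);
        System.out.println(resolveConfigPath(confPath, localCache));
    }
}
```

Passing the path as a `-D` system property is only one way to extend the child JVM startup parameters. Its advantage is that the child can read the value with `System.getProperty` without any extra argument parsing.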
Reference
http://dongxicheng.org/mapreduce-nextgen/hadoop-distributedcache-details/
Hadoop DistributedCache use case