Optimization of a Sorting Algorithm Based on the MapReduce Model
Jin Yu
MapReduce has become a standard parallel computing model in the big data domain. Ideally, a MapReduce system should keep the load highly balanced across all nodes involved in the computation while minimizing space usage, CPU and I/O usage, and network transfer overhead. Traditional algorithms are usually optimized for only one of these metrics. Starting from algorithms that already parallelize well, this paper proposes design criteria for optimized MapReduce algorithms. Through a theoretical analysis of the sorting algorithm, the most important algorithm in the data processing field, it presents an optimized sorting algorithm under the constraints of multiple metrics and proves that this algorithm satisfies the proposed criteria. Finally, experiments verify the effectiveness and efficiency of the optimized sorting algorithm.