[Linux] Shell multi-process concurrency (detailed version)


Business background

The schedule.sh script is responsible for scheduling the execution of the user-trajectory engineering scripts. An excerpt of its code follows:

#!/bin/bash
source /etc/profile;
export userTrackPathCollectHome=/home/pms/bigdataengine/analysis/script/usertrack/master/pathcollect

################################
# Process A
################################
# Verify that the relevant artificial product data source exists
lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! artificial product is not exist'
    exit 1
else
    echo 'artificial product is ok!!!!!!'
fi

# Verify that the relevant mix product data source exists
lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! mix product is not exist'
    exit 1
else
    echo 'mix product is ok!!!!!!'
fi

################################
# Process B
################################
# Generate the group-purchase info table; currently only the group-purchase ID and item ID are extracted
sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! groupon info is not exist'
    exit 4
else
    echo 'groupon info is ok!!!!!'
fi

# Generate the product-series table; the total file size is about 320M
sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! product serial is not exist'
    exit 5
else
    echo 'product serial is ok!!!!!'
fi

# Preprocessing: generate the extract_trfc_page_kpi table -- page PV and UV counts, summarized by pageId
sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! extract_trfc_page_kpi is not exist'
    exit 6
else
    echo 'extract_trfc_page_kpi is ok!!!!!!'
fi

# Synchronize term_category to Hive and convert front-end categories to back-end categories
sh $userTrackPathCollectHome/scripts/extract_term_category.sh
lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! temp_term_category is not exist'
    exit 7
else
    echo 'temp_term_category is ok!!!!!!'
fi

################################
# Process C
################################
# Generate the extract_track_info table
sh $userTrackPathCollectHome/scripts/extract_track_info.sh
lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! extract_track_info is not exist'
    exit 1
else
    echo 'extract_track_info is ok!!!!!'
fi
...

With the script written as above, the entire preprocessing stage takes 55 minutes to complete.

Optimization

The execution of the script above can be divided into three processes:

Process A -> Process B -> Process C

Since the subtasks in Process B do not affect one another, there is no need to run them sequentially; the optimization is therefore to execute these independent subtasks in Process B in parallel.
Strictly speaking, Linux has no special command for this kind of concurrency: the "concurrent execution" here simply means putting the subtasks into the background with &, which yields the so-called concurrent execution. The general pattern is sketched below, and the full modified script follows it:
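As a minimal, standalone sketch of the pattern (task_b1.sh, task_b2.sh and task_b3.sh are placeholder names, not the actual subtask scripts):

#!/bin/bash
# Sketch: run independent subtasks in the background, then wait for all of them.

{
    sh ./task_b1.sh          # placeholder subtask
    echo 'task_b1 done'
} &

{
    sh ./task_b2.sh          # placeholder subtask
    echo 'task_b2 done'
} &

{
    sh ./task_b3.sh          # placeholder subtask
    echo 'task_b3 done'
} &

# Block until every background job started by this shell has finished
wait
echo 'all background jobs finished, continue with the next process'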

#!/bin/bash
source /etc/profile;
export userTrackPathCollectHome=/home/pms/bigdataengine/analysis/script/usertrack/master/pathcollect

################################
# Process A
################################
# Verify that the relevant artificial product data source exists
lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! artificial product is not exist'
    exit 1
else
    echo 'artificial product is ok!!!!!!'
fi

# Verify that the relevant mix product data source exists
lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! mix product is not exist'
    exit 1
else
    echo 'mix product is ok!!!!!!'
fi

################################
# Process B
################################
# Concurrent job: generate the group-purchase info table; currently only the group-purchase ID and item ID are extracted
{
    sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
    lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l`
    if [ $lines -le 0 ] ; then
        echo 'error! groupon info is not exist'
        exit 4
    else
        echo 'groupon info is ok!!!!!'
    fi
} &

# Concurrent job: generate the product-series table; the total file size is about 320M
{
    sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
    lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l`
    if [ $lines -le 0 ] ; then
        echo 'error! product serial is not exist'
        exit 5
    else
        echo 'product serial is ok!!!!!'
    fi
} &

# Concurrent job: preprocessing, generate the extract_trfc_page_kpi table -- page PV and UV counts, summarized by pageId
{
    sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
    lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
    if [ $lines -le 0 ] ; then
        echo 'error! extract_trfc_page_kpi is not exist'
        exit 6
    else
        echo 'extract_trfc_page_kpi is ok!!!!!!'
    fi
} &

# Concurrent job: synchronize term_category to Hive and convert front-end categories to back-end categories
{
    sh $userTrackPathCollectHome/scripts/extract_term_category.sh
    lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
    if [ $lines -le 0 ] ; then
        echo 'error! temp_term_category is not exist'
        exit 7
    else
        echo 'temp_term_category is ok!!!!!!'
    fi
} &

################################
# Process C
################################
# Wait for all of the background processes above to finish
wait
echo 'End of the background jobs above!!!!!!!!!!!!!!!!!!!!!!!!!!!!'

# Generate the extract_track_info table
sh $userTrackPathCollectHome/scripts/extract_track_info.sh
lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l`
if [ $lines -le 0 ] ; then
    echo 'error! extract_track_info is not exist'
    exit 1
else
    echo 'extract_track_info is ok!!!!!'
fi

In the script above, all of the independent subtasks in Process B are executed in the background, achieving "concurrent execution". In order not to disrupt the overall execution order of the script:

Process A -> Process B -> Process C

you need to add the following before Process C executes:

# Wait for all of the background processes above to finish
wait

Its purpose is to wait for all of the background processes in Process B to finish before Process C starts executing.
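One caveat worth noting: an exit inside a { ... } & group only terminates that background subshell, not the parent script. If the parent script needs to know whether each subtask succeeded, one option (a sketch under assumed placeholder names, not part of the original schedule.sh) is to record each background PID and check its status with wait:

#!/bin/bash
# Sketch: capture the PID of each background job and check its exit status.
# check_groupon.sh and check_serial.sh are hypothetical stand-ins for the Process B subtasks.

sh ./check_groupon.sh &
pid1=$!

sh ./check_serial.sh &
pid2=$!

failed=0
for pid in $pid1 $pid2; do
    # wait <pid> returns the exit status of that specific background job
    if ! wait $pid; then
        failed=1
    fi
done

if [ $failed -ne 0 ]; then
    echo 'error! at least one Process B subtask failed'
    exit 1
fi
echo 'all Process B subtasks succeeded, starting Process C'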

Conclusion

After this optimization, the script's execution time dropped from 55 minutes to 15 minutes, a significant improvement.
