[Linux] Shell multi-process concurrency: detailed version

Last Update:2015-05-12 Source: Internet

Author: User

Tags shebang hadoop fs

Business background

The schedule.sh script schedules the execution of the user-trajectory project's scripts. An excerpt of the code follows:

    #!/bin/bash
    source /etc/profile;
    export userTrackPathCollectHome=/home/pms/bigdataengine/analysis/script/usertrack/master/pathcollect

    ###############################
    # Process A
    ###############################
    # Verify that the relevant product data source for machine collocation exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! Artificial product is not exist'
        exit 1
    else
        echo 'Artificial product is OK!!!!!!'
    fi

    # Verify that the relevant product data source for machine collocation exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! mix product is not exist'
        exit 1
    else
        echo 'mix product is OK!!!!!!'
    fi

    ###############################
    # Process B
    ###############################
    # Generate the group-purchase information table; currently it holds only two fields: group-purchase ID and item ID
    sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
    lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! Groupon info is not exist'
        exit 4
    else
        echo 'Groupon info is OK!!!!!'
    fi

    # Generate the product series data; total file size is about 320 MB
    sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
    lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! product serial is not exist'
        exit 5
    else
        echo 'product serial is OK!!!!!'
    fi

    # Preprocessing: generate the extract_trfc_page_kpi table, which aggregates page PV and UV counts by page ID
    sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
    lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! extract_trfc_page_kpi is not exist'
        exit 6
    else
        echo 'extract_trfc_page_kpi is OK!!!!!!'
    fi

    # Synchronize term_category to Hive and convert front-end categories to back-end categories
    sh $userTrackPathCollectHome/scripts/extract_term_category.sh
    lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! temp_term_category is not exist'
        exit 7
    else
        echo 'temp_term_category is OK!!!!!!'
    fi

    ###############################
    # Process C
    ###############################
    # Generate the extract_track_info table
    sh $userTrackPathCollectHome/scripts/extract_track_info.sh
    lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! extract_track_info is not exist'
        exit 1
    else
        echo 'extract_track_info is OK!!!!!'
    fi
    ...

With the script written as above, the entire preprocessing stage takes 55 minutes to complete.

Optimization

The execution of the script above can be divided into three stages, labeled Process A, B, and C in the code:

Process A -> Process B -> Process C

The subtasks in Process B do not depend on one another, so they do not need to run sequentially. The idea of the optimization is to run those independent subtasks in parallel.
The shell has no dedicated command for concurrent execution. The "concurrent execution" described here simply means placing each of these subtasks in the background with `&`. The script is modified as follows:

    #!/bin/bash
    source /etc/profile;
    export userTrackPathCollectHome=/home/pms/bigdataengine/analysis/script/usertrack/master/pathcollect

    ###############################
    # Process A
    ###############################
    # Verify that the relevant product data source for machine collocation exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! Artificial product is not exist'
        exit 1
    else
        echo 'Artificial product is OK!!!!!!'
    fi

    # Verify that the relevant product data source for machine collocation exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! mix product is not exist'
        exit 1
    else
        echo 'mix product is OK!!!!!!'
    fi

    ###############################
    # Process B
    ###############################
    # Concurrent process: generate the group-purchase information table; currently it holds only two fields: group-purchase ID and item ID
    {
        sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
        lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l`
        if [ $lines -le 0 ] ;then
            echo 'error! Groupon info is not exist'
            exit 4
        else
            echo 'Groupon info is OK!!!!!'
        fi
    }&

    # Concurrent process: generate the product series data; total file size is about 320 MB
    {
        sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
        lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l`
        if [ $lines -le 0 ] ;then
            echo 'error! product serial is not exist'
            exit 5
        else
            echo 'product serial is OK!!!!!'
        fi
    }&

    # Concurrent process: preprocessing generates the extract_trfc_page_kpi table, which aggregates page PV and UV counts by page ID
    {
        sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
        lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
        if [ $lines -le 0 ] ;then
            echo 'error! extract_trfc_page_kpi is not exist'
            exit 6
        else
            echo 'extract_trfc_page_kpi is OK!!!!!!'
        fi
    }&

    # Concurrent process: synchronize term_category to Hive and convert front-end categories to back-end categories
    {
        sh $userTrackPathCollectHome/scripts/extract_term_category.sh
        lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
        if [ $lines -le 0 ] ;then
            echo 'error! temp_term_category is not exist'
            exit 7
        else
            echo 'temp_term_category is OK!!!!!!'
        fi
    }&

    ###############################
    # Process C
    ###############################
    # Wait for all the background processes above to finish
    wait
    echo 'End of backend jobs above!!!!!!!!!!!!!!!!!!!!!!!!!!!!'

    # Generate the extract_track_info table
    sh $userTrackPathCollectHome/scripts/extract_track_info.sh
    lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'error! extract_track_info is not exist'
        exit 1
    else
        echo 'extract_track_info is OK!!!!!'
    fi

In the script above, all the mutually independent subtasks in Process B run in the background, which produces the "concurrent execution". To preserve the overall execution order of the script:

Process A -> Process B -> Process C

the following must be added before Process C starts:

    # Wait for all the background processes above to finish
    wait

The purpose is to wait for all the background processes in Process B to complete before Process C starts.
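One caveat is worth keeping in mind with this pattern. An `exit` inside a `{ ...; } &` group ends only that background subshell, and a bare `wait` returns 0 no matter how the background jobs finished, so a failed check in Process B would not by itself stop Process C. The following is a minimal sketch of one way to detect such failures, not the original author's method. It assumes two hypothetical stand-in functions, `task_ok` and `task_fail`, in place of the real extract scripts, records each job's PID from `$!`, and waits on each PID individually.

```shell
#!/bin/bash
# Hypothetical stand-ins for Process B subtasks (not part of the original script).
task_ok()   { sleep 1; return 0; }
task_fail() { sleep 1; return 4; }

run_stage_b() {
    local pids=() pid failed=0

    # Start each subtask in the background and remember its PID.
    task_ok &
    pids+=($!)
    task_fail &
    pids+=($!)

    # "wait PID" returns the exit status of that particular background job.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

if run_stage_b; then
    stage_b_status=0
    echo 'all background subtasks are OK'
else
    stage_b_status=1
    echo 'error! at least one background subtask failed'
fi
```

In a real scheduling script, the failure branch would typically call `exit` with a non-zero code so that Process C never starts on incomplete data.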

Conclusion

After optimization, the script's execution time dropped from 55 minutes to 15 minutes, a significant improvement.
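The saving comes from the fact that a stage of independent background jobs takes roughly as long as its slowest job, not the sum of all jobs. The sketch below illustrates this with a hypothetical `subtask` function that only sleeps; the durations are invented for the illustration and are unrelated to the real extract scripts. It uses the bash built-in `SECONDS` counter to compare the two approaches.

```shell
#!/bin/bash
# Hypothetical subtask: simulates work by sleeping for the given number of seconds.
subtask() { sleep "$1"; }

# Sequential run: total time is about the sum of the durations.
SECONDS=0
subtask 1
subtask 2
subtask 1
sequential=$SECONDS

# Background run: total time is about the longest single duration.
SECONDS=0
subtask 1 &
subtask 2 &
subtask 1 &
wait    # block until every background subtask has finished
parallel=$SECONDS

echo "sequential: ${sequential}s, parallel: ${parallel}s"
```

The same reasoning explains why the gain depends on the workload: if one subtask dominates the stage, running the others alongside it shortens the stage only by their combined duration.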

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
