[Linux] Shell multi-process concurrency: details

Source: Internet
Author: User
Tags hadoop fs

Business background

The schedule.sh script schedules the scripts of the user trajectory project. Part of the code is as follows:

    #!/bin/bash
    source /etc/profile;
    export userTrackPathCollectHome=/home/pms/bigDataEngine/analysis/script/usertrack/master/pathCollect

    ################################ Process A ################################
    # Verify that the artificial product data source exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! artificial product is not exist'
        exit 1
    else
        echo 'artificial product is OK !!!!!!'
    fi

    # Verify that the related-product data source for machine matching exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! mix product is not exist'
        exit 1
    else
        echo 'mix product is OK !!!!!!'
    fi

    ################################ Process B ################################
    # Generate the group-buying information table; currently only group-buying IDs and product IDs are captured
    sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
    lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! groupon info is not exist'
        exit 4
    else
        echo 'groupon info is OK !!!!!'
    fi

    # Generate the product series (the total file size is in the MB range)
    sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
    lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! product serial is not exist'
        exit 5
    else
        echo 'product serial is OK !!!!!'
    fi

    # Preprocess to generate the extract_trfc_page_kpi table, used to collect the PV and UV counts of each page by pageId
    sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
    lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! extract_trfc_page_kpi is not exist'
        exit 6
    else
        echo 'extract_trfc_page_kpi is OK !!!!!!'
    fi

    # Synchronize term_category to Hive, converting front-end categories to back-end categories
    sh $userTrackPathCollectHome/scripts/extract_term_category.sh
    lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! temp_term_category is not exist'
        exit 7
    else
        echo 'temp_term_category is OK !!!!!!'
    fi

    ################################ Process C ################################
    # Generate the extract_track_info table
    sh $userTrackPathCollectHome/scripts/extract_track_info.sh
    lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! extract_track_info is not exist'
        exit 1
    else
        echo 'extract_track_info is OK !!!!!'
    fi
    ...

Run sequentially as shown above, the entire preprocessing script takes 55 minutes to complete.

Optimization

The script above can be divided into three processes:

Process A -> Process B -> Process C

The subtasks in Process B do not affect one another, so there is no need to run them in sequence. The idea of the optimization is to run the independent subtasks of Process B in parallel.
Linux does not actually have a dedicated command for concurrent execution. The "concurrent execution" described here is achieved by putting these subtasks in the background. The modified script is as follows:

    #!/bin/bash
    source /etc/profile;
    export userTrackPathCollectHome=/home/pms/bigDataEngine/analysis/script/usertrack/master/pathCollect

    ################################ Process A ################################
    # Verify that the artificial product data source exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! artificial product is not exist'
        exit 1
    else
        echo 'artificial product is OK !!!!!!'
    fi

    # Verify that the related-product data source for machine matching exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! mix product is not exist'
        exit 1
    else
        echo 'mix product is OK !!!!!!'
    fi

    ################################ Process B ################################
    # Concurrent process: generate the group-buying information table; currently only group-buying IDs and product IDs are captured
    {
        sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
        lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l`
        if [ $lines -le 0 ]; then
            echo 'error! groupon info is not exist'
            exit 4
        else
            echo 'groupon info is OK !!!!!'
        fi
    } &

    # Concurrent process: generate the product series (the total file size is in the MB range)
    {
        sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
        lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l`
        if [ $lines -le 0 ]; then
            echo 'error! product serial is not exist'
            exit 5
        else
            echo 'product serial is OK !!!!!'
        fi
    } &

    # Concurrent process: preprocess to generate the extract_trfc_page_kpi table, used to collect the PV and UV counts of each page by pageId
    {
        sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
        lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
        if [ $lines -le 0 ]; then
            echo 'error! extract_trfc_page_kpi is not exist'
            exit 6
        else
            echo 'extract_trfc_page_kpi is OK !!!!!!'
        fi
    } &

    # Concurrent process: synchronize term_category to Hive, converting front-end categories to back-end categories
    {
        sh $userTrackPathCollectHome/scripts/extract_term_category.sh
        lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
        if [ $lines -le 0 ]; then
            echo 'error! temp_term_category is not exist'
            exit 7
        else
            echo 'temp_term_category is OK !!!!!!'
        fi
    } &

    ################################ Process C ################################
    # Wait until all the preceding background processes finish running
    wait
    echo 'end of backend jobs above !!!!!!!!!!!!!!!!!!!!!!!!!!!!'

    # Generate the extract_track_info table
    sh $userTrackPathCollectHome/scripts/extract_track_info.sh
    lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l`
    if [ $lines -le 0 ]; then
        echo 'error! extract_track_info is not exist'
        exit 1
    else
        echo 'extract_track_info is OK !!!!!'
    fi

In the script above, all the independent subtasks of Process B are run in the background, which achieves the "concurrent execution". To keep the overall execution order unchanged:

Process A -> Process B -> Process C

a wait must be added before Process C:

    # Wait for all the preceding background processes to finish
    wait

This ensures that Process C starts only after all of Process B's background processes have finished.
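
One caveat of this pattern may be worth a sketch. An exit inside a backgrounded { ...; } & group ends only that background subshell, and a bare wait returns 0 no matter how the jobs ended, so a failed check in Process B would not by itself stop Process C. The following is a minimal sketch of one way to handle that, by recording each job's PID and waiting on it individually. The function run_check is a hypothetical stand-in for one Process B subtask together with its hadoop fs -ls check; it is not part of the original script.

```shell
#!/bin/bash
# run_check is a hypothetical stand-in for one Process B subtask:
# it sleeps briefly, then finishes with the status given as $1.
run_check() {
    sleep 1
    return "$1"
}

# Start the jobs in the background and remember their PIDs.
pids=""
run_check 0 & pids="$pids $!"   # succeeds
run_check 4 & pids="$pids $!"   # fails with status 4, like 'exit 4' in the script
run_check 0 & pids="$pids $!"   # succeeds

# 'wait PID' returns that job's exit status, unlike a bare 'wait'.
failed=0
for pid in $pids; do
    wait "$pid" || failed=$?
done

if [ "$failed" -ne 0 ]; then
    echo "a Process B job failed with status $failed; Process C should not start"
else
    echo 'all Process B jobs succeeded; Process C can start'
fi
```

In a real script the failure branch would typically end with exit "$failed", so that the scheduler sees the same exit codes as in the sequential version.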

Conclusion

After optimization, the execution time of the script is reduced from 55 minutes to 15 minutes, with remarkable results.
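
A speed-up of this kind is what one would expect when the backgrounded jobs overlap: Process B then takes roughly as long as its slowest subtask instead of the sum of all of them. The sketch below illustrates the effect with sleep standing in for the real subtasks; the durations are made up for illustration and say nothing about the actual subtask run times.

```shell
#!/bin/bash
start=$(date +%s)

# Three stand-in subtasks of 1, 2 and 1 seconds, all started in the background.
sleep 1 &
sleep 2 &
sleep 1 &

# Barrier: block until every background job has finished.
wait

elapsed=$(( $(date +%s) - start ))
echo "parallel run took about ${elapsed}s; a sequential run would take about 4s"
```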
