[Linux] Multi-process concurrency in shell: detailed version

Last updated: 2015-05-12. Source: Internet


Business background

The schedule.sh script schedules the execution of the user-track project scripts. An excerpt of the code follows:

    #!/bin/bash
    source /etc/profile;
    export userTrackPathCollectHome=/home/pms/bigDataEngine/analysis/script/usertrack/master/pathCollect

    ###############################
    # Flow A
    ###############################
    # Verify that the data source for machine-matched related products exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'Error! artificial product is not exist'
        exit 1
    else
        echo 'artificial product is ok!!!!!!'
    fi

    # Verify that the data source for machine-matched related products exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'Error! mix product is not exist'
        exit 1
    else
        echo 'mix product is ok!!!!!!'
    fi

    ###############################
    # Flow B
    ###############################
    # Generate the group-buy info table; currently only the group-buy ID and product ID are extracted
    sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
    lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l `
    if [ $lines -le 0 ] ;then
        echo 'Error! groupon info is not exist'
        exit 4
    else
        echo 'groupon info is ok!!!!!'
    fi

    # Generate the product series; total file size is around 320M
    sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
    lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l `
    if [ $lines -le 0 ] ;then
        echo 'Error! product serial is not exist'
        exit 5
    else
        echo 'product serial is ok!!!!!'
    fi

    # Preprocess to generate the extract_trfc_page_kpi table, used to aggregate per-page pv and uv counts by pageId
    sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
    lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'Error! extract_trfc_page_kpi is not exist'
        exit 6
    else
        echo 'extract_trfc_page_kpi is ok!!!!!!'
    fi

    # Sync term_category to hive and convert front-end categories to back-end categories
    sh $userTrackPathCollectHome/scripts/extract_term_category.sh
    lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'Error! temp_term_category is not exist'
        exit 7
    else
        echo 'temp_term_category is ok!!!!!!'
    fi

    ###############################
    # Flow C
    ###############################
    # Generate the extract_track_info table
    sh $userTrackPathCollectHome/scripts/extract_track_info.sh
    lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l `
    if [ $lines -le 0 ] ;then
        echo 'Error! extract_track_info is not exist'
        exit 1
    else
        echo 'extract_track_info is ok!!!!!'
    fi
    ...

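As written above, the whole preprocessing stage takes 55 minutes to run.

Optimization

The script above can be divided into three flows:

Flow A -> Flow B -> Flow C

The subtasks in flow B do not affect one another, so there is no need to run them sequentially. The idea behind the optimization is to run these independent subtasks in parallel.
Linux has no dedicated command for "concurrent execution". What is called concurrent execution here simply means putting each subtask into the background with &, so that the subtasks run at the same time. The script is modified as follows: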
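The effect of backgrounding can be seen in isolation with a small sketch. The function fake_task below is a hypothetical stand-in for a flow B subtask (it only sleeps); it is not part of the original schedule.sh, and the durations are made up for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for a flow B subtask: sleeps to simulate work.
fake_task() {
    local name=$1 seconds=$2
    sleep "$seconds"
    echo "$name done"
}

# Run three tasks one after another and report the elapsed seconds.
run_sequential() {
    local start=$SECONDS
    fake_task a 1
    fake_task b 1
    fake_task c 1
    echo "sequential: $((SECONDS - start))s"
}

# Start the same tasks in the background with &, then wait for all of them.
run_parallel() {
    local start=$SECONDS
    fake_task a 1 &
    fake_task b 1 &
    fake_task c 1 &
    wait
    echo "parallel: $((SECONDS - start))s"
}

run_sequential
run_parallel
```

The sequential run should take roughly the sum of the sleep times, while the parallel run should take roughly as long as the slowest task. The exact numbers printed depend on how the whole-second SECONDS counter rounds.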

    #!/bin/bash
    source /etc/profile;
    export userTrackPathCollectHome=/home/pms/bigDataEngine/analysis/script/usertrack/master/pathCollect

    ###############################
    # Flow A
    ###############################
    # Verify that the data source for machine-matched related products exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'Error! artificial product is not exist'
        exit 1
    else
        echo 'artificial product is ok!!!!!!'
    fi

    # Verify that the data source for machine-matched related products exists
    lines=`hadoop fs -ls /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday | wc -l`
    if [ $lines -le 0 ] ;then
        echo 'Error! mix product is not exist'
        exit 1
    else
        echo 'mix product is ok!!!!!!'
    fi

    ###############################
    # Flow B
    ###############################
    # Background process: generate the group-buy info table; currently only the group-buy ID and product ID are extracted
    {
        sh $userTrackPathCollectHome/scripts/extract_groupon_info.sh
        lines=`hadoop fs -ls /user/hive/pms/extract_groupon_info | wc -l `
        if [ $lines -le 0 ] ;then
            echo 'Error! groupon info is not exist'
            exit 4
        else
            echo 'groupon info is ok!!!!!'
        fi
    }&

    # Background process: generate the product series; total file size is around 320M
    {
        sh $userTrackPathCollectHome/scripts/extract_product_serial.sh
        lines=`hadoop fs -ls /user/hive/pms/product_serial_id | wc -l `
        if [ $lines -le 0 ] ;then
            echo 'Error! product serial is not exist'
            exit 5
        else
            echo 'product serial is ok!!!!!'
        fi
    }&

    # Background process: preprocess to generate the extract_trfc_page_kpi table, used to aggregate per-page pv and uv counts by pageId
    {
        sh $userTrackPathCollectHome/scripts/extract_trfc_page_kpi.sh $date
        lines=`hadoop fs -ls /user/hive/pms/extract_trfc_page_kpi/ds=$date | wc -l`
        if [ $lines -le 0 ] ;then
            echo 'Error! extract_trfc_page_kpi is not exist'
            exit 6
        else
            echo 'extract_trfc_page_kpi is ok!!!!!!'
        fi
    }&

    # Background process: sync term_category to hive and convert front-end categories to back-end categories
    {
        sh $userTrackPathCollectHome/scripts/extract_term_category.sh
        lines=`hadoop fs -ls /user/hive/pms/temp_term_category | wc -l`
        if [ $lines -le 0 ] ;then
            echo 'Error! temp_term_category is not exist'
            exit 7
        else
            echo 'temp_term_category is ok!!!!!!'
        fi
    }&

    ###############################
    # Flow C
    ###############################
    # Wait for all the background processes above to finish
    wait
    echo 'end of backend jobs above!!!!!!!!!!!!!!!!!!!!!!!!!!!!'

    # Generate the extract_track_info table
    sh $userTrackPathCollectHome/scripts/extract_track_info.sh
    lines=`hadoop fs -ls /user/hive/warehouse/extract_track_info | wc -l `
    if [ $lines -le 0 ] ;then
        echo 'Error! extract_track_info is not exist'
        exit 1
    else
        echo 'extract_track_info is ok!!!!!'
    fi

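In the script above, all the independent subtasks in flow B are placed in the background, which achieves the "concurrent execution". To keep the overall order of the script intact:

Flow A -> Flow B -> Flow C

the following must be added before flow C runs:

    # Wait for all the background processes above to finish
    wait

Its purpose is to block until every background process of flow B has finished, and only then run flow C. One behavioral difference from the sequential version is worth noting: an exit inside a { ... }& block ends only that background subshell, and a wait with no arguments returns 0, so a failed check in flow B no longer stops the script before flow C.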
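If flow C should be skipped when any background job of flow B fails, one option is to record each job's PID from $! and wait on the PIDs one by one, since wait with a PID returns that job's exit status. The following is only a sketch of that pattern: the function name run_flow_b and the job bodies (sleep and echo) are hypothetical placeholders for the real hadoop steps, and the second job is made to fail with status 5 on purpose.

```shell
#!/bin/bash
# Sketch: start background jobs, then collect each job's exit status.
# The job bodies are hypothetical placeholders for the real flow B subtasks.
run_flow_b() {
    local pids=() pid status failed=0

    # Placeholder job that succeeds.
    { sleep 1; echo 'groupon info is ok'; } &
    pids+=("$!")

    # Placeholder job that fails; exit 5 ends only this background subshell.
    { sleep 1; echo 'Error! product serial is not exist' >&2; exit 5; } &
    pids+=("$!")

    # wait PID returns the exit status of that specific job.
    for pid in "${pids[@]}"; do
        wait "$pid"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "background job $pid exited with status $status" >&2
            failed=1
        fi
    done
    return "$failed"
}

if run_flow_b; then
    echo 'flow B ok, continue with flow C'
else
    echo 'flow B failed, skip flow C'
fi
```

In a real scheduling script the else branch would typically call exit with a non-zero code, which restores the stop-on-error behavior of the original sequential version.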

Conclusion

After the optimization, the script's run time dropped from 55 minutes to 15 minutes, a significant improvement.
