[Author]: kwu
Running SQL from a shell script with spark-sql: the spark-sql CLI provides options similar to Hive's -e, -f, and -i.
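For reference, the three options behave as in the Hive CLI (the statements and file paths below are illustrative, not from the original post):

# -e  run an inline SQL statement and exit
spark-sql -e "select count(1) from dms.tracklog_5min"
# -f  run the SQL statements in a file and exit
spark-sql -f /opt/bin/spark_opt/report.sql
# -i  run an initialization file (e.g. UDF registration) before executing anything else
spark-sql -i /opt/bin/spark_opt/init.sql -e "select 1"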
1. The script invoked on a schedule
#!/bin/sh
# upload logs to hdfs

yesterday=`date --date='1 days ago' +%Y%m%d`

/opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql \
    --master spark://10.130.2.20:7077 \
    --executor-memory 6g \
    --total-executor-cores 45 \
    --conf spark.ui.port=4075 \
    -e "
insert overwrite table st.stock_realtime_analysis PARTITION (DTYPE='01')
select t1.stockId       as stockId,
       t1.url           as url,
       t1.clickcnt      as clickcnt,
       0,
       round((t1.clickcnt / (case when t2.clickcntyesday is null then 0
                                  else t2.clickcntyesday end) - 1) * 100, 2) as LPcnt,
       '01'             as type,
       t1.analysis_date as analysis_date,
       t1.analysis_time as analysis_time
from   (select stock_code stockId,
               concat('http://stockdata.stock.hexun.com/', stock_code, '.shtml') url,
               count(1) clickcnt,
               substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1, 10) analysis_date,
               substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 12, 8) analysis_time
        from   dms.tracklog_5min
        where  stock_type = 'STOCK'
        and    day = substr(from_unixtime(unix_timestamp(), 'yyyyMMdd'), 1, 8)
        group  by stock_code
        order  by clickcnt desc
        limit  20) t1
left join
       (select stock_code stockId,
               count(1) clickcntyesday
        from   dms.tracklog_5min a
        where  stock_type = 'STOCK'
        and    substr(datetime, 1, 10) = date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1)
        and    substr(datetime, 12, 5) < substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 12, 5)
        and    day = '${yesterday}'
        group  by stock_code) t2
on     t1.stockId = t2.stockId;
"

sqoop export --connect jdbc:mysql://10.130.2.245:3306/charts \
    --username guojinlian \
    --password Abcd1234 \
    --table stock_realtime_analysis \
    --fields-terminated-by '\001' \
    --columns "stockid,url,clickcnt,splycnt,lpcnt,type" \
    --export-dir /dw/st/stock_realtime_analysis/dtype=01
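The post does not show the scheduler entry itself; a typical crontab line for this kind of job might look like the following (the 15-minute interval, script path, and log path are assumptions):

# run the job every 15 minutes and append output to a log
*/15 * * * * /opt/bin/spark_opt/stock_realtime_analysis.sh >> /tmp/stock_realtime_analysis.log 2>&1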
The content of init.sql loads the UDFs:
add jar /opt/bin/UDF/hive-udf.jar;
create temporary function udtf_stockidxfund as 'com.hexun.hive.udf.stock.UDTFStockIdxFund';
create temporary function udf_getbfhourstime as 'com.hexun.hive.udf.time.UDFGetBfHoursTime';
create temporary function udf_getbfhourstime2 as 'com.hexun.hive.udf.time.UDFGetBfHoursTime2';
create temporary function udf_stockidxfund as 'com.hexun.hive.udf.stock.UDFStockIdxFund';
create temporary function udf_md5 as 'com.hexun.hive.udf.common.HashMD5UDF';
create temporary function udf_murhash as 'com.hexun.hive.udf.common.HashMurUDF';
create temporary function udf_url as 'com.hexun.hive.udf.url.UDFUrl';
create temporary function url_host as 'com.hexun.hive.udf.url.UDFHost';
create temporary function udf_ip as 'com.hexun.hive.udf.url.UDFIP';
create temporary function udf_site as 'com.hexun.hive.udf.url.UDFSite';
create temporary function udf_UrlDecode as 'com.hexun.hive.udf.url.UDFUrlDecode';
create temporary function udtf_url as 'com.hexun.hive.udf.url.UDTFUrl';
create temporary function udf_ua as 'com.hexun.hive.udf.useragent.UDFUA';
create temporary function udf_ssh as 'com.hexun.hive.udf.useragent.UDFSSH';
create temporary function udtf_ua as 'com.hexun.hive.udf.useragent.UDTFUA';
create temporary function udf_kw as 'com.hexun.hive.udf.url.UDFKW';
create temporary function udf_chdecode as 'com.hexun.hive.udf.url.UDFChDecode';
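Once registered through -i, these functions are available in any statement passed with -e; for example (an illustrative query, not from the original post):

# call one of the registered UDFs in an ad-hoc query
spark-sql -i /opt/bin/spark_opt/init.sql \
    -e "select udf_chdecode(url) from dms.tracklog_5min limit 10"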
Setting the UI port
--conf spark.ui.port=4075
The default is 4040, which would conflict with other jobs already running, so it is changed to 4075 here.
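If several such scripts can start at the same time, the script itself can probe for a free port instead of hard-coding one (a minimal sketch, assuming netstat is available on the host):

# pick the first free UI port at or above 4040
port=4040
while netstat -ltn | grep -q ":${port} "; do
    port=$((port + 1))    # port is taken, try the next one
done
# then pass --conf spark.ui.port=${port} to spark-sql

Note that Spark will also try successive ports on its own when the configured one is busy (up to spark.port.maxRetries), so pinning a distinct port per job mainly keeps each job's UI address predictable.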
Setting the memory and CPU resources used by the job
--executor-memory 6g --total-executor-cores 45
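In standalone mode, --executor-memory is a per-executor setting, while --total-executor-cores caps the cores used by the application as a whole. The same limits can also be expressed as configuration properties (shown here for reference; the trailing arguments are elided):

# equivalent --conf form of the two flags above
spark-sql --master spark://10.130.2.20:7077 \
    --conf spark.executor.memory=6g \
    --conf spark.cores.max=45 \
    ...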
The statement was originally run with hive -e; after switching to Spark it is much faster: the run time dropped from 15 min to 45 s.
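To reproduce such a comparison, the wall-clock time of either variant can be measured by wrapping the invocation with time (the script name here is hypothetical):

# measure end-to-end wall-clock time of the job
time sh /opt/bin/spark_opt/stock_realtime_analysis.sh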