[hadoop+spark+python] Big Data Hands-On Notes

Keywords: Big Data, Python, Hadoop, Spark

1. Submitting a Job

The command (I have spark-submit on my PATH via an environment variable):

spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.12.233:7077 --executor-memory 10G --total-executor-cores 10 filename

Explained option by option:

(1) --class org.apache.spark.examples.SparkPi

Specifies the application's entry point (the main class) when submitting a Java/Scala jar; for a Python script this option is omitted.

(2) --master spark://192.168.12.233:7077

Sets the address of the Spark master, in the format spark://<host>:<port>.

(3) --executor-memory 10G

Sets the memory per executor; with this setting each executor (worker) is allocated 10 GB. Size it according to your actual resources.

(4) --total-executor-cores 10

Sets the total number of CPU cores allocated across all executors; just keep it within the cluster's total core count.

(5) filename

The file to run; either a relative or an absolute path works. If it is a Python file, it must be runnable from the command line, meaning the requirements and the package environment must all be satisfied. I usually package the project with a setup.py, build it once, and then run the main program (see the sketches after this list).
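For reference, a minimal PySpark script in the spirit of the SparkPi example above might look like the sketch below (the file name pi.py and the optional partition-count argument are my assumptions, not part of the original notes):

# pi.py -- a self-contained PySpark script, runnable from the command line
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()

    # Optional first argument: number of partitions (defaults to 2).
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def inside(_):
        # Sample a point in [-1, 1) x [-1, 1); count it if it lands in the unit circle.
        x, y = random() * 2 - 1, random() * 2 - 1
        return 1 if x * x + y * y <= 1 else 0

    count = spark.sparkContext.parallelize(range(n), partitions).map(inside).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    spark.stop()

Such a script would be submitted with the command from section 1, but without --class (that option only applies to Java/Scala jars):

spark-submit --master spark://192.168.12.233:7077 --executor-memory 10G --total-executor-cores 10 pi.py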
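For the packaging workflow mentioned in (5), a minimal setup.py might look like this (the project name and dependency list are placeholders):

# setup.py -- minimal packaging sketch; names and versions are hypothetical
from setuptools import setup, find_packages

setup(
    name="myjob",          # placeholder project name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        # runtime dependencies the script needs on every node, e.g.:
        # "numpy>=1.16",
    ],
)

Building it once (for example with python setup.py bdist_egg or pip install .) verifies that the package environment is complete before submitting; the built egg/zip can also be shipped to the executors with spark-submit's --py-files option.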
