To connect the shell to a running cluster, pass the master URL:

./bin/spark-shell --master spark://MASTER:PORT
Starting in cluster mode:

MASTER=spark://`hostname`:7077 bin/spark-shell
bin/spark-shell --master spark://es122:7077

Local mode:

bin/spark-shell --master local[4]

Loading a text file: once the shell is up it prints "Spark context available as sc." After connecting to the master, note that if the cluster has no distributed file system, Spark will read the data on every machine in the cluster, so make sure the complete data file is present on every node.

wget http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/spam.data

Local mode:

var inFile = sc.textFile("./spam.data")

Cluster mode:

import org.apache.spark.SparkFiles
sc.addFile("spam.data")  // addFile returns Unit, so there is nothing useful to assign
var inFile = sc.textFile(SparkFiles.get("spam.data"))

Process each line of the file: split on spaces, then convert each field to Double:

var nums = inFile.map(x => x.split(" ").map(_.toDouble))

Note: x => x.toDouble is equivalent to _.toDouble.

Inspect the data:

inFile.first()
nums.first()

Logistic regression setup (slice is end-exclusive, so slice(0, x.size - 1) takes every column except the last, which holds the label):

import org.apache.spark.util.Vector

case class DataPoint(x: Vector, y: Double)

def parsePoint(x: Array[Double]): DataPoint = {
  DataPoint(new Vector(x.slice(0, x.size - 1)), x(x.size - 1))
}
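Putting the pieces together, the parsed points can feed a simple gradient-descent loop, in the style of the classic SparkLR example that ships with Spark. This is a sketch, not the post's own code: the iteration count, the random seed, and the assumption that labels are encoded as ±1 are mine (spam.data actually uses 0/1 labels, so a real run would first remap the label column).

```scala
// Sketch of a logistic-regression training loop over the parsed points,
// following the style of Spark's bundled SparkLR example.
// Assumes `nums: RDD[Array[Double]]` from the steps above.
import org.apache.spark.util.Vector

case class DataPoint(x: Vector, y: Double)

def parsePoint(a: Array[Double]): DataPoint =
  DataPoint(new Vector(a.slice(0, a.size - 1)), a(a.size - 1))

val points = nums.map(parsePoint _).cache() // reused every iteration, so cache it
val D = nums.first().size - 1               // number of features (all columns but the label)
val rand = new java.util.Random(42)         // assumed seed, for reproducibility

// Start from a random weight vector in [-1, 1]^D.
var w = Vector(D, _ => 2 * rand.nextDouble - 1)

val ITERATIONS = 10 // assumed small fixed count for illustration
for (i <- 1 to ITERATIONS) {
  // Gradient of the logistic loss, assuming labels y are +1/-1.
  val gradient = points.map { p =>
    p.x * (1 / (1 + math.exp(-p.y * (w dot p.x))) - 1) * p.y
  }.reduce(_ + _)
  w -= gradient
}
println("Final w: " + w)
```

Caching `points` matters here: without it, Spark would re-read and re-parse the text file on every one of the ITERATIONS passes.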
Ops Series 08: Spark Shell