./bin/spark-shell --master spark://master:port
Start in cluster mode (connect to a standalone master):

    MASTER=spark://'hostname':7077 bin/spark-shell

or, equivalently:

    bin/spark-shell --master spark://es122:7077

Standalone (local) mode:

    bin/spark-shell local[4]

Loading a text file: after connecting to the Spark master, the Spark context is available as sc. If the cluster does not have a distributed file system, Spark loads the data on each machine in the cluster, so make sure every node has a complete copy of the data.

Download the sample data:

    wget http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/spam.data

When the data is on a single host:

    val inFile = sc.textFile("./spam.data")

On a cluster:

    import org.apache.spark.SparkFiles
    sc.addFile("spam.data")
    val inFile = sc.textFile(SparkFiles.get("spam.data"))

Process each row of the file: split it on spaces and convert every field to Double:

    val nums = inFile.map(x => x.split(" ").map(_.toDouble))

Note: x => x.toDouble is equivalent to _.toDouble.

Inspect the results:

    inFile.first()
    nums.first()

Logistic regression: define a data point class and a parser:

    import org.apache.spark.util.Vector
    case class DataPoint(x: Vector, y: Double)
    def parsePoint(x: Array[Double]): DataPoint = {
      DataPoint(new Vector(x.slice(0, x.size - 2)), x(x.size - 1))
    }
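The line-parsing step can be exercised without a Spark cluster. This is a minimal sketch in plain Scala: the sample line is made up (not real spam.data content), and a plain Array[Double] stands in for org.apache.spark.util.Vector, but the split/map/parse logic mirrors the code above.

```scala
// Sketch of the parsing step, assuming whitespace-separated numeric fields
// as in spam.data. Array[Double] stands in for Spark's Vector here.
case class DataPoint(x: Array[Double], y: Double)

// Mirrors parsePoint above: features are a slice of the fields,
// the label is the last field.
def parsePoint(fields: Array[Double]): DataPoint =
  DataPoint(fields.slice(0, fields.size - 2), fields(fields.size - 1))

// A made-up sample line, standing in for one record of spam.data.
val line = "0.5 1.25 3.0 1.0"
val nums = line.split(" ").map(_.toDouble)  // x => x.toDouble == _.toDouble
val p = parsePoint(nums)
```

Note that slice(0, size - 2) drops the last two fields, so with four fields the feature array holds the first two values and y is the final one.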
O&M series: 08, Spark Shell