Compiling Spark:
1. Install Java (JDK 1.6 is recommended).
2. Run the compile command:
./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=2.6.0 -Pyarn -DskipTests -Phive -Phive-thriftserver
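The build is Maven-based and usually needs more memory than Maven's defaults; a minimal sketch of the environment setup for a Spark 1.x build (the values are common recommendations, not from this document, so adjust to your machine):

    # give Maven enough heap and permgen space for the Spark build,
    # then run the compile command above
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

On success, make-distribution.sh leaves a .tgz distribution (with the layout shown below) in the source root.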
Spark directory structure:
├── bin
│   ├── beeline
│   ├── beeline.cmd
│   ├── compute-classpath.cmd
│   ├── compute-classpath.sh
│   ├── load-spark-env.sh
│   ├── pyspark
│   ├── pyspark2.cmd
│   ├── pyspark.cmd
│   ├── run-example
│   ├── run-example2.cmd
│   ├── run-example.cmd
│   ├── spark-class
│   ├── spark-class2.cmd
│   ├── spark-class.cmd
│   ├── spark-shell              interactive shell for running Spark commands
│   ├── spark-shell2.cmd
│   ├── spark-shell.cmd
│   ├── spark-sql
│   ├── spark-submit             submits and runs an application; reads conf/spark-defaults.conf by default (see the examples after this tree)
│   ├── spark-submit2.cmd
│   ├── spark-submit.cmd
│   ├── utils.sh
│   └── windows-utils.cmd
├── CHANGES.txt
├── conf
│   ├── fairscheduler.xml.template
│   ├── hadoop                        directory holding the Hadoop configuration files (the name is user-defined); its path is set in spark-env.sh
│   ├── hive-site.xml                 needed for Spark/Hive integration; provides Hive's configuration, mainly the metastore database settings
│   ├── log4j.properties              logging configuration
│   ├── log4j.properties.template
│   ├── metrics.properties.template
│   ├── slaves                        slave nodes, one hostname or IP address per line
│   ├── slaves.template
│   ├── spark-defaults.conf           configuration read by default by the spark-submit command; sets the parameters of the current application
│   ├── spark-defaults.conf.template
│   ├── spark-env.sh                  environment variables used when Spark starts
│   ├── spark-env.sh.template
│   ├── spark-kafka.conf              a user-defined properties file; passed to spark-submit with --properties-file, it is used instead of spark-defaults.conf (see the examples after this tree)
│   └── spark-sql.conf                same purpose as spark-kafka.conf
├── data
│   └── mllib
├── ec2
│   ├── deploy.generic
│   ├── README
│   ├── spark-ec2
│   └── spark_ec2.py
├── examples
│   └── src
├── lib
│   ├── datanucleus-api-jdo-3.2.6.jar
│   ├── datanucleus-core-3.2.10.jar
│   ├── datanucleus-rdbms-3.2.9.jar
│   ├── spark-1.3.1-yarn-shuffle.jar
│   ├── spark-assembly-1.3.1-hadoop2.6.0.jar
│   ├── spark-examples-1.3.1-hadoop2.6.0.jar
│   └── tachyon-0.5.0-jar-with-dependencies.jar
└── sbin
    ├── slaves.sh
    ├── spark-config.sh
    ├── spark-daemon.sh
    ├── spark-daemons.sh
    ├── start-all.sh              start the master on this machine and all slaves (see the examples after this tree)
    ├── start-history-server.sh
    ├── start-master.sh           start the master on this machine
    ├── start-slave.sh            start a slave on this machine
    ├── start-slaves.sh           start all slaves listed in conf/slaves
    ├── start-thriftserver.sh
    ├── stop-all.sh               stop the master and all slaves
    ├── stop-history-server.sh
    ├── stop-master.sh            stop the master on this machine
    ├── stop-slaves.sh            stop all slaves
    └── stop-thriftserver.sh
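A quick usage sketch for the bin/ commands annotated above; the class name and jar path match the 1.3.1 layout in this tree, while the master URL and the final argument are assumptions to adjust for your setup:

    # open an interactive shell with 2 local threads
    ./bin/spark-shell --master local[2]

    # submit the bundled SparkPi example; anything not given on the
    # command line is read from conf/spark-defaults.conf
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master local[2] \
      lib/spark-examples-1.3.1-hadoop2.6.0.jar 100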
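How spark-defaults.conf and a custom --properties-file relate, as described in the conf/ annotations; the property values here are illustrative assumptions, not from this document:

    # conf/spark-defaults.conf -- picked up automatically by spark-submit
    spark.master            spark://master:7077
    spark.executor.memory   2g
    spark.eventLog.enabled  true

    # a custom file such as conf/spark-kafka.conf is used instead of
    # spark-defaults.conf when passed explicitly:
    ./bin/spark-submit --properties-file conf/spark-kafka.conf <app jar>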
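And a typical standalone-cluster start/stop sequence with the sbin/ scripts, run on the master node (assumes conf/slaves lists the worker machines):

    sbin/start-master.sh    # start a master on this machine
    sbin/start-slaves.sh    # start a worker on every host in conf/slaves
    # or do both in one step:
    sbin/start-all.sh
    # shut everything down again:
    sbin/stop-all.sh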
Getting Started with Spark