Should you choose Python to learn Spark programming?
Writing functions in Java is verbose, the Scala learning curve is steep, and the combination of SBT, Eclipse, and Maven is rather fragile; the build often cannot find the main class to execute.
I had not used Python before, but it has a good reputation and is said to make data processing easy.
I had already looked into writing Python programs in Eclipse with the PyDev plugin.
Today I tried the Python development environment that ships with Anaconda, and it felt good.
In particular, IPython Notebook (now Jupyter Notebook) makes visualization easy.
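For example, a minimal sketch of inline plotting in a notebook cell (matplotlib is bundled with Anaconda; the numbers are arbitrary):

# In a notebook cell: render matplotlib figures inline in the page.
%matplotlib inline
import matplotlib.pyplot as plt

# A quick sanity-check line chart.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.show()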
But how do you start it under PySpark?
Some English articles I found describe the configuration under Linux:
ipython profile create spark

This creates the configuration files needed at startup. After setting them up in the profile, running

ipython notebook --profile spark

should start the notebook on top of PySpark, but I did not manage to get it working.
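For reference, the profile approach relies on a startup script along the lines of the sketch below (adapted from the Cloudera post linked at the end; the path, the py4j zip version, and Python 2's execfile are assumptions that vary by Spark version):

# ~/.ipython/profile_spark/startup/00-pyspark-setup.py
# Minimal sketch of a profile startup script (Python 2 era, Spark 1.x).
import os
import sys

spark_home = os.environ.get('SPARK_HOME')
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put PySpark and its bundled py4j on the interpreter path.
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'py4j-0.8.2.1-src.zip'))

# Launch the regular PySpark shell, which creates the SparkContext `sc`.
execfile(os.path.join(spark_home, 'python', 'pyspark', 'shell.py'))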
Then I found an easier way.
In the Windows environment variables, you can add two variables that pyspark checks at startup; they redirect the Python interpreter session into a Jupyter Notebook.
The first variable is PYSPARK_DRIVER_PYTHON, with value jupyter.
The other is PYSPARK_DRIVER_PYTHON_OPTS, with value notebook.
If you then start pyspark from the command line (double-clicking does not work), the notebook opens as a web service, and your Python code runs on Spark.
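Once the notebook is up, here is a quick check that code really runs on Spark (a minimal sketch; pyspark predefines the SparkContext as sc when it starts the driver):

# First notebook cell: sanity-check the Spark connection.
rdd = sc.parallelize(range(100))                  # distribute a small dataset
print(rdd.filter(lambda x: x % 2 == 0).count())   # expect 50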
References:
http://www.cnblogs.com/NaughtyBaby/p/5469469.html
http://blog.csdn.net/sadfasdgaaaasdfa/article/details/47090513
http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/
Machine Learning with Spark by Nick Pentreath
Start Jupyter Notebook in PySpark