1. Flume: create the configuration file flume-spark-tail-conf.properties

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'.
a2.sources = r2
a2.channels = c2
a2.sinks = k2

### define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/datas/spark_word_count.log
a2.sources.r2.shell = /bin/bash -c

### define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
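The snippet above is cut off in the middle of the channel definition, and the sink for agent a2 is never shown. Below is a minimal sketch of how the remaining channel and sink configuration might look, assuming an Avro sink that pushes the tailed log lines to a Spark Streaming Flume receiver; the host, port, and capacity values are illustrative and not from the original file:

a2.channels.c2.transactionCapacity = 100

### define sinks (assumption: Avro sink feeding a Spark Streaming Flume receiver)
a2.sinks.k2.type = avro
a2.sinks.k2.hostname = localhost
a2.sinks.k2.port = 9999

### bind the source and the sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2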
Developing a Spark project with Python requires that Spark be installed locally.
Part One: Local installation
1. Go to http://spark.apache.org/downloads.html and select the package built for the Hadoop version on this machine.
2. Click the link to complete the download.
3. Extract the files.
4. Configure environment variables:
Create a new variable SPARK_HOME=D:\spark\spark-2.2.0-bin-hadoop2.6
Append %SPARK_HOME%\bin to the system PATH variable.
Part Two: Starting Spark locally
1. Go to D:\spark\hadoop.
2. Start the master by running the class org.apache.spark.deploy.master.Master (via bin/spark-class); it starts up and its WebUI listens on port 8080, as shown in the log.

Modify the configuration
1. Go to the $SPARK_HOME/conf directory.
2. Rename spark-env.sh.template to spark-env.sh.
3. Edit spark-env.sh and add the following:
export SPARK_MASTER_IP=localhost
export SPARK_LOCAL_IP=localhost

Running a worker
bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077 -i 127.0.0.1 -c 1 -m 512M

Once the worker has started, it connects to the master; open the master WebUI to confirm that the worker has registered.
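With the master and worker running, one quick way to check the standalone cluster is to attach a spark-shell to it (bin/spark-shell --master spark://localhost:7077) and run a trivial job. A minimal sketch; the numbers are only illustrative:

scala> val rdd = sc.parallelize(1 to 1000, 2)  // a small RDD split into 2 partitions
scala> rdd.count()                             // tasks run on the registered worker; should return 1000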
Having gone through functional programming first, Scala does not feel very difficult to me; many of its ideas are close to Python, and in Spark projects I mostly use PySpark and spark-shell for small tasks. For machine learning the key is still to understand the algorithms and their source code, so just knowing how to use the tools is not enough. I have therefore been reading the Python edition of a machine learning book, although so far I have only worked through the supervised learning part.
Environment
Executors
Wordcount
After the preceding environment is ready, run the simplest example by entering the following code in spark-shell:
scala> sc.textFile("README.md").filter(_.contains("Spark")).count
The code above counts the number of lines in README.md that contain "Spark".

Detailed description of the deployment process
The figure shows the components in a Spark deployment environment.

Driver program
In brief, the program corresponding to the wordcount statement entered in spark-shell above is the driver program.
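The snippet above is really a line count. A full word count in the same spark-shell session is sketched below (a minimal sketch, still reading README.md); the statements are typed in the driver, while the per-partition work runs on the executors:

scala> val words = sc.textFile("README.md").flatMap(_.split("\\s+")).filter(_.nonEmpty)  // split lines into non-empty words
scala> val counts = words.map(word => (word, 1)).reduceByKey(_ + _)                      // sum the counts per word
scala> counts.take(5).foreach(println)                                                   // print a few (word, count) pairs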
Test documents:
# cat text1.txt
1
2
3
4
5
# cat text2.txt
Oracle
MySQL
PostgreSQL
Hadoop
Spark

Use paste to stitch together the contents of text1.txt and text2.txt:
# paste text1.txt text2.txt
1	Oracle
2	MySQL
3	PostgreSQL
4	Hadoop
5	Spark

You can also use the -d parameter to specify the delimiter, for example:
# paste text1.txt text2.txt -d ","
1,Oracle
2,MySQL
3,PostgreSQL
4,Hadoop
5,Spark

In this way the shell can use the paste command to stitch together the contents of multiple files.
After a user application creates a new SparkContext, how does the cluster allocate executors on the workers? This article takes a standalone cluster as an example and describes the process in detail. The sequence is as follows:
1. SparkContext creates the TaskScheduler and the DAGScheduler
SparkContext is the main interface between a user application and a Spark cluster, and it must be created first. If you use spark-shell, you do not need to create it yourself: the shell creates a SparkContext named sc for you when it starts.
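For a standalone application (outside spark-shell) the SparkContext is created explicitly. The following is a minimal sketch; the master URL and application name are illustrative values, not taken from the original article:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative configuration pointing at the standalone master started earlier.
val conf = new SparkConf()
  .setMaster("spark://localhost:7077")
  .setAppName("SparkContextExample")

// Creating the SparkContext is the step that triggers scheduler creation and executor allocation.
val sc = new SparkContext(conf)

// ... submit jobs through sc ...

sc.stop()  // release the executors when the application finishes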
The following is the spark-shell command:
/opt/db/spark-1.5.2-bin-hadoop2.6/bin/spark-shell --master spark://u1:7077 --jars ~/spark-cassandra-connector-full.jar

4. Prepare the data source:
// Most documents stop the current sc and then start a new one; in fact there is no need, you can simply add the Cassandra parameters to the original sc:
scala> sc.getConf.set("spark.cassandra.connection.host", "172.16.163.131")
// Reading a data source on HDFS
scala> val df ...
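Once the connection host is set, Cassandra tables can be read through the connector. The sketch below assumes a hypothetical keyspace test and table kv, and that the connector jar from the command above is on the classpath; none of these names come from the original article:

scala> import com.datastax.spark.connector._        // adds cassandraTable to the SparkContext
scala> val rows = sc.cassandraTable("test", "kv")   // RDD over the hypothetical test.kv table
scala> rows.count()                                 // a simple action to verify the connection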
The installation of Spark supports several run modes. One of them is the local run mode, which only requires unpacking the archive on a single node and does not depend on a Hadoop environment.
Running spark-shell
Running spark-shell in local mode is very simple. Assuming the current directory is $SPARK_HOME, just run:
$ MASTER=local bin/spark-shell
MASTER=local indicates that spark-shell runs in single-machine (local) mode. If all goes well, you will see the startup log followed by the scala> prompt.
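Inside the local-mode shell, a quick sanity check is to run a small job on the pre-created SparkContext. A minimal sketch; the numbers are only illustrative:

scala> sc.master                             // should report the local master
scala> val nums = sc.parallelize(1 to 100)   // distribute a small range as an RDD
scala> nums.filter(_ % 2 == 0).count()       // count the even numbers; should return 50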