Restart IDEA:
After the restart, we enter the following interface:
Step 4: Compile Scala code in IDEA:
First, select "Create New Project" on the interface we entered in the previous step:
Select the "Scala" option in the list on the left:
To facilitate future development, select the "SBT" option on the right:
Click "Next" to go to the next step and set the name and directory of the scala project:
Click "finish" to create the project:
Because we have selec
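Since the project uses SBT, the generated project will contain a build.sbt; a minimal build definition along these lines can be used (the Scala and Spark versions below are assumptions, adjust them to your environment):

    name := "SparkDemo"
    version := "1.0"
    scalaVersion := "2.10.4"
    // Spark core as a provided dependency; the version is only an illustrative assumption
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"

SBT downloads the declared dependencies the first time the project is refreshed, which can take a while.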
follows: Step 1: Modify the host name in /etc/hostname and configure the mapping between the host name and IP address in /etc/hosts. We use the master machine as the master node of Hadoop. First, let's take a look at the IP address of the master machine: the IP address of the current host is "192.168.184.20". Modify the host name in /etc/hostname: open the configuration file, and we can see the default name assigned when Ubuntu was installed. The name of the machine in the configuration file is
From the configuration above, we can see that the master machine serves both as the master node and as a data processing node. This is due to the consideration of keeping three copies of our data with only a limited number of machines. Copy the masters and slaves files configured on the master to the conf folder under the Hadoop installation directory of slave1 and slave2 respectively. Go to the slave1 or slave2 node to check the content of the masters and slaves files: it is found that the copy is completely consistent.
slave2 machines.
In this case, the id_rsa.pub of slave1 is sent to the master, as shown below:
At the same time, the slave2 id_rsa.pub is sent to the master, as shown below:
Check whether the data has been copied on the master:
Now we can see that the public keys of slave1 and slave2 nodes have been transmitted.
All public keys are integrated on the master node:
Copy the master's public key information authorized_keys to the .ssh directory of slave1 and slave2:
Log on to slave1
The command to end historyserver is as follows:
Step 4: Verify the Hadoop distributed cluster
First, create two directories on the HDFS file system. The creation process is as follows:
/data/wordcount in HDFS is used to store the data files of the wordcount example provided by Hadoop, and the program's results are written to the /output/wordcount directory. Through the web console, we can see that the two folders have been created successfully:
Next, upload the local data file to HDFS:
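These two steps are normally done with the hadoop fs -mkdir and hadoop fs -put commands; as an alternative illustration in Scala, a sketch using the HDFS FileSystem API could look like this (the namenode URL and the local file path are assumptions):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsSetup {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Namenode address assumed from the cluster used in this article
        conf.set("fs.defaultFS", "hdfs://master:9000")
        val fs = FileSystem.get(conf)

        // Create the input and output directories used by the wordcount example
        fs.mkdirs(new Path("/data/wordcount"))
        fs.mkdirs(new Path("/output/wordcount"))

        // Upload a local data file into /data/wordcount (the local path is a placeholder)
        fs.copyFromLocalFile(new Path("/usr/local/hadoop/README.txt"), new Path("/data/wordcount/"))
        fs.close()
      }
    }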
Step 2: Use the Spark cache mechanism to observe the efficiency improvement
Based on the above content, we execute the following statement again:
It is found that the calculation result is again 15.
In this case, go to the Web console:
The console clearly shows that we performed the "count" operation twice.
Now we perform the "cache" operation on the "sparks" variable:
Run the count operation again and view the Web console:
At this time, we found that the cached data is used, and the count completes much faster than the first operation.
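For reference, the sequence of operations described above can be reproduced in spark-shell roughly as follows (the input path and the filter condition are illustrative assumptions, not necessarily the exact ones used here):

    // sc is the SparkContext provided by spark-shell
    val sparks = sc.textFile("hdfs://master:9000/data/README.md").filter(_.contains("Spark"))
    sparks.count()   // first action: reads the file from HDFS and computes the count
    sparks.cache()   // marks the RDD for in-memory caching (lazy: nothing is cached yet)
    sparks.count()   // this action materializes the cache while computing
    sparks.count()   // later actions read the cached partitions and finish much faster

Because cache() is lazy, the speed-up only shows up on the actions that run after the cache has been populated.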
implemented in the Scala language and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, and Scala can manipulate distributed datasets as easily as local collection objects.
Although Spark was created to support iterative jobs on distributed datasets, it is actually a complement to Hadoop and can run in parallel on top of the Hadoop file system. This behavior
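As a small illustration of this point (assuming an existing SparkContext named sc, as in spark-shell), a distributed dataset can be processed with the same collection-style operations Scala programmers use on local collections:

    // Build an RDD from a local range and use familiar collection operations on it
    val numbers = sc.parallelize(1 to 100)
    val sumOfEvenSquares = numbers.filter(_ % 2 == 0).map(n => n * n).reduce(_ + _)
    println(sumOfEvenSquares)   // the same style of code would work on a plain Scala Range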
This article explains the structured data processing of Spark, including Spark SQL, DataFrame, Dataset, and the Spark SQL service. It focuses on structured data processing in Spark 1.6.x, but because Spark is developing rapidly (the writing time o
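As a minimal sketch of these APIs in the Spark 1.6.x style (class and object names here are illustrative, not taken from the article):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Long)

    object StructuredDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("StructuredDemo").setMaster("local"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // DataFrame: untyped rows with a schema
        val df = sc.parallelize(Seq(Person("Andy", 30), Person("Justin", 19))).toDF()
        df.filter($"age" > 20).show()

        // Dataset (experimental in 1.6): a typed view over the same data
        val ds = df.as[Person]
        println(ds.map(_.name).collect().mkString(", "))

        // Spark SQL: register a temporary table and query it with SQL
        df.registerTempTable("people")
        sqlContext.sql("SELECT name FROM people WHERE age > 20").show()
        sc.stop()
      }
    }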
Features: the master, workers, and executors all run in separate JVM processes.
4. YARN cluster: the ApplicationMaster role in the YARN ecosystem is replaced by the Spark ApplicationMaster developed by Apache, and each NodeManager in the YARN ecosystem plays the role of a worker in the Spark ecosystem; the NodeManager is responsible for starting the executors.
5. Mesos cluster: N
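As a quick reference for these modes, the master URL strings that select them look roughly like this (host names and ports are assumptions; pass one of them to spark-submit --master or to SparkConf.setMaster):

    // Master URL strings for the deployment modes discussed above
    val masters = Map(
      "local"      -> "local[*]",             // everything in one JVM, useful for development
      "standalone" -> "spark://master:7077",  // Spark standalone cluster (Master/Worker daemons)
      "yarn"       -> "yarn-cluster",         // Spark 1.x syntax; the driver runs in the YARN ApplicationMaster
      "mesos"      -> "mesos://master:5050"   // Mesos cluster; Mesos offers resources, Spark starts executors
    )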
Using IDEA + Maven to build Spark's development environment, I ran into a few small pitfalls, but fortunately it was finally completed successfully; using Maven to manage the project is still very necessary.
1. Create a new Maven project, select the Scala project type, and click Next.
2. Fill in the GroupId, ArtifactId, and project name, continue with Next, Next, and fill in the project name.
3. After the project has been generated, delete the test class MySpec.scala; if it is not deleted, it may report a test error when running.
4. Set Scala to th
Step 5: Test the Spark IDE development environment
The following error message is displayed when we directly select SparkPi and run it:
The prompt shows that the Spark master machine cannot be found.
In this case, you need to configure the SparkPi execution environment:
Select "Edit Configurations" to go to the configuration page:
In Program arguments, enter "local":
This configuration instructs SparkPi to run against a local master.
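For reference, a Pi-estimation program in the spirit of the SparkPi example looks roughly like the sketch below; it takes the master URL as its first program argument, so passing "local" as configured above makes it run locally (this is an illustrative sketch, not the exact example shipped with Spark):

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.math.random

    object MySparkPi {
      def main(args: Array[String]): Unit = {
        // args(0) is the master URL, e.g. "local" when run from the IDE as configured above
        val master = if (args.length > 0) args(0) else "local"
        val sc = new SparkContext(new SparkConf().setAppName("MySparkPi").setMaster(master))
        val n = 100000
        // Count random points that fall inside the unit circle
        val count = sc.parallelize(1 to n).map { _ =>
          val x = random * 2 - 1
          val y = random * 2 - 1
          if (x * x + y * y < 1) 1 else 0
        }.reduce(_ + _)
        println("Pi is roughly " + 4.0 * count / n)
        sc.stop()
      }
    }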
Next, package it using Project Structure's Artifacts:
Use "From modules with dependencies" and select the Main Class, then click "OK".
Change the name to SparkDemoJar.
Because Scala and Spark are installed on each machine, you can delete both the Scala- and Spark-related jar files from the artifact.
Next, build: select "Build Artifacts".
The rest of the operation is to upload the jar package to the server, and then execute the
Reason: running the Spark code as the root user.
Workaround: run Spark with a non-administrator account.
[[email protected] bin]$ ./add-user.sh
What type of user do you wish to add?
a) Management User (mgmt-users.properties)
b) Application User (application-users.properties)
(a): b
Enter the details of the new user to add.
Realm (ApplicationRealm) : ApplicationRealm ---->> careful here, you need to type this or leave it blank
(_ + _)
    counts.saveAsTextFile("hdfs://master:9000/user/output/wikiresult3")
  }
}
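For context, the complete program is along the following lines; this is a sketch reconstructed around the fragment above against the org.apache.spark 1.x API, and the object name, input path, and argument handling are assumptions:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object WordCount {
      def main(args: Array[String]): Unit = {
        // args(0) is the master URL, e.g. the Mesos master used in the run command below
        val sc = new SparkContext(args(0), "WordCount")
        val lines = sc.textFile("hdfs://master:9000/user/input/wiki")  // input path is an assumption
        val counts = lines.flatMap(_.split(" "))
                          .map(word => (word, 1))
                          .reduceByKey(_ + _)
        counts.saveAsTextFile("hdfs://master:9000/user/output/wikiresult3")
      }
    }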
Package it into myspark.jar and upload it to /opt/spark/newprogram on the master.
Run the program:
root@master:/opt/spark# ./run -cp newprogram/myspark.jar WordCount master@master:5050 newprogram/myspark.jar
Mesos automatically copies the jar package to each worker node when the job runs.
Select "yes" to enable automatic installation of scala plug-in idea.
In this case, it takes about 2 minutes to download and install the SDK. Of course, the download time varies depending on your network speed.
; "src =" http://s3.51cto.com/wyfs02/M02/4A/13/wKioL1QiJJPzxOm0AAFxk_FS8AU762.jpg "style =" float: none; "Title =" 51.png" alt = "wkiol1qijjpzxom0aafxk_fs8au762.jpg"/>
We found that the program ran correctly and completed much faster than the first operation.
This article is from the Spark Asia Pacific Research Institute blog; please be sure to keep this source: http://rockyspark.blog.51cto.com/2229525/1557591
essentially a shuffle. So before the shuffle, many map operations can be performed within a partition. Each stage corresponds to multiple MapTasks or multiple ResultTasks; the set of tasks in a stage is combined into a TaskSet, which is managed by a TaskSetManager that tracks the running state of those tasks and handles locality (for example, delay scheduling). The TaskSetManager works at the Spark level to decide how to manage your tasks, that is, the task threads; this layer an
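A small example makes the stage boundary concrete (assuming an existing SparkContext sc; the input path is an assumption): narrow transformations such as flatMap and map are pipelined inside one stage, while reduceByKey introduces a shuffle and therefore a new stage, and each stage is submitted as a set of tasks.

    val words = sc.textFile("hdfs://master:9000/data/wordcount")   // stage 1 starts here
      .flatMap(_.split(" "))
      .map((_, 1))                         // still stage 1: pipelined per partition, no shuffle
    val counts = words.reduceByKey(_ + _)  // wide dependency: shuffle boundary, stage 2
    counts.count()                         // the action submits both stages; each stage's tasks form a TaskSet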
-9]+)\s*]""".r
// Regular expression for connecting to Spark deploy clusters
val SPARK_REGEX = """spark://(.*)""".r
// Regular expression for connection to Mesos cluster by mesos:// or zk:// url
val MESOS_REGEX = """(mesos|zk)://.*""".r
// Regular expression for connec