Discover Spark sample programs in Scala, including articles, news, trends, analysis, and practical advice about Spark sample programs in Scala on alibabacloud.com
must be set. The installation path determines which nodes Spark runs on. The jar name lets Spark automatically ship the jar file to the slave nodes. This program relies on the Spark API, so we need an sbt configuration file that describes the program's dependency on Spark. The fol
change is to add a function that sorts the word-count results by frequency:
// Added by Dumbbell Yang at 2016-07-24
wordCounts.sortBy(x => x._2, false, wordCounts.partitions.size)
Compare this with sorting in Java, where you have to swap keys and values, sort, and then swap them back, which is tedious. Scala is indeed much more convenient.
After the above changes, Spark Chinese word
, wordCounts.partitions.size)
Compare this with sorting in Java, where you have to swap keys and values, sort, and then swap them back, which is tedious. Scala is really much handier. After the above changes, the Spark Chinese word segmentation statistics can be called from the main method, such as the original call in the companion object: /** * Use Scala to dev
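The frequency sort can be tried without a cluster. The following is a minimal sketch on a plain Scala collection, not on an RDD; the object name `SortByFrequency`, the helper `sortedCounts`, and the sample words are made up for illustration. On an RDD of (word, count) pairs, `sortBy(x => x._2, false, ...)` plays the role that the local `sortBy` plays here.

```scala
object SortByFrequency {
  // Count each word, then sort by descending count.
  // Ties are broken alphabetically so the result is deterministic.
  def sortedCounts(words: Seq[String]): Seq[(String, Int)] =
    words
      .groupBy(identity)                       // word -> all occurrences of it
      .map { case (w, ws) => (w, ws.size) }    // word -> count
      .toSeq
      .sortBy { case (w, n) => (-n, w) }       // negative count gives descending order

  def main(args: Array[String]): Unit = {
    val words = Seq("spark", "scala", "spark", "java", "scala", "spark")
    sortedCounts(words).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

No key/value swap is needed: the sort key is simply derived from the pair inside the function passed to `sortBy`.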
from its pom.xml, it is based on scala-2.11.5. There is only one code file, Xmlhelloworld.scala. As long as the dependencies listed in pom.xml download successfully, you can right-click Xmlhelloworld.scala and choose Run As -> Scala Application.
At this point, the Eclipse + Scala + Maven environment is set up. Next, configure the spark
val sc = new SparkContext(conf)
// The statement above is equivalent to val sc = new SparkContext("local", "Testrdd")
val data = sc.textFile("e://Hello.txt") // Read a local file
data.flatMap(_.split(" ")) // The underscore is a placeholder; flatMap works on each line and splits the data that was read into words
  .map((_, 1)) // Convert each item into a key-value pair: the item is the key and the value is 1
  .reduceByKey(_ + _) // Combine items with the same key into one by summing their values
. Col
Java switch-case (matches values only)
Scala matches not only values but also types, collections (Map, List, metadata matching), objects, and classes.
Scala uses pattern matching (match case) a lot.
Scala's pattern matching differs from the Java switch case as follows:
1. It can match types as well as values.
2. It can match collections and arrays: arrays with the same elements, arrays of the same length, and arrays beginning with a given element.
Automatic variable assignment for arrays of
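The kinds of patterns listed above can be sketched in a few lines. This is an illustrative example only; the object name `PatternMatchDemo` and the helpers `describe` and `describeArray` are hypothetical and not taken from the original article.

```scala
object PatternMatchDemo {
  // Value, type, and List patterns in a single match expression
  def describe(x: Any): String = x match {
    case 1         => "the value 1"                       // match on a value
    case s: String => s"a String of length ${s.length}"   // match on a type
    case n: Int    => s"some other Int: $n"
    case head :: _ => s"a List starting with $head"       // match on a collection
    case _         => "something else"
  }

  // Array patterns: exact contents, fixed length with variable binding, and prefix
  def describeArray(arr: Array[Int]): String = arr match {
    case Array(0)     => "exactly Array(0)"
    case Array(a, b)  => s"two elements: $a and $b"       // a and b are bound automatically
    case Array(0, _*) => "starts with 0"
    case _            => "some other array"
  }

  def main(args: Array[String]): Unit = {
    println(describe("abc"))
    println(describeArray(Array(0, 2, 3)))
  }
}
```

Cases are tried from top to bottom, so the more specific patterns are placed before the general ones.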
What is Scala? Scala is designed to be a scalable language. Officially, it is described as a hybrid of object-oriented and functional programming. Scala interoperates seamlessly with Java programs, because Scala files are compiled to .class files and run on the JVM. Spark is written in Scala. Scala installation? Here's the process of
[Introduction to Apache Spark Big Data Analysis (I)](http://www.csdn.net/article/2015-11-25/2826324)
Spark Note 5: SparkContext, SparkConf
Spark reads HBase
Examples of Scala's powerful collection operations
Some RDD operations and transformations in Spark
# Create the textFile RDD
val textFile = sc.textFile("readme.md")
Te
file system; by default, files are read from HDFS
Classification and function of Spark operators
Value-type transformation operators
Input partition to output partition: one-to-one
map
flatMap
mapPartitions
glom
Input partition to output partition: many-to-one
union
cartesian
Input partition to output partition: many-to-many
groupBy
Output partitions are a subset of the input partitions
filter
distinct
subtract
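To make the partition relationships in this classification more concrete, here is a toy model in plain Scala. It is only a sketch under stated assumptions: an RDD is represented as a `Vector` of partitions, and `PartitionModel`, `Parts`, and the three functions are hypothetical names, not Spark's API. Real Spark assigns groups to partitions with a partitioner; the hash-modulo rule below merely imitates that idea.

```scala
object PartitionModel {
  // Toy stand-in for an RDD: an ordered list of partitions (illustration only)
  type Parts[A] = Vector[Vector[A]]

  // One-to-one: output partition i depends only on input partition i
  def map[A, B](ps: Parts[A])(f: A => B): Parts[B] =
    ps.map(_.map(f))

  // Subset: output partition i keeps only some elements of input partition i
  def filter[A](ps: Parts[A])(keep: A => Boolean): Parts[A] =
    ps.map(_.filter(keep))

  // Many-to-many: any output partition may receive elements from any input partition
  def groupBy[A, K](ps: Parts[A], numOut: Int)(key: A => K): Parts[(K, Vector[A])] = {
    val grouped = ps.flatten.groupBy(key).toVector
    Vector.tabulate(numOut) { i =>
      // Place each group by the hash of its key, modulo the number of output partitions
      grouped.filter { case (k, _) => Math.floorMod(k.hashCode, numOut) == i }
    }
  }

  def main(args: Array[String]): Unit = {
    val ps: Parts[Int] = Vector(Vector(1, 2, 3), Vector(4, 5, 6))
    println(map(ps)(_ * 2))
    println(filter(ps)(_ % 2 == 0))
    println(groupBy(ps, 2)(_ % 2))
  }
}
```

In this model `map` and `filter` never move data between partitions, while `groupBy` has to look at every input partition to build each output partition, which is why it is classed separately.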
WordCount written by Java").setMaster("local")); /** * Step 2: Create a SparkContext object. * SparkContext is the sole entry point for all the functionality of a Spark program; whether you use Scala, Java, Python, R, etc., you must have a SparkContext (the concrete class name differs by language; in Java it is JavaSparkContext). * Core role of SparkContext: Initialize the Spark