(Collectors.groupingBy(Article::getAuthor)); }

Very good! Using the groupingBy collector together with the getAuthor method, we get cleaner, more readable code. Now let's find all the distinct tags in the collection, starting with the loop-based version:

public Set<String> getDistinctTags() {
    Set<String> result = new HashSet<>();
    for (Article article : articles) {
        result.addAll(article.getTags());
    }
    return result;
}

OK, now let's look at how to solve the same problem with stream operations. A stream behaves like an iterator: once it has been consumed, a new stream must be generated to access the same elements again.
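For reference, here is a self-contained sketch of both operations, the loop version and its stream equivalent. The Article class and the sample data are stand-ins, since the original collection is not shown in the text:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class Article {
    private final String author;
    private final List<String> tags;

    Article(String author, List<String> tags) {
        this.author = author;
        this.tags = tags;
    }

    String getAuthor() { return author; }
    List<String> getTags() { return tags; }
}

public class ArticleStreams {
    public static void main(String[] args) {
        // hypothetical sample data; the real article collection is not shown in the text
        List<Article> articles = List.of(
            new Article("alice", List.of("java", "streams")),
            new Article("bob", List.of("java", "spark")),
            new Article("alice", List.of("scala")));

        // group articles by author with a single collector
        Map<String, List<Article>> byAuthor = articles.stream()
            .collect(Collectors.groupingBy(Article::getAuthor));

        // stream version of getDistinctTags(): flatMap each tag list into one stream
        Set<String> distinctTags = articles.stream()
            .flatMap(article -> article.getTags().stream())
            .collect(Collectors.toSet());

        System.out.println(byAuthor.get("alice").size()); // 2
        System.out.println(distinctTags.size());          // 4
    }
}
```

The flatMap step replaces the explicit loop plus addAll: each article's tag list becomes a stream, and the streams are flattened before collecting into a Set.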
Java provides the following ways to generate a stream:
Collection.stream() creates a sequential stream from a collection;
Collection.parallelStream() creates a parallel stream from a collection;
Arrays.stream(Object[]);
Stream.of(Object[]), IntStream.range(int, int) or Stream.iterate(Object, UnaryOperator), Stream.empty(), Stream.generate(Supplier)
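A short sketch exercising each of these factory methods, plus the iterator-like single-use behavior noted above (the values are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamSources {
    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c");

        Stream<String> fromCollection = letters.stream();        // sequential stream from a collection
        long parallelCount = letters.parallelStream().count();   // parallel stream from a collection

        int arraySum = Arrays.stream(new int[]{1, 2, 3}).sum();  // stream from an array
        long ofCount = Stream.of("p", "q", "r").count();         // stream from explicit values

        int rangeSum = IntStream.range(0, 5).sum();              // 0+1+2+3+4
        List<Integer> iterated = Stream.iterate(1, n -> n * 2)   // infinite stream, so limit() is needed
            .limit(4)
            .collect(Collectors.toList());
        long emptyCount = Stream.empty().count();
        long generated = Stream.generate(Math::random).limit(3).count();

        // a stream works like an iterator: a second terminal operation fails
        fromCollection.count();
        try {
            fromCollection.count();
        } catch (IllegalStateException e) {
            System.out.println("stream has already been operated upon");
        }

        System.out.println(rangeSum);  // 10
        System.out.println(iterated);  // [1, 2, 4, 8]
    }
}
```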
Introduction

The key question is how to sort the words by their counts in the previous WordCount example, as follows:

scala> val retRDD = sc.textFile("hdfs://ns1/hello").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
scala> val retSortRDD = retRDD.map(pair => (pair._2, pair._1)).sortByKey(false).map(pair => (pair._2, pair._1))
scala> retSortRDD.collect().foreach(println)
...
(hello,3)
(me,1)
(you,1)
(he,1)
The following tests all ran in the Spark shell.
(println)

Description:
parallelize: creates an RDD from an external data set.
map: receives a function that is applied to each element of the RDD; the input type and the return type do not need to be the same.
mkString: joins elements with a delimiter.

2. Example: flatMap/first

val lines = sc.parallelize(List("Hello World", "Hi Hi hhe"))
val words = lines.flatMap(line => line.split(" "))
words.collect().foreach(println)
words.first(
on the myParallel-x thread;
Keeping only this log(), you can see that with subscribeOn the data flow still executes on the myParallel-x thread.
From the three log() outputs above, we can observe the following for an operator chain like this one:
publishOn affects the operators that come after it in the chain. For example, the first publishOn switches the Scheduler to elastic, so the filter operation is performed in the elastic thread pool, and the same
Use Stream data methods and calculation methods
Sort a collection using the Stream API
Save results to a collection using the collect method, and group/partition data using the Collectors class
Use the merge() and flatMap() methods of the Stream API
Exceptions and assertions
Use try-catch and throw statements
Use catch, multi-catch, and finally clauses
Use AutoCloseable resources with a try-with-resources statement
Crea
Rules for conflict resolution
9.4.1 Three rules to solve the problem
9.4.2 Select the interface that provides the most specific implementation of the default method
9.4.3 Conflicts: how to explicitly eliminate ambiguity
9.4.4 The diamond inheritance problem
9.5 Summary
Chapter 10: Replacing null with Optional
10.1 How to model missing values
10.1.1 Using defensive checks to reduce NullPointerException
10.1.2 The problems null brings
10.1.3 Alternativ
java.lang.Math

scala> textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
res4: Int = 14
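The Spark shell line above finds the longest line as measured in words. The same computation over an in-memory list can be sketched with plain Java streams (the sample lines are made up):

```java
import java.util.List;

public class MaxWordsPerLine {
    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "a b c d", "one");

        // map each line to its word count, then reduce with Math.max,
        // mirroring map(...).reduce((a, b) => Math.max(a, b))
        int max = lines.stream()
            .mapToInt(line -> line.split(" ").length)
            .reduce(0, Math::max);

        System.out.println(max); // 4
    }
}
```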
As we all know, a common data flow pattern in Hadoop is MapReduce. Spark makes it easy to implement MapReduce:

scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at
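The same map-then-reduce-by-key pattern can be sketched with plain Java streams, where groupingBy plus counting plays the role of reduceByKey (the sample lines are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCount {
    public static void main(String[] args) {
        List<String> lines = List.of("hello you", "hello me", "hello he");

        // flatMap: line -> words; groupingBy + counting: sum per word
        Map<String, Long> counts = lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        System.out.println(counts.get("hello")); // 3
    }
}
```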
List(List(1, List(2,3,4)), List(List(5,6,7), List(8,9))).flatten gives List(1, List(2,3,4), List(5,6,7), List(8,9)). If you try to continue the flattening operation, you get an error:

List(List(1, List(2,3,4)), List(List(5,6,7), List(8,9))).flatten.flatten
// error:

---> flatMap is equivalent to the combination of the two functions map and flatten:

val list = List(List(4,5,6))
// x here is each inner list: flatMap first maps each element, then flattens the result
list.flatMap(x => x.map(_*2))
// this clause is equivalent
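The map-plus-flatten equivalence can be checked with Java streams as well: flatMap maps each inner list to a stream and flattens one level in a single step (the values are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapVsMap {
    public static void main(String[] args) {
        List<List<Integer>> nested = List.of(List.of(1, 2), List.of(3, 4));

        // map alone would keep the nesting as a stream of lists;
        // flatMap maps each inner list and flattens one level
        List<Integer> doubled = nested.stream()
            .flatMap(inner -> inner.stream().map(x -> x * 2))
            .collect(Collectors.toList());

        System.out.println(doubled); // [2, 4, 6, 8]
    }
}
```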
val array = Array(...)  // array starting from 1; the element values are garbled in the source
val k = 3
val heap = new Heap
(1 to k).foreach(i => heap.insert(array(i)))
(k + 1 to array.length - 1).foreach { i =>
  while (heap.getSize > 10) heap.deleteMin()
  if (array(i) > heap.getMin) {
    heap.deleteMin()
    heap.insert(array(i))
  }
}
heap.print()

And again, how Heap itself is implemented. When using Scala you often face the choice between mutable and immutable variables, which can be a problem. There is also the for comprehension, which is essentially map
handling mechanism
Easy-to-use concurrency

Three. RxJava application scenarios
RxBinding throttling (prevents repeated clicks of a button)
Polling
Timed operations
RxPermissions
RxBus
RxJava with Retrofit to handle network requests
Replacing the listener/observer model
Thread management, providing thread pools and thread switching (Schedulers)
Resolving nested callbacks (flatMap)
Time delay and timer processing (interval)

Four. How to learn RxJava
The main c
HDFS), from a Scala collection, or from another RDD. * The data is divided by the RDD into a series of partitions, and the data assigned to each partition belongs to the processing scope of one task. * Note: a Windows path cannot use the backslash \ directly; change it to the Linux-style forward slash. */
JavaRDD<String> lines = sc.textFile("D:/hu.txt");
/*** Step 4: Apply transformation-level processing to the initial JavaRDD, such as map, filter, and other higher-order functions,
same words added to get the final result.
For the first step, we naturally think of using the flatMap operator to split a line of text into multiple words. Then, for the second step, we use the map operator to convert each word into a count key-value pair, that is, word -> (word, 1). For the last step, counting the occurrences of each word, we use the reduceByKey operator to add up the counts of the same word to get the final result.
operations
1.map
map[B](f: (A) ⇒ B): List[B]
Defines a transformation that applies f to each element of the list; the original list is unchanged, and a new list is returned.
Example 1: square transform

val nums = List(1, 2, 3)
val square = (x: Int) => x * x
val squareNums1 = nums.map(num => num * num)   // List(1, 4, 9)
val squareNums2 = nums.map(math.pow(_, 2))     // List(1.0, 4.0, 9.0): math.pow returns Double
val squareNums3 = nums.map(square)             // List(1, 4, 9)
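An analogous squaring example in Java streams; note that, just as math.pow produces Doubles in Scala, Math.pow returns double in Java, so the second pipeline yields a list of doubles:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SquareMap {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3);

        List<Integer> squares = nums.stream()
            .map(n -> n * n)
            .collect(Collectors.toList());          // [1, 4, 9]

        List<Double> squaresPow = nums.stream()
            .map(n -> Math.pow(n, 2))               // Math.pow returns double
            .collect(Collectors.toList());          // [1.0, 4.0, 9.0]

        System.out.println(squares);
        System.out.println(squaresPow);
    }
}
```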
Example 2: save a few columns in the tex
also returns a future. CompletableFuture is flexible enough to understand that the result of our function should now be the top-level future, in contrast to CompletableFuture
... Async variants are also available. In the following case, carefully observe the types and the difference between thenApply() (map) and thenCompose() (flatMap) when the applied calculateRelevance() method returns CompletableFuture:
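Here is a minimal sketch of that type difference; calculateRelevance below is a made-up async step standing in for the method mentioned in the text:

```java
import java.util.concurrent.CompletableFuture;

public class ComposeVsApply {
    // hypothetical async scoring step, standing in for calculateRelevance()
    static CompletableFuture<Double> calculateRelevance(String doc) {
        return CompletableFuture.supplyAsync(() -> doc.length() * 0.1);
    }

    public static void main(String[] args) {
        CompletableFuture<String> doc = CompletableFuture.completedFuture("hello");

        // thenApply (map): the function already returns a future,
        // so the result type is a nested future
        CompletableFuture<CompletableFuture<Double>> nested =
            doc.thenApply(ComposeVsApply::calculateRelevance);

        // thenCompose (flatMap): the nesting is flattened away
        CompletableFuture<Double> flat =
            doc.thenCompose(ComposeVsApply::calculateRelevance);

        System.out.println(flat.join()); // 0.5
    }
}
```

thenCompose is to CompletableFuture what flatMap is to Stream: it prevents the nested-future type from ever appearing.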
// time-consuming operations
.subscribeOn(Schedulers.io())
// observe on the main thread, where UI update operations can be done
.observeOn(AndroidSchedulers.mainThread())
// the observer
.subscribe(user -> {
    // get the object
    Toast.makeText(this, user.getUsername(), Toast.LENGTH_SHORT).show();
});
I used a lambda expression as shorthand. When I looked at t
Transformations
filter(func)
Returns a new dataset consisting of the elements of the source for which func returns true.
flatMap(func)
Similar to map, but each input element can be mapped to 0 or more output elements (therefore, func should return a Seq rather than a single element).
sample(withReplacement, frac, seed)
Randomly samples a fraction frac of the data, using the given random seed.
union(otherDa
suffix, regular expression
implicit class JsonHelper(private val sc: StringContext) extends AnyVal {
  def json(args: Any*): JSONObject = ...
}
val x: JSONObject = json"{ a: $a }"
Custom string interpolator
Control flow
if (check) happy else sad
if (check) happy   // equivalent to the following
if (check) happy else ()
>> If statement
while (x < ...) { println(x); x += 1 }
do { println(x); x += 1 } while (x < ...)
While statement
import scala.util.control.Breaks._
breakable {
  for (x <- xs) {
    if (Math.random < ...) break
  }
}
for (x <- xs if x % 2 == 0)   // equivalent to xs.filter(_ % 2 == 0)
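The guard in a for expression plays the same role as filter; an analogous version with Java streams (the list values are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class GuardAsFilter {
    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4, 5, 6);

        // keep only even elements, like the guard `if x % 2 == 0`
        List<Integer> evens = xs.stream()
            .filter(x -> x % 2 == 0)
            .collect(Collectors.toList());

        System.out.println(evens); // [2, 4, 6]
    }
}
```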
val scc = new StreamingContext(sparkConf, Duration(5000))
scc.checkpoint(".")  // checkpoint must be set because updateStateByKey is used
val topics = Set("kafka-spark-demo")  // the Kafka topic whose data we consume
val kafkaParam = Map("metadata.broker.list" -> "localhost:9091")  // Kafka broker list address
val stream: InputDStream[(String, String)] = createStream(scc, kafkaParam, topics)
stream.map(_._2)            // take out the value
  .flatMap(_.split(" "))    // split into words