Spark Machine Learning
1 Online Learning
In online learning, the model updates itself incrementally as each new message arrives, instead of being retrained from scratch on the whole dataset as in offline training.
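The contrast with offline training can be made concrete with a small sketch. This is plain Scala rather than Spark code, and the names OnlineLinearModel, update, and train are hypothetical. The model itself (linear regression adjusted by one stochastic gradient step on squared error per message) is an assumed choice for illustration only.

```scala
object OnlineLearningSketch {

  /** Linear model y ≈ w · x whose weights change one message at a time. */
  final case class OnlineLinearModel(weights: Vector[Double], stepSize: Double) {

    def predict(x: Vector[Double]): Double =
      weights.zip(x).map { case (w, xi) => w * xi }.sum

    /** One gradient step on squared error, using only the current message. */
    def update(x: Vector[Double], y: Double): OnlineLinearModel = {
      val error = predict(x) - y
      val newWeights = weights.zip(x).map { case (w, xi) => w - stepSize * error * xi }
      copy(weights = newWeights)
    }
  }

  /** Feed messages to the model in arrival order; no message is revisited. */
  def train(messages: Seq[(Vector[Double], Double)]): OnlineLinearModel = {
    val initial = OnlineLinearModel(Vector(0.0), stepSize = 0.05)
    messages.foldLeft(initial) { case (model, (x, y)) => model.update(x, y) }
  }

  def main(args: Array[String]): Unit = {
    // Simulated message stream drawn from y = 2 * x.
    val messages = Seq.fill(200)(Seq(1.0, 2.0, 3.0)).flatten.map(x => (Vector(x), 2.0 * x))
    println(train(messages).weights)
  }
}
```

Each call to update looks only at the current message, so the cost per message does not grow with the amount of data already seen.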
2 Spark Streaming
- Discretized stream (DStream)
Input sources: Akka actors, message queues, Flume, Kafka, ...
Programming guide: http://spark.apache.org/docs/latest/streaming-programming-guide.html
Lineage: the collection of transformation operators and action operators applied to an RDD
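The idea behind lineage, where transformations are only recorded and an action triggers the actual work, can be illustrated by analogy with a lazy view over an ordinary Scala collection. The sketch below does not use Spark's API; LineageSketch and run are hypothetical names, and the view merely stands in for an RDD.

```scala
object LineageSketch {

  /** Returns (elements evaluated before the action, result, elements evaluated after). */
  def run(): (Int, Int, Int) = {
    var evaluated = 0 // counts how often the map function has actually run
    val source = List(1, 2, 3, 4)

    // "Transformations": the view records map and filter but runs nothing yet.
    val pipeline = source.view
      .map { x => evaluated += 1; x * 10 }
      .filter(_ > 10)
    val before = evaluated

    // "Action": summing forces the recorded chain to execute.
    val total = pipeline.sum
    (before, total, evaluated)
  }

  def main(args: Array[String]): Unit = {
    val (before, total, after) = run()
    println(s"evaluated before action: $before, total: $total, evaluated after: $after")
  }
}
```

Because the chain of operators is kept rather than the intermediate results, it can be replayed from the source whenever needed, which is the property Spark relies on for recomputing lost RDD partitions.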
3 MLlib + Streaming Application
3.0 build.sbt
The project depends on Spark MLlib and Spark Streaming:

    name := "scala-spark-streaming-app"

    version := "1.0"

    scalaVersion := "2.11.7"

    libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.5.1"

    libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.5.1"
To use a mirror repository hosted in China, list it in
~/.sbt/repositories
    [repositories]
      local
      osc: http://maven.oschina.net/content/groups/public/
      typesafe: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
      sonatype-oss-releases
      maven-central
      sonatype-oss-snapshots
3.1 Producing Messages
    import java.io.PrintWriter
    import java.net.ServerSocket
    import scala.util.Random

    object StreamingProducer {

      def main(args: Array[String]): Unit = {
        val random = new Random()

        // Upper bound (exclusive) on the number of events per second
        val MaxEvents = 6

        // Read the list of possible names (first line of names.csv, comma-separated)
        val namesResource = this.getClass.getResourceAsStream("/names.csv")
        val names = scala.io.Source.fromInputStream(namesResource)
          .getLines()
          .toList
          .head
          .split(",")
          .toSeq

        // Generate a sequence of possible products
        val products = Seq(
          "iPhone Cover" -> 9.99,
          "Headphones" -> 5.49,
          "Samsung Galaxy Cover" -> 8.95,
          "iPad Cover" -> 7.49
        )

        /** Generate a number of random product events */
        def generateProductEvents(n: Int) = {
          (1 to n).map { i =>
            val (product, price) = products(random.nextInt(products.size))
            val user = random.shuffle(names).head
            (user, product, price)
          }
        }

        // Create a network producer
        val listener = new ServerSocket(9999)
        println("Listening on port: 9999")

        while (true) {
          val socket = listener.accept()
          // Serve each client on its own thread
          new Thread() {
            override def run = {
              println("Got client connected from: " + socket.getInetAddress)
              val out = new PrintWriter(socket.getOutputStream(), true)

              while (true) {
                Thread.sleep(1000)
                val num = random.nextInt(MaxEvents)
                val productEvents = generateProductEvents(num)
                // Write each event as one comma-separated line
                productEvents.foreach { event =>
                  out.write(event.productIterator.mkString(","))
                  out.write("\n")
                }
                out.flush()
                println(s"Created $num events...")
              }
              socket.close()
            }
          }.start()
        }
      }
    }
     [1] MonitoringStreamingModel
     [2] SimpleStreamingApp
     [3] SimpleStreamingModel
     [4] StreamingAnalyticsApp
     [5] StreamingModelProducer
     [6] StreamingProducer
     [7] StreamingStateApp

    Enter number: 6
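A consumer of these messages has to turn each received line back into its fields. The sketch below assumes every event arrives as one comma-separated line of user, product, and price; the names EventParsingSketch, PurchaseEvent, and parse are hypothetical and do not come from the original application.

```scala
import scala.util.Try

object EventParsingSketch {

  /** One purchase as sent by the producer (assumed field order). */
  final case class PurchaseEvent(user: String, product: String, price: Double)

  /** Parse one "user,product,price" line; returns None for malformed input. */
  def parse(line: String): Option[PurchaseEvent] =
    line.split(",", -1) match {
      case Array(user, product, price) =>
        Try(price.trim.toDouble).toOption
          .map(p => PurchaseEvent(user.trim, product.trim, p))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(parse("Miguel,Headphones,5.49"))
    println(parse("not an event"))
  }
}
```

Returning an Option keeps a single malformed line from stopping a long-running consumer; callers can simply drop the None results.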
3.2 Printing Messages
Spark Machine Learning · Real-Time Machine learning