Scalaz - scalaz-stream: An In-Depth Look at Source


The main design goal of the scalaz-stream library is functional I/O: users combine small, purely functional I/O building blocks into a complete I/O program. The second goal is resource safety: a program written with scalaz-stream automatically releases every resource it has acquired, such as file handles and memory, once an I/O task finishes. In the previous discussion we covered the core type Process in general terms, but spent most of the time on the Process1 transducer type. In this discussion we look at the design and purpose of the whole scalaz-stream chain from the angle of practical use. We mentioned that a Process is in one of three states, Emit, Await or Halt, and that Append is the important type that links stream nodes together. Let's first look at how these types are defined in scalaz-stream:

    case class Emit[+O](seq: Seq[O]) extends HaltEmitOrAwait[Nothing, O] with EmitOrAwait[Nothing, O]

    case class Await[+F[_], A, +O](
        req: F[A]
        , rcv: (EarlyCause \/ A) => Trampoline[Process[F, O]] @uncheckedVariance
        , preempt: A => Trampoline[Process[F, Nothing]] @uncheckedVariance = (_: A) => Trampoline.delay(halt: Process[F, Nothing])
        ) extends HaltEmitOrAwait[F, O] with EmitOrAwait[F, O]

    case class Halt(cause: Cause) extends HaltEmitOrAwait[Nothing, Nothing] with HaltOrStep[Nothing, Nothing]

    case class Append[+F[_], +O](
        head: HaltEmitOrAwait[F, O]
        , stack: Vector[Cause => Trampoline[Process[F, O]]] @uncheckedVariance
        ) extends Process[F, O]

We can see that Process[F, O] is wrapped in Trampoline, so processes are constructed through trampolined functions, which avoids the StackOverflowError that long chains of stream operations would otherwise cause. Setting aside complications such as Trampoline, the types above can be simplified into the following conceptual structure:

    sealed trait Process[+F[_], +O]
    sealed trait Cause
    case object End extends Cause   // the library also has Kill and Error(rsn)
    case class Emit[O](out: O) extends Process[Nothing, O]
    case class Halt(cause: Cause) extends Process[Nothing, Nothing]
    case class Await[+F[_], E, +O](
      req: F[E],
      rcv: E => Process[F, O],
      preempt: E => Process[F, Nothing] = (_: E) => Halt(End)) extends Process[F, O]
    case class Append[+F[_], +O](
      head: Process[F, O],
      stack: Vector[Cause => Process[F, O]]) extends Process[F, O]


Let's go through them:

Process[F[_], O]: as the type suggests, a scalaz-stream process can run F operations, which may have side effects, while it emits elements of type O.

Emit[O](out: O): emits an element of type O; no other operation can be performed.

Halt(cause: Cause): stops emitting. The cause is the reason for stopping: End (emission completed normally), Error (terminated by an error) or Kill (forcibly terminated).

Await[+F[_], E, +O]: this is the core Process state for effectful streams. It first runs the F operation req, passes the resulting E to the function rcv to move to the next Process state, and runs the cleanup function preempt when it is done. This is essentially flatMap reified as a data structure. Note that E is an internal type: once the result of the F operation has been passed to rcv, it is no longer referenced. We can release resources in the preempt function. If we need to build an effectful stream, it looks like we have to use the Await type.

Append[+F[_], +O]: Append is the type that links Process[F, O] nodes. It not only carries on the emission of O elements; more importantly, it passes the F operation of the previous node on to the next node, so the finalizer of the previous node can be run at the following node. Append combines several nodes into one large node: head is the first node, and stack is a sequence of functions, each of which takes the completion cause of the previous node and produces the state of the next node.
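The four states described above can be tried out with a small self-contained model. This is only a sketch under loud assumptions, not the library's implementation: F is fixed to a plain thunk (() => A), Trampoline and preempt are left out, and the names ProcessModel, append, onComplete and runLog are hypothetical helpers written for this illustration. It shows how rcv consumes the result of req, and how each function on the Append stack receives the cause of the previous node, so that a finalizer still runs when the stream is killed.

```scala
import scala.util.control.NonFatal

object ProcessModel {
  sealed trait Cause
  case object End extends Cause                          // completed normally
  case object Kill extends Cause                         // forcibly terminated
  final case class Error(rsn: Throwable) extends Cause   // terminated by an error

  sealed trait Process[+O]
  final case class Emit[+O](seq: Seq[O]) extends Process[O]
  final case class Halt(cause: Cause) extends Process[Nothing]
  // Assumption: the effect F[A] is modelled as a thunk () => A.
  final case class Await[A, +O](req: () => A, rcv: A => Process[O]) extends Process[O] {
    // Run the effect and pass its result to rcv; a failure becomes Halt(Error(_)).
    def step: Process[O] =
      try rcv(req()) catch { case NonFatal(t) => Halt(Error(t)) }
  }
  final case class Append[+O](head: Process[O], stack: Vector[Cause => Process[O]]) extends Process[O]

  // Continue with `next` only after a normal End; otherwise propagate the cause.
  def append[O](p: Process[O], next: => Process[O]): Process[O] = {
    val k: Cause => Process[O] = {
      case End   => next
      case cause => Halt(cause)
    }
    Append(p, Vector(k))
  }

  // Run `cleanup` whatever the cause was, then propagate that cause.
  def onComplete[O](p: Process[O])(cleanup: () => Unit): Process[O] = {
    val k: Cause => Process[O] = cause => { cleanup(); Halt(cause) }
    Append(p, Vector(k))
  }

  // Interpreter: collect all emitted values together with the final cause.
  def runLog[O](p: Process[O]): (Vector[O], Cause) = {
    @annotation.tailrec
    def go(cur: Process[O], stack: Vector[Cause => Process[O]], acc: Vector[O]): (Vector[O], Cause) =
      cur match {
        case Emit(os) =>
          if (stack.isEmpty) (acc ++ os, End) else go(stack.head(End), stack.tail, acc ++ os)
        case Halt(cause) =>
          if (stack.isEmpty) (acc, cause) else go(stack.head(cause), stack.tail, acc)
        case a @ Await(_, _) => go(a.step, stack, acc)
        case Append(h, s)    => go(h, s ++ stack, acc)
      }
    go(p, Vector.empty, Vector.empty)
  }

  // An effectful first element followed by two pure ones.
  var reads = 0
  val source: Process[Int] =
    append[Int](Await[Int, Int](() => { reads += 1; 1 }, a => Emit(Seq(a))), Emit(Seq(2, 3)))
  val (out, cause) = runLog(source)

  // A killed stream: nothing is emitted, yet the finalizer still runs.
  var closed = false
  val killed: Process[Int] =
    onComplete[Int](append[Int](Halt(Kill), Emit(Seq(9))))(() => { closed = true })
  val (killedOut, killedCause) = runLog(killed)
}
```

In this model, source yields the values 1, 2, 3 and ends with End after running its effect once, while killed emits nothing, ends with Kill, and still sets closed to true.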

A complete scalaz-stream chain consists of three kinds of nodes: Source, Transducer and Sink. Nodes are linked through Await or Append. Let's look at the type shapes of Source, Transducer and Sink:

Upstream: Source >>> Process0[O] >>> Process[F[_], O]

Midstream: Transducer >>> Process1[I, O]

Downstream: Sink >>> Process[F[_], O => F[Unit]], Channel >>> Process[F[_], I => F[O]]

We can use a file-processing workflow to describe the role of each part of a complete scalaz-stream chain:

Process[F[_], O]: values of type O are read from a file by running F[O]; F carries the side effect

>>> Process1[I, O]: I is the raw data read from the file, and O is the output data after filtering and processing

>>> O => F[Unit]: a function that returns no result; it performs an F operation on an input of type O, such as writing O values to a file

>>> I => F[O]: a function that returns a result; it performs an F operation on an input I and returns an O, such as writing a record to a database and returning the write status
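The function shapes listed above can be sketched in plain Scala without the library. This is an analogy only: F is assumed to be a deferred computation (() => A) where scalaz-stream would typically use Task, plain lists stand in for streams, and every name in the block (ChainSketch, source, transducer, sink, channel) is hypothetical.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.Try

object ChainSketch {
  // Assumption: F is modelled as a deferred computation; the library would use Task.
  type F[A] = () => A

  // Source: every element is produced by running an F[String] (e.g. reading a line).
  val source: List[F[String]] = List(() => "1", () => "oops", () => "3")

  // Transducer (the role of Process1[I, O]): pure filtering and conversion.
  def transducer(in: List[String]): List[Int] =
    in.flatMap(s => Try(s.toInt).toOption)

  // Sink: O => F[Unit], an effect that returns no result (e.g. writing to a file).
  val written = ListBuffer.empty[Int]
  val sink: Int => F[Unit] = o => () => { written += o; () }

  // Channel: I => F[O], an effect that returns a result (e.g. a write status).
  val channel: Int => F[String] = i => () => s"stored $i"

  // Run the chain: perform the source effects, transform, then feed sink and channel.
  val processed: List[Int] = transducer(source.map(read => read()))
  processed.foreach(o => sink(o)())
  val statuses: List[String] = processed.map(i => channel(i)())
}
```

The sketch only shows which part performs effects and which part is pure; it makes no attempt at the resource-safety guarantees that the real Process types provide.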

The process described above is simple: read data from a file, process it, and write it to another file. It looks simple, but our goal is resource safety: however the stream terminates, whether reading and writing complete normally, the stream is stopped midway, or an error ends it, scalaz-stream will close the open files, stop using threads, and release memory and other resources. That does not look so easy. Let's first analyze what the Source, Transducer and Sink types do:

    import Process._
    emit(0)                        //> res0: scalaz.stream.Process0[Int] = Emit(Vector(0))
    emitAll(Seq(1,2,3))            //> res1: scalaz.stream.Process0[Int] = Emit(List(1, 2, 3))
    Process(1,2,3)                 //> res2: scalaz.stream.Process0[Int] = Emit(WrappedArray(1, 2, 3))
    Process()                      //> res3: scalaz.stream.Process0[Nothing] = Emit(List())


These are all ways to build a Process0, which is also a data source. But they only represent a series of values in memory, which is of little use to us: we want to obtain values from external devices, for example by reading a file or a database, and that requires an effect F. Process0[O] is Process[Nothing, O], whereas what we need is Process[F, O]. So how do we write that?

    val p: Process[Task,Int] = emitAll(Seq(1,2,3))
                                   //> p  : scalaz.stream.Process[scalaz.concurrent.Task,Int] = Emit(List(1, 2, 3))
    emitAll(Seq(1,2,3)).toSource   //> res4: scalaz.stream.Process[scalaz.concurrent.Task,Int] = Emit(List(1, 2, 3))


The type matches, but there is no trace of a Task in the expression Emit(...), so this does not meet our need for a Source. It seems the only way is this:

    await(Task.delay{3})(emit)     //> res5: scalaz.stream.Process[scalaz.concurrent.Task,Int] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)
    eval(Task.delay{3})            //> res6: scalaz.stream.Process[scalaz.concurrent.Task,Int] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)


Now the type matches and the expression also contains a Task operation. We can use Task.delay to perform side effects such as reading a file, because await will run req: F[E], which here is a Task[Int]. This is the Source we need. Can we use such a Source to emit a series of values?

    def emitSeq[A](xa: Seq[A]): Process[Task,A] =
      xa match {
        case h +: t => await(Task.delay {h})(emit) ++ emitSeq(t)   // +: matches any Seq, not only List
        case _ => halt
      }                            //> emitSeq: [A](xa: Seq[A])scalaz.stream.Process[scalaz.concurrent.Task,A]
    val es1 = emitSeq(Seq(1,2,3))  //> es1  : scalaz.stream.Process[scalaz.concurrent.Task,Int] = Append(Await(scalaz.concurrent.Task@...,<function1>,<function1>),Vector(<function1>))
    val es2 = emitSeq(Seq("a","b","c"))
                                   //> es2  : scalaz.stream.Process[scalaz.concurrent.Task,String] = Append(Await(scalaz.concurrent.Task@...,<function1>,<function1>),Vector(<function1>))
    es1.runLog.run                 //> res7: Vector[Int] = Vector(1, 2, 3)
    es2.runLog.run                 //> res8: Vector[String] = Vector(a, b, c)


In the example above we used await to run a Task and got back a Process[Task, ?], a Source that may carry side effects. In practice we often need a Task to fetch data from an external source. Such sources usually fetch data asynchronously and deliver it through a callback. We can use Task.async to turn these callbacks into Tasks, and then use Process.eval or await to lift the Task into a Process[Task, ?]. Here is a simple example: if we read data asynchronously with scala.concurrent.Future, we can turn the Future into a Process:

    import scalaz.syntax.either._   // provides .left and .right

    def getData(dbName: String): Task[String] = Task.async { cb =>
      import scala.concurrent._
      import scala.concurrent.ExecutionContext.Implicits.global
      import scala.util.{Success, Failure}
      Future { s"got data from $dbName" }.onComplete {
        case Success(a) => cb(a.right)
        case Failure(e) => cb(e.left)
      }
    }                                        //> getData: (dbName: String)scalaz.concurrent.Task[String]
    val procGetData = eval(getData("MySQL")) //> procGetData  : scalaz.stream.Process[scalaz.concurrent.Task,String] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)
    procGetData.runLog.run                   //> res9: Vector[String] = Vector(got data from MySQL)


We can also turn a Java callback into a Task:

    import com.ning.http.client._
    val asyncHttpClient = new AsyncHttpClient()
                                      //> asyncHttpClient  : com.ning.http.client.AsyncHttpClient = com.ning.http.client.AsyncHttpClient@...
    def get(s: String): Task[Response] = Task.async[Response] { callback =>
      asyncHttpClient.prepareGet(s).execute(
        new AsyncCompletionHandler[Unit] {
          def onCompleted(r: Response): Unit = callback(r.right)
          // onThrowable is the handler's error hook; a method named onError would never be called
          override def onThrowable(e: Throwable): Unit = callback(e.left)
        }
      )
    }                                 //> get: (s: String)scalaz.concurrent.Task[com.ning.http.client.Response]
    val prcGet = Process.eval(get("http://sina.com"))
                                      //> prcGet  : scalaz.stream.Process[scalaz.concurrent.Task,com.ning.http.client.Response] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)
    prcGet.run.run                    //> 12:25:27.852 [New I/O worker #1] DEBUG c.n.h.c.p.n.r.NettyConnectListener - Request using non cached Channel '[id: 0x23fa1307, /192.168.200.3:50569 => sina.com/66.102.251.33:80]':

If a function already has the callback shape that scalaz Task expects, (Throwable \/ A) => Unit, it can be passed to Task.async directly:

    def read(callback: (Throwable \/ Array[Byte]) => Unit): Unit = ???
                                      //> read: (callback: scalaz.\/[Throwable,Array[Byte]] => Unit)Unit
    val t: Task[Array[Byte]] = Task.async(read)
                                      //> t  : scalaz.concurrent.Task[Array[Byte]] = scalaz.concurrent.Task@...
    val t2: Task[Array[Byte]] = for {
      bytes <- t
      moarBytes <- t
    } yield (bytes ++ moarBytes)      //> t2  : scalaz.concurrent.Task[Array[Byte]] = scalaz.concurrent.Task@...
    val prct2 = Process.eval(t2)      //> prct2  : scalaz.stream.Process[scalaz.concurrent.Task,Array[Byte]] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)

    def asyncRead(succ: Array[Byte] => Unit, fail: Throwable => Unit): Unit = ???
                                      //> asyncRead: (succ: Array[Byte] => Unit, fail: Throwable => Unit)Unit
    val t3: Task[Array[Byte]] = Task.async { callback =>
      asyncRead(b => callback(b.right), err => callback(err.left))
    }                                 //> t3  : scalaz.concurrent.Task[Array[Byte]] = scalaz.concurrent.Task@...
    val t4: Task[Array[Byte]] = t3.flatMap(b => Task(b))
                                      //> t4  : scalaz.concurrent.Task[Array[Byte]] = scalaz.concurrent.Task@...
    val prct4 = Process.eval(t4)      //> prct4  : scalaz.stream.Process[scalaz.concurrent.Task,Array[Byte]] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)

We can also use a timer to produce a Process[Task, A]:

    import scala.concurrent.duration._
    implicit val scheduler = java.util.concurrent.Executors.newScheduledThreadPool(3)
                               //> scheduler  : java.util.concurrent.ScheduledExecutorService = java.util.concurrent.ScheduledThreadPoolExecutor@...[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    val fizz = time.awakeEvery(3.seconds).map(_ => "fizz")
                               //> fizz  : scalaz.stream.Process[scalaz.concurrent.Task,String] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)
    val fizz3 = fizz.take(3)   //> fizz3  : scalaz.stream.Process[scalaz.concurrent.Task,String] = Append(Halt(End),Vector(<function1>))
    fizz3.runLog.run           //> res9: Vector[String] = Vector(fizz, fizz, fizz)

Queue, Topic and Signal can all be used to build data sources with side effects. Let's first look at how a Queue produces a data source:

    type BigStringResult = String
    val qJobResult = async.unboundedQueue[BigStringResult]
                         //> qJobResult  : scalaz.stream.async.mutable.Queue[demo.ws.blogStream.BigStringResult] = ...
    def longGet(jobnum: Int): BigStringResult = {
      Thread.sleep(...)    // simulate a slow job
      s"Some large data sets from job#${jobnum}"
    }                    //> longGet: (jobnum: Int)demo.ws.blogStream.BigStringResult

    // multi-tasking: enqueue three results from forked tasks
    val start = System.currentTimeMillis()    //> start  : Long = 1468556250797
    Task.fork(qJobResult.enqueueOne(longGet(1))).unsafePerformAsync{case _ => ()}
    Task.fork(qJobResult.enqueueOne(longGet(2))).unsafePerformAsync{case _ => ()}
    Task.fork(qJobResult.enqueueOne(longGet(3))).unsafePerformAsync{case _ => ()}
    val timemill = System.currentTimeMillis() - start
                                              //> timemill  : Long = ...
    Thread.sleep(...)      // wait for the forked tasks to finish
    qJobResult.close.run
    val bigData = {
      // multi-tasking
      val j1 = qJobResult.dequeue
      val j2 = qJobResult.dequeue
      val j3 = qJobResult.dequeue
      for {
        r1 <- j1
        r2 <- j2
        r3 <- j3
      } yield r1 + "," + r2 + "," + r3
    }                    //> bigData  : scalaz.stream.Process[[x]scalaz.concurrent.Task[x],String] = Await(scalaz.concurrent.Task@...,<function1>,<function1>)
    bigData.runLog.run   //> res9: Vector[String] = Vector(Some large data sets from job#2,Some large data sets from job#3,Some large data sets from job#1)


Next, a Topic demonstration:

    import scala.concurrent._
    import scala.concurrent.duration._
    import scalaz.stream.async.mutable._
    import scala.concurrent.ExecutionContext.Implicits.global
    val sharedData: Topic[BigStringResult] = async.topic()
         //> sharedData  : scalaz.stream.async.mutable.Topic[demo.ws.blogStream.BigStringResult] = ...
    val subscriber = sharedData.subscribe.runLog
         //> subscriber  : scalaz.concurrent.Task[Vector[demo.ws.blogStream.BigStringResult]] = scalaz.concurrent.Task@...
    val otherThread = Future {
      subscriber.run // Added this here - now subscriber is really attached to the topic
    }    //> otherThread  : scala.concurrent.Future[Vector[demo.ws.blogStream.BigStringResult]] = List()
    // Need to give subscriber some time to start up.
    // I doubt you'd do this in actual code.
    // Topics seem more useful for hooking up things like
    // sensors that produce a continual stream of data,
    // and where individual values can be dropped on the floor.
    Thread.sleep(...)
    sharedData.publishOne(longGet(1)).run // don't just call publishOne; need to run the resulting task
    sharedData.close.run                  // don't just call close; need to run the resulting task
    // Need to wait for the output
    val result = Await.result(otherThread, Duration.Inf)
         //> result  : Vector[demo.ws.blogStream.BigStringResult] = Vector(Some large data sets from job#1)

The above explained and demonstrated several ways to build a Source that may have side effects. The other scalaz-stream node types are covered in depth in later discussions.






