Pipelines - a guided tour of the new IO API in .NET (Part 1)


https://zhuanlan.zhihu.com/p/39223648

Original: Pipelines - a guided tour of the new IO API in .NET, Part 1

Marc Gravell

About two years ago, I blogged about an experimental new IO API coming to .NET - at the time it was called "Channels". At the end of May 2018, it finally landed in the System.IO.Pipelines namespace. I'm very interested in this set of APIs, and a few weeks ago I was tasked with rewriting StackExchange.Redis on top of "pipelines", as part of our 2.0 update.

In this series, I hope to discuss:

    • What are "pipelines"?
    • How do we use them in code?
    • When might you want to use them?

To be more specific: after introducing "pipelines", I intend to walk through the related changes in StackExchange.Redis in some detail, discussing what problems they solve in each scenario. Briefly: in almost all cases, the answer can be summed up as:

Pipelines are well suited to the complex but pervasive pain points of IO code, allowing us to replace ugly kludges, workarounds, and compromises with a dedicated solution designed elegantly into the framework.

I'm sure the pain points I cover below will be very familiar to anyone who works at the data-protocol level.

What does the pipelines replace/refine?

First: what is the closest thing to pipelines in the existing framework? Very simply: Stream. The Stream API will be familiar to anyone who has done serialization or data-protocol work, but Stream is actually a very vague API - it behaves very differently in different scenarios:

  • Some streams are read-only, some are write-only, some are read/write
  • The same concrete type can sometimes be read-only and sometimes write-only (DeflateStream, for example)
  • When a stream is read/write, sometimes it works like a tape, where read and write both act on the same underlying data (FileStream, MemoryStream), and sometimes it works like two different streams, where read and write are essentially two completely separate streams (NetworkStream, SslStream) - that's a duplex stream
  • In many duplex scenarios, it is hard or impossible to express "no more data will be sent, but you should keep reading the incoming data to the end" - there is only Close(), and that closes both halves of the duplex
  • Sometimes a stream is seekable and supports the Position and Length concepts, but most aren't
  • As the API has evolved over time, there are often multiple ways of expressing the same operation - for example, we can read using Read (synchronous), BeginRead/EndRead (IAsyncResult-style async), or ReadAsync (async/await-style async); in most cases, the calling code has no way of knowing which is the recommended/optimal API
  • If you use any of the asynchronous APIs, it is often unclear what the threading model is: is it essentially synchronous? If not, which thread will invoke the callback? Does it use a SynchronizationContext? The thread pool? An IO completion-port thread?
  • More recently, there are now Span<byte>/Memory<byte> overloads alongside the byte[] APIs - and once again, the caller has no way of knowing which is the "better" API
  • The API essentially encourages copying data: need a buffer? That's a copy of the data into another piece of memory. Need a backlog of data that you haven't processed yet? That's another copy into yet another piece of memory
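To make the "several ways to express the same action" point concrete, here's a quick sketch (my own, not from the original post) showing the same read performed three ways, using a MemoryStream as a stand-in:

```csharp
using System;
using System.IO;

using var stream = new MemoryStream(new byte[] { 1, 2, 3 });
byte[] buffer = new byte[256];

// 1: synchronous
int n1 = stream.Read(buffer, 0, buffer.Length);

stream.Position = 0;
// 2: IAsyncResult-style ("APM", the oldest async pattern)
IAsyncResult ar = stream.BeginRead(buffer, 0, buffer.Length, null, null);
int n2 = stream.EndRead(ar);

stream.Position = 0;
// 3: Task-based async/await
int n3 = await stream.ReadAsync(buffer, 0, buffer.Length);

Console.WriteLine($"{n1} {n2} {n3}"); // 3 3 3
```

Three call shapes, one logical operation - and nothing in the API tells the caller which one the stream implements best.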

So even before we start talking about real-world Stream examples and the problems that arise from using them, it is clear that the Stream API itself has a lot of problems. So the first thing to establish is: pipelines resolves this confusion.

What are pipelines?

By "pipelines", I mean a set of four key APIs that between them implement decoupled, overlapped reader/writer access over a binary stream, including buffer management (pooling, recycling), thread awareness, rich backlog control, and overflow protection via back-pressure - all based on an API designed around non-contiguous memory. That's quite a word salad - but don't worry, I'll discuss each element in turn to explain what I mean.

Starting simple: writing to, and reading from, a single pipe

Let's start in familiar territory: we'll write something simple to a stream, then read it back - sticking to the Stream API. We'll use only ASCII text so we don't need to worry about any complex encoding, and our read/write code will make no assumptions about the underlying stream: we simply write the data, then read to the end of the stream to consume it.

We'll do this with Stream first - familiar ground - then re-implement it with pipelines to see the similarities and differences; after that, we'll look at what is happening under the surface, so we can understand why pipelines are attractive.

You might say, "aha, I remember TextReader/TextWriter" - I'm deliberately not using them, because I'm trying to talk about the Stream API here, so that our example extends to a wide range of data protocols and scenarios.

using (MemoryStream ms = new MemoryStream())
{
    // write something
    WriteSomeData(ms);
    // rewind - MemoryStream works like a tape
    ms.Position = 0;
    // consume it
    ReadSomeData(ms);
}

Now, to write to a stream, the caller needs to obtain and populate a buffer, then pass it to the stream. We'll use the synchronous API to keep things simple, and just allocate a byte array:

void WriteSomeData(Stream stream)
{
    byte[] bytes = Encoding.ASCII.GetBytes("hello, world!");
    stream.Write(bytes, 0, bytes.Length);
    stream.Flush();
}

Note: there is a lot you could do to the code above if you wanted it to be efficient, but that's not the point of this exercise. So if you're familiar with this type of code and are wincing at it: don't panic - we'll make it uglier - er, sorry, I mean more efficient - later.

The reading logic is more complex than the writing logic, because the reading code can't assume that a single call will obtain all the data: a read operation on a stream may return nothing (which indicates that we have read to the end of the data), it may fill our buffer, or it may return a single byte even though we offered a huge buffer. So stream-reading code is almost always a loop:

void ReadSomeData(Stream stream)
{
    int bytesRead;
    // note that the caller usually can't know much about
    // the size; .Length is not usually usable
    byte[] buffer = new byte[256];
    do
    {
        bytesRead = stream.Read(buffer, 0, buffer.Length);
        if (bytesRead > 0)
        {
            // note this only works for single-byte encodings
            string s = Encoding.ASCII.GetString(buffer, 0, bytesRead);
            Console.Write(s);
        }
    } while (bytesRead > 0);
}

Now let's translate this to pipelines. A Pipe is broadly comparable to a MemoryStream, except that it cannot be rewound multiple times - the data is a simple FIFO queue. We have a writer API that can push data in at one end, and a reader API that can pull the data out at the other; the pipe is the buffer between the two. Let's recreate the previous scenario, but using a Pipe in place of the MemoryStream (again, we wouldn't usually do this in practice, but it's simple to illustrate):

Pipe pipe = new Pipe();
// write something
await WriteSomeDataAsync(pipe.Writer);
// signal that there won't be anything else written
pipe.Writer.Complete();
// consume it
await ReadSomeDataAsync(pipe.Reader);

First we create a pipe with the default options, then we write to it. Note that IO operations on a pipe are usually asynchronous, so we need to await our two helper methods. Also note that we don't pass the Pipe itself to them: unlike Stream, pipelines have separate API surfaces for reading and writing, so we pass a PipeWriter into the helper that writes the data, and a PipeReader into the one that reads it. After writing the data, we call Complete() on the PipeWriter. We didn't need to do that with the MemoryStream, because it automatically EOFs when it reaches the end of the buffered data - but on some other stream implementations - especially one-directional ones - we might have needed to call Close after writing the data.

OK; so what does WriteSomeDataAsync involve? Note that I've deliberately commented the following code heavily:

async ValueTask WriteSomeDataAsync(PipeWriter writer)
{
    // use an oversized size guess
    Memory<byte> workspace = writer.GetMemory(20);
    // write the data to the workspace
    int bytes = Encoding.ASCII.GetBytes(
        "hello, world!", workspace.Span);
    // tell the pipe how much of the workspace
    // we actually want to commit
    writer.Advance(bytes);
    // this is **not** the same as Stream.Flush!
    await writer.FlushAsync();
}

The first thing to note when working with pipelines: you don't control the buffers - the pipe does. Recall that our Stream code created a local byte[] in both the read and write code; here we don't. Instead, we ask the pipe for a buffer (workspace) via the GetMemory method (or its twin, GetSpan). As you'd guess from the names, this gives us a Memory<byte> or a Span<byte> - with a capacity of at least 20 bytes.

Having obtained this buffer, we encode our string into it. This means we are writing directly into the pipe's memory, keeping track of how many bytes we actually used; we then tell the pipe, via Advance, how much we wrote. We are not bound by the 20 bytes we asked for - we could write 0, 20, or even 50 bytes. That last one may seem surprising, but it is actually encouraged! The key phrase earlier was "at least" - the writer can actually give us a much bigger buffer than we requested. When dealing with larger data, it's very common to exploit this: ask for the minimum amount of space we can usefully work with, then check the size of the Memory/Span we were actually given, and decide how much to really write.

The call to Advance is important: it concludes a single write operation, making the data in the pipe available for the reader to consume. The call to FlushAsync is equally important, but more subtle - and before we can fully articulate the difference, we need to look at the reader first. So, here's our ReadSomeDataAsync method:

async ValueTask ReadSomeDataAsync(PipeReader reader)
{
    while (true)
    {
        // await some data being available
        ReadResult read = await reader.ReadAsync();
        ReadOnlySequence<byte> buffer = read.Buffer;
        // check whether we've reached the end
        // and processed everything
        if (buffer.IsEmpty && read.IsCompleted)
            break; // exit loop

        // process what we received
        foreach (ReadOnlyMemory<byte> segment in buffer)
        {
            // note this only works for single-byte encodings
            string s = Encoding.ASCII.GetString(segment.Span);
            Console.Write(s);
        }
        // tell the pipe that we used everything
        reader.AdvanceTo(buffer.End);
    }
}

As in the Stream example, we have a loop that continues until we've reached the end of the data. With Stream, that was defined by Read returning a non-positive value, but here there are two things to check:

    • read.IsCompleted tells us whether the writer side of the pipe has been completed, meaning no more data will ever be written (the pipe.Writer.Complete(); line in our earlier code)
    • buffer.IsEmpty tells us that there is no data left to process in this iteration

If there is no data in the pipe and the writer has been completed, then there will never be anything more in the pipe, and we can exit.

If we do have data, we can look at buffer. So first, let's talk about that: it is the new type in this code - ReadOnlySequence<byte>. This concept combines a few roles:

    • Describing non-contiguous memory - specifically a sequence of zero, one, or more ReadOnlyMemory<byte> chunks
    • Describing logical positions (SequencePosition) in such a data stream - in particular via buffer.Start and buffer.End

The non-contiguous aspect is very important here. We'll see shortly where the data actually comes from, but in terms of reading: we need to be prepared to handle data that could be spread across multiple segments. Here, we do that by simply iterating over the buffer, decoding each segment in turn. Note that even though the API is designed to describe multiple non-contiguous buffers, the data received is frequently contiguous in a single buffer, in which case it is often possible to write a faster single-buffer implementation. You can do that by checking buffer.IsSingleSegment and accessing buffer.First.
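As a concrete illustration of that optimization, here's a sketch of a decode helper (Decode is my own hypothetical name, not part of the post or the API) with a single-segment fast path:

```csharp
using System;
using System.Buffers;
using System.Text;

var seq = new ReadOnlySequence<byte>(Encoding.ASCII.GetBytes("hello, world!"));
Console.Write(Decode(seq)); // hello, world!

static string Decode(in ReadOnlySequence<byte> buffer)
{
    if (buffer.IsSingleSegment)
    {
        // the common case: all of the data is in one contiguous chunk
        return Encoding.ASCII.GetString(buffer.First.Span);
    }
    // the general case: decode each segment in turn
    var sb = new StringBuilder();
    foreach (ReadOnlyMemory<byte> segment in buffer)
    {
        sb.Append(Encoding.ASCII.GetString(segment.Span));
    }
    return sb.ToString();
}
```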

Finally, we call AdvanceTo, which tells the pipe how much data we actually used.

Key point: you don't need to consume all of the data you are given

Contrast this with Stream: when you call Read on a stream, it puts the data into the buffer you gave it - and in most real-world scenarios it isn't always possible to consume all of the data right away. Perhaps it only makes sense to treat "commands" as entire text lines, and you haven't yet seen a CR/LF in the data. With Stream, this is a trap: once the data has been given to you, it's your problem - if you can't use it now, it's up to you to store the backlog somewhere. With pipelines, though, you can tell it how much you consumed. In our example, we pass buffer.End to AdvanceTo, telling it that we consumed everything it gave us; that means we'll never see that data again, just like with Stream. But we could also pass buffer.Start, meaning "we didn't consume anything" - and even though we got to inspect the data, it would remain in the pipe for subsequent reads. We can also get arbitrary SequencePosition values inside the buffer - if we read 20 bytes, for example - so we have complete control over how much data is dropped from the pipe. There are two ways to get a SequencePosition:

    • You can Slice(...) a ReadOnlySequence<byte> in the same way you would Slice(...) a Span<T> or Memory<T>, then access the .Start or .End of the resulting sub-sequence
    • You can use the GetPosition(...) method of ReadOnlySequence<byte>, which returns the relative position without actually slicing
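A minimal sketch of both options (my own illustration, assuming we decided that exactly 20 of the 30 bytes were consumed):

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Text;

var pipe = new Pipe();
// put 30 bytes into the pipe
await pipe.Writer.WriteAsync(Encoding.ASCII.GetBytes("abcdefghijklmnopqrstuvwxyz0123"));

ReadResult read = await pipe.Reader.ReadAsync();
ReadOnlySequence<byte> buffer = read.Buffer;

// two equivalent ways of describing "20 bytes in":
SequencePosition viaSlice = buffer.Slice(0, 20).End;       // option 1
SequencePosition viaGetPosition = buffer.GetPosition(20);  // option 2

// drop the first 20 bytes; the remaining 10 stay in the pipe
pipe.Reader.AdvanceTo(viaGetPosition);

read = await pipe.Reader.ReadAsync();
Console.WriteLine(read.Buffer.Length); // 10
```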

More subtly still: we can tell it separately how much data we consumed versus how much we examined. The most common example is to express "you can drop this much - I'm done with that; but I looked at everything, and I can't make any further progress at the moment - I need more data"; specifically:

reader.AdvanceTo(consumedToPosition, buffer.End);
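To show how the consumed/examined split plays out, here's a sketch of a line-based read loop (ReadLinesAsync is my own hypothetical helper, not from the post): whole LF-terminated lines are consumed, while a trailing partial line is examined but left in the pipe:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO.Pipelines;
using System.Text;

var pipe = new Pipe();
// "gamma" has no trailing '\n', so it remains an incomplete line
await pipe.Writer.WriteAsync(Encoding.ASCII.GetBytes("alpha\nbeta\ngamma"));
pipe.Writer.Complete();

List<string> lines = await ReadLinesAsync(pipe.Reader);
Console.WriteLine(string.Join(",", lines)); // alpha,beta

static async ValueTask<List<string>> ReadLinesAsync(PipeReader reader)
{
    var lines = new List<string>();
    while (true)
    {
        ReadResult read = await reader.ReadAsync();
        ReadOnlySequence<byte> buffer = read.Buffer;

        // consume as many whole lines as are available
        SequencePosition? eol;
        while ((eol = buffer.PositionOf((byte)'\n')) != null)
        {
            lines.Add(Encoding.ASCII.GetString(buffer.Slice(0, eol.Value).ToArray()));
            buffer = buffer.Slice(buffer.GetPosition(1, eol.Value)); // skip the '\n' too
        }

        // consumed: everything up to the partial line; examined: everything -
        // so ReadAsync won't wake us again until more data (or completion) arrives
        reader.AdvanceTo(buffer.Start, buffer.End);

        // note: in this sketch, a final partial line ("gamma") is simply dropped
        if (read.IsCompleted) break;
    }
    reader.Complete();
    return lines;
}
```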

This is where the subtle interplay between PipeWriter.FlushAsync() and PipeReader.ReadAsync() comes in. I skipped over PipeWriter.FlushAsync() earlier; it actually serves two functions in one call:

    • If there is a ReadAsync call outstanding because it wants data, the flush wakes the reader, allowing the read loop to continue
    • If the writer is outpacing the reader - i.e. the pipe is filling up with data that the reader isn't clearing - it suspends the writer (by not completing synchronously); it is reactivated when there is more space in the pipe (the thresholds at which the writer is suspended/resumed can be specified when creating the Pipe instance)
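Those thresholds correspond to the pauseWriterThreshold and resumeWriterThreshold options on PipeOptions; a minimal sketch, with illustrative values (not recommendations):

```csharp
using System.IO.Pipelines;

var pipe = new Pipe(new PipeOptions(
    pauseWriterThreshold: 512 * 1024,    // FlushAsync stops completing synchronously at 512 KiB buffered
    resumeWriterThreshold: 256 * 1024)); // the writer is woken once the backlog drops below 256 KiB
```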

Obviously these two concepts don't really come into play in our example, but they are central to how pipelines work. The ability to push data back into the pipe hugely simplifies a vast number of IO scenarios. Virtually every piece of protocol-handling code I saw before pipelines had large amounts of code dealing with the backlog of incomplete data - it is such repetitive logic that I am very happy to see it handled well by the framework.

What do "wakes" and "reactivated" mean here?

You may have noticed that I haven't really defined what I meant by those. On the surface, I mean: an await operation against ReadAsync or FlushAsync that did not complete synchronously now has its asynchronous continuation invoked, allowing our async method to resume. Yes - but that is just restating what async/await mean. What I care about when debugging - for reasons I'll discuss later in the series - is which thread the code runs on. So it isn't enough for me to say "the asynchronous continuation is invoked"; I want to know, in terms of threads, who is invoking it. The most common answers are:

    • It is dispatched via the SynchronizationContext (note: many systems don't have a SynchronizationContext)
    • The thread that triggered the state change is used, inline, to invoke the continuation
    • The global thread pool is used to invoke the continuation

Any of these can be fine in some cases, and any of these can be terrible in some cases! The synchronization context is a well-established mechanism for getting from worker threads back to the main application thread (for example, the UI thread in a desktop application). However, it doesn't necessarily follow that just because we've finished an IO operation we are ready to jump back to an application thread; doing so would move a lot of IO code and data-processing code onto the application thread - which is usually exactly what we want to avoid. Also, if the application code has used Wait() or .Result on an asynchronous call, it can deadlock (assuming you haven't done so deliberately). The second option - executing the callback "inline" on the thread that triggered it - can be problematic because it can steal a thread you wanted for something else (and can potentially lead to deadlocks); and in extreme cases, when two asynchronous methods end up essentially acting as co-routines, it can cause a stack-dive (and eventually a stack overflow). The final option - the global thread pool - doesn't suffer from either of those problems, but it can run into severe problems under certain load conditions; again, I'll discuss that later in this series.

The good news is that pipelines gives you control here. When creating a Pipe instance, we can supply PipeScheduler instances for the reader and writer (separately) to use; the PipeScheduler is what performs these activations. If not specified, it first checks for a SynchronizationContext, then falls back to the global thread pool - with inline continuations (using the thread that triggered the state change) also available as an option. But: you can provide your own PipeScheduler implementation, giving you full control of the threading model.
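For example (a sketch, with illustrative choices - PipeScheduler.Inline being the "use the triggering thread" alternative), we could pin both sides to the built-in thread-pool scheduler and opt out of SynchronizationContext capture:

```csharp
using System.IO.Pipelines;

var pipe = new Pipe(new PipeOptions(
    readerScheduler: PipeScheduler.ThreadPool,  // where the reader's continuations run
    writerScheduler: PipeScheduler.ThreadPool,  // where the writer's continuations run
    useSynchronizationContext: false));         // don't capture/use SynchronizationContext
```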

Summary

So: we've looked at what a Pipe is, and how we can write to a pipe with a PipeWriter and read from it with a PipeReader - and how to "advance" both. We've looked at the similarities and differences versus Stream, and discussed how ReadAsync() and FlushAsync() interact to control the writer and reader pieces. We've seen how the responsibility for buffers is inverted, with all buffers being provided by the pipe - and how the pipe makes it simple to manage backlogs of data. Finally, we've discussed the threading model used to activate the continuations of await operations.

That's probably enough for step 1. Next, we'll look at the memory model that pipelines use - i.e. where does the data live. We'll also look at how to use pipelines in real scenarios to start doing something interesting.

