Golang Walkthrough: io Package


Go is a programming language built for working with bytes. Whether you have a list of bytes, a stream of bytes, or a single byte, Go makes it easy to process. From these simple primitives we build our abstractions and services. The io package is one of the most fundamental packages in the standard library. It provides a set of interfaces and helpers for working with streams of bytes.

This article is part of a series of walkthroughs to help you better understand the standard library. While the official documentation provides a wealth of information, it can still be difficult to understand how a package is meant to be used in a real-world context. This series aims to show how standard library packages are used in everyday applications. If you have questions or comments, you can reach me at @benbjohnson on Twitter. (You can, of course, also contact me at listomebao@gmail.com.)

Reading bytes

There are two fundamental operations when working with bytes: reading and writing. Let's look at reading bytes first.

Reader interface

The most basic construct for reading bytes from a stream is the Reader interface:

type Reader interface {
	Read(p []byte) (n int, err error)
}

This interface is implemented throughout the standard library, by everything from network connections to files to wrappers around in-memory slices.

The caller passes a buffer, p, to the Read() method so that the same bytes can be reused. If Read() returned a byte slice instead of accepting one as an argument, the reader would have to allocate a new byte slice on every Read() call. That would put heavy pressure on the garbage collector.

One problem with the Reader interface is that it comes with some subtle rules. First, it returns an io.EOF error as a normal part of usage when the stream is done. This can be confusing for beginners. Second, your buffer is not guaranteed to be filled. If you pass an 8-byte slice, you could receive anywhere between 0 and 8 bytes back. Handling partial reads can be messy and error-prone. Fortunately, there are helper functions for these problems.

Improving reader guarantees

Imagine you are parsing a protocol and you know you need to read an 8-byte uint64 value from a reader. In this case it is best to use io.ReadFull(), since you have a fixed-size read:

func ReadFull(r Reader, buf []byte) (n int, err error)

This function ensures that your buffer is completely filled with data before returning. If only part of your buffer is filled, you will receive an io.ErrUnexpectedEOF. If no bytes are read at all, io.EOF is returned. This simple guarantee can simplify your code tremendously. To read 8 bytes, you only need to do this:

buf := make([]byte, 8)
if _, err := io.ReadFull(r, buf); err == io.EOF {
	return io.ErrUnexpectedEOF
} else if err != nil {
	return err
}

There are also many higher-level parsers, such as binary.Read(), that handle parsing specific types. We'll cover those in future walkthroughs of the respective packages.

Another useful function is ReadAtLeast():

func ReadAtLeast(r Reader, buf []byte, min int) (n int, err error)

This function reads into your buffer until at least min bytes have been read, picking up additional data (up to len(buf)) if it is available. If fewer than min bytes can be read, it returns an error. I haven't found a need for this function personally, but I can see it being useful if you need to minimize Read() calls and you're willing to buffer additional data.

Concatenating streams

You will often encounter situations where you need to combine multiple readers. You can combine them into a single reader by using MultiReader:

func MultiReader(readers ...Reader) Reader

For example, you may be sending an HTTP request body that combines an in-memory header with data stored on disk. Many people will try to copy the header and the file into an in-memory buffer, but that is slow and can use a lot of memory. Here is a simpler approach:

r := io.MultiReader(
	bytes.NewReader([]byte("...my header...")),
	myFile,
)
resp, err := http.Post("http://example.com", "application/octet-stream", r)
if err != nil {
	return err
}
defer resp.Body.Close()

MultiReader lets http.Post() treat the two readers as a single, concatenated reader.

Duplicating streams

One issue you may encounter with readers is that once data has been read, it cannot be read again. For example, your application may fail to parse an HTTP request body, and you cannot debug the issue because the parser has already consumed the data. TeeReader is a great option for capturing a reader's data without interfering with the consumer of that reader.

func TeeReader(r Reader, w Writer) Reader

This function constructs a new reader that wraps your reader, r. Any reads from the new reader will also be written to w. The writer can be anything from an in-memory buffer to a log file to stderr. For example, you can capture bad requests like this:

var buf bytes.Buffer
body := io.TeeReader(req.Body, &buf)

// ... process body, producing err ...
if err != nil {
	// inspect buf
	return err
}

However, you should limit how much of the request body you capture so that you don't run out of memory.

Restricting stream length

Because streams are unbounded, they can cause memory or disk problems in some scenarios. The most common example is a file upload endpoint. Such endpoints typically have a size limit to prevent the disk from filling up, but implementing this by hand can be tedious. LimitReader provides this functionality by producing a wrapping reader that restricts the total number of bytes read:

func LimitReader(r Reader, n int64) Reader

One issue with LimitReader is that it won't tell you whether your underlying reader holds more than n bytes. It simply returns io.EOF once n bytes have been read from r. One trick you can use is to set the limit to n+1 and then check whether you read more than n bytes.

Writing bytes

Now that we've covered reading bytes from streams, let's look at how to write them to streams.

Writer interface

The Writer interface is simply the inverse of Reader. We provide a buffer of bytes to push out onto a stream.

type Writer interface {
	Write(p []byte) (n int, err error)
}

In general, writing bytes is simpler than reading them. Readers complicate data handling because they allow partial reads, whereas a partial write always returns an error.

Duplicating writes

Sometimes you will want to send writes to multiple streams, for example to a log file and to stderr. This is similar to TeeReader, except that we want to duplicate writes instead of duplicating reads.

In this case, MultiWriter comes in handy:

func MultiWriter(writers ...Writer) Writer

The name is a bit confusing, because MultiWriter is not simply the write-side counterpart of MultiReader. Whereas MultiReader concatenates several readers into one, MultiWriter returns a writer that duplicates each write to multiple writers. I use MultiWriter extensively in unit tests where I need to assert that services are logging properly:

type MyService struct {
	LogOutput io.Writer
}

...

var buf bytes.Buffer
var s MyService
s.LogOutput = io.MultiWriter(&buf, os.Stderr)

Using MultiWriter allows me to verify the contents of buf while also seeing the full log output in my terminal for debugging.

Optimizing String Writes

Many writers in the standard library have a WriteString() method, which can improve write performance by not requiring an allocation to convert a string to a byte slice. You can take advantage of this optimization by using the io.WriteString() function. The function is simple: it first checks whether the writer implements a WriteString() method and uses it if available. Otherwise, it falls back to copying the string into a byte slice and calling the Write() method.

Copying bytes

Now that we can read bytes and write bytes, it only makes sense to plug the two sides together and copy between readers and writers.

Connecting Readers & Writers

The most basic way to copy data from a reader to a writer is the aptly named Copy() function:

func Copy(dst Writer, src Reader) (written int64, err error)

This function reads from src into a 32KB buffer and then writes that data to dst. If any error other than io.EOF occurs during the read or write, the copy stops and the error is returned. One issue with Copy() is that you cannot cap the number of bytes copied. For example, you may want to copy a log file up to its current size. If the log keeps growing during the copy, you will end up with more bytes than expected. In this case, you can use the CopyN() function to specify the exact number of bytes to be written:

func CopyN(dst Writer, src Reader, n int64) (written int64, err error)

Another issue with Copy() is that it allocates a new 32KB buffer on every call. If you are performing a large number of copies, you can reuse your own buffer by using CopyBuffer() instead:

func CopyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error)

I haven't found the overhead of Copy() to be very high, so I personally don't use CopyBuffer().

Optimizing copy

To avoid using an intermediate buffer entirely, types can implement interfaces that read and write directly. When these are implemented, the Copy() function skips the intermediate buffer and uses the implementations directly. The WriterTo interface is available for types that want to write their data out directly:

type WriterTo interface {
	WriteTo(w Writer) (n int64, err error)
}

I've used this in BoltDB's Tx.WriteTo(), which allows users to snapshot the database from a transaction. On the read side, the ReaderFrom interface allows a type to read data directly from a reader:

type ReaderFrom interface {
	ReadFrom(r Reader) (n int64, err error)
}

Adapting Readers & Writers

Sometimes you will find that a function accepts a Reader, but all you have is a Writer. Perhaps you need to write data dynamically to an HTTP request, but http.NewRequest() only accepts a Reader. You can invert a writer by using io.Pipe():

func Pipe() (*PipeReader, *PipeWriter)

This provides you with a new reader and writer. Any writes to the new PipeWriter will go to the PipeReader. I rarely use this functionality directly, but the same reader/writer pair idea underlies exec.Cmd's StdinPipe(), StdoutPipe(), and StderrPipe() methods, which are very useful when executing commands.

Closing streams

All good things must come to an end, and byte streams are no exception. The Closer interface provides a generic way to close streams:

type Closer interface {
	Close() error
}

There's not much to say about Closer since it is so simple, but I find it useful to always return an error from my Close() methods so that my types can implement Closer when needed. Closer isn't always used directly; it is more often combined with other interfaces in ReadCloser, WriteCloser, and ReadWriteCloser.

Moving around within streams

Streams are typically a continuous flow of bytes from beginning to end, but there are a few exceptions. A file, for example, can be operated on as a stream, but you can also jump to a specific position within it. The Seeker interface is provided for jumping around within a stream:

type Seeker interface {
	Seek(offset int64, whence int) (int64, error)
}

There are three ways to jump: move relative to the current position, move relative to the beginning, and move relative to the end. You specify the mode with the whence argument (the io package defines the constants io.SeekCurrent, io.SeekStart, and io.SeekEnd for this). The offset argument specifies how many bytes to move. Seeking is useful if you use fixed-length blocks in a file or if your file contains an index of offsets. Sometimes this index is stored in the header, so moving from the beginning makes sense, but sometimes it is stored in a trailer, so you need to move from the end.

Optimizing for Data Types

If you only need a single byte or rune, reading and writing in blocks can be cumbersome. Go provides some interfaces to make this easier.

Working with individual bytes

The ByteReader and ByteWriter interfaces provide a simple way to read and write single bytes:

type ByteReader interface {
	ReadByte() (c byte, err error)
}

type ByteWriter interface {
	WriteByte(c byte) error
}

You'll notice that there is no length argument, since the length is always either 0 or 1. If a byte is not read or written, an error is returned. The ByteScanner interface is also provided for working with buffered byte readers:

type ByteScanner interface {
	ByteReader
	UnreadByte() error
}

This allows you to push the previously read byte back onto the reader so that it is returned by the next read. This is particularly useful when writing LL(1) parsers, since it allows you to peek at the next available byte.

Working with individual runes

If you are parsing Unicode data, you need to work with runes instead of individual bytes. In that case, the RuneReader and RuneScanner interfaces are used instead:

type RuneReader interface {
	ReadRune() (r rune, size int, err error)
}

type RuneScanner interface {
	RuneReader
	UnreadRune() error
}

Conclusion

Byte streams are essential to most Go programs. They are the interface to everything from network connections to files on disk to user input from the keyboard. The io package provides the basis for all these interactions. We looked at reading bytes, writing bytes, and copying bytes, and finally at optimizing these operations. These primitives may seem simple, but they are the building blocks of all data-intensive applications. Please take a look at the io package and consider its interfaces in your own applications.

Reference links

https://medium.com/go-walkthrough/go-walkthrough-io-package-8ac5e95a9fbd
