An In-Depth Look at bufio.Scanner in the Go Standard Library

As is well known, the [buffered I/O standard library](https://golang.org/pkg/bufio/) is a powerful tool for optimizing read and write operations in Go. For writes, the I/O buffer provides temporary storage that holds data before it is sent to a socket or to disk. The buffer accumulates data up to a certain size and is then flushed, which greatly reduces the number of writes and therefore the number of system calls. This saves considerable overhead when system resources are used frequently. For reads, buffered I/O means that each operation can fetch more data, which reduces the number of system calls and uses the underlying hardware more efficiently by reading from disk in blocks. This article focuses on the [Scanner](https://golang.org/pkg/bufio/#Scanner) type in the [bufio](https://golang.org/pkg/bufio/) package. Its main job is to split a data stream into tokens and to discard what separates them, for example the spaces between words.

```
foo bar baz
```

If we only want the words from the string above, the scanner can retrieve the three words "foo", "bar" and "baz" in order ([view source](https://play.golang.org/p/_GKMSMZMWZ)):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	input := "foo bar baz"
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
```

Output:

```
foo
bar
baz
```

`Scanner` reads the stream using buffered I/O and takes an `io.Reader` as its argument. If you need to work with strings or byte slices that are already in memory, consider [bytes.Split](https://golang.org/pkg/bytes/#Split) or [strings.Split](https://golang.org/pkg/strings/#Split) first. When the data is not a stream, the functions in the `bytes` and `strings` packages are probably the simplest and most reliable choice.

Under the hood, the scanner uses a buffer to accumulate data. When the buffer is non-empty, or when the end of the file (EOF) has been reached, the `split` function is called. So far we have seen one predefined `split` function, but any function with the signature below can be used:

```go
func(data []byte, atEOF bool) (advance int, token []byte, err error)
```

So far we know that the `split` function is called with the data read so far. Judging by its return values, it can behave in three different ways.

## 1. More data is needed

This means the data passed in is not enough to produce a token. The `split` function returns `0, nil, nil` and the scanner tries to read more data. If the buffer is full, it is doubled in size before any reading takes place. Let's take a closer look at this process ([view source](https://play.golang.org/p/j7RDUVujNv)):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	input := "abcdefghijkl"
	scanner := bufio.NewScanner(strings.NewReader(input))
	split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		fmt.Printf("%t\t%d\t%s\n", atEOF, len(data), data)
		return 0, nil, nil
	}
	scanner.Split(split)
	buf := make([]byte, 2)
	scanner.Buffer(buf, bufio.MaxScanTokenSize)
	for scanner.Scan() {
		fmt.Printf("%s\n", scanner.Text())
	}
}
```

Output:

```
false	2	ab
false	4	abcd
false	8	abcdefgh
false	12	abcdefghijkl
true	12	abcdefghijkl
```

The `split` function in this example is simple and extremely greedy: it always asks for more data. `Scanner` tries to read more while making sure that the buffer has enough room to hold it. In the example above, the initial buffer size is set to 2:

```go
buf := make([]byte, 2)
scanner.Buffer(buf, bufio.MaxScanTokenSize)
```

After the `split` function is called for the first time, the scanner doubles the buffer size, reads more data and calls `split` again. After the second call the growth factor stays the same. The output shows that the first call to `split` receives a slice of length 2, then 4, 8 and finally 12, because there is no more data.

*The default size of the buffer is [4096](https://github.com/golang/go/blob/13cfb15cb18a8c0c31212c302175a4cb4c050155/src/bufio/scan.go#L76) bytes.*

Now let's discuss the `atEOF` parameter. It tells the `split` function that no more data will be available. This happens at the end of the input (EOF) or when a read fails. Once either occurs, the scanner never reads again. The flag can be used, for example, to return an error (because of an incomplete token), which makes `scanner.Scan()` return `false` and stops the whole process. The error can be retrieved with the `Err` method:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

func main() {
	input := "abcdefghijkl"
	scanner := bufio.NewScanner(strings.NewReader(input))
	split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		fmt.Printf("%t\t%d\t%s\n", atEOF, len(data), data)
		if atEOF {
			return 0, nil, errors.New("bad luck")
		}
		return 0, nil, nil
	}
	scanner.Split(split)
	// The buffer holds the whole input, so the first call already sees 12 bytes.
	buf := make([]byte, 12)
	scanner.Buffer(buf, bufio.MaxScanTokenSize)
	for scanner.Scan() {
		fmt.Printf("%s\n", scanner.Text())
	}
	if scanner.Err() != nil {
		fmt.Printf("error: %s\n", scanner.Err())
	}
}
```

Output:

```
false	12	abcdefghijkl
true	12	abcdefghijkl
error: bad luck
```

The `atEOF` parameter can also be used to process whatever is left in the buffer. One of the predefined `split` functions, which scans the input line by line, [behaves this way](https://github.com/golang/go/blob/be943df58860e7dec008ebb8d68428d54e311b94/src/bufio/scan.go#L403). Take the following input:

```
foo
bar
baz
```

There is no `\n` at the end of the last line, so when [ScanLines](https://golang.org/pkg/bufio/#ScanLines) cannot find a newline character, it returns the remaining characters as the last token ([view source](https://golang.org/pkg/bufio/#ScanLines)):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	input := "foo\nbar\nbaz"
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Not actually needed, since ScanLines is the default split function.
	scanner.Split(bufio.ScanLines)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
```

Output:

```
foo
bar
baz
```

## 2. Token found

This happens when the `split` function is able to detect a token. It returns the number of characters to move forward in the buffer and the token itself. Two values are returned because the distance to advance is not always equal to the length of the token in bytes. Suppose the input is "foo foo foo" and the goal is only to find words ([ScanWords](https://golang.org/pkg/bufio/#ScanWords)). The `split` function then also skips the spaces between them:

```
(4, "foo")
(4, "foo")
(3, "foo")
```

Let's look at a concrete example. The function below only looks for consecutive `foo` strings ([view source](https://play.golang.org/p/X_Adw-knum)):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

func main() {
	input := "foofoofoo"
	scanner := bufio.NewScanner(strings.NewReader(input))
	split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		// The length check keeps data[:3] within the bytes actually read.
		if len(data) >= 3 && bytes.Equal(data[:3], []byte{'f', 'o', 'o'}) {
			return 3, []byte{'F'}, nil
		}
		if atEOF {
			return 0, nil, io.EOF
		}
		return 0, nil, nil
	}
	scanner.Split(split)
	for scanner.Scan() {
		fmt.Printf("%s\n", scanner.Text())
	}
}
```

Output:

```
F
F
F
```

## 3. Error

If the `split` function returns an error, the scanner stops ([view source](https://play.golang.org/p/KpiyhMFUyT)):

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

func main() {
	input := "abcdefghijkl"
	scanner := bufio.NewScanner(strings.NewReader(input))
	split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		return 0, nil, errors.New("bad luck")
	}
	scanner.Split(split)
	for scanner.Scan() {
		fmt.Printf("%s\n", scanner.Text())
	}
	if scanner.Err() != nil {
		fmt.Printf("error: %s\n", scanner.Err())
	}
}
```

Output:

```
error: bad luck
```

There is, however, one special error that does not stop the scanner immediately.

### ErrFinalToken

The scanner offers a way to signal a so-called [final token](https://golang.org/pkg/bufio/#pkg-variables). It is a special token that does not break the loop (`Scan` still returns `true`), but any subsequent call to `Scan` stops immediately:

```go
func (s *Scanner) Scan() bool {
	if s.done {
		return false
	}
	...
```

The official Go [issue #11836](https://github.com/golang/go/issues/11836) proposed this as a way to stop scanning immediately once a special token is found ([view source](https://play.golang.org/p/ArL-k-i2OV)):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

func split(data []byte, atEOF bool) (advance int, token []byte, err error) {
	advance, token, err = bufio.ScanWords(data, atEOF)
	if err == nil && token != nil && bytes.Equal(token, []byte{'e', 'n', 'd'}) {
		return 0, []byte{'e', 'n', 'd'}, bufio.ErrFinalToken
	}
	return
}

func main() {
	input := "foo end bar"
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(split)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if scanner.Err() != nil {
		fmt.Printf("error: %s\n", scanner.Err())
	}
}
```

Output:

```
foo
end
```

> Neither `io.EOF` nor `ErrFinalToken` is considered a real error: the `Err` method still returns `nil` when either of them stops the scanner.

### Maximum token size / ErrTooLong

By default, the buffer can grow to at most `64 * 1024` bytes, which limits how long a token can be:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	input := strings.Repeat("x", bufio.MaxScanTokenSize)
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if scanner.Err() != nil {
		fmt.Println(scanner.Err())
	}
}
```

The program above prints `bufio.Scanner: token too long`. The limit can be customized with the [Buffer](https://golang.org/pkg/bufio/#Scanner.Buffer) method. It already appeared in the first section, but here is a more fitting example ([view source](https://play.golang.org/p/ZsgJzuIy4r)):

```go
// Initial buffer of 10 bytes and a maximum token size of 20 bytes.
buf := make([]byte, 10)
input := strings.Repeat("x", 20)
scanner := bufio.NewScanner(strings.NewReader(input))
scanner.Buffer(buf, 20)
for scanner.Scan() {
	fmt.Println(scanner.Text())
}
if scanner.Err() != nil {
	fmt.Println(scanner.Err())
}
```

Output:

```
bufio.Scanner: token too long
```

### Preventing an infinite loop

A few years ago [issue #8672](https://github.com/golang/go/issues/8672) was reported. The fix added code that calls the `split` function when `atEOF` is true and the buffer is empty. Existing `split` functions that do not expect this call may then run into an infinite loop.

"' Gopackage mainimport (" Bufio "" bytes "" FMT "" strings ") func main () {input: =" Foo|bar "Scanner: = Bufio. Newscanner (Strings. Newreader (Input)) Split: = Func (data []byte, ateof bool) (advance int, token []byte, err Error) {if I: = bytes. Indexbyte (data, ' | '); I >= 0 {return i + 1, data[0:i], nil} if ateof {return len (data), Data[:len (data)], nil} return 0, nil, nil} Scann Er. Split for scanner. Scan () {if scanner. Text ()! = "" {fmt. PRINTLN (scanner. Text ())}}} ' Split ' function assumes that when ' ateof ' is true it is safe to use the remaining buffers as tokens, which raises the [issue #8672] (https://github.com/golang/go/issues/8672) Another problem after being repaired: because the buffer can be empty, the ' split ' function does not increase the size of the buffer when it returns ' (0, [], nil) ', [Issue #9020] (https://github.com/golang/go/issues/9020) found the ' panic ' in this case, [view Source] (https://play.golang.org/p/ Hubd-zinaq) ' Foobarpanic:bufio. scan:100 empty tokens without progressing "when I first read about **scanner** or [Splitfunc] (https://golang.org/pkg/bufio/# Splitfunc) Document I did not understand how they worked in all cases, even reading the source code helped very little, because [Scan] (https://github.com/golang/go/blob/ Be943df58860e7dec008ebb8d68428d54e311b94/src/bufio/scan.go#l128 looks really complicated, hopefully this article will help others to better understand the details of this piece.

Via: https://medium.com/golangspec/in-depth-introduction-to-bufio-scanner-in-golang-55483bb689b4

Author: Michał Łowicki. Translator: yujiahaol68. Proofreading: Rxcai, polaris1119

This article was originally translated and compiled by GCTT and is published by the Go Language Chinese Network.

The translation is published for learning and exchange purposes only, under the terms of the CC-BY-NC-SA license. If our work infringes on your rights, please contact us promptly.
Reprints are welcome under the CC-BY-NC-SA license. Please credit and retain the links to the original and the translation, along with the author and translator information.