In this article, we'll use some of Go's well-known concurrency tools (goroutines and `WaitGroup`) to efficiently traverse directories that contain a large number of files. All code can be found on GitHub [here](https://github.com/Tim15/golang-parallel-io).

I was working on a project that packages a directory into a single file, so I started looking at Go's file IO system. There seem to be several ways to traverse a directory: you can use `filepath.Walk()`, or you can write your own. [Some people point out](https://github.com/golang/go/issues/16399) that `filepath.Walk()` is really slow compared to `find`, so I wanted to know whether I could write something faster.

I'll show you how I used some of Go's great features to do this. You can apply them to other problems too.

## Recursive version

Donald Knuth once wrote: "Premature optimization is the root of all evil." Following this advice, we will first write a simple recursive version of `find` in Go and then parallelize it.

First, open the directory:

```go
func lsFiles(dir string) {
	file, err := os.Open(dir)
	if err != nil {
		fmt.Println("error opening directory")
		return // nothing to read if the directory could not be opened
	}
	defer file.Close()
```

Next, get a slice of the entries in this directory (a slice is Go's counterpart to a list or array in other languages).

```go
	files, err := file.Readdir(-1)
	if err != nil {
		fmt.Println("error reading directory")
		return
	}
```

Then we iterate over the entries and call our function again.

```go
	for _, f := range files {
		if f.IsDir() {
			lsFiles(dir + "/" + f.Name())
		}
		fmt.Println(dir + "/" + f.Name())
	}
}
```

As you can see, we call our function again only when the entry is a directory. Either way, we print the entry's path and name.

## Initial test

Now, let's test it out.
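To try the test yourself, the fragments of the recursive version can be assembled into a single file. The following is a minimal sketch, not the code from the repository: the `visit` callback, the returned `error`, and the fallback to the current directory when no argument is given are assumptions added here so the function can be reused and checked.

```go
package main

import (
	"fmt"
	"os"
)

// lsFiles walks dir recursively and calls visit for every entry it finds.
// The visit parameter is an illustrative addition; the article's version
// prints each path directly.
func lsFiles(dir string, visit func(path string)) error {
	file, err := os.Open(dir)
	if err != nil {
		return fmt.Errorf("opening directory %s: %w", dir, err)
	}
	defer file.Close()

	// A negative count asks Readdir for all entries in one slice.
	files, err := file.Readdir(-1)
	if err != nil {
		return fmt.Errorf("reading directory %s: %w", dir, err)
	}

	for _, f := range files {
		path := dir + "/" + f.Name()
		if f.IsDir() {
			// Report a failing subdirectory but keep walking its siblings.
			if err := lsFiles(path, visit); err != nil {
				fmt.Fprintln(os.Stderr, err)
			}
		}
		visit(path)
	}
	return nil
}

func main() {
	// Assumption: default to the current directory when no path is given.
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	if err := lsFiles(root, func(path string) { fmt.Println(path) }); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Built as `recursive`, it can be timed with `time ./recursive <directory>`.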
On a MacBook Pro with an SSD, using `time`, I get the following results:

```
$ find /Users/alexkreidler
274165
real    0m2.046s
user    0m0.416s
sys     0m1.640s
$ ./recursive /Users/alexkreidler
274165
real    0m13.127s
user    0m1.751s
sys     0m10.294s
```

Compare that with `filepath.Walk()`:

```go
func main() {
	err := filepath.Walk(os.Args[1], func(path string, fi os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		fmt.Println(path)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

```
$ ./walk /Users/alexkreidler
274165
real    0m13.287s
user    0m2.033s
sys     0m10.863s
```

## Goroutines

Well, it's time to parallelize. What happens if we turn the recursive call into a goroutine? Just change

```go
if f.IsDir() {
	lsFiles(dir + "/" + f.Name())
}
```

to

```go
if f.IsDir() {
	go lsFiles(dir + "/" + f.Name())
}
```

Oops, not good! Now it only lists some of the top-level files. The program spawns a lot of goroutines, but when the `main` function returns, the program exits without waiting for them to complete. We need to make the program wait for all the goroutines to finish.

## WaitGroup

To do this, we will use a `sync.WaitGroup`. Basically, it keeps a count of the goroutines in the group and blocks until that count drops to zero. First, create our `WaitGroup`:

```go
var wg sync.WaitGroup
```

Then we add one to the `WaitGroup` and start the recursive function. Once the top-level `lsFiles()` call returns, `main` stays blocked in `wg.Wait()` until the `wg` counter is back at zero.

```go
wg.Add(1)
lsFiles(dir)
wg.Wait()
```

Now, for each goroutine we spawn, add one to the `WaitGroup`:

```go
if f.IsDir() {
	wg.Add(1)
	go lsFiles(dir + "/" + f.Name())
}
```

Then have `lsFiles` call `wg.Done()` when it finishes, which subtracts one from the `WaitGroup` count. Deferring the call at the top of the function makes sure it runs on every return path:

```go
defer wg.Done()
```

Good! Now the program should wait until every file has been printed.

## Ulimits and semaphore channel

Now for the tricky part.
Depending on your CPU and the number of cores it has, you may or may not run into this problem. If the Go scheduler has enough cores available, it can run a very large number of goroutines at once ([see here](https://stackoverflow.com/questions/8509152/max-number-of-goroutines)). However, most operating systems limit the number of files a process can have open. On UNIX systems, this limit is one of the kernel `ulimits`. On my Mac the limit is 10,240 files, but because I have only 2 cores, I'm not affected by it.

On a more recent computer with more cores, the Go scheduler may create more than 10,240 goroutines at the same time. Each goroutine opens a file, so you will get the error:

`too many open files`

To solve this problem, we will use a semaphore channel:

```go
// GOMAXPROCS returns the previous setting, which is normally the number of logical CPUs.
var semaphoreChan = make(chan struct{}, runtime.GOMAXPROCS(runtime.NumCPU()))
```

The channel's capacity is limited to the number of CPUs or cores on our machine.

```go
func lsFiles(dir string) {
	// blocks while the channel is full
	semaphoreChan <- struct{}{}
	defer func() {
		// read to release a slot
		<-semaphoreChan
		wg.Done()
	}()
	...
```

When the channel is full, our attempt to send to it blocks. When the function finishes, we read from the channel to release the slot. For more information, see [this Stack Overflow post](https://stackoverflow.com/questions/38824899/golang-too-many-open-files-in-go-function-goroutine).

## Tests and benchmarks

```
$ ./benchmark.sh
cpus/cores: 2
gomaxprocs: 2
find /Users/alexkreidler
274165
real    0m2.046s
user    0m0.416s
sys     0m1.640s
./recursive /Users/alexkreidler
274165
real    0m13.127s
user    0m1.751s
sys     0m10.294s
./parallel /Users/alexkreidler
274165
real    0m9.120s
user    0m4.781s
sys     0m10.676s
./walk /Users/alexkreidler
274165
real    0m13.287s
user    0m2.033s
sys     0m10.863s
```

## Conclusion

All right, `find` is still the king of IO, but at least our parallel version (about 9.1 s) improves on both the original recursive version (about 13.1 s) and the `filepath.Walk()` version (about 13.3 s).

Hopefully this article has shown you how to use some of Go's powerful features to build a parallel system.
We discussed:

* goroutines
* `WaitGroup`
* channels (used as a semaphore)

In fact, in [github.com/golang/tools/imports/fastwalk.go](https://github.com/golang/tools/blob/master/imports/fastwalk.go), Go already has a faster implementation of `filepath.Walk`, and it works on the same principles as this article. Because of the API guarantees on the `filepath` package, `filepath.Walk` itself could only be changed in a Go 2.0 release.
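Putting the three pieces together, the parallel lister might look roughly like the sketch below. It is a hedged illustration rather than the repository's code: the `walk` wrapper, its `slots` and `visit` parameters, and the mutex around `visit` are assumptions introduced here, and `filepath.Join` stands in for the string concatenation used earlier.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// walk lists everything under root, reading at most `slots` directories at
// the same time. The name and signature are illustrative only.
func walk(root string, slots int, visit func(path string)) {
	if slots < 1 {
		slots = 1 // a zero-capacity channel would block the first send forever
	}
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex                   // serializes calls to visit
		sem = make(chan struct{}, slots) // counting semaphore
	)

	var lsFiles func(dir string)
	lsFiles = func(dir string) {
		sem <- struct{}{} // blocks while every slot is taken
		defer func() {
			<-sem // free the slot, then mark this call as finished
			wg.Done()
		}()

		file, err := os.Open(dir)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error opening directory:", err)
			return
		}
		defer file.Close() // runs before the slot is released

		files, err := file.Readdir(-1)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error reading directory:", err)
			return
		}
		for _, f := range files {
			path := filepath.Join(dir, f.Name())
			if f.IsDir() {
				wg.Add(1) // count the goroutine before starting it
				go lsFiles(path)
			}
			mu.Lock()
			visit(path)
			mu.Unlock()
		}
	}

	wg.Add(1)
	lsFiles(root)
	wg.Wait() // returns once every spawned lsFiles call has finished
}

func main() {
	// Assumption: default to the current directory when no path is given.
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	// GOMAXPROCS(0) only queries the current setting.
	walk(root, runtime.GOMAXPROCS(0), func(path string) { fmt.Println(path) })
}
```

A call never waits on its children while it holds a slot, so even a single slot cannot deadlock; the semaphore limits how many directories are open at once, not how many goroutines exist.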
via: https://timhigins.ml/benchmarking-golang-file-io/

Author: Timothy Higinbottom. Translator: Ictar. Proofreader: Rxcai.
This article was originally translated by GCTT and published by the Go Language Chinese Network.
The translation is published for learning and exchange only, under the terms of the CC-BY-NC-SA license. If our work infringes on your rights, please contact us promptly.

Reposting is welcome under the CC-BY-NC-SA license; please keep the links to the original and the translation, along with the author and translator information.