This article is based on the latest [justforfunc](http://justforfunc.com/) episode of the same title. The [code](https://github.com/campoy/justforfunc/blob/master/24-go-scanner/main.go) for this program can be found in the [justforfunc repository](https://github.com/campoy/justforfunc).

### Problem Statement

Imagine you want to extract all of the identifiers from the following code snippet.

```go
package main

import "fmt"

func main() {
	fmt.Println("Hello, World")
}
```

We expect to get a list containing **main**, **fmt**, and **Println**.

### What exactly is an identifier?

To answer this question, we need a little theory about computer languages. Only a little, so don't worry about it getting complicated.

A computer language is defined by a set of rules that describe what is valid. For example, take the following rule:

```
IfStmt = "if" [ SimpleStmt ";" ] Expression Block [ "else" ( IfStmt | Block ) ] .
```

This rule tells us what an if statement looks like in Go. **"if"**, **";"**, and **"else"** are fixed pieces (the keywords `if` and `else` and the `;` separator) that help us recognize the structure of the program. The remaining names, **Expression**, **Block**, and **SimpleStmt**, refer to other rules.

The set of these rules is the grammar, and you can find their detailed definitions in the Go language specification.

These rules are not defined in terms of individual characters of the program, but in terms of a sequence of tokens. Besides atomic tokens such as **if** and **else**, there are composite tokens such as the integer 42, the float 4.2, and the string "Hello", as well as identifiers such as **main**.

But how do we know that main is an identifier and not a number? It turns out there is a rule for that as well. If you read the identifiers section of the Go language specification, you will find the following rule:

```
identifier = letter { letter | unicode_digit } .
```

In the identifier rule, letter and unicode_digit are not tokens but characters. So with these rules we can write a program that goes through the source character by character and, whenever a group of characters matches a rule, "emits" a token.

So, if we take **fmt.Println** as an example, it produces these tokens: the identifier **fmt**, **"."**, and the identifier **Println**. Is this a function call? At this point we don't know, and we don't care. The result is simply a sequence that records the order in which the tokens appear.

![](https://raw.githubusercontent.com/studygolang/gctt-images/master/most-common-identifier/1.png)

A program that generates a sequence of tokens from a given sequence of characters is called a scanner. The Go standard library provides one in go/scanner. The tokens it generates are defined in go/token.

### Using go/scanner

Now that we know what a scanner is, how do we use it?

#### Reading arguments from the command line

Let's start with a simple program that prints the arguments passed to it:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// At least one file name is required.
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage:\n\t%s [files]\n", os.Args[0])
		os.Exit(1)
	}

	for _, arg := range os.Args[1:] {
		fmt.Println(arg)
	}
}
```

Next, we need to scan the files passed in as arguments: for each one we create a new scanner and initialize it with the contents of the file.

#### Printing each token

Before we call the scanner's Init method, we need to read the contents of the file and create a **token.FileSet**, which holds a **token.File** for each scanned file.

Once the scanner is initialized, we can call its Scan method and print the tokens. When we get an EOF (end of file) token, we have reached the end of the file.

```go
fs := token.NewFileSet()

for _, arg := range os.Args[1:] {
	b, err := ioutil.ReadFile(arg)
	if err != nil {
		log.Fatal(err)
	}

	// Register the file in the file set so positions can be resolved.
	f := fs.AddFile(arg, fs.Base(), len(b))
	var s scanner.Scanner
	s.Init(f, b, nil, scanner.ScanComments)

	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		fmt.Println(tok, lit)
	}
}
```

#### Counting identifiers

To count how often each identifier appears, the best approach in Go is a map with the identifier as the key and the number of occurrences as the value.

Every time an identifier appears, its counter is incremented by one. Finally, we convert the map into a slice that can be sorted and printed.

```go
counts := make(map[string]int)

// [code removed for clarity]

for {
	_, tok, lit := s.Scan()
	if tok == token.EOF {
		break
	}
	if tok == token.IDENT {
		counts[lit]++
	}
}

// [code removed for clarity]

type pair struct {
	s string
	n int
}
pairs := make([]pair, 0, len(counts))
for s, n := range counts {
	pairs = append(pairs, pair{s, n})
}

// Sort by count, most frequent first.
sort.Slice(pairs, func(i, j int) bool { return pairs[i].n > pairs[j].n })

// Print at most the top five.
for i := 0; i < len(pairs) && i < 5; i++ {
	fmt.Printf("%6d %s\n", pairs[i].n, pairs[i].s)
}
```

Some code has been removed for clarity. You can get the full source code [here](https://github.com/campoy/justforfunc/blob/master/24-go-scanner/main.go).

### What are the most commonly used identifiers?

Let's use this program to analyze the code in github.com/golang/go:

```bash
$ go install github.com/campoy/justforfunc/24-ast/scanner
$ scanner ~/go/src/**/*.go
 82163 v
 46584 err
 44681 Args
 43371 t
 37717 x
```

When short identifiers are included, the most common identifier is the single letter **v**. Let's modify the code so that it only counts identifiers of three or more characters:

```go
for s, n := range counts {
	if len(s) >= 3 {
		pairs = append(pairs, pair{s, n})
	}
}
```

Let's run it again:

```bash
$ go install github.com/campoy/justforfunc/24-ast/scanner
$ scanner ~/go/src/**/*.go
 46584 err
 44681 Args
 36738 nil
 25761 true
 21723 AddArg
```

Sure enough, err and nil are the most common identifiers. After all, nearly every program contains statements such as `if err != nil`.
But why does Args appear so often? Find out in the next installment.
via:https://medium.com/@francesc/whats-the-most-common-identifier-in-go-s-stdlib-e468f3c9c7d9
Author: Francesc Campoy. Translator: Kaneg. Proofreader: polaris1119.
This article was originally translated by GCTT and published by the Go Language Chinese Network.
The translation is published solely for the purpose of learning and exchange, under the terms of the CC-BY-NC-SA license. If our work infringes on your rights, please contact us promptly.
Reposting under the CC-BY-NC-SA license is welcome; please credit the source and keep the original and translation links as well as the author and translator information.