When writing a compiler's lexer or parser, besides the words lexer and parser themselves, tokenize and tokenizer come up constantly; practically every piece of source code that does lexical analysis uses tokenize somewhere.
The term was coined by developers working in English; had it been named by someone else, perhaps a simpler, less evocative word would have been chosen. Different languages and cultures shape different ways of thinking, so the Chinese way of thinking naturally differs from the Western one, and a Westerner would likely find our terminology just as hard to grasp.
In any case, good ideas are worth learning from. If tokenize is used this often, surely the term carries real meaning? So how should it be translated? Here is a passage from Mastering Java 2, translated by Qiu Zhongpan:
The StreamTokenizer class extracts identifiable substrings and delimiter symbols from an input stream according to user-defined rules. This process is called tokenizing, because the stream is reduced to tokens. A token usually represents a keyword, variable name, string, literal, brace, or other piece of syntactic punctuation.
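To make the passage concrete, here is a minimal sketch using java.io.StreamTokenizer (the class name and input string below are my own choices, purely for illustration). It reduces a small character stream to word, number, and punctuation tokens:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class TokenizeDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical input fragment: an identifier, an operator, a number literal.
        StreamTokenizer st = new StreamTokenizer(new StringReader("count = 42 ;"));

        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:    // identifiers / keywords
                    System.out.println("WORD:   " + st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:  // numeric literals
                    System.out.println("NUMBER: " + st.nval);
                    break;
                default:                         // single-character tokens such as '=' and ';'
                    System.out.println("CHAR:   " + (char) st.ttype);
            }
        }
    }
}
```

Running it prints one line per token (WORD: count, CHAR: =, NUMBER: 42.0, CHAR: ;), which is exactly the "stream reduced to tokens" the quoted passage describes.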
Qiu Zhongpan's renderings, roughly:
Token: a token (a mark, a recognized unit)
Tokenize: to tokenize (reduce the stream to tokens)
Tokenizer: token parser
Another translation I have seen renders token as "tag", tokenize as "tag parsing" (or "parsing into tags"), and tokenizer as "tag parser".
As I understand it, tokenize is responsible for breaking the code down into "strings" (tokens), and the parser then builds the corresponding syntax structure from the order in which those "strings" appear. Rendering token as a "mark" seems more vivid, yet it still sounds stiff to me, while translating it as "tag" feels too narrow. I cannot find a better word for it; in the end, understanding the concept is what matters most.
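For what it's worth, here is a rough sketch of that division of labour (the class and the toy grammar are hypothetical, just to make the point): tokenize() only cuts the text into a flat list of "strings", and it is parse that gives those strings structure, based purely on their order.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyCalc {

    // Step 1: tokenize - cut the source text into number and operator "strings".
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {               // a number token
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else {                                  // a one-character operator token
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    // Step 2: parse - a tiny recursive-descent parser over the token list.
    static int pos;

    static int parseExpr(List<String> t) {            // expr := term (('+'|'-') term)*
        int value = parseTerm(t);
        while (pos < t.size() && (t.get(pos).equals("+") || t.get(pos).equals("-"))) {
            String op = t.get(pos++);
            int rhs = parseTerm(t);
            value = op.equals("+") ? value + rhs : value - rhs;
        }
        return value;
    }

    static int parseTerm(List<String> t) {            // term := number ('*' number)*
        int value = Integer.parseInt(t.get(pos++));
        while (pos < t.size() && t.get(pos).equals("*")) {
            pos++;
            value *= Integer.parseInt(t.get(pos++));
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("1 + 2 * 3 - 4");
        System.out.println(tokens);                   // [1, +, 2, *, 3, -, 4]
        pos = 0;
        System.out.println(parseExpr(tokens));        // 3
    }
}
```

Note that operator precedence shows up only in the parser; the tokenizer knows nothing about it, which is exactly the separation of responsibilities described above.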
I wonder how you understand it, and how you would translate it.