Introduction to Sed and AWK

Source: Internet
Author: User
AWK is a programming language dedicated to text processing. Yes, it really is a programming language, but it serves only text processing. It is not meant for writing system software or for scientific computing, although it can certainly do arithmetic. Unlike sed, AWK has the features of a full programming language, including built-in functions, control-flow statements, and input and output statements. Its syntax looks a lot like C, but all of its features are focused on text processing. Also unlike sed, AWK's most powerful feature is processing structured text, that is, text with a regular organizational structure such as rows and columns.

Command format:

    awk [-F value] [-v var=value] 'program text' [files...]
    awk [-F value] [-v var=value] -f program-file [files...]

Example:

    [alex@alexon:~]$ awk '{print}' persons.txt
    1011,Alex Perkins,Product,Software Developer
    3923,Jimmey Mills,Operation,COO
    23934,Kevin Kim,Management,CEO
    2321,Chris Paul,UI,Designer

Used this way, awk simply prints every line, just like cat. A more meaningful example:

    [alex@alexon:~]$ awk -F, -v OFS=: '{print $1, $2, $3, $4}' persons.txt
    1011:Alex Perkins:Product:Software Developer
    3923:Jimmey Mills:Operation:COO
    23934:Kevin Kim:Management:CEO
    2321:Chris Paul:UI:Designer

Here -F, tells awk that the fields are separated by commas, and -v OFS=: sets the output field separator to a colon. awk recognizes the structure of the text and reformats the output.

The program is either the quoted 'program text' or the contents of the program file, and it has this form:

    BEGIN {actions} /pattern/ {actions} END {actions}

BEGIN is executed before the file is processed, the part in the middle is called the body loop, and END is executed after processing has finished.
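The following is a minimal sketch of the BEGIN / body / END order, run from a program file with -f. It assumes a POSIX shell with mktemp and any POSIX awk. The function name report_execs, the temporary file names, and the /C[EO]O/ pattern are made up for this illustration, and the sample rows are assumed to match persons.txt above.

```shell
#!/bin/sh
# Sketch: run an awk program file with -f and show the BEGIN / body / END order.
# report_execs and the temporary file names are hypothetical.

report_execs() {
  dir=$(mktemp -d)
  # Sample input, assumed to match persons.txt from the article.
  cat > "$dir/persons.txt" <<'EOF'
1011,Alex Perkins,Product,Software Developer
3923,Jimmey Mills,Operation,COO
23934,Kevin Kim,Management,CEO
2321,Chris Paul,UI,Designer
EOF
  # Program file laid out C-style.
  cat > "$dir/report.awk" <<'EOF'
BEGIN {
    FS = ",";               # runs once, before any input is read
    print "id name";
}
/C[EO]O/ {
    n++;                    # body: runs for each line matching the pattern
    print $1, $2;
}
END {
    print n, "executives";  # runs once, after the last line
}
EOF
  awk -f "$dir/report.awk" "$dir/persons.txt"
  rm -rf "$dir"
}

report_execs
```

The header is printed first and the count last, even though the body ran twice in between. Passing the same program as a single-quoted argument instead of using -f behaves the same way.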
You can use \ to split the program across several lines:

    BEGIN {action} \
    /pattern/ {action} \
    END {action}

If the program is written in a file, program-file.awk can be laid out like C code:

    BEGIN {
        actions;
    }
    /pattern/ {
        actions;
    }
    END {
        actions;
    }

How AWK executes a program: it first runs the BEGIN block, then runs the body for each line of the input, and after all lines have been processed it runs the END block. In other words, BEGIN and END are each executed only once, while the body is executed many times, depending on the number of lines and on which of them match the pattern. Because it runs repeatedly, it is called the body loop.

Built-in variables: AWK assumes that its input is structured text in the form of a table, where each line is a record and each column is a field. When AWK reads data, it processes the text in this structured way, using the following built-in variables:

    FS -- Field Separator; by default, fields are separated by runs of whitespace (spaces and tabs)
    RS -- Record Separator; by default, records are separated by newlines
    FILENAME -- the name of the current input file
    NF -- Number of Fields in the current record
    NR -- Number of Records read so far; because records are separated by newlines, this equals the line number. With multiple files it keeps increasing across files.
    FNR -- File Number of Records: the record number within the current file, counted separately for each file
    $0 -- the whole current record
    $n -- the nth field of the current record

With these built-in variables, AWK breaks structured text down as it reads it. The goal is to turn the input into structured information in the form of a table.
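A small sketch of how NR and FNR differ when awk reads two files, assuming a POSIX shell with mktemp and any POSIX awk. The file names a.csv and b.csv and the function show_counters are hypothetical, and the rows are assumed to be the first three lines of persons.txt. $NF is the last field, because NF holds the field count.

```shell
#!/bin/sh
# Sketch: compare NR and FNR across two input files.
# show_counters, a.csv and b.csv are hypothetical names.

show_counters() {
  dir=$(mktemp -d)
  printf '%s\n' '1011,Alex Perkins,Product,Software Developer' \
                '3923,Jimmey Mills,Operation,COO' > "$dir/a.csv"
  printf '%s\n' '23934,Kevin Kim,Management,CEO' > "$dir/b.csv"
  # cd first so FILENAME prints the short names instead of the temp path.
  # Columns: file name, NR, FNR, NF, last field.
  (cd "$dir" && awk -F, '{ print FILENAME, NR, FNR, NF, $NF }' a.csv b.csv)
  rm -rf "$dir"
}

show_counters
```

NR keeps counting into b.csv (it reaches 3), while FNR starts again at 1 for the second file.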
Correspondingly, there are output variables that control the output format:

    OFS -- Output Field Separator, placed between the fields that print outputs (default: a single space)
    ORS -- Output Record Separator, written after each record that print outputs (default: a newline)

The print statement outputs its arguments as strings, and every expression that follows it is treated as a string. When the arguments are separated by commas, OFS is placed between them. When they are separated only by spaces, they are concatenated directly, with no separator at all:

    [alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {print $1, $2, $3, $4}' persons.txt
    1011;Alex Perkins;Product;Software Developer
    3923;Jimmey Mills;Operation;COO
    23934;Kevin Kim;Management;CEO
    2321;Chris Paul;UI;Designer
    [alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {print $1 $2 $3 $4}' persons.txt
    1011Alex PerkinsProductSoftware Developer
    3923Jimmey MillsOperationCOO
    23934Kevin KimManagementCEO
    2321Chris PaulUIDesigner

print with no arguments outputs the current record ($0).

The printf statement formats its output much like printf in C:

    [alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {printf "%d:", NR; print $1, $2, $3, $4}' persons.txt
    1:1011;Alex Perkins;Product;Software Developer
    2:3923;Jimmey Mills;Operation;COO
    3:23934;Kevin Kim;Management;CEO
    4:2321;Chris Paul;UI;Designer

As a programming language, AWK is very similar to C: it has operators, built-in functions, and variables, with which you can implement very powerful processing. This part is not needed often and cannot be covered properly in one article, so refer to the awk man page or a book on AWK.

Regular expressions: the regular expression metacharacters are ^ $ . [ ] | ( ) * + ? and AWK uses them as in standard (POSIX extended) regular expressions.

Anchors and the dot:

    ^ -- the beginning of the string being matched (normally the current line)
    $ -- the end of the string being matched
    . -- any single character (in awk this includes a newline, although with the default RS a record never contains one)
    \y -- a word boundary (GNU awk only). GNU sed and grep write this as \b, but in awk \b is the backspace escape. A word is a sequence of letters, digits, and underscores, and \y can be placed at either end of a word or at both ends.
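Quantifiers:

    * -- zero or more times
    + -- one or more times
    ? -- zero or one time
    {m} -- exactly m times
    {m,n} -- m to n times; for example, {1,5} matches 1 to 5 occurrences (1, 2, 3, 4, or 5 times)

Escape character:

    \ -- escapes a special character so that it is matched literally; for example, \. matches a literal dot

Character sets:

    [ ] -- any one of the characters inside the brackets
    [^ ] -- any one character that is not in the set

Operators:

    | -- alternation: abc|123 matches abc or 123 (GNU sed's basic syntax writes this as abc\|123)
    ( ) -- grouping, so that an operator applies to the whole group, as in (ab)+
    \n -- in sed, a back-reference to the nth earlier group: /\(123\)\1/ matches 123123. awk's extended regular expressions do not support back-references, so in awk parentheses are used only for grouping.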
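A sketch that applies some of these operators with awk. It assumes any POSIX awk and sticks to features every POSIX awk has (no interval expressions, no back-references). The helper names persons, execs and names_ck are hypothetical, and the rows are assumed to match persons.txt above. The ~ operator, awk's match operator, tests a single field instead of the whole line.

```shell
#!/bin/sh
# Sketch: exercise some of the regex operators with awk.
# persons, execs and names_ck are hypothetical helper names.

# Sample input, assumed to match persons.txt from the article.
persons() {
  printf '%s\n' \
    '1011,Alex Perkins,Product,Software Developer' \
    '3923,Jimmey Mills,Operation,COO' \
    '23934,Kevin Kim,Management,CEO' \
    '2321,Chris Paul,UI,Designer'
}

# ^ and $ anchor the whole line, [0-9]+ is one or more digits,
# .* is any run of characters, and (CEO|COO) groups two alternatives.
# Prints the IDs of the rows whose last field is CEO or COO.
execs() {
  persons | awk -F, '/^[0-9]+,.*,(CEO|COO)$/ { print $1 }'
}

# $2 ~ /.../ tests only the second field; [CK] is a character set and
# [^ ]+ is one or more non-space characters followed by a space.
names_ck() {
  persons | awk -F, '$2 ~ /^[CK][^ ]+ / { print $2 }'
}

execs
names_ck
```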