Interview question:
Requirement: using "|" as the field separator, print every line in which the first character of the first field is 1, the second character of the second field is 2, the third character of the third field is 3, and so on. The one exception is the second-to-last field, whose first eight characters must be the current date, for example "20140610".
Source file raw_data.txt:
1|12|a7f865ce-b274-4b23-890c-893c7d1f2198|2016082055104
3|22|bd166d4f-5222-4d69-a277-deb543db1a9d|20141117012936|
1|a2af|1135ea2-067a-4d4c-b56f|01442332|308g5dfg|955226r9|2016092037|20150428102737
1|222|1f3f1950-6b0e-4459-a1ee|sad4sadf|adsa5dfd|7746765|2016092002|
$ date +%Y%m%d
20160920
Code implemented in Python:
$ ./check_string.py
Matched: 1|a2af|1135ea2-067a-4d4c-b56f|01442332|308g5dfg|955226r9|2016092037|20150428102737
#!/usr/bin/env python3
import datetime


def check_item(string):
    """Return True if the line satisfies the requirement."""
    item_list = string.split('|')
    # The second-to-last field carries the date instead of a position digit.
    special_item_id = len(item_list) - 2
    today = datetime.datetime.now().strftime("%Y%m%d")
    for item_id, item in enumerate(item_list):
        if item_id != special_item_id:
            # Field n (1-based) must have the digit n as its n-th character.
            expect_item_sub_value = str(item_id + 1)
            try:
                if item[item_id] != expect_item_sub_value:
                    return False
            except IndexError:
                # The field is too short to have an n-th character.
                return False
        else:
            # The first eight characters must be today's date.
            if today != item[0:8]:
                return False
    return True


if __name__ == '__main__':
    filename = 'raw_data.txt'
    with open(filename) as fp:
        for line in fp:
            line = line.rstrip('\n')
            if check_item(line):
                print("Matched: {0}".format(line))
Code implemented by "O & M @ Su Dong" using awk:
$ cat raw_data.txt | awk -F'|' 'BEGIN{d=strftime("%Y%m%d")}{i=1;j=NF-1;k=0;while(i<j){if(substr($i,i,1)==i){k++};i++};if(k==i-1&&substr($j,1,8)==d&&substr($NF,NF,1)==NF){print $0}}'
1|a2af|1135ea2-067a-4d4c-b56f|01442332|308g5dfg|955226r9|2016092037|20150428102737
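Both implementations depend on the current date, so the sample file only produces a match when run on 2016-09-20. The following is a minimal sketch, not the author's code, that takes the date as a parameter so the rule can be checked reproducibly. The function name `matches` and the `SAMPLE` constant are hypothetical, and fields are stripped of surrounding whitespace as an added assumption.

```python
# The four sample lines from raw_data.txt, without spaces around "|".
SAMPLE = """\
1|12|a7f865ce-b274-4b23-890c-893c7d1f2198|2016082055104
3|22|bd166d4f-5222-4d69-a277-deb543db1a9d|20141117012936|
1|a2af|1135ea2-067a-4d4c-b56f|01442332|308g5dfg|955226r9|2016092037|20150428102737
1|222|1f3f1950-6b0e-4459-a1ee|sad4sadf|adsa5dfd|7746765|2016092002|
"""


def matches(line, today):
    """Return True if `line` satisfies the rule for the date string `today` (YYYYMMDD)."""
    fields = [f.strip() for f in line.split("|")]
    date_index = len(fields) - 2  # the second-to-last field holds the date
    if date_index < 0:
        return False
    for i, field in enumerate(fields):
        if i == date_index:
            if field[:8] != today:
                return False
        elif field[i:i + 1] != str(i + 1):  # slicing avoids IndexError on short fields
            return False
    return True


matched = [ln for ln in SAMPLE.splitlines() if matches(ln, "20160920")]
print(matched)
```

With the date fixed to "20160920", only the third sample line is selected. The fourth line fails because its trailing "|" creates an empty final field.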
Variable    Meaning
ARGC        number of command-line arguments
ARGV        array of command-line arguments
FILENAME    name of the current input file
FNR         record number within the current file
FS          input field separator (default: a space)
RS          input record separator
NF          number of fields in the current record
NR          number of records read so far
OFS         output field separator
ORS         output record separator
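To make the difference between NR, FNR, and NF concrete, here is a rough Python sketch that keeps the same counters while reading two inputs. It only mimics awk's bookkeeping; the function name `scan` and the file names first.txt and second.txt are hypothetical.

```python
import io


def scan(files):
    """Yield (FILENAME, NR, FNR, NF) for each line, mimicking awk's counters."""
    nr = 0  # NR keeps counting across all files
    for name, handle in files:
        for fnr, line in enumerate(handle, start=1):  # FNR restarts for each file
            nr += 1
            fields = line.split()  # default FS: fields are separated by whitespace
            yield name, nr, fnr, len(fields)


files = [
    ("first.txt", io.StringIO("x y z\nu v\n")),
    ("second.txt", io.StringIO("p q\n")),
]
rows = list(scan(files))
for row in rows:
    print(*row)
```

For the first line of the second input, NR is 3 while FNR is back to 1.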
1. awk '/101/' file -- displays the lines of file that match 101.
awk '/101/,/105/' file
awk '$1 == 5' file
awk '$1 == "CT"' file -- the double quotes are required.
awk '$1 * $2 > 100' file
awk '$2 > 5 && $2 <= 15' file
2. awk '{print NR, NF, $1, $NF}' file -- displays the record number, the number of fields, and the first and last fields of each line.
awk '/101/ {print $1, $2 + 10}' file -- displays the first field, and the second field plus 10, of each matching line.
awk '/101/ {print $1$2}' file
awk '/101/ {print $1 $2}' file -- displays the first and second fields of each matching line, with no separator between them.
3. df | awk '$4 > 1000000' -- reads its input from a pipe and displays the lines whose fourth field satisfies the condition.
4. awk -F "|" '{print $1}' file -- operates with the new separator "|".
awk 'BEGIN { FS = "[: \t|]" }
{print $1, $2, $3}' file -- changes the input separator by setting FS (FS = "[: \t|]").
Sep="|"
awk -F "$Sep" '{print $1}' file -- uses the value of the shell variable Sep as the separator.
awk -F '[ :\t|]' '{print $1}' file -- uses a regular expression as the separator; here space, ":", TAB, and "|" all act as separators.
awk -F '[][]' '{print $1}' file -- uses a regular expression as the separator; here it means "[" and "]".
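The two bracket expressions used as separators above can be tried out with Python's `re.split`, which accepts the same character-class syntax for these cases. This is only an illustration of how the classes split a line; the input strings are made up.

```python
import re

# Every character listed in the class (space, ":", TAB, "|") separates fields.
fields = re.split(r"[ :\t|]", "a:b c\td|e")
print(fields)

# In [][] the first "]" is taken literally because it directly follows the
# opening "[", so the class contains exactly "]" and "[".
brackets = re.split(r"[][]", "x[y]z")
print(brackets)
```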
5. awk -f awkfile file -- processes file under the control of the rules in awkfile, applied in order.
cat awkfile
/101/ {print "\047 Hello! \047"} -- prints ' Hello! ' for each matching line; \047 stands for a single quote.
{print $1, $2} -- has no pattern, so it prints the first two fields of every line.
6. awk '$1 ~ /101/ {print $1}' file -- displays the first field of every line (record) whose first field matches 101.
7. awk 'BEGIN { OFS = "%" }
{print $1, $2}' file -- changes the output format by setting the output separator (OFS = "%").
8. awk 'BEGIN { max = 100; print "max=" max }
{max = ($1 > max ? $1 : max); print $1, "Now max is " max}' file -- obtains the maximum value of the first field of file; the BEGIN action runs before any line is processed.
(expression1 ? expression2 : expression3) is equivalent to:
if (expression1)
    expression2
else
    expression3
awk '{print ($1 > 4 ? "high " $1 : "low " $1)}' file
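The running-maximum idiom can be sketched in Python with a conditional expression, which plays the role of awk's `?:` operator. The function name `running_max` and the input values are hypothetical; the starting value of 100 mirrors the BEGIN block.

```python
def running_max(values, start=100):
    """Return the maximum seen so far after each value, starting from `start`."""
    current = start
    history = []
    for v in values:
        # Same shape as awk's: max = ($1 > max ? $1 : max)
        current = v if v > current else current
        history.append(current)
    return history


result = running_max([50, 120, 80, 300])
print(result)
```

Because the maximum starts at 100, values below 100 never become the maximum.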
9. awk '$1 * $2 > 100 {print $1}' file -- displays the first field of every line (record) in which the product of the first and second fields is greater than 100.
10. awk '$1 == "Chi" {$3 = "China"; print}' file -- finds the matching lines, replaces the third field, and then displays the line (record).
awk '{$7 %= 3; print $7}' file -- divides the seventh field by 3, assigns the remainder to the seventh field, and prints it.
11. awk '/tom/ {wage = $2 + $3; print wage}' file -- finds the matching lines, assigns the sum to the variable wage, and prints it.
12. awk '/tom/ {count++}
END {print "tom was found " count " times"}' file -- END marks the action that runs after all input lines have been processed.
13. awk '{gsub(/\$/, ""); gsub(/,/, ""); cost += $4}
END {print "The total is $" cost > "filename"}' file -- gsub replaces every "$" and "," with an empty string; the total is then written to the file named filename.
1 2 3 $1,200.00
1 2 3 $2,300.00
1 2 3 $4,000.00
awk '{gsub(/\$/, ""); gsub(/,/, "");
if ($4 > 1000 && $4 < 2000) c1 += $4;
else if ($4 > 2000 && $4 < 3000) c2 += $4;
else if ($4 > 3000 && $4 < 4000) c3 += $4;
else c4 += $4}
END {printf "c1=[%d]; c2=[%d]; c3=[%d]; c4=[%d]\n", c1, c2, c3, c4}' file
if and else if implement the conditional branches.
awk '{gsub(/\$/, ""); gsub(/,/, "");
if ($4 > 3000 && $4 < 4000) exit;
else c4 += $4}
END {printf "c1=[%d]; c2=[%d]; c3=[%d]; c4=[%d]\n", c1, c2, c3, c4}' file
exit stops reading input when the condition is met, but the END action is still executed.
awk '{gsub(/\$/, ""); gsub(/,/, "");
if ($4 > 3000) next;
else c4 += $4}
END {printf "c4=[%d]\n", c4}' file
next skips the current line when the condition is met and continues with the next line.
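For comparison, the clean-up and bucketing from item 13 can be sketched in Python on the three sample rows. This is an illustrative equivalent, not a translation of awk's record handling: it cleans only the fourth field, which gives the same result for this data. The names `ROWS`, `amount`, and `buckets` are hypothetical.

```python
ROWS = ["1 2 3 $1,200.00", "1 2 3 $2,300.00", "1 2 3 $4,000.00"]


def amount(row):
    """Return field 4 as a number after deleting "$" and "," (the two gsub calls)."""
    return float(row.split()[3].replace("$", "").replace(",", ""))


# Equivalent of: cost += $4
total = sum(amount(r) for r in ROWS)

# Equivalent of the if / else if chain that fills c1..c4.
buckets = {"c1": 0.0, "c2": 0.0, "c3": 0.0, "c4": 0.0}
for r in ROWS:
    v = amount(r)
    if 1000 < v < 2000:
        buckets["c1"] += v
    elif 2000 < v < 3000:
        buckets["c2"] += v
    elif 3000 < v < 4000:
        buckets["c3"] += v
    else:
        buckets["c4"] += v

print(total, buckets)
```

The value 4000 falls into c4 because every comparison in the chain is strict.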
14. awk '{print FILENAME, $0}' file1 file2 file3 > fileall -- writes the entire contents of file1, file2, and file3 to fileall, each line prefixed with the name of the file it came from.
15. awk '$1 != previous {close(previous); previous = $1}
{print substr($0, index($0, " ") + 1) > $1}' fileall -- splits the merged file back into the three original files, with contents identical to the originals.
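The split in item 15 relies on the first field naming the target file and the rest of the line, after the first space, being the original content. A small in-memory Python sketch of that idea follows. It collects lines per name instead of writing files, and the sample lines and the function name `split_merged` are made up.

```python
from collections import defaultdict

merged = [
    "file1 alpha beta",
    "file1 gamma",
    "file2 delta",
    "file3 epsilon zeta",
]


def split_merged(lines):
    """Group lines by their first field, dropping that field from the content."""
    out = defaultdict(list)
    for line in lines:
        # partition(" ") cuts at the first space, like
        # substr($0, index($0, " ") + 1) in the awk command.
        name, _, rest = line.partition(" ")
        out[name].append(rest)
    return dict(out)


parts = split_merged(merged)
print(parts)
```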
16. awk 'BEGIN {"date" | getline d; print d}' -- pipes the output of the date command into getline, stores it in the variable d, and prints it.
17. awk 'BEGIN {system("echo \"Input your name:\\c\""); getline d; print "\nYour name is", d, "\b!\n"}' -- uses getline to read a name from the keyboard and display it.
awk 'BEGIN {FS = ":"; while ((getline < "/etc/passwd") > 0) {if ($1 ~ "050[0-9]_") print $1}}' -- prints the user names in /etc/passwd that contain 050x_.
18. awk '{i = 1; while (i < NF) {print NF, $i; i++}}' file -- loops with a while statement.
awk '{for (i = 1; i < NF; i++) {print NF, $i}}' file -- loops with a for statement.
type file | awk -F "/" '
{for (i = 1; i < NF; i++)
{if (i == NF-1) {printf "%s", $i}
else {printf "%s/", $i}}}' -- displays the path of a file without the file name itself.
Displaying dates with for and if:
awk 'BEGIN {
for (j = 1; j <= 12; j++)
{flag = 0;
printf "\n%d Month\n", j;
for (i = 1; i <= 31; i++)
{
if (j == 2 && i > 28) flag = 1;
if ((j == 4 || j == 6 || j == 9 || j == 11) && i > 30) flag = 1;
if (flag == 0) {printf "%02d%02d ", j, i}
}
}
}'
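The same month and day rules (28 days for February, 30 for April, June, September, and November, 31 otherwise, with no leap-year handling) can be written as a short Python sketch. The helper name `month_days` is hypothetical.

```python
def month_days(month):
    """Days in `month` for a non-leap year, using the same rules as the awk loop."""
    if month == 2:
        return 28
    if month in (4, 6, 9, 11):
        return 30
    return 31


# Build every MMDD string of the year, in order.
dates = [
    "{:02d}{:02d}".format(m, d)
    for m in range(1, 13)
    for d in range(1, month_days(m) + 1)
]
print(len(dates), dates[0], dates[-1])
```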
19. To use a shell variable inside an awk program, close the single quotes around it so that the shell can expand it; written inside double quotes only, it is just a literal string.
Flag=abcd
awk '{print "'$Flag'"}' -- returns abcd
awk '{print "$Flag"}' -- returns $Flag
The following is reposted from chinaunix:
Sum:
$ awk 'BEGIN {total = 0} {total += $4} END {print total}' a.txt ----- sums the fourth field of a.txt.
$ awk '/^(no|so)/' test ----- prints all lines that start with "no" or "so".
$ awk '/^[ns]/ {print $1}' test ----- if the record starts with n or s, prints its first field.
$ awk '$1 ~ /[0-9][0-9]$/ {print $1}' test ----- if the first field ends with two digits, prints that field.
$ awk '$1 == 100 || $2 < 50' test ----- prints the line if the first field equals 100 or the second field is less than 50.
$ awk '$1 != 10' test ----- prints the line if the first field is not equal to 10.
$ awk '/test/ {print $1 + 10}' test ----- if the record matches the regular expression test, adds 10 to the first field and prints the result.
$ awk '{print ($1 > 5 ? "ok " $1 : "error " $1)}' test ----- if the first field is greater than 5, prints the value of the expression after the question mark; otherwise prints the value of the expression after the colon.
$ awk '/^root/,/^mysql/' test ----- prints every record from one that starts with root through the next one that starts with mysql. If another record starting with root appears later, printing resumes until the next record starting with mysql or the end of the file.
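The behavior of a range pattern such as /^root/,/^mysql/ can be modelled in Python with a simple on/off flag. This is a sketch of the semantics described above, not of awk's implementation; the function name `in_range` and the sample lines are hypothetical.

```python
import re


def in_range(lines, start, end):
    """Yield lines from one matching `start` through the next one matching `end`."""
    active = False
    for line in lines:
        if not active and re.search(start, line):
            active = True  # a range opens on a line that matches the start pattern
        if active:
            yield line
            if re.search(end, line):
                active = False  # the range closes after the end line is printed


lines = [
    "bin:x:2",
    "root:x:0",
    "daemon:x:1",
    "mysql:x:27",
    "nobody:x:99",
    "rootkit:x:5",
    "tail:x:6",
]
selected = list(in_range(lines, r"^root", r"^mysql"))
print(selected)
```

The second range opens at "rootkit:x:5" and, since no later line starts with mysql, it runs to the end of the input.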