Shell (string truncation)

Source: Internet
Author: User

cut processes its input one line at a time. This is the same mechanism as sed. (An introduction to sed will be released soon.)

2 What does cut locate by?

That is to say, how do I tell cut which part I want? The cut command accepts three positioning methods:

First, bytes, with option -b
Second, characters, with option -c
Third, fields, with option -f

3 The simplest example of locating by byte

For example, when you run the who command, the output looks like this:

    [rocrocket@rocrocket programming]$ who
    rocrocket :0 11:07
    rocrocket pts/0 (:0.0)
    rocrocket pts/1 (:0.0)

To extract the 3rd byte of each line:

    [rocrocket@rocrocket programming]$ who | cut -b 3
    c
    c
    c

As you can see, -b specifies which byte to extract. The space between -b and 3 is optional, but it is recommended. :)

4 What if I want the 3rd, 4th, 5th, and 8th bytes?

-b accepts ranges written like 3-5, and multiple positions are separated by commas. Here is an example:

    [rocrocket@rocrocket programming]$ who | cut -b 3-5,8
    croe
    croe
    croe

Note that when the -b option is used, cut first sorts all the positions given after -b and only then extracts them. Writing the positions in a different order does not change the order of the output. This example shows it:

    [rocrocket@rocrocket programming]$ who | cut -b 8,3-5
    croe
    croe
    croe

5 What other forms like "3-5" are there?

    [rocrocket@rocrocket programming]$ who
    rocrocket :0 11:07
    rocrocket pts/0 (:0.0)
    rocrocket pts/1 (:0.0)
    [rocrocket@rocrocket programming]$ who | cut -b -3
    roc
    roc
    roc
    [rocrocket@rocrocket programming]$ who | cut -b 3-
    crocket :0 11:07
    crocket pts/0 (:0.0)
    crocket pts/1 (:0.0)

As you have probably noticed, -3 means from the first byte through the third byte, and 3- means from the third byte to the end of the line. If you look carefully, you can see that both forms include the third byte, "c".
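The range forms above can also be tried on a fixed line instead of live who output, which makes the results repeatable. This is a minimal sketch, assuming a POSIX-style cut; the variable names are only illustrative.

```shell
#!/bin/sh
# One fixed line (copied from the who output above) so the results are repeatable.
sample='rocrocket pts/0 (:0.0)'

# N: a single byte.
third=$(printf '%s\n' "$sample" | cut -b 3)

# N-M plus a single position, separated by a comma.
picked=$(printf '%s\n' "$sample" | cut -b 3-5,8)

# The list is sorted before extraction, so another order gives the same result.
reordered=$(printf '%s\n' "$sample" | cut -b 8,3-5)

# -M: from the first byte through byte M.  N-: from byte N to the end of the line.
head3=$(printf '%s\n' "$sample" | cut -b -3)
tail3=$(printf '%s\n' "$sample" | cut -b 3-)

printf '%s\n' "$third" "$picked" "$reordered" "$head3" "$tail3"
```

Feeding cut from printf rather than from who keeps the example independent of who happens to be logged in.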
What happens if I run who | cut -b -3,3-? The answer is that the whole line is output, and the c where the two ranges overlap is not printed twice. Take a look:

    [rocrocket@rocrocket programming]$ who | cut -b -3,3-
    rocrocket :0 11:07
    rocrocket pts/0 (:0.0)
    rocrocket pts/1 (:0.0)

6 The simplest example of locating by character

The following example should look familiar. It extracts the 3rd, 4th, 5th, and 8th characters:

    [rocrocket@rocrocket programming]$ who | cut -c 3-5,8
    croe
    croe
    croe

Why is this no different from -b? Do -b and -c do the same thing? No. It only looks that way because the example is a poor one: who outputs only single-byte characters, so -b and -c give the same result. With Chinese characters the difference shows:

    [rocrocket@rocrocket programming]$ cat cut_ch.txt
    星期一
    星期二
    星期三
    星期四
    [rocrocket@rocrocket programming]$ cut -b 3 cut_ch.txt
    �
    �
    �
    �
    [rocrocket@rocrocket programming]$ cut -c 3 cut_ch.txt
    一
    二
    三
    四

With -c, positions are counted in characters and the output is correct, while -b naively counts in bytes (8 bits each) and the output is garbled. Whether -c really counts multi-byte characters depends on the cut implementation and the locale in use.

While on this point, here is one more thing worth knowing. For multi-byte characters you can use the -n option, which tells cut not to split multi-byte characters. Example:

    [rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -b 2
    �
    �
    �
    �
    [rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 2
    [rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 1,2,3
    星
    星
    星
    星

With -nb 2 nothing visible is printed, because byte 2 alone is not a complete character. With -nb 1,2,3 the three bytes together form the complete first character.

6 What about fields?

Why is there "field" extraction at all? Because -b and -c can only extract information from text in a fixed format; they are helpless when the format is not fixed. That is where fields come in. (The following assumes you know the content and layout of the /etc/passwd file.)
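The limitation just described can be seen on two lines whose first item differs in length. This is a minimal sketch, assuming sample lines in /etc/passwd layout (the values are illustrative, not read from a real system); it uses -d to set the delimiter and -f to pick a field.

```shell
#!/bin/sh
# Two sample lines in /etc/passwd layout; the user names differ in length.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# Fixed character positions: 4 characters fit "root" but cut "daemon" short.
by_position=$(printf '%s\n' "$sample" | cut -c 1-4)

# Field extraction: split each line on ":" and take the first field.
by_field=$(printf '%s\n' "$sample" | cut -d : -f 1)

printf 'by position:\n%s\n' "$by_position"
printf 'by field:\n%s\n' "$by_field"
```

The position-based call has no way to know where a name ends, while the field-based call follows the delimiter wherever it falls.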
If you have looked at the /etc/passwd file, you will find that, unlike the output of who, it is not in a fixed-width format: the items vary in length. However, the colon plays a very important role in each line of the file, because it separates the items. Luckily, the cut command provides exactly this kind of extraction: set the delimiter, then say which fields to extract. Take the first five lines of /etc/passwd as an example:

    [rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    [rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1
    root
    bin
    daemon
    adm
    lp

Use -d to set the delimiter to a colon, use -f to say that you want the first field, press Enter, and all the user names are listed. Quite a sense of accomplishment!

Of course, when setting -f you can also use forms such as 3-5 or 4-:

    [rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5
    root:0:0:root
    bin:1:1:bin
    daemon:2:2:daemon
    adm:3:4:adm
    lp:4:7:lp
    [rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5,7
    root:0:0:root:/bin/bash
    bin:1:1:bin:/sbin/nologin
    daemon:2:2:daemon:/sbin/nologin
    adm:3:4:adm:/sbin/nologin
    lp:4:7:lp:/sbin/nologin
    [rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f -2
    root:x
    bin:x
    daemon:x
    adm:x
    lp:x

7 How can spaces and tabs be told apart?

Things can look a bit messy. What should I do? Tabs are sometimes hard to recognize. Here is a way to see whether a gap consists of several spaces or of a tab:

    [rocrocket@rocrocket programming]$ cat tab_space.txt
    this is tab     finish.
    this is several space      finish.
    [rocrocket@rocrocket programming]$ sed -n l tab_space.txt
    this is tab\tfinish.$
    this is several space      finish.$

A tab is displayed as the symbol \t, and a space is displayed as is. This method tells tabs and spaces apart. Note that the character after sed -n is a lowercase letter L. (There is also the pipe symbol |. The letter l, the digit 1, and | are really hard to tell apart, harder than tabs, it seems...)

8 What should I pass to cut -d for tabs or spaces?

Here is a secret: the default delimiter of cut is the tab, so when you want to split on tabs you can omit the -d option and use -f directly to pick the field. Trust me!

If you set a space as the delimiter, note the following:

    [rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d ' ' -f 1
    this
    this

There must be exactly one space between the two single quotes. You can set only one space after -d, not several, because cut only allows the delimiter to be a single character:

    [rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d '  ' -f 1
    cut: the delimiter must be a single character
    Try `cut --help' for more information.

9 Why does the last line always seem to repeat when ps and cut are used together?

The problem is as follows. When cut and ps are used together:

    [rocrocket@rocrocket programming]$ ps
      PID TTY          TIME CMD
     2977 pts/0    00:00:00 bash
     5032 pts/0    00:00:00 ps
    [rocrocket@rocrocket programming]$ ps | cut -b3
    P
    9
    0
    0

Look, the final 0 is printed twice! I also tried ps ef and ps aux. The problem does not occur when cut works with other commands. For example, cut with who is normal:

    [rocrocket@rocrocket programming]$ who
    rocrocket :0 11:07
    rocrocket pts/0 (:0.0)
    rocrocket pts/1 (:0.0)
    [rocrocket@rocrocket programming]$ who | cut -b3
    c
    c
    c

This seemingly weird and unsolvable question has been answered by sunway. Thank you very much. The original post address is [here]. In fact, the cause is as follows.
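The effect can be reproduced without a live process table. This is a minimal sketch that uses canned text in place of real ps output. The first three lines follow the ps listing shown above, with the column alignment assumed. The fourth line, a cut process with PID 5033, is a hypothetical addition standing in for the extra process that appears once ps runs inside a pipeline.

```shell
#!/bin/sh
# Canned text standing in for live ps output, so the result is repeatable.
ps_alone='  PID TTY          TIME CMD
 2977 pts/0    00:00:00 bash
 5032 pts/0    00:00:00 ps'

# Assumption for illustration: the pipeline adds one more process line.
ps_in_pipe="$ps_alone
 5033 pts/0    00:00:00 cut"

# Third byte of every line, joined onto one line for easy comparison.
third_alone=$(printf '%s\n' "$ps_alone" | cut -b 3 | tr -d '\n')
third_piped=$(printf '%s\n' "$ps_in_pipe" | cut -b 3 | tr -d '\n')

echo "ps alone:     $third_alone"
echo "ps in a pipe: $third_piped"
```

The second result is one byte longer because the input has one more line, not because any line was processed twice.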
The pipeline ps | cut itself creates a process, so ps captures that process as well and sends it to cut through the pipe. That is why there is one extra line after cut. It looks like a repeat of the previous line only because the extracted character happens to be the same as the one from the previous line. Run ps and then ps | cat, and you will see the reason! :)

10 What are the defects and shortcomings of cut?

Have you guessed? Yes, it is the handling of multiple spaces. If some fields in the file are separated by several spaces, cut is awkward to use, because cut is only good at text that is separated by a single character.

When you meet a string problem in the shell, the first tools that come to mind are grep, sed, awk, and cut. First, let's see what can be done the simple way for string truncation, with ${}.

(1) The difference between single quotes and double quotes in the shell: single quotes turn off the special meaning of all characters, while double quotes only make the shell ignore most of them. Three special characters are not ignored inside double quotes: the dollar sign ($), the backquote (`), and the backslash (\).

(2) String length: expr "$x" : '.*' or ${#x}

(3) Substring: ${x:$pos:$len}, where $pos is the position and $len is the length. $len can be omitted.

(4) String replacement: ${x/a/b} replaces the first a with b; ${x//a/b} replaces every a with b.

(5) Trimming the head and tail of a string: ${x##*/} removes everything up to and including the last / (other patterns can be used in place of */). ${x#*/} removes only up to and including the first /. Matching goes from left to right. ${x%%/*} removes everything from the first / to the end, and ${x%/*} removes only from the last / to the end. Matching goes from right to left. For example:

    #!/bin/bash
    y=kdjfkd:dfkdjfkd:8888:9899899:kdjfkdfj
    q=`echo $y | cut -d ":" -f4`   # split on ":" and take the fourth field, that is, 9899899
    m=${q##*8}                     # remove everything up to the last 8
    echo $m
    n=${q#*8}                      # remove everything up to the first 8
    echo $n

Result:

    99
    99899

(6) Shell arrays: define one with a=(1 2 3 4). There must be no spaces around the equals sign: a = (1 2 3 4) and a= (1 2 3 4) are not allowed.

(7) Array length: ${#a[@]} or ${#a[*]}. All elements: ${a[@]} or ${a[*]}, which returns 1 2 3 4.

(8) Length of an array element: ${#a[i]}, where i is the subscript. As in other languages, subscripts start from 0. The element itself is ${a[i]}.

(9) awk uses whitespace as the field separator by default. "+" and "?" apply to awk, but not to sed and grep in their default (basic regular expression) mode.

awk condition operators: <, <=, >, >=, ==, !=, ~, !~
awk string processing functions: gsub(r,s), gsub(r,s,t), index(s,t), and others
awk built-in variables: FILENAME, FNR, FS, NF, NR, and others

    awk -F: '{print $1}' /etc/passwd

prints the value of the first column; $0 prints the whole line.

    awk -F: 'BEGIN {print "name passwd\n-----------------"} {print $1 "\t" $5} END {print "End of file"}' /etc/passwd

adds a header and a footer to the output.

    awk '{if ($0 ~ /root/) print $0}' /etc/passwd

prints the lines matching "root". It is equivalent to awk '$0 ~ /root/' /etc/passwd.

(10) Whatever the command, sed does not touch the original file. It only operates on a copy of the file. If the output is not redirected to a file, it goes straight to standard output (the display). Searching and replacing with sed:

    sed -n '1,$p' /etc/passwd                    print everything from line 1 to the last line
    sed -e '/root/=' /etc/passwd                 print the line numbers of lines matching "root", plus all lines
    sed -n '/root/=' /etc/passwd                 print only the line numbers
    sed -n -e '/root/p' /etc/passwd              print only the lines matching "root"
    sed -n -e '/root/p' -e '/root/=' /etc/passwd print the matching lines and their line numbers
    sed 's/^M//g' /etc/passwd                    delete the control character ^M at the end of a line (^M is typed as Ctrl+V followed by Ctrl+M)
    sed 's/^0*//g' /etc/passwd                   delete the zeros at the beginning of a line

(11) grep is generally used to search for fields or strings, sed is used to search or replace, and awk can perform complex and custom operations.

(12) Setting shell versions aside for now, let's look at shell variables. There are three kinds: system variables, environment variables, and user variables.

System variables:
$# is the number of parameters passed to the script;
$$ is the ID of the process in which the script runs;
$? is the exit status of the last command, where 0 means success;
$! is the PID of the last background command;
$@ holds all parameters in the form "parameter 1" "parameter 2" ...;
$* holds all parameters in the form "parameter 1 parameter 2 ...";
$0 is the script name.

User variables: use the set command to view them.
Environment variables: use setenv to view them (in csh; Bourne-type shells use env).

(13) Different shells assign arrays in different ways. The original Bourne shell (sh) does not support arrays, while bash does.

(14) To view only directories, regular files, or symbolic links: ls -l | grep '^d', ls -l | grep '^-', or ls -l | grep '^l'

(15) To view the last column: ls -l | grep '^l' | awk '{print $NF}' (fields are separated by whitespace by default). In awk, NF is the number of fields and $NF is the last field.
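Returning to the ${} forms in items (2) and (5), the sketch below shows the length and trimming expansions side by side. It assumes a hypothetical path value chosen only for illustration, and it sticks to the forms that POSIX sh defines, so it does not cover the substring and replacement forms from items (3) and (4), which need bash.

```shell
#!/bin/sh
# Hypothetical path used only for illustration.
x='/var/spool/lpd/queue.txt'

len=${#x}           # string length
base=${x##*/}       # longest "*/" match removed from the left: the file name
no_first=${x#*/}    # shortest "*/" match removed from the left: leading / only
dir=${x%/*}         # shortest "/*" match removed from the right: the directory
empty=${x%%/*}      # longest "/*" match removed from the right: nothing remains

# The value from the example script in item (5).
q=9899899
m=${q##*8}          # everything up to the last 8 removed
n=${q#*8}           # everything up to the first 8 removed

printf '%s\n' "$len" "$base" "$no_first" "$dir" "[$empty]" "$m" "$n"
```

A doubled symbol (## or %%) asks for the longest match and a single one for the shortest, which is the only difference between the paired forms.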
