Usage of the Linux cut command
Type: Reprint  Time: 2013-10-03
cut is a selection command: it analyzes a piece of data and takes out the part we want. In general, the selection works on the data line by line rather than on the input as a whole.
(1) The syntax format is:
cut -b list [-n] [file] or cut -c list [file] or cut -f list [-d delim] [file]
Instructions for use
The cut command cuts bytes, characters, or fields from each line of a file and writes them to standard output.
If you do not specify a file parameter, cut reads standard input. One of the -b, -c, or -f flags must be specified.
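As a minimal sketch of the two input modes (the sample text and the temporary file are made up for illustration), the following reads the same line once from a file operand and once from standard input:

```shell
# Hypothetical sample data written to a temporary file.
tmpfile=$(mktemp)
printf 'hello world\n' > "$tmpfile"

# File operand given: cut reads the file.
from_file=$(cut -b 1-5 "$tmpfile")

# No file operand: cut reads standard input.
from_stdin=$(cut -b 1-5 < "$tmpfile")

echo "$from_file"    # hello
echo "$from_stdin"   # hello
rm -f "$tmpfile"
```

Both forms should give the same result; the pipe form used in the rest of this article is simply the standard-input case.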
Main parameters
-b: Split by bytes. These byte positions ignore multibyte character boundaries unless the -n flag is also specified.
-c: Split by characters.
-d: Custom delimiter; the default is the tab character.
-f: Used with -d to specify which fields to display.
-n: Do not split multibyte characters. Used only with the -b flag. If the last byte of a character falls within the range indicated by the list parameter of the -b flag, the character is written out; otherwise, the character is excluded.
(2) What does cut base its selection on? In other words, how do I tell cut what I want to locate?
The cut command accepts three positioning methods:
First, bytes, with option -b
Second, characters, with option -c
Third, fields, with option -f
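To make the three methods concrete, here is a small sketch that applies each one to the same line. The sample line is invented for illustration and contains only single-byte characters, so bytes and characters coincide:

```shell
# Made-up sample line with colon-separated fields.
line='alpha:beta:gamma'

by_byte=$(printf '%s\n' "$line" | cut -b 1-3)       # bytes 1 to 3
by_char=$(printf '%s\n' "$line" | cut -c 7-10)      # characters 7 to 10
by_field=$(printf '%s\n' "$line" | cut -d : -f 3)   # third colon-separated field

echo "$by_byte"    # alp
echo "$by_char"    # beta
echo "$by_field"   # gamma
```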
(3) Positioning by byte
For example, when you run the who command, it outputs something like this:
$ who
rocrocket :0 2009-01-08 11:07
rocrocket pts/0 2009-01-08 11:23 (:0.0)
rocrocket pts/1 2009-01-08 14:15 (:0.0)
To extract the 3rd byte of each line, run:
$ who | cut -b 3
c
c
c
(4) When positioning by byte, what if I want to extract the 3rd, 4th, 5th and 8th bytes?
-b supports range notation such as 3-5, and multiple positions are separated by commas. Let's look at an example:
$ who | cut -b 3-5,8
croe
croe
croe
One thing to note: when you use the -b option, cut first sorts all the positions given after -b and then extracts them. You cannot change the output order by reordering the positions. This example illustrates the point:
$ who | cut -b 8,3-5
croe
croe
croe
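The same behaviour can be checked without a live who session. This sketch feeds the user name from the listing as fixed input and compares the two orderings:

```shell
# The user name from the who output, used as fixed sample input.
name='rocrocket'

in_order=$(printf '%s\n' "$name" | cut -b 3-5,8)
reversed=$(printf '%s\n' "$name" | cut -b 8,3-5)

echo "$in_order"   # croe
echo "$reversed"   # croe
```

If a different output order is really needed, cut alone cannot do it; the pieces have to be extracted separately and joined by other means.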
(5) What other tricks are there for ranges like "3-5"?
$ who
rocrocket :0 2009-01-08 11:07
rocrocket pts/0 2009-01-08 11:23 (:0.0)
rocrocket pts/1 2009-01-08 14:15 (:0.0)
$ who | cut -b -3
roc
roc
roc
$ who | cut -b 3-
crocket :0 2009-01-08 11:07
crocket pts/0 2009-01-08 11:23 (:0.0)
crocket pts/1 2009-01-08 14:15 (:0.0)
As you can see, -3 means from the first byte to the third byte, and 3- means from the third byte to the end of the line. If you look carefully, you will see that the third byte "c" is included in both cases.
What do you think happens if I run who | cut -b -3,3- ? The answer is that the entire line is output, and the "c" does not appear twice in a row. See:
$ who | cut -b -3,3-
rocrocket :0 2009-01-08 11:07
rocrocket pts/0 2009-01-08 11:23 (:0.0)
rocrocket pts/1 2009-01-08 14:15 (:0.0)
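A short sketch of the open-ended ranges on fixed input (again using the user name from the listing as sample data) shows that the overlapping byte is emitted only once:

```shell
name='rocrocket'

head_part=$(printf '%s\n' "$name" | cut -b -3)     # first byte through byte 3
tail_part=$(printf '%s\n' "$name" | cut -b 3-)     # byte 3 through end of line
whole=$(printf '%s\n' "$name" | cut -b -3,3-)      # overlapping ranges

echo "$head_part"   # roc
echo "$tail_part"   # crocket
echo "$whole"       # rocrocket
```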
(6) Give the simplest example of positioning by character!
The following example will look familiar. It extracts the 3rd, 4th, 5th and 8th characters:
$ who | cut -c 3-5,8
croe
croe
croe
But how does this differ from -b? Do -b and -c do the same thing? They only look the same because this example is a poor one: the who output consists of single-byte characters, so -b and -c give the same result. The difference shows up when you extract Chinese text. Here the file contains the Chinese words for Monday through Thursday:
$ cat cut_ch.txt
星期一
星期二
星期三
星期四
$ cut -b 3 cut_ch.txt
?
?
?
?
$ cut -c 3 cut_ch.txt
一
二
三
四
See: -c counts in characters, so the output is correct, while -b blindly counts in bytes (8 bits each), so the output is garbled. (This assumes a cut build with multibyte support and a matching locale; some implementations treat -c the same as -b.)
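The byte arithmetic behind the garbled output can be sketched as follows. The example assumes UTF-8, where each character of the three-character string 星期一 ("Monday" in Chinese) occupies three bytes; the string is written with octal escapes so the result does not depend on how this text itself is encoded. It uses only -b and wc -c, which count bytes in any locale:

```shell
# "星期一" as UTF-8 octal escapes (assumption: UTF-8 encoding, 3 bytes each).
word='\346\230\237\346\234\237\344\270\200'

# wc -c counts bytes: 3 characters x 3 bytes, plus 1 newline.
byte_count=$(printf "$word\n" | wc -c | tr -d ' ')

# Bytes 1 to 3 are exactly the first character; byte 3 alone is only its tail.
first_char=$(printf "$word\n" | cut -b 1-3)
expected=$(printf '\346\230\237')

echo "$byte_count"   # 10
[ "$first_char" = "$expected" ] && echo "bytes 1-3 form one whole character"
```

This is why a single byte position such as -b 3 lands inside a character and prints as "?".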
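While we are on this topic, here is one more related point for readers who want to go further.
When you encounter multibyte characters, you can use the -n option, which tells cut not to split multibyte characters. Support for -n depends on the cut implementation; some versions accept the flag but ignore it.
Examples are as follows:
$ cat cut_ch.txt | cut -b 2
?
?
?
?
$ cat cut_ch.txt | cut -nb 2
$ cat cut_ch.txt | cut -nb 1,2,3
星
星
星
星
Assuming UTF-8, where each of these characters occupies three bytes, -nb 2 selects no character because byte 2 is not the last byte of any character. With bytes 1, 2 and 3 selected, the last byte of the first character falls within the range, so the whole character is written out.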
(7) What about fields? An explanation:
Why is field extraction needed? Because -b and -c can only extract information from fixed-format text and are helpless with text whose columns are not at fixed positions. This is where fields come in handy. If you look at the /etc/passwd file, you will find that it is not laid out in fixed-width columns like the who output; the items vary in length. However, the colon plays a very important role in each line of the file: it separates the items.
Fortunately, the cut command provides exactly this kind of extraction: set the delimiter, then say which fields to extract, and that's it!
Take the first five lines of /etc/passwd as an example:
$ cat /etc/passwd | head -n 5
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
$ cat /etc/passwd | head -n 5 | cut -d : -f 1
root
bin
daemon
adm
lp
See: -d sets the delimiter to a colon, -f selects the first field, and after pressing Enter all the user names are listed. Quite a sense of accomplishment!
Of course, with -f you can also use ranges such as 3-5 or 4-:
$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5
root:0:0:root
bin:1:1:bin
daemon:2:2:daemon
adm:3:4:adm
lp:4:7:lp
$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5,7
root:0:0:root:/bin/bash
bin:1:1:bin:/sbin/nologin
daemon:2:2:daemon:/sbin/nologin
adm:3:4:adm:/sbin/nologin
lp:4:7:lp:/sbin/nologin
$ cat /etc/passwd | head -n 5 | cut -d : -f -2
root:x
bin:x
daemon:x
adm:x
lp:x
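The field examples can be reproduced without touching the real /etc/passwd. The sketch below uses two sample lines in the same format and also shows that, as with bytes, the order of the field list does not change the order of the output:

```shell
# Two sample lines in /etc/passwd format, held in a variable.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

names=$(printf '%s\n' "$sample" | cut -d : -f 1)
ids=$(printf '%s\n' "$sample" | cut -d : -f 1,3-5)
reordered=$(printf '%s\n' "$sample" | cut -d : -f 3-5,1)   # same fields, other order

echo "$names"       # root, then bin
echo "$ids"         # root:0:0:root, then bin:1:1:bin
echo "$reordered"   # identical to the previous output
```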
(8) How do I tell spaces and tabs apart? It all looks a bit messy.
A tab can be hard to recognize by eye, but there is a way to check whether a gap consists of several spaces or of a tab character.
$ cat tab_space.txt
this is tab     finish.
this is several space      finish.
$ sed -n l tab_space.txt
this is tab\tfinish.$
this is several space      finish.$
See: a tab is displayed as the \t symbol, while spaces are displayed as they are.
Tabs and spaces can be told apart with this method.
Note that the character after sed -n is the lowercase letter L, not the digit 1.
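The check can be tried without creating a file. This sketch pipes one line containing a tab and one containing only spaces through sed -n l; the sample sentences are illustrative:

```shell
# One line with a real tab, one with a run of spaces.
with_tab=$(printf 'this is tab\tfinish.\n' | sed -n l)
with_spaces=$(printf 'this is several space      finish.\n' | sed -n l)

echo "$with_tab"      # this is tab\tfinish.$
echo "$with_spaces"   # this is several space      finish.$
```

The trailing $ marks the end of each line, which also makes trailing whitespace visible.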
(9) What should I pass to cut -d to use a tab or a space as the delimiter?
The default delimiter of cut is the tab, so when you want to split on tabs you can omit the -d option and use -f directly to pick the fields.
To use a space as the delimiter, do this:
$ cat tab_space.txt | cut -d ' ' -f 1
this
this
Note that there really must be a space between the two single quotes. Do not leave it out.
Also, you can only put a single space after -d, not several, because cut only allows the delimiter to be one character.
$ cat tab_space.txt | cut -d '  ' -f 1
cut: the delimiter must be a single character
Try 'cut --help' for more information.
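As a small sketch of the two delimiters side by side, the invented line below contains both a space and a tab, so the first field differs depending on which delimiter is in effect:

```shell
# "one two", then a tab, then "three".
line=$(printf 'one two\tthree')

tab_first=$(printf '%s\n' "$line" | cut -f 1)            # default delimiter: tab
space_first=$(printf '%s\n' "$line" | cut -d ' ' -f 1)   # delimiter: a single space

echo "$tab_first"     # one two
echo "$space_first"   # one
```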
(10) What are the shortcomings of cut?
Did you guess? Yes: handling multiple spaces.
If some fields in a file are separated by several spaces, using cut becomes a hassle, because cut is only good at text whose fields are separated by exactly one character.
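One common workaround, which is not part of cut itself, is to squeeze each run of spaces into a single space with tr -s before cutting. The sketch below uses a made-up line with columns separated by several spaces:

```shell
# Made-up line whose columns are separated by runs of spaces.
line='rocrocket   pts/0     2009-01-08'

# Splitting on a single space yields empty fields between consecutive spaces.
raw=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# tr -s ' ' squeezes each run of spaces into one before cut sees the line.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

echo "[$raw]"        # []
echo "[$squeezed]"   # [pts/0]
```

This only helps when the fields themselves contain no spaces; otherwise a tool that splits on whitespace runs may be a better fit.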
This article comes from the "shuosir" blog (http://shuosir.blog.51cto.com). Please contact the author before reproducing it.