Linux cut command in detail
cut is a selection command: it analyzes a piece of data and extracts the parts we want. Selection is generally performed line by line, rather than on the input as a whole.
(1) The syntax is:
cut [-bn] [file] or cut [-c] [file] or cut [-df] [file]
Usage
The cut command cuts bytes, characters, or fields from each line of a file and writes them to standard output.
If no file is specified, cut reads standard input. One of -b, -c, or -f must be specified.
Main parameters
-b: select by bytes. Byte positions ignore multi-byte character boundaries unless the -n flag is also specified.
-c: select by characters.
-d: set a custom delimiter. The default delimiter is the tab character.
-f: select by fields; used together with -d to specify which fields to display.
-n: do not split multi-byte characters. Used only with -b. If the last byte of a character falls within the range given by the list argument of -b, the character is written; otherwise, it is excluded.
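As a quick orientation, the three selection modes can be tried on a single line fed through standard input. This is only a minimal sketch; the sample string and the variable names are made up for illustration.

```shell
# Sample line, invented for this sketch: three colon-separated fields.
line='alpha:beta:gamma'

# -b selects bytes, -c selects characters, -f selects delimited fields.
by_byte=$(printf '%s\n' "$line" | cut -b 1-5)
by_char=$(printf '%s\n' "$line" | cut -c 7-10)
by_field=$(printf '%s\n' "$line" | cut -d : -f 3)

printf '%s\n' "$by_byte" "$by_char" "$by_field"
```

Because the sample contains only single-byte characters, -b and -c count positions identically here.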
(2) What does cut base its selection on? In other words, how do I tell cut what I want to extract?
The cut command accepts three main positioning methods:
First, by byte (bytes), with option -b
Second, by character (characters), with option -c
Third, by field (fields), with option -f
(3) Locating by byte
For example, when you run the who command, the output looks similar to this:
[rocrocket@rocrocket programming]$ who
rocrocket :0
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
To extract the 3rd byte of each line:
[rocrocket@rocrocket programming]$ who | cut -b 3
c
c
c
(4) What if I want to extract the 3rd, 4th, 5th, and 8th bytes?
-b accepts ranges such as 3-5, and multiple positions are separated by commas. For example:
[rocrocket@rocrocket programming]$ who | cut -b 3-5,8
croe
croe
croe
Note that with -b, cut first sorts all the positions in the list and then extracts them, so the order in which you write the positions has no effect on the output. This example shows it:
[rocrocket@rocrocket programming]$ who | cut -b 8,3-5
croe
croe
croe
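The ordering rule can be checked without a live who session. The sketch below feeds the user name from the transcript through cut with the list written both ways; the variable names are only illustrative.

```shell
# "rocrocket" stands in for the first column of the who output.
name='rocrocket'

in_order=$(printf '%s\n' "$name" | cut -b 3-5,8)
reversed=$(printf '%s\n' "$name" | cut -b 8,3-5)

# Both lists select the same bytes, and cut emits them in input order.
printf '%s\n%s\n' "$in_order" "$reversed"
```

If you actually need the 8th byte before bytes 3-5, cut alone cannot do it; you would have to reorder the pieces with another tool.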
(5) What other tricks are there besides "3-5"?
[rocrocket@rocrocket programming]$ who
rocrocket :0
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
[rocrocket@rocrocket programming]$ who | cut -b -3
roc
roc
roc
[rocrocket@rocrocket programming]$ who | cut -b 3-
crocket :0
crocket pts/0 (:0.0)
crocket pts/1 (:0.0)
As you can see, -3 means from the first byte to the third byte, and 3- means from the third byte to the end of the line. Note that in both cases the third byte, "c", is included.
What happens if I run who | cut -b -3,3-? The whole line is output, and the overlapping "c" does not appear twice. See:
[rocrocket@rocrocket programming]$ who | cut -b -3,3-
rocrocket :0
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
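The open-ended ranges, and the fact that overlapping ranges do not duplicate bytes, can be reproduced on a fixed string. Again this is just a sketch with an illustrative variable holding the user name from the transcript.

```shell
name='rocrocket'

head3=$(printf '%s\n' "$name" | cut -b -3)      # bytes 1 through 3
tail3=$(printf '%s\n' "$name" | cut -b 3-)      # byte 3 through end of line
whole=$(printf '%s\n' "$name" | cut -b -3,3-)   # overlapping ranges, byte 3 printed once

printf '%s\n' "$head3" "$tail3" "$whole"
```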
(6) A simple example of positioning by character
The following example should look familiar; it extracts the 3rd, 4th, 5th, and 8th characters:
[rocrocket@rocrocket programming]$ who | cut -c 3-5,8
croe
croe
croe
But this looks no different from -b. Do -b and -c do the same thing? No. They only appear to, because the example is a poor one: who outputs only single-byte characters, so -b and -c give the same result. With Chinese characters, the difference shows:
[rocrocket@rocrocket programming]$ cat cut_ch.txt
星期一
星期二
星期三
星期四
[rocrocket@rocrocket programming]$ cut -b 3 cut_ch.txt
��
��
��
��
[rocrocket@rocrocket programming]$ cut -c 3 cut_ch.txt
一
二
三
四
As you can see, -c counts in characters and the output is correct, while -b blindly counts in bytes (8 bits each), so the output is garbled. Note that this depends on a cut build with multi-byte support; some builds of GNU cut treat -c the same as -b.
Since this has come up, here is one more related point.
For multi-byte characters you can use the -n option, which tells cut not to split multi-byte characters. For example:
[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -b 2
��
��
��
��
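[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 2
[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 1,2,3
星
星
星
星
With -nb 2, no character is printed because byte 2 is not the last byte of any character. With -nb 1,2,3, the list covers the last byte of the first character, so that character is written whole.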
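To see why a single byte position yields garbage, it helps to count the bytes behind each character. The sketch below assumes the script and the terminal both use UTF-8, where each of these Chinese characters takes three bytes; under a different encoding the numbers would differ. It deliberately uses only wc -c and cut -b, which are byte-based regardless of locale.

```shell
# Assumes UTF-8: each of the three characters below is encoded as three bytes.
word='星期一'

byte_count=$(( $(printf '%s' "$word" | wc -c) ))   # wc -c counts bytes, not characters
first_char=$(printf '%s\n' "$word" | cut -b 1-3)   # all three bytes of the first character

printf '%s bytes, first character: %s\n' "$byte_count" "$first_char"
```

Selecting bytes 1-3 together returns an intact character, whereas selecting byte 2 or byte 3 alone returns only a fragment of one, which the terminal cannot display.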
(7) What is a field? An explanation :)
Why extract by "field"? The -b and -c options described above can only extract information from fixed-format text and are helpless with variable-width data. This is where fields come in. If you look at /etc/passwd, you will see that, unlike the output of who, it is not laid out in fixed columns. However, the colon plays a very important role in each line of the file: it separates the items.
Fortunately, cut provides exactly this kind of extraction: you set the delimiter and then specify which fields to extract.
Take the first five lines of /etc/passwd as an example:
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1
root
bin
daemon
adm
lp
As you can see, -d sets the delimiter to a colon and -f selects the first field. Press Enter and all the user names are listed. Quite satisfying!
When setting -f, you can also use forms such as 3-5 or 4-:
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5
root:0:0:root
bin:1:1:bin
daemon:2:2:daemon
adm:3:4:adm
lp:4:7:lp
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5,7
root:0:0:root:/bin/bash
bin:1:1:bin:/sbin/nologin
daemon:2:2:daemon:/sbin/nologin
adm:3:4:adm:/sbin/nologin
lp:4:7:lp:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f -2
root:x
bin:x
daemon:x
adm:x
lp:x
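Field extraction can also be tried on an inline record, which avoids depending on the contents of the local /etc/passwd. The record below is a typical root entry written out by hand for this sketch, and the variable names are illustrative.

```shell
# A passwd-style record, written inline so the sketch is self-contained.
entry='root:x:0:0:root:/root:/bin/bash'

user=$(printf '%s\n' "$entry" | cut -d : -f 1)         # first field only
ids=$(printf '%s\n' "$entry" | cut -d : -f 3-4)        # closed range of fields
shell_path=$(printf '%s\n' "$entry" | cut -d : -f 7-)  # field 7 through the last field

printf '%s\n' "$user" "$ids" "$shell_path"
```

When more than one field is selected, cut joins them in the output with the same delimiter it split on, which is why the ids value still contains a colon.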
(8) How can spaces and tabs be told apart? They look alike and are easy to confuse. What should I do?
Sometimes tabs are hard to spot. There is a way to check whether a gap consists of several spaces or a single tab.
[rocrocket@rocrocket programming]$ cat tab_space.txt
this is tab     finish.
this is several space      finish.
[rocrocket@rocrocket programming]$ sed -n l tab_space.txt
this is tab\tfinish.$
this is several space      finish.$
As you can see, a tab is displayed as \t, while spaces are displayed as-is.
This method can be used to tell tabs from spaces.
Note that the character after sed -n is a lowercase L.
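The same check works on any input, not just a file. Here is a small sketch that pipes two made-up strings through sed -n l; the variable names are only for illustration.

```shell
# One string with a real tab, one with a plain space.
with_tab=$(printf 'a\tb\n' | sed -n l)
with_space=$(printf 'a b\n' | sed -n l)

# The l command writes a tab as \t and marks each end of line with $.
printf '%s\n%s\n' "$with_tab" "$with_space"
```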
(9) What should I pass to cut -d to set a tab or a space as the delimiter?
The default delimiter of cut is already the tab, so when you want to split on tabs you can omit -d and use -f directly.
To set a space as the delimiter, do this:
[rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d ' ' -f 1
this
this
Note that there must be exactly one space between the two single quotes.
You can put only one space after -d, not several, because cut only allows a single-character delimiter:
[rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d '  ' -f 1
cut: the delimiter must be a single character
Try 'cut --help' for more information.
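Both cases can be placed side by side. In this sketch the sample lines and variable names are invented; the first line is built with printf so that it contains real tab characters.

```shell
tab_line=$(printf 'one\ttwo\tthree')   # fields separated by real tabs
space_line='one two three'             # fields separated by single spaces

second_by_tab=$(printf '%s\n' "$tab_line" | cut -f 2)              # no -d: tab is the default
second_by_space=$(printf '%s\n' "$space_line" | cut -d ' ' -f 2)   # explicit one-space delimiter

printf '%s\n%s\n' "$second_by_tab" "$second_by_space"
```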
(10) What are the shortcomings of cut?
Have you guessed? Yes: handling multiple spaces.
If fields in a file are separated by a varying number of spaces, cut is awkward to use, because it is only good at processing text separated by a single character.
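One common workaround, which is not part of the original walkthrough, is to squeeze each run of spaces down to a single space with tr -s before handing the line to cut. The sketch below uses a made-up sample row to contrast the two approaches.

```shell
# Sample row with uneven runs of spaces between the columns.
row='alpha    beta  gamma'

# Plain cut treats every single space as a separator, so field 2 is empty.
naive=$(printf '%s\n' "$row" | cut -d ' ' -f 2)

# tr -s ' ' collapses each run of spaces into one before cut splits the line.
squeezed=$(printf '%s\n' "$row" | tr -s ' ' | cut -d ' ' -f 2)

printf 'naive=[%s] squeezed=[%s]\n' "$naive" "$squeezed"
```

This only helps when runs of spaces are purely separators; if a field may itself be empty or contain spaces, a different tool such as awk is probably a better fit.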