1. Describe the cut command in one or two sentences!
As its name suggests, cut is used to cut out data. Specifically, it extracts selected parts of each line of a file or of standard input.
cut processes its input one line at a time. This mechanism is the same as in sed. (An introduction to sed will be released soon.)
2. What does cut use to locate data? That is to say, how do I tell cut which part I want?
The cut command mainly accepts three positioning methods:
First, by byte (bytes), with option -b
Second, by character (characters), with option -c
Third, by field (fields), with option -f
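The three options can be tried side by side on a single line of text. This is only a minimal sketch: the sample line and the variable names are made up for illustration, and the -d option used to set the field delimiter is explained further down.

```shell
# Hypothetical sample line with colon-separated fields.
line='alpha:beta:gamma'

# -b selects by byte position, -c by character position, -f by field.
by_byte=$(printf '%s\n' "$line" | cut -b 1-5)
by_char=$(printf '%s\n' "$line" | cut -c 1-5)
by_field=$(printf '%s\n' "$line" | cut -d : -f 2)

printf '%s\n' "$by_byte"    # alpha
printf '%s\n' "$by_char"    # alpha
printf '%s\n' "$by_field"   # beta
```

For plain ASCII text like this, -b and -c select the same thing, because every character is exactly one byte.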
3. What is the simplest example of locating by byte?
For example, when you execute the who command, the output is similar to the following:
[rocrocket@rocrocket programming]$ who
rocrocket :0
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
If we want to extract the 3rd byte of each line, we do it like this:
[rocrocket@rocrocket programming]$ who | cut -b 3
c
c
c
As you can see, -b is followed by the number of the byte to extract. The space between -b and 3 is optional, but including it is recommended :)
4. What if I want to extract the 3rd, 4th, 5th, and 8th bytes when locating by byte?
-b accepts ranges written like 3-5, and multiple positions are separated by commas. Let's take a look at an example:
[rocrocket@rocrocket programming]$ who | cut -b 3-5,8
croe
croe
croe
Note that when cut is run with the -b option, it first sorts all the positions given after -b in ascending order and then extracts them. Listing the positions in a different order does not change the order of the output. This example illustrates the point:
[rocrocket@rocrocket programming]$ who | cut -b 8,3-5
croe
croe
croe
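The same behavior can be reproduced without depending on who is logged in. A minimal sketch, assuming a made-up sample line modeled on one line of who output:

```shell
# Hypothetical sample line, modeled on one line of who output.
line='rocrocket pts/0'

# The same positions, listed in two different orders.
in_order=$(printf '%s\n' "$line" | cut -b 3-5,8)
reversed=$(printf '%s\n' "$line" | cut -b 8,3-5)

printf '%s\n' "$in_order"   # croe
printf '%s\n' "$reversed"   # croe, the list order does not change the output
```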
5. What other tricks like "3-5" are available?
[rocrocket@rocrocket programming]$ who
rocrocket :0
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
[rocrocket@rocrocket programming]$ who | cut -b -3
roc
roc
roc
[rocrocket@rocrocket programming]$ who | cut -b 3-
crocket :0
crocket pts/0 (:0.0)
crocket pts/1 (:0.0)
As you can see, -3 means from the first byte to the third byte, and 3- means from the third byte to the end of the line. If you look carefully, you will see that in both cases the third byte "c" is included.
What do you think happens if I run who | cut -b -3,3-? The answer is that the entire line is output, and the overlapping "c" does not appear twice. See:
[rocrocket@rocrocket programming]$ who | cut -b -3,3-
rocrocket :0
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
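The open-ended ranges can be checked on a fixed string as well. A minimal sketch, again assuming a made-up sample line rather than real who output:

```shell
# Hypothetical sample line, modeled on one line of who output.
line='rocrocket pts/0'

first_three=$(printf '%s\n' "$line" | cut -b -3)      # bytes 1 to 3
from_third=$(printf '%s\n' "$line" | cut -b 3-)       # byte 3 to end of line
overlapping=$(printf '%s\n' "$line" | cut -b -3,3-)   # both ranges share byte 3

printf '%s\n' "$first_three"   # roc
printf '%s\n' "$from_third"    # crocket pts/0
printf '%s\n' "$overlapping"   # rocrocket pts/0
```

Each selected byte is printed once, so the shared third byte is not duplicated.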
6. Give a simple example of locating by character!
The following example should look familiar. It extracts the 3rd, 4th, 5th, and 8th characters:
[rocrocket@rocrocket programming]$ who | cut -c 3-5,8
croe
croe
croe
But why is this no different from -b? Could it be that -b and -c do the same thing? No. They only look the same because the example is a poor one: who outputs only single-byte characters, so -b and -c give the same result. If you extract Chinese characters (here, from a file containing the Chinese names of Monday through Thursday), the difference shows up:
[rocrocket@rocrocket programming]$ cat cut_ch.txt
星期一
星期二
星期三
星期四
[rocrocket@rocrocket programming]$ cut -b 3 cut_ch.txt
?
?
?
?
[rocrocket@rocrocket programming]$ cut -c 3 cut_ch.txt
一
二
三
四
As you can see, -c counts in units of characters, so the output is correct, whereas -b naively counts in bytes (8 bits each), so the output is garbled.
Since we have touched on this point, let me add one more thing for those who want to go a little further.
When you encounter multi-byte characters, you can use the -n option, which tells cut not to split multi-byte characters. Example:
[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -b 2
[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 2
?
?
?
?
[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 1,2,3
星
星
星
星
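The garbled output of -b comes from the byte layout of the text. A minimal sketch, assuming the script and the terminal use UTF-8, where each of these Chinese characters occupies three bytes; the variable names are made up for illustration. How -c and -n treat multi-byte characters can depend on the cut implementation and the locale, so this sketch relies on -b only.

```shell
# Assumes UTF-8 encoding: each of the three characters is three bytes long.
text='星期一'

# wc -c counts bytes, not characters.
byte_count=$(printf '%s' "$text" | wc -c | tr -d ' ')

# Bytes 1-3 together form the complete first character.
first_char=$(printf '%s\n' "$text" | cut -b 1-3)

printf '%s\n' "$byte_count"   # 9
printf '%s\n' "$first_char"   # 星
```

Selecting a single byte, as in cut -b 3, returns only one third of a character, which the terminal cannot display.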
7. What is a field? Explain:
Why is there extraction by field? Because the -b and -c options described above can only extract information from text in a fixed-width format and are helpless with text that is not. This is where fields come in.
(The following assumes that you know the content and layout of the /etc/passwd file.)
If you have looked at the /etc/passwd file, you will have noticed that, unlike the output of who, it is not in a fixed-width format; the items vary in length. However, the colon plays a very important role in each line of the file: it separates the items.
Luckily, cut provides exactly this kind of extraction. You set the delimiter and then say which fields to extract.
Take the first five lines of /etc/passwd as an example:
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1
root
bin
daemon
adm
lp
As you can see, -d sets the delimiter to a colon and -f says that I want the first field. Press Enter and all the user names are listed. What a sense of accomplishment!
Of course, with -f you can also use forms such as 3-5 or 4-:
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5
root:0:0:root
bin:1:1:bin
daemon:2:2:daemon
adm:3:4:adm
lp:4:7:lp
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5,7
root:0:0:root:/bin/bash
bin:1:1:bin:/sbin/nologin
daemon:2:2:daemon:/sbin/nologin
adm:3:4:adm:/sbin/nologin
lp:4:7:lp:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f -2
root:x
bin:x
daemon:x
adm:x
lp:x
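Because /etc/passwd differs from system to system, field extraction is easier to experiment with on fixed input. A minimal sketch, assuming two made-up records in the /etc/passwd layout (seven colon-separated fields); the variable names are illustrative only:

```shell
# Two hypothetical records in the /etc/passwd layout.
records='root:x:0:0:root:/root:/bin/bash
adm:x:3:4:adm:/var/adm:/sbin/nologin'

# Field 1 is the user name, field 7 is the login shell.
names=$(printf '%s\n' "$records" | cut -d : -f 1)
name_and_shell=$(printf '%s\n' "$records" | cut -d : -f 1,7)

printf '%s\n' "$names"
printf '%s\n' "$name_and_shell"
```

When several fields are selected, cut joins them in the output with the same delimiter that was given to -d.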
8. How can I distinguish between spaces and tabs? It all looks a bit messy. What should I do?
Sometimes tabs and spaces are hard to tell apart. There is a way to see whether a gap consists of several spaces or of a tab.
[rocrocket@rocrocket programming]$ cat tab_space.txt
This is tab	finish.
This is several space      finish.
[rocrocket@rocrocket programming]$ sed -n l tab_space.txt
This is tab\tfinish.$
This is several space      finish.$
As you can see, a tab is displayed as the symbol \t, while a space is displayed as is.
This method can be used to tell tabs and spaces apart.
Note that the character after sed -n is a lowercase letter L. (The letter l, the digit 1, and the pipe symbol | are really hard to tell apart. These three seem even harder to distinguish than tabs and spaces ...)
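The same check works on text coming from a pipe, with no file needed. A minimal sketch with made-up sample strings; printf is used for output here because some shells' echo would turn the two characters \t back into a real tab:

```shell
# One line containing a real tab, one containing an ordinary space.
tab_line=$(printf 'tab\there\n' | sed -n l)
space_line=$(printf 'space here\n' | sed -n l)

printf '%s\n' "$tab_line"     # tab\there$
printf '%s\n' "$space_line"   # space here$
```

The trailing $ marks the end of each line, which also makes trailing spaces visible.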
9. What should I pass to cut -d to set a tab or a space as the delimiter?
Here is a little secret: the default delimiter of cut is a tab, so when the fields are separated by tabs you can omit the -d option and use -f directly to extract the fields. Trust me!
To set a space as the delimiter, do it like this:
[rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d ' ' -f 1
This
This
Note that there must be exactly one space between the two single quotes.
In addition, you can only put one space after -d, not several, because cut only allows the delimiter to be a single character.
[rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d '  ' -f 1
cut: the delimiter must be a single character
Try 'cut --help' for more information.
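Both delimiters can be compared on one line of input. A minimal sketch, assuming a made-up line with a tab between "one" and "two" and a single space between "two" and "three":

```shell
# Hypothetical line: one<TAB>two<SPACE>three
line=$(printf 'one\ttwo three')

# Without -d, fields are separated by tabs.
first_tab_field=$(printf '%s\n' "$line" | cut -f 1)

# With -d ' ', the tab is ordinary text and the space separates the fields.
second_space_field=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

printf '%s\n' "$first_tab_field"      # one
printf '%s\n' "$second_space_field"   # three
```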
10. Why is the last line repeated when I use the ps and cut commands together?
The problem is described as follows.
When cut and ps are used together:
[rocrocket@rocrocket programming]$ ps
  PID TTY          TIME CMD
 2977 pts/0    00:00:00 bash
 5032 pts/0    00:00:00 ps
[rocrocket@rocrocket programming]$ ps | cut -b3
P
9
0
0
Look, the last 0 appears twice! I have also tried it with ps ef and ps aux.
However, this problem does not occur when cut works with other commands. For example, cut combined with who behaves normally:
[rocrocket@rocrocket programming]$ who
rocrocket :0
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
[rocrocket@rocrocket programming]$ who | cut -b3
c
c
c
I puzzled over this strange question without finding a solution, and I am very grateful for the answer I received. The original post is [here].
The explanation is this: in ps | cut, cut runs as a process of its own, so ps lists that process too and sends the extra line through the pipe to cut. The output therefore has one more line than plain ps. It only looks like a repeat of the previous line because the extracted character happens to be the same as in the previous line.
Run ps and then ps | cat, and you will see the reason for yourself! :)
11. What are the defects and shortcomings of cut?
Have you guessed it? Yes: handling runs of multiple spaces.
If some fields in a file are separated by several spaces, cut is a little troublesome to use, because cut is only good at processing text that is separated by a single character.
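The limitation is easy to see on a line with three spaces between two words. The sketch below also shows one common workaround that is not part of the discussion above: squeezing each run of spaces into a single space with tr -s before calling cut. The sample line and variable names are made up for illustration.

```shell
# Hypothetical line with three spaces between the two words.
line='alpha   beta'

# Each single space is a delimiter, so field 2 is the empty string
# between the first and the second space.
naive=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# tr -s ' ' squeezes the run of spaces into one, so field 2 is "beta".
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

printf '[%s]\n' "$naive"      # []
printf '[%s]\n' "$squeezed"   # [beta]
```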