The cut command in detail

Source: Internet
Author: User

I needed to pull logs back from production and calculate the request timeout rate, which calls for the cut command. I came across a good article on it, so I am reposting it here.

1 What does the cut command do?

As its name suggests, cut's job is to "cut": specifically, it cuts selected pieces of data out of a file.

cut processes its input one line at a time, which is the same mechanism sed uses.

2 What does cut cut by? In other words, how do I tell cut which part I want?

The cut command accepts three main ways of selecting:

First, by byte (bytes), with the -b option

Second, by character (characters), with the -c option

Third, by field (fields), with the -f option

3 What is the simplest example of selecting by byte?

For example, when you run the who command, it prints something like this:

$ who
rocrocket :0           2009-01-08 11:07
rocrocket pts/0        2009-01-08 11:23 (:0.0)
rocrocket pts/1        2009-01-08 14:15 (:0.0)

To extract the 3rd byte of each line, do this:

$ who | cut -b 3
c
c
c

As you can see, the number after -b says which byte to extract. The space between -b and 3 is optional, but including it is recommended :)

4 When selecting by byte, how do I extract the 3rd, 4th, 5th, and 8th bytes?

-b supports ranges written like 3-5, and multiple positions are separated by commas. Here is an example:

$ who | cut -b 3-5,8
croe
croe
croe

One thing to note: with the -b option, cut first sorts all the positions given after -b in ascending order and only then extracts them. You cannot change the output order by reordering the list. This example illustrates the point:

$ who | cut -b 8,3-5
croe
croe
croe
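
The list syntax can be tried without depending on who is logged in by feeding cut a fixed line. The sketch below uses a made-up sample line (an assumption standing in for one line of who output) and checks that the order of the list does not change the order of the output.

```shell
# Sample line standing in for one line of `who` output (made up for illustration).
line='rocrocket pts/0 2009-01-08 11:23'

# Bytes 3-5 plus byte 8, written in ascending order.
ascending=$(printf '%s\n' "$line" | cut -b 3-5,8)

# The same positions written in a different order.
reordered=$(printf '%s\n' "$line" | cut -b 8,3-5)

printf '%s\n' "$ascending"   # croe
printf '%s\n' "$reordered"   # croe: bytes always come out in input order
```

If a different output order is really needed, cut alone cannot provide it; the pieces have to be extracted separately and joined afterwards.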

5 Are there any tricks with range notation like "3-5"?

$ who
rocrocket :0           2009-01-08 11:07
rocrocket pts/0        2009-01-08 11:23 (:0.0)
rocrocket pts/1        2009-01-08 14:15 (:0.0)
$ who | cut -b -3
roc
roc
roc
$ who | cut -b 3-
crocket :0           2009-01-08 11:07
crocket pts/0        2009-01-08 11:23 (:0.0)
crocket pts/1        2009-01-08 14:15 (:0.0)

As you can see, -3 means from the first byte through the third byte, and 3- means from the third byte to the end of the line. If you look closely, you will notice that both forms include the third byte, c.
What do you think happens if I run who | cut -b -3,3-? The answer: the whole line is printed, and the letter c does not appear twice in a row. See:
$ who | cut -b -3,3-
rocrocket :0           2009-01-08 11:07
rocrocket pts/0        2009-01-08 11:23 (:0.0)
rocrocket pts/1        2009-01-08 14:15 (:0.0)
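
The open-ended ranges can be checked on a single word. This is a minimal sketch; the variable names are hypothetical and the input is just the user name from the examples above.

```shell
word='rocrocket'

first_three=$(printf '%s\n' "$word" | cut -b -3)    # bytes 1 through 3
from_third=$(printf '%s\n' "$word" | cut -b 3-)     # byte 3 through end of line
overlap=$(printf '%s\n' "$word" | cut -b -3,3-)     # two ranges sharing byte 3

printf '%s\n' "$first_three"   # roc
printf '%s\n' "$from_third"    # crocket
printf '%s\n' "$overlap"       # rocrocket: the shared byte is printed once
```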

6 What is the simplest example of selecting by character?
The following example should look familiar. It extracts the 3rd, 4th, 5th, and 8th characters:
$ who | cut -c 3-5,8
croe
croe
croe
But what is the difference from -b? Do -b and -c do the same thing? They only appear to, because this example is a poor one: who prints single-byte characters, so -b and -c give the same result. With Chinese text the difference shows. Here is what happens when extracting Chinese:
$ cat cut_ch.txt
星期一
星期二
星期三
星期四
$ cut -b 3 cut_ch.txt
?
?
?
?
$ cut -c 3 cut_ch.txt
一
二
三
四

As you can see, -c counts in characters and the output is correct, while -b naively counts in bytes (8 bits each) and the output is garbage.
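
Why -b garbles Chinese text is easier to see by counting bytes. The sketch below assumes the text is UTF-8 encoded, where each of these Chinese characters takes three bytes. The octal escapes spell out the line 星期一, so the snippet does not depend on the encoding of the script itself; the helper name sample is hypothetical.

```shell
# UTF-8 bytes of the line "星期一", written as octal escapes (assumes UTF-8).
sample() { printf '\346\230\237\346\234\237\344\270\200\n'; }

# wc -c counts bytes: 3 characters x 3 bytes, plus the newline.
total_bytes=$(sample | wc -c | tr -d ' ')

# Byte 3 alone is only the last third of the first character.
fragment_size=$(sample | cut -b 3 | wc -c | tr -d ' ')

# Bytes 1-3 together make up the complete first character.
first_char=$(sample | cut -b 1-3)

printf '%s\n' "$total_bytes"     # 10
printf '%s\n' "$fragment_size"   # 2 (one stray byte plus the newline)
printf '%s\n' "$first_char"      # 星 on a UTF-8 terminal
```

Whether cut -c then returns a whole character depends on the locale and on the cut build in use, so it is worth checking on your own system.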
Since we are on this topic, here is one more point for readers who want to go a little further.
When multi-byte characters are involved, you can use the -n option to tell cut not to split multi-byte characters. For example:
$ cat cut_ch.txt | cut -b 2
?
?
?
?
$ cat cut_ch.txt | cut -nb 2

$ cat cut_ch.txt | cut -nb 1,2,3
星
星
星
星

6 What is a field? An explanation :)
Why have field extraction at all? Because -b and -c, covered so far, can only extract information from fixed-format text and are helpless with variable-format input. This is where fields come in handy.
(The following explanation assumes you are familiar with the content and layout of the /etc/passwd file.)
If you look at /etc/passwd, you will find that it is not laid out in fixed columns like the output of who; the items vary in length. However, the colon plays a very important role on every line: it separates the items.
Fortunately, cut provides exactly this kind of extraction: set the delimiter, set the fields to extract, and that's it!
Take the first five lines of /etc/passwd as an example:
$ cat /etc/passwd | head -n 5
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
$ cat /etc/passwd | head -n 5 | cut -d : -f 1
root
bin
daemon
adm
lp

As you can see, -d sets the delimiter to a colon and -f selects the first field. Press Enter and all the user names are listed. Quite a sense of accomplishment!
Of course, -f also accepts forms such as 3-5 or 4-:
$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5
root:0:0:root
bin:1:1:bin
daemon:2:2:daemon
adm:3:4:adm
lp:4:7:lp
$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5,7
root:0:0:root:/bin/bash
bin:1:1:bin:/sbin/nologin
daemon:2:2:daemon:/sbin/nologin
adm:3:4:adm:/sbin/nologin
lp:4:7:lp:/sbin/nologin
$ cat /etc/passwd | head -n 5 | cut -d : -f -2
root:x
bin:x
daemon:x
adm:x
lp:x
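
The same field selections can be reproduced on fixed data, which keeps the result independent of the local /etc/passwd. This is a sketch on two made-up lines in passwd format; the helper name passwd_sample is hypothetical.

```shell
# Two made-up lines in /etc/passwd format (sample data, not read from the system).
passwd_sample() {
  printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin'
}

names=$(passwd_sample | cut -d : -f 1)         # first field only
ids=$(passwd_sample | cut -d : -f 1,3-5)       # field 1 plus fields 3 through 5
first_two=$(passwd_sample | cut -d : -f -2)    # everything up to field 2

printf '%s\n' "$names"
printf '%s\n' "$ids"
printf '%s\n' "$first_two"
```

Note that the selected fields are joined again with the same delimiter, which is why the colons reappear in the output.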

7 How do I tell spaces from tabs? It gets confusing; what can I do?
Tabs can be genuinely hard to spot. Here is a way to tell whether a gap consists of several spaces or a single tab.
$ cat tab_space.txt
this is tab     finish.
this is several space      finish.
$ sed -n l tab_space.txt
this is tab\tfinish.$
this is several space      finish.$
As you can see, a tab is displayed as the \t symbol, while spaces are displayed as they are.
This is how you can tell tabs and spaces apart.
Note that the character after sed -n is the lowercase letter L. Don't misread it. (The letter l, the digit 1, and the pipe symbol | are hard to tell apart... arguably harder than tabs and spaces...)
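
The check can be tried without creating a file. The sketch below pipes two made-up lines through sed -n l; printf is used for the final output because some shells' echo would turn the literal \t back into a tab.

```shell
# One line containing a real tab, one containing several spaces (sample data).
tab_line=$(printf 'this is tab\tfinish.\n' | sed -n l)
space_line=$(printf 'this is several space   finish.\n' | sed -n l)

printf '%s\n' "$tab_line"     # this is tab\tfinish.$
printf '%s\n' "$space_line"   # this is several space   finish.$
```

The trailing $ marks the end of each line, which also makes trailing whitespace visible.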

8 What should I pass to cut -d to set a tab or a space as the delimiter?
Here is a little secret: cut's default delimiter is the tab, so when the delimiter is a tab you can omit the -d option entirely and go straight to -f to pick the fields. Don't worry, trust me!
To use a space as the delimiter, do this:
$ cat tab_space.txt | cut -d ' ' -f 1
this
this
Note that there really must be a space between the two single quotes. Don't be lazy about it.
Also, you can only put one space after -d, not several, because cut only allows the delimiter to be a single character.
$ cat tab_space.txt | cut -d '  ' -f 1
cut: the delimiter must be a single character
Try 'cut --help' for more information.
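
A short sketch of the delimiter rules on made-up input. It also shows one detail not covered above: a line that does not contain the delimiter at all is passed through whole, unless the standard -s option is given.

```shell
# Tab is the default delimiter, so -d can be omitted for tab-separated input.
tabbed=$(printf 'alpha\tbeta\tgamma\n' | cut -f 2)

# A single space must be quoted so the shell passes it to cut as the delimiter.
spaced=$(printf 'alpha beta gamma\n' | cut -d ' ' -f 2)

# This line contains no tab: without -s it is printed unchanged, with -s it is dropped.
no_delim=$(printf 'alpha beta gamma\n' | cut -f 2)
suppressed=$(printf 'alpha beta gamma\n' | cut -s -f 2)

printf '%s\n' "$tabbed"       # beta
printf '%s\n' "$spaced"       # beta
printf '%s\n' "$no_delim"     # alpha beta gamma
printf '%s\n' "$suppressed"   # (empty)
```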

9 When I use ps together with cut, why does the last line always appear twice?
Here is the problem in detail.
When cut is combined with ps:
$ ps
  PID TTY          TIME CMD
 2977 pts/0    00:00:00 bash
 5032 pts/0    00:00:00 ps
$ ps | cut -b3
P
9
0
0
Look, the last 0 appears twice! I also tried ps ef and ps aux, and the same problem occurs.
When cut is combined with other commands there is no such problem. For example, cut with who works normally:
$ who
rocrocket :0           2009-01-08 11:07
rocrocket pts/0        2009-01-08 11:23 (:0.0)
rocrocket pts/1        2009-01-08 14:15 (:0.0)
$ who | cut -b3
c
c
c
This seemingly bizarre problem had me baffled until Sunway gave me the answer; many thanks to him.
The explanation is this: running ps | cut starts a cut process as well, so when ps lists the processes it includes that one too, and the listing piped to cut has one extra line. It only looks like the previous line is repeated because the byte we extract from the extra line happens to be the same as the one from the line above it.
Run ps and then ps | cat and compare the two outputs, and you will see why! :)

10 What are cut's shortcomings?
Have you guessed? Right: handling runs of multiple spaces.
If some fields in a file are separated by several spaces, cut becomes awkward to use, because it is only good at text whose fields are separated by exactly one character.
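
One common workaround, which is not part of the original article, is to squeeze each run of spaces down to a single space with tr -s before handing the text to cut. A minimal sketch on a made-up row:

```shell
# A made-up row whose columns are separated by runs of spaces.
row='2977   pts/0      bash'

# Without squeezing, every extra space starts a new, empty field.
raw=$(printf '%s\n' "$row" | cut -d ' ' -f 2)

# tr -s ' ' collapses each run of spaces into one before cut sees the line.
squeezed=$(printf '%s\n' "$row" | tr -s ' ' | cut -d ' ' -f 2)

printf '%s\n' "$raw"        # (empty)
printf '%s\n' "$squeezed"   # pts/0
```

Leading spaces still leave one space at the start of the line after squeezing, so the first field would then be empty and the numbering shifts by one.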
