Linux cut usage

Source: Internet
Author: User
In Linux, cut is a selection command: it takes a piece of data and extracts the parts we want. The selection is generally made line by line rather than on the input as a whole.

(1) The syntax takes one of three forms: cut -b list [-n] [file], cut -c list [file], or cut -f list [-d delim] [file].
The cut command cuts bytes, characters, or fields from each line of a file and writes them to standard output. If no file is specified, cut reads standard input. One of -b, -c, or -f must be specified.

The main options are:
-b: select by byte position. Byte positions ignore multi-byte character boundaries unless the -n flag is also specified.
-c: select by character position.
-d: set a custom field delimiter. The default delimiter is a tab.
-f: select by field. Used together with -d to specify which fields to display.
-n: do not split multi-byte characters. It is used only with the -b flag. If the last byte of a character falls within the range given by the -b list, the character is written; otherwise, the character is excluded.
(2) What does cut base its selection on? In other words, how do we tell cut which part we want? The cut command accepts three main positioning methods:
First, bytes, with the -b option.
Second, characters, with the -c option.
Third, fields, with the -f option.
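The three positioning methods can be compared side by side on one line of input. The following is only a minimal sketch: the sample record and the variable names are made up for illustration, and the record contains only single-byte characters, so -b and -c give the same result.

```shell
# Hypothetical sample record (an assumption for illustration, not taken from a real system)
line='alice:x:1000:1000:Alice:/home/alice:/bin/sh'

by_bytes=$(printf '%s\n' "$line" | cut -b 1-5)     # bytes 1 through 5
by_chars=$(printf '%s\n' "$line" | cut -c 1-5)     # characters 1 through 5
by_field=$(printf '%s\n' "$line" | cut -d : -f 1)  # first colon-delimited field

printf '%s\n' "$by_bytes" "$by_chars" "$by_field"
```

All three commands select the same text here only because the first field happens to be five single-byte characters long.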
(3) Start with byte positioning. When you run the who command, it prints something like the following:

[rocrocket@rocrocket programming]$ who
rocrocket :0 11:07
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)

To extract the 3rd byte of each line:

[rocrocket@rocrocket programming]$ who | cut -b 3
c
c
c
(4) What if I want to extract the 3rd, 4th, 5th, and 8th bytes? The -b option accepts ranges such as 3-5, and multiple positions are separated by commas. For example:

[rocrocket@rocrocket programming]$ who | cut -b 3-5,8
croe
croe
croe

Note, however, that with the -b option cut first sorts all the positions given after -b into ascending order and only then extracts them, so reversing the order of the positions does not reorder the output. This example illustrates the point:

[rocrocket@rocrocket programming]$ who | cut -b 8,3-5
croe
croe
croe
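The ordering rule can be checked without a live who session. The sketch below feeds cut a single line shaped like the who output; the sample line and variable names are assumptions. The last command is a hypothetical workaround that is not part of the article: it runs cut twice and joins the pieces to get the bytes in a different order.

```shell
# One line shaped like the who output (assumed sample data)
sample='rocrocket pts/0'

ascending=$(printf '%s\n' "$sample" | cut -b 3-5,8)
reversed=$(printf '%s\n' "$sample" | cut -b 8,3-5)

echo "3-5,8 -> $ascending"
echo "8,3-5 -> $reversed"

# Hypothetical workaround: two separate cut calls, concatenated in the wanted order
reordered="$(printf '%s\n' "$sample" | cut -b 8)$(printf '%s\n' "$sample" | cut -b 3-5)"
echo "reordered -> $reordered"
```

Both list orders print croe, while the two-call version prints ecro.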
(5) Are there other tricks like "3-5"?

[rocrocket@rocrocket programming]$ who
rocrocket :0 11:07
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
[rocrocket@rocrocket programming]$ who | cut -b -3
roc
roc
roc
[rocrocket@rocrocket programming]$ who | cut -b 3-
crocket :0 11:07
crocket pts/0 (:0.0)
crocket pts/1 (:0.0)

As you can see, -3 means from the first byte through the third byte, and 3- means from the third byte to the end of the line. A careful reader will notice that the third byte, "c", is included in both cases. So what happens if I run who | cut -b -3,3-? The whole line is printed, and the overlapping "c" does not appear twice:

[rocrocket@rocrocket programming]$ who | cut -b -3,3-
rocrocket :0 11:07
rocrocket pts/0 (:0.0)
rocrocket pts/1 (:0.0)
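The open-ended ranges can be tried on a fixed line, so the result does not depend on who is logged in. This is a small sketch with an assumed sample line and made-up variable names.

```shell
# Assumed sample line in the style of the who output
sample='rocrocket pts/0 (:0.0)'

first_three=$(printf '%s\n' "$sample" | cut -b -3)     # byte 1 through byte 3
from_third=$(printf '%s\n' "$sample" | cut -b 3-)      # byte 3 through end of line
overlapping=$(printf '%s\n' "$sample" | cut -b -3,3-)  # ranges overlap at byte 3

printf '%s\n' "$first_three" "$from_third" "$overlapping"
```

The last line printed is identical to the input: a byte selected by two overlapping ranges is still written only once.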
(6) Now a simple example of character positioning. The following example should look familiar; it extracts the 3rd, 4th, 5th, and 8th characters:

[rocrocket@rocrocket programming]$ who | cut -c 3-5,8
croe
croe
croe

Why is this no different from -b? Do -b and -c do the same thing? No. They only look the same because the example is a poor one: who prints only single-byte characters, so -b and -c give identical results. With Chinese characters the difference shows. Here cut_ch.txt holds the Chinese names for Monday through Thursday:

[rocrocket@rocrocket programming]$ cat cut_ch.txt
星期一
星期二
星期三
星期四
[rocrocket@rocrocket programming]$ cut -b 3 cut_ch.txt
[rocrocket@rocrocket programming]$ cut -c 3 cut_ch.txt
一
二
三
四

With -c, cut counts in characters and the output is correct. With -b, it blindly counts in bytes (8 bits each), so it prints a fragment of a character and the output is garbled.

Since we are on the subject, one more point is worth adding. When dealing with multi-byte characters you can use the -n option, which tells cut not to split multi-byte characters. For example:

[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -b 2
[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 2
[rocrocket@rocrocket programming]$ cat cut_ch.txt | cut -nb 1,2,3
星
星
星
星

Assuming an encoding in which each of these characters occupies three bytes (such as UTF-8), -b 2 prints a garbled single byte, -nb 2 prints nothing because byte 2 is not the last byte of any character, and -nb 1,2,3 prints the whole first character. These results depend on the cut implementation and the locale: the upstream GNU coreutils manual documents -c as currently the same as -b and -n as a no-op, so the difference only appears with a multi-byte-aware build of cut.
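The gap between bytes and characters can be made visible with byte positions alone, which behave the same in every locale. The sketch below assumes the script is saved as UTF-8, where each of these Chinese characters occupies three bytes; the sample word and variable names are chosen for illustration.

```shell
# Assumes UTF-8 source encoding: each character below is three bytes long
word='星期一'

byte_count=$(printf '%s' "$word" | wc -c | tr -d ' ')  # wc -c counts bytes, not characters
first_char=$(printf '%s\n' "$word" | cut -b 1-3)       # bytes 1-3 make up the first character
last_char=$(printf '%s\n' "$word" | cut -b 7-9)        # bytes 7-9 make up the third character

echo "$byte_count bytes; first: $first_char; last: $last_char"
```

Three characters take nine bytes, so a single byte position such as -b 3 lands inside a character, while a range aligned to character boundaries returns it intact.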
(7) What is a field? Why do we need field extraction at all? The -b and -c options described above can only extract information from fixed-format text; they are helpless with variable-format text. This is where fields come in. If you have looked at the /etc/passwd file, you will have seen that, unlike the output of who, it has no fixed column layout. However, the colon plays a very important role in every line of the file: it separates the items. Fortunately, cut provides exactly this kind of extraction: you set the delimiter and then say which fields to extract.
Take the first five lines of /etc/passwd as an example:

[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1
root
bin
daemon
adm
lp
As you can see, -d sets the delimiter to a colon and -f selects the first field; press Enter and all the user names are listed. Quite a sense of accomplishment! Of course, -f also accepts forms such as 3-5 or 4-:

[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5
root:0:0:root
bin:1:1:bin
daemon:2:2:daemon
adm:3:4:adm
lp:4:7:lp
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5,7
root:0:0:root:/bin/bash
bin:1:1:bin:/sbin/nologin
daemon:2:2:daemon:/sbin/nologin
adm:3:4:adm:/sbin/nologin
lp:4:7:lp:/sbin/nologin
[rocrocket@rocrocket programming]$ cat /etc/passwd | head -n 5 | cut -d : -f -2
root:x
bin:x
daemon:x
adm:x
lp:x
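Field extraction can also be tried on inlined records, so the example does not depend on the contents of the local /etc/passwd. This is a minimal sketch: the two records follow the /etc/passwd layout, and the variable names are made up for illustration.

```shell
# Two records in /etc/passwd layout, inlined as sample data
records='root:x:0:0:root:/root:/bin/bash
adm:x:3:4:adm:/var/adm:/sbin/nologin'

names=$(printf '%s\n' "$records" | cut -d : -f 1)      # a single field
ids=$(printf '%s\n' "$records" | cut -d : -f 1,3-4)    # a field plus a range
shells=$(printf '%s\n' "$records" | cut -d : -f 7-)    # field 7 to the end of the line

printf '%s\n' "$names" "$ids" "$shells"
```

When several fields are selected, cut joins them in the output with the same delimiter it used to split the line, which is why the ids lines still contain colons.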
(8) How can we tell spaces and tabs apart? It can get confusing, because a tab is sometimes hard to distinguish from a run of spaces. There is a way to check whether a gap consists of several spaces or of a single tab:

[rocrocket@rocrocket programming]$ cat tab_space.txt
this is tab	finish.
this is several space      finish.
[rocrocket@rocrocket programming]$ sed -n l tab_space.txt
this is tab\tfinish.$
this is several space      finish.$

A tab is displayed as the \t symbol, while spaces are displayed as they are. This makes it easy to tell tabs from spaces. Note that the character after sed -n is a lowercase L.
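The same check works on text generated on the fly, which is handy when there is no file to inspect. The sketch below builds one tab-separated and one space-separated line with printf (the sample text and variable names are assumptions) and passes each through sed -n l.

```shell
# printf expands \t in its format string to a real tab character
tab_line=$(printf 'this is tab\tfinish.\n' | sed -n l)
space_line=$(printf 'this is several space    finish.\n' | sed -n l)

# sed -n l prints each line unambiguously: tab as \t, end of line as $
printf '%s\n' "$tab_line" "$space_line"
```

The first line comes out with a literal backslash-t where the tab was, while the spaces in the second line are left untouched.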
(9) What should I pass to cut -d to use a tab or a space as the delimiter? The default delimiter of cut is already a tab, so when the fields are tab-separated you can omit the -d option and use -f directly. To use a space as the delimiter:

[rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d ' ' -f 1
this
this

Note that there must be exactly one space between the two single quotes. You cannot put several spaces after -d, because cut only allows the delimiter to be a single character:

[rocrocket@rocrocket programming]$ cat tab_space.txt | cut -d '  ' -f 1
cut: the delimiter must be a single character
Try 'cut --help' for more information.

(10) What are the shortcomings of cut? Perhaps you have already guessed: handling runs of multiple spaces. If the fields in a file are separated by a varying number of spaces, cut becomes awkward to use, because it is only good at text separated by a single character.
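One common way around the multiple-space limitation, which is not covered in the article itself, is to squeeze each run of spaces down to a single space with tr -s before handing the text to cut. The sketch below uses a made-up input line and variable names.

```shell
# Hypothetical input: columns padded with a varying number of spaces
line='alice     pts/0      10:15'

# With a single-space delimiter, every extra space starts a new, empty field
naive=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# tr -s ' ' squeezes each run of spaces into one, so field numbers line up again
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

echo "naive: [$naive]"
echo "squeezed: [$squeezed]"
```

Without squeezing, field 2 is the empty string between the first two spaces; after squeezing, field 2 is pts/0 as intended.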
 