Using shell scripts: extracting data with cut

Source: Internet
Author: User

Usage of the Linux cut command


Type: Reprint. Time: 2013-10-03

cut is a selection command: it analyzes a piece of data and extracts the parts we want. It works line by line, selecting from each line separately rather than treating the whole input as a single unit.


(1) The syntax format is:

cut -b list [-n] [file]   or   cut -c list [file]   or   cut -f list [-d delim] [file]


Instructions for use

The cut command cuts bytes, characters, and fields from each line of the file and writes those bytes, characters, and fields to standard output.

If you do not specify a file parameter, cut reads standard input. One of the -b, -c, or -f flags must be specified.


Main parameters

-b: Select by byte. Byte positions ignore multibyte character boundaries unless the -n flag is also specified.

-c: Select by character.

-d: Custom delimiter; the default is a tab.

-f: Select by field. Used with -d to specify which fields to display.

-n: Do not split multibyte characters. Used only with the -b flag. If the last byte of a character falls within the range indicated by the list parameter of the -b flag, the character is written out; otherwise, the character is excluded.


(2) What does cut base its selection on? In other words, how do I tell cut what I want to extract?

The cut command mainly accepts three positioning methods:

First, bytes, with option -b

Second, characters, with option -c

Third, fields, with option -f
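The three positioning methods can be tried side by side on a small sample. This is a minimal sketch: the record in `line` is a made-up value, not taken from the article, and it is plain ASCII, so bytes and characters coincide.

```shell
# Hypothetical sample record; in plain ASCII text one character is one byte.
line='alpha:beta:gamma'

# -b selects by byte position.
printf '%s\n' "$line" | cut -b 1-5        # alpha

# -c selects by character position.
printf '%s\n' "$line" | cut -c 7-10       # beta

# -f selects by field; -d sets the delimiter to a colon.
printf '%s\n' "$line" | cut -d : -f 3     # gamma
```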


(3) Positioning by byte

For example, when you execute the who command, it outputs something like this:

$ who

rocrocket :0 2009-01-08 11:07

rocrocket pts/0 2009-01-08 11:23 (:0.0)

rocrocket pts/1 2009-01-08 14:15 (:0.0)

If we want to extract the 3rd byte of each line, we do this:


$ who | cut -b 3

c

c

c


(4) When positioning by byte, what if I want to extract the 3rd, 4th, 5th and 8th bytes?

-b supports range notation such as 3-5, and multiple positions are separated by commas. Let's look at an example:

$ who | cut -b 3-5,8

croe

croe

croe

One thing to note: cut does not honor the order in which the positions are listed. It always outputs the selected bytes in the order they appear in the line, as if the list after -b had been sorted first, so you cannot rearrange the output by rearranging the list. This example illustrates the point:

$ who | cut -b 8,3-5

croe

croe

croe
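The ordering behavior is easy to confirm without a live who session. The sketch below uses a made-up line in `line` that mimics one line of who output; the variable names `a` and `b` are only for illustration.

```shell
# Hypothetical sample line standing in for one line of who output.
line='rocrocket pts/0'

# The same positions, listed in two different orders.
a=$(printf '%s\n' "$line" | cut -b 3-5,8)
b=$(printf '%s\n' "$line" | cut -b 8,3-5)

echo "$a"    # croe
echo "$b"    # croe
```

If the output order matters, a tool that builds its output explicitly, such as awk, is probably a better fit than cut.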

(5) Are there any tricks with range notation such as "3-5"?

$ who

rocrocket :0 2009-01-08 11:07

rocrocket pts/0 2009-01-08 11:23 (:0.0)

rocrocket pts/1 2009-01-08 14:15 (:0.0)

$ who | cut -b -3

roc

roc

roc

$ who | cut -b 3-

crocket :0 2009-01-08 11:07

crocket pts/0 2009-01-08 11:23 (:0.0)

crocket pts/1 2009-01-08 14:15 (:0.0)

As you can see, -3 means from the first byte through the third byte, and 3- means from the third byte to the end of the line. If you look carefully, you will see that in both cases the third byte, "c", is included.


What do you think happens if I execute who | cut -b -3,3-? The answer is that the entire line is output, and the overlapping "c" is not printed twice. See:


$ who | cut -b -3,3-

rocrocket :0 2009-01-08 11:07

rocrocket pts/0 2009-01-08 11:23 (:0.0)

rocrocket pts/1 2009-01-08 14:15 (:0.0)
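The open-ended ranges and the overlapping list can be reproduced on a fixed string. This is a small sketch; `line` is a made-up value that mimics one line of who output.

```shell
# Hypothetical sample line standing in for one line of who output.
line='rocrocket pts/0'

# -3 selects bytes 1 through 3.
printf '%s\n' "$line" | cut -b -3       # roc

# 3- selects byte 3 through the end of the line.
printf '%s\n' "$line" | cut -b 3-       # crocket pts/0

# Overlapping ranges: byte 3 is selected twice but printed once.
printf '%s\n' "$line" | cut -b -3,3-    # rocrocket pts/0
```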

(6) Give the simplest example of positioning by character!


The following example will look familiar; it extracts the 3rd, 4th, 5th and 8th characters:


$ who | cut -c 3-5,8

croe

croe

croe

So what is the difference from -b? Do -b and -c do the same thing? They only appear to, because this example is a poor one: the output of who consists of single-byte characters, so -b and -c give identical results. The difference shows up with Chinese text, where each character occupies several bytes. Here is an extraction from a Chinese file (its four lines read Monday through Thursday):

$ cat cut_ch.txt

星期一

星期二

星期三

星期四

$ cut -b 3 cut_ch.txt

?

?

?

?

$ cut -c 3 cut_ch.txt

一

二

三

四

As you can see, -c counts in characters and the output is correct, while -b blindly counts in bytes (8 bits each), so the output is garbled. This result depends on the cut implementation and the locale: some versions, including some GNU coreutils releases, treat -c exactly like -b.
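The byte arithmetic behind this can be checked with -b alone, which counts bytes on every implementation. The sketch below assumes the script is saved and run as UTF-8, where each of these Chinese characters occupies three bytes; the variable name `word` is only for illustration.

```shell
# Assumption: this text is UTF-8 encoded, so each character is 3 bytes.
word='星期一'

# Count the bytes in the word: 3 characters x 3 bytes each.
printf '%s' "$word" | wc -c | tr -d ' '     # 9

# Byte 3 alone is only the tail of the first character, which is why
# cut -b 3 prints garbage. Bytes 1-3 together form the whole character.
printf '%s\n' "$word" | cut -b 1-3          # 星

# Bytes 7-9 form the complete third character.
printf '%s\n' "$word" | cut -b 7-9          # 一
```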


Since we are on the subject, here is one more detail worth knowing.

When you encounter multibyte characters, you can use the -n option, which tells cut not to split multibyte characters.


For example (every line of cut_ch.txt begins with the three-byte character 星):

$ cat cut_ch.txt | cut -b 2

?

?

?

?

$ cat cut_ch.txt | cut -nb 2


$ cat cut_ch.txt | cut -nb

星

星

星

星

The second command prints only empty lines, because no character ends at byte 2. The last command prints 星 when its byte list includes byte 3, the final byte of that character. Support for -n varies: some implementations accept the option but ignore it.

(7) What about fields? An explanation:


Why extract by field? Because -b and -c, as described above, can only extract information from fixed-format text and are helpless with variable-format data. This is where fields come in handy. If you look at the /etc/passwd file, you will find that, unlike the output of who, its contents are not aligned in fixed-width columns. However, the colon plays a very important role in every line of the file: it separates the items.


Fortunately, cut provides exactly this kind of extraction: set the delimiter, then specify which fields to extract, and you are done!


Take the first five lines of /etc/passwd as an example:

$ cat /etc/passwd | head -n 5

root:x:0:0:root:/root:/bin/bash

bin:x:1:1:bin:/bin:/sbin/nologin

daemon:x:2:2:daemon:/sbin:/sbin/nologin

adm:x:3:4:adm:/var/adm:/sbin/nologin

lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

$ cat /etc/passwd | head -n 5 | cut -d : -f 1

root

bin

daemon

adm

lp

As you can see, -d sets the delimiter to a colon and -f selects the first field; press Enter and all the user names are listed. Quite satisfying!


Of course, with -f you can also use list formats such as 3-5 or 4-:

$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5

root:0:0:root

bin:1:1:bin

daemon:2:2:daemon

adm:3:4:adm

lp:4:7:lp

$ cat /etc/passwd | head -n 5 | cut -d : -f 1,3-5,7

root:0:0:root:/bin/bash

bin:1:1:bin:/sbin/nologin

daemon:2:2:daemon:/sbin/nologin

adm:3:4:adm:/sbin/nologin

lp:4:7:lp:/sbin/nologin

$ cat /etc/passwd | head -n 5 | cut -d : -f -2

root:x

bin:x

daemon:x

adm:x

lp:x
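Because /etc/passwd differs from machine to machine, the field lists are easier to experiment with on a fixed line. The sketch below uses one sample line in /etc/passwd format; the variable name `entry` is only for illustration.

```shell
# One sample line in /etc/passwd format (seven colon-separated fields).
entry='root:x:0:0:root:/root:/bin/bash'

# A single field.
printf '%s\n' "$entry" | cut -d : -f 1          # root

# A single field plus a range; the delimiter is kept between fields.
printf '%s\n' "$entry" | cut -d : -f 1,3-5      # root:0:0:root

# From the first field through field 2.
printf '%s\n' "$entry" | cut -d : -f -2         # root:x

# From field 6 to the end of the line.
printf '%s\n' "$entry" | cut -d : -f 6-         # /root:/bin/bash
```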

(8) How do I tell spaces and tabs apart? They are easy to confuse.


A tab can be hard to recognize by eye, but there is a way to see whether a gap consists of several spaces or of a tab character.

$ cat tab_space.txt

This is tab     finish.

This is several space      finish.

$ sed -n l tab_space.txt

This is tab\tfinish.$

This is several space      finish.$

As you can see, a tab is displayed as \t, while spaces are displayed as they are.

This method lets you tell tabs and spaces apart.

Note that the character after sed -n is the lowercase letter L, not the digit 1. Don't misread it.
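The same check works on any input, not just a file. As a small sketch, printf can generate one line with a tab and one with a space (both strings are made up for illustration), and sed -n l makes the difference visible:

```shell
# printf turns \t into a real tab; sed's l command shows it as \t
# and marks the end of each line with $.
printf 'tab\there\n'  | sed -n l    # tab\there$

# A plain space is printed unchanged.
printf 'space here\n' | sed -n l    # space here$
```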


(9) What should I pass to cut -d to set a tab or a space as the delimiter?

In fact, the default delimiter for cut is a tab, so when the delimiter is a tab you can omit the -d option and use -f directly to select fields.


To set a space as the delimiter, do this:

$ cat tab_space.txt | cut -d ' ' -f 1

This

This

Note that there really must be a space between the two single quotes. Don't leave it out.

Also, you can set only one space after -d, not several, because cut only allows the delimiter to be a single character.

$ cat tab_space.txt | cut -d '  ' -f 1

cut: the delimiter must be a single character

Try `cut --help' for more information.
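The delimiter rules can be exercised on generated input. This is a sketch with made-up sample strings; the wording of the error message differs between cut implementations, so the example only checks that the command fails rather than matching the message.

```shell
# Hypothetical sample line with single spaces between words.
line='one two three'

# With no -d, the delimiter is a tab.
printf 'left\tright\n' | cut -f 2              # right

# A single quoted space is a valid delimiter.
printf '%s\n' "$line" | cut -d ' ' -f 2        # two

# A two-space delimiter is rejected; cut exits with a non-zero status.
if ! printf '%s\n' "$line" | cut -d '  ' -f 2 2>/dev/null; then
    echo 'cut rejected the multi-character delimiter'
fi
```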


(10) What are the shortcomings of cut?

Did you guess? Yes: handling multiple spaces.

If fields in a file are separated by runs of several spaces, using cut is awkward, because cut is only good at handling text whose fields are separated by exactly one character.
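One common workaround, which is not part of the original article, is to squeeze each run of spaces down to a single space with tr -s before passing the line to cut. A minimal sketch, with a made-up sample line:

```shell
# Hypothetical sample line whose fields are separated by runs of spaces.
line='alpha   beta  gamma'

# Each space counts as its own delimiter, so field 2 is the empty
# string between the first and second space.
printf '%s\n' "$line" | cut -d ' ' -f 2                # (empty line)

# Squeezing repeated spaces into one makes the fields line up.
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2    # beta
```

For input with irregular whitespace, awk, which splits on runs of blanks by default, may be a simpler choice.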

This article is from the "shuosir" blog (http://shuosir.blog.51cto.com). Please contact the author before reproducing it.

