Wildcards and Regular Expressions in the Linux Shell


Overview

Wildcards are processed by the shell. A wildcard is normally used only in the arguments of a command, not in the command name or its options. When the shell encounters a wildcard in an argument, it treats the argument as a path or file name pattern and searches the disk for matches. If matches are found, the shell replaces the pattern with them (path expansion); otherwise, the wildcard is passed to the command unchanged as an ordinary character, and the command decides what to do with it. In short, wildcards are a path expansion feature implemented by the shell. After processing the wildcards, the shell reassembles the command and continues processing the reassembled command until it is executed.

For example, suppose the current directory contains three files, cha1, cha2, and des, and I want to use grep to search des for lines containing the string cha. I write the following command:

grep cha* des ①

When the shell processes this command, it first treats the * in cha* as a wildcard and searches the current directory for matches. As a wildcard, * matches zero or more arbitrary characters, so the files cha1 and cha2 match. The shell automatically reassembles the command as follows:

grep cha1 cha2 des ②

This is the final text of the command. The actual effect of command ① is therefore to search the files cha2 and des for lines containing the string cha1, which is quite different from what we wanted grep to do.

However, if the current directory contains no file or directory (path) that matches cha*, the shell finds no match, gives up on replacing the *, and passes it to the command unchanged. The reassembled command is then:

grep cha* des ③

This is the final text of the command in this case. Even so, the behavior of command ① is still not exactly what was intended. When * is handed to grep, it no longer means zero or more arbitrary characters (that is its meaning as a wildcard in the shell). grep treats * as a regular expression symbol meaning that the preceding character occurs zero or more times, so the pattern matches any line containing ch followed by zero or more a characters.
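The two cases above can be observed directly by letting echo print the reassembled command. This is a minimal sketch, assuming a POSIX-style shell such as bash or dash with default settings (options such as bash's nullglob or failglob change what happens to an unmatched pattern); the scratch directory and variable names are only for illustration.

```shell
# Work in a scratch directory so the example does not touch real files.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch cha1 cha2 des

# Matching files exist: the shell replaces cha* before the command runs.
with_match=$(echo grep cha* des)
echo "$with_match"       # grep cha1 cha2 des

# No matching files: the pattern is passed through unchanged.
rm cha1 cha2
without_match=$(echo grep cha* des)
echo "$without_match"    # grep cha* des

cd "$start_dir" && rm -rf "$demo_dir"
```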

In the first case, how can we make the reassembled command be command ③ rather than command ②? In a shell command, all text can be divided into meta and literal. A literal is ordinary plain text that has no special meaning to the shell; a meta is a reserved character with a specific function in the shell, such as <, >, and |. Loosely speaking, wildcards can also be placed in this category. Metas are consumed by the shell during processing, so they no longer appear as text in the final command that is executed (from this point of view, classifying wildcards as metas is not quite accurate, because a wildcard may or may not be replaced).

If you want a shell meta to reach the final command as text, as we wanted above, you must tell the shell not to process it and to keep it in its text form. This is the job of shell quoting (escaping). Regular expressions need this treatment because they contain many special characters (which can be regarded as the metas of the RE) that coincide with the metas and wildcards of the shell. For these characters to pass through the shell and reach the command as part of the regular expression, they must be escaped. Likewise, commands that define their own metas, such as tr, also need shell quoting when those metas coincide with shell metas or wildcards.
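As a rough illustration of the point about quoting, the sketch below runs the grep example three ways. The file contents and variable names are made up for the demonstration, and it assumes a grep that follows POSIX basic regular expressions.

```shell
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch cha1 cha2
printf 'chapter one\nch\nother\n' > des

# Unquoted: the shell rewrites this to "grep cha1 cha2 des" before grep runs.
# grep exits non-zero when nothing matches, hence the "|| true".
unquoted=$(grep cha* des || true)

# Hard quote: grep receives the pattern cha* unchanged.
quoted=$(grep 'cha*' des)

# Backslash: escapes only the * and has the same effect here.
escaped=$(grep cha\* des)

printf 'unquoted: [%s]\nquoted: [%s]\n' "$unquoted" "$quoted"

cd "$start_dir" && rm -rf "$demo_dir"
```

With the quoted forms, grep sees the regular expression itself and reports the lines "chapter one" and "ch"; the unquoted form searches for the string cha1 and finds nothing.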

 

Wildcard

*  matches zero or more characters

?  matches any single character

[list]  matches any single character in list

[!list]  matches any single character not in list

{string1,string2,...}  matches one of string1, string2 (or more) strings (in bash this is brace expansion, which is performed whether or not matching files exist)

Example:

a*b  Any number of characters, or none, may appear between a and b, such as aabcb, axyzb, a012b, and ab.

a?b  Exactly one character, which can be any character, must appear between a and b, such as aab, abb, acb, and a0b.

a[xyz]b  Exactly one character, which must be x, y, or z, must appear between a and b, such as axb, ayb, and azb.

a[!0-9]b  Exactly one character, which must not be a digit, must appear between a and b, such as axb, aab, and a-b.

a{abc,xyz,123}b  Exactly one of the three strings abc, xyz, or 123 must appear between a and b, giving aabcb, axyzb, and a123b.
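The patterns above can be tried against a handful of empty files. This is only a sketch: the file names are chosen for the demonstration, LC_ALL=C is set so that the sort order of the matches is predictable, and the {…} form is left out because it is not available in every sh implementation.

```shell
LC_ALL=C; export LC_ALL
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ab aab a0b axb aabcb

star=$(echo a*b)            # a0b aab aabcb ab axb
question=$(echo a?b)        # a0b aab axb
in_list=$(echo a[xyz]b)     # axb
not_digit=$(echo a[!0-9]b)  # aab axb

printf '%s\n' "$star" "$question" "$in_list" "$not_digit"

cd "$start_dir" && rm -rf "$demo_dir"
```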

 

Meta in Shell

Below are some common examples:

IFS  One of <space>, <tab>, or <enter> (space is the most commonly used).

CR  Produced by <enter>.

=  Sets a variable.

$  Performs variable or arithmetic substitution (do not confuse it with the shell prompt).

>  Redirects stdout.

<  Redirects stdin.

|  Command pipeline.

&  Redirects a file descriptor, or places a command in the background for execution.

()  Runs the enclosed commands in a nested subshell, or is used for arithmetic or command substitution.

{}  Runs the enclosed commands in a non-named function, or delimits the scope of a variable substitution.

;  When the previous command ends, ignores its return value and continues with the next command.

&&  When the previous command ends, continues with the next command only if the return value is true.

||  When the previous command ends, continues with the next command only if the return value is false.

!  Runs a command from the history list.
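A few of the metas listed above can be seen in action in the short sketch below. The variable names and messages are invented for the illustration; true and false are the standard commands that always succeed and always fail.

```shell
# ; ignores the exit status of false and runs echo anyway.
after_semicolon=$(false; echo "second command ran")

# && runs echo only because true succeeded.
after_and=$(true && echo "ran after success")

# Here false fails, so && skips the first echo; the ; then runs the second.
skipped_and=$(false && echo "never printed"; echo "done")

# || runs echo only because false failed.
after_or=$(false || echo "ran after failure")

# | connects stdout of printf to stdin of sort, and sort's stdout to head.
first_sorted=$(printf 'b\na\n' | sort | head -n 1)

# > sends stdout to a file; < feeds a file to stdin.
tmp_file=$(mktemp)
echo "redirected" > "$tmp_file"
read -r from_file < "$tmp_file"
rm -f "$tmp_file"

printf '%s\n' "$after_semicolon" "$after_and" "$skipped_and" "$after_or" "$first_sorted" "$from_file"
```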

 

Shell quoting

There are three quoting mechanisms, which can themselves be regarded as shell metas:

'' (single quotes):

Also called a hard quote; all shell metas inside it are turned off. Note: a ' (single quote) cannot appear inside a hard quote.

"" (double quotes):

Also called a soft quote; only certain shell metas keep their meaning inside it:

$  Parameter substitution

`  Backquote, used for command substitution

\$  A literal dollar sign (removes the special meaning of the dollar sign)

\`  A literal backquote (removes the special meaning of the backquote)

\"  A literal double quote (removes the special meaning of the double quote)

\\  A literal backslash (removes the special meaning of the backslash)

Note: Inside soft quotes, a single quote has no special meaning; it is plain text.

\ (backslash):

Also called an escape; it removes the special meaning of the meta or wildcard that immediately follows it.

In essence, quoting makes the shell skip its processing of special characters.
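The difference between the three forms of quoting can be checked with a few assignments. This is a small sketch with made-up variable names and values, not an exhaustive list of quoting rules.

```shell
name=world

# Hard quote: $, backquotes, and the backslash are all kept as plain text.
single='$name `date` \$'

# Soft quote: $name is substituted; \$, \" and \\ yield literal characters.
double="$name \$name \"quoted\" \\"

# Soft quote: a single quote is plain text, and backquotes still run a command.
apostrophe="it's $name"
substituted="said `echo hi`"

# Backslash: removes the special meaning of the one character that follows.
escaped=\$name

printf '%s\n' "$single" "$double" "$apostrophe" "$substituted" "$escaped"
```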

 

Regular Expression

Anchor:

Identifies the position of the RE within a line. Common anchors include:

^  Indicates the beginning of a line. For example, ^abc matches a line that begins with abc.

$  Indicates the end of a line. For example, abc$ matches a line that ends with abc.

\<  Indicates the beginning of a word. For example, \<abc matches a word that starts with abc.

\>  Indicates the end of a word. For example, abc\> matches a word that ends with abc.
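The four anchors can be tried with grep on a few sample lines. This sketch assumes a grep that supports the \< and \> word anchors (GNU grep does; they are not part of POSIX), and the sample text is invented. Note that every pattern is single-quoted so that the shell passes ^, $, \, <, and > to grep untouched.

```shell
text=$(printf 'abc def\nxyz abc\nxabcx\ndef abcd\n')

line_start=$(printf '%s\n' "$text" | grep '^abc')   # abc def
line_end=$(printf '%s\n' "$text" | grep 'abc$')     # xyz abc
word_start=$(printf '%s\n' "$text" | grep '\<abc')  # abc def, xyz abc, def abcd
word_end=$(printf '%s\n' "$text" | grep 'abc\>')    # abc def, xyz abc

printf '%s\n' "$word_start"
```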

Modifier:

Meaningless on its own; it modifies the number of occurrences of the preceding character set. Common modifiers include:

*  The preceding character set occurs zero or more times. For example, ab*c means that zero or more b's may appear between a and c.

?  The preceding character set occurs zero or one time. For example, ab?c means that zero or one b may appear between a and c.

+  The preceding character set occurs one or more times. For example, ab+c means that one or more b's appear between a and c.

{n}  The preceding character set occurs exactly n times. For example, ab{3}c means that exactly three b's must appear between a and c.

{n,}  The preceding character set occurs at least n times. For example, ab{3,}c means that at least three b's appear between a and c.

{n,m}  The preceding character set occurs n to m times. For example, ab{3,5}c means that three to five b's appear between a and c.
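The modifiers can be compared by counting how many sample words each pattern matches. This sketch uses grep -E (extended regular expressions), because with grep's default basic syntax the ?, +, and {} modifiers would have to be written with a leading backslash. The helper function count_matches and the word list are hypothetical, introduced only for this example.

```shell
words=$(printf 'ac\nabc\nabbc\nabbbc\nabbbbbbc\n')

# Hypothetical helper: prints how many of the sample words match the pattern.
count_matches() { printf '%s\n' "$words" | grep -E -c "$1"; }

star=$(count_matches 'ab*c')        # 5: every word
optional=$(count_matches 'ab?c')    # 2: ac, abc
plus=$(count_matches 'ab+c')        # 4: all but ac
exact=$(count_matches 'ab{3}c')     # 1: abbbc
at_least=$(count_matches 'ab{3,}c') # 2: abbbc, abbbbbbc
range=$(count_matches 'ab{3,5}c')   # 1: abbbc (abbbbbbc has six b's)

echo "$star $optional $plus $exact $at_least $range"
```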

 

Summary

In summary, metas and wildcards in the shell sometimes coincide with the metas of a command. To prevent the command's metas from being interpreted by the shell, shell quoting must be used so that the text reaches the command unchanged.

 

Appendix: shell script interpretation process

Note that content inside double quotes skips steps 1-4 and 9-10, and content inside single quotes skips steps 1-10. In other words, content in double quotes undergoes only parameter expansion, command substitution, and arithmetic substitution before being sent to the execution step, while content in single quotes is sent to the execution step directly. In addition, both double and single quotes tell the shell to treat their content as a single unit, but the quotes themselves are not part of the command text at execution time.

For example:

lsdetail="ls -l"

$lsdetail

The double quotes are not part of the command.
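The example can be examined more closely by counting the words the shell produces from the variable. This is a minimal sketch assuming the default IFS; count_args is a hypothetical helper written only for this demonstration.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

lsdetail="ls -l"

# The quotes were consumed at assignment time; the stored value is just: ls -l
echo "$lsdetail"

# Unquoted, the expanded value is split into the two words ls and -l.
unquoted_count=$(count_args $lsdetail)

# Double-quoted, the expanded value stays a single word.
quoted_count=$(count_args "$lsdetail")

echo "$unquoted_count $quoted_count"   # 2 1
```

This is why running $lsdetail unquoted executes ls with the option -l, whereas "$lsdetail" would make the shell look for a command literally named "ls -l".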

This article is from the CSDN blog. When reproducing it, please cite the source: http://blog.csdn.net/chen_dx/archive/2008/05/20/2463493.aspx
