POSIX shell tips


This article records some useful tips I have found for writing POSIX (and otherwise portable) shell scripts. I have always insisted that the Bourne shell is not a good programming language, roughly on par with Perl in that respect. I personally dislike the sh language and will not dwell on that here; nor will I spend many words on non-portable features of mainstream shells such as Bash and ksh.

Print the variable value

printf %s\\n "$var"

The \n can be omitted if you do not want the trailing newline, but the quotation marks around the variable are required. The following, on the other hand, does not work:

echo "$var"

Remember: you should not use echo as above. According to POSIX, echo has implementation-defined behavior if any argument contains a backslash or if its first argument is "-n". XSI-conformant implementations of the Unix standard mandate one bad behavior (backslashes are C-style escapes), while other popular shells such as Bash use a different parsing scheme in which "-n" is treated as an option (even in POSIX-compatibility mode). There is no portable, safe way to use echo.

What this means is that echo "$var" can bite you whenever you cannot guarantee the contents of var, for example that it is a non-negative integer. Even if you are a GNU/Linux-centric user who champions Bash and does not care about portability at all, one day var will contain "-n", "-e", or even "-neeeneene", and your script will be in trouble.
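The hazard is easy to demonstrate with a throwaway variable (the value here is invented):

```shell
# A value that looks like an echo option (hypothetical example).
var='-n hello'
# echo "$var" may print nothing, "hello", or "-n hello", depending on the shell.
# printf prints the value faithfully:
out=$(printf %s\\n "$var")
```

Here out contains exactly the original string, option-looking prefix and all.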

But if you really love echo and want to use it in your scripts, the following function makes echo behave reasonably (it is similar to Bash's echo, with the added rule that the last argument is never treated as an option, so echo "$var" is safe even when the content of var looks like an option):

echo () (
fmt=%s end=\\n IFS=" "
while [ $# -gt 1 ] ; do
case "$1" in
[!-]*|-*[!ne]*) break ;;
*ne*|*en*) fmt=%b end= ;;
*n*) end= ;;
*e*) fmt=%b ;;
esac
shift
done
printf "$fmt$end" "$*"
)

If you would rather give up option processing entirely, which solves echo's troubles from the start, the following suffices; note the use of "$*" rather than "$@":

echo () { printf %s\\n "$*" ; }

Who would have thought that printing a simple variable could be so difficult? Now you should understand why the Bourne language should not be used for serious programming.

Read input line by line

IFS= read -r var

This command reads one line of input, terminated by a newline, end-of-file, or error condition, from stdin, and stores the result in var. The exit status is 0 (success) if a newline was read, and nonzero (failure) on error or end-of-file. Robust scripts with serious requirements may need to distinguish these cases.

As I read POSIX, var should be filled with the data read even when an error or end-of-file condition occurs. However, I am not certain that all implementations behave this way, nor that this reading of the standard is strictly correct. Corrections are welcome.
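A minimal sketch of the behavior described above (the sample input is invented; the braces keep read in the same subshell as the consumer of var):

```shell
# Read one line verbatim: IFS= preserves leading/trailing blanks, -r preserves backslashes.
line=$(printf '  a\\b  \n' | { IFS= read -r var && printf %s "$var" ; })
```

Without -r, the backslash would be eaten; without IFS=, the surrounding blanks would be stripped.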

There is, however, a common trap: trying to read from a pipeline:

foo | IFS= read var

POSIX allows any command in a pipeline to be run in a subshell, and which commands, if any, run in the main shell varies between implementations, notably between Bash and ksh. The following idiom avoids the problem:

IFS= read var << EOF
$(foo)
EOF

Read input byte by byte

read dummy oct << EOF
$(dd bs=1 count=1 | od -b)
EOF

This stores in the variable oct the octal value of one byte of input. Note that dd is the only standard command that can safely and exactly read a given number of bytes of input, with a guarantee that no excess bytes are read and lost. Besides being non-portable, head -c 1 is permitted to buffer its input using C's stdio functions.

Because the read command processes text, an escaped representation (here, octal) is necessary. The shell cannot handle arbitrary bytes; in particular, a NUL byte cannot be stored in a shell variable. Problems with other non-ASCII bytes may depend on your implementation and locale. You can adapt the code to read several bytes at a time, but beware of exceptional cases such as NUL bytes.
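Putting the pieces together, here is a sketch that reads the octal value of a single input byte (the input string is arbitrary):

```shell
# od -b prints bytes in octal; its first output line is "offset byte".
oct=$(printf A | {
read dummy oct << EOF
$(dd bs=1 count=1 2>/dev/null | od -b)
EOF
printf %s "$oct"
})
# "A" is ASCII 65, i.e. octal 101.
```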

You can use the next sh trick to convert the octal values back to binary.

Write numeric bytes to stdout

writebytes () { printf %b `printf \\\\%03o "$@"` ; }
writebytes 65 66 67 10

This function accepts octal, decimal, and hexadecimal values; octal and hexadecimal values must be prefixed with 0 and 0x respectively. If you instead want all arguments treated as octal, for example when processing values obtained with the byte-reading trick above, try:

writeoct () { printf %b `printf \\\\%s "$@"` ; }

Note that this version breaks if an octal value is longer than three digits, so do not pad values with leading 0s. The following version is much slower, but avoids the problem:

writeoct2 () { printf %b $(printf \\%03o $(printf 0%s\  "$@")) ; }
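As a companion sketch (function name and values invented): POSIX specifies the \0ddd escape form for octal values in %b arguments, so a variant that generates that form is maximally portable:

```shell
# Emit bytes using the strict POSIX \0ddd octal escape with %b.
# 65 66 67 are the decimal ASCII codes for A B C (example values).
writebytes0 () { printf %b $(printf '\\0%03o' "$@") ; }
out=$(writebytes0 65 66 67)
```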

Use xargs with find

GNU die-hards like to use find's -print0 option together with xargs's -0 option to get robust results. Without these GNU extensions, find prints one pathname per line, which means that when a pathname itself contains a newline character, the actual pathname cannot be recovered from the output.

If you do not mind your script misbehaving on pathnames that contain newlines, at least make sure the misbehavior cannot lead to privilege escalation, and then consider the following:

find ... | sed 's/./\\&/g' | xargs command

The sed command here is essential. Contrary to popular belief (which I once shared), xargs does not accept a newline-delimited list; it accepts a shell-quoted list, so, for example, any whitespace in the input must be quoted. The command above simply escapes every character with a backslash, which satisfies this requirement and keeps the spaces in filenames from being mangled.
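A quick sketch with made-up filenames containing spaces; each input line arrives at the command as exactly one argument:

```shell
# Escape every character with a backslash, then let xargs undo the quoting.
out=$(printf '%s\n' 'a b' 'c  d' | sed 's/./\\&/g' | xargs printf '[%s]')
```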

Use + with find -exec

Of course, there is also a smarter way to have find pass the filenames to a command: use -exec, replacing ";" with "+":

find path -exec command '{}' +

Here, find replaces the "{}" with as many pathnames as possible, each as its own argument, so newlines pose no problem. Unfortunately, although "+" has been in POSIX for a long time, GNU find, the most common implementation, did not support it for many years, so in practice it is not always available. A reasonable workaround is to test for "+" support and substitute ";" for "+" on systems whose find lacks it.

The following command must run successfully on any POSIX-conforming system, but on systems whose find lacks "+" support it will fail, complaining about the missing ";":

find /dev/null -exec true '{}' +

This uses /dev/null as the only pathname; it is one of the three device files (/dev/null, /dev/tty, /dev/console) that POSIX requires to exist.

A portable find -print0 substitute

find path -exec printf %s\\0 '{}' +

However, "+" support was missing from GNU find until relatively recent versions; substitute ";" for "+" if necessary, at some cost in performance.

Note that this trick is of limited use, because the output is not a text file, and essentially only GNU xargs -0 is prepared to parse it.

Parse the output of find -print

Although find -print has the problems just described, its output can still be parsed reliably. For each absolute search path, replace the leading "/" with "/./", and similarly replace a leading "./" in relative paths with "././". Since this marker can then only appear at the beginning of a printed pathname, a newline in the output is a separator exactly when the next line begins with the marker; otherwise the newline is embedded in a pathname.

Get exact output from command substitution

The following code is not secure:

var=$(dirname "$f")

Most commands terminate their output with a newline, and command substitution in the Bourne shell removes not just that one newline, but all trailing newlines. In the command above, if the directory-name part of f ends in newlines, those newlines are stripped as well, yielding a different directory name than expected. Directory names containing newlines are far-fetched, but corner cases like this can be exploited by attackers.

The solution to this problem is simple: append a safe character after the final newline, then use the shell's parameter substitution to remove that character:

var=$(command ; echo x) ; var=${var%?}

In the case of dirname, we also want to remove the newline that dirname itself appends, so we strip two characters:

var=$(dirname "$f" ; echo x) ; var=${var%??}
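To illustrate with a contrived pathname whose directory part ends in a newline (such names are legal, if perverse), compare the protected and the naive substitution:

```shell
# Hypothetical pathname: the directory component ends with a newline character.
f='/tmp/weird
/file'
var=$(dirname "$f" ; echo x) ; var=${var%??}   # keeps the final newline
naive=$(dirname "$f")                          # newline silently stripped
```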

Of course, there is also a simpler way to obtain the directory part of the path, if you do not care about these special cases:

var=${f%/*}

Of course, this fails for files in the root directory, among other special cases, so in general it is better to write a shell function for the job. But note that such a function must store its result in a variable. If it printed the result on stdout, as is common practice for string-processing shell functions, we would be right back to the "$(...)" newline-clobbering problem at the call site.
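For ordinary pathnames the parameter expansion behaves as expected; the root-directory corner case mentioned above yields an empty string (the paths here are arbitrary examples):

```shell
f=/usr/local/bin/tool
dir=${f%/*}      # strips "/tool", leaving /usr/local/bin
r=/tool
rootdir=${r%/*}  # empty string: the root-directory corner case
```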

Return a string from a shell function

As we have seen, stdout is not a good avenue for a shell function to return a string to its caller unless newlines are excluded, which rules out functions intended to handle arbitrary strings. So what can we do? Have the caller pass the name of a variable in which to store the result:

For example:

func () {
body here
eval "$1=\${foo}"
}

Here ${foo} can be replaced with any substitution. The key points are the eval line and the use of the escape. "$1" is expanded when the main command parser builds the argument to eval, but "${foo}" is not expanded at that stage, because the "$" has been quoted; it is expanded only when eval executes its argument. If it is not clear why this matters, consider the following, which is unsafe:

foo='hello ; rm -rf /'
dest=bar
eval "$dest=$foo"

However, the following version is safe:

foo='hello ; rm -rf /'
dest=bar
eval "$dest=\$foo"

Note that in the original example, "$1" was used at the call site to pass the name of the destination variable as a function argument. If your function needs to use the shift command, for example to process the remaining arguments as "$@", saving the value of "$1" in a temporary variable at the beginning of the function works well.
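A small sketch of the pattern (function and variable names invented for illustration); note that the value is never parsed as shell syntax, so hostile content is harmless:

```shell
# Store a result, possibly containing dangerous text, in a caller-named variable.
make_pair () { foo="$2;$3" ; eval "$1=\${foo}" ; }
make_pair result 'rm -rf /' x
```

After the call, result holds the literal string, semicolon and all.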

Shell-quote an arbitrary string

Sometimes it is necessary to quote a string in shell syntax, for example to expand it in a command executed with eval, or to write out a generated script. There are many ways to do it, but most fail when the string contains newlines. The following version works:

quote () { printf %s\\n "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/" ; }

This function simply replaces every occurrence of ' (single quote) with '\'' (quote, escaped quote, quote), then puts single quotes at the beginning and end of the string. This is safe because the single quote is the only character with special meaning inside single quotes. Since the trailing newline is handled correctly, the quote appended at the end also serves as a safe final character, protecting against the newline-clobbering command substitution problem discussed above, as in:

quoted=$(quote "$var")
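A round trip through quote and eval shows that arbitrary content survives (the sample string is invented):

```shell
quote () { printf %s\\n "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/" ; }
var="it's a test"
quoted=$(quote "$var")
# Expanding the quoted form reproduces the original exactly.
eval "copy=$quoted"
```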

Use Arrays

Unlike enhanced Bourne shells such as Bash, the POSIX shell has no array type. However, at some cost in efficiency, you can get something very much like an array, although only one of them: the positional parameters "$1", "$2", and so on. Replacing the contents of this "$@" array is easy:

set -- foo bar baz boo

Or, more usefully:

set -- *

But how can we save the current contents of "$@" so that we can restore them after clobbering them, and how can we construct these so-called "arrays" programmatically? Try this quoting function, based on the previous trick:

save () {
for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
echo " "
}

Usage is as follows:

myarray=$(save "$@")
set -- foo bar baz boo
eval "set -- $myarray"

Here the positional parameters are quoted into myarray so that the eval command restores them exactly. Other forms, such as myarray=$(save *), work as well, as does building the value of the array variable programmatically.
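Here is the full cycle: save the parameters, clobber them, then restore them exactly (the sample values are arbitrary and include a quote character):

```shell
save () {
for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
echo " "
}
set -- 'a b' "c'd"
myarray=$(save "$@")
set -- replaced
eval "set -- $myarray"
restored=$(printf '[%s]' "$@")
```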

We can also build such an array from the output of the find command: either with a cleverly constructed command using the -exec option, or, if we are willing to ignore the possibility of newlines in pathnames, by applying the sed command to find's output as in the xargs trick:

findarray () {
find "$@" -exec sh -c "for i do printf %s\\\\n \"\$i\" \\
| sed \"s/'/'\\\\\\\\''/g;1s/^/'/;\\\$s/\\\$/' \\\\\\\\/\"
done" dummy '{}' +
}

For example, in the following script:

old=$(save "$@")
eval "set -- $(findarray path)"
for i do command "$i" ; done
eval "set -- $old"

Note that this avoids the common broken idiom "for i in `find ...` ; do ...", which mangles pathnames containing whitespace.

Test whether a string matches a glob pattern

fnmatch () { case "$2" in $1) return 0 ;; *) return 1 ;; esac ; }

Now you can do this:

if fnmatch 'a??*' "$var" ; then ... ; fi

Suddenly, Bash's [[ command looks a lot less necessary.
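For instance (the pattern and test strings are arbitrary):

```shell
fnmatch () { case "$2" in $1) return 0 ;; *) return 1 ;; esac ; }
a=no b=no
fnmatch 'a??*' abcd && a=yes   # matches: at least three chars, starting with a
fnmatch 'a??*' ab || b=yes     # too short: no match
```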

Count the occurrences of a character

tr -dc 'a' | wc -c

This counts the occurrences of the character a by deleting every other character; wc -c then counts what remains. However, tr -dc has problems when the input contains non-character binary data, and what POSIX requires here differs from what many implementations do. Instead we can do:

tr a\\n \\na | wc -l

The wc-l command reads the number of occurrences of all linefeeds. Therefore, use "a" to replace the linefeeds, and use tr to count the number of occurrences of ".

Override locale categories

The following does not necessarily take effect:

LC_COLLATE=C ls

This can fail because LC_ALL, if present in the environment, overrides all the individual category variables. Simply unsetting LC_ALL is wrong too, since doing so may change every category at once. Instead:

eval export `locale` ; unset LC_ALL

This explicitly sets each individual category variable to the value it was implicitly receiving, whether from LANG, from its own variable, or from LC_ALL, and then unsets LC_ALL. Afterwards your script can override individual categories, as in the earlier command.

Remember that C (with its alias POSIX) is the only locale value a portable script may set.

In the C locale, glob patterns and regular-expression ranges such as [a-z] are based on ASCII code points rather than on natural-language ordering (except on systems whose native character set is not ASCII, such as EBCDIC). The same applies to character ranges in the tr command (LC_COLLATE). The case mappings of "i" and "I" are the sane ASCII ones (LC_CTYPE). Dates are printed in the traditional Unix format (LC_TIME).

On the other hand, there are things you cannot assume, and areas where the C locale may behave worse than the user's own locale:

Bytes outside the portable (ASCII) character set need not be characters at all: they may be treated as non-character bytes, as ISO Latin-1, as members of an abstract character set with no properties, or even as parts of UTF-8 sequences, which affects whether they can match glob patterns and regular expressions (LC_CTYPE). Data belonging to other categories depends on the character encoding, so the LC_TIME month names, LC_MESSAGES strings, and LC_COLLATE collating elements have undefined behavior if LC_CTYPE is changed out from under them. And if LC_COLLATE is set to C while non-ASCII characters appear in a regular-expression range, it is unclear what POSIX requires; historically the regex engine in the GNU C library would crash.

Therefore it is safe to override individual categories such as LC_COLLATE or LC_TIME with C to obtain predictable output, but it is not safe to override LC_CTYPE unless you override LC_ALL. Overriding LC_CTYPE might suppress strange and dangerous case mappings in special situations, but in the worst case it could make all filenames containing non-ASCII characters inaccessible. At present there seems to be no good general solution.

Remove all exports

unexport_all () {
eval set -- `export -p`
for i do case "$i" in
*=*) unset ${i%%=*} ; eval "${i%%=*}=\${i#*=}" ;;
esac ; done
}

Match all dot files with glob

.[!.]* ..?*

Together these two globs match all filenames beginning with "." except for "." and "..", which have special meaning: the first matches names beginning with "." followed by any character other than "."; the second matches names beginning with ".." followed by at least one more character.

Remember that if a glob matches no filename, it remains in the command line as a literal string, so you may need to test whether a purported match really exists, or arrange for the resulting errors to be ignored.
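A sketch in a scratch directory (mktemp -d is not POSIX but widely available; the filenames are invented):

```shell
dir=$(mktemp -d)
cd "$dir"
touch .config ..data visible
# Each glob expands only to the dot files, never to . or .. or "visible".
dots=$(set -- .[!.]* ..?* ; printf '%s\n' "$@")
```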

Check whether the directory is empty

is_empty () (
cd "$1" || return 1
set -- .[!.]* ; test -e "$1" && return 1
set -- ..?* ; test -e "$1" && return 1
set -- * ; test -e "$1" && return 1
return 0
)

This code uses the three globs to match everything except "." and "..", and uses the test command to distinguish an actual match from the literal glob pattern that remains when nothing matched.

If you do not care about preserving the directory's permissions, there is a simpler implementation:

is_empty () { rmdir "$1" && mkdir "$1" ; }

Both methods are subject to race conditions if other users have write permission on the directory, or if other processes may be modifying it. The latter method can at least be given an appropriately restrictive umask, so that the recreated directory has predictable, if not the original, permissions:

is_empty_2 () ( umask 077 ; rmdir "$1" && mkdir "$1" )

Query the home Directory of a specific user

This does not work:

foo=~$user

Try this:

eval "foo=~$user"

Make sure the contents of the user variable are safe, or something bad will happen. Wrapping the expansion in a function is a good idea:

her_homedir () { eval "$1=~$2" ; }
her_homedir foo alice

Afterwards, the variable foo contains the tilde expansion of ~alice.
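A self-contained sketch: with an empty user name the expansion is a plain ~, i.e. the current user's home directory (assuming HOME is set in the environment):

```shell
her_homedir () { eval "$1=~$2" ; }
her_homedir mine ""   # expands plain ~ into $HOME
```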

Recursive directory processing without using find

Because using find robustly is difficult, and in some restricted environments impossible, why not write the recursion as a script instead? Unfortunately, I have not found a way to avoid nesting one subshell per level of directory tree, but here is an approach that uses subshells:

myfind () (
cd -P -- "$1"
[ $# -lt 3 ] || [ "$PWD" = "$3" ] || exit 1
for i in ..?* .[!.]* * ; do
[ -e "$i" ] && eval "$2 \"\$i\""
[ -d "$i" ] && myfind "$i" "$2" "${PWD%/}/$i"
done
)

The usage is as follows:

handler () { case "$1" in *~) [ -f "$1" ] && rm -f "$1" ;; esac ; }
myfind /tmp handler   # Remove all backup files found in /tmp

This recursively visits each file under "$1", invoking the function or command "$2" with the current working directory set to the file's directory and the file's name appended to the command line. The third positional parameter "$3" is used internally by the recursion to prevent traversal of symbolic links: it holds the expected physical path, which is compared with PWD after the cd -P -- "$1", ensuring that "$1" was not a symbolic link.

Seconds since the epoch

Unfortunately, GNU date's %s format is not portable, so instead of:

secs=`date +%s`

Try this:

secs=$(( `TZ=GMT0 date +"((%Y-1600)*365+(%Y-1601)/4-(%Y-1601)/100+(%Y-1601)/400+1%j-1000-135140)*86400+(1%H-100)*3600+(1%M-100)*60+(1%S-100)"` ))

The number 135140 is the number of days between 1600-01-01 and 1970-01-01 in the Gregorian calendar. 1600 rather than 2000 is used as the base year because division in C-like languages (including shell arithmetic) does not behave intuitively with negative numbers. The 1%j-1000 and (1%H-100) forms protect zero-padded fields such as "08" from being misinterpreted as (invalid) octal constants by the shell's arithmetic expansion.

Postscript

I expect to add more material here in the future. I hope these tips help everyone write correct and robust POSIX shell scripts, despite the many pitfalls that make writing them awkward and inefficient. And if some of the hacks above inspire people to use a real language instead of sh or Bash, or to fix the pitfalls of the shell language itself, I will be very happy.
