Shell Scripting Learning Guide [IV] (Arnold Robbins & Nelson H. F. Beebe)

Source: Internet
Author: User

An aside: some time ago, while looking for a Chinese input method for Linux, I typed "fcitx" into Baidu, and one of the results read "Did you mean: satirize Tencent". I could never remember the name of this input method before, but after that I knew exactly how to spell it. Thanks, Baidu.

Chapter 9: awk's Amazing Performance

An awk invocation can define variables, supply the program, and name the input files. Syntax:

 code as follows:

awk [ -F fs ] [ -v var=value ... ] 'program' [ -- ] [ var=value ... ] [ file(s) ]
awk [ -F fs ] [ -v var=value ... ] -f programfile [ -- ] [ var=value ... ] [ file(s) ]

Short programs are usually given directly on the command line, while longer programs are kept in a file named with the -f option, where they can be reused. If no filename is given on the command line, awk reads standard input. The special option -- states that awk itself has no further command-line options; any later options are then available to your program.
The -F option redefines the default field separator, and by convention it is the first command-line option. The fs argument that follows -F is a regular expression, given either immediately after the option or as the next argument. The field separator can also be set through the built-in variable FS. For example:
awk -F '\t' '{ ... }' files FS='[\f\v]' files
Here the value set by -F applies to the first group of files, and the value assigned to FS applies to the second group. Initializations made with -v must come before any program given directly on the command line; they take effect before the program starts. After a command-line program, -v is interpreted as a filename. Assignments elsewhere on the command line are carried out when awk reaches that argument, and they may be interspersed with filenames. For example:
awk '{ ... }' Pass=1 *.tex Pass=2 *.tex
The list of files is processed twice, the first time with Pass set to 1 and the second time with Pass set to 2. A string value needs no quotation marks unless the shell requires them to protect special characters or whitespace.
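
The two-pass idiom above can be tried on a small scale. The following is a minimal sketch, assuming a throwaway two-line file created with mktemp; the file contents and the variable name passes are made up for illustration.

```shell
# Hypothetical sample file; any text file would do.
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

# Pass is assigned when awk reaches that argument, so the same file
# is read twice with a different value each time.
passes=$(awk '{ print "pass " Pass ": " $0 }' Pass=1 "$tmp" Pass=2 "$tmp")
printf '%s\n' "$passes"

rm -f "$tmp"
```

Each record is printed once per pass, tagged with the value Pass had when its file was opened.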

The special filename - (hyphen) represents standard input. Most modern awk implementations (though not POSIX) also recognize the special name /dev/stdin as standard input, even when the host operating system does not support that filename. Similarly, /dev/stderr and /dev/stdout can be used inside awk programs for standard error and standard output, respectively.

In an awk pattern/action pair, either the pattern or the action may be omitted. If the pattern is omitted, the action is applied to every input record; if the action is omitted, the default action is to print the records that match the pattern. Although a pattern is usually a numeric or string expression, awk provides two special patterns with the reserved words BEGIN and END.

The action associated with BEGIN is executed just once, before any command-line files or ordinary command-line assignments are processed, but after any leading -v assignments. It is mostly used for whatever special initialization the program needs. The END action is also executed just once, after all of the input data has been processed. BEGIN and END patterns may appear in any order, anywhere in the awk program. When several BEGIN or END patterns are given, they are executed in the order in which they appear in the program.
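
The ordering rules can be checked with a small experiment. This is only a sketch: the two-record input and the variable name order are invented for the demonstration.

```shell
# BEGIN and END blocks are deliberately scattered through the program;
# awk still runs all BEGIN actions first and all END actions last,
# each group in the order it appears in the program text.
order=$(printf 'x\ny\n' | awk '
    END   { print "end 1, NR=" NR }
    BEGIN { print "begin 1" }
          { print "record " $0 }
    BEGIN { print "begin 2" }
    END   { print "end 2" }
')
printf '%s\n' "$order"
```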

awk provides two kinds of variables, scalars and arrays, to hold data; numeric and string expressions; and a handful of statement types to process data: assignment, comment, conditional, function, input, loop, and output. Many features of awk expressions are similar to those of the C language. A comment in awk runs from # to the end of the line. A statement that continues onto the next line needs a backslash at the end of the line.

String constants in awk are delimited by quotation marks. A string can contain any 8-bit character except the control character NUL, because NUL terminates a string in the underlying implementation language (C). The length of an awk string is limited only by available memory. Backslash escape sequences allow nonprinting characters to be represented.

awk provides a number of built-in functions that operate on strings; they are covered in detail later. For now, note that length(string) returns the number of characters in a string. Strings are compared with the usual relational operators: ==, !=, <, <=, >, and >=. When two strings of different lengths are compared and one is an initial substring of the other, the shorter one is defined to be less than the longer one. As in the shell, strings are concatenated simply by writing them next to each other; no concatenation operator is needed.

Much of awk's power comes from its support for regular expressions. Two operators, ~ (matches) and !~ (does not match), make regular expressions easy to use: "ABC" ~ "^[A-Z]+$" is true. Regular-expression constants can be delimited by quotes or by slashes: /^[A-Z]+$/. Note that a literal quote or slash inside the constant must be escaped with a backslash.

Numbers in awk are represented as double-precision floating-point values, so 1/32 can be written as 0.03125, 3.125e-2, and so on. awk provides no function for converting a string to a number, but the conversion is simple: add zero to the string. For example, after s = "123" and n = 0 + s, the number 123 is assigned to n. In general, "+123ABC" converts to 123, while "ABC123" and "" convert to 0. Even though all numeric operations in awk are done in floating-point arithmetic, integer values can be represented exactly as long as they are not too large: the limit is 53 bits, that is, 2^53, or about 9 quadrillion. awk's numeric operators include no bitwise operators. The exponentiation operator (^ and ^=; some implementations also accept ** and **=, but avoid them because they are not part of POSIX awk) is right-associative, as are the assignment operators. For example, a^b^c^d is evaluated as a^(b^(c^d)). Testing the remainder operator in awk gives: 5 % 3 is 2; 5 % -3 is 2; -5 % 3 is -2; -5 % -3 is -2. In other words, the sign of the result follows the sign of the dividend. There are also built-in functions:
int(x) truncates x to an integer
rand() returns a random number r with 0 <= r < 1
srand(x) sets x as the new seed for rand()
cos(x) gives the cosine of x
sin(x) gives the sine of x
atan2(y, x) gives the arctangent of y/x
exp(x) gives e raised to the power x
log(x) gives the natural logarithm of x (base e)
sqrt(x) gives the positive square root of x
exit x (a statement rather than a function) ends the awk program, returning x as the exit status if it is given, otherwise 0
index(s, t) returns the starting position of the first occurrence of t in s, or 0 if t is not a substring of s
length(x) returns the length of x (number of characters)
substr(s, x, y) returns the substring of s that is y characters long, starting at character x
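
The conversion and operator rules described above can be verified directly. The following is a minimal sketch; the shell variable name nums is arbitrary, and the values are the ones discussed in the text.

```shell
nums=$(awk 'BEGIN {
    s = "123"
    print 0 + s            # string to number by adding zero
    print 0 + "+123ABC"    # leading numeric prefix is used
    print 0 + "ABC123"     # no numeric prefix gives 0
    print 2 ^ 3 ^ 2        # right-associative: 2 ^ (3 ^ 2)
    print 5 % 3, -5 % 3    # sign of the result follows the dividend
    print int(-3.7), int(3.7)   # int() truncates toward zero
}')
printf '%s\n' "$nums"
```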

awk built-in string functions
gsub(r, s) replaces every match of r with s throughout $0
gsub(r, s, t) replaces every match of r with s throughout t
index(s, t) returns the first position of the string t in s
length(s) returns the length of s
match(s, r) tests whether s contains a substring that matches r
split(s, a, fs) splits s on fs into the array a
sprintf(fmt, expr) returns expr formatted according to fmt
sub(r, s) replaces the leftmost longest match of r in $0 with s
substr(s, p) returns the suffix of s that starts at position p
substr(s, p, n) returns the substring of s of length n that starts at position p

awk provides a number of built-in variables, all with uppercase names. The most commonly used are:
FILENAME name of the current input file
FNR record number in the current input file
FS field separator (a regular expression) (default: " ")
NF number of fields in the current record
NR number of records read so far in the whole job
OFS output field separator (default: " ")
ORS output record separator (default: "\n")
RS input record separator (a regular expression only in gawk and mawk) (default: "\n")

Tests allowed by awk:
x == y is x equal to y?
x != y is x not equal to y?
x > y is x greater than y?
x >= y is x greater than or equal to y?
x < y is x less than y?
x <= y is x less than or equal to y?
x ~ re does x match the regular expression re?
x !~ re does x not match the regular expression re?

awk operators
= += -= *= /= %=
|| && > >= < <= == != ~ !~
xy (string concatenation: "x" "y" becomes "xy")
+ - * / % ++ --

awk has no bitwise operators, but gawk provides equivalent functions as an extension:
and(v1, v2) returns the bitwise AND of v1 and v2.
compl(val) returns the bitwise complement of val.
lshift(val, count) returns the value of val shifted left by count bits.
or(v1, v2) returns the bitwise OR of v1 and v2.
rshift(val, count) returns the value of val shifted right by count bits.
xor(v1, v2) returns the bitwise XOR of v1 and v2.
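
Where these functions are not available, a bitwise AND can be emulated with ordinary arithmetic. The following is only a sketch for non-negative integers; the function name bit_and is hypothetical (chosen so it does not clash with gawk's built-in and), and it is not how any awk implements the operation internally.

```shell
bits=$(awk '
# bit_and is a hypothetical helper: it walks both numbers one binary
# digit at a time and adds the place value p when both digits are 1.
function bit_and(a, b,    r, p) {
    r = 0
    p = 1
    while (a > 0 && b > 0) {
        if (a % 2 == 1 && b % 2 == 1)
            r += p
        a = int(a / 2)
        b = int(b / 2)
        p *= 2
    }
    return r
}
BEGIN { print bit_and(12, 10), bit_and(255, 15) }
')
printf '%s\n' "$bits"
```

Because awk numbers are doubles, a helper like this is only exact for integers below 2^53.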

awk's array variables allow any numeric or string expression, enclosed in square brackets after the array name, to be used as an index. An array indexed by arbitrary values is called an associative array. awk implements arrays so that lookup, insertion, and deletion complete in roughly constant time, no matter how many items are stored (in other words, they are hash tables). delete array[index] removes one element from the array, and delete array removes the entire array. awk arrays can also be used like this:
print maildrop[53, "Oak Lane", "T4Q 7XV"]
print maildrop["53" SUBSEP "Oak Lane" SUBSEP "T4Q 7XV"]
print maildrop["53\034Oak Lane", "T4Q 7XV"]
print maildrop["53\034Oak Lane\034T4Q 7XV"]
All four statements print the same element. The built-in variable SUBSEP defaults to \034 and can be changed. If you change SUBSEP later, the indexes of data already stored become invalid, so SUBSEP should be set only once per program, in the BEGIN action.
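
The equivalence of the comma form and the explicit SUBSEP form can be demonstrated as follows. This is a minimal sketch; the stored value "mailbox A" and the variable names are invented for the example.

```shell
subs=$(awk 'BEGIN {
    maildrop[53, "Oak Lane", "T4Q 7XV"] = "mailbox A"

    # The comma-separated subscript is really one string joined by SUBSEP.
    key = "53" SUBSEP "Oak Lane" SUBSEP "T4Q 7XV"
    print maildrop[key]

    # Membership test with a multi-part subscript.
    if ((53, "Oak Lane", "T4Q 7XV") in maildrop)
        print "found"

    # The parts can be recovered by splitting on SUBSEP.
    n = split(key, part, SUBSEP)
    print n, part[2]
}')
printf '%s\n' "$subs"
```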

awk's automatic handling of the command line means that most awk programs hardly need to concern themselves with it. awk makes the command-line arguments available through the built-in variables ARGC (argument count) and ARGV (argument vector, that is, the argument values). An example shows their use:

 code as follows:

$ cat showargs.awk
BEGIN {
    print "ARGC =", ARGC
    for (k = 0; k < ARGC; k++)
        print "ARGV[" k "] = [" ARGV[k] "]"
}

$ awk -v one=1 -v two=2 -f showargs.awk three=3 file1 four=4 file2 file3
ARGC = 6
ARGV[0] = [awk]
ARGV[1] = [three=3]
ARGV[2] = [file1]
ARGV[3] = [four=4]
ARGV[4] = [file2]
ARGV[5] = [file3]

As in C and C++, the arguments are stored in array elements 0, 1, ..., ARGC-1, and element 0 is the name of the awk program itself. However, arguments associated with the -f and -v options are not available. Similarly, a program given on the command line is not available:

 code as follows:

$ awk 'BEGIN { for (k = 0; k < ARGC; k++)
    print "ARGV[" k "] = [" ARGV[k] "]" }' a b c
ARGV[0] = [awk]
ARGV[1] = [a]
ARGV[2] = [b]
ARGV[3] = [c]

Whether the program name includes a directory path depends on the implementation. An awk program may modify ARGC and ARGV, provided it keeps the two consistent.
As soon as awk sees an argument containing the program text, or the special -- option, it stops interpreting arguments as options. Any later arguments that look like options must be handled by your program and then removed from ARGV or set to an empty string.
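
A program that handles its own option might look like the following. This is only a sketch: the --verbose option is hypothetical, and the example assumes an awk (such as gawk or mawk) that stops option processing at the program text.

```shell
verbose_out=$(printf 'data line\n' | awk '
BEGIN {
    for (k = 1; k < ARGC; k++)
        if (ARGV[k] == "--verbose") {   # hypothetical program option
            verbose = 1
            ARGV[k] = ""                # empty arguments are skipped as files
        }
}
{
    if (verbose)
        print "line " NR ": " $0
    else
        print
}' --verbose)
printf '%s\n' "$verbose_out"
```

With the option blanked out, no file arguments remain, so awk falls back to reading standard input.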

awk provides access to all environment variables through the built-in array ENVIRON:

 code as follows:

$ awk 'BEGIN { print ENVIRON["HOME"]; print ENVIRON["USER"] }'

There is nothing special about the ENVIRON array; you can modify or delete its elements at will. However, POSIX requires that subprocesses inherit the environment in effect when awk was started, and we also found that, in current implementations, changes to the ENVIRON array are not passed on to subprocesses or built-in functions. In particular, this means that you cannot control the locale-dependent behavior of a string function such as tolower() by changing ENVIRON["LC_ALL"]. You should therefore treat ENVIRON as a read-only array. If you want to control the locale of a subprocess, you can do so by setting the appropriate environment variable in the command string. For example:
system("env LC_ALL=es_ES sort infile > outfile")    # sort the file according to the Spanish locale
The system() function is described later.
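
Both directions, reading the inherited environment and setting a variable for a child command, can be sketched as follows. DEMO_LOCALE is a made-up variable name used only so that the example does not disturb real locale settings.

```shell
# Reading a variable that awk inherited from its parent shell.
from_env=$(DEMO_LOCALE=es_ES awk 'BEGIN { print ENVIRON["DEMO_LOCALE"] }')

# Setting a variable for a child process inside the command string,
# rather than by assigning to ENVIRON.
from_child=$(awk 'BEGIN { system("env DEMO_LOCALE=fr_FR env | grep DEMO_LOCALE") }')

printf '%s\n%s\n' "$from_env" "$from_child"
```
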
Patterns and actions form the core of awk programs. An action is carried out when its pattern is true. A typical pattern is a conditional expression, or a regular expression that is matched against the entire input record, such as:
NF == 0                 # select empty records
NF > 3                  # select records with more than three fields
NR < 5                  # select records 1 through 4
$1 ~ /jones/            # select records with "jones" in field 1
/[Xx][Mm][Ll]/          # select records containing "XML", ignoring case
awk also supports range expressions, two expressions separated by a comma, for matching. For example:
(FNR == 3), (FNR == 10)                         # select records 3 through 10 in each input file
/<[Hh][Tt][Mm][Ll]>/, /<\/[Hh][Tt][Mm][Ll]>/    # select the body of an HTML document
FILENAME, FNR, NF, and NR are not yet defined in the BEGIN action; references to them there return a null value.
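
A range pattern and a simple conditional pattern can be tried on generated input. This is a minimal sketch; the input lines and the variable names range and blank are made up.

```shell
# Range pattern: start at record 2 and stop after record 4.
range=$(printf '1\n2\n3\n4\n5\n' | awk '(NR == 2), (NR == 4)')

# Conditional pattern: count the empty records.
blank=$(printf 'a\n\nb\n\n' | awk 'NF == 0 { n++ } END { print n }')

printf '%s\n' "$range" "$blank"
```
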
When a pattern matches, the record that tested true is passed to the action. Some examples:
# The UNIX word counter, wc:
awk '{ C += length($0) + 1; W += NF } END { print NR, W, C }'
Note: pattern/action pairs need not be separated by newlines, which are generally used only for readability. We could also initialize with BEGIN { C = W = 0 }, but awk guarantees default initialization.
# Print the original data values and their logarithms, for one-column data files:
awk '{ print $1, log($1) }' file(s)
# Print a random sample of about 5 percent of the lines of a text file:
awk 'rand() < 0.05' file(s)
# Report the sum of column n:
awk -v COLUMN=n '{ sum += $COLUMN } END { print sum }' file(s)
# Report the average of column n:
awk -v COLUMN=n '{ sum += $COLUMN } END { print sum / NR }' file(s)
# Print the running total of the last field of each line:
awk '{ sum += $NF; print $0, sum }' file(s)
# Three ways to search for text in files:
egrep 'pattern|pattern' file(s)
awk '/pattern|pattern/' file(s)
awk '/pattern|pattern/ { print FILENAME ":" FNR ":" $0 }' file(s)
# Search only lines 100 through 150 for matches:
sed -n -e 100,150p -s file(s) | egrep 'pattern'
awk '(100 <= FNR) && (FNR <= 150) && /pattern/ { print FILENAME ":" FNR ":" $0 }' file(s)
# Swap the second and third columns of a four-column table, assuming tab separators:
awk -F'\t' -v OFS='\t' '{ print $1, $3, $2, $4 }' old > new
awk 'BEGIN { FS = OFS = "\t" } { print $1, $3, $2, $4 }' old > new
awk -F'\t' '{ print $1 "\t" $3 "\t" $2 "\t" $4 }' old > new
# Convert tab column separators to ampersands:
sed -e 's/\t/\&/g' file(s)
awk 'BEGIN { FS = "\t"; OFS = "&" } { $1 = $1; print }' file(s)
# Remove duplicate lines from sorted input:
sort file(s) | uniq
sort file(s) | awk 'Last != $0 { print } { Last = $0 }'
# Convert carriage-return/newline line endings to plain newline line endings:
sed -e 's/\r$//' file(s)
sed -e 's/^M$//' file(s)
mawk 'BEGIN { RS = "\r\n" } { print }' file(s)
# Find lines longer than 72 characters:
egrep -n '^.{73,}' file(s)
awk 'length($0) > 72 { print FILENAME ":" FNR ":" $0 }' file(s)

awk executes statements sequentially. It supports conditional statements, with if and else as in C, and loops: while () {}, do {} while (), and for (;;) {} all work as in C. There is also for (key in array) {}.
For example: awk 'BEGIN { for (x = 0; x <= 1; x += 0.05) print x }'. Although much of this resembles C, note that awk lacks the comma operator. Loops can also use break and continue.

awk handles the input files named on the command line directly, normally without the user having to open and process the files, but you can also do this yourself with awk's getline statement. Usage:
getline               reads the next record from the current input file into $0 and updates NF, NR, and FNR
getline var           reads the next record from the current input file into var and updates NR and FNR
getline < file        reads the next record from file into $0 and updates NF
getline var < file    reads the next record from file into var
cmd | getline         reads the next record from the external command cmd into $0 and updates NF
cmd | getline var     reads the next record from the external command cmd into var
To make sure that input comes from the controlling terminal: getline var < "/dev/tty"
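
The cmd | getline var form is typically used in a loop like the one below. This is a minimal sketch; the command string is an arbitrary stand-in for any external command that prints lines.

```shell
lines=$(awk 'BEGIN {
    cmd = "echo one; echo two"      # stand-in for a real external command

    # getline returns 1 for a record, 0 at end of input, -1 on error;
    # the parentheses keep the comparison from binding to the variable.
    while ((cmd | getline line) > 0)
        print ++n ": " line

    close(cmd)                      # release the pipe when done
}')
printf '%s\n' "$lines"
```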

In awk, you can combine pipes with external shell commands:

 code as follows:

tmpfile = "/tmp/telephone.tmp"
command = "sort > " tmpfile
for (name in telephone)
    print name "\t" telephone[name] | command
close(command)
while ((getline < tmpfile) > 0)
    print
close(tmpfile)

close() closes an open file or pipe so that its resources are freed. awk has no sort function of its own, because it would only duplicate the powerful sort command.

getline and output redirection to pipes let awk communicate with external programs. The system(command) function provides a third way: its return value is the exit status of the command. The example above can therefore be written as:

 code as follows:

tmpfile = "/tmp/telephone.tmp"
for (name in telephone)
    print name "\t" telephone[name] > tmpfile
close(tmpfile)
system("sort < " tmpfile)

You do not need to call close() for commands run by system(), because close() is only for files or pipes opened with the I/O redirection operators and getline, print, or printf. A few other examples:
system("rm -f " tmpfile)
system("cat <...
Because each call to system() starts a new shell, there is no simple way to pass data between commands in separate system() calls, except through intermediate files.

So far, awk is sufficient to write any data-processing program. Large programs, however, become hard to maintain and read, so awk provides functions; as in C, an awk function can optionally return a scalar value. Functions may be defined anywhere at the top level of the program: before, between, or after pattern/action pairs. In a single-file program, the convention is to put all functions after the pattern/action code and to keep them in alphabetical order, which makes them easy to find. A definition looks like this:
function name(arg1, arg2, ...) { statement(s); return expression }
A local variable hides a global variable of the same name.
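
In awk, local variables are declared as extra parameters that callers do not supply. The following is a minimal sketch; the factorial function and the variable names are illustrative only.

```shell
fact=$(awk '
# k and result are locals: extra parameters, by convention set off
# with additional spaces, that the caller never passes.
function factorial(n,    k, result) {
    result = 1
    for (k = 2; k <= n; k++)
        result *= k
    return result
}
BEGIN {
    k = "global"
    print factorial(5)
    print k          # the global k is untouched by the local k
}')
printf '%s\n' "$fact"
```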

Other built-in functions in awk:
Substring extraction: substr(string, start, len); positions are numbered from 1.

Lettercase conversion: tolower(string) and toupper(string). Uncommon and accented letters may not be handled.

Character search: index(string, find) returns the starting position of find in string, or 0 if it is not found.

String matching: match(string, regexp) returns the index in string of the match and updates the global variables RSTART and RLENGTH. The matched text can then be obtained with substr(string, RSTART, RLENGTH).

String substitution: sub(regexp, replacement, target) and gsub(regexp, replacement, target). The former matches target against the regular expression and replaces the leftmost longest match with the string replacement.
gsub() works similarly, but it replaces all matches. Both functions return the number of substitutions. If the third argument is omitted, it defaults to the current record, $0. In both functions, the character & in replacement stands for the text in target that matched regexp. Use \& to turn this feature off, and remember to double the backslash when writing it in a quoted string. For example, gsub(/[aeiouyAEIOUY]/, "&&") doubles every vowel in the current record $0, while gsub(/[aeiouyAEIOUY]/, "\\&\\&") replaces every vowel with a pair of ampersands.
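
The behavior of & in the replacement text can be seen in a short test. This is a minimal sketch using explicit target variables instead of $0; the sample words are arbitrary.

```shell
subst=$(awk 'BEGIN {
    s = "hello"
    n = gsub(/[aeiouyAEIOUY]/, "&&", s)       # & stands for the matched text
    print n, s

    t = "hello"
    gsub(/[aeiouyAEIOUY]/, "\\&\\&", t)       # \& is a literal ampersand
    print t

    u = "banana"
    sub(/an/, "AN", u)                        # sub() changes only the first match
    print u
}')
printf '%s\n' "$subst"
```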

String splitting: awk automatically splits $0 into $1 ... $NF, and the same convenience is available as a function: split(string, array, regexp) breaks string into pieces and stores them in array. If regexp is omitted, the default built-in field separator FS is used. The function returns the number of elements in array. When supplying a separator, note the difference between the default field separator " " and "[ ]": the former ignores leading and trailing whitespace and treats each run of whitespace as a single separator, while the latter matches exactly one space. For most text processing, the first form does what is needed.
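
The difference between the two separators shows up as soon as the input has repeated spaces. The following is a minimal sketch with a made-up input string.

```shell
counts=$(awk 'BEGIN {
    s = "  a  b"                 # two leading spaces, two spaces in the middle
    n1 = split(s, x, " ")        # default-style splitting: runs of whitespace
    n2 = split(s, y, "[ ]")      # every single space is a separator
    print n1, n2
}')
printf '%s\n' "$counts"
```

With "[ ]", the empty strings before and between the spaces count as array elements, which is why the second count is larger.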

String formatting: sprintf(format, expression1, expression2, ...) returns the formatted string as its function value. printf() works the same way, except that it writes the formatted string to standard output or a redirected file rather than returning it. These functions are similar to the shell's printf command, but there are some differences, so take care when using them.

Numeric functions:
atan2(y, x) returns the arctangent of y/x
exp(x) returns the exponential of x, e^x
int(x), log(x), cos(x), sin(x), sqrt(x)
rand() returns a pseudorandom number r with 0 <= r < 1
srand(x) sets the seed of the pseudorandom-number generator to x and returns the previous seed. If x is omitted, the current time (in seconds) is used. If srand() is never called, each run of awk starts with the same default seed.
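
Seeding explicitly makes a run reproducible, which is handy for testing. This is a minimal sketch; the seed value 42 is arbitrary, and the actual numbers printed differ between awk implementations.

```shell
# Two separate runs with the same explicit seed produce the same sequence.
run1=$(awk 'BEGIN { srand(42); print rand(), rand() }')
run2=$(awk 'BEGIN { srand(42); print rand(), rand() }')

if [ "$run1" = "$run2" ]; then
    same=yes
else
    same=no
fi
printf 'same sequence: %s\n' "$same"
```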

awk built-in variables (predefined variables)
Note: the V column shows the first tool to support the variable: A = awk, N = nawk, P = POSIX awk, G = gawk

V  Variable     Meaning                                               Default
N  ARGC         number of command-line arguments
G  ARGIND       index in ARGV of the file currently being processed
N  ARGV         array of command-line arguments
P  CONVFMT      number-to-string conversion format                    %.6g
P  ENVIRON      UNIX environment variables
G  ERRNO        UNIX system error message
G  FIELDWIDTHS  whitespace-separated list of input field widths
A  FILENAME     name of the current input file
N  FNR          record number in the current file
A  FS           input field separator                                 space
G  IGNORECASE   controls case sensitivity                             0 (case-sensitive)
A  NF           number of fields in the current record
A  NR           number of records read so far
A  OFMT         output format for numbers                             %.6g
A  OFS          output field separator                                space
A  ORS          output record separator                               newline
A  RS           input record separator                                newline
N  RSTART       index of the first character matched by match()
N  RLENGTH      length of the string matched by match()
N  SUBSEP       subscript separator                                   "\034"

That is basically all of the awk material. awk is very powerful; I searched online for other awk tutorials and found none as thorough as this book.
Few examples are given above, but there are plenty of others to consult.

Chapter 10: Working with Files

The chapter starts with the ls command, which should be very familiar, so here is just a list of the main options:
-1 the digit one: force single-column output; the default is to fit the output to the window width
-a show all files
-d show information about directories themselves, not the files they contain
-F mark certain file types with a special trailing character. I tried it: directories get a slash and executables get a *. I did not try the others.
-g group only: omit the owner name
-i list inode numbers
-L follow symbolic links, listing the files they point to
-l lowercase L: show details in the long format
-r reverse the default sort order
-R list recursively, descending into each directory
-S sort by file size in descending order; supported only by the GNU version
-s list the size of the file in (system-dependent) blocks
-t sort by last modification time
--full-time show the complete timestamp

Explanation of the long listing:
drwxrwxr-x 2 administrator administrator 1024 Jan 5 10:43 bin
The first character is - for an ordinary file, d for a directory, and l for a symbolic link.
The next nine characters, in three groups of three, report the permissions: r for readable, w for writable, and x for executable. The first three are the owner's permissions, the middle three are the permissions of the file's group, and the last three are everyone else's permissions.
The second column is the link count. The third and fourth columns are the owner and the group. The fifth column is the size in bytes. Last come the timestamp and the file (or directory) name.

The book introduces the od command as a way to show what filenames really contain: ls | od -a -b. I tried it and did not fully understand the output. It appears to use NL (octal 012) as the separator and to list the filenames character by character. If a filename contains Chinese characters, they show up as assorted symbols. I found it all rather baffling.

The book devotes a section to updating modification times with touch, noting that sometimes the timestamp is meaningful while the contents are not. A common example is a lock file that indicates a program is already running, so that a second instance should not be started. Another use is to record a file's timestamp for later comparison with other files. By default (or with -m), touch changes the file's last modification time; with the -a option it changes the last access time instead. You can also set a specific time with the -t option, followed by an argument of the form [[CC]YY]MMDDhhmm[.SS], in which the century, year, and seconds are optional. For example:
$ touch -t 201201010000.00 date    # create a file with the given timestamp
touch also provides the -r option to copy the timestamp of a reference file.

As for dates, the Unix timestamp counts from zero at 1970-01-01 00:00:00 UTC.

Temporary files and /tmp are then introduced in a section. Programs generally have to deal with the temporary files they create themselves. In a shared directory, or with several instances of the same program running, temporary filenames may collide; the usual solution is to use the process ID, which is available in the shell variable $$:

 code as follows:

umask 077                             # remove access for everyone but the owner
TMPFILE=${TMPDIR-/tmp}/myprog.$$      # generate a temporary filename
trap 'rm -f $TMPFILE' EXIT            # remove the temporary file on completion

But a name like /tmp/myprog.$$ ...

 code as follows:

$ cat $HOME/html2xhtml.sed
cd top-level-web-site-directory
find . \( -name '*.html' -o -name '*.htm' \) -type f |
    while read file
    do
        echo "$file"
        mv "$file" "$file.save"
        sed -f $HOME/html2xhtml.sed < "$file.save" > "$file"
    done

The book has a section on finding problem files, meaning files with special characters in their names, which can be listed unambiguously with find -print0, but I did not really understand what this is useful for.

The book then introduces the xargs command to deal with overly long argument lists. We sometimes write commands like this one to search for a string:
$ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)
This looks for the string POSIX_OPEN_MAX in a large pile of files. If find returns only a few files, all is well and the command runs smoothly, but if the list is too long you get an error such as: ****: Argument list too long. You can use getconf ARG_MAX to see the maximum that your system allows. The command above includes the empty file /dev/null among the file arguments. This keeps grep from waiting for standard input when find produces no files, and it guarantees that grep sees more than one file argument, so that its output includes the filename with each matching line.
The argument-list-too-long problem can be solved with the xargs command, for example:
$ find /usr/include -type f | xargs grep POSIX_OPEN_MAX /dev/null
Here, according to the book, xargs terminates quietly if it receives no input filenames (GNU xargs still runs the command once unless it is given -r). GNU xargs supports the --null option, which handles the NUL-terminated list of filenames produced by GNU find's -print0 option. xargs passes each such filename as a complete argument to the command it runs, with no danger of (mis)interpretation by the shell or confusion over newlines; it is then up to that command to process its arguments. Other xargs options control which arguments are replaced and limit the number of arguments passed.
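
The NUL-terminated pairing of find and xargs can be sketched as follows. The example assumes a find that supports -print0 and an xargs that supports -0 (the short form of GNU's --null); the directory, filenames, and search word are made up.

```shell
# Scratch directory with one awkward filename containing spaces.
dir=$(mktemp -d)
printf 'needle\n' > "$dir/plain.txt"
printf 'needle\n' > "$dir/name with spaces.txt"

# Each NUL-terminated name reaches grep as exactly one argument.
matches=$(find "$dir" -type f -print0 | xargs -0 grep -l needle /dev/null | wc -l)
echo "files matched: $matches"

rm -rf "$dir"
```

Without -print0 and -0, the second filename would be split at its spaces into three nonexistent names.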

If you want to know how much space files occupy in a file system, you can use the find and ls commands with a short awk program, for example:
$ find -ls | awk '{ sum += $7 } END { printf("Total: %.0f bytes\n", sum) }'
But this is not convenient: the command is long, and it still does not tell you the free space. Two handy commands address this need: df and du.

df (disk free) gives a one-line summary for each mounted file system, showing used and available space. The display units depend on the version; the -k option forces kilobyte units. The -l option shows only local file systems, excluding network-mounted ones. The -i option reports inode usage. GNU df also provides the -h (human-readable) option for easier reading. You can also supply one or more file system names or mount points to limit the output: $ df -lk /dev/sda6 /var.

df summarizes the free space of a file system, but it does not tell you how much space a particular directory tree needs; that is the job of du (disk usage). Options vary between systems: -k controls the units, and -s displays a summary.
The GNU version provides -h, as df does. One common problem that du can solve is finding out which users are using the most space: $ du -s -k /home/users/* | sort -k1nr | less
This assumes that user directories are all located under /home/users.

Two commands compare files: cmp and diff. cmp takes two file arguments; if the files differ, the output reports the position of the first difference, and if they are the same there is no output. The -s option suppresses the output, and the exit status, available in $?, is nonzero when the files differ. By convention, diff takes the old file as its first argument. Differing lines prefixed with a left angle bracket belong to the left (first) file, and lines prefixed with a right angle bracket belong to the right (second) file. An extension, diff3, compares three files.
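
In scripts, cmp -s is normally used for its exit status alone. The following is a minimal sketch with three small made-up files.

```shell
dir=$(mktemp -d)
printf 'same\n'  > "$dir/a"
printf 'same\n'  > "$dir/b"
printf 'other\n' > "$dir/c"

# cmp -s prints nothing; an exit status of 0 means the files are identical.
if cmp -s "$dir/a" "$dir/b"; then ab=identical; else ab=different; fi
if cmp -s "$dir/a" "$dir/c"; then ac=identical; else ac=different; fi
echo "a vs b: $ab, a vs c: $ac"

rm -rf "$dir"
```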

Sometimes the differences need to be applied to bring a file up to date, and the patch command offers a very convenient way to do so:

 code as follows:

$ echo Test 1 > test.1
$ echo Test 2 > test.2
$ diff -c test.[12] > test.dif
$ patch < test.dif

If you now look at test.1, you will find that its contents have changed to Test 2. patch applies as many of the differences as it can and then reports the parts that failed, which are left for the user to handle. Although patch can handle ordinary diff output, it is usually given the output of diff -c.

If you suspect that many files have the same contents, comparing them all with cmp or diff is tedious. In that case you can use file checksums to get the job done in close to linear time. Many tools are available, such as sum, cksum, and checksum, the message-digest tools md5 and md5sum, and the secure-hash algorithm tools sha, sha1sum, sha256, and sha384. Unfortunately, implementations of sum differ between platforms, so their output cannot be used to compare file checksums across different UNIX versions. Typical usage looks like this:
$ md5sum /bin/l?
The output has 32 hexadecimal digits, equal to 128 bits, so the chance that two different files end up with the same signature is very low. Knowing this, you can write a simple script to achieve the goal stated above.

 code as follows:

#! /bin/sh-
# according to their MD5 checksum, the file name that shows some degree of content opportunity has been
# show-indentical-files Files

Ifs= '
Export PATH

Md5sum "$@"/dev/null 2>/dev/null |
awk ' {
if (count[$1] ==1) first[$1]=$0
if (count[$1] ==2) print first[$1]
if (count[$1] >1) print $
}' |
Sort | awk ' {
if (last!= $) print ""
Last = $

The program is simple enough that it needs no further comment. You can test it:
$ show-identical-files /bin/*
It turns out that quite a few commands that appear to be distinct actually have identical contents.
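
The two properties the script relies on (an MD5 checksum has 32 hexadecimal digits, and files with identical contents produce identical checksums) can be checked with a small sketch. The helper name checksum_of and the file names are assumptions made for illustration, and md5sum and mktemp -d are assumed to be installed.

```shell
# Hypothetical helper: print only the checksum field of md5sum's output.
checksum_of() {
    md5sum "$1" | awk '{ print $1 }'
}

sumdir=$(mktemp -d)
printf 'same contents\n'  > "$sumdir/one"
printf 'same contents\n'  > "$sumdir/two"
printf 'other contents\n' > "$sumdir/three"

sum_one=$(checksum_of "$sumdir/one")
sum_two=$(checksum_of "$sumdir/two")
sum_three=$(checksum_of "$sumdir/three")

# An MD5 checksum is 32 hexadecimal digits (128 bits).
echo "digits in checksum: ${#sum_one}"

[ "$sum_one" = "$sum_two" ] && echo "one and two: same checksum"
[ "$sum_one" != "$sum_three" ] && echo "one and three: different checksums"
```

Each file is read only once to compute its checksum, which is why this approach scales roughly linearly with the number of files instead of requiring a comparison of every pair.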

Checksums are also useful for digital signature verification.
When software is released, the distribution typically includes a checksum of the distribution file, which makes it easy to tell whether the downloaded file matches the original. However, a checksum alone does not provide verification: if the checksum is recorded in another file that you download along with the software, an attacker can maliciously modify the software and then simply modify the checksum to match.
The solution to this problem is public-key cryptography. Under this mechanism, the security of data comes from the existence of two related keys: a private key, known only to its owner, and a public key, which may be known to anyone. Either key can be used for encryption, and the other is then used for decryption. The security of public-key cryptography rests on the belief that knowledge of the public key, and of text that can be decrypted with that key, provides no practical information that can be used to recover the private key. The great breakthrough of this invention is that it solves the most serious problem in cryptography: how to exchange encryption keys securely between the parties that need to communicate.
How do the private key and the public key work in practice? Suppose Alice wants to sign a public file. She can use her private key to encrypt the file. Bob then decrypts the signed file with Alice's public key, which assures him that the file was signed by Alice, and Alice does not have to disclose her private key to make the file trusted.
If Alice wants to send Bob a letter that only Bob can read, she should encrypt the letter with Bob's public key, and Bob then uses his private key to decrypt it. As long as Bob keeps his private key safe, Alice can be sure that only Bob can read her letter.
Encrypting the entire message is not necessary: if only the file's checksum is encrypted, the result is a digital signature. This approach is useful when the message itself is public but there needs to be a way to verify its authenticity. Explaining public-key cryptography fully would take an entire book; you can refer to "Security and cryptography."
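
The weakness of a bare checksum described above can be demonstrated with a short sketch. Everything here is illustrative: package.tar and package.tar.sha256 are hypothetical file names, verify_checksum is a hypothetical helper, and sha256sum and mktemp -d are assumed to be installed.

```shell
demo=$(mktemp -d)
printf 'pretend this is a release archive\n' > "$demo/package.tar"

# Publisher side: record the checksum next to the file.
(cd "$demo" && sha256sum package.tar > package.tar.sha256)

# Hypothetical helper: succeed only if the file matches its recorded checksum.
verify_checksum() {
    (cd "$demo" && sha256sum -c package.tar.sha256 > /dev/null 2>&1)
}

verify_checksum && echo "untouched file: checksum matches"

# Tampering with the file alone is detected.
printf 'malicious change\n' >> "$demo/package.tar"
verify_checksum || echo "modified file: checksum mismatch"

# An attacker who can also rewrite the checksum file goes unnoticed.
(cd "$demo" && sha256sum package.tar > package.tar.sha256)
verify_checksum && echo "modified file, regenerated checksum: matches again"
```

The last step is the reason a checksum needs to be signed: a signature over the checksum cannot be regenerated without the signer's private key.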

Computers are increasingly vulnerable to attack, and downloading files or software is not always safe. Software archives are often accompanied by a digital signature that incorporates the file's checksum, which can be verified if you are not sure that the downloaded item is safe. For example:
$ ls -l coreutils-5.0.tar*
-rw-rw-r-- 1 jones devel 6020616 Apr 2 2003 coreutils-5.0.tar.gz
-rw-rw-r-- 1 jones devel Apr 2 2003 coreutils-5.0.tar.gz.sig
$ gpg coreutils-5.0.tar.gz.sig    # Try to verify this signature
gpg: Signature made Wed Apr 2 14:26:58 2003 MST using DSA key ID D333CBA1
gpg: Can't check signature: public key not found
The verification failed because we have not yet added the signer's public key to the gpg key ring. We could find the public key on the signer's personal website or ask for it by email. Fortunately, people who use digital signatures usually register their public keys with a third-party public key server, and that registration is automatically shared with other key servers.
Store the key contents in a temporary file such as temp.key and add it to the key ring:
$ gpg --import temp.key
The signature can then be verified successfully.
