Original article: Python scripts as a replacement for Bash utility scripts
From: http://www.oschina.net/translate/python-scripts-replacement-bash-utility-scripts
Translators: enixyu, Showme, Onion noodles
Linux users hold the command line in high regard. On other operating systems the command line is a daunting prospect for all but the most experienced users, but in the Linux community its use is encouraged. The command line often provides a more elegant and efficient solution than a graphical user interface.
As the Linux community has grown, Unix shells such as Bash and zsh have become powerful tools and an important part of the Unix experience. Bash and similar shells offer useful features such as pipelines, file name wildcards, and the ability to read commands from a file, that is, a script.
Let's look at a practical example of the command line's power. Every time a user logs on to a service, their user name is recorded in a text file. Let's find out how many unique users have used the service.
The following series of commands shows what can be achieved by chaining small commands together:
$ cat names.log | sort | uniq | wc -l
The pipe symbol (|) sends the standard output of one command to the standard input of the next. In this example, the output of cat names.log is sent to the sort command, which sorts the lines alphabetically. The pipe then sends the sorted output to the uniq command, which removes adjacent duplicate lines; this is why the input is sorted first. Finally, the output of uniq is sent to the wc command. wc is a counting command, and with the -l option it returns the number of lines. Pipes let you chain a series of commands together.
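For comparison, the same count can be sketched in Python 3. This is only an illustration and not part of the original article: the function name count_unique is hypothetical and the sample names are made up. Unlike the shell pipeline, this sketch skips blank lines.

```python
def count_unique(lines):
    """Count distinct non-empty lines, roughly like `sort | uniq | wc -l`."""
    # A set keeps one copy of each name, so no sorting is needed.
    unique = {line.strip() for line in lines if line.strip()}
    return len(unique)


# Made-up sample standing in for the contents of names.log.
sample = ["alice\n", "bob\n", "alice\n", "carol\n", "bob\n"]
print(count_unique(sample))
```

In a real script, the list could be replaced by an open file or sys.stdin, since both yield one line at a time.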
However, when the requirements become complex, chained commands become cumbersome. Shell scripts solve this problem. A shell script is a series of commands that the shell program reads and executes in sequence. Shell scripts also support some programming language features, such as variables, control flow, and data structures. They are very useful for batch jobs that run repeatedly. However, shell scripts also have some weaknesses:
- Shell scripts easily grow into complex code that is difficult for developers to read and modify.
- Their syntax and interpretation are often not very flexible or intuitive.
- Their code is usually not reusable by other scripts. Code reuse among scripts is very low, and each script tends to solve one very specific problem.
- They generally lack library support for advanced features, such as HTML parsing or HTTP requests, because such libraries are usually available only for general-purpose programming languages.
These problems usually make scripts inflexible and cost developers a lot of time. Python is a good alternative. Using Python instead of shell scripts has many advantages:
- Python is installed by default on mainstream Linux distributions. Open a command line and type python to enter the world of Python immediately. This makes it a good choice for most scripting tasks.
- Python is easy to read and its syntax is easy to understand. Its style emphasizes simple, clean code while still allowing developers to write in a style suited to shell scripts.
- Python is an interpreted language, so no compilation step is required. This makes Python an ideal scripting language. Python also comes with a read-eval-print loop (REPL), which lets developers quickly try out new code in the interpreter and test ideas without rewriting an entire program.
- Python is a full-featured programming language. Code reuse is simple, because Python modules can easily be imported and used in scripts, and scripts can easily be extended.
- Python has an excellent standard library and a large number of third-party libraries for all kinds of tasks, such as HTML parsers and HTTP request libraries. For example, the standard library includes date and time libraries, which let you convert dates and times to whatever format you need and compare them with other dates.
- Python can be one part of the command chain; it does not have to replace Bash completely. Python programs can behave in the Unix style (read from standard input, write to standard output), so they can play the same role as shell commands such as cat and sort.
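As a rough illustration of the last point, here is a minimal Python 3 sketch of a sort-like filter. The name sort_filter is hypothetical, and the ordering is by character code rather than the locale rules the real sort command may apply, so treat it as a sketch of the read-from-input, write-to-output pattern rather than a replacement for sort.

```python
import io


def sort_filter(source, sink):
    """Read every line from source and write the lines to sink in sorted order."""
    for line in sorted(source):
        sink.write(line)


# In a real script the two streams would be sys.stdin and sys.stdout;
# in-memory streams are used here so the sketch runs on its own.
source = io.StringIO("carol\nalice\nbob\n")
sink = io.StringIO()
sort_filter(source, sink)
print(sink.getvalue(), end="")
```

Because the function only depends on file-like objects, the same code works in a pipeline and in a test.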
Let's revisit the problem above using Python. In addition to the work already done, let's find out how many times each user has logged on to the system. As used above, the uniq command simply removes duplicate records without reporting how many times each one was repeated. We replace the uniq command with a Python script that becomes part of the command chain (in this example, the script is named namescount.py):
#!/usr/bin/env python
import sys

if __name__ == "__main__":
    # Initialize an empty names dictionary.
    # It maps each name to the number of times it appears.
    names = {}
    # sys.stdin is a file object, so every method of a file object
    # can be applied to sys.stdin.
    for name in sys.stdin.readlines():
        # Each line ends with a newline character; strip it.
        name = name.strip()
        if name in names:
            names[name] += 1
        else:
            names[name] = 1

    # Iterate over the dictionary and write the count, a tab,
    # and then the name for each entry.
    for name, count in names.items():
        sys.stdout.write("%d\t%s\n" % (count, name))
Let's look at how the Python script fits into the command chain. It reads data from standard input through the sys.stdin object and writes all output to the sys.stdout object, which is Python's implementation of standard output. A Python dictionary (called a hash table in other languages) stores the mapping between each name and its number of occurrences. To get the logon count of every user, run the following command:
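The dictionary bookkeeping can also be written with collections.Counter from the standard library. The following Python 3 sketch is an alternative, not the article's script; the function name count_names and the sample data are made up for illustration.

```python
from collections import Counter


def count_names(lines):
    """Map each stripped line to the number of times it occurs."""
    counts = Counter()
    # Iterating over a stream directly reads one line at a time.
    for line in lines:
        counts[line.strip()] += 1
    return counts


counts = count_names(["alice\n", "bob\n", "alice\n"])
for name, count in counts.items():
    # Same "count, tab, name" layout as the script's output.
    print("%d\t%s" % (count, name))
```

A Counter returns 0 for names it has never seen, which removes the need for the if/else branch.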
$ cat names.log | python namescount.py
This prints the number of times each user appears followed by the user name, with a tab as the separator. The next step is to list the users in descending order of logon count. This could be done in Python, but let's use Unix commands instead. As mentioned above, the sort command sorts alphabetically. Given the -rn options, it sorts numerically in descending order. Because the Python script writes to standard output, we can pipe its output to the sort command:
$ cat names.log | python namescount.py | sort -rn
In this example, Python is used as one part of the command chain. The advantages of using Python this way are:
- It can be chained with commands such as cat and sort. Simple tasks (reading files, sorting numerically) are left to mature Unix commands. These commands read their input line by line, so they cope with very large files and are highly efficient.
- If one part of the command chain is difficult to implement clearly with shell commands, a Python script can take over that part, letting us do exactly what we want and lightening the load on the rest of the chain.
- The Python script is reusable. Although this example is about names, the script outputs each distinct line together with its number of occurrences, so it works for any input with repeated lines. Keeping the Python script modular lets you apply it elsewhere.
To demonstrate the power of combining modular Python scripts with pipelines, let's extend the problem. Let's find the five users who use the service most. The head command lets us specify the number of lines to output. Add it to the command chain:
$ cat names.log | python namescount.py | sort -rn | head -n 5
This command lists only the top five users. Similarly, to list the users who use the service least, you can use the tail command, which accepts the same option. Because the Python script writes its result to standard output, you can keep extending and building on it.
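If the ranking ever needs to happen inside Python instead of through sort, head, and tail, a Counter offers most_common. The Python 3 sketch below uses made-up counts; note that the order of names with equal counts may differ from what sort -rn produces.

```python
from collections import Counter

# Made-up logon counts for illustration.
counts = Counter({"alice": 7, "bob": 3, "carol": 5, "dave": 1})

# Like `sort -rn | head -n 2`: the two most frequent names.
top_two = counts.most_common(2)

# Like `sort -rn | tail -n 2`: the two least frequent names.
bottom_two = counts.most_common()[-2:]

print(top_two)
print(bottom_two)
```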
To demonstrate the modularity of the script, let's extend the problem again. The service also generates a comma-separated (CSV) log file containing email addresses and the comments those addresses left about our service. Here is an example:
"email@example.com","This service is great."
The task is to send a thank-you message to the ten users who comment on the service most. First, we need a script that reads CSV and outputs one field. Python provides a standard CSV reading module. The following Python script implements this:
#!/usr/bin/env python
# The csv module comes with the Python standard library.
import csv
import sys

if __name__ == "__main__":
    # csv.reader takes a file object as its input;
    # in this example, sys.stdin.
    csvfile = csv.reader(sys.stdin)

    # The script accepts an optional argument giving the column number
    # (the default is 0). sys.argv holds the command-line arguments.
    column_number = 0
    if len(sys.argv) > 1:
        column_number = int(sys.argv[1])

    # Each line of the CSV file uses a comma as the field separator.
    for row in csvfile:
        print(row[column_number])
This script parses the CSV input and prints the text of the field selected by the argument. It uses print instead of sys.stdout.write, because print writes to standard output by default.
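The column extraction can also be packaged as a reusable function. The Python 3 sketch below is a variation, not the article's script: extract_column is a hypothetical name, the second sample address is invented, and skipinitialspace=True is an added assumption that the log may contain a space after each comma.

```python
import csv
import io


def extract_column(stream, column_number=0):
    """Yield one field from every row of a CSV stream."""
    # skipinitialspace=True tolerates a space after each comma.
    for row in csv.reader(stream, skipinitialspace=True):
        # Skip rows that are too short instead of raising IndexError.
        if len(row) > column_number:
            yield row[column_number]


sample = io.StringIO(
    '"email@example.com", "This service is great."\n'
    '"other@example.com", "Works, mostly."\n'
)
emails = list(extract_column(sample))
print(emails)
```

The quoted comma in the second comment stays inside its field, which is the main reason to use the csv module rather than splitting on commas by hand.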
Let's add this step to the command chain. The new script is combined with the other commands to output the email addresses with the most comments. Assuming the CSV file is named emailcomments.csv and the new script is csvcolumn.py:
$ cat emailcomments.csv | python csvcolumn.py | python namescount.py | sort -rn | head -n 5
Next, we need a way to send emails. The Python standard library includes smtplib, a module for connecting to an SMTP server and sending email. Let's write a simple Python script that uses this module to send an email to each of the top ten users.
#!/usr/bin/env python
import smtplib
import sys

GMAIL_SMTP_SERVER = "smtp.gmail.com"
GMAIL_SMTP_PORT = 587

GMAIL_EMAIL = "Your Gmail Email Goes Here"
GMAIL_PASSWORD = "Your Gmail Password Goes Here"


def initialize_smtp_server():
    '''
    This function initializes and greets the smtp server.
    It logs in using the provided credentials and returns
    the smtp server object as a result.
    '''
    smtpserver = smtplib.SMTP(GMAIL_SMTP_SERVER, GMAIL_SMTP_PORT)
    smtpserver.ehlo()
    smtpserver.starttls()
    smtpserver.ehlo()
    smtpserver.login(GMAIL_EMAIL, GMAIL_PASSWORD)
    return smtpserver


def send_thank_you_mail(email):
    to_email = email
    from_email = GMAIL_EMAIL
    subj = "Thanks for being an active commenter"
    # The header consists of the To and From and Subject lines
    # separated using a newline character
    header = "To:%s\nFrom:%s\nSubject:%s \n" % (to_email, from_email, subj)
    # Hard-coded templates are not best practice.
    msg_body = """
    Hi %s,

    Thank you very much for your repeated comments on our service.
    The interaction is much appreciated.

    Thank You.""" % email
    content = header + "\n" + msg_body
    smtpserver = initialize_smtp_server()
    smtpserver.sendmail(from_email, to_email, content)
    smtpserver.close()


if __name__ == "__main__":
    # for every line of input.
    for email in sys.stdin.readlines():
        # Strip the trailing newline before using the address.
        send_thank_you_mail(email.strip())
This Python script can connect to any SMTP server, whether local or remote. For convenience, I use Gmail's SMTP server. You will need to provide your own address and password to connect to Gmail. The script uses functions from smtplib to send the email. This shows once again how powerful Python scripts are: an interactive exchange such as SMTP is easy to read when written in Python. The equivalent shell script would likely be complicated, and the shell has essentially no libraries like smtplib.
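The script assembles the headers by hand. As an alternative sketch in Python 3, the standard library's email.message.EmailMessage can build the same message; the function name build_thank_you_message and the sender address are assumptions for illustration, and nothing is sent here.

```python
from email.message import EmailMessage


def build_thank_you_message(from_email, to_email):
    """Build the thank-you message without sending it."""
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = to_email
    msg["Subject"] = "Thanks for being an active commenter"
    msg.set_content(
        "Hi %s,\n\n"
        "Thank you very much for your repeated comments on our service.\n"
        "The interaction is much appreciated.\n\n"
        "Thank You.\n" % to_email
    )
    return msg


# Placeholder addresses, not real accounts.
msg = build_thank_you_message("sender@example.com", "email@example.com")
print(msg["Subject"])
```

An smtplib.SMTP object can deliver such a message with its send_message method, which takes the sender and recipient from the headers.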
To send emails to the ten users who comment most often, you must first isolate the email column. To extract a column, you can use the cut command. In the following example, the work is split into two separate command chains. For convenience, the first chain writes its output to a temporary file, which the second chain then reads. This is only to make the process more readable (the email-sending Python script is sendemail.py):
$ cat emailcomments.csv | python csvcolumn.py | python namescount.py | sort -rn > /tmp/comment_freq
$ cat /tmp/comment_freq | head -n 10 | cut -f2 | python sendemail.py
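To make the head and cut steps concrete, here is a small Python 3 sketch that does the same selection on lines in the format of /tmp/comment_freq. The function name top_emails and the sample lines are made up; the shell version above remains the simpler choice in practice.

```python
from itertools import islice


def top_emails(freq_lines, limit=10):
    """Return the second tab-separated field of the first `limit` lines."""
    emails = []
    for line in islice(freq_lines, limit):
        # Each line looks like "<count>\t<email>", as written by namescount.py.
        count, email = line.rstrip("\n").split("\t", 1)
        emails.append(email)
    return emails


# Made-up lines, already sorted in descending order of count.
sample = ["12\ta@example.com\n", "9\tb@example.com\n", "4\tc@example.com\n"]
print(top_emails(sample, limit=2))
```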
This shows the real power of Python as a utility within a Bash command chain. Because the scripts accept data from standard input and write their output to standard output, developers can chain them together with fast, simple Unix commands and other Python programs. This philosophy of designing small programs that each serve one purpose fits the command-chain approach used here very well.
Scripts used on the command line usually accept arguments chosen by the user when the command is run. For example, the head command takes the -n flag followed by a number and then prints only that many lines. Every argument passed to a Python script is available through the sys.argv list, which can be accessed after import sys. The following code shows how to use simple positional arguments. The program is a simple calculator that takes two numeric arguments, adds them, and prints the result. However, this way of using command-line arguments is very basic and also error-prone. For example, if you pass two strings such as hello and world, the command fails immediately with an error:
#!/usr/bin/env python
import sys

if __name__ == "__main__":
    # The first argument of sys.argv is always the filename,
    # meaning that the length of system arguments will be
    # more than one, when command-line arguments exist.
    if len(sys.argv) > 2:
        num1 = int(sys.argv[1])
        num2 = int(sys.argv[2])
    else:
        print("This command takes two arguments and adds them")
        print("Less than two arguments given.")
        sys.exit(1)
    print("%s" % str(num1 + num2))
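The failure on non-numeric input can be handled explicitly. The Python 3 sketch below is one possible approach, not the article's code: the function add_arguments and the script name calc.py are hypothetical, and it returns an error message instead of exiting so that it is easy to test.

```python
def add_arguments(argv):
    """Add the numbers in argv[1] and argv[2], or explain what went wrong."""
    if len(argv) != 3:
        return None, "This command takes exactly two arguments and adds them."
    try:
        total = int(argv[1]) + int(argv[2])
    except ValueError:
        # int() raises ValueError for input such as "hello".
        return None, "Both arguments must be whole numbers."
    return total, None


print(add_arguments(["calc.py", "2", "3"]))
print(add_arguments(["calc.py", "hello", "world"]))
```

A real script would pass sys.argv to the function and call sys.exit(1) when an error message comes back.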
Fortunately, Python has several modules for processing command-line arguments. I personally prefer OptionParser, which is part of the optparse module in the standard library. OptionParser lets you do a number of very useful things with command-line arguments:
- It lets you specify a default value for an argument that is not provided.
- It supports both flags (present or absent) and arguments with values (-n 10000).
- It supports different formats for passing arguments, for example, --n=100000 and --n 100000.
We will use OptionParser to improve the email-sending script. The original script hard-codes many values, such as the SMTP details and the user's login credentials. The code below accepts these values as command-line arguments instead:
#!/usr/bin/env python
import smtplib
import sys

from optparse import OptionParser


def initialize_smtp_server(smtpserver, smtpport, email, pwd):
    '''
    This function initializes and greets the SMTP server.
    It logs in using the provided credentials and returns
    the SMTP server object as a result.
    '''
    smtpserver = smtplib.SMTP(smtpserver, smtpport)
    smtpserver.ehlo()
    smtpserver.starttls()
    smtpserver.ehlo()
    smtpserver.login(email, pwd)
    return smtpserver


def send_thank_you_mail(email, from_email, smtpserver):
    to_email = email
    subj = "Thanks for being an active commenter"
    # The header consists of the To and From and Subject lines
    # separated using a newline character.
    header = "To:%s\nFrom:%s\nSubject:%s \n" % (to_email, from_email, subj)
    # Hard-coded templates are not best practice.
    msg_body = """
    Hi %s,

    Thank you very much for your repeated comments on our service.
    The interaction is much appreciated.

    Thank You.""" % email
    content = header + "\n" + msg_body
    smtpserver.sendmail(from_email, to_email, content)


if __name__ == "__main__":
    usage = "usage: %prog [options]"
    parser = OptionParser(usage=usage)
    parser.add_option("--email", dest="email",
                      help="email to login to smtp server")
    parser.add_option("--pwd", dest="pwd",
                      help="password to login to smtp server")
    parser.add_option("--smtp-server", dest="smtpserver",
                      help="smtp server url", default="smtp.gmail.com")
    parser.add_option("--smtp-port", dest="smtpserverport", type="int",
                      help="smtp server port", default=587)
    options, args = parser.parse_args()

    # Both the email and the password are required.
    if not (options.email and options.pwd):
        parser.error("Must provide both an email and a password")

    smtpserver = initialize_smtp_server(options.smtpserver,
                                        options.smtpserverport,
                                        options.email, options.pwd)

    # for every line of input.
    for email in sys.stdin.readlines():
        # The login address doubles as the sender address.
        send_thank_you_mail(email.strip(), options.email, smtpserver)
    smtpserver.close()
This script shows what OptionParser can do. It provides a simple, easy-to-use interface for command-line arguments and lets you define attributes for each option, including a default value. If a required argument is not provided, the script can report a specific error.
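The standard library also ships argparse, which covers the same ground. The Python 3 sketch below mirrors the four options of the script above; it is an alternative shown for comparison, not the author's choice, and the sample argument values are placeholders.

```python
import argparse


def build_parser():
    """Create a parser with the same four options as the optparse script."""
    parser = argparse.ArgumentParser(description="Send thank-you emails.")
    # required=True makes argparse report a missing option by itself.
    parser.add_argument("--email", required=True,
                        help="email to login to smtp server")
    parser.add_argument("--pwd", required=True,
                        help="password to login to smtp server")
    parser.add_argument("--smtp-server", dest="smtpserver",
                        default="smtp.gmail.com", help="smtp server url")
    parser.add_argument("--smtp-port", dest="smtpserverport", type=int,
                        default=587, help="smtp server port")
    return parser


# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
options = build_parser().parse_args(
    ["--email", "me@example.com", "--pwd", "secret", "--smtp-port=2525"]
)
print(options.smtpserver, options.smtpserverport)
```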
What have we learned? Rather than replacing every Bash command with a Python script, we recommend letting Python handle the difficult parts. This calls for modular, reusable scripts that make good use of Python's strengths.
Using stdin as a file object lets Python read input that another command sends through the pipeline, and writing to stdout lets Python pass information on to the next step in the pipeline. With these capabilities you can build powerful programs. The example used here was processing a service's log files.
As a practical application, I recently had to process a multi-gigabyte CSV file, using a Python script to convert it into SQL commands that insert the data. To give a sense of scale, the data all belonged to one table, and the script took 23 hours to run and generated a 20 GB SQL file. The advantage of the Python programming style described in this article is that the file does not need to be read into memory: the entire 20 GB+ file can be processed one line at a time. In addition, each step (read, sort, maintain, and output) can be cleanly separated into a logical step. We also benefit from the reliability of these commands, which are core tools of Unix-like environments. They are efficient and stable, and help us build stable, secure programs.
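The author's conversion script is not shown, so the following Python 3 sketch only illustrates the streaming idea under stated assumptions: the table name comments is hypothetical, every field is treated as text, and quoting is limited to doubling single quotes, which is not sufficient for every database.

```python
import csv
import io


def csv_to_inserts(source, sink, table="comments"):
    """Write one INSERT statement per CSV row, reading one row at a time."""
    for row in csv.reader(source):
        # Double any single quote so it is valid inside an SQL string literal.
        values = ", ".join("'%s'" % field.replace("'", "''") for field in row)
        sink.write("INSERT INTO %s VALUES (%s);\n" % (table, values))


# In-memory streams stand in for sys.stdin and sys.stdout.
source = io.StringIO('email@example.com,"It\'s great."\n')
sink = io.StringIO()
csv_to_inserts(source, sink)
print(sink.getvalue(), end="")
```

Because each row is written as soon as it is read, memory use stays flat regardless of the size of the input.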
Another advantage is that no file names need to be hard-coded, which makes the program more flexible: the input is simply passed in. For example, if the script is interrupted at line 20,000 of a file, we do not need to rerun it from the beginning. We can use tail to skip to the line where it failed and continue running the script from there.
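The same resume step can be sketched inside Python 3 with itertools.islice. The function name resume_from is hypothetical; in the shell, tail -n +N (output starting at line N) plays this role.

```python
from itertools import islice


def resume_from(lines, start_line):
    """Return an iterator over lines beginning at 1-based start_line."""
    # islice lazily consumes and discards the first start_line - 1 lines.
    return islice(lines, start_line - 1, None)


sample = ["line1\n", "line2\n", "line3\n", "line4\n"]
print(list(resume_from(sample, 3)))
```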
Python's usefulness for shell-style work is not limited to what this article covers; see, for example, the os and subprocess modules. The os module is part of the standard library and performs many operating-system-level operations, such as listing directory contents and retrieving file statistics, and it includes the excellent os.path submodule for manipulating directory paths. The subprocess module lets Python run system commands and perform more advanced operations, such as the piping described above between Python code and spawned processes. If you write shell-style scripts in Python, these libraries are worth studying.
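As a starting point, here is a minimal Python 3 sketch (3.7 or later, for capture_output) that touches both modules. The helper name run_filter is hypothetical, and a child Python process stands in for a shell command such as sort -rn so that the sketch does not depend on which tools are installed.

```python
import os
import subprocess
import sys


def run_filter(command, text):
    """Feed text to a command's standard input and return its standard output."""
    result = subprocess.run(command, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


# os.path handles path manipulation portably.
log_path = os.path.join("/tmp", "comment_freq")
print(os.path.basename(log_path))

# The child process upper-cases whatever it reads from standard input.
upper = run_filter(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    "alice\nbob\n",
)
print(upper, end="")
```

Passing ["sort", "-rn"] as the command would pipe the text through the real sort utility on systems where it is available.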
Original English article: http://www.linuxjournal.com/content/python-scripts-replacement-bash-utility-scripts
All translations in this article are only used for learning and communication purposes. For reprinting, please be sure to indicate the author, source, and link to this article.
Our translation work complies with the CC license. If our work infringes your rights, please contact us promptly.