Tutorial on implementing part of the Bash shell with a Python script under Linux

Among Linux users, the command line enjoys a very high reputation. On other operating systems the command line can be an intimidating prospect, but the seasoned experts of the Linux community recommend it above everything else. Compared with a graphical user interface, the command line often provides a more elegant and efficient solution.

The command line has grown along with the Linux community. Unix shells such as bash and zsh have developed into powerful tools and are an important part of Unix. With bash and similar shells you get useful features such as pipelines, filename wildcards, and the ability to read commands from a file, which is called a script.


Let's look at the power of the command line in practice. Each time a user logs in to a service, their user name is recorded in a text file. As an example, let's find out how many unique users have used the service.

The following chain of commands shows how much can be achieved by stringing a few small commands together:

$ cat names.log | sort | uniq | wc -l

The pipe symbol (|) sends the standard output of one command to the standard input of another. In this example, the output of cat names.log is passed to the input of the sort command, which sorts the lines alphabetically. Next, the pipe passes that output to the uniq command, which removes duplicate names (uniq only removes adjacent duplicates, which is why the input is sorted first). Finally, the output of uniq is sent to the wc command. wc is a counting command; with the -l parameter it returns the number of lines. Pipes let you string together a whole series of commands in this way.
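For comparison, the same count can be sketched in a few lines of Python. This is only an illustration of what the pipeline computes, not a replacement for it, and the function name count_unique_lines is made up for this example.

```python
def count_unique_lines(lines):
    """Return how many distinct lines the iterable contains.

    This is roughly what `sort | uniq | wc -l` computes. A set removes
    duplicates wherever they occur, so no sorting is needed.
    """
    return len({line.rstrip("\n") for line in lines})


# Sample data standing in for the contents of names.log.
sample = ["alice\n", "bob\n", "alice\n"]
print(count_unique_lines(sample))
```

In a script that sits in a pipeline, sys.stdin could be passed in place of the sample list, because a file object can be iterated line by line.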


However, sometimes the requirements are more complicated and a chain of commands becomes cumbersome. In that case a shell script can solve the problem. A shell script is a series of commands that the shell reads and executes in sequence. Shell scripts also support some programming-language features, such as variables, control flow, and data structures. Shell scripts are useful for batch jobs that run repeatedly. However, shell scripts have some weaknesses:

    • Shell scripts easily grow into complex code that is difficult for developers to read and modify.
    • Their syntax and interpretation are generally inflexible and unintuitive.
    • Their code usually cannot be used by other scripts. Code reuse among scripts is very low, and each script tends to solve one very specific problem.
    • They generally lack library support, such as an HTML parser or an HTTP request library, because such libraries are usually available only in general-purpose programming and scripting languages.

These problems often make scripts inflexible and cost developers a lot of time. Python is a good choice as a substitute. Using Python instead of shell scripts has several advantages:

    • Python is installed by default on most mainstream Linux distributions. Open the command line and type python to enter the world of Python right away. This makes it a strong choice for most scripting tasks.
    • Python is very easy to read and its syntax is easy to understand. Its style emphasizes simple, clean code, which lets developers write code well suited to scripting tasks.
    • Python is an interpreted language, which means no compilation is required. This makes Python an ideal scripting language. Python also comes with a read-eval-print loop, which lets developers quickly try out new code in the interpreter. Developers can test their ideas without having to rewrite the entire program.
    • Python is a full-featured programming language. Code reuse is straightforward because Python modules can easily be imported and used in scripts, so scripts are easy to extend.
    • Python has a good standard library, as well as a large number of third-party libraries that cover many functions, such as parsers and request libraries. For example, Python's standard library contains date and time libraries that let us convert times to whatever format we want and compare them with other dates.
    • Python can be part of a command chain. Python does not have to replace bash completely. Python programs can be written UNIX-style (reading from standard input and writing to standard output), so they can stand in for shell commands such as cat and sort.
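As a small illustration of the last point, here is a sketch of a bare-bones sort written as a UNIX-style filter. The function name sort_lines is invented for this example, and the demonstration uses in-memory streams so it can be run without a pipeline.

```python
import io


def sort_lines(src, dst):
    """Copy lines from src to dst in sorted order, like a bare-bones `sort`."""
    for line in sorted(src):
        dst.write(line)


# Demonstration with in-memory streams. A real filter would pass
# sys.stdin and sys.stdout instead.
out = io.StringIO()
sort_lines(io.StringIO("carol\nalice\nbob\n"), out)
print(out.getvalue(), end="")
```

Note that Python compares strings by code point, while the sort command follows the current locale, so the two can order some inputs differently.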

Let's revisit the problem from earlier in the article and build on it with Python. In addition to what has already been done, let's find out how many times each user has logged in to the system. The uniq command, as used above, simply removes duplicate records without reporting how many times each record was repeated. We replace the uniq command with a Python script that can be part of the command chain. The following Python program implements this (in this example, the script is called namescount.py):


#!/usr/bin/env python
import sys

if __name__ == "__main__":
    # Initialize an empty names dictionary.
    # It maps each name to the number of times it appears.
    names = {}
    # sys.stdin is a file object. All methods that apply to
    # file objects can be used on sys.stdin.
    for name in sys.stdin.readlines():
        # Each line ends with a newline character,
        # which we need to remove.
        name = name.strip()
        if name in names:
            names[name] += 1
        else:
            names[name] = 1

    # Iterate over the dictionary and output the number of
    # occurrences, a tab, and then the name.
    for name, count in names.items():
        sys.stdout.write("%d\t%s\n" % (count, name))

Let's take a look at how the Python script works in the command chain. First, it reads data from the standard input object, sys.stdin. All output is written to the sys.stdout object, which is how Python exposes standard output. In between, it uses a Python dictionary (called a hash table in other languages) to store the mapping from each name to its number of repetitions. To get the number of logins for all users, simply execute the following command:

$ cat names.log | python namescount.py


This outputs the number of times each user appears, followed by the user's name, with a tab as the delimiter. The next step is to output the users in descending order of the number of logins. This could be implemented in Python, but let's do it with UNIX commands. As mentioned earlier, the sort command sorts alphabetically. If the sort command is given the -rn parameter, it sorts numerically in descending order. Because the Python script writes to standard output, we can pipe its output into the sort command:

$ cat names.log | python namescount.py | sort -rn

This example uses Python as part of the command chain. The advantages of using Python this way are:

    • It can be chained with commands such as cat and sort. Simple jobs (reading files, sorting lines numerically) can be left to mature UNIX commands. These commands read their input line by line, which means they cope with large files, and they are highly efficient.
    • If one part of the command chain is difficult to implement, we can write it as a Python script that does exactly what we want, which lightens the load on the rest of the chain.
    • The Python script is a reusable module. Although this example deals with names, the same script works for any other input with duplicate lines: it outputs each line and the number of times that line is repeated. Keeping the Python script modular lets you apply it elsewhere.
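To make the reuse point concrete, the counting step can be sketched as functions that another script could import. This is a minimal sketch, not the script from the article: it uses collections.Counter from the standard library instead of a hand-built dictionary, and the function names count_lines and by_frequency are invented here.

```python
from collections import Counter


def count_lines(lines):
    """Count how often each line occurs, ignoring surrounding whitespace."""
    return Counter(line.strip() for line in lines)


def by_frequency(counts):
    """Return (line, count) pairs with the most frequent lines first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Sample data standing in for a log of user names.
sample = ["alice\n", "bob\n", "alice\n", "carol\n", "alice\n", "bob\n"]
for line, count in by_frequency(count_lines(sample)):
    print("%d\t%s" % (count, line))
```

The second function shows the sorting step done in Python rather than with sort -rn. Lines with equal counts may come out in a different order than sort -rn would produce.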

To demonstrate the power of combining modules and pipes in a Python script, let's extend the problem. Let's find the 5 users who used the service most often. The head command lets us specify the number of lines to output. Add this command to the command chain:

$ cat names.log | python namescount.py | sort -rn | head -n 5

This command lists only the top 5 users. Similarly, to get the 5 users who used the service least, you can use the tail command, which takes the same parameter. Because the Python script writes its results to standard output, you can keep extending and building on its functionality.


To demonstrate the modularity of the script, let's expand the problem. The service also generates a comma-separated CSV log file that contains a list of email addresses, along with the comment that each address left about our service. Here is one example:

"email@example.com","This service is great."


The task is to send a thank-you message to the 10 users who comment most often. First, we need a script that reads the CSV and outputs one of its fields. Python provides a standard CSV reading module. The following Python script implements this feature:

#!/usr/bin/env python
# The csv module comes with the Python standard library.
import csv
import sys

if __name__ == "__main__":
    # The csv module's reader takes a file object as input;
    # in this case, it is sys.stdin.
    csvfile = csv.reader(sys.stdin)

    # The script takes one optional argument specifying the column
    # number, read from sys.argv. It defaults to the first column (0).
    column_number = 0
    if len(sys.argv) > 1:
        column_number = int(sys.argv[1])

    # Each line of the CSV file has its fields delimited by commas.
    for row in csvfile:
        print(row[column_number])

This script parses the CSV and returns the text of the field specified by the argument. It uses print instead of sys.stdout.write because print writes to standard output by default.
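To see what the csv module does with a quoted line, the reading step can be tried on an in-memory string. The sketch below is an illustration only: the function name column_values and the sample addresses are invented. The skipinitialspace option is a real csv setting that ignores whitespace directly after a delimiter; it only matters if the log happens to contain a space after each comma.

```python
import csv
import io


def column_values(stream, column_number=0, skipinitialspace=False):
    """Yield one column from each CSV row read from a file-like object."""
    reader = csv.reader(stream, skipinitialspace=skipinitialspace)
    for row in reader:
        yield row[column_number]


# Two sample rows with a space after the comma. The second comment
# contains a comma inside the quotes, which the csv module keeps intact.
data = io.StringIO(
    '"a@example.com", "Nice."\n'
    '"b@example.com", "Great, thanks."\n'
)
print(list(column_values(data, 1, skipinitialspace=True)))
```

Splitting each line on commas by hand would break the second row in two places, which is the main reason to use the csv module here.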

Let's add this step to the command chain. The new script is combined with the other commands to find the email addresses with the most comments (assume the CSV file is called emailcomments.csv and the new script is csvcolumn.py):

$ cat emailcomments.csv | python csvcolumn.py | python namescount.py | sort -rn


Next, we need a way to send mail. From the Python standard library you can import smtplib, a module used to connect to an SMTP server and send mail. Let's write a simple Python script that uses this module to send a message to each of the top 10 users.

#!/usr/bin/env python
import smtplib
import sys

GMAIL_SMTP_SERVER = "smtp.gmail.com"
GMAIL_SMTP_PORT = 587

GMAIL_EMAIL = "Your Gmail Email Goes Here"
GMAIL_PASSWORD = "Your Gmail Password Goes Here"


def initialize_smtp_server():
    '''
    This function initializes and greets the SMTP server.
    It logs in using the provided credentials and returns
    the SMTP server object as a result.
    '''
    smtpserver = smtplib.SMTP(GMAIL_SMTP_SERVER, GMAIL_SMTP_PORT)
    smtpserver.ehlo()
    smtpserver.starttls()
    smtpserver.ehlo()
    smtpserver.login(GMAIL_EMAIL, GMAIL_PASSWORD)
    return smtpserver


def send_thank_you_mail(email):
    to_email = email
    from_email = GMAIL_EMAIL
    subj = "Thanks for being an active commenter"
    # The header consists of the To, From and Subject lines,
    # separated using a newline character.
    header = "To: %s\nFrom: %s\nSubject: %s\n" % (to_email, from_email, subj)
    # Hard-coded templates are not best practice.
    msg_body = """
    Hi %s,

    Thank you very much for your repeated comments on our service.
    The interaction is much appreciated.

    Thank you.""" % email
    content = header + "\n" + msg_body
    smtpserver = initialize_smtp_server()
    smtpserver.sendmail(from_email, to_email, content)
    smtpserver.close()


if __name__ == "__main__":
    # For every line of input, strip the trailing newline
    # and send a mail to that address.
    for email in sys.stdin.readlines():
        send_thank_you_mail(email.strip())

This Python script can connect to any SMTP server, whether local or remote. For ease of use, I used Gmail's SMTP server; normally you need to provide a password to connect to Gmail. The script uses functions from the smtplib library to send the messages. Once again, the strength of using a Python script shows: interactions such as SMTP are easy to read in Python. An equivalent shell script would likely be more complex, and libraries like smtplib are basically unavailable there.
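Building the headers by string formatting works, but the standard library can also assemble the message. The following is a sketch under assumptions, not the article's script: it uses email.message.EmailMessage (Python 3), the function name build_thank_you_message is invented, and the addresses are placeholders. Nothing is sent over the network.

```python
from email.message import EmailMessage


def build_thank_you_message(from_email, to_email):
    """Build the thank-you mail as an EmailMessage instead of a raw string."""
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = to_email
    msg["Subject"] = "Thanks for being an active commenter"
    msg.set_content(
        "Hi %s,\n\n"
        "Thank you very much for your repeated comments on our service.\n"
        "The interaction is much appreciated.\n" % to_email
    )
    return msg


# Placeholder addresses, used only to show the resulting message.
message = build_thank_you_message("sender@example.com", "reader@example.com")
print(message["Subject"])
print(message.get_content())
```

An smtplib.SMTP object that has already logged in can send such a message with its send_message method, which reads the sender and recipient from the headers.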

To send e-mail to the 10 users who comment most frequently, you must first extract the e-mail column on its own. To extract a column, you can use the cut command in Linux. In the example below, the commands are split into two separate chains. For ease of use, I write the output of the first chain to a temporary file, which is then fed into the second chain. This just makes the process more readable (the mail-sending Python script is called sendemail.py):

$ cat emailcomments.csv | python csvcolumn.py | python namescount.py | sort -rn > /tmp/comment_freq
$ cat /tmp/comment_freq | head -n 10 | cut -f2 | python sendemail.py

This shows the real power of Python as a utility within a bash command chain. Because the scripts accept data from standard input and write their output to standard output, developers can chain them together with the quick, simple commands already in the chain. This philosophy of designing small programs that each serve one purpose fits the command-chain approach used here very well.


Python scripts are typically run from the command line, and the user chooses the arguments when running the command. For example, the head command takes a -n flag followed by a number and then prints only that many lines. The arguments of a Python script are available through the sys.argv list once you import sys. The following code shows how to use simple positional arguments. The program is a simple adder: it takes two numeric arguments, adds them, and prints the result. However, this way of handling command-line arguments is very basic, and it is easy to get wrong. For example, if you pass two strings such as Hello and world, the program fails with an error right away:

#!/usr/bin/env python
import sys

if __name__ == "__main__":
    # The first element of sys.argv is always the filename,
    # meaning that the length of sys.argv will be
    # more than one when command-line arguments exist.
    if len(sys.argv) > 2:
        num1 = int(sys.argv[1])
        num2 = int(sys.argv[2])
    else:
        print("This command takes two arguments and adds them")
        print("Less than two arguments given.")
        sys.exit(1)
    print("%s" % str(num1 + num2))
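One way to make the failure friendlier is to validate the arguments before adding them. This is a hedged sketch rather than part of the original program; the function name add_arguments and the wording of the messages are invented for illustration.

```python
def add_arguments(args):
    """Add two integers given as strings.

    Raises ValueError with a readable message if the count is wrong
    or if either argument is not an integer.
    """
    if len(args) != 2:
        raise ValueError("expected exactly two arguments, got %d" % len(args))
    try:
        return int(args[0]) + int(args[1])
    except ValueError:
        raise ValueError("both arguments must be integers: %r" % (args,))


print(add_arguments(["2", "3"]))
try:
    add_arguments(["Hello", "world"])
except ValueError as error:
    print("error: %s" % error)
```

In a command-line script, sys.argv[1:] would be passed as args, and the except branch could call sys.exit(1) after printing the message.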

Fortunately, Python has several modules for handling command-line arguments. Personally, I prefer OptionParser. OptionParser is part of the optparse module provided by the standard library. OptionParser lets you do a range of very useful things with command-line arguments.

    • If a particular argument is not provided, you can specify a default value for it.
    • It supports both flags (either present or absent) and options that take values (-n 10000).
    • It supports different formats for passing values, for example -n 100000 and -n100000 for short options, and the --name=value form for long options.
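The features in this list can be tried out in isolation. The sketch below is not taken from the article's scripts: the option names -n/--lines and -v/--verbose are invented for illustration. It relies on parse_args accepting an explicit list of arguments, which avoids needing a real command line.

```python
from optparse import OptionParser

parser = OptionParser(usage="usage: %prog [options]")
# An option that takes a value and has a default.
parser.add_option("-n", "--lines", dest="lines", type="int", default=10,
                  help="number of lines to print")
# A flag that is either present or absent.
parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                  default=False, help="print extra information")

# parse_args returns a pair: the parsed options and the leftover arguments.
defaults, _ = parser.parse_args([])
short_form, _ = parser.parse_args(["-n", "100000", "-v"])
long_form, _ = parser.parse_args(["--lines=100000"])

print(defaults.lines, short_form.lines, long_form.lines, short_form.verbose)
```

The standard library also contains argparse, a newer module that covers the same ground; the article sticks with optparse.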

Let's use OptionParser to improve the mail-sending script. The original script hard-coded many values, such as the SMTP details and the user's login credentials. In the code below, these values are passed as command-line arguments instead:

#!/usr/bin/env python
import smtplib
import sys

from optparse import OptionParser


def initialize_smtp_server(smtpserver, smtpport, email, pwd):
    '''
    This function initializes and greets the SMTP server.
    It logs in using the provided credentials and returns
    the SMTP server object as a result.
    '''
    smtpserver = smtplib.SMTP(smtpserver, smtpport)
    smtpserver.ehlo()
    smtpserver.starttls()
    smtpserver.ehlo()
    smtpserver.login(email, pwd)
    return smtpserver


def send_thank_you_mail(email, from_email, smtpserver):
    to_email = email
    subj = "Thanks for being an active commenter"
    # The header consists of the To, From and Subject lines,
    # separated using a newline character.
    header = "To: %s\nFrom: %s\nSubject: %s\n" % (to_email, from_email, subj)
    # Hard-coded templates are not best practice.
    msg_body = """
    Hi %s,

    Thank you very much for your repeated comments on our service.
    The interaction is much appreciated.

    Thank you.""" % email
    content = header + "\n" + msg_body
    smtpserver.sendmail(from_email, to_email, content)


if __name__ == "__main__":
    usage = "usage: %prog [options]"
    parser = OptionParser(usage=usage)
    parser.add_option("--email", dest="email",
                      help="email to login to SMTP server")
    parser.add_option("--pwd", dest="pwd",
                      help="password to login to SMTP server")
    parser.add_option("--smtp-server", dest="smtpserver",
                      help="SMTP server URL", default="smtp.gmail.com")
    parser.add_option("--smtp-port", dest="smtpserverport", type="int",
                      help="SMTP server port", default=587)
    options, args = parser.parse_args()

    if not (options.email and options.pwd):
        parser.error("Must provide both an email and a password")

    smtpserver = initialize_smtp_server(options.smtpserver,
                                        options.smtpserverport,
                                        options.email, options.pwd)

    # For every line of input. The login address doubles as the sender,
    # since this version has no hard-coded Gmail address.
    for email in sys.stdin.readlines():
        send_thank_you_mail(email.strip(), options.email, smtpserver)
    smtpserver.close()

This script shows what OptionParser can do. It provides a simple, easy-to-use interface to command-line arguments and lets you define properties for each option, including a default value. If required arguments are missing, the script can report a specific error.


So what have we learned? Rather than using Python scripts to replace every bash command, we recommend letting Python handle the difficult parts of a task. This calls for scripts that are more modular and reusable, and that make good use of Python's capabilities.

Treating stdin as a file object lets Python read input piped in from the output of other commands, and writing to stdout lets Python pass information on to the next part of the pipeline. Combining these features makes powerful programs possible. The example here processed the log files of a service.

In practice, I recently had to process a multi-gigabyte CSV file and used a Python script to turn it into SQL commands that insert the data. Given the files I needed to process, all going into a single table, the script took 23 hours to run and generated 20GB of SQL. The advantage of the programming style described in this article is that the file never has to be read into memory in full: the entire 20GB+ file can be processed one line at a time. The work also breaks down cleanly into logical steps (read, sort, maintain, and output). And we can rely on the core tools of the Unix environment, which are efficient and stable and help us build stable, reliable programs.
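Processing one line at a time depends on how the input is read. Calling readlines() on a file object builds a list of every line first, while iterating over the file object directly yields one line at a time. The following sketch shows the lazy form; the function name stream_filter is invented, and in-memory streams stand in for sys.stdin and sys.stdout.

```python
import io


def stream_filter(src, dst, transform):
    """Apply transform to each line of src and write the result to dst.

    Iterating over a file object yields one line at a time, so memory
    use stays small no matter how large the input is.
    """
    for line in src:
        dst.write(transform(line))


# In-memory demonstration. A real filter would pass sys.stdin and sys.stdout.
out = io.StringIO()
stream_filter(io.StringIO("alice\nbob\n"), out, str.upper)
print(out.getvalue(), end="")
```

A step that has to see all of the data, such as counting or sorting, still needs memory in proportion to what it keeps, so this pattern helps most for line-by-line transformations.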


Another advantage is that we don't need to hard-code filenames. This makes the program more flexible and requires only one parameter to be passed in. For example, if a script is interrupted at line 20000 of a file, we don't need to rerun it from the start: we can use tail to specify the line at which it failed and let the script continue from that point.
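The same resume idea can be sketched inside Python when the skipping has to happen in the script itself. This is an illustrative sketch only: the function name lines_from is invented, and the behaviour mirrors tail -n +N, which prints a file starting at line N.

```python
from itertools import islice


def lines_from(lines, start_line):
    """Return an iterator over lines, beginning at the 1-based start_line."""
    return islice(lines, start_line - 1, None)


# Five sample rows standing in for a large input file.
sample = ["row %d\n" % n for n in range(1, 6)]
print(list(lines_from(sample, 4)))
```

Because islice consumes its input lazily, this also works on sys.stdin without loading the skipped lines into memory.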

Python has many more uses in the shell beyond what this article covers, such as the os and subprocess modules. The os module is part of the standard library and performs many operating-system-level operations, such as listing directory contents and getting file statistics, and it includes the excellent os.path submodule for handling and normalizing paths. The subprocess module lets Python programs run system commands and do more advanced things, such as the piping between Python code and spawned processes described above. If you need to write shell-style scripts in Python, these libraries are worth studying.
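A brief taste of these modules, as a sketch under assumptions: it needs Python 3.7 or later for the capture_output parameter, the file name names.log is only used to build a path, and the child process is the Python interpreter itself so that the example does not depend on any particular system command.

```python
import os
import subprocess
import sys

# os.path builds and inspects paths in a portable way.
path = os.path.join(os.getcwd(), "names.log")
print(os.path.basename(path), os.path.exists(path))

# os.listdir lists the entries of a directory.
entries = os.listdir(os.getcwd())
print(len(entries), "entries in the current directory")

# subprocess.run starts another program and captures its output.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Replacing the argument list with a command such as ["sort", "-rn"] and passing text through the input parameter would run a UNIX tool from Python in the same way.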
