Remember sed and awk? All Linux admins should

Don't let the next generation of Linux and Unix administrators forget the value of init scripts and the basic tools.

I once came across a Reddit post asking how to manipulate a text file. It was a simple request, the kind Unix folks run into every day: how to remove the duplicate lines from a file while keeping one copy of each line in its original order. This sounds simple, but it gets complicated when the file is large enough.
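A quick illustration of why the order is the tricky part (a minimal sketch; the fruit names are made-up sample data and sample_input is a hypothetical helper): the classic sort -u removes duplicates, but it hands the lines back sorted rather than in their original order.

```shell
#!/bin/sh
# Hypothetical sample data: "banana" and "apple" each appear twice,
# and the first-seen order is banana, apple, cherry.
sample_input() {
    printf 'banana\napple\nbanana\ncherry\napple\n'
}

# sort -u drops the duplicates, but the output comes back sorted,
# so the original ordering (banana before apple) is lost.
sample_input | sort -u
# prints: apple, banana, cherry (one per line)
```

sort | uniq behaves the same way, since uniq only collapses adjacent duplicates and therefore needs sorted input.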

There are many answers to this question. You could write such a script in almost any language, though the time investment and code complexity vary. Depending on your skill level, it might take 20 to 60 minutes. If you know Perl, Python, or Ruby, you might get it done faster.

Or you can take the easy route and use nothing but awk.

That answer is by far the simplest solution to the problem. It takes only one line:

awk '!seen[$0]++' <filename>
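Here is the one-liner run against a small made-up sample (a sketch; sample_input is just a hypothetical helper that prints test lines). It keeps the first occurrence of each line, in the order the lines first appeared.

```shell
#!/bin/sh
# Hypothetical sample data with duplicates; first-seen order is
# banana, apple, cherry.
sample_input() {
    printf 'banana\napple\nbanana\ncherry\napple\n'
}

# The one-liner: print a line only the first time it is seen.
# It reads standard input here; pass a file name instead to process a file.
sample_input | awk '!seen[$0]++'
# prints: banana, apple, cherry (one per line)

# POSIX awk has no in-place editing option, so to rewrite a file, write to
# a temporary file and move it back ("input.txt" is a placeholder name):
# awk '!seen[$0]++' input.txt > input.txt.tmp && mv input.txt.tmp input.txt
```

One design note: seen holds every distinct line in memory, so memory use grows with the number (and length) of unique lines rather than with the total size of the file.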

Let's look at what is going on here:

A lot of code is actually hidden in this command. Awk is a text-processing language with many built-in defaults. First, what you are looking at is really the body of a loop: awk assumes you want to process each line of the input in turn, so you don't need to write the loop explicitly. Awk also assumes you want output: a pattern with no action attached prints the current line, so you don't need to ask for the print. Finally, awk assumes each pass through the loop ends after the last statement, so you don't need to mark that either.
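To make the hidden parts visible, here is a sketch of the same deduplication with the loop and the print written out by hand. It is still awk, but the program has only a BEGIN block, and such a program gets no automatic per-line loop, so it reads its input explicitly with getline. dedup_explicit is a hypothetical name used only for illustration, and this version reads standard input only.

```shell
#!/bin/sh
# dedup_explicit is a hypothetical name used for illustration.
# It reads standard input only.
dedup_explicit() {
    awk 'BEGIN {
        # A BEGIN-only program has no automatic record loop, so loop by hand.
        # getline returns 1 for each line read, 0 at end of input, -1 on error.
        while ((getline line) > 0) {
            # Print by hand, and only for lines not yet recorded in seen.
            if (!(line in seen)) {
                print line
                seen[line] = 1
            }
        }
    }'
}

printf 'banana\napple\nbanana\ncherry\napple\n' | dedup_explicit
# prints: banana, apple, cherry (one per line)
```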

In this example, seen is the name of an associative array, and $0 is a variable holding the entire current line. So, in plain English, the command says: "Check every line of this file; if you have never seen it before, print it." If the line is not yet a key in the array, awk adds it and increments its value. Because seen[$0]++ returns the value from before the increment, the first occurrence of a line yields 0, so !seen[$0] is true and the line is printed. Every later occurrence yields a nonzero count, the condition is "false," and the line is not printed.
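A small experiment (a sketch on made-up input; sample_input is a hypothetical helper) shows the values that seen[$0]++ actually returns. Printing that value next to each line makes the first-occurrence rule visible: a 0 means the line is new.

```shell
#!/bin/sh
# Hypothetical sample input: "a" appears three times, "b" twice.
sample_input() {
    printf 'a\nb\na\nc\nb\na\n'
}

# Print the value of seen[$0]++ (the count BEFORE incrementing) next to
# each line. 0 means "never seen", so !0 is true and the one-liner would
# print the line; any nonzero value marks a repeat.
sample_input | awk '{ print seen[$0]++, $0 }'
# prints:
# 0 a
# 0 b
# 1 a
# 0 c
# 1 b
# 2 a
```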

Some people find this elegant; others find it confusing. Anyone who uses awk day to day falls into the first group, because this is exactly what awk was designed for. You can write multi-line programs in awk, and you can even write frighteningly complex functions in it. But in the end awk is a text-processing tool, usually used in pipelines, and leaving out the (unnecessary) loop definition is a common, convenient shortcut. If you prefer, you can write the same thing more explicitly:

  awk '{ if (!seen[$0]) print $0; seen[$0]++ }' <filename>

This produces the same results.
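As for multi-line awk programs and functions, here is a sketch of the same logic spread over several lines with a user-defined function. The function name first_time and the wrapper name dedup are hypothetical, chosen only for illustration; function definitions themselves are standard POSIX awk.

```shell
#!/bin/sh
# dedup and first_time are hypothetical names used for illustration.
# Extra arguments (e.g. file names) are passed straight through to awk.
dedup() {
    awk '
    # first_time(s) returns 1 the first time s is seen, 0 after that.
    function first_time(s) {
        if (s in seen) return 0
        seen[s] = 1
        return 1
    }

    # Pattern-action pair: print the line only when the pattern is true.
    first_time($0) { print $0 }
    ' "$@"
}

printf 'x\ny\nx\nz\ny\n' | dedup
# prints: x, y, z (one per line)
```

Because the awk program sits inside single quotes, its comments must not contain apostrophes.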

Awk is the perfect tool for this job. Yet I believe many administrators, especially new ones, would reach for Bash or Python instead, because knowledge of awk and what it can do seems to be fading over time. I think that is a sign of a problem: issues that were solved decades ago keep resurfacing because people don't know about the existing solutions.

The shell, grep, sed, and awk are fundamental to Unix. If you aren't comfortable with them, you are hobbling yourself, because they form the basis of interacting with Unix systems from the command line and in scripts. One of the best ways to learn how these tools work is to study real-world examples and experiment with them. You can find plenty of those in the init systems of the various Unix-derived operating systems, but on Linux distributions those init systems have largely been replaced by systemd.

Millions of Unix administrators learned to read, write, modify, and work with shell scripts and Unix tools through init scripts. Init scripts vary considerably from system to system, and even between Linux distributions. But they are all rooted in sh, and they all use core command-line tools such as sed, awk, and grep.

Every day I hear people complain that init scripts are "archaic" and "difficult." But in fact, init scripts use the same tools Unix administrators work with every day, and they are a great way to get familiar and comfortable with those tools. If you find init scripts hard to read and work with, the real problem is that you are not familiar with the basic Unix tools.

Speaking of Reddit, I also came across a post from a new Linux system administrator asking whether it was worth learning the old sysvinit init system. Most of the answers were positive: yes, you should learn both sysvinit and systemd. One commenter even pointed out that init scripts are a great way to learn Bash. Another noted that Fortune 50 companies had no plans yet to migrate to systemd-based distributions.

Still, it reminded me that this really is a problem. If we keep moving away from scripts and from the core components of the operating system, we will inadvertently make it harder for new administrators to learn the basic Unix tools.

I'm not sure why some people want to bury Unix internals under layer after layer of abstraction, but this trend may turn the next generation of system administrators into button pushers. I don't think that's a good thing.

Via: http://www.infoworld.com/article/2985804/linux/remember-sed-awk-linux-admins-should.html

Author: Paul Venezia Translator: Bestony Proofreader: wxy

This article was translated by LCTT and proudly published by Linux China.
