How to use the GNU text utilities

This tutorial shows how to use the GNU text utilities to process log files, documents, structured text databases, and other sources of textual data or content. UNIX/Linux developers have refined the utilities in this collection over decades, and they have proved their usefulness. They should be your first choice for general-purpose text processing tasks.
  
This tutorial is intended for Linux/UNIX programmers and system administrators.
  
Prerequisites for this tutorial
  
To get the most out of this tutorial, you should be familiar with a UNIX-like environment, especially the command-line shell. You don't need to be a programmer. In fact, the techniques described here are most useful to system administrators and to users who need to process ad hoc reports, log files, project documentation, and similar content (and are less useful for processing formal program source code). As you work through the tutorial, keep a shell open and try out the examples shown, along with variations on them.
  
The introduction, "UNIX philosophy," reviews the basic concepts: pipes, streams, grep, and shell scripting.
  
David Mertz has a long-standing fascination with text processing. He even wrote a book on the subject, Text Processing in Python, and frequently discusses related topics in his articles and columns for IBM developerWorks.
  
For technical questions or comments about this tutorial, contact David Mertz. David's Web site is also a good source of related information.
  
Introduction: UNIX philosophy
Combine small utilities to complete large tasks
  
UNIX-inspired operating systems such as Linux, FreeBSD, Mac OS X, Solaris, and AIX share a common philosophy. It underlies the development environment, and even the shell and working environment. The gist of this philosophy is to use small utilities that each do one small task well (without side effects on anything else), and then to combine these utilities to perform compound tasks. Most of the products of the GNU project follow this philosophy. In fact, the GNU implementations have been ported to many platforms, including some that are not traditionally regarded as UNIX-like. The Linux kernel itself is necessarily a more monolithic piece of software, although its kernel modules, file systems, and video drivers are all componentized.
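Here is a small illustration of this philosophy. The following sketch combines several single-purpose utilities (tr, sort, uniq, and head) into a word-frequency counter. The sample text is invented for the example, and none of the individual tools knows anything about the overall task.

```shell
# Invented sample text; any file or stream would work the same way.
text='the cat and the dog and the bird'

# Each stage does one small job:
#   tr -s ' ' '\n'  puts each word on its own line
#   sort            groups identical words together
#   uniq -c         counts each run of identical lines
#   sort -rn        orders the counts numerically, highest first
#   head -n 2       keeps the two most frequent words
printf '%s\n' "$text" | tr -s ' ' '\n' | sort | uniq -c | sort -rn | head -n 2
```

Swapping any single stage (for example, `head -n 5` instead of `head -n 2`) changes the result without touching the other tools.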
  
  
Files and streams
  
If the UNIX philosophy has a moral dimension, which advocates minimal modular components and cooperation among them, it also has an ontological one: "Everything is a file." In the abstract, a file supports only a few operations. The main ones are reading and writing bytes, but a file also supports reporting its current position and detecting when the end of the file has been reached. The UNIX permission model is also built around the concept of files.
  
Concretely, a file can be a particular region on a recording medium, with metadata that the file system supplies about its name, size, and location on disk. A file can also be a virtual device in the /dev/ hierarchy, or a remote stream that arrives over TCP/IP or through a higher-level protocol such as NFS. Importantly, the special files STDIN, STDOUT, and STDERR can be used to read from or write to the user console and to pass data between utilities. These special files can be referred to both by virtual file names and by special syntax:
  
STDIN is /dev/stdin and/or /dev/fd/0
STDOUT is /dev/stdout and/or /dev/fd/1
STDERR is /dev/stderr and/or /dev/fd/2
  
The payoff of the UNIX file ontology is that most of the utilities discussed here treat diverse data sources uniformly and neutrally. It does not matter which storage or transport mechanism actually delivers the bytes.
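One quick way to see this neutrality in action is to deliver the same bytes to one utility by several different routes. The sketch below assumes a Linux-like system that provides /dev/stdin and /dev/fd/0, and it uses a temporary file created only for the demonstration. It has wc -l count the lines of a named file, a redirect, a pipe, and the virtual STDIN file names. The count is 3 each time.

```shell
# Throwaway three-line file created just for this demonstration.
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

wc -l "$tmp"                     # named file (wc also prints the name)
wc -l < "$tmp"                   # STDIN redirected from the file
cat "$tmp" | wc -l               # STDIN supplied by a pipe
cat /dev/stdin < "$tmp" | wc -l  # STDIN opened by its virtual name
cat /dev/fd/0 < "$tmp" | wc -l   # ...or by its file-descriptor name
```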
  
Redirection and pipes
  
UNIX/Linux utilities are commonly combined using pipes and redirection. Many utilities, either automatically or optionally, take their input from STDIN and send their output to STDOUT, with special messages going to STDERR. A pipe sends the STDOUT of one utility to the STDIN of another utility (or of a new invocation of the same utility). A redirect either reads the contents of a file as STDIN, or sends STDOUT and/or STDERR output to a named file. Redirects are often used to save data for later or repeated processing. For repeated processing, the utility reads the saved file back through STDIN redirection.
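The following sketch uses invented log lines to show both ideas from the paragraph above. First, a pipe feeds one sort into a second invocation of sort. Second, a redirect saves the intermediate result in a scratch file so that it can be read back later through STDIN redirection.

```shell
# Invented log data: one requesting host per line.
log=$(mktemp)
counts=$(mktemp)
printf 'hostA\nhostB\nhostA\nhostC\nhostA\nhostB\n' > "$log"

# Pipe: the first sort groups identical hosts, uniq -c counts each group,
# and a second invocation of sort orders the counts, highest first.
# The final redirect saves the result for later processing.
sort < "$log" | uniq -c | sort -rn > "$counts"

# Later: re-process the saved results by redirecting the file to STDIN.
head -n 1 < "$counts"
```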
  
In almost all shells, piping is done with the vertical bar symbol |, and redirection with the greater-than and less-than signs, > and <. To redirect STDERR, use 2>. In bash, you can also use &> to redirect both STDOUT and STDERR to the same place at once. A doubled greater-than sign, >>, appends output to the end of an existing file. For example:
  
  
Source code:
--------------------------------------------------
$ foo fname | bar - > myout 2> myerr
--------------------------------------------------
  
Here the utility foo presumably processes the file named fname and writes its output to STDOUT. The utility bar follows a common convention: a dash as the file argument means that input is taken from STDIN rather than from a named file. (Some other utilities accept input only from STDIN.) The STDOUT of bar is saved in myout, and its STDERR in myerr.
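To make the placeholder example concrete, here is a sketch that substitutes real utilities for foo and bar. The file names are invented for the demonstration. The sketch exercises the dash convention, 2>, and >>. For the combined STDOUT/STDERR case, it uses the portable > file 2>&1 form, because &> is a bash extension.

```shell
# Scratch directory and an invented input file.
cd "$(mktemp -d)" || exit 1
printf 'banana\napple\n' > fname

# foo -> cat, bar -> sort: the "-" tells sort to read STDIN.
# STDOUT is saved in myout; STDERR (empty here) in myerr.
cat fname | sort - > myout 2> myerr

# >> appends to myout instead of overwriting it.
echo cherry >> myout

# 2> captures only the error message from a failing command.
ls no-such-file 2> myerr

# Send STDOUT and STDERR to the same file. This form works in any
# POSIX shell; in bash, "&> both" is a shorthand for the same thing.
ls fname no-such-file > both 2>&1
```

After this runs, myout contains apple, banana, and cherry. myerr holds only the error from ls, and both holds the listing of fname together with the error line.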
  
What is a text utility?
  
The GNU text utilities are a collection of tools for processing and manipulating text files and streams. Over the evolution of UNIX-like operating systems, these tools proved to be the most useful and were distilled into this collection. Most of them were part of early UNIX implementations, although many have gained additional options over time.
  
The utilities collected in the archive textutils-2.1 comprise 27 tools. However, the GNU project maintainers recently decided to package these tools as part of the larger collection coreutils-5.0, and presumably later versions will be packaged the same way. On systems derived from BSD rather than from the GNU tools, the same utilities may be packaged somewhat differently, but most of them are still provided.
  
This tutorial focuses on the 27 utilities traditionally included in textutils. It occasionally mentions and uses related tools that are generally available on UNIX-like systems. However, I will skip the utility ptx (permuted index), because its purpose is too narrow and it is difficult to understand.
  
grep (global regular expression print)
  
This tool is not part of textutils, but it deserves mention here. The grep utility is one of the most widely used UNIX utilities, and the text utilities are very often piped into or out of it.
  
In one sense, what grep does is very simple; in another sense, it is quite complex and hard to understand. Basically, grep identifies the lines in a file that match a regular expression. It has switches that modify its output in various ways. For example, it can print context lines around each match, number the matching lines, or identify only the files that contain matches rather than the individual lines. In essence, grep is just a filter on the lines of a file, but a very powerful one.
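The switches mentioned above correspond to standard grep options. -n numbers the matching lines, -l lists only the names of files that contain matches, and -c counts matching lines. -C prints context lines around each match; it is an extension supported by GNU grep rather than a POSIX option. Here is a short sketch on two invented files:

```shell
# Scratch directory with two invented log files.
cd "$(mktemp -d)" || exit 1
printf 'one\ntwo\nerror here\nthree\n' > app.log
printf 'all quiet\n' > other.log

grep -n error app.log            # prints: 3:error here
grep -C 1 error app.log          # prints the match plus one line on each side
grep -l error app.log other.log  # prints only: app.log
grep -c error app.log other.log  # prints: app.log:1 and other.log:0
```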
  
The complex part of grep is the regular expression you specify to describe the matching condition. That topic is covered elsewhere (see the developerWorks tutorial "Using regular expressions" in the references). Many other utilities also support regular expressions, but grep is the most general tool for the job. As a result, it is often easier to put grep in a pipeline than to rely on the weaker filters that other tools provide. Here is a simple grep example:
  
Source code:
----------------------------------------------
$ grep -c '[Ss]ystem$' * 2>/dev/null | grep ':[^0]$'
INSTALL:1
aclocal.m4:2
config.log:1
configure:1
----------------------------------------------
  
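This example lists the files that contain lines ending in "system" (with either a lowercase or a capital S). For each file, it shows the number of such lines, provided that number is not zero. (Strictly speaking, the example does not handle counts greater than 9, because the second grep requires exactly one non-zero digit after the colon.)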
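The limitation noted above is easy to reproduce, and one way around it is to filter out only the zero counts. The sketch below builds three invented files. One of them contains 12 matching lines, which the original filter silently drops. The alternative `grep -v ':0$'` keeps every non-zero count. This alternative is my own variation, not part of the original example.

```shell
# Scratch directory with invented sample files.
cd "$(mktemp -d)" || exit 1
printf 'Operating system\nfile system\n' > a.txt     # 2 matching lines
printf 'nothing here\n' > b.txt                      # 0 matching lines
for i in $(seq 12); do echo "System"; done > c.txt   # 12 matching lines

# Original filter: requires a single non-zero digit after the colon,
# so c.txt (count 12) is dropped.
grep -c '[Ss]ystem$' * 2>/dev/null | grep ':[^0]$'

# Sturdier filter: discard only the lines whose count is exactly 0.
grep -c '[Ss]ystem$' * 2>/dev/null | grep -v ':0$'
```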
  
Shell script
  
The text utilities are designed to produce output in a variety of useful formats, often adjusted with command-line switches. Even so, there are cases where explicit branching and looping help. Shells such as bash let you combine utilities with flow control to perform more complex tasks. Shell scripts are particularly useful for encapsulating compound tasks that you will run many times, especially tasks that involve some parameterization.
  
  
Explaining bash scripting is clearly outside the scope of this tutorial; see the references for the "Bash by example" series on developerWorks. Once you understand the utilities, it is quite easy to combine them into saved shell scripts. For demonstration purposes, here is a simple example of flow control with bash:
  
Source code:
----------------------------------------------------
[~/Bacchus/articles/scratch/tu]$ cat flow
#!/bin/bash
for fname in `ls $1`; do
    if (grep $2 $fname > /dev/null); then
        echo "Creating: $fname.new";
        tr "abc" "ABC" < $fname > $fname.new
    fi
done
[~/Bacchus/articles/scratch/tu]$ ./flow '*' bash
Creating: flow.new
Creating: test1.new
[~/Bacchus/articles/scratch/tu]$ cat flow.new
#!/Bin/BAsh
for fnAme in `ls $1`; do
    if (grep $2 $fnAme > /dev/null); then
        eCho "CreAting: $fnAme.new";
        tr "ABC" "ABC" < $fnAme > $fnAme.new
    fi
done
----------------------------------------------------
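The flow script above parses the output of ls and leaves its variables unquoted, so it breaks on file names that contain spaces. Below is a sketch of a more defensive version of the same logic, written as a shell function. flow2 is a hypothetical name. The function lets the shell expand the file list itself, quotes every expansion, and uses grep -q to test for a match. Because the shell expands the glob, the argument order is swapped: the pattern comes first, followed by the files.

```shell
# Usage: flow2 PATTERN FILE...
# For each regular FILE that contains PATTERN, write a copy with the
# letters a, b, c translated to A, B, C into FILE.new.
flow2() {
    pattern=$1
    shift
    for fname in "$@"; do
        [ -f "$fname" ] || continue          # skip directories and missing files
        if grep -q -- "$pattern" "$fname"; then
            echo "Creating: $fname.new"
            tr 'abc' 'ABC' < "$fname" > "$fname.new"
        fi
    done
}
```

Invoked as `flow2 bash *`, it processes the same files that `./flow '*' bash` does, and it also copes with names such as "my script".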
Stream-oriented filtering
cat and tac
  
The simplest text utility is
