wc, split, and special characters in the Linux shell

Source: Internet
Author: User
Tags linux shell commands



[Time: 2018-07] [Status: Open]
[Key words: Linux, wc, split, wildcard, escape character, Linux command]


0 Introduction


The purpose of this article is not to teach anything new but to reinforce my memory, so that next time I can use these commands directly without searching again.


This article mainly covers commands in the Linux shell; if you do not work on a *nix system, feel free to skip it.


It consists of three parts: wc, split, and special characters in the shell.


1 wc: character counting tool


The wc command outputs the number of lines, words, characters, and bytes, as well as the maximum line width, for a given file or list of files. The counts are always printed in the order lines, words, characters, bytes. The options are as follows:


Short option   Long option          Description
-c             --bytes              Output the byte count, consistent with the file size shown by the ll command
-m             --chars              Output the character count; a character is a displayed character, and in a multibyte encoding a single character can occupy several bytes
-w             --words              Output the word count; words are delimited by whitespace, so the result may differ from the number of words in the traditional sense
-l             --lines              Output the number of lines (newline characters)
-L             --max-line-length    Output the maximum line width (display width in GNU wc)
1.1 Application Examples


The default output of the wc command is as follows:


~$ wc text
0 2 text ----------- in order: number of lines, number of words, and number of bytes


The character count is output as follows:


~$ wc -m text
9 text


My test file, text, contains the following content:


File options, 213 -------- note the space in the middle


The file contains only one space and no line break, so the line count is 0 and the word count is 2, as shown above.
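
To make the counting rules concrete, here is a minimal sketch that builds a small test file and asks wc for each count separately. The file name and content are hypothetical (plain ASCII rather than the original test text), and the character count from -m depends on the current locale, so treat this as an illustration rather than a reproduction of the output above.

```shell
# Sketch only: the sample content below is hypothetical, not the author's file.
tmpdir=$(mktemp -d)
sample="$tmpdir/text"

# One space in the middle and no trailing newline, like the test file above.
printf 'abc 213' > "$sample"

# Reading from stdin makes wc print only the number, without the file name.
lines=$(wc -l < "$sample")   # 0: -l counts newline characters, and there are none
words=$(wc -w < "$sample")   # 2: the single space separates two words
bytes=$(wc -c < "$sample")   # 7: seven ASCII bytes
chars=$(wc -m < "$sample")   # 7: for pure ASCII, characters and bytes agree

echo "lines=$lines words=$words bytes=$bytes chars=$chars"
rm -rf "$tmpdir"
```

With multibyte text (for example UTF-8 encoded Chinese), the -c and -m results would differ, which is the distinction the options table describes.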


1.2 References
    • GNU wc help documentation
    • wc command
2 split: file splitting command


My problem is this: in log analysis, when a log file is very large (for example 10G), most editors either cannot open it or open it very slowly, and their support for searching large files is weak. Notepad++, for example, limits the file size to 500MB. To analyze logs quickly and efficiently, it is best to avoid directly opening files of around 1GB or more. So I looked for a tool that can split files.



In the end I found split. split supports splitting a file by number of lines, by byte size, or into a specified number of shards.
By default, split cuts the input file into shards of 1000 lines each (so use the default parameters with caution when the file is large, since this can produce a great many files).
The specific usage is as follows:


Short option   Long option                 Description
-l NUM         --lines=NUM                 Each shard contains NUM lines; lines end with ASCII LF by default
-b SIZE        --bytes=SIZE                Each shard is SIZE bytes long
-C SIZE        --line-bytes=SIZE           Split on line boundaries, but the total length of each shard does not exceed SIZE bytes
-n CHUNKS      --number=CHUNKS             Split into the given number of chunks; CHUNKS may take one of the forms
                                           N, K/N, l/N, l/K/N, r/N, r/K/N
-t SEP         --separator=SEP             Use SEP as the line separator instead of newline
-u             --unbuffered                Unbuffered mode, slightly slower
-a LENGTH      --suffix-length=LENGTH      Suffix length of the shard file names, default 2
-d             --numeric-suffixes[=FROM]   Use numeric instead of alphabetic suffixes; FROM sets the starting number, default 0
-x             --hex-suffixes[=FROM]       Use hexadecimal suffixes; FROM sets the starting number, default 0
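
As a rough illustration of the default behavior and of the -l, -d and -a options from the table, the sketch below splits a small generated file. It assumes GNU coreutils (split and seq); the file name input and the prefixes part_ and num_ are made up for the example.

```shell
# Sketch assuming GNU coreutils; file names and prefixes are hypothetical.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

seq 1 2500 > input            # 2500 lines, one number per line

# Default: 1000 lines per shard -> part_aa, part_ab, part_ac
split input part_

# 500 lines per shard, numeric suffixes of length 3 -> num_000 ... num_004
split -l 500 -d -a 3 input num_

default_shards=$(ls part_* | wc -l)    # 3
numeric_shards=$(ls num_* | wc -l)     # 5
last_default=$(wc -l < part_ac)        # 500 lines left over for the last shard

echo "default=$default_shards numeric=$numeric_shards last=$last_default"
ls
```
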
2.1 Example


Split by line, with each output file at most 1G:


split -C 1G input


Split by bytes, limiting each shard to 500MB (sdata is the prefix of the output file names):


split -b 500M input sdata


Split into a specified number of shards, here three files:


split -n 3 input


Split into a specified number of shards without breaking lines:


split -n l/3 input


Split into a specified number of shards, distributing lines round-robin:


split -n r/3 input


Split into a specified number of shards and output only the K-th shard (K=2) to stdout:


split -n 2/3 input
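
The difference between the CHUNKS forms is easier to see on a tiny file. The following sketch, assuming GNU split and hypothetical file names, compares l/3, r/3 and 2/3 on a six-line input.

```shell
# Sketch assuming GNU coreutils; file names and prefixes are hypothetical.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

seq 1 6 > input               # six lines, 12 bytes in total

# l/3: three shards, never cutting a line in half
split -n l/3 input line_
line_shards=$(ls line_* | wc -l)
line_total=$(cat line_* | wc -l)   # still 6 lines across the l/3 shards

# r/3: lines are dealt out in turn, so the first shard gets lines 1 and 4
split -n r/3 input rr_
first_rr=$(tr '\n' ' ' < rr_aa)

# 2/3: only the second of three byte chunks, written to stdout, no files created
second_chunk_bytes=$(split -n 2/3 input | wc -c)

echo "l/3 produced $line_shards shards holding $line_total lines"
echo "rr_aa holds: $first_rr"
echo "second chunk size: $second_chunk_bytes bytes"
```

Note that the plain N and K/N forms cut by byte position, so on real logs they can split a line in the middle; the l/ forms are usually the safer choice for line-oriented data.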

2.2 References
    • GNU split help documentation
    • Splitting large files with split and merging them with cat in Linux
2.3 Extension


Where there is a split command there must be a way to merge: split and cat are counterparts. If you are interested, see the Linux manual.
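
A minimal round trip, assuming GNU split and hypothetical file names: split a file by size, merge the shards with cat, and check with cmp that nothing was lost. This relies on the shell expanding piece_* in sorted order, which matches the order in which split names its output files.

```shell
# Sketch assuming GNU coreutils; file names and prefixes are hypothetical.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

seq 1 100000 > input          # sample data, a few hundred KB

split -b 100K input piece_    # shards of at most 100 KiB each
cat piece_* > merged          # the glob expands in sorted order: piece_aa, piece_ab, ...

# cmp -s is silent and reports the result through its exit status.
if cmp -s input merged; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```
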


3 Special characters in the shell


Only wildcards and escape characters are covered here; for the rest, see "Linux Shell wildcard, metacharacters, escape character usage examples".


3.1 Wildcard characters


Wildcards in the shell are somewhat similar to regular expressions, but simpler. When the shell encounters a wildcard in a command's arguments, it treats the argument as a path or file name pattern and searches the disk for matches: if matches exist, the pattern is replaced by them (pathname expansion); otherwise the wildcard is passed to the command as ordinary characters and handled by the command itself. In short, wildcards are a form of pathname expansion implemented by the shell.
Common wildcard characters are:


Character     Meaning                                            Example           Description
*             Matches 0 or more characters                       a*b               Any characters of any length between a and b, such as aabcb, axyzb, a012b, ab
?             Matches any single character                       a?b               Exactly one character between a and b, such as aab, abb, acb, a0b
[list]        Matches any single character in list               a[xyz]b           Exactly one character between a and b, and only x, y or z, such as axb, ayb, azb
[!list]       Matches any single character not in list           a[!0-9]b          Exactly one character between a and b, and not a digit, such as axb, aab, a-b
[c1-c2]       Matches any single character in the range c1-c2    a[0-9]b           Exactly one character between a and b, which must be in the range 0-9, such as a0b, a1b ... a9b
{s1,s2,...}   Matches one of the strings s1, s2, ...             a{abc,xyz,123}b   Only one of the three strings abc, xyz or 123 can appear between a and b
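
The table can be tried out in a scratch directory. The sketch below uses hypothetical file names and shows what each pattern expands to, including the case where nothing matches and the pattern is handed to the command unchanged. The brace form is run through bash explicitly, because brace expansion is a bash feature rather than part of POSIX sh.

```shell
# Sketch only: the file names are made up for the demonstration.
LC_ALL=C; export LC_ALL       # fixed sort order for the expanded names
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch ab aab a0b axb axyzb

star=$(echo a*b)              # a0b aab ab axb axyzb
qmark=$(echo a?b)             # a0b aab axb
list=$(echo a[xyz]b)          # axb
negated=$(echo a[!0-9]b)      # aab axb
range=$(echo a[0-9]b)         # a0b
nomatch=$(echo z*q)           # z*q: nothing matches, so the pattern is passed through unchanged

# Brace expansion generates the strings whether or not such files exist.
braces=$(bash -c 'echo a{abc,xyz,123}b')   # aabcb axyzb a123b

printf '%s\n' "$star" "$qmark" "$list" "$negated" "$range" "$nomatch" "$braces"
```
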
3.2 Escape Character


The escape characters are summarized below:


Character            Description
\ (backslash)        The most common escape character; removes the special meaning of the metacharacter or wildcard immediately following it.
'' (single quotes)   Also called hard escape; all shell metacharacters and wildcards inside lose their special meaning. Note that a ' (single quote) cannot appear inside hard escaping.
"" (double quotes)   Also called soft escape; only specific shell metacharacters keep their meaning inside: $ for parameter substitution and ` (backtick) for command substitution.
4 Summary


This article summarizes the use of the shell commands wc and split, along with a brief overview of wildcards and escape characters in the shell. It is meant only to reinforce memory and to serve as a later reference.


