[Time: 2018-07] [Status: Open]
[Key words: Linux, wc, split, wildcard, escape character, Linux command]
0 Introduction
This article is not a tutorial. It is a memory aid, so that next time I can use these commands directly without searching again.
It mainly covers commands in the Linux shell. If you do not work on a *nix system, feel free to skip it.
It has three parts: wc, split, and special characters in the shell.
1 wc: character counting tool
The wc command outputs the number of lines, words, characters, and bytes, plus the maximum line width, for a given file or list of files. The counts are always printed in that order (lines > words > chars > bytes > maximum line width). The options are as follows:
Short option | Long option | Description
-c | --bytes | Output the byte count, consistent with the file size shown by the ll command
-m | --chars | Output the character count; a character here is a displayed character, so in a multibyte encoding a single character can occupy several bytes
-w | --words | Output the word count; words are delimited by whitespace, so this is not necessarily the number of words in the traditional sense
-l | --lines | Output the number of lines
-L | --max-line-length | Output the maximum line width (display width)
1.1 Application Examples
The default output of the wc command is as follows:
~$ wc text
0 2 text    (line count, word count, and byte count, in that order)
The character count is output as follows:
~$ wc -m text
9 text
My test file, text, contains the following:
File options, 213    (note the space in the middle)
The file contains only one space and no line break, so wc reports 0 lines and 2 words, as shown above.
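The counts can also be read one at a time. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the file name sample.txt and its ASCII content are hypothetical and are not the test file used above.

```shell
# Hypothetical sample file: two words, one space, no trailing newline.
workdir=$(mktemp -d)
printf 'hello world' > "$workdir/sample.txt"

# Reading from stdin keeps the file name out of the output;
# $(( ... )) strips any padding that some wc implementations add.
lines=$(( $(wc -l < "$workdir/sample.txt") ))
words=$(( $(wc -w < "$workdir/sample.txt") ))
bytes=$(( $(wc -c < "$workdir/sample.txt") ))
echo "lines=$lines words=$words bytes=$bytes"

# With multibyte text, -m (characters) can be smaller than -c (bytes).
# The -m result depends on the current locale, so no output is shown here.
printf 'é' | wc -c
printf 'é' | wc -m

rm -rf "$workdir"
```

As with the test file above, the missing trailing newline is why the line count is 0 even though the file has content.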
1.2 References
- GNU wc help documentation
- wc command
2 split: file splitting command
My problem was this: in log analysis, when a daily log file is very large, say 10 GB, most editors either cannot open it or open it very slowly, and their support for searching large files is weak. Notepad++, for example, limits file size to 500 MB. To analyze logs quickly and efficiently, I try to avoid directly opening files of about 1 GB. So I looked for a tool that can split files.
I eventually found split. split supports splitting a file by number of lines, by byte size, or into a specified number of shards.
By default, split slices the input file into shards of 1000 lines each (so use the default parameters with caution when the file is large).
The specific usage is as follows:
Short option | Long option | Description
-l NUM | --lines=NUM | Each shard file contains NUM lines; the default line separator is ASCII LF
-b SIZE | --bytes=SIZE | Each shard file is SIZE bytes long
-C SIZE | --line-bytes=SIZE | Split on line boundaries, keeping each shard at most SIZE bytes long
-n CHUNKS | --number=CHUNKS | Split into the given number of chunks; CHUNKS may be N, K/N, l/N, l/K/N, r/N, or r/K/N
-t SEP | --separator=SEP | Specify the line separator
-u | --unbuffered | Unbuffered mode, slightly slower
-a LENGTH | --suffix-length=LENGTH | Suffix length of the shard file names; defaults to 2
-d | --numeric-suffixes[=FROM] | Use numeric suffixes instead of alphabetic ones; the long form can specify the starting number, which defaults to 0
-x | --hex-suffixes[=FROM] | Use hexadecimal suffixes
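To see how -l, -d, and -a combine, here is a small sketch, assuming GNU coreutils split and seq; the names input.log and part_ are hypothetical, chosen only for illustration.

```shell
workdir=$(mktemp -d)
seq 1 2500 > "$workdir/input.log"    # 2500 lines, one number per line

# 1000 lines per shard, numeric suffixes (-d), three suffix characters (-a 3).
split -l 1000 -d -a 3 "$workdir/input.log" "$workdir/part_"

ls "$workdir"                        # lists input.log and the shards
shards=$(( $(ls "$workdir"/part_* | wc -l) ))
first=$(( $(wc -l < "$workdir/part_000") ))
last=$(( $(wc -l < "$workdir/part_002") ))
echo "shards=$shards first=$first last=$last"

rm -rf "$workdir"
```

The last shard holds only the 500 leftover lines; split does not pad or rebalance shards.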
2.1 Example
Split by lines, with each shard at most 1 GB:
split -C 1G input
Split by bytes, limiting each shard to 500 MB (sdata is the prefix of the output file names):
split -b 500M input sdata
Split into a given number of shards, here three files:
split -n 3 input
Split into a given number of shards without breaking lines:
split -n l/3 input
Split into a given number of shards, distributing lines round-robin:
split -n r/3 input
Split into a given number of shards, but output only shard K (K=2) to stdout:
split -n 2/3 input
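The difference between the -n modes is easier to see on a tiny file. This is a sketch only, assuming GNU coreutils split (the -n option is not available in every split implementation); the prefixes l_ and r_ are hypothetical.

```shell
workdir=$(mktemp -d)
seq 1 10 > "$workdir/input"          # ten lines, 21 bytes in total

# l/3: three shards, and no line is cut in half.
split -n l/3 "$workdir/input" "$workdir/l_"
l_shards=$(( $(ls "$workdir"/l_* | wc -l) ))
l_lines=$(( $(cat "$workdir"/l_* | wc -l) ))

# r/3: lines are dealt out round-robin, so the first shard gets lines 1, 4, 7, 10.
split -n r/3 "$workdir/input" "$workdir/r_"
r_first=$(paste -s -d ' ' "$workdir/r_aa")

# 2/3: only the second of three byte-based chunks, written to stdout.
second_bytes=$(( $(split -n 2/3 "$workdir/input" | wc -c) ))

echo "l_shards=$l_shards l_lines=$l_lines r_first=$r_first second_bytes=$second_bytes"
rm -rf "$workdir"
```

Note that K/N cuts by bytes, so the chunk printed by 2/3 may start or end in the middle of a line; l/K/N is the line-preserving variant.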
2.2 References
- GNU split help documentation
- Splitting large files with split and merging them with cat in Linux
2.3 Extension
Where there is a split command there must be a merge command: split and cat are counterparts. If you are interested, see the Linux manual.
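A quick round trip shows the pairing. The sketch below assumes a POSIX shell with seq and mktemp; the names chunk_ and merged are hypothetical. It relies on the shell expanding chunk_* in sorted order, which matches the order in which split names its shards.

```shell
workdir=$(mktemp -d)
seq 1 5000 > "$workdir/input"

# Split into 4096-byte shards, then concatenate them back in name order.
split -b 4096 "$workdir/input" "$workdir/chunk_"
cat "$workdir"/chunk_* > "$workdir/merged"

# cmp -s is silent and only sets the exit status.
if cmp -s "$workdir/input" "$workdir/merged"; then
    roundtrip=ok
else
    roundtrip=different
fi
echo "roundtrip=$roundtrip"

rm -rf "$workdir"
```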
3 Special characters in the shell
Only wildcards and escape characters are covered here. For the rest, see "Linux Shell wildcard, metacharacters, escape character usage examples".
3.1 Wildcard characters
Shell wildcards are somewhat similar to regular expressions, but simpler. When the shell encounters a wildcard in an argument, it treats the argument as a path or file name pattern and searches the disk for matches. If matches exist, the shell substitutes them for the pattern (path expansion); otherwise the pattern is passed to the command as ordinary characters and the command handles it. In short, wildcards are a form of path expansion implemented by the shell.
Common wildcard characters are:
Character | Meaning | Example | Description
* | Match 0 or more characters | a*b | Any number of any characters between a and b, such as aabcb, axyzb, a012b, ab
? | Match any single character | a?b | Exactly one character between a and b, such as aab, abb, acb, a0b
[list] | Match any single character in list | a[xyz]b | Exactly one character between a and b, and it must be x, y, or z: axb, ayb, azb
[!list] | Match any single character not in list | a[!0-9]b | Exactly one character between a and b, and it must not be a digit, such as axb, aab, a-b
[c1-c2] | Match any single character in the range c1-c2 | a[0-9]b | Exactly one character between a and b, and it must be in the range 0-9, such as a0b, a1b ... a9b
{s1,s2,...} | Match one of the strings s1, s2 (or more) | a{abc,xyz,123}b | Between a and b there can only be one of the three strings abc, xyz, or 123
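The patterns in the table can be tried against a handful of empty files. This is a minimal sketch, assuming a POSIX shell with default globbing settings; the file names are hypothetical and are created in a temporary directory. Brace patterns are left out because not every shell supports them.

```shell
workdir=$(mktemp -d)
touch "$workdir/a0b" "$workdir/a1b" "$workdir/axb" "$workdir/ayb"

# Run each glob in a subshell so the current directory is left unchanged.
any_one=$(cd "$workdir" && echo a?b)          # one arbitrary character
in_list=$(cd "$workdir" && echo a[xyz]b)      # one of x, y, z
not_digit=$(cd "$workdir" && echo a[!0-9]b)   # anything but a digit
in_range=$(cd "$workdir" && echo a[0-9]b)     # one digit

# No file matches z*b, so the pattern reaches echo unchanged.
no_match=$(cd "$workdir" && echo z*b)

# Quoting disables expansion even though matching files exist.
quoted=$(cd "$workdir" && echo 'a?b')

printf '%s\n' "$any_one" "$in_list" "$not_digit" "$in_range" "$no_match" "$quoted"
rm -rf "$workdir"
```

The no_match case is the fallback described above: with no matching file, the command receives the pattern as ordinary characters.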
3.2 Escape Character
The escape characters are summarized below:
Character | Description
\ (backslash) | The most common escape character; removes the special meaning of the metacharacter or wildcard that immediately follows it.
'' (single quotes) | Also called hard escaping; all shell metacharacters and wildcards inside lose their special meaning. Note that a ' (single quote) cannot appear inside hard escaping.
"" (double quotes) | Also called soft escaping; only specific shell metacharacters keep their meaning inside: $ for parameter substitution and ` (backtick) for command substitution.
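The three escaping styles can be compared side by side. This is a small sketch, assuming a POSIX shell; the variable names are hypothetical.

```shell
name=world

single='hello $name'          # hard escape: $name stays literal
double="hello $name"          # soft escape: $name is substituted
escaped=hello\ \$name         # backslash escapes the space and the $
backtick="sum: `expr 1 + 2`"  # command substitution still works in double quotes
apostrophe='it'\''s'          # close the quotes, add an escaped ', reopen

printf '%s\n' "$single" "$double" "$escaped" "$backtick" "$apostrophe"
```

The last assignment is a common workaround for the rule that a single quote cannot appear inside single quotes: the string is written as two quoted pieces with a backslash-escaped quote between them.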
4 Summary
This article organizes the usage of the shell commands wc and split, and briefly covers wildcards and escape characters in the shell. It is meant only to reinforce memory and to serve as a later reference.
wc/split and special characters in Linux shell commands