This article covers the use of Perl's split function. split is one of the most useful functions in Perl: it breaks a string apart and returns the pieces as a list, which is usually stored in an array. It takes a regular expression (RE) as the separator and works on the $_ variable if no string is specified.
Perl split Function
The Perl split function can be used as follows:
$info = "Caine:Michael:Actor:14, LeafyDrive";
@personal = split(/:/, $info);
The result is: @personal = ("Caine", "Michael", "Actor", "14, LeafyDrive");
◆ If the information is already stored in the $_ variable, the string argument can be omitted:
@personal = split(/:/);
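As a minimal runnable sketch of the two forms above (the variable name @from_default is only illustrative), the following splits the same record once with an explicit string and once through $_:

```perl
use strict;
use warnings;

# Split an explicit string on single colons.
my $info     = "Caine:Michael:Actor:14, LeafyDrive";
my @personal = split(/:/, $info);
print scalar(@personal), " fields: ", join(" | ", @personal), "\n";

# With no string argument, split works on $_.
my @from_default;
for ($info) {                     # aliases $_ to $info inside the loop
    @from_default = split(/:/);
}
print "same result\n" if "@from_default" eq "@personal";
```

The comma inside "14, LeafyDrive" is left alone because only the colon is named as the separator.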
If the fields are separated by one or more colons, use a regular expression with a quantifier to split them:
$_ = "Capes:Geoff::Shotputter::BigAvenue";
@personal = split(/:+/);
The result is: @personal = ("Capes", "Geoff", "Shotputter", "BigAvenue");
But the following code:
$_ = "Capes:Geoff::Shotputter::BigAvenue";
@personal = split(/:/);
gives: @personal = ("Capes", "Geoff", "", "Shotputter", "", "BigAvenue");
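The difference between /:+/ and /:/ is easiest to see side by side. This is a small sketch, assuming a record in which some fields are separated by doubled colons; the names $record, @runs and @single are illustrative:

```perl
use strict;
use warnings;

my $record = "Capes:Geoff::Shotputter::BigAvenue";

# /:+/ treats a run of colons as one separator.
my @runs = split(/:+/, $record);

# /:/ treats every colon as a separator, so doubled colons yield empty fields.
my @single = split(/:/, $record);

# Brackets make the empty fields visible.
print "with /:+/: ", join("", map { "[$_]" } @runs),   "\n";
print "with /:/ : ", join("", map { "[$_]" } @single), "\n";
```

Which pattern is right depends on the data: if an empty field is meaningful (a missing value, for instance), /:/ preserves it, while /:+/ silently drops it.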
◆ With split, words can be divided into characters, sentences into words, and paragraphs into sentences:
@chars = split(//, $word);
@words = split(/ /, $sentence);
@sentences = split(/\./, $paragraph);
In the first statement, the empty pattern matches between every pair of characters, so @chars is an array of single characters.
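A short sketch of the three splits, using made-up sample strings, shows what each one returns:

```perl
use strict;
use warnings;

my $word      = "split";
my $sentence  = "sentences divide into words";
my $paragraph = "First sentence. Second sentence. Third one.";

my @chars     = split(//, $word);           # one element per character
my @words     = split(/ /, $sentence);      # split on single spaces
my @sentences = split(/\./, $paragraph);    # the dot is escaped: a bare . matches any character

print join(",", @chars), "\n";
print join(",", @words), "\n";
print join("|", @sentences), "\n";
```

Splitting on /\./ alone leaves the space that followed each full stop at the start of the next sentence; a pattern such as /\.\s*/ would remove it.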
The part between the two slashes (//) is the regular expression (the separation rule) that split uses.
\s is a character class that matches a single whitespace character.
+ means one or more repetitions.
Therefore, \s+ matches one or more whitespace characters.
split(/\s+/, $line) splits the string $line on whitespace.
For example, given $line = "Hello friend, welcome to my website jb51.net";
split(/\s+/, $line) returns:
("Hello", "friend,", "welcome", "to", "my", "website", "jb51.net")
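The following sketch runs the whitespace split on a line with irregular spacing (extra spaces and a tab were added to the sample for illustration). It also shows one detail worth knowing: with /\s+/, leading whitespace produces an empty first field, whereas the special pattern ' ' (a single space in quotes) skips leading whitespace.

```perl
use strict;
use warnings;

my $line   = "Hello friend,   welcome to\tmy website jb51.net";
my @tokens = split(/\s+/, $line);
print scalar(@tokens), " tokens: ", join("|", @tokens), "\n";

# Leading whitespace: /\s+/ yields an empty first field, ' ' does not.
my $padded     = "  leading space";
my @with_regex = split(/\s+/, $padded);
my @with_space = split(' ',   $padded);
print scalar(@with_regex), " vs ", scalar(@with_space), "\n";
```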
General usage: @somearray = split(/:+/, $string); # the parentheses are optional. If $string is not specified, split operates on the default variable $_. The delimiter goes between the two slashes, and because it can be any regular expression, split is very flexible.
The Perl manual documents a less common form: split /PATTERN/, EXPR, LIMIT. The key is the LIMIT parameter, which can save a lot of time. If LIMIT is specified and positive, the string is split into no more than LIMIT fields. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.
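The LIMIT rules can be checked with a short sketch. The sample string, with three trailing colons, is made up to expose the handling of trailing empty fields:

```perl
use strict;
use warnings;

my $str = "a:b:c:::";

my @default  = split(/:/, $str);        # no LIMIT: trailing empty fields are stripped
my @two      = split(/:/, $str, 2);     # positive LIMIT: at most 2 fields
my @negative = split(/:/, $str, -1);    # negative LIMIT: trailing empty fields are kept
my @nothing  = split(/:/, "", -1);      # empty string: always the empty list

print "default:  ", scalar(@default),  " fields\n";
print "limit 2:  ", join("", map { "[$_]" } @two), "\n";
print "limit -1: ", scalar(@negative), " fields\n";
print "empty:    ", scalar(@nothing),  " fields\n";
```

With a positive LIMIT the last field holds the unsplit remainder of the string, which is why the second element above still contains colons.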
When a very long line would otherwise produce tens of thousands of fields, specifying a LIMIT lets split return only the first few columns, which reduces both memory use and run time. For example, in typical genotype data the first column is usually the material name, and a decision has to be made based on that name alone. To get just the first field, use a LIMIT of 2 (with a LIMIT of 1 the whole line comes back as a single field):
my ($firstfield) = split /\t/, $someline, 2;
If you need values from the first few columns, this approach is very efficient for large files:
my (undef, $var1, undef, $var2) = split /\t/, $someline, 6;
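Here is a small sketch of pulling leading columns out of a tab-separated line. The sample row (a material name followed by ten made-up values) is an assumption for illustration only. It also shows that a LIMIT of 1 performs no split at all, so reading the first field needs a LIMIT of at least 2:

```perl
use strict;
use warnings;

# Hypothetical row: a material name followed by ten tab-separated values.
my $someline = join("\t", "sample01", map { "g$_" } 1 .. 10);

# LIMIT 2: the first field, then the unsplit remainder (discarded here).
my ($firstfield) = split(/\t/, $someline, 2);

# LIMIT 1: the string comes back whole as a single field.
my ($whole) = split(/\t/, $someline, 1);

# Second and fourth columns; undef skips the columns that are not needed.
my (undef, $var1, undef, $var2) = split(/\t/, $someline, 6);

print "first: $firstfield\n";
print "second and fourth: $var1, $var2\n";
print "LIMIT 1 returned the whole line\n" if $whole eq $someline;
```

When split is assigned to a list of variables and no LIMIT is given, Perl supplies a LIMIT one larger than the number of variables on its own, so the explicit LIMIT mainly matters when the result goes into an array or a list slice.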
Some users have benchmarked this approach and found it faster. Their results are quoted here:
The test file has 18 fields per line, separated by \t. The field at index 6 is extracted, and several ways of doing so are compared.
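1. my @array = split("\t", $_); my $var = $array[6]; average time on the test file: 8.2 s
2. my ($var) = (split("\t", $_))[6]; average time: 5.1 s
3. my (undef, $var) = split("\t", $_); average time: 3.53 s
4. my (undef, $var) = split("\t", $_, 7); average time: 3.52 s
5. my $var = (split("\t", $_, 7))[6]; average time: 3.53 s
Note that with a LIMIT of 7 the element at index 6 also contains the rest of the line; a LIMIT of 8 returns that field alone. As written, methods 3 and 4 assign the second field to $var; reaching index 6 this way takes six undef placeholders.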
The last three methods are clearly the fastest. If you need more than one field, adapt them accordingly. If the two fields you need are far apart, the tester suggests that methods 3 and 4 should be a good choice, while method 5 would need an intermediate array.
Test it yourself.
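One way to run such a test is with the core Benchmark module. The sketch below is not the original tester's script: the 18-field row, the method names and the iteration count are assumptions, and it uses a LIMIT of 8 so that index 6 holds only that field. It first checks that every method returns the same value, then compares their speed:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test row: 18 tab-separated fields named f0 .. f17.
my $row = join("\t", map { "f$_" } 0 .. 17);

my %methods = (
    # Split everything into an array, then index it.
    full_array => sub { my @a = split("\t", $row); $a[6] },
    # Index the list returned by split directly.
    list_slice => sub { my $v = (split("\t", $row))[6]; $v },
    # Skip six fields with undef; Perl adds an implicit LIMIT here.
    undef_list => sub {
        my (undef, undef, undef, undef, undef, undef, $v) = split("\t", $row);
        $v;
    },
    # Explicit LIMIT of 8: seven real fields plus the unsplit remainder.
    with_limit => sub { my $v = (split("\t", $row, 8))[6]; $v },
);

# Sanity check: every method must return the field at index 6.
for my $name (sort keys %methods) {
    my $got = $methods{$name}->();
    die "$name returned '$got'\n" unless $got eq "f6";
}

# Run each method 200,000 times and print a comparison table.
cmpthese(200_000, \%methods);
```

Absolute timings depend on the Perl version, the hardware and the data, so the numbers from your own run are the ones to rely on.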