Split function of Perl, Python and awk

Source: Internet
Author: User

The languages I use most are Perl, Python, awk and R. I have also learned Java, C, C++ and Vala, but I don't like them; what can you do?

It seems I am destined to live in scripting languages.

Perl

@rray = split /PATTERN/, STRING, LIMIT

As you can see, split takes two required parts, the PATTERN and the STRING, plus an optional LIMIT. Whatever you split, the essentials are always the same: the string to split, the definition of the separator, and a place to save the result. Everything else is optional.
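For comparison, a minimal Python sketch of the optional limit, using the sample line from this article. Python's re.split calls it maxsplit, and it counts splits rather than fields: maxsplit=2 returns at most 3 fields, which corresponds roughly to a LIMIT of 3 in Perl.

```python
import re

line = "yahoo 17:56 Ray---boring"

# Split on runs of whitespace, colons or hyphens, like the Perl pattern /[\s:\-]+/.
all_fields = re.split(r"[\s:\-]+", line)

# maxsplit=2 performs at most two splits, so at most three fields come back;
# the rest of the string stays, unsplit, in the last field.
# Roughly the Perl equivalent of: split /[\s:\-]+/, $line, 3
first_three = re.split(r"[\s:\-]+", line, maxsplit=2)

print(all_fields)   # ['yahoo', '17', '56', 'Ray', 'boring']
print(first_three)  # ['yahoo', '17', '56 Ray---boring']
```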

Let's give a simple example:

> cat test.txt (in the original, yellow marked a <tab> and green marked one or more spaces)

[We want to extract the numbers and the words]

   yahoo 17:56 Ray---boring

> perl -e '$str = "yahoo\t 17:56 Ray---boring"; @num_word = split /[\s:\-]+/, $str; print "@num_word\n"'

yahoo 17 56 Ray boring

A more complicated example:

[We want to extract the part highlighted in light blue in the original, i.e. tgh_leaf_1_rrna_removal_20140624_attcct_l008]

../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz

../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R2_001.fastq.gz

> perl -e '$str = "../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz"; $name = join "_", @{[split /_/, [split /\//, $str]->[2]]}[0..7]; print "$name\n"'

tgh_leaf_1_rrna_removal_20140624_attcct_l008

Let's analyze:

split /\//, $str splits the string on "/" into 3 segments: "..", "sample_tgh_leaf_1_rrna_removal_20140624" and "tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz"

[split /\//, $str] turns the result of split into an anonymous array and returns a reference to it

[split /\//, $str]->[2] uses that reference to get the 3rd element (index 2) of the anonymous array

[split /_/, [split /\//, $str]->[2]] splits that element on "_" and again turns the result into an anonymous array of 10 elements: "tgh", "leaf", "1", "rrna", "removal", "20140624", "attcct", "l008", "r1", "001.fastq.gz"

An array slice cannot be taken through the arrow on an anonymous array reference ([array]->[0..7] does not work), so you need to dereference it with the @{} array block first: @{[array]}[0..7]

@{[split /_/, [split /\//, $str]->[2]]}[0..7] yields a slice of 8 elements, still a list: "tgh", "leaf", "1", "rrna", "removal", "20140624", "attcct", "l008"

join "_", @rray concatenates these elements with "_" into one string: tgh_leaf_1_rrna_removal_20140624_attcct_l008
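The same chain of steps can be written one step per line. This is a minimal Python sketch for readers who find the nested Perl expression hard to follow; the variable names are illustrative only.

```python
path = ("../sample_tgh_leaf_1_rrna_removal_20140624/"
        "tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz")

# Step 1: split on "/" into 3 segments (like split /\//, $str)
segments = path.split("/")

# Step 2: take the 3rd segment, index 2 (like [...]->[2])
file_name = segments[2]

# Step 3: split the file name on "_" into 10 parts (like split /_/, ...)
parts = file_name.split("_")

# Step 4: keep the first 8 parts. Perl's slice [0..7] includes index 7,
# while a Python slice excludes its end index, so this is [0:8].
kept = parts[0:8]

# Step 5: glue the parts back together with "_" (like join "_", ...)
name = "_".join(kept)
print(name)  # tgh_leaf_1_rrna_removal_20140624_attcct_l008
```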

[Note that the delimiter is a PATTERN, i.e. a regular expression, not a plain string]
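Why the distinction matters: in a regular expression, characters such as "." are metacharacters, so split /./ in Perl does not split on dots. Python keeps the two cases apart, which makes the difference easy to see: str.split takes a literal string and re.split takes a regular expression. A small sketch on a made-up file name:

```python
import re

name = "001.fastq.gz"

# str.split takes a plain string, so "." is matched literally.
literal = name.split(".")

# re.split takes a regular expression, where "." matches ANY character:
# every character becomes a separator and only empty strings remain.
unescaped = re.split(".", name)

# Escaping the dot makes the regular expression match a literal ".".
escaped = re.split(r"\.", name)

print(literal)         # ['001', 'fastq', 'gz']
print(len(unescaped))  # 13
print(escaped)         # ['001', 'fastq', 'gz']
```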

[In fact, you can call split with no arguments at all. The default pattern is then a single space ' ', which splits on runs of whitespace and skips leading whitespace, and the default string is $_ (a Perl convention). This is very convenient inside functions and loops, where $_ is the default variable.]

For example:

A BC DEF ghij

KLMN OPQ RS T

U VWX YZ

> perl -e '@rray = ("A BC DEF ghij", "KLMN OPQ RS T", "U VWX YZ"); @all = (); for (@rray) {@tmp = split; push @all, @tmp} print "@all\n"'

A BC DEF ghij KLMN OPQ RS T U VWX YZ
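For comparison, a minimal Python sketch of the same loop. Python has no $_, so the loop variable and the string to split are written out explicitly; str.split() with no argument plays the role of Perl's default split.

```python
rows = ["A BC DEF ghij", "KLMN OPQ RS T", "U VWX YZ"]

# Split each row on whitespace and collect every field in one flat list,
# like pushing the result of each split onto @all in the Perl loop.
all_fields = []
for row in rows:
    all_fields.extend(row.split())

# Joining with a space mimics how Perl interpolates "@all" into a string.
print(" ".join(all_fields))  # A BC DEF ghij KLMN OPQ RS T U VWX YZ
```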

Python

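Python's str.split is very user-friendly but not very powerful; however, Python also provides the more powerful re.split.

Since everything in Python is an object, split is a method of the string (str) class and is called on the string itself, as in stri.split(...).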

A simple example:

yahoo 17:56 Ray---boring

Ehhh, the split method of the str class cannot do the segmentation above in one call (although, with some thought, it can be done in several steps; see Attachment 1 below), so we have to use re.split:

> python3 -c 'import re; stri = "yahoo 17:56 Ray---boring"; print(re.split(r"[\s:\-]+", stri))'

['yahoo', '17', '56', 'Ray', 'boring']

A more complicated example:

[Again, we want to extract the same part of the file name as in the Perl example]

../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz

../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R2_001.fastq.gz

> python3 -c 'stri = "../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz"; print("_".join(stri.split("/")[2].split("_")[0:8]))'

tgh_leaf_1_rrna_removal_20140624_attcct_l008

This is a simple Python implementation. Note that a Python slice excludes its end index, so [0:8] takes the same 8 elements as Perl's [0..7].

[split can also be called with nothing inside the parentheses; by default it splits on runs of whitespace (spaces, tabs or newlines) and ignores leading and trailing whitespace]
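A small sketch of the difference between calling split() with no argument and passing a single space; the sample string is made up for illustration. The no-argument form behaves much like Perl's default split, while split(" ") treats the space as a literal one-character separator.

```python
line = "  A BC\tDEF   ghij  "

# No argument: split on runs of any whitespace (spaces, tabs, newlines)
# and ignore leading and trailing whitespace.
default_fields = line.split()

# Explicit " ": split on every single space character. Tabs are not
# separators, and consecutive or leading/trailing spaces give empty strings.
space_fields = line.split(" ")

print(default_fields)  # ['A', 'BC', 'DEF', 'ghij']
print(space_fields)    # ['', '', 'A', 'BC\tDEF', '', '', 'ghij', '', '']
```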

Awk

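split(STRING, ARRAY, SEP) is a built-in awk function, and a very old one. It splits STRING on SEP, stores the pieces in ARRAY starting from index 1, and returns the number of pieces.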

[Again, we want to extract the same part of the file name as in the Perl example]

../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz

../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r2_001.fastq.gz

> echo ../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz | awk '{split($1, aa, "/"); split(aa[3], bb, "_")} END{for (i = 1; i <= 8; i++) {if (i < 8) {printf("%s_", bb[i])} else {print bb[i]}}}'

tgh_leaf_1_rrna_removal_20140624_attcct_l008

Or

> echo ../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz | awk -F '/' '{split($3, bb, "_")} END{for (i = 1; i <= 8; i++) {if (i < 8) {printf("%s_", bb[i])} else {print bb[i]}}}'
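To make awk's conventions explicit (1-based arrays, a piece count as the return value), here is a Python sketch built around a hypothetical helper, awk_split, that imitates split(STRING, ARRAY, SEP). It is not a real library function, and it simplifies by treating SEP as a literal string, which gives the same result as awk for the separators "/" and "_" used here.

```python
def awk_split(string, sep):
    """Hypothetical helper imitating awk's split(STRING, ARRAY, SEP).

    Returns (n, array): array is a dict whose keys start at 1, like an awk
    array filled by split, and n is the number of pieces, like the value
    split returns. Simplification: sep is treated as a literal string.
    """
    pieces = string.split(sep)
    array = {i: piece for i, piece in enumerate(pieces, start=1)}
    return len(pieces), array


path = ("../sample_tgh_leaf_1_rrna_removal_20140624/"
        "tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz")

# Like split($1, aa, "/") followed by split(aa[3], bb, "_")
n_aa, aa = awk_split(path, "/")
n_bb, bb = awk_split(aa[3], "_")

# Like the END loop: print bb[1] .. bb[8] joined by "_"
result = "_".join(bb[i] for i in range(1, 9))
print(result)  # tgh_leaf_1_rrna_removal_20140624_attcct_l008
```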

Attachment:

1. > python3 -c 'stri = "yahoo 17:56 Ray---boring"; orig = stri.split(); part2 = orig[1].split(":"); part3 = orig[2].split("---"); print(orig[0:1] + part2 + part3)'

['yahoo', '17', '56', 'Ray', 'boring']
