The languages I use most are Perl, Python, awk and R. I have also learned Java, C, C++ and Vala, but I don't like them; what can I say?
It seems I am destined for the life of a scripter.
Perl
@rray = split/pattern/, STRING, LIMIT
As you can see, split has two main parts (the pattern and the string) plus an optional LIMIT. Whatever the details, split needs the string to be split, a pattern that defines where to split, and somewhere to save the result; everything else is optional.
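To see what the optional LIMIT changes, here is a minimal Python sketch that approximates Perl's split /PATTERN/, STRING, LIMIT on top of re.split. The helper name perl_like_split is hypothetical, and it only models the LIMIT rules (positive, omitted or zero, negative), not every Perl special case such as split ' '.

```python
import re


def perl_like_split(pattern, string, limit=0):
    """Hypothetical helper: approximate Perl's split /pattern/, string, limit."""
    if limit == 1:
        # LIMIT 1 means one field: the string comes back unsplit.
        return [string]
    if limit > 1:
        # Perl's LIMIT counts fields; Python's maxsplit counts splits.
        return re.split(pattern, string, maxsplit=limit - 1)
    fields = re.split(pattern, string)
    if limit == 0:
        # With LIMIT omitted (or 0), Perl drops trailing empty fields.
        while fields and fields[-1] == "":
            fields.pop()
    # A negative LIMIT keeps every field, trailing empty ones included.
    return fields


print(perl_like_split(r",", "a,b,c,,"))      # ['a', 'b', 'c']
print(perl_like_split(r",", "a,b,c,,", 2))   # ['a', 'b,c,,']
print(perl_like_split(r",", "a,b,c,,", -1))  # ['a', 'b', 'c', '', '']
```

The off-by-one conversion is the main thing to remember when moving between the two languages: Perl's LIMIT n corresponds to Python's maxsplit n - 1.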
Here is a simple example:
> cat test.txt (in the original post, for alignment, yellow marked a <tab> and green marked one or more spaces)
[We want to extract the numbers and the words]
yahoo 17:56 Ray---boring
> perl -e '$str = "yahoo\t 17:56 Ray---boring"; @num_word = split /[\s:\-]+/, $str; print "@num_word\n"'
yahoo 17 56 Ray boring
A more complicated example:
[We want to extract the part highlighted in light blue in the original post: the sample prefix before _R1/_R2]
../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz
../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R2_001.fastq.gz
> perl -e '$str = "../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz"; $name = join "_", @{[split /_/, [split /\//, $str]->[2]]}[0..7]; print "$name\n"'
tgh_leaf_1_rrna_removal_20140624_attcct_l008
Let's analyze:
split /\//, $str divides the string into 3 segments: "..", "sample_tgh_leaf_1_rrna_removal_20140624" and "tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz"
[split /\//, $str] turns the result of split into an anonymous array (more precisely, a reference to one)
[split /\//, $str]->[2] dereferences it and takes the 3rd element (index 2), i.e. the file name
[split /_/, [split /\//, $str]->[2]] splits the file name on "_" and again wraps the result in an anonymous array (10 elements): "tgh", "leaf", "1", "rrna", "removal", "20140624", "attcct", "l008", "R1", "001.fastq.gz"
Because the arrow syntax cannot slice an array reference ([array]->[0..7] does not work), you need to dereference the whole array with the @{} array sigil
@{[split /_/, [split /\//, $str]->[2]]}[0..7] is a slice of 8 elements, still an array: "tgh", "leaf", "1", "rrna", "removal", "20140624", "attcct", "l008"
join "_", @rray concatenates these elements with "_" into a single string: tgh_leaf_1_rrna_removal_20140624_attcct_l008
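The same nested split can be unrolled into named steps, which makes the order of operations easier to follow. Below is a Python sketch that mirrors the Perl steps above; the function name sample_prefix and its n_fields parameter are illustrative, not from the original. One detail to watch: Perl's slice [0..7] includes index 7, while Python's [0:8] stops before index 8, so both take 8 fields.

```python
def sample_prefix(path, n_fields=8):
    """Illustrative helper: rebuild the sample prefix from a FASTQ path."""
    segments = path.split("/")    # "..", "sample_...", "tgh_..._001.fastq.gz"
    filename = segments[2]        # like [split /\//, $str]->[2]
    fields = filename.split("_")  # 10 fields: "tgh", "leaf", ..., "001.fastq.gz"
    kept = fields[0:n_fields]     # like @{[...]}[0..7]
    return "_".join(kept)         # like join "_", ...


path = ("../sample_tgh_leaf_1_rrna_removal_20140624/"
        "tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz")
print(sample_prefix(path))  # tgh_leaf_1_rrna_removal_20140624_attcct_l008
```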
[Note that the delimiter is a pattern, i.e. a regular expression, not a literal string]
[In fact, you can call split with no arguments at all. The default pattern is a single space ' ', which splits on runs of whitespace and ignores leading whitespace, and the default string is $_, Perl's default variable. This makes a bare split very convenient inside loops and in functions such as map and grep, where $_ is set for you.]
For example:
A BC DEF ghij
KLMN OPQ RS T
U VWX YZ
> perl -e '@rray = ("A BC DEF ghij", "KLMN OPQ RS T", "U VWX YZ"); @all = (); for (@rray) {@tmp = split; push @all, @tmp}; print "@all\n"'
A BC DEF ghij KLMN OPQ RS T U VWX YZ
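For comparison, the loop above translates to Python almost line for line, because str.split() with no arguments behaves like Perl's bare split: it splits on runs of whitespace and ignores leading whitespace. The main difference is that Python has no implicit $_, so the loop variable has to be named. A minimal sketch:

```python
lines = ["A BC DEF ghij", "KLMN OPQ RS T", "U VWX YZ"]

all_words = []
for line in lines:
    # line.split() plays the role of Perl's bare split on $_
    all_words.extend(line.split())

print(" ".join(all_words))  # A BC DEF ghij KLMN OPQ RS T U VWX YZ
```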
Python
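Python's built-in split is very user-friendly but not very powerful; for harder cases Python provides another, more powerful tool, re.split.
Since everything in Python is an object, split is a method of the str class (called on the string itself, e.g. stri.split()), while re.split is a function in the re module.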
A simple example:
yahoo 17:56 Ray---boring
Ehh, str.split cannot do the segmentation above in one call (although it can with a few extra steps if you think about it; see Attachment 1 below), so we use re.split instead:
> python3 -c 'import re; stri = "yahoo 17:56 Ray---boring"; print(re.split(r"[\s:\-]+", stri))'
['yahoo', '17', '56', 'Ray', 'boring']
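Because the first argument of re.split is a regular expression, the same caution as in Perl applies: if the pattern contains a capturing group, the matched separators are returned as list elements too. Making the group non-capturing with (?:...), or dropping the parentheses, avoids that. A short sketch:

```python
import re

stri = "yahoo 17:56 Ray---boring"

# Capturing group: the separators themselves appear in the result.
with_seps = re.split(r"([\s:\-]+)", stri)
# Non-capturing group: only the fields are returned.
fields_only = re.split(r"(?:[\s:\-]+)", stri)

print(with_seps)    # ['yahoo', ' ', '17', ':', '56', ' ', 'Ray', '---', 'boring']
print(fields_only)  # ['yahoo', '17', '56', 'Ray', 'boring']
```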
Another, more complicated example:
[We want to extract the part highlighted in light blue in the original post: the sample prefix before _R1/_R2]
../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz
../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R2_001.fastq.gz
> python3 -c 'stri = "../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_R1_001.fastq.gz"; print("_".join(stri.split("/")[2].split("_")[0:8]))'
tgh_leaf_1_rrna_removal_20140624_attcct_l008
This is a simple Python implementation. Note that a Python slice excludes its end index, so [0:8] selects the same 8 fields as Perl's [0..7].
[split can also be called with no arguments, just an empty pair of parentheses; by default it splits on runs of whitespace and drops leading and trailing whitespace]
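The empty-parentheses form is not the same as splitting on a single space. str.split() collapses runs of whitespace (spaces, tabs, newlines) and drops leading and trailing whitespace, while str.split(" ") treats every single space as a separator and keeps the empty strings in between. A quick comparison:

```python
text = "  yahoo\t 17:56  Ray "

print(text.split())     # ['yahoo', '17:56', 'Ray']
print(text.split(" "))  # ['', '', 'yahoo\t', '17:56', '', 'Ray', '']
```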
Awk
split(STRING, ARRAY, SEP) is a built-in function, and a very old one.
[We want to extract the part highlighted in light blue in the original post: the sample prefix before _r1/_r2]
../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz
../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r2_001.fastq.gz
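To make the awk behaviour concrete, here is a rough Python model of split(STRING, ARRAY, SEP). The name awk_split is hypothetical. It follows the documented awk rules: the array is indexed from 1, the return value is the element count, a SEP of a single space splits on runs of blanks, any other single character is used literally, and a longer SEP is treated as a regular expression. It is only an approximation; for instance, Python's notion of whitespace is broader than awk's blanks.

```python
import re


def awk_split(string, sep=" "):
    """Hypothetical model of awk's split(STRING, ARRAY, SEP).

    Returns (count, array); array is a dict indexed from 1, like an awk array.
    """
    if string == "":
        fields = []                     # splitting an empty string gives 0 elements
    elif sep == " ":
        fields = string.split()         # default: runs of blanks, leading/trailing ignored
    elif len(sep) == 1:
        fields = string.split(sep)      # any other single character is taken literally
    else:
        fields = re.split(sep, string)  # a longer SEP is a regular expression
    array = {i: field for i, field in enumerate(fields, start=1)}
    return len(fields), array


path = ("../sample_tgh_leaf_1_rrna_removal_20140624/"
        "tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz")
n, aa = awk_split(path, "/")   # like split($1, aa, "/")
m, bb = awk_split(aa[3], "_")  # like split(aa[3], bb, "_")
print(n, m, bb[8])             # 3 10 l008
```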
> echo ../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz | awk '{split($1, aa, "/"); split(aa[3], bb, "_")} END {for (i = 1; i <= 8; i++) {if (i < 8) {printf("%s_", bb[i])} else {print bb[i]}}}'
Or
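> echo ../sample_tgh_leaf_1_rrna_removal_20140624/tgh_leaf_1_rrna_removal_20140624_attcct_l008_r1_001.fastq.gz | awk -F '/' '{split($3, bb, "_")} END {for (i = 1; i <= 8; i++) {if (i < 8) {printf("%s_", bb[i])} else {print bb[i]}}}'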
Attachment:
1. > python3 -c 'stri = "yahoo 17:56 Ray---boring"; orig = stri.split(); part2 = orig[1].split(":"); part3 = orig[2].split("---"); print(orig[0:1] + part2 + part3)'
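Attachment 1 can be generalized: when the separators are a few fixed characters, plain str methods are enough if you first map every separator character to a space and then call split() with no arguments. This is a sketch of that str-only alternative to re.split using str.maketrans and str.translate; it only works for single-character separators (here ":" and "-").

```python
stri = "yahoo 17:56 Ray---boring"

# Map ':' and '-' to spaces, then let split() collapse the runs.
to_space = str.maketrans(":-", "  ")
print(stri.translate(to_space).split())  # ['yahoo', '17', '56', 'Ray', 'boring']
```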
The split function in Perl, Python and awk