Tags: perl, regex, p
Today I was studying a Perl script and found several of its regular expressions puzzling:
$text =~ s/([?!]) +([\‘\"\(\[\?\?\p{IsPi}]*[\p{IsUpper}])/$1\n$2/g;

#multi-dots followed by sentence starters
$text =~ s/(\.[\.]+) +([\‘\"\(\[\?\?\p{IsPi}]*[\p{IsUpper}])/$1\n$2/g;

# add breaks for sentences that end with some sort of punctuation inside a quote or parenthetical and are followed by a possible sentence starter punctuation and upper case
$text =~ s/([?!\.][\ ]*[\‘\"\)\]\p{IsPf}]+) +([\‘\"\(\[\?\?\p{IsPi}]*[\ ]*[\p{IsUpper}])/$1\n$2/g;

# add breaks for sentences that end with some sort of punctuation are followed by a sentence starter punctuation and upper case
$text =~ s/([?!\.]) +([\‘\"\(\[\?\?\p{IsPi}]+[\ ]*[\p{IsUpper}])/$1\n$2/g;
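Here the characters after \p name a Unicode property. In Perl, every Unicode code point carries a set of properties (its general category, its case, its script, and so on), and \p{...} matches any single character that has the named property. In the script above, \p{IsUpper} matches an uppercase letter, \p{IsPi} matches initial (opening) quotation punctuation such as “ or «, and \p{IsPf} matches final (closing) quotation punctuation such as ” or ».
Introductions to Unicode properties: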
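To make this concrete, here is a minimal sketch that checks a few characters against the three properties used in the script. The helper names classify and split_after_question_or_bang are hypothetical, made up for this illustration, and the substitution is a simplified variant of the first rule (only ASCII openers plus \p{IsPi} in the character class), not the script's exact code.

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');   # we print non-ASCII characters

# Hypothetical helper: list which of the three properties a character has.
sub classify {
    my ($ch) = @_;
    my @props;
    push @props, 'IsUpper' if $ch =~ /\p{IsUpper}/;  # uppercase letter
    push @props, 'IsPi'    if $ch =~ /\p{IsPi}/;     # initial (opening) quote
    push @props, 'IsPf'    if $ch =~ /\p{IsPf}/;     # final (closing) quote
    return join(',', @props) || 'none';
}

# A, E-acute, a, left guillemet, right guillemet, left and right curly quotes
for my $ch ('A', "\x{C9}", 'a', "\x{AB}", "\x{BB}", "\x{201C}", "\x{201D}") {
    printf "U+%04X %s: %s\n", ord($ch), $ch, classify($ch);
}

# Hypothetical helper: simplified form of the first rule. Break the line
# after ? or ! when the next token starts with optional opening
# punctuation followed by an uppercase letter.
sub split_after_question_or_bang {
    my ($text) = @_;
    $text =~ s/([?!]) +([\'\"\(\[\p{IsPi}]*[\p{IsUpper}])/$1\n$2/g;
    return $text;
}

print split_after_question_or_bang("Really? \x{201C}Yes!\x{201D} She left."), "\n";
```

In this sketch the break lands after "Really?" because the following token begins with an opening curly quote (IsPi) and then an uppercase letter. The "!" is followed directly by a closing quote (IsPf) rather than a space, so this particular rule leaves it alone; that case is what the script's third rule handles.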
http://shouce.jb51.net/perl/PatternMatching.html
http://blog.csdn.net/wushuai1346/article/details/7206749
http://perldoc.perl.org/perluniprops.html
The \p property in Perl regular expressions