Regex for Beginners, Part 2 (Using Regex in Perl)

Last updated: 2014-12-15 Source: Internet

Uploaded by: User


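/**
 * Using regular expressions in Perl:
 *
 *     # A more complete Celsius/Fahrenheit converter
 *     print "Enter a temperature to convert, e.g. 20C or 30F\n";
 *     $input = <STDIN>;   # read one line of input
 *     chomp($input);      # strip the trailing newline
 *     # $1 is the whole number, $2 the optional fraction, $3 the unit.
 *     # m/^([-+]?[0-9]+)(\.[0-9]*)?([CFcf])$/ accepts the same input as
 *     # m/^([-+]?[0-9]+(\.[0-9]*)?)([CFcf])$/, but in the first form $1 holds only the integer part.
 *     if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?) *([CFcf])$/) {
 *         $inputNum = $1;
 *         $type     = $3;
 *         if ($type eq "C" or $type eq "c") {
 *             $sheshi = $inputNum;                  # the input is Celsius
 *             $huashi = ($sheshi * 9 / 5) + 32;     # convert to Fahrenheit
 *         } else {
 *             $huashi = $inputNum;                  # the input is Fahrenheit
 *             $sheshi = ($huashi - 32) * 5 / 9;     # convert to Celsius
 *         }
 *         printf "%.2f C is %.2f F\n", $sheshi, $huashi;
 *     } else {
 *         print "Invalid input \"$input\"\n";
 *     }
 *
 * In Perl, (?:   ) groups without capturing.
 * With it, the pattern in the program above becomes  m/^([-+]?[0-9]+(?:\.[0-9]*)?) *([CFcf])$/
 * and the unit is then found in $2 instead of $3.
 *
 * [ \t]* versus ( *|\t*): \t matches a tab character only.
 * (Inside a character class, | and * are literal characters, so the second form needs parentheses.)
 *   [ \t]*    matches any mix of spaces and tabs, e.g.   \t\t  \t\t  \t \t\t\t
 *   ( *|\t*)  matches only a run of tabs (\t\t\t\t) or only a run of spaces
 *
 * \b  matches a word boundary.
 * \s  matches any whitespace character: space, tab, carriage return, newline.
 * m//i  : i is a modifier of m// that makes the match case-insensitive.
 *       : g means global matching.
 *       : x allows a free-spacing layout of the expression.
 * Search and replace:
 *   $var =~ s/regex/replacement/
 *   $var =~ s/\bSun\b/Sun/i   replaces the word "sun", in any capitalization, with "Sun"
 *
 * Form-letter generator:
 * Keep two or three decimal places of a given number:
 * keep three when the third digit after the decimal point is not 0, otherwise keep two.
 */
// $price =~ s/^([0-9]+\.[0-9][0-9][1-9]?)[0-9]*/$1/
// $price =~ s/(\.\d\d[1-9]?)\d*/$1/
/**
 * perl -p -i".bak" -e "s/read/ready/g" file    (edits file in place and keeps a .bak backup)
 * perl -w mkreply king.in > king.out
 * The <> operator:
 *     while ($line = <>) {
 *         if ($line =~ m/^\s*$/) {
 *             last;   # stop processing inside the while loop and leave it
 *         }
 *         # ...process header information...
 *         if ($line =~ m/^Subject:(.*)/i) {
 *             $subject = $1;
 *         }
 *     }
 *     ...process the rest of the message...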
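 *
 * Lookaround: tests a condition at a position without consuming any characters.
 * Positive lookahead:  (?=   )   examines the text to the right of the current position
 * Positive lookbehind: (?<=   )  examines the text to the left of the current position
 * Example: in "longlong xiao", (?=xiao) matches the position immediately before "xiao"
 * (just after the space); no character is consumed.
 * Combining lookaround with ordinary regex pins down positions more precisely.
 * Example: longlongxiao and longxiao
 * (?=longlongxiao)long matches in the first string and fails on the second.
 * The expression matches "long", but (?=longlongxiao) requires that the text starting
 * at that position be "longlongxiao".
 *
 * A few examples, changing names to name's:
 * 1. s/names/name's/g
 * 2. s/\bnames\b/name's/g
 * 3. s/\b(name)(s)\b/$1'$2/g
 * 4. s/\bname(?=s\b)/name'/g
 * 5. s/(?<=\bname)(?=s\b)/'/g   finds the position right after "name" and right before "s"
 *
 * A concrete example: insert commas into a long number, one every three digits.
 * Idea: working from the right, insert a comma every three digits, but never in front of the number.
 * Three digits: \d\d\d; a multiple of three digits: (\d\d\d)+; end of the number: \b;
 * so the position is (?=(\d\d\d)+\b)
 * To avoid a comma in front of the number (as in ,123,456), require a digit on the left: (?<=\d)
 * Together: (?<=\d)(?=(\d\d\d)+\b)
 * Question: do (?<=\d)(?=(\d\d\d)+\b) and (?=(\d\d\d)+\b)(?<=\d) match differently? Answer: no.
 * For efficiency the capturing parentheses can be made non-capturing: (?<=\d)(?=(?:\d\d\d)+\b)
 * In Perl: s/(?<=\d)(?=(?:\d\d\d)+\b)/,/g
 * Applied to 987654321 the result is 987,654,321
 * If the regex is changed to s/(?<=\d)(?=(?:\d\d\d)+)/,/g the result is 9,8,7,6,5,4,321
 * Analysis: the regex finds a position after a digit and inserts a comma whenever three digits
 * (or a multiple of three) follow; without \b nothing forces those digits to run exactly to the
 * end of the number.
 *
 * The other two kinds are negative lookaround (the two above are positive lookaround):
 * Negative lookahead:  (?!   )    (the subexpression must not match the text to the right)
 * Negative lookbehind: (?<!   )   (the subexpression must not match the text to the left)
 * Start-of-word and end-of-word boundaries:
 * Start of word: a word character on the right, none on the left: (?<!\w)(?=\w)
 * End of word: a word character on the left, none on the right: (?<=\w)(?!\w)
 * So \b (word boundary) is equivalent to (?<!\w)(?=\w)|(?<=\w)(?!\w)
 */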
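The lookaround notes above can be tried out directly. The sketch below uses Java rather than Perl, since `java.util.regex` supports the same four lookaround constructs. The class name `LookaroundSketch` and its method names are made up for illustration, and the sample strings are assumptions rather than part of the original notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: names and sample inputs are assumptions.
final class LookaroundSketch {

    // Insert a comma at every position with a digit on its left and a whole
    // number of three-digit groups, ending at a word boundary, on its right.
    static String addCommas(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(?:\\d\\d\\d)+\\b)", ",");
    }

    // The same pattern without the trailing \b, to show why \b matters.
    static String addCommasNoBoundary(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(?:\\d\\d\\d)+)", ",");
    }

    // Example 5 from the notes: a pure position match between "name" and "s".
    static String possessive(String text) {
        return text.replaceAll("(?<=\\bname)(?=s\\b)", "'");
    }

    // Start offsets of every (possibly zero-width) match of regex in text.
    static List<Integer> positions(String regex, String text) {
        List<Integer> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(addCommas("987654321"));           // 987,654,321
        System.out.println(addCommasNoBoundary("987654321")); // 9,8,7,6,5,4,321
        System.out.println(possessive("both names match"));   // both name's match

        // For plain ASCII words, \b and the lookaround spelling find the same positions.
        String sample = "the quick fox";
        System.out.println(positions("\\b", sample));
        System.out.println(positions("(?<!\\w)(?=\\w)|(?<=\\w)(?!\\w)", sample));
    }
}
```

Because every match here is zero-width, `replaceAll` only inserts text and never removes any, which is what makes lookaround a good fit for "insert something at a position" tasks.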

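Going back to the temperature pattern and the price-trimming substitution from the first block of notes, here is a minimal Java sketch of both. It assumes the non-capturing form of the temperature pattern, so the unit is group 2, and it uses `[1-9]?` for the optional third decimal digit so that a third digit of 0 is dropped, as the stated rule requires. The class name `PatternNotesSketch`, the method names, and the output wording are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: names and message wording are assumptions.
final class PatternNotesSketch {

    // Group 1 is the whole number (the fraction is grouped with (?: ) and not
    // captured separately); group 2 is the unit letter.
    private static final Pattern TEMPERATURE =
            Pattern.compile("^([-+]?[0-9]+(?:\\.[0-9]*)?) *([CFcf])$");

    static String convert(String input) {
        Matcher m = TEMPERATURE.matcher(input);
        if (!m.find()) {
            return "Invalid input \"" + input + "\"";
        }
        double value = Double.parseDouble(m.group(1));
        boolean isCelsius = m.group(2).equalsIgnoreCase("C");
        double celsius = isCelsius ? value : (value - 32) * 5 / 9;
        double fahrenheit = isCelsius ? value * 9 / 5 + 32 : value;
        return String.format(Locale.ROOT, "%.2f C is %.2f F", celsius, fahrenheit);
    }

    // Keep two decimals, plus a third one only when it is not 0;
    // any further digits are discarded (truncated, not rounded).
    static String trimPrice(String price) {
        return price.replaceFirst("(\\.\\d\\d[1-9]?)\\d*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(convert("20C"));          // 20.00 C is 68.00 F
        System.out.println(convert("212 f"));        // 100.00 C is 212.00 F
        System.out.println(convert("warm"));         // Invalid input "warm"
        System.out.println(trimPrice("12.3750000")); // 12.375
        System.out.println(trimPrice("37.500"));     // 37.50
    }
}
```

Making the fraction group non-capturing keeps the group numbers stable: the unit stays in group 2 even if the optional fraction is absent.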
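# Solving the doubled-word problem
$/ = ".\n";    # special "chunk" mode: a chunk of text ends with a period followed by a newline
while (<>) {   # read one chunk of text into the default variable $_
    # Mark each doubled word (separated by whitespace and/or HTML tags) with escape
    # characters, keeping the separator; skip chunks that contain no doubled word.
    next unless s{\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)}
                 {\e$1\e$2\e$3\e}igx;
    s/^(?:[^\e]*\n)+//mg;   # remove lines that contain no mark
    s/^/$ARGV:/mg;          # prefix every line with the file name
    print;
}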

Code analysis:

Java code:

package RegularExpression;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.regex.Pattern;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Doubled-word detection
public class WordReduceRepeat {
    private static Logger logger = LoggerFactory.getLogger("RegularExpression");
    private static String fileName = "src/RegularExpression/sun.txt";

    public static void main(String[] args) {
        // A word, then whitespace and/or HTML tags, then the same word again
        Pattern regex1 = Pattern.compile("\\b([a-z]+)((?:\\s|<[^>]+>)+)(\\1\\b)", Pattern.CASE_INSENSITIVE);
        String replace = "\033$1\033$2\033$3\033";
        // Runs of lines that contain no escape-character mark
        Pattern regex2 = Pattern.compile("^(?:[^\\e]*\\n)+", Pattern.MULTILINE);
        // Every non-empty line, so that it can be prefixed with the file name
        Pattern regex3 = Pattern.compile("^([^\\n]+)", Pattern.MULTILINE);
        try {
            BufferedReader be = new BufferedReader(new FileReader(fileName));
            String text;
            try {
                while ((text = getPara(be)) != null) {
                    text = regex1.matcher(text).replaceAll(replace);
                    text = regex2.matcher(text).replaceAll("");
                    text = regex3.matcher(text).replaceAll(fileName + ":$1");
                    System.out.println(text);
                    writeToFile(fileName, text);
                }
            } catch (IOException e) {
                logger.error("File read/write error!", e);
            } finally {
                try {
                    be.close();
                } catch (IOException e) {
                    logger.error("Failed to close the file!", e);
                }
            }
        } catch (FileNotFoundException e) {
            logger.error("File not found", e);
        }
    } // end main()

    // Read one paragraph (up to the next blank line) from the file and return it as a string
    public static String getPara(BufferedReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while (((line = in.readLine()) != null) && ((sb.length() == 0) || (line.length() != 0))) {
            sb.append(line).append("\n");
        }
        return sb.length() == 0 ? null : sb.toString();
    } // end getPara()

    // Store the modified content in a separate file
    public static void writeToFile(String fileName, String content) {
        String[] path = fileName.split("/");
        String newFileName = path[0] + "/" + path[1] + "/" + "new_" + path[2];
        // Open in append mode; otherwise each paragraph would overwrite the previous one
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(newFileName, true))) {
            bw.write(content);
        } catch (IOException e) {
            logger.error("File write error", e);
        }
    } // end writeToFile()
}
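The class above needs SLF4J and a file on disk before it will run. As a rough, self-contained sketch of the same three regex steps, the following applies them to an in-memory string instead. The class name `DoubledWordSketch`, the method `report`, the label `sample.txt`, and the sample paragraph are all made up for illustration; the tag sub-pattern is written as `<[^>]+>`, which is an assumption about what the listing intends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: names and sample text are assumptions.
final class DoubledWordSketch {

    // Step 1: a word, a separator of whitespace and/or tags, then the same word again.
    private static final Pattern DOUBLED = Pattern.compile(
            "\\b([a-z]+)((?:\\s|<[^>]+>)+)(\\1\\b)", Pattern.CASE_INSENSITIVE);

    // Step 2: runs of whole lines that contain no escape-character mark.
    private static final Pattern UNMARKED_LINES =
            Pattern.compile("^(?:[^\\e]*\\n)+", Pattern.MULTILINE);

    // Step 3: every non-empty line, so that a label can be put in front of it.
    private static final Pattern NON_EMPTY_LINE =
            Pattern.compile("^([^\\n]+)", Pattern.MULTILINE);

    // Returns only the lines of the paragraph that contain a doubled word, with
    // each doubled word wrapped in ESC characters and each line prefixed by label.
    static String report(String label, String paragraph) {
        String text = DOUBLED.matcher(paragraph).replaceAll("\033$1\033$2\033$3\033");
        text = UNMARKED_LINES.matcher(text).replaceAll("");
        // quoteReplacement guards against '$' or '\' appearing in the label.
        String prefix = Matcher.quoteReplacement(label + ":");
        return NON_EMPTY_LINE.matcher(text).replaceAll(prefix + "$1");
    }

    public static void main(String[] args) {
        String paragraph = "This line is fine.\n"
                + "Here is is a doubled word.\n"
                + "Another clean line.\n";
        // Make the invisible ESC marks visible when printing.
        System.out.print(report("sample.txt", paragraph).replace('\033', '|'));
    }
}
```

With the sample paragraph, only the middle line survives step 2, because it is the only line that received escape-character marks in step 1.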
