Reposted from: http://blog.csdn.net/csdn_yaobo/article/details/48377757
I originally meant to bookmark the article, but clicking the bookmark button did nothing, so I am reposting it for now and will delete this post once bookmarking works.
After learning regular expressions I found them very powerful, and to understand them better I picked a practical problem to share what I learned. The problem comes from a recruitment test and goes roughly like this: whenever users buy something online they leave a review for the merchant, but many people slip small ads into these reviews. To find such reviews, assume the following scenario:
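Small ads of this kind often contain words such as "网店地址" (online shop address), "销售" (sales), and "代购" (purchasing agent). Suppose a user's review reads:

    这家酒店性价比高,提供海外代%……&购*&&6服……&**务”,网店地址:¥……**&*6“

(The hotel is good value and offers an overseas purchasing-agent service, followed by a shop address, all interleaved with junk characters.) Requirement: match the keywords, and print each keyword together with the review.

The problem is clear enough, so where do we start? The usual idea is to scan and match keyword by keyword, but that is tedious, and reviews can be full of special characters. My approach: the keywords we match are all Chinese characters, so we can first delete every special symbol (including letters, spaces, and digits) from the review so that only Chinese characters remain, and then match the keywords against what is left, which is much simpler. The following program shows how to remove the special symbols:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sample sentence with letters, digits, and symbols mixed in between the Chinese characters.
    String string1 = "我爱编!@#程 www#她不5454 dadad &*$()###(爱编dadada程w!";
    // Character class listing everything to delete: ASCII and full-width punctuation, letters, digits, spaces.
    String regEx = "[`~!@#$%^&*()+=|{}:;\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“’。,、?a-zA-Z0-9 ]";
    Pattern c = Pattern.compile(regEx);
    Matcher mc = c.matcher(string1);
    // Replace every listed character with the empty string.
    String result = mc.replaceAll("").trim();
    System.out.println(result);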
To explain the program above: I start from the sample sentence in string1, in which letters, digits, and symbols are scattered between the Chinese characters, and the goal is to find the keyword "编程" (programming) in it and print it. regEx is the regular expression whose purpose is to filter out all the special characters. My list may be missing some characters, so adjust it to your own needs. After filtering, only the Chinese characters of the sentence remain.
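As an alternative to listing every unwanted character, the filter can be phrased the other way round: delete everything that is not a Chinese character. The sketch below assumes Java 7 or later, where \p{IsHan} matches characters of the Han script. The class name HanFilter and the method keepHanOnly are made up for illustration, and the review is the one from the problem statement.

```java
import java.util.regex.Pattern;

public class HanFilter {
    // Matches any single character that is NOT in the Han script.
    private static final Pattern NON_HAN = Pattern.compile("[^\\p{IsHan}]");

    /** Returns the input with every non-Han character removed. */
    public static String keepHanOnly(String text) {
        return NON_HAN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String review = "这家酒店性价比高,提供海外代%……&购*&&6服……&**务”,网店地址:¥……**&*6“";
        // prints 这家酒店性价比高提供海外代购服务网店地址
        System.out.println(keepHanOnly(review));
    }
}
```

With this form there is no list of symbols to keep up to date, at the cost of also dropping characters one might want to keep in other applications, such as digits in prices.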
With the special symbols removed, the crucial step is matching the keyword. Here is another simple program:
    // Match 编 immediately followed by 程 in the filtered sentence.
    Pattern p = Pattern.compile("[编][程]");
    Matcher m = p.matcher(result);
    while (m.find()) {
        System.out.println(m.group());
    }

The "[编][程]" above is the keyword pattern. It cannot be written as "[编程]": that form matches each character of the keyword separately against the sentence instead of matching the word as a whole, which is very different from the result we want. This follows from how bracket expressions work in regular expressions: characters inside one pair of brackets are alternatives (OR), while two pairs of brackets in a row are two conditions that must both hold, one after the other (AND). Compare [0-9a-z] with [0-9][a-z]. Whenever there is a match, the matched keyword is printed. The sentence contains 编程 twice, so both occurrences are matched and printed.
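The difference between the two bracket forms can be checked directly, and the same matching loop can then be applied to the ad keywords from the problem statement. This is only a sketch: the class name KeywordMatchDemo, the helper findAll, and the test sentence are invented for illustration, and the review is assumed to have had its non-Chinese characters removed already. The keywords are passed through Pattern.quote, which treats them as plain literals; for text like this a literal keyword behaves the same as one bracket pair per character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordMatchDemo {
    /** Returns every match of regex in text, in order of appearance. */
    public static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "我爱编程我爱编程"; // hypothetical filtered sentence

        // One bracket pair: a single character that is 编 OR 程.
        System.out.println(findAll("[编程]", text));   // [编, 程, 编, 程]
        // Two bracket pairs in a row: 编 followed by 程.
        System.out.println(findAll("[编][程]", text)); // [编程, 编程]

        // Review from the problem statement, already reduced to Chinese characters.
        String review = "这家酒店性价比高提供海外代购服务网店地址";
        String[] keywords = {"网店地址", "销售", "代购"};
        for (String keyword : keywords) {
            // Print the keyword and the review whenever the keyword occurs.
            if (!findAll(Pattern.quote(keyword), review).isEmpty()) {
                System.out.println(keyword + " -> " + review);
            }
        }
    }
}
```

For this review the loop reports 网店地址 and 代购 but not 销售, since only the first two occur in the filtered text.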
Java regular expressions: finding Chinese in strings