Several application examples of Java regular expressions (matching URLs, matching US security code, matching date)

Source: Internet
Author: User



A recent project required me to extract strings from English text for topic clustering, so I spent a day learning Java regular expressions. The small examples below are practice exercises of mine. If anything in them is unreasonable, I welcome your advice!



1. This example filters the URLs out of English text and prints the filtered strings



First, here is the English text to be filtered. I saved it in a file named englishtxt.txt, with the following content:



  


www.baidu.com
Bank run: A blind spot that could spark the next financial crisis http://mp.weixin.qq.com/s?__biz=mjm5mdy4mzg2ma==&mid=200223248&idx=1&sn= A5b668754a60a8e07f335bd59521fb03#rd?...
Beijing CBD right now pic.twitter.com/Zcnp4cfrrk
I See more and more Chinese ask the same question Online:whatifMost #MH370 passengers were Americans; How would the US government react?
10:27:01 Chinese Net friend expectations http://chinafree.greatzhonghua.org/showthread.php?tid=5377?... Chinese Net friend expectations-...
01:47:01 times silly and fantastic notions, Gu Xiaojun thought Yiu Glorious http://chinafree.greatzhonghua.org/showthread.php?tid=4969?... T ...
[Strong air problems are more serious than AIDS] China smog in Center of <<air pollution deaths cited>> by the WHO http://bloom.bg/1rqnrbp?/via @BloombergNews
[Android Gaudenzai] LIHK is reborn, will you spend hk$10 to buy? Https//Play.google.com/store/apps/details?id=com.lihk.hkgolden.app.reborn?...
#Taiwan(China) Protests:water Cannons is an indiscriminate tool forDispersing protesters &can result in serious injury
NASA's new space suit ... http://jscfeatures.jsc.nasa.gov/z2/?
Photos:marijuana through the years http://Ow.ly/uxzuq? (AP Photo/dea) pic.twitter.com/4lsp4nllmq
Protest in Taiwan (China)
Baby born on board diverted Cathay flight dies http://www.scmp.com/news/hong-kong/article/1456417/baby-born-board-diverted-cathay-flight-dies?.../via @SCMP_News
What is does Apple think about the lack of diversity in emojis? We have their response. http//on.mtv.com/owu6d7?/via @MTVact
Linkin Park releases customizable music video powered by Xbox ' s Project Spark http://www.theverge.com/2014/3/25/5546982/l Inkin-park-releases-customizable-music-video-powered-by-xboxs?...
Full Draw for@afcasiancup are here pic.twitter.com/nryjo1mm9g #AC2015 - Interesting draw RT @afcasiancup: Group b:saudi Arabia, China PR, DPR Korea, Uzbekistan #AC2015 - Finally: @emirates is activating their Twitter account.
Interior Minister Prince Mohammed bin Naif launchesNewMinistry site aboard what appears like aPrivateJet-spa pic.twitter.com/ndsgjvbxts





As the file shows, the text contains many URLs. Clustering it by topic directly would introduce a lot of noisy data, so the URLs need to be removed first. My code is as follows:





    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class UrlMatcher {
        public static void main(String[] args) throws IOException {
            // try-with-resources closes the reader even if an exception is thrown
            try (BufferedReader br = new BufferedReader(new FileReader(new File("D://englishtxt.txt")))) {
                System.out.println("Start reading data from text");
                // A token is a run of word characters (letters, digits, underscore)
                Pattern ptn = Pattern.compile("\\w+");
                String line = br.readLine();
                while (line != null) {
                    // Step 1: remove URLs (optional scheme, dotted host, optional port, rest of the path).
                    // Step 2: remove special punctuation. Part of this character class was lost when
                    // the page was scraped; it is rebuilt here from the characters that survived.
                    String value = line
                            .replaceAll("(http://|https://|ftp://)?(\\w+\\.)+\\w+(:\\d*)?([^#\\s]*)", "")
                            .replaceAll("[\\/?:;@#$%^&*+()\"<>.-]", "");
                    // Collect the remaining words and join them with single spaces
                    StringBuilder strb = new StringBuilder();
                    Matcher mch = ptn.matcher(value);
                    while (mch.find()) {
                        strb.append(mch.group());
                        strb.append(" ");
                    }
                    System.out.println(strb.toString());
                    line = br.readLine();
                }
            }
        }
    }


The code above not only filters out a large number of URLs but also removes some special punctuation.



Running it produces the following output:




start reading data from Text Rd Beijing CBD Right now I see more and more Chinese ask the same question online whatifMost MH passengers were Americans how would the US government react Chinese net friend expectations Chinese net Friend Expectations times silly and fantastic notions Gu Xiaojun thought Yiu Glorious T China smog at Center of Air pollution Deaths cited by the WHO via Bloombergnews Android lihk HK I Taiwan(China) protests water cannons is an indiscriminate tool fordispersing protesters can result in serious injury NASA PHOTOS marijuana through the years APS Photodea protest in Taiwan via Flickr F Baby born on board diverted Cathay flight dies via SCMP News What does Apple think about the lack of diversity in emojis We had their response via mtvact Linkin Park releases customizable music video powered by Xbox s Project Spark Full Draw forAfcasiancup is here ac interesting draw RT Afcasiancup Group B Saudi Arabia China PR DPR Korea Uzbekistan AC Finally Emirates is activating their Twitter account Interior Minister Prince Mohammed bin Naif launchesNewMinistry site aboard what appears like aPrivateJet SPA


As the output above shows, the URLs have basically been filtered out.
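The URL-removal step and the word-extraction loop can also be folded into one reusable method. The following is only a minimal sketch, not the code that produced the output above: the class name UrlFilterSketch and the method name clean are hypothetical, and it skips the separate punctuation step (it simply keeps runs of word characters), so a string like "Photo/dea" would come out as two words instead of one. The URL pattern itself is the one from the example. Compiling both patterns once as constants avoids recompiling them for every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlFilterSketch {
    // Same URL pattern as in the example: optional scheme, dotted host,
    // optional port, then everything up to the next '#' or whitespace.
    private static final Pattern URL =
            Pattern.compile("(http://|https://|ftp://)?(\\w+\\.)+\\w+(:\\d*)?([^#\\s]*)");
    // A word is a run of letters, digits or underscores.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical helper: removes URLs, then joins the remaining words with single spaces.
    static String clean(String line) {
        String withoutUrls = URL.matcher(line).replaceAll(" ");
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(withoutUrls);
        while (m.find()) {
            words.add(m.group());
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // prints "Beijing CBD right now"
        System.out.println(clean("Beijing CBD right now pic.twitter.com/Zcnp4cfrrk"));
        // prints "China smog report via BloombergNews"
        System.out.println(clean("China smog report http://bloom.bg/1rqnrbp via @BloombergNews"));
        // The pattern treats anything shaped like "word.word" as a URL,
        // so a decimal number is removed as well: prints "pi is about today"
        System.out.println(clean("pi is about 3.14 today"));
    }
}
```

The last call shows why the filtering is only "basically" right: the host part of the pattern cannot tell a domain name from a decimal number or an abbreviation with dots, so some non-URL text is removed too.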






2. The following small example matches the U.S. security code (the Social Security number format, such as 999-99-9999)



The code is as follows:



    String safeNum = "This is a safe num 999-99-9999,this is the second num 456003348,this is the third num 456-909090,this is the fourth num 45677-0764";
    // Three digits, an optional hyphen, two digits, an optional hyphen, four digits
    Pattern ptn = Pattern.compile("\\d{3}\\-?\\d{2}\\-?\\d{4}");
    Matcher mch = ptn.matcher(safeNum);
    while (mch.find())
    {
        System.out.println(mch.group());
    }


The result of the final output is:



999-99-9999
456003348
456-909090
45677-0764
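Because each hyphen in the pattern is optional on its own, the mixed forms 456-909090 and 45677-0764 are accepted as well. If only the fully hyphenated form or the plain nine-digit form should count, a stricter pattern is one option. The sketch below rests on that assumption about what a valid number looks like and is not part of the original example; the class name StrictNumberSketch and the method findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictNumberSketch {
    // Either ddd-dd-dddd or nine consecutive digits. The \b word boundaries
    // keep the pattern from matching inside a longer run of digits.
    private static final Pattern STRICT =
            Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b|\\b\\d{9}\\b");

    // Hypothetical helper: returns every strict match in the text, in order.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = STRICT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String safeNum = "This is a safe num 999-99-9999,this is the second num 456003348,"
                + "this is the third num 456-909090,this is the fourth num 45677-0764";
        // prints [999-99-9999, 456003348]
        System.out.println(findAll(safeNum));
    }
}
```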





3. This small example matches a date written in English



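    String strDate = "This is a date June 26,1951";
    // Month name, one whitespace character, 1-2 digit day, a comma, optional whitespace, 4-digit year
    Pattern ptn = Pattern.compile("([a-zA-Z]+)\\s[0-9]{1,2},\\s*[0-9]{4}");
    Matcher mch = ptn.matcher(strDate);
    while (mch.find())
    {
        System.out.println(mch.group());
    }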


The output is:



June 26,1951
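The parentheses around the month form a capturing group, although the example only prints the whole match. As a sketch of how the individual parts could be read out, the version below uses named groups (available since Java 7) and converts the match to a java.time.LocalDate. The class name DateGroupSketch, the method firstDate and the group names are hypothetical, and the code assumes the month is a full English month name; any other word in that position makes LocalDate.parse throw a DateTimeParseException.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateGroupSketch {
    // Same shape as the pattern in the example, with a name on each part.
    private static final Pattern DATE = Pattern.compile(
            "(?<month>[a-zA-Z]+)\\s(?<day>[0-9]{1,2}),\\s*(?<year>[0-9]{4})");
    // Parses text such as "June 26 1951"; Locale.ENGLISH fixes the month names.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMMM d uuuu", Locale.ENGLISH);

    // Hypothetical helper: returns the first date found in the text, or null if there is none.
    static LocalDate firstDate(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.find()) {
            return null;
        }
        String normalized = m.group("month") + " " + m.group("day") + " " + m.group("year");
        return LocalDate.parse(normalized, FORMAT);
    }

    public static void main(String[] args) {
        // prints 1951-06-26
        System.out.println(firstDate("This is a date June 26,1951"));
    }
}
```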


These three small examples are practice exercises I wrote while learning regular expressions. I hope they help with your own learning!





