How to extract email addresses from webpages
Today I am publishing a technical article for the first time in a long while. Until now I have saved good examples on my computer, but they are hard to find again after some time, so it is easier to organize them on my blog. This way I can also share some of my experiences from the learning process and, I hope, help others.
A friend asked me to help him write a small program that extracts the email addresses from a webpage, so I wrote one for him in Java. Please forgive any imperfections; suggestions are welcome, and we can learn together.
The source code is in the attachment. Place readme.htm at F:\share\readme.htm. You can also use a different directory, but if you do, you need to change the corresponding file paths in the code (both the input readme.htm and the output email.txt).
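If you change directories often, one option (not what the attached code does) is to build both file paths from a single directory that can be passed on the command line. The class name SharePaths and its input/output methods below are hypothetical; the sketch only resolves and prints the two paths using the standard java.nio.file.Paths.get.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves the input and output files from one
// directory, so only that directory needs to change.
public class SharePaths {

    // Default directory used in this post; override it with the first argument
    static final String DEFAULT_DIR = "F:\\share";

    // Path of the page to scan
    public static Path input(String dir) {
        return Paths.get(dir, "readme.htm");
    }

    // Path of the file that receives the extracted addresses
    public static Path output(String dir) {
        return Paths.get(dir, "email.txt");
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : DEFAULT_DIR;
        System.out.println("Reading from: " + input(dir));
        System.out.println("Writing to:   " + output(dir));
    }
}
```

For example, running `java SharePaths D:\data` would resolve readme.htm and email.txt under D:\data instead of F:\share.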
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Email crawler
 * @author xiaoxin
 * @date 2014/10/29
 */
public class EmailSpider {

    // Compiled once instead of once per line: word characters, dots or hyphens,
    // then "@", a domain part, a literal ".", and a final label such as "com"
    private static final Pattern EMAIL = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    public static void main(String[] args) {
        // try-with-resources closes (and flushes) both files even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("F:\\share\\readme.htm"));
             BufferedWriter bw = new BufferedWriter(new FileWriter("F:\\share\\email.txt"))) {
            String line;
            // Read the page line by line and scan each line for addresses
            while ((line = br.readLine()) != null) {
                parse(line, bw);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Email resolution method
     * @param line one line of the input file to scan
     * @param bw   writer for email.txt
     */
    private static void parse(String line, BufferedWriter bw) {
        Matcher m = EMAIL.matcher(line);
        try {
            while (m.find()) {
                // "\r\n" is the Windows line ending (Linux uses "\n", classic Mac OS used "\r")
                bw.write(m.group() + ";\r\n");
                // bw.newLine(); // recommended instead: writes the platform's own line separator
                System.out.println(m.group());
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }
}
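To check what the pattern picks up without creating files under F:\share, a minimal sketch can run the same regular expression against a string in memory. The class name EmailRegexDemo and its extract method are hypothetical; the sample HTML is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the same pattern as EmailSpider, applied to an in-memory string
public class EmailRegexDemo {

    private static final Pattern EMAIL = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    // Returns every substring of text that the pattern matches, in order
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<a href=\"mailto:carol@example.net\">Carol</a> or write to "
                + "bob-jones@mail.example.org.";
        for (String email : extract(html)) {
            System.out.println(email);
        }
    }
}
```

This prints carol@example.net and bob-jones@mail.example.org: the ":" after "mailto" and the trailing "." are not part of either match. The pattern is deliberately simple, though. Because "+" is not in the character class, an address such as a+b@example.com is captured only as b@example.com.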