Requirement: use Java to fetch the content of a web page and return it as a string.
Implementation in plain Java:
package net.ibuluo.spider.util;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * HTTP tool
 * @author Robin Zhang
 */
public class HttpUtil {

    /**
     * Fetch the web page at the given URL and return its content as a string.
     * @param urlStr URL string
     * @return the page content, or null if the URL is malformed or the read fails
     */
    public static String getUrl(String urlStr) {
        String result = null;
        try {
            URL url = new URL(urlStr);
            result = inputStream2String(url.openStream());
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }

    /**
     * Read everything from the byte stream and convert it to a string.
     * @param inputStream byte stream to read
     * @return the decoded content of the stream
     * @throws IOException if reading or closing the stream fails
     */
    private static String inputStream2String(InputStream inputStream) throws IOException {
        Reader reader = null;
        try {
            // Wrap the byte stream in a character stream.
            // UTF-8 is an assumption here; the original relied on the platform default charset.
            reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
            // Container for the decoded text
            StringBuilder builder = new StringBuilder();
            // Read up to 1024 characters at a time
            char[] buffer = new char[1024];
            // Number of characters actually read in each pass (the last pass is usually shorter)
            int length;
            while ((length = reader.read(buffer)) != -1) {
                // Append only the characters that were read in this pass
                builder.append(buffer, 0, length);
            }
            return builder.toString();
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(getUrl("http://www.ibuluo.net/"));
    }
}
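The stream-to-string step can be tried without a network connection by feeding it an in-memory byte stream instead of url.openStream(). The following is only a minimal sketch of that idea: the class name StreamReadDemo and the method readAll are hypothetical names introduced for illustration, and the sample HTML and the UTF-8 charset are assumptions, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original HttpUtil
public class StreamReadDemo {

    // Read the whole stream and decode it with the given charset
    public static String readAll(InputStream in, Charset charset) throws IOException {
        // try-with-resources closes the reader (and the wrapped stream) automatically
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder builder = new StringBuilder();
            char[] buffer = new char[1024];
            int length;
            // read() returns -1 once the end of the stream is reached
            while ((length = reader.read(buffer)) != -1) {
                builder.append(buffer, 0, length);
            }
            return builder.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample page content (an assumption) with two non-ASCII characters
        String html = "<html><body>\u4f60\u597d, crawler</body></html>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        String decoded = readAll(in, StandardCharsets.UTF_8);
        // The decoded text matches the input when encoder and decoder use the same charset
        System.out.println(decoded.equals(html));
    }
}
```

Passing the charset explicitly matters for pages that contain non-ASCII text: if the decoder's charset differs from the one the page was encoded with, the returned string will contain garbled characters.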
The same page fetch can also be done with the third-party library Jsoup. The Jsoup version is as follows:
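// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document, java.io.IOException
try {
    // Fetch the page and parse it into a Document
    Document doc = Jsoup.connect("http://www.baidu.com/").get();
    System.out.println(doc.html());
} catch (IOException e) {
    e.printStackTrace();
}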
Jsoup is even more useful for parsing and analyzing HTML documents. See the official Jsoup website for details.
Java for simple web crawling