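This was my first time writing a crawler, and I was stuck on garbled text for two days. I tried many approaches and none of them worked; today I tried something almost at random and, to my surprise, it worked.
The page is read through a buffered character input stream when it is fetched, and that stream was the source of the problem. The fix (highlighted in red in the original post) is to pass the charset name "UTF-8" to the InputStreamReader that the BufferedReader wraps: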
BufferedReader in = null;
in = new BufferedReader(new InputStreamReader(
        connection.getInputStream(), "UTF-8"));
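To see why that one argument matters, here is a minimal sketch that decodes the same UTF-8 bytes twice, once with the right charset and once with a wrong one. The class name CharsetDemo and the helper firstLine are made up for illustration, and ISO-8859-1 simply stands in for "some charset that does not match the page"; the charset that caused the garbling in the original crawler is not stated.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Hypothetical helper: read the first line of raw bytes, decoding with the named charset
    static String firstLine(byte[] raw, String charsetName) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), charsetName))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Four Chinese characters, written as escapes so the source file encoding does not matter
        String original = "\u4e2d\u6587\u7f51\u9875";
        // This plays the role of the bytes a server would send for a UTF-8 page
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String right = firstLine(raw, "UTF-8");      // decoded with the matching charset
        String wrong = firstLine(raw, "ISO-8859-1"); // decoded with a mismatched charset

        System.out.println("UTF-8 round trip intact:      " + right.equals(original));
        System.out.println("ISO-8859-1 round trip intact: " + wrong.equals(original));
    }
}
```

The bytes are identical in both calls; only the charset handed to InputStreamReader differs, which is why the fix lives in the reader and not in any later string conversion.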
The full code is attached for reference.
// saveEssayUrl (target directory) and fileName are fields defined elsewhere in the class
public String sendGet(String url) {
    // Holds the page content; starts empty so the first += does not prepend "null"
    String result = "";
    // Buffered character input stream
    BufferedReader in = null;
    try {
        // Turn the string into a URL object
        URL realUrl = new URL(url);
        // Initialize a connection to that URL
        URLConnection connection = realUrl.openConnection();
        // Open the actual connection
        connection.connect();
        // Initialize the BufferedReader that reads the response, decoding it as UTF-8
        in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), "UTF-8"));
        // Temporarily holds each line that is read
        String line;

        File file = new File(saveEssayUrl, fileName);
        File file2 = new File(saveEssayUrl);

        if (file2.isDirectory() == false) {
            file2.mkdirs();
            try {
                file.createNewFile();
                System.out.println("********************");
                System.out.println("create " + fileName + " file success!");
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            try {
                file.createNewFile();
                System.out.println("********************");
                System.out.println("create " + fileName + " file success!");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        Writer w = new FileWriter(file);

        while ((line = in.readLine()) != null) {
            // Go through each line that is fetched, write it out, and append it to result
            // line = new String(line.getBytes("UTF-8"), "GBK");
            w.write(line);
            w.write("\r\n");
            result += line;
        }
        w.close();
    } catch (Exception e) {
        System.out.println("Send GET request exception! " + e);
        e.printStackTrace();
    }
    // Use finally to close the input stream
    finally {
        try {
            if (in != null) {
                in.close();
            }
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
    return result;
}
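One caveat about the listing: FileWriter(File) encodes with the JVM default charset, which before Java 18 depends on the platform, so the saved file can still come out garbled even when the page was decoded correctly. Below is a rough sketch of the read-and-save loop with UTF-8 fixed on both sides. Utf8Save and savePage are hypothetical names, not part of the original crawler, and the sketch assumes the target directory already exists.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8Save {
    // Hypothetical helper: decode body as UTF-8, write it to target as UTF-8,
    // and return the text with line breaks removed (as the original loop does)
    static String savePage(InputStream body, Path target) throws IOException {
        StringBuilder result = new StringBuilder();
        // try-with-resources closes both streams even if reading or writing fails
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(body, StandardCharsets.UTF_8));
             BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                w.write(line);
                w.write("\r\n");
                result.append(line);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(): two lines of UTF-8 encoded text
        byte[] page = "\u4e2d\u6587\n\u7f51\u9875".getBytes(StandardCharsets.UTF_8);
        Path target = Files.createTempFile("page", ".txt");
        String text = savePage(new ByteArrayInputStream(page), target);
        System.out.println(text.length() + " characters saved to " + target);
    }
}
```

In the crawler itself, the call would look something like savePage(connection.getInputStream(), Paths.get(saveEssayUrl, fileName)), after the directory has been created with mkdirs() as in the listing.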
Java web crawler: the garbled-text problem, finally solved