Java: a super simple crawler example (1)
This example crawls the data of an entire page and extracts the information you want from it. The comments explain each step:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Reptile {
    public static void main(String[] args) {
        String url1 = ""; // the address of the page you want to crawl
        InputStream is = null;    // input stream for reading the page
        BufferedReader br = null; // wraps the stream to speed up reading
        StringBuffer html = new StringBuffer(); // holds the page data as it is read
        String temp = ""; // holds each line read; it is then appended to html
        try {
            URL url2 = new URL(url1); // build the URL
            is = url2.openStream();   // open the stream and prepare to read
            // wrap the stream; br.readLine() reads one line at a time,
            // which improves reading efficiency
            br = new BufferedReader(new InputStreamReader(is));
            // readLine() returns null at the end of the stream, ending the loop
            while ((temp = br.readLine()) != null) {
                // append each line to html; note the difference between String
                // and StringBuffer: the former is immutable, the latter mutable
                html.append(temp);
            }
            // System.out.println(html); // print all the crawled page source

            if (is != null) { // close the stream to avoid wasting resources
                is.close();
                is = null;
            }

            // parse the page with Jsoup to get a Document object
            Document doc = Jsoup.parse(html.toString());
            // select elements by class name ("XX"); the returned Elements object
            // contains the data we want. To find the right class, open the page
            // in a browser and press F12.
            Elements elements = doc.getElementsByClass("XX");
            for (Element element : elements) {
                // print each node's text; keep only the data you want,
                // usually at a fixed index
                System.out.println(element.text());
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
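If you only need the extraction step and cannot add the Jsoup dependency, the same class-based lookup can be sketched with nothing but the JDK. This is a minimal, hedged sketch: the regex approach is fragile on real-world HTML (Jsoup is the better tool), and the `ClassTextExtractor` name, the sample HTML, and the class name `XX` are all placeholders, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassTextExtractor {
    // Extracts the inner text of every tag carrying the given class attribute.
    // Only handles simple, well-formed markup; real pages need a real parser.
    static List<String> extractByClass(String html, String className) {
        List<String> results = new ArrayList<>();
        Pattern p = Pattern.compile(
            "<\\w+[^>]*class=\"" + Pattern.quote(className) + "\"[^>]*>([^<]*)<");
        Matcher m = p.matcher(html);
        while (m.find()) {
            results.add(m.group(1).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        // hypothetical page fragment standing in for the crawled HTML
        String html = "<div class=\"XX\">first</div>"
                    + "<span class=\"XX\">second</span>"
                    + "<div class=\"other\">ignored</div>";
        for (String text : extractByClass(html, "XX")) {
            System.out.println(text); // prints: first, then second
        }
    }
}
```

In the crawler above you would pass `html.toString()` instead of the hardcoded fragment; everything else stays the same.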
As a follow-up, you can feed self-crawled data into FusionCharts to generate a report (if the captured data is of type int, the report can present it intuitively).
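To make the FusionCharts step concrete, here is a small sketch that turns crawled int values into the classic FusionCharts XML data format (`<chart>` containing `<set label value>` elements). The `ChartDataBuilder` name and the day labels/values are hypothetical placeholders; in practice the values would come from parsing `element.text()` with `Integer.parseInt`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChartDataBuilder {
    // Builds a FusionCharts-style XML data string from crawled int values.
    // Assumes labels and caption contain no characters needing XML escaping.
    static String toChartXml(String caption, Map<String, Integer> data) {
        StringBuilder xml = new StringBuilder();
        xml.append("<chart caption=\"").append(caption).append("\">");
        for (Map.Entry<String, Integer> e : data.entrySet()) {
            xml.append("<set label=\"").append(e.getKey())
               .append("\" value=\"").append(e.getValue()).append("\"/>");
        }
        xml.append("</chart>");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> crawled = new LinkedHashMap<>();
        crawled.put("Mon", 12); // hypothetical values parsed from the page
        crawled.put("Tue", 30);
        System.out.println(toChartXml("Daily Counts", crawled));
    }
}
```

The resulting string can be handed to a FusionCharts chart object as its XML data source.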