Calling PhantomJS from Java to Scrape Ajax-Loaded Web Pages

Source: Internet
Author: User

A few days ago I had gathered all the links to the pages I needed and was ready to start crawling the detail pages they point to. The data my program fetched, however, did not contain the expected content, even though that content was clearly visible in my browser. Viewing the page source in the browser showed that it was missing there too, so I concluded that the page was loading its content via Ajax. Readers who are not familiar with Ajax may want to learn some web development first.

There are two ways to capture Ajax-generated content. The first is to observe the HTTP requests the page makes while it loads and then imitate those requests to fetch the content directly. The second is to imitate the browser: render the page and read the content from the rendered result. I decided to use the second approach here. I had been playing with WebKit for this, but loading pages that way wasted too many resources. Then I learned about a fun tool, PhantomJS, which drives WebKit from the command line and also lets you operate on the page through its JS API (my use case is simple enough that I did not bother with that).

After downloading PhantomJS, simply unzip it and it is ready to use. Then add the PhantomJS directory to your PATH so that the phantomjs command can be run directly from the command line. Two pieces of code remain: one uses PhantomJS to fetch the page (written in JS), and the other uses Java to call PhantomJS and collect the content. The code is pasted directly below.
codes.js

    var system = require('system');
    // system.args[0] is the script name; system.args[1] is the first argument after it
    var address = system.args[1];
    console.log('Loading a web page');
    var page = require('webpage').create();
    var url = address;
    console.log(url);
    page.open(url, function (status) {
        // Called once the page has finished loading
        if (status !== 'success') {
            console.log('Unable to load the page!');
        } else {
            // Demo of using the page's JS API to operate on the page:
            // var title = page.evaluate(function () {
            //     return document.title;
            // });
            // console.log(title);
            console.log(page.content);
        }
        phantom.exit();
    });
The JS code above should be easy enough for most readers to follow. Next comes the Java code:
    import java.io.*;

    /**
     * Created with IntelliJ IDEA.
     * User: lsz
     * Date: 14-4-22
     * Time: 1:17
     * Utils for HTTP
     */
    public class HttpUtils {
        public static String getAjaxCotnent(String url) throws IOException {
            Runtime rt = Runtime.getRuntime();
            // Here codes.js is stored in the phantomjs directory on the C drive.
            // The space after "codes.js" separates the script path from the URL argument.
            Process p = rt.exec("phantomjs.exe c:/phantomjs/codes.js " + url);
            InputStream is = p.getInputStream();
            BufferedReader br = new BufferedReader(new InputStreamReader(is));
            StringBuffer sbf = new StringBuffer();
            String tmp = "";
            // Read the output of the PhantomJS process line by line until it ends
            while ((tmp = br.readLine()) != null) {
                sbf.append(tmp);
            }
            // System.out.println(sbf.toString());
            return sbf.toString();
        }

        public static void main(String[] args) throws IOException {
            getAjaxCotnent("http://www.oicqzone.com");
        }
    }
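As a variation on the class above, the following is a minimal sketch of the same call made through ProcessBuilder, which takes the arguments as a list, so the script path and the URL never have to be joined into one string by hand. The class and method names (PhantomRunner, buildCommand, readAll, render) are hypothetical, and the sketch assumes that phantomjs is on the PATH and writes UTF-8 output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original article
public class PhantomRunner {

    // Each list element is handed to the child process as one argument,
    // so no manual spacing or quoting of the URL is needed.
    static List<String> buildCommand(String phantomPath, String scriptPath, String url) {
        return Arrays.asList(phantomPath, scriptPath, url);
    }

    // Read a stream to the end, keeping line breaks (assumes UTF-8 output)
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Run the script for one URL and return everything it printed
    static String render(String phantomPath, String scriptPath, String url)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(phantomPath, scriptPath, url));
        // Merge stderr into stdout so a single reader drains all output
        pb.redirectErrorStream(true);
        Process p = pb.start();
        String output = readAll(p.getInputStream());
        int exitCode = p.waitFor();
        if (exitCode != 0) {
            throw new IOException("phantomjs exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Script path and URL are the ones used in the article
        System.out.println(render("phantomjs", "c:/phantomjs/codes.js", "http://www.oicqzone.com"));
    }
}
```

Note that codes.js also prints a status line and the URL before the page content, so the returned string starts with those two lines.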
The principle is simple: Java calls the PhantomJS component through inter-process communication and asks it to request and render the page. Because this approach starts a new PhantomJS process for every page, it is slow. An alternative is to have PhantomJS load the pages itself and POST the content to a custom HTTP backend of our own that receives the data, which is somewhat faster.
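The receiving side of that alternative could look roughly like the sketch below, which uses the HTTP server bundled with the JDK. Everything specific here is an assumption made for illustration: the class name RenderedPageReceiver, the /rendered path, port 8080, and the idea that a long-running PhantomJS script POSTs each page's content as the request body in UTF-8. The PhantomJS side is not shown.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical receiver, not part of the original article
public class RenderedPageReceiver {

    // Rendered pages are queued here; a crawler thread can take() them for parsing
    static final BlockingQueue<String> PAGES = new LinkedBlockingQueue<>();

    // Start the receiver on the given port (0 lets the OS pick a free port)
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // "/rendered" is an assumed path; the PhantomJS script must post to the same one
        server.createContext("/rendered", exchange -> {
            byte[] body = readFully(exchange.getRequestBody());
            PAGES.add(new String(body, StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(204, -1); // 204 No Content, no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    // Read the whole request body into memory
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
        // Block until the first page arrives, then report its size and shut down
        String html = PAGES.take();
        System.out.println("Received " + html.length() + " characters");
        server.stop(0);
    }
}
```

With this design a single PhantomJS process can stay alive and push many pages to the backend, which avoids the per-page startup cost described above.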
