Using PhantomJS to implement a web page capture service

Source: Internet
Author: User

This is a small requirement I ran into in the first half of the year: capture a web page and save it as a picture. I tried many tools and none gave good results. Either the rendering was too poor (Canvas, Html2Image, Cobra) or the performance was poor (for example SWT's Browser). Once I realized that a headless browser was what I needed, I took a brief look at PhantomJS and CutyCapt. Both are built on the WebKit engine, and PhantomJS is the more convenient of the two, especially on Windows. On Linux, from version 2.0 on you need to compile it yourself on the machine. That took me about 3 hours; I have to say g++ is slow, since the same project compiles quickly under VC, but I won't complain about a free, open-source compiler. The following is a PhantomJS-based web page capture implemented with Java code:

First, environment preparation

1. PhantomJS executable: D:/xxx/phantomjs-2.0.0-windows/bin/phantomjs

2. Script: D:/xxx/phantomjs-2.0.0-windows/bin/rasterize.js

The script is available on the official website, but I need to explain how it handles width and height:

page.viewportSize = { width: 600, height: 600 };

This is the default size, 600x600. I suggest setting a smaller height; on my side I use width: 800, height: 200. The reason is that when width and height are not both passed in, and the real page is taller than the configured value, the picture automatically grows until the whole page is shown. (If you only want to capture a small picture, a default that is too large leaves a lot of empty space in it.) If you pass both width and height, the following code is executed and only that part of the page is captured:

page.clipRect = { top: 0, left: 0, width: pageWidth, height: pageHeight };

3. First, test from the command line:

D:/xxx/phantomjs-2.0.0-windows/bin/phantomjs D:/xxx/phantomjs-2.0.0-windows/bin/rasterize.js http://www.qq.com D:/test.png

If everything is configured correctly, you should see the resulting picture. You can also pass a size parameter by appending "1000px" or "1000px*400px" to the command above; both forms work.
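To make the two size forms concrete, here is a minimal Java sketch of how such an argument could be interpreted. It is only a model of the behavior described in this article, not code taken from rasterize.js: the class name CaptureSize is hypothetical, and the 9/16 ratio for the width-only form comes from the comment in the author's Java class further down (the official script may use a different ratio).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: models how "800px" and "800px*600px" are interpreted. */
final class CaptureSize {
    // Accepts "<width>px" or "<width>px*<height>px".
    private static final Pattern SIZE = Pattern.compile("(\\d+)px(?:\\*(\\d+)px)?");

    final int width;
    final int height;
    final boolean clipped; // true when the picture is cut to exactly width x height

    private CaptureSize(int width, int height, boolean clipped) {
        this.width = width;
        this.height = height;
        this.clipped = clipped;
    }

    static CaptureSize parse(String arg) {
        Matcher m = SIZE.matcher(arg.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported size: " + arg);
        }
        int w = Integer.parseInt(m.group(1));
        if (m.group(2) != null) {
            // Both values given: viewport and clip rectangle share the same size.
            return new CaptureSize(w, Integer.parseInt(m.group(2)), true);
        }
        // Width only: assumed minimum height of width * 9 / 16; the page may grow taller.
        return new CaptureSize(w, w * 9 / 16, false);
    }
}
```

Under these assumptions, CaptureSize.parse("800px*600px") describes a picture cut to 800x600, while CaptureSize.parse("800px") describes an 800-wide picture whose height starts at 450 and is not cut.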

Second, server code

As a web service, this part of the code runs on the server. You don't have to copy all of it; take what you need:

package lekkoli.test;

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.log4j.Logger;

/**
 * Page-to-picture processing class; runs PhantomJS as an external command.
 * @author Lekkoli
 */
public class PhantomTools {

    private static final Logger _logger = Logger.getLogger(PhantomTools.class);

    // Commands under Linux:
    // private static final String _tempPath = "/data/temp/phantom_";
    // private static final String _shellCommand = "/usr/local/xxx/phantomjs /usr/local/xxx/rasterize.js ";
    private static final String _tempPath = "D:/data/temp/phantom_";
    // The trailing space matters: the URL is appended directly after it.
    private static final String _shellCommand = "D:/xxx/phantomjs-2.0.0-windows/bin/phantomjs D:/xxx/phantomjs-2.0.0-windows/bin/rasterize.js ";

    private String _file;
    private String _size;

    /**
     * Constructor.
     * @param hash used to make the temporary file name unique
     */
    public PhantomTools(int hash) {
        _file = _tempPath + hash + ".png";
    }

    /**
     * Constructor.
     * @param hash used to make the temporary file name unique
     * @param size the size of the picture, such as 800px*600px (cut at this height),
     *             or 800px (minimum height = width * 9/16, height is not cut)
     */
    public PhantomTools(int hash, String size) {
        this(hash);
        if (size != null)
            _size = " " + size;
    }

    /**
     * Convert the target page to a picture byte stream.
     * @param url destination page address
     * @return byte stream (empty if the command failed)
     */
    public byte[] getByteImg(String url) throws IOException {
        BufferedInputStream in = null;
        ByteArrayOutputStream out = null;
        File file = null;
        byte[] ret = null;
        try {
            if (exeCmd(_shellCommand + url + " " + _file + (_size != null ? _size : ""))) {
                file = new File(_file);
                if (file.exists()) {
                    out = new ByteArrayOutputStream();
                    byte[] b = new byte[5120];
                    in = new BufferedInputStream(new FileInputStream(file));
                    int n;
                    while ((n = in.read(b, 0, 5120)) != -1) {
                        out.write(b, 0, n);
                    }
                    ret = out.toByteArray();
                }
            } else {
                ret = new byte[] {};
            }
        } finally {
            try {
                if (out != null) {
                    out.close();
                }
            } catch (IOException e) {
                _logger.error(e);
            }
            try {
                if (in != null) {
                    in.close();
                }
            } catch (IOException e) {
                _logger.error(e);
            }
            // Delete the temporary file only after the streams are closed;
            // on Windows an open file cannot be deleted.
            if (file != null && file.exists()) {
                file.delete();
            }
        }
        return ret;
    }

    /**
     * Execute the command and wait for it to finish.
     * @return true if the process exited with code 0
     */
    private static boolean exeCmd(String commandStr) {
        try {
            Process p = Runtime.getRuntime().exec(commandStr);
            // Treat any non-zero exit code as a failure.
            if (p.waitFor() != 0) {
                return false;
            }
        } catch (Exception e) {
            _logger.error(e);
            // A command that could not be started or was interrupted did not succeed.
            return false;
        }
        return true;
    }
}
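The class above hands the whole command to Runtime.exec as one string, which is split on whitespace, so a URL or path containing a space would break it, and a hung PhantomJS process would block the caller forever. As a hedged alternative, the sketch below passes the arguments as a list to ProcessBuilder and waits with a timeout. The class name PhantomCommand and its two methods are hypothetical and not part of the original code; only standard JDK calls (Java 8 or later) are used.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs PhantomJS with an argument list and a timeout. */
final class PhantomCommand {

    /** Builds the argument list: executable, script, URL, output file, optional size. */
    static List<String> build(String phantomjs, String script, String url,
                              String outFile, String size) {
        List<String> cmd = new ArrayList<>();
        cmd.add(phantomjs);
        cmd.add(script);
        cmd.add(url);
        cmd.add(outFile);
        if (size != null && !size.isEmpty()) {
            cmd.add(size);
        }
        return cmd;
    }

    /** Returns true only if the process finished within the timeout with exit code 0. */
    static boolean run(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Let the child write to the parent's console so it cannot block on a full pipe.
        pb.inheritIO();
        Process p = pb.start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly(); // give up on a hung render
            return false;
        }
        return p.exitValue() == 0;
    }
}
```

A call might look like PhantomCommand.run(PhantomCommand.build(exe, script, "http://www.qq.com", "D:/test.png", "800px"), 30), where exe and script are the two paths from the environment preparation step and the 30-second limit is an arbitrary choice.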

With the PhantomTools class above, you can simply call the getByteImg method to generate the picture and get its contents.
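The step that reads the temporary picture and then removes it can also be written more compactly with java.nio.file. The following is a minimal sketch of that idea, not the author's code: the class name TempImage and the method readAndDelete are hypothetical, and like the original it returns an empty array when there is nothing to read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a temporary picture file and always deletes it. */
final class TempImage {

    static byte[] readAndDelete(Path file) throws IOException {
        if (!Files.exists(file)) {
            return new byte[0]; // nothing was rendered
        }
        try {
            // readAllBytes opens and closes the file itself,
            // so the delete below also works on Windows.
            return Files.readAllBytes(file);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because the delete sits in a finally block, the temporary file is removed even if reading it fails.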

Attached is my configuration script, rasterize.js. As for PhantomJS itself, download it from the official website.

When reprinting, please cite the original source: http://www.cnblogs.com/lekko/p/4796062.html
