The simplest way to capture a web page as a PNG image is to use CutyCapt, a command-line tool that renders HTML pages into a variety of vector and bitmap image formats (for example, SVG, PDF, PS, PNG, JPEG, TIFF, GIF). CutyCapt uses the WebKit rendering engine internally to export the rendered page to an image file. It is built with Qt, so it is a cross-platform application that can also be used on Windows. In this tutorial, I'll describe how to convert an HTML page to a PNG image using CutyCapt on Linux.
Install CutyCapt on Linux
The installation commands for specific Linux distributions are as follows.
Install CutyCapt on Debian, Ubuntu or Linux Mint
$ sudo apt-get install cutycapt
Install CutyCapt on Fedora
$ sudo yum install subversion qt-devel qtwebkit-devel gcc-c++ make
$ svn co svn://svn.code.sf.net/p/cutycapt/code/ cutycapt
$ cd cutycapt/CutyCapt
Before compiling on Fedora, you need to patch the source code.
Open CutyCapt.hpp with a text editor, and add the following two lines at the beginning of the file.
#include
#include
Finally, compile and install Cutycapt as follows.
$ qmake-qt4
$ make
$ sudo cp CutyCapt /usr/local/bin/cutycapt
Install CutyCapt on CentOS or RHEL
First, enable the EPEL repository on your system. Then build and install CutyCapt using the same steps as on Fedora.
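Whichever distribution you installed on, a quick check can confirm that the binary ended up on your PATH. The following is a minimal sketch; the have_cmd helper name is made up for illustration and is not part of CutyCapt.

```shell
#!/bin/sh
# Return success if the named command can be found on the PATH.
# (have_cmd is a hypothetical helper name used only in this sketch.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd cutycapt; then
    echo "cutycapt found at: $(command -v cutycapt)"
else
    echo "cutycapt not found on PATH" >&2
fi
```

If the binary is reported as missing after a source build, check that the copy into /usr/local/bin succeeded and that /usr/local/bin is on your PATH.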
Convert HTML to PNG with CutyCapt
To save an HTML page as a PNG image, simply run cutycapt in the following format.
$ cutycapt --url=http://www.cnn.com --out=cnn.png
To save an HTML page in a different format (for example, a PDF), specify the output file as appropriate.
$ cutycapt --url=http://www.cnn.com --out=cnn.pdf
Running cutycapt with the --help option lists the available command-line options.
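To capture several pages in one go, the --url/--out format shown above can be wrapped in a loop. The script below is a rough sketch rather than a recommended tool: the url_to_filename helper and its file naming scheme are assumptions made for illustration, and only the --url and --out options from this tutorial are used.

```shell
#!/bin/sh
# Derive an output file name from a URL: drop the scheme, replace every
# character that is not a letter, digit, dot or dash with "_", then append
# the extension (png by default).
# (url_to_filename is a hypothetical helper, not part of CutyCapt.)
url_to_filename() {
    url=$1
    ext=${2:-png}
    name=$(printf '%s' "$url" | sed -e 's|^[a-zA-Z]*://||' -e 's|[^A-Za-z0-9.-]|_|g')
    printf '%s.%s\n' "$name" "$ext"
}

# Capture every URL given on the command line; report failures and continue.
for url in "$@"; do
    out=$(url_to_filename "$url" png)
    cutycapt --url="$url" --out="$out" || echo "failed: $url" >&2
done
```

With this naming scheme, http://www.cnn.com is saved as www.cnn.com.png. Passing pdf instead of png as the second argument to the helper would produce PDF output, since cutycapt picks the format from the output file name.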
Convert HTML to PNG with CutyCapt on a Server without X
Although CutyCapt is a command-line tool, it requires an X server to run. If you try to run it on a machine without an X server, you will get the following error:
cutycapt: cannot connect to X server :0
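The error above appears when no X display is reachable. As a rough pre-flight check, a script can test whether the DISPLAY environment variable is set before calling cutycapt. This is only a sketch: a non-empty DISPLAY does not guarantee that the X server actually accepts connections, and the has_display helper name is made up for illustration.

```shell
#!/bin/sh
# Return success when the DISPLAY variable is set and non-empty.
# (has_display is a hypothetical helper name used only in this sketch.)
has_display() {
    [ -n "${DISPLAY:-}" ]
}

if has_display; then
    echo "X display configured: $DISPLAY"
else
    echo "no X display configured; cutycapt is likely to fail" >&2
fi
```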
If you want to run cutycapt on a server without X, you can install Xvfb (a lightweight "fake" X11 server) on the server, so that cutycapt no longer fails with this error.
To install Xvfb on Debian, Ubuntu, or Linux Mint:
$ sudo apt-get install xvfb
To install Xvfb on Fedora, CentOS, or RHEL:
$ sudo yum install xorg-x11-server-Xvfb
After installing Xvfb, run cutycapt as follows.
$ xvfb-run --server-args="-screen 0, 1280x1200x24" cutycapt --url=http://www.cnn.com --out=cnn.png
This first starts an Xvfb server and then uses CutyCapt to capture the page, so it may take longer to finish. If you want to capture more than one page, you may want to start Xvfb as a daemon in advance.
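One way to avoid restarting the virtual X server for every page is to start a single Xvfb instance, point DISPLAY at it, and run all captures against it. The sketch below assumes that display number :99 is free and that a fixed two-second wait is enough for Xvfb to come up; both values, along with the capture_all and output_name helper names, are illustrative assumptions.

```shell
#!/bin/sh
# Assumed values for this sketch: display :99 is unused on this machine.
XVFB_DISPLAY=":99"

# Build the output file name for the n-th capture, e.g. page-3.png.
# (output_name and capture_all are hypothetical helper names.)
output_name() {
    printf 'page-%s.png\n' "$1"
}

# Start one Xvfb instance, capture every URL against it, then stop it.
capture_all() {
    Xvfb "$XVFB_DISPLAY" -screen 0 1280x1200x24 &
    xvfb_pid=$!
    sleep 2   # crude wait for the server to start; adjust as needed

    n=1
    for url in "$@"; do
        DISPLAY="$XVFB_DISPLAY" cutycapt --url="$url" --out="$(output_name "$n")" \
            || echo "failed: $url" >&2
        n=$((n + 1))
    done

    kill "$xvfb_pid"
}

# Run only when URLs are given on the command line.
if [ "$#" -gt 0 ]; then
    capture_all "$@"
fi
```

Saved as, say, capture-all.sh, the script would be called with the URLs as arguments and would write page-1.png, page-2.png and so on into the current directory.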