Capturing web page snapshots from the Linux command line (xvfb + CutyCapt)

Source: Internet
Author: User

Objective:

Capture web page snapshots from the command line on a Debian server that has no X server installed.

Software:
    • Xvfb (a virtual X server that runs from the command line and keeps the rendered graphics in memory): provides image rendering in an environment where no X server is installed
    • CutyCapt (emulates a browser: downloads the page, renders HTML and CSS, executes JavaScript, and takes a snapshot of the fully rendered page): does the main work
    • Qt (the framework CutyCapt is built on)
Steps:

1. Install CutyCapt, Qt, and related packages:

    sudo apt-get install subversion libqt4-webkit libqt4-dev g++
    svn co https://cutycapt.svn.sourceforge.net/svnroot/cutycapt
    cd cutycapt/CutyCapt
    qmake
    make

2. Install Xvfb:

    sudo apt-get install xvfb
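
Before running a capture, it can help to confirm that both tools are reachable. The following is a minimal sketch: the helper name `missing_tools` is hypothetical and not part of either package. It prints every command it cannot find and prints nothing when all of them are available.

```shell
#!/bin/sh
# Print the name of every command in the argument list that cannot be
# found (on PATH, or as an executable file when given as a path).
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example, run from the directory where CutyCapt was built:
# missing_tools xvfb-run ./CutyCapt
```

Empty output means both the Xvfb wrapper and the CutyCapt binary were found.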

3. Test capture:

    xvfb-run --server-args="-screen 0, 1024x768x24" ./CutyCapt --url=http://www.zol.com.cn --out=zol.png
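
To capture several pages, the same command can be wrapped in a small script. This is only a sketch under stated assumptions: the helper names `snapshot_name` and `capture`, the `CUTYCAPT` variable, and the file-naming rule are invented for illustration and are not part of CutyCapt. The `-a` option asks xvfb-run to pick a free display number, which avoids clashes when several captures run at once.

```shell
#!/bin/sh
# Path to the CutyCapt binary (assumed to be in the current directory).
CUTYCAPT=${CUTYCAPT:-./CutyCapt}

# Derive an output file name from a URL (hypothetical naming rule):
# drop the scheme, replace every character that is not a letter,
# digit, dot, or hyphen with "_", then append ".png".
snapshot_name() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-zA-Z]*://||' -e 's|[^A-Za-z0-9.-]|_|g' -e 's|$|.png|'
}

# Capture one URL inside a temporary virtual X server.
capture() {
    xvfb-run -a --server-args="-screen 0, 1024x768x24" \
        "$CUTYCAPT" --url="$1" --out="$(snapshot_name "$1")"
}

# Usage: capture http://www.zol.com.cn   # writes www.zol.com.cn.png
```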

The Chinese text in the captured page came out garbled.

4. After half a day of troubleshooting, the cause turned out to be that no Chinese fonts were installed. With Chinese fonts installed, the capture rendered correctly.
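
A quick way to check for this problem is to ask fontconfig whether any installed font covers Chinese. The sketch below assumes the `fc-list` utility from the fontconfig package is available. The function name and the `FC_LIST` variable are hypothetical, and the package in the comment (`fonts-wqy-zenhei`, named `ttf-wqy-zenhei` on older Debian releases) is one possible font choice, not necessarily the one used here.

```shell
#!/bin/sh
# Command used to list fonts; overridable, e.g. for testing (assumption).
FC_LIST=${FC_LIST:-fc-list}

# Succeed if at least one installed font declares Chinese coverage.
has_chinese_fonts() {
    [ -n "$($FC_LIST :lang=zh 2>/dev/null)" ]
}

# Example:
# has_chinese_fonts || sudo apt-get install fonts-wqy-zenhei
```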

Summary:

This gives a basic way to capture web page snapshots from the Linux command line. CutyCapt's JavaScript handling is still limited, though: Flash content loaded through swfobject is not rendered. I will later try rendering and capturing directly with Firefox.
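
For pages that build their content with JavaScript, CutyCapt has options that sometimes help: `--delay` waits a number of milliseconds after the page has loaded before taking the snapshot, `--max-wait` caps the total wait, and `--plugins=on` enables browser plugins. The sketch below only assembles such an argument list. The helper name `js_capture_args` and the timing values are assumptions, and enabling plugins has an effect only if a Flash plugin is actually installed, so this may well not fix the swfobject case.

```shell
#!/bin/sh
# Print CutyCapt arguments for a JavaScript-heavy page, one per line.
#   $1 = URL, $2 = output file, $3 = delay in ms after load (default 3000)
js_capture_args() {
    printf '%s\n' \
        "--url=$1" \
        "--out=$2" \
        "--javascript=on" \
        "--plugins=on" \
        "--delay=${3:-3000}" \
        "--max-wait=60000"
}

# Example (the output is word-split, so the URL and file name must not
# contain whitespace or glob characters):
# xvfb-run --server-args="-screen 0, 1024x768x24" \
#     ./CutyCapt $(js_capture_args http://www.zol.com.cn zol.png 5000)
```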

Reference Links:

    • http://cutycapt.sourceforge.net/
    • http://www.x.org/archive/X11R6.8.2/doc/Xvfb.1.html
    • http://www.yeeach.com/tag/Screenshot/
    • http://hi.baidu.com/pkubuntu/blog/item/7dcc064ff0246a3eaec3abe2.html
    • http://qt.nokia.com/
    • http://en.wikipedia.org/wiki/xvfb

    • Installing Chinese fonts: http://hi.baidu.com/spiritualcity/blog/item/96369c2afa8740fde6cd40d2.html
    • Linux Chinese console solution (zhcon): http://zhcon.sourceforge.net/index_cn.html
