How to use Java to invoke Python to download Web pages

Source: Internet
Author: User

This article is based on: http://tonl.iteye.com/blog/1918245

Python version: 2.7, 64-bit Windows build.

Download Python: http://www.python.org/getit/

Install it with the Python 2.7.5 Windows x86-64 Installer (Windows AMD64 / Intel 64 / x86-64 binary; does not include source).

First write the following spider.py script:

# -*- coding: utf-8 -*-
# Python 2.7 script: download every page listed in a text file.
from urllib import urlopen
import os
import sys


class Spider:
    """Download web pages from the URLs listed in the given file."""

    def __init__(self, filename, downloadpath):
        """Store the URL file and download directory; exit if either is missing."""
        if not os.path.isfile(filename):
            print 'The given file does not exist, the program will exit'
            sys.exit(1)
        if not os.path.isdir(downloadpath):
            print 'The given download path does not exist, the program will exit'
            sys.exit(1)
        self.fname = filename
        self.dpath = downloadpath

    def download(self):
        """Download the pages named in the file, one URL per line."""
        with open(self.fname, 'r') as fp:
            for line in fp:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                # Build a file name from the letters and digits of the URL.
                if 'html' in line:
                    tempname = filter(str.isalnum, line).replace('html', '.html')
                else:
                    tempname = filter(str.isalnum, line) + '.html'
                self.download_html(line, os.path.join(self.dpath, tempname))

    def download_html(self, website, filename):
        """Download the page at website and save it to filename."""
        response = urlopen(website)
        data = response.read()
        response.close()
        # Binary mode keeps the bytes unchanged on Windows; an existing file is overwritten.
        with open(filename, 'wb') as fp:
            fp.write(data)


def test():
    """Entry point: spider.py <url file> <download directory>."""
    filename = sys.argv[1]
    downloadpath = sys.argv[2]
    spider = Spider(filename, downloadpath)
    spider.download()


if __name__ == '__main__':
    test()
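
As an aside, spider.py targets Python 2.7: it relies on `from urllib import urlopen` and the `print` statement, neither of which exists in Python 3. If you only have Python 3 installed, the download step could be sketched roughly as follows. This is an assumption about how one might port it, not part of the original article; it uses the standard `urllib.request.urlopen` and keeps the `download_html` name and arguments from the script.

```python
import shutil
from urllib.request import urlopen


def download_html(website, filename):
    """Fetch the page at `website` and save the raw bytes to `filename`."""
    # strip() removes the trailing newline left over from reading the URL file.
    with urlopen(website.strip()) as response, open(filename, 'wb') as out:
        # Copy the response body to disk without decoding it.
        shutil.copyfileobj(response, out)
```

Writing in binary mode avoids newline translation on Windows, and the `with` statement closes both the response and the file even if the copy fails.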

The spider.py script takes two arguments. The first is a text file that lists the addresses of the pages to download, one URL per line, for example (websites.txt):

http://blog.csdn.net/fansy1990  
http://www.baidu.com

The second argument is the directory in which the downloaded pages are stored.
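
Each page is saved under a name derived from its URL: spider.py keeps only the letters and digits and makes sure the name ends in .html. The sketch below reproduces that rule so you can predict the file names. The helper name `page_filename` and the example.com URL are made up for illustration, and the `join` expression stands in for the script's `filter(str.isalnum, line)` so that it also runs on Python 3.

```python
def page_filename(url):
    """Derive the local file name for a URL the way spider.py does."""
    url = url.strip()
    # Keep only letters and digits (same effect as filter(str.isalnum, url) in Python 2).
    name = ''.join(ch for ch in url if ch.isalnum())
    if 'html' in url:
        # Put the dot back in front of the extension.
        return name.replace('html', '.html')
    return name + '.html'


for url in ['http://blog.csdn.net/fansy1990',
            'http://www.baidu.com',
            'http://example.com/index.html']:  # last URL is a hypothetical example
    print(page_filename(url))
# prints:
# httpblogcsdnnetfansy1990.html
# httpwwwbaiducom.html
# httpexamplecomindex.html
```

Note that `replace` acts on every occurrence of "html" in the name, so a URL that contains the string more than once gets more than one dot.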

You can then run it from the command line:

python d:\spider.py d:\websites.txt d:\download_tmp

Then look in the download_tmp directory on drive D for the downloaded files. If they are there, the setup is working.
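
The same check can be scripted instead of done by hand. A minimal sketch, assuming the download directory is d:\download_tmp as in the command shown earlier; the helper name `downloaded_pages` is invented for this example.

```python
import os
import sys


def downloaded_pages(download_dir):
    """Return (name, size_in_bytes) for every .html file in download_dir."""
    pages = []
    for name in sorted(os.listdir(download_dir)):
        path = os.path.join(download_dir, name)
        if name.endswith('.html') and os.path.isfile(path):
            pages.append((name, os.path.getsize(path)))
    return pages


if __name__ == '__main__':
    # Default directory is an assumption; pass another one as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else r'd:\download_tmp'
    if os.path.isdir(target):
        for name, size in downloaded_pages(target):
            print('%s  %d bytes' % (name, size))
    else:
        print('Directory not found: %s' % target)
```

An empty listing, or files of 0 bytes, suggests the download step did not work.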

Finally, write the following Java program. You need to add the jython-*.jar package to the project (the original author downloaded version 2.2):

package test;

import java.io.IOException;

public class Pytest {

    /**
     * @param args
     * @throws IOException
     * @throws InterruptedException
     */
    public static void main(String[] args) throws IOException, InterruptedException {
        String py_path = "d:\\spider.py";
        String websites = "d:\\websites.txt";
        String outdir = "d:\\tmp";  // this directory must already exist
        // Build the command line; the spaces between the arguments are required.
        Process pr = Runtime.getRuntime().exec("python " + py_path + " " + websites + " " + outdir);
        // Wait for the Python script to finish.
        pr.waitFor();
        System.out.println("done...");
    }
}

To run the program from Eclipse, set an environment variable in Eclipse: add a PATH variable whose value is the Python installation directory, so that the python command can be found.
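
If you are unsure which directory to enter, the interpreter can report it itself. A minimal sketch, to be run with the same Python installation that Java should launch; the function names `python_install_dir` and `is_on_path` are invented for this example.

```python
import os
import sys


def python_install_dir():
    """Return the directory that contains the running interpreter executable."""
    return os.path.dirname(os.path.abspath(sys.executable))


def is_on_path(directory, path_value=None):
    """Report whether `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get('PATH', '')
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [os.path.normcase(os.path.normpath(p))
               for p in path_value.split(os.pathsep) if p]
    return wanted in entries


if __name__ == '__main__':
    install_dir = python_install_dir()
    print('Python directory: %s' % install_dir)
    print('Already on PATH: %s' % is_on_path(install_dir))
```

The second line only reflects the environment of the shell the script is started from, which can differ from the environment Eclipse passes to the Java program.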

After running, you will see the following message:

*sys-package-mgr*: can't create package cache dir, '*jython-2.2.jar\cachedir\packages'

This message can be ignored; it does not affect the running program.
