Simple crawler: downloading site content from a specific address

Source: Internet
Author: User

Http01app.java
1. Uses multi-threading, I/O streams, and the java.net networking package
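The core of the technique is copying a URL connection's input stream to a local file. As a minimal, self-contained sketch of that idea (the class and method names `SimpleDownload` and `download` are illustrative, not part of the original code):

```java
import java.io.*;
import java.net.URL;
import java.nio.file.*;

public class SimpleDownload {
    // Stream the bytes behind a URL into a local file; returns total bytes copied.
    static long download(String url, Path target) throws IOException {
        URL u = new URL(url);
        // try-with-resources guarantees both streams are closed, even on error
        try (InputStream in = u.openConnection().getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[20 * 1024]; // read at most 20 KB per pass
            long total = 0;
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
                total += len;
            }
            return total;
        }
    }
}
```

Using try-with-resources here avoids the repeated null checks and manual `close()` calls needed when streams are closed by hand.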

package main;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

/**
 * Created by lxj-pc on 2017/6/27.
 */
public class Http01app {
    public static void main(String[] args) {
        String url = "http://tuijian.hao123.com:80/index.html";
        // Start a thread to download the HTML content at the specified address
        new Thread(new DownloadHtmlTask(url)).start();
        // Downloads the HTML content and saves it as D:/LXJ/hao123.html
    }

    static class DownloadHtmlTask implements Runnable {
        private String url;
        String fileName = "hao123.html";
        String dirPath = "D:/LXJ";

        public DownloadHtmlTask(String url) {
            this.url = url;
        }

        @Override
        public void run() {
            // Download the HTML page content at the given URL
            try {
                URL htmlUrl = new URL(url);
                // Open a connection to the network resource
                try {
                    URLConnection urlConnection = htmlUrl.openConnection();
                    HttpURLConnection conn = (HttpURLConnection) urlConnection;
                    // Get the read stream of the network resource
                    InputStream is = conn.getInputStream();
                    // Check whether the network resource responded successfully
                    if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {

                        // In-memory stream: ByteArrayOutputStream
                        ByteArrayOutputStream baos = new ByteArrayOutputStream();

                        byte[] buffer = new byte[20 * 1024]; // buffer size: at most 20 KB per read
                        int len = -1; // number of bytes per read

                        // Start reading network data and track download progress
                        // 1. Get the total length of the network resource
                        int contentLength = conn.getContentLength();
                        // 2. Declare the length read so far, accumulating len
                        int curLen = 0;

                        while ((len = is.read(buffer)) != -1) {
                            // Write the data just read into the memory stream
                            baos.write(buffer, 0, len);

                            // 3. Accumulate the downloaded length
                            curLen += len;
                            System.out.println(curLen + " " + contentLength);
                            // 4. Calculate the download progress
                            // (scale by 100 before dividing; plain curLen / contentLength
                            // truncates to 0 in integer arithmetic)
                            if (contentLength > 0) {
                                int p = (int) (curLen * 100L / contentLength);
                                System.out.println("Download progress " + p + "%");
                            }
                        }
                        // Download complete; get the data from the memory stream
                        byte[] bytes = baos.toByteArray();
                        // Convert the byte array to a string and print it to the console
                        // "Hello".getBytes(); characters -> bytes
                        String htmlContent = new String(bytes, "UTF-8");
                        writerFile(htmlContent, dirPath, fileName); // member method below
                        System.out.println(htmlContent);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } catch (MalformedURLException e) {
                e.printStackTrace();
            }
        }

        // Store the content in a file under the specified path
        private void writerFile(String htmlContent, String dirPath, String fileName) {
            File dir = new File(dirPath);
            if (!dir.exists()) {
                dir.mkdirs();
            }
            try (FileWriter fileWriter = new FileWriter(new File(dir, fileName))) {
                fileWriter.write(htmlContent);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void outputFile(String content, String dirPath, String fileName) {
        File dir = new File(dirPath);
        try (FileOutputStream fileOutputStream = new FileOutputStream(new File(dir, fileName))) {
            fileOutputStream.write(content.getBytes("UTF-8"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
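One subtle pitfall in a progress loop like the one above: the percentage must be scaled before dividing, because plain `curLen / contentLength` in `int` arithmetic truncates to 0 for any partial download. A small helper illustrating the correct calculation (`ProgressDemo` and `progressPercent` are illustrative names, not part of the original code):

```java
public class ProgressDemo {
    // Scale by 100 before dividing, using long to avoid int overflow on
    // large downloads; curLen / contentLength alone would truncate to 0.
    static int progressPercent(long curLen, long contentLength) {
        if (contentLength <= 0) {
            return -1; // total length unknown (e.g. chunked response reports -1)
        }
        return (int) (curLen * 100 / contentLength);
    }

    public static void main(String[] args) {
        System.out.println(progressPercent(5_120, 20_480));  // 25
        System.out.println(progressPercent(20_480, 20_480)); // 100
        System.out.println(progressPercent(1_024, -1));      // -1, length unknown
    }
}
```

Note that `URLConnection.getContentLength()` returns -1 when the server does not send a Content-Length header, so the unknown-length case is worth handling.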

