HTTP-based download for Android

Source: Internet
Author: User

Android has built-in download mechanisms, such as the DownloadManager used by the browser. Unfortunately, DownloadManager is available only to the browser and cannot be called by ordinary applications, and it is inefficient when used frequently. To work around these problems, the best solution is to implement the download ourselves. This article is a brief introduction to HTTP-based downloading.

1. Introduction to HTTP 

HTTP (Hypertext Transfer Protocol) is an object-oriented protocol at the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended over years of use. At the time the source material was written, the sixth revision of HTTP/1.0 was in use on the WWW, standardization of HTTP/1.1 was in progress, and proposals for HTTP-NG (Next Generation of HTTP) had been put forward.
 
The main features of HTTP are as follows:

1. Supports the client/server model.

2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. Common request methods include GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, HTTP server programs are small and communication is fast.

3. Flexible: HTTP allows any type of data object to be transmitted. The type being transferred is indicated by Content-Type.

4. Connectionless: only one request is processed per connection. After the server has processed the client's request and received the client's acknowledgement, it closes the connection. This saves transmission time.

5. Stateless: the protocol keeps no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it responds faster.

1.1 URL

The format of an HTTP URL (a URL is a special type of URI that contains enough information to locate a resource) is as follows:

http://host[":"port][abs_path]

http indicates that the network resource is to be located through the HTTP protocol;

host is a valid Internet host domain name or IP address;

port specifies a port number. If it is omitted, the default port 80 is used;

abs_path specifies the URI of the requested resource;

Note: If the URL does not provide an abs_path, it must be given as "/" when it is used as the request URI. Browsers usually do this automatically.

For example:

1. Enter: www.guet.edu.cn
The browser automatically converts it to: http://www.guet.edu.cn/

2. http://192.168.0.116:8080/index.jsp
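The URL rules above (default port 80, an empty abs_path sent as "/") can be sketched with the JDK's java.net.URI class. The class name HttpUrlParts is a hypothetical helper introduced only for illustration, and the sketch ignores any query string.

```java
import java.net.URI;

/** Hypothetical helper: splits an http URL into host, port and abs_path. */
final class HttpUrlParts {
    final String host;
    final int port;
    final String absPath;

    private HttpUrlParts(String host, int port, String absPath) {
        this.host = host;
        this.port = port;
        this.absPath = absPath;
    }

    static HttpUrlParts parse(String url) {
        URI uri = URI.create(url);
        // URI.getPort() returns -1 when the URL has no explicit port; HTTP then uses 80.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        // An empty abs_path must be sent as "/" in the request URI.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        return new HttpUrlParts(uri.getHost(), port, path);
    }

    public static void main(String[] args) {
        HttpUrlParts parts = parse("http://192.168.0.116:8080/index.jsp");
        System.out.println(parts.host + " " + parts.port + " " + parts.absPath);
    }
}
```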

1.2 Requests

An HTTP request consists of three parts: the request line, the message headers, and the request body.

1.2.1 Request line

The request line starts with a method token, followed by the request URI and the protocol version, separated by spaces. The format is as follows:

Method Request-URI HTTP-Version CRLF

Where:

Method is the request method;
Request-URI is a uniform resource identifier;
HTTP-Version is the HTTP protocol version of the request;
CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, no separate CR or LF characters are allowed).

For example:

POST /hello.htm HTTP/1.1 (CRLF)
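As a minimal sketch of the request line format, the following class splits a request line into its three parts and writes it back out with the terminating CRLF. RequestLine is a hypothetical name, not a class from any library, and the parser assumes exactly one space between the parts.

```java
/** Hypothetical value class for "Method Request-URI HTTP-Version CRLF". */
final class RequestLine {
    final String method;
    final String requestUri;
    final String httpVersion;

    RequestLine(String method, String requestUri, String httpVersion) {
        this.method = method;
        this.requestUri = requestUri;
        this.httpVersion = httpVersion;
    }

    /** Parses a request line, with or without its trailing CRLF. */
    static RequestLine parse(String line) {
        String trimmed = line.endsWith("\r\n") ? line.substring(0, line.length() - 2) : line;
        String[] parts = trimmed.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + trimmed);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    /** Formats the request line, including the terminating CRLF. */
    String format() {
        return method + " " + requestUri + " " + httpVersion + "\r\n";
    }

    public static void main(String[] args) {
        RequestLine line = parse("POST /hello.htm HTTP/1.1\r\n");
        System.out.println(line.method + " | " + line.requestUri + " | " + line.httpVersion);
    }
}
```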

1) Request method:

There are several request methods (all method names are uppercase):

GET: requests the resource identified by the Request-URI.

POST: appends new data to the resource identified by the Request-URI.

HEAD: requests only the response message headers of the resource identified by the Request-URI.

PUT: asks the server to store a resource and use the Request-URI as its identifier.

DELETE: asks the server to delete the resource identified by the Request-URI.

TRACE: asks the server to echo back the request it received, mainly for testing or diagnosis.

CONNECT: reserved for future use.

OPTIONS: queries the capabilities of the server, or the options and requirements associated with a resource.

2) Request-URI:

Identifies the network resource to be accessed. Usually a path relative to the server's root directory is enough, so it starts with "/".

3) HTTP-Version: the protocol version used by the request.

1.2.2 Message Header

HTTP messages consist of requests from the client to the server and responses from the server to the client. Both request and response messages consist of a start line (the request line for a request message, the status line for a response message), message headers (optional), an empty line (a line containing only CRLF), and a message body (optional).
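The message layout described above can be illustrated with a small parser that takes the part of a message before the body and separates the start line from the header fields. HttpMessageHead is a hypothetical name; header names are stored in lower case because they are case-insensitive, and a repeated header simply overwrites the earlier value in this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for the start line and header fields of one HTTP message. */
final class HttpMessageHead {
    final String startLine;
    final Map<String, String> headers;

    private HttpMessageHead(String startLine, Map<String, String> headers) {
        this.startLine = startLine;
        this.headers = headers;
    }

    /** Parses "start-line CRLF (header CRLF)* CRLF"; anything after the empty line is ignored. */
    static HttpMessageHead parse(String head) {
        String[] lines = head.split("\r\n");
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // the empty line ends the header section
            }
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue; // skip lines that are not "Name: value"
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            headers.put(name, value);
        }
        return new HttpMessageHead(lines[0], headers);
    }

    public static void main(String[] args) {
        HttpMessageHead head = parse("GET /form.html HTTP/1.1\r\nHost: www.guet.edu.cn\r\n\r\n");
        System.out.println(head.startLine + " -> " + head.headers);
    }
}
```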

1) General headers:

A few header fields apply to both request and response messages. They describe the message being transmitted rather than the entity it carries.

Cache-Control: specifies caching directives. Caching directives are unidirectional (a directive that appears in a response need not appear in the request) and independent (a directive in one message does not affect how another message is cached). The similar header field in HTTP/1.0 is Pragma.

Request directives include: no-cache (the request or response message must not be cached), no-store, max-age, max-stale, min-fresh, only-if-cached;
Response directives include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, and s-maxage.

Date: the date and time at which the message was generated.

Connection: lets the sender specify options for the connection. For example, it can ask for a persistent connection, or specify the "close" option to tell the server to close the connection once the response is complete.

2) Request headers:

Request headers let the client pass additional information about the request and about itself to the server. Common request headers are as follows:

Accept:

The Accept request header field specifies which types of information the client accepts. E.g.: Accept: image/gif indicates that the client wants resources in GIF image format; Accept: text/html indicates that the client wants HTML text.

Accept-Charset:

The Accept-Charset request header field specifies the character sets the client accepts. E.g.: Accept-Charset: iso-8859-1, gb2312. If this field is not set in the request message, any character set is acceptable by default.

Accept-Encoding:

The Accept-Encoding request header field is similar to Accept, but specifies the acceptable content encodings. E.g.: Accept-Encoding: gzip, deflate. If this field is not set in the request message, the server assumes that the client accepts all content encodings.

Accept-Language:

The Accept-Language request header field is similar to Accept, but specifies a natural language. E.g.: Accept-Language: zh-cn. If this field is not set in the request message, the server assumes that the client accepts all languages.

Authorization:

The Authorization request header field is used to prove that the client has the right to view a resource. When a browser accesses a page and the server responds with 401 (Unauthorized), the browser can send a request containing the Authorization request header field and ask the server to verify it.

Host (this header field is required when a request is sent):

The Host request header field specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL.

For example, enter http://www.guet.edu.cn/index.html in the browser. The request message sent by the browser contains the Host request header field, as follows:

Host: www.guet.edu.cn

The default port number 80 is used here. If a port number is specified, the field becomes Host: www.guet.edu.cn followed by a colon and the specified port number.

User-Agent:

When we log in to an online forum, we often see a welcome message that lists the name and version of our operating system and browser, which surprises many people. In fact, the server application obtains this information from the User-Agent request header field. The User-Agent request header field lets the client tell the server about its operating system, browser, and other attributes. This header field is not required, however. If we write our own browser and do not send the User-Agent request header field, the server cannot learn this information.

Example of a request message:

GET /form.html HTTP/1.1 (CRLF)
Accept: image/gif, image/x-xbitmap, image/jpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* (CRLF)
Accept-Language: zh-cn (CRLF)
Accept-Encoding: gzip, deflate (CRLF)
If-Modified-Since: Wed, 05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match: W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (CRLF)
Host: www.guet.edu.cn (CRLF)
Connection: Keep-Alive (CRLF)
(CRLF)

3) Response headers:

Response headers let the server pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by the Request-URI.

Common response headers:

Location:

The Location response header field redirects the recipient to a new location. It is often used when a domain name has changed.

Server:

The Server response header field contains information about the software the server uses to process requests. It is the counterpart of the User-Agent request header field. For example:

Server: Apache-Coyote/1.1

WWW-Authenticate:

The WWW-Authenticate response header field must be included in a 401 (Unauthorized) response message. When the client receives the 401 response and then sends a request containing the Authorization header field to ask the server to verify it, the server's response contains this header field. E.g.: WWW-Authenticate: Basic realm="Basic Auth Test!" // The server uses the Basic authentication mechanism for the requested resource.

4) Entity headers:

Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but the two do not always have to be sent together: a message may contain only the entity header fields. The entity headers define metadata about the entity body (e.g., whether there is an entity body) and about the resource identified by the request.

Common entity headers:

Content-Encoding:

The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates which additional content encodings have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is typically used to record the compression method of a document. E.g.: Content-Encoding: gzip

Content-Language:

The Content-Language entity header field describes the natural language used by the resource. If this field is not set, the entity content is assumed to be intended for readers of all languages. E.g.: Content-Language: da

Content-Length:

The Content-Length entity header field specifies the length of the entity body as a decimal number of bytes.

Content-Type:

The Content-Type entity header field specifies the media type of the entity body sent to the recipient. E.g.: Content-Type: text/html; charset=ISO-8859-1, Content-Type: text/html; charset=gb2312

Last-Modified:

The Last-Modified entity header field indicates the date and time at which the resource was last modified.

Expires:

The Expires entity header field specifies the date and time after which the response expires. So that a proxy server or browser refreshes its cache after a period of time (when a previously visited page is accessed again, it is loaded directly from the cache, which shortens the response time and reduces the server load), we can use the Expires entity header field to specify when the page expires. E.g.: Expires: Thu, 15 Sep 2006 16:23:12 GMT

HTTP/1.1 clients and caches must treat invalid date formats (including 0) as already expired. E.g.: to prevent the browser from caching a page, we can set the Expires entity header field to 0. In a JSP program: response.setDateHeader("Expires", 0);

1.3 Responses

After receiving and interpreting the request message, the server returns an HTTP response message. An HTTP response consists of three parts: the status line, the message headers, and the response body.

This section focuses on the status line. The status line format is as follows:

HTTP-Version Status-Code Reason-Phrase CRLF

Where:

HTTP-Version is the HTTP protocol version of the server;
Status-Code is the response status code sent back by the server;
Reason-Phrase is a text description of the status code.

The status code consists of three digits. The first digit defines the response category and has five possible values:

1xx: Informational. The request has been received and processing continues.
2xx: Success. The request has been successfully received, understood, and accepted.
3xx: Redirection. Further action is required to complete the request.
4xx: Client Error. The request contains a syntax error or cannot be fulfilled.
5xx: Server Error. The server failed to fulfill a valid request.

Common status codes, reason phrases, and descriptions:

200 OK // The client request succeeded.
400 Bad Request // The client request has a syntax error and cannot be understood by the server.
401 Unauthorized // The request is unauthorized. This status code must be used together with the WWW-Authenticate header field.
403 Forbidden // The server received the request but refuses to serve it.
404 Not Found // The requested resource does not exist, for example because an incorrect URL was entered.
500 Internal Server Error // An unexpected error occurred on the server.
503 Service Unavailable // The server cannot process the client's request at present and may recover after a period of time.

E.g.: HTTP/1.1 200 OK (CRLF)
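To make the status line format and the five categories concrete, here is a small sketch that parses a status line and maps the first digit of the status code to its category. StatusLine is a hypothetical class name used only for this illustration.

```java
/** Hypothetical value class for "HTTP-Version Status-Code Reason-Phrase CRLF". */
final class StatusLine {
    final String httpVersion;
    final int statusCode;
    final String reasonPhrase;

    private StatusLine(String httpVersion, int statusCode, String reasonPhrase) {
        this.httpVersion = httpVersion;
        this.statusCode = statusCode;
        this.reasonPhrase = reasonPhrase;
    }

    static StatusLine parse(String line) {
        // The reason phrase may itself contain spaces, so split into at most three parts.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }

    /** Maps the first digit of the status code to its response category. */
    String category() {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        StatusLine status = parse("HTTP/1.1 200 OK\r\n");
        System.out.println(status.statusCode + " " + status.reasonPhrase + " (" + status.category() + ")");
    }
}
```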



2. Downloading over HTTP

Now that we know the basic rules of the HTTP protocol, we can apply them to file downloads. This section describes how to download a file through HTTP.

2.1 File requests

Send the following request to the server:

GET /path/filename HTTP/1.0
Host: www.server.com:80
Accept: */*
User-Agent: GeneralDownloadApplication
Connection: close

Each line is terminated by a carriage return and line feed (CRLF), and one extra CRLF is appended to the end of the request.

The Host field gives the host name and port number. If the port is the default 80, it can be omitted.
*/* in the Accept field means that data of any type is accepted.
User-Agent identifies the user agent. This field is optional, but adding it is strongly recommended because servers rely on it for statistics, tracking, and client identification.
In the Connection field, close means that a non-persistent connection is used.
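The request above could be assembled and sent over a plain socket roughly as follows. This is only a sketch: DownloadRequest and its method names are hypothetical, error handling is minimal, and on Android the send call would have to run off the main thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds and sends the download request shown above. */
final class DownloadRequest {
    /** Builds the request text; every line ends with CRLF and an empty line ends the request. */
    static String build(String host, int port, String path) {
        // Port 80 is the default and may be left out of the Host field.
        String hostField = port == 80 ? host : host + ":" + port;
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + hostField + "\r\n"
                + "Accept: */*\r\n"
                + "User-Agent: GeneralDownloadApplication\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    /** Opens a connection and writes the request; the caller reads the response and closes the socket. */
    static Socket send(String host, int port, String path) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(build(host, port, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket if the write fails
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.print(build("www.server.com", 80, "/path/filename"));
    }
}
```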

2.2 Server response

If the server receives the request successfully and no error occurs, it returns data similar to the following:

HTTP/1.0 200 OK
Content-Length: 13057672
Content-Type: application/octet-stream
Last-Modified: Wed, 10 Oct 2005 00:56:34 GMT
Accept-Ranges: bytes
ETag: "2f38a6cac7cec51:160c"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Wed, 16 Nov 2005 01:57:54 GMT
Connection: close

Content-Length is an important field: it gives the length of the data returned by the server, not counting the HTTP headers. Because our request does not contain a Range field (discussed later), we are requesting the whole file, so Content-Length is the size of the entire file. The other fields describe attributes of the file and the server.

The returned headers also end with the line terminator of the last line plus one extra line terminator, that is, "\r\n\r\n". The file content follows immediately after "\r\n\r\n", so we can search for "\r\n\r\n" and, starting from the first byte after it, keep reading data and writing it to the file.
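Finding the end of the headers can be sketched as a simple byte search. HeaderBoundary is a hypothetical helper; it works on the bytes received so far, so it returns -1 while the terminator has not yet arrived.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that locates the first byte of the body in a raw HTTP response. */
final class HeaderBoundary {
    /**
     * Returns the index of the first body byte, or -1 if "\r\n\r\n" does not
     * occur within the first {@code length} bytes of {@code data}.
     */
    static int bodyOffset(byte[] data, int length) {
        for (int i = 0; i + 3 < length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] raw = "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nabc"
                .getBytes(StandardCharsets.US_ASCII);
        int offset = bodyOffset(raw, raw.length);
        System.out.println("Body starts at byte " + offset);
    }
}
```

In a real download loop the bytes from the returned offset onward would be written to the output file, and subsequent reads from the socket appended after them.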

2.3 Resumable download

Resuming an interrupted download is easy to implement: just add a Range field to the request. If a file contains 1000 bytes, its byte range is 0-999, and then:

Range: bytes=500- reads the file from byte 500 to the end, 500 bytes in total.
Range: bytes=500-599 reads bytes 500 through 599 of the file, 100 bytes in total.

Range can be written in several other ways, but these two are the most common and are sufficient for resuming downloads. If the HTTP request contains a Range field and the server supports ranges, the server returns 206 (Partial Content), and the HTTP headers also contain a corresponding Content-Range field, similar to the following:

Content-Range: bytes 500-999/1000

The Content-Range field gives the range of the file that the server returned and the total length of the file. In this case the Content-Length field is not the size of the entire file but the number of bytes in the returned range. Pay attention to this point.
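A minimal sketch of both directions: building the Range value for a resumed download and reading the Content-Range value that comes back. ByteRange is a hypothetical class, and the parser only handles the "bytes first-last/total" form shown above (a total of "*" is not supported here).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class for a "Content-Range: bytes first-last/total" header value. */
final class ByteRange {
    private static final Pattern CONTENT_RANGE =
            Pattern.compile("bytes\\s+(\\d+)-(\\d+)/(\\d+)");

    final long first;
    final long last;
    final long total;

    private ByteRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    /** Range value that asks for everything from the given offset to the end of the file. */
    static String resumeHeader(long alreadyDownloaded) {
        return "bytes=" + alreadyDownloaded + "-";
    }

    static ByteRange parseContentRange(String value) {
        Matcher m = CONTENT_RANGE.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported Content-Range: " + value);
        }
        return new ByteRange(Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)));
    }

    /** Number of bytes in the range; this should match Content-Length of the 206 response. */
    long length() {
        return last - first + 1;
    }

    public static void main(String[] args) {
        ByteRange range = parseContentRange("bytes 500-999/1000");
        System.out.println(resumeHeader(500) + " -> " + range.length() + " of " + range.total);
    }
}
```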

2.4 Redirection

Many software download sites route their file download links through a redirect script. For example, the download address of ACDSee on pchome is:

http://download.pchome.net/php/tdownload2.php?SID=5547&url=/multimedia/Viewer/acdc31sr1b051007.exe&SVR=1&typ=0

This address does not identify the location of the file directly; instead, a program redirects the request. If you request such a URL, the server returns 302 (Moved Temporarily), which means the request must be redirected, and the HTTP headers contain a Location field whose value is the destination URL. In this case, close the current connection and send a new request to the server named in the redirect.
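Handling the Location field can be sketched with java.net.URI, which also copes with a relative Location value. RedirectResolver is a hypothetical helper, the example.com addresses are placeholders, and the set of status codes treated as redirects is an assumption of this sketch.

```java
import java.net.URI;

/** Hypothetical helper for following an HTTP redirect by hand. */
final class RedirectResolver {
    /** Status codes this sketch treats as "request the Location URL instead". */
    static boolean isRedirect(int statusCode) {
        return statusCode == 301 || statusCode == 302
                || statusCode == 303 || statusCode == 307;
    }

    /** Resolves the Location value against the URL that was originally requested. */
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        // Placeholder URLs, not taken from the article.
        String next = resolve("http://download.example.com/php/get.php?id=1", "/files/a.exe");
        System.out.println(isRedirect(302) + " " + next);
    }
}
```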

3. HttpClient

Although the JDK's java.net package provides basic functions for accessing the HTTP protocol, for most applications the functions provided by the JDK library itself are not rich or flexible enough. HttpClient is a subproject of Apache Jakarta Commons. It provides an efficient, up-to-date, feature-rich client-side programming toolkit for the HTTP protocol, and it supports the latest versions and recommendations of the protocol. HttpClient is already used in many projects; for example, two other open-source projects under Apache Jakarta, Cactus and HtmlUnit, both use it. The HttpClient project is very active and has many users. Currently, the HttpClient version is 3.0 RC4, released in 5.10.11.

The main functions of HttpClient are as follows:

1) Implements all HTTP methods (GET, POST, PUT, HEAD, etc.);
2) Supports automatic redirection;
3) Supports HTTPS;
4) Supports proxy servers.

3.1 Environment setup and required packages

A Java development environment (JDK) is required, along with network access. An Android program must declare the "android.permission.INTERNET" permission.

Required packages:

1. commons-httpclient-3.1.jar: contains the classes required for the HTTP protocol.
2. commons-logging-1.1.jar: contains classes that record activity logs while the program is running.
3. commons-codec-1.3.jar: contains encoding and decoding classes.

These packages are open-source Apache projects. You can find them at http://www.apache.org.

3.2 Basic HTTP communication with HttpClient

Before performing any operation, you must first instantiate an HttpClient, that is, initialize a client.

HttpClient client = new HttpClient();

3.2.1 Request

Take a GET request as an example.

A. Instantiate a request method.

HttpMethod method = new GetMethod("http://www.google.cn");

Note:

① Although Google has moved its servers out of mainland China, HttpClient can redirect automatically. When the status code returned by the server is 3xx, it automatically follows the redirect until it finds the actual location of the file.

② The string passed to the GetMethod constructor is the URI of the file. A complete URL is required here because no server host has been specified. You can also do this:

client.getHostConfiguration().setHost("www.imobile.com.cn", 80, "http");

......

HttpMethod method = new GetMethod("/simcard.php?simcard=1330227");


B. Add the desired message headers.

method.addRequestHeader("Range", "bytes=500-");

HttpClient constructs the necessary message headers itself. If you have no special requirements, you do not need to change them. If you do need special information in the headers, for example to resume a download, you can add it as shown above.

C. Send the request (execute the method).

int statusCode = client.executeMethod(method);

At this point the program actually sends the request to the server. Once the connection succeeds, the function returns, and its return value is the status code.

3.2.2 Response

A. Status code.

In the preceding example, statusCode is the status code. You can also obtain it with:

int statusCode = method.getStatusCode();

Note: the HttpClient package contains an HttpStatus class that defines most status codes. For example:

HttpStatus.SC_OK
HttpStatus.SC_FORBIDDEN

B. Response headers.

Header[] headers = method.getResponseHeaders();

Returns all response headers sent by the server.

Header header = method.getResponseHeader("Content-Type");

Returns the response header with the specified name.

You can call header.getName() and header.getValue() to obtain the name and value.

C. Response body.

byte[] bytes = method.getResponseBody();

InputStream inputStream = method.getResponseBodyAsStream();

String string = method.getResponseBodyAsString();

Choose whichever of these three methods suits your needs.

3.2.3 Disconnect

method.releaseConnection();

Releases the connection.

3.2.4 Others

This section covers things that are unrelated to downloading but are basic and useful.

A. POST data.

A POST request is much the same as a GET request. The only thing to note is how to add the data you want to transmit to the POST body.

PostMethod.setRequestBody(InputStream body);

PostMethod.setRequestBody(NameValuePair[] parametersBody);

PostMethod.setRequestBody(String body);

B. Proxy server.

You only need to set the proxy on the HttpClient instance. All operations performed through this instance then go through the proxy.

httpClient.getHostConfiguration().setProxy(hostName, port);


C. Character encoding.

The encoding of a target page may be declared in two places:

The first is the HTTP headers returned by the server (the Content-Type and Content-Encoding fields of the response headers);

The other is the HTML/XML page itself. For example:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>
Or <?xml version="1.0" encoding="gb2312"?>
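One possible way to pick the character set out of a Content-Type value (from the header or from the meta tag's content attribute) is a small regular expression. CharsetSniffer and the fallback parameter are hypothetical names for this sketch, which does not cover the XML encoding declaration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the charset parameter from a Content-Type value. */
final class CharsetSniffer {
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset name, or the fallback when none is declared. */
    static String fromContentType(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=gb2312", "ISO-8859-1"));
    }
}
```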

D. Automatic redirection.

HttpClient can automatically redirect GET requests. However, automatic redirection is not supported for requests such as POST and PUT, which require user intervention before they are followed.

When the status code returned by the server is 3xx, you need to follow the address in the Location field of the message headers yourself. Note that this address may be relative, in which case you must resolve it yourself.

Another possibility is an in-page redirect. For example, in HTML: <meta http-equiv="refresh" content="5; url=http://www.ibm.com/us">.

E. HTTPS protocol.

See: HttpClient getting started.


References

Most of this article is excerpted, referenced, and summarized from online materials. I would like to thank the authors of those articles for sharing them.

1. HTTP protocol (favorites)
2. File Download Principles 1: HTTP protocol
3. HttpClient getting started
4. HttpClient getting started tutorial
