## RCurl Author ##
Duncan Temple Lang
Associate Professor at the University of California, Davis (UC Davis), USA
His research explores information technology through statistical integration
Overview of RCurl
The RCurl package is an R interface to the libcurl library that provides HTTP
facilities. This allows us to download files from Web servers, post forms, use
HTTPS (the secure HTTP), use persistent connections, upload files, use binary
content, handle redirects, password authentication, etc.
In other words, RCurl provides an interface from R to the libcurl library that implements a range of HTTP functions: for example,
downloading files from a server, keeping connections alive, uploading files, reading binary content, handling redirects, password authentication, and so on.
What are curl and libcurl
– curl: an open-source file transfer tool that works with URL syntax from the command line
– The library behind curl is libcurl.
Functions
– Fetching pages
– Authentication
– Upload and download
– Information retrieval
– ......
HTTP protocol
A protocol is the set of rules that two computers in a communication network must follow when they communicate. The Hypertext Transfer Protocol (HTTP) is a communication protocol that allows Hypertext Markup Language (HTML) documents to be transferred from a Web server to a client's browser.
The version we currently use is HTTP/1.1.
1. URL details
Basic format: scheme://host[:port#]/path/.../[?query-string][#anchor]
scheme: the underlying protocol to use (for example HTTP, HTTPS, FTP)
host: the IP address or domain name of the HTTP server
port#: the default port of an HTTP server is 80, in which case the port number can be omitted
path: the path to the resource being accessed
query-string: data sent to the HTTP server
anchor: an anchor (a position within the page)
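The URL components listed above can be pulled apart programmatically. The following is a minimal sketch in Python rather than R, using the standard library's `urllib.parse`. The URL and the helper name `split_url` are made up for illustration, and the fallback to port 80 simply mirrors the default for plain HTTP mentioned above.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url):
    """Break a URL into scheme, host, port, path, query string, and anchor."""
    parts = urlsplit(url)
    # Fall back to port 80 only for plain HTTP when the URL omits the port.
    port = parts.port
    if port is None and parts.scheme == "http":
        port = 80
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        # parse_qs maps each query key to a list of its values.
        "query": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


# Hypothetical URL, used only for illustration.
info = split_url("http://www.example.com/docs/index.html?lang=en&page=2#intro")
print(info)
```

Each query-string key maps to a list because the same key may legally appear more than once in a URL.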
2. Request
A request consists of a request line, request headers, and a message body. The request line has three parts:
Method: the request method, such as "GET", "POST", "HEAD", "PUT" and so on
Path-to-resource: the requested resource
HTTP/version-number: the version number of the HTTP protocol
Request headers
• Host: the server address
• Accept: the media types the browser can accept, e.g. text/html
• Accept-Encoding: the encodings the browser can accept, usually meaning compression methods
• Accept-Language: the languages the browser declares it can accept
• User-Agent: tells the server the client's operating system and browser version
• Cookie: the most important component of the request headers; data stored on the user's local terminal (usually encrypted) in order to identify the user and perform session tracking
• Referer: the page from which the request was reached
• Connection: the state of the client-server connection
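To make the layout of a request concrete, here is a small Python sketch that assembles the raw text of an HTTP/1.1 request from a request line and a set of headers. The helper `build_request`, the host name, and the header values are all hypothetical; a real client such as libcurl builds this message for you.

```python
def build_request(method, path, headers, body=""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line (CRLF CRLF) separates the headers from the possibly empty body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


# Illustrative values only; none of them come from a real session.
request = build_request(
    "GET",
    "/index.html",
    {
        "Host": "www.example.com",
        "Accept": "text/html",
        "Accept-Encoding": "gzip",
        "Accept-Language": "en",
        "User-Agent": "example-client/0.1",
        "Connection": "keep-alive",
    },
)
print(request)
```

The first printed line is the request line (method, path to resource, HTTP version), followed by one line per header.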
3. Response
A response consists of a status line, message headers, and a response body. The status line has these parts:
HTTP/version-number: the version number of the HTTP protocol
Status-code and message: the status code and the status message
Status-code (status code)
• The status code tells the HTTP client whether the HTTP server produced the expected response.
• HTTP/1.1 defines 5 classes of status codes. A status code consists of three digits, and the first digit defines the class of the response:
– 1xx Informational: the request has been received and processing continues
– 2xx Success: the request was successfully received, understood, and accepted
– 3xx Redirection: further action must be taken to complete the request
– 4xx Client error: the request contains a syntax error or cannot be fulfilled
– 5xx Server error: the server failed to fulfil a valid request
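The rule that the first digit selects the response class can be sketched in a few lines of Python. The function names `parse_status_line` and `status_class` are invented for this example, and the sample status line is written by hand rather than taken from a real server.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its three parts."""
    parts = line.strip().split(" ", 2)
    # The status message may be missing, so default it to an empty string.
    message = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), message


def status_class(code):
    """Return the class selected by the first digit of a three-digit status code."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


version, code, message = parse_status_line("HTTP/1.1 404 Not Found")
print(version, code, message, "->", status_class(code))
```

Integer division by 100 isolates the first digit, so any code from 400 to 499 falls into the client error class.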
Message headers
• Server: software information of the server, such as Nginx
• Date: the date of the response
• Last-Modified: the time of last modification
• Content-Type: the server tells the browser the type of the object in the response, e.g. text/html
• Connection: whether the server and client keep the connection open
• X-Powered-By: indicates what technology the site was developed with, such as PHP
• Content-Length: the length in bytes of the content returned for the request
• Set-Cookie: the most important response header, used to send a cookie to the browser; each cookie written generates one Set-Cookie header
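Because each cookie produces its own Set-Cookie line, a header parser has to keep repeated names rather than overwrite them. The Python sketch below shows one way to do that. The parser `parse_headers` is a hand-written illustration, not part of RCurl or libcurl, and the header block is invented sample data.

```python
def parse_headers(raw_headers):
    """Parse 'Name: value' lines into a dict of lowercase names to lists of values."""
    headers = {}
    for line in raw_headers.split("\r\n"):
        if not line:
            continue
        # Split at the first colon only; values such as dates contain colons too.
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


# Hypothetical response headers, written by hand for illustration.
raw = (
    "Server: nginx\r\n"
    "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1024\r\n"
    "Connection: keep-alive\r\n"
    "Set-Cookie: sessionid=abc123; Path=/\r\n"
    "Set-Cookie: theme=dark\r\n"
)
headers = parse_headers(raw)
print(headers["set-cookie"])
```

Header names are lowercased because HTTP header names are case-insensitive, and every value is stored in a list so that both Set-Cookie lines survive.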
Three major functions of RCurl
getURL()
getForm()
postForm()
R language web crawling: RCurl