R Language Crawlers: RCurl

Source: Internet
Author: User

## RCurl Author ##
Duncan Temple Lang
Associate professor at the University of California, Davis (UC Davis), USA
Research interest: integrating information technology with statistics

Overview of RCurl

The RCurl package is an R interface to the libcurl library that provides HTTP
facilities. This allows us to download files from Web servers, post forms, use
HTTPS (the secure HTTP), use persistent connections, upload files, use binary
content, handle redirects, password authentication, etc.

In other words, RCurl lets R code act as an HTTP client through libcurl: it can download files from a server, keep connections open, upload files, read binary content, follow redirects, authenticate with a password, and so on.

What are curl and libcurl?
– curl: an open-source command-line file transfer tool that works with URL syntax
– libcurl: the library behind curl

Functions
– Fetching pages
– Authentication
– Uploading and downloading
– Information retrieval
– ......

HTTP protocol

A protocol is the set of rules that two computers in a communication network must follow when they talk to each other. The Hypertext Transfer Protocol (HTTP) is the communication protocol that allows Hypertext Markup Language (HTML) documents to be transferred from a Web server to a client's browser.

The version discussed here is HTTP/1.1.

1. URL details
Basic format: scheme://host[:port#]/path/.../[?query-string][#anchor]
– scheme: the protocol used by the lower layer (for example HTTP, HTTPS, FTP)
– host: the IP address or domain name of the HTTP server
– port#: the port of the HTTP server; the default is 80, in which case it can be omitted
– path: the path to the resource being accessed
– query-string: the data sent to the HTTP server
– anchor: an anchor (a named position) within the page
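
The URL format above can be illustrated by splitting a URL into its parts. The following is a minimal sketch in base R, not part of RCurl; the function name `parse_url` and the example URL are assumptions made for illustration, and the regular expression is adapted from the generic URI pattern in RFC 3986.

```r
# Split a URL into the components named above, using base R only.
# parse_url() is a hypothetical helper, not an RCurl function.
parse_url <- function(url) {
  # Generic URI pattern adapted from RFC 3986, Appendix B.
  pattern <- "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?$"
  m <- regmatches(url, regexec(pattern, url))[[1]]
  authority <- m[5]
  # Separate an optional ":port" suffix from the host.
  has_port <- grepl(":[0-9]+$", authority)
  host <- sub(":[0-9]+$", "", authority)
  port <- if (has_port) sub("^.*:", "", authority) else NA_character_
  list(scheme = m[3], host = host, port = port,
       path = m[6], query = m[8], anchor = m[10])
}

# Example URL (made up for illustration).
parts <- parse_url("http://www.example.com:8080/docs/index.html?lang=en&page=2#intro")
str(parts)
```

When the port is omitted, `port` is returned as `NA` and the client would fall back to the default for the scheme (80 for HTTP).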
2. Request
A request consists of a request line, request headers, and a message body.

The request line has three fields: Method, Path-to-resource, and HTTP/version-number.
– Method: the request method, such as "GET", "POST", "HEAD", or "PUT"
– Path-to-resource: the requested resource
– HTTP/version-number: the version number of the HTTP protocol

Request headers
– Host: the server address
– Accept: the media types the browser can accept, e.g. text/html
– Accept-Encoding: the encodings the browser can receive, usually meaning the compression method
– Accept-Language: the languages the browser declares it can receive
– User-Agent: tells the server the client's operating system and browser version
– Cookie: the most important request header; data stored on the user's local terminal (usually encrypted) in order to identify the user and track the session
– Referer: the page from which the request was reached
– Connection: the status of the connection between client and server
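
To make the layout of a request concrete, the sketch below assembles the text of an HTTP/1.1 request from a request line, headers, and an optional body. It is an illustration in base R only; `build_request` and the header values are hypothetical, and in practice libcurl builds this message for you.

```r
# Assemble the raw text of an HTTP/1.1 request.
# build_request() is a hypothetical helper for illustration only.
build_request <- function(method, path, headers, body = "") {
  # Request line: Method, Path-to-resource, HTTP/version-number.
  request_line <- paste(method, path, "HTTP/1.1")
  # One "Name: value" line per header.
  header_lines <- paste0(names(headers), ": ", unname(headers))
  # Lines end with CRLF; an empty line separates headers from the body.
  paste0(paste(c(request_line, header_lines), collapse = "\r\n"),
         "\r\n\r\n", body)
}

req <- build_request(
  "GET", "/index.html",
  c(Host = "www.example.com", Accept = "text/html", Connection = "keep-alive")
)
cat(req)
```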
3. Response
A response consists of a status line, message headers, and a response body.

The status line has three fields: HTTP/version-number, Status-code, and message.
– HTTP/version-number: the version number of the HTTP protocol
– Status-code and message: the status code and its status text

Status code
– The status code tells the HTTP client whether the HTTP server produced the expected response.
– HTTP/1.1 defines five classes of status code. A status code consists of three digits, and the first digit defines the class of the response.

– 1xx Informational: the request has been received and processing continues
– 2xx Success: the request was successfully received, understood, and accepted
– 3xx Redirect: further action is needed to complete the request
– 4xx Client error: the request has a syntax error or cannot be fulfilled
– 5xx Server error: the server failed to fulfil a valid request
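
Because the first digit alone decides the class, a status line can be parsed and classified with a few lines of base R. This is a minimal sketch; `parse_status_line`, `status_class`, and the class labels are names chosen here for illustration.

```r
# Split a status line such as "HTTP/1.1 404 Not Found" into its three fields.
parse_status_line <- function(line) {
  m <- regmatches(line, regexec("^HTTP/([0-9.]+) ([0-9]{3}) ?(.*)$", line))[[1]]
  list(version = m[2], code = as.integer(m[3]), message = m[4])
}

# Map status codes to their class using the first digit only.
status_class <- function(code) {
  classes <- c("1" = "informational", "2" = "success", "3" = "redirect",
               "4" = "client error", "5" = "server error")
  # Integer division by 100 extracts the first digit; unknown digits give NA.
  unname(classes[as.character(code %/% 100)])
}

status <- parse_status_line("HTTP/1.1 404 Not Found")
status_class(status$code)
status_class(c(200, 301, 503))
```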

Message headers
– Server: the server software, such as Nginx
– Date: the date of the response
– Last-Modified: the time the resource was last modified
– Content-Type: the type of object the server is returning to the browser, e.g. text/html
– Connection: whether the server and client keep the connection open
– X-Powered-By: the technology the site was built with, such as PHP
– Content-Length: the length in bytes of the returned content
– Set-Cookie: the most important response header; it sends a cookie to the browser, and each cookie written generates its own Set-Cookie header
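
The headers arrive as plain "Name: value" lines, so they are easy to turn into a lookup structure. The sketch below does this in base R; `parse_headers` is a hypothetical helper and the header block is made up for illustration. Note how two cookies produce two Set-Cookie entries.

```r
# Turn a raw header block into a named list of values.
# parse_headers() is a hypothetical helper for illustration only.
parse_headers <- function(raw) {
  lines <- strsplit(raw, "\r\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]
  # Split each line at its first colon.
  name  <- sub(":.*$", "", lines)
  value <- trimws(sub("^[^:]*:", "", lines))
  # Headers that repeat (such as Set-Cookie) become a vector of values.
  split(value, name)
}

# Example header block (made up).
raw <- paste(c("Server: nginx",
               "Content-Type: text/html",
               "Content-Length: 1024",
               "Set-Cookie: session=abc123",
               "Set-Cookie: theme=dark"),
             collapse = "\r\n")
headers <- parse_headers(raw)
headers[["Set-Cookie"]]
```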

Three major functions of RCurl

getURL()
getForm()
postForm()
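
As a rough sketch of how the three functions are typically called, the wrapper below issues one request with each. It assumes the RCurl package is installed and that network access is available; the function name `rcurl_demo`, the httpbin.org test endpoints, and the form field names are assumptions for illustration and do not come from the article.

```r
# Call each of the three main RCurl functions once.
# rcurl_demo() and the httpbin.org URLs are assumptions for illustration.
rcurl_demo <- function(base = "http://httpbin.org") {
  list(
    # getURL(): fetch a page and return its content as a character string.
    page = RCurl::getURL(paste0(base, "/html")),
    # getForm(): GET request; named arguments become the query string.
    get = RCurl::getForm(paste0(base, "/get"), q = "RCurl", lang = "en"),
    # postForm(): POST request; named arguments are sent in the request body.
    post = RCurl::postForm(paste0(base, "/post"), q = "RCurl", lang = "en",
                           style = "POST")
  )
}

# Usage (needs RCurl and a network connection):
# res <- rcurl_demo()
# substr(res$page, 1, 200)
```

The calls are wrapped in a function so that nothing is sent until `rcurl_demo()` is invoked explicitly.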
