"Perl" LWP module

Source: Internet
Author: User



LWP (short for "Library for WWW in Perl") is a group of modules used to fetch data from the web.



LWP::Simple



By calling the get($url) function, you can fetch the content of the given URL. If no error occurs, get returns the content of the page; otherwise it returns undef.
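As a minimal sketch of the defined-or-undef check described above: fetch_length is a hypothetical helper name, not part of LWP::Simple, and the script only makes a request when a URL is passed on the command line.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_length is a hypothetical helper: it returns the length of the
# fetched content, or undef when get() could not retrieve the URL.
sub fetch_length {
    my ($url) = @_;
    my $content = get($url);
    return unless defined $content;   # get() gives no error details
    return length($content);
}

# Only makes a request when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $len = fetch_length($url);
    print defined $len
        ? "Fetched $len characters from $url\n"
        : "Could not fetch $url\n";
}
```

Because get returns only undef on failure, the caller cannot tell a 404 from a network timeout here.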



LWP::Simple is convenient for simple tasks. However, it does not support cookies, user authentication, or editing the HTTP request headers, and it generally cannot read the HTTP response headers (most importantly the HTTP error message). When these features are required, use the LWP class model instead.



LWP::UserAgent and HTTP::Response




You only need one $browser object (an LWP::UserAgent instance) in a program, but each time you send a request you get a new HTTP::Response object. An HTTP::Response object has the following useful properties:
A status code indicating success or failure. You can test it with $response->is_success.
An HTTP status line (the HTTP status description), available from $response->status_line.
A MIME content type (file type), available from $response->content_type, such as "text/html", "image/gif", or "application/xml".
The content of the response, stored in $response->content. For an HTML page this is the HTML source; for a GIF, $response->content is the binary GIF data.
Many other methods are documented in HTTP::Response and its superclass HTTP::Message, which also passes header methods through to HTTP::Headers.
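The properties listed above can be read like this. This is a sketch only: summarize_response is a hypothetical helper name, and a request is sent only when a URL is given on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# One user agent can be reused for every request in the program.
my $browser = LWP::UserAgent->new;

# summarize_response is a hypothetical helper that reads the properties
# described above from any HTTP::Response object.
sub summarize_response {
    my ($response) = @_;
    return 'Request failed: ' . $response->status_line
        unless $response->is_success;
    my $type   = $response->content_type || 'unknown';
    my $length = length($response->content);
    return $response->status_line . " | $type | $length bytes";
}

# Only makes a request when a URL is given on the command line.
if (my $url = shift @ARGV) {
    print summarize_response($browser->get($url)), "\n";
}
```

Checking is_success first keeps the error path separate, and status_line then carries the error message that LWP::Simple does not expose.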


Adding additional HTTP request headers
The basic request is $response = $browser->get($url), but if necessary you can add other HTTP headers to the request by passing a list of key/value pairs after $url, like this:
    $response = $browser->get($url, $key1, $value1, $key2, $value2, ...);

For example, if you want to send a request to a website that only accepts connections from Netscape browsers, you need to send headers like the ones Netscape sends, as follows:
my @ns_headers = (
 'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
 'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*',
 'Accept-Charset' => 'iso-8859-1,*,utf-8',
 'Accept-Language' => 'en-US',
);
$response = $browser->get($url, @ns_headers);
If you only want to change the User-Agent, you can replace the default agent string 'libwww-perl/5.65' (or similar) through the agent method of LWP::UserAgent:
$browser->agent('Mozilla/4.76 [en] (Win98; U)');
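One way to check which headers a request would carry is to intercept it before it leaves the program. The sketch below assumes that a request_send handler, registered with add_handler, is acceptable for this purpose. It returns a canned response so the example runs offline. The example.com URL is a placeholder.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $browser = LWP::UserAgent->new;
$browser->agent('Mozilla/4.76 [en] (Win98; U)');

# Assumption for this sketch: capture the outgoing request and answer it
# locally. Remove this handler to talk to a real server.
my $sent;
$browser->add_handler(
    request_send => sub {
        my ($request) = @_;
        $sent = $request;                        # what would be sent
        return HTTP::Response->new(200, 'OK');   # canned reply, no network
    },
);

# Extra headers are passed as key/value pairs after the URL.
my $response = $browser->get(
    'http://example.com/',
    'Accept-Language' => 'en-US',
);

print 'User-Agent: ',      $sent->header('User-Agent'),      "\n";
print 'Accept-Language: ', $sent->header('Accept-Language'), "\n";
```

The User-Agent comes from the agent method, while Accept-Language applies only to this one call.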

Using cookies
By default, an LWP::UserAgent object behaves like a browser that does not support cookies. There is more than one way to set its cookie_jar attribute to enable cookie support. A "cookie jar" is a container for storing HTTP cookies. You can keep it on disk (the way Netscape uses cookies.txt) or in memory. Cookies kept in memory disappear when the program finishes.
To keep cookies in memory: $browser->cookie_jar({}); You can also store cookies in a file on disk:

use HTTP::Cookies;
my $cookies = HTTP::Cookies->new(
    'file' => '/some/where/cookies.lwp',  # where the cookies are stored
    'autosave' => 1,                      # save to disk automatically when done
);
$browser->cookie_jar($cookies);
The cookies in that file are stored in an LWP-specific format. If you want to use a Netscape cookie file instead, you can use the HTTP::Cookies::Netscape class:






use HTTP::Cookies; # yes, loads HTTP::Cookies::Netscape too
my $ns_cookies = HTTP::Cookies::Netscape->new(
    'file' => 'c:/Program Files/Netscape/Users/DIR-NAME-HERE/cookies.txt',
        # where to read cookies
);
$browser->cookie_jar($ns_cookies);
You can also pass 'autosave' => 1 as above. However, at least at the time of writing, Netscape may discard some of the cookies before they are written back to disk.
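To see what a cookie jar does, the sketch below fills an in-memory HTTP::Cookies jar by hand and asks it which cookies apply to a request. The cookie name, value, and host are made up for illustration. In normal use the jar is filled from the server's Set-Cookie headers once it is attached with cookie_jar.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

# An in-memory jar; nothing is written to disk.
my $jar = HTTP::Cookies->new;

# Store one cookie by hand (illustrative values only).
$jar->set_cookie(
    0,                  # version
    'session',          # key
    'abc123',           # value
    '/',                # path
    'www.example.com',  # domain
    undef,              # port (any)
    1,                  # path_spec
    0,                  # secure
    3600,               # max-age in seconds
    0,                  # discard
);

# Ask the jar to add the matching cookies to a request for that host.
my $request = HTTP::Request->new(GET => 'http://www.example.com/profile');
$jar->add_cookie_header($request);

print 'Cookie: ', $request->header('Cookie'), "\n";
```

A request to a different host would get no Cookie header from this jar, which is the same matching a browser performs.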




Submitting a form via POST
Most HTML forms use HTTP POST to submit data to the server. You can do the same like this:
$response = $browser->post($url,
  [
   formkey1 => value1,
   formkey2 => value2,
   ...
  ],
);
You can also send HTTP headers along with the form:
$response = $browser->post($url,
  [
   formkey1 => value1,
   formkey2 => value2,
   ...
  ],
  headerkey1 => value1,
  headerkey2 => value2,
);
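To see how the form fields and extra headers end up in the request, the sketch below builds a POST request with HTTP::Request::Common, the module that $browser->post uses to construct its request, and prints it without sending anything. The URL, field names, and values are placeholders.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Build (but do not send) a form submission.
my $request = POST(
    'http://www.example.com/search',
    [
        name  => 'Alice',
        topic => 'perl lwp',
    ],
    'Accept-Language' => 'en-US',    # extra header after the form fields
);

print $request->method, ' ', $request->uri, "\n";
print 'Content-Type: ', $request->content_type, "\n";
print 'Body: ', $request->content, "\n";
```

The form is passed as an array reference rather than a hash reference, so the fields keep their order in the request body.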


Using LWP to access Renren: Renren only shows user information after login, so a crawler has to simulate logging in and keep the resulting cookies. The LWP module can do this:





#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Cookies;
use LWP::UserAgent;
my $url = 'http://passport.renren.com/PLogin.do';
# Cookie jar used to store cookies
my $cookie_jar = HTTP::Cookies->new(
    file     => "./acookies.lwp",
    autosave => 1,
);
# Give the cookie jar to LWP::UserAgent so that it handles cookies
# Log in
my $browser = LWP::UserAgent->new;
my $cookies = $browser->cookie_jar($cookie_jar);
$browser->agent('Mozilla/9 [en] (Centos; Linux)');
my $res = $browser->post($url,
    [
        email    => 'XXXX',
        password => 'XXXXXX',
        origURL  => 'http://www.renren.com/home',
        domain   => 'renren.com',
    ],
);
# Now the friends pages can be accessed
# $res = $browser->get('http://www.renren.com/home.do');
$res = $browser->get('http://www.renren.com/235018505?pma=p_profile_m_pub_friendslist_a_profile');
print $res->content();


Reference: http://blog.mcshell.org/2012/03/19/perl-lwp-simple-use.html