Introducing Mojo::DOM, the Mojolicious DOM selector, and Mojo::UserAgent (compared with Web::Scraper)

Source: Internet
Author: User

Recently I needed to do some page analysis. Until now I had always used AnyEvent::HTTP and Web::Scraper; this time I tried Mojo::DOM and Mojo::UserAgent.
First of all, my conclusion from this trial: if the program is not part of a web application and only does page analysis or file processing, the AnyEvent::HTTP and Web::Scraper combination is still a good choice. Otherwise, Mojo is worth considering.

First, the advantages of Mojo::DOM and Mojo::UserAgent.
The DOM selector provided by Mojo::DOM, which works with CSS selectors, is very handy at times.
Once the HTML has been read in, you can pinpoint exactly the elements you need or iterate over them.

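    use Mojo::Base -strict;    # enables strict, warnings, and say
    use Mojo::DOM;
    my $dom = Mojo::DOM->new($html_string);    # $html_string holds the page source
    $dom->find('p[id]')->each(sub { say shift->{id} });    # print the id of every <p> that has one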
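
To make the "locate or iterate" idea concrete, here is a minimal, self-contained sketch. The HTML fragment and the variable names are made up for illustration; at, find, map, each, and text are the Mojo::DOM and Mojo::Collection calls being shown.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical page fragment, used only for this illustration
my $html = <<'HTML';
<div>
  <p id="intro">Hello</p>
  <p>No id here</p>
  <p id="outro">Bye</p>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# at() returns the first element matching the CSS selector (or undef)
my $first_text = $dom->at('p[id]')->text;

# find() returns a Mojo::Collection; map() calls attr('id') on each element
my @ids = $dom->find('p[id]')->map(attr => 'id')->each;

print "$first_text\n";
print join(',', @ids), "\n";
```

Because at returns undef when nothing matches, real code would normally check the result before calling text on it.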

It is even more convenient in combination with Mojo::UserAgent. Mojo::UserAgent is feature-rich, but if you don't need all of that you can simply use it like wget, as a plain HTTP client. It can fetch pages with both blocking (synchronous) and non-blocking GET requests, and it is well integrated with Mojo::DOM. For example:

    use Mojo::UserAgent;
    my $ua = Mojo::UserAgent->new;
    my $tx = $ua->get('http://www.example.com/');    # blocking GET, returns a transaction
    my $title = $tx->res->dom->at('head title')->text;
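
As a rough sketch of the blocking and non-blocking styles side by side, the example below serves a tiny page from an in-process Mojolicious::Lite app, so that it runs without network access. That app, its route, and the "Demo page" title are assumptions added for illustration; in real use you would pass an absolute URL such as http://www.example.com/ to get instead of the relative '/'.

```perl
use strict;
use warnings;
use Mojolicious::Lite;
use Mojo::UserAgent;
use Mojo::IOLoop;

# Hypothetical in-process page so the sketch needs no network access
get '/' => {text => '<html><head><title>Demo page</title></head><body></body></html>'};

my $ua = Mojo::UserAgent->new;
$ua->server->app(app);    # relative URLs are answered by the app above

# Blocking: get() returns the finished transaction
my $blocking_title = $ua->get('/')->res->dom->at('head title')->text;

# Non-blocking: get() returns immediately, the callback runs once the
# event loop has received the response
my $nonblocking_title;
$ua->get('/' => sub {
    my ($ua, $tx) = @_;
    $nonblocking_title = $tx->res->dom->at('head title')->text;
    Mojo::IOLoop->stop;
});
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;

print "$blocking_title\n$nonblocking_title\n";
```

The non-blocking form only does anything while the event loop is running, which is why the sketch starts Mojo::IOLoop explicitly and stops it from inside the callback.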

It gets even better when you use all of this inside the Mojolicious web framework: everything was written by the same author, so the integration is very good. Work that used to take far more code is now done in two or three lines.

All of the above looks very good. Now for some shortcomings, in my opinion.
1. XPath is not supported.
I am used to XPath, but unfortunately Mojo::DOM does not support it. Many things can be done the Mojo way, but I can still name things I used to do with XPath that I could not reproduce. I also suspect that performance is considerably worse for the same reason. Web::Scraper uses XPath and can parse HTML/XML with XML::LibXML, and XML::LibXML is the fastest of the DOM parsers (libxml2 > Expat). So I think a DOM selector written in pure Perl, without XPath, is not efficient enough for large-scale data analysis. (This is only a guess.)
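
For readers coming from XPath, the following sketch shows a few rough CSS counterparts in Mojo::DOM. The HTML and the chosen XPath expressions are my own illustrative assumptions, and the equivalences are approximate. Where CSS has no counterpart, such as walking up to a parent, Mojo::DOM methods like parent have to take over from the selector.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical markup, used only for this comparison
my $dom = Mojo::DOM->new(<<'HTML');
<ul>
  <li class="item"><a href="http://www.example.com/a">A</a></li>
  <li class="item"><a href="/local">B</a></li>
  <li><a href="http://www.example.com/c">C</a></li>
</ul>
HTML

# XPath //li[@class="item"]/a  ~  CSS li[class="item"] > a
my @item_links = $dom->find('li[class="item"] > a')->map('text')->each;

# XPath //a[starts-with(@href, "http")]  ~  CSS a[href^="http"]
my @absolute = $dom->find('a[href^="http"]')->map(attr => 'href')->each;

# XPath //a[starts-with(@href, "/")]/..  has no CSS form; use the parent method
my $parent_class = $dom->at('a[href^="/"]')->parent->{class};

print join(',', @item_links), "\n";
print join(',', @absolute), "\n";
print "$parent_class\n";
```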

2. It may just be my habits, but when a page is complicated I prefer Web::Scraper.
Anyone who has used Web::Scraper knows that you write one uniform set of XPath rules that fits a certain type of page, and then use that rule set to analyze every page of that type. When the page information is complex, the rule set may run to dozens or even hundreds of lines. With Mojo::DOM you can only wrap a lot of find->each calls and Perl callback functions around each other, which is inconvenient to debug, and whoever writes the page-analysis rules also has to know Perl.
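
As an illustration of what such a rule set looks like, here is a small Web::Scraper sketch. The HTML, the field names title and names, and the XPath expressions are invented for this example; a real rule set for a complex page would simply contain many more process lines.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical page, used only for this illustration
my $html = <<'HTML';
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="product"><span class="name">Widget</span></li>
    <li class="product"><span class="name">Gadget</span></li>
  </ul>
</body></html>
HTML

# The rule set: each process line maps an XPath expression to a result key.
# A key ending in [] collects every match into an array reference.
my $rules = scraper {
    process '//h1', title => 'TEXT';
    process '//li[@class="product"]/span[@class="name"]', 'names[]' => 'TEXT';
};

# The same rules can be applied to any page with this structure
my $result = $rules->scrape($html);

print "$result->{title}\n";
print join(',', @{ $result->{names} }), "\n";
```

The rules are declared once, separately from the code that applies them, which is what makes them reusable across a whole class of pages.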

3. Coro::rouse_cb and Coro::rouse_wait cannot be used with it.

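    use Coro;
    use AnyEvent::HTTP;
    use Data::Dumper;

    my $coro = async {
        # rouse_cb creates a callback; rouse_wait blocks this coroutine until it is called
        http_get "http://www.example.com/", Coro::rouse_cb;
        my ($data, $header) = Coro::rouse_wait;
        print Dumper $header;
    };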

The version above works. The next one does not.

    use Coro;
    use Mojo::UserAgent;

    my $coro = async {
        my $ua = Mojo::UserAgent->new;
        $ua->get('http://www.example.com/' => Coro::rouse_cb);
        my ($ua2, $tx) = Coro::rouse_wait;
        my $title = $tx->res->dom->at('head title')->text;
        print "$title\n";
    };