Matching mixed Chinese and English features in logs to be analyzed

Source: Internet
Author: User

For example, the log contains entries in the following two forms:

(A) "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5%E4%B8%AD%E5%9B%BD%E5%A4%A7%E5%AD%A6%E7%94%9F5%E4%BA%BA%E5%88%B6%E8%B6%B3%E7%90%83%E8%81%94%E8%B5%9B/"

And

(B) "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/"

The feature to be matched is "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/". Entry (A) carries it in percent-encoded UTF-8 form, and entry (B) carries it as raw Chinese characters.

How should the encoding be handled so that traffic is counted correctly for both forms?

1. At first I used the following method:

Write the features to be matched in their percent-encoded UTF-8 form, exactly as they appear in log entries of form (A):

    my $fl1 = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5%E4%B8%AD%E5%9B%BD%E5%A4%A7%E5%AD%A6%E7%94%9F5%E4%BA%BA%E5%88%B6%E8%B6%B3%E7%90%83%E8%81%94%E8%B5%9B/";
    my $fl2 = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5/";
    my $fl3 = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5%E5%8D%81%E5%B9%B4/";

    # $ref is the URL taken from the current log line
    if ($ref eq $fl1 || $ref eq $fl2 || $ref eq $fl3) {
        $fenleiInfo{$ref}++;   # hits per matched URL
        $totalpv++;            # total page views
    }

The disadvantage is that log entries of form (B), which contain the raw Chinese characters, are not counted.


2. Later I thought about the character encoding and used the following method, which decodes the log URL and compares it with features written in Chinese:

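    use utf8;     # the feature strings contain Chinese characters
    use Encode;   # provides decode()

    my $fl1 = "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/";
    my $fl2 = "http://fenlei.hudong.com/特步/";
    my $fl3 = "http://fenlei.hudong.com/特步十年/";

    $ref = decode('utf8', $ref);   # UTF-8 bytes -> characters
    if ($ref eq $fl1 || $ref eq $fl2 || $ref eq $fl3) {
        $fenleiInfo{$ref}++;
        $totalpv++;
    }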

The drawback here is that log entries of form (A) are not counted, because decode leaves the percent escapes untouched.
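
To see why form (A) slips through, a minimal sketch (not the author's code) can decode both sample URLs and compare each with the Chinese feature. It assumes the log is read as undecoded UTF-8 bytes, so form (B) is written with \x escapes. The names $log_a, $log_b, $match_a and $match_b are illustrative only.

```perl
use strict;
use warnings;
use utf8;                   # the Chinese literal below is a character string
use Encode qw(decode);

my $feature = "http://fenlei.hudong.com/特步/";

# Form (A): percent-encoded. It is pure ASCII, so decoding changes nothing.
my $raw_a = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5/";
# Form (B): raw UTF-8 bytes, assumed to be read from the log without an encoding layer.
my $raw_b = "http://fenlei.hudong.com/\xE7\x89\xB9\xE6\xAD\xA5/";

my $log_a = decode('UTF-8', $raw_a);
my $log_b = decode('UTF-8', $raw_b);

my $match_a = ($log_a eq $feature) ? 1 : 0;
my $match_b = ($log_b eq $feature) ? 1 : 0;
print "A matches: $match_a, B matches: $match_b\n";   # prints "A matches: 0, B matches: 1"
```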

3. Neither of the two methods above fully handles this mix of Chinese and English, so I turned to uri_unescape: percent-decode the URL first, then decode the resulting UTF-8 bytes.

This solves the problem: the count obtained is the sum of the results of methods 1 and 2.

    use utf8;          # the feature strings contain Chinese characters
    use Encode;        # provides decode()
    use URI::Escape;   # provides uri_unescape()

    my $fl1 = "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/";
    my $fl2 = "http://fenlei.hudong.com/特步/";
    my $fl3 = "http://fenlei.hudong.com/特步十年/";

    my $ref1 = "";
    if ($ref =~ /http:\/\/fenlei\.hudong\.com\//) {
        $ref1 = uri_unescape($ref);       # %XX escapes -> raw bytes
        $ref1 = decode('utf8', $ref1);    # UTF-8 bytes -> characters
    }
    if ($ref1 eq $fl1 || $ref1 eq $fl2 || $ref1 eq $fl3) {
        $fenleiInfo{$ref1}++;
        $totalpv++;
    }
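
The same idea can be tried end to end in a small self-contained sketch. This is an illustration under stated assumptions, not the author's script: the helper normalize_url, the lookup hash %is_feature and the three sample log entries in @log are hypothetical, and the entries are assumed to arrive as undecoded bytes.

```perl
use strict;
use warnings;
use utf8;                            # feature literals are character strings
use Encode qw(decode);
use URI::Escape qw(uri_unescape);

# Hypothetical helper: bring either log form to one decoded character string.
sub normalize_url {
    my ($raw) = @_;
    my $bytes = uri_unescape($raw);  # %XX escapes -> bytes; other bytes pass through
    return decode('UTF-8', $bytes);  # UTF-8 bytes -> Perl characters
}

# Features to match, stored as hash keys for a single lookup per log line.
my %is_feature = map { $_ => 1 } (
    "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/",
    "http://fenlei.hudong.com/特步/",
    "http://fenlei.hudong.com/特步十年/",
);

# Sample entries (assumed), as raw bytes the way a log file would supply them.
my @log = (
    "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5/",         # form (A)
    "http://fenlei.hudong.com/\xE7\x89\xB9\xE6\xAD\xA5/",   # form (B)
    "http://fenlei.hudong.com/other/",                      # not a feature
);

my %fenleiInfo;
my $totalpv = 0;
for my $ref (@log) {
    my $url = normalize_url($ref);
    next unless $is_feature{$url};
    $fenleiInfo{$url}++;
    $totalpv++;
}
print "total PV: $totalpv\n";   # prints "total PV: 2"
```

Keeping the features in a hash instead of chaining eq comparisons is a design choice of this sketch. It behaves the same for three features and avoids editing the condition when more are added.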

