For example, the log contains the following two forms of the same URL:
(A) "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5%E4%B8%AD%E5%9B%BD%E5%A4%A7%E5%AD%A6%E7%94%9F5%E4%BA%BA%E5%88%B6%E8%B6%B3%E7%90%83%E8%81%94%E8%B5%9B"
and
(B) "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/"
The pattern to be matched is: "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/"
How should the encoding be handled so that the traffic statistics are collected correctly?
1. At first I used the following method:
Write the patterns to be matched in the same percent-encoded UTF-8 form that appears in the log:
my $fl1 = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5%E4%B8%AD%E5%9B%BD%E5%A4%A7%E5%AD%A6%E7%94%9F5%E4%BA%BA%E5%88%B6%E8%B6%B3%E7%90%83%E8%81%94%E8%B5%9B/";
my $fl2 = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5/";
my $fl3 = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5%E5%8D%81%E5%B9%B4/";
if ($ref eq $fl1 || $ref eq $fl2 || $ref eq $fl3) {
    $fenleiInfo{$ref}++;
    $totalpv++;
}
The drawback is that log entries of form (B) are not counted.
2. Later I thought of decoding the log entry instead, and used the following method:
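use utf8;      # the string literals below are UTF-8 text, treated as characters
use Encode;
my $fl1 = "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/";
my $fl2 = "http://fenlei.hudong.com/特步/";
my $fl3 = "http://fenlei.hudong.com/特步十年/";
$ref = decode('utf8', $ref);    # UTF-8 bytes from the log -> characters
if ($ref eq $fl1 || $ref eq $fl2 || $ref eq $fl3) {
    $fenleiInfo{$ref}++;
    $totalpv++;
}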
The drawback here is the opposite: log entries of form (A) are not counted.
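To see why each of the two methods misses one form, the four comparisons can be run side by side. This is only a minimal sketch using the shorter URL from $fl2; the variable names ($encoded, $decoded, $form_a, $form_b and the $m… flags) are made up for illustration, and form (B) is assumed to reach the script as raw UTF-8 bytes.

```perl
use strict;
use warnings;
use utf8;                      # literals below are character strings
use Encode qw(decode);

# Illustrative inputs (hypothetical names, not from the original script).
my $encoded = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5/";        # pattern as in method 1
my $decoded = "http://fenlei.hudong.com/特步/";                        # pattern as in method 2
my $form_a  = "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5/";        # log entry, percent-encoded
my $form_b  = "http://fenlei.hudong.com/\xE7\x89\xB9\xE6\xAD\xA5/";  # log entry, raw UTF-8 bytes

# Method 1: compare the raw log entry with the percent-encoded pattern.
my $m1_a = $form_a eq $encoded ? 1 : 0;
my $m1_b = $form_b eq $encoded ? 1 : 0;

# Method 2: decode the bytes first, then compare with the character pattern.
# decode() leaves the %XX escapes of form (A) untouched, so (A) cannot match.
my $m2_a = decode('UTF-8', $form_a) eq $decoded ? 1 : 0;
my $m2_b = decode('UTF-8', $form_b) eq $decoded ? 1 : 0;

print "method 1: A=$m1_a B=$m1_b\n";   # method 1: A=1 B=0
print "method 2: A=$m2_a B=$m2_b\n";   # method 2: A=0 B=1
```

Each method recognises exactly one of the two forms, which is why neither count is complete on its own.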
3. Neither of the two methods above fully handles this mix of percent-encoded and raw Chinese URLs, so I turned to uri_unescape.
That solved the problem: the resulting count equals the sum of the results of methods 1 and 2.
use utf8;      # the string literals below are UTF-8 text, treated as characters
use Encode;
use URI::Escape;
my $fl1 = "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/";
my $fl2 = "http://fenlei.hudong.com/特步/";
my $fl3 = "http://fenlei.hudong.com/特步十年/";
my $ref1 = "";
if ($ref =~ /http:\/\/fenlei\.hudong\.com\//) {
    $ref1 = uri_unescape($ref);       # turn any %XX escapes into raw bytes
    $ref1 = decode('utf8', $ref1);    # decode the UTF-8 bytes into characters
}
if ($ref1 eq $fl1 || $ref1 eq $fl2 || $ref1 eq $fl3) {
    $fenleiInfo{$ref1}++;
    $totalpv++;
}
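The same idea can be packaged as a small self-contained script for experimenting. It is a sketch under a few assumptions rather than the original program: normalize_ref, %wanted and @sample_refs are hypothetical names, the sample entries are invented, and form (B) is assumed to arrive as raw UTF-8 bytes. URI::Escape is a CPAN module rather than part of core Perl, so the sketch falls back to an equivalent substitution if it is not installed.

```perl
use strict;
use warnings;
use utf8;                      # literals below are character strings
use Encode qw(decode);

# Use URI::Escape when available; otherwise define an equivalent uri_unescape.
BEGIN {
    unless (eval { require URI::Escape; URI::Escape->import('uri_unescape'); 1 }) {
        *uri_unescape = sub { my $s = shift; $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge; $s };
    }
}

# Patterns in decoded (character) form.
my %wanted = map { $_ => 1 } (
    "http://fenlei.hudong.com/特步中国大学生5人制足球联赛/",
    "http://fenlei.hudong.com/特步/",
    "http://fenlei.hudong.com/特步十年/",
);

# Hypothetical helper: bring a raw log entry (bytes) to characters,
# whether it was percent-encoded or already raw UTF-8.
sub normalize_ref {
    my ($raw) = @_;
    return decode('UTF-8', uri_unescape($raw));
}

my %fenleiInfo;
my $totalpv = 0;
my @sample_refs = (
    "http://fenlei.hudong.com/%E7%89%B9%E6%AD%A5/",          # form (A)
    "http://fenlei.hudong.com/\xE7\x89\xB9\xE6\xAD\xA5/",    # form (B)
    "http://example.com/other/",                             # unrelated entry
);
for my $raw (@sample_refs) {
    my $ref = normalize_ref($raw);
    next unless $wanted{$ref};
    $fenleiInfo{$ref}++;
    $totalpv++;
}

print "total pv: $totalpv\n";   # total pv: 2
```

Because both forms are normalised to the same character string before the lookup, they are counted under a single hash key, which is what makes the total equal the sum of the two earlier results.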