I just submitted a question on StackOverflow.
I wanted to extract content from a web page that contains many Unicode characters represented in percent-encoded form ("%xx"). Since I was using the Perl module LWP to fetch the page, I naturally tried to handle these characters with a Perl regex, as below.
my $html = "%20%26%40 ";
$html =~ s#%([0-9a-f]+)#\x{\1}#ig;
print "$html\n";
But the above code doesn't work; it outputs nothing but "00". I'm stuck now... Any hint would be appreciated.
Some people replied very quickly. Below are their answers.
Perl has functions built into the URI::Escape module for this already. You don't need to mess with regular expressions.
use URI::Escape;
my $encode = uri_unescape($string);
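To try this answer out on the string from my question, here is a minimal sketch. It assumes the URI distribution (which provides URI::Escape) is installed from CPAN. The "%E4%B8%AD" sample and the UTF-8 decoding step are my own additions for pages with non-ASCII text, not part of the answer.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);
use Encode qw(decode);

# String from the question: %20 is a space, %26 is "&", %40 is "@".
my $html  = "%20%26%40 ";
my $plain = uri_unescape($html);
print "[$plain]\n";    # prints "[ &@ ]"

# uri_unescape returns raw bytes; for a UTF-8 page, decode them into characters.
# Hypothetical sample: %E4%B8%AD is the UTF-8 encoding of U+4E2D.
my $bytes = uri_unescape("%E4%B8%AD");
my $char  = decode('UTF-8', $bytes);
printf "U+%04X\n", ord($char);    # prints "U+4E2D"
```

The brackets in the first print are only there to make the leading and trailing spaces visible.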
See this page for more
Funny and ugly code:
my $html = "%20%26%40 ";
$html =~ s#%([0-9a-f]{2})#"chr(0x$1)"#igee;
print "$html\n";
Edit: (I'm obliged to say) this code may be cute, but do not use it in production! (There are many cases where it does not work.)
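For comparison, a tamer regex-only variant might look like the sketch below. This is my own assumption about a reasonable middle ground, not something from the answers: it uses a single /e, so the replacement is compiled as ordinary Perl code instead of being evaluated as a string, and it relies only on the core built-ins chr and hex. It still converts one byte at a time, so multi-byte UTF-8 sequences would need a separate decoding step.

```perl
use strict;
use warnings;

my $html = "%20%26%40 ";

# Match "%" followed by exactly two hex digits.
# /e runs chr(hex($1)) as code for each match; /g repeats over the whole string.
(my $decoded = $html) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

print "[$decoded]\n";    # prints "[ &@ ]"
```

Copying into $decoded before substituting leaves the original $html untouched, which makes it easier to compare the two.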
You can read the whole discussion here: http://stackoverflow.com/questions/12144401/how-can-convert-character-xx-in-html-using-perl.
I must say, StackOverflow is indeed a great place for technical people :-)