I use file_get_contents to fetch data from a page, but the data I get back is garbled. I checked the encoding,
and it is detected as UTF-8. My own page is also encoded as UTF-8, yet the output still displays as garbled characters. I don't know why.
$url = "xxx";
$opts = array(
    'http' => array(
        'user_agent' => "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
    )
);
$context = stream_context_create($opts);
$neirong = file_get_contents($url, false, $context);
header("Content-Type: text/html; charset=UTF-8");
ob_end_flush();
$encode = mb_detect_encoding($neirong, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
echo $encode . "\n";
if ($encode != "UTF-8") {
    $neirong = mb_convert_encoding($neirong, "UTF-8", $encode);
}
echo $neirong;
$encode: UTF-8
$neirong: the output is garbled
My page encoding is UTF-8
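One thing worth keeping in mind here: a "UTF-8" verdict only means the bytes form well-formed UTF-8 sequences. It says nothing about whether the characters are the ones the author intended. A minimal sketch of that distinction, assuming the mbstring extension is loaded (the sample bytes are an arbitrary illustration, not data from the page in question):

```php
<?php
// "\xE4\xB7\x89" is the UTF-8 encoding of U+4DC9, an arbitrary symbol.
// The bytes are well-formed UTF-8 even though they are not readable Chinese text.
$scrambled = "\xE4\xB7\x89";

// mb_check_encoding() only validates the byte sequences.
$isValid = mb_check_encoding($scrambled, 'UTF-8');

var_dump($isValid);
```

So a page can pass the encoding check and still look garbled if the characters themselves have been altered.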
Reply to discussion (solution)
...
$neirong = file_get_contents($url, false, $context);
echo base64_encode($neirong);
Please post the result.
It's an article, so the result is too long. I'll post just the first part.
77u/ICAgIOiwjeeWlOahteaYlOaZhOixkeebqOaak++8jOmAveS6juWHkeS+k+WyqeeGmueeg+ebqOa0headremAv
$c = '77u/ICAgIOiwjeeWlOahteaYlOaZhOixkeebqOaak++8jOmAveS6juWHkeS+k+WyqeeGmueeg+ebqOa0headremAv';
echo base64_decode($c);
(The decoded output is a run of unrelated Chinese characters.)
Yes, it is garbled. But why? Your base64 is incomplete.
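Even the truncated sample is enough to inspect the first few bytes. The sketch below decodes only the first eight Base64 characters of the string posted above and checks for a UTF-8 byte order mark. The variable names are illustrative.

```php
<?php
// First 8 characters of the Base64 sample from the thread.
$prefix = '77u/ICAg';
$bytes  = base64_decode($prefix);

// Hex dump of the six decoded bytes.
echo bin2hex($bytes), "\n";

// A UTF-8 BOM is the three bytes EF BB BF at the very start.
$hasBom = strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;

// Strip the BOM, if present, before further processing.
$body = $hasBom ? substr($bytes, 3) : $bytes;
```

The response starts with a BOM followed by spaces, so the payload really is UTF-8 at the byte level. The garbling has to come from the characters themselves.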
The correct output should be: "What leaves Tian Shuxin speechless is that this Liu Bo really doesn't mean it that way."
What I get instead is garbled.
Please post the collection address.
This is the data collection address:
http://www.ziyouge.com/conbdhekbefiab
This is the page on the site that displays it:
http://www.ziyouge.com/zy/4/4980/1333249.html
The data returned by the collection address is garbled, but the site's own page displays normally.
The other side has applied some processing to the data.
Decoded output: "It's already around 11 o'clock in the evening ... but there are three who are still in high spirits and have no intention of giving up until dawn, ..."
fdipzone: your method still outputs garbled characters for me, and I'm not familiar with decryption.
Did you add the charset declaration to your HTML?
The site's source is obfuscated, and my program already handles the decoding.
I'll post the program; you can use it directly.
// (The start of the script, which set $url and the request headers, was cut off.)
foreach ($header as $n => $v) { // array name assumed; only "=> $v)" survived
    $headerArr[] = $n . ':' . $v;
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headerArr);              // forged IP headers
curl_setopt($ch, CURLOPT_REFERER, 'http://www.ziyouge.com/');  // forged Referer
$content = curl_exec($ch);
$content = substr($content, 3);  // drop the 3-byte UTF-8 BOM
if ($error = curl_error($ch)) {
    die($error);
}
curl_close($ch);

// Decoding pass. Offsets are treated as bytes, which holds only when
// mbstring's internal encoding is a single-byte one.
$result = '';
$str_length = mb_strlen($content);
$i = 0;
while ($i < $str_length) {
    $temp_str = mb_substr($content, $i, 1);
    $ascnum = ord($temp_str);
    if ($ascnum >= 224) {
        // Lead byte of a 3-byte UTF-8 character: decode the whole character.
        $result .= change(mb_substr($content, $i, 3));
        $i = $i + 3;
    } else {
        $result .= mb_substr($content, $i, 1);
        $i = $i + 1;
    }
}
echo "\n";
echo $result;

// Convert one 3-byte UTF-8 character.
function change($str) {
    // Punctuation that passes through unchanged. The original list was mangled;
    // these full-width forms are a reconstruction.
    $ignore = array('‘', '“', '!', '…', ':', ',');
    if (in_array($str, $ignore)) {
        return $str;
    }
    $prefix = "%u";
    $postfix = "";
    $str = iconv('utf-8', 'ucs-2', $str);
    $arrstr = str_split($str, 2);
    $unistr = '';
    for ($i = 0, $len = count($arrstr); $i < $len; $i++) {
        $tmp = hexdec(bin2hex($arrstr[$i]));
        $tmp = str_pad(dechex($tmp), 4, '0', STR_PAD_LEFT);
        // Swap the two bytes, then shift the code point.
        $tmp = decrypt(substr($tmp, 2, 2) . substr($tmp, 0, 2));
        $unistr .= $prefix . $tmp . $postfix;
    }
    return unescape($unistr);
}

// Decryption: subtract 100 (decimal) from the code point.
function decrypt($d) {
    $result = str_pad(dechex(hexdec($d) - 100), 4, '0', STR_PAD_LEFT);
    return $result;
}

// Turn %uXXXX escapes back into UTF-8 characters.
function unescape($str) {
    $ret = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] == '%' && $str[$i + 1] == 'u') {
            $val = hexdec(substr($str, $i + 2, 4));
            if ($val < 0x7f) {
                $ret .= chr($val);
            } else if ($val < 0x800) {
                $ret .= chr(0xc0 | ($val >> 6)) . chr(0x80 | ($val & 0x3f));
            } else {
                $ret .= chr(0xe0 | ($val >> 12)) . chr(0x80 | (($val >> 6) & 0x3f)) . chr(0x80 | ($val & 0x3f));
            }
            $i += 5;
        } else if ($str[$i] == '%') {
            $ret .= urldecode(substr($str, $i, 3));
            $i += 2;
        } else {
            $ret .= $str[$i];
        }
    }
    return $ret;
}
?>
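Stripped of the hex-string juggling, the core of change() and decrypt() appears to be "take the character's code point and subtract 100". Here is a compact sketch of that idea. The function name shift_codepoint is hypothetical, and the sketch assumes the input is a single BMP character in UTF-8 and that mbstring is available. It names the byte order explicitly (UCS-2BE), so it does not depend on platform defaults.

```php
<?php
// Hypothetical helper: shift one UTF-8 character's code point down by $offset.
function shift_codepoint($char, $offset = 100) {
    // Big-endian UCS-2 is requested explicitly, so the result is the same everywhere.
    $be = mb_convert_encoding($char, 'UCS-2BE', 'UTF-8');

    // 'n' = unsigned 16-bit big-endian; unpack() returns a 1-indexed array.
    $parts = unpack('n', $be);
    $code  = $parts[1] - $offset;

    // Pack the shifted code point and convert it back to UTF-8.
    return mb_convert_encoding(pack('n', $code), 'UTF-8', 'UCS-2BE');
}

// U+4E2D shifted down by 100 gives U+4DC9.
echo bin2hex(shift_codepoint("\xE4\xB8\xAD")), "\n";
```

This is only an illustration of the technique, not a drop-in replacement for the posted program, which also passes certain punctuation through unchanged.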
The garbled output seems to be caused by a difference in PHP versions. It works when I test under 5.3.28, but under PHP 6.0.0-dev the output is garbled. Is PHP 6.0.0-dev missing some component?
Maybe. It is a dev build, after all.
My local 5.3.28 works, but the garbled characters are back on the server, which also runs 5.3.28...
Both are Linux: Ubuntu locally, Debian on the server.
I'd guess the PHP mbstring versions are different.
You will have to check the environment yourself, because I don't have access to it.
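One way to compare two machines is to dump the encoding-related settings on each and diff the output. The sketch below is only a suggestion; the function name encoding_environment is made up for illustration, and it assumes both the iconv and mbstring extensions are loaded. ICONV_IMPL and ICONV_VERSION are constants provided by PHP's iconv extension.

```php
<?php
// Hypothetical helper: collect settings that commonly differ between hosts.
function encoding_environment() {
    return array(
        'php_version'   => PHP_VERSION,
        'iconv_impl'    => ICONV_IMPL,       // e.g. "glibc" or "libiconv"
        'iconv_version' => ICONV_VERSION,
        'mb_internal'   => mb_internal_encoding(),
        // Byte order this platform picks for plain "UCS-2" (sample: U+4E2D).
        'ucs2_bytes'    => bin2hex(iconv('UTF-8', 'UCS-2', "\xE4\xB8\xAD")),
    );
}

print_r(encoding_environment());
```

Run it on both the working and the failing machine; any line that differs is a candidate cause.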
I found the problem: the platforms behave differently.
$str = iconv('utf-8', 'ucs-2', $str); // the output differs between platforms
// Example: $str = "?"; $str = iconv('utf-8', 'ucs-2', $str); the normal result is "V^"; the abnormal result is "^V". How can I solve this?
Found the fix. Different platforms produce different byte orders for UCS-2.
When plain UCS-2 is specified, the byte order is left to the platform's iconv; on the affected Linux server, iconv with UCS-2 produces UCS-2BE. To get the UCS-2 that a Windows platform produces, you need to specify UCS-2LE explicitly.
Right. Change
$str = iconv('utf-8', 'ucs-2', $str);
to
$str = iconv('utf-8', 'ucs-2le', $str);
and it works.
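To see the byte-order difference concretely, here is a small sketch that converts one character with both explicit variants. It assumes the iconv extension is loaded; the sample character U+4E2D is chosen only for illustration.

```php
<?php
// "\xE4\xB8\xAD" is the UTF-8 encoding of U+4E2D.
$char = "\xE4\xB8\xAD";

// Little-endian: low byte first.
$le = bin2hex(iconv('UTF-8', 'UCS-2LE', $char));

// Big-endian: high byte first.
$be = bin2hex(iconv('UTF-8', 'UCS-2BE', $char));

echo "LE: $le\n";
echo "BE: $be\n";
```

Naming the byte order explicitly (UCS-2LE or UCS-2BE) gives the same bytes on every platform, which is why the change above makes the program behave consistently.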