mb_convert_encoding(addslashes($u[+]), 'UTF-8', 'iso-8859-15,shift-jis,eucjp-win,sjis-win,iso-8859-1,utf-8')
This conversion only honors the first encoding in the list that validates. If iso-8859-15 comes first, Western European text is converted correctly, but Japanese text is not.
If eucjp-win,sjis-win come first, Japanese text is converted correctly, but Western European text is not.
How should I handle this situation?
Reply to discussion (solution)
The byte ranges of the character sets in your list overlap, so they cannot uniquely identify an encoding.
That is why the mbstring developers provide the mb_check_encoding function, so that you can test each encoding individually.
Don't assume the mbstring module is that smart.
Without semantic analysis, I don't think a computer can tell whether a string is iso-8859-1 or Shift-JIS.
I see why now. Every byte is valid in both the Japanese and the Western European encodings, so whichever encoding comes first in the list is accepted.
How would you handle a situation like mine?
The following is the Western European sample:
Urbanizaci? Camino de Vi?les Calle Rio Aragon N9 Pinseque
The following is the Japanese sample:
"???????? Iphone5/4s/4??? ?? ?????????? ?????? ??????? "
The main problem is that garbled Western European text gets judged as Japanese. If I check it character by character, won't that take a very long time to process?
If I said that
Urbanizaci? Camino de Vi?les Calle Rio Aragon N9 Pinseque
is GBK-encoded, could you accept that claim?
$s = "???????? Iphone5/4s/4??? ?? ?????????? ?????? ???????";
// Print the detected encoding, then the string converted from SJIS to UTF-8.
echo mb_detect_encoding($s, "ascii,jis,utf-8,euc-jp,sjis");
echo mb_convert_encoding($s, 'utf-8', 'SJIS'), PHP_EOL;
// Check the same bytes against every candidate encoding individually.
$d = explode(',', 'SHIFT-JIS,EUCJP-WIN,SJIS-WIN,UTF-8,JIS,SJIS,EUC-JP,GBK');
foreach ($d as $t) var_dump($t, mb_check_encoding($s, $t));
SJISセカンドショップIPHONE5/4S/4 with repair decomposition tool star type ドライバ? スクレイパ? ネジ Pedestal Cup
string(9) "SHIFT-JIS"
bool(true)
string(9) "EUCJP-WIN"
bool(false)
string(8) "SJIS-WIN"
bool(true)
string(5) "UTF-8"
bool(false)
string(3) "JIS"
bool(false)
string(4) "SJIS"
bool(true)
string(6) "EUC-JP"
bool(false)
string(3) "GBK"
bool(true)
Let me run the other string through it too, haha.
ISO-8859-1Urbanización Camino de Viñales Calle Rio Aragon N9 Pinseque
string(9) "SHIFT-JIS"
bool(false)
string(9) "EUCJP-WIN"
bool(false)
string(8) "SJIS-WIN"
bool(true)
string(5) "UTF-8"
bool(false)
string(3) "JIS"
bool(false)
string(4) "SJIS"
bool(false)
string(6) "EUC-JP"
bool(false)
string(3) "GBK"
bool(true)
string(6) "EUC-KR"
bool(true)
string(10) "ISO-8859-1"
bool(true)
I'm not trying to discourage the OP, just to explain a little: when you write a program, be thorough, do everything that can be done, and make it as error-free as possible. After all, there are people who trust you,
and you wouldn't want to mess things up for them. Without a BOM, it is extremely difficult to identify the character set of a string.
Unique identification is possible only when the string contains byte sequences that distinguish one charset from the others, and the string has to be long enough.
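To make that point concrete, here is a minimal sketch, not a definitive implementation. The function names detectByBom and detectUnique are hypothetical (they are not part of mbstring). The sketch trusts a BOM when one is present, and otherwise reports an encoding only when exactly one candidate passes mb_check_encoding. Passing the check only means the bytes are legal in that encoding, not that the text was really written in it.

```php
<?php
// Hypothetical helper: return the encoding named by a leading BOM, or null.
function detectByBom(string $bytes): ?string
{
    // Longer BOMs come first, because the UTF-32LE BOM starts with the UTF-16LE BOM.
    $boms = [
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE"         => 'UTF-16LE',
        "\xFE\xFF"         => 'UTF-16BE',
    ];
    foreach ($boms as $bom => $encoding) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null;
}

// Hypothetical helper: return an encoding only when the answer is unambiguous.
function detectUnique(string $bytes, array $candidates): ?string
{
    $byBom = detectByBom($bytes);
    if ($byBom !== null) {
        return $byBom;
    }
    $valid = [];
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            $valid[] = $encoding;
        }
    }
    // Zero or several matches both mean "cannot tell".
    return count($valid) === 1 ? $valid[0] : null;
}
```

With the candidate list from this thread, both sample strings would most likely come back as null, because each one validates under several encodings (SJIS-WIN and GBK, among others).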
Thank you to the two experts above.
To explain: this is data we have to process, raw data that we download from websites in various countries.
After running your code, I agree that one or two functions cannot reliably identify which encoding our data is in.
So I am now using a workaround: our records have a country field, and each string is converted according to its country.
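A rough sketch of that workaround might look like the following. The constant COUNTRY_ENCODINGS, the function toUtf8ByCountry, the country codes, and the encoding chosen for each country are all assumptions made for illustration; the real mapping depends on what each source site actually serves.

```php
<?php
// Hypothetical mapping from a record's country field to its source encoding.
const COUNTRY_ENCODINGS = [
    'JP' => 'SJIS-win',
    'ES' => 'ISO-8859-15',
    'FR' => 'ISO-8859-15',
];

// Convert raw bytes to UTF-8 using the encoding assumed for the country.
// Returns null when the country is unknown or the bytes are not valid in
// the assumed encoding, so the caller can flag the record for review.
function toUtf8ByCountry(string $bytes, string $country): ?string
{
    $encoding = COUNTRY_ENCODINGS[strtoupper($country)] ?? null;
    if ($encoding === null || !mb_check_encoding($bytes, $encoding)) {
        return null;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $encoding);
}
```

Returning null instead of guessing keeps bad rows visible rather than silently storing mojibake.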