= mb_convert_encoding ($str, $coding, $enc);return $str;}Detection System CodingAt present, there is no suitable method, can only be placed in a Chinese file, and then cycle the use of different encoding detection, can read the file on the code is correctprotected static function _detectoscode () {$codingFile = '/coded-encoding-os-path.html ';$detectPath = __dir__. $codingFile;$allCoding = Mb_list_encodings ();foreach ($allCoding as $coding) {if (False!== stripos (' |byte2be|byte2le|byte4be|byt
linux and Windows is correct;
"Php
// Convert utf8 encoding to the current system encoding
Protected static function _ toOsCode ($ str, $ coding = null ){
$ Enc = 'utf-8 ';
If (empty ($ coding )){
$ Coding = self: $ osPathEncoding;
}
$ Str = mb_convert_encoding ($ str, $ coding, $ enc );
Return $ str;
}
// Detection system code
// Currently, no proper method is found. Only one Chinese file can be stored, and different encoding methods can be used for recycling. if you can read the file, t
Original string: http://www.qdta.cn/xxw/xxInfo.asp?xxlx= Travel Hot News id=12939
Want to convert to: http://www.qdta.cn/xxw/xxInfo.asp?xxlx= Travel Hot News id=12939
what function do you use?
Reply to discussion (solution)
function Unescape ($str) { $str = Rawurldecode ($STR); Preg_match_all ("/(?:%u.{4}) | #x. {4};|#\d+;|.+/u", $str, $r); $ar = $r [0];p Rint_r ($ar); foreach ($ar as $k + $v) { if (substr ($v, 0,2) = = "%u") $ar [$k] = Iconv ("
methods, such as the UTF-16 we talked about above, other UTF-8 we often use, and UTF-32.
Occasionally, we will see UCS-2 and UCS-4, what is their relationship with Unicode? In fact, Unicode Character Set, is called Unicode Character Set, English is Unicode character set, that is, UCOS, then the UCS-2 is to represent two bytes, that is, 16 bits to represent Unico
UNICODE: The encoding mechanism developed by unicode.org should include common texts all over the world.In 1.0, It is a 16-bit code, from u + 0000 to U + FFFF. each 2byte Code corresponds to one character. At the beginning of 2.0, the 16-bit limit was abandoned. The original 16-bit is used as the basic bit plane, and the 16-bit plane is added, which is equivalent to 20-bit encoding, the encoding range is 0 to 0x10ffff.
UCs:The universal character set defined in iso000046 according to ISO, which
I believe you must have met, open a Web page, but show a heap of like garbled, such as "бїяазъся", "????????"? Remember the message header fields Accept-charset, accept-encoding, Accept-language, content-encoding, content-language in HTTP? And that's what we're going to discuss next.
Directory:
1. Basic knowledge
2. Common character set and character encoding
2.1. ASCII character set encoding 2.2. Gbxxxx Character Set encoding 2.3. BIG5 Character Set encoding
3. The g
>
Unicode is commonly used in the UCS-2, it uses two bytes to encode a character, such as the Chinese character "warp" encoding is 0X7ECF, 0X7ECF converted to decimal is 32463,ucs-2 with two bytes to encode characters, 2 16 is equal to 65536, so ucs- 2 can encode a maximum of 65,536 characters. Encoding from 0 to 127 characters like ASCII-encoded characters, suc
Due to the working relationship, JNI must be used to call methods and transmit data between C ++ and Java programs. However, JNI used to work in an English environment and is encoded in Chinese (similar to other languages) I am not paying much attention to the problem. I recently took some time to study it and sorted out my experiences as follows for your discussion or reference.Before further discussion, we need to explain the following basic knowledge:
Inside Java, all string encodings use U
file and converts it from the 8th to the gb2312th file, and the output is directed to the bbb.txt file.
OverviewIconv is a library that uses Unicode as the intermediate code to convert various internal codes. It basically covers all the coding methods in the world, for example, ASCII, gb2312, GBK, gb18030, big5, UTF-8, UCS-2, UCS-2BE, UCS-2LE,
Unicode is currently widely used in UCS-2, it uses two bytes to encode a character, for example, the Chinese character "by" encoding is 0x7ecf, 0x7ecf to convert to decimal is 32463, the UCS-2 uses two bytes to encode characters. The power of 2 is equal to 65536, so the UCS-2 can encode up to 65536 characters. The characters encoded from 0 to 127 are the same as
character encoding schemes.
That is to say,Although each character can find a unique serial number (UNICODE Code) in the Unicode Character Set, the final byte stream is determined by the specific character encoding.. For example, the Unicode Character "a" is also encoded, the byte stream produced by the UTF-8 character encoding is 0x41, and the UTF-16 (large-end mode) gets 0x00 0x41.Common unicode encoding
UCS-2/UTF-16
How can we implement the BMP c
There are two organizations that develop Unicode encoding standards, one is ISO, one is a unified Code alliance consisting of multiple language software manufacturers.The universal Character Set UCS (Universal Character set) is a coding scheme developed by ISO, UCS-2 encoded with 2 bytes and UCS-4 encoded in 4 bytes.The Unicode Conversion format UTF (Unicode Tran
The following two methods are used to convert similar #54620, such fragments into normal textThe first method is to use the regular to find the fragments that need to be replaced, then loop, the second method uses Preg_replace_callback, the second method has less code and looks more elegant.Method One$test _str= "Zeng #54620, Li #44397; #50612;";Echo unescape ($test _str);function Unescape ($STR) {Preg_match_all ("/(?:%u.{4}) | #x. {4};|#\d+;|.+/u", $str, $r);$ar = $r [0];foreach ($ar as $k =
) {
$enc = ' UTF-8 ';
if (empty ($coding)) {
$coding = self:: $osPathEncoding;
}
$str = mb_convert_encoding ($str, $coding, $enc);
return $str;
}
Detection System Coding
At present, there is no suitable method, can only be placed in a Chinese file, and then cycle the use of different encoding detection, can read the file on the code is correct
protected static function _detectoscode () {
$codingFile = '/coded-encoding-os-path.html ';
$detectPath = __dir__. $codingFile;
$allCoding = Mb_list
Because of the working relationship, you need to use JNI to make method calls and data transfer between C + + and Java programs. But in the past always work in English environment, the Chinese (Other language coding empathy) problem is not too concerned about, recently smoked a little time to study, will be their own experience sorted as follows, for you to discuss or reference.
Before further discussion, there are a few basics to note:
Inside Java, all string encodings use Unicode, or
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.