Zhimeng Chinese word segmentation returns garbled characters. What should I do?
Zhimeng's Chinese word segmenter clearly has words such as "troubles" and "sorrow" in its dictionary, but when I test with them as input, the output is garbled. I don't know what is going on. There is also this code:

    else if ($n > 0xA13F && $n < 0xAA40)

Where do 0xA13F and 0xAA40 come from?
------ Solution --------------------
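if ($n > 0xA13F && $n < 0xAA40) matches full-width symbols. In GBK, characters with lead bytes 0xA1 to 0xA9 form the symbol area (full-width punctuation and similar), and those bounds select exactly that range.
The segmenter assumes the GBK character set; input in any other encoding comes out garbled.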
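A minimal sketch of what that range test does, assuming the input is already GBK: the two bytes of a character are combined into one integer $n (lead byte times 256 plus trail byte), and because GBK trail bytes start at 0x40, the open interval (0xA13F, 0xAA40) covers exactly the lead bytes 0xA1 to 0xA9. The helper names gbk_code() and gbk_is_symbol() are hypothetical, not part of the segmenter.

```php
<?php
// Sketch: classify one GBK double-byte character the way the segmenter's
// range test does. Assumes $ch holds exactly two GBK bytes.
// gbk_code() and gbk_is_symbol() are hypothetical helper names.

function gbk_code(string $ch): int
{
    // Combine lead and trail byte into one integer, e.g. "\xA3\xAC" -> 0xA3AC
    return (ord($ch[0]) << 8) | ord($ch[1]);
}

function gbk_is_symbol(string $ch): bool
{
    $n = gbk_code($ch);
    // Lead bytes 0xA1..0xA9 (trail bytes start at 0x40): the symbol area
    return $n > 0xA13F && $n < 0xAA40;
}

$comma = "\xA3\xAC"; // full-width comma in GBK
$zhong = "\xD6\xD0"; // GBK bytes for the character 中

var_dump(gbk_is_symbol($comma)); // bool(true)
var_dump(gbk_is_symbol($zhong)); // bool(false)
```

Feeding the same check UTF-8 bytes produces meaningless $n values, which is one reason the output is garbled.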
------ Solution --------------------
It first pre-processes the input string with the ReviseString method, which contains:
    // If this is a Chinese character
    if (isset($str[$i + 1])) {
        $c = $str[$i] . $str[$i + 1];
        // ...
    }
In other words, it assumes every Chinese character is two bytes long, which is the GBK encoding rule.
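To make that assumption concrete, here is a minimal sketch of splitting a GBK string into characters: a byte below 0x81 stands alone, while a byte of 0x81 or above is paired with the next byte, as in $str[$i] . $str[$i + 1]. The function gbk_split() is a hypothetical illustration, not the actual body of ReviseString.

```php
<?php
// Sketch of the GBK two-byte assumption.
// gbk_split() is a hypothetical name, not the segmenter's real method.

function gbk_split(string $str): array
{
    $chars = [];
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($str[$i]);
        if ($byte >= 0x81 && isset($str[$i + 1])) {
            // Lead byte of a double-byte GBK character: take it and the next byte
            $chars[] = $str[$i] . $str[$i + 1];
            $i++;
        } else {
            // ASCII byte (or a dangling last byte): take it on its own
            $chars[] = $str[$i];
        }
    }
    return $chars;
}

// "中a，" encoded as GBK bytes
$parts = gbk_split("\xD6\xD0a\xA3\xAC");
echo count($parts), "\n"; // 3
```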
In UTF-8, a non-ASCII character takes 2 to 4 bytes, and Chinese characters usually take three.
If you only convert the file contents to UTF-8 without changing these processing rules, garbled output is exactly what you should expect.
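A small demonstration of the mismatch, assuming PHP 7 or later for the \u{...} string escapes: the two characters 烦恼 occupy 6 bytes in UTF-8, so cutting them into 2-byte pieces the GBK way yields three fragments, none of which is a valid UTF-8 character.

```php
<?php
// Sketch: why pairing bytes breaks on UTF-8 input.
// "\u{70E6}\u{607C}" is 烦恼; each character is 3 bytes in UTF-8.

$word = "\u{70E6}\u{607C}";
echo strlen($word), "\n"; // 6

// Cut the string into 2-byte pieces, as a GBK-only segmenter would.
$pieces = str_split($word, 2);

foreach ($pieces as $piece) {
    // preg_match() with the /u modifier returns false for invalid UTF-8
    $valid = preg_match('//u', $piece) === 1;
    echo bin2hex($piece), ' ', $valid ? 'valid' : 'invalid', "\n";
}
```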
------ Solution --------------------
Convert the input from UTF-8 to GBK before calling the segmenter,
then convert the result back to UTF-8 afterwards.
That way you don't need to study the algorithm at all.
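A minimal sketch of that wrapper, assuming the iconv extension is available. The segmenter itself is passed in as a hypothetical callable that takes and returns GBK strings; the real Zhimeng class and method names are not shown in this thread, so none are assumed here.

```php
<?php
// Sketch of "convert, segment, convert back".
// Assumptions: iconv is installed; $segmentGbk is a hypothetical callable
// wrapping the GBK-only segmenter (GBK string in, GBK string out).

function segment_utf8(string $text, callable $segmentGbk): string
{
    // UTF-8 -> GBK so every Chinese character becomes two bytes
    $gbk = iconv('UTF-8', 'GBK', $text);
    if ($gbk === false) {
        throw new RuntimeException('Input cannot be represented in GBK');
    }

    // Run the GBK-only segmenter unchanged
    $result = $segmentGbk($gbk);

    // GBK -> UTF-8 so the rest of the site gets readable text
    $utf8 = iconv('GBK', 'UTF-8', $result);
    if ($utf8 === false) {
        throw new RuntimeException('Segmenter output is not valid GBK');
    }
    return $utf8;
}

// Usage with a stand-in segmenter that only checks it received GBK bytes
$out = segment_utf8("\u{70E6}\u{607C}", function (string $gbk): string {
    return strlen($gbk) === 4 ? $gbk : '';
});
```

If mbstring is installed instead of iconv, mb_convert_encoding($text, 'GBK', 'UTF-8') and its reverse do the same job.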