In PHP programs, we are all familiar with the mb_convert_encoding() function for converting between character encodings, and we use it heavily. In general it works well enough. In one project, however, we needed it to convert UTF-8 to GBK, and it caused a significant problem with certain special characters. Specifically, mbstring converted characters that exist in UTF-8 but have no GBK encoding into the bytes \x00\x80, so the resulting GBK string was corrupted.
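To check what a conversion actually produces, it helps to look at the raw bytes rather than the rendered text. The following is a minimal sketch, assuming the mbstring extension is loaded; gbk_hex() is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper: convert UTF-8 text to GBK and return the raw bytes as hex,
// so that unexpected byte sequences in the output are easy to spot.
function gbk_hex(string $utf8): string
{
    return bin2hex(mb_convert_encoding($utf8, 'GBK', 'UTF-8'));
}

echo gbk_hex('中文'), PHP_EOL; // d6d0cec4
```

Running the same helper on a string that contains one of the problematic characters shows exactly which bytes were emitted in its place.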
Our expectation is that, when the target encoding cannot represent a character, the transcoder should discard that character: some data is lost, but the transcoded character sequence remains usable. It is unclear why mbstring emits these bytes instead of discarding the character.
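mbstring does expose a setting for what to emit in place of an unrepresentable character: mb_substitute_character() accepts 'none', which asks it to drop such characters. The sketch below shows the discard behaviour described above. It is only an illustration: utf8_to_gbk_discard() is a hypothetical name, and whether this setting affects the specific characters from our project has not been verified here.

```php
<?php
// Hypothetical wrapper: convert UTF-8 to GBK, discarding characters that
// GBK cannot represent instead of substituting something for them.
function utf8_to_gbk_discard(string $utf8): string
{
    $previous = mb_substitute_character(); // remember the current setting
    mb_substitute_character('none');       // 'none' = emit nothing for unmappable characters
    try {
        return mb_convert_encoding($utf8, 'GBK', 'UTF-8');
    } finally {
        mb_substitute_character($previous); // restore the setting for other callers
    }
}
```

For example, utf8_to_gbk_discard("a😀b") returns "ab", because the emoji has no GBK encoding.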
A temporary workaround is either to filter the transcoded string and remove every \x00\x80 sequence, or to filter the UTF-8 string before conversion and remove every character that UTF-8 can represent but GBK cannot. In terms of implementation difficulty, the first approach is easier.
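Both workarounds can be sketched in a few lines. The function names are hypothetical, and the second function is one possible way to detect unrepresentable characters (a per-character round trip through GBK), not necessarily the most efficient one. It assumes the mbstring extension is loaded.

```php
<?php
// Workaround 1: remove the \x00\x80 byte pair from an already converted GBK string.
function strip_bad_gbk_bytes(string $gbk): string
{
    return str_replace("\x00\x80", '', $gbk);
}

// Workaround 2: before converting, keep only the UTF-8 characters that survive
// a UTF-8 -> GBK -> UTF-8 round trip unchanged.
function keep_gbk_representable(string $utf8): string
{
    // Split into individual characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $kept = '';
    foreach ($chars as $char) {
        $gbk = mb_convert_encoding($char, 'GBK', 'UTF-8');
        if (mb_convert_encoding($gbk, 'UTF-8', 'GBK') === $char) {
            $kept .= $char;
        }
    }
    return $kept;
}
```

The first function is a single byte-level replacement, which is why it is the easier option. The second costs two conversions per character but leaves a clean UTF-8 string that can then be passed to mb_convert_encoding() as usual.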