[Golang] A tricky Chinese character-encoding problem

Source: Internet
Author: User

Today I came across a problem online that I found interesting, so I helped answer it.

Requirement

Insert Chinese data into a MySQL database whose encoding is Latin1. Another system then transcodes the Latin1-encoded string to GBK and sends it as SMS content.

Simple solution

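import (
    "golang.org/x/text/encoding/charmap"
    "golang.org/x/text/encoding/simplifiedchinese"
)

// Convert encodes src (UTF-8) as GBK, then decodes those same bytes as
// ISO-8859-1, so the result is a valid UTF-8 string of Latin1 characters.
func Convert(src string) (string, error) {
    // UTF-8 -> GBK bytes. Use Bytes, not String, to stay at the byte level.
    gbk, err := simplifiedchinese.GBK.NewEncoder().Bytes([]byte(src))
    if err != nil {
        return "", err
    }
    // Reinterpret the GBK bytes as ISO-8859-1 and decode them to UTF-8.
    latin1, err := charmap.ISO8859_1.NewDecoder().Bytes(gbk)
    if err != nil {
        return "", err
    }
    return string(latin1), nil
}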

Analysis

Latin1 is ISO-8859-1. Here is an introduction copied from Baidu Encyclopedia:

Because the ISO-8859-1 encoding range uses all the space within a single byte, a byte stream in any other encoding is not discarded when it is transmitted through or stored in a system that supports ISO-8859-1. In other words, it is no problem to treat a byte stream in any other encoding as ISO-8859-1. This is a very important property, and MySQL's default encoding being Latin1 takes advantage of it. ASCII is a 7-bit container, and ISO-8859-1 is an 8-bit container.
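The property described above can be checked with a short sketch. It uses only the standard library; `decodeLatin1` and `encodeLatin1` are hypothetical helpers written for illustration (the solution above uses `charmap.ISO8859_1` for the same job), and they assume the full 256-value mapping in which each byte equals the code point of the same value.

```go
package main

import "fmt"

// decodeLatin1 interprets each byte as the Unicode code point with the same
// value, which is how ISO-8859-1 maps bytes 0x00-0xFF to characters.
func decodeLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// encodeLatin1 is the inverse. It fails for characters above U+00FF,
// which ISO-8859-1 cannot represent.
func encodeLatin1(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("rune %q is not representable in ISO-8859-1", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

func main() {
	// Every possible byte value, standing in for "a byte stream in some
	// other encoding".
	all := make([]byte, 256)
	for i := range all {
		all[i] = byte(i)
	}

	back, err := encodeLatin1(decodeLatin1(all))
	fmt.Println(err == nil && string(back) == string(all)) // prints: true
}
```

Because no byte value is rejected or altered, decoding as ISO-8859-1 and encoding again returns the original bytes, whatever encoding they were really in.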

First, the principle for dealing with encoding problems: make sure that writing and reading use the same set of rules.

Following this principle, the way to work around a database that does not support Chinese is to first look at how the other system reads the data:
It transcodes the Latin1-encoded string to GBK and uses the result as SMS content. That is, it takes the bytes stored as Latin1 and interprets them as GBK.
So our task is:
Take the SMS content encoded as GBK, force it into ISO-8859-1, and store that in the database.
With the task clear, what remains is the implementation.

    1. UTF-8 -> GBK. Go strings are UTF-8 encoded, so first transcode to GBK. One thing to note: use Encoder.Bytes() rather than Encoder.String() here. The result is a GBK byte stream, not UTF-8 text; if it is carried in a string and later processed as UTF-8 (for example converted to []rune), the invalid sequences are replaced with U+FFFD and the original GBK byte stream cannot be restored.

    2. Force the GBK byte stream into an ISO-8859-1 byte stream. How? By doing nothing: the bytes stay the same and are simply treated as ISO-8859-1.

    3. ISO-8859-1 byte stream -> UTF-8 string. I'm not quite sure how to submit []byte in SQL, so the conservative approach is to first transcode ISO-8859-1 to UTF-8, and then let the database driver convert the UTF-8 back to ISO-8859-1 when it commits.
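Steps 2 and 3 can be traced with a standard-library sketch. Several things here are assumptions made for illustration: the GBK bytes for "中文" (D6 D0 CE C4) are hard-coded in place of the encoder call from step 1, and `latin1ToUTF8` and `utf8ToLatin1` are hypothetical helpers standing in for the ISO-8859-1 decoder and for whatever conversion the database driver performs on commit.

```go
package main

import "fmt"

// latin1ToUTF8 (hypothetical helper) performs step 3: each byte becomes the
// code point with the same value, and Go stores the result as UTF-8.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// utf8ToLatin1 (hypothetical helper) mimics the conversion expected on
// commit: one byte per character. It assumes every rune is <= U+00FF.
func utf8ToLatin1(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		out = append(out, byte(r))
	}
	return out
}

func main() {
	// Result of step 1, hard-coded: "中文" encoded as GBK.
	gbk := []byte{0xD6, 0xD0, 0xCE, 0xC4}

	// Step 2: treat the bytes as ISO-8859-1. Nothing changes.
	latin1 := gbk

	// Step 3: decode as ISO-8859-1 so the driver receives valid UTF-8.
	stored := latin1ToUTF8(latin1)
	fmt.Printf("%q\n", stored) // "ÖÐÎÄ"

	// The bytes that reach the Latin1 column are the original GBK bytes,
	// which is what the SMS system will read back.
	fmt.Printf("% X\n", utf8ToLatin1(stored)) // D6 D0 CE C4

	// Contrast: processing the GBK bytes as if they were UTF-8 text is lossy.
	lossy := string([]rune(string(gbk)))
	fmt.Printf("% X\n", []byte(lossy)) // EF BF BD EF BF BD EF BF BD EF BF BD
}
```

The string that ends up in the database looks like mojibake ("ÖÐÎÄ"), but that is intended: its Latin1 bytes are exactly the GBK bytes the reading system expects.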

One more point worth mentioning: because ISO-8859-1 does not support Chinese, if UTF-8 Chinese text is submitted directly, the database driver replaces the Chinese characters with "?".
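A small simulation shows why the direct route loses data. `toLatin1Lossy` is a hypothetical stand-in for the substitution described above; it is not the actual driver or server code, only a sketch of the same behavior.

```go
package main

import "fmt"

// toLatin1Lossy (hypothetical) converts a UTF-8 string to ISO-8859-1,
// substituting '?' for any character the encoding cannot represent.
func toLatin1Lossy(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			out = append(out, '?')
			continue
		}
		out = append(out, byte(r))
	}
	return out
}

func main() {
	// Chinese submitted directly: each character collapses to '?'.
	fmt.Printf("%s\n", toLatin1Lossy("中文 SMS")) // ?? SMS

	// The pre-converted string survives, because every character fits in one byte.
	fmt.Printf("% X\n", toLatin1Lossy("ÖÐÎÄ")) // D6 D0 CE C4
}
```

Once a character has been replaced by "?", nothing on the reading side can recover it, which is why the GBK bytes have to be disguised as Latin1 characters before they are submitted.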
