Characters and strings in the Go language


Characters in the Go language

Go has no dedicated character type. A character is represented by the rune type, and rune is an alias for int32.

Here is a simple program to demonstrate the character type:

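    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        r := '我'
        // %q prints the quoted character, %T its type, %b its value in binary.
        fmt.Printf("%q has type %T, binary: %b\n", r, r, r)
        // Kind reports the underlying kind of the value's type.
        rType := reflect.TypeOf(r).Kind()
        fmt.Printf("the underlying kind of r is: %s\n", rType)
    }

Program output:

    '我' has type int32, binary: 110001000010001
    the underlying kind of r is: int32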

Go natively supports Unicode, which raises a question: how long is a Unicode character?
The question itself is ill-posed. Unicode is, at its base, a coded character set that assigns a number (a code point) to each abstract character. The encoding forms built on top of that character set differ from one another. Strictly speaking, a Unicode character has no "length"; only a particular encoding of it has a byte length, and that length differs between encodings.

The code space currently defined for Unicode consists of 17 planes (plane 0 to plane 16) with 65,536 code points each. The commonly used plane 0 (the "Basic Multilingual Plane", or BMP) covers code points 0x0000 to 0xFFFF, which is only part of Unicode.
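Because every plane holds exactly 0x10000 code points, the plane of a code point can be read off its upper bits. The following is a minimal sketch; the helper name planeOf is hypothetical and used only for illustration.

```go
package main

import "fmt"

// planeOf (hypothetical helper) returns the Unicode plane, 0 to 16, that
// contains code point r. Each plane holds 0x10000 (65,536) code points,
// so the plane number is the code point shifted right by 16 bits.
func planeOf(r rune) int {
	return int(r >> 16)
}

func main() {
	// 'A' and '我' are in the BMP; U+1F600 and U+10FFFF are not.
	for _, r := range []rune{'A', '我', 0x1F600, 0x10FFFF} {
		fmt.Printf("U+%04X is in plane %d\n", r, planeOf(r))
	}
}
```

The last value, U+10FFFF, is the highest code point in plane 16.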

BMP characters are the most basic and most commonly used part of Unicode. Each takes 2 bytes in UTF-16 and 1 to 3 bytes in UTF-8. Characters outside the BMP take 4 bytes in both UTF-16 and UTF-8. There is also a less common encoding, UTF-32, which uses 4 bytes for every Unicode character.
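These byte counts can be checked with the standard unicode/utf8 and unicode/utf16 packages. The sketch below is illustrative only; the helper names utf8Len and utf16Len are hypothetical, and the sample characters are chosen here, not taken from the article.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// utf8Len (hypothetical helper) returns the number of bytes needed to
// encode r in UTF-8.
func utf8Len(r rune) int {
	return utf8.RuneLen(r)
}

// utf16Len (hypothetical helper) returns the number of bytes needed to
// encode r in UTF-16: one 16-bit unit for a BMP character, two units
// (a surrogate pair) for a character outside the BMP.
func utf16Len(r rune) int {
	return 2 * len(utf16.Encode([]rune{r}))
}

func main() {
	// 'A', 'é' and '我' are BMP characters; U+1F600 lies outside the BMP.
	for _, r := range []rune{'A', 'é', '我', 0x1F600} {
		fmt.Printf("U+%04X: UTF-8 %d bytes, UTF-16 %d bytes, UTF-32 4 bytes\n",
			r, utf8Len(r), utf16Len(r))
	}
}
```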

Strings in the Go language

A Go string literal can be written in two ways:

    • Double quotes: escape sequences are interpreted, and the literal cannot span source lines, for example

      s := "A Go string\ncannot span source lines"

    • Backquotes: a raw string, whose value is exactly the text between the backquotes

      s := `A raw Go string
      can span source lines`
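The difference between the two forms shows up in how a backslash is treated. The following is a small sketch; the function name literals and the sample strings are made up for illustration.

```go
package main

import "fmt"

// literals (hypothetical helper) returns the same characters written as a
// double-quoted literal and as a backquoted literal.
func literals() (interpreted, raw string) {
	interpreted = "a\nb" // \n is an escape sequence: a single newline byte
	raw = `a\nb`         // the backslash and the 'n' are kept as two bytes
	return interpreted, raw
}

func main() {
	interpreted, raw := literals()
	fmt.Println(len(interpreted)) // 3
	fmt.Println(len(raw))         // 4

	// A raw literal may span several source lines; the line break
	// becomes part of the string.
	multi := `line one
line two`
	fmt.Printf("%q\n", multi) // "line one\nline two"
}
```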

Go source code is UTF-8, so a string literal is stored as its UTF-8 encoded bytes. A simple example:

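    package main

    import "fmt"

    func main() {
        s := "我"
        // %T prints the type, len(s) the byte length, %x the bytes in hex.
        fmt.Printf("s has type %T, length %d, hex: %x\n", s, len(s), s)
        // Converting to []byte exposes the individual UTF-8 bytes.
        for i, b := range []byte(s) {
            fmt.Printf("byte %d: %b\n", i, b)
        }
    }

Program output:

    s has type string, length 3, hex: e68891
    byte 0: 11100110
    byte 1: 10001000
    byte 2: 10010001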

The variable s holds the UTF-8 encoding of the string, so len(s) returns the length of that encoding in bytes, not the number of characters. A single character may occupy 1, 2, 3, or 4 bytes; the size is not fixed.
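To count characters rather than bytes, the standard library offers utf8.RuneCountInString, and a range loop over a string decodes one rune at a time. A minimal sketch follows; the helper name byteAndRuneCount and the sample string are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) returns the UTF-8 byte length of
// s and the number of Unicode code points it contains.
func byteAndRuneCount(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "Go语言"
	b, r := byteAndRuneCount(s)
	fmt.Printf("bytes: %d, runes: %d\n", b, r) // bytes: 8, runes: 4

	// Ranging over a string yields runes; i is the byte offset at which
	// each rune starts, so it jumps by the rune's encoded size.
	for i, c := range s {
		fmt.Printf("offset %d: %q (%d bytes)\n", i, c, utf8.RuneLen(c))
	}
}
```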

UTF-8 encoding follows two rules:

    • For a single-byte character, the first bit of the byte is 0 and the remaining 7 bits hold the character's Unicode code point. For English letters, the UTF-8 encoding is therefore identical to ASCII.
    • For a character encoded in n bytes (n > 1), the first n bits of the first byte are 1 and bit n+1 is 0; every following byte starts with the bits 10. All remaining bits hold the character's Unicode code point.
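The two rules can be turned into code almost literally. The encoder below is only a sketch for illustration: the name encodeRune is hypothetical, it assumes its argument is a valid code point, and real programs should use the standard library instead.

```go
package main

import "fmt"

// encodeRune (hypothetical helper) is a hand-written UTF-8 encoder that
// applies the two rules directly. It assumes r is a valid code point and
// performs no surrogate or range checks.
func encodeRune(r rune) []byte {
	switch {
	case r < 0x80: // 1 byte: 0xxxxxxx
		return []byte{byte(r)}
	case r < 0x800: // 2 bytes: 110xxxxx 10xxxxxx
		return []byte{
			0xC0 | byte(r>>6),
			0x80 | (byte(r) & 0x3F),
		}
	case r < 0x10000: // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
		return []byte{
			0xE0 | byte(r>>12),
			0x80 | (byte(r>>6) & 0x3F),
			0x80 | (byte(r) & 0x3F),
		}
	default: // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
		return []byte{
			0xF0 | byte(r>>18),
			0x80 | (byte(r>>12) & 0x3F),
			0x80 | (byte(r>>6) & 0x3F),
			0x80 | (byte(r) & 0x3F),
		}
	}
}

func main() {
	for _, r := range []rune{'A', 'é', '我', 0x1F600} {
		// []byte(string(r)) is Go's own UTF-8 encoding of r, for comparison.
		fmt.Printf("U+%04X: by hand % x, built in % x\n",
			r, encodeRune(r), []byte(string(r)))
	}
}
```

For every sample, the two hex columns should be identical; for '我' both are e6 88 91.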

Based on these two simple rules, we can convert the UTF-8 bytes of s back into a Unicode code point:

    // UTF-8 to Unicode
              1110 0110 1000 1000 1001 0001 // s
                   0110   00 1000   01 0001 // s utf8 -> unicode
              0000 0000 0110 0010 0001 0001 // s utf8 -> unicode
    0000 0000 0000 0000 0110 0010 0001 0001 // r

The result, 0110 0010 0001 0001 (0x6211), matches the binary value of r printed by the first program.
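The same conversion can be written with bit masks and shifts. This is a sketch under the assumption that the input is a well-formed 3-byte sequence; the name decode3 is hypothetical, and utf8.DecodeRuneInString from the standard library is shown next to it for comparison.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decode3 (hypothetical helper) decodes one 3-byte UTF-8 sequence by
// dropping the 1110 prefix of the first byte and the 10 prefix of the two
// continuation bytes, then joining the remaining 4 + 6 + 6 bits.
// It assumes b is a well-formed 3-byte sequence and does not validate it.
func decode3(b [3]byte) rune {
	return rune(b[0]&0x0F)<<12 | rune(b[1]&0x3F)<<6 | rune(b[2]&0x3F)
}

func main() {
	s := "我"
	byHand := decode3([3]byte{s[0], s[1], s[2]})
	builtIn, size := utf8.DecodeRuneInString(s)
	fmt.Printf("by hand: U+%04X, built in: U+%04X (%d bytes)\n", byHand, builtIn, size)
	// by hand: U+6211, built in: U+6211 (3 bytes)
}
```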

Reference

    • Character encoding notes: ASCII, Unicode, and UTF-8