Unicode code space is a valuable resource, and the set of emoji keeps growing. So how are emoji encoded?
In ordinary chat software such as QQ, some basic emoticons are written as escape sequences of ordinary characters. For example, when the client detects [微笑] ("smile") in a string, it automatically replaces it with an emoticon.
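The escape-style replacement described above can be sketched in Go with strings.NewReplacer. This is only an illustration: the token table, the second token [大哭], and the renderEmoticons helper are hypothetical, and a real chat client would more likely substitute images than Unicode characters.

```go
package main

import (
	"fmt"
	"strings"
)

// emoticons maps bracketed escape tokens to replacements.
// The table is hypothetical and exists only for this sketch.
var emoticons = strings.NewReplacer(
	"[微笑]", "\U0001F642", // "smile" -> slightly smiling face
	"[大哭]", "\U0001F62D", // "sob" -> loudly crying face
)

// renderEmoticons replaces every known token in msg and leaves
// unknown tokens untouched.
func renderEmoticons(msg string) string {
	return emoticons.Replace(msg)
}

func main() {
	fmt.Println(renderEmoticons("hello [微笑]"))
	fmt.Println(renderEmoticons("hello [unknown]"))
}
```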
Emoji, by contrast, are special characters that are actually encoded in Unicode. They occupy part of the range U+1F300 to U+1F9EF in the character set.
To make emoji richer, however, one emoji does not necessarily correspond to a single code point. In particular, to achieve emoji neutrality, many emoji need 1 or 2 code points, and some need as many as 7.
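The following snippet of code illustrates this:

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	// Symbols are written as escapes so the code points survive copy and paste.
	strs := []string{
		"Golang够浪",
		"a",
		"\u00A9",               // stand-in: the original 2-byte symbol was lost in this copy
		"\uF8FF",               // Apple logo (private-use code point)
		"\U0001F412",           // monkey (assumed code point)
		"\U0001F472\U0001F3FB", // man with Chinese cap + skin tone (tone assumed)
		"\U0001F472\U0001F3FF", // same base + dark skin tone (tone assumed)
		"\U0001F487\U0001F3FD\u200D\u2642\uFE0F", // haircut + tone (assumed) + joiner + male sign + U+FE0F
		"\U0001F468\u200D\U0001F468\u200D\U0001F466\u200D\U0001F466", // family: man, man, boy, boy
	}
	for _, str := range strs {
		// Rune count is the number of code points; len is the number of UTF-8 bytes.
		fmt.Printf("%s, rune count:%d, len:%d\n", str, utf8.RuneCountInString(str), len(str))
		for _, theRune := range str {
			fmt.Printf("%s:0x%x |", string(theRune), theRune)
		}
		fmt.Print("\n------\n")
	}
}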
Run it in the Go Playground to see the results.
Let's go through the strings one by one.
The string Golang够浪 has 8 characters in total. Each of the six characters of Golang occupies 1 byte in UTF-8, while each of the two characters in 够浪 occupies 3 bytes, so the string is 12 bytes long.
The second string, a, is the most basic kind of content and occupies only 1 byte in UTF-8. Such characters are so common that, in the spirit of Huffman coding, they get the shortest encoding.
The third string is a relatively common symbol, but it is not among the 128 most basic characters (0 to 127), so it occupies 2 bytes in UTF-8.
The fourth string is the Apple logo (U+F8FF). Other operating systems may not display it correctly. It is quite uncommon and occupies 3 bytes in UTF-8. On a Mac, you can type it by pressing ⇧ (Shift) + ⌥ (Option) + K at the same time.
The fifth string is a monkey emoji, which occupies 4 bytes in UTF-8. Most emoji occupy 4 bytes.
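The 1, 2, 3 and 4 byte cases above follow from the UTF-8 size classes: code points up to U+007F take 1 byte, up to U+07FF take 2, up to U+FFFF take 3, and up to U+10FFFF take 4. A minimal sketch using utf8.RuneLen from the standard library; the utf8Width helper is a name introduced here, and U+00A9 (copyright sign) and U+1F412 (monkey) are assumed stand-ins for the symbols discussed above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Width reports how many bytes UTF-8 needs to encode one code point.
func utf8Width(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// 'a', copyright sign (assumed stand-in), Apple logo, monkey (assumed).
	for _, r := range []rune{'a', 0x00A9, 0xF8FF, 0x1F412} {
		fmt.Printf("U+%04X -> %d byte(s)\n", r, utf8Width(r))
	}
}
```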
The sixth string is a Chinese man wearing a Chinese hat. It is built from two code points: the base emoji of a man wearing a Chinese hat, followed by a modifier for the yellow skin tone. Together they take 8 bytes.
Similarly, the seventh string is a black man wearing a Chinese hat. It is built from the same base emoji followed by a modifier for the black skin tone, again 8 bytes in total.
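The base-plus-modifier structure can be sketched as follows. The five skin-tone modifiers occupy U+1F3FB through U+1F3FF; the withSkinTones helper and the choice of U+1F472 (man with Chinese cap) as the base are assumptions made for this example.

```go
package main

import "fmt"

// Skin-tone modifiers occupy U+1F3FB through U+1F3FF.
const (
	firstTone rune = 0x1F3FB
	lastTone  rune = 0x1F3FF
)

// withSkinTones returns base followed by each of the five modifiers,
// giving five two-code-point strings.
func withSkinTones(base rune) []string {
	var out []string
	for tone := firstTone; tone <= lastTone; tone++ {
		out = append(out, string([]rune{base, tone}))
	}
	return out
}

func main() {
	for _, s := range withSkinTones(0x1F472) {
		// Each variant is 2 code points of 4 bytes each.
		fmt.Printf("%s runes:%d bytes:%d\n", s, len([]rune(s)), len(s))
	}
}
```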
The eighth string is a brown-skinned man getting a haircut. It consists of the haircut emoji, a modifier for the brown skin tone, the U+200D joiner, the ♂ male sign, and finally U+FE0F, for a total of 17 bytes. Current versions of macOS and iOS may not display it yet; it should display once the new version of iOS is released.
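The 17-byte total can be checked by listing the UTF-8 width of each code point in the sequence. In this sketch, byteBreakdown is a helper introduced for illustration, and the skin-tone modifier U+1F3FD is an assumption, since every modifier has the same width.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteBreakdown returns the UTF-8 width of every code point in s.
func byteBreakdown(s string) []int {
	var widths []int
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	// haircut, skin tone (assumed U+1F3FD), joiner, male sign, U+FE0F
	s := "\U0001F487\U0001F3FD\u200D\u2642\uFE0F"
	fmt.Println(byteBreakdown(s), len(s)) // [4 4 3 3 3] 17
}
```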
The longest emoji, and the one that best expresses neutrality, is the ninth string, which represents a family of two fathers and two sons. It consists of 7 code points: U+1F468 (man), U+200D (joiner), U+1F468 (man), U+200D (joiner), U+1F466 (boy), U+200D (joiner), U+1F466 (boy).
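Going the other way, the family emoji can be assembled by joining its members with U+200D. This is a minimal sketch; joinZWJ is a name introduced here, not a standard library function.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// zwj is the zero-width joiner, U+200D.
const zwj = "\u200D"

// joinZWJ glues the given emoji together with zero-width joiners.
func joinZWJ(parts ...string) string {
	return strings.Join(parts, zwj)
}

func main() {
	man, boy := "\U0001F468", "\U0001F466"
	family := joinZWJ(man, man, boy, boy)
	// 4 emoji of 4 bytes each plus 3 joiners of 3 bytes each.
	fmt.Printf("%s runes:%d bytes:%d\n", family, utf8.RuneCountInString(family), len(family))
}
```

Whether the result is drawn as one glyph or as four separate figures depends on the platform's font support, not on the encoding.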
Does this help clarify the difference between UTF-8 and Unicode? Enjoy your emoji!
Appendix
Emoji are rendered differently on different platforms; see Emoji Unicode Tables. Apple's implementation is generally the most faithful to the original designs.
Emoji keep being enriched over time, and the versions are updated year by year; see emoji-versions.
Unicode 10.0 represents emoji using 1,182 characters spread across 22 blocks. Of these, 1,085 are single emoji characters, 26 are regional indicator symbols that combine in pairs to form flag emoji, and 12 (#, * and 0-9) are base characters for keycap emoji sequences.
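The two combining mechanisms mentioned here can be sketched as follows. The regional indicator symbols start at U+1F1E6 for the letter A, and a keycap sequence is the base character followed by U+FE0F and U+20E3. The flag and keycap helpers are names introduced for this example, and flag only accepts two uppercase ASCII letters.

```go
package main

import "fmt"

// flag converts a two-letter uppercase region code into a pair of
// regional indicator symbols. It returns "" for any other input.
func flag(region string) string {
	if len(region) != 2 {
		return ""
	}
	out := make([]rune, 0, 2)
	for _, c := range region {
		if c < 'A' || c > 'Z' {
			return ""
		}
		// U+1F1E6 is the regional indicator for 'A'.
		out = append(out, 0x1F1E6+(c-'A'))
	}
	return string(out)
}

// keycap builds a keycap sequence: base + U+FE0F + U+20E3.
func keycap(base rune) string {
	return string([]rune{base, 0xFE0F, 0x20E3})
}

func main() {
	fmt.Printf("%s bytes:%d\n", flag("CN"), len(flag("CN")))
	fmt.Printf("%s bytes:%d\n", keycap('1'), len(keycap('1')))
}
```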
Within individual blocks, the following code points are considered emoji:
- Miscellaneous Symbols and Pictographs: 637 of 768 code points
- Supplemental Symbols and Pictographs: 134 of 148 code points
- Emoticons: all 80 code points
- Transport and Map Symbols: 94 of 107 code points
- Miscellaneous Symbols: 80 of 256 code points
- Dingbats: 33 of 192 code points
Emoji neutrality is meant to prevent discrimination: an emoji should not be tied to one race or one gender, so the same emoji should come in several skin tones and in both male and female forms.
Reference documents
- Emoji, Wikipedia
- Transmitting and storing emoji: handling Unicode characters outside the BMP
- Emoji Unicode Tables
- Emoji 5.0 Data
- Emoji Neutral
- Mac: how to enter special characters such as ⌘, ⌥, ⇧, ⌃, ⎋