Coding Notes from talkone Instant Messaging Development

Source: Internet
Author: User
Tags: ruby on rails

Coding has been the main focus for the past few days, and today the solution is complete. The talkone instant messaging back end is developed in Erlang, and the front-end management interface uses Ruby on Rails. Because the two sides have to exchange data, and with performance and scalability in mind, they communicate using JSON. Erlang uses rfc4627, and the encoding is UTF-8. However, the first two 0 bytes cannot be decoded directly on the RoR side, so a conversion between UTF-8 and Unicode is involved.

In theory, a UTF-8 encoded character can be up to 6 bytes long (the current standard, RFC 3629, restricts it to 4), but characters in the 16-bit BMP (Basic Multilingual Plane) take at most 3 bytes. Let's take a look at the UTF-8 encoding table:

U-00000000 - U-0000007F: 0xxxxxxx
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
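
To make the table concrete, here is a minimal Ruby sketch that builds the UTF-8 bytes of a code point by following its first four rows. The helper name encode_utf8 is hypothetical and is not part of the talkone code; it only illustrates how the x positions are filled from the code point's bits.

```ruby
# Hypothetical helper (not from the original project): encode one code point
# into UTF-8 bytes by following the first four rows of the table.
def encode_utf8(cp)
  if cp <= 0x7F
    [cp]                               # 0xxxxxxx
  elsif cp <= 0x7FF
    [0xC0 | (cp >> 6),                 # 110xxxxx
     0x80 | (cp & 0x3F)]               # 10xxxxxx
  elsif cp <= 0xFFFF
    [0xE0 | (cp >> 12),                # 1110xxxx
     0x80 | ((cp >> 6) & 0x3F),        # 10xxxxxx
     0x80 | (cp & 0x3F)]               # 10xxxxxx
  elsif cp <= 0x1FFFFF
    [0xF0 | (cp >> 18),                # 11110xxx
     0x80 | ((cp >> 12) & 0x3F),       # 10xxxxxx
     0x80 | ((cp >> 6) & 0x3F),        # 10xxxxxx
     0x80 | (cp & 0x3F)]               # 10xxxxxx
  else
    raise ArgumentError, "code point out of range for this sketch"
  end
end

# U+4F60 falls in the third row, so it takes three bytes.
bytes = encode_utf8(0x4F60)
puts bytes.map { |b| format("%02x", b) }.join(" ")   # e4 bd a0
```

The result can be cross-checked against Ruby itself with [0x4F60].pack("U").bytes, which should yield the same three bytes.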

 

The conversion from UTF-8 to Unicode is also done by shifting: the bits at the payload positions of the UTF-8 format are pulled out and reassembled. Take the character 你 ("you"), which is three bytes in UTF-8, so each byte has to be processed in turn, from high to low. In UTF-8 it is 11100100, 10111101, 10100000.

Start with the high-order byte: from 11100100 we want the payload "0100". This is simple: AND (&) it with 11111 (0x1f). (A three-byte lead byte carries only four payload bits, so 0x0f is the exact mask; 0x1f gives the same result because the bit after the 1110 prefix is always 0.) Since six bits are taken from each of the two remaining bytes, these highest bits sit above the low 12 bits, so the result is shifted left by 12, giving 0100 000000 000000.

From the second byte we want "111101": AND (&) 10111101 with 111111 (0x3f), shift the result left by 6, and OR (|) it with the value from the first byte. The result so far is 0100 111101 000000.

The last byte is ANDed (&) directly with 111111 (0x3f) and ORed (|) with the previous result, which gives 0100 111101 100000, that is, 0x4F60.

Therefore, the conversion from UTF-8 to Unicode can be checked in Ruby. Code:
irb(main):023:0> irb
irb#1(main):001:0> a = 0xe4                          # first byte of the character
=> 228
irb#1(main):002:0> a & 0x1f                          # payload bits of the lead byte
=> 4
irb#1(main):003:0> unicode = (a & 0x1f) << 12        # move them above the low 12 bits
=> 16384
irb#1(main):004:0> unicode |= (0xbd & 0x3f) << 6     # six bits from the second byte
=> 20288
irb#1(main):005:0> unicode |= (0xa0 & 0x3f)          # six bits from the third byte
=> 20320
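
The same mask-and-shift steps can be wrapped in a small function. The following is only a sketch: decode_utf8 is a hypothetical name, it assumes its input is one well-formed sequence of 1 to 4 bytes, and it does no validation of the continuation bytes.

```ruby
# Hypothetical helper (not from the original project): decode one well-formed
# UTF-8 sequence, given as an array of byte values, into a code point.
def decode_utf8(bytes)
  lead = bytes.first
  # Strip the length prefix from the lead byte; the mask depends on the length.
  cp = case bytes.length
       when 1 then lead            # 0xxxxxxx
       when 2 then lead & 0x1F     # 110xxxxx
       when 3 then lead & 0x0F     # 1110xxxx
       when 4 then lead & 0x07     # 11110xxx
       else raise ArgumentError, "unsupported sequence length"
       end
  # Each continuation byte (10xxxxxx) contributes its low six bits.
  bytes.drop(1).each { |b| cp = (cp << 6) | (b & 0x3F) }
  cp
end

codepoint = decode_utf8([0xE4, 0xBD, 0xA0])
puts codepoint            # 20320
puts codepoint.to_s(16)   # 4f60
```

Shifting the accumulator left by 6 before each OR is equivalent to the fixed shifts of 12 and 6 used in the irb session, and it extends to the other sequence lengths without extra cases.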

The above is a simulated verification of the operation. In the actual code, the JSON string is parsed and matched with a regular expression; note that each match is three consecutive UTF-8 encoded bytes. After the conversion above, unpack("U") is used to get the characters back, so that Chinese characters display correctly on the page.
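
The original regular expression and surrounding code are not shown, so the following is only a guess at what this step might look like. The pattern, the helper name three_byte_codepoints, and the sample JSON are all assumptions. In Ruby, Array#pack("U*") is the call that turns code points into a UTF-8 string, and String#unpack("U*") goes in the opposite direction.

```ruby
# Hypothetical helper (the original code is not shown): find every three-byte
# UTF-8 sequence in the raw data and return the code point of each one.
def three_byte_codepoints(raw)
  # Assumed pattern: lead byte 1110xxxx followed by two 10xxxxxx bytes.
  # The /n flag makes the regular expression match byte by byte.
  pattern = /[\xE0-\xEF][\x80-\xBF][\x80-\xBF]/n
  raw.b.scan(pattern).map do |seq|
    b1, b2, b3 = seq.bytes
    ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
  end
end

# Made-up sample payload containing the three bytes of U+4F60.
raw_json   = "{\"msg\":\"\xE4\xBD\xA0\"}".b
codepoints = three_byte_codepoints(raw_json)
text       = codepoints.pack("U*")   # rebuild a UTF-8 string from the code points
puts codepoints.inspect              # [20320]
```

This sketch handles three-byte sequences only, as the text describes; one- and two-byte characters in the payload would need their own handling.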
