JavaScript optimization: converting Chinese characters to pinyin

Source: Internet
Author: User

Similar examples are all over the Internet, but nearly all of them take the same approach. They put every character with the same pronunciation in one row and map each row to one pinyin syllable. To convert a character, they search for the row that contains it and read that row's pinyin. Setting efficiency aside, just storing every character takes considerable space. There are only about 7,000 commonly used characters, but including all GBK characters, rare ones among them, pushes the count past 20,000. The character records alone then take about 40 KB.
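For reference, the conventional row-per-pronunciation approach can be sketched as follows. `ROWS` and `toPinyinNaive` are hypothetical names. The sample rows hold only a handful of characters, while a real table would list every character under its syllable.

```javascript
// Conventional approach: one row of characters per pinyin syllable.
// ROWS is a hypothetical, tiny sample; a real table lists ~20,000 characters.
var ROWS = {
  ba: "八巴吧爸",
  ma: "妈马吗骂",
  zhong: "中钟终忠"
};

// Scan the rows until the character is found, then return that row's pinyin.
function toPinyinNaive(ch) {
  for (var py in ROWS) {
    if (ROWS[py].indexOf(ch) !== -1) return py;
  }
  return null; // character is not in the table
}

console.log(toPinyinNaive("马")); // "ma"
```

A lookup may have to scan every row. Every character must also be stored, at 2 bytes each in UTF-16, so 20,000 characters cost about 40 KB before any pinyin strings are counted.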

Clearly there is plenty of room for improvement, so consider the properties of Chinese characters. GBK includes the 20,902 characters of the Unicode CJK Unified Ideographs block (0x4E00 to 0x9FA5). Their pronunciations, each a combination of an initial and a final, number only about 400 when tones are ignored. Each pronunciation therefore covers about 50 characters on average. If all the characters could be arranged in pinyin order from a to z, you would only need to record the first character of each pronunciation, much like key frames in a video. However, the Unicode code points of Chinese characters are not ordered by pronunciation, so we need another way in.
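The arithmetic behind the key-frame idea is easy to check. The figure of 400 syllables is the approximate count given above, and the variable names are illustrative only.

```javascript
// Back-of-the-envelope sizes for the two table designs.
var FIRST = 0x4e00, LAST = 0x9fa5;
var totalChars = LAST - FIRST + 1;        // 20902 characters in the range
var syllables = 400;                      // approximate number of toneless syllables
var perSyllable = totalChars / syllables; // average characters per pronunciation

// Row-per-syllable table: every character stored, 2 bytes each in UTF-16.
var naiveBytes = totalChars * 2;          // 41804 bytes, about 40 KB

// Key-frame table: only the first character of each syllable.
// (The pinyin strings themselves, needed by both designs, are not counted.)
var keyFrameBytes = syllables * 2;        // 800 bytes of characters

console.log(totalChars, Math.round(perSyllable), naiveBytes, keyFrameBytes);
// prints: 20902 52 41804 800
```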
Recall the methods of the String class. Is any of them related to the pronunciation order of Chinese characters? Yes: String.prototype.localeCompare! You have probably seen examples of sorting Chinese characters with it. The method compares strings according to the collation order of the current locale.

Under a Chinese locale, that order is exactly pinyin order! Suddenly everything is clear. First generate all Chinese characters (0x4e00 to 0x9fa5), then sort them in locale order, which is dictionary order.

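<script>
// Collect every code point in the CJK Unified Ideographs range 0x4E00 to 0x9FA5.
var arr = [];
for (var i = 0x4e00; i <= 0x9fa5; i++)
  arr[i - 0x4e00] = i;
// Turn the code points into an array of one-character strings.
arr = String.fromCharCode.apply(null, arr).split("");
// Sort by the locale's collation (pinyin order under a Chinese locale).
arr.sort(function (a, b) { return a.localeCompare(b); });
document.write(arr);
</script>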

The efficiency is not ideal, though. Counting the calls to a.localeCompare(b) gives about 530,000. That is not a huge number, but IE6 still takes about 2 seconds to finish on a 2.5 GHz dual-core machine. It only needs to run once, but it is still not good enough and can be improved.
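The comparison count quoted above can be measured by wrapping the comparator in a counter. This is a minimal sketch, and `countSortComparisons` is a hypothetical helper. Passing "zh-CN" explicitly is an assumption, because the original code relied on the browser's default locale. The count depends on the engine's sort algorithm, so expect a different number from the 530,000 measured in IE6. A comparison sort needs on the order of n·log₂n comparisons, which is about 300,000 for n = 20,902.

```javascript
// Sorts the characters in [first, last] by locale order and counts
// how many times the comparator is called along the way.
function countSortComparisons(first, last) {
  var chars = [];
  for (var i = first; i <= last; i++) chars.push(String.fromCharCode(i));
  var calls = 0;
  chars.sort(function (a, b) {
    calls++; // one increment per comparison the engine performs
    return a.localeCompare(b, "zh-CN");
  });
  return { sorted: chars, comparisons: calls };
}

// Usage: measure the full range (the exact count varies by engine).
var result = countSortComparisons(0x4e00, 0x9fa5);
console.log(result.comparisons);
```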

Back to the number of characters: there are more than 20,000, but only about a quarter are used in practice. Clearly there is no need to sort all of them up front. It is better to look up each character's pinyin at runtime, which is where binary search naturally comes in.

Binary search is well known: halve the range at each step and repeat. With just over 400 pronunciations, at most 9 comparisons are needed, since 2^9 = 512 > 400. The only difference from an ordinary binary search is that each comparison against a key-frame character uses localeCompare. Converted characters are cached, so repeated characters are read straight from the cache.
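The lookup described above can be sketched as follows. It assumes a key-frame table of `[firstCharacter, pinyin]` pairs already sorted in locale order. `createPinyinLookup` and this table layout are hypothetical, and building a real table needs a pinyin data source that the article does not include. The comparator can be injected so the search logic can be checked without a Chinese collation. By default it calls `localeCompare` with an explicit "zh-CN" locale, which is an assumption.

```javascript
// Returns a lookup function over a key-frame table of [boundaryChar, pinyin]
// pairs, sorted by `compare`. Results are cached per character.
function createPinyinLookup(table, compare) {
  compare = compare || function (a, b) { return a.localeCompare(b, "zh-CN"); };
  var cache = new Map(); // character -> pinyin, filled lazily

  return function lookup(ch) {
    if (cache.has(ch)) return cache.get(ch);

    // Binary search for the last boundary character that sorts <= ch.
    var lo = 0, hi = table.length - 1, found = -1;
    while (lo <= hi) {
      var mid = (lo + hi) >> 1;
      if (compare(table[mid][0], ch) <= 0) {
        found = mid;  // candidate; a later boundary may still fit
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }

    // Characters sorting before the first boundary get null.
    var py = found === -1 ? null : table[found][1];
    cache.set(ch, py);
    return py;
  };
}
```

Usage would look like `var toPinyin = createPinyinLookup(KEY_FRAMES); toPinyin("中");`, where `KEY_FRAMES` is a hypothetical table of about 400 entries. With 400 entries, the loop runs at most 9 times per uncached character.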

It works. Unfortunately, Opera and Chrome did not implement localeCompare according to the locale. They simply returned the difference between the two characters' Unicode code points, which left me speechless.
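One way to tell these two behaviors apart at runtime is a single probe comparison. This is a minimal sketch, and `hasPinyinCollation` is a hypothetical helper. It relies on 阿 (a, U+963F) coming before 八 (ba, U+516B) in pinyin order but after it in code-point order. The locale argument to localeCompare comes from the ECMAScript Internationalization API and is an assumption relative to the original code.

```javascript
// Returns true if localeCompare orders these two characters by pinyin for
// the given locale, false if the result matches code-point order instead.
function hasPinyinCollation(locale) {
  // Pinyin order: 阿 (a) < 八 (ba). Code-point order: 八 (U+516B) < 阿 (U+963F).
  return "阿".localeCompare("八", locale) < 0;
}

if (!hasPinyinCollation("zh-CN")) {
  // localeCompare cannot be trusted for pinyin here; a page would need
  // some other strategy, such as a full lookup table.
  console.log("localeCompare does not use pinyin order in this environment");
}
```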

The demonstration produces many errors because of polyphonic characters and rare characters.

