The main libraries for converting Chinese characters to Pinyin are:
- Pinyin: https://github.com/hotoo/pinyin
- Pymethod: https://github.com/a85816841/PotentialGragonSnail/tree/master/ql/lib/pinying
- POAPinyin: https://github.com/leeeboo/POAPinyin
- PinYin4Objc: https://github.com/kimziv/PinYin4Objc
Implementation principles:
- Pinyin extracts the first letter of each character's pinyin, for the Chinese-character portion of Unicode, into an array. The first letter is then read directly as array[(Unicode value of the character) - (Unicode value of the first Chinese character)].
- Pymethod converts the Unicode character to GBK and then uses the GBK high and low bytes to locate the corresponding pinyin.
- POAPinyin builds a table of every pinyin syllable and the Chinese characters with that reading, then searches it (the native convert method).
- The improved Quickconvert method takes the lower and upper bounds of the Unicode values of the Chinese characters and converts the table above into a Unicode-to-pinyin mapping, so each query becomes a hash lookup and is faster. If the Unicode values are not contiguous, however, this causes serious problems (the table is missing a number of characters). The function also skips some non-ASCII symbols; another method, Stringconvert, fixes the non-ASCII issue. It is best to add the missing characters to the table.
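The lookup strategies above can be sketched in Python with toy data. This is a minimal illustration, not code from any of these libraries: the tables and all function names are hypothetical. The GBK part also swaps in a common GB2312 first-letter technique (level-1 hanzi in GB2312 are ordered by pinyin, so comparing the combined high/low byte code against per-letter start codes yields the initial) rather than Pymethod's own full-pinyin table.

```python
import bisect

# --- Pinyin-style: one first letter per code point, indexed by offset ---
HANZI_START = 0x4E00  # first code point of CJK Unified Ideographs
# Hypothetical excerpt covering only U+4E00..U+4E02 (yi, ding, kao);
# a real table has one entry for every code point in the range.
FIRST_LETTERS = "ydk"

def first_letter(ch: str) -> str:
    offset = ord(ch) - HANZI_START
    if 0 <= offset < len(FIRST_LETTERS):
        return FIRST_LETTERS[offset]
    return ch  # outside the table: return the character unchanged

# --- POAPinyin-style table: pinyin -> characters with that reading ---
PINYIN_TABLE = {"yi": "一", "ding": "丁", "zhong": "中", "guo": "国"}  # toy data

def convert_linear(ch: str) -> str:
    # Native-convert style: scan every entry on every call (O(table size)).
    for pinyin, chars in PINYIN_TABLE.items():
        if ch in chars:
            return pinyin
    return ch

# --- Quickconvert-style: invert the table once, then each lookup is O(1) ---
# A dict does not require the code points to be contiguous; characters that
# are not in the table (ASCII, punctuation, missing hanzi) pass through.
CHAR_TO_PINYIN = {c: py for py, chars in PINYIN_TABLE.items() for c in chars}

def convert_hashed(text: str) -> list[str]:
    return [CHAR_TO_PINYIN.get(c, c) for c in text]

# --- GBK byte-based lookup (first letter only, GB2312 level-1 hanzi) ---
# Start code (high byte << 8 | low byte) of each initial; level-1 hanzi
# (0xB0A1..0xD7F9) are ordered by pinyin. No syllable starts with I, U, or V.
GB2312_STARTS = [
    0xB0A1, 0xB0C5, 0xB2C1, 0xB4EE, 0xB6EA, 0xB7A2, 0xB8C1, 0xB9FE,
    0xBBF7, 0xBFA6, 0xC0AC, 0xC2E8, 0xC4C3, 0xC5B6, 0xC5BE, 0xC6DA,
    0xC8BB, 0xC8F6, 0xCBFA, 0xCDDA, 0xCEF4, 0xD1B9, 0xD4D1,
]
GB2312_LETTERS = "ABCDEFGHJKLMNOPQRSTWXYZ"

def gbk_first_letter(ch: str) -> str:
    try:
        raw = ch.encode("gbk")
    except UnicodeEncodeError:
        return ch
    if len(raw) != 2:
        return ch  # single-byte (ASCII) character
    code = (raw[0] << 8) | raw[1]  # combine high and low bytes
    if not 0xB0A1 <= code <= 0xD7F9:
        return ch  # not a level-1 hanzi: no entry in this table
    return GB2312_LETTERS[bisect.bisect_right(GB2312_STARTS, code) - 1]

print(first_letter("丁"), convert_linear("国"), convert_hashed("中国!"), gbk_first_letter("中"))
# d guo ['zhong', 'guo', '!'] Z
```

The linear scan is what makes a native-convert approach slow: every character costs a pass over the whole table, whereas the inverted dict is built once and each lookup is constant time on average.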
Comparison:
- In terms of size, Pinyin is the smallest, and POAPinyin is roughly comparable.
- The speed is about the same for all of them, but do not use POAPinyin's original convert method: it traverses the table on every lookup and is very slow.
- By contrast, Pinyin can only return the first letter of each character's pinyin. Pymethod was originally written for stock queries, and its pinyin table covers fewer characters than POAPinyin's.
For the character 嗯, my pinyin input method produces it when I type "en". Pymethod returns en, but POAPinyin returns ng, and Baidu Encyclopedia also gives the reading ng.
PinYin4Objc is a highly efficient Chinese-to-Pinyin library that supports both Simplified and Traditional Chinese. It has the following features:
1. High efficiency through data caching: after the first initialization, the pinyin data is kept in both a file cache and a memory cache, which greatly speeds up conversion;
2. Customizable output format, including pinyin case;
3. Complete pinyin data for both Simplified and Traditional Chinese. Compared with popular projects online, the data is very comprehensive and conversion errors are rare.
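Features 1 and 2 can be sketched as follows. This is a hedged illustration, not PinYin4Objc's API: the cache path, the `_build_table` stand-in, and `format_pinyin` with its options are all hypothetical, with the option set loosely modeled on the tone-number, case, and ü-spelling choices found in pinyin4j-style libraries. Here `functools.lru_cache` plays the role of the memory cache and a JSON file plays the file cache.

```python
import json
import os
import tempfile
from functools import lru_cache

# Hypothetical cache location; a real app would use its own cache directory.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "pinyin_table_cache.json")

def _build_table() -> dict[str, str]:
    # Hypothetical stand-in for the expensive first-time parse of the
    # bundled pinyin resource. Tone numbers follow the syllable; "u:" spells ü.
    return {"中": "zhong1", "国": "guo2", "绿": "lu:4"}

@lru_cache(maxsize=1)  # memory cache: later calls reuse the same dict
def load_table() -> dict[str, str]:
    if os.path.exists(CACHE_PATH):  # file cache written by an earlier run
        with open(CACHE_PATH, encoding="utf-8") as f:
            return json.load(f)
    table = _build_table()
    with open(CACHE_PATH, "w", encoding="utf-8") as f:
        json.dump(table, f, ensure_ascii=False)
    return table

def format_pinyin(raw: str, with_tone: bool = True, uppercase: bool = False,
                  v_char: str = "u:") -> str:
    """Format a tone-numbered syllable such as "lu:4"."""
    tone = raw[-1] if raw[-1].isdigit() else ""
    body = raw[:-1] if tone else raw
    body = body.replace("u:", v_char)  # e.g. "v" or "ü"
    result = body + tone if with_tone else body
    return result.upper() if uppercase else result

table = load_table()
print(format_pinyin(table["绿"], with_tone=False, v_char="v"))  # lv
print(format_pinyin(table["中"], uppercase=True))               # ZHONG1
```

Only the first call pays for building or reading the table; every later call in the same process returns the cached dict, and a new process starts from the file instead of re-parsing the source data.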
Performance comparison: compared with earlier projects such as Pinyin, POAPinyin, and Pymethod, PinYin4Objc is very fast, at roughly 0.20145 seconds per 1000 characters.
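The quoted figure works out to 0.20145 s / 1000 ≈ 0.0002 s, about 201 µs per character. A hypothetical harness like the one below reports the same unit, seconds per 1000 characters, so that converters can be compared on equal terms. The function name, the placeholder converter, and the sample text are assumptions, and timings depend on hardware and data, so they are not directly comparable with the number above.

```python
import time
from typing import Callable

def seconds_per_1000_chars(convert: Callable[[str], str], text: str,
                           repeats: int = 100) -> float:
    """Average time to convert 1000 characters, one character per call."""
    if not text:
        raise ValueError("text must not be empty")
    start = time.perf_counter()
    for _ in range(repeats):
        for ch in text:
            convert(ch)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(text)) * 1000

# Placeholder converter: a dict lookup with pass-through for unknown characters.
table = {"中": "zhong", "国": "guo"}
print(f"{seconds_per_1000_chars(lambda c: table.get(c, c), '中国人民'):.6f} s/1000 chars")
```

Running the same harness over a linear-scan converter and a dict-based one makes the gap described in the comparison above measurable on your own data.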