Reprinted from: https://www.cnblogs.com/code123-cc/p/4822886.html
Recently, while working on a Python project, I needed to convert Chinese characters into their corresponding pinyin. I found a ready-made program on GitHub:
Python: Chinese characters to pinyin
It is used as follows:
from pinyin import PinYin

test = PinYin()
test.load_word()
print test.hanzi2pinyin(string='钓鱼岛是中国的')
print test.hanzi2pinyin_split(string='钓鱼岛是中国的', split="-")
Output:
['diao', 'yu', 'dao', 'shi', 'zhong', 'guo', 'de']
'diao-yu-dao-shi-zhong-guo-de'
The hanzi2pinyin function always returns a list. The hanzi2pinyin_split function returns a list when the split argument is empty and a string when it is not.
However, the program has two problems. First, when the text mixes Chinese with English, the English letters (and digits) are lost. Second, hanzi2pinyin_split returns a list in some cases and a string in others, which is confusing.
For example:
Test.hanzi2pinyin_split (string= ' Diaoyu Islands is China's code123 ', split= "")
The results we are looking for are:
U ' diaoyudaoshizhongguodecode123 '
But the actual result is:
U ' diaoyudaoshizhongguode '
For this reason, the following rewrite was made in the original program.
1. Modifying the hanzi2pinyin function
The original hanzi2pinyin function:
def hanzi2pinyin(self, string=""):
    result = []
    if not isinstance(string, unicode):
        string = string.decode("utf-8")
    for char in string:
        key = '%X' % ord(char)
        result.append(self.word_dict.get(key, char).split()[0][:-1].lower())
    return result
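Why the original version drops English characters can be seen by running its lookup expression on a single character. The sketch below is written for Python 3 (the article's code is Python 2), and both the two-entry dictionary and the helper name original_lookup are hypothetical; the value format (pinyin followed by a tone digit) is an assumption inferred from the [:-1] slice. For a character that is not in the dictionary, get() falls back to the character itself, and the slice that is meant to strip the tone digit strips the character instead.

```python
# Hypothetical dictionary: hex code point -> readings with a trailing tone digit.
word_dict = {
    '%X' % ord('中'): 'ZHONG1 ZHONG4',
    '%X' % ord('国'): 'GUO2',
}


def original_lookup(char):
    """Apply the original per-character expression to one character."""
    key = '%X' % ord(char)
    # Unknown characters fall back to `char`, then [:-1] removes that character.
    return word_dict.get(key, char).split()[0][:-1].lower()


print(repr(original_lookup('中')))  # 'zhong'
print(repr(original_lookup('c')))   # ''
```

Every letter or digit therefore contributes an empty string to the result list, which is why the joined output contains no English text.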
The modified hanzi2pinyin function:
def hanzi2pinyin(self, string=""):
    result = []
    if not isinstance(string, unicode):
        string = string.decode("utf-8")
    for char in string:
        key = '%X' % ord(char)
        if not self.word_dict.get(key):
            # Not in the dictionary (e.g. English letters, digits): keep as-is
            result.append(char)
        else:
            result.append(self.word_dict.get(key, char).split()[0][:-1].lower())
    return result
The modified hanzi2pinyin function no longer loses the English text when English and Chinese are mixed.
2. Changing hanzi2pinyin_split to always return a string
The original hanzi2pinyin_split function:
def hanzi2pinyin_split(self, string="", split=""):
    result = self.hanzi2pinyin(string=string)
    if split == "":
        return result
    else:
        return split.join(result)
The modified hanzi2pinyin_split function (it returns a string regardless of whether the split argument is empty):
def hanzi2pinyin_split(self, string="", split=""):
    result = self.hanzi2pinyin(string=string)
    # if split == "":
    #     return result
    # else:
    return split.join(result)
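The difference in return types can be illustrated in isolation. The sketch below is Python 3 and uses two hypothetical helper names, split_original and split_modified, which operate on an already converted list of syllables instead of calling hanzi2pinyin; it only mirrors the branching logic of the two versions.

```python
def split_original(syllables, split=""):
    """Original behaviour: a list for an empty separator, a string otherwise."""
    if split == "":
        return syllables
    return split.join(syllables)


def split_modified(syllables, split=""):
    """Modified behaviour: always a string; an empty separator concatenates."""
    return split.join(syllables)


syllables = ['zhong', 'guo']
print(type(split_original(syllables)).__name__)  # list
print(split_modified(syllables))                 # zhongguo
print(split_modified(syllables, "-"))            # zhong-guo
```

Since "".join(...) already handles the empty separator, the special case is unnecessary and callers no longer need to check the return type.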