Environment: Windows 10, Python 3.6
First, the idea behind the algorithm:
Build a local library of pinyin syllables (without tones). Then scan the string from left to right with a greedy algorithm: find the longest prefix that matches a syllable in the local pinyin library (provided here for everyone), cut it off, and repeat on the rest of the string. The process stops when the whole string has been consumed or when no prefix matches.
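The matching step needs the library in memory, and a set makes each lookup cheap. As a rough sketch only, assuming the txt file holds one syllable per line (the real file layout may differ, and load_pinyin_lib is a made-up helper name), it could be loaded like this:

```python
import os
import tempfile


def load_pinyin_lib(path):
    """Read one syllable per line into a set, lowercased, skipping blank lines.

    Assumption: the library file has one toneless syllable per line.
    """
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


# Demo with a throwaway file standing in for the real library.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("wo\nai\nzhong\nguo\n")

pinyin_lib = load_pinyin_lib(tmp.name)
os.remove(tmp.name)
```

If the actual file uses another separator, only the set comprehension needs to change.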
Here's the Python code:
    def pinyin_or_word(string):
        """Judge whether a string is pinyin or an English word.

        pinyin_lib comes from a txt file and must be loaded beforehand.
        """
        string = string.lower()
        stringlen = len(string)
        result = []
        while True:
            # Collect the length of every prefix that is a pinyin syllable.
            i_list = []
            for i in range(1, stringlen + 1):
                if string[0:i] in pinyin_lib:
                    i_list.append(i)
            if len(i_list) == 0:
                print("This is an English word!")
                result = []  # discard any partial matches
                break
            else:
                # Greedy step: keep the longest matching prefix.
                temp = max(i_list)
                result.append(string[0:temp])
                # Slice the prefix off; str.replace would also delete
                # later occurrences of the same syllable.
                string = string[temp:]
                stringlen = len(string)
                if stringlen == 0:
                    # print("This is a pinyin!")
                    # print(result)
                    break
        return result

    In [1]: pinyin_or_word("Woaizhongguo")
    Out[1]: ['wo', 'ai', 'zhong', 'guo']
Here I wrapped the logic in a function: the argument is a string, and the output is either the pinyin (with the pinyin length available from the returned list) or a verdict that the string is an English word.
In fact, this algorithm is flawed:
① If you enter an English word such as 'open', it returns the pinyin 'o' + 'pen'.
② Although it is described as judging pinyin or English words, it mainly judges pinyin. It cannot strictly judge whether a string is a word; to judge that accurately, a word library has to be added.
The second point is easy to fix. I have not yet thought of a solution for the first one; if anyone can think of one, I would appreciate the advice.
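One possible way to add the word library from point ② is to look the string up in an English word list before trying to segment it as pinyin. This is only a sketch: SAMPLE_PINYIN and SAMPLE_ENGLISH are tiny made-up samples standing in for real libraries, and classify and greedy_segment are hypothetical names, not part of the code above.

```python
# Tiny illustrative samples; real libraries would be loaded from files.
SAMPLE_PINYIN = {"wo", "ai", "zhong", "guo", "o", "pen", "men"}
SAMPLE_ENGLISH = {"open", "hello", "women"}


def greedy_segment(text, pinyin_lib):
    """Longest-match segmentation; returns [] if some part matches no syllable."""
    result = []
    while text:
        # Try the longest prefix first and shrink until one matches.
        for end in range(len(text), 0, -1):
            if text[:end] in pinyin_lib:
                result.append(text[:end])
                text = text[end:]
                break
        else:
            # No prefix of the remaining text is a known syllable.
            return []
    return result


def classify(text, pinyin_lib=SAMPLE_PINYIN, english_words=SAMPLE_ENGLISH):
    """Return (label, parts); the English word list is consulted first."""
    text = text.lower()
    if text in english_words:
        return "english", [text]
    syllables = greedy_segment(text, pinyin_lib)
    if syllables:
        return "pinyin", syllables
    return "unknown", []
```

With these samples, classify("open") is labeled English instead of being split into 'o' + 'pen'. The ambiguity from point ① does not go away, though: a string like 'women' is both an English word and the pinyin 'wo' + 'men', and checking the word list first simply decides such cases in favor of English.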
Python: recognizing whether a string of letters is pinyin or an English word