Problem Description:
Example: woshidewenfensi = wo shi de wen fen si
The pinyin string woshidewenfensi may also be typed with separators already in it, for example "woshi dewen fensi" or "woshi de wen fensi". In every case the result should be "wo shi de wen fen si". Special symbols and punctuation are replaced with the separator (a space).
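The punctuation rule above can be sketched as a small normalization step. This is only an illustration: the function name normalize is hypothetical, and it assumes the input uses plain ASCII letters (tone-marked vowels or ü would need a wider character class).

```python
import re

def normalize(raw):
    """Lowercase the input and turn every run of non-letters into one space."""
    # Assumption: only a-z counts as a pinyin letter in this sketch.
    return re.sub(r"[^a-z]+", " ", raw.lower()).strip()

print(normalize("Woshi,de wen--Fensi!"))
```

After this step the input contains only lowercase letters and single spaces, so each space-separated chunk can be split into syllables on its own.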
Solution:
Write the splitting logic as a Python script, pinyin.py. PHP then only needs to run it on the command line, passing the script path plus the pinyin string as a parameter, and read back the result.
    #!/usr/bin/env python
    # vim: encoding=utf-8
    import sys
    from pprint import pprint

    py = set([
        u'gu', u'qiao', u'qian', u'ge', u'gang', u'ga', u'lian', u'liao', u'rou', u'zong',
        u'tu', u'seng', u'ti', u'te', u'ta', u'nong', u'zhang', u'fan', u'tuan', u'gua',
        u'die', u'gui', u'guo', u'gun', u'sang', u'diu', u'tei', u'zi', u'ze', u'za',
        u'chen', u'zu', u'ruo', u'dian', u'diao', u'nei', u'suo', u'sun', u'zhao', u'sui',
        u'kuo', u'kun', u'kui', u'zhai', u'zuan', u'kua', u'bo', u'ning', u'lei', u'neng',
        u'men', u'mei', u'geng', u'chang', u'shua', u'cha', u'che', u'fen', u'chi', u'fei',
        u'chu', u'shui', u'me', u'ma', u'mo', u'mi', u'mu', u'dei', u'cai', u'zhan',
        u'cao', u'can', u'den', u'wang', u'beng', u'zhuang', u'tan', u'tao', u'tai', u'eng',
        u'song', u'ping', u'hou', u'cuan', u'\u0148g', u'lan', u'lao', u'fu', u'fa', u'jiong',
        u'mai', u'xiang', u'mao', u'fo', u'a', u'jiang', u'kuang', u'bing', u'su', u'si',
        u'sa', u'se', u'zan', u'm\u0300', u'xuan', u'zei', u'zen', u'kong', u'pang', u'le',
        u'jia', u'jin', u'lo', u'lai', u'li', u'peng', u'lu', u'yi', u'yo', u'ya',
        u'cen', u'dan', u'dao', u'ye', u'din', u'cei', u'zhen', u'jiu', u'bang', u'nou',
        u'yu', u'weng', u'wong', u'en', u'ei', u'kang', u'dia', u'er', u'ru', u'keng',
        u're', u'ren', u'gou', u'ri', u'she', u'tian', u'tiao', u'que', u'shi', u'shun',
        u'shuo', u'qun', u'xue', u'yun', u'xun', u'fiao', u'yue', u'ding', u'zao', u'rang',
        u'xi', u'yong', u'zai', u'guan', u'guai', u'dong', u'kuai', u'ying', u'kuan', u'xu',
        u'xia', u'xie', u'yin', u'rong', u'xin', u'tou', u'nian', u'niao', u'xiu', u'man',
        u'kou', u'niang', u'hua', u'chao', u'hun', u'huo', u'hui', u'shuan', u'quan', u'shuai',
        u'chong', u'bei', u'ben', u'dang', u'sai', u'ang', u'sao', u'san', u'reng', u'ran',
        u'rao', u'ming', u'l\u01dc', u'l\u01da', u'l\u01d8', u'lie', u'lia', u'min', u'miao', u'mian',
        u'mie', u'liu', u'zou', u'miu', u'nen', u'kai', u'kao', u'kan', u'dai', u'ka',
        u'ke', u'yang', u'ku', u'deng', u'dou', u'shou', u'chuang', u'nang', u'feng', u'meng',
        u'cheng', u'di', u'de', u'da', u'gei', u'du', u'gen', u'qu', u'shu', u'sha',
        u'\u1e3f', u'ban', u'bao', u'bai', u'nun', u'nuo', u'sen', u'kei', u'fang', u'teng',
        u'lun', u'luo', u'ken', u'wa', u'wo', u'ju', u'tui', u'wu', u'jie', u'ji',
        u'huang', u'tuo', u'cou', u'la', u'mang', u'ci', u'tun', u'tong', u'ca', u'pou',
        u'ce', u'gong', u'cu', u'dui', u'dun', u'duo', u'ting', u'qie', u'yao', u'yan',
        u'pi', u'po', u'suan', u'chua', u'chun', u'\u0148', u'chui', u'gao', u'gan', u'ao',
        u'gai', u'xiong', u'tang', u'n', u'pian', u'piao', u'cang', u'heng', u'xian', u'xiao',
        u'bian', u'biao', u'zhua', u'duan', u'cong', u'zhui', u'zhuo', u'zhun', u'hong', u'shuang',
        u'juan', u'zhei', u'pai', u'shai', u'shan', u'shao', u'pan', u'pao', u'nin', u'nia',
        u'hang', u'\u01f9g', u'nie', u'zhuai', u'mou', u'zhuan', u'yuan', u'niu', u'zhong', u'qi',
        u'lin', u'guang', u'nao', u'n\u01d8', u'n\u01da', u'n\u01dc', u'hai', u'han', u'hao', u'wei',
        u'wen', u'ruan', u'cuo', u'cun', u'cui', u'bin', u'bie', u'l\xfce', u'shen', u'shei',
        u'fou', u'xing', u'\u0144g', u'qia', u'qiang', u'nuan', u'pen', u'pei', u'\u01f9', u'rui',
        u'run', u'ba', u'sheng', u'rua', u'bi', u'bu', u'chuan', u'qing', u'chuai', u'pu',
        u'o', u'chou', u'ou', u'zui', u'luan', u'zuo', u'jian', u'jiao', u'sou', u'wan',
        u'jing', u'qiong', u'wai', u'long', u'pa', u'liang', u'lou', u'huan', u'hen', u'hei',
        u'huai', u'n\xfce', u'\u0144', u'jue', u'shang', u'jun', u'hu', u'hm', u'ling', u'ha',
        u'he', u'zhu', u'ceng', u'zha', u'zhe', u'zhi', u'qin', u'pin', u'ai', u'chai',
        u'chan', u'pie', u'zeng', u'an', u'qiu', u'ni', u'na', u'zang', u'nai', u'nan',
        u'ne', u'ng', u'chuo', u'tie', u'you', u'nu', u'zheng', u'leng', u'zun', u'zhou',
        u'lang', u'e', u'hng',
    ])

    def pinyin_split():
        pinyin = sys.argv[1]
        l = len(pinyin)
        p = [l] * (l + 1)
        p[l] = 0
        s = [0] * l
        for i in xrange(l - 1, -1, -1):
            for j in xrange(l - i):
                if j == 0 or pinyin[i:i+j+1] in py and p[i+j+1]+1
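The listing above stops partway through the inner loop's condition, so the rest of the original script is not available. The variables it sets up (p as a per-position cost array, s as a per-position step array) suggest a dynamic-programming split that minimizes the number of pieces. The following is a minimal Python 3 sketch of that idea, not the author's code: the names split_pinyin and split_phrase, the tiny SYLLABLES subset, and the way the condition is completed are all assumptions made for illustration.

```python
# Hypothetical subset of the syllable table, just enough for the example;
# a real run would use the full set from the script.
SYLLABLES = {"wo", "shi", "de", "wen", "fen", "si", "e", "en"}

def split_pinyin(text, syllables=SYLLABLES):
    """Split one unbroken pinyin string into the fewest possible pieces."""
    n = len(text)
    best = [n] * (n + 1)  # best[i]: fewest pieces needed to cover text[i:]
    best[n] = 0
    step = [1] * n        # step[i]: length of the first piece of text[i:]
    for i in range(n - 1, -1, -1):
        for j in range(n - i):
            piece = text[i:i + j + 1]
            # Assumed completion of the truncated test: a single character
            # (j == 0) is always allowed as a fallback, longer pieces must
            # be known syllables, and a piece is kept only if it lowers the cost.
            if (j == 0 or piece in syllables) and best[i + j + 1] + 1 < best[i]:
                best[i] = best[i + j + 1] + 1
                step[i] = j + 1
    pieces, i = [], 0
    while i < n:
        pieces.append(text[i:i + step[i]])
        i += step[i]
    return pieces

def split_phrase(phrase, syllables=SYLLABLES):
    """Split every space-separated chunk and rejoin with single spaces."""
    chunks = phrase.lower().split()
    return " ".join(" ".join(split_pinyin(c, syllables)) for c in chunks)

print(split_phrase("Woshi Dewen fensi"))
```

Because the sketch minimizes the number of pieces, an ambiguous string resolves to the longer syllable when the full table is used: "xian" stays one piece rather than becoming "xi an". Characters that match no syllable fall back to single-letter pieces instead of raising an error.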
Example:
Reference: http://www.dewen.io/q/1037/%E5%A6%82%E4%BD%95%E5%AE%9e%e7%8e%b0%e5%b0%86%e8%bf%9e%e7%bb%ad%e7%9a%84%e6%b1%89%e8%af%ad%e6%8b%bc%e9%9f%b3%e5%88%86%e9%9a%94%e5%bc%80
The above describes how to split a continuous Hanyu Pinyin string into syllables from PHP. I hope it is useful to readers interested in PHP.