C# conversion of Chinese characters to Pinyin (with support for polyphonic characters)
A recent project needed to convert Chinese characters to full Pinyin and to Pinyin initials so that records could be searched either way. This is a fairly mature problem, so I looked for existing code first and found the following two articles:
1. C# conversion of Chinese characters to Pinyin (supports all Chinese characters in the GB2312 character set)
2. [Practical] The ultimate solution for converting between Chinese characters and Pinyin in JS, with a simple JS Pinyin input method
Thanks to both authors. Their articles are complete and detailed, and both provide source code for reference.
Given the needs of our interface, I started from the first article. Its source code covers most of what is needed to convert Chinese characters to Pinyin, and other special characters can be handled by adding supplementary entries. Its drawback is that it does not support polyphonic characters. Because our queries had to cover polyphonic characters, I looked for other articles but found nothing suitable (or perhaps I did not search thoroughly enough). Later I found that Microsoft already provides Chinese-to-Pinyin conversion in the Microsoft Visual Studio International Pack, which is quite capable, so I gave it a try.
First, add a reference to the corresponding package from NuGet.
Search for PinYinConverter.
Simple demo
It is easy to use: the ChineseChar class performs the conversion directly.
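using System;
using System.Linq;
using Microsoft.International.Converters.PinYinConverter;
string ch = Console.ReadLine();
ChineseChar cc = new ChineseChar(ch[0]); // ChineseChar handles one character at a time
var pinyins = cc.Pinyins.Where(p => !string.IsNullOrWhiteSpace(p)).ToList(); // every reading, with its tone digit; blank entries skipped
pinyins.ForEach(Console.WriteLine);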
The result is as follows:
As we can see, 行 has three readings: hang, heng and xing. The tones are included as well, which is very convenient. What I need is to enter 银行 (bank) and convert it to the full Pinyin yinhang, yinheng, yinxing, and to the initials yh, yx. With the ChineseChar class, this is easy.
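ChineseChar reports each reading with its tone attached as a trailing digit, while the search keys above have no tones. The following is a minimal sketch of that normalization step. It assumes the raw readings look like `HANG2` (upper case plus tone digit), uses hard-coded sample strings in place of real ChineseChar output so it runs without the NuGet package, and `PinyinNormalizer` / `Normalize` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PinyinNormalizer
{
    // Drops blank entries, strips tone digits, lower-cases, and removes duplicates.
    // Input is assumed to look like ChineseChar output, e.g. "HANG2".
    public static List<string> Normalize(IEnumerable<string> rawReadings)
    {
        return rawReadings
            .Where(p => !string.IsNullOrWhiteSpace(p))
            .Select(p => Regex.Replace(p, @"\d", "").ToLowerInvariant())
            .Where(p => p.Length > 0)
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        // Sample data only; real readings come from new ChineseChar('行').Pinyins.
        var raw = new[] { "HANG2", "HENG2", "XING2", "XING4", null };
        Console.WriteLine(string.Join(", ", Normalize(raw)));
    }
}
```

Removing the tone first and deduplicating afterwards matters: readings that differ only in tone (here XING2 and XING4) collapse into a single search key.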
Wrapping the Chinese-to-Pinyin conversion
1. Split the input string into individual Chinese characters.
2. Use ChineseChar to get all Pinyin readings of each character.
3. Remove the tone digits and duplicates, take the first letter of each reading, and combine the results.
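The combination in step 3 is a Cartesian product over each character's readings. The following is a minimal sketch of just that step. It assumes the readings are already normalized (lower case, no tone digits, at least one letter each) and hard-codes them for 银行 so it runs without the NuGet package. `PinyinCombiner` and `Combine` are hypothetical names, not part of the Microsoft pack:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PinyinCombiner
{
    // Builds every full-Pinyin string and every initials string from the
    // per-character reading lists. Characters with no readings are skipped.
    public static (List<string> Full, List<string> Initials) Combine(
        IReadOnlyList<IReadOnlyList<string>> readingsPerChar)
    {
        var nonEmpty = readingsPerChar.Where(r => r.Count > 0).ToList();
        if (nonEmpty.Count == 0)
            return (new List<string>(), new List<string>());

        var full = new List<string> { "" };
        var initials = new List<string> { "" };
        foreach (var readings in nonEmpty)
        {
            // Extend every existing prefix with every reading of this character.
            full = full.SelectMany(prefix => readings.Select(r => prefix + r))
                       .Distinct().ToList();
            // Same for initials; deduplicate because hang and heng share "h".
            initials = initials.SelectMany(prefix => readings.Select(r => prefix + r[0]))
                               .Distinct().ToList();
        }
        return (full, initials);
    }

    public static void Main()
    {
        // Sample readings for 银行; in practice they come from ChineseChar.
        var input = new List<IReadOnlyList<string>>
        {
            new List<string> { "yin" },
            new List<string> { "hang", "heng", "xing" },
        };
        var (full, initials) = Combine(input);
        Console.WriteLine("Full: " + string.Join(",", full));
        Console.WriteLine("Initials: " + string.Join(",", initials));
    }
}
```

The number of full-Pinyin strings is the product of the reading counts of all characters, so a long run of polyphonic characters can produce many combinations.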
So I wrote a helper class for the conversion. The code is as follows:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.International.Converters.PinYinConverter;

// Result model, reconstructed from how the helper uses it
public class PingYinModel
{
    public List<string> TotalPingYin { get; set; } = new List<string>();
    public List<string> FirstPingYin { get; set; } = new List<string>();
}

public class PinYinConverterHelp
{
    public static PingYinModel GetTotalPingYin(string str)
    {
        var chs = str.ToCharArray();
        // Record the full Pinyin readings of each character, in input order
        var totalPingYins = new List<List<string>>();
        for (int i = 0; i < chs.Length; i++)
        {
            var pinyins = new List<string>();
            var ch = chs[i];
            // Is it a valid Chinese character?
            if (ChineseChar.IsValidChar(ch))
            {
                ChineseChar cc = new ChineseChar(ch);
                pinyins = cc.Pinyins.Where(p => !string.IsNullOrWhiteSpace(p)).ToList();
            }
            else
            {
                pinyins.Add(ch.ToString());
            }
            // Remove the tone digits and convert to lower case
            pinyins = pinyins.ConvertAll(p => Regex.Replace(p, @"\d", "").ToLower());
            // Remove blanks and duplicates
            pinyins = pinyins.Where(p => !string.IsNullOrWhiteSpace(p)).Distinct().ToList();
            if (pinyins.Any())
            {
                totalPingYins.Add(pinyins);
            }
        }
        var result = new PingYinModel();
        foreach (var items in totalPingYins)
        {
            if (result.TotalPingYin.Count <= 0)
            {
                result.TotalPingYin = items;
                result.FirstPingYin = items.ConvertAll(p => p.Substring(0, 1)).Distinct().ToList();
            }
            else
            {
                // Combine every existing full-Pinyin prefix with every reading
                var newTotalPingYins = new List<string>();
                foreach (var totalPingYin in result.TotalPingYin)
                {
                    newTotalPingYins.AddRange(items.Select(item => totalPingYin + item));
                }
                result.TotalPingYin = newTotalPingYins.Distinct().ToList();
                // Combine every existing initials prefix with every initial
                var newFirstPingYins = new List<string>();
                foreach (var firstPingYin in result.FirstPingYin)
                {
                    newFirstPingYins.AddRange(items.Select(item => firstPingYin + item.Substring(0, 1)));
                }
                result.FirstPingYin = newFirstPingYins.Distinct().ToList();
            }
        }
        return result;
    }
}
How to call it:
Console.WriteLine("Enter Chinese text:");
string str = Console.ReadLine();
var pingyins = PinYinConverterHelp.GetTotalPingYin(str);
Console.WriteLine("Full Pinyin: " + string.Join(",", pingyins.TotalPingYin));
Console.WriteLine("Initials: " + string.Join(",", pingyins.FirstPingYin));
Console.WriteLine();
Result:
Some rare characters are supported, although very obscure ones are not. For ordinary Chinese-to-Pinyin conversion, however, the support for polyphonic characters is sufficient.
Here we only use the Chinese-to-Pinyin conversion from the Microsoft Visual Studio International Pack. The pack also contains Chinese, Japanese, Korean, English and other language packs, and provides methods for other powerful features such as two-way conversion, character lookups, character counts, and even stroke counts. If you are interested, you can look up its API yourself.
Source code
Sharing is a virtue. In-depth articles can raise our technical level, but much of our everyday work is at the business level, and write-ups of small, practical techniques help solve those problems. As long as a piece of knowledge is useful, I hope you will share it with others, since we are all still learning.
Finally, here is the source code. If you find any errors or shortcomings, corrections are welcome.
Address: https://github.com/qq1206676756/PinYinParse