Converting Chinese characters to Pinyin in C# (with support for polyphonic characters)

Source: Internet
Author: User

A project I worked on needed to convert Chinese characters to Pinyin, and to Pinyin initials (the first letter of each syllable), so that records could be queried by either. This seemed like a well-solved problem, so I went looking for existing code and started with the following two articles:

1. Converting Chinese characters to Pinyin in C# (all Chinese characters in the GB2312 character set are supported)

2. [Practical] The ultimate solution for converting between Chinese characters and Pinyin in JS, with a simple JS Pinyin input method

Thanks to both bloggers: the write-ups are complete and detailed, and the source code is provided for reference.

Given the needs of the interface, I followed the first article. Its source code largely covers converting Chinese characters to Pinyin, and further special characters can be added to it. Its drawback is that it does not support polyphonic characters. Because my queries needed polyphonic support, I looked through other articles but found nothing ready-made (or perhaps I did not search well). Later I discovered that Microsoft already provides the Microsoft Visual Studio International Pack for converting Chinese characters to Pinyin, and that it is quite powerful. So I tried it.

First, add a reference to the corresponding package through NuGet.

Search for PinYinConverter.

Simple demo

It is easy to use: the ChineseChar class does the conversion directly.

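    using System;
    using System.Linq;
    using Microsoft.International.Converters.PinYinConverter;

    // Read a line and list every reading of its first character
    string ch = Console.ReadLine();
    ChineseChar cc = new ChineseChar(ch[0]);
    var pinyins = cc.Pinyins.ToList();
    pinyins.ForEach(Console.WriteLine);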

The result is as follows:

We can see that the character 行 has three readings: hang, heng, and xing. The tone is also given for each reading, which is very convenient. What I need is to enter 银行 ("bank") and get the full Pinyin yinhang, yinheng, yinxing, with the initials yh, yx. With the ChineseChar class, this is easy to do.

Encapsulating the conversion from Chinese characters to Pinyin

1. Split the input string into individual characters.

2. Use ChineseChar to obtain all the Pinyin readings of each Chinese character.

3. Remove the tone digits, remove duplicates, extract the first letter of each reading, and then combine the results.
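
The three steps above can be sketched without the library by starting from hard-coded readings. In the sketch below, the class name PinyinCombiner, its method names, and the raw reading format (an uppercase syllable followed by a tone digit, such as "HANG2") are assumptions made for illustration; they are not taken from the PinYinConverter package. Only the normalize-then-combine logic follows the steps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the PinYinConverter package.
public static class PinyinCombiner
{
    // Drop tone digits, lowercase, and remove empty or duplicate readings.
    public static List<string> Normalize(IEnumerable<string> rawReadings)
    {
        return rawReadings
            .Where(r => !string.IsNullOrWhiteSpace(r))
            .Select(r => Regex.Replace(r, @"\d", "").ToLowerInvariant())
            .Where(r => r.Length > 0)
            .Distinct()
            .ToList();
    }

    // Build every combination that takes one reading per character, in order.
    public static List<string> Combine(IEnumerable<List<string>> readingsPerChar)
    {
        IEnumerable<string> seed = new[] { "" };
        return readingsPerChar
            .Aggregate(seed, (prefixes, readings) =>
                prefixes.SelectMany(p => readings.Select(r => p + r)))
            .Where(s => s.Length > 0)   // empty input yields an empty list
            .Distinct()
            .ToList();
    }

    // Initials: the same combination, using only the first letter of each reading.
    public static List<string> CombineInitials(IEnumerable<List<string>> readingsPerChar)
    {
        return Combine(readingsPerChar.Select(
            readings => readings.Select(r => r.Substring(0, 1)).Distinct().ToList()));
    }
}
```

With the assumed raw readings { "YIN2" } for the first character and { "HANG2", "HENG2", "XING2", null } for the second, Normalize gives yin and hang, heng, xing; Combine then gives yinhang, yinheng, yinxing, and CombineInitials gives yh, yx. The number of combinations is the product of the reading counts per character, so long inputs with many polyphonic characters can grow quickly.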

So I wrote a helper class to do the conversion. The code is as follows:

 

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;
    using Microsoft.International.Converters.PinYinConverter;

    // PingYinModel is not shown in the article; this minimal version is
    // inferred from how the helper below uses it.
    public class PingYinModel
    {
        public List<string> TotalPingYin { get; set; } = new List<string>();
        public List<string> FirstPingYin { get; set; } = new List<string>();
    }

    public class PinYinConverterHelp
    {
        public static PingYinModel GetTotalPingYin(string str)
        {
            var chs = str.ToCharArray();

            // Record the full Pinyin readings of each character, keyed by position
            Dictionary<int, List<string>> totalPingYins = new Dictionary<int, List<string>>();
            for (int i = 0; i < chs.Length; i++)
            {
                var pinyins = new List<string>();
                var ch = chs[i];

                // Is it a valid Chinese character?
                if (ChineseChar.IsValidChar(ch))
                {
                    ChineseChar cc = new ChineseChar(ch);
                    pinyins = cc.Pinyins.Where(p => !String.IsNullOrWhiteSpace(p)).ToList();
                }
                else
                {
                    // Other characters are passed through unchanged
                    pinyins.Add(ch.ToString());
                }

                // Remove the tone digit and convert to lowercase.
                // Note: this also strips digits that appear in the input itself.
                pinyins = pinyins.ConvertAll(p => Regex.Replace(p, @"\d", "").ToLower());

                // Remove empty entries and duplicates
                pinyins = pinyins.Where(p => !String.IsNullOrWhiteSpace(p)).Distinct().ToList();
                if (pinyins.Any())
                {
                    totalPingYins[i] = pinyins;
                }
            }

            PingYinModel result = new PingYinModel();
            foreach (var pinyins in totalPingYins)
            {
                var items = pinyins.Value;
                if (result.TotalPingYin.Count <= 0)
                {
                    // First character: its readings start the lists
                    result.TotalPingYin = items;
                    result.FirstPingYin = items.ConvertAll(p => p.Substring(0, 1)).Distinct().ToList();
                }
                else
                {
                    // Full Pinyin: append every reading to every spelling built so far
                    var newTotalPingYins = new List<string>();
                    foreach (var totalPingYin in result.TotalPingYin)
                    {
                        newTotalPingYins.AddRange(items.Select(item => totalPingYin + item));
                    }
                    newTotalPingYins = newTotalPingYins.Distinct().ToList();
                    result.TotalPingYin = newTotalPingYins;

                    // Initials: append the first letter of every reading to every initials string built so far
                    var newFirstPingYins = new List<string>();
                    foreach (var firstPingYin in result.FirstPingYin)
                    {
                        newFirstPingYins.AddRange(items.Select(item => firstPingYin + item.Substring(0, 1)));
                    }
                    newFirstPingYins = newFirstPingYins.Distinct().ToList();
                    result.FirstPingYin = newFirstPingYins;
                }
            }
            return result;
        }
    }

 

Call method:

    Console.WriteLine("Enter Chinese:");
    string str = Console.ReadLine();
    var pingyins = PinYinConverterHelp.GetTotalPingYin(str);
    Console.WriteLine("Full Pinyin: " + String.Join(",", pingyins.TotalPingYin));
    Console.WriteLine("Initials: " + String.Join(",", pingyins.FirstPingYin));
    Console.WriteLine();

Result:

Some uncommon characters are supported, while some very rare ones are not. For converting ordinary Chinese characters to Pinyin, however, the polyphonic support here is sufficient.

Here we have only used the Chinese-character-to-Pinyin conversion from the Microsoft Visual Studio International Pack. The pack also contains components for Chinese, Japanese, Korean, English and other languages, and provides methods for features such as mutual conversion, character counts, and even stroke counts. If you are interested, you can look up its API yourself.

Source code sharing

Sharing is a virtue. In-depth articles can raise our technical level, but often the more pressing needs are at the business level, and sharing small, practical pieces of knowledge can help solve those problems. As long as the knowledge is useful, I hope everyone will share it, so that we can all learn from each other.

Finally, here is the source code. If you find any errors or shortcomings, corrections are welcome.

Address: https://github.com/qq1206676756/PinYinParse
