PHP: the principle behind converting Chinese characters to pinyin initials

Source: Internet
Author: User


In GB 2312, the encoded characters are divided into 94 areas (zones) of 94 characters/symbols each, and every character is identified by its area number and its position within that area. This representation is called the area-position code (also translated as "location code").

1) Areas 01-09 hold special symbols.
2) Areas 16-55 hold the level-1 Chinese characters, sorted by pinyin.
3) Areas 56-87 hold the level-2 Chinese characters, sorted by radical and stroke count.
4) Areas 10-15 and 88-94 are not encoded.

Programs that use GB 2312 usually store it in the EUC form for compatibility with ASCII; the "GB2312" entry in a browser's encoding list normally refers to this EUC-CN form. Each character or symbol takes two bytes. The first is called the "high byte" (or area byte) and the second the "low byte" (or position byte). The high byte ranges over 0xA1-0xF7 (area number 01-87 plus 0xA0), and the low byte over 0xA1-0xFE (position 01-94 plus 0xA0). Because the Chinese characters start at area 16, they use high bytes 0xB0-0xF7 and low bytes 0xA1-0xFE, which gives 72 * 94 = 6768 code points. Five of these are unassigned: 0xD7FA-0xD7FE.

For example, the character 啊 ("ah") is stored by most programs as two bytes, 0xB0 (first byte) followed by 0xA1 (second byte). Subtracting 0xA0 from each byte recovers the area-position code: 0xB0 = 0xA0 + 16 and 0xA1 = 0xA0 + 1, so 啊 sits at area 16, position 01, i.e. area-position code 1601.
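The byte arithmetic above can be checked with a short Java sketch. It assumes the caller already has the two EUC-CN bytes of a character; the class and method names (`Gb2312Code`, `toAreaPosition`, `isLevel1Hanzi`) are hypothetical, chosen only for illustration.

```java
final class Gb2312Code {
    // EUC-CN stores both the area number and the position number plus 0xA0.
    static final int EUC_OFFSET = 0xA0;

    // Converts an EUC-CN byte pair to its four-digit area-position code,
    // e.g. 0xB0 0xA1 -> area 16, position 1 -> 1601.
    static int toAreaPosition(int highByte, int lowByte) {
        int area = (highByte & 0xFF) - EUC_OFFSET;
        int position = (lowByte & 0xFF) - EUC_OFFSET;
        return area * 100 + position;
    }

    // Level-1 Chinese characters (sorted by pinyin) occupy areas 16-55;
    // the five slots 0xD7FA-0xD7FE (area 55, positions 90-94) are unassigned.
    static boolean isLevel1Hanzi(int areaPosition) {
        int area = areaPosition / 100;
        int position = areaPosition % 100;
        if (position < 1 || position > 94) {
            return false;
        }
        if (area == 55 && position >= 90) {
            return false;
        }
        return area >= 16 && area <= 55;
    }

    public static void main(String[] args) {
        // 0xB0 0xA1 is the first level-1 character (U+554A).
        int code = toAreaPosition(0xB0, 0xA1);
        System.out.println(code);                // 1601
        System.out.println(isLevel1Hanzi(code)); // true
        // Chinese-character areas 16-87: 72 areas x 94 positions.
        System.out.println(72 * 94);             // 6768
    }
}
```

The same subtraction works in reverse: adding 0xA0 to area 55 gives the high byte 0xD7, which is why the unassigned level-1 slots appear as 0xD7FA-0xD7FE.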
Design: use one array to store the starting area-position code of each distinct pinyin initial among the level-1 national-standard Chinese characters (the final 9999 is an upper bound that closes the last interval):

    static final int[] secPosValueList = {
        1601, 1637, 1833, 2078, 2274, 2302, 2433, 2594, 2787, 3106, 3212, 3472,
        3635, 3722, 3730, 3858, 4027, 4086, 4390, 4558, 4684, 4925, 5249, 9999};

and a second array to store the initial letter that corresponds to each of those starting codes (i, u and v are absent because no Mandarin pinyin syllable begins with them):

    static final char[] firstLetter = {
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
        'q', 'r', 's', 't', 'w', 'x', 'y', 'z'};

Processing method:
1. Check whether the character is an English letter.
2. If it is, return the letter directly.
3. Otherwise, compute the character's area-position code.
4. Find which interval of the code table (secPosValueList) that value falls in.
5. Using that position, return the corresponding entry from the letter table (firstLetter).
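The five steps can be sketched in Java as follows. This is a minimal illustration, not the original author's code: it assumes the JDK provides the "GB2312" charset (standard JDK distributions list it among their supported encodings), it returns '?' for characters outside the level-1 range, and the method names `firstLetterOf` and `initials` are hypothetical. It also limits the lookup to areas 16-55, because with 9999 as the upper bound every level-2 character (sorted by radical, not pinyin) would otherwise fall into the 'z' interval.

```java
import java.nio.charset.Charset;

final class PinyinInitials {
    // Starting area-position code of each initial (level-1 hanzi); 9999 is a sentinel.
    static final int[] secPosValueList = {
        1601, 1637, 1833, 2078, 2274, 2302, 2433, 2594, 2787, 3106, 3212, 3472,
        3635, 3722, 3730, 3858, 4027, 4086, 4390, 4558, 4684, 4925, 5249, 9999};

    // Initial letter for each interval [secPosValueList[i], secPosValueList[i + 1]).
    static final char[] firstLetter = {
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
        'q', 'r', 's', 't', 'w', 'x', 'y', 'z'};

    // Assumption: the running JDK supports the GB2312 (EUC-CN) charset.
    static final Charset GB2312 = Charset.forName("GB2312");

    // Returns the pinyin initial of one character, or '?' if it is neither an
    // English letter nor a level-1 GB 2312 Chinese character.
    static char firstLetterOf(char ch) {
        // Steps 1-2: English letters are returned unchanged.
        if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
            return ch;
        }
        // Step 3: encode as EUC-CN; unmappable characters become a single '?' byte.
        byte[] bytes = String.valueOf(ch).getBytes(GB2312);
        if (bytes.length != 2) {
            return '?';
        }
        int code = ((bytes[0] & 0xFF) - 0xA0) * 100 + ((bytes[1] & 0xFF) - 0xA0);
        // Level-2 characters (areas 56-87) are not sorted by pinyin, so skip them.
        if (code / 100 > 55) {
            return '?';
        }
        // Steps 4-5: find the interval containing the code and return its letter.
        for (int i = 0; i < firstLetter.length; i++) {
            if (code >= secPosValueList[i] && code < secPosValueList[i + 1]) {
                return firstLetter[i];
            }
        }
        return '?'; // symbols in areas 01-09 fall below 1601
    }

    // Applies firstLetterOf to every character of the input string.
    static String initials(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            sb.append(firstLetterOf(ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D U+56FD ("zhong guo", GB 2312 0xD6D0 0xB9FA)
        System.out.println(initials("\u4e2d\u56fd")); // zg
    }
}
```

A linear scan mirrors the "compare the position" step described above; because the table is sorted, a binary search over `secPosValueList` would work equally well if the table were larger.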
