Two examples of getting the pinyin initial of a Chinese character in PHP

Source: Internet
Author: User
Tags chr ord strlen

Example 1

The class is organized so that it is easy to modify, maintain, and extend. It behaves as follows: English strings (including digits) are returned unchanged; Chinese strings return the first letter of each character's pinyin; mixed Chinese-English strings return the pinyin initials together with the English characters, all in upper case. The lookup uses a binary search, which fixes an earlier error where the letter Z was read as Y.

The code is as follows: Copy code

<? Php
/**
* Solution
* Tool for the first letter of Chinese character and Pinyin
* Note: The English string does not change and returns (including numbers) eg. abc123 => abc123
* Chinese character string: return the first character of pinyin. For example, test string => CSZFC
* Chinese-English mixed string: return the first character of the Chinese alphabet and the English eg. I j => WIWJ
* Eg.
* $ Py = new str2PY ();
* $ Result = $ py-> getInitials ('Ah, it's just like Ele. Me, I saw it. You know, it's just him uv, I want to be ');
*/
class str2PY
{
    /**
     * Start of each letter's range among the GB2312 level-1 characters,
     * keyed as lead byte * 1000 + trail byte.
     */
    private $_pinyins = array(
        176161 => 'A',
        176197 => 'B',
        178193 => 'C',
        180238 => 'D',
        182234 => 'E',
        183162 => 'F',
        184193 => 'G',
        185254 => 'H',
        187247 => 'J',
        191166 => 'K',
        192172 => 'L',
        194232 => 'M',
        196195 => 'N',
        197182 => 'O',
        197190 => 'P',
        198218 => 'Q',
        200187 => 'R',
        200246 => 'S',
        203250 => 'T',
        205218 => 'W',
        206244 => 'X',
        209185 => 'Y',
        212209 => 'Z',
    );
    private $_charset = null;
    /**
     * Constructor, specifying the input encoding (default: UTF-8)
     * UTF-8 and GB2312 are supported
     *
     * @param string $charset
     */
    public function __construct($charset = 'utf-8')
    {
        $this->_charset = $charset;
    }
    /**
     * substr for GB2312 strings, counted in two-byte Chinese characters
     *
     * @param string $str
     * @param int $start
     * @param int $len
     * @return string
     */
    private function _msubstr($str, $start, $len)
    {
        $start = $start * 2;
        $len = $len * 2;
        $strlen = strlen($str);
        $result = '';
        for ($i = 0; $i < $strlen; $i++) {
            if ($i >= $start && $i < ($start + $len)) {
                if (ord(substr($str, $i, 1)) > 129) $result .= substr($str, $i, 2);
                else $result .= substr($str, $i, 1);
            }
            // A byte above 129 starts a two-byte character, so skip its trail byte.
            if (ord(substr($str, $i, 1)) > 129) $i++;
        }
        return $result;
    }
    /**
     * Split the string into an array, one element per Chinese character or ASCII character
     *
     * @param string $str
     * @return array
     */
    private function _cutWord($str)
    {
        $words = array();
        while ($str != "") {
            if ($this->_isAscii($str)) { /* non-Chinese */
                $words[] = $str[0];
                $str = substr($str, strlen($str[0]));
            } else {
                $word = $this->_msubstr($str, 0, 1);
                $words[] = $word;
                $str = substr($str, strlen($word));
            }
        }
        return $words;
    }
    /**
     * Determines whether the first byte of $char is an ASCII character.
     *
     * @param string $char
     * @return bool
     */
    private function _isAscii($char)
    {
        return (ord(substr($char, 0, 1)) < 160);
    }
    /**
     * Determines whether the first three bytes of a string (or all of them,
     * if it is shorter) are ASCII characters.
     * getInitials() uses this as a shortcut, so a string that starts with
     * three ASCII bytes is returned unchanged even if Chinese characters follow.
     *
     * @param string $str
     * @return bool
     */
    private function _isAsciis($str)
    {
        $len = min(strlen($str), 3);
        for ($i = 0; $i < $len; $i++) {
            if (!$this->_isAscii($str[$i])) {
                return false;
            }
        }
        return true;
    }
    /**
     * Obtain the pinyin initials of a string.
     *
     * @param string $str
     * @return string
     */
    public function getInitials($str)
    {
        if (empty($str)) return '';
        if ($this->_isAscii($str[0]) && $this->_isAsciis($str)) {
            return $str;
        }
        $result = array();
        if ($this->_charset == 'utf-8') {
            $str = iconv('utf-8', 'gb2312', $str);
        }
        $words = $this->_cutWord($str);
        foreach ($words as $word) {
            if ($this->_isAscii($word)) { /* non-Chinese */
                $result[] = $word;
                continue;
            }
            // Same key format as $_pinyins: lead byte * 1000 + trail byte.
            $code = ord(substr($word, 0, 1)) * 1000 + ord(substr($word, 1, 1));
            /* Get the first letter of the pinyin, A-Z */
            if (($i = $this->_search($code)) != -1) {
                $result[] = $this->_pinyins[$i];
            }
        }
        return strtoupper(implode('', $result));
    }
    private function _getChar($ascii)
    {
        if ($ascii >= 48 && $ascii <= 57) {
            return chr($ascii); /* digit */
        } elseif ($ascii >= 65 && $ascii <= 90) {
            return chr($ascii); /* A-Z */
        } elseif ($ascii >= 97 && $ascii <= 122) {
            return chr($ascii - 32); /* a-z, converted to upper case */
        } else {
            return '-'; /* other */
        }
    }

    /**
     * Binary search for the key in $_pinyins whose range contains the given
     * GB2312 inner code. Returns -1 if the code lies below the first range.
     * Codes above the last key also resolve to the 'Z' entry.
     *
     * @param int $code
     * @return int
     */
    private function _search($code)
    {
        $data = array_keys($this->_pinyins);
        $lower = 0;
        $upper = sizeof($data) - 1;
        $middle = (int) round(($lower + $upper) / 2);
        if ($code < $data[0]) return -1;
        for (;;) {
            if ($lower > $upper) {
                // No exact match: the range start just below $code wins.
                return $data[$lower - 1];
            }
            $tmp = (int) round(($lower + $upper) / 2);
            if (!isset($data[$tmp])) {
                return $data[$middle];
            } else {
                $middle = $tmp;
            }
            if ($data[$middle] < $code) {
                $lower = (int) $middle + 1;
            } else if ($data[$middle] == $code) {
                return $data[$middle];
            } else {
                $upper = (int) $middle - 1;
            }
        }
    }
}
?>
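
The lookup that Example 1 performs with its binary search can also be sketched on its own. The block below is a minimal sketch, not the class's code: findInitial(), INITIAL_STARTS and LEVEL1_END are hypothetical names introduced here. It reuses the range starts from the class and adds one assumption of its own, an upper bound at the last GB2312 level-1 character (0xD7F9), so that codes past the pinyin-ordered block return null instead of 'Z'.

```php
<?php
// Hypothetical names: INITIAL_STARTS, LEVEL1_END and findInitial() are not part of str2PY.
// Keys are lead byte * 1000 + trail byte of the first character of each letter's range.
const INITIAL_STARTS = [
    176161 => 'A', 176197 => 'B', 178193 => 'C', 180238 => 'D',
    182234 => 'E', 183162 => 'F', 184193 => 'G', 185254 => 'H',
    187247 => 'J', 191166 => 'K', 192172 => 'L', 194232 => 'M',
    196195 => 'N', 197182 => 'O', 197190 => 'P', 198218 => 'Q',
    200187 => 'R', 200246 => 'S', 203250 => 'T', 205218 => 'W',
    206244 => 'X', 209185 => 'Y', 212209 => 'Z',
];
// Assumption added for this sketch: 0xD7F9 (215, 249) is the last level-1 character.
const LEVEL1_END = 215249;

function findInitial(int $code): ?string
{
    if ($code > LEVEL1_END) {
        return null;
    }
    $starts = array_keys(INITIAL_STARTS); // already in ascending order
    $lo = 0;
    $hi = count($starts) - 1;
    $found = null;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($starts[$mid] <= $code) {
            $found = $starts[$mid]; // largest range start seen so far that is <= $code
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return $found === null ? null : INITIAL_STARTS[$found];
}

// 测 is 0xB2E2 in GB2312, i.e. 178 * 1000 + 226.
echo findInitial(178226), "\n"; // C
```

Searching for the largest range start that does not exceed the code, rather than for an exact match, is what lets a 23-entry table cover every character in the pinyin-ordered block.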

Example 2

Compute a numeric code from the two GB2312 bytes of the Chinese character, find the range that contains it, and return the pinyin initial for that range.
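
As a small illustration of the number that Example 2 range-checks, the sketch below computes it for two boundary characters. The helper name gb2312SignedOffset() is hypothetical and does not appear in the article's code; the formula itself is the one used in getfirstchar().

```php
<?php
// Hypothetical helper: combine the two bytes of one GB2312 character into a
// 16-bit value and subtract 65536, giving the negative numbers used in the ranges.
function gb2312SignedOffset(string $twoBytes): int
{
    return ord($twoBytes[0]) * 256 + ord($twoBytes[1]) - 65536;
}

echo gb2312SignedOffset("\xB0\xA1"), "\n"; // -20319, start of the "A" range
echo gb2312SignedOffset("\xD7\xF9"), "\n"; // -10247, end of the "Z" range
```

The two bytes are written as hex escapes so that the example does not depend on the encoding of the source file.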

 

The code is as follows:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<?php
function getfirstchar($s0) {
    $fchar = ord($s0[0]);
    // ASCII letters are returned in upper case without a lookup.
    if ($fchar >= ord("A") and $fchar <= ord("z")) return strtoupper($s0[0]);
    // Accept UTF-8 or GB2312 input: if a round trip through GB2312 reproduces
    // the input, it was UTF-8, so continue with the converted bytes.
    $s1 = iconv("UTF-8", "gb2312", $s0);
    $s2 = iconv("gb2312", "UTF-8", $s1);
    if ($s2 == $s0) { $s = $s1; } else { $s = $s0; }
    $asc = ord($s[0]) * 256 + ord($s[1]) - 65536;
    if ($asc >= -20319 and $asc <= -20284) return "A";
    if ($asc >= -20283 and $asc <= -19776) return "B";
    if ($asc >= -19775 and $asc <= -19219) return "C";
    if ($asc >= -19218 and $asc <= -18711) return "D";
    if ($asc >= -18710 and $asc <= -18527) return "E";
    if ($asc >= -18526 and $asc <= -18240) return "F";
    if ($asc >= -18239 and $asc <= -17923) return "G";
    if ($asc >= -17922 and $asc <= -17418) return "H";
    if ($asc >= -17417 and $asc <= -16475) return "J";
    if ($asc >= -16474 and $asc <= -16213) return "K";
    if ($asc >= -16212 and $asc <= -15641) return "L";
    if ($asc >= -15640 and $asc <= -15166) return "M";
    if ($asc >= -15165 and $asc <= -14923) return "N";
    if ($asc >= -14922 and $asc <= -14915) return "O";
    if ($asc >= -14914 and $asc <= -14631) return "P";
    if ($asc >= -14630 and $asc <= -14150) return "Q";
    if ($asc >= -14149 and $asc <= -14091) return "R";
    if ($asc >= -14090 and $asc <= -13319) return "S";
    if ($asc >= -13318 and $asc <= -12839) return "T";
    if ($asc >= -12838 and $asc <= -12557) return "W";
    if ($asc >= -12556 and $asc <= -11848) return "X";
    if ($asc >= -11847 and $asc <= -11056) return "Y";
    if ($asc >= -11055 and $asc <= -10247) return "Z";
    return null;
}
 
 
function pinyin1($zh) {
    $ret = "";
    // Convert UTF-8 input to GB2312; input that is already GB2312 is left as it is.
    $s1 = iconv("UTF-8", "gb2312", $zh);
    $s2 = iconv("gb2312", "UTF-8", $s1);
    if ($s2 == $zh) { $zh = $s1; }
    for ($i = 0; $i < strlen($zh); $i++) {
        $s1 = substr($zh, $i, 1);
        $p = ord($s1);
        if ($p > 160) {
            // Lead byte of a GB2312 character: take two bytes and skip the trail byte.
            $s2 = substr($zh, $i++, 2);
            $ret .= getfirstchar($s2);
        } else {
            $ret .= $s1;
        }
    }
    return $ret;
}
echo "这是中文字符串<br/>";
echo pinyin1('这是中文字符串');

?>
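
The chain of range checks in Example 2 could also be driven by a table. The following is a sketch under stated assumptions, not the article's code: OFFSET_STARTS, OFFSET_END, initialForOffset() and initialsFromGb2312() are hypothetical names, and the input is assumed to be GB2312 bytes already (UTF-8 text would first need a conversion such as iconv('UTF-8', 'GB2312', $text)). The boundaries are copied from getfirstchar().

```php
<?php
// Hypothetical table: first signed offset of each letter's range, ascending.
const OFFSET_STARTS = [
    -20319 => 'A', -20283 => 'B', -19775 => 'C', -19218 => 'D',
    -18710 => 'E', -18526 => 'F', -18239 => 'G', -17922 => 'H',
    -17417 => 'J', -16474 => 'K', -16212 => 'L', -15640 => 'M',
    -15165 => 'N', -14922 => 'O', -14914 => 'P', -14630 => 'Q',
    -14149 => 'R', -14090 => 'S', -13318 => 'T', -12838 => 'W',
    -12556 => 'X', -11847 => 'Y', -11055 => 'Z',
];
const OFFSET_END = -10247; // last offset of the "Z" range

function initialForOffset(int $asc): ?string
{
    if ($asc < -20319 || $asc > OFFSET_END) {
        return null; // outside the pinyin-ordered block
    }
    $letter = null;
    foreach (OFFSET_STARTS as $start => $candidate) {
        if ($asc < $start) {
            break; // passed the range that contains $asc
        }
        $letter = $candidate;
    }
    return $letter;
}

// Assumes $gb holds GB2312 bytes; ASCII bytes are copied through unchanged.
function initialsFromGb2312(string $gb): string
{
    $out = '';
    $n = strlen($gb);
    for ($i = 0; $i < $n; $i++) {
        $byte = ord($gb[$i]);
        if ($byte > 160 && $i + 1 < $n) {
            $asc = $byte * 256 + ord($gb[$i + 1]) - 65536;
            $out .= initialForOffset($asc) ?? ''; // unknown characters are dropped
            $i++; // skip the trail byte
        } else {
            $out .= $gb[$i];
        }
    }
    return $out;
}

// 字符串 in GB2312 followed by ASCII text.
echo initialsFromGb2312("\xD7\xD6\xB7\xFB\xB4\xAEabc"), "\n"; // ZFCabc
```

Keeping the boundaries in one array means a wrong letter, such as a mistyped return value in one of the if statements, shows up as a gap in an otherwise alphabetical list.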