Use PHP to extract the first letter of Chinese and English words and numbers
A recent project needed to extract the first letter of each word in a massive dictionary of Chinese and English words (including entries that start with the Arabic numerals 0-9). For example:
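gannicus --> G
"Free" (a Chinese word in the original; the letter is its pinyin initial) --> Z
2B --> E (the digit 2 is read "er")
"Silly X" (a Chinese word followed by X in the original) --> S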
The code is as follows:
private function getfirstchar($s0) {
    // Convert to GB2312, whose level-1 Chinese characters are ordered by pinyin
    $s = iconv('utf-8', 'gb2312', $s0);
    if (ord($s0) > 128) { // starts with a Chinese character
        // Two-byte GB2312 code shifted into the negative range used below
        $asc = ord($s[0]) * 256 + ord($s[1]) - 65536;
        if ($asc >= -20319 and $asc <= -20284) return "A";
        if ($asc >= -20283 and $asc <= -19776) return "B";
        if ($asc >= -19775 and $asc <= -19219) return "C";
        if ($asc >= -19218 and $asc <= -18711) return "D";
        if ($asc >= -18710 and $asc <= -18527) return "E";
        if ($asc >= -18526 and $asc <= -18240) return "F";
        if ($asc >= -18239 and $asc <= -17923) return "G";
        if ($asc >= -17922 and $asc <= -17418) return "H";
        if ($asc >= -17417 and $asc <= -16475) return "J";
        if ($asc >= -16474 and $asc <= -16213) return "K";
        if ($asc >= -16212 and $asc <= -15641) return "L";
        if ($asc >= -15640 and $asc <= -15166) return "M";
        if ($asc >= -15165 and $asc <= -14923) return "N";
        if ($asc >= -14922 and $asc <= -14915) return "O";
        if ($asc >= -14914 and $asc <= -14631) return "P";
        if ($asc >= -14630 and $asc <= -14150) return "Q";
        if ($asc >= -14149 and $asc <= -14091) return "R";
        if ($asc >= -14090 and $asc <= -13319) return "S";
        if ($asc >= -13318 and $asc <= -12839) return "T";
        if ($asc >= -12838 and $asc <= -12557) return "W";
        if ($asc >= -12556 and $asc <= -11848) return "X";
        if ($asc >= -11847 and $asc <= -11056) return "Y";
        if ($asc >= -11055 and $asc <= -10247) return "Z";
    } else if (ord($s) >= 48 and ord($s) <= 57) { // starts with a digit
        // Map each digit to the pinyin initial of its Chinese reading
        switch (iconv_substr($s, 0, 1, 'utf-8'))
        {
            case 1: return "Y"; // yi
            case 2: return "E"; // er
            case 3: return "S"; // san
            case 4: return "S"; // si
            case 5: return "W"; // wu
            case 6: return "L"; // liu
            case 7: return "Q"; // qi
            case 8: return "B"; // ba
            case 9: return "J"; // jiu
            case 0: return "L"; // ling
        }
    } else if (ord($s) >= 65 and ord($s) <= 90) { // starts with an uppercase letter
        return substr($s, 0, 1);
    } else if (ord($s) >= 97 and ord($s) <= 122) { // starts with a lowercase letter
        return strtoupper(substr($s, 0, 1));
    }
    else
    {
        // Anything else (input that matches none of the cases above):
        // return the first character unchanged.
        return iconv_substr($s0, 0, 1, 'utf-8');
    }
}
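To make the arithmetic in the Chinese branch concrete, here is a minimal sketch, assuming the iconv extension is available and the source file is saved as UTF-8. The helper name gbOffset is hypothetical and not part of the original code; it only reproduces the value that the function above stores in $asc.

```php
<?php
// Hypothetical helper (not from the original post): returns the signed
// GB2312 offset that getfirstchar() compares against its range table,
// or null when the input is not a two-byte GB2312 character.
function gbOffset(string $utf8Char): ?int
{
    $gb = @iconv('UTF-8', 'GB2312', $utf8Char);
    if ($gb === false || strlen($gb) < 2) {
        return null;
    }
    // High byte * 256 + low byte is the 16-bit GB2312 code; subtracting
    // 65536 shifts it into the negative range used by the table.
    return ord($gb[0]) * 256 + ord($gb[1]) - 65536;
}

// 0xB0A1 = 45217, and 45217 - 65536 = -20319: the first value of the "A" range.
var_dump(gbOffset('啊'));
// 0xD6D0 = 54992, and 54992 - 65536 = -10544: inside the "Z" range.
var_dump(gbOffset('中'));
```

Because the level-1 block of GB2312 is sorted by pinyin, comparing this single number against fixed boundaries is enough to pick the initial.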
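Known issue: a small number of words still cannot be extracted (the example given is "G"). In the code above, a Chinese character whose GB2312 value falls outside the listed ranges matches no condition, so nothing is returned for it.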
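One possible way to soften this limitation is sketched below; it is an assumption on my part, not the author's fix. The range table covers GB2312 codes 0xB0A1 through 0xD7F9 (the level-1 block, sorted by pinyin). Codes from 0xD8A1 upward (level-2 characters, ordered by radical and stroke count) match nothing. The function name pinyinInitial and the '#' fallback are hypothetical; the sketch uses the same boundaries as the code above but always returns a string.

```php
<?php
// Hypothetical sketch: table-driven version of the range lookup with an
// explicit fallback, so uncovered characters never produce an empty result.
function pinyinInitial(string $utf8Char, string $fallback = '#'): string
{
    // Lower bound of each range => letter, listed from highest to lowest.
    // The ranges are contiguous, so the lower bounds alone are sufficient.
    static $table = [
        -11055 => 'Z', -11847 => 'Y', -12556 => 'X', -12838 => 'W',
        -13318 => 'T', -14090 => 'S', -14149 => 'R', -14630 => 'Q',
        -14914 => 'P', -14922 => 'O', -15165 => 'N', -15640 => 'M',
        -16212 => 'L', -16474 => 'K', -17417 => 'J', -17922 => 'H',
        -18239 => 'G', -18526 => 'F', -18710 => 'E', -19218 => 'D',
        -19775 => 'C', -20283 => 'B', -20319 => 'A',
    ];

    $gb = @iconv('UTF-8', 'GB2312', $utf8Char);
    if ($gb === false || strlen($gb) < 2) {
        return $fallback; // not convertible, or not a two-byte character
    }
    $asc = ord($gb[0]) * 256 + ord($gb[1]) - 65536;
    if ($asc < -20319 || $asc > -10247) {
        return $fallback; // outside the pinyin-sorted level-1 block
    }
    foreach ($table as $lower => $letter) {
        if ($asc >= $lower) {
            return $letter;
        }
    }
    return $fallback;
}

echo pinyinInitial('中'), "\n";
// Build the first level-2 character from its raw GB2312 bytes (0xD8A1).
echo pinyinInitial(iconv('GB2312', 'UTF-8', "\xD8\xA1")), "\n";
```

A fallback marker only makes the gap visible. Getting the real initial for level-2 characters would need a separate pinyin lookup table, which is outside what the original code attempts.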