I looked up many string-splitting methods on the internet, but none of them correctly split a UTF-8 string into an array of individual characters. After analyzing the UTF-8 encoding, I wrote the following method to split UTF-8 strings; I have tested it and it works. It only supports UTF-8, so strings in other encodings must be converted to UTF-8 before use.
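The UTF-8 analysis mentioned above comes down to one rule: the first byte of each character tells you how many bytes that character occupies. The sketch below illustrates that rule on its own. The function name `utf8_sequence_length` is a hypothetical helper introduced only for this example, and it assumes the input is well-formed UTF-8.

```php
<?php
// Hypothetical helper (not part of the original post): returns how many bytes
// the UTF-8 sequence starting with $byte occupies, assuming well-formed input.
function utf8_sequence_length(int $byte): int
{
    if ($byte < 0x80) { return 1; } // 0xxxxxxx: plain ASCII
    if ($byte < 0xC0) { return 1; } // 10xxxxxx: continuation byte, not a valid start; step over it
    if ($byte < 0xE0) { return 2; } // 110xxxxx: two-byte sequence
    if ($byte < 0xF0) { return 3; } // 1110xxxx: three-byte sequence (most CJK characters)
    return 4;                       // 11110xxx: four-byte sequence (e.g. emoji)
}

// ord() returns the value of the first byte of a string.
echo utf8_sequence_length(ord('a')), "\n";                // ASCII letter
echo utf8_sequence_length(ord("\xC3\xA9")), "\n";         // U+00E9, two bytes
echo utf8_sequence_length(ord("\xE4\xB8\xAD")), "\n";     // U+4E2D, three bytes
echo utf8_sequence_length(ord("\xF0\x9F\x98\x80")), "\n"; // U+1F600, four bytes
```

The thresholds 192 (0xC0) and 224 (0xE0) are the same ones the method below compares against.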
// Sample input; the string must already be UTF-8 encoded.
$tempaddtext = "http://www.jishubu.net php returns an array of single-word segmentation for UTF8 font strings";
$cind = 0;                    // current byte offset into the string
$arr_cont = array();          // the resulting characters
$len = strlen($tempaddtext);  // length in bytes, not characters
while ($cind < $len) {
    $byte = ord($tempaddtext[$cind]);  // value of the lead byte
    if ($byte < 192) {
        $width = 1;   // ASCII (or a stray continuation byte)
    } elseif ($byte < 224) {
        $width = 2;   // two-byte character
    } elseif ($byte < 240) {
        $width = 3;   // three-byte character
    } else {
        $width = 4;   // four-byte character, which the original version did not handle
    }
    $arr_cont[] = substr($tempaddtext, $cind, $width);
    $cind += $width;
}
print_r($arr_cont);
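As a cross-check on the manual byte loop, the same split can be sketched with PHP's built-in `preg_split` and the `u` (UTF-8) pattern modifier. This is an alternative technique, not the method described in this post. The wrapper name `utf8_split_chars` is hypothetical; `preg_split` and `PREG_SPLIT_NO_EMPTY` are standard PHP.

```php
<?php
// Hypothetical wrapper name; preg_split() and its flags are standard PHP.
function utf8_split_chars(string $text): array
{
    // An empty pattern with the /u modifier matches between code points;
    // PREG_SPLIT_NO_EMPTY drops the empty strings at both ends.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false when $text is not valid UTF-8.
    return $chars === false ? array() : $chars;
}

// Mixed ASCII and three-byte characters, one array entry per character.
print_r(utf8_split_chars("php 中文"));
```

Unlike the byte loop, this version rejects malformed input instead of guessing at character boundaries. If the mbstring extension is available, `mb_str_split` (PHP 7.4 and later) is another option.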
Returned result:
Array ( [0] => h [1] => t [2] => t [3] => p [4] => : [5] => / [6] => / [7] => w [8] => w [9] => w [10] => . [11] => j [12] => i [13] => s [14] => h [15] => u [16] => b [17] => u [18] => . [19] => n [20] => e [21] => t [22] =>   [23] => p [24] => h [25] => p ... )