: This article mainly introduces filtering utf8 characters that exceed three bytes, or non-utf8 characters. For more information about PHP tutorials, see.
Function filterUtf8 ($ str) {/* utf8 encoding table: * Unicode symbol range | UTF-8 encoding method * u0000 0000-u0000 007F | 0 xxxxxxx * u0000 0080-u0000 07FF | 110 xxxxx 10 xxxxxx * u0000 0800-u0000 FFFF | 1110 xxxx 10 xxxxxx 10 xxxxxx **/$ re = ''; $ str = str_split (bin2hex ($ str), 2); $ mo = 1 <7; $ mo2 = $ mo | (1 <6 ); $ mo3 = $ mo2 | (1 <5); // Three bytes $ mo4 = $ mo3 | (1 <4 ); // Four bytes $ mo5 = $ mo4 | (1 <3); // five bytes $ mo6 = $ mo5 | (1 <2 ); // six bytes for ($ I = 0; $ I <count ($ str); $ I ++) {if (hexdec ($ str [$ I]) & ($ mo) = 0) {$ re. = chr (hexdec ($ str [$ I]); continue;} // remove if (hexdec ($ str [$ I]) from 4 bytes and above) & ($ mo6) ==$ mo6) {$ I = $ I + 5; continue;} if (hexdec ($ str [$ I]) & ($ mo5) ==$ mo5) {$ I = $ I + 4; continue;} if (hexdec ($ str [$ I]) & ($ mo4) ==$ mo4) {$ I = $ I + 3; continue;} if (hexdec ($ str [$ I]) & ($ mo3) ==$ mo3) {$ I = $ I + 2; if (hexdec ($ str [$ I]) & ($ mo )) ==$ mo) & (hexdec ($ str [$ I-1]) & ($ mo) = $ mo )) {$ r = chr (hexdec ($ str [$ I-2]). chr (hexdec ($ str [$ I-1]). chr (hexdec ($ str [$ I]); $ re. = $ r;} continue;} if (hexdec ($ str [$ I]) & ($ mo2) ==$ mo2) {$ I = $ I + 1; if (hexdec ($ str [$ I]) & ($ mo) ==$ mo) {$ re. = chr (hexdec ($ str [$ I-1]). chr (hexdec ($ str [$ I]) ;}continue ;}return $ re ;}
The above describes how to filter UTF-8 characters that exceed three bytes, or non-utf8 characters, including some content, and hope to be helpful to PHP tutorials.