What is this character? The page displays it as a space, and its base64_encode result is: Hw==
Is there any way to completely filter out space-like characters such as this one? I tried using iconv to convert the string to GBK and then back again, but that still does not filter it out.
$str = iconv('UTF-8', 'GBK//IGNORE', strip_tags($str));
$str = iconv('GBK', 'UTF-8//IGNORE', $str);
Reply to discussion (solution)
/**
 * Keep the Chinese characters and some other UTF-8 characters in a string.
 * @param String $ostr
 * @return String
 */
function filter_utf8_char($ostr) {
    preg_match_all('/[\x{FF00}-\x{FFEF}|\x{0000}-\x{00ff}|\x{4e00}-\x{9fff}]+/u', $ostr, $matches);
    $str = join('', $matches[0]);
    if ($str == '') { // the string contains special characters, so process it byte by byte
        $returnstr = '';
        $i = 0;
        $str_length = strlen($ostr);
        while ($i < $str_length) {
            $temp_str = substr($ostr, $i, 1);
            $ascnum = ord($temp_str);
            if ($ascnum >= 224) { // lead byte of a 3-byte sequence
                $returnstr = $returnstr . substr($ostr, $i, 3);
                $i = $i + 3;
            } elseif ($ascnum >= 192) { // lead byte of a 2-byte sequence
                $returnstr = $returnstr . substr($ostr, $i, 2);
                $i = $i + 2;
            } elseif ($ascnum >= 65 && $ascnum <= 90) { // A-Z
                $returnstr = $returnstr . substr($ostr, $i, 1);
                $i = $i + 1;
            } elseif ($ascnum >= 128 && $ascnum <= 191) { // special character (stray continuation byte): skip it
                $i = $i + 1;
            } else {
                $returnstr = $returnstr . substr($ostr, $i, 1);
                $i = $i + 1;
            }
        }
        $str = $returnstr;
        preg_match_all('/[\x{FF00}-\x{FFEF}|\x{0000}-\x{00ff}|\x{4e00}-\x{9fff}]+/u', $str, $matches);
        $str = join('', $matches[0]);
    }
    return $str;
}
echo bin2hex(base64_decode('Hw=='));
1f, which is US (unit separator)
I worked in low-level development (assembly, C) for many years, so it would be odd if I were not familiar with it.
No wonder, you turn out to be an expert. Anyone who works in assembly is seriously impressive.
Won't this regular expression fail to match full-width commas and other Chinese punctuation marks? And will slashes and backslashes be filtered out?
No, because these are all printable characters.
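To check this for a particular character, the character class from filter_utf8_char can be tested directly. The sketch below uses a hypothetical helper, is_kept, and assumes the script is saved as UTF-8. The class covers U+0000-U+00FF, the full-width forms U+FF00-U+FFEF, and the CJK ideographs U+4E00-U+9FFF, so punctuation from the U+3000-U+303F block (such as 、 and 。) falls outside it, and the answer may depend on which punctuation mark is meant.

```php
<?php
// Hypothetical helper: true if the single character $ch falls inside the
// character class used by filter_utf8_char. The literal "|" from the original
// class is omitted here because it is already covered by U+0000-U+00FF.
function is_kept(string $ch): bool
{
    return preg_match('/^[\x{FF00}-\x{FFEF}\x{0000}-\x{00ff}\x{4e00}-\x{9fff}]$/u', $ch) === 1;
}

// Slash, backslash, full-width comma, ideographic comma, ideographic full stop, a Chinese character.
foreach (['/', '\\', ',', '、', '。', '中'] as $ch) {
    echo $ch, ' => ', is_kept($ch) ? 'kept' : 'filtered', "\n";
}
```

Slashes and backslashes are plain ASCII, so they pass the class along with the rest of U+0000-U+00FF.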