Storing emoji in a MySQL database that uses the utf8 character set fails with an error:
Incorrect string value: '\xF0\x9F\x98\x84\xF0\x9F...' for column 'content'
How to fix the error:
Characters that take 4 bytes in UTF-8 aren't yet widely used, so not every application out there fully supports them. MySQL 5.5.3 and later work fine with 4-byte characters when properly configured; check whether your other components can work with them as well.
Here are a few other things to check:
Make sure all your tables' default character sets and text fields are converted to utf8mb4, in addition to setting the client and server character sets, e.g.:
    ALTER TABLE mytable
        CHARSET=utf8mb4,
        MODIFY COLUMN textfield1 VARCHAR(255) CHARACTER SET utf8mb4,
        MODIFY COLUMN textfield2 VARCHAR(255) CHARACTER SET utf8mb4;
and so on.
If your data is already in the utf8 character set, it should convert to utf8mb4 in place without any problems. As always, back up your data before trying!
Also make sure your app layer sets its database connections' character set to utf8mb4. Double-check this is actually happening – if you're running an older version of your chosen framework's mysql client library, it may not have been compiled with utf8mb4 support and it won't set the charset properly. If not, you may have to update it or compile it yourself.
When viewing your data through the mysql command-line client, make sure you're on a machine that can display emoji, and run SET NAMES utf8mb4 before running any queries.
Once every level of your application can support the new characters, you should be able to use them without any corruption.
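In short: change the table schema to a character set that supports 4-byte Unicode (utf8mb4), and use the same character set for the database connection. I have confirmed that this works.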
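As a rough illustration of the connection side, here is a minimal PHP sketch using PDO. The helper names utf8mb4Dsn and connectUtf8mb4, and the host, database, and credential values passed to them, are assumptions made for this example and are not from the original post. It requests utf8mb4 in the DSN and then asks the server which character set the session actually uses, since the advice above is to double-check that the setting took effect.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that requests utf8mb4
// as the connection character set.
function utf8mb4Dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Hypothetical helper: opens the connection and verifies the charset
// the server reports for this session.
function connectUtf8mb4(string $host, string $dbname, string $user, string $pass): PDO
{
    $pdo = new PDO(utf8mb4Dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    // Ask the server what it thinks the client character set is.
    $charset = $pdo->query('SELECT @@character_set_client')->fetchColumn();
    if ($charset !== 'utf8mb4') {
        throw new RuntimeException("Connection charset is $charset, expected utf8mb4");
    }
    return $pdo;
}
```

If you use mysqli instead of PDO, calling set_charset('utf8mb4') on the connection object should serve the same purpose.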
If other parts of your stack do not support these characters, consider stripping them out:
Since 4-byte UTF-8 sequences always start with a byte in the range 0xF0-0xF7, the following should work:
    $str = preg_replace('/[\xF0-\xF7].../s', '', $str);
Alternatively, you could use preg_replace in UTF-8 mode, but this will probably be slower:
    $str = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);
This works because 4-byte UTF-8 sequences are used for code points in the supplementary Unicode planes, starting from 0x10000.
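To make the UTF-8-mode variant easier to try out, here is a small self-contained sketch. The function name stripSupplementaryChars and the sample string are assumptions for illustration only. It wraps the same pattern and also handles the case where preg_replace returns null, which happens when the input is not valid UTF-8.

```php
<?php
// Hypothetical helper wrapping the UTF-8-mode pattern quoted above.
function stripSupplementaryChars(string $str): string
{
    // U+10000..U+10FFFF are the code points that need 4 bytes in UTF-8.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);

    // In /u mode preg_replace returns null for input that is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// U+1F604 is the emoji whose bytes (F0 9F 98 84) appear in the error message.
$input = "Hi \u{1F604} there";
echo stripSupplementaryChars($input), PHP_EOL; // prints "Hi  there" (two spaces)
```

Note that this removes every supplementary-plane character, not only emoji, and leaves 3-byte symbols such as U+2764 untouched; those fit in MySQL's 3-byte utf8 anyway.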