正確的改變table字元集的語句是:
alter table xxx convert to character set utf8;
而不是想當然的:
alter table xxx default charset utf8;
字元集從GBK轉成utf8,
會增大欄位所佔用的空間,有可能會改變欄位的類型:
比如text有可能會自動變成medium text
但是varchar沒有自動變成medium text.
下面看測試:
建一張GBK的表:
mysql> SHOW CREATE TABLE xxx;
CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(255) NOT NULL DEFAULT '',
`body` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk
用想當然的辦法把它轉成utf8:
mysql> ALTER TABLE xxx DEFAULT charset=utf8;
Query OK, 0 ROWS affected (0.13 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql>
mysql> SET names utf8;
Query OK, 0 ROWS affected (0.00 sec)
插入測試資料:
mysql> INSERT INTO xxx SET title='我愛北京天安門';
Query OK, 1 ROW affected (0.06 sec)
成功了。。。然後您就認為萬事OK了?
mysql> INSERT INTO xxx SET title='㤇';
Query OK, 1 ROW affected, 1 warning (0.03 sec)
Warning (Code 1366): Incorrect string VALUE: 'xE3xA4x87' FOR COLUMN 'title' at ROW 1
注意 title和body欄位的CHARACTER SET 仍然為gbk
mysql> SHOW CREATE TABLE xxx;
| xxx | CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(255) CHARACTER SET gbk NOT NULL DEFAULT '',
`body` text CHARACTER SET gbk,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 |
正確的做法:
mysql> DROP TABLE xxx;
Query OK, 0 ROWS affected (0.02 sec)
mysql> CREATE TABLE `xxx` (
-> `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-> `title` VARCHAR(255) NOT NULL DEFAULT '',
-> `body` text,
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=gbk;
Query OK, 0 ROWS affected (0.06 sec)
mysql> ALTER TABLE xxx CONVERT TO CHARACTER SET utf8;
Query OK, 0 ROWS affected (0.14 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> INSERT INTO xxx SET title='㤇';
Query OK, 1 ROW affected (0.04 sec)
mysql> SHOW CREATE TABLE xxx;
| xxx | CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(255) NOT NULL DEFAULT '',
`body` mediumtext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8
可是你注意到 body欄位從text變成 mediumtext了嗎?
text欄位,最多儲存65535位元組,換成GBK的字元就是32767個字元,這32767個gbk字元轉成utf8卻要佔用98301位元組,已經超過text的儲存能力,所以被自動轉成了mediumtext。
下面測試VARCHAR的情況:
VARCHAR除了資料部分,還有1-2個位元組用來儲存資料的長度。如果只使用一個位元組,那麼長度上限為255(2^8-1),如果使用二個位元組,長度上限為65535(2^16-1)。
所以VARCHAR最多儲存65535位元組,換成GBK字元為32767個:
mysql> DROP TABLE xxx;
Query OK, 0 ROWS affected (0.02 sec)
mysql>
mysql> CREATE TABLE `xxx` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`title` VARCHAR(32768) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1074 (42000): COLUMN LENGTH too big FOR COLUMN 'title' (MAX = 32767); USE BLOB OR TEXT instead
mysql>
mysql>
可是 32767也是不成的。。還有每行記錄的總長度限制(不包括text和BLOB欄位) 65535:
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32767) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32766) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32765) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
ERROR 1118 (42000): ROW SIZE too LARGE. The maximum ROW SIZE FOR the used TABLE TYPE, NOT counting BLOBs, IS 65535. You have TO CHANGE SOME COLUMNS TO TEXT OR BLOBs
mysql> CREATE TABLE `xxx` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `title` VARCHAR(32764) NOT NULL DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=gbk;
Query OK, 0 ROWS affected (0.06 sec)
32764 * 2 + 2 + 4 = 65534 最接近於65535了,
32765 * 2 + 2 + 4 = 65536 超過65535。
*2是因為gbk字元佔用2位元組。
+2是VARCHAR還需要額外2位元組儲存資料的長度。
+4是id欄位INT UNSIGNED佔了4位元組。
合理嗎?
轉換字元集為utf8之後,VARCHAR(32764) 已經不足以儲存 32764個utf8字元:
mysql> ALTER TABLE xxx CONVERT TO CHARACTER SET utf8;
ERROR 1074 (42000): COLUMN LENGTH too big FOR COLUMN 'title' (MAX = 21845); USE BLOB OR TEXT instead
21845 * 3 = 65535.
後面附一些關於mysql資料庫字元集修改方法
MySQL:修改預設字元集,轉換字元集(MySQL 5.X)
1. 修改 MySQL 資料庫預設字元集(mysql database default character set)
alter database testdb default character set = gb2312; -www.2cto.com-
2. 修改 MySQL 資料表預設字元集(mysql table default character set)
alter table ip_to_country default character set = gb2312;
注意:修改 MySQL 的預設字元集,不管是在資料庫層級,還是資料表層級,對已經儲存的字元資料無任何改變。只是新增的表或列,開始使用新的字元集。