MySQL Schema設計(五)用Python管理字元集

來源:互聯網
上載者:User

一提字元集,可能有人會說,不管天崩地裂,全用utf8,整個世界都清淨了。但某些字元集是需要更多CPU、消費更多的記憶體和磁碟空間、甚至影響索引使用,這還不包括令人蛋碎的亂碼。可見、我們還是有必要花點時間略懂下MySQL字元集。
先直觀認識各階梯下顯示使用字元集:

# 囊括三個層級:DB、Table、Columnmysql> create database d charset utf8;Query OK, 1 row affected (0.04 sec)mysql> create table d.t    -> (str varchar(10) charset latin1)    -> default charset=utf8;Query OK, 0 rows affected (0.05 sec)

那如果沒有顯示指定?MySQL是如何設定?兵分兩路:
     
  ① 建立對象時的預設設定
        
     這是個逐層繼承的預設設定:
     Server → DB → Table → Column
     高層為底層設定預設值、底層可遵可棄
     沒有指定字元集、謂之可遵
     顯示指定字元集、謂之可棄
        
  ② 伺服器和用戶端通訊時的設定
        
     當用戶端提交一條SQL到MySQL時、MySQL Server總是假定用戶端字元集是character_set_client
     其後、Server把character_set_client轉為character_set_connection進行SQL處理、
     在返回結果集給用戶端時、Server又將character_set_connection轉為character_set_result、然後返回
        
     以上涉及的三個字元集、我們可以通過set names 一次搞定
        
     
字元集之間的相互轉換是需要額外的系統開銷的、如何知道?explain extended + show warnings 即可。那該如何盡量避免這種隱式轉換?這裡介紹一種被稱為"極簡原則"的方法、如下:
先為伺服器(或資料庫)選擇合適的字元集、然後依據業務、讓某些列選擇合適的字元集

在MySQL字元集中隱含了些意外驚喜、主要有三:

① 有趣的character_set_database
  
  當character_set_database和character_set_server不同時、庫的預設字元集由後者決定
  你不能直接修改csd、改變css就改變了csd、因為csd和庫預設字元集相同、
  改變庫預設字元集、csd就隨之改變、而css決定庫的預設字元集
  所以、當串連到mysql執行個體、又沒有指定庫時、預設字元集與css相同
  
② load data infile 
  
  進行此操作時、建議最佳實務如下:
  use 庫;
  set names 字元集;
  開始載入資料;
  這就使用統一字元集、避免混搭的"字元集style"
  
③ select into outfile 
  
  該行為沒有進行任何轉碼操作!
㈠ 顯示字元集
表結構:
mysql> desc sakila.actor;+-------------+----------------------+------+-----+-------------------+-----------------------------+| Field       | Type                 | Null | Key | Default           | Extra                       |+-------------+----------------------+------+-----+-------------------+-----------------------------+| actor_id    | smallint(5) unsigned | NO   | PRI | NULL              | auto_increment              || first_name  | varchar(45)          | NO   |     | NULL              |                             || last_name   | varchar(45)          | NO   | MUL | NULL              |                             || last_update | timestamp            | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |+-------------+----------------------+------+-----+-------------------+-----------------------------+4 rows in set (0.00 sec)

資料庫連接模組:
[root@DataHacker ~]# cat dbapi.py#!/usr/bin/env ipython#coding = utf-8#Author: linwaterbin@gmail.com#Time: 2014-1-29import MySQLdb as dbapiUSER = 'root'PASSWD = 'oracle'HOST = '127.0.0.1'DB = 'sakila'conn = dbapi.connect(user=USER,passwd=PASSWD,host=HOST,db=DB)

查看版本:
[root@DataHacker ~]# ./show_charset.py --version1.0

查看協助:
[root@DataHacker ~]# ./show_charset.py -hUsage: show_charset.py [options] <arg1> <arg2> [<arg3>...]Options:  --version   show program's version number and exit  -h, --help  show this help message and exit  -d DB_NAME  Database name(leave blank is all Databases)  -t T_NAME   Table name (leave blank is all tabless)  -c C_NAME   Column name(leave blank is all columns)

我們要的效果:
[root@DataHacker ~]# ./show_charset.py -d sakila -t actorsakila.actor.first_name:     utf8     utf8_general_cisakila.actor.last_name:     utf8     utf8_general_ci

細心的朋友或許已經看出actor_id與last_update這兩列並沒有被統計,由此我們也可以確定,只有基於字元的值才有字元集的概念在5.6更是直截了當:
mysql> create table tt (str char(2) charset utf8);Query OK, 0 rows affected (0.20 sec)mysql> create table tt (str int(11) charset utf8);ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'charset utf8)' at line 1

代碼:
[root@DataHacker ~]# cat show_charset.py#!/usr/bin/env pythonfrom optparse import OptionParserfrom dbapi import connimport MySQLdb# 函數一:命令列參數輸入def parse_options():  parser = OptionParser(usage="%prog [options] <arg1> <arg2> [<arg3>...]",version='1.0',)  parser.add_option("-d",dest="db_name",help="Database name(leave blank is all Databases)")  parser.add_option("-t",dest="t_name",help="Table name (leave blank is all tabless)")  parser.add_option("-c",dest="c_name",help="Column name(leave blank is all columns)")  return parser.parse_args()# 主功能實現:顯示字元集def show_charsets():  query="""select * from information_schema.columns            where table_schema not in ('mysql','INFORMATION_SCHEMA')              and character_set_name is not null"""  #三個if條件實現過濾  if options.db_name:    query += " and table_schema='%s'" % options.db_name  if options.t_name:    query += " and table_name='%s'" % options.t_name  if options.c_name:    query += " and column_name='%s'" % options.c_name  #預設傳回值形式是元組,我們通過屬性cursors.DictCursor轉為字典  cur = conn.cursor(MySQLdb.cursors.DictCursor)  cur.execute(query)  for record in cur.fetchall():    character_set_name = record['CHARACTER_SET_NAME']    collation_name = record['COLLATION_NAME']    print "%s.%s.%s:\t%s\t%s" % (record['TABLE_SCHEMA'],record['TABLE_NAME'],record['COLUMN_NAME'],character_set_name,collation_name)  cur.close()#採用try-finally形式關閉資料庫連接  try:    options,args = parse_options()    show_charsets()finally:    conn.close()


㈡ 修改列的字元集
查看協助:
[root@DataHacker ~]# ./modify.py -hUsage: modify.py schema_name.table_name.column_name new_charset_name [new_collate_name]Options:  --version   show program's version number and exit  -h, --help  show this help message and exit

我們要的效果:
#修改前mysql> show create table testdb.t\G;*************************** 1. row ***************************       Table: tCreate Table: CREATE TABLE `t` (  `id` int(11) DEFAULT NULL,  `name` varchar(10) CHARACTER SET latin1 DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.00 sec)#修改[root@DataHacker ~]# ./modify.py testdb.t.name gbksuccessfully executed:     alter table testdb.t modify column name varchar(10) CHARSET gbk#修改後mysql> show create table testdb.t\G;*************************** 1. row ***************************       Table: tCreate Table: CREATE TABLE `t` (  `id` int(11) DEFAULT NULL,  `name` varchar(10) CHARACTER SET gbk DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.01 sec)

實現代碼如下:
[root@DataHacker ~]# cat modify.py #!/usr/bin/env pythonimport MySQLdbfrom dbapi import *from optparse import OptionParser#這裡省略掉option值,只要求輸入argsdef parse_options():  parser = OptionParser(usage="\n%prog schema_name.table_name.column_name new_charset_name [new_collate_name]",version='1.0',)  return parser.parse_args()#主程式def modify_column():  cur = conn.cursor(MySQLdb.cursors.DictCursor)  v_sql = """          select * from information_schema.columns            where table_schema='%s'              and table_name='%s'             and column_name='%s'""" % (schema_name,table_name,column_name)  cur.execute(v_sql)  row = cur.fetchone()  #當row為null時,程式請求檢查column是否存在  if not row:    print "please check schema_name.table_name.column_name whether exists ?"    exit(1)  column_type = row['COLUMN_TYPE']  column_default = row['COLUMN_DEFAULT']  is_nullable = (row['IS_NULLABLE'] == 'YES')  query = "alter table %s.%s modify column %s %s" % (schema_name,table_name,column_name,column_type)  query += " CHARSET %s" % new_charset  if collation_supplied:    query += "COLLATE %s" % new_collation  if not is_nullable:    query += "NOT NULL"  if column_default:    query += "DEFAULT '%s'" % column_default  try:    alter_cur = conn.cursor()    alter_cur.execute(query)    print "successfully executed:\n \t%s" % query  finally:    alter_cur.close()  cur.close()try:  (options,args) = parse_options()  if not 2<= len(args) <=3:    print "Usage: schema_name.table_name.column_name new_charset_name [new_collate_name]"    exit(1)  column_tokens = args[0].split(".")  if len(column_tokens) != 3:    print "column must in the following format: schema_name.table_name.column_name"    exit(1)    schema_name,table_name,column_name = column_tokens  new_charset = args[1]  collation_supplied = (len(args) == 3)  if collation_supplied:    new_collation = args[2]  modify_column()finally:  if conn:    conn.close()



By DataHacker2014-2-11Good Luck!

相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.