MySQL Schema Design (5) Use Python to manage character sets

Source: Internet
Author: User

When we mention character sets, some people may say that the whole world is clean no matter the sky is split or utf8 is used. However, some character sets require more CPU, more memory and disk space, and even affect index usage. This does not include broken garbled characters. Obviously, it is necessary for us to take some time to understand the MySQL character set.
First, you can intuitively understand the character set used for display under each step:

# Three levels: DB, Table, Columnmysql> create database d charset utf8; Query OK, 1 row affected (0.04 sec) mysql> create table d. t-> (str varchar (10) charset latin1)-> default charset = utf8; Query OK, 0 rows affected (0.05 sec)

If no display is specified? How is MySQL set? Bind two channels:

① Default settings when creating an object

This is the default settings for layer-by-layer inheritance:
Server → DB → Table → Column
The top layer sets the default value for the bottom layer and the bottom layer can be discarded
No character set specified.
Display the specified character set, which can be discarded

② Settings for communication between the server and the client

When the client submits an SQL statement to MySQL, MySQL Server Always assumes that the character set of the client is character_set_client.
Then, the Server converts character_set_client to character_set_connection for SQL processing,
When the returned result set is sent to the client, the Server converts character_set_connection to character_set_result, and then returns

The three character sets involved above can be done once through set names


The conversion between character sets requires additional system overhead. How do I know? Explain extended + show warnings. So how can we avoid this implicit conversion? Here we introduce a method called "simple principle", as follows:
Select the appropriate character set for the server (or database), and then select the appropriate character set for certain columns based on the business.

Some surprises are implied in the MySQL Character Set, mainly including three:

① Interesting character_set_database

When character_set_database and character_set_server are different, the default Character Set of the database is determined by the latter.
You cannot directly modify csd or css to change csd, because the default Character Set of csd is the same as that of the library,
Changing the default Character Set of the library, csd changes, and css determines the default Character Set of the library.
Therefore, when the mysql instance is connected and no library is specified, the default character set is the same as that of css.

② Load data infile

The best practices are as follows:
Use library;
Set names Character set;
Start to load data;
In this case, the unified character set is used to avoid mixing "Character Set style"

③ Select into outfile

This action has not been transcoded!
(I) display character set
Table Structure:
mysql> desc sakila.actor;+-------------+----------------------+------+-----+-------------------+-----------------------------+| Field       | Type                 | Null | Key | Default           | Extra                       |+-------------+----------------------+------+-----+-------------------+-----------------------------+| actor_id    | smallint(5) unsigned | NO   | PRI | NULL              | auto_increment              || first_name  | varchar(45)          | NO   |     | NULL              |                             || last_name   | varchar(45)          | NO   | MUL | NULL              |                             || last_update | timestamp            | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |+-------------+----------------------+------+-----+-------------------+-----------------------------+4 rows in set (0.00 sec)

Database connection module:
[root@DataHacker ~]# cat dbapi.py#!/usr/bin/env ipython#coding = utf-8#Author: linwaterbin@gmail.com#Time: 2014-1-29import MySQLdb as dbapiUSER = 'root'PASSWD = 'oracle'HOST = '127.0.0.1'DB = 'sakila'conn = dbapi.connect(user=USER,passwd=PASSWD,host=HOST,db=DB)

View version:
[root@DataHacker ~]# ./show_charset.py --version1.0

View help:
[root@DataHacker ~]# ./show_charset.py -hUsage: show_charset.py [options] <arg1> <arg2> [<arg3>...]Options:  --version   show program's version number and exit  -h, --help  show this help message and exit  -d DB_NAME  Database name(leave blank is all Databases)  -t T_NAME   Table name (leave blank is all tabless)  -c C_NAME   Column name(leave blank is all columns)

What we want:
[root@DataHacker ~]# ./show_charset.py -d sakila -t actorsakila.actor.first_name:     utf8     utf8_general_cisakila.actor.last_name:     utf8     utf8_general_ci

Careful friends may have seen that the actor_id and last_update columns are not counted. Therefore, we can also make sure that only character-based values have the character set concept in 5.6, which is even more straightforward:
mysql> create table tt (str char(2) charset utf8);Query OK, 0 rows affected (0.20 sec)mysql> create table tt (str int(11) charset utf8);ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'charset utf8)' at line 1

Code:
[Root @ DataHacker ~] # Cat show_charset.py #! /Usr/bin/env pythonfrom optparse import OptionParserfrom dbapi import connimport MySQLdb # function 1: Input def parse_options () to the command line parameter (): parser = OptionParser (usage = "% prog [options] <arg1> <arg2> [<arg3>...] ", version = '1. 0',) parser. add_option ("-d", dest = "db_name", help = "Database name (leave blank is all Databases)") parser. add_option ("-t", dest = "t_name", help = "Table name (leave blank is all tabless)") parser. add_option ("-c", dest = "c_name", help = "Column name (leave blank is all columns)") return parser. parse_args () # main function implementation: display the character set def show_charsets (): query = "" select * from information_schema.columns where table_schema not in ('mysql', 'information _ scheme ') and character_set_name is not null "" # filter if options using the three if conditions. db_name: query + = "and table_schema = '% S'" % options. db_name if options. t_name: query + = "and table_name = '% S'" % options. t_name if options. c_name: query + = "and column_name = '% S'" % options. c_name # The default return value is a tuples. We use the attribute cursors. convert DictCursor to the dictionary cur = conn. cursor (MySQLdb. cursors. dictCursor) cur.exe cute (query) for record in cur. fetchall (): character_set_name = record ['character _ SET_NAME '] collation_name = record ['collation _ name'] print "% s. % s. % s: \ t % s "% (record ['table _ scheme'], record ['table _ name'], record ['column _ name'], character_set_name, collation_name) cur. close () # Use try-finally to close the database connection try: options, args = parse_options () show_charsets () finally: conn. close ()


(Ii) modify the character set of a column
View help:
[root@DataHacker ~]# ./modify.py -hUsage: modify.py schema_name.table_name.column_name new_charset_name [new_collate_name]Options:  --version   show program's version number and exit  -h, --help  show this help message and exit

What we want:
# Mysql> show create table testdb before modification. t \ G; *************************** 1. row *************************** Table: tCreate Table: create table 'T' ('id' int (11) default null, 'name' varchar (10) character set latin1 default null) ENGINE = InnoDB default charset = utf81 row in set (0.00 sec) # modify [root @ DataHacker ~] #. /Modify. py testdb. t. name gbksuccessfully executed: alter table testdb. t modify column name varchar (10) CHARSET gbk # after modification, mysql> show create table testdb. t \ G; *************************** 1. row *************************** Table: tCreate Table: create table 'T' ('id' int (11) default null, 'name' varchar (10) character set gbk default null) ENGINE = InnoDB default charset = utf81 row in set (0.01 sec)

The implementation code is as follows:
[Root @ DataHacker ~] # Cat modify. py #! /Usr/bin/env pythonimport MySQLdbfrom dbapi import * from optparse import OptionParser # The option value is omitted here. You only need to enter argsdef parse_options (): parser = OptionParser (usage = "\ n % prog schema_name.table_name.column_name new_charset_name [new_collate_name]", version = '1. 0',) return parser. parse_args () # main program def modify_column (): cur = conn. cursor (MySQLdb. cursors. dictCursor) v_ SQL = "" select * from information_schema.colum Ns where table_schema = '% s' and table_name =' % s' and column_name = '% S' "" % (schema_name, table_name, column_name) cur.exe cute (v_ SQL) row = cur. fetchone () # When row is null, the program requests to check whether the column exists if not row: print "please check schema_name.table_name.column_name whether exists? "Exit (1) column_type = row ['column _ type'] column_default = row ['column _ default'] is_nullable = (row ['is _ nullable'] = 'yes ') query = "alter table % s. % s modify column % s "% (schema_name, table_name, column_name, column_type) query ++ =" CHARSET % s "% new_charset if collation_supplied: query + = "COLLATE % s" % new_collation if not is_nullable: query + = "not null" if column_default: query + = "DEFAULT' % S' "% column_default try: alter_cur = conn. cursor () alter_cur.execute (query) print "successfully executed: \ n \ t % s" % query finally: alter_cur.close () cur. close () try: (options, args) = parse_options () if not 2 <= len (args) <= 3: print "Usage: schema_name.table_name.column_name new_charset_name [new_collate_name] "exit (1) column_tokens = args [0]. split (". ") if len (column_tokens )! = 3: print "column must in the following format: Drawing" exit (1) schema_name, table_name, column_name = column_tokens new_charset = args [1] collation_supplied = (len (args) = 3) if collation_supplied: new_collation = args [2] modify_column () finally: if conn: conn. close ()



By DataHacker2014-2-11Good Luck!

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.