About using UTF-8 fields in MySQL

Last Update:2015-04-30 Source: Internet

Author: User

https://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql/

I sometimes hear: "Make everything UTF-8 in your database, and all will be fine". This so-called advice could not be further from the truth. Indeed, using UTF-8 will take care of internationalization and code-page problems, but it comes with a price, which may be too high, especially if you have never realized it's there.

Indexing is everything

Or at least, good indexing makes or breaks your database. The fact remains: the smaller your indexes, the more index records can be loaded into memory and the faster the searches will be. So using small indexes pays off. Period. But what has this got to do with UTF-8?

First off: beware of the VARCHAR

As you know, a VARCHAR field can hold a variable amount of data, for which you have to supply the maximum amount that you can store. So a VARCHAR(255) can hold 255 characters, but if you store only 5 characters, it will only use 5 characters of data. The other 250 are not wasted. This is completely different from using a CHAR(255), where storing a 5-character string results in 250 characters of padding. So VARCHAR() has a big advantage over CHAR() if you have variable-sized strings. But you have to realize that this advantage applies to disk storage only. It does not apply to any other data structure that MySQL uses internally, or to indexes.

How MySQL treats VARCHARs

When MySQL needs to sort records, it must create some space for sorting that data. This space allocation is done before the actual sorting takes place. This, however, means that MySQL needs to know in advance how much memory to allocate. When we need to sort VARCHAR fields, MySQL handles this by allocating for the worst-case memory usage, which is the maximum size a VARCHAR field can take. For example: when you have declared a field as VARCHAR(N), MySQL will reserve space for N characters plus an additional 1 or 2 bytes for holding the length of the string (1 when the length is 255 or less, 2 otherwise). So this busts the myth that "you can safely use VARCHAR(255) for all fields without problems".
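The gap between what a VARCHAR stores and what a sort reserves can be sketched with a little arithmetic. This is only a rough model of the behaviour described above, assuming a single-byte character set; the function names and the sample names are made up for illustration, and real MySQL buffers carry extra overhead that is ignored here.

```python
def length_prefix_bytes(max_chars: int) -> int:
    """Length prefix as described in the text: 1 byte up to 255, otherwise 2."""
    return 1 if max_chars <= 255 else 2


def stored_bytes(value: str, max_chars: int) -> int:
    """Approximate on-disk size: the actual characters plus the length prefix.

    Assumes one byte per character (a single-byte charset such as latin1).
    """
    return len(value) + length_prefix_bytes(max_chars)


def worst_case_bytes(max_chars: int) -> int:
    """Approximate space reserved per value when sorting: the declared maximum."""
    return max_chars + length_prefix_bytes(max_chars)


# Hypothetical sample data for a VARCHAR(255) column.
names = ["Joshua", "Anna", "Bob"]
on_disk = sum(stored_bytes(name, 255) for name in names)
reserved = worst_case_bytes(255) * len(names)
print(f"on disk: {on_disk} bytes, reserved for sorting: {reserved} bytes")
```

Under these assumptions the three short names occupy a handful of bytes on disk, while the sort reserves the full declared width for each of them.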

Characters and bytes: or the UTF-8 problem

Did you notice that I talk about "characters" and "bytes"? That's because those terms are not the same. A byte equals 8 bits and can hold any number ranging from 0 through 255 (or -128..127, if you have read my two's complement blog). The size of a character, however, depends on the character encoding used, and here is where the UTF-8 "problem" kicks in.

Back in the days when people stored strings in a latin1 charset, every character could be stored in a single byte. Thus a VARCHAR(N) would be N bytes (+1 for the length). But latin1 does not cover enough of the characters in the world (for instance, Arabic and Japanese characters cannot be stored in latin1). That's why UTF-8 came up with multiple bytes for some characters. The "standard" characters are stored in 1 byte, so most UTF-8 strings are almost the same size as latin1 strings, but when you need different characters it can use up to 4 bytes per character. If you would like to know more about UTF-8, there are excellent other blogs about it.
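The difference between characters and bytes is easy to see outside the database. The snippet below is a small sketch using Python's built-in codecs; the sample characters are arbitrary picks, and Python's "latin-1" codec is only an approximation of MySQL's latin1 character set (which is closer to cp1252).

```python
# Code points are written as escapes so the snippet itself stays plain ASCII.
SAMPLES = {
    "Latin capital A": "\u0041",
    "Latin e with acute": "\u00e9",
    "Arabic letter ain": "\u0639",
    "Japanese kanji for 'day'": "\u65e5",
    "Grinning face emoji": "\U0001F600",
}


def encoded_length(text: str, encoding: str):
    """Return the number of bytes needed, or None if the encoding cannot represent the text."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for name, char in SAMPLES.items():
    latin1 = encoded_length(char, "latin-1")
    utf8 = encoded_length(char, "utf-8")
    print(f"{name}: latin-1={latin1}, utf-8={utf8}")
```

A result of None for latin-1 means the character simply has no representation there, which is the case for the Arabic, Japanese and emoji samples.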

You just have to realize that MySQL's utf8 character set only uses a maximum of 3 bytes per character, which means not all UTF-8 characters can be stored in it, but most of the possible UTF-8 characters aren't used anyway. (The separate utf8mb4 character set, added in MySQL 5.5.3, does store the full 4-byte range.) That's why it might get confusing when you read about UTF-8 using 4 bytes on the one hand, and about the 3 bytes that MySQL uses on the other.
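To get a feel for which strings are affected by the 3-byte limit, a quick check like the following can help. It is a sketch that only inspects the UTF-8 encoding in Python; it does not talk to MySQL, and the function name is made up for this example.

```python
def fits_in_three_bytes(text: str) -> bool:
    """True if every character of text needs at most 3 bytes in UTF-8.

    Characters that need 4 bytes (code points above U+FFFF) are the ones
    a 3-byte-per-character UTF-8 column cannot hold.
    """
    return all(len(char.encode("utf-8")) <= 3 for char in text)


print(fits_in_three_bytes("Joshua"))              # plain ASCII
print(fits_in_three_bytes("\u65e5\u672c\u8a9e"))  # Japanese text
print(fits_in_three_bytes("\U0001F600"))          # emoji, needs 4 bytes
```

Running a check like this over existing data before picking a character set is one way to follow the "test everything" advice rather than assuming.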

Let's define a table with an index:

CREATE TABLE `tbl` (
  `id` int(10) unsigned NOT NULL auto_increment,
  -- 100 characters: matches the key_len of 102 discussed below
  `first_name` varchar(100) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
  -- length assumed to be the same as first_name
  `last_name` varchar(100) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
  `birth_date` date NOT NULL,
  PRIMARY KEY (`id`),
  KEY `first_name` (`first_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

This creates a simple table with a primary index on `id` and one additional index on `first_name`. You need to add at least 2 rows, otherwise the EXPLAIN won't work correctly for this example. So add some data and find out which index would be used when issuing the following query:

EXPLAIN SELECT * FROM `tbl` WHERE `first_name` LIKE 'Joshua';

Output:

The most important field is key_len. This field is 102 bytes: 100 bytes for the VARCHAR characters, since the column is encoded with latin1 at one byte per character. The additional 2 bytes are the length bytes.

Now, let's adjust the fields to UTF-8:

ALTER TABLE `tbl` CHANGE `first_name` `first_name` VARCHAR(100) CHARACTER SET utf8 NOT NULL;

EXPLAIN SELECT * FROM `tbl` WHERE `first_name` LIKE 'Joshua';

Output:

Immediately you should see the impact. The key_len is much larger, because every character now reserves 3 bytes instead of 1. That means we can hold fewer index records in memory, which means more disk reads, which means a slower database.
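The key_len values can be reproduced with simple arithmetic. The sketch below assumes a NOT NULL VARCHAR of 100 characters (inferred from the 102 bytes reported for latin1 minus the 2 length bytes) and a hypothetical 1 MiB of memory for index keys; it ignores every other kind of index overhead, so the record counts are only indicative.

```python
LENGTH_BYTES = 2  # the two length bytes included in key_len, as noted in the text


def key_len(max_chars: int, max_bytes_per_char: int) -> int:
    """Worst-case key length for a NOT NULL VARCHAR index entry."""
    return max_chars * max_bytes_per_char + LENGTH_BYTES


def keys_per_buffer(buffer_bytes: int, key_bytes: int) -> int:
    """How many worst-case keys fit in a buffer, ignoring all other overhead."""
    return buffer_bytes // key_bytes


# Assumption: 100 characters, derived from key_len 102 with 1 byte per character.
latin1_key = key_len(100, 1)
utf8_key = key_len(100, 3)  # MySQL's utf8 reserves up to 3 bytes per character

ONE_MIB = 1024 * 1024  # hypothetical amount of memory available for keys
print(f"latin1: key_len={latin1_key}, keys per MiB={keys_per_buffer(ONE_MIB, latin1_key)}")
print(f"utf8:   key_len={utf8_key}, keys per MiB={keys_per_buffer(ONE_MIB, utf8_key)}")
```

With these assumptions the same amount of memory holds roughly a third as many UTF-8 keys, which is the effect the EXPLAIN output points at.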

But it doesn't stop at the indexes. As said, this limitation applies to all internal buffers. All temporary sorting uses fixed-length buffers, and tables that are sorted in memory when using latin1 could, with UTF-8, just as easily be moved to a temporary table on disk because of their size. That performs less efficiently because of the extra disk reads and writes.

Conclusion:

MySQL and its internal workings can be insanely complex. It's important to never assume anything and to test everything. Don't convert everything to UTF-8 just because you can. Make sure you have good reasons not to use a single-byte encoding like latin1. If you do need the UTF-8 encoding, then make sure that you use the correct sizes. Don't make everything VARCHAR(255) just so that you can store really long names. The penalties for "disrespecting" the database can and will be severe. :)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
