https://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql/
I sometimes hear: "Make everything UTF-8 in your database, and all will be fine". This so-called advice could not be further from the truth. Indeed, using UTF-8 takes care of internationalization and code-page problems, but it comes with a price, which may be too high, especially if you have never realized it's there. Indexing is everything, or at least: good indexing makes or breaks your database. The fact remains: the smaller your indexes, the more index records can be loaded into memory and the faster your searches will be. So using small indexes pays off. Period. But what has this got to do with UTF-8?
First off: beware of the VARCHAR
As you know, a VARCHAR field can hold a variable amount of data, for which you have to supply the maximum amount that you can store. So a VARCHAR(255) can hold 255 characters, but if you store only 5 characters, it will only use 5 characters of data. The other 250 are not wasted. This is completely different from using a CHAR(255), where storing a 5-character string results in 250 characters of padding. So VARCHAR() has a big advantage over CHAR() if you have variable-sized strings. But you have to realize that this advantage is for disk storage only. It does not apply to any other data structure that MySQL uses internally, or to indexes.
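To make the disk-storage difference concrete, here is a minimal Python sketch. It is a simplified model, not MySQL's actual row format: it assumes single-byte latin1 data and the 1-or-2-byte length prefix that VARCHAR values carry. The function names are made up for this illustration.

```python
def char_disk_bytes(declared_len: int) -> int:
    """CHAR(N) in latin1: always padded to the full declared length."""
    return declared_len


def varchar_disk_bytes(value: str, declared_len: int) -> int:
    """VARCHAR(N) in latin1: only the stored characters plus a length prefix."""
    if len(value) > declared_len:
        raise ValueError("value is longer than the declared column length")
    data_bytes = len(value.encode("latin-1"))
    # Simplified assumption: 1 length byte up to 255, 2 bytes beyond that.
    prefix_bytes = 1 if declared_len <= 255 else 2
    return data_bytes + prefix_bytes


print(char_disk_bytes(255))              # 255
print(varchar_disk_bytes("hello", 255))  # 6
```

On disk the VARCHAR clearly wins here. The catch is that this is the only place where it wins.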
How MySQL treats VARCHARs
When MySQL needs to sort records, it must create some space for sorting that data. This space is allocated before the actual sorting takes place, which means that MySQL needs to know up front how much memory to allocate. When we sort VARCHAR fields, MySQL handles this by allocating for the worst case, which is the maximum size a VARCHAR field can take. For example: when you declare a field as VARCHAR(N), MySQL will reserve space for all N characters plus an additional 1 or 2 bytes for holding the length of the string (1 when the maximum length is 255 or less, 2 otherwise). So this busts the myth that "you can safely use VARCHAR(255) for all fields without problems".
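A rough sketch of that worst-case rule, in Python. It assumes one byte per character (latin1) and follows the 1-or-2-byte length rule described above. The function name and the row count are invented for the example, and real memory usage depends on the MySQL version and on the other columns involved in the sort.

```python
def worst_case_sort_bytes(declared_len: int) -> int:
    """Bytes reserved per value when sorting a latin1 VARCHAR(declared_len)."""
    length_prefix = 1 if declared_len <= 255 else 2
    return declared_len + length_prefix


rows = 1_000_000  # illustrative row count, not taken from the article
for declared_len in (30, 255):
    per_value = worst_case_sort_bytes(declared_len)
    total_mib = rows * per_value / (1024 * 1024)
    print(f"VARCHAR({declared_len}): {per_value} bytes per value, "
          f"about {total_mib:.0f} MiB for {rows:,} rows")
```

Note that the actual strings never enter the calculation. Whether a column holds "Bob" or a 200-character name, the reserved space depends only on the declared length.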
Characters and bytes: or the UTF-8 problem
Did you notice that I talk about "characters" and "bytes"? That's because those two terms are not the same. A byte equals 8 bits, and can hold any number ranging from 0 through 255 (or -128..127, if you have read my two's complement blog post). The size of a character, however, depends on the character encoding used, and this is where the UTF-8 "problem" kicks in. Back in the days when people stored strings in a latin1 charset, every character could be stored in a single byte. Thus: a VARCHAR(N) would take N bytes (+1 for the length). But one byte is not enough for all the characters in the world (for instance, Arabic and Japanese characters cannot be stored in latin1). That's why UTF-8 uses multiple bytes for some characters. The "standard" characters are stored in 1 byte, so most UTF-8 strings are almost the same size as latin1 strings, but when you need different characters it can use up to 4 bytes per character. If you would like to know more about UTF-8, there are other excellent blogs about it.
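You can check the characters-versus-bytes difference with a few lines of Python. This is only an illustration of the encodings themselves, independent of MySQL. The helper name and the sample strings are my own picks.

```python
from typing import Optional


def encoded_size(text: str, encoding: str) -> Optional[int]:
    """Return the size of text in bytes, or None if the encoding cannot represent it."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


samples = [
    "Joshua",              # plain ASCII
    "Jos\u00e9",           # e with acute accent
    "\u65e5\u672c\u8a9e",  # three Japanese characters
    "\U0001F600",          # an emoji outside the Basic Multilingual Plane
]

for text in samples:
    print(ascii(text), len(text), "chars,",
          "latin1:", encoded_size(text, "latin-1"),
          "utf-8:", encoded_size(text, "utf-8"))
```

The ASCII name is 6 bytes in both encodings, the accented name grows by one byte in UTF-8, and the last two samples cannot be stored in latin1 at all.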
You just have to realize that MySQL's utf8 character set only uses a maximum of 3 bytes per character, which means that not all UTF-8 characters can be stored in it, but most of the possible UTF-8 characters aren't used anyway. (The separate utf8mb4 character set, available since MySQL 5.5.3, does cover the full 4-byte range.) That's why it might get confusing when you read about UTF-8 using 4 bytes while MySQL uses 3.
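If you want to know whether a given string would survive a 3-byte limit, a quick check could look like the sketch below. It only inspects the UTF-8 encoding in Python and does not talk to MySQL. The function name is hypothetical.

```python
def fits_in_three_byte_utf8(text: str) -> bool:
    """True if every character of text encodes to at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_in_three_byte_utf8("Joshua"))              # True
print(fits_in_three_byte_utf8("\u65e5\u672c\u8a9e"))  # True: 3 bytes per character
print(fits_in_three_byte_utf8("\U0001F600"))          # False: this emoji needs 4 bytes
```

In practice the characters that fail this check are the ones above U+FFFF, such as emoji and some rarely used scripts.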
Let's define a table with an index:
CREATE TABLE `tbl` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(100) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
  -- the length of last_name was lost in the source; 100 is assumed here
  `last_name` varchar(100) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
  `birth_date` date NOT NULL,
  PRIMARY KEY (`id`),
  KEY `first_name` (`first_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This creates a simple table with a primary index on `id` and a single secondary index on `first_name`. You need to add at least 2 rows, otherwise the EXPLAIN won't work correctly for this example. So add some data and find out which index will be used when issuing the following query:
EXPLAIN SELECT * FROM `tbl` WHERE `first_name` LIKE 'Joshua';
Output:
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | tbl   | range | first_name    | first_name | 102     | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)
The most important field is key_len. This field is 102 bytes: 100 bytes for the VARCHAR(100), since it's encoded with latin1, plus an additional 2 bytes for the length.
Now, let's change the indexed field to UTF-8:
ALTER TABLE `tbl` CHANGE `first_name` `first_name` VARCHAR(100) CHARACTER SET utf8 NOT NULL;
EXPLAIN SELECT * FROM `tbl` WHERE `first_name` LIKE 'Joshua';
Output:
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | tbl   | range | first_name    | first_name | 302     | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)
Immediately you should see the impact. The key_len is 200 bytes larger, which means that we can hold fewer index records in memory, which means more disk reads, which means a slower database.
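The two key_len values can be reproduced with simple arithmetic. The sketch below assumes the indexed column is a NOT NULL VARCHAR(100), which is what the values 102 and 302 imply, and that the index always reserves 2 length bytes as seen in the EXPLAIN output. The function name is made up.

```python
def varchar_key_len(declared_chars: int, max_bytes_per_char: int) -> int:
    """key_len of a NOT NULL VARCHAR index: worst-case data bytes plus 2 length bytes."""
    return declared_chars * max_bytes_per_char + 2


latin1_key_len = varchar_key_len(100, 1)  # latin1: 1 byte per character
utf8_key_len = varchar_key_len(100, 3)    # MySQL utf8: up to 3 bytes per character

print(latin1_key_len)                 # 102
print(utf8_key_len)                   # 302
print(utf8_key_len - latin1_key_len)  # 200
```

The data in the table did not change at all between the two runs. Only the declared character set did, and that alone roughly triples the size of every index entry.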
But it doesn't stop at the indexes. As said, this limitation applies to all internal buffers. All temporary sorting uses fixed-length buffers, and tables that are sorted in memory when using latin1 could just as easily be moved to a temporary table on disk because of their size. That performs less efficiently because of the extra disk reads and writes.
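To get a feel for what that means, here is a back-of-the-envelope sketch of how many fixed-length keys fit into a buffer of a given size. The 2 MiB buffer is an arbitrary number picked for illustration, not a MySQL default, and the key sizes are the 102 and 302 bytes from the EXPLAIN output above.

```python
def keys_per_buffer(buffer_bytes: int, key_bytes: int) -> int:
    """Number of fixed-length keys that fit completely into the buffer."""
    return buffer_bytes // key_bytes


BUFFER_BYTES = 2 * 1024 * 1024  # hypothetical buffer size

print(keys_per_buffer(BUFFER_BYTES, 102))  # 20560 keys with latin1
print(keys_per_buffer(BUFFER_BYTES, 302))  # 6944 keys with utf8
```

With the same amount of memory you can hold roughly a third of the keys, so a sort that used to fit in memory is far more likely to spill over to disk.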
Conclusion:
MySQL and its internal workings can be insanely complex. It's important to never assume anything and to test everything. Don't convert everything to UTF-8 just because you can; make sure you have good reasons not to use a single-byte encoding like latin1. If you do need the UTF-8 encoding, then make sure that you use the correct sizes. Don't make everything VARCHAR(255) just so that you can store really long names. The penalties for "disrespecting" the database can and will be severe. :)