I haven't written an article in a long time, so I thought I would write something. Here I will share my database design ideas, mainly query optimization techniques for billions of records or more on a single machine.
If you only run simple queries, have no frequent write operations, and do not need millisecond-level query speed, you do not need to design a complex cluster with heavyweight database software, and distributed horizontal partitioning is not required.
A normal MySQL installation on a local laptop is enough.
First, apply the common MySQL optimizations: table sharding, index creation, table field design, and MySQL configuration tuning such as buffer sizes. For details, see http://www.cnblogs.com/Bozh/archive/2013/01/22/2871545.html
In fact, horizontal table sharding in MySQL is itself a form of horizontal data partitioning. When importing data, you only need to load different source databases into different tables. A large single source, for example one with more than 200 million records, can be split into two or three tables so that no single table exceeds 100 million rows.
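As a minimal sketch of that splitting rule, the helper below maps each imported row to a numbered table so that no table grows past a fixed cap. The names `shard_table_name` and `tables_needed`, the `base_1`, `base_2` naming scheme, and the default cap of 100 million rows are assumptions made for illustration, not part of the original setup.

```python
def shard_table_name(base: str, row_index: int, max_rows: int = 100_000_000) -> str:
    """Return the table that should hold the row at zero-based position row_index.

    Hypothetical naming scheme: base_1, base_2, ... each holding at most max_rows rows.
    """
    if row_index < 0:
        raise ValueError("row_index must be non-negative")
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return f"{base}_{row_index // max_rows + 1}"


def tables_needed(total_rows: int, max_rows: int = 100_000_000) -> int:
    """Number of tables required so that none exceeds max_rows (ceiling division)."""
    return max(1, -(-total_rows // max_rows))
```

With these defaults, a source of exactly 200 million rows needs two tables, and one more row pushes it to three.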
Beyond these optimizations, my core design idea lies in how the indexes are built and how the query code is written.
When designing a social engineering database, many people index every field that might be queried. But the less data a query has to touch, the faster it runs, and indexes also take up space. So by being selective about indexes, we can save a lot of disk space and speed up queries at the same time.
My approach is to query only the tables that could contain the data I want. Tables that certainly do not contain it are skipped entirely, and fields that never need to be queried get no index. This way each query touches fewer tables, and fewer fields carry an index.
A simple example is as follows:
1. There are two tables, A and B, with identical fields. The password field is not empty for any row in either table.
2. The username field in Table A is not empty, and the username field in Table B is empty.
3. The email field in Table B is not empty, and the email field in Table A is empty.
In this case, you can design the two tables as follows:
1. Create an index on the password field in both Table A and Table B.
2. Create an index on the username field in Table A. The username field in Table B is not indexed and is empty by default.
3. Create an index on the email field in Table B. The email field in Table A is not indexed and is empty by default.
OK, the table design is complete.
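The indexing rule above can be sketched as a small generator of `CREATE INDEX` statements that covers only the fields that actually hold data. The helper name `index_statements`, the `idx_<field>` index naming, and the lowercase field names are assumptions for illustration.

```python
from typing import Dict, List


def index_statements(table: str, populated_fields: List[str]) -> List[str]:
    """Build one CREATE INDEX statement per populated field; empty fields get none."""
    return [
        f"CREATE INDEX `idx_{field}` ON `{table}` (`{field}`);"
        for field in populated_fields
    ]


# Fields that actually contain data in each table of the example above.
populated: Dict[str, List[str]] = {
    "A": ["password", "username"],
    "B": ["password", "email"],
}

# Four statements in total: no index on A.email and none on B.username.
statements = [
    stmt
    for table, fields in populated.items()
    for stmt in index_statements(table, fields)
]
```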
With this table design, we can reduce the number of tables each query touches and so speed up queries.
MySQL stores the index information for all tables in the statistics table of its information_schema database, so we can use that table to filter out the tables in which the field we want to query is not indexed.
For example, suppose I want to look up the email address root@cnseay.com in the info database. This query skips Table A, because the email field of Table A is empty and so it cannot contain the root@cnseay.com record I am looking for.
How do we filter out those tables? With one SQL statement:
SELECT TABLE_NAME FROM information_schema.statistics WHERE INDEX_NAME != 'PRIMARY' AND TABLE_SCHEMA = 'info' AND COLUMN_NAME = 'email' GROUP BY TABLE_NAME;
This statement returns the tables in the info database that have an index on the email field. A script then dynamically assembles a UNION query across those tables and runs it.
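A minimal sketch of such a script's query-building step is shown below. It assumes the `%s` placeholder style accepted by common Python MySQL drivers such as PyMySQL, and the names `INDEXED_TABLES_SQL`, `quote_identifier`, and `build_union_query` are hypothetical. It uses `UNION ALL` to skip duplicate elimination (plain `UNION` would also work), and `SELECT *` relies on all tables having the same fields, as in the example above.

```python
from typing import List, Sequence, Tuple

# The filtering query from the text, with schema and column passed as parameters.
INDEXED_TABLES_SQL = (
    "SELECT TABLE_NAME FROM information_schema.statistics "
    "WHERE INDEX_NAME != 'PRIMARY' AND TABLE_SCHEMA = %s AND COLUMN_NAME = %s "
    "GROUP BY TABLE_NAME"
)


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_union_query(
    schema: str, tables: Sequence[str], column: str, value: str
) -> Tuple[str, List[str]]:
    """Return (sql, params) that searches column = value in every listed table."""
    if not tables:
        raise ValueError("no table has an index on this column")
    db = quote_identifier(schema)
    col = quote_identifier(column)
    parts = [
        f"SELECT * FROM {db}.{quote_identifier(table)} WHERE {col} = %s"
        for table in tables
    ]
    # One parameter per SELECT, all carrying the same search value.
    return " UNION ALL ".join(parts), [value] * len(tables)
```

In use, the script would first execute `INDEXED_TABLES_SQL` with the parameters `("info", "email")`, collect the returned table names, and pass them to `build_union_query` together with the value to search for.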
My environment is a 2 TB USB 3.0 external hard drive and a Windows 7 laptop with an i7-4700 CPU and 8 GB of memory. Queries over 2 billion records take 1 to 3 seconds.