100 million or 1 billion data, easy seconds out
Real-time Monitoring field has two notable features, one is a lot of data sources and large data, from the surveillance cameras, GPS, smart devices, and so on, the second is the need for real-time processing. Our customers encounter this problem when they do real-time processing. A customer's data table has 1 billion records, hope that the statistical query response time within 30s, and the use of a single MySQL response time of more than 300s. Oneproxy's unique parallel query function reduces response time to less than 10s.
Second, parallel query premise of the sub-database sub-table
In the era of Internet +, the sub-database table has become one of the standard solutions in the industry. In many applications, data is incremented daily at the GB level, with several hotspot tables reaching dozens of g or hundreds of G. It is well known that there are many problems with large data sheets when used and maintained.
1) Some DDL operations, the response time is difficult to accept. Adding, modifying, and deleting data table fields takes too long (even a few days) and consumes a lot of Io, leading to an impact on the online business;
2) performance is difficult to improve. A large table can only be placed on one server, and the service capability of a single server is limited, which results in the performance of the system being subject to the ability of a single server;
3) statistical analysis, time consuming too long. Since MySQL itself does not support concurrency within a single SQL statement, a statistical analytic SQL statement is constrained by the speed of the full table scan, which is unacceptable, especially when the table cannot be placed entirely in memory.
For the above problem, a widely adopted solution is to split a large table into multiple small tables into several different databases, namely, the sub-database sub-table, 1.
Figure 1 Sub-Library sub-table
Third, the principle of parallel query
The implementation of Oneproxy parallel query relies on its own sub-Library table function. It greatly improves the response speed by using the built-in SQL parser, which overwrites SQL queries from the client into multiple queries with shard information and distributes them to different hosts in parallel execution, which significantly increases the responsiveness of 2.
Figure 2 Parallel query
Iv. deployment
1) More than one Linux host, recommended CENTOS6 64bit
2) Download the installation file (Http://www.onexsoft.com/download) from our website
3) Installation
A) Extract to the specified directory
MV oneproxy-rhel5-linux64*/usr/local/ Tar zxvf oneproxy-rhel5-linux64* |
B) Change the startup file
#!/bin/bash Export Oneproxy_home=/data/oneproxy If [-f ${oneproxy_home}/oneproxy.pid]; Then Kill-9 ' Cat ${oneproxy_home}/oneproxy.pid ' Fi Sleep 2 # Valgrind--leak-check=full--show-reachable=yes \ ${oneproxy_home}/oneproxy--proxy-address=:3307--proxy-extra-address=:3308 \ --proxy-master-addresses=192.168.1.119:[email protected] \ --proxy-user-list=test/[email protected] \ --proxy-part-tables=${oneproxy_home}/part.txt \ --proxy-charset=gbk_chinese_ci--proxy-found-rows \ --proxy-group-policy=default:master-only \ --event-threads=6--proxy-group-security=default:0 \ --log-file=${oneproxy_home}/oneproxy.log \ --pid-file=${oneproxy_home}/oneproxy.pid |
After the change
#/bin/bash # Export Oneproxy_home=/usr/local/oneproxy If [-f ${oneproxy_home}/oneproxy.pid]; Then Kill-9 ' Cat ${oneproxy_home}/oneproxy.pid ' Fi Sleep 2 # Valgrind--leak-check=full--show-reachable=yes \ ${oneproxy_home}/oneproxy--proxy-address=:3307--proxy-extra-address=:3308 \ --proxy-master-addresses=<ip1>:[email protected] \ --proxy-master-addresses=<ip2>:[email protected] \ --proxy-master-addresses=<ip3>:[email protected] \ --proxy-master-addresses=<ip4>:[email protected] \ --proxy-master-addresses=<ip5>:[email protected] \ --proxy-master-addresses=<ip6>:[email protected] \ --proxy-master-addresses=<ip7>:[email protected] \ --proxy-master-addresses=<ip8>:[email protected] \ --proxy-user-list=test/[email protected] \ --proxy-part-tables=${oneproxy_home}/hashpart.txt \ --proxy-charset=utf8_bin--proxy-found-rows \ --proxy-group-policy=data1:master-only \ --proxy-group-policy=data2:master-only \ --proxy-group-policy=data3:master-only \ --proxy-group-policy=data4:master-only \ --proxy-group-policy=data5:master-only \ --proxy-group-policy=data6:master-only \ --proxy-group-policy=data7:master-only \ --proxy-group-policy=data8:master-only \ --event-threads=6 \ --keepalive \ --log-file=${oneproxy_home}/oneproxy.log \ --pid-file=${oneproxy_home}/oneproxy.pid |
Comments:
A)--proxy-user-list=test/[email protected] The first test is the user name of the connection proxy, which is also the user name used to connect the backend database ; 1378f6cc3a8e8a43ca388193fbed5405982fbbd3 is the encrypted password of the test user; the last test is the test database in each database node.
B) password encryption. After you start Oneproxy, go to the management port, and then type passwd <string>.
mysql-uadmin-poneproxy-p4041--protocol=tcp passwd test |
The output is:
C) Sub-table file Hashpart.txt
[ { "Table": "BigTable", "Pkey": "id", "Type": "int", "Method": "Hash", "Partitions": 256, "Groups": ["Data1", "Data2", "Data3", "data4", "Data5", "Data6", "Data7", "Data8"] } ] |
|
Where table: Split tables; Pkey: Table column, type: Table column type, currently supported by INT and Char;method: Partitioning method, currently supported Hash,range,list,global
4). Install the data node and give the user the appropriate permissions.
A) MySQL installation method Please refer to the official installation documentation https://dev.mysql.com/doc/refman/5.6/en/linux-installation.html
You can use Onesql if you are interested in thread pool, SQL level current limit, and second kill solution.
: Www.onexsoft.com/download, the mysqld file can be extracted from the replacement system. See document for specific use: Http://www.onexsoft.com/OneSQL_Best_Practise.pdf.
B) Permission Assignment
Log on to the Linux host where each underlying database resides, and run the following command:
MYSQL-E "Grant all privileges on * * to [e-mail protected] '% ' identified by ' test '" |
5). Start
Check the startup output
2015-05-04 19:38:15: (critical) Plugin oneproxy5.4.1-20150425 started
2015-05-04 19:38:15: (critical) valid config checksum = 88429892
6). Check back-End Database node status
Go to management port (default port is 4041)
mysql-uadmin-poneproxy-p4041--protocol=tcp |
and then type
You can see the output as:
Status This column is up, which indicates that the cluster is healthy.
7). Create a table
Log in to the Oneproxy forwarding port
Mysql-h<oneproxyip>-p3307-utest-ptest |
DDL statements:
CREATE TABLE ' BigTable ' ( ' ID ' bigint () not NULL, ' Start ' datetime DEFAULT NULL, ' Icol3 ' int (one) DEFAULT NULL COMMENT ' 1-10 ', ' Icol4 ' int (one) DEFAULT NULL COMMENT ' 1-100 ', ' Icol5 ' int (one) DEFAULT NULL COMMENT ' 1-1000 ', ' Icol6 ' int (one) DEFAULT NULL, ' vcol7 ' varchar DEFAULT NULL, ' Vcol8 ' varchar DEFAULT NULL, ' VCOL9 ' varchar (+) DEFAULT NULL, ' vcol10 ' varchar DEFAULT NULL, PRIMARY KEY (' id ') ) Engine=innodb DEFAULT charset=latin1 |
Execute this statement and you will see an error: Error 1044 (42000): Access denied due to security policy, DDL disabled or DML restricted!
Explanation: The default is to disallow DDL statements because Oneproxy is for security reasons.
Workaround: From the Oneproxy management port, run the following command
Mysql-h<oneproxyip>-uadmin-poneproxy-p4041-e "Set gaccess data1 0;set gaccess data2 0;set gaccess data3 0;set Gaccess data4 0;set gaccess data5 0;set gaccess data6 0;set gaccess data7 0;set gaccess data8 0 " |
Log on to the Oneproxy forwarding port again, and the DDL command runs successfully.
After the above steps, 256 tables were created in 8 underlying databases, respectively, Bigtable_0, bigtable_1 ... bigtable_255.
Five, test data generation
Customers can use their own programs to generate test data, or they can use mydbtest software written by the civilian Software founder (Lou Fangxin). Mydbtest can not only generate test data, but also perform tests based on user-defined SQL statements. Program:
Http://www.onexsoft.com/software/mydbtest_linux64.zip
Documents can be found in the Baidu Cloud:
Http://pan.baidu.com/s/1mgJpukg#path=%252FOneSQL%252FDocument
Then find "MySQL Database test tool" to download. Please refer to the documentation for specific usage and detailed parameters.
This article uses mydbtest to do data loading tools, using methods and configuration files as follows
Nohup./mydbtest_linux64.bin query=insert.cnf degree=500 >/dev/null 2>&1 & |
Configuration file Insert.cnf
Option User test/[email protected]<oneproxyip>:3307:data1 Loop 100000000 Log Insert_%p.log Time 24h Declare vid bigseq 1 10000000000 vstart TIMESTAMP-10 vicol3 int 1 vicol4 int 1 vicol5 int 1 + vicol6 int 1 100000 Vvcol7 Char Vvcol8 char ( vvcol9 char) &NBSP;&NBSP ; Vvcol10 char Begin INSERT INTO bigtable (ID, start, icol3, Icol4, Icol5, Icol6, VCOL7, Vcol8 , VCOL9, VCOL10) Values (: Vid,: Vstart,: vicol3,: Vicol4,: Vicol5,: vicol6,: Vvcol7,: Vvcol8,: vvcol9,: Vvcol1 0); End |
Vi. Parallel Query Testing
Test environment:
OS |
Centos 6.3 x64 |
Cpu |
8Core 2.4GHZ |
Memory |
32G |
Disk |
RAID1 300G 7200r/min |
MySQL configuration:
Innodb_buffer_pool_size |
24G |
Max_connections |
4096 |
tcc_control_min_connections [Onesql] |
16 |
Test data:
Table name |
BigTable |
Number of tables |
256 |
Data node points |
8 |
Number of records |
1,401,817,594 |
Data File Size |
25G * 8 |
Test results
Test method |
Response time |
Select Count (*) from bigtable non-parallel |
286.38s |
Select/* Parrallel */count (*) from bigtable parallel |
8.23s |
Select Icol3,count (*) from BigTable GROUP by ICOL3; Non-parallel |
652.64s |
Select/*parallel*/Icol3,count (*) from BigTable GROUP by Icol3 parallel |
22.24s |
You can see that after parallelism, the COUNT (*) performance increases by about 35 times times.
Oneproxy using a distributed database cluster for big data analysis