Oneproxy of large data analysis using distributed database cluster

Source: Internet
Author: User
Tags documentation valgrind

100 million or 1 billion data, easy seconds out

Real-time Monitoring field has two notable features, one is a lot of data sources and large data, from the surveillance cameras, GPS, smart devices, and so on, the second is the need for real-time processing. Our customers encounter this problem when they do real-time processing. A customer's data table has 1 billion records, hope that the statistical query response time within 30s, and the use of a single MySQL response time of more than 300s. Oneproxy's unique parallel query function reduces response time to less than 10s.

Second, parallel query premise of the sub-database sub-table

In the era of Internet +, the sub-database table has become one of the standard solutions in the industry. In many applications, data is incremented daily at the GB level, with several hotspot tables reaching dozens of g or hundreds of G. It is well known that there are many problems with large data sheets when used and maintained.

1) Some DDL operations, the response time is difficult to accept. Adding, modifying, and deleting data table fields takes too long (even a few days) and consumes a lot of Io, leading to an impact on the online business;

2) performance is difficult to improve. A large table can only be placed on one server, and the service capability of a single server is limited, which results in the performance of the system being subject to the ability of a single server;

3) statistical analysis, time consuming too long. Since MySQL itself does not support concurrency within a single SQL statement, a statistical analytic SQL statement is constrained by the speed of the full table scan, which is unacceptable, especially when the table cannot be placed entirely in memory.

For the above problem, a widely adopted solution is to split a large table into multiple small tables into several different databases, namely, the sub-database sub-table, 1.

Figure 1 Sub-Library sub-table

Third, the principle of parallel query

The implementation of Oneproxy parallel query relies on its own sub-Library table function. It greatly improves the response speed by using the built-in SQL parser, which overwrites SQL queries from the client into multiple queries with shard information and distributes them to different hosts in parallel execution, which significantly increases the responsiveness of 2.

Figure 2 Parallel query

Iv. deployment

1) More than one Linux host, recommended CENTOS6 64bit

2) Download the installation file (Http://www.onexsoft.com/download) from our website
3) Installation

A) Extract to the specified directory

MV oneproxy-rhel5-linux64*/usr/local/

Tar zxvf oneproxy-rhel5-linux64*

B) Change the startup file

CD Oneproxy

Cat demo.sh

#!/bin/bash

Export Oneproxy_home=/data/oneproxy

If [-f ${oneproxy_home}/oneproxy.pid]; Then

Kill-9 ' Cat ${oneproxy_home}/oneproxy.pid '

Fi

Sleep 2

# Valgrind--leak-check=full--show-reachable=yes \

${oneproxy_home}/oneproxy--proxy-address=:3307--proxy-extra-address=:3308 \

--proxy-master-addresses=192.168.1.119:[email protected] \

--proxy-user-list=test/[email protected] \

--proxy-part-tables=${oneproxy_home}/part.txt \

--proxy-charset=gbk_chinese_ci--proxy-found-rows \

--proxy-group-policy=default:master-only \

--event-threads=6--proxy-group-security=default:0 \

--log-file=${oneproxy_home}/oneproxy.log \

--pid-file=${oneproxy_home}/oneproxy.pid

After the change

#/bin/bash

#

Export Oneproxy_home=/usr/local/oneproxy

If [-f ${oneproxy_home}/oneproxy.pid]; Then

Kill-9 ' Cat ${oneproxy_home}/oneproxy.pid '

Fi

Sleep 2

# Valgrind--leak-check=full--show-reachable=yes \

${oneproxy_home}/oneproxy--proxy-address=:3307--proxy-extra-address=:3308 \

--proxy-master-addresses=<ip1>:[email protected] \

--proxy-master-addresses=<ip2>:[email protected] \

--proxy-master-addresses=<ip3>:[email protected] \

--proxy-master-addresses=<ip4>:[email protected] \

--proxy-master-addresses=<ip5>:[email protected] \

--proxy-master-addresses=<ip6>:[email protected] \

--proxy-master-addresses=<ip7>:[email protected] \

--proxy-master-addresses=<ip8>:[email protected] \

--proxy-user-list=test/[email protected] \

--proxy-part-tables=${oneproxy_home}/hashpart.txt \

--proxy-charset=utf8_bin--proxy-found-rows \

--proxy-group-policy=data1:master-only \

--proxy-group-policy=data2:master-only \

--proxy-group-policy=data3:master-only \

--proxy-group-policy=data4:master-only \

--proxy-group-policy=data5:master-only \

--proxy-group-policy=data6:master-only \

--proxy-group-policy=data7:master-only \

--proxy-group-policy=data8:master-only \

--event-threads=6 \

--keepalive \

--log-file=${oneproxy_home}/oneproxy.log \

--pid-file=${oneproxy_home}/oneproxy.pid

Comments:

A)--proxy-user-list=test/[email protected] The first test is the user name of the connection proxy, which is also the user name used to connect the backend database ; 1378f6cc3a8e8a43ca388193fbed5405982fbbd3 is the encrypted password of the test user; the last test is the test database in each database node.

B) password encryption. After you start Oneproxy, go to the management port, and then type passwd <string>.

mysql-uadmin-poneproxy-p4041--protocol=tcp

passwd test

The output is:

C) Sub-table file Hashpart.txt

[

{

"Table": "BigTable",

"Pkey": "id",

"Type": "int",

"Method": "Hash",

"Partitions": 256,

"Groups": ["Data1", "Data2", "Data3", "data4", "Data5", "Data6", "Data7", "Data8"]

}

]

Where table: Split tables; Pkey: Table column, type: Table column type, currently supported by INT and Char;method: Partitioning method, currently supported Hash,range,list,global

4). Install the data node and give the user the appropriate permissions.

A) MySQL installation method Please refer to the official installation documentation https://dev.mysql.com/doc/refman/5.6/en/linux-installation.html

You can use Onesql if you are interested in thread pool, SQL level current limit, and second kill solution.

: Www.onexsoft.com/download, the mysqld file can be extracted from the replacement system. See document for specific use: Http://www.onexsoft.com/OneSQL_Best_Practise.pdf.

B) Permission Assignment

Log on to the Linux host where each underlying database resides, and run the following command:

MYSQL-E "Grant all privileges on * * to [e-mail protected] '% ' identified by ' test '"

5). Start

Sh./demo.sh

Check the startup output

Cat Oneproxy.log

2015-05-04 19:38:15: (critical) Plugin oneproxy5.4.1-20150425 started
2015-05-04 19:38:15: (critical) valid config checksum = 88429892

6). Check back-End Database node status

Go to management port (default port is 4041)

mysql-uadmin-poneproxy-p4041--protocol=tcp

and then type

List backend;

You can see the output as:

Status This column is up, which indicates that the cluster is healthy.

7). Create a table

Log in to the Oneproxy forwarding port

Mysql-h<oneproxyip>-p3307-utest-ptest

DDL statements:

CREATE TABLE ' BigTable ' (

' ID ' bigint () not NULL,

' Start ' datetime DEFAULT NULL,

' Icol3 ' int (one) DEFAULT NULL COMMENT ' 1-10 ',

' Icol4 ' int (one) DEFAULT NULL COMMENT ' 1-100 ',

' Icol5 ' int (one) DEFAULT NULL COMMENT ' 1-1000 ',

' Icol6 ' int (one) DEFAULT NULL,

' vcol7 ' varchar DEFAULT NULL,

' Vcol8 ' varchar DEFAULT NULL,

' VCOL9 ' varchar (+) DEFAULT NULL,

' vcol10 ' varchar DEFAULT NULL,

PRIMARY KEY (' id ')

) Engine=innodb DEFAULT charset=latin1

Execute this statement and you will see an error: Error 1044 (42000): Access denied due to security policy, DDL disabled or DML restricted!

Explanation: The default is to disallow DDL statements because Oneproxy is for security reasons.

Workaround: From the Oneproxy management port, run the following command

Mysql-h<oneproxyip>-uadmin-poneproxy-p4041-e "Set gaccess data1 0;set gaccess data2 0;set gaccess data3 0;set Gaccess data4 0;set gaccess data5 0;set gaccess data6 0;set gaccess data7 0;set gaccess data8 0 "

Log on to the Oneproxy forwarding port again, and the DDL command runs successfully.

After the above steps, 256 tables were created in 8 underlying databases, respectively, Bigtable_0, bigtable_1 ... bigtable_255.

Five, test data generation

Customers can use their own programs to generate test data, or they can use mydbtest software written by the civilian Software founder (Lou Fangxin). Mydbtest can not only generate test data, but also perform tests based on user-defined SQL statements. Program:

Http://www.onexsoft.com/software/mydbtest_linux64.zip

Documents can be found in the Baidu Cloud:

Http://pan.baidu.com/s/1mgJpukg#path=%252FOneSQL%252FDocument

Then find "MySQL Database test tool" to download. Please refer to the documentation for specific usage and detailed parameters.

This article uses mydbtest to do data loading tools, using methods and configuration files as follows

Nohup./mydbtest_linux64.bin query=insert.cnf degree=500 >/dev/null 2>&1 &

Configuration file Insert.cnf

Option

  User test/[email protected]<oneproxyip>:3307:data1

  Loop 100000000

  Log Insert_%p.log

  Time 24h

Declare

    vid bigseq 1  10000000000

    vstart TIMESTAMP-10

    vicol3 int 1

    vicol4 int 1

    vicol5 int 1 +

    vicol6 int 1 100000

    Vvcol7 Char

    Vvcol8 char (

    vvcol9 char)

&NBSP;&NBSP ;  Vvcol10 char

Begin

INSERT INTO bigtable (ID, start, icol3, Icol4, Icol5, Icol6, VCOL7, Vcol8 , VCOL9, VCOL10)

Values (: Vid,: Vstart,: vicol3,: Vicol4,: Vicol5,: vicol6,: Vvcol7,: Vvcol8,: vvcol9,: Vvcol1 0);

End

Vi. Parallel Query Testing

Test environment:

OS

Centos 6.3 x64

Cpu

8Core

2.4GHZ

Memory

32G

Disk

RAID1 300G 7200r/min

MySQL configuration:

Innodb_buffer_pool_size

24G

Max_connections

4096

tcc_control_min_connections [Onesql]

16

Test data:

Table name

BigTable

Number of tables

256

Data node points

8

Number of records

1,401,817,594

Data File Size

25G * 8

Test results

Test method

Response time

Select Count (*) from bigtable non-parallel

286.38s

Select/* Parrallel */count (*) from bigtable parallel

8.23s

Select Icol3,count (*) from BigTable GROUP by ICOL3; Non-parallel

652.64s

Select/*parallel*/Icol3,count (*) from BigTable GROUP by Icol3 parallel

22.24s

You can see that after parallelism, the COUNT (*) performance increases by about 35 times times.

Oneproxy using a distributed database cluster for big data analysis

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.