Query performance testing of views and redundant physical tables

Source: Internet
Author: User
Tags: field, table

Suppose there are 30 tables, each with its own custom fields plus some fields common to all of them, and you need to query those common fields in a unified way. There are generally two approaches:
1. Common-field physical table: whenever an app table is inserted into or updated, the same write also updates the common table.
2. View: each write only updates the app table itself; because the view is a logical table over the app tables, queries against it always see the updated data.
Each approach has advantages and disadvantages:
1. Common-field table
Advantages: queries hit a single table, so the SQL is simple, PostgreSQL spends relatively little time parsing and optimizing it, and since the data lives in one table, no multi-table scan is needed.
Disadvantages: the app table and the common-field table must be updated at the same time, which requires a transaction [and a transaction, with its write-ahead logging and so on, reduces write performance]; if the common table has indexes, every insert must also update them, which reduces write performance further, especially as the table keeps growing.
2. View
Advantages: only the app table itself needs updating, and each app table holds far less data than the total, so writes win.
Disadvantages: as the number of tables grows, the view becomes a UNION ALL over many tables, so queries are slow.
-- Of course, "queries" above means queries over the common fields; an app querying only its own app table is unaffected. (A minimal sketch of the two write paths follows.)
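To make the two write paths concrete, here is a minimal sketch. The table names match the test schema built later, but only a handful of columns and two app tables are shown, and v_common is an illustrative stand-in for the real 30-table view:

import psycopg2

# Fill in your own connection details.
conn = psycopg2.connect(host="localhost", dbname="", user="", password="")
cur = conn.cursor()

# Approach 1: dual write. psycopg2 opens a transaction implicitly, so both
# inserts become visible atomically at commit(); the extra WAL and index
# work for the second insert is the write-performance cost discussed above.
cur.execute("insert into tb_test_1 (starttime, endtime, col1_1) values (%s, %s, %s)",
            (1463932800, 1463936400, "hello"))
cur.execute("insert into tb_test_common (starttime, endtime, col0_1) values (%s, %s, %s)",
            (1463932800, 1463936400, "hello"))
conn.commit()

# Approach 2: a view. Only the app table is ever written; the view is a
# UNION ALL over the app tables, so reads pay the multi-table cost instead.
cur.execute("""
    create view v_common as
    select starttime, endtime, col1_1 as col0_1 from tb_test_1
    union all
    select starttime, endtime, col2_1 as col0_1 from tb_test_2
""")
conn.commit()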


So the two approaches are like Southern Fist versus Northern Leg: each has its own strong suit, and a crude comparison would be unfair.
What we want to know, though, is exactly how much worse the view's query performance is.


Environment: a single machine with 8 logical cores and 32 GB of memory, running CentOS 7.0 and PostgreSQL 9.4.
Tables:

App tables: tb_test_1 ... tb_test_30 (30 tables in all). Each has starttime, endtime, and col[table id]_1 ... col[table id]_45, for 47 fields in total.
Common-field table: tb_test_common, with starttime, endtime, and col0_1 ... col0_40 (the first 40 custom fields of each app table), for 42 fields in total.
View: v_evt, with the same starttime, endtime, and col0_1 ... col0_40 shape, also 42 fields in total.
Each app table holds 1 million rows, so across 30 app tables, tb_test_common holds 30 million rows.


Indexes are built on the starttime and endtime columns of every table, so WHERE queries that filter on starttime and endtime run faster. (Of course, as mentioned earlier, an index costs storage space on one hand and must be updated on every write on the other, which reduces write speed.)

Looking at the execution plan, we can see clearly that the SELECT statement goes through a bitmap index scan.
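You can reproduce this check yourself; a minimal sketch (the plan in the comments shows the expected shape, not the author's verbatim output, which was a screenshot):

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="", user="", password="")  # fill in credentials
cur = conn.cursor()
cur.execute("explain select count(*) from tb_test_common where starttime = 1463932800")
for (line,) in cur.fetchall():
    print(line)
# Expected shape (illustrative):
#   Aggregate
#     ->  Bitmap Heap Scan on tb_test_common
#           Recheck Cond: (starttime = 1463932800)
#           ->  Bitmap Index Scan on index_tb_test_common_starttime
#                 Index Cond: (starttime = 1463932800)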



The data is ready, and we designed five comparisons (the exact SQL for each appears in the sketch below and in the test script at the end):

1. WHERE-condition filtering
2. Filter condition + ORDER BY ... LIMIT 10
3. Aggregate functions (GROUP BY over a time range)
4. No filtering: full-table ORDER BY with LIMIT
5. COUNT(*) over the whole table
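For reference, here are the common-table versions of the five test queries as used in the script at the end; the view versions simply replace tb_test_common with v_evt:

queries = [
    # 0. where-condition filtering
    "select count(*) from tb_test_common where starttime = 1463932800",
    # 1. filter + order by limit 10
    "select * from tb_test_common where starttime >= 1463932800"
    " and starttime <= 1463936400 order by starttime desc limit 10",
    # 2. aggregate functions
    "select to_char(to_timestamp(starttime),'yyyy-mm-dd') as mydate, count(*) as num"
    " from tb_test_common where starttime >= 1463932800 and starttime <= 1463940000"
    " group by mydate order by mydate desc",
    # 3. no filter, full-table order by with limit
    "select * from tb_test_common order by starttime desc limit 100",
    # 4. count
    "select count(*) from tb_test_common",
]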
Execution results (these are first-run results; subsequent view queries are significantly faster because the execution plan is cached):


A table makes this clearer (all times in seconds; first-run results):

Test case                                 Common-field table    View
WHERE-condition filtering                 0.0371432304382       0.379799127579
Filter + ORDER BY ... LIMIT 10            11.1989729404         44.5428109169
Aggregate functions                       50.8744649887         50.5313501358
No filter, full-table ORDER BY + LIMIT    0.122601985931        0.368887901306
COUNT(*)                                  43.4130840302         67.3434300423

This should give a concrete, intuitive feel for the performance gap.


======== The gorgeous dividing line: below is the script that builds the data and runs the tests ========

# coding: utf-8
import psycopg2
import traceback
import random
import time

host = "localhost"
dbname = ""
username = ""
password = ""

# All timestamps fall on the day 2016-05-23
starttime = 1463932800
endtime = 1464019199

def create_tables(nums):
    arr = list()
    for i in range(0, nums):
        tablename = "tb_test_" + str(i)
        if i == 0:
            tablename = "tb_test_common"
        sql = "create table " + tablename + " (starttime integer, endtime integer,"
        # common table: time fields + 40 columns; normal app table: time fields + 45 columns
        colnum = 46
        if i == 0:
            colnum = 41
        for j in range(1, colnum):
            sql += "col" + str(i) + "_" + str(j) + " varchar(1024)"
            if j != colnum - 1:
                sql += ","
        sql += ")"
        arr.append(sql)
    return ";".join(arr)

# Write a file with one million rows of test data
def write_million_data(filepath):
    print "create data file start..."
    f = open(filepath, "w")
    for i in range(0, 1000000):
        curstart = random.randint(starttime, endtime)
        f.write(str(curstart) + ",")
        f.write(str(random.randint(curstart, endtime)) + ",")
        for j in range(0, 45):
            f.write("hello")
            if j != 44:
                f.write(",")
        f.write("\n")
    f.close()
    print "create data file end..."

def create_view(nums):
    sql = "create view v_evt as "
    for i in range(1, nums):
        sql += " select starttime,endtime,"
        for j in range(1, 41):
            sql += "col" + str(i) + "_" + str(j) + " as col_0_" + str(j)
            if j != 40:
                sql += ","
        sql += " from tb_test_" + str(i) + " "
        if i != nums - 1:
            sql += " union all "
    return sql

def init_all():
    filepath = "/tmp/view_data"
    write_million_data(filepath)
    conn = psycopg2.connect(host=host, dbname=dbname, user=username, password=password)
    cur = conn.cursor()
    # Create 31 tables: the first is the common-field table, the other 30 are app tables
    cur.execute(create_tables(31))
    conn.commit()
    print "create tables"
    # COPY the data file into app tables 1-30
    for i in range(1, 31):
        cur.copy_from(open(filepath, "r"), "tb_test_" + str(i), sep=",")
        print "copy data into table"
    conn.commit()
    # Insert the common fields of every app table into the common table
    for i in range(1, 31):
        insql = "insert into tb_test_common select starttime,endtime,"
        for j in range(1, 41):
            insql += "col" + str(i) + "_" + str(j)
            if j != 40:
                insql += ","
        insql += " from tb_test_" + str(i)
        print insql
        cur.execute(insql)
        conn.commit()
    # Build indexes on the starttime and endtime columns
    for i in range(1, 31):
        cur.execute("create index index_tb_test_" + str(i) + "_starttime on tb_test_" + str(i) + "(starttime)")
        cur.execute("create index index_tb_test_" + str(i) + "_endtime on tb_test_" + str(i) + "(endtime)")
        conn.commit()
        print "create index on tb_test_" + str(i)
    cur.execute("create index index_tb_test_common_starttime on tb_test_common(starttime)")
    cur.execute("create index index_tb_test_common_endtime on tb_test_common(endtime)")
    conn.commit()
    print "create index on tb_test_common"
    # Build the view
    vsql = create_view(31)
    cur.execute(vsql)
    conn.commit()
    print "create view"
    conn.close()

if __name__ == '__main__':
    #init_all()
    # The redundant table's strength is querying; its weakness is insert/update/delete
    # [needs a transaction, and every write happens twice].
    # The view's strength is that insert/update/delete only touch the original physical
    # table; its weakness is querying.
    # So the test below pits the redundant table's strength against the view's weakness.
    # starttime and endtime are indexed on both the common table and the app tables,
    # and the query conditions use them. As is well known, an index means every insert
    # and update must also update the index, so write speed suffers.

    # Run the performance tests
    conn = psycopg2.connect(host=host, dbname=dbname, user=username, password=password)
    cur = conn.cursor()

    # Group 0: where-condition filtering speed
    stime = time.time()
    cur.execute("select count(*) from tb_test_common where starttime=1463932800")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s; count:", cur.fetchone()
    stime = time.time()
    cur.execute("select count(*) from v_evt where starttime=1463932800")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s; count:", cur.fetchone()
    print "*************************************************************"

    # Group 1: filter condition + order by limit 10
    stime = time.time()
    cur.execute("select * from tb_test_common where starttime>=1463932800 and starttime<=1463936400 order by starttime desc limit 10")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s; count:", len(cur.fetchall())
    stime = time.time()
    cur.execute("select * from v_evt where starttime>=1463932800 and starttime<=1463936400 order by starttime desc limit 10")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s; count:", len(cur.fetchall())
    print "*************************************************************"

    # Group 2: aggregate-function speed
    stime = time.time()
    cur.execute("select to_char(to_timestamp(starttime),'yyyy-mm-dd') as mydate,count(*) as num from tb_test_common where starttime>=1463932800 and starttime<=1463940000 group by mydate order by mydate desc")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s; count:", len(cur.fetchall())
    stime = time.time()
    cur.execute("select to_char(to_timestamp(starttime),'yyyy-mm-dd') as mydate,count(*) as num from v_evt where starttime>=1463932800 and starttime<=1463940000 group by mydate order by mydate desc")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s; count:", len(cur.fetchall())
    print "*************************************************************"

    # Group 3: no filtering, full-table order by with limit
    stime = time.time()
    cur.execute("select * from tb_test_common order by starttime desc limit 100")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s; count:", len(cur.fetchall())
    stime = time.time()
    cur.execute("select * from v_evt order by starttime desc limit 100")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s; count:", len(cur.fetchall())
    print "*************************************************************"

    # Group 4: count speed
    stime = time.time()
    cur.execute("select count(*) from tb_test_common")
    etime = time.time()
    print "query tb_test_common, waste time:", etime - stime, "s; count:", cur.fetchone()
    stime = time.time()
    cur.execute("select count(*) from v_evt")
    etime = time.time()
    print "query v_evt, waste time:", etime - stime, "s; count:", cur.fetchone()
    cur.close()
    conn.close()
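To run the script: fill in dbname, username, and password at the top, uncomment init_all() in __main__ for the first run so the data file, tables, indexes, and view get built, then comment it out again on later runs to repeat only the timing section.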


Comparing against a second run of the same queries, you can see that the view's query performance improves significantly (because the execution plan is cached).
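A quick sketch of how to observe this warm-run effect, using the same connection settings as the script above:

import time
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="", user="", password="")  # same credentials as the script
cur = conn.cursor()
sql = "select count(*) from v_evt where starttime = 1463932800"
for run in (1, 2):
    stime = time.time()
    cur.execute(sql)
    cur.fetchone()
    # the second run is typically much faster than the first
    print("run %d: %.3f s" % (run, time.time() - stime))
cur.close()
conn.close()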


