SQL Tuning Tips: Statistical information (end-of-article benefits)

Source: Internet
Author: User

Click above "Asynchronous Community", select" Top Public number "

Technical dry, first time delivery

Statistics are like scouts in a war: if the intelligence work is not done well, the war will be lost. Similarly, if the statistics of a table are not collected correctly, or are not updated in a timely manner, the SQL execution plan will go astray and the SQL will run into performance problems. Statistics are collected so that the optimizer can choose the best execution plan and retrieve the data in the table at the lowest cost.

Statistics fall into several categories: table statistics, column statistics, index statistics, system statistics, data dictionary statistics, and statistics on the base tables of the dynamic performance views (the X$ tables).

System statistics, data dictionary statistics, and statistics on the dynamic performance view base tables are not discussed here; this article focuses on table statistics, column statistics, and index statistics.

Table statistics mainly include the total number of rows of the table (num_rows), the number of blocks of the table (blocks), and the average row length (avg_row_len). We can obtain the statistics of a table by querying the data dictionary view dba_tables.

Now we create a test table t_stats.
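The original code block is not reproduced here. A minimal sketch of such a test table, assuming it is built as a copy of dba_objects (the columns owner, object_id, object_name, and edition_name referenced later suggest this) and that we are connected as an ordinary user such as SCOTT:

-- Assumption: create the test table as a copy of dba_objects (schema SCOTT assumed)
create table t_stats as select * from dba_objects;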

Now let's look at the commonly used table-level statistics for t_stats.
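A sketch of a typical query against dba_tables (owner SCOTT assumed):

-- table-level statistics live in dba_tables
select owner, num_rows, blocks, avg_row_len, last_analyzed
  from dba_tables
 where owner = 'SCOTT'
   and table_name = 'T_STATS';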


Since t_stats is a newly created table and no statistics have been collected yet, the query against dba_tables returns empty statistics.

Now let's collect statistics for the table t_stats.
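The original call is not shown; a minimal sketch using DBMS_STATS with default options (owner SCOTT assumed):

BEGIN
  -- gather table and column statistics with default options
  dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'T_STATS');
END;
/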

We look at the statistics for the table again.


From the query we can see that the table t_stats has 72,674 rows in total, occupies 1,061 data blocks, and has an average row length of 97 bytes.

The statistics for a column mainly include the cardinality of the column (num_distinct), the number of NULL values in the column (num_nulls), and the data distribution of the column (the histogram). We can view column statistics through the data dictionary view dba_tab_col_statistics.

Now let's look at the commonly used column statistics for the table t_stats.
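A sketch of such a query against dba_tab_col_statistics, returning the five columns described below (owner SCOTT assumed):

select column_name, num_distinct, num_nulls, num_buckets, histogram
  from dba_tab_col_statistics
 where owner = 'SCOTT'
   and table_name = 'T_STATS';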


In the above query, the first column represents the column name, the second column represents the column's cardinality, the third column represents the number of NULL values in the column, the fourth column represents the number of buckets for the histogram, and the last column represents the histogram type.

At work, we often use the following script to view the statistics of tables and columns.
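The exact script is not reproduced here; one plausible version, joining dba_tables with dba_tab_col_statistics to show the row count, cardinality, selectivity, and histogram of each column (owner and table name assumed):

select a.column_name,
       b.num_rows,
       a.num_distinct,
       round(a.num_distinct / b.num_rows * 100, 2) as selectivity,
       a.num_nulls,
       a.num_buckets,
       a.histogram
  from dba_tab_col_statistics a,
       dba_tables             b
 where a.owner      = b.owner
   and a.table_name = b.table_name
   and a.owner      = 'SCOTT'
   and a.table_name = 'T_STATS';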


The statistics for an index mainly include the index blevel (the height of the index minus 1), the number of leaf blocks (leaf_blocks), and the clustering factor (clustering_factor). We can view index statistics through the data dictionary view dba_indexes.

We create an index on the object_id column.
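The statement is not shown; a sketch (the index name is an assumption):

create index idx_t_stats_objid on t_stats(object_id);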

Index statistics are collected automatically when the index is created, so we can run the following script to view them.
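A sketch of such a query against dba_indexes (owner SCOTT assumed):

select blevel, leaf_blocks, distinct_keys, clustering_factor
  from dba_indexes
 where owner      = 'SCOTT'
   and table_name = 'T_STATS';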

In later chapters, we will explain in detail how table statistics, column statistics, and index statistics are used in cost calculation.

Important parameter settings for statistics collection

We typically use the following script to collect statistics about tables and indexes.
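The script is not reproduced here; a typical DBMS_STATS call covering the parameters discussed below (the values shown are illustrative assumptions, not the book's exact settings):

BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'SCOTT',
    tabname          => 'T_STATS',
    granularity      => 'AUTO',                        -- partitioned tables only
    estimate_percent => 100,                           -- sampling rate
    method_opt       => 'for all columns size auto',   -- histogram policy
    cascade          => TRUE);                         -- also gather index statistics
END;
/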

ownname specifies the owner of the table and is not case-sensitive.

tabname specifies the table name and is not case-sensitive.

granularity specifies the granularity at which statistics are collected; it takes effect only for partitioned tables. The default is auto, which means Oracle decides how to collect statistics for a partitioned table based on its partitioning type. We typically leave this at the default auto, so the option is omitted from the scripts that follow.

estimate_percent specifies the sampling rate; the valid range is 0.000001 to 100.

For tables smaller than 1GB we typically sample 100%, because the table is small and even a 100% sample completes relatively quickly. A small table can also have unevenly distributed data, and sampling it at less than 100% may lead to inaccurate statistics. We therefore recommend sampling small tables at 100%.

For tables between 1GB and 5GB we typically sample 50%, and for tables larger than 5GB we sample 30%. If a table is particularly large, with dozens or even hundreds of gigabytes of data, we recommend partitioning the table first and then collecting statistics for each partition separately.

In general, to ensure that statistics are accurate, we recommend that the sampling rate not be less than 30%.

We can use the following script to see the sample rate of the table.
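A sketch of such a query (the percentage is simply sample_size relative to num_rows; owner assumed):

select owner,
       table_name,
       num_rows,
       sample_size,
       round(sample_size / num_rows * 100) as estimate_percent
  from dba_tables
 where owner = 'SCOTT'
   and table_name = 'T_STATS';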

From the above query we can see that the table t_stats was sampled at 100%. Now let's set the sampling rate to 30%.
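The call is not shown; a sketch of re-gathering statistics with a 30% sample (owner assumed), after which the sampling-rate query above can be re-run to verify:

BEGIN
  dbms_stats.gather_table_stats(ownname          => 'SCOTT',
                                tabname          => 'T_STATS',
                                estimate_percent => 30);
END;
/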


From the above query we can see that the sampling rate is 30%, the total row count of the table is estimated to be 73,067, while the actual number of rows in the table is 72,674. With a 30% sampling rate, 21,920 rows were analyzed, so the estimated total is round(21,920 * 100 / 30) = 73,067.

Unless a table is small, it is not necessary to sample it at 100%, because tables are constantly subject to DML operations and the data in them keeps changing.

method_opt controls the histogram collection policy.

method_opt => 'for all columns size 1'

Indicates that no histograms are collected for any column, as shown below.
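A sketch of such a call (owner assumed):

BEGIN
  dbms_stats.gather_table_stats(ownname    => 'SCOTT',
                                tabname    => 'T_STATS',
                                method_opt => 'for all columns size 1');
END;
/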

We look at the histogram information.
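A sketch of the histogram query (owner assumed); columns without a histogram show HISTOGRAM = 'NONE':

select column_name, num_buckets, histogram
  from dba_tab_col_statistics
 where owner = 'SCOTT'
   and table_name = 'T_STATS'
 order by column_name;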


From the above query we can see that no histograms were collected for any column.

method_opt => 'for all columns size skewonly'

Indicates that Oracle automatically decides, for every column in the table, whether a histogram should be collected, as shown below.
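A sketch of such a call (owner assumed):

BEGIN
  dbms_stats.gather_table_stats(ownname    => 'SCOTT',
                                tabname    => 'T_STATS',
                                method_opt => 'for all columns size skewonly');
END;
/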

We look at the histogram information as shown below.


From the above query we can see that histograms were collected for all columns except OBJECT_ID and EDITION_NAME. The EDITION_NAME column contains only NULLs, so there is no need to collect a histogram on it. The OBJECT_ID column is 100% selective (every value is unique), so there is no need to collect a histogram on it either.

Never use method_opt => 'for all columns size skewonly' to collect histograms in real work, because not all columns of a table appear in WHERE conditions, and it makes no sense to collect histograms for columns that never appear in a WHERE condition.

method_opt => 'for all columns size auto'

Indicates that Oracle automatically decides whether to collect a histogram, but only for columns that appear in WHERE conditions.

Now let's delete the histograms for all the columns in the table (this can be done by re-gathering statistics with method_opt => 'for all columns size 1', as shown earlier).

We execute the following SQL so that the owner column is placed in the Where condition.
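The SQL is not shown; a sketch (the literal 'SYS' is only an assumed value):

select count(*) from t_stats where owner = 'SYS';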

Next we refresh the database monitoring information.
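A sketch of the call, using the standard DBMS_STATS procedure that flushes column-usage monitoring information from memory to the data dictionary:

exec dbms_stats.flush_database_monitoring_info;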

We collect statistics on the table with method_opt => 'for all columns size auto'.

Then we look at the histogram information.


From the above query we can see that Oracle automatically collects histograms for the owner column.

What if a highly selective column is placed in the WHERE condition: will a histogram be collected for it automatically? Now let's put the object_name column in the WHERE condition.
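A sketch of such a query (the literal is only an assumed value):

select count(*) from t_stats where object_name = 'T_STATS';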

Then we refresh the database monitoring information.

We collect statistical information.

We check whether a histogram was collected for the object_name column.

From the above query we can see that no histogram was collected for the object_name column. As you can see, collecting histograms in auto mode is smart. The default value of method_opt is 'for all columns size auto'. method_opt => 'for all columns size repeat' means that histograms are collected only for the columns that already have histograms.

This excerpt is taken from The Core Ideas of SQL Optimization.

The Core Ideas of SQL Optimization

Robinson Huang Chao Zhong


Structured Query Language (SQL) is a powerful database language. It is based on relational algebra; it is rich in functionality, concise, and convenient and flexible to use, and it has become the standard language of relational databases. This book is designed to guide readers through SQL optimization techniques so they can better improve database performance. It is based on Oracle, is written to be easy to read, and is suitable for readers at all levels.

This book is intended for front-line engineers, operations engineers, database administrators, and system design and development staff; both beginners and readers with some background will benefit from it.

A small giveaway

Focus on the "async community" service number, forward this article to a circle of friends or more than 50 people, send to the Async community service number backstage, and leave a message at the bottom of the article you learn SQL language experience, or probation book feel, we will elect 3 readers to give "SQL optimization core ideas" 1, quickly actively participate in it!
Event deadline: May 10, 2018


In the "async Community " Backstage reply " concern ", you can get free 2000 online video courses ; recommend friends follow the tips to get books link, free to get an asynchronous book. Come and join me!

Sweep the QR code above, reply to the "Attention" participation activities!

Read the original and buy the core idea of SQL optimization

Read the original

