Those years we used to encounter the pit in parallel 01

Source: Internet
Author: User
Tags commit rand

background: When testing a one-machine product today, in the Performance Stress test section, the test cases that were found in the product manuals were missing some details in the operation about parallelism, which is likely to make it impossible for inexperienced people to test efficiently.
Now record the process and look back at the pits we encountered in parallel in those years:

Environment: Oracle RAC 11.2.0.4 (3 nodes)

    • 1. Parallel Insert No effect
    • 2. Parallel only on local node
    • 3. Increase the effect of parallelism
    • 4. Parallel testing of all nodes
1. Parallel Insert No effect

Test Case:

create table Z_OBJ tablespace TBS_1 as select * from dba_objects ;insert /*+ append parallel(t0,16) */ into Z_OBJ t0 select /*+ parallel(t1,16) */ * from Z_OBJ t1;commit;--多次执行并查询大小select owner,segment_name,bytes/1024/1024 from dba_segments where segment_name='Z_OBJ';

Based on the test case execution, it was found that the actual parallelism was not reasonably used and was inefficient (monitoring that I/O writes are only hundred megabytes per second, which should normally be a gigabit per second level).
To view the execution plan:

Sql> explain plan for insert/*+ append parallel (t0,16) */into z_obj t0 select/*+ Parallel (t1,16) */* from z_obj T1;  Explained.sql> Set Lines pages 200sql> select * FROM table (Dbms_xplan.display ()); Plan_table_ OUTPUT------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------Plan Hash value:1886916412------------ ---------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU) |    Time | TQ | In-out| PQ Distrib |----------------------------------------------------------------------------------------------------   -----------| 0 |          INSERT STATEMENT |    |    91m| 17g| 23842 (1) |        00:00:01 |      |         |   ||  1 | LOAD as SELECT |       Z_obj |       |            |          |        |      |            |   ||   2 |          PX Coordinator |       |       |            |          |        |      |            |   ||    3 | PX SEND QC (RANDOM) |    : TQ10000 |    91m| 17g| 23842 (1) |  00:00:01 | q1,00 | P->s |   QC (RAND) | |     4 |          PX BLOCK ITERATOR |    |    91m| 17g| 23842 (1) |  00:00:01 | q1,00 |            PCWC |   ||      5 | TABLE ACCESS full|    Z_obj |    91m| 17g| 23842 (1) |  00:00:01 | q1,00 |            PCWP | |-------------------------------------------------------------------------------------------------------------- -note------Dynamic sampling used for this statement (level=2) and selected.

As you can see, only the query part is used in parallel, and the insert part is not used in parallel, although we specify the hint of the degree of parallelism.
Knowledge point 1: Not only the insert operation, but also the parallelism of other DML operations, requires that DML-enabled parallelism be displayed:

alter session enable parallel dml;

Review the execution plan again and find that the Insert section is already ready to be used in parallel:

Sql> explain plan for insert/*+ append parallel (t0,16) */into z_obj t0 select/*+ Parallel (t1,16) */* from z_obj T1; Explained.sql> select * FROM table (Dbms_xplan.display ()); Plan_table_ OUTPUT------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------Plan Hash value:2135351304------------ ---------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU) |    Time | TQ | In-out| PQ Distrib |----------------------------------------------------------------------------------------------------   -----------| 0 |          INSERT STATEMENT |    |    91m| 17g| 23842 (1) |        00:00:01 |      |            |   ||  1 |       PX Coordinator|       |       |            |          |        |      |            |   ||   2 | PX SEND QC (RANDOM) |    : TQ10000 |    91m| 17g| 23842 (1) |  00:00:01 | q1,00 | P->s |   QC (RAND) | |    3 | LOAD as SELECT |       Z_obj |       |            |          |  | q1,00 |            PCWP |   ||     4 |          PX BLOCK ITERATOR |    |    91m| 17g| 23842 (1) |  00:00:01 | q1,00 |            PCWC |   ||      5 | TABLE ACCESS full|    Z_obj |    91m| 17g| 23842 (1) |  00:00:01 | q1,00 |            PCWP | |-------------------------------------------------------------------------------------------------------------- -note------Dynamic sampling used for this statement (level=2) and selected.
2. Parallel only on local node

By default, parallel operations are distributed to each node of the RAC, which is usually the result we do not want to see.
Knowledge point 2: The parameter parallel_force_local=true can be set to force parallel operations to execute on the local node, which is a dynamic parameter:

alter system set parallel_force_local=true sid='*';

This performs the insert operation, dstat monitoring at each node, it is found that only the local node has a write operation of hundreds of m per second, indicating that the Parallel_force_local=true parameter is active:

----total-cpu-usage-----dsk/total--net/total----Paging-----system--usr sys IDL Wai HiQ siq| Read writ|  Recv send| In-out | int CSW 1 0 98 0 0 0|  163M 326m|   74k 61k|  0 0 | 17k 51k 2 0 98 0 0 0| 164M 325m|   479k 29k|  0 0 | 18k 51k 2 0 98 0 0 0| 165M 330m|   833k 1347k|  0 0 | 21k 54k 1 0 98 0 0 0|  167M 336m|   47k 58k|  0 0 | 18k 52k 1 0 98 0 0 0| 173M 340m|   507k 31k|  0 0 | 18k 53k 1 0 98 0 0 0|  176M 354m|   77k 546k|  0 0 | 18k 54k 1 0 98 0 0 0|  168M 341m|   43k 44k|  0 0 | 18k 53k 2 0 98 0 0 0|  177M 353m|   32k 42k|  0 0 | 18k 54k 2 0 98 0 0 0|  183M 362m|   65k 67k|  0 0 | 17k 54k 1 0 98 0 0 0|  163M 329m|   44k 44k|  0 0 | 16k 49k 1 0 98 0 0 0|  165M 328m|   39k 33k|  0 0 | 18k 51k 1 0 98 0 0 0|  161M 323m|   43k 56k|  0 0 |   17k 50k 20 98 0 0 0|  182M 360m|   44k 49k|  0 0 | 18k 55k 1 0 98 0 0 0|  166M 331m|   34k 52k|  0 0 | 18k 51k 2 0 98 0 0 0|  162M 327m|   25k 25k|  0 0 | 18k 51k

At this point, with the experience in 1, enabling DML parallelism, you can see a significant increase in efficiency, with local nodes having write operations of thousands of M per second:

----total-cpu-usage-----dsk/total--net/total----Paging-----system--usr sys IDL Wai HiQ siq| Read writ|  Recv send| In-out | int CSW 8 1 1 0 0|2927m 5882m|   771k 140k| 0 0 | 107k 157k 9 1 1 0 0|3134m 6266m|   759k 1484k| 0 0 | 108k 161k 8 1 1 0 0|3021m 6042m|   154k 178k| 0 0 | 104k 155k 9 1 0 0 0|3000m 6004m|   259k 266k| 0 0 | 106k 156k 9 1 0 0 0|2875m 5754m|   129k 142k| 0 0 | 102k 150k 9 1 0 0 0|3082m 6160m|   127k 135k| 0 0 | 108k 158k 9 1 0 0 0|3044m 6095m|   655k 642k| 0 0 | 107k 158k 9 1 0 0 0|2961m 5923m|   125k 134k| 0 0 | 105k 153k 9 1 0 0 0|2875m 5747m|   137k 168k| 0 0 | 102k 150k 9 1 0 0 0|3156m 6312m|   127k 135k| 0 0 | 109k 163k 9 1 1 0 0|3144m 6291m|   130k 138k| 0 0 | 109k 162k 9 1 1 0 0|3058m 6117m|   125k 143k| 0 0 |   106k 157k 91 0 0 0|3138m 6279m|   132k 139k| 0 0 | 108k 161k 9 1 0 0 0|3039m 6074m|   141k 143k| 0 0 | 106k 156k 4 1 0 0 0|1237m 2615m|   986k 61k|  0 0 | 68k 90k
3. Increase the effect of parallelism

To create a large table z_obj_3, insert data using 32 degrees of parallelism:

create table Z_OBJ_3 tablespace TBS_3 as select * from dba_objects ;insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;commit;

The actual cost of 25s of time to insert is complete, and the performance of parallelism increases further:

SQL> insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;867092478 rows created.Elapsed: 00:00:25.52

At this time Dstat monitoring, write operations per second reached 8000m+:

----total-cpu-usage-----dsk/total--net/total----Paging-----system--usr sys IDL Wai HiQ siq| Read writ|  Recv send| In-out |   int CSW 0 0 0 0 0|2489k 1036k|   0 0 |  0 0 | 10k 9766 1 2 0 0|3755m 7542m|   699k 1055k| 0 0 | 143k 210k 2 2 0 0|3634m 7407m|   447k 453k| 0 0 | 147k 209k 1 2 0 0|4202m 8402m|   535k 553k| 0 0 | 141k 215k 1 2 0 0|4168m 8339m|   539k 556k| 0 0 | 144k 214k 1 2 0 1|4109m 8224m|   546k 552k| 0 0 | 142k 210k 1 3 0 0|4209m 8419m|   311k 327k| 0 0 | 138k 213k 1 3 0 0|4237m 8483m|   114k 114k| 0 0 |  136k 210k 9 1 1 0 1|2709m 5703m|   64k 65k| 0 0 |  156k 203k 1 2 0 0|4189m 8383m|   91k 87k| 0 0 |  136k 205k 1 3 0 0|4237m 8478m|   95k 101k| 0 0 |  136k 208k 1 2 0 0|4242m 8485m|   95k 109k| 0 0 |   139k 208k 141 3 0 0|4202m 8412m|   835k 103k| 0 0 |   137k 208k 1 2 0 0|4288m 8563m|1143k 1930k| 0 0 | 139k 211k 1 2 0 0|4229m 8477m|   101k 97k| 0 0 | 138k 209k

Create a large table z_obj_4 and insert the data using 64 degrees of parallelism:

create table Z_OBJ_4 tablespace TBS_4 as select * from dba_objects ;insert /*+ append parallel(t0,64) */ into Z_OBJ_4 t0 select /*+ parallel(t1,64) */ * from Z_OBJ t1;commit;

The actual cost of 28s of time is inserted, and it is found that even if the CPU is sufficient, there is no performance increase in the degree of parallelism, indicating that I/O has reached the bottleneck:

SQL> insert /*+ append parallel(t0,64) */ into Z_OBJ_4 t0 select /*+ parallel(t1,64) */ * from Z_OBJ t1;867092478 rows created.Elapsed: 00:00:28.61

At this time Dstat monitoring, write operations per second close to 8000M:

----total-cpu-usage-----dsk/total--net/total----Paging-----system--usr sys IDL Wai HiQ siq| Read writ|  Recv send| In-out |   int CSW 2 Bayi 4 0 1|3844m 7711m|3571k 2567k| 0 0 | 130k 197k 1 3 0 0|3810m 7602m|   535k 1885k| 0 0 | 115k 175k 1 3 0 0|3799m 7607m|   603k 654k| 0 0 | 116k 174k 1 3 0 0|3810m 7638m|   550k 602k| 0 0 | 119k 176k 1 3 0 0|3766m 7531m|   630k 651k| 0 0 | 114k 171k 1 Bayi 4 0 0|3804m 7608m|   620k 669k| 0 0 | 117k 175k 1 3 0 0|3792m 7585m|   581k 616k| 0 0 | 117k 176k 1 3 0 0|3767m 7522m|   561k 612k| 0 0 | 116k 173k 1 3 0 0|3659m 7343m|   553k 601k| 0 0 | 115k 170k 1 3 0 0|3659m 7340m|   609k 668k| 0 0 | 121k 179k 1 3 0 0|3746m 7502m|   609k 644k| 0 0 | 117k 174k 1 3 0 0|3822m 7648m|   675k 773k| 0 0 |   118k 178k 131 3 0 0|3769m 7541m|1191k 632k| 0 0 |   115k 173k 1 3 0 0|3864m 7725m|1749k 2533k| 0 0 | 117k 177k 1 3 0 0|3741m 7481m|   613k 655k| 0 0 | 116k 172k

Knowledge point 3: Generally increasing the degree of parallelism can increase the return speed of the operation, but also limited by the overall system I/O capability

4. Parallel testing of all nodes
Test 3 nodes at a time:

--节点1set time onset timing ondrop table Z_OBJ_2 purge;create table Z_OBJ_2 tablespace TBS_2 as select * from dba_objects where 1=2;alter session enable parallel dml;--INSERT Z_OBJ_2insert /*+ append parallel(t0,32) */ into Z_OBJ_2 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;commit;--节点2set time onset timing ondrop table Z_OBJ_3 purge;create table Z_OBJ_3 tablespace TBS_3 as select * from dba_objects where 1=2;alter session enable parallel dml;--INSERT Z_OBJ_3insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;commit;--节点3set time onset timing ondrop table Z_OBJ_4 purge;create table Z_OBJ_4 tablespace TBS_4 as select * from dba_objects where 1=2;alter session enable parallel dml;--INSERT Z_OBJ_4insert /*+ append parallel(t0,32) */ into Z_OBJ_4 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;commit;

Each node observes the insertion time (a single execution time is longer, the overall I/O bottleneck causes):

15:26:06 SQL> insert /*+ append parallel(t0,32) */ into Z_OBJ_2 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;867092478 rows created.Elapsed: 00:00:48.5315:25:23 SQL>  insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;867092478 rows created.Elapsed: 00:00:45.8415:25:21 SQL>  insert /*+ append parallel(t0,32) */ into Z_OBJ_4 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;867092478 rows created.Elapsed: 00:00:47.63

Each node Dstat simultaneously observed:

--NODE1:----total-cpu-usage-----dsk/total--net/total----Paging-----system--usr sys IDL Wai HiQ siq| Read writ|  Recv send| In-out | int CSW 7 1 9 0 0|2110m 4223m|   169k 230k|  0 0 | 78k 122k 7 1 9 0 0|2107m 4209m|   176k 178k|  0 0 | 79k 123k 9 1 Bayi 9 0 0|2614m 5237m|   190k 195k|  0 0 | 96k 148k 8 1 Bayi 0 0|2171m 4339m|   195k 232k|  0 0 | 84k 127k 7 1 9 0 0|1975m 3947m|   220k 184k|  0 0 | 76k 117k 7 1 9 0 0|2051m 4099m|   166k 169k|  0 0 |   78k 121k 7 1 0 0|2059m 4121m|1193k 170k|  0 0 | 79k 121k 7 1 9 0 0|2001m 4011m|   384k 1463k|  0 0 | 76k 118k 3 0 93 4 0 0| 802M 1570m|   148k 144k|  0 0 | 36k 53k 2 0 96 2 0 0| 355M 886m|   113k 137k|  0 0 | 47k 61k 8 1 9 0 0|2122m 4255m|   189k 202k|  0 0 | 79k 123k 7 1 9 0 0|2040m 4069m|   162k 164k|  0 0 | 76k 119k 8 1 9 0 0|2208m 4436m|   839k 843k|  0 0 | 83k 130k 9 1 7 0 0|2506m 5037m|   305k 307k|  0 0 | 94k 145k 4 0 2 0 0|1098m 2273m|   218k 233k|  0 0 | 49k 72k--node2:----total-cpu-usage-----dsk/total--net/total----Paging-----system--usr sys IDL Wai HiQ siq| Read writ|  Recv send| In-out | int CSW 6 1 0 0|2152m 4312m|   221k 224k|  0 0 | 79k 130k 7 1 0 0|2226m 4447m|   216k 218k|  0 0 | 81k 133k 1 Bayi 8 0 0|2775m 5559m|   244k 214k| 0 0 | 100k 159k 7 1 9 0 0|2110m 4205m|   220k 221k|  0 0 | 77k 126k 7 1 0 0|2104m 4219m|   231k 266k|  0 0 | 76k 126k 7 1 0 0|2158m 4311m|   207k 207k|  0 0 | 78k 129k 7 1 0 0|2103m 4214m|   877k 849k|  0 0 | 76k 126k 7 1 0 0|2109m 4214m|   207k 209k|  0 0 | 76k 124k 1 Bayi 8 0 0|2934m 5866m|   212k 216k| 0 0 | 102k  165k 7 1 0 0|2281m 4551m|   207k 227k|  0 0 | 82k 133k 7 1 0 0|2136m 4281m|   206k 205k|  0 0 | 79k 128k 6 1 0 0|1951m 3940m|   313k 341k|  0 0 | 73k 120k 4 0 4 0 0|1044m 2250m|   672k 642k|  0 0 |  56k 88k 0 0 99 0 0 0| 50M 116m|   258k 276k|  0 0 | 11k 14k 0 0 100 0 0 0| 323k 58k|   208k 202k| 0 0 |8385 10k--node3:----total-cpu-usage-----dsk/total--net/total----Paging-----system--usr sys IDL Wai HiQ siq | Read writ|  Recv send| In-out | int CSW 6 1 0 0|2144m 4274m|   149k 156k|  0 0 | 77k 129k 6 1 0 0|2223m 4452m|   165k 189k|  0 0 | 80k 133k 6 1 0 0|2203m 4404m|   189k 198k|  0 0 | 79k 131k 7 0 0 0|2119m 4233m|   140k 211k|  0 0 | 75k 125k 7 1 0 0|2156m 4311m|   870k 731k|  0 0 | 78k 128k 7 1 0 0|2157m 4318m|   143k 149k|  0 0 | 79k 129k 7 1 9 0 0|2172m 4344m|   165k 170k|  0 0 | 79k 131k 7 1 0 0|2139m 4283m|   140k 141k|  0 0 | 78k 125k 7 1 0 0|2145m 4303m|   143k 151k|  0 0 | 78k 129k 7 1 0 0|2121m 4226m|   146k 450k|  0 0 | 76k 126k 7 1 0 0|2442m 4884m|   460k 155k|  0 0 | 87k 144k 6 0 0 0|2083m 4177m|   217k 156k|  0 0 | 76k 126k 4 0 7 0 0|1445m 2863m|   130k 126k|  0 0 | 54k 89k 2 0 94 3 0 0| 577M 1341m|   121k 124k|  0 0 | 53k 73k 7 1 0 0|2219m 4437m|   157k 193k|  0 0 | 81k 133k

Knowledge Point 4: The overall efficiency of the simultaneous parallel operation of each node is also limited by the overall system I/O capability
Test here, there is a doubt, why not create? Let's try the create operation according to the test case, it is unsatisfactory, only more than 300 m of the write speed, nearly 10 minutes before the creation is complete. The parallel insert above has a write speed of more than 8,000 m, 20s+ can be inserted to complete:

drop table Z_OBJ_2 purge;create table Z_OBJ_2 tablespace TBS_2 as select /*+ parallel(t1,32) */ * from Z_OBJ t1;Elapsed: 00:09:19.5215:49:58 SQL> insert /*+ append parallel(t0,32) */ into Z_OBJ_2 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;867092478 rows created.Elapsed: 00:00:25.24

Obviously, the create operation is equivalent to not using parallelism, how can the create operation also be used for parallelism? This requires that the SQL statement be rewritten as follows:

--使用到并行,26s就完成了百G大小表的创建:drop table Z_OBJ_2 purge;create table Z_OBJ_2 tablespace TBS_2 parallel(degree 32) as select /*+ parallel(t1,32) */ * from Z_OBJ t1;Elapsed: 00:00:26.76--使用到并行+nologging,差距不大,只需25s就完成了百G大小表的创建:drop table Z_OBJ_2 purge;create table Z_OBJ_2 tablespace TBS_2 parallel(degree 32) nologging as select /*+ parallel(t1,32) */ * from Z_OBJ t1;Elapsed: 00:00:25.77

That is, when we use parallelism, we pay particular attention to whether the parts are effectively used in parallel.
about parallelism, there are some interesting scenarios, such as the hint that have been encountered by developers to write the wrong degree of SQL Parallelism caused Oracle to adopt automatic DOP, that is, the maximum degree of parallelism execution, resulting in system resources are basically fully occupied, and other operations are not efficient operation resulting in performance failure. Other pits about parallelism will be described in detail later in the chapter.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.