Oracle Index Clustering Factor(叢集因子)

來源:互聯網
上載者:User

Oracle Index Clustering Factor(叢集因子)

一、本文說明:

    今天在做測試的時候發現欄位上有索引,但是執行計畫就是不走索引,經過在網上尋找才發現原來是索引的叢集因子過高導致的。

二、官網說明

    The index clustering factor measures row order in relation to an indexed value suches employee last name.The more order that exists in rowstorage for this value,the lower the clustering factor.

    ----row儲存的越有序,clustering factor的值越低。

    The clustering factor is useful as a rough measure of the number of I/Os required to read an entire table by means of an index:

    (1)、If the clustering factor is high,then Oracle Database performs a relatively high number of I/Os during a large index range scan.The index entriespoint to random table blocks,so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.

     ----當clustering factor很高時,說明index entry (rowid) 是隨機指向一些block的,在一個大的index range scan時,這樣為了讀取這些rowid指向的block,就需要一次又一次重複的去讀這些block。

    (2)、If the clustering factor is low,then Oracle Database performs a relatively low number of I/Os during a large index range scan.The index keys in arange tend to point to the same data blcok,so the database does not have to read and reread the same blocks over and over.

      ----當clustering factor值低時,說明index keys (rowid) 是指向的記錄是儲存在相同的block裡,這樣去讀row時,只需要在同一個block裡讀取就可以了,這樣減少重複讀取blocks的次數。

      The clustering factor is relevant for index scans because it can show:

          (1)、Whether the database will use an index for large range scans;

          (2)、The degree of table organization in relation to the index key;

          (3)、Whether you should consider using an index-organized table,partitioning,or table cluster if rows must be ordered by the index key.

三、Index Clustering Factor說明

    簡單的說,Index Clustering Factor是通過一個索引掃描一張表,需要訪問的表的資料區塊的數量,即對I/O的影響,也代表索引鍵儲存位置是否有序。

    (1)、如果越有序,即相鄰的KVStore for Redis在相同的block,那麼這時候Clustering Factor的值就越低;

    (2)、如果不是很有序,即索引值是隨機的儲存在block上,這樣在讀取索引值時,可能就需要一次又一次的去訪問相同的block,從而增加了I/O。

    Clustering Factor的計算方式如下:

    (1)、掃描一個索引(large index range scan);

    (2)、比較某行的rowid和前一行的rowid,如果這兩個rowid不屬於同一個資料區塊,那麼cluster factor增加1;

    (3)、整個索引掃描完畢後,就得到了該索引的clustering factor。

            如果clustering factor接近於表格儲存體的塊數,說明這張表是按照索引欄位的順序儲存的。

            如果clustering factor接近於行的數量,那說明這張表不是按索引欄位順序儲存的。

            在計算索引訪問成本的時候,這個值十分有用。Clustering Factor乘以選擇性參數(selectivity)就是訪問索引的開銷。

            如果這個統計資料不能真實反映出索引的真實情況,那麼可能會造成最佳化器錯誤的選擇執行計畫。另外如果某張表上的大多數訪問是按照某個索引做索引掃描,那麼將該表的資料按照索引欄位的順序重新組織,可以提高該表的訪問效能。

四、測試

  4.1、產生問題:

----查看一下資料庫的版本----
SQL> select * from v$version where rownum=1;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

  ----建立一張測試表jack----
SQL> create table jack as select * from dba_objects where 1=2;

Table created.

  ----將資料無序的插入jack表中----
SQL> begin
  2      for i in 1..10 loop
  3        insert /*+ append */ into jack select * from dba_objects order by i;
  4      commit;
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select count(*) from jack;

  COUNT(*)
----------
    725460

  ----查看一下表的大小-----
SQL> set wrap off
SQL> col owner for a10;
SQL> col segment_name for a15;
SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';

SEGMENT_NAME    BLOCKS    EXTENTS  size
------------- ---------- ---------- --------
JACK            11264      82      88M

  ----在object_id上建立索引----
SQL> create index jack_ind on jack(object_id);

Index created.

  ----查看一下索引的大小----
SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';

SEGMENT_NAME    SEGMENT_TYPE      BLOCKS    EXTENTS    size
------------ ------------------ ---------- ---------- ---------
JACK_IND          INDEX          1664        28        13M
  ----在沒有收集相關的統計資訊之前,查看一下index clustering factor----
SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME      CLUSTERING_FACTOR  NUM_ROWS
--------------- ----------------- ----------
JACK_IND              725460        725460

  ----簡單的收集一下統計資訊----
SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);

PL/SQL procedure successfully completed.

  ----再次查看index clustering factor----
SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME    CLUSTERING_FACTOR  NUM_ROWS
-------------- ----------------- ----------
JACK_IND              725460      725460      ----顯然統計資訊收集前和後,clustering factor值不變,說在建立索引的時候,會收集表中的資料真正的行數。並且這裡的clustering factor等num_rows,也說明表的clustering factor是無序的。

  ----查看一個確定值,然後查看執行計畫----
SQL> explain plan for select * from jack where object_id=1501;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2860868395

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Ti
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |          |    10 |  970 |    13    (0)| 00
|  1 |  TABLE ACCESS BY INDEX ROWID| JACK    |    10 |  970 |    13    (0)| 00
|*  2 |  INDEX RANGE SCAN        | JACK_IND |    10 |      |    3    (0)| 00
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------


  2 - access("OBJECT_ID"=1501)

14 rows selected.                ----在這裡走了索引,cost為13.

SQL> alter system flush buffer_cache;

System altered.

SQL> set autotrace traceonly;
  ----查詢一個範圍的執行計畫----
SQL> select * from jack where object_id>1000 and object_id<2000;

9880 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 949574992

--------------------------------------------------------------------------
| Id  | Operation      | Name | Rows  | Bytes | Cost (%CPU)| Time    |
--------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |    |  9657 |  914K|  1824  (1)| 00:00:22 |
|*  1 |  TABLE ACCESS FULL| JACK |  9657 |  914K|  1824  (1)| 00:00:22 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  1 - filter("OBJECT_ID"<2000 AND "OBJECT_ID">1000)


Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
    10993 consistent gets
    10340 physical reads
      0  redo size
  471945 bytes sent via SQL*Net to client
    7657  bytes received via SQL*Net from client
    660  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
    9880  rows processed      ----注意,object_id上是有索引的,但是這裡並沒有使用索引,而是使用了全表掃描。

SQL> alter system flush buffer_cache;

System altered.

 ----強制走索引,查看執行計畫----
SQL> select /*+ index(jack jack_ind) */ * from jack where object_id>1000 and object_id<2000;

9880 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2860868395

----------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time    |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |          |  9657 |  914K|  9683    (1)| 00:01:57 |
|  1 |  TABLE ACCESS BY INDEX ROWID| JACK    |  9657 |  914K|  9683    (1)| 00:01:57 |
|*  2 |  INDEX RANGE SCAN        | JACK_IND |  9657 |      |    24    (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)


Statistics
----------------------------------------------------------
      0    recursive calls
      0    db block gets
    10561  consistent gets
    164    physical reads
      0    redo size
  988947  bytes sent via SQL*Net to client
    7657  bytes received via SQL*Net from client
    660    SQL*Net roundtrips to/from client
      0    sorts (memory)
      0    sorts (disk)
    9880  rows processed   
----強制走索引之後,使用了index range scan,但是cost變成了9683,而全表掃描時是1824.
----還有比較一下兩次查詢中物理讀的情況:全表掃描的物理讀明顯比索引的要高很多,但是Oracle卻沒有使用索引。
----因此Oracle認為走索引的Cost比走全表掃描大,而是大N倍,CBO是基於Cost來決定執行計畫的。
----由此得出,對於索引的Cost,Oracle是根據clustering factor參數來計算的,而該實驗中的clustering factor參數是很高的,資料存放區無序。這就造成了Oracle認為走索引的cost比全表掃描的大。

 4.2、解決問題:

----通過上面的分析,可以看出,要降低clustering factor才能解決問題,而要解決clustering factor,就需要重新對錶的儲存位置進行排序。----
  ----重建jakc表----
SQL> create table echo as select * from jack where 1=0;

Table created.

SQL> insert /*+ append */ into echo select * from jack order by object_id;

725460 rows created.

SQL> commit;

Commit complete.

SQL> truncate table jack;

Table truncated.

SQL> insert /*+ append */ into jack select * from echo;

725460 rows created.

SQL> commit;

Commit complete.

  ----查看錶和索引的資訊----
SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';

SEGMENT_NAME    BLOCKS    EXTENTS    size
------------- ---------- ---------- -----------
JACK            11264      82        88M

SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';

SEGMENT_NAME    SEGMENT_TYPE      BLOCKS    EXTENTS    size
------------ ------------------ ---------- ---------- -------------
JACK_IND            INDEX          1536          27    12M

SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME    CLUSTERING_FACTOR NUM_ROWS
------------- ----------------- ----------
JACK_IND            725460      725460

  ----對索引進行rebuild----
SQL> alter index jack_ind rebuild;

Index altered.

  ----查看cluster factor----
SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME      CLUSTERING_FACTOR  NUM_ROWS
--------------- ----------------- ----------
JACK_IND              10327      725460    ------注意這裡的Factor,已經變成10327,我們收集一下表的統計資訊,然後與表的block進行一次比較。

SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);

PL/SQL procedure successfully completed.

SQL> select blocks from dba_tables where table_name='JACK';

    BLOCKS
----------
    10474  ----表jack實際使用的block是10474,clustering factor是10327基本還是比較接近了,這也說明相鄰的row是儲存在相同的block裡。

SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';

INDEX_NAME              CLUSTERING_FACTOR  NUM_ROWS
------------------------------ ----------------- ----------
JACK_IND                  10327    725460

SQL> alter system flush buffer_cache;

System altered.

SQL> set autotrace traceonly;
  ----再次查看之前sql的執行計畫----
SQL> select * from jack where object_id>1000 and object_id<2000;

9880 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2860868395

----------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time    |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |          |  9657 |  914K|  162    (0)| 00:00:02 |
|  1 |  TABLE ACCESS BY INDEX ROWID| JACK    |  9657 |  914K|  162    (0)| 00:00:02 |
|*  2 |  INDEX RANGE SCAN        | JACK_IND |  9657 |      |    24    (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)


Statistics
----------------------------------------------------------
      1  recursive calls
      0  db block gets
    1457  consistent gets
    151  physical reads
      0  redo size
  988947  bytes sent via SQL*Net to client
    7657  bytes received via SQL*Net from client
    660  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
    9880  rows processed
  ----注意這裡的cost已經降到了162,效能提升還是非常明顯。

 五、小結

    通過以上說明和測試,可以看到clustering factor也是索引健康的一個重要判斷的標準。其值越低越好。它會影響CBO選擇正確的執行計畫。但是注意一點,clustering factor總是趨勢與不斷惡化的。

相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.