Use the Sample parameter of DataPump for data sampling.

Source: Internet
Author: User

Use the Sample parameter of DataPump for data sampling.

Data Pump Datapump is a convenient data logical backup and restoration tool launched by Oracle. Compared with the traditional Exp/Imp, Datapump provides many advantages in both performance and functionality. In practice, big data sampling is a frequent business requirement. For a massive data table, you can extract representative data volumes to complete the corresponding work.

In the select statement, we can use the sample statement to implement some corresponding functions. With Datapump, we can also back up data tables in the sampling mode.

1. Environment Description

We chose Oracle 11gR2 for testing. The specific version is 11.2.0.4.

SQL> select * from v $ version;

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0-64bit Production

PL/SQL Release 11.2.0.4.0-Production

CORE 11.2.0.4.0 Production

TNS for Linux: Version 11.2.0.4.0-Production

NLSRTL Version 11.2.0.4.0-Production

Create a data table SM_TEST.

SQL> create table scott. sm_test as select * from dba_objects;

Table created

SQL> select count (*) from scott. sm_test;

COUNT (*)

----------

133391

2. Test the experiment

There are few official interpretations of the Sample parameter of Expdp.

SAMPLE

Percentage of data to be exported.

Datapump can work in three modes: full database mode, Schema mode, and table mode. You can use the Sample parameter to set the three modes. The following describes how to set the sampling ratio by specifying the simplest ratio value.

[Oracle @ localhost ~] $ Expdp \ "/as sysdba \" dumpfile = TEST_1.dmp tables = scott. sm_test sample = 50

Export: Release 11.2.0.4.0-Production on Mon Nov 23 13:50:02 2015

Copyright (c) 1982,201 1, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0-64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Starting "SYS". "SYS_EXPORT_TABLE_01": "/******** as sysdba" dumpfile = TEST_1.dmp tables = scott. sm_test sample = 50

Estimate in progress using BLOCKS method...

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 8 MB

Processing object type TABLE_EXPORT/TABLE

.. Exported "SCOTT". "SM_TEST" 6.402 MB 66587 rows

Master table "SYS". "SYS_EXPORT_TABLE_01" successfully loaded/unloaded

**************************************** **************************************

Dump file set for SYS. SYS_EXPORT_TABLE_01 is:

/U01/app/admin/sicsdb/dpdump/TEST_1.dmp

Job "SYS". "SYS_EXPORT_TABLE_01" successfully completed at Mon Nov 23 13:50:11 2015 elapsed 0 00:00:07

In the table mode where the exported data table name is specified, the percentage parameter of sample is used to determine the content of a specific table. About 60 thousand of the exported data is exactly 0.13 million of the total data of 50%, which is a reasonable result.

If you run in schema mode and specify the proportional share directly, what is the result?

SQL> select count (*) from scott. emp;

COUNT (*)

----------

14

[Oracle @ localhost ~] $ Expdp \ "/as sysdba \" dumpfile = TEST_2.dmp schemas = scott sample = 50

Export: Release 11.2.0.4.0-Production on Mon Nov 23 13:51:31 2015

Copyright (c) 1982,201 1, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0-64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Starting "SYS". "SYS_EXPORT_SCHEMA_01": "/******** as sysdba" dumpfile = TEST_2.dmp schemas = scott sample = 50

Estimate in progress using BLOCKS method...

Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 8.093 MB

Processing object type SCHEMA_EXPORT/USER

Processing object type SCHEMA_EXPORT/SYSTEM_GRANT

Processing object type SCHEMA_EXPORT/ROLE_GRANT

Processing object type SCHEMA_EXPORT/DEFAULT_ROLE

Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA

Processing object type SCHEMA_EXPORT/TABLE

Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT

Processing object type SCHEMA_EXPORT/TABLE/INDEX

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS

Processing object type SCHEMA_EXPORT/VIEW

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

.. Exported "SCOTT". "SM_TEST" 6.406 MB 66611 rows

.. Exported "SCOTT". "DEPT" 5.859 KB 1 rows

.. Exported "SCOTT". "EMP" 8.359 KB 9 rows

.. Exported "SCOTT". "SALGRADE" 5.835 KB 3 rows

.. Exported "SCOTT". "BONUS" 0 KB 0 rows

Master table "SYS". "SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded

**************************************** **************************************

Dump file set for SYS. SYS_EXPORT_SCHEMA_01 is:

/U01/app/admin/sicsdb/dpdump/TEST_2.dmp

Job "SYS". "SYS_EXPORT_SCHEMA_01" successfully completed at Mon Nov 23 13:51:48 2015 elapsed 0 00:00:16

The sampling ratio takes effect within the range of all data tables in the Schema. All data tables are sampled at a rate of 50%.

Finally, if we export multiple data tables and only sample some data tables, how do we specify the sample parameter?

[Oracle @ localhost ~] $ Expdp \ "/as sysdba \" dumpfile = TEST_3.dmp schemas = scott sample = scott. sm_test: 30

Export: Release 11.2.0.4.0-Production on Mon Nov 23 13:52:50 2015

Copyright (c) 1982,201 1, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0-64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Starting "SYS". "SYS_EXPORT_SCHEMA_01": "/******** as sysdba" dumpfile = TEST_3.dmp schemas = scott sample = scott. sm_test: 30

Estimate in progress using BLOCKS method...

Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA

Total estimation using BLOCKS method: 4.987 MB

Processing object type SCHEMA_EXPORT/USER

Processing object type SCHEMA_EXPORT/SYSTEM_GRANT

Processing object type SCHEMA_EXPORT/ROLE_GRANT

Processing object type SCHEMA_EXPORT/DEFAULT_ROLE

Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA

Processing object type SCHEMA_EXPORT/TABLE

Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT

Processing object type SCHEMA_EXPORT/TABLE/INDEX

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS

Processing object type SCHEMA_EXPORT/VIEW

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS

.. Exported "SCOTT". "SM_TEST" 3.861 MB 40146 rows

.. Exported "SCOTT". "DEPT" 5.960 KB 5 rows

.. Exported "SCOTT". "EMP" 8.562 KB 14 rows

.. Exported "SCOTT". "SALGRADE" 5.859 KB 5 rows

.. Exported "SCOTT". "BONUS" 0 KB 0 rows

Master table "SYS". "SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded

**************************************** **************************************

Dump file set for SYS. SYS_EXPORT_SCHEMA_01 is:

/U01/app/admin/sicsdb/dpdump/TEST_3.dmp

Job "SYS". "SYS_EXPORT_SCHEMA_01" successfully completed at Mon Nov 23 13:53:00 2015 elapsed 0 00:00:09

In the above experiment, if you only sample a specific data table, you need to specify the data table name: sampling ratio.

3. Conclusion

According to my estimation, the sample parameter is mainly implemented by the sample statement that is passed to the select statement in expdp.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.