Data Sampling using ORACLE

Last Update: 2013-11-24 Source: Internet

Author: User


When Oracle data is sampled for analysis and the table holds a very large number of records, analyzing every record is neither practical nor necessary. The common sampling methods are as follows.

1. Simple random sampling draws units at random from the entire population, so every unit has the same chance of being selected. Advantage: it is simple to carry out, and the mean, the proportion and their standard errors are simple to calculate. Disadvantage: when the population is large, numbering every unit one by one is difficult.

2. Systematic sampling, also known as mechanical sampling or equal-interval sampling, arranges the population units by some sequence number and divides them into n equal parts. A unit, say the k-th, is chosen at random from the first part, and then one unit is taken from each subsequent part at the same spacing to form the sample. Advantage: easy to understand and easy to use. Disadvantage: if the population has a periodic pattern or a steadily increasing or decreasing trend, the sample is prone to bias.

3. Cluster sampling divides the population into a number of groups (clusters), randomly selects some of the clusters, and includes every unit of the selected clusters in the sample. Advantage: easy to organize and inexpensive. Disadvantage: the sampling error is larger than that of simple random sampling.

4. Stratified sampling divides the population into types or layers (strata) based on its attributes and features, and then randomly selects sample units within each stratum to form the sample. The sample size can be allocated to the strata in two ways: proportional allocation and optimal allocation (is oversampling a form of optimal allocation?). Feature: because units are grouped by class, the units within a stratum share more common characteristics, which makes it easier to draw a representative sample.
Stratified sampling suits situations where the population is complex, the categories differ greatly (for example, risk versus non-risk samples of financial customers), and there are many categories. Advantage: the sample is representative and the sampling error is reduced.

In our case, we needed to draw a random sample from the full set of users. Downloading the data to a local machine and sampling it in SAS was not feasible, and sampling directly over a SAS online connection was not possible either. The solution was to submit the sampling to the database server and then connect from the local machine to analyze the result.

Oracle offers the following ways to read random data. To view N random records:

SELECT * FROM (SELECT * FROM TB_PHONE_NO ORDER BY SYS_GUID()) WHERE ROWNUM < 10;
SELECT * FROM (SELECT * FROM chifan ORDER BY dbms_random.random) WHERE ROWNUM <= 5;
SELECT * FROM (SELECT * FROM a SAMPLE (0.01)) WHERE ROWNUM <= 1;

Note that each run returns different rows. SAMPLE draws a random sample, and the value in parentheses is the sample percentage.

Two ways to retrieve random rows in Oracle:

1. Fast random retrieval (recommended):
SELECT * FROM MEMBER SAMPLE (1) WHERE ROWNUM <= 10;

2. Random retrieval, slow:
SELECT * FROM (SELECT * FROM MEMBER ORDER BY dbms_random.value) WHERE ROWNUM <= 10;

========= Original article =========

Recently, while building a system, I needed to select records at random. I searched online, found a lot of information, and came across several different methods, each with its own trade-offs. All of them are Oracle-based.
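The four general designs described at the start of this article (simple random, systematic, cluster and stratified sampling) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article. The population is a hypothetical list of 1,000 user records. The risk/non-risk segment (used as the stratum) and the branch number (used as the cluster) are assumptions. The helpers `make_population` and `group_by` are hypothetical, and the stratified function implements proportional allocation only.

```python
import random
from collections import defaultdict


def make_population(n=1000, seed=0):
    """Hypothetical user records: an id, a risk segment (stratum) and a branch (cluster)."""
    rng = random.Random(seed)
    return [
        {
            "id": uid,
            "segment": "risk" if rng.random() < 0.1 else "non-risk",
            "branch": uid % 20,  # assumed cluster key that cycles with period 20
        }
        for uid in range(n)
    ]


def group_by(pop, key):
    """Group units by the value of one field (used for clusters and strata)."""
    groups = defaultdict(list)
    for unit in pop:
        groups[unit[key]].append(unit)
    return groups


def simple_random_sample(pop, n, rng):
    """Every unit has the same chance of selection; drawn without replacement."""
    return rng.sample(pop, n)


def systematic_sample(pop, n, rng):
    """Split the ordered population into n parts of length k, pick a random
    start within the first part, then take every k-th unit after it."""
    k = len(pop) // n
    start = rng.randrange(k)
    return [pop[start + i * k] for i in range(n)]


def cluster_sample(pop, key, n_clusters, rng):
    """Choose whole clusters at random and keep every unit in each chosen cluster."""
    clusters = group_by(pop, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def stratified_sample(pop, key, n, rng):
    """Proportional allocation: each stratum contributes about n * (stratum size / N)
    units, drawn at random within that stratum. Rounding can make the total
    differ slightly from n."""
    strata = group_by(pop, key)
    sample = []
    for name in sorted(strata):
        units = strata[name]
        n_h = round(n * len(units) / len(pop))
        sample.extend(rng.sample(units, n_h))
    return sample


rng = random.Random(42)
population = make_population()

srs = simple_random_sample(population, 50, rng)
sys_s = systematic_sample(population, 50, rng)
clu = cluster_sample(population, "branch", 3, rng)  # each branch holds 50 units
strat = stratified_sample(population, "segment", 50, rng)

print(len(srs), len(sys_s), len(clu))  # 50 50 150
print({u["branch"] for u in sys_s})  # one branch only: interval k = 20 matches the branch period
print(sorted({u["segment"] for u in strat}))  # ['non-risk', 'risk']
```

The hypothetical branch number cycles with a period of 20, and the systematic interval here is also 20 (1,000 / 50). As a result, every unit in the systematic sample comes from the same branch. This is the periodic-bias problem noted for systematic sampling.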
The first method randomly selects 6 records:

SELECT * FROM (SELECT * FROM tablename ORDER BY dbms_random.value) WHERE ROWNUM < 7;

As I understand it, this method sorts every row in the table by a random number and then takes the first 6 rows of the sorted result. In practice I found it somewhat slow when the table holds many records. I tested it on 7,000 records; with tens of thousands or 100,000 records it would likely be slower still.

The second method uses Oracle's SAMPLE or SAMPLE BLOCK clause:

SELECT * FROM tablename SAMPLE (50) WHERE ROWNUM < 6;

Here is a brief introduction to SAMPLE. Oracle's basic data access methods include:

1. Full table scan
2. Sample table scan

Full table scan: returns all records in the table. Oracle reads every row of the table and checks whether it satisfies the WHERE condition. Oracle reads the table's data blocks sequentially, so a full table scan can benefit from multiblock reads, and each data block is read only once.

Sample table scan: returns a random sample of the rows in the table. This access method requires the SAMPLE or SAMPLE BLOCK option in the FROM clause.

SAMPLE option: with row sampling, Oracle reads roughly the specified percentage of rows from the table and checks whether they satisfy the WHERE clause before returning them.

SAMPLE BLOCK option: Oracle reads roughly the specified percentage of data blocks and returns the rows in them that satisfy the WHERE condition.

sample_percent: a number giving the percentage of the table's rows (or blocks) to include in the sample. The value must be at least 0.000001 and less than 100.

Points to note:

1. SAMPLE works only on a single table; it cannot be used with table joins or remote tables.
2. SAMPLE causes the SQL statement to use the cost-based optimizer (CBO) automatically.

P.S. Although this approach returns random rows, they still come back in sequential order. I don't know whether that is inherent to Oracle's mechanism; as a newcomer, I haven't figured out the reason yet. Still, it meets the need for random selection, so I'll leave it at that for now.
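The P.S. and the earlier "fast (recommended)" tip can be examined with a small simulation. The sketch below is not Oracle code, and every name in it is hypothetical. It models a table as a list of data blocks and makes three assumptions. First, SAMPLE keeps each row independently with probability sample_percent/100. Second, SAMPLE BLOCK does the same per block. Third, both return the surviving rows in scan order. The first two assumptions match Oracle's documented description of sample_percent as a selection probability.

Under these assumptions, the sequential output order follows directly from the scan. It also follows that ROWNUM <= 10 on a 1% sample returns the first ten sampled rows, which tend to lie near the start of the table rather than being spread across it.

```python
import random
from itertools import islice

# Simulation only. Hypothetical table: 100 data blocks of 50 rows each. Each row's
# value is its position in storage order, so "ascending" means "returned in scan order".
ROWS_PER_BLOCK = 50
table = [list(range(b * ROWS_PER_BLOCK, (b + 1) * ROWS_PER_BLOCK)) for b in range(100)]


def full_scan(tbl):
    """Read every block once, in storage order (a full table scan)."""
    for block in tbl:
        yield from block


def sample_rows(tbl, percent, rng):
    """Assumed model of SAMPLE (percent): keep each row with probability percent/100."""
    p = percent / 100
    return (row for row in full_scan(tbl) if rng.random() < p)


def sample_blocks(tbl, percent, rng):
    """Assumed model of SAMPLE BLOCK (percent): keep each whole block with probability percent/100."""
    p = percent / 100
    for block in tbl:
        if rng.random() < p:
            yield from block


def rownum_le(rows, n):
    """Models WHERE ROWNUM <= n: keep the first n rows produced, then stop."""
    return list(islice(rows, n))


def random_order(rows, rng):
    """Models ORDER BY dbms_random.value: give every row a random key and sort."""
    return sorted(rows, key=lambda _row: rng.random())


rng = random.Random(7)

# Fast: SAMPLE (1) ... WHERE ROWNUM <= 10
fast = rownum_le(sample_rows(table, 1, rng), 10)
# Clumped: SAMPLE BLOCK (10) ... WHERE ROWNUM <= 10
clumped = rownum_le(sample_blocks(table, 10, rng), 10)
# Slow but uniform: ORDER BY dbms_random.value ... WHERE ROWNUM <= 10 (reads and sorts every row)
uniform = rownum_le(random_order(full_scan(table), rng), 10)
# Workaround: sample first, then randomly order only the small sample
mixed = rownum_le(random_order(sample_rows(table, 1, rng), rng), 10)

print(fast)  # ascending, and tends to come from the first part of the table
print(clumped)  # ten consecutive rows from a single block
print(uniform)  # any 10 rows equally likely, in random order
print(mixed)  # rows from a ~1% sample of the whole table, in random order
```

If the same behavior holds on a real system, one workaround is to randomly order only the small sample. This is a suggestion, not something from the original article: SELECT * FROM (SELECT * FROM MEMBER SAMPLE (1) ORDER BY dbms_random.value) WHERE ROWNUM <= 10; It sorts only about 1% of the rows, and it no longer favors the rows stored first.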

This article is an English version of an article which was originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.