SAS Optimization Tips (4): Process only the necessary data: WHERE vs. IF, SELECT vs. IF-THEN/ELSE, OBS=/FIRSTOBS=, selecting the required observations when reading external data (INPUT + IF), KEEP/DROP


1: The essential difference between WHERE and IF, plus some minor differences

1.1: The WHERE statement examines what is in the input page buffer and selects observations before they are loaded into the program data vector (PDV), which saves CPU operations. (WHERE filters in the buffer first, then reads into the PDV.)

The subsetting IF statement loads all observations sequentially into the program data vector. If the condition is true, the observation is processed and written to the output page buffer. (IF reads into the PDV first, then filters.)
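The difference can be seen by running the same selection both ways. This is a minimal sketch on SASHELP.CLASS; the output data set names and the age cutoff are chosen only for illustration.

```sas
/* WHERE: each observation is tested before it enters the PDV */
data teens_where;
    set sashelp.class;
    where age >= 13;
run;

/* Subsetting IF: every observation enters the PDV and is then tested */
data teens_if;
    set sashelp.class;
    if age >= 13;
run;
```

Both steps produce the same rows. In the log, the WHERE step reports only the observations that satisfied the condition as read, while the IF step reports every observation in the input data set as read.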

1.2: IF can filter both raw data read with INPUT and SAS data sets; WHERE can only filter data from SAS data sets.

IF can execute a statement conditionally (IF ... THEN ...); WHERE cannot.

WHERE is more efficient than IF.

Wherever CONTAINS could be used in a WHERE expression, consider using LIKE.
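As an illustration of the WHERE-only operators mentioned here, the sketch below applies LIKE and CONTAINS to the name variable of SASHELP.CLASS. The patterns and output data set names are hypothetical.

```sas
/* LIKE: pattern match, % stands for any number of characters */
data names_like;
    set sashelp.class;
    where name like 'J%';        /* names that begin with J */
run;

/* CONTAINS: substring search anywhere in the value */
data names_contains;
    set sashelp.class;
    where name contains 'an';    /* names that include the string an */
run;
```

Neither operator is available in an IF statement, so both selections have to be written with WHERE (or rewritten with functions such as INDEX for IF).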

Syntax: IF condition <THEN executable-statement>;

The subsetting IF statement tells SAS which observations to include; the DELETE statement tells SAS which observations to exclude.

IF Sex = 'F'; and IF Sex = 'M' THEN DELETE; have the same effect when Sex takes only the values 'F' and 'M'.

data b;
    set sashelp.class;
    /* If the condition is true, execution continues with the statements
       after the IF and the observation is output at the end of the step.
       If it is false, SAS returns immediately to the top of the DATA step
       and the SET statement reads the next observation. */
    if _n_ le 4;
    y = 'Now';
    /*
    y = 'Now';
    if _n_ le 4;
    This order gives the same result but is less efficient, because the
    assignment to y is executed for every observation, including those
    that are then discarded.
    */
run;

Two other forms of IF:
if x=3 then y=4;                    /* a single statement to execute */
if x=3 then do; y=4; z=5; end;      /* for more than one statement, use THEN DO; ... END; */


NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.B has 4 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

The log shows that 19 observations were read, which confirms that every observation is read first and then tested against the condition one by one.

data a;
    input x y @@;
    cards;
1 1 0 2 2 3 3 4
3 4
;
run;

proc sort data=a;
    by x;
run;

data b;
    set a;
    /* where x;
       With no comparison operator, this keeps observations where x is
       neither 0 nor missing. It applies to numeric variables only. */
    where x is not missing;  /* keeps non-missing x, including 0; works for numeric and character variables */
run;

proc print data=b noobs;
run;

The most important differences between WHERE and IF

1: WHERE is not an executable statement; IF is executable.

2: WHERE has its own special operators, such as WHERE x IS MISSING; IF uses general SAS expressions.

3: WHERE can only select observations from existing SAS data sets; IF can also select from observations created by an INPUT statement. (In commercial work the source is usually an existing SAS data set.)

4: WHERE is more efficient than IF.

5: When to use IF and when to use WHERE? If the selection depends on values that must first be processed in the PDV, only IF can be used. In all other cases, use WHERE.
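Point 5 can be sketched as follows. The variable bmi and its cutoff are hypothetical; the point is only that a variable created in the PDV cannot be tested by WHERE, while an existing variable can.

```sas
data selected;
    set sashelp.class;
    where sex = 'F';                   /* existing variable: WHERE can test it */
    bmi = 703 * weight / height**2;    /* created in the PDV (pounds and inches) */
    if bmi > 18;                       /* computed variable: only IF can test it */
run;
```

Using WHERE for the part of the selection that depends only on stored variables means fewer observations reach the PDV before the computed test is applied.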

2: Choosing between SELECT and IF-THEN/ELSE

For numeric variables, SELECT statements should always be slightly more efficient (use less CPU time) than IF-THEN/ELSE statements. The performance gap between IF-THEN/ELSE and SELECT statements gradually widens as the number of conditions increases.

For character variables, IF-THEN/ELSE statements are always more efficient than SELECT statements. The performance gap between the two techniques widens quickly as the number of conditions increases.

When each of the two options works best

Use IF-THEN/ELSE statements when
   1: the data values are character values
   2: the data values are not uniformly distributed
   3: there are few conditions to check.

Use SELECT statements when
   1: you have a long series of mutually exclusive numeric conditions
   2: the data values are uniformly distributed.
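For comparison, here is the same grouping written both ways. This is only a sketch: the group labels and age bands are hypothetical, and with so few conditions any timing difference would be negligible.

```sas
data grouped;
    set sashelp.class;
    length group_if group_sel $8;

    /* IF-THEN/ELSE: conditions are checked in order until one is true */
    if age <= 12 then group_if = 'younger';
    else if age <= 14 then group_if = 'middle';
    else group_if = 'older';

    /* SELECT: a series of mutually exclusive numeric conditions */
    select (age);
        when (11, 12) group_sel = 'younger';
        when (13, 14) group_sel = 'middle';
        otherwise group_sel = 'older';
    end;
run;
```

In the IF-THEN/ELSE form, placing the most frequent condition first reduces the number of comparisons per observation, which is why it suits data that is not uniformly distributed.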

3: WHERE and OBS=/FIRSTOBS= for selecting the desired observations

Remember that OBS= and FIRSTOBS= select observations logically, not physically: the other observations are not removed from the data set.

WHERE is applied before OBS=/FIRSTOBS=.
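A short sketch of this ordering, using SASHELP.CLASS with arbitrary FIRSTOBS= and OBS= values:

```sas
data subset;
    set sashelp.class (firstobs=2 obs=4);
    where sex = 'M';
run;
```

Because WHERE is applied first, FIRSTOBS= and OBS= count only the observations that satisfy the condition. The result holds the second through fourth male students, not the males found among physical rows 2 to 4.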

4: Selecting the required observations when reading external data

Consider how external data flows through a SAS program once it is read in.

You cannot use WHERE to take a subset here, because WHERE cannot filter external data.

Instead, place a subsetting IF after INPUT to filter the observations read into the PDV, which reduces CPU time.
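One common way to combine INPUT and IF is to read only the field needed for the test, hold the record with a trailing @, and read the remaining fields only if the test passes. The sketch below uses in-stream data in place of an external file; the record layout and column positions are assumptions made for the example.

```sas
data females;
    infile datalines;
    input @1 sex $1. @;               /* read the test field and hold the record */
    if sex = 'F';                     /* other records are dropped here */
    input @3 name $8. @12 age 2.;     /* read the rest only for kept records */
    datalines;
F Alice    13
M Alfred   14
F Barbara  13
;
run;
```

For rejected records, SAS never converts the remaining fields, so the saving grows with the number of fields per record and the share of records discarded.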

The number of I/O operations is the number of transfers between the external data and the buffers. It cannot be reduced with WHERE, IF, or similar statements; it can only be reduced with options such as BUFNO= and BUFSIZE= and with the SASFILE statement.
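As a rough sketch of how these are specified, the example below sets BUFSIZE= and BUFNO= as data set options and then uses SASFILE to hold a data set in memory across several steps. The values 64k and 10 and the data set name work.big are placeholders, not tuning recommendations; suitable values depend on the host and the data.

```sas
/* Larger pages and more buffers for the data set being created */
data work.big (bufsize=64k bufno=10);
    set sashelp.class;
run;

/* SASFILE: load the data set into memory once, reuse it, then release it */
sasfile work.big load;

proc means data=work.big;
run;

proc freq data=work.big;
    tables sex;
run;

sasfile work.big close;
```

SASFILE pays off only when the same data set is read by several steps and fits in available memory.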

5: KEEP/DROP

The role of KEEP and DROP in processing data:

KEEP/DROP applied after the buffer reduces the number of variables loaded into the PDV, which reduces CPU time.

KEEP/DROP applied before the buffer reduces the number of variables read into the buffer, that is, the total size of the data read, which reduces the number of I/O operations.
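The two usual places to put KEEP= and DROP= are on the input data set and on the output data set. The following sketch shows both; the data set names and the ratio variable are hypothetical.

```sas
/* KEEP= on the input data set: only name and age are brought into the PDV */
data ages;
    set sashelp.class (keep=name age);
run;

/* DROP= on the output data set: height and weight are available during
   the step but are not written to the new data set */
data ratios (drop=height weight);
    set sashelp.class;
    ratio = weight / height;
run;
```

A variable needed for a calculation must not be dropped on input; drop it on output instead, as in the second step.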
