SAS Importing external data files

Source: Internet
Author: User

1. Located in the path: C: "Books" Learning under the text file MyData.txt, its data to open, the form is as follows:

M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
M 15 71 166

The code to read into the SAS is as follows:

Data demographics;
InFile ' C: "Books" Learning "MyData.txt";
Input Gender $ age Height Weight;
Run

The symbol $ indicates that the variable gender is a one-character variable. To check that the read-in file is correct, you can view its observations through the print process.

Title "Listing of Data set demographics";
Proc Print data=demographics;
Run

What if the data in the source file has missing values? For example, MyData.txt's data is as follows

M 50 68 155
F 60 101
M 65 72 220
F 35 65 133
M 15 71 166

When SAS reads the second line, it will be 60 for age and 101 for height. Next, the following line of M represents weight, and M is a character variable. Tinker Bell, the SAS error in log. The right way for, to. Represents the missing value. Such as

M 50 68 155
F. 60 101
M 65 72 220
F 35 65 133
M 15 71 166

2. Read the file in CSV format.

No nonsense, look at the code.

Data demographics;
InFile ' C: "Books" Learning "Mydata.csv ' DSD;
Input Gender $ age Height Weight;
Run

DSD (delimiter-sensitive data) has several functions. One is to indicate that the data is separated by commas; second, if there is a two comma in a row, it indicates that there is a missing value in the middle, and that the quotation marks can be removed if the character type variable is enclosed in quotation marks. Another form of the above code is

filename Preston ' C: "Books" Learning "Mydata.csv";
Data demographics;
InFile Preston DSD;
Input Gender $ age Height Weight;
Run

The variable demographics is the alias of the actual file name.

What if the CSV format files are not separated by commas? For example, the following file separates data with colons

m:50:68:155
F:23:60:101
m:65:72:220
f:35:65:133
m:15:71:166

When you read this file, use the following form

InFile ' file-description ' dlm= ': ' or infile ' file-description ' delimiter= ': ';

What if the data is tab-delimited? It's a bit of a hassle, but it's still under control. At this point, a hexadecimal character represents tab, for example

For ASCII files, infile ' file-description ' dlm= ' x;

For EBCDIC files, infile ' file-description ' dlm= ' x;

3. Read in fixed column--mode 1: Column input

Many of the data is stored in a fixed column (columns) in the form of a file. Read into this type of file can be entered in the form of input (column input). This method can be read into a character variable and format the number. Format numbers, not just the positive and negative forms, but also the exponential form of the data. Let's say 3.4E3 means 3.4*10^3.

Suppose you have the following form of data, saved in Bank.txt.

00110/21/1955m 1145
00211/18/2001f 18722
00305/07/1944m 123.45
00407/25/1945f-12345

There are no delimiters. What to do, see the code

Data financial;
InFile ' C: "Books" Learning "Bank.txt";
Input SUBJ $1-3
DOB $4-13
Gender $14
Balance 15-21;
Run

Mode 2. Format input (formatted inputs)

When data is stored in a non-normalized form, the format input method should be considered. As an example of the above example, the formatted input code is

Data financial;
InFile ' C: "Books" Learning "Bank.txt";
Input @1 subj.
@4 DOB Mmddyy10.
@14 Gender $.
@15 Balance 7.;
Run

The symbol @ is called the column pointer (Pointer), and @4 tells the SAS that it refers to the 4th column. There are two forms of formatting, w.d and $w. W tells the SAS that there are several columns of data to read, and D indicates where the decimal point is in its value. For example, a 3.0 read in 123,sas will be saved as 123.0, the same number will be read in 3.1, and the SAS would be saved as 12.3. If the number you want to read has a decimal, then D is ignored. If you read the 1.23,sas in 4.1, it will still be saved as 1.23. Instead, $w indicates that the W column data will be read.

When you need to read the date data, MMDDYY10. Tells the SAS to read the date in mm/dd//yyyy form, and the SAS will save the days from January 1, 1960 to this date. So, if the read-in 01/01/1960,sas will be saved as 0.

After executing the above code, the data displayed is

Obs subj DOB Gender Balance
1 001-1533 M 1145.00
2 002 15297 F 18722.00
3 003-5717 M 123.45
4 004-5273 F-12345.00

Birthdays are now displayed as days from January 1, 1960. Reformat the date if you want to display it as a normal form output

Title "Listing of Financial";
Proc Print data=financial;
Format DOB mmddyy10.
Balance dollar11.2;
Run

The results are displayed as

Listing of Financial
Obs subj DOB Gender Balance
1 001 10/21/1955 M $1,145.00
2 002 11/18/2001 F $18,722.00
3 003 05/07/1944 M $123.45
4 004 07/25/1945 F $-12,345.00

4. Using the format expression

Suppose you have a file whose data is separated by a space or a comma, and the characters are longer than 8 bytes. When read-in, the format is appended to the variable name in the input expression. The middle is separated by a colon.

For example, there is a CSV document with the following content

"001", "Christopher Mullens", 11/12/1955, "$45,200"
"002", "Michelle Kwo", 9/12/1955, "$78,123"
"003", "Roger W. McDonald", 1/1/1960, "$107,200"

Look at the code:

Data list_example;
InFile ' C: "Books" Learning "List.csv ' DSD;
Input SUBJ: $.
Name: $ $.
Dob:mmddyy10.
Salary:dollar8.;
Format DOB date9. Salary dollar8.;
Run

Another form of read data that uses Informat expressions is the explicit use of informat. The read-in code above can be considered

Data list_example;
Informat subj.
Name $ $.
DOB Mmddyy10.
Salary dollar8.;
InFile ' C: "Books" Learning "List.csv ' DSD;
Input SUBJ
Name
DOB
Salary;
Format DOB date9. Salary dollar8.;
Run

If the data in the above CSV file is separated by a space, the characters are not quoted. For example

001 Christopher Mullens 11/12/1955 $45,200
002 Michelle Kwo 9/12/1955 $78,123
003 Roger W. McDonald 1/1/1960 $107,200

Here's the problem. Spaces in the name use the SAS to cause the variable to end, resulting in an incorrect read.

The solution to the problem is to replace the colon with &: The code read in

Data list_example;
InFile ' C: "Books" Learning "List.txt";
Input SUBJ: $.
Name & $ $.
Dob:mmddyy10.
Salary:dollar8.;
Format DOB date9. Salary dollar8.;
Run

    • Top

    • 0

    • Step


This article is from "Pioneer Home" blog, please make sure to keep this source http://jackwxh.blog.51cto.com/2850597/1738548

SAS Importing external data files

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.