R: Ways to import data in other formats

Source: Internet
Author: User
Tags: documentation


► Import XML Data

A growing amount of data is encoded in XML format. There are several packages for working with XML files in R. The XML package written by Duncan Temple Lang allows users to read, write, and manipulate XML files.

Readers interested in using R to access XML documents can refer to www.omegahat.org/RSXML, where you can find several excellent pieces of documentation for the package.
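As a minimal sketch of reading XML with the XML package, the following parses a small document held in a string and collects two fields into a data frame. The document content and the names xml_text, person_names, and people are made up for illustration, and the XML package is assumed to be installed.

```r
library(XML)

# A tiny XML document held in a string (hypothetical content)
xml_text <- '<?xml version="1.0"?>
<people>
  <person><name>Ann</name><age>34</age></person>
  <person><name>Bob</name><age>51</age></person>
</people>'

# Parse the string; asText = TRUE says the input is XML text, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# Use XPath to pull out the text of every <name> and <age> node
person_names <- xpathSApply(doc, "//person/name", xmlValue)
person_ages <- as.numeric(xpathSApply(doc, "//person/age", xmlValue))

people <- data.frame(name = person_names, age = person_ages,
                     stringsAsFactors = FALSE)
print(people)
```

To read from a file instead, the file path would be passed to xmlParse() without asText.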

► Fetching data from a Web page

In Web data fetching (web scraping), the user extracts information embedded in a Web page on the Internet and saves it as a data structure in R for further analysis. One way to do this is to use the function readLines() to download the Web page, and then use the grep() and gsub() functions to process it.

For a complex Web page, you can use the RCurl package and the XML package to extract the information that you want.
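The readLines(), grep(), and gsub() approach might look roughly like the sketch below. To keep it runnable without a network connection, the page is a made-up HTML snippet read through a text connection; in practice readLines() would be given the URL of the page instead.

```r
# Hypothetical page content; with a real page this would be
# page <- readLines("<URL of the page>")
html <- c(
  "<html><body>",
  "<h1>Prices</h1>",
  "<td class=\"price\">12.50</td>",
  "<td class=\"name\">Widget</td>",
  "<td class=\"price\">7.25</td>",
  "</body></html>"
)
con <- textConnection(html)
page <- readLines(con)
close(con)

# grep() keeps only the lines that contain a price cell
price_lines <- grep("class=\"price\"", page, value = TRUE)

# gsub() removes the HTML tags, leaving only the text between them
prices <- as.numeric(gsub("<[^>]+>", "", price_lines))
print(prices)
```

This line-by-line pattern matching is only practical when the page layout is simple and regular.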
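For pages with nested markup, parsing the HTML into a tree and querying it with XPath is usually more robust than pattern matching. The sketch below uses htmlParse() from the XML package on a made-up HTML string; the table id and variable names are hypothetical, and with a live page the text could first be downloaded, for example with getURL() from RCurl.

```r
library(XML)

# Hypothetical page content standing in for a downloaded Web page
html_text <- '<html><body>
<table id="stock">
  <tr><th>item</th><th>qty</th></tr>
  <tr><td>bolt</td><td>40</td></tr>
  <tr><td>nut</td><td>75</td></tr>
</table>
</body></html>'

# Parse the HTML text into a document tree
doc <- htmlParse(html_text, asText = TRUE)

# Select the first and second cell of each data row in the "stock" table
items <- xpathSApply(doc, "//table[@id='stock']//td[1]", xmlValue)
qty <- as.numeric(xpathSApply(doc, "//table[@id='stock']//td[2]", xmlValue))

stock <- data.frame(item = items, qty = qty, stringsAsFactors = FALSE)
print(stock)
```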

► Import SPSS Data

An SPSS dataset can be imported into R through the function read.spss() in the foreign package, or you can use the spss.get() function in the Hmisc package. The function spss.get() is a wrapper around read.spss() that automatically sets many of the latter's parameters for you, making the entire conversion process simpler and more consistent.

First, download and install the Hmisc package (the foreign package is installed by default):

> install.packages("Hmisc")

Then, use the following code to import:

> library(Hmisc)

> MyData <- spss.get("D:/spss/myspss.sav", use.value.labels=TRUE)

In this code, myspss.sav is the SPSS data file to be imported, and use.value.labels=TRUE tells the function to convert variables that carry value labels into R factors with the same levels. MyData is the resulting R data frame.
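For comparison, here is a sketch that calls read.spss() from the foreign package directly. It assumes that the sample file electric.sav is shipped in the files directory of the foreign package (it is used in that package's own examples); the names sav_path and spss_data are illustrative.

```r
library(foreign)

# Locate a sample SPSS file assumed to ship with the foreign package
sav_path <- system.file("files", "electric.sav", package = "foreign")
if (sav_path == "") stop("sample file electric.sav not found")

# to.data.frame = TRUE returns a data frame rather than a list;
# use.value.labels = TRUE turns labelled variables into factors
spss_data <- read.spss(sav_path, use.value.labels = TRUE,
                       to.data.frame = TRUE)

# Inspect the structure of the first few columns
str(spss_data[, 1:3])
```

Unlike spss.get(), read.spss() returns a list by default, which is why to.data.frame is set explicitly here.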

► Import SAS Data

R has several functions for importing SAS datasets, including read.ssd() in the foreign package and sas.get() in the Hmisc package. Unfortunately, if you are using a newer version of SAS (SAS 9.1 or later), you will most likely find that these functions do not work properly, because R has not yet caught up with SAS's changes to the file structure. I personally recommend two solutions.

You can use PROC EXPORT in SAS to save the SAS dataset as a comma-delimited text file, and then read the exported file into R using the method for importing data from a delimited text file.

Alternatively, a commercial program named Stat/Transfer can save SAS datasets (including any known variable formats) as R data frames intact.
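The R side of the first solution can be sketched as follows. Because no SAS export is available here, the code first writes a small stand-in CSV file to a temporary location; the column names and values are made up, and with a real export csv_path would simply point at the file written by PROC EXPORT.

```r
# Write a stand-in for the exported file (hypothetical columns and values)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,group,score",
  "1,A,3.5",
  "2,B,4.0",
  "3,A,"
), csv_path)

# Read the comma-delimited file; the first row holds the variable names
sas_data <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# The blank numeric field in the last row is read as NA
print(sas_data)
```

If the export marks missing values with some other symbol, the na.strings argument of read.csv() can be set accordingly.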

► Import Stata Data

Importing Stata data into R is very simple. The commands are as follows:

> library(foreign)

> MyData <- read.dta("Mydata.dta")

Here, Mydata.dta is the Stata dataset, and MyData is the returned R data frame.
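A self-contained way to try this out is to write a small data frame to a temporary .dta file with write.dta() and read it back. The data and the names df, dta_path, and stata_data are made up for illustration. Note that read.dta() only understands files from older Stata versions (up to version 12), so files saved by newer releases may need a different package.

```r
library(foreign)

# A small example data frame (hypothetical values)
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write it out in Stata format to a temporary file
dta_path <- tempfile(fileext = ".dta")
write.dta(df, dta_path)

# Read the Stata file back into an R data frame
stata_data <- read.dta(dta_path)
print(stata_data)
```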

► Import netCDF Data

netCDF (Network Common Data Form) is an open-source software library led by the Unidata program. It defines a machine-independent data format that can be used to create and distribute array-oriented scientific data. The netCDF format is typically used to store geophysical data. The ncdf package and the ncdf4 package provide a high-level R interface to netCDF files.

The ncdf package supports data files created with the Unidata netCDF library (version 3 or earlier) and can be used on Windows, Mac OS X, and Linux. The ncdf4 package supports netCDF version 4 or earlier, but is not yet available on Windows.

> library(ncdf)

> nc <- open.ncdf("mynetCDFfile")

> myarray <- get.var.ncdf(nc, "myvar")

In this case, all the data for the variable myvar contained in the netCDF file mynetCDFfile is read and saved to an R array named myarray. Note that the ncdf package and the ncdf4 package have recently been significantly upgraded and may work differently from older versions. In addition, the function names in the two packages differ: the ncdf4 counterparts of the calls above are nc_open() and ncvar_get(). Please read the online documentation for more information.
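For readers using the ncdf4 package, a rough equivalent is sketched below. So that it can run without an existing data file, it first creates a small netCDF file in a temporary location; the dimension, units, and values are made up, and the ncdf4 package is assumed to be installed.

```r
library(ncdf4)

# Create a tiny netCDF file to read from (hypothetical contents)
nc_path <- tempfile(fileext = ".nc")
x_dim <- ncdim_def("x", units = "index", vals = 1:4)
var_def <- ncvar_def("myvar", units = "unitless", dim = x_dim)
nc_out <- nc_create(nc_path, var_def)
ncvar_put(nc_out, var_def, c(1.5, 2.5, 3.5, 4.5))
nc_close(nc_out)

# Open the file, read all data for the variable, then close the file
nc <- nc_open(nc_path)
myarray <- ncvar_get(nc, "myvar")
nc_close(nc)

print(myarray)
```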

► Import HDF5 Data

HDF5 (Hierarchical Data Format) is a suite of software technologies for managing extremely large and structurally complex datasets. The hdf5 package can write R objects to a file in a format that can be read by software that understands HDF5. These files can then be read back into R. The package is experimental and assumes that the user has installed the HDF5 library. Currently, R has very limited support for the HDF5 format.

