R Language Programming

Last Update:2015-07-31 Source: Internet

Author: User

Tags sprintf

Data

The zip file containing the data can be downloaded here:

specdata.zip [2.4MB]

The zip file contains 332 comma-separated-value (CSV) files containing pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor, and the ID number for each monitor is contained in the file name. For example, data for monitor 200 is contained in the file "200.csv". Each file contains three variables:

Date: the date of the observation in YYYY-MM-DD format (year-month-day)
sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter)
nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter)
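A file with this layout can be inspected with read.csv before writing any of the functions. The following is a minimal sketch that assumes nothing about the real data: it writes a small made-up file (the directory name, file contents, and values are illustrative only) and counts the missing values per column.

```r
# Build a tiny stand-in for one monitor file; all values are made up.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo <- data.frame(
    Date    = c("2003-01-01", "2003-01-02", "2003-01-03"),
    sulfate = c(NA, 3.2, 4.1),
    nitrate = c(0.5, NA, 0.9)
)
write.csv(demo, file.path(demo_dir, "001.csv"), row.names = FALSE)

# Read it back the same way a real monitor file would be read.
dat <- read.csv(file.path(demo_dir, "001.csv"))
str(dat)                            # column names and types

# Missing measurements are read as NA; count them per column.
na_counts <- colSums(is.na(dat))
print(na_counts)
```

Running the same two calls (read.csv, then colSums(is.na(...))) on a real monitor file gives a quick picture of how much of each pollutant is missing.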

For this programming assignment you will need to unzip this file and create the directory 'specdata'. Once you have unzipped the zip file, do not make any modifications to the files in the 'specdata' directory. In each file you will notice that there are many days where either sulfate or nitrate (or both) are missing (coded as NA). This is common with air pollution monitoring data in the United States.

Part 1

Write a function named 'pollutantmean' that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. The function 'pollutantmean' takes three arguments: 'directory', 'pollutant', and 'id'. Given a vector of monitor ID numbers, 'pollutantmean' reads those monitors' particulate matter data from the directory specified in the 'directory' argument and returns the mean of the pollutant across all of the monitors, ignoring any missing values coded as NA. A prototype of the function is as follows:

pollutantmean <- function(directory, pollutant, id = 1:332) {
        ## 'directory' is a character vector of length 1 indicating
        ## the location of the CSV files

        ## 'pollutant' is a character vector of length 1 indicating
        ## the name of the pollutant for which we will calculate the
        ## mean; either "sulfate" or "nitrate".

        ## 'id' is an integer vector indicating the monitor ID numbers
        ## to be used

        ## Return the mean of the pollutant across all monitors listed
        ## in the 'id' vector (ignoring NA values)
        ## NOTE: Do not round the result!
}

You can see some example output from this function. The function that you write should be able to match this output. Please save your code to a file named pollutantmean.R.

pollutantmean <- function(directory, pollutant, id = 1:332) {
    tempsum <- 0    # running sum of non-missing values
    templen <- 0    # running count of non-missing values
    for (i in id) {
        fid <- sprintf("%03d", i)    # zero-pad the ID: 1 -> "001"
        filename <- paste(directory, "/", fid, ".csv", sep = "")
        dat <- read.csv(filename)
        src <- dat[[pollutant]]
        src <- na.omit(src)          # omit NA
        tempsum <- tempsum + sum(src)
        templen <- templen + length(src)
    }
    # NaN when there are no non-missing values, as mean(numeric(0)) would give
    if (templen > 0) tempsum / templen else NaN
}
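The file names in the loop above are built by zero-padding each monitor ID with sprintf. Because sprintf is vectorized, the same idea can be applied to all IDs at once. The following is a sketch of an alternative, not the author's method: the helper names monitor_files and pollutantmean_pooled are hypothetical, and the demo files contain made-up values.

```r
# Hypothetical helper: build the CSV paths for a vector of monitor IDs.
monitor_files <- function(directory, id) {
    file.path(directory, sprintf("%03d.csv", as.integer(id)))
}

# Hypothetical alternative to the loop: pool all values, then average once.
pollutantmean_pooled <- function(directory, pollutant, id = 1:332) {
    values <- unlist(lapply(monitor_files(directory, id), function(f) {
        read.csv(f)[[pollutant]]
    }))
    mean(values, na.rm = TRUE)
}

# Demo on two made-up monitor files in a temporary directory.
demo_dir <- file.path(tempdir(), "pollutantmean_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(
    data.frame(Date = c("2003-01-01", "2003-01-02"),
               sulfate = c(1, NA), nitrate = c(NA, 2)),
    monitor_files(demo_dir, 1), row.names = FALSE
)
write.csv(
    data.frame(Date = c("2003-01-01", "2003-01-02"),
               sulfate = c(3, 5), nitrate = c(4, NA)),
    monitor_files(demo_dir, 2), row.names = FALSE
)

padded <- basename(monitor_files("specdata", c(1, 23, 200)))
print(padded)

# Pooled mean of the non-missing sulfate values 1, 3 and 5.
result <- pollutantmean_pooled(demo_dir, "sulfate", 1:2)
print(result)
```

Pooling the values before averaging matters: averaging the per-monitor means instead would weight monitors with few observations too heavily.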

Part 2

Write a function that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases. A prototype of this function follows:

complete <- function(directory, id = 1:332) {
        ## 'directory' is a character vector of length 1 indicating
        ## the location of the CSV files

        ## 'id' is an integer vector indicating the monitor ID numbers
        ## to be used

        ## Return a data frame of the form:
        ## id nobs
        ## 1  117
        ## 2  1041
        ## ...
        ## where 'id' is the monitor ID number and 'nobs' is the
        ## number of complete cases
}

You can see some example output from this function. The function that you write should be able to match this output. Please save your code to a file named complete.R. To run the submit script for this part, make sure your working directory has the file complete.R in it.

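complete <- function(directory, id = 1:332) {
    nobs <- integer(0)
    for (i in id) {
        fid <- sprintf("%03d", i)    # zero-pad the ID: 1 -> "001"
        filename <- paste(directory, "/", fid, ".csv", sep = "")
        dat <- read.csv(filename)
        src <- na.omit(dat)          # omit NA
        nobs <- c(nobs, nrow(src))   # rows with no missing values
    }
    # Return the data frame itself rather than the (invisible) value of an assignment
    data.frame(id = id, nobs = nobs)
}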
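A "complete case" here is a row with no missing value in any column. As a small sketch on a made-up data frame (the values are illustrative only), base R's complete.cases gives the same count as taking the number of rows left after na.omit:

```r
# Made-up observations: rows 1 and 2 each have one missing value.
demo <- data.frame(
    Date    = c("2003-01-01", "2003-01-02", "2003-01-03", "2003-01-04"),
    sulfate = c(NA, 3.2, 4.1, 2.0),
    nitrate = c(0.5, NA, 0.9, 1.1)
)

ok <- complete.cases(demo)          # TRUE where every column is observed
nobs <- sum(ok)                     # count of complete rows
nobs_alt <- nrow(na.omit(demo))     # same count via na.omit

print(ok)
print(c(nobs, nobs_alt))
```

Either form works inside the loop; complete.cases avoids building the filtered copy of the data frame when only the count is needed.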

Part 3

Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows:

corr <- function(directory, threshold = 0) {
        ## 'directory' is a character vector of length 1 indicating
        ## the location of the CSV files

        ## 'threshold' is a numeric vector of length 1 indicating the
        ## number of completely observed observations (on all
        ## variables) required to compute the correlation between
        ## nitrate and sulfate; the default is 0

        ## Return a numeric vector of correlations
        ## NOTE: Do not round the result!
}

For this function you will need to use the 'cor' function in R, which calculates the correlation between two vectors. Please read the help page for this function via '?cor' and make sure that you know how to use it.
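As a quick illustration of how 'cor' treats missing values, here is a sketch on made-up vectors (the numbers are illustrative only). With its default settings, cor returns NA if any value is missing, so the data must either be filtered to complete cases first or the use argument must be set.

```r
# Made-up measurements; the last sulfate value is missing.
sulfate <- c(1.0, 2.0, 3.0, 4.0, NA)
nitrate <- c(2.0, 4.1, 5.9, 8.2, 1.0)

# Default behaviour: a single NA makes the result NA.
r_default <- cor(sulfate, nitrate)

# Use only the pairs where both values are observed.
r_complete <- cor(sulfate, nitrate, use = "complete.obs")

print(r_default)
print(r_complete)
```

Calling na.omit on the data frame before cor, as an alternative to use = "complete.obs", has the same effect for two columns.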

You can see some example output from this function. The function that you write should be able to match this output. Please save your code to a file named corr.R. To run the submit script for this part, make sure your working directory has the file corr.R in it.

corr <- function(directory, threshold = 0) {
    corr.list <- numeric(0)    # stays a length-0 numeric vector if no monitor qualifies
    id <- 1:332
    for (i in id) {
        fid <- sprintf("%03d", i)    # zero-pad the ID: 1 -> "001"
        filename <- paste(directory, "/", fid, ".csv", sep = "")
        dat <- na.omit(read.csv(filename))    # keep complete cases only
        len <- nrow(dat)                      # number of complete cases
        if (len > threshold && threshold >= 0) {
            corr.re <- cor(dat$sulfate, dat$nitrate)
            corr.list <- c(corr.list, corr.re)
        }
    }
    corr.list
}
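The requirement to return a numeric vector of length 0 when no monitor meets the threshold is worth checking explicitly, because NULL and numeric(0) both have length 0 but only the latter is numeric. A short sketch of the difference, using illustrative values:

```r
empty_null <- NULL
empty_num  <- numeric(0)

# Both are empty, but only one is a numeric vector.
lengths_equal <- length(empty_null) == length(empty_num)
null_is_numeric <- is.numeric(empty_null)
num_is_numeric  <- is.numeric(empty_num)

# Growing the accumulator with c() works the same from either start.
grown <- c(empty_num, 0.5, -0.2)

print(c(lengths_equal, null_is_numeric, num_is_numeric))
print(grown)
```

Starting the accumulator as numeric(0) therefore satisfies the requirement without any special case at the end of the function.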


This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
