[R] Data import: the read.table function in detail, and how to read irregular data (fill = TRUE)

Source: Internet
Author: User

The function read.table is the most convenient way to read in a rectangular grid of data. Because there are many possible situations, several other functions are provided that call read.table but change some of its default arguments.

Note that read.table is an inefficient way to read very large numeric matrices: see the scan function.
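
As a minimal sketch of the scan alternative mentioned above: the temporary file and the 3 x 4 matrix below are made up for illustration. scan returns the numbers as one flat vector in file order, so the matrix is rebuilt with byrow = TRUE.

```r
# Write a small numeric matrix to a temporary file (3 rows, 4 columns).
# The file and its contents are hypothetical, for illustration only.
tmp <- tempfile(fileext = ".dat")
m <- matrix(as.numeric(1:12), nrow = 3, byrow = TRUE)
write(t(m), file = tmp, ncolumns = 4)

# scan() returns a flat numeric vector in file (row-by-row) order;
# byrow = TRUE restores the original row layout.
x <- matrix(scan(tmp, what = numeric(), quiet = TRUE), ncol = 4, byrow = TRUE)
unlink(tmp)
```

For a real file, the number of columns has to be known in advance, because scan does not keep track of line boundaries when reading plain numbers.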

Some issues that need to be considered are:

  1. Encoding Problems

    If the file contains non-ASCII character fields, make sure it is read in the correct encoding. This is mainly an issue when reading Latin-1 files in a UTF-8 locale. In that case, you can use something like:

              read.table(file("file.dat", encoding="latin1"))     

    Note that this works in any locale that can represent Latin-1 strings.

  2. Header line

    We recommend that you set the header argument explicitly. By convention, the header line has entries only for the columns and not for the row labels, so it has one field fewer than the remaining lines. (If R sees this, it sets header = TRUE itself.) If the file also has a header field for the row labels (possibly empty), read it with something like

              read.table("file.dat", header = TRUE, row.names = 1)     

    Column names can also be set explicitly with col.names. Explicitly set names override the names in the header line (if any).

  3. Separator

    Looking at the file will usually tell you which field separator it uses. For white-space-separated files, however, you can choose between the default sep = "" (which treats any white space, that is spaces, tabs or line breaks, as a separator), sep = " " and sep = "\t". Note that the choice of separator affects how quoted strings are read.

    If you have a tab-delimited file that contains empty fields, be sure to use sep = "\t".

  4. Quoting

    By default, a string can be quoted with either " or ', and in both cases all the characters up to the matching quote are taken as part of the string. The set of valid quoting characters (which may be empty) is controlled by the quote argument. For sep = "\n" the default is changed to quote = "".

    If no separator is set, a quote inside a quoted string can be escaped C-style, that is, by putting a backslash \ immediately before it.

    If a separator is set, a quote inside a quoted string is escaped by doubling it, as is conventional in spreadsheets. For example

              'One string isn''t two',"one more"     

    can be read by the following command

              read.table("testfile", sep = ",")     

    This does not work with the default separator.

  5. Missing values

    By default, the file is assumed to use the string NA to represent missing values. This can be changed with the na.strings argument, a character vector of one or more representations of missing values.

    Empty fields in numeric columns are also treated as missing values.

    In numeric columns, the values NaN, Inf and -Inf are accepted.

  6. Rows with trailing empty fields omitted

    Files exported from a spreadsheet often have all trailing empty fields (and their separators) omitted. To read such files, set fill = TRUE.

  7. White space in character fields

    If a separator is set, white space at the start and end of a character field is treated as part of the field. To strip it, use the argument strip.white = TRUE.

  8. Blank lines

    By default, read.table ignores blank lines. This can be changed by setting blank.lines.skip = FALSE, which is only useful together with fill = TRUE, for example to let a blank line indicate a missing case in a regular layout.

  9. Variable types

    Unless you take special action, read.table chooses a suitable type for each variable in the data frame. It tries logical, integer, numeric and complex in turn, moving on whenever some non-missing entry cannot be converted. If all of these fail, the variable is converted to a factor (from R 4.0.0 the default is to leave it as a character vector instead).

    The arguments colClasses and as.is give more control. Setting as.is = TRUE suppresses the conversion of character vectors to factors (and only that). colClasses sets the required type for each column of the input.

    Note that colClasses and as.is are specified per column, not per variable, so they also apply to the row label column (if any).

  10. Comments

    By default, read.table uses # as the comment character. When it is encountered (except inside a quoted string), the rest of the line is ignored. Lines containing only white space and a comment are treated as blank lines.

    If you know the data file contains no comments, it is safer (and possibly faster) to use comment.char = "".

  11. Escapes

    Many operating systems conventionally use the backslash as an escape character in text files, but Windows is an exception (it uses backslashes in path names). In R, it is optional whether this convention is applied to data files.

    Both read.table and scan have a logical argument allowEscapes. From R 2.2.0 it defaults to FALSE, and the backslash is then interpreted only as escaping quotes (in the circumstances described above). If it is set to TRUE, C-style escapes are interpreted, namely the control characters \a, \b, \f, \n, \r, \t, \v and octal and hexadecimal representations such as \040 and \0x2A. Any other escaped character is treated as itself, including the backslash.
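
Several of the points in the list above can be combined into one small sketch of reading irregular data. The file contents are invented for illustration: the rows have unequal numbers of fields, "-" stands for a missing value, one name is padded with spaces, and one quoted string contains a doubled quote.

```r
# Hypothetical irregular data, written to a temporary file.
lines <- c(
  "id,name,score,note",
  "1, Ann ,3.5,'isn''t late'",
  "2,Bob,-",
  "3,Cy"
)
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

d <- read.table(tmp, header = TRUE, sep = ",",
                fill = TRUE,                # pad short rows with empty fields
                na.strings = c("NA", "-"),  # also treat "-" as missing
                strip.white = TRUE,         # drop spaces around unquoted fields
                stringsAsFactors = FALSE)   # keep strings as character vectors
unlink(tmp)

str(d)
```

Without fill = TRUE this call would be expected to stop with an error of the "did not have 4 elements" kind, because the last two data rows are shorter than the header.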

The convenience functions read.csv and read.delim call read.table with arguments set to match CSV and tab-delimited files exported from spreadsheets in English-speaking locales. The variants read.csv2 and read.delim2 are intended for locales where the comma is used as the decimal point.
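
The difference between the two families can be sketched with two tiny made-up files holding the same data. read.csv expects a comma separator and a period as decimal point, while read.csv2 expects a semicolon separator and a comma as decimal point.

```r
# English-locale style: comma separator, period as decimal point.
csv_en <- tempfile(fileext = ".csv")
writeLines(c("item,price", "apple,1.5", "pear,2.25"), csv_en)
en <- read.csv(csv_en)

# Style used where the comma is the decimal point: semicolon separator.
csv_eu <- tempfile(fileext = ".csv")
writeLines(c("item;price", "apple;1,5", "pear;2,25"), csv_eu)
eu <- read.csv2(csv_eu)

unlink(c(csv_en, csv_eu))

# Both price columns should be numeric and hold the same values.
all.equal(en$price, eu$price)
```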

If the options to read.table are set incorrectly, the error message usually takes the following form

     Error in scan(file = file, what = what, sep = sep, :
             line 1 did not have 5 elements

Or

     Error in read.table("files.dat", header = TRUE) :
             more columns than column names

This information may be enough to locate the problem, but the helper function count.fields can be used to investigate further.
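
A minimal sketch of that kind of investigation, using an invented white-space-separated file: count.fields reports the number of fields on each line, which makes the offending lines easy to find.

```r
# Hypothetical file in which lines 3 and 4 do not match the header.
tmp <- tempfile(fileext = ".dat")
writeLines(c("a b c", "1 2 3", "4 5", "6 7 8 9"), tmp)

# Number of fields per line; pass the same sep (and quote, comment.char)
# that you intend to pass to read.table().
n <- count.fields(tmp, sep = "")

# Line numbers whose field count differs from that of the first line.
bad <- which(n != n[1])
unlink(tmp)
```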

Efficiency matters when reading large data grids. It helps to set comment.char = "", to give colClasses for each column, and to give nrows, the number of rows to be read (a mild overestimate is better than not setting it at all).
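
These settings might be combined as in the sketch below. The generated file, its three columns and the row count of 1000 are assumptions made for the example, not part of the original text.

```r
# Generate a hypothetical data file with 1000 rows and three columns.
set.seed(1)
tmp <- tempfile(fileext = ".dat")
write.table(data.frame(id = 1:1000,
                       x  = rnorm(1000),
                       g  = sample(letters, 1000, replace = TRUE)),
            tmp, quote = FALSE, row.names = FALSE)

big <- read.table(tmp, header = TRUE,
                  colClasses = c("integer", "numeric", "character"),
                  nrows = 1100,        # mild overestimate of the row count
                  comment.char = "")   # no comment scanning needed
unlink(tmp)
```

Giving colClasses spares read.table the type-guessing step described in point 9, which is where much of the time and memory tends to go on large files.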

 
