The function read.table is the most convenient way to read data laid out as a rectangular grid. Because there are so many possible variations, several other functions call read.table with some of its default arguments changed.
Note that read.table is an inefficient way to read a very large numeric matrix. The scan function is better suited to that task.
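As a minimal sketch of the scan alternative, the following writes a small numeric matrix to a temporary file (tempfile is used only for illustration) and reads it back as a plain double vector, which is then reshaped with matrix(). It assumes the number of columns is known in advance.

```r
# Write a 2 x 3 numeric matrix to a temporary file, one matrix row per line.
m <- matrix(c(0.5, 1, 1.5, 2, 2.5, 3), nrow = 2)
tmp <- tempfile(fileext = ".dat")
write(t(m), tmp, ncolumns = 3)  # t() so that each output line is a row of m

# scan() returns a single double vector; no data frame is built.
values <- scan(tmp, what = double(), quiet = TRUE)

# Rebuild the matrix row by row.
x <- matrix(values, ncol = 3, byrow = TRUE)
print(all.equal(x, m))
```

Skipping the data frame and the per-column type guessing is what makes scan cheaper for purely numeric input.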
Some issues that need to be considered are:
- Encoding
If the file contains non-ASCII character fields, make sure it is read in the correct encoding. This is mainly an issue when reading Latin-1 files in a UTF-8 locale, which can be done with something like
read.table(file("file.dat", encoding="latin1"))
Note that this works in any locale that can represent Latin-1 strings.
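A minimal sketch, assuming the session runs in a UTF-8 locale: it writes the Latin-1 bytes for "café" to a temporary file, then reads them back both through the file(..., encoding = "latin1") connection shown above and through read.table's fileEncoding argument, which re-encodes the input in the same way.

```r
# Latin-1 bytes for "café" plus a newline (0xE9 is é in Latin-1).
tmp <- tempfile(fileext = ".dat")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), tmp)

# Re-encode through a connection, as in the text above.
d1 <- read.table(file(tmp, encoding = "latin1"), stringsAsFactors = FALSE)

# Equivalent: let read.table open the file with a declared encoding.
d2 <- read.table(tmp, fileEncoding = "latin1", stringsAsFactors = FALSE)

print(d1$V1)
print(identical(d1$V1, d2$V1))
```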
- Header line
We recommend that you specify the header argument explicitly. By convention, the header line has entries only for the columns and none for the row labels, so it has one field fewer than the remaining lines. (If R detects this, it sets header = TRUE.) If the file has a (possibly empty) header field for the row labels, read it with something like
read.table("file.dat", header = TRUE, row.names = 1)
Column names can be set explicitly with col.names; explicit names override the header line (if present).
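A small sketch of these header conventions, using read.table's text argument so that no file is needed (the data values are made up). In the first input the header is one field short, so the first column becomes the row names. The second input names that column, so row.names = 1 is used. The third has no header at all and gets its names from col.names.

```r
# Header is one field shorter: the first column becomes the row names.
short_header <- c("x y", "a 1 2", "b 3 4")
d1 <- read.table(text = short_header, header = TRUE)

# Header has a field for the row labels: select it with row.names = 1.
full_header <- c("id x y", "a 1 2", "b 3 4")
d2 <- read.table(text = full_header, header = TRUE, row.names = 1)

# No header line: supply the names explicitly.
d3 <- read.table(text = c("1 2", "3 4"), col.names = c("p", "q"))

print(d1)
print(d2)
print(names(d3))
```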
- Separator
Normally, inspecting the file shows which field separator it uses. For white-space-separated files, however, you can choose between the default sep = "" (which treats any white space, such as spaces, tabs or newlines, as a separator), sep = " " and sep = "\t". Note that the choice of separator affects how quoted strings are read.
If a tab-delimited file contains empty fields, you must use sep = "\t".
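The following sketch (toy data) shows why sep = "\t" matters for tab-delimited input with an empty field. With the default sep = "", the two consecutive tabs act as a single separator, so the first line seems to have only two fields and reading fails.

```r
# Two tab-delimited lines; the first has an empty middle field.
tsv <- c("a\t\t3", "b\t2\t4")

# Default sep = "": any run of white space separates fields, so this fails.
res <- tryCatch(read.table(text = tsv), error = function(e) e)
print(inherits(res, "error"))

# sep = "\t": every tab is a separator, and the empty field becomes NA.
d <- read.table(text = tsv, sep = "\t")
print(d)
```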
- Quoting
By default, character strings can be quoted with either " or ', and in each case everything up to the matching quote is taken as part of the string. The set of valid quote characters (which may be empty) is controlled by the quote argument. For sep = "\n" the default is changed to quote = "".
If no separator is specified, a quote inside a quoted string can be escaped C-style by preceding it with a backslash, as in \".
If a separator is specified, a quote inside a quoted string can be escaped by doubling it, as is conventional in spreadsheets. For example
'One string isn''t two',"one more"
can be read by
read.table("testfile", sep = ",")
This does not work with the default separator.
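A sketch of both escaping conventions. It uses the example line from the text plus a made-up backslash-escaped line, and the text argument stands in for the file "testfile".

```r
# Spreadsheet convention: with sep = ",", a doubled quote inside a quoted field.
csv_line <- "'One string isn''t two',\"one more\""
d1 <- read.table(text = csv_line, sep = ",")

# Default separator: a quote inside a quoted string is escaped with a backslash.
# The R literal below holds the input line:  "say \"hi\"" 1
ws_line <- '"say \\"hi\\"" 1'
d2 <- read.table(text = ws_line)

print(as.character(d1$V1))  # expected value: One string isn't two
print(as.character(d2$V1))  # expected value: say "hi"
```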
- Missing values
By default, the file is assumed to use the string NA to represent missing values, but this can be changed with the na.strings argument, which is a vector of one or more character representations of missing values.
Empty fields in numeric columns are also treated as missing values.
In numeric columns, the values NaN, Inf and -Inf are accepted.
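A sketch with made-up data, assuming that -99 is used as an extra missing-value code. na.strings lists both codes, the empty field is read as NA, and NaN and -Inf are kept as ordinary numeric values.

```r
lines <- c("a,b", "1,-99", "NA,", "NaN,-Inf")

# Both "NA" and "-99" mark missing values; the empty field is missing too.
d <- read.table(text = lines, header = TRUE, sep = ",",
                na.strings = c("NA", "-99"))
str(d)
```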
- Unfilled lines
Files exported from a spreadsheet often omit all trailing empty fields (and their separators). To read such files, set fill = TRUE.
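A sketch with made-up data, in which the last line omits its trailing empty field. Without fill = TRUE the read fails; with it, the missing field becomes NA. (read.csv already sets fill = TRUE by default.)

```r
# The last line is missing its third field and the separator before it.
lines <- c("a,b,c", "1,2,3", "4,5")

res <- tryCatch(read.table(text = lines, header = TRUE, sep = ","),
                error = function(e) e)
print(inherits(res, "error"))

# fill = TRUE pads short lines with empty fields.
d <- read.table(text = lines, header = TRUE, sep = ",", fill = TRUE)
print(d)
```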
- White space in character fields
If a separator is specified, leading and trailing white space in character fields is treated as part of the field. To remove it, use strip.white = TRUE.
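A sketch with made-up data showing the effect of strip.white when a separator is set.

```r
lines <- c("name,score", " alice ,1", "bob  ,2")

# With sep = ",", the spaces around the names are kept.
d_raw <- read.table(text = lines, header = TRUE, sep = ",")

# strip.white = TRUE removes leading and trailing white space.
d_strip <- read.table(text = lines, header = TRUE, sep = ",",
                      strip.white = TRUE)

print(as.character(d_raw$name))
print(as.character(d_strip$name))
```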
- Blank lines
By default, read.table ignores blank lines. This can be changed by setting blank.lines.skip = FALSE, which is only useful together with fill = TRUE, for example to use blank rows to mark missing cases in a regular layout.
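A sketch with made-up data: by default the blank line is dropped, while blank.lines.skip = FALSE together with fill = TRUE keeps it as a row of missing values.

```r
lines <- c("1 2", "", "3 4")

# Default: the blank line is skipped, giving two rows.
d_skip <- read.table(text = lines)

# Keep the blank line; fill = TRUE turns it into a row of NAs.
d_keep <- read.table(text = lines, blank.lines.skip = FALSE, fill = TRUE)

print(d_skip)
print(d_keep)
```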
- Variable classes
Unless you take special action, read.table reads every column as a character vector and then tries to choose a suitable class for each variable in the data frame. It tries logical, integer, numeric and complex in turn, moving on to the next class if any entry is not missing and cannot be converted. If all of these fail, the variable is converted into a factor. (This describes the default before R 4.0.0; since R 4.0.0, stringsAsFactors defaults to FALSE and such columns stay character vectors.)
The arguments colClasses and as.is provide finer control. Specifying as.is = TRUE suppresses the conversion of character vectors to factors (and nothing else). colClasses sets the desired class for each column in the input.
Note that colClasses and as.is are specified per column, not per variable, so they also cover the column of row labels (if any).
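A sketch with made-up data. Automatic conversion turns the codes "007" and "010" into numbers and loses the leading zeros, colClasses keeps them as text, and as.is = TRUE keeps a text column as character. The last call shows that colClasses counts the row-label column: one variable, but two entries.

```r
lines <- c("id,code,value", "1,007,2.5", "2,010,3.0")

# Automatic conversion: the codes look numeric, so the leading zeros are lost.
d_auto <- read.table(text = lines, header = TRUE, sep = ",")

# One class per input column keeps the codes as character strings.
d_cls <- read.table(text = lines, header = TRUE, sep = ",",
                    colClasses = c("integer", "character", "numeric"))

# as.is = TRUE: character columns are not turned into factors.
d_asis <- read.table(text = c("g", "a", "b"), header = TRUE, as.is = TRUE)

# Header is one field short, so column 1 holds row labels and needs a class too.
d_rn <- read.table(text = c("x", "a 1", "b 2"), header = TRUE,
                   colClasses = c("character", "numeric"))

print(sapply(d_auto, class))
print(sapply(d_cls, class))
```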
- Comments
By default, read.table uses # as the comment character. If this character is encountered (except inside a quoted string), the rest of the line is ignored. Lines containing only white space and a comment are treated as blank lines.
If you know that the data file contains no comments, it is safer (and may be faster) to use comment.char = "".
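A sketch with made-up data: comments are dropped by default, while data that legitimately contain # need comment.char = "".

```r
# A full-line comment and a trailing comment are both ignored.
lines <- c("# generated by some tool", "x y", "1 2  # first row", "3 4")
d1 <- read.table(text = lines, header = TRUE)

# Values containing "#" are only read intact with comment.char = "".
tags <- c("tag n", "a#1 5", "b#2 6")
d2 <- read.table(text = tags, header = TRUE, comment.char = "")

print(d1)
print(d2)
```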
- Escapes
Many operating systems use the backslash as an escape character in text files, but Windows does not (it uses backslashes in path names). In R, you can choose whether this convention is applied to data files.
Both read.table and scan have a logical argument allowEscapes. Since R 2.2.0 it defaults to FALSE, and backslashes are then interpreted only as escaping quotes (in the circumstances described above). If it is set to TRUE, C-style escapes are interpreted, namely the control characters \a, \b, \f, \n, \r, \t, \v and octal and hexadecimal representations such as \040 and \0x2A. Any other escaped character is treated as itself, including a backslash.
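A sketch with a made-up quoted field containing a backslash followed by t. With the default allowEscapes = FALSE the backslash is kept, and with allowEscapes = TRUE the pair is read as a tab character.

```r
# The input line is:  "a\tb" 1   (backslash and t are two separate characters)
line <- '"a\\tb" 1'

d_plain <- read.table(text = line)                      # backslash kept
d_esc   <- read.table(text = line, allowEscapes = TRUE) # \t becomes a tab

print(nchar(as.character(d_plain$V1)))  # 4 characters: a, backslash, t, b
print(nchar(as.character(d_esc$V1)))    # 3 characters: a, tab, b
```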
The convenience functions read.csv and read.delim call read.table with arguments suited to CSV and tab-delimited files exported from spreadsheets in English-speaking locales. The variants read.csv2 and read.delim2 are intended for locales where the comma is used as the decimal point.
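A sketch with made-up data comparing read.csv with read.csv2. read.csv uses sep = "," and dec = ".", while read.csv2 uses dec = "," and, because the comma is taken by the decimal point, sep = ";".

```r
# The same data written in the two spreadsheet conventions.
english  <- c("x,y", "1.5,2")
european <- c("x;y", "1,5;2")

d_en <- read.csv(text = english)   # sep = ",", dec = "."
d_eu <- read.csv2(text = european) # sep = ";", dec = ","

print(all.equal(d_en, d_eu))
```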
If the options to read.table are specified incorrectly, the error message usually takes the form
Error in scan(file = file, what = what, sep = sep, : line 1 did not have 5 elements
Or
Error in read.table("files.dat", header = TRUE) : more columns than column names
This may give enough information to find the problem, but the auxiliary function count.fields can help investigate further.
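A sketch of that investigation on a made-up file whose third line is short. read.table stops with a "did not have 3 elements" error, and count.fields shows which line of the file is at fault.

```r
tmp <- tempfile(fileext = ".dat")
writeLines(c("a b c", "1 2 3", "4 5", "6 7 8"), tmp)

# read.table fails; keep the message instead of stopping.
res <- tryCatch(read.table(tmp, header = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# Count the fields on every line of the file.
n <- count.fields(tmp)
print(n)                 # 3 3 2 3
print(which(n != n[1]))  # 3: the third line of the file is short
```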
Efficiency matters when reading large data grids. Setting comment.char = "", specifying colClasses for each column, and giving nrows, the number of rows to be read (a mild overestimate is better than not setting it at all), will all improve efficiency.
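A sketch of these settings on a generated file. The size of 10,000 rows and the two-column layout are assumptions made for illustration, and the actual gain depends on the data. colClasses gives an atomic type for each column, and nrows is a mild overestimate.

```r
n <- 10000
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = seq_len(n), x = rnorm(n)), tmp, row.names = FALSE)

d <- read.table(tmp, header = TRUE,
                comment.char = "",                     # no comment scanning
                colClasses = c("integer", "numeric"),  # no type guessing
                nrows = n + 100)                       # mild overestimate
str(d)
```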