I. Creating folders in batches and reading folder names in bulk
Today I ran into a problem at work: the boss gave us the ID codes of more than 200 companies (such as 6007, 7920, etc.). We need to search for and download news by these ID numbers, and save the downloaded news in a folder named after each company ID. So the first problem is that, with more than 200 companies, it is impractical to click "New folder" and type a name for every download; that is far too tedious. Luckily, we have R for company. Let's see how R handles it!
Two functions do most of the work:
1. dir.create: creates a folder
2. dir: reads folder names
First, read the ID numbers (stored beforehand in a TXT file):
> filenames <- read.table('c:/users/wb-tangyang.b/desktop/key.txt')
# This yields a single column of IDs
> filenames[1,1]
[1] 16101
Then, create the folders in a batch, one for each element of filenames:
> i = 1 # initial row number
> while (i <= 84) { # 84 is the number of folders to create; set it as needed
+   dir.create(as.character(filenames[i,1]))
+   i = i + 1
+ }
Finally, take a look: the folders are all in place. Nice, right?
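The while loop above hard-codes the row count. As a rough sketch of an alternative, the version below takes the count from the data itself and skips folders that already exist. The names `ids`, `root` and `targets` are made up for illustration, the IDs are sample values rather than the real key.txt, and `dir.exists()` assumes R 3.2.0 or later.

```r
# Hypothetical sample IDs standing in for the first column of key.txt
ids <- c(6007, 7920, 16101)

# Work inside a temporary directory so the sketch does not touch real folders
root <- file.path(tempdir(), "company_key_demo")
dir.create(root, showWarnings = FALSE)

# Build one target path per ID
targets <- file.path(root, as.character(ids))

# Create only the folders that are missing; dir.create() takes one path per call
missing_targets <- targets[!dir.exists(targets)]
created <- vapply(missing_targets, dir.create, logical(1))

# List what now exists under root
print(dir(root))
```

Running it a second time creates nothing new, because every target already exists.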
So how do you do the reverse? That is, to read the names of a large number of folders on the computer for later use, a single command, dir, is enough:
> filename <- dir('e:/work/news data download/tynews2014/company key')
The resulting filename is a character vector; we need to convert it to numeric and arrange it as a single column:
> filename <- as.numeric(filename)
> matrix(data = filename, nrow = 138, ncol = 1)
Yeah, this is what we want!
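If the directory might also contain folders whose names are not numeric, as.numeric would return NA for them with a warning. A small sketch of one way to guard against that, using a temporary directory and made-up folder names (including a stray "notes" folder) purely for illustration:

```r
# Temporary directory with hypothetical ID-named folders plus one stray folder
root <- file.path(tempdir(), "read_names_demo")
dir.create(root, showWarnings = FALSE)
for (name in c("6007", "7920", "16101", "notes")) {
  dir.create(file.path(root, name), showWarnings = FALSE)
}

# Read the folder names back as a character vector
folder_names <- dir(root)

# Keep only names made entirely of digits, then convert them to numeric
id_names <- grep("^[0-9]+$", folder_names, value = TRUE)
company_ids <- as.numeric(id_names)

# One-column matrix; the row count follows the data instead of being hard-coded
id_matrix <- matrix(company_ids, ncol = 1)
print(id_matrix)
```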
II. File System Management with R
Preface
As a scripting language, R has a set of functions for file system management, comparable to the system management functions in Python. This article describes file system management in R in detail. Why not give it a try?
Contents
- File System Introduction
- Directory Operations
- File operations
- A few special directories
1. File System Introduction
A computer's file system is a way to store and organize data so that it is easy to access and find. It replaces the notion of data blocks on physical devices such as hard disks and optical discs with the abstract, logical concepts of files and a directory tree. Users can save data through the file system without caring which block addresses on the hard disk (or CD) actually hold it; they only need to remember the directory and name of the file. Before new data is written, the user does not need to know which blocks on the disk are free: the file system manages storage space (allocation and release) automatically, and the user only has to remember which file the data was written to.
The R language, like other programming languages, has operations on the file system, including file operations and directory operations, which are defined in the base package.
2. Directory Operations
System environment: Linux: Ubuntu Server 12.04.2 LTS 64bit; R: 3.1.0 x86_64-pc-linux-gnu
2.1 Viewing a directory
View subdirectories under the current directory.
# Start R
~ R
# Current directory
> getwd()
[1] "/home/conan/R/fs"
# List the subdirectories of the current directory
> list.dirs()
[1] "." "./tmp"
View subdirectories and files for the current directory.
> dir()
[1] "readme.txt" "tmp"
# View subdirectories and files of a specified directory.
> dir(path="/home/conan/R")
 [1] "A.txt"                             "caTools"
 [3] "chinaWeather"                      "DemoRJava"
 [5] "env"                               "FastRWeb"
 [7] "font"                              "fs"
 [9] "github"                            "lineprof"
[11] "pryr"                              "readme.txt"
[13] "RMySQL"                            "Rserve"
[15] "rstudio-server-0.97.551-amd64.deb" "websockets"
[17] "x86_64-pc-linux-gnu-library"
# List only the subdirectories or files whose names start with the letter R
> dir(path="/home/conan/R", pattern='^R')
[1] "RMySQL" "Rserve"
# List all directories and files in the directory, including hidden files such as .A.txt
> dir(path="/home/conan/R", all.files=TRUE)
 [1] "."                                 ".."
 [3] ".A.txt"                            "A.txt"
 [5] "caTools"                           "chinaWeather"
 [7] "DemoRJava"                         "env"
 [9] "FastRWeb"                          "font"
[11] "fs"                                "github"
[13] "lineprof"                          "pryr"
[15] "readme.txt"                        "RMySQL"
[17] "Rserve"                            "rstudio-server-0.97.551-amd64.deb"
[19] "websockets"                        "x86_64-pc-linux-gnu-library"
list.files() views the subdirectories and files of the current directory, the same as the dir() function.
> list.files()
[1] "readme.txt" "tmp"
> list.files(".", all.files=TRUE)
[1] "." ".." "readme.txt" "tmp"
View the complete directory information.
# View permissions and other information for the current directory
> file.info(".")
  size isdir mode               mtime               ctime               atime  uid  gid uname grname
. 4096  TRUE  775 2013-11-14 08:40:46 2013-11-14 08:40:46 2013-11-14 08:41:57 1000 1000 conan  conan
# View permissions and other information for a specified directory
> file.info("./tmp")
      size isdir mode               mtime               ctime               atime  uid  gid uname grname
./tmp 4096  TRUE  775 2013-11-14 14:35:56 2013-11-14 14:35:56 2013-11-14 14:35:56 1000 1000 conan  conan
2.2 Creating a directory
# Create a new directory under the current directory
> dir.create("create")
> list.dirs()
[1] "." "./create" "./tmp"
Create a three-level subdirectory, ./a1/b2/c3
# Creating it directly fails
> dir.create(path="a1/b2/c3")
Warning message:
In dir.create(path = "a1/b2/c3") :
  cannot create dir ‘a1/b2/c3’, reason ‘No such file or directory’
# Creating it recursively succeeds
> dir.create(path="a1/b2/c3",recursive = TRUE)
> list.dirs()
[1] "." "./a1" "./a1/b2" "./a1/b2/c3" "./create" "./tmp"
# View the directory structure with a system command
> system("tree")
.
├── a1
│   └── b2
│       └── c3
├── create
├── readme.txt
└── tmp
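The recursive creation shown above can be wrapped in a small helper that is safe to call repeatedly. This is only a sketch: `ensure_dir` is a hypothetical name, not a base R function, and `dir.exists()` assumes R 3.2.0 or later.

```r
# Hypothetical helper: create a directory (and any missing parents) only if needed
ensure_dir <- function(path) {
  if (dir.exists(path)) {
    return(invisible(TRUE))
  }
  # recursive = TRUE also creates the missing parent directories
  ok <- dir.create(path, recursive = TRUE, showWarnings = FALSE)
  if (!ok) {
    stop("could not create directory: ", path)
  }
  invisible(TRUE)
}

# Demo inside a temporary directory
nested <- file.path(tempdir(), "ensure_dir_demo", "a1", "b2", "c3")
ensure_dir(nested)
ensure_dir(nested)  # the second call finds the directory and does nothing
print(dir.exists(nested))
```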
2.3 Checking whether a directory exists
# The directory exists
> file.exists(".")
[1] TRUE
> file.exists("./a1/b2")
[1] TRUE
# The directory does not exist
> file.exists("./aa")
[1] FALSE
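Note that file.exists() answers TRUE for ordinary files as well as for directories. When the distinction matters, dir.exists() (available from R 3.2.0, so newer than the R version used in this article) can be used instead. A short sketch with made-up paths in a temporary directory:

```r
# Set up one directory and one file inside it
base_dir <- file.path(tempdir(), "exists_demo")
dir.create(base_dir, showWarnings = FALSE)
a_file <- file.path(base_dir, "readme.txt")
cat("file A\n", file = a_file)

# file.exists() is TRUE for both the directory and the file
print(file.exists(c(base_dir, a_file)))

# dir.exists() is TRUE only for the directory
print(dir.exists(c(base_dir, a_file)))
```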
2.4 Checking the permissions of a directory
Check permissions for a directory
> df <- dir(full.names = TRUE)
# Check whether each file or directory exists, mode=0
> file.access(df, 0) == 0
        ./a1     ./create ./readme.txt        ./tmp
        TRUE         TRUE         TRUE         TRUE
# Check whether each file or directory is executable, mode=1; directories are executable
> file.access(df, 1) == 0
        ./a1     ./create ./readme.txt        ./tmp
        TRUE         TRUE        FALSE         TRUE
# Check whether each file or directory is writable, mode=2
> file.access(df, 2) == 0
        ./a1     ./create ./readme.txt        ./tmp
        TRUE         TRUE         TRUE         TRUE
# Check whether each file or directory is readable, mode=4
> file.access(df, 4) == 0
        ./a1     ./create ./readme.txt        ./tmp
        TRUE         TRUE         TRUE         TRUE
Modify directory permissions.
# Change the directory permissions so that all users have read-only access
> Sys.chmod("./create", mode = "0555", use_umask = TRUE)
# View the full directory information, mode=555
> file.info("./create")
         size isdir mode               mtime               ctime               atime  uid  gid uname grname
./create 4096  TRUE  555 2013-11-14 08:36:28 2013-11-14 09:07:05 2013-11-14 08:36:39 1000 1000 conan  conan
# The create directory is no longer writable
> file.access(df, 2) == 0
        ./a1     ./create ./readme.txt        ./tmp
        TRUE        FALSE         TRUE         TRUE
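The four file.access() checks can be gathered into one table. The sketch below is one possible way to do that; `access_table` is a hypothetical helper name, and the actual TRUE/FALSE values for execute, write and read depend on the platform and on the user running R.

```r
# Hypothetical helper: one row per path, one column per access mode
access_table <- function(paths) {
  data.frame(
    path    = paths,
    exists  = unname(file.access(paths, mode = 0) == 0),
    execute = unname(file.access(paths, mode = 1) == 0),
    write   = unname(file.access(paths, mode = 2) == 0),
    read    = unname(file.access(paths, mode = 4) == 0),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Demo on a temporary directory, a file in it, and a path that does not exist
demo_dir <- file.path(tempdir(), "access_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "readme.txt")
cat("file A\n", file = demo_file)

tab <- access_table(c(demo_dir, demo_file, file.path(demo_dir, "missing.txt")))
print(tab)
```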
2.5 Renaming a directory
# Rename the tmp directory
> file.rename("tmp", "tmp2")
[1] TRUE
# View the directory
> dir()
[1] "a1" "create" "readme.txt" "tmp2"
2.6 Deleting a directory
# Delete the tmp2 directory
> unlink("tmp2", recursive = TRUE)
# View the directory
> dir()
[1] "a1" "create" "readme.txt"
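One detail worth checking for yourself: without recursive = TRUE, unlink() leaves a directory in place, and according to the R documentation this is not reported as a failure. A small sketch in a temporary directory (the names `demo`, `status_plain` and `status_rec` are illustrative only):

```r
# Set up a directory containing one file
demo <- file.path(tempdir(), "unlink_demo")
dir.create(demo, showWarnings = FALSE)
cat("x\n", file = file.path(demo, "a.txt"))

# recursive defaults to FALSE: the directory is not deleted
status_plain <- unlink(demo)
still_there <- dir.exists(demo)
print(still_there)

# recursive = TRUE deletes the directory and its contents
status_rec <- unlink(demo, recursive = TRUE)
print(dir.exists(demo))
```

Because the return value does not reveal whether a directory was actually removed, checking with dir.exists() afterwards is the more reliable test.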
2.7 Other utility functions
Join directory strings into a path
# Join directory strings
> file.path("p1","p2","p3")
[1] "p1/p2/p3"
> dir(file.path("a1","b2"))
[1] "c3"
Get the lowest-level subdirectory name
# Current directory
> getwd()
[1] "/home/conan/R/fs"
# Directory that contains the file
> dirname("/home/conan/R/fs/readme.txt")
[1] "/home/conan/R/fs"
# Lowest-level subdirectory or file name
> basename(getwd())
[1] "fs"
> basename("/home/conan/R/fs/readme.txt")
[1] "readme.txt"
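These functions combine naturally when a path has to be taken apart. The sketch below also uses file_ext() and file_path_sans_ext() from the tools package, which ships with R but is not covered in this article; the path is the one from the example above and does not need to exist.

```r
# Build a path from its parts
p <- file.path("/home/conan/R/fs", "readme.txt")

# Split it again into directory, file name, extension and stem
dir_part  <- dirname(p)                                # "/home/conan/R/fs"
file_part <- basename(p)                               # "readme.txt"
ext_part  <- tools::file_ext(p)                        # "txt"
stem_part <- tools::file_path_sans_ext(basename(p))    # "readme"

print(c(dir_part, file_part, ext_part, stem_part))
```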
Expand a file path
# Expand ~ to the user's home directory
> path.expand("~/foo")
[1] "/home/conan/foo"
Normalize a path, converting the path separators for Windows or Linux
# Linux
> normalizePath(c(R.home(), tempdir()))
[1] "/usr/lib/R" "/tmp/RtmpqNyjPD"
# Windows
> normalizePath(c(R.home(), tempdir()))
[1] "C:\\Program Files\\R\\R-3.0.1"
[2] "C:\\Users\\Administrator\\AppData\\Local\\Temp\\RtmpMtSnci"
Short path: shortens the displayed path; it runs only on Windows.
# Windows
> shortPathName(c(R.home(), tempdir()))
[1] "C:\\PROGRA~1\\R\\R-30~1.1"
[2] "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\RTMPMT~1"
3. File operations
3.1 Viewing Files
> dir()
[1] "create" "readme.txt"
# Check whether the file exists
> file.exists("readme.txt")
[1] TRUE
# The file does not exist
> file.exists("readme.txt222")
[1] FALSE
# View the complete file information
> file.info("readme.txt")
           size isdir mode               mtime               ctime               atime  uid  gid uname grname
readme.txt    7 FALSE  664 2013-11-14 08:24:50 2013-11-14 08:24:50 2013-11-14 08:24:50 1000 1000 conan  conan
# Check file access permissions: exists
> file.access("readme.txt",0)
readme.txt
         0
# Not executable
> file.access("readme.txt",1)
readme.txt
        -1
# Writable
> file.access("readme.txt",2)
readme.txt
         0
# Readable
> file.access("readme.txt",4)
readme.txt
         0
# Check access to a file that does not exist: not found
> file.access("readme.txt222")
readme.txt222
           -1
Determine whether it is a file or a directory.
# Test whether it is a directory
> file_test("-d", "readme.txt")
[1] FALSE
> file_test("-d", "create")
[1] TRUE
# Test whether it is a file
> file_test("-f", "readme.txt")
[1] TRUE
> file_test("-f", "create")
[1] FALSE
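To classify every entry of a directory at once, the isdir column of file.info() can be used instead of calling file_test() per entry. A sketch with made-up entries in a temporary directory:

```r
# Set up one subdirectory and one file
demo <- file.path(tempdir(), "classify_demo")
dir.create(file.path(demo, "create"), recursive = TRUE, showWarnings = FALSE)
cat("file A\n", file = file.path(demo, "readme.txt"))

# List the entries with full paths so file.info() can find them
entries <- dir(demo, full.names = TRUE)

# isdir is TRUE for directories and FALSE for ordinary files
is_dir <- file.info(entries)$isdir
files_only <- basename(entries[!is_dir])
dirs_only  <- basename(entries[is_dir])

print(files_only)
print(dirs_only)
```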
3.2 Creating a file
# Create an empty file, A.txt
> file.create("A.txt")
[1] TRUE
# Create a file with content, B.txt
> cat("file B\n", file = "B.txt")
> dir()
[1] "A.txt" "B.txt" "create" "readme.txt"
# Print A.txt
> readLines("A.txt")
character(0)
# Print B.txt
> readLines("B.txt")
[1] "file B"
Append the contents of the file B.txt to A.txt.
# Append the file ten times
> file.append("A.txt", rep("B.txt", 10))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# View the file contents
> readLines("A.txt")
 [1] "file B" "file B" "file B" "file B" "file B" "file B" "file B" "file B" "file B" "file B"
Copy the file A.txt to the file C.txt.
# Copy the file
> file.copy("A.txt", "C.txt")
[1] TRUE
# View the file contents
> readLines("C.txt")
 [1] "file B" "file B" "file B" "file B" "file B" "file B" "file B" "file B" "file B" "file B"
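By default file.copy() does not overwrite an existing destination; it returns FALSE and leaves the destination untouched. A short sketch of that behavior and of the overwrite argument, using made-up files in a temporary directory:

```r
# Set up a source file and a destination that already exists
demo <- file.path(tempdir(), "copy_demo")
dir.create(demo, showWarnings = FALSE)
src <- file.path(demo, "A.txt")
dst <- file.path(demo, "C.txt")
cat("file A\n", file = src)
cat("old content\n", file = dst)

# Default overwrite = FALSE: nothing is copied
first_try <- file.copy(src, dst)
print(first_try)

# overwrite = TRUE replaces the destination
second_try <- file.copy(src, dst, overwrite = TRUE)
print(readLines(dst))
```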
3.3 Modifying file permissions
# Change the file permissions: the owner can read, write and execute; others have no access
> Sys.chmod("A.txt", mode = "0700", use_umask = TRUE)
# View the file information
> file.info("A.txt")
      size isdir mode               mtime               ctime               atime  uid  gid uname grname
A.txt   70 FALSE  700 2013-11-14 12:55:18 2013-11-14 12:57:39 2013-11-14 12:55:26 1000 1000 conan  conan
3.4 File renaming
# Rename the file A.txt to AA.txt
> file.rename("A.txt","AA.txt")
[1] TRUE
> dir()
[1] "AA.txt" "B.txt" "create" "C.txt" "readme.txt"
3.5 Hard links and soft links
A hard link is a link made through an index node (inode). In a Linux file system, every file stored on a disk partition, whatever its type, is assigned a number called the index node number (inode index). Linux allows several file names to point to the same inode; such a link is generally called a hard link. Hard links let one file have multiple valid path names, so users can create hard links to important files to guard against accidental deletion. This works because the inode has more than one link pointing to it: deleting one link does not affect the inode or the other links, and the file's data blocks and directory entries are released only when the last link is deleted. In other words, a file is truly deleted only once all of its hard links have been deleted.
A soft link is also called a symbolic link. A soft link file is similar to a Windows shortcut. It is actually a special file: in effect, a text file that contains the location of another file. Hard and soft links are used here only on Linux systems.
# Hard link
> file.link("readme.txt", "hard_link.txt")
[1] TRUE
# Soft link
> file.symlink("readme.txt", "soft_link.txt")
[1] TRUE
# View the files in the directory
> system("ls -l")
-rwx------ 1 conan conan   70 Nov 14 12:55 AA.txt
-rw-rw-r-- 1 conan conan    7 Nov 14 12:51 B.txt
dr-xr-xr-x 2 conan conan 4096 Nov 14 08:36 create
-rw-rw-r-- 1 conan conan   70 Nov 14 12:56 C.txt
-rw-rw-r-- 2 conan conan    7 Nov 14 08:24 hard_link.txt
-rw-rw-r-- 2 conan conan    7 Nov 14 08:24 readme.txt
lrwxrwxrwx 1 conan conan   10 Nov 14 13:11 soft_link.txt -> readme.txt
The file hard_link.txt is a hard link to readme.txt, and the file soft_link.txt is a soft link to readme.txt.
3.6 Deleting files
Two functions can be used, file.remove and unlink; unlink is used in the same way as for deleting a directory.
# Delete files
> file.remove("A.txt", "B.txt", "C.txt")
[1] FALSE TRUE TRUE
# Delete a file
> unlink("readme.txt")
# View the files in the directory
> system("ls -l")
total 12
-rwx------ 1 conan conan   70 Nov 14 12:55 AA.txt
dr-xr-xr-x 2 conan conan 4096 Nov 14 08:36 create
-rw-rw-r-- 1 conan conan    7 Nov 14 08:24 hard_link.txt
lrwxrwxrwx 1 conan conan   10 Nov 14 13:11 soft_link.txt -> readme.txt
# Print the hard link file
> readLines("hard_link.txt")
[1] "file A"
# Print the soft link file, soft_link.txt; because the original file was deleted, this is an error
> readLines("soft_link.txt")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file ‘soft_link.txt’: No such file or directory
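The difference between the two kinds of link after the original is deleted can be reproduced in a few lines. This is a sketch that assumes a Unix-like system where both link types are supported in the temporary directory; the file names mirror the ones above but live in a throwaway folder.

```r
# Start from a clean temporary folder
demo <- file.path(tempdir(), "link_demo")
unlink(demo, recursive = TRUE)
dir.create(demo, showWarnings = FALSE)

original <- file.path(demo, "readme.txt")
hard <- file.path(demo, "hard_link.txt")
soft <- file.path(demo, "soft_link.txt")
cat("file A\n", file = original)

# Create both kinds of link to the same file
hard_ok <- file.link(original, hard)
soft_ok <- file.symlink(original, soft)

# Delete the original name; the data stays reachable through the hard link
unlink(original)
if (hard_ok) print(readLines(hard))

# The soft link still records its target path, but the target is gone
if (soft_ok) {
  print(Sys.readlink(soft))
  print(file.exists(soft))
}
```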
4. A few special directories
- R.home(): view the directories related to the R installation
- .Library: view the directory of the R core packages
- .Library.site: view the directory of the R core packages and the package directories installed by the root user
- .libPaths(): view the directories where all R packages are stored
- system.file(): view the directory where a specified package is located
4.1 R.home(): viewing the directories related to the R installation
# Print the R installation directory
> R.home()
[1] "/usr/lib/R"
# Print the directory of R's bin files
> R.home(component="bin")
[1] "/usr/lib/R/bin"
# Print the directory of R's documentation
> R.home(component="doc")
[1] "/usr/share/R/doc"
The locations of R's files can also be found with system commands.
# Check where the R files are on the system
~ whereis R
R: /usr/bin/R /etc/R /usr/lib/R /usr/bin/X11/R /usr/local/lib/R /usr/share/R /usr/share/man/man1/R.1.gz
# Print the environment variable R_HOME
~ echo $R_HOME
/usr/lib/R
With the R.home() function, we can easily locate the directories of the R installation.
4.2 Package directory for R software
# Print the directory of the core packages
> .Library
[1] "/usr/lib/R/library"
# Print the directory of the core packages and the package directories installed by the root user
> .Library.site
[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"
# Print the directories where all packages are stored
> .libPaths()
[1] "/home/conan/R/x86_64-pc-linux-gnu-library/3.0"
[2] "/usr/local/lib/R/site-library"
[3] "/usr/lib/R/site-library"
[4] "/usr/lib/R/library"
4.3 Viewing the directory where the specified package resides
# Directory of the base package
> system.file()
[1] "/usr/lib/R/library/base"
# Directory of the pryr package
> system.file(package = "pryr")
[1] "/home/conan/R/x86_64-pc-linux-gnu-library/3.0/pryr"
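For a package that is not installed, system.file() returns an empty string rather than raising an error. A small sketch of a wrapper that makes this explicit; `pkg_dir` is a hypothetical helper name, and the second package name is deliberately one that should not exist.

```r
# Hypothetical helper: directory of an installed package, or NA if it is not found
pkg_dir <- function(pkg) {
  path <- system.file(package = pkg)
  if (nzchar(path)) path else NA_character_
}

# base is always available; the second name is assumed not to be installed
print(pkg_dir("base"))
print(pkg_dir("aPackageThatIsNotInstalled0"))
```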
In fact, operating on the file system with R is quite convenient. The function names, however, are not very consistent, so it takes some time to memorize them.
Brief summary (incomplete; to be finished another day)
rm(list=ls())
path = 'j:/lab/ex29--file (clip) operation in R language'
setwd(path)
cat("file A\n", file="A") # Create file A with the content 'file A'; '\n' adds a line break, which is a good habit
cat("file B\n", file="B") # Create file B
file.append("A", "B") # Append the contents of B to the end of A; note that no blank line is added
file.create("A") # Create file A; note that the existing file is overwritten
file.append("A", rep("B", 10)) # Copy the contents of B 10 times and append them to A
file.show("A") # Show the contents of A in a new window
file.copy("A", "C") # Copy file A and save it as C in the same folder
dir.create("tmp") # Create a folder named tmp
file.copy(c("A", "B"), "tmp") # Copy files A and B into the tmp folder
list.files("tmp") # List the file names in the tmp folder
unlink("tmp", recursive=FALSE) # With recursive=FALSE, unlink does not delete directories, so tmp stays in place
unlink("tmp", recursive=TRUE) # Delete the tmp folder, including any files in it
file.remove("A", "B", "C") # Remove the three files
R8: Creating folders in batches, reading folder names in bulk, and R file system management functions