R: An Introduction to the stringr Package

Last Update:2016-11-14 Source: Internet

Author: User

Tags index sort stringr

Directory

1. stringr Introduction
2. stringr Installation
3. stringr's API Introduction

1. stringr Introduction

The stringr package is a consistent, easy-to-use set of string tools. All of its functions and parameters follow the same conventions; for example, NA values and zero-length vectors are handled the same way everywhere.

String processing is not the main purpose of the R language, but it is still necessary: data cleaning, visualization, and many other tasks depend on it. The string functions in R's base package have accumulated over time, and they now suffer from inconsistent behavior, non-standard naming, and irregular parameter definitions, which makes them hard to pick up. Other languages make string processing convenient, and R has lagged behind in this area. The stringr package addresses the problem by providing a friendly, easy-to-use interface for string operations.

stringr project home page: https://cran.r-project.org/web/packages/stringr/index.html

2. stringr Installation

The system environment used in this article:

Win10 64bit
R: 3.2.3 x86_64-w64-mingw32/x64 (64-bit)

stringr is a standard package published on CRAN, so it can be installed and loaded with two commands.

~ R
> install.packages('stringr')
> library(stringr)

3. stringr's API Introduction

Version 1.0.0 of the stringr package provides 30 functions for string processing. The commonly used functions all have names that begin with str_, which makes their purpose easier to understand at a glance. The functions can be grouped by how they are used:

String concatenation functions

str_c: concatenates strings.
str_join: concatenates strings; same as str_c.
str_trim: removes spaces and tabs (\t) from the ends of a string.
str_pad: pads a string to a given length.
str_dup: duplicates strings.
str_wrap: controls the output format of a string.
str_sub: extracts a substring.
str_sub<-: assigns a value to a substring; same parameters as str_sub.

String calculation functions

str_count: counts matches in a string.
str_length: returns the length of a string.
str_sort: sorts string values.
str_order: returns the sort order as indexes; same rules as str_sort.

String matching functions

str_split: splits a string.
str_split_fixed: splits a string into a fixed number of pieces; otherwise same as str_split.
str_subset: returns the strings that match.
word: extracts words from text.
str_detect: checks whether a string matches a pattern.
str_match: extracts matched groups from a string.
str_match_all: extracts all matched groups from a string; same as str_match.
str_replace: replaces the first match in a string.
str_replace_all: replaces all matches in a string; same as str_replace.
str_replace_na: replaces NA with the string "NA".
str_locate: finds the position of the first match in a string.
str_locate_all: finds the positions of all matches in a string; same as str_locate.
str_extract: extracts the first match from a string.
str_extract_all: extracts all matches from a string; same as str_extract.

String transformation functions

str_conv: converts the character encoding.
str_to_upper: converts a string to uppercase.
str_to_lower: converts a string to lowercase; same rules as str_to_upper.
str_to_title: capitalizes the first letter of each word; same rules as str_to_upper.

Parameter control functions, which are used only to construct arguments for the functions above and cannot be used on their own

boundary: matches boundaries (for example between characters or words).
coll: matches using standard collation rules.
fixed: matches the pattern as literal characters, including characters that would otherwise be regular expression metacharacters.
regex: matches using a regular expression.
3.1 String concatenation functions

3.1.1 str_c: string concatenation; identical to str_join, but not exactly the same as paste().

Function definition:

str_c(..., sep = "", collapse = NULL)
str_join(..., sep = "", collapse = NULL)

Parameter list:

...: one or more input vectors.
sep: the separator placed between the inputs when they are joined.
collapse: the separator used to combine the elements of a vector into a single string.

Concatenating multiple strings into a single string:

> str_c('a','b')
[1] "ab"
> str_c('a','b',sep='-')
[1] "a-b"
> str_c(c('a','a1'),c('b','b1'),sep='-')
[1] "a-b"   "a1-b1"

Collapsing the elements of a vector into a single string:

> str_c(head(letters), collapse = "")
[1] "abcdef"
> str_c(head(letters), collapse = ", ")
[1] "a, b, c, d, e, f"
# The collapse argument has no effect when each input is a single string
> str_c('a','b',collapse = "-")
[1] "ab"
> str_c(c('a','a1'),c('b','b1'),collapse='-')
[1] "ab-a1b1"

When a string vector containing NA values is concatenated, the NA is preserved as NA:

> str_c(c("a", NA, "b"), "-d")
[1] "a-d" NA    "b-d"

Comparing the str_c() function with the paste() function:

# Concatenating multiple strings: the default sep differs
> str_c('a','b')
[1] "ab"
> paste('a','b')
[1] "a b"
# Collapsing a vector: the collapse argument behaves the same way
> str_c(head(letters), collapse = "")
[1] "abcdef"
> paste(head(letters), collapse = "")
[1] "abcdef"
# Concatenating a string vector that contains NA: NA is handled differently
> str_c(c("a", NA, "b"), "-d")
[1] "a-d" NA    "b-d"
> paste(c("a", NA, "b"), "-d")
[1] "a -d"  "NA -d" "b -d"
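
If the paste()-style treatment of NA is wanted while still using str_c(), one option is to convert the NA values to text first with str_replace_na(). This is a small sketch, assuming stringr is installed.

```r
library(stringr)

x <- c("a", NA, "b")

# str_c() propagates NA: a missing input makes that element of the result NA
strict <- str_c(x, "-d")                    # "a-d" NA "b-d"

# Converting NA to the string "NA" first gives the same result as paste0()
lenient <- str_c(str_replace_na(x), "-d")   # "a-d" "NA-d" "b-d"

identical(lenient, paste0(x, "-d"))         # TRUE
```

Keeping the conversion explicit makes it visible in the code that missing values are being turned into text on purpose.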

3.1.2 str_trim: removes spaces and tabs (\t) from the ends of a string

Function definition:

str_trim(string, side = c("both", "left", "right"))

Parameter list:

string: a string or a string vector.
side: which side to trim: both for both sides, left for the left side, right for the right side.

Removing spaces and tabs (\t) from a string:

# Trim whitespace on the left only
> str_trim("  left space\t\n",side='left')
[1] "left space\t\n"
# Trim whitespace on the right only
> str_trim("  left space\t\n",side='right')
[1] "  left space"
# Trim whitespace on both sides
> str_trim("  left space\t\n",side='both')
[1] "left space"
# Trim whitespace on both sides (the default)
> str_trim("\nno space\n\t")
[1] "no space"

3.1.3 str_pad: pads a string to a given length

Function definition:

str_pad(string, width, side = c("left", "right", "both"), pad = " ")

Parameter list:

string: a string or a string vector.
width: the length of the string after padding.
side: the side to pad: left, right, or both.
pad: the character used for padding.

Padding a string to a given length:

# Pad with spaces on the left until the string is 20 characters long
> str_pad("conan", 20, "left")
[1] "               conan"
# Pad with spaces on the right until the string is 20 characters long
> str_pad("conan", 20, "right")
[1] "conan               "
# Pad with spaces on both sides until the string is 20 characters long
> str_pad("conan", 20, "both")
[1] "       conan        "
# Pad with the character x on both sides until the string is 20 characters long
> str_pad("conan", 20, "both",'x')
[1] "xxxxxxxconanxxxxxxxx"
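
A common use of the pad argument is zero-padding identifiers to a fixed width. The sketch below assumes stringr is installed; the ids values are made up for illustration.

```r
library(stringr)

ids <- c("7", "42", "12345678")

# Pad on the left with "0" up to a width of 5 characters
padded <- str_pad(ids, width = 5, side = "left", pad = "0")
padded  # "00007" "00042" "12345678"
```

Strings that are already longer than width are returned unchanged; str_pad() never truncates.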

3.1.4 str_dup: duplicates strings

Function definition:

str_dup(string, times)

Parameter list:

string: a string or a string vector.
times: the number of times to duplicate.

Duplicating a string vector:

> val <- c("abca4", 123, "cba2")
# Duplicate each element twice
> str_dup(val, 2)
[1] "abca4abca4" "123123"     "cba2cba2"
# Duplicate by position: once, twice, three times
> str_dup(val, 1:3)
[1] "abca4"        "123123"       "cba2cba2cba2"

3.1.5 str_wrap: controls the output format of a string

Function definition:

str_wrap(string, width = 80, indent = 0, exdent = 0)

Parameter list:

string: a string or a string vector.
width: the width of each line.
indent: the indentation of the first line of a paragraph.
exdent: the indentation of the remaining lines of a paragraph.

> txt <- 'R语言作为统计学一门语言，一直在小众领域闪耀着光芒。直到大数据的爆发，R语言变成了一门炙手可热的数据分析的利器。随着越来越多的工程背景的人的加入，R语言的社区在迅速扩大成长。现在已不仅仅是统计领域，教育，银行，电商，互联网….都在使用R语言。'
# Set the width to 40 characters
> cat(str_wrap(txt, width = 40), "\n")
R语言作为统计学一门语言，一直在小众领域闪耀着光芒。直到大数据的爆发，R语言变成了一门炙手可热的数据分析的利器。随着越来越多的工程背景的人的加入，R语言的社区在迅速扩大成长。现在已不仅仅是统计领域，教育，银行，电商，互联网….都在使用R语言。
# Set the width to 60 characters and indent the first line by 2 characters
> cat(str_wrap(txt, width = 60, indent = 2), "\n")
  R语言作为统计学一门语言，一直在小众领域闪耀着光芒。直到大数据的爆发，R语言变成了一门炙手可热的数据分析的利器。随着越来越多的工程背景的人的加入，R语言的社区在迅速扩大成长。现在已不仅仅是统计领域，教育，银行，电商，互联网….都在使用R语言。
# Set the width to 10 characters and indent every line after the first by 4 characters
> cat(str_wrap(txt, width = 10, exdent = 4), "\n")
R语言作为
    统计学一
    门语言，
    一直在小
    众领域闪
    耀着光芒。
    直到大数据
    的爆发，R
    语言变成了
    一门炙手可
    热的数据分
    析的利器。
    随着越来
    越多的工程
    背景的人的
    加入，R语
    言的社区在
    迅速扩大成
    长。现在已
    不仅仅是统
    计领域，教
    育，银行，
    电商，互联
    网….都在使

3.1.6 str_sub: extracts a substring

Function definition:

str_sub(string, start = 1L, end = -1L)

Parameter list:

string: a string or a string vector.
start: the starting position.
end: the ending position.

Extracting substrings:

> txt <- "I am Conan."
# Extract the characters at positions 1 to 4
> str_sub(txt, 1, 4)
[1] "I am"
# Extract the characters at positions 1 to 6
> str_sub(txt, end=6)
[1] "I am C"
# Extract the characters from position 6 to the end
> str_sub(txt, 6)
[1] "Conan."
# Extract two substrings at once
> str_sub(txt, c(1, 4), c(6, 8))
[1] "I am C" "m Con"
# Extract using negative positions, which count from the end
> str_sub(txt, -3)
[1] "an."
> str_sub(txt, end = -3)
[1] "I am Cona"

Assigning a value to a substring:

> x <- "AAABBBCCC"
# Assign 1 to position 1 of the string
> str_sub(x, 1, 1) <- 1; x
[1] "1AABBBCCC"
# Assign 2345 to positions 2 through -2 of the string
> str_sub(x, 2, -2) <- "2345"; x
[1] "12345C"
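
Negative positions and substring assignment combine naturally when part of a value has to be masked. The following is a minimal sketch, assuming stringr is installed; the card value is invented for illustration.

```r
library(stringr)

card <- "6222021234567890"   # hypothetical 16-character value

# A negative start counts from the end: keep the last four characters
last4 <- str_sub(card, -4)   # "7890"

# Overwrite everything except the last four characters with "*"
masked <- card
str_sub(masked, 1, -5) <- str_dup("*", str_length(card) - 4)
masked                       # "************7890"
```

The replacement is built with str_dup() and str_length() so that the masked value keeps the length of the original.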

3.2 String calculation functions

3.2.1 str_count: counts matches in a string

Function definition:

str_count(string, pattern = "")

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.

Counting the matches in a string:

> str_count('aaa444sssddd', "a")
[1] 3

Counting the matches in each element of a string vector:

> fruit <- c("apple", "banana", "pear", "pineapple")
> str_count(fruit, "a")
[1] 1 3 1 1
> str_count(fruit, "p")
[1] 2 0 1 3

Counting the '.' character needs care: because . is a regular expression metacharacter that matches any character, using it directly as the pattern gives the wrong count.

> str_count(c("a.", ".", ".a.",NA), ".")
[1]  2  1  3 NA
# Match the literal character with fixed()
> str_count(c("a.", ".", ".a.",NA), fixed("."))
[1]  1  1  2 NA
# Match the literal character by escaping it with \\
> str_count(c("a.", ".", ".a.",NA), "\\.")
[1]  1  1  2 NA

3.2.2 str_length: returns the length of a string

Function definition:

str_length(string)

Parameter list:

string: a string or a string vector.

Calculating the length of each string:

> str_length(c("I", "am", "张丹", NA))
[1]  1  2  2 NA

3.2.3 str_sort: sorts string values; str_order returns the sort order as indexes

Function definition:

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "", ...)
str_order(x, decreasing = FALSE, na_last = TRUE, locale = "", ...)

Parameter list:

x: a string or a string vector.
decreasing: the sort direction.
na_last: where to place NA values; one of three values: TRUE puts them last, FALSE puts them first, and NA drops them.
locale: the locale whose collation rules are used for sorting.

Sorting string values:

# Sort in ASCII alphabetical order
> str_sort(c('a',1,2,'11'), locale = "en")
[1] "1"  "11" "2"  "a"
# Sort in descending order
> str_sort(letters,decreasing=TRUE)
 [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j" "i" "h"
[20] "g" "f" "e" "d" "c" "b" "a"
# Sort by pinyin
> str_sort(c('你','好','粉','丝','日','志'),locale = "zh")
[1] "粉" "好" "你" "日" "丝" "志"

Handling NA values when sorting:

# Put NA values last
> str_sort(c(NA,'1',NA),na_last=TRUE)
[1] "1" NA  NA
# Put NA values first
> str_sort(c(NA,'1',NA),na_last=FALSE)
[1] NA  NA  "1"
# Drop NA values
> str_sort(c(NA,'1',NA),na_last=NA)
[1] "1"
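
str_order() follows the same rules as str_sort() but returns positions instead of values, which is useful for reordering a second vector in step with the first. A small sketch, assuming stringr is installed; the fruit and price vectors are made up for illustration.

```r
library(stringr)

fruit <- c("pear", "apple", "banana")
price <- c(3, 1, 2)                       # hypothetical values, parallel to fruit

# str_order() returns the positions that would sort the vector
idx <- str_order(fruit, locale = "en")    # 2 3 1

# Indexing with those positions gives the same result as str_sort()
sorted <- fruit[idx]
identical(sorted, str_sort(fruit, locale = "en"))   # TRUE

# The same index reorders the parallel vector consistently
price_sorted <- price[idx]                # 1 2 3
```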

3.3 String matching functions

3.3.1 str_split: splits a string; same as str_split_fixed

Function definition:

str_split(string, pattern, n = Inf)
str_split_fixed(string, pattern, n)

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.
n: the number of pieces to return.

Splitting a string:

> val <- "abc,123,234,iuuu"
# Split on ","
> s1<-str_split(val, ",");s1
[[1]]
[1] "abc"  "123"  "234"  "iuuu"
# Split on "," and keep 2 pieces
> s2<-str_split(val, ",",2);s2
[[1]]
[1] "abc"          "123,234,iuuu"
# The result of str_split() is a list
> class(s1)
[1] "list"
# Split with str_split_fixed(); the result is a matrix
> s3<-str_split_fixed(val, ",",2);s3
     [,1]  [,2]
[1,] "abc" "123,234,iuuu"
> class(s3)
[1] "matrix"
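
The difference between the list and matrix results matters most when the inputs split into different numbers of pieces. The sketch below assumes stringr is installed and uses made-up input.

```r
library(stringr)

rows <- c("a-b-c", "d-e")

# str_split() returns a list whose elements can have different lengths
pieces <- str_split(rows, "-")
piece_counts <- sapply(pieces, length)    # 3 2

# str_split_fixed() always returns n columns, padding short rows with ""
m <- str_split_fixed(rows, "-", 3)
dim(m)      # 2 3
m[2, ]      # "d" "e" ""
```

The matrix form is convenient when the pieces are going to become columns of a data frame.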

3.3.2 str_subset: returns the strings that match

Function definition:

str_subset(string, pattern)

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.

> val <- c("abc", 123, "cba")
# Match anywhere in the string
> str_subset(val, "a")
[1] "abc" "cba"
# Match at the start of the string
> str_subset(val, "^a")
[1] "abc"
# Match at the end of the string
> str_subset(val, "a$")
[1] "cba"
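
For inputs without NA values, str_subset() can be thought of as a shortcut for indexing with the logical vector that str_detect() returns. A minimal sketch, assuming stringr is installed:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")

# Keep the elements that start with "p"
hits <- str_subset(fruit, "^p")             # "pear" "pineapple"

# The same selection written with str_detect() and ordinary indexing
same <- fruit[str_detect(fruit, "^p")]
identical(hits, same)                        # TRUE

# The logical vector can also be summed to count the matching elements
n_hits <- sum(str_detect(fruit, "^p"))       # 2
```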

3.3.3 word: extracts words from text

Function definition:

word(string, start = 1L, end = start, sep = fixed(" "))

Parameter list:

string: a string or a string vector.
start: the starting position.
end: the ending position.
sep: the separator between words.

> val <- c("I am Conan.", "http://fens.me, ok")
# Split on spaces by default and take the first word
> word(val, 1)
[1] "I"               "http://fens.me,"
> word(val, -1)
[1] "Conan." "ok"
> word(val, 2, -1)
[1] "am Conan." "ok"
# Split on "," and take the word at a given position
> val<-'111,222,333,444'
> word(val, 1, sep = fixed(','))
[1] "111"
> word(val, 3, sep = fixed(','))
[1] "333"

3.3.4 str_detect: checks whether a string matches a pattern

Function definition:

str_detect(string, pattern)

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.

> val <- c("abca4", 123, "cba2")
# Check whether each string contains a
> str_detect(val, "a")
[1]  TRUE FALSE  TRUE
# Check whether each string starts with a
> str_detect(val, "^a")
[1]  TRUE FALSE FALSE
# Check whether each string ends with a
> str_detect(val, "a$")
[1] FALSE FALSE FALSE

3.3.6 str_match: extracts matched groups from a string

Function definition:

str_match(string, pattern)
str_match_all(string, pattern)

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.

Extracting matched groups from a string:

> val <- c("abc", 123, "cba")
# Match the character a and return the matched character
> str_match(val, "a")
     [,1]
[1,] "a"
[2,] NA
[3,] "a"
# Match a single digit 0-9 and return the matched character
> str_match(val, "[0-9]")
     [,1]
[1,] NA
[2,] "1"
[3,] NA
# Match any number of digits 0-9 and return the matched characters
> str_match(val, "[0-9]*")
     [,1]
[1,] ""
[2,] "123"
[3,] ""

Extracting all matched groups from a string; the result is a list of character matrices:

> str_match_all(val, "a")
[[1]]
     [,1]
[1,] "a"

[[2]]
     [,1]

[[3]]
     [,1]
[1,] "a"

> str_match_all(val, "[0-9]")
[[1]]
     [,1]

[[2]]
     [,1]
[1,] "1"
[2,] "2"
[3,] "3"

[[3]]
     [,1]
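
The matrix returned by str_match() becomes more useful once the pattern contains parenthesized groups: column 1 holds the whole match and each following column holds one group. A small sketch, assuming stringr is installed; the dates vector is made up for illustration.

```r
library(stringr)

dates <- c("2016-11-14", "no date")

# Three capture groups: year, month, day
m <- str_match(dates, "(\\d{4})-(\\d{2})-(\\d{2})")

dim(m)      # 2 4: one row per input, one column for the match plus one per group
m[1, 2]     # "2016"
m[1, 3]     # "11"
m[1, 4]     # "14"
m[2, ]      # all NA, because the second string does not match
```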

3.3.7 str_replace: string replacement

Function definition:

str_replace(string, pattern, replacement)

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.
replacement: the text used as the replacement.

> val <- c("abc", 123, "cba")
# Replace the first a or b in each string with -
> str_replace(val, "[ab]", "-")
[1] "-bc" "123" "c-a"
# Replace every a or b in each string with -
> str_replace_all(val, "[ab]", "-")
[1] "--c" "123" "c--"
# Replace every a in each string with escaped characters
> str_replace_all(val, "[a]", "\1\1")
[1] "\001\001bc" "123"        "cb\001\001"
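
In an R string literal, "\1" is an octal escape for the control character \001, which is why the last result above contains \001. To refer back to a captured group in the replacement, the backslash has to be doubled and the pattern needs a parenthesized group. A minimal sketch, assuming stringr is installed:

```r
library(stringr)

val <- c("abc", "123", "cba")

# "\\1" in the replacement refers to the text captured by the first group
doubled <- str_replace_all(val, "(a)", "\\1\\1")
doubled   # "aabc" "123" "cbaa"

# Groups can also be reordered in the replacement
swapped <- str_replace("hello world", "(\\w+) (\\w+)", "\\2 \\1")
swapped   # "world hello"
```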

3.3.8 str_replace_na: replaces NA with the string "NA"

Function definition:

str_replace_na(string, replacement = "NA")

Parameter list:

string: a string or a string vector.
replacement: the text used as the replacement.

Replacing NA with a string:

> str_replace_na(c(NA,'NA',"abc"),'x')
[1] "x"   "NA"  "abc"

3.3.9 str_locate: finds the position of a pattern in a string

Function definition:

str_locate(string, pattern)
str_locate_all(string, pattern)

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.

> val <- c("abca", 123, "cba")
# Find the position of the first a in each string
> str_locate(val, "a")
     start end
[1,]     1   1
[2,]    NA  NA
[3,]     3   3
# Match with a vector of patterns, one per string
> str_locate(val, c("a", 12, "b"))
     start end
[1,]     1   1
[2,]     1   2
[3,]     2   2
# Return all matches as a list of matrices
> str_locate_all(val, "a")
[[1]]
     start end
[1,]     1   1
[2,]     4   4

[[2]]
     start end

[[3]]
     start end
[1,]     3   3

# Match a or b and return all matches as a list of matrices
> str_locate_all(val, "[ab]")
[[1]]
     start end
[1,]     1   1
[2,]     2   2
[3,]     4   4

[[2]]
     start end

[[3]]
     start end
[1,]     2   2
[2,]     3   3
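
The start and end columns returned by str_locate() can be passed straight to str_sub(). The following is a minimal sketch, assuming stringr is installed:

```r
library(stringr)

txt <- "I am Conan."

# Locate the first occurrence of the pattern
loc <- str_locate(txt, "Conan")
loc[1, "start"]   # 6
loc[1, "end"]     # 10

# Use the located positions to pull out the same characters
found <- str_sub(txt, loc[1, "start"], loc[1, "end"])
found             # "Conan"
```

When only the matched text is needed, str_extract() does this in one step; the located positions are useful when the surrounding text also has to be processed.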

3.3.10 str_extract: extracts the matching pattern from a string

Function definition:

str_extract(string, pattern)
str_extract_all(string, pattern, simplify = FALSE)

Parameter list:

string: a string or a string vector.
pattern: the pattern to match.
simplify: the form of the return value; TRUE returns a character matrix, FALSE returns a list of character vectors.

> val <- c("abca4", 123, "cba2")
# Return the first matching digit
> str_extract(val, "\\d")
[1] "4" "1" "2"
# Return the first matching run of letters
> str_extract(val, "[a-z]+")
[1] "abca" NA     "cba"
> val <- c("abca4", 123, "cba2")
> str_extract_all(val, "\\d")
[[1]]
[1] "4"

[[2]]
[1] "1" "2" "3"

[[3]]
[1] "2"

> str_extract_all(val, "[a-z]+")
[[1]]
[1] "abca"

[[2]]
character(0)

[[3]]
[1] "cba"
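
The examples above all use the default simplify = FALSE. With simplify = TRUE the same matches come back as a character matrix, one row per input string, with short rows padded with empty strings. A small sketch, assuming stringr is installed:

```r
library(stringr)

val <- c("abca4", "123", "cba2")

# One row per input string, one column per match
m <- str_extract_all(val, "\\d", simplify = TRUE)

dim(m)     # 3 3
m[1, ]     # "4" ""  ""
m[2, ]     # "1" "2" "3"
m[3, ]     # "2" ""  ""
```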

3.4 String transformation functions

3.4.1 str_conv: character encoding conversion

Function definition:

str_conv(string, encoding)

Parameter list:

string: a string or a string vector.
encoding: the name of the encoding.

Converting the encoding of Chinese text:

# Convert the Chinese characters to raw bytes
> x <- charToRaw('你好');x
[1] c4 e3 ba c3
# The default character set on Windows is GBK, and GB2312 is a subset of GBK, so the conversion succeeds
> str_conv(x, "GBK")
[1] "你好"
> str_conv(x, "GB2312")
[1] "你好"
# Interpreting the bytes as UTF-8 fails
> str_conv(x, "UTF-8")
[1] "???"
Warning messages:
1: In stri_conv(string, encoding, "UTF-8") :
  input data \xffffffc4 in current source encoding could not be converted to Unicode
2: In stri_conv(string, encoding, "UTF-8") :
  input data \xffffffe3\xffffffba in current source encoding could not be converted to Unicode
3: In stri_conv(string, encoding, "UTF-8") :
  input data \xffffffc3 in current source encoding could not be converted to Unicode

Converting Unicode escapes to UTF-8:

> x1 <- "\u5317\u4eac"
> str_conv(x1, "UTF-8")
[1] "北京"

3.4.2 str_to_upper: string case conversion

Function definition:

str_to_upper(string, locale = "")
str_to_lower(string, locale = "")
str_to_title(string, locale = "")

Parameter list:

string: a string.
locale: the locale whose case conversion rules are used.

Converting the case of a string:

> val <- "I am conan. Welcome to my blog! http://fens.me"
# All uppercase
> str_to_upper(val)
[1] "I AM CONAN. WELCOME TO MY BLOG! HTTP://FENS.ME"
# All lowercase
> str_to_lower(val)
[1] "i am conan. welcome to my blog! http://fens.me"
# Capitalize the first letter of each word
> str_to_title(val)
[1] "I Am Conan. Welcome To My Blog! Http://Fens.Me"
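
The locale argument matters for languages whose case rules differ from English. A commonly cited case is Turkish, where the uppercase of "i" is a dotted capital İ. The sketch below assumes stringr is installed and passes the locale explicitly instead of relying on the default.

```r
library(stringr)

# English rules: "i" becomes a plain capital I
en <- str_to_upper("i", locale = "en")

# Turkish rules: "i" becomes a dotted capital I (U+0130)
tr <- str_to_upper("i", locale = "tr")

en == "I"        # TRUE
tr == "\u0130"   # TRUE
```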

Strings come up constantly in everyday data processing, where they need to be split, joined, converted, and so on. This article introduced stringr, a flexible string processing library that can noticeably improve coding efficiency. With a good tool, working with strings in the R language becomes convenient.

Reproduced from: R Language String Processing Package stringr

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

Common features:

# Merge strings
fruit <- c("apple", "banana", "pear", "pineapple")
res <- str_c(1:4, fruit, sep = ' ', collapse = ' ')
str_c('I want to buy ', res, collapse = ' ')
# Calculate the string length
str_length(c("I", "like", "programming R", 123, res))
# Take a substring by position
str_sub(fruit, 1, 3)
# Reassign a substring
capital <- toupper(str_sub(fruit, 1, 1))
str_sub(fruit, rep(1, 4), rep(1, 4)) <- capital
# Repeat strings
str_dup(fruit, c(1, 2, 3, 4))
# Add padding
str_pad(fruit, 10, "both")
# Remove whitespace
str_trim(fruit)
# Check for a match according to a regular expression
str_detect(fruit, "a$")
str_detect(fruit, "[aeiou]")
# Find the position of the match
str_locate(fruit, "a")
# Extract the matching part
str_extract(fruit, "[a-z]+")
str_match(fruit, "[a-z]+")
# Replace the matching part
str_replace(fruit, "[aeiou]", "-")
# Split
str_split(res, " ")

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
