R Language: Regular Expressions and Regular Expression Functions
Definition
A regular expression is a logical formula for operating on strings.
Target object
The target object of a regular expression is text.
Function
* Logical filtering
* Precise capturing
Syntax rules for special characters
\ Escape character
. Any single character except a line break
^ Placed at the beginning of a pattern, matches the start of a string
$ Placed at the end of a pattern, matches the end of a string
* Zero or more of the preceding character
+ One or more of the preceding character
? Zero or one of the preceding character
Square brackets [] match any one of the characters listed inside them. Inside [], a leading ^ means "not" and - means "to" (a range).
- [qjk]: any one of the characters q, j, and k
- [^qjk]: any character other than q, j, and k
- [a-z]: any lowercase letter from a to z
- [^a-z]: any character other than the lowercase letters a to z (it can be an uppercase letter)
- [a-zA-Z]: any English letter
- [a-z]+: one or more lowercase English letters
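| Or (alternation)
Parentheses () group subexpressions and are commonly used together with "|". Braces {} specify how many times the preceding item repeats, for example {2} or {1,3}.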
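Note: the escape character \ is required to match any reserved character literally. In an R string literal the backslash itself must be doubled.
For example: to match a literal period, the regular expression is \. and it is written "\\." in R code.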
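The syntax rules above can be tried out directly with grepl, which reports whether each string matches a pattern. The following is a minimal sketch in base R; the sample strings and variable names are hypothetical and chosen only to exercise one rule at a time.

```r
# Hypothetical sample strings, chosen only to exercise each rule.

# . matches any single character; ^ and $ anchor the match to the whole string.
dot_hits <- grepl("^c.t$", c("cat", "cut", "ct", "coat"))       # TRUE TRUE FALSE FALSE

# Quantifiers apply to the preceding character.
words <- c("ct", "cot", "coot")
star_hits     <- grepl("^co*t$", words)  # zero or more "o": TRUE TRUE TRUE
plus_hits     <- grepl("^co+t$", words)  # one or more "o":  FALSE TRUE TRUE
question_hits <- grepl("^co?t$", words)  # zero or one "o":  TRUE TRUE FALSE

# Bracket expressions: a set, a negated set, and a range.
set_hits   <- grepl("^[qjk]$",  c("q", "k", "m"))               # TRUE TRUE FALSE
neg_hits   <- grepl("^[^qjk]$", c("q", "k", "m"))               # FALSE FALSE TRUE
range_hits <- grepl("^[a-z]+$", c("abc", "Abc", "abc1"))        # TRUE FALSE FALSE

# | chooses between alternatives; () limits its scope; {m,n} bounds repetition.
alt_hits   <- grepl("^(cat|dog)s?$", c("cat", "dogs", "bird"))  # TRUE TRUE FALSE
brace_hits <- grepl("^a{2,3}$", c("a", "aa", "aaa", "aaaa"))    # FALSE TRUE TRUE FALSE

# An unescaped . matches any character; "\\." matches only a literal period.
loose_hits  <- grepl("a.b",   c("a.b", "axb"))                  # TRUE TRUE
strict_hits <- grepl("a\\.b", c("a.b", "axb"))                  # TRUE FALSE
```

Anchoring each pattern with ^ and $ makes the effect of a single rule visible, because without the anchors a pattern only has to match somewhere inside the string.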
Meanings of common special escape characters
\n: line feed
\t: tab
\w: any letter, digit, or underscore, i.e. [a-zA-Z0-9_]
\W: the opposite of \w, i.e. [^a-zA-Z0-9_]
\d: any digit, i.e. [0-9]
\D: the opposite of \d, i.e. [^0-9]
\s: any whitespace character, such as a space, tab, or newline
\S: the opposite of \s, i.e. any non-whitespace character
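A short sketch of these escape classes in base R, using gsub to replace or delete whatever each class matches. The sample string s is hypothetical; note that each backslash is doubled inside the R string literal, while "\t" and "\n" are ordinary R string escapes for the tab and line feed characters themselves.

```r
# Hypothetical sample: letters, an underscore, digits, a tab, and a space.
s <- "id_7\tok 42"

digits_masked  <- gsub("\\d", "#", s)  # each digit becomes "#":    "id_#\tok ##"
digits_only    <- gsub("\\D", "", s)   # drop every non-digit:      "742"
non_word_only  <- gsub("\\w", "", s)   # drop word characters:      "\t "
word_only      <- gsub("\\W", "", s)   # drop non-word characters:  "id_7ok42"
spaces_marked  <- gsub("\\s", "_", s)  # tab and space become "_":  "id_7_ok_42"
spaces_only    <- gsub("\\S", "", s)   # drop non-whitespace:       "\t "

# "\t" and "\n" in an R string are the actual tab and line feed characters.
has_tab     <- grepl("\t", s)                     # TRUE
has_newline <- grepl("\n", "line one\nline two")  # TRUE
```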
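Common regular expression functions
grepl: returns a logical value (TRUE or FALSE) for each element, indicating whether it matches.
grep: returns the indices of the elements that match.
agrep: returns the indices of the elements that match approximately (fuzzy matching).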
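The three matching functions can be compared side by side. This is a minimal sketch in base R; the vectors fruits and phrases are hypothetical sample data, and agrep is called with its default max.distance.

```r
# Hypothetical sample data.
fruits <- c("apple", "banana", "grape", "pineapple")

# grepl: one TRUE/FALSE per element.
is_match <- grepl("apple$", fruits)               # TRUE FALSE FALSE TRUE

# grep: the positions of the matching elements.
match_idx <- grep("apple$", fruits)               # 1 4

# grep with value = TRUE returns the matching strings instead of positions.
match_val <- grep("apple$", fruits, value = TRUE) # "apple" "pineapple"

# agrep: approximate matching; "lasy" is one substitution away from "lazy".
phrases <- c("1 lazy 2", "1 LAZY 2")
fuzzy_idx      <- agrep("lasy", phrases)                      # 1
fuzzy_idx_case <- agrep("lasy", phrases, ignore.case = TRUE)  # 1 2
```

grepl is convenient for subsetting with a logical vector (fruits[is_match]), while grep is convenient when the positions themselves are needed.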
Regular expression replacement: sub and gsub
The difference between the two is that sub replaces only the first match in each string, while gsub replaces every match:
# Replace every b with B
gsub(pattern = "b", replacement = "B", x = "baby")
[1] "BaBy"
gsub(pattern = "b", replacement = "B", x = c("abcb", "boy", "baby"))
[1] "aBcB" "Boy"  "BaBy"
# Replace only the first b
sub(pattern = "b", replacement = "B", x = "baby")
[1] "Baby"
sub(pattern = "b", replacement = "B", x = c("abcb", "baby"))
[1] "aBcb" "Baby"
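The same first-match versus every-match distinction holds when the pattern is a real regular expression rather than a single letter. The sketch below uses base R only; the strings code and messy are hypothetical.

```r
# Hypothetical sample strings.
code  <- "a1b22c333"
messy <- "  R   is    fun  "

# sub touches only the first run of digits; gsub touches every run.
first_run <- sub("[0-9]+", "N", code)    # "aNb22c333"
every_run <- gsub("[0-9]+", "N", code)   # "aNbNcN"

# Collapse every run of whitespace into a single space.
collapsed <- gsub("\\s+", " ", messy)    # " R is fun "

# Trim leading and trailing whitespace: ^ and $ anchors combined with |.
trimmed <- gsub("^\\s+|\\s+$", "", messy)  # "R   is    fun"

# With sub, only the first match (the leading spaces) is removed.
lead_only <- sub("\\s+", "", messy)      # "R   is    fun  "
```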
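regexpr: returns an integer vector giving the starting position of the first match in each string, or -1 if there is no match. It carries attributes, including match.length (the length of the match) and useBytes (whether matching was done byte by byte).
regexec: returns a list. Each element gives the starting position of the first match, followed by the positions of any parenthesized subexpressions, with the match lengths and useBytes as attributes.
gregexpr: returns a list. Each element gives the starting positions of all matches in the corresponding string, with the match lengths and useBytes as attributes.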
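These three functions return positions rather than text, so they are often paired with regmatches, a base R helper that extracts the matched substrings. The following is a minimal sketch; the vector x is hypothetical sample data.

```r
# Hypothetical sample data: the second string contains no digits.
x <- c("tel: 555-1234", "no digits here", "a1 b22 c333")

# regexpr: start of the first match per string, -1 when nothing matches.
m <- regexpr("[0-9]+", x)
first_pos <- as.integer(m)                        # 6 -1 2
first_len <- as.integer(attr(m, "match.length"))  # 3 -1 1
first_txt <- regmatches(x, m)                     # "555" "1" (non-matches are dropped)

# gregexpr: every match per string, returned as a list.
g <- gregexpr("[0-9]+", x)
all_txt <- regmatches(x, g)
third_all <- all_txt[[3]]                         # "1" "22" "333"

# regexec: the first match plus each parenthesized group.
e <- regexec("([0-9]+)-([0-9]+)", x)
groups <- regmatches(x, e)[[1]]                   # "555-1234" "555" "1234"
```

regexpr is enough when only the first hit matters, gregexpr when all hits are needed, and regexec when the pieces captured by parentheses are needed separately.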